Thread Manager Description

Thread Manager

Revision History
About This Document
Overview
Architecture
Usage Scenarios
References

Revision History

Version	Version Information	Date
Initial version	Nadya Morozova, Andrey Chernyshev: document created.	June 5, 2006
Update 1	Nadya Morozova, Pavel Rebriy: update of graphics and text, Java* layer is now part of VM core, Porting component restructured, all scenarios rewritten	March 20, 2008

About This Document

Purpose

This document introduces the thread manager component delivered as part of the DRL (Dynamic Runtime Layer) initiative. This document focuses on the specifics of the current implementation showing the thread manager role inside the DRL virtual machine, and the internal organization of the thread management subsystem.

Intended Audience

The target audience for the document includes a wide community of engineers interested in further work with threading technologies to contribute to their development. The document assumes that readers are familiar with DRLVM architecture basics, threading methodologies and structures.

Documentation Conventions

This document uses the unified conventions for the DRL documentation kit.

Using This Document

Use this document to learn all about implementation specifics of the current version. It describes the thread manager functionality in a variety of aspects, including internal data structures, architecture specifics, and the key usage scenarios involving the thread manager. The document has the following major parts:

Overview gives the general definition of the thread manager component and its role in the VM architecture.
Architecture describes the internal structure of the thread manager, its data structures and the interfaces it exports.
Usage scenarios demonstrate major thread-related operations, such as the thread life cycle and thread suspension.
References are links to materials relevant to this description.

Overview

The thread manager (TM) is a library that provides threading capabilities for Java* virtual machines. The main purpose of TM is to bridge the POSIX-like threading model [5] provided by the operating system and the Java*-like threading model implied by the J2SE specification [1].

In the current implementation, the JVM threading subsystem consists of three different layers:

The porting layer interacting with the operating system
The native layer providing basic threading functionality
The Java* layer interacting with the Java* objects of the user application

Note that the thread manager consists of the native and Java* layers of the subsystem, whereas the porting layer is external.

Each layer adds certain functionality to the threading provided by the underlying layer. That is, the porting layer adds portability to the threading provided by the OS, the native layer adds Java*-specific enhancements to the porting layer, and the Java* layer connects the native layer to Java* threads and objects, as shown in Figure 1 below. The interfaces that these layers export are grouped in a set of headers described in the Exported Interfaces section below.

Basic layers in the thread manager

Figure 1: Threading Subsystem

Key Features

The supplied thread manager has the following characteristics:

Support for the threading functionality required by J2SE API [1], JVMTI [2] and JNI [3] specifications
Portable implementation mostly based on DRLVM and Apache Porting Layers [4]
Compliance with the Harmony hythread interface [8]
Support for the garbage collector
Optimizations specific to the just-in-time (JIT) compiler supplied with DRLVM

Thread Manager in DRLVM

Figure 2 below demonstrates the interaction of the thread manager with the following components of the virtual machine:

The VM core to access information on object layout and for binding between java.lang.Thread objects and appropriate native threads.
The garbage collector (GC) to serve thread manipulation requests for root set enumeration and garbage collection activities. GC works with the native and Java* layers of the thread manager.
The porting layer to interact with the underlying system and enable portability for threading. The TM native layer queries functions of the PORT and APR interfaces.
The just-in-time compiler (JIT) to provide optimized threading functions, called VM helpers, for JIT-compiled code. The thread manager exports this functionality via the thread_helpers.h interface of the Java* layer.

Thread Manager and other VM components

Figure 2: Thread Manager in VM Architecture

Portability

The thread manager code is mostly platform-independent and relies on the underlying porting layer to adjust to platform specifics. The current TM implementation is written on top of DRLVM and Apache Porting Layers (APR). The platform-dependent TM parts are the VM helpers package, which is tied to the specific architecture, and the porting layer extensions package, which is partially tied to the OS API.

DRLVM-based and APR-based porting layers enable compilation of the thread manager code on every platform where porting is available. The current version of the thread manager supports the Linux* and Windows* OSes on x86 and x86_64 platforms, and Linux* OS on IA-64 platforms.

Architecture

Subsequent sections describe the functional interfaces that the thread manager exports to interact with other VM components and its internal data structures.

Exported Interfaces

As indicated in the overview, the thread manager exports the native and the Java* interfaces. These interfaces are represented as groups of functions providing specific functionality upon external requests, as described in the subsequent sections.

Native Interface

The native interface is inspired by the Harmony hythread module. This is a low-level layer that provides Java*-like native threading functionality, such as interruption support for waiting operations (for example, wait, park, join and sleep) and helps establish correct interaction of threads with the garbage collector. This layer does not deal with Java* objects.

The native interface consists of the following function sets:

open/hythread.h

Consists of functions of the hythread set [8] responsible for the following:

Basic manipulation
Parking
Thread local storage support
Read-write mutex support
Monitors support

open/hythread_ext.h

Includes the description of native thread structures and the set of functions extending the hythread set responsible for the following:

Thread manager initialization and shutdown
Thread groups support
Conditional variable
Safe suspension support
Latch
Thread iterator support
Attributes access
Querying state of the thread
Semaphore
Mutex
Thin monitors support
Interruption support
Task management support

Java* Interface

The Java* interface connects the threading functionality provided by the native layer to Java* threads and objects.

The functions of the Java* interface take Java* objects as parameters and can be easily used to implement kernel classes, JNI or JVMTI function sets. The Java* interface consists of the following parts:

open/jthread.h

Functions supporting java.lang.Object and java.lang.Thread API responsible for:

Basic manipulation
Identification
Pointer conversion
Attributes access
Interruption
Monitors
Parking
Suspension

open/ti_thread.h

Functions supporting various JVMTI functions and the java.lang.management API responsible for:

State query
Instrumentation
Local storage
Monitor info
CPU timing
Peak count
Raw monitors

VM core interface

thread_manager.h

Includes the description of thread structures and functions providing accessors to them:

Accessors to Java* TM data
Functions for attaching and detaching Java* threads
Converters from Java* threads to native threads and vice versa

thread_helpers.h

Consists of functions providing the assembly code stubs that help to optimize the performance due to tighter TM and JIT integration:

Generator for Thread Local Storage (TLS) accessor
Generators for monitor functions

Data Structures

The thread manager data structures are typically not exposed: external VM components access these structures via opaque pointers instead. The pointers are defined in the public header files hythread.h, jthread.h and ti_thread.h. The structures themselves are described in the hythread_ext.h and thread_manager.h files.

Thread Control Structures

The thread manager requires each thread to be registered before threading functions can be called. Thread registration is called attaching a thread. The following functions attach and detach threads:

Function hythread_attach() registers the current native thread in the thread manager, so that threading operations can be performed over this thread via the native layer. Calling hythread_attach() involves a call to port_thread_attach() to register the current thread in the porting layer.
Function hythread_detach() unregisters the current native thread. This involves a call to port_thread_detach() to unregister the current thread from the porting layer.
Function jthread_attach() associates the current Java* thread with the appropriate java.lang.Thread object, so that threading operations can be performed over this thread via the Java* layer.
Function jthread_detach() disjoins the current Java* thread from the thread manager.

Depending on the attaching function, the thread manager operates with two types of threads:

Native thread attached to the native layer of the thread manager
Java* thread attached to the Java* layer of the thread manager and associated with a java.lang.Thread object

Each thread type has a structure assigned to it that holds thread-specific data, as described below.

Other VM components work with opaque handles to those structures and have no information about their contents. To work with a thread, a component calls one of the attaching functions, receives an opaque handle to the thread control structure for the thread, and performs the required operations on this thread using this opaque handle.

Native Thread Structure

When registered with the thread manager’s native layer, each thread obtains a control structure with all thread-specific data required for operations with the thread, such as state, attributes, references to OS-specific thread structures, and synchronization aids. The control structure is subsequently used for miscellaneous threading operations.

Note

The actual content of a thread control structure is implementation-specific and is not exposed to other components.

For details on thread control structures, see Doxygen documentation hosted on the website.

Java* Thread Structure

A thread control structure of a Java* thread is defined by the JVMTIThread type and holds mostly JVMTI information specific to that thread, as shown in Figure 3 below.

Structure of the Java* Attached thread

Figure 3: Java* Attached Thread

For details on thread control structures, see Doxygen documentation hosted on the website.

Thread Groups

The thread manager enables co-existence of multiple groups of threads, for example, groups of Java* threads and GC threads not visible to Java* applications. Each thread maintained by the thread manager belongs to a specific thread group, as shown in Figure 4.

Threads distributed into thread groups, for example a group of Java* threads and a group of GC threads

Figure 4: Thread Groups

The thread manager provides a set of functions for iterating over the list of threads within a specific group. All threads are organized in a group array and a specific system-wide lock is used to prevent concurrent modifications of the groups array and the thread list inside the group. This lock is acquired internally during thread creation, deletion and iteration over the thread list.
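The locking discipline described above can be sketched as follows. This is a minimal illustration, not the thread manager code: the sk_ names and structures are hypothetical, and a spin lock built on C11 atomics stands in for whatever system-wide lock the real implementation uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types: the real structures are opaque to other VM components. */
typedef struct sk_thread {
    int id;
    struct sk_thread *next;
} sk_thread;

typedef struct sk_group {
    sk_thread *head;            /* singly linked list of member threads */
} sk_group;

/* One system-wide lock guards every group and every thread list.
 * A spin lock keeps the sketch dependency-free (an assumption). */
static atomic_flag sk_global_lock = ATOMIC_FLAG_INIT;

static inline void sk_lock(void) {
    while (atomic_flag_test_and_set_explicit(&sk_global_lock,
                                             memory_order_acquire)) {
        /* spin until the lock is released */
    }
}

static inline void sk_unlock(void) {
    atomic_flag_clear_explicit(&sk_global_lock, memory_order_release);
}

/* Thread creation: link the thread into its group under the lock. */
static inline void sk_group_add(sk_group *g, sk_thread *t) {
    assert(g != NULL && t != NULL);
    sk_lock();
    t->next = g->head;
    g->head = t;
    sk_unlock();
}

/* Thread deletion: unlink under the lock. Returns 1 if the thread was found. */
static inline int sk_group_remove(sk_group *g, sk_thread *t) {
    int found = 0;
    sk_lock();
    for (sk_thread **p = &g->head; *p != NULL; p = &(*p)->next) {
        if (*p == t) {
            *p = t->next;
            t->next = NULL;
            found = 1;
            break;
        }
    }
    sk_unlock();
    return found;
}

/* Iteration: walk the list under the same lock and count the members. */
static inline size_t sk_group_count(sk_group *g) {
    size_t n = 0;
    sk_lock();
    for (sk_thread *t = g->head; t != NULL; t = t->next) {
        n++;
    }
    sk_unlock();
    return n;
}
```

Because a single lock covers all groups, a long iteration delays thread creation and deletion in every group; the sketch keeps that trade-off visible rather than hiding it.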

Synchronizers

The thread manager synchronizers are functional modules used for thread synchronization. Certain synchronizers have internal data structures associated with them; others only delegate function calls to the appropriate synchronizers provided by the porting layer. The current implementation of synchronizers within the thread manager is based on two fundamental primitives: the conditional variable and the lock, as shown in Figure 5.

Implementing the thread synchronizer in TM

Figure 5: Components of the TM Synchronizer

The elements in the figure have the following meaning:

The conditional variable and lock are basic primitives provided by the porting layer.
The TM conditional variable and TM lock wrap appropriate porting primitives by adding the wait interruption support. These synchronizers also ensure that a thread enters the safe suspension mode when it is put into a wait state using the conditional variable or when the thread is blocked while acquiring a lock.
The thin monitor is an inflatable lock coupled with the condition variable. This combination serves as a base for building Java* monitors.
The semaphore is the same as the POSIX semaphore, and also enables specifying the count limit.
The Java* monitor is the monitor associated with a java.lang.Object instance.
The JVMTI raw monitor is the monitor defined in the JVMTI specification.
The park and unpark lock support primitives are used in the java.util.concurrent package.

The above hierarchy is optimized for porting code re-use. Other implementations of the Thread Manager component are also possible and can utilize a different set of porting synchronizers.

Note

The thread manager does not expose the internal structures of synchronizers to the external components. All synchronizers are referenced by means of opaque handles similarly to thread control structures.

Monitors

The current version of the thread manager implements Java* monitors in a specific way to address the common problem of space consumption. The DRL thread manager has a special type of monitor, the thin monitor, which holds a lock optimized for space consumption and single-threaded usage.

Inflation Technique

Monitor inflation is implemented using a thin-fat lock technique [6], which works as follows:

In the absence of thread contention, lock data are stored in a few bytes, so that the lock can be allocated directly within the Java* object.
Whenever contention takes place, the bytes allocated for lock data hold a reference to the fat lock, which can be a conventional mutex.

Different implementations of thin monitors are free to choose any space compaction or other optimization techniques (or none at all). However, the general recommendation is to use thin monitors when memory needs to be saved and thread contention is not expected to be high. Under high contention, the conventional mutex and conditional variables are recommended for better scalability. Java* monitors in the thread manager are built on top of thin monitors. This enables the thread manager to allocate the lock structure for thin monitors directly in the Java* objects and thus makes the space usage of Java* monitors more efficient.

Monitor Structure

The thin monitor is a synchronizer primitive that implements the lock compression technique and serves as a base for building Java* monitors [6]. In other words, the thin monitor resides in the native layer of the TM subsystem and has no data on Java* objects. Java* monitors are tightly coupled with Java* objects and reside on the higher Java* level of the TM subsystem.

The central point of the synchronizer is the lock word, which holds the thin lock value or a reference to the fat lock depending on the contention.

In the absence of contention, the contention bit is 0, and the lock word has the following structure:

Uninflated lock

Figure 6: Lock Word Structure: Contention Bit is 0

Contention bit: 0 indicating absence of contention
Thread ID (15 bits): the ID of the owning thread, or 0 if the lock is free
Recursion count: the number of times that the lock has been acquired by the same thread minus 1
Reservation bit: the flag indicating whether the lock is reserved by a thread [7]
Rightmost 10 bits unused in TM and reserved for storing the hash codes of Java* objects

In the presence of contention, the contention bit is set to 1, and a thin compressed lock becomes a fat inflated lock with the following layout:

inflated lock

Figure 7: Lock Word Structure: Contention Bit is 1

Contention bit: 1 indicating presence of contention
Fat Lock ID (20 bits): the ID of the corresponding fat lock
Reservation bit: the flag indicating whether the lock is reserved by a thread [7]
Rightmost 10 bits unused in TM and reserved for storing the hash codes of Java* objects
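The two layouts can be sketched in C as follows. This is an illustration only: it assumes a 32-bit lock word with the contention bit in the most significant position and the fields packed in the order listed, and it infers a 5-bit recursion count from the remaining width (32 - 1 - 15 - 1 - 10). All names are hypothetical and are not taken from the thread manager headers. The text does not state the polarity of the reservation bit, so the sketch only exposes the raw bit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 32-bit layout, most significant bit first:
 *   thin: [contention:1][thread id:15][recursion:5][reservation:1][hash:10]
 *   fat:  [contention:1][fat lock id:20][reservation:1][hash:10]          */
enum {
    LW_HASH_BITS         = 10,
    LW_RESERVATION_SHIFT = 10,
    LW_RECURSION_SHIFT   = 11,
    LW_RECURSION_BITS    = 5,   /* inferred, not stated in the text */
    LW_THREAD_ID_SHIFT   = 16,
    LW_THREAD_ID_BITS    = 15,
    LW_FAT_ID_SHIFT      = 11,
    LW_FAT_ID_BITS       = 20,
    LW_CONTENTION_SHIFT  = 31
};

/* Extract a field of the given width starting at the given bit. */
static inline uint32_t lw_field(uint32_t word, unsigned shift, unsigned bits) {
    return (word >> shift) & ((UINT32_C(1) << bits) - 1u);
}

static inline int lockword_is_fat(uint32_t w) {
    return (int)(w >> LW_CONTENTION_SHIFT);
}
static inline uint32_t lockword_thread_id(uint32_t w) {
    return lw_field(w, LW_THREAD_ID_SHIFT, LW_THREAD_ID_BITS);
}
static inline uint32_t lockword_recursion(uint32_t w) {
    return lw_field(w, LW_RECURSION_SHIFT, LW_RECURSION_BITS);
}
static inline uint32_t lockword_fat_id(uint32_t w) {
    return lw_field(w, LW_FAT_ID_SHIFT, LW_FAT_ID_BITS);
}
static inline uint32_t lockword_reservation_bit(uint32_t w) {
    return lw_field(w, LW_RESERVATION_SHIFT, 1);
}
static inline uint32_t lockword_hash(uint32_t w) {
    return lw_field(w, 0, LW_HASH_BITS);
}

/* Build a thin (contention bit 0) lock word from its fields. */
static inline uint32_t lockword_make_thin(uint32_t thread_id, uint32_t recursion,
                                          uint32_t reservation, uint32_t hash) {
    return ((thread_id & 0x7FFFu) << LW_THREAD_ID_SHIFT)
         | ((recursion & 0x1Fu) << LW_RECURSION_SHIFT)
         | ((reservation & 1u) << LW_RESERVATION_SHIFT)
         | (hash & 0x3FFu);
}

/* Build a fat (contention bit 1) lock word from its fields. */
static inline uint32_t lockword_make_fat(uint32_t fat_id, uint32_t reservation,
                                         uint32_t hash) {
    return (UINT32_C(1) << LW_CONTENTION_SHIFT)
         | ((fat_id & 0xFFFFFu) << LW_FAT_ID_SHIFT)
         | ((reservation & 1u) << LW_RESERVATION_SHIFT)
         | (hash & 0x3FFu);
}
```

Note that the thread ID and recursion count of the thin layout occupy the same 20 bits as the fat lock ID, so a reader must test the contention bit before interpreting them.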

The thread manager has a global lock table to map between the lock ID and the appropriate fat monitor, as follows:

Inflated thin monitor

Figure 8: Fat Monitor
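A lookup through such a table might look like the sketch below. The table capacity, the fat_monitor fields and all function names are assumptions made for illustration, as is the 32-bit lock word with the fat lock ID stored in bits 11 to 30. The sketch is not thread-safe; a real table would have to be guarded against concurrent updates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT_ID_BITS         20u
#define FAT_ID_SHIFT        11u
#define LOCK_TABLE_CAPACITY 256u   /* hypothetical; 20-bit IDs allow up to 2^20 */

_Static_assert(LOCK_TABLE_CAPACITY <= (1u << FAT_ID_BITS),
               "every table index must fit in the fat lock ID field");

/* Hypothetical fat monitor; the real structure is not exposed. */
typedef struct fat_monitor {
    int owner_id;
    unsigned recursion;
} fat_monitor;

static fat_monitor *lock_table[LOCK_TABLE_CAPACITY];
static uint32_t lock_table_used;

/* Register a fat monitor and return its ID, or -1 if the table is full. */
static inline int32_t lock_table_add(fat_monitor *m) {
    if (m == NULL || lock_table_used == LOCK_TABLE_CAPACITY) {
        return -1;
    }
    lock_table[lock_table_used] = m;
    return (int32_t)lock_table_used++;
}

/* Map a fat lock ID back to its monitor, or NULL for an unknown ID. */
static inline fat_monitor *lock_table_get(uint32_t fat_lock_id) {
    return fat_lock_id < lock_table_used ? lock_table[fat_lock_id] : NULL;
}

/* Resolve a lock word: only words with the contention bit set carry an ID. */
static inline fat_monitor *fat_monitor_from_lockword(uint32_t word) {
    if ((word >> 31) == 0) {
        return NULL;                       /* thin lock: no table entry */
    }
    uint32_t id = (word >> FAT_ID_SHIFT) & ((UINT32_C(1) << FAT_ID_BITS) - 1u);
    return lock_table_get(id);
}
```

Storing an index rather than a pointer keeps the reference within 20 bits, which is what allows it to share the lock word with the hash code bits.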

Acquiring a Monitor

The process of acquiring a monitor with the help of the hythread_thin_monitor_try_enter() function is shown in the following diagram:

Lock reservation

Figure 9: Acquiring a Thin Lock

First, the thread uses the reservation bit to check whether the required lock is owned by this thread. If yes, the thread increases the recursion count by 1 and the function exits. This is the fast path of the monitor enter operation for a single-threaded application. The fast path involves only a few assembly instructions and performs no expensive atomic compare-and-swap (CAS) operations.

If the lock is not yet reserved, it is checked for being occupied. A free lock is reserved and acquired simultaneously with a single CAS operation. If the lock is busy, the system then checks whether the lock is fat.

The lock table holds a mapping between the fat lock ID and the actual monitor. Fat monitors are extracted from the lock table and acquired. If the lock is not fat and reserved by another thread, then this thread suspends the execution of the lock owner thread, removes the reservation, and resumes the owner thread. After that, the lock acquisition is tried again.
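The first two steps of this procedure can be sketched with C11 atomics as shown below. This is a simplified model, not the code of hythread_thin_monitor_try_enter(): it assumes a 32-bit lock word with the fields at the bit positions given in the comments, assumes that a set reservation bit means "reserved", and simply reports the lock as busy where the real function would go on to handle fat locks or remove another thread's reservation. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed field positions in a 32-bit lock word. */
#define TE_TID_SHIFT    16
#define TE_TID_MASK     (UINT32_C(0x7FFF) << TE_TID_SHIFT)   /* owner thread ID */
#define TE_REC_SHIFT    11
#define TE_REC_MASK     (UINT32_C(0x1F) << TE_REC_SHIFT)     /* recursion count */
#define TE_RESERVED_BIT (UINT32_C(1) << 10)                  /* assumed polarity */
#define TE_FAT_BIT      (UINT32_C(1) << 31)                  /* contention bit  */

typedef enum { ENTER_OK, ENTER_BUSY } enter_result;

static inline enter_result sketch_try_enter(_Atomic uint32_t *lockword,
                                            uint32_t self_id) {
    assert(self_id != 0 && self_id <= 0x7FFF);
    uint32_t w = atomic_load(lockword);

    if (w & TE_FAT_BIT) {
        return ENTER_BUSY;          /* fat lock path is not modeled here */
    }

    uint32_t owner = (w & TE_TID_MASK) >> TE_TID_SHIFT;
    if (owner == self_id) {
        /* Fast path: in this sketch every acquisition also reserves the lock,
         * and a reserved lock is modified only by its owner, so a plain
         * (relaxed) store replaces the CAS. */
        if ((w & TE_REC_MASK) == TE_REC_MASK) {
            return ENTER_BUSY;      /* recursion field full; not modeled */
        }
        atomic_store_explicit(lockword, w + (UINT32_C(1) << TE_REC_SHIFT),
                              memory_order_relaxed);
        return ENTER_OK;
    }

    if (owner == 0) {
        /* Free lock: reserve and acquire it with a single CAS, keeping
         * the hash code bits intact. */
        uint32_t desired = (w & ~(TE_TID_MASK | TE_REC_MASK))
                         | (self_id << TE_TID_SHIFT)
                         | TE_RESERVED_BIT;
        if (atomic_compare_exchange_strong(lockword, &w, desired)) {
            return ENTER_OK;
        }
    }

    /* Held or reserved by another thread, or the CAS lost a race. */
    return ENTER_BUSY;
}
```

The plain store on the fast path is safe only because, as described above, any other thread must suspend the owner before it removes the reservation.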

Usage Scenarios

This section contains various scenarios of thread manipulation.

Java* Thread Life Cycle

The Java* thread creation procedure consists of the following key stages:

After creating a new thread, the Thread() constructor creates a new native thread and initializes thread control structures VM_thread and HyThread as part of VM_thread.
The user application then calls the java.lang.Thread.start() method.
The java.lang.Thread.start() method executes the jthread_create() function of the Java* layer in the thread manager via the java.lang.VMThreadManager.start() function.
The function jthread_create() calls jthread_create_with_function().
The jthread_create_with_function() function calls hythread_create_ex() supplying jthread_wrapper_start_proc() as a new thread body procedure.
The hythread_create_ex() function executes the port_thread_create() function of the porting layer, which does the actual fork and creates a new thread. The newly created thread begins to execute jthread_wrapper_start_proc().
The function jthread_wrapper_start_proc() performs the required registration of the new thread in the thread group, allocates VM_thread pool, creates the top-level M2N frame and local handles, and initializes thread stack info and the JNI thread environment. If needed, the function performs a JVMTI callback by sending the JVMTI_EVENT_THREAD_START event.
After thread initialization, the function jthread_wrapper_start_proc() executes the java.lang.Thread.run() method, which is the user-defined body of the new Java* thread.
After java.lang.Thread.run() has finished, the function jthread_wrapper_start_proc() removes the thread from the thread group and de-allocates VM_thread data. If needed, the function performs a JVMTI callback by sending the JVMTI_EVENT_THREAD_END event.

The following figures illustrate the detailed sequence of thread creation and completion:

Thread operation from creation to finalization

Figure 10: Java* Thread Life Cycle

Figure 11: New Java* Thread Life Cycle

Note

The native thread control structures, such as VM_thread, are not de-allocated once the new thread body is finished. The thread manager creates a weak reference for each java.lang.Thread object supplying its internal reference queue. The garbage collector places a reference into that queue when a specific java.lang.Thread object is garbage-collected. Before allocating native resources for new threads, the thread manager looks for weak references in the queue. If the queue is not empty, the thread manager extracts the first available reference and re-uses its native resources for the newly created thread.
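The re-use policy in this note can be sketched as a simple FIFO of reclaimed blocks. The sketch is a stand-in under stated assumptions: native_block and the reclaim_ functions are hypothetical names, a plain queue replaces the weak reference queue, and reclaim_enqueue() plays the role of the garbage collector reporting a collected java.lang.Thread object.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the native resources of one thread. */
typedef struct native_block {
    struct native_block *next;
    int reuse_count;            /* how many times the block has been re-used */
} native_block;

/* FIFO of blocks whose java.lang.Thread object has been collected. */
typedef struct reclaim_queue {
    native_block *head;
    native_block *tail;
} reclaim_queue;

/* Models the GC placing a reference into the queue. */
static inline void reclaim_enqueue(reclaim_queue *q, native_block *b) {
    assert(q != NULL && b != NULL);
    b->next = NULL;
    if (q->tail != NULL) {
        q->tail->next = b;
    } else {
        q->head = b;
    }
    q->tail = b;
}

/* Called before a new thread starts: re-use the first reclaimed block
 * if there is one, otherwise allocate a fresh block (may return NULL). */
static inline native_block *native_block_acquire(reclaim_queue *q) {
    native_block *b = q->head;
    if (b != NULL) {
        q->head = b->next;
        if (q->head == NULL) {
            q->tail = NULL;
        }
        b->next = NULL;
        b->reuse_count++;
        return b;
    }
    return calloc(1, sizeof *b);
}
```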

Thread Suspension

One of the important features that the native layer adds for threading in the porting layer is safe suspension. This mechanism ensures that the suspended thread can be safely explored by the garbage collector during the enumeration of live references. If a thread holds some system-critical locks, such as the locks associated with the native heap memory, safe suspension can keep it running even during the enumeration. Otherwise, doing the system or “hard” call to suspend the thread may result in deadlocks in case system locks are requested by other parts of the VM.

The algorithm of safe suspension describes the protocol of communication between two threads, for example, thread T1 and thread T2, where T1 safely suspends thread T2. The T1 thread calls the hythread_suspend(T2) function to suspend thread T2. The procedure goes through the following stages:

The hythread_suspend(T2) function increments the flag for the T2 thread indicating a request for suspension. Depending on the current state of thread T2, the hythread_suspend(T2) function activates one of the following mechanisms:
- If thread T2 is currently running in a safe code region, the hythread_suspend(T2) call immediately returns, see Figure 12.
- If thread T2 is currently in an unsafe region, then the hythread_suspend() function gets blocked until thread T2 reaches the beginning of a safe region or a safe point.
Thread T2 runs to the end of the safe region and gets blocked until T1 resumes it by calling hythread_resume(T2).

The T2 thread undergoes the following:

Thread T2 calls the functions hythread_suspend_enable() and hythread_suspend_disable() to delimit a safe region of code. Thread T2 may also call the hythread_safe_point() function to mark a selected point where safe suspension is possible.
If a suspension request has been set, thread T2 gets blocked when it reaches the safe point, inside hythread_safe_point(), until T1 resumes it by calling hythread_resume(T2).
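The request flag that both threads consult can be modeled as a counter, as in the sketch below. The sp_ names are hypothetical, and the assumption that a resume decrements the same counter is an inference from the word "increments" above, not something the text states. The sketch only answers whether a safe point would block; the blocking itself is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread state: number of outstanding suspend requests. */
typedef struct sp_thread {
    atomic_int suspend_requests;
} sp_thread;

/* T1 side: post a suspension request for the target thread. */
static inline void sp_request_suspend(sp_thread *t) {
    atomic_fetch_add(&t->suspend_requests, 1);
}

/* T1 side: withdraw one request (assumed counterpart of the increment). */
static inline void sp_request_resume(sp_thread *t) {
    int old = atomic_fetch_sub(&t->suspend_requests, 1);
    assert(old > 0);            /* resume without a matching suspend */
    (void)old;
}

/* T2 side: polled at a safe point; true means the thread must block
 * until every outstanding request has been withdrawn. */
static inline bool sp_safe_point_must_block(sp_thread *t) {
    return atomic_load(&t->suspend_requests) > 0;
}
```

Using a counter rather than a single bit lets several requesters suspend the same thread independently: the thread continues only after the last of them has resumed it.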

A typical example of the safe suspension scenario takes place when the garbage collector suspends a Java* thread to enumerate live references. Figure 12 illustrates the case when the GC uses the thread manager to suspend the Java* thread while it is running in the safe code region.

Safe region during thread execution

Figure 12: Suspension: Safe Region

To understand the safe thread suspension algorithm better, think of each thread as having a lock associated with it. Thread T2 releases the lock when it enters a safe region and acquires the lock when it leaves the safe region. To suspend thread T2, acquire the lock associated with it. Resuming thread T2 is equivalent to releasing the lock associated with it. A straightforward implementation of the safe suspension algorithm reserves a single-thread optimized lock (that is, the thin monitor) for each thread and uses it for suspending and resuming that thread.
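The lock analogy translates almost directly into code. In the sketch below, a boolean try-lock built on C11 atomics stands in for the per-thread lock, and every operation that would block in a real implementation is exposed as a "try" function that reports whether it would have succeeded. The ss_ names are hypothetical and do not correspond to thread manager functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread lock from the analogy. */
typedef struct ss_thread {
    atomic_bool lock_held;
} ss_thread;

static inline bool ss_try_acquire(ss_thread *t) {
    /* Succeeds only if the lock was free before the exchange. */
    return !atomic_exchange_explicit(&t->lock_held, true, memory_order_acquire);
}

static inline void ss_release(ss_thread *t) {
    atomic_store_explicit(&t->lock_held, false, memory_order_release);
}

/* A thread starts in unsafe code, so it starts out holding its own lock. */
static inline void ss_thread_init(ss_thread *t) {
    atomic_init(&t->lock_held, true);
}

/* T2 side: entering a safe region releases the lock. */
static inline void ss_enter_safe_region(ss_thread *t) {
    ss_release(t);
}

/* T2 side: leaving a safe region re-acquires the lock.
 * false means T2 is suspended and would block here. */
static inline bool ss_try_leave_safe_region(ss_thread *t) {
    return ss_try_acquire(t);
}

/* T1 side: suspending T2 acquires T2's lock.
 * false means T2 is in unsafe code and T1 would have to wait. */
static inline bool ss_try_suspend(ss_thread *t) {
    return ss_try_acquire(t);
}

/* T1 side: resuming T2 releases T2's lock. */
static inline void ss_resume(ss_thread *t) {
    ss_release(t);
}
```

The same lock serves both directions: it keeps T1 out while T2 runs unsafe code, and keeps T2 inside the safe region while T1 holds the suspension.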

Another safe suspension case is when a GC thread attempts to suspend a Java* thread while it is in an unsafe region of code, as shown in Figure 13.

Safe Point in Thread Execution

Figure 13: Safe Point

Consider the hythread_safe_point() operation as a wait operation performed over the monitor associated with the thread. In this case, the hythread_resume() operation is equivalent to notifying that monitor.

Stop-the-world Thread Suspension

The stop-the-world thread suspension happens when the garbage collector needs to enumerate the live object references for all threads of a given thread group. Figure 14 illustrates the case when only a GC thread and an indefinite number of Java* threads are running, so that the GC needs to suspend all Java* threads.

stop-the-world suspension

Figure 14: Suspending a Group of Threads

First, the garbage collector calls the thread manager interface function hythread_suspend_all() to suspend every thread running within the given group (in this scenario, all Java* threads). The thread manager then returns the iterator for traversing the list of suspended threads. GC uses this iterator to analyze each Java* thread with respect to live references and then does a garbage collection. After it is complete, GC instructs the thread manager to resume all suspended threads.
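The sequence of suspend, iterate and resume can be sketched as a single-threaded model. All stw_ names and fields are hypothetical. Suspension is reduced to bumping a per-thread request counter, whereas the real hythread_suspend_all() has to wait until every thread is actually stopped at a safe point or inside a safe region.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread state visible to the collector. */
typedef struct stw_thread {
    int id;
    int suspend_requests;
    int roots_scanned;
} stw_thread;

typedef struct stw_group {
    stw_thread *threads;
    size_t count;
} stw_group;

typedef struct stw_iterator {
    stw_group *group;
    size_t pos;
} stw_iterator;

/* Request suspension of every thread in the group and hand back
 * an iterator over the suspended threads. */
static inline stw_iterator stw_suspend_all(stw_group *g) {
    for (size_t i = 0; i < g->count; i++) {
        g->threads[i].suspend_requests++;
    }
    stw_iterator it = { g, 0 };
    return it;
}

/* Return the next thread, or NULL when the list is exhausted. */
static inline stw_thread *stw_next(stw_iterator *it) {
    if (it->pos >= it->group->count) {
        return NULL;
    }
    return &it->group->threads[it->pos++];
}

static inline void stw_resume_all(stw_group *g) {
    for (size_t i = 0; i < g->count; i++) {
        assert(g->threads[i].suspend_requests > 0);
        g->threads[i].suspend_requests--;
    }
}

/* One collection cycle as seen by the GC; returns the number of
 * threads whose roots were enumerated. */
static inline size_t stw_collect(stw_group *g) {
    stw_iterator it = stw_suspend_all(g);
    size_t visited = 0;
    for (stw_thread *t = stw_next(&it); t != NULL; t = stw_next(&it)) {
        t->roots_scanned = 1;   /* placeholder for root set enumeration */
        visited++;
    }
    stw_resume_all(g);
    return visited;
}
```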

Thread Locking

Locking with the thread manager can be done by means of a mutex or a thin monitor. The mutex is preferable in case of high contention, while thin monitors are better optimized for space. This section describes a scenario when the VM core attempts to lock a resource from multiple threads, T1 and T2. The major stages of the process of locking and unlocking are shown in Figure 15.

locking and unlocking a mutex

Figure 15: Locking with a Mutex

Initially, the mutex is not occupied, that is, the label lock is set to zero. Thread T1 calls the hymutex_lock() function, which instructs the thread manager to mark the mutex as locked by T1.

T2 can also call the hymutex_lock() function later, and if it happens to call on a lock already occupied, then T2 is placed into the internal waiting queue associated with the mutex and gets blocked until T1 unlocks the mutex. The T1 thread calls hymutex_unlock() to release the mutex, which enables the mutex to extract T2 from the queue, transfer the lock ownership to this thread, and to notify T2 that it can wake up.
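The ownership hand-off can be modeled without real threads, as in the sketch below. Thread IDs are plain integers, the waiting queue is a small ring buffer, and a "queued" result stands for the point where the calling thread would block. The mx_ names, the queue capacity and the FIFO wake-up order are assumptions made for the illustration; this is not the hymutex implementation.

```c
#include <assert.h>

#define MX_FREE      0          /* lock label value of an unoccupied mutex */
#define MX_QUEUE_CAP 8          /* hypothetical queue capacity */

typedef struct mx_mutex {
    int owner;                  /* MX_FREE, or the ID of the owning thread */
    int waiters[MX_QUEUE_CAP];  /* FIFO ring buffer of blocked thread IDs */
    int head;
    int count;
} mx_mutex;

typedef enum { MX_ACQUIRED, MX_QUEUED, MX_QUEUE_FULL } mx_result;

/* Lock request from thread tid. MX_QUEUED means the caller would block. */
static inline mx_result mx_lock(mx_mutex *m, int tid) {
    assert(tid != MX_FREE);
    if (m->owner == MX_FREE) {
        m->owner = tid;
        return MX_ACQUIRED;
    }
    if (m->count == MX_QUEUE_CAP) {
        return MX_QUEUE_FULL;
    }
    m->waiters[(m->head + m->count) % MX_QUEUE_CAP] = tid;
    m->count++;
    return MX_QUEUED;
}

/* Unlock by the current owner. Ownership passes directly to the first
 * waiter, if any. Returns the new owner's ID, or MX_FREE. */
static inline int mx_unlock(mx_mutex *m, int tid) {
    assert(m->owner == tid);
    (void)tid;
    if (m->count == 0) {
        m->owner = MX_FREE;
        return MX_FREE;
    }
    m->owner = m->waiters[m->head];
    m->head = (m->head + 1) % MX_QUEUE_CAP;
    m->count--;
    return m->owner;            /* this thread would now be woken up */
}
```

Passing ownership directly to the dequeued thread, instead of freeing the mutex and letting waiters race for it, matches the transfer described above and prevents a late arrival from overtaking a queued thread.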

Monitor Enter and Exit

Locking Java* monitors implies interaction between the thread manager and the VM core since the thread manager requires the memory address within the Java* object where it keeps the lock data. The process of locking Java* monitors is shown in Figure 16 below.

The code generated by the JIT compiler calls the gen_restore_monitor_enter() helper function. The helper function provides a chunk of code (stub) that can be inlined by the JIT compiler directly into the generated assembly code.

The helper function argument is a physical address of the lock word within the Java* object.
The helper first attempts the fast path of acquiring the lock associated with the object. The actions are the same as those the thread manager performs in the hythread_thin_monitor_try_enter() function.
If the lock is acquired and the monitor is not contended, the helper returns. On the fast path, the helper does not need to switch between Java* and native frames and involves no Java* objects.
If the lock is not acquired, the helper enters the slow path, which switches between Java* and native code. The slow path pushes an M2nFrame and creates a local handle, as shown in Figure 16.
The helper calls the jthread_monitor_enter() function, which works with the Java* object as with JNI code.

slow and fast paths to locking a Java* monitor

Figure 16: Locking Java* Monitors

References

This section lists the resources used in this document and other related documents.

[1] J2SE 1.5.0 specification, http://java.sun.com/j2se/1.5.0/docs/api/

[2] JVM Tool Interface Specification, http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/jvmti.html

[3] Java* Native Interface Specification, http://java.sun.com/j2se/1.5.0/docs/guide/jni/spec/jniTOC.html

[4] Apache Portable Runtime project, http://apr.apache.org/

[5] POSIX standard in threading, http://www.opengroup.org/onlinepubs/009695399/idx/threads.html

[6] David F. Bacon, Ravi Konuru, Chet Murthy, Mauricio Serrano, Thin locks: featherweight synchronization for Java, http://portal.acm.org/citation.cfm?id=277734

[7] Kiyokuni Kawachiya, Akira Koseki, Tamiya Onodera, Lock Reservation: Java Locks Can Mostly Do Without Atomic Operations, http://portal.acm.org/citation.cfm?id=582433

[8] HyThread documentation, http://harmony.apache.org/externals/vm_doc/html/group__Thread.html

* Other brands and names are the property of their respective owners.