Tuesday, November 9, 2010

Commercial-Off-The-Shelf Enterprise Real Time Computing (COTS ERTC) -- Part 5: Java Requirements

Because programs designed using standard Java (Java SE, Java EE, or Java ME) exhibit non-deterministic execution timing, standard Java is not widely used in RTC, especially in HRT.
However, due to standard Java's popularity in many industries, extending it with RTC functions is very attractive. The Real-Time Specification for Java (RTSJ) is designed to seamlessly augment standard Java with RTC functions.

In this topic I will describe the key areas where standard Java creates jitter and how the two popular RTSJ implementations, Oracle's Java RTS and IBM's WebSphere RT, reduce it.
This is also a good time to compare RTSJ with C/C++ if they both run on the same underlying RT OS.

But before we start, here are a few things about RTSJ:
  • Its implementations usually need an RT OS; for HRT it is a must.
  • To lower the learning curve, RTSJ doesn't introduce any new Java syntax; instead it provides several new classes.
  • Both Java RTS and WebSphere RT support Java SE 5 or later.

1. Threads, Synchronizations and Memory Management Models
Because Java threads have a one-to-one relationship with the underlying OS native threads (e.g. Java threads are implemented using the native POSIX threads on Solaris and stock Linux), most of Java threads' functions are delegated to OS native threads as we discussed in Part 3.

1.1 Regular Java Threads vs Java Real-Time Threads
Regular Java threads created by class java.lang.Thread (JLT) are not suitable for RTC due to the following factors:
  • JLT has only 10 priority levels, which is not enough for COTS ERTC.
  • JLT maps to OS non-real-time threads. For example, JLT maps to the SCHED_OTHER scheduling policy on Linux that creates jitter due to its dynamic priority adjustment.
  • JLT has the "priority inversion" problem because java.lang.Object.wait() doesn't implement the "priority inheritance" logic.
  • JLT doesn't provide priority-based synchronization because java.lang.Object.notify() is not required to wake up the highest-priority thread; and the "unlock" action in the "synchronized" statement is not required to choose the highest-priority thread to be the next lock owner.
  • JLT is subject to unpredictable pauses induced by GC because JLT allocates objects in the standard Java heap.
RTSJ provides a new thread type javax.realtime.RealtimeThread (RTT) to fix the above problems:
  • RTT must have at least 28 priority levels. For example Java RTS has a default of 60 on Solaris and 49 on stock Linux;
  • RTT maps to OS level real-time threads. RTT maps to the SCHED_FIFO scheduling policy on stock Linux which provides fixed-priority scheduling based on FIFO queues (another stock Linux RT scheduling policy "SCHED_RR" doesn't meet RTT requirements due to its round-robin feature). 
  • RTSJ requires that java.lang.Object.wait() implement the "priority inheritance" logic. On Linux RT and Solaris, it can delegate to the underlying POSIX threading services.
  • RTSJ requires that java.lang.Object.notify() wake up the highest-priority thread, and that the "unlock" action in the "synchronized" statement choose the highest-priority thread to be the next lock owner. On Linux RT and Solaris, they can again delegate to the underlying POSIX threading services.
  • RTSJ provides a subclass of RTT called javax.realtime.NoHeapRealtimeThread (NHRT) which is protected from GC-induced jitter (see next section for details). NHRT is intended for HRT.
Because RTT exposes many of the underlying OS native threading functions to Java, RTSJ can compete squarely with C/C++ in the above areas.
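As a concrete illustration, here is a minimal sketch of creating an RTT at a fixed real-time priority. It assumes an RTSJ 1.0.1 implementation (such as Java RTS or WebSphere RT) on the classpath; the class name RttExample and the chosen priority offset are hypothetical.

```java
import javax.realtime.PriorityParameters;
import javax.realtime.PriorityScheduler;
import javax.realtime.RealtimeThread;

// Sketch: an RTT with a fixed real-time priority (RTSJ 1.0.1 API assumed).
public class RttExample {
    public static void main(String[] args) {
        // Query the scheduler's priority range instead of hard-coding it,
        // since the range is implementation-specific (e.g. Java RTS defaults
        // to 60 RT priorities on Solaris and 49 on stock Linux).
        PriorityScheduler scheduler = PriorityScheduler.instance();
        int nearMax = scheduler.getMaxPriority() - 1; // leave headroom

        RealtimeThread rtt = new RealtimeThread(new PriorityParameters(nearMax)) {
            public void run() {
                // Runs under fixed-priority scheduling (SCHED_FIFO on
                // stock Linux), not the jittery SCHED_OTHER policy.
                System.out.println("RTT running at RT priority " + nearMax);
            }
        };
        rtt.start();
    }
}
```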

1.2 New Memory Management Models
So that NHRT is not affected by GC pauses, it is allowed neither to use the standard garbage-collected heap nor to manipulate references to the heap.
Besides the standard heap, RTSJ provides two memory management models that RTT and NHRT can use to allocate memory on a more predictable basis:
  • Immortal Memory.
    It is not garbage-collected. Its primary use is to avoid dynamic allocation by statically allocating all the needed memory ahead of time.
    It is like the malloc() / new operator in C/C++ without the corresponding free() / delete operator: once an object is allocated from it, the memory used by the object will never be reclaimed. Any unintended allocation into it is a memory leak and can cause an out-of-memory error.
  • Scoped Memory. It is not garbage-collected either. It is intended for objects with a known lifetime such as temporary objects created during the processing of a task. It will be entirely reclaimed at the end of the lifetime such as when the task finishes.
    Because many standard and third-party Java libraries create many temporary objects, it is impractical to use immortal memory if RTT or NHRT has to link to many libraries. In this case scoped memory is a better choice.
Although the two new models let RTSJ compete with C/C++ on equal terms in memory allocation, they are hard to use and very error-prone: objects allocated in the two new models and in the standard heap have different GC characteristics and lifetimes, and assignment rules between them must be enforced.
So the recommended use of the two models is limited to programs that can't tolerate GC pauses such as in HRT.
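The two models above can be sketched as follows, assuming an RTSJ 1.0.1 implementation; the class name MemoryAreaExample and the chosen region sizes are hypothetical.

```java
import javax.realtime.ImmortalMemory;
import javax.realtime.LTMemory;

// Sketch: allocating in immortal vs scoped memory (RTSJ 1.0.1 API assumed).
public class MemoryAreaExample {
    public static void main(String[] args) {
        // Immortal memory: never garbage-collected, so only suitable for
        // objects needed for the whole life of the application.
        ImmortalMemory.instance().enter(new Runnable() {
            public void run() {
                byte[] forever = new byte[1024]; // never reclaimed
            }
        });

        // Scoped memory (LTMemory gives linear-time allocation): everything
        // allocated inside enter() is reclaimed in one step when the last
        // thread exits the scope -- good for per-task temporary objects.
        LTMemory scope = new LTMemory(16 * 1024, 64 * 1024);
        scope.enter(new Runnable() {
            public void run() {
                byte[] temporary = new byte[1024]; // reclaimed at scope exit
            }
        });
    }
}
```

Note the assignment rules mentioned above: storing a reference to the scoped `temporary` array into a heap or immortal object would be rejected at runtime, which is one reason these models are error-prone.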

1.3 Communication between NHRT and RTT / JLT
An NHRT could block on a lock held by an RTT or JLT. While the RTT or JLT holds the lock, GC could preempt it and thereby indirectly preempt the NHRT. If this is not tolerable, you should either avoid lock sharing or use the following non-blocking queues for resource sharing:
  • javax.realtime.WaitFreeReadQueue class for passing objects from RTT / JLT to NHRT.
  • javax.realtime.WaitFreeWriteQueue class for passing objects from NHRT to RTT / JLT.
Because many Java libraries use internal locking, an NHRT that links to such libraries must guard against this indirect GC preemption to ensure HRT behavior.
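A lock-free handoff from the NHRT side can be sketched like this, assuming the RTSJ 1.0.1 WaitFreeWriteQueue API; the class name HandoffExample, the queue capacity, and the string payload are hypothetical.

```java
import javax.realtime.WaitFreeWriteQueue;

// Sketch: NHRT hands results to an RTT/JLT without sharing a lock
// (RTSJ 1.0.1 API assumed). On this queue, write() is the wait-free
// side and never blocks the writer; read() may block the reader,
// which is acceptable for the non-time-critical consumer.
public class HandoffExample {
    public static void main(String[] args) throws Exception {
        WaitFreeWriteQueue queue = new WaitFreeWriteQueue(32);

        // Producer side (would be the NHRT in a real system):
        boolean accepted = queue.write("sensor-reading");
        if (!accepted) {
            // Queue full: the NHRT must drop or retry on its own terms;
            // it is never blocked waiting for the reader.
        }

        // Consumer side (RTT or JLT): blocking read.
        Object value = queue.read();
        System.out.println("received: " + value);
    }
}
```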

Because the new RTSJ classes and RT functions mentioned above have a medium learning curve for Java SE developers, you are still recommended to use JLT with careful tuning if SRT can meet your business requirements. Such tunings include the OS-level tunings mentioned in Part 3, limiting the number of your application threads, avoiding resource sharing, and the GC low-pause tuning described in Section 2 of this part.

2. Garbage Collection
Although automatic garbage collection provided by JVM greatly eases the memory management in Java compared to C/C++, it is unfortunately another big source of non-determinism. 
Although standard JVM provides different GC algorithms such as serial, parallel and concurrent to allow users to make a trade-off between pause and throughput, all algorithms involve a stop-the-world (STW) pause, in which all application threads except NHRT are stopped so that GC can run without interference. This STW behavior is only acceptable for SRT or loose HRT.

Although RTSJ doesn't define a deterministic GC, an RT JVM must provide one in order to support HRT. There are basically two approaches: work-based and time-based.
Both approaches aim to minimize the effect of long pauses by doing incremental collection within a GC cycle. Unfortunately, neither can provide an HRT guarantee.

2.1 Work-Based
Because the standard STW GC behavior blankly taxes all application threads, the work-based approach has each thread do a specific amount of incremental GC work proportional to its allocation amount each time it allocates an object.
However, your application is still unpredictable: the spread of GC cost is often uneven because allocation is uneven, and the time needed to do a fixed amount of incremental collection work varies, as shown in Figure 5.1 courtesy of this resource:
Figure 5.1 Risks of Work-Based Garbage Collection 
2.2 Time-Based
It schedules a fixed amount of collection time in each GC cycle.
Although it spreads the GC cost evenly through a GC cycle, there is no direct correlation between the allocated collection time and the reclaimed memory, so your application is still unpredictable, as shown in Figure 5.2 courtesy of this resource:

Figure 5.2 Time-Based GC and Undesirable Outcomes
2.3 Oracle Java RTS 2's Approach
Java RTS uses a modified work-based approach called "Henriksson's GC", or Real-Time GC (RTGC). RTGC can be configured to run as one or more RTTs at a priority lower than critical threads (NHRT and critical RTT) and higher than non-RT threads (JLT), and possibly higher than non-critical threads (non-critical RTT) as well. Critical threads may therefore preempt the RTGC, while RTGC may preempt non-RT threads and non-critical threads to keep up with the application's memory allocation rate. This is shown in Figure 5.3 courtesy of this resource:
Figure 5.3 Henriksson's GC
The initial RTGC priority is lower than non-critical RTT, but it is boosted to its configurable "maximum priority", higher than non-critical RTT, when the remaining memory drops close to another configurable "memory threshold". These two configurable parameters let you tune the balance between non-critical threads' determinism and memory throughput.

Figure 5.3 shows that RTGC ensures HRT only for critical threads, which should compete pretty well with C/C++, while trying to offer SRT for non-critical threads, which usually can't compete with C/C++.

2.4 IBM WebSphere RT 2's Approach
WebSphere RT's Metronome is a time-based deterministic GC. It divides a GC cycle into a series of discrete quanta, approximately 0.5ms but no more than 1ms in length, that are devoted to either GC work or application work.
Even though a single GC pause has an upper bound of 1ms, that alone is not enough: if several consecutive quanta were devoted to GC work, the application would still experience a longer pause, as we discussed in Part 1. Metronome must therefore also meet another parameter called "utilization": the percentage of time quanta allocated to the application in a given window of time that continuously slides over the application's complete run.
Figure 5.4 shows a GC cycle divided into multiple 0.5ms time slices preserving the default 70% utilization over a 10ms window, courtesy of this resource:
Figure 5.4 Metronome GC Sliding window utilization

Compared to Oracle's RTGC, Metronome provides more determinism to JLT and non-critical RTT, while RTGC ensures HRT for critical RTT. Because learning RTT takes time, Metronome is recommended for implementing SRT with JLT.

2.5 Standard JVM's Approach

Both Oracle's Hotspot JVM and JRockit provide a so-called low-pause concurrent GC that only briefly pauses the application and runs concurrently with it most of the time.
Hotspot allows you to specify both a target pause time and a target throughput, while JRockit only allows you to specify a target pause time, which seems inadequate based on our discussion in Part 1.
Such a low-pause concurrent GC can at best provide SRT.
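For reference, here is a sketch of what such Hotspot tuning might look like on the command line. Flag availability varies by JVM version, so verify against your JVM's documentation; `MyApp` and the specific values are hypothetical.

```shell
# Low-pause concurrent collector: short pauses, marking/sweeping runs
# concurrently with the application.
java -XX:+UseConcMarkSweepGC -Xmx512m MyApp

# Or state goals and let GC ergonomics pick the behavior:
# a 50ms pause-time target plus a throughput target
# (GCTimeRatio=19 asks for at most 1/(1+19) = 5% of time in GC).
java -XX:MaxGCPauseMillis=50 -XX:GCTimeRatio=19 MyApp
```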

Another very promising SRT implementation is JRockit Real Time, which extends the existing mostly concurrent low-pause GC by enforcing the target pause time and limiting the total pause time within a prescribed window (unfortunately such a window can't be configured by users).
JRockit Real Time GC is more deterministic than the regular JRockit concurrent low-pause GC. Oracle claims it is the industry's fastest JVM, with average response times in microseconds for most well-managed applications and low, single-digit millisecond responses with a five-nines reliability guarantee. These numbers are pretty good for SRT even in the financial services industry.
 
Here is the biggest advantage of tuning GC compared to the new RTSJ classes in Section 1: it is transparent to Java SE developers. In other words, it doesn't require developers to change their existing programming models or adapt to new RTC models.

Finally, it is often hard to choose between SRT or loose HRT coded in C/C++ and SRT coded in Java with a deterministic or low-pause concurrent GC.


3. Class Loading, Linking, Initializing and Compilation
3.1 Class Loading, Linking, Initializing (LLI)
A standard JVM loads, links and initializes classes on demand; it also unloads classes that are no longer referenced. Though LLI provides great flexibility in memory and CPU consumption, and the one-time cost can probably be made up if the same class is referenced multiple times, it is still another big source of non-determinism.
For example, class loading usually takes at least tens of milliseconds because it usually involves disk IO. Linking and initializing are also very CPU-intensive.

You can eliminate LLI jitter by pre-doing LLI in the application warm-up phase (defined in Part 3), either by calling the standard java.lang.Class.forName() or by using RTSJ implementation-specific utilities.
For example, Oracle's Java RTS 2 allows you either to specify a list of classes for pre-loading and/or pre-initializing on the command line at JVM startup, or to use the Initialization-Time-Compilation (ITC) API at runtime. (Both Java RTS and WebSphere RT can automatically generate a list of the classes that were loaded or initialized during your application's execution.)

Both approaches for eliminating LLI jitter are basically transparent to Java SE developers.
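The standard Class.forName() route can be sketched with plain Java SE; the helper class ClassPreloader and its method name are hypothetical.

```java
import java.util.List;

// Sketch: eagerly load, link and initialize a list of classes during the
// warm-up phase, so the LLI cost is paid before the time-critical phase.
public class ClassPreloader {
    // Returns the number of classes successfully loaded and initialized.
    public static int preload(List<String> classNames) {
        int loaded = 0;
        for (String name : classNames) {
            try {
                // initialize=true triggers loading, linking and
                // initialization in one step.
                Class.forName(name, true, ClassPreloader.class.getClassLoader());
                loaded++;
            } catch (ClassNotFoundException e) {
                System.err.println("Could not preload " + name);
            }
        }
        return loaded;
    }
}
```

The class list fed to such a helper would typically come from the automatically generated list mentioned above, captured during a representative run.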

3.2 JIT vs Ahead-of-Time (AOT / ITC)
If you have used a C/C++ static compiler before, you know that compiling (optimization included) is very CPU- and memory-intensive. So modern JVM JITs initially interpret Java methods and later compile to native code only those methods that execute frequently.
Because the dynamic nature of Java makes much important information available only at runtime, JIT can generate even better code than statically compiled languages like C/C++. Recompiling (deoptimization), virtual method optimization and escape analysis are just three examples.
So if JIT can carefully balance the compiling time and optimization aggressiveness, the one-time compiling cost will be made up by the multiple later executions of the native code and the average execution time will be equal to or even shorter than the corresponding C/C++ program.

However because the compiling time is up to JVM and there is execution time variation between interpreted code and native code, the dynamic JIT is another big source of non-determinism.
The only solution to eliminate JIT jitter is to use some kind of static compiling such as AOT in WebSphere RT and ITC in Java RTS.

However AOT/ITC also has some disadvantages.
Firstly, platform portability is compromised by AOT. ITC is a bit better because it compiles at initialization time instead of AOT's development time.
Secondly, AOT/ITC-compiled code, though faster than interpreted code, can be much slower than JIT-compiled or C/C++ code because AOT/ITC can only make a few conservative optimizations with little information available at hand.

The debate of JIT vs AOT/ITC is still evolving. Readers are referred to this resource to gain more insight. I personally recommend JIT because you can still achieve SRT and even loose HRT with it. With such fine tunings as warm-up and background JIT compiling, your Java programs can compete toe-to-toe with C/C++ programs.

4. Timers and Clocks
If RTC needs nanosecond resolution and your system can only provide milliseconds at best, it will cause jitter.
RTSJ provides new classes javax.realtime.Clock and javax.realtime.HighResolutionTime to expose high-resolution clocks in underlying hardware and OS as discussed in Part 2 and 3.
You should also use the following standard Java methods if you need nanosecond resolution:
  java.lang.Object.wait(long timeout, int nanos); 
  java.lang.Thread.sleep(long millis, int nanos);

You shouldn't use standard Java's java.util.Date or java.lang.System.currentTimeMillis() due to their low resolution and their synchronization with the wall clock (this synchronization looks like jitter whenever the wall clock is adjusted).
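For interval measurements, standard Java SE 5+ also offers System.nanoTime(), a monotonic clock that is not tied to the wall clock. A small sketch (the class name MonotonicTiming is hypothetical):

```java
// Sketch: measuring elapsed time with the monotonic nanosecond clock.
// Unlike currentTimeMillis(), nanoTime() is not synchronized with the
// wall clock, so system time adjustments do not show up as jitter.
public class MonotonicTiming {
    // Returns the elapsed time of a task in nanoseconds.
    public static long elapsedNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable noop = new Runnable() { public void run() { } };
        System.out.println("no-op took " + elapsedNanos(noop) + " ns");

        // Sub-millisecond sleep: 0 ms plus 500,000 ns (0.5 ms). The actual
        // resolution still depends on the OS timer, as discussed in Part 3.
        Thread.sleep(0, 500000);
    }
}
```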

In summary, RTSJ can compete toe-to-toe with C/C++ in most areas for building COTS ERTC applications. The only concern is the learning curve of its functional extensions to standard Java. However, the curve is not steep and the effort is often necessary for RTC.
If you had a bias against using Java to build RTC systems, hopefully you will have second thoughts after this discussion.
