2009-03-16

Java Specialist Master Course

Last week I attended Dr. Heinz M. Kabutz' Java Specialist Master Course, which my employer held as an on-site course.  These are my field notes from this training.


I strongly recommend this course if you're fluent in Java, but want to take it one step further or remind yourself of some of the advanced features or intricacies.

Day One

Multi-Threading

ThreadGroup: Stay away!  Not really usable.


A thread can also wake up without being notified, interrupted, or timing out, a so-called spurious wakeup. While this will rarely occur in practice, applications must guard against it by testing for the condition that should have caused the thread to be awakened, and continuing to wait if the condition is not satisfied. In other words, waits should always occur in loops, like this one:

     synchronized (obj) {
         while (<condition does not hold>)
             obj.wait(timeout);
         ... // Perform action appropriate to condition
     }


You cannot break a thread out of a synchronized block (or out of waiting to enter one) in a safe way; forcing it with Thread.stop() can leave the JVM in an unsafe state.  The Java 5 locks are much safer.

Synchronizing on this is very dangerous:  There's always someone who holds a reference to you (or you would be garbage collected).  If they use you as a lock, they can effectively lock you out.  They can also notify() you, confusing the internal workflow.

Locking on specific lock instances is always easier to get correct.

Java 5 locks must be wrapped in a try / finally block -- always; unlock() belongs in the finally block.

Lock with a WAITING thread state instead of a BLOCKED state:  lock.lockInterruptibly().
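
A minimal sketch of both points, using a made-up counter class for illustration:

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class LockedCounter {

    private final Lock lock = new ReentrantLock();
    private int count;

    // Plain lock(): unlock() must always sit in the finally block.
    public void increment() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock();
        }
    }

    // lockInterruptibly(): the thread waits in WAITING state and can be interrupted,
    // instead of sitting uninterruptibly in BLOCKED as with synchronized.
    public void incrementInterruptibly() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            count++;
        } finally {
            lock.unlock();
        }
    }
}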

Deadlock: lock.tryLock(long,TimeUnit) can give false deadlock warnings in JConsole.

Thread priorities cannot be relied upon.  They can cause starvation, and not all operating systems have a direct mapping between Java thread priorities and OS priorities.

volatile makes race conditions worse:  it only guarantees visibility, so compound operations such as i++ remain non-atomic races.

The value of a ThreadLocal is stored with the thread.
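
A hedged sketch of typical use -- giving each thread its own copy of a non-thread-safe object such as SimpleDateFormat:

import java.text.SimpleDateFormat;
import java.util.Date;

public class ThreadLocalDateFormat {

    // Each thread gets, and keeps, its own SimpleDateFormat instance
    // (SimpleDateFormat itself is not thread-safe).
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            new ThreadLocal<SimpleDateFormat>() {
                @Override
                protected SimpleDateFormat initialValue() {
                    return new SimpleDateFormat("yyyy-MM-dd");
                }
            };

    public static String today() {
        return FORMAT.get().format(new Date());
    }
}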

Usually not recommended to introduce fairness in Java. Starvation is usually not a problem, and the performance impact is significant.  The OS scheduler does a good job at ensuring fairness.

Day Two

Java IO

Java Input and Output Streams have been designed using the Decorator pattern, where additional classes add functionality to underlying streams.  Without this we would need a huge number of permutations of these classes.


It is only necessary to close() the instance on top of this decorated chain. Closing the input or output stream on a socket will also close the underlying socket.
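
A minimal sketch of such a decorated chain (file name and charset are arbitrary):

import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

public class DecoratedStreamClose {
    public static void main(String[] args) throws IOException {
        // Each wrapper decorates the stream beneath it with extra behaviour.
        PrintWriter out = new PrintWriter(
                new BufferedWriter(
                        new OutputStreamWriter(
                                new FileOutputStream("out.txt"), "UTF-8")));
        try {
            out.println("hello");
        } finally {
            out.close();  // closing the top of the chain also closes the FileOutputStream
        }
    }
}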


ObjectOutputStream has two caveats:

  1. The caching for previously written objects is not bounded, and can result in an OutOfMemoryError
  2. If you change an object after it has been written, this change will not be picked up and written.

ObjectInputStream also caches objects in an unbounded Hashtable, and can run into an OutOfMemoryError.
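
A small sketch of caveat 2, and of ObjectOutputStream.reset(), which clears the write cache:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Date;

public class ObjectStreamCachingDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);

        Date date = new Date(0);
        out.writeObject(date);
        date.setTime(1000000L);  // modify the object after it has been written
        out.writeObject(date);   // only a back-reference is written, not the new state
        // out.reset();          // would clear the cache and force a full rewrite
        out.close();

        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(in.readObject());  // time 0
        System.out.println(in.readObject());  // still time 0 -- the change was not picked up
        in.close();
    }
}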

The default serialVersionUID is a hash of both fields and methods.  Adding a method will also change it!

readResolve() and writeReplace() override the default serialization input or output form.

Externalizable classes are very easy to hack.  The public readExternal / writeExternal methods can be called by anyone to either extract an instance's private contents or inject fabricated data into it.

Versioned serialization:  Write a version number to the head of the stream, and switch on this version number when you read the object back.
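
A hedged sketch of that pattern; the Person class and its fields are made up for illustration:

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class Person implements Serializable {

    private static final long serialVersionUID = 1L;
    private static final int VERSION = 2;

    private String name;
    private String email;  // added in version 2

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.writeInt(VERSION);            // the version number goes first
        out.writeObject(name);
        out.writeObject(email);
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        int version = in.readInt();       // switch on the version found in the stream
        name = (String) in.readObject();
        switch (version) {
            case 1:
                email = null;             // the field did not exist yet
                break;
            case 2:
                email = (String) in.readObject();
                break;
            default:
                throw new IOException("Unknown version: " + version);
        }
    }
}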

writeObject() / readObject() can have roughly the same performance benefit as Externalizable.


Java NIO

You read or write as much data from / to a channel as you can without blocking, so a thread busy transferring data spends less time waiting for data.  But there is a risk of spin looping if either the source or target is unable to accept data but isn't closed.

Direct allocation and MappedByteBuffer use the C heap, not the Java heap.  Overallocating would therefore not cause an OutOfMemoryError, but virtual memory thrashing.

Java Memory

Premature tenuring can be a real performance problem.  If the survivor spaces are slightly too small, and you hold on to your objects slightly too long, the survivor spaces overflow, and objects get tenured (moved to the old generation's tenured space) too soon.


This can be fixed by tuning the survivor space sizes.

When analyzing GC logs: Chop off the startup phase of the JVM -- it can obscure the steady-state behavior.

Problems with JConsole:
  1. It takes snapshots.  A lot can happen between snapshots.
  2. It cannot connect to a JVM which is having problems.

Tuning JVM

Don't confuse memory leaks with loitering objects (objects that just stick around longer than necessary).


Recommended heap size:  10% larger than steady state.

Day Three

References

What is the size of a Boolean (on a 64-bit JVM)?

  • 16 bytes for an object
  • 1 byte for the boolean value.  This is rounded up to the nearest 8 bytes
  • Total: 24 bytes

Finalizers: 80 bytes overhead per instance.

References make objects stay around longer than necessary, thereby leaking into the old space.

Phantom references must be cleared, and must be constructed with a reference queue.
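
A minimal sketch; the timeout, the System.gc() hint, and the clean-up action are placeholders:

import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;

public class PhantomDemo {
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
        Object resource = new Object();
        PhantomReference<Object> ref = new PhantomReference<Object>(resource, queue);

        resource = null;  // drop the last strong reference
        System.gc();      // only a hint -- collection is not guaranteed to happen here

        // get() always returns null on a PhantomReference; the queue tells us
        // *that* the object is gone, so we can clean up and then clear() the reference.
        if (queue.remove(1000) == ref) {
            System.out.println("Collected -- release any associated native resources here");
            ref.clear();
        }
    }
}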

Object Pooling

Object Pooling is like ceramic mugs compared to disposable cups. It has a higher initial cost, and also a maintenance cost.

Generally used only in two cases: Thread pools and database connection pools.

Reflection API

Class.getDeclaredFields() uses a different SecurityManager permission than Class.getFields().  It is thus possible that the former is not allowed while the latter is.

Class.newInstance() has dangerous side effects.  It can throw checked exceptions that aren't declared!  The culprit is this:
        // Run constructor
        try {
            return tmpConstructor.newInstance((Object[])null);
        } catch (InvocationTargetException e) {
            Unsafe.getUnsafe().throwException(e.getTargetException());
            // Not reached
            return null;
        }

It is generally safer to use Class.getConstructor().newInstance() and catch the InvocationTargetException wrapping this checked exception.
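
A sketch of that approach -- the helper class and its exception handling are my own illustration:

import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

public class SafeInstantiation {

    public static <T> T create(Class<T> type) {
        try {
            Constructor<T> constructor = type.getConstructor();
            return constructor.newInstance();
        } catch (InvocationTargetException e) {
            // The constructor itself threw; the real cause is wrapped here instead of
            // being smuggled out as an undeclared checked exception.
            throw new IllegalStateException("Constructor of " + type + " threw", e.getCause());
        } catch (Exception e) {
            // NoSuchMethodException, InstantiationException, IllegalAccessException, ...
            throw new IllegalStateException("Could not instantiate " + type, e);
        }
    }
}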

Changing a final field through setAccessible(true) is currently possible, but this might change.  Also, changing a final field might not be visible everywhere, since the Hotspot compiler can cache or inline the value of this final field.

java.util.Arrays.copyOfRange(T[], int, int) -- returns a subarray.  There's also an overload where you can specify the return type: copyOfRange(U[], int, int, Class<? extends T[]>).

java.lang.Class.getComponentType() returns the type of the array members, and null if the class is not an array.

Dynamic Proxies

The Method instance is linked to the declaring class (or interface), not to the implementing class or instance.  This means it can be passed from the InvocationHandler straight on to the actual implementation you are facading.

A generic factory for dynamic proxies:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyFactory {

    public static <T> T proxy(Class<T> type, final T obj, final InvocationHandler handler,
            Class<?>... otherInterfaces) {
        return type.cast(Proxy.newProxyInstance(type.getClassLoader(),
                merge(type, otherInterfaces), handler));
    }

    private static Class<?>[] merge(Class<?> first, Class<?>[] rest) {
        Class<?>[] result = new Class<?>[rest.length + 1];
        result[0] = first;
        System.arraycopy(rest, 0, result, 1, rest.length);
        return result;
    }
}
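
A possible usage sketch; the Greeter interface and the timing handler are made up, and note that it is the handler (not the factory) that delegates to obj:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;

public class ProxyFactoryDemo {

    // A made-up interface just for this demo.
    public interface Greeter {
        String greet(String name);
    }

    public static void main(String[] args) {
        final Greeter real = new Greeter() {
            public String greet(String name) {
                return "Hello, " + name;
            }
        };

        // The Method passed in is declared on the Greeter interface,
        // so it can simply be forwarded to the real implementation.
        InvocationHandler timing = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] methodArgs)
                    throws Throwable {
                long start = System.nanoTime();
                try {
                    return method.invoke(real, methodArgs);
                } finally {
                    System.out.println(method.getName() + " took "
                            + (System.nanoTime() - start) + " ns");
                }
            }
        };

        Greeter proxy = ProxyFactory.proxy(Greeter.class, real, timing);
        System.out.println(proxy.greet("world"));
    }
}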

Data Structures

CopyOnWriteArrayList is faster for iterating than the regular ArrayList, but much more expensive for modifications.

Sorting

Java uses merge sort (for sorting objects) because of its predictable overall performance. Quick sort is often faster, but can be much slower in its worst cases, one being an already sorted list with a naive pivot choice.


Since 1.4, HashMap uses power-of-two table sizes so the bucket index can be computed with a bit mask instead of a modulo -- originally a performance optimization.  This requires a supplemental bit-mixing function to spread poor hash codes, and that mixing has become more complex over time, largely closing the performance gap.  The end result is that the final hashing is hard to predict and much harder to understand.
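
Roughly what this looks like, sketched after the JDK 6 implementation (hash mixing plus bit-mask indexing):

public class HashMapIndexSketch {

    // Supplemental hash: mixes the bits so that poor hashCode() implementations
    // still spread reasonably across a power-of-two sized table.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Because the table length is a power of two, the modulo reduces to a bit mask.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }
}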


equals() should cover all attributes of an object, but hashCode() doesn't need to.  Find the attributes that most uniquely identify your instance.

Generics

<N extends Number> means that you can now call Number's methods on an N:  N n; n.doubleValue();


Generics sometimes produce very confusing compiler errors.

Other Structures

A LinkedHashMap normally orders elements by insertion order, but can also be set up to order elements by access order, with the least recently accessed element first.  This can be used to create an LRU cache.
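
A minimal sketch of such a cache; the capacity numbers are arbitrary:

import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true);  // true = order by access instead of insertion
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;  // evict the least recently used entry when full
    }
}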

Exceptions

RuntimeException's original definition was "exceptions thrown by the Runtime to protect itself from your code."

Checked exceptions were intended for errors where there would be some corrective action.

It is always a bad idea to use exceptions for flow control.

An uncaught exception handler can be used to restart a thread replacing the one that just died, continuing its work.
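
A hedged sketch of that idea; the class and method names are made up:

public class RestartingWorker {

    public static void startWorker(final Runnable work) {
        Thread worker = new Thread(work, "worker");
        worker.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            public void uncaughtException(Thread dead, Throwable cause) {
                System.err.println(dead.getName() + " died: " + cause + " -- restarting");
                startWorker(work);  // start a replacement thread to continue the work
            }
        });
        worker.start();
    }
}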

Day Four

Critical Errors Inside the JVM

What to do when the JVM dies:

  1. Upgrade to a later version of the JRE
  2. Examine the hs_err file, find out what was causing the problem, and change your code so that it doesn't provoke this behaviour

To analyze the hs_err file, take a look at http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/TSG-VM.pdf

Best Practices for Exceptions

  1. Checked vs unchecked
  2. Custom exceptions -- avoid when possible
  3. Wrap (chain) low level exceptions
  4. Don't swallow exceptions!
  5. Cleanup after an exception -- in the finally block
  6. Never abuse exceptions for flow control
  7. Exceptions management strategy is a vital element of code reviews and post mortem analysis

Exceptions thrown in a finally block will hide the original exception.

Assertions

Ensure you avoid side effects in assertions: assertions can always be disabled, which effectively disables the whole statement.


You could assert that you hold the lock as expected:  assert Thread.currentThread().holdsLock(lock)


When using assertions, the entire system must be tested both with assertions enabled and disabled.


Useful to verify preconditions, postconditions, and class invariants.

Java Optimizations

Order of importance for performance improvements:

  1. Design and architecture
  2. Algorithm selection
  3. Code implementation
  4. System configuration
  5. System infrastructure

Good design should win over performance.  HotSpot is very good at optimizing "normal" code, but can struggle if the code is cryptic. By coding in a complex way, you can make the performance worse than the clean and simple way.

Measure, don't guess.

Performance improvements <15% are typically not noticeable.

Specify your target, so you know when to stop.

List the top five bottlenecks, find the easiest one, and fix that.  Rinse and repeat.

Record everything.  Use paper or some other means independent of the test setup.

  1. Start by looking at hardware -- CPU, memory usage, I/O.  Locate any bottlenecks on the hw level.  Most performance problems appear as CPU bottlenecks.
  2. Then look at JVM level: threads, memory usage, garbage collection.
  3. Application level: Basically lock contention.
  4. People level: study usage pattern, arrival rates.


Beware of Hotspot effects.  Discard the first few results of a run.  Look at the variance to determine whether the test is deterministic.

Work around GC interference: Larger initial heap size, forcibly run System.gc() periodically.  Sleep a while after the GC, at least 100 ms, ideally 2 seconds.

Compare -client vs -server vs -Xint

The HotSpot compiler

Two levels of compilation: Just-In-Time compilation replaces bytecodes with machine instructions. The HotSpot compiler profiles the running application, directing the JIT on how to compile based on an execution profile.

-Xcomp compiles everything to native.  This is generally slower, since it compiles too much, like methods invoked only once.

Bi-morphic: a special case of polymorphism where only two implementations of a certain method are in use.  The method dispatch is simplified into an if / else statement.

The cost of a method call: 0 if only one implementation exists, little if two implementations exist, more if three or more implementations exist.

The thread must stop while the On Stack Replacement is performed.

Problem Areas

The two major problem areas are excessive looping and excessive object creation.


There are special bytecodes for the first four local variable slots (which include the method arguments). A long or double counts as two slots.  Move the most-used variables into these slots.


Object Pooling usable only for expensive resources (files, sockets, connections, threads)


You don't save memory by using substring(), since this new String instance would share the char[].  You'd need to create a new String.
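
A short sketch of the difference; readHugeString() is hypothetical:

    String page = readHugeString();                        // imagine a multi-megabyte String
    String id = page.substring(0, 8);                      // shares (and pins) page's entire char[]
    String compactId = new String(page.substring(0, 8));   // copies only the 8 characters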

Profiling Tools

hprof is bundled with the JVM. It can do CPU and heap profiling.  Its output can be read by many tools.

Logging

Don't write your own logging framework....


Mentions Commons Logging, but fails to mention that Commons Logging is hated by many developers: http://www.qos.ch/logging/thinkAgain.jsp.  Also it is claimed to have memory leaks: http://www.szegedi.org/articles/memleak.html


Also describes setting up logger statically per class, with some tricks involving a stack trace to get the class name.  It is probably better to have a transient instance field.


SLF4J is also a wrapper API, enabling you to select logging implementation at runtime through the classpath. But SLF4J is more modern.


For Log4J he mentions only the properties format, and not the XML format, which is more structured.

Logging guards: use log.isXxxEnabled() before your log statement.  One neat trick is to create an aspect that inserts this code for you.
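
A sketch using java.util.logging; with Log4J or Commons Logging the guard would be log.isDebugEnabled():

import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedLogging {

    private static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    void process(Object order) {
        // The guard avoids building the message string when the level is disabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Processing order " + order);
        }
    }
}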

Varargs have a performance side effect: every call creates an Object[] holding (boxed) instances of all the parameters.

With asynchronous logging you would lose the most interesting messages if the JVM crashes.