Tuesday, July 15, 2008

System.identityHashCode

A recent thread on the the Scala IRC channel piqued my curiosity about exactly how System.identityHashcode works.  It looks like as less heated remarks have flushed out the conversation more the definition was irrelevant, but regardless, I think it's interesting.  Thankfully Sun open-sourced the their implementation of the Java platform so it was pretty easy to find out exactly how it works.

It turns out the Sun JVM contains three different algorithms for calculating the identity hash, none of which are guaranteed to yield unique hash codes.  That's not surprising, because on a 64 bit architecture, doing that would be almost impossible.  I say almost because one could just generate them as sequential numbers, and then crash the JVM when it runs out of 32 bit hash codes (an OutOfHashCodesError ;-).

From share/vm/runtime/synchronizer.cpp:

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ; 
  if (hashCode == 0) { 
     // This form uses an unguarded global Park-Miller RNG, 
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.  
     value = os::random() ; 
  } else
  if (hashCode == 1) { 
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.  
     intptr_t addrBits = intptr_t(obj) >> 3 ; 
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ; 
  } else 
  if (hashCode == 2) { 
     value = 1 ;            // for sensitivity testing
  } else { 
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ; 
     t ^= (t << 11) ; 
     Self->_hashStateX = Self->_hashStateY ; 
     Self->_hashStateY = Self->_hashStateZ ; 
     Self->_hashStateZ = Self->_hashStateW ; 
     unsigned v = Self->_hashStateW ; 
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ; 
     Self->_hashStateW = v ; 
     value = v ; 
  } 

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ; 
  assert (value != markOopDesc::no_hash, "invariant") ; 
  TEVENT (hashCode: GENERATE) ;
  return value;
}

What's interesting is that there's an algorithm in there (hashCode == 2) where the identity hash code turns out to always be the same all objects.  I'm pretty sure it's there solely for testing purposes, so they can run tests against the JVM and standard library and ensure no piece of code is relying on the uniqueness of the identity hash code.

This relates back to my previous point about transparency.  Because Sun's JVM is open source, it's possible for anyone to peek inside and see how the various bits work, and what assumptions the Sun JVM engineers are and are not making about those pieces.

Sphere: Related Content

No comments: