Sunday, May 22, 2011

What to remember while using singleton classes in agile software development

During my time at Terracotta I have been following a pretty strict agile software development methodology. A feature is not complete if there are no unit tests covering each class and subsystem, or if there are no system tests checking that the code behaves as expected. Our build system uses Ruby, and it had become my habit to write tests assuming that each test would run in a separate JVM. But in a changing world where Maven seems awesome, this assumption fails, and that causes a whole lot of problems for test cases that exercise singleton objects.

Let me explain it with an example.
Let's say we have a factory that creates arrows for the archers of a city. Every time an archer comes, the factory hands out n arrows, where n is the number of that archer (the first archer gets 1, the second gets 2, and so on), while making sure no two archers get arrows at the same time.
It comes naturally that this factory should be a singleton, since archers can come from all over the place. So we write a singleton class like this:

public class WeaponFactory {

    private int arrowCount = 0;

    private WeaponFactory() {
        // make it private so that no one else can create it
    }

    // let the class loader do the magic of creating a singleton instance for you
    public static WeaponFactory getWamboo() {
        return WeaponFactoryHolder.instance;
    }

    private static class WeaponFactoryHolder {
        static final WeaponFactory instance = new WeaponFactory();
    }

    // no two archers can take arrows at the same time
    public synchronized Weapon[] getArrows() {
        arrowCount++;
        Weapon[] weapons = new Arrow[this.arrowCount];
        for (int i = 0; i < arrowCount; i++) {
            weapons[i] = new Arrow(Arrow.FIRE_POWER);
        }
        return weapons;
    }

    // to create an arrow with a specified firepower
    public Weapon createArrow(int firePower) {
        return new Arrow(firePower);
    }
}
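The Weapon and Arrow types referenced above are not shown in the post; a minimal sketch of what they might look like, inferred from the calls in WeaponFactory (the FIRE_POWER constant and the int constructor are taken from those calls, everything else, including the value 10, is an assumption), could be:

// Hypothetical supporting types, inferred from how WeaponFactory uses them.
public interface Weapon {
    int getFirePower();
}

// In its own file:
public class Arrow implements Weapon {

    // default fire power handed out by getArrows(); the value here is arbitrary
    public static final int FIRE_POWER = 10;

    private final int firePower;

    public Arrow(int firePower) {
        this.firePower = firePower;
    }

    public int getFirePower() {
        return firePower;
    }
}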


WeaponFactory is a pretty standard class that uses the singleton design pattern and fulfills the requirement. Now, to test this class we have a bunch of tests. For example, let's say we have these two test classes:

import junit.framework.Assert;
import junit.framework.TestCase;

public class WeaponFactoryTest1 extends TestCase {

    public void testFactory() {
        WeaponFactory weaponFactory = WeaponFactory.getWamboo();
        Weapon[] weapons = weaponFactory.getArrows();
        Assert.assertEquals(1, weapons.length);
    }
}


import junit.framework.Assert;
import junit.framework.TestCase;

public class WeaponFactoryTest2 extends TestCase {

    public void testFactory() {
        WeaponFactory weaponFactory = WeaponFactory.getWamboo();
        Weapon[] weapons = weaponFactory.getArrows();
        Assert.assertEquals(1, weapons.length);
    }
}


Now, as long as the two tests run in different JVMs we are fine: each test gets a freshly created singleton instance of WeaponFactory and the test logic behaves correctly. But the moment both tests run in the same JVM (e.g. when the mvn clean install command is fired, all tests run in the same JVM) we have an issue. In the example above, whichever of the two tests runs first will pass, while the second one will fail, because the first test already created the WeaponFactory instance and changed its state in a way the second test was not expecting.
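To see the ordering dependence concretely, here is a small JUnit 3 suite (purely illustrative, not part of the original post or build) that forces both test classes into the same JVM; whichever test runs second finds arrowCount already at 1 and gets two arrows back instead of one:

import junit.framework.Test;
import junit.framework.TestSuite;

public class WeaponFactorySuite {

    public static Test suite() {
        TestSuite suite = new TestSuite("both tests, one JVM");
        suite.addTestSuite(WeaponFactoryTest1.class); // passes: first caller gets 1 arrow
        suite.addTestSuite(WeaponFactoryTest2.class); // fails: same singleton, now hands out 2 arrows
        return suite;
    }

    public static void main(String[] args) {
        junit.textui.TestRunner.run(suite());
    }
}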

In a traditional build system where each test runs in a separate JVM we were doing fine. Now imagine that you have to change your build system and use Maven instead. After the pain you take to migrate, you will realize that all the tests which use singleton classes and have logic like the example above start failing, while running the same tests individually with the mvn test command still passes.

It's not always possible to have a quick fix for this, but something as simple as resetting the singleton instance to its initial state before each test run might do it. For example, in the above case, exposing this method in the WeaponFactory class

public void reset() {
    this.arrowCount = 0;
}

and adding this to the setUp() of each of your JUnit tests will fix it:


@Override
protected void setUp() throws Exception {
    Marina.getInstance().reset();
    WeaponFactory.getWamboo().reset();
    EnemyFactory.getInstance().reset();
}


The point here is that it is always tricky to test singleton classes in your code, and it can be pretty confusing to figure out why a test that passes when run individually fails when "mvn clean install" is fired. Keep this in mind while designing your singleton classes.

Tuesday, May 10, 2011

A Compressed Set of Longs

In Terracotta's world every shared object is associated with an object id, which is a long. Depending on the use case, new objects are created and old ones may get dereferenced and collected by Terracotta's Distributed Garbage Collector. So during cluster operation we end up with a large number of object ids which are not contiguous. A lot of operations need all, or a fraction of, the object ids present in the system. Creating a plain collection for all those object ids would occupy a large amount of heap, and sending that collection over the wire would not be well optimized either. So we needed to compress those object ids efficiently. These are the two approaches we took to implement our compressed set of object ids, called ObjectIdSet.

1. Range Based Compression: As the name suggests, the object ids are compressed based on the range they fall in. The ObjectIdSet holds a set of Range objects, each with a start and an end defined. Any Range object present in the ObjectIdSet means that the ids Range.start to Range.end (both inclusive) are present in the ObjectIdSet.
While adding object ids to the set, two Range objects can be merged and replaced by a single one. For example, if Range(5,8) and Range(10,15) are present and id 9 is added, the two Range objects get merged into Range(5,15). Similarly, while removing an id from the ObjectIdSet a Range object might get split into two: Range(5,15) gets split into Range(5,8) and Range(10,15) if object id 9 is removed.
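A rough sketch of what such a Range could look like (illustrative only; the actual Terracotta class is more involved and the names here are assumptions):

// Illustrative sketch: a run of contiguous ids, both ends inclusive.
public class Range {

    long start;
    long end;

    Range(long start, long end) {
        this.start = start;
        this.end = end;
    }

    boolean contains(long id) {
        return id >= start && id <= end;
    }

    // true if this id sits directly next to the range and would extend it by one
    boolean isAdjacent(long id) {
        return id == start - 1 || id == end + 1;
    }
}

With this representation, adding 9 means noticing that it is adjacent to both Range(5,8) and Range(10,15) and collapsing them into a single Range(5,15); removing 9 performs the reverse split.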

2. BitSet Based Compression: In this approach we have a set of objects called BitSet. Each BitSet contains two long fields.

public class BitSet {

    private long start;
    private long nextLongs = 0;

    ......
    ......
}

Here BitSet.start defines the start index and BitSet.nextLongs is a 64-bit mask representing the presence of the next 64 ids starting at start. For example, if only the two ids 6 and 84 are present in the set, we will have two BitSet objects with these start and nextLongs values:

1. BitSet(start = 0,  nextLongs = 0x0000000000000040L)  // bit 6 set:  id 0 + 6 = 6
2. BitSet(start = 64, nextLongs = 0x0000000000100000L)  // bit 20 set: id 64 + 20 = 84

With this approach the compression is based on fixed-size blocks. To add an id to the ObjectIdSet we just set the corresponding bit in BitSet.nextLongs, and to remove an id we just clear that bit. This approach is less complex and generally compresses better, since the previous approach tends to end up with a lot of fragmented Range objects.
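The set and clear operations boil down to simple bit manipulation. Here is a sketch of a single 64-id block, assuming start is aligned to a multiple of 64 and the id passed in falls inside this block (the class and method names are illustrative, not the real Terracotta API):

// Illustrative sketch of one 64-id block of the ObjectIdSet.
public class BitSetBlock {

    private final long start;   // first id covered by this block, a multiple of 64
    private long nextLongs = 0; // bit i set means id (start + i) is present

    BitSetBlock(long start) {
        this.start = start;
    }

    void add(long id) {
        nextLongs |= 1L << (id - start);    // set the bit for this id
    }

    void remove(long id) {
        nextLongs &= ~(1L << (id - start)); // clear the bit for this id
    }

    boolean contains(long id) {
        return (nextLongs & (1L << (id - start))) != 0;
    }
}

Adding id 84, for example, means locating (or creating) the block with start = 64 and setting bit 20 in it.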

Of course there are scenarios where the Range based object id set performs better, but in our testing of general cases we found that the BitSet based approach worked better most of the time. So by default the compression in Terracotta is based on the BitSet approach.

The implementation can be checked out in these classes: