Tuesday, May 10, 2011

A Compressed Set of Longs

In terracotta's world every shared object is associated with an Object Id which is a long. Depending on the use case objects are created and old one might get dereferenced and collected by Terracotta's Distributed Garbage Collector. So during the cluster operation we end up having a large number of object Ids which are not contiguous in nature. In a lot of operations we need the all or fraction of the object ids present in the system. Creating a collection for all those object ids will result in occupying a large space on heap and moreover sending that collection over wire will also be not that optimized. So we needed to compress those object ids efficiently. These are the two approaches we took to implement our compressed set for object ids called ObjectIdSet

1. Range Based Compression: As the name suggests the object ids are compressed based on the range they are in. So we have a set of Range objects under ObjectIdSet which have start and end defined. Any Range object present in the ObjectIdSet means that Range.start to Range.end (both inclusive) are present in the ObjectIdSet.
While adding Object ids in the set two Range objects can be merged to get replaced by one Range objects. For example if you have Range(5,8) and Range(10,15) present and add id 9 in the set, the two Range objects would get merged into Range(5,15). Similarly while removing an id from ObjectIdSet a Range object might get split into two. For example Range(5,15) will get split into Range(5,8) and Range(10,15) if object id 9 is removed.

2. BitSet Based Compression: In this approach we have a set of Objects called BitSet. BitSet contains two long variables.

public class BitSet {

private long start;

private long nextLongs = 0;

......

......

}

Where BitSet.start defines the start index and BiSet.nextLongs represents the next 64 a bit set representing the next 64 ids in the set. For example if we have only two Ids 6 and 84 present in the set we will have Two BitSet Objects having these as start and nextLongs

1. BitSet(0, 0x0000000000000000000000000000000000000000000000000000000001000000)
2. BitSet(64, 0x0000000000000000000000000000000000000000001000000000000000000000)

With this approach the compression is based on fix sets. To add any id in ObjectIdSet we just have to set corresponding bit in BitSet.nextRange and similarly to remove any id just need to unset the bit. This approach is less complex and compresses more generally since the previous approach would have a lot of fragmented Range objects.

Ofcourse there would be scenarios where the Range based object id set would perform better but in our testing of general cases we have found that BitSet based approach worked better in most cases. So by default in terracotta the compression is based on BitSet approach.

The implementation can be checked out by these classes

1 comment:

Pankaj Bhambhani said...

Hi,

Thanks for the explanation on Object ID Set. I am using Terracotta 4.3.0 along with Ehcache 2.10.0 and I am getting an exception while establishing a connection to the server. The exception says that range was inserted out of order. Any help is appreciated.

2015-09-17 09:44:11,174 [WorkerThread(hydrate_message_stage, 13, 13)] ERROR com.tc.net.protocol.tcm.Hydra
teHandler - Error hydrating message of type REQUEST_MANAGED_OBJECT_MESSAGE (12)
java.lang.UnsupportedOperationException: Inserts out of order
at com.tc.util.BasicObjectIDSet.doRangeInsert(BasicObjectIDSet.java:62)
at com.tc.util.BasicObjectIDSet.insertRange(BasicObjectIDSet.java:47)
at com.tc.util.ObjectIDSet.deserializeFrom(ObjectIDSet.java:99)
at com.tc.net.protocol.tcm.TCMessageImpl.getObject(TCMessageImpl.java:248)
at com.tc.object.msg.RequestManagedObjectMessageImpl.hydrateValue(RequestManagedObjectMessageImpl.java:61)
at com.tc.net.protocol.tcm.TCMessageImpl.hydrate(TCMessageImpl.java:166)
at com.tc.net.protocol.tcm.HydrateHandler.handleEvent(HydrateHandler.java:33)
at com.tc.async.impl.StageImpl$WorkerThread.run(StageImpl.java:188)