Friday, September 10, 2010

Terracotta Operator Events Console

With the 3.3 release of Terracotta we have added a feature called the "Operator Events Console". The main idea behind this project was to provide a dynamic UI that informs the operator of a Terracotta cluster about important events he/she needs to know about to keep the cluster running smoothly. This feature is enterprise only and is not available in the open source kit. In this blog I will try to explain what the new feature looks like and which operations it offers.

As an example I will use a cluster with two server stripes (multiple server stripes are an enterprise-only feature), each having one active and one passive server. Click on config to download the tc config I used for the demonstration.

I started two servers (server1 and server3), one from each stripe, so that the cluster is ready for action; the dev console does not become ready until at least one server in each stripe is up and running. Then start the dev console and let it connect to "localhost:9521". If you look under the "Monitoring" tab in the left pane of the console, you will see a new item called "Operator Events". Below is the screenshot for it. Once you click it, it shows all the important events that have happened so far.


I will now start the two remaining servers, start and kill some clients, and run some DGC cycles to generate some important events. After this the operator events console looks like the screenshot below.

Notice that the new events are in bold text, while the ones we already saw when we started the dev console are not. This shows the operator which events have happened since the dev console was connected to the cluster.
If you click any row, its bold text becomes non-bold, marking the event as read. Below is the screenshot showing this.

If you want to mark all events as read, click the "Mark All Viewed" button and all the events will become non-bold. Below is the screenshot after I clicked that button.

You can filter the events by choosing a filter from the select view drop-down list. Below are the screenshots demonstrating this.



Also notice that in the node column similar events are clubbed together (we see two node names in the same row). If you hover the mouse over such a row, a pop-up explains what the other similar event was. Below is the screenshot for it.

The cool thing about this feature is that the operator does not need to monitor the events all the time. The dev console can be started at any point and will still show the events that happened earlier. This is because each server stores a number of past events to show when a dev console connects. There is also a limit on the number of events shown in this panel. The panel limit and the per-server store size are controlled by the following tc properties, respectively.

dev.console.max.operator.events = 5000

l2.operator.events.store = 1500
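To illustrate the idea of a server keeping only a bounded number of past events, here is a minimal Java sketch. It is not Terracotta's implementation: the class name BoundedEventStore, its methods, and the use of plain strings as events are all assumptions made for illustration. It simply shows how a fixed-size store can drop the oldest event when a new one arrives, so that a console connecting later sees only the most recent ones.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: keeps only the most recent `capacity` events. */
final class BoundedEventStore {
    private final int capacity;
    private final Deque<String> events = new ArrayDeque<>();

    BoundedEventStore(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Adds an event, evicting the oldest one when the store is full. */
    synchronized void add(String event) {
        if (events.size() == capacity) {
            events.removeFirst();
        }
        events.addLast(event);
    }

    /** Returns a copy of the stored events, oldest first. */
    synchronized List<String> snapshot() {
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        // 1500 mirrors the l2.operator.events.store value shown above.
        BoundedEventStore store = new BoundedEventStore(1500);
        for (int i = 1; i <= 2000; i++) {
            store.add("event-" + i);
        }
        List<String> seen = store.snapshot();
        System.out.println(seen.size() + " events retained, oldest = " + seen.get(0));
    }
}
```

With a capacity of 1500 and 2000 events added, the first 500 are dropped, which is the trade-off such a limit makes between memory use and history.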

The operator can also export all the events to a text file by clicking the "Export" button. This text file is handy to send along when debugging what went wrong in the cluster. Below is the screenshot for the same.

This feature aims to reduce the pain of digging through huge log files to debug what went wrong, and to let the operator see the overall health of the system so that he/she can take corrective measures in case of trouble.

Feedback on this is most welcome.

Wednesday, February 17, 2010

Terracotta Ehcache Sync Write Feature

Terracotta's latest release has loads of new features. This blog describes the Ehcache sync write feature.

Sync Write Lock, Old Behavior
The previous version had the following behavior on using sync write locks:

For a sync write lock, unlock waits until it gets an acknowledgment from the server that the transaction has been processed and applied on all the other L1s (clients) affected by that transaction.

Basically a call like this:

ManagerUtil.commitLock(lockId, SYNC_WRITE);

This call waits until all the transactions associated with this lock have been acknowledged by the server, i.e. applied on all the L1s, with the ack received back from all the concerned clients (those to which the broadcast was made).

Sync Write Lock, New Behavior
The new release focuses on improving the performance of sync write locks.

In the new implementation the server acks the client as soon as it receives a transaction batch from that client. To achieve this, the server now sends an additional acknowledgment that tells the client the server has received the transaction batch.

When this ack is received for the transaction, the client simply releases the lock locally (only possible when the lock is greedy). This client can then keep using the lock, but all other clients have to wait for it, since the lock is only recalled once all the transactions for this lock have been flushed.
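The difference between the two behaviors can be sketched with plain Java futures. This is only a simulation under my own assumptions, not Terracotta code: SyncWriteSketch, commitOld and commitNew are hypothetical names, the futures stand in for the acks, and lock handling (greedy locks, recalls) is not modeled at all.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical simulation of when a sync write commit returns. */
final class SyncWriteSketch {

    /** Old behavior: return only after every other client has applied the change. */
    static void commitOld(CompletableFuture<Void> serverReceived,
                          List<CompletableFuture<Void>> appliedOnClients) {
        serverReceived.join();
        CompletableFuture
                .allOf(appliedOnClients.toArray(new CompletableFuture<?>[0]))
                .join();
    }

    /** New behavior: return as soon as the server confirms it received the batch. */
    static void commitNew(CompletableFuture<Void> serverReceived) {
        serverReceived.join();
    }

    public static void main(String[] args) {
        // The server has already acknowledged receipt of the transaction batch.
        CompletableFuture<Void> serverReceived = CompletableFuture.completedFuture(null);
        // One other client has not yet applied the broadcast.
        CompletableFuture<Void> slowClient = new CompletableFuture<>();

        commitNew(serverReceived); // does not wait for the slow client
        System.out.println("new-style commit returned; slow client applied = "
                + slowClient.isDone());

        slowClient.complete(null); // the slow client finally applies the change
        commitOld(serverReceived, Collections.singletonList(slowClient));
        System.out.println("old-style commit returned; slow client applied = "
                + slowClient.isDone());
    }
}
```

In the sketch, calling commitOld before the slow client completes would block, which is the wait the new behavior removes from the committing client's path.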

Performance
The numbers are for 1 client and 1 thread only, and are meant to demonstrate that there is not much degradation in performance (about 4% lower TPS with sync writes on).

Number of Elements: 1 million
synchronousWrites=false : TPS = 19873
synchronousWrites=true : TPS = 19094

How to turn it on/off
To enable or disable this, add synchronousWrites="true" or synchronousWrites="false" to the terracotta element in ehcache.xml.

Below is a sample ehcache.xml with this set to true:

<?xml version="1.0" encoding="UTF-8"?>
<ehcache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="ehcache.xsd" updateCheck="true"
         monitoring="autodetect" dynamicConfig="true">
  <cache name="testCache" maxElementsInMemory="1000000" eternal="false"
         timeToIdleSeconds="0" timeToLiveSeconds="0" overflowToDisk="false">
    <terracotta synchronousWrites="true" />
  </cache>
  <terracottaConfig url="eng03:9510" />
</ehcache>
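Since the attribute has to sit on the terracotta element rather than on the cache element, a quick check of the config file can catch a misplaced attribute. The following is a small sketch using only the JDK's XML parser; the class SyncWritesCheck and the method isSyncWriteEnabled are hypothetical helpers of mine and are not part of Ehcache or Terracotta.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: checks where synchronousWrites is set in an ehcache.xml string. */
final class SyncWritesCheck {

    /** Returns true if the named cache contains <terracotta synchronousWrites="true"/>. */
    static boolean isSyncWriteEnabled(String ehcacheXml, String cacheName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(ehcacheXml.getBytes(StandardCharsets.UTF_8)));
        NodeList caches = doc.getElementsByTagName("cache");
        for (int i = 0; i < caches.getLength(); i++) {
            Element cache = (Element) caches.item(i);
            if (!cacheName.equals(cache.getAttribute("name"))) {
                continue;
            }
            // Only the nested terracotta element counts, not the cache element itself.
            NodeList tc = cache.getElementsByTagName("terracotta");
            return tc.getLength() > 0
                    && "true".equals(((Element) tc.item(0)).getAttribute("synchronousWrites"));
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ehcache><cache name=\"testCache\">"
                + "<terracotta synchronousWrites=\"true\"/></cache></ehcache>";
        System.out.println("sync writes enabled: " + isSyncWriteEnabled(xml, "testCache"));
    }
}
```

The check deliberately ignores a synchronousWrites attribute placed on the cache element, so a misplaced attribute is reported as not enabled.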

Here is the source code for my test.