Mon 11 Sep 2006
Instrument your code young man – Part 2
Posted by gremlin under Application Servers, Java, Tech
In my last piece about instrumentation I (hopefully) showed the importance of instrumenting your code. In many ways it is one of the major "unwritten" requirements in any project, along with a good logging strategy. The tough question is: what do you instrument, and what are the options?
As with most things in development there are many options and variants; you just need to choose what feels right for your project. After a few years in the minefield of software development, a passionate and professional developer gets to know what will and, more importantly, what will not work in a given situation.
Ninety percent of software development involves something (a user or other software) invoking a method or command on some target object. Some of the things you can track from that are:
- How long the invocation took.
- The result of the invocation (success, error etc).
- The average invocation time of this method/command.
- The maximum time of any invocation.
- The minimum time of any invocation.
- The invocation, min, max and average figures from the first five items above, restricted to the last n commands.
- The first five items above broken down by success, error etc; this can help if "success" times take a lot longer than the errors.
- How much memory was used.
- How many times the method/command has run.
- Tallies on errors.
- Deviation counters: how many invocations took <1 second, <10 seconds, <100 seconds, <1000 seconds and so on, for all/success/error etc.
- Tallies by hour, day, week, month etc.
- Throughput calculation by the second, minute, hour or day, depending on the domain.
- Error rates by the second, minute, hour or day, depending on the domain.
- Drop rates: in some systems requests can time out. You might need to track these.
- Uptime.
- Load: number of concurrent invocations etc.
- Memory usage (very, very rough in Java).
- Data transfer, if you can calculate request and response sizes.
This list is by no means complete, and if you have any suggestions then drop me a line at gary(at)garyleeson.com.
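To make a few of those items concrete, here is a rough sketch of a per-method statistics holder covering the run count, error tally, min/max/average times and the deviation counters. The class name `InvocationStats` and all of its methods are my own invention for illustration, not part of any library, and it assumes a Java 8 or later runtime.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-method statistics holder; every name here is illustrative. */
class InvocationStats {
    // Upper bounds in ms for the deviation counters: <1 s, <10 s, <100 s, <1000 s.
    private static final long[] BUCKET_LIMITS_MS = {1_000, 10_000, 100_000, 1_000_000};

    private final LongAdder count = new LongAdder();
    private final LongAdder errors = new LongAdder();
    private final LongAdder totalMs = new LongAdder();
    private final AtomicLong minMs = new AtomicLong(Long.MAX_VALUE);
    private final AtomicLong maxMs = new AtomicLong(Long.MIN_VALUE);
    // One extra bucket at the end catches everything at or above the last limit.
    private final LongAdder[] buckets = new LongAdder[BUCKET_LIMITS_MS.length + 1];

    InvocationStats() {
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] = new LongAdder();
        }
    }

    /** Records one finished invocation. */
    void record(long elapsedMs, boolean success) {
        count.increment();
        if (!success) {
            errors.increment();
        }
        totalMs.add(elapsedMs);
        minMs.accumulateAndGet(elapsedMs, Math::min);
        maxMs.accumulateAndGet(elapsedMs, Math::max);
        buckets[bucketFor(elapsedMs)].increment();
    }

    /** Runs the call, times it, and records it as an error if it throws. */
    <T> T time(Callable<T> call) throws Exception {
        long start = System.nanoTime();
        boolean ok = false;
        try {
            T result = call.call();
            ok = true;
            return result;
        } finally {
            record((System.nanoTime() - start) / 1_000_000, ok);
        }
    }

    private static int bucketFor(long elapsedMs) {
        for (int i = 0; i < BUCKET_LIMITS_MS.length; i++) {
            if (elapsedMs < BUCKET_LIMITS_MS[i]) {
                return i;
            }
        }
        return BUCKET_LIMITS_MS.length;
    }

    // The getters read each counter separately, so under concurrent updates
    // the values are approximate rather than one consistent snapshot.
    long count() { return count.sum(); }
    long errors() { return errors.sum(); }
    long minMs() { return count.sum() == 0 ? 0 : minMs.get(); }
    long maxMs() { return count.sum() == 0 ? 0 : maxMs.get(); }
    long bucket(int index) { return buckets[index].sum(); }

    double averageMs() {
        long n = count.sum();
        return n == 0 ? 0.0 : (double) totalMs.sum() / n;
    }
}
```

You would typically keep one of these per method or command name and wrap each call in `time(...)`. Breaking the figures down by success and error is then just a matter of keeping one instance per outcome.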
From the above you can also work out which invocations were executing concurrently, which can help in figuring out the resource hogs when certain "odd" things occur that depend on when and what things are running. For example, take two processes that use a lock to access some resource. When run individually everything runs fine. When they run concurrently one gets the lock and the other has to wait; if the process that has to wait has a hard time constraint then you might have issues.
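As a sketch of the concurrency question, assume each invocation is logged with a start and end timestamp. Finding what ran alongside a given invocation is then an interval-overlap check. `InvocationLog` and `Entry` are hypothetical names of mine, and the linear scan is only sensible for a modest number of entries.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory log of invocation intervals; names are illustrative. */
class InvocationLog {
    static final class Entry {
        final String name;
        final long startMs;
        final long endMs;

        Entry(String name, long startMs, long endMs) {
            this.name = name;
            this.startMs = startMs;
            this.endMs = endMs;
        }

        /** Two intervals overlap if each one starts before the other ends. */
        boolean overlaps(Entry other) {
            return startMs < other.endMs && other.startMs < endMs;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    synchronized Entry add(String name, long startMs, long endMs) {
        Entry entry = new Entry(name, startMs, endMs);
        entries.add(entry);
        return entry;
    }

    /** Returns every other logged invocation that was running at the same time as target. */
    synchronized List<Entry> concurrentWith(Entry target) {
        List<Entry> result = new ArrayList<>();
        for (Entry entry : entries) {
            if (entry != target && entry.overlaps(target)) {
                result.add(entry);
            }
        }
        return result;
    }
}
```

When a time-constrained process misses its deadline, asking the log what was `concurrentWith` that invocation gives you a shortlist of suspects that may have been holding the lock.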
Another thing to do is make sure that all this is kept in an in-memory store for performance reasons; writing every measurement to a traditional DB backend such as PostgreSQL could have quite an impact. This does not prevent you from building in a mechanism to flush to a backing store during quiet periods or every 5 minutes or so.
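One way the in-memory-plus-flush idea might look is sketched below, assuming Java 8 or later. The counters live in a `ConcurrentHashMap` and a background task hands a snapshot to a sink on a fixed schedule. `InMemoryMetrics` is a made-up name, and the `sink` stands in for whatever backing store you use; the actual database write is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Consumer;

/** Hypothetical in-memory counter store with a periodic flush; names are illustrative. */
class InMemoryMetrics implements AutoCloseable {
    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();
    private final Consumer<Map<String, Long>> sink; // stand-in for the backing store
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "metrics-flush");
                thread.setDaemon(true); // never keep the JVM alive just for metrics
                return thread;
            });

    InMemoryMetrics(Consumer<Map<String, Long>> sink, long period, TimeUnit unit) {
        this.sink = sink;
        // Note: if flush() ever throws, the executor cancels later runs,
        // so a real sink should catch and log its own failures.
        scheduler.scheduleAtFixedRate(this::flush, period, period, unit);
    }

    /** Hot path: a map lookup and an in-memory increment, with no I/O. */
    void increment(String key) {
        counters.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    /** Copies the current totals and hands them to the sink. */
    void flush() {
        Map<String, Long> snapshot = new HashMap<>();
        counters.forEach((key, adder) -> snapshot.put(key, adder.sum()));
        sink.accept(snapshot);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Something like `new InMemoryMetrics(snapshot -> store.save(snapshot), 5, TimeUnit.MINUTES)` would give the five-minute flush, where `store.save` is whatever persistence call you already have. The sketch flushes running totals rather than deltas, so a missed or repeated flush does not corrupt the stored numbers.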