Mon 11 Sep 2006
Instrument your code young man – Part 2
Posted by gremlin under Application Servers, Java, Tech
In my last piece about instrumentation I (hopefully) showed the importance of instrumenting your code – in many ways it is one of the major “unwritten” requirements in any project, along with a good logging strategy. The tough questions are what to instrument and what the options are.
As with most things in development there are many options and variants – you just need to choose what feels right for your project. After a few years in the minefield of software development, a passionate and professional developer gets to know what will and, more importantly, what will not work in a given situation.
Ninety percent of software development involves something (user/software) invoking a method or command on some target object. Some of the things you can track from that are:
- How long the invocation took.
- The result of the invocation (success, error etc.).
- The average invocation time of this method/command.
- The maximum time of any invocation.
- The minimum time of any invocation.
- The same five values (the first five items above) for the last n commands.
- The first five items again, broken down by success/error etc. – this can help if “success” times take a lot longer than the errors.
- How much memory was used.
- How many times the method/command has run.
- Tallies on errors.
- Deviation counters – how many times an invocation took <1 second, <10 seconds, <100 seconds, <1000 seconds and so on, for all/success/error etc.
- Tallies by hour, day, week, month etc.
- Throughput calculation by the second, minute, hour or day, depending on the domain.
- Error rates by second, minute, hour or day, depending on the domain.
- Drop rates – in some systems requests can time out. You might need to track these.
- Uptime.
- Load – number of concurrent invocations etc.
- Memory usage (very, very rough in Java).
- Data transfer – if you can calculate request and response sizes.
This list is by no means complete, and if you have any suggestions then drop me a line at gary(at)garyleeson.com.
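To make a few of the items in the list concrete, here is a minimal sketch of a per-method statistics holder that covers the count, error tally, min/max/average times and the deviation counters. Everything in it is my own assumption for illustration: the class name `InvocationStats`, the `record` and `time` methods, and the choice to treat the deviation counters as non-overlapping ranges (under 1 second, 1 to 10 seconds, and so on) rather than cumulative totals.

```java
import java.util.concurrent.Callable;

/** Hypothetical per-method statistics holder; all names here are illustrative. */
final class InvocationStats {
    // Upper bounds in milliseconds for the deviation counters: <1s, <10s, <100s, <1000s.
    private static final long[] BUCKET_LIMITS_MS = {1_000, 10_000, 100_000, 1_000_000};

    private long count;
    private long errors;
    private long totalNanos;
    private long minNanos = Long.MAX_VALUE;
    private long maxNanos;
    // One extra slot catches everything at or above the last limit.
    private final long[] buckets = new long[BUCKET_LIMITS_MS.length + 1];

    /** Records one finished invocation and updates every counter. */
    synchronized void record(long elapsedNanos, boolean success) {
        count++;
        if (!success) {
            errors++;
        }
        totalNanos += elapsedNanos;
        minNanos = Math.min(minNanos, elapsedNanos);
        maxNanos = Math.max(maxNanos, elapsedNanos);

        // Find the first bucket whose upper bound is above the elapsed time.
        long elapsedMs = elapsedNanos / 1_000_000;
        int i = 0;
        while (i < BUCKET_LIMITS_MS.length && elapsedMs >= BUCKET_LIMITS_MS[i]) {
            i++;
        }
        buckets[i]++;
    }

    /** Times the given call, records the outcome, and lets any failure propagate. */
    <T> T time(Callable<T> call) throws Exception {
        long start = System.nanoTime();
        boolean ok = false;
        try {
            T result = call.call();
            ok = true;
            return result;
        } finally {
            record(System.nanoTime() - start, ok);
        }
    }

    synchronized long count() { return count; }
    synchronized long errors() { return errors; }
    synchronized long minNanos() { return count == 0 ? 0 : minNanos; }
    synchronized long maxNanos() { return maxNanos; }
    synchronized double averageNanos() { return count == 0 ? 0.0 : (double) totalNanos / count; }
    synchronized long bucket(int index) { return buckets[index]; }
}
```

You would keep one such object per method or command and wrap each call, for example `stats.time(() -> service.save(order))`, where `service` and `order` are placeholders. Plain `synchronized` is the simplest thing that works; under heavy load you would probably want something with less contention.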
From the above you can also work out which invocations were executing concurrently, which can help in figuring out the hog processes when certain “odd” things occur that depend on when and what things are running. For example, suppose two processes use a lock to access some resource. When run individually everything runs fine. When they run concurrently one gets the lock and the other has to wait; if the process that has to wait has a hard time constraint then you might have issues.
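If you keep the start and end time of each invocation, finding out what ran alongside a slow one is a simple interval-overlap check. The sketch below assumes a hypothetical `InvocationRecord` holding a name plus start and end timestamps in milliseconds; the class names, the job names in the usage note and the `concurrentWith` helper are mine, purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical record of one finished invocation; field names are illustrative. */
final class InvocationRecord {
    final String name;
    final long startMillis;
    final long endMillis;

    InvocationRecord(String name, long startMillis, long endMillis) {
        this.name = name;
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }

    /** Two invocations ran concurrently if each started before the other finished. */
    boolean overlaps(InvocationRecord other) {
        return startMillis < other.endMillis && other.startMillis < endMillis;
    }
}

final class ConcurrencyReport {
    /** Returns the names of all recorded invocations that overlapped with the target. */
    static List<String> concurrentWith(InvocationRecord target, List<InvocationRecord> history) {
        List<String> names = new ArrayList<>();
        for (InvocationRecord r : history) {
            // Skip the target itself; compare by identity since names may repeat.
            if (r != target && r.overlaps(target)) {
                names.add(r.name);
            }
        }
        return names;
    }
}
```

So if a hypothetical `reportJob` missed its deadline, `ConcurrencyReport.concurrentWith(reportJob, history)` would tell you that, say, `importJob` was running at the same time, which is a decent first suspect for who was holding the lock.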
Another thing to do is make sure that all this is kept in an in-memory database for performance reasons – writing every measurement to a traditional DB backend such as PostgreSQL could have quite an impact. This does not prevent you from building in a mechanism to flush to a backing store during quiet periods or every 5 minutes or so.
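As a rough sketch of that idea, the class below keeps simple tallies in memory and hands a snapshot to a backing store on a timer. It is an assumption-laden illustration rather than a finished design: `InMemoryCounters`, `startFlushing` and the `sink` callback are hypothetical names, and the sink stands in for whatever would actually write to your database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Consumer;

/** Hypothetical in-memory tally store with a periodic flush; names are illustrative. */
final class InMemoryCounters {
    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();
    // Stand-in for the code that writes a snapshot to the backing store.
    private final Consumer<Map<String, Long>> sink;
    // Daemon thread so the flusher never keeps the JVM alive on its own.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "metrics-flush");
                t.setDaemon(true);
                return t;
            });

    InMemoryCounters(Consumer<Map<String, Long>> sink) {
        this.sink = sink;
    }

    /** Cheap in-memory update on the hot path; no I/O happens here. */
    void increment(String key) {
        counters.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    /** Schedules a flush every period, e.g. startFlushing(5, TimeUnit.MINUTES). */
    void startFlushing(long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                flush();
            } catch (RuntimeException e) {
                // Swallow so one failed write does not cancel later flushes.
            }
        }, period, period, unit);
    }

    /** Copies the current totals and passes them to the sink; counters keep running. */
    void flush() {
        Map<String, Long> snapshot = new HashMap<>();
        counters.forEach((key, adder) -> snapshot.put(key, adder.sum()));
        sink.accept(snapshot);
    }

    void stop() {
        scheduler.shutdown();
    }
}
```

The point of the split is that `increment` never touches the database, so the instrumented code pays only for a map lookup and an add. A fixed timer is the easy option; flushing during quiet periods would need some notion of current load to decide when to run.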