In my last piece about instrumentation I (hopefully) showed the importance of instrumenting your code – in many ways it is one of the major “unwritten” requirements in any project – along with a good logging strategy. The tough questions are what to instrument and what your options are.

As with most things in development there are many options and variants – you just need to choose what feels right for your project. After a few years in the minefield of software development, a passionate and professional developer gets to know what will and, more importantly, what will not work in a given situation.

Ninety percent of software development involves something (a user or other software) invoking a method or command on some target object. Some of the things you can track from that are:

  1. How long the invocation took.

  2. The result of the invocation (success, error etc).

  3. The average invocation time of this method/command.

  4. The maximum time of any invocation.

  5. The minimum time of any invocation.

  6. The invocation, min, max and average times from 1-5, but calculated over the last n commands only.

  7. Values 1-5 but broken down by success/error etc – can help if “success” times take a lot longer than the errors.

  8. How much memory was used.

  9. How many times method/command has run.

  10. Tallies on errors.

  11. Deviation counters – how many invocations took <1 second, <10 seconds, <100 seconds, <1000 seconds and so on, for all/success/error etc.

  12. Tallies by hour, day, week, month etc.

  13. Throughput calculation by the second, minute, hour or day, depending on the domain.

  14. Error rates by the second, minute, hour or day, depending on the domain.

  15. Drop rates – in some systems requests can time out. You might need to track these.

  16. Uptime.

  17. Load – number of concurrent invocations etc.

  18. Memory usage (very very rough in Java).

  19. Data transfer – if you can calculate request and response sizes.

This list is by no means complete, and if you have any suggestions then drop me a line at gary(at)garyleeson.com.
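To make a few of the items above concrete, here is a rough sketch of a per-method statistics holder in Java. It covers the count, error tally, min/max/average times and the "deviation counters". The class name `InvocationStats` and all of its methods are my own invention for illustration, not part of any library, and the bucket boundaries (1, 10, 100 and 1000 seconds, counted as non-overlapping ranges) are just one way to read that item.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: running statistics for a single method/command.
final class InvocationStats {
    private long count;                      // how many times it has run
    private long errors;                     // tally of failed invocations
    private long totalNanos;                 // sum of durations, used for the average
    private long minNanos = Long.MAX_VALUE;  // shortest invocation seen so far
    private long maxNanos = Long.MIN_VALUE;  // longest invocation seen so far
    // Buckets: [<1 s, 1-10 s, 10-100 s, 100-1000 s, 1000 s and over]
    private final long[] buckets = new long[5];

    // Record one finished invocation.
    synchronized void record(long elapsedNanos, boolean success) {
        count++;
        if (!success) {
            errors++;
        }
        totalNanos += elapsedNanos;
        minNanos = Math.min(minNanos, elapsedNanos);
        maxNanos = Math.max(maxNanos, elapsedNanos);
        buckets[bucketFor(elapsedNanos)]++;
    }

    // Index of the first bucket whose upper limit is above the duration.
    static int bucketFor(long elapsedNanos) {
        long limit = 1_000_000_000L; // 1 second in nanoseconds
        for (int i = 0; i < 4; i++) {
            if (elapsedNanos < limit) {
                return i;
            }
            limit *= 10;
        }
        return 4;
    }

    // Time a task and record it; an exception counts as an error.
    <T> T time(Callable<T> task) throws Exception {
        long start = System.nanoTime();
        boolean ok = false;
        try {
            T result = task.call();
            ok = true;
            return result;
        } finally {
            record(System.nanoTime() - start, ok);
        }
    }

    synchronized long count() { return count; }
    synchronized long errors() { return errors; }
    synchronized long minNanos() { return count == 0 ? 0 : minNanos; }
    synchronized long maxNanos() { return count == 0 ? 0 : maxNanos; }
    synchronized double averageNanos() {
        return count == 0 ? 0.0 : (double) totalNanos / count;
    }
    synchronized long bucket(int index) { return buckets[index]; }
}
```

You would keep one of these per method or command and wrap each call in `time(...)`. Splitting the numbers by success and error could be done by keeping two instances side by side.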

From the above you can also work out which invocations were executing concurrently. That can help in finding the hog processes when certain “odd” things occur that depend on what is running and when. For example, take two processes that use a lock to access some resource. Run individually, each one is fine. Run concurrently, one gets the lock and the other has to wait; if the process that has to wait has a hard time constraint then you might have issues.
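As a rough illustration of working out what ran concurrently, the sketch below assumes you have kept a start and end timestamp for each invocation. `InvocationRecord` and `ConcurrencyReport` are hypothetical names made up for this example. Two invocations are treated as concurrent when their time ranges overlap.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record of one finished invocation.
final class InvocationRecord {
    final String name;
    final long startMillis;
    final long endMillis;

    InvocationRecord(String name, long startMillis, long endMillis) {
        this.name = name;
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }

    // Two ranges overlap when each one starts before the other ends.
    boolean overlaps(InvocationRecord other) {
        return startMillis < other.endMillis && other.startMillis < endMillis;
    }
}

final class ConcurrencyReport {
    // Names of every other invocation that was running at the same time as target.
    static List<String> concurrentWith(InvocationRecord target,
                                       List<InvocationRecord> all) {
        List<String> names = new ArrayList<>();
        for (InvocationRecord r : all) {
            if (r != target && r.overlaps(target)) {
                names.add(r.name);
            }
        }
        return names;
    }
}
```

When an invocation misses its time constraint, asking `concurrentWith` for that record gives a shortlist of suspects that may have been holding the lock.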

Another thing to do is make sure that all this is kept as an in-memory database for performance reasons – writing to a traditional DB backend such as PostgreSQL on every invocation could have quite an impact. This does not prevent you from building in a mechanism to flush to a backing store during quiet periods or every 5 minutes or so.
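A minimal sketch of that idea, assuming simple named counters: the hot path only touches memory, and a background task hands a snapshot to whatever backing store you use. `InMemoryCounters` and its `sink` parameter are hypothetical; the sink stands in for your own database write and is not a real PostgreSQL API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Consumer;

// Hypothetical in-memory counter store with a periodic flush.
final class InMemoryCounters {
    private final ConcurrentHashMap<String, LongAdder> counters =
            new ConcurrentHashMap<>();
    // Assumption: the sink writes a snapshot to the backing store.
    private final Consumer<Map<String, Long>> sink;

    InMemoryCounters(Consumer<Map<String, Long>> sink) {
        this.sink = sink;
    }

    // Hot path: memory only, no I/O.
    void increment(String name) {
        counters.computeIfAbsent(name, k -> new LongAdder()).increment();
    }

    // Copy the current running totals and pass them to the sink.
    // Counters are not reset, so the sink always sees cumulative values.
    void flush() {
        Map<String, Long> snapshot = new HashMap<>();
        counters.forEach((name, adder) -> snapshot.put(name, adder.sum()));
        sink.accept(snapshot);
    }

    // Flush on a fixed schedule. The caller should shut the returned
    // scheduler down on exit, since its thread is not a daemon thread.
    ScheduledExecutorService startFlushing(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::flush, period, period, unit);
        return scheduler;
    }
}
```

Calling `startFlushing(5, TimeUnit.MINUTES)` would give the "every 5 minutes or so" behaviour. One thing to watch: if the sink throws, `scheduleAtFixedRate` stops running the task, so a real sink should catch and log its own failures.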