Following the data flow
While there are many different techniques and strategies for increasing system performance, the starting point is always improving visibility into the system's performance characteristics.
This usually means instrumenting the system so that statistics and telemetry can be gathered. This might be done via:
- application-specific counters published via tools like `statsd`, `Grafana`, `Prometheus`, etc.,
- existing tools like `VisualVM` for Java or `strace` for Unix processes,
- combinations of code changes and tools like `dtrace`,
- system measurements like CPU usage, network tx/rx values, etc.
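As a rough illustration of the first point, an application-specific counter can be as simple as an atomically updated value that a reporting path periodically publishes. This is a minimal sketch, assuming a hypothetical `RequestStats` class and a StatsD-style `name:value|c` line format; it is not tied to any particular metrics library:

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of application-specific counters (hypothetical class).
// A real deployment would flush these values to a collector such as statsd.
public class RequestStats {
    private final AtomicLong requests = new AtomicLong();
    private final AtomicLong errors = new AtomicLong();

    public void recordRequest() { requests.incrementAndGet(); }
    public void recordError()   { errors.incrementAndGet(); }

    // Render the counters in a StatsD-like "name:value|c" line format.
    public String publish() {
        return "requests:" + requests.get() + "|c\n"
             + "errors:" + errors.get() + "|c";
    }

    public static void main(String[] args) {
        RequestStats stats = new RequestStats();
        stats.recordRequest();
        stats.recordRequest();
        stats.recordError();
        System.out.println(stats.publish());
    }
}
```

The key property is that recording a sample is cheap and thread-safe, so the instrumentation can sit on hot paths without distorting the behaviour being measured.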
With this data in hand, we can start to look for opportunities for improvement. While this is an exploratory exercise, and finding optimisations cannot be guaranteed, there is often low-hanging fruit. This might include:
- accidental inclusion of debug logging,
- double copying of memory buffers,
- unnecessary allocation of objects,
- overly conservative synchronisation locks,
- etc.
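The debug-logging case is a common one: even when debug output is disabled, an unguarded log call still pays for string concatenation on every invocation. A minimal sketch, using a hypothetical `Logger` class (not any particular logging library), shows the guarded form:

```java
// Hypothetical minimal logger illustrating the cost of unguarded debug calls.
public class Logger {
    private final boolean debugEnabled;

    public Logger(boolean debugEnabled) { this.debugEnabled = debugEnabled; }

    public boolean isDebugEnabled() { return debugEnabled; }

    public void debug(String message) {
        if (debugEnabled) System.out.println(message);
    }

    public static void main(String[] args) {
        Logger log = new Logger(false);
        long item = 42;

        // Costly on hot paths: the concatenation runs even though
        // debug output is disabled.
        log.debug("processing item " + item);

        // Cheap: the guard skips the string construction entirely
        // when debug output is disabled.
        if (log.isDebugEnabled()) {
            log.debug("processing item " + item);
        }
    }
}
```

Many logging frameworks offer parameterised messages for the same reason; the point here is simply that "accidental" work can hide inside calls that look free.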
Once these avenues have been exhausted, we might find that the performance gains are sufficient for the business needs. That is, the results may lower latency to acceptable levels for individual users, or resource usage might make it viable for the business to engage in the next phase of scaling.
However, where the performance gains are still not sufficient, we can work towards a deeper understanding of the data flows and look for other candidates for change. This might involve more technical changes such as:
- converting from boxed types to primitive types in order to avoid memory allocation overhead or reduce the overall memory footprint,
- restructuring subsystems to reduce unnecessary data movement,
- inverting the use of I/O buffers so that populated buffers can be managed across the full I/O marshalling and transport stack rather than incurring additional data copies,
- converting data structures from lock-based synchronisation to lock-free implementations based on CAS operations and memory barriers,
- preallocating bookkeeping data structures and explicitly managing resource pools rather than offloading this work to the garbage collection layer,
- introducing back-pressure between concurrently operating subsystems so that they can run closer to capacity without breaching resource limits,
- etc.
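As one example of the lock-free direction, a lock-based stack can be converted to a Treiber-style stack built around a compare-and-set retry loop. This is a simplified sketch; production implementations must also address issues such as the ABA problem and memory reclamation:

```java
import java.util.concurrent.atomic.AtomicReference;

// Simplified Treiber stack: push/pop retry with CAS instead of taking a lock.
public class LockFreeStack<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    public void push(T value) {
        Node<T> node = new Node<>(value);
        do {
            node.next = top.get();
        } while (!top.compareAndSet(node.next, node)); // retry on contention
    }

    public T pop() {
        Node<T> current;
        do {
            current = top.get();
            if (current == null) return null; // empty stack
        } while (!top.compareAndSet(current, current.next));
        return current.value;
    }

    public static void main(String[] args) {
        LockFreeStack<Integer> stack = new LockFreeStack<>();
        stack.push(1);
        stack.push(2);
        System.out.println(stack.pop()); // last pushed value comes off first
    }
}
```

Under contention, a failed `compareAndSet` simply re-reads the top and retries, so no thread ever blocks holding a lock; the trade-off is wasted retries when contention is very high.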
As can be seen, there are many different techniques and avenues that can be explored to improve the performance of a system. Sometimes these performance gains span multiple orders of magnitude, and make the difference to the bottom line that moves a business from being unviable to having a competitive advantage.