Performance and Optimization

How to Understand Performance Problems

Learning to understand the performance of a running system is unavoidable. Even if you understand precisely the cost of the code you write, your code will make calls into other software systems that you have little control over or visibility into. In practice, though, performance problems are a little different from, and a little easier than, debugging in general.

Suppose that you or your customers consider a system or a subsystem to be too slow. Before you try to make it faster, you must build a mental model of why it is slow. To do this you can use a profiling tool or a good log to figure out where the time or other resources are really being spent. There is a famous dictum that 90% of the time will be spent in 10% of the code. To that I would add the importance of input/output (I/O) expense to performance issues; often most of the time is spent in I/O in one way or another. Finding the expensive I/O and the expensive 10% of the code is a good first step to building your mental model.
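
As a concrete illustration, here is a minimal sketch of locating the expensive 10% with a standard profiler; Python's built-in cProfile is assumed, and the functions being profiled are made up stand-ins:

```python
import cProfile
import pstats

def load_records():
    # stand-in for an expensive, I/O-flavored step
    return [str(i) for i in range(100_000)]

def summarize(records):
    # stand-in for a CPU-bound step
    return sum(len(r) for r in records)

def main():
    return summarize(load_records())

# Profile main() and print the ten most expensive calls by cumulative time.
cProfile.run("main()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)
```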

There are many dimensions to the performance of a computer system, and many resources consumed. The first resource to measure is wall-clock time, the total time that passes for the computation. Logging wall-clock time is particularly valuable because it can tell you about unpredictable circumstances that arise in situations where other profiling is impractical. However, it does not always represent the whole picture: sometimes something that takes a little longer but does not burn up as many processor seconds will be much better in the computing environment you actually have to deal with. Similarly, memory, network bandwidth, and database or other server accesses may, in the end, be far more expensive than processor seconds.
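
One cheap way to get that wall-clock data is to log elapsed time around the operations you care about. A minimal sketch, with the logger setup and the timed operation both hypothetical:

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("timing")

@contextmanager
def timed(label):
    # Log wall-clock time for the enclosed block, even if it raises.
    start = time.perf_counter()
    try:
        yield
    finally:
        log.info("%s took %.3f seconds", label, time.perf_counter() - start)

with timed("load customer report"):
    time.sleep(0.2)  # stand-in for the real work
```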

Contention for shared resources that are synchronized can cause deadlock and starvation. Deadlock is the inability to proceed because of improper synchronization or resource demands. Starvation is the failure to schedule a component properly. If contention can be anticipated at all, it is best to have a way of measuring it from the start of your project. Even if the contention never occurs, being able to assert that with confidence is very helpful.
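
One way to get such a measurement, assuming a lock-based design, is to record how long each thread waits to acquire the shared lock. A rough sketch, with the workload invented for illustration:

```python
import threading
import time

lock = threading.Lock()           # the shared resource being contended for
wait_times = []
wait_times_guard = threading.Lock()

def with_contention_measured(work):
    start = time.perf_counter()
    with lock:
        waited = time.perf_counter() - start
        with wait_times_guard:
            wait_times.append(waited)
        work()

def worker():
    for _ in range(100):
        with_contention_measured(lambda: time.sleep(0.001))

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("max wait %.4f s, mean wait %.4f s"
      % (max(wait_times), sum(wait_times) / len(wait_times)))
```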

How to Fix Performance Problems

Most software projects can, with relatively little effort, be made 10 to 100 times faster than they are when first released. Under time-to-market pressure, it is both wise and effective to choose a solution that gets the job done simply and quickly, even if less efficiently than some other solution. However, performance is a part of usability, and often it must eventually be considered more carefully.

The key to improving the performance of a very complicated system is to analyse it well enough to find the bottlenecks, the places where most of the resources are consumed. There is not much sense in optimizing a function that accounts for only 1% of the computation time. As a rule of thumb, think carefully before doing anything unless you expect it to make the system, or a significant part of it, at least twice as fast; there is usually a way to do this. Also consider the testing and quality assurance effort that your change will require. Each change brings a test burden with it, so it is much better to make a few big changes than many small ones.

After you've made a two-fold improvement in something, you need to at least rethink and perhaps reanalyze to discover the next-most-expensive bottleneck in the system, and attack that to get another two-fold improvement.

Often, the bottlenecks in performance will be an example of counting cows by counting legs and dividing by four, instead of counting heads. For example, I've made errors such as failing to provide a relational database system with a proper index on a column I look up a lot, which probably made it at least 20 times slower. Other examples include doing unnecessary I/O in inner loops, leaving in debugging statements that are no longer needed, unnecessary memory allocation, and, in particular, inexpert use of libraries and other subsystems that are often poorly documented with respect to performance. This kind of improvement is sometimes called low-hanging fruit, meaning that it can be easily picked to provide some benefit.
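
For instance, the missing-index mistake above is usually a one-line fix once you see it. A sketch using an in-memory SQLite database, with the table and column names made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 1000, i * 0.5) for i in range(100_000)])

# Without an index, this lookup scans the whole table on every call.
conn.execute("SELECT SUM(total) FROM orders WHERE customer_id = ?", (42,))

# With an index on the column we look up a lot, the same query becomes a cheap seek.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
conn.execute("SELECT SUM(total) FROM orders WHERE customer_id = ?", (42,))
```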

What do you do when you start to run out of low-hanging fruit? Well, you can reach higher, or chop the tree down: you can continue making small improvements, or you can seriously redesign a system or a subsystem. (This is a great opportunity to use your skills as a good programmer, not only in the new design but also in convincing your boss that this is a good idea.) However, before you argue for the redesign of a subsystem, you should ask yourself whether your proposal will make it five to ten times better.

How to Optimize Loops

Sometimes you'll encounter loops, or recursive functions, that take a long time to execute and are bottlenecks in your product. Before you try to make the loop a little faster, spend a few minutes considering whether there is a way to remove it entirely. Would a different algorithm do? Could you compute that while computing something else? If you can't find a way around it, then you can optimize the loop. This is simple: move stuff out. In the end, this will require not only ingenuity but also an understanding of the expense of each kind of statement and expression. Here are some suggestions:

  • Remove floating point operations.

  • Don't allocate new memory blocks unnecessarily.

  • Fold constants together.

  • Move I/O into a buffer.

  • Try not to divide.

  • Try not to do expensive typecasts.

  • Move a pointer rather than re-computing indices.

The cost of each of these operations depends on your specific system. On some systems compilers and hardware do these things for you. Clear, efficient code is better than code that requires an understanding of a particular platform.
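
As an illustration of "move stuff out," here is a sketch of hoisting loop-invariant work and moving I/O into a buffer; the records, tax rate, and output file are hypothetical:

```python
records = [{"price": i, "qty": i % 7} for i in range(10_000)]

# Before: recomputes an invariant and performs a write on every iteration.
def slow_totals(records, path):
    with open(path, "w") as out:
        for r in records:
            tax_rate = 1.0 + (8.25 / 100.0)   # loop-invariant, recomputed each time
            out.write("%.2f\n" % (r["price"] * r["qty"] * tax_rate))  # one I/O call per record

# After: the invariant is computed once and the output is accumulated in memory,
# then written with a single call.
def fast_totals(records, path):
    tax_rate = 1.0 + (8.25 / 100.0)
    lines = ["%.2f\n" % (r["price"] * r["qty"] * tax_rate) for r in records]
    with open(path, "w") as out:
        out.write("".join(lines))
```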

How to Deal with I/O Expense

For a lot of problems, processors are fast compared to the cost of communicating with a hardware device. This cost is usually abbreviated I/O, and can include network cost, disk I/O, database queries, file I/O, and other use of some hardware not very close to the processor. Therefore building a fast system is often more a question of improving I/O than improving the code in some tight loop, or even improving an algorithm.

There are two fundamental techniques for improving I/O: caching and representation. Caching avoids I/O, generally the repeated reading of some value, by storing a copy of that value locally so that no I/O has to be performed to get it. The first key to caching is to make it crystal clear which data is the master and which are copies. There is only one master, period. Caching brings with it the danger that the copy sometimes cannot reflect changes to the master instantaneously.
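
A minimal sketch of such a cache, where the master sits behind some expensive fetch and a local dictionary holds the copies; the fetch function here is a stand-in:

```python
_cache = {}   # copies only; whatever fetch_from_master reads remains the one master

def fetch_from_master(key):
    # Stand-in for the expensive I/O: a database query, a network call, etc.
    return "value-for-%s" % key

def get(key):
    if key not in _cache:
        _cache[key] = fetch_from_master(key)   # pay the I/O cost once
    return _cache[key]

def invalidate(key):
    # When the master changes, the copy must be discarded or refreshed,
    # otherwise readers see stale data.
    _cache.pop(key, None)
```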

Representation is the approach of making I/O cheaper by representing data more efficiently. This is often in tension with other demands, like human readability and portability.

Representations can often be improved by a factor of two or three from their first implementation. Techniques for doing this include using a binary representation instead of one that is human readable, transmitting a dictionary of symbols along with the data so that long symbols don't have to be encoded, and, at the extreme, things like Huffman encoding.
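
A small sketch of the first of those techniques, comparing a human-readable encoding with a fixed binary one for the same records; the record layout is invented for illustration:

```python
import json
import struct

records = [(i, i * 0.5) for i in range(1000)]   # (id, amount) pairs

# Human-readable representation: easy to debug and portable, but bulky.
text_form = json.dumps([{"id": i, "amount": a} for i, a in records]).encode("utf-8")

# Binary representation: each record packed as a 4-byte int and an 8-byte float.
binary_form = b"".join(struct.pack("<id", i, a) for i, a in records)

print(len(text_form), "bytes as JSON")
print(len(binary_form), "bytes packed")
```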

A third technique that is sometimes possible is to improve the locality of reference by pushing the computation closer to the data. For instance, if you are reading some data from a database and computing something simple from it, such as a summation, try to get the database server to do it for you. This is highly dependent on the kind of system you're working with, but you should explore it.
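
A sketch of that summation example, again with an in-memory SQLite database standing in for the real server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount REAL)")
conn.executemany("INSERT INTO sales (amount) VALUES (?)",
                 [(i * 0.25,) for i in range(100_000)])

# Slow: ship every row back to the client and add them up there.
total = sum(row[0] for row in conn.execute("SELECT amount FROM sales"))

# Better: let the database do the summation and ship back a single number.
(total,) = conn.execute("SELECT SUM(amount) FROM sales").fetchone()
```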

How to Manage Memory

Memory is a precious resource that you can't afford to run out of. You can ignore it for a while but eventually you will have to decide how to manage memory.

Space that needs to persist beyond the scope of a single subroutine is often called heap allocated. A chunk of memory is useless, hence garbage, when nothing refers to it. Depending on the system you use, you may have to explicitly deallocate memory yourself when it is about to become garbage. More often you can use a system that provides a garbage collector, which notices garbage and frees its space without any action required by the programmer. Garbage collection is wonderful: it reduces errors and makes code shorter and clearer, at little cost. Use it when you can.

But even with garbage collection, you can fill up all memory with garbage. A classic mistake is to use a hash table as a cache and forget to remove the references in the hash table. Since the reference remains, the referent is non-collectable but useless. This is called a memory leak. You should look for and fix memory leaks early. In a long-running system, memory may never be exhausted in testing but will be exhausted by the user.
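
A sketch of that classic mistake, and one common repair, in a garbage-collected setting; the session objects are hypothetical, and weak references are only one fix among several:

```python
import weakref

class Session:
    def __init__(self, user):
        self.user = user

# Leaky: the dictionary keeps every session alive forever, even after
# the rest of the program has forgotten about it.
leaky_cache = {}

def get_session_leaky(user):
    if user not in leaky_cache:
        leaky_cache[user] = Session(user)
    return leaky_cache[user]

# One repair: hold the cached values weakly, so entries whose sessions are
# no longer referenced elsewhere can be collected.
bounded_cache = weakref.WeakValueDictionary()

def get_session(user):
    session = bounded_cache.get(user)
    if session is None:
        session = Session(user)
        bounded_cache[user] = session
    return session
```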

The creation of new objects is moderately expensive on any system. Memory allocated directly in the local variables of a subroutine, however, is usually cheap because the policy for freeing it can be very simple. You should avoid unnecessary object creation.

An important case occurs when you can define an upper bound on the number of objects you will need at one time. If these objects all take up the same amount of memory, you may be able to allocate a single block of memory, or a buffer, to hold them all. The objects you need can be allocated and released inside this buffer in a set rotation pattern, so it is sometimes called a ring buffer. This is usually faster than heap allocation.
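
A sketch of such a buffer, preallocated for a fixed upper bound of objects and reused in rotation; the capacity and the items pushed into it are made up:

```python
class RingBuffer:
    """Fixed-capacity buffer; slots are allocated once and reused in rotation."""

    def __init__(self, capacity):
        self.slots = [None] * capacity   # one up-front allocation
        self.capacity = capacity
        self.start = 0                   # index of the oldest item
        self.count = 0

    def push(self, item):
        if self.count == self.capacity:
            raise OverflowError("ring buffer full")
        self.slots[(self.start + self.count) % self.capacity] = item
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("ring buffer empty")
        item = self.slots[self.start]
        self.slots[self.start] = None    # release the slot for reuse
        self.start = (self.start + 1) % self.capacity
        self.count -= 1
        return item

buf = RingBuffer(8)
buf.push("message 1")
buf.push("message 2")
print(buf.pop())   # -> "message 1"
```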

Sometimes you have to explicitly free allocated space so that it can be reallocated, rather than relying on garbage collection. Then you must apply careful intelligence to each chunk of allocated memory and design a way for it to be deallocated at the appropriate time. The method may differ for each kind of object you create. You must make sure that every execution of a memory allocating operation is eventually matched by a memory deallocating operation. This is so difficult that programmers often simply implement a rudimentary form of garbage collection, such as reference counting, to do it for them.
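
A minimal sketch of that rudimentary approach: a handle that frees its underlying resource when the last reference is released. The resource and its cleanup are stand-ins, and many languages and libraries provide this mechanism for you:

```python
class Counted:
    """Wraps a resource and frees it when the reference count reaches zero."""

    def __init__(self, resource, release):
        self.resource = resource
        self._release = release      # the matching deallocation operation
        self._refs = 1               # the creator holds the first reference

    def acquire(self):
        self._refs += 1
        return self

    def release(self):
        self._refs -= 1
        if self._refs == 0:
            self._release(self.resource)
            self.resource = None

# Every acquire() must eventually be matched by a release().
handle = Counted(open("out.txt", "w"), release=lambda f: f.close())
other = handle.acquire()
handle.release()
other.release()      # last reference gone; the file is closed here
```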
