Testing just isn’t good enough anymore

A Contemplate technical briefing

Exploitation of multi-core hardware is essential for achieving performance. But developing software that exploits such hardware is difficult and risky. The fact that concurrent software is inherently non-deterministic means that reliance on testing for software quality assurance is inappropriate and dangerous. Use of static analysis to discover and diagnose concurrency defects is a cost-effective solution.

Key points

  • The results produced by concurrent software depend on uncontrollable aspects of thread execution
  • Lack of repeatability makes testing a fundamentally unreliable QA approach for concurrent software
  • Static analysis does not suffer from the flaws of testing since it can take all thread execution pathways and deployment scenarios into account
  • Contemplate’s ThreadSafe static analysis tool reduces risk by discovering and diagnosing concurrency defects in Java


Modern multicore processors, in which several possibly unrelated computations can be performed at the same time, provide a rich and powerful foundation for building high performance systems. The features of present and future hardware that enable highly concurrent software must be fully exploited to attain the high throughput and low latency required of high-performance software systems.

Exploiting the benefits of multicore hardware has proven to be difficult and risky. Writing software that correctly and safely makes use of concurrency requires careful thought to account for the effects of running in a concurrent environment.

The increasing use of concurrency poses a fundamentally new challenge to software quality assurance processes. The way that threads are scheduled and the order in which events in different threads occur is unpredictable and can differ each time a concurrent program runs. Software that incorrectly accounts for concurrency can therefore contain intermittent defects that elude even the most rigorous testing processes, because the outcome is non-deterministic: errors are not repeatable. Since repeatability is the corner-stone of QA approaches based on testing, a new approach is needed.

Non-deterministic results

A program is deterministic if it always produces the same output, via the same computation steps, for any given input. Most non-concurrent programs are deterministic. Programs whose output depends on the time (or the contents of a web page, entries in a database, etc.) can be regarded as deterministic, where time (or web page contents, database contents, etc.) is an additional input.

Concurrent programs are inherently non-deterministic, with the order of and interaction between events depending on — among other things — the exact scheduling of threads as well as whether and at what points thread execution is pre-empted by some other activity. This fundamental non-determinism is the main reason why concurrent programming is so hard.

A very simple example that illustrates the problem is given by a program having two threads with a shared variable x, where one thread writes 1 to x and the other thread writes 2 to x:

Image: race between two threads

Whether the final value of x is 1 or 2 depends on which write takes place last. (In fact it is more complicated than that, because changes in per-core hardware caches are not immediately propagated to main memory, so the speed of propagation is another uncontrollable factor that influences the result.)

This is an example of a race condition: there is a “race” between the two threads, with the final value of x depending on which thread “wins” the race.
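The race described above can be reproduced in a few lines of Java. This is a sketch of our own (the class name RaceDemo is ours, not from the briefing): two threads each write to a shared variable, and the final value depends on scheduling.

```java
// RaceDemo: two threads race to write a shared variable x.
// The final value depends on which write happens last.
public class RaceDemo {

    static int x = 0;

    // Runs the race once and returns the final value of x:
    // either 1 or 2, depending on how the threads are scheduled.
    public static int race() throws InterruptedException {
        Thread writer1 = new Thread(() -> x = 1);
        Thread writer2 = new Thread(() -> x = 2);
        writer1.start();
        writer2.start();
        writer1.join();  // join() ensures the write is visible afterwards
        writer2.join();
        return x;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("final value of x: " + race());
    }
}
```

Running this repeatedly may print 1 on some runs and 2 on others; nothing in the program controls which.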

It is easy to understand very simple non-deterministic programs like the one above. The difficulty comes with more complicated programs. To work out the result of running two concurrent threads, we need to consider the results of all of the possible interleavings of events from the two threads. For two threads with one event each, there are two possible interleavings; for two threads with 10 events each, there are 184756 interleavings, and each interleaving may produce a different result.

Under normal operation, there is no way to control which interleaving is taken. So even very extensive testing will at best give a false sense of security: there is no way of knowing which interleavings have been tested and so no way of being sure which untested interleavings will lead to disastrous failures.

Place your bets

As everybody knows, rolling a pair of dice produces non-deterministic results: the outcome depends on the exact way that the dice hit the table and each other before they come to rest, the force with which they were thrown, and their initial orientation. Throw the dice twice and the outcome is unlikely to be the same. The same goes for the result of spinning a roulette wheel. Gamblers would obviously prefer the outcome to be deterministic: just wait to see the outcome once, and then bet on the outcome being the same next time. Casino owners rely on non-determinism: their profit depends on the fact that outcomes are unpredictable.

Now, drawing an analogy with concurrent programs, you — the one who cares about the result of the computation — are the gambler!

Image: a pair of 6-sided dice

Imagine testing the claim that a pair of standard 6-sided dice never produces a result of 12. Of course, that’s not true! But since there is only a 1 in 36 chance of a given roll producing a 12, on average 36 trials are required before a 12 appears. One test is not enough. Ten tests are probably not enough. Even 40 tests might not be enough, if you’re unlucky.

Image: 100-sided die

Now imagine testing the claim that a pair of Zocchihedrons (100-sided dice) never produces a result of 200. Again, that’s not true. But on average, 10,000 trials are required before a 200 appears. One test is obviously not enough. 1,000 tests are not enough. Even 20,000 tests might not be enough, if you’re unlucky.

Back to our analogy with concurrent code. Under the assumption that the developers of a concurrent system are competent and careful, we might expect that most thread interleavings produce correct results. A wrong result — leading perhaps to deadlock or crash — might be produced only rarely, when some unusual thread interleaving is encountered. This is like dealing with 100-sided or 1000-sided dice, not 6-sided dice. Place your bets!

Testing isn’t good enough anymore

Making careful use of synchronisation and other concurrency mechanisms so as to produce deterministic results is vital to keeping risks under control. Unfortunately, over-use of synchronisation can have a very serious impact on performance, and in the worst case can lead to deadlock. It’s easy to get it wrong — there are many pitfalls — and you can’t test whether you’ve got it right! Getting the same result 100 times in a row, or 1000 times, might make a program look deterministic, but there’s no guarantee that the result will be the same on the next attempt.

With non-deterministic programs, correct results under test don’t guarantee correct results in deployment. Correct results during the first week or month of deployment don’t guarantee correct results in the future: the particular unlucky interleaving of threads that leads to catastrophic data corruption might be a one-in-a-million chance or even less likely. The fact that the problematic case only happens once in a blue moon means that it has probably completely escaped testing until it happens one day in production. If you’re lucky, the result will be relatively benign. But if you’re not . . .

Correct results during years of single-core deployment don’t guarantee correct results when the hardware is upgraded to multi-core. Although single-core execution of a concurrent program is in theory non-deterministic to the same extent as multi-core execution, with the same number of possible interleavings, far fewer interleavings are actually encountered in practice because the events of one thread are only interleaved with the events of a different thread when thread execution is pre-empted by the need to perform some other activity. As a consequence, upgrading to multi-core hardware significantly increases the risk that untested interleavings will be encountered.

In order to debug a faulty program, it is typically run in an instrumented environment so that intermediate results can be logged, or so that its execution can be stopped and intermediate results probed. The fact that the debugging environment is so different from the production environment makes intermittent faults even more difficult to track down. A “Heisenbug” is a tongue-in-cheek name for a software bug that disappears when one tries to examine it, by analogy with the “observer effect” in quantum mechanics, where the act of observing affects the system that is observed.

Static analysis to the rescue

A static analysis tool scans the source code (or bytecode) of a codebase, performing a kind of symbolic execution to work out the potential consequences of each line of code. Inconsistencies and defects can be brought to the developers’ attention before the code has even been completely written.

Static analysis is cost-effective for the development of non-concurrent programs because it detects bugs early, when they are much less expensive to fix than bugs caught during testing or deployment. For certain kinds of errors, static analysis can be thought of as testing with all possible data values.

Static analysis is essential for concurrent programs because testing, even with all data values, isn’t good enough. Static analysis amounts to exhaustive testing for all data values and for all interleavings, for all deployment scenarios.

Static analysis readily accommodates the possibility of non-deterministic outcomes. Going back to our example of rolling dice, imagine a version of static analysis that simply keeps track of bounds on values:

6-sided dice:
  One 6-sided die: result is [1..6]
  Sum of two 6-sided dice: result is [1..6] + [1..6] = [2..12]
  So 12 is possible.

100-sided dice:
  One 100-sided die: result is [1..100]
  Sum of two 100-sided dice: result is [1..100] + [1..100] = [2..200]
  So 200 is possible.

The same essential idea can be adapted to static analysis of non-deterministic concurrent programs, by taking account of all possible outcomes. This can be done without the huge computational overhead of checking all of the possible interleavings individually.
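The bounds-tracking idea sketched above can be captured in a few lines. This is a toy interval domain of our own, not ThreadSafe's implementation:

```java
// A toy "interval domain": each value is tracked only as [lo, hi] bounds,
// rather than enumerating every concrete outcome individually.
public final class Interval {

    final int lo, hi;

    Interval(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    // Abstract addition: the sum of [a, b] and [c, d] lies in [a+c, b+d].
    Interval add(Interval other) {
        return new Interval(lo + other.lo, hi + other.hi);
    }

    boolean contains(int value) {
        return lo <= value && value <= hi;
    }

    public static void main(String[] args) {
        Interval d6 = new Interval(1, 6);   // one 6-sided die
        Interval twoDice = d6.add(d6);      // [2..12]
        System.out.println("[" + twoDice.lo + ".." + twoDice.hi + "]");
        System.out.println("12 possible? " + twoDice.contains(12));  // true
    }
}
```

One abstract addition covers all 36 concrete rolls at once, which is exactly how static analysis avoids enumerating interleavings individually.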

An example

The following simple example in Java demonstrates how easy it is to introduce a concurrency defect when extending a correct system to add new functionality.

Stage One: A concurrency-correct class is written

Our example starts with a Java class BankAccount for representing bank accounts that may be accessed concurrently.

The original developer has designed this class with two Application Programming Interfaces (APIs). The first is a public API through which the current balance can either be read or be altered by credits and debits. The public API is intended to be safely used concurrently.

The second API is an internal API intended for use by subclasses that add additional features to the basic BankAccount class. This API offers direct access to read and modify the current balance. In order to allow for complex sequences of operations on the balance to happen atomically, the internal API does not perform any synchronization itself. Instead, a lock object is provided to subclasses via a protected field lock. The concurrent design of this internal API can be summed up in the following rule:

All clients must synchronize on the object in the field lock while accessing the internal API.

The actual implementation of the BankAccount class goes as follows.

package com.contemplateltd.example;

public class BankAccount {

    protected final Object lock = new Object();

    private int balance;

    public BankAccount(int initialBalance) {
        if (initialBalance < 0)
            throw new IllegalArgumentException("initial balance must be >= 0");

        balance = initialBalance;
    }

This code declares a private field balance that contains the internal state of a BankAccount object, and a constructor to initialise that field. The protected final field lock declared at the start of the class contains the object to be used for synchronizing access to the internal API. Note that the constructor does not need to synchronize its write to the balance field.

    protected int readBalance() {
        return balance;
    }

    protected void adjustBalance(int adjustment) {
        balance = balance + adjustment;
    }

This pair of method declarations implements the internal API. Since the field balance has been declared private, we use two protected methods to allow subclasses of BankAccount to manipulate it.

    public void credit(int amount) {
        if (amount < 0)
            throw new IllegalArgumentException("credit amount must be >= 0");

        synchronized (lock) {
            adjustBalance(amount);
        }
    }

    public void debit(int amount) throws Exception {
        if (amount < 0)
            throw new IllegalArgumentException("debit amount must be >= 0");

        synchronized (lock) {
            if (readBalance() - amount < 0)
                throw new Exception("insufficient funds");

            adjustBalance(-amount);
        }
    }

    public void tax(double taxRate) throws Exception {
        if (taxRate < 0.0 || taxRate > 1.0)
            throw new IllegalArgumentException("tax amount must be >= 0.0 and <= 1.0");

        synchronized (lock) {
            int tax = (int) (taxRate * (double) readBalance());

            adjustBalance(-tax);
        }
    }

    public int getCurrentBalance() {
        synchronized (lock) {
            return readBalance();
        }
    }
}

Finally, we have declarations of the four methods of the public API. These methods adhere to the concurrent design that we stated above: a lock on the object in the field lock is acquired before calling the internal API methods. The implementations of the debit() and tax() methods demonstrate why the internal API cannot handle synchronization by itself: the balance check and the decrement must happen atomically to avoid a race condition. Atomicity is achieved by wrapping the sequence of operations in a single synchronized block.
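To see the public API's thread-safety in action, here is a small self-contained harness of our own, with a stripped-down copy of the class so the sketch compiles on its own: four threads each perform 1,000 credits of 1, and because credit() synchronizes on lock, no updates are lost.

```java
// Stripped-down copy of the BankAccount pattern, sufficient for the demo.
class Account {
    private final Object lock = new Object();
    private int balance;

    Account(int initialBalance) { balance = initialBalance; }

    void credit(int amount) {
        synchronized (lock) {       // every update holds the same lock
            balance = balance + amount;
        }
    }

    int getCurrentBalance() {
        synchronized (lock) {
            return balance;
        }
    }
}

public class CreditHarness {

    // Runs `threads` threads, each crediting 1 `perThread` times.
    // With correct synchronization the final balance is exactly
    // threads * perThread, on every run.
    public static int run(int threads, int perThread) throws InterruptedException {
        Account account = new Account(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++)
                    account.credit(1);
            });
            workers[i].start();
        }
        for (Thread worker : workers)
            worker.join();
        return account.getCurrentBalance();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 1000));  // 4000
    }
}
```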

Stage two: Software evolves, and a concurrency defect is introduced

Some time after the BankAccount class has been designed and implemented, another developer is given the task of implementing a class for “bonus” bank accounts. The class BonusBankAccount includes an additional method that applies a fixed bonus amount to the current balance:

package com.contemplateltd.example;

public class BonusBankAccount extends BankAccount {

    private final int bonus;

    public BonusBankAccount(int initialBalance, int bonus) {
        super(initialBalance);

        if (bonus < 0)
            throw new IllegalArgumentException("bonus must be >= 0");

        this.bonus = bonus;
    }

    public void applyBonus() {
        adjustBalance(bonus);
    }
}

The developer of BonusBankAccount has introduced a serious concurrency defect.

A consumer of instances of BonusBankAccount may reasonably assume that, as for instances of its superclass, it is safe to call public API methods from multiple threads concurrently. However, the applyBonus() method in BonusBankAccount does not adhere to the concurrency policy intended by the developer of BankAccount: it calls the internal API method adjustBalance() without synchronizing on the object in the field lock.

When executed in a concurrent environment, the effects of the lack of synchronization include, but are not limited to, the following:

  • Updates even from the correctly synchronized credit() and debit() methods in BankAccount may be lost if bonuses are applied concurrently. Updating the balance field in the adjustBalance() method involves a read of the balance and then a write. If the balance field is updated between the read and the write, then the intervening update will be lost. Without correct synchronization, atomicity of updates to the balance field cannot be guaranteed.
  • Lack of synchronization can cause updates to the shared balance field to be delayed for an arbitrarily long time. Therefore, multiple cores invoking applyBonus() may see starkly different views of the balance field. This problem is exacerbated on machines with many cores distributed over multiple physical chips, as is common in server-class hardware.

The effects of this mistake can be observed by setting up several threads to concurrently apply thousands of bonuses and credits to the same account over a short period of time. On a single-core processor, no errors were observed in 160,000 transactions. On dual-core hardware, 30 transactions were lost. On quad-core hardware, 44 transactions were lost.
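The lost-update mechanism behind these figures can be replayed deterministically by hand-scheduling one bad interleaving. This is a simulation of ours, not the actual multi-threaded run: both "threads" read the balance before either writes back, so one update vanishes.

```java
// Replays one bad interleaving of an unsynchronized read-modify-write.
// "Thread A" credits 10 and "thread B" applies a bonus of 5, but both
// read the balance before either writes it back, so A's credit is lost.
public class LostUpdateDemo {

    public static int simulate() {
        int balance = 100;

        int readByA = balance;   // A: reads 100
        int readByB = balance;   // B: reads 100, before A has written

        balance = readByA + 10;  // A: writes 110
        balance = readByB + 5;   // B: writes 105, overwriting A's credit

        return balance;          // 105, not the correct 115
    }

    public static void main(String[] args) {
        System.out.println(simulate());  // 105
    }
}
```

Under real concurrent execution the same interleaving occurs only occasionally, which is precisely why the defect survives testing.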

Discovering and diagnosing the defect with ThreadSafe

With BonusBankAccount as it is, the concurrency defect in the applyBonus() method could go undetected for many months, running in production, silently corrupting the value of the balance field.

Static analysis can help to discover and diagnose this kind of concurrency defect long before software enters production. Contemplate’s ThreadSafe is a static analysis tool for detecting and diagnosing concurrency defects in Java code. ThreadSafe’s results can be presented in the Eclipse IDE’s Java perspective during development, as illustrated here, or via the Sonar software quality platform for a higher-level view.

Applying ThreadSafe’s static analysis to this example produces the following report in Eclipse, directing our attention to the balance field:

Image: A screenshot of the ThreadSafe view in the Eclipse IDE

To investigate further, we right-click on the finding to get ThreadSafe to show us where the accesses to the balance field occur, and the locks that are being used to guard each access.

Image: A screenshot of ThreadSafe's Guards view in the Eclipse IDE

This view lists the locations in the source code where the balance field is accessed. Accesses that are likely to be problematic are marked with a warning symbol. Across the top of the view are descriptions of the objects that have been synchronized on for these accesses. In this relatively simple example, only the object referred to by the lock field is relevant, so it is the only object displayed by ThreadSafe.

The first row refers to the access to balance in the readBalance() method. Since this is only ever called from the correctly synchronized credit(), debit() and tax() methods, ThreadSafe has shown that the lock BankAccount.this.lock is Always Held. The second and third rows refer to the read and write accesses in the adjustBalance() method. ThreadSafe has indicated that the lock is only Maybe Held for these accesses, indicating that there are contexts in which these accesses may be unsynchronized.

By double clicking on each of the access locations listed in this view, we are taken to the lines in the implementation of BankAccount where the accesses to balance occur. From this information, we learn two useful facts.

  1. Since all of the accesses appear within the implementation of the internal API, we can deduce that the original developer of the BankAccount class probably intended that all uses of the internal API should synchronize on the object in the BankAccount.lock field.
  2. ThreadSafe has highlighted that some accesses do not synchronize on the object in the BankAccount.lock field: this is indicated by the Maybe Held label. By using Eclipse’s Call Hierarchy feature to discover calls to the internal API methods, the failure to synchronize can be traced back to the applyBonus() method in the BonusBankAccount class.

The developer of BonusBankAccount can now amend their code to take into account the concurrency policy of the BankAccount class. The new implementation of the applyBonus() method is:

    public void applyBonus() {
        synchronized (lock) {
            adjustBalance(bonus);
        }
    }

Re-running ThreadSafe on the amended code produces no reports. The concurrency defect introduced by the developer of BonusBankAccount has been remediated.

Related reading

Contemplate Ltd. Maintaining safe concurrent code with ThreadSafe. April 2013.

Victor Grazi. Exterminating Heisenbugs. InfoQ 2012.

Edward A. Lee. The problem with threads. IEEE Computer 39(5):33-42, 2006. [UC Berkeley technical report version]

Ben Ylvisaker. Multi-core processors are a headache for multithreaded code. GrammaTech 2013.