Adam Donlin. SE3H
The best way to characterise the problem is by example. Imagine a scenario in which there exists two Destinct processes both operating on a single shared data area. One process, the Producer inserts information into the data area whilst the other process, the Consumer, removes information from that same area. In order for the Producer to insert information into the data area, there must be enough space. The Producer's sole function is to insert data into the data-area, it is not allowed to remove any data from the area. Similarly, for the Consumer to be able to remove information from the data area, there must be information there in the first place. Once again, the sole function of the Consumer is to remove data from the data area. If no data is present then the Consumer is not allowed to insert some data of it's own to later be removed by itself.
In short, the Producer relies on the Consumer to make space in the data-area so that it may insert more information whilst at the same time, the Consumer relies on the Producer to insert information into the data area so that it may remove that information. It therefore follows that a mechanism is required to allow the Producer and Consumer to communicate so that they know when it is safe to attempt to write or read information from the data-area.
Therefore, the solution of the Producer-Consumer problem lies with devising a suitable communication protocol through which the two processes may exchange information.
The definition of such a protocol is the main factor that makes the Producer-Consumer problem interesting in terms of concurrent systems. Not all the processes in a concurrent system operate alone, co-operating processes need a way to communicate. For example, resource management is a fundamental concern in any operating system and can be facilitated in this instance by a suitable Producer/Consumer protocol.
Without such a protocol, processes would be stand alone and a lot of the benefits of abstraction and component virtualisation would be lost.
Assume a scenario with identical characteristics as those which form the basic Producer-Consumer problem, i.e. two processes, one Producer, one Consumer and a shared data area. Furthermore, assume that the code executed by the Producer and the Consumer follows the same pattern as outlined below.
#define MAX SIZE 100
To begin with, we'll assume that the above code has been running for a short period of time and that the data-area (a buffer-array in this case) is now full. The Producer can no longer put any items in the buffer as there is no more room. The producer's reaction to this buffer state is to wait until the Consumer clears some space. When the scheduler next runs the Consumer, it will remove an element from the buffer and then a special if statement will cause the Consumer to send some form of signal to the Producer. The Producer, when next run will receive this message and begin processing again, the Producer will assume that since it has been signalled (or woken up ) there is now space available in the buffer. The integrity of this assumption can be logically deduced because the producer's original reason for halting (or going to sleep ) was that the buffer was full and the only way the Producer would have been awoken would have been by a signal after the Consumer had removed an item from the buffer.
As indicated in the code, this approach requires a counter variable whose contents change to reflect the amount of free buffer slots. The two processes decide when to sleep and when to wakeup each other by referencing and changing the contents of this variable. For example, when the counter variable reaches zero, the Consumer will sleep as the buffer is empty, and vice versa for the Producers sleep condition. Wakeup conditions outlined previously are also implemented by reference to the counter in a similar way.
A trajectory diagram showing a correct execution path for the problem is shown in figure 1. Examining the possible trajectories of this diagram can help give an insight into the mechanics of this solution.
To begin with, we will examine a route of normal execution and assume two facts. Firstly, we assume that the buffer is non-empty (and one slot is available) at point 0, therefore both processes can run. Secondly, assume that the Producer is scheduled to run first and that it will run until it blocks (goes to sleep).
When the Producer begins its execution, it will produce and insert it's first item. To do this, it will test the count variable and wait until the count is less than the maximum number of elements in the buffer. In this instance, count is less than the maximum number of elements by our pre-conditional assumption. Given this, the Producer carries on past point A1 and inserts its item into the buffer. Now, had the buffer been empty, the process would wakeup the Consumer at point B1, but in this instance the buffer was non-empty to begin with and we conclude that the Consumer is already awake. The Producer continues its execution and attempts to insert another item into the buffer, but now the buffer is full, so the Producer has reached point C1 in it's execution and blocks (goes to sleep).
The Consumer would now be scheduled to run and begins its execution. As the Consumer progresses, it will remove an item from the buffer and, after successfully removing an item, if the buffer was previously full, the Consumer will wake up the Producer. The Producer will then reason (once again) that there are empty slots in the buffer and begin inserting data again. If one process blocks, the next action of the remaining process will unblock the other ( or will it? )
The flaw in this solution lies with a series of chance events happening in order. The events which cause the system to fail are visualised in figure 2 and outlined as follows. To begin with, imagine that the buffer is empty and the Consumer is executing. The Consumer will examine the count variable and find it to be zero. If the scheduler steps in at this point (X1), preventing the Consumer from executing it's sleep() call and begins to run the Producer, then we will have the beginnings of the error situation. Imagine now that the Producer inserts an item into the buffer and, since the buffer was empty previously, tries to wake up the Consumer. However, the Consumer was not yet sleeping so it will fail to catch this signal. If the Consumer now gets re-scheduled at point (X2) it will continue execution from the point it left off and go to sleep (Z1). The Producer will now continue it's execution until it fills up the buffer and goes to sleep (Z2). Neither process will ever wake up...
The key factor in the problem is the fact that a process cannot catch a wakeup call if it is already awake. If a solution were to be devised where a process can store it's unused wakeup calls, then the system will work.
One such solution is given as follows. There exists a special variable type called semaphore which has associated with it two function calls, a semaphore can be signal() ed and in so doing, the integer value of that semaphore is increased by one and is used to mimic the wakeup() call. A semaphore can also be wait() 'ed on. Waiting on a semaphore involves decrementing the semaphore each time wait() is called and then progressing past the wait. If the semaphore's value is zero when the wait call is made then the caller will block until another process signals the semaphore, increasing the semaphore's value to something greater than zero.
Both the signal() and wait() calls implemented on semaphores are indivisible atomic actions, possibly defined at an operating system level.
If we now use two semaphores to represent the number of unused (available) buffer slots and the number of occupied buffer slots, call them available and occupied . The introduction of these two semaphores will replace the count variable used in the flawed solution. We must now assume that the Producer and Consumer are executing the following code respectively...
#define MAX SLOTS 100
Either of these signals may cause the process other than the signaller to unblock, if the signalled semaphore was previously zero. Otherwise, the signal will be caught by the semaphore, regardless of the awake/asleep state of the target process and will he held until the target process next calls wait() at which point the semaphore will use up one of it's saved signals.
By this method, the flaw scenario outlined previously can no longer occur. A trajectory diagram of this correct method is given in figure 3.
The execution path through the diagram occurs in the same way as outline for the flawed solution when the processes don't enter the error state. One other difference is that wherever the flawed code/diagram made reference to the count variable, the improved code/diagram makes reference to signalling and waiting on the two semaphores. The error condition shown in figure 1 cannot occur in this trajectory diagram.
To begin with, assume that there are exists a bank of Consumer processes and a bank of Producer processes, all using semaphore based code as outlined previously. Assume further that we have two indicator variables that show the next available slot and the next occupied slot into which processes may read and write. For ease of reference, assume that the two indicators are named WP and RP respectively.
When a process wants to read from the shared buffer, it first waits on the appropriate semaphore to ensure a suitable buffer slot exists and then the process uses the RP indicator to get a handle onto the buffer slot it is interested in. Once this handle has been obtained, the process uses it to read from the buffer slot and then, when finished reading, updates the indicator and signals the the appropriate semaphore to inform the other process bank of the changes in the buffer structure. A similar scenario exists for writing into the buffer slot using the WP indicator.
On appearance, this may seem an adequate system, however the following fundamental error condition exists.
Imagine there are two processes in the same bank that want to modify the buffer and assume further that there are appropriate buffer slots available for each process to perform it's proposed modification.
The scheduler may let the first process P1 perform its task, say write into the buffer slot indicated by WP , however before the process gets a chance to update the indicator so that it points to the next available slot, the scheduler halts it and lets the other process run. The other process, called P2, will now consult the WP indicator variable and begin writing into the same slot that P1 had written it's data into, update the pointer and finish. Overwriting the P1's data is a serious flaw and is further compounded by the fact that, when P1 resumes execution, it will also update the indicator and subsequently introduce an empty buffer slot as full.
The underlying reason for the error scenario surrounds the fact that neither of the two indicators (which are both shared variables between processes in the same bank, respectively) are protected from two tasks in the same bank using one of them simultaneously.
Obviously, the solution would comprise of some method of restricting concurrent access of the processes in the same bank to that bank's shared indicator variable. Such a restriction can be achieved by introducing a third semaphore that will mutually exclude any process in the same bank from modifying the indicator variable associated with that bank when another process is already using it. The modified code for the Producers and Consumers is now as follows...
#define MAX SLOTS 100