Next: 2.3. Occam

Back: 2.1. C++ with Encore Parallel Threads

Up: 2. The Programming Models

2.2. Concurrent Aggregates

Improvements to Kiwi would center on eliminating the sample summation bottleneck and exploiting more unit generator concurrency while maintaining the encapsulation of data and functions. When a compound object is built out of atomic objects, one particular object receives all the external messages (i. e., the root of a tree of objects). This object is a bottleneck, and divides the available parallelism among its children. Concurrent Aggregates [Chien & Dally] introduces a new abstraction, the aggregate, to solve this problem. An aggregate, or multi-access object, routes external messages to randomly selected representatives, el iminating the serialization bottleneck.

In Kiwi, the Collector was a serial bottleneck. The problem was alleviated by reducing locking granularity on the sample buffer, so that several threads could write to it at once. Unfortunately, C++ provides no mechanism for distributed objects; placing the entire Collector on a single processor in a a message-passing computer would represent a tremendous communication bottleneck.

(aggregate summingArray values[valuesPerSite]
   (parameters numberOfRepresentatives)
   (initial numberOfRepresentatives
      (forall numberOfElements from 1 below groupSize
 (forall temporary from 1 below valuesPerSite
    (setValue (sibling group index) 0)))))

(handler summingArray add (index value)
   (forward
      (internalAdd
         (sibling group
                  (/ index valuesPerSite))
         index value)))

(handler summingArray internalAdd (index value2)
   (setValue self
             (+ value[index mod valuesPerSite] value2)))
Figure 2 - CA pseudo-code for a distributed Collector

With CA, the Collector becomes a multi-access parallel array (see Figure 2). Each processor in the aggregate holds several array values. CA routes incoming messages to random members of the aggregate. When a aggregate representative r eceives receives a sample, it forwards the sample to the appropriate sibling, where the actual summation takes place. Since the array is distributed over many processors, no communication hot spots will develop. Communication costs in a message-passing mul ticomputer could be lowered further using a placement policy based on information from the score. A unit generator would be placed near the portion of the distributed Collector where its samples would be stored, and would send samples directly to the appro priate Collector representative, bypassing the random routing scheme.

CA could also achieve parallelism by implementing some unit generators as multi-access aggregates. For example, frequency modulators and lookup table oscillators provide samples in arbitrary order; they could be implemented by aggregates with as many membe rs as the hardware can efficiently support. Thus, many samples for a given FM sound can be computed simultaneously. Serial and parallel unit generators can comfortably coexist in the sa me computation. For pieces orchestrated largely with FM, multi-access objects will yield considerably more parallelism than simple note-level parallelism.

These ideas have not been implemented in CA. Since, however, they are straightforward applications of CA features, it is reasonable to assume that an actual implementation would have the predicted parallelism.


[Bill's Home Page] Comments to walker@shout.net