2.4. PARIS

Of all the programming models presented here, the Connection Machine assembly language PARIS is the least obvious choice for implementing sample generation. Most composers employ a variety of synthesis techniques in each piece, and this seems to require multiple control threads. The Connection Machine belongs to the class of SIMD (Single Instruction stream, Multiple Data stream) machines. In other words, it performs a single stream of operations on many sets of data simultaneously. The only way to simulate multiple control threads is by multiplexing them. During such a simulation, processors representing certain notes or unit generators would execute a part of their algorithm while the other processors would sit idle. As the number of algorithms to be multiplexed increases, processor utilization plummets.
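The utilization argument can be made concrete with a small host-side calculation in plain C. This is only a sketch under a simplifying assumption that is not stated in the text: every processor is dedicated to exactly one algorithm, and the algorithms' code segments run one after another, so a processor is busy only during its own algorithm's segment. The function name simd_utilization and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of processor-steps that do useful work when k algorithms are
   time-multiplexed on a SIMD machine.
   procs[i] = number of processors assigned to algorithm i (assumed model)
   steps[i] = number of instruction steps in algorithm i's segment
   While segment i runs, only procs[i] processors are active; the rest idle. */
static double simd_utilization(const long *procs, const long *steps, size_t k)
{
    long total_procs = 0, total_steps = 0;
    double useful = 0.0;
    size_t i;

    assert(k > 0);
    for (i = 0; i < k; i++) {
        total_procs += procs[i];
        total_steps += steps[i];
        useful += (double)procs[i] * (double)steps[i];
    }
    assert(total_procs > 0 && total_steps > 0);

    /* Every processor exists for the whole multiplexed schedule. */
    return useful / ((double)total_procs * (double)total_steps);
}
```

Under this model, four algorithms that each own a quarter of the processors and run equally long segments give a utilization of 0.25, and in general k equal algorithms give 1/k.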

1)    A  B  *  D  E  *
         |     |
2)    *  B  C  D  *  F
Figure 4 - Shared execution example

Either an intelligent compiler or a run-time system could improve processor utilization by looking inside the control threads and finding shared instruction sequences. These sequences could be executed for both control threads simultaneously. For example, if one thread consisted of four subroutines ABDE and a second thread consisted of BCDF, one could execute ABCDEF, which would be significantly shorter than the naive ABDEBCDF schedule (see Figure 4). Only the unique portions of control threads should be multiplexed; shared portions should share execution. Shared instruction sequences might be as large as unit generators or as small as individual instructions. Run-time overhead must be considered; when a shared sequence begins or ends, each processor checks to see whether it should execute the next segment. Each processor stores this information in a context bit. The overhead of setting context bits may outweigh the savings from sharing execution of small instruction sequences.
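One way such a merged schedule could be found for two threads is to compute a shortest common supersequence of their subroutine lists. The sketch below, in plain C, is an illustration rather than the method the text proposes: it assumes each subroutine is named by a single character, that equal names mean identical code, and it handles only two threads. The names merge_threads, MAX_LEN, and the mask encoding are hypothetical. The mask plays the role of the context bits: it records which thread's processors should be active at each step.

```c
#include <assert.h>
#include <string.h>

#define MAX_LEN 64  /* longest thread accepted by this sketch */

/* Merge threads a and b into a shortest common supersequence.
   out  must hold strlen(a) + strlen(b) + 1 characters.
   mask must hold strlen(a) + strlen(b) entries; for step k,
        bit 0 is set if thread a executes it, bit 1 if thread b does.
   Returns the number of steps in the merged schedule. */
static int merge_threads(const char *a, const char *b,
                         char *out, unsigned char *mask)
{
    static int lcs[MAX_LEN + 1][MAX_LEN + 1];
    int n = (int)strlen(a), m = (int)strlen(b);
    int i, j, k = 0;

    assert(n <= MAX_LEN && m <= MAX_LEN);

    /* lcs[i][j] = length of the longest common subsequence
       of the suffixes a[i..] and b[j..]. */
    for (i = n; i >= 0; i--) {
        for (j = m; j >= 0; j--) {
            if (i == n || j == m)
                lcs[i][j] = 0;
            else if (a[i] == b[j])
                lcs[i][j] = 1 + lcs[i + 1][j + 1];
            else
                lcs[i][j] = lcs[i + 1][j] > lcs[i][j + 1]
                          ? lcs[i + 1][j] : lcs[i][j + 1];
        }
    }

    /* Walk both threads, sharing a step whenever the next
       subroutines match and otherwise multiplexing. */
    i = j = 0;
    while (i < n || j < m) {
        if (i < n && j < m && a[i] == b[j]) {
            out[k] = a[i]; mask[k] = 3; i++; j++;   /* shared step */
        } else if (j == m || (i < n && lcs[i + 1][j] >= lcs[i][j + 1])) {
            out[k] = a[i]; mask[k] = 1; i++;        /* only thread a */
        } else {
            out[k] = b[j]; mask[k] = 2; j++;        /* only thread b */
        }
        k++;
    }
    out[k] = '\0';
    return k;
}
```

Calling merge_threads("ABDE", "BCDF", out, mask) produces the six-step schedule ABCDEF from Figure 4, with both mask bits set for B and D. Each change in the mask between consecutive steps corresponds to a point where context bits would have to be reset, which is the overhead the paragraph above warns about.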

for each instrument i in the orchestra
  initialize an array Ai[0..pieceLength] to zeroes
  find the set N of notes in the score played by i
  place the state information for N on the CM-2
  compute samples for N using i and store results in Ai
  write Ai to the DataVault

initialize an array B[0..pieceLength] to zeroes

for each instrument i in the orchestra
  read Ai from the DataVault into the CM-2
  add Ai to B

write B to the DataVault
Figure 5 - Connection Machine synthesis framework

The assertion that processor utilization plummets as the number of algorithms multiplexed increases presumes that all the instances of all the control threads will fit on the machine at once. Suppose instead that each control thread requires the entire machine; the machine would be fully utilized. Figure 5 shows pseudo-code for this idea. An orchestra with n different timbres will require n passes over the score. During the ith pass, the system computes all the samples from notes played by the ith timbre and writes them to a large, high-speed parallel disk system, the DataVault. After the n passes are complete, the system sums all n disk files to produce the full synthesis.
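The control flow of Figure 5 can be mimicked serially in plain C. The following is a rough sketch, not CM-2 code: in-memory arrays stand in for the DataVault files, the Note structure, the two toy instruments, and the tiny constants are all hypothetical, and nothing here runs in parallel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PIECE_LENGTH 8  /* samples in the piece (tiny, for illustration) */
#define NUM_TIMBRES  2  /* n, the number of instruments in the orchestra */

typedef struct { int timbre; int start; int length; float amp; } Note;
typedef float (*Instrument)(int t);  /* sample t of a note at unit amplitude */

/* Two toy instruments, assumed only for this sketch. */
static float inst_constant(int t) { (void)t; return 1.0f; }
static float inst_ramp(int t)     { return (float)t; }

/* Pass i: render only the notes played by one timbre into pass_buf (Ai). */
static void render_pass(int timbre, Instrument ins,
                        const Note *score, size_t n_notes, float *pass_buf)
{
    size_t k;
    int t;

    memset(pass_buf, 0, PIECE_LENGTH * sizeof(float));
    for (k = 0; k < n_notes; k++) {
        if (score[k].timbre != timbre)
            continue;
        assert(score[k].start >= 0 && score[k].length >= 0);
        for (t = 0; t < score[k].length; t++) {
            int pos = score[k].start + t;
            if (pos < PIECE_LENGTH)
                pass_buf[pos] += score[k].amp * ins(t);
        }
    }
}

/* n passes over the score, then sum the per-timbre results into mix (B). */
static void synthesize_piece(const Instrument *orchestra,
                             const Note *score, size_t n_notes, float *mix)
{
    static float vault[NUM_TIMBRES][PIECE_LENGTH];  /* stand-in for DataVault */
    int i, t;

    for (i = 0; i < NUM_TIMBRES; i++)
        render_pass(i, orchestra[i], score, n_notes, vault[i]);

    memset(mix, 0, PIECE_LENGTH * sizeof(float));
    for (i = 0; i < NUM_TIMBRES; i++)
        for (t = 0; t < PIECE_LENGTH; t++)
            mix[t] += vault[i][t];
}
```

Because each pass touches only one timbre, every pass runs a single algorithm, which is what lets the real machine stay fully utilized; the price is the extra disk traffic for writing and re-reading the n intermediate files.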

synthesize()
{
  short i, j;

  for(i = 0; i < maxTableEntries; i++) {
    CM_set_context();
    CM_u_move_constant_1L(arrayTemp, i, 16);
    CM_u_le_1L(arrayTemp, numTableEntries, 16);
    CM_logand_context_with_test();

    CM_aref_2L(tempA, phases, arrayTemp,
               32, 16, maxTableEntries, 32);
    CM_aref_2L(tempB, phaseIncrs, arrayTemp,
               32, 16, maxTableEntries, 32);
    CM_u_move_constant_1L(arrayTemp, 0, 16);

    for(j = 0; j < sampleTableSize; j++) {
      CM_aref_2L(tempD, sampleTable, arrayTemp, 16, 16,
                 sampleTableSize, 16);
      CM_f_sin_2_1L(tempC, tempA, 23, 8);
      CM_f_add_2_1L(tempA, tempB, 23, 8);
      CM_f_multiply_constant_2_1L(tempC,
                 (16383.0 / maxSimul), 23, 8);
      CM_s_f_round_2_2L(tempE, tempC, 16, 23, 8);
      CM_s_add_2_1L(tempD, tempE, 16);
      CM_aset_2L(tempD, sampleTable, arrayTemp, 16, 16,
                 sampleTableSize, 16);
      CM_u_add_constant_2_1L(arrayTemp, 1, 16);
    }
  }
}
Figure 6 - C/PARIS code for a sample-wise sine wave generator

Once the sample generation problem is divided into passes, the passes can be individually optimized to suit the synthesis algorithm. If a synthesis algorithm has a closed form, each CM-2 processor will represent a sample (or a group of samples). If a synthesis algorithm has an open form, the processors will represent notes played by that algorithm. Figure 6 shows C/PARIS code for a sine wave oscillator, a simple closed form unit generator. Each processor holds a table specifying which notes sound during its samples.

The CM-2 is controlled by a host computer, usually a Sun or VAX running UNIX. To permit multiple simultaneous users on the CM-2, the host computer interface contains several sequencers, each driving eight thousand processors. By controlling several sequencers from several UNIX processes, one could bring several instruction streams to bear on the same problem. The Connection Machine's high-speed disk system (the DataVault) could be used to communicate between sequencers.


[Bill's Home Page] Comments to walker@shout.net