1)  A  B  *  D  E  *
       |     |
2)  *  B  C  D  *  F
Figure 4 - Shared execution example

Either an intelligent compiler or a run-time system could improve processor utilization by looking inside the control threads and finding shared instruction sequences. These sequences could be executed for both control threads simultaneously. For example, if one thread consisted of four subroutines ABDE and a second thread consisted of BCDF, one could execute ABCDEF, which takes six steps instead of the eight required by the naive ABDEBCDF schedule (see Figure 4). Only the unique portions of control threads should be multiplexed; shared portions should share execution. Shared instruction sequences might be as large as unit generators or as small as individual instructions. Run-time overhead must be considered: when a shared sequence begins or ends, each processor checks whether it should execute the next segment, and stores the answer in a context bit. For small instruction sequences, the overhead of setting context bits may outweigh the savings from sharing execution.
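The merge in Figure 4 can be sketched in plain C. This is only an illustration, not the method the text proposes: it assumes each thread is a string of one-letter subroutine names and aligns the two threads with a longest-common-subsequence table, which is one possible way to find shared steps. The names merge_threads, MAX_LEN, THREAD_1 and THREAD_2 are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LEN  32      /* illustrative limit on subroutines per thread */
#define THREAD_1 0x1u    /* context bit: thread 1 executes this step */
#define THREAD_2 0x2u    /* context bit: thread 2 executes this step */

/* Merge two threads, each written as a string of one-letter subroutine
 * names. schedule[k] receives the subroutine run at step k and context[k]
 * the bitmask of threads that execute it. schedule needs room for
 * strlen(a) + strlen(b) + 1 chars, context for strlen(a) + strlen(b).
 * Returns the number of steps; returns 0 without writing anything if an
 * input is longer than MAX_LEN. */
size_t merge_threads(const char *a, const char *b,
                     char *schedule, unsigned char *context)
{
    static int lcs[MAX_LEN + 1][MAX_LEN + 1];
    size_t n = strlen(a), m = strlen(b), i, j, k = 0;

    if (n > MAX_LEN || m > MAX_LEN)
        return 0;

    /* lcs[i][j] = length of the longest common subsequence of the
     * suffixes a[i..] and b[j..], filled from the ends backwards. */
    for (i = 0; i <= n; i++) lcs[i][m] = 0;
    for (j = 0; j <= m; j++) lcs[n][j] = 0;
    for (i = n; i-- > 0; ) {
        for (j = m; j-- > 0; ) {
            if (a[i] == b[j])
                lcs[i][j] = 1 + lcs[i + 1][j + 1];
            else if (lcs[i + 1][j] >= lcs[i][j + 1])
                lcs[i][j] = lcs[i + 1][j];
            else
                lcs[i][j] = lcs[i][j + 1];
        }
    }

    /* Walk both threads forward. A matching step is executed once with
     * both context bits set; a unique step sets only its own bit. */
    i = 0;
    j = 0;
    while (i < n && j < m) {
        if (a[i] == b[j]) {
            schedule[k] = a[i];
            context[k++] = THREAD_1 | THREAD_2;
            i++;
            j++;
        } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
            schedule[k] = a[i++];
            context[k++] = THREAD_1;
        } else {
            schedule[k] = b[j++];
            context[k++] = THREAD_2;
        }
    }
    while (i < n) { schedule[k] = a[i++]; context[k++] = THREAD_1; }
    while (j < m) { schedule[k] = b[j++]; context[k++] = THREAD_2; }
    schedule[k] = '\0';
    return k;
}
```

Calling merge_threads("ABDE", "BCDF", schedule, context) fills schedule with ABCDEF and sets both context bits for B and D. The sketch charges nothing for changing the context bits, so it ignores the overhead that makes very short shared sequences unattractive.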
for each instrument i in the orchestra
    initialize an array Ai[0..pieceLength] to zeroes
    find the set N of notes in the score played by i
    place the state information for N on the CM-2
    compute samples for N using i and store results in Ai
    write Ai to the DataVault
initialize an array B[0..pieceLength] to zeroes
for each instrument i in the orchestra
    read Ai from the DataVault into the CM-2
    add Ai to B
write B to the DataVault

Figure 5 - Connection Machine synthesis framework
The assertion that processor utilization plummets as the number of algorithms multiplexed increases presumes that all the instances of all the control threads will fit on the machine at once. Suppose instead that each control thread requires the entire machine; the machine would then be fully utilized. Figure 5 shows pseudo-code for this idea. An orchestra with n different timbres will require n passes over the score. During the ith pass, the system computes all the samples from notes played by the ith timbre and writes them to a large, high-speed parallel disk system, the DataVault. After the n passes are complete, the system sums all n disk files to produce the full synthesis.
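The pass structure can be sketched serially in plain C. The sketch rests on several assumptions: an in-memory array named vault stands in for the DataVault files, the "compute samples" step is replaced by adding a constant level for the duration of each note, and the names Note, render_pass, synthesize_piece, PIECE_LENGTH and NUM_INSTRUMENTS are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define PIECE_LENGTH    8   /* samples in the piece (tiny, for illustration) */
#define NUM_INSTRUMENTS 2   /* n timbres, hence n passes over the score */

typedef struct {
    int    instrument;      /* which timbre plays this note */
    size_t start;           /* first sample of the note */
    size_t length;          /* number of samples the note lasts */
    int    level;           /* stand-in for the real synthesis output */
} Note;

/* One pass: compute every sample produced by a single instrument into a[]. */
static void render_pass(int instrument, const Note *score, size_t n_notes,
                        int *a)
{
    memset(a, 0, PIECE_LENGTH * sizeof a[0]);
    for (size_t k = 0; k < n_notes; k++) {
        if (score[k].instrument != instrument)
            continue;                       /* note belongs to another pass */
        size_t end = score[k].start + score[k].length;
        for (size_t t = score[k].start; t < end && t < PIECE_LENGTH; t++)
            a[t] += score[k].level;
    }
}

/* All passes, then the final mix: b[] receives the sum of every pass. */
void synthesize_piece(const Note *score, size_t n_notes, int *b)
{
    int vault[NUM_INSTRUMENTS][PIECE_LENGTH];   /* stands in for disk files */

    for (int i = 0; i < NUM_INSTRUMENTS; i++)
        render_pass(i, score, n_notes, vault[i]);

    memset(b, 0, PIECE_LENGTH * sizeof b[0]);
    for (int i = 0; i < NUM_INSTRUMENTS; i++)
        for (size_t t = 0; t < PIECE_LENGTH; t++)
            b[t] += vault[i][t];
}
```

Because each pass touches only one instrument's notes, a pass can be laid out on the machine in whatever way suits that instrument's algorithm; only the final summation needs to see all of the intermediate files.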
synthesize()
{
    short i, j;
    for(i = 0; i < maxTableEntries; i++) {
        CM_set_context();
        CM_u_move_constant_1L(arrayTemp, i, 16);
        CM_u_le_1L(arrayTemp, numTableEntries, 16);
        CM_logand_context_with_test();
        CM_aref_2L(tempA, phases, arrayTemp,
                   32, 16, maxTableEntries, 32);
        CM_aref_2L(tempB, phaseIncrs, arrayTemp, 32, 16,
                   maxTableEntries, 32);
        CM_u_move_constant(arrayTemp, 0, 16);
        for(j = 0; j < sampleTableSize; j++) {
            CM_aref_2L(tempD, sampleTable, arrayTemp, 16, 16,
                       sampleTableSize, 16);
            CM_f_sin_2_1L(tempC, tempA, 23, 8);
            CM_f_add_2_1L(tempA, tempB, 23, 8);
            CM_f_multiply_constant_2_1L(tempC,
                       (16383.0 / maxSimul), 23, 8);
            CM_s_f_round_2_2L(tempE, tempC, 16, 23, 8);
            CM_s_add_2_1L(tempD, tempE, 16);
            CM_aset_2L(tempD, sampleTable, arrayTemp, 16, 16,
                       sampleTableSize, 16);
            CM_u_add_constant_2_1L(arrayTemp, 1, 16);
        }
    }
}
Figure 6 - C/PARIS code for a sample-wise sine wave generator

Once the sample generation problem is divided into passes, the passes can be individually optimized to suit the synthesis algorithm. If a synthesis algorithm has a closed form, each CM-2 processor will represent a sample (or a group of samples). If a synthesis algorithm has an open form, the processors will represent notes played by that algorithm. Figure 6 shows C/Paris code for a sine wave oscillator, a simple closed form unit generator. Each processor holds a table specifying which notes sound during its samples.
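The closed-form mapping can be illustrated with a serial C sketch in which each "processor" owns a block of samples and computes them directly from the sample index, without reference to any other block. Several things here are assumptions rather than a transcription of Figure 6: a sawtooth stands in for the sine so that the example needs no math library, every note is assumed to sound for the whole piece (there is no per-processor note table), and the names Note, saw_at, compute_block and SAMPLES_PER_PROC are hypothetical. The scaling by the maximum number of simultaneous notes mirrors the 16383.0 / maxSimul factor in Figure 6.

```c
#include <stddef.h>
#include <stdint.h>

#define SAMPLES_PER_PROC 4   /* hypothetical block size owned by one processor */

typedef struct {
    uint32_t start_phase;    /* phase at sample 0, in 1/65536 of a cycle */
    uint32_t phase_incr;     /* phase advance per sample, same units */
} Note;

/* Closed form: the value at absolute sample t depends only on t, so no
 * processor has to wait for an earlier sample to be computed. */
static int32_t saw_at(const Note *note, uint32_t t, int max_simul)
{
    /* The cast to 16 bits wraps the phase to a single cycle. */
    uint16_t phase = (uint16_t)(note->start_phase + t * note->phase_incr);
    int32_t centered = (int32_t)phase - 32768;      /* -32768 .. 32767 */
    /* Scale so that max_simul notes summed together still fit in 16 bits. */
    return centered / (2 * max_simul);
}

/* Work done by processor number proc: fill its own block of samples. */
void compute_block(size_t proc, const Note *notes, size_t n_notes,
                   int max_simul, int16_t *table)
{
    for (size_t j = 0; j < SAMPLES_PER_PROC; j++) {
        uint32_t t = (uint32_t)(proc * SAMPLES_PER_PROC + j);
        int32_t sum = 0;
        for (size_t k = 0; k < n_notes; k++)
            sum += saw_at(&notes[k], t, max_simul);
        table[j] = (int16_t)sum;
    }
}
```

Blocks can be computed in any order, which is what lets a closed-form algorithm spread samples across processors. An open-form algorithm, whose next sample depends on the previous one, would instead assign one note to each processor and step all notes forward in time together.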
The CM-2 is controlled by a host computer, usually a Sun or VAX running UNIX. To permit multiple simultaneous users on the CM-2, the host computer interface contains several sequencers, each driving eight thousand processors. By controlling several sequencers from several UNIX processes, one could bring several instruction streams to bear on the same problem. The Connection Machine's high-speed disk system (the DataVault) could be used to communicate between sequencers.