EMS: 22 channel Vocoder Manual
- Published in ? date ? by EMS, written by ?
- INTRODUCTION
- The EMS Vocoder (voice-coder) is a 22-channel vocoder that has been
specifically designed to process speech and other sounds in a variety of new
and interesting ways.
In the following chapters, I will explain its many functions and describe how
to operate the machine.
I would now like to discuss the way in which natural speech mechanisms work
and how the Vocoder analyses them. The speech production system is a very
elaborate machine.
When, for instance, vowel sounds are being voiced, that is when the vocal
cords are oscillating, it acts like a reed instrument. But, when whistle
sounds are being generated, it acts like a wind instrument. It can also
generate articulated noise sounds. Therefore, a brief description of the
speech production system would be a network composed of a set of variable
resonators which can be excited either by the vocal cords at a variable pitch
and amplitude or by a flow of air (from the lungs) producing noise or whistles.
This system is described in Fig. 2.

- The vocal cords, airflow and the resonators are all controlled by the brain
and manipulated in such a way as to produce intelligible speech (except
in the case of politicians).
Consider, for example, the following piece of speech: "EMS Vocoder",
Fig. 3.

- Note that the 'S' sounds (I will call them the unvoiced parts of speech)
are generated by noise excitation, and that the rest of the words
(these I will call the voiced parts) are generated by the vocal cord
excitation. Also note that the pitch of the vocal cord excitation changes
during the words.
Now, this information is only part of the data necessary to reconstruct the
speech. If you were to listen to a noise source and an oscillator being
controlled in the same way as they are in the vocal tract, you would not
hear anything speech-like at all. The mechanism that transforms this sound
into speech is the set of resonators. That is, the throat, nose, mouth
in conjunction with the tongue. Therefore, to be able to reproduce speech,
the Vocoder has to analyse the signal in the following way. It must
decide whether or not the speech is voiced or unvoiced. It must calculate
the pitch of the voiced portions and it must continuously analyse the
frequency spectrum of the speech so as to determine the operation of the set
of resonators, see Fig. 4.
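The voiced/unvoiced decision described above can be illustrated with a short Python sketch. The manual does not say how the Vocoder's detector works internally, so the zero-crossing test below is only a common textbook heuristic chosen for illustration. The function name, the threshold, the sample rate and the two test signals are all assumptions.

```python
import math
import random

def is_voiced(samples, zcr_threshold=0.25):
    """Crude voiced/unvoiced decision based on zero-crossing rate.

    A low-pitched periodic signal changes sign rarely, whereas a
    noise-like signal changes sign often. The threshold is an
    illustrative assumption, not an EMS figure.
    """
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return crossings / (len(samples) - 1) < zcr_threshold

SAMPLE_RATE = 8000  # Hz, assumed for the demonstration

# A 120 Hz tone stands in for vocal cord excitation, white noise for an 'S'.
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 120 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
hiss = [rng.uniform(-1.0, 1.0) for _ in range(SAMPLE_RATE)]

print("tone voiced:", is_voiced(tone))
print("hiss voiced:", is_voiced(hiss))
```

A real detector also has to cope with silence and mixed sounds, which this sketch ignores.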

- Having done this, we have all the information necessary to reproduce the
speech, but the important thing is that in doing so, we can change some of
the parameters and so get some interesting effects. For instance, in the
forthcoming example we exchange the vocal cords and noise source for the
output of an organ and get a talking organ.
To be able to reproduce the speech, the Vocoder uses an electronic model of
the vocal tract, Fig. 5.
- In this model a noise source and an oscillator (VCO) can be turned on and off
and the VCO controlled in pitch. Their outputs are used to excite the
synthesis filter bank. This is a model of the set of resonators and it is
controlled by the data from the analysing filter bank. Figures 4 and 5 are
the analysis and synthesis sections of the Vocoder and they form the main
part of the machine.
- 1. SIMPLE TALKING ORGAN
- The Vocoder is a signal processing device, and therefore it requires a signal
input to obtain an output. To illustrate this, I will take the example of
making an organ speak, Fig. 6.

- Two signal sources are required: one, a source of speech which in this case
is a pre-recorded tape, and two, an organ which can be used to play chords or
tunes. The resultant output of the Vocoder is a speaking organ; that is, it
is a sound that contains the characteristics of both of the inputs. It has
the melody and a proportion of the harmonic structure of the organ plus the
articulation of the speech.
One of the fundamental properties of a good vocoder is its ability to make
its inputs sound like speech. Therefore, it is particularly important when
using pre-recorded speech tapes that no other spurious noises are on that
tape, such as rustling of paper or background traffic noises - because the
Vocoder will try to synthesize speech from them, and the result will be a
rather curious output. Also, any noise on the speech tape will upset some
of the Vocoder's modules and so a signal to noise ratio of 50 to 55 dB is
required if good results are to be obtained.
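To put the 50 to 55 dB figure in perspective, the following Python sketch converts a decibel difference into an amplitude ratio using the standard 20 log10 relationship for voltages. The function name is my own.

```python
def db_to_amplitude_ratio(db):
    """Convert a level difference in decibels to a voltage (amplitude) ratio."""
    return 10 ** (db / 20)

for snr_db in (50, 55):
    ratio = db_to_amplitude_ratio(snr_db)
    print(f"{snr_db} dB: speech amplitude about {ratio:.0f} times the noise")
```

In other words, the noise on the tape needs to be several hundred times smaller in amplitude than the speech.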
- 2. MORE COMPLEX SOUNDS
- The generation of synthetic natural sounding speech is a complex problem
which can be solved using the Vocoder. However, the resultant speech always
has a slight mechanical 'feel' about it and the quality of the speech varies
depending on the characteristics of the original speaker. To produce synthetic
speech, we must use the Vocoder to both analyse and then re-synthesize, Fig. 7.
- In this example, the speech is analysed to break it up into its voiced and
unvoiced portions, and the pitch of the voiced portions is determined. This
data is then used to turn on and off the VCO and the noise source, and to
control the pitch of the VCO. The VCO and noise source are then used to
excite the synthesizing filter bank which is itself controlled by the data
from the analysing filter bank. This enables the excitation to be articulated, the result being synthetic speech.
- 3. ADDITIONAL FEATURES
- I should now like to briefly describe some of the additional features of the
EMS Vocoder.
- SPECTRUM DISPLAY
This produces a 22-bar histogram of the energy distribution of the analysed
speech. The display medium is an external XY scope.
- COMPUTER INTERFACE
The energy levels of the 22 analysing filters are available simultaneously at
a large multiway connector at the back of the Vocoder, thus enabling a
computer to process this data. Also, the computer can inject control signals
so that the 22 synthesizing channels may be manipulated.
- FILTER BANK PATCHING
The analysing filter bank is connected to the synthesizing filter bank via a
22 x 22 way patch board system. This enables the routing of the filters to
be altered at will. Also, there are 22 synthesis input level controls which
can also be used as a 22 channel equaliser.
- FREQUENCY SHIFTER
The shifter has both UP and DOWN shifted outputs, with a frequency range of
0.05Hz to 1kHz. It can also be used to generate phasing effects.
- CONNECTIONS AND CONTROLS
The input and output connections to the Vocoder are made at the back
panel, Fig. 8.

- The Spectrum Display is a real time display of the input speech signal
represented by a 22 bar histogram. Each bar shows the energy level in a
bandpass filter of 1/4 octave spacing.
When speech is being analysed it is possible to observe the motion of the
formants. The diagram shows such a situation.
- The Spectrum Display is an add-on optional function. A Vocoder owner can
obtain this facility by simply buying the board and plugging it in. An XY
oscilloscope is required to view the display.
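Quarter-octave spacing means that each filter sits a factor of 2^(1/4) above its neighbour, so 22 filters cover 21 steps, or 5.25 octaves. The Python sketch below works this out. The manual does not give the lowest centre frequency, so the 200 Hz used here is a placeholder only.

```python
NUM_CHANNELS = 22
STEPS_PER_OCTAVE = 4  # 1/4 octave spacing, as stated in the text

def band_centres(lowest_hz):
    """Centre frequencies of bands spaced a quarter octave apart."""
    return [lowest_hz * 2 ** (k / STEPS_PER_OCTAVE) for k in range(NUM_CHANNELS)]

centres = band_centres(200.0)  # 200 Hz is a placeholder, not an EMS figure
span_octaves = (NUM_CHANNELS - 1) / STEPS_PER_OCTAVE

print(f"{len(centres)} bands spanning {span_octaves} octaves")
for number, hz in enumerate(centres, start=1):
    print(f"bar {number:2d}: {hz:8.1f} Hz")
```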
- I will now describe the function of each of the connectors.
- MAINS:
Power input 240V - 220V. 50Hz or 110V 60Hz selectable.
- FUSE: Mains fuse 1 amp.
- X Y DISPLAY: BNC connectors. Spectrum display output to an XY scope.
Vocoder Input and Output connectors are all 1/4" mono jacks, unbalanced.
All outputs are short circuit protected.
- OUTPUTS:
- Vocoder
This is the output of the machine. It will drive 640 ohms equipment at Line
Level. However, there is an output level control on the front panel if
smaller levels are required. Usually, this output will be connected to a
tape recorder and/or a monitoring system.
- Equaliser
This output is the sum of all the signals on the synthesis input level controls.
Therefore it is basically a 22 channel equaliser acting upon the signal
injected into speech input.
- Pitch Voltage
The control voltage generated by the pitch extractor appears here, as well as
inside the Vocoder to control the VCOs. Thus it is possible to control
external pieces of synthesizer equipment with this voltage.
- Voiced
When the voiced/unvoiced detector has decided that the incoming speech signal
is voiced, a +2.5V signal is produced. Otherwise a -2.5V signal is generated.
- Unvoiced
Operation is complementary to the above. These signals can be used to turn on
and off external pieces of equipment.
- Up Mix
This is the UP shifted mixed signal from the frequency shifter.
- Down Mix
The DOWN shifted mixed signal is also available and can be used in conjunction with the UP MIX output. For instance, when they are both being used to produce slow phasing, a mobile stereo image can be generated.
- INPUTS
- Speech
Input impedance: 10k ohms. Signal level required, line level. This is the
speech input to the machine; if you want to use a microphone, then the signal
level will have to be brought up to line level with an external pre-amplifier.
Also, when trying to either set up or demonstrate the machine, it is very
useful to use a pre-recorded speech tape, about 10 to 15 minutes in length.
- Excitation A and B
Input impedance: 10k ohms. Signal level required, line level.
Any external excitation, such as organ, music, engine noises etc., is inserted
here.
- VC Slew
It is possible to voltage control the slew freeze function. Input voltage
range: ca. 1V.
- External VC
This connector allows the VCOs and the frequency shifter to be externally
controlled. Pitch spread approximately +0.5V/octave.
- Envelope Outputs and Control Inputs
There are 22 inputs and outputs on this multi-way connector, which are intended
for computer control. Output voltage range 0 to -4V. Input voltage range 0 to
+4V. Input impedance 33k to 68k.
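The External VC figure of roughly 0.5V per octave can be turned into simple arithmetic. The Python sketch below assumes an ideal response of exactly 0.5V per octave (the manual only says "approximately"), and the function names are my own.

```python
VOLTS_PER_OCTAVE = 0.5  # approximate figure quoted for the External VC input

def frequency_ratio(control_volts):
    """Frequency multiplier produced by a given control voltage."""
    return 2 ** (control_volts / VOLTS_PER_OCTAVE)

def volts_for_semitones(semitones):
    """Control voltage needed for an equal-tempered interval."""
    return VOLTS_PER_OCTAVE * semitones / 12

print("ratio for +0.5 V:", frequency_ratio(0.5))
print("ratio for -0.5 V:", frequency_ratio(-0.5))
print("volts for a fifth (7 semitones):", round(volts_for_semitones(7), 4))
```

This is useful when scaling the control voltage from another manufacturer's keyboard, which may use a different volts-per-octave standard.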
- Now for the front panel and its many controls. Let us suppose that we wish to
produce synthetic speech from a pre-recorded tape. I will list the necessary
sequence of events. Firstly, connect the mains to the power connection, and
switch on. The orange lights should illuminate. Next, connect the tape
recorder output to the speech input of the Vocoder, roll the tape and turn
the speech input level control (Fig. 9) clockwise until the speech PPM meter
reads peaks of 6 to 7.

- If the meter reading is low with the control at maximum, then the tape
recorder signal will have to be externally increased, otherwise the Vocoder's
performance will be degraded. The orange and green lamps will now flash on
and off indicating that voiced/unvoiced decisions are being made. If this
does not happen, but only one colour remains on when speech is being produced
from the tape recorder, then check the SLEW-FREEZE function. This is below
the patchboard. When the red lamp is lit, then the voiced/unvoiced mechanism
is frozen. The normal position for the slew freeze controls is with the
switch off and the knob at 'fast'.
- Next the filter bank patching and the synthesis input levels, Fig. 10.
- The Patch Board connects the analysing filter bank to the synthesis filter
bank. This is done with the patch pins, and they should be inserted as in
Fig. 9. This is the normal position. Also, all the synthesis input levels
should be turned fully clockwise. Now you should see the signal level lamps
above these pots being lit up by the incoming speech. These lamps indicate
the energy levels in the filters and so they are in fact a rather crude real
time spectrum display. If the lamps do not light up, then check the squelch
switch. This is located in between the PPM's and it should be off (in the up
position). Also make sure that the FREQUENCY SHIFTER ON/OFF switch is off.
Now, the output mixer, Fig. 11.

- This mixer has three inputs, A, B and C and one output, D, the level of which
is displayed on a PPM meter, vertically above. The three inputs are one, the
original speech signal; two, the excitation signal; and three, the Vocoder
output. (That is the output from the synthesis filter bank). Note that the
speech and the excitation can both be switched off completely.
Turn off controls B, C, D and turn on control A. Now connect the output to
an amplifier and speaker. As control D (the mixed output) is turned on, two
things should happen. Firstly, the original speech will be heard and secondly,
the speech signal will be seen on the output PPM. Turn off control A (speech).
We now have nearly all the conditions necessary to produce synthetic speech.
The only thing that is missing is the excitation.
Figure 12 shows the excitation section of the Vocoder, with the exception of
the excitation PPM which has been omitted.

- This section comprises two VCO's, a noise source and the external excitation
controls.
First, I will explain the operation of the VCO's. Note that VCO 1 and 2 are
the same and so for the purposes of this demonstration, I will use VCO 1.
Set up the knobs and switches for VCO 1 as they are in Fig. 12.
The excitation PPM will indicate a level of about 6 or 7, which can be adjusted
by altering the level control knob. For best results use an excitation level
of about 6 or 7 on the PPM. Now turn up the excitation control on the output
mixer. You will hear a continuous tone which you can vary in pitch using the
slow motion drive of VCO 1. Turn off the excitation control (this is the
moment you have been waiting for) and turn on the Vocoder control to maximum.
You are now hearing synthetic monotonic (constant pitch) speech. Try altering
the slow motion drive and the pitch of the speech will vary.
Now I will explain the functions of the switches, numbers 1 to 5, Fig. 12.
Switch 1 enables the VCO's to be externally voltage controlled in frequency.
Switch 2 is the connection between the VCO's and the pitch extractor.
It has three positions: CALibrated, whereby a change in pitch in the input
speech will produce the same interval change in the VCO; OFF, pitch extractor
has no effect; VARiable, the pitch extractor controls the VCO with a gain
factor between +2.2 to -2.2, which is itself controlled by the pitch knob.
Switch 3 is used to select real time or sequencer control of the VCO's, Fig. 13.

- That is, they can be controlled by an EMS keyboard, such as a KS or DK1 or DK2.
However, only the KS keyboard has a sequencer output. The real time and
sequencer signals have pitch spread controls on the Vocoder, Fig. 13.
Note that other manufacturers' keyboards can be used, either by putting the
control in the external VC or in via the keyboard controls. Switch 3 has a
centre off position. Switch 4 selects the output of the VCO to be either a
ramp or a squarewave, the latter having only odd harmonics, the former having
both odd and even.
On previous Vocoders, switch 4 was a sync switch which synchronised the VCO
to the fundamental of the input speech signal. Switch 5 controls the output
level of the VCO. When it is in the ON position, the VCO is on continuously;
when the switch is in the V position, the VCO is only on when the incoming
speech is 'voiced' speech. The green lamp is lit when this state is detected.
Listen to the synthetic speech with switch 5 in the V position. You will
note that the 'S' sounds are absent.
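The three positions of switch 2 described above can be sketched in Python. I am assuming that the "gain factor" simply scales the pitch interval (in octaves) coming from the pitch extractor. The function name and arguments are hypothetical, and the real circuit works with control voltages rather than numbers.

```python
def vco_frequency(base_hz, speech_interval_octaves, switch, pitch_knob=0.0):
    """VCO frequency under pitch-extractor control (switch 2).

    speech_interval_octaves: how far the speech pitch has moved from its
    reference, in octaves. pitch_knob is only used in the VAR position.
    """
    if switch == "CAL":
        gain = 1.0  # same interval change as the input speech
    elif switch == "OFF":
        gain = 0.0  # pitch extractor has no effect
    elif switch == "VAR":
        gain = max(-2.2, min(2.2, pitch_knob))  # range quoted in the text
    else:
        raise ValueError(f"unknown switch position: {switch}")
    return base_hz * 2 ** (gain * speech_interval_octaves)

# The speech rises by half an octave; VCO 1 is tuned to 100 Hz.
for position, knob in (("CAL", 0.0), ("OFF", 0.0), ("VAR", 2.2), ("VAR", -1.0)):
    hz = vco_frequency(100.0, 0.5, position, knob)
    print(f"{position:3s} knob={knob:+.1f}: {hz:6.1f} Hz")
```

Note that a negative gain inverts the melody of the speech: when the speaker's pitch rises, the VCO falls.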
Next, I will describe the noise source. This section is used to generate
the 'S' sounds and the whispered speech effects. It has a level control and
a colour control (a filter) as well as an output switch. This switch has
three states: ON all the time, OFF and UV which is only on when unvoiced
signals are detected. This last state is the opposite to the VCO output
switch.
Switch off the VCO and switch on the noise source. Turn both of its controls
fully clockwise. The synthetic speech now produced should be a whisper.
Now switch the noise to UV. Only the 'S' sounds should be produced.
To complete the synthetic speech, switch VCO 1 to V. Now we have monotonic
speech with 'S' sounds, known as fricatives.
The last of the excitation sections is the input controls for the external
excitations A and B. These are simply level controls plus a switch that
allows the signal to pass all the time or only when voiced states exist or
only when unvoiced states exist. Green and orange lamps indicate V and UV
states.
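The output switches on the VCO's, the noise source and the external excitations all follow the same gating logic, which can be summarised in a few lines of Python. The function name and the string labels are my own, and not every source offers every position.

```python
def source_is_on(switch, voiced):
    """Whether an excitation source passes signal for a given switch position."""
    if switch == "ON":
        return True
    if switch == "OFF":
        return False
    if switch == "V":
        return voiced
    if switch == "UV":
        return not voiced
    raise ValueError(f"unknown switch position: {switch}")

# The complete synthetic speech set-up from the text: VCO 1 at V, noise at UV.
settings = (("VCO 1", "V"), ("noise", "UV"))
for voiced in (True, False):
    active = [name for name, switch in settings if source_is_on(switch, voiced)]
    state = "voiced" if voiced else "unvoiced"
    print(f"{state}: {', '.join(active)}")
```

With these settings exactly one source is active at any moment, which is why the vowels are pitched and the 'S' sounds are noise.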
Next, the squelch switch in between the PPM's. This switch has the job of
cleaning up the signal processing in the filter bank. When there is no
speech signal, then any excitation breakthrough becomes noticeable and vice
versa.
The squelch switch brings into operation a circuit that detects the absence
of either the speech or the excitation and then squelches the filter bank
output. However, this switch is not to be used when slewing or freezing the
Vocoder.
Back to the synthetic speech. This speech is monotonic and therefore
requires some movement in pitch to make it sound more natural.
This is the job of the pitch extractor, Fig. 14.

- This device extracts the fundamental signal from the speech and converts it
to a control voltage. The switch and the pitch output knob control the
voltage that is sent to the back panel, in the same way as they do in the two
VCO's. The QUALITY knob controls a filter which precedes the pitch extractor.
With this knob set at NORMAL the device makes fewer errors; when it is set at
ERRATIC, it makes more. To demonstrate this, let's go back to the synthetic
speech.
Turn the QUALITY knob to normal and set switch 2 (VCO 1) to CAL. The result
should be synthetic speech with pitch variance and fricatives. You may have
to adjust the slow motion drive of VCO 1 to restore it to a natural pitch.
Now turn the QUALITY knob to ERRATIC. The synthesised speech will occasionally
produce a noticeably wild pitch. Turn the knob back to NORMAL. Next the
SET-ZERO knob. The pitch range used by one speaker is normally not very great,
but the range of speakers is. That is, a large man may be 3 octaves lower in
pitch than a child, but both of them may only have a speaking range of 1/2 an
octave. Now, it is sometimes important that the output voltage of the pitch
extractor swings equally positive and negative for one particular speaker.
For instance, when using the variable pitch spread knob to get a voice to go
slowly from monotonic to varying pitch, there must not be a standing DC
voltage on the output of the pitch extractor. If there is, then the resultant
speech will have a fixed frequency shift proportional to the PITCH knob setting.
Therefore, the job of the SET ZERO knob is to bias the pitch extractor's
voltage output so that it swings equally positive and negative.
The diagram below shows the Pitch Extractor's output for three speakers: A,
a child; B, a woman; C, a man.

- In this example, they are all speaking the same text, and thus their pitch
variance is similar. However, they are displaced by a fixed interval, due to
the physical differences between them. It is useful to have an output from
the Pitch Extractor which swings equally about 0 volts. This is achieved
using the SET ZERO knob. By adjusting this control, both A and C can be
biased so as to swing equally about 0 volts.
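The effect of the SET ZERO knob can be expressed as a simple offset calculation. The Python sketch below uses made-up voltage traces for a child and a man, and takes the midpoint of the swing as the amount to remove. On the machine itself the adjustment is done by hand.

```python
def set_zero_offset(pitch_volts):
    """Bias that centres a pitch-voltage trace so it swings equally about 0 V."""
    return -(max(pitch_volts) + min(pitch_volts)) / 2

def apply_bias(pitch_volts, bias):
    """Add a fixed bias to every value of the trace."""
    return [v + bias for v in pitch_volts]

# Hypothetical pitch extractor outputs (volts) for the same text.
child = [0.9, 1.1, 1.0, 0.8]
man = [-0.7, -0.5, -0.6, -0.8]

for name, trace in (("child", child), ("man", man)):
    bias = set_zero_offset(trace)
    centred = apply_bias(trace, bias)
    print(f"{name}: bias {bias:+.2f} V, swing {min(centred):+.2f} to {max(centred):+.2f} V")
```

After biasing, both traces have the same symmetrical swing, so the variable pitch spread no longer adds a fixed shift.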
Next the SLEW-FREEZE section, Fig. 15.

- Information inside the filter bank is analysed and the data produced is then
used to control the synthesis process. However, before this data reaches the
synthesis filters it has to pass through the SLEW-FREEZE section. Thus, it
is possible to freeze the data and so hold a particular filter structure, on
say a vowel sound. That is, the excitation can still be varied at will, but
the filter structure is frozen from that point in time. Also, it is possible
to slew the data. That is, to smear it out in time. The SLEW knob performs
this function. As it is rotated anticlockwise, the data flow becomes slower
and slower until it eventually freezes (the lamp comes on). Note that the
freeze switch will always freeze the sound, no matter where the setting of
the knob.
To demonstrate this, set up a synthetic monotonic speech output with only
VCO 1, and no noise source. Turn the SLEW pot anti-clockwise and listen to
the time smearing effect. Return it to fast and then use the freeze switch
to freeze at various points in the speech. Note that the SLEW FREEZE section
affects the filter bank, the V/UV detector and the pitch voltage. Try the
previous operations on synthetic speech with both pitch and noise. The SLEW
FREEZE section is also voltage controllable. For instance, you can use an
external squarewave oscillator to freeze the Vocoder. If the square wave has
a long freeze period and a short fast slew period, then some interesting
effects can be obtained with an oscillator frequency of about 1 to 10Hz.
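One way to picture slewing and freezing is as a smoother with a hold, applied to each channel's control data. The Python sketch below uses a one-pole smoother, which is my own simplification and not a description of the actual circuit. The class name and the `rate` parameter are hypothetical.

```python
class SlewFreeze:
    """One-pole smoother with a hold, standing in for one channel of data."""

    def __init__(self, initial=0.0):
        self.value = initial

    def step(self, target, rate=1.0, freeze=False):
        """Advance one time step.

        rate=1.0 follows the input immediately ('fast'); smaller values
        smear the data out in time; rate=0.0 or freeze=True holds it.
        """
        if not freeze:
            self.value += rate * (target - self.value)
        return self.value

channel = SlewFreeze()
envelope = [1.0, 1.0, 0.0, 0.0, 0.0]  # energy in one filter over time

slewed = [round(channel.step(level, rate=0.5), 3) for level in envelope]
print("slewed:", slewed)

frozen = [channel.step(level, freeze=True) for level in (1.0, 0.0, 1.0)]
print("frozen:", frozen)
```

Driving `freeze` from a slow square wave gives the stepped effect mentioned above.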
- The last section to be covered is the FREQUENCY SHIFTER, Fig. 16.

- Adjust the output mixer so that we are listening to the original speech only
and set up the FREQUENCY SHIFTER so that its knobs and switches are as shown
in Fig. 16. As the FREQUENCY SHIFT knob is rotated clockwise, the pitch of
the speech will rise. You may have heard this effect before; it is, in fact,
single sideband modulation as used in radio communications.
Now rotate the OUTPUT knob fully anti-clockwise and repeat the process. This
time the signal will fall in pitch. Thus we have seen how the SHIFTER can
move signals both up and down in pitch. Now, set both the UP MIX and
DOWN MIX to 5 and set the FREQUENCY SHIFT knob to a low frequency, about 1 or
2Hz. You should now hear phasing sounds which continually sweep in one
direction. Turn the OUTPUT knob back to UP MIX and this direction will
reverse. It is possible to shift any one of the signals present at the
OUTPUT MIXER. This selection is done with rotary switch 12.
Now the SQUELCH knob. Sometimes when there is no signal coming in, the
frequency shifter will produce a faint audible tone. This is known as
carrier breakthrough, but it should be about 60dB down on the signal level.
However, if this is still unacceptable, the SQUELCH knob can be used to remove
the breakthrough. This knob merely defines a signal level below which the
entire signal is squelched off.
Finally the remaining knob and switches. The PITCH knob and switch 9 are
exactly the same in operation as those on the VCO's. Switch 10 allows an
external voltage to control the shift frequency and switch 11 turns the
frequency shifter on and off.
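It is worth noting how frequency shifting differs from ordinary transposition. A shifter adds the same number of hertz to every component of the signal, so the harmonic relationships are not preserved. The Python sketch below shows this with an idealised three-harmonic tone. The numbers are illustrative only.

```python
def shift_components(freqs_hz, shift_hz):
    """Frequency shifting adds the same number of hertz to every component."""
    return [f + shift_hz for f in freqs_hz]

harmonics = [100.0, 200.0, 300.0]  # an idealised harmonic tone

up = shift_components(harmonics, 50.0)
down = shift_components(harmonics, -50.0)

print("up:  ", up, "ratios", [round(f / up[0], 3) for f in up])
print("down:", down, "ratios", [round(f / down[0], 3) for f in down])
```

The shifted components are no longer whole-number multiples of the lowest one, which is why large shifts sound clangorous rather than simply higher or lower.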
That is all the controls described, so now (very briefly) I will show how to
produce a few effects:
Double Tracking
Set up a synthetic speech output with pitch and fricatives, but move all the
pins up one hole (a 1/4 octave). In fact, all the pins will not go in, and
you will have to lose one. Note that the speech sounds like it is being
produced by a smaller speaker.
Now mix in some of the original speech and notice the double tracking effect.
Try this again, but move the pins down.
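Moving the pins can be thought of as changing the mapping from analysing channels to synthesis channels. In the Python sketch below the channels are numbered 1 to 22 from low to high frequency, which is my own numbering, and "up" is taken to mean towards higher synthesis channels. Pins that would fall off the board are dropped.

```python
NUM_CHANNELS = 22

def shifted_patch(holes):
    """Map analysing channel -> synthesis channel with all pins moved by `holes`.

    Pins that would land outside the board are left out.
    """
    return {
        analysis: analysis + holes
        for analysis in range(1, NUM_CHANNELS + 1)
        if 1 <= analysis + holes <= NUM_CHANNELS
    }

normal = shifted_patch(0)
up_one = shifted_patch(1)

print("normal position:", len(normal), "pins")
print("up one hole:    ", len(up_one), "pins, channel 1 now drives", up_one[1])
```

This confirms the remark above that one pin has to be lost when everything is moved by one hole.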
Time Compression
Move the pins down by 4 holes (1 octave), play the tape at twice the normal
speed and adjust the VCO pitch so that the best intelligibility is obtained.
You are now hearing time compressed speech. Compare it with the original.
Try different forms of excitation, e.g. monotonic VCO or continuous noise.
Time Expansion
Repeat as above but move the pins up by 4 holes and run the tape at half speed. This decreased data rate makes things much easier for the Vocoder.
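The relationship between tape speed and pin position in the two effects above follows from the 1/4 octave spacing of the holes. Doubling the tape speed raises the whole spectrum by one octave, so the pins go down four holes to compensate. The Python sketch below generalises this. The function name is my own, and speeds that do not give a whole number of holes can only be approximated on the patchboard.

```python
import math

HOLES_PER_OCTAVE = 4  # each hole is 1/4 octave, as stated in the text

def pin_shift_for_tape_speed(speed_factor):
    """Holes to move the pins to cancel the spectral shift of a speed change.

    Negative results mean 'move the pins down', positive mean 'up'.
    """
    return -HOLES_PER_OCTAVE * math.log2(speed_factor)

for speed in (2.0, 0.5):
    print(f"tape at {speed}x speed: move pins {pin_shift_for_tape_speed(speed):+.0f} holes")
```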
- USING THE EMS VOCODER
Although the Vocoder is a portable device, and although it can be used live,
it is still best thought of as being a piece of studio equipment.
Experience has shown that it is much easier to use the machine when only one
of its inputs (speech or excitation) is live. That is, either use
pre-recorded speech and then try to match the excitation to it, or vice
versa. For instance, when trying to make an organ sing, the procedure
would be as follows:-
1. Make a high quality recording of the singing.
2. Patch the recorded singing and the output of the organ to the Vocoder.
3. Give the organist the Vocoder output to listen to. Have several rehearsals
before recording. Note that if the organist takes his hands off the keyboard,
all the sound will cease.
When it is required to articulate the sounds of normally inanimate sources,
then it is necessary to tailor the speech to the excitation. For example, if
the wind has to speak, then the speech must be long and uninterrupted. Or if
you want to freeze on a vowel, it would be advisable for the speaker to
elongate this vowel so as to enable the freeze to be operated in time.
If you want to make a crowd speak, then the speech must be smeared out in
time, possibly by adding reverberation to it.
Another way of using the Vocoder is to feed the output back into the
excitation input, having no other sources of excitation. This will make the
Vocoder self oscillate, but controlled by the incoming speech.
This makes a sound like a talking wind instrument.
Of course, the speech input can have other signals applied to it, such as
musical instruments, animal noises etc., and these will produce a variety of
new sounds.
The possibilities are limitless and I must leave it up to you to discover
them.
