Audio timescale-pitch modification
'''Time stretching''' is the process of changing the speed or duration of an audio signal without affecting its pitch.
'''Pitch scaling''' or '''pitch shifting''' is the reverse: the process of changing the pitch without affecting the speed. There are also more advanced methods used to change speed, pitch, or both at once, as a function of time.
These processes are used, for instance, to match the pitches and tempos of two pre-recorded clips for mixing when the clips cannot be reperformed or resampled. (A drum track could be moderately resampled for tempo without adverse effects, but a pitched track could not.) They are also used to create effects such as increasing the range of an instrument (for example, pitch shifting a guitar down an octave).
Resampling
The simplest way to change the duration or pitch of a digital audio clip is to resample it. This is a mathematical operation that effectively rebuilds a continuous waveform from its samples and then samples that waveform again at a different rate. When the new samples are played at the original sampling frequency, the audio clip sounds faster or slower. Unfortunately, the frequencies in the clip are always scaled by the same factor as the speed. In other words, slowing down the recording lowers the pitch, speeding it up raises the pitch, and the two effects cannot be separated. This is analogous to speeding up or slowing down an analog recording, like a phonograph record or tape.
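The coupling between speed and pitch can be seen in a minimal sketch. The function name `resample_linear` and its `speed` parameter are hypothetical, and linear interpolation is only a crude stand-in for the band-limited reconstruction a real resampler would use.

```python
import math

def resample_linear(x, speed):
    """Read x at `speed` input samples per output sample.

    Linear interpolation is an assumption made for brevity; it is not
    an ideal (band-limited) reconstruction of the waveform.
    """
    n_out = int((len(x) - 1) / speed) + 1
    out = []
    for j in range(n_out):
        pos = j * speed                      # fractional read position
        i = int(pos)
        frac = pos - i
        nxt = x[min(i + 1, len(x) - 1)]      # clamp at the last sample
        out.append((1 - frac) * x[i] + frac * nxt)
    return out

# A tone with a period of 40 samples, read twice as fast.
tone = [math.sin(2 * math.pi * n / 40 + 0.3) for n in range(1000)]
fast = resample_linear(tone, 2.0)
# `fast` has half as many samples and a period of 20 samples, so at the
# original playback rate it lasts half as long and sounds an octave higher.
```

Duration and pitch move together here: there is no choice of `speed` that changes one without the other.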
Phase vocoder
{{main|Phase vocoder}}
One way of stretching the length of a signal without affecting the pitch is to build a phase vocoder after Flanagan, Golden, and Portnoff.
Basic steps:
#compute the instantaneous frequency/amplitude relationship of the signal using the short-time Fourier transform (STFT), which is the discrete Fourier transform of a short, overlapping and smoothly windowed block of samples;
#apply some processing to the Fourier transform magnitudes and phases; and
#perform an inverse STFT by taking the inverse Fourier transform on each chunk and adding the resulting waveform chunks.
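The three steps above can be sketched for the time-stretching case. This is an illustrative outline under stated assumptions, not a reference implementation: the function names, the Hann window, the frame size of 64 and the analysis hop of 16 are all choices made here, and the naive O(N²) DFT is used only to keep the example free of external libraries.

```python
import cmath
import math

def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def wrap(phase):
    """Map a phase to its principal value in [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi

def phase_vocoder_stretch(x, stretch, n_fft=64, hop_a=16):
    """Stretch x in time by `stretch` while keeping its frequencies."""
    hop_s = int(round(hop_a * stretch))          # synthesis hop
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]
    n_frames = (len(x) - n_fft) // hop_a + 1
    out = [0.0] * ((n_frames - 1) * hop_s + n_fft)
    norm = [0.0] * len(out)
    prev_phase = [0.0] * n_fft
    synth_phase = [0.0] * n_fft
    for m in range(n_frames):
        # Step 1: STFT of a smoothly windowed, overlapping block.
        spec = dft([x[m * hop_a + i] * win[i] for i in range(n_fft)])
        # Step 2: keep magnitudes, re-time the phases for the new hop.
        new_spec = []
        for k, c in enumerate(spec):
            mag, phase = abs(c), cmath.phase(c)
            if m == 0:
                synth_phase[k] = phase
            else:
                omega = 2 * math.pi * k / n_fft  # bin centre, rad/sample
                dev = wrap(phase - prev_phase[k] - omega * hop_a)
                inst_freq = omega + dev / hop_a  # instantaneous frequency
                synth_phase[k] += inst_freq * hop_s
            prev_phase[k] = phase
            new_spec.append(cmath.rect(mag, synth_phase[k]))
        # Step 3: inverse transform, window again, overlap-add.
        y = idft(new_spec)
        for i in range(n_fft):
            out[m * hop_s + i] += y[i] * win[i]
            norm[m * hop_s + i] += win[i] ** 2
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]

tone = [math.sin(2 * math.pi * 0.125 * n + 0.3) for n in range(512)]
stretched = phase_vocoder_stretch(tone, 2.0)
```

In this sketch the stretched tone is roughly twice as long as the input but keeps the same period of 8 samples. The division by the accumulated squared window compensates for the window being applied at both analysis and synthesis.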
The phase vocoder handles sinusoid components well, but early implementations introduced considerable smearing on transient ("beat") waveforms at all non-integer compression/expansion rates, which renders the results phasey and diffuse. Recent improvements allow better-quality results at all compression/expansion ratios, but a residual smearing effect still remains.
The phase vocoder technique can also be used to perform pitch shifting, chorusing, timbre manipulation, harmonizing, and other unusual modifications, all of which can be changed as a function of time.
Time domain
Rabiner and Schafer in 1978 put forth an alternative solution that works in the time domain: attempt to find the period (or equivalently the fundamental frequency) of a given section of the wave using some pitch detection algorithm (commonly the peak of the signal's autocorrelation, or sometimes cepstral processing), and crossfade one period into another.
This is called time domain harmonic scaling or the synchronized overlap-add method. It performs somewhat faster than the phase vocoder on slower machines but fails when the autocorrelation mis-estimates the period of a signal with complicated harmonics (such as orchestral pieces).
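The two ingredients described above, period estimation by autocorrelation and a crossfade between neighbouring periods, can be sketched as follows. The function names and the lag search range are hypothetical, and the code handles only a single splice on a steady tone rather than a full overlap-add scheme.

```python
import math

def estimate_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    best_lag, best_score = min_lag, -float("inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(x) - lag
        score = sum(x[i] * x[i + lag] for i in range(n)) / n
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def drop_one_period(x, start, period):
    """Shorten x by one period, crossfading one period into the next."""
    faded = []
    for i in range(period):
        w = i / period                       # linear fade from 0 toward 1
        faded.append((1 - w) * x[start + i] + w * x[start + period + i])
    return x[:start] + faded + x[start + 2 * period:]

# A steady tone with one overtone and a period of 50 samples.
tone = [math.sin(2 * math.pi * n / 50) + 0.5 * math.sin(4 * math.pi * n / 50)
        for n in range(1000)]
period = estimate_period(tone, 20, 80)
shorter = drop_one_period(tone, 100, period)
```

Because the splice removes exactly one period, the shortened tone keeps its pitch. If the period estimate is wrong, the crossfade blends mismatched waveforms, which is the failure mode noted above for signals with complicated harmonics.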
Cool Edit Pro seems to solve this by looking for the period closest to a center period that the user specifies, which should be an integer multiple of the tempo, and between 30 Hz and the lowest bass frequency. For a 120 bpm tune, use 48 Hz because 48 Hz = 2,880 cycles/minute = 24 cycles/beat * 120 bpm.
The time-domain approach is much more limited in scope than phase-vocoder-based processing, but it can be made much less processor intensive, which suits real-time applications. It provides the most coherent results for single-pitched sounds like voice or musically monophonic instrument recordings.
High-end commercial audio processing packages either combine the two techniques (for example by separating the signal into sinusoid and transient waveforms), or use other techniques based on the wavelet transform or artificial neural network processing, producing the highest-quality time stretching.
Pitch scaling
These techniques can also be used to scale the pitch of an audio sample while holding speed or duration constant.
Note that the technique can be called '''pitch scaling''' or '''pitch shifting''', depending on perspective. Under one definition of musical pitch, pitch is defined as the logarithm of frequency; as the musical pitch is ''shifted'' linearly (shifting every note up the scale by a perfect fifth, for instance), the frequencies of the signal are actually being ''scaled'', because of the logarithmic relationship between the notes we hear and the actual frequencies of those notes. A ''frequency shift'', which can be performed by single-sideband modulation (a form of amplitude modulation), does not preserve the ratios of the harmonic frequencies that determine the sound's timbre, and is not a "musical" transformation. Similarly, a literal ''pitch scaling'', in which the musical pitch is scaled (a higher note would be shifted by a greater interval than a lower note), is highly unusual, and not musical. However, "pitch" can also be used to refer to frequency, and the other two transformations are not commonly used, so either term usually refers to ''musical'' pitch shifting.
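The difference between scaling and shifting frequencies can be checked numerically. The sketch below assumes twelve-tone equal temperament, where a perfect fifth is seven semitones; the function names and the example harmonics are illustrative only.

```python
def musical_pitch_shift(freqs, semitones):
    """Shift musical pitch: every frequency is scaled by the same ratio.

    Assumes twelve-tone equal temperament (ratio 2 ** (1 / 12) per semitone).
    """
    ratio = 2 ** (semitones / 12)
    return [f * ratio for f in freqs]

def frequency_shift(freqs, offset_hz):
    """Literal frequency shift: the same offset in hertz is added to each."""
    return [f + offset_hz for f in freqs]

def harmonic_ratios(freqs):
    """Each frequency relative to the lowest one in the list."""
    return [f / freqs[0] for f in freqs]

harmonics = [220.0, 440.0, 660.0]            # a fundamental and two overtones

up_a_fifth = musical_pitch_shift(harmonics, 7)
shifted = frequency_shift(harmonics, 110.0)

scaled_ratios = harmonic_ratios(up_a_fifth)  # stays close to 1 : 2 : 3
shifted_ratios = harmonic_ratios(shifted)    # 330, 550, 770 Hz: no longer 1 : 2 : 3
```

The scaled version keeps the overtones at integer multiples of the fundamental, so the timbre is preserved; the shifted version does not.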
Time domain processing works much better here, as smearing is less noticeable, but scaling vocal samples distorts the formants into a sort of Alvin and the Chipmunks-like effect, which may be desirable or undesirable.
To preserve the formants and character of a voice, you can analyze the signal with a channel vocoder or linear predictive coding (LPC) vocoder plus any of several pitch detection algorithms and then resynthesize it at a different fundamental frequency.
See also
* Audio signal processing
* Sound effects
External links
* The Phase Vocoder: A Tutorial - A good description of the phase vocoder
* New Phase-Vocoder Techniques for Pitch-Shifting, Harmonizing and Other Exotic Effects
* A new Approach to Transient Processing in the Phase Vocoder
* PSOLA Synthesis, [http://www.ee.columbia.edu/~dpwe/papers/HejMus91-solafs.pdf SOLAFS Synthesis] - Two specific methods of time domain harmonic scaling (TDHS) or synchronous overlap-add (SOLA) processing.
* Audio Engineering Society
* Original E2 article (http://everything2.com/index.pl?node_id=1074923)
* http://www.dspdimension.com/html/timepitch.html
* http://www.bdti.com/faq/dsp_faq.htm - comp.dsp FAQ
* SoundTouch library - An open-source implementation of time/pitch scaling algorithms.
* http://www.dspteam.com/ - Wide-range real-time-stretching (Procrustes) and resampling (Sylea) libraries with FPU/3DNow!/SSE versions.