Musical Sound Modeling with Sinusoids plus Noise

by Xavier Serra
Musical Signal Processing (1997)

Abstract

When generating musical sound on a digital computer, it is important to have a good model whose parameters provide a rich source of meaningful sound transformations. Three basic model types are in prevalent use today for musical sound generation: instrument models, spectrum models, and abstract models. Instrument models attempt to parametrize a sound at its source, such as a violin, clarinet, or vocal tract. Spectrum models attempt to parametrize a sound at the basilar membrane of the ear, discarding whatever information the ear seems to discard in the spectrum. Abstract models, such as FM, attempt to provide musically useful parameters in an abstract formula. This article addresses the second category of synthesis techniques: spectrum modeling. The main advantage of this group of techniques is the existence of analysis procedures that extract the synthesis parameters from real sounds, making it possible to reproduce and modify actual sounds. Our particular approach models sounds as stable sinusoids (partials) plus noise (residual component); sounds are analyzed with this model and new sounds are generated from the analyzed data. The analysis procedure detects partials by studying the time-varying spectral characteristics of a sound and represents them with time-varying sinusoids. These partials are then subtracted from the original sound and the remaining "residual" is represented as a time-varying filtered white noise component. The synthesis procedure is a combination of additive synthesis for the sinusoidal part and subtractive synthesis for the noise part.


Xavier Serra
Audiovisual Institute, Pompeu Fabra University
Rambla 31, 08002 Barcelona, Spain
URL: http://www.iua.upf.es
email: xserra@iua.upf.es

[Published in C. Roads, S. Pope, A. Piccialli, G. De Poli, editors. 1997. “Musical Signal
Processing”. Swets & Zeitlinger Publishers]

1. Introduction
When generating musical sound on a digital computer, it is important to have a good model whose
parameters provide a rich source of meaningful sound transformations. Three basic model types are in
prevalent use today for musical sound generation: instrument models, spectrum models, and abstract
models. Instrument models attempt to parametrize a sound at its source, such as a violin, clarinet, or
vocal tract. Spectrum models attempt to parametrize a sound at the basilar membrane of the ear,
discarding whatever information the ear seems to discard in the spectrum. Abstract models, such as
FM, attempt to provide musically useful parameters in an abstract formula.
This article addresses the second category of synthesis techniques: spectrum modeling. The main
advantage of this group of techniques is the existence of analysis procedures that extract the synthesis
parameters from real sounds, making it possible to reproduce and modify actual sounds. Our particular
approach models sounds as stable sinusoids (partials) plus noise (residual component); sounds are
analyzed with this model and new sounds are generated from the analyzed data. The
analysis procedure detects partials by studying the time-varying spectral characteristics of a sound and
represents them with time-varying sinusoids. These partials are then subtracted from the original
sound and the remaining “residual” is represented as a time-varying filtered white noise component.
The synthesis procedure is a combination of additive synthesis for the sinusoidal part and subtractive
synthesis for the noise part.
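As a rough illustration of the synthesis side only, the following Python sketch adds a bank of sinusoids (additive part) to white noise shaped by a simple filter (subtractive part). It is a minimal sketch under stated assumptions, not the system described in this article: the partials have constant frequency and amplitude rather than time-varying ones, the noise filter is a fixed one-pole low-pass, and the function name, the filter coefficients, and the default sampling rate are all hypothetical.

```python
import math
import random


def sinusoids_plus_noise(partials, noise_gain, num_samples, sample_rate=8000, seed=0):
    """Return samples of a static sinusoids-plus-noise tone.

    partials: list of (frequency_hz, amplitude) pairs (constant over time here;
    the model in the text uses time-varying values).
    """
    rng = random.Random(seed)
    smoothed = 0.0
    samples = []
    for n in range(num_samples):
        t = n / sample_rate
        # Additive part: sum of sinusoids, one per partial.
        deterministic = sum(a * math.sin(2.0 * math.pi * f * t) for f, a in partials)
        # Subtractive part: white noise shaped by a fixed one-pole low-pass
        # filter (an illustrative stand-in for a time-varying filter).
        white = rng.uniform(-1.0, 1.0)
        smoothed = 0.9 * smoothed + 0.1 * white
        samples.append(deterministic + noise_gain * smoothed)
    return samples
```

For example, `sinusoids_plus_noise([(440.0, 0.5), (880.0, 0.25)], 0.1, 8000)` would produce one second of a two-partial tone with a small amount of filtered noise.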
This analysis/synthesis strategy can be used either for generating sounds (synthesis) or for transforming
pre-existing ones (sound processing). To synthesize sounds we generally want to model an entire
timbre family, i.e., an instrument. This can be done by analyzing single tones and isolated note
transitions performed on the instrument and building a database that characterizes the whole
instrument, or any desired timbre family, from which new sounds are synthesized. In the sound
processing application the goal is to manipulate any given sound, without being restricted to
isolated tones and without requiring a previously built database of analyzed data.
Some of the intermediate results from this analysis/synthesis scheme, and some of the techniques
developed for it, can also be applied to other music-related problems, e.g., sound compression, sound
source separation, musical acoustics, music perception, and performance analysis, but a discussion of
these topics is beyond the scope of the current presentation.
2. Background
Additive synthesis is the original spectrum modeling technique. It is rooted in Fourier’s theorem,
which states that any periodic waveform can be modeled as a sum of sinusoids at various amplitudes
and harmonic frequencies. Additive synthesis was among the first synthesis techniques in computer
music. In fact, it was described extensively in the very first article of the very first issue of the
Computer Music Journal (Moorer, 1977).
In the early 1970s, Andy Moorer developed a series of analysis programs to support additive
synthesis. He first used the “heterodyne filter” to measure the instantaneous amplitude and frequency
of individual sinusoids (Moorer, 1973). The heterodyne filter implements a single frequency bin of the
Discrete Fourier Transform (DFT), using the rectangular window. The magnitude and phase derivative
of the complex numbers produced by the sliding DFT bin provided instantaneous amplitude and
frequency estimates. The next implementation (Moorer, 1978) was based on the Digital Phase
Vocoder (Portnoff, 1976). In this system, the fast Fourier transform (FFT) was used to provide,
effectively, a heterodyne filter at each harmonic of the fundamental frequency. The use of a
non-rectangular window gave better isolation among the spectral components.
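The heterodyne idea described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Moorer's implementation: the function names, the hop size, and the use of two analysis positions are hypothetical choices. The signal is multiplied by a complex exponential at the analysis frequency and averaged over a rectangular window (one DFT bin); the magnitude gives an amplitude estimate and the phase advance between two window positions gives a frequency estimate.

```python
import cmath
import math


def heterodyne_bin(signal, center_hz, sample_rate, window_len, start):
    """One rectangular-window DFT bin centred on center_hz, starting at `start`."""
    acc = 0j
    for m in range(window_len):
        n = start + m
        acc += signal[n] * cmath.exp(-2j * math.pi * center_hz * n / sample_rate)
    return acc / window_len


def estimate_amp_freq(signal, center_hz, sample_rate, window_len, hop):
    """Estimate amplitude and frequency of a sinusoid near center_hz.

    Needs at least window_len + hop samples. The frequency estimate is
    unambiguous only while |f - center_hz| < sample_rate / (2 * hop).
    """
    x0 = heterodyne_bin(signal, center_hz, sample_rate, window_len, 0)
    x1 = heterodyne_bin(signal, center_hz, sample_rate, window_len, hop)
    # A real sinusoid of amplitude A contributes about A/2 to the bin.
    amplitude = 2.0 * abs(x1)
    # Phase advance between the two positions, wrapped to (-pi, pi].
    dphi = cmath.phase(x1 * x0.conjugate())
    frequency = center_hz + dphi * sample_rate / (2.0 * math.pi * hop)
    return amplitude, frequency
```

Note that the amplitude estimate is exact only when the sinusoid sits at the analysis frequency; away from it, the estimate is attenuated by the frequency response of the rectangular window.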
The main problem with the phase vocoder was that inharmonic sounds, or sounds with time-varying
frequency characteristics, were difficult to analyze. The FFT can be regarded as a fixed filter bank or
“graphic equalizer”: If the size of the FFT is N, then there are N narrow bandpass filters, slightly
overlapping, equally spaced between 0 Hz and the sampling rate. In the phase vocoder, the
instantaneous amplitude and frequency are computed only for each “channel filter” or “bin.” A
consequence of using a fixed-frequency filter bank is that the frequency of each sinusoid is not
normally allowed to vary outside the bandwidth of its channel, unless one is willing to combine
channels in some fashion which requires extra work. (The channel bandwidth is nominally the
sampling rate divided by the FFT size.) Also, the analysis system was really set up for harmonic
signals: you could analyze a piano if you had to, but the progressive sharpening of the partials meant
that there would be frequencies where a sinusoid would be in the crack between two adjacent FFT
bins. This was not an insurmountable condition (the adjacent bins could be combined intelligently to
provide accurate amplitude and frequency envelopes), but it was inconvenient and outside the original
scope of the analysis framework of the phase vocoder.
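The arithmetic behind this limitation can be sketched in Python. The channel bandwidth is the sampling rate divided by the FFT size, as stated above; everything else is an assumption made for illustration. In particular, the stiff-string formula used to model the "progressive sharpening" of piano partials and its inharmonicity coefficient are not taken from this article, and the function names are hypothetical.

```python
import math


def channel_bandwidth(sample_rate, fft_size):
    """Nominal channel (bin) bandwidth in Hz: sampling rate / FFT size."""
    return sample_rate / fft_size


def bin_offset(freq_hz, sample_rate, fft_size):
    """Distance from freq_hz to the nearest bin centre, in bins.

    A value near 0 means the sinusoid sits at a bin centre; a magnitude
    near 0.5 means it falls in the "crack" between two adjacent bins.
    """
    position = freq_hz / channel_bandwidth(sample_rate, fft_size)
    return position - round(position)


def stretched_partial(k, fundamental_hz, inharmonicity):
    """Frequency of the k-th partial of a stiff string.

    Assumption (not from the article): f_k = k * f0 * sqrt(1 + B * k^2),
    with B a small, instrument-dependent inharmonicity coefficient.
    """
    return k * fundamental_hz * math.sqrt(1.0 + inharmonicity * k * k)
```

With a fundamental placed exactly on a bin centre, `bin_offset(k * f0, ...)` stays at zero for a perfectly harmonic series, whereas `bin_offset(stretched_partial(k, f0, B), ...)` drifts away from zero as `k` grows, which is the situation the paragraph describes.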
In the mid-1980s Julius Smith developed the program PARSHL for the purpose of supporting
inharmonic and pitch-changing sounds (Smith and Serra, 1987). PARSHL was a simple application of
FFT peak-tracking technology commonly used in the Navy signal processing community (General
Electric, 1977; Wolcin, 1980a; 1980b; Smith and Friedlander, 1984). As in the phase vocoder, a series
of FFT frames is computed by PARSHL. However, instead of writing out the magnitude and phase
derivative of each bin, the FFT is searched for peaks, and the largest peaks are “tracked” from frame to
frame. The principal difference in the analysis is the replacement of the phase derivative in each FFT
bin by interpolated magnitude peaks across FFT bins. This approach is better suited for analysis of
inharmonic sounds and pseudo-harmonic sounds with important frequency variation in time.
Independently at about the same time, Quatieri and McAulay developed a technique similar to
PARSHL for analyzing speech (McAulay and Quatieri, 1984; 1986). Both systems were built on top
of the short-time Fourier transform (Allen, 1977).
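The peak-detection step can be sketched in Python as follows. This is a minimal sketch, not the PARSHL code: it assumes a magnitude spectrum already expressed in dB, and it uses three-point parabolic interpolation around each local maximum, which is one common way to interpolate magnitude peaks across FFT bins. The function name and the threshold parameter are hypothetical, and frame-to-frame tracking of the peaks is not shown.

```python
def interpolated_peaks(mag_db, threshold_db):
    """Find local maxima in a dB magnitude spectrum and refine them.

    Returns a list of (bin_position, height_db) pairs, where bin_position
    is fractional. Multiply it by sample_rate / fft_size to get Hz.
    """
    peaks = []
    for k in range(1, len(mag_db) - 1):
        a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
        if b > threshold_db and b > a and b >= c:
            # Fit a parabola through the three points and locate its vertex.
            # The denominator is strictly negative because b > a and b >= c.
            p = 0.5 * (a - c) / (a - 2.0 * b + c)
            peaks.append((k + p, b - 0.25 * (a - c) * p))
    return peaks
```

Because the refined position is fractional, a sinusoid that lies between two bin centres can still be located, which is why this kind of analysis copes with inharmonic partials better than a per-bin phase derivative.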
The PARSHL program worked well for most sounds created by simple physical vibrations or driven
periodic oscillations. It went beyond the phase vocoder to support spectral modeling of inharmonic
sounds. A problem with PARSHL, however, was that noise-like signals, such as the attack of many
instrumental sounds, were unwieldy to represent. Using sinusoids to simulate noise is extremely
expensive because, in principle, noise consists of sinusoids at every frequency within the band limits.
Also, modeling noise with sinusoids does not yield a flexible sound representation useful for music
applications. Therefore the next natural step to take in spectral modeling of musical sounds was to
represent sinusoids and noise as two separate components (Serra, 1989; Serra and Smith, 1990).