
Freesound Radio: supporting music creation by exploration of a sound database

by Gerard Roma, Perfecto Herrera, Xavier Serra
on Computational Creativity Support (2009)

Abstract

The habit of sharing media online has created a platform with great potential for creative applications that are accessible to large numbers of users with very different backgrounds. As an example, a lively community has grown around Freesound.org to share sound files typically to be reused in music and multimedia content. However, in order to fully realize this potential, new interfaces are needed beyond concept searching to discover interesting multimedia content. We describe Freesound Radio, an experimental environment that allows users to collectively explore the content in Freesound.org by listening to combinations of sounds represented using a graph data structure. Users can create new combinations from scratch or from existing ones. A continuous supply of potential combinations is provided by a genetic algorithm for the radio to play.

Gerard Roma
Music Technology Group
Universitat Pompeu Fabra
gerard.roma@upf.edu
Perfecto Herrera
Music Technology Group
Universitat Pompeu Fabra
perfecto.herrera@upf.edu
Xavier Serra
Music Technology Group
Universitat Pompeu Fabra
xavier.serra@upf.edu
INTRODUCTION
Media sharing has become one of the most prominent uses of the internet. Unlike traditional entertainment, media sharing is driven by users who enjoy uploading, commenting on and rating all kinds of multimedia content. Started as an academic research project, Freesound.org has become one of the most widely used sites for sharing sound files under a Creative Commons (CC) license. Sharing sounds, however, has different implications than sharing other media such as video or images. Since the early experiments of musique concrète in the 60s [12], sound recordings have been commonly used as building blocks for musical and multimedia compositions. As computers have gained a central position in music and sound production, the practice of reusing sound recordings has become standard. Hence, sound sharing is carried out not only for the sake of sharing experiences and opinions, but also because of the possibilities for recombining the sound materials. Since the interest of a given sound may not be directly related to its source (which may not even be recognizable in the final mix), text-based search and retrieval may limit the use of large sound databases for creative applications.
A good deal of research has been done on content-based retrieval to support creative uses of sound databases. Musical mosaicing [15] refers to a way of building music pieces by specifying a target using sound descriptors (typically extracted from an existing piece or performance). The target is then used as a set of constraints to automatically retrieve the samples that realize the piece. Concatenative sound synthesis improves on this concept by borrowing from speech synthesis the practice of modifying the retrieved samples to improve the sound quality of the result [13]. The problem with this approach for retrieving sounds from a real-world database is the requirement that the target be specified in advance, which often doesn't match popular practices in dealing with sound samples. For instance, using descriptors to visually browse the database may require specialized knowledge that can't be assumed of internet visitors. In addition, the ability to automatically describe units in the database typically relies heavily on a process that segments audio into small units, through which the database is artificially built. Hence, the problem of retrieving sounds from an audio database collaboratively built by internet visitors may benefit from a different approach.
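As a rough illustration of the target-driven retrieval described above, the following sketch picks, for each frame of a target, the database sample whose descriptor vector is closest. It is a minimal sketch under stated assumptions: the sample names, the two descriptors and their values are made up, and real mosaicing systems use richer descriptors and extra constraints such as continuity between consecutive units.

```python
import math

def nearest_sample(target, database):
    """Return the name of the sample whose descriptors are closest to target."""
    return min(database, key=lambda name: math.dist(target, database[name]))

def mosaic(target_frames, database):
    """Pick one database sample per target frame (no continuity constraints)."""
    return [nearest_sample(frame, database) for frame in target_frames]

# Hypothetical descriptor vectors: (pitch in kHz, loudness in 0..1).
database = {
    "kick.wav": (0.06, 0.9),
    "flute_a4.wav": (0.44, 0.5),
    "hihat.wav": (0.80, 0.3),
}

# The target has to exist before anything can be retrieved.
target_frames = [(0.05, 1.0), (0.45, 0.4), (0.75, 0.2)]
sequence = mosaic(target_frames, database)
```

Note that nothing can be retrieved until `target_frames` has been written down, which is the limitation discussed above.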
The term creativity has traditionally been used to refer to unexpected events with regard to human actions. The idea that something is created out of nothing is usually a way to stress a disconnection between cause and effect, and the inability to offer a rational explanation. Nevertheless, attempts have been made to understand and describe creativity in computational terms. One of the best known is due to Boden [3], who explained personal creativity as an exploration of a conceptual space where unusual generative rules are used to find concepts that are novel and valuable. This perspective may be useful when looking at the problem of creative exploration of a sound database: as we have seen, on many occasions sounds are sought to be used as building blocks. Often, the fitness of a given sound will only be known when listening to it in a given context. In this sense, it may be worth exploring the space of potential combinations of sounds. For creativity support tools, the decision on value and novelty may be left to the user. On the other hand, further research has tended to consider creativity more as a social phenomenon than a product of individual minds. Computer programs that support social creativity may then be thought of as ecosystems that allow things to emerge from the interaction between different agents. One possible framework was proposed by Kosorukoff [8] with the definition of Human Based Genetic Algorithms (HBGA) as evolutionary algorithms where all genetic operations may be performed by users. Hybrid systems may balance the workload between human and computational agents.
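The balance between human and computational agents can be pictured with a small sketch. It is an illustration only, not Kosorukoff's formulation or the Freesound Radio implementation. It assumes that a genome is a plain list of sound ids, that users contribute offspring through a queue, and that simple automatic operators fill in when no human contribution is waiting. All function and parameter names are hypothetical.

```python
import random
from collections import deque

def auto_crossover(a, b, rng):
    """Computational agent: one-point crossover of two lists of sound ids."""
    cut = rng.randint(1, min(len(a), len(b)) - 1)  # genomes need >= 2 items
    return a[:cut] + b[cut:]

def auto_mutate(genome, sound_pool, rng):
    """Computational agent: replace one randomly chosen sound id."""
    child = list(genome)
    child[rng.randrange(len(child))] = rng.choice(sound_pool)
    return child

def next_generation(population, human_offspring, sound_pool, size, rng):
    """Build a new population, preferring offspring submitted by users.

    human_offspring is a deque of genomes created by hand; when it runs
    dry, the computational operators supply the remaining individuals.
    """
    new_population = []
    while len(new_population) < size:
        if human_offspring:
            new_population.append(human_offspring.popleft())
        else:
            a, b = rng.sample(population, 2)
            child = auto_crossover(a, b, rng)
            new_population.append(auto_mutate(child, sound_pool, rng))
    return new_population

rng = random.Random(0)
population = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
human = deque([[1, 5, 9]])  # one combination edited by a user
generation = next_generation(
    population, human, sound_pool=list(range(1, 10)), size=4, rng=rng
)
```

In this sketch the human contribution always takes precedence; a real hybrid system could weight the two kinds of agents differently.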
The Freesound Radio project (footnote 1) was developed to provide an alternative to the Freesound.org interface by creating a continuous sound stream. Instead of visual browsing, the radio allows users to explore the database by listening to it. Instead of playing individual sounds, it generates potential combinations of them, defined by a graph data structure. Visitors can create and share such combinations, which are used to seed a genetic algorithm that continuously generates new ones. All sounds used can be identified and downloaded. In the following sections we describe the main problems encountered in the development of this system and the solutions provided so far.
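To make the idea of a combination defined by a graph more concrete, here is a minimal sketch. The semantics are an assumption, since this section does not specify them: nodes are sounds, and an edge from A to B means that B starts when A finishes. The sound names, the durations and the `schedule` function are hypothetical.

```python
import heapq

def schedule(durations, edges, root, max_time):
    """List (start_time, sound_id) events for a combination graph.

    Assumed semantics (not taken from the paper): when a sound finishes,
    every sound it points to starts playing. Cycles therefore create
    loops, so playback is cut off at max_time seconds. Durations must be
    positive, otherwise a cycle would never advance in time.
    """
    events = []
    queue = [(0.0, root)]  # min-heap ordered by start time
    while queue:
        start, sound = heapq.heappop(queue)
        if start >= max_time:
            break  # every remaining event starts even later
        events.append((start, sound))
        end = start + durations[sound]
        for successor in edges.get(sound, []):
            heapq.heappush(queue, (end, successor))
    return events

# Hypothetical combination: a drum loop that repeats and triggers a
# vocal sample, which in turn triggers a rain recording.
durations = {"loop": 2.0, "vocal": 1.5, "rain": 4.0}
edges = {"loop": ["loop", "vocal"], "vocal": ["rain"]}
events = schedule(durations, edges, "loop", max_time=5.0)
```

Under these assumptions a genetic algorithm could create variations by swapping the sound attached to a node or by rewiring edges, while the identifiers of all sounds in the graph remain available for lookup and download.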
RELATED WORK
Several network music projects have investigated the potential of the web for collective creativity. Because of the intrinsic delay in internet communications and the usage patterns of the web, which involve users in different time zones, some research has focused on asynchronous sharing of musical creations. For example, in [7] the music for a theatre piece was composed by internet visitors using FMOL, a custom synthesis engine. Users could start from existing compositions to create their own. In [9] the concept of music prototyping was proposed as a way to accommodate different levels of expertise in the asynchronous creation of musical pieces. On the other hand, many projects have explored the potential of concurrent activity to drive or create a shared musical process, although this often requires relaxing synchronicity requirements in comparison to traditional music creation tools. Daisyphone [4] allowed a group of users to concurrently modify a musical sequence displayed as a circle and played in a loop. CC-Remix [14] explored online remixing of loops extracted from a single CD of songs released under a CC license.
The advantage of our approach over using a synthesis engine or a MIDI player is that using samples allows visitors without any musical or specialized training to participate. On the other hand, the size and variety of the database are very important in order to provide a greater vocabulary to people with different tastes. However, large databases bring the need for appropriate description and organization. We now describe the approach followed in Freesound Radio for navigating sounds of many different kinds.
ORGANIZATION OF GENERAL AUDIO FOR CREATIVE APPLICATIONS
One important aspect of the creative use of sounds is the possibility of crossing established boundaries and using recordings of different types in unintended ways. Thus, in current popular music it is an established practice to use recordings from very different domains: voice recordings, environmental sounds or fragments of existing music are commonly reinterpreted by putting them in new auditory contexts. The Freesound.org database is actually a good example of this tendency towards eclectic use of sounds. As of January 2009, the site contains more than 60,000 sounds of the most diverse nature, with the only common denominator that they are expected to be reused in some new context. Several different cultures can be identified on the site: many users upload sounds recorded from the environment with different levels of expertise and technical sophistication, other users upload sounds created with their synthesizers or computer programs, and a number of users upload voice recordings, often upon request. Applications of these sounds range from media products (video, games) to mobile phone ring tones or music.
Footnote 1: http://radio.freesound.org
This diversity poses the problem of how to organize sounds. Our aim of supporting exploration of the database can greatly benefit from a measure of similarity between sounds, as well as some categorization. For an application based on collaborative media sharing, it seems reasonable that any such categorization should not be fixed in advance, but should emerge from community activity. Just as in other social media sites, probably the most prominent feature used to describe and retrieve sounds in Freesound.org is the tag folksonomy. Although rather noisy, tag folksonomies are an important source of information because of the agreement they represent in using the same concepts to describe sounds [5]. In this sense, Music Information Retrieval (MIR) methodologies used for automatic classification of music, typically based on machine learning algorithms that leverage automatic extraction of content descriptors, may also be used for creative organization of musical building blocks.
An initial step in this direction was taken in the development of Freesound Radio in order to identify a set of basic categories for which different descriptors may be used. As an example, a monophonic pitch extractor may be used to describe a single piano note, a tempo induction mechanism may be more useful for a drum loop, and a detailed description of timbre may be needed for an environmental sound. For the task of telling apart a set of basic categories we use a decision tree learning algorithm, so that the resulting models are understandable and may be developed and refined. We use some popular tags to train the algorithm. For example, 'note' and 'noise' are used on sounds with a single onset to tell apart pitched and unpitched ones, and 'drumloop' versus 'rain' to identify rhythmic sounds among those with multiple onsets. This technique can be extended to other tags and sound families in order to analyze and understand the relationships between concepts employed by users and automatic content descriptors. For simplicity, we use the Euclidean distance, which is commonly used with sound segments (e.g. in [6]), to calculate the sounds most similar to a given one within its class, using an appropriate set of descriptors for that class.
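The two steps described above can be sketched as follows. This is a simplified illustration rather than the system's actual code. The decision tree is reduced to a single learned threshold (a one-level tree, or stump) on one descriptor, and the descriptor names (`pitch_salience`, `pitch`, `brightness`), their values and the sound names are hypothetical.

```python
import math

def learn_stump(examples, feature):
    """Learn a one-level decision tree on `feature` from tagged examples.

    Tries every midpoint between consecutive observed values and keeps
    the threshold and pair of tags with the fewest misclassifications.
    Returns (threshold, tag_if_low, tag_if_high).
    """
    values = sorted({d[feature] for d, _ in examples})
    tags = sorted({tag for _, tag in examples})
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values")
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        for low_tag in tags:
            for high_tag in tags:
                errors = sum(
                    tag != (low_tag if d[feature] <= threshold else high_tag)
                    for d, tag in examples
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, low_tag, high_tag)
    return best[1:]

def classify(descriptors, stump, feature):
    """Apply the learned threshold to one sound."""
    threshold, low_tag, high_tag = stump
    return low_tag if descriptors[feature] <= threshold else high_tag

def most_similar(query, candidates, features):
    """Rank sounds of one class by Euclidean distance over `features`."""
    def distance(item):
        _, descriptors = item
        return math.dist(
            [query[f] for f in features], [descriptors[f] for f in features]
        )
    return [name for name, _ in sorted(candidates.items(), key=distance)]

# Single-onset sounds tagged by users (made-up descriptor values).
training = [
    ({"pitch_salience": 0.9}, "note"),
    ({"pitch_salience": 0.8}, "note"),
    ({"pitch_salience": 0.2}, "noise"),
    ({"pitch_salience": 0.3}, "noise"),
]
stump = learn_stump(training, "pitch_salience")

query = {"pitch_salience": 0.95, "pitch": 61, "brightness": 0.4}
category = classify(query, stump, "pitch_salience")

# Within the 'note' class, compare only descriptors suited to that class.
notes = {
    "piano_c4": {"pitch": 60, "brightness": 0.4},
    "piano_d4": {"pitch": 62, "brightness": 0.5},
    "flute_c6": {"pitch": 84, "brightness": 0.7},
}
ranking = most_similar(query, notes, ["pitch", "brightness"])
```

A learned threshold of this kind can be read and adjusted by hand, which is the practical benefit of choosing decision trees. In practice the descriptors would also need to be brought to comparable scales before computing distances, since here the pitch value dominates the result.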
