

Method and apparatus of synthesizing plucked string instruments using recurrent neural networks 
6292791 


Patent Drawings: 
(27 images) 

Inventor: 
Su, et al. 
Date Issued: 
September 18, 2001 
Application: 
09/097,739 
Filed: 
June 16, 1998 
Inventors: 
Chung; Tien-Ho (Hsin-Tien, TW), Liang; Sheng-Fu (Tainan, TW), Su; Wen-Yu (Chang-Hwa, TW)

Assignee: 
Industrial Technology Research Institute (Hsinchu, TW) 
Primary Examiner: 
Powell; Mark 
Assistant Examiner: 
Starks; Wilbert 
Attorney Or Agent: 
Finnegan, Henderson, Farabow, Garrett & Dunner, L.L.P. 
U.S. Class: 
706/15; 706/16; 706/23 
Field Of Search: 
84/601; 84/603; 84/604; 706/23 
International Class: 

U.S Patent Documents: 
4984276; 5138924; 5212334; 5308915; 5448010; 5986199 
Foreign Patent Documents: 

Other References: 
Liang, S.F. and Su, Alvin W.Y., "Dynamics Modeling of Musical String by Linear Scattering Recurrent Network," Joint Conference of 1996 International Computer Symposium, Dec. 19-21, ICS '96 Proceedings on Artificial Intelligence, Taiwan, 1996, pp. 263-270.
Su, A.W.Y. and Liang, Sheng-Fu, "Synthesis of plucked-string tones by physical modeling with recurrent neural networks," IEEE First Workshop on Multimedia Signal Processing, 1997, pp. 71-76.
Liang, Sheng-Fu, Su, A.W.Y., and Lin, Chin-Teng, "Model-based synthesis of plucked string instruments by using a class of scattering recurrent networks," IEEE Transactions on Neural Networks, Vol. 11, No. 1, Jan. 2000, pp. 171-185.
Bresin, R. and Vedovetto, A., "Neural networks for a simpler control of synthesis algorithm of musical tones and for their compression," Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, 1994, p. 628.
Burr, D.J. and Miyata, Y., "Hierarchical recurrent networks for learning musical structure," Neural Networks for Signal Processing III: Proceedings of the 1993 IEEE-SP Workshop, 1993, pp. 216-225.
Helweg, D.A., Roitblat, H.L., and Nachtigall, P.E., "Using a biomimetic neural net to model dolphin echolocation," Proceedings of the First New Zealand International Two-Stream Conference on Artificial Neural Networks and Expert Systems, 1993, p. 247.
Masri, P. and Canagarajah, N., "Synthesis from musical instrument character maps," IEE Colloquium on Audio and Music Technology: The Challenge of Creative DSP (Ref. No. 1998/470), 1998, pp. 10/1-10/5.
Gordon, John W., "System architectures for computer music," ACM Computing Surveys, Vol. 17, No. 2, Jun. 1985, pp. 191-233.

Abstract: 
A "virtual string" is generated for synthesizing sound produced by plucked-string instruments using recurrent neural networks. The disclosed recurrent neural network, called a Scattering Recurrent Network (SRN), is based on the physics of waves traveling in the string. Vibration measured from a plucked string is used as the training data for the SRN. The trained SRN is a virtual model capable of generating tones similar to the tones generated by the physical string. As with a real string, the "virtual string" corresponding to the SRN responds differently to different types of string "plucking" motions. 
Claim: 
What is claimed is:
1. A method of synthesizing sounds produced by a plucked string comprising the steps of:
generating an input waveform corresponding to an initial plucking of the string, wherein waveform values are assigned to a plurality of predetermined points on the string and further waveform values are interpolated based on the assigned values;
initializing nodes of a neural network with values based on the input waveform;
iteratively changing the node values based on weights associated with the nodes;
outputting a waveform based on selected ones of the node values at a plurality of the iterations; and
generating sound based on a sequence of the output waveforms, the generated sound simulating the sound made by the plucked string.
2. The method of claim 1, wherein the step of generating the input waveform further includes the substep of assigning waveform values to a plurality of predetermined points on the string based on a mathematical function.
3. The method of claim 1, wherein the step of initializing the neural network further includes the substep of setting displacement nodes of the neural network to values of the input waveform.
4. The method of claim 1, wherein the neural network is a recurrent neural network.
5. The method of claim 1, wherein the step of iteratively changing the node values further includes, for each iteration, the substeps of:
calculating values of arrival nodes in the neural network based on values of departure nodes in the neural network at a previous iteration;
calculating values of displacement nodes in the neural network based on the calculated values of the arrival nodes; and
calculating values of the departure nodes in the neural network based on the calculated values of the displacement nodes and arrival nodes.
6. The method of claim 5, wherein the substep of calculating the values of the arrival nodes in the neural network further includes the step of evaluating the equation: ##EQU17##
7. The method of claim 5, wherein the substep of calculating the values of the displacement nodes in the neural network further includes the step of evaluating the equations: ##EQU18## net_i^y(t+1) = r_(i,i-1)·φ_(i,i-1)(t+1) + r_(i,i+1)·φ_(i,i+1)(t+1).
8. The method of claim 5, wherein the substep of calculating the values of the departure nodes in the neural network further includes the step of evaluating the equation: ##EQU19##
9. A computer readable medium containing instructions for execution on a computer that synthesizes sounds produced by a plucked string, the instructions, when executed, causing the computer to perform the steps of:
generating an input waveform corresponding to an initial plucking of the string, wherein waveform values are assigned to a plurality of predetermined points on the string and further waveform values are interpolated based on the assigned values;
initializing nodes of a neural network with values based on the input waveform;
iteratively changing the node values based on weights associated with the nodes;
outputting a waveform based on selected ones of the node values at a plurality of the iterations; and
generating sound based on a sequence of the output waveforms, the generated sound simulating the sound made by the plucked string. 
Description: 
FIELD OF THE INVENTION
This invention relates generally to music synthesis by computers, and more specifically, to the synthesis of sounds produced by plucked string instruments.
BACKGROUND OF THE INVENTION
Realistic electronic sound synthesis is becoming increasingly important, in part due to the rapid development of multimedia and virtual reality applications. Two traditional techniques for synthesizing music are frequency modulation (FM) synthesis and Wavetable (sample-and-play) synthesis. Generally, FM methods generate sounds using sinusoidal oscillators. Wavetable methods store sound segments produced by actual instruments and synthesize the instruments by playing back digitally-processed sequences of the sound segments. Both FM and Wavetable sound synthesis systems have their drawbacks. For instance, although FM synthesis can produce a wide range of interesting sounds and Wavetable synthesis is able to produce tones having timbres close to that of musical instruments, both methods are deficient at producing the large dynamic sound range capable of being produced by acoustic instruments.
An alternate, more recent technique for electronically synthesizing sound uses digital waveguide filters (DWFs). DWFs mathematically simulate, using a bidirectional delay line with modification filters, wave propagation on a plucked string. That is, DWFs physically model wave propagation occurring in the string. A number of patents and publications describe music synthesis using DWFs, including: U.S. Pat. No. 4,984,276 to Julius O. Smith; U.S. Pat. No. 5,212,334 to Julius O. Smith; U.S. Pat. No. 5,448,010 to Julius O. Smith; and the publication by Julius O. Smith, "Physical Modeling Using Digital Waveguides," Computer Music Journal, Vol. 16, No. 4, 1992. The contents of these three patents and the publication are incorporated by reference herein.
DWFs provide a "virtual string" that can authentically simulate the dynamics of a plucked string. However, DWF modeling requires a number of physical string parameters that are difficult to measure directly. For example, parameters describing the amount of energy loss (the energy loss constant) and traveling wave reflection (reflection coefficient) at any particular point in the string often must be determined by cumbersome trial-and-error methods.
There is, therefore, a need in the art to be able to synthesize sounds produced by a plucked string using a "virtual string" that can be modeled with easily obtainable parameters.
SUMMARY OF THE INVENTION
Objects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the inventionwill be realized and attained by means of the elements and combinations particularly pointed out in the appended claims.
To achieve the objects and in accordance with the purpose of the invention, as embodied and broadly described herein, a first aspect consistent with the present invention is directed to a method of synthesizing sounds produced by a plucked string. The method comprises a plurality of steps, including: generating an input waveform corresponding to an initial plucking of a string; initializing nodes of a neural network with values based on the input waveform; iteratively changing the node values; outputting a waveform based on selected ones of the node values at a plurality of the iterations; and generating sound based on a sequence of the output waveforms, the generated sound simulating the sound made by this particular plucked string.
A second method consistent with the present invention is directed to simulating the behavior of traveling waves in a string. The method comprises the steps of: stimulating a recurrent neural network, the recurrent neural network including groupsof neurons arranged to simulate wave propagation through a scattering junction; and iteratively evaluating the neurons of the neural network.
A third method consistent with the present invention is directed to a method of training a recurrent neural network. The method comprises the steps of: measuring a time-varying sequence of vibrational values of a plucked string; initializing the recurrent neural network by setting displacement neurons in the neural network to values based on the measured time-varying sequence; iteratively calculating values of neurons in the recurrent neural network based on the initial values of the displacement neurons; calculating a total cost function value based on the measured sequence of vibrational values and the values of the displacement neurons obtained in the step of iteratively calculating values; and adjusting weights corresponding to the neurons in the recurrent neural network when the total cost function is above a predetermined threshold.
Other aspects consistent with the present invention are directed to computer-readable media containing instructions for executing methods similar to the first, second, and third aspects of the invention.
Another aspect consistent with the present invention is directed to a recurrent neural network having displacement nodes, arrival nodes, and departure nodes. The departure nodes receive values output from the displacement nodes and values output from arrival nodes that are associated with a displacement node corresponding to a first predetermined location on the string, the departure nodes outputting values to arrival nodes that are associated with a displacement node corresponding to a second predetermined location on the string.
Still another aspect of the present invention is directed to an electronic apparatus for synthesizing sounds produced by plucked string instruments. The apparatus comprises: a memory for storing virtual models of strings that are to be synthesized; a waveform generation section for generating initial waveforms stimulating the virtual models; a scattering recurrent network synthesis section for generating time-varying waveforms using a recurrent neural network with neurons assigned the weighting values stored in the memory; and a speaker for outputting the synthesized time-varying waveforms.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate several embodiments consistent with this invention and, together with the description, help explain the principles of the invention. In the drawings,
FIG. 1 is a graph illustrating vibration of an ideal string;
FIG. 2 is a diagram illustrating a model of a scattering junction;
FIG. 3 is a diagram of a neuron used in an artificial neural network;
FIG. 4 is a diagram of a feedforward type neural network;
FIGS. 5A-5C are graphs of neuron activation functions;
FIG. 6 is a diagram of a feedback type neural network;
FIG. 7 is a diagram of a general purpose computer on which methods consistent with the present invention may be implemented;
FIG. 8 is a diagram illustrating a scattering recurrent network;
FIGS. 9A-9C are diagrams illustrating nodes of a scattering recurrent network;
FIG. 10 is a block diagram of a steel string measuring device consistent with an aspect of the present invention for obtaining string vibration;
FIG. 11A is a diagram of a two-neuron recurrent neural network;
FIG. 11B is a diagram of a "time-unfolded" version of the neural network shown in FIG. 11A;
FIG. 12 is a diagram of a "time-unfolded" scattering recurrent network;
FIG. 13 is a flow chart illustrating methods consistent with the present invention for training a scattering recurrent network;
FIG. 14 is a diagram of a music synthesis system consistent with the present invention;
FIGS. 15A through 15E are graphs illustrating exemplary waveforms used to stimulate a scattering recurrent network;
FIG. 16 is a flow chart illustrating methods consistent with the present invention for synthesizing sound;
FIGS. 17A-17D are graphs showing measured vibration values of a string;
FIGS. 18A-18D are graphs showing vibration values generated through sound synthesis by a trained scattering recurrent network;
FIGS. 19A-19D are graphs showing the Short-Time Fourier Transform of measured vibration values of a string; and
FIGS. 20A-20D are graphs showing the Short-Time Fourier Transform of vibration values generated through sound synthesis by a trained scattering recurrent network.
DETAILED DESCRIPTION
Reference will now be made in detail to the preferred embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
This disclosure describes methods and systems for physically modeling and synthesizing sounds produced by plucked-string instruments using a recurrent neural network. The recurrent neural network, called a Scattering Recurrent Network (SRN), is based on the physics of waves traveling in the string. Vibration measured from a plucked string is used as the training data for the SRN.
The trained SRN is a virtual model capable of generating tones similar to the tones generated by the physical string. As with a real string, the "virtual string" corresponding to the SRN responds differently to different types of string "plucking" motions.
String Vibration Model
An initial step in establishing a physical model of a vibrating string is the formulation of a mathematical model representing the dynamics of the string. One well-known model of a vibrating string uses the equation:

ε·(∂²y/∂t²) = K·(∂²y/∂x²) (1)

where K is the string tension, ε is the linear density of the string, and y is the transverse displacement of the string.
This model assumes an ideal string, i.e., a string that is lossless, linear, uniform, volumeless, and flexible. FIG. 1 is a graph illustrating the relationship of the variables used in equation (1).
By varying the string start-up and boundary conditions, the solution for wave movement in the string can be derived from equation (1) as the aggregate of two transmission waves of the same velocity, one moving to the left and one moving to the right. The general solution is:

y(x,t) = y_r(t - x/c) + y_l(t + x/c) (2)

where c = √(K/ε) represents the horizontal wave velocity, y_r(t - x/c) and y_l(t + x/c) express transmission waves traveling to the right and the left, respectively, and y(x,t) is the amount of displacement (vibration) of the string at position x and time t.
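The superposition in equation (2) can be sketched directly in code. The fragment below is illustrative only; the pulse shape and the wave velocity are arbitrary choices for demonstration, not values from the patent:

```python
import math

def traveling_wave(x, t, c, y_r, y_l):
    """d'Alembert-style solution of the ideal string: the displacement at
    (x, t) is the sum of a right-moving wave y_r(t - x/c) and a
    left-moving wave y_l(t + x/c)."""
    return y_r(t - x / c) + y_l(t + x / c)

def gaussian_pulse(u):
    # Pulse shape chosen purely for illustration.
    return math.exp(-u * u)
```

Evaluating the right-moving component at x = c·t tracks the pulse's peak as it travels down the string, which is the behavior the sampled formulation below exploits.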
Equation (2) describes wave movement for a lossless wave moving through a string that has ideal uniformity and flexibility. In real life, friction consumes energy. A resistive force in direct proportion to the string's vibration velocity can be used to simulate frictional forces. Assuming a resistive force constant of μ, the resistive force can be added to equation (1), yielding the wave equation:

ε·(∂²y/∂t²) = K·(∂²y/∂x²) - μ·(∂y/∂t) (3)

which can be used to derive an equation for a traveling wave experiencing energy loss:

y(x,t) = e^(-μt/2ε)·[y_r(t - x/c) + y_l(t + x/c)] (4)
To convert the traveling-wave solution of equation (4) into the digital domain, it is necessary to sample the traveling wave amplitude at intervals of T seconds. Because the wave velocity is c, the physical interval between samples on a string, Δx, equals cT. The digital version of equation (4) is thus

y(x_m, t_n) = e^(-μnT/2ε)·[y_r((n-m)T) + y_l((n+m)T)] (5)

where t_n = nT (n is an integer) and x_m = m·Δx = mcT (m is an integer). By further defining the energy loss constant (passive loss factor) as w ≡ e^(-μT/2ε) and f_r(n-m) ≡ y_r((n-m)T), equation (5) can be rewritten as

y(x_m, t_n) = w^n·[f_r(n-m) + f_l(n+m)] (6)
With actual instruments, the two ends of the string are typically stationary, thus the amount of vibration at the end points of the string is always zero. Assuming a string length of L, this restriction can be imposed on equation (6), with the result

w^n·[f_r(n) + f_l(n)] = 0 and w^n·[f_r(n - L/Δx) + f_l(n + L/Δx)] = 0 (7)

Accordingly, at the left and right end points of the string, the transmission waves moving to the right and to the left can be expressed as

f_r(n) = -f_l(n) and f_l(n + L/Δx) = -f_r(n - L/Δx) (8)
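The sampled traveling-wave picture with fixed ends can be sketched as a pair of delay lines, in the spirit of equations (6) through (8). This is an illustrative sketch, not the patent's implementation; the per-step loss factor w and the sign-inverting reflections follow from the zero-displacement boundary condition described above:

```python
def waveguide_step(f_r, f_l, w):
    """One sampling interval T of a lossy bidirectional delay line.

    f_r / f_l hold the right- and left-moving wave samples at the
    string's sampling points; w is the per-step energy loss factor.
    The string ends are fixed, so a wave reaching an end reflects with
    inverted sign, keeping the end-point displacement at zero."""
    right_end = f_r[-1]   # right-moving wave arriving at the right end
    left_end = f_l[0]     # left-moving wave arriving at the left end
    new_f_r = [-w * left_end] + [w * s for s in f_r[:-1]]
    new_f_l = [w * s for s in f_l[1:]] + [-w * right_end]
    return new_f_r, new_f_l

def displacement(f_r, f_l):
    """Physical displacement at each point is the sum of the two
    traveling waves (the loss factor is folded into the samples)."""
    return [a + b for a, b in zip(f_r, f_l)]
```

Stepping a lossless (w = 1) pulse three times on a three-point string sends it into the right end, where it comes back sign-inverted, as equation (8) requires.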
Factors such as lack of uniformity in the string cause changes of acoustic impedance in the string, which tend to reflect the traveling waves. The phenomenon of changing acoustic impedance can be modeled as intersection points, called scattering junctions, that partially reflect the traveling waves. FIG. 2 is a diagram illustrating a model of a scattering junction 202 having left and right impedances of Z_1 and Z_2, right-moving and left-moving input transmission waves, φ_r^1 and φ_l^2, respectively, and right-moving and left-moving departing transmission waves, f_r^2 and f_l^1, respectively.
The vibration at junction 202 can be expressed as

y = φ_r^1 + f_l^1 = f_r^2 + φ_l^2 (9)

where, in order to simplify the traveling wave expression, the symbols (x_m, t_n) have been omitted from equation (9). The departure waves, f_l^1 and f_r^2, can be expressed as

f_l^1 = r·φ_r^1 + (1 - r)·φ_l^2 and f_r^2 = (1 + r)·φ_r^1 - r·φ_l^2, where r = (Z_1 - Z_2)/(Z_1 + Z_2) (10)
As described above, equations (1) through (8) are used to model physical vibration of a string. Scattering junctions, introduced in equations (9) and (10), and illustrated in FIG. 2, extend this model to include partial wave reflections along the string.
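Because the source renders equations (9) and (10) only as images, the sketch below uses the conventional two-port scattering junction from digital waveguide theory, which exhibits the partial-reflection behavior FIG. 2 describes; the exact formula is an assumption, not a transcription of the patent's equations:

```python
def scattering_junction(phi_r_1, phi_l_2, z1, z2):
    """Two-port scattering junction between impedances z1 (left) and z2
    (right). phi_r_1 is the right-moving wave arriving from the left;
    phi_l_2 is the left-moving wave arriving from the right. Returns the
    junction value y and the departing waves (f_l_1, f_r_2)."""
    y = 2.0 * (z1 * phi_r_1 + z2 * phi_l_2) / (z1 + z2)
    f_l_1 = y - phi_r_1   # wave departing back toward the left side
    f_r_2 = y - phi_l_2   # wave departing toward the right side
    return y, f_l_1, f_r_2
```

With matched impedances (z1 == z2) the junction is transparent: each incoming wave passes straight through and nothing is reflected; an impedance mismatch sends part of the wave back, which is the scattering behavior the text attributes to non-uniformity in the string.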
Consistent with the present invention, traveling waves in a string are modeled to implement equations (1) through (10) using a closed-circuit feedback type of recurrent neural network. To assist the reader in understanding concepts of the present invention, a brief review of neural networks will now be presented.
Neural Networks
The basic building block of a neural network is the neuron. FIG. 3 is a diagram illustrating neuron 301, which includes an input section 302 and an output section 303. Input section 302 receives signals and generates a weighted sum corresponding to the received input signals. The weighted sum is passed through an activation function associated with output section 303, and the result, y, is output from the neuron.
A feedforward neural network typically includes multiple neurons arranged in two or more layers, as shown in FIG. 4. Input layer 401 receives external stimuli labeled as input signals x_1 through x_3. Each neuron in input layer 401 transmits its output to the neurons in the next neuron layer, hidden layer 402. In a similar manner, the neurons in layer 402 transmit their outputs to the neurons in hidden layer 403, which transmit their outputs to the neurons in the next layer. This process continues until the neurons in the final layer, output layer 405, are stimulated. The output of the neurons in layer 405 is the output of the neural network.
The output of a neuron is equal to the weighted sum of its inputs passed through an activation function. More specifically, the weighted sum, net_i, is calculated as:

net_i = Σ_j w_(i,j)·x_j + Θ_i

where Θ_i represents a bias value for the neuron and w_(i,j) represents a weighting value associated with input x_j of neuron i. Exemplary activation functions include step functions, bipolar sigmoid functions, and bipolar ramp functions, as shown in FIGS. 5A-5C, respectively.
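As a concrete sketch (function and variable names are illustrative), the weighted-sum-plus-activation computation of a single neuron can be written as:

```python
def neuron_output(inputs, weights, bias, activation):
    """Single neuron: weighted sum net_i = sum_j w_ij * x_j + bias,
    passed through an activation function."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(net)

def bipolar_ramp(net, limit=1.0):
    """Bipolar ramp activation (cf. FIG. 5B): linear between -limit and
    +limit, clipped outside that range."""
    return max(-limit, min(limit, net))
```

A weighted sum that exceeds the ramp's limit is clipped, which bounds every neuron's output regardless of how large the incoming signals are.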
The neural network shown in FIG. 4 is a feedforward type neural network. In a feedforward network, the input layer receives signals from outside the network, the signals proceed successively through the network, and the output layer forwards the processed signals out of the network.
A second common type of neural network is the feedback type neural network. An exemplary feedback type neural network is shown in FIG. 6. In feedback network 600, signals output from a layer may be fed back to a previous layer. A neural network with at least one mutually connected layer is a feedback network. A special type of feedback neural network, in which the signal transmission path forms a closed circuit, is called a recurrent neural network. Neural network 600 is an example of a recurrent neural network.
Neural networks "learn" by adjusting the weights, w_(i,j), associated with the inputs of each neuron. Typically, to train a neural network, training data having input values and corresponding known output values are run through the neural network. The weights, w_(i,j), are adjusted so that the network's output values tend to match the known training values. Many techniques are known for appropriately adjusting the weights, w_(i,j). One popular technique for adjusting the weights in a feedback network is known as Error Backpropagation.
Neural networks may be implemented in a variety of ways, such as by a computer or through special neural network hardware. FIG. 7 is a diagram of a general purpose computer on which methods consistent with the present invention may be implemented.
Computer system 700 is preferably a multimedia computer system on which neural networks may be trained and/or used. Computer system 700 includes a chassis 710, which holds the computer's main processor and main memory; an input device such as keyboard 712; a storage device such as floppy or hard disk 714; and a display such as monitor 716. Computer system 700 is optionally connected to a network 718, and may be operated directly by a user or through a network. Speakers 719 deliver audio output to the user.
Many variations of computer system 700 are possible. For example, storage device 714 may additionally include storage media such as optical disks, and input device 712 may include, instead of or in addition to a keyboard, any type of user input device, such as an electronic mouse, a trackball, a light pen, a touch-sensitive pad, a digitizing tablet, or a joystick.
Additional input devices, such as devices used to capture and modify audio data, may be used to input data to computer system 700. String vibration measuring device 1000 is one such input device. This device is described below in more detail with reference to FIG. 10.
Scattering Recurrent Network
Consistent with the present invention, a closed-circuit feedback type of recurrent neural network is used to implement the previously described string vibration model. This recurrent neural network will be referred to as a scattering recurrent network (SRN).
FIG. 8 is a diagram illustrating scattering recurrent network 800. The upper half of SRN 800 represents transmission waves moving to the right and the lower half represents transmission waves moving to the left. Each of nodes 802, 804, and 806 corresponds to a neuron in the SRN. Neurons 802 simulate the degree of vibration, y, at each sampling location on the string, and will be called displacement nodes. Displacement nodes 802 have scattering junction characteristics. Neurons 804 (labeled φ) are arrival nodes that represent transmission waves flowing into displacement nodes 802. Finally, neurons 806 (labeled f) are departure nodes that represent transmission waves flowing out of displacement nodes 802. The links between arrival nodes 804 and departure nodes 806 include an energy loss factor and a unit delay.
FIGS. 9A-9C are diagrams illustrating nodes 802, 804, and 806 in more detail. Each displacement node 802, as shown in FIG. 9A, receives two inputs and generates two outputs. Each arrival node 804, shown in FIG. 9C, receives one input and generates two outputs. Each departure node 806, shown in FIG. 9B, receives two inputs and generates one output.
The dynamics of SRN 800, shown in FIGS. 8 and 9, are described by a number of equations. Specifically, values of the upper and lower half arrival nodes 804 at time t+1 are given by ##EQU6##
The t+1 instantaneous degree of vibration at each displacement node 802 is expressed as ##EQU7##
where

net_i^y(t+1) = r_(i,i-1)·φ_(i,i-1)(t+1) + r_(i,i+1)·φ_(i,i+1)(t+1)

while the departure waves at departure nodes 806 moving to the left and right at time t+1 are obtained with the equation ##EQU8##
In equations (11)-(14) the neuron activation functions, a( ), are preferably one of the functions illustrated in FIGS. 5A-5C, such as the bipolar ramp function shown in FIG. 5B.
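One iteration of the SRN's dynamics can be sketched as follows. Equations (11) through (14) appear only as images in the source, so this sketch mirrors the structure described in the text and in claim 7, not the patent's exact formulas: arrival values are delayed, loss-attenuated neighbor departures; each displacement node passes a weighted sum of its two arrivals through the activation; and departures are formed scattering-junction style (an assumption). End nodes reflect with a sign flip so the string ends stay at rest:

```python
def srn_step(f_L, f_R, w, r_L, r_R, act):
    """One iteration of an SRN-style update over n string nodes.

    f_L[i] / f_R[i] : left/right-moving waves departing node i at time t.
    w               : energy loss factor on each link (assumed scalar).
    r_L[i], r_R[i]  : trainable reflection-like weights (cf. claim 7).
    act             : neuron activation, e.g. a bipolar ramp.
    """
    n = len(f_L)
    # Arrival nodes at t+1: unit delay plus loss factor w on each link;
    # the end links reflect the departing wave with inverted sign.
    phi_L = [w * (-f_L[0] if i == 0 else f_R[i - 1]) for i in range(n)]
    phi_R = [w * (-f_R[n - 1] if i == n - 1 else f_L[i + 1]) for i in range(n)]
    # Displacement nodes: activation of a weighted sum of the arrivals.
    y = [act(r_L[i] * phi_L[i] + r_R[i] * phi_R[i]) for i in range(n)]
    # Departure nodes: junction-style outgoing waves (assumed form).
    new_f_L = [y[i] - phi_L[i] for i in range(n)]
    new_f_R = [y[i] - phi_R[i] for i in range(n)]
    return y, new_f_L, new_f_R
```

With unit loss, unit reflection weights, and a linear ramp activation, a right-moving unit pulse launched from node 0 shows up as a displacement at node 1 on the next step and keeps moving right.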
Measurement of Training Data
Before SRN 800 can be used to synthesize string sounds, it must be trained. Neural networks are trained using pre-measured or pre-derived training data.
FIG. 10 is a block diagram of a steel string measuring device consistent with an aspect of the present invention for obtaining string vibration values that are used by computer 700 as training data. Measuring device 1000 includes several electromagnetic sensors 1002, such as those found in electric guitars, located at each sampling point along string 1006. Electromagnetic sensors 1002 are controlled to synchronously sample the vibration values at each sampling point and for each sampled time point.
Electromagnetic sensors 1002 each include a coil with a permanent magnet. Plucking the steel string causes magnetic flux changes that induce electric signals in the coils of sensors 1002. The sampled values are based on the induced electric signals. The sampled values are amplified and converted to digital form by preamps 1003 and analog-to-digital converters 1004. The analog-to-digital converter may digitize its analog input to a 16-bit quantization level at a sampling rate of 32 kHz. The digital samples may then be stored on a multitrack real-time digital audio storage device (DAT) 1005, such as the Audio Engine, from Spectral Company, USA.
Although measuring device 1000 is illustrated as having six electromagnetic sensors, the number of electromagnetic sensors can be arbitrarily increased or decreased depending on experimental requirements. Further, the electromagnetic sensors are preferably mounted on a slide positioning assembly that allows the sensors to be moved along the track of the slide so that the sampling locations may be easily changed.
Training of the Scattering Recurrent Network
As discussed, training data for SRN 800, obtained by measuring device 1000, is a time sequence of string vibrational measurements taken at points longitudinally along the string. The vibrational measurements represent the instantaneous degree of vibration at selected ones of displacement nodes 802.
A number of training algorithms are known for training recurrent neural networks. Preferably, computer 700 uses the Backpropagation Through Time (BPTT) method to train the SRN, although other recurrent training algorithms may prove satisfactory. The BPTT training algorithm is described in Introduction to the Theory of Neural Computation, J. Hertz, A. Krogh, and R. G. Palmer, Addison-Wesley, New York, 1991, and in "An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories," R. Williams and J. Peng, Neural Computation, Vol. 2, pp. 490-501, 1990.
A brief explanation of the BPTT training algorithm will now be described with reference to FIGS. 11A and 11B.
FIG. 11A is a diagram illustrating a simple two-neuron recurrent neural network 1100. The network is a "completely linked" recurrent network, as each of its two neurons, 1102 and 1104, receives two inputs, one from itself and one from the other neuron. BPTT effectively "opens up" (or "time-unfolds") network 1100 through time to obtain a network that resembles a feedforward network. The "opened up" version of network 1100 is illustrated in FIG. 11B. The beginning values of the network are the values at y_1(0) and y_2(0), respectively. The values for neurons 1102 and 1104 at time 1 are

y_1(1) = a(w_(1,1)·y_1(0) + w_(1,2)·y_2(0)) and y_2(1) = a(w_(2,1)·y_1(0) + w_(2,2)·y_2(0)) (15)
Extrapolating from this equation, the values of the neurons at any time, t+1, can be obtained from the equation

y_i(t+1) = a(Σ_j w_(i,j)·y_j(t)) (16)
On the "feedforward" network shown in FIG. 11B, which was generated by time-unfolding network 1100, the backpropagation training algorithm is employed to find trained values for weights w_(1,1), w_(1,2), w_(2,1), and w_(2,2). Backpropagation is well known in the art for training feedforward neural networks. Essentially, backpropagation defines a cost function (or error function) based on the weights, w, and then draws gradients from the cost function. Negative gradients are followed in an attempt to find a cost function minimum.
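The recurrence in equation (16), and the time-unfolded view that BPTT operates on, can be sketched by iterating the network and keeping every time layer (names are illustrative; the weights and activation are placeholders):

```python
def run_recurrent(y0, W, act, steps):
    """Iterate y(t+1) = a(W @ y(t)) for a fully connected recurrent
    network. Returns the whole trajectory: one list entry per time
    layer, which is the 'time-unfolded' view of FIG. 11B."""
    trajectory = [list(y0)]
    y = list(y0)
    for _ in range(steps):
        # Each neuron's next value is the activation of the weighted
        # sum of all current neuron values (its own included).
        y = [act(sum(w * v for w, v in zip(row, y))) for row in W]
        trajectory.append(y)
    return trajectory
```

With a weight matrix that simply swaps the two neurons and an identity activation, the state alternates every step, making the layer-by-layer structure of the unfolded network easy to see.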
FIG. 12 is a diagram of SRN 800 time-unfolded. The network comprises a plurality of time layers 1201-1206. Each of time layers 1201-1206 includes three subsidiary layers: a displacement layer, such as layer 1210; an arrival layer, such as layer 1211; and a departure layer, such as layer 1212. The layers are respectively comprised of displacement nodes 802, arrival nodes 804, and departure nodes 806.
Each layer 1201-1206 may contain, for example, one hundred displacement nodes. Six displacement nodes are labeled in layer 1201 as displacement nodes 1220-1225. As previously mentioned, the value of each displacement node corresponds to the amount of string vibration at the physical location in the string associated with the displacement node. In training the network, vibration values at various positions along the string are measured at multiple instances in time, to thereby create a set of time-varying sequences of vibrational values. The time sequences correspond to time layers. Practically, it is generally not possible to obtain a measured training vibrational value for every displacement node 1220-1225 in the network. For example, as shown in FIG. 12, only nodes 1220 and 1225 are set to measured training values (indicated by a 0 written in these nodes). Displacement nodes in which data is either input from an external source or output to an external device, such as displacement nodes 1220 and 1225, are called visible nodes. The remaining displacement nodes are non-visible "hidden" nodes. For example, a string simulation may have eight visible nodes, all of which are used to construct output waveforms, but only six of which receive input values corresponding to measured string vibration.
To train SRN 800, a cost function is defined. Assuming d.sub.i (t) represents measured vibration values at the i.sup.th measurement location at time t (i.e., the desired node values), and A(t) represents the set of visible nodes, then the errorsignals at any time t can be defined as
e.sub.i (t)=d.sub.i (t)-y.sub.i (t) for i in A(t), and e.sub.i (t)=0 otherwise (17)
where y.sub.i (t) represents the output of the i.sup.th displacement node (i.e., the value generated by the neuron) at time t. An error function can be defined based on equation (17) as

E(t)=1/2 .SIGMA..sub.i e.sub.i (t).sup.2 (18)
Assuming t.sub.0 is the beginning of the training period and t.sub.1 the end of the training period, a total cost function, based on the error function, may be defined as

C=E(t.sub.0)+E(t.sub.0 +1)+ . . . +E(t.sub.1) (19)
When training SRN 800 using the measured vibration values, the object is to minimize the total cost function (19). A minimized cost function indicates that the outputs generated by the SRN displacement nodes are close to the measured values.
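The cost computation described above can be sketched as follows. This is a sketch assuming the standard squared-error forms suggested by the surrounding text (error signals at visible nodes, a per-time error function, and a total cost summed over the training period); the node indexing is illustrative:

```python
def error_signals(y_t, d_t, visible):
    """Eq. (17) sketch: e_i(t) = d_i(t) - y_i(t) for visible nodes i, else 0."""
    return {i: (d_t[i] - y_t[i]) if i in visible else 0.0 for i in y_t}

def error_function(e_t):
    """Eq. (18) sketch: half the sum of squared error signals at one time step."""
    return 0.5 * sum(e * e for e in e_t.values())

def total_cost(y, d, visible, t0, t1):
    """Eq. (19) sketch: per-time error summed over the training period t0..t1."""
    return sum(error_function(error_signals(y[t], d[t], visible))
               for t in range(t0, t1 + 1))

# One time step, two displacement nodes, node 0 visible:
y = [{0: 1.0, 1: 2.0}]   # SRN outputs y_i(t)
d = [{0: 1.5, 1: 0.0}]   # measured (desired) values d_i(t)
print(total_cost(y, d, {0}, 0, 0))  # 0.125
```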
In order to achieve total cost function minimization, computer 700 calculates gradients of the total cost function with respect to the parametric functions relating to the energy loss constants and the medium value reflection coefficients, and adjusts those parameters along the negative gradient direction. Specifically, the degrees of change corresponding to the energy loss constants are ##EQU14##
and the degrees of adjustment corresponding to the medium value reflection coefficients are ##EQU15##
where .eta. represents the learning constant. A typical value for the learning constant is 10.sup.-5 to 10.sup.-7. Further, ##EQU16##
FIG. 13 is a flow chart illustrating methods consistent with the present invention for training the SRN. For each SRN to be trained, time-sequenced vibrational samples of the string to be simulated are obtained, as previously described, using steel string measuring device 1000 (step 1301). The initial (time zero) displacements that define the starting position of the plucked string are determined and used to initialize the displacement values y.sub.i (0) (steps 1302, 1303). The waveform at the beginning of the string plucking has the characteristic that the displacement at the plucking location is the largest displacement throughout the simulation. For example, if the string is plucked by pulling the center of the string up 2 cm, the initial waveform would resemble a triangle: its largest value (2 cm) is in the center, and the other displacement nodes would be assigned linearly decreasing values out to the endpoint displacement nodes, which would have a value of zero. Initial values for the departure and arrival nodes at time equal to zero are also assigned. Preferably, these nodes are assigned values based on the loss factors being one and the reflection coefficient being zero. Alternatively, the initial values for these nodes can be randomly assigned.
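The triangular initial waveform in the example above can be constructed as, for instance (the node count is illustrative; the 2 cm pluck height follows the example):

```python
# Triangular pluck: linear ramp up to the plucking point, then a linear
# ramp back down to zero at the fixed endpoints of the string.

def triangular_pluck(n_nodes, pluck_index, height):
    """Initial displacements y_i(0) for a string pulled up at one point."""
    y = [0.0] * n_nodes
    for i in range(n_nodes):
        if i <= pluck_index:
            y[i] = height * i / pluck_index
        else:
            y[i] = height * (n_nodes - 1 - i) / (n_nodes - 1 - pluck_index)
    return y

y0 = triangular_pluck(101, 50, 2.0)   # 2 cm pluck at the center node
print(y0[0], y0[50], y0[100])         # 0.0 2.0 0.0
```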
Based on the initial node values assigned in step 1303, computer system 700 iteratively calculates and stores, using equations (11)-(14), the node values for each time increment (step 1304). From the stored displacement node values, the value of the total cost function is calculated using equation (19) (step 1305). If the total cost value is below a prespecified error threshold, the network is considered trained (steps 1306, 1307). Otherwise, the node weights are adjusted, as previously described, based on equations (20), (21), and (22) (step 1308), and the node values are then recalculated based on the new weights.
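The training procedure of FIG. 13 can be sketched as the following loop. The network object and its methods are hypothetical stand-ins for equations (11)-(14) (`forward_pass`), (19) (`total_cost`), and (20)-(22) (`adjust_weights`); they are not the patent's actual implementation:

```python
# Control-flow skeleton of the training loop in FIG. 13.  The network's
# methods are assumed, hypothetical hooks for the patent's equations.

def train(network, initial_waveform, n_steps, threshold, max_epochs=10000):
    network.initialize_displacements(initial_waveform)  # steps 1302-1303
    for _ in range(max_epochs):
        outputs = network.forward_pass(n_steps)         # step 1304: eqs (11)-(14)
        cost = network.total_cost(outputs)              # step 1305: eq (19)
        if cost < threshold:                            # step 1306
            return True                                 # network trained (step 1307)
        network.adjust_weights(outputs)                 # step 1308: eqs (20)-(22)
    return False
```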
Music Synthesis
After computer system 700 has trained the SRN for a particular string, music synthesis can be performed. FIG. 14 is a diagram illustrating a complete music synthesis system, which may be implemented on computer system 700.
Music synthesis system 1400 includes a plurality of waveform generation sections 1401 through 1403, each connected to a corresponding SRN synthesis section 1405 through 1407. Waveform generation sections 1401 through 1403 generate the plucking motions used to "play" the virtual strings of SRN synthesis sections 1405 through 1407. The waveforms generated by SRN synthesis sections 1405 through 1407 are input to mixer 1410. The mixed digital sound signal is then converted to an analog signal by D/A converter 1412, and output via amp 1415 and speaker 1417.
FIGS. 15A through 15E are graphs illustrating exemplary waveforms generated by waveform generation sections 1401 through 1403. FIG. 15A is a waveform simulating the use of a fingernail or a pick to pluck the string in the middle. Similarly, the waveforms of FIGS. 15B and 15C illustrate plucking of the string toward its right and left ends, respectively. FIG. 15D is a waveform simulating plucking with the fleshy portion of the finger. FIG. 15E simulates a rapping or hitting of the string (for example, the waveform made by the hammer of a piano).
There are a number of methods through which the waveforms shown in FIGS. 15A through 15E can be generated. Three methods, in particular, will now be described in more detail: interpolation, function evaluation, and sketch point method.
Using the interpolation method, the user or the software assigns a displacement value to each important, predefined point on the string. The values of the remaining displacement nodes are then interpolated based on the predefined known points. The waveforms shown in FIGS. 15A through 15C were generated using interpolation.
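A minimal sketch of the interpolation method, assuming hypothetical point positions and displacement values:

```python
# Linear interpolation of displacement nodes from a few predefined points.
import numpy as np

n_nodes = 100
known_positions = [0, 20, 99]     # predefined points along the string
known_values = [0.0, 2.0, 0.0]    # user-assigned displacements at those points

# Remaining node values are interpolated between the known points.
waveform = np.interp(np.arange(n_nodes), known_positions, known_values)
print(waveform[0], waveform[10], waveform[20])  # 0.0 1.0 2.0
```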
Function evaluation is simply the generation of a waveform based on a mathematical function. The waveform of FIG. 15D, for example, was obtained from a Gauss function.
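Function evaluation can be sketched with a Gauss function of the kind mentioned above; the center, width, and amplitude here are illustrative, not the patent's values:

```python
# Gauss-function pluck shape evaluated at each displacement node,
# resembling the fleshy-finger pluck of FIG. 15D.
import math

n_nodes, center, width, amplitude = 100, 50.0, 8.0, 1.0
waveform = [amplitude * math.exp(-((i - center) / width) ** 2)
            for i in range(n_nodes)]
print(round(waveform[50], 3))  # 1.0 (peak at the center)
```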
With the sketch point method, the user may assign any type of plucking waveform, preferably by graphically drawing or sketching the waveform using appropriate software. Sketching software that allows one to draw arbitrary waveforms using a pointing device, such as a mouse, is well known and accordingly will not be described further.
Initial waveforms generated by waveform generation sections 1401 through 1403 are propagated through time by SRN synthesis sections 1405 through 1407, to thereby synthesize sound waveforms. FIG. 16 is a flow chart illustrating methods consistent with the present invention for synthesizing sound. Synthesis involves initializing the network (steps 1601 and 1602) and propagating the node values through time (steps 1603 to 1606).
An initial waveform, such as those illustrated in FIGS. 15A through 15E, can be expressed as
The displacement nodes of the SRN at time t=0 are set to the values of the initial waveform, from which the output waveform, out(t), is determined (step 1601). The output waveform, out(t), for any particular time, consists of the values of the displacement nodes corresponding to the desired output locations. Expressed mathematically, out(t) is
where O is the set of displacement node values desired in the output signal. The displacement node values obtained in step 1601 are distributed to adjacent departure nodes (step 1602) as
thereby completing initialization of the network.
Having filled in the departure and displacement initial node values, sound synthesis through the remaining time points is a straightforward application of equations (11), (12), (13), (14), and (24). In particular, having obtained the initial displacement node and departure node values, the SRN synthesis sections use equation (11) to find the arrival node values for time t=1 (step 1603). Equations (12) and (13) are then used to obtain the displacement node values for time t=1 (step 1604), and equation (14) to find the departure node values (step 1605). The output waveform, out(t), is determined at step 1604 using equation (24). Steps 1603 through 1605 are repeated for each of times t=2 to t=end.
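The synthesis steps above can be sketched as the following loop. The update functions are stubbed, hypothetical stand-ins for equations (11)-(14), and equation (24) is modeled as reading out selected displacement nodes; the trained SRN would supply the real updates:

```python
# Control-flow skeleton of the synthesis loop in FIG. 16.  The default
# update lambdas are identity pass-throughs so the loop can run; they
# stand in for the patent's equations (11)-(14).

def synthesize(displacement0, n_steps,
               arrival_update=lambda disp, dep: disp,      # eq (11) stand-in
               displacement_update=lambda arr: arr,        # eqs (12)-(13) stand-in
               departure_update=lambda arr, disp: disp,    # eq (14) stand-in
               output_nodes=(0,)):
    disp, dep = list(displacement0), list(displacement0)   # steps 1601-1602
    out = [[disp[i] for i in output_nodes]]                # eq (24) at t=0
    for _ in range(n_steps):
        arr = arrival_update(disp, dep)                    # step 1603
        disp = displacement_update(arr)                    # step 1604
        out.append([disp[i] for i in output_nodes])        # eq (24)
        dep = departure_update(arr, disp)                  # step 1605
    return out

print(synthesize([1.0, 0.0], 2))  # [[1.0], [1.0], [1.0]]
```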
Example Simulation
The following example illustrates application of the principles of the present invention discussed above to the A string of a cello.
The training data for the cello's string, shown in FIGS. 17A-17D, was obtained with a measurement device similar to device 1000, but having seven sensors instead of six. FIGS. 17A-17D are graphs showing 10,000 sample values measured by four of the seven sensors (sensors 1, 2, 3, and 6).
The first 2000 samples (t=0 to 2000) measured by device 1000 were used to train an SRN having 100 displacement nodes and a learning constant, .eta., of 0.0000001. The network was trained for 10,000 epochs.
FIGS. 18A-18D are graphs showing values generated through sound synthesis by the trained SRN at each of the four visible displacement nodes. The initial waveform used to stimulate the SRN was designed to resemble the plucking motion used to generate the values shown in FIGS. 17A-17D. As can be seen from a comparison of FIGS. 17A-17D to FIGS. 18A-18D, the synthesized waveform is similar to the waveform measured from the physically plucked cello string.
FIGS. 19A-19D are waveforms illustrating the Short-Time Fourier Transform (STFT) of the waveforms of FIGS. 17A-17D, respectively. Similarly, FIGS. 20A-20D are waveforms illustrating the STFT of the waveforms of FIGS. 18A-18D, respectively. The STFT waveforms more readily illustrate the frequency components present in the waveforms of FIGS. 17 and 18.
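An STFT of the kind used to produce such figures can be sketched as follows; the window length and hop size are illustrative choices, not values from the patent:

```python
# Minimal STFT: slide a window across the signal and take the magnitude
# spectrum of each windowed frame.
import numpy as np

def stft(signal, frame_len=256, hop=128):
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))  # one spectrum per frame

t = np.arange(2048)
tone = np.sin(2 * np.pi * 0.05 * t)  # a pure tone at 0.05 cycles/sample
spec = stft(tone)
print(spec.shape)                     # (15, 129)
```

Each row of the result is the spectrum of one time slice, so plotting the rows over time reveals how the frequency components of the string waveform evolve.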
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the scope or spirit of the invention. For example, although the embodiments disclosed herein describe a neural network trained and implemented on a multimedia computer, the networks could, of course, be implemented via dedicated hardware.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.
* * * * * 


