Welcome to your final programming assignment of this week! In this notebook, you will implement a model that uses an LSTM to generate music. You will even be able to listen to your own music at the end of the assignment.
You will learn to:
Please run the following cell to load all the packages required in this assignment. This may take a few minutes.
from __future__ import print_function
import IPython
import sys
from music21 import *
import numpy as np
from grammar import *
from qa import *
from preprocess import *
from music_utils import *
from data_utils import *
from keras.models import load_model, Model
from keras.layers import Dense, Activation, Dropout, Input, LSTM, Reshape, Lambda, RepeatVector
from keras.initializers import glorot_uniform
from keras.utils import to_categorical
from keras.optimizers import Adam
from keras import backend as K

Using TensorFlow backend.

1 - Problem statement

You would like to create a jazz music piece specially for a friend's birthday. However, you don't know any instruments or music composition. Fortunately, you know deep learning and will solve this problem using an LSTM network.
You will train a network to generate novel jazz solos in a style representative of a body of performed work.
You will train your algorithm on a corpus of jazz music. Run the cell below to listen to a snippet of the audio from the training set:
You can informally think of each "value" as a note, which comprises a pitch and a duration. For example, if you press down a specific piano key for 0.5 seconds, then you have just played a note. In music theory, a "value" is actually more complicated than this. Specifically, it also captures the information needed to play multiple notes at the same time. For example, when playing a music piece, you might press down two piano keys at the same time (playing multiple notes at the same time generates what's called a "chord"). But we don't need to worry about the details of music theory for this assignment.
Run the following code to load the raw music data and preprocess it into values. This might take a few minutes.
X, Y, n_values, indices_values = load_music_utils()
print('number of training examples:', X.shape[0])
print('Tx (length of sequence):', X.shape[1])
print('total # of unique values:', n_values)
print('shape of X:', X.shape)
print('Shape of Y:', Y.shape)

number of training examples: 60
Tx (length of sequence): 30
total # of unique values: 78
shape of X: (60, 30, 78)
Shape of Y: (30, 60, 78)

You have just loaded the following:
X: an \((m, T_x, 78)\)-dimensional array.
Y: a \((T_y, m, 78)\)-dimensional array.
n_values: The number of unique values in this dataset. This should be 78.
indices_values: Python dictionary mapping integers 0 through 77 to musical values.
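The two shapes differ only in which axis comes first: X puts the example axis first, while Y puts the time axis first. The sketch below builds random stand-in arrays with the printed sizes to make that layout concrete. It assumes that the targets are the inputs shifted by one step, which this section does not state, and the names `X_demo` and `Y_demo` are hypothetical.

```python
import numpy as np

m, Tx, n_values = 60, 30, 78          # sizes printed by the cell above
rng = np.random.default_rng(0)

# Hypothetical data: random value indices, with one extra step so that
# the targets can be taken as the inputs shifted by one step (an assumption).
indices = rng.integers(0, n_values, size=(m, Tx + 1))
one_hot = np.eye(n_values)[indices]            # shape (m, Tx + 1, 78)

X_demo = one_hot[:, :-1, :]                    # inputs, shape (m, Tx, 78)
Y_demo = np.swapaxes(one_hot[:, 1:, :], 0, 1)  # targets, time axis first: (Ty, m, 78)

print(X_demo.shape, Y_demo.shape)
```

With the time axis first, `Y_demo[t]` holds the targets of every example at step `t`, which is convenient when a model produces one output per time step.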
Here is the architecture of the model we will use. This is similar to the Dinosaurus model, except that you will implement it in Keras.
Exercise: Implement djmodel().
Total params: 41,678
Trainable params: 41,678
Non-trainable params: 0

Compile the model for training

You now need to compile your model to be trained. We will use:
optimizer: Adam optimizer
Loss function: categorical cross-entropy (for multi-class classification)

opt = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

Initialize hidden state and cell state

Finally, let's initialize a0 and c0 for the LSTM's initial state to be zero.
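A minimal sketch of the zero initialization follows. The hidden size `n_a` is not stated in this section, so the value 64 is an assumption; it is chosen because one LSTM layer plus one softmax Dense layer of that size reproduces the 41,678 parameters in the summary.

```python
import numpy as np

m = 60          # number of training examples, from the data-loading output
n_values = 78   # number of unique values
n_a = 64        # ASSUMPTION: hidden size, not stated in this section

# Parameter count of one LSTM layer (4 gates) plus one softmax Dense layer.
lstm_params = 4 * (n_a * (n_a + n_values) + n_a)
dense_params = n_a * n_values + n_values
total_params = lstm_params + dense_params
print(total_params)

# Zero initial hidden state and cell state, one row per training example.
a0 = np.zeros((m, n_a))
c0 = np.zeros((m, n_a))
```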
Epoch 1/100
60/60 [==============================] - 3s - loss: 125.7673
...
Scroll to the bottom to check Epoch 100
...
Epoch 100/100
60/60 [==============================] - 0s - loss: 6.1861

Now that you have trained a model, let's go to the final section to implement an inference algorithm, and generate some music!
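One way to read the loss values in the training log: if the reported loss is the sum of the categorical cross-entropies of the 30 per-step outputs (an assumption about how the total is formed), then a model that predicts all 78 values with equal probability would score about 130.7. That is close to the epoch-1 loss, which suggests the network starts out near uniform guessing.

```python
import math

Ty = 30         # number of output steps
n_values = 78   # number of unique values

# Cross-entropy of a uniform prediction for one step is -log(1/78) = log(78).
per_step = math.log(n_values)

# ASSUMPTION: the total loss is the sum of the per-step losses.
uniform_total = Ty * per_step
print(round(uniform_total, 1))
```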
You now have a trained model which has learned the patterns of the jazz soloist. Let's now use this model to synthesize new music.
At each step of sampling, you will:
Exercise:
Here are some of the key steps you'll need to implement inside the for-loop that generates the \(T_y\) output values:
Step 2.A: Use LSTM_cell, which takes in the input layer, as well as the previous step's 'c' and 'a', to generate the current step's 'c' and 'a'.
next_hidden_state, _, next_cell_state = LSTM_cell(input_x, initial_state=[previous_hidden_state, previous_cell_state])

Choose the appropriate variables for the input_x, hidden_state, and cell_state.
Step 2.B: Compute the output by applying densor to compute a softmax on 'a' to get the output for the current step.
Step 2.C: Append the output to the list outputs.
Step 2.D: Sample x to be the one-hot version of 'out'.
This allows you to pass it to the next LSTM's step.
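Steps 2.A through 2.D can be sketched outside Keras with plain NumPy. This is only an illustration of the control flow: `toy_cell` is a hypothetical stand-in with random weights, not a real LSTM, and `n_a` and `Ty` are illustrative values.

```python
import numpy as np

n_values, n_a, Ty = 78, 64, 5   # n_a and Ty are illustrative choices
rng = np.random.default_rng(0)

# Hypothetical random weights standing in for the trained layers.
Wx = rng.normal(scale=0.1, size=(n_values, n_a))
Wa = rng.normal(scale=0.1, size=(n_a, n_a))
Wy = rng.normal(scale=0.1, size=(n_a, n_values))

def toy_cell(x, a_prev, c_prev):
    """Stand-in for LSTM_cell (not a real LSTM): returns the next a and c."""
    c_next = 0.5 * c_prev + np.tanh(x @ Wx + a_prev @ Wa)
    a_next = np.tanh(c_next)
    return a_next, c_next

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x = np.zeros((1, n_values))     # zero-valued first input
a = np.zeros((1, n_a))          # zero initial hidden state
c = np.zeros((1, n_a))          # zero initial cell state
outputs = []

for t in range(Ty):
    a, c = toy_cell(x, a, c)                        # Step 2.A: update the states
    out = softmax(a @ Wy)                           # Step 2.B: softmax over the 78 values
    outputs.append(out)                             # Step 2.C: record this step's output
    x = np.eye(n_values)[np.argmax(out, axis=-1)]   # Step 2.D: one-hot of the most likely value
```

The loop picks the most likely value at every step rather than drawing from the distribution, which matches the argmax-based one-hot step. In the Keras version the one-hot vector also gets a length-1 time axis so it can be fed back into the LSTM cell.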
We have provided the definition of one_hot(x) in the 'music_utils.py' file and imported it. Here is the definition of one_hot:
def one_hot(x):
    x = K.argmax(x)
    x = tf.one_hot(indices=x, depth=78)
    x = RepeatVector(1)(x)
    return x

Here is what the one_hot function is doing:
argmax: within the vector x, find the position with the maximum value and return the index of that position.
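The same three operations can be mirrored in NumPy to check the shapes by hand. The function `one_hot_np` below is a hypothetical analogue written for illustration, not the helper from music_utils.py.

```python
import numpy as np

def one_hot_np(x, depth=78):
    """NumPy analogue of one_hot: (batch, depth) scores -> (batch, 1, depth) one-hot."""
    idx = np.argmax(x, axis=-1)        # index of the largest entry in each row
    encoded = np.eye(depth)[idx]       # one-hot row per index, shape (batch, depth)
    return encoded[:, np.newaxis, :]   # add a length-1 time axis, like RepeatVector(1)

scores = np.zeros((1, 78))
scores[0, 5] = 0.9                     # make position 5 the maximum
result = one_hot_np(scores)
print(result.shape, int(np.argmax(result)))
```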
result = Lambda(lambda x: x + 1)(input_var)

If you pre-define a function, you can do the same thing:

def add_one(x):
    return x + 1

# use the add_one function inside of the Lambda function
result = Lambda(add_one)(input_var)

Step 3: Inference Model: This is how to use the Keras Model.
Total params: 41,678
Trainable params: 41,678
Non-trainable params: 0

Initialize inference model

The following code creates the zero-valued vectors you will use to initialize x and the LSTM state variables a and c.
x_initializer = np.zeros((1, 1, 78))
a_initializer = np.zeros((1, n_a))
c_initializer = np.zeros((1, n_a))

Exercise: Implement predict_and_sample().
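The exercise text does not spell out what predict_and_sample() should return, so the following is only a guess at the post-processing step: take the per-step probability vectors, pick the most likely index at each step, and one-hot encode the result. The array `pred` is a random stand-in for the inference model's output, and `Ty` is an illustrative value.

```python
import numpy as np

n_values, Ty = 78, 4   # Ty is an illustrative choice
rng = np.random.default_rng(1)

# Hypothetical stand-in for the model output: Ty probability vectors of shape (1, 78).
logits = rng.normal(size=(Ty, 1, n_values))
pred = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

indices = np.argmax(pred, axis=-1)            # most likely value per step, shape (Ty, 1)
results = np.eye(n_values)[indices.ravel()]   # one-hot rows, shape (Ty, 78)
print(indices.shape, results.shape)
```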
Finally, you are ready to generate music. Your RNN generates a sequence of values. The following code generates music by first calling your predict_and_sample() function. These values are then post-processed into musical chords (meaning that multiple values or notes can be played at the same time).
Most computational music algorithms use some post-processing because it is difficult to generate music that sounds good without such post-processing. The post-processing does things such as clean up the generated audio by making sure the same sound is not repeated too many times, that two successive notes are not too far from each other in pitch, and so on. One could argue that a lot of these post-processing steps are hacks; a lot of the music generation literature has also focused on hand-crafting post-processors, and a lot of the output quality depends on the quality of the post-processing and not just the quality of the RNN. But this post-processing does make a huge difference, so let's use it in our implementation as well.
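To make the two clean-up rules mentioned above concrete, here is a toy sketch. Both functions are hypothetical and are not the post-processing used by this notebook; the thresholds and the use of MIDI pitch numbers are assumptions made for illustration.

```python
def limit_repeats(notes, max_run=2):
    """Drop items that would extend a run of identical items beyond max_run."""
    kept = []
    run = 0
    for note in notes:
        run = run + 1 if kept and note == kept[-1] else 1
        if run <= max_run:
            kept.append(note)
    return kept

def limit_leaps(pitches, max_leap=12):
    """Drop any pitch more than max_leap semitones from the previously kept pitch."""
    kept = []
    for p in pitches:
        if not kept or abs(p - kept[-1]) <= max_leap:
            kept.append(p)
    return kept

pruned_repeats = limit_repeats(["C", "C", "C", "C", "D", "C"])
pruned_leaps = limit_leaps([60, 62, 90, 64, 65])
print(pruned_repeats, pruned_leaps)
```

Rules like these shorten the sequence, which is consistent with the generation log reporting a sound count "after pruning".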
Let's make some music!
Run the following cell to generate music and record it into your out_stream. This can take a couple of minutes.
out_stream = generate_music(inference_model)

Predicting new values for different set of chords.
Generated 51 sounds using the predicted values for the set of chords ("1") and after pruning
Generated 51 sounds using the predicted values for the set of chords ("2") and after pruning
Generated 50 sounds using the predicted values for the set of chords ("3") and after pruning
Generated 51 sounds using the predicted values for the set of chords ("4") and after pruning
Generated 50 sounds using the predicted values for the set of chords ("5") and after pruning
Your generated music is saved in output/my_music.midi

To listen to your music, click File->Open... Then go to "output/" and download "my_music.midi". Either play it on your computer with an application that can read MIDI files if you have one, or use one of the free online "MIDI to mp3" conversion tools to convert this to mp3.
As a reference, here is a 30-second audio clip we generated using this algorithm.
Congratulations on completing this assignment and generating a jazz solo!
References
The ideas presented in this notebook came primarily from three computational music papers cited below. The implementation here also took significant inspiration and used many components from Ji-Sung Kim's GitHub repository.