Short-Term Forecasting of Musical Compositions Using Chord Sequences

Mikhail Matrosov (1,2,3); scientific advisor V. V. Strijov (1,2,3); consultant Anton Matrosov (1)
1 Moscow Institute of Physics and Technology
2 Skolkovo Institute of Science and Technology
3 Computing Centre of the Russian Academy of Sciences

Computing Centre of the Russian Academy of Sciences, June 2015, Moscow, Russia
M. P. Matrosov, Music Prediction, 1 / 28

Research goal
Predict the next element in a sequence of chords given in a group-theoretic representation, ignoring the temporal component (arpeggios, durations, pauses).
Novelty: a more precise representation of the musical sequence that carries more information about each chord.
Method: a composition of Bayesian classifiers.
[Figure: pitch versus time diagram; surviving label fragment: "..., not a sequence of simple tones".]
Teaser: 92.5% Hamming similarity (58% chord-wise) between the prediction and the original melody from the test set.

Octave
An octave consists of 12 semitones. A musician can play all the notes of a melody within a single octave, and the melody will sound only slightly different. We therefore represent chords within a single octave.

Chords
Each chord contains from 1 to 12 semitones sounding simultaneously. A chord can be transposed up or down by several semitones. It can also be drawn on a circle, in which case a rotation is equivalent to a change of pitch.
[Figure: chords drawn on the twelve-tone circle. Panels: "Triadic chord examples (key of C)" and "Seventh chords (key of C)", each showing diminished, minor, and major chords.]

Strum and base tone
The figure on this slide shows a few more chords. There are 351 possible strums (chord shapes).
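The count of 351 can be checked by brute force. The sketch below is illustrative only: it assumes a chord is encoded as a 12-bit mask (bit p is set when pitch class p sounds) and that transposition is a cyclic rotation of those bits. The names `transpose` and `strum_of` are hypothetical helpers, not taken from the slides.

```python
N_SEMITONES = 12
FULL = (1 << N_SEMITONES) - 1  # 4095: every non-empty chord is an integer in 1..4095


def transpose(chord: int, k: int) -> int:
    """Rotate the 12-bit chord mask up by k semitones (assumed encoding)."""
    k %= N_SEMITONES
    return ((chord << k) | (chord >> (N_SEMITONES - k))) & FULL


def strum_of(chord: int) -> int:
    """Use the numerically smallest rotation as the canonical strum."""
    return min(transpose(chord, k) for k in range(N_SEMITONES))


# Group all non-empty chords by their canonical strum.
chords_per_strum = {}
for chord in range(1, FULL + 1):
    chords_per_strum.setdefault(strum_of(chord), set()).add(chord)

n_strums = len(chords_per_strum)
n_chords = sum(len(group) for group in chords_per_strum.values())
# Strums that map onto themselves under some rotation occur in fewer than 12 keys.
n_symmetric = sum(1 for group in chords_per_strum.values() if len(group) < N_SEMITONES)

print(n_strums, n_chords, n_symmetric)  # 351 4095 16
```

Under this encoding, 335 strums occur in all 12 keys and 16 rotationally symmetric strums occur in fewer, which is why the total is 4095 rather than 351 · 12.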
Each of them can be played in 12 different keys, except for a few (fairly rare) symmetric cases. In total there are $2^{12} - 1 = 4095$ possible chords.

Chord representation
A melody is represented as a sequence of chords (integers from 1 to $2^{12} - 1$); here and below, $i$ denotes time.
$$c = \{c_i\}, \quad c_i \in C \quad \text{(melody as a sequence)}$$
$$C = \{1, 2, 3, \dots, 4095\} \quad \text{(chord space)}$$
Each chord has a shape (strum $s \in S$) and a base tone (key $z \in Z_{12}$):
$$c = s \times z \quad \text{(chord = strum} \times \text{key)}$$
The whole melody is therefore represented as a sequence of pairs:
$$c = \{(s, z)_i\} \quad \text{(melody as pairs)}$$
- $S$: the set of unique strums, 351 elements.
- $Z_{12}$: the group of residues modulo 12.
- The product denotes transposition.
[Figure: a chord drawn on the twelve-tone circle.]

Sequence of elements
A melody can be transposed as a whole, so each base tone has to be measured relative to the preceding one:
$$r_i = (z_i - z_{i-1}) \bmod 12, \quad r_1 = z_1.$$
The pair $(s, r)$ is called an element and is denoted $x \in E$.
$$C = S \times Z_{12} = S \times R_{12} = E$$
- $C$: chord space, $N = 4095$.
- $S$: set of unique strums, $N = 351$.
- $Z_{12}$: residues modulo 12, $N = 12$.
- $R_{12}$: adjacent differences in $Z_{12}$, $N = 12$.
- $E$: element space, in which we predict.
A space here is a set equipped with the transposition (pitch change) operation.
[Figure: twelve-tone circle; surviving labels: "Transposition", r, s.]

N-grams
An N-gram is a subsequence of $N$ elements $x \in E$ of a given sequence $x = \{x_i\}$. An N-gram of size 1 is simply a single element $x \in E$. For example:
$$x_i^N = \{x_i, x_{i+1}, \dots, x_{i+N-1}\}.$$
In this work, N-grams serve as feature vectors describing the current point in a musical sequence.

Dataset Midi50k
50,000 random MIDI files were collected from the internet.
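The chord, strum, element, and N-gram definitions above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: chords are 12-bit masks (bit p set when pitch class p sounds), the canonical strum is the numerically smallest rotation, and for symmetric strums the smallest valid key is chosen. The function names are hypothetical.

```python
N = 12
FULL = (1 << N) - 1


def transpose(chord, k):
    """Rotate the 12-bit chord mask up by k semitones (assumed encoding)."""
    k %= N
    return ((chord << k) | (chord >> (N - k))) & FULL


def split(chord):
    """Return (s, z): canonical strum s and smallest key z with transpose(s, z) == chord."""
    s = min(transpose(chord, k) for k in range(N))
    z = next(k for k in range(N) if transpose(s, k) == chord)
    return s, z


def to_elements(chords):
    """Map chords c_i to elements (s_i, r_i) with r_1 = z_1, r_i = (z_i - z_{i-1}) mod 12."""
    elements, prev_z = [], 0
    for c in chords:
        s, z = split(c)
        elements.append((s, (z - prev_z) % N))
        prev_z = z
    return elements


def ngrams(x, n):
    """All contiguous subsequences of length n, as tuples."""
    return [tuple(x[i:i + n]) for i in range(len(x) - n + 1)]


C_MAJOR = 0b000010010001  # pitch classes {0, 4, 7}
melody = [C_MAJOR, transpose(C_MAJOR, 5), transpose(C_MAJOR, 7), C_MAJOR]
shifted = [transpose(c, 3) for c in melody]  # the same melody, three semitones higher

elements = to_elements(melody)
print([r for _, r in elements[1:]])                 # relative transpositions
print(elements[1:] == to_elements(shifted)[1:])     # unchanged by global transposition
print(len(ngrams(elements, 2)))                     # number of 2-grams
```

Only the first element depends on the absolute key; every later element is identical for the original and the transposed melody, which is the point of predicting in $E$ rather than in $C$.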
Each MIDI file was converted into a chord sequence $c = \{c_i\}$, $c_i \in C$, by the following algorithm:
1. open the MIDI file as a piano roll;
2. discard percussion;
3. quantize at a rate of $2 \cdot \text{tempo}$;
4. discard the octave number (pitch = pitch mod 12).
An average MIDI file contains 600 chords, which gives 30 million chords in total.
A melody is a sequence of elements (the index is time): $x = \{x_i\}$, $x_i \in E$.
$X$ is a set of melodies: $X = \{x_j\}$.

Splitting the sample
To evaluate the algorithm, the full Midi50k set $X_0$ was split into two parts of different sizes. Each split was random: a subset of a given size $M$ was drawn from the set,
$$X_{\text{training}} \subset X_0, \quad |X_{\text{training}}| = M.$$
The remaining part was used to test the algorithm:
$$X_{\text{testing}} = X_0 \setminus X_{\text{training}}.$$

Problem statement in terms of error minimization
For a training set $X$, find the parameter vector $w$ of the algorithm that minimizes the error function:
$$\hat{w} = \arg\min_{w \in \mathbb{R}^{2K}} S(w, X).$$

Prediction function
The prediction is made from a weighted sum of classifiers:
$$x_{i+1} = f(X, \{x_1, \dots, x_i\}, w) = \arg\max_{e \in E} \sum_{k=1}^{K} (A_{ek} u_k + B_{ek} v_k), \quad w = \{u_k, v_k\},\ k = 1, \dots, K, \qquad (1)$$
$$A_{ek} \propto N(\{x_{i-k+2}, \dots, x_i, e\} \text{ in } X) \quad \text{(k-gram counted in the training set)},$$
$$B_{ek} \propto N(\{x_{i-k+2}, \dots, x_i, e\} \text{ in } \{x_1, \dots, x_i\}) \quad \text{(k-gram counted in the part of the melody before } i+1\text{)},$$
where
- "$\propto$" means that $A_{*k}$ and $B_{*k}$ are L1-normalized;
- $x_i \in E$;
- $N(g \text{ in dataset})$ is the number of occurrences of the k-gram $g \in E^k$ in the dataset;
- $w = \{u_k, v_k \mid k = 1, \dots, K\} \in \mathbb{R}^{2K}$ is the parameter vector;
- $K$ is the model complexity (the maximum N-gram length).

Error function
[Figure: predicted versus true melody. Hollow circles are the true semitones, dots are the predicted ones, errors are highlighted; the horizontal axis is time.]

Error function
The function $f$ is a classifier that predicts the next element.
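A minimal sketch of prediction rule (1), assuming elements are arbitrary hashable values and that each column $A_{*k}$, $B_{*k}$ is L1-normalized over the candidate next elements $e$. The helper names (`count_continuations`, `l1_normalized`, `predict`) are hypothetical, and the counts over the melody prefix are recomputed at every call for clarity rather than speed.

```python
from collections import Counter, defaultdict


def count_continuations(melodies, K):
    """counts[k][context][e] = number of k-grams (context + (e,)) for k = 1..K."""
    counts = {k: defaultdict(Counter) for k in range(1, K + 1)}
    for x in melodies:
        for k in range(1, K + 1):
            for i in range(len(x) - k + 1):
                gram = tuple(x[i:i + k])
                counts[k][gram[:-1]][gram[-1]] += 1
    return counts


def l1_normalized(counter):
    """Scale counts so they sum to 1; an empty counter gives an empty column."""
    total = sum(counter.values())
    return {e: n / total for e, n in counter.items()} if total else {}


def predict(train_counts, prefix, u, v):
    """Return argmax_e sum_k (A_ek * u_k + B_ek * v_k), or None if nothing was seen."""
    K = len(u)
    own_counts = count_continuations([prefix], K)  # counts in the melody so far
    scores = defaultdict(float)
    for k in range(1, K + 1):
        if k - 1 > len(prefix):
            break  # not enough history for a k-gram context
        context = tuple(prefix[len(prefix) - (k - 1):])
        for counts, weight in ((train_counts, u[k - 1]), (own_counts, v[k - 1])):
            column = l1_normalized(counts.get(k, {}).get(context, Counter()))
            for e, p in column.items():
                scores[e] += weight * p
    return max(scores, key=scores.get) if scores else None


train = [["a", "b", "c", "a", "b", "c"]]
train_counts = count_continuations(train, K=2)
print(predict(train_counts, ["a", "b"], u=[1.0, 1.0], v=[0.0, 0.0]))  # c
```

The weights `u` act on statistics from the training set and `v` on statistics from the melody being predicted, so a melody that repeats its own patterns can be predicted even when those patterns are rare in the training data.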
The error function is then ($i$ denotes the time step, $N_x$ the length of melody $x$):
$$S(w, X) = \sum_{x \in X} \sum_{i=1}^{N_x - 1} \left[ x_{i+1} \neq f(X, \{x_1, \dots, x_i\}, w) \right].$$
The square brackets evaluate to 1 if the statement inside is true and to 0 if it is false. $X$ is the set of melodies and $w \in \mathbb{R}^{2K}$ is the parameter vector.

Smoothed error function
We also introduce a smoothed error function to avoid possible stability problems in the minimization process:
$$S_{\text{smooth}}(w, X) = \sum_{x \in X} \sum_{i=1}^{N_x - 1} B\left(x_{i+1}, \hat{f}(X, \{x_1, \dots, x_i\}, w)\right),$$
$$B(x, \hat{f}) = \tanh\left(-s \, \frac{M_1 - M_2}{|M_1 + M_2|} \cdot e\right),$$
where $M_1$ and $M_2$ are the largest and second-largest components of the probability distribution $\hat{f}$ obtained from $f$ in (1), $s$ is a scale parameter, and $e = 1$ if the prediction is correct and $-1$ otherwise.

Smoothed error function
A smaller scale parameter $s$ in $S_{\text{smooth}}$ makes the steps less pronounced, which makes the optimization easier but less accurate.
[Figure: $S(x)$ versus $x$ for $s = 100$, $s = 10$, and $s = 1$.]

Smoothed error function
In most cases the smoothed error function $S_{\text{smooth}}(w, X)$ and the original error function $S(w, X)$ have nearby minima.
[Figure: "Smooth" and "Raw" error curves with nearby minima.]

Stochastic gradient descent
Computing the prediction and the error function takes up to 100 hours. It is better to take small steps on small parts of the training set, each a random subset of 100 melodies. For comparison, a regular training set may contain up to 10,000 melodies. This makes the optimization faster.
[Figure: L2 distance $\rho(w^i - w_{\text{optimal}})$, on a $10^{-4}$ scale, versus iteration number.]

Music has properties of natural language
Frequency distribution of N-grams for different $N$. The number of events is the number of occurrences of an N-gram in Midi50k. The slope equals $-0.6$; the distribution resembles the distribution of words in natural language (Zipf's law).
[Figure: log-log plot of the number of events versus the most popular N-grams, with curves from 1-grams to 16-grams.]

Probability of elements of E
Frequency map of the elements of $E$. The number of events is the number of occurrences in Midi50k. The order of the strums (horizontal axis) is arbitrary: it is a by-product of representing a chord as a pair $(s, r)$.

Space of 2-grams
Distribution of compositions by genre, projected from the space of chord bigrams onto a two-dimensional plane by PCA.
[Figure: "Cluster Assignments" scatter plot with the genres blues, classical, and hip hop.]

Prediction quality
The number of parameters is 16; the training set is the whole Midi50k. The average error is 0.42, which means that 58% of the elements $x \in E$ are predicted correctly. Some melodies are predicted with 100% accuracy, while others are predicted poorly (<5%).
[Figure: histogram of the number of melodies versus the average prediction error $S(w)$; lower is better.]

Training set size
[Figure: average error $S(w)$ (lower is better) on the training and test sets versus the number of melodies in the dataset, from $10^1$ to $10^4$ on a logarithmic axis.]

Model complexity
Error function $S(w, X)$ versus the number of parameters $K$ ($w \in \mathbb{R}^{2K}$), on the test sample.
[Figure: average prediction error $S(w)$ versus the number of model parameters, from 0 to 16.]
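The quality measures used in these plots can be sketched as follows. This is an illustration under assumptions, not the authors' code: the chord-wise error is taken as the fraction of mismatched positions (the indicator in $S$ averaged over the sequence), the Hamming similarity is assumed to be the fraction of the 12 semitone bits on which predicted and true chord masks agree, and `smoothed_loss` implements one term $B$ of the smoothed error. All function names are hypothetical.

```python
import math


def chordwise_error(true_seq, predicted_seq):
    """Average of the indicator [x_{i+1} != f(...)] over one sequence."""
    mismatches = sum(t != p for t, p in zip(true_seq, predicted_seq))
    return mismatches / len(true_seq)


def hamming_similarity(true_chords, predicted_chords, n_semitones=12):
    """Fraction of semitone bits that agree between 12-bit chord masks (assumed definition)."""
    differing = sum(bin(t ^ p).count("1") for t, p in zip(true_chords, predicted_chords))
    total_bits = n_semitones * len(true_chords)
    return (total_bits - differing) / total_bits


def smoothed_loss(m1, m2, correct, s=10.0):
    """One term B = tanh(-s * (M1 - M2) / |M1 + M2| * e), with e = +1 if correct, else -1."""
    e = 1.0 if correct else -1.0
    return math.tanh(-s * (m1 - m2) / abs(m1 + m2) * e)


print(chordwise_error([1, 2, 3, 4], [1, 2, 0, 4]))      # 0.25
print(hamming_similarity([0b10010001], [0b10010000]))   # one of 12 bits differs
print(smoothed_loss(0.6, 0.2, correct=True))            # close to -1
print(smoothed_loss(0.6, 0.2, correct=False))           # close to +1
```

Under these definitions a confident correct prediction drives $B$ towards $-1$ and a confident wrong one towards $+1$, and a uniformly random chord guess is exactly right with probability 1/4095, about 0.024%.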
Comparison of methods

                    Mozer [1]         Conklin [2]       Proposed
Main idea           Neural network    Music patterns    Bayes classifiers
Quality, chords                                         58.0%
Quality, pitches                                        92.5%
Quality, durations
Data size                                               50 000

Remaining entries of the source table, in extraction order (their cells are not recoverable): 40%, 90%, 93%, 20, 95%, 75%, 4500.

[1] M. Mozer. Neural network music composition by prediction. Connection Science, 1994.
[2] D. Conklin, I. Witten. Multiple viewpoint systems for music prediction. Journal of New Music Research, 1995, rev. 2002.

Results of the computational experiment
- The optimal model complexity (maximum N-gram length) is 8, but in general the larger the better.
- The training set should preferably contain more than 1000 melodies.
- The prediction quality is 58% chord-wise (0.024% for random guessing); the Hamming distance is 0.075 (92.5% matching semitones, 50% for random guessing).

Claims defended
- Representing a chord sequence as a sequence of elements of a specially constructed group is well justified.
- An algorithm for predicting the single next element of a musical sequence has been developed.
- The algorithm has been tested on the Midi50k dataset and compared with other work (Mozer, Conklin) in terms of prediction quality.

Publications
- M. Matrosov, V. Strijov, A. Matrosov. Short-term forecasting of musical compositions using chord sequences. Conference of the International Federation of Operational Research Societies, July 2014, Barcelona, Spain.
- M. Matrosov, V. Strijov. Short-term forecasting of musical compositions using chord sequences. 57th Scientific Conference of MIPT, November 2014, Dolgoprudny, Russia.
- Seminar "Music and Science", Moscow State Conservatory named after P. I. Tchaikovsky, January 20, 2015, Moscow, Russia.