This is a 50-page bachelor's thesis from Delft University of Technology in the Netherlands (author: JOEP DE JONG).
The transcription of voice using neural networks is a technique that deserves attention, as speech assistants are becoming increasingly popular. Neural networks often have difficulty determining the difference between a talking person and noise. Humans have a much better understanding of this and could possibly apply their knowledge of the structure of the signals to improve the understanding of the neural network. A problem that is extremely difficult for a neural network is understanding and transcribing the lyrics of a song. This thesis analyzes signal-processing techniques that can be applied to a song to improve the understanding of a speech-recognition algorithm. It is mainly focused on filtering the foreground lyrics from the accompaniment. Some basic filtering methods are described, including a low-amplitude filter and a band-pass filter; two more complicated filters, which make use of the periodicity of the background music, are also treated.
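As a point of reference for the basic methods mentioned above, the sketch below (not taken from the thesis) shows a conventional band-pass filter restricted to a rough voice band, together with one possible reading of a "low-amplitude filter" as a simple amplitude gate. The cutoff frequencies, filter order and threshold ratio are illustrative assumptions; Python with NumPy/SciPy is used for all sketches in this summary.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_voice(x, fs, low_hz=300.0, high_hz=3400.0, order=6):
    """Keep only a rough voice band of the mixture.

    The cutoffs roughly match telephone-bandwidth speech and are purely
    illustrative: sung vocals and instruments overlap heavily in frequency,
    which is exactly why the thesis moves on to periodicity-based filters.
    """
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def low_amplitude_gate(x, threshold_ratio=0.05):
    """One possible reading of a 'low-amplitude filter': zero out samples far
    below the peak level. The threshold ratio is an assumed parameter."""
    threshold = threshold_ratio * np.max(np.abs(x))
    return np.where(np.abs(x) >= threshold, x, 0.0)
```

Such filters can only help when the vocals and the accompaniment occupy different amplitude or frequency ranges, which motivates the two periodicity-based filters described next.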
The first filter is a method of voice separation using the two-dimensional Fourier transform. This method, proposed by Prem Seetharaman, Fatemeh Pishdadian and Bryan Pardo in 2017 [15], combines techniques from signal processing and image processing: periodic repetitions in a signal are found by identifying peaks in the two-dimensional Fourier transform of the spectrogram of the signal.
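To make this idea concrete, here is a minimal sketch of the peak-picking approach described above, assuming the mixture is already a mono NumPy array. Keeping only the strongest bins of the 2D Fourier transform of the magnitude spectrogram gives an estimate of the repeating accompaniment, and the residual is assigned to the voice. The quantile threshold and the soft mask are simplifications of my own; the exact peak detection and masking rules of Seetharaman, Pishdadian and Pardo differ in detail.

```python
import numpy as np
from scipy.signal import stft, istft

def separate_voice_2dft(x, fs, n_fft=2048, hop=512, peak_quantile=0.99):
    """Rough voice/accompaniment split via peaks in the 2D FT of the spectrogram.

    Periodic accompaniment shows up as strong, isolated peaks in the 2D Fourier
    transform of the magnitude spectrogram.  Keeping only those peaks and
    inverting gives an estimate of the repeating background; whatever that
    estimate does not explain is assigned to the voice.
    """
    _, _, Z = stft(x, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
    mag, phase = np.abs(Z), np.angle(Z)

    # 2D Fourier transform of the magnitude spectrogram.
    F2 = np.fft.fft2(mag)

    # Keep only the strongest 2D-FT bins: the "peaks" caused by periodic repetition.
    thresh = np.quantile(np.abs(F2), peak_quantile)
    F2_peaks = np.where(np.abs(F2) >= thresh, F2, 0.0)

    # Invert to estimate the magnitude of the repeating (background) part.
    bg_mag = np.clip(np.real(np.fft.ifft2(F2_peaks)), 0.0, None)
    bg_mag = np.minimum(bg_mag, mag)          # the background cannot exceed the mixture

    # Soft mask: the part not explained by the repeating background is the voice.
    voice_mask = 1.0 - bg_mag / (mag + 1e-10)
    _, voice = istft(voice_mask * mag * np.exp(1j * phase), fs=fs,
                     nperseg=n_fft, noverlap=n_fft - hop)
    return voice
```

In practice one would listen to both the voice estimate and the complementary accompaniment estimate to judge how well the repeating structure was captured.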
The second filter is a newly proposed method that can be used to separate the foreground from the background music. The algorithm compares columns in the spectrogram and classifies a column as overlapping if there are multiple occurrences of columns similar to the selected column (repetitions). The frequency components of overlapping columns, i.e. the different frequencies obtained from a discrete short-time Fourier transform, are afterwards compared with components of the same frequency in other columns. Under certain circumstances, overlapping frequency components are subtracted from the components in other columns of the spectrogram, which removes repetitions of that frequency throughout the song. The components of the spectrogram that remain after several iterations of this method are most likely to correspond to the least repetitive parts of the song.

The decisions made while constructing the column-comparison method are discussed and compared with the steps performed in the method that uses the two-dimensional Fourier transform. An implementation and demonstration are also attached. From the research it is expected that the two-dimensional Fourier transform performs better on strictly periodic accompaniment, while the method that compares spectrogram columns is more likely to perform better on songs with a less tight rhythm.
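The thesis's second filter is described here only at the level given above, so the following sketch should be read as one possible interpretation rather than the author's implementation: columns of the magnitude spectrogram are compared by cosine similarity, a column with enough near-duplicates is classified as overlapping, and a fraction of it is subtracted from the columns it repeats in. The similarity measure, both thresholds and the subtraction factor are hypothetical parameters.

```python
import numpy as np
from scipy.signal import stft

def suppress_repeated_columns(x, fs, n_fft=2048, hop=512,
                              sim_thresh=0.95, min_repeats=2, subtract=0.8):
    """One reading of the column-comparison idea, for illustration only.

    A column of the magnitude spectrogram that closely resembles several other
    columns is classified as overlapping (a repetition of the accompaniment);
    a fraction of it is then subtracted from the columns it repeats in.
    """
    _, _, Z = stft(x, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
    mag = np.abs(Z)                                   # shape: (frequency bins, columns)

    # Cosine similarity between every pair of spectrogram columns.
    unit = mag / (np.linalg.norm(mag, axis=0, keepdims=True) + 1e-10)
    sim = unit.T @ unit
    np.fill_diagonal(sim, 0.0)

    residual = mag.copy()
    for j in range(mag.shape[1]):
        similar = np.where(sim[j] >= sim_thresh)[0]
        if len(similar) >= min_repeats:               # column j is "overlapping"
            # Subtract part of the repeated column from the columns it repeats in,
            # never letting a magnitude become negative.
            residual[:, similar] = np.clip(
                residual[:, similar] - subtract * mag[:, [j]], 0.0, None)
    return residual                                   # magnitude of the least repetitive parts
```

A complete separation would reuse the phase of the original STFT and an inverse STFT, as in the previous sketch, and would iterate the procedure several times as described in the abstract.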
The thesis is organized as follows:
1. Introduction
2. Signals, sampling and spectral theory
3. Filtering
4. Separating the voice signal by comparing spectrogram columns
5. Implementation and validation
6. Discussion and conclusion