This is a 50-page bachelor's thesis from Delft University of Technology in the Netherlands (author: JOEP DE JONG).
The transcription of voice using neural networks is a technique that deserves attention, as speech assistants are becoming increasingly popular. Neural networks often have difficulty determining the difference between a talking person and noise. Humans have a much better understanding of this and could possibly apply their knowledge of the structure of the signals to improve the understanding of the neural network. A problem that is extremely difficult for a neural network is understanding and transcribing the lyrics of a song. This thesis analyzes signal-processing techniques that can be applied to a song to improve the understanding of a speech-recognition algorithm. It is mainly focused on filtering the foreground lyrics from the accompaniment. Some basic filtering methods are described, including a low-amplitude filter and a band-pass filter; two more complicated filters, which make use of the periodicity of the background music, are also treated.
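As a point of reference for the basic methods mentioned above, the sketch below (not taken from the thesis) shows a conventional band-pass filter restricted to a rough voice band, together with one possible reading of a "low-amplitude filter" as a simple amplitude gate. The cutoff frequencies, filter order and threshold ratio are illustrative assumptions; Python with NumPy/SciPy is used for all sketches in this summary.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_voice(x, fs, low_hz=300.0, high_hz=3400.0, order=6):
    """Keep only a rough voice band of the mixture.

    The cutoffs roughly match telephone-bandwidth speech and are purely
    illustrative: sung vocals and instruments overlap heavily in frequency,
    which is exactly why the thesis moves on to periodicity-based filters.
    """
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def low_amplitude_gate(x, threshold_ratio=0.05):
    """One possible reading of a 'low-amplitude filter': zero out samples far
    below the peak level. The threshold ratio is an assumed parameter."""
    threshold = threshold_ratio * np.max(np.abs(x))
    return np.where(np.abs(x) >= threshold, x, 0.0)
```

Such filters can only help when the vocals and the accompaniment occupy different amplitude or frequency ranges, which motivates the two periodicity-based filters described next.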
The first filter is a method of voice separation using the two-dimensional Fourier transform. This method, proposed by Prem Seetharaman, Fatemeh Pishdadian and Bryan Pardo in 2017 [15], combines techniques from signal processing and image processing: periodic repetitions in a signal are found by identifying peaks in the two-dimensional Fourier transform of the spectrogram of the signal.
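To make this idea concrete, here is a minimal sketch of the peak-picking approach described above, assuming the mixture is already a mono NumPy array. Keeping only the strongest bins of the 2D Fourier transform of the magnitude spectrogram gives an estimate of the repeating accompaniment, and the residual is assigned to the voice. The quantile threshold and the soft mask are simplifications of my own; the exact peak detection and masking rules of Seetharaman, Pishdadian and Pardo differ in detail.

```python
import numpy as np
from scipy.signal import stft, istft

def separate_voice_2dft(x, fs, n_fft=2048, hop=512, peak_quantile=0.99):
    """Rough voice/accompaniment split via peaks in the 2D FT of the spectrogram.

    Periodic accompaniment shows up as strong, isolated peaks in the 2D Fourier
    transform of the magnitude spectrogram.  Keeping only those peaks and
    inverting gives an estimate of the repeating background; whatever that
    estimate does not explain is assigned to the voice.
    """
    _, _, Z = stft(x, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
    mag, phase = np.abs(Z), np.angle(Z)

    # 2D Fourier transform of the magnitude spectrogram.
    F2 = np.fft.fft2(mag)

    # Keep only the strongest 2D-FT bins: the "peaks" caused by periodic repetition.
    thresh = np.quantile(np.abs(F2), peak_quantile)
    F2_peaks = np.where(np.abs(F2) >= thresh, F2, 0.0)

    # Invert to estimate the magnitude of the repeating (background) part.
    bg_mag = np.clip(np.real(np.fft.ifft2(F2_peaks)), 0.0, None)
    bg_mag = np.minimum(bg_mag, mag)          # the background cannot exceed the mixture

    # Soft mask: the part not explained by the repeating background is the voice.
    voice_mask = 1.0 - bg_mag / (mag + 1e-10)
    _, voice = istft(voice_mask * mag * np.exp(1j * phase), fs=fs,
                     nperseg=n_fft, noverlap=n_fft - hop)
    return voice
```

In practice one would listen to both the voice estimate and the complementary accompaniment estimate to judge how well the repeating structure was captured.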
The second filter is a newly proposed method that can be used to separate the foreground from the background music. The algorithm compares columns in the spectrogram and classifies a column as overlapping if there are multiple occurrences of columns similar to the selected column (repetitions). The frequency components of overlapping columns, i.e. the different frequencies obtained from a discrete short-time Fourier transform, are afterwards compared with components of the same frequency in other columns. Under certain circumstances, overlapping frequency components are subtracted from the components in other columns of the spectrogram, which removes repetitions of that frequency throughout the song. The components of the spectrogram that remain after several iterations of this method are most likely to correspond to the least repetitive parts of the song.

The decisions made while constructing the column-comparison method are discussed and compared with the steps performed in the method that uses the two-dimensional Fourier transform. An implementation and demonstration are also attached. From the research it is expected that the two-dimensional Fourier transform performs better on strictly periodic accompaniment, while the method that compares spectrogram columns is more likely to perform better on songs with a less tight rhythm.
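The thesis's second filter is described here only at the level given above, so the following sketch should be read as one possible interpretation rather than the author's implementation: columns of the magnitude spectrogram are compared by cosine similarity, a column with enough near-duplicates is classified as overlapping, and a fraction of it is subtracted from the columns it repeats in. The similarity measure, both thresholds and the subtraction factor are hypothetical parameters.

```python
import numpy as np
from scipy.signal import stft

def suppress_repeated_columns(x, fs, n_fft=2048, hop=512,
                              sim_thresh=0.95, min_repeats=2, subtract=0.8):
    """One reading of the column-comparison idea, for illustration only.

    A column of the magnitude spectrogram that closely resembles several other
    columns is classified as overlapping (a repetition of the accompaniment);
    a fraction of it is then subtracted from the columns it repeats in.
    """
    _, _, Z = stft(x, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
    mag = np.abs(Z)                                   # shape: (frequency bins, columns)

    # Cosine similarity between every pair of spectrogram columns.
    unit = mag / (np.linalg.norm(mag, axis=0, keepdims=True) + 1e-10)
    sim = unit.T @ unit
    np.fill_diagonal(sim, 0.0)

    residual = mag.copy()
    for j in range(mag.shape[1]):
        similar = np.where(sim[j] >= sim_thresh)[0]
        if len(similar) >= min_repeats:               # column j is "overlapping"
            # Subtract part of the repeated column from the columns it repeats in,
            # never letting a magnitude become negative.
            residual[:, similar] = np.clip(
                residual[:, similar] - subtract * mag[:, [j]], 0.0, None)
    return residual                                   # magnitude of the least repetitive parts
```

A complete separation would reuse the phase of the original STFT and an inverse STFT, as in the previous sketch, and would iterate the procedure several times as described in the abstract.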
The thesis is organized as follows:
1. Introduction
2. Signals, sampling and spectral theory
3. Filtering
4. Separating the voice signal by comparing spectrogram columns
5. Implementation and validation
6. Discussion and conclusion