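Answering uninvited: I happen to have been researching this recently while preparing the JDD speech recognition competition.
Speech recognition, as the name suggests, uses machines to convert speech signals into text. Familiar examples include Siri (which we have all played with to death), JD's DingDong speaker, Xiaomi's XiaoAI, and Amazon's Echo.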
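If you want to build a speech recognition system from scratch, the first question you run into is which data to start with. An effective speech recognition model usually needs several hundred to several thousand hours of audio with text transcriptions as training data.
The speech recognition datasets most often seen in academic papers, such as Switchboard, TIMIT, and WSJ, are not free, and they are expensive.
Among public datasets, the most commonly used English corpus is LibriSpeech. It contains 1000 hours of 16 kHz audiobook recordings, already segmented and organized into transcribed audio files of roughly 10 seconds each, which makes it very well suited for beginners.
For Chinese, large public speech recognition datasets are still rare. Tsinghua University has open-sourced THCHS-30, a 30-hour continuous Mandarin speech database recorded by university students.
Both of the public speech datasets above can be downloaded for free from http:// ** .openslr.org.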
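Taking part in algorithm competitions that release speech data is another good way into Chinese speech recognition. For example, the JD Finance conversational speech recognition competition that we are hosting this year opens up, for the first time, more than a thousand hours of Chinese customer-service conversation audio, and provides powerful GPU resources for the computation. The competition is aimed at speech technology enthusiasts, university students, and enterprise developers. By releasing transcribed customer-service conversation data (strictly anonymized), we hope to find the newest and strongest speech recognition algorithms, promote the adoption and development of speech technology, and strengthen exchange and sharing among speech recognition enthusiasts.
For details on the JDD Space Station and the speech recognition competition, see the article:
JDD Space Station's first invitational competition opens, releasing thousands of hours of real customer-service conversation audio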
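The simplest speech recognition task is something like yes/no, where the entire recognition vocabulary consists of just the two words "yes" and "no".
A slightly larger vocabulary gives tasks such as digit recognition, which recognizes continuous ** digits. Both of these tasks are relatively simple.
Transcribing telephone recordings is far more complex: the recognition vocabulary for Chinese can exceed 6000 characters, and for English it can reach ** 000 words.
Speech recognition can also be classified by scenario into isolated word recognition and continuous speech recognition. Continuous speech recognition can be further divided into human-to-machine speech, where a person speaks to a machine (for example, a voice input method), and conversational speech, such as meetings and customer-service phone calls.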
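The conversational speech recognition competition held by JDD this time, from the warm-up digit recognition task through to customer-service speech recognition, is an excellent opportunity to get started with speech recognition.
To register, go to: https://jdder.jd.com/index/jddDetail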
Digit recognition is a good entry point, but when we talk about speech recognition we usually mean Large-Vocabulary Continuous Speech Recognition (LVCSR), where the English vocabulary is in the range of 20000-60000 words and the Chinese vocabulary is in the range of 2500-6000 characters.
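The traditional speech recognition framework is as follows:
Speech signal feature extraction: common speech features include MFCC, Filterbank, and Spectrogram.
Acoustic model: traditional speech recognition builds the acoustic model with HMMs and GMMs.
Language model: usually an N-gram language model; RNN-based language models are also gradually gaining ground.
Decoding: traditional decoding is generally based on WFST, searching for the optimal output character sequence in a dynamic network composed of the HMM, the lexicon, and the language model.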
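To make the feature extraction stage more concrete, here is a minimal pure-Python sketch of a spectrogram: the signal is cut into overlapping frames, each frame is Hamming-windowed, and its power spectrum is computed. The 25 ms frame and 10 ms hop (400 and 160 samples at 16 kHz), the function names, and the synthetic 1 kHz tone are assumptions chosen for illustration, not settings from any particular toolkit. Filterbank and MFCC features would add a mel filterbank (and, for MFCC, a DCT) on top of this stage.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames of frame_len samples."""
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]


def power_spectrum(frame):
    """Hamming-window one frame and return its power spectrum (bins 0..n/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive DFT: fine for a demo, far too slow for real audio (use an FFT).
        coeff = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def spectrogram(samples, frame_len=400, hop=160):
    """Return one power spectrum per frame (assumed 25 ms / 10 ms at 16 kHz)."""
    return [power_spectrum(f) for f in frame_signal(samples, frame_len, hop)]


# Synthetic example (an assumption): 50 ms of a 1 kHz tone sampled at 16 kHz.
SAMPLE_RATE = 16000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(800)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / 400  # bin index -> frequency in Hz
```

For this tone, the strongest bin of the first frame corresponds to 1000 Hz, which is a quick sanity check that the framing and the transform are wired up correctly.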
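End-to-end speech recognition frameworks, by contrast, are well suited to beginners and have greatly lowered the barrier to entry for speech recognition. End-to-end systems generally use one of two mechanisms: CTC or Attention. With the continued progress of neural networks and hardware computing power, end-to-end systems trained on tens of thousands of hours of speech have clearly improved on traditional methods; one example is Baidu's Deepspeech framework. Below are some classic papers on end-to-end speech recognition.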
1. D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos et al., "Deep speech 2: End-to-end speech recognition in English and Mandarin," CoRR arXiv:1512.02595, 2015.
2. A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In ICML, pages 369-376. ACM, 2006.
3. W. Chan, N. Jaitly, Q. Le, and O. Vinyals. Listen, attend, and spell. abs/1508.01211, 2015. http://arxiv.org/abs/1508.01211.
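As a small illustration of the CTC mechanism mentioned above (reference 2), the sketch below shows greedy (best-path) CTC decoding: pick the highest-scoring label in each frame, merge consecutive repeats, then drop the blank symbol. The function name, the toy alphabet, and the hand-made per-frame scores are assumptions for illustration; real systems take these scores from a trained network and often use beam search with a language model instead.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Greedy CTC decoding.

    frame_scores: list of per-frame score lists, one score per label.
    alphabet: list mapping label index -> character; index `blank` is the
              CTC blank symbol and never appears in the output.
    """
    # Step 1: best label per frame (argmax over the label scores).
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]

    # Steps 2 and 3: collapse consecutive repeats, then remove blanks.
    decoded = []
    previous = None
    for label in best_path:
        if label != previous and label != blank:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)


# Toy example (hypothetical scores): labels are blank, "a", "b".
toy_alphabet = ["_", "a", "b"]
toy_scores = [
    [0.1, 0.8, 0.1],  # a
    [0.1, 0.7, 0.2],  # a (repeat, merged with the previous frame)
    [0.9, 0.05, 0.05],  # blank separates the two a's
    [0.2, 0.6, 0.2],  # a
    [0.1, 0.2, 0.7],  # b
    [0.1, 0.1, 0.8],  # b (repeat, merged)
    [0.8, 0.1, 0.1],  # blank
]
text = ctc_greedy_decode(toy_scores, toy_alphabet)
```

The blank between the second and fourth frames is what allows the output to contain a doubled character; without it the repeated "a" frames would collapse into one.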
Open-source speech recognition projects:
https://github.com/SeanNaren/deepspeech.pytorch (based on PyTorch)
https://github.com/pannous/tensorflow-speech-recognition (based on TensorFlow)
https://github.com/facebookresearch/wav2letter (based on Torch)
https://github.com/samsungsds-rnd/deepspeech.mxnet (based on MXNet)
https://github.com/baidu-research/ba-dls-deepspeech (based on Theano)
https://github.com/PaddlePaddle/DeepSpeech (based on PaddlePaddle)
https://github.com/mozilla/DeepSpeech (based on TensorFlow)
https://github.com/kaldi-asr
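This article is original content from the JD Finance Technology R&D Department. We hope you find it useful; feel free to follow, like, and share, and you are welcome to contact us about joining the speech recognition competition. We look forward to exchanging ideas!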