
A Preliminary Analysis of the Feasibility of Applying AI Speech Recognition to Traditional Listening Transcription

2021-05-14

锦绣·上旬刊 (Jinxiu, early-month edition), 2021, Issue 6

杨茜 袁奥航 胡欢 袁玉 刘钧鹏 朱奕 阮先玉

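Abstract: As globalization advances, cultural exchange between China and other countries has become increasingly frequent, and demand for English-language video has risen sharply. AI speech recognition has greatly promoted innovation in the language industry. To examine whether AI speech recognition can be applied to traditional listening transcription (听译: transcribing the original speech of an audio or video source so that it can later be translated), this paper uses three speech recognition tools, iFlytek Tingjian (讯飞听见), Tencent Cloud (腾讯云), and Sogou Tingxie (搜狗听写), and offers a preliminary analysis and summary of manually transcribed and AI-transcribed texts. We find that AI speech recognition takes less time than manual transcription but that its accuracy still needs improvement, and we propose ideas and methods for combining the strengths of the two.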

Keywords: listening transcription; AI speech recognition; speech-to-text

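Guided by the "bringing in" and "going global" strategies, China's demand for English-language video keeps growing. Listening transcription is the process of writing down and identifying the original speech in an audio or video source so that the material can then be translated. Traditional manual transcription depends on a person extracting the text by ear; it places high demands on the transcriber and is strongly affected by human factors. As artificial intelligence matures, AI speech recognition has gained wider acceptance for recognition and dictation tasks. In August 2017, Microsoft announced that the accuracy of its speech recognition system had risen from 94.1% to 94.9%, higher than that of some professional transcribers. Even so, the accuracy of speech feature extraction and the stability of recognition still need considerable improvement.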

1. Characteristics and problems of traditional manual transcription

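Listening transcription is a special kind of speech recognition and conversion: it is written, immediate, synchronous, and cross-cultural. When the speech in an English video is transcribed, there is no source-language text to refer to. Converting audio into written text therefore requires strong listening discrimination on the part of the transcriber. English audio, however, tends to be colloquial, non-standard, and hard to make out, which makes it difficult for the transcriber to identify what is said.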

2. Analysis and comparison of AI speech recognition transcription and manual transcription

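The audio and video materials were all drawn from TED talks, BBC news, and clips from well-known films. The AI speech recognition software consisted of three tools that support speech-to-text conversion: iFlytek Tingjian (讯飞听见), Tencent Cloud (腾讯云), and Sogou Tingxie (搜狗听写).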

2.1 Time taken

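Take the TED talk 《如何学好外语》 (How to Learn a Foreign Language Well) as an example. Manual transcription took the transcribers an average of one hour, thirty-seven minutes, and twenty-seven seconds (1:37:27), while the three AI speech recognition tools took an average of eleven minutes and nine seconds (11:09); the AI software recognized the speech and generated text almost in step with playback of the original video. For a broader comparison, the authors had transcribers manually transcribe 50 different audio clips and recorded the time taken. Manual transcription took 3 to 14 times as long as the AI speech recognition software, and the multiple was positively correlated with the length and difficulty of the source material. In terms of time, then, AI speech recognition software shows a clear advantage.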
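As a quick check of the TED figures above, the two average durations can be converted to seconds and divided. This is only an illustrative sketch; the helper name `to_seconds` is hypothetical and not part of the original study.

```python
def to_seconds(clock):
    """Convert an 'H:MM:SS' or 'MM:SS' string to a number of seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# Average durations reported for the TED talk in this section.
manual_seconds = to_seconds("1:37:27")  # manual transcription
ai_seconds = to_seconds("11:09")        # mean of the three AI tools

# How many times longer the manual transcription took.
ratio = manual_seconds / ai_seconds
print(f"manual / AI = {ratio:.1f}x")
```

The ratio comes out at roughly 8.7, which sits inside the 3 to 14 range reported for the 50 clips.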

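2.2 Accent handling

When transcribing manually, a transcriber can replay heavily accented audio many times until the final transcript is accurate. Most speech recognition software, however, assumes standard American or British pronunciation by default and has difficulty recognizing some accented audio.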

Example 1:

Manual: ...talking about how this problem is being addressed...

Sogou/Tencent: ...talking about how this problem is being dangerous...

Example 2:

Manual: ... after the third season, seriously, the dialogue started to make sense...

Sogou: ... after they turn a season, seriously, the dialogue started to make sense...

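Both examples above come from audio in Indian-accented English. Because Indian English differs from American and British English in both its vowels and its consonants, the AI speech recognition software cannot accurately recognize some pronunciations, which leads to serious errors in the exported text.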

2.3 Sentence segmentation

Example 1:

Manual: A pentagon official said this was to provide president Obama with flexibility.

Tencent: A pentagon official said this was to provide president Obama with flexibility should military options be required to protect American lives and interests.

Example 2:

Manual: ...people don't listen to them. Why is that?

Sogou: ...people don't listen to them and why is that?

Tencent: ...people don't listen to them why is that?

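Because of the speaking rate and stress patterns of the original audio, AI speech recognition software cannot segment sentences as accurately as a human transcriber. Across the 50 audio clips, however, segmentation errors made up only a small share; in the vast majority of cases the software segmented the original audio fairly accurately.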

2.4 Overall accuracy

Example 1:

Manual: It's the instrument we all play. It's the most powerful sound in the world. Probably it's the only one that can start a war or say, I love you.

iFlytek: It's the instrument we all play. Probably see anyone that can start a war or say, I love you.

Sogou: It's the most powerful instrument we'll play. It's the most powerful sound in the world. Probably it's the only one that can start a war or say, I love you.

Tencent: Voice instrument we will play its most powerful sound a world probably any one can start a war or say I love you.

Example 2:

Manual: Oh no, I can't leave you. I promised I would put your photo up. I promised you would see Coco.

iFlytek: Oh no, I can't leave you. I promised I put your photo up. I promise you would see Coco.

Sogou: It's almost sunrise. Leave you.

Tencent: Oh no, I can't leave you. I promised I'd put your phone up. I promised you would see Coco.

Example 3:

Manual: Remember me though I have to say goodbye. Don't let it make you cry. Forever if I'm far away. Look, I sing secret song to you. Each time you hear sad guitar. Know that I'm with you. The only way that I can be until you're in my arm again.

iFlytek: Remember be so I have to travel for Free man army each time you hear cent town with you noise to noise noise yeah yeah noise yeah.

Sogou: Remember be so I have to travel for Free man army each time you hear cent town with you noise to noise yeah noise yeah.

Tencent: real number me! Do I have to say goodbye do not let it make you cry far away. I sings secret song to you. Each time you hear sand it are. The only way that I can be until you're in my arm again.

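During recognition, the AI speech recognition software adds words that were not spoken, drops words that were, fails to separate linked speech, and sometimes cannot recognize part of a passage at all, so the resulting text is less faithful to the source speech. Manual transcription relies mainly on the transcriber's expertise: it takes longer, but unclear passages can be replayed as often as needed, so the result is closer to the source speech and more accurate than the output of the AI software.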
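The error types described above (added, dropped, and misheard words) correspond to the insertions, deletions, and substitutions counted by word error rate (WER), a common measure of transcription accuracy. The paper does not state how accuracy was scored, so the following is only a sketch under that assumption. The helper names `tokenize`, `align_counts`, and `wer` are hypothetical, and the sample sentences are taken from Example 1 of section 2.2, with the manual transcript treated as the reference.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def align_counts(reference, hypothesis):
    """Count (substitutions, deletions, insertions) along one minimum-edit
    word alignment. Other optimal alignments may split the total differently."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    n, m = len(ref), len(hyp)
    # dist[i][j] is the word-level edit distance between ref[:i] and hyp[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match or substitution
                             dist[i - 1][j] + 1,         # deletion (dropped word)
                             dist[i][j - 1] + 1)         # insertion (added word)
    # Walk back from the bottom-right corner to classify each edit.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                subs += cost
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


def wer(reference, hypothesis):
    """Word error rate: (S + D + I) divided by the reference length."""
    ref_len = len(tokenize(reference))
    if ref_len == 0:
        raise ValueError("reference must contain at least one word")
    return sum(align_counts(reference, hypothesis)) / ref_len


manual = "talking about how this problem is being addressed"
sogou = "talking about how this problem is being dangerous"
print(align_counts(manual, sogou), wer(manual, sogou))
```

Scoring each engine's output against the manual transcript in this way would turn the qualitative comparison into a single number per clip, though the choice of tokenization and reference text affects the result.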

3. Conclusion

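Listening transcription for subtitles is subject to more constraints than the translation of written text. Comparing manual transcription with transcription by AI speech recognition software, the authors find that human transcribers are better at sentence segmentation, accent handling, and overall accuracy, but the work is slow, labor-intensive, and demanding of the transcriber's own language skills. AI speech recognition software still has inherent problems, yet it has reached a decent level overall and can recognize the source audio fairly accurately in far less time. This suggests that in future work transcribers can try using the AI-recognized text as a draft for a second, careful listening pass, combining AI speech recognition with traditional transcription and adopting more flexible strategies and methods to complete the work faster and more accurately.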
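One way to picture such a combined workflow is to have the transcriber re-listen only where two engines disagree. The paper does not prescribe a procedure like this, so the following is a hypothetical sketch: it uses `difflib.SequenceMatcher` from the Python standard library to compare two engine outputs word by word and lists the spans that differ. The function name `disagreements` is an assumption, and the sample strings are the iFlytek and Tencent outputs from Example 2 of section 2.4, with apostrophes written out.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def disagreements(output_a, output_b):
    """Return (words_in_a, words_in_b) pairs for every span where the two
    engine outputs differ; these are candidates for careful re-listening."""
    a, b = tokenize(output_a), tokenize(output_b)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            spans.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return spans


iflytek = ("Oh no, I can't leave you. I promised I put your photo up. "
           "I promise you would see Coco.")
tencent = ("Oh no, I can't leave you. I promised I'd put your phone up. "
           "I promised you would see Coco.")

for left, right in disagreements(iflytek, tencent):
    print(f"check: {left!r} vs {right!r}")
```

Agreement between engines does not guarantee correctness (in Example 3 of section 2.4 two engines share the same errors), so a pass like this can only narrow the transcriber's attention, not replace the final listening check.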
