A Study on Enhancing the Accuracy of Chinese Speech-to-Text in Instructional Videos Using Large Language Models
| dc.contributor | 周遵儒 | zh_TW |
| dc.contributor | Chou, Tzren-Ru | en_US |
| dc.contributor.author | 楊之昌 | zh_TW |
| dc.contributor.author | Yang, Chih-Chang | en_US |
| dc.date.accessioned | 2025-12-09T08:09:16Z | |
| dc.date.available | 2025-07-15 | |
| dc.date.issued | 2025 | |
| dc.description.abstract | 隨著語音識別技術的迅速發展,中文語音轉文字(STT)系統對於字幕的製作,扮演著重要的角色,並經常應用於教學影片上。然而,由於中文的複雜性及同音字詞眾多,現有的STT系統在精準度方面仍存在明顯的提升空間。本研究針對提升中文STT精準度,提出了語言模型輔助編輯與微調語言模型輔助文本編輯等兩種基於大型語言模型(LLM)的優化方法,並透過製作多種領域課程的教學影片字幕,以萊文斯坦動態規劃來計算兩個字串之間的最短編輯距離進行驗證。研究結果顯示,使用語言模型輔助編輯不僅能提升精準度,微調語言模型輔助文本編輯的文字精準度更進一步得到提升,其能針對特定語言的特性產生微調策略,使其更有效地辨識出語言的細微差異,進一步提升中文語音轉文字系統的準確性。 | zh_TW |
| dc.description.abstract | With the rapid evolution of speech-recognition technology, Chinese speech-to-text (STT) systems have come to play a critical role in subtitle production and are now routinely employed in instructional videos. Yet, because of the language’s inherent complexity and the prevalence of homophones, the accuracy of current STT systems still leaves ample room for improvement. To close this gap, the present study proposes two optimisation strategies grounded in large language models (LLMs): LLM-assisted post-editing and fine-tuned-LLM-assisted post-editing. Their effectiveness is evaluated by generating subtitles for courses spanning multiple disciplines and computing the minimum edit distance between reference and candidate strings through a dynamic-programming implementation of the Levenshtein algorithm. The results demonstrate that LLM-assisted post-editing enhances transcription accuracy, and that fine-tuned-LLM-assisted post-editing delivers an additional performance gain. Fine-tuning equips the model with language-specific adaptation strategies, enabling it to capture subtle linguistic distinctions more effectively and, ultimately, to further improve the accuracy of Chinese STT systems. | en_US |
| dc.description.sponsorship | 圖文傳播學系碩士在職專班 | zh_TW |
| dc.identifier | 012723109-47469 | |
| dc.identifier.uri | https://etds.lib.ntnu.edu.tw/thesis/detail/d352c9909d915dea4d0d03fd916b8775/ | |
| dc.identifier.uri | http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/125370 | |
| dc.language | Chinese | |
| dc.subject | 語音轉文字 | zh_TW |
| dc.subject | 大型語言模型 | zh_TW |
| dc.subject | 教學影片 | zh_TW |
| dc.subject | 微調語言模型 | zh_TW |
| dc.subject | 萊文斯坦距離 | zh_TW |
| dc.subject | Speech-to-Text (STT) | en_US |
| dc.subject | Large Language Models (LLM) | en_US |
| dc.subject | Instructional Videos | en_US |
| dc.subject | Fine-Tuned Language Models | en_US |
| dc.subject | Levenshtein Distance | en_US |
| dc.title | 基於大型語言模型的教學影片中文語音轉文字精準度提升方法之研究 | zh_TW |
| dc.title | A Study on Enhancing the Accuracy of Chinese Speech-to-Text in Instructional Videos Using Large Language Models | en_US |
| dc.type | Academic thesis | |
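
The abstract describes validating subtitles by computing the minimum edit distance between a reference string and a candidate string with a dynamic-programming Levenshtein algorithm. A minimal sketch of that kind of computation is given below. The function names `levenshtein` and `character_accuracy` are hypothetical, and the normalisation of the distance by the reference length is an assumption made for illustration; this record does not state how the thesis converts edit distance into an accuracy figure.

```python
def levenshtein(reference: str, candidate: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `candidate` into `reference`."""
    # previous[j] holds the distance between the first i-1 reference
    # characters and the first j candidate characters.
    previous = list(range(len(candidate) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]  # distance from i reference characters to an empty candidate
        for j, cand_char in enumerate(candidate, start=1):
            substitution = previous[j - 1] + (ref_char != cand_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, candidate: str) -> float:
    """Assumed metric (not taken from the thesis): one minus the edit
    distance divided by the reference length, clamped at zero."""
    if not reference:
        return 1.0 if not candidate else 0.0
    distance = levenshtein(reference, candidate)
    return max(0.0, 1.0 - distance / len(reference))


# A homophone-style error: two of five characters are wrong.
reference_text = "語音轉文字"
stt_output = "語音轉蚊子"
print(levenshtein(reference_text, stt_output))
print(character_accuracy(reference_text, stt_output))
```

Working at the character level, rather than on whitespace-separated words, suits Chinese text, which has no word delimiters; a word-level variant would need a segmentation step first.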
Files
Original bundle
- Name: 202500047469-109783.pdf
- Size: 3 MB
- Format: Adobe Portable Document Format
- Description: Academic thesis