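Development and Evaluation of an Assistive System for Corporate Annual Report Compliance Verification (公司年報合規檢驗輔助系統建置與評估)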
Date
2025
Abstract
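Research Background and Problem: The annual reports of Taiwan's 1,843 companies listed on the stock exchange and the over-the-counter market must comply with the 145 detailed items of the "Standards for the Required Information in Annual Reports of Public Companies" (公開發行公司年報應行記載事項準則), yet current manual verification suffers from low efficiency and a shortage of personnel. During May and June each year, when annual shareholder meetings are concentrated, the verification workload reaches 259,260 items, placing heavy pressure on reviewers. Traditional verification relies on manual item-by-item checking, which makes relevant content hard to locate, is complex to validate, and leaves little room for automation, so new technical solutions are needed.

Research Objectives and Research Questions: This study builds an automated annual report compliance assessment system based on large language models (LLMs) and retrieval-augmented generation (RAG), automating the verification workflow and providing a visual assistance interface. It addresses four core questions: how a compliance verification system can combine an LLM with BM25 retrieval to automate assessment; which core functional modules an interactive assistance interface should include; how system performance differs across annual report layout types and regulatory clauses; and which factors most affect verification accuracy and efficiency, and how they can be optimized.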
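Research Methods: The system uses a modular architecture with three modules: data preprocessing, automated verification, and an interactive assistance interface. The 145 regulatory items were systematically classified into five types, and the study focuses on the 74 content specification clauses that require no additional information. Stratified proportional sampling selected 27 test items, and the annual reports of 5 listed companies from different industries and with different layout types served as validation samples. Two evaluation metrics, a "lenient accuracy rate" and a "strict accuracy rate", were defined, with expert manual judgments serving as the ground truth for evaluating system performance.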
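The stratified proportional sampling step can be sketched as follows. This is a minimal illustration, not the study's procedure: only the totals (74 clauses, 27 sampled items) come from the abstract, while the stratum names, the stratum sizes, and the function names `proportional_allocation` and `stratified_sample` are assumptions made for the example.

```python
import random
from math import floor


def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to stratum size.

    Uses the largest-remainder method so the allocations sum exactly
    to sample_size.
    """
    total = sum(stratum_sizes.values())
    quotas = {name: sample_size * size / total for name, size in stratum_sizes.items()}
    allocation = {name: floor(quota) for name, quota in quotas.items()}
    leftover = sample_size - sum(allocation.values())
    # Give the remaining draws to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda name: quotas[name] - allocation[name], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


def stratified_sample(strata, sample_size, seed=0):
    """Draw a simple random sample without replacement inside each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(items) for name, items in strata.items()}
    allocation = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(items, allocation[name]) for name, items in strata.items()}


# Hypothetical stratum sizes that sum to 74; the real per-type counts are not in the abstract.
hypothetical_sizes = {
    "text_check": 40,
    "field_validation": 20,
    "image_recognition": 6,
    "procedural": 8,
}
strata = {name: [f"{name}-{i}" for i in range(size)] for name, size in hypothetical_sizes.items()}
sample = stratified_sample(strata, 27)
print({name: len(items) for name, items in sample.items()})
```

Largest-remainder rounding is one common way to keep the per-stratum counts integral while preserving the overall sample size; other rounding rules would also be compatible with the description in the abstract.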
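Major Findings: The system achieved an overall lenient accuracy rate of 86.26% and a strict accuracy rate of 64.12%, with an average processing time of 97.43 seconds. The custom dictionary was the key technique, raising retrieval accuracy from 47.93% to 86.26%, a gain of 38.33 percentage points. Performance differed markedly by question type: text verification reached 95.89% lenient accuracy, field validation 68.75%, and image recognition and procedural clauses both 100%. The system adapted well to both traditional and visual layouts, with accuracy rates of 86.54% and 86.02% respectively.

Conclusions and Significance: This study validates the feasibility of the RAG framework for professional compliance verification and delivers a complete automated annual report verification system. Academically, it establishes a systematic classification framework for regulatory clauses and provides empirical evidence for applying RAG in specialized domains. Practically, it substantially improves verification efficiency, reduces the workload of regulators, and promotes information transparency and investor protection. The results can be extended to the verification of other regulatory documents, laying a foundation for intelligent compliance verification and demonstrating the value of combining library and information retrieval techniques with AI.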
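Why a custom dictionary can matter so much for BM25 retrieval over Chinese text can be illustrated with a small sketch. It is not the system's implementation: the tokenizer (forward maximum matching), the BM25 parameters `k1` and `b`, the sample passages, and the dictionary entries are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tokenize(text, dictionary, max_len=8):
    """Forward maximum matching: take the longest dictionary term at each
    position, otherwise fall back to a single character."""
    tokens, i = [], 0
    while i < len(text):
        match = text[i]
        for length in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + length] in dictionary:
                match = text[i:i + length]
                break
        if not match.isspace():
            tokens.append(match)
        i += len(match)
    return tokens


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score every tokenized passage against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs
    doc_freq = Counter(term for doc in docs_tokens for term in set(doc))
    scores = []
    for doc in docs_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = term_freq[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical passages and query; none of these strings come from the study.
passages = ["本公司設置審計委員會", "本公司董事會成員名單", "股東常會召開日期"]
query = "審計委員會"
custom_dictionary = {"審計委員會", "董事會", "股東常會"}


def rank(dictionary):
    docs = [tokenize(p, dictionary) for p in passages]
    return bm25_scores(tokenize(query, dictionary), docs)


with_dictionary = rank(custom_dictionary)
without_dictionary = rank(set())
print(with_dictionary)
print(without_dictionary)
```

Without the dictionary, the query falls apart into single characters, so passages that merely share a character such as 會 receive a nonzero score; with the dictionary, only the passage containing the full term 審計委員會 matches.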
Research Background and Problem: Taiwan's 1,843 listed companies must comply with 145 detailed regulations under the "Standards for the Required Information in Annual Reports of Public Companies." However, current manual verification processes face challenges including low efficiency, insufficient human resources, and inconsistent standards. During the concentrated shareholder meeting period in May and June each year, the verification workload reaches 259,260 individual items, creating enormous pressure on review committees. Traditional verification methods rely on manual item-by-item checking, presenting difficulties in content localization, high verification complexity, and limited automation capabilities, urgently requiring innovative technological solutions.

Research Objectives and Research Questions: This study aims to establish an automated annual report compliance assessment system based on Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) technology, implementing automated verification processes and designing visual assistance interfaces. The research explores four core questions: how annual report compliance verification systems can achieve automated assessment by combining LLMs with BM25 retrieval technology, what core functional modules should be included in interactive assistance interfaces based on annual report verification needs, the performance differences of system automated assessment across different report layout types and regulatory clauses, and the key factors affecting system verification accuracy and efficiency along with optimization strategies.

Research Methods: A modular system architecture was employed, comprising three main modules: data preprocessing, automated verification, and interactive assistance interface. The 145 regulatory items were systematically classified into five types, focusing on 74 content specification clauses that require no additional information. Stratified proportional sampling was used to select 27 test items, with annual reports from 5 listed companies across different industries and layout types serving as verification samples. Dual evaluation metrics of "lenient accuracy rate" and "strict accuracy rate" were established, combined with expert manual judgment as standard answers for system performance assessment.

Major Findings: The system achieved an overall lenient accuracy rate of 86.26% and strict accuracy rate of 64.12%, with an average processing time of only 97.43 seconds. Custom dictionaries emerged as a key technological highlight, improving retrieval accuracy from 47.93% to 86.26%, an improvement of 38.33 percentage points. Significant differences existed across different question types: text verification achieved 95.89% lenient accuracy, field validation reached 68.75%, while image recognition and procedural clauses both achieved 100%. The system demonstrated good adaptability to both traditional layout and visual layout annual reports, with accuracy rates of 86.54% and 86.02% respectively.

Conclusions and Significance: This study successfully validated the feasibility of RAG frameworks in professional compliance verification domains, establishing a complete automated annual report verification system. Academically, the research established a systematic classification framework for regulatory clauses and provided empirical evidence for RAG technology applications in professional domains. Practically, it significantly improved verification efficiency, reduced regulatory agency workloads, and promoted information transparency and investor protection. The research outcomes can be extended to other regulatory document verification, establishing an important foundation for intelligent compliance verification development and demonstrating the innovative application value of integrating library and information retrieval technology with AI technology.
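The abstract does not define the lenient and strict accuracy rates, so the following is only one plausible reading, offered as a sketch: the lenient rate counts a judgment when the system's compliance verdict agrees with the expert, and the strict rate additionally requires the cited supporting passage to agree. The `Judgment` class, its fields, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    # Assumed fields; the study's actual grading scheme is not given in the abstract.
    verdict_matches_expert: bool
    evidence_matches_expert: bool


def accuracy_rates(judgments):
    """Return (lenient, strict) accuracy under the assumed definitions."""
    total = len(judgments)
    lenient = sum(j.verdict_matches_expert for j in judgments)
    strict = sum(j.verdict_matches_expert and j.evidence_matches_expert for j in judgments)
    return lenient / total, strict / total


# Illustrative data only, not results from the study.
example = [
    Judgment(True, True),
    Judgment(True, False),
    Judgment(False, False),
    Judgment(True, True),
]
lenient_rate, strict_rate = accuracy_rates(example)
print(f"lenient={lenient_rate:.2%} strict={strict_rate:.2%}")
```

Under any definition of this kind the strict rate cannot exceed the lenient rate, which is consistent with the reported 64.12% and 86.26%.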
Keywords
Annual Report Compliance Verification, Large Language Model (LLM), Retrieval-Augmented Generation (RAG), Automated Assessment, Information Retrieval