
EvoLoRA: Synergizing Low-Rank Adaptation and Evolutionary Prompts for Resource-Efficient ASR Error Correction

A novel framework combining LoRA and evolutionary optimization for automatic speech recognition error correction with 98.7% parameter reduction

Automatic Speech Recognition · Low-Rank Adaptation · Evolutionary Optimization · Large Language Models · Resource-Efficient Learning
  • TinyLlama-1.1B Final WER: 7.47%
  • Qwen2.5-7B Final WER: 6.25%
  • Parameter Reduction: 98.7%
  • GPU Hours (TinyLlama): 2.3

Research Overview

Automatic Speech Recognition (ASR) systems often produce transcription errors in noisy environments and specialized domains, and general-purpose Large Language Models (LLMs) perform poorly when applied directly to correction: Qwen2.5-7B has a zero-shot Word Error Rate (WER) of 1503.20% (WER can exceed 100% when the hypothesis contains many inserted words). To address this, we propose the EvoLoRA framework, which achieves resource-efficient ASR error correction through deep integration of Low-Rank Adaptation (LoRA), Evolutionary Prompt Optimization (EvoPrompt), and auxiliary error-mitigation techniques. After optimization, Qwen2.5-7B reduces WER to 6.25%, and TinyLlama-1.1B reaches 7.47% with only 2.3 GPU hours of training, both approaching the ASR baseline system (5.31%). LoRA reduces trainable parameters by more than 98%, validating the feasibility of the "lightweight model + efficient optimization" strategy for resource-constrained scenarios.

Technical Approach

Our framework integrates three core components: Low-Rank Adaptation for efficient parameter tuning, Evolutionary Prompt Optimization for automatic prompt generation, and auxiliary processing techniques for error mitigation.

Low-Rank Adaptation (LoRA)

Parameter-efficient tuning that updates only 0.3%-1.2% of model parameters by learning a low-rank decomposition of the weight update:

ΔW = BA, where B ∈ ℝ^(d×r), A ∈ ℝ^(r×k), and r ≪ min(d, k)

Applied to the Query and Value projection layers of the attention mechanism, this reduces per-layer trainable parameters from dk to r(d + k).
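As a concrete illustration, here is a minimal PyTorch sketch of this update wrapping a frozen projection layer. The rank r = 8, the scaling factor alpha, and the 4096-dimensional layer are illustrative assumptions, not the paper's reported configuration:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update ΔW = B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pretrained weights
            p.requires_grad = False
        d, k = base.out_features, base.in_features
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)  # down-projection, r x k
        self.B = nn.Parameter(torch.zeros(d, r))         # up-projection, d x r; zero
        self.scale = alpha / r                           # init so training starts
                                                         # from pretrained behaviour
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = base(x) + scale * x(BA)^T; only A and B receive gradients
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# Per-layer trainable parameters: r*(d+k) instead of d*k.
layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 65536, vs. 16777216 for full fine-tuning (~0.4%)
```

With d = k = 4096 and r = 8, the update trains 65,536 parameters per layer instead of roughly 16.8M, about 0.4%, consistent with the 0.3%-1.2% range above.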

Evolutionary Prompt Optimization

A genetic algorithm evolves a population of 100 candidate prompts over 50 iterations (sketched in code after the list), using:

  • Selection: Roulette sampling (top 30% retention)
  • Crossover: Rate 0.8 with text segment splicing
  • Mutation: Rate 0.2 with keyword replacement
  • Fitness: WER on validation set
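A compact sketch of this loop under the stated settings. Here evaluate_wer (which would run the corrector with a candidate prompt and score it on the validation set), seed_prompts, and keywords are hypothetical placeholders:

```python
import random

POP_SIZE, GENERATIONS = 100, 50
P_CROSSOVER, P_MUTATION, ELITE_FRAC = 0.8, 0.2, 0.3

def crossover(a: str, b: str) -> str:
    """Splice text segments: first half of one prompt, second half of the other."""
    wa, wb = a.split(), b.split()
    return " ".join(wa[: len(wa) // 2] + wb[len(wb) // 2 :])

def mutate(prompt: str, keywords: list) -> str:
    """Replace one random word with a domain keyword."""
    words = prompt.split()
    words[random.randrange(len(words))] = random.choice(keywords)
    return " ".join(words)

def evolve(seed_prompts, keywords, evaluate_wer):
    population = [random.choice(seed_prompts) for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        fitness = {p: evaluate_wer(p) for p in set(population)}  # cache per generation
        ranked = sorted(population, key=fitness.get)             # ascending WER
        elite = ranked[: int(ELITE_FRAC * POP_SIZE)]             # top 30% retained
        weights = [1.0 / (fitness[p] + 1e-6) for p in ranked]    # lower WER, higher weight
        children = []
        while len(elite) + len(children) < POP_SIZE:
            pa, pb = random.choices(ranked, weights=weights, k=2)  # roulette sampling
            child = crossover(pa, pb) if random.random() < P_CROSSOVER else pa
            if random.random() < P_MUTATION:
                child = mutate(child, keywords)
            children.append(child)
        population = elite + children
    return min(population, key=evaluate_wer)
```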

Auxiliary Error Mitigation

A multi-level error-filtering mechanism (sketched in code after the list):

  • TokenLimit: Dynamic sequence truncation (45% of the total WER reduction)
  • Deduplication: Sliding-window repeat removal (30% of the total WER reduction)
  • Word Filtering: Domain-specific blacklist/whitelist
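A simplified sketch of how these filters might compose; the 300-token budget, the window size, and the word lists are assumptions rather than the paper's exact configuration:

```python
def token_limit(tokens, max_tokens=300):
    """TokenLimit: truncate hypotheses that exceed the sequence budget."""
    return tokens[:max_tokens]

def deduplicate(tokens, window=5):
    """Sliding-window deduplication: drop a token that already appeared
    within the last `window` kept positions (removes stutter-like repeats)."""
    out = []
    for tok in tokens:
        if tok not in out[-window:]:
            out.append(tok)
    return out

def word_filter(tokens, blacklist=frozenset(), whitelist=None):
    """Drop blacklisted terms; if a whitelist is given, keep only its members."""
    kept = [t for t in tokens if t not in blacklist]
    return [t for t in kept if t in whitelist] if whitelist else kept

def clean(hypothesis: str) -> str:
    """Apply the three filters in sequence to a raw model output."""
    return " ".join(word_filter(deduplicate(token_limit(hypothesis.split()))))

print(clean("the the the patient is is stable stable"))
# -> "the patient is stable"
```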

Key Findings

Evolutionary Algorithm Process

Figure 1: 3D visualization of evolutionary prompt optimization process showing initial population, selection, crossover, and mutation stages

Word Error Rate (WER) Comparison

Figure 2: Comprehensive WER comparison showing performance improvements across different models and configurations

Module Contribution Analysis

Figure 3: Radar chart showing the relative contribution of each technique component to overall WER reduction

System Architecture Pipeline

Figure 4: End-to-end process flow of the EvoLoRA framework from audio input to final transcription

Experimental Results

Model          | Parameters | Zero-shot WER | Optimized WER | Training Time   | Parameter Updates
TinyLlama-1.1B | 1.1B       | 285.89%       | 7.47%         | 2.3 GPU hours   | 0.3%
Qwen2.5-7B     | 7B         | 1503.20%      | 6.25%         | 18.5 GPU hours  | 1.2% (LoRA)
LLaMA-3.2-3B   | 3.2B       | 452.71%       | 8.12%         | 12.7 GPU hours  | 100%
GPT-3.5 Turbo  | 175B       | 44.90%        | 33.96%        | 0.1 hours (API) | 0%
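All figures above use the standard word-level edit-distance definition of WER (substitutions + insertions + deletions divided by reference length). A textbook implementation, not the paper's code, also shows why zero-shot WER can exceed 100%:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via Levenshtein distance over words. Values above
    100% occur when the hypothesis inserts many extra words, which is why
    the zero-shot rows above can exceed 1000%."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))           # distances from the empty reference
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (r != h)))     # substitution (or match)
        prev = cur
    return prev[-1] / max(len(ref), 1)

print(f"{wer('turn the light off', 'turn the the lite of off'):.2%}")  # 75.00%
```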

Ablation Study

Individual contribution of each component to overall WER reduction on Qwen2.5-7B

  • TokenLimit: 45.0%
  • Repetition Removal: 30.0%
  • LoRA Adaptation: 11.67%
  • EvoPrompt: 11.67%
  • Word Filter: 3.33%

Key Insights

Performance Bottlenecks and Limitations

Despite these significant performance improvements, our framework trails the state-of-the-art method Hypo2Trans (2.20% WER) by 4.05 percentage points. Core limitations include:

  • Domain Adaptation: Limited coverage of specialized terminology and acoustic conditions
  • Acoustic Information: Post-processing approach lacks explicit acoustic feature integration
  • Sequence Modeling: Decreased efficiency with long N-best hypothesis sequences (>300 tokens)

Computational Efficiency Trade-offs

The framework balances performance against resource consumption:

  • Lightweight Model Advantage: TinyLlama-1.1B requires 87% less training time than Qwen2.5-7B (2.3 vs. 18.5 GPU hours) at a cost of only 1.22 percentage points of WER
  • LoRA Efficiency: 98.7% parameter reduction while maintaining 92% of full fine-tuning performance
  • Scalability: Suitable for edge devices and resource-constrained scenarios

Research Contributions

This study proposes the EvoLoRA framework for resource-efficient ASR error correction, achieving significant performance improvements through synergistic integration of LoRA, EvoPrompt, and auxiliary techniques. Key contributions include:

  • Methodological Innovation: Novel combination of low-rank adaptation and evolutionary prompt optimization
  • Efficiency Achievement: 98.7% parameter reduction with minimal performance loss
  • Practical Impact: Validation of "lightweight model + efficient optimization" paradigm for resource-constrained scenarios

The framework provides a cost-effective solution for ASR error correction in edge devices and low-resource environments, demonstrating that careful optimization can achieve competitive performance with significantly reduced computational requirements.