AI unlocks lost pitch accents in 3,000-year-old Vedic Sanskrit texts
A team from the University of Oxford and the Max Planck Institute for the Science of Human History has developed computational tools for restoring ancient pitch accents in Rigvedic Sanskrit. The work targets the text’s complex accent system, which has long made accurate reconstruction difficult.
The project began by building a parallel dataset of over 22,000 aligned verse pairs. The researchers used this corpus to evaluate different models for accent restoration on three metrics: Word Error Rate, Character Error Rate, and Diacritic Error Rate.
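To give a sense of what these three metrics measure, here is a minimal Python sketch. It is not the team's evaluation code. Word and Character Error Rate follow the usual edit-distance definitions. The Diacritic Error Rate shown here is an assumption made for illustration: it counts base characters whose attached combining marks differ from the reference, and it requires the two strings to match once all marks are removed. The function names and the sample verse fragment are likewise illustrative.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(ref, hyp):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def char_error_rate(ref, hyp):
    """Character-level edit distance over NFC-normalized text."""
    ref_nfc = unicodedata.normalize("NFC", ref)
    hyp_nfc = unicodedata.normalize("NFC", hyp)
    return edit_distance(ref_nfc, hyp_nfc) / len(ref_nfc)


def split_marks(text):
    """Decompose text (NFD) into (base character, combining marks) pairs."""
    units = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and units:
            base, marks = units[-1]
            units[-1] = (base, marks + ch)
        else:
            units.append((ch, ""))
    return units


def diacritic_error_rate(ref, hyp):
    """ASSUMED definition: share of base characters whose marks differ.

    Only defined when both strings share the same base characters,
    which holds when a model only adds or removes diacritics.
    """
    ref_units = split_marks(ref)
    hyp_units = split_marks(hyp)
    if [b for b, _ in ref_units] != [b for b, _ in hyp_units]:
        raise ValueError("base characters differ; this DER is undefined")
    errors = sum(rm != hm for (_, rm), (_, hm) in zip(ref_units, hyp_units))
    return errors / len(ref_units)


if __name__ == "__main__":
    reference = "agním īḷe puróhitam"   # accented reference
    predicted = "agním īḷe purohitam"   # one accent missing
    print(word_error_rate(reference, predicted))
    print(char_error_rate(reference, predicted))
    print(diacritic_error_rate(reference, predicted))
```

A single missing accent marks the whole word as wrong under Word Error Rate, while the character-level and diacritic-level scores register it as one small error, which is why the three metrics are typically reported together.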
The study establishes reliable methods for restoring missing accents in Vedic Sanskrit. ByT5, a byte-level variant of the T5 model, emerged as the top performer, while LoRA, a parameter-efficient fine-tuning method rather than a model in its own right, provided an efficient alternative. These results lay a foundation for further research into ancient language reconstruction and computational linguistics.