Source: AkiPress
Kyrgyzstan will create a national language corpus and digitize the KTRK archive to support AI development, Ulut Soft CEO Mirbek Okenov said during the panel session "AI Infrastructure: from Data Centers to Global Networks" at the KIT-2025 forum, The Caspian Post reports, citing AkiPress.
According to Okenov, the state plans to create a Kyrgyz language corpus that will serve as the basis for further development of artificial intelligence.
"Next year, it is planned to digitize the golden fund of KTRK, including archival radio recordings from the Soviet period to modern day. We have preserved a huge amount of unique content, which will become an important part of this corpus," Okenov noted.
He also said that the shortage of data for training AI is gradually becoming less of a problem thanks to the development of synthetic datasets.
"Artificial intelligence can already create training data itself. It is only important to correctly direct this process. Therefore, developers do not have to limit themselves to real texts only - synthetic datasets also play an important role," he added.