Microsoft to Train AI in Hungarian Using 10 Billion-Word Dataset in Historic GVH Agreement
Microsoft has launched an initiative to strengthen its artificial intelligence capabilities in Hungarian, following proceedings with the Hungarian Competition Authority (GVH). The company will build a dataset of approximately 10 billion Hungarian words and use it to train its AI models.
As part of the agreement, Microsoft has committed to making this dataset freely accessible to other AI developers, a move that could significantly advance Hungarian-language AI applications. The commitment is tied to a GVH investigation into whether Microsoft adequately informed Hungarian users about the AI-powered services it introduced in February 2023.
Expanding AI development in smaller languages like Hungarian is considered essential for data sovereignty, security, and cultural preservation. Microsoft will also implement educational initiatives, including training programs for Hungarian civil servants, small and medium-sized enterprises (SMEs), and consumers, to improve awareness of AI’s potential and risks. This effort aligns with Microsoft’s broader strategy, which includes investing over €20 billion in AI and cloud infrastructure across Europe.