Low-Resource NLP: Not Just Throwing Data at a Model and Hoping for the Best

Abstract

The talk will introduce several topics in low-resource and multilingual NLP, both in the domain of combating disinformation, drawing on work done at the Kempelen Institute of Intelligent Technologies in Bratislava, and in the more general context of efficiently adapting large language models to smaller languages. It will argue that machine learning, even in the era of deep learning and large language models, is not just about throwing increasing amounts of data at a model and hoping for the best: our lack of understanding can, and sometimes does, severely limit the capabilities of our models.

Date
Aug 6, 2025
Location
Philadelphia, Pennsylvania, USA
Michal Gregor
Researcher – Expert

My research interests include distributed robotics, mobile computing and programmable matter.