The success of Natural Language Processing (NLP) often depends on the availability of high-quality data. In particular, the costly manual annotation of text data has posed a major challenge since the early days of NLP. To overcome this data annotation bottleneck, a number of methods have been proposed. One prominent method in this context is Active Learning, which aims to minimize the amount of data that needs to be annotated.
However, the development of Large Language Models (LLMs) has changed the field of NLP considerably. It is therefore of great interest to those of us working in this field (both in research and in practical applications) to understand whether and how a lack of annotated data still affects NLP today.
At the center of this survey is Active Learning, which was last examined in a web survey in 2009. Fifteen years later, we aim to reassess the current state of the method from the user's point of view. Besides asking where Active Learning is used, we also ask where it is set aside in favor of other methods. Moreover, we want to understand which computational methods the community considers most useful for overcoming a lack of annotated data.
The survey is conducted solely for non-commercial, academic purposes. It specifically targets participants who are or have been involved in supervised machine learning for NLP. Knowledge of Active Learning is not required. Completing the survey will take approximately 15 minutes.