For humans and computers to communicate through natural language text, they must share an understanding (interpretation) of that text. This study addresses issues that arise in creating a common ground for natural language understanding. In today’s language comprehension systems built on deep learning, system design places particular emphasis on the design of language comprehension tasks, including data collection and evaluation criteria. Through the analysis and design of machine reading comprehension and natural language communication tasks, we study methods to measure the skills required for language understanding and to collect the examples needed for training.
Evaluation Methodology for Machine Reading Comprehension Task: Prerequisite Skills and Readability
A major goal of natural language processing (NLP) is to develop agents that can understand natural language. Such an ability can be tested with a reading comprehension (RC) task that requires the agent to read open-domain documents and answer questions about them. In this situation, knowing the quality of RC datasets is important for the development of language understanding agents, because it allows us to identify what the agents can and cannot understand in the evaluation. However, a detailed error analysis is difficult due to the lack of metrics in recent datasets. In this study, we adopted two classes of metrics for evaluating RC datasets: prerequisite skills and readability. We applied these classes to six existing datasets, including MCTest and SQuAD, and demonstrated the characteristics of the datasets according to each metric and the correlation between the two classes. Our dataset analysis suggested that the readability of RC datasets does not directly affect question difficulty and that it is possible to create an RC dataset that is easy to read but difficult to answer. (Sugawara et al.)
A Natural Language Corpus of Common Grounding under Continuous and Partially-Observable Context
Common grounding is the process of creating, repairing, and updating mutual understandings, which is a critical aspect of sophisticated human communication. However, traditional dialogue systems have limited capability to establish common ground, and we also lack task formulations that introduce natural difficulty in terms of common grounding while enabling easy evaluation and analysis of complex models. In this work, we propose a minimal dialogue task that requires advanced skills of common grounding under continuous and partially-observable context. Based on this task formulation, we collected a large-scale dataset of 6,760 dialogues that fulfills essential requirements of natural language corpora. Our analysis of the dataset revealed important phenomena related to common grounding that need to be considered. Finally, we evaluate and analyze baseline neural models on a simple subtask that requires recognition of the created common ground. We show that simple baseline models perform decently but leave room for further improvement. Overall, we show that our proposed task will be a fundamental testbed where we can train, evaluate, and analyze dialogue systems’ ability for sophisticated common grounding. (Udagawa et al.; accepted at AAAI-19)