We now face an “information space” of unprecedented scale and heterogeneity. Our research focuses on information conveyed through written text. In this laboratory, we conduct theoretical and practical studies of how such information can be analyzed and applied.

  • Information linkage: Entity identification and information recommendation
  • Human reading: Measurement and modeling of human language activities through text
  • Machine reading: Language analysis of and knowledge acquisition from text by a computer

Alongside our study of fundamental technologies for computer-based text and language processing, we are engaged in the following research projects.

Research projects

Knowledge acquisition from documents and its application (Research topics)

Advances in information technology have made it possible to process huge quantities of text by computer. However, essential information is not necessarily expressed in a text in a form that is easy to use: knowledge embedded in a text becomes available only through the understanding of its reader. Understanding this fundamental function of human intelligence, in which knowledge is transformed, combined, and used, therefore remains a great challenge for artificial intelligence. We conduct research on acquiring knowledge through computer-based language analysis and on applying the acquired knowledge, with a view to a more fundamental task: exploring the broad knowledge space generated by operations such as generalization, integration, and inference.


  • Construction and demonstration of recommendation systems for scientific papers
  • Linkage technology between texts and knowledge bases
  • Data integration by identification of scientific articles and researchers
  • Automatic construction of lexical resources/ontology
  • Logical and semantic structural analysis of documents and knowledge extraction therefrom

Retrieval and understanding of mathematical knowledge (Research topics)

Formulas are a mathematical notation used across science and education, and they play an important role in many scientific disciplines. However, because a formula is a nonverbal expression, it has so far received little attention as a research subject in natural language processing. We regard a formula not as a kind of image or symbol string but as a component of a document that carries its own structure and interpretation. By analyzing formulas together with the text that describes them, we study language processing approaches for handling the semantics of formulas. Our goal is to build a foundation for applying mathematical knowledge, through mathematical knowledge search and through the development and evaluation of understanding support systems built on these component technologies.


  • Construction of math formula search systems for math information access
  • Support for formula understanding by semantic analysis of surrounding texts
  • Analysis of and accessibility to high school mathematics textbooks
  • Dataset construction for evaluation of math information access

“Science of reading” — Gaze-based analysis and application of reading process (Research topics)

Language activities carried out through the screens of electronic devices are indispensable to everyday life. In this project, we investigate the human act of “reading” on a screen and conduct research on its measurement, modeling, and support. Specifically, we regard the act of “reading” a text on a screen as the interaction of the following three factors: (1) the semantic structure of the target text; (2) visual features such as layout and character decoration; and (3) the reader’s visual perception and cognitive language processing. Our goal is to study methods for measuring and modeling this interaction and to develop methods for presenting text in a readable form.


  • Text–gaze alignment
  • Modeling of gaze prediction based on linguistic analysis and text structure
  • Document layout optimization based on semantic analysis of contents

Language interface for scientific writing assistance (Research topics)

In this project, we study semantic indexing methods that link any passage in a text a user is reading or writing to other semantically similar texts and recommend those texts as examples. Our subjects also include paraphrasing, error correction, and template extraction for sentence generation, with the aim of developing a practical application that assists non-native speakers in writing papers in English.

  • Scientific paper writing aid for non-native researchers
  • Distributional representation for similar sentence search
  • Template extraction, paraphrasing, and error correction for sentence generation

Introduction of research topics by laboratory members