Open Resources



  • 2WikiMultiHopQA: A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps
  • FECFeval: An evaluation dataset for formulaic expression extraction
  • OneCommon: A Natural Language Corpus of Common Grounding under Continuous and Partially-Observable Context
  • Dynamic-OneCommon: A Natural Language Corpus of Maintaining Common Ground in Dynamic Environments
  • NTCIR-Math: IR Evaluation Task for Math Information Access
  • NTCIR-math-annotation: Annotation of math formula descriptions
  • Q-Scisumm: An Evaluation Dataset for Query-focused Scientific Paper Summarization
  • VQAG: Synthetic datasets for Machine Reading Comprehension


  • PDFNLT 1.0: Tools for Natural Language Text aware PDF structure analysis for scientific papers
  • Planetext: Converting XML documents into plain text based on tag classification
  • FixFix: A web-based editor for fixations detected in gaze datasets of reading activities
  • mapPdfToXml: A tool for extracting a PDF’s layout information and embedding it into XML


  • TermLink: Technical term extraction, Wikification and related paper recommendation
  • SideNoter: Scientific paper viewing system (by Takeshi Abekawa)
  • i-linkage: Citation identification