Open Resources | Aizawa Laboratory

Corpus

ACL Anthology Sentence Corpus (AASC): A corpus of natural language text extracted from ACL Anthology papers

BeNEDect: A Benchmark for the Numerical Error Detection Task
JMedBench: A Benchmark for Evaluating Japanese Biomedical Large Language Models
2WikiMultiHopQA: A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps
FECFeval: An evaluation dataset for formulaic expression extraction
OneCommon: A Natural Language Corpus of Common Grounding under Continuous and Partially-Observable Context
Dynamic-OneCommon: A Natural Language Corpus of Maintaining Common Ground in Dynamic Environments
NTCIR-Math: IR Evaluation Task for Math Information Access
NTCIR-math-annotation: Annotation of math formula descriptions
Q-Scisumm: An Evaluation Dataset for Query-focused Scientific Paper Summarization
VQAG: Synthetic datasets for Machine Reading Comprehension

PDFNLT 1.0: Tools for natural-language-text-aware PDF structure analysis of scientific papers
Planetext: Converting XML documents into plain text based on tag classification
FixFix: A web-based editor for fixations detected in gaze datasets of reading activities
mapPdfToXml: A tool for extracting a PDF's layout information and embedding it into XML

TermLink: Technical term extraction, Wikification and related paper recommendation
SideNoter: Scientific paper viewing system (by Takeshi Abekawa)
i-linkage: Citation identification

Some High-Level Thoughts on How to Conduct Research: Slides by Takuma Udagawa, a former member of our lab, offering advice on how to conduct research.
EVAL-VL-GLUE: A repository for evaluating vision-and-language (V&L) models on the language modality, including a compact set of resources for V&L studies (a pre-trained image extractor and five pre-trained transformers)