2020

TheoremKB: a knowledge base of mathematical results

2020: ENS - DI VALDA internship, under the supervision of Pierre Senellart

TheoremKB is a project led by Pierre Senellart whose goal is to build semantic knowledge from a collection of scientific articles in PDF format. One of its key goals is to compute the graph of theoretical results for a given research topic. In this graph, an edge is drawn from a result A to a result B when B is used in the proof of A. Because its nodes are individual results rather than whole documents, this graph has a finer resolution than the citation graph. It should therefore make it possible to answer questions about scientific research, such as whether proof cycles exist among papers (which is possible, since some authors cite content that is not yet published), or which results are affected if a proof is found to be incorrect. More broadly, the project aims to make bibliographical work easier by providing a deeper understanding of how papers relate to each other.
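To make the graph of results more concrete, here is a minimal Python sketch of the two questions mentioned above: detecting proof cycles and finding the results affected by an incorrect proof. It is only an illustration and not the TheoremKB implementation; the function names (`build_graph`, `impacted_by`, `has_proof_cycle`) and the result labels are hypothetical, and the edges are assumed to have been extracted already.

```python
def build_graph(edges):
    """Build adjacency maps from (a, b) pairs meaning 'b is used in the proof of a'."""
    uses = {}      # result -> results used in its proof
    used_by = {}   # result -> results whose proofs use it
    for a, b in edges:
        uses.setdefault(a, set()).add(b)
        uses.setdefault(b, set())
        used_by.setdefault(b, set()).add(a)
        used_by.setdefault(a, set())
    return uses, used_by


def impacted_by(result, used_by):
    """Return every result that depends, directly or transitively, on `result`."""
    seen = set()
    stack = [result]
    while stack:
        node = stack.pop()
        for dependent in used_by.get(node, ()):
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return seen


def has_proof_cycle(uses):
    """Detect a cycle with a depth-first search using three node states."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNVISITED for node in uses}

    def visit(node):
        state[node] = IN_PROGRESS
        for dep in uses[node]:
            if state[dep] == IN_PROGRESS:
                return True  # back edge: a proof depends on itself
            if state[dep] == UNVISITED and visit(dep):
                return True
        state[node] = DONE
        return False

    return any(state[node] == UNVISITED and visit(node) for node in uses)


# Hypothetical example data, for illustration only.
edges = [
    ("Theorem 2", "Lemma 1"),      # Lemma 1 is used in the proof of Theorem 2
    ("Corollary 3", "Theorem 2"),
    ("Theorem 4", "Lemma 5"),
]
uses, used_by = build_graph(edges)
affected = impacted_by("Lemma 1", used_by)
acyclic_has_cycle = has_proof_cycle(uses)

# Adding a dependency of Lemma 1 on Corollary 3 closes a proof cycle.
cyclic_uses, _ = build_graph(edges + [("Lemma 1", "Corollary 3")])
cyclic_has_cycle = has_proof_cycle(cyclic_uses)
```

In this example, an error in Lemma 1 affects Theorem 2 and Corollary 3 but not Theorem 4. The recursive search is adequate for a small sketch; a large graph would call for an iterative traversal.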

I developed a set of tools and libraries to perform theorem extraction using conditional random fields. While this approach does not beat state-of-the-art methods, the framework will make it possible to build more complex information extraction algorithms within the TheoremKB project.

  • GitHub repository

  • Internship report (HAL: A Knowledge Base of Mathematical Results, Extracting scientific results from research articles)