## TheoremKB: a knowledge base of mathematical results

*2020: ENS - DI VALDA internship, under the supervision of Pierre Senellart*

TheoremKB is a project led by Pierre Senellart whose goal is to build semantic knowledge from a collection of scientific articles in the PDF format. One of the key goals of TheoremKB is to be able to compute the graph of theoretical results from a given research topic. In this graph, an edge is drawn from a result A to result B when B is used in the proof of A. Having a finer resolution than the citation graph, as nodes are individual results instead of whole documents, this graph should allow deducing interesting facts on scientific research, such as whether we can find proof cycles among papers (which would be possible as some authors cite not yet published content), or what results are impacted if a proof is found to be incorrect. The project aims at making bibliographical work easier by having a deeper understanding of how papers are related to each other.

I have developped a set of tools and libraries to perform *theorem extraction* using conditional random fields. While it doesn't beat state-of-the-art methods, the framework will allow building more complex algorithms performing information extraction in the TheoremKB project.

## Learning Sudoku rules with conditional random fields

*2017: EPFL - CVlab internship, under the supervision of Pierre Baqué and Pascal Fua*

Combining backpropagation with Mean-Field inference models: the goal was to study these techniques in order to learn the rules of Sudoku as an ill-posed problem. Indeed given a grid it's possible to have multiple valid solutions. Pierre Baqué's (my internship mentor) Multi-Modal Mean-Field model should be able to cope with that kind of problems, combined with traditional machine learning techniques.

Learning the Sudoku rules as a set of constraints have been a success for the 4x4 version but I faced combinatorial difficulties in the 9x9 (traditional Sudoku) case, as inferring correct grids given a set of CRF constraints was proven to be a hard problem.