Version 0.7.0

@kermitt2 released this 17 Jul 15:01
· 977 commits to master since this release
605c646

Added

  • New YAML configuration: all settings are now in a single YAML file, and each model can be fully configured independently
  • Improved segmentation and header models (for header, +1 F1-score on the PMC evaluation and +4 F1-score on bioRxiv), plus improvements for body and citations
  • Add figure and table pop-up visualization on PDF in the console demo
  • Add PDF MD5 digest in the TEI results (service only)
  • Language support packages and xpdfrc file for pdfalto (support of CJK and exotic fonts)
  • Prometheus metrics
  • BidLSTM-CRF-FEATURES implementation available for more models
  • Addition of a "How GROBID works" page in the documentation
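
To illustrate the PDF MD5 digest item above, here is a minimal Python sketch of how a client could compute the MD5 digest of a local PDF, for example to match a file against a result returned by the service. This is not GROBID's implementation: the function name `pdf_md5_digest` and the chunk size are assumptions made for illustration, and these notes do not specify where the digest appears in the TEI or how it is cased.

```python
import hashlib


def pdf_md5_digest(path, chunk_size=8192):
    """Return the lowercase hex MD5 digest of the file at `path`.

    Hypothetical helper for illustration; reads the file in chunks so
    that large PDFs are not loaded into memory all at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

When comparing this value with a digest found in a TEI result, normalizing both sides to the same case first avoids spurious mismatches.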

Changed

  • JitPack release (RIP jcenter)
  • Improved DOI cleaning
  • Speed improvement (around +10%), by factoring out some layout token manipulation
  • Update CrossRef requests implementation to align with the current usage of CrossRef's X-Rate-Limit-Limit response header
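
As a rough illustration of the CrossRef item above, the sketch below shows one way a client could derive an allowed request rate from CrossRef's rate-limit response headers. It is not GROBID's implementation. The function name `requests_per_second` is hypothetical, and the companion `X-Rate-Limit-Interval` header (a value such as `1s`) is an assumption that these notes do not mention.

```python
def requests_per_second(headers):
    """Return the allowed requests per second, or None if unavailable.

    `headers` is a plain mapping of response header names to string
    values. Hypothetical helper: assumes X-Rate-Limit-Limit holds an
    integer count and X-Rate-Limit-Interval holds seconds, e.g. "1s".
    """
    # HTTP header names are case-insensitive, so normalize the keys.
    lowered = {name.lower(): value for name, value in headers.items()}
    limit = lowered.get("x-rate-limit-limit")
    interval = lowered.get("x-rate-limit-interval")
    if limit is None or interval is None:
        return None
    try:
        count = int(limit.strip())
        seconds = float(interval.strip().rstrip("s"))
    except ValueError:
        # Unexpected format: let the caller fall back to a default rate.
        return None
    if seconds <= 0:
        return None
    return count / seconds
```

A caller could use the returned value to size a throttle and fall back to a conservative default when the function returns `None`.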

Fixed

  • Fix base url in demo console
  • Add missing pdfalto Graphics information when -noImage is used, fix graphics data path in TEI
  • Fix the tendency to merge tables when they are in close proximity