NuExtract 1.5
Structured extraction - where an LLM helps turn unstructured text (or image content) into structured data - remains one of the most directly useful applications of LLMs. NuExtract is a …
·simonwillison.net·
Docling
MIT licensed document extraction Python library from the Deep Search team at IBM, who released [Docling v2](https://ds4sd.github.io/docling/v2/#changes-in-docling-v2) on October 16th. Here's the [Docling Technical Report](https://arxiv.org/abs/2408.09869) paper from August, which provides …
·simonwillison.net·
files-to-prompt 0.4
New release of my [files-to-prompt tool](https://simonwillison.net/2024/Apr/8/files-to-prompt/) adding an option for filtering just for files with a specific extension. The following command will output Claude XML-style markup for all Python and …
·simonwillison.net·
microsoft/table-transformer: Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
·github.com·