Unstructured - Unstructured
NuExtract 1.5
Structured extraction - where an LLM helps turn unstructured text (or image content) into structured data - remains one of the most directly useful applications of LLMs. NuExtract is a …
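A common pattern for structured extraction is to give the model a JSON template and ask it to fill in the values. Because model output can drift from the requested shape, it helps to check the result before using it. The snippet below is a minimal, hypothetical sketch of such a check. It is not NuExtract's code. The template convention it assumes is: a dict needs exactly the listed keys, a one-element list means "a list of items shaped like this", and an empty-string leaf accepts a string or `null`.

```python
import json
from typing import Any


def conforms(template: Any, output: Any) -> bool:
    """Return True if `output` has the same shape as `template`.

    Conventions assumed here (illustrative only):
    - dict template: output must have exactly the same keys, each conforming;
    - list template: output must be a list whose items match template[0];
    - any other leaf: output must be a string or None (field not found).
    """
    if isinstance(template, dict):
        if not isinstance(output, dict) or set(output) != set(template):
            return False
        return all(conforms(template[key], output[key]) for key in template)
    if isinstance(template, list):
        if not isinstance(output, list):
            return False
        if not template:  # empty list template: accept any list
            return True
        return all(conforms(template[0], item) for item in output)
    # Leaf field: accept a string value or an explicit null.
    return output is None or isinstance(output, str)


# Hypothetical template and model replies, for illustration.
template = {"name": "", "products": [{"title": "", "price": ""}]}
good = json.loads('{"name": "Acme", "products": [{"title": "Widget", "price": "9.99"}]}')
bad = json.loads('{"name": "Acme"}')  # missing the "products" key
```

Here `conforms(template, good)` is true and `conforms(template, bad)` is false. A failed check is a natural point to re-prompt the model or flag the record for review.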
AI-Powered Content Audits for Local News
How to responsibly use AI to help with understanding your coverage
GitHub - DocumindHQ/documind: Open-source platform for extracting structured data from documents using AI.
Open-source platform for extracting structured data from documents using AI. - DocumindHQ/documind
Home - Docling
Docling
MIT-licensed document extraction Python library from the Deep Search team at IBM, who released [Docling v2](https://ds4sd.github.io/docling/v2/#changes-in-docling-v2) on October 16th. Here's the [Docling Technical Report](https://arxiv.org/abs/2408.09869) paper from August, which provides …
Run a prompt to generate and execute jq programs using llm-jq
llm-jq is a brand-new plugin for LLM which lets you pipe JSON directly into the llm jq command along with a human-language description of how you’d like to manipulate …
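The general "generate then execute" idea behind a tool like this can be sketched in a few steps. First, build a prompt from the description plus a truncated sample of the input JSON. Next, strip any code fence from the model's reply. Finally, run the resulting program with the `jq` binary. This is a hypothetical illustration, not llm-jq's implementation. The model call is left out, and the names `build_jq_prompt`, `extract_program` and `run_jq`, the prompt wording, and the 1024-character sample cutoff are all assumptions.

```python
import shutil
import subprocess

FENCE = "`" * 3  # a Markdown code fence, built indirectly to keep this block tidy


def build_jq_prompt(description: str, json_text: str, sample_length: int = 1024) -> str:
    """Build a prompt asking a model for a jq program (hypothetical wording).

    Only the first `sample_length` characters of the input are included so a
    large JSON document does not blow up the prompt; the cutoff is an assumption.
    """
    sample = json_text[:sample_length]
    return (
        "Write a jq program for the following task. Return only the program.\n\n"
        f"Task: {description}\n\n"
        f"Example of the input JSON (possibly truncated):\n{sample}\n"
    )


def extract_program(reply: str) -> str:
    """Strip an optional fenced code block wrapper from a model reply."""
    text = reply.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip().startswith(FENCE):
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    return text.strip()


def run_jq(program: str, json_text: str) -> str:
    """Execute the generated program with the jq binary, which must be on PATH."""
    if shutil.which("jq") is None:
        raise RuntimeError("jq is not installed")
    result = subprocess.run(
        ["jq", program], input=json_text, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Sending only a sample keeps the prompt small while the full input goes straight to `jq`. It is also worth showing the generated program to the user before running it.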
files-to-prompt 0.4
New release of my [files-to-prompt tool](https://simonwillison.net/2024/Apr/8/files-to-prompt/) adding an option for filtering just for files with a specific extension. The following command will output Claude XML-style markup for all Python and …
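The next example is a rough Python sketch of extension filtering combined with Claude XML-style output. It is not the tool's code. It assumes the `<documents>` / `<document index="…">` / `<source>` / `<document_content>` layout that Anthropic's prompting guidance suggests for multiple documents, and it inserts file contents verbatim with no escaping. `files_to_cxml` and `demo` are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def files_to_cxml(root: str, extensions: Iterable[str]) -> str:
    """Concatenate matching files under `root` into Claude XML-style markup.

    `extensions` may be given with or without a leading dot, e.g. ["py", "md"].
    Paths are visited in sorted order so the output is deterministic.
    """
    wanted = {"." + ext.lstrip(".") for ext in extensions}
    parts = ["<documents>"]
    index = 1
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in wanted:
            continue  # skip directories and non-matching extensions
        parts.append(f'<document index="{index}">')
        parts.append(f"<source>{path.as_posix()}</source>")
        parts.append("<document_content>")
        parts.append(path.read_text(encoding="utf-8"))  # inserted verbatim
        parts.append("</document_content>")
        parts.append("</document>")
        index += 1
    parts.append("</documents>")
    return "\n".join(parts)


def demo() -> str:
    """Build a throwaway tree and render only its .py and .md files."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.py").write_text("print(1)\n", encoding="utf-8")
        Path(tmp, "b.md").write_text("# Notes\n", encoding="utf-8")
        Path(tmp, "c.txt").write_text("IGNORED\n", encoding="utf-8")
        return files_to_cxml(tmp, ["py", "md"])
```

Numbering each document and recording its source path lets a model cite which file a statement came from.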
VikParuchuri/marker: Convert PDF to markdown quickly with high accuracy
microsoft/table-transformer: Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.