Search AI/ML

Found 11 bookmarks

Unstructured - Unstructured

#data science #text sanitization

·docs.unstructured.io·Jan 3, 2025

NuExtract 1.5

Structured extraction - where an LLM helps turn unstructured text (or image content) into structured data - remains one of the most directly useful applications of LLMs. NuExtract is a …

#text sanitization #RAG

·simonwillison.net·Nov 19, 2024

AI-Powered Content Audits for Local News

How to responsibly use AI to help with understanding your coverage

#text sanitization #text #safety

·generative-ai-newsroom.com·Nov 19, 2024

GitHub - DocumindHQ/documind: Open-source platform for extracting structured data from documents using AI.

Open-source platform for extracting structured data from documents using AI. - DocumindHQ/documind

#data science #text sanitization #image

·github.com·Nov 19, 2024

GitHub - DocumindHQ/documind: Open-source platform for extracting structured data from documents using AI.

Open-source platform for extracting structured data from documents using AI. - DocumindHQ/documind

#text sanitization #text #data science

·github.com·Nov 19, 2024

Home - Docling

#text sanitization #OCR

·ds4sd.github.io·Nov 3, 2024

Docling

MIT licensed document extraction Python library from the Deep Search team at IBM, who released [Docling v2](https://ds4sd.github.io/docling/v2/#changes-in-docling-v2) on October 16th. Here's the [Docling Technical Report](https://arxiv.org/abs/2408.09869) paper from August, which provides …

#text sanitization #text #agent

·simonwillison.net·Nov 3, 2024

Run a prompt to generate and execute jq programs using llm-jq

llm-jq is a brand new plugin for LLM which lets you pipe JSON directly into the llm jq command along with a human-language description of how you’d like to manipulate …

#cli #json #text sanitization

·simonwillison.net·Oct 29, 2024

files-to-prompt 0.4

New release of my [files-to-prompt tool](https://simonwillison.net/2024/Apr/8/files-to-prompt/) adding an option for filtering just for files with a specific extension. The following command will output Claude XML-style markup for all Python and …

#cli #scraping #text sanitization

·simonwillison.net·Oct 17, 2024

VikParuchuri/marker: Convert PDF to markdown quickly with high accuracy

Convert PDF to markdown quickly with high accuracy - VikParuchuri/marker

#text sanitization #pdf #markdown #scraping

·github.com·Dec 1, 2023

microsoft/table-transformer: Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS ev...

#text #text sanitization

·github.com·Apr 29, 2023