Search AI/ML

Found 66 bookmarks

·labs.google·Jan 3, 2025

LlamaOCR - Building your Own Private OCR System - YouTube

The video demonstrates LlamaOCR, an OCR tool built on the Llama 3.2 vision model. It focuses on the tool's ability to convert images and scanned documents into structured Markdown, preserving the original formatting of elements like tables, lists, and spreadsheets. The video covers practical usage examples, with tutorials and code snippets in both JavaScript and Python in a Colab environment.

Links: Patreon: https://www.patreon.com/SamWitteveen · Twitter: https://twitter.com/Sam_Witteveen · Colab: https://drp.li/WpdNm · Building LLM Agents Form: https://drp.li/dIMes

Time stamps: 00:00 LlamaOCR Project · 00:56 Demo Using their Site · 02:43 Colab Demo · 04:40 Together.AI Docs · 06:06 Pricing · 09:16 Python OCR Version · 11:20 Thai OCR Project · 16:30 Patreon

#vision #image #OCR

·youtube.com·Nov 19, 2024

GitHub - DocumindHQ/documind: Open-source platform for extracting structured data from documents using AI.

Open-source platform for extracting structured data from documents using AI.

#data science #text sanitization #image

·github.com·Nov 19, 2024

Recraft V3

Recraft are a generative AI design tool startup based out of London who released their v3 model a few weeks ago. It's currently sat at the top of the [Artificial …

#image

·simonwillison.net·Nov 15, 2024

Ollama: Llama 3.2 Vision

Ollama released version 0.4 [last week](https://github.com/ollama/ollama/releases/tag/v0.4.0) with support for Meta's first Llama vision model, [Llama 3.2](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/). If you have Ollama installed you can fetch the 11B model (7.9 GB) like …

#local model #cli #image

·simonwillison.net·Nov 13, 2024

Infinite AI Artboard - Recraft

Premium image generation and editing tool. Store and share your own styles, create, fine-tune, upscale, and perfect your visuals.

#art #image

·recraft.ai·Nov 2, 2024

microsoft/OmniParser · Hugging Face

#image

·huggingface.co·Nov 2, 2024

A return to hand-written notes by learning to read & write

#image

·research.google·Oct 29, 2024

What kind of music is this?

"This collection appears to be primarily alternati…" Go see Molmo's answer!

#vision #image

·molmo.allenai.org·Sep 27, 2024

What I’ve Learned in the Past Year Spent Building an AI Video Editor - Make Art with Python

Lessons from An Unexpected Year in AI

#video #image #art

·makeartwithpython.com·Sep 25, 2024

No one’s ready for this

The photograph is now meaningless as evidence. We are not prepared.

#image #photo #safety

·theverge.com·Aug 23, 2024

Black Forest Labs - Frontier AI Lab

Amazing AI models from the Black Forest.

#image

·blackforestlabs.ai·Aug 11, 2024

KwaiVGI/LivePortrait

Make one portrait alive!

#image #vision #animation

·github.com·Jul 9, 2024

Dragonfly: A large vision-language model with multi-resolution zoom

#vision #image

·together.ai·Jun 7, 2024

GitHub - timpaul/form-extractor-prototype

This tool extracts the structure from an image of a form.

It uses the Claude 3 LLM by Anthropic.

A single extraction of an A4 form page costs about 10p.

It replicates the form structure in JSON, following the schema used by GOV.UK Forms.

It then uses that to generate a multi-page web form in the GOV.UK style.

#image

·github.com·Apr 25, 2024

Sora: first impressions

We have gained valuable feedback from the creative community, helping us to improve our model.

#image #video #animation

·openai.com·Mar 30, 2024

Lummi: Free AI-Generated Stock Photos & Royalty-Free Images

Discover Lummi: Photos Sans Photographer. The ultimate source for free AI-generated stock photos and royalty-free images. Perfect for designers, marketers, and creatives. No photographers, just algorithms like OpenAI's DALL·E and Midjourney. Browse our extensive library today.

#photo #image

·lummi.ai·Mar 27, 2024

EMO

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

#image #video #audio

·humanaigc.github.io·Mar 2, 2024

Image Annotation with LLava & Ollama

LLava 1.6 models - https://huggingface.co/liuhaotian Code for this vid - https://github.com/samwit/ollama-tutorials/blob/main/ollama_python_lib/ollama_scshot...

#image

·youtube.com·Feb 14, 2024

Apple releases ‘MGIE’, a revolutionary AI model for instruction-based image editing

Apple’s MGIE is a revolutionary AI model that can edit images based on natural language instructions, using multimodal large language models to generate expressive and imaginative edits.

#image

·venturebeat.com·Feb 7, 2024

Noiselith

The easiest local image generation tool

#image

·noiselith.com·Dec 2, 2023

Animate Anyone

Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

#image #video

·humanaigc.github.io·Dec 2, 2023

Visual Anagrams

Optical illusions zero-shot from diffusion models.

#image #art

·dangeng.github.io·Dec 2, 2023

DALL·E image → GPT4 Vision → repeat | DALL·E Party

DALL·E image → GPT4 Vision → repeat

#image

·dalle.party·Nov 29, 2023

$1 Recognizer

#image #vision

·depts.washington.edu·Nov 14, 2023

replicate/latent-consistency-model: Run Latent Consistency Models on your Mac

Run Latent Consistency Models on your Mac.

#mac #m1 #image

·github.com·Oct 28, 2023

Generate images in one second on your Mac using a latent consistency model

How to run a latent consistency model on your M1 or M2 Mac

#image #mac #m1

·replicate.com·Oct 28, 2023

CoTracker: It is Better to Track Together

#video #image

·co-tracker.github.io·Sep 1, 2023

A quote from Sam Bleckley

If you visit (often NSFW, beware!) showcases of generated images like civitai, where you can see and compare them to the text prompts used in their creation, you’ll find they’re …

#image

·simonwillison.net·Aug 21, 2023

GitHub - varunshenoy/opendream: An extensible, easy-to-use, and portable diffusion web UI 👨‍🎨

An extensible, easy-to-use, and portable diffusion web UI 👨‍🎨

#image

·github.com·Aug 20, 2023
