fx
LlamaOCR - Building your Own Private OCR System - YouTube
The video demonstrates LlamaOCR, an OCR tool leveraging the Llama 3.2 visual model. It focuses on the tool's ability to convert images and scanned documents into structured Markdown, preserving the original formatting of elements like tables, lists, and spreadsheets. The video covers practical usage examples, offering tutorials and code snippets in both JavaScript and Python within a Colab environment.
For more tutorials on using LLMs and building agents, check out my Patreon
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://twitter.com/Sam_Witteveen
Colab: https://drp.li/WpdNm
🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: https://drp.li/dIMes
⏱️Time Stamps:
00:00 LlamaOCR Project
00:56 Demo Using their Site
02:43 Colab Demo
04:40 Together.AI Docs
06:06 Pricing
09:16 Python OCR Version
11:20 Thai OCR Project
16:30 Patreon
GitHub - DocumindHQ/documind: Open-source platform for extracting structured data from documents using AI.
Open-source platform for extracting structured data from documents using AI. - DocumindHQ/documind
Recraft V3
Recraft are a generative AI design tool startup based out of London who released their v3 model a few weeks ago. It's currently sat at the top of the [Artificial …
Ollama: Llama 3.2 Vision
Ollama released version 0.4 [last week](https://github.com/ollama/ollama/releases/tag/v0.4.0) with support for Meta's first Llama vision model, [Llama 3.2](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/). If you have Ollama installed you can fetch the 11B model (7.9 GB) like …
Infinite AI Artboard - Recraft
Premium image generation and editing tool. Store and share your own styles, create, fine-tune, upscale, and perfect your visuals.
microsoft/OmniParser · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
A return to hand-written notes by learning to read & write
What kind of music is this?
"This collection appears to be primarily alternati…" Go see Molmo's answer!
What I’ve Learned in the Past Year Spent Building an AI Video Editor - Make Art with Python
Lessons from An Unexpected Year in AI
No one’s ready for this
The photograph is now meaningless as evidence. We are not prepared.
Black Forest Labs - Frontier AI Lab
Amazing AI models from the Black Forest.
KwaiVGI/LivePortrait
Make one portrait alive!
Dragonfly: A large vision-language model with multi-resolution zoom
GitHub - timpaul/form-extractor-prototype
This tool extracts the structure from an image of a form.
It uses the Claude 3 LLM model by Anthropic.
A single extraction of an A4 form page costs about 10p.
It replicates the form structure in JSON, following the schema used by GOV.UK Forms.
It then uses that to generate a multi-page web form in the GOV.UK style.
Sora: first impressions
We have gained valuable feedback from the creative community, helping us to improve our model.
Lummi: Free AI-Generated Stock Photos & Royalty-Free Images
Discover Lummi: Photos Sans Photographer. The ultimate source for free AI-generated stock photos and royalty-free images. Perfect for designers, marketers, and creatives. No photographers, just algorithms like OpenAI's Dall-e and Midjourney. Browse our extensive library today
EMO
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Image Annotation with LLava & Ollama
LLava 1.6 models - https://huggingface.co/liuhaotian Code for this vid - https://github.com/samwit/ollama-tutorials/blob/main/ollama_python_lib/ollama_scshot...
Apple releases ‘MGIE’, a revolutionary AI model for instruction-based image editing
Apple’s MGIE is a revolutionary AI model that can edit images based on natural language instructions, using multimodal large language models to generate expressive and imaginative edits.
Noiselith
The easiest local image generate tool
Animate Anyone
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
Visual Anagrams
Optical illusions zero-shot from diffusion models.
DALL·E image → GPT4 Vision → repeat | DALL·E Party
DALL·E image → GPT4 Vision → repeat
$1 Recognizer
replicate/latent-consistency-model: Run Latent Consistency Models on your Mac
Run Latent Consistency Models on your Mac. Contribute to replicate/latent-consistency-model development by creating an account on GitHub.
Generate images in one second on your Mac using a latent consistency model
How to run a latent consistency model on your M1 or M2 Mac
CoTracker: It is Better to Track Together
A quote from Sam Bleckley
If you visit (often NSFW, beware!) showcases of generated images like civitai, where you can see and compare them to the text prompts used in their creation, you’ll find they’re …
GitHub - varunshenoy/opendream: An extensible, easy-to-use, and portable diffusion web UI 👨🎨
An extensible, easy-to-use, and portable diffusion web UI 👨🎨 - GitHub - varunshenoy/opendream: An extensible, easy-to-use, and portable diffusion web UI 👨🎨