Found 14 bookmarks
Trying out llama.cpp’s new vision support
The pull request "llama.cpp server vision support via libmtmd" (via Hacker News) was merged earlier today. It finally adds full support for vision models to the excellent llama.cpp project. It’s documented …
·simonwillison.net·
Private Local LlamaOCR with a User-Friendly Streamlit Front-End
Optical Character Recognition (OCR) is a powerful tool for extracting text from images, and with the rise of multimodal AI models, it's now easier than ever to implement locally. In this guide, we'll show you how to build a professional OCR application using Llama 3.2-Vision, Ollama for the backend, and Streamlit for the front end. Prerequisites: Before we start, ensure you have the following: 1. Python 3.10 or higher installed. 2. Anaconda (optional). 3. Ollama installed for local model hosting. Downl…
·gpt-labs.ai·
Unlock Gemma 3's Multi Image Magic
Ever wondered how AI could turn your life into a documentary? Watch as I create a seemingly professional documentary about myself in minutes using Gemma 3, Ollama, and ElevenLabs, with no film crew needed.
In this video, you'll learn:
• How to use Gemma 3's multimodal capabilities with multiple images
• Building a simple CLI app with Deno/TypeScript for image processing
• Working with n8n workflows for AI integration
• Creating convincing AI-generated narratives with Ollama
• Complete workflow from capture to final video production
Timestamps:
00:00 - Start
00:28 - I'm in a Documentary
01:56 - Gemma3
02:13 - What's new in Gemma3 with Ollama
02:26 - Tell a story with many images
03:54 - Creating the app with Windsurf
04:49 - It's not in x language
05:19 - Let's look at the code
08:32 - The backend in n8n
Tools and resources mentioned:
• Gemma 3 27b
• Ollama (https://ollama.com)
• ElevenLabs (https://try.elevenlabs.io/tvlst)
• n8n
• Deno/TypeScript
All source code: https://github.com/technovangelist/videoprojects
·youtube.com·
olmOCR - The Open OCR System
In this video, I look at olmOCR, the open OCR system from Allen AI.
Colab: https://dripl.ink/HpaK4
Blog: https://olmocr.allenai.org/blog
macOS version: https://jonathansoma.com/words/olmocr-on-macos-with-lm-studio.html
GitHub: https://github.com/samwit/llm-tutorials
Timestamps:
00:00 Intro
00:31 Allen AI Blog
01:20 olmOCR Blog
02:08 olmOCR Hugging Face
04:52 olmOCR GitHub
05:41 Demo
05:59 Running olmOCR on macOS with LM Studio
·youtube.com·