Search AI/ML

Found 14 bookmarks

Testing VLMs and LLMs for robotics w/ the Jetson Thor devkit

Exploring the Jetson Thor devkit w/ some local LLMs and VLMs. More info on the Jetson Thor Devkit: https://nvda.ws/45xIU4B Neural Networks from Scratch book: h...

#vision #local model #hardware

·youtube.com·Sep 3, 2025

Introducing Gemma 3n: The developer guide

Learn how to build with Gemma 3n, a mobile-first architecture, MatFormer technology, Per-Layer Embeddings, and new audio and vision encoders.

#audio #vision #text #local model

·developers.googleblog.com·Jun 27, 2025

Introducing Gemma 3n: The developer guide

Extremely consequential new open weights model release from Google today: Multimodal by design: Gemma 3n natively supports image, audio, video, and text inputs and text outputs. Optimized for on-device: Engineered …

#local model #vision #image #audio #video #text

·simonwillison.net·Jun 27, 2025

Introducing the unified multi-modal MLX engine architecture in LM Studio

Leveraging `mlx-lm` and `mlx-vlm` to achieve unified multi-modal LLM inference in LM Studio's `mlx-engine`.

#local model #macos #vision #m1

·lmstudio.ai·Jun 10, 2025

ollama-ocr

OCR package using Ollama vision language models.

#vision #OCR #local model #tools

·pypi.org·Jun 10, 2025

Passing Images to a Vision-Language Model in Ollama | by Manyi | Apr,…

https://medium.com/@manyi.yim/passing-images-to-a-vlm-in-ollama-a8c16bad9fea

#vision #local model #image

·archive.ph·Jun 9, 2025

Trying out llama.cpp’s new vision support

This llama.cpp server vision support via libmtmd pull request—via Hacker News—was merged earlier today. The PR finally adds full support for vision models to the excellent llama.cpp project. It’s documented …

#vision #image #local model

·simonwillison.net·Jun 9, 2025

The best open source OCR models

#OCR #vision #local model

·getomni.ai·Jun 1, 2025

Private Local LlamaOCR with a User-Friendly Streamlit Front-End

Optical Character Recognition (OCR) is a powerful tool for extracting text from images, and with the rise of multimodal AI models, it's now easier than ever to implement locally. In this guide, we'll show you how to build a professional OCR application using Llama 3.2-Vision, Ollama for the backend, and Streamlit for the front end. Prerequisites: Before we start, ensure you have the following: 1. Python 3.10 or higher installed. 2. Anaconda (optional). 3. Ollama installed for local model hosting. Downl

#OCR #vision #local model

·gpt-labs.ai·Jun 1, 2025

Way Enough - Local VLMs Have Improved

#local model #vision

·danielcorin.com·May 26, 2025

Vision Language Models (Better, faster, stronger)

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

#vision #local model

·huggingface.co·May 13, 2025

Unlock Gemma 3's Multi Image Magic

🎬 Ever wondered how AI could turn your life into a documentary? Watch as I create a seemingly professional documentary about myself in minutes using Gemma 3, Ollama, and ElevenLabs - no film crew needed!
🎯 In this video, you'll learn:
• How to use Gemma 3's multimodal capabilities with multiple images
• Building a simple CLI app with Deno/TypeScript for image processing
• Working with n8n workflows for AI integration
• Creating convincing AI-generated narratives with Ollama
• Complete workflow from capture to final video production
⏱️ Timestamps:
00:00 - Start
00:28 - I'm in a Documentary
01:56 - Gemma3
02:13 - Whats new in Gemma3 with Ollama
02:26 - Tell a story with many images
03:54 - Creating the app with Windsurf
04:49 - It's not in x language
05:19 - Let's look at the code
08:32 - The backend in n8n
🛠️ Tools & Resources Mentioned:
• Gemma 3 27b
• Ollama (https://ollama.com)
• ElevenLabs (https://try.elevenlabs.io/tvlst)
• n8n
• Deno/TypeScript
Want to create your own AI-powered content? Drop a comment below with your ideas or questions! #AIContent #TechTutorial #AIDocumentary
My Links 🔗
👉🏻 Subscribe (free): https://www.youtube.com/technovangelist
👉🏻 Join and Support: https://www.youtube.com/channel/UCHaF9kM2wn8C3CLRwLkC2GQ/join
👉🏻 Newsletter: https://technovangelist.substack.com/subscribe
👉🏻 Twitter: https://www.twitter.com/technovangelist
👉🏻 Discord: https://discord.gg/uS4gJMCRH2
👉🏻 Patreon: https://patreon.com/technovangelist
👉🏻 Instagram: https://www.instagram.com/technovangelist/
👉🏻 Threads: https://www.threads.net/@technovangelist?xmt=AQGzoMzVWwEq8qrkEGV8xEpbZ1FIcTl8Dhx9VpF1bkSBQp4
👉🏻 LinkedIn: https://www.linkedin.com/in/technovangelist/
👉🏻 All Source Code: https://github.com/technovangelist/videoprojects
Want to sponsor this channel? Let me know what your plans are here: https://www.technovangelist.com/sponsor

#vision #local model #code #Automation #tutorial

·youtube.com·May 12, 2025

olmOCR - The Open OCR System

In this video, I look at olmOCR, the Open OCR system from Allen AI.
Colab: https://dripl.ink/HpaK4
Blog: https://olmocr.allenai.org/blog
macOS ver: https://jonathansoma.com/words/olmocr-on-macos-with-lm-studio.html
For more tutorials on using LLMs and building agents, check out my Patreon.
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://x.com/Sam_Witteveen
🕵️ Interested in building LLM Agents? Fill out the form below.
Building LLM Agents Form: https://drp.li/dIMes
👨‍💻 GitHub: https://github.com/samwit/llm-tutorials
⏱️ Time Stamps:
00:00 Intro
00:31 Allen AI Blog
01:20 olmOCR Blog
02:08 olmOCR Hugging Face
04:52 olmOCR GitHub
05:41 Demo
05:59 Running olmOCR on macOS with LM Studio

#OCR #local model #vision

·youtube.com·Mar 2, 2025

OpenBMB/MiniCPM-o: MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone - OpenBMB/MiniCPM-o

#vision #image #local model

·github.com·Feb 1, 2025
