Found 7 bookmarks

GitHub - ivanfioravanti/qwen-image-mps: Qwen Image models through MPS

Qwen Image models through MPS.

#image #local model #macos #cli

·github.com·Nov 26, 2025

qwen-image-mps

Ivan Fioravanti built this Python CLI script for running the Qwen/Qwen-Image image generation model on an Apple silicon Mac, optionally using the Qwen-Image-Lightning LoRA to dramatically speed up generation. Ivan …

#image #local model #macos #cli

·simonwillison.net·Aug 11, 2025

Introducing Gemma 3n: The developer guide

Extremely consequential new open weights model release from Google today: Multimodal by design: Gemma 3n natively supports image, audio, video, and text inputs and text outputs. Optimized for on-device: Engineered …

#local model #vision #image #audio #video #text

·simonwillison.net·Jun 27, 2025

Passing Images to a Vision-Language Model in Ollama | by Manyi | Apr,…

https://medium.com/@manyi.yim/passing-images-to-a-vlm-in-ollama-a8c16bad9fea

#vision #local model #image

·archive.ph·Jun 9, 2025

Trying out llama.cpp’s new vision support

This llama.cpp server vision support via libmtmd pull request—via Hacker News—was merged earlier today. The PR finally adds full support for vision models to the excellent llama.cpp project. It’s documented …

#vision #image #local model

·simonwillison.net·Jun 9, 2025

OpenBMB/MiniCPM-o: MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone - OpenBMB/MiniCPM-o

#vision #image #local model

·github.com·Feb 1, 2025

Ollama: Llama 3.2 Vision

Ollama released version 0.4 [last week](https://github.com/ollama/ollama/releases/tag/v0.4.0) with support for Meta's first Llama vision model, [Llama 3.2](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/). If you have Ollama installed you can fetch the 11B model (7.9 GB) like …

#local model #cli #image

·simonwillison.net·Nov 13, 2024
