Why LLMs still have problems with OCR | Hacker News
A lot of problems jump out at me with this article, particularly with the explanation of multi-modal LLMs. I'll say that I _do_ agree with the thrust of the article. Don't trust LLMs. But they probably should have argued legitimate issues with VLM-based OCR, rather than trying to argue that VLMs are somehow fundamentally flawed.

> LLMs process images through high-dimensional embeddings, essentially creating abstract representations that prioritize semantic understanding over precise character recognition.

This isn't true. CLIP and its derivatives don't prioritize semantic understanding. They are trained contrastively, which (very roughly speaking) means they need to be able to differentiate similar images. If two images are just white with a few words, the only way to differentiate them is to include the text in the embedding. Pretrained CLIP models do tend to be a bit lossy in this department, but not by as much as you would think, considering they boil an entire image down to something on the order of 768 floats.

> Each step in this pipeline optimizes for semantic meaning while discarding precise visual information.

Again, that... doesn't make any sense. It's a bit foolhardy to even say _what_ the models do, given that not even the most brilliant ML researchers know. But the broad _hypothesis_ is that the CLIP pipeline optimizes for pairing images with captions among a large number of possibilities. That, again, requires surfacing all kinds of information from the image, and oftentimes requires surfacing specific text from the image. How else would it differentiate PowerPoint slides? Math problems in images? Etc.

> Fixed patch sizes may split individual characters

This doesn't matter; we know from empirical evidence. But even if it _did_, there are plenty of vision models that use overlapping patches.

> Position embeddings lose fine-grained spatial relationships

This isn't true. The model is fully aware of the position of pixels within patches, and the position embedding merely tells it the position of the patches themselves within the image. Therefore it can derive the absolute position of every pixel if it needs to. In fact, we have proof that they can and do.

> losing the ability to have human-in-the-loop evaluations, confidence scores, and bounding box outputs.

You get confidence scores for free, because the model is explicitly trained to provide cosine similarity scores. OWLv2 is a CLIP-based open-vocabulary bounding-box model (from Google, makers of Gemini). It's finetuned from a standard, pretrained CLIP model; nothing is really special about the vision architecture, it just gets finetuned to output bounding boxes. And it beats the pants off YOLO while being open-vocabulary to boot. So not only are CLIP-like models capable of outputting bounding boxes, but OWLv2 was trained with human-in-the-loop processes and outputs confidence scores. Oh, and there's Florence, which is a VLM trained on bounding boxes.

> Favor common words over exact transcription

Nothing about LLMs indicates that. In fact, pretrained LLMs favor exact transcription.

> "Correct" perceived errors in the source document

Which OCR systems need to do to be useful for many applications. I get the argument that LLMs are a black box in this regard, which is a legitimate criticism, but correcting mistakes is not fundamentally the issue. It's better to say that LLMs _blindly_ correct issues, whereas a traditional OCR system can report "this is my exact transcription, I corrected it to this" and expose various knobs to tweak thresholds. But there's no reason VLMs can't do that too.

> Merge or reorder information based on learned patterns

LLMs are perfectly capable of regurgitating data verbatim. That's perhaps the first thing they learn to do to get loss down, and it's what all long-context models are benchmarked against.

> Produce different outputs for the same input due to sampling

You can turn off sampling, and then they are deterministic. Or you can surface the logits to the user, which effectively gives you confidence scores on the transcription. And a well-trained LLM for this task isn't really "probabilistic" in the sense that its outputs are completely different each time. If it's trained and prompted specifically to transcribe a document, that's what it's going to do. Any variations in output at that point are a result of real vagaries in the document, the vision, or the user request. If a user wants consistency, they merely need to ask for it, or the VLM needs to be trained better. In either case, these models are _capable_ of it.

It's most important to note here that, outside of pretrained LLMs, all LLMs that users interact with are reinforcement-trained. So while they were trained on next-token prediction during _pretraining_, they get trained to seek reward in production. That vastly trims the logits and focuses the model explicitly on performing tasks. Well trained, produc
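To make the cosine-similarity point concrete, here is a minimal sketch of scoring an image against candidate transcriptions with a pretrained CLIP model via Hugging Face transformers. The checkpoint name, file name, and candidate strings are illustrative assumptions, not anything from the article or the comment; the point is that the image-text similarity CLIP is trained to produce doubles as a confidence score over competing readings.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint and file name; swap in whatever you actually use.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("slide.png")  # e.g. a mostly-white slide containing a few words
candidates = [
    "a slide that reads 'Quarterly Revenue 2024'",
    "a slide that reads 'Quarterly Reverie 2024'",
]

inputs = processor(text=candidates, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# logits_per_image holds scaled cosine similarities between the image embedding
# and each text embedding; softmax turns them into relative confidences.
probs = out.logits_per_image.softmax(dim=-1)
for text, p in zip(candidates, probs[0].tolist()):
    print(f"{p:.3f}  {text}")
```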
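The sampling and logit points can be sketched the same way: with do_sample=False, generation is greedy and repeatable, and the per-step scores that generate can return convert into per-token probabilities. The checkpoint and prompt below are placeholder assumptions; a real OCR setup would use a vision-language model with an image input rather than a text-only prompt.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder text-only model; a VLM would be used for actual image transcription.
name = "Qwen/Qwen2.5-0.5B-Instruct"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = "Repeat the following text exactly: Invoice #10423, total $1,284.00\n"
inputs = tok(prompt, return_tensors="pt")

# do_sample=False means greedy decoding: the same input always yields the same output.
out = model.generate(
    **inputs,
    do_sample=False,
    max_new_tokens=32,
    output_scores=True,
    return_dict_in_generate=True,
)

# Per-token log-probabilities derived from the logits act as confidence scores
# on the transcription.
scores = model.compute_transition_scores(out.sequences, out.scores, normalize_logits=True)
new_tokens = out.sequences[0, inputs["input_ids"].shape[1]:]
for tok_id, logprob in zip(new_tokens, scores[0]):
    print(f"{tok.decode(int(tok_id))!r}: p={logprob.exp().item():.3f}")
```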
·news.ycombinator.com·
S1: The $6 R1 Competitor?
Tim Kellogg shares his notes on a new paper, [s1: Simple test-time scaling](https://arxiv.org/abs/2501.19393), which describes an inference-scaling model fine-tuned on top of Qwen2.5-32B-Instruct for just $6 - the cost for …
·simonwillison.net·
4. The Ollama Course - Using the CLI
Welcome back to the Ollama course! In this video, we dive deep into the command line interface (CLI) of Ollama, exploring all the powerful options and comman...
·youtube.com·
AI Hallucinations: Why Large Language Models Make Things Up (And How to Fix It) - kapa.ai - Instant AI answers to technical questions
Kapa.ai turns your knowledge base into a reliable and production-ready LLM-powered AI assistant that answers technical questions instantly. Trusted by 100+ startups and enterprises incl. OpenAI, Docker, Mapbox, Mixpanel and NextJS.
·kapa.ai·
You Should Probably Pay Attention to Tokenizers
Last week I was helping a friend of mine to get one of his new apps off the ground. I can’t speak much about it at the moment, other than like most apps nowadays it has some AI sprinkled over …
·cybernetist.com·
txtai
txtai is an all-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
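A minimal sketch of what that looks like in practice, assuming a recent txtai release; the model name and example documents are illustrative choices, not from the project blurb:

```python
from txtai import Embeddings

# Build an embeddings index; the model choice here is an illustrative default.
embeddings = Embeddings(path="sentence-transformers/all-MiniLM-L6-v2", content=True)
embeddings.index([
    "US tops 5 million confirmed virus cases",
    "Canada's last fully intact ice shelf has suddenly collapsed",
])

# Semantic search matches on meaning rather than exact keywords.
print(embeddings.search("natural disasters in the Arctic", limit=1))
```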
·neuml.github.io·
Florence 2 - The Best Small VLM Out There?
There is a new VLM on the scene and it comes with a dataset of 5 billion labels. The new model can do a variety of old-world tasks like bounding boxes and segmentation along with newer LLM-style captioning etc.

Paper: https://arxiv.org/pdf/2311.06242
HF Spaces Demo: https://huggingface.co/spaces/gokaygokay/Florence-2
Colab: https://drp.li/fGyMm

🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: https://drp.li/dIMes

👨‍💻 Github:
https://github.com/samwit/langchain-tutorials (updated)
https://github.com/samwit/llm-tutorials

⏱️ Time Stamps:
00:00 Intro
00:13 Florence-2 Paper
02:19 Florence-2 Architecture
03:20 Florence-2 Detailed Image Captioning
03:41 Florence-2 Visual Grounding
04:09 Florence-2 Dense Region Caption
04:24 Florence-2 Open Vocab Detection
06:01 Hugging Face Spaces Demo
10:41 Colab Florence-2 Large Sample Usage
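A hedged sketch of running Florence-2 for one of those tasks (object detection) via transformers, following the pattern from the model card; the checkpoint, image URL, and task prompt are assumptions chosen for illustration, and trust_remote_code=True is needed because the model ships custom code:

```python
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Illustrative checkpoint; Florence-2 loads custom modeling code from the Hub.
name = "microsoft/Florence-2-base"
model = AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(name, trust_remote_code=True)

# Illustrative test image.
image = Image.open(requests.get(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg",
    stream=True,
).raw)

# Task prompts select the behaviour: "<OD>" for boxes, "<CAPTION>" or
# "<DETAILED_CAPTION>" for captioning, "<OCR>" for text transcription.
task = "<OD>"
inputs = processor(text=task, images=image, return_tensors="pt")

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3,
)
text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

# The processor converts the raw output string into labels and bounding boxes.
result = processor.post_process_generation(text, task=task, image_size=(image.width, image.height))
print(result)
```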
·youtube.com·