I’ve been struggling to articulate this idea, and maybe the problem is that it’s actually kind of simple: once the thought is on paper, there’s really no good reason to unpack a whole case for it.
There is a new VLM on the scene, and it comes with a dataset of 5 billion labels. The new model can handle a variety of traditional vision tasks, such as bounding boxes and segmentation, along with newer LLM-style tasks such as captioning.
Paper: https://arxiv.org/pdf/2311.06242
HF Spaces Demo: https://huggingface.co/spaces/gokaygokay/Florence-2
Colab: https://drp.li/fGyMm
🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: https://drp.li/dIMes
👨‍💻 GitHub:
https://github.com/samwit/langchain-tutorials (updated)
https://github.com/samwit/llm-tutorials
⏱️Time Stamps:
00:00 Intro
00:13 Florence-2 Paper
02:19 Florence-2 Architecture
03:20 Florence-2 Detailed Image Captioning
03:41 Florence-2 Visual Grounding
04:09 Florence-2 Dense Region Caption
04:24 Florence-2 Open Vocab Detection
06:01 Hugging Face Spaces Demo
10:41 Colab Florence-2 Large Sample Usage
Amplified developers, automated development · Customize and optimize each component of your AI dev system · Accelerate your development with Continue · Fit
Delving into ChatGPT usage in academic writing through excess vocabulary
Recent large language models (LLMs) can generate and revise text with human-level performance, and have been widely commercialized in systems like ChatGPT. These models come with clear...
llama.ttf is "a font file which is also a large language model and an inference engine for that model". You can see it kick into action at [8m28s in this …
Building search-based RAG using Claude, Datasette and Val Town
Retrieval Augmented Generation (RAG) is a technique for adding extra “knowledge” to systems built on LLMs, allowing them to answer questions against custom information not included in their training data. …
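As a rough illustration of the idea in that blurb, here is a minimal sketch of search-based RAG: find the documents that best match the question, then paste them into the prompt as context. Everything in it is an assumption made for illustration. The `DOCUMENTS` corpus, the `search` and `build_prompt` helpers, and the term-count scoring are all hypothetical, and this is not the approach from the linked post, which uses Claude, Datasette and Val Town. The step that sends the prompt to a model is left out.

```python
import re
from collections import Counter

# Hypothetical in-memory corpus; a real system would query a search index.
DOCUMENTS = {
    "datasette": "Datasette publishes a database as a searchable website.",
    "valtown": "Val Town hosts small scripts that run in the cloud.",
    "fonts": "A font file describes the glyphs used to draw text.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(question, documents, limit=2):
    """Return up to `limit` document ids, ranked by matching term count."""
    terms = set(tokenize(question))
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]


def build_prompt(question, documents, limit=2):
    """Assemble a prompt that carries the retrieved documents as context."""
    hits = search(question, documents, limit)
    context = "\n\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in hits)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("What does Datasette publish?", DOCUMENTS))
```

The string returned by `build_prompt` is what would be passed to the LLM. Swapping the toy `search` for a real full-text or embedding search changes the retrieval quality but not the overall shape.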
A WIRED investigation shows that the AI-powered search startup Forbes has accused of stealing its content is surreptitiously scraping—and making things up out of thin air.
I gave a talk about accessing Large Language Models from the command-line last week as part of the Mastering LLMs: A Conference For Developers & Data Scientists six week long …
The Encyclopedia Project, or How to Know in the Age of AI
In an age when AI regurgitates the blather of meaningless content, seeking its audience in the attention marketplace, it's a small wonder that it is hard to tell what is really real anymore.
I believe in human creativity and don’t expect (or at all want) the robots to take over any time soon when it comes to making art in any medium. Every art has technical aspects, though, and i…
Earnest chats with objects are not so unusual. Mark “The Bird” Fidrych, the famed Detroit Tiger, used to stand on the pitching mound whispering to the baseball. Forky, the highly animate utensil from Toy Story 4, once posed deep questions about friendship to a ceramic mug. And many of us have made repeated queries of the Magic 8 Ball despite its limited set of randomly generated answers.
Jina AI provide a number of different AI-related platform products, including an excellent [family of embedding models](https://huggingface.co/collections/jinaai/jina-embeddings-v2-65708e3ec4993b8fb968e744), but one of their most instantly useful is Jina Reader, an API for …