From shiny object to sober reality: The vector database story, two years later
RetrievalTutorials/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb at main · FullStackRetrieval-com/RetrievalTutorials
Evaluating Chunking Strategies for Retrieval | Chroma Research
Vector Search RAG Tutorial – Combine Your Data with LLMs with Advanced Search
Learn how to use vector search and embeddings to easily combine your data with large language models like GPT-4. You will first learn the concepts and then create three projects.
✏️ Course developed by Beau Carnes.
💻 Code: https://github.com/beaucarnes/vector-search-tutorial
🔗 Access MongoDB Atlas: https://cloud.mongodb.com/
🏗️ MongoDB provided a grant to make this course possible.
⭐️ Contents ⭐️
⌨️ (00:00) Introduction
⌨️ (01:18) What are vector embeddings?
⌨️ (02:39) What is vector search?
⌨️ (03:40) MongoDB Atlas vector search
⌨️ (04:30) Project 1: Semantic search for movie database
⌨️ (32:55) Project 2: RAG with Atlas Vector Search, LangChain, OpenAI
⌨️ (54:36) Project 3: Chatbot connected to your documentation
An Intro to RAG with sqlite-vec & llamafile!
A brief introduction to using llamafile (a single-file tool for working with large language models) and sqlite-vec (a SQLite extension for vector search) to build a Retrieval-Augmented Generation (RAG) application.
This was a live online event hosted on Dec 17th 2024 in the Mozilla AI Discord. Join us for the next event at https://discord.gg/Ve7WeCJFXk
LINKS:
- Doc w/ links to all mentioned projects/blog posts: https://docs.google.com/document/d/17GYLzlGUyJF9EDeaa1P-dFFZnkwxATnBcg5KnNgpvPE/edit?usp=sharing
- Slides: https://docs.google.com/presentation/d/14Szda-VnZzepL-1U9Nb7sXQg_TTf56OQ-KtUIMQ5xug/edit?usp=sharing
Qwen 3 Embeddings & Rerankers
In this video I look at Qwen's newly released Embedding and Reranking models, which are state of the art and, most importantly, open weights mo...
asg017/sqlite-vec: A vector search SQLite extension that runs anywhere!
How sqlite-vec Works for Storing and Querying Vector Embeddings
Learn how `sqlite-vec` turns SQLite into a fast, embedded vector search engine. With support for float32, int8, and bit vectors, optimized distance metrics, and native SQL integration, it's ideal for offline AI, semantic search, and lightweight ML apps. This post walks through how it works and why it's surprisingly powerful.
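To make the three element types named above concrete, here is a minimal pure-Python sketch of brute-force nearest-neighbor search over float32, int8, and bit vectors. It is an illustration only and does not use or reproduce the sqlite-vec API: every function name is hypothetical, the int8 mapping assumes values in [-1, 1], and the bit quantization assumes a simple "positive means 1" rule.

```python
import math
import struct

def to_float32_blob(vec):
    # Pack floats as little-endian 32-bit values: 4 bytes per dimension.
    return struct.pack(f"<{len(vec)}f", *vec)

def l2_distance(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def quantize_int8(vec, lo=-1.0, hi=1.0):
    # Assumed scalar quantization: map [lo, hi] linearly onto [-128, 127].
    scale = 255 / (hi - lo)
    return [max(-128, min(127, round((x - lo) * scale) - 128)) for x in vec]

def quantize_bits(vec):
    # Assumed binary quantization: one bit per dimension, set when x > 0.
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming_distance(a, b):
    # Number of differing bits between two bit-packed vectors.
    return bin(a ^ b).count("1")

def knn(query, rows, distance, k=2):
    # Brute-force search: score every (id, vector) row and keep the k closest.
    return sorted(rows, key=lambda row: distance(query, row[1]))[:k]

if __name__ == "__main__":
    rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.9, 0.1])]
    for doc_id, vec in knn([1.0, 0.0], rows, l2_distance):
        print(doc_id, round(l2_distance([1.0, 0.0], vec), 3))
```

The trade-off the sketch shows is storage against precision: a float32 vector costs 4 bytes per dimension, an int8 vector 1 byte, and a bit vector 1 bit, while the distance computed on the smaller forms is only an approximation of the full-precision one.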
Finding the Best Open-Source Embedding Model for RAG
Looking for the best open-source embedding model for your RAG app? We share a comparison workflow so you can stop paying the OpenAI tax.
How to improve the local LLM connected to Zotero for stunning results. So easy even I can do it.
Learn how to make simple changes that help your LLM chat with Zotero like a pro! I’m getting well-written, well-cited results from a 2B-parameter LLM.
Embedding result testing: https://docs.google.com/spreadsheets/d/1P3rOLEO_NtCUYxaFIVaVZfMv4BOkQb3w/edit?usp=sharing&ouid=111617079417577058774&rtpof=true&sd=true
Granite 3.1 Dense is my favorite LLM for this setup right now. It's available in 2b and 8b versions for ollama - https://ollama.com/library/granite3.1-dense:2b
Snowflake Arctic Embed 2 has performed well for me so far as an embedding model: https://ollama.com/library/snowflake-arctic-embed2
MTEB leaderboard to see what embedding models perform well at different tasks: https://huggingface.co/spaces/mteb/leaderboard
How to connect a LLM to Zotero - https://youtu.be/b2BSZfOtD_w
0:15 Knowledge
0:59 Help make this better
1:32 Modify ‘knowledge’ settings
5:46 Demo of results
7:22 Top K
11:25 Testing Different embeddings
13:25 Use # not models
14:45 Impatient people (like me!) start here
21:38 Example Results
Introducing Contextual Retrieval
Here's an interesting new embedding/RAG technique, described by Anthropic but it should work for any embedding model against any other LLM. One of the big challenges in implementing semantic search …