Wikipedia:WikiProject AI Cleanup/AI catchphrases - Wikipedia
Introducing Gemma 3n: The developer guide
Learn how to build with Gemma 3n, which features a mobile-first architecture, MatFormer technology, Per-Layer Embeddings, and new audio and vision encoders.
Introducing Gemma 3n: The developer guide
Extremely consequential new open weights model release from Google today: Multimodal by design: Gemma 3n natively supports image, audio, video, and text inputs and text outputs. Optimized for on-device: Engineered …
The Best Embedding Models for Information Retrieval in 2025 | DataStax
Learn how the latest and greatest embedding models stack up against each other, as well as against some open source competition.
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
We are releasing the Qwen3 Embedding series, a new model series in the Qwen model family. These models are designed specifically for text embedding, retrieval, and reranking tasks and are built on the Qwen3 foundation model. Leveraging Qwen3’s robust multilingual text understanding, the series achieves state-of-the-art performance across multiple text embedding and reranking benchmarks. We have open-sourced this series of text embedding and reranking models under the Apache 2.0 license.
Qwen3 Embedding
New family of embedding models from Qwen, in three sizes (0.6B, 4B, and 8B) and two categories (Text Embedding and Text Reranking). The full collection can be browsed on Hugging …
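Embedding and reranking models are commonly combined in two stages: an embedding model retrieves candidates by vector similarity, then a reranker rescores each query-document pair. The sketch below illustrates that pattern under stated assumptions: the vectors are hand-written toy values standing in for embeddings from a model such as Qwen3 Embedding, and `word_overlap` is a hypothetical placeholder for a real reranker's relevance score. No Qwen API is used.

```python
import math
import re
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], doc_vecs: list[list[float]], k: int) -> list[int]:
    """Stage 1: indices of the k documents whose vectors are closest to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)
    return ranked[:k]


def rerank(query: str, docs: list[str], candidates: list[int],
           score_fn: Callable[[str, str], float]) -> list[int]:
    """Stage 2: reorder only the retrieved candidates with a query-document scorer."""
    return sorted(candidates, key=lambda i: score_fn(query, docs[i]), reverse=True)


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def word_overlap(query: str, doc: str) -> float:
    """Hypothetical stand-in for a reranker model: number of shared lowercase words."""
    return float(len(_tokens(query) & _tokens(doc)))


docs = [
    "Gemma 3n runs on phones",
    "Qwen3 embedding models for retrieval",
    "Qwen3 reranker scores query-document pairs",
]
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.2], [0.2, 0.8, 0.4]]  # toy "embeddings"
query, query_vec = "qwen3 reranker", [0.1, 0.9, 0.1]

candidates = retrieve(query_vec, doc_vecs, k=2)  # embedding stage
final = rerank(query, docs, candidates, word_overlap)  # reranking stage
```

With these toy values the embedding stage ranks document 1 ahead of document 2, and the reranker swaps them because document 2 matches both query words. The design keeps the more expensive scorer limited to the k retrieved candidates.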
AI-Powered Content Audits for Local News
How to responsibly use AI to help with understanding your coverage
GitHub - DocumindHQ/documind: Open-source platform for extracting structured data from documents using AI.
Docling
MIT-licensed document extraction Python library from the Deep Search team at IBM, who released [Docling v2](https://ds4sd.github.io/docling/v2/#changes-in-docling-v2) on October 16th. Here's the [Docling Technical Report](https://arxiv.org/abs/2408.09869) paper from August, which provides …
Jaided AI - Distribute the benefits of AI to the world
GitHub - JaidedAI/EasyOCR: Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
socketteer/loom: Multiversal tree writing interface for human-AI collaboration
LangChain gpt-3.5-turbo model reading files: problem
I am making a really simple (just for fun) LangChain project.
The model can read a PDF file, and I can then ask it questions about that specific PDF file.
Everything works fine (this is a working example):
from P...
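A common way to answer questions about a PDF is to split its extracted text into chunks, select the chunks most relevant to the question, and send only those to the model along with the question. The sketch below shows that retrieval step in plain Python under assumptions: the PDF text is already available as a string, relevance is approximated by shared words instead of embeddings, and `split_into_chunks`, `top_chunks`, and `build_prompt` are hypothetical helpers, not LangChain APIs. The resulting prompt would then be sent to a chat model such as gpt-3.5-turbo.

```python
import re


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of chunk_size words; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def _words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def top_chunks(chunks: list[str], question: str, k: int = 3) -> list[str]:
    """Return the k chunks sharing the most words with the question (ties keep document order)."""
    q = _words(question)
    return sorted(chunks, key=lambda c: len(q & _words(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a prompt asking the model to answer from the selected chunks only."""
    joined = "\n---\n".join(context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {question}"
```

Overlapping chunks reduce the chance that a relevant sentence is cut in half at a chunk boundary; swapping `_words` for an embedding similarity would turn this into the vector-store variant.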
microsoft/table-transformer: Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
How to turn Text into Features
A comprehensive guide to using NLP for Machine Learning
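Two of the simplest ways to turn text into numeric features are bag-of-words counts and TF-IDF weights. The sketch below computes both in plain Python; the regex tokenizer is a simplifying assumption, and the unsmoothed formula idf(w) = log(N / df(w)) is one common variant (scikit-learn's `TfidfVectorizer`, for example, applies a smoothed IDF by default).

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(docs: list[str]) -> list[str]:
    """Sorted list of every distinct token across the corpus."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bag_of_words(doc: str, vocab: list[str]) -> list[int]:
    """Raw count of each vocabulary word in the document."""
    counts = Counter(tokenize(doc))
    return [counts[w] for w in vocab]


def tf_idf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Term frequency (count / doc length) times idf = log(N / document frequency)."""
    vocab = build_vocabulary(docs)
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    idf = {w: math.log(n / df[w]) for w in vocab}
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        total = sum(counts.values()) or 1  # avoid dividing by zero on empty documents
        rows.append([counts[w] / total * idf[w] for w in vocab])
    return vocab, rows


docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, matrix = tf_idf(docs)
```

A word that appears in every document, such as "the" here, gets an IDF of zero, so TF-IDF downweights terms that do not help distinguish one document from another.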
Ask Your PDF
Your gateway to dynamic, interactive, and intelligent conversations with any PDF document
Ubisoft Proudly Announces 'AI' Is Helping Write Dialogue
Ubisoft Ghostwriter is described by the company as 'an AI tool'
NLP+CSS 201 Tutorials
Tutorials for advanced natural language processing methods designed for computational social science research.
NER Powered Semantic Search in Python
Semantic search is a compelling technology allowing us to search using abstract concepts and meaning rather than relying on specific words. However, sometimes a simple keyword search can be just as valuable — especially if we know the exact wording of what we're searching for.
Pinecone allows you to pair semantic search with a basic keyword filter. If you know that the document you're looking for contains a specific word or set of words, you simply tell Pinecone to restrict the search to only include documents with those keywords.
We even support keyword filtering on sets of words with AND, OR, and NOT logic.
In this video, we will explore these features through a start-to-finish example of basic keyword search in Pinecone.
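Pairing semantic search with a keyword filter can be sketched without a vector database: drop every record whose keyword set fails the AND/OR/NOT conditions, then rank the survivors by cosine similarity to the query vector. This is a hypothetical in-memory stand-in, not Pinecone's API; the `keywords` sets play the role of the named entities extracted as metadata, and the two-dimensional vectors are toy values in place of sentence-transformer embeddings.

```python
import math
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Record:
    doc_id: str
    vector: list[float]
    keywords: set[str] = field(default_factory=set)  # e.g. named entities found in the text


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def passes_filter(keywords: set[str], all_of: Iterable[str] = (),
                  any_of: Iterable[str] = (), none_of: Iterable[str] = ()) -> bool:
    """AND: every all_of word present; OR: at least one any_of word (if given); NOT: no none_of word."""
    any_set = set(any_of)
    if not set(all_of) <= keywords:
        return False
    if any_set and not (any_set & keywords):
        return False
    return not (set(none_of) & keywords)


def filtered_search(query_vec: list[float], records: list[Record], top_k: int = 3,
                    **conditions: Iterable[str]) -> list[str]:
    """Keep records that pass the keyword filter, then rank them by similarity to the query."""
    hits = [r for r in records if passes_filter(r.keywords, **conditions)]
    hits.sort(key=lambda r: cosine(query_vec, r.vector), reverse=True)
    return [r.doc_id for r in hits[:top_k]]


records = [
    Record("a", [1.0, 0.0], {"google", "gemma"}),
    Record("b", [0.9, 0.1], {"google"}),
    Record("c", [0.0, 1.0], {"qwen"}),
]
query = [1.0, 0.0]
print(filtered_search(query, records, all_of=["google"], none_of=["gemma"]))  # prints ['b']
```

Filtering before ranking means a highly similar record that lacks the required keywords can never appear in the results, which is the behaviour described above for restricting the search to documents containing specific words.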
🌲 Pinecone Docs Page:
https://www.pinecone.io/docs/examples/metadata-filtered-search/
00:00 NER Powered Semantic Search
01:19 Dependencies and Hugging Face Datasets Prep
04:18 Creating NER Entities with Transformers
07:00 Creating Embeddings with Sentence Transformers
07:48 Using Pinecone Vector Database
11:33 Indexing the Full Medium Articles Dataset
15:09 Making Queries to Pinecone
17:01 Final Thoughts