On the Diagram of Thought
Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge
Scaling the Tülu 3 post-training recipes to surpass the performance of DeepSeek V3 | Ai2
Alibaba releases AI model it says surpasses DeepSeek
DeepSeek
Training Large Language Models to Reason in a Continuous Latent Space
LLMs Can Plan Only If We Tell Them
Wolfram Blog: News, Views and Insights from Wolfram
Evolving Deeper LLM Thinking
NeurIPS Poster: Large Language Models' Expert-level Global History Knowledge Benchmark (HiST-LLM)
How is Google using AI for internal code migrations?
Titans: Learning to Memorize at Test Time
Meta’s new AI model can translate speech from more than 100 languages
Alibaba slashes prices on large language models by up to 85% as China AI rivalry heats up
DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch
Could Savannah be the Next San Jose? The Downstream Effects of Large Language Models
What just happened
12 Days of OpenAI | OpenAI
SciAgents: Automating Scientific Discovery Through Bioinspired Multi‐Agent Intelligent Graph Reasoning
FACTS Grounding: A new benchmark for evaluating the factuality of large language models
Brilliant talk by , but he's wrong on one point.
AI's Data Dilemma
If You Can't Use Them, Recycle Them: Optimizing Merging at...
The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
I asked AI chatbots to analyze “Alice in Wonderland” | LinkedIn
Evaluating Character Understanding of Large Language Models via...
Large Language Models Fall Short: Understanding Complex...
Decoding LLMs: How to be visible in generative AI search results
Asai, A. et al. (2024). OpenScholar: Synthesizing Scientific Literature with Retrieval-Augmented LMs.