Found 11 bookmarks
RAFT: A new way to teach LLMs to be better at RAG
In this article, we will look at the limitations of RAG and domain-specific fine-tuning for adapting LLMs to existing knowledge, and how a team of UC Berkeley…
·techcommunity.microsoft.com·
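To make the RAFT idea concrete, here is a minimal sketch of how a RAFT-style training record is typically assembled: a question, an oracle (answer-bearing) document mixed with distractor documents, and a chain-of-thought answer as the target. The function name, parameters, and the oracle-drop fraction are illustrative assumptions, not details taken from the linked article.

```python
import json
import random

def build_raft_example(question, oracle_doc, distractor_docs,
                       cot_answer, p_drop_oracle=0.2):
    """Assemble one RAFT-style fine-tuning record.

    The context mixes the oracle document with distractors; for a fraction
    of examples the oracle is dropped so the model also learns to answer
    when retrieval misses the relevant passage.
    """
    docs = list(distractor_docs)
    if random.random() >= p_drop_oracle:
        docs.append(oracle_doc)
    random.shuffle(docs)

    context = "\n\n".join(f"[Document {i + 1}]\n{d}" for i, d in enumerate(docs))
    prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
    return {"prompt": prompt, "completion": cot_answer}

# Toy usage with placeholder strings
record = build_raft_example(
    question="What does RAFT stand for?",
    oracle_doc="RAFT (Retrieval Augmented Fine-Tuning) trains an LLM to answer from retrieved context.",
    distractor_docs=["Unrelated passage about sparse inference.",
                     "Another off-topic snippet."],
    cot_answer="The context defines RAFT as Retrieval Augmented Fine-Tuning, so the answer is Retrieval Augmented Fine-Tuning.",
)
print(json.dumps(record, indent=2))
```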
Fast Llama 2 on CPUs With Sparse Fine-Tuning and DeepSparse - Neural Magic
Key Takeaways: We expanded our Sparse Fine-Tuning research results to include Llama 2. The results include 60% sparsity with INT8 quantization and no drop in accuracy. DeepSparse now supports accelerated inference of sparse-quantized Llama 2 models, with inference speeds 6-8x faster than the dense baseline at 60-80% sparsity. We used some interesting algorithmic techniques in order…
·neuralmagic.com·
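For context on what "accelerated inference of sparse-quantized Llama 2 models" looks like in practice, here is a minimal sketch using DeepSparse's TextGeneration pipeline. The model stub is a placeholder (substitute a real SparseZoo stub or a local deployment directory), and the exact pipeline arguments may vary across DeepSparse releases; this is an assumption-level example, not code from the linked post.

```python
# Minimal sketch: CPU text generation with a sparse-quantized model via DeepSparse.
from deepsparse import TextGeneration

# Placeholder identifier -- replace with an actual sparse-quantized Llama 2
# checkpoint from SparseZoo or a local export directory.
MODEL = "zoo:llama2-7b-...-pruned60_quantized"  # hypothetical stub

pipeline = TextGeneration(model=MODEL)

# Generate a short completion on CPU.
result = pipeline("Explain weight sparsity in one sentence.", max_new_tokens=64)
print(result.generations[0].text)
```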