Scaling Rufus, the Amazon generative AI-powered conversational shopping assistant with over 80,000 AWS Inferentia and AWS Trainium chips, for Prime Day | Amazon Web Services
Groq
Inference with Reference: Lossless Acceleration of Large Language Models
Pinecone - 🤗 Inference Endpoints case study