Commercial LLMs like gpt-3.5-turbo and Claude are the best models to use for us right now. Nothing in the open source world comes close. However, this only means they’re the best of available options. They can take many seconds to produce a valid Honeycomb query, with latency ranging from two to 15+ seconds depending on the model, natural language input, size of the schema, makeup of the schema, and instructions in the prompt. As of this writing, although we have access to gpt-4’s API, it’s far too slow to work for our use case.