QLever and the associated Sparqloscope benchmark
Since I'm not at #ISWC2025, it's easier for me to speak up. There are ginormous issues with QLever and the associated Sparqloscope benchmark by Hannah Bast and colleagues.
The main results table already shows something that's too good to be true. And while I'm sure that table is technically true, the tacit implication that it has any bearing on real-world performance is false.
QLever is faster than the state of the art… at COUNTing. That's it. QLever can count faster. The implied message is that QLever can therefore also produce results faster. Yet we have zero reason to assume it can, at least until there's proof.
In the real world, query engines rarely compute all results at once. They stream them. The Sparqloscope benchmark is designed to trick established query engines into actually producing the full result set and then counting the items. And you know what? Sometimes the established engines are even faster at that than QLever, which seems to have been purposefully designed to count fast. Yes, I'm sure QLever is a fast counter. But what on earth does that have to do with real-world streaming query performance? And did I mention that Virtuoso supports SPARQL UPDATE?
How can you tell, just from the table? Well, Virtuoso is faster than QLever for just about anything that doesn't rely on pure counting. QLever does “Regex: prefix” or “Filter: English literals” in a ridiculously fast 0.01s? The only rational explanation is that it has a great structure for specifically this kind of counting (again: not results, just a count). But Virtuoso is faster for “strbefore”? Well, there you see the real QLever performance when it cannot just count. And only one of those strategies has an impact on the real world.
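To make the count-versus-results distinction concrete, here's a minimal Python sketch. It is a generic illustration under stated assumptions, not QLever's actual implementation: it assumes literals are kept in a plain sorted list, and the helper names `count_prefix` and `stream_prefix` are hypothetical. With that kind of structure, a prefix count costs two binary searches no matter how many literals match, while handing the matches to a client costs time proportional to the number of matches.

```python
from bisect import bisect_left
from typing import Iterator, List


def count_prefix(sorted_literals: List[str], prefix: str) -> int:
    """Hypothetical helper: count literals starting with `prefix`.

    Uses two binary searches, so the cost is O(log n) regardless of
    how many literals match. Nothing is materialized.
    """
    if not prefix:
        return len(sorted_literals)
    lo = bisect_left(sorted_literals, prefix)
    # Smallest string that sorts after every string with this prefix:
    # bump the last character by one code point (assumes it isn't U+10FFFF).
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    hi = bisect_left(sorted_literals, upper)
    return hi - lo


def stream_prefix(sorted_literals: List[str], prefix: str) -> Iterator[str]:
    """Hypothetical helper: yield every matching literal one by one.

    Cost grows linearly with the number of matches, because each
    result actually has to be produced.
    """
    i = bisect_left(sorted_literals, prefix)
    while i < len(sorted_literals) and sorted_literals[i].startswith(prefix):
        yield sorted_literals[i]
        i += 1


literals = sorted(["apricot", "banana", "ap", "cherry", "apple"])
print(count_prefix(literals, "ap"))          # 3
print(list(stream_prefix(literals, "ap")))   # ['ap', 'apple', 'apricot']
```

Both functions agree on the matches, but only the second one delivers them, which is what a streaming engine has to do for every result.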
So what if a query engine can count to 65,099,859,287 faster than any other (a real result, BTW)? Call me when you can produce 65,099,859,287 results faster, and then we'll have something to talk about.
It's a major failure of peer review that a benchmark based on COUNT was accepted in the first place. And I'd be very happy to be proven wrong: let's release the benchmark results for all engines, but without COUNT this time. Then we'll continue the conversation.
https://lnkd.in/eT5XrR2k