Need help setting up snorkel?
Please send an email to okay.zed at gmail and I'll be more than glad to help you get set up with an instance and answer any questions about your use case and requirements.
Happy snorkeling!

Read the paper on query caching here


In late 2017, an optional intermediate result query cache was added to sybil. It allows per-block results to be re-used between queries, speeding up repeated queries by 2 to 10x even when the timestamps in the query predicate change.

In 2018, we wrote a simple paper describing how sybil’s query caching works and why it is advantageous. As part of writing it, we implemented a fake data modeler that generates realistic data to test against.

Short Summary

A typical problem with caching results in databases is that the result cache cannot accommodate queries with dynamic timestamps. For example, a query that looks at the last week of data will miss the cache the next time it runs, because the timestamp in its predicate will have changed. This problem exists in almost all modern databases that implement result caching.

This is particularly disastrous for dashboards, which re-run the same relative-time queries (such as "the last 7 days") over and over. To work around this weakness, sybil implements a per-block query cache. For each block, it eliminates any predicates that are true for every record in that block, which allows sybil to re-use per-block results between queries even when the timestamps in the queries change.
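A minimal Python sketch of the per-block idea follows. It is not sybil's code: it assumes each block records the smallest and largest timestamp it contains, and it only eliminates the time-range predicate. `Block`, `Query`, `block_cache_key`, `run_query` and `scan_block` are hypothetical names. When a block lies entirely inside the query's time range, the time predicate is true for every record in it, so it is left out of that block's cache key; blocks straddling a range boundary keep the timestamps in their key.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    # Hypothetical block metadata: the smallest and largest timestamp in the block.
    block_id: str
    min_ts: int
    max_ts: int


@dataclass(frozen=True)
class Query:
    # Hypothetical query shape: a time range [start, end) plus everything else
    # (other filters, group-bys, aggregations) collapsed into one string.
    start: int
    end: int
    rest: str


def block_cache_key(block: Block, query: Query) -> Optional[Tuple]:
    """Return the per-block cache key, or None if no record in the block can match."""
    if block.max_ts < query.start or block.min_ts >= query.end:
        return None  # block is entirely outside the time range: skip it
    if query.start <= block.min_ts and block.max_ts < query.end:
        # The time predicate is true for every record here, so leave it out of the key.
        return (block.block_id, query.rest)
    # Block straddles a range boundary: the timestamps still affect its result.
    return (block.block_id, query.rest, query.start, query.end)


def run_query(blocks: List[Block], query: Query, cache: Dict[Tuple, object],
              scan: Callable[[Block, Query], object]) -> Tuple[List[object], int, int]:
    """Collect per-block results, reading from and filling the cache; returns (results, hits, misses)."""
    results: List[object] = []
    hits, misses = 0, 0
    for block in blocks:
        key = block_cache_key(block, query)
        if key is None:
            continue
        if key in cache:
            hits += 1
        else:
            misses += 1
            cache[key] = scan(block, query)
        results.append(cache[key])
    return results, hits, misses


def scan_block(block: Block, query: Query) -> str:
    # Stand-in for the real per-block aggregation work.
    return f"partial result for {block.block_id}"


# Ten blocks, each covering 100 time units: b0 = [0, 99], b1 = [100, 199], ...
blocks = [Block(f"b{i}", i * 100, i * 100 + 99) for i in range(10)]
cache: Dict[Tuple, object] = {}

_, hits, misses = run_query(blocks, Query(150, 750, "count by status"), cache, scan_block)
print(hits, misses)  # prints: 0 7

# The same query with its time window moved forward by 100 units.
_, hits, misses = run_query(blocks, Query(250, 850, "count by status"), cache, scan_block)
print(hits, misses)  # prints: 4 3
```

Moving the window forward re-uses the four interior blocks (b3 to b6) that lie fully inside both ranges; only the blocks at the edges of the new range are recomputed. A fuller version would apply the same "true for every record in this block" check to other predicates, not just the time range.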

The downside is that only the intermediate per-block results are cached, not the final results, so the merging phase still has to run. For high-cardinality query results, merging the intermediate results can negate the benefit of caching, but for most queries, caching yields large speedups.
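The merge step can be sketched as follows. This is a hypothetical Python illustration, not sybil's implementation: it assumes each block's cached partial maps a group key to a (count, sum) pair, so partials can be combined before averages are computed. The loop visits every group in every block's partial, so its cost grows with the number of distinct groups even when every partial came from the cache, which is why high-cardinality results benefit less.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical partial-result shape for one block: group key -> (count, sum).
Partial = Dict[str, Tuple[int, float]]


def merge_partials(partials: Iterable[Partial]) -> Partial:
    """Combine per-block partials; touches every group of every block, cached or not."""
    merged: Partial = {}
    for partial in partials:
        for group, (count, total) in partial.items():
            prev_count, prev_total = merged.get(group, (0, 0.0))
            merged[group] = (prev_count + count, prev_total + total)
    return merged


def averages(merged: Partial) -> Dict[str, float]:
    # Averages are computed only after merging; averaging per-block averages
    # would weight small and large blocks equally.
    return {group: total / count for group, (count, total) in merged.items()}


# Two blocks' cached partials, grouped by a hypothetical status column.
partials = [{"200": (3, 30.0), "500": (1, 100.0)}, {"200": (1, 50.0)}]
merged = merge_partials(partials)
print(merged)            # prints: {'200': (4, 80.0), '500': (1, 100.0)}
print(averages(merged))  # prints: {'200': 20.0, '500': 100.0}
```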

For full details and results, please read the query caching paper.


December 15 2018