log(v)


In this post, we will talk about the differences between Scuba and Snorkel, two realtime data analysis tools. If you want to read more about OLAP engines and time series databases (TSDBs), or see a list of OLAP engines, try this page.

Overview

Scuba is a realtime log analysis tool used at Facebook. It’s both a UI for building queries and a distributed online analytical processing (OLAP) query engine. With a fleet of servers at its disposal, Scuba holds billions of records in RAM and runs simple aggregation queries in under a second. Today, Scuba is used alongside ODS (a time series database) to monitor and maintain Facebook’s infrastructure.

In general, Scuba trades accuracy and consistency for speed: its primary purpose is as a diagnosis and debugging tool for infrastructure. When people want to monitor an existing metric, they use their TSDB. When they want to explore, debug or diagnose, they use Scuba.

Unfortunately for us, Scuba is not available outside Facebook. That’s where Snorkel comes in. Snorkel is conceptually the little sister to Scuba. The major differences are that Snorkel is 1) free, 2) available and 3) smaller in scope.

Snorkel’s backend, Sybil, runs on a single machine and handles much smaller datasets than Scuba’s backend. Sybil stores data on disk (instead of in memory) and is generally good for queries on up to 10M rows on commodity hardware (30M rows on server-class hardware), versus the billions of rows that Scuba can handle.

Aside from the difference in scale, Scuba and Snorkel are similar in many ways: they allow ad-hoc queries, they perform full table scans, they do not require up-front schema definitions and they cap table size by memory and time limits. Together, these features make for a compelling backend for digging through instrumentation.
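To make those shared properties concrete, here is a minimal sketch of a schemaless, append-only table that is capped by size and age and answers queries with a full scan. It is not how Sybil or Scuba are implemented: the `CappedTable` class, its parameters and the field names are all hypothetical, a row count stands in for the memory limit, and rows are assumed to arrive in time order.

```python
from collections import defaultdict, deque


class CappedTable:
    """Hypothetical append-only table capped by row count and record age."""

    def __init__(self, max_rows, max_age_seconds):
        self.max_rows = max_rows                # stand-in for a memory limit
        self.max_age_seconds = max_age_seconds  # the time limit
        self.rows = deque()                     # oldest record on the left

    def insert(self, record, now):
        # No schema: any dict is accepted, whatever columns it carries.
        row = dict(record)
        row["time"] = now
        self.rows.append(row)
        self._expire(now)

    def _expire(self, now):
        # Drop the oldest rows once the size cap or the age cap is exceeded.
        while len(self.rows) > self.max_rows:
            self.rows.popleft()
        while self.rows and now - self.rows[0]["time"] > self.max_age_seconds:
            self.rows.popleft()

    def query(self, group_by, metric, where=None):
        # Full table scan: every row is visited and no index is consulted.
        sums = defaultdict(float)
        counts = defaultdict(int)
        for row in self.rows:
            if where is not None and not where(row):
                continue
            if metric not in row:
                continue  # rows lacking the column are simply skipped
            key = row.get(group_by)
            sums[key] += row[metric]
            counts[key] += 1
        return {key: sums[key] / counts[key] for key in sums}


# Usage: records with different columns go into the same table.
table = CappedTable(max_rows=3, max_age_seconds=3600)
table.insert({"host": "web1", "latency_ms": 120}, now=0)
table.insert({"host": "web2", "latency_ms": 80, "status": 500}, now=10)
table.insert({"host": "web1", "latency_ms": 100}, now=20)
table.insert({"host": "web2", "latency_ms": 40}, now=30)  # evicts the first row
averages = table.query(group_by="host", metric="latency_ms")
```

Because nothing is indexed or pre-aggregated, any column can be used as a filter or a grouping key at query time. The cost is that every query touches every row, which is why capping the table matters.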

Feature Comparison Matrix

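| Storage Features | Scuba | Snorkel | TSDB | OLAP |
| --- | --- | --- | --- | --- |
| Requires Schemas | | | | X |
| Distributed | X | Planned | Commercial | Commercial |
| Column Store | X | X | | Optional |
| Tabular Data | X | X | | X |
| Indices | | | X | X |
| Append only data | X | X | X | |
| Max Insertion | 1M+/s | 1K/s | 10+K/s | 1M+/s |
| Mem Capped Tables | X | X | X | |

| Query Features | Scuba | Snorkel | TSDB | OLAP |
| --- | --- | --- | --- | --- |
| SQL Support | X | | | X |
| JOIN Queries | | | | X |
| Parallel Query Engine | X | X | | |
| Table Queries | X | X | | X |
| Time-Series Queries | X | X | X | X |
| Distribution Queries | X | X | | X |
| Samples Queries | X | X | | X |
| Feasible Table Scans | 1B+ | 10 - 30M | N/A | 1B+M |

| Frontend Features | Scuba | Snorkel | TSDB | OLAP |
| --- | --- | --- | --- | --- |
| Time Controls | X | X | X | ? |
| Filter Controls | X | X | | ? |
| Table View | X | X | | ? |
| Time View | X | X | X | ? |
| Composable Time Series | | | X | ? |
| Dist. View | X | X | | ? |
| Graph View | X | X | | ? |
| Scatter Plot | X | X | | ? |
| Sankey | X | | | ? |
| Custom Views by Dataset | X | X | | ? |
| Key typeaheads | X | X | X | ? |
| Value typeaheads | X | | | ? |
| Time Comparison | X | X | X | ? |
| Filter Comparison | X | X | | ? |
| Dashboarding | X | External | External | ? |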
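
The four query types named in the matrix (table, time-series, distribution and samples queries) can be illustrated with a few lines of plain Python. This is only a sketch of what each query type computes, not the query engine of Scuba or Snorkel. The function names, the sample rows and the field names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows; field names are invented for illustration.
ROWS = [
    {"time": 0, "page": "/home", "latency_ms": 100},
    {"time": 30, "page": "/home", "latency_ms": 300},
    {"time": 70, "page": "/search", "latency_ms": 200},
    {"time": 95, "page": "/home", "latency_ms": 500},
]


def mean(values):
    return sum(values) / len(values)


def table_query(rows, group_by, metric):
    """Table query: one aggregated value (here, the mean) per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[metric])
    return {key: mean(vals) for key, vals in groups.items()}


def time_series_query(rows, metric, bucket_seconds):
    """Time-series query: the mean of a metric per fixed-width time bucket."""
    buckets = defaultdict(list)
    for row in rows:
        start = row["time"] - row["time"] % bucket_seconds
        buckets[start].append(row[metric])
    return {start: mean(vals) for start, vals in sorted(buckets.items())}


def distribution_query(rows, metric, bin_width):
    """Distribution query: a histogram of the metric's values."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[metric] - row[metric] % bin_width] += 1
    return dict(sorted(counts.items()))


def samples_query(rows, limit):
    """Samples query: raw rows, returned without aggregation."""
    return rows[:limit]


by_page = table_query(ROWS, "page", "latency_ms")
per_minute = time_series_query(ROWS, "latency_ms", bucket_seconds=60)
histogram = distribution_query(ROWS, "latency_ms", bin_width=200)
first_two = samples_query(ROWS, limit=2)
```

All four are single passes over the same rows, which fits a backend built around full table scans: the query type only changes how the scanned rows are grouped.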

CHANGELOG

2017-11-30

2017-06-17