logv

Overview
Feature Comparison Matrix
CHANGELOG

In this post, we will talk about the differences between Scuba and Snorkel, two realtime data analysis tools. If you want to read more about OLAP engines and time series databases (TSDBs), or see a list of OLAP engines, try this page.

Overview

Scuba is a realtime log analysis tool used at Facebook. It’s both a UI for building queries and a distributed online analytical processing (OLAP) query engine. With a fleet of servers at its disposal, Scuba holds billions of records in RAM and runs simple aggregation queries in under a second. Today, Scuba is used alongside ODS (a time series database) to monitor and maintain Facebook’s infrastructure.

In general, Scuba trades accuracy and consistency for speed: its primary purpose is as a diagnosis and debugging tool for infrastructure. When people want to monitor an existing metric, they use their TSDB. When they want to explore, debug or diagnose, they use Scuba.

Unfortunately for us, Scuba is not available outside Facebook. That’s where Snorkel comes in. Snorkel is conceptually the little sister to Scuba. The major differences are that Snorkel is 1) free, 2) available and 3) built for a smaller scope.

Snorkel’s backend, Sybil, runs on a single machine and handles much smaller datasets than Scuba’s backend. Sybil stores data on disk (instead of in memory) and is generally good for queries on up to 10M rows on commodity hardware (30M rows on server-class hardware), versus the billions of rows that Scuba can handle.

Aside from the difference in scale, Scuba and Snorkel are similar in many ways: they allow for ad-hoc queries, they perform full table scans, they do not require up-front schema definitions and they cap table size by memory and time limits. Together, these particular features make for a compelling backend for digging through instrumentation.
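To make those shared properties concrete, here is a minimal sketch of a schemaless, append-only table that is capped by size and age and answers queries with a full scan. It is a toy, not how Scuba or Sybil are implemented: the `CappedTable` class, its method names and the `host`/`latency` fields are all hypothetical, and a row-count cap stands in for a real memory cap.

```python
import time
from collections import defaultdict, deque


class CappedTable:
    """Toy append-only table: no schema, capped by row count and record age."""

    def __init__(self, max_rows=1000, max_age_seconds=3600):
        self.max_rows = max_rows  # stand-in for a memory cap (assumption)
        self.max_age_seconds = max_age_seconds
        self.rows = deque()  # oldest record sits at the left

    def insert(self, record, now=None):
        now = time.time() if now is None else now
        # Records are plain dicts, so any record may introduce new columns.
        self.rows.append({**record, "time": now})
        self._expire(now)

    def _expire(self, now):
        # Drop the oldest rows while either cap is exceeded.
        while self.rows and (
            len(self.rows) > self.max_rows
            or now - self.rows[0]["time"] > self.max_age_seconds
        ):
            self.rows.popleft()

    def query(self, group_by, metric, where=lambda row: True):
        """Full table scan: filter every row, then average `metric` per group."""
        sums = defaultdict(float)
        counts = defaultdict(int)
        for row in self.rows:
            # Rows that lack the metric column are skipped rather than rejected.
            if metric not in row or not where(row):
                continue
            key = row.get(group_by)
            sums[key] += row[metric]
            counts[key] += 1
        return {key: sums[key] / counts[key] for key in sums}
```

Because nothing is indexed, any column can be used as a filter or a grouping key without planning ahead, and the caps keep every scan bounded. The price is that each query touches every row that is still in the table.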

Feature Comparison Matrix

Storage Features	Scuba	Snorkel	TSDB	OLAP
Requires Schemas				X
Distributed	X	Planned	Commercial	Commercial
Column Store	X	X		Optional
Tabular Data	X	X		X
Indices			X	X
Append only data	X	X	X
Max Insertion	1M+/s	1K/s	10+K/s	1M+/s
Mem Capped Tables	X	X	X
Query Features	Scuba	Snorkel	TSDB	OLAP
SQL Support	X			X
JOIN Queries				X
Parallel Query Engine	X	X
Table Queries	X	X		X
Time-Series Queries	X	X	X	X
Distribution Queries	X	X		X
Samples Queries	X	X		X
Feasible Table Scans	1B+	10-30M	N/A	1B+
Frontend Features	Scuba	Snorkel	TSDB	OLAP
Time Controls	X	X	X	?
Filter Controls	X	X		?
Table View	X	X		?
Time View	X	X	X	?
Composable Time Series			X	?
Dist. View	X	X		?
Graph View	X	X		?
Scatter Plot	X	X		?
Sankey	X			?
Custom Views by Dataset	X	X		?
Key typeaheads	X	X	X	?
Value typeaheads	X			?
Time Comparison	X	X	X	?
Filter Comparison	X	X		?
Dashboarding	X	External	External	?
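The four query types in the Query Features rows can be read roughly as follows. This is a sketch of one interpretation, assuming a table query means a grouped aggregate, a time-series query means the same aggregate bucketed by time, a distribution query means a histogram of values, and a samples query means raw rows. The function names, the sample records and their fields are made up for illustration and are not Scuba's or Snorkel's API.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are invented for this example.
ROWS = [
    {"time": 0, "host": "a", "latency": 10},
    {"time": 30, "host": "a", "latency": 20},
    {"time": 60, "host": "b", "latency": 30},
    {"time": 90, "host": "b", "latency": 40},
]


def table_query(rows, group_by, metric):
    """Table query: one aggregate (here the average) per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[metric])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def time_series_query(rows, metric, bucket_seconds):
    """Time-series query: the same average, bucketed by time."""
    buckets = defaultdict(list)
    for row in rows:
        bucket_start = row["time"] // bucket_seconds * bucket_seconds
        buckets[bucket_start].append(row[metric])
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


def distribution_query(rows, metric, bin_width):
    """Distribution query: count of rows per fixed-width value bin."""
    histogram = defaultdict(int)
    for row in rows:
        histogram[row[metric] // bin_width * bin_width] += 1
    return dict(sorted(histogram.items()))


def samples_query(rows, limit):
    """Samples query: return raw rows without aggregating them."""
    return rows[:limit]
```

All four are single passes over the same rows, which is why one full-scan engine can serve every view in the frontend without per-query indices.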

CHANGELOG

2017-11-30

Add note about distributed query work being planned
Add note about support for custom views per dataset

2017-06-17

First write up.
Add comparison table
Add intro paragraphs
