Need help setting up snorkel?
Please send an email to okay.zed at gmail and I'll be more than glad to help you get set up with an instance and answer any questions about your use case and requirements.
Happy snorkeling!


This page contains a changelog and rough feature timeline for sybil and snorkel changes.
A longer-form history of the project can be found here.




  • ported all views (except map view) from snorkel to snorkel.lite
  • re-tested all views from snorkel.lite
  • packaged snorkel.lite into a pip installable package
  • pushed snorkel.lite to github
  • migrated all servers to use snorkel.lite
  • implemented RBAC auth and google oauth as auth schemes
  • add randomized ingestion to msybil



  • working on a Python version of snorkel (known as snorkel.lite for now), moving away from the Node.js version due to ecosystem concerns


  • audited NPM dependencies in snorkel: fixed most issues, but some low- and medium-priority issues remain
  • looked at typescript for snorkel, did not go forward with it


Q4 (0.5.0)

  • add msybil binary for distributed queries
  • Add map view based on datamaps

Q3 (0.4.0)

  • add new input for selecting multiple aggregate metrics per column
  • add support for sybil’s nested histograms
  • fix table popover related bugs
  • add regex escaping to table popover based filters
  • updating dist view with focus charts (zoomable line charts)
  • add initial plugin support for views, dataset presenters and snorkel config

Q2 (0.1.0)

  • quick select custom time filters from time series graphs
  • update multi-seasonal forecasting model to work with sparse data
  • add access controls using RBAC rules (see config.js and users.rbac)
  • turn the snorkel landing page off by default
  • better nvd3 tooltips

Q1 (0.0.20)

  • add packaging for Snapcraft and AppImage
  • fix mongo bugs (thanks adhil!)
  • update repository link (thanks bruce!)


Q4 (0.0.17)

  • add custom time inputs for all views
  • add reversible axes to scatter plot
  • use summed line charts instead of area charts
  • switch to nvd3 for primary plotting
  • add time series forecasting
  • add weco process control view
  • fix chosen selector bug

Q2 (0.0.16)

  • add dataset listing page
  • switch default config engine to linvo db (levelup based)
  • add directed graph view
  • upgrade chosen plugin to 1.6
  • polishing distribution view
  • updating screenshots
  • remove adhoc CSV upload feature
  • further upgrades to work with sybil
  • update to flatly theme for bootstrap
  • fix bootstrap popovers in different views

Q1 (0.0.11)

  • upgrade to support any mongo or postgres database using postgres_raw and mongo_raw backend adapters
  • add support for grafana dashboarding via custom endpoint
  • remove old dashboarding code in favor of grafana
  • initial support for sybil: a backend built for instrumentation
  • Update to using google oauth instead of google open ID
  • update dataset settings page a bit
  • add snorkel tour page



  • add optional unauthenticated API endpoint
  • all query data is downloadable as CSV or JSON
  • custom column formatter
  • add query history browser
  • add per-dataset RSS feed overlays for time series
  • add new views, including bar graph view and multi dist
  • add simple dashboarding (later to be replaced by grafana)
  • add dataset presenters
  • add google oauth based authentication
  • add dataset listing page
  • support mongo and postgres BSON


  • Initial release of snorkel: supports table, time series, distribution and samples view
  • All data is kept in mongo, including configuration




  • speculative work on calculated columns: tried out 3 expression evaluators
  • add t-digest for histogramming
  • remove megalinter from CI builds
  • investigating restoring session queries, thinking about how to build the snorkel UI for these


Q3 & Q4

  • (tmc) grpc server: can accept and run queries over GRPC
  • (tmc) benchmarks
  • (tmc) test suite upgrades

Q1 & Q2

  • added plaitpy data generators for producing fake test data
  • wrote query caching paper


Q4 (0.5.0)

  • support for distributed & remote queries with msybil script
  • using loglogbeta for approximate count distinct queries
  • add a new intermediate result pruning stage during map / reduce
  • speeding up high cardinality string loading off disk

Q3 (0.4.0)

  • add query caching support
  • add nested histogram support
  • test out t-digest and HDR histogram; neither makes the cut

Q2 (0.2.0)

  • add auto-compaction to ingest process
  • try out luajit, eventually discard it
  • add export to TSV option
  • support custom field delimiters for query flags

Q1 (0.1.0)

  • add support for reading gzipped files: any file can be gzipped
  • switch to using more helpful commit format


Q4 (0.0.8)

  • fixes for go’s test behavior: now tests fail properly

Q2 (0.0.7)

  • deliberately recycle memory (turn off GC) for speed
  • testing out session queries (since removed)
  • strengthening table lock recovery and multi-process behavior
  • add table trim command for pruning based on age or table size

Q1 (0.0.1)

  • sybil released as a JSON column store
