log(v)

This post details the transition of snorkel from nodejs to python.

Background

Snorkel started in 2013 as a UI for aggregating schemaless instrumentation. At the time, mongo was chosen to hold data since it was schemaless and had an aggregation pipeline. Many design decisions were made around using mongo (and later postgres) before I started working on Sybil in 2016.

I built Sybil to overcome some of the limitations of mongo: Sybil is faster than mongo and uses less disk space because it stores immutable data in a columnar layout. In addition, Sybil runs queries in parallel, whereas mongo uses a single thread. For these reasons, a Sybil dataset can take 5-10x (or more) less space while still outperforming mongo's aggregation framework.
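To make the columnar point concrete, here is a minimal sketch in python. It is an illustration only and not Sybil's actual on-disk format; the field names (`ts`, `browser`, `latency_ms`) and the helpers `to_columns` and `column_avg` are made up for the example. The idea is that an aggregate over one field only has to touch that field's values, and missing fields in schemaless records become empty slots.

```python
def to_columns(rows):
    """Pivot a list of schemaless records into one list per field.

    Records that lack a field get None in that column, so every
    column has the same length and stays aligned by record index.
    """
    keys = sorted({key for row in rows for key in row})
    return {key: [row.get(key) for row in rows] for key in keys}


def column_avg(columns, name):
    """Average a single column, skipping records where it is missing."""
    values = [v for v in columns[name] if v is not None]
    return sum(values) / len(values)


# Hypothetical instrumentation samples (made-up data for illustration).
rows = [
    {"ts": 1, "browser": "firefox", "latency_ms": 100},
    {"ts": 2, "browser": "chrome", "latency_ms": 200},
    {"ts": 3, "browser": "chrome"},  # schemaless: no latency recorded
    {"ts": 4, "browser": "safari", "latency_ms": 300},
]

columns = to_columns(rows)
average_latency = column_avg(columns, "latency_ms")
```

Repeated values within a single column (like `browser` here) also tend to compress well, which is one common reason columnar stores use less disk space.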

After several years, Sybil has become my primary datastore and preferred way of storing instrumentation, but Snorkel still had baggage from mongo and nodejs.

PS: Read more about the history of Snorkel and Sybil here

Motivation

There are several reasons to move to python, but the main ones are:

The python app (snorkel-lite) is built on top of flask with the intent of minimizing external dependencies. To keep things simple, saved queries and session data are stored in SQLite instead of leveldb or mongo.
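As a rough sketch of what storing saved queries in SQLite can look like, the snippet below uses only python's built-in `sqlite3` module. The table name, columns and the functions `open_store`, `save_query` and `load_queries` are assumptions made for illustration; they are not snorkel-lite's actual schema or API.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the SQLite file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS saved_queries ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " owner TEXT NOT NULL,"
        " dataset TEXT NOT NULL,"
        " params TEXT NOT NULL,"  # query parameters serialized as JSON
        " created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def save_query(conn, owner, dataset, params):
    """Insert one saved query and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO saved_queries (owner, dataset, params)"
            " VALUES (?, ?, ?)",
            (owner, dataset, json.dumps(params)),
        )
    return cur.lastrowid


def load_queries(conn, owner):
    """Return (id, dataset, params) tuples for one owner, oldest first."""
    rows = conn.execute(
        "SELECT id, dataset, params FROM saved_queries"
        " WHERE owner = ? ORDER BY id",
        (owner,),
    ).fetchall()
    return [(qid, dataset, json.loads(params)) for qid, dataset, params in rows]
```

Because SQLite ships with python and lives in a single file, a setup like this needs no separate database server, which fits the goal of minimizing external dependencies.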

The transition from javascript to python took about 6 months and there were some tradeoffs involved, but on the whole I’m satisfied with the python app.

Advantages of nodejs:

Advantages of python:

Status

One of the initial goals of snorkel-lite was to re-use as much code as possible from Snorkel while simplifying it and removing code that was unnecessary. To do so, I built a component framework for flask that supports the old Snorkel components. Luckily, this worked out pretty well: the components from Snorkel.js were usable in snorkel-lite and I was able to re-use the views from Snorkel with minimal effort.

Timeline

There was an initial burst of activity in September 2018 where I ported the main views of Snorkel (table, time, dist and samples) to python as a proof of concept. By November 2018, I was using snorkel-lite full-time on my local data.

Between December 2018 and January 2019, the Alternative and Advanced views were ported over, which was relatively easy. Google auth and RBAC controls were added in January 2019.

In February 2019, I transitioned all my servers to snorkel-lite and pointed my grafana dashboards at my snorkel-lite instances. Additionally, the snorkel-lite package was built and released to pypi.

In March 2019, I started redirecting requests from snorkel.logv.org to slite.logv.org. The snorkel-lite package is now bundled with sybil and several helper binaries for easily ingesting and querying from the CLI.

Future work

Some remaining work to be done for snorkel-lite is:

Despite the large amount of work left, I’m confident in snorkel-lite’s codebase and usefulness, especially the getting started portion.

Changelog

03-29-2019