logv

Background
- Motivation
Status
- Timeline
- Future work
Changelog
- 03-29-2019

This post details the transition of snorkel from nodejs to python.

Background

Snorkel started in 2013 as a UI for aggregating schemaless instrumentation. At the time, mongo was chosen to hold data since it was schemaless and had an aggregation pipeline. Many design decisions were made around using mongo (and later postgres) before I started working on Sybil in 2016.

I built Sybil to overcome some of the limitations of mongo: Sybil is faster than mongo and uses less disk space because it stores immutable data in a columnar format. In addition, Sybil runs queries in parallel, whereas mongo uses a single thread. For these reasons, a Sybil dataset can take 5-10x less disk space (or better) while still outperforming mongo’s aggregation framework.

After several years, Sybil has become my primary datastore and preferred way of storing instrumentation, but Snorkel still had baggage from mongo and nodejs.

Motivation

There are several reasons to move to python, but the main ones are:

- less mongo baggage: mongo was originally used to store data, config and sessions
- the nodejs ecosystem has too much churn for me
- security vulnerabilities in nodejs packages
- python is an easier language for backend devs to pick up and contribute to
- deploying and distributing python apps is more stable than node apps
- it’s easier to self-host python apps
- I want snorkel to last for another 10 years, and I’m not confident that a nodejs app built today would be installable in 10 years

The python app (snorkel-lite) is built on top of flask with the intent of minimizing external dependencies. To keep things simple, saved queries and session data are stored in SQLite instead of leveldb or mongo.
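As a rough illustration of why SQLite keeps things simple, here is a minimal sketch of persisting saved queries using only python's standard library. The table name, columns and function names (`saved_queries`, `open_store`, `save_query`, `load_queries`) are hypothetical and are not snorkel-lite's actual schema or API.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite database and make sure the table exists."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per saved query, params kept as JSON text.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS saved_queries (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            owner TEXT NOT NULL,
            dataset TEXT NOT NULL,
            params TEXT NOT NULL
        )
        """
    )
    return conn


def save_query(conn, owner, dataset, params):
    """Store one query and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO saved_queries (owner, dataset, params) VALUES (?, ?, ?)",
            (owner, dataset, json.dumps(params)),
        )
    return cur.lastrowid


def load_queries(conn, owner):
    """Return (dataset, params) pairs for one owner, oldest first."""
    rows = conn.execute(
        "SELECT dataset, params FROM saved_queries WHERE owner = ? ORDER BY id",
        (owner,),
    ).fetchall()
    return [(dataset, json.loads(params)) for dataset, params in rows]
```

Because `sqlite3` ships with python, a store like this needs no extra service to install or run: passing a file path instead of `":memory:"` to `open_store` is enough to persist the data.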

The transition from javascript to python took about 6 months and there were some tradeoffs involved, but on the whole I’m satisfied with the python app.

Advantages of nodejs:

- it’s easier to use sockets in nodejs than in python
- code can be shared between server and client
- asynchronous execution model

Advantages of python:

- the ecosystem is more mature
- it’s simpler to follow and understand code
- classes! ES6 does have classes and modules, but they are more natural in python
- it’s easy to publish pip-installable packages, and plugins are easier to write

Status

One of the initial goals of snorkel-lite was to re-use as much code as possible from Snorkel while removing code that was unnecessary. To do so, I built a component framework for flask that supports the old Snorkel components. This worked out well: the views and components from Snorkel.js were usable in snorkel-lite with minimal effort.

Timeline

There was an initial burst of activity in September 2018, when I ported the main views of Snorkel (table, time, dist and samples) to python as a proof of concept. By November 2018, I was using snorkel-lite full-time on my local data.

Between December 2018 and January 2019, the Alternative and Advanced views were ported over, which was relatively easy. Google auth and RBAC controls were added in January 2019.

In February 2019, I transitioned all my servers to snorkel-lite and pointed my grafana dashboards at my snorkel-lite instances. Additionally, the snorkel-lite package was built and released to PyPI.

In March 2019, I started redirecting requests from snorkel.logv.org to slite.logv.org. The snorkel-lite package is now bundled with sybil and several helper binaries for ingesting and querying data from the CLI.

Future work

Some remaining work to be done for snorkel-lite is:

- continue adding polish and refining the interactions
- add RSS feeds
- create better dataset presenter configs
- port the Map view over
- write UI tests

Despite the large amount of work left, I’m confident in snorkel-lite’s codebase and usefulness, especially the getting-started experience.

Changelog

03-29-2019

- Initial write up
- Timeline of work
- Background and motivation section
- Future work