#data lasso

TL;DR

Flux helped bring the complexity of Data Lasso down, replacing a messy event bus structure. React helped make the UI more manageable and reduced code duplication. More below on our experience.

Flux

Data Lasso runs entirely in the browser. It is a somewhat complex app that has a rich UI and is highly interactive.

From the beginning, Data Lasso relied on an event bus that tied different parts of the app together. But as new functionality was added, code complexity grew at a very high rate. Some bugs were hard to trace to a source. Fixing others required more workarounds, which in the end led to more bugs.

This surfaced the underlying problem with the event bus. While flexible, it introduced too much complexity of its own and became a drag on the code.

Here is a simplified diagram of uploading a new dataset:

[Diagram: uploading a new dataset through the event bus]

The dependencies that formed were vast - logic from one component was calling into several other components (example). Something as simple as adding a new upload source was going to double the number of event listeners and interdependencies.
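To make the problem concrete, here is a minimal sketch of that kind of wiring. It is not Data Lasso's actual code: the tiny `createBus` helper, the event names and the component names are all made up for illustration.

```javascript
// Hypothetical, dependency-free event bus (not the one Data Lasso used).
function createBus() {
    const listeners = {};
    return {
        on(event, fn) {
            (listeners[event] = listeners[event] || []).push(fn);
        },
        emit(event, payload) {
            (listeners[event] || []).forEach((fn) => fn(payload));
        },
    };
}

const bus = createBus();
const log = [];

// Each "component" listens for events from others and emits its own in turn.
bus.on('file:uploaded', (rows) => {
    log.push('parser');
    bus.emit('data:parsed', rows);
});
bus.on('data:parsed', (rows) => {
    log.push('axes');
    bus.emit('axes:updated', rows);
});
bus.on('data:parsed', () => log.push('graph'));
bus.on('axes:updated', () => log.push('graph (again)'));

bus.emit('file:uploaded', [{ x: 1, y: 2 }]);
console.log(log); // [ 'parser', 'axes', 'graph (again)', 'graph' ]
```

Even in this toy version the graph is touched twice, and the order depends on which listener happened to be registered first. A second upload source would need its own set of listeners on top of these.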

Flux aims to solve a similar problem, so I decided to give it a try.

First of all, a bit on Flux. It is an application architecture for building interfaces, with its core principle being unidirectional data flow. I highly recommend looking through Facebook’s Flux overview.

I like to think of Flux more as a state of mind. You don’t have to use solutions like Redux to get started; it’s up to you how you execute the pattern. That is what I did with Data Lasso - here are some of the key components:

  • Store + Dispatcher: In Data Lasso, Store is really just a single Backbone Model. Dispatcher, which is typically its own thing, is integrated into the Store. Actions are dispatched right on the Store, which is a “single source of truth”.
  • Actions: As Flux architecture goes, I am a big fan of having strict pre-defined actions, as well as a Reducer. From the standpoint of bringing clarity into the code, those two are great concepts. Data Lasso, however, is not that complex, so I opted for a humble switch statement on the Store that does the trick (here it is in the code).
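As a rough illustration of a store with a built-in dispatcher and a switch statement, here is a minimal sketch. It assumes a plain object in place of the Backbone Model, and the action names (`file-uploaded`, `selection-made`) and state keys are hypothetical, not the ones Data Lasso uses.

```javascript
// Stand-in for a Store with an integrated Dispatcher.
// Assumption: a plain closure replaces the Backbone Model; all names are illustrative.
function createStore(initialState) {
    let state = Object.assign({}, initialState);
    const listeners = [];

    return {
        get(key) {
            return state[key];
        },
        onChange(fn) {
            listeners.push(fn);
        },
        // Every state change goes through this single switch statement.
        dispatch(action) {
            switch (action.type) {
                case 'file-uploaded':
                    // New data invalidates any previous selection.
                    state = Object.assign({}, state, { entries: action.entries, selection: [] });
                    break;
                case 'selection-made':
                    state = Object.assign({}, state, { selection: action.selection });
                    break;
                default:
                    throw new Error('Unknown action: ' + action.type);
            }
            listeners.forEach((fn) => fn(state));
        },
    };
}

const store = createStore({ entries: [], selection: [] });
let notifications = 0;
store.onChange(() => {
    notifications += 1;
});

store.dispatch({ type: 'file-uploaded', entries: [{ x: 1 }, { x: 2 }] });
store.dispatch({ type: 'selection-made', selection: [0] });
```

Views only read from the store and call `dispatch`, so data flows in one direction: action, store, view.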

With that in mind, the diagram from before changes to this:

[Diagram: uploading a new dataset with Flux]

At first glance, it’s not less complex. If anything, there are more entries. That’s not the point, however. The benefit is in having more predictable logic. It’s clearer what is happening at more or less any point in time.

There are some other benefits:

  • Anything that can happen, happens in one place. It’s always nice to be able to glance at one file and get a complete picture
  • Race conditions are less likely, since everything is dispatched through a single point in the app

Overall, the Flux pattern was a perfect match for Data Lasso. It really solved some of the pains of a highly dynamic application without adding unnecessary abstract conventions.

React

React was a more straightforward change. Besides the fact that React’s way of doing things maps well onto a unidirectional data flow, it was a much nicer view layer to use than Backbone Views.

Some advantages:

  • Components! Having a few reusable components made a ton of difference, improving consistency and reducing code duplication.
  • Event binding made the UI easier to comprehend and maintain.

While animations took some trial and error to figure out, at the end of the day React was a great improvement and, maybe most of all, felt like a natural next step.


Further reading

  • Pull Request that implemented Flux in Data Lasso. (Did we mention that Data Lasso is Open Source?)
  • Flux Overview - the video is exceptionally helpful and we recommend you watch it!
  • React - while it’s necessary to maintain a healthy level of skepticism towards new technologies that come and go so frequently, React proved a new paradigm of thinking and established a solid solution to a painful problem.

Data Lasso, Tumblr’s three-dimensional visualization tool, just got a serious upgrade. Along with a version bump to 2.x, Data Lasso now has some handy new features (as well as completely reworked internals). A GIF is worth a thousand words:

Quick refresher: Data Lasso is a visualization tool that Tumblr built that allows us to look at large multi-dimensional data sets quickly. If you haven’t tried it yet, check out the hosted version here.

New stuff

  • Data Lasso is built on the premise of being able to quickly visualize data and select a subset of interest, using a lasso-like tool. That tool just became much more flexible. Now, you will be able to make complex selections by adding and subtracting from an existing selection - much like the tools that you are already used to, if you work with image editing programs. Hold your shift key to add, option/alt to subtract.
  • Now, you can also upload datasets using a URL, without needing to download them. Same rules apply - it can be any .csv, .tsv or .json, as long as it’s properly formatted. That will come in handy if you are using Data Lasso with public datasets that are available online, or if you are working with systems like Hive that provide a link to your query results.
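The add and subtract behavior of the selection tool boils down to set operations. Below is a minimal sketch of that idea, not Data Lasso's implementation: `combineSelection` and `modeFromEvent` are hypothetical helpers, and entries are assumed to be identified by simple ids. `shiftKey` and `altKey` are the standard modifier flags on browser mouse events.

```javascript
// Hypothetical helper: merge a freshly lassoed set of ids into the current selection.
// mode is 'replace' (no modifier), 'add' (shift) or 'subtract' (option/alt).
function combineSelection(current, lassoed, mode) {
    const result = new Set(mode === 'replace' ? [] : current);
    for (const id of lassoed) {
        if (mode === 'subtract') {
            result.delete(id);
        } else {
            result.add(id);
        }
    }
    return result;
}

// Map modifier keys on a mouse event to a selection mode.
function modeFromEvent(event) {
    if (event.shiftKey) return 'add';
    if (event.altKey) return 'subtract';
    return 'replace';
}

let selection = combineSelection(new Set(), [1, 2, 3], modeFromEvent({}));
selection = combineSelection(selection, [4, 5], modeFromEvent({ shiftKey: true }));
selection = combineSelection(selection, [2, 4], modeFromEvent({ altKey: true }));
console.log([...selection]); // [ 1, 3, 5 ]
```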

Reworked Internals

A lot was changed under the three-dimensional hood of Data Lasso.

  • Architecture now follows principles of Flux (a fitting approach for a complex front-end application like Data Lasso) and its interface is now powered by React. These two things help to reduce the complexity a lot. More on moving to Flux + React in a blog post to follow.
  • The build process was moved to Webpack and was simplified a lot. Webpack loaders also allowed us to have .glsl files in the codebase for the first time - so we no longer had to rely on workarounds to include the vertex and fragment shaders that Data Lasso relies on to utilize the GPU.
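For readers who have not used loaders for shader files, here is a rough sketch of what such a rule can look like. This is an assumption rather than Data Lasso's actual configuration: it uses `raw-loader`, which imports a file as a plain string, and the newer `module.rules` syntax (older Webpack versions use `module.loaders`). Adjust the pattern to whatever extension your shader files use.

```javascript
// Hypothetical loader rule: import shader sources as strings.
// The extension list and the choice of raw-loader are assumptions.
const shaderRule = {
    test: /\.(glsl|vert|frag)$/,
    use: 'raw-loader',
};

// The rule would sit under module.rules in webpack.config.js.
const webpackConfigFragment = {
    module: {
        rules: [shaderRule],
    },
};
```

With a rule like this in place, a shader can be imported like any other module and passed to the rendering code as a string.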

It wouldn’t be a major version bump, of course, if it did not contain backwards-incompatible changes. With the move to Flux, the event bus was deprecated. So if you are using Data Lasso inside your app and rely on events for interacting with it, you will have to switch to using Store and Dispatcher instead. It is good in the long term, as it provides so much more clarity into what’s going on inside Data Lasso.

That should be it! Overall, 2.0 is a solid release that adds new fundamental functionality, while allowing future work to go more smoothly. As usual, if you encounter a problem - open an issue on the repository.
