An Overview of Jupyter Notebooks

Article Wednesday, July 18 2018

Data is an important part of today’s software engineering ecosystem. The data that an individual or a company collects is not worth much unless it can be shared, distributed, and analyzed. Without a standardized method of distribution, viewing the data becomes cumbersome: the recipient may have to troubleshoot code, download and install libraries, or work from a corrupted copy of the original data that yields the wrong results. Project Jupyter aims to provide a way of packaging data so that it is easy to view and share.

In the web development world it is possible to share snippets of code, but those solutions often rely on a third-party service just to run them. Think of JSFiddle or CodePen. They are great for sharing snippets, but it’s difficult to share an entire project. What if the author of a data project could create a package that simply works, without worrying about whether the end user will be able to view the data?

A Jupyter Notebook is like a digital Trapper Keeper filled with everything an individual or organization needs to analyze data. This is a key feature of Jupyter. Each Notebook holds not only the data, but also the code that produced it, the explanatory text, and the saved results, along with a record of the language kernel it was written for. Paired with an environment that supplies the necessary software libraries, a Notebook is self-contained; anyone who accesses it can run it and view the data as intended. If the author so chooses, they can package Jupyter Notebooks so that viewers don’t have to download anything at all: the Notebook can be hosted on a server and rendered as HTML. For the original author(s), this means that the code and data within a Notebook are preserved and ready to be viewed by anyone who has access. Users can even fork a Notebook so they can adjust its findings themselves.
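To make the idea of a Notebook as a single shareable file concrete, here is a minimal sketch using only Python's standard library. A Notebook file (.ipynb) is JSON text, and the dictionary below follows the version 4 notebook layout. The cell contents and the helper name summarize are made up for illustration; they are not taken from any real Notebook.

```python
import json

# A hand-built notebook in the version 4 file layout.
# The cell contents are invented for this example.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "# Rainfall summary",
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": "sum([12, 7, 30])",
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": "49"},
                }
            ],
        },
    ],
}


def summarize(nb):
    """Return (cell type, source, saved plain-text output or None) per cell."""
    rows = []
    for cell in nb["cells"]:
        saved = None
        for out in cell.get("outputs", []):
            saved = out.get("data", {}).get("text/plain", saved)
        rows.append((cell["cell_type"], cell["source"], saved))
    return rows


# Round-trip through JSON text, which is what an .ipynb file holds on disk.
restored = json.loads(json.dumps(notebook))
for row in summarize(restored):
    print(row)
```

The saved output travels with the code that produced it, which is why a reader can see the result without re-running anything.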

Jupyter has been developed from the ground up to be open source and completely flexible. It started as IPython, a project created by Fernando Pérez. Jupyter became a spin-off project from IPython that supported three core languages at first: Julia, Python, and R (which, combined, give the name Jupyter). Now, Jupyter Notebooks can support programming language kernels for Ruby, Haskell, C#, Perl, PHP, JavaScript, and reportedly some 50 other languages. This makes Jupyter something of a wunderkind in the software engineering world. That it can support so many languages makes it a powerful tool for disseminating data. As mentioned before, Jupyter Notebooks can be distributed together with the necessary code and software libraries, so the end user does not need to install anything on their computer to view the Notebook.

These kernels and more can be used with Jupyter Notebooks.
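As a rough illustration of how a Notebook is tied to a language, the sketch below reads the kernelspec entry that a version 4 notebook file keeps in its metadata. The function name kernel_for and the helper make_notebook are hypothetical, and the kernel names shown (python3 for Python, ir for R) are typical values that may differ on a given installation.

```python
def make_notebook(name, language, display_name):
    """Build an empty notebook dict that requests the given kernel."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "cells": [],
        "metadata": {
            "kernelspec": {
                "name": name,
                "language": language,
                "display_name": display_name,
            }
        },
    }


def kernel_for(nb):
    """Return (kernel name, language), with None for anything not recorded."""
    spec = nb.get("metadata", {}).get("kernelspec", {})
    return spec.get("name"), spec.get("language")


# Example values only; actual kernel names depend on what is installed.
python_nb = make_notebook("python3", "python", "Python 3")
r_nb = make_notebook("ir", "R", "R")

print(kernel_for(python_nb))
print(kernel_for(r_nb))
```

Because the kernel is named in the file itself, the same Notebook interface can hand the code to whichever language back end the author used.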

Once a Jupyter Notebook is created, it can be packaged and distributed in a wide variety of ways. The Notebook can be uploaded and rendered as an HTML page; it can be put on GitHub to be downloaded later; it can be put on a private server; or it can run locally for a presentation. Jupyter Notebooks can also be hosted on a cloud platform so the end user doesn’t have to install anything on their own computer, and all the calculations are performed remotely. Again, the sheer flexibility of Jupyter Notebooks is what makes them such a fascinating option for sharing data.
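Rendering a Notebook as HTML is normally handled by the Jupyter project's nbconvert tool (for example, jupyter nbconvert --to html followed by the notebook's file name). The hand-rolled function below is not nbconvert; it is a deliberately simplified sketch, using only the standard library, of what turning a Notebook's cells into a static page involves. The names render_html, as_text, and sample are hypothetical, and the markup it emits is an assumption chosen for brevity.

```python
import html


def as_text(value):
    """Notebook fields may be a string or a list of strings; return one string."""
    return "".join(value) if isinstance(value, list) else value


def render_html(nb):
    """Turn notebook cells into a bare-bones static HTML page."""
    parts = ["<!DOCTYPE html>", "<html><body>"]
    for cell in nb["cells"]:
        source = html.escape(as_text(cell["source"]))
        if cell["cell_type"] == "code":
            parts.append('<pre class="input">' + source + "</pre>")
            for out in cell.get("outputs", []):
                # Stream outputs keep text directly; results keep it under "data".
                text = out.get("text") or out.get("data", {}).get("text/plain")
                if text:
                    escaped = html.escape(as_text(text))
                    parts.append('<pre class="output">' + escaped + "</pre>")
        else:
            # Markdown is left unformatted here; a real converter would render it.
            parts.append("<p>" + source + "</p>")
    parts.append("</body></html>")
    return "\n".join(parts)


# Invented sample notebook with one text cell and one executed code cell.
sample = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["Comparison ", "check"]},
        {"cell_type": "code", "execution_count": 1, "metadata": {},
         "source": "print(1 < 2)",
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": "True\n"}]},
    ],
}

page = render_html(sample)
print(page)
```

The resulting page needs no Jupyter installation to open, which is the property that makes the HTML route convenient for readers who only want to view the results.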

One of the best-known uses of Jupyter Notebooks comes from the winners of the 2017 Nobel Prize in Physics. Scientists Rainer Weiss, Barry C. Barish, and Kip S. Thorne recorded the phenomenon known as gravitational waves. Through their research they concluded that the waves were the result of two black holes colliding. To share this data, they created a Jupyter Notebook in which users can view the data and even work through the analysis themselves.

A sample of the data from the gravitational waves Jupyter Notebook.

Jupyter Notebooks work well within education. Recently, colleges and universities have developed courses in which the classwork and homework revolve around studying, viewing, analyzing, and creating Jupyter Notebooks. In Canada, the Pacific Institute for the Mathematical Sciences (PIMS), Compute Canada, and Cybera teamed up to create a special version of Jupyter called syzygy.ca for students and professors across several Canadian universities.

There is almost no limit to how a programmer could use Jupyter Notebooks in their work. Whether it’s for data sharing, education, or research, the Jupyter Notebook is a tool that will be indispensable for software engineers in the future.

Project Jupyter is committed to being open source. All the necessary tools, extensions, and complementary programs needed to get started with Jupyter Notebooks are available to individuals under what the project itself describes as a liberal BSD license.