Dask: Scalable Python with Matthew Rocklin
Python is the most widely used language for data science, and Python data scientists commonly rely on several libraries, including NumPy, pandas, and scikit-learn.





