To read a public dataset from S3 into a Saturn Jupyter notebook with Dask, pass anonymous-access storage options to read_csv:

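import dask.dataframe as dd

# 'anon': True requests anonymous access, so no AWS credentials are needed
# for a public bucket. The * wildcard matches every 2009 file.
df = dd.read_csv('s3://nyc-tlc/trip data/yellow_tripdata_2009-*.csv', storage_options={'anon': True})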


This example reads all 2009 records from the NYC Taxi dataset.
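
If you only need part of the year, read_csv also accepts a list of paths instead of a glob string. The following is a minimal sketch, assuming the files follow the yellow_tripdata_YYYY-MM.csv naming implied by the path above; the helper name monthly_paths is hypothetical and not part of Dask or Saturn.

```python
def monthly_paths(year, months=range(1, 13)):
    """Build explicit S3 URLs for the given months of one year.

    Assumes one file per month named yellow_tripdata_YYYY-MM.csv;
    this helper is illustrative only.
    """
    prefix = "s3://nyc-tlc/trip data/yellow_tripdata_"
    return [f"{prefix}{year}-{month:02d}.csv" for month in months]


# First quarter of 2009 only
paths = monthly_paths(2009, months=range(1, 4))
print(paths)

# The list can then be passed in place of the glob string, e.g.
# dd.read_csv(paths, storage_options={'anon': True})
```

Listing paths explicitly makes it clear which files are read, at the cost of relying on the assumed naming scheme.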

You can now use that Dask DataFrame for ETL and for training machine learning models.

