To read public datasets from S3 into a Saturn Jupyter notebook using Dask, you can use the following:

import dask.dataframe as dd

df = dd.read_csv('s3://nyc-tlc/trip data/yellow_tripdata_2009-*.csv', storage_options={'anon': True})


This example reads all records from 2009 in the NYC Taxi dataset. Passing storage_options={'anon': True} tells Dask to access the public bucket anonymously, without AWS credentials.

Now you can use that Dask DataFrame to do ETL and train machine learning models.
