To read a public dataset from S3 into a Saturn Jupyter notebook with Dask, you can do the following:

import dask.dataframe as dd

# Read all 2009 yellow taxi CSVs directly from the public S3 bucket.
# anon=True accesses the bucket anonymously, so no AWS credentials are needed.
df = dd.read_csv('s3://nyc-tlc/trip data/yellow_tripdata_2009-*.csv', storage_options={'anon': True})

# Preview the first few rows (this reads only from the first partition)
df.head()

This example reads all records from 2009 of the NYC Taxi dataset:
https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page


Now you can use that Dask DataFrame to do ETL and train machine learning models.
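For example, here is a minimal sketch of both steps: filtering out bad rows, then fitting a linear regression of fare against trip distance with dask-ml. The column names Trip_Distance and Fare_Amt are assumptions based on the 2009 yellow-taxi schema, so verify them against df.columns for your files.

import dask.dataframe as dd
from dask_ml.linear_model import LinearRegression

df = dd.read_csv('s3://nyc-tlc/trip data/yellow_tripdata_2009-*.csv', storage_options={'anon': True})

# ETL: drop rows with missing values in the columns we need,
# then remove zero-distance trips
# (column names assume the 2009 schema; check df.columns)
clean = df.dropna(subset=['Trip_Distance', 'Fare_Amt'])
clean = clean[clean['Trip_Distance'] > 0]

# Convert to Dask arrays with known chunk sizes, as dask-ml expects
X = clean[['Trip_Distance']].to_dask_array(lengths=True)
y = clean['Fare_Amt'].to_dask_array(lengths=True)

# Train a simple model: predict fare from trip distance
model = LinearRegression()
model.fit(X, y)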


Further reading:
https://docs.dask.org/en/latest/remote-data-services.html#amazon-s3
https://examples.dask.org/dataframes/01-data-access.html
https://examples.dask.org/machine-learning


