Requirements

Go to the Dask tab in the dashboard.

Specify a name for the cluster, select a running instance to create it in, set the number of dask nprocs for each worker, and click Create.

Once your cluster has finished provisioning, you can execute this notebook to scale up the number of workers. By default, zero workers are provisioned and only the scheduler is running.

To use the Saturn Dask cluster from within Jupyter, first instantiate a cluster using SaturnCluster from dask-saturn. dask-saturn should already be in the base environment; if not, open a terminal and run pip install dask-saturn.

from dask_saturn import SaturnCluster
cluster = SaturnCluster()
cluster

You should see a cluster GUI above that you can use to change the number of workers.

cluster.status

Instead of using the GUI, you can call .scale and .adapt programmatically to set the number of workers.

cluster.scale(2)
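With .adapt, the scheduler resizes the cluster between bounds based on load rather than holding a fixed size. A minimal, runnable sketch using dask's LocalCluster as a stand-in for SaturnCluster (the minimum/maximum parameter names follow the standard dask Cluster.adapt signature, which SaturnCluster is assumed to share):

```python
from dask.distributed import LocalCluster

# LocalCluster is a runnable stand-in for SaturnCluster here;
# .adapt() takes the same bounds on any dask Cluster implementation.
cluster = LocalCluster(n_workers=0, processes=False)

# Let the scheduler scale between 0 and 2 workers depending on load.
adaptive = cluster.adapt(minimum=0, maximum=2)
print(adaptive.minimum, adaptive.maximum)  # prints "0 2"

cluster.close()
```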

The external dashboard link should show up in the GUI and is also accessible on the cluster object:

cluster.dashboard_link

Once the cluster is instantiated, you can connect a regular Dask Client and use the cluster like any other.

from distributed import Client
import dask.array as da

client = Client(cluster)

x = da.random.random((10000, 10000), chunks=(1000, 1000))

y = x + x.T
z = y[::2, 5000:].mean(axis=1)
z.compute()
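The same code also runs with no cluster attached: when no Client is connected, dask falls back to its local threaded scheduler, which is a handy way to smoke-test a computation before scaling it out. A smaller version of the example above:

```python
import dask.array as da

# Same pipeline as above, shrunk so it runs quickly on the local
# threaded scheduler (used automatically when no Client is attached).
x = da.random.random((1000, 1000), chunks=(100, 100))
y = x + x.T
z = y[::2, 500:].mean(axis=1)
result = z.compute()
print(result.shape)  # (500,)
```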

When you are done with the cluster, you can close it from within the notebook. This deletes all running pods.

cluster.close()


Note: We currently do not have network policies in place to prevent a user from accessing another user's Dask cluster. This is mitigated in our enterprise product, since it is only accessible inside your EKS cluster, and only people with Saturn accounts at your company can access your EKS clusters. This will be addressed in a future release.
