- A running JupyterLab instance (link on how to create one)
Go to the Dask tab in the dashboard.
Specify the name of the cluster, select a running instance to create it in, and choose the number of dask nprocs for each worker. Then click Create.
Once your cluster has finished provisioning, you can run this notebook to scale up the number of workers. The default is zero workers, so initially only the scheduler is provisioned.
To use the Saturn Dask cluster from within Jupyter, first instantiate a cluster using
dask-saturn, which should be in your base environment. If it is not, open a terminal and run
pip install dask-saturn.
from dask_saturn import SaturnCluster
cluster = SaturnCluster()
You should have a cluster GUI above that you can use to manipulate the number of workers.
Instead of using the GUI, you can call
.adapt programmatically to set the number of workers.
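As a rough sketch of what an .adapt call looks like, the snippet below uses distributed's LocalCluster as a stand-in so it runs without a Saturn account. The name local_cluster and the bounds of 1 and 3 workers are illustrative, and it is an assumption that SaturnCluster accepts the same minimum and maximum keywords.

```python
from distributed import LocalCluster

# Stand-in cluster for illustration only; in the notebook you would call
# .adapt on the SaturnCluster object instead (assumed to take the same keywords).
local_cluster = LocalCluster(n_workers=0, processes=False, dashboard_address=None)

# Let Dask add or remove workers on demand, keeping between 1 and 3 of them.
adaptive = local_cluster.adapt(minimum=1, maximum=3)
bounds = (adaptive.minimum, adaptive.maximum)

# Shut the stand-in cluster down again.
local_cluster.close()
```

With adaptive scaling the worker count follows the workload, so the cluster can shrink back toward the minimum when idle.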
The external dashboard link should show up in the GUI and is also accessible on the cluster object (Dask cluster objects generally expose it as cluster.dashboard_link).
Once the cluster is instantiated, you can connect a regular Dask client to it and use Dask as usual.
from distributed import Client
client = Client(cluster)
import dask.array as da
x = da.random.random((10000, 10000), chunks=(1000, 1000))
y = x + x.T
z = y[::2, 5000:].mean(axis=1)
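The expressions above only build a lazy task graph; no work happens until a result is requested. The following is a minimal sketch of that last step, using a smaller array and the local threaded scheduler so it runs without a cluster. The names x_small, y_small, z_small, and result are illustrative and not part of the original notebook.

```python
import dask.array as da

# Smaller stand-in for the 10000 x 10000 array so this runs quickly anywhere.
x_small = da.random.random((1000, 1000), chunks=(100, 100))
y_small = x_small + x_small.T
z_small = y_small[::2, 500:].mean(axis=1)

# z_small is still lazy at this point; compute() triggers the actual work.
# scheduler="threads" runs it locally; with a connected Client you would
# typically call z.compute() and the work would run on the cluster workers.
result = z_small.compute(scheduler="threads")
```

Here result is a NumPy array with one mean per selected row (every second row of y_small).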
When you are done with the cluster, you can close it from within the notebook (for example with cluster.close()). This will delete all running pods.
Note: We currently do not have network policies in place to prevent a user from accessing another user's dask cluster. This is mitigated in our enterprise product since it is only accessible inside your EKS cluster, and only people with Saturn accounts at your company can access your EKS clusters. This will be addressed in a future release.