Prefect is an open-source workflow orchestration tool made for data-intensive workloads. In that project's language, you chain together a series of "tasks" into a "flow". If you are new to prefect, read this introduction first and then come back to this tutorial.
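To make the "task" and "flow" vocabulary concrete, here is a minimal sketch in plain Python. It is a conceptual illustration only and does not use prefect's API: the extract, transform, and load functions and the run_flow helper are hypothetical names invented for this example, and the graphlib module it relies on assumes Python 3.9 or later.

```python
# Conceptual sketch only: this is NOT prefect's API.
from graphlib import TopologicalSorter


def extract():
    return [1, 2, 3]


def transform(rows):
    return [r * 10 for r in rows]


def load(rows):
    return sum(rows)


# Each key is a task name; its value is (function, names of upstream tasks).
flow = {
    "extract": (extract, []),
    "transform": (transform, ["extract"]),
    "load": (load, ["transform"]),
}


def run_flow(flow):
    """Run every task after its upstream tasks, passing their results along."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in flow.items()})
    results = {}
    for name in sorter.static_order():
        func, deps = flow[name]
        results[name] = func(*(results[d] for d in deps))
    return results


results = run_flow(flow)
print(results["load"])  # 60
```

An orchestration tool such as prefect adds scheduling, retries, logging, and distributed execution on top of this basic idea of running tasks in dependency order.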

In this tutorial, you'll learn how to:

  • create an environment with prefect installed
  • test a prefect flow in a notebook
  • use Dask to speed up a prefect flow
  • run a flow on a schedule in a Saturn Custom Deployment

This tutorial assumes that you have installed and set up Saturn Cloud. If you haven't done that yet, please see "Getting Saturn from the AWS Marketplace" and then return to this tutorial.

Set up an environment

Navigate to the Jupyter page and create a new project. A "project" contains information about the environment your code runs in.

Before clicking Create, scroll down to the Advanced Settings. The standard Saturn images do not include prefect, so we'll install it with a "Start Script", a small shell script that runs every time a Jupyter server, Dask worker, or Custom Deployment starts up. Enter the following command as the start script:

pip install prefect==0.11.5

After adding the start script, click Create.

Author and test a prefect flow

On the Jupyter page, you should now see a card with the name of the Jupyter server you just created. Click the play button on the card to start it.

Once the Jupyter server is ready, click Jupyter Lab to open a JupyterLab instance.
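Before going further, it can be worth confirming that the start script installed the pinned version. The snippet below is a small sketch you could run in a notebook cell or Python session. It assumes Python 3.8 or later (for importlib.metadata), and installed_version is a hypothetical helper name, not part of prefect or Saturn.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


EXPECTED = "0.11.5"  # the version pinned in the start script

found = installed_version("prefect")
if found is None:
    print("prefect is not installed; check the start script")
elif found != EXPECTED:
    print(f"prefect {found} is installed, expected {EXPECTED}")
else:
    print(f"prefect {found} is installed")
```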

Click File -> New -> Terminal to open a terminal, and run the following commands in it to download the example notebook for this article.

cd /home/jovyan/project

EXAMPLE_REPO_URL=https://raw.githubusercontent.com/saturncloud/examples/main/prefect

wget ${EXAMPLE_REPO_URL}/prefect-scheduled-scoring.ipynb

wget ${EXAMPLE_REPO_URL}/flow.png

Double-click prefect-scheduled-scoring.ipynb in the file browser to open a notebook that contains code with a sample prefect flow for this tutorial.

Walk through the documentation and code in that notebook. It builds a sample flow that evaluates a statistical model on a schedule, and it explains how to run that flow on a Saturn Dask cluster.

When you're done, come back to this article to learn how to deploy that flow in a Saturn Custom Deployment.

Deploy a prefect flow in Saturn

Notebooks are great for rapid prototyping and interactive work, but since the flow in this tutorial needs to be run on a schedule, it would be better to run it in an isolated environment that can be started, stopped, and scaled.

Saturn Cloud offers such a deployment option, called a Custom Deployment. A Saturn Custom Deployment runs a particular version of the code you've developed in an environment identical to the one JupyterLab runs in: the same image, environment variables, start script, and more.

For more details on Custom Deployments, see "Custom Deployments on Saturn Cloud".

Return to the JupyterLab instance you used in the previous section. To get your code ready to be deployed, we're going to convert it from a notebook to a Python script (.py). Click Kernel -> Restart Kernel And Clear All Outputs. This will clear out any interactive outputs like printed messages and warnings.

Click File -> New -> Terminal to open a terminal, and run the following commands in it.

cd /home/jovyan/project
jupyter nbconvert --to script prefect-scheduled-scoring.ipynb

This should have created a new file called prefect-scheduled-scoring.py. Double-click it in the file browser to open it.

Scroll through the file and make sure it contains only code that you want to run automatically in your deployment. Remove the line #!/usr/bin/env python. The last executed statement should be a call to flow.run():

flow.run(
    executor=executor
)
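The two checks described above (no shebang line, and a final call to flow.run()) can also be automated. The following is a rough sketch using Python's standard ast module. The check_deploy_script function is a hypothetical helper written for this example, and it only inspects the last top-level statement, so it will not notice a flow.run() call nested inside an if block or a function.

```python
import ast


def check_deploy_script(source):
    """Return a list of problems found in the converted script's source text."""
    problems = []
    if source.startswith("#!"):
        problems.append("remove the shebang line at the top")

    # Parse without executing, then inspect the last top-level statement.
    tree = ast.parse(source)
    last = tree.body[-1] if tree.body else None
    is_flow_run = (
        isinstance(last, ast.Expr)
        and isinstance(last.value, ast.Call)
        and isinstance(last.value.func, ast.Attribute)
        and last.value.func.attr == "run"
        and isinstance(last.value.func.value, ast.Name)
        and last.value.func.value.id == "flow"
    )
    if not is_flow_run:
        problems.append("the last statement should be a call to flow.run()")
    return problems


good = "flow.run(\n    executor=executor\n)\n"
bad = "#!/usr/bin/env python\nprint('done')\n"
print(check_deploy_script(good))  # []
print(check_deploy_script(bad))   # both problems are reported
```

To check the real file, pass in its contents, for example check_deploy_script(open("prefect-scheduled-scoring.py").read()).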

Commit this code and push it to the git repository used to manage project source code.

cd /home/jovyan/project
git add prefect-scheduled-scoring.py
git commit -m "Add deployment code"
git push origin HEAD

NOTE: Saturn automatically commits the state of /home/jovyan/project to a git repository once every minute. If the commands above result in a message like "nothing to commit, working tree clean", that just means the code has already been committed and you don't need to push anything else! See "Version Control" for more details on how Saturn manages project files.

Return to the Saturn UI and navigate to the Custom Deployment page. In the Create A Deployment section, fill in the following values:

  • Name: ticket-scoring
  • Project: myjupyter
  • Command: /srv/conda/envs/saturn/bin/python /home/jovyan/project/prefect-scheduled-scoring.py
  • Instance Count: 1
  • Instance Size: Medium

Click Create to create the deployment, then click the play button to start it.

The first time that this deployment runs, the dask-saturn code in the Python script will provision a new Dask cluster that is "attached" to this deployment. Stopping the deployment will stop that Dask cluster, and deleting the deployment will delete the Dask cluster.

After a few minutes, you should be able to see this newly created cluster on the Dask page. Every time a worker starts in that cluster, it will use the same image, environment variables, and start script as the Jupyter instance you set up at the beginning of this tutorial.

Once the deployment is ready and running, click >_ to see the logs. If all is working, you should see something like this at the end of the logs.

[2020-06-08 16:00:49] INFO - prefect.ticket-model-evaluation | Waiting for next scheduled run at 2020-06-08T16:01:00+00:00

[2020-06-08 16:01:00] INFO - prefect.FlowRunner | Beginning Flow run for 'ticket-model-evaluation'

[2020-06-08 16:01:00] INFO - prefect.FlowRunner | Starting flow run.

[2020-06-08 16:05:31] INFO - prefect.FlowRunner | Flow run SUCCESS: all reference tasks succeeded

[2020-06-08 16:05:31] INFO - prefect.ticket-model-evaluation | Waiting for next scheduled run at 2020-06-08T16:06:00+00:00

[2020-06-08 16:06:00] INFO - prefect.FlowRunner | Beginning Flow run for 'ticket-model-evaluation'

[2020-06-08 16:06:00] INFO - prefect.FlowRunner | Starting flow run.

[2020-06-08 16:06:12] INFO - prefect.FlowRunner | Flow run SUCCESS: all reference tasks succeeded
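If you copy the logs out of the UI, a few lines of Python can summarize how long each flow run took. This is a sketch that assumes the log lines keep the format shown above. The flow_run_durations helper and the LOG_LINE pattern are hypothetical names written for this example, not part of prefect or Saturn.

```python
import re
from datetime import datetime

# Matches FlowRunner lines such as:
# [2020-06-08 16:01:00] INFO - prefect.FlowRunner | Starting flow run.
LOG_LINE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] INFO - "
    r"prefect\.FlowRunner \| (?P<msg>.*)"
)


def flow_run_durations(log_text):
    """Return the duration in seconds of each successful flow run in log_text."""
    durations = []
    started = None
    for match in LOG_LINE.finditer(log_text):
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
        msg = match["msg"]
        if msg.startswith("Beginning Flow run"):
            started = ts
        elif msg.startswith("Flow run SUCCESS") and started is not None:
            durations.append((ts - started).total_seconds())
            started = None
    return durations


sample = """\
[2020-06-08 16:01:00] INFO - prefect.FlowRunner | Beginning Flow run for 'ticket-model-evaluation'
[2020-06-08 16:01:00] INFO - prefect.FlowRunner | Starting flow run.
[2020-06-08 16:05:31] INFO - prefect.FlowRunner | Flow run SUCCESS: all reference tasks succeeded
[2020-06-08 16:06:00] INFO - prefect.FlowRunner | Beginning Flow run for 'ticket-model-evaluation'
[2020-06-08 16:06:12] INFO - prefect.FlowRunner | Flow run SUCCESS: all reference tasks succeeded
"""
print(flow_run_durations(sample))  # [271.0, 12.0]
```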

Next Steps

In this tutorial, you learned how to deploy a scheduled prefect flow as a Saturn Custom Deployment, how to use a Dask cluster to run that flow, and how to provision and manage that cluster with Saturn.

The deployment in this tutorial was designed for teaching purposes, but in your own deployments you might consider some improvements like:

  • Writing the trial summaries to a database. See "Adding IAM Credentials to Saturn" to learn about adding secrets to a Saturn project.
  • Installing prefect one time in a custom image, so it isn't re-installed every time a resource is created.