Snowflake

Snowflake is cloud native enterprise data warehouse.  They publish native python drivers, the snowflake-connector-python , or if you prefer to use SQLAlchemy, snowflake-sqlalchemy  .  This article goes through using both packages, as well as building them into your own image.

Snowflake Connector with Pandas

After having spinning up a JupyterLab instance and a Dask cluster connecting to your Snowflake data warehouse using snowflake-connector-python is simple.

Then you can create a Dask Dataframe from the Pandas Dataframe created reading data from Snowflake.

import snowflake.connector
import pandas as pd

ctx = snowflake.connector.connect(
    user='YOUR_USER',
    password='YOUR_PASSWORD',
    account='YOUR_ACCOUNT'
)
query = "SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER"
pd.read_sql(query, ctx)

Snowflake SQLAlchemy

Alternatively, if you want to use snowflake-sqlalchemy can use Snowflake as you would any other database using SQLAlchemy.

And then create Pandas and Dask DataFrames reading from Snowflake:

from sqlalchemy import create_engine
import pandas as pd
import dask.dataframe as dd

engine = create_engine(
    'snowflake://{user}:{password}@{account}/'.format(
          user='YOUR_USER',
          password='YOUR_PASSWORD',
          account='YOUR_ACCOUNT'
    )
)
query = "SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER"
pd_df = pd.read_sql_query(query, engine)

Snowflake Connector with Dask


from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL
from sqlalchemy import sql
from dask.dataframe.io.sql import read_sql_table

engine = create_engine(URL(
    user='YOUR_USER',
    password='YOUR_PASSWORD',
    account='YOUR_ACCOUNT',
    schema ='TPCH_SF1',
    database = 'SNOWFLAKE_SAMPLE_DATA'
))

engine_url = engine.url


Then you can query the database using read_sql_table which reads it into a Dask DataFrame. 

ddf = read_sql_table("CUSTOMER", engine_url, schema=None, index_col='c_custkey')
ddf.head()
df = ddf.compute()

query = sql.select([sql.column("c_custkey"), sql.column("c_mktsegment")]).select_from(
    sql.table("CUSTOMER")
)
ddf = read_sql_table(query, engine_url, npartitions=2, index_col='c_custkey')
df = ddf.compute()
df.head()

Images in Saturn

Saturn images will come with the snowflake connector installed in March.  If you are happy with our image, then you're good to go!

Most companies will be building their own images.  In Saturn, you can easily include one or both drivers by including the following packages from the conda-forge  channel, snowflake-connector-python or snowflake-sqlalchemy 

Here are a few environment.yml  files to get you started.

channels:
  - conda-forge
  - defaults
dependencies:
  - dask
  - snowflake-connector-python
channels:
  - conda-forge
  - defaults
dependencies:
  - snowflake-sqlalchemy


Did this answer your question?