Getting started with Open Data Hub

Jamie Hackett
4 min read · Apr 27, 2021



As a data scientist, I would have killed for a platform like Open Data Hub. Open Data Hub is an open source project that provides open source AI tools for running large and distributed AI workloads on the OpenShift Container Platform.

In this post I want to show you how easy it is to deploy Open Data Hub (ODH) on OpenShift using an operator. Then I’ll show you how quickly you can start using a JupyterLab notebook to do some data analysis.

This post was originally made as a video, so if you find that easier to follow, please head here.

Deploying the ODH Operator

Head over to your instance of OpenShift and create a new namespace for your ODH project.
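If you prefer the terminal, the same can be done with the oc CLI. I’ll use odh as the namespace name throughout; substitute your own:

# Create a new project/namespace for Open Data Hub
oc new-project odh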

Once this is done, we can enable the ODH operator, which will do all the heavy lifting for us, deploying all of the necessary components for our data platform. The operator for ODH can be found under Operators -> OperatorHub. Search for “Open Data Hub” and click Install.

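If you’d rather script the operator installation than click through the console, you can create an OLM Subscription instead. This is a minimal sketch: the opendatahub-operator package name, the beta channel, and the community-operators catalog source are assumptions, so check them against what OperatorHub shows for your cluster:

# Subscribe to the Open Data Hub operator via OLM (verify name/channel/source first)
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: opendatahub-operator
  namespace: openshift-operators
spec:
  channel: beta                # assumption: use the channel listed in OperatorHub
  name: opendatahub-operator   # assumption: the package name in the catalog
  source: community-operators
  sourceNamespace: openshift-marketplace
EOF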

Once you’ve installed the operator on your cluster, you can click “Installed Operators” located under the Operators menu.

Click on the recently installed Open Data Hub operator and then click the “Create instance” hyperlink under “Provided APIs”.


Once you’ve clicked the “Create Instance” hyperlink, you will be presented with the option to create the KfDef via either a form view or a YAML view. Select the YAML view.

Here we can define which AI tools we would like to deploy. In this instance we are going to do a light deployment that has JupyterLab, although you can customise this to meet your needs.

Replace the original YAML with the following:

NOTE: Change the name under the # The name of your deployment comment so that it matches the name of the namespace you created earlier.

# ODH uses the KfDef manifest format to specify what components will be included in the deployment
apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  # The name of your deployment
  name: opendatahub
# Only the components listed in the `KfDef` resource will be deployed:
spec:
  applications:
    # REQUIRED: This contains all of the common options used by all ODH components
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: odh-common
      name: odh-common
    # Deploy Radanalytics Spark Operator
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: radanalyticsio/spark/cluster
      name: radanalyticsio-spark-cluster
    # Deploy Open Data Hub JupyterHub
    - kustomizeConfig:
        parameters:
          - name: s3_endpoint_url
            value: s3.odh.com
        repoRef:
          name: manifests
          path: jupyterhub/jupyterhub
      name: jupyterhub
    # Deploy additional Open Data Hub Jupyter notebooks
    - kustomizeConfig:
        overlays:
          - additional
        repoRef:
          name: manifests
          path: jupyterhub/notebook-images
      name: notebook-images
  # Reference to all of the git repo archives that contain component kustomize manifests
  repos:
    # Official Open Data Hub v0.9.0 component manifests repo
    # This shows that we will be deploying components from an archive of the odh-manifests repo tagged for v0.9.0
    - name: manifests
      uri: 'https://github.com/opendatahub-io/odh-manifests/tarball/v0.9.0'
  version: v0.9-branch-openshift

Once you click Create, the operator will go off and do all of the necessary work to create the Open Data Hub platform for you.
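Alternatively, if you saved the manifest above to a file (say kfdef.yaml), you can create the KfDef from the terminal rather than the web console:

# Create the KfDef resource in your ODH namespace
oc apply -f kfdef.yaml -n odh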

Head back to Projects and click on the namespace you created. You can then check the recent events to ensure that ODH has begun to create all of the necessary pods for your deployment.

On the right-hand side you can see the recent events for my ODH namespace

Give it 10 to 20 minutes for the operator to fully deploy ODH.
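You can also keep an eye on progress from the terminal, again assuming the odh namespace from earlier:

# Watch the ODH pods start up (Ctrl+C to stop watching)
oc get pods -n odh -w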

The next step is to set up your JupyterLab environment.

Ensure that you have selected the ODH project you’ve just created, then in the OpenShift side menu select Networking -> Routes.

The Routes page lists the exposed routes we need in order to access JupyterLab. The route we want should be called jupyterhub.

You can see my exposed route to my instance of JupyterLab
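The route’s hostname can also be fetched from the terminal:

# Print the hostname of the jupyterhub route
oc get route jupyterhub -n odh -o jsonpath='{.spec.host}'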

Once you click that route, you will be brought to the JupyterLab server spawner. You will need to sign in with OpenShift.

Leave the default spawner options as they are and click “Spawn”.

After a few minutes you should have your JupyterLab environment, ready for you to begin some data analysis!

JupyterLab ready to rock!

Hopefully that showcases how quick and easy it is to get started with Open Data Hub on OpenShift.


Jamie Hackett

Cloud Consultant at Red Hat | Passionate about getting AI/ML into production.