Getting started with Open Data Hub
As a data scientist I would have killed for a platform like Open Data Hub. Open Data Hub is an open source project that provides open source AI tools for running large and distributed AI workloads on OpenShift Container Platform.
In this post I want to show you how easy it is to deploy Open Data Hub (ODH) on OpenShift using an operator. Then I’ll show you how quickly you can start using a JupyterLab notebook to do some data analysis.
This post was originally made as a video, so if you find that easier to follow, please head here.
Deploying the ODH Operator
Head over to your instance of OpenShift and create a new namespace for your ODH project.
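If you prefer a terminal to the web console, the namespace can also be created with the oc command line tool. The snippet below is a minimal sketch, assuming oc is installed and already logged in to your cluster; the namespace name "odh" is a made-up example, not something the operator requires.

```python
import subprocess
from typing import List


def new_project_command(namespace: str) -> List[str]:
    """Build the `oc` command that creates a new project (namespace)."""
    return ["oc", "new-project", namespace]


def create_project(namespace: str) -> None:
    """Run the command; raises CalledProcessError if `oc` reports a failure."""
    subprocess.run(new_project_command(namespace), check=True)


# Example (needs a logged-in `oc`; "odh" is a hypothetical namespace name):
# create_project("odh")
```

Keeping the command builder separate from the function that runs it makes it easy to print the command and check it before anything touches the cluster.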
Once this is done, we can enable the ODH operator, which will do all the heavy lifting for us by deploying the components we need for our data platform. The operator for ODH can be found under Operators -> OperatorHub. Search for “Open Data Hub” and click Install.
Once you’ve installed the operator on your cluster, you can click “Installed Operators” located under the Operators menu.
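The same check can be made from a script by listing the ClusterServiceVersion (CSV) objects that the operator framework creates for each installed operator. This is only a sketch: it assumes a logged-in oc CLI, and the CSV name used in the example ("opendatahub-operator.v0.9.0") is an illustrative guess, so match on whatever name your cluster actually reports.

```python
import json
import subprocess
from typing import Optional


def operator_phase(csv_list: dict, name_fragment: str) -> Optional[str]:
    """Return the install phase of the first CSV whose name contains name_fragment."""
    for csv in csv_list.get("items", []):
        if name_fragment in csv["metadata"]["name"]:
            return csv.get("status", {}).get("phase")
    return None  # no matching operator found


def fetch_csvs(namespace: str) -> dict:
    """Fetch the CSVs visible in a namespace as parsed JSON."""
    result = subprocess.run(
        ["oc", "get", "csv", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Example (needs a logged-in `oc`; "odh" is a hypothetical namespace name):
# print(operator_phase(fetch_csvs("odh"), "opendatahub"))
```

A phase of "Succeeded" indicates that the operator finished installing; anything else, or None, suggests it is worth waiting or looking at the operator's events.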
Click on the recently installed Open Data Hub operator and then click the “Create Instance” hyperlink under “Provided APIs”.
Once you’ve clicked the “Create Instance” hyperlink, you will be given the option to create a KfDef via either a form view or a YAML view. Select the YAML view.
Here we can define which AI tools we would like to deploy. In this instance we are going to do a light deployment that has JupyterLab, although you can customise this to meet your needs.
Replace the original YAML with the following:
NOTE: Replace the name value under the “# The name of your deployment” comment (opendatahub in this example) so that it matches the name of the namespace you created earlier.
# ODH uses the KfDef manifest format to specify what components will be included in the deployment
apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  # The name of your deployment
  name: opendatahub
# Only the components listed in the `KfDef` resource will be deployed:
spec:
  applications:
    # REQUIRED: This contains all of the common options used by all ODH components
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: odh-common
      name: odh-common
    # Deploy Radanalytics Spark Operator
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: radanalyticsio/spark/cluster
      name: radanalyticsio-spark-cluster
    # Deploy Open Data Hub JupyterHub
    - kustomizeConfig:
        parameters:
          - name: s3_endpoint_url
            value: s3.odh.com
        repoRef:
          name: manifests
          path: jupyterhub/jupyterhub
      name: jupyterhub
    # Deploy additional Open Data Hub Jupyter notebooks
    - kustomizeConfig:
        overlays:
          - additional
        repoRef:
          name: manifests
          path: jupyterhub/notebook-images
      name: notebook-images
  # Reference to all of the git repo archives that contain component kustomize manifests
  repos:
    # Official Open Data Hub v0.9.0 component manifests repo
    # This shows that we will be deploying components from an archive of the odh-manifests repo tagged for v0.9.0
    - name: manifests
      uri: 'https://github.com/opendatahub-io/odh-manifests/tarball/v0.9.0'
  version: v0.9-branch-openshift
Once you click create, the operator will go off and do all of the necessary work to create the Open Data Hub platform for you.
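If you would rather script this step than paste into the console, the manifest above can be saved to a file, renamed to match your namespace, and handed to oc apply. What follows is a rough sketch under a few assumptions: oc is installed and logged in, the manifest text keeps the metadata block shown above, and the helper names (set_deployment_name, apply_manifest) and the namespace "odh" are invented for this example.

```python
import re
import subprocess
from typing import List


def set_deployment_name(manifest: str, name: str) -> str:
    """Replace the value of metadata.name in a KfDef manifest given as text."""
    # Find the first `name:` line after `metadata:`, skipping comment lines.
    pattern = re.compile(
        r"(^metadata:[ \t]*\n(?:[ \t]*#.*\n)*[ \t]*name:[ \t]*)\S+",
        re.MULTILINE,
    )
    updated, count = pattern.subn(lambda m: m.group(1) + name, manifest, count=1)
    if count != 1:
        raise ValueError("could not find metadata.name in the manifest")
    return updated


def apply_command(namespace: str) -> List[str]:
    """Build the `oc apply` command that reads a manifest from standard input."""
    return ["oc", "apply", "-n", namespace, "-f", "-"]


def apply_manifest(manifest: str, namespace: str) -> None:
    """Send the manifest to the cluster; raises if `oc` reports a failure."""
    subprocess.run(apply_command(namespace), input=manifest, text=True, check=True)


# Example (needs a logged-in `oc`; file name and namespace are hypothetical):
# text = open("kfdef.yaml").read()
# apply_manifest(set_deployment_name(text, "odh"), "odh")
```

The rename is done with a small regular expression only to keep the sketch free of third-party packages; a YAML library would be the more robust choice for anything beyond this one field.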
You can head back to Projects and click on the namespace you created. From there, check the recent events to make sure that ODH has begun to create the pods needed for your deployment.
Give it 10 to 20 minutes for the operator to fully deploy ODH.
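Rather than watching the clock, you can ask the cluster which pods are still starting. The sketch below assumes a logged-in oc CLI and treats a pod as done when it has either completed or reports every container as ready; that readiness rule is my own simplification, and the namespace "odh" is again a hypothetical example.

```python
import json
import subprocess
from typing import List


def pods_not_ready(pod_list: dict) -> List[str]:
    """Return the names of pods that are neither completed nor fully ready."""
    pending = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") == "Succeeded":
            continue  # pods that ran to completion are not expected to be ready
        containers = status.get("containerStatuses", [])
        if not containers or not all(c.get("ready") for c in containers):
            pending.append(pod["metadata"]["name"])
    return pending


def fetch_pods(namespace: str) -> dict:
    """Fetch all pods in a namespace as parsed JSON."""
    result = subprocess.run(
        ["oc", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Example (needs a logged-in `oc`; "odh" is a hypothetical namespace name):
# print(pods_not_ready(fetch_pods("odh")))
```

An empty list suggests the deployment has settled; running the check every minute or so is usually enough.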
The next step is to set up your JupyterLab environment.
Ensure that you have selected the ODH project you’ve just created, then in the left-hand menu of OpenShift select Networking -> Routes.
The Routes page lists the exposed routes, including the one we need to access JupyterLab. That route should be called jupyterhub.
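The address behind that route can also be read from the command line. Here is a small sketch, assuming a logged-in oc CLI and a route named jupyterhub as described above; the namespace "odh" and the host name in the comments are placeholders.

```python
import json
import subprocess


def route_url(route: dict) -> str:
    """Build the URL for a Route object; routes with a tls section use https."""
    spec = route["spec"]
    scheme = "https" if spec.get("tls") else "http"
    return f"{scheme}://{spec['host']}"


def fetch_route(name: str, namespace: str) -> dict:
    """Fetch a single Route as parsed JSON."""
    result = subprocess.run(
        ["oc", "get", "route", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Example (needs a logged-in `oc`; "odh" is a hypothetical namespace name):
# print(route_url(fetch_route("jupyterhub", "odh")))
```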
Once you click that route, you will be brought to the JupyterHub server spawner. You will need to sign in with your OpenShift credentials.
Leave the default spawner options as they are and click “Spawn”.
After a few minutes you should have your JupyterLab environment, ready for you to begin some data analysis!
Hopefully that shows how quick and easy it is to get started with Open Data Hub on OpenShift.