How to operate a complete MLOps pipeline with Dataiku Data Science Studio running on Kubernetes

Photo by SELİM ARDA ERYILMAZ on Unsplash

While familiarizing myself with a relatively new field, Machine Learning Operations (MLOps), I’ve gained some valuable experience, which I’d like to share with you in a series of articles.

This is the first part of the series; each part deals with a particular segment of the complete solution. So you don’t need to read the whole story if you’re only interested in specific details. …

How to manage an entire MLOps pipeline on Kubernetes running on a Mac

The purpose of this article is to provide you with design and implementation ideas in the field of Machine Learning Operations (MLOps), describing a specific use case: How to implement an entire MLOps pipeline on a MacBook Pro. I’ve done this with Dataiku Data Science Studio (DSS) running on Kubernetes (K8s) with Docker Desktop. Even if your use case is different, you might benefit from my experience.

Photo by torben on Unsplash

Beyond the main narrative, which is the complete solution description, I also detail some general design concepts that might come in handy for solving architectural challenges. These are as follows:

  • Kustomize: How…

Tutorial on how to employ Kubernetes secrets for an MLOps pipeline creatively

This article describes how to create Kubernetes (K8s) secrets as part of an installation guide to the Machine Learning Operations (MLOps) pipeline detailed in my post “MLOps on Kubernetes with Docker Desktop”.

Photo by Kristina Flour on Unsplash

The installation in the above-mentioned article uses the following secrets on the K8s pod:

  1. github-repo-cred to access the GitHub repo; used as an environment variable or file mount
  2. gitlab-registry-cred to push Docker images to GitLab Container Registry (CR); used as an environment variable or file mount
  3. gitlab-pull-cred to allow K8s to pull images from GitLab CR; used with imagePullSecrets in the pod definition or added to the service…
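As a rough illustration of the third item, the following minimal sketch builds a manifest for an image pull secret of type `kubernetes.io/dockerconfigjson` using only the Python standard library. It is not taken from the article: the function name `docker_registry_secret`, the registry host, and the user name and token are placeholder assumptions, and the article itself may well create the secret differently (for example with kubectl).

```python
import base64
import json


def docker_registry_secret(name, registry, username, token):
    """Build a Secret manifest (as a dict) for pulling images from a private registry.

    Hypothetical helper for illustration; all argument values are supplied by the caller.
    """
    # The "auth" field is the base64 encoding of "username:password".
    auth = base64.b64encode(f"{username}:{token}".encode()).decode()
    docker_config = {
        "auths": {
            registry: {"username": username, "password": token, "auth": auth}
        }
    }
    # Secret data values must be base64 encoded.
    encoded = base64.b64encode(json.dumps(docker_config).encode()).decode()
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": encoded},
    }


# Placeholder credentials (assumptions); never commit real tokens.
manifest = docker_registry_secret(
    "gitlab-pull-cred", "registry.gitlab.com", "example-user", "example-token"
)
print(json.dumps(manifest, indent=2))
```

Because JSON is also valid YAML, the printed manifest could be saved to a file and applied like any other K8s object definition.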

How to manage an MLOps node cluster on Kubernetes, template-free

Kustomize is a tool for customizing YAML files such as Kubernetes (K8s) manifests without templates. It has since become a built-in kubectl operation for applying K8s object definitions from YAML files stored in a hierarchical directory structure.

Photo by Eric Prouzet on Unsplash

There are some great examples in the documentation of Kustomize. Nevertheless, sharing my experience from a journey in the field of Machine Learning Operations (MLOps) might allow you to gain some practical knowledge from a real use case.

A K8s object definition is meant to be a highly reproducible manifest. Therefore, templates or environment variables that would make content definitions alterable are not supported. This…
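To give a feel for the template-free idea, here is a small conceptual sketch in Python: a base definition stays untouched, and an environment-specific patch is merged on top of it. This is only a simplified recursive dictionary merge, not Kustomize's actual patching logic (which, among other things, treats lists specially); the function name `merge_patch` and the example objects are assumptions made for illustration.

```python
import copy


def merge_patch(base, patch):
    """Return a copy of `base` with `patch` merged in; nested dicts are merged key by key."""
    result = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are mappings: descend and merge.
            result[key] = merge_patch(result[key], value)
        else:
            # Scalars and lists from the patch simply replace the base value.
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical base manifest shared by all environments.
base = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "dss", "labels": {"app": "dss"}},
    "spec": {"replicas": 1},
}

# Hypothetical overlay for a development environment.
overlay = {"metadata": {"labels": {"env": "dev"}}, "spec": {"replicas": 2}}

patched = merge_patch(base, overlay)
print(patched["metadata"]["labels"], patched["spec"]["replicas"])
```

The base dictionary is never modified, which mirrors the idea that the shared definition stays reproducible while each environment only describes its differences.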

A practical guide on how to use a filesystem image as Kubernetes volume whose content is stored persistently on the host machine

Photo by the author

Although not meant to be a production-ready environment, Docker Desktop provides quite a good playground for Kubernetes (K8s). Even on a playground, it pays to keep K8s configurations as close as possible to the final production version to make things efficient.

Therefore, I wanted to set up a K8s pod with a persistent volume, including an activated Access Control List (ACL), on my MacBook Pro. I needed ACL as a prerequisite for Dataiku Data Science Studio to support multi-user security or user isolation, a great feature to comply with the latest data protection requirements in the field of data science…

The advantage of the Access Control List (ACL) over standard Linux file permissions to comply with data protection regulations

Photo by AbsolutVision on Unsplash

Compliance with data protection regulations is becoming increasingly important nowadays. In my daily work as a Data Scientist, I’m experiencing the challenge of how paperwork actually needs to be implemented down to the level of file access control to fulfill regulations such as GDPR.

ACL in Linux is a fine-grained permission control mechanism to access file system entities. It extends the concept of standard file permissions and allows more detailed management of who can do what on different object levels.
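The difference from standard permissions can be sketched with a small model of how a POSIX-style ACL is evaluated. The code below parses the short text form of an ACL and checks a request in the usual order: owner, named user, matching groups, other, with the mask limiting named users and groups. It is a simplified illustration under assumptions (a well-formed ACL, hypothetical user and group names), not a replacement for the kernel's check or for tools such as getfacl.

```python
PERM_BITS = {"r": 4, "w": 2, "x": 1}


def parse_acl(short_text):
    """Parse 'user::rw-,user:alice:r--,...' into (tag, qualifier, bits) tuples."""
    entries = []
    for item in short_text.split(","):
        tag, qualifier, perms = item.strip().split(":")
        bits = sum(bit for ch, bit in PERM_BITS.items() if ch in perms)
        entries.append((tag, qualifier, bits))
    return entries


def is_allowed(entries, wanted, user, groups, owner, owning_group):
    """Return True if `user` may perform the access described by the `wanted` bits."""

    def find(tag, qualifier):
        return next((b for t, q, b in entries if t == tag and q == qualifier), None)

    def grants(bits):
        return (bits & wanted) == wanted

    mask = find("mask", "")
    if mask is None:
        mask = 7  # no mask entry: nothing is filtered

    if user == owner:
        return grants(find("user", "") or 0)  # the owner entry is not masked
    named_user = find("user", user)
    if named_user is not None:
        return grants(named_user & mask)

    group_bits = []
    if owning_group in groups:
        group_bits.append(find("group", "") or 0)
    for group in groups:
        bits = find("group", group)
        if bits is not None:
            group_bits.append(bits)
    if group_bits:
        # A single matching group entry must grant the whole request.
        return any(grants(bits & mask) for bits in group_bits)

    return grants(find("other", "") or 0)


# Hypothetical ACL: the mask reduces alice and the analysts group to read-only.
acl = parse_acl(
    "user::rw-,user:alice:rw-,group::r--,group:analysts:rw-,mask::r--,other::---"
)
READ, WRITE = 4, 2
print(is_allowed(acl, READ, "alice", [], "bob", "staff"))
print(is_allowed(acl, WRITE, "alice", [], "bob", "staff"))
```

With standard permissions alone, only the owner, one group, and everyone else can be distinguished; the named `user:alice` and `group:analysts` entries are what the ACL adds on top.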

Some great data science tools like Dataiku Data Science Studio, with support for multi-user security (a.k.a. user isolation or user…

Tibor Fabian

Senior Data Scientist @ Telefonica; Visit my LinkedIn profile:
