• 65k nodes on GKE, with Maciej Rozacki and Wojciech Tyczyński
    Nov 13 2024

    Guests are Maciej Rozacki, Product Manager on GKE for AI Training, and Wojciech Tyczyński, Software Engineer on the GKE team at Google. We explore what it means for GKE to support 65k nodes, and the open source contributions that made this possible.

    Do you have something cool to share? Some questions? Let us know:

    - web: kubernetespodcast.com

    - mail: kubernetespodcast@google.com

    - twitter: @kubernetespod

    News of the week

    The Kubernetes Podcast is on Bluesky

    OpenTelemetry expanding into CI/CD observability

    Gitpod is moving away from Kubernetes

    OpenCost is a CNCF Incubated project

    Links from the interview

    Guests:

    • Maciek

    • Wojciech

    Kubernetes OSS Scalability thresholds

    PGS on the Kubernetes Podcast

    Batch Working Group

    Serving Working Group episode on the podcast

    Dynamic Resource Allocation

    Kueue

    Multitenancy and Fairness at Scale with Kueue

    SIG Scalability

    Links from the post-interview chat

    Consistent Reads from Cache

    Kubernetes Scalability: A Multi-Dimensional Analysis

    49 mins
  • Working Group Serving, with Yuan Tang and Eduardo Arango
    Oct 31 2024

    Yuan is a principal software engineer at Red Hat, working on OpenShift AI. Previously, he has led AI infrastructure and platform teams at various companies. He holds leadership positions in open source projects, including Argo, Kubeflow, and Kubernetes WG Serving. Yuan authored three technical books and is a regular conference speaker, technical advisor, and leader at various organizations.

    Eduardo is an environmental engineer derailed into software engineering. He has been working on making containerized environments the de facto solution for High Performance Computing (HPC) for over 8 years. He began as a core contributor to the niche Singularity Containers project, today known as Apptainer under the Linux Foundation. In 2019, Eduardo moved up the ladder to work on making Kubernetes better for performance-oriented applications. Nowadays he works at NVIDIA on the Core Cloud Native team, enabling specialized accelerators in Kubernetes workloads.

    Do you have something cool to share? Some questions? Let us know:

    - web: kubernetespodcast.com

    - mail: kubernetespodcast@google.com

    - twitter: @kubernetespod

    News of the week

    Docker official Terraform provider

    Tetrate and Bloomberg Envoy AI Gateway

    KubeCon+CloudNativeCon North America 2024 laptop drive

    Remaining KCDs for 2024

    Links from the interview

    Yuan Tang

    Eduardo Arango
    WG Serving

    KServe

    KServe Serving models with OCI images

    LLM Gateway

    Dynamic Resource Allocation

    39 mins
  • Container Security, with Michele Chubirka
    Oct 15 2024

    This episode is special. We collaborated with the folks behind the Cloud Security Podcast from Google, Anton Chuvakin (LinkedIn) and Tim Peacock, to bring you a joint episode. We had the pleasure of jointly interviewing Michele Chubirka, a Cloud Security Developer Advocate. We talked about VM and container security and debunked some myths about isolation, attack surfaces, immutability of containers, and more.

    Do you have something cool to share? Some questions? Let us know:

    - web: kubernetespodcast.com

    - mail: kubernetespodcast@google.com

    - twitter: @kubernetespod

    News of the week
    • Nvidia NIM on GKE

    • Kubernetes Steering Committee Election Results for 2024

    • The schedule for KubeCon and CloudNativeCon India

    • Diagrid Catalyst Beta

    • Dapr on the Kubernetes Podcast with Salaboy

    Links from the interview
    • Cloud Security Podcast

    • Anton Chuvakin

    • Tim Peacock

    • Michele Chubirka

    • DORA report

    • Container Security: It’s All About the Supply Chain - Michele Chubirka

    • Software composition analysis (SCA)

    • DevSecOps Decisioning Principles

    • Kubernetes CIS Benchmark

    • Cloud-Native Consumption Principles

    • State of WebAssembly outside the Browser - Abdel Sghiouar

    • Why Perfect Compliance Is the Enemy of Good Kubernetes Security - Michele Chubirka - KubeCon NA 2024

    Links from the post-interview chat
    • Cloud Code

    • Skaffold

    • Introduction to Distributed ML Workloads with Ray on Kubernetes - Mofi Rahman & Abdel Sghiouar - KubeCon NA 2024

    56 mins
  • KCP, with Marvin Beckers
    Oct 1 2024

    Marvin Beckers is a Team Lead at Kubermatic and a contributor and maintainer of the CNCF Sandbox Project, KCP. KCP is an open source horizontally scalable control plane for Kubernetes-like APIs.

    Do you have something cool to share? Some questions? Let us know:

    - web: kubernetespodcast.com

    - mail: kubernetespodcast@google.com

    - twitter: @kubernetespod



    News of the week
    • [Docker Blog] Announcing Upgraded Docker Plans: Simpler, More Value, Better Development and Productivity

    • [LinuxFoundation Blog] Linux Foundation Announces Intent to Form Developer Relations Foundation

    • [Computer Weekly Article] NetApp Insight 2024 - Live show report: day zero

    Links from the interview
    • KCP

    • Kubernetes Resource Model (KRM)

    • Crossplane

    Links from the post-interview chat
    • Cloud Native Maturity Model

    32 mins
  • Spotify AI Platform, with Avin Regmi and David Xia
    Sep 24 2024

    Guests are Avin Regmi and David Xia from Spotify. We spoke to Avin and David about their work building Spotify’s Machine Learning Platform, Hendrix. They also specifically talk about how they use Ray to enable inference and batch workloads. Ray was featured on episode 235 of our show, so make sure you check out that episode too.

    Do you have something cool to share? Some questions? Let us know:

    - web: kubernetespodcast.com

    - mail: kubernetespodcast@google.com

    - twitter: @kubernetespod

    News of the week

    IBM acquired Kubecost

    KubeCon Japan in 2025

    Call for Proposals for KubeCon EU 2025 is now open

    Artifact Hub is a CNCF incubating project

    OpenMetrics is dead, long live OpenMetrics

    Kubecolor 0.4.0

    Links from the interview

    Avin Regmi

    David Xia

    Hendrix ML Platform

    Ray on Kubernetes

    KubeRay

    Workbench instances

    Backstage

    PyTorch

    Ray Summit 2024

    Kueue

    1 hr
  • Dagger, with Solomon Hykes
    Sep 17 2024

    Solomon Hykes is the co-founder of Dagger. He is probably best known as the creator of Docker, the tool that changed how developers package, run, and distribute software over the last 11 years. His impact on our industry is undeniable. Today, we discuss his new venture, Dagger, a new approach to how we do CI/CD.

    Do you have something cool to share? Some questions? Let us know:

    - web: kubernetespodcast.com

    - mail: kubernetespodcast@google.com

    - twitter: @kubernetespod



    News of the week
    • Kubeadm v1beta4

    • 1.32 Release Cycle Info

    • Updates to the Certified Kubernetes Administrator Exam

    • 2024 Generative AI Survey

    • Microsoft Azure Advanced Container Networking enhancements

    Links from the interview
    • Solomon Hykes on LinkedIn

    • Dagger

    • OpenStack

    • Act (GitHub Actions Locally)

    • Buildkit

    • Cue

    • GraphQL

    • Dagger Discord

    • Caching - Dagger Documentation

    • Bazel

    • Terraform

    • Pulumi

    • Kubectl

    • gRPC

    • GraphQL

    • Google Cloud’s Package Index

    • The Daggerverse

    • Cloud Foundry

    • PostHog

    • Red Hat Development Model

    Links from the post-interview chat
    • Scaffold

    • Solomon Hykes - Docker, Dagger, and the Future of DevOps

    • Directed Acyclic Graphs

    • Solomon Hykes on Wikipedia

    • Stack Overflow

    1 hr and 7 mins
  • Ray & KubeRay, with Richard Liaw and Kai-Hsun Chen
    Sep 3 2024

    In this episode, guest host and AI correspondent Mofi Rahman interviews Richard Liaw and Kai-Hsun Chen from Anyscale about Ray and KubeRay. Ray is an open-source unified compute framework that makes it easy to scale AI and Python workloads, while KubeRay integrates Ray’s capabilities into Kubernetes clusters.

    Do you have something cool to share? Some questions? Let us know:

    - web: kubernetespodcast.com

    - mail: kubernetespodcast@google.com

    - twitter: @kubernetespod

    News of the week
    • CNCF Blog - LitmusChaos audit complete!

    • Kubernetes Podcast from Google episode 234 - LitmusChaos, with Karthik Satchitanand

    • Google Cloud Blog - Run your AI inference applications on Cloud Run with NVIDIA GPUs

    • Diginomica article - KubeCon China - at 33-and-a-third, Linux is a long player. So, why does Linus Torvalds hate AI?

    • CNCF-Hosted Co-Located Event Schedule for KubeCon NA 2024

    • Google Kubernetes Engine Release Notes - August 20, 2024 (1.31 available in Rapid Channel)

    • Kubernetes Podcast from Google - Kubernetes v1.31: "Elli", with Angelos Kolaitis

    • Red Hat Press Release - Red Hat OpenStack Services on OpenShift is Now Generally Available

    • Red Hat Enables OpenStack to Run Natively on OpenShift Platform

    • Broadcom Revamps Tanzu to Simplify Cloud-Native App Development and Deployment

    • Tanzu Platform 10 Offers Cloud Foundry Users Deep Visibility and Productivity Enhancements

    • VMware Explore Conference Website

    • CNCF Blog - Announcing 500 Kubestronauts

    • CNCF - Kubestronaut FAQ

    • Dapr Day 2024 Virtual Event Website

    Links from the interview
    • Kai-Hsun Chen on LinkedIn

    • Richard Liaw on LinkedIn

    • Ray from the RISE Lab at UC Berkeley

    • Ray: A Distributed System for AI by Robert Nishihara and Philipp Moritz - Jan 9, 2018

    • KubeRay Docs

    • KubeRay on GitHub

    • PyTorch

    • Apache Airflow

    • Apache Spark

    • Kubeflow

    • Apache Submarine (retired)

    • Jupyter Notebooks

    • VS Code

    • Examples of schedulers for Batch/AI workloads in Kubernetes

    • Kueue

    • Volcano

    • Apache Yunikorn

    • Examples of observability tools for Batch/AI workloads in Kubernetes

    • Prometheus

    • Grafana

    • Fluentbit

    • Examples of loadbalancers

    • Nginx

    • Istio

    • Ray Data: Scalable Datasets for ML

    • Dask Python - Parallel Python

    • Ray Serve: Scalable and Programmable Serving

    • HPA - Horizontal Pod Autoscaling in Kubernetes

    • Karpenter - “Just-in-time nodes for any Kubernetes cluster”

    • Lazy Computation Graphs with the Ray DAG API

    • Types of hardware accelerators

    • Google Cloud Tensor Processing Units (TPUs)

    • AMD Instinct

    • AMD Radeon

    • AWS Trainium

    • AWS Inferentia

    • Pandas

    • Numpy

    • KubeCon EU 2024 - Accelerators(FPGA/GPU) Chaining to Efficiently Handle Large AI/ML Workloads in K8s - Sampath Priyankara, Nippon Telegraph and Telephone Corporation & Masataka Sonoda, Fujitsu Limited

    • NVidia Megatron

    Links from the post-interview chat
    • DRA - Dynamic Resource Allocation in Kubernetes

    • Different ways of Running RayJob on Kubernetes

    • Ray framework diagram in the docs

    55 mins
  • LitmusChaos, with Karthik Satchitanand
    Aug 20 2024

    In this episode, we spoke to Karthik Satchitanand. Karthik is a principal software engineer at Harness and co-founder and maintainer of LitmusChaos, a CNCF incubated project. We talked about chaos engineering, the Litmus project, and more.

    Do you have something cool to share? Some questions? Let us know:

    - web: kubernetespodcast.com

    - mail: kubernetespodcast@google.com

    - twitter: @kubernetespod

    News of the week
    • Kubernetes 1.31 release blog

    • Kubernetes 1.31 release episode of the Kubernetes Podcast from Google

    • KubeCon NA 2024 Schedule

    • Score accepted as a CNCF Sandbox Project

    Links from the interview
    • LitmusChaos

    • principlesofchaos.org

    • Okteto

    • LitmusChaosCon

    • community.cncf.io

    Links from the post-interview chat
    • Chaos Monkey

    • Chapter 5 of “Chaos Engineering” by Casey Rosenthal, Nora Jones, published by O’Reilly, covers DiRT

    • LitmusChaos ChaosHub

    • Klustered on YouTube

    • Rawkode Academy

    54 mins