Company Description

Open Innovation AI is a global technology company that specializes in developing advanced solutions for managing AI workloads. Its flagship product, the Open Innovation Cluster Manager (OICM), orchestrates complex AI tasks efficiently across diverse infrastructures. The platform is hardware-agnostic, optimized for a wide range of GPU and accelerator hardware, and facilitates seamless integration and scalability for enterprise AI applications. Open Innovation AI focuses on optimizing and simplifying AI workload management and on making AI technologies accessible to organizations of all sizes. With these solutions, companies can reduce operational costs, accelerate time to value, and maximize their return on investment, ensuring that their AI strategies contribute directly to enhanced business outcomes.

Role Overview:

The Systems Engineer – Kubernetes role ensures that Open Innovation's Kubernetes platforms are delivered and supported consistently and reliably. The role focuses on keeping development clusters stable for product teams, preparing the operating system images and drivers needed by Kubernetes and its extensions, and building the detailed deployment packages that allow customer environments to be rolled out smoothly. Acting as the link between design and delivery, the engineer turns architectural standards into practical, well-documented solutions that deployment teams can execute and operations teams can sustain.

Requirements:

- Operate and support Kubernetes clusters in development environments, ensuring stability and availability for product teams.
- Prepare OS images, kernel modules, and drivers required for Kubernetes nodes and extensions (e.g., CSI, GPU).
- Build and maintain deployment templates, manifests, and automation playbooks for consistent cluster rollouts and upgrades.
- Ensure observability and performance by providing monitoring baselines, logging integrations, and performance validation for workloads.
- Conduct performance engineering activities, including cluster tuning, benchmarking, and resource optimization for AI/HPC scenarios.
- Collaborate with infrastructure, network, and platform teams to align Kubernetes with underlying systems and services.
- Produce deployment guides, runbooks, and operational documentation for delivery and operations teams.
- Provide SME-level support for Kubernetes and container platform issues, coordinating with vendors on escalations and compatibility updates.

Qualification, Experience, Competence and Certifications

- Bachelor's degree in Computer Science, Engineering, or a related field.
- 4–7 years of experience in systems or platform engineering, with direct experience supporting Kubernetes environments.
- Strong Linux fundamentals and experience building and maintaining OS images, drivers, and kernel modules for Kubernetes nodes.
- Familiarity with Kubernetes storage and networking integrations (CNI, CSI, ingress/egress), with the ability to provide SME-level support.
- Hands-on experience with automation and deployment tools (e.g., Ansible, Terraform, Helm, Kustomize, Python scripting).
- Knowledge of observability and monitoring stacks (Prometheus, Grafana, logging pipelines) and the ability to define monitoring baselines.
- Experience in performance engineering, including cluster tuning, benchmarking, and optimization for compute, storage, and GPU workloads.
- Exposure to accelerator/GPU enablement (driver packaging, operators, scheduling) in Kubernetes environments is a strong advantage.
- Practical experience producing low-level designs (LLDs), deployment guides, and runbooks for engineering and delivery teams.
- Experience working with vendors on support cases, driver/compatibility issues, and software updates.
- Certifications such as CKA, CKAD, or CKS are desirable.
- Strong troubleshooting, documentation, and cross-team collaboration skills.