Sr Rancher Kubernetes Consultant

Our client, a large professional services firm, is looking to hire an experienced Rancher Kubernetes expert for a 6-month+ contract to lead the design, automation, and reliability of on-prem and hybrid container platforms. The consultant will sit at the intersection of the Platform Engineering and Infrastructure Reliability teams, owning the full lifecycle of Rancher-managed clusters, from bare-metal provisioning and performance tuning to observability, security, and automated operations. The consultant will apply SRE principles to ensure high availability, scalability, and resilience across environments supporting mission-critical workloads.

Core Responsibilities:

Platform & Infrastructure Engineering
Design, deploy, and maintain Rancher-managed Kubernetes clusters (RKE2/K3s) at enterprise scale
Architect highly available clusters integrated with on-prem infrastructure: UCS, VXLAN, storage, DNS, and load balancers
Lead Rancher Fleet implementations for GitOps-driven cluster and workload management

Performance Engineering & Optimization
Tune clusters for high-performance workloads on bare-metal hardware, optimizing CPU, memory, and I/O paths
Align cluster scheduling and resource profiles with physical infrastructure topologies (NUMA, NICs, etc.)
Optimize CNI, kubelet, and scheduler settings for low-latency, high-throughput applications

Security & Compliance
Implement security-first Kubernetes patterns: RBAC, Pod Security Standards, network policies, and image validation
Drive shift-left security using Terraform, Helm, and CI/CD pipelines; align to PCI, FIPS, and CIS benchmarks
Lead infrastructure risk reviews and implement guardrails for regulated environments

Automation & Tooling
Build and maintain IaC stacks using Terraform, Helm, and Argo CD
Develop platform automation and observability tooling using Python or Go
Ensure declarative management of infrastructure and applications through GitOps pipelines

SRE & Observability
Apply SRE best practices for platform availability, capacity, latency, and incident response
Operate and tune Prometheus, Grafana, and ELK/EFK stacks for complete platform observability
Drive actionable alerting, automated recovery mechanisms, and clear operational documentation
Lead postmortems and drive systemic improvements to reduce MTTR and prevent recurrence

Required Skills:

7+ years in infrastructure, platform, or SRE roles
Deep hands-on experience with Rancher (RKE2/K3s) in production environments
Proficient with Terraform, Helm, Argo CD, Python, and/or Go
Demonstrated performance tuning in bare-metal Kubernetes environments (UCS, VXLAN, MetalLB)
Expert in Linux (system administration, networking, kernel tuning), Kubernetes internals, and container runtimes
Real-world application of SRE principles in high-stakes, always-on environments
Strong background operating Prometheus, Grafana, and Elasticsearch, Logstash/Fluentd, and Kibana (ELK/EFK) stacks

Desired Skills:

Experience integrating Kubernetes with OpenStack and Magnum
Knowledge of Rancher add-ons: Fleet, Longhorn, CIS Scanning
Familiarity with compliance-driven infrastructure (PCI, FedRAMP, SOC 2)
Certifications: CKA, CKS, or Rancher Kubernetes Administrator
