Our client, a large professional services firm, is looking to hire an experienced Rancher Kubernetes expert for a 6-month+ contract to lead the design, automation, and reliability of on-prem and hybrid container platforms. The consultant will sit at the intersection of the Platform Engineering and Infrastructure Reliability teams, owning the full lifecycle of Rancher-managed clusters, from bare-metal provisioning and performance tuning to observability, security, and automated operations. The role applies SRE principles to ensure high availability, scalability, and resilience across environments supporting mission-critical workloads.
Core Responsibilities:
- Platform & Infrastructure Engineering
  - Design, deploy, and maintain Rancher-managed Kubernetes clusters (RKE2/K3s) at enterprise scale
  - Architect highly available clusters integrated with on-prem infrastructure: UCS, VXLAN, storage, DNS, and load balancers
  - Lead Rancher Fleet implementations for GitOps-driven cluster and workload management
- Automation & Tooling
  - Build and maintain IaC stacks using Terraform, Helm, and Argo CD
  - Develop platform automation and observability tooling using Python or Go
  - Ensure declarative management of infrastructure and applications through GitOps pipelines
Required Skills:
- 7+ years in infrastructure, platform, or SRE roles
- Deep hands-on experience with Rancher (RKE2/K3s) in production environments
- Proficient with Terraform, Helm, Argo CD, Python, and/or Go
- Demonstrated performance tuning in bare-metal Kubernetes environments (UCS, VXLAN, MetalLB)
- Expert in Linux (systems, networking, kernel tuning), Kubernetes internals, and container runtimes
- Real-world application of SRE principles in high-stakes, always-on environments
- Strong background operating Prometheus, Grafana, and Elasticsearch with Logstash or Fluentd and Kibana (ELK/EFK) stacks
Desired Skills:
- Experience integrating Kubernetes with OpenStack and Magnum
- Knowledge of Rancher add-ons: Fleet, Longhorn, CIS Scanning
- Familiarity with compliance-driven infrastructure (PCI DSS, FedRAMP, SOC 2)
- Certifications: CKA, CKS, or Rancher Kubernetes Administrator