(781) 916-2284

Sr Rancher Kubernetes Consultant

Our client, a large professional services firm, is looking to hire an experienced Rancher Kubernetes expert for a 6-month+ contract to lead the design, automation, and reliability of on-prem and hybrid container platforms. The consultant will sit at the intersection of the Platform Engineering and Infrastructure Reliability teams and own the full lifecycle of Rancher-managed clusters, from bare-metal provisioning and performance tuning to observability, security, and automated operations. The consultant will apply SRE principles to ensure high availability, scalability, and resilience across environments supporting mission-critical workloads.
 
Core Responsibilities:

  • Platform & Infrastructure Engineering
    • Design, deploy, and maintain Rancher-managed Kubernetes clusters (RKE2/K3s) at enterprise scale
    • Architect highly available clusters integrated with on-prem infrastructure: UCS, VXLAN, storage, DNS, and load balancers
    • Lead Rancher Fleet implementations for GitOps-driven cluster and workload management
  • Performance Engineering & Optimization
    • Tune clusters for high-performance workloads on bare-metal hardware, optimizing CPU, memory, and I/O paths
    • Align cluster scheduling and resource profiles with physical infrastructure topologies (NUMA, NICs, etc.)
    • Optimize CNI, kubelet, and scheduler settings for low-latency, high-throughput applications
  • Security & Compliance
    • Implement security-first Kubernetes patterns: RBAC, Pod Security Standards, network policies, and image validation
    • Drive shift-left security using Terraform, Helm, and CI/CD pipelines; align to PCI, FIPS, and CIS benchmarks
    • Lead infrastructure risk reviews and implement guardrails for regulated environments
  • Automation & Tooling
    • Build and maintain IaC stacks using Terraform, Helm, and Argo CD
    • Develop platform automation and observability tooling using Python or Go
    • Ensure declarative management of infrastructure and applications through GitOps pipelines
  • SRE & Observability
    • Apply SRE best practices for platform availability, capacity, latency, and incident response
    • Operate and tune Prometheus, Grafana, and ELK/EFK stacks for complete platform observability
    • Drive actionable alerting, automated recovery mechanisms, and clear operational documentation
    • Lead postmortems and drive systemic improvements to reduce MTTR and prevent recurrence

Required Skills:

  • 7+ years in infrastructure, platform, or SRE roles
  • Deep hands-on experience with Rancher (RKE2/K3s) in production environments
  • Proficient with Terraform, Helm, Argo CD, and Python and/or Go
  • Demonstrated performance tuning in bare-metal Kubernetes environments (UCS, VXLAN, MetalLB)
  • Expert in Linux (systems, networking, kernel tuning), Kubernetes internals, and container runtimes
  • Real-world application of SRE principles in high-stakes, always-on environments
  • Strong background operating Prometheus, Grafana, and Elasticsearch with Logstash or Fluentd and Kibana (ELK/EFK) stacks

Desired Skills:

  • Experience integrating Kubernetes with OpenStack and Magnum
  • Knowledge of Rancher add-ons: Fleet, Longhorn, CIS Scanning
  • Familiarity with compliance-driven infrastructure (PCI, FedRAMP, SOC 2)
  • Certifications: CKA, CKS, or Rancher Kubernetes Administrator