Site Reliability Engineer (SRE)

Site Reliability Engineer (SRE) | Leading AI Systems Scale-Up

Location: Sydney CBD (Hybrid), 1 day per week in the CBD office
Sector: Enterprise AI & Cybersecurity
Company Phase: Australia’s #1 AI Agent Platform | Global Expansion Mode

The Company
Our client is the undisputed leader in the Australian Agentic AI space. While others are experimenting with basic chatbots, this team is architecting autonomous, reasoning AI agents that manage end-to-end workflows for the world’s largest enterprises and financial institutions.
Having dominated the domestic market, they are now scaling their "Secure by Design" infrastructure globally. This is a rare opportunity to join a high-growth, Sydney-founded success story as they take their mission-critical platforms to the international stage.

The Opportunity
As a Site Reliability Engineer, you won't just be "maintaining" servers. You will be the architect of resilience for a global AI platform. You will sit at the intersection of high-scale Kubernetes orchestration, automated infrastructure, and "Security-First" DevSecOps.
We need a Sydney-based engineer who treats infrastructure as software. If you have a background in software development but have found your true calling in scaling complex systems and reducing "toil" through automation, this is the role for you.

Key Responsibilities

Global Kubernetes Orchestration: Design and operate container workloads (EKS/AKS/GKE) to support rapid international scaling.
Infrastructure as Code (IaC): Lead the development of Terraform modules to maintain a declarative, version-controlled global environment.
Reliability & Observability: Define SLIs, SLOs, and SLAs; lead blameless post-mortems and automate incident response to ensure 24/7 availability.
Toil Reduction: Use Python, Go, or Bash to automate repetitive operational tasks, turning manual work into scalable code.
DevSecOps Integration: Bake security (secrets management, SAST/DAST, and compliance) directly into the CI/CD pipelines (Bitbucket/ArgoCD).

Who You Are (Must-Haves)

5+ Years Cloud Experience: Proven track record in a major cloud environment (AWS, Azure, or GCP).
K8s Expert: Deep hands-on experience designing and operating Kubernetes workloads at scale.
IaC Native: Expert-level proficiency with Terraform. You are comfortable building and managing complex, reusable modules.
Helm Specialist: Significant experience using Helm for Kubernetes deployments and CI/CD management.
DevOps/DevSecOps Foundation: You have owned CI/CD pipelines and integrated security tooling as a core responsibility.
Polyglot Scripter: Professional experience in at least two distinct languages (e.g., Python, Java, Go, Bash, or Ruby).
SRE Mindset: You understand the core tenets of reliability: observability, chaos engineering, and aggressive automation.

Highly Desirable

GitOps Practitioner: Hands-on experience with ArgoCD or similar declarative CD tooling.
Sector Experience: A background in Financial Services or highly regulated environments where security is paramount.
SDLC Insight: A solid understanding of the full software lifecycle, from design to "Day 2" operations.

Why Apply?

Market Leader: Join the most successful AI business in Australia at the exact moment they go global.
Security-First Culture: Work in an environment where "Security" isn't a checkbox; it's the product.
Tech Stack: Work with a modern, clean stack including K8s, Terraform, Helm, ArgoCD, and Bitbucket.
Impact: Your work directly enables the deployment of the next generation of autonomous AI agents.

DeVision Recruitment

Information & Communication Technology / Other
