Job Title: ML Ops Tech Lead
Location: Remote
Employment Type: Full-time
This global organisation is pushing the boundaries of applied AI and deep learning to power smarter, faster, and more scalable solutions.
They are looking for an ML Ops Tech Lead who thrives on building production-ready ML infrastructure and who can bring expertise in Google Cloud Platform (GCP), deep learning, and Python to drive their machine learning systems forward.
The Role
As the ML Ops Tech Lead, you’ll design and oversee the end-to-end ML operations strategy, from experimentation to deployment at scale. You’ll collaborate closely with data scientists, ML engineers, and product teams to ensure the deep learning models are trained, deployed, and monitored with speed, reliability, and efficiency.
What You’ll Do
- Lead the design and implementation of ML pipelines on GCP (Vertex AI, BigQuery, Dataflow, GKE).
- Build scalable workflows for training, testing, deployment, and monitoring of deep learning models.
- Establish best practices for reproducibility, observability, and automation in ML lifecycles.
- Collaborate with data scientists to productionize Python-based deep learning models (TensorFlow, PyTorch).
- Implement CI/CD pipelines and infrastructure-as-code (Terraform, Helm) for ML workloads.
- Monitor model performance, detect drift, and optimize cost efficiency in cloud environments.
- Mentor a team of ML Ops engineers and foster a culture of innovation and excellence.
What They’re Looking For
- 5+ years in software engineering, DevOps, or data engineering, with 2+ years focused on ML Ops.
- Strong experience with GCP services for machine learning (Vertex AI, Dataflow, GKE, BigQuery).
- Hands-on expertise with deep learning frameworks (TensorFlow, PyTorch) and Python.
- Proven track record deploying ML models at scale in production environments.
- Strong knowledge of containerization and orchestration (Docker, Kubernetes).
- Familiarity with ML workflow tools (Kubeflow, MLflow, Airflow, or similar).
- Excellent leadership, problem-solving, and communication skills.
Nice-to-Have
- Experience with distributed training and hyperparameter optimization at scale.
- Exposure to monitoring and model observability platforms (Weights & Biases, Prometheus, Grafana).
- Background in high-throughput data systems (Spark, Dataproc, or Databricks).
Why Join Us?
- Own the ML Ops vision at a company building cutting-edge AI products.
- Work on high-impact deep learning projects with a world-class team.
- Competitive salary, equity options, and benefits package.
- Flexible, collaborative, and innovation-driven environment.