JobShark: Find the Right Job

 

Location: San Francisco, CA
Type: Full Time
Posted by: Greylock Partners
Posted: 25/01/2026 17:12:46

Company: One of our consumer AI investments is hiring an ML Infrastructure Engineer. The founding team helped build iconic consumer products at Pinterest, Lyft, Whatnot, Grubhub, and others. We led the Seed round, and the company is finalizing a Series A with another tier-1 VC.

Team size: 20+ (and growing)

The opportunity: You'll be an early ML Infra hire helping scale training and inference systems that directly power a consumer product. This is a hands-on role with real ownership: reliability, performance, cost, and developer velocity.

What you'll do:

  • Build and operate large-scale ML infrastructure for training and inference (GPU compute, orchestration, model serving)
  • Own core systems for data and feature pipelines, model artifacts, experiment tracking, and evaluation
  • Improve observability and reliability across ML and production systems (monitoring, debugging, incident response, SLAs)
  • Drive efficiency across the stack (latency, throughput, cost, capacity planning)
  • Partner closely with researchers/ML engineers/product engineers to unblock shipping

Role (key traits):

  • 3+ years building large-scale distributed systems infrastructure and 2+ years building modern ML systems in industry, covering: ML training and inference systems; large-scale distributed infrastructure; GPU-backed compute; Kubernetes and cloud infrastructure; production reliability; Python plus a systems language; PyTorch or TensorFlow
  • BSCS or equivalent (MSCS preferred)
  • Strong coding and systems fundamentals (production-grade engineering, not just prototypes)

Good fit for this role if you have:

  • Owned production ML training and/or serving infrastructure end-to-end (not just model development)
  • Experience with distributed compute and data systems (scheduler/orchestration, storage, streaming/batch, reliability)
  • Comfort operating systems in production: on-call mindset, instrumentation, performance tuning, cost discipline
  • Built internal platforms that improved ML developer velocity (reusable pipelines, tooling, paved roads)
  • Enjoyed ambiguity while moving quickly with a small team, and like building the system as much as the model

Not a good fit for this role if you're primarily:

  • Focused on research or model architecture work and don't want to own production systems
  • Looking for a role with minimal on-call/ops responsibilities or limited accountability for reliability
  • Experienced only with notebooks/prototypes and not with production-grade infra and debugging
  • Targeting a more specialized scope rather than broad ownership across the ML stack

About Us:

Greylock is an early-stage investor in hundreds of remarkable companies, including Figma, Rubrik, Airbnb, LinkedIn, Dropbox, Workday, Cloudera, Facebook, and Instagram. Learn more about us at https://greylock.com/

As full-time, salaried employees of Greylock, we provide free candidate referrals and introductions to our active investments to help them grow and succeed (one of the many services we provide). Please note: We are not recruiting for any roles within Greylock at this time - only for openings within our portfolio.

San Francisco, CA, United States of America
IT
JS26489_25304_58CB408E6D263BEAB498E5BC4C3FCAEE