
Site Reliability Engineering (SRE) Manager - US, CA, Santa Clara

2 days ago



Job Opportunity Details

Type

Full Time

Salary

164,000 USD - 258,750 USD base (see details below)

Work from home

No

Weekly Working Hours

Not specified

Positions

Not specified

Working Location

Santa Clara, CA, United States

We are seeking a seasoned Site Reliability Engineering (SRE) Manager to lead a team of SREs supporting a Network Automation team in a follow-the-sun support model. The team manages critical applications and infrastructure, both in the cloud and on premises, for datacenter deployment and automated operations. This role is pivotal not only in maintaining resilient, scalable systems but also in owning the overall architecture of our distributed systems, including a new DNS architecture and a distributed source-of-truth sync.

This role requires more than an understanding of standard best practices and operations. We're combining technical knowledge with development chops to build solutions at cloud scale. This is a new team operating in an exciting, groundbreaking environment, and we want you to help us shape it.

What you will be doing:

  • Team Leadership: Manage and mentor a team of Automation SREs, fostering a culture of collaboration, innovation, and excellence in execution
  • Technical Guidance: Own technical decisions for the team, ensuring alignment with developers and applying industry-standard methodologies
  • Operational Excellence: Implement and maintain robust operational practices, including incident management, monitoring, alerting, and capacity planning
  • Shift Scheduling: Coordinate follow-the-sun support across global time zones, ensuring 24/7 coverage and efficient handovers
  • Project Management: Lead initiatives related to the design, deployment, and maintenance of critical infrastructure components
  • Release Management: Oversee release processes and ensure smooth deployments, minimizing downtime and impact on users
  • Root Cause Analysis: Conduct thorough post-incident reviews, identifying root causes and implementing preventive measures

What we need to see:

  • Experience: 8+ years of industry experience focused on Site Reliability Engineering, with a strong background at cloud service providers, ISPs, or similar service-oriented networking companies
  • Technical Skills: Proficiency in managing distributed web infrastructures, designing scalable and resilient systems, and implementing network automation
  • Leadership: Proven track record of managing technical teams, including performance management, career development, and hiring, with 2+ years of management experience
  • Problem Solving: Demonstrated ability to conduct detailed root cause analysis and drive improvements based on findings
  • Communication: Excellent verbal and written communication skills, with experience presenting technical information to diverse audiences
  • Education: Bachelor’s degree in Computer Science, Engineering, or a related technical field, or relevant industry experience

If you are a strategic problem solver with a passion for leading high-performance teams in a dynamic and technically challenging environment, we encourage you to apply. Join us in shaping the future of our distributed systems and network automation infrastructure.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you are creative and autonomous, we want to hear from you!

The base salary range is 164,000 USD - 258,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.


Organization: NVIDIA