Senior Compute Cluster Deployment Engineer - Israel, Yokneam

3 days ago


Job Opportunity Details

Type: Full Time
Salary: Not specified
Work from home: No
Weekly Working Hours: Not specified
Positions: Not specified
Working Location: Yokneam, Israel

NVIDIA is looking for a hardworking Senior Compute Cluster Deployment Engineer to join our Professional Services team.

You'll join a small team working around the globe to build some of the most advanced data centers in the world. This role focuses on deploying server and compute clusters built on brand-new GPU platforms for AI and machine learning. You'll work with some of the world's largest and most sophisticated customers and supercomputers, alongside our InfiniBand and Ethernet network engineers, to deploy a complete solution for customers looking to adopt NVIDIA solutions into their business.

Opportunities for global travel and learning about the newest GPU-related technologies are plentiful as we seek to build, shape and expand this new aspect of our business.

What you will be doing:

  • Manage and maintain AI/HPC infrastructure in Linux-based environments for new and existing customers.

  • Support the operational and reliability aspects of large-scale AI clusters, with a focus on performance at scale, real-time monitoring, logging and alerting.

  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement.

  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health.

  • Provide feedback to internal teams by opening bugs, documenting workarounds, and suggesting improvements.

  • Be part of an on-call rotation to support production systems.

What we need to see:

  • 5+ years providing in-depth support and deployment services, solving problems for hardware and software products.

  • Knowledge of and experience with Linux system administration: process management, package management, task scheduling, kernel management, boot procedures and troubleshooting, performance reporting, optimization and logging, and network routing and advanced networking (tuning and monitoring).

  • Experience with cluster management technologies, e.g. Bright Cluster Manager.

  • Scripting proficiency.

  • Good social skills, with the ability to own and deliver resolutions for customer-blocking issues as they arise.

  • Superb communication and presentation skills.

  • Excellent verbal and written English skills.

  • Strong organizational skills and the ability to prioritize and multitask with limited supervision.

  • Candidates should have a minimum of a four-year degree from an accredited university or college in Computer Science, or Electrical or Computer Engineering.

  • Industry-standard Linux certifications.

Ways to stand out from the crowd:

  • InfiniBand experience.

  • Experience with GPU-focused hardware/software.

  • Experience with MPI.

  • Automation tooling background (Ansible, Salt, Puppet, etc.).

  • Experience with Ethernet and storage technologies.

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer you and your family at www.nvidiabenefits.com/.


  • Organization Details
    Nvidia