Job Title |
Site Reliability Engineer
|
Relevant Experience (in Yrs) |
5 |
Technical/Functional Skills |
Power BI; SQL; Cosmos; Digital : Kubernetes; Digital : Microsoft Power BI; Digital : Azure Databricks |
Experience Required |
5 |
Roles & Responsibilities |
SRE Key Responsibilities:
• Collaborate with cross-functional teams to design, implement, and maintain highly available and scalable production systems.
• Monitor system performance, identify bottlenecks, and proactively take action to prevent downtime and ensure optimal user experience.
• Implement automation for provisioning, deployment, and configuration management to increase efficiency and reduce manual intervention.
• Participate in incident response and post-incident analysis, driving continuous improvement in system reliability and recovery processes.
• Conduct capacity planning and performance testing to ensure our systems can handle anticipated growth and unexpected traffic spikes.
• Troubleshoot complex technical issues across the entire technology stack, from application code to infrastructure.
• Drive the adoption of best practices in software development, system architecture, and infrastructure management.
• Collaborate with development teams to improve application reliability, performance, and observability through code reviews and guidance.
• Contribute to the on-call rotation and actively engage in identifying and addressing root causes of incidents.
• Stay up to date with industry trends, emerging technologies, and SRE best practices, and bring fresh ideas to the team.
Qualifications:
• Proficiency in at least one programming language (PowerShell, C#, etc.).
• Strong experience with cloud platforms (Azure) and containerization technologies (Docker, Kubernetes).
• Solid understanding of networking concepts, protocols, and security principles.
• Experience with configuration management tools and infrastructure-as-code practices.
• Familiarity with Azure monitoring and observability tools.
• Ability to analyze complex systems and troubleshoot issues systematically.
• Excellent communication skills and ability to work collaboratively in a team-oriented environment.
• Prior experience with incident response, on-call rotations, and incident management is a plus.
• Relevant certifications such as DevOps Engineer, Azure Administration, or equivalent certifications are a plus. |
Generic Managerial Skills |
Digital : Microsoft Azure; Digital : Docker; PostgreSQL |
#LI-NS
More Information
Application Details
-
Organization Details
TCS / Tata Consultancy Services