Associate DevOps Engineer - 218803

Full Time
Hybrid

Pune, Maharashtra, India | Karnataka, India

Posted 1 day ago

Our Company 

At Teradata, we believe that people thrive when empowered with better information. That’s why we built the most complete cloud analytics and data platform for AI. By delivering harmonized data, trusted AI, and faster innovation, we uplift and empower our customers—and our customers’ customers—to make better, more confident decisions. The world’s top companies across every major industry trust Teradata to improve business performance, enrich customer experiences, and fully integrate data across the enterprise. 

What You’ll Do 

  • Keep the Lights On: Ensure the continuous availability and performance of the LRaaS Portal. 
  • On-Call Improvement: Participate in on-call duties, handling escalated support for production customers and systems. 
  • Ticket Management: Manage open and unresolved tickets during on-call shifts, ensuring smooth handovers and effective communication with peers. 
  • Release Management Automation: Automate release management processes to enhance efficiency and reduce manual interventions. 
  • Incident Resolution: Reduce mean time to resolve (MTTR) for incidents by leveraging automation and AI-enabled solutions. 
  • Documentation and Automation: Improve automation and maintain comprehensive documentation, including policies, standards, procedures, and runbooks. 
  • Alert Management: Deduplicate alerts and reduce false positives to ensure accurate monitoring. 
  • SRE Principles: Apply Site Reliability Engineering (SRE) principles to drive reliability initiatives, including defining and monitoring Service Level Objectives (SLOs) and Error Budgets. 
  • Collaboration: Work with monitoring and operations teams to ensure the availability, performance, and scalability of the infrastructure and applications. 
  • Continuous Improvement: Identify and implement opportunities to enhance system performance, reliability, and security through automation and optimization. 
  • Provisioning Process Improvement: Analyze and improve existing provisioning processes for automation opportunities. 

Who You’ll Work With 

  • Cross-Functional Teams: Collaborate with developers, QA engineers, product managers, and stakeholders to address SRE and observability issues encountered via ServiceNow tickets. 
  • Cloud Operations Team: Manage site reliability for cloud operations across AWS, Azure, and GCP. 
  • Reporting: Report to the Senior Manager. 

What Makes You a Qualified Candidate 

  • Experience: 2+ years of relevant job experience in SRE or a similar role. 
  • Education: Bachelor's Degree in Computer Science or a related field preferred. 
  • Scripting Knowledge: Proficiency in scripting languages like Python, Ruby, or Bash. 
  • Hybrid Environment Experience: Experience working in hybrid environments preferred. 
  • Cloud Platforms: Hands-on experience with at least one major cloud platform (Google Cloud, Azure, AWS), with Google Cloud and Azure highly preferred. 
  • Configuration Management: Experience with tools like Ansible or equivalent technologies. 
  • Build Systems: Strong experience with test and build systems such as Jenkins, GitLab, and GitHub. 
  • Monitoring Tools: Proficiency with monitoring and reporting tools such as DataDog, New Relic, Nagios, and Graphite. 
  • Linux Systems: Strong experience with Linux operating systems. 
  • Database Systems: Experience working with database systems, network topologies, and hardware. 

What You’ll Bring 

  • SRE Principles: Ability to apply SRE principles to drive reliability engineering initiatives, including defining and monitoring SLOs and Error Budgets. 
  • On-Call Expertise: Capability to handle on-call duties, improve on-call processes, and manage ticket handovers effectively. 
  • Automation Skills: Expertise in automating release management and improving automation for L1 and L2 level issue resolution. 
  • Documentation: Proficiency in developing and maintaining comprehensive documentation. 
  • Proactive Alerting: Drive improvement in proactive alerting using modern monitoring tools. 
  • System Monitoring: Enhance system monitoring and observability through log analysis, dashboard creation, and automated alerts. 
  • Collaboration: Work closely with monitoring and operations teams to ensure system reliability. 
  • Continuous Improvement: Identify and implement opportunities for system performance, reliability, and security enhancements. 
  • Security Assessments: Perform security assessments, vulnerability scans, and implement remediation actions. 
  • Security Documentation: Develop and maintain security-related documentation, including policies, standards, and procedures. 

Good to Have 

  • Security Assessments: Perform security assessments, vulnerability scans, and observability analysis, and implement remediation actions. 
  • Security Documentation: Develop and maintain security-related documentation, including policies, standards, and procedures. 

Why We Think You’ll Love Teradata 

We prioritize a people-first culture because we know our people are at the very heart of our success. We embrace a flexible work model because we trust our people to make decisions about how, when, and where they work. We focus on well-being because we care about our people and their ability to thrive both personally and professionally. We are an anti-racist company because our dedication to Diversity, Equity, and Inclusion is more than a statement. It is a deep commitment to doing the work to foster an equitable environment that celebrates people for all of who they are.
