Technical Incident Manager
Location: MUNICH, Bavaria DE
Requisition Number: 203423
Position Title: CSR Manager (II)
Teradata Global Support Organization is expanding and building capabilities to further drive our company and function’s transformation. We are seeking a highly skilled, technology-focused leader to drive the evolution of Teradata’s Critical Incidents Technical Response Group, handling all deployment types.
As the Technical Incident Manager (TIM), you are responsible for major incident management. In this role, you will be the key decision maker and the authority directing the problem resolution path for the fastest restoration of service. As Major Incident Manager, you are responsible for managing the restoration of a service affected by real or potential interruptions that may impact the quality or availability of that service. When a major or critical incident occurs, the right technical resources will be activated and you will technically lead major incident calls, determine the client impact, agree on resolution actions with everyone involved, and manage the technical communication channel to keep the focus on return to service. This will include managing technical sub-channels with tech leads, who will take point for their sub-channels and isolate issues contributing to return to service. The TIM will work hand in hand with the Incident Communication Manager (ICM), who is responsible for internal and external communication, leaving the TIM focused on return to service.
The Major Incident Manager is responsible for the quality and integrity of the Major Incident Management process and is the interface to the other process managers.
This position will be located in London, England, but we will consider other virtual locations in the EMEA region for the right candidate. This is a fast-paced, high-tech environment and may require extended hours and after-hours follow-up given the nature of the changes occurring 7x24.
Key Responsibilities of the TIM / Teradata Critical Support Office:
- Handle severity 1s, critical customers, and complex problems impacting customer systems
- Central point for incident declaration, classification (priority) and triggers for SWARM
- Ensure appropriate skill sets and incoming signals are aligned with SWARM triggers and automated with tools for notifications (this includes Security teams and vendor teams and, in some cases, client technical teams)
- Technically leads all aspects of critical incidents, focused on fastest service restoration/recovery using a SWARM approach: bridge, Slack channels, and sync points for the sub-tech teams leading investigations (including third-party vendors and cloud providers).
- Responsible for the quality and integrity of Major Incident Management process and is the interface with SWARM members, communication manager, and problem manager.
- Support all SWARM activities globally when problems require the team's deep technical and problem resolution skills; this may include working across regions with other TIMs to support 24x7 coverage.
- Perform post-incident follow-up on Major Incidents through Post Incident Reviews (post-mortems) in concert with Problem Managers
- Enforce regular and systemic process control mechanisms to improve SWARM
- Participate in proactive design of new and innovative ways of simplifying future SWARM activities
- Clearly identifies and drives backlog prioritization for improving availability and mean-time-between-failures with appropriate development teams
- Success factors and metrics will be visibly focused on mean-time-to-restore-service (MTTRS) and mean-time-between-failure (MTBF) and underlying KPIs in the SWARM, RCA, and improvement backlogs.
- Acts as the voice/conscience of the customer experience, and administers problem solving with customer advocacy front of mind.
Key Functions of the Technical Incident Manager:
- Communicate and advocate effectively with all levels of roles, in all geographies, across the entire company.
- Assume leadership responsibility during an S1 to direct the SWARM team as they work towards service restoration
- Lead S1 team calls, determine the SMEs needed, identify the problem, and release/de-escalate after diagnosis
- Ensure incident management processes are efficient and automated for triggers, data collection, and diagnostics, and streamline artifacts into the incident, including timelines and decision trees.
- Ensure the SWARM team meets resolution specifications as designed in the SLA while also enabling reduction of mean time to resolution
- Participate with problem managers and appropriate development teams, as required, to evolve monitoring/logging systems.
- Identify failure points driving availability and accelerating mean-time-to-repair, including architectures, design, process improvements, software disciplines, test, etc.
- Interact frequently with various stakeholders across the organization to prioritize backlog for availability as required.
- Build strong internal and external relationships with technical teams, customers and third parties
- Serve as a key contributor to post-mortem reviews as an SME
- Customer Advocate: focus on what is deemed to be the best outcome for the customer
- Is NOT responsible for communication planning or execution to the internal or external stakeholders; actively participates and interfaces with the Incident Communication Manager as needed
Skills & Qualifications
- Demonstrated strategic and tactical thinking, quantitative and analytical skills, while under pressure
- Knowledge of and exposure to distributed systems across hyper-scale, cloud-based environments
- Working knowledge of physical IT infrastructures such as Enterprise Server Platforms and related IT architectures and equipment
- Solid understanding of large-scale networking, including OSI Model, DNS, WINS, TCP/IP, VLANs, DHCP, Routing, ACLs, switching protocols, etc.
- Understanding and knowledge of physical data centers and their related infrastructure or resources such as power, rack space, CE Infrastructures (e.g. UPS, Generators, AHU) etc.
- Flexibility and willingness to support a 24x7 global operation via off-hours support, on-call availability, or other as needed per rhythm and needs of the business
- Working knowledge of ITIL incident, problem, and change management components
- Excellent problem resolution, judgment, negotiation and decision-making skills
- Practical experience with incident/outage and crisis management
- Ability to balance competing demands for resources and adapt to changing priorities
- Excellent written and oral communication skills, with special focus on customer/client-level interaction
- Operations experience in a 24x7x365 support model (NOC experience beneficial)
Preferred Skills & Experience
- BS in Computer Science, math or equivalent education or experience with database development and applications, data warehousing operations, and analytical software applications or ecosystems.
State: Greater London
Community / Marketing Title: Technical Incident Manager
Job Category: Customer Support
With all the investments made in analytics, it’s time to stop buying into partial solutions that overpromise and underdeliver. It’s time to invest in answers. Only Teradata leverages all of the data, all of the time, so that customers can analyze anything, deploy anywhere, and deliver analytics that matter most to them. And we do it at scale, on-premises, in the Cloud, or anywhere in between.
We call this Pervasive Data Intelligence. It’s the answer to the complexity, cost, and inadequacy of today’s analytics. And it's the way Teradata transforms how businesses work and people live through the power of data throughout the world. Join us and help create the era of Pervasive Data Intelligence.