Lead Technical Incident Manager
Location: London, Greater London GB
Requisition Number: 203527
Position Title: CSR Director
Teradata Global Support Organization is expanding and building capabilities to further drive our company and function’s transformation. We are seeking a highly skilled, technology-focused leader to drive the evolution of Teradata’s Critical Support Office (CSO), which handles all deployment types.
As the Technical Incident Manager (TIM) Lead, you are responsible for critical incident management. In this role, you will be the key decision maker and authority directing the problem resolution path for the fastest restoration of service. You are responsible for managing the restoration of an impacted service affected by real or potential interruptions that may affect the quality or availability of that service. When a critical incident occurs, you activate the right technical resources, technically lead major incident calls, determine the client impact, agree on resolution actions with everyone involved, and manage the technical communication channel to keep the focus on return to service. This includes managing technical sub-channels, where tech leads take point and isolate the issues blocking return to service. The TIM works hand-in-hand with the Incident Communication Manager (ICM), who is responsible for internal and external communication, leaving the TIM focused on return to service.
The Technical Incident Manager is responsible for the quality and integrity of the Critical Incident Management process and is the interface to the other process managers. In addition to the roles and responsibilities of a TIM team member, the Technical Lead has additional responsibilities for providing guidance to other team members, driving cross-organizational activities, and developing the new incident handling processes.
This position will be located in London, England, but virtual locations in the EMEA region will be considered for the right candidate. This is a fast-paced, high-tech environment that may require extended hours and after-hours follow-up given the 24x7 nature of the changes occurring.
Core capabilities for Technical Lead Role:
- The TIM Lead role covers a wide range of activities that require complex judgments and solutions based on analytical thought and on comparing and selecting among complex alternatives.
- Broad working knowledge of Teradata database architecture, feature functionality, networking concepts, cloud infrastructure and enterprise server/storage components.
- Ability to leverage holistic knowledge of the Teradata product architecture to quickly identify appropriate subject matter experts in the support and Product Development organization to resolve critical incidents based on problem type and affected subsystem/component.
- Effective in communicating and collaborating across organizations to influence strategic investments around availability and supportability.
- Demonstrated communication skills to negotiate internally and provide input to customer-facing roles to address complex problems.
- Capable of leading technical resources across functional areas and aligning senior management support where required.
- In-depth understanding of how the Critical Support Office integrates with regional customer-facing and technology groups.
Key Responsibilities of the TIM Lead / Teradata Critical Support Office:
- Charged with initially formalizing processes and procedures for the new organization based on direction from the Senior Director, CSO organization.
- Guiding the organization to instrument key availability metrics and interpreting the results to drive organizational process improvements.
- Responsible for ensuring TIMs in all three global regions follow consistent guidelines, procedures and problem-solving methodologies to ensure consistent and expedient return to service.
- Providing technical insight from critical incidents to Product Development via the Problem Resolution process to design in availability as a core feature.
- Escalation point for the global TIM team, providing technical guidance to support and engineering staff based on Teradata product architecture, the current state of the customer system, support and engineering analysis of the problem, and customer requirements for return to service.
- Acting as CSO liaison with other CS and Product Development organizations, driving collaboration to improve the company-wide culture around engaging critical incidents and improving customer experience.
- Central member of the regional VP's extended staff, better aligning critical customer support with regional field roles.
- Handling Severity 1 (S1) incidents, critical customers, and complex problems impacting customer systems.
- Central point for incident declaration, classification (priority), and triggers for SWARM.
- Ensuring appropriate skill sets and incoming signals are aligned with SWARM triggers and automated with tools for notifications (this includes Security teams, Vendor teams, and, in some cases, client technical teams may also be involved)
- Technically leads all aspects of critical incidents, focused on the fastest service restoration/recovery using a SWARM approach: bridge, Slack channels, and sync-points for sub-tech teams leading investigations (including 3rd-party vendors and cloud providers).
- Responsible for the quality and integrity of Critical Incident Management process and is the interface with SWARM members, communication manager, and problem manager.
- Supports all SWARM activities globally when problems require the team's deep technical and problem resolution skills; this may include working across regions with other TIMs to provide 24x7 coverage.
- Performs Critical Incident follow-up through Post Incident Reviews (post-mortems) in concert with Problem Managers.
- Participate in proactive design of new and innovative ways of simplifying future SWARM activities
- Clearly identifies and drives backlog prioritization with the appropriate development teams to improve availability and increase mean-time-between-failures.
- Success factors and metrics will be visibly focused on mean-time-to-restore-service (MTTRS) and mean-time-between-failure (MTBF) and underlying KPIs in the SWARM, RCA, and improvement backlogs.
- Acts as the voice/conscience of the customer experience, and administers problem solving with customer advocacy front of mind.
Key Functions of the Technical Incident Manager Lead:
- Empowered to ensure that the Technical Incident Manager teams work effectively and efficiently, both internally and externally with Product Development and global support associates.
- Communicate and advocate effectively with all levels of roles, in all geographies, across the entire company.
- Assume leadership responsibility during an S1 to direct the SWARM team as they work towards service restoration.
- Lead S1 team calls, determine the SMEs needed, identify the problem, and release/de-escalate after diagnosis.
- Ensure incident management processes are efficient and automated for triggers, data collection, and diagnostics, and that artifacts, including timelines and decision trees, are streamlined into the incident record.
- Ensure SWARM team meets resolution specifications as designed in the SLA while also enabling reduction of mean time to resolution
- Work with problem managers and the appropriate development teams as required to evolve monitoring/logging systems.
- Identify failure points affecting availability, and opportunities to reduce mean-time-to-repair, across architecture, design, process, software disciplines, testing, etc.
- Interact frequently with various stakeholders across the organization to prioritize backlog for availability as required.
- Build strong internal and external relationships with technical teams, customers and third parties
- Serve as a key contributor to post-mortem reviews as an SME
- Customer Advocate - focus on what is deemed to be best outcome for the customer
- Is NOT responsible for communication planning or execution to the internal or external stakeholders; actively participates and interfaces with the Incident Communication Manager as needed
Skills & Qualifications
- Demonstrated strategic and tactical thinking, quantitative and analytical skills, while under pressure
- Knowledge of and exposure to distributed systems in hyper-scale, cloud-based environments
- Strong analytical acumen, communication, and presentation skills necessary for building relationships with business leaders and teams across the organization to deliver timely, collaborative solutions.
- Working knowledge of physical IT infrastructures such as Enterprise Server Platforms and related IT architectures and equipment
- Solid understanding of large-scale networking, including OSI Model, DNS, WINS, TCP/IP, VLANs, DHCP, Routing, ACLs, switching protocols, etc.
- Understanding and knowledge of physical datacenters and their related infrastructure or resources such as power, rack space, CE Infrastructures (e.g. UPS, Generators, AHU) etc.
- Flexibility and willingness to support a 24x7 global operation via off-hours support, on-call availability, or other as needed per rhythm and needs of the business
- Working knowledge of ITIL incident, problem, and change management components
- Excellent problem resolution, judgment, negotiation and decision-making skills
- Practical experience with incident/outage and crisis management
- Ability to balance competing demands for resources and adapt to changing priorities
- Excellent written and oral communication skills, with special focus on customer/client level interaction
- Operations experience in a 24x7x365 support model (NOC experience beneficial)
Preferred Skills & Experience
- 10+ years of experience in software/hardware development or key technical support roles demonstrating increasing technical ability and leadership responsibility.
- BS in Computer Science, math, or equivalent education, with experience in database development and applications, data warehousing operations, and analytical software applications or ecosystems.
State: Greater London
Community / Marketing Title: Lead Technical Incident Manager
Job Category: Customer Support
With all the investments made in analytics, it’s time to stop buying into partial solutions that overpromise and underdeliver. It’s time to invest in answers. Only Teradata leverages all of the data, all of the time, so that customers can analyze anything, deploy anywhere, and deliver analytics that matter most to them. And we do it at scale, on-premises, in the Cloud, or anywhere in between.
We call this Pervasive Data Intelligence. It’s the answer to the complexity, cost, and inadequacy of today’s analytics. And it's the way Teradata transforms how businesses work and people live through the power of data throughout the world. Join us and help create the era of Pervasive Data Intelligence.