Senior Software Engineer – AI Systems - 219502

Full Time

Hybrid

Hyderabad, Telangana, India

Posted within last 24 Hours

Our Company:

At Teradata, we believe that people thrive when empowered with better information. That’s why we built the most complete cloud analytics and data platform for AI. By delivering harmonized data, trusted AI, and faster innovation, we uplift and empower our customers—and our customers’ customers—to make better, more confident decisions. The world’s top companies across every major industry trust Teradata to improve business performance, enrich customer experiences, and fully integrate data across the enterprise.

What you will do:

In this role, you will lead a critical and highly visible function within the Teradata Vantage platform. You will be given the opportunity to autonomously drive the technical direction of the service and the feature roadmap. You will work with extraordinary talent and have the opportunity to shape the team to best execute on the product.

Job Responsibilities:

Design, develop, and scale intelligent software systems that power autonomous AI agents capable of reasoning, planning, acting, and learning in real-world environments.
Lead the implementation of core Agentic AI components — including agent memory, context-aware planning, multi-step tool use, and self-reflective behavior loops.
Architect robust, cloud-native backends that support high-throughput agent and model pipelines across major Cloud Service Providers (AWS, Azure, GCP), ensuring best-in-class observability, fault tolerance, and scalability.
Take part in the development of the Teradata AI Factory platform for training, fine-tuning, and serving state-of-the-art AI models with maximum performance and efficiency.
Design and build solutions for scheduling and orchestrating large-scale AI training and inference workloads on GPU clusters, ensuring optimal resource utilization and throughput.
Build seamless integrations with large language models (LLMs) such as GPT-4, Claude, Gemini, or open-source models — using advanced techniques like function calling, dynamic prompting, and multi-agent orchestration.
Design and implement standardized context management and sharing using the Model Context Protocol (MCP) to enable consistent, interoperable agent and tool interactions.
Develop scalable APIs and services connecting agents and models with internal tools, vector databases, RAG pipelines, and external APIs.
Explore and develop solutions for complex distributed AI challenges such as industry-scale resource management, GPU scheduling, performance prediction, and live workload migration.
Collaborate across hardware, software, and research teams to deliver end-to-end AI infrastructure, while mentoring peers and promoting best engineering practices.
Define and implement testing strategies to validate both deterministic and probabilistic agentic behavior.
Continuously evaluate emerging frameworks, libraries, and research to drive innovation in our Agentic AI and AI Infrastructure stack.
Own technical delivery of major features, leading design reviews, ensuring code quality, and driving engineering excellence across the team.

What makes you a qualified candidate:

5+ years of hands-on experience in backend development, distributed systems, or AI infrastructure, with a proven record of delivering high-scale, production-grade systems.
Strong expertise in AI/ML training and inference, with experience deploying models at scale and applying them to real-world use cases such as chatbots, RAG pipelines, and vector search.
Expertise in building and deploying AI-integrated software, including MCP-based integrations, particularly with LLMs and frameworks such as LangChain, LangGraph, AutoGen, CrewAI, Semantic Kernel, or custom orchestrators.
Hands-on experience training or fine-tuning generative AI models on large-scale GPU clusters, and familiarity with resource scheduling and distributed job orchestration.
Strong development skills in Python, Go, Java, or similar languages used in intelligent system design.
Practical understanding of agentic AI principles — including task decomposition, autonomous decision-making, memory/context management, and multi-agent collaboration.
Familiarity with deep learning frameworks such as PyTorch, TensorFlow, JAX, TRT-LLM, vLLM, or SGLang, and with inference-serving frameworks like Triton Inference Server, TensorRT, ONNX Runtime, or
Strong background in DevOps/MLOps technologies including Docker, Kubernetes, Terraform, and Ansible, with experience managing GPU-based cloud infrastructure.
Knowledge of vector databases (Pinecone, Weaviate, FAISS) and embedding models for semantic search and retrieval-augmented generation (RAG).
Proven ability to design clean APIs, modular microservices, and scalable, maintainable backend systems.
Clear communicator who can translate complex AI system behaviors into practical architectures.
Passion for AI innovation and a drive to build systems that push the boundaries of autonomous and intelligent software.
Strong understanding of Agile software development, CI/CD, and collaborative engineering workflows.

What you will bring:

BS or MS degree in Computer Science, Artificial Intelligence, Software Engineering, or a related technical field.
A solid foundation in software engineering principles, including system design, data structures, algorithms, and distributed computing.
Proven ability to work in a fast-paced, innovation-driven environment where engineers take full ownership from concept to deployment.
Experience deploying and operating intelligent systems in production environments with live users and evolving requirements.
Curiosity, creativity, and the mindset of a builder — someone who thrives at the intersection of AI research and real-world impact.
Desire to help shape the future of software agents by building scalable, reliable, and intelligent backends that unlock new capabilities in autonomy and adaptability.

Why We Think You’ll Love Teradata

We prioritize a people-first culture because we know our people are at the very heart of our success. We embrace a flexible work model because we trust our people to make decisions about how, when, and where they work. We focus on well-being because we care about our people and their ability to thrive both personally and professionally. We are committed to actively working to foster an inclusive environment that celebrates people for all of who they are.