Chandra Alla.

Building memory into machines. Shipping enterprise AI. While everyone scales parameters, I'm optimizing what the model chooses to forget.
Based in Dallas, TX
Now at AT&T AI Foundry
About

Building AI that thinks and ships.

I'm pioneering AI-driven enterprise automation that transforms how large organizations operate. Currently at AT&T, I'm building systems that learn from human experts and autonomously execute complex workflows — turning what once took hours into automated processes that complete in minutes.

My work centers on multi-agent AI systems powered by LangGraph orchestration, Retrieval-Augmented Generation (RAG), and vector databases. When a domain expert records a workflow, my systems analyze it, generate semantic embeddings, and store it as reusable intelligence.
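The record → embed → retrieve loop described above can be sketched as follows. This is a toy illustration only: the hash-based embedder and in-memory store stand in for a real embedding model and vector database (e.g. ChromaDB), and all names are hypothetical.

```python
import hashlib
import math

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy stand-in for a real embedding model: hash character
    trigrams into a fixed-size vector, then L2-normalize."""
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

class WorkflowStore:
    """In-memory stand-in for a vector database of recorded workflows."""
    def __init__(self):
        self.entries: dict[str, tuple[list[float], str]] = {}

    def add(self, workflow_id: str, transcript: str) -> None:
        # A recorded expert workflow is embedded and stored once,
        # then reused as retrievable intelligence.
        self.entries[workflow_id] = (embed(transcript), transcript)

    def query(self, request: str, k: int = 1) -> list[str]:
        # Rank stored workflows by dot product against the request
        # embedding (vectors are unit-normalized, so this is cosine).
        q = embed(request)
        scored = sorted(
            self.entries.items(),
            key=lambda kv: -sum(a * b for a, b in zip(q, kv[1][0])),
        )
        return [wid for wid, _ in scored[:k]]

store = WorkflowStore()
store.add("reset-password", "open admin console, locate user, issue password reset email")
store.add("provision-vm", "request quota, create vm, attach disk, register dns")
```

A new request such as "how do I reset a user password" then retrieves the closest stored workflow rather than starting from scratch.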

With deep expertise in production ML systems, GPU optimization, and distributed computing, I build scalable AI solutions at the intersection of cutting-edge research and real-world deployment, delivering measurable enterprise impact.


Education

Master of Science in Computer Science
University of Texas at Arlington
  • Advanced Machine Learning · Data Structures & Algorithms · Big Data · AI · Cloud Computing · Neural Networks
Graduate Diploma in Deep Learning
University of Texas at Arlington
  • Neural Networks · Computer Vision · Data Analysis & Modeling Techniques
B.Tech in Computer Science and Engineering
REVA University
Experience

Professional history.

GenAI Engineer
AT&T AI Foundry
  • Built production-grade GenAI systems spanning enterprise workflow automation and schema-aware NL-to-SQL intelligence for internal business users.
  • Designed LangGraph orchestration, dynamic tool binding, human gating, evaluation pipelines, and failure recovery for real enterprise workflows.
  • Shipped systems with async control flow, observability, and business-facing outputs over internal tools, PostgreSQL metadata, and Snowflake data.
Selected AT&T Systems
Enterprise Agent Platform
AI Agentic Foundry

Production-grade multi-agent orchestration platform that automates enterprise workflows with stateful execution, tool calling, human-in-the-loop gating, and self-healing remediation.

85+ workflows · Interrupt / resume · Async gates
Problem

Enterprise workflows are multi-step, non-deterministic, and dependent on multiple systems; they fail on bad inputs and API issues, and frequently stall on approval requirements. Rule-based automation breaks quickly on these edge cases.

Solution

Designed a LangGraph-based 7-stage pipeline that decomposes requests into specialized agent responsibilities, isolates context, binds tools dynamically, pauses for approvals, and resumes after human or system events.

Architecture
Retrieval → Planning → Validation → Execution → Remediation → Gates → Summary
Highlights
  • Dynamic tool registry with runtime discovery via /v1/tools, category-based routing, and improved tool-selection accuracy during execution.
  • Error remediation agent captures failed step context, tool responses, and error messages to generate corrective actions instead of naive retries.
  • Human, timed, and system gates implemented with LangGraph interrupt/resume patterns and Azure Durable Functions.
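The interrupt/resume gating described above can be illustrated with a plain-Python generator: each `yield` is a pause point where execution stops until a human or system event arrives. This sketches only the control-flow idea, not the actual LangGraph or Durable Functions APIs, and all names are illustrative.

```python
def enterprise_workflow(request: str):
    """Generator-based sketch: `yield` marks an interrupt point where
    execution pauses until an external event resumes it."""
    plan = f"plan for: {request}"
    # Human gate: pause and surface the plan for approval.
    approved = yield {"gate": "human_approval", "payload": plan}
    if not approved:
        return {"status": "rejected", "plan": plan}
    # Execution continues only after the approval event arrives.
    return {"status": "done", "result": f"executed {plan}"}

class GateRunner:
    """Drives a workflow to its next gate, parks it, resumes on events."""
    def __init__(self):
        self.paused = {}

    def start(self, run_id: str, request: str):
        wf = enterprise_workflow(request)
        gate = next(wf)              # run until the first interrupt
        self.paused[run_id] = wf     # checkpoint (in-memory here)
        return gate

    def resume(self, run_id: str, event):
        wf = self.paused.pop(run_id)
        try:
            return wf.send(event)    # feed the approval back in
        except StopIteration as stop:
            return stop.value        # workflow finished
```

In production the parked state lives in a durable store rather than process memory, which is what the Durable Functions layer provides.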
Impact
  • Enabled automation across multiple enterprise workflows with production-grade async orchestration.
  • Reduced manual intervention through self-healing execution and approval-aware control flow.
  • Improved determinism with context isolation, tool binding, and structured stage transitions.
Stack
Python · FastAPI · LangGraph · LangChain · OpenAI / Azure OpenAI · PostgreSQL · ChromaDB · Docker · AKS · LangSmith
Enterprise Data Intelligence
Customer Insights Agent (CIA)

Natural-language intelligence layer over enterprise Snowflake data that lets business users query complex datasets without writing SQL, while still returning structured, business-ready answers.

23+ tables · Schema-aware RAG · 85+ golden cases
Problem

Business users do not know SQL, schema design, or join logic, while enterprise data is spread across 23+ related tables with non-trivial relationships and domain-specific business rules.

Solution

Built a LangGraph NL-to-SQL pipeline that retrieves schema context, resolves joins dynamically, triages unsupported requests, routes to an Ask Data API for SQL generation and execution, and summarizes the output for non-technical users.

Architecture
Setup → Intent → Table RAG → Dataset Select → Triage → Enrichment → SQL + Execute → Summary → Persist
Highlights
  • Schema-aware RAG over table metadata using pgvector with HNSW indexing for fast retrieval of relevant tables.
  • ER relationship model for keys like CUSTOMER_KEY, FAN_ID, and ACCT_ID to support multi-table reasoning.
  • Answerability gate catches unsupported or missing-data scenarios before wasting cost on downstream SQL generation.
  • Five-stage evaluation system with 85+ golden cases, per-table analysis, and LLM-as-judge scoring for SQL and answer quality.
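A minimal sketch of the schema-retrieval and answerability-gate idea: rank candidate tables against the question, and refuse before SQL generation when nothing in the schema matches. Keyword overlap stands in for pgvector embedding similarity; the table metadata and threshold are purely illustrative.

```python
TABLE_DOCS = {
    # Illustrative schema metadata; real table names and columns differ.
    "CUSTOMER_DIM": "customer profile name segment region customer_key",
    "BILLING_FACT": "invoice amount billing cycle charges customer_key acct_id",
    "USAGE_FACT": "data usage minutes sessions device fan_id",
}

def score(question: str, doc: str) -> float:
    """Keyword-overlap stand-in for embedding similarity (pgvector in prod)."""
    q, d = set(question.lower().split()), set(doc.split())
    return len(q & d) / max(len(q), 1)

def route(question: str, k: int = 2, threshold: float = 0.15):
    """Retrieve top-k candidate tables; gate out unanswerable questions
    before any SQL generation cost is incurred."""
    ranked = sorted(TABLE_DOCS, key=lambda t: -score(question, TABLE_DOCS[t]))
    best = score(question, TABLE_DOCS[ranked[0]])
    if best < threshold:
        return {"answerable": False, "reason": "no matching schema context"}
    return {"answerable": True, "tables": ranked[:k]}
```

The top-k tables (plus their ER relationships) then become the context handed to SQL generation, while off-domain questions exit early with an explanation.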
Impact
  • Enabled non-technical users to query enterprise data without depending on analysts for every question.
  • Improved SQL reliability through schema retrieval, join modeling, and structured evaluation loops.
  • Scaled the system across multiple business domains with a cleaner separation between orchestration and execution.
Stack
Python · FastAPI · LangGraph · PostgreSQL · pgvector · Snowflake · LiteLLM · Azure OpenAI · Ragas · Docker
Automation Engineer
Microsoft
  • Architected event-driven multi-agent framework on AKS with LangGraph + Azure Functions — 99.92% availability
  • Deployed GPU model serving with Triton + vLLM: ~2× throughput, 35–50% lower cost/1k tokens
  • Built hybrid retrieval system; citation pass rate from 69% → 92%, unsupported answers down 30%
AI Engineer
Optum
  • Built HIPAA/GDPR-compliant drug-discovery platform unifying PDB/ChEMBL/PubChem, TCGA/GEO, ClinicalTrials.gov
  • Accelerated target discovery ~20% using AlphaFold-style structural features with genomic Transformers
  • GraphSAGE/GAT + BioBERT/Med7 for trial recruitment; sub-second updates via Kafka + Flink
Machine Learning Engineer
Quora
  • Production recommender stack: two-stage ANN retrieval (FAISS/Pinecone) + learned ranking
  • Real-time feature pipelines at scale — Kafka/Flink with online/offline feature store parity
  • Multi-task learning optimizing CTR, read time, quality, and fairness simultaneously
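The two-stage retrieve-then-rank pattern above can be sketched as follows. Exact cosine search stands in for an ANN index (FAISS/Pinecone would make stage one sublinear at scale), and the hand-set linear weights stand in for a trained multi-task ranker; all values are illustrative.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def retrieve(query_vec, items, k=3):
    """Stage 1: candidate generation. Exact search here stands in
    for an ANN index over millions of items."""
    return sorted(items, key=lambda it: -cosine(query_vec, it["vec"]))[:k]

# Stage 2 weights: a hand-set linear model standing in for a trained
# ranker that trades off relevance, quality, and freshness signals.
WEIGHTS = {"sim": 0.6, "quality": 0.3, "recency": 0.1}

def rank(query_vec, candidates):
    """Stage 2: learned ranking over the small candidate set."""
    def s(it):
        return (WEIGHTS["sim"] * cosine(query_vec, it["vec"])
                + WEIGHTS["quality"] * it["quality"]
                + WEIGHTS["recency"] * it["recency"])
    return sorted(candidates, key=s, reverse=True)

items = [
    {"id": "a", "vec": [1.0, 0.0], "quality": 0.2, "recency": 0.9},
    {"id": "b", "vec": [0.9, 0.1], "quality": 0.9, "recency": 0.5},
    {"id": "c", "vec": [0.0, 1.0], "quality": 0.8, "recency": 0.1},
]
top = rank([1.0, 0.0], retrieve([1.0, 0.0], items, k=2))
```

Note how the ranker reorders the retrieval candidates: the most similar item is not necessarily the best answer once quality signals are weighed in.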
Skills

Tech stack.

Python · Core Language
PyTorch · Model Training
TensorFlow · Deep Learning
Keras · Model API
LangChain · LLM Orchestration
LlamaIndex · Data Framework
LiteLLM · Model Gateway
Hugging Face · Models & Tokenizers
CUDA · GPU Acceleration
vLLM · Inference Serving
RAG · Retrieval Pattern
Guardrails AI · Safety Layer
MCP · Tool Protocol
Qdrant · Vector Database
FAISS · Vector Search
Pinecone · Vector Database
ChromaDB · Embedding Store
AWS · Cloud Platform
Azure · Enterprise Cloud
Docker · Containers
Kafka · Streaming
PostgreSQL · Relational Data
Redis · Cache Layer
MLflow · MLOps Tracking
Git · Version Control
Go · Systems Work
C++ · Performance
Java · Backend Systems
Research

Publications & papers.

BudgetMem: Selective Memory Policies for Cost-Efficient Long-Context Processing
  • 72.4% memory reduction, 99% accuracy retained (1% F1 drop) on long-document benchmarks
  • Validated on $10/month hardware (Google Colab) — democratizes long-context AI without GPU infrastructure
Low-Level GPU Profiling & Kernel Debugging
  • Profiling playbook: nvprof, Nsight Systems/Compute, NCCL traces — warp divergence, uncoalesced memory, launch overhead
  • Kernel fusion, shared-mem tiling, async copy, CUDA Graphs → double-digit latency reductions on inference workloads
Automation Pipelines
  • Event-driven multi-agent pipeline with sagas, retries, idempotent checkpoints — hours to minutes
  • Offline eval harness (pass@k, citation correctness, hallucination rate); SME prep time cut ~60%
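One building block of the saga-style pipeline above, idempotent checkpointing with bounded retries, can be sketched in a few lines. The decorator, in-memory checkpoint store, and error type are all illustrative stand-ins; in a full saga, failures past the retry budget would trigger compensating actions.

```python
import functools

CHECKPOINTS = {}  # step name -> saved result (a durable store in production)

class TransientError(Exception):
    """Retryable failure, e.g. an upstream API timeout."""

def idempotent_step(name: str, retries: int = 3):
    """Decorator: skip a step whose result is already checkpointed,
    and retry transient failures up to a bounded budget."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if name in CHECKPOINTS:          # replay-safe: already done
                return CHECKPOINTS[name]
            last = None
            for _ in range(retries):
                try:
                    result = fn(*args, **kwargs)
                    CHECKPOINTS[name] = result
                    return result
                except TransientError as e:
                    last = e
            raise last
        return wrapper
    return deco

calls = {"n": 0}

@idempotent_step("provision", retries=3)
def provision():
    # Flaky step: fails twice with a transient error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("api timeout")
    return "vm-123"
```

Replaying the whole pipeline after a crash re-invokes every step, but checkpointed steps return their saved result instead of executing twice.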
Projects

Key builds.

Custom Transformer Training Pipeline
Transformer LM from scratch in PyTorch · DDP across multiple GPUs · custom data preprocessing pipeline
GPU Kernel Optimization Framework
nvprof + Nsight profiling · custom CUDA kernels · 25% inference latency reduction
Multi-Agent Automation System
LangGraph event-driven pipeline · saga pattern · 99.92% availability · autonomous remediation
Hybrid RAG Retrieval System
Azure AI Search + FAISS · 92% citation accuracy · NLI hallucination guardrails

Blog

Coming soon — insights on AI/ML engineering, GPU optimization, and production agentic systems.