Chandra Alla.

Building memory into machines. Shipping enterprise AI. While everyone scales parameters, I'm optimizing what the model chooses to forget.
Based in Dallas, TX
Now at AT&T AI Foundry
About

Building AI that thinks and ships.

I'm pioneering AI-driven enterprise automation that transforms how large organizations operate. Currently at AT&T, I'm building systems that learn from human experts and autonomously execute complex workflows — turning what once took hours into automated processes that complete in minutes.

My work centers on multi-agent AI systems powered by LangGraph orchestration, Retrieval-Augmented Generation (RAG), and vector databases. When a domain expert records a workflow, my systems analyze it, generate semantic embeddings, and store it as reusable intelligence.
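The record → embed → retrieve loop described above can be sketched as follows. This is a toy illustration only: the hash-based embedder and in-memory store stand in for a real embedding model and vector database (e.g. ChromaDB), and all names are hypothetical.

```python
import hashlib
import math

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy stand-in for a real embedding model: hash character
    trigrams into a fixed-size vector, then L2-normalize."""
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

class WorkflowStore:
    """In-memory stand-in for a vector database of recorded workflows."""
    def __init__(self):
        self.entries: dict[str, tuple[list[float], str]] = {}

    def add(self, workflow_id: str, transcript: str) -> None:
        # A recorded expert workflow is embedded and stored once,
        # then reused as retrievable intelligence.
        self.entries[workflow_id] = (embed(transcript), transcript)

    def query(self, request: str, k: int = 1) -> list[str]:
        # Rank stored workflows by dot product against the request
        # embedding (vectors are unit-normalized, so this is cosine).
        q = embed(request)
        scored = sorted(
            self.entries.items(),
            key=lambda kv: -sum(a * b for a, b in zip(q, kv[1][0])),
        )
        return [wid for wid, _ in scored[:k]]

store = WorkflowStore()
store.add("reset-password", "open admin console, locate user, issue password reset email")
store.add("provision-vm", "request quota, create vm, attach disk, register dns")
```

A new request such as "how do I reset a user password" then retrieves the closest stored workflow rather than starting from scratch.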

With deep expertise in production ML systems, GPU optimization, and distributed computing, I build scalable AI solutions at the intersection of cutting-edge research and real-world deployment, delivering measurable enterprise impact.


Education

Master of Science in Computer Science
University of Texas at Arlington
  • Advanced Machine Learning · Data Structures & Algorithms · Big Data · AI · Cloud Computing · Neural Networks
Graduate Diploma in Deep Learning
University of Texas at Arlington
  • Neural Networks · Computer Vision · Data Analysis & Modeling Techniques
B.Tech in Computer Science and Engineering
REVA University
Experience

Professional history.

GenAI Engineer
AT&T AI Foundry
  • Built production-grade GenAI systems spanning enterprise workflow automation and schema-aware NL-to-SQL intelligence for internal business users.
  • Designed LangGraph orchestration, dynamic tool binding, human gating, evaluation pipelines, and failure recovery for real enterprise workflows.
  • Shipped systems with async control flow, observability, and business-facing outputs over internal tools, PostgreSQL metadata, and Snowflake data.
Selected AT&T Systems
Enterprise Agent Platform
AI Agentic Foundry

Production-grade multi-agent orchestration platform that automates enterprise workflows with stateful execution, tool calling, human-in-the-loop gating, and self-healing remediation.

85+ workflows · Interrupt / resume · Async gates
Problem

Enterprise workflows are multi-step, non-deterministic, and dependent on multiple systems; they fail on bad inputs and API issues, and frequently stall on approval requirements. Rule-based automation breaks quickly on these edge cases.

Solution

Designed a LangGraph-based 7-stage pipeline that decomposes requests into specialized agent responsibilities, isolates context, binds tools dynamically, pauses for approvals, and resumes after human or system events.

Architecture
Retrieval → Planning → Validation → Execution → Remediation → Gates → Summary
Highlights
  • Dynamic tool registry with runtime discovery via /v1/tools, category-based routing, and improved tool-selection accuracy during execution.
  • Error remediation agent captures failed step context, tool responses, and error messages to generate corrective actions instead of naive retries.
  • Human, timed, and system gates implemented with LangGraph interrupt/resume patterns and Azure Durable Functions.
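The interrupt/resume gating described above can be illustrated with a plain-Python generator: each `yield` is a pause point where execution stops until a human or system event arrives. This sketches only the control-flow idea, not the actual LangGraph or Durable Functions APIs, and all names are illustrative.

```python
def enterprise_workflow(request: str):
    """Generator-based sketch: `yield` marks an interrupt point where
    execution pauses until an external event resumes it."""
    plan = f"plan for: {request}"
    # Human gate: pause and surface the plan for approval.
    approved = yield {"gate": "human_approval", "payload": plan}
    if not approved:
        return {"status": "rejected", "plan": plan}
    # Execution continues only after the approval event arrives.
    return {"status": "done", "result": f"executed {plan}"}

class GateRunner:
    """Drives a workflow to its next gate, parks it, resumes on events."""
    def __init__(self):
        self.paused = {}

    def start(self, run_id: str, request: str):
        wf = enterprise_workflow(request)
        gate = next(wf)              # run until the first interrupt
        self.paused[run_id] = wf     # checkpoint (in-memory here)
        return gate

    def resume(self, run_id: str, event):
        wf = self.paused.pop(run_id)
        try:
            return wf.send(event)    # feed the approval back in
        except StopIteration as stop:
            return stop.value        # workflow finished
```

In production the parked state lives in a durable store rather than process memory, which is what the Durable Functions layer provides.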
Impact
  • Enabled automation across multiple enterprise workflows with production-grade async orchestration.
  • Reduced manual intervention through self-healing execution and approval-aware control flow.
  • Improved determinism with context isolation, tool binding, and structured stage transitions.
Stack
Python · FastAPI · LangGraph · LangChain · OpenAI / Azure OpenAI · PostgreSQL · ChromaDB · Docker · AKS · LangSmith
Enterprise Data Intelligence
Customer Insights Agent (CIA)

Natural-language intelligence layer over enterprise Snowflake data that lets business users query complex datasets without writing SQL, while still returning structured, business-ready answers.

23+ tables · Schema-aware RAG · 85+ golden cases
Problem

Business users do not know SQL, schema design, or join logic, while enterprise data is spread across 23+ related tables with non-trivial relationships and domain-specific business rules.

Solution

Built a LangGraph NL-to-SQL pipeline that retrieves schema context, resolves joins dynamically, triages unsupported requests, routes to an Ask Data API for SQL generation and execution, and summarizes the output for non-technical users.

Architecture
Setup → Intent → Table RAG → Dataset Select → Triage → Enrichment → SQL + Execute → Summary → Persist
Highlights
  • Schema-aware RAG over table metadata using pgvector with HNSW indexing for fast retrieval of relevant tables.
  • ER relationship model for keys like CUSTOMER_KEY, FAN_ID, and ACCT_ID to support multi-table reasoning.
  • Answerability gate catches unsupported or missing-data scenarios before wasting cost on downstream SQL generation.
  • Five-stage evaluation system with 85+ golden cases, per-table analysis, and LLM-as-judge scoring for SQL and answer quality.
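A minimal sketch of the schema-retrieval and answerability-gate idea: rank candidate tables against the question, and refuse before SQL generation when nothing in the schema matches. Keyword overlap stands in for pgvector embedding similarity; the table metadata and threshold are purely illustrative.

```python
TABLE_DOCS = {
    # Illustrative schema metadata; real table names and columns differ.
    "CUSTOMER_DIM": "customer profile name segment region customer_key",
    "BILLING_FACT": "invoice amount billing cycle charges customer_key acct_id",
    "USAGE_FACT": "data usage minutes sessions device fan_id",
}

def score(question: str, doc: str) -> float:
    """Keyword-overlap stand-in for embedding similarity (pgvector in prod)."""
    q, d = set(question.lower().split()), set(doc.split())
    return len(q & d) / max(len(q), 1)

def route(question: str, k: int = 2, threshold: float = 0.15):
    """Retrieve top-k candidate tables; gate out unanswerable questions
    before any SQL generation cost is incurred."""
    ranked = sorted(TABLE_DOCS, key=lambda t: -score(question, TABLE_DOCS[t]))
    best = score(question, TABLE_DOCS[ranked[0]])
    if best < threshold:
        return {"answerable": False, "reason": "no matching schema context"}
    return {"answerable": True, "tables": ranked[:k]}
```

The top-k tables (plus their ER relationships) then become the context handed to SQL generation, while off-domain questions exit early with an explanation.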
Impact
  • Enabled non-technical users to query enterprise data without depending on analysts for every question.
  • Improved SQL reliability through schema retrieval, join modeling, and structured evaluation loops.
  • Scaled the system across multiple business domains with a cleaner separation between orchestration and execution.
Stack
Python · FastAPI · LangGraph · PostgreSQL · pgvector · Snowflake · LiteLLM · Azure OpenAI · Ragas · Docker
Automation Engineer
Microsoft
  • Architected event-driven multi-agent framework on AKS with LangGraph + Azure Functions — 99.92% availability
  • Deployed GPU model serving with Triton + vLLM: ~2× throughput, 35–50% lower cost/1k tokens
  • Built hybrid retrieval system; citation pass rate from 69% → 92%, unsupported answers down 30%
AI Engineer
Optum
  • Built HIPAA/GDPR-compliant drug-discovery platform unifying PDB/ChEMBL/PubChem, TCGA/GEO, ClinicalTrials.gov
  • Accelerated target discovery ~20% using AlphaFold-style structural features with genomic Transformers
  • GraphSAGE/GAT + BioBERT/Med7 for trial recruitment; sub-second updates via Kafka + Flink
Machine Learning Engineer
Quora
  • Production recommender stack: two-stage ANN retrieval (FAISS/Pinecone) + learned ranking
  • Real-time feature pipelines at scale — Kafka/Flink with online/offline feature store parity
  • Multi-task learning optimizing CTR, read time, quality, and fairness simultaneously
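The two-stage retrieve-then-rank pattern above can be sketched as follows. Exact cosine search stands in for an ANN index (FAISS/Pinecone would make stage one sublinear at scale), and the hand-set linear weights stand in for a trained multi-task ranker; all values are illustrative.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def retrieve(query_vec, items, k=3):
    """Stage 1: candidate generation. Exact search here stands in
    for an ANN index over millions of items."""
    return sorted(items, key=lambda it: -cosine(query_vec, it["vec"]))[:k]

# Stage 2 weights: a hand-set linear model standing in for a trained
# ranker that trades off relevance, quality, and freshness signals.
WEIGHTS = {"sim": 0.6, "quality": 0.3, "recency": 0.1}

def rank(query_vec, candidates):
    """Stage 2: learned ranking over the small candidate set."""
    def s(it):
        return (WEIGHTS["sim"] * cosine(query_vec, it["vec"])
                + WEIGHTS["quality"] * it["quality"]
                + WEIGHTS["recency"] * it["recency"])
    return sorted(candidates, key=s, reverse=True)

items = [
    {"id": "a", "vec": [1.0, 0.0], "quality": 0.2, "recency": 0.9},
    {"id": "b", "vec": [0.9, 0.1], "quality": 0.9, "recency": 0.5},
    {"id": "c", "vec": [0.0, 1.0], "quality": 0.8, "recency": 0.1},
]
top = rank([1.0, 0.0], retrieve([1.0, 0.0], items, k=2))
```

Note how the ranker reorders the retrieval candidates: the most similar item is not necessarily the best answer once quality signals are weighed in.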
Skills

Tech stack.

Python · Core Language
PyTorch · Model Training
TensorFlow · Deep Learning
Keras · Model API
LangChain · LLM Orchestration
LlamaIndex · Data Framework
LiteLLM · Model Gateway
Hugging Face · Models & Tokenizers
CUDA · GPU Acceleration
vLLM · Inference Serving
RAG · Retrieval Pattern
Guardrails AI · Safety Layer
MCP · Tool Protocol
Qdrant · Vector Database
FAISS · Vector Search
Pinecone · Vector Database
ChromaDB · Embedding Store
AWS · Cloud Platform
Azure · Enterprise Cloud
Docker · Containers
Kafka · Streaming
PostgreSQL · Relational Data
Redis · Cache Layer
MLflow · MLOps Tracking
Git · Version Control
Go · Systems Work
C++ · Performance
Java · Backend Systems
Research

Publications & papers.

BudgetMem: Selective Memory Policies for Cost-Efficient Long-Context Processing
  • 72.4% memory reduction, 99% accuracy retained (1% F1 drop) on long-document benchmarks
  • Validated on $10/month hardware (Google Colab) — democratizes long-context AI without GPU infrastructure
Low-Level GPU Profiling & Kernel Debugging
  • Profiling playbook: nvprof, Nsight Systems/Compute, NCCL traces — warp divergence, uncoalesced memory, launch overhead
  • Kernel fusion, shared-mem tiling, async copy, CUDA Graphs → double-digit latency reductions on inference workloads
Automation Pipelines
  • Event-driven multi-agent pipeline with sagas, retries, idempotent checkpoints — hours to minutes
  • Offline eval harness (pass@k, citation correctness, hallucination rate); SME prep time cut ~60%
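One building block of the saga-style pipeline above, idempotent checkpointing with bounded retries, can be sketched in a few lines. The decorator, in-memory checkpoint store, and error type are all illustrative stand-ins; in a full saga, failures past the retry budget would trigger compensating actions.

```python
import functools

CHECKPOINTS = {}  # step name -> saved result (a durable store in production)

class TransientError(Exception):
    """Retryable failure, e.g. an upstream API timeout."""

def idempotent_step(name: str, retries: int = 3):
    """Decorator: skip a step whose result is already checkpointed,
    and retry transient failures up to a bounded budget."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if name in CHECKPOINTS:          # replay-safe: already done
                return CHECKPOINTS[name]
            last = None
            for _ in range(retries):
                try:
                    result = fn(*args, **kwargs)
                    CHECKPOINTS[name] = result
                    return result
                except TransientError as e:
                    last = e
            raise last
        return wrapper
    return deco

calls = {"n": 0}

@idempotent_step("provision", retries=3)
def provision():
    # Flaky step: fails twice with a transient error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("api timeout")
    return "vm-123"
```

Replaying the whole pipeline after a crash re-invokes every step, but checkpointed steps return their saved result instead of executing twice.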
Projects

Key builds.

Custom Transformer Training Pipeline
Transformer LM from scratch in PyTorch · DDP across multiple GPUs · custom data preprocessing pipeline
GPU Kernel Optimization Framework
nvprof + Nsight profiling · custom CUDA kernels · 25% inference latency reduction
Multi-Agent Automation System
LangGraph event-driven pipeline · saga pattern · 99.92% availability · autonomous remediation
Hybrid RAG Retrieval System
Azure AI Search + FAISS · 92% citation accuracy · NLI hallucination guardrails

Blog

Coming soon — insights on AI/ML engineering, GPU optimization, and production agentic systems.