TONY LEE

Senior AI / Full-Stack Engineer

Generative AI Engineer at TheKey, US

Senior AI / Full-Stack Engineer with 10 years of experience building and scaling production-grade systems across SaaS and AI-driven platforms. Deep expertise in LLM applications, RAG pipelines, and real-time AI systems, with a strong focus on system design, performance optimization, and multi-tenant architectures. Proven ability to own end-to-end delivery from prototyping to production.

About

I build and scale production-grade AI systems end-to-end, with deep experience applying LLMs in real products. My work centers on reliable RAG pipelines, real-time AI response systems, and robust backends (authentication, session management, multi-tenant access control), with performance and reliability as first-class goals.

Research Interests

LLM Applications, RAG Pipelines, Real-Time AI Systems, LLM Evaluation & Output Tuning, AI Agents / Workflow Automation, System Design & Performance Optimization, Multi-tenant Architectures

Work Experience

TheKey

Jan 2024 - Present

Generative AI Engineer

  • Built LLM-powered features for internal healthcare tools, allowing users to query patient-related data and workflows in natural language instead of navigating multiple systems.
  • Designed and iterated on RAG pipelines combining structured data and unstructured documents, improving retrieval quality and reducing manual lookup.
  • Implemented backend services in Python (FastAPI) to support real-time AI responses, including streaming outputs and session-aware interactions.
  • Integrated embedding-based retrieval using vector search, tuning chunking strategies and retrieval logic to improve consistency across similar queries.
  • Worked closely with product to refine prompt design, response formats, and guardrails, improving reliability and usability in real-world workflows.
  • Implemented authentication, session management, and access control, ensuring secure handling of sensitive healthcare data.
  • Identified bottlenecks in retrieval and generation pipelines, improving latency through async processing and lightweight caching.
  • Structured the system to support extensible AI use cases (chat, summarization, workflow automation) without major refactoring.

Google (Google Cloud)

Jan 2023 - Oct 2023

AI Research Engineer (LLM Systems)

  • Worked on LLM-based pipelines for document processing and semantic search, improving how large datasets are indexed and queried.
  • Built embedding-based retrieval systems, experimenting with indexing strategies and similarity search.
  • Used GCP tools such as Vertex AI and BigQuery to run experiments, process data, and support model-driven workflows.
  • Improved data pipelines for indexing, retrieval, and preparation of model inputs.
  • Helped transition research prototypes into more stable, production-oriented pipelines by improving data handling and system reliability.
  • Collaborated with engineers and researchers to evaluate model outputs and refine retrieval and generation behavior.

HubSpot

Mar 2020 - Dec 2023

AI Full-Stack Engineer

  • Built and maintained backend services using Node.js and Python, supporting high-volume SaaS workflows and integrations across multiple systems.
  • Designed REST APIs for data-heavy operations with a focus on reliability, clear contracts, and long-term maintainability as product requirements evolved.
  • Improved database performance by analyzing slow queries, adding indexes, and optimizing data access patterns in critical services.
  • Introduced Redis-based caching to reduce repeated load on core endpoints and improve response times.
  • Contributed to frontend development using React and Next.js, improving usability and reducing friction in key user flows.
  • Worked on early AI-driven features, including automation and data enrichment, integrating them into existing product workflows.
  • Participated in system design discussions around scaling services and maintaining system reliability.
  • Improved CI/CD pipelines and deployment processes, reducing release issues and increasing consistency.

Squarespace

Jun 2017 - Feb 2020

Software Engineer

  • Developed full-stack features for web applications, including backend APIs and frontend components used in customer-facing products.
  • Built and maintained services for content management systems and user-related workflows.
  • Improved frontend performance by addressing rendering issues and optimizing key UI interactions.
  • Collaborated with product and design teams to deliver features that balanced usability with technical constraints.

AutoZone

Jul 2016 - May 2017

Software Engineer Intern

  • Assisted in building internal tools and web applications as part of a larger engineering team.
  • Supported debugging, testing, and incremental feature development across backend and frontend components.
  • Applied standard development practices including version control, code reviews, and collaborative workflows.

University of Texas at Dallas

Aug 2015 - Dec 2016

Research Assistant

  • Built Python-based tools for data processing and experimentation in research projects.
  • Assisted with early-stage machine learning workflows and data analysis.
  • Supported implementation and documentation of research systems.

Technical Skills

AI / ML

LLMs (OpenAI, Anthropic APIs), RAG, Embedding & Vector Search, Semantic Search, Prompt Engineering, Retrieval Optimization, LLM Evaluation & Output Tuning, AI Agents / Workflow Automation, NLP, Knowledge Augmentation Systems, Context Injection / Memory Handling, LangChain / LLM orchestration.

Backend

Python, FastAPI, Node.js, REST APIs, WebSockets, Async Processing / Concurrency, Microservices, Distributed Systems, Event-Driven Architecture, Authentication & Authorization (RBAC, JWT), Session Management, Multi-tenant systems, System Design & Scalability, Performance Optimization, Caching Strategies.

Frontend

React, Next.js, TypeScript, JavaScript (ES6+), Component Design, Responsive UI Development, Frontend performance optimization.

Data & Storage

PostgreSQL, MongoDB, Redis, Vector Databases (Pinecone, Weaviate, FAISS), BigQuery, Data modeling, Query optimization, Indexing strategies.

Cloud & Infrastructure

AWS, GCP (Vertex AI, BigQuery, Cloud Functions), Docker, CI/CD (GitHub Actions), cloud-native architecture, scalable infrastructure design, deployment & monitoring basics.

Engineering Practices

System design, code review, debugging & troubleshooting, performance profiling, agile/iterative development, cross-functional collaboration, technical decision making.

Education

Bachelor of Science (B.Sc.), Computer Science

The University of Texas at Dallas

2012 - 2016