LLM System Design & Deployment
- Design and implement LLM-powered features end-to-end — from prompt architecture and model selection through API integration and production deployment — with minimal supervision.
- Own prompt engineering for production features: design, version, and systematically evaluate prompts so that behavior regressions are caught across model updates.
- Integrate conversational and agentic AI capabilities into an existing application, owning the API layer, session management, and graceful degradation strategies.
RAG & Retrieval Systems
- Build and maintain RAG pipelines — including chunking strategy, embedding selection, vector store management, and retrieval evaluation — tuned for the application's domain.
- Work across retrieval approaches (dense vector search, hybrid BM25 and dense search, re-ranking) and evaluate the trade-offs among accuracy, latency, and cost.
Agentic Workflows & Orchestration
- Select and apply frameworks (LangChain, LlamaIndex, LangGraph, or custom orchestration) based on concrete trade-offs for the product, not hype.
- Build with and extend MCP (Model Context Protocol) servers for tool integration, external service access, and structured agent communication.
Evaluation & Quality
- Define and run LLM evaluation pipelines — automated metrics, human eval, regression suites — and act on results without waiting for direction.
- Identify prompt regressions, retrieval quality issues, and latency problems early and drive resolution.
Collaboration & Engineering Culture
- Collaborate with backend and frontend engineers as a peer, translating AI capabilities into clean service contracts and integration specs.
- Identify architectural or data quality issues early and escalate when scope warrants.
- Stay current with the LLM ecosystem and bring concrete, well-reasoned proposals for adopting techniques or tooling that address real product problems.
- Contribute to technical documentation and internal best practices, and review code from junior team members.