| Project | What | Impact |
|---|---|---|
| Event Pipeline | Designed self-hosted Kafka + OpenSearch replacing Mixpanel SaaS | System design, phased migration (0% → 100%), SLO preserved, zero data loss |
| Developer Platform | Org-wide one-click service onboarding (ArgoCD + Helm) | Measured deployment toil, then automated it away across all teams |
| Incident Mgmt & SLOs | SLO-driven alerting, 1200+ incidents resolved with RCA | Calibrated thresholds, preventive automation, blameless postmortems |
| LB Consolidation | Designed migration from ingress-nginx to ALB | 50% reduction in operational surface area, zero-disruption cutover |
| China Region | End-to-end AWS China: EKS, VPC, monitoring, compliance | Cross-region distributed system design |
| Headlamp | K8s observability deployed org-wide | Eliminated a class of support requests; devs self-serve |
| Marqet 🔨 | Self-hosted eCommerce analytics: Go API, async workers, scrapers, Next.js | Dual-DB (Postgres OLTP + ClickHouse OLAP), Redis queue, K8s deploy |
| Billdar 🔨 | Full-stack cost intelligence platform (Go + React + ClickHouse) | System design + full-stack engineering |
| GPUC 🔨 | Unified GPU provisioning API with concurrent multi-cloud fetching | Go, REST API, distributed provider abstraction |

@ Ultrahuman (Jul 2025 – Present): Owning end-to-end infrastructure. Go & Python for reliability automation. Multi-region AWS including China.

@ Acko (Jan 2022 – Jul 2025): Built Life Insurance infra from zero. 99.9% uptime SLO. Linkerd service mesh (mTLS, circuit breaking). Kong API Gateway. Graviton migration. Terraform + Ansible automation. Structured on-call with blameless postmortems.*
