An end-to-end product retention cohort analysis project that uses SQL, Python, SQLite, Pandas, and data visualization to measure user engagement and retention over time.
This project simulates a real-world product analytics workflow used by product analysts and data analysts in SaaS and consumer tech companies.
## 🧩 Business Problem

Retention is one of the most important metrics for product-led growth because it shows whether users keep coming back after signup.
This project answers questions such as:
- How many users return after signing up?
- In which weeks do we lose the most users?
- Are newer cohorts retaining better than older ones?

We group users into weekly signup cohorts, measure the share of each cohort that is active in each following week, and visualize the results as a heatmap.
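
One standard way to write the retention rate behind each heatmap cell is shown below. The notation is ours, and the exact activity rule used in the notebook (for example, which event types count as "active") is an assumption. Here $C_c$ is the set of users whose signup falls in cohort week $c$, and $w = 0$ is the signup week itself:

```math
R_{c,w} = \frac{\left|\{\, u \in C_c : u \text{ has at least one event in week } w \text{ after signup} \,\}\right|}{\left|C_c\right|}
```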
## 🎯 Objectives

- Generate realistic product usage event data
- Build a clean analytical data model
- Perform cohort retention analysis using SQL
- Visualize retention trends over time
- Extract actionable product insights
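
A minimal sketch of the kind of SQLite data model these objectives call for. The table and column names (`users`, `events`, `signup_date`, `event_date`, `event_type`) and the `create_schema` helper are hypothetical; the actual schema in `data/raw/retention.db` may differ. Keeping one row per user and one row per event is what makes the cohort query a simple join.

```python
import sqlite3

# Hypothetical schema: one row per user, one row per product event.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    user_id     INTEGER PRIMARY KEY,
    signup_date TEXT NOT NULL          -- ISO-8601 date, e.g. '2024-01-01'
);
CREATE TABLE IF NOT EXISTS events (
    event_id    INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    event_date  TEXT NOT NULL,         -- ISO-8601 date of the activity
    event_type  TEXT NOT NULL          -- e.g. 'login', 'feature_use'
);
-- Speeds up per-user, per-date lookups used by cohort queries.
CREATE INDEX IF NOT EXISTS idx_events_user_date ON events(user_id, event_date);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the users/events tables (and index) if they do not already exist."""
    conn.executescript(SCHEMA)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path such as data/raw/retention.db to persist
    create_schema(conn)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    print(tables)  # ['events', 'users']
```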
## 📁 Project Structure

    product-retention-cohort-analysis/
    │
    ├── data/
    │   ├── raw/
    │   │   └── retention.db
    │   └── processed/
    │
    ├── notebooks/
    │   └── 01_retention_cohort.ipynb
    │
    ├── sql/
    │   └── 01_cohort_retention.sql
    │
    ├── src/
    │   └── generate_events.py
    │
    ├── reports/
    │   └── report.md
    │
    ├── requirements.txt
    ├── .gitignore
    └── README.md
## 🛠️ Tech Stack

- Python 3.13
- SQLite
- Pandas
- Seaborn & Matplotlib
- SQL
- Jupyter Notebook
- VS Code
## ⚙️ How to Run

1. Create and activate a virtual environment:
   - `python -m venv .venv`
   - Windows: `.venv\Scripts\activate`; macOS/Linux: `source .venv/bin/activate`
2. Install dependencies: `pip install -r requirements.txt`
3. Generate the dataset: `python src/generate_events.py`
4. Run the analysis: open `notebooks/01_retention_cohort.ipynb` in Jupyter and run all cells.
---
## 📈 Key Analysis Performed
- Weekly signup cohorts
- Active users by cohort and week
- Retention rate calculation
- Cohort retention matrix
- Heatmap visualization
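
The steps above (signup cohorts, active users per cohort and week, retention rates, and the cohort matrix) can be sketched as a single SQLite query run from Python. This is a minimal illustration, not the contents of `sql/01_cohort_retention.sql`. It assumes hypothetical `users(user_id, signup_date)` and `events(user_id, event_date, event_type)` tables with ISO-8601 date strings, cohorts keyed by the Monday of the signup week, and week numbers counted from each user's own signup date; `cohort_retention` is a hypothetical helper name. The resulting `{cohort_week: {week_number: rate}}` dict can be loaded with `pandas.DataFrame.from_dict(..., orient="index")` and passed to `seaborn.heatmap` for the visualization step.

```python
import sqlite3
from collections import defaultdict

# Assumes hypothetical tables users(user_id, signup_date) and events(user_id, event_date, event_type).
COHORT_SQL = """
WITH cohorts AS (            -- weekly signup cohorts, keyed by the Monday of the signup week
    SELECT user_id,
           date(signup_date, 'weekday 0', '-6 days') AS cohort_week
    FROM users
),
activity AS (                -- one row per user per active week since signup
    SELECT DISTINCT
           e.user_id,
           CAST((julianday(e.event_date) - julianday(u.signup_date)) / 7 AS INTEGER) AS week_number
    FROM events AS e
    JOIN users AS u ON u.user_id = e.user_id
    WHERE e.event_date >= u.signup_date
),
cohort_size AS (             -- denominator: users per cohort
    SELECT cohort_week, COUNT(*) AS n_users
    FROM cohorts
    GROUP BY cohort_week
)
SELECT c.cohort_week,
       a.week_number,
       COUNT(DISTINCT a.user_id)                             AS active_users,
       ROUND(1.0 * COUNT(DISTINCT a.user_id) / s.n_users, 4) AS retention_rate
FROM activity AS a
JOIN cohorts AS c     ON c.user_id = a.user_id
JOIN cohort_size AS s ON s.cohort_week = c.cohort_week
GROUP BY c.cohort_week, a.week_number, s.n_users
ORDER BY c.cohort_week, a.week_number;
"""


def cohort_retention(conn: sqlite3.Connection) -> dict[str, dict[int, float]]:
    """Run the cohort query and pivot it into {cohort_week: {week_number: retention_rate}}."""
    matrix: dict[str, dict[int, float]] = defaultdict(dict)
    for cohort_week, week_number, _active_users, rate in conn.execute(COHORT_SQL):
        matrix[cohort_week][week_number] = rate
    return dict(matrix)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users  (user_id INTEGER PRIMARY KEY, signup_date TEXT NOT NULL);
        CREATE TABLE events (user_id INTEGER NOT NULL, event_date TEXT NOT NULL, event_type TEXT NOT NULL);
    """)
    # Toy rows for illustration only (2024-01-01 is a Monday).
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "2024-01-01"), (2, "2024-01-03"), (3, "2024-01-08")])
    conn.executemany("INSERT INTO events VALUES (?, ?, 'login')",
                     [(1, "2024-01-01"), (1, "2024-01-09"), (2, "2024-01-03"),
                      (3, "2024-01-08"), (3, "2024-01-15")])
    for cohort, row in cohort_retention(conn).items():
        print(cohort, row)
    # 2024-01-01 {0: 1.0, 1: 0.5}
    # 2024-01-08 {0: 1.0, 1: 1.0}
```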
---
## 🔍 Key Insights
- Strong **Week-1 retention** across all cohorts
- Significant drop between **Weeks 3–6**
- Later cohorts show **slightly better retention**
- Long-term retention stabilizes around **10–15%**
The **largest retention leakage** happens in early lifecycle weeks.
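
One way to locate where the leakage happens is to average the cohort curves week by week and look for the largest week-over-week drop. The helpers below (`average_curve`, `largest_drop`) are hypothetical and work on a `{cohort: {week: rate}}` dict; the demo numbers are toy values, not results from this project. Averaging only the cohorts that have reached a given week mixes different cohorts at different weeks, and the drop calculation assumes consecutive week indices.

```python
from statistics import mean


def average_curve(matrix: dict[str, dict[int, float]]) -> dict[int, float]:
    """Mean retention per week index, over the cohorts that have data for that week."""
    weeks = sorted({w for row in matrix.values() for w in row})
    return {w: mean(row[w] for row in matrix.values() if w in row) for w in weeks}


def largest_drop(curve: dict[int, float]) -> tuple[int, float]:
    """Return (week, drop) for the largest fall from the previous week's retention.

    Assumes consecutive week indices (0, 1, 2, ...); a gap would compare non-adjacent weeks.
    """
    weeks = sorted(curve)
    if len(weeks) < 2:
        raise ValueError("need at least two weeks of retention data")
    drops = {w: curve[prev] - curve[w] for prev, w in zip(weeks, weeks[1:])}
    week = max(drops, key=drops.get)
    return week, drops[week]


if __name__ == "__main__":
    toy = {  # illustrative values only
        "2024-01-01": {0: 1.0, 1: 0.5, 2: 0.375},
        "2024-01-08": {0: 1.0, 1: 0.75, 2: 0.625},
    }
    curve = average_curve(toy)
    print(curve)                # {0: 1.0, 1: 0.625, 2: 0.5}
    print(largest_drop(curve))  # (1, 0.375)
```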
---
## 💡 Recommendations
- Improve the onboarding experience in the first 7 days
- Trigger engagement nudges between Weeks 2–4
- Run re-engagement campaigns for early churn-risk users
- Track cohort retention weekly to detect regressions early
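
Weekly tracking can be automated with a simple regression check: compare the newest cohort's retention at a given week with the average of the cohorts just before it. This is a hypothetical sketch; `flag_regression`, the four-cohort baseline, and the absolute tolerance of 5 percentage points are arbitrary illustrative choices, and cohort keys are assumed to be ISO dates so that sorting them as strings is chronological.

```python
from statistics import mean


def flag_regression(matrix: dict[str, dict[int, float]], week: int,
                    lookback: int = 4, tolerance: float = 0.05) -> tuple[bool, float, float]:
    """Compare the newest cohort that has reached `week` with the mean of up to
    `lookback` earlier cohorts at that same week.

    Returns (is_regression, latest_rate, baseline_rate).
    """
    # ISO-date keys sort chronologically; keep only cohorts that have data for `week`.
    eligible = [c for c in sorted(matrix) if week in matrix[c]]
    if len(eligible) < 2:
        raise ValueError("need at least two cohorts that have reached this week")
    latest = matrix[eligible[-1]][week]
    baseline = mean(matrix[c][week] for c in eligible[-1 - lookback:-1])
    return latest < baseline - tolerance, latest, baseline


if __name__ == "__main__":
    toy = {  # illustrative values only
        "2024-01-01": {1: 0.5},
        "2024-01-08": {1: 0.5},
        "2024-01-15": {1: 0.25},
    }
    print(flag_regression(toy, week=1))  # (True, 0.25, 0.5)
```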
---
## 🧪 Data Notes
- All data is **synthetically generated**
- No real user data is used
- Designed to mirror realistic SaaS usage patterns
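
For readers who want to build a similar dataset, here is a hypothetical generator in the same spirit. It is not the contents of `src/generate_events.py`: the decay model, the parameter values, and the event types are illustrative assumptions. The signup itself is recorded as an event, so every user counts as active in week 0, and the activity probability decays toward a floor so that long-term retention levels off instead of reaching zero.

```python
import random
import sqlite3
from datetime import date, timedelta


def generate_events(n_users: int = 1000, n_weeks: int = 12, start: date = date(2024, 1, 1),
                    p_first: float = 0.6, decay: float = 0.75, floor: float = 0.1,
                    seed: int = 42) -> tuple[list[tuple], list[tuple]]:
    """Return (users, events) as lists of tuples ready for sqlite3 executemany.

    users:  (user_id, signup_date)
    events: (user_id, event_date, event_type)
    In week w >= 1 after signup a user is active with probability
    floor + (p_first - floor) * decay ** (w - 1).
    """
    rng = random.Random(seed)               # local RNG so results are reproducible
    end = start + timedelta(weeks=n_weeks)  # end of the observation window
    users, events = [], []
    for user_id in range(1, n_users + 1):
        signup = start + timedelta(days=rng.randrange(n_weeks * 7))
        users.append((user_id, signup.isoformat()))
        events.append((user_id, signup.isoformat(), "signup"))
        for week in range(1, n_weeks):
            week_start = signup + timedelta(weeks=week)
            if week_start >= end:  # stop once this week starts at or after the window closes
                break
            p_active = floor + (p_first - floor) * decay ** (week - 1)
            if rng.random() < p_active:
                day = week_start + timedelta(days=rng.randrange(7))
                events.append((user_id, day.isoformat(), rng.choice(["login", "feature_use"])))
    return users, events


if __name__ == "__main__":
    users, events = generate_events()
    conn = sqlite3.connect(":memory:")  # swap in a file path to persist the data
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, signup_date TEXT NOT NULL)")
    conn.execute("CREATE TABLE events (user_id INTEGER NOT NULL, event_date TEXT NOT NULL, "
                 "event_type TEXT NOT NULL)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    conn.commit()
    print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0], "users,",
          conn.execute("SELECT COUNT(*) FROM events").fetchone()[0], "events")
```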
---
## 📌 Why This Project Matters
This project demonstrates:
- Real-world **product analytics thinking**
- Strong **SQL + Python integration**
- Ability to convert raw events into insights
- Stakeholder-ready visualization and reporting
This mirrors workflows used by **Product Analysts, Data Analysts, and Growth Analysts**.
---
## 👤 Author
**Surjya Bhowmick**
Data & Product Analytics
GitHub: https://github.com/S-Bhowmick
---
⭐ If you found this project useful, feel free to star the repository!