S-Bhowmick/product-retention-cohort-analysis

📊 Product Retention Cohort Analysis

An end-to-end product retention cohort analysis project that uses SQL, Python, SQLite, Pandas, and data visualization to measure user engagement and retention over time.

This project simulates a real-world product analytics workflow used by product analysts and data analysts in SaaS and consumer tech companies.


🚀 Project Overview

Retention is one of the most important metrics for product-led growth.
This project answers questions such as:

  • How well do users return after signup?
  • At which weeks do we lose the most users?
  • Are newer cohorts retaining better than older ones?

We perform weekly cohort retention analysis and visualize the results using a heatmap.


🧠 Key Objectives

  • Generate realistic product usage event data
  • Build a clean analytical data model
  • Perform cohort retention analysis using SQL
  • Visualize retention trends over time
  • Extract actionable product insights
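As a rough illustration of the SQL objective above, the sketch below runs a weekly cohort query against an in-memory SQLite database. The `users` and `events` tables, their columns, and the sample rows are assumptions made for this example; the actual schema in `retention.db` and the query in `sql/01_cohort_retention.sql` may differ.

```python
import sqlite3

# Hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (user_id INTEGER PRIMARY KEY, signup_date TEXT);
CREATE TABLE events (user_id INTEGER, event_date TEXT);
INSERT INTO users VALUES (1, '2024-01-01'), (2, '2024-01-02'), (3, '2024-01-08');
INSERT INTO events VALUES
    (1, '2024-01-01'), (1, '2024-01-09'),
    (2, '2024-01-03'),
    (3, '2024-01-08'), (3, '2024-01-16');
""")

COHORT_SQL = """
WITH cohorts AS (
    SELECT user_id,
           signup_date,
           -- Monday on or before the signup date
           date(signup_date, 'weekday 0', '-6 days') AS cohort_week
    FROM users
),
activity AS (
    SELECT DISTINCT
           c.cohort_week,
           c.user_id,
           -- whole weeks elapsed since signup
           CAST((julianday(e.event_date) - julianday(c.signup_date)) / 7 AS INTEGER)
               AS week_number
    FROM cohorts c
    JOIN events e ON e.user_id = c.user_id
    WHERE e.event_date >= c.signup_date
),
sizes AS (
    SELECT cohort_week, COUNT(*) AS cohort_size
    FROM cohorts
    GROUP BY cohort_week
)
SELECT a.cohort_week,
       a.week_number,
       COUNT(DISTINCT a.user_id) AS active_users,
       ROUND(1.0 * COUNT(DISTINCT a.user_id) / s.cohort_size, 2) AS retention_rate
FROM activity a
JOIN sizes s ON s.cohort_week = a.cohort_week
GROUP BY a.cohort_week, a.week_number, s.cohort_size
ORDER BY a.cohort_week, a.week_number;
"""

rows = conn.execute(COHORT_SQL).fetchall()
for cohort_week, week_number, active_users, retention_rate in rows:
    print(cohort_week, week_number, active_users, retention_rate)
```

Dividing by the full cohort size (all signups in that week) rather than by week-0 active users keeps the denominator stable even if some users never generate an event.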

🗂️ Project Structure

product-retention-cohort-analysis/
│
├── data/
│   ├── raw/
│   │   └── retention.db
│   └── processed/
│
├── notebooks/
│   └── 01_retention_cohort.ipynb
│
├── sql/
│   └── 01_cohort_retention.sql
│
├── src/
│   └── generate_events.py
│
├── reports/
│   └── report.md
│
├── requirements.txt
├── .gitignore
└── README.md


⚙️ Tech Stack

  • Python 3.13
  • SQLite
  • Pandas
  • Seaborn & Matplotlib
  • SQL
  • Jupyter Notebook
  • VS Code

▶️ How to Run the Project

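1️⃣ Create & activate virtual environment

python -m venv .venv
.venv\Scripts\activate

The activation command above is for Windows; on macOS or Linux, use source .venv/bin/activate instead.

2️⃣ Install dependencies

pip install -r requirements.txt

3️⃣ Generate the dataset

python src/generate_events.py

4️⃣ Run the analysis

Open notebooks/01_retention_cohort.ipynb in Jupyter or VS Code and run all cells.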

---

## 📈 Key Analysis Performed

- Weekly signup cohorts
- Active users by cohort and week
- Retention rate calculation
- Cohort retention matrix
- Heatmap visualization
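
The steps listed above can be sketched in Pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`user_id`, `signup_date`, `event_date`), the sample rows, and the helper names `retention_matrix` and `plot_heatmap` are all assumptions.

```python
import pandas as pd

# Made-up events for illustration; column names are assumptions.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "signup_date": pd.to_datetime(
        ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-08", "2024-01-08"]),
    "event_date": pd.to_datetime(
        ["2024-01-01", "2024-01-09", "2024-01-03", "2024-01-08", "2024-01-16"]),
})

def retention_matrix(df):
    """Return a cohort_week x week_number table of retention rates."""
    df = df.copy()
    # Weekly signup cohort: Monday that starts the signup week.
    df["cohort_week"] = df["signup_date"].dt.to_period("W").dt.start_time
    # Whole weeks elapsed between signup and each event.
    df["week_number"] = (df["event_date"] - df["signup_date"]).dt.days // 7
    # Active users by cohort and week.
    active = (df.groupby(["cohort_week", "week_number"])["user_id"]
                .nunique()
                .unstack(fill_value=0))
    # Cohort size counts users with at least one event (a simplification).
    cohort_size = df.groupby("cohort_week")["user_id"].nunique()
    return active.div(cohort_size, axis=0)

def plot_heatmap(matrix):
    """Draw the matrix as a heatmap; imports are local so the rest runs without them."""
    import matplotlib.pyplot as plt
    import seaborn as sns
    ax = sns.heatmap(matrix, annot=True, fmt=".0%", cmap="Blues", vmin=0, vmax=1)
    ax.set_xlabel("Weeks since signup")
    ax.set_ylabel("Signup cohort (week start)")
    plt.tight_layout()
    plt.show()

matrix = retention_matrix(events)
print(matrix)
```

Calling `plot_heatmap(matrix)` renders the chart; each row is one signup cohort and each column is the number of weeks since signup.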

---

## 🔍 Key Insights

- Strong **Week-1 retention** across all cohorts
- Significant drop between **Weeks 3–6**
- Later cohorts show **slightly better retention**
- Long-term retention stabilizes around **10–15%**

The **largest retention leakage** happens in early lifecycle weeks.
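
One way to locate the steepest leakage is to compare consecutive points on the retention curve. The sketch below assumes an averaged retention-by-week list; the numbers are invented for illustration and are not this project's results, and `weekly_drops` is a hypothetical helper.

```python
# Invented retention curve (share of a cohort active in each week since signup).
# These values are illustrative only, not output from this project.
retention_by_week = [1.00, 0.85, 0.78, 0.70, 0.52, 0.36, 0.24, 0.18, 0.15, 0.13, 0.12]

def weekly_drops(curve):
    """Return (week, drop) pairs, where drop is retention lost since the previous week."""
    return [(week, round(curve[week - 1] - curve[week], 4))
            for week in range(1, len(curve))]

drops = weekly_drops(retention_by_week)
worst_week, worst_drop = max(drops, key=lambda pair: pair[1])
print(f"Largest week-over-week drop: week {worst_week} ({worst_drop:.0%} of the cohort)")
```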

---

## 💡 Recommendations

- Improve onboarding experience in the first 7 days
- Trigger engagement nudges between Weeks 2–4
- Run re-engagement campaigns for early churn-risk users
- Track cohort retention weekly to detect regressions early
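
The weekly tracking recommendation could be automated with a simple baseline check. The sketch below is one possible approach under stated assumptions: `flag_regressions`, the 5-point tolerance, the three-cohort minimum history, and the sample rates are all hypothetical.

```python
from statistics import mean

def flag_regressions(week1_retention, tolerance=0.05, min_history=3):
    """Flag cohorts whose Week-1 retention falls below the prior-cohort average.

    week1_retention: list of (cohort_label, rate) tuples, oldest first.
    A cohort is flagged when its rate is more than `tolerance` below the
    mean of all earlier cohorts.
    """
    flagged = []
    for i in range(min_history, len(week1_retention)):
        label, rate = week1_retention[i]
        baseline = mean(r for _, r in week1_retention[:i])
        if rate < baseline - tolerance:
            flagged.append((label, rate, round(baseline, 3)))
    return flagged

# Made-up Week-1 retention rates per signup cohort.
history = [
    ("2024-01-01", 0.80),
    ("2024-01-08", 0.82),
    ("2024-01-15", 0.81),
    ("2024-01-22", 0.79),
    ("2024-01-29", 0.70),
]

flagged = flag_regressions(history)
for label, rate, baseline in flagged:
    print(f"Cohort {label}: {rate:.0%} vs baseline {baseline:.1%}")
```

A fixed tolerance is easy to reason about but ignores cohort size; small cohorts will fluctuate more, so a real check would likely need a size-aware threshold.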

---

## 🧪 Data Notes

- All data is **synthetically generated**
- No real user data is used
- Designed to mirror realistic SaaS usage patterns
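
For readers curious how such synthetic data might be produced, here is a minimal sketch using only the standard library. It is not the contents of `src/generate_events.py`; the function signature, the event names (`signup`, `session`), the start date, and the constant weekly stay probability are assumptions chosen for illustration.

```python
import random
from datetime import date, timedelta

def generate_events(n_users=200, n_weeks=12, stay_prob=0.85, seed=42):
    """Return a list of (user_id, event_date, event_name) tuples.

    Each user signs up on a random day, then each following week stays
    active with probability `stay_prob`. Churned users never return,
    which is a simplification of real usage.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    start = date(2024, 1, 1)
    events = []
    for user_id in range(1, n_users + 1):
        signup = start + timedelta(days=rng.randrange(n_weeks * 7))
        events.append((user_id, signup.isoformat(), "signup"))
        for week in range(1, n_weeks):
            if rng.random() > stay_prob:
                break  # user churns and generates no further events
            day = signup + timedelta(days=week * 7 + rng.randrange(7))
            events.append((user_id, day.isoformat(), "session"))
    return events

events = generate_events(n_users=50)
print(len(events), "events for", len({e[0] for e in events}), "users")
```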

---

## 📌 Why This Project Matters

This project demonstrates:

- Real-world **product analytics thinking**
- Strong **SQL + Python integration**
- Ability to convert raw events into insights
- Stakeholder-ready visualization and reporting

This mirrors workflows used by **Product Analysts, Data Analysts, and Growth Analysts**.

---

## 👤 Author

**Surjya Bhowmick**  
Data & Product Analytics  
GitHub: https://github.com/S-Bhowmick

---

⭐ If you found this project useful, feel free to star the repository!
