Bulut Hamali

Turning Biological Complexity Into Computational Clarity



The Mission

The gap between raw biological data and actionable clinical insight is vast, and mostly unautomated.

My work sits precisely in that gap. I build across the full stack of this problem: multi-agent AI pipelines that reason over clinical trial criteria, spatial and single-cell transcriptomics analyses that map tumor microenvironments at cellular resolution, regulatory-grade clinical data pipelines built to CDISC standards, and full-stack applications that surface these insights to the people who need them. The goal is always the same: make the biology computable and the computation trustworthy.


Currently Building

| Status | Project | Focus |
|---|---|---|
| 🛠 Ongoing | SpatioCore-Flow | Gated multi-agent orchestrator for single-cell and spatial transcriptomics with Code-in-the-Loop hallucination prevention |
| 🟢 Active | ClinPilot | Refactoring orchestration layer into a stateless FastAPI backend for multi-user support |
| 🟢 Active | LabTasker | Token persistence, email deadline reminders, analytics view, and time tracking per task |
| 🟢 Active | CargoURL AI | Supabase integration and live frontend connection to replace mock data with real API responses |

Featured Projects

ClinPilot: Verification-First Multi-Agent Clinical Trial Auditor


How do you make an LLM-generated clinical trial report trustworthy enough for a patient, a clinician, and a regulatory reviewer to act on?

Problem: LLMs can summarize clinical trial eligibility criteria fluently, but they hallucinate, paraphrase, and omit details without flagging that they have done so. In a regulatory context, an unverified output is not merely unreliable; it is a liability.

Solution: A 5-agent sequential pipeline that retrieves live trial data from ClinicalTrials.gov, cross-references PubMed literature and FDA drug labels, and then audits its own outputs before returning a structured 9-section report written for three audiences at once: patients, clinicians, and regulatory reviewers.

| Agent | Model | Role |
|---|---|---|
| Analyst | gpt-4o-mini | Extracts top inclusion and exclusion criteria from raw eligibility text |
| Researcher | gpt-4o-mini | Summarizes relevant PubMed abstracts; notes support or tension with criteria |
| Advocate | gpt-4o-mini | Produces plain-language patient report (trial overview, checklists, next steps) |
| Auditor | gpt-4o | Self-verification loop: grounds every criterion in verbatim source quotes; prunes unverifiable claims |
| Guardrail | gpt-4o | Compliance & risk review against 21 CFR Part 312 and ICH E6; Fairness & Diversity Audit per NIH/FDA guidelines |
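
The Auditor's verbatim-quote grounding can be pictured as a substring check: a criterion survives only if the quote offered for it actually occurs in the source eligibility text. A minimal sketch, assuming claims arrive as (claim, quote) pairs and that whitespace and case differences should be ignored; `Claim`, `ground_claims`, and the sample eligibility text are hypothetical, not ClinPilot's code.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str   # criterion as restated by an upstream agent
    quote: str  # source passage offered as verbatim evidence


def _normalize(s: str) -> str:
    # Collapse whitespace and lowercase so line wrapping in the registry
    # text does not cause false rejections.
    return re.sub(r"\s+", " ", s).strip().lower()


def ground_claims(claims: list[Claim], source: str) -> tuple[list[Claim], list[Claim]]:
    """Split claims into (verified, pruned) by whether the quote occurs in source."""
    haystack = _normalize(source)
    verified: list[Claim] = []
    pruned: list[Claim] = []
    for claim in claims:
        quote = _normalize(claim.quote)
        # An empty quote cannot ground anything, so it is pruned too.
        if quote and quote in haystack:
            verified.append(claim)
        else:
            pruned.append(claim)
    return verified, pruned


# Hypothetical eligibility text, deliberately line-wrapped mid-criterion.
eligibility = """Inclusion Criteria:
- Age >= 18 years
- ECOG performance
  status 0-1"""

claims = [
    Claim("Adults only", "Age >= 18 years"),
    Claim("Good performance status", "ECOG performance status 0-1"),
    Claim("No prior chemotherapy", "No prior systemic chemotherapy"),
]
kept, dropped = ground_claims(claims, eligibility)
print([c.text for c in kept])     # ['Adults only', 'Good performance status']
print([c.text for c in dropped])  # ['No prior chemotherapy']
```

Exact matching after normalization is deliberately strict: fuzzy matching would tolerate paraphrase, which is precisely what this step is meant to catch.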

Tech Stack

Python · OpenAI · CrewAI · ChromaDB · SQLite · Streamlit

Data sources:  ClinicalTrials.gov V2 API · PubMed E-utilities · FDA Drug Label API
Verification:  Auditor self-verification loop (verbatim citation binding; unverifiable claims pruned)
Output:        9-section progressive-disclosure report + downloadable PDF certificate
Caching:       SQLite (audit cache by NCT ID) · ChromaDB (past audit retrieval)
Models:        gpt-4o-mini (extraction/summarization) · gpt-4o (verification/compliance)
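
The SQLite audit cache keyed by NCT ID is essentially a get-or-compute lookup, so a repeated request for the same trial skips the model calls entirely. A sketch using only Python's standard `sqlite3` module, assuming the report is JSON-serializable; the table name `audit_cache`, the `run_audit` callable, and the placeholder ID `NCT00000000` are illustrative assumptions.

```python
import json
import sqlite3
from typing import Callable


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit_cache ("
        "nct_id TEXT PRIMARY KEY, report TEXT NOT NULL)"
    )
    return conn


def get_or_run_audit(conn: sqlite3.Connection, nct_id: str,
                     run_audit: Callable[[str], dict]) -> dict:
    row = conn.execute(
        "SELECT report FROM audit_cache WHERE nct_id = ?", (nct_id,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # cache hit: no model calls
    report = run_audit(nct_id)     # cache miss: run the full pipeline once
    conn.execute(
        "INSERT INTO audit_cache (nct_id, report) VALUES (?, ?)",
        (nct_id, json.dumps(report)),
    )
    conn.commit()
    return report


calls: list[str] = []

def fake_audit(nct_id: str) -> dict:
    # Stand-in for the real multi-agent audit; records each invocation.
    calls.append(nct_id)
    return {"nct_id": nct_id, "sections": 9}

conn = open_cache()
first = get_or_run_audit(conn, "NCT00000000", fake_audit)
second = get_or_run_audit(conn, "NCT00000000", fake_audit)
print(len(calls), first == second)  # 1 True
```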

SpatioCore-Flow: Gated Multi-Agent Biological Orchestrator

What if an AI system could analyze a tumor's spatial architecture without hallucinating biology it never actually saw in the data?

Problem: LLM-based bioinformatics tools fail silently. They reason over biological data but have no mechanism to verify their own inferences against the underlying genomics. In a clinical or research context, an undetected hallucination in a spatial deconvolution or cell-type annotation is not a minor error: it propagates into every downstream analysis built on it.

Solution: A production-grade multi-agent framework for autonomous single-cell genomics and spatial transcriptomics analysis, built on a Verification-First architecture. Every AI-generated inference is validated programmatically against the raw AnnData/Squidpy objects through a Code-in-the-Loop gate before it propagates. If the Biological Consistency Score (BCS) drops below 0.8, the inference is rejected and the agent reruns with adjusted constraints.
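
The Code-in-the-Loop gate separates a proposing step from a scoring step that runs ordinary code against the data, and only inferences at or above the 0.8 threshold are released. A minimal sketch of that control flow; how the Biological Consistency Score is actually computed is not specified here, so `propose`, `score`, and `tighten` are hypothetical callables, and the toy functions below stand in for the agent and the AnnData checks.

```python
from dataclasses import dataclass, field
from typing import Callable

BCS_THRESHOLD = 0.8  # acceptance threshold stated in the project description


@dataclass
class GateResult:
    accepted: bool
    inference: dict
    score: float
    attempts: int
    history: list[float] = field(default_factory=list)


def gated_inference(
    propose: Callable[[dict], dict],
    score: Callable[[dict], float],
    tighten: Callable[[dict, dict, float], dict],
    constraints: dict,
    max_attempts: int = 3,
) -> GateResult:
    """Propose, score against the data, and only release inferences that pass."""
    history: list[float] = []
    inference: dict = {}
    s = 0.0
    for attempt in range(1, max_attempts + 1):
        inference = propose(constraints)  # agent step (LLM or model call)
        s = score(inference)              # code step: checked against raw data
        history.append(s)
        if s >= BCS_THRESHOLD:
            return GateResult(True, inference, s, attempt, history)
        # Rejected: adjust constraints and let the agent try again.
        constraints = tighten(constraints, inference, s)
    return GateResult(False, inference, s, max_attempts, history)


# Toy stand-ins: pretend the data best supports about 9 clusters.
def propose(c: dict) -> dict:
    return {"n_clusters": c["n_clusters"]}

def score(inf: dict) -> float:
    return 1.0 - abs(inf["n_clusters"] - 9) / 10

def tighten(c: dict, inf: dict, s: float) -> dict:
    return {**c, "n_clusters": c["n_clusters"] - 2}

result = gated_inference(propose, score, tighten, {"n_clusters": 12})
print(result.accepted, result.attempts)  # True 2
```

Capping attempts matters: without `max_attempts`, an agent that never satisfies the gate would loop indefinitely, and a rejected result with its score history is itself useful audit evidence.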

| Agent | Role |
|---|---|
| Curator | Maps raw input to the correct biological coordinate system (dissociated vs. in situ) |
| Analyst | Runs foundation models (scGPT, Geneformer) and spatial mapping/deconvolution tools (Tangram, cell2location) for deconvolution and ST prediction |
| Validator | Code-driven verification of Analyst claims against source data |
| Synthesizer | Merges "What" (RNA) with "Where" (Spatial) into a tumor microenvironment graph |
| Auditor | Source-to-bit traceability linking every claim to a gene index or pixel coordinate |
| Guardrail | Evaluates outputs against FDA/SaMD risk frameworks and clinical literature |
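
As an example of a code-driven Validator check, an Analyst claim such as "CD3E marks the T-cell cluster" can be re-tested directly against expression values instead of being taken on trust. A pure-Python sketch in which plain dicts stand in for `adata.X` and `adata.obs`; the mean fold-change test, the 2.0 cutoff, and the fraction-of-claims-supported score are assumptions for illustration, not the project's Biological Consistency Score.

```python
from statistics import mean


def verify_marker_claims(expr, labels, claims, min_fold=2.0, pseudocount=1e-3):
    """Recompute each (cluster, gene) marker claim from the data.

    expr:   {cell_id: {gene: normalized expression}}
    labels: {cell_id: cluster label}
    claims: [(cluster, gene), ...] asserted by the Analyst
    Returns (fraction of claims supported, {claim: supported?}).
    """
    results = {}
    for cluster, gene in claims:
        inside = [expr[c].get(gene, 0.0) for c in expr if labels[c] == cluster]
        outside = [expr[c].get(gene, 0.0) for c in expr if labels[c] != cluster]
        if not inside or not outside:
            results[(cluster, gene)] = False  # nothing to compare against
            continue
        fold = (mean(inside) + pseudocount) / (mean(outside) + pseudocount)
        results[(cluster, gene)] = fold >= min_fold
    score = sum(results.values()) / len(results) if results else 0.0
    return score, results


expr = {
    "c1": {"CD3E": 5.0}, "c2": {"CD3E": 4.0},
    "c3": {"COL1A1": 6.0}, "c4": {"COL1A1": 7.0},
}
labels = {"c1": "T cell", "c2": "T cell", "c3": "Fibroblast", "c4": "Fibroblast"}
claims = [("T cell", "CD3E"), ("Fibroblast", "COL1A1"), ("T cell", "COL1A1")]

score, results = verify_marker_claims(expr, labels, claims)
print(round(score, 2))                # 0.67
print(results[("T cell", "COL1A1")])  # False
```

Here one of three claims is contradicted by the data, so this toy score (0.67) would fall below a 0.8 threshold.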

Tech Stack

Python · CrewAI · LiteLLM · Scanpy · Squidpy · FastAPI · Streamlit · PostgreSQL · ChromaDB

Architecture:  Hybrid DAG orchestration · Sandbox-Gate pattern
Validation:    Code-in-the-Loop · Biological Consistency Score (BCS ≥ 0.8)
Models:        scGPT · Geneformer · Tangram · cell2location · StarDist
Storage:       PostgreSQL (audit trail) · ChromaDB (RAG) · Redis (cache)
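
The DAG part of the orchestration can be expressed with Python's standard `graphlib.TopologicalSorter`, which yields agents in dependency order and exposes which ones are ready to run together. The wiring below is a hypothetical reading of the agent table (in particular, letting the Auditor and Guardrail run in parallel); it is not the project's actual graph.

```python
from graphlib import TopologicalSorter

# agent -> set of agents whose outputs it consumes.
# Hypothetical wiring: Auditor and Guardrail both read the Synthesizer's graph,
# so they can run side by side.
DEPENDENCIES = {
    "Curator": set(),
    "Analyst": {"Curator"},
    "Validator": {"Analyst"},
    "Synthesizer": {"Validator"},
    "Auditor": {"Synthesizer"},
    "Guardrail": {"Synthesizer"},
}


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group agents into waves; every agent in a wave can run concurrently."""
    ts = TopologicalSorter(deps)
    ts.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)  # in a real run, mark done only after each agent finishes
    return waves


print(execution_waves(DEPENDENCIES))
# [['Curator'], ['Analyst'], ['Validator'], ['Synthesizer'], ['Auditor', 'Guardrail']]
```

One plausible way to reconcile a DAG with rerun-on-rejection is to keep retries inside a node, so the gate loops locally while the graph itself stays acyclic.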

Bioinformatics Research Portfolio

scRNA-seq Analysis: Gastric Cancer Cell Atlas


Problem: Gastric cancer subtypes are clinically heterogeneous, and bulk profiling fails to resolve the malignant, immune, and fibroblast populations driving treatment resistance.

Solution: Single-cell RNA-seq pipeline from raw count matrices through clustering, cell-type annotation, differential expression, and trajectory inference, reconstructing the cellular landscape of gastric tumor samples.
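
The first step after loading raw count matrices is usually per-cell library-size normalization followed by a log transform, which is what Scanpy's `sc.pp.normalize_total(adata, target_sum=1e4)` and `sc.pp.log1p(adata)` do. The sketch below reproduces that arithmetic in pure Python on a dense cells-by-genes list, for illustration only; it is not the repository's code, and real data would stay in a sparse AnnData matrix.

```python
import math


def normalize_total_log1p(counts: list[list[float]], target_sum: float = 1e4) -> list[list[float]]:
    """Scale each cell (row) to target_sum total counts, then apply log(1 + x)."""
    out = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            out.append([0.0] * len(cell))  # empty barcode: nothing to scale
            continue
        scale = target_sum / total
        out.append([math.log1p(x * scale) for x in cell])
    return out


# Two cells with identical composition but a 10x difference in sequencing depth.
raw = [[1, 3, 0], [10, 30, 0]]
norm = normalize_total_log1p(raw)
print(norm[0] == norm[1])  # True: the depth difference is removed
```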

Tech Stack

Python · Scanpy · R


Spatial Transcriptomics: Breast Cancer Tumor Microenvironment (Xenium)


Problem: Standard scRNA-seq dissociates the tissue before sequencing, discarding the spatial organization of tumor, immune, and stromal compartments that shapes how cancer progresses and resists treatment.

Solution: End-to-end spatial transcriptomics pipeline on a 10x Xenium in situ dataset from a human breast cancer FFPE section: cell-type mapping, spatially resolved gene expression patterns, and tissue architecture characterization at single-cell resolution.
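
Tissue architecture analyses typically start from a spatial neighbor graph: for each cell, which cell types sit within a given distance. In Squidpy that graph comes from `squidpy.gr.spatial_neighbors`, with `squidpy.gr.nhood_enrichment` testing which type pairs co-occur more often than expected. The brute-force sketch below only illustrates the underlying idea; the coordinates, labels, and 10-unit radius are made up, and Xenium centroids are assumed to be in micrometres.

```python
import math
from collections import Counter


def neighborhood_composition(coords, cell_types, radius):
    """For each cell, count the types of all other cells within `radius`.

    coords: list of (x, y) centroids (assumed micrometres); cell_types: parallel labels.
    Brute force O(n^2): fine for a demo, not for a full Xenium section.
    """
    result = []
    for i, p in enumerate(coords):
        counts = Counter()
        for j, q in enumerate(coords):
            if i != j and math.dist(p, q) <= radius:
                counts[cell_types[j]] += 1
        result.append(counts)
    return result


coords = [(0, 0), (5, 0), (100, 0), (103, 0)]
types = ["Tumor", "T cell", "Stroma", "Stroma"]
for t, c in zip(types, neighborhood_composition(coords, types, radius=10)):
    print(t, dict(c))
# Tumor {'T cell': 1}
# T cell {'Tumor': 1}
# Stroma {'Stroma': 1}
# Stroma {'Stroma': 1}
```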

Tech Stack

Python · Squidpy · Scanpy


SDTM/ADaM Clinical Data Pipeline


Problem: Regulatory submissions to the FDA and EMA require clinical trial data in CDISC-standard formats. Transforming raw trial datasets into analysis-ready ADaM structures, and from there into auditable TLF outputs, demands both statistical rigor and deep familiarity with submission standards.

Solution: End-to-end clinical data pipeline from SDTM source datasets (DM, AE, LB) through ADaM derivation (ADSL subject-level, ADAE adverse events) to regulatory-grade tables, listings, and figures. Fully automated via a single run_all.R entry point.

Tech Stack

R · admiral · Bioconductor

Standards:  CDISC SDTM → ADaM (ADSL, ADAE)
Outputs:    AE summary tables, demographics TLFs, treatment-emergent AE flags
Pipeline:   Modular R scripts · run_all.R single-command execution
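
The treatment-emergent AE flag is a good example of an ADaM derivation that is simple to state and easy to get subtly wrong. The pipeline itself is R/admiral; the Python sketch below only restates the common rule for TRTEMFL, comparing the AE start date (ASTDT) with the first and last dose dates (TRTSDT, TRTEDT). The 30-day post-treatment window and the decision to leave missing dates unflagged are assumptions: the study's SAP defines both, and production pipelines usually impute partial dates first.

```python
from datetime import date, timedelta
from typing import Optional


def derive_trtemfl(astdt: Optional[date], trtsdt: Optional[date],
                   trtedt: Optional[date], end_window_days: int = 30) -> Optional[str]:
    """Return "Y" for a treatment-emergent AE, else None (a blank TRTEMFL).

    Simplified rule: AE start on or after first dose (TRTSDT) and no later than
    last dose (TRTEDT) plus a window. The 30-day default is an assumption.
    """
    if astdt is None or trtsdt is None:
        return None  # missing dates yield no flag here; real pipelines impute first
    if astdt < trtsdt:
        return None  # started before first dose
    if trtedt is not None and astdt > trtedt + timedelta(days=end_window_days):
        return None  # started after the post-treatment window closed
    return "Y"


trtsdt, trtedt = date(2024, 1, 10), date(2024, 3, 1)
for astdt in [date(2024, 1, 5), date(2024, 2, 1), date(2024, 3, 20), date(2024, 4, 15)]:
    print(astdt, derive_trtemfl(astdt, trtsdt, trtedt))
# 2024-01-05 None
# 2024-02-01 Y
# 2024-03-20 Y
# 2024-04-15 None
```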

LabTasker: Research Lab Task Manager


Research workflows don't fit generic project managers: they need Kanban boards that understand experiments, not sprints.

Problem: Lab teams lose track of experiments, deadlines, and task ownership across spreadsheets and email threads. Generic project tools lack the domain-specific structure that research workflows require.

Solution: Full-stack research task management application with a drag-and-drop Kanban board (To Do / In Progress / Done), per-project progress tracking, and JWT-authenticated user accounts. Frontend deployed on Render with a live demo.
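
Drag-and-drop reordering boils down to removing a card from one column, inserting it into another at a given index, and renumbering positions so the order can be saved. LabTasker's backend is Node.js/Express; this Python sketch only illustrates the logic, and the `Task` shape with dense integer positions is an assumption rather than the project's MongoDB schema.

```python
from dataclasses import dataclass


@dataclass
class Task:
    id: str
    status: str    # "To Do", "In Progress" or "Done"
    position: int  # 0-based order within its column


def column(tasks: list[Task], status: str) -> list[Task]:
    return sorted((t for t in tasks if t.status == status), key=lambda t: t.position)


def move_task(tasks: list[Task], task_id: str, new_status: str, new_index: int) -> None:
    """Apply a drag-and-drop: move a task to new_status at new_index, renumber columns."""
    moving = next((t for t in tasks if t.id == task_id), None)
    if moving is None:
        raise KeyError(task_id)
    # Source column without the dragged card, in its current order.
    source = [t for t in column(tasks, moving.status) if t.id != task_id]
    target = source if new_status == moving.status else column(tasks, new_status)
    new_index = max(0, min(new_index, len(target)))  # clamp drops past the end
    moving.status = new_status
    target.insert(new_index, moving)
    # Rewrite dense 0..n-1 positions so the order can be persisted as-is.
    for col in (source, target):
        for i, t in enumerate(col):
            t.position = i


tasks = [Task("a", "To Do", 0), Task("b", "To Do", 1), Task("c", "In Progress", 0)]
move_task(tasks, "b", "In Progress", 0)
print([t.id for t in column(tasks, "To Do")])        # ['a']
print([t.id for t in column(tasks, "In Progress")])  # ['b', 'c']
```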

Frontend

React · TypeScript · Tailwind CSS · Vite

Backend

Node.js · Express · MongoDB

Auth:       JWT · bcrypt · protected routes
Features:   Drag-and-drop reorder, due dates, progress bars, per-user scoping
Live demo:  https://labtasker-frontend.onrender.com
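
A protected route ultimately checks three things about an HS256 JWT: that the algorithm is the expected one, that the signature matches, and that the token has not expired. In a Node.js/Express backend a library such as `jsonwebtoken` normally does this; the standard-library Python sketch below only makes the checks visible. The secret, the claims, and both function names are placeholders, not LabTasker's code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the payload if signature, algorithm and expiry check out; else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")  # blocks alg=none downgrades
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in payload and current >= payload["exp"]:
        raise ValueError("token expired")
    return payload


secret = b"placeholder-secret"  # placeholder; load from configuration in practice
token = sign_hs256({"sub": "user-123", "exp": 2_000_000_000}, secret)
print(verify_hs256(token, secret, now=1_700_000_000)["sub"])  # user-123
```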

CargoURL AI: Click Analytics & Optimization API


A link management platform needs more than redirect counts: it needs to tell you when to post, who's clicking, and whether the numbers are moving.

Problem: Raw click data from a URL shortener is noise without context. Finding the best posting times, characterizing the audience, and tracking CTR changes take time-series forecasting, segmentation, and period-over-period comparison, not just counting.

Solution: A Flask REST API that turns raw click event streams into optimization signals: high-engagement posting window prediction via Prophet, audience segmentation via KMeans clustering, and CTR delta reporting. Built as the analytics and AI backend for CargoURL.
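
CTR delta reporting reduces to comparing click-through rates between two periods, and it helps to report both the absolute change in percentage points and the relative change, since they tell different stories. A minimal sketch, assuming impression counts are available alongside clicks for each period; `ctr` and `ctr_delta` are hypothetical names, not the CargoURL AI endpoints.

```python
from typing import Optional


def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate; 0.0 when there were no impressions."""
    return clicks / impressions if impressions else 0.0


def ctr_delta(current: tuple[int, int], previous: tuple[int, int]) -> dict:
    """Compare two periods given as (clicks, impressions)."""
    cur, prev = ctr(*current), ctr(*previous)
    # Relative change is undefined from a zero baseline.
    rel: Optional[float] = (cur - prev) / prev if prev else None
    return {
        "ctr_current": cur,
        "ctr_previous": prev,
        "delta_pp": (cur - prev) * 100,  # absolute change in percentage points
        "delta_rel": rel,                # relative change, e.g. 0.5 means +50%
    }


d = ctr_delta(current=(60, 1000), previous=(40, 1000))
print(round(d["delta_pp"], 2), round(d["delta_rel"], 2))  # 2.0 0.5
```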

Tech Stack

Python · Flask · Prophet · scikit-learn · pandas

Pipeline:   Click events → Prophet forecasting → KMeans segmentation → CTR delta
Deployment: Railway / Render (Nixpacks)
Status:     API functional and deployed; frontend integration in progress
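
Before fitting Prophet, a useful sanity baseline for the posting-window question is a simple hour-of-day histogram of clicks; a large disagreement between the two is worth investigating. The sketch below is that naive baseline, not the Prophet model the API uses; `best_posting_hours` is a hypothetical name and the timestamps are invented.

```python
from collections import Counter
from datetime import datetime


def best_posting_hours(click_timestamps: list[str], top_n: int = 3) -> list[tuple[int, int]]:
    """Rank hours of day by click count, as (hour, clicks) pairs.

    Timestamps are ISO 8601 strings; the hour is read in each string's own
    offset, so normalize everything to one zone (here UTC) beforehand.
    """
    hours = Counter(datetime.fromisoformat(ts).hour for ts in click_timestamps)
    # Highest count first; ties broken by earlier hour for a stable result.
    return sorted(hours.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


clicks = [
    "2024-05-01T18:05:00+00:00", "2024-05-01T18:40:00+00:00",
    "2024-05-02T18:10:00+00:00", "2024-05-02T09:00:00+00:00",
    "2024-05-03T09:30:00+00:00", "2024-05-03T21:15:00+00:00",
]
print(best_posting_hours(clicks))  # [(18, 3), (9, 2), (21, 1)]
```

Prophet adds what this baseline lacks (trend, daily and weekly seasonality, and uncertainty intervals), which is why counting serves as a check rather than a replacement.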

Skills Matrix

Bioinformatics & Data Science

| Domain | Tools & Methods |
|---|---|
| Single-Cell Genomics | Scanpy, Seurat, scRNA-seq clustering, trajectory inference |
| Spatial Transcriptomics | 10x Xenium (in situ), Squidpy, Tangram, cell2location, spatial gene expression analysis |
| Biological Foundation Models | scGPT, Geneformer, StarDist, spatial deconvolution |
| Clinical Data Standards | CDISC SDTM/ADaM, TLF generation, admiral (R), regulatory pipelines |
| Statistical Analysis | R/Bioconductor, differential expression, survival analysis |
| Data Engineering | Pandas, NumPy, high-performance Python, HPC/SLURM |

AI & LLM Orchestration

| Domain | Tools & Methods |
|---|---|
| Multi-Agent Systems | CrewAI, LiteLLM, Hybrid-DAG orchestration, deliberation frameworks |
| Agentic Validation | Code-in-the-Loop verification, Biological Consistency Scoring, Pydantic schemas |
| Retrieval-Augmented Generation | ChromaDB, vector embeddings, semantic search |
| LLM Integration | OpenAI API (gpt-4o, gpt-4o-mini), prompt engineering |
| Caching & Optimization | SQLite semantic cache, inference cost reduction |
| Regulatory AI | FDA-aware guardrail agents, citation integrity, audit trails |

Full-Stack Engineering

| Domain | Tools & Methods |
|---|---|
| Frontend | React 18, TypeScript, Tailwind CSS, shadcn/ui, Vite |
| Backend | Node.js, Express, Flask, RESTful API design |
| Database | MongoDB Atlas, PostgreSQL, SQLite, Redis, data modeling |
| Auth & Security | JWT, bcrypt, role-based access control |
| DevOps | Git, GitHub Actions, Linux/Bash, Render, Railway |


"The most consequential code running today is the code that interprets biology."

Pinned

  1. scRNAseq-Pipeline-GastricCancer (Public)

    Reconstruction and extension of the computational framework from Wang B. et al. (2021) for dissecting metastatic gastric cancer, using the single-cell transcriptomics dataset the authors provided.

    Jupyter Notebook

  2. interactive-registration-form (Public)

    A fully interactive and responsive registration form built with HTML, CSS, and JavaScript. Features live validation, user feedback, and accessibility enhancements.

    JavaScript

  3. personal-blog-sba (Public)

    A clean and responsive personal blog layout with structured posts and smooth navigation, built with HTML, CSS, and JavaScript.

    JavaScript

  4. labtasker-frontend (Public)

    Research project and task tracker for academic labs. Kanban board, progress tracking, and JWT authentication. Built with React, TypeScript, and Node.js.

    TypeScript 18

  5. labtasker-backend (Public)

    REST API for research project and task management. Built with Node.js, Express, MongoDB Atlas, and JWT authentication. Features Kanban-style task tracking and drag-and-drop reordering.

    JavaScript 18

  6. cargourl-ai (Public)

    AI-powered click analytics API for URL optimization: posting time prediction, audience segmentation, and CTR reporting.

    Python
