The gap between raw biological data and actionable clinical insight is vast, and crossing it is still largely manual.
My work sits precisely in that gap. I build across the full stack of this problem: multi-agent AI pipelines that reason over clinical trial criteria, spatial and single-cell transcriptomics analyses that map tumor microenvironments at cellular resolution, regulatory-grade clinical data pipelines built to CDISC standards, and full-stack applications that surface these insights to the people who need them. The goal is always the same: make the biology computable and the computation trustworthy.
| Status | Project | Focus |
|---|---|---|
| 🛠 Ongoing | SpatioCore-Flow | Gated multi-agent orchestrator for single-cell and spatial transcriptomics with Code-in-the-Loop hallucination prevention |
| 🟢 Active | ClinPilot | Refactoring the orchestration layer into a stateless FastAPI backend for multi-user support |
| 🟢 Active | LabTasker | Token persistence, email deadline reminders, analytics view, and time tracking per task |
| 🟢 Active | CargoURL AI | Supabase integration and live frontend connection to replace mock data with real API responses |
How do you make an LLM-generated clinical trial report trustworthy enough for a patient, a clinician, and a regulatory reviewer to act on?
**Problem:** LLMs can summarize clinical trial eligibility criteria fluently, but they hallucinate, paraphrase, and omit without flagging it. In a regulatory context, an unverified output is not just wrong, it's a liability.
**Solution:** A 5-agent sequential pipeline that retrieves live trial data from ClinicalTrials.gov, cross-references PubMed literature and FDA drug labels, then audits its own outputs before returning a structured 9-section report written simultaneously for patients, clinicians, and regulatory reviewers.
| Agent | Model | Role |
|---|---|---|
| Analyst | gpt-4o-mini | Extracts top inclusion and exclusion criteria from raw eligibility text |
| Researcher | gpt-4o-mini | Summarizes relevant PubMed abstracts; notes support or tension with criteria |
| Advocate | gpt-4o-mini | Produces plain-language patient report (trial overview, checklists, next steps) |
| Auditor | gpt-4o | Self-verification loop: grounds every criterion in verbatim source quotes; prunes unverifiable claims |
| Guardrail | gpt-4o | Compliance & risk review against 21 CFR Part 312 and ICH E6; Fairness & Diversity Audit per NIH/FDA guidelines |
**Tech Stack**
- **Data sources:** ClinicalTrials.gov V2 API · PubMed E-utilities · FDA Drug Label API
- **Verification:** Auditor self-verification loop (verbatim citation binding; unverifiable claims pruned)
- **Output:** 9-section progressive-disclosure report + downloadable PDF certificate
- **Caching:** SQLite (audit cache by NCT ID) · ChromaDB (past audit retrieval)
- **Models:** gpt-4o-mini (extraction/summarization) · gpt-4o (verification/compliance)
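The audit step is the load-bearing part of this design: a claim that cannot be tied to a verbatim quote from the source record is dropped rather than paraphrased. A minimal sketch of that gate, with toy stand-ins for the LLM agents (none of these names are the actual ClinPilot code):

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    source_quote: str = ""  # verbatim excerpt from the ClinicalTrials.gov record

def analyst(record: str) -> list[Claim]:
    # Toy extraction: treat each line of eligibility text as one criterion.
    return [Claim(text=line.strip(), source_quote=line.strip())
            for line in record.splitlines() if line.strip()]

def auditor(claims: list[Claim], record: str) -> list[Claim]:
    # Verification gate: keep only claims whose quote appears verbatim
    # in the source record; everything else is pruned, not paraphrased.
    return [c for c in claims if c.source_quote and c.source_quote in record]

def run_pipeline(record: str) -> list[Claim]:
    claims = analyst(record)
    # Inject a hallucinated claim to show the gate at work.
    claims.append(Claim(text="Age 18-99", source_quote="hallucinated range"))
    return auditor(claims, record)

record = "Inclusion: ECOG 0-1\nExclusion: prior anti-PD-1 therapy"
report = run_pipeline(record)
print([c.text for c in report])  # the unverifiable claim is gone
```

The point of the pattern is that the gate is code, not another prompt: a substring (or citation-binding) check cannot be talked out of its decision.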
What if an AI system could analyze a tumor's spatial architecture without hallucinating biology it never actually saw in the data?
**Problem:** LLM-based bioinformatics tools fail silently: they reason over biological data but have no mechanism to verify their own inferences against the underlying genomics. In a clinical or research context, an undetected hallucination in a spatial deconvolution or cell-type annotation is not a minor error.
**Solution:** A production-grade multi-agent framework for autonomous single-cell genomics and spatial transcriptomics analysis, built on a Verification-First architecture. Every AI-generated inference is programmatically validated against the raw AnnData/Squidpy objects through a Code-in-the-Loop gate before propagation: if the Biological Consistency Score drops below 0.8, the inference is rejected and the agent reruns with adjusted constraints.
| Agent | Role |
|---|---|
| Curator | Maps raw input to the correct biological coordinate system (dissociated vs. in situ) |
| Analyst | Executes foundation models (scGPT, Geneformer, Tangram, cell2location) for deconvolution and ST prediction |
| Validator | Code-driven verification of Analyst claims against source data |
| Synthesizer | Merges "What" (RNA) with "Where" (Spatial) into a tumor microenvironment graph |
| Auditor | Source-to-bit traceability linking every claim to a gene index or pixel coordinate |
| Guardrail | Evaluates outputs against FDA/SaMD risk frameworks and clinical literature |
**Tech Stack**
- **Architecture:** Hybrid DAG orchestration · Sandbox-Gate pattern
- **Validation:** Code-in-the-Loop · Biological Consistency Score (BCS ≥ 0.8)
- **Models:** scGPT · Geneformer · Tangram · cell2location · StarDist
- **Storage:** PostgreSQL (audit trail) · ChromaDB (RAG) · Redis (cache)
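The reject-and-rerun loop is small enough to sketch. Everything here is an illustrative stand-in (the scoring function, the agent, the data shapes), not the framework's real interfaces; only the gating logic is the point:

```python
BCS_THRESHOLD = 0.8  # inferences scoring below this are rejected

def consistency_score(inference: dict, data: dict) -> float:
    # Toy check: fraction of predicted cell types actually supported
    # by marker genes observed in the raw matrix.
    predicted = set(inference["cell_types"])
    observed = set(data["observed_markers"])
    return len(predicted & observed) / max(len(predicted), 1)

def analyst(data: dict, constraints: dict) -> dict:
    # Stand-in for a foundation-model call; constraints narrow the label space.
    labels = [t for t in data["candidate_types"] if t not in constraints["excluded"]]
    return {"cell_types": labels}

def gated_run(data: dict, max_retries: int = 3) -> dict:
    constraints = {"excluded": set()}
    for _ in range(max_retries):
        inference = analyst(data, constraints)
        score = consistency_score(inference, data)
        if score >= BCS_THRESHOLD:
            return {"inference": inference, "bcs": score}
        # Reject, tighten constraints, and rerun instead of propagating.
        unsupported = set(inference["cell_types"]) - set(data["observed_markers"])
        constraints["excluded"] |= unsupported
    raise RuntimeError("inference failed Code-in-the-Loop validation")

data = {"candidate_types": ["T cell", "B cell", "unicorn cell"],
        "observed_markers": ["T cell", "B cell"]}
result = gated_run(data)
print(result["bcs"])  # 1.0 after one rejected pass
```

The first pass scores 2/3 and is rejected; the unsupported label is excluded and the rerun passes the gate.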
**Problem:** Gastric cancer subtypes are clinically heterogeneous, and bulk profiling fails to resolve the malignant, immune, and fibroblast populations driving treatment resistance.
**Solution:** Single-cell RNA-seq pipeline from raw count matrices through clustering, cell-type annotation, differential expression, and trajectory inference, reconstructing the cellular landscape of gastric tumor samples.
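The first step after loading raw counts is depth normalization; this stdlib-only illustration mirrors the standard library-size normalization plus log1p transform (what Scanpy's `sc.pp.normalize_total` followed by `sc.pp.log1p` does) on plain lists rather than an AnnData object:

```python
import math

def normalize_log1p(counts: list[list[float]], target_sum: float = 1e4) -> list[list[float]]:
    """Scale each cell (row) to target_sum total counts, then apply log1p.

    This removes sequencing-depth differences so that downstream
    clustering compares expression fractions, not read totals.
    """
    out = []
    for cell in counts:  # one row per cell, one column per gene
        total = sum(cell) or 1.0
        out.append([math.log1p(c * target_sum / total) for c in cell])
    return out

raw = [[10, 0, 90],   # cell 1: 100 total counts, gene 0 is 10% of them
       [1, 1, 8]]     # cell 2: 10 total counts, gene 0 is also 10%
norm = normalize_log1p(raw)
# Equal expression fractions now map to equal values despite 10x depth difference.
print(norm[0][0] == norm[1][0])  # True
```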
**Tech Stack**
**Problem:** Standard scRNA-seq dissolves tissue architecture: you lose the spatial organization of tumor, immune, and stromal compartments that determines how cancer progresses and resists treatment.
**Solution:** End-to-end spatial transcriptomics pipeline on a 10x Xenium in situ dataset from a human breast cancer FFPE section: cell type mapping, spatially resolved gene expression patterns, and tissue architecture characterization at single-cell resolution.
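Spatial analyses of this kind start from a neighborhood graph over cell centroids (Squidpy's `sq.gr.spatial_neighbors` builds one, typically via a KD-tree). A brute-force sketch of the same idea, with made-up coordinates:

```python
import math

def knn_graph(coords: list[tuple[float, float]], k: int = 2) -> dict[int, list[int]]:
    """Brute-force k-nearest-neighbour graph over cell centroids.

    Each cell is linked to its k closest neighbours; neighbourhood
    enrichment and niche detection operate on this graph.
    """
    graph = {}
    for i, (xi, yi) in enumerate(coords):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph

# Four cells: three clustered in a tumour nest, one distant stromal cell.
cells = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
graph = knn_graph(cells, k=2)
print(graph)  # the outlier's nearest neighbours are still the nest cells
```

Brute force is O(n²) and only suitable for a sketch; real datasets with hundreds of thousands of cells need the tree-based construction.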
**Tech Stack**
**Problem:** Regulatory submissions to the FDA and EMA require clinical trial data in CDISC-standard formats. Transforming raw trial datasets into analysis-ready ADaM structures, and from there into auditable TLF outputs, demands both statistical rigor and deep familiarity with submission standards.
**Solution:** End-to-end clinical data pipeline from SDTM source datasets (DM, AE, LB) through ADaM derivation (ADSL subject-level, ADAE adverse events) to regulatory-grade tables, listings, and figures. Fully automated via a single `run_all.R` entry point.
**Tech Stack**
- **Standards:** CDISC SDTM → ADaM (ADSL, ADAE)
- **Outputs:** AE summary tables, demographics TLFs, treatment-emergent AE flags
- **Pipeline:** Modular R scripts · `run_all.R` single-command execution
Research workflows don't fit generic project managers: they need Kanban boards that understand experiments, not sprints.
**Problem:** Lab teams lose track of experiments, deadlines, and task ownership across spreadsheets and email threads. Generic project tools lack the domain-specific structure that research workflows require.
**Solution:** Full-stack research task management application with a drag-and-drop Kanban board (To Do / In Progress / Done), per-project progress tracking, and JWT-authenticated user accounts. Frontend deployed on Render with a live demo.
- **Frontend**
- **Backend**
- **Auth:** JWT · bcrypt · protected routes
- **Features:** Drag-and-drop reorder, due dates, progress bars, per-user scoping
- **Live demo:** https://labtasker-frontend.onrender.com
A link management platform needs more than redirect counts: it needs to tell you when to post, who's clicking, and whether the numbers are moving.
**Problem:** Raw click data from a URL shortener is noise without context. Posting time, audience composition, and CTR deltas require time-series forecasting and segmentation, not just counting.
**Solution:** A Flask REST API that turns raw click event streams into optimization signals: high-engagement posting window prediction via Prophet, audience segmentation via KMeans clustering, and CTR delta reporting. Built as the analytics and AI backend for CargoURL.
**Tech Stack**
- **Pipeline:** Click events → Prophet forecasting → KMeans segmentation → CTR delta
- **Deployment:** Railway / Render (Nixpacks)
- **Status:** API functional and deployed; frontend integration in progress
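The Prophet and KMeans stages need their respective libraries, but the CTR-delta stage is plain arithmetic. A sketch with illustrative window tuples of `(clicks, impressions)`; the shapes and field names are assumptions, not the live API's schema:

```python
def ctr(clicks: int, impressions: int) -> float:
    return clicks / impressions if impressions else 0.0

def ctr_delta(current: tuple[int, int], previous: tuple[int, int]) -> dict:
    """Compare click-through rate across two reporting windows and
    report the relative change as a percentage."""
    cur, prev = ctr(*current), ctr(*previous)
    change = (cur - prev) / prev * 100 if prev else float("inf")
    return {"current_ctr": round(cur, 4),
            "previous_ctr": round(prev, 4),
            "delta_pct": round(change, 1)}

# This week: 90 clicks / 1200 impressions; last week: 60 / 1000.
report = ctr_delta((90, 1200), (60, 1000))
print(report)  # CTR moved from 6% to 7.5%, a +25% relative change
```

Reporting the relative delta rather than the raw difference is what makes small-percentage metrics like CTR legible to non-analysts.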
| Domain | Tools & Methods |
|---|---|
| Single-Cell Genomics | Scanpy, Seurat, scRNA-seq clustering, trajectory inference |
| Spatial Transcriptomics | 10x Xenium (in situ), Squidpy, Tangram, cell2location, spatial gene expression analysis |
| Biological Foundation Models | scGPT, Geneformer, StarDist, spatial deconvolution |
| Clinical Data Standards | CDISC SDTM/ADaM, TLF generation, admiral (R), regulatory pipelines |
| Statistical Analysis | R/Bioconductor, differential expression, survival analysis |
| Data Engineering | Pandas, NumPy, high-performance Python, HPC/SLURM |
| Domain | Tools & Methods |
|---|---|
| Multi-Agent Systems | CrewAI, LiteLLM, Hybrid-DAG orchestration, deliberation frameworks |
| Agentic Validation | Code-in-the-Loop verification, Biological Consistency Scoring, Pydantic schemas |
| Retrieval-Augmented Generation | ChromaDB, vector embeddings, semantic search |
| LLM Integration | OpenAI API (gpt-4o, gpt-4o-mini), prompt engineering |
| Caching & Optimization | SQLite semantic cache, inference cost reduction |
| Regulatory AI | FDA-aware guardrail agents, citation integrity, audit trails |
| Domain | Tools & Methods |
|---|---|
| Frontend | React 18, TypeScript, Tailwind CSS, shadcn/ui, Vite |
| Backend | Node.js, Express, Flask, RESTful API design |
| Database | MongoDB Atlas, PostgreSQL, SQLite, Redis, data modeling |
| Auth & Security | JWT, bcrypt, role-based access control |
| DevOps | Git, GitHub Actions, Linux/Bash, Render, Railway |
"The most consequential code running today is the code that interprets biology."