NadiaRozman/NadiaRozman


Hi, I'm Nadia! ☺️

I'm a data enthusiast transitioning into data science, with a background spanning clinical research, public health trials, and digital banking operations. Across every role, data was always at the centre — and that consistency led me to deepen my technical skills in Python, SQL, TensorFlow/Keras, NLP libraries, and Tableau.

I'm passionate about turning raw data into actionable insights, and I build end-to-end projects that reflect real-world analytical workflows across healthcare and banking domains.


🌟 Featured Projects

End-to-end data science pipeline on 5,000 ClinicalTrials.gov records — EDA, NLP, XGBoost, SHAP, and SQL analytics.

This is my most comprehensive project, combining my clinical research background with a full data science workflow:

| Notebook | Focus |
| --- | --- |
| 📊 EDA & Data Acquisition | HuggingFace streaming, XML parsing, feature engineering |
| 📝 NLP & Text Analytics | TF-IDF, Sentence Transformers, BART zero-shot, NER |
| 🤖 Machine Learning | XGBoost + hyperparameter tuning (ROC-AUC: 0.68) |
| 🗄️ SQL Analytics | SQLite, window functions, multi-CTE sponsor scorecard |

Key results: The model predicted clinical trial completion from registration metadata alone, and SHAP explainability identified phase and collaborator presence as the strongest completion signals. The NLP baseline (TF-IDF) achieved a ROC-AUC of 0.69 from free-text summaries alone.
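
For readers less familiar with the metric, ROC-AUC can be read as the probability that a model scores a randomly chosen positive case (here, a completed trial) above a randomly chosen negative one. The sketch below illustrates that definition in plain Python. The labels and scores are made up for illustration and are not project data, and `roc_auc` is a hypothetical helper rather than code from the notebooks.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Illustrative values only: 1 = completed, 0 = not completed.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.2]
auc = roc_auc(labels, scores)
print(f"ROC-AUC: {auc:.3f}")
```

On this scale, 0.5 corresponds to random ranking and 1.0 to perfect ranking, which helps put values such as 0.68 and 0.69 in context.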

Tech: Python · XGBoost · SHAP · HuggingFace Transformers · Sentence Transformers · SQLite · Plotly · scikit-learn


End-to-end SQL case study on retail banking operations — customer segmentation, transaction analysis, complaint SLA performance, and an executive KPI scorecard. Built on a synthetic Malaysian banking dataset.

A business intelligence project demonstrating how SQL alone can power executive-grade analytics across a full retail banking operation:

| Query File | Focus |
| --- | --- |
| 📋 Customer Segmentation | Segment distribution, age bands, tenure cohorts |
| 💳 Transaction Analysis | Monthly trends, MoM change, channel mix, risk flags |
| 📞 Complaints & SLA | SLA breach rates, CSAT scoring, customer risk scorecard |
| 🔄 Cohort Retention | Cohort analysis, dormancy detection, Pareto analysis |
| 📦 Product Performance | Adoption rates, complaint ratios, segment affinity |
| 🏆 Executive Scorecard | 5-CTE composite KPI scorecard across all dimensions |
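
As an illustration of the month-over-month (MoM) change listed under Transaction Analysis, here is a minimal sketch using a CTE and the `LAG` window function in SQLite, run from Python. The `transactions` table, its columns, and the sample rows are assumptions made for this example and do not reflect the project's actual schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and sample rows, for illustration only.
conn.executescript("""
CREATE TABLE transactions (txn_date TEXT, amount REAL);
INSERT INTO transactions VALUES
  ('2024-01-05', 100.0), ('2024-01-20', 100.0),
  ('2024-02-10', 250.0),
  ('2024-03-15', 200.0);
""")

query = """
WITH monthly AS (
    -- Aggregate transaction value per calendar month.
    SELECT strftime('%Y-%m', txn_date) AS month, SUM(amount) AS total
    FROM transactions
    GROUP BY month
)
SELECT month,
       total,
       -- Percentage change versus the previous month (NULL for the first month).
       ROUND(100.0 * (total - LAG(total) OVER (ORDER BY month))
             / LAG(total) OVER (ORDER BY month), 1) AS mom_pct
FROM monthly
ORDER BY month;
"""

rows = conn.execute(query).fetchall()
for month, total, mom_pct in rows:
    print(month, total, mom_pct)
```

Window functions require SQLite 3.25 or later, which recent Python builds typically bundle.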

Key results: Identified cross-sell opportunities in the Mass segment (55.5% of customers, avg 1.91 products); found a Pareto-style concentration, with the top 20% of customers driving ~58% of transaction value; built a composite risk scorecard combining complaints, SLA exposure, and transaction failure rates.
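
One way a Pareto figure like this could be computed is to bucket customers into quintiles by total transaction value and take the top bucket's share. The sketch below does this with `NTILE(5)` in SQLite. The table name, columns, and amounts are hypothetical and chosen only to keep the arithmetic easy to follow. They are not the project's dataset or queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")

# Ten made-up customers with one transaction each (illustrative values only).
amounts = [400, 200, 100, 100, 100, 100, 50, 50, 50, 50]
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    list(enumerate(amounts, start=1)),
)

query = """
WITH totals AS (
    SELECT customer_id, SUM(amount) AS total
    FROM transactions
    GROUP BY customer_id
),
ranked AS (
    -- Quintile 1 holds the top 20% of customers by value.
    SELECT total,
           NTILE(5) OVER (ORDER BY total DESC, customer_id) AS quintile
    FROM totals
)
SELECT ROUND(100.0 * SUM(CASE WHEN quintile = 1 THEN total ELSE 0 END)
             / SUM(total), 1) AS top20_share_pct
FROM ranked;
"""

top20_share_pct = conn.execute(query).fetchone()[0]
print(f"Top 20% of customers drive {top20_share_pct}% of transaction value")
```

In this toy data the top two customers account for 600 of 1,200 in total value, so the query returns 50.0.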

Tech: SQL · SQLite · CTEs · Window Functions · Python · Pandas · Matplotlib · Seaborn


🔹 Other Projects

Machine Learning

Analytics & NLP

Visualization


🔹 Skills

Languages & Tools: Python · SQL · TensorFlow · Keras · Tableau · NLTK · Pandas · NumPy · Scikit-learn · XGBoost · SHAP · HuggingFace Transformers · Matplotlib · Seaborn · Plotly · SQLite · WordCloud

Techniques: Regression · Classification · Clustering · Artificial Neural Networks · Convolutional Neural Networks · NLP · Sentiment Analysis · SHAP Explainability · Data Cleaning · Data Visualization · ETL Pipelines


📌 All projects are fully reproducible with notebooks and environment files included. Explore my repositories for the full workflow.

🔗 Connect: LinkedIn · GitHub

About

Clinical research professional transitioning into data science, showcasing projects in SQL, Tableau, and Machine Learning with actionable insights and interactive dashboards.
