Qianwen (Kaia) Gao
Data Scientist
User behavior analysis · Causal inference · Growth experimentation
Experienced in agentic RAG systems, A/B tests, and predictive models—translating complex analytics into actionable insights that drive business growth.
About Me
Transforming data into actionable insights
I'm a Data Scientist with a strong foundation in user behavior analysis, causal inference, and growth experimentation. I'm experienced in developing agentic RAG systems, designing A/B tests, and building predictive models using Python and SQL.
I'm proficient in building automated workflows and interactive dashboards that translate complex analytics into actionable insights—from retrieval benchmarking and survival analysis to audience segmentation and marketing optimization. I've driven measurable impact at companies like Xiaohongshu, Didi, and Wrodium.
Currently pursuing a Master's in Computational Social Science at UC Berkeley, I bring both technical depth and product-minded thinking to every project. Fluent in English and Mandarin.
RAG & Retrieval Systems
Experimental frameworks, Recall@k, BM25, vector embeddings, and GEO strategies to improve AI agent retrieval and content attribution.
Causal Inference & A/B Testing
A/B/C/D experiments, survival analysis (Kaplan-Meier, Cox PH), multivariate regression, and statistical inference for product and growth decisions.
Dashboards & Automation
KPI dashboards, Make workflows, and data pipelines—turning analytics into reporting and product strategy for stakeholders.
Skills & Technologies
A comprehensive toolkit for data science and growth analytics
Programming
Databases & Backend
Statistics
Machine Learning & Frameworks
Data Processing & Visualization
Tools & Workflow
Languages
Featured Projects
Full-stack apps, NLP, and data science in action
Consentful Civic Lens – Event Organizer
CalHacks Project | Oct 2025
Built a full-stack web app with Next.js, Supabase, and PostgreSQL for event consent management and storytelling. Integrated Claude API and LiveKit for AI-generated highlight summaries, and developed a recommendation system to personalize future event suggestions based on user interests and location.
Consumer Sentiment & Brand Insights from Amazon Fashion Reviews
Course Project | Oct 2025 – Present
Analyzed 2.5M Amazon Fashion reviews to extract customer sentiment and brand perception using NLP techniques (VADER, BERT embeddings, topic modeling). Built regression and clustering models to identify key drivers of satisfaction and differentiate brand positioning. Visualized sentiment and keyword trends across categories through an interactive Streamlit dashboard, providing actionable insights for marketing and product strategy.
Experience & Leadership
Research, analytics, and leadership across tech and creative teams
Research Intern
Wrodium
Berkeley, CA · Dec 2025 – Present
- Experimental Framework Design – Developed a large-scale A/B/C/D experimental framework to quantify the impact of structured data (JSON-LD) and HTML semantic markers on AI agent retrieval (ChatGPT Search, Perplexity), managing a 64-topic pipeline with over 5,000 longitudinal observations.
- Retrieval Benchmarking & RAG Optimization – Evaluated retrieval performance using Recall@k (R@5/10), BM25 lexical ranking, and vector embedding similarity to identify optimal page structures (one-sentence claims, facts tables) that improved content retrievability.
- Statistical Inference & Survival Analysis – Applied Kaplan-Meier estimators and Cox Proportional Hazards models to analyze Time-to-Quote (TTQ) metrics, measuring the statistical significance of freshness signals (IndexNow, <lastmod>) in accelerating AI citation speeds.
- GEO Strategy & Automated Content Pipelines – Engineered a Make automation workflow to generate data-driven technical reports and blog content on Generative Engine Optimization (GEO), translating complex retrieval benchmarks into actionable product strategies for improving Share of Voice (SoV) and content attribution in LLM-based search engines.
Strategy & Data Analyst Intern
APPA Health
Berkeley, CA · Sept – Dec 2025
- Market Opportunity Analysis – Analyzed the educational funding landscape to identify and evaluate a pipeline of potential funding opportunities supporting youth wellness.
- Impact Measurement & Reporting – Established a KPI framework to measure SEL program effectiveness. Analyzed pre- and post-program survey data to quantify impact on student engagement, providing key insights for program iteration and reporting to funding partners.
Marketing Analytics Intern
Xiaohongshu
Shanghai, China · Aug 2024 – Jan 2025
- Audience Segmentation – Queried and analyzed behavioral and demographic user data using SQL in Hive on a large-scale data warehouse to create 35 pet industry audience segments, contributing to ¥1.83M (~$250K) in ad revenue and improved ad targeting accuracy within the first month.
- KPI Automation – Developed and automated marketing KPI dashboards using Python, SQL, and RedBI (BI tool comparable to Power BI) to track campaign performance, user engagement, and retention metrics. Presented findings and strategic recommendations to over 740 clients and internal stakeholders.
- Marketing Strategy – Designed and analyzed A/B tests to optimize ad targeting strategies and creatives. Integrated CRM data to conduct deep-dive analyses on marketing performance, providing insights that improved marketing efficiency and ROI.
Product Strategy & Analytics Intern, Chauffeur Business Unit
Didi
Hangzhou, China · Mar – Jun 2024
- Pricing Analytics – Conducted multivariate regression and causal inference analyses on supply-demand patterns and user price elasticity to inform dynamic pricing strategies, leading to a 2% revenue lift.
- User Research – Designed and distributed user surveys to identify pain points in the "hourly driver" service; combined findings with SQL-based behavioral analysis to uncover actionable product insights, driving a 3% reduction in complaints and measurable improvement in driver-passenger experience.
Content Strategy & Analytics Intern
Huace Film & TV
Hangzhou, China · Jun – Sept 2023
- Content Engagement Analysis – Queried and analyzed 10,000+ follower records using SQL and Python to identify audience attributes and content preferences; created user clusters that informed strategy adjustments, boosting page views by 15.1%.
- A/B Testing – Conducted A/B tests to refine video strategy; produced and distributed 300+ YouTube clips, leveraging insights to drive engagement from 620K+ global followers.
President
ZJU Lingyun Musical Club
Hangzhou, China · Sept 2021 – May 2024
- Managed club operations across 8 departments with 150+ members; led the annual musical theatre production, drawing 6,000+ audience members.
- Produced an original musical commemorating the 40th anniversary of Chu Kochen Honors College, overseeing recruitment, script development, budgeting, and cross-team coordination.
Education
Master of Computational Social Science
University of California, Berkeley
Berkeley, CA · Jun 2025 – Present
- Relevant Coursework: Advanced Computing, Machine Learning, Advanced Applied Statistics, Data Visualization, Deep Learning for Visual Data (DeCal)
Bachelor of Arts, Communication
Zhejiang University (ZJU)
Hangzhou, China · Sept 2021 – Jun 2025
- GPA: 3.95/4.00
- Relevant Coursework: Big Data Analytics, Advanced Mathematics, Probability and Mathematical Statistics, Python Programming, Introduction to Research Methodology in Social Sciences
Resume
View or download my resume to learn more about my experience and skills
Get In Touch
I'm always open to discussing data science projects, opportunities, or the latest in ML, RAG, and growth analytics.
Contact Information
Phone
+1 (510) 542-6385
GitHub
github.com/kaiagaoo
Location
Berkeley, CA