Kaia Gao

Qianwen (Kaia) Gao

Data Scientist

User behavior analysis · Causal inference · Growth experimentation

Experienced in agentic RAG systems, A/B tests, and predictive models—translating complex analytics into actionable insights that drive business growth.

About Me

Transforming data into actionable insights

I'm a Data Scientist with a strong foundation in user behavior analysis, causal inference, and growth experimentation. I'm experienced in developing agentic RAG systems, designing A/B tests, and building predictive models using Python and SQL.

I'm proficient in building automated workflows and interactive dashboards that translate complex analytics into actionable insights—from retrieval benchmarking and survival analysis to audience segmentation and marketing optimization. I've driven measurable impact at companies like Xiaohongshu, Didi, and Wrodium.

Currently pursuing a Master's in Computational Social Science at UC Berkeley, I bring both technical depth and product-minded thinking to every project. Fluent in English and Mandarin.

RAG & Retrieval Systems

Experimental frameworks, Recall@k, BM25, vector embeddings, and GEO strategies to improve AI agent retrieval and content attribution.

Causal Inference & A/B Testing

A/B/C/D experiments, survival analysis (Kaplan-Meier, Cox PH), multivariate regression, and statistical inference for product and growth decisions.

Dashboards & Automation

KPI dashboards, Make workflows, and data pipelines—turning analytics into reporting and product strategy for stakeholders.

Skills & Technologies

A comprehensive toolkit for data science and growth analytics

Programming

🐍Python (pandas, numpy, scikit-learn)
🗄️SQL
📊R
💻HTML, CSS, JavaScript
Next.js

Databases & Backend

🔌Supabase
🐘PostgreSQL
📦MySQL
🔗RESTful APIs

Statistics

🧪A/B Testing
📐Causal Inference
📈Regression Analysis
🎲Bayesian Methods

Machine Learning & Frameworks

🤖Predictive Modeling
🔥PyTorch
🧠TensorFlow

Data Processing & Visualization

📊Tableau
🎨Matplotlib
📈Seaborn
📉Plotly
🖥️Streamlit

Tools & Workflow

🔧Git
🐙GitHub
📓Jupyter
☁️Google Colab
📋Excel

Languages

🌐English (Professional)
🇨🇳Mandarin (Native)

Featured Projects

Full-stack apps, NLP, and data science in action

🎭

Consentful Civic Lens – Event Organizer

CalHacks Project | Oct 2025

Built a full-stack web app with Next.js, Supabase, and PostgreSQL for event consent management and storytelling. Integrated Claude API and LiveKit for AI-generated highlight summaries, and developed a recommendation system to personalize future event suggestions based on user interests and location.

Next.js · Supabase · PostgreSQL · Claude API · LiveKit
👗

Consumer Sentiment & Brand Insights from Amazon Fashion Reviews

Course Project | Oct 2025 – Present

Analyzed 2.5M Amazon Fashion reviews to extract customer sentiment and brand perception using NLP techniques (VADER, BERT embeddings, topic modeling). Built regression and clustering models to identify key drivers of satisfaction and differentiate brand positioning. Visualized sentiment and keyword trends across categories through an interactive Streamlit dashboard, providing actionable insights for marketing and product strategy.

Python · VADER · BERT · Streamlit · scikit-learn

Experience & Leadership

Research, analytics, and leadership across tech and creative teams

Research Intern

Wrodium

Berkeley, CA · Dec 2025 – Present

  • Experimental Framework Design – Developed a large-scale A/B/C/D experimental framework to quantify the impact of structured data (JSON-LD) and HTML semantic markers on AI agent retrieval (ChatGPT Search, Perplexity), managing a 64-topic pipeline with over 5,000 longitudinal observations.
  • Retrieval Benchmarking & RAG Optimization – Evaluated retrieval performance using Recall@k (R@5/10), BM25 lexical ranking, and vector embedding similarity to identify optimal page structures (one-sentence claims, facts tables) that improved content retrievability.
  • Statistical Inference & Survival Analysis – Applied Kaplan-Meier estimators and Cox Proportional Hazards models to analyze Time-to-Quote (TTQ) metrics, measuring the statistical significance of freshness signals (IndexNow, <lastmod>) in accelerating AI citation speeds.
  • GEO Strategy & Automated Content Pipelines – Engineered a Make automation workflow to generate data-driven technical reports and blog content on Generative Engine Optimization (GEO), translating complex retrieval benchmarks into actionable product strategies for improving Share of Voice (SoV) and content attribution in LLM-based search engines.

Strategy & Data Analyst Intern

APPA Health

Berkeley, CA · Sept – Dec 2025

  • Market Opportunity Analysis – Analyzed the educational funding landscape to identify and evaluate a pipeline of potential funding opportunities supporting youth wellness.
  • Impact Measurement & Reporting – Established a KPI framework to measure the effectiveness of social-emotional learning (SEL) programs. Analyzed pre- and post-program survey data to quantify impact on student engagement, providing key insights for program iteration and reporting to funding partners.

Marketing Analytics Intern

Xiaohongshu

Shanghai, China · Aug 2024 – Jan 2025

  • Audience Segmentation – Queried and analyzed behavioral and demographic user data using SQL in Hive on a large-scale data warehouse to create 35 pet industry audience segments, contributing to ¥1.83M (~$250K) in ad revenue and improved ad targeting accuracy within the first month.
  • KPI Automation – Developed and automated marketing KPI dashboards using Python, SQL, and RedBI (BI tool comparable to Power BI) to track campaign performance, user engagement, and retention metrics. Presented findings and strategic recommendations to over 740 clients and internal stakeholders.
  • Marketing Strategy – Designed and analyzed A/B tests to optimize ad targeting strategies and creatives. Integrated CRM data to conduct deep-dive analyses on marketing performance, providing insights that improved marketing efficiency and ROI.

Product Strategy & Analytics Intern, Chauffeur Business Unit

Didi

Hangzhou, China · Mar – Jun 2024

  • Pricing Analytics – Conducted multivariate regression and causal inference analyses on supply-demand patterns and user price elasticity to inform dynamic pricing strategies, leading to a 2% revenue lift.
  • User Research – Designed and distributed user surveys to identify pain points in the "hourly driver" service; combined findings with SQL-based behavioral analysis to uncover actionable product insights, driving a 3% reduction in complaints and measurable improvement in driver-passenger experience.

Content Strategy & Analytics Intern

Huace Film & TV

Hangzhou, China · Jun – Sept 2023

  • Content Engagement Analysis – Queried and analyzed 10,000+ follower records using SQL and Python to identify audience attributes and content preferences; created user clusters that informed strategy adjustments, boosting page views by 15.1%.
  • A/B Testing – Conducted A/B tests to refine video strategy; produced and distributed 300+ YouTube clips, applying test insights to drive engagement among 620K+ global followers.

President

ZJU Lingyun Musical Club

Hangzhou, China · Sept 2021 – May 2024

  • Managed club operations across 8 departments with 150+ members; led the annual musical theatre production, drawing 6,000+ audience members.
  • Produced an original musical commemorating the 40th anniversary of Chu Kochen Honors College, overseeing recruitment, script development, budgeting, and cross-team coordination.

Education

Master of Computational Social Science

University of California, Berkeley

Berkeley, CA · Jun 2025 – Present

  • Relevant Coursework: Advanced Computing, Machine Learning, Advanced Applied Statistics, Data Visualization, Deep Learning for Visual Data (DeCal)

Bachelor of Arts, Communication

Zhejiang University (ZJU)

Hangzhou, China · Sept 2021 – Jun 2025

  • GPA: 3.95/4.00
  • Relevant Coursework: Big Data Analytics, Advanced Mathematics, Probability and Mathematical Statistics, Python Programming, Introduction to Research Methodology in Social Sciences

Resume

View or download my resume to learn more about my experience and skills

Qianwen (Kaia) Gao

Data Scientist · Berkeley, CA

Get In Touch

I'm always open to discussing data science projects, opportunities, or the latest in ML, RAG, and growth analytics.

Contact Information