Jack Stehn
Machine Learning Engineer | Data Engineer | Data Scientist
San Francisco, California
Summary
High-agency Data Professional bridging rigorous social science research with production-grade engineering. I specialize in architecting end-to-end data systems—from raw ingestion and warehousing to deploying predictive models that drive revenue. I bring a software engineering mindset to data teams, championing CI/CD, unit testing, and modular design.
Experience
Data Scientist (Lead: ML, Data Engineering, MLOps) - Ed Pioneers Fellow
Caliber Public Schools · Richmond, California
- Strategic Leadership (Solo Data Lead): Owned the full data lifecycle (DS, DE, ML) as the sole data scientist. Partnered directly with C-suite and department heads to navigate a 'zero-to-one' environment.
- Predictive ML & Risk Modeling: Developed and deployed explainable ML models to predict staff turnover. Engineered a 'Risk Tolerance' configuration allowing non-technical leadership to adjust precision/recall thresholds.
- Modern Data Stack Architecture: Architected a scalable platform on GCP. Orchestrated ELT pipelines using Dagster, dbt, and dlt to ingest data from disparate SIS and HR platforms.
- Engineering Maturity (ROI): Engineered a comprehensive People Team data pipeline, reducing manual consistency checks from months of collective annual work to seconds.
Data Scientist (ML, Data Engineering, MLOps)
SetSail · San Mateo, California
- Business Impact: Contributed to product enhancements that achieved 33% faster ramp times, 16% higher revenue, and 15x ROI for customers.
- Production ML (Revenue): Developed and deployed production ML models for Propensity Scoring and Churn Modeling. Leveraged NLP on unstructured email metadata to identify sales signals.
- Pipeline Architecture (AWS): Led a critical overhaul of the AWS data infrastructure. Implemented 'SQL Push-down' strategies and asynchronous DAGs, reducing data processing latency by 75%.
- Engineering Best Practices: Championed the adoption of CI/CD pipelines (GitHub Actions), unit testing (pytest), and Agile methodologies.
Data Science Research Team Lead
UC Berkeley School of Public Health · Berkeley, California
- Leadership: Led data science components for mixed-methods studies on equity and public health. Managed a team of undergraduates.
- Unstructured Data: Analyzed diverse unstructured and non-traditional datasets requiring the development of novel data processing approaches.
- Geospatial Analysis: Performed geospatial analysis (ArcGIS) to identify and visualize spatial patterns for non-technical stakeholders.
Education
University of California, Berkeley
Bachelor of Arts in Data Science (Domain Emphasis: Quantitative Social Science)
GPA: 4.00/4.00
Highest Distinction (Summa cum laude). Outstanding Data Science Undergraduate Award (Top of Class).
Skills
Programming & Core Data Skills
Machine Learning - Predictive & Classical
Data Engineering & Cloud Platforms
Software Engineering & DevOps Practices
Data Visualization & BI Tools
Research, Experimentation & Ethics
Awards
2020-2021 Outstanding Data Science Undergraduate Award
UC Berkeley
Recognized for excellence in Data Science undergraduate studies, research, and community contributions at UC Berkeley.
Volunteer & Community
Impact Fellow (Placement @ Caliber Public Schools)
Education Pioneers
Selected for national fellowship applying leadership/management skills to advance educational equity.
- Leadership Development: Applying data science & leadership skills to advance educational equity.
- Capacity Building: Building organizational capacity through strategic data projects at placement site.
Data Team Lead
San Francisco Gay Men's Chorus
Provide data-driven insights for policy-making and organizational growth through survey creation and analysis.
- Team Leadership: Led volunteer team providing data analysis for organizational strategy.
- Survey Analysis: Designed & analyzed surveys (qual/quant) informing policy & growth.
References
"Chosen from over 50 applicants and 5 finalists, Jack joined our organization at a pivotal moment and has been an invaluable team member ever since. Jack streamlined a survey and analysis process that previously took our team a month, developing a replicable system that now delivers actionable insights in just a few days. Jack is an easy choice for any team seeking a results-driven, collaborative data scientist who elevates both projects and people."
Brian Jimenez (Managed Jack directly at Caliber Public Schools) - Managing Director of People
"Not only is Jack an extremely capable engineer and data scientist, they are also a collaborative team player who elevates everyone around them. Their contributions at SetSail were always valuable to the company—whether it was their huge role in our data pipeline migration, or countless bug fixes and feature implementations. I wholeheartedly recommend Jack for any data science position."
Darrin Gilkerson (Worked with Jack on different teams at SetSail) - Software Engineer at QVT Financial
"Jack worked on a variety of projects that involved teasing out actionable insights from complex data sets, enhancing modeling capabilities through feature development and algorithm development, and building out a data ETL process that transformed the data infrastructure to help SetSail scale for enterprise customer needs. I highly recommend Jack as a Data Scientist and Data Engineer for any organization."
Danny Pan (Managed Jack directly at SetSail) - Data Science
"Jack is a motivated self-starter who loves to accomplish project tasks while developing and implementing smooth processes in their work environments. Jack is an accomplished leader, utilizing problem-solving skills to support their own work and the work of their colleagues and peers. Jack is a leader who uses imagination, experience, and empathy to create sustainable processes."
G. Allen Ratliff (Managed Jack directly at UC Berkeley SPH) - Assistant Professor of Social Work
Summary
High-agency Data Professional with a unique background blending rigorous social science research with production-grade engineering. My experiences, including graduating top of my class at Cal after navigating homelessness, gave me a unique perspective and a drive to build useful, ethical tools. I specialize in bridging the gap between 'notebook data science' and scalable infrastructure. My core expertise lies in architecting end-to-end data systems—from raw ingestion (Dagster/dbt) and warehousing (BigQuery/Athena) to deploying predictive models (Propensity Scoring, NLP, Churn) that drive revenue. I bring a software engineering mindset to data teams, championing CI/CD, unit testing, and modular design. Whether acting as a solo data lead or a core contributor in high-growth startups, I focus on untangling complexity to build tools that are explainable, maintainable, and directly impactful.
Experience
Data Scientist (Lead: ML, Data Engineering, MLOps) - Ed Pioneers Fellow
Caliber Public Schools · Richmond, California
- Strategic Leadership (Solo Data Lead): Owned the full data lifecycle (DS, DE, ML) as the sole data scientist. Partnered directly with C-suite and department heads to navigate a 'zero-to-one' environment, moving the org from manual spreadsheets to automated warehousing.
- Predictive ML & Risk Modeling: Developed and deployed explainable ML models (Logistic Regression, Deep Learning) to predict staff turnover. Engineered a 'Risk Tolerance' configuration allowing non-technical leadership to adjust precision/recall thresholds based on quarterly hiring capacity.
- Modern Data Stack Architecture: Architected and built a scalable data platform on Google Cloud Platform (GCP). Orchestrated ELT pipelines using Dagster, dbt, and dlt to ingest data from disparate SIS (Student Information System) and HR platforms.
- Engineering Maturity (ROI): Engineered a comprehensive People Team data pipeline, reducing manual consistency checks from months of collective annual work to seconds. Built a custom automated data quality framework to catch inconsistencies between systems.
- API Integration & Interoperability: Solved complex data interoperability challenges by designing integrations between SchoolMint and PowerSchool. Reverse-engineered undocumented APIs to create a unified data model for cross-functional analysis.
- Data Democratization: Updated the organization's data security policy and designed data literacy training modules, empowering school leaders to access real-time attendance and academic metrics without technical bottlenecks.
- Survey Design & Enrichment: Leveraged data to enrich staff satisfaction surveys with demographic, work location, role, grade level, and tenure data, enabling highly segmented and actionable insights.
- Stakeholder Management: Partnered with leadership on high-impact, data-driven solutions; presented findings to stakeholders including the Board of Directors.
Data Scientist (ML, Data Engineering, MLOps)
SetSail · San Mateo, California
- Business Impact: Contributed to product enhancements that achieved 33% faster ramp times, 16% higher revenue, and 15x ROI for customers.
- Production ML (Revenue): Developed and deployed production ML models for Propensity Scoring (deal closure probability) and Churn Modeling. Leveraged NLP on unstructured email metadata to identify sales signals.
- Pipeline Architecture (AWS): Led a critical overhaul of the AWS data infrastructure (S3, Athena, EMR). Implemented 'SQL Push-down' strategies and asynchronous DAGs, reducing data processing latency by 75% and scaling to multi-terabyte datasets.
- Scalable Data Modeling: Architected scalable Star Schema data models and optimized ETL/ELT processes, ensuring data readiness for LLM integration.
- Engineering Best Practices: Championed the adoption of CI/CD pipelines (GitHub Actions), unit testing (pytest), and Agile methodologies within the data science team.
- Causal Analysis: Performed deep causal inference studies to isolate specific sales behaviors that drive outcomes, influencing the product roadmap to focus on 'high-leverage' user actions.
- Technical Consulting: Acted as a technical consultant for enterprise customers, diagnosing complex data discrepancies and proposing architectural solutions for data integration.
- Cross-Functional Leadership: Collaborated seamlessly with Engineering, Product, and Support teams to spec out large-scale infrastructure restructuring.
Data Science Research Team Lead
UC Berkeley School of Public Health · Berkeley, California
- Leadership: Led data science components for mixed-methods studies on equity and public health (specifically violence against homeless youth). Managed a team of undergraduates.
- Unstructured Data: Analyzed diverse unstructured and non-traditional datasets (qualitative interviews, geospatial data, text corpora, hand-drawn maps) requiring the development of novel data processing approaches.
- Geospatial Analysis: Performed geospatial analysis (ArcGIS) to identify and visualize spatial violence patterns for non-technical stakeholders.
- Visualization: Created interactive dashboards (Tableau, Plotly) to communicate findings to stakeholders.
- Interdisciplinary Collaboration: Collaborated across disciplines (public health, psychology, sociology) ensuring ethical, robust research.
Full Stack Engineer
Los Medanos College · Pittsburg, California
- Full Stack Development: Independently designed and developed a web application (Python, JavaScript, PHP, PostgreSQL) to guide student program selection.
- Data Acquisition: Built a database of 500+ university programs via web scraping (Beautiful Soup) and rigorous data cleaning.
- Stakeholder Consulting: Consulted with college stakeholders (district, counselors) to define user needs.
- UX/UI: Created an intuitive management dashboard for non-technical staff.
Education
University of California, Berkeley
Bachelor of Arts in Data Science (Domain Emphasis: Quantitative Social Science)
GPA: 4.00/4.00
Highest Distinction (Summa cum laude). Outstanding Data Science Undergraduate Award (Top of Class).
Skills
Programming & Core Data Skills
Machine Learning - Predictive & Classical
Deep Learning & Advanced AI
Natural Language Processing (NLP)
Generative AI & LLMs
Data Engineering & Cloud Platforms
Software Engineering & DevOps Practices
Data Visualization & BI Tools
Research, Experimentation & Ethics
Collaboration & Professional Skills
Familiar Technologies & Other Tools
Awards
2020-2021 Outstanding Data Science Undergraduate Award
UC Berkeley
Recognized for excellence in Data Science undergraduate studies, research, and community contributions at UC Berkeley.
Volunteer & Community
Impact Fellow (Placement @ Caliber Public Schools)
Education Pioneers
Selected for national fellowship applying leadership/management skills to advance educational equity via capacity-building projects and leadership development.
- Leadership Development: Applying data science & leadership skills to advance educational equity.
- Capacity Building: Building organizational capacity through strategic data projects at placement site.
- Cohort Engagement: Engaging in rigorous leadership development programming with diverse cohort.
Data Team Lead
San Francisco Gay Men's Chorus
Provide data-driven insights for policy-making and organizational growth through survey creation and analysis (qualitative & quantitative).
- Team Leadership: Led volunteer team providing data analysis for organizational strategy.
- Survey Analysis: Designed & analyzed surveys (qual/quant) informing policy & growth.
- Executive Presentation: Presented data-driven insights to chorus leadership.
Event Producer
Bearrison Street Fair
- Event Production: Co-produced large-scale (~10k attendees) LGBTQ+ community street fair, overseeing all aspects from planning through execution.
- Logistics Management: Managed complex logistics, ~100 vendor relations, and multi-stage entertainment programming.
- Fundraising: Led fundraising efforts, securing over $90k in sponsorships & donations, contributing to event profitability.
- Stakeholder Coordination: Coordinated hundreds of diverse stakeholders including volunteers, performers, operations teams, city agencies, and non-profits.
Mentor and Trans Support Leader
San Francisco Gay Men's Chorus
Support members through mentorship and leadership within trans support initiatives, coordinating meetings and events.
- Peer Mentorship: Provided mentorship and peer support to chorus members.
- Community Leadership: Led coordination for trans member support group meetings and events.
- Inclusive Culture: Contributed to fostering an inclusive environment within the organization.
Transfer Mentor
UC Berkeley Division of Computing, Data Science, and Society
- Student Mentorship: Mentored incoming transfer students transitioning into UC Berkeley Data Science.
- Academic Guidance: Assisted students in developing data science skills & navigating coursework.
- Community Building: Fostered community and peer networking during remote learning (pandemic).
Student Ambassador (Transfer & Career Services)
Los Medanos College
Supported transfer/career programs through data analysis, marketing, peer training, and event coordination.
- Data Analysis: Analyzed student transfer data (SQL, R, Excel) to inform program development.
- Marketing Leadership: Led marketing committee managing social media, web content, and outreach.
- Public Speaking: Presented transfer/career information via public speaking & workshops.
- Peer Training: Trained new student employees on department policies & procedures.
- Event Coordination: Organized large campus events coordinating multiple stakeholders.
References
"Chosen from over 50 applicants and 5 finalists, Jack joined our organization at a pivotal moment and has been an invaluable team member ever since. As soon as they joined, they immediately took initiative on a complex survey design and analysis project that was critical to our success, bringing both expertise and ownership from day one. Jack’s approach is highly collaborative and mission-driven. They actively engage with departments across the organization, listen closely to their needs, and build thoughtful, scalable solutions—including dashboards, data quality reports, and automated systems that allow staff to focus on their core work with students. Jack is systems-oriented and consistently plans for long-term, sustainable outcomes. One standout example: Jack streamlined a survey and analysis process that previously took our team a month, developing a replicable system that now delivers actionable insights in just a few days. This perfectly captures their ability to problem-solve proactively and significantly boost our efficiency and decision-making. Beyond their technical and strategic skills, Jack is reliable, resourceful, and generous with their knowledge. They’ve led internal trainings to empower colleagues, handle ambiguity with ease, and bring a positive, solution-focused mindset to every challenge. Jack is an easy choice for any team seeking a results-driven, collaborative data scientist who elevates both projects and people. I recommend them without hesitation."
Brian Jimenez (Managed Jack directly at Caliber Public Schools) - Managing Director of People
"I had the pleasure of interviewing Jack before they joined the SetSail team-- I gave them a 4 out of 4. It's important to note that on our hiring scale, a 4 meant "I will flip the table if you don't hire this person." One thing that stuck with me after the interview, and which was reaffirmed while we worked together at SetSail, is Jack's enthusiasm for data science and their love of learning (and sharing what is learned). Not only is Jack an extremely capable engineer and data scientist, they are also a collaborative team player who elevates everyone around them. Their contributions at SetSail were always valuable to the company-- whether it was their huge role in our data pipeline migration, or countless bug fixes and feature implementations that directly improved our user experience, you could always count on Jack to get the job done on time, with clean code, and great documentation. I wholeheartedly recommend Jack for any data science position—they would be an invaluable addition to any team."
Darrin Gilkerson (Worked with Jack on different teams at SetSail) - Software Engineer at QVT Financial
"Jack is a sharp, human-first data person. They possess incredible passion for doing what is right and making good science happen. I highly recommend their work and their presence."
Ollie Downs (Studied with Jack at UC Berkeley) - Senior Data and Research Analyst, County of San Diego
"Jack was an integral part of the planning and design of the data pipeline overhaul at SetSail. Even with a moving target and many dependencies, Jack was able to adjust the design of our new pipeline, maintaining conversations across the product and engineering teams as the project progressed. They are also a fast learner and willing to dig into new technologies, which I really admired as their coworker. They would be a great addition to any team looking for a fast-learning and flexible data scientist."
Sarah Nam (Worked with Jack on the same team at SetSail) - Senior Associate at Cancer Navigator
"Jack is a hard-working Data Scientist with a keen eye for detail. Their passion for data analytics and software development really stands out when tasked with complex problems. At SetSail, Jack worked on a variety of projects that involved teasing out actionable insights from complex data sets, enhancing modeling capabilities through feature development and algorithm development, and building out a data ETL process that transformed the data infrastructure to help SetSail scale for enterprise customer needs. In addition to these technical skills, Jack's collaborative work with the engineering and product teams continually earned praise from coworkers. They were never shy and were always proactive in jumping in to help solve a problem. I highly recommend Jack as a Data Scientist and Data Engineer for any organization. Their technical skill and work ethic will be immediately apparent upon joining any team. Feel free to reach out to me, as I am happy to provide additional references or information as desired."
Danny Pan (Managed Jack directly at SetSail) - Data Science
"I had the privilege of working alongside Jack at SetSail, and I can confidently say that they are a top-notch data scientist. Jack's expertise in data science, combined with their passion for software engineering, makes them a valuable asset to any team. They have a keen ability to plan and lead complex cross-functional projects, and their software engineering skills are second to none. Jack's enthusiasm for learning is contagious, and they are always eager to dive into new projects and technologies. They are a great communicator and are able to explain technical concepts in a way that is easy for both technical and non-technical colleagues to understand. On top of all that, Jack is one of the kindest and most genuine people I've had the pleasure of working with. They truly care about their team and go above and beyond to support them. I highly recommend Jack for any data science or software developer role, and I have no doubt that they will excel in their next endeavor."
Josh Mantovani, M.A. (Senior to Jack, worked together at SetSail) - Data Scientist / Engineer
"I am happy to recommend Jack for a variety of roles and positions. Jack is a motivated self-starter who loves to accomplish project tasks while developing and implementing smooth processes in their work environments. Jack is an accomplished leader, utilizing problem-solving skills to support their own work and the work of their colleagues and peers, taking time to ensure that their team has the skills, knowledge, and resources they need to finish their tasks and projects effectively. Jack has a wide array of skills that they readily apply to their work, and they are ready to search for answers and learn new skills to address problems that arise in their projects. Then they are ready and willing to teach peers and colleagues how to utilize those new skills, supporting team-based processes and accomplishing team projects and goals in addition to their own individual work. Jack is a leader who uses imagination, experience, and empathy to create sustainable processes and consistently complete their goals. I am happy to recommend Jack and I am confident that Jack will be a positive asset to any work that they set out to complete."
G. Allen Ratliff (Managed Jack directly at UC Berkeley SPH) - Assistant Professor of Social Work
"I have had the great fortune of having Jack as project lead on the SFYEAH Research Project. If I had to describe Jack in one word, I would say "Integrity": Jack holds themself to the highest standard. It shows in the work they produce; Jack is meticulous. Jack is a skilled coder and data scientist, with a wealth of geospatial analysis knowledge. They are a first-rate leader and an exceptional communicator. Jack keeps everyone on the same page and is incredibly thorough. It is an absolute pleasure to work with Jack."
Conan Minihan (Jack was Project Lead for SFYEAH Research Project) - Data Scientist, PhD Student
"Jack and I worked on the same research team, and they effortlessly evolved into a pillar of leadership and direction. It's been an absolute pleasure and relief to be able to work alongside them. Jack learned quickly and worked beyond the expected and required amount to ensure deadlines and quality were kept. It truly astounded me how enthusiastic and exceptionally intelligent Jack was as I watched them surpass most of the team in their domain of expertise and knowledge in a matter of weeks. Jack's passion for details, design, and accuracy has made them one of the strongest assets on our team. Their work ethic and energy have been, and still are, contagiously inspiring and addictive to be around. Jack is just one of those people that you want on your team in every scenario because they really own the title "jack of all trades.""
Eva Smolentseva (Worked with Jack on same research team at UC Berkeley) - Analyzing Natural Language Models @USAA
Summary
Data Engineer with a strong foundation in architecting scalable, production-grade data platforms. Expert in orchestrating end-to-end ELT/ETL pipelines (Dagster, dbt, dlt), cloud infrastructure (AWS, GCP), and modern data warehousing. I bring a software engineering mindset to data teams—championing CI/CD, testing, and modular design to build systems that are maintainable, observable, and directly impactful.
Experience
Data Scientist (Lead: Data Engineering, MLOps) - Ed Pioneers Fellow
Caliber Public Schools · Richmond, California
- Modern Data Stack Architecture: Architected and built a scalable platform on GCP. Orchestrated ELT pipelines using Dagster, dbt, and dlt to ingest data from disparate SIS (Student Information System) and HR platforms into BigQuery.
- Engineering Maturity (ROI): Engineered a comprehensive People Team data pipeline, reducing manual consistency checks from months of collective annual work to seconds. Built a custom automated data quality framework to catch inconsistencies between systems.
- API Integration & Interoperability: Solved complex data interoperability challenges by designing integrations between SchoolMint and PowerSchool. Reverse-engineered undocumented APIs to create a unified data model for cross-functional analysis.
- Solo Data Lead: Owned the full data lifecycle as the sole data professional, partnering directly with C-suite and department heads to move the org from manual spreadsheets to automated warehousing.
- Data Democratization: Updated the organization's data security policy and designed data literacy training modules, empowering school leaders to access real-time metrics without technical bottlenecks.
Data Scientist (Data Engineering, MLOps)
SetSail · San Mateo, California
- Pipeline Architecture (AWS): Led a critical overhaul of the AWS data infrastructure (S3, Athena, EMR). Implemented 'SQL Push-down' strategies and asynchronous DAGs, reducing data processing latency by 75% and scaling to multi-terabyte datasets.
- Scalable Data Modeling: Architected scalable Star Schema data models and optimized ETL/ELT processes, ensuring data readiness for ML and LLM integration.
- Engineering Best Practices: Championed the adoption of CI/CD pipelines (GitHub Actions), unit testing (pytest), and Agile methodologies within the data science team.
- Business Impact: Contributed to infrastructure enhancements that achieved 33% faster ramp times, 16% higher revenue, and 15x ROI for customers.
- Cross-Functional Leadership: Collaborated with Engineering, Product, and Support teams to spec out large-scale infrastructure restructuring.
Full Stack Engineer
Los Medanos College · Pittsburg, California
- Full Stack Development: Independently designed and developed a web application (Python, JavaScript, PHP, PostgreSQL) to guide student program selection.
- Data Acquisition: Built a database of 500+ university programs via web scraping (Beautiful Soup) and rigorous data cleaning.
- Stakeholder Consulting: Consulted with college stakeholders (district, counselors) to define user needs.
Education
University of California, Berkeley
Bachelor of Arts in Data Science (Domain Emphasis: Quantitative Social Science)
GPA: 4.00/4.00
Highest Distinction (Summa cum laude). Outstanding Data Science Undergraduate Award (Top of Class).
Skills
Data Engineering & Cloud Platforms
Programming & Core Data Skills
Software Engineering & DevOps Practices
Data Visualization & BI Tools
Familiar Technologies & Other Tools
Awards
2020-2021 Outstanding Data Science Undergraduate Award
UC Berkeley
Recognized for excellence in Data Science undergraduate studies, research, and community contributions at UC Berkeley.
Volunteer & Community
Impact Fellow (Placement @ Caliber Public Schools)
Education Pioneers
Selected for national fellowship applying leadership/management skills to advance educational equity via capacity-building projects.
- Capacity Building: Building organizational capacity through strategic data engineering projects at placement site.
Summary
Data Scientist with a unique background blending rigorous social science research with production-grade engineering. Expert in the full ML lifecycle—from exploratory analysis and feature engineering to deploying predictive models (Propensity Scoring, NLP, Churn) that drive measurable business outcomes. I specialize in translating complex data into actionable insights and explainable models, backed by strong statistical foundations and ethical research practices.
Experience
Data Scientist (Lead: ML, Data Engineering, MLOps) - Ed Pioneers Fellow
Caliber Public Schools · Richmond, California
- Predictive ML & Risk Modeling: Developed and deployed explainable ML models (Logistic Regression, Deep Learning) to predict staff turnover. Engineered a 'Risk Tolerance' configuration allowing non-technical leadership to adjust precision/recall thresholds based on quarterly hiring capacity.
- Strategic Leadership (Solo Data Lead): Owned the full data lifecycle (DS, DE, ML) as the sole data scientist. Partnered directly with C-suite and department heads.
- Survey Design & Enrichment: Leveraged data to enrich staff satisfaction surveys with demographic, work location, role, grade level, and tenure data, enabling highly segmented and actionable insights.
- Data Democratization: Designed data literacy training modules, empowering school leaders to access real-time attendance and academic metrics without technical bottlenecks.
- Stakeholder Management: Partnered with leadership on high-impact, data-driven solutions; presented findings to stakeholders including the Board of Directors.
Data Scientist (ML, NLP, Causal Inference)
SetSail · San Mateo, California
- Production ML (Revenue): Developed and deployed production ML models for Propensity Scoring (deal closure probability) and Churn Modeling. Leveraged NLP on unstructured email metadata to identify sales signals.
- Business Impact: Contributed to product enhancements that achieved 33% faster ramp times, 16% higher revenue, and 15x ROI for customers.
- Causal Analysis: Performed deep causal inference studies to isolate specific sales behaviors that drive outcomes, influencing the product roadmap to focus on 'high-leverage' user actions.
- Scalable Data Modeling: Architected scalable Star Schema data models and optimized ETL/ELT processes, ensuring data readiness for LLM integration.
- Technical Consulting: Acted as a technical consultant for enterprise customers, diagnosing complex data discrepancies and proposing architectural solutions.
Data Science Research Team Lead
UC Berkeley School of Public Health · Berkeley, California
- Leadership: Led data science components for mixed-methods studies on equity and public health (specifically violence against homeless youth). Managed a team of undergraduates.
- Unstructured Data: Analyzed diverse unstructured and non-traditional datasets (qualitative interviews, geospatial data, text corpora, hand-drawn maps) requiring the development of novel data processing approaches.
- Geospatial Analysis: Performed geospatial analysis (ArcGIS) to identify and visualize spatial violence patterns for non-technical stakeholders.
- Visualization: Created interactive dashboards (Tableau, Plotly) to communicate findings to stakeholders.
- Interdisciplinary Collaboration: Collaborated across disciplines (public health, psychology, sociology) ensuring ethical, robust research.
Education
University of California, Berkeley
Bachelor of Arts in Data Science (Domain Emphasis: Quantitative Social Science)
GPA: 4.00/4.00
Highest Distinction (Summa cum laude). Outstanding Data Science Undergraduate Award (Top of Class).
Skills
Programming & Core Data Skills
Machine Learning - Predictive & Classical
Deep Learning & Advanced AI
Natural Language Processing (NLP)
Research, Experimentation & Ethics
Data Visualization & BI Tools
Collaboration & Professional Skills
Awards
2020-2021 Outstanding Data Science Undergraduate Award
UC Berkeley
Recognized for excellence in Data Science undergraduate studies, research, and community contributions at UC Berkeley.
Volunteer & Community
Impact Fellow (Placement @ Caliber Public Schools)
Education Pioneers
Selected for national fellowship applying data science & leadership skills to advance educational equity.
- Leadership Development: Applying data science & leadership skills to advance educational equity.
Data Team Lead
San Francisco Gay Men's Chorus
Provided data-driven insights for policy-making and organizational growth through survey creation and analysis.
- Team Leadership: Led a volunteer team providing data analysis for organizational strategy.
- Survey Analysis: Designed and analyzed qualitative and quantitative surveys to inform policy and growth.
Summary
Machine Learning Engineer specializing in building and deploying production ML systems end-to-end. From predictive risk models (Logistic Regression, Deep Learning) and NLP pipelines to scalable MLOps infrastructure, I bridge the gap between 'notebook data science' and production-ready systems. I bring strong software engineering practices—CI/CD, unit testing, and modular design—to ensure models are explainable, maintainable, and directly impactful at scale.
Experience
Data Scientist (Lead: ML, MLOps, Data Engineering) - Ed Pioneers Fellow
Caliber Public Schools · Richmond, California
- Predictive ML & Risk Modeling: Developed and deployed explainable ML models (Logistic Regression, Deep Learning) to predict staff turnover. Engineered a 'Risk Tolerance' configuration allowing non-technical leadership to adjust precision/recall thresholds based on quarterly hiring capacity.
- Modern Data Stack Architecture: Architected and built a scalable ML platform on GCP. Orchestrated ELT pipelines using Dagster, dbt, and dlt to ensure clean, reliable feature data for model training.
- Engineering Maturity (ROI): Built a custom automated data quality framework to catch inconsistencies between systems, reducing manual consistency checks from months of annual work to seconds.
- Strategic Leadership (Solo Data Lead): Owned the full data lifecycle (DS, DE, ML) as the sole data professional, partnering directly with C-suite to identify high-leverage ML applications.
Data Scientist (Production ML, NLP, MLOps)
SetSail · San Mateo, California
- Production ML (Revenue): Developed and deployed production ML models for Propensity Scoring (deal closure probability) and Churn Modeling. Leveraged NLP on unstructured email metadata to identify sales signals.
- Business Impact: Contributed to ML-driven product enhancements that achieved 33% faster ramp times, 16% higher revenue, and 15x ROI for customers.
- Pipeline Architecture (AWS): Led a critical overhaul of the AWS data infrastructure (S3, Athena, EMR). Implemented 'SQL Push-down' strategies and asynchronous DAGs, reducing data processing latency by 75% and scaling to multi-terabyte datasets.
- Scalable Data Modeling: Architected scalable Star Schema data models and optimized ETL/ELT processes, ensuring data readiness for LLM integration.
- Engineering Best Practices: Championed the adoption of CI/CD pipelines (GitHub Actions), unit testing (pytest), and Agile methodologies within the data science team.
- Causal Analysis: Performed deep causal inference studies to isolate specific sales behaviors that drive outcomes, influencing the product roadmap.
Data Science Research Team Lead
UC Berkeley School of Public Health · Berkeley, California
- Leadership: Led data science components for mixed-methods studies on equity and public health. Managed a team of undergraduates.
- Unstructured Data: Analyzed diverse unstructured and non-traditional datasets (qualitative interviews, geospatial data, text corpora) requiring novel data processing approaches.
Education
University of California, Berkeley
Bachelor of Arts in Data Science (Domain Emphasis: Quantitative Social Science)
GPA: 4.00/4.00
Highest Distinction (Summa cum laude). Outstanding Data Science Undergraduate Award (Top of Class).
Skills
Machine Learning - Predictive & Classical
Deep Learning & Advanced AI
Natural Language Processing (NLP)
Generative AI & LLMs
Programming & Core Data Skills
Data Engineering & Cloud Platforms
Software Engineering & DevOps Practices
Awards
2020-2021 Outstanding Data Science Undergraduate Award
UC Berkeley
Recognized for excellence in Data Science undergraduate studies, research, and community contributions at UC Berkeley.
Volunteer & Community
Impact Fellow (Placement @ Caliber Public Schools)
Education Pioneers
Selected for national fellowship applying ML & leadership skills to advance educational equity.
- Leadership Development: Applying ML engineering & leadership skills to advance educational equity.
Summary
Software Engineer with deep expertise in building production systems at the intersection of data and software. From full-stack applications and API integrations to CI/CD pipelines and scalable cloud infrastructure, I bring strong engineering practices—testing, modular design, and clean code—to every project. Experienced in leading cross-functional technical initiatives in high-growth environments.
Experience
Data Scientist (Lead: Data Systems, Platform Engineering) - Ed Pioneers Fellow
Caliber Public Schools · Richmond, California
- Platform Architecture: Architected and built a scalable data platform on GCP, orchestrating automated pipelines using Dagster, dbt, and dlt with comprehensive test coverage.
- API Integration & Interoperability: Solved complex system interoperability challenges by designing integrations between Schoolmint and PowerSchool. Reverse-engineered undocumented APIs to create a unified data model.
- Engineering Maturity (ROI): Built a custom automated data quality framework, reducing manual consistency checks from months of annual work to seconds.
- Data Democratization: Updated the organization's data security policy and designed training modules, empowering non-technical staff to access real-time metrics.
Data Scientist (Infrastructure, ML Systems)
SetSail · San Mateo, California
- Pipeline Architecture (AWS): Led a critical overhaul of the AWS data infrastructure (S3, Athena, EMR). Implemented 'SQL Push-down' strategies and asynchronous DAGs, reducing data processing latency by 75% and scaling to multi-terabyte datasets.
- Engineering Best Practices: Championed the adoption of CI/CD pipelines (GitHub Actions), unit testing (pytest), and Agile methodologies within the data science team.
- Scalable Data Modeling: Architected scalable Star Schema data models and optimized ETL/ELT processes, ensuring data readiness for LLM integration.
- Business Impact: Contributed to product enhancements that achieved 33% faster ramp times, 16% higher revenue, and 15x ROI for customers.
- Cross-Functional Leadership: Collaborated with Engineering, Product, and Support teams to spec out large-scale infrastructure restructuring.
- Technical Consulting: Acted as a technical consultant for enterprise customers, diagnosing complex data discrepancies and proposing architectural solutions.
Full Stack Engineer
Los Medanos College · Pittsburg, California
- Full Stack Development: Independently designed and developed a web application (Python, JavaScript, PHP, PostgreSQL) to guide student program selection.
- Data Acquisition: Built a database of 500+ university programs via web scraping (Beautiful Soup) and rigorous data cleaning.
- UX/UI: Created an intuitive management dashboard for non-technical staff.
- Stakeholder Consulting: Consulted with college stakeholders (district, counselors) to define user needs.
Education
University of California, Berkeley
Bachelor of Arts in Data Science (Domain Emphasis: Quantitative Social Science)
GPA: 4.00/4.00
Highest Distinction (Summa cum laude). Outstanding Data Science Undergraduate Award (Top of Class).
Skills
Software Engineering & DevOps Practices
Programming & Core Data Skills
Data Engineering & Cloud Platforms
Familiar Technologies & Other Tools
Generative AI & LLMs
Collaboration & Professional Skills
Awards
2020-2021 Outstanding Data Science Undergraduate Award
UC Berkeley
Recognized for excellence in Data Science undergraduate studies, research, and community contributions at UC Berkeley.
Volunteer & Community
Impact Fellow (Placement @ Caliber Public Schools)
Education Pioneers
Selected for national fellowship applying engineering & leadership skills to advance educational equity.
- Capacity Building: Building organizational capacity through strategic engineering projects at placement site.