Yan He

Projects

Machine Learning/Statistics #

Brazilian E-Commerce Customer Satisfaction Prediction #

  • Use several classification algorithms (logistic regression, random forest, XGBoost, with and without PCA transformation) to predict two outcomes: whether an online order will receive a 5-star review, and whether it will receive a low review score (below 4 stars). Select the best model for each prediction by tuning hyperparameters and comparing appropriate performance metrics.

  • Identify important features that contribute to low/high scores, which can

    • help the business learn what to improve and which good strategies to retain
    • allow the business to intervene proactively with customers who are likely to give a low review, improving their experience and ideally preventing bad reviews.
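
The two prediction targets described above can be derived from the raw review score. The snippet below is a minimal sketch, not the project's actual code: the function name `make_targets`, the key names, and the example scores are hypothetical, and it assumes scores are integers from 1 to 5.

```python
from typing import Dict, List


def make_targets(review_scores: List[int]) -> Dict[str, List[int]]:
    """Turn 1-5 star review scores into the two binary targets."""
    for score in review_scores:
        if score not in (1, 2, 3, 4, 5):
            raise ValueError(f"review score out of range: {score}")
    return {
        # 1 when the order received a 5-star review, else 0
        "is_five_star": [int(s == 5) for s in review_scores],
        # 1 when the order received fewer than 4 stars, else 0
        "is_low_score": [int(s < 4) for s in review_scores],
    }


scores = [5, 4, 3, 1, 5]  # made-up example scores
targets = make_targets(scores)
print(targets)
```

Note that a 4-star order is a negative example for both targets, so the two classifiers are not simply mirror images of each other.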

Bay Area Airbnb Analysis Codes #

  • Create a price-suggestion model for new Airbnb hosts and identify the important features that contribute to a high listing price. Regression models (linear, ridge, lasso, random forest, boosting) were fitted with cross-validation to find the best prediction model.

  • Provide a suggestion system for Airbnb guests based on their preferences. K-means was used to cluster zip-code areas on various features.
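
As an illustration of the clustering step, here is a minimal K-means sketch that uses only the standard library. It is not the project's code: the two features per zip-code area (a scaled median price and a scaled review score) and all values are made up, and in practice a library implementation such as scikit-learn's `KMeans` would be the usual choice.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, n_iter: int = 100,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Plain Lloyd's algorithm: alternate assignment and centroid updates."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical, already-scaled features per zip-code area:
# [median listing price, average review score]
zip_features = [
    [0.10, 0.20], [0.15, 0.10], [0.20, 0.25],
    [0.90, 0.80], [0.85, 0.90], [0.80, 0.95],
]
centroids, labels = kmeans(zip_features, k=2)
print(labels)
```

Which cluster gets label 0 or 1 depends on the random initialization, so only the grouping itself is meaningful. Features should be put on a comparable scale before clustering, since K-means relies on Euclidean distance.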

Econometrics #

Prescription Drugs Project (codes in Appendix) #

  • Health Insurance Plans and Individuals’ Prescription Drug Expenditures: how expenditures vary among plans, along with the underlying price inequality.
  • The findings suggest that the presumably higher drug prices significantly increased prescription drug expenditures for people with Medicare/Part D and private health insurance plans. The effect is larger for the elderly covered by Medicare, with or without Part D.

Data Visualization #

For now, the R Shiny apps may display better on a desktop.

Philly School App #

  • Maps School Characteristics by displaying how school characteristics differ between areas (Mapping)
  • Explores Philly School Data by examining the correlation between school outcomes and different characteristics (ggplot2)
  • Analyzes School Outcomes by building a regression model and predicting the student attendance rate, withdrawals, and suspensions

Medicare Advantage Plans Visualization #

  • This app presents descriptive analyses of the data used to analyze the budget effects of an illustrative policy designed to auto-enroll eligible people into certain Medicare Advantage (MA) plans. Specifically, it visualizes the county population and the county-level MA plans eligible for auto-enrollment under the illustrative policy choices.

Healthcare facts Shiny Dashboard #

  • This app visualizes health insurance coverage and health expenditure patterns. The visualizations compare coverage and cost across different groups of people, and break costs down by payment source and by service type.