Sri Kanajan
Data Scientist

kanajan.sri at

Asjed Hussain
Business Analyst

asjedhussain at

David Yerrington
Data Scientist

david at


This is an entry to an online hackathon competition detailed at . Please refer to that site for more information.

Dropping out of high school is a critical issue in today's society that contributes to many of the long term issues in the nation. To tackle this issue, we need to understand the precise cause and effect relationships.

Our goal is to use data to derive actionable recommendations to help increase the national graduation rate to 90% by 2020.

We found that the most important factors for improving graduation rates are household stability and economic security. We also found that weather plays an important role. On the other hand, we found that school spending and low food access may not be very important for improving graduation rates and have very diminishing returns upon investments to improve them. Our best model had 0.68 R^2 but with 150 features. Our final selected model was with 0.6 R^2 with 7 features.

Interesting Insights!!!:

  1. Coming from educated households were NOT strong predictors of whether you would graduate high school.
  2. We found that the percent of people living under the poverty line was a much stronger predictor than median household income. This implies that it's not necessary to raise incomes for everyone, but just for the neediest families. This helps to target policy in the right direction.
  3. Salaries and Wages were impactful to grad rates but NOTHING else. Investments need to be in the right place !
  4. Food Access is NOT a big impact to grad rates.
  5. Weather in your area IS impactful !

What We Did

How we built it

Most of the work was in the analysis and identifying actionable recommendations at the national level. In addition, we built a set of interactive visualization tools on a web platform in order to help the user understand what we accomplished.


The data required significant cleaning and deep understanding. Firstly, the school districts for cohort sizes under 60 students had graduation rates that were in large buckets. In fact for 6-15 student cohorts, the rate was either <50% or ≥50%. This could cause significant inaccuracies in our model, hence we decided to leave these cohorts outside of our modeling.

Next, we used the unmerged dataset and we needed to merge the specific tracts per school into district level data. The technique mentioned in the documentation assumed that all metrics are treated the same way, which is incorrect. We only applied the weighted average on absolute value metrics and recalculated the % value metrics.


This project entailed an end to end data science methodology and implementation. Building a comprehensive data story, actionable recommendations using predictive analytics and finally a interactive visualization web platform to communicate the story.

Recommendations (More in the notebook on the site)

Given that Household Stability and Economic Security are the biggest factors, here are a few actions to take:

1) Continued investment in community development corporations (CDCs) to strengthen communities from a financial and social perspective.

2) Initiatives like the Low Income Housing Tax and the New Markets Tax Credit have brought vital services to low-income communities. Check out this article for more information.

School spending

1) We found that more instructional spending tended to increase graduation rates, while other types tended to decrease it. Thus, all else being equal, it seems that schools should allocate more money towards instructional spending rather than support spending or administrative spending.

Inclement weather

1) Colder areas tend to have lower graduation rates than warmer areas. One way to design around this problem is to have alternate school schedules for colder areas. It might be better to have a longer break during winter months and a shorter break during summer months for these schools

Future Work

Look into a model for graduation rates per race to understand what features makes each race's graduation rate to be different.