“State Policy Report Card” – Does not reflect student achievement

Students First Report Card

NAEP Statewide Scores versus Students First State Educational Policy Letter Grade. Showing All students, then separating by eligibility for federal lunch programs.

The organization “Students First” is a policy think tank whose purported mission is to improve the quality of K-12 education in the US. This month, they published a State Policy Report Card 2013, in which they assigned a GPA to each state based on its statewide education policies. I had a look at it, and my alarm bells went off when my own state, California, scored an F (with a big scary red graphic on the interactive map), while the highest-scoring state was Louisiana, with a B.

Delving a little more deeply, I looked into their methodology, and they seem to assign scores mostly for political reasons. For some reason, they push the idea of evaluating schools, teachers, and principals on a four-tiered system; in other words, assigning them a letter grade. It seems at the outset a bit simplistic and frankly unimaginative (“Hey, we’re evaluating schools, let’s grade them as if they were in school!”). They also gave California failing marks because teacher evaluation criteria are collectively bargained. Another strange factor is “mayoral control” of a school district. Mayors have lots of priorities: safety, economics, sanitation, transportation, crime. I could see a mayor making hasty and politically driven decisions on school districts. I don’t think it makes much sense to give that power to an office that shifts with the political winds; you could wind up with a young-earth-creationist appeaser who wants to erase evolution from science, with no checks.

At any rate, they are asserting that their policy prescriptions will lead to better student achievement. We can look into this and find that, so far, this has not been demonstrated.

Students First Report Card Grade Negatively Correlates with Actual Achievement

The National Center for Education Statistics administers the databases of nationwide assessment results from the National Assessment of Educational Progress (NAEP). I downloaded the 2011 8th-grade scores, separated by state, for Math, Science, Reading, and Vocabulary. By all measures, the Students First GPA assignment negatively correlates with NAEP scores in all four categories. To quantify this, I calculated the Pearson correlation coefficient, r. This statistic measures the linear dependence between two variables: if it is positive, then as one variable goes up, the other tends to go up too; if it is negative, then as one goes up, the other tends to go down. If Students First’s policy prescriptions were tied to student achievement, then we would expect the state GPAs to correlate positively with NAEP scores, and they do not.
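For readers who want to see what the Pearson calculation involves, here is a minimal sketch in plain Python. The function name `pearson_r` and the toy numbers are my own illustration, not the real report-card or NAEP data, and this is not necessarily how the figures in this post were produced.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Made-up illustration values, NOT the real state GPAs or NAEP scores.
toy_gpa = [0.6, 0.9, 1.1, 1.4, 2.0, 2.9]
toy_score = [288, 285, 286, 282, 283, 279]

r = pearson_r(toy_gpa, toy_score)
print(f"r = {r:.3f}")  # negative here: the toy scores fall as the toy GPA rises
```

With the real data, `xs` would be the 50 state GPAs and `ys` the matching statewide NAEP scores for one subject.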

Student Achievement and Poverty

It goes without saying that kids living in poverty will not perform as well on scholastic aptitude tests, because they will have lacked the resources, tools, infrastructure, and stability to learn well. There is an obvious disparity between poor kids and the rest of the kids on these tests. The marker of “poor” here is eligibility for the free or reduced lunch program. So if a state is doing well by its children, surely it would take steps to reduce the achievement gap. If Students First’s policy prescriptions were tied to bridging the poverty gap in achievement, then we would expect the state GPAs to correlate positively with NAEP scores at least among those kids who are Eligible for the Federal School Lunch Program. Again, this is not the case; the negative correlations are even stronger among the Eligible kids than among the Not Eligible kids.

The Correlation Data

Pearson’s r, Correlation Coefficients – Statewide Scores vs Students First GPA, by School Lunch Eligibility

Subject    All students    Eligible      Not Eligible
Math       -0.3018268      -0.3179454    -0.2183162
Science    -0.4118529      -0.3695821    -0.1705056
Reading    -0.2968387      -0.3148445    -0.1979263
Vocab      -0.3891802      -0.436711     -0.3057165

I simply calculated Pearson r for the statewide scores in Math, Science, Reading, and Vocabulary for All students, Eligible (for reduced lunch), and Not Eligible (for reduced lunch). These are in the table above. To determine statistical significance for this type of test, you only need to know the sample size and data distribution. Assuming a sample size of 50, and under the null hypothesis that Students First GPA and NAEP scores are statistically independent, our “critical values” are |r| > 0.279 at α = 0.05 and |r| > 0.363 at α = 0.01. This means that there’s a 1-in-20 probability that the magnitude of r will be 0.279 or higher even if the GPA and NAEP scores are truly unrelated, and likewise a 1-in-100 probability that it will be 0.363 or higher. As we can see, for Science and Vocabulary, it is likely that scores are, in fact, negatively correlated with GPA for all students. Comparing kids in poverty with the rest of the student population, the negative correlation is stronger among the kids in poverty. So by these accounts, if a state were to implement Students First’s policy ideas, the overall trend would be for all students to perform more poorly, and for the poor kids to be more strongly affected. See the scatterplots with linear regression lines (figure: Scores-vs-gpa).
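As a rough cross-check on those cut-offs, here is a sketch that uses Fisher's z-transformation with a normal approximation. That is a swapped-in technique, not the exact table lookup, so expect thresholds close to, but not identical to, the ones quoted above. The function names are my own, and n = 50 is the same assumption as in the text.

```python
import math
from statistics import NormalDist

def fisher_p_value(r, n):
    """Approximate two-sided p-value for H0 'no correlation', via Fisher's z."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def fisher_critical_r(alpha, n):
    """Approximate |r| needed for two-sided significance at level alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))

n = 50  # assumed sample size: one data point per state
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: |r| > {fisher_critical_r(alpha, n):.3f}")

# "All students" coefficients for Math and Vocabulary from the table.
for subject, r in [("Math", -0.3018268), ("Vocab", -0.3891802)]:
    print(f"{subject}: r = {r:+.3f}, approx. p = {fisher_p_value(r, n):.4f}")
```

Under this approximation, the Math coefficient clears the 0.05 threshold and the Vocabulary coefficient clears the 0.01 threshold, in line with the reading given above.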

Another view is to look at the letter grades.  I separated the states by their Students First State Policy Report Card letter grade and plotted their NAEP scores. There’s nearly a stepwise progression of improving scores with worsening “grades.” I put these plots at the top of the post because it seemed most striking to me. Whatever Students First is measuring, it is clearly NOT student achievement.


I just want to comment that I do think it is a good idea to evaluate statewide and local student achievement. It’s also a good idea to promote policies that improve achievement. The policies that Students First is promoting don’t seem to have an evidentiary basis, and before they go around “failing” states, they need to demonstrate that their ideas will work. Secondly, their four-tiered grading system is poorly implemented. A sign that a scoring scheme is bad is that the entire scale is not used. The scale is of no value when the GPAs are clustered in the 0.9-1.2 range and almost no use is made of the B-to-A range.

This first came to my attention last Sunday in a blog post on Simply Statistics, a blog that I started following when I enrolled in a MOOC, Computing for Data Analysis. A commenter linked to another blog that showed some back-of-the-envelope calculations suggesting that the GPAs were negatively correlated with Reading and Math scores. This inspired me to look into it a little further and see whether it was a fluke, and so I started all the legwork shown above.

Tonight, as I was preparing this blog post, I headed over to the Students First website, and they also have a blog post, written in response to a blog post about the negative correlation. Eric Lerum over at Students First lambasted the American Federation of Teachers’ (AFT) analysis. AFT simply did a linear regression in Excel, and Mr. Lerum calculated the R² value, saying it was not close enough to 1 to demonstrate a correlation. That reasoning is not correct. The Wikipedia article on Pearson’s r gives a good treatment of this in its section on statistical inference. Mr. Lerum is implying that R² is something like a percent confidence; it’s really just a measure of how well the data fit a straight line. You determine whether a correlation is statistically significant by comparing the actual coefficient with a cut-off value that you set as “significant.” I can say with 99% confidence that Science and Vocabulary scores are negatively correlated with Students First GPA, and with 95% confidence that all four tests are. No, they do not follow a straight line. Nor would you expect them to, especially given the limited range of GPAs that Students First used.
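To make the distinction concrete, here is a small sketch that squares the “All students” coefficients from the correlation table and compares their magnitudes with the cut-offs quoted earlier. The variable and function names are my own. Since R² is the same whatever the sign of r, only the magnitudes are used.

```python
# Magnitudes |r| of the "All students" coefficients from the correlation table.
ABS_R_ALL_STUDENTS = {
    "Math": 0.3018268,
    "Science": 0.4118529,
    "Reading": 0.2968387,
    "Vocab": 0.3891802,
}

# Critical |r| values quoted in the post, assuming n = 50.
CRITICAL_05 = 0.279
CRITICAL_01 = 0.363

def summarize(abs_r):
    """Return (R^2, clears the 0.05 cut-off?, clears the 0.01 cut-off?)."""
    return abs_r ** 2, abs_r > CRITICAL_05, abs_r > CRITICAL_01

for subject, abs_r in ABS_R_ALL_STUDENTS.items():
    r2, sig05, sig01 = summarize(abs_r)
    print(f"{subject:8s} R^2 = {r2:.3f}  beyond 0.05 cut-off: {sig05}  beyond 0.01 cut-off: {sig01}")
```

Every R² here is below 0.2, nowhere near 1, yet each coefficient still clears the 0.05 cut-off. A low R² says the points scatter widely around the line; it does not say the trend is indistinguishable from chance.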

I honestly don’t know much about these players. I do know that Students First has a flashy website with cool graphics meant to make us think that California must be doing something horribly wrong, with its big scary red graphic. I also think their title is misleading: it borrows the NAEP’s “Nation’s Report Card” slogan by calling their own report a “State Policy Report Card.” I’m instinctively suspicious when something obviously tries to ride the coattails of legitimacy and authority by having a similar-sounding name when there’s no association between the two. I am also a big fan of evidence-based policy.

