Shared from the 3/5/2019 Houston Chronicle eEdition

Legislators hear arguments over STAAR’s validity

Education chief disputes critics’ claims of flawed reading exams

A long-simmering debate over the validity of Texas’ primary public education standardized test, known as STAAR, will reach the Capitol in Austin on Tuesday, when school superintendents and Education Commissioner Mike Morath are expected to provide diverging testimony about the exam’s reliability.

At issue is whether Texas students are unfairly being tested on reading material beyond their grade level, causing exam scores to dip slightly in recent years.

The question is expected to pit skeptics of the State of Texas Assessments of Academic Readiness, known as STAAR, against Morath’s administration during a hearing in front of the Texas House Public Education Committee. While there are no immediate plans to change STAAR, recently reignited concerns about the test have prompted calls from some legislators and education leaders for a thorough re-evaluation of the exam.

Some Texas superintendents and local education leaders argue the state’s declining scores in reading — roughly 3 in 4 students meet the minimum STAAR passing standard, while about 1 in 2 are considered on grade level — are due to a flawed test, rather than faltering student performance.

As evidence, they cite a study showing that STAAR reading exams given to students in grades 3, 4 and 5 were of nearly identical difficulty in terms of the complexity of words and sentences. The study, they say, helps prove that students are not tested on materials aligned with their grade level.

In addition, critics of STAAR’s validity argue Texas teachers are working too diligently and effectively to allow educational outcomes to stall.

“STAAR results are telling us one thing about reading and children, and other universally accepted standards and metrics are saying something else,” said Alief ISD Superintendent H.D. Chambers, who also serves as president of the Texas School Alliance, a superintendent-led association of 37 medium- and large-sized districts.

Morath, however, plans to argue that text difficulty cannot be measured solely by readability formulas, which do not account for a student’s knowledge base or the underlying content of the text, among other factors. Texas Education Agency leaders said about 15 to 25 Texas teachers review each reading exam and students field-test STAAR questions on other exams — methods they consider preferable to computer algorithms. A third-party organization commissioned by the state also deemed the test appropriately difficult for each grade level analyzed, according to the company’s report.

TEA officials have noted that the state’s downward trend in reading scores closely mirrors Texas’ decline during the past decade — from 30th to 41st among the 50 states — on the reading portion of the National Assessment of Educational Progress, an exam commonly referred to as the Nation’s Report Card.

“The evidence clearly shows that STAAR is predictive of meaningful life outcomes, like being ready for college, career and the military,” Morath said in a statement. “But it’s also critical that we work to ensure educators have confidence in the results they get from STAAR. Without that confidence, we’re less likely to adjust our own practices to improve student support.”

Used for grades, sanctions

Few Texas education matters draw as much debate as STAAR, which is administered annually to all students in grades 3 through 8 and some high school students. Federal law requires states to administer standardized exams and apply results to a statewide accountability system.

In Texas, STAAR results are the largest factor determining state-issued letter grades, which can have significant impact on perceptions about districts and schools. Texas also uses STAAR results as justification for issuing sanctions to some low-performing districts and holding back some failing students from advancing grade levels. Opponents of STAAR often argue the test’s high stakes are detrimental to students and teachers, while supporters claim the exam system holds educators accountable and improves academic outcomes.

In a 2017 poll published by the Texas Tribune, 21 percent of respondents said the best way to improve public education is to cut the number of standardized tests — more than the shares who chose increasing school spending, raising teacher pay or expanding pre-kindergarten access.

“You’ll find folks who are strongly associated with the tea party … you’ll find very progressive parents who think it’s a very poor overall look at what schools are doing to educate their children,” said Mark Wiggins, a lobbyist with the Association of Texas Professional Educators. “It’s really across the board.”

While education leaders and advocates have loudly debated the appropriateness of STAAR’s high stakes for years, doubts about the test’s validity have received much less attention.

That changed last month, when Texas Monthly published an article about educators’ concerns. Within hours, House Public Education Committee Vice Chair Diego Bernal, D-San Antonio, called for an investigation into the STAAR creation process and a halt to accountability-related sanctions.

The debate over STAAR’s validity has largely centered on readability formulas — mathematical tools that analyze various text properties, such as the rarity of words, average sentence length and complexity of sentence arrangement, to estimate grade-level appropriateness.
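To illustrate how such formulas work, here is a minimal sketch of one well-known readability metric, the Flesch-Kincaid grade level. This is offered only as an example of the general technique; it is not necessarily one of the five formulas used in the studies discussed below, and the syllable counter is a rough heuristic rather than a dictionary-based count.

```python
import re

def count_syllables(word):
    # Rough heuristic: each run of consecutive vowels counts as one syllable.
    # Real readability tools use dictionaries or more careful rules.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Estimate U.S. grade level using the Flesch-Kincaid formula:
    0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Short sentences built from common one-syllable words score at or below the early elementary grades, while long sentences full of multisyllabic words score many grades higher — which is why critics find it notable that passages for grades 3, 4 and 5 produced nearly identical scores.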

‘Not one you can dismiss’

In 2012, two associate professors from Texas A&M University-Commerce ran 10 STAAR reading passages through five readability formulas. They found some texts given to elementary-age students were two or three grades above their level, based on an average of the five readability ratings.

Four years later, researchers at the University of Mary Hardin-Baylor analyzed 51 STAAR passages given to elementary students between 2013 and 2015, running the texts through many of the same formulas and a common readability metric known as Lexile. The results: students in grades 3-5 were taking tests with nearly identical readability and Lexile scores.

“It’s a complex issue, but I would definitely say (readability formulas) are a good tool,” said one of the study’s co-authors, University of Mary Hardin-Baylor professor Jodi Pilgrim. “It’s not one you can dismiss.”

Even though students in grades 3-5 were all taking STAAR exams with nearly identical readability ratings, third-graders passed the test at a higher rate (80 percent) than fourth-graders (76 percent). In addition, fifth-graders (83 percent) barely outperformed third-graders.

In 2017, TEA began incorporating Lexile scores into its evaluation of the appropriateness of STAAR text passages.

“Not because we believe it’s a necessity but rather because we wanted to increase public confidence in the assessment,” said Jeff Cottrill, TEA’s deputy commissioner of standards and engagement.

Beyond readability metrics, the debate over STAAR’s validity also rests on differing perceptions about the performance of Texas educators. Chambers said Texas teachers and administrators have devoted too much time and too many resources for students to be falling behind. He also noted his district’s internal, state-approved reading assessment showed 57 percent of students reading on grade level, compared to 27 percent on STAAR.

“Either you blame the teacher, you blame the student or you blame the instrument being used,” Chambers said. “The first one I would want to investigate is the instrument.”

Texas School Alliance officials emphasize they do not want to lower standards for students, but state officials are wary of attacks on STAAR given other data suggesting students are falling behind.

“As opposed to trying to say the STAAR test is giving us a false narrative, I think it would be best as Texans if we accepted the challenge, as well as the moral obligation, to improve student outcomes,” Cottrill said.

Andrea Zelinski contributed to this story. jacob.carpenter@chron.com twitter.com/ChronJacob
