CAL-ACCESS app hits version 1.0

With your help, we’ve decoded California’s cryptic database tracking money in state politics. Now let’s do something with it.

Today we’re announcing the release of version 1.0 of the django-calaccess-raw-data library, a major step towards our goal of making it easier to analyze the role of money in California politics.

Alongside the code, we’re introducing new, comprehensive documentation for CAL-ACCESS, the jumbled, dirty and difficult state database that powers our project.

Both achievements are thanks to our growing crowd of open-source contributors, which now numbers nearly 150 journalists, computer programmers and other volunteers.

The final sprint was aided by more than 50 people who participated in our fourth California Code Rush event at NICAR 2016 in Denver.

There, hackers like Anthony Pesce contributed code to improve our Django application that downloads, extracts and loads the state’s massive database.

Others, like Neil Bedi and Maegan Clawges, marched through the data column by column to help us beef up our documentation.

Video by University of Missouri students Jennifer Lu and Yang Sun

After all their work, our core team applied the finishing touches and prepared today’s release. To review the full list of changes to our code check out the changelog. The documentation is now available at django-calaccess.californiacivicdata.org.

Why did this take so long?

The CAL-ACCESS database, maintained by California Secretary of State Alex Padilla, is one of the most confusing systems we’ve ever encountered.

For instance, its adminstrators think a column like the SPLT_CD table’s PFORM_TYPE is too self-evident to bother explaining.

Not enough to put you off? Imagine 1,466 more columns just like it.

To spare anyone else from having to sift through Padilla’s outdated, incomplete and inconsistent documentation ever again, we’ve compiled simple searchable explanations for each of the system’s 80 data tables.

(Spoiler: The PFORM_TYPE column stores an ID referencing the parent filing form of a given sub-itemization record. That’s hard-won knowledge we puzzled out after hours of comparing the database records against paper filings.)

We also meticulously mapped hundreds of cryptic abbreviations to human-readable descriptions. No more hunting for which office code represents “Supreme Court Justice” (Was it SPM or SUP?).

And we mapped the dozens of different forms submitted by campaigns and lobbyists to the tables where their information is stored, the most complete connection between the two that we suspect has ever been published.

Now what?

With the first major component of our project complete, we’re excited to begin work on the next link in our planned data pipeline: A “processed data” app that will refine the raw CAL-ACCESS data and shoehorn it into simpler data models that are easier to query and analyze.

We’re also beginning work an online archive where the public will be able to download current and previous versions of the raw CAL-ACCESS data files and, eventually, our processed data files.

How you can help

If you’re a journalist covering California state politics or a political scientist eager to incorporate CAL-ACCESS data into your research, we want to hear from you. Give us a shout on email or in the #california-civic-data channel of the News Nerdery Slack.

If you’re a coder eager to get cracking, check out our repositories on GitHub. There you will find dozens of tickets where you can make a difference today.