Introducing django-postgres-copy v0.1
A new release of our bulk-loading tool powered by innovation in medical research
Today the California Civic Data Coalition released a new version of django-postgres-copy, our open-source software library that helps users of the Django web framework load large volumes of data into PostgreSQL databases more quickly.
It includes a sweeping rewrite of the code and a series of feature additions largely driven by a contributor from outside our field of journalism.
In the past, the system was more rigid, allowing only exact one-to-one transfers from file to database.
Greater flexibility is useful if you want to ignore a field or ingest not just a column’s raw values but also a cleaned-up version.
Here’s a simplified example. Django database models can now enlist our existing transformation tools to load both a raw string and a companion version converted into all uppercase letters.
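As a rough, library-free sketch of that idea, the snippet below uses a plain Python class as a stand-in for a Django model so it can run without a database. The class name, the `copy_upper_name_template` hook name and the `build_select_expressions` helper are illustrative assumptions, not code taken from django-postgres-copy itself.

```python
class HypotheticalModel:
    """Plain-Python stand-in for a Django model (an assumption for illustration)."""

    fields = ["name", "upper_name"]

    def copy_upper_name_template(self):
        # SQL expression applied to the source column while it is loaded.
        return 'upper("%(name)s")'


def build_select_expressions(model, mapping):
    """Return one SQL expression per model field.

    mapping: model field name -> source column header.
    Fields without a template hook fall back to the raw column.
    """
    expressions = []
    for field in model.fields:
        header = mapping[field]
        hook = getattr(model, "copy_%s_template" % field, None)
        template = hook() if hook else '"%(name)s"'
        expressions.append(template % {"name": header})
    return expressions


# The same source column, NAME, feeds both fields.
mapping = {"name": "NAME", "upper_name": "NAME"}
expressions = build_select_expressions(HypotheticalModel(), mapping)
print(expressions)
```

The raw field gets the column untouched, while the companion field gets the column wrapped in the uppercase expression.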
Our CopyMapping utility functions just as before, but with the source column mapped to both database fields. Here it is being run via a Django management command.
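To make the effect of such a load concrete, here is a minimal sketch that runs without Django or PostgreSQL. It assumes a two-step load through a staging table and uses an in-memory SQLite database as a stand-in, with ordinary inserts where PostgreSQL would use its bulk COPY command. The table names and sample rows are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source file with a single NAME column.
source = io.StringIO("NAME\nben\njoe\njane\n")
rows = [(record["NAME"],) for record in csv.DictReader(source)]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE staging ("NAME" TEXT)')
conn.execute("CREATE TABLE hypothetical_model (name TEXT, upper_name TEXT)")

# Step 1: load the raw file into a staging table.
# (In PostgreSQL this is where a bulk COPY would happen.)
conn.executemany("INSERT INTO staging VALUES (?)", rows)

# Step 2: one source column feeds both destination fields,
# once as-is and once through a transformation.
conn.execute(
    "INSERT INTO hypothetical_model (name, upper_name) "
    'SELECT "NAME", upper("NAME") FROM staging'
)

loaded = conn.execute(
    "SELECT name, upper_name FROM hypothetical_model ORDER BY rowid"
).fetchall()
print(loaded)
```

Doing the transformation inside the database, rather than row by row in Python, is what keeps this style of load fast on large files.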
Kirby is not involved in our project analyzing money in politics. He said django-postgres-copy is being used at Commonwealth Informatics to load and study millions of anonymized medical records as part of an effort to automate the detection of disease.
He writes that django-postgres-copy “gives us a nice, clean and quick way to load the data into our data store while respecting the Django-ness of everything.”
“[It] also sped up the processing from a naive csvreader implementation that took hours to running in about 10 minutes,” he said.
In our view, this collaboration reaffirms how open-source techniques allow developers in different fields to benefit from each other’s efforts.
Check out Kirby’s changes and learn more about django-postgres-copy in the official documentation.
If there are changes you’d like to see, go get involved on our GitHub repository.