Converting JSON to TSV using Python streaming

Earlier today, a friend asked my advice on how to convert a JavaScript Object Notation (JSON) file to tab-separated values (TSV) in Python. As with most things in software development, there are many ways to accomplish this, some more Pythonic than others.

I decided to illustrate how to do this in map/reduce style (sans reducer), streaming the output to STDOUT.

The Data

To start with, he was dealing with a fairly simple JSON format for closed-captioning data: a “cc” root key holding an array of items, each containing a duration, the content, and a timestamp. The derived schema looks like this:
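As a rough illustration of that shape, here is a sketch using Python's typing.TypedDict. The key names duration, content and timestamp follow the description above, but their types (a float for duration, strings for the others) and the hypothetical helper is_cc_document are assumptions for illustration, not part of the original code.

```python
from typing import Any, List, TypedDict


class CaptionItem(TypedDict):
    # Assumed key names and types; the real file may differ.
    duration: float
    content: str
    timestamp: str


class CcDocument(TypedDict):
    # The "cc" root key holds an array of caption items.
    cc: List[CaptionItem]


REQUIRED_KEYS = ("duration", "content", "timestamp")


def is_cc_document(obj: Any) -> bool:
    """Return True if obj has a "cc" list whose items all carry the three keys."""
    if not isinstance(obj, dict) or not isinstance(obj.get("cc"), list):
        return False
    return all(
        isinstance(item, dict) and all(key in item for key in REQUIRED_KEYS)
        for item in obj["cc"]
    )
```

A check like this can run on the result of json.load before any rows are written, so a malformed file fails early instead of halfway through the output.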

Since I couldn’t use his actual data here, I scoured Google for another example of closed-captioning JSON data. That proved elusive, so I converted the sample data from the W3C’s WebVTT introduction into JSON. It appears to be the beginning of an audio interview between Roger Bingham and Neil deGrasse Tyson.

Here’s the JSON data:

This will be pretty easy to read and flatten into TSV.
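Flattening is mostly a matter of emitting each item's fields in a fixed order, joined by tabs. The following sketch assumes the column order duration, content, timestamp and introduces a hypothetical helper, to_tsv_line. Because plain TSV has no quoting mechanism (the IANA text/tab-separated-values registration does not allow tabs inside fields), it replaces any tab or line break in a value with a single space.

```python
from typing import Mapping

# Assumed column order, following the order the fields are listed above.
FIELDS = ("duration", "content", "timestamp")


def to_tsv_line(item: Mapping[str, object]) -> str:
    """Flatten one caption item into a single tab-separated line."""
    cells = []
    for field in FIELDS:
        text = str(item[field])
        # Plain TSV cannot quote, so a tab or line break inside a caption
        # would corrupt the row; replace each one with a single space.
        for ch in ("\t", "\r\n", "\n", "\r"):
            text = text.replace(ch, " ")
        cells.append(text)
    return "\t".join(cells)


print(to_tsv_line({"duration": 1.5, "content": "Hello\nworld", "timestamp": "00:00:01.000"}))
```

Replacing characters is lossy; if the exact caption text must survive the round trip, a quoting writer such as Python's csv module with a tab delimiter is the alternative.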

The Code

Normally, you might write one class that knows how to load and read this specific JSON format, plus another class (or some other code) to write the TSV file. However, because I’m writing this in map/reduce style and streaming the data to STDOUT, I only need the mapper (CcJsonMapper), not a TsvWriter class or any other writing code. The mapper maps the JSON file to TSV on STDOUT. Here’s how it will be used:

In other words, I’m running the cc_json_mapper.py Python script, passing it the filename of the JSON file, and redirecting its output to a file.
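The full cc_json_mapper.py is in the repository listed under Full Source Code. For a rough idea of the shape such a mapper can take, here is a minimal sketch, assuming the same three keys under “cc” and a duration, content, timestamp column order; the rows and map method names are hypothetical. It uses the standard library csv module with a tab delimiter, so a caption containing a tab is quoted rather than altered.

```python
from __future__ import annotations

import csv
import json
import sys
from typing import IO, Iterator

# Assumed key names and column order; adjust to match the real data.
FIELDS = ("duration", "content", "timestamp")


class CcJsonMapper:
    """Maps a closed-captioning JSON file to TSV rows on a text stream."""

    def __init__(self, filename: str) -> None:
        self.filename = filename

    def rows(self) -> Iterator[list[str]]:
        # json.load parses the whole document up front; rows are then
        # yielded one at a time instead of being collected into a list.
        with open(self.filename, encoding="utf-8") as f:
            data = json.load(f)
        for item in data["cc"]:
            yield [str(item[field]) for field in FIELDS]

    def map(self, out: IO[str]) -> None:
        # A tab-delimited csv.writer quotes fields containing a tab, so an
        # embedded tab cannot split a caption into extra columns.
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for row in self.rows():
            writer.writerow(row)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python cc_json_mapper.py INPUT.json > OUTPUT.tsv", file=sys.stderr)
    else:
        CcJsonMapper(sys.argv[1]).map(sys.stdout)
```

Note that json.load reads the whole input into memory; the streaming in this sketch is on the output side, where each row goes to STDOUT as soon as it is produced and the shell redirect decides where it lands.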

Here’s the source code for cc_json_mapper.py:

Full Source Code

ejstembler/py-json-to-tsv
