Loading jsonline files

Working on data with python, you need to load various kinds of files onto python processes. One popular data format in machine learning is jsonline in which each record is saved as json line by line. A simple example looks like following,

{"Country": "Japan", "Capital": "Tokyo"}
{"Country": "France", "Capital": "Paris"}

.

If you want to load a file in this format, the most straightforward way wiht python would be read this file line by line, and load each line as json object like following,

import json 
with open("./data.jsonl", "r") as f:
    data = [json.loads(line) for line in f.readlines()]

and this way is absolutely fine, not complicated at all.

But when I was doing this over and over in projects, it’s tiring. So I did what most of the programmers would do, making a package for make this easeir, and give it a cool name. This is why I wrote sienna.

sienna doesn’t have any extra dependencies (of course) so can be installed easily by pip (or my favorite package manager Poetry) as following,

> pip install sienna
% or
> poetry add sienna

.

And after this, you can load jsonline files like,

import sienna
data = sienna.load("./data.jsonl")

10x nicer, no? It just saves one little line but it comes in handy when I want to load many files.

And you can easily save things too.

import sienna
mega_cool_data = [
  {"Country": "Japan", "Capital": "Tokyo"},
  {"Country": "France", "Capital": "Paris"},
]
sienna.save(mega_cool_data, "./data.jsonl")

Loading other file formats

Initially, I started this project to load only jsonline files but since I load json and text files quite often, I adapted sienna to them as well.

Given a path to a file you want to load, sienna automatically recognizes the file type and load it accordingly.

import sienna
lines = sienna.load("./text_file.txt")  # As line separate strings
obj = sienna.load("./obj.json")  # As usual json file
objs = sienna.load("./objts.jsonl")  # As jsonline file

And of course, you can save python objects in those two formats.

import sienna
sienna.save(["first line.", "second line."], "./text_file.txt")
sienna.save({"my": "data"}, "./data.json")
sienna.save([{"id": 1}, {"id": 2}], "./datas.jsonl")

Under the hood, sienna just checks its extension and match to super simple if blocks (ref). If it’s neither .json nor .jsonl, sienna loads it as line separated text file.

Conclusion

It’s always fun to make mini packages with cool names. If you have ideas to improve sienna, write an issue, or even a PR.

Happy coding.