Imagine creating your own application or game. When you first start out, you may keep and generate your data in code. As the project grows, you will probably notice how inefficient and cumbersome this becomes, and you may consider moving all of that data to external files.
How do you store the growing amount of application data? What formats do you use to store them?
There are endless methods to store and retrieve data for your project. Before reinventing the wheel by creating your own parser and file format (which is an enormous effort), you should consider which requirements your format needs to fulfill.
For example: Does my data need to be…
- …flexible and extendable?
- …compatible with different versions?
- …easily editable?
- …easily maintainable?
- …as compact as possible?
In Java, there are multiple ways to store data for your projects. I want to explore some of the technologies that exist for this. A summary can be found below.
|Format||Syntax||Legible||Complexity of saved Data||Extendability||Libraries||Potential Uses|
|CSV||Simple||Yes||Plain||Changes must be applied manually||none|| |
|.ini Format||Simple||Yes||Nested objects with depth 1||Changes must be applied manually||ini4j|| |
|XML||Bloated||Yes||Nested objects||Changes in classes can be ignored (data won’t become incompatible)||JDOM|| |
|JSON||Clear||Yes||Nested objects||Changes in classes can be ignored (data won’t become incompatible)||json-simple|| |
|Binary||None||No||Nested objects||JDK: changes in classes may cause saved data to become unusable. Kryo: changes in classes can be logged, preserving compatibility|| || |
Comma Separated Values. This is probably the most basic format you can use (it’s also one of the oldest, having been around before personal computers even existed). CSV files are easy to write programmatically, and reading them back needs little more than splitting each line at the separator. If you need to keep things as simple as possible (e.g. you only need to store a lot of Strings), you’re good with CSV. It has the advantage of being editable with programs like Excel or OpenOffice, which can be very useful tools. However, the second your data uses a tree-like structure (objects in objects), you get dangerously close to creating your own parser and end up reinventing the wheel. Don’t go there, ever!
CSV is extremely handy for tasks like storing configs or as an export format (I’ve seen it being used for translation files), but other than that… try to stay away from it.
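To give an idea of how little code the simple case needs, here is a minimal sketch using only the JDK. The class and method names (CsvSketch, toLine, parseLine) are made up for illustration, and the sketch assumes that no field contains a comma, a quote or a line break. Once that assumption no longer holds, a proper CSV library is the safer choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
class CsvSketch {
    // Joins one record into a CSV line.
    // Assumption: no field contains a comma, quote or line break.
    static String toLine(List<String> fields) {
        return String.join(",", fields);
    }

    // Splits a CSV line back into its fields.
    // The limit of -1 keeps trailing empty fields instead of dropping them.
    static List<String> parseLine(String line) {
        return new ArrayList<>(Arrays.asList(line.split(",", -1)));
    }

    public static void main(String[] args) {
        // A translation entry: key, English text, German text.
        String line = toLine(Arrays.asList("greeting", "Hello", "Hallo"));
        System.out.println(line);
        System.out.println(parseLine(line));
    }
}
```

The moment a value may itself contain the separator, this approach needs quoting and escaping rules, which is exactly where the home-grown parser trap begins.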
Windows .ini file
Before starting, I want to point out that this is the only file format I haven’t explored in much detail, but I want to mention it regardless.
The .ini file format is a rather ancient remnant of the past, having been in use since early versions of Windows, well before Windows XP. It stands out due to its simplicity. Because .ini files allow you to pack your name-value pairs into sections or groups, they can be very handy even to this day. Another plus: you can read and edit them with any text editor. These kinds of files still won’t allow you to store nested objects, but they are easier to use than CSV. They make great configuration files, as long as you keep the data you want to store simple. Keep in mind that the format is likely unsuited to storing larger amounts of data. You’re welcome to try it out and get the ini4j library implementation for Java.
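To show what "name-value pairs in sections" looks like in practice, here is a small hand-rolled sketch that reads such a file into nested maps. It is for illustration only: the class name IniSketch and the sample keys are invented, only the basic syntax is handled ([section] headers, name=value pairs, and comment lines starting with ; or #), and in a real project a library such as ini4j would take over this job.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class IniSketch {
    // Parses "[section]" headers and "name=value" pairs into section -> (name -> value).
    static Map<String, Map<String, String>> parse(String text) {
        Map<String, Map<String, String>> sections = new LinkedHashMap<>();
        String current = ""; // pairs before the first header land in an unnamed section
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith(";") || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                current = line.substring(1, line.length() - 1).trim();
                sections.computeIfAbsent(current, k -> new LinkedHashMap<>());
                continue;
            }
            int eq = line.indexOf('=');
            if (eq > 0) {
                sections.computeIfAbsent(current, k -> new LinkedHashMap<>())
                        .put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        String ini = "[window]\nwidth=800\nheight=600\n\n[audio]\n; volume in percent\nvolume=70\n";
        Map<String, Map<String, String>> config = parse(ini);
        System.out.println(config.get("window").get("width"));
        System.out.println(config.get("audio"));
    }
}
```

The result is exactly one level of nesting (section, then name), which matches the "depth 1" limit of the format.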
Because XML elements can contain other elements, it is a format that supports storing objects inside other objects. Its syntax is rather bloated, which results in a rather large file size. XML is readable and easily editable, and that is why it is commonly used in a variety of applications, both for storing configs and for more complex data in games and other software. It is supported by many frameworks. For this reason, it is also used to exchange files between different programs.
When I dove into XML, I was using a custom JDOM serializer. That pretty much ended up being a nightmare because I reinvented the wheel. To avoid making the same mistake, you can use a serializer such as JAXB or XMLEncoder to turn your data into XML files with greater ease. XMLEncoder is part of the JDK; JAXB shipped with the JDK up to Java 10 and has been a separate dependency since Java 11.
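As a rough sketch of the XMLEncoder route, the following example writes a nested structure (a list inside a map) to XML and reads it back with XMLDecoder. The class name XmlEncoderSketch and the sample data are made up; plain collections are used here instead of a custom bean so that the example stays short. For your own classes, XMLEncoder expects public JavaBeans with a no-argument constructor and getters and setters.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class XmlEncoderSketch {
    // Serializes an object graph to XML held in memory.
    static byte[] toXml(Object data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(data);
        }
        return out.toByteArray();
    }

    // Reads the first object back from the XML.
    static Object fromXml(byte[] xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        // Invented sample data: a map that contains a nested list.
        Map<String, Object> player = new HashMap<>();
        player.put("name", "Hero");
        player.put("level", 3);
        List<String> inventory = new ArrayList<>();
        inventory.add("sword");
        inventory.add("potion");
        player.put("inventory", inventory);

        byte[] xml = toXml(player);
        System.out.println(new String(xml, StandardCharsets.UTF_8));
        System.out.println("Round trip equal: " + player.equals(fromXml(xml)));
    }
}
```

Printing the XML also makes the "bloated syntax" point visible: a handful of values already produces a good number of lines.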
Another method to store your data is to simply store it in binary, as 1’s and 0’s. The upside is that your data won’t be human-readable (well, mostly). The downside is that you can’t edit it by hand: every read and write has to go through your binary serializer in code.
When working with Java’s standard serialization, you will run into problems deserializing your objects from a file once you’ve changed the class. Unless the class declares a fixed serialVersionUID, changing, adding or removing a constructor, method or field can alter the ID that is computed for the class, and reading the old data then fails with an InvalidClassException. This can be disastrous if you are recklessly making changes to a class, only to find out you just made all of your previously saved binary data incompatible. Good luck redoing all of this!
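The following sketch shows a round trip with the JDK’s ObjectOutputStream and ObjectInputStream. The SaveGame class and its fields are invented for illustration. It declares an explicit serialVersionUID, which is the usual way to soften the problem described above: with a pinned ID, merely adding a field later is treated as a compatible change and the new field simply gets its default value when old data is read. Changing the type of an existing field stays incompatible either way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper class, for illustration only.
class SerializationSketch {
    // Invented example class.
    static class SaveGame implements Serializable {
        // Pinned ID: without it, the ID is derived from the class structure
        // and may change whenever the class is edited.
        private static final long serialVersionUID = 1L;

        final String playerName;
        final int level;

        SaveGame(String playerName, int level) {
            this.playerName = playerName;
            this.level = level;
        }
    }

    // Serializes the object into a byte array (a file stream works the same way).
    static byte[] save(SaveGame game) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(game);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserializes the object; an incompatible class would surface here as an InvalidClassException.
    static SaveGame load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SaveGame) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SaveGame loaded = load(save(new SaveGame("Hero", 3)));
        System.out.println(loaded.playerName + ", level " + loaded.level);
    }
}
```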
To avoid this, there are libraries like Kryo that allow you to 1) create customized serializers for each class and 2) add version control for each field you are storing. Point 2) may give you some control over compatibility, but you will be left with unused code fragments you can’t remove. If you do remove them, Kryo will kry that an old, deprecated field in your code is missing (excuse the pun). This gives you some control, but I’ve found it is still not as flexible as JSON.
However, there are some neat things you can do with binary files, for example controlled binary serialization (storing only the bits of data you need, using your own definition), creating your own compressed data formats, and much more. Binary formats are also commonly used to transmit data over a network.
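A minimal sketch of such controlled binary serialization, using the JDK’s DataOutputStream and DataInputStream, could look like this. The record layout (a version byte, a name, a level and a health value), the class name BinaryFormatSketch and the constant FORMAT_VERSION are all assumptions made up for this example. The leading version byte is one common way to keep the option of changing the layout later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class, for illustration only.
class BinaryFormatSketch {
    // Hypothetical version marker written as the first byte of every record.
    static final int FORMAT_VERSION = 1;

    // Writes exactly the fields we chose, in a fixed order.
    static byte[] write(String name, int level, float health) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(FORMAT_VERSION);
            out.writeUTF(name);     // 2-byte length prefix + encoded characters
            out.writeInt(level);    // 4 bytes
            out.writeFloat(health); // 4 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the fields back in the same order and rejects unknown versions.
    static String describe(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readUnsignedByte();
            if (version != FORMAT_VERSION) {
                throw new IllegalStateException("Unsupported format version " + version);
            }
            String name = in.readUTF();
            int level = in.readInt();
            float health = in.readFloat();
            return name + " (level " + level + ", health " + health + ")";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] record = write("Hero", 3, 87.5f);
        System.out.println(record.length + " bytes: " + describe(record));
    }
}
```

The price of this compactness is that the reader and writer must agree on the field order by hand, since nothing in the file describes its own structure.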
Of the formats I have used for saving large amounts of data, the most flexible one that is also easy to use and maintain is without doubt JSON, followed by controlled binary serialization (Kryo). With a decent serializer, XML can also be very powerful despite its large file size and bloated syntax. CSV and .ini are rather simple formats and very handy for simple config files that don’t change frequently during development. Which of these formats you eventually use, and for what purpose, is up to you!