LZO Data Compression

While fooling around with the zram Linux kernel module, I learned about a fast compression/decompression algorithm called LZO. It does not consume many resources and, in benchmarks against other popular algorithms, still provides decent compression ratios.

Looking at this, it is something that would be useful for our product in the future. Since we store a lot of files in the database, compressing them before storage would probably help performance: it would shrink each file's footprint and reduce the database's disk activity.

However, this is something for the future. It is a feature that can be added easily, as all our file system activity already runs through our own custom fstream-derived classes. At the moment, all we do there is take a SHA1 hash of every file read/write in order to ensure data integrity; we could easily slot compression/decompression into the same flow.

We realised that we had to check the integrity of the files, as there is no other way to know whether the data has been corrupted. There was an odd bug we faced earlier involving std::vector::reserve(), a trivial one caused by our misunderstanding of how that function works, but it was only detected through the hash checks.

Another reason for using the hash is that it gives us the opportunity to implement better data deduplication and caching in the future. Two files with the same content should share the same storage, which reduces disk usage and improves caching.

But all that aside, we will likely implement LZO compression of all files stored in the database as a future improvement to the architecture. For the moment, we still need to focus on functionality first.
