Last year, a previous intern had developed a system that helps to manage various events or performances in the competition. It’s called the competition management system (CMS for short). At that time, registration was done through Google Forms, and once the registration deadline has passed, all the participant registration details were collated to one spreadsheet that was fed as input to that system. Forward to this year, my system is expected to generate that spreadsheet when invoked.

There are two types of files where you can collate the data and get a spreadsheet out of it, i.e comma separated values (CSV) and tab separated values (TSV) file. The common thing that both file types share is that they separate the data into columns, or fields as they called it, with a certain delimiter. The obvious difference is that they both use a different delimiter, CSV is with commas while TSV is with tabs. So for example, we could see this in a CSV file:

data1,data2,data3

And this in a TSV file:

data1    data2    data3

If you open either files with office programs such as Microsoft Office Excel, LibreOffice Calc or even Google Sheets, you would see data1 in its own field, data2 in its own field and so forth. The CMS is only able to import a TSV file into its system, so my system simply just have to generate that TSV file. But of course, there were some initial complications in the effort to do that.

Exporting the TSV file

I was given a sample of last year’s spreadsheet by Nadia, so that I could see how to format the TSV file to be a valid one for the CMS to accept it. I was able to generate the TSV file with no big issues, except for the fact that the CMS system was rejecting the TSV file by responding with HTTP status code of 400 Bad Request when POST-ing the file to the CMS’s TSV Importer REST API endpoint. After a few more fruitless attempts, Dr Shawn decided to stepped in and help pinpoint my problems. He used the meld tool that highlights the differences between two files, which in this case is my generated TSV file and the sample TSV file. From there, he could already see that I was not faithfully following the sample, such as if the sample leaves N amounts of rows blank, then my TSV file should follow that same amount of blank rows as well. When looking at the source code of CMS, I thought it would not matter how many blank rows there is. Anyhow, Dr Shawn soon discovered that the biggest difference was that the sample uses a different line ending than my generated TSV file. In Unix systems, line endings are represented with ‘\n’ which are known as line feeds, however MS-DOS systems, now superseded by Microsoft Windows, uses ‘\r\n’ as its line endings. ‘\r’ means that it is a carriage return, you can think of it as returning a text caret to the beginning of the line, while the line feed ‘\n’ brings that caret to the next line.

Now that I know what is needed to be done, it was up to me to amend the generation of the TSV file. Dr Shawn also advised me to look up if there is a standard format for TSV files, in which I did. Although the document specifies a common format for CSV files, it is also applicable to TSV files as well. Two things to note is that it is common that the ‘\r\n’ is used for line endings and each line in the file should have the same number of fields. So apparently the CMS TSV Importer was expecting for the given file to follow this format. Have I knew this earlier, I could save myself some trouble. Moral of the story is to always look up for standards if there is any and follow it faithfully.

Categories: Experiential

0 Comments

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.