Syncing data from CRS and CMS, more visualizations

This week, I spent some time working on CRS to implement a new REST API endpoint that exports data as TSV. CAS needs this because the existing export function in CRS does not include any information about registrants. Next, I implemented a new class in CAS, CRSTSVParser, to parse the string content of the imported TSV. Since I had already implemented a similar parsing class, CMSTSVParser, I decided to have both classes inherit from an abstract base class, TSVParser, which holds the commonly used string-processing functions.
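To illustrate the class layout, here is a minimal sketch of how the shared base class could look. The method names (parse, split, splitRows) and the row representation are assumptions made for this example; the actual classes in CAS may be structured differently.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: the real TSVParser in CAS may look quite different.
class TSVParser {
public:
    virtual ~TSVParser() = default;

    // Each concrete parser decides how to interpret the TSV content.
    virtual std::vector<std::vector<std::string>> parse(const std::string& content) const = 0;

protected:
    // Split `text` on `delim`, keeping empty fields.
    static std::vector<std::string> split(const std::string& text, char delim) {
        std::vector<std::string> parts;
        std::string::size_type start = 0;
        while (true) {
            const auto pos = text.find(delim, start);
            if (pos == std::string::npos) {
                parts.push_back(text.substr(start));
                break;
            }
            parts.push_back(text.substr(start, pos - start));
            start = pos + 1;
        }
        return parts;
    }

    // Split content into rows of fields, stripping '\r' and skipping blank lines.
    static std::vector<std::vector<std::string>> splitRows(const std::string& content) {
        std::vector<std::vector<std::string>> rows;
        for (std::string line : split(content, '\n')) {
            if (!line.empty() && line.back() == '\r') line.pop_back();
            if (line.empty()) continue;
            rows.push_back(split(line, '\t'));
        }
        return rows;
    }
};

class CRSTSVParser : public TSVParser {
public:
    std::vector<std::vector<std::string>> parse(const std::string& content) const override {
        return splitRows(content);  // CRS-specific column handling would go here
    }
};

class CMSTSVParser : public TSVParser {
public:
    std::vector<std::vector<std::string>> parse(const std::string& content) const override {
        return splitRows(content);  // CMS-specific column handling would go here
    }
};
```

Keeping the string helpers in the base class means each subclass only has to describe what is specific to its own TSV layout.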

Then, I implemented another REST API endpoint in CAS, ImportCRSResource, to handle the import procedure. To store registrant information from CRS, I had to add two tables to CAS’s database, registrants and entries. The entries table has a foreign key to the registrants table. It is needed mainly to store the uuid from CRS, which is crucial for synchronizing with information from CMS. The uuid field is a generated 128-bit number that uniquely identifies an entries record in CRS. The same value is stored as guid in CMS’s performances table, where it uniquely identifies a performances record. Synchronization is achieved by matching the guid from CMS with the uuid from CRS.
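The matching step can be sketched roughly as follows. The struct fields other than uuid and guid (registrantId, score) and the function name matchByUuid are made up for illustration, and in CAS itself the matching may well be done with a database join rather than in memory.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified rows; the real tables have more columns.
struct Entry {            // a row imported from CRS
    std::string uuid;
    int registrantId;     // foreign key into registrants
};

struct Performance {      // a row imported from CMS
    std::string guid;
    double score;         // assumed field, for illustration only
};

// Pair every CMS performance with the CRS entry whose uuid equals its guid.
// Performances without a matching entry are left out (not yet synchronized).
std::vector<std::pair<Entry, Performance>> matchByUuid(
        const std::vector<Entry>& entries,
        const std::vector<Performance>& performances) {
    // Index the entries by uuid so each lookup is O(1) on average.
    std::unordered_map<std::string, const Entry*> byUuid;
    for (const Entry& e : entries) byUuid[e.uuid] = &e;

    std::vector<std::pair<Entry, Performance>> matched;
    for (const Performance& p : performances) {
        const auto it = byUuid.find(p.guid);
        if (it != byUuid.end()) matched.emplace_back(*it->second, p);
    }
    return matched;
}
```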

The import procedures carried out in ImportCRSResource and ImportCMSResource also compute statistics required for later use. Since CAS has two sources of input, it is likely that, most of the time, one source (say, CRS) is more up to date than the other. Therefore, some of the statistics stored in CAS (such as percentileByYearCategorySchool) are computed using only records that have already been synchronized across both sources.
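As a rough illustration of restricting a statistic to synchronized records, the sketch below computes a percentile while ignoring anything not yet matched in both sources. The ScoredRecord struct, the synced flag, and the percentile definition (share of records scoring strictly below a given score) are all assumptions for this example, not the actual definition of percentileByYearCategorySchool.

```cpp
#include <vector>

// Hypothetical record shape, for illustration only.
struct ScoredRecord {
    double score;
    bool synced;   // true once the record has been matched in both CRS and CMS
};

// Percentage of synchronized records scoring strictly below `score`.
// Unsynchronized records are ignored; returns 0.0 if none are synchronized.
double percentileAmongSynced(const std::vector<ScoredRecord>& records, double score) {
    int total = 0;
    int below = 0;
    for (const ScoredRecord& r : records) {
        if (!r.synced) continue;   // skip records seen in only one source
        ++total;
        if (r.score < score) ++below;
    }
    return total == 0 ? 0.0 : 100.0 * below / total;
}
```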

Last week, Dr Shawn spotted an issue in the CAS database: some NRIC values in the players table were missing their leading zeros. He suspected that this was caused by a cast from std::string to int during some data processing step in CRS, CMS or CAS. I spent some time investigating by tracing the code line by line in CMS and CRS, but I couldn’t find the source of the error. During a discussion on Wednesday, Dr Shawn found that the problem was not in our code at all; it occurred when the TSV was imported into Google Sheets during the data transfer step. Google Sheets interpreted the NRIC field as numeric (rather than as a string) and automatically removed the leading zeros. Fortunately, this is easy to avoid in the future by selecting the right settings when importing into Google Sheets.
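The symptom is the same one you would get from round-tripping an identifier through an integer type, which is why a cast was the first suspect. A small sketch of that effect, using a made-up identifier value and a hypothetical helper name:

```cpp
#include <string>

// Round-trip an identifier through an integer, as a numeric column would.
// Any leading zeros are lost because the number itself does not carry them.
std::string roundTripThroughInteger(const std::string& id) {
    return std::to_string(std::stoll(id));
}
```

Passing "012345" through this function gives back "12345", so identifiers like these are safer kept as strings from end to end.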

On Friday, I started working on the JavaScript code that will produce visualizations in the client browser. So far I have used bar charts with error bars to present the distribution of prizes (gold, silver, bronze) across events, and grouped bar charts to compare the precision and recall scores of judges from the same year and venue. Next week, I will continue working on the visualizations, and leave the final week for documenting my code and writing wikis on GitHub.

Published by Wen Yan on 2018-08-21
