Visualizations, Refactoring, and More

Last week, I spent a lot of time exploring options for writing the REST API that returns a JSON string to the browser. I eventually came up with a template class (derived from Wt::WResource) to handle ‘queries’ to each table in the database, and used Poco::JSON::Object to set the key-value pairs of the JSON string. Thankfully, it took me just a few hours to implement this properly.

My next task was to create some visualizations for the statistics I’ve calculated so far. To do this, I first searched for the most commonly used JavaScript charting libraries. There are many blogs/tutorials online that describe the pros and cons of these libraries (this seems to be a very good one). I ended up going with plotly.js, which is relatively easy to use and provides a huge variety of visualizations. The only downside is that the library is not lightweight and might be overkill for this project.

I used a radar chart from plotly.js to illustrate the percentile scores of a performance in 6 different scopes (same category and year, same category and instrument, etc.). I thought it looked great and really cool, but Dr Shawn later pointed out to me that a radar chart is much more useful for comparing the statistics of two or more performances. Also, radar charts might be less well understood by the general public, as opposed to a simple bar chart, which fulfills the same purpose in this case. Dr Shawn also reminded me that, when calculating statistics from datasets, I need to put myself in the audience’s shoes and imagine what kind of information would be most practical and impactful for their lives.

Besides working on visualizations, I have also spent some time implementing more statistical calculations in the backend. This week, I completed the estimates of the proportions of gold, silver and bronze prize winners in different events. These estimates are based on the assumption that the numbers of gold, silver and bronze winners follow a multinomial distribution. From the estimated proportions, I have also calculated their 95% confidence intervals using the Wilson score method, which is an improvement over the commonly used Wald interval, especially when the sample size is small.
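To make the interval calculation concrete, here is a minimal sketch in standard C++. It is not the project's actual code: the names Interval, wilsonInterval and waldInterval are made up for illustration, and it assumes that each prize proportion is treated on its own as a binomial proportion (k winners out of n performances), with z = 1.96 for roughly 95% confidence.

```cpp
#include <cmath>

// Lower and upper bounds of a confidence interval for a proportion.
struct Interval {
    double lower;
    double upper;
};

// Wilson score interval for `successes` out of `n` trials.
// z is the standard normal quantile; 1.96 gives roughly 95% confidence.
// (Hypothetical helper, not part of any library.)
Interval wilsonInterval(int successes, int n, double z = 1.96) {
    if (n <= 0) {
        return {0.0, 1.0};  // no data: the proportion is unconstrained
    }
    const double p = static_cast<double>(successes) / n;
    const double z2 = z * z;
    const double denom = 1.0 + z2 / n;
    // The centre is pulled from p towards 0.5, more strongly for small n.
    const double centre = (p + z2 / (2.0 * n)) / denom;
    const double halfWidth =
        z * std::sqrt(p * (1.0 - p) / n + z2 / (4.0 * n * n)) / denom;
    return {centre - halfWidth, centre + halfWidth};
}

// Wald interval, shown only for comparison.
Interval waldInterval(int successes, int n, double z = 1.96) {
    if (n <= 0) {
        return {0.0, 1.0};
    }
    const double p = static_cast<double>(successes) / n;
    const double halfWidth = z * std::sqrt(p * (1.0 - p) / n);
    return {p - halfWidth, p + halfWidth};
}
```

For a made-up event with 3 gold winners out of 10 performances, wilsonInterval(3, 10) gives roughly 0.108 to 0.603. With 0 winners out of 10, the Wald interval collapses to zero width, while the Wilson interval still has a positive upper bound, which is one reason it behaves better for small samples.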

Another set of statistics that I have implemented this week is the precision and recall values of judge scores. Precision and recall are commonly used as performance measures for classification models in machine learning/data mining. In the context of this project, however, we may treat each judge as an individual ‘classifier’ who is trying to ‘classify’ whether a specific performance is of gold, silver or bronze caliber. I will give an example of why I think precision and recall values are highly relevant:

For judge A, the precision for the gold prize (%) can be interpreted as, “Out of all instances where judge A initially gave a gold caliber score to a performance, how many of those performances ended up being awarded a gold prize?”
For judge A, the recall for the gold prize (%) would then be interpreted as, “Out of all instances where a performance judged by judge A ended up being awarded a gold prize, how many times did the judge initially give a gold caliber score?”

A judge with high recall for gold will rarely give an underwhelming score to a performance that deserves a gold prize. When a judge with high precision for gold gives a gold caliber score, he/she is very likely making the right call. The same interpretation applies to the precision and recall values for the silver and bronze prizes.

Finally, I have also spent some time refactoring some of my Wt::Dbo code this week. Instead of having the persist() method in the Table class, I have introduced a nested class Record within each Table to hide the implementation of the persist() method. I find this much more intuitive than my previous design, as Table classes should handle table-level operations (like updateColumns(), map(), checkDuplicates()), while Record classes should handle record-level operations (like writeToJSON()). In my previous design, table-level and record-level operations were stored within the same class, which was not intuitive at all.
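The split between table-level and record-level operations might look roughly like the sketch below. It uses only the standard library and is not the real project code: PerformanceTable, its fields and the add() method are hypothetical, the duplicate rule (same performer name) is an assumption, and persist() is left out because it depends on Wt::Dbo.

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical table of performances; names and fields are illustrative only.
class PerformanceTable {
public:
    // Record-level operations live in the nested class.
    class Record {
    public:
        Record(std::string performer, double score)
            : performer_(std::move(performer)), score_(score) {}

        // Serialise this single record as a JSON object.
        // (No string escaping, to keep the sketch short.)
        std::string writeToJSON() const {
            std::ostringstream out;
            out << "{\"performer\":\"" << performer_
                << "\",\"score\":" << score_ << "}";
            return out.str();
        }

        const std::string& performer() const { return performer_; }

    private:
        std::string performer_;
        double score_;
    };

    // Table-level operation: append a record to the table.
    void add(Record record) { records_.push_back(std::move(record)); }

    // Table-level operation: true if any performer appears more than once.
    // (Assumed duplicate rule, chosen only for this example.)
    bool checkDuplicates() const {
        std::set<std::string> seen;
        for (const auto& record : records_) {
            if (!seen.insert(record.performer()).second) {
                return true;
            }
        }
        return false;
    }

    std::size_t size() const { return records_.size(); }

private:
    std::vector<Record> records_;
};
```

With this layout, code that works on one row only needs PerformanceTable::Record, while anything that has to look across rows goes through the enclosing table class.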

Published by Wen Yan on 2018-08-09
