Warm welcome/Git Workflow
This is the first week of my 10-week internship at AESTE. On my first day, Dr Shawn showed me around the office and taught me how to access the main accounts that I will be using throughout my work at AESTE. Like most other interns, my first task is to learn about Git, a version control system used to track changes in files and coordinate work between different members of a project. I have already been using Git for my own assignments and group projects for some time, but this is my first time being explicitly introduced to the Git workflow. This is the workflow used at AESTE in all projects, so I spent several hours practicing some common Git workflow operations in the sandbox repository. Dr Shawn also shared with me a blog post which describes the underlying structure of Git. I found this a really interesting read. It turns out that Git can be understood as a purely functional data structure. Visually, the Git commit history is a directed acyclic graph, where each node represents a Git commit, and every branch of a repository points to a specific node in the graph.
Web development – Witty
Since I would be building a web application, I started learning about Witty (Wt), a C++ GUI library for web development. The beauty of Witty is that it allows the programmer to focus on the functionality of the UI elements, which Witty implements as widgets. I spent some time experimenting with a hello-world web application implemented in Witty to get familiar with the Witty widgets.
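To illustrate the widget-centric style, here is a minimal sketch of a hello-world application, assuming Wt 4's API (Wt 3 differs, e.g. headers without `.h` and raw-pointer widget ownership); the class name and labels are my own:

```cpp
#include <memory>

#include <Wt/WApplication.h>
#include <Wt/WContainerWidget.h>
#include <Wt/WLineEdit.h>
#include <Wt/WPushButton.h>
#include <Wt/WText.h>

// The programmer wires widget events to C++ functions; Wt renders the
// widgets in the browser and routes events back to the server.
class HelloApp : public Wt::WApplication {
public:
    explicit HelloApp(const Wt::WEnvironment& env) : Wt::WApplication(env) {
        setTitle("Hello");
        auto* name = root()->addWidget(std::make_unique<Wt::WLineEdit>());
        auto* button = root()->addWidget(std::make_unique<Wt::WPushButton>("Greet"));
        auto* greeting = root()->addWidget(std::make_unique<Wt::WText>());
        // Clicking the button runs this C++ lambda; no JavaScript is written.
        button->clicked().connect([=] {
            greeting->setText("Hello, " + name->text());
        });
    }
};

int main(int argc, char** argv) {
    return Wt::WRun(argc, argv, [](const Wt::WEnvironment& env) {
        return std::make_unique<HelloApp>(env);
    });
}
```

`Wt::WRun` starts the built-in HTTP server and constructs one application instance per browser session.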
I also spent some time learning about Witty Database Objects. Wt::Dbo is an object-relational mapping layer that maps C++ objects to tables in a database. Not only that, it allows the programmer to perform SQL operations on the database without writing SQL statements by hand, using the querying functions provided by the Witty Database Objects classes.
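The mapping works by giving each class a `persist()` template. A minimal sketch, assuming Wt 4's Dbo API; the `Registration` class, its columns, and the example year are hypothetical, made up for illustration:

```cpp
#include <string>

#include <Wt/Dbo/Dbo.h>

// Hypothetical class for illustration: one row per competition registration.
class Registration {
public:
    std::string participantName;
    int year = 0;
    double score = 0.0;

    // persist() declares how each member maps to a table column;
    // Dbo uses it for both reading and writing rows.
    template<class Action>
    void persist(Action& a) {
        Wt::Dbo::field(a, participantName, "participant_name");
        Wt::Dbo::field(a, year, "year");
        Wt::Dbo::field(a, score, "score");
    }
};

// Typical usage, given a configured Wt::Dbo::Session named 'session':
//   session.mapClass<Registration>("registration");
//   Wt::Dbo::Transaction t{session};
//   auto rows = session.find<Registration>()
//                      .where("year = ?").bind(2024)
//                      .orderBy("score desc")
//                      .resultList();
```

The `where`/`bind`/`orderBy` chain is how Dbo lets you express queries without writing full SQL statements yourself.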
The web application that I will be building uses a REST API to communicate with a server. Therefore, I need to learn how to use the Wt::WResource class in Witty. I have spent some time reading the REST API written by previous interns at AESTE. However, right now I am still not able to fully understand their code. I will need to spend more time learning about this in the upcoming week.
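From what I have read so far, a REST endpoint is a subclass of Wt::WResource that overrides `handleRequest()`. A minimal sketch, assuming Wt 4; the resource name and JSON payload are placeholders of my own:

```cpp
#include <Wt/WResource.h>
#include <Wt/Http/Request.h>
#include <Wt/Http/Response.h>

// Hypothetical endpoint: every HTTP request to the resource's URL
// is routed to handleRequest().
class ScoresResource : public Wt::WResource {
public:
    ~ScoresResource() override {
        beingDeleted();  // Wt requires this before a resource is destroyed
    }

    void handleRequest(const Wt::Http::Request& request,
                       Wt::Http::Response& response) override {
        response.setMimeType("application/json");
        response.out() << "{\"scores\": []}";  // placeholder payload
    }
};

// Typical deployment, given a Wt::WServer named 'server':
//   ScoresResource scores;
//   server.addResource(&scores, "/api/scores");
```

Unlike widgets, a WResource lives outside any session, which is what makes it suitable for a stateless REST API.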
On my third day, Dr Shawn gave me an overview of the project that I will be working on for this internship. Basically, I need to build a system which can perform statistical analyses on competition data, and provide useful insights that will benefit three main parties involved in the competition, namely the participants, registrants and organizers. For instance,
- A participant wishes to know how well he/she has performed relative to all other participants, including competitors from other states.
- The organizer wishes to have a rough estimate of the number of participants that will register for the competition in the upcoming year, in order to allocate resources in a more informed manner.
- The organizer wishes to know whether the participants are evaluated fairly, that is, whether any judges are giving biased scores to participants.
After spending some time thinking about this, I realized that the analyses relevant to this project can be categorized into three types:
- Standardization of scores and calculation of percentiles – judges' scores need to be standardized because each judge typically has a personal preference for the ‘range’ of values he/she gives.
- Time series forecasting
- Analysis of interaction between variables
More googling and reading needs to be done in order to determine which algorithms/models are most appropriate within the context of this project. For now, some of the models that I can think of are ANOVA, linear models, and ARIMA.
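For the first type of analysis, one simple approach would be z-score standardization of each judge's scores, followed by a percentile rank. This is only an assumption on my part about the eventual method (the choice of model is still to be finalized), sketched in plain C++:

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// Standardize one judge's raw scores to z-scores, so that scores from
// judges with different personal 'ranges' become comparable.
// Assumes the scores are not all identical (non-zero standard deviation).
std::vector<double> standardize(const std::vector<double>& scores) {
    double mean = std::accumulate(scores.begin(), scores.end(), 0.0)
                  / scores.size();
    double sq = 0.0;
    for (double s : scores) sq += (s - mean) * (s - mean);
    double sd = std::sqrt(sq / scores.size());  // population std. deviation
    std::vector<double> z;
    z.reserve(scores.size());
    for (double s : scores) z.push_back((s - mean) / sd);
    return z;
}

// Percentile rank: the percentage of scores strictly below a given score.
double percentile(const std::vector<double>& scores, double score) {
    auto below = std::count_if(scores.begin(), scores.end(),
                               [&](double s) { return s < score; });
    return 100.0 * below / scores.size();
}
```

For example, a participant scoring 90 out of the scores {60, 70, 80, 90, 100} sits at the 60th percentile: three of the five scores fall below theirs.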
The competition data that will be used in this project must be extracted from the competition management and registration systems. This is certainly not the most glamorous part of the project, but it needs to be done using Witty Database Objects and a REST API framework. In the past, when I was given data analysis assignments in college, I was always handed a clean, well-structured dataset that was ready for analysis. Now, however, I am beginning to appreciate the process and effort required to obtain clean, well-structured data for data analysis.