This is the last week of my 10-week internship at AESTE. As planned, I spent the entire week writing up the GitHub wiki documentation for:
- CAS (Database schema, REST API, details on how to interpret results of analysis)
- CMS (REST endpoint for export)
- CRS (REST endpoint for export)
There was quite a lot that I needed to cover in my documentation for CAS. Fortunately, I was able to finish on time.
Since this is my final week, I guess it would be good for me to share some of my thoughts about working at AESTE. I am a third-year undergraduate student studying Computer Science and Mathematics at HKUST. My first exposure to programming came in my first year at university, when I was involved in developing C++ code for maneuvering a robotic car. Prior to that, I had no idea that programming was such an immersive activity (I entered university with the intention of pursuing a mechanical engineering degree). I picked up Mathematics as a second major because I wanted to pursue a data science career in the future, and I realized that my CS major alone could not provide me with sufficient exposure to more rigorous math & statistics topics.
Coming from a background with almost no programming experience before university, learning programming, computer science and theoretical math turned out to be quite a challenge for me. In certain semesters, I would perform well in computer science subjects but struggle more with the math subjects.
During my first internship, I worked on a project that was very ‘data science-ish’. I needed to develop a time series anomaly detection algorithm to identify erratic patterns in customer headcount data. I really enjoyed working on the project, and in the end I did manage to implement an algorithm (in Python) that worked. However, now that I think back, I guess that particular project was a little too much for my capabilities back then. Although my algorithm was functional, many aspects of how I handled the project were questionable. For instance, my code was not written in a maintainable manner, no thought was given to whether design patterns could be employed, and a version control system like Git was not used in a disciplined manner.
After my first internship, I realized that before delving deep into data science, I needed to first equip myself with a solid understanding of software engineering practices. I needed to first learn how to write good, maintainable code. This is one of the main reasons I decided to work at AESTE. Dr Shawn assigned me a statistics-oriented project that I needed to build from scratch. This time, I needed to think not only about an algorithm, but about the entire web application:
- I had to start using Git religiously.
- Datasets were not handed to me freely; I needed to extract them from the existing databases of other applications.
- Working in C++ forced me to think more about how to generalize my code using virtual functions & templates. In particular, I spent a lot of time refactoring the ORM sections of my C++ code, during which I experimented with different designs using inheritance & composition.
- I had to acknowledge the importance of using try-catch blocks and logging the caught errors in a manner that could be easily understood.
- I also needed to consider which statistical models were most suitable for the purpose of the product.
- I needed to think about how to present the results of my analysis in a manner that could be interpreted by non-experts.
Sure, the project was not as glamorous and exciting as the one I worked on during my first internship. But I think it is highly likely that a large portion of data science projects in real life are structured in a similar manner: 60 to 70 percent of the time is spent on data wrangling, generating visualizations, working with databases, understanding the problem, etc. Modeling is only a very small portion of the total work.
I am satisfied with the work I’ve produced in the past 10 weeks. The only small disappointment is that I wasn’t able to implement a user interface in time. Well, considering that I only had 10 weeks, and given the level of my programming skill, I guess I just have to accept it, because I gave it my best. I am still inexperienced and I have a lot more to learn.
I would like to thank Dr Shawn for regularly giving me feedback on my work, as well as for occasionally treating us to lunch and sharing his experiences with us. I am also really grateful to have been able to work with my colleagues. Thank you Marcus, Lucas, Ahed, Gabriel, Alya and Nadia for making my internship so much more enjoyable.