starting summer internship @ IRCC, building a chess neural network, wrapping up ML research @ UofT & more!
Bi-Monthly Update: March & April 2024
hey, I’m Dev, and if you’re new to my bi-monthly newsletter, welcome! My bi-monthly newsletter is where I recap what’s been going on in my life and share some thoughts and reflections from the last couple months. Allow me to quickly introduce myself: I’m currently a 2nd year Computer Science undergrad at the University of Toronto. Over the last few months, I’ve been busy wrapping up my sophomore year, doing research in the ML field, starting my summer internship, and building a couple of projects in the ML space. Here is a quick tl;dr of what I did over the last 2 months:
Wrapped up ML research with PointNet model
Built a neural network that can analyze the next best chess move
Deep dive into Transformers
Started my summer internship @ IRCC
Wrapped up my sophomore year @ UofT
Working with Aercoustics to develop ML models for them
Looking back at these past couple months, it’s difficult to wrap my head around how much I’ve grown in such a short period of time. But I’m grateful for the learning and growth that has come with each experience. If you’re interested in keeping up with what I do & what I learn, consider subscribing to this newsletter. Now let’s jump into the thick of this newsletter :)
wrapping up ML research w/ PointNet model
For the past 4 months, I’ve been conducting Machine Learning research at the University of Toronto. I’ve been working closely with a master’s student, Eman Faisal, to develop ML models for her final thesis. The overarching goal of this project was to perform age & sex classification on 3D point data of a human pelvis. For context, here is what the data would look like if we were to plot it onto an xyz-plane:
Similarly, this was done for the pelvis bone of the human body. The ML model that I worked with is called PointNet. It’s a computer vision model, but quite different from a Vision Transformer or a CNN. PointNet is designed to take unstructured 3D point clouds directly as input, which sets it apart from conventional approaches that often rely on 3D voxel grids or collections of 2D images. Using PointNet also allowed me to efficiently process and capture the finer details of 3-dimensional shapes.
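To make that concrete, here’s a minimal PyTorch sketch of the core PointNet idea: a shared per-point MLP followed by a symmetric max-pool, which is what lets the model consume an unordered set of points directly. This is an illustrative simplification, not the exact model from the project (the full PointNet also has T-Net alignment blocks that I’m leaving out here, and the class count is just a placeholder):

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Simplified PointNet-style classifier: shared per-point MLP + symmetric max-pool."""
    def __init__(self, num_classes=2):  # e.g., 2 classes for sex classification (illustrative)
        super().__init__()
        # 1x1 convolutions act as an MLP applied identically to every point
        self.features = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, num_classes),
        )

    def forward(self, points):          # points: (batch, 3, num_points)
        x = self.features(points)       # per-point features: (batch, 1024, num_points)
        x = torch.max(x, dim=2).values  # max-pool over points -> order-invariant global feature
        return self.classifier(x)       # class logits

logits = TinyPointNet(num_classes=2)(torch.randn(4, 3, 2048))  # 4 clouds of 2048 points each
```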
Over the course of these last 4 months, I worked on improving the accuracy of the age & sex classification. The initial issue was that the training accuracy itself was very poor, around ~30%, mainly because the model couldn’t pick up on the intricacies of the complex data. I did a lot of tinkering with the model itself, e.g., adding extra layers, more normalization, better data augmentation techniques, etc. Eventually, after many, many iterations, the accuracy of the model did increase and it was able to classify age and sex reliably. Overall, it was a great learning experience and I can’t wait to continue researching & building in the ML space 😁. For those of you who are interested in the model, click the button below to read the original PointNet paper!
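As an example of the kind of data augmentation I’m talking about, here’s a small sketch that randomly rotates and jitters a point cloud before it’s fed to the model. The specific transforms and magnitudes are illustrative defaults, not the exact settings I used in the project:

```python
import numpy as np

def augment_point_cloud(points, jitter_std=0.01, rng=None):
    """Apply a random rotation about the z-axis plus Gaussian jitter.

    points: (N, 3) array of xyz coordinates. The transforms and magnitudes
    here are illustrative, not the project's actual settings.
    """
    rng = rng or np.random.default_rng()
    theta = rng.uniform(0.0, 2.0 * np.pi)
    rotation = np.array([
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta),  np.cos(theta), 0.0],
        [0.0,            0.0,           1.0],
    ])
    rotated = points @ rotation.T                 # rotate every point
    return rotated + rng.normal(0.0, jitter_std, size=points.shape)  # add small noise

augmented = augment_point_cloud(np.random.rand(2048, 3))  # one augmented cloud of 2048 points
```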
ML updates 😁
The past couple months have mostly been spent doing research, but in my spare time I’ve been building out a couple of side projects, doing a deeper dive into the fundamental models of ML / DL, and working with a startup to develop ML models for them. As I was working on these projects, I realized that if you really want to build technical depth in a field, there’s no better way to do it than project-based learning. Learning has always been my driving force; in my opinion, it’s the most gratifying aspect of this journey. Learning isn’t just a means to an end, especially for a subject with so much breadth and depth. Learning is just the beginning; chasing knowledge eventually leads to a profound transformation: a shift in perspective, a broadening of horizons, and a deepening of understanding. As such, I worked on 3 main projects over the course of the last couple months:
Built a neural network to evaluate chess moves
Dived into transformers and self-attention
Worked with a company, Aercoustics, to develop ML models for them
Not too many updates on this, but a quick tl;dr: I’m working on a sound classification model that classifies sounds specific to construction workplaces
neural network for chess
Growing up, I played chess quite a bit, so one of the projects I decided to take on was building a neural network that could predict the next best move. I built the model using a Convolutional Neural Network, training it on chess board positions encoded as image-like matrices. The difficult part of this project was getting the neural network to understand what it means for a position to be considered “winning” for either black or white.
To get around this, I represented each chess board as a matrix: each chess piece was assigned a numerical value to convert the game states into a matrix form that could be processed by the CNN. This was done for each type of piece, resulting in a separate layer for each piece type in the dataset. Each board state from the game is represented as a set of these matrices (one per piece type), creating a three-dimensional tensor for each snapshot of the game. Following this, each move in the game was converted from standard algebraic notation into a representation that captures the move’s starting and ending positions as two separate matrices (one indicating the start and the other the end).
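To show roughly what that encoding looks like in code, here’s a hedged sketch using the python-chess library: one 8x8 plane per piece type (positive values for white, negative for black), plus two masks marking a move’s start and end squares. The exact numerical values and plane ordering in my notebook may differ:

```python
import chess
import numpy as np

def board_to_tensor(board: chess.Board) -> np.ndarray:
    """Encode a position as a (6, 8, 8) tensor: one 8x8 plane per piece type,
    with +1 for white pieces and -1 for black pieces (illustrative scheme)."""
    planes = np.zeros((6, 8, 8), dtype=np.float32)
    for square in chess.SQUARES:
        piece = board.piece_at(square)
        if piece is None:
            continue
        plane = piece.piece_type - 1               # PAWN=1 ... KING=6 -> planes 0..5
        row, col = chess.square_rank(square), chess.square_file(square)
        planes[plane, row, col] = 1.0 if piece.color == chess.WHITE else -1.0
    return planes

def move_to_masks(move: chess.Move) -> np.ndarray:
    """Encode a move as two 8x8 masks: plane 0 marks the from-square, plane 1 the to-square."""
    masks = np.zeros((2, 8, 8), dtype=np.float32)
    for plane, square in enumerate((move.from_square, move.to_square)):
        masks[plane, chess.square_rank(square), chess.square_file(square)] = 1.0
    return masks

board = chess.Board()                              # starting position
x = board_to_tensor(board)                         # (6, 8, 8) input tensor
y = move_to_masks(chess.Move.from_uci("e2e4"))     # (2, 8, 8) target masks
```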
After that, I trained the data on a CNN architecture that I designed myself; this is what it looked like:
This architecture leverages the spatial and piece-specific information in the chess board representations, which makes it well suited to tasks that require an understanding of chess dynamics. I wrote all my code in an annotated Jupyter notebook; if you’re interested in learning more about how I did the entire project, click the link below 😁
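For anyone who wants a feel for the shape of the architecture without opening the notebook, here’s a simplified sketch of a CNN that takes the piece-plane tensor from above and outputs logits over the 64 squares for a move’s start and end. The actual model in the notebook has more layers and details than this:

```python
import torch
import torch.nn as nn

class ChessMoveCNN(nn.Module):
    """Illustrative CNN: (6, 8, 8) piece planes in, from-square and to-square logits out."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(6, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.from_head = nn.Linear(128 * 8 * 8, 64)  # 64 squares for the move's start
        self.to_head = nn.Linear(128 * 8 * 8, 64)    # 64 squares for the move's end

    def forward(self, boards):                # boards: (batch, 6, 8, 8)
        features = self.backbone(boards)
        return self.from_head(features), self.to_head(features)

boards = torch.randn(16, 6, 8, 8)             # a batch of encoded positions
from_logits, to_logits = ChessMoveCNN()(boards)
```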
dive into transformers
Alongside building that chess project out, I also spent some time going back to the fundamentals of important Machine Learning models. More specifically, I spent some time understanding all the components of the transformer architecture.
In my previous update, I built out a mini language model from scratch using the transformer architecture, but I didn’t spend too much time learning the details of the model, such as the self-attention block, the feed-forward network / MLP, word embeddings, tokens, etc. It was actually pretty cool to go back and understand the smaller components that make it possible for a transformer to do what it does. While working through the theory, I already had the code from building a transformer from scratch, and it was cool to see how the theory translates directly into code / practical application. To give a quick breakdown, here is what I spent most of my time learning:
Learning about word embeddings and how they are generated
The embedding matrix is used by the transformer architecture quite often and the idea is that each column represents the vector for a word
And the direction of these vectors in the embedding space carries semantic meaning → this essentially means that similar words have similar vectors, e.g., “cat” and “dog” would have similar vectors
We can do something cool where we can approximate the vectors of certain words: vec(queen) ≈ vec(king) + vec(woman) - vec(man)
Generating tokens
These language models predict one word / token at a time: the model essentially uses all the words that came before (the context) and then predicts what the next word could be
This prediction is a probability distribution generated by a softmax activation function; we use softmax because it turns raw scores into values between 0 and 1 that sum to 1. We then take the word with the highest probability as the next predicted word
Learning Self-Attention
The overarching goal of self-attention is to measure how relevant each word in a sequence is to every other word. There are 3 ways to go about computing that similarity:
Dot product
Cosine similarity
Scaled dot product
The original “Attention Is All You Need” paper implemented the scaled dot product method to compute the similarities between words
This method computes the dot product between the vectors and then scales it by 1 / sqrt(d_k), where d_k is the number of dimensions of the key / query vectors
Key, Value, & Query Matrices
The purpose of these matrices is to perform linear transformations that project each word vector into query, key, and value spaces, which makes it easy to apply self-attention and compare the words against each other (see the sketch after this list)
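To tie the last few points together, here’s a small NumPy sketch of single-head scaled dot-product self-attention with query, key, and value projections. The dimensions are made up for illustration, and it leaves out multi-head attention and masking, so treat it as a toy version rather than the full transformer block:

```python
import numpy as np

def softmax(scores, axis=-1):
    # subtract the max for numerical stability; each row sums to 1
    exp = np.exp(scores - scores.max(axis=axis, keepdims=True))
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_k) projection matrices
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v      # query, key, value matrices
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)          # scaled dot-product similarities
    weights = softmax(scores, axis=-1)       # each row: attention over all tokens
    return weights @ v                       # weighted sum of the value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))                 # 5 tokens, d_model = 16 (toy numbers)
w_q, w_k, w_v = (rng.normal(size=(16, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)       # (5, 8) attended representations
```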
For those of you who are interested in the work that I did, click the link below to see the GitHub repo. The repository contains both the code for implementing a transformer from scratch and a PDF that breaks down all the concepts within a transformer in very intuitive and simple terms :)
other quick updates & reflections!
Now wrapping up this newsletter, I wanted to give a quick update on the other parts of my life which I haven’t given much spotlight to. The main update I have is that I’m starting my summer internship at IRCC as a Junior Data Scientist. I’m super excited to be part of the team and get to learn a ton and further establish myself in the Machine Learning space. Super pumped to provide a full update in the next newsletter once I’ve completed half my summer term there!
In other news, I finally wrapped up my sophomore year at the University of Toronto; this past year was a whirlwind of new experiences, challenges, and growth, a true learning curve, if you will. The leap in intensity from my first year to my second year was quite significant, catching me off guard at times. But amidst the academic demands and the pressure to excel, I found myself learning some profound lessons along the way. One of the most valuable lessons I took away from it all is how important it is to be quick on your feet; more specifically, adaptability. University would often push me out of my comfort zone, but looking back, those moments were when I personally experienced the most growth. I wasn’t growing during the times when I had no work at all; rather, it was the moments when I was juggling a diverse range of work.
After my freshman year, I became a lot better at balancing my university workload with other work. Balancing my studies with my extracurricular pursuits, including my passion for machine learning and deep learning projects, required a delicate juggling act. But through trial and error, I began to find a rhythm—a way to allocate my time effectively, to focus on what truly mattered, while still leaving room for rest and rejuvenation. As I look back on the past year, I'm filled with gratitude for the journey—the highs, the lows, and everything in between. It wasn't always easy, but it was undeniably rewarding.
looking ahead.
If you’ve made it this far, I would like to thank you for taking time to read my newsletter. I hope that my insights and experiences have been valuable to you, and I look forward to sharing more of what I’m up to in the future. With that being said, here’s what I’m going to be working on in the next few months:
Continuing my work at IRCC as a data scientist intern
Working with Fallyx to develop Machine Learning models for fall detection
Bigger update to come on the next newsletter :)
Keeping up with writing — producing some articles in the ML space and explaining how important algorithms work
Working on a couple more ML projects
That’s all from me; if you enjoyed reading this newsletter, please consider subscribing and I’ll see you in the next one 😅.