conducting ML research @ uoft, meeting Geoffrey Hinton at an AI conference, building a concentration web app & more!
Bi-Monthly Update: January & February 2024
hey, I’m Dev, and if you’re new to my bi-monthly newsletter, welcome! This is where I recap what’s been going on in my life and share some thoughts and reflections from the last couple of months. Allow me to introduce myself: I’m currently a 2nd year Computer Science undergrad at the University of Toronto. Over the last few months, I’ve been building out full stack ML applications, doing research in the ML field, and writing on the side. Here’s a quick tl;dr of what I did over the last 2 months:
Built a concentration web app that analyzes your concentration while working (provides feedback and gets you back on track while working)
Wrote an in-depth article on how to build GPT from scratch
Attended UofT AI’s 2024 conference
Met Geoffrey Hinton and had a valuable discussion with him
Doing ML research using the PointNet model
Working with Aercoustics to develop ML models for them
Read some interesting books as well
I’ve been busy since my last update in early January, and I’ve grown a lot both professionally and personally. I fell on my face a ton, but each time I picked myself up, I learned something new. If you’re interested in keeping up with what I do and what I learn, consider subscribing to this newsletter.
uoft-AI conference 2024.
As part of the UofT AI team, I had the opportunity to attend UofT AI’s annual conference. It was a surreal experience; being in the same room as some of the biggest brains in the ML/AI space isn’t an everyday thing. Conferences are one of those experiences that leave an indelible mark on my personal growth, especially when I need that push to go out there and build something new. Being surrounded by like-minded people who are equally passionate about the possibilities of AI and machine learning was incredibly inspiring. The energy and enthusiasm in the room ignited a renewed sense of purpose and determination within me.
The cornerstone of the event was the networking opportunities that arose throughout the conference. I got to have valuable conversations with researchers from Google, data engineers from Bell, and many more. But the highlight was getting the chance to talk to Geoffrey Hinton and pick his brain about the development of ML models and what it means for the future of industries such as healthcare and education.
One thing that resonated with me was his perspective on integrating ML models into healthcare systems, specifically AI-based therapists. He mentioned research showing that many individuals preferred AI-based therapists and avatars over human therapists. This underscores the transformative potential of AI in addressing accessibility challenges within mental health support. If AI-based therapists are implemented effectively, we stand on the cusp of revolutionizing how mental health care is delivered, making it far more accessible and affordable for people around the globe.
Having these conversations with experts in the field provided a broader context for my own work, letting me see beyond the immediate projects and challenges I had been focused on. I left the conference with a revitalized passion and optimism. The conversations I had and the perspectives I gained have inspired me to think more critically about how my work can contribute to positive change. Overall, the conference was a great experience, full of growth and unforgettable connections.
building gpt from scratch.
In my last update, I built a smaller version of GPT from scratch and shared the detailed Jupyter notebook outlining how I implemented the entire language model. Over the past 2 months, I’ve been working on turning that code into a more digestible piece of content, and I published an article that walks through how to implement the entire language model from scratch. The main motivation behind writing the article was to make it easy for anyone to follow, but more than that, it was to explain the intuition behind transformers and how they work. To give a quick overview:
The language model I started with when building GPT was a bigram language model. A bigram language model is a simple statistical approach to language modeling in natural language processing: it calculates the probability of a word based on the occurrence of its preceding word. Mathematically, it is represented as:

P(w_n | w_(n-1)) = count(w_(n-1), w_n) / count(w_(n-1))
where w_n is the current word and w_(n-1) is the previous one. The probability of a specific bigram is calculated by dividing the count of that bigram by the count of its preceding word. While bigram models are simple, they capture some local language structure, though they don’t handle long-range dependencies as well as higher-order n-gram models or neural network-based models. For something simple like generating Shakespearean text, this language model worked well as a starting point. From there, I implemented the transformer architecture from the ‘Attention Is All You Need’ paper.
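To make the formula concrete, here’s a tiny count-based, word-level sketch of the bigram idea in Python. Treat this as an illustration of the math above, not the code from my article (the toy corpus and function names are mine):

```python
from collections import Counter

# Toy corpus; any text works here.
words = "to be or not to be that is the question".split()

# count(w_{n-1}) over positions that have a next word, and count(w_{n-1}, w_n).
unigram_counts = Counter(words[:-1])
bigram_counts = Counter(zip(words[:-1], words[1:]))

def bigram_prob(prev_word: str, word: str) -> float:
    """P(w_n | w_{n-1}) = count(w_{n-1}, w_n) / count(w_{n-1})."""
    if unigram_counts[prev_word] == 0:
        return 0.0
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]

print(bigram_prob("to", "be"))  # 1.0 -- in this corpus, "to" is always followed by "be"
```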
Implementing each aspect of the transformer architecture was challenging given the amount of math involved, particularly the linear algebra. If you want to learn more about the specifics of how I built the model, feel free to click the link below; it’ll redirect you to the article, which covers every aspect of the transformer architecture, dives into the math, implements the code, and explains why we perform each step. It’s a bit of a long read, but I hope it adds some value to the reader.
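As a taste of what the article covers, here’s a minimal sketch of the heart of the transformer: a single masked self-attention head in PyTorch. The shapes and names here are illustrative, not the exact code from my notebook:

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention for one head.
    x: (batch, seq_len, d_model); w_q / w_k / w_v: (d_model, d_head)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                # project to queries/keys/values
    d_head = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5   # (batch, seq, seq) similarity scores
    # Causal mask so each position only attends to earlier positions (as in GPT).
    seq_len = x.size(1)
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    scores = scores.masked_fill(~mask, float('-inf'))
    weights = F.softmax(scores, dim=-1)                # attention weights sum to 1 per row
    return weights @ v                                 # weighted sum of values

x = torch.randn(1, 8, 32)                              # toy input: 1 sequence of 8 tokens
w = [torch.randn(32, 16) for _ in range(3)]
out = self_attention(x, *w)                            # (1, 8, 16)
```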
eye-concentration application.
Along with building in the theoretical ML space, I’ve been working on a couple of projects that use an ML backend. A common problem for a lot of people is their concentration span: the average human attention span is reportedly only 8.25 seconds, 4.25 seconds shorter than in 2000. When we try to get into a flow state of work, it becomes difficult because there are so many potential distractions. All of this motivated me to build this web application; the goal was not only to benefit myself, but to help those who suffer from some form of attention disorder.
To build out the web application, I first built a few ML models that analyze different aspects of your concentration. How focused an individual is can’t be judged from a single signal, so I built 3 ML models that each collect a unique piece of data:
The 1st signal is whether the person’s eyes are closed. This tells me how often their eyes close, and more specifically, whether the individual is sleepy or tired.
The 2nd is eye direction. This tells me where they’re looking; if they’re zoning out or looking at their phone, it becomes evident that they’re not paying attention.
The 3rd is yawns. This sounds a little non-intuitive, but checking how frequently an individual yawns gives me an idea of how focused they are.
I used some of OpenCV’s built-in models to crop the image of your face down to just the eyes and mouth, and those cropped images were then fed into the ML models I made. Combining all of these signals, I coded some threshold values that determine how focused you are. If you aren’t focused, the computer speaks back to you and provides feedback to get you back on track. If you want to check it out, I’ve linked the GitHub repo below.
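For a sense of what the cropping step looks like, here’s a minimal sketch using the Haar cascade detectors that ship with OpenCV to pull eye regions out of a webcam frame. The specific cascades and parameters are my illustration, not necessarily what the app uses:

```python
import cv2

# OpenCV bundles pre-trained Haar cascade models for faces and eyes.
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def crop_eyes(frame):
    """Detect the face, then return cropped eye regions for downstream ML models."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    crops = []
    for (x, y, w, h) in faces:
        face_roi = gray[y:y + h, x:x + w]          # search for eyes only inside the face
        for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(face_roi):
            crops.append(face_roi[ey:ey + eh, ex:ex + ew])
    return crops

cap = cv2.VideoCapture(0)                          # webcam feed
ok, frame = cap.read()
if ok:
    eye_crops = crop_eyes(frame)                   # these crops feed the eye-state model
cap.release()
```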
updates on ml research @ uoft.
For the last 2 months, I’ve been doing machine learning research at the University of Toronto. I’ve had the chance to work closely with a master’s student to help develop ML models for their final thesis. For some context, the goal of the paper is to implement a machine learning model that performs age and sex classification on 3D point data of human pelvises. The 3D point data is essentially a collection of points in three-dimensional space that represent the surface of the human pelvis; each point has its own set of coordinates (X, Y, Z) that positions it precisely in 3D space.
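In code, such a point cloud is just an (N, 3) array of coordinates. Here’s a tiny sketch (with made-up data standing in for a real scan) of the kind of normalization commonly applied before feeding clouds to a model:

```python
import numpy as np

# A point cloud is an (N, 3) array of XYZ coordinates; a scan might have
# thousands of surface points. Random data here stands in for a real pelvis scan.
points = np.random.rand(2048, 3)

# Common preprocessing: center the cloud and scale it into a unit sphere
# so the model always sees a consistent coordinate frame.
points -= points.mean(axis=0)
points /= np.linalg.norm(points, axis=1).max()
```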
The ML model I’ve been working with is called PointNet. It’s not your typical CNN or Transformer; this model works specifically with 3D point data. PointNet is uniquely designed to take unstructured 3D point clouds directly as input, which sets it apart from conventional approaches that often rely on 3D voxel grids or collections of 2D images. Using PointNet also let me efficiently process and capture the finer details of 3-dimensional shapes.
The way PointNet works is like this (a minimal code sketch follows the list):
Input Format: PointNet takes a set of points from a point cloud as input. Each point is represented by its coordinates (x, y, z) and possibly additional features like color or normal vectors.
Permutation Invariance: One of the challenges with point cloud data is that it is unordered. PointNet addresses this through a symmetric function, specifically a max pooling layer, which ensures that the output of the network is invariant to the permutation of the input points. This means that no matter how you order the points in the input, the output of the network will be the same.
Feature Transformation: PointNet learns spatial transformations of the input points to a canonical, stable space, improving the robustness of the network to geometric transformations. It does this by including a mini-network (T-Net) that predicts an affine transformation matrix applied to the points before processing them for the main task.
Local and Global Feature Extraction: PointNet processes each point individually through a shared MLP (multi-layer perceptron) to extract features from each point. Afterward, a global feature is obtained by applying a max pooling operation across all points, which aggregates the features into a single global signature. This global feature captures the overall shape of the object, while the individual point features capture local details.
Task-Specific Layers: Depending on the task (classification, segmentation, etc.), PointNet uses the global feature (for classification) or a combination of local and global features (for segmentation) in the final layers to make predictions.
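Here’s that minimal sketch: a stripped-down PointNet classifier in PyTorch, with the T-Nets omitted for brevity. This is my own simplified illustration of the architecture, not the code from the thesis:

```python
import torch
import torch.nn as nn

class PointNetClassifier(nn.Module):
    """Simplified PointNet classifier (T-Nets omitted).
    The shared per-point MLP is implemented as 1x1 Conv1d layers; the max pool
    over points is the symmetric function that gives permutation invariance."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.point_mlp = nn.Sequential(          # same MLP applied to every point
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.head = nn.Sequential(               # classification head on the global feature
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, points):
        # points: (batch, num_points, 3) -> Conv1d expects (batch, channels, num_points)
        x = self.point_mlp(points.transpose(1, 2))   # per-point features: (batch, 1024, num_points)
        global_feature = x.max(dim=2).values         # max pool: reordering points changes nothing
        return self.head(global_feature)             # (batch, num_classes)

model = PointNetClassifier(num_classes=2)            # e.g. sex classification
logits = model(torch.randn(4, 1024, 3))              # 4 clouds of 1024 points each
```

The max pool is the key design choice: because it’s a symmetric function, the global feature is identical no matter how the input points are ordered.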
Over the past couple of months, I’ve been working on implementing the original PointNet paper from scratch. There have been a couple of roadblocks that have kept the model from training to a high accuracy, but that’s something I’ll fix in the coming weeks! If you’re interested in learning more about the PointNet model, I’ve linked the paper below, and I’ll share more updates on how the model trains in my next update!
looking ahead.
If you’ve made it this far, thank you for taking the time to read my newsletter. I hope my insights and experiences have been valuable to you, and I look forward to sharing more of what I’m up to in the future. With that being said, here’s what I’ll be working on over the next few months:
Finishing up my ML research at UofT
Wrapping up my sophomore year @ UofT
Keeping up with writing: I’m going to keep putting out articles and other pieces of writing consistently.
Working on a couple more ML projects
That’s all from me; if you enjoyed reading this newsletter, please consider subscribing and I’ll see you in the next one 😅.