
Vithursan Thangarasa

Originally from Toronto, Canada, and currently based in the San Francisco Bay Area, I am deeply passionate about neural network compression, large-scale foundation models, and enhancing the efficiency of training large neural networks, with a keen interest in generative AI.

Twitter · Google Scholar · LinkedIn · GitHub · E-Mail

Some Code I've Written

It all started when I took my first computer science class in high school and went through the many trials and tribulations of learning to program. I've enjoyed coding ever since, and here you can find some of the programs I've written.


  • Simplified Transformer Block for Apple MLX
    Apple MLX implementation of the Simplified Transformer architecture, as presented in a 2023 paper. The project adapts the Simplified Transformer model, which reduces the complexity and component count of the standard Transformer block without compromising performance, for use with Apple's MLX framework. It offers faster training throughput and roughly 15% fewer parameters than standard Transformers, targeting efficient performance on Apple GPUs.

  • nanoGPT for Apple MLX
    Port of Andrej Karpathy’s nanoGPT to the Apple MLX framework. The project adapts the lightweight GPT model implementation to work with Apple’s machine learning framework, making it accessible for developers working within the Apple ecosystem.


  • Sparse Iso-FLOP Transformations for Maximizing Training Efficiency (Sparse-IFT)
    Official implementation of Sparse Iso-FLOP Transformations (Sparse-IFT) for maximizing the training efficiency of deep neural networks. The PyTorch codebase supports the experiments in the paper, which introduces a family of transformations designed to improve test accuracy per training FLOP by employing dynamic sparsity, notably via the RigL algorithm, across a range of network configurations. The repository provides step-by-step guidelines for setting up the environment, running CIFAR-100 and ImageNet experiments with different model configurations, and citing the paper.
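
    As a quick illustration of the dynamic-sparsity idea behind Sparse-IFT (an NumPy sketch, not the repository's PyTorch code), a RigL-style update prunes the smallest-magnitude active weights and regrows connections where the dense gradient is largest, keeping the total number of active weights fixed:

    ```python
    import numpy as np

    def rigl_update(weights, grads, mask, k):
        """One RigL-style prune/regrow step (illustrative sketch).

        Drops the k active weights with the smallest magnitude, then
        regrows k currently inactive connections where the dense
        gradient magnitude is largest, so overall sparsity is unchanged.
        """
        active = np.flatnonzero(mask)
        inactive = np.flatnonzero(~mask)

        # Prune: deactivate the k smallest-magnitude active weights.
        drop = active[np.argsort(np.abs(weights[active]))[:k]]
        mask[drop] = False
        weights[drop] = 0.0

        # Regrow: activate the k inactive positions with the largest gradients.
        grow = inactive[np.argsort(-np.abs(grads[inactive]))[:k]]
        mask[grow] = True
        weights[grow] = 0.0  # regrown weights start at zero, as in RigL

        return weights, mask

    rng = np.random.default_rng(0)
    w = rng.normal(size=20)
    mask = rng.random(20) < 0.5   # boolean sparsity mask
    w[~mask] = 0.0
    g = rng.normal(size=20)

    before = int(mask.sum())
    w, mask = rigl_update(w, g, mask, k=2)
    ```

    Since the drop and grow sets come from disjoint index pools, the count of active weights stays constant across the update.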

  • Self-Paced Learning with Adaptive Deep Visual Embeddings (SPL-ADVisE)
    Randomized mini-batches may not be an optimal training curriculum for deep networks. Our paper proposes a general method for adaptive curriculum creation by extending self-paced learning with diversity, showing state-of-the-art convergence speed to optimal test performance on MNIST, FashionMNIST, CIFAR-10, and CIFAR-100. From our BMVC 2018 paper.

  • Magnet Loss for Deep Metric Learning
    PyTorch implementation of the Magnet Loss from the paper Metric Learning with Adaptive Density Discrimination (ICLR 2016) by Oren Rippel, Piotr Dollar, Manohar Paluri, and Lubomir Bourdev of Facebook AI Research (FAIR).

  • VAE with Gumbel-Softmax
    TensorFlow implementation of a Variational Autoencoder with the Gumbel-Softmax distribution, based on papers from Google Brain and DeepMind: Categorical Reparameterization with Gumbel-Softmax by Jang, Gu, and Poole; The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables by Maddison, Mnih, and Teh; and REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models by Tucker, Mnih, Maddison, Lawson, and Sohl-Dickstein.
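
    The trick at the heart of these papers fits in a few lines. As an illustration (a NumPy sketch, not the repository's TensorFlow code): a Gumbel-Softmax sample perturbs the class logits with Gumbel(0, 1) noise and applies a temperature-scaled softmax, approaching a one-hot categorical draw as the temperature goes to zero:

    ```python
    import numpy as np

    def gumbel_softmax_sample(logits, tau, rng):
        """Draw one Gumbel-Softmax (Concrete) sample (illustrative sketch).

        Adds i.i.d. Gumbel(0, 1) noise to the logits and applies a
        temperature-scaled softmax; as tau -> 0 the sample approaches
        a one-hot draw from the underlying categorical distribution.
        """
        u = rng.uniform(1e-20, 1.0, size=logits.shape)
        gumbel = -np.log(-np.log(u))   # Gumbel(0, 1) noise via inverse CDF
        y = (logits + gumbel) / tau
        y -= y.max()                   # shift for numerical stability
        e = np.exp(y)
        return e / e.sum()             # relaxed one-hot sample on the simplex

    rng = np.random.default_rng(0)
    logits = np.log(np.array([0.1, 0.6, 0.3]))
    sample = gumbel_softmax_sample(logits, tau=0.5, rng=rng)
    ```

    Because the noise is added outside the deterministic softmax, gradients flow through the sample to the logits, which is what makes the reparameterization usable inside a VAE.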

  • Ackermann Quicksort
    A comparative analysis of recursion in C and Python for the Ackermann function and Quicksort algorithm.
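
    For context (an illustrative Python sketch, not the repository's code), the two algorithms stress recursion very differently: quicksort's expected depth is logarithmic in the input size, while the Ackermann function's call tree explodes even for tiny arguments, which is what makes it a good stress test for C versus Python call overhead:

    ```python
    def ackermann(m, n):
        """Ackermann function: deeply recursive and explosively growing.

        Even small inputs like (4, 1) exhaust Python's default
        recursion limit, while quicksort below stays shallow.
        """
        if m == 0:
            return n + 1
        if n == 0:
            return ackermann(m - 1, 1)
        return ackermann(m - 1, ackermann(m, n - 1))

    def quicksort(xs):
        """Simple (non in-place) recursive quicksort."""
        if len(xs) <= 1:
            return xs
        pivot, rest = xs[0], xs[1:]
        smaller = [x for x in rest if x < pivot]
        larger = [x for x in rest if x >= pivot]
        return quicksort(smaller) + [pivot] + quicksort(larger)

    result = ackermann(2, 3)              # -> 9
    sorted_xs = quicksort([5, 2, 9, 1])   # -> [1, 2, 5, 9]
    ```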