
· 11 min read
Tanul Singh


The state of NLP changed completely in 2018, when researchers from Google open-sourced BERT (Bidirectional Encoder Representations from Transformers). The idea of going from a sequence-to-sequence transformer model to self-supervised training of just the encoder, whose representations can then be used for downstream tasks such as classification, was mind-blowing. Ever since, efforts have been made to improve such encoder-based models in different ways to do better on NLP benchmarks. In 2019, Facebook AI open-sourced RoBERTa, which has been ruling as the best performer for all tasks up till now, but the throne seems to be shifting towards the new king, DeBERTa, first published by Microsoft Research in 2020. DeBERTa-v3 has beaten RoBERTa by big margins not only in recent NLP Kaggle competitions but also on major NLP benchmarks.


In this article, we will take a deep dive into the DeBERTa paper by Pengcheng He et al. (2020) and see how it improves over BERT and RoBERTa, the previous state of the art. We will also explore the results and techniques for using the model efficiently on downstream tasks. DeBERTa gets its name from the two novel techniques it introduces, through which it claims to improve over BERT and RoBERTa:

  • Disentangled Attention Mechanism
  • Enhanced Mask Decoder

Decoding-enhanced BERT with disentangled attention (DeBERTa)

To understand these techniques, we first need to understand how RoBERTa and other encoder-type networks work. Let’s call this the context and discuss it in the next section.

· 11 min read
Vinayak Nayak

In this post, we will understand the workings of PolyLoss from the paper PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions and implement it on an image classification task. We shall explore the following:

  • Understanding PolyNLoss
    • Quick overview of CrossEntropy Loss
    • CrossEntropy (CE) Loss as an infinite series
    • PolyN Loss by perturbations in terms of CE loss
  • Implementation in PyTorch
    • Understanding the Oxford Flowers dataset
    • Building an image classification pipeline in fastai
    • Writing PolyN Loss function in PyTorch
    • Compare classifiers trained using CE loss vs Poly1 loss
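
To make the outline above concrete, here is a minimal sketch of the idea in plain Python rather than PyTorch. It assumes the formulation from the PolyLoss paper: cross-entropy can be written as the series sum over j >= 1 of (1 - p_t)^j / j, where p_t is the predicted probability of the true class, and Poly1 loss adds an extra epsilon * (1 - p_t) to the first term. The function names and the default epsilon are illustrative choices, not taken from the post.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard CE loss: -log of the probability given to the true class."""
    return -math.log(softmax(logits)[target])


def ce_as_series(p_t, n_terms):
    """Truncated series for -log(p_t): sum of (1 - p_t)^j / j for j = 1..n_terms."""
    return sum((1.0 - p_t) ** j / j for j in range(1, n_terms + 1))


def poly1_loss(logits, target, epsilon=1.0):
    """Poly1 loss: CE plus an epsilon-weighted perturbation of the first series term.

    epsilon is a hyperparameter; 1.0 is only an illustrative default.
    """
    p_t = softmax(logits)[target]
    return -math.log(p_t) + epsilon * (1.0 - p_t)


if __name__ == "__main__":
    logits, target = [2.0, 0.5, -1.0], 0
    print("CE   :", cross_entropy(logits, target))
    print("Poly1:", poly1_loss(logits, target, epsilon=1.0))
```

With epsilon set to 0 the Poly1 loss reduces to plain cross-entropy, which makes it easy to compare the two in the same training pipeline.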

· 31 min read
Atharva Ingle

Natural Language Processing is one of the fastest-growing fields in Deep Learning. NLP has completely changed since the inception of Transformers. Later on, variants of the Transformer architecture in which only the encoder part is used (such as BERT) cracked the transfer learning game in NLP. Now, you can download a pre-trained model from the internet that has already been trained on huge amounts of data and has knowledge of language, and use it for your downstream tasks with a bit of fine-tuning.

If you are into Deep Learning, you might have heard about HuggingFace. They are the pioneers when it comes to NLP. More than 5000 organizations are using HuggingFace today. Hugging Face is now worth $2 billion and recently became a bi-unicorn 🦄🦄. They aim to build the GitHub of Machine Learning and they are rightly marching towards it. Check out the Forbes article here covering the news.

HuggingFace provides a pool of pre-trained models to perform various tasks in NLP, audio, and vision.

Here are the reasons why you should use HuggingFace for all your NLP needs

  • State-of-the-art models available for almost every use-case
  • The models are already pre-trained on lots of data, so you can use them directly or with a bit of finetuning, saving an enormous amount of compute and money
  • A variety of common datasets is available to test your new ideas
  • Easy to use and consistent APIs
  • Supports PyTorch, TensorFlow, and JAX

· 19 min read
Vinayak Nayak

In this post, we shall look at the task of metric learning and implement the paper Classification is a strong baseline for deep metric learning on the Inshop dataset.

  • What is Metric Learning?
  • Framing Image Classification as an Image Retrieval problem
  • Model Architecture
  • Loss Function
  • The Inshop Dataset
  • Implementing the paper on the Inshop dataset in fastai
    • Building a datablock & dataloader
    • Implementing a custom sampler
    • Implementing a custom metric
    • Training the model
  • Evaluation on unseen data
  • References

· 4 min read
Vishnu Subramanian

Have you ever wondered 🤔 how PyTorch nn.Module works? I was always curious to understand how the internals work too. Recently I was reading the 19th chapter of the Deep Learning for Coders book, where we learn how to build minimal versions of PyTorch and FastAI modules like

  • Dataset, Dataloaders
  • Modules
  • FastAI Learner

This intrigued 🤔 me to take a look at the PyTorch source code for nn.Module. The code for nn.Module is 1000+ lines 😮. After a few cups of coffee ☕☕, I was able to make sense of what is happening inside. Hopefully, by the end of this post, you will have an understanding of what goes on inside nn.Module without those cups of coffee 😄.
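
As a rough illustration of the kind of minimal module mentioned above, here is a pure-Python sketch of one core idea: intercepting attribute assignment so that parameters and sub-modules get registered automatically. It is a deliberately simplified toy, not the actual PyTorch source. `Parameter`, `MiniModule`, `Linear` and `TwoLayer` are hypothetical names made up for this example, and the "weights" are plain floats.

```python
class Parameter:
    """Hypothetical stand-in for a learnable parameter: just wraps a value."""

    def __init__(self, value):
        self.value = value


class MiniModule:
    """Toy module that registers parameters and children on assignment."""

    def __init__(self):
        # Create the registries without triggering our own __setattr__.
        object.__setattr__(self, "_parameters", {})
        object.__setattr__(self, "_modules", {})

    def __setattr__(self, name, value):
        # Record parameters and sub-modules, then store the attribute as usual.
        if isinstance(value, Parameter):
            self._parameters[name] = value
        elif isinstance(value, MiniModule):
            self._modules[name] = value
        object.__setattr__(self, name, value)

    def parameters(self):
        """Yield this module's parameters, then those of its children."""
        yield from self._parameters.values()
        for child in self._modules.values():
            yield from child.parameters()

    def __call__(self, *args, **kwargs):
        # Calling the module dispatches to forward().
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError


class Linear(MiniModule):
    """Scalar 'linear layer': y = weight * x + bias."""

    def __init__(self, weight, bias):
        super().__init__()
        self.weight = Parameter(weight)
        self.bias = Parameter(bias)

    def forward(self, x):
        return self.weight.value * x + self.bias.value


class TwoLayer(MiniModule):
    """Two scalar linear layers applied one after the other."""

    def __init__(self):
        super().__init__()
        self.first = Linear(2.0, 1.0)
        self.second = Linear(3.0, 0.0)

    def forward(self, x):
        return self.second(self.first(x))


if __name__ == "__main__":
    model = TwoLayer()
    print(len(list(model.parameters())), "parameters found")
    print("output:", model(1.0))
```

The point of the sketch is that `TwoLayer` never lists its parameters explicitly; they are discovered because every assignment passes through `__setattr__`.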

· 22 min read
Atharva Ingle

One of the least taught skills in machine learning is how to manage and track machine learning experiments effectively. Once you get out of the shell of beginner-level projects and get into some serious projects or research, experiment tracking and management become one of the most crucial parts of your project.

However, no course teaches you how to manage your experiments in-depth, so here I am trying to fill in the gap and share my experience on how I track and manage my experiments effectively for all my projects and Kaggle competitions.

In this post, I would like to share knowledge gained from working on several ML and DL projects.

  • Need for experiment tracking
  • Conventional ways for experiment tracking and configuration management
  • Trackables in a machine learning project
  • Experiment tracking using Weights and Biases
  • Configuration Management with Hydra

· 5 min read
Vishnu Subramanian

When designing DL modules such as a classification head, we have to calculate the number of input features ourselves. PyTorch Lazy modules come to the rescue by automating this.

In this post, we will explore how we can use PyTorch Lazy modules to re-write PyTorch models used for

  • Image classifiers
  • Unet

· 3 min read
Vishnu Subramanian

Time flies. It's been more than a year since we launched. Thanks to all our early adopters for trusting and supporting us. Without your love, feedback and patience, we would not have come this far.

In the last few months, we have been working on some key features

  • Managing instance life cycle through Python API
  • Bring your own container
  • Start up script
  • Spot instances
  • Weekly and Monthly prices
  • Live invoice
  • New website, docs and blog page

· 8 min read
Tanul Singh


Transformer-based models have become the go-to models for almost every NLP task since their inception, but when it comes to long documents they suffer from a limit on the number of tokens they can handle. They are unable to process long sequences because their self-attention scales quadratically with the sequence length. Longformer addresses this limitation and proposes an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer. Longformer’s attention mechanism is a drop-in replacement for the standard self-attention and combines local windowed attention with task-motivated global attention.
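
To see why this combination scales so much better, here is a small pure-Python sketch that builds a boolean attention pattern and counts how many token pairs are attended. It only illustrates the pattern and is not Longformer's actual implementation. The function names are made up for this example, and it assumes that global tokens attend to every position and every position attends to them.

```python
def attention_mask(seq_len, window, global_positions=()):
    """Return a seq_len x seq_len boolean mask.

    mask[i][j] is True when token i may attend to token j, either because
    j lies within window // 2 positions of i (local attention) or because
    i or j is a designated global position (global attention).
    """
    half = window // 2
    global_set = set(global_positions)
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            is_local = abs(i - j) <= half
            is_global = i in global_set or j in global_set
            row.append(is_local or is_global)
        mask.append(row)
    return mask


def count_attended_pairs(mask):
    """Number of (query, key) pairs that are actually computed."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    for n in (64, 128, 256):
        full = n * n
        sparse = count_attended_pairs(attention_mask(n, window=8, global_positions=[0]))
        print(f"seq_len={n}: full={full}, windowed+global={sparse}")
```

Doubling the sequence length roughly doubles the count for the windowed pattern, while the count for full self-attention quadruples.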