Skip to main content

3 posts tagged with "Hugging Face"

View All Tags

· 8 min read
Tanul Singh


Transformer-Based Models have become the go-to models in about every NLP task since their inception, but when it comes to long documents they suffer from a drawback of limited tokens. Transformer-Based Models are unable to process long sequences due to their self-attention which scales quadratically with the sequence length. Longformer addresses this limitation and proposes an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer. Longformer’s attention mechanism is a drop-in replacement for the standard self-attention and combines local windowed attention with task-motivated global attention.

· 5 min read
Tanul Singh

DeepSpeed With the recent advancements in NLP, we are moving towards solving more and more sophisticated problems like Open Domain Question Answering, Empathy in Dialogue Systems, Multi-Modal Problems, etc but with this, the parameters associated with the models have also been rising and have gone to the scale of billions and even Trillions in the largest model Megatron.