Exploring the transformer architecture
At the core of Anand’s Excel implementation lies the GPT-2 Small model, which boasts 124 million parameters. In contrast, the full version of GPT-2 employed a staggering 1.5 billion parameters, while its successor, GPT-3, raised the bar even