Okay, so before Super Tiny LM, let's first look at Tiny LM from @MSFTResearch, published on May 24, 2023 (fun fact: @astar_research's publication date is exactly one year after MSFT's). In the MSFT paper, they trained models with 1M, 2.5M, 8.3M, 28M, and 33M parameters