Sparse expert models are becoming increasingly relevant as they are now being used across many domains (NLP, speech, vision, multi-modality) w/ very strong results
Right now sparse expert models hold SOTA on various benchmarks (e.g. ST-MoE on SuperGlue, ANLI, ARC, etc…)