I'm training a small LLM this weekend.
Found some cool Llama 2 facts while researching.
Time to train in GPU hours:
• 7B param took 184,320 GPU hours
• 13B param took 368,640 GPU hours
• 34B param took 1,038,336 GPU hours
• 70B param took 1,720,320 GPU hours
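Quick sanity check on those numbers, as a rough sketch. It adds up the four runs and compares each to the 7B run. The `wall_clock_days` helper and its `num_gpus` argument are my own additions for illustration; the GPU count is an assumption you plug in, not something from the figures above.

```python
# GPU hours per model size, copied from the list above.
gpu_hours = {
    "7B": 184_320,
    "13B": 368_640,
    "34B": 1_038_336,
    "70B": 1_720_320,
}

# Total across all four training runs.
total = sum(gpu_hours.values())

# How many times more GPU hours each run took than the 7B run.
ratios = {name: hours / gpu_hours["7B"] for name, hours in gpu_hours.items()}


def wall_clock_days(hours, num_gpus):
    """Convert GPU hours to wall-clock days.

    num_gpus is a hypothetical cluster size (an assumption), and this
    assumes every GPU is busy for the entire run.
    """
    return hours / num_gpus / 24


print(f"total: {total:,} GPU hours")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the 7B run")
```

For example, `wall_clock_days(gpu_hours["7B"], 1024)` gives the wall-clock time for the 7B run on a made-up 1,024-GPU cluster.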
How about in