@yusufg
We've optimized training and inference for LLMs, so we can offer a higher-level interface: just submit 10k LLM requests to a cluster of thousands of GPUs and we'll handle the scheduling for you.
You can also drop down to a lower level, e.g. PyTorch or BLAS; that works fine too.
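To make the higher-level interface concrete, here's a minimal sketch of what batch submission could look like. Everything here is hypothetical: `submit_batch` and `run_llm` are illustrative names, not a real API, and a thread pool stands in for the GPU-cluster scheduler.

```python
from concurrent.futures import ThreadPoolExecutor

def run_llm(prompt: str) -> str:
    # Placeholder for a real model call; just echoes the prompt.
    return f"completion for: {prompt}"

def submit_batch(prompts, max_workers=8):
    # The service would fan these out across a cluster of GPUs;
    # here a thread pool stands in for that scheduling layer.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_llm, prompts))

# Submit 10k requests in one call and get results back in order.
results = submit_batch([f"request {i}" for i in range(10_000)])
print(len(results))  # 10000
```

The point of the interface is that the caller only describes the batch; placement, batching, and retries across GPUs are the service's problem.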