Knut Jägersberg

@JagersbergKnut

6,184
Followers
5,085
Following
4,588
Media
96,147
Statuses

Content Strategy & AI @knutjaegersberg@sigmoid.social

Gronau, Germany
Joined June 2018
@JagersbergKnut
Knut Jägersberg
2 years
declare-lab/flan-alpaca-xl Base model: flan-t5, thus no license probs
13
51
306
@JagersbergKnut
Knut Jägersberg
7 months
@tsarnick weirdly, I think it is existing celebrities. Sure you can make AI influencers, and they can and will compete, but even with interesting personality, the brands of at least some existing people remain valuable. People are interested in people, even with superinteresting AI around.
17
3
285
@JagersbergKnut
Knut Jägersberg
2 years
WizardLM: An Instruction-following LLM Using Evol-Instruct uuh "WizardLM-7B outperforms ChatGPT in the high-complexity instructions... Evol-Instruct is a novel method using LLMs instead of humans to automatically mass-produce open-domain instructions"
9
57
280
@JagersbergKnut
Knut Jägersberg
2 years
TheBloke/galpaca-30B-GPTQ-4bit-128g Tom Jobbins had the kindness to quantize galpaca 30b, it fits in 18gb of vram.
4
58
274
@JagersbergKnut
Knut Jägersberg
2 years
GeorgiaTechResearchInstitute/galpaca-30b Please, somebody quantize this to 4-bit int!
7
50
263
@JagersbergKnut
Knut Jägersberg
11 months
llmware/dragon-mistral-7b-v0 A RAG model
7
19
259
@JagersbergKnut
Knut Jägersberg
10 months
TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
7
33
203
@JagersbergKnut
Knut Jägersberg
2 years
Writer/camel-5b-hf Camel-5b, a state-of-the-art instruction-following large language model
6
36
201
@JagersbergKnut
Knut Jägersberg
1 year
GeneZC/MiniMA-3B A language model distilled from an adapted version of LLaMA2-7B following "Towards the Law of Capacity Gap in Distilling Language Models".
Tweet media one
6
28
196
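The distillation mentioned above trains a small student model to match a larger teacher. A minimal, generic soft-label distillation loss looks roughly like the sketch below; this is the textbook formulation, not necessarily the exact recipe from the cited MiniMA paper.

```python
# Generic knowledge-distillation loss: the student mimics the teacher's softened
# output distribution. Temperature and scaling are illustrative choices.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```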
@JagersbergKnut
Knut Jägersberg
2 years
TheBloke/alpaca-lora-65B-GGML -4bit -2bit Already with positive community reception: "This is the best model I have tried locally this far. Thank you!"
5
37
186
@JagersbergKnut
Knut Jägersberg
1 year
Recent LLMs:
- StableVicuna 13B
- WizardLM 7B
- GPT4 based Alpaca 30B
- OpenAssistant data llama fine tune 30B
- RWKV Raven 14B with EvolInstruct added
- Replit Code
- FastChat-T5
- GPT4-X-Alpasta-30B
- GPT4-X-AlpacaDente-30B
- llama-30b-supercot
- Chimera-13B
- Alpacino-30B
4
43
178
@JagersbergKnut
Knut Jägersberg
2 years
digitous/Alpacino30b A triple model merge of (Alpaca+(CoT+Storytelling)), resulting in a comprehensive boost in Alpaca's reasoning and story writing capabilities.
4
34
176
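Merges like the one above typically combine same-architecture checkpoints in weight space. Below is a rough sketch of a plain linear merge; the actual Alpacino30b recipe is not given here, so the checkpoint paths and mixing coefficients are placeholders.

```python
# Naive linear weight merge of same-architecture fine-tunes (illustrative only).
import torch
from transformers import AutoModelForCausalLM

paths = ["alpaca-ft", "cot-ft", "storytelling-ft"]  # placeholder checkpoint paths
coeffs = [0.5, 0.25, 0.25]                          # assumed mixing weights

models = [AutoModelForCausalLM.from_pretrained(p, torch_dtype=torch.float32) for p in paths]
state_dicts = [m.state_dict() for m in models]

# Weighted sum of every parameter tensor across the checkpoints.
merged = {k: sum(c * sd[k] for c, sd in zip(coeffs, state_dicts)) for k in state_dicts[0]}
models[0].load_state_dict(merged)
models[0].save_pretrained("merged-model")
```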
@JagersbergKnut
Knut Jägersberg
1 year
LLMs/AlpacaGPT4-LoRA-7B-OpenLLaMA Soon there will be high-performance LLMs with permissive licenses. It won't take long until the fully trained OpenLLaMA gets a fine-tune on OpenAssistant data. Later we will have a 30b RedPajama on that.
3
45
174
@JagersbergKnut
Knut Jägersberg
1 year
TheBloke/wizard-vicuna-13B-HF
Tweet media one
8
28
147
@JagersbergKnut
Knut Jägersberg
10 months
Trelis/Mistral-7B-Instruct-v0.1-Summarize-16k They used a patent dataset to make it. They also have a 60k version, which you can buy for 30 euros. That's the first model I have seen on the hub which you can purchase. I don't think that's a bad idea.
5
18
142
@JagersbergKnut
Knut Jägersberg
6 months
llama3-42b: Pruned llama3-70b
Tweet media one
5
21
144
@JagersbergKnut
Knut Jägersberg
10 months
Q-bert/Mamba-1B
Tweet media one
1
14
142
@JagersbergKnut
Knut Jägersberg
1 year
01-ai/Yi-34B-200K The 200k context model was released.
1
23
143
@JagersbergKnut
Knut Jägersberg
1 year
One guy is really flooding the hub
Tweet media one
19
3
133
@JagersbergKnut
Knut Jägersberg
2 years
More ranking LLMs for distraction:
- Gpt-4
- Gpt-4-distilling-alpaca-30b
- Chatgpt / gpt-3.5
- alpaca-30b
- galpaca-30b
- Llama 30b
- Gpt-4-alpaca-13b
- Vicuna-13b
- Koala-13b
- Alpaca-13b
- Flan-t5-xxl
8
23
133
@JagersbergKnut
Knut Jägersberg
9 months
internlm/internlm2-20b (200K) Claims ChatGPT comparable performance
4
18
123
@JagersbergKnut
Knut Jägersberg
11 months
Made a TinyLlama-based 1b deacon and quantized it to 6-bit. I'm surprised by the quality of the output. Upload might take 20 minutes or so.
5
16
119
@JagersbergKnut
Knut Jägersberg
1 year
BLING: "Best Little Instruction-following No-GPU-required" - BLING is designed for enterprise automation use cases, especially in knowledge-intensive industries - BLING is not designed for 'chat-bot' or 'consumer-oriented' applications
4
15
120
@JagersbergKnut
Knut Jägersberg
9 months
So what's up with this model here?
4
14
116
@JagersbergKnut
Knut Jägersberg
1 year
This model must be crazy fast at inference and still good
3
24
108
@JagersbergKnut
Knut Jägersberg
1 year
globuslabs/ScholarBERT-XL The model is pretrained on a large collection of scientific research articles (221B tokens).
3
18
103
@JagersbergKnut
Knut Jägersberg
1 year
News: "Preparing the 33B version and we expect to empower WizardLM with the ability to perform instruction evolution itself, aiming to evolve your specific data at a low cost. - released 13B version of WizardLM trained with 250k evolved instructions.
2
19
100
@JagersbergKnut
Knut Jägersberg
7 months
Any news on what happened to @TheBlokeAI?
18
2
96
@JagersbergKnut
Knut Jägersberg
1 year
TheBloke/MPT-30B-Dolphin-v2-GGML This and the non-quantized weights might now be one of the best LLMs on HF. Replicating orca, I guess without censorship.
4
21
94
@JagersbergKnut
Knut Jägersberg
10 months
Altman says they brought the cost of operating gpt-3 (I guess that's gpt-3.5 by now) down by a factor of 40. Tell that to anybody who thinks chatgpt has 175 billion parameters.
8
4
90
@JagersbergKnut
Knut Jägersberg
7 months
First Jamba fine tunes are incoming
1
10
92
@JagersbergKnut
Knut Jägersberg
2 years
Effects of different quantization of llama
Tweet media one
2
13
85
@JagersbergKnut
Knut Jägersberg
1 year
seonghyeonye/flipped_11B Generates Instructions
3
18
87
@JagersbergKnut
Knut Jägersberg
10 months
LLM overview
pretrained; below 2b:
- Mamba 1b
- TinyLlama 1.1b
- Qwen-1_8b
- LiteLlama-460M-1T
pretrained; 3b:
- stablelm-3b-4e1t
- Phi-2
- MiniMa-2-3b
- BTLM-3b
4
13
85
@JagersbergKnut
Knut Jägersberg
1 year
TheBloke/MistralLite-7B-AWQ A quantized version of the Mistral variant that follows instructions over contexts of up to 32k tokens.
3
7
77
@JagersbergKnut
Knut Jägersberg
10 months
Open-hermes 2.5 is better than GPT-3.5 in my real-world tests, change my mind
Tweet media one
11
5
78
@JagersbergKnut
Knut Jägersberg
1 year
Ahead of e5 embeddings in MTEB!
4
11
78
@JagersbergKnut
Knut Jägersberg
6 months
Did anybody notice Nvidia published a competitive llama3-70b QA/RAG fine tune?
Tweet media one
8
8
79
@JagersbergKnut
Knut Jägersberg
1 year
amazon/FalconLite2 "By utilizing 4-bit GPTQ quantization and adapted RotaryEmbedding, FalconLite2 is able to process 10x longer contexts while consuming 4x less GPU memory than the original model."
2
15
76
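For context on the tweet above: a 4-bit GPTQ checkpoint is usually loaded straight through transformers once a GPTQ runtime (optimum plus auto-gptq) is installed. The snippet below is a hedged sketch of that general pattern, not necessarily the recommended deployment path for FalconLite2 itself.

```python
# Hedged sketch: loading a 4-bit GPTQ checkpoint with transformers.
# Requires a GPTQ runtime (e.g. optimum + auto-gptq); trust_remote_code is
# needed for models that ship custom modeling code.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amazon/FalconLite2"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Summarize the key idea of rotary position embeddings:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0], skip_special_tokens=True))
```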
@JagersbergKnut
Knut Jägersberg
1 year
dynamofl/mistral-2 What do we have here? Seemingly a pruned Mistral!
9
4
76
@JagersbergKnut
Knut Jägersberg
8 months
abideen/gemma-7b-openhermes openllm average: 73.5%
6
4
75
@JagersbergKnut
Knut Jägersberg
1 year
Become a cognitive engineer or perish.
9
5
72
@JagersbergKnut
Knut Jägersberg
11 months
This is a 600b LLM. But they don't give access to the weights. I could imagine it's falcon-180bs glued together.
9
4
74
@JagersbergKnut
Knut Jägersberg
8 months
Tweet media one
2
7
72
@JagersbergKnut
Knut Jägersberg
10 months
Doctor-Shotgun/TinyLlama-1.1B-32k For speculative decoding
2
13
72
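Speculative (assisted) decoding, the use case named above, pairs a small draft model with a larger target model that verifies the draft's proposed tokens. A minimal sketch using transformers' assisted generation is below; the target model name is an assumption, and draft and target must share a compatible tokenizer/vocabulary.

```python
# Minimal assisted-generation sketch: the tiny model drafts tokens, the big model verifies.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "meta-llama/Llama-2-7b-hf"            # assumed target; needs a matching vocab
draft_id = "Doctor-Shotgun/TinyLlama-1.1B-32k"    # draft model from the tweet above

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id, torch_dtype=torch.float16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(draft_id, torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("Speculative decoding speeds up inference by", return_tensors="pt").to(target.device)
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```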
@JagersbergKnut
Knut Jägersberg
10 months
freecs/ArtificialThinker-Phi2 Adds an explicit reasoning phase in the prompt template, akin to kaist-ai/CoT-Collection
3
7
69
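The "explicit reasoning phase" above is a prompt-template idea: the model is asked to emit its reasoning in a dedicated section before the final answer. The template below is a hypothetical illustration of that pattern, not the model's actual template.

```python
# Hypothetical prompt template with a dedicated reasoning section before the answer.
TEMPLATE = """### Instruction:
{instruction}

### Reasoning:
{reasoning}

### Response:
{response}"""

print(TEMPLATE.format(
    instruction="What is 17 * 23?",
    reasoning="17 * 23 = 17 * 20 + 17 * 3 = 340 + 51 = 391.",
    response="391",
))
```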
@JagersbergKnut
Knut Jägersberg
1 year
LumiOpen/Poro-34B "Poro is a 34B parameter decoder-only transformer pretrained on Finnish, English and code. It is being trained on 1 trillion tokens (300 billion as of this release)." More and more are coming, even for languages of small countries. 🇩🇪?
6
14
69
@JagersbergKnut
Knut Jägersberg
1 year
Oh, so this is probably the first regular 1b model pretrained on over 1 trillion tokens.
3
9
69
@JagersbergKnut
Knut Jägersberg
10 months
I bet true AGI is built before the EU AI Act comes into effect.
17
2
68
@JagersbergKnut
Knut Jägersberg
8 months
dranger003/LWM-Text-Chat-128K-iMat.GGUF "The imatrix Q4-K quant fits with 32K context on 24GB and gives me ~100 t/s inference on a 3090. With IQ3_XXS it seems to fit ~37K context on 24GB (and it is even faster than Q4-K)."
2
10
69
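The quoted numbers above come from running the GGUF quant with llama.cpp. A small sketch of an equivalent llama-cpp-python call is below; the local file name and context size are illustrative.

```python
# Hedged sketch: running a GGUF quant with full GPU offload and a large context window.
from llama_cpp import Llama

llm = Llama(
    model_path="lwm-text-chat-128k-iq4_k.gguf",  # hypothetical local file name
    n_ctx=32768,       # ~32K context, reported above to fit on a 24 GB card
    n_gpu_layers=-1,   # offload all layers to the GPU
)

out = llm("Summarize the following document:\n...", max_tokens=64)
print(out["choices"][0]["text"])
```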
@JagersbergKnut
Knut Jägersberg
11 months
TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T 💪
Tweet media one
4
7
69
@JagersbergKnut
Knut Jägersberg
9 months
OrionStarAI/Orion-14B-Base 2.5T multilingual corpus, including Chinese, English, Japanese, Korean
1
17
66
@JagersbergKnut
Knut Jägersberg
11 months
Open-Orca/Mixtral-SlimOrca-8x7B 👀👀
4
7
66
@JagersbergKnut
Knut Jägersberg
1 year
I didn't know Mistral is that good
Tweet media one
5
3
66
@JagersbergKnut
Knut Jägersberg
11 months
VAGOsolutions/SauerkrautLM-7b-HerO merge of Teknium's OpenHermes-2.5-Mistral-7B and Open-Orca's Mistral-7B-OpenOrca, fine-tuned on the Sauerkraut dataset
Tweet media one
3
10
64
@JagersbergKnut
Knut Jägersberg
7 months
Vezora/Mistral-22B-v0.1 MoE merge, probably something
3
9
63
@JagersbergKnut
Knut Jägersberg
10 months
KnutJaegersberg/Tess-M-34B-2bit 🦾 A QuIP# 2-bit quantization of Tess-M by @migtissera based on Yi-34B-200k by @01AI_Yi. Weights are 10 GB smol, yet model quality is very good! Made with 8k context hessians to enhance long context inference quality.
Tweet media one
4
8
63
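A quick back-of-the-envelope check on the ~10 GB figure above: 34B parameters at 2 bits each come out just under 8 GiB, so roughly 10 GB on disk with quantization overhead (codebooks, scales, unquantized embeddings) is plausible.

```python
# Rough size estimate for a 2-bit quantization of a 34B-parameter model.
params = 34e9
bits_per_weight = 2
size_gib = params * bits_per_weight / 8 / 2**30
print(f"{size_gib:.1f} GiB before overhead")  # ~7.9 GiB
```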
@JagersbergKnut
Knut Jägersberg
1 year
HuggingFaceBR4/falcon-180B-python-sft-logging What's that?
4
7
62
@JagersbergKnut
Knut Jägersberg
3 years
The DriveML and forester ML libraries have a very streamlined workflow for using tuned ML models for explainable ML; forester also has EDA functions. One can expect a good ratio of performance and insights to coding time invested. #rstats #machinelearning #datascience
Tweet media one
1
20
60
@JagersbergKnut
Knut Jägersberg
1 year
There was this interview with @sama that suggested they considered open-sourcing gpt3, but they thought most businesses could not handle it, as it is so big. I could download the model below and set up a system in the cloud with two 80gb vram gpus if I wanted to.
5
6
61
@JagersbergKnut
Knut Jägersberg
9 months
haoranxu/ALMA-13B-R There it is, currently the best open-source option for machine translation.
4
12
61
@JagersbergKnut
Knut Jägersberg
11 months
Tweet media one
4
9
57
@JagersbergKnut
Knut Jägersberg
1 year
openlm-research/open_llama_13b
2
11
58
@JagersbergKnut
Knut Jägersberg
2 years
MBZUAI/LaMini-T5-738M LaMini: A Diverse Herd of Distilled Models from Large-Scale Instructions
Tweet media one
5
15
56
@JagersbergKnut
Knut Jägersberg
11 months
Llama3 as fast as possible
6
2
56
@JagersbergKnut
Knut Jägersberg
25 days
@kimmonismus This wave of AI is indeed disruptive. Entire professions almost disappear: translators, transcribers, narrators, and the field is expanding. Like professional draughtsmen who were replaced by PCs, software and printers. It's that kind of change and it's accelerating.
6
3
55
@JagersbergKnut
Knut Jägersberg
2 years
AI: What is the future of artificial intelligence? - BBC News "I've tried to brief policymakers: it is like explaining particle physics to a chocolate chip cookie"
Tweet media one
11
15
54
@JagersbergKnut
Knut Jägersberg
7 months
Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
Tweet media one
0
15
53
@JagersbergKnut
Knut Jägersberg
8 months
An Open Source text-to-speech system built by inverting Whisper. Somehow I forgot about that one
3
10
53
@JagersbergKnut
Knut Jägersberg
7 months
Tweet media one
6
11
54
@JagersbergKnut
Knut Jägersberg
10 months
GeneZC/MiniMA-2-3B New more powerful iteration of MiniMA!
Tweet media one
4
10
54
@JagersbergKnut
Knut Jägersberg
1 year
CausalLM/14B Wonderful, they merged llama2 and Qwen and made a llama out of it.
Tweet media one
4
7
54
@JagersbergKnut
Knut Jägersberg
10 months
myshell-ai/OpenVoice instant voice cloning
2
11
52
@JagersbergKnut
Knut Jägersberg
2 years
GPT4Tools: Teaching LLM to Use Tools via Self-instruction
Tweet media one
1
13
50