bartowski Profile
bartowski

@bartowski1182

881 Followers · 93 Following · 8 Media · 336 Statuses

LLM Enthusiast

Joined February 2024
Pinned Tweet
@bartowski1182
bartowski
17 days
Can't believe it finally happened... 1000 followers on huggingface! So exciting, thank you to everyone for your outstanding support, never thought I'd be here when I started a year ago.. Thank you @huggingface for letting me upload countless terabytes of data to your servers ❤️
[image attached]
8
5
76
@bartowski1182
bartowski
1 month
Hey @Microsoft, Phi 3 mini is way too big an upgrade to stay on the same name! It's too awesome for that! In that spirit, I've released it as Phi 3.1, and it's available for download now on my page as well as here on lmstudio-community: Go check it out!
3
36
189
@bartowski1182
bartowski
1 month
And now 27B is also available :) By the way, I'll likely be reuploading both Gemma sizes when the PR is officially merged (the ones I'm releasing today are missing a bit of metadata, and I prefer to use official llama.cpp releases), but for now those who want
3
18
128
@bartowski1182
bartowski
3 months
I just remade and uploaded my quants of @AIatMeta Llama 3 8B instruct GGUF to @huggingface using the latest llama.cpp release with official support, so no hacking is needed to make the end token work; generation is perfect with llama.cpp ./main. Will have
6
9
90
@bartowski1182
bartowski
1 month
Want to try out the latest Gemma GGUF? Great news! @LMStudioAI has officially added support with 0.2.26!! 🔥 Grab the update at and then either size from the lmstudio-community: Thanks @googledev for the
4
15
79
@bartowski1182
bartowski
4 months
Seems like it's the day for GGUF announcements! I'm excited to share I’ve started a collaboration with @LMStudioAI as their LLM Archivist! I’ll be working with the amazing LM Studio team to update their @huggingface page! Read on for a few more details..
9
12
76
@bartowski1182
bartowski
2 months
ICYMI: OpenChat dropped their first llama 3 tune! They've had a great track record for releasing some surprisingly SOTA tunes, and this one looks to be no different! It's also up on the lmstudio-community :) Thanks @alignment_lab @OpenChatDev!!
2
14
64
@bartowski1182
bartowski
4 months
Posted the GGUF of Qwen 1.5 32B to the @LMStudioAI community HF page! A great model for those with extra RAM/VRAM looking for more power! Give it a try now in LM Studio 🤗
3
14
63
@bartowski1182
bartowski
2 months
We've got a new Mistral model! v0.3 instruct! 🎉 Hopefully a strong answer to Llama 3; you can download it right now from the lmstudio-community page: Thanks @MistralAI!
1
8
60
@bartowski1182
bartowski
3 months
GGUF quants of Llama 3 8B instruct with BPE fixes are now up! Get 'em here:
4
10
60
@bartowski1182
bartowski
3 months
After days of compute (since I had to start over) it's finally up! Llama 3 70B GGUF with tokenizer fix :) In other news, just ordered an EPYC 7551p so hopefully this delay never happens again 😅
5
17
60
@bartowski1182
bartowski
30 days
Finished up testing the experimental quants with some interesting results (tl;dr: no more fp16 embed/output, now q8 embed/output). Basically, across 8 categories I found that quantizing the embeddings and outputs to Q8 was equal to or better than FP16 in 6 of them. I feel
[image attached]
5
7
58
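For anyone curious how to reproduce that comparison, here's a minimal sketch using llama.cpp's quantize tool; the binary name varies by build (./quantize in older releases, ./llama-quantize in newer ones), and the model paths are illustrative:

```bash
# Baseline: Q4_K_M with the default (higher-precision) embed/output tensors.
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M

# Variant: same quant, but force the token embeddings and output tensor
# to Q8_0 -- the change described in the tweet above.
./llama-quantize \
  --token-embedding-type q8_0 \
  --output-tensor-type q8_0 \
  model-f16.gguf model-Q4_K_M-q8.gguf Q4_K_M
```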
@bartowski1182
bartowski
4 months
For anyone who missed it, we've got ourselves a dolphin tune of Mistral 0.2! Exciting stuff! Thanks @erhartford! GGUF / Exl2 / Original
2
12
52
@bartowski1182
bartowski
10 days
Llama 3.1 GGUF quants going up on lmstudio-community now! More sizes (and 70b) are on the way
1
7
49
@bartowski1182
bartowski
4 months
Another day another officially supported LM Studio model :) This time @GoogleDeepMind Gemma 1.1 2B! From the testing I did to get example output, I gotta say for the size it is surprisingly capable! Check it out in @LMStudioAI today!
2
6
46
@bartowski1182
bartowski
1 month
DeepSeek (non-lite) is moving along about as slowly as expected, but IQ1_M is finally up! Expect only a few more sizes over the next 8-16 hours :) Even the IQ1_M is 52GB
4
8
43
@bartowski1182
bartowski
1 month
Okay, take two: now with the tokenizer fixes, so that it properly tokenizes its start and end tokens! Go crazy! Thanks again @googledevs @GoogleDeepMind <3
2
8
38
@bartowski1182
bartowski
2 months
Smaug 70B based on Llama 3 by @abacusai @bindureddy is now supported in llama.cpp :) My PR got merged, so as of b3001 you can run the GGUF I made a while back: Link to (very simple) PR here for reference:
2
9
35
@bartowski1182
bartowski
3 months
Yi models alert!! 🔥 Available in 3 flavors: 6B, 9B, and the increasingly elusive 34B. Available for download NOW! All sizes created with imatrix. Check them out in LM Studio today! Thanks @01AI_Yi!
0
4
35
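"Created with imatrix" refers to llama.cpp's importance-matrix workflow; a rough sketch of the two-step process (binary names vary by build, and calibration.txt plus the model paths are placeholders):

```bash
# 1) Measure which weights matter most over a calibration text file.
./llama-imatrix -m yi-34b-f16.gguf -f calibration.txt -o yi-34b.imatrix

# 2) Feed the importance matrix into quantization so low-bit quants
#    preserve the most important weights more accurately.
./llama-quantize --imatrix yi-34b.imatrix \
  yi-34b-f16.gguf yi-34b-IQ4_XS.gguf IQ4_XS
```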
@bartowski1182
bartowski
1 month
Okay, a few nice sizes to try out, including "Q2_K_L", which is Q2_K with f16 embeddings and output weights, and "Q3_K_XL", which is Q3_K_L with the same. Let me know if it makes any difference! Still very curious :O but they're massive...
2
3
34
@bartowski1182
bartowski
1 month
Hey @GoogleDeepMind @googledevs any chance you could release the code you used to convert Gemma 2 to GGUF? It's starting to seem like there might still be something missing from the llama.cpp implementation even after tokenizer and attention fixes
3
3
31
@bartowski1182
bartowski
3 months
Won't be making any GGUF quants of llama 3 until the BPE tokenizer fix is merged, just for anyone wondering about some missing models
2
4
30
@bartowski1182
bartowski
4 months
WaveCoder ultra 6.7b by @TeamCodeLLM_AI -- Incredible coding model for its size thanks to their new CodeOcean dataset -- Tuned from 20,000 high quality instructions -- Intended for generation, summarization, repair, and translation More options than ever
1
5
26
@bartowski1182
bartowski
1 month
Imatrix GGUFs of Gemma 2 9b by @GoogleAI are now available for download! I assume you'll need this PR to run it: You can download them all here! 27B is in the oven! Thanks so much for the release, Google :D
3
3
24
@bartowski1182
bartowski
4 months
Introducing CodeGemma - Google's new smol models, specialized in coding! 🔥 Available in 3 variants: - 2B: use for code generation and fill in middle - 7B: for code generation and fill in middle - 7B-it: for instruction following and code generation
2
2
23
@bartowski1182
bartowski
1 month
Of course we're all destroying HF servers right as I'm trying to upload Gemma 2 9B GGUFs 😅
2
1
22
@bartowski1182
bartowski
2 months
@theinformation @KalleyHuang I have 0 issue with this as long as they keep releasing the open weights.. money needs to be made to pay the bills, if it's a compelling use case and worth the money AND it doesn't take away from the open scene, then this sounds perfectly fine!
1
2
18
@bartowski1182
bartowski
18 days
And of course a huge shout out to @cognitivecompai for making the awesome tune! I'd have nothing to create without both of you, so kudos and thanks for your contributions!!
2
0
17
@bartowski1182
bartowski
2 months
Qwen2 is among us! 7B GGUF and ExLlamaV2 are up, 72B with imatrix GGUF is up as well, exl2 in the oven. Note: GGUF offloading to CUDA requires flash attention (-fa) to work!
1
5
17
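Concretely, the -fa requirement looks like this when running the model; a sketch assuming full GPU offload, with the binary name (./main, later ./llama-cli) and model path illustrative:

```bash
# -ngl 99 offloads all layers to the GPU; without -fa (flash attention),
# Qwen2 GGUFs produced broken output on CUDA at the time.
./main -m Qwen2-7B-Instruct-Q4_K_M.gguf -ngl 99 -fa \
  -p "Write a haiku about quantization."
```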
@bartowski1182
bartowski
29 days
Also, can we talk about how even at default these are insane MMLU-Pro scores for a Q3 quant of a 4B param model.. @Microsoft really put some work in on this update :O Reuploaded with Q8 for anyone who wants to try as well:
1
0
15
@bartowski1182
bartowski
4 months
Importantly, this WILL NOT change my current stream of GGUF and EXL2 models, this will be a more curated and higher detailed offering, specifically designed for users of LM Studio! Visit to check out our initial offerings!
1
0
15
@bartowski1182
bartowski
4 months
In other news, made it to the 200 follower count on @huggingface ! Thanks everyone for following along, happy to be part of the community :) Also passed 400 quants! Split between exl2 and gguf :)
1
3
15
@bartowski1182
bartowski
1 month
Thanks again to @abetlen for merging the soft-capping of attention for Gemma 2! This should add much more performance to the 27b model, and likely a bit to the 9b. I'll be remaking both, though only to improve the imatrix; you'll need a tool update to see the benefits!
0
0
14
@bartowski1182
bartowski
4 months
Our goal is to ensure the community has timely access to the latest & greatest LLMs in open standards (GGUF <3) as well as highlight the often unnoticed model creators, novel training techniques and datasets! Huge shoutout to @ggerganov and team for making all of this possible!
1
1
14
@bartowski1182
bartowski
2 months
Thinking of mildly streamlining GGUFs by removing a few sizes, would anyone be upset if I removed: Q5_K_S, Q4_K_S, IQ4_XS, IQ3_S, IQ3_XXS, IQ2_S, IQ2_XXS, IQ1_S These all seem to be extremely close to other (better) sizes and don't seem worth keeping, and would both speed up my
10
0
14
@bartowski1182
bartowski
2 months
Oh snap, good catch... Should be very interesting :o
@MaziyarPanahi
Maziyar PANAHI
2 months
Updated models for Mixtral-8x7B are coming soon!!! 😱
[image attached]
5
0
14
2
1
12
@bartowski1182
bartowski
3 months
Server is out for delivery, hoping once I get it set up tonight I'll be able to start ripping through my backlog!
1
0
13
@bartowski1182
bartowski
4 months
After months of silence, @TeamCodeLLM_AI silently dropped wavecoder in the shadow of WizardLM2 (also a great looking model!) Check out my quants of their ultra model here: GGUF: EXL2 Should I make the others? Ultra seems like
1
3
13
@bartowski1182
bartowski
1 month
Seems like even after the tokenizer fixes the 27b Gemma 2 is struggling; hopefully a fix is on the way! In the meantime the 9b model seems to work without issue and is very good!
2
1
13
@bartowski1182
bartowski
1 month
Heads up, probably no surprises, but tokenizer issues were found with gemma-2 :) so as soon as fixes are in I'll be remaking them! Was relatively expected, it happens when we're on the bleeding edge!!
2
0
11
@bartowski1182
bartowski
4 months
WizardLM 2 7B - @Microsoft's latest and greatest - available now! This model is punching way above its weight, outscoring models multiple times its size in the popular MT-Bench benchmark. Check the attached screenshot for numbers. This is a huge step
[image attached]
0
3
12
@bartowski1182
bartowski
3 months
BPE tokenizer fix has been merged into main, no official release yet but building some GGUFs to test if it's backwards compatible :)
1
0
12
@bartowski1182
bartowski
3 months
I'm curious about @huggingface download counts 1: is there any way to see individual file download counts? i.e., how many people download each quant level of a GGUF 2: is there something buggy or different about how branch downloads are handled, or is EXL2 that much less popular
4
1
11
@bartowski1182
bartowski
3 months
Ran into multiple issues making llama.cpp quants for 70b instruct; it'll be up soon, I promise :) ETA is tomorrow morning
2
0
11
@bartowski1182
bartowski
2 months
Cohere's new Aya 23 models, with support for 23 languages, dropped earlier today 🚀 Available in 2 sizes, 8B and 35B, and ready to download into lmstudio! Check the new "use this model" button near the top right!
1
2
11
@bartowski1182
bartowski
4 months
@LMStudioAI @TheBlokeAI @huggingface So excited for this! For anyone wondering what the main goal of this is, check the few we've posted! We plan to post fewer models overall but focus entirely on the high quality ones, and we'll provide a detailed explanation of why we chose the models and what they're good for!
1
0
11
@bartowski1182
bartowski
2 months
DiscoPOP is now up on the @LMStudioAI community! Check out this awesome work by @SakanaAILabs, this is incredible stuff, and I'm looking forward to their future developments!
0
1
9
@bartowski1182
bartowski
4 months
They'll also be offering a new model catalog they're working on in the near future! All models on the huggingface page will also appear there, making finding your favorite models easier than ever! Stay tuned for more!
0
0
8
@bartowski1182
bartowski
2 months
@MistralAI As always these are all made with imatrix, and if you need other sizes you can find them on my personal page here: Just try not to flood the lmstudio community with too many options :)
1
0
8
@bartowski1182
bartowski
2 months
@MaziyarPanahi @mvaloatto @huggingface LOL right, I'm just some random guy uploading literal terabytes of data to their server and they're just like "yeah sure hit me"
1
0
8
@bartowski1182
bartowski
3 months
@WolframRvnwlf @iamRezaSayar @LMStudioAI Yeah, that could do it very easily; turbo uses existing tokenizer classes where llama.cpp tends to rewrite stuff for lack of dependencies, so exl2 worked out of the box. I'll let you know when it's merged and I've got quants up.
1
0
8
@bartowski1182
bartowski
4 months
@WizardLM_AI Where did it go? :'(
3
0
8
@bartowski1182
bartowski
2 months
GGUFs are up!
@hardmaru
hardmaru
2 months
New Paper and Blog! As LLMs become better at generating hypotheses and code, a fascinating possibility emerges: using AI to advance AI itself! As a first step, we got LLMs to discover better algorithms for training LLMs that align with human preferences.
8
119
568
0
0
8
@bartowski1182
bartowski
4 months
Last but not least, starcoder2 was released during this window, and was tuned by many, including @erhartford with dolphin and @huggingface with starchat
1
1
6
@bartowski1182
bartowski
4 months
Anyone have any thoughts on AWQ vs imatrix for GGUF? Been wanting to add imatrix for a while, but was told about using AWQ recently. Is using both even better? Give me some opinions :D
3
1
7
@bartowski1182
bartowski
3 months
Woo, looks like we have backwards compatibility! You just won't get the tokenizer fixes if you don't update your tool. llama.cpp main now answers the common addition question correctly! Even with Q2_K :)
[image attached]
1
0
7
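The exact "common addition question" isn't quoted, but the smoke test being described is along these lines (prompt and model path are illustrative, not the actual test used):

```bash
# Multi-digit arithmetic exercises the digit tokenization that the BPE
# fix addressed, so a quick prompt like this catches regressions.
./main -m Meta-Llama-3-8B-Instruct-Q2_K.gguf -n 32 \
  -p "What is 7777 + 3333?"
```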
@bartowski1182
bartowski
1 month
Reading into the Gemma 2 report... I was highly intrigued by the inclusion of local sliding window and global attention alternating at each layer, as well as the logit soft-capping. The report doesn't really dive into details on either of them and what the advantages may be,
2
0
7
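For reference, the soft-capping mentioned here is a simple bounded rescaling of the logits; per the Gemma 2 report, the cap is 50.0 for attention logits and 30.0 for the final logits:

```latex
\text{logits} \leftarrow c \cdot \tanh\!\left(\frac{\text{logits}}{c}\right),
\qquad c_{\text{attn}} = 50.0,\quad c_{\text{final}} = 30.0
```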
@bartowski1182
bartowski
4 months
Time for another roundup of recent models I've enjoyed! Lots of good releases since last I posted, so this will be another long one! Check them all out on my @huggingface page
3
0
7
@bartowski1182
bartowski
3 months
Decided it was time to add AWQ to my pipeline, so enjoy Llama3 8b instruct: TBD if I'll make the 70b or anything larger than ~8b, will try one and see how long it takes to run
0
0
6
@bartowski1182
bartowski
29 days
Finally figured out why my chat template for Phi 3 prints all on one line rather than the multi-line format the template/card shows it should be.. They have <|user|>, <|end|>, <|system|>, and <|assistant|> all set with rstrip = true in tokenizer_config.json Anyone know why..?
1
0
6
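One way to see those flags for yourself: inspect the model's tokenizer_config.json, where Hugging Face stores per-token lstrip/rstrip settings under added_tokens_decoder (the jq invocation is a sketch):

```bash
# List the special tokens that strip whitespace to their right -- this is
# what eats the newlines after <|user|>, <|end|>, etc. in the template.
jq '.added_tokens_decoder[] | select(.rstrip == true) | .content' \
  tokenizer_config.json
```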
@bartowski1182
bartowski
1 month
@GoogleDeepMind @googledevs by the way, the officially uploaded f32 GGUFs are based on a broken conversion and don't properly tokenize the start/end tokens. I've got working ones up if you want to take 'em, or you can remake them with the latest llama.cpp build, where it'll work
0
0
4
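The "remake" path mentioned here is llama.cpp's HF conversion script; a sketch, noting the script name has varied across versions (convert-hf-to-gguf.py, later convert_hf_to_gguf.py) and the paths are illustrative:

```bash
# Convert the original HF checkpoint straight to an f32 GGUF using a
# llama.cpp checkout that includes the Gemma 2 tokenizer fixes.
python convert-hf-to-gguf.py /path/to/gemma-2-9b-it \
  --outtype f32 --outfile gemma-2-9b-it-f32.gguf
```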
@bartowski1182
bartowski
17 days
@huggingface Also shout out to @LMStudioAI and @arcee_ai for helping fund my efforts by providing contracts for some of my work, it's been hugely beneficial for justifying hardware purchases and keeping everything afloat!!
1
0
6
@bartowski1182
bartowski
3 months
@erhartford Anyone have any recommendations for a chat UI that plays nicely with a Transformers backend? Or any OpenAI-compatible backend, for that matter
4
0
6
@bartowski1182
bartowski
4 months
One of the first mistral 0.2 finetunes, and it's from one of my favourite authors! Check it out! I expect good things from this.. Quanted and up on my page, GGUF and exl2 as usual :)
0
2
5
@bartowski1182
bartowski
2 months
@MaziyarPanahi @mihai673 @huggingface @LMStudioAI Yup already downloading the lites :D the non-lites will take a tiiiny bit longer...
1
0
5
@bartowski1182
bartowski
3 months
@iamRezaSayar @LMStudioAI @WolframRvnwlf for what it's worth, just pushed an f32 conversion, so if any version is gonna work as unquantized (assuming no llama.cpp bugs), this should be it:
1
0
5
@bartowski1182
bartowski
2 months
@rohanpaul_ai Wouldn't this be cool for a company with distributed laptops across all their employees? Running a load balanced and distributed coding model using only a bit of performance from each device.. The possibilities are huge
1
1
5
@bartowski1182
bartowski
3 months
Biggest reason I wish there was some way to update them after the fact, though even then you'd have to download them all, make the change, and reupload... There's definitely a simplicity to single-file execution, but boy do I like the json files in exl2 when this happens
@MaziyarPanahi
Maziyar PANAHI
3 months
@WolframRvnwlf "how many broken/suboptimal models are (and will remain) floating around" - I know! editing 100s of GGUF models to point to a right eos_token_id and re-uploading them was so much work! 😂
1
0
7
2
0
5
@bartowski1182
bartowski
2 months
@heyitsyorkie and like, if you look at that chart, sizes like Q3_K_S, Q3_K_L, Q3_K_M, Q2_K, Q4_K_S etc all basically shouldn't exist (in terms of performance/bit) but NEED to for people who can't use i-quants
1
0
3
@bartowski1182
bartowski
1 month
27b is also ready! You'll need to be on llama.cpp b3259 or newer to run it :)
1
0
3
@bartowski1182
bartowski
3 months
@yagilb piqued my interest and I reformatted a world_sim prompt to follow llama 3 instruct. Seems to really do a good job following it in (very) limited testing. Pastebin here: Screenshot of playing around with it in @LMStudioAI attached
[image attached]
1
1
4
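For reference, "reformatted to follow llama 3 instruct" means wrapping the prompt in Meta's chat format, which per the Llama 3 model card looks like this (the system/user text is placeholder):

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```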
@bartowski1182
bartowski
5 months
(5/6) @Weyaxi followed up on the success of Einstein v3 with, you guessed it, v4: @WenhuChen with TIGER lab released StructLM in 3 sizes for structured knowledge grounding tasks:
2
0
4
@bartowski1182
bartowski
5 months
Now featuring: GGUF! I've added GGUF to my pipeline so all future models will come in both ExLlamaV2 and GGUF format :)
1
0
4
@bartowski1182
bartowski
2 months
@JagersbergKnut I was thinking of dropping Q4_K_S because even on a 70B model it's only 2gb smaller than Q4_K_M, a 5% difference, and there's still IQ4_NL which is the same size (though has the disadvantage of being slower on CPU/metal..)
1
0
4
@bartowski1182
bartowski
16 days
@cognitivecompai For the record, Slaren as usual was the discoverer of the bug source ;D I just do the reporting :) But yeah, it's already being fixed, so next time we try again it'll hopefully work!
2
1
4
@bartowski1182
bartowski
2 months
@dudeman6790 @Teknium1 @MatthewBerman most formatters would have a cutoff, probably anything above 3 you switch to multiline
1
0
4
@bartowski1182
bartowski
29 days
@altryne Don't forget Microsoft's new phi 3 mini update!
0
0
4
@bartowski1182
bartowski
1 month
@MaziyarPanahi @ZyMazza @Presidentlin @teortaxesTex @huggingface Yeah I try to keep an eye on my HF inbox and answer as many questions as I can :)
0
0
4
@bartowski1182
bartowski
3 months
@MaziyarPanahi @NewDigitalEdu @JagersbergKnut 3 days ago, this one slipped under the radar! I haven't started yet, just finally uploading vanilla 70b instruct with the fixed tokenizer ;D
1
1
4
@bartowski1182
bartowski
3 months
@dxrulezsrb @reach_vb It's coming :D just put it in the oven, imatrix quants on the way
1
0
4
@bartowski1182
bartowski
2 months
@NeuralNovel @migtissera Always at the ready 😈
1
0
4
@bartowski1182
bartowski
2 months
@JagersbergKnut Yeah the I-quants are mostly targeted at CUDA/ROCm, with slower performance on metal/CPU, but overall better PPL/bit.. which is why the options become so flooded. If you care about speed and can't fully offload to CUDA/ROCm, K quants are your only option
1
0
4
@bartowski1182
bartowski
2 months
@erhartford Facts though, so disappointed they cap at 32gb.. the price is not bad for that much RAM, but if they managed to shove 8x that amount in there (and then I could install Linux 😇) I'd be pre-ordering (with money I don't have, shh)
0
0
3
@bartowski1182
bartowski
2 months
@MaziyarPanahi @rohanpaul_ai @GroqInc oh god, how long before we can buy "computers as a service" where you get a discount if you leave it on and available for distributed inferencing...
1
0
3
@bartowski1182
bartowski
2 months
@altryne small correction, I think that Cohere only released 2 models, the 8B and 35B, and they cover "only" 23 languages
1
0
3
@bartowski1182
bartowski
2 months
@MaziyarPanahi @mihai673 @huggingface @LMStudioAI dammit, they're broken :') when trying to generate I get "GGML_ASSERT: ggml.c:5705: ggml_nelements(a) == ne0*ne1", will report
2
0
3
@bartowski1182
bartowski
2 months
@MaziyarPanahi @Weyaxi @huggingface @julien_c Yeah that sounds great! I have almost 500 models that are exl2 meaning I'm missing a solid 10 TB+ from my number 😈
1
0
3
@bartowski1182
bartowski
2 months
@JagersbergKnut heads up that your tokenizer_config.yaml has a chatml chat template, but your model card suggests, I think, alpaca? Or something similar; not chatml either way
1
0
3
@bartowski1182
bartowski
1 month
oh and shoutout to @abetlen and pculliton on GitHub for their help preparing the PR!!
0
0
3
@bartowski1182
bartowski
2 months
@heyitsyorkie Yeah, K_M generally offers almost identical performance/bit to the K_S and is almost the exact same size even when scaling up. If IQ quants just worked well on metal and CPU, there are so many more sizes I'd drop haha
1
0
3
@bartowski1182
bartowski
3 months
@rohanpaul_ai Why do we not see the PPL comparison of unquantized Llama 3 vs Llama 2? What if PPL on Llama 3 is just across-the-board worse (because PPL is a half-measure anyway)? Also, this should be re-run with proper BPE support included in llama.cpp
2
0
3
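For context, PPL comparisons like the one being questioned come from llama.cpp's perplexity tool; a sketch of how such numbers are produced (binary name varies by build; the wikitext file is the conventional but illustrative test set):

```bash
# Perplexity of the unquantized baseline vs. a quant over the same text;
# lower is better, but absolute values aren't comparable across model
# families, which is the objection raised above.
./perplexity -m Meta-Llama-3-8B-f16.gguf -f wikitext-2-raw/wiki.test.raw
./perplexity -m Meta-Llama-3-8B-Q4_K_M.gguf -f wikitext-2-raw/wiki.test.raw
```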