bartowski Profile
bartowski

@bartowski1182

881 Followers · 93 Following · 8 Media · 336 Statuses

LLM Enthusiast

Joined February 2024
Pinned Tweet
@bartowski1182
bartowski
17 days
Can't believe it finally happened... 1000 followers on huggingface! So exciting, thank you to everyone for your outstanding support, never thought I'd be here when I started a year ago.. Thank you @huggingface for letting me upload countless terabytes of data to your servers ❤️
[image attached]
8
5
76
@bartowski1182
bartowski
1 month
Hey @Microsoft, Phi 3 mini is way too big an upgrade to stay on the same name! It's too awesome for that! In that spirit, I've released it as Phi 3.1, and it's available for download now on my page as well as here on lmstudio-community: Go check it out!
3
36
189
@bartowski1182
bartowski
1 month
And now 27B is also available :) By the way, I'll likely be reuploading both Gemma sizes when the PR is officially merged (the ones I'm releasing today are missing a bit of metadata, and I prefer to use official llama.cpp releases), but for now those who want
3
18
128
@bartowski1182
bartowski
3 months
I just remade and uploaded my quants of @AIatMeta Llama 3 8B instruct GGUF to @huggingface using the latest llama.cpp release with official support, so no hacking is needed to make the end token work; generation is perfect with llama.cpp ./main. Will have
6
9
90
@bartowski1182
bartowski
1 month
Want to try out the latest Gemma GGUF? Great news! @LMStudioAI has officially added support with 0.2.26!! 🔥 Grab the update at and then either size from the lmstudio-community: Thanks @googledev for the
4
15
79
@bartowski1182
bartowski
4 months
Seems like it's the day for GGUF announcements! I'm excited to share I’ve started a collaboration with @LMStudioAI as their LLM Archivist! I’ll be working with the amazing LM Studio team to update their @huggingface page! Read on for a few more details..
9
12
76
@bartowski1182
bartowski
2 months
ICYMI: OpenChat dropped their first llama 3 tune! They've had a great track record for releasing some surprisingly SOTA tunes, and this one looks to be no different! It's also up on the lmstudio-community :) Thanks @alignment_lab @OpenChatDev!!
2
14
64
@bartowski1182
bartowski
4 months
Posted the GGUF of Qwen 1.5 32B to the @LMStudioAI community HF page! A great model for those with extra RAM/VRAM looking for more power! Give it a try now in LM Studio 🤗
3
14
63
@bartowski1182
bartowski
2 months
We've got a new Mistral model! v0.3 instruct! 🎉 Hopefully a strong answer to Llama 3; you can download it right now from the lmstudio-community page: Thanks @MistralAI!
1
8
60
@bartowski1182
bartowski
3 months
GGUF quants of Llama 3 8B instruct with BPE fixes are now up! Get 'em here:
4
10
60
@bartowski1182
bartowski
3 months
After days of compute (since I had to start over) it's finally up! Llama 3 70B GGUF with tokenizer fix :) In other news, just ordered an EPYC 7551p so hopefully this delay never happens again 😅
5
17
60
@bartowski1182
bartowski
30 days
Finished up testing the experimental quants with some interesting results (tl;dr: no more fp16 embed/output, now q8 embed/output). Basically, across 8 categories I found that quantizing the embeddings and outputs to Q8 was equal to or better than FP16 in 6 of them. I feel
[image attached]
5
7
58
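For anyone curious how to reproduce that comparison, here's a minimal sketch using llama.cpp's quantize tool; the binary name varies by build (./quantize in older releases, ./llama-quantize in newer ones), and the model paths are illustrative:

```bash
# Baseline: Q4_K_M with the default (higher-precision) embed/output tensors.
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M

# Variant: same quant, but force the token embeddings and output tensor
# to Q8_0 -- the change described in the tweet above.
./llama-quantize \
  --token-embedding-type q8_0 \
  --output-tensor-type q8_0 \
  model-f16.gguf model-Q4_K_M-q8.gguf Q4_K_M
```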
@bartowski1182
bartowski
4 months
For anyone who missed it, we've got ourselves a dolphin tune of Mistral 0.2! Exciting stuff! Thanks @erhartford! GGUF / Exl2 / Original
2
12
52
@bartowski1182
bartowski
10 days
Llama 3.1 GGUF quants going up on lmstudio-community now! More sizes (and 70b) are on the way
1
7
49
@bartowski1182
bartowski
4 months
Another day another officially supported LM Studio model :) This time @GoogleDeepMind Gemma 1.1 2B! From the testing I did to get example output, I gotta say for the size it is surprisingly capable! Check it out in @LMStudioAI today!
2
6
46
@bartowski1182
bartowski
1 month
DeepSeek (non-lite) is moving along about as slowly as expected, but IQ1_M is finally up! Expect only a few more sizes over the next 8-16 hours :) Even the IQ1_M is 52GB
4
8
43
@bartowski1182
bartowski
1 month
Okay, take two: now with the tokenizer fixes, so that it properly tokenizes its start and end tokens! Go crazy! Thanks again @googledevs @GoogleDeepMind <3
2
8
38
@bartowski1182
bartowski
2 months
Smaug 70B based on Llama 3 by @abacusai @bindureddy is now supported in llama.cpp :) My PR got merged, so as of b3001 you can run the GGUF I made a while back: Link to (very simple) PR here for reference:
2
9
35
@bartowski1182
bartowski
3 months
Yi models alert!! 🔥 Available in 3 flavors: 6B, 9B, and the increasingly elusive 34B. Available for download NOW! All sizes created with imatrix. Check them out in LM Studio today! Thanks @01AI_Yi!
0
4
35
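"Created with imatrix" refers to llama.cpp's importance-matrix workflow; a rough sketch of the two-step process (binary names vary by build, and calibration.txt plus the model paths are placeholders):

```bash
# 1) Measure which weights matter most over a calibration text file.
./llama-imatrix -m yi-34b-f16.gguf -f calibration.txt -o yi-34b.imatrix

# 2) Feed the importance matrix into quantization so low-bit quants
#    preserve the most important weights more accurately.
./llama-quantize --imatrix yi-34b.imatrix \
  yi-34b-f16.gguf yi-34b-IQ4_XS.gguf IQ4_XS
```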
@bartowski1182
bartowski
1 month
Okay, a few nice sizes to try out, including "Q2_K_L", which is Q2_K with f16 embeddings and output weights, and "Q3_K_XL", which is Q3_K_L with the same. Let me know if it makes any difference! Still very curious :O but they're massive...
2
3
34
@bartowski1182
bartowski
1 month
Hey @GoogleDeepMind @googledevs any chance you could release the code you used to convert Gemma 2 to GGUF? It's starting to seem like there might still be something missing from the llama.cpp implementation even after tokenizer and attention fixes
3
3
31
@bartowski1182
bartowski
3 months
Won't be making any GGUF quants of llama 3 until the BPE tokenizer fix is merged, just for anyone wondering about some missing models
2
4
30
@bartowski1182
bartowski
4 months
WaveCoder ultra 6.7b by @TeamCodeLLM_AI -- Incredible coding model for its size thanks to their new CodeOcean dataset -- Tuned from 20,000 high quality instructions -- Intended for generation, summarization, repair, and translation More options than ever
1
5
26
@bartowski1182
bartowski
1 month
Imatrix GGUFs of Gemma 2 9b by @GoogleAI are now available for download! I assume you'll need this PR to run it: You can download them all here! 27B is in the oven! Thanks so much for the release, Google :D
3
3
24
@bartowski1182
bartowski
4 months
Introducing CodeGemma - Google's new smol models, specialized in coding! 🔥 Available in 3 variants: - 2B: use for code generation and fill in middle - 7B: for code generation and fill in middle - 7B-it: for instruction following and code generation
2
2
23
@bartowski1182
bartowski
1 month
Of course we're all destroying HF servers right as I'm trying to upload Gemma 2 9B GGUFs 😅
2
1
22
@bartowski1182
bartowski
2 months
@theinformation @KalleyHuang I have 0 issue with this as long as they keep releasing the open weights.. money needs to be made to pay the bills, if it's a compelling use case and worth the money AND it doesn't take away from the open scene, then this sounds perfectly fine!
1
2
18
@bartowski1182
bartowski
18 days
And of course a huge shout out to @cognitivecompai for making the awesome tune! I'd have nothing to create without both of you, so kudos and thanks for your contributions!!
2
0
17
@bartowski1182
bartowski
2 months
Qwen2 is among us! 7B GGUF and ExLlamaV2 are up, 72B with imatrix GGUF is up as well, exl2 in the oven. Note: GGUF offloading to CUDA requires flash attention (-fa) to work!
1
5
17
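Concretely, the -fa requirement looks like this when running the model; a sketch assuming full GPU offload, with the binary name (./main, later ./llama-cli) and model path illustrative:

```bash
# -ngl 99 offloads all layers to the GPU; without -fa (flash attention),
# Qwen2 GGUFs produced broken output on CUDA at the time.
./main -m Qwen2-7B-Instruct-Q4_K_M.gguf -ngl 99 -fa \
  -p "Write a haiku about quantization."
```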
@bartowski1182
bartowski
29 days
Also, can we talk about how even at default these are insane MMLU-Pro scores for a Q3 quant of a 4B param model.. @Microsoft really put some work in on this update :O Reuploaded with Q8 for anyone who wants to try as well:
1
0
15
@bartowski1182
bartowski
4 months
Importantly, this WILL NOT change my current stream of GGUF and EXL2 models, this will be a more curated and higher detailed offering, specifically designed for users of LM Studio! Visit to check out our initial offerings!
1
0
15
@bartowski1182
bartowski
4 months
In other news, made it to the 200 follower count on @huggingface ! Thanks everyone for following along, happy to be part of the community :) Also passed 400 quants! Split between exl2 and gguf :)
1
3
15
@bartowski1182
bartowski
1 month
Thanks again to @abetlen for merging the soft-capping of attention for Gemma 2! This should add much more performance to the 27b model, and likely a bit to the 9b. I'll be remaking both, though only to improve the imatrix; you'll need a tool update to see the benefits!
0
0
14
@bartowski1182
bartowski
4 months
Our goal is to ensure the community has timely access to the latest & greatest LLMs in open standards (GGUF <3) as well as highlight the often unnoticed model creators, novel training techniques and datasets! Huge shoutout to @ggerganov and team for making all of this possible!
1
1
14
@bartowski1182
bartowski
2 months
Thinking of mildly streamlining GGUFs by removing a few sizes, would anyone be upset if I removed: Q5_K_S, Q4_K_S, IQ4_XS, IQ3_S, IQ3_XXS, IQ2_S, IQ2_XXS, IQ1_S These all seem to be extremely close to other (better) sizes and don't seem worth keeping, and would both speed up my
10
0
14
@bartowski1182
bartowski
2 months
Oh snap, good catch... Should be very interesting :o
@MaziyarPanahi
Maziyar PANAHI
2 months
Updated models for Mixtral-8x7B are coming soon!!! 😱
[image attached]
5
0
14
2
1
12
@bartowski1182
bartowski
3 months
Server is out for delivery, hoping once I get it set up tonight I'll be able to start ripping through my backlog!
1
0
13
@bartowski1182
bartowski
4 months
After months of silence, @TeamCodeLLM_AI silently dropped wavecoder in the shadow of WizardLM2 (also a great looking model!) Check out my quants of their ultra model here: GGUF: EXL2 Should I make the others? Ultra seems like
1
3
13
@bartowski1182
bartowski
1 month
Seems like even after the tokenizer fixes the 27b Gemma 2 is struggling; hopefully a fix is on the way! In the meantime the 9b model seems to work without issue and is very good!
2
1
13
@bartowski1182
bartowski
1 month
Heads up, probably no surprises, but tokenizer issues were found with gemma-2 :) so as soon as fixes are in I'll be remaking them! Was relatively expected, it happens when we're on the bleeding edge!!
2
0
11
@bartowski1182
bartowski
4 months
WizardLM 2 7B - @Microsoft's latest and greatest - available now! This model is punching way above its weight, outscoring models multiple times its size in the popular MT-Bench benchmark. Check the attached screenshot for numbers. This is a huge step
[image attached]
0
3
12
@bartowski1182
bartowski
3 months
BPE tokenizer fix has been merged into main, no official release yet but building some GGUFs to test if it's backwards compatible :)
1
0
12
@bartowski1182
bartowski
3 months
I'm curious about @huggingface download counts 1: is there any way to see individual file download counts? i.e., how many people download each quant level of a GGUF 2: is there something buggy or different about how branch downloads are handled, or is EXL2 that much less popular
4
1
11
@bartowski1182
bartowski
3 months
Ran into multiple issues making llama.cpp quants for 70b instruct; it'll be up soon, I promise :) ETA is tomorrow morning
2
0
11
@bartowski1182
bartowski
2 months
Cohere's new Aya 23 models, with support for 23 languages, dropped earlier today 🚀 Available in 2 sizes, 8B and 35B, and ready to download into lmstudio! Check the new "use this model" button near the top right!
1
2
11
@bartowski1182
bartowski
4 months
@LMStudioAI @TheBlokeAI @huggingface So excited for this! For anyone wondering what the main goal of this is, check the few we've posted! We plan to post fewer models overall but focus entirely on the high quality ones, and we'll provide a detailed explanation of why we chose the models and what they're good for!
1
0
11
@bartowski1182
bartowski
2 months
DiscoPOP is now up on the @LMStudioAI community! Check out this awesome work by @SakanaAILabs, this is incredible stuff, and I'm looking forward to their future developments!
0
1
9
@bartowski1182
bartowski
4 months
They'll also be offering a new model catalog they're working on in the near future! All models on the huggingface page will also appear there, making finding your favorite models easier than ever! Stay tuned for more!
0
0
8
@bartowski1182
bartowski
2 months
@MistralAI As always these are all made with imatrix, and if you need other sizes you can find them on my personal page here: Just try not to flood the lmstudio community with too many options :)
1
0
8
@bartowski1182
bartowski
2 months
@MaziyarPanahi @mvaloatto @huggingface LOL right, I'm just some random guy uploading literal terabytes of data to their server and they're just like "yeah sure hit me"
1
0
8
@bartowski1182
bartowski
3 months
@WolframRvnwlf @iamRezaSayar @LMStudioAI Yeah, that could do it very easily; turbo uses existing tokenizer classes where llama.cpp tends to rewrite stuff for lack of dependencies, so exl2 worked out of the box. I'll let you know when it's merged and I've got quants up.
1
0
8
@bartowski1182
bartowski
4 months
@WizardLM_AI Where did it go? :'(
3
0
8
@bartowski1182
bartowski
2 months
GGUFs are up!
@hardmaru
hardmaru
2 months
New Paper and Blog! As LLMs become better at generating hypotheses and code, a fascinating possibility emerges: using AI to advance AI itself! As a first step, we got LLMs to discover better algorithms for training LLMs that align with human preferences.
8
119
568
0
0
8
@bartowski1182
bartowski
4 months
Last but not least, starcoder2 was released during this window, and was tuned by many, including @erhartford with dolphin and @huggingface with starchat
1
1
6
@bartowski1182
bartowski
4 months
Anyone have any thoughts on AWQ vs imatrix for GGUF? Been wanting to add imatrix for a while, but was told about using AWQ recently. Is using both even better? Give me some opinions :D
3
1
7
@bartowski1182
bartowski
3 months
Woo, looks like we have backwards compatibility! You just won't get the tokenizer fixes if you don't update your tool. llama.cpp main now answers the common addition question correctly! Even with Q2_K :)
[image attached]
1
0
7
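The exact "common addition question" isn't quoted, but the smoke test being described is along these lines (prompt and model path are illustrative, not the actual test used):

```bash
# Multi-digit arithmetic exercises the digit tokenization that the BPE
# fix addressed, so a quick prompt like this catches regressions.
./main -m Meta-Llama-3-8B-Instruct-Q2_K.gguf -n 32 \
  -p "What is 7777 + 3333?"
```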
@bartowski1182
bartowski
1 month
Reading into the Gemma 2 report... I was highly intrigued by the inclusion of local sliding window and global attention alternating at each layer, as well as the logit soft-capping. The report doesn't really dive into details on either of them and what the advantages may be,
2
0
7
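For reference, the soft-capping mentioned here is a simple bounded rescaling of the logits; per the Gemma 2 report, the cap is 50.0 for attention logits and 30.0 for the final logits:

```latex
\text{logits} \leftarrow c \cdot \tanh\!\left(\frac{\text{logits}}{c}\right),
\qquad c_{\text{attn}} = 50.0,\quad c_{\text{final}} = 30.0
```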
@bartowski1182
bartowski
4 months
Time for another roundup of recent models I've enjoyed! Lots of good releases since last I posted, so this will be another long one! Check them all out on my @huggingface page
3
0
7
@bartowski1182
bartowski
3 months
Decided it was time to add AWQ to my pipeline, so enjoy Llama3 8b instruct: TBD if I'll make the 70b or anything larger than ~8b, will try one and see how long it takes to run
0
0
6
@bartowski1182
bartowski
29 days
Finally figured out why my chat template for Phi 3 prints all on one line rather than the multi-line format the template/card shows it should be.. They have <|user|>, <|end|>, <|system|>, and <|assistant|> all set with rstrip = true in tokenizer_config.json Anyone know why..?
1
0
6
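One way to see those flags for yourself: inspect the model's tokenizer_config.json, where Hugging Face stores per-token lstrip/rstrip settings under added_tokens_decoder (the jq invocation is a sketch):

```bash
# List the special tokens that strip whitespace to their right -- this is
# what eats the newlines after <|user|>, <|end|>, etc. in the template.
jq '.added_tokens_decoder[] | select(.rstrip == true) | .content' \
  tokenizer_config.json
```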
@bartowski1182
bartowski
1 month
@GoogleDeepMind @googledevs by the way, the officially uploaded f32 GGUFs are based on a broken conversion and don't properly tokenize the start/end tokens. I've got working ones up if you want to take 'em, or you can remake them with the latest llama.cpp build, where it'll work
0
0
4
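The "remake" path mentioned here is llama.cpp's HF conversion script; a sketch, noting the script name has varied across versions (convert-hf-to-gguf.py, later convert_hf_to_gguf.py) and the paths are illustrative:

```bash
# Convert the original HF checkpoint straight to an f32 GGUF using a
# llama.cpp checkout that includes the Gemma 2 tokenizer fixes.
python convert-hf-to-gguf.py /path/to/gemma-2-9b-it \
  --outtype f32 --outfile gemma-2-9b-it-f32.gguf
```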
@bartowski1182
bartowski
17 days
@huggingface Also shout out to @LMStudioAI and @arcee_ai for helping fund my efforts by providing contracts for some of my work, it's been hugely beneficial for justifying hardware purchases and keeping everything afloat!!
1
0
6
@bartowski1182
bartowski
3 months
@erhartford Anyone have any recommendations for a chat UI that plays nicely with a Transformers backend? Or any OpenAI-compatible backend, for that matter
4
0
6
@bartowski1182
bartowski
4 months
One of the first mistral 0.2 finetunes, and it's from one of my favourite authors! Check it out! I expect good things from this.. Quanted and up on my page, GGUF and exl2 as usual :)
0
2
5
@bartowski1182
bartowski
2 months
@MaziyarPanahi @mihai673 @huggingface @LMStudioAI Yup already downloading the lites :D the non-lites will take a tiiiny bit longer...
1
0
5
@bartowski1182
bartowski
3 months
@iamRezaSayar @LMStudioAI @WolframRvnwlf for what it's worth, just pushed an f32 conversion, so if any version is gonna work as unquantized (assuming no llama.cpp bugs), this should be it:
1
0
5
@bartowski1182
bartowski
2 months
@rohanpaul_ai Wouldn't this be cool for a company with distributed laptops across all their employees? Running a load balanced and distributed coding model using only a bit of performance from each device.. The possibilities are huge
1
1
5
@bartowski1182
bartowski
3 months
Biggest reason I wish there was some way to update them after the fact, though even then you'd have to download them all, make the change, and reupload... There's definitely a simplicity to single-file execution, but boy do I like the json files in exl2 when this happens
@MaziyarPanahi
Maziyar PANAHI
3 months
@WolframRvnwlf "how many broken/suboptimal models are (and will remain) floating around" - I know! editing 100s of GGUF models to point to a right eos_token_id and re-uploading them was so much work! 😂
1
0
7
2
0
5
@bartowski1182
bartowski
2 months
@heyitsyorkie and like, if you look at that chart, sizes like Q3_K_S, Q3_K_L, Q3_K_M, Q2_K, Q4_K_S etc all basically shouldn't exist (in terms of performance/bit) but NEED to for people who can't use i-quants
1
0
3
@bartowski1182
bartowski
1 month
27b is also ready! You'll need to be on llama.cpp b3259 or newer to run it :)
1
0
3
@bartowski1182
bartowski
3 months
@yagilb piqued my interest and I reformatted a world_sim prompt to follow llama 3 instruct. Seems to really do a good job following it in (very) limited testing. Pastebin here: Screenshot of playing around with it in @LMStudioAI attached
[image attached]
1
1
4
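For reference, "reformatted to follow llama 3 instruct" means wrapping the prompt in Meta's chat format, which per the Llama 3 model card looks like this (the system/user text is placeholder):

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```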
@bartowski1182
bartowski
5 months
(5/6) @Weyaxi followed up on the success of Einstein v3 with, you guessed it, v4: @WenhuChen with TIGER lab released StructLM in 3 sizes for structured knowledge grounding tasks:
2
0
4
@bartowski1182
bartowski
5 months
Now featuring: GGUF! I've added GGUF to my pipeline so all future models will come in both ExLlamaV2 and GGUF format :)
1
0
4
@bartowski1182
bartowski
2 months
@JagersbergKnut I was thinking of dropping Q4_K_S because even on a 70B model it's only 2gb smaller than Q4_K_M, a 5% difference, and there's still IQ4_NL which is the same size (though has the disadvantage of being slower on CPU/metal..)
1
0
4
@bartowski1182
bartowski
16 days
@cognitivecompai For the record, Slaren as usual was the discoverer of the bug source ;D I just do the reporting :) But yeah, it's already being fixed, so next time we try again it'll hopefully work!
2
1
4
@bartowski1182
bartowski
2 months
@dudeman6790 @Teknium1 @MatthewBerman most formatters would have a cutoff, probably anything above 3 you switch to multiline
1
0
4
@bartowski1182
bartowski
29 days
@altryne Don't forget Microsoft's new phi 3 mini update!
0
0
4
@bartowski1182
bartowski
1 month
@MaziyarPanahi @ZyMazza @Presidentlin @teortaxesTex @huggingface Yeah I try to keep an eye on my HF inbox and answer as many questions as I can :)
0
0
4
@bartowski1182
bartowski
3 months
@MaziyarPanahi @NewDigitalEdu @JagersbergKnut 3 days ago, this one slipped under the radar! I haven't started yet, just finally uploading vanilla 70b instruct with the fixed tokenizer ;D
1
1
4
@bartowski1182
bartowski
3 months
@dxrulezsrb @reach_vb It's coming :D just put it in the oven, imatrix quants on the way
1
0
4
@bartowski1182
bartowski
2 months
@NeuralNovel @migtissera Always at the ready 😈
1
0
4
@bartowski1182
bartowski
2 months
@JagersbergKnut Yeah the I-quants are mostly targeted at CUDA/ROCm, with slower performance on metal/CPU, but overall better PPL/bit.. which is why the options become so flooded. If you care about speed and can't fully offload to CUDA/ROCm, K quants are your only option
1
0
4
@bartowski1182
bartowski
2 months
@erhartford Facts though, so disappointed they cap at 32gb.. the price is not bad for that much RAM, but if they managed to shove 8x that amount in there (and then I could install Linux 😇) I'd be pre-ordering (with money I don't have, shh)
0
0
3
@bartowski1182
bartowski
2 months
@MaziyarPanahi @rohanpaul_ai @GroqInc oh god, how long before we can buy "computers as a service" where you get a discount if you leave it on and available for distributed inferencing...
1
0
3
@bartowski1182
bartowski
2 months
@altryne small correction, I think that Cohere only released 2 models, the 8B and 35B, and they cover "only" 23 languages
1
0
3
@bartowski1182
bartowski
2 months
@MaziyarPanahi @mihai673 @huggingface @LMStudioAI dammit, they're broken :') when trying to generate I get "GGML_ASSERT: ggml.c:5705: ggml_nelements(a) == ne0*ne1", will report
2
0
3
@bartowski1182
bartowski
2 months
@MaziyarPanahi @Weyaxi @huggingface @julien_c Yeah that sounds great! I have almost 500 models that are exl2 meaning I'm missing a solid 10 TB+ from my number 😈
1
0
3
@bartowski1182
bartowski
2 months
@JagersbergKnut heads up that your tokenizer_config.yaml has a chatml chat template, but your model card suggests, I think, alpaca? Or something similar; not chatml either way
1
0
3
@bartowski1182
bartowski
1 month
oh and shoutout to @abetlen and pculliton on GitHub for their help preparing the PR!!
0
0
3
@bartowski1182
bartowski
2 months
@heyitsyorkie Yeah, K_M generally offers almost identical performance/bit to the K_S and is almost the exact same size even when scaling up. If IQ quants just worked well on metal and CPU, there are so many more sizes I'd drop haha
1
0
3
@bartowski1182
bartowski
3 months
@rohanpaul_ai Why do we not see the PPL comparison of unquantized Llama 3 vs Llama 2? What if PPL on Llama 3 is just across-the-board worse (because PPL is a half-measure anyway)? Also, this should be re-run with proper BPE support included in llama.cpp
2
0
3
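For context, PPL comparisons like the one being questioned come from llama.cpp's perplexity tool; a sketch of how such numbers are produced (binary name varies by build; the wikitext file is the conventional but illustrative test set):

```bash
# Perplexity of the unquantized baseline vs. a quant over the same text;
# lower is better, but absolute values aren't comparable across model
# families, which is the objection raised above.
./perplexity -m Meta-Llama-3-8B-f16.gguf -f wikitext-2-raw/wiki.test.raw
./perplexity -m Meta-Llama-3-8B-Q4_K_M.gguf -f wikitext-2-raw/wiki.test.raw
```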