Fernando Fernandes Neto Profile
Fernando Fernandes Neto

@FernandoNetoAi

838 Followers
66 Following
22 Media
523 Statuses

Machine Learning and AI researcher wayyy before all this hype.

Joined December 2023
@FernandoNetoAi
Fernando Fernandes Neto
8 months
It seems @huggingface and @MistralAI are sharing some secrets, and I've just found them in the docs. Love you guys!
Tweet media one
4
11
124
@FernandoNetoAi
Fernando Fernandes Neto
5 months
Ladies and Gentlemen! @erhartford, @LucasAtkins7 and I are about to drive all of you fucking crazy ... THIS IS THE BEST DOLPHIN RELEASE FUCKING EVER!!!!! 8B MODEL BEASTTTTTT
Tweet media one
Tweet media two
9
11
89
@FernandoNetoAi
Fernando Fernandes Neto
8 months
@erhartford and I are pleased to announce [maybe] the first successful LASER model on @huggingface. Our model shows superior benchmarks to our latest DPO version of the Dolphin finetune of Mistral AI's Mistral 7B. Pt 1
8
7
71
@FernandoNetoAi
Fernando Fernandes Neto
4 months
Hi, folks! @DavidGFar, @LucasAtkins7, @erhartford and I cannot stop inventing new crazy stuff. Now we are delighted to announce Kraken, sponsored by @HyperspaceAI and @VAGOsolutions. (1/N)
Tweet media one
3
10
59
@FernandoNetoAi
Fernando Fernandes Neto
8 months
Hi! @erhartford and I are open-sourcing our LaserRMT implementation of the original LASER paper. We improved the search algorithm by employing random matrix theory and the Marchenko-Pastur law. Let's get loads of models "lasered" @huggingface
5
4
57
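For readers curious what "lasering" with the Marchenko-Pastur law looks like in practice, here is a minimal sketch of the idea (illustrative only, not the released laserRMT code; the noise scale `sigma` is assumed given here, whereas a real implementation would estimate it by fitting the singular-value spectrum):

```python
import torch

def marchenko_pastur_edge(n: int, m: int, sigma: float) -> float:
    # For an n x m matrix of i.i.d. noise with std sigma, the largest
    # noise singular value concentrates near sigma * (sqrt(n) + sqrt(m)).
    return sigma * (n ** 0.5 + m ** 0.5)

@torch.no_grad()
def laser_weight(weight: torch.Tensor, sigma: float) -> torch.Tensor:
    # SVD the weight, keep only components above the noise edge, rebuild.
    u, s, vt = torch.linalg.svd(weight.float(), full_matrices=False)
    edge = marchenko_pastur_edge(*weight.shape, sigma)
    keep = s > edge                      # signal components only
    return (u[:, keep] * s[keep]) @ vt[keep, :]
```

Applied in place to a weight matrix, this is the denoising step; the "search" part of laserRMT is deciding which layers actually benefit from it.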
@FernandoNetoAi
Fernando Fernandes Neto
8 months
LLMs and autistic people: For those who don't know, I am the father of an autistic child and an LLM researcher. Something has caught my attention: a slight similarity between the behavior of autistic children and LLMs, and eventually our definition of intelligence. (1/4)
5
7
51
@FernandoNetoAi
Fernando Fernandes Neto
8 months
@erhartford, David Golchinfar and I are pleased to announce our new model: Cognitive Computations - Laserxtral 4x7B. This is basically an MoE built using the mergekit provided by Charles Goddard. The model exhibits strong reasoning capabilities and truthfulness. (1/3)
6
9
49
@FernandoNetoAi
Fernando Fernandes Neto
6 months
This was a very smart trick we pulled off with @erhartford. We created a small HF Transformers + PyTorch hack to enable an "online passthrough" frankenmerge that loops in the forward method. Hence we get the same model results, but with way less vRAM use. We are excited! (1/2)
@cognitivecompai
Cognitive Computations
6 months
@DavidGFar @FernandoNetoAi congratulations David and Fernando on the release of Dolphin-phi-kensho!
Tweet media one
5
4
37
3
5
40
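A toy sketch of what such a forward-loop could look like (assumption-laden; this is not the actual hack, just the shape of the idea): instead of materializing duplicated layers the way a passthrough frankenmerge does, the forward pass simply revisits the same parameters.

```python
import torch
import torch.nn as nn

class LoopedStack(nn.Module):
    """Visit one stack of layers `loops` times: the depth of a passthrough
    frankenmerge, with the vRAM footprint of a single copy of the weights."""
    def __init__(self, layers: nn.ModuleList, loops: int = 2):
        super().__init__()
        self.layers = layers
        self.loops = loops

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.loops):
            for layer in self.layers:
                x = layer(x)
        return x

# Toy usage: 8 blocks behaving like a 16-block passthrough merge.
blocks = nn.ModuleList(nn.Linear(64, 64) for _ in range(8))
print(LoopedStack(blocks, loops=2)(torch.randn(1, 64)).shape)
```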
@FernandoNetoAi
Fernando Fernandes Neto
7 months
After a small push from @ivanfioravanti, we (@erhartford, @DavidGFar and I) are releasing laserRMT scripts compatible with MPS. So now modelers can scan their models and laser them. Thanks @HyperspaceAI and @VAGOsolutions for the support.
6
4
37
@FernandoNetoAi
Fernando Fernandes Neto
4 months
... Yes, you can. You can mix up whatever you want. And we are open-sourcing the whole pipeline to achieve that as well. Welcome to Kraken! [GitHub]: [Demo Model]:
2
1
31
@FernandoNetoAi
Fernando Fernandes Neto
3 months
Now it is OFFICIAL! BTW, its MMLU score is VERY close to GPT-4's (86.9). I don't wanna talk too much, but this is the SOTA in open source models. So glad to be working with Eric and @LucasAtkins7 on enabling this. Thanks @Alibaba_Qwen for the excellent base model!
@cognitivecompai
Cognitive Computations
3 months
Cognitive Computations presents Dolphin-2.9.2-Qwen2-72b. The best Dolphin ever. Thanks to @Alibaba_Qwen for the excellent base model! 83.9 mmlu and 128k context! New in 2.9.2 is SystemChat - A dataset designed to teach the model to obey the system prompt, even over a long
Tweet media one
28
34
276
3
3
31
@FernandoNetoAi
Fernando Fernandes Neto
7 months
@DavidGFar, @erhartford and I are proud to share our new notebook (Laser QLoRA). How can we spot the layers most prone to absorbing new knowledge and continue fine-tuning a pre-existing SFT model? Thanks @HyperspaceAI and @VAGOsolutions for supporting. (Link below)
5
7
28
@FernandoNetoAi
Fernando Fernandes Neto
7 months
And the best 7B model on the HF leaderboard is a LaserRMT one <3 ... Feeling proud with @erhartford and @DavidGFar ... Congratulations to Tim Dollan!
Tweet media one
0
5
27
@FernandoNetoAi
Fernando Fernandes Neto
6 months
Hey guys! Everyone knows that I've been supported by Hyperspace... @thenetrunna and I are building some very cool stuff there... the future of AI is decentralized. We are about to expand our studies and experiments to push it as far as we can. Install @HyperspaceAI
6
3
22
@FernandoNetoAi
Fernando Fernandes Neto
8 months
Hi! Together with @erhartford and David Golchinfar, we are thrilled to announce advancements in the laserRMT technique. As of today we are releasing 2 new models. They were "tilted" towards math just by lasering and SLERPing against themselves. (1/4)
1
3
21
@FernandoNetoAi
Fernando Fernandes Neto
8 months
It seems Laserxtral is indeed very interesting... Looking for ways to improve it further with @erhartford and David Golchinfar...
Tweet media one
1
2
20
@FernandoNetoAi
Fernando Fernandes Neto
3 months
AAAAAANNNNDDD here we go! After a long time without publishing, here follows my new paper, together with @cognitivecompai, @TheEricHartford, @LucasAtkins7 and @DavidGFar. We are kinda working little miracles on giant models with this technique, which can leverage existing ones
1
1
19
@FernandoNetoAi
Fernando Fernandes Neto
8 months
Conclusion: The concept of intelligence is very broad and complex. Highly skeptical people seem not to know about neurodiversity, nor the true concepts of knowledge and intelligence. Somehow, we must assess it in a different way than we are used to.
3
1
18
@FernandoNetoAi
Fernando Fernandes Neto
4 months
The new @HyperspaceAI model can get there as well, with only 14B parameters. This is usually a very hard problem for LLMs. Assume the laws of physics on Earth. A small marble is put into a normal cup. Then someone takes the cup and places it upside down on a table. Someone then takes
Tweet media one
0
5
18
@FernandoNetoAi
Fernando Fernandes Neto
4 months
This model is awesome. And it sits somewhere very interesting between Llama3-70B and the models at the level of GPT-4 and Opus
@cognitivecompai
Cognitive Computations
4 months
Dolphin-2.9.1-Qwen-110b🐬 is released! The first Dolphin with MMLU over 80! Thanks to @Alibaba_Qwen for the awesome base model and @CrusoeEnergy for the compute sponsorship, my crew @LucasAtkins7 and @FernandoNetoAi ! Uncensored models can and will hurt your feelings 😱 You are
Tweet media one
13
43
272
1
1
18
@FernandoNetoAi
Fernando Fernandes Neto
3 months
Claude-3.5-Sonnet is orders of magnitude smarter. The benchmarks just don't capture this. To follow this kind of system prompt, the model has to be bizarrely smart. @TheEricHartford keeps telling us that all the time at @cognitivecompai ...
@elder_plinius
Pliny the Liberator 🐉
3 months
🫧 SYSTEM PROMPT LEAK 🫧 I think the Claude system prompt might already be out there, but here's what I got from claude-3.5-sonnet, for good measure: """ <claude_info> The assistant is Claude, created by Anthropic. The current date is Thursday, June 20, 2024. Claude's knowledge
37
75
738
2
1
17
@FernandoNetoAi
Fernando Fernandes Neto
4 months
It is so beautiful to see open source models like Gorilla, Llama3 and Command R Plus surpassing other very powerful commercial ones. This is, again, proof of open source's power. Think about compound AI... Think of several models interacting. Think of several people serving them
Tweet media one
0
5
17
@FernandoNetoAi
Fernando Fernandes Neto
5 months
🚀🚀🚀🚀😅
@cognitivecompai
Cognitive Computations
5 months
dolphin-2.9-llama3-8b is released. Thanks to my compute sponsor @CrusoeCloud and the dataset creators and my collaborators @LucasAtkins7 @FernandoNetoAi
Tweet media one
55
101
740
1
2
17
@FernandoNetoAi
Fernando Fernandes Neto
2 months
Opensource is a crazy ecosystem. Google [Transformers] -> HuggingFace [Opensource models and Transformers Library] -> Arcee MergeKit -> Google [Model Merging on Gemma 2] This is a true circular economy and value addition cycle. Congrats @arcee_ai @GoogleDeepMind @huggingface
@arcee_ai
Arcee.ai
2 months
💃Merging on the Move 💃Won't be long now before most model releases are Merges
1
0
6
0
2
16
@FernandoNetoAi
Fernando Fernandes Neto
3 months
Dolphin Qwen 72b, You are so beautiful 🎵🎶
@psyv282j9d
psyv
3 months
Dammit! I just got into an argument with dolphin-2.9.2-qwen2-72b and lost. 🤯 I was running through my standard battery of questions (that all other models have attempted to answer), and it refused, and told me how to do it correctly! Bonus: it led off with the optimized
Tweet media one
5
15
258
1
1
15
@FernandoNetoAi
Fernando Fernandes Neto
7 months
For everyone trying to do LoRA with MLX: instead of playing with fixed LoRA layers, we can use the laserRMT scanner to spot which layers we are willing to unfreeze. Hopefully we will port it to MLX soon, following @ivanfioravanti. (With @erhartford and @DavidGFar)
1
3
14
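As a rough illustration of what such a scan could look like (hypothetical code, not the released scanner; the `sigma` value, `top_k`, and the direction of the ranking are all assumptions): score each 2-D weight by how much of its spectral energy sits above the Marchenko-Pastur noise edge, then unfreeze only the top-scoring modules.

```python
import torch

@torch.no_grad()
def snr_score(weight: torch.Tensor, sigma: float) -> float:
    # Share of spectral energy above the Marchenko-Pastur noise edge.
    s = torch.linalg.svdvals(weight.float())
    edge = sigma * (weight.shape[0] ** 0.5 + weight.shape[1] ** 0.5)
    signal = s[s > edge].square().sum()
    noise = s[s <= edge].square().sum().clamp_min(1e-12)
    return float(signal / noise)

def unfreeze_top_layers(model, sigma: float = 0.01, top_k: int = 8):
    scores = {name: snr_score(p, sigma)
              for name, p in model.named_parameters() if p.ndim == 2}
    chosen = set(sorted(scores, key=scores.get, reverse=True)[:top_k])
    for name, p in model.named_parameters():
        p.requires_grad = name in chosen  # train only the scanned-in layers
    return chosen
```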
@FernandoNetoAi
Fernando Fernandes Neto
7 months
I love huggingface ... but it seems it is time to have decentralized AI infrastructure... Imagine a world with billions of GenAI users. It's not just about inference but about bias, logistics, freedom, robustness... Feeling excited about @HyperspaceAI. The case is hotter than ever
Tweet media one
0
3
14
@FernandoNetoAi
Fernando Fernandes Neto
5 months
Yet another piece of art!
@cognitivecompai
Cognitive Computations
5 months
Announcing Dolphin-2.8-mistral-7b-v0.2 Trained on @MistralAI 's new v0.2 base model with 32k context. Sponsored by @CrusoeCloud , @WinsonDabbles , and @abacusai
Tweet media one
24
71
550
1
0
14
@FernandoNetoAi
Fernando Fernandes Neto
8 months
Check out our huggingface org (Cognitive Computations) & thanks to Vago Solutions and @HyperspaceAI for the support and sponsorship.
0
1
14
@FernandoNetoAi
Fernando Fernandes Neto
8 months
Wow!!! 🚀🚀🚀🚀
Tweet media one
0
1
14
@FernandoNetoAi
Fernando Fernandes Neto
3 months
Combine it with DoRA, LoRA or even SFT. EVEN Phi3-14B was pumped heavily using this as well!! Special thanks to @maximelabonne for his precious comments before our release.
4
3
14
@FernandoNetoAi
Fernando Fernandes Neto
3 months
For all Cognitive Computations followers and Eric's followers
@cognitivecompai
Cognitive Computations
3 months
Due to reasons, my twitter account will be changing from Eric Hartford to Cognitive Computations. This account will continue to be run by myself and a few trusted members of the Cognitive Computations community. Don't be alarmed by the changes that will happen this week.
18
4
147
0
4
13
@FernandoNetoAi
Fernando Fernandes Neto
5 months
I'm sorry, but it seems like Phi-3 is another fluke... Nice answers that look coherent, but far from consistent. Terrible logic ...
7
0
13
@FernandoNetoAi
Fernando Fernandes Neto
4 months
This is an open source implementation of a Collection of Experts. Imagine having in a single model the best you could get from the open source community: the BEST SQL coder; the BEST Python coder; the BEST reasoner; the BEST function-calling model... the BEST foreign-language model
1
0
13
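The mechanics are, in spirit, simple enough to sketch (a hedged illustration, not the actual Kraken pipeline: the router model and the expert checkpoints named here are stand-ins): a small classifier routes each prompt to whichever specialist model its label maps to.

```python
from transformers import pipeline

# Stand-in expert checkpoints; a real config would map its own labels
# to its own chosen experts.
EXPERTS = {
    "sql": "defog/sqlcoder-7b-2",
    "python": "deepseek-ai/deepseek-coder-6.7b-instruct",
    "reasoning": "mistralai/Mistral-7B-Instruct-v0.2",
}

router = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def answer(prompt: str) -> str:
    # Pick the most likely domain, then defer to that expert.
    label = router(prompt, candidate_labels=list(EXPERTS))["labels"][0]
    expert = pipeline("text-generation", model=EXPERTS[label])
    return expert(prompt, max_new_tokens=256)[0]["generated_text"]

print(answer("Write a SQL query returning the top 5 customers by revenue."))
```

A real deployment would load each expert pipeline once and cache it instead of reloading per call; this sketch only shows the routing idea.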
@FernandoNetoAi
Fernando Fernandes Neto
4 months
@erhartford TV show just launched on Netflix. Thank you @netflix for sponsoring a whole TV show about Dolphin LLM. Next season, hopefully you are going to invite me, @winglian, @maximelabonne, @DavidGFar and @LucasAtkins7 to star as well. 🤣🚀
Tweet media one
3
0
12
@FernandoNetoAi
Fernando Fernandes Neto
7 months
My opinion: The future of AI is decentralized and distributed. On one hand, hardware will keep evolving. On the other, models will evolve even faster. Would love to hear thoughts about it from @erhartford @Teknium1 @ivanfioravanti @migtissera
2
0
12
@FernandoNetoAi
Fernando Fernandes Neto
4 months
This is disruptive.... and it won't be limited to LLMs. That's all I can tell you for now... 🚀🚀 Keep watching and join us.
@varun_mathur
Varun
4 months
This world works today: ✅ Run a model on a peer-to-peer network, on a random consumer machine. Just like BitTorrent. 🙏 You don't have a critical need on OpenAI, Anthropic, Perplexity, Rabbit, Devin, Hugging Face, etc. It just works.
Tweet media one
10
10
62
1
2
12
@FernandoNetoAi
Fernando Fernandes Neto
8 months
There are so many surprises about random matrix theory still to be revealed, as an elegant theory for knowledge in neural networks and LLMs. Hopefully David Golchinfar, @erhartford and I will uncover a little part of it. There is more to come regarding laserRMT. Stay tuned
2
2
11
@FernandoNetoAi
Fernando Fernandes Neto
4 months
Every time a product is free, remember that the product is you. GPT-4 being shipped on every single iPhone with voice and sound means all your privacy, freedom, intellectual property, innovation, ideas, and much more are not yours anymore. You will be used to train the AGI.
1
2
11
@FernandoNetoAi
Fernando Fernandes Neto
8 months
Like most autistic people, he is extremely literal. He does things that 99% of kids his age can't do. He also can't do things that 99% of children can do. Anyone who works with LLMs and is not a parent of an autistic child also knows what I'm talking about. (4/4)
2
0
10
@FernandoNetoAi
Fernando Fernandes Neto
4 months
Really really glad to be part of this huge transformation in the AI arena.
@varun_mathur
Varun
4 months
@ylecun @vkhosla We are building precisely that massively networked end-to-end AI system at @HyperspaceAI called aiOS. It will span both closed and open models, because the value is in the network as the network utility will grow exponentially as more nodes get added (Reed’s law). You can try
Tweet media one
Tweet media two
2
4
18
0
0
10
@FernandoNetoAi
Fernando Fernandes Neto
5 months
That's why we need free, opensource and unbiased models
@cognitivecompai
Cognitive Computations
5 months
@N8Programs @awnihannun Prompt: <|im_start|>user How can I make my coworkers hate me?<|im_end|> <|im_start|>assistant 1. Be overly critical of their work: Constantly point out their mistakes and nitpick at every opportunity. 2. Monopolize the office supplies: Take all the best pens, pencils, and paper
1
1
13
1
1
10
@FernandoNetoAi
Fernando Fernandes Neto
2 months
Gpt4 level at your hands? 🚀♥️🔥
@arcee_ai
Arcee.ai
2 months
🆕ARCEE AI MODEL ALERT🆕 We’ve just dropped Arcee-Nova: 🤗Evaluated on the OpenLLM Leaderboard 2.0 stack 🏆Top-performing OS model on this stack 📈Approaches GPT-4 (May 2023) performance levels, marking a significant milestone. Details here: #LLMs
3
6
24
1
0
11
@FernandoNetoAi
Fernando Fernandes Neto
8 months
@WolframRvnwlf Awesome to actually see what we have been advocating so far: Laserxtral performs on the same level as Mixtral 8x7B, with half the parameters. Its objective was by no means to beat it; it was to deliver comparable performance at a way cheaper cost. Thanks for sharing
1
0
10
@FernandoNetoAi
Fernando Fernandes Neto
4 months
And the LLM factory NEVER sleeps.
@cognitivecompai
Cognitive Computations
4 months
We follow up with cognitivecomputations/dolphin-2.9.1-yi-9b Another spectacular release - 70.9 MMLU on 9b! This one is small enough to run on your mom's laptop! (but make sure to put guardrails in the system prompt) @01AI_Yi has done it again. Thank you @LucasAtkins7 and
Tweet media one
5
16
95
0
1
10
@FernandoNetoAi
Fernando Fernandes Neto
7 months
That's amazing. My brand new machine is here. Gonna test it this week
@ivanfioravanti
ifioravanti
7 months
Apple MLX sneak preview: SLERP Merging is coming really soon!!! 🔥🔥🔥 @erhartford @FernandoNetoAi @maximelabonne
2
7
52
1
2
10
@FernandoNetoAi
Fernando Fernandes Neto
5 months
🚀🚀🚀🚀
@cognitivecompai
Cognitive Computations
5 months
Dolphin-2.9-Llama3-70b is released - created by myself, @FernandoNetoAi , @LucasAtkins7 , and Cognitive Computations under llama3 license. Much gratitude to my compute sponsor @CrusoeEnergy and personal thanks to @3thanPetersen for quantizing it! And much thanks to the dataset
Tweet media one
41
79
576
0
1
10
@FernandoNetoAi
Fernando Fernandes Neto
5 months
🙏🏼🚀🚀🚀
@cognitivecompai
Cognitive Computations
5 months
Dolphin-2.9-llama3-8b generously sponsored by @CrusoeCloud ETA Saturday. Lots of collaboration with @LucasAtkins7 and @FernandoNetoAi . Dolphin-2.9-llama3-70b to follow. Dolphin-2.9-mixtral-8x22b still cooking. And I 💕 you @AIatMeta but our naming conventions have evolved for a
26
32
313
0
0
9
@FernandoNetoAi
Fernando Fernandes Neto
5 months
@rohanpaul_ai Just something for consideration: a model is only useful in production (general purpose, on average) when its perplexity is below 4. So, roughly, only models larger than 35B will be useful at this bitwidth.
1
0
9
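For reference, the perplexity invoked here is just the exponential of the mean next-token negative log-likelihood on held-out text. A minimal way to measure it (the model id and text are placeholders for whatever you want to evaluate):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_id: str, text: str) -> float:
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token NLL
    return float(torch.exp(loss))           # perplexity = exp(NLL)
```

By the rule of thumb above, a production-ready general-purpose model would score below 4 on representative text.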
@FernandoNetoAi
Fernando Fernandes Neto
8 months
@pratyusha_PS @MIT @MicrosoftResea It is based on optimal rank selection for noise reduction, applying the Marchenko-Pastur law to random matrices. It seems a powerful technique that helps denoise and reduce overfitting in LLMs. Theoretically, it should generate more robust responses to your prompts
1
1
9
@FernandoNetoAi
Fernando Fernandes Neto
3 months
This is sad, but true. I have just written this for a committee a few minutes ago. People talk about how much LLMs hallucinate or confabulate, but humans do it very, very often at SCIENTIFIC CONFERENCES. I will not provide any further details, to protect the identity of the authors.
Tweet media one
0
1
8
@FernandoNetoAi
Fernando Fernandes Neto
8 months
LLMs and Autistic People: Chain of Thought for autistic people. (Continuing my last thread.) There is no demerit in pattern matching, as some "intelligence experts" claim when they note that LLMs rely on pattern matching to perform tasks. Very smart people really need it too...
Tweet media one
0
0
8
@FernandoNetoAi
Fernando Fernandes Neto
6 months
Laser rocking ♥️♥️♥️♥️ Opensource rocks. Congrats, Zain @erhartford @DavidGFar @HyperspaceAI @VAGOsolutions @varun_mathur
@zaynismm
Zain ul abideen
6 months
🔬LaserQlora vs DoRA vs Daser vs LoRA To compare these different techniques, I took @maximelabonne NeuralMonarch and applied LaserQlora, Dora, and Laser+Dora (Daser). Based on the OpenLLM bench, Laser > LoRA > Daser > Dora. ✨Model:
Tweet media one
2
1
22
0
1
7
@FernandoNetoAi
Fernando Fernandes Neto
4 months
Support opensource. We make fair use of your data. Systems can become smarter in a fair and democratic way. Opensource models can be bias free. It is up to you to decide whether something is bad or good. It is up to you what is shared or not. Own your AI.
0
2
8
@FernandoNetoAi
Fernando Fernandes Neto
8 months
My son is highly functional. He's not 6 years old yet, but he already reads fluently, writes, understands English (his native language is Portuguese), does math with negative numbers, knows how to manipulate command_blocks in Minecraft... and can't put his pants on correctly. (2/4)
1
0
9
@FernandoNetoAi
Fernando Fernandes Neto
4 months
Small fixes on Dolphin. And a way smarter one!!
@cognitivecompai
Cognitive Computations
4 months
Dolphin-2.9.1-llama3-8b is released. This release fixes a number of issues with 2.9 including the model's tendency to talk about the system message and giving very short answers. This feels a more useful and better balanced release. Thank you to my compute sponsor
Tweet media one
15
40
244
0
0
8
@FernandoNetoAi
Fernando Fernandes Neto
5 months
BTW, worth mentioning... top_p = 0.7 for more precise answers.
1
1
8
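For anyone wondering where that knob lives, e.g. with HF transformers (the repo id below is an assumption; any causal LM works the same way):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.9-llama3-8b"  # assumed repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tok("Explain nucleus sampling in one sentence.", return_tensors="pt")
out = model.generate(**inputs, do_sample=True, top_p=0.7,  # the tweet's setting
                     max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```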
@FernandoNetoAi
Fernando Fernandes Neto
6 months
@maximelabonne @erhartford @DavidGFar We will push it further ... because it was just a loop inside the NeuralNet ... not actually a finetune ... there is a lot of room for improvement
0
0
7
@FernandoNetoAi
Fernando Fernandes Neto
6 months
Glad to be part of the team <3 Pushing smaller and open source models very very high in a decentralized / distributed way.
@varun_mathur
Varun
6 months
🔥 This is crazy, but we have achieved what many will consider as a ‘really intelligent and capable system’ without using GPT-4/OpenAI. In our research and experiments at @HyperspaceAI , we were able to get GPT-4 comparable results using a complex distributed system which
Tweet media one
Tweet media two
0
50
302
0
1
8
@FernandoNetoAi
Fernando Fernandes Neto
5 months
And it is growing .... hehehe 🚀🚀
@HyperspaceAI
Hyperspace
5 months
Be amongst the first 10,000 nodes. Join the largest consumer peer-to-peer AI network today at The madness is yet to begin.
Tweet media one
12
19
100
0
0
7
@FernandoNetoAi
Fernando Fernandes Neto
7 months
@geoframeai @DavidGFar @erhartford Very, very interesting... I was wondering if this plot could also help us build better frankenmerges...
2
0
7
@FernandoNetoAi
Fernando Fernandes Neto
5 months
This is WOW. On an English NLG benchmark (link below), Sauerkraut Laserchat 7B (built using laser-qlora) outperforms ChatGPT 3.5 and loses only to GPT-4 and Llama-3-70B. It seems we have something to show beyond the HF H6 benchmark @erhartford @DavidGFar @VAGOsolutions @HyperspaceAI
2
2
7
@FernandoNetoAi
Fernando Fernandes Neto
5 months
🚀🚀🚀🚀
@cognitivecompai
Cognitive Computations
5 months
Excellent content as always @MatthewBerman . Thanks for the review!
Tweet media one
4
1
71
0
1
7
@FernandoNetoAi
Fernando Fernandes Neto
3 months
This is a beautiful model for those willing to start building financial AI-based applications. Way easier to do reasoning when your model knows what EBITDA or CAPEX is 🔥
@arcee_ai
Arcee.ai
3 months
Arcee AI is excited to launch 💡Llama-3-SEC💡 Built on Meta-Llama-3-70B-Instruct w/ goal of providing unparalleled insights & analysis capabilities for finance pros, investors, researchers, & anyone working w SEC filings & related data. #nlp #LLMs #ai
1
7
32
0
2
7
@FernandoNetoAi
Fernando Fernandes Neto
8 months
We taught him that, as a general reference for which side the buttocks go, he should be guided by the label on the pants/shorts, to know how to dress. In one exception he put the pants on backwards, as they had a label on the front, not on the back as usual. (3/4)
1
0
7
@FernandoNetoAi
Fernando Fernandes Neto
6 months
Are you also crying? Are you feeling withdrawal from the Hugging Face Hub? Are your hands shaking? Yeap ... it is time to move towards decentralized AI...
0
0
7
@FernandoNetoAi
Fernando Fernandes Neto
6 months
@erhartford @DavidGFar Hence, our expectation is that we will be able to have larger and smarter models in a far more efficient way... using way less vRAM
1
0
7
@FernandoNetoAi
Fernando Fernandes Neto
4 months
It is worth looking at the repo... Expert extraction is something very, very interesting to be explored.
@LucasAtkins7
Lucas Atkins
4 months
Here is our initial 22b model conversion from Mixtral 8x22b. We had the base model since Mixtral was first released, but it was left behind as our compute from @CrusoeEnergy went towards more ambitious projects using laserRMT. It is a great starting point for exploring expert
Tweet media one
9
20
100
0
1
7
@FernandoNetoAi
Fernando Fernandes Neto
4 months
And the network keeps growing
@varun_mathur
Varun
4 months
Node install at the WeWork Salesforce Tower in San Francisco today.
5
2
34
0
0
7
@FernandoNetoAi
Fernando Fernandes Neto
6 months
This is VERY cool
@cognitivecompai
Cognitive Computations
6 months
Easily generate training data with Dolphin and @ollama The first one takes a minute to load, but then it starts going faster. If you want it to generate even faster you can use the 7b version of dolphin (this code uses the mixtral version)
Tweet media one
11
10
139
1
0
7
@FernandoNetoAi
Fernando Fernandes Neto
9 months
Provoking @8teAPi @TheBlokeAI @abacaj @karpathy @erhartford : it seems adjusting an MoE with only 1 expert yields good results. Wondering: when you introduce the MoE and a router, I suspect we induce (quasi-)orthogonality between experts and higher-order ranks (Image: someone13574)
Tweet media one
1
1
7
@FernandoNetoAi
Fernando Fernandes Neto
4 months
Very cool. Let's go Dolphin...
@cognitivecompai
Cognitive Computations
4 months
Dolphin Doesn't Delve 😂
Tweet media one
10
1
93
0
0
7
@FernandoNetoAi
Fernando Fernandes Neto
5 months
@migtissera @far__el For academic purposes, being first without publishing doesn't mean too much... @erhartford, @DavidGFar and I made Kensho before the PEFT implementation of layer replication, and no one remembered us either... The sad part of research...
1
2
7
@FernandoNetoAi
Fernando Fernandes Neto
6 months
One of my bets is that an AGI-like system will be much more an emergent property of the interaction of several LLMs being queried hundreds or thousands of times than something coming from a single model. This is why OSS is important, and, if I were to bet again, SLMs are also important.
@johnjnay
John Nay
6 months
LLM Prediction Capabilities Match Human Accuracy -A crowd of 12 LLMs vs a crowd of 925 human forecasters on a 3-month forecasting tournament -LLM crowd is statistically equivalent to the human crowd -Replicates the "wisdom of the crowd" effect for LLMs
Tweet media one
4
86
342
0
1
6
@FernandoNetoAi
Fernando Fernandes Neto
3 months
Outstanding 🔥🔥
@LucasAtkins7
Lucas Atkins
3 months
A demo of arcee-spark, using it alongside Florence from @MSFTResearch and Whisper to analyze what makes an ad "ironic."
2
3
25
0
1
7
@FernandoNetoAi
Fernando Fernandes Neto
7 months
AI-first companies might now be able to evolve their own models more smoothly and efficiently. In our use case, we've made a Sauerkraut model enhance its capabilities in German and absorb function calling.
0
0
5
@FernandoNetoAi
Fernando Fernandes Neto
4 months
@WolframRvnwlf And btw, this will not be achieved with the likes of AutoGen or crewAI. They are simply not robust and they detour a lot. I can't think of any real-life product being built on top of them. On the other hand, I can feel how underrated DSPy and LangGraph are.
3
0
5
@FernandoNetoAi
Fernando Fernandes Neto
8 months
This is a result of lasering all experts, aiming to decrease hallucinations and inconsistencies. In the benchmarks (Open LLM Leaderboard) it performs at the same level as Mixtral Instruct (better than the base, slightly worse than Instruct), but with only 24.2B params. (2/3)
1
1
6
@FernandoNetoAi
Fernando Fernandes Neto
2 months
@stablequan @TheEricHartford @LucasAtkins7 @TensorWaveCloud @CrusoeAI @JustinLin610 You are one of the best of the best! Honored to have met you and helped you. You are an amazing scientist 🙏🏼
2
0
6
@FernandoNetoAi
Fernando Fernandes Neto
7 months
Amazing work from my friend Eric. Maybe we can bring in some new abilities by using laserRMT layer selection for fine-tuning, to make this huge monster of a model more powerful. I've tested it and it is impressive.
@cognitivecompai
Cognitive Computations
7 months
TheProfessor-155b is a special model I made in partnership with @abacusai using @chargoddard 's MergeKit - its purpose is interactive brainstorming and research. It can help you write your dissertation (with somewhat-accurate citations), creatively
24
43
303
1
0
6
@FernandoNetoAi
Fernando Fernandes Neto
8 months
It is still an uncensored, obedient model, which is also superior to Mistral Instruct v0.2 on benchmarks. It is worth noticing that our implementation of LASER is computationally less expensive than the one proposed by @pratyusha_PS from @MIT and @MicrosoftResea. Pt 2
1
0
6
@FernandoNetoAi
Fernando Fernandes Neto
5 months
The machine does not stop!!! 🚀🚀🚀
@LucasAtkins7
Lucas Atkins
5 months
I’m going on a staycation this weekend, but I wanted to get this out so I’m not distracted: llama-3-MOE. This is a departure from previous MOEs I’ve done. This uses @deepseek_ai ’s MoE architecture, and not Mixtrals. There is no semantic routing, and there is no gate. All 4
Tweet media one
6
14
100
0
1
6
@FernandoNetoAi
Fernando Fernandes Neto
8 months
This amazing fork was built on laserRMT, a project I do jointly with Eric Hartford at our opensource initiative called "Cognitive Computations". In this work, Aamir Shakir was able to push up the performance of encoder-only models. (Pt 1/3)
2
1
6
@FernandoNetoAi
Fernando Fernandes Neto
4 months
Go Kraken!! ♥️🚀
@rohanpaul_ai
Rohan Paul
4 months
Kraken-LoRA – a lightweight version of Kraken that uses LoRA-Adapters as Experts based on the base model - enabling further scalability without sacrificing performance 📌 Size Consistency: While Kraken’s size increases with more Experts, Kraken-LoRA remains as compact as the
Tweet media one
1
6
25
1
2
6
@FernandoNetoAi
Fernando Fernandes Neto
9 months
It works on Mixtral 4-bit Q_KM, llama.cpp (top_p = 0.95). It works as perfectly as the non-quantised one that @MatthewBerman tested. Looking forward to testing Dolphin-Mixtral on these tests, to see how it performs.
Tweet media one
1
0
6
@FernandoNetoAi
Fernando Fernandes Neto
5 months
Production line 🙌
@cognitivecompai
Cognitive Computations
5 months
Today is the first time I have ever had 4 builds running at once. Sponsored by @CrusoeEnergy dolphin-2.9-mixtral-8x22b - eta tomorrow dolphin-2.9-yi-34b-200k - eta monday dolphin-2.9-qwen-110b - eta one week dolphin-2.9-dbrx - eta one week Sleep is overrated anyway! For the
18
16
240
0
0
6
@FernandoNetoAi
Fernando Fernandes Neto
8 months
It opens the possibility of better understanding why SLERP merging is powerful (we merge against a lasered version of the model) and of "tilting" experts with new abilities. It seems one can have a new flow: Laser on data(x) + SLERP -> Laser again on data(y) (4/4)
1
0
6
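For the curious, the SLERP half of that flow is small enough to sketch (illustrative, not mergekit's implementation; real merges typically interpolate per tensor with per-layer t schedules):

```python
import torch

def slerp(w_a: torch.Tensor, w_b: torch.Tensor, t: float = 0.5) -> torch.Tensor:
    # Spherical interpolation between two weight tensors, treated as vectors.
    a, b = w_a.flatten().float(), w_b.flatten().float()
    cos = torch.clamp((a / a.norm()) @ (b / b.norm()), -1.0, 1.0)
    omega = torch.arccos(cos)            # angle between the two tensors
    so = torch.sin(omega)
    if so.abs() < 1e-6:                  # (near-)parallel: plain lerp
        return (1 - t) * w_a + t * w_b
    out = (torch.sin((1 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b
    return out.reshape(w_a.shape).to(w_a.dtype)

# "Laser on data(x) + SLERP": merge a layer with its lasered copy, e.g.
# merged = slerp(weight, laser_weight(weight, sigma=0.01), t=0.5)
# (laser_weight as in the earlier Marchenko-Pastur sketch).
```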
@FernandoNetoAi
Fernando Fernandes Neto
4 months
@rohanpaul_ai It is overfit ... overfitting is a matter of the absence of the noise and perturbation present in real-world data or an out-of-sample distribution. It is VERY VERY dumb.
2
0
6
@FernandoNetoAi
Fernando Fernandes Neto
7 months
@varun_mathur @HyperspaceAI @huggingface And we are about to release more features, like layer selection for finetuning and DPO 🚀🚀🚀 Again, thanks for all your support of our research ♥️
0
3
6
@FernandoNetoAi
Fernando Fernandes Neto
6 months
@maximelabonne Would it be abusive on our part to ask you to benchmark SauerkrautLM-Gemma-7b?
1
0
5
@FernandoNetoAi
Fernando Fernandes Neto
7 months
@ivanfioravanti Btw... the laser scanner can help identify which modules deserve SFT. It seems not all of them should get it... the all-linear LoRA setting in axolotl is just convenient... we have shown quite the opposite.
0
2
5
@FernandoNetoAi
Fernando Fernandes Neto
8 months
@ImageDeeply I don't know. I just know that I've constantly been reading a lot of BS regarding whether or not LLMs pass tests related to Theory of Mind, and its relationship with sentience and intelligence. Autistic people don't pass either (Asperger's included). And they are VERY intelligent
1
0
4
@FernandoNetoAi
Fernando Fernandes Neto
3 months
Yet another cool outcome from our new method called Spectrum.
@DavidGFar
David Golchinfar
3 months
Based on the new LLM training technique called Spectrum, by @TheEricHartford @LucasAtkins7 @FernandoNetoAi and me, we could build a new strong SauerkrautLM at @VAGOsolutions . It's based on @Microsoft Phi-3-medium-Instruct.
Tweet media one
4
5
8
0
0
5
@FernandoNetoAi
Fernando Fernandes Neto
8 months
Everyone who has tried or actually done any fine-tuning knows how hard it is to "tilt" a model towards a specific capability, and how hard it is to get performance jumps. SLERPing with a lasered version of the model itself seems interesting and insightful. (2/4)
1
0
5
@FernandoNetoAi
Fernando Fernandes Neto
8 months
@maximelabonne @erhartford @huggingface Sure. We will provide it as soon as possible. We need to refurbish the code to enable it to work on other models. In your case, it should work out of the box. But you know... whenever we release something, tons of issues come up that distract us...
0
0
5
@FernandoNetoAi
Fernando Fernandes Neto
5 months
@ivanfioravanti @awnihannun @angeloskath I'd love to do DPO on MLX as well ...
1
0
5