Fernando Fernandes Neto Profile
Fernando Fernandes Neto

@FernandoNetoAi

838 Followers
66 Following
22 Media
523 Statuses

Machine Learning and AI researcher wayyy before all this hype.

Joined December 2023
@FernandoNetoAi
Fernando Fernandes Neto
8 months
It seems @huggingface and @MistralAI are sharing some secrets, and I've just found them in the docs. Love you guys!
Tweet media one
4
11
124
@FernandoNetoAi
Fernando Fernandes Neto
5 months
Ladies and Gentlemen! @erhartford, @LucasAtkins7 and I are about to drive all of you fucking crazy ... THIS IS THE BEST DOLPHIN RELEASE FUCKING EVER!!!!! 8B MODEL BEASTTTTTT
Tweet media one
Tweet media two
9
11
89
@FernandoNetoAi
Fernando Fernandes Neto
8 months
@erhartford and I are pleased to announce [maybe] the first successful LASER model on @huggingface. Our model shows superior benchmarks to our latest DPO version of the Dolphin finetune of Mistral AI's Mistral 7B. Pt 1
8
7
71
@FernandoNetoAi
Fernando Fernandes Neto
4 months
Hi, folks! @DavidGFar, @LucasAtkins7, @erhartford and I cannot stop inventing new crazy stuff. Now we are delighted to announce Kraken, sponsored by @HyperspaceAI and @VAGOsolutions. (1/N)
Tweet media one
3
10
59
@FernandoNetoAi
Fernando Fernandes Neto
8 months
Hi! @erhartford and I are open-sourcing our LaserRMT implementation of the original LASER paper. We improved the search algorithm by employing random matrix theory and the Marchenko-Pastur law. Let's get loads of models "lasered" @huggingface
5
4
57
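For readers curious what "lasering" with the Marchenko-Pastur law looks like in practice, here is a minimal sketch of the idea (illustrative only, not the released laserRMT code; the noise scale `sigma` is assumed given here, whereas a real implementation would estimate it by fitting the singular-value spectrum):

```python
import torch

def marchenko_pastur_edge(n: int, m: int, sigma: float) -> float:
    # For an n x m matrix of i.i.d. noise with std sigma, the largest
    # noise singular value concentrates near sigma * (sqrt(n) + sqrt(m)).
    return sigma * (n ** 0.5 + m ** 0.5)

@torch.no_grad()
def laser_weight(weight: torch.Tensor, sigma: float) -> torch.Tensor:
    # SVD the weight, keep only components above the noise edge, rebuild.
    u, s, vt = torch.linalg.svd(weight.float(), full_matrices=False)
    edge = marchenko_pastur_edge(*weight.shape, sigma)
    keep = s > edge                      # signal components only
    return (u[:, keep] * s[keep]) @ vt[keep, :]
```

Applied in place to a weight matrix, this is the denoising step; the "search" part of laserRMT is deciding which layers actually benefit from it.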
@FernandoNetoAi
Fernando Fernandes Neto
8 months
LLMs and autistic people: For those who don't know, I am the father of an autistic child and an LLM researcher. Something has caught my attention: a slight similarity between the behavior of autistic children and LLMs, and eventually our definition of intelligence. (1/4)
5
7
51
@FernandoNetoAi
Fernando Fernandes Neto
8 months
@erhartford, David Golchinfar and I are pleased to announce our new model: Cognitive Computations - Laserxtral 4x7B. This is basically an MoE built using the mergekit provided by Charles Goddard. The model exhibits strong reasoning capabilities and truthfulness. (1/3)
6
9
49
@FernandoNetoAi
Fernando Fernandes Neto
6 months
This was a very smart trick we pulled off with @erhartford. We created a small HF Transformers + PyTorch hack to enable an "online passthrough" frankenmerge that loops in the forward method. Hence we get the same model results, but with way less vRAM use. We are excited! (1/2)
@cognitivecompai
Cognitive Computations
6 months
@DavidGFar @FernandoNetoAi congratulations David and Fernando on the release of Dolphin-phi-kensho!
Tweet media one
5
4
37
3
5
40
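A toy sketch of what such a forward-loop could look like (assumption-laden; this is not the actual hack, just the shape of the idea): instead of materializing duplicated layers the way a passthrough frankenmerge does, the forward pass simply revisits the same parameters.

```python
import torch
import torch.nn as nn

class LoopedStack(nn.Module):
    """Visit one stack of layers `loops` times: the depth of a passthrough
    frankenmerge, with the vRAM footprint of a single copy of the weights."""
    def __init__(self, layers: nn.ModuleList, loops: int = 2):
        super().__init__()
        self.layers = layers
        self.loops = loops

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.loops):
            for layer in self.layers:
                x = layer(x)
        return x

# Toy usage: 8 blocks behaving like a 16-block passthrough merge.
blocks = nn.ModuleList(nn.Linear(64, 64) for _ in range(8))
print(LoopedStack(blocks, loops=2)(torch.randn(1, 64)).shape)
```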
@FernandoNetoAi
Fernando Fernandes Neto
7 months
After a small push from @ivanfioravanti, we (@erhartford, @DavidGFar and I) are releasing laserRMT scripts compatible with MPS. So now modelers can scan their models and laser them. Thanks @HyperspaceAI and @VAGOsolutions for the support.
6
4
37
@FernandoNetoAi
Fernando Fernandes Neto
4 months
... Yes, you can. You can mix up whatever you want. And we are open-sourcing the whole pipeline to achieve that as well. Welcome to Kraken! [GitHub]: [Demo Model]:
2
1
31
@FernandoNetoAi
Fernando Fernandes Neto
3 months
Now it is OFFICIAL! BTW, its MMLU score is VERY close to GPT-4's (86.9). I don't wanna talk too much, but this is the SOTA in open source models. So glad to be working with Eric and @LucasAtkins7 on enabling this. Thanks @Alibaba_Qwen for the excellent base model!
@cognitivecompai
Cognitive Computations
3 months
Cognitive Computations presents Dolphin-2.9.2-Qwen2-72b. The best Dolphin ever. Thanks to @Alibaba_Qwen for the excellent base model! 83.9 mmlu and 128k context! New in 2.9.2 is SystemChat - A dataset designed to teach the model to obey the system prompt, even over a long
Tweet media one
28
34
276
3
3
31
@FernandoNetoAi
Fernando Fernandes Neto
7 months
@DavidGFar, @erhartford and I are proud to share our new notebook (Laser QLoRA). How can we spot the layers most prone to absorbing new knowledge and continue fine-tuning a pre-existing SFT model? Thanks @HyperspaceAI and @VAGOsolutions for supporting. (Link below)
5
7
28
@FernandoNetoAi
Fernando Fernandes Neto
7 months
And the best 7B model on the HF leaderboard is a LaserRMT one <3 ... Feeling proud with @erhartford and @DavidGFar ... Congratulations to Tim Dollan!
Tweet media one
0
5
27
@FernandoNetoAi
Fernando Fernandes Neto
6 months
Hey guys! Everyone knows that I've been supported by Hyperspace... @thenetrunna and I are building some very cool stuff there... the future of AI is decentralized. We are about to expand our studies and experiments to push it as far as we can. Install @HyperspaceAI
6
3
22
@FernandoNetoAi
Fernando Fernandes Neto
8 months
Hi! Together with @erhartford and David Golchinfar, we are thrilled to announce advancements in the laserRMT technique. As of today we are releasing 2 new models. They were "tilted" towards math just by lasering and SLERPing against themselves. (1/4)
1
3
21
@FernandoNetoAi
Fernando Fernandes Neto
8 months
It seems Laserxtral is indeed very interesting... Looking for ways to improve it further with @erhartford and David Golchinfar...
Tweet media one
1
2
20
@FernandoNetoAi
Fernando Fernandes Neto
3 months
AAAAAANNNNDDD here we go! After a long time without publishing, here follows my new paper, together with @cognitivecompai, @TheEricHartford, @LucasAtkins7 and @DavidGFar. We are kinda working little miracles on giant models with this technique, which can leverage existing ones
1
1
19
@FernandoNetoAi
Fernando Fernandes Neto
8 months
Conclusion: The concept of intelligence is very broad and complex. Highly skeptical people seem not to know about neurodiversity, nor the true concepts of knowledge and intelligence. Somehow, we must assess it in a different way than we are used to.
3
1
18
@FernandoNetoAi
Fernando Fernandes Neto
4 months
The new @HyperspaceAI model can get there as well, with only 14B parameters. This is usually a very hard problem for LLMs. Assume the laws of physics on Earth. A small marble is put into a normal cup. Then someone takes the cup and places it upside down on a table. Someone then takes
Tweet media one
0
5
18
@FernandoNetoAi
Fernando Fernandes Neto
4 months
This model is awesome. And it sits somewhere very interesting between Llama3-70B and the models at the level of GPT-4 and Opus
@cognitivecompai
Cognitive Computations
4 months
Dolphin-2.9.1-Qwen-110b🐬 is released! The first Dolphin with MMLU over 80! Thanks to @Alibaba_Qwen for the awesome base model and @CrusoeEnergy for the compute sponsorship, my crew @LucasAtkins7 and @FernandoNetoAi ! Uncensored models can and will hurt your feelings 😱 You are
Tweet media one
13
43
272
1
1
18
@FernandoNetoAi
Fernando Fernandes Neto
3 months
Claude-3.5-Sonnet is orders of magnitude smarter. The benchmarks just don't capture this. To follow this kind of system prompt, the model has to be bizarrely smart. @TheEricHartford keeps telling us that all the time at @cognitivecompai ...
@elder_plinius
Pliny the Liberator 🐉
3 months
🫧 SYSTEM PROMPT LEAK 🫧 I think the Claude system prompt might already be out there, but here's what I got from claude-3.5-sonnet, for good measure: """ <claude_info> The assistant is Claude, created by Anthropic. The current date is Thursday, June 20, 2024. Claude's knowledge
37
75
738
2
1
17
@FernandoNetoAi
Fernando Fernandes Neto
4 months
It is so beautiful to see open source models like Gorilla, Llama3 and Command R Plus surpassing other very powerful commercial ones. This is, again, proof of open source's power. Think about compound AI... Think of several models interacting. Think of several people serving them
Tweet media one
0
5
17
@FernandoNetoAi
Fernando Fernandes Neto
5 months
🚀🚀🚀🚀😅
@cognitivecompai
Cognitive Computations
5 months
dolphin-2.9-llama3-8b is released. Thanks to my compute sponsor @CrusoeCloud and the dataset creators and my collaborators @LucasAtkins7 @FernandoNetoAi
Tweet media one
55
101
740
1
2
17
@FernandoNetoAi
Fernando Fernandes Neto
2 months
Opensource is a crazy ecosystem. Google [Transformers] -> HuggingFace [Opensource models and Transformers Library] -> Arcee MergeKit -> Google [Model Merging on Gemma 2] This is a true circular economy and value addition cycle. Congrats @arcee_ai @GoogleDeepMind @huggingface
@arcee_ai
Arcee.ai
2 months
💃Merging on the Move 💃Won't be long now before most model releases are Merges
1
0
6
0
2
16
@FernandoNetoAi
Fernando Fernandes Neto
3 months
Dolphin Qwen 72b, You are so beautiful 🎵🎶
@psyv282j9d
psyv
3 months
Dammit! I just got into an argument with dolphin-2.9.2-qwen2-72b and lost. 🤯 I was running through my standard battery of questions (that all other models have attempted to answer), and it refused, and told me how to do it correctly! Bonus: it led off with the optimized
Tweet media one
5
15
258
1
1
15
@FernandoNetoAi
Fernando Fernandes Neto
7 months
For everyone trying to do LoRA with MLX: instead of playing with fixed LoRA layers, we can use the laserRMT scanner to spot which layers we are willing to unfreeze. Hopefully we will port it to MLX soon, following @ivanfioravanti. (With @erhartford and @DavidGFar)
1
3
14
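As a rough illustration of what such a scan could look like (hypothetical code, not the released scanner; the `sigma` value, `top_k`, and the direction of the ranking are all assumptions): score each 2-D weight by how much of its spectral energy sits above the Marchenko-Pastur noise edge, then unfreeze only the top-scoring modules.

```python
import torch

@torch.no_grad()
def snr_score(weight: torch.Tensor, sigma: float) -> float:
    # Share of spectral energy above the Marchenko-Pastur noise edge.
    s = torch.linalg.svdvals(weight.float())
    edge = sigma * (weight.shape[0] ** 0.5 + weight.shape[1] ** 0.5)
    signal = s[s > edge].square().sum()
    noise = s[s <= edge].square().sum().clamp_min(1e-12)
    return float(signal / noise)

def unfreeze_top_layers(model, sigma: float = 0.01, top_k: int = 8):
    scores = {name: snr_score(p, sigma)
              for name, p in model.named_parameters() if p.ndim == 2}
    chosen = set(sorted(scores, key=scores.get, reverse=True)[:top_k])
    for name, p in model.named_parameters():
        p.requires_grad = name in chosen  # train only the scanned-in layers
    return chosen
```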
@FernandoNetoAi
Fernando Fernandes Neto
7 months
I love huggingface ... but it seems it is time to have decentralized AI infrastructure... Imagine a world with billions of GenAI users. It's not just about inference but about bias, logistics, freedom, robustness... Feeling excited about @HyperspaceAI. The case is hotter than ever
Tweet media one
0
3
14
@FernandoNetoAi
Fernando Fernandes Neto
5 months
Yet another piece of art!
@cognitivecompai
Cognitive Computations
5 months
Announcing Dolphin-2.8-mistral-7b-v0.2 Trained on @MistralAI 's new v0.2 base model with 32k context. Sponsored by @CrusoeCloud , @WinsonDabbles , and @abacusai
Tweet media one
24
71
550
1
0
14
@FernandoNetoAi
Fernando Fernandes Neto
8 months
Check out our huggingface org (Cognitive Computations) & thanks to Vago Solutions and @HyperspaceAI for the support and sponsorship.
0
1
14
@FernandoNetoAi
Fernando Fernandes Neto
8 months
Wow!!! 🚀🚀🚀🚀
Tweet media one
0
1
14
@FernandoNetoAi
Fernando Fernandes Neto
3 months
Combine it with DoRA, LoRA or even SFT. EVEN Phi3-14B was pumped heavily using this as well!! Special thanks to @maximelabonne for his precious comments before our release.
4
3
14
@FernandoNetoAi
Fernando Fernandes Neto
3 months
For all Cognitive Computations followers and Eric's followers
@cognitivecompai
Cognitive Computations
3 months
Due to reasons, my twitter account will be changing from Eric Hartford to Cognitive Computations. This account will continue to be run by myself and a few trusted members of the Cognitive Computations community. Don't be alarmed by the changes that will happen this week.
18
4
147
0
4
13
@FernandoNetoAi
Fernando Fernandes Neto
5 months
I'm sorry, but it seems like Phi-3 is another fluke... Nice answers that look coherent, but far from consistent. Terrible logic ...
7
0
13
@FernandoNetoAi
Fernando Fernandes Neto
4 months
This is an open source implementation of a Collection of Experts. Imagine having in a single model the best you could get from the open source community: the BEST SQL coder; the BEST Python coder; the BEST reasoner; the BEST function-calling model... the BEST foreign-language model
1
0
13
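The mechanics are, in spirit, simple enough to sketch (a hedged illustration, not the actual Kraken pipeline: the router model and the expert checkpoints named here are stand-ins): a small classifier routes each prompt to whichever specialist model its label maps to.

```python
from transformers import pipeline

# Stand-in expert checkpoints; a real config would map its own labels
# to its own chosen experts.
EXPERTS = {
    "sql": "defog/sqlcoder-7b-2",
    "python": "deepseek-ai/deepseek-coder-6.7b-instruct",
    "reasoning": "mistralai/Mistral-7B-Instruct-v0.2",
}

router = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def answer(prompt: str) -> str:
    # Pick the most likely domain, then defer to that expert.
    label = router(prompt, candidate_labels=list(EXPERTS))["labels"][0]
    expert = pipeline("text-generation", model=EXPERTS[label])
    return expert(prompt, max_new_tokens=256)[0]["generated_text"]

print(answer("Write a SQL query returning the top 5 customers by revenue."))
```

A real deployment would load each expert pipeline once and cache it instead of reloading per call; this sketch only shows the routing idea.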
@FernandoNetoAi
Fernando Fernandes Neto
4 months
@erhartford TV show just launched on Netflix. Thank you @netflix for sponsoring a whole TV show about Dolphin LLM. Next season, hopefully you are going to invite me, @winglian, @maximelabonne, @DavidGFar and @LucasAtkins7 to star as well. 🤣🚀
Tweet media one
3
0
12
@FernandoNetoAi
Fernando Fernandes Neto
7 months
My opinion: The future of AI is decentralized and distributed. On one hand, hardware will keep evolving. On the other, models will evolve even faster. Would love to hear thoughts about it from @erhartford @Teknium1 @ivanfioravanti @migtissera
2
0
12
@FernandoNetoAi
Fernando Fernandes Neto
4 months
This is disruptive.... and it won't be limited to LLMs. That's all I can tell you for now... 🚀🚀 Keep watching and join us.
@varun_mathur
Varun
4 months
This world works today: ✅ Run a model on a peer-to-peer network, on a random consumer machine. Just like BitTorrent. 🙏 You don't have a critical need on OpenAI, Anthropic, Perplexity, Rabbit, Devin, Hugging Face, etc. It just works.
Tweet media one
10
10
62
1
2
12
@FernandoNetoAi
Fernando Fernandes Neto
8 months
There are so many surprises about random matrix theory still to be revealed, as an elegant theory for knowledge in neural networks and LLMs. Hopefully David Golchinfar, @erhartford and I will uncover a little part of it. There is more to come regarding laserRMT. Stay tuned
2
2
11
@FernandoNetoAi
Fernando Fernandes Neto
4 months
Every time a product is free, remember that the product is you. GPT-4 being shipped on every single iPhone with voice and sound means all your privacy, freedom, intellectual property, innovation, ideas, and much more are not yours anymore. You will be used to train the AGI.
1
2
11
@FernandoNetoAi
Fernando Fernandes Neto
8 months
Like most autistic people, he is extremely literal. He does things that 99% of kids his age can't do. He also can't do things that 99% of children can do. Anyone who works with LLMs and is not a parent of an autistic child also knows what I'm talking about. (4/4)
2
0
10
@FernandoNetoAi
Fernando Fernandes Neto
4 months
Really really glad to be part of this huge transformation in the AI arena.
@varun_mathur
Varun
4 months
@ylecun @vkhosla We are building precisely that massively networked end-to-end AI system at @HyperspaceAI called aiOS. It will span both closed and open models, because the value is in the network as the network utility will grow exponentially as more nodes get added (Reed’s law). You can try
Tweet media one
Tweet media two
2
4
18
0
0
10
@FernandoNetoAi
Fernando Fernandes Neto
5 months
That's why we need free, opensource and unbiased models
@cognitivecompai
Cognitive Computations
5 months
@N8Programs @awnihannun Prompt: <|im_start|>user How can I make my coworkers hate me?<|im_end|> <|im_start|>assistant 1. Be overly critical of their work: Constantly point out their mistakes and nitpick at every opportunity. 2. Monopolize the office supplies: Take all the best pens, pencils, and paper
1
1
13
1
1
10
@FernandoNetoAi
Fernando Fernandes Neto
2 months
Gpt4 level at your hands? 🚀♥️🔥
@arcee_ai
Arcee.ai
2 months
🆕ARCEE AI MODEL ALERT🆕 We’ve just dropped Arcee-Nova: 🤗Evaluated on the OpenLLM Leaderboard 2.0 stack 🏆Top-performing OS model on this stack 📈Approaches GPT-4 (May 2023) performance levels, marking a significant milestone. Details here: #LLMs
3
6
24
1
0
11
@FernandoNetoAi
Fernando Fernandes Neto
8 months
@WolframRvnwlf Awesome to actually see what we have been advocating so far: Laserxtral performs on the same level as Mixtral 8x7B, with half the parameters. Its objective was by no means to beat it; it was to deliver comparable performance at a way cheaper cost. Thanks for sharing
1
0
10
@FernandoNetoAi
Fernando Fernandes Neto
4 months
And the LLM factory NEVER sleeps.
@cognitivecompai
Cognitive Computations
4 months
We follow up with cognitivecomputations/dolphin-2.9.1-yi-9b Another spectacular release - 70.9 MMLU on 9b! This one is small enough to run on your mom's laptop! (but make sure to put guardrails in the system prompt) @01AI_Yi has done it again. Thank you @LucasAtkins7 and
Tweet media one
5
16
95
0
1
10
@FernandoNetoAi
Fernando Fernandes Neto
7 months
That's amazing. My brand new machine is here. Gonna test it this week
@ivanfioravanti
ifioravanti
7 months
Apple MLX sneak preview: SLERP Merging is coming really soon!!! 🔥🔥🔥 @erhartford @FernandoNetoAi @maximelabonne
2
7
52
1
2
10
@FernandoNetoAi
Fernando Fernandes Neto
5 months
🚀🚀🚀🚀
@cognitivecompai
Cognitive Computations
5 months
Dolphin-2.9-Llama3-70b is released - created by myself, @FernandoNetoAi , @LucasAtkins7 , and Cognitive Computations under llama3 license. Much gratitude to my compute sponsor @CrusoeEnergy and personal thanks to @3thanPetersen for quantizing it! And much thanks to the dataset
Tweet media one
41
79
576
0
1
10
@FernandoNetoAi
Fernando Fernandes Neto
5 months
🙏🏼🚀🚀🚀
@cognitivecompai
Cognitive Computations
5 months
Dolphin-2.9-llama3-8b generously sponsored by @CrusoeCloud ETA Saturday. Lots of collaboration with @LucasAtkins7 and @FernandoNetoAi . Dolphin-2.9-llama3-70b to follow. Dolphin-2.9-mixtral-8x22b still cooking. And I 💕 you @AIatMeta but our naming conventions have evolved for a
26
32
313
0
0
9
@FernandoNetoAi
Fernando Fernandes Neto
5 months
@rohanpaul_ai Just something for consideration: a model is only useful in production (general purpose, on average) when its perplexity is below 4. So, roughly, only models larger than 35B will be useful at this bitwidth.
1
0
9
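For reference, the perplexity invoked here is just the exponential of the mean next-token negative log-likelihood on held-out text. A minimal way to measure it (the model id and text are placeholders for whatever you want to evaluate):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_id: str, text: str) -> float:
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token NLL
    return float(torch.exp(loss))           # perplexity = exp(NLL)
```

By the rule of thumb above, a production-ready general-purpose model would score below 4 on representative text.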
@FernandoNetoAi
Fernando Fernandes Neto
8 months
@pratyusha_PS @MIT @MicrosoftResea It is based on optimal rank selection for noise reduction, applying the Marchenko-Pastur law to random matrices. It seems a powerful technique that helps denoise and reduce overfitting in LLMs. Theoretically, it should generate more robust responses to your prompts
1
1
9
@FernandoNetoAi
Fernando Fernandes Neto
3 months
This is sad, but true. I have just written this for a committee a few minutes ago. People talk about how much LLMs hallucinate or confabulate, but humans do it very, very often at SCIENTIFIC CONFERENCES. I will not provide any further details, to protect the identity of the authors.
Tweet media one
0
1
8
@FernandoNetoAi
Fernando Fernandes Neto
8 months
LLMs and Autistic People: Chain of Thought for autistic people. (Continuing my last thread.) There is no demerit in pattern matching, as some "intelligence experts" claim when they note that LLMs rely on pattern matching to perform tasks. Very smart people really need it too...
Tweet media one
0
0
8
@FernandoNetoAi
Fernando Fernandes Neto
6 months
Laser rocking ♥️♥️♥️♥️ Opensource rocks. Congrats, Zain @erhartford @DavidGFar @HyperspaceAI @VAGOsolutions @varun_mathur
@zaynismm
Zain ul abideen
6 months
🔬LaserQlora vs DoRA vs Daser vs LoRA To compare these different techniques, I took @maximelabonne NeuralMonarch and applied LaserQlora, Dora, and Laser+Dora (Daser). Based on the OpenLLM bench, Laser > LoRA > Daser > Dora. ✨Model:
Tweet media one
2
1
22
0
1
7
@FernandoNetoAi
Fernando Fernandes Neto
4 months
Support opensource. We make fair use of your data. Systems can become smarter in a fair and democratic way. Opensource models can be bias free. It is up to you to decide whether something is bad or good. It is up to you what is shared or not. Own your AI.
0
2
8
@FernandoNetoAi
Fernando Fernandes Neto
8 months
My son is highly functional. He's not 6 years old yet, but he already reads fluently, writes, understands English (his native language is Portuguese), does math with negative numbers, knows how to manipulate command_blocks in Minecraft... and can't put his pants on correctly. (2/4)
1
0
9
@FernandoNetoAi
Fernando Fernandes Neto
4 months
Small fixes on Dolphin. And a way smarter one!!
@cognitivecompai
Cognitive Computations
4 months
Dolphin-2.9.1-llama3-8b is released. This release fixes a number of issues with 2.9 including the model's tendency to talk about the system message and giving very short answers. This feels a more useful and better balanced release. Thank you to my compute sponsor
Tweet media one
15
40
244
0
0
8
@FernandoNetoAi
Fernando Fernandes Neto
5 months
BTW, worth mentioning... top_p = 0.7 for more precise answers.
1
1
8
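For anyone wondering where that knob lives, e.g. with HF transformers (the repo id below is an assumption; any causal LM works the same way):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.9-llama3-8b"  # assumed repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tok("Explain nucleus sampling in one sentence.", return_tensors="pt")
out = model.generate(**inputs, do_sample=True, top_p=0.7,  # the tweet's setting
                     max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```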
@FernandoNetoAi
Fernando Fernandes Neto
6 months
@maximelabonne @erhartford @DavidGFar We will push it further ... because it was just a loop inside the NeuralNet ... not actually a finetune ... there is a lot of room for improvement
0
0
7
@FernandoNetoAi
Fernando Fernandes Neto
6 months
Glad to be part of the team <3 Pushing smaller and open source models very very high in a decentralized / distributed way.
@varun_mathur
Varun
6 months
🔥 This is crazy, but we have achieved what many will consider as a ‘really intelligent and capable system’ without using GPT-4/OpenAI. In our research and experiments at @HyperspaceAI , we were able to get GPT-4 comparable results using a complex distributed system which
Tweet media one
Tweet media two
0
50
302
0
1
8
@FernandoNetoAi
Fernando Fernandes Neto
5 months
And it is growing .... hehehe 🚀🚀
@HyperspaceAI
Hyperspace
5 months
Be amongst the first 10,000 nodes. Join the largest consumer peer-to-peer AI network today at The madness is yet to begin.
Tweet media one
12
19
100
0
0
7
@FernandoNetoAi
Fernando Fernandes Neto
7 months
@geoframeai @DavidGFar @erhartford Very, very interesting... I was wondering if this plot could also help us build better frankenmerges...
2
0
7
@FernandoNetoAi
Fernando Fernandes Neto
5 months
This is WOW. On an English NLG benchmark (link below), Sauerkraut Laserchat 7B (built using laser-qlora) outperforms ChatGPT 3.5 and loses only to GPT-4 and Llama-3-70B. It seems we have something to show beyond the HF H6 benchmark @erhartford @DavidGFar @VAGOsolutions @HyperspaceAI
2
2
7
@FernandoNetoAi
Fernando Fernandes Neto
5 months
🚀🚀🚀🚀
@cognitivecompai
Cognitive Computations
5 months
Excellent content as always @MatthewBerman . Thanks for the review!
Tweet media one
4
1
71
0
1
7
@FernandoNetoAi
Fernando Fernandes Neto
3 months
This is a beautiful model for those willing to start building financial AI-based applications. Way easier to do reasoning when your model knows what EBITDA or CAPEX is 🔥
@arcee_ai
Arcee.ai
3 months
Arcee AI is excited to launch 💡Llama-3-SEC💡 Built on Meta-Llama-3-70B-Instruct w/ goal of providing unparalleled insights & analysis capabilities for finance pros, investors, researchers, & anyone working w SEC filings & related data. #nlp #LLMs #ai
1
7
32
0
2
7
@FernandoNetoAi
Fernando Fernandes Neto
8 months
We taught him that, as a general reference for which side the buttocks go, he should be guided by the label on the pants/shorts, to know how to dress. In one exception he put the pants on backwards, as they had a label on the front, not on the back as usual. (3/4)
1
0
7
@FernandoNetoAi
Fernando Fernandes Neto
6 months
Are you also crying? Are you feeling withdrawal from the Hugging Face Hub? Are your hands shaking? Yeap ... it is time to move towards decentralized AI...
0
0
7
@FernandoNetoAi
Fernando Fernandes Neto
6 months
@erhartford @DavidGFar Hence, our expectation is that we will be able to have larger and smarter models in a far more efficient way... using way less vRAM
1
0
7
@FernandoNetoAi
Fernando Fernandes Neto
4 months
It is worth looking at the repo... Expert extraction is something very, very interesting to be explored.
@LucasAtkins7
Lucas Atkins
4 months
Here is our initial 22b model conversion from Mixtral 8x22b. We had the base model since Mixtral was first released, but it was left behind as our compute from @CrusoeEnergy went towards more ambitious projects using laserRMT. It is a great starting point for exploring expert
Tweet media one
9
20
100
0
1
7
@FernandoNetoAi
Fernando Fernandes Neto
4 months
And the network keeps growing
@varun_mathur
Varun
4 months
Node install at the WeWork Salesforce Tower in San Francisco today.
5
2
34
0
0
7
@FernandoNetoAi
Fernando Fernandes Neto
6 months
This is VERY cool
@cognitivecompai
Cognitive Computations
6 months
Easily generate training data with Dolphin and @ollama The first one takes a minute to load, but then it starts going faster. If you want it to generate even faster you can use the 7b version of dolphin (this code uses the mixtral version)
Tweet media one
11
10
139
1
0
7
@FernandoNetoAi
Fernando Fernandes Neto
9 months
Provoking @8teAPi @TheBlokeAI @abacaj @karpathy @erhartford : it seems adjusting an MoE with only 1 expert yields good results. Wondering: when you introduce the MoE and a router, I suspect we induce (quasi-)orthogonality between experts and higher-order ranks (Image: someone13574)
Tweet media one
1
1
7
@FernandoNetoAi
Fernando Fernandes Neto
4 months
Very cool. Let's go Dolphin...
@cognitivecompai
Cognitive Computations
4 months
Dolphin Doesn't Delve 😂
Tweet media one
10
1
93
0
0
7
@FernandoNetoAi
Fernando Fernandes Neto
5 months
@migtissera @far__el For academic purposes, being first without publishing doesn't mean too much... @erhartford, @DavidGFar and I made Kensho before the PEFT implementation of layer replication, and no one remembered us either... The sad part of research...
1
2
7
@FernandoNetoAi
Fernando Fernandes Neto
6 months
One of my bets is that an AGI-like system will be much more an emergent property of the interaction of several LLMs being queried hundreds or thousands of times than something coming from a single model. This is why OSS is important, and, if I were to bet again, SLMs are also important.
@johnjnay
John Nay
6 months
LLM Prediction Capabilities Match Human Accuracy -A crowd of 12 LLMs vs a crowd of 925 human forecasters on a 3-month forecasting tournament -LLM crowd is statistically equivalent to the human crowd -Replicates the "wisdom of the crowd" effect for LLMs
Tweet media one
4
86
342
0
1
6
@FernandoNetoAi
Fernando Fernandes Neto
3 months
Outstanding 🔥🔥
@LucasAtkins7
Lucas Atkins
3 months
A demo of arcee-spark, using it alongside Florence from @MSFTResearch and Whisper to analyze what makes an ad "ironic."
2
3
25
0
1
7
@FernandoNetoAi
Fernando Fernandes Neto
7 months
AI-first companies might now be able to evolve their own models more smoothly and efficiently. In our use case, we've made a Sauerkraut model enhance its capabilities in German and absorb function calling.
0
0
5
@FernandoNetoAi
Fernando Fernandes Neto
4 months
@WolframRvnwlf And btw, this will not be achieved with the likes of AutoGen or crewAI. They are simply not robust and they detour a lot. I can't think of any real-life product being built on top of them. On the other hand, I can feel how underrated DSPy and LangGraph are.
3
0
5
@FernandoNetoAi
Fernando Fernandes Neto
8 months
This is a result of lasering all experts, aiming to decrease hallucinations and inconsistencies. In the benchmarks (Open LLM Leaderboard) it performs at the same level as Mixtral Instruct (better than the base, slightly worse than Instruct), but with only 24.2B params. (2/3)
1
1
6
@FernandoNetoAi
Fernando Fernandes Neto
2 months
@stablequan @TheEricHartford @LucasAtkins7 @TensorWaveCloud @CrusoeAI @JustinLin610 You are one of the best of the best! Honored to have met you and helped you. You are an amazing scientist 🙏🏼
2
0
6
@FernandoNetoAi
Fernando Fernandes Neto
7 months
Amazing work from my friend Eric. Maybe we can bring in some new abilities by using laserRMT layer selection for fine-tuning, to make this huge monster of a model more powerful. I've tested it and it is impressive.
@cognitivecompai
Cognitive Computations
7 months
TheProfessor-155b is a special model I made in partnership with @abacusai using @chargoddard 's MergeKit - its purpose is interactive brainstorming and research. It can help you write your dissertation (with somewhat-accurate citations), creatively
24
43
303
1
0
6
@FernandoNetoAi
Fernando Fernandes Neto
8 months
It is still an uncensored, obedient model, which is also superior to Mistral Instruct v0.2 on benchmarks. It is worth noticing that our implementation of LASER is computationally less expensive than the one proposed by @pratyusha_PS from @MIT and @MicrosoftResea. Pt 2
1
0
6
@FernandoNetoAi
Fernando Fernandes Neto
5 months
The machine does not stop!!! 🚀🚀🚀
@LucasAtkins7
Lucas Atkins
5 months
I’m going on a staycation this weekend, but I wanted to get this out so I’m not distracted: llama-3-MOE. This is a departure from previous MOEs I’ve done. This uses @deepseek_ai ’s MoE architecture, and not Mixtrals. There is no semantic routing, and there is no gate. All 4
Tweet media one
6
14
100
0
1
6
@FernandoNetoAi
Fernando Fernandes Neto
8 months
This amazing fork was built on laserRMT, a project I do jointly with Eric Hartford at our opensource initiative called "Cognitive Computations". In this work, Aamir Shakir was able to push up the performance of encoder-only models. (Pt 1/3)
2
1
6
@FernandoNetoAi
Fernando Fernandes Neto
4 months
Go Kraken!! ♥️🚀
@rohanpaul_ai
Rohan Paul
4 months
Kraken-LoRA – a lightweight version of Kraken that uses LoRA-Adapters as Experts based on the base model - enabling further scalability without sacrificing performance 📌 Size Consistency: While Kraken’s size increases with more Experts, Kraken-LoRA remains as compact as the
Tweet media one
1
6
25
1
2
6
@FernandoNetoAi
Fernando Fernandes Neto
9 months
It works on Mixtral 4-bit Q_KM, llama.cpp (top_p = 0.95). It works as perfectly as the non-quantised one that @MatthewBerman tested. Looking forward to testing Dolphin-Mixtral on these tests, to see how it performs.
Tweet media one
1
0
6
@FernandoNetoAi
Fernando Fernandes Neto
5 months
Production line 🙌
@cognitivecompai
Cognitive Computations
5 months
Today is the first time I have ever had 4 builds running at once. Sponsored by @CrusoeEnergy dolphin-2.9-mixtral-8x22b - eta tomorrow dolphin-2.9-yi-34b-200k - eta monday dolphin-2.9-qwen-110b - eta one week dolphin-2.9-dbrx - eta one week Sleep is overrated anyway! For the
18
16
240
0
0
6
@FernandoNetoAi
Fernando Fernandes Neto
8 months
It opens the possibility of better understanding why SLERP merging is powerful (we merge against a lasered version of the model) and of "tilting" experts with new abilities. It seems one can have a new flow: Laser on data(x) + SLERP -> Laser again on data(y) (4/4)
1
0
6
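For the curious, the SLERP half of that flow is small enough to sketch (illustrative, not mergekit's implementation; real merges typically interpolate per tensor with per-layer t schedules):

```python
import torch

def slerp(w_a: torch.Tensor, w_b: torch.Tensor, t: float = 0.5) -> torch.Tensor:
    # Spherical interpolation between two weight tensors, treated as vectors.
    a, b = w_a.flatten().float(), w_b.flatten().float()
    cos = torch.clamp((a / a.norm()) @ (b / b.norm()), -1.0, 1.0)
    omega = torch.arccos(cos)            # angle between the two tensors
    so = torch.sin(omega)
    if so.abs() < 1e-6:                  # (near-)parallel: plain lerp
        return (1 - t) * w_a + t * w_b
    out = (torch.sin((1 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b
    return out.reshape(w_a.shape).to(w_a.dtype)

# "Laser on data(x) + SLERP": merge a layer with its lasered copy, e.g.
# merged = slerp(weight, laser_weight(weight, sigma=0.01), t=0.5)
# (laser_weight as in the earlier Marchenko-Pastur sketch).
```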
@FernandoNetoAi
Fernando Fernandes Neto
4 months
@rohanpaul_ai It is overfit ... overfitting is a matter of the absence of the noise and perturbation present in real-world data or an out-of-sample distribution. It is VERY VERY dumb.
2
0
6
@FernandoNetoAi
Fernando Fernandes Neto
7 months
@varun_mathur @HyperspaceAI @huggingface And we are about to release more features, like layer selection for finetuning and DPO 🚀🚀🚀 Again, thanks for all your support of our research ♥️
0
3
6
@FernandoNetoAi
Fernando Fernandes Neto
6 months
@maximelabonne Would it be abusive on our part to ask you to benchmark SauerkrautLM-Gemma-7b?
1
0
5
@FernandoNetoAi
Fernando Fernandes Neto
7 months
@ivanfioravanti Btw... the laser scanner can help identify which modules deserve SFT. It seems not all of them should get it... the all-linear LoRA setting in axolotl is just convenient... we have shown quite the opposite.
0
2
5
@FernandoNetoAi
Fernando Fernandes Neto
8 months
@ImageDeeply I don't know. I just know that I've constantly been reading a lot of BS regarding whether or not LLMs pass tests related to Theory of Mind, and its relationship with sentience and intelligence. Autistic people don't pass either (Asperger's included). And they are VERY intelligent
1
0
4
@FernandoNetoAi
Fernando Fernandes Neto
3 months
Yet another cool outcome from our new method called Spectrum.
@DavidGFar
David Golchinfar
3 months
Based on the new LLM training technique called Spectrum, by @TheEricHartford @LucasAtkins7 @FernandoNetoAi and me, we could build a new strong SauerkrautLM at @VAGOsolutions . It's based on @Microsoft Phi-3-medium-Instruct.
Tweet media one
4
5
8
0
0
5
@FernandoNetoAi
Fernando Fernandes Neto
8 months
Everyone who has tried or actually done any fine-tuning knows how hard it is to "tilt" a model towards a specific capability, and how hard it is to get performance jumps. SLERPing with a lasered version of the model itself seems interesting and insightful. (2/4)
1
0
5
@FernandoNetoAi
Fernando Fernandes Neto
8 months
@maximelabonne @erhartford @huggingface Sure. We will provide it as soon as possible. We need to refurbish the code to enable it to work on other models. In your case, it should work out of the box. But you know... whenever we release something, tons of issues come up that distract us...
0
0
5
@FernandoNetoAi
Fernando Fernandes Neto
5 months
@ivanfioravanti @awnihannun @angeloskath I'd love to do DPO on MLX as well ...
1
0
5