Rami @rami_mmo Twitter profile

Rami

@rami_mmo

2 months

@ExtremeBlitz__ 6:00 PM , a different kind of nirvana kicks in

1

7

245

Rami

@rami_mmo

2 months

@Rothmus these languages have a similar Indo-European root, that's why, not much to read into it :3

0

60

Rami

@rami_mmo

4 months

@thu_rs_day do not sing to it

3

0

52

Rami

@rami_mmo

3 months

euclidean distance between CLIP image embeddings of person 1 and rat is smaller than between person 1 and person 2

20

0

46
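The comparison described in the post above can be sketched as follows. This is a minimal illustration, assuming the embeddings are already available as plain vectors; the three toy vectors and the helper name `euclidean_distance` are hypothetical stand-ins, not actual CLIP outputs.

```python
import numpy as np

def euclidean_distance(a, b):
    """L2 distance between two embedding vectors of equal length."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.linalg.norm(a - b))

# Hypothetical toy vectors standing in for CLIP image embeddings.
person_1 = np.array([1.0, 0.0, 0.0])
person_2 = np.array([0.0, 1.0, 0.0])
rat = np.array([0.8, 0.1, 0.0])

d_rat = euclidean_distance(person_1, rat)
d_person = euclidean_distance(person_1, person_2)

# With these toy values the "rat" vector sits closer to person_1 than person_2 does.
print(d_rat < d_person)
```

Embeddings from a real model are often L2-normalized first, in which case Euclidean distance and cosine similarity rank neighbours identically.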

Rami

@rami_mmo

2 months

V1.0 weight subcloning implementation, i'm gonna get back to this soon, i wanna extend this to general transformers and stuff.

GitHub - SonicCodes/subcloning: implementation of https://arxiv.org/pdf/2312.09299

github.com

1

30

Rami

@rami_mmo

1 month

Traditional hypernetworks seem to be good conditioning mechanisms, did it for fun on a larger UNET DiT i was working on and got more adherence on conditioners than AdaIN, so had to make sure it was consistent, and it seems to be

0

1

23

Rami

@rami_mmo

15 days

the guy worked so hard for us, he wrote in reverse so that we can read correctly, legend...

2

1

21

Rami

@rami_mmo

6 months

@arithmoquine @sigmoid_male I feel like germ theory should be an emergent property, rather than being dropped, causing a mini-plague could be much more helpful to humanity than telling them the cure to something that's not a pressing issue for them.

1

0

18

Rami

@rami_mmo

12 days

gfodor.id

@gfodor

12 days

@nikitabier @yacineMTB

0

17

0

1

19

Rami

@rami_mmo

6 months

New features for the @weights_biases mobile client: - Media Browser Carousel: for audio & image medias. - Realtime vibration on new data. @verbocado not sure if this is what you meant :), also tagging @morgymcg

4

6

16

Rami

@rami_mmo

3 months

i'm so into diffusion models, i love diffusion so much, i'm so in awe of what it's been giving me these last couple of days.

2

0

13

Rami

@rami_mmo

11 days

@cloneofsimo 👀 tanishq is actually a 40y old larping around

1

0

14

Rami

@rami_mmo

2 months

Experimented with randomly mixing weights of blocks of GPT-2,

1

0

13

Rami

@rami_mmo

2 months

Did a quick experiment w/ soft-clustered positional encodings. My hope was to reduce strain on the self-attention by allowing malleable & contextual positional encodings. It looked interesting, but I don’t know if it’s already a well-known/used thing though...

1

0

13

Rami

@rami_mmo

2 months

i wanna cluster embedding features like this

Ishan

@radshaan

2 months

Information is so beautiful

19

80

1K

1

12

Rami

@rami_mmo

3 months

Nice way to force your diffusion model to use a conditioner is having it reconstruct it alongside the latent, so two branches, latent and conditioner. It's also a surprisingly good indicator of how well the visuals are coming up.

2

0

11

Rami

@rami_mmo

11 days

what are some good methods to ground/anchor yourself to reality? regarding thoughts and stuff...

4

0

11

Rami

@rami_mmo

2 months

I'm loving this new permute i got going...

0

11

Rami

@rami_mmo

2 months

what would i do without the ball hypersphere diagrams?

2

0

11

Rami

@rami_mmo

11 days

:/ this is not true, I've done both, if what you're doing is training MNIST it's obviously easy, but trying to do anything sufficiently complex requires long and dreadful research that I've never had to do before, and the amount of experiments I have to do is soo much :(

kache

@yacineMTB

12 days

UI client dev is harder than ML

160

91

2K

1

0

11

Rami

@rami_mmo

2 months

Just finished, DinoV2<>CLiP image embeds. I get about 84%-94% of CLIP's performance on one-shot image & text retrieval on MSCOCO, Flickr8k & Flickr30k eval sets using CLiP_benchmark

GitHub - SonicCodes/dinov2-clip: dinov2 features aligned with CLIP

dinov2 features aligned with CLIP. Contribute to SonicCodes/dinov2-clip development by creating an account on GitHub.

github.com

Rami

@rami_mmo

2 months

ok, so, who is training/trained DinoV2 <> CLIP text embedding model?

0

4

0

1

11

Rami

@rami_mmo

1 month

He was so brave to say this out loud, I don't think he should be excommunicated...

2

0

11

Rami

@rami_mmo

3 months

There is no rational approach to the world

2

0

10

Rami

@rami_mmo

29 days

@cloneofsimo POV: scale/norm issue fixed and model is training normally now

0

10

Rami

@rami_mmo

3 months

Meta's Multi-Token model is an example of how data can fit dots, saw this concept a while ago & found it a bit weird how they would constrain causality between the 4 tokens, ~~ but there's no specific reason a 4-token ahead prediction would be less capable/complex than a 1-token.

1

0

10

Rami

@rami_mmo

11 days

this, for some reason seems that it's much more pronounced in this field than the rest, I just don't know why...

xjdr

@_xjdr

11 days

AI is full of people that think what is difficult is valuable, lol

19

420

2

0

10

Rami

@rami_mmo

3 months

nice to see more research on 3d diffusion

Dreaming Tulpa 🥓👑

@dreamingtulpa

3 months

WonderWorld can generate interactive 3D scenes from a single image and a text prompt in less than 10 seconds on a single A6000 GPU! A dynamically while navigating the environment 🔥

13

103

545

1

0

8

Rami

@rami_mmo

26 days

@cloneofsimo on a level, i sorta feel like the methods that failed to scale sorta got mutated to be the scalable ones, like diffusion-alike processes was used in practice through MAEs,

2

0

9

Rami

@rami_mmo

3 months

one more...

Rami

@rami_mmo

3 months

i moved a model to unconditional diffusion from conditional, but me forgot to remove conditioner application on eval render, CLIP looks like this without learned conditioning...

0

5

2

0

8

Rami

@rami_mmo

3 months

@AlbyHojel coffee from a beaker 🙃🙃

0

9

Rami

@rami_mmo

4 months

@bubblebabyboi no way, grimes is mourning parm :(

0

9

Rami

@rami_mmo

1 month

@simonlast :/ will age very badly indeed, much of the details are sufficiently superfluous and with no technical supervision, regulatory capture will not only have a much worse outcome for the general public but also for the market

0

9

Rami

@rami_mmo

4 months

@snoopsonar @prmshra @PicoPaco17 nooo, i had enough twitter drama for a month, i'm just glad parm is back.

1

9

Rami

@rami_mmo

5 months

Facebook engagement farming 101: ✅ Black child ✅ Jesus ✅ "Made it with my own hands!"

1

9

Rami

@rami_mmo

2 months

Could MMCR be used as an alternative to typical contrastive losses? generally seems more elegant than those tricks i had to do for my contrastive setup.

2

0

7

Rami

@rami_mmo

2 months

inits uhmm, inits, and also good lr, uhmmmm....

1

0

8

Rami

@rami_mmo

5 months

@AartBik Nvidia worked hard to build the ecosystem, wouldn't be confused why they got such a huge monopoly

0

8

Rami

@rami_mmo

9 days

0

8

Rami

@rami_mmo

13 days

Me just realising why eps prediction in diffusion models doesn't work without full from gate skip connection, so like ffn encoders/decoders can only do x-prediction, sorta funky I'm just understanding this is what's so damn good about UNET & Transformers

0

8
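One way to read the post above: under the standard DDPM forward process, the noise eps is a linear function of the noisy input x_t and the clean sample x0, so a network that predicts eps benefits from a path that carries x_t through to the output, which is what a skip connection provides. The sketch below only demonstrates that algebraic relationship; it is an interpretation rather than the author's derivation, and the `alpha_bar` value and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

x0 = rng.normal(size=4)    # clean sample (toy data)
eps = rng.normal(size=4)   # Gaussian noise added by the forward process
alpha_bar = 0.7            # hypothetical cumulative noise-schedule value in (0, 1)

# Standard DDPM forward process:
#   x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

def eps_from_x0(x_t, x0_pred, alpha_bar):
    """Recover the noise from an x0 prediction; note that x_t is required."""
    return (x_t - np.sqrt(alpha_bar) * x0_pred) / np.sqrt(1.0 - alpha_bar)

def x0_from_eps(x_t, eps_pred, alpha_bar):
    """Recover the clean sample from a noise prediction; x_t is required here too."""
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps_pred) / np.sqrt(alpha_bar)

# Both parameterizations are interchangeable once x_t is available.
print(np.allclose(eps_from_x0(x_t, x0, alpha_bar), eps))
print(np.allclose(x0_from_eps(x_t, eps, alpha_bar), x0))
```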

Rami

@rami_mmo

1 month

@yacineMTB i view gradients as intrinsic subspaces, everything optimizes for lower energy (rest state), even predictive coding (which is widely believed to be the brain's optimization mechanism) has an approximated gradient, and maximal information propagation (critical state) seems to be

0

8

Rami

@rami_mmo

3 months

me at my work desk

Ethan (boston til 18th)

@Ethan_smith_20

3 months

me at my work desk

1

0

11

1

8

Rami

@rami_mmo

1 month

i'm just learning that jupyter notebooks on vscode support debugging, 👀 where u been my whole life

0

7

Rami

@rami_mmo

2 months

i hate how communication is the single point of failure for humans

0

6

Rami

@rami_mmo

5 months

i'm starting to think @repligate & @ryunuck are doing a much better job than the LLM arena, as entertaining as it is, their methods are incredible and valuable.

0

1

7

Rami

@rami_mmo

1 month

kurzgesagt did indeed fall off, it used to be one of the small number of "scientific" channels I loved on YouTube, now watching their latest video in about a year, all I felt was cringe at the stupidity of the arguments raised, c'est la vie

Kurzgesagt

@Kurz_Gesagt

1 month

Humanity's smartest invention might also be its last. Superintelligent AI could be our dream come true – or our worst nightmare. Watch our latest video to find out what it could mean for the future of our species:

126

139

1K

0

7

Rami

@rami_mmo

3 months

this is why i'm in love with diffusion right now, something new coming soon.... (i don't wanna spoil yet :3)

0

1

6

Rami

@rami_mmo

3 months

ok, new rabbithole unlocked i think...

Dreaming Tulpa 🥓👑

@dreamingtulpa

3 months

Human 3Diffusion can reconstruct realistic avatars from a single RGB image, achieving high-fidelity in both geometry and appearance. Links ⬇️

2

13

119

0

7

Rami

@rami_mmo

4 months

me in gcs

1

0

7

Rami

@rami_mmo

2 months

autoregressive modelling on continuous spaces turned out to be much harder than i thought

2

0

7

Rami

@rami_mmo

2 months

for anyone looking for a use case: you can use my projection model to finetune (very quickly/efficiently) stable diffusion controlnets to use CLiP & DiNOv2 embeddings at the same time, you can have the best of both worlds... Multi-Modality & self-supervised features

Rami

@rami_mmo

2 months

Just finished, DinoV2<>CLiP image embeds. I get about 84%-94% of CLIP's performance on one-shot image & text retrieval on MSCOCO, Flickr8k & Flickr30k eval sets using CLiP_benchmark

0

1

11

0

7

Rami

@rami_mmo

2 months

it's genuinely crazy to me how much free knowledge there is everywhere

0

7

Rami

@rami_mmo

8 days

i wonder why diffusion steps aren't expected to have exposure bias...

1

0

7

Rami

@rami_mmo

4 months

@Grimezsz this is obv an endorsement to parm lol

0

7

Rami

@rami_mmo

10 days

@Teknium1 this was something that was suggested in a reddit thread a bit back, for plausible LTM-2 architecture,

Retentive Network: A Successor to Transformer for Large Language Models

In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance....

arxiv.org

1

0

7

Rami

@rami_mmo

5 months

@garrytan let it survive, makes you look cool

0

7

Rami

@rami_mmo

3 months

I trained an SAE both with L1 and this L0 approx, and i find the latter smoother than L1, especially for MSE. On the graphs "sp" is sparsity from 0-100% (tops at 96.8% on mine over batch), sp_loss is the L0, with scale of 1000, eps of 1e-3 and alpha of 0.03.

Ethan (boston til 18th)

@Ethan_smith_20

3 months

@HessianFree It works although they say we ultimately want a penalty that selects for active/inactive. L1 penalizes total magnitude which is why they call it a crude approximation. I’ve had good results with both l1 and trying some of the above though!

2

0

3

0

1

7

Rami

@rami_mmo

2 months

just found this on a dataset i had,

2

1

7

Rami

@rami_mmo

2 months

CLIP text and image embeddings show a similar reconstruction error trend,

1

0

7

Rami

@rami_mmo

3 months

ok, i think i am allowed to share this, person_1, person_1_2 pictures 👇, i hold no opinion of rihanna, i'm just working...

Rami

@rami_mmo

3 months

@_Stocko_

1

0

3

0

6

Rami

@rami_mmo

2 months

@cloneofsimo hmm, didn't the chameleon paper state this is generally unstable? they said there was a push and pull happening between the text and image modalities, each sabotaging the other by adjusting its norm, they used a query-key normalization, how did they solve it?

1

0

7

Rami

@rami_mmo

5 months

@pmddomingos applies to you as well sire

0

7

Rami

@rami_mmo

3 months

0

7

Rami

@rami_mmo

4 months

improved code is doing 5:1 compression ratio, i.e. 197kb -> 39.4kb, about the same as waggie :D

Martin Shkreli

@MartinShkreli

4 months

who has had luck with the neuralink challenge? i spent about 90 mins on it and get 80% deflation eg 197kb wav --> 40kb

23

2

114

1

6

Rami

@rami_mmo

3 months

i need more compute

1

0

6

Rami

@rami_mmo

14 days

Chat probabilistic tokenisation is going to accurately predict there are 3 "r"s in "strawberry"

Improving Self Consistency in LLMs through Probabilistic Tokenization

Prior research has demonstrated noticeable performance gains through the use of probabilistic tokenizations, an approach that involves employing multiple tokenizations of the same input string...

arxiv.org

2

1

6

Rami

@rami_mmo

4 months

I never thought parm would be a meme like this

andi (e/alb)

@Nexuist

4 months

ROON! YOU’RE ON THE WRONG ACCOUNT ROON! DON’T HIT RETWEET ON THAT ROON! ROOOONN

27

38

1K

0

6

Rami

@rami_mmo

1 month

I'm not e/acc, never have been, nor am I stating Kurzgesagt is a decel

Rami

@rami_mmo

1 month

kurzgesagt did indeed fall off, it used to be one of the small number of "scientific" channels I loved on YouTube, now watching their latest video in about a year, all I felt was cringe at the stupidity of the arguments raised, c'est la vie

0

7

0

5

Rami

@rami_mmo

2 months

why am i not seeing much work on letting diffusion models learn latent conditioning vectors by themselves, i'm seeing a lot of 3+ y old work on GANs for that

1

0

6

Rami

@rami_mmo

3 months

super cool work, i made the VAE's latent space a bit sparse and got fun results, i didn't expect text at all to appear with a person's image above, so much potential here

Ethan (boston til 18th)

@Ethan_smith_20

3 months

increased model dim to 512 from 256 and put it through another 50k steps.

2

0

16

0

6

Rami

@rami_mmo

3 months

wait what?

0

5

Rami

@rami_mmo

4 months

decoder now learning to decode the jointly learned embeddings into token space :D

Rami

@rami_mmo

4 months

joint-embedding transformers are somewhat cool, blue is positive, orange is negative

0

3

0

5

Rami

@rami_mmo

5 months

@mallocmyheart @mallocmyheart we gotta speak, like RN, I have something cool that worked on RL,

2

0

6

Rami

@rami_mmo

3 months

better naming, spfalse_lr0.0001 is gelu, leakyrelu(slope=0.02)^3 was best performing one

Ethan (boston til 18th)

@Ethan_smith_20

3 months

some tests with a mini ViT on CIFAR10... So the relu-square hype seems pretty legit? this came out in 2022 and we all just totally overlooked it? Not only that... for shits and giggles tried relu^3 and leaky-relu^3 which both well outperform GELU and similarly to relu square.

10

2

87

0

6
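The activations compared in this exchange can be sketched as below. This is a minimal NumPy illustration, assuming "leakyrelu(slope=0.02)^3" means cubing the output of a leaky ReLU with negative slope 0.02; the function names are hypothetical and this is not the code used in either experiment.

```python
import numpy as np

def leaky_relu(x, slope=0.02):
    """Identity for x >= 0, slope * x otherwise."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(x >= 0, x, slope * x)

def leaky_relu_cubed(x, slope=0.02):
    """Assumed reading of leakyrelu(slope=0.02)^3: cube the leaky ReLU output.
    Cubing is an odd power, so negative inputs keep a small negative output."""
    return leaky_relu(x, slope) ** 3

def relu_squared(x):
    """ReLU followed by squaring, the 'relu-square' variant from the quoted post."""
    x = np.asarray(x, dtype=np.float64)
    return np.maximum(x, 0.0) ** 2

xs = np.array([-1.0, 0.0, 2.0])
print(leaky_relu_cubed(xs))
print(relu_squared(xs))
```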

Rami

@rami_mmo

4 months

@ICBMinvestments he must have a very flexible neck, what's his secret?

0

Rami

@rami_mmo

3 months

those silly times, 💎🙌 it is

0

6

Rami

@rami_mmo

1 month

@katawaridokii Yo try this:

CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning

Image captioning task has been extensively researched by previous work. However, limited experiments focus on generating captions based on non-autoregressive text decoder. Inspired by the recent...

arxiv.org

1

0

6

Rami

@rami_mmo

4 months

great motivator

Mark Granza

@markgranza

4 months

I often think about this.

64

2K

23K

0

6

Rami

@rami_mmo

3 months

ahh, this is weird....

2

1

6

Rami

@rami_mmo

3 months

@bubblebabyboi will miss your twitter antics bubble.... :( :( :(

1

0

6

Rami

@rami_mmo

5 months

@ryunuck I noticed something in the sequence, when you give Claude Opus a Latin-based sequence it simply rejects interacting with it by saying it could contain harmful content and it doesn't understand / doesn't want to process,

Claude

Talk with Claude, an AI assistant from Anthropic

claude.ai

1

5

Rami

@rami_mmo

6 months

@ylecun @PicoPaco17 TIL some people don't have internal monologue:

Champagne Joshi

@JoshWalkos

6 months

This is a fascinating conversation with a girl who lacks an internal monologue. She articulates the experience quite well.

760

2K

12K

0

6

Rami

@rami_mmo

2 months

some deets, left is the top prob for sequence, left is how it's distributed over the batch

0

6

Rami

@rami_mmo

4 months

websim 👀

0

5

Rami

@rami_mmo

1 month

many congrats to @Ethan_smith_20 & team on joining canva!

AI Image Generator - Create Art, Images & Video | Leonardo AI

Transform your projects with our AI image generator. Generate high-quality, AI generated images with unparalleled speed and style to elevate your creative vision

leonardo.ai

Ethan (boston til 18th)

@Ethan_smith_20

1 month

I am so happy to announce today has joined the Canva family. It’s been one hell of a journey and I don’t think I could have imagined a better team to work alongside. I am absolutely psyched to be building at a scale I’ve only gotten to dream of, and

22

7

128

0

6

Rami

@rami_mmo

3 months

I've been very lucky to have the friends I do

0

5

Rami

@rami_mmo

4 months

I think VoiceEngine & the large-scale video compression network that was trained for SORA have been extensively used

Rob Haisfield

@RobertHaisfield

4 months

Native voice input instead of transcribed text means that the training data probably picked up more emotional intelligence. It’ll be really trippy hearing GPT-4o express what it feels. Sounds like it’s constantly excited, but what happens when it gets frustrated?

1

0

11

0

6

Rami

@rami_mmo

10 days

god damn, seeing CFG work outside of image generation is crazy

0

6

Rami

@rami_mmo

3 months

side note: just found out the Arabic word for diffusion is "scattering" & "dispersion" , I find it interesting cause, in other languages I know of it's mostly something that's done by others (actions), but here it's defined as a natural process,

AKHIL 🪡

@fkasummer

3 months

diffusion still underrated

0

4

0

6

Rami

@rami_mmo

2 months

I love how absolutely random but relevant my twitter timeline has become

0

5

Rami

@rami_mmo

3 months

I wanna make chaotic diffusion 3

0

5

Rami

@rami_mmo

5 months

@tszzl Only show that talks about ML/AI, and then shows scenes writing loss functions / models. kudos to whoever made that happen...

1

0

5

Rami

@rami_mmo

5 months

got 18/36

Mehran Jalali

@mehran__jalali

5 months

ChatGPT gets 24/36 on this test Higher than some people I know

2

0

64

0

5

Rami

@rami_mmo

3 months

@mallocmyheart pov: dataset bled and hayden hasn't realized it yet

1

0

5

Rami

@rami_mmo

3 months

why does conditioner application on diffusion networks have goldilock zones like this?

1

0

5

Rami

@rami_mmo

5 months

Fear is the mind-killer. Fear is the little-death that brings total obliteration. I will face my fear. I will permit it to pass over me and through me.

1

0

5

Rami

@rami_mmo

3 months

idk how to interpret losses of diffusion models, they move so little over thousands of steps, i think this might have to do with the leniency they have, in theory for MSE, they can learn to displace the noise to less magnitude-sensitive parts of the output just to satisfy the loss..

0

5

Rami

@rami_mmo

3 months

@teortaxesTex @Ethan_smith_20 lmao, yes, someone mentioned your post on the ml chat gc, i can add you if you want...

1

0

4

Rami

@rami_mmo

5 months

@emerywells i think all companies want to streamline, but it's not as easy as it's for a small company, i for ex. used to launch products with large companies, there's legal, marketing budgets, brand considerations, UX consideration, but trust me, everyone i worked with wants o

1

0

5