Rami

@rami_mmo

495
Followers
188
Following
177
Media
1,537
Statuses

waiting on training runs...

Joined March 2018
@rami_mmo
Rami
2 months
@ExtremeBlitz__ 6:00 PM , a different kind of nirvana kicks in
Tweet media one
1
7
245
@rami_mmo
Rami
2 months
@Rothmus these languages have a similar Indo-European root, that's why, not much to read into it :3
0
0
60
@rami_mmo
Rami
4 months
@thu_rs_day do not sing to it
Tweet media one
3
0
52
@rami_mmo
Rami
3 months
euclidean distance between clip image embeddings of person1 and rat is smaller than person 1 and person 2
Tweet media one
20
0
46
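A minimal sketch of the comparison described in the tweet above, assuming embeddings are plain lists of floats. The three vectors are made-up toy values, not real CLIP outputs, and `euclidean_distance` is a hypothetical helper name.

```python
import math

def euclidean_distance(a, b):
    """L2 distance between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 3-d vectors standing in for real image embeddings (hypothetical values).
person1 = [0.9, 0.1, 0.3]
person2 = [0.1, 0.8, 0.4]
rat = [0.8, 0.2, 0.3]

d_person = euclidean_distance(person1, person2)
d_rat = euclidean_distance(person1, rat)

# True when the rat embedding sits closer to person1 than person2 does.
print(d_rat < d_person)
```

With real embeddings it is common to L2-normalize first, in which case Euclidean distance and cosine similarity give the same ranking.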
@rami_mmo
Rami
2 months
V1.0 weight subcloning implementation, i'm gonna get back to this soon, i wanna extend this to general transformers and stuff.
1
1
30
@rami_mmo
Rami
1 month
Traditional hypernetworks seem to be good conditioning mechanisms, did it for fun on a larger UNET DiT i was working on and got more adherence on conditioners than AdaIN, so had to make sure it was consistent, and it seems to be
Tweet media one
Tweet media two
0
1
23
@rami_mmo
Rami
15 days
the guy worked so hard for us, he wrote in reverse so that we can read correctly, legend...
Tweet media one
2
1
21
@rami_mmo
Rami
6 months
@arithmoquine @sigmoid_male I feel like germ theory should be an emergent property, rather than being dropped, causing a mini-plague could be much more helpful to humanity than telling them the cure to something that's not a pressing issue for them.
1
0
18
@rami_mmo
Rami
12 days
Tweet media one
@gfodor
gfodor.id
12 days
Tweet media one
0
0
17
0
1
19
@rami_mmo
Rami
6 months
New features for the @weights_biases mobile client: - Media Browser Carousel: for audio & image medias. - Realtime vibration on new data. @verbocado not sure if this is what you meant :), also tagging @morgymcg
4
6
16
@rami_mmo
Rami
3 months
i'm so into diffusion models, i love diffusion so much, i'm so in awe of what it's been giving me these last couple of days.
2
0
13
@rami_mmo
Rami
11 days
@cloneofsimo 👀 tanishq is actually a 40y old larping around
1
0
14
@rami_mmo
Rami
2 months
Experimented with randomly mixing weights of blocks of GPT-2,
Tweet media one
Tweet media two
1
0
13
@rami_mmo
Rami
2 months
Did a quick experiment w/ soft-clustered positional encodings. My hope was to reduce strain on the self-attention by allowing malleable & contextual positional encodings. It looked interesting, but I don’t know if it’s already a well-known/used thing though...
Tweet media one
Tweet media two
Tweet media three
1
0
13
@rami_mmo
Rami
2 months
i wanna cluster embedding features like this
@radshaan
Ishan
2 months
Information is so beautiful
Tweet media one
19
80
1K
1
1
12
@rami_mmo
Rami
3 months
Nice way to force your diffusion model to use a conditioner is having it reconstruct it alongside the latent, so two branches, latent and conditioner, it also is a surprisingly good indicator of how well the visuals are coming up.
Tweet media one
2
0
11
@rami_mmo
Rami
11 days
what are some good methods to ground/anchor yourself to reality? regarding thoughts and stuff...
4
0
11
@rami_mmo
Rami
2 months
I'm loving this new permute i got going...
Tweet media one
0
0
11
@rami_mmo
Rami
2 months
what would i do without the ball hypersphere diagrams?
Tweet media one
2
0
11
@rami_mmo
Rami
11 days
:/ this is not true, I've done both, if what you're doing is training MNIST it's obviously easy, but trying doing any sufficiently complex stuff requires long and dreadful research that I've never had to do before, and the amount of experiments I have to do is soo much :(
@yacineMTB
kache
12 days
UI client dev is harder than ML
160
91
2K
1
0
11
@rami_mmo
Rami
2 months
Just finished, DinoV2<>CLiP image embeds. I get about 84%-94% of CLIP's performance on oneshot image & text retrieval on MSCOCO, Flickr8k & Flickr30k eval sets using CLiP_benchmark
@rami_mmo
Rami
2 months
ok, so, who is training/trained DinoV2 <> CLIP text embedding model?
0
0
4
0
1
11
@rami_mmo
Rami
1 month
He was so brave to say this out loud, I don't think he should be excommunicated...
Tweet media one
2
0
11
@rami_mmo
Rami
3 months
There is no rational approach to the world
2
0
10
@rami_mmo
Rami
29 days
@cloneofsimo POV: scale/norm issue fixed and model is training normally now
0
0
10
@rami_mmo
Rami
3 months
Meta's Multi-Token model is an example of how data can fit dots, saw this concept a while ago & found it a bit weird how they would constrain causality between the 4 tokens, ~~ but there's no specific reason a 4-token ahead prediction would be less capable/complex than a 1-token.
Tweet media one
Tweet media two
1
0
10
@rami_mmo
Rami
11 days
this, for some reason seems that it's much more pronounced in this field than the rest, I just don't know why...
@_xjdr
xjdr
11 days
AI is full of people that think what is difficult is valuable, lol
19
19
420
2
0
10
@rami_mmo
Rami
3 months
nice to see more research on 3d diffusion
Tweet media one
@dreamingtulpa
Dreaming Tulpa 🥓👑
3 months
WonderWorld can generate interactive 3D scenes from a single image and a text prompt in less than 10 seconds on a single A6000 GPU! A dynamically while navigating the environment 🔥
13
103
545
1
0
8
@rami_mmo
Rami
26 days
@cloneofsimo on a level, i sorta feel like the methods that failed to scale sorta got mutated to be the scalable ones, like diffusion-like processes were used in practice through MAEs,
2
0
9
@rami_mmo
Rami
3 months
one more...
Tweet media one
@rami_mmo
Rami
3 months
i moved a model to unconditional diffusion from conditional, but me forgot to remove conditioner application on eval render, CLIP looks like this without learned conditioning...
Tweet media one
0
0
5
2
0
8
@rami_mmo
Rami
3 months
@AlbyHojel coffee from a beaker 🙃🙃
0
0
9
@rami_mmo
Rami
4 months
@bubblebabyboi no way, grimes is mourning parm :(
0
0
9
@rami_mmo
Rami
1 month
@simonlast :/ will age very badly indeed, much of the details are sufficiently superfluous and with no technical supervision, regulatory capture will not only have a much worse outcome for the general public but also for the market
0
0
9
@rami_mmo
Rami
4 months
@snoopsonar @prmshra @PicoPaco17 nooo, i had enough twitter drama for a month, i'm just glad parm is back.
1
1
9
@rami_mmo
Rami
5 months
Facebook engagement farming 101: ✅ Black child ✅ Jesus ✅ "Made it with my own hands!"
Tweet media one
Tweet media two
Tweet media three
1
1
9
@rami_mmo
Rami
2 months
Could MMCR be used as an alternative to typical contrastive losses? generally seems more elegant than those tricks i had to do for my contrastive setup.
Tweet media one
2
0
7
@rami_mmo
Rami
2 months
inits uhmm, inits, and also good lr, uhmmmm....
Tweet media one
1
0
8
@rami_mmo
Rami
5 months
@AartBik Nvidia worked hard to build the ecosystem, wouldn't be confused why they got such a huge monopoly
0
0
8
@rami_mmo
Rami
9 days
Tweet media one
0
0
8
@rami_mmo
Rami
13 days
Me just realising why eps prediction in diffusion models doesn't work without full from gate skip connection, so like ffn encoders/decoders can only do x-prediction, sorta funky I'm just understanding this is what's so damn good about UNET & Transformers
0
0
8
@rami_mmo
Rami
1 month
@yacineMTB i view gradients as intrinsic subspaces, everything optimizes for lower energy (rest state), even predictive coding (which is widely believed to be the brain's optimization mechanism) has an approximated gradient, and maximal information propagation (critical state) seems to be
0
0
8
@rami_mmo
Rami
3 months
me at my work desk
Tweet media one
@Ethan_smith_20
Ethan (boston til 18th)
3 months
me at my work desk
Tweet media one
1
0
11
1
1
8
@rami_mmo
Rami
1 month
i'm just learning that jupyter notebooks on vscode support debugging, 👀 where u been my whole life
0
0
7
@rami_mmo
Rami
2 months
i hate how communication is the single point of failure for humans
0
0
6
@rami_mmo
Rami
5 months
i'm starting to think @repligate & @ryunuck are doing a much better job than the LLM arena, as entertaining as it is, their methods are incredible and valuable.
0
1
7
@rami_mmo
Rami
1 month
kurzgesagt did indeed fall off, it used to be one of the small number of "scientific" channels I loved on YouTube, now watching their latest video in about a year, all I felt was cringe at the stupidity of the arguments raised, c'est la vie
@Kurz_Gesagt
Kurzgesagt
1 month
Humanity's smartest invention might also be its last. Superintelligent AI could be our dream come true – or our worst nightmare. Watch our latest video to find out what it could mean for the future of our species:
Tweet media one
126
139
1K
0
0
7
@rami_mmo
Rami
3 months
this is why i'm in love with diffusion right now, something new coming soon.... (i don't wanna spoil yet :3)
Tweet media one
0
1
6
@rami_mmo
Rami
3 months
ok, new rabbithole unlocked i think...
Tweet media one
@dreamingtulpa
Dreaming Tulpa 🥓👑
3 months
Human 3Diffusion can reconstruct realistic avatars from a single RGB image, achieving high-fidelity in both geometry and appearance. Links ⬇️
2
13
119
0
0
7
@rami_mmo
Rami
4 months
me in gcs
1
0
7
@rami_mmo
Rami
2 months
autoregressive modelling on continuous spaces turned out to be much harder than i thought
2
0
7
@rami_mmo
Rami
2 months
for anyone looking for a usecase: you can use my projection model to finetune (very quickly/efficiently) stable diffusion controlnets to use CLiP & DiNOv2 embeddings at the same time, you can have best of the both worlds... Multi-Modality & self-supervised features
@rami_mmo
Rami
2 months
Just finished, DinoV2<>CLiP image embeds. I get about 84%-94% of CLIP's performance on oneshot image & text retrieval on MSCOCO, Flickr8k & Flickr30k eval sets using CLiP_benchmark
0
1
11
0
0
7
@rami_mmo
Rami
2 months
it's genuinely crazy to me how much free knowledge there is everywhere
0
0
7
@rami_mmo
Rami
8 days
i wonder why diffusion steps aren't expected to have exposure bias...
1
0
7
@rami_mmo
Rami
4 months
@Grimezsz this is obv an endorsement to parm lol
0
0
7
@rami_mmo
Rami
5 months
@garrytan let it survive, makes you look cool
0
0
7
@rami_mmo
Rami
3 months
I trained an SAE both with L1 and this L0 approx, and i find the latter smoother than L1, especially for MSE, on the graphs "sp" is sparsity from 0-100% (tops at 96.8% on mine over batch), sp_loss is the l0 , with scale of 1000 , eps of 1e-3 and alpha of 0.03.
Tweet media one
Tweet media two
Tweet media three
Tweet media four
@Ethan_smith_20
Ethan (boston til 18th)
3 months
@HessianFree It works although they say we ultimately want a penalty that selects for active/inactive. L1 penalizes total magnitude which is why they call it a crude approximation. I’ve had good results with both l1 and trying some of the above though!
2
0
3
0
1
7
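A rough sketch of the L1 versus smooth-L0 contrast discussed above. The exact approximation from the thread is not shown here, so the surrogate `x^2 / (x^2 + eps)` is an assumption chosen for illustration, and the `scale` and `alpha` settings mentioned in the tweet are not modelled. All function names are hypothetical.

```python
def l1_penalty(acts):
    """Sum of absolute activations: grows with total magnitude."""
    return sum(abs(a) for a in acts)

def l0_surrogate(acts, eps=1e-3):
    """Smooth stand-in for the count of non-zero activations.

    Assumed form: each term x^2 / (x^2 + eps) is ~0 near zero and ~1 for
    clearly active units, regardless of how large they are.
    """
    return sum((a * a) / (a * a + eps) for a in acts)

def sparsity_pct(acts, threshold=0.0):
    """Percentage of activations whose magnitude is at or below `threshold`."""
    inactive = sum(1 for a in acts if abs(a) <= threshold)
    return 100.0 * inactive / len(acts)

acts = [0.0, 0.0, 5.0, 0.0]
print(l1_penalty(acts))      # penalizes the magnitude of the one active unit
print(l0_surrogate(acts))    # close to 1: roughly "one unit is active"
print(sparsity_pct(acts))    # share of inactive units, in percent
```

The point of the contrast is that L1 keeps pushing large active units toward zero, while an L0-style penalty mostly cares whether a unit is on or off.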
@rami_mmo
Rami
2 months
just found this on a dataset i had,
Tweet media one
2
1
7
@rami_mmo
Rami
2 months
clip text and image embeddings show a similar reconstruction error trend,
Tweet media one
1
0
7
@rami_mmo
Rami
3 months
ok, i think i am allowed to share this, person_1, person_1_2 pictures 👇, i hold no opinion of rihanna, i'm just working...
Tweet media one
Tweet media two
@rami_mmo
Rami
3 months
Tweet media one
1
0
3
0
0
6
@rami_mmo
Rami
2 months
@cloneofsimo hmm, didn't the chameleon paper state this is generally unstable? they said there was a push and pull happening between the text and image modalities, each sabotaging the other by adjust its norm, they used a query-key normalization, how did they solve it?
1
0
7
@rami_mmo
Rami
5 months
@pmddomingos applies for you as well sire
0
0
7
@rami_mmo
Rami
3 months
Tweet media one
0
0
7
@rami_mmo
Rami
4 months
improved code is doing 5:1 compression ratio , i.e 197kb -> 39.4kb , about same as waggie :D
@MartinShkreli
Martin Shkreli
4 months
who has had luck with the neuralink challenge? i spent about 90 mins on it and get 80% deflation eg 197kb wav --> 40kb
23
2
114
1
1
6
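The two figures in this exchange (a 5:1 ratio and "80% deflation") describe nearly the same result in different units. A small worked check, using only the file sizes quoted in the tweets; the helper names are hypothetical.

```python
def compression_ratio(original_kb, compressed_kb):
    """How many times smaller the compressed file is (e.g. 5.0 means 5:1)."""
    return original_kb / compressed_kb

def size_reduction_pct(original_kb, compressed_kb):
    """Percentage of the original size that was removed."""
    return 100.0 * (1.0 - compressed_kb / original_kb)

# 197kb -> 39.4kb, as in the reply
print(compression_ratio(197, 39.4), size_reduction_pct(197, 39.4))

# 197kb -> 40kb, as in the quoted tweet
print(compression_ratio(197, 40), size_reduction_pct(197, 40))
```

A ratio of r:1 corresponds to a reduction of 100 * (1 - 1/r) percent, so 5:1 is an 80% reduction.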
@rami_mmo
Rami
3 months
i need more compute
1
0
6
@rami_mmo
Rami
4 months
I never thought parm would be a meme like this
@Nexuist
andi (e/alb)
4 months
ROON! YOU’RE ON THE WRONG ACCOUNT ROON! DON’T HIT RETWEET ON THAT ROON! ROOOONN
Tweet media one
Tweet media two
27
38
1K
0
0
6
@rami_mmo
Rami
1 month
I'm not e/acc, never been, nor am I stating kurzgesagt is a decel
@rami_mmo
Rami
1 month
kurzgesagt did indeed fall off, it used to be one of the small number of "scientific" channels I loved on YouTube, now watching their latest video in about a year, all I felt was cringe at the stupidity of the arguments raised, c'est la vie
0
0
7
0
0
5
@rami_mmo
Rami
2 months
why am i not seeing much work on letting diffusion models learn latent conditioning vectors by themselves, i'm seeing a lot of >3 y old work on GANs for that
1
0
6
@rami_mmo
Rami
3 months
super cool work, i made the VAE's latent space a bit sparse and got fun results, i didn't expect text at all to appear with a person's image above, so much potential here
Tweet media one
@Ethan_smith_20
Ethan (boston til 18th)
3 months
increased model dim to 512 from 256 and put it through another 50k steps.
Tweet media one
2
0
16
0
0
6
@rami_mmo
Rami
3 months
wait what?
Tweet media one
0
0
5
@rami_mmo
Rami
4 months
decoder now learning to decode the jointly learned embeddings into token space :D
Tweet media one
@rami_mmo
Rami
4 months
joint-embedding transformers are somewhat cool, blue is positive, orange is negative
Tweet media one
0
0
3
0
0
5
@rami_mmo
Rami
5 months
@mallocmyheart @mallocmyheart we gotta speak, like RN, I have something cool that worked on RL,
2
0
6
@rami_mmo
Rami
3 months
better naming, spfalse_lr0.0001 is gelu, leakyrelu(slope=0.02)^3 was best performing one
Tweet media one
@Ethan_smith_20
Ethan (boston til 18th)
3 months
some tests with a mini ViT on CIFAR10... So the relu-square hype seems pretty legit? this came out in 2022 and we all just totally overlooked it? Not only that... for shits and giggles tried relu^3 and leaky-relu^3 which both well outperform GELU and similarly to relu square.
Tweet media one
Tweet media two
10
2
87
0
0
6
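For reference, a scalar sketch of the activations named in this thread. The slope of 0.02 comes from the tweet; the function names are hypothetical, and a real model would apply these elementwise to tensors in whatever framework it uses.

```python
def relu_squared(x):
    """ReLU followed by squaring: 0 for x <= 0, x^2 otherwise."""
    return max(x, 0.0) ** 2

def leaky_relu_cubed(x, slope=0.02):
    """Leaky ReLU followed by cubing; cubing keeps the sign of negative inputs."""
    y = x if x >= 0 else slope * x
    return y ** 3

for x in (-1.0, 0.0, 0.5, 2.0):
    print(x, relu_squared(x), leaky_relu_cubed(x))
```

Unlike `relu_squared`, the cubed leaky variant still passes a small negative signal for negative inputs.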
@rami_mmo
Rami
4 months
@ICBMinvestments he must have a very flexible neck, what's his secret?
0
0
0
@rami_mmo
Rami
3 months
those silly times, 💎🙌 it is
Tweet media one
0
0
6
@rami_mmo
Rami
4 months
great motivator
@markgranza
Mark Granza
4 months
I often think about this.
Tweet media one
64
2K
23K
0
0
6
@rami_mmo
Rami
3 months
ahh, this is weird....
Tweet media one
2
1
6
@rami_mmo
Rami
3 months
@bubblebabyboi will miss your twitter antics bubble.... :( :( :(
1
0
6
@rami_mmo
Rami
5 months
@ryunuck I noticed something in the sequence, when you give Claude Opus a Latin-based sequence it simply rejects interacting with it by saying it could contain harmful content and it doesn't understand / doesn't want to process,
1
1
5
@rami_mmo
Rami
6 months
@ylecun @PicoPaco17 TIL some people don't have internal monologue:
@JoshWalkos
Champagne Joshi
6 months
This is a fascinating conversation with a girl who lacks an internal monologue. She articulates the experience quite well.
760
2K
12K
0
0
6
@rami_mmo
Rami
2 months
some deets, left is the top prob for sequence, right is how it's distributed over the batch
Tweet media one
Tweet media two
0
0
6
@rami_mmo
Rami
4 months
websim 👀
0
0
5
@rami_mmo
Rami
1 month
many congrats to @Ethan_smith_20 & team on joining canva!
@Ethan_smith_20
Ethan (boston til 18th)
1 month
I am so happy to announce today has joined the Canva family. It’s been one hell of a journey and I don’t think I could have imagined a better team to work alongside. I am absolutely psyched to be building at a scale I’ve only gotten to dream of, and
22
7
128
0
0
6
@rami_mmo
Rami
3 months
I've been very lucky to have the friends I do
0
0
5
@rami_mmo
Rami
4 months
I think VoiceEngine & the large-scale video compression network that was trained for SORA have been extensively used
@RobertHaisfield
Rob Haisfield
4 months
Native voice input instead of transcribed text means that the training data probably picked up more emotional intelligence. It’ll be really trippy hearing GPT-4o express what it feels. Sounds like it’s constantly excited, but what happens when it gets frustrated?
1
0
11
0
0
6
@rami_mmo
Rami
10 days
god damn, seeing CFG work outside of image generation is crazy
0
0
6
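A minimal sketch of the classifier-free guidance (CFG) combination step, which is not specific to images: it only needs a conditional and an unconditional prediction of the same shape. Predictions are plain lists here, and `cfg_combine` and `guidance_scale` are illustrative names rather than any particular library's API.

```python
def cfg_combine(uncond, cond, guidance_scale):
    """Extrapolate from the unconditional prediction toward the conditional one.

    guidance_scale = 0 returns the unconditional prediction,
    guidance_scale = 1 returns the conditional prediction,
    guidance_scale > 1 pushes further in the conditional direction.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same shape")
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]

uncond = [0.0, 1.0]
cond = [1.0, 3.0]
print(cfg_combine(uncond, cond, 2.0))
```

The same formula can be applied to noise predictions in diffusion or to logits in other generative models, which is part of why it transfers across modalities.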
@rami_mmo
Rami
3 months
side note: just found out the Arabic word for diffusion is "scattering" & "dispersion" , I find it interesting cause, in other languages I know of it's mostly something that's done by others (actions), but here it's defined as a natural process,
@fkasummer
AKHIL 🪡
3 months
diffusion still underrated
0
0
4
0
0
6
@rami_mmo
Rami
2 months
I love how absolutely random but relevant my twitter timeline has become
0
0
5
@rami_mmo
Rami
3 months
I wanna make chaotic diffusion 3
0
0
5
@rami_mmo
Rami
5 months
@tszzl Only show that talks about ML/AI, and then shows scenes writing loss functions / models. kudos whoever made that happen...
1
0
5
@rami_mmo
Rami
5 months
got 18/36
@mehran__jalali
Mehran Jalali
5 months
ChatGPT gets 24/36 on this test Higher than some people I know
Tweet media one
Tweet media two
2
0
64
0
0
5
@rami_mmo
Rami
3 months
@mallocmyheart pov: dataset bled and hayden hasn't realized it yet
1
0
5
@rami_mmo
Rami
3 months
why does conditioner application on diffusion networks have goldilock zones like this?
Tweet media one
1
0
5
@rami_mmo
Rami
5 months
Fear is the mind-killer. Fear is the little-death that brings total obliteration. I will face my fear. I will permit it to pass over me and through me.
1
0
5
@rami_mmo
Rami
3 months
idk how to interpret losses of diffusion models, they move so little over thousands of steps, i think this might have to do with the leniency they have, in theory for MSE, they can learn to displace the noise to less magnitude-sensitive parts of the output just to satisfy the loss..
Tweet media one
0
0
5
@rami_mmo
Rami
3 months
@teortaxesTex @Ethan_smith_20 lmao, yes, someone mentioned your post on the ml chat gc, i can add you if you want...
1
0
4
@rami_mmo
Rami
5 months
@emerywells i think all companies want to streamline, but it's not as easy as it's for a small company, i for ex. used to launch products with large companies, there's legal, marketing budgets, brand considerations, UX consideration, but trust me, everyone i worked with wants o
1
0
5