catid (e/acc)

@MrCatid

3,335
Followers
687
Following
1,360
Media
14,166
Statuses

Training the models. Senior ML Engineer @ Saronic. Prior: Anduril, Oculus VR, Game Closure, MSEE @GATech

Austin, TX
Joined July 2014
@MrCatid
catid (e/acc)
1 year
Can open-source LLMs detect bugs in C++ code? No:
LLaMa 65B (4-bit GPTQ) model: 1 false alarms in 15 good examples. Detects 0 of 13 bugs.
Baize 30B (8-bit) model: 0 false alarms in 15 good examples. Detects 1 of 13 bugs.
Galpaca 30B (8-bit) model: 0 false alarms in 15
37
156
1K
@MrCatid
catid (e/acc)
7 months
FYI the new CodeLLaMA 70B model refuses to produce code that generates prime numbers.. About 80% of the time it says your request is immoral and cannot be completed.
61
42
842
@MrCatid
catid (e/acc)
1 year
@kettlecorn It’s designed for capturing POV porn and playing it back privately but they couldn’t show that use case at a conference
6
6
319
@MrCatid
catid (e/acc)
8 months
Turns out that neuron activations are correlated, and fairly sparsely to the point where you can re-order the neurons and get a new correlation matrix like this with simulated annealing. Working on a much faster version of this algorithm since this takes a long time for 14k+
Tweet media one
15
23
298
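The neuron reordering described in the post above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the cost function (total absolute correlation between adjacent neurons), the swap move, the geometric cooling schedule, and the names `adjacency_cost` and `anneal_order` are all assumptions made for the example.

```python
import math
import random


def adjacency_cost(corr, order):
    """Hypothetical cost: negative sum of |correlation| between neighbours.

    Lower is better, so strongly correlated neurons end up next to each other.
    """
    return -sum(abs(corr[order[i]][order[i + 1]]) for i in range(len(order) - 1))


def anneal_order(corr, steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Search for a neuron ordering with simulated annealing (assumed setup)."""
    rng = random.Random(seed)
    n = len(corr)
    order = list(range(n))
    cost = adjacency_cost(corr, order)
    best_order, best_cost = order[:], cost
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        # Propose swapping two positions in the ordering.
        i, j = rng.sample(range(n), 2)
        order[i], order[j] = order[j], order[i]
        new_cost = adjacency_cost(corr, order)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
            cost = new_cost
            if cost < best_cost:
                best_order, best_cost = order[:], cost
        else:
            # Reject: undo the swap.
            order[i], order[j] = order[j], order[i]
    return best_order, best_cost
```

Recomputing the full cost on every step is what makes a naive version slow for thousands of neurons; a faster variant would update only the terms touched by the swap.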
@MrCatid
catid (e/acc)
9 months
@burkov When's the last time Google has shipped *anything*
18
3
259
@MrCatid
catid (e/acc)
5 months
@datarade Article here:
5
25
235
@MrCatid
catid (e/acc)
7 months
Google published a new method for Chain of Thought for LLMs called "Self-Discover", which they claim is an efficient improvement: But I could not find code for it so I implemented it here: Seems to work well with Miqu.
6
26
197
@MrCatid
catid (e/acc)
7 months
@rasbt The authors have not released code yet so I implemented it here: feel free to try fine tuning something with this layer
8
24
194
@MrCatid
catid (e/acc)
6 years
Thanks to the magic that is Zstd, I wrote a better photographic image lossless compression format than PNG in one evening. Compresses 16x faster, makes files 1/3 smaller, decompresses 5x faster: Only 500 lines of hackable code. Useful for anyone?
12
50
191
@MrCatid
catid (e/acc)
8 months
@AlexReibman @AGIHouseSF Okay hear me out, every party you've ever been to is a pre-NYE party
3
1
188
@MrCatid
catid (e/acc)
1 year
@sineatrix Rock floats!
Tweet media one
1
2
174
@MrCatid
catid (e/acc)
4 months
I want this new gpt2-chatbot to be from some independent homelab that used reinforcement learning to self-improve GPT-2 until it was better than GPT-4.
6
8
152
@MrCatid
catid (e/acc)
6 years
Yikes my brain is terrible about matching shadows with floating things. That's completely the wrong shadow shape, brain.
Tweet media one
11
17
149
@MrCatid
catid (e/acc)
1 year
@ID_AA_Carmack With GPT-4 I’m much more of a Unix guy because I can avoid learning syntax for lots of stuff like ansible etc that normally feels like too much yak shaving. Using a lot more Unixy tools now because they are easy for AI to automate
2
1
141
@MrCatid
catid (e/acc)
7 months
Tweet media one
2
1
141
@MrCatid
catid (e/acc)
7 months
@DermoreLEI A prime example of my depravity
1
0
138
@MrCatid
catid (e/acc)
1 year
@cjohndesign @ricburton It's almost like the signal to noise ratio on Twitter is terrible or something
1
0
137
@MrCatid
catid (e/acc)
1 year
@actually_lia TIL "critical period for social reward learning" means the mice prefer to be around other mice. Not even sure what that's measuring - maybe they're just feeling really high and scared and don't want to be alone?
3
0
134
@MrCatid
catid (e/acc)
7 months
@SustainableTall Okay I'm just gonna say the people in the photo are smiling. Like imagine if these folks were just enjoying a nice photo with their large and happy family. And then some dude on Twitter in 2024...
13
1
116
@MrCatid
catid (e/acc)
7 months
So, you know standard Facebook product
1
0
118
@MrCatid
catid (e/acc)
7 months
@gunsnrosesgirl3 And the reality, where the house is still just a far-off dream
Tweet media one
1
2
113
@MrCatid
catid (e/acc)
5 months
@XYHan_ I’m just happy when authors define the symbols
0
0
96
@MrCatid
catid (e/acc)
7 months
@arankomatsuzaki They find this simple prompt leads to a good classifier for training data: Ask-LLM prompt ### This is a pretraining …. datapoint. ### Does the previous paragraph demarcated within ### and ### contain informative signal for pre-training a large-language model? An informative
6
7
93
@MrCatid
catid (e/acc)
4 years
KinectBuddy: Wireless battery powered multi camera 3D video capture hardware prototype
Tweet media one
Tweet media two
Tweet media three
7
10
89
@MrCatid
catid (e/acc)
3 months
You may not like it, but this is what peak performance looks like
Tweet media one
7
0
88
@MrCatid
catid (e/acc)
4 years
Two dimensional media fits our brains better, because our eyes are effectively flat image sensors: We see in 2D, and only perceive 3Dness kind of vaguely. So I think user interfaces will be 2D even in VR for a long time
24
7
84
@MrCatid
catid (e/acc)
5 months
Updated my BitNet CPU project with inference speed estimation benchmark results on workstation Xeon processors: - It's about 40 tokens/second. AMD Ryzen 5 7535HS CPU achieves about 7.4 tokens/second for Gemma 3B.
8
19
83
@MrCatid
catid (e/acc)
1 year
@kelvindotchan I wrote an AST parser that breaks the file into single functions and wrote multi-shot prompts to lead LLMs towards rating the functions as buggy or not. The bugs were all trivial but required some sense of "understanding" code here:
3
5
83
@MrCatid
catid (e/acc)
3 years
From my perspective, Facebook has always been pushing unsuccessfully on a Metaverse concept (for ~5 years). What VR needs is a productivity use-case and to stop leaning on gaming. A revolving door of narcissistic designers pushing multi-user vanity projects? Meh.
15
3
78
@MrCatid
catid (e/acc)
7 months
Tweet media one
3
2
74
@MrCatid
catid (e/acc)
5 years
One of the unexpected parts of doing a dark horror escape room with our Corgi is realizing that the monsters doing the jump scares just looked like new people to her and got her waggingly excited while we screamed. I guess her whole life is weird humans doing weird stuff.
1
13
75
@MrCatid
catid (e/acc)
1 year
@alexjc (1) AI has nothing to do with what you're saying. (2) Meaning is not derived from rarity. (3) There's nothing inherently wrong with establishing a new genre. (4) Your takes are terrible.
6
2
73
@MrCatid
catid (e/acc)
3 months
Found this paper from Noam Shazeer this week that introduces ReLU^2 and d-conv 3x1 into transformer models to get 2x faster training: Not sure why I never see this as a baseline for comparison? e.g. Mamba papers only compare to T++. Seems legit.
5
12
73
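For readers unfamiliar with the two changes named in the post above, here is a small pure-Python sketch of a squared ReLU and a width-3 depthwise convolution along the sequence axis. The function names, the list-of-lists layout, and the choice to make the convolution causal (each step sees only itself and the two previous steps) are assumptions for illustration, not details taken from the post.

```python
def squared_relu(xs):
    """ReLU^2: zero for negative inputs, x squared for positive inputs."""
    return [max(0.0, x) ** 2 for x in xs]


def causal_depthwise_conv3(seq, kernels):
    """Width-3 depthwise convolution over time, one kernel per channel.

    seq:     list of T timesteps, each a list of C channel values.
    kernels: list of C kernels [k0, k1, k2], applied to x[t-2], x[t-1], x[t].
    Positions before the start of the sequence are treated as zero.
    """
    out = []
    for t in range(len(seq)):
        row = []
        for c, kernel in enumerate(kernels):
            acc = 0.0
            for k in range(3):
                src = t - 2 + k
                if src >= 0:
                    acc += kernel[k] * seq[src][c]
            row.append(acc)
        out.append(row)
    return out
```

"Depthwise" here means channels are never mixed: each channel is filtered by its own three weights, so the layer adds only 3 parameters per channel.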
@MrCatid
catid (e/acc)
5 months
It's a very appealing little hobby project to see how fast we can get this 8-bit x 1.5-bit BitLinear inference algorithm to run on CPU: The strangest idea I have is to generate C++ code that implements the math represented by the weights and have LLVM
3
10
68
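The appeal of the 8-bit x 1.5-bit kernel mentioned above is that ternary weights need no multiplications. A minimal sketch, assuming weights restricted to {-1, 0, +1} and integer activations; the absmean scaling in `quantize_ternary` follows my understanding of the BitNet b1.58 recipe and both function names are hypothetical:

```python
def quantize_ternary(row):
    """Map float weights to {-1, 0, +1} with a per-row absmean scale (assumed scheme)."""
    scale = sum(abs(w) for w in row) / len(row) or 1.0
    quantized = [round(max(-1.0, min(1.0, w / scale))) for w in row]
    return quantized, scale


def ternary_matvec(weights, activations):
    """Matrix-vector product using only adds and subtracts.

    weights:     rows of values in {-1, 0, +1}.
    activations: integers, e.g. int8 values in [-128, 127].
    """
    out = []
    for row in weights:
        acc = 0
        for w, x in zip(row, activations):
            if w == 1:
                acc += x
            elif w == -1:
                acc -= x
            # w == 0 contributes nothing and is skipped.
        out.append(acc)
    return out
```

A real CPU kernel would pack the weights into a few bits each and use SIMD; the point of the sketch is only that the inner loop reduces to conditional adds and subtracts.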
@MrCatid
catid (e/acc)
4 years
New open-source project available: :: XRmonitors : User-Friendly Virtual Multi-Monitors for the Workplace. Designed for Windows 10 Mixed Reality Headsets using OpenXR.
Tweet media one
2
16
66
@MrCatid
catid (e/acc)
1 year
@ID_AA_Carmack I think that in the short term AI + traditional video codecs is a powerful combination that can perform better than either one alone. AI is too slow at the edge, and video codecs don't leverage ML yet.
3
1
65
@MrCatid
catid (e/acc)
1 year
@BenGeskin Quick few refreshes to see if Erica Albright has accepted his friend request yet
3
0
62
@MrCatid
catid (e/acc)
8 months
@ZPostFacto If you think it's bad now, wait until they get desperate. People are going to start looking for an alternative to Gmail
1
1
64
@MrCatid
catid (e/acc)
7 months
@DataPlusEngine
DataVoid
7 months
for anyone who thinks i am making this up
2
0
12
0
2
61
@MrCatid
catid (e/acc)
1 year
@ylecun In my testing, it performs worse than Whisper for transcription to text, mis-hearing words and not hearing implied punctuation. Also it's about 10x slower than Faster-Whisper. Fairseq uses 20 GB of RAM, while Whisper uses about 1 GB. For these reasons and others this is
3
1
59
@MrCatid
catid (e/acc)
9 months
@MistralAI So proud of my nerd bros in the USA right now
Tweet media one
5
1
50
@MrCatid
catid (e/acc)
3 years
Lynx! It’s good! The lenses are weirdly shaped but do not present irritating artifacts. It’s a lot of value for $600
Tweet media one
1
2
53
@MrCatid
catid (e/acc)
1 year
@catalinmpit I mean, if he can't do it then I wouldn't hire him personally. People who forget how to do actual work aren't that useful in real companies that have real things to get done. You'd be surprised how many "senior" engineers I interview who can barely write nested for loops.
18
1
51
@MrCatid
catid (e/acc)
9 months
@localghost Wait until you see what AI assisted hacking is like
3
1
47
@MrCatid
catid (e/acc)
6 years
Excellent blog post from Oculus exploring different dithering methods:
1
11
50
@MrCatid
catid (e/acc)
7 months
@aoighost Yeah. It is RLHF'd to hell and suffers from mode collapse. It even refuses to explain how to make a sandwich and says you're trying to make a bomb.
1
2
48
@MrCatid
catid (e/acc)
2 years
@ai_fast_track has all of this stuff, plus it does outpainting and allows you to blend together photos, which is huge for AI generated art since it allows you to fix images that are clipped on the edges, and allows you to seamlessly combine characters into larger scenes.
0
5
48
@MrCatid
catid (e/acc)
1 year
@abettertake Should be a popular react plugin that defaults to geoip or something
0
0
47
@MrCatid
catid (e/acc)
5 years
Multi-camera #azurekinect volumetric capture/record/replay tool is up on GitHub: The color and extrinsics calibration and mesh cleanup code lives in the depth_mesh subproject
1
19
48
@MrCatid
catid (e/acc)
1 year
@alexkaplan0 Let’s wait for someone to replicate it..
1
0
47
@MrCatid
catid (e/acc)
1 year
@abacaj Hand surrealism was a moment in AI art history
Tweet media one
1
2
46
@MrCatid
catid (e/acc)
8 months
What this means is that neuron activation is strongly predicted by their neighbors. This result is something that has been noted in one paper so far (MoEfication), though I disagree with their conclusion that MoE is the best way to take advantage of it. My observation is that
7
0
44
@MrCatid
catid (e/acc)
1 year
@danmurrayserter Except what he’s actually doing is trying to push through legislation to make it harder for his competitors. He’s pushing for a $250K tax on other businesses
2
0
42
@MrCatid
catid (e/acc)
1 year
@AfricanBackpker @GMHikaru There’s not a huge gap to second tho
4
0
40
@MrCatid
catid (e/acc)
2 years
@fatlimey Everyone has this afaik. It's due to our blue receptors being off-center in our eyes. Normally our brain corrects for this issue "in software."
2
1
42
@MrCatid
catid (e/acc)
1 year
@UFO_Rabbit_Hole First result I got from SDXL 1.0:
Tweet media one
1
3
41
@MrCatid
catid (e/acc)
1 year
@flightclubio A starlink for every man, woman, and child in Zion.
0
2
39
@MrCatid
catid (e/acc)
1 year
@BrokeAssStuart @caterywta E Honda did it better
1
0
38
@MrCatid
catid (e/acc)
7 months
@captgouda24 Or.. maybe it's just correct?
1
0
41
@MrCatid
catid (e/acc)
6 months
0
2
41
@MrCatid
catid (e/acc)
4 months
Also in my fantasy version of this timeline, it's built on Karpathy's minGPT in under 1000 lines or something
1
1
39
@MrCatid
catid (e/acc)
1 year
@aldo_tobing @labnol Fira Code is my preference and seems much better
5
1
40
@MrCatid
catid (e/acc)
2 years
@izzyz Love that it runs significantly better (like 3x) when the physics based modeling is used as an input. The future is a marriage between classical algorithms and learned algorithms
1
3
40
@MrCatid
catid (e/acc)
10 months
Code is up now for the 255x faster implementation of BERT, which might revolutionize.. everything? I mean if LLMs become as cheap as serving a webpage we're going to be doing a lot of CoT and brute-forcing to solve problems:
4
12
39
@MrCatid
catid (e/acc)
7 months
@genecodex oh geeze
1
0
39
@MrCatid
catid (e/acc)
4 months
@liambolling @joshtwoodward gpt-4o $5.00/$15.00/1M tokens Groq Llama 3 70B: $0.59/$0.79/1M tokens
2
1
39
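To make the price comparison above concrete, here is a small cost calculator. It assumes the two figures for each model are the input and output prices per 1M tokens, in that order; the dictionary keys and the function name `request_cost` are made up for the example.

```python
# Prices in USD per 1M tokens as quoted in the post: (input, output).
# Reading the pair as input/output is an assumption.
PRICES_PER_MILLION = {
    "gpt-4o": (5.00, 15.00),
    "groq-llama-3-70b": (0.59, 0.79),
}


def request_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one request under the assumed price table."""
    in_price, out_price = PRICES_PER_MILLION[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

Under this reading, a workload of 1M input tokens plus 1M output tokens comes to $20.00 on gpt-4o versus $1.38 on Groq Llama 3 70B.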
@MrCatid
catid (e/acc)
8 months
@thenetrunna @rabbit_hmi I think it's because they want to be the OS to drive app interactions but they don't have a partnership with Apple or the APIs are not available. One way that a company could get around this is by providing a version of Chrome browser with AI capabilities.
2
0
38
@MrCatid
catid (e/acc)
4 years
Programmable Halloween costume! XD
Tweet media one
Tweet media two
1
5
36
@MrCatid
catid (e/acc)
1 year
@SmokeAwayyy ROFL typical Media thing - always run the worst take to sell copies
1
0
38
@MrCatid
catid (e/acc)
11 months
@ID_AA_Carmack The main benefit I’ve seen from pass through is for less friction. It lets you interact with things without taking the headset off. Still lots of friction to remove!
2
0
35
@MrCatid
catid (e/acc)
1 year
Some interesting effects of GPT-4:
Tweet media one
2
5
38
@MrCatid
catid (e/acc)
1 year
@alyssamvance "CNG Bio is an established company based on the CNG farming/agricultural association, which is the parent company of CNG Bio. It helps to ease access in the relationship between producers and customers" - Research paid for by the marketing arm of a farming association
1
0
36
@MrCatid
catid (e/acc)
3 years
Throwback to that time I went backstage with Trent Reznor while working at Oculus when it was a startup
Tweet media one
1
1
36
@MrCatid
catid (e/acc)
2 years
For NeRF, I don't see any way to innovate faster or better than what people are already pushing on. Looks like everything is converging quickly. @LumaLabsAI is probably the company to join if you're excited about this technology.
3
1
36
@MrCatid
catid (e/acc)
7 months
@captgouda24 But.. The graph literally shows a huge benefit for an additional year of schooling. Almost no downside. Not sure what other conclusion you can draw.
1
0
35
@MrCatid
catid (e/acc)
1 year
@DrJimFan I really wish authors would compare with modern compression schemes instead of always comparing with PNG, which is so easy to beat I did it in a page of C code here:
1
1
33
@MrCatid
catid (e/acc)
3 years
RIP @PaulPedriana one of the best of us. Every interaction with him was inspirational at Oculus. Usually the grey beards don't give new engineers respect and space, but I always felt like he wanted to give me room to grow. He worked directly on APIs and other core VR efforts
6
2
34
@MrCatid
catid (e/acc)
1 year
@tonyzzhao @Tesla @agilityrobotics I'd strongly recommend being suspicious of any claims except that they're willing to sell it to you
2
0
34
@MrCatid
catid (e/acc)
7 months
@brilliantlabsAR All CG renders and accepting money now? Stay away folks until it's real. They're gonna be working on this using your money for years if they don't have a partial prototype to show off.
3
0
33
@MrCatid
catid (e/acc)
3 years
Tweet media one
1
1
29
@MrCatid
catid (e/acc)
6 months
@ID_AA_Carmack @BeatSaber It's a bad feeling to be "too good" at a game you just want to play with your friend and have to apologize for making them feel bad etc. The world needs more co-op games.
0
0
31
@MrCatid
catid (e/acc)
1 year
Looked around for a TTS engine that can run locally and produce good sounding results. Found one that is super simple to set up, runs fast locally, and sounds great:
8
4
32
@MrCatid
catid (e/acc)
5 years
Just released a new open-source library called Zdepth for streaming depth compression for the Azure Kinect DK: Some example benchmarks in the Readme.
0
10
31
@MrCatid
catid (e/acc)
1 year
@rcbregman Seems pretty similar to 2015 numbers back at the last solar activity peak, so I'd guess the extra surface temperature is mainly due to extra solar activity, though it does seem to be slightly worse due to other factors as well.
19
0
29
@MrCatid
catid (e/acc)
4 months
This fork of exllamav2 repeats layers in the same way as the popular 120B "repeat" version of L3-70B: This allows it to fit on consumer GPUs instead of requiring ridiculous VRAM. Currently working on eval to check the hype
2
1
30
@MrCatid
catid (e/acc)
1 year
@bilawalsidhu Someone made a version of it that fits on one GPU:
1
3
27
@MrCatid
catid (e/acc)
6 months
@0interestrates They acquired the GPUs not the company lol
1
0
30
@MrCatid
catid (e/acc)
1 year
@tprstly Yes GPT-4 is obviating a lot of ML startups that did not execute as fast or as well. This is normal and doesn’t indicate a larger trend. It’s the normal risk of incorporating research into your product
1
1
28
@MrCatid
catid (e/acc)
1 year
Wrote a blog post detailing what I've been working towards for the past few months: Training my own super-resolution model at home and getting it to run on Intel iGPU! The code for my project is open-source here: Next I'll be
2
2
30
@MrCatid
catid (e/acc)
5 months
@daniel_nguyenx @LBacaj Not sure about "many" because.. like if 9 more people did it they'd have literally all the money in the country.
1
0
29
@MrCatid
catid (e/acc)
2 months
I've decided to focus full-time on machine learning projects for a while since there are a lot of exciting things to work on and not enough time for a day job on top of this. Currently looking at async heterogeneous training, meta-learning, and some other fruitful areas.
1
0
28
@MrCatid
catid (e/acc)
1 year
@tegmark This is a textbook example of a poor experiment design, where the respondent is prompted with a very biased rhetorical question that only presents one point of view
1
1
27
@MrCatid
catid (e/acc)
2 years
Developed a new super fast 16-bit near-lossless compression algorithm for storing and transmitting depth/map/HDR video that's suitable for machine learning training. Hoping that work will let me blog about it, we'll see
7
0
27
@MrCatid
catid (e/acc)
7 months
The Spectral State Space models paper is indeed very exciting, primarily because the learning rate is 10x that of existing recurrent networks and seems similar to human learning rates:
1
1
26
@MrCatid
catid (e/acc)
4 years
@vikushavas Ma’am this is a Taco Bell
0
0
27
@MrCatid
catid (e/acc)
5 years
Tweet media one
0
0
27