Kuter Dinel

@KuterDinel

1,685
Followers
727
Following
37
Media
321
Statuses

2022 SWE Intern at @Google. I like to tweet about things I learn, observe, and think about.

Turkey
Joined November 2022
Pinned Tweet
@KuterDinel
Kuter Dinel
19 days
Here is the NVIDIA 4090 (sm89) instruction set. Please share.
32
240
2K
@KuterDinel
Kuter Dinel
19 days
FYI. I am interested in low-level GPU programming job opportunities.
11
11
523
@KuterDinel
Kuter Dinel
19 days
Here is the RTX 4090 ISA spec. Please retweet. Accidentally deleted the last tweet 🫠
3
62
295
@KuterDinel
Kuter Dinel
4 months
@thecaptain_nemo An animal caught in a trap will gnaw off its own leg to escape. What will you do?
0
2
208
@KuterDinel
Kuter Dinel
16 days
Thanks a lot for the attention. Several companies reached out. Taking a small break from the NVIDIA RE project to consider different options.
2
2
182
@KuterDinel
Kuter Dinel
20 days
I published the machine readable ISA for NVIDIA Hopper GPUs. We are currently at 1505 instructions. There are still some others that I will add.
2
9
97
@KuterDinel
Kuter Dinel
9 months
@iammemeloper Real programmers use punch cards
Tweet media one
0
3
73
@KuterDinel
Kuter Dinel
18 days
In case you missed the previous tweet, here are the ISA docs for NVIDIA Hopper.
3
4
62
@KuterDinel
Kuter Dinel
6 months
@0xjprx Wow, really cool that it will go to pass through when the OS crashes. Is this because the r1 chip handles pass through and compositing ?
1
0
56
@KuterDinel
Kuter Dinel
5 months
@wasphyxiation Thou shalt not make a machine in the likeness of a human mind.
0
0
52
@KuterDinel
Kuter Dinel
10 months
@coffeebreak_YT @innercitypress Next Letter: Dear Judge Kaplan, Due to Mr. Bankman-Fried's lack of access to League Of Legends he has not been able to concentrate at the level he ordinarily would.
1
0
45
@KuterDinel
Kuter Dinel
23 days
Here is the preview for the Nvidia SASS ISA docs I had been working on for the last month. Please share since I don't have much reach on this platform.
2
10
44
@KuterDinel
Kuter Dinel
10 months
@LiveOverflow Oof this hits too close, I spent way too much time digging through V8's source code trying to find anything. Nothing found yet, .... but one day I will.
1
0
28
@KuterDinel
Kuter Dinel
6 months
@RyanMorey @0xjprx I think the real reason is to minimize pass through latency.
2
0
25
@KuterDinel
Kuter Dinel
9 months
@shxf0072 @OpenAI Not your GPU, not your assistant.
1
0
22
@KuterDinel
Kuter Dinel
9 months
@felix_red_panda If true, this has important implications for on-device inference. We can all have GPT-3 level models running offline in a few years. The future is exciting!
3
0
20
@KuterDinel
Kuter Dinel
19 days
Still need to measure instruction latencies to be able to create a high-performance compiler. Which GPU should I prioritize?
RTX4090
172
Hopper H100
60
3
0
21
@KuterDinel
Kuter Dinel
10 months
@durreadan01 I mean, it can be fun as a gimmick to try. But I don't think there are many active users of these apps. I remember making a similar app in high school when iPhone X came out. Most people download these apps play around a little bit and then move on.
1
0
17
@KuterDinel
Kuter Dinel
10 months
@trunarla This is not specific to JS. Just how floating point numbers work.
Tweet media one
2
0
15
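The image attached to this reply is not preserved here. As an illustration only, assuming the thread was about the classic `0.1 + 0.2` surprise, the same IEEE 754 behavior can be reproduced outside JavaScript. A minimal Python sketch:

```python
from decimal import Decimal
import math

# 0.1 and 0.2 have no exact binary64 representation, so their sum
# is not exactly the double closest to 0.3. This holds in any language
# that uses IEEE 754 doubles, not only JavaScript.
total = 0.1 + 0.2
is_exact = total == 0.3

# Decimal(float) exposes the exact value actually stored for 0.1.
exact_tenth = Decimal(0.1)

# Comparing with a tolerance is the usual workaround.
is_close = math.isclose(total, 0.3)

print(is_exact, is_close)
print(exact_tenth)
```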
@KuterDinel
Kuter Dinel
5 months
@natolambert 'Twitter for h100' 😅
0
0
14
@KuterDinel
Kuter Dinel
6 months
@prerationalist DALL-E is trained on data scraped from the internet. It learns to generate an image for a given caption. You usually don't write "no elephants" caption in a random photo. People who are sarcastic on images containing elephants might. DALL-E is just "interpolating" its dataset.
2
0
13
@KuterDinel
Kuter Dinel
10 months
@RenwaX23 It's more exciting to hack something that wasn't meant to be hacked !
1
0
13
@KuterDinel
Kuter Dinel
6 months
@heyeaslo fixed it for you
Tweet media one
0
0
13
@KuterDinel
Kuter Dinel
22 days
@cis_female Yes, my method is based on the algorithm in this paper. However, I made many improvements to the method. My ultimate goal is to create a Python DSL where you write near-assembly low-level code (with simple instruction selection, register allocation, etc.).
1
0
13
@KuterDinel
Kuter Dinel
10 months
@molly0xFFF I wonder if we will ever see the whole FTX code base.
1
0
11
@KuterDinel
Kuter Dinel
4 months
@Grady_Booch @brent_alvord @OpenAI @Microsoft It took 6 million years to get to the modern human. That's a lot of hyperparameter optimization.
0
0
11
@KuterDinel
Kuter Dinel
16 days
@sahir2k Epic paper btw. Must read if you want to do anything with SASS
0
0
12
@KuterDinel
Kuter Dinel
6 months
@liz_love_lace Vision Asahi linux when ?
0
0
12
@KuterDinel
Kuter Dinel
6 months
@prerationalist This is because of how GPT-4 generates the prompt for DALL-E. I guess there was no verbal logic in the training dataset for generating DALL-E prompts. Should be an easy fix for OA.
4
0
11
@KuterDinel
Kuter Dinel
5 months
@adithyashreshti @sama Probably some sort of attention leak from the prompt, "hamster on its back". Similar things happen with image generation where you ask for a "person with blue eyes" and the model generates a person with blue clothes.
1
0
11
@KuterDinel
Kuter Dinel
10 months
@t3dotgg Created by Fabrice Bellard. He also created QEMU. Oh forgot to mention, he invented a new way to calculate digits of PI.
2
0
11
@KuterDinel
Kuter Dinel
6 months
@SethBling This is amazing. Do you use command blocks for the calculations? I remember seeing your BASIC interpreter in Minecraft video. Also, if you are using command blocks, how do you program them?
1
0
10
@KuterDinel
Kuter Dinel
22 days
If you are interested in SASS make sure to take a look at the amazing work done by Nouveau developers
0
0
9
@KuterDinel
Kuter Dinel
9 months
@felix_red_panda The comment for retraction is this: `There are some errors in the paper and we need to retract it`. Let's wait if they will publish an updated version.
0
0
9
@KuterDinel
Kuter Dinel
21 days
I have two different ideas for figuring out instruction latencies. Option A) Analyze stall values in pre-compiled binaries. Option B) Create custom SASS sequences and increase the stall count until we get the correct value. I think I am going to try option A first.
1
0
9
@KuterDinel
Kuter Dinel
10 months
@an0n_r0 Wow, doesn't this have huge implications now that we have audio deepfakes? There were already a few instances of audio deepfakes being used for phishing, such as the Retool attack. But if you combine it with caller ID spoofing, most people would fall for it IMO.
0
0
9
@KuterDinel
Kuter Dinel
10 months
@LiveOverflow My goal in life is to become an important enough person that agencies/threat actors use 0-days on me!
1
1
9
@KuterDinel
Kuter Dinel
10 days
We will figure out instruction latencies by analyzing delays between data-dependent instructions. We need to look at anti-dependencies as well.
Tweet media one
0
0
9
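The idea in the tweet above can be sketched roughly as follows. This is not the author's tool: the `Inst` record, the register names, and the stall values are all made up for illustration, and the sketch only handles read-after-write dependencies (no anti-dependencies, no barriers for variable-latency instructions). It assumes that the compiler never schedules a consumer before its producer's result is ready, so the smallest observed issue-cycle gap between a producer and its first consumer is an upper bound on that opcode's latency.

```python
from dataclasses import dataclass

@dataclass
class Inst:
    opcode: str
    dst: tuple    # registers written (hypothetical format)
    srcs: tuple   # registers read
    stall: int    # cycles to wait before issuing the next instruction

def latency_upper_bounds(program):
    """Smallest producer-to-consumer issue gap seen for each producer opcode."""
    last_writer = {}   # register -> (instruction index, opcode)
    issue_cycle = []   # issue cycle of each instruction
    bounds = {}
    cycle = 0
    for i, inst in enumerate(program):
        issue_cycle.append(cycle)
        # Read-after-write: this instruction consumes earlier results.
        for reg in inst.srcs:
            if reg in last_writer:
                j, op = last_writer[reg]
                gap = cycle - issue_cycle[j]
                bounds[op] = min(bounds.get(op, gap), gap)
        for reg in inst.dst:
            last_writer[reg] = (i, inst.opcode)
        cycle += inst.stall
    return bounds

# Toy instruction stream with invented stall counts.
program = [
    Inst("FADD", ("R0",), ("R1", "R2"), 1),
    Inst("IMAD", ("R3",), ("R4", "R5"), 4),
    Inst("FMUL", ("R6",), ("R0", "R3"), 4),
    Inst("FADD", ("R7",), ("R6", "R6"), 4),
    Inst("MOV",  ("R8",), ("R7",), 1),
]
bounds = latency_upper_bounds(program)
print(bounds)
```

Run over many pre-compiled kernels, the per-opcode minimum would tighten toward the latency the scheduler actually assumes, which is the appeal of analyzing existing binaries first.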
@KuterDinel
Kuter Dinel
6 months
@fncischen Does the automatic inter puppylery ... ehm interpupillary distance adjustment work on dogs ?
0
0
8
@KuterDinel
Kuter Dinel
21 days
To build a high-performance compiler targeting SASS directly, we need to know precise instruction latencies and throughput so that we have as few pipeline stalls as possible and avoid data hazards.
0
0
8
@KuterDinel
Kuter Dinel
10 months
@flyingcircuits Thanks for your response. I hope your data will be safely recovered. Maybe for future missions, it may make sense to have 2 SD cards in raid-1 configuration, ideally 3 with one remaining in the space station as last resort.
1
0
7
@KuterDinel
Kuter Dinel
1 month
Still working on automatically generating documentation for the NVIDIA instruction set architecture. Here are some instructions. Next: Modifier splitting and enumeration.
Tweet media one
0
0
6
@KuterDinel
Kuter Dinel
5 months
@RemiCadene @Tesla Excited to see what kind of robots huggingface will build!
1
0
6
@KuterDinel
Kuter Dinel
4 months
@dingboard_ Depth and RGB channels are not perfectly aligned, but here is a 3D visualization (left is Marigold). Depth Anything looks a bit more accurate, but I guess it depends on the application. Made with
Tweet media one
Tweet media two
1
0
6
@KuterDinel
Kuter Dinel
9 months
@ValdikSS This is why we need end to end encryption.
1
0
6
@KuterDinel
Kuter Dinel
6 months
@CanadaHonk When I was reading V8's source code, I had the same idea. Programmer annotates types in TS -> compilation to JS strips the type info -> V8 needs to deduce/collect stats on types to produce good machine code. And since V8 is speculating, the code might get deoptimized.
1
0
2
@KuterDinel
Kuter Dinel
10 months
@LiveOverflow I was an intern last summer, couldn't get a full-time offer from them since they are not really hiring graduate SWEs in Europe. Also check this out, this guy got root on Google machines by name squatting PyPI packages.
0
0
6
@KuterDinel
Kuter Dinel
5 months
@__tinygrad__ Here is what Gemini generated.
Tweet media one
2
0
6
@KuterDinel
Kuter Dinel
10 months
@ctjlewis @huggingface They are letting you use a GPU for free, which must be costing them so much. Would love to know what huggingface's burn rate is. Time will show if their strategy will work.
1
0
5
@KuterDinel
Kuter Dinel
25 days
Investigating the Warpgroup MMA (Matrix Multiply Accumulate, aka Tensor Core) instructions Nvidia added with the Hopper architecture. Wonder what `gdesc` is; maybe it's a global descriptor?
Tweet media one
1
0
5
@KuterDinel
Kuter Dinel
5 months
@SethBling I want to see Angry Birds in Minecraft.
0
0
5
@KuterDinel
Kuter Dinel
6 months
@mayfer If you are just going to search words, maybe try word2vec? It's old but really good. If you compute `vector("King") - vector("Man") + vector("Woman")`, the closest vector is the one for Queen.
1
0
5
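The analogy arithmetic from the tweet above can be illustrated with a toy example. The two-dimensional vectors below are hand-made for this sketch (roughly "royalty" and "masculinity" axes) and are not real word2vec embeddings; a trained model would have hundreds of dimensions. Excluding the three query words from the candidates is the usual convention for this kind of lookup.

```python
import numpy as np

# Hand-made toy embeddings, NOT trained word2vec vectors.
vectors = {
    "king":  np.array([0.9, 0.9]),
    "man":   np.array([0.1, 0.9]),
    "woman": np.array([0.1, 0.1]),
    "queen": np.array([0.9, 0.1]),
    "apple": np.array([0.0, 0.5]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c):
    """Return the word closest to vector(a) - vector(b) + vector(c)."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

nearest = analogy("king", "man", "woman")
print(nearest)
```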
@KuterDinel
Kuter Dinel
10 months
@AutismCapital Will he ask to play League Of Legends during the trial as well ?
1
0
5
@KuterDinel
Kuter Dinel
8 months
@LiveOverflow I was thinking about the same thing ! Even just cleaning up decompiler output would be useful.
0
0
3
@KuterDinel
Kuter Dinel
21 days
ptxas is basically a small compiler. It tries to select uniform data path instructions if it can prove that the value is going to be the same (uniform) across the warp. AMD RDNA3 has something similar and calls it 'Scalar ALU instructions', I believe.
0
0
4
@KuterDinel
Kuter Dinel
2 months
@lafaiel I wonder how much transistor budget is spent on decoding x86 instructions to μops.
1
0
2
@KuterDinel
Kuter Dinel
4 months
@Teknium1 I think the ui/ux wasn't done very well. I was a GPT4 user and didn't realize it actually was shipped for a long time.
0
0
4
@KuterDinel
Kuter Dinel
6 months
@far__el Is showing off nvidia-smi outputs the new trend after showing off MRR? Jokes aside, may I have some of that GPU power, kind sir? Just a hundred teraflops.
0
0
4
@KuterDinel
Kuter Dinel
3 months
My dream is to build software that everyone in the world will enjoy.
Tweet media one
0
1
4
@KuterDinel
Kuter Dinel
6 months
@Mrwhosetheboss As a linux user, I feel tempted by the new Macbooks.
1
0
3
@KuterDinel
Kuter Dinel
10 months
@flyingcircuits Oh no, so sorry to hear about this. Also, I am curious why a SD card is used instead of satellite down link.
1
0
4
@KuterDinel
Kuter Dinel
21 days
There is also this paper that uses custom PTX to measure latencies. But it looks like they confused uniform and regular data instructions in SASS.
Tweet media one
1
0
4
@KuterDinel
Kuter Dinel
5 months
@alexkoch_ai The next step is to make the robots self replicate 😅
0
0
4
@KuterDinel
Kuter Dinel
6 months
Drinking coffee at 2 am, will implement LLM inference in numpy. Grind is 4ever
1
0
2
@KuterDinel
Kuter Dinel
5 months
@atc1441 @lozaning Make it run Doom!
0
0
3
@KuterDinel
Kuter Dinel
10 months
I am writing a tutorial on building a small jit compiler that compiles a small subset of C into x64 machine code. Almost finished with it. Here is an example program.
Tweet media one
1
0
3
@KuterDinel
Kuter Dinel
6 months
I made a Python Bytecode and AST explorer. Kinda like godbolt compiler explorer, but for Python !
0
1
3
@KuterDinel
Kuter Dinel
25 days
Looking at the life range output from nvdisasm for the 64x8x16.F32 variant, the instruction reads and writes 4 GPRs per thread. For addressing gdesc 4 UGPRs are used ... interesting.
Tweet media one
1
0
3
@KuterDinel
Kuter Dinel
7 months
@mervenoyann @youraimarketer Here is a screenshot of one of the test prompts I used with the model that I LoRA fine-tuned on Turkish Airoboros data. The model correctly answers the question asked (I apologize for the red underlines). I tried a few other questions like this, and it answered those questions
Tweet media one
1
1
3
@KuterDinel
Kuter Dinel
5 months
@karpathy @obsdmd Have you considered emacs org mode? I don't feel comfortable using a closed source note taking program.
0
0
3
@KuterDinel
Kuter Dinel
9 months
@trashh_dev It would be more cursed if the query was a string. ``` document.query("select innerText from div where class='trash'"); ```
0
0
3
@KuterDinel
Kuter Dinel
9 months
@TakoTreba Hey maybe quick actions should have a prefix to avoid confusion. Maybe like `!github` or `/github`
2
0
3
@KuterDinel
Kuter Dinel
24 days
I integrated nvdisasm life range info to the html output. Here is the DFMA (double fused multiply add) instruction. As you see each operand reads/writes 2 registers.
Tweet media one
0
0
3
@KuterDinel
Kuter Dinel
1 month
Almost done with my nvdisasm fuzzer. Here is the recipe to encode the UTMASTG instruction.
Tweet media one
1
0
3
@KuterDinel
Kuter Dinel
5 months
Just as an ML model gives incorrect results for out-of-distribution samples, people often make incorrect (often negative) assumptions about things that are out of the ordinary for them.
0
0
1
@KuterDinel
Kuter Dinel
9 months
midnight art coding session.
Tweet media one
0
0
3
@KuterDinel
Kuter Dinel
9 months
@mayfer I believe GPT-3 uses learned token embeddings instead of one-hot. Essentially each word is a vector of n size. It can be interesting to interpolate different word vectors to get weird in-between words. Interpolating positional embeddings has been used for extending LLaMAs
1
0
3
@KuterDinel
Kuter Dinel
10 months
@BenThePearman Very cool, I think you should consider doing something similar with transposed convolution layers as well. Seeing the conv kernel move across the output to generate the image would be cool.
0
0
3
@KuterDinel
Kuter Dinel
10 months
After figuring out how to encode x64 instructions, arm64 feels like a breeze. Having 24-bit offsets for control flow instructions feels a little bit weird though.
0
0
3
@KuterDinel
Kuter Dinel
1 month
Note that this doesn't include the control code section for the instruction. Will add that later. Most (all?) warp-level instructions require that the read and write barriers are set to 7 (means disabled/unset). For more info on SASS control codes
0
1
3
@KuterDinel
Kuter Dinel
7 months
@youraimarketer Hey, there is also this dataset created by @mervenoyann. I am also experimenting with Turkish LLMs. I suspect Merve used Airoboros or something similar to generate this dataset. Airoboros data works well for making Mistral models speak Turkish.
2
0
3
@KuterDinel
Kuter Dinel
5 months
@shxf0072 I think what is crazier is the model is only trained with video and no action data! But it learns a `latent action model` and the actions are consistent across different generations.
0
0
1
@KuterDinel
Kuter Dinel
5 months
@AnanthVeluvali Maybe you need to pay for things like food & shelter?
0
0
2
@KuterDinel
Kuter Dinel
7 months
@grkn Hello. I have a similar story. I started programming at age 11. Since I wasn't a good student, Düzce University was where I could get in. I watched algorithms lectures on MIT OCW. I solved hundreds of LeetCode problems and did an internship at Google in my final year. Lately, layoffs etc.
1
0
2
@KuterDinel
Kuter Dinel
6 months
@johndmcmaster Microchip wafers look so aesthetic. I really want to have a few to frame and hang on a wall. The way they reflect light looks like peacock feathers or the morpho butterfly. I think the underlying physics is the same, nanostructures interfering with visible light (structural
0
0
2
@KuterDinel
Kuter Dinel
10 months
@DJSnM The announcer in the live stream said that the parachute deployed early as a smart decision. Here is the exact moment where this is said:
0
0
2
@KuterDinel
Kuter Dinel
7 months
@dingboard_ That doesn't sound right, how can you have x million users when it's invite only?
1
0
2
@KuterDinel
Kuter Dinel
6 months
@theapplehub Interesting camera alignment for the non pro iPhone 16. Is it like this to be able to capture "spatial video"?
0
0
2
@KuterDinel
Kuter Dinel
1 month
Need to run nvdisasm half a million times again.
0
0
2
@KuterDinel
Kuter Dinel
10 months
@sama I wonder if there are any plans to make GPT models directly process audio embeddings instead of just speech-to-text output, similar to how the image processing (probably) works.
0
0
0
@KuterDinel
Kuter Dinel
9 months
@mayfer Thanks. I didn't know that one-hot was used before the embedding process. Mathematically, multiplying a matrix with a one-hot vector is the same as fetching the row where the 'one' is located.
1
0
2
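The equivalence described in the tweet above is easy to check numerically. A minimal NumPy sketch, where the vocabulary size, embedding dimension, and token id are arbitrary values chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 5, 3

# Embedding matrix: one row per token in the vocabulary.
embedding = rng.normal(size=(vocab_size, dim))

token_id = 2
one_hot = np.zeros(vocab_size)
one_hot[token_id] = 1.0

# Multiplying the one-hot row vector by the matrix selects a single row...
via_matmul = one_hot @ embedding
# ...which is exactly what a direct index lookup returns.
via_lookup = embedding[token_id]

same = bool(np.allclose(via_matmul, via_lookup))
print(same)
```

This is why frameworks implement embedding layers as an index lookup: the result is the same as the matrix product, without materializing the one-hot vector.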