@coffeebreak_YT
@innercitypress
Next letter: Dear Judge Kaplan, due to Mr. Bankman-Fried's lack of access to League of Legends, he has not been able to concentrate at the level he ordinarily would.
Here is the preview for the Nvidia SASS ISA docs I have been working on for the last month. Please share, since I don't have much reach on this platform.
@LiveOverflow
Oof, this hits too close. I spent way too much time digging through V8's source code trying to find anything. Nothing found yet... but one day I will.
@felix_red_panda
If true, this has important implications for on-device inference. We could all have GPT-3-level models running offline in a few years. The future is exciting!
@durreadan01
I mean, it can be fun to try as a gimmick. But I don't think there are many active users of these apps. I remember making a similar app in high school when the iPhone X came out. Most people download these apps, play around a little bit, and then move on.
@prerationalist
DALL-E is trained on data scraped from the internet. It learns to generate an image for a given caption.
You usually don't write a "no elephants" caption on a random photo. People being sarcastic about images containing elephants might. DALL-E is just "interpolating" its dataset.
@cis_female
Yes, my method is based on the algorithm in this paper.
However, I made many improvements to it.
My ultimate goal is to create a Python DSL where you write near-assembly low-level code (with simple instruction selection, register allocation, etc.).
@prerationalist
This is because of how GPT-4 generates the prompt for DALL-E. I guess there was no verbal logic in the training dataset for generating DALL-E prompts. Should be an easy fix for OA.
@adithyashreshti
@sama
Probably some sort of attention leak from the prompt, "hamster on its back". Similar things happen with image generation where you ask for a "person with blue eyes" and the model generates a person with blue clothes.
@SethBling
This is amazing. Do you use command blocks for the calculations? I remember seeing your BASIC-interpreter-in-Minecraft video.
Also, if you are using command blocks, how do you program them?
@felix_red_panda
The comment for the retraction is this: `There are some errors in the paper and we need to retract it`.
Let's wait and see if they publish an updated version.
I have two different ideas for figuring out instruction latencies.
Option A) Analyze stall values in pre-compiled binaries.
Option B) Create custom SASS sequences and increase the stall count until we get the correct value.
I think I am going to try option A first.
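Option B could be sketched as a simple search. The checker below is entirely hypothetical: assume a `runs_correctly(stalls)` callback that assembles a SASS sequence with the given stall count, runs it, and reports whether the output is correct. The latency value in the toy stand-in is made up for illustration.

```python
def find_min_latency(runs_correctly, max_stalls=15):
    """Return the smallest stall count that still produces a correct result.

    The first correct stall count approximates the instruction's latency:
    fewer stall cycles means the dependent read happens before the result
    is ready, producing a wrong value.
    """
    for stalls in range(max_stalls + 1):
        if runs_correctly(stalls):
            return stalls
    return None  # no stall count in range gave a correct result

# Toy stand-in: pretend the instruction needs 6 stall cycles.
assert find_min_latency(lambda s: s >= 6) == 6
```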
@an0n_r0
Wow, doesn't this have huge implications now that we have audio deepfakes? There were already a few instances of audio deepfakes being used for phishing, such as the Retool attack. But if you combine it with caller ID spoofing, most people would fall for it IMO.
To build a high-performance compiler targeting SASS directly, we need to know precise instruction latencies and throughput, so we can have as few pipeline stalls as possible and avoid data hazards.
@flyingcircuits
Thanks for your response. I hope your data will be safely recovered. For future missions, it may make sense to have 2 SD cards in a RAID-1 configuration, ideally 3, with one remaining in the space station as a last resort.
Still working on automatically generating documentation for the NVIDIA instruction set architecture. Here are some instructions.
Next: Modifier splitting and enumeration.
@dingboard_
The depth and RGB channels are not perfectly aligned, but here is a 3D visualization (left is Marigold). Depth Anything looks a bit more accurate, but I guess it depends on the application. Made with
@CanadaHonk
When I was reading V8's source code, I had the same idea.
The programmer annotates types in TS -> compilation to JS strips the type info -> V8 needs to deduce/collect stats on types to produce good machine code. And since V8 is speculating, the code might get deoptimized.
@LiveOverflow
I was an intern last summer; I couldn't get a full-time offer from them since they are not really hiring graduate SWEs in Europe.
Also check this out: this guy got root on Google machines by name-squatting PyPI packages.
@ctjlewis
@huggingface
They are letting you use a GPU for free, which must be costing them so much. I would love to know what Hugging Face's burn rate is.
Time will tell if their strategy works.
Investigating the Warpgroup MMA (Matrix Multiply Accumulate, aka Tensor Core) instructions Nvidia added with the Hopper architecture. I wonder what `gdesc` is; maybe it's a global descriptor?
@mayfer
If you are just going to search words, maybe try word2vec? It's old but really good. If you compute `vector("King") - vector("Man") + vector("Woman")`, the closest vector is the one for "Queen".
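The analogy trick above is just vector arithmetic plus a nearest-neighbor search by cosine similarity. Here is a minimal sketch with made-up 2-D vectors (real word2vec embeddings are a few hundred dimensions; the numbers below are purely illustrative):

```python
import numpy as np

# Toy "embeddings" invented for illustration, not real word2vec vectors.
vec = {
    "king":  np.array([0.9, 0.8]),
    "man":   np.array([0.5, 0.1]),
    "woman": np.array([0.4, 0.9]),
    "queen": np.array([0.8, 1.6]),
    "apple": np.array([-0.7, 0.2]),
}

query = vec["king"] - vec["man"] + vec["woman"]

def nearest(q, words):
    # Pick the word whose vector has the highest cosine similarity to q.
    return max(words, key=lambda w: (q @ vec[w]) /
               (np.linalg.norm(q) * np.linalg.norm(vec[w])))

print(nearest(query, ["queen", "apple"]))  # queen (with these toy vectors)
```

With a trained model (e.g. via gensim) the same arithmetic runs over the real embedding table instead of this toy dictionary.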
ptxas is basically a small compiler. It tries to select uniform datapath instructions if it can prove that a value is going to be the same (uniform) across the warp.
AMD RDNA3 has something similar and calls it 'Scalar ALU instructions', I believe.
@far__el
Is showing off nvidia-smi outputs the new trend after showing off MRR?
Jokes aside, may I have some of that GPU power, kind sir? Just a hundred teraflops.
I am writing a tutorial on building a small JIT compiler that compiles a small subset of C into x64 machine code.
Almost finished with it. Here is an example program.
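The core mechanic of any JIT, independent of the tutorial's actual code, is writing machine-code bytes into an executable page and calling into them. A minimal Unix-only Python sketch (assumes an x86-64 host; the six bytes encode `mov eax, 42; ret`):

```python
import ctypes
import mmap

# x86-64 machine code for: mov eax, 42; ret
code = bytes([0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3])

# Allocate one anonymous page that is readable, writable, and executable.
buf = mmap.mmap(-1, mmap.PAGESIZE,
                prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
buf.write(code)

# Turn the page's address into a callable returning a C int.
addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
jit_fn = ctypes.CFUNCTYPE(ctypes.c_int)(addr)
```

A real JIT would emit these bytes from an IR instead of hardcoding them, but the mmap-then-cast step stays the same.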
Looking at the life range output from nvdisasm for the 64x8x16.F32 variant, the instruction reads and writes 4 GPRs per thread. For addressing via gdesc, 4 UGPRs are used... interesting.
@mervenoyann
@youraimarketer
Here is a screenshot of one of the test prompts I used with the model that I LoRA fine-tuned on Turkish Airoboros data. The model correctly answers the question asked (I apologize for the red underlines). I tried a few other questions like this, and it answered those questions
I integrated nvdisasm's life range info into the HTML output. Here is the DFMA (double fused multiply-add) instruction. As you can see, each operand reads/writes 2 registers.
Just as an ML model gives incorrect results for out-of-distribution samples, people often make incorrect (often negative) assumptions about things that are `out of the ordinary` for them.
@mayfer
I believe GPT-3 uses learned token embeddings instead of one-hot vectors. Essentially, each word is a vector of size n.
It could be interesting to interpolate different word vectors to get weird in-between words. Interpolating positional embeddings has been used for extending LLaMA's context.
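The position-interpolation idea boils down to rescaling: map the positions of a longer sequence back into the range the model saw during training. A toy sketch (real methods like LLaMA position interpolation operate on RoPE angles; the numbers here are illustrative only):

```python
import numpy as np

trained_len, target_len = 2048, 4096  # illustrative lengths

# Scale new, longer positions down into [0, trained_len), the range
# the model's positional embeddings were trained on.
positions = np.arange(target_len)
scaled = positions * (trained_len / target_len)
```

Every scaled position now lies inside the trained range, at the cost of positions being packed twice as densely.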
@BenThePearman
Very cool. I think you should consider doing something similar with transpose convolution layers as well. Seeing the conv kernel move across the output to generate the image would be cool.
After figuring out how to encode x64 instructions, arm64 feels like a breeze. Having 24-bit offsets for control flow instructions feels a little bit weird though.
Note that this doesn't include the control code section for the instruction. Will add that later.
Most (all?) warp-level instructions require that the read and write barriers are set to 7 (meaning disabled/unset).
For more info on SASS control codes
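For a rough idea of what these fields look like, here is a sketch of decoding a Maxwell/Pascal-style control word, assuming the bit layout reverse-engineered by the maxas project (low to high: 4-bit stall, 1-bit yield, 3-bit write barrier, 3-bit read barrier, 6-bit wait mask, 4-bit reuse flags). Volta and later pack these bits differently, so treat this as illustrative:

```python
def decode_control(word):
    """Split a 21-bit Maxwell/Pascal SASS control word into its fields,
    per the maxas-documented layout (assumption, see lead-in)."""
    return {
        "stall":     word         & 0xF,   # cycles to stall after issue
        "yield":     (word >> 4)  & 0x1,   # yield hint to the scheduler
        "write_bar": (word >> 5)  & 0x7,   # write barrier index, 7 = unset
        "read_bar":  (word >> 8)  & 0x7,   # read barrier index, 7 = unset
        "wait_mask": (word >> 11) & 0x3F,  # barriers to wait on
        "reuse":     (word >> 17) & 0xF,   # operand reuse cache flags
    }
```

With this layout, the "barriers set to 7" case above corresponds to both 3-bit barrier fields holding the value 7.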
@youraimarketer
Hey, there is also this dataset created by
@mervenoyann
. I am also experimenting with Turkish LLMs. I suspect Merve used Airoboros or something similar to it to generate this dataset.
Airoboros data works well for making Mistral models speak Turkish.
@shxf0072
I think what is crazier is that the model is trained only with video, no action data! Yet it learns a `latent action model`, and the actions are consistent across different generations.
@grkn
Hello. I have a similar story. I started programming at age 11. Since I wasn't a good student, I could only get into Düzce University. I watched algorithms courses through MIT OCW. I solved hundreds of LeetCode problems and interned at Google in my final year. Most recently, with the layoffs etc.
@johndmcmaster
Microchip wafers look so aesthetic. I really want to have a few to frame and hang on a wall.
The way they reflect light looks like peacock feathers or the morpho butterfly. I think the underlying physics is the same: nanostructures interfering with visible light (structural coloration).
@sama
I wonder if there are any plans to make GPT models directly process audio embeddings instead of just speech-to-text output, similar to how the image processing (probably) works.
@mayfer
Thanks. I didn't know that one-hot was used before the embedding process. Mathematically, multiplying a matrix with a one-hot vector is the same as fetching the row where the 'one' is located.
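That equivalence is easy to check numerically with a toy embedding table (the sizes below are arbitrary):

```python
import numpy as np

# Toy embedding table: 4 tokens, embedding dimension 3.
E = np.arange(12.0).reshape(4, 3)

# One-hot row vector selecting "token id 2".
one_hot = np.zeros(4)
one_hot[2] = 1.0

# The matmul picks out exactly one row of E...
row = one_hot @ E
# ...which is why frameworks implement embeddings as a table lookup (E[2])
# instead of an actual matrix multiplication.
```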