Meet Robo!
i'm in the very early stages of building an accessible, general-purpose robotics platform, and looking for a few passionate people to help shape it from the ground up. if you'd be excited about moving fast on a small, hands-on team I would love to chat -- my DMs are open
>be me 🤡
>impulse pre-order AI hardware wearable (Oct '23)
>get an email later that there won't be an Android release, "reply for refund"
>reply for refund
....
>months pass, no refund
>send three follow-up emails, nada
>DM the founder on X, crickets
>find out they spent $1.8M
I spent a couple months at the beginning of this year learning about GPU programming through trying to optimize inference for
@chichengcc
awesome Diffusion Policy paper. I was able to speed up inference for the denoising U-Net by ~3.4x over PyTorch eager mode and ~2.65x over
@AviSchiffmann
thanks! not assuming malintent, but others also mentioned they're missing refunds (
@chrisemmett_
,
@ignacioaal
,
@SoloOrTroll
). given the pivot in pricing/product, you should refund $501 to all 2023 buyers & also offer full refunds with an automated form to make this right
GPUs have a memory hierarchy which scales from large capacity, low bandwidth, high latency (global memory) to low capacity, high bandwidth, low latency (thread registers). The part of the chip that does the actual computation (FP32/INT8 units, tensor cores, etc) can do math
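The capacity/bandwidth tradeoff above is what the roofline model captures. Here's a toy sketch of that check in plain Python; the peak-FLOPs and bandwidth numbers are illustrative assumptions, not any specific GPU:

```python
# Toy roofline check: is a kernel memory-bound or compute-bound?
# Hardware numbers below are illustrative assumptions, not a real GPU's specs.
PEAK_FLOPS = 30e12   # 30 TFLOP/s of FP32 compute (assumed)
PEAK_BW = 900e9      # 900 GB/s of global-memory bandwidth (assumed)

def bound_by(flops, bytes_moved):
    """Compare a kernel's arithmetic intensity against the machine balance."""
    intensity = flops / bytes_moved   # FLOPs per byte of memory traffic
    ridge = PEAK_FLOPS / PEAK_BW      # machine balance point (~33 FLOP/byte here)
    return "compute" if intensity > ridge else "memory"

# Elementwise add: 1 FLOP per 12 bytes (two FP32 loads + one store)
print(bound_by(1, 12))      # memory
# A matmul tile with heavy register/shared-memory reuse: high intensity
print(bound_by(1024, 12))   # compute
```

The point: most simple kernels sit far below the ridge, so the math units spend their time waiting on global memory.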
The full blog post series is a lot more detailed and hopefully helpful for anyone interested in GPU kernel optimization!
You can find all relevant code for the blog post here:
If you’ve got an Nvidia GPU with CUDA & pytorch
There is a physical reason for why global memory access is so dang slow. Global memory stores bits in ‘DRAM cells’ which consist of a capacitor and a transistor (which gates access to the cap). Each access requires precharging a row-buffer to a neutral voltage, hooking up the row
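The precharge/activate dance means DRAM latency depends on access pattern. A toy row-buffer model (cycle counts are made-up illustrative numbers, not real DRAM timings):

```python
# Toy model of why DRAM latency varies: the row buffer.
# Cycle counts and row size are made-up illustrative numbers.
T_HIT = 20       # column access when the row is already open (assumed)
T_MISS = 60      # precharge + activate + column access (assumed)
ROW_SIZE = 1024  # addresses per DRAM row (assumed)

def access_latency(addresses):
    open_row, total = None, 0
    for addr in addresses:
        row = addr // ROW_SIZE
        if row == open_row:
            total += T_HIT    # row-buffer hit: no precharge needed
        else:
            total += T_MISS   # precharge the old row, activate the new one
            open_row = row
    return total

# Sequential (streaming) access mostly hits the open row...
print(access_latency(range(8)))   # 60 + 7*20 = 200
# ...while strided access that jumps rows pays the full penalty every time.
print(access_latency(range(0, 8 * ROW_SIZE, ROW_SIZE)))   # 8*60 = 480
```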
There isn’t really any hardware that maps to what one might imagine when hearing the term ‘CUDA core’. A core in CPU-land is far more capable than an FP32 ALU, which is what roughly maps to a ‘CUDA core’ in NVIDIA GPU-land. For fun, we can try to guess at how many transistors are
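One way to run that guess, dividing a die's transistor budget by its FP32 lane count. The figures below are rough public numbers for NVIDIA's GA102 die and should be treated as approximate:

```python
# Back-of-envelope: transistor budget per FP32 "CUDA core".
# Rough public figures for NVIDIA's GA102 die (treat as approximate).
TRANSISTORS = 28.3e9   # GA102 transistor count (approximate)
FP32_LANES = 10752     # FP32 lanes on a full GA102: 84 SMs x 128 (approximate)

per_lane = TRANSISTORS / FP32_LANES
print(f"{per_lane / 1e6:.1f}M transistors per lane")   # ~2.6M

# A bare 32-bit multiply/add datapath is on the order of tens of thousands
# of transistors, so the vast majority of the budget sits in caches,
# schedulers, tensor cores, and everything around the ALUs.
```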
@ChengleiSi
@tatsu_hashimoto
@Diyi_Yang
@stanfordnlp
this is a really cool study!
did the LLM ideas get run past a Google search? i see the novelty scores for LLM ideas are higher, but i've personally noticed that asking for novel ideas sometimes results in copy-pasta from the LLM from obscure blog posts/research papers
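A minimal sketch of the kind of check suggested here: flag an "idea" whose word n-grams heavily overlap an existing document. A real check would search the web; this just compares against a tiny made-up local corpus:

```python
# Toy novelty check via word n-gram overlap (illustrative only; a real
# pipeline would query a search engine instead of a local corpus).
def ngrams(text, n=3):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(idea, source, n=3):
    """Fraction of the idea's n-grams that also appear in the source."""
    a, b = ngrams(idea, n), ngrams(source, n)
    return len(a & b) / max(len(a), 1)

corpus = ["use retrieval augmented generation to ground model outputs"]
idea = "use retrieval augmented generation to ground novel outputs"
print(round(overlap(idea, corpus[0]), 2))   # 0.67 -> suspiciously high
```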
The GPU’s thread registers and L1/L2 caches, in contrast, are built from SRAM. SRAM cells consist of a 6-transistor flip-flop circuit, and because the only capacitance involved here is of the transistor gate, data can be accessed much more quickly. The downside of SRAM is it’s
@teortaxesTex
@yeswondwerr
he's still holding money/ignoring many people in the thread whose emails he just happened to not notice. this dude is next level lmao
@karpathy
thanks Andrej!
totally unrelated but I used to be on Battery Design and work out of Deer Creek. I remember recognizing you in the office on a few weekends but never got around to saying hi. it's really cool to interact with you over Twitter many years later :D
@TranBaoChi7
@chichengcc
This series is great! I read each section, tried to write the code myself for 15-20 mins, then peeked if I couldn't. After that you can go try to speed up your favorite ML paper. Hopefully my series is helpful with the latter part
@fleetwood___
almost thought I had a bug in my visualization script the first time I made this but nope. that really is ~300 cycles of memory access latency compared to 1 cycle of tensor core latency, lol
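That gap is what warp-level latency hiding exists to cover. A toy version of the arithmetic (all numbers are illustrative assumptions):

```python
# Toy latency-hiding arithmetic: with ~300 cycles of global-memory latency,
# how much independent work does the warp scheduler need in flight?
# Numbers are illustrative assumptions.
MEM_LATENCY = 300   # cycles for a global-memory load (assumed)
WORK_PER_WARP = 4   # cycles of useful math each warp issues between loads (assumed)

# Little's-law flavor: while one warp's load is in flight, other warps
# must fill those cycles with math to keep the execution units busy.
warps_needed = MEM_LATENCY // WORK_PER_WARP
print(warps_needed)   # 75
```

This is why occupancy matters: with too few resident warps, the scheduler runs out of independent work and the ALUs stall on memory.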
connectionists in 2010 - 'these symbolic AI people are silly, how do they expect to write down all the rules for reasoning'
connectionists in 2024 - 'quick, hire 1,000 PhDs to...write down all the rules for reasoning'
@jokrvivek
@realGeorgeHotz
imo the self-driving car analogy breaks down w.r.t. manipulation. robots have to intentionally change physical world state. with self-driving you can abstract away any tire/road interactions and explicitly avoid changing world state with the car body
@chris_j_paxton
Tangential to your original comment but I think their scaling efforts have a very different profile than big LLM labs. They have tremendous inference time constraints (100W of 2018 compute) so can't have very many params at all. Wish they published every now and then, I'm sure
@deftech_n
@_aidan_clark_
this point is underrated! self-driving car companies have to fight a borderline step-function in the value vs. success rate curve, with 0 value until many 9s of reliability. the curve is gentler for household/service robots
+1, the 'China just copies American tech' narrative is absolutely cope. I've found Chinese suppliers to be incredibly competent, offering manufacturing tech/process solutions that are just better than domestic counterparts. America has a lot of catching up to do - acceptance is
Yet another exceedingly bad China take by Noah Smith. I remain genuinely puzzled how anyone still gives the time of day to someone so consistently wrong 🤷
Here's why he's wrong here on Tesla in China 🧵
pip install 'torque-dense stator/rotor/geartrain, safe and energy-dense li-ion battery, power-efficient inference compute, high-stiffness structures to bring it all together, with a factory to build this reliably at scale'
All serious robotics companies are foundation model companies.
Acting intelligently in the physical space is an emergent property of a large audio visual language agent, and intelligence at the most fundamental level is the same whether its expression is digital or physical.
damn, new simulation theory just dropped (we're all stuck in a 1X world model roll-out from 2040 which was deployed to eval how well Neo can stack blocks)
@Dan_Jeffries1
it's surprisingly difficult. there was an OSS project called OpenWorm that tried to simulate C. elegans (a worm with 302 biological neurons) and it kinda flopped
The sense of touch is fundamental to how we interact with the world. But the most exciting developments in robotics continue to focus primarily on vision. I spent the last four years trying to understand why. And we might have found a pretty good fix.
Introducing AnySkin
@bayeslord
idk but an interesting clue is that neurons don't stop firing under anesthesia (they just slow down to a dull pattern), and they actually increase in firing frequency during epileptic seizures. both cases result in loss of consciousness/loss of information complexity in the
@Raunaqmb
this is so neat, awesome work!
did you find particles embedded in the skin to be necessary or could a handful of discrete round magnets result in good signal as well? just thinking the latter could be more consistent part-to-part if they were positioned by the mold cavity
vibratory bowl feeders are so fking elegant and underrated. probably hundreds of billions in material wealth unlocked globally from these things humming along 24/7
Sorting brass by height - .380, 9x19, and 9x18 all into different bins, all automated.
Nested and flat cases go back into the collator for another pass.
Simple and effective.
we can make machines that crush LeetCode Hards but don't have ones that can reliably pick up a strawberry in unstructured domains. the economic consequences if the gap holds are pretty grim. robotics needs to pick up the pace!
@asoare159
really cool viz! could it be the training dataset has strong biases towards certain modes? maybe k-NN could be used to visualize trajectories in the training set for a given proprioceptive + T keypoint state
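The k-NN idea in the reply above can be sketched in a few lines; the state data here is made up, and a real version would query the actual training set's proprioceptive + keypoint states:

```python
# Minimal k-NN sketch: given a query proprioceptive state, find the k
# closest states in the training set so their trajectories can be
# visualized. The states below are made-up 2-D stand-ins.
import math

def knn(query, dataset, k=3):
    """Return indices of the k training states nearest to `query` (L2)."""
    dists = [(math.dist(query, state), i) for i, state in enumerate(dataset)]
    return [i for _, i in sorted(dists)[:k]]

train_states = [(0.0, 0.0), (1.0, 0.0), (0.1, 0.1), (5.0, 5.0)]
print(knn((0.0, 0.05), train_states, k=2))   # [0, 2]
```

With the neighbor indices in hand, one could plot the demonstrations those states came from and see whether the dataset really does cluster around a few modes.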
@simonkalouche
i think a fair argument for making the hardware as anthropomorphic as possible is that it makes scaling e2e imitation learning easier. if scaled sim2real + RL works, fancier embodiments become feasible. i was pretty bearish on the latter approach but Deepmind's recent work is🤯
@BartronPolygon
@ROBOTIS
Maybe worth opening the motor up to check out the drivetrain, I had an AX-12A that would randomly overload and it turned out to be a mechanical issue between output shaft and plastic housing. Warning: you may get mad when you see how cheap all the internal components are and how
@yacineMTB
no lie half the reason I switched careers was because I got tired of clicking around shitty UIs. parametric CAD gets pretty hard to iterate with if you're tearing up designs every other day
im going back to being a mech e once they get neuralink figured out
We put OpenAI o1 to the test against ARC Prize.
Results: both o1 models beat GPT-4o. And o1-preview is on par with Claude 3.5 Sonnet.
Can chain-of-thought scale to AGI? What explains o1's modest scores on ARC-AGI?
Our notes:
@Liu_Zeyi_
awesome work! the taping task with randomized hook/loop velcro placement is really clever.
re: robot noise causing distribution shift, is this because of motor vibration transferring to the gripper? could be interesting to try mechanically isolating the gripper from the robot with
@MechovaVila
@bayeslord
i know my personal experience with general anesthesia was kind of reality bending. it's not like sleep at all, it's closer to 'nothingness'. one second you're in the OR, the next in the recovery room. i imagine that middle period is what death is like, a whole bunch of nothingness
Lol hurricanes expend Hiroshima-nuke levels of energy every second. We could throw the entire global thermonuclear arsenal at Milton and it wouldn't even blink. Conspiracy theorists vastly overestimate our ape powers
1960s colorized LA almost looks more futuristic than LA in 2024...
progress in the physical world has been glacial, wen ubiquitous flying cars and robots?
And just like with airplanes, scaling bros will extrapolate straight lines on a graph only to find the real world is complex and progress never quite holds pace in the ways the straight line would have you believe
this is how it feels working with LLM agents
lots of scrap work and tinkering and arbitrary failures, but with occasional, awe-inspiring glimpses of promise
just like the airline industry, the technology will not only eventually succeed, but succeed to such an overwhelming