Managing Partner at Autopilot (@hqautopilot), a research-driven investment firm. Previously investing in AI across the public and private markets @ARKInvest.
Introducing Autopilot
@maxmhfriedrich and I have spent the last several years together researching and investing in fintech and artificial intelligence across the public and private markets. We have witnessed extraordinary founders build category-defining companies - Block
Game theory on the Bing announcement…
Microsoft is trying to lower Google’s search margins, which will make it harder for Google to continue running Cloud and other competitive businesses at a loss.
Google processes ~8.5 billion searches per day. The noise around Bing/AI
The Vision vs. Lidar debate is reminiscent of the AC vs. DC battle of the late 1880s.
Direct Current (DC) systems, as proposed by Thomas Edison, worked reasonably well but could not transfer power over long distances. (1/5)
But neural network training efficiency has been doubling every 16 months. If this trend continues, the cost of training will likely decline as follows:
2024: $325mm
2028: $40mm
2032: $5mm
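The three figures above follow from a simple halving model: a 16-month doubling in efficiency means three doublings, or an 8x cost reduction, every four years. A minimal sketch, assuming cost is inversely proportional to training efficiency and taking the 2024 figure as the starting point (the function name `projected_cost_mm` is illustrative, not from the original thread):

```python
def projected_cost_mm(base_cost_mm: float, base_year: int, year: int,
                      doubling_months: float = 16.0) -> float:
    """Training cost in $mm, assuming efficiency doubles every `doubling_months`."""
    months_elapsed = (year - base_year) * 12
    halvings = months_elapsed / doubling_months
    return base_cost_mm / (2 ** halvings)


# Start from the $325mm figure quoted for 2024 and step forward four years at a time.
for year in (2024, 2028, 2032):
    print(year, f"${projected_cost_mm(325, 2024, year):.1f}mm")
```

Under these assumptions the 2028 and 2032 values land at roughly $41mm and $5mm, in line with the rounded numbers in the tweet.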
MIT CS PhD: “I want to solve cancer with AI!”
FAANG: "Good for you. Here's $1.2mm/yr. Make people look at their screen for an additional 0.85 seconds."
Tesla won the currents battle with AC.
And I think Tesla will win the FSD battle with Vision.
In both cases, scalability matters.
(But ironically, Tesla superchargers are DC systems!)
(5/5)
GPT-3 has 175B parameters (3.14E+23 FLOPs of training compute) and cost ~$4.6mm to train (2020).
The human brain has at least 100T synapses (parameters). Assuming linear scaling of compute requirements, the cost to train a “human brain” in 2021 would be more than $2B.
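The "more than $2B" figure can be reproduced with one line of arithmetic. A rough sketch, assuming cost scales linearly with parameter count and treating one synapse as one parameter, as the tweet does (the constant and function names are illustrative):

```python
GPT3_PARAMS = 175e9        # parameters, from the tweet
GPT3_COST_USD = 4.6e6      # approximate training cost in 2020, from the tweet
BRAIN_SYNAPSES = 100e12    # lower-bound synapse count, from the tweet


def linear_scaled_cost(target_params: float,
                       ref_params: float = GPT3_PARAMS,
                       ref_cost: float = GPT3_COST_USD) -> float:
    """Scale a reference training cost linearly with parameter count."""
    return ref_cost * target_params / ref_params


brain_cost = linear_scaled_cost(BRAIN_SYNAPSES)
print(f"${brain_cost / 1e9:.2f}B")
```

The scale factor is about 571x, which puts the estimate a little above $2.6B before any efficiency gains are applied.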
AI is the next great platform shift. It will create an order of magnitude more value than mobile, cloud, etc.
Companies that embrace it will likely see massive upside.
Companies that don’t will be disrupted.
Surprisingly few companies are investing accordingly.
My take: AI is likely advancing far faster than most people realize. And Tesla appears to be the leader when it comes to real-world AI.
The visual perception stack and training pipeline are extensible to many use cases beyond FSD (Tesla Bot?), and the data advantages are real.
Many claimed DC was the only "safe" way to transfer electricity. Similarly, many today claim that Lidar is the only "safe" option for FSD. Both were/are wrong. (4/5)
Our Zoom open-source model is now available on GitHub:
You can read about our thesis and assumptions in our blog post:
Our base case suggests Zoom’s share price could reach $1,500 by 2026.
AI training costs are declining 60% YoY. It cost ~$5mm to train GPT-3 in 2020. By 2030, it will likely cost ~$500 to train a model to the same level of performance. AI training is Moore’s law on steroids.
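The ~$500 figure is what ten years of compounding a 60% annual decline gives. A minimal sketch, assuming the rate holds constant from 2020 to 2030 (function names are illustrative):

```python
import math


def cost_after_decline(start_cost: float, annual_decline: float, years: int) -> float:
    """Cost after compounding a constant annual decline rate."""
    return start_cost * (1 - annual_decline) ** years


def halving_time_months(annual_decline: float) -> float:
    """Months needed for cost to fall by half at a constant annual decline rate."""
    return 12 * math.log(2) / math.log(1 / (1 - annual_decline))


cost_2030 = cost_after_decline(5_000_000, 0.60, 10)
print(f"2030 cost: ${cost_2030:,.0f}")
print(f"halving time: {halving_time_months(0.60):.1f} months")
```

At a 60% annual decline, cost halves roughly every nine months, compared with the two-year cadence usually associated with Moore's law.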
If you look at some of the best investors in SV it basically comes down to their ability to “trust the exponential.” Turns out this is easy to say and almost impossible to do.
Humans are just not wired to understand power laws.
Vector Space:
Tesla is predicting in vector space, not image space (most computer vision systems operate in 2D image space). It makes sense, given cars operate in a 3D world, but it's very hard to build.
Nikola Tesla proposed an Alternating Current (AC) system which transferred power over long distances at high voltage and then stepped down the voltage to make it safe and usable. Similar to Vision-based systems, AC was easier and less expensive to scale. (3/5)
As a result, power generation in a DC system had to be decentralized, increasing cost and slowing expansion. Lidar-based systems are similar in that they work reasonably well, but they are slow and expensive to scale. (2/5)
We've built an AI system that solves grade school math word problems. Starting to achieve decent accuracy & represents significant progress on top of GPT-3:
Our research suggests that AI tools could make engineers, designers, lawyers, etc., 140% more productive by 2030. The impact on software spend could be profound...more to come in Big Ideas 2022.
I have now gotten enough of a taste of AI-powered creative tools to know that they're going to be much better than even the AI optimists think.
So cool to just think of ideas and iteratively have the computer implement and build on them.
The lag between technology adoption and productivity gain is known as the “productivity paradox.” Total factor productivity doesn’t really increase until organizations and processes are redesigned to take full advantage of a new technological capability.
The same will be true for
Memory:
Tesla has inserted a video module and feature queue model into their NN architecture. This effectively gives the NN short-term memory.
Short-term historical context is useful when driving. For example, did the lane marking seen 5 seconds ago say this is a left turn lane?
Fusion:
Per-camera detection then fusion works fine in image space but doesn't translate well to vector space. Accurate depth per pixel is critical for vector space predictions.
An iPhone 12 with an A14 chip is capable of 11 TFLOPS (trillion floating-point operations per second). It's 68,750x faster than the first supercomputer!
Bonus: The A14 chip is ~898,325,847x faster than the Apollo 11 computer (12,245 FLOPS) that guided us to the moon.
45 years ago today the first major supercomputer was installed.
The Cray-1 had a 64-bit processor and a top speed of 160 MFLOPS.
For comparison, today’s top supercomputers are 20 billion times faster.
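The two speedup ratios quoted for the A14 can be checked directly from the throughput numbers in these tweets. A quick sketch that takes the quoted figures at face value (the constant names are illustrative):

```python
A14_FLOPS = 11e12        # 11 TFLOPS, as quoted for the A14
CRAY1_FLOPS = 160e6      # 160 MFLOPS, as quoted for the Cray-1
APOLLO_FLOPS = 12_245    # as quoted for the Apollo 11 computer


def speedup(new_flops: float, old_flops: float) -> float:
    """How many times faster the newer machine is, by peak throughput."""
    return new_flops / old_flops


print(f"A14 vs Cray-1:    {speedup(A14_FLOPS, CRAY1_FLOPS):,.0f}x")
print(f"A14 vs Apollo 11: {speedup(A14_FLOPS, APOLLO_FLOPS):,.0f}x")
```

Both ratios compare peak throughput only and ignore differences in precision, memory, and workload.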
Tesla solves the above fusion problems by fusing image data before detections. See the improvement in the screenshot below.
Bottom left: fusing images after detections
Bottom right: fusing images before detections. Much better!
Data (images, GPS, etc.) from multiple vehicles can align to reconstruct roads and 3D obstacles for labeling. The result is that many roads and 3D obstacles can be auto-labeled.
More customers = more labels = better product = more customers. Data advantage flywheel spins!
Another problem with per-camera detection is that object prediction is hindered when target objects are only partially visible.
For example, no single camera has a complete picture of the truck in the screenshot below:
The visual perception system can predict depth as accurately as radar without the complexity of additional sensor fusion.
The blue line represents depth prediction with video, the green line with radar.
Big Ideas 2022:
From 2022 -> 2030, AI training costs could decline by 60% annually.
Within the next eight years, AI could boost the output of knowledge workers by 140% (+$56T in global output).
By 2030, AI companies could be worth $87T.
(w/ @downingARK and @wintonARK)
OpenAI charges $0.03 for DALL·E to generate an image. By our estimates, a graphic designer would charge ~$150.
We estimate inference costs ~$0.005 per image (~22 seconds per image on an A100 with hourly rate of $0.87).
>80% gross margins, consistent with SaaS
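The unit economics above reduce to two small formulas. A minimal sketch, assuming GPU time is the only cost of revenue and using the tweet's own estimates for generation time and hourly rate (function names are illustrative):

```python
def inference_cost(seconds_per_image: float, gpu_hourly_rate: float) -> float:
    """Cost of GPU time for one image, in dollars."""
    return seconds_per_image / 3600 * gpu_hourly_rate


def gross_margin(price: float, cost: float) -> float:
    """Gross margin as a fraction of price."""
    return (price - cost) / price


cost_per_image = inference_cost(seconds_per_image=22, gpu_hourly_rate=0.87)
margin = gross_margin(price=0.03, cost=cost_per_image)

print(f"cost per image: ${cost_per_image:.4f}")
print(f"gross margin:   {margin:.0%}")
```

With these inputs the cost comes to about half a cent per image and the margin to a little over 80%. Any other serving costs (storage, bandwidth, idle GPU time) would lower that margin.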
The above updates also improve object detection when temporary occlusion occurs.
Notice how the updated system (blue cars) knows there are cars behind the truck, even though they are temporarily blocked.
Simulation:
Real-world data is better, but simulation can be useful where situations are difficult to source (e.g. people running down a highway) or difficult to label (e.g. 100 pedestrians).
Planning:
Planning is currently done in a less-than-optimal way. There is an opportunity to apply NNs as heuristics to improve search through action space. Doing so will prevent the search from getting trapped in local minima.
Bears: “Zoom is losing market share to Microsoft.”
Data: Zoom has held 45-50% market share since Q1 2020 (according to traffic estimates from a third-party source). And according to Gartner, customers like Zoom.
@facebookai recently announced research related to self-supervised learning and Vision Transformers (DINO and PAWS). The coupling of self-supervised learning and Vision Transformers is quite exciting. Here’s why: (1/9)
What is Project Starline? It combines research in computer vision, machine learning, spatial audio and real-time compression with a breakthrough 3D display.
The effect is the feeling of a person sitting just across from you, like they are right there.
A simple framework for identifying high-value applications of AI:
1. AI can partially or fully automate a task, resulting in 10x better, cheaper, and/or faster outcomes
2. Humans are currently paid >$10B annually to perform that task
3. Usage of the product generates
Our research suggests that by 2030, AI coding tools could double the productivity of human software developers.
We might be underestimating both the timing and the magnitude of impact. (1/4)
Introducing #AlphaCode: a system that can compete at average human level in competitive coding competitions like @codeforces. An exciting leap in AI problem-solving capabilities, combining many advances in machine learning!
Read more: 1/
Important announcement: Today is my last day @ARKInvest. I couldn't be more grateful to have had the opportunity to work alongside @CathieDWood, @wintonARK, and the rest of the ARK team. We've had an incredible front-row seat as the AI revolution has accelerated over the past two
Translation: We believe that within the next decade, AI will boost the productivity of knowledge workers (lawyers, software engineers, etc.) by nearly 140%.
Working without AI = walking
Working with AI = riding a motorcycle
Meet our newest Analyst @downingARK!
Prior to @ARKInvest, he built infrastructure/software to support billions of users at some of the largest companies in the world.
I went to a party yesterday with a bunch of non-tech people, expecting everyone to be following the OpenAI drama and discussing governance structures. Turns out we live in a silo on tech twitter.
Most people go to college because it’s “what you’re supposed to do.” But that seems like a crazy reason to spend >$50k and four years of your life. 4/4
Replit is at an insane inflection point. The technology is about to get powerful enough to power entire businesses, revenue is scaling rapidly, and we're riding multiple massive tech trends. Yet it's early enough to have an insane impact and upside
The companies with the best training data will generally produce the most performant models. Those companies will attract more customers who will generate even more data. 7/14
It just shows how humans have explored very little of the possible solution space. Can't wait to see how AI systems discover unexpected but optimal solutions to problems outside of gaming. 3/3
Human knowledge workers are paid ~$32T annually. It appears AI will be capable of automating most knowledge work within the decade (likely a lot sooner).
The adoption rate of automation is very high, given the often extraordinary ROI. E.g., ChatGPT grew from 0-100mm users in 60
Data Advantages: Contrary to common belief, most data is useless or of little value. But in a few circumstances, data is gold.
As products embed AI, large quantities of high-quality training data are needed to train models. 6/14
Low-margin but high-quality hardware has become a distribution channel for high-margin recurring software.
Similar to the old model of Keurigs & K-Cups, razors & razor blades, etc.
A network effect exists when one user creates value for many other users in a system (think Facebook, Twitter, etc.). The larger the network, the more value it has, and the harder it is to replicate.
This exponential value creation can be modeled with Reed's Law: 5/14
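Reed's Law says the value of a group-forming network scales with the number of possible subgroups of two or more members, which for N users is 2^N - N - 1. A small sketch of how quickly that count grows (the function name `reed_value` is illustrative, and treating each possible subgroup as one unit of value is a simplifying assumption):

```python
def reed_value(n_users: int) -> int:
    """Number of possible subgroups with at least two members among n_users."""
    # All subsets (2^n) minus the n single-member subsets and the empty set.
    return 2 ** n_users - n_users - 1


for n in (5, 10, 20):
    print(f"{n:>2} users -> {reed_value(n):,} possible groups")
```

Doubling the user count from 10 to 20 multiplies the number of possible groups by roughly a thousand, which is why large networks are hard to replicate.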
The adoption rate of AI products is incredible. The AI supercycle has started, and it’s likely going to create an order of magnitude more value than past supercycles.
E.g. in autonomous driving, companies that are already collecting real-world data will likely have the most performant models in the short-term. That performance will probably attract more customers who collect more data, improve the models, and widen the competitive moat. 9/14
2. Knowledge is now widely available at a low cost. The content on Coursera, etc. might be better than most college courses.
3. The cost of tuition is increasing with no limit in sight. 3/4
Investor relations people - check out @twilio’s new earnings call format. Call was Q&A only. A document with prepared remarks was available before the call. Much more time-efficient approach.
Conversely, I doubt that legacy companies will pay 27 y/o engineers $1mm+. Nor will they give them unlimited vacation or the autonomy to work on problems they find interesting. 7/8
Economies of Scale: In the 1900s, the dominant companies benefited from scale. Standard Oil's size enabled Rockefeller to negotiate railroad rebates, acquire early tank cars, etc. He could profitably sell oil at a price point lower than his competitors could produce it. 2/14
This creates a flywheel effect where the robustness and performance of a system increase exponentially over time. The result is a competitive moat that strengthens with scale. 8/14
This will inevitably confuse and anger the risk-averse middle managers at legacy companies. They will default to “no” because their incentive structure rewards stable, low-risk execution. 3/8
Autonomous driving data is hard to acquire - you need many cars on the road operating in various conditions.
Facial images (for face recognition) are easy to acquire from public datasets (Labeled Faces in the Wild, etc.), so the value of facial data is implicitly lower. 11/14
Economies of scale aren't as powerful in software because the underlying infrastructure components - internet, servers, etc. - are generally shared resources. The cost of compute isn't that much different for BigCo and SmallCo. And scale is available instantly. 3/14
Reducing system complexity is hard but usually worth it in the long run. Most impressive fact from @karpathy's recent presentation is that Tesla was able to shadow test vision-only on the existing fleet, generating 1,000 years' worth of driving data.
Just wrote about Tesla's decision to remove radar in our latest newsletter. Would love to hear anyone's thoughts/experiences with the tradeoff of including additional sensors to fuse into an autonomous tech stack.
Other AI problems require little training data. The classic "handwritten number recognition" problem can probably be solved with just a few thousand labeled images. In this case, the data-value asymptote is low. Adding more data would only marginally increase performance. 13/14
First, they are risk-averse. Deep learning models are black-box, and it’s kind of impossible to explain why a model produced a specific output. Sometimes the output is totally unexpected. Sometimes it’s offensive. Remember Tay, Microsoft’s offensive chatbot? 2/8
The consensus opinion (at least for institutional investors) appears to be “sell tech near the potential low” and “buy value/energy near the potential high.” 🤷🏻‍♂️
To build a great AI model, you need data, compute, and AI/ML talent.
You can buy compute, you might be able to collect data, but you probably can’t recruit top-tier AI/ML talent.
Top-tier talent flows to @Tesla, @MetaAI, @DeepMind, @OpenAI, etc.
Interesting thought by @sama on AI-enabled Moore's law of everything: "Imagine a world where, for decades, everything–housing, education, food, clothing, etc.–became half as expensive every two years." 1/11
Some AI problems, like autonomous driving, require massive amounts of training data. It will probably be a long time before autonomous driving data reaches a point of diminishing returns. 12/14
Multitask Unified Model (MUM) — our latest AI milestone — has the potential to transform how Google helps you with complex information tasks.
#GoogleIO
The plow increased the productivity of farmers by 5-6x. That productivity uplift enabled significant portions of the population to shift from food gathering (the primary “job” at the time) to producing specialized goods and services. It set the stage for significant economic
Second, legacy companies will struggle to attract top-tier talent. There are (maybe?) a few thousand great AI practitioners. Top tech companies know how to recruit, retain, and maximize the productivity of top-tier talent. Legacy companies don't. 5/8
The first operational US fighter jet was designed and built in 143 days.
The 1,700 mile Alaska highway was built in 234 days.
The city of SF has been building a bus lane on Van Ness for 7,600 days....
Top tech company culture promotes autonomy. They pay very well (over $1mm for good engineers) and have cool brands. And networks are insular in that great talent follows great talent. 6/8