AI image recognition models are powering the world’s next agricultural workforce:
Watch as these drones use multispectral color grading to determine the ripeness + sugar content of apples, then gently pick them:
Microsoft CEO Satya Nadella to board members:
"If OpenAI disappeared tomorrow, we have all the IP rights and all the capability. We have the people, we have the compute, we have the data, we have everything. We are below them, above them, around them."
Wow.
This is wild:
DragGAN: Interactive point-based manipulation of images using AI.
This gives you controllability of the pose, shape, expression, and layout of the objects in your images.
🤯 Full body tracking now possible using only WiFi signals
A deep neural network maps the phase and amplitude of WiFi signals to UV coordinates within 24 regions of the human body
The model can estimate the dense pose of multiple subjects by utilizing WiFi signals as the only input
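To make the mapping concrete, here is a toy NumPy sketch of the idea, not the paper's architecture: flatten per-subcarrier, per-antenna amplitude and phase features from one WiFi channel-state frame, push them through a stand-in linear layer, and read out a DensePose-style prediction per pixel (scores over 24 body regions plus a (u, v) coordinate). All dimensions and the single-layer "network" are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SUBCARRIERS, N_ANTENNAS = 30, 3          # assumed CSI dimensions
FEATURES = N_SUBCARRIERS * N_ANTENNAS * 2  # amplitude + phase per pair
H, W, REGIONS = 8, 8, 24                   # tiny output grid for illustration

# One fake "CSI frame": amplitude and phase per subcarrier/antenna pair
amplitude = rng.random((N_SUBCARRIERS, N_ANTENNAS))
phase = rng.uniform(-np.pi, np.pi, (N_SUBCARRIERS, N_ANTENNAS))
x = np.concatenate([amplitude.ravel(), phase.ravel()])  # (FEATURES,)

# A single random linear layer stands in for the deep network
W_out = rng.normal(0, 0.01, (FEATURES, H * W * (REGIONS + 2)))
y = (x @ W_out).reshape(H, W, REGIONS + 2)

region_logits = y[..., :REGIONS]             # which of 24 body parts per pixel
uv = 1 / (1 + np.exp(-y[..., REGIONS:]))     # (u, v) in [0, 1] within the part

print(region_logits.shape, uv.shape)  # (8, 8, 24) (8, 8, 2)
```

The real system trains this end to end against DensePose annotations; the sketch only shows the input/output shapes involved.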
🧵
🚨 Breaking news from the land of paywalls:
Microsoft, OpenAI’s biggest investor, plans to integrate ChatGPT into Bing search by March 2023.
Microsoft already has plans in place for a DALL-E 2 integration through “Bing Image Creator” as seen here:
Use ChatGPT on your own files
This is going to be big:
Lets you upload a PDF up to 60 pages long and ask questions about it in plain English ↓
These AI images are almost impossible to identify as fake (until you look closely)
Prompting Midjourney with "phone photo" adds an eerie sense of photorealism.
From Reddit user u/KudzuEye
We're in a "Don't look up" moment with AI right now.
A refined AutoGPT that can click around on the internet, send emails, write files, make calls, negotiate, send/receive money will emerge this year.
It will fundamentally change the world, and this cannot be stressed enough.
The First Room-Temperature Ambient-Pressure Superconductor:
A material called LK-99, a modified-lead apatite crystal structure, achieves superconductivity at room temperature.
Last week: ChatGPT Passes US Medical Licensing Exam
Today: GPT’s medical knowledge is distributed into a smooth UI
Glass AI generates a differential diagnosis or clinical plan based on a problem representation
Try Glass AI:
NVIDIA's Real-Time Neural Appearance Models have crossed the uncanny valley for realism.
They are creating the highest detail graphics we've ever seen.
New research visualizes the political bias of all major AI language models:
-OpenAI’s ChatGPT and GPT-4 were identified as most left-wing libertarian.
-Meta’s LLaMA was found to be the most right-wing authoritarian.
Models were asked about various topics (e.g., feminism,
OpenAI's speech-to-text API "Whisper" just got supercharged:
This tool transcribes audio 70x faster than Whisper
A 2-hour podcast can now be transcribed in ~30 seconds using Whisper JAX: The Fastest Whisper API
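The quoted numbers imply roughly a 240x real-time factor:

```python
# Implied speed from the podcast example above
audio_seconds = 2 * 60 * 60        # a 2-hour podcast = 7200 s of audio
transcribe_seconds = 30            # claimed transcription time
realtime_factor = audio_seconds / transcribe_seconds
print(realtime_factor)  # 240.0 -> ~240x faster than real time
```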
Try it here using your mic:
Eleven Labs has the most realistic AI text-to-voice platform I’ve seen.
(free to try)
It’s 99% perfect. Generates great inflection, cadence, and natural pauses.
Sample:
If Steve Jobs were still alive, he would have started an in-house AI project 5-8 years ago, and probably poached OpenAI’s key talent as soon as GPT-2 came out.
Not in a million years would he have licensed software that Microsoft is a primary owner of and touted it as the greatest
Major fusion breakthrough today:
A Lawrence Livermore National Laboratory fusion experiment achieved a net energy gain today for the second time, producing 3.15 megajoules of fusion output from hydrogen atoms with a laser input of 2.05 megajoules.
The fusion experiment was repeatable.
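The quoted figures work out to a gain factor of about 1.54:

```python
# Net energy gain Q = fusion output / laser input
output_mj, input_mj = 3.15, 2.05
q = output_mj / input_mj
print(round(q, 2))  # 1.54 -> ~54% more energy out than in
```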
Use ChatGPT on your own data (without limits)
Upload a PDF (any size), have a conversational dialogue with your documents:
-Research Papers
-Books
-Newspapers
-Legal Docs
Pulls citations + unlimited use plan available
Semantic search is the future ↓
📽️Text-to-Video?
It could revolutionize entertainment as we know it.
Here's Phenaki, a model that can synthesize realistic videos from text prompt sequences.
More examples below ↓
Duolingo partners with OpenAI
New AI-powered features: Role Play, an AI conversation partner, and Explain My Answer, which breaks down the rules when you make a mistake
Wow - Anthropic (Google's latest $300M AI investment) is hiring a "Prompt Engineer" for $250k-$335k/yr + equity
No CS degree required, just have "at least basic programming and QA skills"
Wild times.
Google DeepMind’s CEO
@demishassabis
says their new AI model "Gemini" will far surpass the capabilities of GPT-4
Gemini (still several months from release) is a multi-modal AI system that combines their Go-winning program, AlphaGo, with LLM capabilities.
Google acquired
Deepfakes have reached a new level.
AI will make you question the validity of everything.
Incredible example (with no post-processing) from
@deepwareai
OpenAI CEO
@sama
on personalized AI models:
"You should be able to write up a few pages of here's what I want, here are my values, here's how I want the AI to behave, and it reads it and thinks about it and acts exactly how you want, because it should be your AI."
When OpenAI releases Autonomous Agents, it will be like an intelligent swarm of bees clicking around on the internet.
Sending emails, negotiating, making products, purchases, fulfilling orders, etc.
It will fundamentally change the internet, and this cannot be overstated.
Generative AI will reprice the entertainment industry.
This video is made with RunwayML, which just released its Gen-2 text-to-video model on its mobile app
The coming generation will correctly assume that all media is manipulated in some way.
The tools are getting too good.
Trust nothing, verify everything.
Webcam accurately measures person's heart rate
The AI model detects subtle color changes in the face caused by blood flow to measure heart rate (BPM) and heart rate variability (HRV)
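A minimal sketch of the underlying idea (remote photoplethysmography): blood flow causes tiny periodic brightness changes in the face, so the dominant frequency of the face region's mean green-channel value over time is the heart rate. The trace below is simulated (30 fps, 10 s, 1.2 Hz pulse), not real webcam data.

```python
import numpy as np

# Fake a green-channel trace from a face ROI with a 1.2 Hz pulse (72 BPM)
fps, seconds = 30, 10
t = np.arange(fps * seconds) / fps
green_mean = 120 + 0.5 * np.sin(2 * np.pi * 1.2 * t)

signal = green_mean - green_mean.mean()          # remove the DC component
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fps)
bpm = freqs[np.argmax(spectrum)] * 60            # dominant frequency -> BPM

print(int(round(bpm)))  # 72
```

Real systems add face tracking, band-pass filtering to the plausible 0.7-4 Hz heart-rate range, and motion compensation; the spectral-peak step is the core of it.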
Your live AI job interview assistant:
Guy built a Whisper + GPT-4 live transcription tool that generates real-time responses during job interviews, and open-sourced the code.
(link to GitHub in comments)
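The pipeline is simple to sketch. This is a hypothetical outline with stub functions, not the open-sourced project's actual API: capture audio chunks from the mic, transcribe each with a speech-to-text model, then ask an LLM for suggested talking points.

```python
def transcribe(audio_chunk: bytes) -> str:
    # Real version: pass the chunk to a Whisper model; stubbed here
    return "Tell me about a time you handled a production outage."

def suggest_answer(question: str) -> str:
    # Real version: send the transcript to a chat-completion API; stubbed here
    return f"Suggested talking points for: {question!r}"

def assistant_loop(audio_chunks):
    """Yield a suggested response for each captured audio chunk."""
    for chunk in audio_chunks:
        question = transcribe(chunk)
        yield suggest_answer(question)

for suggestion in assistant_loop([b"\x00" * 1024]):
    print(suggestion)
```

The real tool streams microphone audio continuously; the loop above just shows the transcribe-then-respond structure.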
YOLO NAS is a phenomenon.
A real-time object detector with <5 millisecond latency.
This will eventually be integrated into mobile cameras, and have “click to buy this item” functionality.
Would not be surprised to see Apple integrate this soon.
Unitree robot dog: $2400 on Amazon
Raspberry Pi: $57
Glock 17: $400
Ender 3D printer: $210
GPT-4 vision API: $0.01 per 1,000 tokens
So ~$3100 is the cost of a robot soldier in 2024.
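The hardware in the list sums as shown; the vision API is priced per token, so it is a usage-based cost on top of the subtotal:

```python
# Fixed hardware costs from the list above (API usage excluded)
parts = {"robot dog": 2400, "Raspberry Pi": 57, "Glock 17": 400, "3D printer": 210}
subtotal = sum(parts.values())
print(subtotal)  # 3067 -> "~$3100" once API usage is added
```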
Rumor is that Q-Star figured out a way to break encryption, and OpenAI tried to warn the NSA about it.
Here’s a Google doc link to a compilation of (allegedly) leaked documents and compelling analysis
Open-source project SoundStorm (AI-generated speech from Google Research) is going to give ElevenLabs a run for its money:
The text-to-speech project specializes in dialogue between multiple parties and is available on GitHub:
Unbelievable AI editing from
@Flawlessai
maps out faces using performance tracking and 3D modeling, altering mouth/facial movements for full dialogue reconstruction.
Google’s Gemini Ultra was just confirmed for release on Wednesday.
Ultra beats GPT-4 in 7 out of 8 benchmark tests, and is the first model to outperform human experts on MMLU (massive multitask language understanding)
Cool project: GPT-3 powered Google Sheets
This plugin answers questions, formats cells, writes letters, and generates formulas, all from within the spreadsheet
@elonmusk
@DavidSacks
Feels like there was a bombshell variable that the public is completely unaware of.
Not even the extreme speculations seem like they would warrant firing Sam.
A table showing average yearly compensation for “AI Researchers” started circulating yesterday.
Here’s the list:
OpenAI……………………$865K
Anthropic……………….$855K
Inflection………………..$825K
Tesla……………………….$780K
Amazon………………….$719K
Google
We’re currently in a brief window of time where text-to-video is insanely impressive and hilariously imperfect at the same time.
AI Generated pizza commercial using Runway ML Gen-2
Yes.
OpenAI releasing Text-to-3D soon: SHAP-E
They dropped a research paper and GitHub code for a model called SHAP-E for creating 3D models through text prompts.
We are probably 1-2 months away from seeing Text-to-3D printers, as in text-to-object.
Apple is reportedly spending millions of dollars per day to train Ajax, its most advanced language model, which they believe to be more powerful than ChatGPT.
Though Apple may seem late to the game on AI, don't underestimate a company with a $300B research war chest.
A ChatGPT model generated a 500% return in the stock market (trading options) over a 15-month period by assigning a sentiment score to news articles about publicly traded companies.
Research by University of Florida's Dept. of Finance ↓
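A toy illustration of the strategy described above, with invented headlines and returns rather than the study's data or model: score each headline's sentiment, go long on positive news and short on negative, and compound the resulting moves.

```python
# (headline, sentiment score, next-day price move) -- all toy values
headlines = [
    ("ACME beats earnings expectations", +1,  0.04),
    ("ACME faces regulatory probe",      -1, -0.03),
    ("ACME announces record buyback",    +1,  0.02),
]

strategy_return = 1.0
for _, sentiment, move in headlines:
    position = 1 if sentiment > 0 else -1   # long on good news, short on bad
    strategy_return *= 1 + position * move  # compound each trade's return

print(round((strategy_return - 1) * 100, 1))  # 9.3 -> % return on toy data
```

The study used an LLM to produce the sentiment scores; here they are hand-assigned to keep the example self-contained.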
Google’s “Code Red” from 3 months ago?
New research published today by Google:
PaLM-E
The 562-billion-parameter "embodied language model" incorporates continuous sensor data directly into language models.
Shutterstock, a publicly-traded stock image and video company, lost about $70M in market cap in the last 2 hours.
The company sells about $1B worth of stock photos and videos every year.
People are realizing AI-generated video and imagery could decimate this industry.
AI news never slows down.
Here's what happened today:
-AI CEOs meet at the White House
-Deepmind CEO says we'll have AGI in a few years
-Microsoft says AI is about to do some "real damage"
-Bing is open to everyone
-AI tool to extract data from PDFs
Here are the details ↓
Voice cloning now comes with translation: speak other languages in your own voice, a new feature of ElevenLabs
Here’s David Attenborough speaking flawless German:
🚨 Google CEO
@sundarpichai
announced during a live earnings call today that Google plans to roll out its LaMDA language model in the “coming weeks and months.”
LaMDA = Language Model for Dialogue Applications.
Google's version of ChatGPT.
GPT-5, codenamed "Gobi" and rumored for an early 2024 release, will handle video interpretation as part of its multimodality.
Altman says "What we launched [at Dev Day] is going to look very quaint relative to what we're busy creating for you now."
"[GPT-5] will work for most things
What if devices could track your eye movement through sound?
Duke researchers have discovered that each eye movement produces unique sounds in the ear canal.
A sight-based interface could be enabled simply by wearing AirPods, of all things.