@knrd_z
This seems pretty old. Firstly, the SAT math section is like middle school level. Anyone decently trained in HS math can get a full score after like what, 10 hours of training and pattern recognition? I’m not sure you can correlate SAT math skill/difficulty to a STEM PhD
Ever wondered how LLMs stack up against human crowdsource workers? I'm thrilled to share "TurkingBench", a benchmark of web-based tasks for multi-modal and interactive AI agents.
Draft:
Project:
Code:
@ONAN_OUS
Funny enough, when I walked into a glass wall I was chatting with a recruiter while waiting for some ice, and he let me know that the org saw the tweet 👀
might have another meeting with HR when we get to the office... tho probs not since it seemed positive with smiles
@knrd_z
I don’t think so. Consistently, among test takers with high scores, it’s perfect on math and a few mistakes on verbal. The math section is just definitely easier. The proof is in the point loss differential: -30 for one math mistake, vs. -0 to -10/20 for a verbal mistake
Had a legendary time at Hudson Yards yesterday on the Edge. Caught a timelapse of a storm, saw a double rainbow, and a stranger accidentally took a candid polaroid of my friend and me (lmao I think he took it too early)
I was still putting down my polaroid bag and I don’t know
@bubblebabyboi
@mcneilly_alex
Also, forgive me for my lack of knowledge. But what was the state of automotive jobs like back then? Was it high paying? I didn’t know it was comparable to SWE nowadays
@SahilBloom
Smart inventors and engineers have been around forever, it’s amazing to see what they come up with using different and limited resources through the ages
Grateful to the wonderful team behind this project that has been going on for ~2 years! I have a tremendous amount of gratitude towards my PI
@danielkhashabi
who gave us a lot of help and direction, and contributions from
@yeganekordi
@yizhongwyz
@kesnet50
Adam Byerly
Birthday!! 🎂 🥳
And my first push at
@Google
just hit production in some data centers in the US!
Search up some location like “London” on mobile view, look for some Short video results, and you’ll get the “More short videos!” button.
TurkingBench is a benchmark of web-based tasks containing textual instructions with multi-modal context. Unlike most existing benchmarks that employ synthesized web pages, here we use natural web pages originally designed for crowdsourcing workers for various annotation purposes.
@chrissyykat
Is this more like: in HS you have more of a direction towards “success” but you lose that North Star in the infinite possibilities in college?
World wide exclusive
@porterrobinson
League parody of Everything Goes On, sung by Porter!
Went to the Porter concert today in NYC, did not disappoint!
(Sorry for cutting to my friend a lot, wanted to see his reaction to it. He’s not very expressive 😥)
We develop an evaluation framework that allows seamless interaction between AI and web pages. Our tasks are served on a web application through Turkle and the model programmatically accesses its content, responding with actions that are evaluated through our model.
Today I solved a Facebook Interview Question (LeetCode Hard) since I was feeling a little bored. I don't usually do LeetCode since the problems there tend to be too straightforward to be a challenge (here's a common topic, implement it).
Solved: 👇
Yo my friend is starting to try out competitive programming. Maybe I should regain my dignity cuz my friend is hard bullying me 😭 (not tony, another one)
Within each task, the HTML instructions and answer choices are instantiated with various values from the crowdsourcing tasks to form unique instances of the task. In total, the benchmark contains 36.2K instances distributed across 158 tasks.
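For anyone curious what "instantiating" a task looks like, here's a rough sketch, not the actual TurkingBench code. It assumes a task is an HTML template with `${...}` placeholders and that each row of values yields one instance. The template text, the field names (`review`, `option_a`, `option_b`) and the `instantiate` helper are all made up for illustration.

```python
from string import Template

# Hypothetical task template; the placeholder names are invented for this sketch.
TASK_TEMPLATE = Template(
    "<p>Instructions: pick the sentiment of the review.</p>\n"
    "<p>Review: ${review}</p>\n"
    '<label><input type="radio" name="answer" value="${option_a}"> ${option_a}</label>\n'
    '<label><input type="radio" name="answer" value="${option_b}"> ${option_b}</label>'
)

# Hypothetical rows of values, standing in for the crowdsourcing batch data.
ROWS = [
    {"review": "Great battery life.", "option_a": "positive", "option_b": "negative"},
    {"review": "Stopped working after a week.", "option_a": "positive", "option_b": "negative"},
]


def instantiate(template, rows):
    """Return one rendered HTML page per row of values."""
    return [template.substitute(row) for row in rows]


instances = instantiate(TASK_TEMPLATE, ROWS)
print(len(instances))  # one instance per row
```

Same template, different values, different instance. That's roughly how a small number of tasks can fan out into a lot of instances.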
Leetcode mistyped Execute in Execute error in their interview mode (ignore the fact that I made an error in my initial implementation, i make no bugs, i code perfectly)