Anna Kazlauskas

@anna_kazlauskas

1,062
Followers
124
Following
8
Media
106
Statuses

founder @withvana - user-owned AI models | prev: MIT, YC W18 ML startup, celo, janet yellen fan turned eth miner

San Francisco, CA
Joined April 2022
Pinned Tweet
@anna_kazlauskas
Anna Kazlauskas
6 months
Today, models are primarily trained on the publicly scraped internet. What if 100M users contributed their private data from siloed platforms to create a user-owned foundation model?
15
20
142
@anna_kazlauskas
Anna Kazlauskas
1 month
Most of the TEE controversy assumes that TEEs are going to be used to replace zk or a blockchain, rather than complement them. A well-designed system (like @withvana) uses both crypto consensus where you don't trust hardware, and TEEs for privacy-specific applications.
@_charlienoyes
Charlie Noyes
1 month
@0xfoobar Exactly what you rely on the TEE for depends on how you structure the application. In your example, e.g., require a zk proof for correctness and rely on the TEE only for privacy. Basically, in protocol design you should limit the scope of dependencies and favor simplicity + lindyness.
4
2
21
2
142
138
@anna_kazlauskas
Anna Kazlauskas
2 months
The crypto x AI space is evolving so quickly. @caseykcaruso’s market map is the first to capture how dynamic/alive the space feels.
@caseykcaruso
CASΞY
2 months
Open sourcing a community-led market map of Decentralized AI. Most market maps are static and biased. We wanted to make one that was interactive and crowdsourced. Pls submit any projects that are missing. We implemented a 3d viz using @d3js_org and a
39
32
245
0
52
61
@anna_kazlauskas
Anna Kazlauskas
5 months
Launching the world’s first data DAO, focused on Reddit data, on the Vana network: @rdatadao
22
27
111
@anna_kazlauskas
Anna Kazlauskas
2 months
Usually, AI models are created and owned by a single company using public data. @rdatadao's AI model is created and owned by a collective of 140k users who contributed their private data. User-owned data -> user-owned AI.
@withvana
vana
2 months
Through a collaboration with @oraprotocol we’re helping @rdatadao turn their data into an on-chain AI for shitposting. 😳😂💩 This will be the first user-owned model trained on user-owned data. The DAO plans to develop their early prototype and launch
7
328
335
4
14
62
@anna_kazlauskas
Anna Kazlauskas
2 months
A great start to EthCC at Open Source AI day, hosted by @ekang426! I talked about user-owned foundation models and non-custodial data @withvana. Incredible to see hundreds of people gathered to build towards user-owned AI. Also, enjoyed the mini @ZuBerlinCity reunion :)
1
13
33
@anna_kazlauskas
Anna Kazlauskas
3 months
The current bottleneck in AI has shifted from compute to data. How do we overcome the great data wall? Satori Testnet is a step towards data abundance, pushing the frontiers of user-owned AI. It is designed to liberate data from walled gardens
3
8
13
@anna_kazlauskas
Anna Kazlauskas
2 months
At EthCC all week, it's crazy how many decentralized AI events there are! It was a little known category just a year ago. Find me throughout the week talking about user-owned AI
@withvana
vana
2 months
Meet the Vana team IRL at @EthCC in Brussels. We’ll be discussing the future of user-owned AI at these decentralized AI events:
5
20
33
3
2
28
@anna_kazlauskas
Anna Kazlauskas
2 months
The more context an LLM has on you, the more helpful it is. But the cross-platform data (WhatsApp, iMessage, Gmail) is stuck in walled gardens. @withvana makes data portable, even from web2 walled gardens. Login w wallet -> all your data is there, just like your funds.
@0xDesigner
0xDesigner
2 months
we won't be able to have the kind of life-changing experience with LLMs until we can truly absolutely positively own our data online
6
4
68
3
1
26
@anna_kazlauskas
Anna Kazlauskas
5 months
Users should have control over their data and ownership in the models they help create
@TheBlock__
The Block
5 months
Paradigm-backed startup Vana launches DAO letting Reddit users control their personal data
22
61
197
6
3
25
@anna_kazlauskas
Anna Kazlauskas
2 months
Proof of work was great bc you could become an owner in the network w just an old computer. Proof of stake networks compound existing ownership, so it is hard to bring new people in that way. Proof of contribution @withvana lets you earn ownership in the network with just your data.
@binji_x
binji
2 months
we need less tokens that people can just buy and we need more tokens that people have to work for/earn in some sense, mining was a form of earning, but we need to evolve further and bring that into a more accessible vehicle for the masses; but also one that’s less passive.
44
15
139
1
4
21
@anna_kazlauskas
Anna Kazlauskas
5 months
It’s becoming so much easier (and more fun!) to build onchain. Many projects that make the user-owned internet easier to interact with have been heads-down building throughout the bear market.
2
2
21
@anna_kazlauskas
Anna Kazlauskas
6 months
Foundation models tend towards monopolies, but that's not the only option. We as users should create our own best foundation model — we have the data and compute to make it possible. Detailed post:
3
1
18
@anna_kazlauskas
Anna Kazlauskas
1 month
The data wall is one of the main barriers to AI progress. Current approaches: scrape the internet, buy data ($20-$200/response), license it from companies. @withvana's solution: give users true ownership in the AI models their data helps train.
@martin_casado
martin_casado
1 month
Multi-turn data is incredibly expensive today. Single answers go for $20-$200 depending on quality. Assuming markets are somewhat efficient, this is a reasonable proxy for the atomic costs to aggregate "new data" for LLMs. Given (a) we've exhausted a non-trivial fraction of
26
12
213
0
0
12
@anna_kazlauskas
Anna Kazlauskas
6 months
Foundation models require a lot of compute, but users have a lot of that too. Ethereum miners’ combined compute is 50 times greater than that used to train leading foundation models.
2
1
15
@anna_kazlauskas
Anna Kazlauskas
5 months
A data DAO has two main components: 1) onchain governance, with tokens earned for data contributions, and 2) a secure server, with a public-private key pair for encryption, where the community-owned dataset resides.
2
1
14
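The first component described in the tweet above (tokens earned for data contributions, then used for onchain governance) can be illustrated with a toy in-memory ledger. This is a minimal sketch, not Vana's or r/datadao's actual contract logic: the class name `DataDaoLedger`, the flat `tokens_per_record` reward, and the simple token-weighted majority vote are all assumptions made for illustration. The second component (the secure server holding the encrypted dataset) is not modeled here.

```python
class DataDaoLedger:
    """Hypothetical toy ledger: tokens are minted for data contributions
    and later used as voting weight. Not a real smart contract."""

    def __init__(self, tokens_per_record: int = 1):
        # Assumed flat reward rate; a real DAO would likely weight by data quality.
        self.tokens_per_record = tokens_per_record
        self.balances: dict[str, int] = {}

    def contribute(self, member: str, num_records: int) -> int:
        """Mint tokens to `member` for contributing `num_records` records."""
        if num_records <= 0:
            raise ValueError("num_records must be positive")
        minted = num_records * self.tokens_per_record
        self.balances[member] = self.balances.get(member, 0) + minted
        return minted

    def tally(self, votes: dict[str, bool]) -> bool:
        """Token-weighted vote: passes only if 'yes' weight exceeds 'no' weight."""
        yes = sum(self.balances.get(m, 0) for m, v in votes.items() if v)
        no = sum(self.balances.get(m, 0) for m, v in votes.items() if not v)
        return yes > no


ledger = DataDaoLedger()
ledger.contribute("alice", 300)
ledger.contribute("bob", 100)
proposal_passes = ledger.tally({"alice": True, "bob": False})
```

In this sketch voting power is strictly proportional to contributed records, which is one possible reading of "tokens earned for data contributions".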
@anna_kazlauskas
Anna Kazlauskas
2 months
The true promise of crypto is digital sovereignty: own not just your funds, but your data. Make data non-custodial and private. This is what @withvana is designed for - the first L1 for user-owned data. Fulfilling what got many of us into crypto in the first place.
@jbrukh
Jake Brukhman @ NYC
2 months
Everyone out there thinks crypto people are here to trade tokens. They don’t realize we’re here to bail out their data and privacy.
18
11
82
1
2
15
@anna_kazlauskas
Anna Kazlauskas
5 months
A deep dive on data DAOs here:
2
2
15
@anna_kazlauskas
Anna Kazlauskas
5 months
Leading AI models only work because they are trained on your data (without your permission). Shouldn't you own a piece of the AI models that your data helps create?
1
1
14
@anna_kazlauskas
Anna Kazlauskas
5 months
r/datadao allows users to pool and govern their data, rewarding contributors with a dataset-specific token that represents ownership of the particular dataset. It’s a bit like a labor union for data.
1
3
12
@anna_kazlauskas
Anna Kazlauskas
2 months
Exciting to see the progress that data DAOs on Satori testnet are making. Many of the builders are early Bittensor subnet creators or miners competing for one of the 16 spots on Vana - a really strong group of developers.
@withvana
vana
2 months
Today, the 100th data DAO was deployed to Satori testnet. These data DAOs each implement a data liquidity pool (a new cryptoeconomic primitive) for a specific user-owned dataset, and advance the path towards a user-owned AI foundation model. Decentralized AI is accelerating :)
1
23
31
1
2
13
@anna_kazlauskas
Anna Kazlauskas
11 days
Great to learn more about @DlpLabs building out data DAOs on @withvana:
- Starting with resume data from LinkedIn
- Focused on data licensing and already has a data buyer
- Highlighted just how valuable LinkedIn data is, which they used to sell while working at LinkedIn
@withvana
vana
11 days
5
3
116
2
2
13
@anna_kazlauskas
Anna Kazlauskas
5 months
Reddit was one of the last web2 products that still felt community owned. Leading up to their IPO, incentives have shifted. The extract cycle of web2 risks ruining products we love. Let's create a new, user-owned reddit:
@caseykcaruso
CASΞY
5 months
Does anyone else find it wild that Reddit sells YOUR data for millions of dollars and you have zero upside or ownership? The world is lucky to have the brilliant @anna_kazlauskas persistently working to make collectively owned data a reality.
6
4
49
1
2
13
@anna_kazlauskas
Anna Kazlauskas
5 months
The current path of society is to allow big tech to take our data and use it to train AI models that do our jobs. The only way to prevent this is through collective action. Data is currency, and collective data is power.
2
1
13
@anna_kazlauskas
Anna Kazlauskas
4 months
Congrats to @ErikVoorhees on the launch of @TryVenice. It strikes a balance for privacy-focused users who want convenience: it obfuscates data by mixing LLM requests, similar to a crypto mixer, then just stores history encrypted in your browser.
4
1
11
@anna_kazlauskas
Anna Kazlauskas
6 months
This private data is valuable, as we've seen in recent deals like Google paying Reddit $60M a year for training data. And users have a lot of it: 100x more than the data used to train leading foundation models.
1
0
11
@anna_kazlauskas
Anna Kazlauskas
2 months
On the increasing value of data in an AI-first world: "So does AI change how much my data is worth? In isolation, probably not. But collectively? The value is there."
@gracekcarney
Grace Carney
2 months
AI is reshaping the data economy, but individual datasets remain undervalued. The real potential lies in data collectives, where users pool resources + own a stake in the value their data creates.
2
0
18
0
2
11
@anna_kazlauskas
Anna Kazlauskas
6 months
As we start to rely on AI models more and more, they become our source of truth. We shouldn't let a single company control that truth. Google's AI is a recent example of this - do you want extreme wokeness to rewrite history? AI should be owned and controlled by users.
@stratejake
stratejake
6 months
I've never been so embarrassed to work for a company.
1K
2K
43K
0
0
11
@anna_kazlauskas
Anna Kazlauskas
2 months
How to build user-owned AI: 1. liberate data from walled gardens 2. train a user-owned foundation model
4
0
10
@anna_kazlauskas
Anna Kazlauskas
4 months
Looking forward to the Decentralized AI days at Zuberlin, June 18-19!
@ZuBerlinCity
ZuBerlin
4 months
✨ZuBerlin is much more than just crypto! It's all about integrating various fields of cutting-edge technology, such as AI, and bringing diverse people together. 👥 We're super excited to have @anna_kazlauskas, founder of @withvana, with us! She’s part of curating an exceptional
4
2
14
3
0
9
@anna_kazlauskas
Anna Kazlauskas
2 months
@rdatadao Model prototype:
0
1
8
@anna_kazlauskas
Anna Kazlauskas
5 months
@usecapsule for seamless, portable wallets,
1
0
9
@anna_kazlauskas
Anna Kazlauskas
3 months
Great summary of the race for private data: "Few platforms with abundant data accumulated organically over the years haven’t signed agreements with generative AI developers, it seems — from Photobucket to Tumblr to Q&A site Stack Overflow."
@Kyle_L_Wiggers
Kyle Wiggers
3 months
AI training data has a price tag that only Big Tech can afford
0
1
7
1
0
8
@anna_kazlauskas
Anna Kazlauskas
6 months
Users can export this data thanks to data regulation like GDPR and CCPA, allowing them to create the world's largest data treasury.
1
0
8
@anna_kazlauskas
Anna Kazlauskas
5 months
@withvana for personal data storage and permissions,
2
0
8
@anna_kazlauskas
Anna Kazlauskas
5 months
And @base + optimism crew for scaling ethereum to make this all possible
0
0
7
@anna_kazlauskas
Anna Kazlauskas
5 months
@snapshotlabs for simplified voting,
1
0
7
@anna_kazlauskas
Anna Kazlauskas
28 days
Building this at @withvana, using proof of contribution to measure how original the training data is. Data contributors get paid based on how much the AI model created from their data is used.
@xanderatallah
Alex Atallah
28 days
Training data startup idea: - ask users to submit extremely original training data. Extremely original ≈ no known LLM can complete a substring. - sell the data to AI model companies for the user. If the user's data is incorporated in a model (it can now complete a substring),
4
2
26
2
0
7
@anna_kazlauskas
Anna Kazlauskas
1 month
Feels wrong when AI is taught based on your content and you don't own it. Has economic implications too -> if someone built an AI @MKBHD trained on his data, shouldn't he be the one to own it and earn from it?
@MKBHD
Marques Brownlee
2 months
Apple has sourced data for their AI from several companies. One of them scraped tons of data/transcripts from YouTube videos, including mine. Apple technically avoids "fault" here because they're not the ones scraping. But this is going to be an evolving problem for a long time.
532
1K
21K
2
0
7
@anna_kazlauskas
Anna Kazlauskas
5 months
@commondotxyz for beautiful community tools,
1
1
7
@anna_kazlauskas
Anna Kazlauskas
17 days
So glad to be back in sf
0
0
7
@anna_kazlauskas
Anna Kazlauskas
5 months
I love seeing their dedication start to enable real products that shift power back towards the individual. Some of the amazing projects that came together to make @rdatadao possible:
1
0
6
@anna_kazlauskas
Anna Kazlauskas
1 month
Apple published a full 2.5T token open source dataset. This is rare - usually companies just publish AI model weights, not the underlying dataset (privacy/ownership/IP issues). Fun dataset to play with over the weekend.
1
0
3
@anna_kazlauskas
Anna Kazlauskas
2 months
A big question in decentralized AI is distributed training. Token incentives can aggregate GPUs, but can those distributed GPUs work for foundation model training? Last week at @ZuBerlinCity , @flwrlabs presented a 1.3B param LLM trained in a fully federated way
2
0
5
@anna_kazlauskas
Anna Kazlauskas
3 months
Vana runs on proof-of-contribution, which rewards data contributors based on how much their data teaches an AI model. Measuring and incentivizing data quality is hard, but is key to a user-owned foundation model.
@withvana
vana
3 months
The latest on Satori Testnet from @CoinDesk
4
26
33
2
0
4
@anna_kazlauskas
Anna Kazlauskas
1 month
The AI world needs data, but public scraping without the consent of data owners is not the answer. @kwiens should own the AI models their content helps train, not the company that decided to scrape it.
@kwiens
Kyle Wiens
1 month
Hey @AnthropicAI: I get you're hungry for data. Claude is really smart! But do you really need to hit our servers a million times in 24 hours? You're not only taking our content without paying, you're tying up our devops resources. Not cool.
103
895
11K
0
1
3
@anna_kazlauskas
Anna Kazlauskas
1 month
@rowancheung I like how fast the open source AI community is already getting this 800GB model running on consumer hardware
0
1
4
@anna_kazlauskas
Anna Kazlauskas
2 months
@KyleSamani "You can only own things that are naturally exclusionary." Agreed, this is a "double spend problem for data". Data markets don't work unless you can exclude access to it, since why pay for something that you can just get for free? Data is not exclusionary today but can be
0
0
3
@anna_kazlauskas
Anna Kazlauskas
26 days
@tommyeastman21 @withvana is building the data piece. Decentralized data contributions help overcome the data wall, which is the bottleneck in AI today
1
0
3
@anna_kazlauskas
Anna Kazlauskas
2 months
@packyM 100! From the Twitter data DAO to a dating data DAO to some of the less expected financial forecasting ones
1
0
3
@anna_kazlauskas
Anna Kazlauskas
5 months
@planet_nerf Love this. I think a lot of people don't realize how valuable high quality data will become in an AI-native world
1
0
2
@anna_kazlauskas
Anna Kazlauskas
2 months
@colludingnode @mikewho_eth @jbrukh The open source AI / local LLM community carries forward this ethos and has grown exponentially in the past year
0
0
2
@anna_kazlauskas
Anna Kazlauskas
12 days
@JonasWustrack @realGeorgeHotz @yacineMTB In the data case, decentralization has an advantage through data regulation, which ensures users can always export their data. A centralized player can’t get access to Instagram, Reddit, X, Discord, YouTube since they’d get blocked or charged very high fees.
1
0
2
@anna_kazlauskas
Anna Kazlauskas
1 month
+ their trained model
0
0
1
@anna_kazlauskas
Anna Kazlauskas
2 months
@keoneHD 5) forces you to clearly articulate your thinking 6) opportunity for feedback from others as you write 7) shows evolution of your thinking over time (e.g., I wrote a 2022 post on user-owned AI models, then an update in 2024) 8) allows people to discover you through your writing
0
0
1
@anna_kazlauskas
Anna Kazlauskas
2 months
@Tyler_Did_It This is an amazing experiment. @rdatadao you all could deploy the new AI shitposting model to see what happens when you set it free. Have it post autonomously and give it some tokens to get started.
0
0
1
@anna_kazlauskas
Anna Kazlauskas
1 month
@lukedneumann The first of many cases to come. If @MKBHD helped train Apple's AI model, then he should own part of it.
0
0
1
@anna_kazlauskas
Anna Kazlauskas
7 days
@onunoreis @ETHGlobal Looking forward to it!
0
0
1
@anna_kazlauskas
Anna Kazlauskas
2 months
@srnlrsn @0xDesigner You actually legally own all your platform data bc of data regulation in the EU and California. You just grant a very permissive license to them to use it. Strava, Chase, Reddit, Twitter, all the data you’ve created is legally (although not yet in practice) yours
1
0
1
@anna_kazlauskas
Anna Kazlauskas
18 days
@timtimtim_eth Welcome welcome
0
0
1
@anna_kazlauskas
Anna Kazlauskas
27 days
@guywuolletjr @DavidVorick and glow are worth checking out, some of the furthest along in the decentralized energy space
0
0
1
@anna_kazlauskas
Anna Kazlauskas
6 days
@passportmonie Glad you liked it! Enjoyed all the questions
0
0
1
@anna_kazlauskas
Anna Kazlauskas
18 days
@SahilLalani0 @withvana Good question. Top of the list: iMessage data, Amazon purchase data, Google Drive data, email data, Amazon Alexa data (audio), ChatGPT exports. Rewind data would be good too. @artieart88 any other sources to add from what you’ve seen in data sales?
1
0
1
@anna_kazlauskas
Anna Kazlauskas
2 months
@0xDesigner @withvana Short answer: non-custodial data.
Longer answer:
- Export your platform data, leveraging data regulation
- Encrypt your data w a key derived from your crypto wallet and store it w your storage provider of choice
- Login w your wallet and sign a tx granting access to the data the
0
0
1
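The "encrypt your data w a key derived from your crypto wallet" step in the reply above can be sketched with the Python standard library alone. This is a hedged illustration, not Vana's actual scheme: it assumes the wallet produces a deterministic signature over a fixed message (passed in here as `wallet_signature` bytes), the salt `b"data-vault-v1"` and all function names are made up, and the SHA-256 counter keystream is a toy cipher with no authentication. A real system would use a vetted AEAD such as AES-GCM from an audited library.

```python
import hashlib
import os


def derive_key(wallet_signature: bytes, salt: bytes = b"data-vault-v1") -> bytes:
    """Derive a 32-byte key from a wallet signature (assumed deterministic)."""
    return hashlib.pbkdf2_hmac("sha256", wallet_signature, salt, 100_000, dklen=32)


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key || nonce || counter. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Return a random 16-byte nonce followed by the XOR-encrypted body."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def decrypt(key: bytes, blob: bytes) -> bytes:
    """Split off the nonce and reverse the XOR."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


key = derive_key(b"example-wallet-signature")  # placeholder signature bytes
blob = encrypt(key, b"exported platform data")
restored = decrypt(key, blob)
```

Because the key is re-derived from the wallet each time, the encrypted blob can sit with any storage provider without that provider being able to read it, which is the sense of "non-custodial" used in the thread.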
@anna_kazlauskas
Anna Kazlauskas
2 months
@seppuku_you @jbrukh Exactly. If you put private data onchain, someone can just take a copy of it and then you've lost your data/privacy. None of the decentralized storage solutions work either, as they're not GDPR compliant/designed for personal data. We built to solve this
0
0
1