Anna Kazlauskas @anna_kazlauskas Twitter profile

Pinned Tweet

Anna Kazlauskas

6 months

Today, models are primarily trained on the publicly scraped internet. What if 100M users contributed their private data from siloed platforms to create a user-owned foundation model?

15

20

142

Anna Kazlauskas

@anna_kazlauskas

1 month

Most of the TEE controversy assumes that TEEs are going to be used to replace zk or a blockchain, rather than complement them. A well-designed system (like @withvana) uses both: crypto consensus where you don't trust hardware, and TEEs for privacy-specific applications

Charlie Noyes

@_charlienoyes

1 month

@0xfoobar Exactly what you rely on the TEE for depends on how you structure the application. In your example, e.g., require a zk proof for correctness and rely on the TEE only for privacy. Basically, in protocol design you should limit the scope of dependencies, and favor simplicity + lindyness

4

2

21

2

142

138

Anna Kazlauskas

@anna_kazlauskas

2 months

The crypto x AI space is evolving so quickly. @caseykcaruso's market map is the first to capture how dynamic/alive the space feels

CASΞY

@caseykcaruso

2 months

Open sourcing a community-led market map of Decentralized AI. Most market maps are static and biased. We wanted to make one that was interactive and crowdsourced. Pls submit any projects that are missing. We implemented a 3d viz using @d3js_org and a

39

32

245

0

52

61

Anna Kazlauskas

@anna_kazlauskas

5 months

Launching the world’s first data DAO, focused on Reddit data, on the Vana network: @rdatadao

22

27

111

Anna Kazlauskas

@anna_kazlauskas

2 months

Usually, AI models are created and owned by a single company using public data. @rdatadao's AI model is created and owned by a collective of 140k users who contributed their private data. User-owned data -> user-owned AI

vana

@withvana

2 months

Through a collaboration with @oraprotocol we’re helping @rdatadao turn their data into an on-chain AI for shitposting. 😳😂💩 This will be the first user-owned model trained on user-owned data. The DAO plans to develop their early prototype and launch

7

328

335

4

14

62

Anna Kazlauskas

@anna_kazlauskas

2 months

A great start to EthCC at Open Source AI day, hosted by @ekang426! I talked about user-owned foundation models and non-custodial data at @withvana. Incredible to see hundreds of people gathered to build towards user-owned AI. Also, enjoyed the mini @ZuBerlinCity reunion :)

1

13

33

Anna Kazlauskas

@anna_kazlauskas

3 months

The current bottleneck in AI has shifted from compute to data. How do we overcome the great data wall? Satori Testnet is a step towards data abundance, pushing the frontiers of user-owned AI. It is designed to liberate data from walled gardens

Introducing the Satori Testnet for User-Owned Data

The Satori Testnet, an early implementation of Vana, is now live! It is a distributed network that runs on computers all over the world—including yours if you want—to enable people to collectively...

www.vana.org

3

8

13

Anna Kazlauskas

@anna_kazlauskas

2 months

At EthCC all week, it's crazy how many decentralized AI events there are! It was a little known category just a year ago. Find me throughout the week talking about user-owned AI

vana

@withvana

2 months

Meet the Vana team IRL at @EthCC in Brussels. We’ll be discussing the future of user-owned AI at these decentralized AI events:

5

20

33

3

2

28

Anna Kazlauskas

@anna_kazlauskas

2 months

The more context an LLM has on you, the more helpful it is. But cross-platform data (WhatsApp, iMessage, Gmail) is stuck in walled gardens. @withvana makes data portable, even from web2 walled gardens. Login w wallet -> all your data is there, just like your funds

0xDesigner

@0xDesigner

2 months

we won't be able to have the kind of life-changing experience with LLMs until we can truly absolutely positively own our data online

6

4

68

3

1

26

Anna Kazlauskas

@anna_kazlauskas

5 months

Users should have control over their data and ownership in the models they help create

The Block

@TheBlock__

5 months

Paradigm-backed startup Vana launches DAO letting Reddit users control their personal data

22

61

197

6

3

25

Anna Kazlauskas

@anna_kazlauskas

2 months

Proof of work was great bc you could become an owner in the network w just an old computer. Proof of stake networks compound existing ownership, so it's hard to bring new people in that way. Proof of contribution @withvana lets you earn ownership in the network with just your data

binji

@binji_x

2 months

we need less tokens that people can just buy and we need more tokens that people have to work for/earn in some sense, mining was a form of earning, but we need to evolve further and bring that into a more accessible vehicle for the masses; but also one that’s less passive.

44

15

139

1

4

21

Anna Kazlauskas

@anna_kazlauskas

5 months

It’s becoming so much easier (and more fun!) to build onchain. Many projects that make the user-owned internet easier to interact with have been heads-down building throughout the bear market.

2

21

Anna Kazlauskas

@anna_kazlauskas

6 months

Foundation models tend towards monopolies, but that's not the only option. We as users should create our own best foundation model — we have the data and compute to make it possible. Detailed post:

User-Owned Foundation Models (2024 Update)

An update on user-owned foundation models

anna.kazlausk.as

3

1

18

Anna Kazlauskas

@anna_kazlauskas

1 month

The data wall is one of the main barriers to AI progress. Current approaches: scrape the internet, buy data ($20-$200/response), or license it from companies. @withvana's solution: give users true ownership in the AI models their data helps train

martin_casado

@martin_casado

1 month

Multi-turn data is incredibly expensive today. Single answers go for $20-$200 depending on quality. Assuming markets are somewhat efficient, this is a reasonable proxy for the atomic costs to aggregate "new data" for LLMs. Given (a) we've exhausted a non-trivial fraction of

26

12

213

0

12

Anna Kazlauskas

@anna_kazlauskas

6 months

Foundation models require a lot of compute, but users have a lot of that too. Ethereum miners’ combined compute is 50 times greater than that used to train leading foundation models.

2

1

15

Anna Kazlauskas

@anna_kazlauskas

5 months

A data DAO has two main components: 1) onchain governance, with tokens earned for data contributions, and 2) a secure server, with a public-private key pair for encryption, where the community-owned dataset resides.

2

1

14
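
A minimal Python sketch of the first component described above (tokens earned for data contributions, then used in governance). The class name `DataDAO`, the fixed reward of one token per accepted record, and the token-weighted vote are assumptions made for illustration, not the actual r/datadao or Vana contracts; the encrypted dataset server is left out entirely.

```python
from collections import defaultdict


class DataDAO:
    """Toy in-memory model of a data DAO's token ledger (hypothetical)."""

    def __init__(self, tokens_per_record: int = 1):
        # Assumed reward rate: a fixed number of tokens per accepted record.
        self.tokens_per_record = tokens_per_record
        self.balances = defaultdict(int)

    def contribute(self, member: str, num_records: int) -> int:
        """Credit tokens to a member for a data contribution."""
        if num_records <= 0:
            raise ValueError("num_records must be positive")
        minted = num_records * self.tokens_per_record
        self.balances[member] += minted
        return minted

    def tally(self, votes: dict[str, bool]) -> bool:
        """Token-weighted vote: passes if 'yes' weight exceeds 'no' weight."""
        yes = sum(self.balances[m] for m, v in votes.items() if v)
        no = sum(self.balances[m] for m, v in votes.items() if not v)
        return yes > no


dao = DataDAO()
dao.contribute("alice", 30)
dao.contribute("bob", 10)
dao.contribute("carol", 15)
# alice (30 tokens) votes yes; bob (10) and carol (15) vote no.
passed = dao.tally({"alice": True, "bob": False, "carol": False})
```

Here governance weight follows contribution rather than purchase, which is the property the description emphasizes; a real deployment would also need to score data quality before minting.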

Anna Kazlauskas

@anna_kazlauskas

2 months

The true promise of crypto is digital sovereignty: own not just your funds, but your data. Make data non-custodial and private. This is what @withvana is designed for - the first L1 for user-owned data. Fulfilling what got many of us into crypto in the first place

Jake Brukhman @ NYC

@jbrukh

2 months

Everyone out there thinks crypto people are here to trade tokens. They don’t realize we’re here to bail out their data and privacy.

18

11

82

1

2

15

Anna Kazlauskas

@anna_kazlauskas

5 months

A deep dive on data DAOs here:

Data DAOs: The Path Towards a User-Owned Internet

anna.kazlausk.as

2

15

Anna Kazlauskas

@anna_kazlauskas

5 months

Leading AI models only work because they are trained on your data (without your permission). Shouldn't you own a piece of the AI models that your data helps create?

1

14

Anna Kazlauskas

@anna_kazlauskas

5 months

r/datadao allows users to pool and govern their data, rewarding contributors with a dataset-specific token that represents ownership of the particular dataset. It’s a bit like a labor union for data.

1

3

12

Anna Kazlauskas

@anna_kazlauskas

2 months

Exciting to see the progress that data DAOs on Satori testnet are making. Many of the builders are early bittensor subnet creators or miners competing for one of the 16 spots on Vana - a really strong group of developers

vana

@withvana

2 months

Today, the 100th data DAO was deployed to Satori testnet. These data DAOs each implement a data liquidity pool (a new cryptoeconomic primitive) for a specific user-owned dataset, and advance the path towards a user-owned AI foundation model. Decentralized AI is accelerating :)

1

23

31

1

2

13

Anna Kazlauskas

@anna_kazlauskas

11 days

Great to learn more about @DlpLabs building out data DAOs on @withvana
- Starting with resume data from LinkedIn
- Focused on data licensing and already has a data buyer
- Highlighted just how valuable LinkedIn data is, which they used to sell while working at LinkedIn

vana

@withvana

11 days

5

3

116

2

13

Anna Kazlauskas

@anna_kazlauskas

5 months

Reddit was one of the last web2 products that still felt community owned. Leading up to their IPO, incentives have shifted. The extractive cycle of web2 risks ruining products we love. Let's create a new, user-owned reddit:

CASΞY

@caseykcaruso

5 months

Does anyone else find it wild that Reddit sells YOUR data for millions of dollars and you have zero upside or ownership? The world is lucky to have the brilliant @anna_kazlauskas persistently working to make collectively owned data a reality.

6

4

49

1

2

13

Anna Kazlauskas

@anna_kazlauskas

5 months

The current path of society is to allow big tech to take our data and use it to train AI models that do our jobs. The only way to prevent this is through collective action. Data is currency, and collective data is power.

2

1

13

Anna Kazlauskas

@anna_kazlauskas

4 months

Congrats to @ErikVoorhees on the launch of @TryVenice . It strikes a balance for privacy-focused users who want convenience: it obfuscates data by mixing llm requests, similar to a crypto mixer, then just stores history encrypted in your browser

4

1

11

Anna Kazlauskas

@anna_kazlauskas

6 months

This private data is valuable, as we've seen in recent deals like Google paying Reddit $60M a year for training data. And users have a lot of it: 100x more than the data used to train leading foundation models.

1

0

11

Anna Kazlauskas

@anna_kazlauskas

2 months

On the increasing value of data in an AI-first world: "So does AI change how much my data is worth? In isolation, probably not. But collectively? The value is there."

Grace Carney

@gracekcarney

2 months

AI is reshaping the data economy, but individual datasets remain undervalued. The real potential lies in data collectives, where users pool resources + own a stake in the value their data creates.

2

0

18

0

2

11

Anna Kazlauskas

@anna_kazlauskas

6 months

As we start to rely on AI models more and more, they become our source of truth. We shouldn't let a single company control that truth. Google's AI is a recent example of this - do you want extreme wokeness to rewrite history? AI should be owned and controlled by users.

stratejake

@stratejake

6 months

I've never been so embarrassed to work for a company.

1K

2K

43K

0

11

Anna Kazlauskas

@anna_kazlauskas

2 months

How to build user-owned AI: 1. liberate data from walled gardens 2. train a user-owned foundation model

4

0

10

Anna Kazlauskas

@anna_kazlauskas

4 months

Looking forward to the Decentralized AI days at Zuberlin, June 18-19!

ZuBerlin

@ZuBerlinCity

4 months

✨ZuBerlin is much more than just crypto! It's all about integrating various fields of cutting-edge technology, such as AI, and bringing diverse people together. 👥 We're super excited to have @anna_kazlauskas , founder of @withvana , with us! She’s part of curating an exceptional

4

2

14

3

0

9

Anna Kazlauskas

@anna_kazlauskas

2 months

@rdatadao Model prototype:

0

1

8

Anna Kazlauskas

@anna_kazlauskas

5 months

@usecapsule for seamless, portable wallets,

1

0

9

Anna Kazlauskas

@anna_kazlauskas

3 months

Great summary of the race for private data: "Few platforms with abundant data accumulated organically over the years haven’t signed agreements with generative AI developers, it seems — from Photobucket to Tumblr to Q&A site Stack Overflow."

Kyle Wiggers

@Kyle_L_Wiggers

3 months

AI training data has a price tag that only Big Tech can afford

0

1

7

1

0

8

Anna Kazlauskas

@anna_kazlauskas

6 months

Users can export this data thanks to data regulation like GDPR and CCPA, allowing them to create the world's largest data treasury.

1

0

8

Anna Kazlauskas

@anna_kazlauskas

5 months

@withvana for personal data storage and permissions,

2

0

8

Anna Kazlauskas

@anna_kazlauskas

5 months

And @base + optimism crew for scaling ethereum to make this all possible

0

7

Anna Kazlauskas

@anna_kazlauskas

5 months

@snapshotlabs for simplified voting,

1

0

7

Anna Kazlauskas

@anna_kazlauskas

28 days

Building this at @withvana, using proof of contribution to measure how original the training data is. Data contributors get paid based on how much the AI model created from their data is used

Alex Atallah

@xanderatallah

28 days

Training data startup idea: - ask users to submit extremely original training data. Extremely original ≈ no known LLM can complete a substring. - sell the data to AI model companies for the user. If the user's data is incorporated in a model (it can now complete a substring),

4

2

26

2

0

7
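
One way to read the payout rule above, sketched in Python: the revenue a model earns from usage is split among contributors in proportion to a contribution score. How proof of contribution actually scores originality is not specified here, so `scores`, `usage_revenue`, and the pro-rata split are all assumptions made for illustration.

```python
def split_usage_revenue(scores: dict[str, float], usage_revenue: float) -> dict[str, float]:
    """Split usage revenue pro rata by each contributor's (assumed) score."""
    if any(s < 0 for s in scores.values()):
        raise ValueError("scores must be non-negative")
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one score must be positive")
    return {name: usage_revenue * s / total for name, s in scores.items()}


# Hypothetical scores: alice's data was judged three times as valuable as bob's.
payouts = split_usage_revenue({"alice": 3.0, "bob": 1.0}, usage_revenue=100.0)
```

Because payouts scale with `usage_revenue`, contributors earn more only when the model they helped train is actually used, which matches the incentive described in the post.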

Anna Kazlauskas

@anna_kazlauskas

1 month

Feels wrong when AI is taught based on your content and you don't own it. Has economic implications too -> if someone built an AI @MKBHD trained on his data, shouldn't he be the one to own it and earn from it?

Marques Brownlee

@MKBHD

2 months

Apple has sourced data for their AI from several companies. One of them scraped tons of data/transcripts from YouTube videos, including mine. Apple technically avoids "fault" here because they're not the ones scraping. But this is going to be an evolving problem for a long time

532

1K

21K

2

0

7

Anna Kazlauskas

@anna_kazlauskas

5 months

@commondotxyz for beautiful community tools,

1

7

Anna Kazlauskas

@anna_kazlauskas

17 days

So glad to be back in sf

0

7

Anna Kazlauskas

@anna_kazlauskas

5 months

I love seeing their dedication start to enable real products that shift power back towards the individual. Some of the amazing projects that came together to make @rdatadao possible:

1

0

6

Anna Kazlauskas

@anna_kazlauskas

1 month

Apple published a full 2.5T token open source dataset. This is rare - usually companies just publish AI model weights, not the underlying dataset (privacy/ownership/IP issues). Fun dataset to play with over the weekend

mlfoundations/dclm-baseline-1.0 · Datasets at Hugging Face

huggingface.co

1

0

3

Anna Kazlauskas

@anna_kazlauskas

1 month

@JosephJacks_ @opentensor @AIWayfinder @akashnet_ @withvana building user-owned AI through user-owned data

0

5

Anna Kazlauskas

@anna_kazlauskas

2 months

A big question in decentralized AI is distributed training. Token incentives can aggregate GPUs, but can those distributed GPUs work for foundation model training? Last week at @ZuBerlinCity , @flwrlabs presented a 1.3B param LLM trained in a fully federated way

2

0

5

Anna Kazlauskas

@anna_kazlauskas

3 months

Vana runs on proof-of-contribution, which rewards data contributors based on how much their data teaches an AI model. Measuring and incentivizing data quality is hard, but is key to a user-owned foundation model

vana

@withvana

3 months

The latest on Satori Testnet from @CoinDesk

4

26

33

2

0

4

Anna Kazlauskas

@anna_kazlauskas

1 month

The AI world needs data, but public scraping without the consent of data owners is not the answer. @kwiens should own the AI models their content helps train, not the company that decided to scrape it

Kyle Wiens

@kwiens

1 month

Hey @AnthropicAI : I get you're hungry for data. Claude is really smart! But do you really need to hit our servers a million times in 24 hours? You're not only taking our content without paying, you're tying up our devops resources. Not cool.

103

895

11K

0

1

3

Anna Kazlauskas

@anna_kazlauskas

1 month

@rowancheung I like how fast the open source AI community is already getting this 800gb model running on consumer hardware

From the LocalLLaMA community on Reddit

Explore this post and more from the LocalLLaMA community

www.reddit.com

0

1

4

Anna Kazlauskas

@anna_kazlauskas

2 months

@KyleSamani > You can only own things that are naturally exclusionary
Agreed, this is a "double spend problem for data". Data markets don't work unless you can exclude access to the data, since why pay for something that you can just get for free? Data is not exclusionary today but can be

0

3

Anna Kazlauskas

@anna_kazlauskas

26 days

@tommyeastman21 @withvana is building the data piece. Decentralized data contributions help overcome the data wall, which is the bottleneck in AI today

1

0

3

Anna Kazlauskas

@anna_kazlauskas

2 months

@ai @withvana @fredwilson Love USV's thinking on this @rebeccakaden laid it out nicely here:

Collision Course

We are in the middle of two significant technological shifts happening simultaneously and also intertwined. AI and web3 may have different starting p...

blog.usv.com

1

0

3

Anna Kazlauskas

@anna_kazlauskas

2 months

@packyM 100! From the Twitter data DAO to a dating data DAO to some of the less expected financial forecasting ones

1

0

3

Anna Kazlauskas

@anna_kazlauskas

5 months

@planet_nerf Love this. I think a lot of people don't realize how valuable high quality data will become in an AI-native world

1

0

2

Anna Kazlauskas

@anna_kazlauskas

2 months

@colludingnode @mikewho_eth @jbrukh The open source AI / local LLM community carries forward this ethos and has grown exponentially in the past year

0

2

Anna Kazlauskas

@anna_kazlauskas

12 days

@JonasWustrack @realGeorgeHotz @yacineMTB In the data case, decentralization has an advantage through data regulation, which ensures users can always export their data. A centralized player can’t get access to Instagram, Reddit, X, Discord, YouTube since they’d get blocked or charged very high fees

1

0

2

Anna Kazlauskas

@anna_kazlauskas

18 days

@SahilLalani0 @withvana @artieart88 Love it, will dm

0

1

Anna Kazlauskas

@anna_kazlauskas

1 month

+ their trained model

apple/DCLM-7B · Hugging Face

huggingface.co

0

1

Anna Kazlauskas

@anna_kazlauskas

2 months

@keoneHD 5) forces you to clearly articulate your thinking 6) opportunity for feedback from others as you write 7) shows evolution of your thinking over time (e.g. I wrote a 2022 post on user-owned AI models, then an update in 2024) 8) allows people to discover you through your writing

0

1

Anna Kazlauskas

@anna_kazlauskas

2 months

@Tyler_Did_It This is an amazing experiment. @rdatadao you all could deploy the new AI shitposting model to see what happens when you set it free. Have it post autonomously and give it some tokens to get started

0

1

Anna Kazlauskas

@anna_kazlauskas

1 month

@lukedneumann The first of many cases to come. If @MKBHD helped train Apple's AI model, then he should own part of it

0

1

Anna Kazlauskas

@anna_kazlauskas

7 days

@onunoreis @ETHGlobal Looking forward to it!

0

1

Anna Kazlauskas

@anna_kazlauskas

2 months

@srnlrsn @0xDesigner You actually legally own all your platform data bc of data regulation in the EU and California. You just grant the platforms a very permissive license to use it. Strava, Chase, Reddit, Twitter: all the data you’ve created is legally (although not yet in practice) yours

1

0

1

Anna Kazlauskas

@anna_kazlauskas

18 days

@timtimtim_eth Welcome welcome

0

1

Anna Kazlauskas

@anna_kazlauskas

11 days

@DlpLabs Gm gm

0

1

Anna Kazlauskas

@anna_kazlauskas

27 days

@guywuolletjr @DavidVorick and glow are worth checking out, some of the furthest along in the decentralized energy space

0

1

Anna Kazlauskas

@anna_kazlauskas

6 days

@passportmonie Glad you liked it! Enjoyed all the questions

0

1

Anna Kazlauskas

@anna_kazlauskas

18 days

@SahilLalani0 @withvana Good question. Top of the list: iMessage data, Amazon purchase data, Google Drive data, email data, Amazon Alexa data (audio), ChatGPT exports. Rewind data would be good too. @artieart88 any other sources to add from what you’ve seen in data sales?

1

0

1

Anna Kazlauskas

@anna_kazlauskas

2 months

@0xDesigner @withvana Short answer: non-custodial data
Longer answer:
- Export your platform data, leveraging data regulation
- Encrypt your data w a key derived from your crypto wallet and store it w your storage provider of choice
- Login w your wallet and sign a tx granting access to the data the

0

1
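
The "key derived from your crypto wallet" step above could look roughly like the following Python sketch. It assumes the key material is a wallet signature over a fixed message (which only works if the wallet signs deterministically) and stretches it with HKDF-SHA256 from RFC 5869; the post does not say how Vana derives keys, so the function name `hkdf_sha256`, the placeholder signature bytes, and the `info` label are all hypothetical. The export, encryption, and access-grant steps are not shown.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with HMAC-SHA256: extract, then expand."""
    if length > 255 * 32:
        raise ValueError("length too large for HKDF-SHA256")
    # Extract: RFC 5869 uses a zero-filled salt when none is supplied.
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output key material exists.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Placeholder bytes standing in for a wallet signature over a fixed message
# (an assumption; a real flow would request this signature from the wallet).
wallet_signature = bytes.fromhex("ab" * 65)
data_key = hkdf_sha256(wallet_signature, salt=b"", info=b"example-data-encryption-key")
```

The resulting 32-byte `data_key` would then feed an authenticated cipher such as AES-GCM from a vetted cryptography library; the `info` label keeps keys for different purposes separate even when they come from the same signature.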

Anna Kazlauskas

@anna_kazlauskas

2 months

@seppuku_you @jbrukh Exactly. If you put private data onchain, someone can just take a copy of it and then you've lost your data/privacy. None of the decentralized storage solutions work either, as they're not GDPR compliant/designed for personal data. We built Vana to solve this:

Vana - The First Network for User-Owned Data and Decentralized AI

Welcome to Vana, where users control their data and contribute to decentralized AI. Explore data DAOs, marketplaces, and projects like r/datadao, Volara, and Flirtual.

www.vana.org

0

1