AI4Bharat

@ai4bharat

3,531
Followers
135
Following
20
Media
138
Statuses

The focus of AI4Bhārat, an initiative of IIT-Madras, is on building open-source language AI for Indian languages, including datasets, models, and applications.

India
Joined May 2019
Pinned Tweet
@ai4bharat
AI4Bharat
2 years
We are pleased to announce the launch of the Nilekani Center at AI4Bharat, IIT Madras on 28th July. The Center's mission is to innovate on open-source Indian language technology with the intention to create societal impact.
2
32
104
@ai4bharat
AI4Bharat
8 months
📣 📣 📣 New instruction-tuned LLM! 📣 📣 📣 Today, we announce an initial release of "Airavata", an instruction-tuned LLM for Hindi. Blog: Model: Datasets: (1/N)
8
86
376
@ai4bharat
AI4Bharat
1 year
A course on LLMs will be offered by Prof Mitesh Khapra. If you are a beginner, or have some experience and are looking to deepen your knowledge, then this course is for you. Everything will be covered, right from theory and fundamentals to LLMs in practice.
4
54
276
@ai4bharat
AI4Bharat
9 months
We are pleased to announce that we will begin recruiting AI residents (and associates) for 2024-25. The AI resident program is a year-long pre-doctoral program which allows you to work intensively on NLP, Speech and Vision projects. Apply below:
6
53
239
@ai4bharat
AI4Bharat
7 months
🚀IndicLLMSuite Launch Announcement!🚀 We're thrilled to unveil IndicLLMSuite: A collection of data resources and tools for developing Indic LLMs. 📜 Paper: 🌐 Blog (the way forward): 💻 Resources: (1/n)
4
52
204
@ai4bharat
AI4Bharat
7 months
🎉 🎉 🎉 Presenting our blog on IndicVoices! IndicVoices is an ongoing journey spanning 16,237 speakers, 145 Indian districts and 22 Indic languages! Blog: Paper: Dataset: Kindly help spread the word!
2
44
156
@ai4bharat
AI4Bharat
10 months
🎉 Exciting News! 🚀 We are thrilled to announce the launch of our AI4Bharat Blog! 🌐✨ Our goal: Empower researchers to share their work with a wide audience. 🧠💻 Debut post: IndicTrans2-M2M, our groundbreaking system for 22 languages! 🗣️🌐
2
27
115
@ai4bharat
AI4Bharat
10 months
Slowly but surely we have reached 2000 followers. So much more to do! A big thank you to all the members of AI4BHARAT as well as people who have supported us for striving to push the boundaries of open source AI research for India! 🇮🇳
1
4
67
@ai4bharat
AI4Bharat
2 years
The Nilekani Centre at AI4Bharat, IIT Madras is hiring TRANSCRIPTIONISTS in all of the 22 official Indian Languages. This is a remote (WFH), full-time position, with flexi-hours. Selected candidates will go through tests (speech/audio-to-text) prior to being hired.
2
26
60
@ai4bharat
AI4Bharat
9 months
Question: What do you do when you have a video in one language but your audience comprises people speaking 2000 other languages? Answer: You use Chitralekha! Presenting the 3rd blog of our blog series.
1
10
55
@ai4bharat
AI4Bharat
9 months
Please read our paper to find out how we pushed the boundaries for Indian language MT. We hope to present our work at @iclr_conf if allotted a slot. This is just the beginning of our journey to tackle Indian language MT. Next up: Dialects and codemixing. Stay tuned!
@TmlrPub
Accepted papers at TMLR
9 months
IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled ... Jay Gala, Pranjal A Chitale, A K Raghavan et al.. Action editor: W Ronny Huang. #corpus #multilingual #corpora
0
8
30
1
2
51
@ai4bharat
AI4Bharat
8 months
📣 📣 📣 Do you want to create open-source datasets at scale? If so, then our blog detailing Shoonya is for you! Blog: GitHub: Video: Please give it a read and consider adopting it!
0
13
53
@ai4bharat
AI4Bharat
2 years
Dr. Pratyush Kumar announcing the open source release of Shoonya and Chitralekha tools
1
3
47
@ai4bharat
AI4Bharat
1 year
As a result of our rapidly growing team, we are in need of a system admin. Kindly reach out if you are interested.
1
13
41
@ai4bharat
AI4Bharat
10 months
🚨 We are happy to present our second blog on IndicMT Eval, accepted to ACL23! While training models is important, it is meaningless without evaluation. But which evaluation metric is reliable, especially for Indic languages? Our blog has answers for you:
1
5
37
@ai4bharat
AI4Bharat
11 months
A huge achievement for us all!
@prajdabre1
Raj Dabre
11 months
I am extremely pleased to announce that IndicTrans2 will be published in TMLR (@TmlrOrg). This is a tremendous achievement for my coauthors and me that took nearly 1.5 years of hard work. The camera ready version will be out soon but for now we are over the moon! #NLProc #ACL
8
10
107
1
2
36
@ai4bharat
AI4Bharat
2 years
We have made large-scale contributions and hope to continue doing so for more domains and languages. The roadmap for AI4Bharat going forward:
0
7
35
@ai4bharat
AI4Bharat
2 years
@ai4bharat, a center at @iitmadras, is excited to announce the 1st AI4B Summer of Code! If you are passionate about contributing to the nation with open-source AI tools and apps for language and speech tech, then consider applying through the links below before 20th April.
1
13
32
@ai4bharat
AI4Bharat
1 year
Multiple acceptances at @emnlpmeeting, thanks to students, researchers and collaborators. 1. CTQScorer: Combining Multiple Features for In-context Example Selection for Machine Translation Authors: @NameIsAshwanth, @ratishsp, @prajdabre1, @anoopk (Findings) #EMNLP2023 #NLProc
1
1
31
@ai4bharat
AI4Bharat
3 years
We are happy to share that IndicTrans Model is now available on @huggingface Spaces with @Gradio Indic2En - En2Indic - We welcome you to try out the model. Feedback would be appreciated
0
5
29
@ai4bharat
AI4Bharat
9 months
🚨🚨🚨 Important announcement! We have identified some suspicious websites as follows: : Uses AI4Bharat logo for an investment app. : A basic website serving an unknown purpose. Please be wary of such sites and spread the word.
3
12
29
@ai4bharat
AI4Bharat
2 years
Check out the latest work from our lab. We create a strong benchmark (IndicXTREME), upgrade the monolingual corpus (IndicCorp v2) and release new models with various ablations (IndicBERT v2) for all the constitutionally recognised Indian languages. All the artefacts are open-sourced.
@sumanthd17
Sumanth
2 years
New Paper 🚨 IndicXTREME: A Multi-Task Benchmark For Evaluating Indic Languages We introduce IndicXTREME, a diverse benchmark of 9 tasks covering 18 Indian languages. We maintain high quality by using human supervision to create all the test sets. [1/8]
4
15
69
0
6
21
@ai4bharat
AI4Bharat
2 years
It is a pleasure to welcome you to the launch of the Nilekani Center at AI4Bharat, IIT Madras! We will start live-streaming the launch at approximately 10.45 am. All the zoom links are available on our website.
0
3
18
@ai4bharat
AI4Bharat
7 months
📊 First comes Sangraha, the largest Indic language corpus spanning Verified (64B), Unverified (24B), Synthetic (162B) tokens. Verified - Web, PDF and Speech data Unverified - Other multilingual corpora Synthetic - Large scale translations and transliterations of Wikimedia (2/n)
1
2
17
@ai4bharat
AI4Bharat
2 years
Thanks to everyone who joined us and made the event a grand success. We are energised to keep working towards advancing Speech & NLP tech for Indian languages. We will continue to truly and really open-source all our data, code, models and benchmarks. #NLProc @iitmadras
0
1
16
@ai4bharat
AI4Bharat
8 months
Airavata was created by fine-tuning OpenHathi with diverse, instruction-tuning Hindi datasets to make it better suited for assistive tasks. Along with the model, we also share the instruction tuning datasets to enable further research for IndicLLMs. (3/N)
3
2
17
@ai4bharat
AI4Bharat
2 years
Our students going over posters for ASR, NLU, NLG, Translation, Transliteration @sumanthd17 @kaushal_py @tahirjmakhdoomi @gowtham_ramesh1
0
2
17
@ai4bharat
AI4Bharat
8 months
We also compile a collection of evaluation benchmarks along with an evaluation framework to compare various LLMs for their abilities on diverse tasks when instructed in Hindi. Currently, Airavata supports Hindi, but we plan to expand this to all 22 scheduled Indic languages! 4/4
2
1
14
@ai4bharat
AI4Bharat
2 years
Our IndicNLG model now live on @huggingface spaces!!
@prajdabre1
Raj Dabre
2 years
Tired of downloading IndicNLG models to play with them locally? Say no more: @ai4bharat
2
3
5
0
2
13
@ai4bharat
AI4Bharat
7 months
🔧 To create this, we developed Setu, an open-source, Apache Spark-based distributed data-cleaning pipeline. (3/n)
1
0
13
@ai4bharat
AI4Bharat
10 months
Don't forget to say hi to students, collaborators and researchers from AI4BHARAT at #EMNLP2023, especially if you are working on Indian languages. @anoopk @jaygala24 @pranjalchitale @sumanthd17 @yashmadhani_ @ratishsp
0
2
12
@ai4bharat
AI4Bharat
2 years
We are looking for people with - 1. Excellent command over mother tongue/chosen language. 2. Excellent listening comprehension. 3. Attention to detail. 4. Qualification: Graduate or PG with any Indian language as a core subject at UG &/or PG level. Salary: Rs.25,000-30,000 pm
1
2
11
@ai4bharat
AI4Bharat
9 months
We have only one official website: AI4BHARAT is a non-profit organisation so anything involving money does not represent us.
0
2
12
@ai4bharat
AI4Bharat
2 years
We are grateful for your support and will put the resources to the best use #OpenSource #AI4Bharat #NLProc
@NandanNilekani
Nandan Nilekani
2 years
Rohini and I are thrilled to support this endeavour to build open-source language AI for 22 Indian languages!
35
222
1K
0
1
11
@ai4bharat
AI4Bharat
3 years
AI4Bharat is excited to announce a talk on "RNN-T ASR Systems and Enabling Contextualization For RNN-T ASR Systems" by @mahajain3 this Saturday, March 19th, 9:00 to 10:45 AM. [1/n]
1
3
10
@ai4bharat
AI4Bharat
2 years
Shri. Nandan Nilekani addressing the gathering
0
0
9
@ai4bharat
AI4Bharat
3 years
As we gather today, on the 73rd Republic Day, to celebrate the mighty strength, rich heritage and cultural diversity of this great nation, we at AI4Bharat are proud to add to the technological advancements towards solving speech & language problems [1/n]
1
7
9
@ai4bharat
AI4Bharat
2 years
On 28th July, in an event at IIT Madras, the center will be inaugurated by Rohini and @NandanNilekani. Following this, we are hosting the Center's first language AI workshop, which is open to startups and researchers.
1
1
9
@ai4bharat
AI4Bharat
2 years
Prof. Kamakoti, Director, IITM giving the Inaugural address
0
0
8
@ai4bharat
AI4Bharat
2 years
Workshop on Translation and Transliteration @MiteshKhapra @sumanthd17 Yash Madhani
0
0
8
@ai4bharat
AI4Bharat
2 years
Check out our latest work on NER for Indian Languages
@anoopk
Anoop Kunchukuttan
2 years
AI4Bharat is happy to share Named Entity Recognition datasets and model for 11 Indian languages. - Model: - Dataset: - Colab notebook: @ai4bharat @pratykumar @MiteshKhapra @vivek_raghavan @RudraMurthyV
0
2
14
0
2
7
@ai4bharat
AI4Bharat
9 months
Our residents in collaboration with students and researchers in AI4BHARAT have also produced high impact datasets and models like BPCC, IndicTrans1 and 2, IndicWav2Vec, IndicBart, IndicNLG Benchmark, etc, all of which have seen significant adoption in government and industry.
2
0
7
@ai4bharat
AI4Bharat
1 year
3. Aksharantar: Towards building open transliteration tools for the next billion users. Authors: Yash Madhani, Sushane Parthan, Priyanka Bedekar, Ruchi Khapra, @anoopk, @pratykumar, @MiteshKhapra. (Findings)
1
1
7
@ai4bharat
AI4Bharat
9 months
If you would like to be a part of AI4BHARAT and contribute to and push the boundaries of the NLP, Speech and Vision ecosystem for India then this opportunity is for you. @anoopk @MiteshKhapra @ratishsp @RudraMurthyV @prajdabre1
0
0
7
@ai4bharat
AI4Bharat
7 months
It's a step towards collecting spontaneous speech data across the rich tapestry of Indian languages, while honouring the vast linguistic, cultural, and demographic diversity! With this, we release 7,348 hours of speech data! Let's push the boundaries of Indic speech technologies!
1
3
8
@ai4bharat
AI4Bharat
3 years
Join us at #AAAI2022 to learn more about IndicASR Poster: Feb 25 @ 4:45-6:30pm PST Feb 28 @ 12:45-2:30am PST Oral: Feb 28 @ 2:30-3:45am PST Preprints: @tahirjmakhdoomi @sumanthd17 @themlstud @kaushal_py @gowtham_ramesh1 @anoopk Pratyush @MiteshKhapra
1
3
6
@ai4bharat
AI4Bharat
3 years
IndicWav2Vec - Curated 17k hours of raw speech data for 40 Indian languages - SOTA ASR models for 9 languages on 3 public datasets paper - data - code - @tahirjmakhdoomi @_themlstudio @kaushal_py [3/n]
2
1
5
@ai4bharat
AI4Bharat
9 months
AI4BHARAT has had an excellent track record of AI residents conducting cutting edge research and publishing it in top tier venues like ACL, Interspeech, EMNLP, TMLR, etc, under the leadership of Prof Mitesh Khapra, Dr Anoop Kunchukuttan and Dr Pratyush Kumar (now at SarvamAI).
1
0
6
@ai4bharat
AI4Bharat
1 year
4. NICT-AI4B's Submission to the Indic MT Shared Task in WMT 2023 Authors: @prajdabre1, @jaygala24, @pranjalchitale (WMT shared task for Indic languages) Congratulations to all the authors! Camera-ready versions will be out soon.
1
3
6
@ai4bharat
AI4Bharat
1 year
2. DecoMT: Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models Authors: @ratishsp, @anoopk, @prajdabre1, Ay Ti, Nancy Chen (main)
2
1
6
@ai4bharat
AI4Bharat
7 months
🌍 Empowering Language Communities: By releasing our open-license datasets and tools, we aim to empower open research. We hope this effort acts as a blueprint for creating quality resources in other language communities thus democratizing AI. (6/n) #IndicLLMSuite #AI4Bharat #LLM
0
2
8
@ai4bharat
AI4Bharat
3 years
IndicBART - Multilingual pre-trained seq2seq model for 11 Indian Languages and English - 1/3rd the size of mBART, but better/competitive on NMT and extreme summarisation paper - code - @prajdabre1 @anoopk @ratishsp [7/n]
1
1
5
@ai4bharat
AI4Bharat
7 months
🗣️ For model alignment, we release IndicAlign-Instruct, a collection of 74.7 million prompt-response pairs in 14 Indic languages, created by repurposing and translating existing datasets as well as creating new ones. (4/n)
1
0
5
@ai4bharat
AI4Bharat
7 months
🚫 We also create IndicAlign-Toxic: 123K pairs of toxic prompts and safe responses, built by repurposing existing datasets and by a novel method of taxonomized synthesis that generates toxic prompts with an unaligned model and non-toxic responses with an aligned model. (5/n)
1
2
6
@ai4bharat
AI4Bharat
1 year
5. Towards Better Evaluation of Instruction-Following: A Case-Study in Summarization (CoNLL 2023) Authors: Ondrej Skopek, Rahul Aralikatte, Sian Gooding, Victor Carbune.
0
0
5
@ai4bharat
AI4Bharat
2 years
All the notebooks, slides & posters from our workshops are now available on our website. Please reach out to us if you have any queries wrt any of the models and/or potential use cases for our model.
1
1
4
@ai4bharat
AI4Bharat
9 months
Chitralekha is an open-source, AI-powered video transcreation platform. It has an integrated workforce management system, which enables transcreation of a video from one language to another. Please enjoy our blog, explore Chitralekha and feel free to contribute.
0
1
5
@ai4bharat
AI4Bharat
5 years
Calling out to @amitabhk87 to join - a community to innovate on AI solutions for the nation. Mitesh and Pratyush from IIT Madras will host a kick-off today 10th July at 7pm IST here: #AI #India
0
0
4
@ai4bharat
AI4Bharat
5 years
Happy to be featured on UmmId #AI #India #ai4bharat
0
0
4
@ai4bharat
AI4Bharat
5 years
Happy to be featured on dtnext #AI #India #ai4bharat
0
0
4
@ai4bharat
AI4Bharat
2 years
You can contact the following people on questions wrt particular projects IndicTrans - @sumanthd17 @gowtham_ramesh1 IndicWav2Vec - @tahirjmakhdoomi @kaushal_py @abhigyan_r IndicNLG - @prajdabre1 @anoopk @ratishsp @RudraMurthyV IndicBERT - @sumanthd17 IndicXlit - Yash Madhani
0
1
4
@ai4bharat
AI4Bharat
2 years
As a transcriptionist under the aegis of IIT Madras, you will get to collaborate with language experts at a pan-India level. You will be presented with many opportunities to improve your know-how in AI speech processing.
1
3
3
@ai4bharat
AI4Bharat
2 years
Please click here to apply: @MiteshKhapra @pratykumar @anoopk
0
0
4
@ai4bharat
AI4Bharat
3 years
Slides for the talk - Sign up here for a reminder -
@ai4bharat
AI4Bharat
3 years
AI4Bharat is excited to announce a talk on "RNN-T ASR Systems and Enabling Contextualization For RNN-T ASR Systems" by @mahajain3 this Saturday, March 19th, 9:00 to 10:45 AM. [1/n]
1
3
10
0
1
3
@ai4bharat
AI4Bharat
3 years
SuperShaper - We propose SuperShaper, a task-agnostic pre-training approach which simultaneously pre-trains a large number of Transformer networks by varying their shape (the hidden dimensions across layers) paper - @VinodG93 @gowtham_ramesh1 [n/n]
1
0
3
@ai4bharat
AI4Bharat
10 months
📚 Stay tuned for captivating insights! 🧐 Unveiling blogs on past and present research at AI4Bharat in the coming weeks. 🚀🔍 #ResearchInnovation
1
0
3
@ai4bharat
AI4Bharat
10 months
👉 Dive into the details of IndicTrans2-M2M – Indic to Indic translation covering 22 languages! 🌍 Discover how we achieved 5x compactness and 2x faster models, now on HuggingFace! 🚀💡 #TechInnovation #IndicTrans2M2M
1
0
3
@ai4bharat
AI4Bharat
3 years
Thanks to all the student researchers, mentors and professors who made this possible: @MiteshKhapra, @anoopk, Pratyush Kumar
0
1
3
@ai4bharat
AI4Bharat
3 years
EvalEval - Proposed Perturbation Checklists for the design and evaluation of Automatic NLG metrics - Showed that existing NLG metrics are not robust to perturbations and disagree with the human scores paper - code - @AnanyaSaiB [4/n]
1
0
3
@ai4bharat
AI4Bharat
3 years
Multilingual Language Models (MLLMs) Survey - Surveyed literature about MLLMs focussing on: (i) Models & objective functions (ii) Tradeoff between a monolingual & multilingual LM (iii) Zero-Shot transfer paper - @gowtham_ramesh1 @sumanthd17 [6/n]
1
0
3
@ai4bharat
AI4Bharat
5 years
Hi everyone, we're looking for talented full-stack web developers to join us in a full-time position at our office in the IIT-Madras Research Park. You can apply through this link
1
0
2
@ai4bharat
AI4Bharat
5 years
Calling out to @mhrd_innovation to join - a community to innovate on AI solutions for the nation. Mitesh and Pratyush from IIT Madras will host a kick-off today 10th July at 7pm IST here: #AI #India
0
0
2
@ai4bharat
AI4Bharat
1 year
@ratishsp @anoopk @prajdabre1 Correction: Ai Ti (apologies for the spelling mistake)
0
0
2
@ai4bharat
AI4Bharat
2 years
These full-time code contributor positions are for 2-3 months and come with a stipend of Rs 25,000 per month.
0
0
2
@ai4bharat
AI4Bharat
5 years
Calling out to @PMOIndia to join - a community to innovate on AI solutions for the nation. Mitesh and Pratyush from IIT Madras will host a kick-off today 10th July at 7pm IST here: #AI #India
0
0
2
@ai4bharat
AI4Bharat
3 years
Join us for Part 2 of the talk, which will be held on Saturday, March 26th, 9:00 to 10:10 AM. In this talk, @mahajain3 will be discussing beam-search decoding and contextualisation for RNN-T. Join us at: Sign up for the talk here:
@ai4bharat
AI4Bharat
3 years
AI4Bharat is excited to announce a talk on "RNN-T ASR Systems and Enabling Contextualization For RNN-T ASR Systems" by @mahajain3 this Saturday, March 19th, 9:00 to 10:45 AM. [1/n]
1
3
10
1
1
2
@ai4bharat
AI4Bharat
5 years
Calling out to @isro to join - a community to innovate on AI solutions for the nation. Mitesh and Pratyush from IIT Madras will host a kick-off today 10th July at 7pm IST here: #AI #India
0
0
2
@ai4bharat
AI4Bharat
3 years
Mahaveer Jain "RNN-T ASR Systems and Enabling Contextualization For RNN-T ASR Systems" Saturday, March 19 · 9:00 – 10:45am Google Meet joining info Video call link: [2/n]
1
0
2
@ai4bharat
AI4Bharat
2 years
@shaily99 @prajdabre1 Hi, Someone asked for me? :)
2
0
2
@ai4bharat
AI4Bharat
3 years
Speaker Bio: Mahaveer Jain is a Software Engineer at Facebook. Previously, he was a graduate research assistant at LTI at CMU, where he finished his Master's in Language Technologies. Mahaveer has worked extensively on building production-ready RNN-T ASR systems at Facebook. [5/n]
1
0
2
@ai4bharat
AI4Bharat
5 years
Calling out to AI engineers, domain experts, govt officials to join - a community to innovate on AI solutions for the nation. Mitesh and Pratyush from IIT Madras will host a kick-off today 10th July at 7pm IST here: #AI #India
0
0
2
@ai4bharat
AI4Bharat
5 years
Calling out to @nasscom to join - a community to innovate on AI solutions for the nation. Mitesh and Pratyush from IIT Madras will host a kick-off today 10th July at 7pm IST here: #AI #India
0
0
2
@ai4bharat
AI4Bharat
5 years
Calling out to @FollowCII to join - a community to innovate on AI solutions for the nation. Mitesh and Pratyush from IIT Madras will host a kick-off today 10th July at 7pm IST here: #AI #India
0
0
2
@ai4bharat
AI4Bharat
5 years
Calling out to @NITIAayog to join - a community to innovate on AI solutions for the nation. Mitesh and Pratyush from IIT Madras will host a kick-off today 10th July at 7pm IST here: #AI #India
0
0
2
@ai4bharat
AI4Bharat
3 years
Further, Mahaveer will discuss methods to enable contextualization for RNN-T ASR Systems. Contextualization allows us to use utterance specific context for ASR systems. [4/n]
1
0
2
@ai4bharat
AI4Bharat
1 year
@ShashiTheNxt Hi! Please send in your questions to: enquiry@ai4bharat.org
1
0
2