Srishti Gureja Profile
Srishti Gureja

@sGx_tweets

1,963
Followers
260
Following
68
Media
1,183
Statuses

ML Researcher // models don't write my code // I like MoEs, ML Safety and low level stuff // PyTorch Contributor Awardee '23

PyTorch forums
Joined May 2021
Pinned Tweet
@sGx_tweets
Srishti Gureja
2 years
First paper accepted at a NeurIPS workshop! TL4NLP it is :) All because of the folks around me who support & believe in me. Grateful!
Tweet media one
17
0
93
@sGx_tweets
Srishti Gureja
2 years
"XGBoost is all you need?" Ok, but let me caution you for cases when your data has categorical variables & you are using any tree-based or boosted tree method like RandomForest, XGB etc. One-hot encoding could ruin things when the categorical variable has many levels 1/n
22
89
560
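The sparsity this thread warns about is easy to see in a quick sketch (hypothetical `city` column, made-up data, assuming pandas is available):

```python
import pandas as pd

# Hypothetical example: a categorical column with 1000 distinct levels.
df = pd.DataFrame({"city": [f"city_{i}" for i in range(1000)],
                   "y": range(1000)})

# One-hot encoding creates one (almost entirely zero) column per level.
encoded = pd.get_dummies(df["city"])
print(encoded.shape)          # (1000, 1000): as many new columns as levels
print(encoded.values.mean())  # each row holds a single 1 -> 0.1% density
```

Each level becomes its own near-constant column, which is exactly the setup the later tweets in this thread argue trees split on poorly.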
@sGx_tweets
Srishti Gureja
3 years
Know your ML evaluation metrics. Got highly imbalanced data? You'd probably want to 're'consider ROC. While AUC ROC is a very popular metric because of its characteristics, like being insensitive to class distribution, it isn't a good choice 1/
5
61
310
@sGx_tweets
Srishti Gureja
2 years
How cool is it to try your ML models live rather than just training them in Jupyter notebooks? Very cool! And even cooler when it's done as easily as it gets with @Gradio and @huggingface 🤗space! I'm ready to try my bean plant classifier! 1/
Tweet media one
12
38
303
@sGx_tweets
Srishti Gureja
3 years
Got reached out to for a data scientist role by a company based in London using AI in the field of medicine. They liked my LinkedIn activity & see me as a good fit according to what my LI makes me seem interested in - I do not even post much as I've always been interested more in the+
16
24
280
@sGx_tweets
Srishti Gureja
3 years
As a Data Scientist, I'll be smart if I quickly understand the business problem, frame it, understand how and what data is needed (or already used). In some cases knowing how to write integration pipelines also helps. Ofc, modelling is equally important. But obviously no one +
11
26
271
@sGx_tweets
Srishti Gureja
1 year
Thanks a lot @PyTorch @linuxfoundation ! 🔥❤️ Great to have received this. Looking forward to contributing much more substantially down the line.
Tweet media one
19
4
252
@sGx_tweets
Srishti Gureja
2 years
Data loading shouldn't be a bottleneck in the model training pipeline. With my new blog, learn how @PyTorch ensures this. I also explore & implement the latest DataPipes from TorchData.🌟Pretty cool. Wrote this one on @weights_biases posts section.
8
27
185
@sGx_tweets
Srishti Gureja
2 years
Are you setting off learning @PyTorch ? Follow along with my blog for a line-by-line explanation as you create your first neural net classifier. I wrote it on @weights_biases posts section. Here's the link -
4
24
141
@sGx_tweets
Srishti Gureja
2 years
Ever used regularisation (L1, L2) and wondered why it's advised to standardise the features (x1, x2...xn) before doing so? In L1 and L2 regularisation, we aim to shrink the magnitudes of coefficient estimates. 1/n
4
19
95
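Why standardisation matters here can be shown with a toy regression (hypothetical features on wildly different scales, assuming scikit-learn):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Two features on very different scales; both matter equally for y.
X = np.c_[rng.normal(0, 1, 200), rng.normal(0, 1000, 200)]
y = X[:, 0] + 0.001 * X[:, 1] + rng.normal(0, 0.1, 200)

# Without scaling, the penalty would hit the small-scale feature's
# (numerically large) coefficient far harder than the other's.
X_std = StandardScaler().fit_transform(X)
model = Ridge(alpha=1.0).fit(X_std, y)
print(model.coef_)  # comparable magnitudes once features are standardized
```

After standardisation both coefficients land near 1, so the penalty shrinks them evenhandedly instead of punishing whichever feature happened to be measured in small units.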
@sGx_tweets
Srishti Gureja
2 years
Looking to connect with people in the ML space that actively contribute to open source. I know how welcoming & helpful open source folks are. Help me get started? :) I'd just want to chat a bit, won't take much of your time.
9
8
89
@sGx_tweets
Srishti Gureja
2 years
Outliers in your data? Among the many ways to deal w it, if one is going for regularisation, then - L1 (Lasso) is more robust to outliers while L2 (Ridge) isn't really. 1/n
2
10
82
@sGx_tweets
Srishti Gureja
3 years
Why do you choose '5' fold cross validation? Or why not any other k than the one you go for in k fold cross validation? Well, I shouldn't be asking here. Should ask the good old Bias Variance tradeoff instead. Here's why - 1/n
3
9
74
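The bias side of that tradeoff is visible just from the fold sizes (hypothetical 100-sample dataset, assuming scikit-learn):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(100)
train_sizes = {}
for k in (2, 5, 10):
    # Size of the training split in the first fold for each k.
    tr, _ = next(iter(KFold(n_splits=k).split(X)))
    train_sizes[k] = len(tr)

# Larger k -> each model trains on more data (lower bias), but the
# training folds overlap more, so the error estimates are more
# correlated (higher variance) - and there are more models to fit.
print(train_sizes)  # {2: 50, 5: 80, 10: 90}
```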
@sGx_tweets
Srishti Gureja
2 years
But Hash encoding & Dracula are two encoding schemes recommended for categorical variables with many levels. Of course, these do not come without their cons. 6/6
7
8
66
@sGx_tweets
Srishti Gureja
2 years
Quote of the day - "Logistic Regression IS a regression algorithm." Anyone who tells you otherwise clearly doesn't know what they are talking about.
16
5
62
@sGx_tweets
Srishti Gureja
1 year
working through RoPE's math & realising it's nothing but revising my undergrad lin algebra (intuition & concept) was fun. btw, I'll be presenting today on extending the context of models using RoPE - 2230 IST @forai_ml . Paper:
2
8
61
@sGx_tweets
Srishti Gureja
3 years
theoretical (mathy, nitty gritty) side of things & that acc to me wouldn't really help me scale as a creator or maybe that isn't what people would be interested in reading. Still, learning in public is Mighty! think I should follow strategic posting now :)
3
1
45
@sGx_tweets
Srishti Gureja
2 years
Two types of classification tasks and how to implement each in @PyTorch : I've come across this confusion a bunch of times now about choosing the right loss function for a Classification Task using Deep Learning in @PyTorch . Here's a simple explanation: 1/3
5
4
44
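A minimal sketch of the two cases the thread distinguishes (assuming multi-class targets given as class indices and binary targets as floats, with random data for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Case 1: multi-class classification -> CrossEntropyLoss.
logits_multi = torch.randn(8, 5)           # 8 samples, 5 classes, raw logits
targets_multi = torch.randint(0, 5, (8,))  # class indices, NOT one-hot
loss_multi = nn.CrossEntropyLoss()(logits_multi, targets_multi)

# Case 2: binary classification -> BCEWithLogitsLoss.
logits_bin = torch.randn(8, 1)             # raw scores, no sigmoid applied
targets_bin = torch.randint(0, 2, (8, 1)).float()
loss_bin = nn.BCEWithLogitsLoss()(logits_bin, targets_bin)

print(loss_multi.item(), loss_bin.item())
```

Both losses expect raw logits: `CrossEntropyLoss` applies log-softmax internally, and `BCEWithLogitsLoss` folds in the sigmoid, so adding your own softmax/sigmoid beforehand is a classic mistake.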
@sGx_tweets
Srishti Gureja
3 years
They, too were data scientists!
Tweet media one
4
2
36
@sGx_tweets
Srishti Gureja
3 years
So, I saw a post laying out a 'set of golden rules' for dealing with missing data. Wrong on so many levels. No set of rules exists unless one considers the following-- Why is the data missing? Is it even worth imputing? A thread👇
1
12
37
@sGx_tweets
Srishti Gureja
2 years
Learning exporting @PyTorch models to ONNX today. Interesting! Thread soon 👋
6
5
39
@sGx_tweets
Srishti Gureja
2 years
PyTorch 2.0!!! Fun weekend ahead
2
0
39
@sGx_tweets
Srishti Gureja
2 years
Learn to create lists in @PyTorch the correct way! The wrong way To create a @PyTorch NN with a variable no. of layers, a plain python list might be a common choice to store the network layers (nn.Module) by appending. This becomes a source of error. See in code? 👇 1/3
1
4
40
@sGx_tweets
Srishti Gureja
2 years
In the data space, I've learnt more by writing than by reading. A few weeks back, when I started learning pytorch, I wrote a blog to explain every detail of a code of mine that constructed the most basic NN using pytorch. To cater to my reader well, I made sure every little detail +
2
1
38
@sGx_tweets
Srishti Gureja
3 years
Basics go a loooong way! Nothing new but I've come to realise this again, as I've been interviewing for Data Scientist roles lately.
3
0
33
@sGx_tweets
Srishti Gureja
2 years
Creating Custom Models in @PyTorch ? Make sure you aren't making this 👇 error. Let's learn in 5 steps. Step 1: Firstly, to "correctly" create optimizable parameters in PyTorch without running into gradient errors, we need to ensure parameters are leaf tensors. 1/5
1
3
33
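A sketch of the leaf-tensor point (hypothetical toy module; `nn.Parameter` is the usual way to get it right):

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.Parameter registers a *leaf* tensor with requires_grad=True.
        # By contrast, something like torch.randn(3, requires_grad=True).to(device)
        # yields a non-leaf tensor whose .grad never gets populated.
        self.w = nn.Parameter(torch.randn(3))

    def forward(self, x):
        return (x * self.w).sum()

m = MyModel()
out = m(torch.ones(3))
out.backward()
print(m.w.is_leaf, m.w.grad)  # True, and the gradient is populated
```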
@sGx_tweets
Srishti Gureja
2 years
Ever wanted to set different learning rates for different layers/parameters while training your neural networks in @PyTorch ? Let's learn how to do that with @PyTorch in 2 steps 👇 1. We will create the simplest neural network with 2 layers: 1/2
Tweet media one
2
4
33
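The two steps above come down to passing per-layer parameter groups to the optimizer (hypothetical 2-layer net and learning rates):

```python
import torch
import torch.nn as nn

# Step 1: the simplest neural network with 2 layers.
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Step 2: one parameter group per layer, each with its own lr.
optimizer = torch.optim.SGD([
    {"params": net[0].parameters(), "lr": 1e-2},
    {"params": net[2].parameters(), "lr": 1e-3},
])
print([g["lr"] for g in optimizer.param_groups])  # [0.01, 0.001]
```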
@sGx_tweets
Srishti Gureja
3 years
@Abh1navv What are my k nearest neighbours having?
2
0
28
@sGx_tweets
Srishti Gureja
3 years
at work will give you nicely prepared data, ready for fancy models to be applied to it.
3
0
31
@sGx_tweets
Srishti Gureja
2 years
Do not one-hot encode your categorical variable while using XGB if it has got many levels. One-hot encoding works ok & might even give a performance boost if the no. of levels is small. Curious why? 2/n
4
2
28
@sGx_tweets
Srishti Gureja
2 years
My @weights_biases blogathon submission explains CPCA - a simple & useful dimensionality reduction algorithm where you work with not 1, but 2 datasets to explore patterns in the target data. Also, +
1
2
27
@sGx_tweets
Srishti Gureja
2 years
this badge on @PyTorch forums is tempting to me, hoping to do more in this welcoming community! PS - almost accidentally misspelled badge as batch :)
Tweet media one
3
1
27
@sGx_tweets
Srishti Gureja
3 years
sometimes I really feel I should do an MS in 'Applied' math and stats. crazy how class 10th's moving average can be used for analysing time series data. it helps smooth the series, assess trends by removing seasonality and even forecast the future. no rocket science, just +
3
1
26
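That class-10th moving average is one line with pandas (made-up series for illustration):

```python
import pandas as pd

# Hypothetical noisy series; a 3-point rolling mean smooths it.
s = pd.Series([10, 12, 9, 14, 11, 15, 10, 16], dtype=float)
smoothed = s.rolling(window=3).mean()
print(smoothed.tolist())  # first two entries are NaN, then 3-point averages
```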
@sGx_tweets
Srishti Gureja
2 years
Didn't know PyTorch forums could be installed as a mobile application. More scrolls now :D
1
2
26
@sGx_tweets
Srishti Gureja
2 years
Working with Neural Networks? Using a CV architecture to predict whether a brain's MR scan classifies as cancerous, non-cancerous or any other such class? Rather than a single prediction from the neural net, wouldn't it be better if we could generate confidence intervals? 1/n
1
3
23
@sGx_tweets
Srishti Gureja
3 years
Data Science != Machine learning/Modelling
3
2
21
@sGx_tweets
Srishti Gureja
2 years
There it is! Few seconds and here's the prediction. Was nice to build my own web app as part of Building end to end Vision Applications taught by Dr. Abubakar @abidlabs at CoRise @corise_ !!
Tweet media one
2
2
22
@sGx_tweets
Srishti Gureja
2 years
Pytorch docs are the best resource!
1
0
22
@sGx_tweets
Srishti Gureja
2 years
With a lot of levels comes a lot of sparsity. So while one-hot encoding many levels (equivalent to creating the same no. of variables), only a small fraction of data points shall have the value 1 for a single level (read: variable). Why's this a problem? 3/n
2
1
21
@sGx_tweets
Srishti Gureja
3 years
z-test vs t-test z-test: underlying statistic can be approximated to follow std. normal distribution. t-test: ...to follow t distribution instead. Catch is: t distribution is more accurate in case of small samples. 👇
3
3
20
@sGx_tweets
Srishti Gureja
2 years
Checkpointing models in @PyTorch mid-training? Do not forget to save the optimizer's current state along with the model's current state, like so👇
Tweet media one
2
0
21
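A sketch of the checkpointing pattern (toy model; an in-memory buffer stands in for a real file path):

```python
import io
import torch
import torch.nn as nn

model = nn.Linear(2, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Save BOTH states: Adam keeps running moment estimates that training
# cannot resume correctly without.
checkpoint = {
    "model_state": model.state_dict(),
    "optim_state": optimizer.state_dict(),
    "epoch": 7,
}
buffer = io.BytesIO()  # stand-in for a checkpoint file
torch.save(checkpoint, buffer)

buffer.seek(0)
restored = torch.load(buffer, weights_only=False)  # our own trusted checkpoint
model.load_state_dict(restored["model_state"])
optimizer.load_state_dict(restored["optim_state"])
print(restored["epoch"])  # 7
```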
@sGx_tweets
Srishti Gureja
3 years
What to do? PR Curve is a better choice. Both Precision & Recall deal with the positive (minority) class of interest. So in this example, while Recall values are equal, precision informs how many positives as predicted by the model are true positives. 4/
2
1
19
@sGx_tweets
Srishti Gureja
2 years
A serious error. Feature selection/engineering on tabular data is a crucial step in any machine learning problem. BUT! Hold on and double check you are doing it right. If you are using Cross-validation or validation set holdout approaches for estimating test error.. 1/n
1
1
20
@sGx_tweets
Srishti Gureja
2 years
People talk about fancy Machine Learning models. Today, I'll talk about anything but fancy when it comes to solving problems/answering questions using data. An acquaintance was very keen to find the best outlier detection technique.
2
3
20
@sGx_tweets
Srishti Gureja
2 years
Clarity of concept goes a long way! Recently had an interview where I was asked something about pre-trained BERT models that I'd never read or thought of before. But, since I had the gist of what actually goes on inside the BERT architecture, I was able to answer on point +
4
2
20
@sGx_tweets
Srishti Gureja
3 years
exercising keeps the body fit and, learning computer algos keeps my mind fit. a fun mind exercise to do between my NLP lessons.
2
3
16
@sGx_tweets
Srishti Gureja
3 years
@TivadarDanka Coolest thing on twitter in a while!
0
0
16
@sGx_tweets
Srishti Gureja
2 years
New blog on @PyTorch soon! I'll be talking about how Pytorch handles data effectively and efficiently. Along, I'll also demonstrate the new DataPipe functionality from the TorchData library. Stay tuned 👋🔥
1
0
19
@sGx_tweets
Srishti Gureja
3 years
@marktenenholtz it would be what I failed to do myself: don't learn X first completely & then Y & then Z & so on. Take up a problem and learn on the go. so for eg. one really doesn't need to be vv good at python to learn ML. ofc it's an advantage to be that but def not required (at least to start)
0
0
19
@sGx_tweets
Srishti Gureja
2 years
@marktenenholtz found it! guess I got a good memory so I just remembered this is your post. don't know what people get out of plagiarism but it's so irritating it would've really driven me up the wall had I been in your place.
Tweet media one
4
0
19
@sGx_tweets
Srishti Gureja
2 years
Do you use drop_last = True in your PyTorch DataLoader? I do. Here's what it is- Setting it to True shall drop the last batch in each epoch in case the dataset at hand cannot be evenly divided into batches of equal sizes. 1/n
2
0
18
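The `drop_last` behaviour described above, on a toy dataset of 10 samples with batch size 3:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(10).float())
loader = DataLoader(ds, batch_size=3, drop_last=True)
batch_sizes = [len(b[0]) for b in loader]
print(batch_sizes)  # [3, 3, 3] -- the final, smaller batch of 1 is dropped
```

With `drop_last=False` (the default) the same loader would yield a trailing batch of size 1, which can skew batch statistics like BatchNorm's.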
@sGx_tweets
Srishti Gureja
2 years
I don't know whether one needs to know math for an industry ML role or not But what I know is that engineering skills are sooo needed. maybe more :D What's interesting to me is that I feel this latter skillset is no less required in research as well :) ps: No rigid agenda x
3
0
19
@sGx_tweets
Srishti Gureja
2 years
Trees split on those variables that yield the "purest" nodes. Easy to see why a one-hot encoded variable typically shall not lead to very pure nodes & hence the tree shall not split on it. No matter how important the original categorical variable could be as a feature 4/n
1
1
19
@sGx_tweets
Srishti Gureja
3 years
I need some help with tagging diseases (NER) in biomedical text data. Any help from anyone who's done that is appreciated.
5
1
17
@sGx_tweets
Srishti Gureja
2 years
Naturally, this would also interfere with feature importance generated by RandomForest or any other method, as even if the splits happen on these hot encoded levels, they'll most likely not happen near the root. What to do then? Well, here's needed knowledge from experience 5/n
1
1
19
@sGx_tweets
Srishti Gureja
3 years
it's a real deal, at least for me, to implement research papers. it's a basic one (!= not useful, rather very useful), still taking a lot of effort. Anyone who's in a regular practice of doing this? (will also do a thread once I finish. hoping I finish.)
4
1
17
@sGx_tweets
Srishti Gureja
1 year
@osanseviero for a quick comprehensive overview of pos embeddings: follows RoPE: Detailed blogs by the authors of RoPE are the best resources. For GQA, its paper is short and v easily understandable once you know MQA which again is v simple.
0
2
17
@sGx_tweets
Srishti Gureja
2 years
Full fine-tuning LLMs on downstream tasks comes with a lot of GPU memory usage + storage costs. Let us look into a PEFT technique called Adaptors for efficient transfer learning in LLMs with an example application in Transformers!👇 1/7
2
3
15
@sGx_tweets
Srishti Gureja
2 years
Communication-lack of it could ruin any data project, be it industry or research & the worse, it could cost you loads of time before things get ruined Communication-more than half work is done if this is done properly This isn't preach, it's what I'm experiencing these days :)
1
2
17
@sGx_tweets
Srishti Gureja
2 years
Transposing data in PyTorch? x.T is deprecated in PyTorch's latest release when used with tensors of dimensionality other than 0 or 2. Worth noting why - probably because it doesn't work like how we would want when dealing with batches of data (matrices). 1/2
1
1
17
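For batched matrices, the fix is to transpose only the last two dimensions (toy batch of 4 matrices):

```python
import torch

batch = torch.randn(4, 2, 3)  # batch of 4 matrices, each 2x3

# Transpose each matrix in the batch, leaving the batch dim alone:
t1 = batch.transpose(-2, -1)
t2 = batch.mT                 # shorthand for the same thing
print(t1.shape, t2.shape)     # torch.Size([4, 3, 2]) twice
```

`x.T` on a 3-D tensor would reverse *all* dimensions, which is almost never what you want for a batch of matrices.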
@sGx_tweets
Srishti Gureja
3 years
What type of questions can I expect in a coding round for Data Scientist role? Pandas assignment already done! 🤔 I'm wondering what this round holds. Anyone that's gone through a similar round? Fingers crossed 🤞
5
0
15
@sGx_tweets
Srishti Gureja
2 years
Lately, I've been realising how important a good understanding of pytorch's autograd is for any practitioner. To this end, I'm planning to write a series of blog posts explaining how the autograd engine works with computational graphs, and related concepts. 1/2
3
1
16
@sGx_tweets
Srishti Gureja
3 years
Information lies in variability - Central idea on which dimensionality reduction by PCA is based. But, what if we want to capture variability only due to a specified cause/reason & not care about other sources of variability. For eg. one might want to capture +
1
1
16
@sGx_tweets
Srishti Gureja
2 years
Was reading about Markov Processes - guess they apply to us perfectly. The future state given the present & past depends only on the present no matter what the past was. Beaut!
2
0
16
@sGx_tweets
Srishti Gureja
3 years
my first encounter with word vectors was like - we use some algorithms to convert words into vectors in a way s.t. synonyms have similar vectors. this isn't even the most appropriate definition & of course left me sort of uninterested if not clueless. read on to know the most basic
1
0
17
@sGx_tweets
Srishti Gureja
3 years
The more I study time series, the more interesting it gets. What comes in the way sometimes however, is those math heavy proofs/conditions. (Not the usual ones though, they are smooth to go :)) Skipping them for now, let's see them if I go for a PhD lol XD
1
0
16
@sGx_tweets
Srishti Gureja
3 years
this could be very (very, very) misleading (or misinforming). the decision boundary of logistic is linear in *its most basic form*. that s shaped sigmoid is **NOT** the decision boundary. this image makes it look like that sigmoid graph is separating the blue class +
@Sauain
Saurav Jain (Open Source + Communities)
3 years
1. Logistic Regression It's a classification model used when the target is categorical. It is a statistical model that in its basic form uses a logistic function to model a binary dependent variable, although many more complex extensions exist.
Tweet media one
1
6
62
3
1
14
@sGx_tweets
Srishti Gureja
3 years
Fine tuning since yesterday. Optuna, you nice! How's hyperopt? Has anybody used it?
5
0
16
@sGx_tweets
Srishti Gureja
2 years
Crucial to constantly revisit the business problem while solving & evaluating the Machine learning problem. Even if the business problem isn't dynamic- REVISE IT! We get tempted to try fancy data science techniques and tools & forget what we are here for - THE BUSINESS PROBLEM!
2
0
15
@sGx_tweets
Srishti Gureja
2 years
"Non linear decision boundaries cannot be solved by logistic regression" - One who understands *just the basics* of what logistic regression is, would know this is TOTALLY wrong. I've now lost count of blogs, threads, posts etc. that state this. 1/2
2
2
16
@sGx_tweets
Srishti Gureja
3 years
train test split in time series cannot be done like how it is done with other types of data, that is, dividing the whole data at random. here the chronology that's inherent in the data needs to be followed in the train & test sets as well. this is needed as most modelling techniques+
1
0
15
@sGx_tweets
Srishti Gureja
3 years
cPCA - Contrastive Principal Components Analysis did impressively... +
Tweet media one
1
0
16
@sGx_tweets
Srishti Gureja
3 years
starting w non stationary time series today. I'm loving studying time series so far. Does it come under machine learning? :) After all, it's modelling here as well. Infact, I feel it's challenging cause we are mainly concerned w extrapolation here, among other things.
1
0
14
@sGx_tweets
Srishti Gureja
2 years
Scratched the surface of some techniques for efficient ML inference (Quantization, Pruning etc.) - Interesting topics! Nice to experiment w these techniques in @PyTorch .
0
0
15
@sGx_tweets
Srishti Gureja
2 years
Tensors' gradients unexpectedly None in @PyTorch ? Let's debug 👇🙌 Follow 4 simple checks and you'll have your answer. 1. tensor.requires_grad == True 2. tensor.is_leaf == True, or tensor.grad_fn is None; if it is not None, use retain_grad() on it. 1/2
3
1
16
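Checks 1 and 2 in a runnable sketch (toy tensors; `retain_grad()` is what rescues the non-leaf case):

```python
import torch

x = torch.randn(3, requires_grad=True)  # leaf tensor: gets .grad for free
y = x * 2                               # non-leaf: it has a grad_fn
y.retain_grad()                         # ask autograd to keep y.grad anyway
y.sum().backward()

print(x.is_leaf, y.is_leaf)  # True False
print(x.grad, y.grad)        # both populated, thanks to retain_grad()
```

Without the `retain_grad()` call, `y.grad` would be `None` after `backward()` even though gradients flowed through it.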
@sGx_tweets
Srishti Gureja
2 years
Came across hierarchical softmax while revisiting the negative sampling algo. NS is essentially a simpler alternative to HS. HS was a technique introduced to mitigate the heavy computational complexity involved in learning word embeddings using algos like word2vec. 1/2
1
2
15
@sGx_tweets
Srishti Gureja
3 years
subprocess in python :'( stuck badly. has anyone worked w it? help me please.
3
4
12
@sGx_tweets
Srishti Gureja
3 years
Seasonality, which is one of the major components of time series data, can occur in two types - Single & Multiple. Single - when there is one dominant seasonal pattern in the data; more likely to be seen in low frequency data like monthly or yearly. for eg. in a monthly data+
2
0
14
@sGx_tweets
Srishti Gureja
2 years
Fine-tuning an LLM taking up too much GPU memory? Heavily Parameterized Large Language Models + Basic Linear Algebra Theorem = Save GPU memory! 💯 Let’s talk about LoRA, a PEFT technique that relies on a simple concept - decomposition of non-full rank matrices. 1/7
Tweet media one
2
0
13
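The decomposition idea can be sketched in a hypothetical minimal LoRA layer (the class name, sizes, and init scale are made up for illustration; the frozen base plus low-rank `B @ A` update is the core concept):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Hypothetical minimal sketch of a LoRA-adapted linear layer."""
    def __init__(self, in_f, out_f, r=4, alpha=8):
        super().__init__()
        self.base = nn.Linear(in_f, out_f)
        for p in self.base.parameters():   # freeze the "pretrained" weights
            p.requires_grad_(False)
        # Low-rank update dW = B @ A; B starts at zero, so the adapted
        # layer initially behaves exactly like the frozen base layer.
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(16, 16, r=4)
x = torch.randn(2, 16)
n_trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(n_trainable)  # 2 * 4 * 16 = 128 trainable params vs the frozen 16x16 base
```

Only the two small rank-`r` matrices train, which is where the GPU-memory and storage savings come from.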
@sGx_tweets
Srishti Gureja
3 years
Curious to know if ML practitioners use LMMs. Linear mixed effect models are an important and very interesting class of models. They let you model correlated data. I used LMMs to model pollutant levels in Beijing's air over years.
1
1
14
@sGx_tweets
Srishti Gureja
2 years
Model inference in @PyTorch : TIL that computational graphs can be used not just for backprop, but for inference as well. We could create and export our model's graph and use it for inference later without the model checkpoint file. 1/2
1
0
13
@sGx_tweets
Srishti Gureja
2 years
The correct way - Use ModuleList ModuleList functions similar to a python list & is meant to store nn.Module objects similar to how a python list is used to store objs like ints, strings etc. The parameters of different layers are registered & accessible using .parameters(). 3/3
Tweet media one
3
0
13
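The ModuleList pattern in a runnable sketch (hypothetical net with a variable number of layers):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, n_layers):
        super().__init__()
        # nn.ModuleList registers each layer's parameters with the module;
        # a plain python list would leave them invisible to .parameters()
        # and therefore to the optimizer.
        self.layers = nn.ModuleList(nn.Linear(8, 8) for _ in range(n_layers))

    def forward(self, x):
        for layer in self.layers:
            x = torch.relu(layer(x))
        return x

net = Net(n_layers=3)
out = net(torch.randn(2, 8))
print(sum(p.numel() for p in net.parameters()))  # 3 * (8*8 + 8) = 216
```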
@sGx_tweets
Srishti Gureja
3 years
Tbh, 'classic' machine learning is the easiest thing I've studied in the field of Statistics so far. Now some may say ML isn't included in Statistics. For me, it is and I'll call it that way only. Why it seemed easiest to me could be a culmination of my interest, hold of basics👇
2
0
11
@sGx_tweets
Srishti Gureja
3 years
Electricity consumption high during the day, less during the night - I observed this in a time series data. (so it's like up and down with day and night) What is this? seasonal not cyclic Another series, I observe is going up then down up down.. so on. This? cyclic not seasonal+
5
0
12
@sGx_tweets
Srishti Gureja
2 years
@svpino Reminds me of cPCA - one of my favourite dimensionality reduction algorithms.
@sGx_tweets
Srishti Gureja
3 years
What PCA - Principal Components Analysis couldn't accomplish...
Tweet media one
2
1
9
0
1
12
@sGx_tweets
Srishti Gureja
2 years
Confusion Matrix over any metric! Any day!
0
0
13
@sGx_tweets
Srishti Gureja
3 years
to compare two (or more) classification models when the data is highly imbalanced. It can be overly optimistic in case of highly imbalanced data. So say, with 100k negative examples & 10 positives - Model A & B both correctly identify 9 out of 10 positives (true positives) 2/
1
0
13
@sGx_tweets
Srishti Gureja
3 years
As a person who writes code, no matter what role, industry, purpose etc., learning time and space complexity is inevitable. And there's no argument to this.
1
0
11
@sGx_tweets
Srishti Gureja
3 years
Another way to detect outliers, this one for multidimensional ones: last few Principal Components. Generally the last few PCs capture very little of the variance present in the original data. So, a plot of the last PC against all data points can be used to find the points against which this PC 👇
3
0
10
@sGx_tweets
Srishti Gureja
3 years
Language Modelling - Recurrent Networks vs Feed Forward NNs. A thread, no math - a gentle explanation. First up, what's Language modelling? It's when you start to write a reply to this thread & your google keyboard recommends your next words. Now how to train a model to do just this?
2
0
13
@sGx_tweets
Srishti Gureja
3 years
getting the machine learn from data's past - ML getting my brain's machine learn from my life's past - *?* while the former is cool and I'm okay at it ig, I hope I get good at the latter :)
1
0
11
@sGx_tweets
Srishti Gureja
3 years
large, very large data could do no good, if the sampling scheme is flawed. (this is me regretting not taking my survey sampling classes seriously :))
1
0
10
@sGx_tweets
Srishti Gureja
3 years
word2vec- slight technicality: do we consider two vectors per word? (initially, during the learning phase) One vector when the word is a central word. Other, when the word acts as context word.
1
0
11
@sGx_tweets
Srishti Gureja
3 years
@pradologue Hometown
Tweet media one
Tweet media two
Tweet media three
2
0
9
@sGx_tweets
Srishti Gureja
2 years
@PyTorch 's Sequential vs ModuleList; & also their combination! 3 simple steps! nn.Module's stored in Sequential are connected in a "cascaded" way - the output of the 1st Module in Sequential becomes the input to the 2nd Module -- need to take care of dimensions. In code 👇 1/3
1
1
12
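The "cascaded" behaviour of Sequential in a quick sketch (hypothetical 8 -> 4 -> 1 net; adjacent dimensions must line up):

```python
import torch
import torch.nn as nn

# Each module's output feeds the next, so Linear(8, 4) must be
# followed by something accepting 4 features, here Linear(4, 1).
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 1))
out = model(torch.randn(5, 8))
print(out.shape)  # torch.Size([5, 1])
```

A `ModuleList`, by contrast, imposes no connectivity at all: you decide in `forward` how (and whether) its modules chain together.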
@sGx_tweets
Srishti Gureja
3 years
one question for data science people: how would you answer if an application asks you about your 'programming experience'? projects that demonstrate the same? Doesn't this sound more on the engineering side? And if you were to put ML(modelling)projects here, what would those be?
5
0
11
@sGx_tweets
Srishti Gureja
2 years
@PyTorch Tip! If the dataset isn't too big and you decide to save it in GPU's memory, and use the DataLoader to load mini batches.. Do not forget to specify the `generator` as a parameter to the DataLoader as shown. 👇 1/2
Tweet media one
2
0
12
@sGx_tweets
Srishti Gureja
3 years
In Machine learning/Statistics, Extrapolation is tougher than Interpolation. Is that what the quote means?
Tweet media one
2
0
11
@sGx_tweets
Srishti Gureja
2 years
Let's try with this one - (although it isn't the kind of leaf the model was trained on) 2/
Tweet media one
1
0
12