A 2-minute demo showcasing how Neptune supports teams that train foundation models.
Haven't heard about Neptune before?
TL;DR: It's an experiment tracker built to support teams that train large-scale models.
Neptune allows you to:
→ Monitor and
We have a treat for all @PyTorch users out there!
"8 Creators and Core Contributors Talk About Their Model Training Libraries From PyTorch Ecosystem"
First-hand info about:
- Philosophy
- Built-in features
- Extension capabilities
- and way more
Just updated the @fastdotai integration with @NeptuneML so that you can have your experiments tracked, hosted, and ready to share with others.
Check out the blog post about it
Want to understand, with code, how to build #BERT? Check this article!
@nielspace07 uses @pytorch and breaks the process into 4 sections:
- Preprocessing
- Building model
- Loss and Optimization
- Training
Good news! We integrated Neptune with another awesome library - Keras Tuner.
You can now:
- see charts of logged metrics
- see the parameters tried
- log hyperparameter search space
And more!
Docs 👉
Thanks @fchollet and the team for this great library 🙏
#ToolAlert
dabl: an awesome library by @amuellerml that reduces boilerplate when creating baseline ML solutions.
With dabl.clean, dabl.plot, dabl.AnyClassifier, and dabl.explain, you can do pretty much everything you need with a one-liner.
Check it out:
If you're training deep learning models, @fastdotai should definitely be your tool.
And if you want to additionally monitor your training (believe us, you should want that), try Neptune-fastai integration. It's just one additional callback.
Docs:
The secret is the interoperability of the components.
That is the difference between success and frustration when building an #MLOps platform.
In the classic debate between building in-house vs buying best-of-breed, the answer (when you cut the fluff out) is almost always: both.
#MLOps stack at companies doing #ML at a reasonable scale.
How do they choose their tools?
Veeeeery pragmatically.
The first thing is to understand what you actually need: which part of the stack you need to do well, and which not so much.
Want to present your lib for #AI research at the @iclr_conf? DM @kamil_k7k, co-host of the social event focused on tools and practices in #DeepLearning, and join the ideas exchange forum!
Here's another amazing article by @cathalhoran on our blog.
This time he wrote about:
- how Transformer models learn context
- how they are similar and different
- whether Masked Language Models perform better
- BERT and its masking process implementation
👇
We figured that sometimes the lack of @huggingface integration was a bit of a blocker to using @neptune_ai.
Well, it wasn’t difficult to figure out - a few people just told us that.
Our #BERT section on the blog is growing! We already published some theoretical articles, now it's time for a hands-on tutorial.
@nielspace07 shows how to code BERT using @PyTorch.
“Inevitably, the resident CTO arrives at a fruit bowl of cherries picked from open-source projects, vendor-supplied proprietary tools, and services from a cloud provider.” - @ciphr
Task ML engineer:
- decent understanding of infrastructure & a really good one of ML
- responsible for sustaining a specific ML pipeline
- concerned with specific models for business-critical tasks
- paged when top-line metrics are falling
- tasked with “fixing” something model-related
In this post, we explore how to build #ML apps with @streamlit, and give you a few examples.
It doesn’t take long to get started with Streamlit: you don’t need any front-end web development experience, and you script everything in #Python. So give it a go!
If you were to build a new #MLPlatform now, is there anything that you would do differently?
Question from the AMA with the Netflix ML infra team.
Answer?
“...We would also try and integrate/build experiment tracking as a first class concern upfront.”
Learn how to optimize the hyperparameters of LightGBM from MSFTResearch with Scikit-Optimize.
Kudos to @betatim @mechcoder @iaroslav_ai and others for this beautiful library!
#MachineLearning #DataScience
3/ “I’m a big fan of buying what you can buy and building what you really really need to build. That makes you different. That is essential to your business.”
The solution can be a mix of:
- tools built in-house
- open-source
- third-party SaaS or on-prem tools
So depending on their use case, they may have something as basic as bash scripts for most of their ML operations & get something more advanced for the one area where they need it.
Examples 👇
Scikit-Optimize is a great library for hyperparameter optimization if used correctly.
Read this library evaluation blog post on TDataScience.
#MachineLearning
#DataScience
Things we like about Hyperopt:
- Simple API
- Nested search space
- Distributed computation
Things we don't like about Hyperopt:
- documentation
- documentation
- visualization
Read this blog post for more
#DataScience
#MachineLearning
Next Tuesday on MLOps Live, we’re again doubling the number of our guests. Silas Bempong and @abhijitramesh2k will join us to answer your questions about doing MLOps for clinical research studies.
Awesome article by Johannes Schmidt about training your own object detector. 🙌
For training and experiment management he used @PyTorchLightnin and @neptune_ai.
Orchestration tools make the ML process easier and more efficient, and help data scientists and ML teams focus on what’s necessary, rather than wasting resources trying to identify priority issues.
We review 13 of them (@flyteorg, MLRun, @PrefectIO, and more).
We prepared another article with tips and tricks from #Kaggle competitions! This time we focused on #TabularData binary classification.
Make sure to check it out if you want to master the competition!
But when you combine tools from the MLOps ecosystem (open-source or vendor-supplied), you have new problems.
You need to design for interoperability of modules that solve particular problems.
Some time ago, inspired by the blog post by @Polly_zk and his package TeleGrad, we decided to write a @telegram bot for Neptune! You can use it to access experiment information.
Check the docs here:
#ToolAlert
PySyft - a framework for Differential Privacy and Federated Learning from @OpenMinedOrg.
Watch or listen to @iamtrask speak about it on the twimlai podcast.
We heard it a few times from potential Neptune users:
“Overall, the product looks good and almost provides all we need - but unfortunately, it seems it doesn't support FBprophet, which is one of our used models.”
Finally, we can say it’s not the case anymore!
Check out our new article, where we compile tips and tricks from solutions to some of Kaggle’s top NLP competitions.
We discuss dealing with larger datasets, small datasets and external data, text representations, modeling, evaluation, cross-validation and more!
“Your app is great, checks most of the boxes for us, supports a ton of metadata types, solid API, but @huggingface is fundamental to our workflow. So, we really need this integration to be there.”
Top 2 books for learning #MLOps by @jacopotagliabue (father of reasonable scale MLOps):
1/ Effective Data Science Infrastructure by @vtuulos
2/ Designing Machine Learning Systems by @chipro
What are your favorites?
Once an #MLteam gets to a certain number of experiments, it can be difficult to collaborate.
Here’s how @hypefactors managed to solve this problem.
Old vs new way of collaborating on #ModelDevelopment ↓
Scikit-Optimize is being actively maintained again!
A new release is out and docs rebrand looks so cool :)
Want to learn about Scikit-Optimize?
Read our article that evaluates this lib based on API, speed, documentation, visualizations and more.
What are the bottlenecks teams often encounter when setting up #MLOps & data engineering for neural search?
Full MLOps Live episode with @jakubzavrel and Fernando Barrera:
Youtube:
Spotify:
Apple Podcasts:
Today #DataScientists and developers run multiple parallel experiments that can get overwhelming even for large teams.
How to compare them effectively?
Prepared by @sam_techwriter 👏
Are there any #football fans here? If yes, here's an article for you.
@Elishatofunmi shows how to build an automated #MachineLearning model that is able to track team players on the pitch, and help predict their moves.
Have you used Hyperopt to optimize parameters of your #MachineLearning models?
Read our opinionated blog post about the pros and cons of using it
Do you agree with those points?
#DataScience
📣 New @neptune_ai feature alert: SERVICE ACCOUNTS
We’ve got quite a few requests from #ML teams that:
- integrate Neptune with their #cicdpipeline
- want to avoid using personal tokens for automated processes
So we introduced service accounts that should check these boxes.
We’ve just launched our interview series from the @aiDotEngineer!
Today’s spotlight: @itsSandraKublik, Developer Relations at @cohere, talks about the evolution of #RAG-based models and the biggest challenges in RAG-based systems (and advanced RAG) that will remain.
Next Tuesday = Next MLOps Live Q&A session
This time the episode will be focused on MLOps at a small scale. Our guest
@duarteocarmo
will answer your questions around how early-stage startups and small teams tackle MLOps.
#ProTip
Before you start doing any #MachineLearning, set up a solid validation schema.
Without reliable validation, you will never know if you are making any progress.
Read this great example of setting up validation for a tricky problem.
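The linked example is problem-specific, but the core idea behind a reliable validation schema can be sketched in plain Python (a hypothetical helper of ours, not code from the article): fix the shuffle once, then score every model change on the exact same folds so results stay comparable.

```python
import random

def kfold_indices(n_samples, k, seed=42):
    """Yield (train, valid) index lists for k-fold cross-validation.

    The shuffle is seeded once, so every experiment is evaluated
    on identical folds and progress can be measured reliably."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        valid = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, valid
```

For tricky problems (grouped users, time series), the same principle holds, but the split has to respect the structure, e.g. group- or time-based folds instead of a plain shuffle.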
“1st thing you need to know is that you have a problem that can be solved using ML techniques & it’s worth solving yourself.
I’m a big fan of buying what you can buy & building what you really really need to build. That makes you different. That is essential to your business.”
What are your expectations for the #ExperimentTracking tool?
4 criteria by @deepsense_ai:
1/ Easy onboarding
2/ Simplicity of the tool
3/ Convenient API
4/ Fast and accurate support
⚡️ @PyTorchLightnin is a great research framework that helps you organize your DL code and outsource development boilerplate.
We like it a lot, so we had to integrate it with our tool. Anything that you can log to your Lightning module, you can have visualized in Neptune.
🥳Looking for a tool to store all your model metadata, including visualizations? Good news!
We updated Neptune's integrations with viz libraries, to be in line with the new Neptune API!
Check the docs 👉
Altair (@vega_vis), @bokeh, @plotlygraphs, @matplotlib
With #MLOps still being a nascent field, it’s hard to find established best practices and #ModelDeployment examples to operationalize ML solutions. What are the most common challenges faced by ML engineers and their teams?
Article by @nerdCyberArtist:
Do you know when to use ROC AUC and when to go with Precision-Recall curve AUC for your classification models?
Read our blog post:
#MachineLearning
#DataScience
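As a quick refresher (a toy illustration of ours, not the blog post's code): ROC AUC equals the probability that a randomly chosen positive is scored above a randomly chosen negative. Because it averages over all positive/negative pairs, it can look optimistic on heavily imbalanced data, which is where the PR curve's focus on the positive class pays off.

```python
def roc_auc(y_true, y_score):
    """ROC AUC via pair counting: the fraction of (positive, negative)
    pairs where the positive gets the higher score (ties count as 0.5)."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```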
Do you know what the beta in the F-beta metric stands for, and how you can use it to put more focus on recall or precision?
Read our blog post:
#MachineLearning
#DataScience
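For reference (our own minimal sketch, not the post's code): beta weights recall relative to precision in the harmonic mean, so F2 favors recall-heavy classifiers and F0.5 favors precision-heavy ones.

```python
def fbeta(precision, recall, beta):
    """F-beta score: recall is weighted beta times as heavily as precision.
    beta=1 gives the usual F1 (plain harmonic mean)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```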
The most effective way to understand data is to visualize it. And we really value all the tools that help with this task.
#Altair is a great visualization library for #Python, so we integrated it with Neptune to let you log interactive charts.
More here:
Explore our #HyperparameterOptimization section on the blog:
👉 Hyperparameter Tuning in Python
👉 Best Tools for Model Tuning and Hyperparameter Optimization
👉 How to Track Hyperparameters of ML Models?
In this new article, @Elishatofunmi uses the #Pytennis environment to build a Model-Free and Model-Based #RL system + gives you some resources you can explore – check it out! 🎾🎾
Most companies are either not doing any production ML yet, or do it at a #Reasonablescale.
Reasonable scale as in:
- five ML people,
- ten models,
- millions of requests.
Reasonable, demanding, but nothing crazy or hyperscale.
Nothing like Google.
“I have been pleasantly surprised with how easy it was to set up Neptune in my PyTorch Lightning projects!” - user's feedback 😍
If you're a @PyTorchLightnin user and want to be positively surprised as well, check our integration docs 👉
Did you know that you can use the Kolmogorov-Smirnov statistic to measure the performance of your binary classifier?
Read our blog post:
#MachineLearning
#DataScience
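In this setting, the KS statistic is the maximum gap between the score distributions of the two classes, equivalently the max of |TPR − FPR| over all thresholds. A minimal stdlib sketch (ours, not the blog post's implementation):

```python
def ks_statistic(y_true, y_score):
    """Max |TPR - FPR| over all score thresholds: 1.0 means the classes'
    score distributions are perfectly separated, 0.0 means identical."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    best = 0.0
    for t in set(y_score):
        tpr = sum(s >= t for s in pos) / len(pos)
        fpr = sum(s >= t for s in neg) / len(neg)
        best = max(best, abs(tpr - fpr))
    return best
```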
Neptune + @bokeh = the possibility to log interactive charts generated in Bokeh (like a confusion matrix or distribution) in Neptune. 😍
📈📉📊
Check the docs ➡️