New
@elastic
blog post "Discovering anomalous patterns based on parent-child process relationships" covers a lot of material from my ProblemChild
@CamlisOrg
talk.
Excited to see
@elastic
open up its detection rules repo. Blog post by
@rw_access
does a great job detailing how to get rules into your detection engine and how best to contribute to the community.
Finally releasing MalwareRL, an OpenAI gym for Ember and MalConv malware classifiers. This builds on the RL research
@drhyrum
@mrphilroth
and I did on malware evasions. There are new binary modifications and a baseline (random) agent to get you started.
@newbury_eric
@passantino
@neontaster
You're right, these can definitely be used as an alternative, but they are still not ventilators.
Saying that you sent ventilators when you actually sent CPAP machines is why articles like this get written.
I am beyond excited about the SecML team's work here
@elastic
. This post shows how our team uses transforms to identify beaconing malware.
We hope this post encourages security researchers to prototype new statistical models to detect bad in their data!
@SwiftOnSecurity
@HuntOperator
- Where do they get training data?
- How do they generate labels?
- What are the performance metrics?
- How often are models retrained? (Do they degrade over time?)
- How well does it generalize to previously unobserved samples/events?
"Machine Learning in Cyber-Security - Problems, Challenges and Data Sets" by researchers
@shodanhq
and
@PaloAltoNtwks
Really nice set of topics/datasets
Ember: An Open Source Classifier And Dataset by
@EndgameInc
. Huge step forward for reproducibility in malware classification research. Thanks
@mrphilroth
and
@drhyrum
for your hard work!
Github:
Blog:
Two ML/infosec papers on identifying malicious strings:
Predicting Domain Generation Algorithms with LSTMs by
@jswoodbridge
et al.
eXpose: A Char-Level CNN For Detecting Malicious URLs, Paths and Reg Keys by
@joshua_saxe
et al.
Just came across
@struppigel
's "Malware Analysis For Hedgehogs" channel. His breakdown of the Basic Structure of PE Files is super helpful for Data Scientists who may be working on malware classification. Very simple, intuitive explanations.
I am excited to start my new role as the Head of Data Science at
@sublime_sec
. I look forward to showing how we can combine ML & custom rule logic to build a genuinely novel, adaptable, and transparent email security experience.
Updated "Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning" paper by
@drhyrum
,
@mrphilroth
,
@UdacityDave
, Kharkar, and myself!
After 7 years, today is my last day at
@elastic
. I am incredibly lucky to have worked with such a talented group!
I want to thank
@mark_dufresne
and
@snowboardvstree
for their patience, mentorship, and friendship.
@SamanthaZeitlin
for her outstanding leadership
Two books coming out this year on Machine Learning in Infosec by some real smart folks.
"Machine Learning and Security" by
@cchio
& Freeman (Facebook)
"Malware Data Science" by
@joshua_saxe
&
@hillarymsanders
RAPTOR: Ransomware Attack PredicTOR
DGA features coupled w/ time series methods used to build model to identify potentially malicious domains while watching new DNS registrations. Interesting approach, susceptible to bypass by sophisticated adversaries.
In the post on "Linux malware protection in
@elastic
Security,"
@DanielStepanic
and
@gradientjanitor
show how we leverage ML to generate YARA signatures for detecting Linux malware. Code included below.
Post:
Repo:
I really like this graphic the folks at
#ThreatHuntingSummit
made for
@randomuserid
's talk on "Practical Threat Hunting w/ ML." It captures how we can effectively leverage (un)supervised ML w/o drowning users in FP-prone signals.
Important ML+Infosec research from
@MSFTResearch
. "Neural Classification of Malicious Scripts: A study with JavaScript and VBScript" Highlights difficulties of building a proper dataset and challenges working w/ malicious scripts.
Finally published the research
@threatpunter
and I did on "ProblemChild: Discovering Anomalous Patterns based on Parent-Child Process Relationships" for
@virusbtn
2019.
"Using Recurrent Neural Networks for Decompilation" Uses machine translation methods to decompile binaries. Really cool to see Deep Learning NLP techniques applied to the security space.
Re-reading
@willcfleshman
's post on winning the Malware Evasion Comp. Excellent breakdown of potential blindspots in deep learning (MalConv) and tree-based (Ember) classifiers. Also good background on the malware features used in these models.
I’m super excited that ML-backed Malware Prevention is being released under the free tier.
Congrats to
@mrphilroth
and the rest of the Data Science team on getting this feature developed and released!
Elastic 7.9 is now available!
Elastic Agent (beta) and one-click data ingestion simplify data onboarding and ingest management in the
#ElasticStack
. Plus, we’re launching malware prevention and Workplace Search features under the free distribution tier →
Looks like my
@CamlisOrg
talk was just posted. "ProblemChild: Discovering Anomalous Patterns based on Parent-Child Process Relationships" s/o to
@threatpunter
and
@drhyrum
for their contributions!
"From 0 to 60 with Elastic Security" by
@wesleyraptor
is an excellent end-to-end tutorial on how to create an Adversary Simulation Environment, collect data, and explore the results w/in the Elastic ecosystem.
Posting slides from my
@BsidesDC
talk "Bringing Red vs. Blue to Machine Learning" High-level overview of adversarial ML nomenclature/techniques, plus "practical" application via scenario-based RvBs
I cannot believe it's already been two years since I joined
@sublime_sec
. Super interesting work with a great group of people has made the time fly by.
I cannot wait to see what comes next!
Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection. Unsupervised approach to identifying network intrusions.
Great section on adversarial attacks & countermeasures.
Paper:
Code:
Adversarial Malware Binaries: Evading Deep Learning for Malware Detection in Executables by
@biggiobattista
et al. Targets the MalConv DL malware model.
This group has been doing adversarial research for a while and their papers are fantastic!
I'm worried that WFH is rubbing off on my 4y.o... She just came into my office, brow furrowed, saying "the server must be down or something" while holding an iPad w/ a crashed kid's app.
"Quantifying the Robustness of ML & Current Anti-Virus" by
@willcfleshman
&
@EdwardRaffML
introduces adversarial testing methodology focused on binary manipulations.
Also excellent list of ML/infosec papers in the References section.
"Why We Release Our Research"
@drhyrum
,
@comathematician
and I lay out the importance of releasing ML research to the academic and infosec communities.
If you're going to
@defcon
, check out
@aivillage_dc
. They have a fantastic set of talks lined up:
-
@harini
on modeling User Behavior
-
@NMspinach
on hacking RL systems
- Salma Taoufiq and Ben Gelman (
@SophosAI
) - Alert Prioritization
-
@drhyrum
on this year's ML Evasion Comp!
If you are interested in how the
@elastic
Security Data Science team implemented a stack-based model for detecting anomalous parent-child processes, read our post on ProblemChild. Code and feature transforms made available too!
My 4y.o. daughter is really getting into chess and lately she’s become obsessed with endgame puzzles.
She likes writing down the notation to practice her letters/numbers.
Excited to present "Getting Passive Aggressive About False Positives"
@USENIXSecurity
SCAINet in August w/
@EdwardRaffML
.
We will demonstrate how human-in-the-loop feedback via PA algorithms can enable global malware models to gain local knowledge to reduce FPs.
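For anyone unfamiliar with PA algorithms, here is a minimal sketch of the textbook PA-I update for a linear classifier. It is not the code from the talk: the function name `pa_update`, the toy weights, and the feature vector are all made up for illustration. The idea is that the model stays "passive" when a sample is already classified with enough margin, and makes the smallest "aggressive" correction otherwise, which is how a single analyst label on a false positive can adjust a model locally.

```python
def pa_update(weights, x, y, C=1.0):
    """One PA-I step for a linear model. y is +1 (malicious) or -1 (benign).

    Hypothetical helper for illustration only.
    """
    # Signed margin: positive when the model agrees with the label.
    margin = y * sum(w * xi for w, xi in zip(weights, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss
    norm_sq = sum(xi * xi for xi in x)
    if loss == 0.0 or norm_sq == 0.0:
        return list(weights)  # passive: nothing to correct
    # Aggressive: smallest step that fixes the margin, capped by C.
    tau = min(C, loss / norm_sq)
    return [w + tau * y * xi for w, xi in zip(weights, x)]


# Toy example: the model scores this sample as malicious (score = 1.0),
# but an analyst marks it benign (y = -1), i.e. a false positive.
global_weights = [1.0, 0.0]
false_positive = [1.0, 1.0]
local_weights = pa_update(global_weights, false_positive, y=-1)
new_score = sum(w * xi for w, xi in zip(local_weights, false_positive))
```

After the update, the same sample scores on the benign side of the boundary, and feeding it in again leaves the weights unchanged.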
Fridays on our team are now "Research Fridays" to allow folks to nerd out on an applied SecML/MLOps problem. I spent the day implementing the transformer in URLTran: Improving Phishing URL Detection w/
@PyTorch
&
@huggingface
Paper, code, and data tips:
"Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs" Really neat use of NLP techniques applied to the security domain.
Paper:
Model/Embeddings:
Acquiring a dataset for Macro-based malware classification is tough! However, yesterday I stumbled across this paper/dataset containing featurized and raw macro data. It is an excellent place to start experimenting.
Dataset:
Paper:
Just gave a shout out to my 7 year old during my BSidesLV talk.
I told her it was going to be streamed and she naturally assumed I’m a YouTube star.
I ported the code from the
@EndgameInc
paper "Predicting Domain Generation Algorithms using LSTMs" over to py3 and TensorFlow 2. Updates to DGA datasets and interpretability methods coming soon.
Had a blast presenting my talk “Bringing Red vs. Blue to ML” today
@BsidesDC
Thanks to all who stuck around until the end and asked such great questions!
We’ve officially joined forces with
@EndgameInc
. Hear from CEO Shay Banon (
@kimchy
) and Endgame CEO Nate Fick (
@ncfick
) live on Oct. 15 at 8:30 a.m. EDT to learn more about what we have in store →
Excited that my
@BSidesLV
talk has been accepted! I will be introducing BabbelPhish, our upcoming open source framework for text-to-code generation. I'll also show how
@sublime_sec
uses LLMs to make it easier for detection engineers to grasp our DSL.
.
@andyplayse4
presenting on making Meterpreter an Adversarial Example. Using some research from
@EndgameInc
on Reinforcement Learning for Malware Evasion.
Detecting Malicious PowerShell Commands using Deep Neural Networks. Another interesting application of DL/NLP techniques being used to solve infosec problems.
A new Python-based parser for preprocessing and feature engineering on Portable Executable (PE) files. This is a great way to start ML research on Windows binaries.
- metadata
- ngrams
- entropy
- generate grayscale image representations.
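To give a feel for what features like these look like, here is a rough stdlib-only sketch computed over raw bytes. This is not the parser's API: the function names, the `width` parameter, and the sample bytes are all assumptions for illustration, and a real tool would work per section of a parsed PE file.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def to_grayscale_rows(data: bytes, width: int = 16) -> list:
    """Lay bytes out as rows of pixel intensities (0-255), zero-padding the last row."""
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Hypothetical sample: the "MZ" magic followed by a few arbitrary bytes.
sample = b"MZ" + bytes([0x90, 0x00, 0x03, 0x00, 0x00, 0x00])
entropy = byte_entropy(sample)
bigrams = byte_ngrams(sample, n=2)
image = to_grayscale_rows(sample, width=4)
```

High entropy often hints at packed or encrypted content, n-gram counts feed bag-of-bytes style models, and the row layout is the usual starting point for treating a binary as an image.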
Reinforcement learning is easily my favorite research to dig into, and a user has compiled a list of RL in Security Papers/Github codebases to catch up on the latest developments!
We took our 7 year old to San Diego for her spring break. After the zoo, we went to a brewery where we saw her go up to the bartender to order a Sprite and pretzels and say “put it on my parents’ tab. It’s under filar”.
"Detecting Homoglyph Attacks with a Siamese Neural Network" Presented
@IEEESSP
workshop by
@EndgameInc
researchers
@drhyrum
@jack8daniels2
and Daniel Grant.
Targets name spoofing commonly used to obfuscate file and domain names in malware/C2 comms.
Deceiving Portable Executable Malware Classifiers into Targeted Misclassification with Practical Adversarial Samples. Comprehensive lit review and interesting approach from the authors.
If you're attending RSA next week:
@filar
will give a talk at our booth on Tues & Wed at 10:45am. Learn how explainable, transparent machine learning provides much-needed confidence and context in your triage workflow.
Congrats to the
@EndgameInc
researchers for getting their paper "Detecting Homoglyph Attacks with a Siamese Neural Network" accepted into the
@IEEESSP
Deep Learning & Security Workshop! Looks like a great list of talks!
"Feature Selection for Malware Detection Based
on Reinforcement Learning" An agent is trained through Q-learning to maximize expected accuracy. Action space covers PE header, section and import table.
Anyone working on Insider Threat detection?
@SEInews
has a pretty neat dataset that provides both background and malicious actor synthetic data.
Data:
Paper:
Part 2 of the "Detecting Phishing With Computer Vision" series by
@EndgameInc
researchers
@laborious_dtg
and Bill Finlayson. Provides a great overview of CV techniques and how they applied the YOLO object detection framework.
Lots of code samples too!
How AI can help in infosec (if it can fight through the marketing hype) by
@lilyhnewman
@mrphilroth
offers his opinion and shares some valuable insights he gained while creating
@EndgameInc
's malware classifier and the Ember dataset.
If you are interested in detecting Living off the Land attacks using the
@elastic
Stack ML feature, check out our webinar "ProblemChild in the Stack" next week!
"MEADE: Towards a Malicious Email Attachment Detection Engine" by
@rharang
&
@joshua_saxe
Super interesting research highlighting features to leverage, classifier comparison, and future research considerations.
"Anomaly Detection in Cyber Network Data Using a Cyber Language Approach" by
@keeghin
and team. Creates a language to build a probabilistic tree structure to identify interesting network events.
Explaining to my 3y.o. a proposal I just saw that would give her $1K.
Me: "What would you do with $1000?"
C: "A $1000 cash money?"
Me: "Yep."
C: "I would probably put that into my ice cream shop"
Reinvesting in her small business...
Are you interested in machine learning & security? My team at
@elastic
is hiring a Security Data Scientist. Come work on malware classification, detecting anomalous events, and more!
Elastic Security Data Science is looking for a Security Researcher to help grow our malware prevention capabilities. If you have experience in malware analysis or RE w/ a passion for ML we want to hear from you! DM if you have any questions.
I’m super excited to have
@gradientjanitor
onboard the Security Data Science team
@elastic
starting today!
Interested to see what he and
@mrphilroth
cook up for Malware Classification.
If you needed another reason to attend
@aivillage_dc
this weekend
@Andrew___Morris
is providing 60 days of
@GreyNoiseIO
API access for those who stop by. Check out his talk this Friday @ 1:20pm
Best papers I read this week:
"Statistical Estimation of Malware Detection Metrics in the Absence of Ground Truth"
"Two Can Play That Game: An Adversarial Evaluation of a Cyber-alert Inspection System"
Spent some time this weekend building a simple (crappy) annotation platform for a NER model I'm building. The idea is to extract entities from poorly written vulnerability reports to grab software name/version info.