🚀 Would you like to improve the latency of your Go apps?
In the video below I'll walk you through 5 optimizations using @datadoghq's new profiling timeline feature. Finding and fixing those problems would have been really hard with other tools ✨.
I hereby start a new #blockchain consulting business:
1. You ask me if X will be revolutionized by blockchain technology.
2. I charge you $100k.
3. I say no.
4. You save millions of dollars.
First customer gets 10% discount. Contact me now 🤙🏻
Can travel be revolutionized with blockchain technology?
A blog post written by Matthias Felder and Moritz von Bonin, published by our strategic partner @IBM @IBMBlockchain:
Pro Tip: Are you tired of reviewing GitHub PR diff chunks out of context?
Replace with in the URL to enjoy a much better reviewing experience.
via @__jakub_g
What happens when you make an HTTP request in #go?
res, _ := http.Get("")
io.Copy(io.Discard, res.Body)
Below is a sneak peek at my new function call tracer for Go 🕵🏻♂️. Note the time spent on TLS / cert loading.
Visualizer is Perfetto UI.
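For readers who want a rough look at the phases of a request without a function call tracer, here is a minimal sketch using the standard library's net/http/httptrace hooks. This is a different and much coarser technique than the tracer shown in the post. The helper names tracedGet and traceLocalRequest are made up for this example, and the local test server stands in for the URL that is missing from the snippet above.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"sync"
)

// tracedGet performs a GET request and returns the names of the client-side
// phases reported by httptrace, in the order the hooks fired.
func tracedGet(url string) ([]string, error) {
	var (
		mu     sync.Mutex
		events []string
	)
	// record may be called from several goroutines, so it takes a lock.
	record := func(name string) {
		mu.Lock()
		defer mu.Unlock()
		events = append(events, name)
	}
	trace := &httptrace.ClientTrace{
		ConnectStart: func(network, addr string) { record("ConnectStart") },
		ConnectDone:  func(network, addr string, err error) { record("ConnectDone") },
		// Only fires for https URLs, not for the plain http test server below.
		TLSHandshakeStart:    func() { record("TLSHandshakeStart") },
		GotConn:              func(httptrace.GotConnInfo) { record("GotConn") },
		WroteRequest:         func(httptrace.WroteRequestInfo) { record("WroteRequest") },
		GotFirstResponseByte: func() { record("GotFirstResponseByte") },
	}
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer res.Body.Close()
	if _, err := io.Copy(io.Discard, res.Body); err != nil {
		return nil, err
	}
	mu.Lock()
	defer mu.Unlock()
	return append([]string(nil), events...), nil
}

// traceLocalRequest runs tracedGet against a throwaway local test server.
func traceLocalRequest() ([]string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello")
	}))
	defer srv.Close()
	return tracedGet(srv.URL)
}

func main() {
	events, err := traceLocalRequest()
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	for _, e := range events {
		fmt.Println(e)
	}
}
```

Pointing tracedGet at an https URL should additionally report the TLS handshake phase, which is where the post says a lot of time goes.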
1) Are you using #postgres via #docker for mac?
Have you ever noticed `EXPLAIN ANALYZE` slowing down your queries by like 60x?
Let's dig into some #postgres and #linux internals to figure out what's going on!
📢 New project: The Busy Developer's Guide to Go Profiling, Tracing and Observability.
It's early days, but I'd love to get some feedback on my simplified models for understanding the Goroutine Scheduler and Garbage Collector. PTAL
📢 Announcing: Stack Traces in Go
In-depth research explaining Go's stack layout, various unwinding techniques (frame pointers, gopclntab, DWARF), etc. and how it all relates to profiling.
Please let me know what you think! : )
🎉 Announcing fgtrace, a new profiler/tracer for #golang.
It captures wallclock timeline views for each goroutine and it's really simple to use:
defer fgtrace.Config{}.Start().Stop()
Check it out & let me know what you think
🚀🧵 I'm hereby releasing a new #golang profiler called fgprof that allows you to analyze On-CPU as well as Off-CPU (e.g. I/O) time together.
AFAIK, this is impossible with the builtin Go profilers 🙈.
Please RT or comment on the thread : )
Nothing cures you of ideas like this better than co-founding your own company, creating your own tech debt, and a few years later concluding that: no, it's not the right business move to clean it up.
It's hard to express how humbling of an experience this was for me at the time.
If I was in charge at a big tech company I would have a team whose entire job was to refactor and clean up code.
They'd add the original authors as reviewers, make custom lint rules, etc.
Turns out you can reduce the memory usage of a #golang program by ~1.4MiB (on 64bit) by simply calling this at the beginning of your main:
runtime.MemProfileRate = 0
Join my #P99CONF presentation on Oct 6th to learn about:
- Go Profiling
- Go's unique runtime, calling convention
- Using eBPF with Go
- Why you should never use uretprobes with Go
- Using Linux perf with Go
- Go's builtin tracer
- How I managed to fit all of this in 20min
Profile-guided optimization for Go is great, and we have already used it to save significant amounts of money at Datadog.
However, as part of a wider rollout, we noticed that one service saw an 18% increase in memory usage from pgo until we performed a rollback.
Austin Clements from the Go team just landed a series of 17 patches. If everything goes well, you'll never notice that anything changed.
That being said, watching this come together has been mind blowing. This is serious engineering (TM).
🧵 Thread ...
I'm very excited to announce my latest project in the #go #profiling space.
sprof is a static profiler that can profile your applications without running them using only static analysis.
It's a true revolution in #profiling - you should check it out!
Announcing my new #PostgreSQL tool: sqlbench 🎉🚀.
sqlbench measures and compares the execution time of one or more SQL queries.
Please RT & comment to let me know what you think!
1) Oh man, computer stuff is hard. A small #postgresql thread:
After spending weeks optimizing an ETL process to be 3x faster, an index-only scan got 3x slower compared to a replica that hadn't undergone the new ETL process. Main clue: (shared) buffer hits were up by 10x.
Starting tomorrow, I'll begin my new job at @datadoghq, working on continuous profiling for #golang 🎉.
Most of my work should be open source, and hopefully I'll even get the chance to submit patches to the Go project itself!
I'm super excited!
#golang folks: How do you measure and evaluate the memory usage of the code you're writing? Do you care about mean usage? Max usage? Do you use MemStats, runtime/metrics, runtime/trace?
Here is my attempt to help a colleague. Please share your thoughts!
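To make two of the options named in the question concrete, here is a small sketch that reads the heap size once via runtime.MemStats and once via runtime/metrics, then checks how the number moves after an allocation. The helper names heapObjectsBytes and heapGrowth are made up for this example, and it only shows a point-in-time reading, not mean or max usage.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

// sink keeps the test allocation reachable so the GC cannot free it.
var sink []byte

// heapObjectsBytes returns the bytes occupied by heap objects (live ones
// plus dead ones the GC has not freed yet), as reported by runtime/metrics.
func heapObjectsBytes() uint64 {
	samples := []metrics.Sample{{Name: "/memory/classes/heap/objects:bytes"}}
	metrics.Read(samples)
	if samples[0].Value.Kind() != metrics.KindUint64 {
		return 0 // metric not supported by this Go version
	}
	return samples[0].Value.Uint64()
}

// heapGrowth allocates n bytes and returns the metric before and after.
func heapGrowth(n int) (before, after uint64) {
	before = heapObjectsBytes()
	sink = make([]byte, n)
	after = heapObjectsBytes()
	return before, after
}

func main() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Println("MemStats.HeapAlloc:          ", ms.HeapAlloc)
	fmt.Println("runtime/metrics heap objects:", heapObjectsBytes())

	before, after := heapGrowth(8 << 20)
	fmt.Println("heap objects before/after 8MiB alloc:", before, after)
}
```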
🎉 The Go Profiling Guide now covers all profilers built into Go!
Big thanks to my new colleague Nick Ripley who contributed the mutex profiler section.
A powerful type system is a double edged sword.
Go's type system might be dull, but it prevents a lot of self-harm that is possible in other languages.
I can finally talk about the new feature I've been working on @datadoghq for the past few months.
Please check out our new profiling timeline for Go. We believe it's an absolute game changer for debugging tricky latency problems.
#golang #performance 🧵
I'm very excited to share what I've been working on at @datadoghq recently.
✨ Connecting Go Profiling With Tracing
Check out the blog post to learn how profiling can fill in the gaps for distributed tracing, and how it works under the hood!
📣 Releasing my #golang block profiler notes, covering:
- What does the block profiler do?
- How can you use it?
- How does it work?
- What's the overhead?
- How accurate is it?
It also comes with benchmarks, simulations and pretty graphs 🎉. Please RT
📺 Go scheduler: Implementing language with lightweight concurrency (2019)
This presentation by @dvyukov outlining the problems solved by Go's scheduler is 🔥.
Wonderful slides, explanations and examples!
I didn't realize how easy it is to compile a linux kernel and hook it up to gdb under qemu. I even have it integrated with VS Code now. This is a very productive way to explore kernel stuff 🤩.
I just finished a vacation project: A working prototype + proposal for a new #golang profile type to break down stack memory usage by function.
Link is in the 🧵, PTAL and upvote/RT if you like it 🚀.
This might be the first stack memory profiler ... ever? 🤯
Just came across this hidden gem by @k0dvb - a free YouTube course teaching both basic and more advanced concepts of Go. I checked a few videos and it seems really well done 👏🏻
I'm very grateful to @datadoghq for the opportunity to contribute to the Go runtime and the Go team at Google for the great collaboration ❤️.
The new post on the Go blog puts our frame pointer unwinding optimizations in the context of the larger changes to the tracer in go1.22.
Awesome new #golang blog on traces:
The blog highlights several key enhancements in Go's execution traces, focused on the runtime/trace package. Here's a summary:
- Prior to Go 1.21, the run-time overhead of tracing was somewhere between 10–20% CPU for
Just published a new blog post 🥳
Go arm64 Function Call Assembly
It covers only 14 assembly instructions, but tries to do so in full depth. No hand waving at the software/hardware interface 😅.
Let me know if you find any mistakes!
You know the old saying, it takes a village to build a good profiler. So here is an in-depth blog post on the low-level tech and open source collaboration stories behind:
Profiling Improvements in Go 1.18 🚀
I'm currently writing an article called "Go Memory Metrics Demystified".
It's about runtime.MemStats and runtime/metrics, but also profiling.
Let me know if this sounds interesting to you.
And also let me know if you have any questions you'd like to see answered. Thanks 🙏
Been writing some arm64 SIMD code to optimize a #golang application to save some big $$$ / year.
Wish Go supported SVE. But even with NEON I'm seeing 4-5x speedup compared to highly optimized Go 🤩.
Hopefully I'll get to do a blog post about this soon!
New Blog Post: Learn how the CPU overhead of the Go execution tracer can be reduced by up to 25x by rediscovering the ancient art of frame pointer unwinding 🚀
TIL about git-absorb
If you're familiar with `git commit --fixup` and `git rebase -i --autosquash`, you'll love it:
absorb basically automates the process of figuring out which of your staged changes should be added to which commit in your history 🤯
I don't use Python for anything other than data visualization and analysis, and it's usually a pleasure.
But it's also a good reminder for how slow python can be. Example: Parsing a 500kB YAML file:
Python: 28s
Go: 500ms
I can confirm this. What most people don't realize is how much time is spent in mallocgc in #golang. It's death by a thousand paper cuts that is difficult to see.
Here is a toy app hiding 23.7% time in mallocgc. See screenshot 2, which uses focus_on(). Real apps look similar!
Running some backend infrastructure in GC'ed languages, I am always amazed how much time is cumulatively spent in GC.
20-30% is not rare.
It is a bit crazy that memory management is so expensive.
👨🏻🔬 Go memory metrics demystified:
- How does process RSS relate to go memory metrics?
- How do you identify off-heap allocation problems?
- Why does the heap profile under-report memory usage by 2x?
If this sounds interesting, check out my new post:
Just found myself using the delve #golang debugger instead of printf for debugging a real world test case failure and it was smooth 🤯!
Who wants a video? :)
Next question: Should Go even preempt running goroutines if there is nothing else to do?
Maybe there is a good reason, but I couldn't help myself and cooked up a small patch that disables preemption if there is nothing else to do. runtime/trace is 🤩
Since more people seem interested in my new software/hardware project than my PRs to telescope.nvim, I guess I should share the release notes for Mira v0.1 as well 🎉.
Not spending a lot of time with computer stuff during paternity leave ... but when I do it's apparently my first PR for telescope.nvim.
Also redoing my entire vim config in pure lua 🙃.
I am very excited about this. Not only do frame pointers massively simplify CPU profiling, they also enable Off-CPU profiling use cases that require really fast unwinding that is impossible to achieve with today's techniques and hardware.
See
I think I was able to significantly improve the state of the art of the "Processing Large Files" challenge (linked) in #golang.
My solution uses 3.5x less CPU time than the best solution I found 🌍🌿.
Would anybody be interested to read a blog post about it?
Go's memory profiler has a limit of 32 stack frames.
Today I looked at a case where over 50% of a live heap profile exceeded this limit, which made me very sad.
So I took another look at the issue, and I might have been able to cook up a patch 🤞.
What's the German word for being unable to merge a PR to disable a flaky test due to another flaky test failing?
I'll offer Testinstabilitätsgefängnis or Wackeltesthölle, but I'm open to better suggestions 🤣
Yes!!!! Frame pointers everywhere! This stuff is going to enable so many amazing performance and debugging wins, I'm very excited ✨
No more eval of turing complete DWARF bytecode non-sense. No more huge unwind tables.
h/t @KnuX
🥳 My presentation for @gopherconeu in Berlin (June 17-20th) was accepted!
How to Win Frames and Influence Pointers
tl;dr: How frame pointer unwinding was implemented in the execution tracer, and what it means to Go devs interested in performance.
Looking forward to the event!
Came across this before, but just did a more careful reading. I didn't realize that context switching is 20x more expensive than mode switching in Linux, e.g. ~2000ns vs ~100ns on fast HW.
What's making context switching so expensive? TLB flushes?
Cattle, not pets - sure. But databases are Wagyu class.
Feed them the best CPU, Memory and NVMe you can afford.
I hear they also like music, wine and massages.
@polizeiberlin I mean, if Stefan had ended the call when he was discreetly made aware of his violation ... okay. That can be tolerated once.
But if Stefan "couldn't care less" and simply keeps talking on the phone? Phew, in that case I don't understand why the police should be tolerant.
#golang #performance folks, here are my notes on "Goroutine Profiling in Go".
It's probably the most comprehensive "docs" you'll find for this, at least I'm not aware of anything similar 🙃.
Looking forward to feedback / questions / etc.!
Bold claim. It matches neither my intuition nor what I've heard from most Rust devs.
That being said, I'd love to learn more!
Maybe the "internal data from Google projects in 2022 and 2023" won't be published. But was methodology mentioned? Number of commits / LoC 🙈? Something better?
Wrote a new blog post: 8 Unexpected Profiling Use Cases Beyond Performance Optimization
h/t @thorstenball for trying to write and publish stuff in < 60min to keep momentum :)
Found another great use case for ChatGPT.
Take some real Go code I'm currently working on that contains a state machine and have it generate the graphviz code for it.
I love my work, got no financial worries, a beautiful family and wonderful friends.
But the future of our planet has me so worried that it’s hard to feel motivated some days.
How do y‘all cope with this? Tech optimism? Denial? Nihilism? Hedonism?
I’m open to suggestions :)
TIL: Using go test -race also randomizes the operation of the #golang scheduler 🤯.
You should always run the race detector, but this makes it even more important as it also helps to detect logic races (not just data races).
I work on profilers. This is an immediate bookmark for me. This is a really great summary. And a reminder of what a mistake DWARF unwinding was. Omitting frame pointers on x86-64 was a mistake!
I'm embarrassed to say it, but I didn't have a robust intuition about stack frames, unwinding and exception handling stuff (CIE, FDE, CFI, eh_frame and especially all things DWARF). This great note made it much clearer:
Is there a German word for the anxiety you get when trying to add a newline in an iMessage on macOS and you're not sure if you need to hold shift or alt in order to avoid sending it prematurely?
🎉 My second patch for Go profiling just landed (again, first merge was reverted b/c of flaky testing).
This will improve the accuracy of pprof labels in Go 1.18. Together with @rhyshiltner's patch for GH 35057, 1.18 will be a great release for profiling!
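For anyone who hasn't used pprof labels yet, here is a minimal sketch of how they are attached with the standard runtime/pprof API. The helper name labeledValue and the "endpoint" label are made up for this example. It shows the labeling mechanism only, not the accuracy fix from the patch.

```go
package main

import (
	"context"
	"fmt"
	"runtime/pprof"
)

// labeledValue runs a function with a pprof label attached and returns the
// label value that is visible from inside that function.
func labeledValue(key, value string) string {
	var seen string
	pprof.Do(context.Background(), pprof.Labels(key, value), func(ctx context.Context) {
		// CPU profile samples taken while this function runs are tagged
		// with the key/value pair, so they can be filtered by label later.
		if v, ok := pprof.Label(ctx, key); ok {
			seen = v
		}
	})
	return seen
}

func main() {
	fmt.Println(labeledValue("endpoint", "/users"))
}
```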
👋🏻 I made a video: Using Delve to Examine Memory in Go
It's 5 minutes and shows how to examine the memory of a slice header in #golang as well as the array pointer it contains.
If you like it, I'll make more Go videos like this, e.g. about profiling.
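If you'd rather poke at a slice header from code than from the debugger, here is a rough sketch. It assumes the usual three-word layout of a slice value (array pointer, length, capacity). The sliceHeader type and the inspect and first helpers are made up for this example and are not what the video uses.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sliceHeader mirrors the in-memory layout of a slice value: a pointer to
// the backing array, followed by the length and the capacity.
type sliceHeader struct {
	Data unsafe.Pointer
	Len  int
	Cap  int
}

// inspect reinterprets the slice value itself as a sliceHeader.
func inspect(s []int) sliceHeader {
	return *(*sliceHeader)(unsafe.Pointer(&s))
}

// first follows the array pointer and reads the first element.
// It must only be called on headers of non-empty slices.
func first(h sliceHeader) int {
	return *(*int)(h.Data)
}

func main() {
	s := make([]int, 3, 8)
	s[0] = 42
	h := inspect(s)
	fmt.Printf("data=%p len=%d cap=%d first=%d\n", h.Data, h.Len, h.Cap, first(h))
}
```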
PSA: Google Maps favorites uses a ring buffer of 500 items. If you add more, it silently (!!) deletes old entries. 😭😭😭
cc @ctavan who told me about this
Did you know: You can't just copy & paste code from another open source project into yours just because both have the MIT License. You need to include the license/copyright info.
Yes, I'm subtweeting. Don't make me @ you : p.
First shot AstraZeneca 💉✅.
Risk profile for a 30-something isn't great compared to waiting for Biontech, and I'd like to say I'm doing this for society.
But if I'm being honest, this is mostly for retribution. Fuck you SARS-COV-2. May you vanish forever.
Go might gain profile guided optimization (pgo) in the future thanks to Uber. The idea is to use pprof profiles to help the compiler generate faster code 🚀. Very cool.
The quickest way to rid yourself of the notion that you understand computers is to benchmark stuff and look at the raw data.
Depicted here: A JSON parsing workload (16 goroutines) under macOS vs Linux w and w/o profiling.
The more you look, the more odd stuff you find 🙈
PSA: Datadog is available to open source projects for free!
In particular our CI Visibility product is really amazing for finding bottlenecks and flaky tests.
You can learn more about this program, and some of our other OSS contributions here.
Having execution tracing enabled in prod is fantastic. Look at this GC cycle causing absolute havoc on scheduling latency 😱.
Fix: Reduce allocs or increase GOMAXPROCS.
Just came back from a one week trip to the @datadoghq office in Paris.
OMG. Meeting my colleagues for the first time after 1.25 years in person was amazing. They are incredible people. Smart, kind, funny, interesting, ...! I'm really lucky and grateful to be part of this team 🍀🙏🏻.
The hardest part of projects is the final grind. The feature is done, the happy path is mind blowing.
But there are still hundreds of little paper cuts. Some are bugs, some are UX, some are performance, etc.
The worst part: You'll never be done. You got to call it yourself.
Blazingly Fast Shadow Stacks for Go 🚀
In this blog post I'm sharing my research and implementation for shadow stacks in Go which deliver up to 8x faster stack trace capturing than frame pointers 🤯.
This offers a glimpse into the future of hardware accelerated shadow stacks.
My first #profiling patch just got merged into the #golang core 🎉🍾🥳.
In other news: The block profile will become a little more accurate in Go 1.17.
🙏🏻 Huge thanks to @dvyukov and @prattmic for reviews & discussions.
Weekend project completed 🥳
traceutils anonymize <input> <output>
Takes a runtime/trace file and strips it of sensitive information (file names, function names, user logs).
This was part of a learning exercise to understand the binary format.
A new @datadoghq feature my team and I have worked on has officially gone live (public beta) now:
Watchdog Insights for Continuous Profiler 🎉
The idea is to automate some of the problem analysis and solution proposal a profiling expert on your team might provide.
#AstraZeneca is:
- 100% effective at preventing hospitalization and death
- 79% effective at preventing symptoms
Dear Media: PLEASE pick the first number for your headlines. It's the number that matters and that should motivate people's decision making!
We recently upgraded some servers from Oracle #Linux to Ubuntu and saw disk busyness go from 10% to 60% and avg read/write latency go up by 10x.
Turns out the culprit was the I/O scheduler! Ubuntu uses cfq by default, OL used deadline. Switching back to deadline fixes things! 🤯
👋🏻 I’m at @gopherconeu today and tomorrow. Come and say hello if you want to talk about go profiling, execution tracing, runtime metrics, frame pointers, performance, otel or anything else :).
#gopherconEU #golang
Comparing my Go vs SIMD implementation to check if a string consists of a set of valid characters and doesn't have double underscore sequences.
SIMD is up to 5x faster for bigger inputs 🎉. Blog post coming soon :).
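To make the check concrete, here is a plain scalar Go sketch of the kind of validation described. The post doesn't say which characters are valid, so the set used here (ASCII letters, digits and underscore) is an assumption, as are the names validName and buildTable. This is neither the optimized Go version nor the SIMD version being compared.

```go
package main

import "fmt"

// valid marks the bytes accepted by validName. The exact set is an
// assumption for this sketch: ASCII letters, digits and underscore.
var valid = buildTable()

func buildTable() [256]bool {
	var t [256]bool
	for c := 'a'; c <= 'z'; c++ {
		t[c] = true
	}
	for c := 'A'; c <= 'Z'; c++ {
		t[c] = true
	}
	for c := '0'; c <= '9'; c++ {
		t[c] = true
	}
	t['_'] = true
	return t
}

// validName reports whether s consists only of valid bytes and contains
// no two consecutive underscores.
func validName(s string) bool {
	prevUnderscore := false
	for i := 0; i < len(s); i++ {
		c := s[i]
		if !valid[c] {
			return false
		}
		if c == '_' && prevUnderscore {
			return false
		}
		prevUnderscore = c == '_'
	}
	return true
}

func main() {
	for _, s := range []string{"foo_bar1", "foo__bar", "foo-bar"} {
		fmt.Printf("%q -> %v\n", s, validName(s))
	}
}
```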
Last Friday I made one of the hardest decisions of my life. I chose my love and commitment to open source over an amazing job with people I love.
That being said, I'm excited about what lies ahead in my new job. Expect to see more #Go profiling stuff from me starting Jan 4th.
#golang pro tip: Use -m with the godbolt compiler explorer to see your code annotated with compiler tooltips about inlining, heap allocation, etc.
Try it yourself: