After several months of work, we finally enabled Pitchfork reforking on 100% of Shopify's monolith.
~30% reduction in memory usage, ~9% better latency and more!
Just finished upgrading Shopify's monolith to Ruby 3.2.0 today.
Average latency: -6.6%
Median latency: -5.1%
p99 latency: -6.7%
And that's without YJIT.
As of this morning Shopify's monolith is running Ruby 3.3.0-dev in production.
In the process we fixed over 20 bugs that hadn't been discovered before.
I like to say that this app is the true last boss of Ruby testing.
Ruby 3.3 adds a new parser named Prism, uses Lrama as a parser generator, adds a new pure-Ruby JIT compiler named RJIT, and brings many performance improvements, especially to YJIT.
Merry Christmas, Happy Holidays, and enjoy programming with Ruby 3.3!
I merged a third String#<< optimization. Overall it's now 65% faster for UTF-8. Unfortunately my profiler is no longer showing any more low-hanging fruit, so that may be the last one 😞
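For context, the hot pattern here is appending many small UTF-8 fragments to a growing buffer. A minimal, illustrative micro-benchmark of that pattern (not the patch itself; the fragment and iteration count are arbitrary):

```ruby
require "benchmark"

# Repeatedly appending small UTF-8 fragments — the pattern String#<<
# is hot in for things like HTML templating.
fragment = "héllo wörld "
iterations = 100_000

elapsed = Benchmark.realtime do
  buffer = +"" # mutable, empty UTF-8 string
  iterations.times { buffer << fragment }
end

puts format("%d appends in %.2fms", iterations, elapsed * 1000)
```

Absolute numbers depend heavily on the Ruby version, which is the whole point of the optimization series.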
I'm always pissed when I read people calling Rails or Ruby "magic".
But given that "any sufficiently advanced technology is indistinguishable from magic", I guess that makes Ruby a sufficiently advanced technology, which is nice.
I wrote an article on the practical implications of object shapes in Ruby 3.2 and their consequence on performance, notably on the performance of the very common "memoization" pattern:
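The memoization pattern in question, and the shape-friendly variant the advice boils down to, can be sketched like this (class and method names are made up for illustration):

```ruby
class Report
  def initialize
    # Pre-defining the ivar means every Report instance has the same
    # object shape whether or not #total was ever called, so under
    # Ruby 3.2's object shapes the inline caches on @total reads and
    # writes stay monomorphic.
    @total = nil
  end

  def total
    @total ||= compute_total
  end

  private

  def compute_total
    42 # stand-in for an expensive computation
  end
end
```

Without the `@total = nil` in `initialize`, memoized and never-memoized instances end up with different shapes, which degrades cache hits on hot paths.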
Today I learned about the `Performance/UnfreezeString` rubocop rule, and it made me angry
We probably should go over all these cops that advocate for unobvious micro-optimizations of this sort.
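For reference, the cop in question rewrites mutable-string creation to the unary `String#+@` operator. The three forms it weighs in on:

```ruby
# Both of these allocate a mutable copy of the literal:
a = "buffer".dup          # flagged by Performance/UnfreezeString
b = String.new("buffer")  # also flagged

# This is the form the cop prefers; +@ returns self when the string
# is already mutable, or a mutable duplicate when it's frozen:
c = +"buffer"
```

All three produce an unfrozen `"buffer"`; the difference is a micro-optimization, which is exactly the unobvious-rewrite complaint above.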
Queue#pop(timeout:) was just accepted by @yukihiro_matz. A feature so obvious I always assumed it was left out on purpose, and so did many others. Turns out nobody thought about formally requesting it 😂
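For reference, here is what the accepted API looks like (it shipped in Ruby 3.2, so this sketch needs 3.2+):

```ruby
queue = Queue.new

# Returns nil if nothing was pushed within the timeout, instead of
# blocking forever like a plain #pop on an empty queue would.
value = queue.pop(timeout: 0.1)
p value # => nil

queue << :job
p queue.pop(timeout: 0.1) # => :job
```

Before this, you had to roll your own with a Mutex and ConditionVariable, or reach for the Timeout module.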
It's fairly situational but I'm quite proud of that patch.
Makes allocating small buffers much cheaper, which is useful in network clients (e.g. redis).
We just spent 3 days tracking down a memory corruption issue in Ruby 3.3.0-preview3.
It turned out to be triggered by a single, very specific Regexp:
That's why community testing of preview releases is so important.
We are pleased to announce the release of Ruby 3.3.0-preview3. Ruby 3.3 adds a new parser named Prism, uses Lrama as a parser generator, adds a new pure-Ruby JIT compiler named RJIT, and brings many performance improvements, especially to YJIT.
I just ran a load test against one of our small apps with GVL Instrumentation enabled. The app uses 4 threads per worker.
The graph is in nanoseconds, so that's over 300ms wasted waiting on the GVL per request on average 😱
I just merged the GVL instrumentation API I mentioned in the article. I'm super excited to start collecting production metrics with this. Thanks to @_ko1 and others for the reviews and directions ❤️
How would you call a method that works like Hash#dig, but raises an error when a key is missing? The feature is acceptable, but blocked until we find a good name
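Whatever the method ends up being named, its behavior can be approximated in plain Ruby with Hash#fetch, which already raises KeyError on a missing key (`strict_dig` is a made-up name for illustration):

```ruby
def strict_dig(hash, *keys)
  keys.reduce(hash) do |obj, key|
    # Hash#fetch raises KeyError when the key is absent, unlike #[]
    # which silently returns nil and makes the next step fail weirdly.
    obj.fetch(key)
  end
end

config = { db: { host: "localhost" } }
p strict_dig(config, :db, :host) # => "localhost"
# strict_dig(config, :db, :port) would raise KeyError
```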
@kddnewton
@joeldrapper
I have interpolation performance on my personal todo list. On paper you'd think it would almost always be faster than concatenation, but most of the time it's slower.
It has one big advantage: it can right-size the resulting string.
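A quick stdlib-only way to compare the two, for anyone curious (results vary a lot across Ruby versions, which is exactly what makes this interesting):

```ruby
require "benchmark"

name = "world"
n = 500_000

# Interpolation knows all the pieces upfront, so it can allocate a
# right-sized result; concatenation may have to grow (and reallocate)
# the buffer as it appends.
interpolation = Benchmark.realtime { n.times { "Hello, #{name}!" } }
concatenation = Benchmark.realtime { n.times { (+"Hello, ") << name << "!" } }

puts format("interpolation: %.3fs, concatenation: %.3fs",
            interpolation, concatenation)
```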
But most of these tend to stabilize quite fast, after the associated code paths have been executed a few times.
So Pitchfork's answer to this is to periodically take a warmed-up worker out of rotation and use it to fork a new generation of workers.
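In Pitchfork's config file that policy is driven by `refork_after`; a sketch of what it could look like (the thresholds here are arbitrary example numbers, not recommendations):

```ruby
# pitchfork.conf.rb
worker_processes 16

# Promote a warmed-up worker and refork the fleet from it once it has
# served this many requests: generation 1 after 50 requests,
# generation 2 after 100, generation 3 after 1000.
refork_after [50, 100, 1000]
```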
I'm trying to schedule an unpaid day off to answer the nationwide call to strike next week.
But because the company has an unlimited paid time off policy, it seems like there's just no way to do it. 🤦
#FirstWorldProblems
@nateberkopec
I don't have actual numbers on hand, but from what I see only a small minority of Shopify's new hires are actually experienced Ruby devs. Most of them barely did any Ruby prior to joining. And that's nothing new, it's been like that for years. We just train them.
I was hoping for some cool stuff to eventually come out of the GVL instrumentation API, but I must say I wasn't expecting it to happen this fast. ❤️
@KnuX
I've just posted about my new #ruby gem: gvl-tracing.
With this gem you can generate a timeline of Ruby Global VM Lock ("GVL") use in a Ruby application:
@joeldrapper
This is fine for small needs, but explicitly defining jobs is useful for having different queues with different SLAs, different error handling/retry policies, dealing with backward compatibility when changing a job's arguments, etc.
Since this comes up often: IMO the rule of thumb for whether or not you should yank a gem release is: don't.
Two exceptions:
- The gem was pushed by an attacker who took over your account.
- Legal reasons (e.g. copyright, etc).
1/4
As a result, after the initial warm-up period, it is able to share much more memory than traditional pre-forking servers would, as demonstrated in the screenshot.
@nganpham
That’s why testing ruby-head is very beneficial. Upgrading from 3.0 to 3.1 took us (one engineer) about 2.5 hours, post release.
And we got to help ensure a very stable 3.1.0 release by reporting bugs early.
What's funny is that I opened this code assuming String#<< would already be super optimized, given how much Ruby is used for HTML templating and such, but it seems there's quite a bit of low-hanging fruit left in there.
Adding or changing opcodes is way more involved though, so I'm not gonna try that today.
Still got a 1.40x improvement in less than two hours, I think I'm happy with that for now:
For fun I tried to port the buffered IO implementation @BiHi and I developed for redis-client to net-http.
The main perf gain is when reading line by line, in a large buffer, which Redis does a lot, and HTTP a bit less
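The core idea can be sketched in a few lines of plain Ruby (this is a toy illustration, not the redis-client implementation; the class name and chunk size are made up):

```ruby
require "stringio"

# Read big chunks from the underlying IO into a Ruby-level buffer
# once, then serve many line reads from that buffer without touching
# the IO again — the win is biggest for line-by-line protocols.
class LineBufferedIO
  def initialize(io, chunk_size: 16_384)
    @io = io
    @chunk_size = chunk_size
    @buffer = +""
  end

  def gets
    until (line = @buffer.slice!(/.*\n/))
      chunk = @io.read(@chunk_size)
      if chunk.nil? # EOF: flush whatever is left, then nil
        return @buffer.empty? ? nil : @buffer.slice!(0, @buffer.size)
      end
      @buffer << chunk
    end
    line
  end
end

io = LineBufferedIO.new(StringIO.new("PONG\r\nOK\r\n"))
p io.gets # => "PONG\r\n"
```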
Note that I had to choose a size, but I didn't have many data points. So I settled on 100 by default, but I'm open to changing it if some people provide data points by instrumenting their app in production
@nateberkopec
I start Stackprof in `config/boot.rb`, and then run `bin/rails runner 'StackProf.stop; ...'`. If doing this in the production env, that should capture the entire boot.
All this to say that Rails apps using ERB views should hopefully notice a nice perf improvement with Rails 7.1 and Ruby 3.2. I'd love to give an actual figure, but that's heavily dependent on the templates, so your mileage may vary.
The key feature of Pitchfork is its reduced memory usage thanks to reforking.
When you fork a process, at first all its memory is shared between the parent and the child; that's called Copy-on-Write, or CoW for short.
So in theory, subprocesses should be pretty much free, but in practice lots of shared memory gets invalidated when you start executing Ruby code. This is due in large part to the Ruby VM's inline caches and JITed code.
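A minimal way to see CoW in action (Unix-only sketch; how much actually gets copied is up to the kernel):

```ruby
# After fork, parent and child share this string's memory pages until
# one of them writes to them.
big = "x" * 10_000_000

pid = fork do
  big.upcase! # this write makes the kernel copy the touched pages
  exit!(big.start_with?("X") ? 0 : 1)
end

Process.wait(pid)
puts "parent still sees: #{big[0]}" # => parent still sees: x
```

The parent's copy is untouched: the child only ever mutated its own (lazily copied) pages.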
Do you use Active Record's `extending` API?
I'd like to replace it with another API because it has some nasty performance implications and can cause leaks on current Rubies.
But I'm not sure I understand all the use cases for it.
Ref:
Went for my first Japanese breakfast. There was ratatouille, pot-au-feu and some kind of choucroute o_0
The local food was surprisingly delicious though.
😂 Just as I'm about to release redis-rb 5.0.0 first beta, I realize that both `redis_cluster` and `redis-cluster` gem already exist...
I'll need to find another name for my cluster support extraction...
@nateberkopec
Frozen string literals are interned, yes, but other non-interned strings may be equivalent. So you have to do a full string comparison regardless.
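The interning point is easy to see directly: deduplicated (interned) strings are the same object, so comparing them is a pointer check, while two equal heap strings are distinct objects and need a byte-by-byte comparison:

```ruby
# String#-@ returns the interned (deduplicated, frozen) version:
a = -"hello"
b = -"hello"
p a.equal?(b) # => true — same object, comparison is a pointer check

# Two equivalent but non-interned strings are distinct objects:
c = "hello".dup
d = "hello".dup
p c.equal?(d) # => false
p c == d      # => true — requires a full byte comparison
```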
@palkan_tula
Meh, Eileen's post was about monkey patches to change a dependency behavior or API.
Here it's adding a method that didn't exist before. It could have happened just the same in ruby-core.
And as mentioned on the issue, I think the RSpec code is far from perfect here.
@getajobmike
That would require mutating the Integer, which isn't possible since they are immediates.
You'd need a wrapper object like concurrent-ruby's AtomicFixnum. Would definitely be nice to have something like that in the stdlib though.
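Until something like that lands in the stdlib, a Mutex-wrapped counter is the usual stand-in. A sketch (concurrent-ruby's AtomicFixnum uses real atomic CPU instructions instead of a lock; the class name here is made up):

```ruby
class AtomicCounter
  def initialize(value = 0)
    @mutex = Mutex.new
    @value = value
  end

  # Integers are immediates and can't be mutated in place, so the
  # mutex makes the read-modify-write of @value appear atomic instead.
  def increment(by = 1)
    @mutex.synchronize { @value += by }
  end

  def value
    @mutex.synchronize { @value }
  end
end

counter = AtomicCounter.new
threads = 4.times.map { Thread.new { 1000.times { counter.increment } } }
threads.each(&:join)
p counter.value # => 4000
```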
@nateberkopec
Depends how it's configured. If you use it purely in memory you save having to write to disk, which for write-heavy workloads does make a difference. SSD latency has improved tremendously, but it's still 3-4 orders of magnitude higher than memory.
I'm getting real tired of this guy. He keeps commenting on and opening issues all across GitHub, and reporting him for spam to GitHub doesn't stop it.
@github
don't you have a spam filter on public repos?
@keithpitt
please give all my thanks and praise to whoever is the genius that shipped this new UI feature.
This saves me an incredible number of clicks when digging into CI failures.
@fxn
A big difference to me is that constants are namespaced. Whereas with global variables, if they were as popular, you might easily run into conflicts.
@fxn
@fatkodima
Yeah, we've been using a similar strategy for uniqueness validation
We should definitely try to upstream that. The annoying part is that it's hard to make 100% reliable, because you have to parse error messages, which may change easily or be localized...