Fri 22 Jan 2021
With the recent release of Ruby 3 I was looking forward to playing around with the new concurrency primitive, Ractors. As luck would have it, the release notes mentioned another concurrency addition called the Fiber Scheduler. This led me down a wonderful rabbit hole and completely distracted me from ever writing a single line of Ractor code.
Here is a peek at a project of mine that gets a 6x speedup using Fibers with a 4-line code change:
```ruby
links = {}
count = 0
shows = ... # list of show objects

scheduler = Scheduler.new
Fiber.set_scheduler scheduler

shows.each_with_index do |s|
  begin
    Fiber.schedule do
      res = Net::HTTP.get_response(URI(s.url))
      ...
      if ... # new episodes released
        links[count] = s
        count += 1
      end
      ...
    end
  rescue
    p "Failed: #{s.title}"
  end
end

scheduler.run
...
```
The code essentially makes 20+ HTTP requests to Wikipedia and then parses each response with Nokogiri to check whether the episode count has changed since the last visit. An important note here is that using Thread or fork would not work without significant code changes, because of the shared variables count and links.
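For contrast, here is a hypothetical sketch (not from the project) of what a Thread-based version would need: a Mutex guarding every update to the shared count and links. The show list and per-show work below are stand-ins for the real HTTP fetch and parse.

```ruby
links = {}
count = 0
mutex = Mutex.new

# Stand-in for the real list of show objects.
shows = (1..20).map { |i| "show-#{i}" }

threads = shows.map do |show|
  Thread.new do
    # The real code would fetch and parse the Wikipedia page here.
    # Without the mutex, the read-modify-write on `count` and the
    # write into `links` could interleave between threads.
    mutex.synchronize do
      links[count] = show
      count += 1
    end
  end
end

threads.each(&:join)
count # => 20
```

With the Fiber version no lock is needed at all, since control only switches at blocking I/O, never in the middle of updating count and links.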
Parallelism is quite complicated in Ruby. It is almost impossible to achieve via threads due to the GVL, which usually forces only one thread to run at a time. Threads are also notoriously hard to get right, as locks are needed to access shared state. Another option is process-level parallelism, which is inherently safer thanks to separate memory spaces, but forking has a high cost and only pays off when you have many physical cores to back it up.
In comes Fiber to save the day, a stack-based concurrency primitive. Fibers are by far the lightest primitive, implemented as functions that have the ability to yield/resume as you see fit. Through the exhaustive efforts of Samuel Williams and friends, Thread and Fiber in Ruby have been getting lots of love lately. Samuel has introduced a simple Fiber::SchedulerInterface that lets you quickly make a program concurrent for any blocking I/O. The only thing a developer has to worry about is providing hooks for all I/O actions so that her Fiber can transfer control before blocking the whole program.
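As a quick refresher on the primitive itself: a fiber runs until it explicitly yields, and resumes exactly where it left off. Plain Ruby, no scheduler involved:

```ruby
log = []

fiber = Fiber.new do
  log << "step 1"
  Fiber.yield        # suspend: hand control back to whoever called resume
  log << "step 2"
end

fiber.resume  # runs until the Fiber.yield, then returns here
fiber.resume  # continues after the yield and runs to completion

log # => ["step 1", "step 2"]
```

The scheduler interface builds on exactly this mechanism: instead of you calling yield/resume by hand, Ruby yields the current fiber whenever it is about to block, and the scheduler resumes it when the I/O is ready.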
Alright, here is another little toy example:
```ruby
def task(n)
  puts "Starting task #{n}"
  sleep(1) # I/O blocking work
  puts "Finishing task #{n}"
end

# Synchronous
3.times do |n|
  task(n)
end
# takes 3 seconds to complete

# Asynchronous with Fibers
scheduler = MyScheduler.new
Fiber.set_scheduler(scheduler)

3.times do |n|
  Fiber.schedule do
    task(n)
  end
end

scheduler.run
# takes ~1 second to complete
```
The hidden benefit here, aside from the obvious concurrency, is that each Fiber is controlled by your scheduler and only yields on blocking I/O. This removes a whole class of threading issues that arise from pre-emption and unpredictable access to shared state. Fibers are essentially a drop-in form of concurrency with no extra work from the developer.
There is an example scheduler in the main Ruby repo that you can essentially copy and paste and start using right away. Unfortunately, as of this writing there is an error in the code they provide, so use the corrected version from here instead, along with a corrected sample test file. Note: this scheduler depends on IO.select, which only supports 1024 open file descriptors at a time. A production-grade solution would probably use epoll or kqueue instead.
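To make the interface concrete, here is a minimal scheduler sketch of my own along the same lines (assuming Ruby >= 3.0). It implements just enough of the hooks (io_wait, kernel_sleep, block, unblock) to drive sleep and simple socket waits; unblock is naive and single-threaded, so a real scheduler would need thread-safe wakeups.

```ruby
class MiniScheduler
  def initialize
    @readable = {} # IO => Fiber waiting to read
    @writable = {} # IO => Fiber waiting to write
    @waiting  = {} # Fiber => monotonic wake-up deadline
  end

  # Drive the event loop until no fiber is waiting on I/O or a timer.
  def run
    until @readable.empty? && @writable.empty? && @waiting.empty?
      now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      timeout = @waiting.values.map { |deadline| deadline - now }.min
      timeout = 0 if timeout && timeout < 0

      readable, writable = IO.select(@readable.keys, @writable.keys, [], timeout)

      readable&.each { |io| @readable.delete(io)&.resume }
      writable&.each { |io| @writable.delete(io)&.resume }

      now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      @waiting.keys.each do |fiber|
        next if @waiting[fiber] > now
        @waiting.delete(fiber)
        fiber.resume
      end
    end
  end

  # --- hooks invoked by Ruby's blocking primitives ---

  def io_wait(io, events, _timeout)
    @readable[io] = Fiber.current if (events & IO::READABLE) != 0
    @writable[io] = Fiber.current if (events & IO::WRITABLE) != 0
    Fiber.yield
    events
  end

  def kernel_sleep(duration = nil)
    block(:sleep, duration)
  end

  def block(_blocker, timeout = nil)
    if timeout
      @waiting[Fiber.current] =
        Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    end
    Fiber.yield
  end

  def unblock(_blocker, fiber)
    fiber.resume # naive: assumes wakeups happen on the same thread
  end

  # Called by Fiber.schedule: create and start a non-blocking fiber.
  def fiber(&block)
    Fiber.new(blocking: false, &block).tap(&:resume)
  end

  # Called automatically at thread exit: drain any remaining work.
  def close
    run
  end
end

scheduler = MiniScheduler.new
Fiber.set_scheduler(scheduler)

3.times do |n|
  Fiber.schedule { sleep(0.1) }
end

scheduler.run # the three sleeps overlap: ~0.1s total, not 0.3s
```

This is the shape of the thing, not a production implementation; it exists to show how little surface area the interface has.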
If you are interested in way more information on Fibers, check out these links:
Enjoy :)