Fri 22 Jan 2021
With the recent release of Ruby 3 I was looking forward to playing around with the new concurrency primitive, Ractors. As luck would have it, the release notes mentioned another concurrency addition called the Fiber Scheduler. This led me down a wonderful rabbit hole and completely distracted me from ever writing a single line of Ractor code.
Here is a peek at a project of mine that gets a 6x speedup using Fibers with a 4-line code change:
```ruby
links = {}
count = 0
shows = ... # list of show objects

scheduler = Scheduler.new
Fiber.set_scheduler scheduler

shows.each_with_index do |s|
  begin
    Fiber.schedule do
      res = Net::HTTP.get_response(URI(s.url))
      ...
      if ... # new episodes released
        links[count] = s
        count += 1
      end
      ...
    end
  rescue
    p "Failed: #{s.title}"
  end
end

scheduler.run
...
```
The code essentially makes 20+ HTTP requests to Wikipedia and then parses each response with Nokogiri to check whether the episode count has changed since the last visit. An important note here is that using Thread or fork would not work without significant code changes, because of the shared variables count and links.
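For contrast, here is a hypothetical sketch (not from the project) of what a Thread-based version would need: a Mutex guarding every update to the shared count and links. The show list and per-show work below are stand-ins for the real HTTP fetch and parse.

```ruby
links = {}
count = 0
mutex = Mutex.new

# Stand-in for the real list of show objects.
shows = (1..20).map { |i| "show-#{i}" }

threads = shows.map do |show|
  Thread.new do
    # The real code would fetch and parse the Wikipedia page here.
    # Without the mutex, the read-modify-write on `count` and the
    # write into `links` could interleave between threads.
    mutex.synchronize do
      links[count] = show
      count += 1
    end
  end
end

threads.each(&:join)
count # => 20
```

With the Fiber version no lock is needed at all, since control only switches at blocking I/O, never in the middle of updating count and links.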
Parallelism is quite complicated in Ruby. It is almost impossible to achieve via threads due to the GVL, which usually forces only one thread to run at a time. Threads are also notoriously hard to get right, as locks are needed to access shared state. Another option is process-level parallelism, which is inherently safer thanks to separate memory spaces, but forking has a high cost and only pays off when you have many physical cores to back it up.
In comes Fiber to save the day, a stack-based concurrency primitive. Fibers are by far the lightest primitive, implemented as functions that have the ability to yield/resume as you see fit. Through the exhaustive efforts of Samuel Williams and friends, Thread and Fiber in Ruby have been getting lots of love lately. Samuel has introduced a simple Fiber::SchedulerInterface that lets you quickly make a program concurrent for any blocking I/O. The only thing a developer has to worry about is providing hooks for all I/O actions so that her Fiber can transfer control before blocking the whole program.
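As a quick refresher on the primitive itself: a fiber runs until it explicitly yields, and resumes exactly where it left off. Plain Ruby, no scheduler involved:

```ruby
log = []

fiber = Fiber.new do
  log << "step 1"
  Fiber.yield        # suspend: hand control back to whoever called resume
  log << "step 2"
end

fiber.resume  # runs until the Fiber.yield, then returns here
fiber.resume  # continues after the yield and runs to completion

log # => ["step 1", "step 2"]
```

The scheduler interface builds on exactly this mechanism: instead of you calling yield/resume by hand, Ruby yields the current fiber whenever it is about to block, and the scheduler resumes it when the I/O is ready.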
Alright, here is another little toy example:
```ruby
def task(n)
  puts "Starting task #{n}"
  sleep(1) # I/O blocking work
  puts "Finishing task #{n}"
end

# Synchronous
3.times do |n|
  task(n)
end
# takes 3 seconds to complete

# Asynchronous with Fibers
scheduler = MyScheduler.new
Fiber.set_scheduler(scheduler)

3.times do |n|
  Fiber.schedule do
    task(n)
  end
end

scheduler.run
# takes ~1 second to complete
```
The hidden benefit here, aside from the obvious concurrency, is that each Fiber is controlled by your scheduler and only yields on blocking I/O. This removes a whole class of threading issues that arise from pre-emption and unpredictable access to shared state. Fibers are essentially a drop-in form of concurrency with no extra work from the developer.
There is an example scheduler in the main Ruby repo that you can essentially copy and paste and start using right away. Unfortunately, as of this writing there is an error in the code they provide, so use the corrected version from here instead, along with a corrected sample test file. Note: this scheduler depends on IO.select, which only supports 1024 open file descriptors at a time. A production-grade solution would probably use epoll or kqueue instead.
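To make the interface concrete, here is a minimal scheduler sketch of my own along the same lines (assuming Ruby >= 3.0). It implements just enough of the hooks (io_wait, kernel_sleep, block, unblock) to drive sleep and simple socket waits; unblock is naive and single-threaded, so a real scheduler would need thread-safe wakeups.

```ruby
class MiniScheduler
  def initialize
    @readable = {} # IO => Fiber waiting to read
    @writable = {} # IO => Fiber waiting to write
    @waiting  = {} # Fiber => monotonic wake-up deadline
  end

  # Drive the event loop until no fiber is waiting on I/O or a timer.
  def run
    until @readable.empty? && @writable.empty? && @waiting.empty?
      now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      timeout = @waiting.values.map { |deadline| deadline - now }.min
      timeout = 0 if timeout && timeout < 0

      readable, writable = IO.select(@readable.keys, @writable.keys, [], timeout)

      readable&.each { |io| @readable.delete(io)&.resume }
      writable&.each { |io| @writable.delete(io)&.resume }

      now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      @waiting.keys.each do |fiber|
        next if @waiting[fiber] > now
        @waiting.delete(fiber)
        fiber.resume
      end
    end
  end

  # --- hooks invoked by Ruby's blocking primitives ---

  def io_wait(io, events, _timeout)
    @readable[io] = Fiber.current if (events & IO::READABLE) != 0
    @writable[io] = Fiber.current if (events & IO::WRITABLE) != 0
    Fiber.yield
    events
  end

  def kernel_sleep(duration = nil)
    block(:sleep, duration)
  end

  def block(_blocker, timeout = nil)
    if timeout
      @waiting[Fiber.current] =
        Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    end
    Fiber.yield
  end

  def unblock(_blocker, fiber)
    fiber.resume # naive: assumes wakeups happen on the same thread
  end

  # Called by Fiber.schedule: create and start a non-blocking fiber.
  def fiber(&block)
    Fiber.new(blocking: false, &block).tap(&:resume)
  end

  # Called automatically at thread exit: drain any remaining work.
  def close
    run
  end
end

scheduler = MiniScheduler.new
Fiber.set_scheduler(scheduler)

3.times do |n|
  Fiber.schedule { sleep(0.1) }
end

scheduler.run # the three sleeps overlap: ~0.1s total, not 0.3s
```

This is the shape of the thing, not a production implementation; it exists to show how little surface area the interface has.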
If you are interested in way more information on Fibers, check out these links:
Enjoy :)