Yesterday at work, we ran into an interesting problem. We're creating the new version of an application and discarding the old, ugly code. But we need to migrate some data: the old system has (let's say) widgets, and the new system has widgets, too. The old system uses 5 different databases (see how ugly?) with weird row schemas, but it does reliably record each widget's color, size, and shape. The new system uses one database and has a nice row schema, and it also records each widget's color, size, and shape.
We need to know: which widgets are only in the old system? Which widgets are only in the new? Which are in both?
Enter Sets
Asking these questions tripped a switch in my mind. "I know about this!" I thought. "This is a job for sets! And Ruby has a Set class."
I'd never used sets before, but they're made for this kind of thing. Sets are often illustrated with Venn diagrams: overlapping circles, where you ask "which things are only in the left circle? What's in the overlap?", etc.
For instance:

A set is a collection of items where no item is repeated. If you have more than one set, you can compare them and answer the kinds of questions we've been asking. Here's a demo I just threw together:

    require 'set'

    def sets_demo
      # Sets ignore duplicate values
      game_words = Set.new(['duck', 'duck', 'duck', 'goose'])
      puts "Unique game words : #{game_words}\n\n"
      #=> Unique game words : duck, goose

      # Here are two sets with one thing in common
      fast = Set.new(['bullet', 'cheetah'])
      round = Set.new(['bullet', 'beach ball'])

      # All the ways we can compare them
      puts "Round : #{round}"
      #=> Round : bullet, beach ball
      puts "Fast : #{fast}"
      #=> Fast : bullet, cheetah
      puts ''
      puts "Round and Fast (&) : #{(fast & round)}"
      #=> Round and Fast (&) : bullet
      puts "Round but not Fast (-) : #{(round - fast)}"
      #=> Round but not Fast (-) : beach ball
      puts "Fast but not Round (-) : #{(fast - round)}"
      #=> Fast but not Round (-) : cheetah
      puts "Round OR Fast (|) : #{(round | fast)}"
      #=> Round OR Fast (|) : bullet, beach ball, cheetah
      puts "Round OR Fast, but NOT both (XOR) : #{((round | fast) - (fast & round))}"
      #=> Round OR Fast, but NOT both (XOR) : beach ball, cheetah
    end

    # Formatting the way the sets print.
    # This reopens Set to override to_s, which is fine for a demo.
    # The outputs above assume a Ruby where Set keeps insertion order (1.9+).
    class Set
      def to_s
        to_a.join(', ')
      end
    end

    sets_demo

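As a side note to the demo, Ruby's Set also ships with a `^` operator for the "one or the other, but not both" case, plus a few predicate methods for asking yes/no questions about two sets. Here's a small sketch reusing the same `fast` and `round` sets; the variable names `xor` and `manual` are just mine for illustration.

```ruby
require 'set'

fast  = Set.new(['bullet', 'cheetah'])
round = Set.new(['bullet', 'beach ball'])

# Built-in symmetric difference: items in exactly one of the two sets
xor = round ^ fast

# The longhand version from the demo
manual = (round | fast) - (fast & round)

# Set equality compares contents only, so element order doesn't matter here
puts xor == manual                        #=> true

# Yes/no questions about how two sets relate
puts Set.new(['bullet']).subset?(round)   #=> true
puts fast.intersect?(round)               #=> true  (they share 'bullet')
puts fast.disjoint?(round)                #=> false
```

The longhand version is easier to read the first time you see it, so I'd treat `^` as a shortcut once the idea has sunk in.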
Got it?
In my examples, the items in the sets were strings, but they could be anything. In our case at work, we used hashes: a widget was represented by a hash containing its color, shape and size. So, we just had to:
- Connect to each of the databases in the old system, getting all the widgets, creating a hash for each one, and dropping each into an old_system_widgets set (which automatically ignores duplicates)
- Connect to the new system's database and make a similar set of its widgets
- Do the kinds of set operations illustrated above
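The steps above can be sketched roughly like this. It's only an illustration: the database queries are replaced by hard-coded arrays, and the names `old_db_rows`, `new_db_rows`, `still_to_migrate`, `only_in_new`, and `in_both` are made up for the example. It works because Ruby treats two hashes with the same keys and values as equal, so a set sees them as the same item.

```ruby
require 'set'

# Hypothetical stand-ins for query results: one array of widget hashes
# per old database. In real life these would come from database calls.
old_db_rows = [
  [
    { color: 'red',  shape: 'square', size: 'small' },
    { color: 'blue', shape: 'circle', size: 'large' }
  ],
  [
    { color: 'red',  shape: 'square', size: 'small' } # same widget, different database
  ]
]

# Hypothetical stand-in for the new system's single database
new_db_rows = [
  { color: 'blue',  shape: 'circle',   size: 'large' },
  { color: 'green', shape: 'triangle', size: 'medium' }
]

# Drop every old widget into one set; duplicates are ignored automatically
old_system_widgets = Set.new
old_db_rows.each { |rows| old_system_widgets.merge(rows) }

new_system_widgets = Set.new(new_db_rows)

still_to_migrate = old_system_widgets - new_system_widgets
only_in_new      = new_system_widgets - old_system_widgets
in_both          = old_system_widgets & new_system_widgets

puts "Still to migrate: #{still_to_migrate.to_a.inspect}"
puts "Only in new:      #{only_in_new.to_a.inspect}"
puts "In both:          #{in_both.to_a.inspect}"
```

One thing to watch for with this approach: the hashes have to match exactly, so 'Red' and 'red' would count as two different widgets. Values probably need normalizing before they go into the sets.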
Voila! Now we knew which widgets were new and which ones still needed to be migrated to the new system.
In conclusion: sets are swell!
Hmmm. That's a pretty weak ending.