Channel: The Sleepless Geek

Ruby's Set class

Yesterday at work, we ran into an interesting problem. We're creating the new version of an application and discarding the old, ugly code. But we need to migrate some data: the old system has (let's say) widgets, and the new system has widgets, too. The old system uses 5 different databases (see how ugly?) with weird row schemas, but it does reliably have widget color, size, and shape. The new system uses one database and has a nice row schema, and it also has widget color, size, and shape.

We need to know: which widgets are only in the old system? Which widgets are only in the new? Which are in both?

Enter Sets


Asking these questions tripped a switch in my mind. "I know about this!" I thought. "This is a job for sets! And Ruby has a Set class."

I'd never used them before, but sets are made for this kind of thing. Sets are often illustrated with Venn diagrams: overlapping circles, where you ask "which things are only in the left circle? What's in the overlap?", etc.

For instance:



A set is a collection of items where no item is repeated. If you have more than one set, you can compare them and answer the kinds of questions we've been asking. Here's a demo I just threw together:

require 'set'

def sets_demo
  # Sets ignore duplicate values
  game_words = Set.new(['duck', 'duck', 'duck', 'goose'])
  puts "Unique game words : #{game_words}\n\n"
  #=> Unique game words : duck, goose

  # Here are two sets with one thing in common
  fast  = Set.new(['bullet', 'cheetah'])
  round = Set.new(['bullet', 'beach ball'])

  # All the ways we can compare them.
  # (Sets are formally unordered; the item order shown below is what
  # Ruby 1.9+ prints, where Set keeps items in insertion order.)
  puts "Round : #{round}"
  #=> Round : bullet, beach ball

  puts "Fast : #{fast}"
  #=> Fast : bullet, cheetah
  puts ''

  puts "Round and Fast (&) : #{(fast & round)}"
  #=> Round and Fast (&) : bullet

  puts "Round but not Fast (-) : #{(round - fast)}"
  #=> Round but not Fast (-) : beach ball

  puts "Fast but not Round (-) : #{(fast - round)}"
  #=> Fast but not Round (-) : cheetah

  puts "Round OR Fast (|) : #{(round | fast)}"
  #=> Round OR Fast (|) : bullet, beach ball, cheetah

  puts "Round OR Fast, but NOT both (XOR) : #{((round | fast) - (fast & round))}"
  #=> Round OR Fast, but NOT both (XOR) : beach ball, cheetah
end

# Formatting the way the sets print.
# This reopens Set, so it has to run before sets_demo is called.
class Set
  def to_s
    to_a.join(', ')
  end
end

sets_demo
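
A small aside on the XOR line: Ruby's Set also has a built-in `^` operator that should give the same items as the longer union-minus-intersection expression, plus predicates like `subset?`. Here is a minimal sketch reusing the demo's two sets; the variable names `only_one`, `long_form`, and `shared` are just for illustration.

```ruby
require 'set'

fast  = Set.new(['bullet', 'cheetah'])
round = Set.new(['bullet', 'beach ball'])

# Set#^ returns the items that appear in exactly one of the two sets
only_one  = round ^ fast

# The longer form used in the demo: union minus intersection
long_form = (round | fast) - (fast & round)

# Set equality compares contents, not the order the items were added
puts only_one == long_form

# The overlap is, by definition, a subset of each original set
shared = round & fast
puts shared.subset?(round)
puts shared.subset?(fast)
```

Comparing with `==` rather than comparing printed strings avoids depending on the order the items happen to come out in.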

Got it?

In my examples, the items in the sets were strings, but they could be anything. In our case at work, we used hashes: a widget was represented by a hash containing its color, shape and size. So, we just had to:

  1. Connect to each of the databases in the old system, getting all the widgets, creating a hash for each one, and dropping each into an old_system_widgets set (which automatically ignores duplicates)
  2. Connect to the new system's database and make a similar set of its widgets
  3. Do the kinds of set operations illustrated above
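
The three steps above can be sketched roughly like this. The rows are made-up, in-memory stand-ins for whatever the real database queries return (the actual schemas and connection code aren't shown here), and the names `old_databases`, `new_database`, `only_in_old`, `only_in_new`, and `in_both` are assumptions for illustration.

```ruby
require 'set'

# Hypothetical data: each inner array stands in for the widgets read from
# one of the old system's databases (only two shown for brevity).
old_databases = [
  [
    { color: 'red',  shape: 'square', size: 'large' },
    { color: 'blue', shape: 'circle', size: 'small' }
  ],
  [
    { color: 'red',  shape: 'square', size: 'large' } # same widget again
  ]
]

# Hypothetical data: widgets read from the new system's single database.
new_database = [
  { color: 'blue',  shape: 'circle',   size: 'small' },
  { color: 'green', shape: 'triangle', size: 'medium' }
]

# Step 1: drop every old widget hash into one set; duplicates are ignored
# because hashes with equal keys and values count as the same set member.
old_system_widgets = Set.new
old_databases.each do |rows|
  rows.each { |widget| old_system_widgets << widget }
end

# Step 2: build a similar set for the new system.
new_system_widgets = Set.new(new_database)

# Step 3: the set operations answer the three questions.
only_in_old = old_system_widgets - new_system_widgets # still to migrate
only_in_new = new_system_widgets - old_system_widgets # created in the new system
in_both     = old_system_widgets & new_system_widgets # already migrated

puts "Only in old: #{only_in_old.to_a.inspect}"
puts "Only in new: #{only_in_new.to_a.inspect}"
puts "In both:     #{in_both.to_a.inspect}"
```

One caveat with this approach: it only works if both systems build their hashes with the same keys and the same value formats (for example, 'Red' and 'red' would count as different widgets), and the hashes shouldn't be modified after they go into a set.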

Voila! Now we knew which widgets were new and which ones still needed to be migrated to the new system.

In conclusion: sets are swell!

Hmmm. That's a pretty weak ending.
