hpricot:An Hpricot Showcase:Hpricot Challenge:Extracting multiple children from a table

Pretty much as is:

require "hpricot"

X =<<EOS
<table>
  <tr>
    <td>...stuff I don't want...</td>
  </tr>
  <tr>
    <td>
       <table>
         ------------rows i want
         <tr>
           <td>
             <table>
               <tr>
                 <td>Field 1</td>
                 <td>Field 2</td>
               </tr>
             </table>
           </td>
           <td>Field 3</td>
           <td>Field 4, Field 5</td>
         </tr>
         ------------end of rows i want
       </table>
    </td>
  </tr>
</table>
EOS

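# Parse the HTML, then select every <td> that sits inside a table nested in another table.
doc = Hpricot(X)
x = (doc/"table//table//td").collect do |k|
  # Skip cells that still contain markup (the cell wrapping the innermost table);
  # split the remaining plain-text cells on commas.
  k.inner_html.split(',') unless k.inner_html =~ /</
end.flatten.compact  # flatten the per-cell arrays and drop the nils left by skipped cells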

p x

And the output is:

["Field 1", "Field 2", "Field 3", "Field 4", " Field 5"]
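Note that the last element keeps its leading space (" Field 5"), because split(',') cuts only at the comma itself. If that whitespace is not wanted, each piece can be trimmed after splitting. The following is only a minimal sketch in plain Ruby with no Hpricot involved; split_fields is a hypothetical helper name, and the cells array stands in for the plain-text cell contents selected above.

```ruby
# Hypothetical helper (not part of the original script): split a cell's text
# on commas and trim surrounding whitespace from each piece.
def split_fields(text)
  text.split(',').map(&:strip)
end

# Stand-in for the plain-text <td> contents gathered by the script above.
cells = ["Field 1", "Field 2", "Field 3", "Field 4, Field 5"]

# flat_map applies the helper to each cell and flattens the result by one level.
fields = cells.flat_map { |cell| split_fields(cell) }
p fields
```

In the original script the same effect could presumably be had by appending .map { |s| s.strip } after .compact.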