Scraping JavaScript-loaded HTML with Ruby on Rails and Nokogiri
I'm trying to scrape a website for its product names.
My controller has the following:
    require "open-uri"
    page = Nokogiri::HTML(URI.open(page_url))
    @items_array = page.css("li.item h3")
Then I'm displaying them in the view as:
    <% @items_array.each do |item| %>
      <%= item.text %><br /><br />
    <% end %>
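The loop can be tried outside Rails with Ruby's standard ERB library. This is a sketch under assumptions: plain ERB, unlike a Rails view, does not escape output automatically, so the template calls `ERB::Util.html_escape` itself, and `items` is a local variable standing in for the controller's `@items_array` (a list of strings here, not Nokogiri nodes).

```ruby
require "erb"

# <% %> runs Ruby without printing its value; <%= %> prints the value.
# So the loop uses <% ... do |item| %> and only the item itself uses <%= %>.
# The trailing "-%>" (with trim_mode "-") drops the newline after code-only lines.
TEMPLATE = <<~'ERB'
  <% items.each do |item| -%>
  <%= ERB::Util.html_escape(item) %><br /><br />
  <% end -%>
ERB

def render_items(items)
  ERB.new(TEMPLATE, trim_mode: "-").result_with_hash(items: items)
end

puts render_items(["Widget", "Gadget & Co"])
```

With `<%=` on the `each` line instead, the view would also try to print the return value of `each`, which is the whole array.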
The problem is that the HTML only contains the first 10 items; the rest are generated by JavaScript, and I can't figure out exactly how.
Any ideas on how to scrape the rest of the content would be appreciated!
It won't work. Nokogiri cannot scrape what is not on the page, and as you can see (using "View Source" in your browser), part of the list is not in the HTML. How it gets loaded is irrelevant in this case (probably via JavaScript).
Your best option is to ask them whether they expose an API you can use (that would make your work much easier).
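If such an API exists and returns JSON, consuming it needs only Ruby's standard library. This is a sketch under loud assumptions: the endpoint (for example `https://example.com/api/products?page=2`) and the payload shape `{"products": [{"name": "..."}]}` are hypothetical, as are the helper names; adjust both to whatever the site actually documents.

```ruby
require "json"
require "net/http"
require "uri"

# Extracts names from a payload of the assumed (hypothetical) shape
# {"products": [{"name": "..."}, ...]}. A missing "products" key yields [].
def parse_product_names(json)
  JSON.parse(json).fetch("products", []).map { |product| product["name"] }
end

# Fetches one page from a hypothetical JSON endpoint and fails loudly on
# anything other than a 2xx response.
def fetch_api_product_names(url)
  response = Net::HTTP.get_response(URI(url))
  raise "HTTP #{response.code} for #{url}" unless response.is_a?(Net::HTTPSuccess)

  parse_product_names(response.body)
end
```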
Scraping is fragile anyway, since you depend on the exact layout of the page.