Scraping JavaScript-loaded HTML with Ruby on Rails and Nokogiri


I am trying to scrape product names from a website.

My controller contains the following:

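    require 'open-uri'
    # URI.open is the open-uri entry point on Ruby 2.5+; older versions used Kernel#open.
    page = Nokogiri::HTML(URI.open(page_url))
    @items_array = page.css("li.item h3")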

I then display the results in the view like this:

    <% @items_array.each do |item| %>
      <%= item.text %><br /><br />
    <% end %>

The problem is that the HTML that is loaded initially contains only the first 10 items. The rest are generated by JavaScript, and I can't seem to figure out how exactly.

Any ideas on how to scrape the rest of the content would be appreciated!

It won't work. Nokogiri cannot scrape what is not on the page, and as you can see (using "View Source" in your browser), part of the list is not in the HTML. How it gets loaded is irrelevant in this case (probably using JavaScript).

Your best option is to ask them whether they expose an API you can use (that would make your work easier).
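
If such an API exists and returns JSON, consuming it could look roughly like the sketch below. Everything specific here is an assumption: the "products" and "name" keys, the sample body, and the product_names helper are hypothetical, since the site's actual API (if there is one) is unknown.

```ruby
require 'json'

# Hypothetical response body; a real API would define its own shape.
SAMPLE_RESPONSE = '{"products":[{"name":"Widget"},{"name":"Gadget"}]}'

# Extracts product names from a JSON body shaped like the sample above.
# Returns an empty array when the "products" key is missing.
def product_names(json_body)
  JSON.parse(json_body).fetch('products', []).map { |product| product['name'] }
end

puts product_names(SAMPLE_RESPONSE).inspect
```

In a controller, the body would come from an HTTP request (for example via Net::HTTP.get with the API's URL) instead of a hard-coded string, and @items_array could be assigned the returned names directly.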

Scraping is fragile, as you depend on the exact layout of the page.

