Parse an HTML page with Mechanize to build the appropriate array

I have the following html code on the page received by mechanize (agent.get):
<div class="b-resumehistorylist-views">
<!-- first date start-->
<div class="b-resumehistory-date">date1</div>
<div class="b-resumehistory-company">
<div class="b-resumehistory-time">time1</div>
company1</div>
<!-- second date start -->
<div class="b-resumehistory-date">date2</div>
<div class="b-resumehistory-company">
<div class="b-resumehistory-time">time2</div>
company2
</div>
<div class="b-resumehistory-company">
<div class="b-resumehistory-time">time3</div>
company3</div>
<div class="b-resumehistory-company">
<div class="b-resumehistory-time">time4</div>
company4</div>
<div class="b-resumehistory-company">
<div class="b-resumehistory-time">time5</div>
company5</div>
<div class="b-resumehistory-company">
<div class="b-resumehistory-time">time6</div>
company6</div>
<div class="b-resumehistory-company">
<div class="b-resumehistory-time">time7</div>
company7</div>
...
</div>
I need to find each date inside the div with class="b-resumehistorylist-views".
Then I need to find all the company divs between two date divs and link each of them to the date that precedes them.
The problem is that the items (div class="b-resumehistory-company") are not nested inside the date div (class="b-resumehistory-date"); they are siblings of it.
In the end I need the following array:
array = [ [date1, time1, company1, companylink1], [date2, time2, company2, companylink2], [date2, time3, company3, companylink3], [date2, time4, company4, companylink4] ]
I know that I must use the search method with the text() option, but I cannot find the solution.
My current code can parse all the company information in each div class="b-resumehistory-company", but I still need to find the right date for each one.

It would be the same thing as before, just some of the class attributes have been changed:
doc = agent.get(someurl).parser
# For each company div, look back to the nearest preceding date div.
doc.css('.b-resumehistory-company').map do |x|
  date = x.at('./preceding-sibling::div[@class="b-resumehistory-date"][1]').text
  [date, x.at('.b-resumehistory-time').text, x.at('a').text, x.at('a')[:href]]
end

Related

Display data from nested JSON objects in HTML using AngularJS

I'm new to AngularJS and am trying to display data from a nested JSON object (shown as a screenshot in the original post).
My HTML code is:
<a rel="extranal" data-val="<%rcds%>" ng-repeat="rcds in rcd" class="international" id="<%rcds.id%>">
<span><img ng-src="<% rcds.routes.subroutes %>"/> <% rcds.subroutes[0].xyz%></span>
<div class="departure-time"><% rcds.subroutes[0].abc %></div>
</a>
I want to display the subroutes data in the ng-repeat, depending on the legtype value in the JSON. How can I do this?
If you just want to show your object as JSON, all you need to write is {{rcds | json}}.
Otherwise, if you want to navigate your nested object, you should do something like this:
<div ng-repeat="rcds in rcd">
  <div ng-repeat="route in rcds.routes">
    <!-- route element -->
    <div ng-repeat="depart in route.depart">
      <!-- depart element -->
      <div ng-repeat="subroute in route.subroutes">
        <!-- subroute element -->
      </div>
    </div>
  </div>
</div>
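The legtype condition from the question can also be applied in the controller, before the data reaches ng-repeat. Below is a minimal JavaScript sketch of that idea. The data shape is an assumption (the real JSON was only shown as a screenshot): each record is taken to have a routes array, each route a subroutes array, and each subroute a legtype field. The helper name subroutesByLegType and the sample values are hypothetical.

```javascript
// Assumed shape (hypothetical; the original JSON was not shown as text):
// { id, routes: [ { subroutes: [ { legtype, abc } ] } ] }
function subroutesByLegType(record, legType) {
  // Tolerate records that have no routes array.
  const routes = Array.isArray(record.routes) ? record.routes : [];
  // Collect the matching subroutes from every route into one flat array.
  return routes.flatMap(function (route) {
    const subroutes = Array.isArray(route.subroutes) ? route.subroutes : [];
    return subroutes.filter(function (subroute) {
      return subroute.legtype === legType;
    });
  });
}

// Made-up sample record, for illustration only.
const sampleRecord = {
  id: 1,
  routes: [
    {
      subroutes: [
        { legtype: "outbound", abc: "10:15" },
        { legtype: "return", abc: "18:40" }
      ]
    },
    { subroutes: [{ legtype: "outbound", abc: "12:05" }] }
  ]
};

const outbound = subroutesByLegType(sampleRecord, "outbound");
// Logs the abc values of the two outbound subroutes.
console.log(outbound.map(function (s) { return s.abc; }));
```

The filtered array could then be assigned to a scope property and rendered with a single ng-repeat. Under the same assumed data shape, AngularJS's built-in filter filter in the template is another option.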

Passing a variable to href

I have this code that is getting data from a cell in a table
<div class="p-s-header">TITLE</div>
<div class="p-content">
<div class="row">
<div class="col-md-12 col-xs-12 col-sm-12">
<span di="76" ></span><br />
<span di="77" ></span><br />
</div>
</div>
</div>
This is producing:
TITLE
(data in cell 76)
(DATA in cell 77)
Now, the data in cell 77 is a link that is too long for the space, so I want to show the words "Click here" (hyperlinked) instead of showing the link.
So I want to change the code so that the output looks like:
TITLE
(data in cell 76)
Click here
"Click here" should be built with the data in cell 77. I didn't write this code, but it seems the code that gets the data from that cell is:
How would I build it?
I tried a few options, but nothing seems to work.
For example, something like this:
<a href=<span di="77" ></span>Link</a><br />
Thanks for your help.
Try this. I hope it encourages you to study further how HTML and JS work together (PHP is too advanced for now).
Change your code to this:
<div class="p-s-header">TITLE</div>
<div class="p-content">
<div class="row">
<div class="col-md-12 col-xs-12 col-sm-12">
<span id="cell76" ></span><br />
<span id="cell77" >http://www.google.com</span><br />
</div>
</div>
</div>
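di is not a standard HTML attribute; the standard one is id. HTML4 did not allow an id to start with a digit, and although HTML5 does, such ids are awkward to use in CSS selectors, so this example uses cell76 and cell77.
At the bottom of the code that you added, write the following: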
<script>
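// Read the URL text stored in the cell77 span.
var link = document.getElementById('cell77').innerHTML;
// Write a "Click here" anchor that points to that URL.
document.write('<a href="' + link + '">Click here</a>');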
</script>
Here is the jsfiddle
https://jsfiddle.net/cz1Ltw3x/1/
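Concatenating the cell text straight into markup breaks if the URL contains characters such as & or a double quote. A minimal sketch of building the "Click here" link with escaping is given here. The helper names escapeHtml and buildLink are hypothetical and are not part of the original page.

```javascript
// Escape the characters that have a special meaning in HTML text and attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build the anchor markup for a URL; the visible label defaults to "Click here".
function buildLink(url, label) {
  const text = label === undefined ? "Click here" : label;
  return '<a href="' + escapeHtml(String(url).trim()) + '">' + escapeHtml(text) + "</a>";
}

console.log(buildLink("http://www.google.com"));

// In the page itself (browser only) this could replace the long URL in place:
// var cell = document.getElementById('cell77');
// cell.innerHTML = buildLink(cell.textContent);
```

Replacing the content of the span, rather than calling document.write, also removes the long URL from the page, which is what the question asked for.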

How to get Nokogiri to scrape text from span in Ruby

I'm trying to scrape information from a website using Nokogiri and Curb, but I can't seem to find the right selector for the element I want. I'm trying to scrape the API key, which is near the bottom of the HTML code as "xxxxxxx".
The HTML code is:
<body class="html not-front logged-in no-sidebars page-app page-app- page-app-8383900 page-app-keys i18n-en" data-twttr-rendered="true">
<div id="skip-link"></div>
<div id="page-wrapper">
<!--
Code for the global nav
-->
<nav id="globalnav" class="without-subnav"></nav>
<nav id="subnav"></nav>
<section id="hero" class="hero-short"></section>
<section id="gaz-content">
<div class="container">
::before
<div id="messages"></div>
<div id="gaz-content-wrap-outer" class="row">
::before
<div id="gaz-content-wrap-inner" class="span12">
<div class="row">
::before
<div class="article-wrap span12">
<article id="gaz-content-body" class="content">
<header></header>
<div class="header-action"></div>
<div class="tabs"></div>
lass="d-block d-block-system g-main">
<div class="app-details">
<h2>
Application Settings
</h2>
<div class="description"></div>
<div class="app-settings">
<div class="row">
::before
<span class="heading">
Consumer Key (API Key)
</span>
<span>
xxxxxxxxx
</span>
All I can seem to get is the "content" text.
My code looks like:
consumer = html.at("#gaz-content-body")['class']
puts consumer
I'm not sure what to type to select the class and/or the span and then get its text. All I can get Nokogiri to output is "content".
Your code prints "content" because ['class'] returns the value of the element's class attribute, not its text. Here we need the second span, the one after span class="heading" inside div class="app-settings" (I'm being a bit general, but not too much). I'm using search instead of at to retrieve both spans and then take the second one:
# Gets the 2 span elements under <div class='app-settings'>.
res = html.search('#gaz-content-body .app-settings span')
# Use .text to get the contents of the 2nd element.
res[1].text.strip
# => "xxxxxxxx"
But you can also use at to target the same element directly:
res = html.at("#gaz-content-body .app-settings span:nth-child(2)")
res.text.strip
# => "xxxxxxxx"

HTML element to contain id or name from ko.observable using foreach

Below I have a for-each loop using knockout.js.
<div data-bind="foreach:Stuff">
<div class="row">
<span data-bind="text: $data.name"></span>
</div>
</div>
I need the HTML element to have an id, a name, or some other attribute that reflects a unique value related to $data.name, because another method runs asynchronously and needs to know which HTML element to update.
Ideally, it would look something like this, I guess:
<div data-bind="foreach:Stuff">
<div class="row">
<span id="data-bind='text: $data.name'" data-bind="text: $data.name"></span>
</div>
</div>
I have found a Knockout syntax that applies values to specified attributes at runtime:
<div data-bind="foreach:Stuff">
<div class="row">
<span data-bind="attr: { id: $data.name}"></span>
</div>
</div>
Are you looking for this?
<div data-bind="foreach:Stuff">
<div class="row">
<span data-bind="text: $data.name, attr: { id: $data.name }"></span>
</div>
</div>
Here name is an observable, I believe, so whenever name changes in Stuff the text binding updates automatically; attr: { id: ... } simply gives the element a dynamic id using the built-in bindings.
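One caveat with attr: { id: $data.name } is that an HTML id must not contain whitespace, so a name such as "Team A" would not make a valid id. A minimal sketch of a helper that derives an id-safe string from a name follows. The function toElementId and its item- prefix are hypothetical, and two different names could still map to the same id, so a real unique key is preferable if the data has one.

```javascript
// Turn an arbitrary display name into a string that is safe to use as an element id.
function toElementId(name, prefix) {
  const idPrefix = prefix === undefined ? "item-" : prefix;
  const slug = String(name)
    .trim()
    .toLowerCase()
    // Replace every run of characters other than a-z, 0-9, "_" and "-" with one dash.
    .replace(/[^a-z0-9_-]+/g, "-")
    // Drop dashes left over at the start or end.
    .replace(/^-+|-+$/g, "");
  return idPrefix + slug;
}

console.log(toElementId("Team A"));
```

Assuming the helper is reachable from the binding (for instance as a global function), the binding could call it as attr: { id: toElementId(ko.unwrap($data.name)) }, and the asynchronous method could compute the same id to find the element.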

Right way to parse this HTML page?

I'm trying to parse some parts of an HTML page but I have problems with my regular expression.
My code looks like this:
... Download page using wget and some other stuff ...
$PAGE_REGEXP = "\<div class="col bg_dark clear">";
# Array HTMLLines
@HTMLLines = split(/\n/, $Page);
foreach $ThisOne (@HTMLLines) {
if ( ($Team) = ($ThisOne =~ /$PAGE_REGEXP/) ) {
$T{TranslateTeams($Team)}++;
$LastTeam=TranslateTeams($Team);
};
};
This is the HTML page:
<div class="col bg_dark clear">
<div class="col_1 left">15:30</div>
<div class="col_3_archive left">Team A - Team B</div>
<div class="col_2_archive left">
1:4 (0:2)
</div>
<div class="col_5 left ">2.4 </div>
<div class="col_5 left ">3.6 </div>
<div class="col_5 left bold">2.9 </div>
<div class="col_8 left">
</div>
<div class="col clear">
<div class="col_1 left">15:30</div>
<div class="col_3_archive left">Team C - Team D</div>
<div class="col_2_archive left">
2:3 (1:1)
</div>
<div class="col_5 left ">2.7 </div>
<div class="col_5 left ">3.7 </div>
<div class="col_5 left bold">2.5 </div>
<div class="col_8 left">
</div>
The information I need to parse is the team names, the final and halftime results, and the numbers in the col_5 divs, e.g. 2.4, 3.6 and 2.9 (for the game Team A - Team B).
When I run my script, Perl gives me the following error:
Bareword found where operator expected at parser.pl line 11, near ""\
I'm not familiar with all the existing Perl modules; maybe I'm trying to do something that is quite easy to achieve with the correct module. Can anybody please give me some hints or tips on how to parse this HTML page?
Thx
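The double quotes inside the pattern end the double-quoted string early, which is what causes the bareword error. Single-quote the string instead, so the line with the regexp looks something like this: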
$PAGE_REGEXP = '<div class="col bg_dark clear">';