I know this might be a duplicate, but I am not able to extract a value from this HTML source. Any help would be greatly appreciated.
What I am trying to do is get the pid of a project from the page.
The project names are read from a CSV file, and I need to get the matching pid for each one.
For example, if the project is "AA project" (the project key "AA" can also be used), the pid that needs to be extracted is 10441.
Since the values are not in a label, I cannot figure out how to extract them.
Update: just using pid=(\d....) gives all the pids without any reference to the project name or key.
<table id="project-list" class="aui">
<thead>
<tr>
<th></th>
<th>Name</th>
<th>Key</th>
<th class="project-list-type">Project Type</th>
<th>URL</th>
<th>Project Lead</th>
<th>Default Assignee</th>
<th>Operations</th>
</tr>
</thead>
<tbody>
<tr data-project-key="AA">
<td class="cell-type-icon" data-cell-type="avatar">
<div class="aui-avatar aui-avatar-small aui-avatar-project jira-system-avatar"><span class="aui-avatar-inner"><img src="/secure/projectavatar?pid=10441&avatarId=10011&size=small" alt="Project Avatar for 10441" /></span></div>
</td>
<td data-cell-type="name">
<a id="view-project-10441" href="/plugins/servlet/project-config/AA/summary">AA project</a>
</td>
<td data-cell-type="key">AA</td>
<td class="cell-type-project-type">
<span>Software</span>
</td>
<td class="cell-type-url" data-cell-type="url">
No URL
</td>
<td class="cell-type-user" data-cell-type="lead">
<a class="user-hover" rel="localadmin" id="view_AA_projects_localadmin" href="/secure/ViewProfile.jspa?name=localadmin">Atlassian Administrator</a>
</td>
<td class="cell-type-user" data-cell-type="default-assignee">
Unassigned
</td>
<td data-cell-type="operations">
<ul class="operations-list">
<li><a class="edit-project" id="edit-project-10441" href="/secure/project/EditProject!default.jspa?pid=10441&returnUrl=ViewProjects.jspa">Edit</a></li>
<li><a id="change_project_type_10441" class="change-project-type-link" data-project-id="10441" href="#">Change project type</a></li>
<li><a id="delete_project_10441" href="/secure/project/DeleteProject!default.jspa?pid=10441&returnUrl=ViewProjects.jspa">Delete</a></li>
</ul>
</td>
</tr>
<tr data-project-key="AAL">
<td class="cell-type-icon" data-cell-type="avatar">
<div class="aui-avatar aui-avatar-small aui-avatar-project jira-system-avatar"><span class="aui-avatar-inner"><img src="/secure/projectavatar?pid=10442&avatarId=10011&size=small" alt="Project Avatar for 10442" /></span></div>
</td>
<td data-cell-type="name">
<a id="view-project-10442" href="/plugins/servlet/project-config/AAL/summary">AAL project</a>
</td>
<td data-cell-type="key">AAL</td>
<td class="cell-type-project-type">
<span>Software</span>
</td>
<td class="cell-type-url" data-cell-type="url">
No URL
</td>
<td class="cell-type-user" data-cell-type="lead">
<a class="user-hover" rel="localadmin" id="view_AAL_projects_localadmin" href="/secure/ViewProfile.jspa?name=localadmin">Atlassian Administrator</a>
</td>
<td class="cell-type-user" data-cell-type="default-assignee">
Unassigned
</td>
<td data-cell-type="operations">
<ul class="operations-list">
I wouldn't recommend using regular expressions to parse HTML: they are a headache to develop and maintain, and they are very sensitive to markup changes and therefore fragile. See https://stackoverflow.com/a/1732454/2897748 for details.
Go for the XPath Extractor instead. The relevant configuration would be:
Reference Name: anything meaningful, e.g. id
XPath Query: substring-after(//tr[@data-project-key='AA']/td[@data-cell-type='name']/a/@id,'view-project-')
Check Use Tidy if your response is not XHTML-compliant
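To sanity-check the query logic outside the test plan, here is a minimal Python sketch of the same lookup. It assumes a trimmed, well-formed copy of the table (the `SAMPLE` string and the `pid_for_key` helper are made up for illustration), and it uses the standard library's ElementTree, which supports only a subset of XPath, so the `substring-after` step is done in plain Python instead:

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the project table (assumption: the real
# page has more columns, which do not matter for this lookup).
SAMPLE = """
<table id="project-list">
  <tbody>
    <tr data-project-key="AA">
      <td data-cell-type="name"><a id="view-project-10441">AA project</a></td>
      <td data-cell-type="key">AA</td>
    </tr>
    <tr data-project-key="AAL">
      <td data-cell-type="name"><a id="view-project-10442">AAL project</a></td>
      <td data-cell-type="key">AAL</td>
    </tr>
  </tbody>
</table>
"""


def pid_for_key(markup, project_key):
    """Return the pid for a project key, or None if the row is not found."""
    root = ET.fromstring(markup)
    # Same path as the XPath query: row by key, then the name cell's link.
    link = root.find(
        f".//tr[@data-project-key='{project_key}']/td[@data-cell-type='name']/a"
    )
    if link is None:
        return None
    prefix = "view-project-"
    link_id = link.get("id", "")
    # Equivalent of substring-after(@id, 'view-project-').
    return link_id[len(prefix):] if link_id.startswith(prefix) else None


print(pid_for_key(SAMPLE, "AA"))
```

Because the row is selected by its exact data-project-key, "AA" does not accidentally match "AAL", which is the problem the bare pid=(\d....) regex had.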
References:
XPath Tutorial
XPath Language Reference
I am having trouble getting variable values to show up in an email template. The third-party email templating provider is Postmark, and it uses Mustache. My template is set up like this (I have omitted some of the irrelevant HTML to keep things shorter):
{{#discount_group.delivery_fee}}
<tr>
<td width="30%" class="purchase_footer" valign="middle">
<p class="purchase_total">{{delivery_fee}}</p>
</td>
</tr>
{{/discount_group.delivery_fee}}
{{#discount_group.discount}}
<tr>
<td width="30%" class="purchase_footer" valign="middle">
<p class="purchase_total">{{discount}}</p>
</td>
</tr>
<tr>
<td width="30%" class="purchase_footer" valign="middle">
<p class="purchase_total_bold">{{grandtotal}}</p>
</td>
</tr>
{{/discount_group.discount}}
And my json payload looks like this:
"discount_group": {
"delivery_fee":"delivery_fee_Value",
"discount": "discount_Value",
"grandtotal": "grandtotal_Value"
}
But when I send out the email, the sections render properly while the variable values inside them are blank.
If I remove "delivery_fee" from the JSON payload, that section is not rendered, as expected, but the values in the other section are still missing.
I have also tried {{discount_group.delivery_fee}}, {{discount_group.discount}}, etc., but the values were still missing.
What am I doing wrong?
Thanks in advance
So I figured it out. I'm not sure why it has to be this way but it does. My problem was in the payload. The payload should be formatted like this:
"discount_group": {
"delivery_fee":{
"delivery_fee":"delivery_fee_Value"
},
"discount": {
"discount":"discount_Value",
"grandtotal": "grandtotal_Value"
}
}
When you wrap a block of code in a Mustache section, what you're doing is stepping into that object in your data in an effort to make your code more readable. Postmark's documentation calls it "Scoping".
Therefore, by starting a block with, for example, {{#discount_group.delivery_fee}}, you are already at delivery_fee, and referencing delivery_fee again inside it returns nothing because no such key exists at that level.
With how your data was originally structured, everything you needed was nested directly in discount_group, so you didn't need to nest any further in your brackets. I know you have found a fix, but in the future, instead of changing your data to match your code, you could consider updating your code as follows:
{{#discount_group}}
<tr>
<td width="30%" class="purchase_footer" valign="middle">
<p class="purchase_total">{{delivery_fee}}</p>
</td>
</tr>
<tr>
<td width="30%" class="purchase_footer" valign="middle">
<p class="purchase_total">{{discount}}</p>
</td>
</tr>
<tr>
<td width="30%" class="purchase_footer" valign="middle">
<p class="purchase_total_bold">{{grandtotal}}</p>
</td>
</tr>
{{/discount_group}}
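To make the scoping rule concrete, here is a small Python sketch of how a Mustache-style renderer resolves a variable inside a section. It is a simplified model of the context stack, not Postmark's actual implementation, and the helper names (`resolve_path`, `lookup`, `render_in_section`) are made up for this illustration:

```python
def resolve_path(data, dotted_name):
    """Follow a dotted name such as 'discount_group.discount' through nested dicts."""
    current = data
    for part in dotted_name.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def lookup(context_stack, name):
    """Search the context stack from the innermost scope outwards."""
    for frame in reversed(context_stack):
        if isinstance(frame, dict) and name in frame:
            return frame[name]
    return ""  # a missing variable renders as an empty string


def render_in_section(data, section, variable):
    """Return what {{variable}} yields inside {{#section}}...{{/section}}."""
    scoped = resolve_path(data, section)
    if not scoped:
        return None  # falsy or missing value: the section is skipped
    # Entering the section pushes its value on top of the root context.
    return lookup([data, scoped], variable)


original_payload = {
    "discount_group": {
        "delivery_fee": "delivery_fee_Value",
        "discount": "discount_Value",
        "grandtotal": "grandtotal_Value",
    }
}

# Original template: the scope is the string itself, so nothing is found.
print(repr(render_in_section(original_payload, "discount_group.delivery_fee", "delivery_fee")))
# Suggested template: the scope is the whole group, so the key resolves.
print(repr(render_in_section(original_payload, "discount_group", "delivery_fee")))
```

In this model the original template enters the section (the string is truthy) but finds no delivery_fee key in the string or at the root, which matches the blank values described in the question.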
Hi all, I would like to extract the value 25.8 from this HTML block using XPath.
The HTML is from a weather website, https://app.weathercloud.net/
"<div id=""gauge-rainrate""><h3>Intensidad de lluvia</h3><canvas id=""rainrate"" width=""200"" height=""200""></canvas><div class=""summary"">
<table>
<tbody><tr>
<th> mm/h</th>
<th class=""max""><i class=""icon-chevron-up icon-white""></i> Máx </th>
</tr>
<tr>
<td class=""grey"">Diaria</td>
<td><a id=""gauge-rainrate-max-day"" rel=""tooltip"" title="""" data-original-title=""22/04/2022 00:00"">0.0</a></td>
</tr>
<tr>
<td class=""grey"">Mensual</td>
<td><a id=""gauge-rainrate-max-month"" rel=""tooltip"" title=""21/04/2022 02:15"">25.8</a></td>
</tr>
<tr>
<td class=""grey"">Anual</td>
<td><a id=""gauge-rainrate-max-year"" rel=""tooltip"" title=""21/04/2022 02:15"">25.8</a></td>
</tr>
</tbody></table>
</div></div>"
I use this expression to extract it in a Google Sheets cell:
=IMPORTXML("https://app.weathercloud.net/d5044837546#current";"//a[@id='gauge-rainrate-max-month']")
The expression looks correct, but my output is always
-
I don't understand why.
I'm trying to display a button for each even row in my table (Thymeleaf), but I'm not sure how to do it.
<tr th:each="item, iter : ${flightlist}">
<td th:text="${item.flightNumber}"></td>
<td th:text="${item.airline}"></td>
<td th:text="${item.origin}"></td>
<td th:text="${item.destination}"></td>
<td th:text="${#dates.format(item.takeOffDate, 'MM-dd-yyyy')}"></td>
<td th:text="${item.takeOffTime}"></td>
<td th:text="${#dates.format(item.landingDate, 'MM-dd-yyyy')}"></td>
<td th:text="${item.landingTime}"></td>
<td th:text="${item.flightDuration}"></td>
<td th:text="${item.price}"></td>
// Display button code here if true for even
<td th:if="${iter.even == true ? '<a type="button" class="btn btn-primary"
href="/addUserFlight?id=${item.flightNumber}">Select</a> ' : ''}"></td>
</tr>
Edit: Iter worked!
You can use the Thymeleaf iterStat convention to get the value of even:
<tr th:each="item, iterStat : ${flightlist}">
<td th:text="${item.flightNumber}">[flight number]</td>
<td th:text="${item.airline}">[airline]</td>
<td th:text="${item.origin}">[origin]</td>
<td th:text="${item.destination}">[destination]</td>
<td th:text="${#dates.format(item.takeOffDate, 'MM-dd-yyyy')}">[takeOffDate]</td>
<td th:text="${item.takeOffTime}">[takeOffTime]</td>
<td th:text="${#dates.format(item.landingDate, 'MM-dd-yyyy')}">[landingDate]</td>
<td th:text="${item.landingTime}">[landingTime]</td>
<td th:text="${item.flightDuration}">[flight duration]</td>
<td th:text="${item.price}">[price]</td>
<td><a th:if="${iterStat.even}"
type="button"
class="btn btn-primary"
th:href="@{/addUserFlight(id=${item.flightNumber})}">Select</a></td>
</tr>
Note that the th:if belongs on the <a> tag rather than on the <td>: if it were on the <td>, the cell would be left out of the rows where the condition is false, which would break the layout of your table.
You'll want to review the syntax for anchor tags too. You need the th: prefix on href so that Thymeleaf evaluates the flight number dynamically.
Take a look at #numbers.formatCurrency() to format your price.
You can include th:if="${!#lists.isEmpty(flightlist)}" on the <table> so that the table is not shown when there are no flights.
Lastly, include some default values between the tags so that if you open the page in a browser without a container, you can still see what the design will look like. I usually just include the name of the property.
I'm using Scrapy (Python) to try to grep data from a site.
How can I grep this structure with XPath?
<div class="foo">
<h3>Need this text_1</h3>
<table class="thesamename">
<tbody>
<tr>
<td class="tmp_year">
45767
</td>
<td class="tmp_outcome">
<b>Win_1</b><br>
<span class="tmp_category">TEST_1</span>
</td>
</tr>
<tr>
<td class="tmp_year">
1232004
</td>
<td class="tmp_outcome">
<b>Win_2</b><br>
<span class="tmp_category">TEST_2</span>
</td>
</tr>
<tr>
<td class="tmp_year">
122004
</td>
<td class="tmp_outcome">
<b>Win_3</b><br>
<span class="tmp_category">TEST_3</span>
</td>
</tr>
</tbody>
</table>
<h3>Need this text_2</h3>
<table class="thesamename">
<tbody>
<tr>
<td class="tmp_year">
234
</td>
<td class="tmp_outcome">
<b>Win_E</b><br>
<span class="tmp_category">TEST_E</span>
</td>
</tr>
<tr>
<td class="tmp_year">
3476
</td>
<td class="tmp_outcome">
<b>Win_C</b><br>
<span class="tmp_category">TEST_C</span>
</td>
</tr>
</tbody>
</table>
<h3>Need this text_3</h3>
<table class="thesamename">
<tbody>
<tr>
<td class="tmp_year">
85567
</td>
<td class="tmp_outcome">
<b>Win_T</b><br>
<span class="tmp_category">TEST_T</span>
</td>
</tr>
<tr>
<td class="tmp_year">
435656
</td>
<td class="tmp_outcome">
<b>Win_A</b><br>
<span class="tmp_category">TEST_A</span>
</td>
</tr>
<tr>
<td class="tmp_year">
980
</td>
<td class="tmp_outcome">
<b>Win_Z</b><br>
<span class="tmp_category">TEST_Z</span>
</td>
</tr>
</tbody>
</table>
</div>
I would like to have output with this structure:
"Section": {
Need this text_1 :
[45767 : Win_1 : TEST_1]
[1232004 : Win_2 : TEST_2]
[122004: Win_3 : TEST_3]
,
Need this text_2:
[234 : Win_E : TEST_E]
[3476 : Win_C : TEST_C]
,
Need this text_3:
[85567 : Win_T : TEST_T]
[435656 : Win_A : TEST_A]
[980: Win_Z : TEST_Z]
}
How can I write an XPath selector that produces this structure?
I can select all the "h3" elements, all the "a" elements, and all the tags by class separately, but how can I match them up?
Grep, you say? Strictly speaking that's the wrong word for it, but to keep the jargon clean: what you're doing is parsing/extracting. New to Scrapy, or to the web dev side of things? No matter. There's no way I could teach you in one answer how to XPath/regex like a pro; the only way is to keep at it, but I'll throw in my input.
First of all, XPath is amazingly useful when it comes to websites that aren't necessarily built to standard, which doesn't make them bad per se. But the HTML snippet you gave is structured all right, so I'd recommend CSS selectors. These are the values:
# in the snippet the year text sits directly in the td, padded with whitespace
year = response.css('td.tmp_year::text').extract()
outcome = response.css('td.tmp_outcome b::text').extract()
category = response.css('span.tmp_category::text').extract()
Pro tip: whenever you find it necessary, you can save a web page as an HTML file and open it in scrapy shell by referencing the file path directly. So I saved your HTML snippet to a file on my desktop and then ran:
scrapy shell file:///home/scriptso/Desktop/letsGREPlol.html
Anyway, as far as XPath goes, since you asked: piece of cake. Let's compare the XPath with the CSS and see whether you can spot the correspondence.
response.css('td.tmp_outcome b::text').extract()
So it is a td tag, and the class name is tmp_outcome; the next node is a bold tag, which is where the text is, and ::text declares that we want the text.
response.xpath('//td[@class="tmp_outcome"]/b/text()').extract()
So the XPath is basically saying: we start with a pattern across the entire document of the td tag with class="tmp_outcome", then the bold tag; in XPath, /text() selects the text, and /@href is for... yeah, you guessed it.
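The flat lists above do not yet tell you which rows belong to which heading. One way to match them up is to walk the children of the div in document order and attach each table to the h3 that precedes it. Below is a minimal sketch of that idea using only Python's standard library (ElementTree) on a trimmed, well-formed copy of the snippet; `SNIPPET` and `group_by_heading` are names made up for this example. Inside a spider the same idea could probably be expressed by looping over `response.xpath('//div[@class="foo"]/h3')` and calling `h3.xpath('following-sibling::table[1]//tr')` on each heading, but that part is an untested assumption about your page.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the page (assumption: <br> closed as <br/>).
SNIPPET = """
<div class="foo">
  <h3>Need this text_1</h3>
  <table class="thesamename"><tbody>
    <tr><td class="tmp_year"> 45767 </td>
        <td class="tmp_outcome"><b>Win_1</b><br/><span class="tmp_category">TEST_1</span></td></tr>
    <tr><td class="tmp_year"> 1232004 </td>
        <td class="tmp_outcome"><b>Win_2</b><br/><span class="tmp_category">TEST_2</span></td></tr>
  </tbody></table>
  <h3>Need this text_2</h3>
  <table class="thesamename"><tbody>
    <tr><td class="tmp_year"> 234 </td>
        <td class="tmp_outcome"><b>Win_E</b><br/><span class="tmp_category">TEST_E</span></td></tr>
  </tbody></table>
</div>
"""


def group_by_heading(markup):
    """Map each h3 text to the (year, outcome, category) rows of the table after it."""
    root = ET.fromstring(markup)
    sections = {}
    current = None
    for child in root:  # direct children of the div, in document order
        if child.tag == "h3":
            current = (child.text or "").strip()
            sections[current] = []
        elif child.tag == "table" and current is not None:
            for row in child.iter("tr"):
                year = row.findtext("td[@class='tmp_year']", default="").strip()
                outcome = row.findtext("td[@class='tmp_outcome']/b", default="").strip()
                category = row.findtext(
                    "td[@class='tmp_outcome']/span[@class='tmp_category']", default=""
                ).strip()
                sections[current].append((year, outcome, category))
    return sections


for heading, rows in group_by_heading(SNIPPET).items():
    print(heading, rows)
```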
How could I use Ruby to extract information from a table consisting of rows like these? Is it possible to detect the comments using Nokogiri?
<!-- Begin Topic Entry 4134 -->
<tr>
<td align="center" class="row2"><image src='style_images/ip.boardpr/f_norm.gif' border='0' alt='New Posts' /></td>
<td align="center" width="3%" class="row1"> </td>
<td class="row2">
<table class='ipbtable' cellspacing="0">
<tr>
<td valign="middle"><alink href='http://www.xxx.com/index.php?showtopic=4134&view=getnewpost'><image src='style_images/ip.boardpr/newpost.gif' border='0' alt='Goto last unread' title='Goto last unread' hspace=2></a></td>
<td width="100%">
<div style='float:right'></div>
<div> <alink href="http://www.xxx.com/index.php?showtopic=4134&hl=">EXTRACT LINK 1</a> </div>
</td>
</tr>
</table>
<span class="desc">EXTRACT DESCRIPTION</span>
</td>
<td class="row2" width="15%"><span class="forumdesc"><alink href="http://www.xxx.com/index.php?showforum=19" title="Living">EXTRACT LINK 2</a></span></td>
<td align="center" class="row1" width='10%'><alink href='http://www.xxx.com/index.php?showuser=1642'>Mr P</a></td>
<td align="center" class="row2"><alink href="javascript:who_posted(4134);">1</a></td>
<td align="center" class="row1">46</td>
<td class="row1"><span class="desc">Today, 12:04 AM<br /><alink href="http://www.xxx.com/index.php?showtopic=4134&view=getlastpost">Last post by:</a> <b><alink href='http://www.xxx.com/index.php?showuser=1649'>underft</a></b></span></td>
</tr>
<!-- End Topic Entry 4134 -->
Try using XPath instead:
html_doc = Nokogiri::HTML("<html><body><!-- Begin Topic Entry 4134 --></body></html>")
html_doc.xpath('//comment()')
You could implement a Nokogiri SAX parser. This is quicker to do than it might seem at first sight. You get events for elements, attributes, and comments.
Within your parser, you should remember the state, e.g. @currently_interested = true, to know which parts to remember and which to skip.
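To illustrate the state-flag idea, here is a minimal sketch in Python rather than Ruby, using the standard library's event-based `html.parser.HTMLParser` as a stand-in for a Nokogiri SAX document handler. The class name, the `SAMPLE` markup, and the shape of the collected entries are assumptions made for this example, not Nokogiri API:

```python
from html.parser import HTMLParser


class TopicEntryParser(HTMLParser):
    """Collect link texts that appear between Begin/End Topic Entry comments."""

    def __init__(self):
        super().__init__()
        self.currently_interested = False  # the state flag described above
        self.in_link = False
        self.entries = []

    def handle_comment(self, data):
        text = data.strip()
        if text.startswith("Begin Topic Entry"):
            self.currently_interested = True
            self.entries.append({"id": text.split()[-1], "links": []})
        elif text.startswith("End Topic Entry"):
            self.currently_interested = False

    def handle_starttag(self, tag, attrs):
        if self.currently_interested and tag == "a":
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Only keep text that is inside a link inside a topic entry.
        if self.currently_interested and self.in_link and data.strip():
            self.entries[-1]["links"].append(data.strip())


# Reduced stand-in for the forum table (assumption: plain <a href> links).
SAMPLE = """
<a href="/outside">ignored</a>
<!-- Begin Topic Entry 4134 -->
<tr><td><a href="/index.php?showtopic=4134">EXTRACT LINK 1</a></td>
<td><a href="/index.php?showforum=19">EXTRACT LINK 2</a></td></tr>
<!-- End Topic Entry 4134 -->
<a href="/outside">ignored too</a>
"""

parser = TopicEntryParser()
parser.feed(SAMPLE)
parser.close()
print(parser.entries)
```

The comment events switch the flag on and off, and every other handler consults it before recording anything, which is the same shape a Nokogiri SAX handler with @currently_interested would take.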