Scrapy, Xpath, extracting h3 content?

I need to extract everything after the <h3> heading AIRFRAME but before the <h3> heading ENGINES.
What I need extracted:
"Entry Into Service: December 2010
Total Time Since New: 3,580 Hours" etc.
HTML code photo - not sure how to embed it directly instead of having a link
Below is what I've tried, but it doesn't return anything. I'm new to Scrapy and programming in general, so I would appreciate some help. I've tried searching through other posts and Google in general without any luck.
input = response.xpath("//div[@class='large-6 cell selectorgadget_rejected']/h3/text()").extract()
output = []

Your code references a different class, one that doesn't contain the text you mentioned:
input = response.xpath("//div[@class='large-6 cell selectorgadget_rejected']/h3/text()").extract()
The class name in the picture is large-6 cell selectorgadget_selected, not large-6 cell selectorgadget_rejected.
Also, .../h3/text() scrapes only the text inside the <h3> tag itself.
As I understand it, you want the text after the <h3>, directly inside the <div>. So try something like this:
input = response.xpath("//div[@class='large-6 cell selectorgadget_selected']/text()").extract()
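To illustrate the difference between the heading text and the text nodes that follow it, here is a minimal sketch that does not use Scrapy at all. It relies only on Python's standard html.parser module and groups the text under each <h3>. The SAMPLE markup and the ENGINES line are assumptions modelled on the question, not the real page.

```python
from html.parser import HTMLParser

# Hypothetical markup modelled on the question; the real page may differ.
SAMPLE = """
<div class="large-6 cell">
  <h3>AIRFRAME</h3>
  Entry Into Service: December 2010<br>
  Total Time Since New: 3,580 Hours<br>
  <h3>ENGINES</h3>
  Engine Program: hypothetical value
</div>
"""


class SectionText(HTMLParser):
    """Collect the text nodes that follow each <h3>, keyed by heading text."""

    def __init__(self):
        super().__init__()
        self.sections = {}    # heading text -> list of text lines below it
        self.in_h3 = False    # True while the parser is inside an <h3>
        self.current = None   # heading whose section is being filled

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return            # skip whitespace-only text nodes
        if self.in_h3:
            # Text inside the <h3>: start a new section.
            self.current = text
            self.sections[text] = []
        elif self.current is not None:
            # Text after an <h3>: it belongs to the current section.
            self.sections[self.current].append(text)


parser = SectionText()
parser.feed(SAMPLE)
parser.close()
airframe = parser.sections["AIRFRAME"]
print(airframe)
```

Only the lines between the AIRFRAME and ENGINES headings end up in airframe, which is the slice the question asks for.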

To complete @renatodvc's answer, you could add a normalize-space() predicate to ignore whitespace-only text nodes:
//div[@class='large-6 cell selectorgadget_selected']/text()[normalize-space()]
Or apply the function directly to the element:
normalize-space(//div[@class='large-6 cell selectorgadget_selected'])
Output:
AIRFRAME " Entry Into Service: December 2010" " Total Time Since New: 3,580 Hours" " Total Landings Since New: 1,173" " (as of September 2019)" " Program Coverage: Enrolled on Smart Parts Plus" " Maintenance Tracking: CAMP "
Then, to extract the values, you can use a regex:
import re
text = 'AIRFRAME " Entry Into Service: December 2010" " Total Time Since New: 3,580 Hours" " Total Landings Since New: 1,173" " (as of September 2019)" " Program Coverage: Enrolled on Smart Parts Plus" " Maintenance Tracking: CAMP "'
# Capture everything between a colon and the next double quote, then trim it.
data = [el.strip() for el in re.findall(r':(.+?)"', text)]
print(data)
Output:
['December 2010', '3,580 Hours', '1,173', 'Enrolled on Smart Parts Plus', 'CAMP']
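If the labels are needed along with the values, a variation on the same idea can build a dictionary. This is only a sketch: the pattern and the name pairs are my own, and it assumes every field is wrapped in double quotes as in the normalize-space output quoted in this answer.

```python
import re

text = (
    'AIRFRAME " Entry Into Service: December 2010" '
    '" Total Time Since New: 3,580 Hours" '
    '" Total Landings Since New: 1,173" '
    '" (as of September 2019)" '
    '" Program Coverage: Enrolled on Smart Parts Plus" '
    '" Maintenance Tracking: CAMP "'
)

# Each quoted field of the form "label: value" yields one (label, value) pair.
# Quoted fields without a colon, such as "(as of September 2019)", are skipped.
pairs = {
    label.strip(): value.strip()
    for label, value in re.findall(r'"\s*([^":]+):([^"]*)"', text)
}
print(pairs["Total Time Since New"])
```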

Related

format lookupset expression

In Report Builder, I have an expression using the lookupset function that pulls back either nothing, a date and description, or several dates and several descriptions. The data it is pulling is correct. I have searched this forum and MSDN. Using what I've found in both places, I have tweaked my expression to the following.
My expression:
=Join(Lookupset(Fields!ProjectName.Value,
Fields!ProjectNames.Value,
Fields!TaskBaseline0FinishDate.Value & " - " & Fields!TaskName.Value,
"DsActivitiesCompleted"))
However, when this is displayed it doesn't have a carriage return, it just puts one after another after another. Example Below:
08/05/2015 – Milestone: Kickoff meeting Complete 08/18/2015 – Milestone: PMT Test Planning Complete 08/26/2015 – Milestone: Set CCD Date 08/26/2015 – Sprint 0 Complete 09/18/2015 – Milestone: Wave 1 Complete 09/28/2015 - Milestone: Wave 2 Complete
What I want it to look like is below. If possible I would like to have bullet points in front of each line as well.
My question is how do I get it in the format above?
Thanks,
MM

You have missed the final (optional) argument of Join, which specifies the delimiter used to join the strings together. Change your expression to use vbCrLf (the VB new-line constant) as follows:
=Join(Lookupset(Fields!ProjectName.Value,
Fields!ProjectNames.Value,
Fields!TaskBaseline0FinishDate.Value & " - " & Fields!TaskName.Value,
"DsActivitiesCompleted"),
vbCrLf)
This puts each entry on its own line.
Update
To put a bullet in front of each line as well, use Chr(183) as the bullet character:
=" " + Chr(183) + " " +
Join(Lookupset(Fields!ProjectName.Value,
Fields!ProjectNames.Value,
Fields!TaskBaseline0FinishDate.Value & " - " & Fields!TaskName.Value,
"DsActivitiesCompleted"),
vbCrLf + " " + Chr(183) + " ")
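The reason the bullet appears twice in that expression (once in front, once inside the delimiter) is that a join only places the delimiter between items, so the first line needs its own prefix. A rough Python sketch of the same logic, with a made-up entries list standing in for the LookupSet result:

```python
# Stand-in for the array returned by LookupSet (sample values from the question).
entries = [
    "08/05/2015 - Milestone: Kickoff meeting Complete",
    "08/18/2015 - Milestone: PMT Test Planning Complete",
    "08/26/2015 - Milestone: Set CCD Date",
]

bullet = " " + chr(183) + " "   # chr(183) is the middle dot, like Chr(183) in VB
separator = "\r\n" + bullet     # "\r\n" plays the role of vbCrLf

# The separator only goes between items, so the first bullet is added by hand.
report_text = bullet + separator.join(entries)
print(report_text)
```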

Two different formats of field in report Access 2013

I'm trying to find a way to do the following:
I want to have two different formats for a certain text box. To do so, I've done the following: the user types one or two digits in a form text box (whose input mask and format are both "#,0;0;_"), and a yes/no box to the right of that number field asks whether the value is "kg per bag" (by default it is the other measurement unit, percentages). When the report for that form is viewed, an OnLoad event fires and checks whether the yes/no value is yes or no. If "yes", the format is set to "#.0 & " kg/bag""; if "no", it's set to "#.0 & " %"".
I will have to additionally divide by 100 when percentages are the ones picked, but first I want the whole thing to work... Which I still can't do!
Sadly, I'm nowhere near getting it to work... Here is my current macro on the onload event of the report, which is marked as not valid expression:
Link to the image on Imgur
Or here is the MacroBuilder Code:
<?xml version="1.0" encoding="UTF-16" standalone="no"?>
<UserInterfaceMacros xmlns="http://schemas.microsoft.com/office/accessservices/2009/11/application"><UserInterfaceMacro For="Report" Event="OnLoad"><Statements><ConditionalBlock><If><Condition>[yn]=False</Condition><Statements><Action Name="SetValue"><Argument Name="Item">[Text0].[Format]</Argument><Argument Name="Expression">#,0 & " kg/bag"</Argument></Action></Statements></If><Else><Statements><Action Name="SetValue"><Argument Name="Item">[Text0].[Format]</Argument><Argument Name="Expression">#,0 & " %"</Argument></Action></Statements></Else></ConditionalBlock></Statements></UserInterfaceMacro></UserInterfaceMacros>
Which is displayed as:
If [yn]=False Then
SetValue
Item = [text0].[format]
Expression = #,0 & " kg/bag"
Else
SetValue
Item = [text0].[format]
Expression = #,0 & " %"
End if
Can anyone give me a hint on where to go with this? Thank you!!
P.S. Comma is my decimal separator in regional settings!

You don't really need to change the format; just concatenate the numeric value with the unit (kg/bag or %).
Using VBA, try the following code in the OnLoad event (I am assuming the record source field behind the text0 control has the same name, text0):
If Forms!yourformname![yn] = False Then
Reports!yourreportname!text0 = Me.text0 & " kg/bag"
Else
Reports!yourreportname!text0 = (Me.text0)/100 & "%"
' ALTERNATIVELY: Reports!yourreportname!text0.Format = "percent"
End If
Alternatively in the OnLoad event, use an embedded macro or call an external macro with the following one action (if/then changed into the IIF function):
SetValue
Item: text0
Expression: =IIF(Forms!yourformname![yn] = False, text0 & " kg/bag", text0/100 & "%")

How to multiply the font size in html? ActionScript 3 implementation

Please compare the input and output examples below. The difference is in the size=[value] attributes. How can I replace them?
input:
"<font size='30'> Head </font><br></br> <font color='#b5fe01' size='50'>Progress:</font>"
I want to multiply all font sizes by 2 and replace them in the original input.
output:
"<font size='60'> Head </font><br></br> <font color='#b5fe01' size='100'>Progress:</font>"
Thanks

AS3 regexp as requested:
var multiply:Function = function(matched:String, start:String, size:String, index:int, str:String):String
{
    return start + (2 * int(size)).toString() + "'";
};
var match:RegExp = /(<font[^>]*size=')(\d+)'/gi;
var src:String = "<font size='30'> Head </font><br/> <font color='#b5fe01' size='50'>Progress:</font>";
var replaced:String = src.replace(match, multiply);
Explanation:
multiply - Takes "start" and "size" params. "start" is the previously matched part of font tag. This is required as we need to know we are in font tag, yet we only want to replace the size value. "size" is the actual size value.
RegExp - Captures as first group "<font" followed by any number of non-'>' characters, followed by "size='". Second group is the value of size. match is finished with "'" after size value, which is not captured. g stands for "global" and makes multiple-times matching on single string, i makes matching case-insensitive.
This is not a foolproof solution but I think it follows the basic idea and is easy to extend for more universal usage.
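For comparison, the same callback-replacement idea can be sketched in Python with re.sub, which also accepts a function as the replacement. The names FONT_SIZE and double_size are mine; the pattern is the one from the AS3 code above.

```python
import re

# Group 1: "<font ... size='" ; group 2: the numeric size value.
FONT_SIZE = re.compile(r"(<font[^>]*size=')(\d+)'", re.IGNORECASE)


def double_size(match):
    """Rebuild the matched text with the size value multiplied by 2."""
    return f"{match.group(1)}{2 * int(match.group(2))}'"


src = "<font size='30'> Head </font><br/> <font color='#b5fe01' size='50'>Progress:</font>"
replaced = FONT_SIZE.sub(double_size, src)
print(replaced)
```

Like the AS3 version, this only handles single-quoted size attributes and is not a substitute for a real HTML parser.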

Ideally we should do this with an HTML/XML processor, but using just pure Perl:
#!/usr/bin/perl -i
while(<>){
s/(<font\s)(.*?)(>)/$1 . repsize($2) . $3 /ge;
print
}
sub repsize{my $atribs=shift;
return $atribs =~ s/(size=.)(\d+)/ $1 . $2*2/er;
}

How to read/get tags startTime/endTime, startDate and boolean value of isAllDayEvent for CalendarEvent?

I'm searching an event in "CalendarID" by "EventID" like this:
var event = CalendarApp.getCalendarById(CalendarID).getEventSeriesById(EventID);
This "event" is a single CalendarEvent. Question: how can I get the following information for this CalendarEvent: isAllDayEvent(), startTime, endTime and startDate? Is there a way to get these tags?

If you have a CalendarEvent, do this:
Logger.log("Is all day? " + event.isAllDayEvent());
Logger.log("Start at: " + event.getStartTime());
Logger.log("End at: " + event.getEndTime());
if (event.isAllDayEvent()) {
  Logger.log("Start Date: " + event.getAllDayStartDate());
}
A CalendarEventSeries, however, does not make it straightforward to get a single event's details this way. You could instead add tags with the relevant info and call calendarEventSeries.getTag(key) to retrieve it. To get all the tag keys associated with the calendarEventSeries, use calendarEventSeries.getAllTagKeys(), which returns a string array of keys.
Currently there is no way to get the recurrence of a calendarEventSeries. There is a feature request to Google for adding this function; you could reply to it and upvote it if you consider it an important feature: https://code.google.com/p/google-apps-script-issues/issues/detail?id=4064

How to remove more than one whitespace character from HTML?

I want to remove extra whitespace which is coming from the user end, but I can't predict the format of the HTML.
For example:
<p> It's interesting that you would try cfsetting, since nothing in it's
documentation would indicate that it would do what you are asking.
Unless of course you were mis-reading what "enableCFoutputOnly" is
supposed to do.
</p>
<p>
It's interesting that you would try cfsetting, since nothing in it's
documentation would indicate that it would do what you are asking.
Unless of course you were mis-reading what "enableCFoutputOnly" is
supposed to do.</p>
Please guide me on how to remove more than one whitespace character from HTML.

You could use a regex to replace any run of multiple whitespace characters with a single space, looping over the result until no more multiple-whitespace occurrences exist:
lastTry = "<p> lots of space </p>";
nextTry = rereplace(lastTry,"\s\s", " ", "all");
while(nextTry != lastTry) {
lastTry = nextTry;
nextTry = REReplace(lastTry,"\s\s", " ", "all");
}
Tested working in CF10.

If you don't want to do it in code, out of sheer laziness, use http://jsbeautifier.org/
If you do want to do it in code, a regex would be another option.

This should do it:
<cfscript>
string function stripCRLFAndMultipleSpaces(required string theString) {
local.result = trim(rereplace(trim(arguments.theString), "([#Chr(09)#-#Chr(30)#])", " ", "all"));
local.result = trim(rereplace(local.result, "\s{2,}", " ", "all"));
return local.result;
}
</cfscript>
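For readers outside ColdFusion, the same collapse can be sketched in Python. Matching a whole run with \s+ handles any length of whitespace in a single pass, so no loop is needed. The function name collapse_whitespace is mine.

```python
import re


def collapse_whitespace(html):
    """Replace every run of whitespace (spaces, tabs, newlines) with one space."""
    return re.sub(r"\s+", " ", html).strip()


cleaned = collapse_whitespace("<p>  lots   of\n\t space  </p>")
print(cleaned)
```

Note that this also flattens whitespace inside elements such as <pre>, where it is significant, so it is only suitable when that does not matter.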
