I am trying to retrieve the image source from this HTML data:
<div class="image">
<a href="http://www.website.com/en/105/News/10217/">
<img src="/images/cache/105x110/crop/images%7Ccms-image-000005554.gif"
width="105" height="110" alt="kollsge (photo: author)" />
</a>
</div>
This is my code:
HTMLNode *bodyNode = [parser body];
NSArray *imageNodes = [bodyNode findChildTags:@"div"];
for (HTMLNode *imageNode in imageNodes) {
    if ([[imageNode getAttributeNamed:@"class"] isEqualToString:@"image"]) {
        NSLog(@"%@", [imageNode getAttributeNamed:@"img src"]);
    }
}
Help would be much appreciated.
I solved it with this code:
for (HTMLNode *imageNode in imageNodes) {
    if ([[imageNode getAttributeNamed:@"class"] isEqualToString:@"image"]) {
        HTMLNode *aNode = [imageNode firstChild];
        HTMLNode *imgNode = [aNode nextSibling];
        HTMLNode *imNode = [imgNode firstChild];
        NSLog(@"%@", [imNode getAttributeNamed:@"src"]);
    }
}
You are not going through the tree correctly. You are attempting to find an attribute named img src on your div. That would look like this:
<div class="image" img src="whatever">
For one thing, that's not valid HTML, but the more important issue is that you want to be looking at the children. The thing you are looking for is nested inside the div, not an attribute. Since your div only has one child, a quick look at the project you provided in the comments leads me to believe that the following will work:
HTMLNode *bodyNode = [parser body];
NSArray *imageNodes = [bodyNode findChildTags:@"div"];
for (HTMLNode *imageNode in imageNodes) {
    if ([[imageNode getAttributeNamed:@"class"] isEqualToString:@"image"]) {
        HTMLNode *aNode = [imageNode firstChild];
        HTMLNode *imgNode = [aNode nextSibling];
        NSLog(@"%@", [imgNode getAttributeNamed:@"src"]);
    }
}
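To make the "attribute versus nested child" distinction concrete, here is a minimal, language-neutral sketch in Python using only the standard library's html.parser. It is not the HTMLNode API from the question; the class name ImageSrcFinder and its fields are made up for illustration. It collects the src of every img that appears anywhere inside a div whose class is "image", which is the lookup the question is after.

```python
from html.parser import HTMLParser


class ImageSrcFinder(HTMLParser):
    """Hypothetical helper: collects the src of each <img> nested in <div class="image">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0   # 0 means we are outside any matching div
        self.sources = []    # src values found so far

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Enter a matching div, or track a nested div inside one.
            if self.div_depth or attrs.get("class") == "image":
                self.div_depth += 1
        elif tag == "img" and self.div_depth:
            # src is an attribute of the nested <img>, not of the <div>.
            self.sources.append(attrs.get("src"))

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1


# The markup from the question.
html = """<div class="image">
<a href="http://www.website.com/en/105/News/10217/">
<img src="/images/cache/105x110/crop/images%7Ccms-image-000005554.gif"
width="105" height="110" alt="kollsge (photo: author)" />
</a>
</div>"""

finder = ImageSrcFinder()
finder.feed(html)
print(finder.sources)
```

Searching by tag name inside the div, rather than stepping through firstChild and nextSibling, also avoids depending on whether the parser reports whitespace between tags as separate text nodes.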
I want to learn web scraping, so I started practicing. I am trying to get data-ad-id from HTML using XPath.
The HTML structure looks like this:
<body id="z1234">
<div class="viewport">
<div class="g-row">
<div class="g-col-9">
<div class="cBox cBox--content cBox--resultList">
<div class="cBox-body cBox-body--resultitem dealerAd rbt-reg rbt-no-top"><a class="link--muted no--text--decoration result-item" href="url" data-ad-id="248059713"></a>
</div>
</div>
</div>
</div>
</body>
The XPath for <a class="link--muted no--text--decoration result-item"> is //*[@id="z1234"]/div[3]/div[4]/div[2]/div[1]/div[11]/a. If I choose a different car, only the last div index changes.
Based on this, I wrote the following C# code:
var url = "https://suchen.mobile.de/fahrzeuge/search.html?damageUnrepaired=NO_DAMAGE_UNREPAIRED&isSearchRequest=true&maxPowerAsArray=KW&maxPrice=10000&minPowerAsArray=KW&minPrice=10000&scopeId=C";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader sr = new StreamReader(response.GetResponseStream());
string sourceCode = sr.ReadToEnd();
HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(sourceCode);
var rows = document.DocumentNode.SelectNodes("//*[@id='z1234']/div[3]/div[4]/div[2]/div[1]/div[11]");
foreach (var row in rows)
{
    var id = row.SelectSingleNode("a[@data-ad-id]").InnerText;
    Console.WriteLine("id:" + id);
}
I cannot get anything from this Node. It is null. How can I get data-ad-id?
EDIT
I changed my C# code:
var rows = document.DocumentNode.SelectNodes("//a[@data-ad-id]")[0];
var id = rows.Attributes["data-ad-id"].Value;
Now I can get data-ad-id.
Looking at the site's markup, that a tag has no inner text; it contains only div and img tags, so InnerText will not give you the id.
You need to select the data-ad-id attribute itself:
//a[@data-ad-id]/@data-ad-id
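The difference between an element's inner text and its attribute value can be checked with a small sketch. This one uses Python's standard xml.etree.ElementTree, not HtmlAgilityPack, and it assumes a trimmed, well-formed stand-in for the page markup (the second anchor without data-ad-id is made up for contrast). ElementTree supports only a subset of XPath and cannot return attribute nodes directly, so the sketch selects the elements with the [@data-ad-id] predicate and then reads the attribute from each one.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the page markup (an assumption, not the real page).
markup = """
<body id="z1234">
  <div class="cBox-body">
    <a class="result-item" href="url" data-ad-id="248059713"></a>
  </div>
  <div class="cBox-body">
    <a class="result-item" href="url"></a>
  </div>
</body>
"""

root = ET.fromstring(markup)

# Select every <a> that carries a data-ad-id attribute, at any depth.
anchors = root.findall(".//a[@data-ad-id]")

# The id lives in the attribute; the element itself has no text content.
ad_ids = [a.get("data-ad-id") for a in anchors]
inner_text = anchors[0].text

print(ad_ids)
print(inner_text)
```

Selecting by the attribute instead of by a positional path such as div[3]/div[4] also keeps working when the number of wrapper divs changes.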
I have an HTML page. In my Objective-C code I need to find a piece of text (in the example below, "Color Change" inside a <p> tag) and, once I have found it, change the color value of that <p> tag. How can this be achieved?
<!DOCTYPE html>
<html>
<body>
<h1 style="color:blue;">This is a heading</h1>
<p style="color:red;">Color Change</p>
</body>
</html>
If you are using a local HTML file, this code might be helpful:
NSString *htmlFile = [[NSBundle mainBundle] pathForResource:@"sample" ofType:@"html"];
NSString* text = [NSString stringWithContentsOfFile:htmlFile encoding:NSUTF8StringEncoding error:nil];
NSLog(@"%@",htmlFile);
NSLog(@"Hello");
[self._webview loadHTMLString:[NSString stringWithFormat:@"<html><body bgcolor=\"#000000\" text=\"#FFFFFF\" face=\"Bookman Old Style, Book Antiqua, Garamond\" size=\"5\">%@</body></html>", text] baseURL: nil];
In the code below, the color:#fff declaration sets the text color; #fff is white.
NSString *webStr =@"Your text use";
[self._webview loadHTMLString:[NSString stringWithFormat:@"<div id ='foo' align='left' style='line-height:18px; float:left;width:300px; font-size:13px;font-family:helvetica;background-color:transparent; color:#fff;'>%@</div>",webStr] baseURL:nil];
Let me know whether this is helpful for you.
In the end, setTextColor: is the answer, but there's an important detail missing from the earlier answers: to get this to work in iOS 8, I had to set the color after I set the text.
- (void)viewDidLoad
{
    [super viewDidLoad];
    // Set the text first, then the color; the reverse order did not work for me on iOS 8.
    _myTextView.text = @"hai hello hello...";
    _myTextView.textColor = [UIColor whiteColor];
}
I'm using WKWebView to load an HTML string. Some of these strings end with a few ugly image links, and I want to hide them.
This is the CSS I use to hide the images, but it does not work.
.article img[src* = "/smilies/"],
.article img[src* = ".feedburner.com/~ff/"],
.article img[src* = ".feedburner.com/~r/"],
.article img[src* = ".feedblitz.com/"]
{
display: none;
}
A sample HTML string with the feedburner src values I want to hide:
<div>
<img src="http://feeds.feedburner.com/~ff/Venturebeat?d=yIl2AUoC8zA" border="0"> <img src="http://feeds.feedburner.com/~ff/Venturebeat?d=qj6IDK7rITs" border="0"> <img src="http://feeds.feedburner.com/~ff/Venturebeat?i=H9eoOCii8XI:sanX3-jfWnw:V_sGLiPBpWU" border="0"> <img src="http://feeds.feedburner.com/~ff/Venturebeat?d=I9og5sOYxJI" border="0"> <img src="http://feeds.feedburner.com/~ff/Venturebeat?i=H9eoOCii8XI:sanX3-jfWnw:D7DqB2pKExk" border="0">
</div>
A quick and dirty way to achieve this is by using regular expressions. Mind you that this is really not ideal for long HTML files as it is not as efficient as a real HTML parser.
// The HTML you posted
NSString *HTML = @"<div>\n\t<img src=\"http://feeds.feedburner.com/~ff/Venturebeat?d=yIl2AUoC8zA\" border=\"0\"> <img src=\"http://feeds.feedburner.com/~ff/Venturebeat?d=qj6IDK7rITs\" border=\"0\"> <img src=\"http://feeds.feedburner.com/~ff/Venturebeat?i=H9eoOCii8XI:sanX3-jfWnw:V_sGLiPBpWU\" border=\"0\"> <img src=\"http://feeds.feedburner.com/~ff/Venturebeat?d=I9og5sOYxJI\" border=\"0\"> <img src=\"http://feeds.feedburner.com/~ff/Venturebeat?i=H9eoOCii8XI:sanX3-jfWnw:D7DqB2pKExk\" border=\"0\">\n</div>";
// A string containing source of the images that you want to delete
NSString *source = @"http://feeds.feedburner.com/~ff/";
// Builds a pattern that matches the tags of the images you want to delete
NSString *pattern = [NSString stringWithFormat:@"<img src=\"%@.+?>", source];
// The actual delete operation
NSString *cleanHTML = [HTML stringByReplacingOccurrencesOfString:pattern
                                                      withString:@""
                                                         options:NSRegularExpressionSearch
                                                           range:NSMakeRange(0, HTML.length)];
// Do what you want with the cleaned HTML (display it, ...)
NSLog(@"%@", cleanHTML);
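For comparison, the same strip-by-regex idea can be sketched in a few lines of Python with the standard re module. This is a variation, not a translation of the code above: it escapes the source prefix so that characters such as "." are matched literally, and it uses [^>]* instead of .+? so a match can never run past the end of the tag. The example.com image is a made-up entry added to show that unrelated images are kept.

```python
import re

# Shortened sample: one feedburner image plus a hypothetical image that should survive.
html = (
    '<div>\n'
    '<img src="http://feeds.feedburner.com/~ff/Venturebeat?d=yIl2AUoC8zA" border="0"> '
    '<img src="http://example.com/keep.png" border="0">\n'
    '</div>'
)

# Prefix of the image sources to delete.
source = "http://feeds.feedburner.com/~ff/"

# Match an <img> tag whose src starts with the prefix, up to its closing ">".
pattern = re.compile(r'<img\s+src="' + re.escape(source) + r'[^>]*>')

# Delete every matching tag.
clean_html = pattern.sub("", html)
print(clean_html)
```

Like the original, this only matches tags where src is the first attribute, so it remains a quick fix rather than a replacement for a real HTML parser.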
I want to display HTML text with <blockquote> tags in the UITextView.
example of HTML:
<div>
<blockquote class="uncited">
<div>
<cite>Nick:</cite>
<blockquote class="uncited">
<div>
<cite>Tom:</cite>censored<br>
Hello
</div>
</blockquote>
<p>World</p>
</div>
</blockquote>
<p>Text</p>
</div>
I use the following code to display HTML text in the cell:
UITextView *textView = (UITextView*)[cell viewWithTag:100];
NSAttributedString *attributedString = [[NSAttributedString alloc] initWithData:[thisComment.htmlText dataUsingEncoding:NSUnicodeStringEncoding] options:@{ NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType } documentAttributes:nil error:nil];
textView.attributedText = attributedString;
But in the UITextView it looks like plain text. I want it to look like this: sample
How can I do this? What kind of libraries should I use? I am thinking of HTML -> Markdown -> NSAttributedString -> UITextView.
Apple's HTML to AttributedText parser is excellent with CSS so you can actually write it like you'd do for a raw web client.
let parsedCommentHTML = html.replacingOccurrences(of: "<blockquote>\n", with: "<blockquote>\n<k style=\"color:#ccc; font-size: 2em; font-family: 'Copperplate'\">“</k>")
let blockQuoteCSS = "\nblockquote > p {color:#808080; display: inline;} \n blockquote { background: #f9f9f9;}"
let pCSS = "p {margin-bottom: 0px;}"
let cssStyle = "\(blockQuoteCSS)\n\(pCSS)\n"
return try NSAttributedString(data: ("<html><head><style>\(cssStyle)</style></head><span style=\"font-family: HelveticaNeue-Thin; font-size: 17\">\(CONTENT)</span></html>").data(using: String.Encoding.unicode)!, options: [NSDocumentTypeDocumentAttribute : NSHTMLTextDocumentType], documentAttributes: nil)
This produces a nice (in my opinion) looking quote:
I need your help.
This is the first time I have tried to parse HTML, and I am running into some problems. I followed Ray's tutorial on HTML parsing, which uses hpple. The file I want to parse is a lot more complicated than his. I want to extract some values from the HTML below:
<div id="status">
<div id="loading" style="display:none">Error:<br />Connection to demo board was lost.</div>
<div id="display">
<span style="float:right;font-size:9px;font-weight:normal;padding-top:8px;text-indent:0px">(click to toggle)</span>
<p>LEDs:<br /><span class="leds">
<!-- <a id="led7" onclick="newAJAXCommand('leds.cgi?led=7');">•</a>
<a id="led6" onclick="newAJAXCommand('leds.cgi?led=6');">•</a>
<a id="led5" onclick="newAJAXCommand('leds.cgi?led=5');">•</a>
<a id="led4" onclick="newAJAXCommand('leds.cgi?led=4');">•</a>
<a id="led3" onclick="newAJAXCommand('leds.cgi?led=3');">•</a>
<a id="led2" onclick="newAJAXCommand('leds.cgi?led=2');">•</a> -->
<a id="led1" onclick="newAJAXCommand('leds.cgi?led=1');">•</a>
<!-- <a id="led0">•</a> -->
</span></p>
<p>Buttons:<br />
<!-- <span id="btn3">?</span>
<span id="btn2">?</span>
<span id="btn1">?</span> -->
<span id="btn0">?</span></p>
<p>Potentiometer: <span id="pot0" style="font-weight:normal">?</span></p>
<p>Temperature: <span id="temp0" style="font-weight:normal">?</span></p>
</div>
So far I can get the labels LEDs, Buttons, Potentiometer and Temperature, but I cannot get the values of those four fields.
I am using the code below:
- (void)loadTutorials {
    // 1
    NSURL *tutorialsUrl = [NSURL URLWithString:@"http://192.168.0.112/"];
    NSData *tutorialsHtmlData = [NSData dataWithContentsOfURL:tutorialsUrl];
    // 2
    TFHpple *tutorialsParser = [TFHpple hppleWithHTMLData:tutorialsHtmlData];
    // 3
    NSString *tutorialsXpathQueryString = @"//div[@id='display']/p";
    NSArray *tutorialsNodes = [tutorialsParser searchWithXPathQuery:tutorialsXpathQueryString];
    // 4
    NSMutableArray *newTutorials = [[NSMutableArray alloc] initWithCapacity:0];
    for (TFHppleElement *element in tutorialsNodes) {
        // 5
        Tutorial *tutorial = [[Tutorial alloc] init];
        [newTutorials addObject:tutorial];
        NSLog(@"Object=%@",element);
        // 6
        tutorial.title = [[element firstChild] content];
        // 7
        tutorial.url = [element objectForKey:@"???"]; //???
    }
    // 8
    _objects = newTutorials;
    [self.tableView reloadData];
}
I suspect that the problem is the objectForKey: call, but I am not sure. I have tested every key I could think of. Any help would be most welcome.