iOS: parsing HTML returns a weird value " "

Hi, I'm working on an assignment and want to get some information from a website. I used TFHpple, following the Ray Wenderlich tutorial. Everything went fine until I tried to get the view count (this number: 8.024.835); my code returns this value instead: "
" When I NSLog the element's raw property, I see this markup:
<p>
Số lượt xem:
<span class="color-fuchsia" id="PageViews"/>    
Yêu thích:
<span class="color-hotpink" id="LikeCount"/>
</p>
But when I inspect the page's HTML with Firebug, it looks like this:
<p>
Số lượt xem:
<span id="PageViews" class="color-fuchsia">8.024.835</span>
     Yêu thích:
<span id="LikeCount" class="color-hotpink">1.565</span>
</p>
How can I get the correct value? Please help.
This is my code to parse the HTML and log it:
- (void)GetBookViewCount {
    NSURL *url = [NSURL URLWithString:@"http://blogtruyen.com/truyen/conan"];
    NSData *htmlData = [NSData dataWithContentsOfURL:url];
    TFHpple *parser = [TFHpple hppleWithHTMLData:htmlData];
    NSString *XpathQueryString = @"//div[@class='description']/p";
    NSArray *Nodes = [parser searchWithXPathQuery:XpathQueryString];
    for (TFHppleElement *element in Nodes) {
        NSLog(@"%@", element.raw);
    }
}

It looks like there are a bunch of odd whitespace characters between the two spans there.
    
This number here:
Looks like an ASCII code for a symbol (though I can't find one that matches), so the parser might be breaking when it hits those characters. I'm not familiar with TFHpple, but you may need to implement some input sanitization (stripping out those characters).
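If the stray characters do turn out to be plain whitespace, carriage returns, or non-breaking spaces, a minimal sketch of the stripping step could look like this. TrimScrapedText is a hypothetical helper name, not part of TFHpple, and it only cleans the two ends of a string; it will not help if the span really is empty in the downloaded HTML.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of TFHpple): strips whitespace and newline
// characters, including carriage returns and no-break spaces, from both
// ends of a scraped string. Characters in the middle are left alone.
static NSString *TrimScrapedText(NSString *raw) {
    NSCharacterSet *junk = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    return [raw stringByTrimmingCharactersInSet:junk];
}
```

You would call it on whatever string the parser hands back, for example TrimScrapedText(element.raw), before displaying or comparing the value.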

Related

Loading Multiple lines of HTML parse into one UITextField

I have a set of HTML code, here:
<div id="content_text">
<p>Year 11 students will be making their course selections online this year.
</p>
<p>Information about this system has been made available through Tutor sessions. Each student will have an individual password. Once subject selections have been made students are to print out a copy of their choices and then have this form signed by themselves, their parent and their Tutor. Forms are to be completed by 22 August. Course books can be borrowed from the Library or are available online.
My problem is that this comes from an RSS feed article page, and there may be anywhere from 1 to 11 <p> tags within this one <div id="content_text">. How can I fetch all of the <p> elements in this div and display them, formatted, in a UITextField?
I am currently using an XPath query, so my parsing code looks like this:
NSData *tutorialsHtmlDataTwo = [NSData dataWithContentsOfURL:[NSURL URLWithString:_storyLink]];
TFHpple *tutorialsParserTwo = [TFHpple hppleWithHTMLData:tutorialsHtmlDataTwo];
NSString *tutorialsXpathQueryStringTwo = @"//div[@id='content_text']/p";
NSArray *tutorialsNodesTwo = [tutorialsParserTwo searchWithXPathQuery:tutorialsXpathQueryStringTwo];
NSMutableArray *newTutorialsTwo = [[NSMutableArray alloc] initWithCapacity:0];
for (TFHppleElement *element in tutorialsNodesTwo) {
    Tutorial *tutorialTwo = [[Tutorial alloc] init];
    [newTutorialsTwo addObject:tutorialTwo];
    tutorialTwo.title = [[element firstChild] content];
    _rssBody.text = [NSString stringWithFormat:@"%@", [[element firstChild] content]];
}
So as you can see it will only parse the second line. Any help appreciated.
Thanks, SebOH.
Please use this query to find all the elements inside the given element:
div[@id='content_text']
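Whichever query is used, note that the loop in the question assigns _rssBody.text on every pass, so each paragraph overwrites the previous one. A minimal sketch of the alternative is to collect the paragraph strings first and join them once. JoinParagraphs is a hypothetical helper name, and the blank-line separator is just one possible formatting choice.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of TFHpple: joins paragraph strings into one
// block of text with a blank line between paragraphs, skipping empty entries.
static NSString *JoinParagraphs(NSArray<NSString *> *paragraphs) {
    NSCharacterSet *blank = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableArray<NSString *> *cleaned = [NSMutableArray array];
    for (NSString *paragraph in paragraphs) {
        // Trim each paragraph so stray newlines from the HTML do not pile up.
        NSString *trimmed = [paragraph stringByTrimmingCharactersInSet:blank];
        if (trimmed.length > 0) {
            [cleaned addObject:trimmed];
        }
    }
    return [cleaned componentsJoinedByString:@"\n\n"];
}
```

Inside the question's loop you would add [[element firstChild] content] to a mutable array (only when it is not nil, since inserting nil into an NSMutableArray raises an exception), then set _rssBody.text = JoinParagraphs(array) once after the loop finishes.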

How can I parse tables in HTML?

I'm trying to parse an HTML page with a lot of tables. I searched the net for how to parse HTML with Objective-C and found hpple. I then looked for a tutorial, which led me to:
http://www.raywenderlich.com/14172/how-to-parse-html-on-ios
With this tutorial I tried to parse some forum news which has a lot of tables from this site (Hebrew): news forum
I tried to parse the news title, but I don't know what to write in my code. Every time I try to reach the path I get, "Nodes was nil."
The code of my latest attempt is:
NSURL *contributorsUrl = [NSURL URLWithString:@"http://rotter.net/cgi-bin/listforum.pl"];
NSData *contributorsHtmlData = [NSData dataWithContentsOfURL:contributorsUrl];
// 2
TFHpple *contributorsParser = [TFHpple hppleWithHTMLData:contributorsHtmlData];
// 3
NSString *contributorsXpathQueryString = @"//body/div/center/center/table[@cellspacing=0]/tbody/tr/td/table[@cellspacing=1]/tbody/tr[@bgcolor='#FDFDFD']/td[@align='right']/font[@class='text15bn']/font[@face='Arial']/a/b";
NSArray *contributorsNodes = [contributorsParser searchWithXPathQuery:contributorsXpathQueryString];
// 4
NSMutableArray *newContributors = [[NSMutableArray alloc] initWithCapacity:0];
for (TFHppleElement *element in contributorsNodes) {
// 5
Contributor *contributor = [[Contributor alloc] init];
[newContributors addObject:contributor];
// 6
Could somebody guide me through to getting the titles?
Not sure if that's an option for you, but if the desired table has a unique id, you could use a messy approach: load the HTML into a UIWebView and get the contents via -stringByEvaluatingJavaScriptFromString: like this:
// desired table container's id is "msg"
NSString* value = [webView stringByEvaluatingJavaScriptFromString:@"document.getElementById('msg').innerHTML"];

Format link with parentheses for TWRequest

I am trying to embed a link in a tweet sent from an iPhone. The URL ends with a closing parenthesis. That parenthesis gets dropped, which causes the t.co link to fail. I have tried encoding the URL and wrapping it in an href tag. Nothing seems to bring that closing parenthesis into the resulting URL. Below is what I tried last; it failed for having too many characters. Why didn't it get shortened?
if (tweetsEnabled && twitToSendTo != nil) {
    // Build the string
    NSString *mapURL = [NSString stringWithFormat:@"http://maps.google.com/maps?q=%@,%@+(%@)",newLogEvent.lattitude,newLogEvent.longitude,eventString];
    NSString *encodedURL = [mapURL encodeString:NSUTF8StringEncoding];
    NSString *tweetString = [NSString stringWithFormat:@"%@\n<a href>=\"%@\"></a>",note,encodedURL];
    NSLog(@"%@",tweetString);
    // Send it
    [self performSelectorOnMainThread:@selector(postToTwitterWithString:) withObject:tweetString waitUntilDone:NO];
}
In its simplest form (without encoding, without the tags, and without the +(%@)), the link works. It displays as a shortened t.co link and brings up the web page as intended. But I need the string in the parentheses to give the label its text, and it seems it should be very easy to get that in.
Here is the output of the NSLog:
2012-08-14 09:57:43:551 app[2683:34071] -[logger insertLogEvent:withLocation:isArrival:] [Line 641] Arrival logged for Home
<a href>="http%3A%2F%2Fmaps.google.com%2Fmaps%3Fq%3D26.17071170827948%2C-80.16628238379971%2B%28Arrival%29"></a>
This worked:
// Build the string
NSString *mapURL = [NSString stringWithFormat:@"http://maps.google.com/maps?q=%@,%@+%%28%@%%29",newLogEvent.lattitude,newLogEvent.longitude,eventString];
NSString *tweetString = [NSString stringWithFormat:@"%@\n%@",note,mapURL];
I am no longer encoding the entire string, only the parentheses. This handy post, How to add percent sign to NSString, is where I found the correct way to get the percent signs in for the encoding.

Parsing HTML NSRegularExpression

I'm trying to parse an HTML page using NSRegularExpression.
The page is a repetition of this HTML:
<div class="fact" id="fact66">STRING THAT I WANT</div> <div class="vote">
#106
<span id="p106">246080 / 8.59 </span>
<span id="f106" class="vote2">
(+++)
(++)
(+)
(-)</span>
<span id="ve106"></span>
</div>
So I'd like to get the string inside this div:
<div class="fact" id="fact66">STRING THAT I WANT</div>
So I wrote a regex that looks like this:
<div class="fact" id="fact[0-9].*\">(.*)</div>
In my code, I use it like this:
NSString *htmlString = [NSString stringWithContentsOfURL:[NSURL URLWithString:@"http://www.myurl.com"] encoding:NSASCIIStringEncoding error:nil];
NSRegularExpression* myRegex = [[NSRegularExpression alloc] initWithPattern:@"<div class=\"fact\" id=\"fact[0-9].*\">(.*)</div>\n" options:0 error:nil];
[myRegex enumerateMatchesInString:htmlString options:0 range:NSMakeRange(0, [htmlString length]) usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
    NSRange range = [match rangeAtIndex:1];
    NSString *string =[htmlString substringWithRange:range];
    NSLog(string);
}];
But it returns nothing. I tested my regex in Java and PHP and it works great there. What am I doing wrong?
Thanks
Try using this regex:
@"<div class=\"fact\" id=\"fact[0-9]*\">([^<]*)</div>"
Regex:
fact[0-9].*
means: fact followed by exactly one digit (0 to 9), followed by any character repeated any number of times.
I also suggest using:
([^<]*)
instead of
(.*)
to match the text between the opening and closing div tags, which sidesteps regex greediness. Alternatively:
(.*?)
(the ? makes the quantifier non-greedy, so it stops at the first instance of </div>).
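Here is a minimal sketch of the suggested pattern wired into NSRegularExpression, run against a string instead of a downloaded page. ExtractFacts is a hypothetical helper name; the pattern is the one proposed above, and the sketch assumes the fact text itself contains no nested tags.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text inside every
// <div class="fact" id="factNNN">...</div> found in html.
static NSArray<NSString *> *ExtractFacts(NSString *html) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<div class=\"fact\" id=\"fact[0-9]*\">([^<]*)</div>"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        // Report a malformed pattern instead of silently matching nothing.
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }
    NSMutableArray<NSString *> *facts = [NSMutableArray array];
    [regex enumerateMatchesInString:html
                            options:0
                              range:NSMakeRange(0, html.length)
                         usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        // Capture group 1 is the text between the opening and closing tags.
        [facts addObject:[html substringWithRange:[match rangeAtIndex:1]]];
    }];
    return facts;
}
```

Because [^<]* cannot cross a tag boundary, each match ends at its own </div>, and passing a real NSError pointer makes a bad pattern visible rather than hiding it behind error:nil.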

HTML from NSAttributedString

Rather than converting HTML to an attributed string, I need to convert it back to HTML. This can easily be done on Mac as can be seen here: http://www.justria.com/2011/01/18/how-to-convert-nsattributedstring-to-html-markup/
Unfortunately, the method dataFromRange:documentAttributes: is only available on Mac, via the NSAttributedString AppKit Additions.
My question is how can you do this on iOS?
Not the 'easy' way, but what about iterating through the attributes of the string using:
- (void)enumerateAttributesInRange:(NSRange)enumerationRange
options:(NSAttributedStringEnumerationOptions)opts
usingBlock:(void (^)(NSDictionary *attrs, NSRange range, BOOL *stop))block
Keep an NSMutableString variable to accumulate the HTML (let's call it 'html'). In the block, you would construct the HTML manually using strings. For instance, if the text attributes 'attrs' specify red, bold text:
[html appendFormat:@"<span style='color:red; font-weight: bold;'>%@</span>", [originalStr substringWithRange:range]]
EDIT: Stumbled across this yesterday:
NSAttributedString+HTMLFromRange category from "UliKit"
(https://github.com/uliwitness/UliKit/blob/master/NSAttributedString+HTMLFromRange.m)
Looks like it will do what you want.
Use the code below; it works well.
NSAttributedString *s = ...;
NSDictionary *documentAttributes = @{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType};
NSData *htmlData = [s dataFromRange:NSMakeRange(0, s.length) documentAttributes:documentAttributes error:NULL];
NSString *htmlString = [[NSString alloc] initWithData:htmlData encoding:NSUTF8StringEncoding];