iOS: Find the end of a specific paragraph in an HTML NSString - html

So I receive an NSString with html code like this:
<p class="img"><img src="blahblahblah"></p><p>This is some text</p>
I would like to find the end of the img-classed paragraph so that I can insert a heading between the two paragraphs. Please note:
- The img-classed paragraph is not necessarily the first paragraph in the string.
- There can be multiple img-classed paragraphs in the string, but I only need to insert something after the first one.
- I would like to find the character position after the first img-classed </p> in the string rather than parse the HTML.

You want to parse; there is really no other option. Just make sure to pick a criterion that is really unique.
Here is the Cocoa + NSString solution:
NSScanner *scanner = [NSScanner scannerWithString:originalString];
// Advance to the first img-classed paragraph, then step over its opening tag.
[scanner scanUpToString:@"<p class=\"img\">" intoString:nil];
[scanner scanString:@"<p class=\"img\">" intoString:nil];
// Advance to the closing tag and step over it.
[scanner scanUpToString:@"</p>" intoString:nil];
[scanner scanString:@"</p>" intoString:nil];
// The scanner now sits just after the first img-classed </p>.
NSInteger insertionPoint = scanner.scanLocation;
NSMutableString *modifiedString = [[NSMutableString alloc] initWithString:originalString];
[modifiedString insertString:insertedString atIndex:insertionPoint];
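The scanner steps above can also be wrapped in a function that leaves the string untouched when no img-classed paragraph is found. This is only a minimal sketch: the function name InsertAfterFirstImgParagraph is hypothetical, and it assumes the opening tag is written exactly as <p class="img">.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): returns a copy of `html`
// with `heading` inserted right after the first <p class="img">...</p>,
// or `html` itself if no such paragraph (or no closing tag) is found.
static NSString *InsertAfterFirstImgParagraph(NSString *html, NSString *heading) {
    NSString *openTag = @"<p class=\"img\">";   // assumed exact spelling of the tag
    NSString *closeTag = @"</p>";

    NSScanner *scanner = [NSScanner scannerWithString:html];
    scanner.charactersToBeSkipped = nil;         // do not silently skip whitespace
    scanner.caseSensitive = YES;

    // Move to the opening tag and step over it.
    [scanner scanUpToString:openTag intoString:NULL];
    if (![scanner scanString:openTag intoString:NULL]) {
        return html;                             // no img-classed paragraph
    }

    // Move to the closing tag and step over it.
    [scanner scanUpToString:closeTag intoString:NULL];
    if (![scanner scanString:closeTag intoString:NULL]) {
        return html;                             // paragraph is never closed
    }

    NSMutableString *result = [html mutableCopy];
    [result insertString:heading atIndex:scanner.scanLocation];
    return result;
}
```

Checking the return value of scanString:intoString: is what distinguishes "tag found" from "scanner ran to the end of the string", which the unguarded version cannot tell apart.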

Related

IOS parse HTML but get weird value " "

Hi, I'm doing an assignment and I want to get some information from this website:. I used TFHpple from the Ray Wenderlich tutorial. Everything went fine until I tried to get the view count (this number: 8.024.835), but my code returns this value instead: " ". When I NSLog the element's raw property, I see this:
<p>
Số lượt xem:
<span class="color-fuchsia" id="PageViews"/>    
Yêu thích:
<span class="color-hotpink" id="LikeCount"/>
</p>
But when I inspect the HTML with Firebug, it displays like this:
<p>
Số lượt xem:
<span id="PageViews" class="color-fuchsia">8.024.835</span>
     Yêu thích:
<span id="LikeCount" class="color-hotpink">1.565</span>
</p>
How can I get the correct value? Please help me.
This is my code to parse the HTML and NSLog it:
- (void)GetBookViewCount {
    NSURL *url = [NSURL URLWithString:@"http://blogtruyen.com/truyen/conan"];
    NSData *htmlData = [NSData dataWithContentsOfURL:url];
    TFHpple *parser = [TFHpple hppleWithHTMLData:htmlData];
    NSString *XpathQueryString = @"//div[@class='description']/p";
    NSArray *Nodes = [parser searchWithXPathQuery:XpathQueryString];
    for (TFHppleElement *element in Nodes) {
        NSLog(@"%@", element.raw);
    }
}
It looks like there are a bunch of odd whitespace characters between the two spans there.

This number here:
looks like an ASCII code for a symbol (though I can't find one that matches), so the parser might be breaking when it hits those characters. I'm not familiar with TFHpple, but you may need to implement some input sanitization (stripping out those characters).
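As a rough illustration of the sanitization idea, the sketch below treats whitespace, newlines and control characters as separators and rejoins the remaining pieces with single spaces. The function name SanitizedText is hypothetical and it uses only Foundation, not TFHpple. Note that in the raw output shown in the question the two spans are empty, so cleanup of this kind tidies the text that is present but cannot produce a number that is not in the downloaded HTML.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: splits on whitespace, newlines and control characters,
// drops the empty pieces, and joins what is left with single spaces.
static NSString *SanitizedText(NSString *raw) {
    NSMutableCharacterSet *separators = [NSMutableCharacterSet whitespaceAndNewlineCharacterSet];
    [separators formUnionWithCharacterSet:[NSCharacterSet controlCharacterSet]];

    NSArray<NSString *> *pieces = [raw componentsSeparatedByCharactersInSet:separators];
    NSPredicate *nonEmpty = [NSPredicate predicateWithFormat:@"length > 0"];
    NSArray<NSString *> *kept = [pieces filteredArrayUsingPredicate:nonEmpty];
    return [kept componentsJoinedByString:@" "];
}
```

It could be applied to a text node's content before logging or displaying it.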

Trying to pull tabledata out from html

Basically, I need to parse the td (table data) elements from this HTML file, and I need the right XPath to do it. I am using the Ray Wenderlich tutorial as a model for this task, and here is the code I have so far.
NSURL *tutorialsUrl = [NSURL URLWithString:@"http://example.com/events"];
NSData *tutorialsHtmlData = [NSData dataWithContentsOfURL:tutorialsUrl];
// 2
TFHpple *tutorialsParser = [TFHpple hppleWithHTMLData:tutorialsHtmlData];
// 3
NSString *tutorialsXpathQueryString = @"This is where I need to enter my XPath to retrieve the table data";
NSArray *tutorialsNodes = [tutorialsParser searchWithXPathQuery:tutorialsXpathQueryString];
I have the HTML path to this element thanks to Firebug, which I will post below.
/<html lang="en">/<body>/<div id="page" class="container">/<div class="span-19">/<div id="content">/<div>/<table id=yw0 class="detail-view">/<tbody>/<tr class="even">/<td>moo</td>/
I need the text moo to be parsed. Any help will be deeply appreciated.
This is the XPath I get from Firebug as well, but it didn't work at all.
/html/body/div/div[4]/div/div/table/tbody/tr[2]/td
First, you need to get substrings, each containing one element that needs to be extracted:
NSArray *split = [text componentsSeparatedByString:@"<td>"];
The first object in the array "split" contains nothing you want, so you can ignore it. Now, for each remaining substring in this array, search for the closing "</td>" tag:
NSRange range = [string rangeOfString:@"</td>"];
and then remove it and everything after it:
- (NSString *)substringToIndex:(NSUInteger)anIndex // the index is range.location from the search for "</td>" above
EDIT:
Another possibility is to use componentsSeparatedByString: with "</td>" in place of the 2nd and 3rd steps; the first item of each resulting array is then the text you want.
EDIT2: (whole code)
NSString *originalText = @" /<html lang=\"en\">/<body>/<div id=\"page\" class=\"container\">/<div class=\"span-19\">/<div id=\"content\">/<div>/<table id=yw0 class=\"detail-view\">/<tbody>/<tr class=\"even\">/<td>moo1</td><td>moo2</td>/";
NSArray *separatedParts = [originalText componentsSeparatedByString:@"<td>"];
NSMutableArray *arrayOfResults = [[NSMutableArray alloc] init];
// Start at index 1: index 0 holds everything before the first <td>.
for (NSUInteger i = 1; i < separatedParts.count; i++) {
    NSString *part = [separatedParts objectAtIndex:i];
    NSRange range = [part rangeOfString:@"</td>"];
    if (range.location == NSNotFound) continue; // cell is never closed, skip it
    NSString *partialResult = [part substringToIndex:range.location];
    [arrayOfResults addObject:partialResult];
}
I have slightly altered the original text to show that it really works for a table with more than one item.
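The alternative mentioned under EDIT (splitting a second time on "</td>" instead of searching for its range) could be sketched as follows. The function name TableCellTexts is hypothetical, and like the code above it assumes cells are written as a bare <td> with no attributes.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text of every <td>...</td> cell in `html`,
// using two rounds of componentsSeparatedByString: and no range searches.
static NSArray<NSString *> *TableCellTexts(NSString *html) {
    NSMutableArray<NSString *> *results = [NSMutableArray array];
    NSArray<NSString *> *parts = [html componentsSeparatedByString:@"<td>"];

    // Index 0 holds whatever precedes the first <td>, so start at 1.
    for (NSUInteger i = 1; i < parts.count; i++) {
        NSArray<NSString *> *cellAndRest = [parts[i] componentsSeparatedByString:@"</td>"];
        if (cellAndRest.count < 2) {
            continue;                     // no closing tag for this cell
        }
        [results addObject:cellAndRest[0]];
    }
    return results;
}
```

String splitting like this is brittle compared with an XPath query, since any attribute on the td tag or any nested table will defeat it.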

Getting the HTML tags in hpple as well as text?

The code below takes all of the text from a certain div. Is it possible to take all of the text from the div as well as the HTML tags, so that all of the <p> </p>'s and <br> </br>'s are also added to the string, mystring?
// Trim the URL string passed in from the previous page
NSString *trimmedString = [stringy stringByTrimmingCharactersInSet:
                           [NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSData *data = [[NSString stringWithContentsOfURL:[NSURL URLWithString:trimmedString] encoding:NSUTF8StringEncoding error:nil] dataUsingEncoding:NSUTF8StringEncoding];
TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:data];
NSArray *elements = [xpathParser searchWithXPathQuery:@"//div[@class='field-item even']"];
TFHppleElement *element = [elements lastObject]; // may need to change this number?!
NSString *mystring = [self getStringForTFHppleElement:element];
trimmedTextView.text = [trimmedTextView.text stringByAppendingString:mystring];
Method here:
- (NSString *)getStringForTFHppleElement:(TFHppleElement *)element
{
    NSMutableString *result = [NSMutableString new];
    // Iterate recursively through all children
    for (TFHppleElement *child in [element children]) {
        [result appendString:[self getStringForTFHppleElement:child]];
    }
    // Hpple creates a <text> node when it parses text
    if ([element.tagName isEqualToString:@"text"]) {
        [result appendString:element.content];
    }
    return result;
}
Any ideas would be appreciated. Cheers.
Try this:
NSString *htmlDataString = [webView stringByEvaluatingJavaScriptFromString:@"document.documentElement.outerHTML"];
This returns all of the HTML as a string. You can then parse it in your native code and find the div you are interested in, as you did in the example above.
You can also do this with any DOM element in your HTML, for example:
NSString *htmlDataString = [webView stringByEvaluatingJavaScriptFromString:@"document.getElementById('mydiv').outerHTML"];
which is more efficient but requires a bit of JavaScript skill.

Parsing HTML NSRegularExpression

I'm trying to parse an HTML page using NSRegularExpression.
The page is a repetition of this HTML code:
<div class="fact" id="fact66">STRING THAT I WANT</div> <div class="vote">
#106
<span id="p106">246080 / 8.59 </span>
<span id="f106" class="vote2">
(+++)
(++)
(+)
(-)</span>
<span id="ve106"></span>
</div>
So, I'd like to get the string inside the div:
<div class="fact" id="fact66">STRING THAT I WANT</div>
So I made a regex that looks like this:
<div class="fact" id="fact[0-9].*\">(.*)</div>
Now, in my code, I implement it like this:
NSString *htmlString = [NSString stringWithContentsOfURL:[NSURL URLWithString:@"http://www.myurl.com"] encoding:NSASCIIStringEncoding error:nil];
NSRegularExpression *myRegex = [[NSRegularExpression alloc] initWithPattern:@"<div class=\"fact\" id=\"fact[0-9].*\">(.*)</div>\n" options:0 error:nil];
[myRegex enumerateMatchesInString:htmlString options:0 range:NSMakeRange(0, [htmlString length]) usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
    NSRange range = [match rangeAtIndex:1];
    NSString *string = [htmlString substringWithRange:range];
    NSLog(@"%@", string);
}];
But it returns nothing... I tested my regex in Java and PHP and it works great. What am I doing wrong?
Thanks
Try using this regex:
@"<div class=\"fact\" id=\"fact[0-9]*\">([^<]*)</div>"
It also drops the trailing \n from your pattern, which only matches when a newline immediately follows </div>.
The regex:
fact[0-9].*
means: fact, followed by a single digit, followed by any character repeated any number of times.
I also suggest using:
([^<]*)
instead of
(.*)
to match the text between the div tags and so deal with regex greediness, or alternatively:
(.*?)
The ? makes the quantifier non-greedy, so it stops at the first instance of </div>.
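Put together with the enumeration code from the question, the suggested pattern might be used like this. It is a sketch only: the function name FactStrings is hypothetical, and it assumes the fact text itself never contains a < character, which is what ([^<]*) requires.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: collects the text of every <div class="fact" id="factNN">
// element in `html`, using the pattern suggested in the answer above.
static NSArray<NSString *> *FactStrings(NSString *html) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<div class=\"fact\" id=\"fact[0-9]*\">([^<]*)</div>"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray<NSString *> *facts = [NSMutableArray array];
    [regex enumerateMatchesInString:html
                            options:0
                              range:NSMakeRange(0, html.length)
                         usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        // Capture group 1 is the text between the opening and closing div tags.
        [facts addObject:[html substringWithRange:[match rangeAtIndex:1]]];
    }];
    return facts;
}
```

Passing an NSError pointer rather than nil makes a malformed pattern visible immediately instead of silently producing no matches.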

HTML from NSAttributedString

Rather than converting HTML to an attributed string, I need to convert an attributed string back to HTML. This can easily be done on Mac, as can be seen here: http://www.justria.com/2011/01/18/how-to-convert-nsattributedstring-to-html-markup/
Unfortunately, the method dataFromRange:documentAttributes: is only available on Mac via the NSAttributedString AppKit Additions.
My question is: how can you do this on iOS?
Not the 'easy' way, but what about iterating through the attributes of the string using:
- (void)enumerateAttributesInRange:(NSRange)enumerationRange
                           options:(NSAttributedStringEnumerationOptions)opts
                        usingBlock:(void (^)(NSDictionary *attrs, NSRange range, BOOL *stop))block
Have an NSMutableString variable to accumulate the HTML (let's call it 'html'). In the block, you would construct the HTML manually using strings. For instance, if the text attributes 'attrs' specify red, bold text:
[html appendFormat:@"<span style='color:red; font-weight: bold;'>%@</span>", [originalStr substringWithRange:range]];
EDIT: Stumbled across this yesterday:
NSAttributedString+HTMLFromRange category from "UliKit"
(https://github.com/uliwitness/UliKit/blob/master/NSAttributedString+HTMLFromRange.m)
Looks like it will do what you want.
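To make the enumeration idea more concrete, here is a minimal sketch that handles only one attribute, bold fonts, and escapes the characters that are special in HTML. The function names EscapedHTML and SimpleHTMLFromAttributedString are hypothetical, and a real converter would also need to map colors, italics, links, paragraph styles and so on.

```objective-c
#import <UIKit/UIKit.h>

// Hypothetical helper: escapes the three characters that would otherwise be
// interpreted as markup. The ampersand must be replaced first.
static NSString *EscapedHTML(NSString *text) {
    NSString *s = [text stringByReplacingOccurrencesOfString:@"&" withString:@"&amp;"];
    s = [s stringByReplacingOccurrencesOfString:@"<" withString:@"&lt;"];
    return [s stringByReplacingOccurrencesOfString:@">" withString:@"&gt;"];
}

// Hypothetical helper: walks the attribute runs and wraps bold runs in a span.
// Every other attribute is ignored in this sketch.
static NSString *SimpleHTMLFromAttributedString(NSAttributedString *attributed) {
    NSMutableString *html = [NSMutableString string];
    [attributed enumerateAttributesInRange:NSMakeRange(0, attributed.length)
                                   options:0
                                usingBlock:^(NSDictionary<NSAttributedStringKey, id> *attrs, NSRange range, BOOL *stop) {
        NSString *text = EscapedHTML([attributed.string substringWithRange:range]);
        // A missing font attribute leaves `font` nil, so `bold` evaluates to NO.
        UIFont *font = attrs[NSFontAttributeName];
        BOOL bold = (font.fontDescriptor.symbolicTraits & UIFontDescriptorTraitBold) != 0;
        if (bold) {
            [html appendFormat:@"<span style='font-weight: bold;'>%@</span>", text];
        } else {
            [html appendString:text];
        }
    }];
    return html;
}
```

Each additional attribute would get its own check inside the block, with the corresponding CSS appended to the span's style.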
Use the code below; it works well.
NSAttributedString *s = ...;
NSDictionary *documentAttributes = @{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType};
NSData *htmlData = [s dataFromRange:NSMakeRange(0, s.length) documentAttributes:documentAttributes error:NULL];
NSString *htmlString = [[NSString alloc] initWithData:htmlData encoding:NSUTF8StringEncoding];