iOS: Strip <img...> from NSString (an HTML string)

So I have an NSString which is basically an HTML string with all the usual HTML elements. What I would like to do is strip all the img tags from it.
The img tags may or may not have max-width, style or other attributes, so I do not know their length up front. They always end with />.
How could I do this?
EDIT: Based on nicolasthenoz's answer, I came up with a solution that requires less code:
NSString *HTMLTagss = @"<img[^>]*>"; // regex to remove img tags
NSString *stringWithoutImage = [htmlString stringByReplacingOccurrencesOfRegex:HTMLTagss withString:@""];

You can use the NSString method stringByReplacingOccurrencesOfString with the NSRegularExpressionSearch option:
NSString *result = [html stringByReplacingOccurrencesOfString:@"<img[^>]*>" withString:@"" options:NSCaseInsensitiveSearch | NSRegularExpressionSearch range:NSMakeRange(0, [html length])];
Or you can also use the replaceMatchesInString method of NSRegularExpression. Thus, assuming you have your HTML in an NSMutableString *html, you can:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<img[^>]*>"
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:nil];
[regex replaceMatchesInString:html
                      options:0
                        range:NSMakeRange(0, html.length)
                 withTemplate:@""];
I'd personally lean towards one of these options over the stringByReplacingOccurrencesOfRegex method of RegexKitLite. There's no need to introduce a third-party library for something as simple as this unless there was some other compelling issue.
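For an immutable NSString, the same NSRegularExpression approach can be wrapped in a small helper. This is only a minimal sketch: the function name StripImgTags is made up for illustration, and the pattern is the one from the answers above, so it assumes no attribute value contains a literal > character.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of any framework): returns a copy of `html`
// with every <img ...> tag removed, or nil if the pattern fails to compile.
static NSString *StripImgTags(NSString *html, NSError **error) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<img[^>]*>"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:error];
    if (regex == nil) {
        return nil;
    }
    // The template is an empty string, so each match is simply deleted.
    return [regex stringByReplacingMatchesInString:html
                                           options:0
                                             range:NSMakeRange(0, html.length)
                                      withTemplate:@""];
}
```

Passing an NSError ** instead of nil makes a malformed pattern visible to the caller rather than silently returning the input untouched.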

Use a regular expression, find the matches in your string and remove them!
Here is how:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<img[^>]*>"
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:nil];
NSMutableString *mutableString = [yourStringToStripFrom mutableCopy];
NSInteger offset = 0; // keeps track of range changes in the string due to replacements.
for (NSTextCheckingResult *result in [regex matchesInString:yourStringToStripFrom
                                                    options:0
                                                      range:NSMakeRange(0, [yourStringToStripFrom length])]) {
    NSRange resultRange = [result range];
    resultRange.location += offset; // shift by the characters already removed
    // remove the match
    [mutableString replaceCharactersInRange:resultRange withString:@""];
    // update the offset: the string is now shorter by the length of the match
    offset -= resultRange.length;
}

You can use the function below in Swift 4 and 5:
func filterImgTag(text: String) -> String{
return text.replacingOccurrences(of: "<img[^>]*>", with: "", options: String.CompareOptions.regularExpression)
}
Hope this helps! Comment below if it works for you. Thanks.

Related

How to apply bold for string in html format in iOS?

I want to print bold text, using a string that goes into the compose-mail body from my app:
messageBody = [NSString stringWithFormat:@"<b>Background Information</b>"];
What you could do is use an NSAttributedString.
UIFont *boldFont = [UIFont boldSystemFontOfSize:12];
NSString *yourString = ...;
NSRange boldedRange = NSMakeRange(22, 4);
NSMutableAttributedString *attrString = [[NSMutableAttributedString alloc] initWithString:yourString];
[attrString beginEditing];
// NSFontAttributeName takes a UIFont object rather than a font name
[attrString addAttribute:NSFontAttributeName
                   value:boldFont
                   range:boldedRange];
[attrString endEditing];
//draw attrString here...
Or, if you do not want to bother with fonts (as not every variation of a font contains "Bold"), here is another way to do this. Note that applyFontTraits:range: belongs to the AppKit additions to NSMutableAttributedString, so this variant is for macOS:
NSMutableAttributedString *attrString = [[NSMutableAttributedString alloc] initWithString:@"Approximate Distance: 120m away"];
[attrString beginEditing];
[attrString applyFontTraits:NSBoldFontMask
                      range:NSMakeRange(22, 4)];
[attrString endEditing];
NSAttributedString sounds like the tool you need. In iOS 7, they added support to parse HTML strings with basic markup for styling into NSAttributedStrings. It's pretty easy:
NSMutableAttributedString *titleString = [[NSMutableAttributedString alloc] initWithData:[someHTMLString dataUsingEncoding:NSUTF8StringEncoding] options:@{NSDocumentTypeDocumentAttribute:NSHTMLTextDocumentType} documentAttributes:NULL error:nil];
self.someLabel.attributedText = titleString;
If you want to add font styling, I found it's best to add a span with a font specified around the text before creating the NSMutableAttributedString.
NSString *styledTitle = [NSString stringWithFormat:@"<span style=\"font-family:'%@'; font-size:%dpx;\">%@</span>", someFont.fontName, (int)someFont.pointSize, someHTMLString];

Get proper format of the text file in HTML form

I am using the following code to convert text to pdf form:
NSString *filePath = [[NSBundle mainBundle] pathForResource:@"All_lang_unicode" ofType:@"txt"];
NSString *str;
NSData *myData = [NSData dataWithContentsOfFile:filePath];
if (myData) {
    str = [[NSString alloc] initWithData:myData encoding:NSUTF16StringEncoding];
    NSLog(@"STRING : %@", str);
}
NSString *html = [NSString stringWithFormat:@"<body>%@</body>", str];
UIMarkupTextPrintFormatter *fmt = [[UIMarkupTextPrintFormatter alloc]
                                   initWithMarkupText:html];
UIPrintPageRenderer *render = [[UIPrintPageRenderer alloc] init];
[render addPrintFormatter:fmt startingAtPageAtIndex:0];
CGRect page;
page.origin.x = 0;
page.origin.y = 0;
page.size.width = 792;
page.size.height = 612;
CGRect printable = CGRectInset(page, 0, 0);
[render setValue:[NSValue valueWithCGRect:page] forKey:@"paperRect"];
[render setValue:[NSValue valueWithCGRect:printable] forKey:@"printableRect"];
NSLog(@"number of pages %ld", (long)[render numberOfPages]);
NSMutableData *pdfData = [NSMutableData data];
UIGraphicsBeginPDFContextToData(pdfData, CGRectZero, nil);
for (NSInteger i = 0; i < [render numberOfPages]; i++)
{
    UIGraphicsBeginPDFPage();
    CGRect bounds = UIGraphicsGetPDFContextBounds();
    [render drawPageAtIndex:i inRect:bounds];
}
UIGraphicsEndPDFContext();
NSArray *paths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
NSString *documentsDirectory = [paths objectAtIndex:0];
NSString *pdfFile = [documentsDirectory stringByAppendingPathComponent:@"test.pdf"];
[pdfData writeToFile:pdfFile atomically:YES];
But the problem is that I am not getting the proper formatting of the text. When I print the string using NSLog() I get the proper content, but in the generated PDF the spacing and newlines are missing; everything comes out on one continuous line.
(UPDATE)
NSLog output (proper):
NEW DELHI: Sachin Tendulkar's streak of low scores might have raised a question mark over his future but senior BCCI official and IPL chairman Rajiv Shukla on Monday came out in support of the senior batsman saying one needs to look at his "colossal record" before making any comment.
"He will hang up his boots when he thinks it's time for him to go. He does not need any advice on this. Before making a comment on his performance you have to see his colossal record and his past performance," Shukla told reporters outside the Parliament adding that the veteran cricketer will come back strongly in the forthcoming matches.
and what I am getting is:
NEW DELHI: Sachin Tendulkar's streak of low scores might have raised a question mark over his future but senior BCCI official and IPL chairman Rajiv Shukla on Monday came out in support of the senior batsman saying one needs to look at his "colossal record" before making any comment. "He will hang up his boots when he thinks it's time for him to go. He does not need any advice on this. Before making a comment on his performance you have to see his colossal record and his past performance," Shukla told reporters outside the Parliament adding that the veteran cricketer will come back strongly in the forthcoming matches.
Can anyone please suggest a modification to this code so that I get the proper format?
If I get it right, you should replace your newline characters with <br> or <p>.
Try
str = [str stringByReplacingOccurrencesOfString:@"\n" withString:@"<br>"];
How to detect new lines in Objective-C
A solution to your follow-up question might look like this:
NSArray *words = [str componentsSeparatedByString:@" "];
NSString *line = @"";
NSUInteger maxLineLength = 100;
NSString *resultStr = @"";
for (NSString *word in words) {
    if ([line length] + [word length] > maxLineLength) {
        resultStr = [resultStr stringByAppendingFormat:@"%@<br>", line];
        line = word;
    } else {
        line = [line stringByAppendingFormat:@" %@", word];
    }
}
resultStr = [resultStr stringByAppendingString:line];
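The newline replacement suggested above can be combined with HTML escaping, since plain text that contains &, < or > would otherwise be interpreted as markup once it is wrapped in <body>. A minimal sketch, assuming a made-up helper name HTMLBodyFromPlainText; the escaping step is an addition and not part of the original answer:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: turns plain text into an HTML body that keeps its
// line breaks. The entity escaping is an assumption added for safety.
static NSString *HTMLBodyFromPlainText(NSString *text) {
    NSMutableString *s = [text mutableCopy];
    NSStringCompareOptions opts = NSLiteralSearch;
    // Escape "&" first so the entities added below are not escaped again.
    [s replaceOccurrencesOfString:@"&" withString:@"&amp;" options:opts range:NSMakeRange(0, s.length)];
    [s replaceOccurrencesOfString:@"<" withString:@"&lt;" options:opts range:NSMakeRange(0, s.length)];
    [s replaceOccurrencesOfString:@">" withString:@"&gt;" options:opts range:NSMakeRange(0, s.length)];
    // Normalize Windows line endings, then turn every newline into <br>.
    [s replaceOccurrencesOfString:@"\r\n" withString:@"\n" options:opts range:NSMakeRange(0, s.length)];
    [s replaceOccurrencesOfString:@"\n" withString:@"<br>" options:opts range:NSMakeRange(0, s.length)];
    return [NSString stringWithFormat:@"<body>%@</body>", s];
}
```

The result could then be passed to initWithMarkupText: in place of the html string built in the question.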

Getting the HTML tags in hpple as well as text?

The code below takes all of the text from a certain div. Is it possible for me to take all the text from the div as well as the html attributes? So it also adds all of the <p> </p>'s and <br> </br>'s to the string, myString?
//trims string from previous page
NSString *trimmedString = [stringy stringByTrimmingCharactersInSet:
                           [NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSData *data = [[NSString stringWithContentsOfURL:[NSURL URLWithString:trimmedString]] dataUsingEncoding:NSUTF8StringEncoding];
TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:data];
NSArray *elements = [xpathParser searchWithXPathQuery:@"//div[@class='field-item even']"];
TFHppleElement *element = [elements lastObject]; //may need to change this number?!
NSString *mystring = [self getStringForTFHppleElement:element];
trimmedTextView.text = [trimmedTextView.text stringByAppendingString:mystring];
Method here:
-(NSString*) getStringForTFHppleElement:(TFHppleElement *)element
{
    NSMutableString *result = [NSMutableString new];
    // Iterate recursively through all children
    for (TFHppleElement *child in [element children])
        [result appendString:[self getStringForTFHppleElement:child]];
    // Hpple creates a <text> node when it parses texts
    if ([element.tagName isEqualToString:@"text"])
        [result appendString:element.content];
    return result;
}
Any ideas would be appreciated. Cheers.
Try this:
NSString *htmlDataString = [webView stringByEvaluatingJavaScriptFromString:@"document.documentElement.outerHTML"];
This returns all of the page's HTML as a string. You can then parse it in your native code and find the div you are interested in, as you did in the example above.
You can also do this with any DOM element in your HTML, for example:
NSString *htmlDataString = [webView stringByEvaluatingJavaScriptFromString:@"document.getElementById('mydiv').outerHTML"];
which is more efficient but requires a bit of JavaScript skill.

Parsing HTML NSRegularExpression

I'm trying to parse an HTML page using NSRegularExpression.
The page is a repetition of this HTML code:
<div class="fact" id="fact66">STRING THAT I WANT</div> <div class="vote">
#106
<span id="p106">246080 / 8.59 </span>
<span id="f106" class="vote2">
(+++)
(++)
(+)
(-)</span>
<span id="ve106"></span>
</div>
So, I'd like to get the string inside the div:
<div class="fact" id="fact66">STRING THAT I WANT</div>
So i made a regex that looks like this
<div class="fact" id="fact[0-9].*\">(.*)</div>
Now, in my code, I implement it using this:
NSString *htmlString = [NSString stringWithContentsOfURL:[NSURL URLWithString:@"http://www.myurl.com"] encoding:NSASCIIStringEncoding error:nil];
NSRegularExpression *myRegex = [[NSRegularExpression alloc] initWithPattern:@"<div class=\"fact\" id=\"fact[0-9].*\">(.*)</div>\n" options:0 error:nil];
[myRegex enumerateMatchesInString:htmlString options:0 range:NSMakeRange(0, [htmlString length]) usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
    NSRange range = [match rangeAtIndex:1];
    NSString *string = [htmlString substringWithRange:range];
    NSLog(@"%@", string);
}];
But it returns nothing... I tested my regex in Java and PHP and it works great. What am I doing wrong?
Thanks
Try using this regex:
@"<div class=\"fact\" id=\"fact[0-9]*\">([^<]*)</div>"
The regex:
fact[0-9].*
means: fact followed by a single digit between 0 and 9, followed by any character repeated any number of times.
I also suggest using:
([^<]*)
instead of
(.*)
to match between the two div tags, so as to deal with regex greediness, or alternatively:
(.*?)
(the ? makes the quantifier non-greedy, so it stops at the first instance of </div>).
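To see the non-greedy capture at work, here is a minimal sketch that collects every fact string from a page. The function name FactStrings is invented for illustration, and the pattern uses [0-9]+ (at least one digit) rather than the [0-9]* suggested above, which is a small assumption about the id format.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text inside each
// <div class="fact" id="factN">...</div> found in `html`.
static NSArray<NSString *> *FactStrings(NSString *html) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<div class=\"fact\" id=\"fact[0-9]+\">(.*?)</div>"
                                                  options:NSRegularExpressionDotMatchesLineSeparators
                                                    error:NULL];
    NSMutableArray<NSString *> *facts = [NSMutableArray array];
    NSRange whole = NSMakeRange(0, html.length);
    for (NSTextCheckingResult *match in [regex matchesInString:html options:0 range:whole]) {
        // Capture group 1 is the lazily matched text before the first </div>.
        [facts addObject:[html substringWithRange:[match rangeAtIndex:1]]];
    }
    return facts;
}
```

With a greedy (.*) and the same options, a single match would run from the first fact div to the last </div> on the page; the lazy (.*?) stops at the closing tag of each fact div instead.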

HTML from NSAttributedString

Rather than converting HTML to an attributed string, I need to convert it back to HTML. This can easily be done on Mac as can be seen here: http://www.justria.com/2011/01/18/how-to-convert-nsattributedstring-to-html-markup/
Unfortunately, the method dataFromRange:documentAttributes: is only available on Mac via the NSAttributedString AppKit Additions.
My question is how can you do this on iOS?
Not the 'easy' way, but what about iterating through the attributes of the string using:
- (void)enumerateAttributesInRange:(NSRange)enumerationRange
options:(NSAttributedStringEnumerationOptions)opts
usingBlock:(void (^)(NSDictionary *attrs, NSRange range, BOOL *stop))block
Have an NSMutableString variable to accumulate the HTML (let's call it 'html'). In the block, you would construct the HTML manually using strings. For instance, if the text attributes 'attrs' specify red, bold text:
[html appendFormat:@"<span style='color:red; font-weight: bold;'>%@</span>", [originalStr substringWithRange:range]];
EDIT: Stumbled across this yesterday:
NSAttributedString+HTMLFromRange category from "UliKit"
(https://github.com/uliwitness/UliKit/blob/master/NSAttributedString+HTMLFromRange.m)
Looks like it will do what you want.
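The manual enumeration idea described above could be sketched roughly as follows. This is a deliberately narrow illustration, not a complete converter: the function name SimpleHTMLFromAttributedString is made up, only bold fonts are detected (via the font descriptor's symbolic traits), and the text is not HTML-escaped.

```objc
#import <UIKit/UIKit.h>

// Hypothetical helper: walks the attribute runs and wraps bold runs in a span.
// Colors, italics, links and entity escaping are intentionally left out.
static NSString *SimpleHTMLFromAttributedString(NSAttributedString *attributed) {
    NSMutableString *html = [NSMutableString string];
    NSString *plain = attributed.string;
    [attributed enumerateAttributesInRange:NSMakeRange(0, attributed.length)
                                   options:0
                                usingBlock:^(NSDictionary<NSAttributedStringKey, id> *attrs, NSRange range, BOOL *stop) {
        NSString *text = [plain substringWithRange:range];
        UIFont *font = attrs[NSFontAttributeName];
        // A missing font yields nil here, which evaluates to "not bold".
        BOOL isBold = (font.fontDescriptor.symbolicTraits & UIFontDescriptorTraitBold) != 0;
        if (isBold) {
            [html appendFormat:@"<span style='font-weight: bold;'>%@</span>", text];
        } else {
            [html appendString:text];
        }
    }];
    return html;
}
```

Other attributes, such as NSForegroundColorAttributeName, would need their own branches that add the corresponding CSS properties to the style attribute.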
Use the code below. It works well.
NSAttributedString *s = ...;
NSDictionary *documentAttributes = @{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType};
NSData *htmlData = [s dataFromRange:NSMakeRange(0, s.length) documentAttributes:documentAttributes error:NULL];
NSString *htmlString = [[NSString alloc] initWithData:htmlData encoding:NSUTF8StringEncoding];