How to write regex for Objective-C? - html

I have some html data containing some img tags as follows:
img width=500 height=400
img width=400 height=250
img width=600 height=470
The height and width are always changing. I need to replace each of these tags with "img width=100" using Objective-C.
I wrote this, but it's not matching:
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"/(img\\s)((width|height)(=)([0-9]+)"
    options:NSRegularExpressionCaseInsensitive
    error:&error];
NSUInteger numberOfMatches = [regex numberOfMatchesInString:myhtmldata
    options:0
    range:NSMakeRange(0, [myhtmldata length])];
NSString *modifiedString;
if (numberOfMatches > 0)
{
    modifiedString = [regex stringByReplacingMatchesInString:myhtmldata
        options:0
        range:NSMakeRange(0, [myhtmldata length])
        withTemplate:@"img width=30"];
}
Can you help me?

If I infer the intent correctly from your sample code, you just want to use NSRegularExpression to change the width to 30. Then:
#import <Foundation/Foundation.h>
int main(int argc, char *argv[]) {
    @autoreleasepool {
        NSError *regexError = nil;
        NSRegularExpressionOptions options = 0;
        NSString *sampleText = @"img width=500 height=400";
        NSString *pattern = @"^(img\\s+)width=\\d+(\\s+height=\\d+)";
        NSRegularExpression *expression = [NSRegularExpression regularExpressionWithPattern:pattern options:options error:&regexError];
        sampleText = [expression stringByReplacingMatchesInString:sampleText
            options:0
            range:NSMakeRange(0,sampleText.length)
            withTemplate:@"$1width=30$2"];
        printf("%s\n",[sampleText UTF8String]);
    }
}
This prints img width=30 height=400 to the console.
EDIT:
You can change the regular expression to (img\s+width=)\d+\s+height=\d+ which, when escaped properly, will be:
@"(img\\s+width=)\\d+\\s+height=\\d+"
Then change the template string to @"$130". If you make those changes to my original code, you should match all occurrences of the img tag embedded in HTML. For example, it should change:
<html>
<body>
<img width=500 height=400>
<img width=520 height=100>
</body>
</html>
to:
<html>
<body>
<img width=30>
<img width=30>
</body>
</html>
Is this what your specs call for?
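As a variant of the edit above, here is a minimal sketch that wraps the replacement in a function and uses no capture groups, so the template never has a "$n" reference sitting next to digits. The helper name ReplaceImgSize and its width parameter are hypothetical, and it assumes the attributes always appear in the order width, then height, as in the samples.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: rewrites every "img width=N height=M" as "img width=<width>".
// Assumes width always comes before height, as in the question's samples.
static NSString *ReplaceImgSize(NSString *html, NSUInteger width) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"img\\s+width=\\d+\\s+height=\\d+"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        return html; // pattern failed to compile; leave the input untouched
    }
    // The replacement contains no "$" or "\", so it is inserted literally.
    NSString *replacement = [NSString stringWithFormat:@"img width=%lu", (unsigned long)width];
    return [regex stringByReplacingMatchesInString:html
                                           options:0
                                             range:NSMakeRange(0, html.length)
                                      withTemplate:replacement];
}
```

Calling ReplaceImgSize(html, 30) on the HTML sample above should give the same two <img width=30> tags.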

I found a different method and it's working. Here is the code:
NSArray* ary = [oldHtml componentsSeparatedByString:@"<img"];
NSString* newHtml = [ary objectAtIndex:0];
for (int i = 1; i < [ary count]; i++) {
    newHtml = [newHtml stringByAppendingString:[@"<img width=300 " stringByAppendingString:[[ary objectAtIndex:i] substringFromIndex:[[ary objectAtIndex:i] rangeOfString:@"src"].location]]];
}

Related

Retrieve contents from HTML which is a NSString

This is my NSString :
NSString timeString = @"<h5 style="direction:ltr"><span data-version-created-date="20180326T120530.000+0000" class="releasedDate">26-Mar-2018 12:05:30</span></h5>";
I want to retrieve only "26-Mar-2018 12:05:30", which is in the span tag.
How do I do that in Objective-C?
Please note: the given HTML is in NSString format.
Try this:
- (NSString *)stringByStrippingHTML:(NSString *)s {
    NSRange r;
    while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
        s = [s stringByReplacingCharactersInRange:r withString:@""];
    return s;
}
This will work by stripping out bracketed (<>) expressions. Backslashes have been added (\") to timeString to make it a proper NSString *. The stripping is repeated four times, but should probably be done in a loop with a condition.
NSString * timeString = @"<h5 style=\"direction:ltr\"><span data-version-created-date=\"20180326T120530.000+0000\" class=\"releasedDate\">26-Mar-2018 12:05:30</span></h5>";
NSRange openRange = [timeString rangeOfString:@"<"];
NSRange closeRange = [timeString rangeOfString:@">"];
NSRange enclosedRange = NSMakeRange(openRange.location, closeRange.location-openRange.location+1);
timeString = [timeString stringByReplacingCharactersInRange:enclosedRange withString:@""];
openRange = [timeString rangeOfString:@"<"];
closeRange = [timeString rangeOfString:@">"];
enclosedRange = NSMakeRange(openRange.location, closeRange.location-openRange.location+1);
timeString = [timeString stringByReplacingCharactersInRange:enclosedRange withString:@""];
openRange = [timeString rangeOfString:@"<"];
closeRange = [timeString rangeOfString:@">"];
enclosedRange = NSMakeRange(openRange.location, closeRange.location-openRange.location+1);
timeString = [timeString stringByReplacingCharactersInRange:enclosedRange withString:@""];
openRange = [timeString rangeOfString:@"<"];
closeRange = [timeString rangeOfString:@">"];
enclosedRange = NSMakeRange(openRange.location, closeRange.location-openRange.location+1);
timeString = [timeString stringByReplacingCharactersInRange:enclosedRange withString:@""];
NSLog(@"timeString = %@", timeString);
This worked for me:
NSString *timeString = @"<h5 style=\"direction:ltr\"><span data-version-created-date=\"20180326T120530.000+0000\" class=\"releasedDate\">26-Mar-2018 12:05:30</span></h5>";
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@">\\d.+\\d<"
    options:NSRegularExpressionCaseInsensitive
    error:NULL];
[regex enumerateMatchesInString:timeString options:0 range:NSMakeRange(0, [timeString length]) usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop){
    // your code to handle matches here
    NSString *subString = [timeString substringWithRange:match.range];
    // Drop the leading ">" and trailing "<" that the pattern also matched
    NSLog(@"%@",[subString substringWithRange:NSMakeRange(1, subString.length - 2)]);
}];
If you want to make sure you get the date between the span tags, it would be best to be more explicit than either stripping out all the HTML tags and assuming the only thing left is the date, or assuming there is only one span tag in the whole HTML text. It may work for now, but that will likely break in the future if the HTML ever changes.
NSString * timeString = @"<h5 style=\"direction:ltr\"><span data-version-created-date=\"20180326T120530.000+0000\" class=\"releasedDate\">26-Mar-2018 12:05:30</span><span class=\"someOtherClass\">garbageData</span></h5>";
// [^>]* keeps the match inside one opening tag, and the lazy (.*?) stops at the
// first </span> so the text of later spans is not captured.
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"<span[^>]*class=\"releasedDate\"[^>]*>(.*?)</span>"
    options:NSRegularExpressionCaseInsensitive
    error:nil];
NSTextCheckingResult *textCheckingResult = [regex firstMatchInString:timeString options:0 range:NSMakeRange(0, timeString.length)];
NSString *releaseDateString = [timeString substringWithRange:[textCheckingResult rangeAtIndex:1]];
if( ! [releaseDateString isEqualToString:@""] )
{
    NSDateFormatter *dateFormatter = [[NSDateFormatter alloc] init];
    [dateFormatter setDateFormat:@"dd-MMM-yyyy' 'HH:mm:ss"];
    NSDate *releaseDate = [dateFormatter dateFromString:releaseDateString];
    NSLog( @"%@ - %@", releaseDateString, releaseDate );
}
Note that this works even if there are other spans in the HTML text. It specifically pulls out the one with a class "releasedDate".

NSString remove html tags, keep <Text in angle brackets>

How do I remove html tags from NSString, but keep any <Text in angle brackets>?
Like <p>123 <Hello> abc</p> -> 123 <Hello> abc
I have tried all kinds of regexp, scanner and XML Parser solutions, but they remove <Text in angle brackets> as well as tags.
The only solution that fit me was to use an NSAttributedString with options
NSAttributedString *str = [[NSAttributedString alloc] initWithData:utf8Data
    options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
              NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)}
    documentAttributes:nil
    error:nil];
NSString *result = [str string];
but this approach employs WebKit and consumes too much memory for my task.
So, how do I strip tags from NSString, keeping <Text in angle brackets> without using any kind of WebKit/UIWebView and so on?
I asked a similar question a while ago; maybe some of the answers can help you out.
If you don't need a full HTML parser and just want to strip HTML tags out, an NSString category might be useful (this one is a modified category by mwaterfall):
- (NSString *)stringByStrippingTags {
    // Find first < and short-cut if we can
    NSUInteger tagIndex = [self rangeOfString:@"<" options:NSLiteralSearch].location;
    if (tagIndex == NSNotFound) {
        return [NSString stringWithString:self]; // return copy of string as no tags found
    }
    // Scan and find all tags
    NSScanner *scanner = [NSScanner scannerWithString:self];
    [scanner setCharactersToBeSkipped:nil];
    NSMutableSet *tags = [[NSMutableSet alloc] init];
    NSString *tag;
    do {
        // Scan up to <, then capture everything up to (not including) >
        tag = nil;
        [scanner scanUpToString:@"<" intoString:NULL];
        [scanner scanUpToString:@">" intoString:&tag];
        if (tag) {
            NSString *t = [[NSString alloc] initWithFormat:@"%@>", tag];
            [tags addObject:t];
        }
    } while (![scanner isAtEnd]);
    NSMutableString *result = [[NSMutableString alloc] initWithString:self];
    NSString *replacement;
    for (NSString *t in tags) {
        // Inline tags are removed outright; all other tags become a space
        replacement = @" ";
        if ([t isEqualToString:@"<a>"] ||
            [t isEqualToString:@"</a>"] ||
            [t isEqualToString:@"<span>"] ||
            [t isEqualToString:@"</span>"] ||
            [t isEqualToString:@"<strong>"] ||
            [t isEqualToString:@"</strong>"] ||
            [t isEqualToString:@"<em>"] ||
            [t isEqualToString:@"</em>"]) {
            replacement = @"";
        }
        [result replaceOccurrencesOfString:t
                                withString:replacement
                                   options:NSLiteralSearch
                                     range:NSMakeRange(0, result.length)];
    }
    // Remove multi-spaces and line breaks
    // (stringByRemovingNewLinesAndWhitespace is assumed to be defined in the same category; not shown)
    return [result stringByRemovingNewLinesAndWhitespace];
}
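To keep <Text in angle brackets> as the question asks, one option is to strip only tags whose names are on a known list, rather than everything between < and >. This is a minimal sketch, not a full HTML parser: the helper name StripKnownTags is hypothetical and the tag list is an illustrative assumption that you would extend for your own content. Bracketed text that happens to look like a listed tag (for example <b>) would still be removed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes only opening/closing tags whose names are in the list below.
// Anything else in angle brackets, such as <Hello>, is left alone.
static NSString *StripKnownTags(NSString *input) {
    // Illustrative tag list (an assumption); extend it to cover the tags you expect.
    NSString *pattern = @"</?(?:p|div|span|a|b|i|em|strong|br|ul|ol|li|h[1-6])\\b[^>]*>";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    if (regex == nil) {
        return input; // pattern failed to compile; leave the input untouched
    }
    return [regex stringByReplacingMatchesInString:input
                                           options:0
                                             range:NSMakeRange(0, input.length)
                                      withTemplate:@""];
}
```

With the example from the question, StripKnownTags(@"<p>123 <Hello> abc</p>") leaves 123 <Hello> abc.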

How to encode odd HTML characters with Xcode?

I need to save a HTML page in my app, and when characters like "€" are found, the saved file displays them wrong.
I tried several encodings but none solves this, is there any solution?
I have also tried replacing the characters with their HTML entity names, but it still doesn't work.
Here's my code:
NSString *HTML = [web stringByEvaluatingJavaScriptFromString:@"document.getElementsByTagName('html')[0].innerHTML;"];
NSArray *path = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
NSString *filePath = [NSString stringWithFormat:@"%@/%@", [path objectAtIndex:0],@"code.html"];
int enc_arr[] = {
    NSISOLatin1StringEncoding, // ESP
    NSUTF8StringEncoding, // UTF-8
    NSShiftJISStringEncoding, // Shift_JIS
    NSJapaneseEUCStringEncoding, // EUC-JP
    NSISO2022JPStringEncoding, // JIS
    NSASCIIStringEncoding // ASCII
};
NSData *urlData= nil;
for (int i=0; i<6; i++) {
    urlData = [HTML dataUsingEncoding:enc_arr[i]];
    if (urlData!=nil) {
        break;
    }
}
[urlData writeToFile:filePath atomically:YES];
See these methods of NSString:
- (NSStringEncoding)smallestEncoding
- (NSStringEncoding)fastestEncoding
or just use the method below with flag set to YES:
- (NSData *)dataUsingEncoding:(NSStringEncoding)encoding allowLossyConversion:(BOOL)flag
but with this one you can lose some characters.
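A minimal sketch of how the strict and lossy conversions differ, assuming a hypothetical helper named EncodeString. It tries the strict conversion first and falls back to the lossy one only when the string cannot be represented in the requested encoding. Since UTF-8 can represent "€", the strict path is expected to succeed there.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: strict conversion first, lossy conversion only as a fallback.
static NSData *EncodeString(NSString *text, NSStringEncoding encoding) {
    // Returns nil when some character cannot be represented in `encoding`.
    NSData *data = [text dataUsingEncoding:encoding];
    if (data == nil) {
        // The lossy conversion may drop or substitute characters.
        data = [text dataUsingEncoding:encoding allowLossyConversion:YES];
    }
    return data;
}
```

For example, EncodeString(@"€", NSUTF8StringEncoding) takes the strict path, while EncodeString(@"€", NSASCIIStringEncoding) only succeeds through the lossy fallback.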
Ok I finally did it, it's not the best way but the only one that worked for me and without using external libraries:
-(NSString*)escapeHTML:(NSString*)code{
    NSMutableArray *maExceptions = [[NSMutableArray alloc] initWithObjects: @"Œ", @"œ", @"Š", @"š", @"Ÿ", @"ƒ", @"‘", @"’", @"‚", @"“", @"”", @"„", @"†", @"‡", @"•", @"…", @"‰", @"€", @"™", nil];
    for (int i=0; i<[maExceptions count]; i++) {
        // Replace each character with its hexadecimal numeric entity, e.g. &#x20ac;
        code = [code stringByReplacingOccurrencesOfString:[maExceptions objectAtIndex:i] withString:[NSString stringWithFormat:@"&#x%x;",[[maExceptions objectAtIndex:i] characterAtIndex:0]]];
    }
    return code;
}

Removing parts from HTML page in UIWebVIew

I'm trying to remove text from html page, and I'm using this code:
NSRange r;
while ((r = [commentsOnly rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound) {
    commentsOnly = [commentsOnly stringByReplacingCharactersInRange:r withString:@""];
    NSLog(@"clearing");
}
It removes HTML tags perfectly, but how can I remove only one tag, for example title or p? I don't want to remove just the tag itself: I want to remove the start tag (<p>), the content between the two tags, and the close tag (</p>).
If I understand your question, maybe this will help you:
NSString *string = @"</body>", *htmlString = @"ddsfsdf_<body>_sdfsfd_<body>ffff</body></body>";
NSRange range = [htmlString rangeOfString:string];
if (range.location != NSNotFound)
{
    range.length += range.location;
    range.location = 0;
    string = @"<body>";
    NSRange rangeOpen = [htmlString rangeOfString:string options:NSBackwardsSearch range:range];
    if (rangeOpen.location != NSNotFound)
    {
        range.length -= rangeOpen.location;
        range.location = rangeOpen.location;
        htmlString = [htmlString stringByReplacingCharactersInRange:range withString:@""];
        NSLog(@"%@", htmlString);
    }
}
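The same idea can be sketched with a regular expression that removes every occurrence of one element together with its content. The helper name RemoveElements is hypothetical. The sketch assumes the element is not nested inside itself (a lazy match stops at the first closing tag), so it is not a substitute for a real HTML parser.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes <tag ...>content</tag> for every occurrence of tagName.
// Assumes the element is not nested inside itself.
static NSString *RemoveElements(NSString *html, NSString *tagName) {
    NSString *tag = [NSRegularExpression escapedPatternForString:tagName];
    // \b keeps "p" from matching "pre"; .*? stops at the first closing tag.
    NSString *pattern = [NSString stringWithFormat:@"<%@\\b[^>]*>.*?</%@\\s*>", tag, tag];
    NSRegularExpressionOptions options =
        NSRegularExpressionCaseInsensitive | NSRegularExpressionDotMatchesLineSeparators;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:options error:NULL];
    if (regex == nil) {
        return html; // pattern failed to compile; leave the input untouched
    }
    return [regex stringByReplacingMatchesInString:html
                                           options:0
                                             range:NSMakeRange(0, html.length)
                                      withTemplate:@""];
}
```

For example, RemoveElements(commentsOnly, @"p") would drop each <p>...</p> block, including its text, and leave other tags in place.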
Use stringByEvaluatingJavaScriptFromString: to execute JavaScript within the UIWebView to do this. This is much less work and is also much more reliable, as it will use WebKit's HTML parser instead of naïve string replacement.
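A minimal sketch of that approach, assuming a hypothetical helper RemoveElementsScript that builds the JavaScript, and a UIWebView named webView that has already finished loading. The tag name is inserted into the script unescaped, so it should come from your own code and not from user input.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds JavaScript that removes every element with the given tag name.
// tagName is inserted as-is, so pass only trusted, literal tag names.
static NSString *RemoveElementsScript(NSString *tagName) {
    return [NSString stringWithFormat:
            @"(function(){"
            @"var els=document.getElementsByTagName('%@');"
            // getElementsByTagName returns a live collection, so it shrinks as nodes are removed.
            @"while(els.length>0){els[0].parentNode.removeChild(els[0]);}"
            @"})();", tagName];
}

// Usage, assuming `webView` is a loaded UIWebView:
// [webView stringByEvaluatingJavaScriptFromString:RemoveElementsScript(@"p")];
```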

Searching for keywords in HTML

The iOS app I'm writing displays an HTML page, and I would like to add a search feature where the user can search for instances of a keyword and highlight them.
What's the best way to do this?
NSString *filePath = PATH_OF_HTML_FILE;
NSError *err = nil;
NSString *pageHTML = [NSString stringWithContentsOfFile:filePath encoding:NSUTF8StringEncoding error:&err];
if(err)
{
    pageHTML = [NSString stringWithContentsOfFile:filePath encoding:NSASCIIStringEncoding error:&err];
}
if([searchTxtField.text length])
{
    NSRange range1 = [pageHTML rangeOfString:searchTxtField.text options:NSCaseInsensitiveSearch];
    if(range1.location != NSNotFound)
    {
        NSString *highlightedString = [pageHTML substringWithRange:range1];
        pageHTML = [pageHTML stringByReplacingOccurrencesOfString:highlightedString withString:[NSString stringWithFormat:@"<span style=\"background-color:yellow; color:red;\">%@</span>",highlightedString] options:NSCaseInsensitiveSearch range:NSMakeRange(0, [pageHTML length]) ];
        [webView loadHTMLString:pageHTML baseURL:[NSURL fileURLWithPath:filePath]];
    }
}