Get proper format of the text file in HTML form - html

I am using the following code to convert text to pdf form:
NSString *filePath = [[NSBundle mainBundle] pathForResource:@"All_lang_unicode" ofType:@"txt"];
NSString *str;
NSData *myData = [NSData dataWithContentsOfFile:filePath];
if (myData) {
str = [[NSString alloc] initWithData:myData encoding:NSUTF16StringEncoding];
NSLog(@"STRING : %@",str);
}
NSString *html = [NSString stringWithFormat:@"<body>%@</body>",str];
UIMarkupTextPrintFormatter *fmt = [[UIMarkupTextPrintFormatter alloc]
initWithMarkupText:html];
UIPrintPageRenderer *render = [[UIPrintPageRenderer alloc] init];
[render addPrintFormatter:fmt startingAtPageAtIndex:0];
CGRect page;
page.origin.x=0;
page.origin.y=0;
page.size.width=792;
page.size.height=612;
CGRect printable=CGRectInset( page, 0, 0 );
[render setValue:[NSValue valueWithCGRect:page] forKey:@"paperRect"];
[render setValue:[NSValue valueWithCGRect:printable] forKey:@"printableRect"];
NSLog(@"number of pages %ld",(long)[render numberOfPages]);
NSMutableData * pdfData = [NSMutableData data];
UIGraphicsBeginPDFContextToData( pdfData, CGRectZero, nil );
for (NSInteger i=0; i < [render numberOfPages]; i++)
{
UIGraphicsBeginPDFPage();
CGRect bounds = UIGraphicsGetPDFContextBounds();
[render drawPageAtIndex:i inRect:bounds];
}
UIGraphicsEndPDFContext();
NSArray *paths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
NSString *documentsDirectory = [paths objectAtIndex:0];
NSString * pdfFile = [documentsDirectory stringByAppendingPathComponent:@"test.pdf"];
[pdfData writeToFile:pdfFile atomically:YES];
The problem is that the text loses its formatting. When I print the string with NSLog() I get the proper content, but when I place the string in the HTML, the spacing and newlines are missing and everything comes out on one continuous line.
(UPDATE : )
NSLog OUTPUT:(Proper)
NEW DELHI: Sachin Tendulkar's streak of low scores might have raised a question mark over his future but senior BCCI official and IPL chairman Rajiv Shukla on Monday came out in support of the senior batsman saying one needs to look at his "colossal record" before making any comment.
"He will hang up his boots when he thinks it's time for him to go. He does not need any advice on this. Before making a comment on his performance you have to see his colossal record and his past performance," Shukla told reporters outside the Parliament adding that the veteran cricketer will come back strongly in the forthcoming matches.
and I am getting:
NEW DELHI: Sachin Tendulkar's streak of low scores might have raised a question mark over his future but senior BCCI official and IPL chairman Rajiv Shukla on Monday came out in support of the senior batsman saying one needs to look at his "colossal record" before making any comment. "He will hang up his boots when he thinks it's time for him to go. He does not need any advice on this. Before making a comment on his performance you have to see his colossal record and his past performance," Shukla told reporters outside the Parliament adding that the veteran cricketer will come back strongly in the forthcoming matches.
Can anyone please suggest a modification to this code so that I get the proper formatting?

If I get it right, you should replace your new line characters with <br> or <p>.
Try
str = [str stringByReplacingOccurrencesOfString:@"\n" withString:@"<br>"];
How to detect new lines in Objective-C
A solution to your next question might look like this:
NSArray *words = [str componentsSeparatedByString:@" "];
NSString *line = @"";
NSUInteger maxLineLength = 100;
NSString *resultStr = @"";
for (NSString *word in words) {
    if ([line length] == 0) {
        // First word of a line: no leading space.
        line = word;
    } else if ([line length] + 1 + [word length] > maxLineLength) {
        // Adding the word would overflow the line: emit it and start a new one.
        resultStr = [resultStr stringByAppendingFormat:@"%@<br>", line];
        line = word;
    } else {
        line = [line stringByAppendingFormat:@" %@", word];
    }
}
resultStr = [resultStr stringByAppendingString:line];
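If the text file can contain characters such as < or &, or Windows-style line endings, a plain replacement of "\n" may not be enough. The following is a minimal sketch of one way to handle that, assuming you are happy to wrap each line in a <p> element; the helper name HTMLBodyFromPlainText is made up for illustration and is not part of any framework.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): turns plain text into
// HTML paragraphs so that line breaks survive in the rendered output.
static NSString *HTMLBodyFromPlainText(NSString *text) {
    // Escape characters that have special meaning in HTML.
    // "&" goes first so the entities added below are not escaped twice.
    NSString *escaped = [text stringByReplacingOccurrencesOfString:@"&" withString:@"&amp;"];
    escaped = [escaped stringByReplacingOccurrencesOfString:@"<" withString:@"&lt;"];
    escaped = [escaped stringByReplacingOccurrencesOfString:@">" withString:@"&gt;"];

    // Normalize "\r\n" and lone "\r" line endings to "\n".
    escaped = [escaped stringByReplacingOccurrencesOfString:@"\r\n" withString:@"\n"];
    escaped = [escaped stringByReplacingOccurrencesOfString:@"\r" withString:@"\n"];

    // Wrap every non-empty line in its own paragraph.
    NSMutableString *html = [NSMutableString stringWithString:@"<body>"];
    for (NSString *line in [escaped componentsSeparatedByString:@"\n"]) {
        if ([line length] > 0) {
            [html appendFormat:@"<p>%@</p>", line];
        }
    }
    [html appendString:@"</body>"];
    return html;
}
```

The result could then be passed to initWithMarkupText: in place of the html string built in the question.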

Related

Is this a legal/safe way to pull data from websites on iOS?

After playing around with a few different ways to pull website data I developed this simple and quick solution that appears to work well:
int zip = 13153;
int lowerBound = 10000;
int upperBound = 99999;
bool foundValidZip;
@implementation ViewController
- (void)viewDidLoad {
[super viewDidLoad];
while (foundValidZip == false) {
zip = lowerBound + arc4random() % (upperBound - lowerBound);
// Do any additional setup after loading the view, typically from a nib.
NSString *urString = [NSString stringWithFormat:@"http://www.zip-info.com/cgi-local/zipsrch.exe?zip=%i&Go=Go",zip];
NSURL *URL = [NSURL URLWithString:urString];
NSData *data = [NSData dataWithContentsOfURL:URL];
// Assuming data is in UTF-8. initWithData: is used because [data bytes] is not NUL-terminated.
NSString *html = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
NSLog(@"%@",html);
NSMutableArray *names = [self stringsBetweenString:@"</th></tr><tr><td align=center>" andString:@"</font></td>" andText:html];
NSMutableArray *states = [self stringsBetweenString:@"</font></td><td align=center>" andString:@"</font></td><td align=center>" andText:html];
if ([names count] > 0 && [states count] > 0) {
NSString *name = [names objectAtIndex:0];
NSString *state = [states objectAtIndex:0];
self.nameLabel.text = name;
self.stateLabel.text = state;
self.zipLabel.text = [NSString stringWithFormat:@"%i",zip];
foundValidZip = true;
}
else {
foundValidZip = false;
}
}
}
-(NSMutableArray*)stringsBetweenString:(NSString*)start andString:(NSString*)end andText:(NSString*)text {
NSMutableArray* strings = [NSMutableArray arrayWithCapacity:0];
NSRange startRange = [text rangeOfString:start];
for( ;; )
{
if (startRange.location != NSNotFound)
{
NSRange targetRange;
targetRange.location = startRange.location + startRange.length;
targetRange.length = [text length] - targetRange.location;
NSRange endRange = [text rangeOfString:end options:0 range:targetRange];
if (endRange.location != NSNotFound)
{
targetRange.length = endRange.location - targetRange.location;
[strings addObject:[text substringWithRange:targetRange]];
NSRange restOfString;
restOfString.location = endRange.location + endRange.length;
restOfString.length = [text length] - restOfString.location;
startRange = [text rangeOfString:start options:0 range:restOfString];
}
else
{
break;
}
}
else
{
break;
}
}
NSLog(@"%@",strings);
return strings;
}
@end
Essentially, this queries a website that looks up the city associated with a ZIP code and fetches the HTML for a random ZIP code. The program then extracts specific bits of information from that HTML by searching for text between a unique pair of start and end "caps". I've used this "cap" method in a few other sample applications. Some of these do not actually query the website, but fetch data from a static URL that is updated frequently. One of the only pitfalls I can see is that if the HTML changes, this may stop working. Other than that, it seems to work really well and is extremely quick. Before I publish any of my applications, I want to make sure that a large number of queries will not harm the websites or cause other problems for me or the webmaster. Is this OK to do? And is there a better alternative (not for this specific purpose, ZIP codes, but for pulling data in general)?
What you're doing is called scraping the web site / page. It's a general approach, but one that isn't ideal and comes with a number of pitfalls...
Generally speaking, you're better off not having any scraping code inside your app, because your app will take quite a while to change and redeploy to the store if the website changes and you need to update.
So, it's best to either have a server of your own do the scraping and then provide your 'sanitised' version of the data to the app, or to use a reconfigurable 3rd party service (like Kimono, I've never used it but the website is colourful) to abstract your app from the nitty gritty.
As for load, to the website your app / service is just like a normal user, so the website needs to be able to handle the total number of users in general.
I agree with the comment from @paulw11 about legality: if you don't own the website or have a relationship with its owner, you should establish one.
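To illustrate the "server of your own" option: if your server did the scraping and returned a small JSON document, the app side could shrink to something like the sketch below. The JSON shape ({"zip": ..., "city": ..., "state": ...}) and the function name PlaceFromJSONData are assumptions made up for this example, not an existing API.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical payload from your own server (shape is an assumption):
//   {"zip": "13153", "city": "Example City", "state": "NY"}
// Returns a dictionary with "city" and "state", or nil if the payload is unusable.
static NSDictionary<NSString *, NSString *> *PlaceFromJSONData(NSData *data) {
    if (data == nil) {
        return nil; // NSJSONSerialization raises an exception on nil data
    }
    NSError *error = nil;
    id object = [NSJSONSerialization JSONObjectWithData:data options:0 error:&error];
    if (![object isKindOfClass:[NSDictionary class]]) {
        return nil; // malformed JSON or an unexpected top-level type
    }
    NSDictionary *json = object;
    NSString *city = json[@"city"];
    NSString *state = json[@"state"];
    if (![city isKindOfClass:[NSString class]] || ![state isKindOfClass:[NSString class]]) {
        return nil; // required fields missing or of the wrong type
    }
    return @{ @"city": city, @"state": state };
}
```

With this split, a change in the website's HTML only requires a fix on the server; the app keeps parsing the same JSON.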

Loading Multiple lines of HTML parse into one UITextField

I have a set of HTML code, here:
<div id="content_text">
<p>Year 11 students will be making their course selections online this year.
</p>
<p>Information about this system has been made available through Tutor sessions. Each student will have an individual password. Once subject selections have been made students are to print out a copy of their choices and then have this form signed by themselves, their parent and their Tutor. Forms are to be completed by 22 August. Course books can be borrowed from the Library or are available online.
My problem is that this comes from an RSS feed article web page, and there may be anywhere from 1 to 11 <p> tags within this one <div id="content_text">. How can I fetch all of the <p> elements in this div and display them, formatted, in a UITextField?
I am currently using XPathQuery, by the way, so my parsing code looks like this:
NSData *tutorialsHtmlDataTwo = [NSData dataWithContentsOfURL:[NSURL URLWithString:_storyLink]];
TFHpple *tutorialsParserTwo = [TFHpple hppleWithHTMLData:tutorialsHtmlDataTwo];
NSString *tutorialsXpathQueryStringTwo = @"//div[@id='content_text']/p";
NSArray *tutorialsNodesTwo = [tutorialsParserTwo searchWithXPathQuery:tutorialsXpathQueryStringTwo];
NSMutableArray *newTutorialsTwo = [[NSMutableArray alloc] initWithCapacity:0];
for (TFHppleElement *element in tutorialsNodesTwo) {
Tutorial *tutorialTwo = [[Tutorial alloc] init];
[newTutorialsTwo addObject:tutorialTwo];
tutorialTwo.title = [[element firstChild] content];
_rssBody.text = [NSString stringWithFormat:@"%@", [[element firstChild] content]];
}
So as you can see it will only parse the second line. Any help appreciated.
Thanks, SebOH.
Use this query to find all the elements inside the given element:
//div[@id='content_text']
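Whichever query you use, the loop in the question assigns _rssBody.text on every pass, so only the last paragraph stays visible. One possible fix is to collect the paragraph strings first and join them afterwards. Here is a minimal Foundation-only sketch; the helper name JoinParagraphs is hypothetical.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: trims each paragraph, drops empty ones, and joins
// the rest with a blank line so they read as separate paragraphs.
static NSString *JoinParagraphs(NSArray<NSString *> *paragraphs) {
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableArray<NSString *> *cleaned = [NSMutableArray array];
    for (NSString *paragraph in paragraphs) {
        NSString *trimmed = [paragraph stringByTrimmingCharactersInSet:whitespace];
        if ([trimmed length] > 0) {
            [cleaned addObject:trimmed];
        }
    }
    return [cleaned componentsJoinedByString:@"\n\n"];
}
```

In the question's loop you would add each non-nil [[element firstChild] content] to an NSMutableArray, and after the loop set _rssBody.text = JoinParagraphs(thatArray).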

Trying to pull tabledata out from html

Basically, I need to parse the td (table data) elements from this HTML file, and I need to get the right XPath. I am using the raywenderlich tutorial as a model for this task, and here is the code I have so far.
NSURL *tutorialsUrl = [NSURL URLWithString:@"http://example.com/events"];
NSData *tutorialsHtmlData = [NSData dataWithContentsOfURL:tutorialsUrl];
// 2
TFHpple *tutorialsParser = [TFHpple hppleWithHTMLData:tutorialsHtmlData];
// 3
NSString *tutorialsXpathQueryString = @"This is where I need to enter my xpath to retrieve the table data";
NSArray *tutorialsNodes = [tutorialsParser searchWithXPathQuery:tutorialsXpathQueryString];
I have the HTML path to this element thanks to Firebug, which I will post below.
/<html lang="en">/<body>/div id="page" class="container">/<div class="span-19">/<div id="content">/<div>/<table id=yw0 class="detail-view">/<tbody>/<tr class="even">/<td>moo</td>/
I need the text moo to be parsed. Any help will be deeply appreciated.
This is the XPath I get from Firebug as well, but it didn't work at all.
/html/body/div/div[4]/div/div/table/tbody/tr[2]/td
First, split the text into substrings so that each one contains a single element to extract:
NSArray *split = [text componentsSeparatedByString:@"<td>"];
The first object in split contains nothing you want, so skip it. For every other substring in the array, search for the closing "</td>" tag:
NSRange range = [string rangeOfString:@"</td>"];
and then drop the tag and everything after it:
- (NSString *)substringToIndex:(NSUInteger)anIndex // the index is range.location from the search above
EDIT:
Another possibility is to call componentsSeparatedByString: with "</td>" in place of the second and third steps; the first item of each resulting array is then the text you want.
EDIT2: (whole code)
NSString *originalText = @" /<html lang=\"en\">/<body>/div id=\"page\" class=\"container\">/<div class=\"span-19\">/<div id=\"content\">/<div>/<table id=yw0 class=\"detail-view\">/<tbody>/<tr class=\"even\">/<td>moo1</td><td>moo2</td>/";
NSArray *separatedParts = [originalText componentsSeparatedByString:@"<td>"];
NSMutableArray *arrayOfResults = [[NSMutableArray alloc] init];
// Start at 1: the first part is everything before the first <td>.
for (NSUInteger i = 1; i < separatedParts.count; i++) {
    NSString *part = [separatedParts objectAtIndex:i];
    NSRange range = [part rangeOfString:@"</td>"];
    if (range.location == NSNotFound) {
        continue; // unterminated cell: skip it instead of crashing in substringToIndex:
    }
    [arrayOfResults addObject:[part substringToIndex:range.location]];
}
I have slightly altered the original text to show that it really works for a table with more than one item.
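Note that splitting on the literal "<td>" misses cells that carry attributes, such as <td class="x">. As an alternative technique (not what the code above does), a regular expression with a capture group can cope with that. This is only a sketch: the function name TableCellTexts is made up, and like any regex approach it will not handle nested tables correctly.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the raw inner text of every <td>...</td> pair.
// Assumes cells are not nested inside one another.
static NSArray<NSString *> *TableCellTexts(NSString *html) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<td\\b[^>]*>(.*?)</td>"
                                                  options:(NSRegularExpressionCaseInsensitive |
                                                           NSRegularExpressionDotMatchesLineSeparators)
                                                    error:NULL];
    NSMutableArray<NSString *> *cells = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:html options:0 range:NSMakeRange(0, html.length)];
    for (NSTextCheckingResult *match in matches) {
        // Capture group 1 is the text between the opening and closing tags.
        [cells addObject:[html substringWithRange:[match rangeAtIndex:1]]];
    }
    return cells;
}
```

The lazy quantifier (.*?) keeps each match inside a single cell, so two cells on the same line are returned separately.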

How can I parse tables in HTML?

I'm trying to parse an HTML page with a lot of tables. I searched the net for how to parse HTML with Objective-C and found hpple. I then looked for a tutorial, which led me to:
http://www.raywenderlich.com/14172/how-to-parse-html-on-ios
With this tutorial I tried to parse some forum news which has a lot of tables from this site (Hebrew): news forum
I tried to parse the news title, but I don't know what to write in my code. Every time I try to reach the path I get, "Nodes was nil."
The code of my latest attempt is:
NSURL *contributorsUrl = [NSURL URLWithString:@"http://rotter.net/cgi-bin/listforum.pl"];
NSData *contributorsHtmlData = [NSData dataWithContentsOfURL:contributorsUrl];
// 2
TFHpple *contributorsParser = [TFHpple hppleWithHTMLData:contributorsHtmlData];
// 3
NSString *contributorsXpathQueryString = @"//body/div/center/center/table[@cellspacing=0]/tbody/tr/td/table[@cellspacing=1]/tbody/tr[@bgcolor='#FDFDFD']/td[@align='right']/font[@class='text15bn']/font[@face='Arial']/a/b";
NSArray *contributorsNodes = [contributorsParser searchWithXPathQuery:contributorsXpathQueryString];
// 4
NSMutableArray *newContributors = [[NSMutableArray alloc] initWithCapacity:0];
for (TFHppleElement *element in contributorsNodes) {
// 5
Contributor *contributor = [[Contributor alloc] init];
[newContributors addObject:contributor];
// 6
Could somebody guide me through to getting the titles?
Not sure if this is an option for you, but if the desired table has a unique id you could use a messy approach: load the HTML into a UIWebView and get the contents via -stringByEvaluatingJavaScriptFromString:, like this:
// desired table container's id is "msg"
NSString* value = [webView stringByEvaluatingJavaScriptFromString:@"document.getElementById('msg').innerHTML"];

iOS: Strip <img...> from NSString (a html string)

So I have an NSString which is basically an HTML string with all the usual HTML elements. The specific thing I would like to do is strip all the img tags from it.
The img tags may or may not have max-width, style or other attributes so I do not know their length up front. They always end with />
How could I do this?
EDIT: Based on nicolasthenoz's answer, I came up with a solution that requires less code:
NSString *HTMLTagss = @"<img[^>]*>"; //regex to remove img tag
NSString *stringWithoutImage = [htmlString stringByReplacingOccurrencesOfRegex:HTMLTagss withString:@""];
You can use the NSString method stringByReplacingOccurrencesOfString:withString:options:range: with the NSRegularExpressionSearch option:
NSString *result = [html stringByReplacingOccurrencesOfString:@"<img[^>]*>" withString:@"" options:NSCaseInsensitiveSearch | NSRegularExpressionSearch range:NSMakeRange(0, [html length])];
Or you can also use the replaceMatchesInString:options:range:withTemplate: method of NSRegularExpression. Thus, assuming you have your html in a NSMutableString *html, you can:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<img[^>]*>"
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:nil];
[regex replaceMatchesInString:html
                      options:0
                        range:NSMakeRange(0, html.length)
                 withTemplate:@""];
I'd personally lean towards one of these options over the stringByReplacingOccurrencesOfRegex method of RegexKitLite. There's no need to introduce a third-party library for something as simple as this unless there was some other compelling issue.
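If your string is immutable, the same idea can be wrapped in a small function using stringByReplacingMatchesInString:options:range:withTemplate:. The sketch below is one way to do it; the function name StringByStrippingImgTags is made up, and the \b after "img" is my own addition so that a tag such as <imgfoo> would not match.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns a copy of html with every <img ...> tag removed.
static NSString *StringByStrippingImgTags(NSString *html) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<img\\b[^>]*>"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        // Should not happen with a constant pattern, but avoid failing silently.
        NSLog(@"Could not compile pattern: %@", error);
        return html;
    }
    return [regex stringByReplacingMatchesInString:html
                                           options:0
                                             range:NSMakeRange(0, html.length)
                                      withTemplate:@""];
}
```

Because [^>]* stops at the first ">", this assumes no attribute value contains a literal ">" character.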
Use a regular expression: find the matches in your string and remove them.
Here is how:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<img[^>]*>"
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:nil];
NSMutableString *mutableString = [yourStringToStripFrom mutableCopy];
NSInteger offset = 0; // keeps track of range changes in the string due to replacements.
for (NSTextCheckingResult *result in [regex matchesInString:yourStringToStripFrom
                                                    options:0
                                                      range:NSMakeRange(0, [yourStringToStripFrom length])]) {
    NSRange resultRange = [result range];
    resultRange.location += offset;
    // make the replacement (each match is replaced with the empty string)
    [mutableString replaceCharactersInRange:resultRange withString:@""];
    // update the offset: the string just shrank by the length of the removed match
    offset -= (NSInteger)resultRange.length;
}
You can use the function below in Swift 4 and 5:
func filterImgTag(text: String) -> String {
    return text.replacingOccurrences(of: "<img[^>]*>", with: "", options: String.CompareOptions.regularExpression)
}
Hope it helps! Comment below if it works for you. Thanks.