Scan for images in website - Xcode - html

I am making an app that shows the latest news along with an image. I get the text by using a scanner like this:
NSMutableURLRequest *request = [[NSMutableURLRequest alloc] init];
/* set headers, etc. on request if needed */
[request setURL:[NSURL URLWithString:@"http://stackoverflow.com/questions/22671347/nsuinteger-should-not-be-used-in-format-strings"]];
NSData *data = [NSURLConnection sendSynchronousRequest:request returningResponse:NULL error:NULL];
NSString *html = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
NSScanner *scanner = [NSScanner scannerWithString:html];
NSString *token = nil;
[scanner scanUpToString:@"<p>" intoString:NULL];
[scanner scanUpToString:@"</p>" intoString:&token];
int length = 3;
token = [token substringFromIndex:length];
textView.text = token;
Now I was wondering if I could use the same type of code to scan the website, find the first image and put it in an image view. It doesn't have to be the same type of code; post whatever you know, using any method.
In summary:
I want a piece of code that will scan a webpage, pick up the first image and place it in an image view.
Thanks to the people who take the time to help me.
THANKS AGAIN!!!
BYE!!!

NSScanner is not an HTML parser; it is only intended for scanning values from an NSString object. If you are doing the odd scan you could probably get away with it, but it doesn't seem like...
The correct approach is to use the Libxml2 library, which is included with Xcode. It is written in C and does not come with an Objective-C or Swift wrapper. Libxml2 is the XML C parser and toolkit developed for the GNOME project. Alternatively, I would recommend an open-source project such as HTMLReader, an HTML parser with CSS selectors built on Objective-C and Foundation. It parses HTML just like a browser and is written entirely in Objective-C.
Example (using HTMLReader):
HTMLDocument *document = [HTMLDocument documentWithString:html]; // get your html string
NSLog(@"IMG: %@", [document firstNodeMatchingSelector:@"img"].attributes[@"src"]); // => URL of the first image
To find images, just use img as the selector, as above, and you're set.
If you're using Libxml2, take a look at the HTMLparser.h header file to parse and retrieve HTML tags.
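If adding a parser library is not an option, a rough Foundation-only fallback is to pull the src attribute of the first img tag with NSRegularExpression. This is only a sketch of a swapped-in technique, not a real HTML parser: the helper name FirstImageSource is hypothetical, and the pattern assumes a quoted src attribute inside a well-formed tag.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of any library): returns the value of the
// src attribute of the first <img> tag in `html`, or nil if none is found.
static NSString *FirstImageSource(NSString *html) {
    if (html.length == 0) return nil;

    // Assumes the attribute is written as src="..." or src='...'.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<img\\b[^>]*?\\ssrc\\s*=\\s*[\"']([^\"']+)[\"']"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    NSTextCheckingResult *match = [regex firstMatchInString:html
                                                    options:0
                                                      range:NSMakeRange(0, html.length)];
    if (match == nil) return nil;

    // Capture group 1 holds the attribute value without the quotes.
    return [html substringWithRange:[match rangeAtIndex:1]];
}
```

The returned string may be a relative path, so it would normally be resolved against the page URL with +[NSURL URLWithString:relativeToURL:] before the image data is downloaded and handed to the image view.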

Related

Display a preview of the website

I need to display a preview of a website, given its URL, as a single image (e.g. like Facebook does in Messenger when you send a URL to someone). Is there any way to achieve that without actually loading the HTML file and reading its metadata?
Try this,
NSString *urlStr = @"http://cdn.sstatic.net/Sites/stackoverflow/company/img/logos/so/so-icon.png?v=c78bd457575a";
NSURL *url=[NSURL URLWithString: urlStr];
NSData *data=[NSData dataWithContentsOfURL:url];
UIImage *image=[[UIImage alloc] initWithData:data];
UIImageView *imagview = [[UIImageView alloc]initWithImage:image];
imagview.frame = CGRectMake(250, 500, 100, 100);
[self.view addSubview:imagview];
This gives you just the thumbnail or image.
You can use the URLEmbeddedView library for more functionality. I think this is the library you want.

Parsing a web page with TFHpple

I'm trying to write a very simple iOS app that will parse a webpage (http://arxiv.org/list/cond-mat/recent) and display a simplified version of it. I chose to use TFHpple to parse this page. I want to get titles of papers and display them in the TableViewController. The HTML container for paper descriptions looks like:
<div class="list-title">
<span class="descriptor">Title:</span> Encoding Complexity within Supramolecular Analogues of Frustrated Magnets
</div>
Function that I use to parse and get the values is the following (thanks to raywenderlich.com):
- (void) loadPapers{
NSURL *papersURL = [NSURL URLWithString:@"http://www.arxiv.org/list/cond-mat/recent"];
NSData *papersHTMLData = [NSData dataWithContentsOfURL:papersURL];
TFHpple *papersParser = [TFHpple hppleWithHTMLData:papersHTMLData];
NSString *papersXpathQueryString = @"//div[@class='list-title']";
NSArray *papersNodes = [papersParser searchWithXPathQuery:papersXpathQueryString];
NSMutableArray *newPapers = [[NSMutableArray alloc] initWithCapacity:0];
for (TFHppleElement *element in papersNodes){
Paper *paper = [[Paper alloc] init];
[newPapers addObject:paper];
paper.title = [[element firstChild] content];
}
_objects = newPapers;
[self.tableView reloadData];
}
This function is supposed to parse the entire HTML page and return data into TableView. However, when I try it returns empty objects into the paperNodes array. Basically, the number of the elements is correct (~25), but they're all empty and I am not sure why.
Any help is greatly appreciated! Thanks!
I have rewritten your code with HTMLKit. It looks like this:
NSURL *papersURL = [NSURL URLWithString:@"http://www.arxiv.org/list/cond-mat/recent"];
NSData *papersHTMLData = [NSData dataWithContentsOfURL:papersURL];
NSString *htmlString = [[NSString alloc] initWithData:papersHTMLData encoding:NSUTF8StringEncoding];
HTMLDocument *document = [HTMLDocument documentWithString:htmlString];
NSArray *divs = [document querySelectorAll:@"div[class='list-title']"];
for (HTMLElement *element in divs) {
NSLog(@"%@", element.textContent);
}
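Given the sample markup in the question, the textContent of each div would presumably still carry the "Title:" descriptor from the span plus the surrounding whitespace. A minimal sketch of tidying that up with plain Foundation calls follows; the helper name CleanTitle is hypothetical and the "Title:" label is taken from the markup shown above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: turns "\nTitle: Some paper\n" into "Some paper".
static NSString *CleanTitle(NSString *rawText) {
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSString *text = [rawText stringByTrimmingCharactersInSet:whitespace];

    // Drop the descriptor label if the text starts with it.
    NSString *label = @"Title:";
    if ([text hasPrefix:label]) {
        text = [text substringFromIndex:label.length];
    }
    // Trim again to remove the space that followed the label.
    return [text stringByTrimmingCharactersInSet:whitespace];
}
```

Inside the loop, paper.title could then be set to CleanTitle(element.textContent) instead of logging the raw text.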
Back to your question in the comment:
Could you give some useful links that you find good to learn about HTMLKit?
You can check out the examples on the project's GitHub page. The source code is documented and using it is relatively straightforward. If you have basic HTML & CSS experience then using HTMLKit would be just as easy. Unfortunately, there are no other resources for learning it yet.
Probably the [element firstChild] is returning nil. I suggest you add some NSLog statements to track the data extraction and help you pinpoint the error.

Find link on webpage

I have a google webpage, with a search already loaded, and I need to find the first link on the webpage and get the information(the brief summary) under the link. I imagine that this requires some sort of HTML download of the webpage, and then a search through that file for a link tag, but I have no idea how to get a HTML file off of a webpage and save it using Xcode.
Getting an HTML file off a webpage is very easy to do; just use NSString's method +stringWithContentsOfURL:encoding:error:
NSError *error = nil;
NSString *html = [NSString stringWithContentsOfURL:[NSURL URLWithString:@"http://www.example.com"] encoding:NSUTF8StringEncoding error:&error];
if (html == nil)
{
// oh, that's bad: the fetch failed, and error describes why
}
Then you can search for the first link e.g. by using -rangeOfString
NSRange rangeOfLink = [html rangeOfString:@"bla"];
if (rangeOfLink.location == NSNotFound)
{
// that's bad, too
}
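To make the -rangeOfString: idea a little more concrete, here is a minimal sketch that looks for the first anchor tag and returns the value of its href attribute. The helper name FirstLinkHref is hypothetical, and the code assumes the attribute is written with double quotes; real-world pages such as search results are better handled by a proper HTML parser.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the href of the first <a ...> tag, or nil.
static NSString *FirstLinkHref(NSString *html) {
    if (html.length == 0) return nil;

    // Find the start of the first anchor tag.
    NSRange anchor = [html rangeOfString:@"<a " options:NSCaseInsensitiveSearch];
    if (anchor.location == NSNotFound) return nil;

    // Search for href=" only in the text after the anchor start.
    NSUInteger searchStart = NSMaxRange(anchor);
    NSRange afterAnchor = NSMakeRange(searchStart, html.length - searchStart);
    NSRange attribute = [html rangeOfString:@"href=\""
                                    options:NSCaseInsensitiveSearch
                                      range:afterAnchor];
    if (attribute.location == NSNotFound) return nil;

    // The value runs from the end of href=" up to the closing quote.
    NSUInteger valueStart = NSMaxRange(attribute);
    NSRange rest = NSMakeRange(valueStart, html.length - valueStart);
    NSRange closingQuote = [html rangeOfString:@"\"" options:0 range:rest];
    if (closingQuote.location == NSNotFound) return nil;

    return [html substringWithRange:NSMakeRange(valueStart, closingQuote.location - valueStart)];
}
```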

iOS - Showing App version in HTML page

I'm showing About in form of html in my iOS application. About.html is also bundled along with the application.
I want to show the application version in the About HTML page automatically so that I don't have to edit the HTML manually every time I bump the version.
Currently what I'm doing is as below:
I'm creating the html as <b>Version %@</b>
In Objective C code, I'm writing it as
NSString* aboutFilePath = [[NSBundle mainBundle] pathForResource:@"About" ofType:@"html"];
NSString* htmlStr = [[NSString alloc] initWithContentsOfFile:aboutFilePath encoding:NSUTF8StringEncoding error:nil];
NSString* formattedStr = [NSString stringWithFormat:htmlStr, [self appNameAndVersionNumberDisplayString]];
[self.webView loadHTMLString:formattedStr baseURL:nil];
- (NSString *)appNameAndVersionNumberDisplayString {
NSDictionary *infoDictionary = [[NSBundle mainBundle] infoDictionary];
NSString *appDisplayName = [infoDictionary objectForKey:@"CFBundleDisplayName"];
NSString *majorVersion = [infoDictionary objectForKey:@"CFBundleShortVersionString"];
NSString *minorVersion = [infoDictionary objectForKey:@"CFBundleVersion"];
return [NSString stringWithFormat:@"%@, Version %@ (%@)",
appDisplayName, majorVersion, minorVersion];
}
Is it good way to do it or is there any better way to achieve it?
You are doing it in a near-perfect manner. There is no problem in proceeding with your approach.
That's ok. If you want to localise the text in the future, that might change the order of the parameters in the text. In that case it would be better to use something like SUB_VERSION as the identifier to be replaced, together with the string replacement methods instead of the format method.
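The placeholder idea can be sketched as below. The token SUB_VERSION comes from the answer; the helper name FillAboutTemplate is hypothetical. One further reason to prefer replacement, offered here as a general observation: with stringWithFormat:, any literal percent sign in About.html (for example in inline CSS) would be read as a format specifier.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces every SUB_VERSION token in the template
// with the given version text, leaving the rest of the HTML untouched.
static NSString *FillAboutTemplate(NSString *templateHTML, NSString *versionText) {
    // The replacement string must not be nil, so fall back to an empty string.
    NSString *replacement = versionText ?: @"";
    return [templateHTML stringByReplacingOccurrencesOfString:@"SUB_VERSION"
                                                   withString:replacement];
}
```

With this in place, About.html would contain <b>Version SUB_VERSION</b>, and the result of FillAboutTemplate(htmlStr, [self appNameAndVersionNumberDisplayString]) would be passed to loadHTMLString:baseURL:.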

Reading HTML content from a UIWebView

Is it possible to read the raw HTML content of a web page that has been loaded into a UIWebView?
If not, is there another way to pull raw HTML content from a web page in the iPhone SDK (such as an equivalent of the .NET WebClient::openRead)?
The second question is actually easier to answer. Look at the stringWithContentsOfURL:encoding:error: method of NSString - it lets you pass in a URL as an instance of NSURL (which can easily be instantiated from NSString) and returns a string with the complete contents of the page at that URL. For example:
NSString *googleString = @"http://www.google.com";
NSURL *googleURL = [NSURL URLWithString:googleString];
NSError *error;
NSString *googlePage = [NSString stringWithContentsOfURL:googleURL
encoding:NSASCIIStringEncoding
error:&error];
After running this code, googlePage will contain the HTML for www.google.com, and error will contain any errors encountered in the fetch. (You should check the contents of error after the fetch.)
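As a minimal sketch of that error check: the usual Cocoa convention is to test the return value first and only then look at the error object. The helper name FetchPage is hypothetical, and the sketch assumes UTF-8 rather than the ASCII encoding used above. Note that the call is synchronous, so it blocks the calling thread until the fetch finishes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: fetches the contents at `url` as a string.
// Returns nil on failure and, if `outError` is non-NULL, fills it in.
static NSString *FetchPage(NSURL *url, NSError **outError) {
    NSError *error = nil;
    NSString *page = [NSString stringWithContentsOfURL:url
                                              encoding:NSUTF8StringEncoding
                                                 error:&error];
    if (page == nil) {
        // Only trust `error` when the return value signals failure.
        NSLog(@"Fetch failed: %@", error.localizedDescription);
        if (outError != NULL) *outError = error;
        return nil;
    }
    return page;
}
```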
Going the other way (from a UIWebView) is a bit trickier, but is basically the same concept. You'll have to pull the request from the view, then do the fetch as before:
NSURL *requestURL = [[yourWebView request] URL];
NSError *error;
NSString *page = [NSString stringWithContentsOfURL:requestURL
encoding:NSASCIIStringEncoding
error:&error];
EDIT: Both these methods take a performance hit, however, since they do the request twice. You can get around this by grabbing the content from a currently-loaded UIWebView using its stringByEvaluatingJavaScriptFromString: method, as such:
NSString *html = [yourWebView stringByEvaluatingJavaScriptFromString:
@"document.body.innerHTML"];
This will grab the current HTML contents of the view using the Document Object Model, parse the JavaScript, then give it to you as an NSString* of HTML.
Another way is to do your request programmatically first, then load the UIWebView from what you requested. Let's say you take the second example above, where you have NSString *page as the result of a call to stringWithContentsOfURL:encoding:error:. You can then push that string into the web view using loadHTMLString:baseURL:, assuming you also held on to the NSURL you requested:
[yourWebView loadHTMLString:page baseURL:requestURL];
I'm not sure, however, if this will run JavaScript found in the page you load (the method name, loadHTMLString, is somewhat ambiguous, and the docs don't say much about it).
For more info:
UIWebView class reference
NSString class reference
NSURL class reference
If you want to extract the contents of an already-loaded UIWebView, use -stringByEvaluatingJavaScriptFromString:. For example:
NSString *html = [webView stringByEvaluatingJavaScriptFromString: @"document.body.innerHTML"];
To get the whole HTML raw data (with <head> and <body>):
NSString *html = [webView stringByEvaluatingJavaScriptFromString:@"document.documentElement.outerHTML"];
Note that the NSString stringWithContentsOfURL will report a totally different user-agent string than the UIWebView making the same request. So if your server is user-agent aware, and sending back different html depending on who is asking for it, you may not get correct results this way.
Also note that the @"document.body.innerHTML" mentioned above will only display what is in the body tag. If you use @"document.all[0].innerHTML" you will get both head and body. Which is still not the complete contents of the UIWebView, since it will not get back the !doctype or html tags, but it is a lot closer.
To read:
NSString *html = [myWebView stringByEvaluatingJavaScriptFromString: @"document.getElementById('your div id').textContent"];
NSLog(@"%@", html);
To modify:
html = [myWebView stringByEvaluatingJavaScriptFromString: @"document.getElementById('your div id').textContent=''"];
In Swift 3:
let doc = webView.stringByEvaluatingJavaScript(from: "document.documentElement.outerHTML")
(Xcode 5 iOS 7) Universal App example for iOS 7 and Xcode 5. It is an open source project / example located here: Link to SimpleWebView (Project Zip and Source Code Example)
I use a Swift extension like this:
extension UIWebView {
var htmlContent:String? {
return self.stringByEvaluatingJavaScript(from: "document.documentElement.outerHTML")
}
}
You should try this:
document.documentElement.outerHTML
UIWebView
get HTML from UIWebView
let content = uiWebView.stringByEvaluatingJavaScript(from: "document.body.innerHTML")
set HTML into UIWebView
// Do not forget to make your class conform to UIWebViewDelegate, and to nil out the delegate when you are done
func someFunction() {
let uiWebView = UIWebView()
uiWebView.loadHTMLString("<html><body></body></html>", baseURL: nil)
uiWebView.delegate = self as? UIWebViewDelegate
}
func webViewDidFinishLoad(_ webView: UIWebView) {
//ready to be processed
}
[get/set HTML from WKWebView]