Objective-C event-driven HTML parsing

I need to be able to parse HTML snippets in an event-driven way. For example, when the parser finds an HTML tag, it should notify me and pass the tag name, value, attributes etc. to a delegate. I cannot use NSXMLParser because my HTML is messy. Is there a useful library for that?
What I want to do is parse the HTML, create an NSAttributedString from it, and display it in a UITextView.

Yes, you can parse the HTML content of a file.
If you want to extract specific values from the HTML, you can parse it with Hpple. There is also documentation with an example that shows how to parse HTML. Another way is regular expressions, but that is more complicated, so Hpple is the better choice in your case.
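To make the event-driven idea concrete, here is a minimal sketch in Python rather than Objective-C, using the standard library's `html.parser.HTMLParser`, which calls a handler for each start tag, end tag and text chunk and tolerates badly nested markup. The class name `StyledRunCollector`, the helper `collect_runs` and the `VOID_TAGS` set are made up for this illustration; the (text, open tags) pairs are only a stand-in for the ranges you would feed into an attributed string.

```python
from html.parser import HTMLParser

# Elements that never get an end tag, so they must not stay "open" (partial list).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class StyledRunCollector(HTMLParser):
    """Records each text chunk together with the tags that were open around it."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_tags = []  # stack of currently open tag names
        self.runs = []       # list of (text, tuple_of_open_tags)

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; it is ignored in this sketch.
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Messy HTML: close the nearest matching open tag (and anything nested
        # inside it); silently ignore end tags that match nothing.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i:]
                break

    def handle_data(self, data):
        if data:
            self.runs.append((data, tuple(self.open_tags)))


def collect_runs(html_text):
    parser = StyledRunCollector()
    parser.feed(html_text)
    parser.close()  # flushes any text still buffered at the end
    return parser.runs


# Badly nested input still produces usable events.
for text, tags in collect_runs("Plain <b>bold <i>both</b> tail</i>"):
    print(repr(text), tags)
```

In an iOS app the same shape would apply: each callback corresponds to a delegate method, and each run would become a range with attributes.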

Related

HTML tags become uppercase when getting innerHTML via webbrowser

All HTML tags are converted to uppercase letters when I get the body's innerHTML via the webbrowser.
What should I do to get the real HTML code?
There is no way to get the "real" HTML code from innerHTML.
By the time you read innerHTML, the HTML has already been parsed to a DOM and then discarded. innerHTML will serialise the DOM to HTML, it won't give you the original HTML.
The only way to get the original HTML would be to use XMLHttpRequest to get a fresh copy of the source code from the server and then parse it with a custom parser instead of the one built into the browser.
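The same effect can be shown outside a browser. The sketch below is Python, not browser code, and `Reserializer` and `round_trip` are names invented for this example: it parses a snippet with the standard `html.parser` module and writes it back out. The result is an equivalent but normalised form, because the original spelling (tag case, unquoted attributes) is not kept once the markup has been parsed. This particular parser lowercases names where the browser in the question uppercases them, but the principle is the same.

```python
from html import escape
from html.parser import HTMLParser


class Reserializer(HTMLParser):
    """Parses HTML and re-emits it from the parse events alone."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        rendered = ""
        for name, value in attrs:
            text = escape(value or "", quote=True)
            rendered += ' {}="{}"'.format(name, text)
        self.out.append("<{}{}>".format(tag, rendered))

    def handle_endtag(self, tag):
        self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def round_trip(source):
    parser = Reserializer()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


original = "<P CLASS=intro>Hi <B>there</b></P>"
print(round_trip(original))  # <p class="intro">Hi <b>there</b></p>
```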

HTML parsing in Clojure

I'm looking for a good way to parse HTML in Clojure.
Specifically, I want to fetch the content of a web page with a crawler and then extract the content of some HTML tags or their attributes.
I have the URL of the page and I get the HTML as a String, but how do I get the data I need?
Use https://github.com/cgrand/enlive
It lets you select and retrieve nodes with CSS-like selectors.
Or https://github.com/nathell/clj-tagsoup
I have no experience with clj-tagsoup, but I can say that enlive works well for most scraping.
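For readers who want to see the underlying idea without either library, here is a rough sketch in Python (not Clojure, and far less capable than enlive's selectors). It pulls the attributes and text of every element with a given tag name out of an HTML string. `TagExtractor` and `extract` are hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class TagExtractor(HTMLParser):
    """Collects attributes and inner text for every element named wanted_tag."""

    def __init__(self, wanted_tag):
        super().__init__(convert_charrefs=True)
        self.wanted_tag = wanted_tag
        self.matches = []      # one dict per matching element
        self._current = None   # the match currently being filled, if any

    def handle_starttag(self, tag, attrs):
        if tag == self.wanted_tag:
            self._current = {"attrs": dict(attrs), "text": ""}
            self.matches.append(self._current)

    def handle_endtag(self, tag):
        if tag == self.wanted_tag:
            self._current = None

    def handle_data(self, data):
        # Text inside nested elements (e.g. <em>) is included as well.
        # Caveat: a tag that is never closed keeps collecting text.
        if self._current is not None:
            self._current["text"] += data


def extract(html_text, wanted_tag):
    parser = TagExtractor(wanted_tag)
    parser.feed(html_text)
    parser.close()
    return parser.matches


page = '<p>See <a href="/docs">the <em>docs</em></a> and <a href="/faq">FAQ</a>.'
for match in extract(page, "a"):
    print(match["attrs"].get("href"), match["text"])
```

A selector library adds matching by class, id and nesting on top of this; the sketch only matches on the tag name.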

Is XDocument appropriate for generating HTML?

My goal is to use a WCF service that accepts some parameters, then generates and returns an HTML string that a user can embed on webpages of their choice.
Is XDocument appropriate for generating a string of HTML?
I do not really need a full HTML document, just a simple snippet that has some image elements, a p-tag element, and a table element.
It's suitable for generating XHTML, which is valid XML. It wouldn't be suitable for parsing HTML, which doesn't have to be a valid XML document.
There may be more HTML-specific APIs available, but for just a simple snippet of XHTML, using XDocument should be fine.
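As an illustration of the approach in a different language, the sketch below builds the kind of snippet described in the question (images, a paragraph, a table) with Python's `xml.etree.ElementTree` and serialises it to a string. It is not XDocument code; the function name `build_snippet`, its parameters and the wrapping `div` are assumptions made for this example. The point it shows is that an XML API escapes text for you and always produces well-formed output.

```python
import xml.etree.ElementTree as ET


def build_snippet(image_urls, caption, rows):
    """Returns a well-formed XHTML fragment as a string."""
    root = ET.Element("div")
    for url in image_urls:
        ET.SubElement(root, "img", {"src": url, "alt": ""})
    paragraph = ET.SubElement(root, "p")
    paragraph.text = caption  # special characters are escaped on output
    table = ET.SubElement(root, "table")
    for row in rows:
        tr = ET.SubElement(table, "tr")
        for cell in row:
            ET.SubElement(tr, "td").text = str(cell)
    return ET.tostring(root, encoding="unicode")


snippet = build_snippet(["a.png"], "Fish & chips", [[1, 2], [3, 4]])
print(snippet)
```

Empty elements come out self-closed (XML style), which is fine for XHTML but is one reason an HTML-specific API may be preferable for plain HTML.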

NSXMLParser cannot handle some HTML entities like Ã

Has anyone run into the XML parser just ending when it encounters HTML entities like Ã?
Thanks
Deshawn
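Yes. The problem is that NSXMLParser is a strict XML parser, and it breaks when it sees HTML-only constructs like the one you described. XML itself predefines only a handful of entities (such as &amp; and &lt;), so HTML entities that are not declared in the document are treated as errors. If you want to parse HTML, you will want to use Hpple.
Check out this post for more information: "parsing HTML on the iPhone".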
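The behaviour is not specific to NSXMLParser. As a small demonstration in Python (assuming the entity in question is a named HTML entity such as `&Atilde;`), a strict XML parser rejects it, while the predefined XML entities and numeric character references go through. `parses_as_xml` is a helper name made up for this sketch.

```python
import html
import xml.etree.ElementTree as ET


def parses_as_xml(text):
    """True if a strict XML parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml("<p>&Atilde;</p>"))  # False: entity is not defined in XML
print(parses_as_xml("<p>&amp;</p>"))     # True: one of XML's predefined entities
print(parses_as_xml("<p>&#195;</p>"))    # True: numeric character reference

# An HTML-aware routine knows the named entity.
print(html.unescape("&Atilde;") == "\u00c3")  # True
```

One workaround that follows from this is to replace named HTML entities with numeric references before handing the text to an XML parser; using an HTML-aware parser avoids the problem altogether.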

Parse HTML in Adobe AIR

I am trying to load and parse HTML in Adobe AIR. The main purpose is to extract the title, meta tags and links. I have been trying HTMLLoader, but I get all sorts of errors, mainly uncaught JavaScript exceptions.
I also tried to load the HTML content directly (using URLLoader) and push the text into HTMLLoader (using loadString(...)) but got the same errors. My last resort was to load the text into XML and then use E4X queries or XPath, but no luck there because the HTML is not well formed.
My questions are:
1. Is there a simple and reliable (AIR/ActionScript) DOM component out there? (I do not need to display the page; headless mode will do.)
2. Is there any library to convert (crappy) HTML into well-formed XML so I can use XPath/E4X?
3. Any other suggestions on how to do this?
thx
ActionScript is supposed to be a superset of JavaScript, and thankfully, there's the "Pure JavaScript/ActionScript HTML Parser", created by JavaScript guru and jQuery creator John Resig :-)
One approach is to run the HTML through HTMLtoXML() and then use E4X as you please :)
Afaik:
1. No :-(
2. No :-(
3. I think the easiest way to grab the title and meta tags is to write some regular expressions. You can load the page's HTML code into a string and then read out whatever you need like this:
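var str:String = ""; // put the HTML code in here
// Lazy match that also spans newlines and tolerates attributes on <title>.
var pattern:RegExp = /<title[^>]*>([\s\S]*?)<\/title>/i;
var match:Object = pattern.exec(str); // null when there is no <title>
trace(match ? match[1] : "no title found");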