How to convert non-well-formed HTML to XHTML on iOS - html

Does anyone know how to convert non-well-formed HTML to XHTML/XML on iOS?
I want to use XPath to parse an HTML page, but the page is not well-formed XML/XHTML.

Do you really need to convert it, or do you just need to use XPath? https://github.com/topfunky/hpple and some other tools can parse non-well-formed HTML.

Related

Objective-C event-driven HTML parsing

I need to be able to parse HTML snippets in an event-driven way. For example, if the parser finds an HTML tag, it should notify me and pass the tag, value, attributes, etc. to a delegate. I cannot use NSXMLParser because I have messy HTML. Is there a useful library for that?
What I want to do is parse the HTML, create an NSAttributedString, and display it in a UITextView.
Yes, you can parse the HTML content of a file.
If you want to get a specific value from HTML content, you need to parse it using Hpple. There is also documentation with an example of how to parse HTML. Another way is a regular expression, but that is more complicated, so Hpple is the best option in your case.
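To make "event-driven" concrete, here is a minimal sketch in Python (chosen only for illustration; it is not Hpple's API, and the names `TagEvents` and `parse_events` are hypothetical). It uses the standard library's lenient `html.parser`, which calls a handler for each start tag, end tag, and text run, much as a delegate would be notified, and it does not stop at unclosed tags.

```python
from html.parser import HTMLParser


class TagEvents(HTMLParser):
    """Records parser callbacks in order, the way a delegate would receive them."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only text runs between tags
        if data.strip():
            self.events.append(("text", data.strip()))


def parse_events(html):
    """Feed a (possibly messy) HTML snippet and return the recorded events."""
    parser = TagEvents()
    parser.feed(html)
    parser.close()
    return parser.events
```

Each recorded event could be mapped to a styled text run when building the attributed string; note that the unclosed `<b>` in a snippet such as `<p class="x">Hi <b>there</p>` simply never produces an end event.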

HTML video detection

I have an HTML document and I want to extract the URL of every video file. What's the best way to do this, given that there are different HTML versions and different ways to embed a video file in an HTML document? For this purpose I'd use the Html Agility Pack (C#).
You should run a regular expression over the HTML to get the video URLs.
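A rough sketch of the regular-expression approach, in Python for illustration. The extension list, the attribute names, and the function name `find_video_urls` are assumptions, not a complete catalogue of how video can be embedded; it will miss URLs built by scripts or served without a file extension.

```python
import re

# Assumed set of video file extensions; extend as needed.
VIDEO_EXTENSIONS = ("mp4", "webm", "ogv", "mov", "m4v", "avi", "flv")

# Matches src/href/data/value attributes whose quoted value ends in a
# video extension, optionally followed by a query string.
VIDEO_URL = re.compile(
    r"""(?:src|href|data|value)\s*=\s*["']([^"']+\.(?:%s)(?:\?[^"']*)?)["']"""
    % "|".join(VIDEO_EXTENSIONS),
    re.IGNORECASE,
)


def find_video_urls(html):
    """Return the distinct video URLs found, in order of first appearance."""
    seen = []
    for url in VIDEO_URL.findall(html):
        if url not in seen:
            seen.append(url)
    return seen
```

Because the pattern looks at attribute values rather than specific tags, it covers `<video>`, `<source>`, `<embed>`, `<object>` parameters, and plain links alike, at the cost of occasional false positives.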

NSXMLParser cannot handle some HTML entities like Ã

Has anyone run into the XML parser just ending when it encounters HTML entities like Ã?
Thanks
Deshawn
Yes, this problem occurs because the XMLParser is using element validation, which will break if it sees an HTML entity like the one you described. If you want to parse HTML, you will want to use Hpple.
Check out this post for more information.
parsing HTML on the iPhone

Generate a xml from a html

I'm trying to generate XML from an HTML page (URL). The website has a form that I want to get into an XML file, but it's too long and I'm looking for an easier way to do it.
Is there a method to generate XML with all the fields, etc., from HTML?
You can also use an HTML parser and print out the objects/array as XML.
Try this: http://sourceforge.net/projects/html2xml/
You can try the free .NET class library SgmlReader, which can load HTML into an XmlDocument. This in turn can be saved as XML.
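The "parse the HTML, then write the result out as XML" idea can be sketched as follows, in Python for illustration rather than with SgmlReader or html2xml. The `document` wrapper element, the class name `TreeBuilder`, and the function `html_to_xml` are assumptions made for this example; real pages may need more repair rules than the two shown here (void elements and unclosed tags).

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never have a closing tag
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Builds an ElementTree from lenient HTML parser events."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")  # hypothetical wrapper element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Value-less attributes (e.g. "required") arrive with value None
        attrib = {name: (value if value is not None else name)
                  for name, value in attrs}
        element = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, implicitly
        # closing anything left open inside it; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child goes into that child's tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html):
    """Return a well-formed XML string for a (possibly messy) HTML snippet."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return ET.tostring(builder.root, encoding="unicode")
```

Once the output is well-formed, form fields can be queried with path expressions, for example `ET.fromstring(xml).findall(".//input")`.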

How to convert a webpage to an XML document using Java?

The assumption is that the webpage is coded with correct tags. How can I convert it to an XML file? I think most webpages can be viewed as a DOM tree. How can I convert that tree to an XML file?
JTidy reads HTML and presents it as a DOM. Once you have your HTML as a DOM you should be able to process it and write it out as XML.
To output a DOM, see the example code here and the XMLSerializer in particular.
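The serialization step on its own is small. As a language-neutral illustration (Python's `xml.dom.minidom` rather than JTidy or XMLSerializer; the sample document and function names are made up for this sketch), building a DOM and writing it out as XML looks like this:

```python
from xml.dom.minidom import getDOMImplementation


def build_sample_dom():
    """Create a tiny DOM by hand, standing in for one produced by an HTML parser."""
    impl = getDOMImplementation()
    doc = impl.createDocument(None, "html", None)
    body = doc.createElement("body")
    doc.documentElement.appendChild(body)
    para = doc.createElement("p")
    para.setAttribute("class", "intro")
    para.appendChild(doc.createTextNode("Tom & Jerry"))
    body.appendChild(para)
    return doc


def dom_to_xml(doc):
    """Serialize the document element; special characters are escaped."""
    return doc.documentElement.toxml()
```

The serializer takes care of escaping, so the `&` in the text node is written as `&amp;` and the output stays well-formed.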