How to parse this HTML string in Objective-C - html

I'm getting a string from a JSON response. The string contains HTML tags:
<ul class='video-list-container'><li><a href='?page_id=6602&playid=6'>مكتبة الفيديو</a></li><li><a href='?page_id=6602&playid=10'>الزكاة</a></li><li><a href='?page_id=6602&playid=11'>الصلاة</a></li></ul>\n
All I want to do is get each Arabic string together with its href link and display it in a UITableViewCell. When a row is selected, the app should move to the next view controller.
How can I do this? I'm stuck on parsing the string.
Can anyone help me?
Thanks in advance.

I think this tutorial might help you.
AFNetworking is smart enough to load and process structured data over the network, as well as plain old HTTP requests. In particular, it supports JSON, XML and property lists (plists).
You can follow the tutorial below for more detail:
http://www.raywenderlich.com/59255/afnetworking-2-0-tutorial

You should try using the library "hpple" which is a wrapper around "libxml2". You can find hpple here.
Hpple allows you to parse HTML in Obj-C using XPath expressions. With XPath selectors you will be able to select what content you want to extract out of your HTML. You can read and learn more about XPath here.
Hope it helps. :)
Cheers!
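
The extraction step itself is the same in any language: visit every `<a>` element and pair its text with its `href`. Below is a minimal sketch of that idea in Python (standard library only, not Objective-C and not hpple), just to show the shape of the data you would hand to the table view. The class name `LinkCollector` is made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (text, href) pairs for every <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href", "")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a>.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


# The string from the question, as it would look after JSON decoding.
html = ("<ul class='video-list-container'>"
        "<li><a href='?page_id=6602&playid=6'>مكتبة الفيديو</a></li>"
        "<li><a href='?page_id=6602&playid=10'>الزكاة</a></li>"
        "<li><a href='?page_id=6602&playid=11'>الصلاة</a></li></ul>\n")

collector = LinkCollector()
collector.feed(html)
collector.close()
for text, href in collector.links:
    print(text, href)
```

Each pair maps naturally onto one table row: the text is the cell label, and the href is what you pass to the next view controller when the row is selected.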

Related

How to convert html to json with C#?

I want to convert HTML to JSON with C#. Is there any way to do it? I searched a lot and found articles about using JavaScriptSerializer and Newtonsoft to serialize the HTML string to JSON. But these serializers do nothing except add opening and closing curly braces around the HTML string. I don't want that. I want to convert the whole HTML to JSON so that I can get the relevant information from the HTML using C# objects instead of parsing HTML with regular expressions. The HTML can be any valid HTML from any website on the internet. I am getting the HTML using HTTP request and response objects in C#.
Please don't suggest using HTML Agility Pack, because that will do the same thing that serialization does.
If anybody has any idea how to do this with C#, please share your ideas.
I will explain why your question can cause confusion.
Consider example of html:
<html>
<body>
<p> example of paragraph </p>
</body>
</html>
Example of json:
{"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}
JSON is usually the data from which HTML is generated, its initial foundation. So when you say you want to convert HTML to JSON it's really confusing, because it is impossible to tell which rules the conversion should follow, or which HTML tags should be ignored or added while creating the JSON.
There is a javascript example of the solution here: Map HTML to JSON
DOM Parsers are pretty similar so you can try implementing it in C#. (I'd be interested in such implementation as well :D )
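
To make the "which rules" point concrete, here is a rough sketch of one possible rule set, written in Python rather than C# and using only the standard library. The rule is an assumption chosen for illustration, not a standard mapping: every element becomes an object with `tag`, `attrs` and `children`, and text nodes become plain strings. The names `TreeBuilder` and `html_to_json` are invented for this example.

```python
import json
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Builds a nested dict tree from HTML (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "attrs": {}, "children": []}
        self._stack = [self.root]  # chain of currently open elements

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self._stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i]["tag"] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text between tags
            self._stack[-1]["children"].append(text)


def html_to_json(html):
    """Returns a JSON string describing the element tree of `html`."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return json.dumps(builder.root, ensure_ascii=False, indent=2)


sample = "<html>\n<body>\n<p> example of paragraph </p>\n</body>\n</html>"
print(html_to_json(sample))
```

Note that this does not repair badly broken markup the way a browser does, so for arbitrary pages from the internet a more forgiving parser would be needed in front of it.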

parsing wikipedia page content

I'm looking for a library to parse HTML pages, specifically Wikipedia articles, for example http://en.wikipedia.org/wiki/Railgun. I want to extract the article's text and images (the full-scale or original image, not the thumbnail).
Is there an HTML parser out there?
I would prefer not to use the Wikimedia API, since I can't seem to figure out how to extract an article's text and its full-size images with it.
Thanks, and sorry for my English.
EDIT: I forgot to say that the end result should be valid HTML.
EDIT: I got the JSON string with this: https://en.wikipedia.org/w/api.php?action=parse&pageid=218930&prop=text&format=json so now I need to parse the JSON.
I know that in javascript I can do something like this:
var pageHTML = JSON.parse("the json string").parse.text["*"];
Since I know a bit of HTML/JavaScript and Python, how can I make that HTTP request and parse the JSON in Python 3?
I think you should be able to get everything with the web API:
https://www.mediawiki.org/wiki/API:Main_page
https://www.mediawiki.org/wiki/API:Parsing_wikitext
Or you could download the whole of Wikipedia:
https://meta.wikimedia.org/wiki/Research:Data
You can get the HTML from the API too; check the info on https://www.mediawiki.org/wiki/Extension:TextExtracts/pt. It works like this example: https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exchars=175&titles=hello%20world .
Depending on how many pages you'll need, you should consider using public dumps if the volume of pages is high.
I made a Node.js module called wikipedia-to-json (written in JavaScript) that parses the HTML of Wikipedia articles and gives you back structured JSON objects that describe the layout of the article in order (titles, paragraphs, images, lists, sub-titles...).
That might be useful if you just want to do a quick extraction of text and sections and see how things are laid out.
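
For the Python 3 part of the question (the second EDIT), a minimal sketch using only the standard library could look like the following. It assumes the response has the shape shown in the question's JavaScript line, `parse.text["*"]`. The function names and the User-Agent value are placeholders invented for this example.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def extract_page_html(raw_json):
    """Python equivalent of JSON.parse(raw).parse.text["*"] from the question."""
    return json.loads(raw_json)["parse"]["text"]["*"]


def fetch_page_html(page_id):
    """Requests the rendered HTML of one page (needs network access)."""
    query = urllib.parse.urlencode(
        {"action": "parse", "pageid": page_id, "prop": "text", "format": "json"}
    )
    request = urllib.request.Request(
        f"{API_URL}?{query}",
        headers={"User-Agent": "example-script/0.1"},  # placeholder value
    )
    with urllib.request.urlopen(request) as response:
        return extract_page_html(response.read().decode("utf-8"))
```

Calling `fetch_page_html(218930)` should then return the article HTML as a string for the page id used in the question, which you can pass on to whichever HTML parser you pick.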

HTML parsing in Clojure

I'm looking for a good way to parse HTML in Clojure.
What I'm trying to do is fetch the content of a web page with a crawler and then get the content of some HTML tags or their attributes.
So I have the URL of the page, and I get the HTML as a String, but how do I get the data I need?
Use https://github.com/cgrand/enlive
It allows you to select and retrieve nodes with CSS-like selectors.
Or https://github.com/nathell/clj-tagsoup
I am not experienced with clj-tagsoup, but I can tell you that enlive works well for most scraping.

How to convert HTML into Prolog

How can I convert HTML into Prolog?
I need to extract the tags from an HTML page and describe them in Prolog.
For example, if my file contains this HTML code
<title>Prove</title>
<select id="data_nastere_zi" name="data_nastere_zi">
I should get
title(Prove),
select(id(data_nastere_zi)).
I tried looking at various libraries, but I couldn't get it to work.
Thanks.
You can parse well formed HTML using SWI-Prolog library(sgml), in particular load_html/2.
In my experience, scraping 'real world' websites with it isn't really pleasant, because of insufficient error handling.
Anyway, once you have loaded the page structure, you can use library(xpath) to inspect such complex data.
Edit: getting a table inside a div:
xpath(Page, //div, Div),
xpath(Div, //table, Table)...
SWI-Prolog has a package for SGML/XML parsing based on the SWI-Prolog interface to SP by Anjo Anjewierden: "SWI-Prolog SGML/XML parser".

Convert html into string

How can I convert HTML into a string? For example, I have this HTML: <p>This is Test</p><p></p><p>Test</p>. I want to convert it into this:
This is Test
Test
I don't want the <p> tags to be printed on the screen, but I want them to behave as actual paragraphs.
I have tried HtmlDecode, but that doesn't work either. I am getting the string from the MVC Telerik editor.
Any help would be appreciated. Thanks :)
Update:
I partially solved my problem by using Html.Raw in my view, which converted the HTML into a string. I was wondering if there is any equivalent to Html.Raw that I can use on the server side too, i.e. in my controller?
You need to use a parser. HTML Agility Pack is a tool that is constantly recommended here.
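
To show what "use a parser" means for this case, here is a small sketch in Python (standard library only, not C# and not HTML Agility Pack). It assumes the goal is one line of text per non-empty `<p>`, as in the question's example. The names `ParagraphText` and `html_to_text` are made up for this illustration.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collects the text of each non-empty paragraph (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # finished paragraphs, in document order
        self._current = []    # text chunks of the paragraph being read

    def _flush(self):
        text = "".join(self._current).strip()
        if text:  # empty paragraphs such as <p></p> are dropped
            self.paragraphs.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in ("p", "br"):
            self._flush()

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        self._current.append(data)

    def close(self):
        super().close()
        self._flush()  # keep any trailing text outside a <p>


def html_to_text(html):
    """Returns the paragraph texts joined by newlines, with tags removed."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)


print(html_to_text("<p>This is Test</p><p></p><p>Test</p>"))
```

The same approach carries over to a C# parser: walk the paragraph nodes, take their inner text, and join the non-empty ones with line breaks.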