Getting data from a website from a unique html class - html

How can I get specific data from a website? If it helps, the data I need is marked with a unique HTML class.

Get a web page.
Make it a DOM structure.
Traverse it with XPath: //*[@class='target_class']
Output results.
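A minimal sketch of the four steps above in Python, using only the standard library. It assumes the page is well-formed XHTML, because xml.etree.ElementTree is a strict XML parser; messy real-world HTML usually needs a lenient HTML parser first. The markup in PAGE and the name texts_by_class are made up for illustration, and the fetch step is replaced by an inline string so the example runs offline.

```python
import xml.etree.ElementTree as ET

# Step 1 (get a web page) is stubbed with an inline string; in practice the
# markup could come from urllib.request.urlopen(url).read().
PAGE = """<html><body>
<div class="target_class">First value</div>
<p class="other">Ignored</p>
<span class="target_class">Second value</span>
</body></html>"""


def texts_by_class(markup, class_name):
    """Return the text of every element whose class attribute equals class_name."""
    # Step 2: build a DOM-like tree (fails on markup that is not well-formed).
    root = ET.fromstring(markup)
    # Step 3: traverse with the XPath subset that ElementTree supports.
    nodes = root.findall(f".//*[@class='{class_name}']")
    return ["".join(node.itertext()).strip() for node in nodes]


if __name__ == "__main__":
    # Step 4: output the results.
    for text in texts_by_class(PAGE, "target_class"):
        print(text)
```

Note that [@class='target_class'] is an exact string comparison, so an element with class="target_class extra" would not match this expression.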
If you tell me which language you are using, I can point you to some posts with examples. For PHP, read here.

You can use BeautifulSoup to do this in Python: it lets you scrape data from a specific HTML class. See the documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/

Related

Openrefine cannot fetch html code inside accordion

I know that OpenRefine is not a perfect tool for web scraping, but I am looking for some help with the first step.
I cannot collect the full HTML in OpenRefine when I add a column by fetching URLs (https://profiles.health.ny.gov/hospital/view/103094). The fetched result does not include any of the code under the accordion, such as services, bed types, etc.
Any idea how to get the full HTML by fetching in OpenRefine?
I am trying to collect the information under Administrative, whose XPath is "//div[4]/div/ul/li" ("div#AdministrativeBox.in.collapse")
This website loads its content dynamically using JavaScript. The information you are interested in is not stored in the source code of the page, so OpenRefine cannot extract it.
However, there is a workaround. If you transform your URLs with the GREL formula value.replace('view', 'tab_overview'), you will get scrapable pages like this one.
Note that OpenRefine does not use XPath, but jsoup selectors. To get the elements of the "Administrative" block, you can use this GREL formula:
forEach(value.parseHtml().select('#AdministrativeBox li'), e, e.htmlText()).join(',')
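For comparison outside OpenRefine, here is a rough Python equivalent of the two GREL steps, using only the standard library. It is a sketch, not a replacement for the formula: the SAMPLE markup and its list items are invented placeholders (the real page content is not reproduced here), and the helper names are hypothetical. The small parser only handles the simple case of non-nested list items.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag; skipped so depth counting stays balanced.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ListItemsUnderId(HTMLParser):
    """Collect the text of every <li> nested inside the element with a given id."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0        # greater than 0 while inside the container
        self.in_li = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
            if tag == "li":
                self.in_li = True
                self.items.append("")
        elif dict(attrs).get("id") == self.container_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if tag == "li":
            self.in_li = False

    def handle_data(self, data):
        if self.in_li:
            self.items[-1] += data


def overview_url(url):
    """Same rewrite as value.replace('view', 'tab_overview')."""
    return url.replace("view", "tab_overview")


def administrative_items(html):
    """Rough analogue of select('#AdministrativeBox li') joined with commas."""
    parser = ListItemsUnderId("AdministrativeBox")
    parser.feed(html)
    return ",".join(item.strip() for item in parser.items)


# Invented placeholder markup, shaped like the block described in the question.
SAMPLE = """
<div id="AdministrativeBox" class="in collapse">
  <ul>
    <li>Operator: Example Health System</li>
    <li>Ownership type: <b>Not-for-profit</b></li>
  </ul>
</div>
<div id="OtherBox"><ul><li>Ignored</li></ul></div>
"""

if __name__ == "__main__":
    print(overview_url("https://profiles.health.ny.gov/hospital/view/103094"))
    print(administrative_items(SAMPLE))
```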

Automate Web Applications -parsing HTML Data

I want to automate a web application that parses an HTML page and pulls out the inner text of all HTML tags matching some condition. For example, given a span tag whose class="spanclass_1":
This is span tag...
which has a particular class, the app parses the page and pulls in that span.
The main pain point is that I should not use the developers' code to automate that same HTML parsing.
I want to automate checking that the parsing is done correctly, simply by using the parsed data shown in the UI.
Any help would be great.
I appreciate your time reading this.
(Note: the span tag itself is not shown.)
Thanks, buddies.
Not enough details.
Is this HTML page just a file on the local filesystem, or is it a web page on the internet?
Do you have access to the pages? Can you modify them? If yes, then just add JavaScript to the page that extracts the data and posts it to the server.
If not, then it depends on the language you program in.
Find a good framework for parsing HTML. Load the page, parse it and extract the data. There are several possible situations:
Worst scenario - the page is generated on the client side using JS.
Best scenario - the page is in XHTML mode (you are lucky: any XML parser will build a DOM and let you extract the data).
So-so - the page is plain HTML (try several HTML parsers to find the one most suitable for you).
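For the plain-HTML case, a lenient parser is enough to pull out the text of spans with a given class independently of the application's own code. The sketch below uses Python's standard html.parser; the PAGE string, the class SpanTextByClass and the function span_texts are illustrative names, not anything from the application in question, and nested spans are not handled.

```python
from html.parser import HTMLParser


class SpanTextByClass(HTMLParser):
    """Collect the text of each <span> whose class list contains class_name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.capturing = False
        self.results = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and self.class_name in classes:
            self.capturing = True
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.results[-1] += data


def span_texts(html, class_name):
    parser = SpanTextByClass(class_name)
    parser.feed(html)
    return parser.results


# Deliberately sloppy HTML: an unclosed <p> and a bare <br>.
PAGE = (
    '<p>Intro <span class="spanclass_1">This is span tag...</span> and '
    '<span class="other">not this</span><br>'
    '<span class="spanclass_1 extra">second</span>'
)

if __name__ == "__main__":
    for text in span_texts(PAGE, "spanclass_1"):
        print(text)
```

The list returned by span_texts could then be compared with whatever the application shows in its UI.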

HTML parsing in Clojure

I'm looking for a good way to parse HTML in Clojure.
Specifically, I'm trying to get the content of a web page with a crawler and then get the content of some HTML tags or their attributes.
So I have the URL of the page, and I get the HTML as a String, but how do I get the data I need?
Use https://github.com/cgrand/enlive
It allows you to select and retrieve elements with CSS-like selectors.
Or https://github.com/nathell/clj-tagsoup
I am not experienced with clj-tagsoup, but I can tell you that Enlive works well for most scraping.

How to display XML data(list) in HTML(CSS)

I want to build a website that has two "tables" to display two lists of data stored in XML:
categories and items (think of a to-do list; one category has more than one item).
I'm not sure how to display the lists of data. What specific technique should I use? Can I achieve this using only HTML and CSS? (My friend said I have to use JavaScript.)
You need to use AJAX for that purpose.
AJAX stands for Asynchronous JavaScript and XML.
It will let you load your XML data into your web pages.

XML data to be sorted by HTML Form

I would like to build basic search functionality, but the data I am using is XML. How would I go about making a search form that filters the XML data and displays it in HTML when it meets the form's criteria?
I have not yet tried to use an HTML form to display the XML data, as I do not know how to do this, so I am asking for direction or examples that may be elsewhere on the web, as I am having trouble doing so.
<UKPRN>10004048</UKPRN>
<ACCOMURL>http://www.londonmet.ac.uk/accommodation/</ACCOMURL>
<PRIVATELOWER>5000</PRIVATELOWER>
<PRIVATEUPPER>8300</PRIVATEUPPER>
<COUNTRY>XF</COUNTRY>
<NSP>1</NSP>
<Q24>51</Q24>
<Q24POP>2242</Q24POP>
<KISCOURSE>
<TITLE>FdSc Crime Scene and Forensic Investigation</TITLE>
<UCASCOURSEID>F411</UCASCOURSEID>
<VARFEE>10</VARFEE>
<FEETBC>1</FEETBC>
<WAIVER>0</WAIVER>
<MEANSSUP>0</MEANSSUP>
<OTHSUP>0</OTHSUP>
<ENGFEE>5700</ENGFEE>
</KISCOURSE>
This is just a basic look at my XML setup - any help is appreciated!
EDIT: Further to the question asked below, the XML data is structured as above, and I would like a PHP script or HTML form that searches the TITLE of each KISCOURSE and then displays the relevant results.
Since this is more of a project than a short question, my answer just aims to give you some ideas on how I would approach it. Tell me whether you think it is helpful and whether you have further questions.
Assuming that you have your data in one XML file, e.g. data.xml, and you have access to PHP and its XSLTProcessor class, I would do the following:
Create an HTML form with inputs for the fields you want to query
Write an XSL transformation to filter data.xml on the wanted parameters
Write a PHP script that gets parameters from the HTML form, applies them to the transformation, executes it and outputs the result
Tell me if you find this approach suitable for you and I can add more detail. Good luck!
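To make the filter step concrete, here is a small sketch of the same idea. Note that it swaps the technique: it uses Python and xml.etree.ElementTree rather than PHP and an XSL transformation, purely to show the logic of "take the form's query, keep the KISCOURSE entries whose TITLE matches, output HTML". The INSTITUTION wrapper element, the second course, and the function names are assumptions added for illustration; only the first course's fields come from the question.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumed layout: the question's fragment wrapped in a hypothetical root element.
# The second KISCOURSE is an invented entry so the filter has something to skip.
DATA = """<INSTITUTION>
  <UKPRN>10004048</UKPRN>
  <KISCOURSE>
    <TITLE>FdSc Crime Scene and Forensic Investigation</TITLE>
    <UCASCOURSEID>F411</UCASCOURSEID>
    <ENGFEE>5700</ENGFEE>
  </KISCOURSE>
  <KISCOURSE>
    <TITLE>BA Example Studies</TITLE>
    <UCASCOURSEID>X000</UCASCOURSEID>
    <ENGFEE>5700</ENGFEE>
  </KISCOURSE>
</INSTITUTION>"""


def search_courses(xml_text, query):
    """Return courses whose TITLE contains the query (case-insensitive)."""
    root = ET.fromstring(xml_text)
    needle = query.strip().lower()
    matches = []
    for course in root.iter("KISCOURSE"):
        title = course.findtext("TITLE", default="")
        if needle in title.lower():
            matches.append({
                "title": title,
                "ucas": course.findtext("UCASCOURSEID", default=""),
            })
    return matches


def render_html(matches):
    """Turn the matches into a small HTML fragment, escaping the XML text."""
    if not matches:
        return "<p>No courses found.</p>"
    rows = "".join(
        f"<li>{escape(m['title'])} ({escape(m['ucas'])})</li>" for m in matches
    )
    return f"<ul>{rows}</ul>"


if __name__ == "__main__":
    # In a real page the query would come from the submitted form field.
    print(render_html(search_courses(DATA, "forensic")))
```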