Automate Web Applications -parsing HTML Data - html

I just want to automate a web application, where that application parses the HTML page and pulls all the HTML Tags inner text based on some condition like if we have a tag called Span Example has given whose class="spanclass_1"
This is span tag...
which has particular class id. so that app parses and pulls that span into it.
And here the main pain area is, I should not use the developer code to automate that same parsing the HTML.
I want to automate that parsing done correctly, simply by using the parsed data which is shown in UI.
Any help, would be great.
Appreciating your time reading this.
(Note span tag is not shown)
Thanks buddies.

not enough details.
is this html page just a file in local filesystem on it is internet webpage?
do u have access to pages? can u modify it ? if answer yes, that just add javascript to page which will extract data and post to server.
if answer not, than it depends on language u use to programm.
Find good framework to parse html. load page parse it and extract data. Several situation can be there.
Worse scenario - page generated on client side using js.
Best scenario - page is in xhtml mode( u are lucky. any xml parser will help to build dom and extract data)
So so - page is simple html format (try several html parser to find most suitable for u)

Related

Storing content to display on page

In my current application we have a service which responds with a XML.With the XML we do XSLT transformation to a HTML and display it on our web page(review,Tutorials).
My Question here is this the only way to store the content of the webpage and display it as required.
We are going to migrate to a Angular app.So do i still continue using the XSLT way or is there other better way to store the content of the page and display it.
I suggest you use native JSON communication, if possible. Or try to convert your data to a native JSON object - Angular (and javascript, really) can work much better and faster with the native JSON itself.
Although, f you want to display a html content, you can use the innerHTML tag.
If you manage to get the data in a JSON, read more and learn about the angular tehcniques of displaying data. I suggest you do the tutorial on angular.io :)

Semantic Media Wiki: Displaying a SVG element of a HTML page

I have a HTML page that displays a SVG element (a Business process diagram) using some javascript libraries. A String variable, say 'str' needs to be given to html function.
After reading this, I plan to use widgets. So far I understand that I need to copy all scripts to Widgets: Test. For creating the hook, I write
{{#widget:Test|str=UserTask_1}}
The problem is that UserTask_1 is a variable as well. It is different each time.
Can someone help how can I add this dynamic information to my hook? This hook is a hyperlink from a previous page. In the previous page, I send the str=UserTask_1 through JavaWiki Bot.
PS: I have come-across SMW for first time. Please excuse if my language is not very technical at the moment.
Thanks.

How to best transfer a document to a SAPUI5 framwork?

I'd like to achieve the following and I'm looking for ideas. I have a document and I want to represent/transform this content in/to a nice SAPUI5 framework. My idea is the following: a split app with having the paragraph titles in the master view (plus a search function on top) and the respective content in the detail view.
I'd like to know from you if
a) you might want to share your ideas and hints on alternatives.
b) this can be achieved within one single file (i.e. all the code for the split app and document content in one html) and maybe using pure html code (xml also feasible) - against the background of easily handing a large amount of text available in html.
c) if you happen to have/know a reusable template.
Thanks in advance!
An interesting question. I went through a similar exercise once, re-presenting my site with UI5.
To your questions:
(a) I would think that the approach you suggest is a good one
(b) You can indeed include all the app in a single file, I do that often by using script templates, even with XML Views. You can see some examples in my sapui5bin repository, in particular in the SinglePageExamples folder. Have a look at this html file for example: https://github.com/qmacro/sapui5bin/blob/master/SinglePageExamples/SAP-Inside-Track-Sheffield-2014/end.html
What I would suggest is, rather than intermingle the document content and the app & view definitions, maintain the content of your document separately, for example, in XML or JSON, and use a client side model to load it in and bind the parts to the right places.

HTML parsing in Clojure

I'm looking for a good way to parse HTML in Clojure.
Exactly what I'm trying to do is get content of a web page with crawler and then get content of some HTML tags or their attributes.
So I have URL to the page, and I get html as String, but how do get data I need?
Use https://github.com/cgrand/enlive
It allows you to select and retrieve with CSS-alike selectors.
Or https://github.com/nathell/clj-tagsoup
I am not experienced with tag-soup but I can tell that enlive works well for most scraping.

dynamic HTML page to pdf

I know there is a list of similar questions but all handle pages without user interaction (static even though some js may be there).
Let's say we've a page the user can interact (e.g. svg than changes, or html tables with drilldown - content changes). Those interactions will change the page. Same happens in stackoverflow when entering the question...
The idea is adding a button, "convert to pdf" taking the state of the html and sending to the user back a pdf version (we've a Java server).
Using the print of the browser is not the answer I'm looking for :-).
Is this a stick in the moon ?
You would have to store the parameters that generate the HTML view (i.e. what the user clicks on, what selections they make, etc). If you can have a list of parameters that generate the HTML view, you can have a method which accepts the list of parameters (JSON post?), generates the HTML view and passes it to your PDF generating routine. I'm not too familiar with Java libraries for this purpose, but PHP has TCPDF can take html output to basically generate a PDF for you. Certainly, there are Java libraries which will allow you to do the same thing, or you can use the parameters to get a list of rows/arrays which can be iterated over and output using the PDF library of your choice.
Both iTextPDF and Aspose.PDF would allow you to do that (I've seen them used in two different projects), but there is no magic and you will have to do some work.
The steps are roughly:
Get (as a string) the part of the document which you want to print with jQuery or innerHTML
Call a service on the server side to convert this to PDF
[Serverside] Use a whitlist - based tool to clean up the hmtl (unless you want to be hacked). JSoup is great for that.
[Serverside] Use IText or Aspose API to create the PDF from the HTML (this is not trivial, you will have to read the doc)
Download the document
I'd also recommend DocRaptor, an HTML to PDF API built by my company, Expected Behavior.
DocRaptor uses Prince XML to generate PDFs, and thus produces higher quality results than similar products.
Adding PDF generation to your own web application using our service is as simple as making an HTTP POST request to our server.
Here's a link to DocRaptor's home page:
DocRaptor
And a link to our API documentation:
DocRaptor API documentation