How to properly enhance Qt HTML5 application? - html

In Qt Creator I see "HTML5 Application" wizard. It creates a project with "html5applicationviewer" sub-project that loads the HTML and injects the object named "Qt" object with only public slot "quit" (which is used in the demo code).
If I change html5applicationviewer to also inject my object in addition to "Qt", it Qt Creator suggests to "upgrade" html5applicationviewer; and there's a comment This file was generated ... It is recommended not to modify this file in html5applicationviewer's files.
How to properly add more Qt C++ things to the HTML part in Qt HTML5 applications?

To my knowledge there are two ways you can extend your Qt HTML5 application. The first one is indeed by injecting your QObjects to make them visible in JavaScript. This you do in response to javaScriptWindowObjectCleared() signal emitted by web view page main frame.
The second way is to provide custom plugin factory to the web view page (see QWebPage::setPluginFactory() method). This way allows you creating Qt objects (e.g. widgets) instances directly in HTML code (using <object> tag).

Related

Where to find entire HTML content in Chromium source code

I am currently trying to do this: once the webpage loads, find out if the URL is of a certain pattern (say www.wikipedia.com/*), then, if so, parse the HTML content of that webpage like one can do with BeautifulSoup, and check if the webpage has a div with class foo and id boo. Any idea where can I writ this code, that is, where can I get access to URL, where do I need to listen to to know that the webpage has finished loading following which I can look for the URL and HTML content, and where and how I can parse the HTML?
I tried going through the code in src/chrome/browser/tab_contents, I could not find any reasonable place where I can do all this.
Take a look at the following conceptual application layers which represent how Chromium displays web pages:
Image Source: https://docs.google.com/drawings/d/1gdSTfvLxbJDbX8oiWo5LTwAmXmdMQvjoUhYEhfhj0-k/edit
The different layers are described as:
WebKit: Rendering engine shared between Safari, Chromium, and all other WebKit-based browsers. The Port is a part of WebKit that integrates with platform dependent system services such as resource loading and graphics.
Glue: Converts WebKit types to Chromium types. This is our "WebKit embedding layer." It is the basis of two browsers, Chromium, and test_shell (which allows us to test WebKit).
Renderer / Render host: This is Chromium's "multi-process embedding layer." It proxies notifications and commands across the process boundary.
WebContents: A reusable component that is the main class of the Content module. It's easily embeddable to allow multiprocess rendering of HTML into a view. See the content module pages for more information.
Browser: Represents the browser window, it contains multiple WebContentses.
Tab Helpers: Individual objects that can be attached to a WebContents (via the WebContentsUserData mixin). The Browser attaches an assortment of them to the WebContentses that it holds (one for favicons, one for infobars, etc).
Since your goal is to access and interpret the HTML content of a web page by element and/or class, you can look to the rendering process which uses Blink:
The renderers use the Blink open-source layout engine for interpreting and laying out HTML.
Blink has a WebDocument class which allows you to access the HTML content and other properties of a web page:
WebDocument document = GetMainFrame()->GetDocument();
WebElement element = document.GetElementById(WebString::FromUTF8("example"));
// document.Url();
Cleanest would be via the chrome remote debugging protocol
Use the DOM methods to get the root DOM and walk, search, or query the dom
This would make testing simpler as well: you can implement the logic in your favourite scripting language using an existing client library (there are many) and once that works implement it in C++.
If this for some reason has to be inprocess within Chromium, as a next step start a thread that connects to this and performs the operations.
You need to use a server side library to parse the contents of a requested HTML page. In Java for example there is a library "jsoup" there might be another alternatives for other server side languages. The main problem you could find is a "forbiden access", due to security restrictions, but as you are not trying to access REST services or similar things but only parse pure HTML to found string patterns, it must be easily done with "jsoup". There was a project where similar things were programmed for accessing web sites pages & parse the response html string.
Document doc = Jsoup.connect("http://jsoup.org").get();
Element link = doc.select("a").first();
String relHref = link.attr("href"); // == "/"
String absHref = link.attr("abs:href"); // "http://jsoup.org/"
See: https://jsoup.org/

How to use Thymeleaf th:text in reactJS

I am running a springboot application with Thymeleaf and reactJS. All the HTML text are read from message.properties by using th:text in the pages, but when I have th:text in reactJS HTML block, reactJS seems angry about it.
render() {
return (
<input type="text" th:text="#{home.welcome}">
)
}
The error is:
Namespace tags are not supported. ReactJSX is not XML.
Is there a walkaround besides using dangerouslySetInnerHTML?
Thank you!
There is no sane workaround.
You are getting this error because Thymeleaf outputs XML, and JSX parsers do not parse XML.
You did this because JSX looks very, very similar to XML. But they are very, very different, and even if you somehow hacked Thymeleaf to strip namespaced attributes and managed to get a component to render, it would be merely a fleeting moment of duct-taped-together, jury-rigged code that will fall apart under further use.
This is a really, really bad idea because JSX is Javascript. You are generating Javascript on the fly. Just to name a few reasons this will not work in the long term:
This makes your components difficult if not impossible to test.
Reasoning about application state will be a nightmare as you will struggle to figure out if the source of a certain state is coming from Thymeleaf or JS.
Your application will completely grind to a halt if Thymeleaf outputs bad JS.
These problems will all get worse with time (Thyme?) as as developers abuse the ease with which they can render server-side data to the client-side, leading to an insane application architecture.
Do not do this. Just use Thymeleaf, or just use React.
Sample Alternative: I primarily work on a React application backed by a Java backend. So I understand how someone could stumble upon this hybrid and think it might be a good idea. You are likely already using Thymeleaf and are trying to figure out how you can avoid rewriting your servlets but still get the power of React.
We were in a similar boat two years ago, except with an aging JSP frontend, but the difference is negligible. What we did (and it works well) is use a JSP page to bootstrap the entire React application. There is now one JSP page that we render to the user. This JSP page outputs JSON into a single <script> tag that contains some initial startup data that we would otherwise have to fetch immediately. This contains resources, properties, and just plain data.
We then output another <script> that points to the location of a compiled JS module containing the entire standalone React application. This application loads the JSON data once when it starts up and then makes backend calls for the rest. In some places, we have to use JSP for these, which is less than ideal but still better than your solution. What we do is have the JSP pages output a single attribute containing JSON. In this way (and with some careful pruning by our XHR library) we get a poor man's data interchange layer built atop a JSP framework we don't have time to change.
It is definitely not ideal, but it works well and we have benefited vastly from the many advantages of React. When we do have issues with this peculiar implementation, they are easy to isolate and resolve.
It is possible wrap ReactJS apps in Thymeleaf. Think if you want a static persistent part (like some links, or even just displayed data), you could use Thymeleaf. If you have a complicated part (something that requires DOM repaints, shared data, updates from UI/Sockets/whatever), you could use React.
If you need to pass state you could use Redux/other methods.
You could have your backend send data via a rest API to the React part and just render your simple parts as fragments or as whole chunks of plain HTML using Thymeleaf.
Remember, Thymeleaf is really just HTML. React is virtual DOM that renders as HTML. It's actually fairly easy to migrate one to the other. So you could write anything "Static" or that does not respond much to UI, in Thymeleaf/HTML. You could also just render those parts in React too, but without State.
Thymeleaf 3 allows you to render variables from your Java to a separate JS file. So that is also an option to pass into JSX
function showCode() {
var code = /*[[${code}]]*/ '12345';
document.getElementById('code').innerHTML = code;
}
Now you can use data- prefix attributes (ex. data-th-text="${message}").
https://www.thymeleaf.org/doc/tutorials/3.0/usingthymeleaf.html#support-for-html5-friendly-attribute-and-element-names

React based Forms that could render via json

I am looking for any ReactJS based lib that utilizes json as input and renders the HTML accordingly. Bonus if it can do automatic validation for the end user's input. Example it takes a json input having an array and renders it as HTML select control.
Also the lib should have the possibility to be used without nodejs, I mean in a simple web application like for example Asp.Net MVC based web app. Much better if it has examples and good documentation.

dynamic HTML page to pdf

I know there is a list of similar questions but all handle pages without user interaction (static even though some js may be there).
Let's say we've a page the user can interact (e.g. svg than changes, or html tables with drilldown - content changes). Those interactions will change the page. Same happens in stackoverflow when entering the question...
The idea is adding a button, "convert to pdf" taking the state of the html and sending to the user back a pdf version (we've a Java server).
Using the print of the browser is not the answer I'm looking for :-).
Is this a stick in the moon ?
You would have to store the parameters that generate the HTML view (i.e. what the user clicks on, what selections they make, etc). If you can have a list of parameters that generate the HTML view, you can have a method which accepts the list of parameters (JSON post?), generates the HTML view and passes it to your PDF generating routine. I'm not too familiar with Java libraries for this purpose, but PHP has TCPDF can take html output to basically generate a PDF for you. Certainly, there are Java libraries which will allow you to do the same thing, or you can use the parameters to get a list of rows/arrays which can be iterated over and output using the PDF library of your choice.
Both iTextPDF and Aspose.PDF would allow you to do that (I've seen them used in two different projects), but there is no magic and you will have to do some work.
The steps are roughly:
Get (as a string) the part of the document which you want to print with jQuery or innerHTML
Call a service on the server side to convert this to PDF
[Serverside] Use a whitlist - based tool to clean up the hmtl (unless you want to be hacked). JSoup is great for that.
[Serverside] Use IText or Aspose API to create the PDF from the HTML (this is not trivial, you will have to read the doc)
Download the document
I'd also recommend DocRaptor, an HTML to PDF API built by my company, Expected Behavior.
DocRaptor uses Prince XML to generate PDFs, and thus produces higher quality results than similar products.
Adding PDF generation to your own web application using our service is as simple as making an HTTP POST request to our server.
Here's a link to DocRaptor's home page:
DocRaptor
And a link to our API documentation:
DocRaptor API documentation

WPF: Display HTML-based content stored in resource assembly

In my WPF project I need to render HTML-based content, where the content is stored in a resource assembly referenced by my WPF project.
I have looked at the WPF Frame and WebBrowser controls. Unfortunately, they both only expose Navigation events (Navigating, Navigated), but not any events that would allow me, based on the requested URL, to return HTML content retrieved from the resource assembly.
I can intercept navigation requests and serve up HTML content using the Navigating event and the NavigateToString() method. But that doesn't work for intercepting load calls for images, CSS files, etc.
Furthermore, I am aware of an HTML to Flowdocument SDK sample application that might be useful, but I would probably have to extend the sample considerably to deal with images and style sheets.
For what it is worth, we also generate the HTML content to be rendered (via Wiki pages) so the source HTML is somewhat predictable (e.g., maybe no JavaScript) in terms for referenced image locations and CSS style sheets used. We are looking to display random HTML content from the internet.
Update:
There is also the possibility to create an MHT file for each HTML page, which would 'inline' all images as MIME-types and alleviate the need to have finer-grained callbacks.
If you're okay with using a 28 meg DLL, you may want to take a look at BerkeliumSharp, which is a managed wrapper around the awesome Berkelium library. Berkelium uses the chromium browser at its core to provide offscreen rendering and a delegated eventing model. There are tons of really cool things you can do with this, but for your particular problem, in Berkelium there is an interface called ProtocolHandler. The purpose of a protocol handler is to take in a URL and provide the HTTP headers and body back to the underlying rendering engine.
In the BerkeliumSharp test app (one of the projects available in the source), you can see one particular use of this is the FileProtocolHandler -- it handles all the file IO for the "file://" protocol using .NET managed classes (System.IO). You could do the same thing for a made up protocol like "resource://". There's really only one method you have to override called HandleRequest that looks like this:
bool HandleRequest (string url, ref byte[] responseBody, ref string[] responseHeaders)
So you'd take a URL like "resource://path/to/my/html" and do all the necessary Assembly.GetResourceStream etc. in that method. It should be pretty easy to take a look at how FileProtocolHandler is used to adapt your own.
Both berkelium and berkelium sharp are open source with a BSD license.
The WebBrowser exposes a NavigateToStream(Stream) method that might work for you:
If your content is then stored as an embedded resource, you could use:
var browser = new WebBrowser();
var source = Assembly.Load("ResourceAssemblyName");
browser.NavigateTo(source.GetManifestResourceStream("ResourceNamespace.ResourceName"));
There is also a NavigateToString(string) method that expects the string content of the document.
Note: I have never used this in anger, so I have no idea how much help it will be!