What engine renders the WebBrowser Control? - html

I currently have a WebBrowser control in my VB.NET project being created below:
    Private Sub SomeSubToPrintHTMLViaWebBrowser()
        ' strDocument is the giant blob of HTML text that can be seen in the jsFiddle linked later in the question.
        Dim webBrowserHidden As New WebBrowser
        AddHandler webBrowserHidden.DocumentCompleted, New WebBrowserDocumentCompletedEventHandler(AddressOf PrintDocument)
        webBrowserHidden.DocumentText = strDocument
    End Sub
However, when I use the .Print or .ShowPrintDialog methods of the WebBrowser, the page comes out malformed, even though the same HTML renders perfectly well when I load it as a webpage in IE, Edge, Chrome, or Firefox. The markup was also validated as "proper" by the W3C Online Validator.
So what I would like to know is: what engine is WebBrowser using to render pages?
Here is the HTML/CSS that I'm trying to print:
https://jsfiddle.net/et1t2kh5/

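The WebBrowser control is rendered by the Internet Explorer engine (Trident/MSHTML) installed on the machine, but by default it emulates an older version of IE (IE7 mode), which is probably why the page comes out malformed.
Unfortunately, there's no easy fix for this; the workaround requires modifying the registry.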
Using the Registry Editor (regedit.exe), navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_BROWSER_EMULATION
Add a new DWORD entry whose name is the file name of your application's executable, then set its value to 2af8 (hex) or 11000 (dec).
This will force the WebBrowser control to use IE11's rendering engine.
Please refer to the following link for further information: Internet Feature Controls (B..C)
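The registry step above can also be written down as a .reg file instead of being typed into regedit by hand. The following is only a sketch in Python that generates the text of such a file; the executable name MyApp.exe is a hypothetical placeholder, and the script itself does not touch the registry.

```python
# Sketch: generate the text of a .reg file for the FEATURE_BROWSER_EMULATION entry.
# "MyApp.exe" is a hypothetical executable name; replace it with your own.

IE11_EMULATION = 11000  # decimal, identical to 0x2AF8

KEY_PATH = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer"
    r"\MAIN\FeatureControl\FEATURE_BROWSER_EMULATION"
)


def build_reg_file(exe_name: str, value: int = IE11_EMULATION) -> str:
    """Return .reg file content that sets <exe_name> to <value> as a DWORD."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY_PATH}]",
        # .reg files store DWORD values as eight hexadecimal digits.
        f'"{exe_name}"=dword:{value:08x}',
        "",
    ]
    return "\r\n".join(lines)


reg_text = build_reg_file("MyApp.exe")
print(reg_text)
```

Saving the printed text as a .reg file and importing it would normally need administrator rights, because the key lives under HKEY_LOCAL_MACHINE.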

Related

start browser at an anchor possibly using a tag

I am developing interactive software for thermodynamic calculations. It uses an HTML help file with "anchor/target" features to select the appropriate part of the help file when the user types a ? as the answer to a question.
This works well, but at present a new browser window is opened each time the user types a "?". I would prefer to start a new tag if a browser window is already open.
At present my program activates the help by creating a character string with the content: browser "file:helpfile#target"
and then calling the Fortran subroutine execute_system_command(character).
"browser" can be firefox or whatever browser the user prefers (on Mac, including the path);
"helpfile" contains the path and name of my HTML help file;
"target" is a text that depends on the question asked by the software and locates the appropriate help text.
How can I modify this so that I open a new tag in the browser (if it is already open) rather than starting a new browser window?
Maybe something like "target=_blank" can be added?
My program is written in the new Fortran standard, so I have none of the facilities that might be available in Java or Python. It must work with different browsers on different operating systems.
As @Vladimir pointed out, I meant a "tab"; there are so many terms. But the answers I got made me reconsider which browser I could use, and I made some new discoveries.
On Windows I used the old Explorer because the path to Firefox contains a space, and when I tried to start Firefox to open a file I had to enclose "C:\Program ...\firefox" within double quotes. That works to start the browser, but if I want the browser to open a file I must enclose that within double quotes as well, and that did not work. I am not sure whether the problem was the Fortran intrinsic EXECUTE_COMMAND_LINE(txt) or something deeper down. The old Explorer, however, I could start without "" and just enclose "file:/help.html" in "".
So I tried to be smart and wrote a test program that encloses just the directories containing a space in "", i.e.
C:"Program Files\Mozilla Firefox"\firefox.exe "file:/help.html"
in the call to EXECUTE_COMMAND_LINE,
and that worked and opened the help file in a tab, which is my default.
Problem solved? No. When I had replaced the browser and path in the program and tested it, it did not find the browser. The reason was that I check that the browser exists using another Fortran intrinsic, INQUIRE, and as I understand it double quotes are not legal inside file names, so INQUIRE did not find Firefox when there were " inside the path. It only worked when the " were around the whole file name. So back to square one? No, I simply removed the " from the path+browser before calling INQUIRE, then used the path with "" inside when calling EXECUTE_COMMAND_LINE,
and now everything works as I wanted!
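The two-forms-of-one-path trick described above can be sketched as follows. This is an illustration in Python rather than Fortran, and the helper names strip_quotes and build_help_command are hypothetical; the path and file URL are the ones quoted in the text.

```python
# Sketch: keep one quoted form of the browser path for the command line and
# derive an unquoted form for the existence check (the INQUIRE step).

def strip_quotes(path: str) -> str:
    """Return the path without embedded double quotes, for existence checks."""
    return path.replace('"', "")


def build_help_command(quoted_browser: str, help_file: str, target: str) -> str:
    """Build the command line: browser path followed by the quoted file URL."""
    return f'{quoted_browser} "file:{help_file}#{target}"'


quoted_browser = r'C:"Program Files\Mozilla Firefox"\firefox.exe'
plain_browser = strip_quotes(quoted_browser)   # form to use when testing existence
command = build_help_command(quoted_browser, "/help.html", "target")

print(plain_browser)
print(command)
```

The same two steps in Fortran would be a character loop (or repeated INDEX calls) that drops the quote characters before INQUIRE, and a concatenation before EXECUTE_COMMAND_LINE.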

Where to find entire HTML content in Chromium source code

I am currently trying to do this: once the webpage loads, find out whether the URL matches a certain pattern (say www.wikipedia.com/*); if so, parse the HTML content of that webpage, as one can do with BeautifulSoup, and check whether the webpage has a div with class foo and id boo. Any idea where I can write this code? That is: where can I get access to the URL, what do I need to listen to in order to know that the webpage has finished loading (after which I can look at the URL and HTML content), and where and how can I parse the HTML?
I tried going through the code in src/chrome/browser/tab_contents, I could not find any reasonable place where I can do all this.
Take a look at the following conceptual application layers which represent how Chromium displays web pages:
Image Source: https://docs.google.com/drawings/d/1gdSTfvLxbJDbX8oiWo5LTwAmXmdMQvjoUhYEhfhj0-k/edit
The different layers are described as:
WebKit: Rendering engine shared between Safari, Chromium, and all other WebKit-based browsers. The Port is a part of WebKit that integrates with platform dependent system services such as resource loading and graphics.
Glue: Converts WebKit types to Chromium types. This is our "WebKit embedding layer." It is the basis of two browsers, Chromium, and test_shell (which allows us to test WebKit).
Renderer / Render host: This is Chromium's "multi-process embedding layer." It proxies notifications and commands across the process boundary.
WebContents: A reusable component that is the main class of the Content module. It's easily embeddable to allow multiprocess rendering of HTML into a view. See the content module pages for more information.
Browser: Represents the browser window, it contains multiple WebContentses.
Tab Helpers: Individual objects that can be attached to a WebContents (via the WebContentsUserData mixin). The Browser attaches an assortment of them to the WebContentses that it holds (one for favicons, one for infobars, etc).
Since your goal is to access and interpret the HTML content of a web page by element and/or class, you can look to the rendering process which uses Blink:
The renderers use the Blink open-source layout engine for interpreting and laying out HTML.
Blink has a WebDocument class which allows you to access the HTML content and other properties of a web page:
WebDocument document = GetMainFrame()->GetDocument();
WebElement element = document.GetElementById(WebString::FromUTF8("example"));
// document.Url();
The cleanest approach would be via the Chrome remote debugging protocol.
Use the DOM methods to get the root DOM node and walk, search, or query the DOM.
This would make testing simpler as well: you can implement the logic in your favourite scripting language using an existing client library (there are many) and, once that works, implement it in C++.
If for some reason this has to be in-process within Chromium, then as a next step start a thread that connects to this and performs the operations.
You need to use a server-side library to parse the contents of a requested HTML page. In Java, for example, there is the library "jsoup"; there may be alternatives for other server-side languages. The main problem you could run into is "forbidden access" due to security restrictions, but since you are not trying to access REST services or similar and only want to parse plain HTML to find string patterns, it should be easy to do with "jsoup". I have seen a project where similar things were programmed to access website pages and parse the response HTML string.
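import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = Jsoup.connect("http://jsoup.org").get(); // fetch and parse the page
Element link = doc.select("a").first();                 // first <a> element
String relHref = link.attr("href"); // == "/"
String absHref = link.attr("abs:href"); // "http://jsoup.org/"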
See: https://jsoup.org/
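For the specific check in the question (URL matches a pattern, page contains a div with class foo and id boo), here is a minimal sketch outside Chromium using only the Python standard library. The names DivFinder and page_matches are hypothetical, and the HTML is assumed to be already available as a string.

```python
# Sketch: URL pattern check plus a search for <div id="boo" class="... foo ...">.
from fnmatch import fnmatch
from html.parser import HTMLParser


class DivFinder(HTMLParser):
    """Records whether a div with the given class and id was seen."""

    def __init__(self, css_class: str, element_id: str):
        super().__init__()
        self.css_class = css_class
        self.element_id = element_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attributes = dict(attrs)
        # The class attribute may hold several space-separated class names.
        classes = (attributes.get("class") or "").split()
        if attributes.get("id") == self.element_id and self.css_class in classes:
            self.found = True


def page_matches(url: str, html: str, pattern: str = "www.wikipedia.com/*") -> bool:
    """Return True if the URL matches the pattern and the div is present."""
    if not fnmatch(url, pattern):
        return False
    finder = DivFinder("foo", "boo")
    finder.feed(html)
    return finder.found


sample = '<html><body><div id="boo" class="bar foo">hi</div></body></html>'
print(page_matches("www.wikipedia.com/wiki/Test", sample))
```

Like jsoup, this only sees the HTML it is given, so elements added later by JavaScript would not be found.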

VB.NET webbrowser: HTML of DocumentText is inaccurate, compared to inspecting elements in a browser

I'm trying to read messages sent by strangers on Omegle, a random "chat with strangers" website.
I've displayed the DocumentText of my webbrowser (called Omegle) in a textbox called OmegleHTML:
Private Sub Omegle_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs) Handles Omegle.DocumentCompleted
OmegleHTML.Text = Omegle.DocumentText
Me.Text = Omegle.Document.Title
End Sub
I've also done a bit of coloring to make things clearer:
Using this HTML code, I've been able to do the simple tasks I need, such as simulating clicks. But what I'm mainly interested in, as I said, is extracting the string a stranger says from the HTML code. Sadly, I'm unable to find what I need in the HTML code I've exported to the textbox. However, when I inspect the message element in Chrome:
This is the exact code I need to display in my textbox in order to extract the logitem message a stranger types. What am I doing wrong? I noticed that when I press Ctrl + U (page source) in Chrome, it displays exactly the same code my textbox displays, also missing the logitems I need. So if I'm not looking for the page source, what should I look for?
The content is written out dynamically using JavaScript. So it isn't part of the page source itself, but is part of the "state" of the page.
See this answer for some details.
How to get rendered html (processed by Javascript) in WebBrowser control?

textarea fields: IE behaves differently when loaded through a partial refresh

I have a page in XPages that I use to open and edit a document. There are two ways to open a document in edit-mode: first in read-mode then click a button to put it in edit-mode, or open it directly in edit-mode. Both work in all browsers, yet IE seems to handle both cases differently. We found this out when working with the SWING API.
Opening directly in edit-mode in IE (8/9/10) works, via read-mode to edit-mode doesn't. What we found is that the internal representation of a textarea field differs: when opened in edit-mode, there are more properties, but most importantly, the return+linefeeds are correctly set in both the value and the innerText property.
The button just contains a simple Change Mode action.
Has anyone heard of this anomaly? And does someone know what we did wrong?
PS I'll try to build a simple XPage that shows this behaviour more clearly tomorrow.
For IE, switching from read mode to edit mode requires a full page refresh.

Fixing malformed html that html tidy doesn't fix

Okay, so I've been using HTML Tidy to convert regular HTML webpages into XHTML suitable for parsing. The problem is that the test page I saved in Firefox apparently had its HTML somewhat pre-cleaned by Firefox during saving; call this file F. HTML Tidy works fine on file F, but fails on the raw data written to a file via .NET (file N). HTML Tidy complains about form tags being intermixed with table tags. The HTML isn't mine, so I can't just fix the source.
How do I clean up file N enough that it can be run through HTML Tidy? Is there a standard way of hooking into Firefox (completely programmatically, without having to use the mouse or keyboard) or another tool that will apply extra fixes to the HTML?
I had been using HTML tidy for some time, but then found that I was getting better results from TagSoup.
It can be used as a JAXP parser, converting non-wellformed HTML on the fly. I usually let it parse the input for Saxon XQuery transformations.
But it can also be used as a stand-alone utility, as an executable jar.
I wound up using SendKeys in C# and importing functions from user32.dll to set Firefox as the active window after launching it to the website I wanted (file:///myfilepathhere/).
SendKeys seemed to require running a windowed program, so I also added another executable which performs actions in its form_load() method.
By sending Alt+F, Down six times, Enter, waiting a bit, typing the full path and file name, pressing Enter twice, and then killing Firefox, I was able to automate Firefox's ability to clean up some HTML.