Determining font and size for text from HTML - html

I need to determine what font and size will be used for each HTML element. They may be set in a stylesheet, on an enclosing div or span, or on the element itself.
If I were to do this manually I would start by looking at the element and work backwards until I came to a span, div, or CSS rule that set a font and/or size. That is the value I want. The browser can obviously do this because it displays the text using a font and size. I want to print a list with two columns, one with the text and the other with the font/size.

If you are looking for a non-programmatic way, I would suggest the Firebug plugin for Firefox. Firebug will not only show you all the style properties that apply to an element, but also let you turn them on and off in the client.

You want to reproduce the functionality of parsing HTML into DOM objects, parsing CSS into rules over those objects, and applying those rules across the objects to end up with the associated computedStyle values? That sounds pretty much like a web browser to me.
You could try scripting Firefox or an IE WebBrowser control. There's also an open-source native Java browser/toolkit being developed, though I don't know how practical that is yet.
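The manual procedure described in the question (start at the element and walk back through its ancestors until one sets a font or size) can be sketched in Python for the narrow case of inline styles. This is only an illustration under loud assumptions: it reads `style="..."` attributes and the legacy `<font face>` attribute, ignores every stylesheet and selector, reports declared values such as `120%` without resolving them, and the names `FontReporter`, `report_fonts`, and the `serif`/`16px` defaults are hypothetical. Anything beyond this needs a real browser engine, as noted above.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def parse_inline_style(style):
    """Turn 'font-size: 12px; color: red' into a dict of declarations."""
    declarations = {}
    for part in style.split(";"):
        name, sep, value = part.partition(":")
        if sep:
            declarations[name.strip().lower()] = value.strip()
    return declarations


class FontReporter(HTMLParser):
    """Collects (text, font-family, font-size) rows from inline styles only."""

    def __init__(self, default_family="serif", default_size="16px"):
        super().__init__()
        # Each entry is (tag, family, size) as inherited at that depth.
        # The defaults are assumptions; real browsers make them configurable.
        self.stack = [("#root", default_family, default_size)]
        self.rows = []

    def handle_starttag(self, tag, attrs):
        _, family, size = self.stack[-1]  # inherit from the parent
        attr_map = dict(attrs)
        if tag == "font":
            family = attr_map.get("face") or family
        style = parse_inline_style(attr_map.get("style") or "")
        family = style.get("font-family", family)
        size = style.get("font-size", size)
        if tag not in VOID_TAGS:
            self.stack.append((tag, family, size))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        tag, family, size = self.stack[-1]
        text = " ".join(data.split())
        # Script and style bodies are not rendered text.
        if text and tag not in ("script", "style"):
            self.rows.append((text, family, size))


def report_fonts(html):
    parser = FontReporter()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Calling `report_fonts(page_source)` returns one row per run of text, which maps directly onto the two-column list the question asks for.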

Related

Command line tool to interpret HTML/CSS and get element styles

Let's say I download an html page along with all its css files (e.g. with curl)
So I have some html code, some css in head, in tags, and some css from files.
Is there a tool I can use to, e.g. get the color and font-size of that character at position 2957 in page, or the height of this tag starting at position 3917?
I am looking for either Linux command line without X, or perl modules.
Of course the tool would need to know how properties are inherited from parents, how they get overridden by CSS rules depending on their order, etc.
Thanks!
EDIT: height was a dangerous example that can confuse the reader. I do not mean the rendered height when the value is auto, for example; I mean the string "auto". So no rendering is necessary.
The standard headless browser is PhantomJS: http://phantomjs.org/ (and there are other similar ones like https://slimerjs.org/).
I'm not sure how pixel-perfect it's going to be (but that's true even with different versions of desktop browsers on a mix of OSs etc.), but it would do the full DOM and CSS parsing, which you can script and get results from.
Essentially you are asking for a browser that can work without any graphics subsystem. Just to measure some element ("height of this tag starting at position 3917") you need fonts on that machine and code that rasterizes those fonts.
I don't think that anyone at the browser vendors is even looking in the rendering-on-headless-device direction.
So the answer is: there is almost no chance of finding such a tool.
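The question's point about rules being overridden depending on their order can be illustrated with a small Python sketch of how a winner is chosen among competing declarations for one property: `!important` first, then selector specificity, then source order. It assumes that every listed selector has already been matched against the element, that selectors contain only type names, `.classes`, `#ids`, and descendant combinators, and it ignores inline styles, origins, and inheritance. The function names are hypothetical.

```python
import re


def specificity(selector):
    """Approximate (ids, classes, types) for selectors made of type names,
    .classes, #ids and descendant combinators only (an assumed subset)."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    # Remove id and class parts so that only type names remain.
    stripped = re.sub(r"[#.][\w-]+", " ", selector)
    types = len(re.findall(r"[A-Za-z][\w-]*", stripped))
    return (ids, classes, types)


def winning_value(declarations):
    """declarations: non-empty list of (selector, value, important) in source order.

    Returns the value that wins: important beats normal, then higher
    specificity wins, then the later declaration wins.
    """
    ranked = [
        (important, specificity(selector), index, value)
        for index, (selector, value, important) in enumerate(declarations)
    ]
    return max(ranked, key=lambda item: item[:3])[3]
```

For example, given `p { font-size: 12px }`, `div.note p { font-size: 14px }`, and a later `p { font-size: 16px }`, the `div.note p` rule wins because a class outranks any number of type selectors, even though it is not last.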

Programmatically figure out the height and width at which an HTML page will render in the browser with C#

I have almost 200,000 HTML pages and I need to figure out the height and width at which each page will render in any browser. I only need approximate numbers. How can I do this programmatically with C#?
C# does not execute inside a browser, and should not be used to try to determine the width and height at which a given HTML page will render in a browser. Moreover, there is no answer for "any browser", as different browsers may support different fonts, may render the same content slightly differently, and may be configured with different display-related settings (most browsers allow the user to arbitrarily scale the default font size up or down as desired, which would of course affect the final render size).
In general, however, I would suggest you do something like:
1. Come up with a JavaScript snippet that can compute the current size of the document.
2. Write a C# (or Java, C, bash, etc.) program to append your snippet to each of your 200,000 pages.
3. Use a browser-based test harness like Selenium or Webdriver to load up each of your 200,000 pages, extract the result from your JavaScript snippet, and log it out to somewhere convenient.
4. Optionally, you can repeat step 3 with different browsers to get the width/height for all the different browsers that you care about.
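The first two steps above can be sketched as follows, in Python rather than C#. The snippet reads `scrollWidth` and `scrollHeight` from `document.documentElement` once the page has loaded. Writing the result into `document.title` with a `SIZE` prefix is purely an assumption made for this sketch so that a test harness has an easy place to read it from; `MEASURE_SNIPPET` and `append_snippet` are hypothetical names.

```python
import re

# JavaScript that records the document size once loading has finished.
# Storing it in document.title is an assumed convention for this sketch.
MEASURE_SNIPPET = (
    "<script>"
    "window.addEventListener('load', function () {"
    "var el = document.documentElement;"
    "document.title = 'SIZE ' + el.scrollWidth + 'x' + el.scrollHeight;"
    "});"
    "</script>"
)


def append_snippet(html, snippet=MEASURE_SNIPPET):
    """Insert the snippet before the last </body>, or at the end if there is none."""
    matches = list(re.finditer(r"</body\s*>", html, flags=re.IGNORECASE))
    if not matches:
        return html + snippet
    pos = matches[-1].start()
    return html[:pos] + snippet + html[pos:]
```

The harness in step 3 would then load each modified page at a fixed window width and read the title back. The width has to be fixed because the measured height depends on where text wraps.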
Edit: Apparently Webdriver and Selenium are the same thing now. When did that happen?
It's pretty straightforward. Just write an HTML parser and enough of a rendering engine to at least know the height and width of any HTML element (for any screen size, font setting?). Obviously you will need a CSS parser and engine. Since you want to know for any browser, you will need modes that emulate each one. If you can't directly get the DOM of the HTML pages you are trying to measure, you will need a JavaScript engine to get the values as they appear on the page.
Or you could run the HTML in a browser and use JavaScript to get the values. This won't be in .NET, though. You could have the JavaScript post the data to an ASP.NET page if you like.
Or you could use one of the tools recommended in answer to your earlier question.

Reset CSS for a certain area of the page?

I am working on a CMS. One of its functions is the editing of HTML "chunks" in a WYSIWYG editor that are displayed as individual pages.
I have an area in the CMS where these chunks are previewed.
The chunks rely on a "foundation.css" file that is loaded into the WYSIWYG editor. It does some small resets, defines a default font and text color, and is overall very simple.
The CMS, obviously, comes with a ton of CSS statements, many of which affect general settings like font size, family, color, line-height, paddings and the like.
Naturally, when I try to display an HTML chunk in a CMS page, it looks different from when it is displayed only with the foundation.css stylesheet.
Can anybody think of a way to clean a defined area in an HTML page (say, a DIV) of all previous style definitions? I can't.
An iframe displaying the chunk and embedding foundation.css would help, but I fear for the user's workflow when 5-10 iframes have to be rendered and then adjusted in height via JS once they are loaded. Yuck.
I have thought about "lifting" all other CSS to a sub-class (i.e. adjusting the CMS' CSS), but that would involve touching a lot of files, some probably PHP source code, and I'd rather not do that.
I don't think this has a solution but you never know.
There are CSS Reset stylesheets available; you could modify those, perhaps?
You could give all of the divs that contain code from the WYSIWYG editor a class, and then reset everything inside those divs.
Adding to what GSto said you can have some style set up like
div.clearCss *
{
property: value !important
}
With all pertinent style properties reset.
This style should apply to anything under an element with clearCss set as its class, so you would only need to apply that class to the parent element.
Are you using a specific WYSIWYG editor in the CMS? I've found that the Telerik RadEditor will allow you to use specific stylesheets for the actual editing area of the editor. The editor that you are using might also be able to do that.
Good luck, and hope this helps.
Take TinyMCE, for example: it also uses an iframe, and I think its developers had the same problem as you. I use it in my backend, with 4 iframes on some pages, and I don't see any performance problems.
With TinyMCE it is possible to compress the functions you need (in PHP, JSP, .NET and ColdFusion), which gives you a great speed boost.
Think twice before you write your own WYSIWYG editor; the others are well tested and have a bunch of very good plugins for nearly every need.

Raw HTML - how to measure width/height on the server?

I have a web application that lets users upload entire .html files to my server. I wish to 'detect' the width/height of the uploaded html and store it in my DB.
So far, I have unsuccessfully tried using the System.Windows.Forms.WebBrowser control - by reading the file into a string, loading it into the browser.document:
_browser = new WebBrowser();
_browser.Navigate("about:Blank");
_browser.Document.OpenNew(true);
_browser.Document.Write(html);
Inspecting the various properties of the _browser object (document, window etc) seems to always default the size to 250x250.
I've tried putting various css size declarations in the .html file and still the same thing.
Is the only option to inspect the html string and regex match CSS properties?
How would you reliably determine what the rendered width/height would be of the document in question?
Remember, the .html file may or may not contain css properties. Maybe the user uses older, deprecated tags such as
<body width="500">
vs
<style>
body { width: 400px; }
</style>
<body>
etc.
Even if you could capture the declared width through inspection of CSS and/or HTML tag specifications, you'd be unlikely to get the rendered width. Height will be even worse, since text wraps.
I think you may want to consider a different approach. Do you really need this? What requirement are you trying to satisfy? Can it be done in a different way?
As you've discovered, you won't be able to use a WebBrowser control because the height and width reported are the height and width of the control itself, not the document inside the control.
What you'd really need to do is write your own HTML parsing engine to calculate this out on your own. You would need to calculate out all of the lines, figure out the line height, etc.
Is this really worth the effort? You would need to make so many assumptions that such a calculation would be pretty much worthless... Differences in rendering by different browsers, customers that have their text size set to something other than the default, and probably dozens of others. Even the screen resolution would matter because, as you can see in this paragraph, text tends to wrap. You need to calculate where the text will wrap in order to calculate how many lines of text will show up. You need to factor in font sizes...
All of that said, in theory this should be doable, and the mechanics for calculating this all out would be the same concepts you would use for printing to a printer. Calculating the page height, and figuring out where you are on the page is all standard operating procedure when printing manually.
Here's an article that explains the basics. It'll be up to you to see if it's worth the effort.
http://msdn.microsoft.com/en-us/magazine/cc188767.aspx
You will not be able to find the dimensions using regular expressions - remember that there might not be any, in which case you'd have to manually measure the elements in the document, requiring a complete HTML renderer.
Doing it with Internet Explorer raises security concerns; make sure that IE is always kept up to date on your server, and that its security settings in the ASP.NET account are as tight as possible. (I'm not sure how to do that.)
Try _browser.Document.Body.OffsetRectangle.Size.
EDIT: Note that, as other people have pointed out, the height will also depend on the width, because of text wrapping, etc., so you should set the width of the IE control to an appropriate value.

How do I have different font colors in a textarea?

I want the font color to change in a textarea as I type in specific keywords, like in Visual Studio.
I have not seen this anywhere, so I don't know if this is possible with HTML and JavaScript.
Has anyone seen anything like this? Or know how to write it?
Textarea is a standard HTML element and so was invented just after the dawn of time. Unfortunately this means it is limited in its appearance and functionality.
Changing the colours of specific words is not possible as far as I know. However a way to get around this would be to have an iFrame embedded in the page. That way, you can treat the iFrame content as another web page and style it using CSS.
The Yahoo RTE, the FCKEditor and the Lightweight RTE work in this way.
Another option, which does not use an iFrame is the editor used here on Stack Overflow, known as the WMD. The files are here.
It's not possible.
The way to go is to make the textarea's text, but not its cursor, transparent using color:#000;-webkit-text-fill-color:transparent, then create an underlying, fully overlapping div into which the textarea's content is copied and formatted on the textarea's oninput event.
You'll need to address (or avoid) some issues that come from keeping these two elements in sync, such as scrolling, but it can be done. I made my own HTML editor this way.
AFAIK, the CSS property -webkit-text-fill-color is supported in Opera and Chrome, and should be in the soon-to-be-released Firefox v.48.
You would probably have to run JavaScript on the client to detect when the text changes, then replace the text to be highlighted with child HTML elements that carry the proper style.
For example
Original text:
This is what the user typed.
Highlighted text
This is what the <a class="className">user</a> typed.
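The replacement step in this example can be sketched as a plain string transformation. On the client the same logic would be written in JavaScript and run on each input event; it is shown here in Python only to make the escaping and wrapping explicit. The function name, the `keyword` class, and the use of `<span>` instead of the `<a>` in the example above are assumptions of this sketch.

```python
import html
import re


def highlight_keywords(text, keywords, class_name="keyword"):
    """Return HTML in which each whole-word keyword is wrapped in a styled span.

    All other text is HTML-escaped so that user input cannot inject markup.
    """
    if not keywords:
        return html.escape(text)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b"
    )
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the plain text between the previous match and this one.
        parts.append(html.escape(text[last:match.start()]))
        parts.append('<span class="%s">%s</span>'
                     % (class_name, html.escape(match.group(0))))
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

A stylesheet rule such as `.keyword { color: blue; }` then supplies the colour, and the resulting markup goes into the element that displays the highlighted copy of the text.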