I have some HTML content being generated via some PHP.
Whilst investigating a CSS problem, I noticed through Firebug that some elements in the DOM were not organised as I expected. Yet when I used the standard 'View Source' in Firefox, everything appeared to be correct.
I know the source being displayed by Firebug is accurate, because the source it presents me corresponds to the aesthetic issue I'm seeing on screen, but I'm not sure what this means and how to investigate further.
Why does this happen, and which source version should I be looking at? (p.s. I have no JavaScript running on the website.)
Firebug shows the DOM tree after the browser's parser has cleaned it up, so if there are any syntax errors in the raw source, you won't see them in Firebug (unless they're so bad that they wreck the parse tree completely).
The regular view-source functionality shows the page's source as it came from the server. Any manipulation of the DOM after the page has loaded won't show up in view-source, which is outdated by then. Firebug will show the live in-memory tree, with any manipulations included, but it will also show the cleaned-up version of the markup.
Firebug shows a live view of the page's DOM structure.
View Source shows the original HTML received from the server.
If you modify the DOM using JavaScript, the changes will only appear in Firebug.
If your HTML was invalid and the browser fixed it up, the fixes will also only appear in Firebug.
You can use the browser's View Selection Source option to show the source for the actual DOM, which will match what you see in Firebug.
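To check whether the raw source contains the kind of nesting errors that a browser quietly repairs when building the DOM, something like the following Python sketch may help. It uses only the standard library's `html.parser`; the names `NestingChecker` and `find_nesting_problems` are made up for illustration. It is deliberately stricter than a browser: HTML allows some end tags (such as `</p>` or `</li>`) to be omitted, and this sketch will still flag those.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Collects tags that are closed out of order or never closed."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open (tag, line) pairs
        self.problems = []   # human-readable findings

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        line_no = self.getpos()[0]
        if tag not in [open_tag for open_tag, _ in self.stack]:
            self.problems.append(
                f"line {line_no}: </{tag}> has no matching open tag")
            return
        # Pop back to the most recent matching open tag; everything
        # opened after it was left unclosed.
        while self.stack:
            open_tag, opened_at = self.stack.pop()
            if open_tag == tag:
                break
            self.problems.append(
                f"line {opened_at}: <{open_tag}> not closed before </{tag}>")


def find_nesting_problems(html):
    """Return a list of nesting problems found in the raw HTML string."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    for open_tag, opened_at in checker.stack:
        checker.problems.append(f"line {opened_at}: <{open_tag}> never closed")
    return checker.problems
```

Running this over the text shown by View Source (not the Firebug tree, which is already repaired) should point at the places where the two views are likely to diverge.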
Firebug shows more than just the code you wrote. It also includes the browser's default styles (assuming you have not used a reset such as Yahoo's CSS Reset). Although there is no guarantee that Firebug itself is free of bugs, I tend to trust it more than View Source, even more so when JavaScript is used, because the rendered page can then differ vastly from the original HTML, albeit not in your case as you are not using JS.
If you want to look at the CSS of other people's websites (to learn from them), Firebug lets you inspect the prettified CSS:
But in Chrome 16, you only get the minified CSS as it was served out:
Is there a way to get Chrome to prettify the CSS?
In the newer versions there is a "format" button that prettifies the source:
(only just realised myself :P )
The Developer's console shows the file as served. If you want a human-readable version, copy-paste the code to http://www.codebeautifier.com/.
If you use the Elements tab, the applied CSS properties are also shown per element.
I recommend Quick Source Viewer, a Chrome extension that requires no manual copy-pasting (it acts rather like an extra Chrome dev tool).
It can show you the source of the current page, formatted and colour-coded.
It's pretty powerful, showing all 'sources' of the page, be it CSS, JS or HTML. Even inline CSS/JS can be viewed individually (with injected code highlighted). And the best part is that it prettifies all of them, even the CSS (which Chrome's dev tools still refuse to do).
You may want to check out Pretty Print: https://chrome.google.com/webstore/detail/prettyprint/nipdlgebaanapcphbcidpmmmkcecpkhg?hl=en
After installing, when you view a minified CSS or JS file, it will appear (after a moment) un-minified.
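If you would rather not depend on an extension or a website, un-minifying CSS for reading is simple enough to sketch in a few lines of Python. This is a naive illustration, not what any of the tools above actually do: the function name `prettify_css` is made up, and it assumes there are no `;`, `{` or `}` characters inside strings, comments or `url(...)` values.

```python
def prettify_css(css, indent="  "):
    """Naively re-indent minified CSS, one declaration per line."""
    out = []
    depth = 0
    token = ""
    for ch in css:
        if ch == "{":
            # Selector or at-rule prelude: open a block.
            out.append(indent * depth + token.strip() + " {")
            depth += 1
            token = ""
        elif ch == "}":
            # Minifiers often drop the last ';' before '}'.
            if token.strip():
                out.append(indent * depth + token.strip() + ";")
            depth = max(depth - 1, 0)
            out.append(indent * depth + "}")
            token = ""
        elif ch == ";":
            if token.strip():
                out.append(indent * depth + token.strip() + ";")
            token = ""
        else:
            token += ch
    return "\n".join(out)


print(prettify_css("@media print{a{color:red;margin:0}}"))
```

Nested blocks such as `@media` rules are indented one level deeper because the depth counter follows the braces.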
I have done HTML parsing. I get a URL, and using Nokogiri I can extract components from the HTML. That is fine.
Now, I am wondering whether the following is possible, or whether it does not make sense at all:
When we look at a page in a browser, there is a rendering engine that parses the HTML/CSS/JS and creates a visual representation of it. I am wondering if it is possible to access that in-memory DOM interpretation. So, for example, when parsing the HTML I can find an element that is pretty far from the root element, but when rendered it can appear at the top of the page (because the CSS says it is absolutely positioned). I would like to be able to get that image as it appears in the browser.
Is there any open source API that would let me access this interpretation of an HTML file, or does what I am saying not make sense at all, because what we see is just visual objects that cannot be processed?
It sounds like you're asking for a headless browser – a rendering engine that works for your code instead of a user.
Look at PhantomJS.
I'm developing a web scraping tool in Python, and I need to get intimately acquainted with the functions of various HTML tags on certain sites. Unfortunately, the "view source" that Chrome, Firefox, and Safari offer does not output very well formatted HTML source code -- it tends to place a huge number of tags on the same line. Do the browsers offer any plugins that may be able to clean things up a bit, or do I need to get/develop some kind of tool in Python that takes dirty HTML as input and outputs cleanly formatted HTML?
Since I work primarily with Chrome, the best examples I can think of are Code Formatter (Chrome)
This isn't automatic; you have to copy and paste the entire page into the app. Also, the app window is small (this is unalterable, to my knowledge), but it is relatively effective.
...and JavaScript and CSS Beautifier
Much more effective and clean, but it only works, as the title suggests, with JS and CSS.
With Firefox you can select (highlight - I am writing for beginners also) the text, and once it is selected, release the left mouse button and right click within the selected area and choose "View selection source." You can then copy the highlighted text and paste it.
My composite example:
View selection source
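Since the question also asks about doing this in Python: a rough re-indenter can be built on the standard library's `html.parser` alone. The sketch below is only meant for reading markup, not for re-serving it. The names `Reindenter` and `reindent` are made up, and it assumes reasonably well-nested input. It also collapses whitespace in text (so `<pre>` and inline scripts are altered), decodes character references, and drops comments and the doctype.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Reindenter(HTMLParser):
    """Emits one tag or text run per line, indented by nesting depth."""

    def __init__(self, indent="  "):
        super().__init__()
        self.indent = indent
        self.depth = 0
        self.lines = []

    def _emit(self, text):
        self.lines.append(self.indent * self.depth + text)

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as written.
        self._emit(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: no change in depth.
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        self.depth = max(self.depth - 1, 0)
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        if data.strip():
            self._emit(" ".join(data.split()))


def reindent(html):
    parser = Reindenter()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


print(reindent("<div><p>Hello <b>world</b></p><br></div>"))
```

For markup that is badly broken, a dedicated tidying tool is likely to give better results than this sketch.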
I do a decent job of formatting my HTML and keeping it clean, but every time I view source there are elements all over the place. I guess that's fine, since it won't make the page load any faster or slower and makes it harder to copy, but it just looks ugly and I wish it didn't.
Why?
View Source in a web browser will show exactly what the server sent to the client. If you're really formatting your HTML nicely and it doesn't look exactly the same on the client, then there's something else in the middle that's making it not line up the same, such as a server-side technology like PHP or ASP.NET which is being used to generate some of the markup.
It's also possible that you're seeing it differently because of whitespace. If you mix spaces and tabs in your development environment with one tab equal to 4 spaces, for example, and the browser displays one tab as 8 spaces, then things won't line up. To fix this, either always use tabs or always use spaces. Most decent IDEs (like Visual Studio) will convert between tabs and spaces automatically for you.
Some browser tools, like Firebug and Chrome's Developer Tools, show the DOM tree as the browser understands it. This is a translation of the DOM back to HTML and is unlikely to be exactly the same as what the server sent. It is formatted perfectly, though.
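The tab-width effect is easy to reproduce. As a small Python illustration (the helper name `indent_columns` is made up), `str.expandtabs` shows at which column a line starts for a given tab width:

```python
def indent_columns(line, tab_width):
    """Column at which the first non-blank character appears."""
    expanded = line.expandtabs(tab_width)
    return len(expanded) - len(expanded.lstrip(" "))


tab_line = "\t<li>item</li>"        # indented with one tab
space_line = "    <li>item</li>"    # indented with four spaces

# Editor with 4-column tabs: both lines start in the same column.
print(indent_columns(tab_line, 4), indent_columns(space_line, 4))

# Viewer with 8-column tabs: the tab-indented line shifts right.
print(indent_columns(tab_line, 8), indent_columns(space_line, 8))
```

The two lines agree at a tab width of 4 and disagree at 8, which is the kind of misalignment that shows up in a source viewer.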
I'm not sure why your HTML is not lined up properly in your browser's View Source. It would be helpful to actually see your HTML.
Some of the common culprits include:
a mixture of tabs and manual spaces for indenting code (if you want things to look pretty, do one or the other).
possibly a mixture of Windows, Unix, and Macintosh line breaks (CR/LF), which can happen if code is edited from multiple computers. I've had issues with this, but I'm not sure if it would cause the issues you're describing; perhaps not. (I'm sure others can comment more knowledgeably on that possibility.)
your site is managed through a CMS that emits terrible HTML
It may be useful for you to look at HTML Tidy. I haven't used it yet, but I've always heard good things.
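The first two culprits in the list above can be checked mechanically. Here is a minimal Python sketch (the function name `whitespace_report` is made up) that counts line-ending styles and indentation styles in the raw bytes of a file. It works on bytes so that the line endings are not normalised before they are counted.

```python
from collections import Counter


def whitespace_report(raw: bytes) -> dict:
    """Count line-ending styles and indentation styles in raw file bytes."""
    endings = Counter()
    indents = Counter()
    for line in raw.splitlines(keepends=True):
        if line.endswith(b"\r\n"):
            endings["CRLF"] += 1      # Windows
        elif line.endswith(b"\n"):
            endings["LF"] += 1        # Unix
        elif line.endswith(b"\r"):
            endings["CR"] += 1        # classic Mac
        body = line.rstrip(b"\r\n")
        leading = body[: len(body) - len(body.lstrip(b" \t"))]
        if b"\t" in leading and b" " in leading:
            indents["mixed"] += 1
        elif b"\t" in leading:
            indents["tabs"] += 1
        elif b" " in leading:
            indents["spaces"] += 1
    return {"line_endings": dict(endings), "indentation": dict(indents)}


print(whitespace_report(b"<ul>\r\n\t<li>a</li>\n    <li>b</li>\n</ul>\n"))
```

To run it on a real file, read the bytes with `open(path, "rb").read()`, where `path` is whatever file you want to inspect. More than one key in either dictionary suggests the file mixes conventions.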
I am working on a crawler and, as a result, I need to look at the HTML of the site I will crawl in order to make assumptions (which will be soft-coded).
HTML of large sites is not easy to read. Is there a tool which can display the HTML in some sort of tree-like hierarchy?
Thanks
If you want to actually look (like, you know, with your eyes) at some HTML source in a nicely formatted tree, then use Firebug. The HTML tab of that is perfect (and even editable if you want to play with it).
If you're looking to see the "tree-like hierarchy", there is this tool: http://tools.arantius.com/tabifier
I'm surprised nobody has mentioned IE8.
Tools -> Developer Tools.
Google Chrome has something pretty similar to Firebug, built in. Right-click anywhere in the page and click Inspect element. That gives you a fully collapsed tree view of the source. You can expand and drill down as necessary.
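For a crawler it can also be handy to get such a tree as plain text. The following Python sketch (names `OutlineBuilder` and `outline` are made up) prints only the element hierarchy, labelling each element with its `id` and classes in a selector-like form. It reads the raw HTML rather than the browser's repaired DOM, so omitted end tags (for example `<li>` without `</li>`) will skew the indentation.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class OutlineBuilder(HTMLParser):
    """Records one 'tag#id.class' label per element, indented by depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        label = tag
        if attrs.get("id"):
            label += "#" + attrs["id"]
        # Valueless attributes come through as None, hence the 'or'.
        for cls in (attrs.get("class") or "").split():
            label += "." + cls
        self.lines.append("  " * self.depth + label)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: record it, keep the depth.
        depth = self.depth
        self.handle_starttag(tag, attrs)
        self.depth = depth

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth = max(self.depth - 1, 0)


def outline(html):
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    return "\n".join(builder.lines)


print(outline('<div id="main" class="a b"><ul><li>x</li><li>y</li></ul>'
              '<img src="i.png"></div>'))
```

The labels double as a starting point for the selectors a crawler might use, though they should be checked against the live DOM in Firebug or Chrome before being relied on.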