Tool to display HTML as a hierarchy

I am working on a crawler, so I need to look at the HTML of the site I will crawl in order to make assumptions about its structure (which will be soft-coded).
HTML of large sites is not easy to read. Is there a tool which can display the HTML in some sort of tree-like hierarchy?
Thanks

If you want to actually look (like, you know, with your eyes) at some HTML source in a nicely formatted tree, then use Firebug. The HTML tab of that is perfect (and even editable if you want to play with it).

If you want to see the "tree-like hierarchy", there is this tool: http://tools.arantius.com/tabifier

I'm surprised nobody has mentioned IE8.
Tools -> Developer Tools.

Google Chrome has something pretty similar to Firebug, built in. Right-click anywhere in the page and click Inspect element. That gives you a fully collapsed tree view of the source. You can expand and drill down as necessary.
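If you would rather get the hierarchy from a script (handy when the crawler itself is written in code), here is a minimal sketch using only Python's standard `html.parser`. The names `OutlineParser` and `outline` are made up for this example. It assumes reasonably well-formed markup: tags whose closing tag is omitted (for example `<li>` without `</li>`) will make the indentation drift, because nothing here repairs the tree the way a browser does.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not increase the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class OutlineParser(HTMLParser):
    """Collects one indented line per opening tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        label = tag
        if attrs.get("id"):
            label += "#" + attrs["id"]
        if attrs.get("class"):
            label += "." + ".".join(attrs["class"].split())
        self.lines.append("  " * self.depth + label)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def outline(html):
    """Return the element hierarchy of html as indented text."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


if __name__ == "__main__":
    print(outline('<html><body><div id="main" class="a b">'
                  '<p>Hi<br>there</p><img src="x.png"></div></body></html>'))
```

Each element is shown as `tag#id.class`, which is usually enough to decide which selectors the crawler should rely on.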

Related

Tagging HTML elements with the source file

Problem
So we have quite a big project with lots of different Partial Views and a client side data binding framework (Knockout.js in our case).
One of the more problematic parts is that it is getting harder and harder to figure out which partial view is rendering an element that I see on my page.
So I need to debug this particular DIV. Okay, where do I find it?
Usually I try to find a very specific class or ID close to this element and search through the whole platform, which is far from ideal.
Question
So I was thinking about the following: tagging all elements (in debug mode) with the source file where they were generated.
Right now I'm thinking about something like a precompiler that adds a data-source="" to every element. I might refer to an ID within a dictionary to prevent repeating all the long filenames.
Before I reinvent the wheel:
Is there already something similar?
Are there better alternatives?
We're using ASP.NET MVC, but any hints on how other platforms do this are welcome too.
If you are using Visual Studio, I highly recommend the Web Essentials extension. Among many great features, it has one called "Inspect Mode", part of the larger "Browser Link" feature, that does exactly what you are looking for; it identifies the file that a particular DOM element came from. It might be worth a shot if that option is open to you.
@Dirk, as I understand it, your issue is to easily identify the element/view. Adding data-source can be an option, but before that have a look at this link:
Editing Styles and DOM - Chrome Dev Tools
This page has many demonstrations which might be helpful for your problem. Furthermore, I do agree with Kevin's suggestion.
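For what it's worth, the data-source idea from the question (a short ID per file, looked up in a dictionary) can be sketched in a few lines. This is only an illustration in Python, not an ASP.NET MVC integration: `tag_with_source`, `registry`, and the file paths are hypothetical, and the regex is a deliberate simplification that will also match a `<` followed by a letter inside scripts, comments, or attribute values.

```python
import re

# Matches the start of an opening tag, e.g. "<div" in '<div class="x">'.
# Closing tags and doctype declarations do not match.
OPEN_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9-]*)(?=[\s/>])")


def tag_with_source(markup, source_file, registry):
    """Add data-source="<id>" to every opening tag in markup.

    registry maps file names to short numeric IDs and is updated in place,
    so long file names are not repeated on every element.
    """
    source_id = registry.setdefault(source_file, len(registry) + 1)
    return OPEN_TAG.sub(
        lambda m: f'{m.group(0)} data-source="{source_id}"', markup)


if __name__ == "__main__":
    registry = {}
    # Hypothetical partial view names, for illustration only.
    print(tag_with_source('<div class="x"><span>a</span></div>',
                          "Views/Shared/_Menu.cshtml", registry))
    print(tag_with_source("<p>x</p>", "Views/Home/Index.cshtml", registry))
    print(registry)
```

In debug mode the registry could be emitted once per page, so that an ID seen in the element inspector can be mapped back to its file.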

Tool for Viewing Formatted HTML Source Code in Browser

I'm developing a web scraping tool in Python, and I need to get intimately acquainted with the functions of various HTML tags on certain sites. Unfortunately, the "view source" that Chrome, Firefox, and Safari offer does not output very well formatted HTML source code -- it tends to place a huge number of tags on the same line. Do the browsers offer any plugins that may be able to clean things up a bit, or do I need to get/develop some kind of tool in Python that takes dirty HTML as input and outputs cleanly formatted HTML?
Since I work primarily with Chrome, the best examples I can think of are Code Formatter (Chrome)
This isn't automatic; you have to copy and paste the entire page into the app. Also, the app window is small (this is unalterable, to my knowledge), but it is relatively effective.
...and JavaScript and CSS Beautifier
Much more effective and clean, but it only works, as the title suggests, with JavaScript and CSS.
With Firefox you can select (highlight - I am writing for beginners also) the text, and once it is selected, release the left mouse button and right click within the selected area and choose "View selection source." You can then copy the highlighted text and paste it.
My composite example:
View selection source
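Since the scraper is in Python anyway, the "takes dirty HTML as input and outputs cleanly formatted HTML" option can be roughed out with the standard library alone. Below is a minimal sketch that puts every tag and text run on its own indented line; `Reindenter` and `reindent` are names invented for this example. It is meant for reading, not for producing equivalent markup: whitespace inside text is collapsed (which also affects `<pre>` and `<script>` content), and comments and doctype declarations are dropped.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not increase the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Reindenter(HTMLParser):
    """Re-emits tags and text one per line, indented by nesting depth."""

    def __init__(self, indent="  "):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.indent = indent
        self.depth = 0
        self.text = []    # pieces of text seen since the last tag
        self.lines = []

    def flush_text(self):
        text = " ".join("".join(self.text).split())
        self.text = []
        if text:
            self.lines.append(self.indent * self.depth + text)

    def emit(self, markup):
        self.flush_text()
        self.lines.append(self.indent * self.depth + markup)

    def handle_starttag(self, tag, attrs):
        self.emit(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        self.emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        self.flush_text()  # pending text belongs inside the element
        self.depth = max(self.depth - 1, 0)
        self.emit(f"</{tag}>")

    def handle_data(self, data):
        self.text.append(data)

    def handle_entityref(self, name):
        self.text.append(f"&{name};")

    def handle_charref(self, name):
        self.text.append(f"&#{name};")


def reindent(html):
    parser = Reindenter()
    parser.feed(html)
    parser.close()
    parser.flush_text()
    return "\n".join(parser.lines)


if __name__ == "__main__":
    print(reindent('<ul><li><a href="/a">A</a></li><li>B &amp; C<br></li></ul>'))
```

A dedicated HTML library would handle malformed pages far better; the point here is only that a readable dump does not need a browser plugin.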

A plugin or tool to show what elements on a page are affecting other elements?

I have a bit of an issue with a design where a list item has taken it upon itself to be far taller than the others. I have a feeling this is because of another element in the design.
I use Firebug sometimes and Chrome Developer Tools the rest of the time.
These tools can show the height of something, or which CSS styles are affecting an object, but what they don't do is show how objects are affecting other objects.
Has anybody come across a tool which shows the effect objects in a design have on each other? It's a long shot, but if there were a place to find out, it would be here at SO.
Thanks.
Have a look at Web Developer, available for Chrome and Firefox. This is a very extensive plugin but should be able to do what you need.
Install it > go to the page you want to debug > cycle through the various Outline options. Start off with "Outline block level elements" and work your way from there.
Do tell me if I misunderstood your question, but with either Firebug or the Chrome Developer Tool, you should be able to inspect the nesting of different elements in your design, and see how a design will look after you delete one of those elements.
The only tricky part is learning how to use those tools, and for that I suggest you watch some YouTube videos. That, at least, is how I learned. Unless someone can show you in person, the next best alternative is someone showing you how those tools work through a video.

HTML source appearing different in Firebug to standard browser 'View Source' option?

I have some HTML content being generated via some PHP.
Whilst investigating a CSS problem, I noticed through Firebug that some elements in the DOM were not organised as I expected. Yet, when I did the standard 'View Source' in Firefox, it showed everything to be correct.
I know the source being displayed by Firebug is accurate, because the source it presents me corresponds to the aesthetic issue I'm seeing on screen, but I'm not sure what this means and how to investigate further.
Why does this happen, and which source version should I be looking at? (p.s. I have no JavaScript running on the website.)
Firebug cleans up the DOM tree, so if there are any syntax errors in the raw source, you won't see them in Firebug (unless they're so bad that they screw up the parse tree completely).
The regular view-source functionality shows the page's source as it came from the server. If you do any manipulations of the DOM after the page loaded, it won't show up in view-source, as that's now outdated. Firebug will show the live in-memory tree, with any manipulations included, but it will also clean things up.
Firebug shows a live view of the page's DOM structure.
View Source shows the original HTML received from the server.
If you modify the DOM using JavaScript, the changes will only appear in Firebug.
If your HTML was invalid and the browser fixed it up, the fixes will also only appear in Firebug.
You can use the browser's View Selection Source option to show the source for the actual DOM, which will match what you see in Firebug.
Firebug shows more than just the code you have entered. It also includes default styles from the browser (assuming you have not used the Yahoo CSS reset). Although you cannot guarantee that Firebug itself is free of bugs, I tend to trust it more than view-source, even more so when JavaScript is used, because the output of the page can be vastly different from the original HTML content, albeit not in your case as you are not using JS.
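If the suspicion is that the raw source is invalid and the browser silently repaired it, one way to investigate is to scan the PHP output for tags that are closed in the wrong order. The following is a rough sketch with Python's standard `html.parser`; `NestingChecker` and `check_nesting` are hypothetical names. It does not reproduce the browser's actual repair algorithm, and because HTML allows some end tags to be omitted (`</p>`, `</li>` and others), its reports are hints rather than validation errors.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Tracks open elements and records closing tags that do not match."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, line) for each element still open
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing syntax: nothing stays open

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        line = self.getpos()[0]
        if tag not in [open_tag for open_tag, _ in self.stack]:
            self.problems.append(
                f"line {line}: </{tag}> has no matching open tag")
            return
        # Everything opened after the matching tag was left unclosed.
        while self.stack[-1][0] != tag:
            skipped, opened = self.stack.pop()
            self.problems.append(
                f"line {line}: </{tag}> closes <{skipped}> "
                f"opened on line {opened}")
        self.stack.pop()


def check_nesting(html):
    """Return a list of human-readable nesting problems found in html."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    for tag, opened in checker.stack:
        checker.problems.append(f"line {opened}: <{tag}> is never closed")
    return checker.problems


if __name__ == "__main__":
    for problem in check_nesting("<div>\n<p><b>bold</p>\n</div>"):
        print(problem)
```

Wherever this reports a problem, the tree shown in Firebug is likely to differ from what View Source shows.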

Do you know a webpage appearance comparator?

I need a tool to compare the design of a website. I do not want to compare only the HTML code, but the rendered design.
Is this even possible? Also, is there any open-source program of this kind?
I have searched Google, but so far I have found only one candidate, which is HTML Match.
In modern webpages the appearance is controlled by various 'things': HTML code, CSS styles and images at least (also JavaScript in some pages). Simple text-based diff programs are not enough because their output can be irrelevant to the webpage appearance (e.g. cleaning up CSS can show many differences while the rendered webpage remains the same).
For simpler pages HTML Match mentioned above could do the job. If I have to compare the design of two "complex" pages (including layout, space, image and text changes) I would do a two-step approach:
1. Run a diff tool on the HTML sources to highlight the textual content differences. Then I would modify one of the pages to show the same content as the other (in order to make the next step more accurate and 'focused' on 'real' layout changes). Of course, this works only with very similar HTML.
2. Load the pages in the same web browser, take screenshots of the rendered output at fixed positions and compare the images (e.g. with ImageMagick). This should show all visual differences in the rendered output.
It is not perfect but should work.
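The first step (diffing the HTML sources) can be sketched with Python's standard `difflib`. The function name `diff_html` and the default file labels are assumptions made for this example. Splitting between adjacent tags is a crude way to keep minified, single-line pages from showing up as one giant changed line. The screenshot step is left out here because it needs a browser and an image tool.

```python
import difflib


def diff_html(old_html, new_html, old_name="old.html", new_name="new.html"):
    """Return a unified diff of two HTML sources as a list of lines."""

    def split_tags(html):
        # Break between adjacent tags so single-line pages still diff usefully.
        return html.replace("><", ">\n<").splitlines()

    return list(difflib.unified_diff(
        split_tags(old_html), split_tags(new_html),
        fromfile=old_name, tofile=new_name, lineterm=""))


if __name__ == "__main__":
    old = "<div><h1>Title</h1><p>Old text</p></div>"
    new = "<div><h1>Title</h1><p>New text</p></div>"
    print("\n".join(diff_html(old, new)))
```

Lines starting with `-` and `+` mark the content that would have to be aligned before the visual comparison.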
[UPDATE] HTML Match seems dead, see this answer for an alternative solution.
Solution: “compare web pages” tool. (“We've been doing it since 1999. It's free.”)
Example output (comparing pages for TP-Link USB hub model UH700 and UH720):
Under Windows:
http://www.htmlmatch.com/
If you are using KDE, you can use Kompare or KDiff3.
However, if you want to see how your web page looks in different browsers on different operating systems, BrowserShots can be used.
There are these online tools, though they aren't brilliant:
http://www.w3.org/2007/10/htmldiff
http://www.aaronsw.com/2002/diff/
I like the look of daisydiff but have not used it in anger: http://code.google.com/p/daisydiff/
The keyword you're looking for is "diff".
A good program that can show you the differences between two files (HTML markup or other) would be ExamDiff for Windows.
I'm working on one, and I can tell you it's hard and there is nothing on the market. Maybe Google and Bing have something in-house. You can use image comparison tools which identify rectangular regions of changed pixels. This is, for example, a part of all modern video compression, but you have to do it separately for different regions of the webpage (the nav bar section, the main article, the region filtered by an ad-blocker, etc.), as some of them may change while the page is still considered to have the same content.
As I said, it is a very complex problem with no exact solution.
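To make the "rectangular region of changed pixels" idea concrete, here is a minimal sketch. It assumes the two screenshots have already been decoded into equally sized grids of pixel values (lists of rows); how they get decoded, and the function name `changed_region`, are assumptions of this example. It finds a single bounding box, so a per-region comparison would call it once for each cropped area (nav bar, main article, and so on).

```python
def changed_region(before, after):
    """Return (left, top, right, bottom) of the pixels that differ, or None.

    before and after are equally sized grids (lists of rows) of pixel values.
    right and bottom are inclusive coordinates.
    """
    if len(before) != len(after) or any(
            len(row_a) != len(row_b) for row_a, row_b in zip(before, after)):
        raise ValueError("images must have the same dimensions")

    changed = [(x, y)
               for y, (row_a, row_b) in enumerate(zip(before, after))
               for x, (a, b) in enumerate(zip(row_a, row_b))
               if a != b]
    if not changed:
        return None

    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    before = [[0] * 5 for _ in range(4)]
    after = [row[:] for row in before]
    after[1][1] = 1   # pixel at x=1, y=1 changed
    after[2][3] = 1   # pixel at x=3, y=2 changed
    print(changed_region(before, after))
```

Real screenshots also differ through anti-aliasing and font rendering, so an exact pixel comparison like this one will over-report; a tolerance per pixel would be the next refinement.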
The other option is going the non-visual way and just comparing the resulting computed styles of each HTML element. You have to hack the browser to get access to the layout tree. There is also no official API or existing library/program/hack/patch for it.
You can make a visual comparison with Araxis Merge Pro by capturing screenshots with systems such as BrowserStack, Cross Browser, or PhantomJS.