Tagging HTML elements with the source file - html

Problem
So we have quite a big project with lots of different Partial Views and a client side data binding framework (Knockout.js in our case).
One of the more problematic parts is that it is getting harder and harder to figure out which partial view renders a given element that I see on my page.
So I need to debug this particular DIV. Okay, where do I find it?
Usually I try to find a very specific class or ID close by this element and do a search through the whole platform - far from ideal.
Question
So I was thinking about the following: tagging all elements (in debug mode) with the source file they were generated from.
Right now I'm thinking about something like a precompiler that adds a data-source="" attribute to every element. I might refer to an ID in a dictionary to avoid repeating all the long file names.
Before I reinvent the wheel:
is there already something similar?
are there better alternatives?
We're using ASP.NET MVC, but any hints to how other platforms do this are perfect too.
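To make the idea a bit more concrete, here is a rough sketch of the kind of helper I have in mind (the names are made up and nothing here is final; unlike the precompiler idea, this would still have to be placed by hand, e.g. on each partial's root element):

```csharp
using System.Web;
using System.Web.Mvc;

public static class DebugSourceHelper
{
    // Hypothetical helper: drop @Html.SourceTag() on an element inside a partial view
    // and it emits data-source="~/Views/.../_MyPartial.cshtml" in debug builds only.
    public static IHtmlString SourceTag(this HtmlHelper html)
    {
#if DEBUG
        var view = html.ViewContext.View as BuildManagerCompiledView;
        var path = view != null ? view.ViewPath : "(unknown view)";
        return new HtmlString("data-source=\"" + html.Encode(path) + "\"");
#else
        return MvcHtmlString.Empty;
#endif
    }
}
```

Used as <div class="order-summary" @Html.SourceTag()>...</div>; the precompiler would essentially be a way to avoid sprinkling this by hand.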

If you are using Visual Studio, I highly recommend the Web Essentials extension. Among many great features, it has one called "Inspect Mode", part of the larger "Browser Link" feature, that does exactly what you are looking for; it identifies the file that a particular DOM element came from. It might be worth a shot if that option is open to you.

@Dirk, as I understand it, your issue is to easily identify which view an element comes from. Adding data-source can be an option, but before that, have a look at this link:
Editing Styles and DOM - Chrome Dev Tools
This page has many demonstrations which might be helpful for your problem. Furthermore, I agree with Kevin's suggestion.

Related

Filter CSS according to used selectors

I am implementing the Summernote editor, which relies on Bootstrap, but I use my own custom stylesheet. This gives me two problems:
It breaks my design and the Bootstrap file is so long it is difficult to find the exact selectors causing the trouble.
It loads a 120 kB file, when maybe just 20-30 kB is necessary (the part actually needed for the editor to render nicely).
Does anyone know a tool (maybe online) to compare the tags, classes, etc. actually used in the source code with the attached stylesheet, pointing out what is in use?
It could also be helpful after a long development process, where you have done a lot of editing and maybe ended up with a lot of unused code.
Right-click and select Inspect Element while you are on your web page in any browser.
You could use purifyCSS, which requires Grunt or the like. There's also https://unused-css.com/, which takes the URL of your site and then scans just a single page of it. Finally, there's this tool: https://sourceforge.net/projects/cssscanner/, which gives you a list of used and unused selectors, but you need a machine running Windows to use it.
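If you only want a rough first pass and want to stay inside .NET, even something as naive as the sketch below can narrow things down. It is only a sketch (the file names are placeholders, it only looks at class selectors, and it will report false positives for classes that Summernote adds to the DOM at runtime via JavaScript):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

class NaiveCssAudit
{
    static void Main()
    {
        // Placeholder input files.
        var doc = new HtmlDocument();
        doc.Load("page.html");
        var css = File.ReadAllText("bootstrap.css");

        // Collect every class name that actually appears in the markup.
        var usedClasses = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var nodes = doc.DocumentNode.SelectNodes("//*[@class]");
        if (nodes != null)
            foreach (var node in nodes)
                foreach (var cls in node.GetAttributeValue("class", "")
                         .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
                    usedClasses.Add(cls);

        // Very naive: pull ".class-name" tokens out of the stylesheet and flag the ones
        // never referenced in the HTML. Ignores tag/attribute selectors, media queries,
        // and anything a script adds to the page after load.
        var reported = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in Regex.Matches(css, @"\.([A-Za-z_][A-Za-z0-9_-]*)"))
        {
            var cls = m.Groups[1].Value;
            if (!usedClasses.Contains(cls) && reported.Add(cls))
                Console.WriteLine("possibly unused: ." + cls);
        }
    }
}
```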

Auditing unused CSS on complex web pages

I know there are several tools available to find unused CSS on a static web page. But in most real-world scenarios I encounter, a lot of the CSS is only used after some interaction on the page, maybe a new modal opening up or an options popup, etc.
In such scenarios, what would you suggest? How do I keep tabs on my ever-growing render-blocking CSS?
The only way I guess one could do that is by running regular unused-CSS-detector tools in conjunction with Selenium: test known interactions and see what's left unused. But a big assumption here is that I'd need to know all the interactions on my page which could use new CSS. Is there a way to achieve my goal without making this assumption?
In an ideal world, I'd be able to post-back all CSS used by a visitor's browser on my page to my server. Then I'd collect data over a month, aggregate, and get a pretty accurate idea about actual unused CSS.
Any good ideas?
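For what it's worth, a rough sketch of the Selenium idea above, using the C# bindings and Chrome. The URL and interactions are placeholders; the embedded script just asks the live DOM which selectors currently match something, so you would re-run it after each known interaction and union the results (it still rests on the "I know all my interactions" assumption, and pseudo-classes like :hover will always look unused):

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class CssUsageProbe
{
    // Returns every selector in the page's stylesheets that matches at least one
    // element in the DOM as it stands right now.
    static IEnumerable<string> UsedSelectors(IWebDriver driver)
    {
        const string script = @"
            var used = [];
            for (var i = 0; i < document.styleSheets.length; i++) {
                var rules;
                try { rules = document.styleSheets[i].cssRules; } catch (e) { continue; }
                if (!rules) continue;
                for (var j = 0; j < rules.length; j++) {
                    var sel = rules[j].selectorText;
                    if (!sel) continue;
                    try { if (document.querySelector(sel)) used.push(sel); } catch (e) {}
                }
            }
            return used;";

        var result = (ReadOnlyCollection<object>)((IJavaScriptExecutor)driver).ExecuteScript(script);
        foreach (var selector in result)
            yield return (string)selector;
    }

    static void Main()
    {
        var used = new HashSet<string>();
        using (IWebDriver driver = new ChromeDriver())
        {
            driver.Navigate().GoToUrl("https://example.com/");   // placeholder URL

            used.UnionWith(UsedSelectors(driver));                // initial render

            // Placeholder interaction: open the modal you know about, then re-measure.
            // driver.FindElement(By.CssSelector(".open-modal")).Click();
            // used.UnionWith(UsedSelectors(driver));
        }

        foreach (var selector in used)
            Console.WriteLine(selector);
    }
}
```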
I am the author of a tool that aims to do what you are describing. Everywhere I have worked, the CSS is this "append-only" thing that is too risky and too time-consuming to clean up. And even when you try, the ROI is so low that it's not worth it.
So I am working on a tool that is very similar to what you are describing. The goal is to bring confidence on what can be removed, and to actually do it automatically by submitting pull requests.
A snippet of JavaScript runs in the browser and sends reports of what is being used to a server. Once enough data has been accumulated to build a "confidence score", it can create pull requests automatically to remove selectors that are actually not used.
It is still very early stage, but you are welcome to try it and give some feedback about it.
https://www.bleachcss.com/

Clean HTML using C#

How do I repair malformed HTML using C#? A great answer would be an HTML Agility Pack sample!
I'm scraping a site (for legitimate use). The site's HTML is OK but there are some annoying problems.
One way I could go would be through regular expressions. I used Expression Web to analyse the problems and the regular expressions needed to correct them. So one way would be to use a tool such as RegexBuddy to generate C# code for these regular expressions.
However, the recommended tool for processing malformed HTML in C# is the HTML Agility Pack (HAP). Moreover, I've analysed only a handful of pages and I'm afraid that future pages will contain patterns I've not yet solved, and I would hate to enter the "find the errors in the next few pages and correct them" maintenance business. So, if HAP already has a solid, always-working solution, that would be great. The problem is that, apart from a few mentions here on SO, I could not find any how-to documentation for this tool other than the object-by-object API help file.
So - before I spend $ and learning time on RegexBuddy (no free evaluation version), or break my teeth on HAP's API documentation - is there an easy way to do this? An HAP sample would help... :-)
Can you tell me what kind of annoying problems you are having?
You don't need regex to clean the HTML, though: HAP will let you access the elements of a malformed HTML document using XPath queries, and basically you need to learn XPath to know how to get the HTML elements you want.
It really depends on the kind of HTML you are parsing with HAP, but there are several ways to get the elements: by id or class, or you can even get the element that follows another element containing a given text such as "name:".
The W3Schools XPath Tutorial is a nice XPath tutorial.
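Since the question explicitly asks for a sample, here is a minimal sketch along those lines (the malformed input and the XPath are invented for illustration):

```csharp
using System;
using HtmlAgilityPack;

class HapSample
{
    static void Main()
    {
        // Invented malformed input: unquoted attribute, unclosed tag, stray cell.
        var html = "<div class=content><span>First</span><span>Second</div><td>stray cell";

        var doc = new HtmlDocument();
        doc.OptionFixNestedTags = true;   // ask HAP to repair nesting where it can
        doc.LoadHtml(html);

        // Whatever parse problems HAP noticed are listed here.
        foreach (var error in doc.ParseErrors)
            Console.WriteLine("{0} at line {1}: {2}", error.Code, error.Line, error.Reason);

        // XPath works against the repaired DOM even though the source was malformed.
        var spans = doc.DocumentNode.SelectNodes("//div[@class='content']//span");
        if (spans != null)
            foreach (var span in spans)
                Console.WriteLine(span.InnerText);
    }
}
```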
What I took from the answers here:
1) If you're scraping a website you don't control, you'll always enter a maintenance mode where you have to fix your scraper every time the layout of the page you're scraping changes.
2) If you are limited to this known site, why not write your scraper to adjust the problems
So, if I have to go into maintenance mode, it should be as easy as possible. Therefore, my process is as follows:
I use Webius's SWExplorerAutomation to detect scenes in web pages. The idea is that a scene is a collection of conditions you define for IE. When a web page is loaded, IE tries to see which set of conditions is met (e.g. the page title is "Account Login", the page contains a "Login" text box and a "Password" text box). If a set of conditions corresponding to a scene is detected, IE reports that the scene has been detected. This model provides an abstraction layer: some changes in the web page translate to changes in the scene file, saving the code from having to change. Additionally, it shields me from IE's event-driven model. I'm evaluating this product but I'm not yet sure I'll use it, mainly because the documentation is terrible. Another alternative is WatiN, and one more reason I haven't yet bought SWEA is this article accusing its author of spamming against WatiN.
Once the web page has been acquired, I use Expression Web to run compatibility checks and identify errors.
I use RegexMagic to remove and correct errors. I really love this tool. Sure, sometimes it makes you murderously angry because it doesn't let you do things that should be really easy, but it's a sweet, sweet tool, and the documentation is amazing.
Finally, after all the errors I know about have been corrected, I use HTML Agility Pack to convert to XHTML - cross the t's and dot the i's, so to speak: all lowercase, quotes around attributes, and so on.
Hope this helps!
Avi
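For reference, the XHTML conversion step Avi describes is mostly a matter of switching on a couple of HtmlDocument options; a minimal sketch (file names are placeholders):

```csharp
using HtmlAgilityPack;

class ToXhtml
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.OptionFixNestedTags = true;   // repair common nesting problems while parsing
        doc.OptionOutputAsXml = true;     // closed tags and quoted attributes on output
        doc.Load("input.html");           // placeholder file names
        doc.Save("output.xhtml");
    }
}
```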
Regex can't be used for HTML cleaning.
Does http://tidy.sourceforge.net/ help?
If you're scraping a website you don't control, you'll always enter a maintenance mode where you have to fix your scraper every time the layout of the page you're scraping changes. It doesn't matter if you're using the regex <td color="red">\d+</td> to get the big red number from a page or if you're using a DOM parser to get the 3rd cell in the 2nd row of the table with id "numbers" to get the same. The regex breaks if the webmaster replaces the color attribute with a class attribute. The DOM parser breaks if the webmaster adds another row to the top of the table.
If you're scraping larger parts of a web page and want to embed them in your own web page, it may be easier to get over your desire for web standards compliance and just let the browser figure out how to display things.
Since you're using Html Agility Pack and know of the problems that occur, if you are limited to this known site, why not write your scraper to adjust the problems once you've loaded the HtmlDocument?
For example: if you know that a certain element always appears right after a particular tag but ends up misplaced, insert it back into the first child position of that tag, and so on.
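A sketch of that kind of post-load fix-up; the "stray <li>" rule is an invented example of a site-specific problem, not something HAP does for you:

```csharp
using System.IO;
using HtmlAgilityPack;

class ScraperFixups
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.OptionFixNestedTags = true;   // let HAP repair the generic nesting problems
        doc.LoadHtml(File.ReadAllText("scraped.html"));   // placeholder file name

        // Invented site-specific rule: this site emits <li> items outside their list,
        // so move every stray <li> back under the <ul> that precedes it.
        var strays = doc.DocumentNode.SelectNodes("//li[not(parent::ul or parent::ol)]");
        if (strays != null)
        {
            foreach (var li in strays)
            {
                var list = li.SelectSingleNode("preceding-sibling::ul[1]");
                if (list != null)
                {
                    li.Remove();
                    list.AppendChild(li);
                }
            }
        }

        doc.Save("fixed.html");
    }
}
```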

In MediaWiki is there a way to force a group of pages to have a particular skin?

The reason I am keen to do this is that we have a wiki which works great, but I would like to store help pages for an internal application in the wiki and link to those pages directly from the app. Although we wouldn't be concerned about people seeing the non-article stuff (i.e. the help pages) when browsing the rest of the wiki, I thought it would be ideal, so that things look streamlined when viewed from the application, to give those pages a simplified skin which I would design.
I have already found out that useskin= can be added to URLs (e.g. as is done on the Preview Skin page within the User Preferences pages), but following any links reverts you to your normal chosen skin.
Is there perhaps some way to adjust the skin so that all the links contain useskin=? (I think this might have issues, since you appear to need the full page name for useskin to work, e.g. ..../w/index.php?title=blah....&useskin=cologneblue, as opposed to the short URLs.)
If this isn't a smart way to go, I could consider different approaches (I run the box the wiki is on and could create a distinct wiki perhaps, although there might be disadvantages to this, such as needing to combine the user tables and maybe this would still pick up the user's preferred skin unless I re-coded things).
Any sensible suggestions gratefully received! Let me know if there's any more info you might need or if I need to clarify any points about my objective.
[I did submit this on the MediaWiki.org Support Desk page, but it got no response... I hope my question isn't that bad!!]
You could put all your content in its own namespace, then set the skin for that namespace using this extension (I've used it, it works well enough):
http://www.mediawiki.org/wiki/Extension:SkinPerNamespace
If you don't want to lock them all into a single namespace, you can also use the SkinPerPage extension to mark the pages individually.
Why not change the default skin to the skin you want?

HTML or RTF to display columns of text with color-coded words

In my Delphi program I want to display some information generated by the application. Nothing fancy, just 2 columns of text with parts of words color-coded.
I think I basically have two options:
HTML in a TWebbrowser
RTF in a TRichEdit.
HTML is more standard, but seems to load slower, and I had to deal with The Annoying Click Sound.
Is RTF still a good alternative these days?
Note: The documents will be discarded after viewing.
I would vote for HTML.
I think it is more future-oriented. The speed would not concern me.
The question of HTML or RTF may be irrelevant. If they are just used for display purposes, then the file format doesn't matter. It's really just an internal representation. (Are any files even being saved to disk?) I think the question to ask is which one solves the problem with the least amount of work.
I would be slightly concerned that the browser control is changing all the time. I doubt the richedit control will change much. I would lean towards the richedit control because I think there is less that could go wrong with it. But it's probably not a big deal either way.
Have you considered doing an ownerdraw TListView?
I'd also use HTML. Besides, you just got an answer for the clicking sound in TWebBrowser.
If you'd rather not use TWebBrowser, take a look at Dave Baldwin's free HTML Display Components.
I would vote for HTML, too.
We started an app a while ago...
We wanted to
display some information generated by the application. Nothing fancy, just...
(do you hear the bells ring???)
Then we wanted to display more information and style it even more....
...someone decided that RTF wasn't enough anymore, and for backwards compatibility we moved on to MS Word via an OLE server. That was the end of any talk about performance.
I think if we had done it in HTML it would be much faster now.
RTF is much easier to deal with, as the TRichEdit control is part of every single Windows installation, and has much less overhead than TWebBrowser (which is basically embedding an ActiveX version of Internet Explorer into your app).
TRichEdit is also much easier to use to programmatically add text and formatting. Using SelStart and SelLength, along with the selection's text attributes, makes adding bold and italics, setting different fonts, etc. simple. And, as Re0sless said, TRichEdit can easily be printed, while TWebBrowser makes it more complicated to do so.
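To illustrate that pattern in code (shown with the WinForms RichTextBox in C# to keep one language across the sketches on this page; the Delphi TRichEdit calls SelStart, SelLength and SelAttributes follow the same select-then-style pattern):

```csharp
using System.Drawing;
using System.Windows.Forms;

static class RichTextDemo
{
    // Append a piece of text and style only that piece.
    public static void AppendColored(RichTextBox box, string text, Color color, bool bold)
    {
        int start = box.TextLength;
        box.AppendText(text);

        box.SelectionStart = start;           // select what was just appended...
        box.SelectionLength = text.Length;
        box.SelectionColor = color;           // ...and restyle just that range
        box.SelectionFont = new Font(box.Font, bold ? FontStyle.Bold : FontStyle.Regular);

        box.SelectionStart = box.TextLength;  // collapse the selection again
        box.SelectionLength = 0;
    }
}
```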
I would vote RTF, as I don't like the fact that TWebBrowser uses Internet Explorer; we have had trouble with this in the past on tightly locked-down computers.
Also, TRichEdit has a print method built in, whereas you have to do all sorts of messing about to get TWebBrowser to print.
Nobody seems to have mentioned a reporting component yet. Yes, it is overkill right now, but if you use one anyway (and maybe you already have some reporting to do in your app, so the component is already included) you can just display the preview and allow printing / exporting to PDF later, if that makes sense. Also, if you later decide that you want a fancier display, there is nothing holding you back.
If both HTML and RTF won't satisfy your need, you could also use an open source text/edit component that supports coloring words or create your own edit component based on a Delphi component.
Another alternative to the HTML browser is the "Embedded Web Browser" components, which I used in a few projects for displaying HTML documents to the user. You have complete control over the embedded browser, and I don't recall any clicks when a page is loaded.
I vote for HTML also.
RTF is really only worth it for its editor; otherwise you'd better go with the standard.
RTF offers some useful text editing options, like horizontal tabulators, which are not available in HTML. Automatic hyperlink detection is also a nice extra. But I think I would prefer HTML if these features are not required.
I vote for HTML.
Easier to generate programmatically.
Widely supported.
Since you don't need WYSIWYG capabilities, I think HTML's advantages trump RTF's. Moreover, should the need arise to export generated data for further, WP-like editing, remember that major word processors can open and convert HTML files.
Use HTML, but with the "Delphi Wrapper for Chromium Embedded" by Henri Gourvest; Chromium Embedded uses the same core that powers Google Chrome.
Don't use TWebBrowser. I'm suffering from all programs that use IE's web control: the font is too small on my 22" monitor with a 1920x1080 resolution. I use Windows 7 and my system's DPI is set to 150% (XP mode), and I tried every tweak I could find to fix that, with no luck...