Naming convention for web projects - html

It is typical in web projects to have multiple solutions work together to produce the desired outcome. In my case, I am currently working on a project that uses the following: HTML, CSS, JavaScript, jQuery, AJAX, JSON, PHP, MySQL, and some other libraries. This is a typical set of technologies for the average project.
At one point throughout the project, and as you might expect, you would need to interact with data that goes across from HTML to PHP via an AJAX call as a JSON object after being processed by either JavaScript or jQuery, and of course this input data is styled via CSS when displayed to the user.
An example of such HTML element is the following:
<input type="text" name="my-input-name" id="my-input-name" />
Note that the name and id are hyphenated. This could also be written as
<input type="text" name="my_input_name" id="my_input_name" />
Or
<input type="text" name="myInputName" id="myInputName" />
That name and id is what could be used by all the aforementioned technologies.
Personally, I prefer hyphenated for CSS, underscore for PHP, camel case for AJAX, and so on.
But then this means the name and id will not follow the desired naming convention in at least one of the used technologies.
I read many many different opinions (including some in Stack Overflow), but none really provided any final conclusion.
So, my question is, which convention should I use? What is best in HTML 5 and CSS 3 era?
Thanks

It really depends on the style/standards of the project you are using. I've seen all three styles you mentioned. If I may make a recommendation though, I've seen dashes "-" break certain scripts in a Linux environment. Also, underscores "_" are sometimes considered extra clutter if not needed.
If no other style concerns are important, I would recommend the third option of camel case (myInputName). The reason would be in a web environment, things have to be re-compiled each time, so fewer characters are slightly less processing.

Related

Is XML really more semantic that HTML with classes/ids?

I'm coming from a HTML / JavaScript / PHP background and have recently started learning XML.
I was reading this excerpt from "No Nonsense XML Web Development with PHP" which includes this comparison:
<div>
<div>
<h2>Product One</h2>
<p>Product One is an exciting new widget that will simplify your life.</p>
<p><b>Cost: $19.95</b></p>
<p><b>Shipping: $2.95</b></p>
</div>
</div>
Take a good look at this – admittedly simple – code sample from a computer’s perspective. A human can certainly read this document and make the necessary semantic leaps to understand it, but a computer couldn’t. ....
A computer program (and even some humans) that tried to decipher this document wouldn’t be able to make the kinds of semantic leaps required to make sense of it. The computer would be able only to render the document to a browser with the styles associated with each tag. HTML is chiefly a set of instructions for rendering documents inside a Web browser; it’s not a method of structuring documents to bring out their meaning.
The author then compares this to XML with this:
If the above document were created in XML, it might look a little like this:
<productListing title="ABC Products">
<product>
<name>Product One</name>
<description>Product One is an exciting new widget that will simplify your life.</description>
<cost>$19.95</cost>
<shipping>$2.95</shipping>
</product>
</productListing>
In theory, we should be able to look at any XML document and understand instantly what’s going on. In the example above, we know that a product listing contains products, and that each product has a name, a description, a price, and a shipping cost. You could say, rightly, that each XML document is self-describing, and is readable by both humans and software.
I get the author's point to a degree. Of course a computer would not be able to discern meaning from this HTML, there's no context.
However, I would never expect the HTML to be written in this way. Rather I would expect the HTML to use classes and/or ids to provide the necessary context more like:
<div class="productListing">
<div class="product">
<h2 class="name">Product One</h2>
<p class="description">Product One is an exciting new widget that will simplify your life.</p>
<p class="cost"><b>Cost: $19.95</b></p>
<p class="shipping"><b>Shipping: $2.95</b></p>
</div>
</div>
Given this example, my question is:
Is XML really more semantic than HTML that utilizes classes/ids to provide context to the data it contains?
(Note that I simplified the code examples to avoid TL;DR)
This is an interesting question.I'll give you my two cents.
I jumped onto XML a few years ago when I had to built a dynamic website and my client didn't have access to the database(just FTP access).What I essentially coded was an XML backend and PHP which fetched this through SimpleXML parsing.
In retrospect, I do think XML is more semantically richer than HTML. As a comment pointed out above, the html class has been a styling construct. I don't remember personally using/ hearing anyone using classes or ids for purposes other than CSS/JS based styles or animations.
The key in using XML over HTML with classes was the flexibility to throw it around. For another project, updating values of XML elements from one system, and then having them read and displayed by an other system made a lot of things smoother.Additionally, the XML parsing libraries allow a number of functions for parsing through the nodes.
Also it's important to note that XML allows you to define attributes.This could be viewed as something similar to classes and ids to HTML.
Also, let's not forget that RSS feeds are essentially XML and not HTML with more tags.
Therefore, answering your question specifically with respect to semantic, I definitely think XML has the advantage there.
TLDR:XML is more semantic according to me
You are correct that in terms of just looking at markup, there is little do none difference between XML's "meaningful" element names, and HTML class/id. However, keep in mind that for XML, there is a set of technologies and tools that allow you to easily work with element names. You can write schemas and validate against them. You can compose schemas by using namespaces. You can extract structures by using simple XPath expressions. All of this is much harder with the HTML approach.
So if you have requirements to capture and process "meaningful" structures, then XML is your friend. If all you want is to have snapshot of something where you can say "this is a product", then maybe there really might be not such a big difference.
My advice would be: If you store and process data using multiple publishing pipelines, XML very likely is a much better starting point. If all you want is capture snapshots that will get delivered to HTML-based consumers, then "semantically enriched" HTML may be the easier way to go.

Why Use Bracket Notation in HTML5?

This is a "best practices" question. I was perusing the Code Review StackExchage site, and I saw this thread. In it, Joseph Silber makes the suggestion of using bracket notation and eschewing IDs entirely for a form with multiple identically-structured rows of form inputs.
In the books I've read on web design/development, I've never seen this strategy used. Is using bracket notation and wrapping input elements within label tags, as well as not using id tags in form inputs a good idea? Perhaps there something about Javascript/JQuery that makes this approach advantageous?
I wanted to ask Joseph about it directly, but I don't have any reputation on the site yet so I couldn't even comment. However, I'm interested in seeing the community-wide perspective on this, because searching for things like "HTML5 bracket notation" and "HTML5 arraylike notation" isn't turning up anything definitive for me so far.
Here's a summary of the above question with some example code borrowed from the linked question:
Why use this:
<label>Comment:
<input type="text" name="comment[]" width="50" />
</label>
instead of this:
<label for="comment-0">Comment: </label>
<input type="text" id="comment-0" name="comment-0" width="50" />
in a form with multiple rows of the same inputs?
Sticking [] on the end of an input name will cause some server side form processing libraries to present the data as an array. Some other form processors don't need the special syntax. Your choice will largely depend on your desire to be compatible with the former kind vs having more complicated names and not being able to access the elements via dot-notation in JS.
Using child elements instead of for attributes will:
Save you from having to generate unique ids (especially useful if similar forms can appear on a single page)
Limit your styling choices
Be compatible with fewer browsers (I think the only loss is old versions of IE, which is no big loss).
So there are pros and cons there and you have to decide what it right for you.

So what if custom HTML attributes aren't valid XHTML?

I know that is the reason some people don't approve of them, but does it really matter? I think that the power that they provide, in interacting with JavaScript and storing and sending information from and to the server, outweighs the validation concern. Am I missing something? What are the ramifications of "invalid" HTML? And wouldn't a custom DTD resolve them anyway?
The ramification is that w3c comes along in 2, 5, 10 years and creates an attribute with the same name. Now your page is broken.
HTML5 is going to provide a data attribute type for legal custom attributes (like data-myattr="foo") so maybe you could start using that now and be reasonably safe from future name collisions.
Finally, you may be overlooking that custom logic is the rational behind the class attribute. Although it is generally thought of as a style attribute it is in reality a legal way to set custom meta-properties on an element. Unfortunately you are basically limited to boolean properties which is why HTML5 is adding the data prefix.
BTW, by "basically boolean" I mean in principle. In reality there is nothing to stop you using a seperator in your class name to define custom values as well as attributes.
class="document docId.56 permissions.RW"
Yes you can legally add custom attributes by using "data".
For example:
<div id="testDiv" data-myData="just testing"></div>
After that, just use the latest version of jquery to do something like:
alert($('#testDiv').data('myData'))
or to set a data attribute:
$('#testDiv').data('myData', 'new custom data')
And since jQuery works in almost all browsers, you shouldn't have any problems ;)
update
data-myData may be converted to data-mydata in some browsers, as far as the javascript engine is concerned. Best to keep it lowercase all the way.
Validation is not an end in itself, but a tool to be used to help catch mistakes early, and reduce the number of mysterious rendering and behavioural issues that your web pages may face when used on multiple browser types.
Adding custom attributes will not affect either of these issues now, and unlikely to do so in the future, but because they don't validate, it means that when you come to assess the output of a validation of your page, you will need to carefully pick between the validation issues that matter, and the ones that don't. Each time you change your page and revalidate, you have to repeat this operation. If your page validates entirely then you get a nice green PASS message, and you can move on the next stage of testing, or to the next change that needs to be made.
I've seen people obsessed with validation doing far worse/weird things than using a simple custom attribute:
<base href="http://example.com/" /><!--[if IE]></base><![endif]-->
In my opinion, custom attributes really don't matter. As other say, it may be good to watch out for future additions of attributes in the standards. But now we have data-* attributes in HTML5, so we're saved.
What really matters is that you have properly nested tags, and properly quoted attribute values.
I even use custom tag names (those introduced by HTML5, like header, footer, etc), but these ones have problems in IE.
By the way, I often find ironically how all those validation zealots bow in front of Google's clever tricks, like iframe uploads.
Instead of using custom attributes, you can associate your HTML elements with the attributes using JSON:
var customAttributes = { 'Id1': { 'custAttrib1': '', ... }, ... };
And as for the ramifications, see SpliFF's answer.
Storing multiple values in the class attribute is not correct code encapsulation and just a convoluted hack way of doing things. Take a custom ad rotator for instance that uses jquery. It is much cleaner on the page to do
<div class="left blue imagerotator" AdsImagesDir="images/ads/" startWithImage="0" endWithImage="10" rotatorTimerSeconds="3" />
and let some simple jquery code do the work from here.
Any developer or web designer now can work on the ad rotator and change values to this when asked without much ado.
Coming back to project a year later or coming into a new one where the previous developer split and went to an island somewhere in the pacific can be hell trying to figure out intentions when code is written in an unclear encrypted manner like this:
<div class="left blue imagerotator dir:images-ads endwith:10 t:3 tf:yes" />
When we write code in c# and other languages we don't write code putting all custom properties in one property as a space delimited string and end up having to parse that string every time we need to access or write to it. Think about the next person that will work on your code.
The thing with validation is that TODAY it may not matter, but you cannot know if it's going to matter tomorrow (and, by Murphy's law, it WILL matter tomorrow).
It's just better to choose a future-proof alternative. If they don't exist (they do in this particular case), the way to go is to invent a future proof alternative.
Using custom attributes is probably harmless, but still, why choose a potentially harmful solution just because you think (you can never be sure) it will cause no harm?. It might be worth to discuss this further if the future proof alternative was too costly or unwieldy, but this is certainly not the case.
Old discussion but nevertheless; in my opinion since html is a mark-up and not a progamming language, it should always be interpreted with leniency for mark-up 'errors'. A browser is perfectly able to do so. I don't think this will and should change ever. Therefore, the only important practical criteria is that your html will be displayed correctly by most browsers and will continue to do so in, say a few years. After that time, your html will probalbly be redesigned anyway.
Just to add my ingredient to the mix, validation is also important when you need to create content that can/could be post-processed using automated tools. If your content is valid you can much more easily convert markup from one format to another. For example, doing valid XHTML to XML with a specific schema is Much easier when parsing data that you know and can verify to follow a predictable format.
I, for example NEED my content to be valid XHTML because very often it is converted into XML for various jobs and then converted back without data loss or unexpected rendering results.
Well it depends on your client/boss/etc .. do they require it be validating XHTML?
Some people say there are a lot of workarounds - and depending on the sceneraio, they can work great. This includes adding classes, leveraging the rel attribute, and someone that has even written their own parser to extract JSON from HTML comments.
HTML5 provides a standard way to do this, prefix your custom attributes with "data-". I would recommend doing this now anyway, as there is a chance you may use an attribute that will be used down the track in standard XHTML.
Using non-standard HTML could make the browser render the page in "quirks mode", in which case some other parts of the page may render differently, and other things like positioning may be slightly different. Using a custom DTD may get around this, though.
Because they're not standard you have no idea what might happen, neither now, nor in the future. As others have said W3C might start using those same names in the future. But what's even more dangerous is that you don't know what the developers of "browser xxx" have done when they encounter they.
Maybe the page is rendered in quirks mode, maybe the page doesn't render at all on some obscure mobile browser, maybe the browser will leak memory, maybe a virus killer will choke on your page, etc, etc, etc.
I know that following the standards religiously might seem like snobbery. However once you have experienced problems due to not following them, you tend to stop thinking like that. However, then it's mostly too late, and you need to start your application from scratch with a different framework...
I think developers validate just to validate, but there is something to be said for the fact that it keeps markup clean. However, because every (exaggeration warning!) browser displays everything differently there really is no standard. We try to follow standards because it makes us feel like we at least have some direction. Some people argue that keeping code standard will prevent issues and conflicts in the future. My opinion: Screw that nobody implements standards correctly and fully today anyway, might as well assume all your code will fail eventually. If it works it works, use it, unless its messy or your just trying to ignore standards to stick it to W3C or something. I think its important to remember that standards are implemented very slowly, has the web changed all that much in 5 years. I'm sure anyone will have years of notice when they need to fix a potential conflict. No reason to plan for compatibility of standards in the future when you can't even rely on today's standards.
Oh I almost forgot, if your code doesn't validate 10 baby kittens will die. Are you a kitten killer?
Jquery .html(markup) doesn't work if markup is invalid.
Validation
You shouldn't need custom attributes to provide validation. A better approach would be to add validation based on fields actual task.
Assign meaning by using classes. I have classnames like:
date (Dates)
zip (Zip code)
area (Areas)
ssn (Social security number)
Example markup:
<input class="date" name="date" value="2011-08-09" />
Example javascript (with jQuery):
$('.date').validate(); // use your custom function/framework etc here.
If you need special validators for a certain or scenario you just invent new classes (or use selectors) for your
special case:
Example for checking if two passwords match:
<input id="password" />
<input id="password-confirm" />
if($('#password').val() != $('#password-confirm').val())
{
// do something if the passwords don't match
}
(This approach works quite seamless with both jQuery validation and the mvc .net framework and probably others too)
Bonus: You can assign multiple classes separated with a space class="ssn custom-one custom-two"
Sending information "from and to the server"
If you need to pass data back, use <input type="hidden" />. They work out of the box.
(Make sure you don't pass any sensitive data with hidden inputs since they can be modified by the user with almost no effort at all)

Writing XSS Filter for (X)HTML Based on White List

I need to implement a simple and efficient XSS Filter in C++ for CppCMS. I can't use existing high quality filters
written in PHP because because it is high performance framework that uses C++.
The basic idea is provide a filter that have a while list of HTML tags and a white
list of options for these tags. For example. typical HTML input can consist of
<b>, <i>, tags and <a> tag with href. But straightforward implementation is not
good enough, because, even allowed simple links may include XSS:
Click On Me
There are many other examples can be found there. So I though also about a possibility to create a white list of prefixes for tags like href/src -- so I always need to check if it starts with (https?|ftp)://
Questions:
Are these assumptions are good enough for most of purposes? Meaning that If I do not
give an options for style tags and check src/href using white list of prefixes it solves XSS problems? Are there problems that can't be fixes this way?
Is there a good reference for formal grammar of HTML/XHTML in order to write simple
parser that would cleanup all incorrect of forbidden tags like <script>
You can take a look at the Anti Samy project, trying to accomplish the same thing. It's Java and .NET though.
http://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project#.NET_version
http://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project_.NET
Edit 1, A bit extra :
You can potentially come up with a very strict white listing. It should be structured well and should be pretty tight and not much flexible. When you combine flexibility, so many tags, attributes and different browsers generally you end up with a XSS vulnerability.
I don't know what is your requirements but I'd go with a strict and simple tag support (only b li h1 etc.) and then strict attribute support based on the tag (for example src is only valid under href tag), then you need to do whitelisting in the attribute values as you stated http|https|ftp or style="color|background-color" etc.
Consider this one:
<x style="express/**/ion:(alert(/bah!/))">
Also you need to think about some character whitelisting or some UTF-8 normalization, because different encodings can cause awkward issues. Such as new lines in attributes, non valid UTF-8 sequences.
All details of HTML parsing are specified in HTML 5. However implementation of it is quite a lot of work, and it doesn't matter whether you'll parse HTML exactly with all corner cases. At worst you'll end up with different DOM, but you have to sanitize DOM anyway.
As you mentioned, there are various PHP implementations of this, but I don't know of any in C++, since that's not a language typically applied to web development. Overall, it's going to depend on how complex of an implementation you want to come up with.
A very restrictive whitelist is probably the "simplest" way, but if you want to be really comprehensive I would look into doing a conversion of one of the established versions to C++, as opposed to trying to write your own from scratch. There are so many tricks to worry about, that I think you'd be better off standing on the shoulders of others that have already gone through all that.
I don't know anything about using C++ for web development, but converting PHP to it doesn't seem like it would be a particularly difficult task, PHP doesn't really have any magical capabilities that C++ won't be able to duplicate. I'm sure there will be some small hitches, but overall if you want to go the more-complex route it'd definitely still be faster to do a conversion than a full design from scratch.
HTML Purifier seems like a strong PHP implementation that is still actively maintained, there's a comparison document where the author discuss some differences between his approach and others', probably worth reading.
Whatever you come up with, definitely test it with all the examples you link, and make sure it passes all those. Good luck!

Should HTML co-exist with code?

In a web application, is it acceptable to use HTML in your code (non-scripted languages, Java, .NET)?
There are two major sub questions:
Should you use code to print HTML, or otherwise directly create HTML that is displayed?
Should you mix code within your HTML pages?
Generally, it's better to keep presentation (HTML) separate from logic ("back-end" code). Your code is decoupled and easier to maintain this way.
As long as your HTML-writing code is separate from your application logic, and the HTML is guaranteed to be well-formed somehow, you should be okay.
The only code that should be mixed in markup-based pages (i.e, those that contain literal HTML) is the code used for formatting the HTML (e.g., a loop for writing out a list).
There are trade-offs whether you put the code in with the HTML or you use pure code to write the HTML out using quoted string literals.
No, if you want to build good and maintainable software, and to achieve loose coupling.
If I understand the question right, you're asking whether it's a good practice to mix markup with back-end code. No. While this is commonly done, it's still a bad idea.
You should read up on the MVC paradigm, as well as on existing questions on the matter, such as What is the best way to migrate an existing messy webapp to elegant MVC? and Best practices for refactoring classic ASP?
The point is to keep the display logic separate from the rest of the code. In any complex site you'll have code mixed in with your HTML, but the code should be for display purposes only. It shouldn't be doing any complex calculations.
For example, templates will contain loops and conditionals. Plus you'll probably have a library of HTML-specific routines, like printing out an <option> list based on a list object.
Imagine you were writing an application that has two output modes: HTML and something else. How would you write it, to avoid duplicating code? That will probably point you in the right direction.
The HTML that makes up the view has to get sent to the browser in some way. In .net, each server control emits its own HTML markup as part of the page lifecycle. So yes it is OK to use HTML in server side code.
Perhaps you should try following the ASP.net pattern. Create a bunch of controls that represent UI elements and make them responsible for emitting their own HTML based on their state.
Its fugly, and not type safe. But people do it without consequence. I'd prefer using a DOM or, at a minimum, classes designed to write HTML using type safe semantics. Also, its not all that good to mix UI with logic...
If I need methods that generate HTML I usually isolate them in an HtmlHelpers class. That way you keep some level of separation. The ASP.NET MVC Framework does this quite successfully.
If you mean printing out HTML in your code, then no. Unless you have a good reason not to, you should use templates
Even if you think you don't need this now, there's always a good chance you'll need it later. Maybe you want to output in a different format than HTML, or you want different presentation for the same data. You usually have the need for these things further down the road, so it's best to use one from the start.
I hate when developers print() a bunch of html. It's completely unnecessary and looks ugly in any text editor that shows print/echo strings in red.
I agree with everyone else that you should try as hard as you can to separate the HTML/XHTML markup from the application logic. However, sometimes you do need to generate HTML/XHTML in the application logic for various reasons.
In these cases what I have been trying to do is to ensure the bare minimum amount of presentation code is in mixed in with the application logic and try to migrate everything else over to the presentation code. It is worth nothing that is some cases you have situations where you could have everything moved over to the presentation layer, but it might be a bit easier to generate the markup as part of the application logic. In those cases, your best bet is likely to be to go the route that makes the most sense in terms of time.
I don't think there's any excuse for generating HTML inside your business logic. Don't even do it when it's just a "quick fix" or when you'll "go back and fix it later", because that never happens.
To reiterate my position from other questions, using some control logic (conditionals, loops) within HTML to construct it is OK. Do NOT do any data massaging or business logic in the HTML. You have to be disciplined, but it's worth it. Maintenance is much easier if your concerns (like logic and display) are separated.
Ideally you are aiming for a separation of concerns between your presentation (UI) code and your domain (business logic) code.
The reason why you should avoid coupling these two concerns (in either direction) is simple...
You will only have one reason to change a piece of code. whether this is from structural/styling changes in your html design, or from your business rules changing, you should only have to make the change in one place.
To a lesser extent, although many purists would disagree, by sprinkling HTML code through your domain code or vice versa you are creating noise for the next developer who comes along to read/maintain it.
I try to avoid using code to print HTML "directly". It is difficult to maintain, edit, add styles and etc. Some cases like generating an HTML email in the code, I create a text file or HTML file with markers like, [name], [verification code] and etc. I load this from the code and replace those markers. This way, you can edit the style of the email without re-compiling your code. Separating "presentation" and "logic" is a good practice in my opinion.
Mixing code within HTML is generally not a good practice in similar reasons as said in #1. However, I do use code in HTML for things like simple dynamic strings that are displayed multiple times on a page or pages. I think this is better than creating multiple server controls for same exact values to set. Since this is not code "logic" mixed in the HTML, I think this is ok.