Where to draw the line between efficiency and practicality - html

I understand very well the need for websites' front ends to be coded and compressed as much as possible, however, I feel like I have more lax standards than others when it comes to practical applications.
For instance, while I understand why some would, I don't see anything wrong with putting selectors in the <html> or <body> tags on a website with an expected small visitation rate. I would only do this for a cheap website for a small client, because I can't really justify the cost of time otherwise.
So, that said, do you think it's okay to draw a line? Where do you draw yours?

Some best practices can be safely ignored if you know what your are doing and why you are doing it.
Don't cut corners because you are lazy, but don't over engineer a 2 page website. Use your judgment.
But, if you delude yourself into thinking you are better than you are, either yourself, or a future maintainer will be cursing your existence.

For instance, while I understand why some would, I don't see anything wrong with putting selectors in the or tags on a website with an expected small visitation rate
I assume you mean putting inline CSS into those tags. Well, there's nothing wrong with that per se. As far as I'm concerned, everybody is allowed to do that to their heart's content (as long as I don't have to maintain it.) But a practice that puts all the CSS into a separate style sheet, so that the HTML file consists really only of a skeleton and the actual content, is just cleaner, easier to maintain and a joy to the eyes.
I would only do this for a cheap website for a small client, because I can't really justify the cost of time otherwise.
I don't think this reasoning is correct. A cleanly separated structure is equally expensive to build when you've got the hang of it, and cheaper to maintain in the long run.

A small client who doesn't have a lot of money to spend is going to be extremely angry when he asks you to change some color and it turns out that will take you two hours because it's specified in a bunch of in-line styles rather than in a css file.
I would also argue that if you get in the habit of using an external stylesheet and just applying styles in your HTML, you will find that it's actually faster than in-line css.

Where I draw the line is going to depend on the project. You're always going to have to choose a balance between readability and efficiency.
For example, it's possible to make HTML and JavaScript very efficient by making it unreadable--stripping whitespace, shortening element, variable, and function names, etc. To evaluate whether or not to do so, I would calculate delta in hardware costs plus opportunity cost of the heavier file and compare it to the cost of writing a generator that will take clean, easy-to-read code and turn it into terse, easy-to-load code. Whichever solution costs less is then the one to use.

Best practices so called for a reason. Your work will always reflect you, and although these things may seem small and insignificant, they will aid maintainability, readability etc when you come back to modify something. The profitability argument is one oft cited by freelancers - if you don't always do it, you may never do it. In time you'll realise it's actually quicker to "do it right" than bodge it, and you'll be proud of your work.
Always adhere to standards and best practices!

Related

Avoiding spaghetti code while writing small functions

My understanding of "Spaghetti Code" is a code base that jumps from one block of code to another without an logical and legible purpose. The most common offender seems to be the GOTO statement.
I'm currently reading/referencing the function chapter of Clean Code: A Handbook of Agile Software Craftsmanship. The author, while self admittedly, is extremely strict on the size of functions. I understand the idea of keeping functions small, however, he suggests they should be around 5 lines. While Classes certainly become more legible, I'm afraid of creating spaghetti code by writing smaller functions. Smaller functions also seem to inadvertently create much higher abstractions as well.
At what point does code become spaghetti code? How abstract is too abstract? Any answers would be greatly helpful.
As an aside, I'm a long time follower of Stack Overflow although this is my first time posting a question, so any suggestions regarding my post are welcome as well.
Thanks a lot!
As already said in the comments, there is no absolute rule. At the end, you should aim for a good readability of your code. But that is not all about the length of your methods. Robert Martin suggests ordering the methods according to the degree of abstraction. Abststract methods should be at the top of your class, and the more a method is, the deeper it should be located.
Another importand aspect is the method name. It should be chosen well in order to make clear what the method does! If you choose your method names wisely, then comments should be hardly necessary. For example, consider an if-statement:
if(isValidAge(value)) {
...
}
is much more readable than
if(value > 10 && value < 99) {
...
}
because the intention of the statement becomes much clearer. Of cause you could add a comment in the second example. But comments often become outdated (there is an extra chapter in Robert Martin's book about that). I think, this style of programming leads to many short methods.
It is hard to choose the right level of abstraction. According to my expecience, it is easier to start with a low level of abstraction. So I can first concentrate on solving the problem well. When I need more abstraction later, I still can refactor the code. TDD helps a lot!
Hope, this helps ...
I agree with comments and answers here. From practical point of view the thinks which Robert Martin writes in his books are every time very good orientations and I try to get as much close as possible to this "rules" and indeed 5-lines-methodes are mostly not to bad.
In my eyes best way to avoid spaghetti code is to write (small) classes with a high Cohesion. The disadvantage is that you become a whole bunch of classes, which makes it sometimes a little bit more hard for new employees to come in the project.
I understand the idea of keeping functions small, however, he suggests they should be around 5 lines.
That sounds ideal :)
While Classes certainly become more legible, I'm afraid of creating spaghetti code by writing smaller functions.
Spaghetti code is caused by code jumping from place to place with (having different levels of abstraction in the same function - low-level IO code and high level application logic). If you extract small functions, your result is getting further away from spaghetti code, not closer).
At what point does code become spaghetti code?
When the code forces you (the programmer) to make mental jumps (switch contexts) from line to line, the code is spaghetti code. This is true whether you use GOTOs or not (but GOTOs can make the problem much worse).

Regular Expressions vs XPath when parsing HTML text

I want to parse a HTML text and find special parts. For example a text in 3rd div of 1st row and 2nd column of a table. I have 2 options to parse: Regular Expressions and XPath. What is advantages and disadvantages of each one?
thanks
It somewhat depends on whether you have a complete HTML file of unknown but well-formed content versus having merely a snippet or an expanse of HTML of completely known content which may or may not be well-formed.
There is a difference between editing and parsing, you see.
It is one thing to be editing your own HTML file that you wrote yourself or are otherwise staring right in the face, and you issue the editor command
:100,200s!<br */>!!g
To remove the breaks from lines 200–300.
It is quite another to suck down whatever HTML happens to be at the other end of a URL and then try to make some sense out it, sight unseen.
The first calls for a regex solution — the very one shown above, in fact. To go off writing some massively overengineered behemoth to do a fall parse to set up the entire parse tree just to do the simple edit shown above is quite simply wrong. It’s also its own punishment.
On the other hand, using patterns to parse out (as opposed to lex out) an entire HTML document that can contain all kinds of whacky things you aren’t planning for just cries out for leveraging someone else’s hard work intead of recreating the wheel for yourself, and badly at that.
However, there’s something else nobody likes to mention, and that’s that most people just aren’t competent at regexes. They don’t really understand them. They don’t know how to test them or to craft them. They don’t know how to make them readable and maintainable.
The truth of the matter is that the overwhelming majority of regex users cannot even manage as simple and basic a thing as matching an arbitrary HTML tag using a regex, even when things gotchas like alternate encodings and CDATA sections and redefined entitities and <script> contents and archaic never-seen forms are all safely dispensed with.
It’s not because it’s hard to do; it isn’t, actually. It’s just that the people trying to do it understand neither regexes nor HTML particularly well, and they don’t know they don’t know, and so they get themselves in way over their heads more quickly than they realize. And then they have a complete disaster on their hands.
Plus it’s been done before, and correctly. Might as well learn from someone else’s mistakes for a change, eh? It would probably help to have a few canned regexes at your disposal to go at frequently manipulated things. This is especially useful for editing.
But for a full parse, you really shouldn’t try to embed a full HTML grammar inside your pattern. Honest, you really shouldn’t. Speaking as someone has actually can and has done this, I unlike 99.9999% of the responders here the credibility of actual experience in this area when I advise against it. Sure, I can do it, but I almost never want to, and I certainly don’t want you to try it at home unsupervised. I can’t be held responsible for any damage that might ensue. :)
Sure, this may sound like “Do as I say, not as I do,” but if your level of regex mastery were at a level that allowed you to contemplate such a thing, you would not be asking this question. As I mentioned, almost no one who uses regexes can actually match an arbitrary HTML tag, simple as that is. Given that you need that sort of building block before writing your recursive descent grammar, and given that next to nobody can even manage that simple building block, well...
Given that sad state of affairs, it’s probably best to use regexes for simple edit jobs only, and leave their use for more complete solutions to real regex wizards, for they are subtle and quick to anger. Meaning of course the regexes, not (just) the wizards.
But sure, keep some canned regexes handy for doing simple editing rather than full parsing. That way you won’t be forced to redevise them each time from first principles. I do keep a few of these around, but then I also keep simple frameworks that allow me to edit a particular structural element of the HTML, like the plain text or the tag contents or the link references, etc, and those all use a full parser, letting me then surgically target just the parts I want in complete confidence I haven’t forgotten something.
More as a testament to what is possible than what is advisable, you can see some answers with more, um, “heroic” pattern matching, including recursion,
here,
here,
here,
here,
here, and
here.
Understand that some of those were actually written for the express purpose of showing people why they should not use regexes, because some of them are really quite sophisticated, much moreso than you can expect in nonwizards. That difficulty may chase you away, which is ok, because it was sort of meant to.
But don’t let that stop you from using vi on your HTML files, nor should it scare you away from using its search or substitute commands. Don’t let the perfect be the enemy of the good. Sometimes good enough is exactly what you need, because the perfect would take more investment than it could ever be worth.
Understanding which out of several possible approaches will give you the most bang for your buck is something that takes time to learn, and no one can tell you the answer that works for you. They don’t know your dataset, your requirements, your skillset, your priorities. Therefore any categorical answer is automatically wrong. You have to evaluate these things for yourself.
I think XPath is the primary option for traversing XML-like documents. With RegExp, it will be up to you to handle the different forms of writing a tag (with multiple spaces, double quotes, single quotes, no quotes, in one line, in multi-lines, with inner data, without inner data, etc). With XPath, this is all transparent to you, and it has many features (like accessing a node by index, selecting by attribute values, selecting simblings, and MANY others).
See how powerfull it can be at http://www.w3schools.com/xpath/.
EDIT: See also How do HTML parses work if they're not using regexp?
XPath is less likely to break if the web developer does any minor changes. That would be my choice.
Here is the canonical Stackoverflow explanation for why you should not parse HTML with regex:
RegEx match open tags except XHTML self-contained tags
In general, you cannot parse HTML with regex because regex is not made to parse HTML. Just use XPath.

How should I format my markup?

When it comes to my markup, I'm anal. It always has to be perfectly indented, easily readable to me, and 100% valid with the W3C. Often time, when viewing the markup of other websites, I'm appalled with the lack of effort by the developer to try to and keep their markup in the browser clean, organized, and valid.
On the flip side, there's a lot of people who will force all their markup on to one, continuous line for the size saving benefits. This annoys me as well, though not to the same extent because it is done with a purpose. But for the most part, it seems like no developer ever actually looks at their markup in the browser and does anything about it.
Understanding that, to the parser in the browser, indents and spaces (usually) don't matter, how should I be handling my markup? Is it worth the extra time to get my markup perfectly easily readable to humans as well as the browser? Are all my \t's and \n's being used in vain?
There are some browsers who has bugs that renders indented well formed html completely wrong. Such as some versions of Internet explorer with tables and images.
Other than that, i try to keep sane indention, I don't spend to much time with it, just enough to make it easy to debug.
Is it worth the extra time to get my markup perfectly easily readable
My answer is no. The arguments:
Whoever tries to look at the code probably will want modify it so, for editing the code you need good code editor with code formatting (e.g. Netbeans). You'll very soon need other features like, syntax coloring.
Some users might prefer other type of formatting than you.
Anyone interested in readable HTML may use Tidy (of Tidy extension to Firefox) to format it.
It's a performance issue too: additional overload of formatting + stripping whitespace (and minifying when possible) will speed up the site. It's very important for sites with high traffic.
It's worth the effort imho since it helps you understand what exactly is going on in your html page, and that's definitely worth something.
If we want to write clean, elegant code in general this means we should want to generate nice, clean elegant html as well, not?
Not sure if this answers your question, but as long as the code is valid by W3C, is structured as intended. As far as your view-ability of the code (like view source) structure, that's really up to you, but I would not add too much clutter (comments etc). Use the correct DOCTYPE for your markup and you should be fine with that. I don't see any reason to "waste" time on making the source code from the browser "book" readable. The view source would only be beneficial to you so you can quickly see what's happening at a glance through source view.
I like to correctly format my markup, and I think it makes it easier to manage when I do.
Then again, I use ASP.NET and a lot of markup is generated through various controls and classes. In this case, I've decided it is not worth trying to track down each mis-aligned markup and see if something can be done to get the associated control to produce the correct result.
In short, nicely formatted markup is worth it if it can be accomplished without a huge effort.
Yes, in my opinion it is worth. It will be easier to maintain, for you and for other collegues, now and in the future.
About the disadvantage of lower performance, why not to develop a well indented and commented source file and to generate a minimized version to run on the server? It can be acheived with a simple series of regex replacements.

What is the golden rule for when to split code up into functions?

It's good to split code up into functions and classes for modularity / decoupling, but if you do it too much, you get really fragmented code which is also not good.
What is the golden rule for when to split code up into functions?
It really does depend on the size and scope of your project.
I tend to split things into functions every time there is something repeated, and that repetition generalized/abstracted, following the golden rule of DRY (Do not repeat yourself).
As far as splitting up classes, I tend to follow the Object Oriented Programming mantra of isolating what is the same from what is different, and split up classes if one class is implementing more an one large theoretical "idea" or entity.
But honestly if your code is too fragmented and opaque, you should try considering refactoring into a different approach/paradigm.
If I can give it a good name (better than the code it replaces), it becomes a function
I think you generally have to think of a chunk of codes have a chance of being reuse. It comes with experience to figure that out and plan ahead.
I agree with the answers that concentrate on reusability and eliminating repetition, but I also believe that readability is at least as important. So in addition to repetition, I would look for pieces of code (classes, functions, blocks, etc.) that do more than one thing.
If the name associated with the code does not describe what it does, then that is a good time to refactor that code into units which each have a single responsibility and a descriptive name. This separation of concerns will help both reusability, and readability.
Useful code can stick around for a long time, so it is important that you (or ideally someone else) can go back and easily understand code that was written months or years before.
Probably my own personal rule is if it is more than 2 lines long, and is referenced more than once on the same page (ASP.net), or a few times spread over a couple of pages, than I will write a function to do it.
I was taught that anything you do more than once should be a function. Anything similar should have a parent class, and above all else consult your source code "standards" within your organization. The latter mostly deals with formatting.
First, write the feature you're adding. (Notice the word "First", we tend to write a function/class before writing the feature, which might lead to having too many fragmentation).
Then, review the code you just wrote/changed, find what blocks of the code that is:
3-lines or more, and..
repeated, and..
Can be grouped under a named function. Because code is written by us, people (not programs), to be read/changed later by us, people (not programs).

Why use tables to structure your layout?

Looking at the source code for Stack Overflow, I noticed they have used tables and inline CSS quite a bit, also something I found odd was use of inline table attribute formatting.
<table width="100%">
I'm just curious if there was any specific reason(s) to why they used tables to structure their template instead of the popular (or used to be popular) DIVs.
As well...the purpose of using CSS includes and using inline CSS on the same page (I know there is probably a great answer/solution(s) for this...I'm just curious to what they are)
I understand there is nothing wrong with using tables for tabular data...but in this case Stack Overflows tables are used for structure.
Tables vs. Divs is a pointless holy war.
There are specific issues with using tables in particular ways for layout that can cause problems. One of these is building an entire site layout in a single table in order to handle margins and placement -- because of the way tables are rendered this frequently means a website will not render progressively by the browser engine as the content downloads, and can only be rendered after the entire thing has been received. For a large page or slow modem user, they may be staring at a blank page for quite a while, which is a "Bad Thing". Never mind a lot of the inconsistencies in table rendering in the mozilla/ie5 generation of browsers that made consistent cross-browser table layouts somewhat painful, especially with images in the cells.
Supporters of the pure div path like to talk about content vs. presentation, because in theory HTML 4.01 is pure content, all of it meaningful. The divs provide meaningful organizational structure in an abstract sense, which is then given presentation exclusively by CSS. In these arguments, tables are valid only if being used to contain actual tabular data. Of course, this ignores the fact that for any sufficiently complex layout, there are almost always quite a few empty divs floating around simply to support the necessary hooks for presentation CSS, breaking the first level of this abstraction. Once this abstraction is broken, there's no law stating that, when your layout simply requires a presentation hook in the HTML that has no meaningful content, a div is somehow more appropriate than a table. If you are stuck with the choice of a meaningless div or a meaningless table in order to make your layout work, choose whichever is easier.
In the end, it's about being aware of the limitations in all methods and using the one that is most appropriate. There are many cases where using a table is simply easier than setting up a pointless (i.e. not-content-meaningful) array of divs, and the table rendering limitations don't apply. If the table is small and represents a minor chunk of the interior content, the rendering delay is not relevant.
Having not been involved in SO development, I only speak generally:
I've found that tables are often easier and more consistent across browsers than CSS-based layouts.
Also, emitting random CSS here and there often happens when trying to get things done. It can be refactored later, I suppose.
With respect to why they chose to set a table's width in HTML instead of CSS, I couldn't say.
I know that SO used a real, honest to goodness designer when they started. I don't know, though, if that designer gave them an image of what the site should look like or actual markup.
Please don't flame me for saying so. We're not all CSS ninjas.
SO was probably written by programmers, not web developers.
Tables are not evil, but certain uses of them (which used to be everywhere) are evil. Namely, using spacers, nested cells, etc, to control margin and padding.
Even though everyone now a days talk about layout with css and divs, the truth of the matter is css is awful when it comes to layout. You can only do so much. Look at some suggested solutions to get 2 or 3 column layouts using css, they all suck. Throwing a <table><tr><td id="left-column"><td id="right-column"></tr></table> is a lot easier.
css is just not suitable for non-trivial layout (and by that, I mean pure div/css)
The table solution I just threw above needs to use css to control width and padding and borders and background images, etc.
Give up and use tables
Because Internet Explorer does not support the display:table CSS property, which is what provides the grid-like layout model (equivalent to how html tables are rendered). The grid-model is the simplest and most flexible way to model many layouts.
So you have three choices, none of them attractive:
sacrifice support for Internet Explorer (all other modern browsers supports display:table property, which have been part of the CSS2 standard for more than a decade)
use cumbersome CSS workarounds which are costly and hard to maintain.
sacrifice semantic purity and use TABLE-elements.
SO chose the last option, probably because they think support for Internet Explorer users is more important than support for disabled users, and because they wanted something that was quick to develop and simple to maintain.
Jeff and his team were getting it done quick and dirty. This was a very quick development cycle, without the time to refactor out much of the clutter.
And face it - unless you are an expert, CSS is time consuming for table structures.
Inline styles and included css are just a sign they were trying to get it done, not worrying (at least for the first iteration) about the "right" way of doing it. The right way was whatever worked and got it done fast.
IE8 will be the last major browser to finally add support for the CSS display: table-* values, so the distinction will go away. Hopefully this will end the whining about how hard CSS is, and people can stop polluting markup with presentation.
I'm a reasonably smart person, not too far below average at least, but I find css layout totally unintuitive. Tables are extremely intuitive. I think one measure of a good technology is how often you have to refer to the manual while reading it. Tables blow CSS out of the water in this way compared to css. Again and again, when using css, I have to dig to figure out how to do something like this
Tables and layout
SO's layout is not based on tables.
At a quick glance, I'd say SO layout is 80% div-based and 20% table-based. Tables are used in the header and on the "badges" box. Table use is appropriate for badges IMHO (it's a list of items, after all), not so good for the header.
Anywhere else, divs are used.
Inline CSS
Again, many inline definitions are used (probably to quickly mockup site's structure), but SO correctly uses also css (to style the divs and to provide print formatting).
"css is too hard" and "tables are quicker and easier" excuses, coupled with some down right misgivings about what's wrong with the use of tables for structural markup.
The question is asking why SO chose to use tables, inline css, etc., to which I think the answer is probably nothing more than that either they aren't familiar with graceful degradation and semantics, or they didn't consider it important enough to devote the time and resources.
There's nothing wrong with not knowing css, but to dismiss semantic markup and the proper use of css just because you don't know it is just WRONG.
To champion that your site looks the same in every browser using tables, while not giving a seconds thought to those that don't use a visual browser smacks of selfishness - that's a strong word, and I apologize for the tone, as I don't intend to insult.
BTW - I take umbrage at the idea that the use of tables for structural markup is a 'holy war'. While some might think that Semantic markup is overly championed, it is not based upon blind faith.
CSS is great and can really simplify your markup, however, sometimes aligning content with tables is just easier and more reliable. Also for code generated dynamically some of the pain points for non-CSS based styling are less bad. Specifically, since you are dynamically, creating the markup you can re-factor the styling elements in the functional code.
As for the in-line styles, I often use these when I'm building a page and then try to re-factor them into an external sheet later. This makes it a little easier for my testing (no need to do a hard refresh) and changes are in one file instead of two.
My thougths would be they went with table because (apart from the argument table vs css)
They needed to get the functionality get out rolled quickly to have an opinion
After all this is just the public beta.
They were experimenting with ASP.NET MVC more and layout issues less
SO is all about programming questions and answers and at the end that what matters.
SO is all about recognizing the contributors by rewarding points and badges which it does quite decently (this may also be a controversial topic).
and so on....
I have no idea, but the only explanation I can come up with is that Jeff isn't as supportive of web standards as he'd like us to think and none of the team are that hot on CSS. Programmers often use cross-browser-ness, ease-of-use and numerous other supposed time benefits to excuse their lack of CSS skills. And I don't mean that as a criticism, they're probably really good at programming C#/Java/Ruby/SQL/whatever, they just can't seem to admit that they don't really know something as well us a bunch of polo-neck wearing, Mac Book wielding designers...