When I want to have a message show when a user mouse overs an object, and lately I just use the title attribute on my html tags since it's simple and automatically doesn't go off screen.
Question: Is using the title attribute is a bad thing to rely on for a tool-tip?
Ignoring the fact you can't customize it, I'm curious about functionality over using a custom made tool-tip (such as how the standard user interacts with it). A specific web-comic I read, for example, uses the title attribute to add a witty comment / factoid when you hover over it. Yet not many people seem to know about it.
As such it seems a title might be good for a comment, or even saying author of a picture, but is it good for a true simply tool-tip?
Considering for a 'real' tool-tip you need usually 1-2 extra elements, css (and depending how you set it up, possibly some inline style for placement), and possibly even java-script, is the title attribute bad to use since (again) it cannot be customized, is often a small off-topic detail about the element, and only appears after a set amount of time.
Note: If it helps (food for thought), my current situation that brought this question on, is I like when a site has something like [?] for you to hover over to find more details without shoving them into the page, thus keeping it simple.
Also, I learned html from w3schools, and they never mention the title attribute, so not really sure what they are intended / should be used for. (and yes, mentioning w3schools part was a (bad) attempt at getting sympathy)
And I find this question kind of weird to ask considering SO uses them quite a bit, but feel free to assume I know nothing about it as... well... I really don't)
The title attribute (#title), should not be used.
Every browser does their own thing with the #title, even though it looks the same.
For people who just use the keyboard, they cannot get the information in #title.
People accessing the site from a mobile device, cannot get the information.
Some, but not all assistive technology can get the information in the #title
some allows it to be read after enabling it. Which not many people (users) know about.
other technology simply ignores the link text and reads the #title only.
Ex of 4.2:
Delete your account
This will read:
Are you sure? Link
Further Reading: PG: Title attribute
First of all,hats off to your question. Good thinking. I guess we people [I'm speaking about amateur coders like me] needn't develop a big site or rather lack that expertise. We simple need to keep getting our things done in an optimized manner. Therefore, similarily, I have also encountered almost every script using title for tooltips. Guess,it's the simple way to tackle it. Moreover, as long as the tooltip is attractive, isn't slow, and caters our need: its all good.
The title attribute is simple, and simplistic. It is not reliable. No tooltip mechanism really is, but the tooltips generated by title attributes have rather poor usability: tiny font size, problems with line control, timed disappearance, no way (to normal users) to make it stay put so that you can actually read it even if you are a slow reader. Besides, there is normally no hint to the user about the availability of a tooltip.
Related
<nav> typically has <a> elements inside it, but is it required? I have a radio button form whose purpose is akin to navigation. It doesn't navigate to other pages on the Internet, but instead changes the visibility of elements within the body of the HTML document (like a carousel).
Thanks! I've been wrestling with semantic markup tonight!
tl;dr: No, the nav element requires links (but not necessarily a elements).
W3C’s HTML5 defines the nav element like this:
The nav element represents a section of a page that links to other pages or to parts within the page: a section with navigation links.
By default, a radio button doesn’t link to a part within the page. In your case, it changes the visibility of an element on the page. So it’s not appropriate to use the nav element for this purpose (unless, maybe, changing the radio button focuses the changed element; but that behaviour might be bad for usability).
The whole point of semantics, is that when viewed in an entirely different light, it's still readily clear in how to derive meaning and relational context from the content.
A web browser will parse your HTML out of the Dom after every load or change, and construct an opinion based on that about the content. The browser will keep this handy to itself internally in case it needs it later to assist it with difficult judgement calls should it be asked to perform a seemingly complex operation.
For example, someone who has really poor eyesight might enable an accessibility feature on their mobile device that tacks on a variety of different visual styles adding a great deal of visual emphasis to interactible elements they can touch, depending on the type of interaction. This could be something like a bright color coated and outlined overlay on top of elements, perhaps something like cyan for multimedia controls, yellow for form elements, and magenta for navigation points. This feature would have to work on any and all possible content which the browser will ever render, and so what you've got is a hidden under the hood runtime script that the browser is using to dynamically parse what ever it's loaded in order to construct some sort of opinion which it can lean on exorcise what will hopefully be good judgement. So no matter how clear your navigation might seem visually to someone with great eyesight, this is why semantics are such a big deal and why it's so important we continue to make efforts to use them correctly, as here you have a machine alternatively parsing your source code because it has zero comprehension of it's otherwise visual context.
When it comes to accessibility, browsers are much more complex in forming their opinions than just simply parsing the Dom. In a scenario such as this example, and the code you're wanting to write, wrapping your navigation elements in a nav tag should properly assist the browser into making the right call. Even if they're not link tags, the browser is going to take note of any elements inside of a nav tag witch active event listener handles items like click and similar.
As another user mentioned, semantics is all about judgement. There are countless other ways which good semantics play a role into good development, dry code and easier maintainability being my two favorites. There are no hard lines for "can do" and "can't do", but practicing good semantics is still pretty easy to do regardless. Just continually ask yourself these couple of questions about your core content-
• If someone or something tried to use this in ways which I'm not explicitly building in targeted functionality for, do I think it will be able to understand what content is what and the associated intents well enough to be successful at what ever is being attempted?
• If I was to refactor or repurpose any of this later, is there a clear separation of content, logic, and style? Is my content clean, and meaningfully distinguished? Is it so clean and ready, that I can just rip it out and drop it into something new with little or no change? Essentially, how portable is this content? Is it plug and play level portable? And if not, could it be made more portable with better semantics?
Practice developing with proper semantics using those couple of core guidelines, and you'll almost always be perfectly fine.
Just to make sure I've directly addressed your question- Yes. What you've done is "okay", and "semantically legal" 🙂
As far as I understand, not rendering the HTML for an element at all, or adding display:none, seem to have exactly the same behavior: both make the element disappear and not interact with the HTML.
I am trying to disable and hide a checkbox. So the total amount of HTML is small; I can't imagine performance could be an issue.
As far as writing server code goes, the coding work is about the same.
Given these two options, is one better practice than the other? Or does it not matter which I use at all?
As far as I understand, not rendering the HTML for an element at all, or adding display:none, seem to have exactly the same behavior: both make the element disappear and not interact with the HTML.
No, these two options don’t have "exactly the same behavior".
If you hide an element with CSS (display:none), it will still be rendered for
user agents that don’t support CSS (e.g., text browsers), and
user agents that overwrite your CSS (e.g., user style sheets).
So if you don’t need it, don’t include it.
If, for whatever reason, you have to include the element, but it’s not relevant for your document/users (no matter in which presentation), then use the hidden attribute. By using this attribute, you give the information on the HTML level, hence CSS support is not needed/relevant.
You might want to use display:none in addition (this is what many CSS supporting user agents do anyway, but it’s useful for CSS-capable user agents that don’t support the hidden attribute).
You could also use the aria-hidden state in addition, which could be useful for user agents that support WAI-ARIA but not the hidden attribute.
I mean do you need that checkbox? If not then .hide() is just brushing things under the carpet. You are making your HTML cluttered as well as your CSS. However, if it needs to be there then sure, but if you can do without the checkbox then I would not have it in the HTML.
Keep it simple and readable.
The only positive thing I see in hiding it is in the case where you might want to add it back in later as a result of a button being clicked or something else activating it in the page. Otherwise it is just making your code needlessly longer.
For such a tiny scenario the result would be practically the same. But hiding the controls with CSS is IMO not something that you want to make a habit of.
It is always a good idea to make both the code and its output efficient to the point that is practical. So if it's easy for you to not include some controls in the output by adding a little condition everything can be managed tidily, try to do so. Of course this would not extend to the part of your code that receives input, because there you should always be ready to handle any arbitrary data (at least for a public app).
On the other hand, in some cases the code that produces the output is hard to modify; in particular, giving it the capability to determine what to do could involve doing damage in the form of following bad practices: perhaps add a global variable, or else modify/override several functions so that the condition can be transferred through. It's not unreasonable in that case to just add a little CSS in order to again, achieve the solution in a short and localized manner.
It's also interesting to note that in some cases the decision can turn out to be based on hard external factors. For example, a pretty basic mechanism of detecting spambots is to include a field that appears no different in HTML than the others but is made invisible with CSS. In this situation a spambot might fill in the invisible field and thus give itself away.
The confusion point here is this: Why would you ever use display: none instead of simply not render something?
To which the answer is: because you're doing it client side!
"display: none" is better practice when you're doing client side manipulations where the element might need to disappear or reappear without an additional trip to the server. In that case, it is still part of the logical structure of the page and easier to access and manipulate it than remove (and then store in memory in Javascript) and insert it.
However if you're using a server-side heavy framework and always have the liberty of not rendering it, yes, display:none is rather pointless.
Go with "display:none" if the client has to do the work, and manage its relation to the DOM
Go with not rendering it if every time the rendered/not rendered decision changes, the server is generating fresh (and fairly immutable) HTML each time.
I'm not a fan of adding markup to your HTML that cannot be seen and serves no purpose. You didn't provide a single benefit of doing that in your question and so the simple answer is: If you don't need a checkbox to be part of the page, then don't include it in your markup.
I suspect that a hidden checkbox will not add any noticeable time to the download or work by the server. So I agree it's not really a consideration. However, many pages do have extra content (comments, viewstate, etc.) and it can all add up. So anyone with the attitude that they will go ahead and add content that is not needed and never seen by the user, I would expect them to create pages that are noticeably slower overall.
Now, you haven't provided any information about why you might want to include markup that is not needed. Although you said nothing about client script, the one case where I might leave elements in a page that are hidden is when I'm writing client script to remove them. In this case, I may hide() it and leave in the markup. One reason for that is that I could easily show it again if needed.
That's my answer, but I think you'd get a much better answer if you described what considerations you had for including markup on the page that no one will see. Surely, it must offer some benefit that you haven't disclosed or you would have no reason to do it.
EDIT: Rephrased question.
Other than being bad practice, what other reasons are there against empty paragraphs in HTML?
ORIGINAL:
Background
Currently to add a nicely space paragraph in our CMS you press Enter key twice. I don't like empty paragraphs because they seem unnecessary to me. If you want a new paragraph, just press Enter and space it with CSS. If you want to write just below some text (e.g. to display code), then do a line break with Shift+Enter.
Question
Is there any very good reason in not allowing empty paragraphs? Is there a standard here? Seems like I just have a philosophical issue right now -- i.e. using empty paragraphs probably won't make page viewing faster or save that much space.
One thing I've learnt the heard way is that any time you have a WYSIWYG editor for a web page, you stand a risk of ending up with poor quality HTML.
It doesn't matter how good the editor is, or how well trained your people are to use it, you will end up with bad code.
They'll click the 'bold' button instead of selecting your sub-title class. They'll create spurious paragraph tags rather than line breaks. And I've had to explain to one person several times why it's a bad idea to use multiple spaces to indent stuff.
Even when people are very good at using the editor and understand the implications, you'll still get things like stray markup setting styles and then unsetting them without any content, because if you (for example) make a word bold and then delete it, it generally doesn't delete the bold tags, and no-one thinks to switch to the HTML view to check.
The basic problem is that when you make it easy to use like a word processor, people will treat it like a word processor, and the underlying code becomes completely irrelevant to them. Their job is to produce content that looks good, and as long as they can achieve that, they don't generally care for how the code looks.
The good thing is that there is a solution. In general, the people generating the content are the same people who care the most about SEO. If you emphasise that there might be SEO consequences to poor quality HTML, I find that they suddenly care a lot more about the code they're generating. They still don't generally have the skills to fix it when they've broken it, but it does seem to make people take more care to follow the rules.
To directly answer your question, I don't think it's a disaster to have empty paragraph tags like that. It's preferable not to though, and you need to consider how the content would look semantically to a search engine - it may cause the search engine to see the two paragraphs of content as being less connected to each other than they should be. This may affect how it weights the content of each paragraph when it comes to deciding its page rank. In truth, it's unlikely to be a huge difference; in fact, I'd say it's probably very tiny, but in a competitive world, it could be enough to push you down a few places. There are probably other more important SEO issues for you to deal with, but as they say, every little helps.
There are times when you have a CSS styling a particular element in your case a paragraph. IF you will use empty paragraph they will unneccesarily pick up that styling which might not be needed.
By styling paragraphs with CSS, you can change the way paragraphs are styled easily in future.
For example, you might want to style differently if the user is browsing on a mobile device, or you might just decide that you want to add more or less space between paragraphs (using attributes like margin-top and margin-bottom on the p tag I guess) because it just looks better that way. If the spacing is done with extra p tags it'd be a lot harder to change.
I expect that things like screen readers for the visually impaired would deal with CSS-styled paragraphs better than if the structure of the page is changed by adding empty paragraphs.
Is there a reason many websites place a small link/button to the W3C CSS/HTML validation of the respective site or is this just a weird practice that caught on?
Validation just shows that you took time to ensure that the webpage adheres to the standard, as specified in your DOCTYPE.
Ideally every page should validate, but that is very much not the exception.
For some companies it is to avoid lawsuits, as there is a standard for blind people, so, if you don't pass their validation then you can be sued.
Here is a link about ADA compliance:
http://www.icdri.org/CynthiaW/is_%20yoursite_ada_compliant.htm
You may want look here for more reasons:
http://www.clfsrpm.net/w3c-validator/docs/why.html
Basically, if you want to do something, you might as well learn to do it right and pass the validation.
So people can show off the fact that they've spent time complying with standards.
Complying with standards can be useful, as it helps ensure that your page will work across browsers. But very frequently there's no reward for spending the time getting your page to validate, and validators are a lot pickier than they really need to be to ensure interoperability. So, some people find it nice to stick a little badge on their page indicating that it validates, to show off the work they've done.
It's really not all that useful to stick a validation badge on your page, and a large number of the pages I've seen the badge on don't actually validate (as they stuck that badge on before they made changes, or they have an error that allows an & to come through unquoted), so you're pretty much right, it's just a weird little practice.
Well, it's just like why corporations want to display their ISO certifications wherever they can, on the name cards, website etc.
Maybe it's a consciousness-raising move by the developer to belatedly right the wrong done by all those rubbish 'Best viewed in Netscape Navigator 4' type of pseudo-disclaimers you used to get all the time.
It should be noted that there is no "standard", there are only "recommendations", which are not strictly followed by anyone. Besides, validator marks a lot of useless things as errors (such as missing "alt" attribute). And it doesn't guarantee anything related to how your website is displayed, it only guarantees that the syntax of your html is valid (you can easily "break" the recommendations with your code still being validated perfectly).
In my opinion it's mostly used in the same way as "Web 2.0" term: to show off. It doesn't really mean anything.
What work, if any, has been done to automatically determine the most important data within an html document? As an example, think of your standard news/blog/magazine-style website, containing navigation (with submenu's possibly), ads, comments, and the prize - our article/blog/news-body.
How would you determine what information on a news/blog/magazine is the primary data in an automated fashion?
Note: Ideally, the method would work with well-formed markup, and terrible markup. Whether somebody uses paragraph tags to make paragraphs, or a series of breaks.
Readability does a decent job of exactly this.
It's open source and posted on Google Code.
UPDATE: I see (via HN) that someone has used Readability to mangle RSS feeds into a more useful format, automagically.
think of your standard news/blog/magazine-style website, containing navigation (with submenu's possibly), ads, comments, and the prize - our article/blog/news-body.
How would you determine what information on a news/blog/magazine is the primary data in an automated fashion?
I would probably try something like this:
open URL
read in all links to same website from that page
follow all links and build a DOM tree for each URL (HTML file)
this should help you come up with redundant contents (included templates and such)
compare DOM trees for all documents on same site (tree walking)
strip all redundant nodes (i.e. repeated, navigational markup, ads and such things)
try to identify similar nodes and strip if possible
find largest unique text blocks that are not to be found in other DOMs on that website (i.e. unique content)
add as candidate for further processing
This approach of doing it seems pretty promising because it would be fairly simple to do, but still have good potential to be adaptive, even to complex Web 2.0 pages that make excessive use of templates, because it would identify similiar HTML nodes in between all pages on the same website.
This could probably be further improved by simpling using a scoring system to keep track of DOM nodes that were previously identified to contain unique contents, so that these nodes are prioritized for other pages.
Sometimes there's a CSS Media section defined as 'Print.' It's intended use is for 'Click here to print this page' links. Usually people use it to strip a lot of the fluff and leave only the meat of the information.
http://www.w3.org/TR/CSS2/media.html
I would try to read this style, and then scrape whatever is left visible.
You can use support vector machines to do text classification. One idea is to break pages into different sections (say consider each structural element like a div is a document) and gather some properties of it and convert it to a vector. (As other people suggested this could be number of words, number of links, number of images more the better.)
First start with a large set of documents (100-1000) that you already choose which part is the main part. Then use this set to train your SVM.
And for each new document you just need to convert it to vector and pass it to SVM.
This vector model actually quite useful in text classification, and you do not need to use an SVM necessarily. You can use a simpler Bayesian model as well.
And if you are interested, you can find more details in Introduction to Information Retrieval. (Freely available online)
I think the most straightforward way would be to look for the largest block of text without markup. Then, once it's found, figure out the bounds of it and extract it. You'd probably want to exclude certain tags from "not markup" like links and images, depending on what you're targeting. If this will have an interface, maybe include a checkbox list of tags to exclude from the search.
You might also look for the lowest level in the DOM tree and figure out which of those elements is the largest, but that wouldn't work well on poorly written pages, as the dom tree is often broken on such pages. If you end up using this, I'd come up with some way to see if the browser has entered quirks mode before trying it.
You might also try using several of these checks, then coming up with a metric for deciding which is best. For example, still try to use my second option above, but give it's result a lower "rating" if the browser would enter quirks mode normally. Going with this would obviously impact performance.
I think a very effective algorithm for this might be, "Which DIV has the most text in it that contains few links?"
Seldom do ads have more than two or three sentences of text. Look at the right side of this page, for example.
The content area is almost always the area with the greatest width on the page.
I would probably start with Title and anything else in a Head tag, then filter down through heading tags in order (ie h1, h2, h3, etc.)... beyond that, I guess I would go in order, from top to bottom. Depending on how it's styled, it may be a safe bet to assume a page title would have an ID or a unique class.
I would look for sentences with punctuation. Menus, headers, footers etc. usually contains seperate words, but not sentences ending containing commas and ending in period or equivalent punctuation.
You could look for the first and last element containing sentences with punctuation, and take everything in between. Headers are a special case since they usually dont have punctuation either, but you can typically recognize them as Hn elements immediately before sentences.
While this is obviously not the answer, I would assume that the important content is located near the center of the styled page and usually consists of several blocks interrupted by headlines and such. The structure itself may be a give-away in the markup, too.
A diff between articles / posts / threads would be a good filter to find out what content distinguishes a particular page (obviously this would have to be augmented to filter out random crap like ads, "quote of the day"s or banners). The structure of the content may be very similar for multiple pages, so don't rely on structural differences too much.
Instapaper does a good job with this. You might want to check Marco Arment's blog for hints about how he did it.
Today most of the news/blogs websites are using a blogging platform.
So i would create a set of rules by which i would search for content.
By example two of the most popular blogging platforms are wordpress and Google Blogspot.
Wordpress posts are marked by:
<div class="entry">
...
</div>
Blogspot posts are marked by:
<div class="post-body">
...
</div>
If the search by css classes fails you could turn to the other solutions, identifying the biggest chunk of text and so on.
As Readability is not available anymore:
If you're only interested in the outcome, you use Readability's successor Mercury, a web service.
If you're interested in some code how this can be done and prefer JavaScript, then there is Mozilla's Readability.js, which is used for Firefox's Reader View.
If you prefer Java, you can take a look at Crux, which does also pretty good job.
Or if Kotlin is more your language, then you can take a look at Readability4J, a port of above's Readability.js.