Which is better for joining two words in a name, hyphen (-) or underscore (_)? - html

Hi, when I write CSS or HTML, I often need a name made of two words, like this:
web-development
web_development
Which one is better for SEO and as a naming style for class names, file names, or image names?

The first one is better. Also see this post by Google employee Matt Cutts: http://www.mattcutts.com/blog/dashes-vs-underscores/

Use the dash. Google doesn't really treat underscores as word separators. This may be for programmers' sanity, so that when they search for query_function, they get the results they are looking for.
If you have a URL like "http://example.com/web-site", Google will return results for 'web', 'site' and '"web site"'. This is not the case for underscores: web_site will only return results for web_site.
P.S.
I also think that dashes are better than underscores for usability purposes: a dash is a single key on the keyboard, while an underscore requires two keys to be pressed. This has nothing to do with the technical side of SEO, but everything to do with usability, which is more important than SEO, IMO.

For CSS I don't think the naming convention causes any issues, but for naming HTML pages the hyphen (-) is preferred, because search engines treat - as a space. Even so, a good page name alone is not enough for good SEO; you also need proper meta tags and keywords.
And make sure all your images have a proper alt attribute (and a title where useful); this is really essential.

Isn't it common practice to use the - to connect two words, and the _ to replace a space in situations where you can't use a space/+ sign, like CSS classNames?

The first one is better in terms of SEO, because search engines treat a hyphen as a word separator, which is not the case for an underscore.

Please list two (2) words in the English language that use underscores ("_") within them.
Now list fifty (50) words that use dashes/hyphens ("-").
My opinion is that the hyphens would be a better solution for SEO.
IMO, when it comes to SEO, everything makes a difference!

You are dealing with two different problems: URLs and CSS.
For URLs, hyphens would be the better choice because of SEO.
However, depending on your editing program, underscores might work better for multi-word class names. In TextMate, for instance, I can hit Esc to finish (auto-complete) a class I previously entered. It stops completing when it encounters a hyphen, but will fill in the whole class name when you use an underscore. If this is not the case for your editor, then it is really up to your preference.
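For the URL and file-name case, here is a minimal sketch of how multi-word names could be normalized to the hyphenated form recommended above. The function name `hyphenate` is hypothetical, and the rule it applies (lowercase ASCII letters and digits, everything else treated as a separator) is an assumption for illustration, not something the answers prescribe.

```python
import re


def hyphenate(name: str) -> str:
    """Join the words of a name with hyphens, e.g. for a URL or file name."""
    # Treat any run of characters other than ASCII letters/digits
    # (spaces, underscores, punctuation) as a word boundary.
    words = re.findall(r"[a-z0-9]+", name.lower())
    return "-".join(words)


print(hyphenate("Web Development"))   # web-development
print(hyphenate("web_development"))   # web-development
```

Existing underscore names can be passed through the same function, so a site could be migrated to one convention in a single pass.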

Related

Hyphen in custom HTML-tags

I want to use custom tags at my page.
I've read that custom tags must contain a hyphen like this: <my-element></my-element>.
What will happen if I name my tags without a hyphen: <myelement></myelement>? Will it affect search engines, or have other consequences?
I'm sure it is not a requirement in the sense that it "otherwise doesn't work". Maybe that advice is just to ensure you don't override a legacy tag and get puzzled when something works in a way that's not documented.
There's definitely no problem with defining, styling and doing something via js with a tag without hyphens at all and I've practiced it and had no issue. Neither vanilla javascript nor jQuery cares about hyphens in your tags.
As for search engines - it would be very weird if search engines took into account whether a tag contains a hyphen or not. It's like asking "will a search engine penalise my page if my javascript doesn't work correctly", i.e. something that a search engine is not supposed to check at all. The hyphens are just about your coding habits - use them if you like them.

Should the percent symbol (%) always be HTML-escaped?

I know the percent symbol has to be URL-encoded when being passed around, but when I display it in the browser, is it also necessary to escape it like so: &#37;?
In URLs, the percent sign (%) has a special meaning, so it should be escaped. In HTML, it does not, so it is not necessary to escape it.
I agree with the chosen answer, but would like to qualify the statement “it is not necessary to escape it.”
If you have a need (or desire) to escape a percentage sign in HTML code (and there are good reasons to do this with any potentially ambiguous character or symbol), then I would highly recommend using the named entity &percnt; as opposed to any numeric code. (I use numeric codes only when there is no entity name available.)
That was the answer I was looking for when I found this page, because I forgot that it loses the final "e".
We should probably all be using at least the entities kindly listed here. (whoever Webmasterish is; thank you)
Reasoning: Numeric codes (and particularly byte codes from unencoded characters) change with code–pages, on systems using different default languages, and / or different operating systems. (Windows and Mac using slightly different code sets for “English” being the classic, which still plagues plain–text eMail sent between Apple Mail and Outlook) This is slowing down, and should stop with UTF, but I'm still seeing it pop up.
If you're converting HTML to some other mark–up, (note, I used "–" not a "-", or even "−" for the same reason) such as LaTeX, DVI, PostScript or even MarkDown, then it's useful to completely squash any ambiguity… And those processes tend to happen on the information you least expect to be used in such a way when you initially write it. So just get used to doing it everywhere and be grateful to your former self for having had the foresight to do so. Probably years down the line, when you're looking to update formulae to be more readable by utilising MathJax or such, and keep picking up hyphenated words. <swearmarks>
I'd like to add this - if you use javascript in href, you are in trouble too. Check this example:
http://jsfiddle.net/cs4MZ/
One of the workarounds might be using onclick instead of href.
If you're talking about in HTML text, visible to the reader, no. It can't do anything harmful, there.
...if you're talking about inside of HTML attributes, then yes, that would be good to consider.
URLs and HTML are different languages, as weird as that might seem, so they have different weaknesses.
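The difference between the two "languages" can be sketched with Python's standard library: `urllib.parse.quote` percent-encodes for URLs, while `html.escape` escapes for HTML text. The sample strings are made up for illustration.

```python
from html import escape
from urllib.parse import quote

text = "100% <b>off</b>"

# In a URL, "%" introduces an escape sequence, so it must itself be encoded.
print(quote("100%"))    # 100%25

# In HTML text, only characters such as &, < and > are special;
# the percent sign passes through unchanged.
print(escape(text))     # 100% &lt;b&gt;off&lt;/b&gt;
```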

Find spaces in anchor links

We've got a large amount of static HTML that has links like, e.g.:
Link
However some of them contain spaces in the anchor e.g.
Link
Any ideas on what kind of regular expression I'd need to use to find the Spaces after the # and replace them with a - or _
Update: Just need to find them using TextMate, hence no need for a HTML parsing lib.
This regex should do it:
#[a-zA-Z]+\s+[a-zA-Z\s]+
Three Caveats.
First, if you are afraid that the page text itself (and not just the links) might contain information like "#hashtag more words", then you could make the regex more restrictive, like this:
#[a-zA-Z]+\s+[a-zA-Z\s]+\">
Second, if you have hash tags that contain characters beyond A-Z, then just add them in between the second set of brackets. So, if you have '-' as well, you would modify to:
#[a-zA-Z]+\s+[a-zA-Z-\s]+\">
Finally, this assumes that all the links you are trying to match start with a letter/word and are followed by a space, so, in the current form, it would not match "Anchor-tags-galore", but would match "Anchor tags galore."
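If the replacement is ever scripted rather than done in TextMate, the same idea could be sketched in Python as below. This is only an illustration: it assumes the links are written as href="#..." with double quotes, and the function name `hyphenate_anchor_hrefs` is hypothetical.

```python
import re

# Captures the fragment of href="#fragment"; assumes double-quoted attributes.
ANCHOR_HREF = re.compile(r'(href="#)([^"]*)(")')


def hyphenate_anchor_hrefs(html: str) -> str:
    """Replace runs of whitespace inside #fragment hrefs with a hyphen."""
    def fix(match):
        prefix, fragment, suffix = match.groups()
        return prefix + re.sub(r"\s+", "-", fragment.strip()) + suffix

    return ANCHOR_HREF.sub(fix, html)


print(hyphenate_anchor_hrefs('<a href="#Anchor tags galore">Link</a>'))
# <a href="#Anchor-tags-galore">Link</a>
```

Note that the targets of those links (the matching id or name attributes) would need the same renaming, otherwise the links stop working.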
Have you considered using an HTML parsing library like BeautifulSoup? It would make finding all the hrefs much easier!
Here, this regex matches the hash and all the words and spaces in between:
#(\w+\s)+\w+
http://dl.getdropbox.com/u/5912/Jing/2009-08-12_1651.png
When you have some time, you should download "The Regex Coach", which is an awesome tool to develop your own regexes. You get instant feedback and you learn very fast. Plus it comes at no cost!
Visit the homepage

How do you handle translation of text with markup?

I'm developing multi-language support for our web app. We're using Django's helpers around the gettext library. Everything has been surprisingly easy, except for the question of how to handle sentences that include significant HTML markup. Here's a simple example:
Please log in to continue.
Here are the approaches I can think of:
1. Change the link to include the whole sentence. Regardless of whether the change is a good idea in this case, the problem with this solution is that UI becomes dependent on the needs of i18n when the two are ideally independent.
2. Mark the whole string above for translation (formatting included). The translation strings would then also include the HTML directly. The problem with this is that changing the HTML formatting requires changing all the translations.
3. Tightly couple multiple translations, then use string interpolation to combine them. For the example, the phrase "Please %s to continue" and "log in" could be marked separately for translation, then combined. The "log in" is localized, then wrapped in the HREF, then inserted into the translated phrase, which keeps the %s in translation to mark where the link should go. This approach complicates the code and breaks the independence of translation strings.
Are there any other options? How have others solved this problem?
Solution 2 is what you want. Send them the whole sentence, with the HTML markup embedded.
Reasons:
The predominant translation tool, Trados, can preserve the markup from inadvertent corruption by a translator.
Trados can also auto-translate text that it has seen before, even if the content of the tags has changed (but the number of tags and their position in the sentence are the same). At the very least, the translator will give you a good discount.
Styling is locale-specific. In some cases, bold will be inappropriate in Chinese or Japanese, and italics are less commonly used in East Asian languages, for example. The translator should have the freedom to either keep or remove the styles.
Word order is language-specific. If you were to segment the above sentence into fragments, it might work for English and French, but in Chinese or Japanese the word order would not be correct when you concatenate. For this reason, it is best i18n practice to externalize entire sentences, not sentence fragments.
2, with a potential twist.
You certainly could localize the whole string, like:
loginLink=Please log in to continue
However, depending on your tooling and your localization group, they might prefer for you to do something like:
// tokens in this string add html links
loginLink=Please {0}log in{1} to continue
That would be my preferred method. You could use a different substitution pattern if you have localization tooling that ignores certain characters. E.g.
loginLink=Please %startlink%log in%endlink% to continue
Then perform the substitution in your jsp, servlet, or equivalent for whatever language you're using ...
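As a rough sketch of that substitution step in Python (the question's app uses Django, but this uses plain `str.format` rather than any Django API): the `TRANSLATIONS` dictionary, the locale codes, and `render_login_prompt` are all hypothetical names for illustration. The Norwegian string shows that a translator can move the placeholders freely.

```python
from html import escape

# Hypothetical message catalogue: {0} and {1} mark where the link starts and ends.
TRANSLATIONS = {
    "en": "Please {0}log in{1} to continue",
    "nb": "{0}Logg inn{1} for å fortsette",
}


def render_login_prompt(locale: str, login_url: str) -> str:
    """Insert the link markup into the translated sentence."""
    # Escape the translated text first; braces are untouched by escape().
    template = escape(TRANSLATIONS[locale])
    open_tag = '<a href="{}">'.format(escape(login_url, quote=True))
    return template.format(open_tag, "</a>")


print(render_login_prompt("en", "/login"))
# Please <a href="/login">log in</a> to continue
```

The URL and the tag markup stay in code, so changing either does not touch the translated strings.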
Disclaimer: I am not experienced in internationalization of software myself.
I don't think this would be good in any case - just introduces too much coupling …
As long as you keep formatting sparse in the parts which need to be translated, this could be okay. Giving translators the possibility to give special words importance (by either making them a link or perhaps using <strong /> emphasis) sounds like a good idea. However, those translations with (X)HTML possibly cannot be used anywhere else easily.
This sounds like unnecessary work to me …
If it were me, I think I would go with the second approach, but I would put the URI into a formatting parameter, so that this can be changed without having to change all those translations.
Please log in to continue.
You should keep in mind that you may need to teach your translators a basic knowledge of (X)HTML if you go with this approach, so that they do not screw up your markup and so that they know what to expect from that text they write. Anyhow, this additional knowledge might lead to a better semantic markup, because, as mentioned above, texts could be translated and annotated with (X)HTML to reflect local writing style.
Whatever you do, keep the whole sentence as one string. You need to understand the whole sentence to translate it correctly.
Not all words should be translated in all languages: e.g. in Norwegian one doesn't use "please" (we can say "vær så snill", literally "be so kind", but when used as a command it sounds too forceful), so the correct Norwegian would be:
"Logg inn for å fortsette" lit.: "Log in to continue" or
"Fortsett ved å logge inn" lit.: "Continue by to log in" etc.
You must allow completely changing the order, e.g. in a fictional demo language:
"Für kontinuer Loggen bitte ins" (if it was real) lit.: "To continue log please in"
Some language may even have one single word for (most of) this sentence too...
I'll recommend solution 1 or possibly "Please %{startlink}log in%{endlink} to continue" this way the translator can make the whole sentence a link if that's more natural, and it can be completely restructured.
Interesting question, I'll be having this problem very soon. I think I'll go for 2, without any kind of tricky stuff. HTML markup is simple, URLs won't move anytime soon, and if anything is changed a new entry will be created in django.po, so we get a chance to review the translation (e.g. a script should check for empty translations after makemessages).
So, in template :
{% load i18n %}
{% trans 'hello world' %}
... then, after python manage.py makemessages I get in my django.po
#: templates/out.html:3
msgid "hello world"
msgstr ""
I change it to my needs
#: templates/out.html:3
msgid "hello world"
msgstr "bonjour monde"
... and in the simple yet frequent cases I'll encounter, it won't be worth any further trouble. The other solutions here seem quite smart, but I don't think the solution to markup problems is more markup. Plus, I want to avoid too much confusing stuff inside templates.
Your templates should be quite stable after a while, I guess, but I don't know what other trouble you expect. If the content changes over and over, perhaps that content's place is not inside the template but inside a model.
Edit: I just checked it out in the documentation, if you ever need variables inside a translation, there is blocktrans.
Makes no sense, how would you translate "log in"?
I don't think many translators have experience with HTML (the regular non-HTML-aware translators would be cheaper)
I would go with option 3, or use "Please %slog in%s to continue" and replace the %s with parts of the link.

A regular expression to remove a given (x)HTML tag from a string

Let's say I have a string holding a mess of text and (x)HTML tags. I want to remove all instances of a given tag (and any attributes of that tag), leaving all other tags and text alone. What's the best Regex to get this done?
Edited to add: Oh, I appreciate that using a Regex for this particular issue is not the best solution. However, for the sake of discussion can we assume that that particular technical decision was made a few levels over my pay grade? ;)
Attempting to parse HTML with regular expressions is generally an extremely bad idea. Use a parser instead, there should be one available for your chosen language.
You might be able to get away with something like this:
</?tag[^>]*?>
But it depends on exactly what you're doing. For example, that won't remove the tag's content, and it may leave your HTML in an invalid state, depending on which tag you're trying to remove. It also copes badly with invalid HTML (and there's a lot of that about).
Use a parser instead :)
I think there is some serious anti-regex bigotry happening here. There are lots of times when you may want to strip a particular tag out of some markup when it doesn't make sense to use a full blown parser.
Of course there are times when a parser might be the best option, but if you are looking for a regex then:
<script[^>]*?>[\s\S]*?<\/script>
That would remove script tags and their contents. Make sure that you use case-insensitive matching.
If you don't want to remove the contents of the tag then you can use:
<\/?script[^>]*?>
An example of usage in javascript would be:
// Remove <script> elements together with their contents (case-insensitive).
function stripScripts(markup) {
  return markup.replace(/<script[^>]*?>[\s\S]*?<\/script>/gi, '');
}
var safeText = stripScripts(textarea.value);
I think it's Jamie Zawinski that I'm paraphrasing (badly!) here... But, you want a Regular Expression? "Now you have two problems" ... :=)
If the string is well-formed (X)HTML, could you load it up into a parser (HTML/XML) and use this to remove any nodes of the offending variety? If it's not well-formed, then it becomes a bit more tricky, but, I suspect that a RegEx isn't the best way to go about this...
There are just TOO many ways a single tag can appear, not to mention encodings, variants, etc.
I strongly suggest you rethink this approach... you really shouldn't have to be handling HTML directly, anyway.
Off the top of my head, I'd say this will get you started in the right direction.
s/<TAG[^>]*>([^<]*)<\/TAG[^>]*>/\1/
Basically find the starting tag, any text in between the tags, and then the ending tag. Replace the whole thing with whatever was in between the tags.
Corrected answer:
</?TAG\b[^>]*?>
Because Dan's answer would remove <br /> as well, when you only want to remove <b>.
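To illustrate why the \b matters, here is a small Python sketch built on the corrected pattern. The helper name `remove_tag` is hypothetical, and like the regex itself it only removes the tags, not their contents, and does not try to cope with malformed markup.

```python
import re


def remove_tag(markup: str, tag: str) -> str:
    """Strip opening and closing instances of `tag`, keeping the inner text."""
    # \b stops "b" from also matching "br", "body", and so on.
    pattern = re.compile(r"</?" + re.escape(tag) + r"\b[^>]*?>", re.IGNORECASE)
    return pattern.sub("", markup)


print(remove_tag('<b class="x">bold</b><br />text', "b"))
# bold<br />text
```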
Here's a regex I wrote for this purpose, it works in a few more situations:
</?(?(?=b|img|a|script)notag|[a-zA-Z0-9]+)(?:\s[a-zA-Z0-9\-]+=?(?:(["",']?).*?\1?)?)*\s*/?>
While using regexes for parsing HTML is generally frowned upon or looked down on, you almost certainly don't want to write your own parser.
You could however use some inbuilt or library functions to achieve what you need.
JavaScript has getElementsByTagName and getElementById, not to mention jQuery.
PHP has the DOM extension.
Python has the awesome Beautiful Soup.
...and many more.
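As a sketch of the parser route without third-party dependencies, Python's standard `html.parser.HTMLParser` can re-emit everything except the unwanted tag. The class and function names (`TagStripper`, `strip_tag`) are hypothetical, and this is a simplified illustration: it keeps the tag's contents and does not handle doctype declarations or processing instructions.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Re-emit parsed markup, skipping every instance of one tag."""

    def __init__(self, tag):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.tag = tag.lower()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag != self.tag:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != self.tag:
            self.parts.append("</{}>".format(tag))

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&{};".format(name))

    def handle_charref(self, name):
        self.parts.append("&#{};".format(name))

    def handle_comment(self, data):
        self.parts.append("<!--{}-->".format(data))


def strip_tag(markup, tag):
    parser = TagStripper(tag)
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


print(strip_tag('<p>Some <b class="x">bold</b> text<br /></p>', "b"))
# <p>Some bold text<br /></p>
```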