Recently I've done some careless copying and pasting into my html documents. Because the document type is set to Strict, the quotation marks and apostrophes show up as this crazy symbol: �.
Example: Brad says, �Don�t rock the boat baby.�
I considered changing the document type from Strict (which could turn out to be the easiest thing to do) but I'm not sure if changing from Strict would have any negative repercussions.
Naturally, I need to get rid of them. The problem is that I need to replace A LOT of them from a lot of different documents. I'd use the replace feature on Textpad, but it doesn't recognize � , so I can't change it. I've been reduced to going through all of the code and doing the tedious replacing.
Does anyone know of a good way to replace these things? It could be some other software, or anything else really.
I always use textmate on the Mac to replace them because it does recognize those characters. Try notepad++ and a bunch of different text editors and see what you come up with.
I remember dealing with this when I first started out and my secretary got reprimanded for such a crime.
Related
I know the percent symbol has to be URL-encoded when being passed around, but when I display it in the browser, is it also necessary to escape it like so: %?
In URLs, the percent sign (%) has a special meaning, so it should be escaped. In HTML, it does not, so it is not necessary to escape it.
I agree with the chosen answer, but would like to qualify the statement “it is not necessary to escape it.”
If you have a need (or desire) to escape a percentage sign in HTML code, (and there are good reasons to do this with any potentially ambiguous character or symbol) then I would highly recommend using the percentage entity code % as opposed to any numeric code. (those I use when there is no entity name you could use)
That was the answer I was looking for when I found this page, because I forgot it looses the final "e".
We should probably all be using at least the entities kindly listed here. (whoever Webmasterish is; thank you)
Reasoning: Numeric codes (and particularly byte codes from unencoded characters) change with code–pages, on systems using different default languages, and / or different operating systems. (Windows and Mac using slightly different code sets for “English” being the classic, which still plagues plain–text eMail sent between Apple Mail and Outlook) This is slowing down, and should stop with UTF, but I'm still seeing it pop up.
If you're converting HTML to some other mark–up, (note, I used "–" not a "-", or even "−" for the same reason) such as LaTeX, DVI, PostScript or even MarkDown, then it's useful to completely squash any ambiguity… And those processes tend to happen on the information you least expect to be used in such a way when you initially write it. So just get used to doing it everywhere and be grateful to your former self for having had the foresight to do so. Probably years down the line, when you're looking to update formulae to be more readable by utilising MathJax or such, and keep picking up hyphenated words. <swearmarks>
I'd like to add this - if you use javascript in href, you are in troubles too. Check this example:
http://jsfiddle.net/cs4MZ/
One of the workarounds might be using onclick instead of href.
If you're talking about in HTML text, visible to the reader, no. It can't do anything harmful, there.
...if you're talking about inside of HTML attributes, then yes, that would be good to consider.
URLs and HTML are different languages, as weird as that might seem, so they have different weaknesses.
I have a flash movie that has a text field where the user can enter their desired text. It has a counterpart text field that displays the user-entered text nearby. There's also a dropdown menu where the user can change the font in the display area (fonts are included in the library).
This all works fine. We noticed though that one of the fonts, English 111 Vivace BT, has smart (curly) quotes available. But as the user types, they always get straight quotes instead of curly ones. The straight quotes clearly do not match the font.
Is there a way to tell flash to use smart quotes as the default, rather than straight ones? I know users can manually force it to use curly quotes by using Alt-0146 for example, but I don't expect them to know that and even if they do it shouldn't be required.
I'm thinking I might be able to catch all quotes and encode them myself behind the scenes in AS3, so if they enter a single quote I replace it with the curly quote. But that sounds like a PITA, I'm hoping there's a setting somewhere instead. It seems like other Adobe programs do have a setting for typographic quotes, but I can't find that option in flash.
This sounds more like the person creating the font either made a mistake or was using some unconventional keyboard layout. Or, possibly, they were hoping for some particular text editor to replace the quotes in a special way that they were used to.
Some such editors may be MS Word (as far as I know it may "correct" quotes as you type) or Emacs (where there's plenty of input methods, possibly, there's something that automatically adjusts quotes).
In order to do something similar to MS Word or Emacs input method you'd have to do it yourself, there's no standard or conventional way on typing quotes (if I understand you correctly, those quotes have two variants, to open and close the quotation, is that correct?).
Also note that depending on the language you type in, the understanding of typographically correct quotes varies. For example Russian and German use «,», but in German the way to write it is: »quotation«, while in Russian it's «quotation». American typical way of using quotes is this: “quotation”, but sometimes, when the font has the ‚,‘ quotes they may be used, when inside another quotation, while in other languages the second pair of quotes is used for quotations inside quotations. There's also a variation of "typical American", which I think is typically British, where the opening quote is placed on the font baseline, rather than aligned with the capital letters, but I cannot find that symbol at the moment.
The whole above paragraph was written to illustrate that you don't have a set rule for placing quotes and you probably have to research what it would be most likely to be used by your audience.
(You would need to open my answer for editing to see what quote characters are used, SO replaces them with something else)
I'm wondering if it's a bad idea to use weird characters in my code. I recently tried using them to create little dots to indicate which slide you're on and to change slides easily:
There are tons of these types of characters, and it seems like they could be used in place of icons/images in many cases, they are style-able and scale-able, and screen readers would be able to make sense of them.
But, I don't see anyone doing this, and I've got a feeling this is a bad idea, I just can't decide why. I guess it seems too easy to be true. Could someone tell me why this is or isn't okay? Here are some more examples of the characters i'm talking about:
↖ ↗ ↙ ↘ ㊣ ◎ ○ ● ⊕ ⊙ ○ △ ▲ ☆ ★ ◇ ◆ ■ □ ▽ ▼ § ¥ 〒 ¢ £ ※ ♀ ♂ &⁂ ℡ ↂ░ ▣ ▤ ▥ ▦ ▧ ✐✌✍✡✓✔✕✖ ♂ ♀ ♥ ♡ ☜ ☞ ☎ ☏ ⊙ ◎ ☺ ☻ ► ◄ ▧ ▨ ♨ ◐ ◑ ↔ ↕ ♥ ♡ ▪ ▫ ☼ ♦ ▀ ▄ █ ▌ ▐ ░ ▒ ▬ ♦ ◊
PS: I would also welcome general information about these characters, what they're called and stuff (ASCII, Unicode)?
There are three things to deal with:
1. As characters in a sentence/text:
The problem is that some fonts simply do not have them. However since CSS can control font use you probably will not run into this problem. As long as you use a web safe font, and know that that character is available in that font, you should probably be okay.
You can also use an embedded font, though be sure to fall back on a web safe font that contains the character you need as many browser will not support embedded fonts.
However sometimes certain devices will not have multiple fonts to choose from. If that font does not support your character you will run into problems. However depending on what your site does and the audience you are targeting this may not be a problem for you. Not to mention that devices like that are very old, and uncommon.
All in all it was probably not a good idea a handful of years ago, but now you are not likely to have problems as long as you cover all your bases.
It is important however to point out that you should never hard code those characters, instead use HTML entities. Just inserting those characters into your code can lead to unpredictable results. I recently copied some text from Word directly into my code, Word used smart quotes (quote marks that curve inwards properly). They showed up fine in Notepad++, but when I viewed the page I did not get quotes, I got some weird symbol.
I could have either replaced them with normal quotes " or with HTML entities to keep the style “ and ” (“ and ”).
Any Unicode character can be inserted this way (even those without special names).
Wikipedia has a good reference:
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
2. As UI elements:
While it may be safe to use them in many cases, it is still better to use HTML elements where possible. You could simply style some div elements to be round and filled/not filled for your example.
As far as design goes they are really limiting, finding one that fits with the style of your page can be a hassle, and may mean that you will definitely need to embed a font, which is still only supported by the latest browsers.
Plus many devices do not support heavy font manipulation, and will often display them poorly. It works in the flow of your text, but as a vital part of the UI there can be major problems. Any possible issue one of those characters can bring will be multiplied by the fact that it is part of your UI.
From an artistic stand point they simply limit your abilities too much.
3. What are you doing?
Finaly you need to consider this:
Text is for telling
Image is for showing
HTML is for organizing
CSS is for making things look good while you show them
JavaScript is for functionality
Those characters are text, they are for telling someone something. So ask the question: "What am I doing?" and then use what was designed for that task. If you are telling use them, if you are showing use Image, or CSS.
I've seen this done before (the stars) and I think it's an awesome idea! It's also becoming quite popular to use a font (with #font-face) full of icons, like this one: http://fortawesome.github.com/Font-Awesome/
I can't see any downside to using a font like "font awesome" (only the upsides you mention like scalabilty and the ability to change color with CSS). Perhaps there's a downside to using the special characters you mention but none that I know of.
The problem with using those characters is that not all of them are available in all fonts used by all users, which means your application may look strange, or in the worst case be unusable. That said, it is becoming more common to assume the characters available in certain common fonts (Apple/Microsoft's Arial, Bitstream Vera). You can't even assume that you can download a font, as some users may capture content for offline reading with a service like Instapaper or Read It Later.
There are a number of problems:
Portability: using anything other than the 7-bit ASCII characters in code can make your code less portable, as recipients may use the wrong encoding. You can do a lot to mitigate this (eg. use UTF16 or at least UTF-8 encoded files). Most languages allow you to specify strings in characters using some form of escape notation (eg. "\u1234" in C#), which will avoid the problem, but loses some of the advantages.
Font-dependency: user interface elements that depend on special characters being available in a font may be harder to internationalize, since those glyphs might not be in the font that you want/need to use for a particular audience.
No color, limited choice of art: while font glyphs might seem useful to a coder, they probably look pretty poor to a UI designer.
The question is very broad; it could be split to literally thousands of questions of the type “why shouldn’t I use character ... in HTML documents?” This seems to be what the question is about—not really about code. And it’s about characters, seen as “weird” or “uncommon” or “special” from some perspective, not about character encodings. (None of the characters mentioned are encoded in ASCII. Some are encoded in ISO-8895-1. All are encoded in Unicode.)
The characters are used in HTML documents. There is no general reason against not using them, but loads of specific reasons why some specific characters might not be the best approach in a specific situation.
For example, the “little dots” you mention in your example (probably not dots at all but circles or bullets), when used as control elements as you describe, would mean poor usability and poor accessibility. Making them significantly larger would improve the situation, but this more or less proves that such text characters are not suitable for controls.
Screen readers could make sense of special characters if they used a database of various properties of characters. Well, they don’t, and they often fail to read properly even the most common special characters. Just reading the Unicode name of a character can be cryptic or outright misleading. The proper reading would generally depend on meaning and context.
The main issue, however, is that people do not generally recognize characters in the meanings that you would assign to them. How many people know what the circled plus symbol “⊕” stands for? Maybe 1 out of 1,000, optimistically thinking. It might be all right to use in on a page about advanced mathematics or physics, especially if the notation is defined there. But used in general text, it would be just… a weird character, and people would read different meanings into it, or just get puzzled.
So using special characters just because they look cool isn’t a good idea. Even when there is time and place for a special character, there are technical issues with them. How many fonts do you expect to contain “⊕”? How many of those fonts do you expect Joe Q. Public to have in his computer? In this specific case, you would find the font coverage reasonably good, but you would still have to analyze it and write a longish list of font names in your CSS code to cover most platforms. In the pile of poo case (♨), it would be unrealistic to expect most people to see anything but a symbol for unrepresentable character. Regarding the methods of finding out such things, check out my Guide to using special characters in HTML.
I've run into problems using unusual characters: the tools editor, compiler, interpreter etc.) often complain and report errors. In the end, it wasn't worth the hassle. Darn western hegemony, or homogeneity, or, well, something!
I am reading in HTML from a file and displaying it on a web page:
When I look at in the source I see:
The Club’s summer junior programs
but it shows up as:
The Club�s summer junior program
What is happening here and why the � is showing up?
Did you set the proper encoding of the html page?
Read here and here.
I'm guessing you (or someone close to you) is copy/pasting from Word and you are seeing the webby effects of word's [not so] smart quotes. The work around is to set the character encoding to utf-8 or windows-1252.
This is definitely a character encoding issue. It means the page says it has X encoding, but actually it has Y.
A very interesting read by Joel: http://www.joelonsoftware.com/articles/Unicode.html about this topic, definitively a must read if you didn't already read this.
It explains pretty well why these problems occur, how they came to be and how to avoid it :).
May be you have copied text from a work editor, like MS Word, which changes quotes to open quotes and closed quotes characters. When such a text is copied to a text file, it gives these problems.
A simple solution can be to type these quotes again in the text editor.
I would like to print some kind of ASCII "art" on a web page in pre-tags. These graphics use DOS characters to show a map like old maze games did. I didn't find anything in the HTML special character reference. Is there a way to use these characters in HTML ?
Thanks in advance.
With the right Unicode characters, the old character encodings shouldn't make much odds. The tricky bit may be converting existing ASCII art into Unicode - at which point you need to know the original encoding.
The relevant code charts will be listed on the Unicode "symbols" charts page. In particular, I suspect you'll find the box drawing and block elements charts useful.
You'll need to make sure that your page uses a font which contains the right characters, of course...
As an example, you can render this:
┌┐
└┘
With:
<pre>┌┐
└┘</pre>
Not quite a proper box, but getting there...
You can send them in the <pre> tags, although in XHTML you'll need to encapsulate it in <![CDATA[[]> I think. Be careful though, not all encodings render this correctly. For example, a lot of ASCII art designed for DOS code page 430 (US) fails over here in the UK (830). Eastern Europe suffers especially.
I think the best approach here would be to render images.
EDIT: Oh. You could try , but I'm not sure if that would work.