Maintaining font style/formatting into a form that doesn't support html/markdown - html

I have looked into the previous postings to do with this area but haven't found any relevant answers as perhaps I am asking the wrong question.
On the popular design site Dribbble, there seem to be interesting formatting changes in profile names that break from the conventions of the site's styling.
Alot of people have been adding special characters (ΔδΓ etc.) that can be achieved by pasting into their profile form and saving changes, yet some users have somehow managed to enter formatted versions of their name, despite the profile form not supporting HTML or Markdown. You can see an example in the images below.
An example of copying the font to Google with maintained formatting
When opening in inspector, it also shows the formatted type
How could this be done in a simple text input form that doesn't support HTML/Markdown?

These are almost certainly Unicode characters, just like these characters that you reference in your question: ΔδΓ.
For example, Unicode's mathematical alphanumeric symbols section includes symbols that look like the ones in your screenshot. Since these are separate Unicode characters there is no need for additional formatting.
Users will need to have a font that supports those characters installed locally to view them.

Related

Store arbitrary characters in Semantic MediaWiki

I'm trying to store some text containing html tags into properties, which doesn't work. I created a form for a property with the data type 'text' and a template. Saving the form writes the text into the template, but it can't get displayed, as it contains illegal characters, as I guess.
What I'm trying to do:
I need a form to enter data, containing html tags and special
characters
I'd like to be able to use a query to find all those pages
and show that text using a template I provide to the ask query.
I also tried to use the free text option, but then I can't retrieve it using the ask query.
What would be the best, or at least a working solution to this?
Thanks a lot
storing text with html tags is a bit tricky in SemanticMediaWiki
The reason is the invention of the StripMarkers UNIQ/QINU by the MediaWiki developers.
When parsing the content of page with html tags in it the parsing is sort of "postponed". This technical detail unfortunately makes it hard for extension developers like the SMW developers to solve the issue of handling such content. Also it makes it hard for lay people to follow the discussion on how to solve the problem
Here are two examples of SMW Issues that are marked as "closed". This state of affairs means that by following the configuration hints in the issue your problem should be solved. If not please ask a question on the SMW issue list or even initiate the reopening of the issues.
https://github.com/SemanticMediaWiki/SemanticMediaWiki/pull/794
https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/3707
On my wiki we ran into this and resolved it by replacing special characters (we had issues with [ ] =, but the same problem happens with to < > tags too) with alternate unicode characters using the regex extension and a template before setting the property with {{#set:}}. If you want to display the formatted text on the wiki directly then call that parameter separately without replacing the unicode characters.
When you want to display the property, you can then run the reverse replacement with regex before displaying your now intact code (using the template result format to allow you to perform the operation on the output of the query).
To switch to special characters you can create this template
{{#regex:{{#regex:{{#regex:{{#regex:{{#regex:{{{1|}}}|/=/|꞊}}|/\[/|[}}|/\]/|]}}|/>/|≽}}|/</|≼}}
And to switch back you can use this as a template
{{#regex:{{#regex:{{#regex:{{#regex:{{#regex:{{{1|}}}|/꞊/|=}}|/[/|[}}|/]/|]}}|/≽/|>}}|/≼/|<}}

What are these strange characters in HTML source?

My friend runs a website and had an e-mail from Google Safesearch informing him he was hosting a phishing page. Turns out his cPanel was bruteforced (weak password) and they uploaded some of the pages onto his server. He told me about it and I wanted to take a look at how sophisticated are.
In many of the files, certain words/portions of text are strange. They display perfectly in a webbrowser, but are jumbled inside the HTML. I was wondering if anyone can tell me what this is?
Examples:
<title>WеlÑоmе tо еВаy: Sign in</title>
<span class="txtbox_title">Раsswоrd</span>
<a class="three" href="#">Fоrgоt yоur
It's also worth noting that there is normal text throughout the page that displays perfectly also.
I assume this is to stop the detection of certain words in the page, but I'm not sure. Any information would be great.
Edit: Originally was tagged as PHP. I realised that it probably shouldn't be so removed it. Be nice, kids.
Edit edit: For clarity, it's a phishing page targetting eBay users.
The examples I posted in the original post are (in order):
eBay: Sign In
Your Password
Forgot your [password]
As such I don't believe it to be any sort of malware, but a method of encrypting text to fight detection in browsers such as Chrome (which I assume detect 'hot' words in their algorithm).
They UTF-8 encoded Cyrillic letters and possibly other characters chosen for their visual similarity to common Latin letters. You are viewing the page in an editor that does not interpret data as UTF-8 but as in Latin 1 encoding.
For example, what you see as “о” is actually two bytes, 0xD0 0xBE. When interpreted as UTF-8 data (which is what browsers do here), they represent “о” U+043E CYRILLIC SMALL LETTER O. It is identical with the common Latin letter “o” in visual appearance (in any font that contains both letters), but coded as a separate character due to belonging to a different writing system. To any program, they are quite distinct characters, unless the program has been separately coded to handle “confusables”.
Such confusion is often intentionally created for various reasons. You are probably right in assuming that here the purpose was “to stop the detection of certain words in the page”. When e.g. “Forgot” is written using Cyrillic o’s (Fоrgоt), normal Find operations will find it when searching for “Forgot”.
My best guess is that there it is a custom type of keylogger. The WеlÑоmе tо еВаywould be parsed by the keylogger to output some data into a database that can be mined later for important information.
My second guess is that it is a means to scare or mess with the person whom owns the site.
My third guess is that the virus was coded by china or some other language and when the code was translated back into utf-8 it resulted in some of the unused characters to output the strange content.
EDIT
My fith guess is the the phishing website was programmatic getting the source code content of the ebay site and parsing it into it's own html file. And ebay has its own countermeasures against such a type of attack by scrambling the letter in the source code.
With this there must be some type of javascript that undoes the effects of the original source code.

How to give tool tips (title) in locale language (UTF-8) in IE and Chrome?

I want to show tool tips (title) in locale specific language by UTF-8 formatted values.
I tried It's working in firefox but not working in IE and Chrome. What I have to do for this problem?
<div title='(some UTF-8 formatted value)'
above code is working perfectly in firefox.
Thanx in advance.
The font(s) used in tooltips depend(s) on the browser, which may or may not use settings made at the operating system level. Thus it may be controllable by the user, though few users know about this. In any case, it is outside the control of the author.
This implies that the repertoire of characters you can use there may vary. A plain square or rectangle in text typically indicates that there is a recognized character but it cannot be displayed because it is not present in the font(s) being used.
Partly for reasons like this, authors are more and more moving towards using other techniques than the title attribute, namely “CSS tooltips” (or maybe “JavaScript tooltips”). This lets you use the same fonts as in the textual content or, if preferred, to set some suitable other fonts.

author html for ms word

my objective is to generate HTML markup to target ms word. So far my findings are, if you have all the styles inline to an element, the document, when opened in word renders properly. However it is lengthy task.
<h1 style="font-family:Arial">Inventory</h1>
This is how I try to achieve formatting. If i want to maintain a constant font across the document, in my HTML, I'd have to add font-family to all the elements like I've done above.
Later, I came across a codeproject article. http://www.codeproject.com/KB/office/Wordyna.aspx Now I am sort of convinced that you can declare the styles globally, but the styling language used and the formatting is not like CSS, and, I think its proprietary to ms word document formatting. I am looking for any tutorials/articles for this styling being used.
ps: I am aware about OpenXML etc, etc. I feel its too complex for me to implement at this point.
Word --should-- open valid (read: not Microsoft's proprietary html-ish mess) without fail as it's the rendering engine for Outlook when you open an HTML email. You could go to the effort to build a document entirely in-line (read: only best practice for Microsoft) as we do for HTML emails, but I suspect there are several different ways to skin this cat.
Personally, if I was trying to get a rich text formatted document from html to Word I'd use a tool such as PHPDocX to build a proper word document natively, then if I really wanted Word HTML I could simply hit save on Word. I've had to do similarly with Excel, where it will accept CSV, but the outcome is always better with XLSX, and there's a similar plugin to easily author a proper XLSX document.
If that's too difficult a route (and it's not that bad, trust me) then I'd stick to formatting following HTML Email rules. Simple guides are all over the web, such as here. And, since Outlook 07-current uses Word's html rendering engine, one could deduce that it has the same limitations listed here

Why do I need Markdown?

Why do I need a Markdown with a front edit editor like WMD? What does the markdown do to the content that’s sent from the WMD editor?
How does Markdown store the content in the backend? Is it the same way like *bold* or in some other format? Why can’t I just do an html encode?
Sorry if I sounded very naïve.

			
				
It's probably helpful to take a step back and ask some of the larger questions. The issue Markdown is trying to solve is that of rich editing in the browser. Consider this: At some point, for any piece of software to enable rich text it has to describe the richness in a some manner, however that may be.
We could call that description of richness (by description of richness I mean like "this bit of text is bold" or "this bit of text is a hyperlink), we could call that description of richness "markup" -- it marks up the text with meta "richness".
Implementations of rich text can take on two approaches, either a.) hide the markup from the user or b.) let them have access to the markup.
For those who choose to hide it, the end result is very often WYSIWYG. The user is oblivious to what is happening behind the scenes. The editor takes care of the details. Think MS Word as an example. No one manipulates the Word markup format as a regular end user.
For implementations which choose to expose the markup, a markup language is then in order to allow users to interacat with it. Such markup languages would be things like HTML doing <tag> or BB code for example, doing things like [tag].
Markdown is one such of these languages.
As opposed to the former types I mentioned, Markdown has tried to design itself so that the markup renders common ASCII people already use. For example, it's common for people to asterisk their text to set it off, *important*, and this notation in Markdown is an indicator of italic.
In regards to storage, as Stephan pointed out, the system will most likely store the raw markdown, because the user will most likely need to have the possibility of editing, and the original markdown can be recalled for that purpose.
In most of the systems I've built, I store the markdown, and then normalize it to a 2nd field which caches the HTML rendering of the markdown. This way I don't have to do markdown->HTML rendering for every markdown field. It takes a little more space, but I'd rather the user have a faster response than use less DB storage space.
Care should also be taken when accepting Markdown from the browser, as it can easily contain <script> tags which need to be filtered out. Most markdown implementations will also recognize HTML intermingled with Markdown formatting, as so to be safe, you need to make sure your inputs and caches are sanitized properly.
The reason for using an alternate encoding system other than HTML is for security
Markdown and other such wiki style encoding systems do not usually support scripting languages
HTML supports scripting languages in many ways (
The two main security issues are:
Malware criminals use scripts in user generated content to attempt malware actions on the content readers computer by scripting to access known security holes
Free loaders using scripts to subvert the rest of the site by changing the content frame or styles i.e. ads, menu's, logos etc. This can also be criminal behaviour if not just annoying
By using an intermediate language such as Markdown you have total control on the rendered output
Filtering HTML is possible, but is also complex and risky
The other significant reason for an alternate encoding system is enforcement of style. Normal HTML has too many options. By limiting the available options, users can only use certain styles. The usually makes for cleaner looking and more readable content (compare SO to Ebay)
The main reason for using Markdown is the readability of a marked text. For instance, you can send it in a plain-text email and the reader will still understand the emphiasis, bullets, the text will be divided in paragraphs et cetera.
When you ask about storing data, it depends. If you enable Markdown in the WordPress blog engine, it stores data as the user has input it - in Markdown. In Stack Overflow, however, it seems like the data is stored as HTML. At least, the "Stack Overflow data dumps" contain HTML, not Markdown (I've seen people complaining) that they have to convert it back).
If you use the WMD editor, you can show the user how the outputs will look like after being converted to HTML. Even though Markdown syntax is really simple, it is not hard to make mistakes. Hence, it is best to show users the output.
Another reason for using Markdown instead of a WYSIWIG control - a WYSIWIG control allows the user to use HTML in data you are displaying on your web page. So, you have to be the one who decides when there is simply incorrect HTML and when it is an evil XSS/CSRF/whatever injection. In Markdown, you simply convert *something* to <b>something</b>, remove any unknow HTML elements and you're done.