Matching backticks/grave accents for ES6 template literals should appear immediately in VS Code - ecmascript-6

When I type an apostrophe / single quote in VS Code, it automatically adds a second one and puts the cursor between the two. I want the same behaviour when I type a backtick (grave accent) `. By default, nothing appears at first, so that the key can be combined with the next letter to create special characters such as è. That might be useful in other applications, but it is definitely not needed in VS Code. Is there a way to fix that?
It's mostly an aesthetics thing, but I find it distracting when writing code.

Related

How to properly display Hebrew in text widget?

I'm using Manjaro Linux KDE and the most recent versions of Tcl and Tk, and am attempting to display Hebrew in a text widget. In testing, the Hebrew text was pasted into the Tcl script in the Kate text editor and appears in the correct order, right to left with compound characters.
Without using a specific font in Tcl/Tk, the text prints from left to right and separates the components of compound characters, such that the vowel points and cantillation marks appear as separate characters. After switching to the SBL Hebrew font, the words look better, but the vowel points are not positioned properly and the text is still written from left to right. I tried using the \u200f and \u200e marks but it made no difference; I really don't know what I'm doing there and simply tried prefixing and suffixing them to the Hebrew word. Reversing the string helps, but the vowel points are still not combined with the consonants.
I'm not using Tkinter but this older SO post seems to indicate that it is a Linux issue with Tcl.
If I extract Hebrew from SQLite using Tcl and write it to the command line using puts, it displays correctly. Also, if I copy the reversed text from the Tk text widget and paste it in this SO question, it is displayed in the correct order. To clarify, by reversed here, I don't mean using string reverse but simply that it appears reversed in Tk but when pasted in this SO box, it displays correctly.
Would you please tell me what I'm doing wrong and how to get it to display properly?
I tried to follow this document on internationalization in Tcl and encoding, but I don't follow how it affects displaying Hebrew in a text widget. I also came across a web site that has code for a Unicode editor that displays several languages, including Hebrew, but I can't follow that code either. I tried running the code and, if I select the Hebrew language, it writes right to left, but I don't see vowel points or cantillation marks; then again, I don't know much about typing the Hebrew language.
Thank you.
.tw tag configure heb -font {"SBL Hebrew" 18 normal}
.tw insert end "בְּרֵאשִׁ֖ית" "heb"
# Also tried "בְּרֵאשִׁ֖ית\u200f" and "\u200fבְּרֵאשִׁ֖ית".
# and "בְּרֵאשִׁ֖ית\u200e" and "\u200eבְּרֵאשִׁ֖ית".
# Tried .t insert end [string reverse $h] "heb", which orders the
# consonants, but the vowel points and cantillation marks are not correct.
This is the correct rendering.
This is from Tk. The first is in normal order and the second using string reverse. It can be observed that the vowel points are not "on" the consonants and the cantillation marks are not correct. I know little about Hebrew but I can tell they don't match and appear to be printed as separate characters instead of combined. I think what looks like a "t" under the Hebrew letter that looks similar to a "W" is two characters on top of each other-- a dot and the symbol sort of similar to a left parenthesis in the correct rendering.
I don't know why but after rebooting and installing the next batch of updates, not that they have anything to do with Tk, the rendering is different when a font is not set. However, once the SBL Hebrew font is set, then the characters are separated as displayed above.
I can tell you now that the text renders very close to correctly with Tk on macOS (I'm not sure how much is just font differences, and there's a bit of clipping of the descender decorations that I don't like, but I don't think that's Tk itself doing the wrong thing).
That means that it's definitely a rendering bug that you're seeing. I suspect it might relate to the size of chunks of characters fed into the renderer; if the low levels of the renderer are only being given a character at a time, then they've got no chance to get the overall placement correct or to apply any character combining. I'm guessing that the real issue is that TkpDrawCharsInContext() just calls Tk_DrawChars(), if my reading of the comments is right. (By contrast, the macOS renderer does something different here.)
I don't have a workaround.

Custom google font error in HTML

I have a blog where I use custom fonts from Google Fonts in each and every text of the <body> element, but whenever there is an inverted comma or a double inverted comma in my text, it is not shown as it should be - it is replaced by an unknown character.
I had even looked into the font and there is the character support for the inverted commas.
I don't think this has anything to do with your font.
If you look at the source code you will see the characters already are broken there:
This rather is a problem of your encoding. Your site is UTF-8, but the characters seem to be non-UTF-8. You either need to use UTF-8 characters or change the encoding of your site. (1st option is preferable)
If you change the site encoding to Windows-1252 (which is automatically suggested by Chrome based on the content) everything seems fine:
The question is how did you create this text? Maybe in Word and then copy and pasted? Or is your blog backend not UTF-8?
Also note there are two different characters: ’ vs ´.
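The mismatch described above can be reproduced in a few lines. This is only a minimal sketch in Python, assuming the text was saved as Windows-1252 and then served as UTF-8; the sample string is made up for illustration.

```python
# A curly apostrophe (U+2019) saved as Windows-1252 is the single byte 0x92.
text = "don\u2019t"
raw = text.encode("cp1252")

# Read back as UTF-8, the byte 0x92 is not a valid sequence on its own,
# so the decoder substitutes U+FFFD, the "unknown character" glyph.
as_utf8 = raw.decode("utf-8", errors="replace")

# Read back as Windows-1252, the apostrophe survives.
as_cp1252 = raw.decode("cp1252")

# Saving the same text as UTF-8 stores the apostrophe as three bytes instead.
fixed = text.encode("utf-8")
```

If the bytes stored for the page look like `raw` rather than `fixed`, the text needs to be re-saved (or re-entered) as UTF-8.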
It's a special character. Please check below example
if you want to write "Don't" then you have to use "don’t"
if you want to write in double quote "highly sought users" then you have to use “highly sought users”
I hope this will help you.
Usually the special characters appear when you copy the text from other sources like MS Word. This can be solved by manually typing the inverted commas when entering or modifying the text in the database.

Use smart quotes in dynamic text field in flash CS4

I have a flash movie that has a text field where the user can enter their desired text. It has a counterpart text field that displays the user-entered text nearby. There's also a dropdown menu where the user can change the font in the display area (fonts are included in the library).
This all works fine. We noticed though that one of the fonts, English 111 Vivace BT, has smart (curly) quotes available. But as the user types, they always get straight quotes instead of curly ones. The straight quotes clearly do not match the font.
Is there a way to tell flash to use smart quotes as the default, rather than straight ones? I know users can manually force it to use curly quotes by using Alt-0146 for example, but I don't expect them to know that and even if they do it shouldn't be required.
I'm thinking I might be able to catch all quotes and encode them myself behind the scenes in AS3, so if they enter a single quote I replace it with the curly quote. But that sounds like a PITA, I'm hoping there's a setting somewhere instead. It seems like other Adobe programs do have a setting for typographic quotes, but I can't find that option in flash.
This sounds more like the person creating the font either made a mistake or was using some unconventional keyboard layout. Or, possibly, they were hoping for some particular text editor to replace the quotes in a special way that they were used to.
Some such editors may be MS Word (as far as I know it may "correct" quotes as you type) or Emacs (where there's plenty of input methods, possibly, there's something that automatically adjusts quotes).
In order to do something similar to MS Word or an Emacs input method you'd have to do it yourself; there's no standard or conventional way of typing quotes (if I understand you correctly, those quotes have two variants, one to open and one to close the quotation, is that correct?).
Also note that what counts as typographically correct quotes varies with the language you type in. For example, Russian and German both use « and », but in German the way to write it is »quotation«, while in Russian it's «quotation». The typical American way of using quotes is “quotation”, but sometimes, when the font has the ‚ and ‘ quotes, they may be used inside another quotation, while in other languages the second pair of quotes is used for quotations inside quotations. There's also a variation of "typical American", which I think is typically British, where the opening quote is placed on the font baseline rather than aligned with the capital letters, but I cannot find that symbol at the moment.
The whole paragraph above was written to illustrate that there is no set rule for placing quotes, and you probably have to research what is most likely to be used by your audience.
(You would need to open my answer for editing to see what quote characters are used, SO replaces them with something else)

HTML Escaping - Reg expressions?

I'd like to HTML escape a specific phrase automatically and logically that is currently a statement with words highlighted with quotation marks. Within the statement, quotation or inch marks could also be used to describe a distance.
The phrase could be:
Paul said "It missed us by about a foot". In fact it was only about 9".
To escape this phrase it should really be
<pre>Paul said &ldquo;It missed us by about a foot&rdquo;.
In fact it was only about 9&Prime;.</pre>
Which gives
<pre>Paul said “It missed us by about a foot”.
In fact it was only about 9″.</pre>
I can't think of a sample phrase that would also need a &quot; escape, but that could be there too!
I'm looking for some help on how to identify which of the escape values to replace " characters with at runtime. The phrase was just an example and it could be anything but should be correctly formed i.e. an opening and closing quote would be present if we are to correctly escape the text.
Would I use a regular expression to find a quoted phrase in the text, i.e. two " characters before a full stop, and then replace the first with
&ldquo;
and the second with
&rdquo;
If I found a single " I would replace it with
&quot;
unless it came after a number, in which case I would replace it with
&Prime;
How would I deal with multiple quotes within a sentence?
"It just missed" Paul said "by a foot".
This would really stump me.....
<pre>"It just missed" Paul said "by 9" almost".</pre>
The above should read when escaped correctly. (I'm showing the actual characters this time)
“It just missed” Paul said “by 9″ almost”.
Obviously an edge case but I wondered if it's possible to escape this at runtime without an understanding of the content? If not help on the more obvious phrases would be appreciated.
I would do this in two passes:
The first pass searches for any "s which are immediately preceded by numbers and does that replacement:
s/([0-9])"/\1″/g
Depending on the text you're dealing with, you may want/need to extend this regex to also recognize numbers that are spelled out as words; I've only checked for digits for the sake of simplicity.
With all of those taken care of, a second pass can then easily convert pairs of "s as you've described:
s/"([^"]*)"/“\1”/g
Note the use of [^"]* rather than .* - we want to find two sets of double-quotes with any number of non-double-quote characters between them. By adding that restriction, there won't be any problems handling strings with multiple quoted sections. (This could also be accomplished using the non-greedy .*?, but a negated character class more clearly states your intent and, in most regex implementations, is more efficient.)
A stray, mismatched " somewhere in the string, or an inch marker which is missed by the first pass, can still cause problems, of course, but there's no way to avoid that possibility without implementing understanding of the content.
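The two passes above can be tried out as follows. This is a minimal sketch in Python rather than sed; the function name `escape_quotes` and the last sample sentence are assumptions added for illustration, and the replacement characters are written literally instead of as HTML entities.

```python
import re

LDQUO, RDQUO, PRIME = "\u201c", "\u201d", "\u2033"

def escape_quotes(text: str) -> str:
    # Pass 1: a straight quote directly after a digit is taken to be an inch mark.
    text = re.sub(r'([0-9])"', r"\1" + PRIME, text)
    # Pass 2: the remaining straight quotes are converted pair by pair.
    return re.sub(r'"([^"]*)"', LDQUO + r"\1" + RDQUO, text)

simple = escape_quotes('Paul said "It missed us by about a foot". In fact it was only about 9".')
edge = escape_quotes('"It just missed" Paul said "by 9" almost".')

# Limitation: a quoted number is mistaken for a measurement in pass 1,
# which leaves the opening quote without a partner in pass 2.
broken = escape_quotes('She dialed "9" twice.')
```

The last call shows the kind of input where a digit followed by a quote is not an inch mark, so the result still contains a stray straight quote.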
what you've described is basically a hidden markov model,
http://en.wikipedia.org/wiki/Hidden_Markov_model
you have a set of input symbols (your original text and ambiguous punctuation), and a set of output symbols (original text and more fine-grained punctuation) but no good way of really observing the connection between the two in a programmatic way. you could write some rules to cover some of the edge cases, but that will basically never work for the multiple quotes situation. in this case you can't really use a regex for the same reason, but with an HMM and a bunch of training text you could probably make some pretty good guesses.
sorry that's probably not very helpful if you're trying to get something ready for deployment, but the input has greater ambiguity than the output, so your only option is to consider the context, and that basically means either a very lengthy set of rules, or some kind of machine learning approach.
interesting question though - it would be neat to see what kind of performance you could get. maybe someone's already written a paper on it?
"I wondered if it's possible to escape this at runtime without an understanding of the content?"
Considering that you're adding semantic meaning to the punctuation which is currently encoded in the other text... no, not really.
Regular expressions would be the easiest tool for at least part of it. I'd suggest looking for /\d+"/ for the inch number cases. But for quotes delimiters, after you'd looked for any other special cases or phrases, it may be easier to use an algorithm for matching pairs, like with parentheses and brackets: tokenize and count. Then test on real-world input and refine.
But I really have to ask: why?
I am not sure if it is possible at all to do that without understanding the meaning of the sentence. I tend to doubt it.
My first attempt would be the following.
Go from left to right through the string.
Alternate replacing the straight double quotes (") with left and right double quotes, but replace with a double prime if there is a number to the left.
If the quotation marks are unbalanced at the end of the string, go back until you find a number followed by a double prime and change that double prime into a left or right double quote, depending on the preceding double quotes.
I am quite sure that this strategy can easily be made to fail. But it is still the easy case - the hard work starts when you have to deal with nested quotation marks.
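The left-to-right strategy could look roughly like this. It is a sketch under simplifying assumptions, not a complete implementation: the name `convert` is made up, and the repair step only handles the case where the most recent inch mark comes after the unmatched opening quote.

```python
LDQUO, RDQUO, PRIME = "\u201c", "\u201d", "\u2033"

def convert(text: str) -> str:
    out = []
    open_quote = False   # True while an opening quote is waiting for its partner
    last_open = -1       # position in `out` of the most recent opening quote
    primes = []          # positions in `out` of quotes turned into inch marks
    for i, ch in enumerate(text):
        if ch != '"':
            out.append(ch)
        elif i > 0 and text[i - 1].isdigit():
            # A number to the left: assume an inch mark for now.
            primes.append(len(out))
            out.append(PRIME)
        elif open_quote:
            out.append(RDQUO)
            open_quote = False
        else:
            last_open = len(out)
            out.append(LDQUO)
            open_quote = True
    # Unbalanced at the end: reinterpret the latest inch mark as the closing quote.
    if open_quote and primes and primes[-1] > last_open:
        out[primes[-1]] = RDQUO
    return "".join(out)

balanced = convert('"It just missed" Paul said "by 9" almost".')
repaired = convert('She dialed "9" twice.')
```

The second call is the case where the back-tracking step matters: the quote after the 9 is first taken as an inch mark and then turned into the closing quote once the string ends unbalanced.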
I know this is off the wall, but have you considered Mechanical Turk? This is the sort of problem humans excel at, and computers, currently, are terrible at. Choosing the correct punctuation requires understanding of the meaning of the sentence, so a regex is bound to fail for edge cases.
You could try something like this. First replace the quotations with this regular expression:
"((?:[^"\d]+|\d"?)*)"
And then the inch sign:
(\d+)"
Here’s an example in JavaScript:
'"It just missed" Paul said "by 9" almost"'.replace(/"((?:[^"\d]*|\d["']?)+)"/g, "“$1”").replace(/(\d+)"/g, "$1″");

Regex to match attributes in HTML?

I have a txt file which actually is a html source of some webpage.
Inside that txt file there are various strings preceded by a "title=" tag.
e.g.
<div id='UWTDivDomains_5_6_2_2' title='Connectivity Framework'>
I am interested in getting the text Connectivity Framework extracted and written to a separate file.
There are many such tags, each having a different text after the title='some text here which I need to extract '
I want to extract all such instances of the text from the html source/txt file and write them to a separate txt file. The text can contain lower case letters, upper case letters and numbers only. The length of each text string (in characters) will vary.
I am using PowerGrep for Windows. PowerGrep allows me to search a text file with a regular expression as input.
I tried using the search as
title='[a-zA-Z0-9]
It shows the correct matches, but it matches only the first character of the string and writes only that first character to the second txt file, not the whole string.
I want the whole string to be matched and written to the second file.
What is the correct regular expression or way to do what I want to do, using PowerGrep?
-AD.
I'm just not sure how many times the question of regular expression parsing of HTML files has to be asked (and answered with the correct solution of "use a DOM parser"). It comes up every day.
The difficulties are:
In HTML attributes can have single-quotes, double-quotes or even no quotes;
Similar strings can appear in the HTML document itself;
You have to handle correct escaping; and
Malformed HTML (decent parsers are extremely robust to common errors).
So if you cater for all this (and it gets to be a pretty complicated yet still imperfect regex), it's still not 100%.
HTML parsers exist for a reason. Use them.
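As an illustration of the parser route, here is a minimal sketch using Python's standard `html.parser` module. The class name `TitleCollector` and the second tag in the sample markup are assumptions added for the example; the first tag is the one from the question.

```python
from html.parser import HTMLParser

class TitleCollector(HTMLParser):
    """Collects the value of every title attribute it encounters."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; quotes are already removed
        # and character references in the values are already decoded.
        for name, value in attrs:
            if name == "title" and value is not None:
                self.titles.append(value)

html = (
    "<div id='UWTDivDomains_5_6_2_2' title='Connectivity Framework'>"
    '<p title="A &amp; B">x</p></div>'
)
collector = TitleCollector()
collector.feed(html)
```

After `feed`, `collector.titles` holds the attribute values regardless of which quote style the tag used, and they can be written to a second file one per line.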
I'm not familiar with PowerGrep, however, your regex is incomplete. Try this:
title='[a-zA-Z0-9 ]*'
or better yet:
title='([^']*)'
The other answers all give correct changes to the regex, so I'll explain what the issue was with your original.
The square brackets indicate a character class - meaning that the regex will match any character within those brackets. However, like everything else, it will only match it once by default. Just as the regex "s" would match only the first character in "ssss", the regex "[a-zA-Z0-9]" will match only the first character in "Connectivity Framework".
By adding repetition, one can get that character class to match repeatedly. The easiest way to do this is by adding an asterisk after it (which will match 0 or more occurrences). Thus the regex "[a-zA-Z0-9]*" will match as many characters in a row as it can until it hits a character that is not in that character class (in your case, the space character, since you didn't include that in your brackets).
It can be hard, though, to write a regex that describes the syntax accurately - what if someone put a non-alphanumeric character such as an ampersand within the attribute? You could try to capture all input between the quotes by making the character set "anything except a quote character", so "'[^']*'" would usually do the right thing. Often you need to bear in mind escaping as well (e.g. with a string 'Mary\'s lamb' you do actually want to capture the apostrophe in the middle, so a simple "everything but apostrophes" character set won't cut it), though thankfully this is not an issue with XML/HTML according to the specs.
Still, if there is an existing library available that will do the extraction for you, this is likely to be faster and more correct than rolling your own, so I would lean towards that if possible.
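The effect of the repetition and of the character set can be checked against the sample line from the question. This is a small sketch using Python's `re` module rather than PowerGrep, so the way matches are reported there may differ; the variable names are made up.

```python
import re

line = "<div id='UWTDivDomains_5_6_2_2' title='Connectivity Framework'>"

# No repetition: the class matches exactly one character.
one_char = re.search(r"title='[a-zA-Z0-9]", line).group(0)

# With *, the match runs until the first character outside the class (the space).
no_space = re.search(r"title='[a-zA-Z0-9]*", line).group(0)

# Requiring the closing quote without allowing spaces finds nothing at all.
strict = re.search(r"title='([a-zA-Z0-9]+)'", line)

# Adding the space to the class, or matching "anything but a quote", gets the full value.
with_space = re.search(r"title='([a-zA-Z0-9 ]*)'", line).group(1)
anything = re.search(r"title='([^']*)'", line).group(1)
```

The parentheses in the last two patterns capture only the text between the quotes, which is the part that should end up in the second file.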
I would use this regular expression to get the title attribute values
<[a-z]+[^>]*\s+title\s*=\s*("[^"]*"|'[^']*'|[^\s >]*)
Note that this regex matches the attribute value expression with quotes. So you have to remove them if needed.
Here's the regex you need
title='([a-zA-Z0-9]+)'
but if you're going to be doing a lot more stuff like this, using a parser might make it much more robust and useful.
Try this instead:
title=\'[a-zA-Z0-9]*\'