How do I turn off smart quotes and apostrophes in Jekyll? It is breaking my gulp spellcheck process.
I want words like doesn't to keep their single straight quote. Instead, Jekyll converts them to smart quotes, like doesn’t, and I need them to stay straight for spell checking.
This is what I tried in my _config.yml:
kramdown:
  smartquotes: ["apos", "rsquo", "ldquo", "rdquo"]
I'm using kramdown.
Here is my whole config:
name: Bitcoin Bulls
markdown: kramdown
timezone: America/Detroit
highlighter: pygments
author: David Smith
safe: true
lsi: false
permalink: none
url: http://www.bitcoinbulls.net
exclude: [CNAME, Gemfile, Gemfile.lock, '*.less', gruntfile.js, custom_css, node_modules, README.md, '*.svg', '*.docx']
include: [glyphicons-halflings-regular.svg]
kramdown:
  smart_quotes: ["rdquo", "rsquo", "ldquo", "rdquo"]
relative_permalinks: false
defaults:
  -
    scope:
      path: "" # empty string for all files
    values:
      layout: "default"
  -
    scope:
      path: "" # empty string for all files
      type: post
    values:
      layout: "post"
      is_post: true
The underscore in smart_quotes is missing from your first attempt (smartquotes), and the second array item needs to be apos to completely turn off smart quotes for apostrophes.
kramdown:
  smart_quotes: ["apos", "apos", "ldquo", "rdquo"]
To turn off smart-quotes for both apostrophes/single-quotes and double quotes, use this:
kramdown:
  smart_quotes: ["apos", "apos", "quot", "quot"]
That's what is known as the "Programmer's world compliant config".
More details:
By default kramdown transforms apos and quot into typographic quotes. That is:
'apostrophe' becomes ‘apostrophe’
"quote" becomes “quote”
The default config provides guidance:
kramdown:
# smart_quotes:
#
# first parameter : how an opening apostrophe is transformed
# or apostrophe like in "I'm"
# default : ' -> ‘ (lsquo)
# apos : ' -> '
#
# second parameter : how a closing apostrophe is transformed
# default : ' -> ’ (rsquo)
# apos : ' -> '
#
# third parameter : how an opening double quote is transformed
# default : " -> “ (ldquo)
# quot : " -> "
#
# fourth parameter : how a closing double quote is transformed
# default : " -> ” (rdquo)
# quot : " -> "
#
# Default kramdown config
# smart_quotes: ["lsquo", "rsquo", "ldquo", "rdquo"]
#
# Programmer's world compliant config :
# smart_quotes: ["apos", "apos", "quot", "quot"]
Where:
quot = " : neutral quotation mark
apos = ' : apostrophe like in I'm
lsquo = ‘ : typographic opening single quotation mark
rsquo = ’ : typographic closing single quotation mark
ldquo = “ : typographic opening double quotation mark
rdquo = ” : typographic closing double quotation mark
Kramdown's documentation provides other options that may be of interest. The Wikipedia Quotation Mark page provides lots of details on the complications of interpretation and how things changed when Unicode was introduced.
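As a quick way to see which character each of the entity names above stands for, here is a minimal Python sketch. It relies only on the standard library's HTML entity table, not on kramdown itself; the helper name entity_char is made up for this illustration.

```python
import html

# Entity names that can appear in kramdown's smart_quotes option
# (taken from the list above).
ENTITY_NAMES = ["quot", "apos", "lsquo", "rsquo", "ldquo", "rdquo"]


def entity_char(name: str) -> str:
    """Resolve an HTML entity name such as 'rsquo' to its character."""
    return html.unescape(f"&{name};")


for name in ENTITY_NAMES:
    char = entity_char(name)
    print(f"{name:5} = {char}  (U+{ord(char):04X})")
```

Only quot and apos resolve to the plain ASCII characters that a spell checker typically expects.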
I need to extract data from a single line of JSON data that sits in between two known strings (PowerShell)
my Variables:
in front of Data:
DeviceAddresses":[{"Id":
after Data:
,"
I tried this, but there must be an error somewhere because of all the special characters I'm using:
$devicepattern = {DeviceAddresses":[{"Id":{.*?},"}
#$deviceid = [regex]::match($changeduserdata, $devicepattern).Groups[1].Value
#$deviceid
As you've found, some character literals can't be used as-is in a regex pattern because they carry special meaning - we call these meta-characters.
In order to match the corresponding character literal in an input string, we need to escape it with \ -
to match a literal (, we use the escape sequence \(,
for a literal }, we use \}, and so on...
Fortunately, you don't need to know or remember which ones are meta-characters or escapable sequences - we can use Regex.Escape() to escape all the special character literals in a given pattern string:
$prefix = [regex]::Escape('DeviceAddresses":[{"Id":')
$capture = '(.*?)'
$suffix = [regex]::Escape(',"')
$devicePattern = "${prefix}${capture}${suffix}"
You also don't need to call [regex]::Match directly; PowerShell will populate the automatic $Matches variable with match groups whenever a scalar -match succeeds:
if ($changeduserdata -match $devicePattern) {
    $deviceid = $Matches[1]
} else {
    Write-Error 'DeviceID not found'
}
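For comparison, the same escape-then-assemble approach can be sketched in Python, where re.escape plays the role of Regex.Escape(). This is only an analogy to the PowerShell code above; the sample string is borrowed from the second answer in this thread and the variable names are illustrative.

```python
import re

# Sample input (assumed shape of the JSON line in the question).
changed_user_data = '{"DeviceAddresses":[{"Id": 42,"More": "stuff"}]}'

# Escape the verbatim parts so that [ and { lose their special meaning.
prefix = re.escape('DeviceAddresses":[{"Id":')
capture = r"(.*?)"  # lazily capture whatever sits between prefix and suffix
suffix = re.escape(',"')
device_pattern = prefix + capture + suffix

match = re.search(device_pattern, changed_user_data)
# The captured text keeps the space after the colon, so strip it.
device_id = match.group(1).strip() if match else None
print(device_id)
```

Escaping the prefix and suffix separately keeps the capture group itself unescaped, which is why the pattern is assembled from three pieces.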
For reference, the following ASCII literals need to be escaped in .NET's regex grammar:
$ ( ) * + . ? [ \ ^ { |
Additionally, # and ' ' (the regular space character) need to be escaped, and a number of other whitespace characters have to be translated to their respective escape sequences, to make patterns safe for use with the IgnorePatternWhitespace option (this is not applicable to your current scenario):
\u0009 => '\t' # Tab
\u000A => '\n' # Line Feed
\u000C => '\f' # Form Feed
\u000D => '\r' # Carriage Return
... all of which Regex.Escape() takes into account for you :)
To complement Mathias R. Jessen's helpful answer:
Generally, note that JSON data is much easier to work with - and processed more robustly - if you parse it into objects whose properties you can access - see the bottom section.
As for your regex attempt:
Note: The following also applies to all PowerShell-native regex features, such as the -match, -replace, and -split operators, the switch statement, and the Select-String cmdlet.
Mathias' answer uses [regex]::Escape() to escape the parts of the regex pattern to be used verbatim by the regex engine.
This is unequivocally the best approach if those verbatim parts aren't known in advance - e.g., when provided via a variable or expression, or passed as an argument.
However, in a regex pattern that is specified as a string literal it is often easier to individually \-escape the regex metacharacters, i.e. those characters that would otherwise have special meaning to the regex engine.
The list of characters that need escaping is (it can be inferred from the .NET Regular-Expression Quick Reference):
\ ( ) | . * + ? ^ $ [ {
If you enable the IgnorePatternWhitespace option (which you can do inline with (?x) at the start of a pattern), you'll additionally have to \-escape:
#
significant whitespace characters (those you actually want matched) specified verbatim (e.g., ' ', or via string interpolation, "`t"); this does not apply to those specified via escape sequences (e.g., \t or \n).
Therefore, the solution could be simplified to:
# Sample JSON
$changeduserdata = '{"DeviceAddresses":[{"Id": 42,"More": "stuff"}]}'
# Note how [ and { are \-escaped
$deviceId = if ($changeduserdata -match 'DeviceAddresses":\[\{"Id":(.*?),"') {
    $Matches[1]
}
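The escaping rules above, including the extra ones for whitespace-ignoring patterns, can be tried out with a short Python sketch. It assumes that Python's (?x) (re.VERBOSE) mode is a fair stand-in for .NET's IgnorePatternWhitespace in this simple case; the two engines are not identical in general.

```python
import re

changed_user_data = '{"DeviceAddresses":[{"Id": 42,"More": "stuff"}]}'

# Manually escaped pattern: only [ and { need a backslash here.
plain = r'DeviceAddresses":\[\{"Id":(.*?),"'

# The same idea in verbose mode: unescaped whitespace and #-comments are
# ignored, so a space that should be matched has to be written as "\ ".
verbose = r'''(?x)
    DeviceAddresses":\[\{"Id":  # literal prefix with [ and { escaped
    \                           # an escaped space, so it is matched literally
    (.*?)                       # lazily capture the value
    ,"                          # literal suffix
'''

plain_value = re.search(plain, changed_user_data).group(1)
verbose_value = re.search(verbose, changed_user_data).group(1)
print(repr(plain_value))    # the capture includes the space after the colon
print(repr(verbose_value))  # the escaped space is matched outside the group
```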
Using ConvertFrom-Json to properly parse JSON into objects is both more robust and more convenient, as it allows property access (dot notation) to extract the value of interest:
# Sample JSON
$changeduserdata = '{"DeviceAddresses":[{"Id": 42,"More": "stuff"}]}'
# Convert to an object ([pscustomobject]) and drill down to the property
# of interest; note that the value of .DeviceAddresses is an *array* ([...]).
$deviceId = (ConvertFrom-Json $changeduserdata).DeviceAddresses[0].Id # -> 42
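The parse-then-navigate approach looks much the same in other languages. As a rough Python counterpart to the ConvertFrom-Json call above (same sample data, illustrative variable names):

```python
import json

changed_user_data = '{"DeviceAddresses":[{"Id": 42,"More": "stuff"}]}'

# Parse the text into dicts and lists, then walk down to the value.
data = json.loads(changed_user_data)
device_id = data["DeviceAddresses"][0]["Id"]  # DeviceAddresses is a list
print(device_id)
```

Because the parser handles whitespace and quoting, this keeps working if the JSON is reformatted, which a regex anchored on exact punctuation does not.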
RoleFullPath
Applications\User Admin & Support-DEMO
PowerShell Code
$NewJSON.roleFullPath = $Line.RoleFullPath
.
.
.
.
$JSONPath = $RolePath + $FolderName + "-JSON.json"
Convertto-JSON $NewJSON | Out-file -Encoding "UTF8" $JSONPath
Output:
"roleFullPath": "Applications\\User Admin \u0026 Support-DEMO"
While converting from csv to json, character '&' is getting converted to '\u0026'
Any help?
In Windows PowerShell v5.1, ConvertTo-Json indeed unexpectedly encodes & characters as Unicode escape sequence \u0026, where 0026 represents hex. number 0x26, the Unicode code point representing the & character, U+0026.
(PowerShell Core, by contrast, preserves the & as-is.)
That said, JSON parsers should be able to interpret such escape sequences and, indeed, the complementary ConvertFrom-Json cmdlet is.
Note: The solutions below are general ones that can handle the Unicode escape sequences of any Unicode character; since ConvertTo-Json seemingly only uses these Unicode escape-sequence representations for the characters &, ', < and >, a simpler solution is possible, unless false positives must be ruled out - see this answer.
That said, if you do want to manually convert Unicode escape sequences into their character equivalents in JSON text, you can use the following - limited solution:
# Sample JSON with Unicode escapes.
$json = '{ "roleFullPath": "Applications\\User Admin \u0026 Support-DEMO" }'
# Replace Unicode escapes with the chars. they represent,
# with limitations.
[regex]::replace($json, '\\u[0-9a-fA-F]{4}', {
    param($match) [char] [int] ('0x' + $match.Value.Substring(2))
})
The above yields:
{ "roleFullPath": "Applications\\User Admin & Support-DEMO" }
Note how \u0026 was converted to the char. it represents, &.
A robust solution requires more work:
There are characters that must be escaped in JSON and cannot be represented literally, so in order for the to-character conversion to work generically, these characters must be excluded.
Additionally, false positives must be avoided; e.g., \\u0026 is not a valid Unicode escape sequence, because a JSON parser interprets \\ as an escaped \ followed by verbatim u0026.
Finally, the Unicode sequences for " and \ must be translated into their escaped forms, \" and \\, and it is possible to represent a few ASCII-range control characters by C-style escape sequences, e.g., \t for a tab character (\u0009).
The following robust solution addresses all these issues:
# Sample JSON with Unicode escape sequences:
# \u0026 is &, which CAN be converted to the literal char.
# \u000a is a newline (LF) character, which CANNOT be converted, but can
# be translated to escape sequence "\n"
# \\u0026 is *not* a Unicode escape sequence and must be preserved as-is.
$json = '{
"roleFullPath": "Applications\u000aUser Admin \u0026 Support-DEMO-\\u0026"
}'
[regex]::replace($json, '(?<=(?:^|[^\\])(?:\\\\)*)\\u([0-9a-fA-F]{4})', {
    param($match)
    $codePoint = [int] ('0x' + $match.Groups[1].Value)
    if ($codePoint -in 0x22, 0x5c) {
        # " or \ must be \-escaped.
        '\' + [char] $codePoint
    }
    elseif ($codePoint -in 0x8, 0x9, 0xa, 0xc, 0xd) {
        # Control chars. that can be represented as short, C-style escape sequences.
        ('\b', '\t', '\n', $null, '\f', '\r')[$codePoint - 0x8]
    }
    elseif ($codePoint -le 0x1f -or [char]::IsSurrogate([char] $codePoint)) {
        # Other control chars. and halves of surrogate pairs must be retained
        # as escape sequences.
        # (Converting surrogate pairs to a single char. would require much more effort.)
        $match.Value
    }
    else {
        # Translate to literal char.
        [char] $codePoint
    }
})
Output:
{
"roleFullPath": "Applications\nUser Admin & Support-DEMO-\\u0026"
}
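The same rules can be sketched in Python. This is not a port of the PowerShell code: Python's re module has no variable-length lookbehind, so this version matches every backslash escape and only rewrites the \uXXXX ones. Consuming \\ as a single unit is what protects \\u0026. The function and constant names are made up for this illustration.

```python
import json
import re

# Short escape sequences that JSON defines for some control characters.
SHORT_ESCAPES = {0x08: "\\b", 0x09: "\\t", 0x0A: "\\n", 0x0C: "\\f", 0x0D: "\\r"}

# Match ANY backslash escape; group 1 is set only for \uXXXX sequences.
ANY_ESCAPE = re.compile(r"\\(?:u([0-9a-fA-F]{4})|.)", re.DOTALL)


def decode_unicode_escapes(json_text):
    def replace(match):
        hex_digits = match.group(1)
        if hex_digits is None:
            return match.group(0)  # some other escape, e.g. \\ or \n: keep it
        code_point = int(hex_digits, 16)
        if code_point in (0x22, 0x5C):
            return "\\" + chr(code_point)  # " and \ must stay escaped
        if code_point in SHORT_ESCAPES:
            return SHORT_ESCAPES[code_point]
        if code_point <= 0x1F or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)  # other control chars and surrogate halves
        return chr(code_point)  # safe to use the literal character

    return ANY_ESCAPE.sub(replace, json_text)


sample = r'{"roleFullPath": "Applications\u000aUser Admin \u0026 Support-DEMO-\\u0026"}'
result = decode_unicode_escapes(sample)
print(result)
```

A useful sanity check for any such rewrite is that parsing the text before and after yields the same data.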
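To stop PowerShell from doing this, pipe your JSON output through this:
$jsonOutput | ForEach-Object { [System.Text.RegularExpressions.Regex]::Unescape($_) } | Set-Content $jsonPath -Encoding UTF8
This converts \u0026 back to &. Note that Regex.Unescape also converts every other escape sequence, such as \\ and \n, so the result can be invalid JSON if the text contains them.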
I'm trying to perform a simple operation - turn some input text into JSON, process it, and use it further.
require 'json'
aws_region = "us-east-1"
tag = `sudo aws ec2 describe-tags --region="#{aws_region}" --filters "Name=resource-type,Values=instance" "Name=key,Values=Group" "Name=resource-id,Values=$(ec2metadata --instance-id)"`
puts tag
tag_json = tag.to_json.gsub(/\s+/, "")
#tag_json = tag.gsub("\n", "")
puts tag_json
obj = JSON.parse(tag_json)
desired_value = obj["Tags"][0]["Value"]
puts desired_value
I expected the above to strip out all whitespace including newlines, but to my surprise, the output still has newlines in it. The JSON.parse fails with the below error because the newlines are still present. With the additional tag_json assignment above uncommented, it removes the newlines and succeeds.
JSON::ParserError
-----------------
746: unexpected token at '"{\n\"Tags\": [\n{\n\"ResourceType\":
\"instance\", \n\"ResourceId\": \"i-XXXXXX\", \n\"Value\":
\"groupA\", \n\"Key\": \"Group\"\n}\n]\n}\n"'
I end up having to have a separate case for newlines. Why does gsub treat newline characters as non-whitespace? Is there any other expression that will combine all of whitespace, tabs and newlines so I can strip them out?
Maybe it's an encoding issue. Try tag_json = tag.to_json.gsub(/[\s\p{]/, "")
You don't need the + in your expression because gsub removes all occurrences of a single character anyway.
Consider "aaaaaa".gsub(/a/, '') # => ""
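The error message in the question points to another likely explanation: the command output is already JSON text, so calling to_json on it encodes it a second time, and each newline becomes the two characters backslash and n, which \s cannot match. The sketch below reproduces the effect in Python, using json.dumps on a string as a stand-in for Ruby's String#to_json; it is an analogy, not the Ruby code itself, and the sample text is a shortened version of the output shown in the error.

```python
import json
import re

# Stand-in for the text printed by the aws CLI: JSON containing real newlines.
tag = '{\n"Tags": [\n{\n"Value": "groupA", \n"Key": "Group"\n}\n]\n}\n'

# Encoding a string as JSON wraps it in quotes and turns every newline
# into backslash + n, which is not whitespace any more.
double_encoded = json.dumps(tag)
stripped = re.sub(r"\s+", "", double_encoded)
print("\\n" in stripped)  # True: the escaped newlines survive the substitution

# Parsing the original text directly needs no whitespace stripping at all.
obj = json.loads(tag)
print(obj["Tags"][0]["Value"])  # groupA
```

In the Ruby script this would mean dropping to_json and passing tag straight to JSON.parse, since JSON parsers ignore whitespace between tokens.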
I would like to have a regex that will not allow spaces AFTER comma but spaces before comma should be allowed. The comma should also be optional.
My current regex:
^[\w,]+$
I have tried to add \s in it and also tried ^[\w ,]+$ but that allows spaces after comma as well!
This should be the test case:
Hello World // true
Hello, World // false (space after comma)
Hello,World // true
Hello,World World // false
Any help would be appreciated!
The below regex won't allow space after a comma,
^[\w ]+(?:,[^ ]+)?$
Explanation:
^ Start of a line.
[\w ]+ Matches a word character or a space, one or more times.
(?:) This is called a non-capturing group. Anything matched inside this group won't be captured.
(?:,[^ ]+)? A comma followed by one or more characters that are not a space. The ? after the non-capturing group tells the regex engine that the group is optional.
$ End of a line.
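To check the pattern against the four test cases from the question, here is a small Python sketch (the question does not name a language, so Python is an assumption; the helper name is_valid is illustrative):

```python
import re

PATTERN = re.compile(r"^[\w ]+(?:,[^ ]+)?$")

# Test cases and expected results, taken from the question.
CASES = {
    "Hello World": True,
    "Hello, World": False,  # space after comma
    "Hello,World": True,
    "Hello,World World": False,
}


def is_valid(text):
    return PATTERN.match(text) is not None


for text, expected in CASES.items():
    print(f"{text!r}: {is_valid(text)} (expected {expected})")
```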
I guess it depends on what you want to do. If you are just testing for the presence of the grammar error, you can use something like this:
var patt = / ,/; // or /\s,/ if you want; without the g flag, test() keeps no state between calls
var str = 'Hello ,World ,World';
var str2 = 'Hello, World, World';
console.log( patt.test(str) ); // true, there is a space before a comma
console.log( patt.test(str2) ); // false, the string is OK!
Lookaheads are useful but can be hard to understand without knowing the basics.
Use this site, it is great for visualising your Regex
You can use this regex.
^[\w ]+(?:,\S+)?$
Explanation:
^ # the beginning of the string
[\w ]+ # any character of: word characters, ' ' (1 or more times)
(?: # group, but do not capture (optional):
, # ','
\S+ # non-whitespace (all but \n, \r, \t, \f, and " ") (1 or more times)
)? # end of grouping
$ # before an optional \n, and the end of the string
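The only difference from the [^ ]+ variant given earlier in this thread is how whitespace other than a plain space is treated after the comma. A short Python sketch of that difference, with a tab as the example character:

```python
import re

no_space = re.compile(r"^[\w ]+(?:,[^ ]+)?$")      # rejects only ' ' after the comma
no_whitespace = re.compile(r"^[\w ]+(?:,\S+)?$")   # rejects any whitespace

sample = "Hello,\tWorld"  # a tab follows the comma

space_only_result = no_space.match(sample) is not None
any_whitespace_result = no_whitespace.match(sample) is not None
print(space_only_result)      # True: a tab is not a space, so [^ ]+ accepts it
print(any_whitespace_result)  # False: \S+ cannot match the tab
```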
Say I have a normal CSV like
# helloworld.csv
hello,world,,,"please don't replace quoted stuff like ,,",,
If I want mysqlimport to understand that some of those fields are NULL, then I need:
# helloworld.mysql.csv
hello,world,\N,\N,"please don't replace quoted stuff like ,,",\N,\N
I got some help from another question -- Why does sed not replace overlapping patterns -- but note the problem:
$ perl -pe 'while (s#,,#,\\N,#) {}' -pe 's/,$/,\\N/g' helloworld.csv
hello,world,\N,\N,"please don't replace quoted stuff like ,\N,",\N,\N
                                                           ^^
How can I write the regex so it doesn't replace ,, if they're between quotes?
FINAL ANSWER
Here's the final perl I used, thanks to the accepted answer below:
perl -pe 's/^,/\\N,/; while (s/,(?=,)(?=(?:[^"]*"[^"]*")*[^"]*$)/,\\N/g) {}; s/,$/,\\N/' helloworld.csv
That takes care of leading, trailing, and unquoted empty strings.
Why not use Text::CSV? You can parse the file with it and then use map to replace empty fields with '\N', e.g.
use Text::CSV;
my $csv = Text::CSV->new({ binary => 1 }) or die Text::CSV->error_diag();
$csv->parse($line); # parse a CSV string into fields
my @fields = $csv->fields(); # get the parsed fields
@fields = map { $_ eq "" ? '\N' : $_ } @fields;
$csv->combine(@fields); # combine fields into a string
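The parse, map and recombine steps can also be sketched with Python's standard csv module, shown here as an analogy to the Text::CSV code rather than a translation of it. The function name empty_fields_to_null is made up for this illustration.

```python
import csv
import io


def empty_fields_to_null(line):
    """Parse one CSV line and replace empty fields with the \\N marker."""
    fields = next(csv.reader([line]))
    fields = [r"\N" if field == "" else field for field in fields]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow(fields)
    return out.getvalue().rstrip("\n")


line = 'hello,world,,,"please don\'t replace quoted stuff like ,,",,'
converted = empty_fields_to_null(line)
print(converted)
```

One caveat with this sketch: Python's csv reader returns an empty string for both an unquoted empty field and a quoted "" field, so both end up as \N.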
Assuming that you won't have escaped quotes, you can make sure that you only replace ,, if it's followed by an even number of quotes:
$subject =~
s/, # Match ,
(?=,) # only if followed by another ,
(?= # and only if followed by...
(?: # the following group:
[^"]*" # any number of non-quote characters, followed by one quote
[^"]*" # the same thing again (even number!)
)* # any number of times, followed by
[^"]* # any number of non-quotes until...
$ # end of string.
) # End of lookahead assertion
/,\\N/xg;
Input:
foo,,bar,,,baz,"foo,,,oof",zap,,zip
Output:
foo,\N,bar,\N,\N,baz,"foo,,,oof",zap,\N,zip
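The double lookahead can be tried outside Perl as well. Here is a minimal Python sketch of the same pattern, under the same assumption that fields contain no escaped quotes; fill_nulls is an illustrative name.

```python
import re

# A comma that is followed by another comma, and by an even number of
# double quotes up to the end of the line (i.e. it is outside any quotes).
UNQUOTED_EMPTY = re.compile(r',(?=,)(?=(?:[^"]*"[^"]*")*[^"]*$)')


def fill_nulls(line):
    # A function is used as the replacement so the backslash is taken literally.
    return UNQUOTED_EMPTY.sub(lambda match: ",\\N", line)


source = 'foo,,bar,,,baz,"foo,,,oof",zap,,zip'
filled = fill_nulls(source)
print(filled)
```

Because the lookahead does not consume the second comma, runs such as ,,, are handled in a single pass. Empty fields at the very start or end of a line still need separate handling, as in the final Perl one-liner in the question.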