Search backslash in Tcl using regexp - tcl

How do I search for a backslash "\" in Tcl using regexp? I tried the following:
regexp {\\} "b\a"
regexp {\\\\} "b\a"
I want to extract the text between "." and "\.". How can I do this? E.g.:
abcd.efg\.hij => efg. For this I tried:
regexp {\.[a-z]*\\.} "abcd.efg\.hij" X

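Inside double quotes, Tcl performs backslash substitution, so a single backslash does not survive as a literal character: before an ordinary character such as . or e it is simply dropped, and a sequence such as \a is replaced by a control character (the bell). To keep a literal backslash, escape it or brace the string.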
% set input "abcd.efg\.hij"; # Check the return value, it does not have backslash in it
abcd.efg.hij
%
% set user "din\esh"; # Check the return value
dinesh
%
% set input "abcd.efg\\.hij"; # Escaped the backslash. Check the return value
abcd.efg\.hij
%
% set input {abcd.efg\.hij}; # Or you have to brace the string
abcd.efg\.hij
%
So your regexp calls should be written as:
% regexp "\\\\" "b\\a"
1
% regexp {\\} "b\\a"
1
% regexp {\\} {b\a}
1
% regexp {\\} {b\\a}
1
%
To extract the required data:
% set input {abcd.efg\.hij}
abcd.efg\.hij
% regexp {\.(.*)?\\} $input ignore match
1
% set match
efg

I would use \.([^\\\.]+)\\\. but it depends on what other inputs are possible.
The pattern matches a literal dot \., then a capturing group ([^\\\.]+) that extracts efg (one or more characters that are neither a backslash \\ nor a dot \.), then a literal backslash \\ and a literal dot \..
Your pattern will also work if you add a capturing group. The text matched by the group is stored in the second variable:
regexp {\.([a-z]*)\\.} {abcd.efg\.hij} matchVar subMatchVar
You also have to take into account that a backslash in the double-quoted string "abcd.efg\.hij" is substituted by the interpreter: the final string becomes abcd.efg.hij, which prevents your pattern from matching. That is why I used curly braces here; a variable holding that string would work as well.
Take a look at Visual REGEXP. I use it occasionally.
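For comparison only, here is a minimal sketch of the same extraction in Python's re module (the thread itself is about Tcl, so Python is an assumption made for illustration). A raw string plays the role of Tcl's braces and keeps the backslash in the input intact; the names text, pattern and between are hypothetical.

```python
import re

# A raw string keeps the backslash, much like braces do in Tcl.
text = r"abcd.efg\.hij"

# Literal dot, then a captured run of characters that are neither a
# backslash nor a dot, then a literal backslash and a literal dot.
pattern = re.compile(r"\.([^\\.]+)\\\.")

m = pattern.search(text)
between = m.group(1) if m else None
print(between)
```

If the input were written as an ordinary (non-raw) string literal, the backslash handling would have to be checked first, which is the same lesson as in the Tcl answers above.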

Related

I need to extract Data from a single line of json-data which is inbetween two variables (Powershell)

my Variables:
in front of Data:
DeviceAddresses":[{"Id":
after Data:
,"
I tried this, but there must be an error somewhere because of all the special characters I'm using:
$devicepattern = {DeviceAddresses":[{"Id":{.*?},"}
#$deviceid = [regex]::match($changeduserdata, $devicepattern).Groups[1].Value
#$deviceid
As you've found, some character literals can't be used as-is in a regex pattern because they carry special meaning - we call these meta-characters.
In order to match the corresponding character literal in an input string, we need to escape it with \ -
to match a literal (, we use the escape sequence \(,
for a literal }, we use \}, and so on...
Fortunately, you don't need to know or remember which ones are meta-characters or escapable sequences - we can use Regex.Escape() to escape all the special character literals in a given pattern string:
$prefix = [regex]::Escape('DeviceAddresses":[{"Id":')
$capture = '(.*?)'
$suffix = [regex]::Escape(',"')
$devicePattern = "${prefix}${capture}${suffix}"
You also don't need to call [regex]::Match directly; PowerShell populates the automatic $Matches variable with the match groups whenever a scalar -match succeeds:
if($changeduserdata -match $devicePattern){
$deviceid = $Matches[1]
} else {
Write-Error 'DeviceID not found'
}
For reference, the following ASCII literals need to be escaped in .NET's regex grammar:
$ ( ) * + . ? [ \ ^ { |
Additionally, # and the regular space character need to be escaped, and a number of other whitespace characters have to be translated to their respective escape sequences, to make patterns safe for use with the IgnorePatternWhitespace option (this is not applicable to your current scenario):
\u0009 => '\t' # Tab
\u000A => '\n' # Line Feed
\u000C => '\f' # Form Feed
\u000D => '\r' # Carriage Return
... all of which Regex.Escape() takes into account for you :)
To complement Mathias R. Jessen's helpful answer:
Generally, note that JSON data is much easier to work with - and processed more robustly - if you parse it into objects whose properties you can access - see the bottom section.
As for your regex attempt:
Note: The following also applies to all PowerShell-native regex features, such as the -match, -replace, and -split operators, the switch statement, and the Select-String cmdlet.
Mathias' answer uses [regex]::Escape() to escape the parts of the regex pattern to be used verbatim by the regex engine.
This is unequivocally the best approach if those verbatim parts aren't known in advance - e.g., when provided via a variable or expression, or passed as an argument.
However, in a regex pattern that is specified as a string literal it is often easier to individually \-escape the regex metacharacters, i.e. those characters that would otherwise have special meaning to the regex engine.
The list of characters that need escaping is (it can be inferred from the .NET Regular-Expression Quick Reference):
\ ( ) | . * + ? ^ $ [ {
If you enable the IgnorePatternWhitespace option (which you can do inline with (?x) at the start of a pattern), you'll additionally have to \-escape:
#
significant whitespace characters (those you actually want matched) specified verbatim (e.g., ' ', or via string interpolation, "`t"); this does not apply to those specified via escape sequences (e.g., \t or \n).
Therefore, the solution could be simplified to:
# Sample JSON
$changeduserdata = '{"DeviceAddresses":[{"Id": 42,"More": "stuff"}]}'
# Note how [ and { are \-escaped
$deviceId = if ($changeduserdata -match 'DeviceAddresses":\[\{"Id":(.*?),"') {
$Matches[1]
}
Using ConvertFrom-Json to properly parse JSON into objects is both more robust and more convenient, as it allows property access (dot notation) to extract the value of interest:
# Sample JSON
$changeduserdata = '{"DeviceAddresses":[{"Id": 42,"More": "stuff"}]}'
# Convert to an object ([pscustomobject]) and drill down to the property
# of interest; note that the value of .DeviceAddresses is an *array* ([...]).
$deviceId = (ConvertFrom-Json $changeduserdata).DeviceAddresses[0].Id # -> 42
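As a cross-language illustration only (the answers above are PowerShell), the same two approaches can be sketched in Python, where re.escape stands in for [regex]::Escape() and json.loads for ConvertFrom-Json. The sample document is the one used above; the variable names are hypothetical.

```python
import json
import re

changed_user_data = '{"DeviceAddresses":[{"Id": 42,"More": "stuff"}]}'

# Approach 1: escape the verbatim parts and capture lazily in between.
prefix = re.escape('DeviceAddresses":[{"Id":')
suffix = re.escape(',"')
m = re.search(prefix + r"(.*?)" + suffix, changed_user_data)
id_by_regex = m.group(1).strip() if m else None  # captured text, a string

# Approach 2: parse the JSON and use ordinary indexing.
id_by_parse = json.loads(changed_user_data)["DeviceAddresses"][0]["Id"]

print(id_by_regex, id_by_parse)
```

Note that the regex route yields text that still has to be trimmed and converted, while the parsed route returns the value with its JSON type.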

Json dumps returns "\u0001" for "\1". I need to print the exact characters "\1" after passing to json dumps

Here is my code:
import json
a = "\1"
print json.dumps(a)
It returns "\u0001", instead of desired "\1".
Is there any way to get the exact characters back after passing the string through json.dumps?
In Python, the string literal "\1" represents just one character of which the character code is 1. The backslash functions here as an escape to provide the character code as an octal value.
So either escape the backslash like this:
a = "\\1"
Or use the raw string literal notation with the r prefix:
a = r"\1"
Both will assign exactly the same string: print a produces:
\1
The output of json.dumps(a) will be:
"\\1"
... as also in JSON format, a literal backslash (reverse solidus) needs to be escaped by another backslash. But it truly represents \1.
The following prints True:
print a == json.loads(json.dumps(a))
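A small Python 3 sketch of the points above (the snippets in this thread use Python 2 print statements; the variable names here are illustrative):

```python
import json

octal = "\1"   # one character with code point 1 (octal escape)
raw = r"\1"    # two characters: a backslash and the digit 1

assert len(octal) == 1 and ord(octal) == 1
assert raw == "\\1" and len(raw) == 2

encoded = json.dumps(raw)
print(encoded)                     # JSON text, with the backslash escaped
print(json.loads(encoded) == raw)  # the round trip restores the original
```

The doubled backslash exists only in the serialized JSON text; decoding it gives back the two-character string.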

Tcl Regular expression not working at the end of line

I'm trying to match a file that looks like this:
22.000 abc_/dasdf
23.652 abc_1/dasdf_0/l
The regular expression I used is this:
[regexp { (\S+)\s+(.+) } $line -> number name]
However, it only matches when there is a space after the string in the file. For example, it returns a match when:
22.000 abc_/dasdf<space>
But no match when there is nothing after /dasdf. By default, there are no such spaces after the string inside the file. Any reason why this could be?
That's because you have spaces inside the braces. Those are significant.
Use
regexp {(\S+)\s+(.+)} $line -> number name
# ......^...........^ no spaces here
or if you want whitespace for readability:
regexp -expanded { (\S+) \s+ (.+) } $line -> number name
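The same pitfall exists in other regex engines. As a rough illustration in Python (not Tcl, so treat it as an analogy), spaces in a pattern are literal unless the verbose flag is set, which roughly corresponds to Tcl's -expanded. The sample line is taken from the question; strict and relaxed are hypothetical names.

```python
import re

line = "22.000 abc_/dasdf"  # no trailing space, as in the file

# Literal spaces in the pattern must match literal spaces in the input.
strict = re.search(r" (\S+)\s+(.+) ", line)

# With re.VERBOSE, unescaped whitespace in the pattern is ignored.
relaxed = re.search(r" (\S+) \s+ (.+) ", line, re.VERBOSE)

print(strict)
print(relaxed.groups())
```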

How to match a colon after a close bracket

Why does the following not match the :
expect {
timeout {puts timedout\n}
\[1\]: {puts matched\n}
}
> expect test.tcl
[1]:
timedout
If I change it and remove the colon the match works:
expect {
timeout {puts timedout\n}
\[1\] {puts matched\n}
}
$ expect test.tcl
[1]
matched
Or if I get rid of the 1st bracket
expect {
timeout {puts timedout\n}
1\]: {puts matched\n}
}
then it matches:
$ expect test.tcl
1]:
matched
It is not the problem with :, but with [.
The [ is special to both Tcl and the Expect pattern matcher so it is particularly messy. To match a literal [, you have to backslash once from Tcl and then again so that it is not treated as a range during pattern matching. The first backslash, of course, has to be backslashed to prevent it from turning the next backslash into a literal backslash!
expect "\\\[" ; #matches literal '['
So, your code should be,
expect {
timeout {puts timedout\n}
\\\[1]: {puts matched\n}
}
You can prefix the ] with a backslash if it makes you feel good, but it is not necessary. Since there is no matching left-hand bracket to be matched within the double-quoted string, nothing special happens with the right-hand bracket. It stands for itself and is passed on to the Expect command, where it is then interpreted as the end of the range.
The next set of examples shows the behavior of [ as a pattern preceded by differing numbers of backslashes. If the [ is not prefixed by a backslash, Tcl interprets whatever follows as a command. For these examples, imagine that there is a procedure named XY that returns the string n*w.
expect "[XY]" ; # matches n followed by anything
expect "\[XY]" ; # matches X or Y
expect "\\[XY]" ; # matches n followed by anything followed by w
expect "\\\[XY]" ; # matches [XY]
expect "\\\\[XY]" ; # matches \ followed by n followed ...
expect "\\\\\[XY]" ; # matches sequence of \ and X or Y
The \\[XY] case deserves close scrutiny. Tcl interprets the first backslash to mean that the second is a literal character. Tcl then produces n*w as the result of the XY command. The pattern matcher ultimately sees the four-character string \n*w. The pattern matcher interprets this in the usual way. The backslash indicates that the n is to be matched literally (which it would even without the backslash since the n is not special to the pattern matcher).
Source: Exploring Expect
The patterns that worked for me:
-exact {[1]:}
-exact "\[1]:"
{\[1]:}
"\\\[1]:"

Regex to match spaces before comma but not after

I would like to have a regex that will not allow spaces AFTER comma but spaces before comma should be allowed. The comma should also be optional.
My current regex:
^[\w,]+$
I have tried to add \s in it and also tried ^[\w ,]+$ but that allows spaces after comma as well!
This should be the test case:
Hello World // true
Hello, World // false (space after comma)
Hello,World // true
Hello,World World // false
Any help would be appreciated!
The below regex won't allow space after a comma,
^[\w ]+(?:,[^ ]+)?$
Explanation:
^ Start of a line.
[\w ]+ Matches a word character or a space, one or more times.
(?:) This is a non-capturing group. Anything matched inside it is not captured.
(?:,[^ ]+)? A comma followed by one or more characters that are not a space. The ? after the non-capturing group makes the whole group optional.
$ End of a line.
I guess it depends on what you want to do. If you are just testing for the presence of the grammar error, you can use something like:
var patt = / ,/; // or /\s,/ for any whitespace; no g flag, because test() on a global regex keeps state between calls
var str = 'Hello ,World ,World';
var str2 = 'Hello, World, World';
console.log( patt.test(str) ) // true: there are spaces before commas
console.log( patt.test(str2) ) // false: no space before a comma, so the string is OK
Lookaheads are useful but can be hard to understand without knowing the basics.
Use this site, it is great for visualising your Regex
You can use this regex.
^[\w ]+(?:,\S+)?$
Explanation:
^ # the beginning of the string
[\w ]+ # any character of: word characters, ' ' (1 or more times)
(?: # group, but do not capture (optional):
, # ','
\S+ # non-whitespace (all but \n, \r, \t, \f, and " ") (1 or more times)
)? # end of grouping
$ # before an optional \n, and the end of the string
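To check a candidate pattern against the four test cases from the question, a minimal sketch follows. Python is used purely for illustration, is_valid is a hypothetical helper, and re.fullmatch takes the place of the ^ and $ anchors.

```python
import re

# Words and spaces, optionally followed by one comma and a run of
# non-whitespace characters.
PATTERN = re.compile(r"[\w ]+(?:,\S+)?")

def is_valid(text: str) -> bool:
    """Return True if the whole string fits the pattern."""
    return PATTERN.fullmatch(text) is not None

# Expected results as listed in the question.
cases = {
    "Hello World": True,
    "Hello, World": False,
    "Hello,World": True,
    "Hello,World World": False,
}
for text, expected in cases.items():
    print(f"{text!r}: {is_valid(text)} (expected {expected})")
```

Keep in mind that this pattern accepts at most one comma; inputs with several commas would need a repeated group instead of an optional one.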