How to modify a regular expression so that it extracts the same field from both log formats? - extract

When looking for some logs in one file I got two types of logs (one with white spaces and one without).
I would now like to extract doSomething and doAnotherThing out of these logs with one regular expression.
Logfile 1:
"taskType":"doSomething"
Logfile 2:
"taskType" : "doAnotherThing"
I wrote this regular expression: taskType.....(?<taskType1>\w+)
It works well for Logfile 1 but not for Logfile 2, because it cuts off the first two characters of the word. Is there a way to eliminate this issue?
Thanks!

"taskType"\s?:\s?"(doSomething|doAnotherThing)" works for me, try it here https://regex101.com/r/Mx1RtT/1

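A minimal Python sketch of the same idea, assuming the log lines look like the two samples in the question. It uses \s* rather than \s? so that any amount of whitespace around the colon is tolerated, and \w+ so the task names do not have to be listed in advance. The function name extract_task_type is made up for illustration.

```python
import re

# Optional whitespace on either side of the colon; the value is captured
# in a named group. "extract_task_type" is a hypothetical helper name.
TASK_TYPE = re.compile(r'"taskType"\s*:\s*"(?P<taskType>\w+)"')

def extract_task_type(line):
    """Return the taskType value found in line, or None if there is none."""
    match = TASK_TYPE.search(line)
    return match.group("taskType") if match else None

print(extract_task_type('"taskType":"doSomething"'))
print(extract_task_type('"taskType" : "doAnotherThing"'))
```

Anchoring on the quotes and the colon, instead of skipping a fixed number of characters, is what keeps the capture from starting too early or too late.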
Related

How to get rid of &nbsp; in AA

I am reading data (a combination of letters and numbers) from an Excel sheet and putting it into a text field in the target application, where the input should yield a unique item from a database.
However, there is sometimes a whitespace character after the data in the Excel cell, which results in "no data found" when it is entered into the search field in the target application. The whitespace does not seem to be an ordinary space, since I am unable to trim it within AA. I guess it is a &nbsp; (or some similar HTML special character).
edit: confirmed to be a &nbsp; by now.
Q: How can I get rid of such characters within AA?
Tried: Neither (a) Trim, (b) Replace " " ->"", nor (c) Replace " "->"" work.
Workaround: I currently check the length of the data provided: if it is longer than 10 characters, I take only the leftmost 10. This works here because it is a business rule for the data I am working with, but I am still interested in a proper solution, since there may be future cases where no business rule will help me out.
AA Version: 11.3.1
Thankful for input...
Okay, since it's a non-breaking space character, you can replace it using a regex in the Replace command.
Find: \u00a0
Options: Regular Expression.
Got rid of it using Replace Command with RegEx ticked:
[^a-z A-Z 0-9]
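This is not AA code, only a Python sketch of what the two regex replacements above do, assuming the stray character is U+00A0 as confirmed in the question. The helper names and the sample cell value are hypothetical.

```python
import re

NBSP = "\u00a0"  # the non-breaking space that &nbsp; stands for

def strip_nbsp(value):
    """Targeted fix: remove every non-breaking space and nothing else."""
    return re.sub(NBSP, "", value)

def keep_allowed(value):
    """Broader fix, same character class as the second answer:
    drop everything except ASCII letters, digits and ordinary spaces."""
    return re.sub(r"[^a-z A-Z 0-9]", "", value)

cell = "AB12345678" + NBSP  # made-up cell content with a trailing NBSP
print(repr(strip_nbsp(cell)))
print(repr(keep_allowed(cell)))
```

The targeted version leaves punctuation and other symbols alone, while the character-class version also removes them, so the second one only fits data that is known to be purely alphanumeric.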

What does 'Multiline strings are different' mean in RIDE (Robot Framework) output?

I am trying to compare the data of two CSV files and followed the process below in RIDE:
${csvA} = Get File ${filePathA}
${csvB} = Get File ${filePathB}
Should Be Equal As Strings ${csvA} ${csvB}
Here are the contents of my two CSV files:
csvA data
Harshil,45,8.03,DMJ
Divy,55,8,VVN
Parth,1,9,vvn
kjhjmb,44,0.5,bugg
csvB data
Harshil,45,8.03,DMJ
Divy,55,78,VVN
Parth,1,9,vvnbcb
acc,5,6,afafa
Since some of the data does not match, the result is FAIL when I run the code in RIDE. But the log shows the data below:
Multiline strings are different:
--- first
+++ second
@@ -1,4 +1,4 @@
Harshil,45,8.03,DMJ
-Divy,55,8,VVN
-Parth,1,9,vvn
-kjhjmb,44,0.5,bugg
+Divy,55,78,VVN
+Parth,1,9,vvnbcb
+acc,5,6,afafa
I would like to know the meaning of the --- first, +++ second and @@ -1,4 +1,4 @@ content.
Thanks in advance!
When Robot Framework compares multiline strings (data that has newlines in it), it shows the differences in the format produced by the standard Unix tool diff with the -u option. Those characters are all part of what's called a unified diff. Even though you pass in raw data, it treats the data as two files and shows the differences between the two in a format familiar to most programmers.
Here are two references to read more about the format:
What does "@@ -1 +1 @@" mean in Git's diff output? (stackoverflow)
the diff man page (gnu.org)
In short, the @@ line tells you which line ranges are being compared (-1,4 means 4 lines starting at line 1 of the first string, +1,4 means the same for the second), and the - and + prefixes mark lines that appear only in the first or only in the second string.
In your specific example it's telling you that three lines differ between the two strings: the line beginning with Divy, the line beginning with Parth, and the last line (kjhjmb in the first string, acc in the second). Since the line beginning with Harshil has neither a + nor a -, it is identical in both strings.
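To see the format outside of Robot Framework, here is a small Python sketch that feeds the two CSV strings from the question to the standard library's difflib.unified_diff. That Robot Framework builds its message in exactly this way is an assumption; the sketch only illustrates the unified diff format itself.

```python
import difflib

csv_a = """Harshil,45,8.03,DMJ
Divy,55,8,VVN
Parth,1,9,vvn
kjhjmb,44,0.5,bugg"""

csv_b = """Harshil,45,8.03,DMJ
Divy,55,78,VVN
Parth,1,9,vvnbcb
acc,5,6,afafa"""

# Compare line by line; "first" and "second" are just labels for the
# --- and +++ header lines.
diff_lines = list(difflib.unified_diff(
    csv_a.splitlines(),
    csv_b.splitlines(),
    fromfile="first",
    tofile="second",
    lineterm="",
))
print("\n".join(diff_lines))
```

The printed hunk header starts and ends with @@, and every line after it begins with a space (unchanged), a - (only in the first string) or a + (only in the second string).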

I need to remove a piece of every line in my json file

I have JSON output in my notepad and I know it is not in the correct format. At the end of each line there is a timestamp which is causing the bad format. I want to get rid of it using find and replace since the file is pretty big. The format is as follows:
"eventtimestamp": "05 23 2017 04:01:02"}
The above piece comes at the end of every line. How can I get rid of it using find and replace or any other way?
All help is appreciated.
Thank you
If you need to alter every line in a consistent way then regex find/replace is a good option. Free tools like atom.io, Notepad++, and plenty of others offer this feature.
Assuming "eventtimestamp" is constant, then a simple regex that says "find everything starting with "eventtimestamp" and up to a '}'" will work.
"eventtimestamp".*(?=})
And "replace" that with an empty string.
P.S. Here's a demo of the regex on regexr.com; hovering over the parts of the pattern explains what they do.
If you are not sure that the eventtimestamp field always comes at the end of a line and/or as the last element of the object, prefer this kind of pattern: "eventtimestamp":\s*"[^"]+",?
Note the useful idiom of a delimiter followed by a negated character class, "[^"]+", which matches everything up to the next delimiter and can be adapted to any other delimiter.
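For a file too large to edit comfortably by hand, the same replacement can be scripted. The sketch below is one possible variant in Python, assuming every line holds a single JSON object and the timestamp value contains no double quotes. Unlike the patterns above, it also removes the comma next to the field so the remaining object stays valid; the names FIELD, PATTERN and drop_timestamp are made up for illustration.

```python
import re

# The field itself: key, optional whitespace, double-quoted value.
FIELD = r'"eventtimestamp":\s*"[^"]+"'

# Two cases: the field followed by a comma (it comes first), or the field
# with an optional comma before it (it is in the middle, last, or alone).
PATTERN = re.compile(rf'{FIELD}\s*,\s*|,?\s*{FIELD}')

def drop_timestamp(line):
    """Return line with the eventtimestamp field removed."""
    return PATTERN.sub("", line)

sample = '{"id": 7, "eventtimestamp": "05 23 2017 04:01:02"}'
print(drop_timestamp(sample))
```

To process a whole file, apply drop_timestamp to each line while reading it and write the result to a new file, rather than loading everything into memory at once.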

word2vec : find words similar in a case insensitive manner

I have access to word vectors trained on a text corpus of interest to me. The issue I am faced with is that these vectors are case sensitive, i.e. "Him" is different from "him", which is different from "HIM".
I would like to find the words most similar to the word "Him" in a case-insensitive manner. I use the distance.c program that comes bundled with the Google word2vec package. Here is where I run into an issue.
Should I pass "Him him HIM" as arguments to the distance.c executable? This would return the set of words closest to the 3 words.
Or should I run the distance.c program separately with each of the 3 arguments ("Him" and "him" and "HIM"), and then put together these lists in a sensible way to arrive at the most similar words? Please suggest.
If you want to find similar words in a case-insensitive manner, you should convert all your word vectors to lowercase or uppercase, and then run the compiled version of distance.c.
This is fairly easy to do using standard shell tools.
For example, if your original data is in a file called input.txt, the following will work on most Unix-like shells.
tr '[:upper:]' '[:lower:]' < input.txt > output.txt
You can transform the binary format to text, then manipulate as you see fit.

Trying to redirect output of a command to a variable

>> set signal_name [get_fanout abc_signal]
{xyz_blah_blah}
>> echo $signal_name
#142
>> set signal_name [get_fanout abc_signal]
{xyz_blah_blah}
>> echo $signal_name
#144
>>
I tried other things like catch, and everywhere it returns #number. My goal is to print the actual value, xyz_blah_blah, instead of the number.
I am new to Tcl and want to understand whether this is an array, a pointer to an array, or something like that. When I try the exact same thing with a different command that returns just a value, it works. This is a new command, which returns its value in braces.
Please help. Thanks.
Every Tcl command produces a result value, which you capture and use by putting the call of the command in [square brackets] and putting the whole lot as part of an argument to another command. Thus, in:
set signal_name [get_fanout abc_signal]
the result of the call to get_fanout is used as the second argument to set. I suggest that you might also like to try doing this:
puts "-->[get_fanout abc_signal]<--"
It's just the same, except this time we're concatenating it with some other small string bits and printing the whole lot out. (In case you're wondering, the result of puts itself is always the empty string if there isn't an error, and set returns the contents of the variable.)
If that is still printing the wrong value (as well as the right one beforehand, without arrow marks around it) the real issue may well be that get_fanout is not doing what you expect. While it is possible to capture the standard output of a command, doing so is a considerably more advanced technique; it is probably better to consider whether there is an alternate mechanism to achieve what you want. (The get_fanout command is not a standard part of the Tcl language library or any very common add-on library like Tk or the Tcllib collection, so we can only guess at its behavior.)