Hello,
How can I rename this:
"Id": "3",
to this:
"Id": 3,
I have a long file with string records.
I tried using the IntelliJ renamer with the pattern "\d+", but $0 returns the complete string with the quotes.
You need to take a look at how regex groups work.
The $0 will always represent the entire match. To get a subsection of it (the number in your case), you need to put parentheses around the relevant portion to create a capturing group; you can then reference each group by a 1-based index.
So in your case, a pattern of "(\d+?)" on your sample string would return "3" for $0 (the entire match), and 3 for $1 (the first capturing group).
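For example, in IntelliJ's Replace dialog (with the Regex option enabled), a find/replace pair along these lines should handle the sample shown; the key name is simply taken from your example:
Find:    "Id": "(\d+)"
Replace: "Id": $1
Here $1 refers to the digits captured by the parentheses, so the surrounding quotes are dropped from the replacement.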
Consider the following JSON file example.json:
{
    "key1": ["arr value 1", "arr value 2", "arr value 3"],
    "key2": {
        "key2_1": ["a1", "a2"],
        "key2_2": {
            "key2_2_1": 1.43123123,
            "key2_2_2": 456.3123,
            "key2_2_3": "string1"
        }
    }
}
The following jq command extracts a value from the above file:
jq ".key2.key2_2.key2_2_1" example.json
Output:
1.43123123
Is there an option in jq that, instead of printing the value itself, prints the location (line and column, start and end position) of the value within a (valid) JSON file, given an Object Identifier-Index (.key2.key2_2.key2_2_1 in the example)?
The output could be something like:
some_utility ".key2.key2_2.key2_2_1" example.json
Output:
(6,25) (6,35)
Given JSON data and a query, there is no option in jq that, instead of printing the value itself, prints the location of possible matches.
This is because JSON parsers providing an interface to developers usually focus on processing the logical structure of a JSON input, not the textual stream conveying it. You would have to instruct it to explicitly treat its input as raw text, while properly parsing it at the same time in order to extract the queried value. In the case of jq, the former can be achieved using the --raw-input (or -R) option, the latter then by parsing the read-in JSON-encoded string using fromjson.
The -R option alone would read the input linewise into an array of strings, which would have to be concatenated (e.g. using add) in order to provide the whole input at once to fromjson. The other way round, you could also provide the --slurp (or -s) option which (in combination with -R) already concatenates the input to a single string which then, after having parsed it with fromjson, would have to be split again into lines (e.g. using /"\n") in order to provide row numbers. I found the latter to be more convenient.
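As a small sketch of that difference, using the example file from the question (only the shape of the output matters here, not the exact values):
jq -R '.' example.json             # without -s: each input line becomes its own JSON string
jq -Rs '.' example.json            # with -s: the whole file becomes one single JSON string
jq -Rs 'fromjson' example.json     # parsing that string back yields the original JSON structure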
That said, this could give you a starting point (the --raw-output (or -r) option outputs raw text instead of JSON):
jq -Rrs '
"\(fromjson.key2.key2_2.key2_2_1)" as $query # save the query value as string
| ($query | length) as $length # save its length by counting its characters
| ./"\n" | to_entries[] # split into lines and provide 0-based line numbers
| {row: .key, col: .value | indices($query)[]} # find occurrences of the query
| "(\(.row),\(.col)) (\(.row),\(.col + $length))" # format the output
' example.json
(5,24) (5,34)
Now, this works for the sample query, but how about the general case? Your example queried a number (1.43123123), which is an easy target as it has the same textual representation when encoded as JSON. Therefore, a simple string search and length count did a fairly good job (not a perfect one, because it would still find any occurrence of that character stream, not just "values"). Thus, for more precision, but especially with more complex JSON datatypes being queried, you would need to develop a more sophisticated searching approach, probably involving more JSON conversions, whitespace stripping and other normalizing shenanigans. So, unless your goal is to rebuild a full JSON parser within another one, you should narrow it down to the kind of queries you expect, and compose an appropriately tailored searching approach. This solution provides you with concepts to simultaneously process the input textually and structurally, and with a simple search and output integration.
I'm writing a script to manipulate some proper names in a web story, to help my reading tool pronounce them correctly.
I get the content of a webpage via
$webpage = (Invoke-WebRequest -URI 'https://wanderinginn.com/2018/03/20/4-20-e/').Content
This $webpage should be of type String.
Now
$webpage.IndexOf('<div class="entry-content">')
returns the correct value, yet
$webpage.IndexOf("Previous Chapter")
returns an unexpected value, and I need some explanation why, or how I can find the error myself.
In theory it should cut out the "body" of the page, run it through a list of proper nouns I want to replace, and push this into an .htm file.
It all works, but the value of IndexOf("Prev...") does not.
Edit:
After invoke-webrequest I can
Set-Clipboard $webrequest
and paste this into Notepad++; there I can find both 'div class="entry-content"' and 'Previous Chapter'.
If I do something like
Set-Clipboard $webpage.substring(
$webpage.IndexOf('<div class="entry-content">'),
$webpage.IndexOf('PreviousChapter')
)
I would expect PowerShell to correctly determine the first instance of each of those strings and cut between them. Therefore my clipboard should now hold my desired content, yet the string goes further than the first occurrence.
tl;dr
You had a misconception about how String.Substring() method works: the second argument must be the length of the substring to extract, not the end index (character position) - see below.
As an alternative, you can use a more concise (albeit more complex) regex operation with -replace to extract the substring of interest in a single operation - see below.
Overall, it's better to use an HTML parser to extract the desired information, because string processing is brittle (HTML allows variations in whitespace, quoting style, ...).
As Lee_Dailey points out, you had a misconception about how the String.Substring() method works: its arguments are:
a starting index (0-based character position),
from which a substring of a given length should be returned.
Instead, you tried to pass another index as the length argument.
To fix this, you must subtract the lower index from the higher one, so as to obtain the length of the substring you want to extract:
A simplified example:
# Sample input from which to extract the substring
# '>>this up to here'
# or, better,
# 'this up to here'.
$webpage = 'Return from >>this up to here<<'
# WRONG (your attempt):
# *index* of 2nd substring is mistakenly used as the *length* of the
# substring to extract, which in this case even *breaks*, because a length
# that exceeds the bounds of the string is specified.
$webpage.Substring(
$webpage.IndexOf('>>'),
$webpage.IndexOf('<<')
)
# OK, extracts '>>this up to here'
# The difference between the two indices is the correct length
# of the substring to extract.
$webpage.Substring(
($firstIndex = $webpage.IndexOf('>>')),
$webpage.IndexOf('<<') - $firstIndex
)
# BETTER, extracts 'this up to here'
$startDelimiter = '>>'
$endDelimiter = '<<'
$webpage.Substring(
($firstIndex = $webpage.IndexOf($startDelimiter) + $startDelimiter.Length),
$webpage.IndexOf($endDelimiter) - $firstIndex
)
General caveats re .Substring():
In the following cases this .NET method throws an exception, which PowerShell surfaces as a statement-terminating error; that is, by default the statement itself is terminated, but execution continues:
If you specify an index that is outside the bounds of the string (a 0-based character position less than 0 or one greater than the length of the string):
'abc'.Substring(4) # ERROR "startIndex cannot be larger than length of string"
If you specify a length whose endpoint would fall outside the bounds of the string (that is, if the index plus the length yields a position greater than the length of the string):
'abc'.Substring(1, 3) # ERROR "Index and length must refer to a location within the string"
That said, you could use a single regex (regular expression) to extract the substring of interest, via the -replace operator:
$webpage = 'Return from >>this up to here<<'
# Outputs 'this up to here'
$webpage -replace '^.*?>>(.*?)<<.*', '$1'
The key is to have the regex match the entire string and extract the substring of interest via a capture group ((...)) whose value ($1) can then be used as the replacement string, effectively returning just that.
For more information about -replace, see this answer.
Note: In your specific case an additional tweak is needed, because you're dealing with a multiline string:
$webpage -replace '(?s).*?<div class="entry-content">(.*?)Previous Chapter.*', '$1'
The inline option (?s) ensures that the metacharacter . also matches newline characters (so that .* matches across lines), which it doesn't by default.
Note that you may have to apply escaping to the search strings to embed in the regex, if they happen to contain regex metacharacters (characters with special meaning in the context of a regex):
With embedded literal strings, \-escape characters as needed; e.g., escape .txt as \.txt
If a string to embed comes from a variable, apply [regex]::Escape() to its value first; e.g.:
$var = '.txt'
# [regex]::Escape() yields '\.txt', which ensures
# that '.txt' doesn't also match '_txt'
'a_txt a.txt' -replace ('a' + [regex]::Escape($var)), 'a.csv'
I am looking to detect anomalies in my JSON values.
Here's an example of the data, queried via jq:
"2014-03-26 01:58:00"
"9019549360"
"109092812_20150626"
"134670164"
""
"97695498"
"680561513"
I would like to display all the values that contain a - or a _, or are blank.
In other words, I'd like to display the following output
"2014-03-26 01:58:00"
"109092812_20150626"
""
Now, I have tried the following:
select (. | contains("-","_"," "))'
This appears to work, but in order to make it more robust, I'd like to expand this to include all special characters.
Your query won't detect empty strings, and will possibly emit the same string more than once. It would be easier to use test, e.g.:
select( length==0 or test("[-_ ]") )
Note also that the preliminary '.' in your query is unnecessary.
Addendum
From one of the comments, it would appear that you will want to specify "[^a-zA-Z0-9]" or similar as the argument of test.
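For instance, assuming the values arrive as a stream of JSON strings in a file (values.json is just a placeholder name here), a complete invocation based on that character class could look like this:
jq 'select(length == 0 or test("[^a-zA-Z0-9]"))' values.json
For the sample values above, this would print the date string, the underscore-separated identifier and the empty string, while filtering out the purely numeric values.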
I'm mostly at a loss for how to describe this, so I'll start with a simple example that is similar to some JSON I'm working with:
"user_interface": {
username: "Hello, %USER.username%",
create_date: "Your account was created on %USER.create_date%",
favorite_color: "Your favorite color is: %USER.fav_color%"
}
The "special identifiers" located in the username create_date and favorite_color fields start and end with % characters, and are supposed to be replaced with the correct information for that particular user. An example for the favorite_color field would be:
Your favorite color is: Orange
Is there a proper term for these identifiers? I'm trying to search google for best practices or libraries when parsing these before I reinvent the wheel, but everything I can think of results in a sea of false-positives.
Just some thoughts on the subject of %special identifier%. Let's take a look at a small subset of examples that implement a similar kind of string replacement.
WSH Shell ExpandEnvironmentStrings Method
Returns an environment variable's expanded value.
WSH .vbs code snippet
Set WshShell = WScript.CreateObject("WScript.Shell")
WScript.Echo WshShell.ExpandEnvironmentStrings("WinDir is %WinDir%")
' WinDir is C:\Windows
.NET Composite Formatting
The .NET Framework composite formatting feature takes a list of objects and a composite format string as input. A composite format string consists of fixed text intermixed with indexed placeholders, called format items, that correspond to the objects in the list. The formatting operation yields a result string that consists of the original fixed text intermixed with the string representation of the objects in the list.
VB.Net code snippet
Console.WriteLine(String.Format("Prime numbers less than 10: {0}, {1}, {2}, {3}, {4}", 1, 2, 3, 5, 7 ))
' Prime numbers less than 10: 1, 2, 3, 5, 7
JavaScript replace Method (with RegEx application)
... The match variables can be used in text replacement where the replacement string has to be determined dynamically... $n ... The nth captured submatch ...
Also called Format Flags, Substitution, Backreference and Format specifiers.
JavaScript code snippet
console.log("Hello, World!".replace(/(\w+)\W+(\w+)/g, "$1, dear $2"))
// Hello, dear World!
Python Format strings
Format strings contain “replacement fields” surrounded by curly braces {}. Anything that is not contained in braces is considered literal text, which is copied unchanged to the output...
Python code snippet
print "The sum of 1 + 2 is {0}".format(1+2)
# The sum of 1 + 2 is 3
Ruby String Interpolation
Double-quote strings allow interpolation of other values using #{...} ...
Ruby code snippet
res = 3
puts "The sum of 1 + 2 is #{res}"
# The sum of 1 + 2 is 3
TestComplete Custom String Generator
... A string of macros, text, format specifiers and regular expressions that will be used to generate values. The default value of this parameter is %INT(1, 2147483647, 1) %NAME(ANY, FULL) lives in %CITY. ... Also, you can format the generated values using special format specifiers. For instance, you can use the following macro to generate a sequence of integer values with the specified minimum length (3 characters) -- %0.3d%INT(1, 100, 3).
Angular Expression
Angular expressions are JavaScript-like code snippets that are mainly placed in interpolation bindings such as {{ textBinding }}...
Django Templates
Variables are surrounded by {{ and }} like this: My first name is {{ first_name }}. My last name is {{ last_name }}. With a context of {'first_name': 'John', 'last_name': 'Doe'}, this template renders to: My first name is John. My last name is Doe.
Node.js v4 Template strings
... Template strings can contain place holders. These are indicated by the Dollar sign and curly braces (${expression}). The expressions in the place holders and the text between them get passed to a function...
JavaScript code snippet
var res = 3;
console.log(`The sum of 1 + 2 is ${res}`);
// The sum of 1 + 2 is 3
C/C++ Macros
Preprocessing expands macros in all lines that are not preprocessor directives...
Replacement in source code.
C++ code snippet
std::cout << __DATE__;
// Jan 8 2016
AutoIt Macros
AutoIt has a number of Macros that are special read-only variables used by AutoIt. Macros start with the @ character ...
Replacement in source code.
AutoIt code snippet
MsgBox(0, "", "CPU Architecture is " & @CPUArch)
; CPU Architecture is X64
SharePoint solution Replaceable Parameters
Replaceable parameters, or tokens, can be used inside project files to provide values for SharePoint solution items whose actual values are not known at design time. They are similar in function to the standard Visual Studio template tokens... Tokens begin and end with a dollar sign ($) character. Any tokens used are replaced with actual values when a project is packaged into a SharePoint solution package (.wsp) file at deployment time. For example, the token $SharePoint.Package.Name$ might resolve to the string "Test SharePoint Package."
Apache Ant Replace Task
Replace is a directory based task for replacing the occurrence of a given string with another string in selected file... token... the token which must be replaced...
So, based on the functional context, I would call it a %token% (a flavor of string with an identified "meaning").
This is a bit of a .json file I need to find information in:
"title":
"Spring bank holiday","date":"2012-06-04","notes":"Substitute day","bunting":true},
{"title":"Queen\u2019s Diamond Jubilee","date":"2012-06-05","notes":"Extra bank holiday","bunting":true},
{"title":"Summer bank holiday","date":"2012-08-27","notes":"","bunting":true},
{"title":"Christmas Day","date":"2012-12-25","notes":"","bunting":true},
{"title":"Boxing Day","date":"2012-12-26","notes":"","bunting":true},
{"title":"New Year\u2019s Day","date":"2013-01-01","notes":"","bunting":true},
{"title":"Good Friday","date":"2013-03-29","notes":"","bunting":false},
{"title":"
The file is much longer, but it is one long line of text.
I would like to display what bank holiday it is after a certain date, and also if it involves bunting.
I've tried grep and sed but I can't figure it out.
I'd like something like this:
[command] between [date] and [}] display [title] and [bunting]/[no bunting]
[title] should be just "Christmas Day" or something else
Forgot to mention:
I would like to achieve this in bash shell, either from the prompt or from a short bit of code.
You should use a proper JSON parser in a decent programming language; then you can do a lot of work in a safe way without too much code. How about this little Python code:
#!/usr/bin/env python
import json
with open('my.json') as jsonFile:
    holidays = json.load(jsonFile)

for holiday in holidays:
    if holiday['date'] > '2012-05-06':
        print holiday['date'], ':', holiday['title'], \
            ("bunting" if holiday['bunting'] else "no bunting")
        break  # in case you only want one line of output
I could not figure out what exactly the output should be; if you can be more specific, I can adjust my example.
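For completeness, and assuming (like the Python example above) that the file really is a JSON array of holiday objects, a roughly equivalent jq one-liner might look like this; adjust the date and the my.json placeholder as needed:
jq -r '.[] | select(.date > "2012-05-06") | "\(.date): \(.title) (\(if .bunting then "bunting" else "no bunting" end))"' my.json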
You can try this with awk:
awk -F"}," '{for(i=1;i<=NF;i++){print $i}}' file.json | awk -F"\"[:,]\"?" '$4>"2013-01-01"{printf "%s:%s:%s\n" ,$2,$4,$8}'
Seeing that the JSON file is one long string, we first split this line into multiple JSON records on },. Then each individual record is split on a combination of ":, characters with an optional closing ". We then only output the line if it's after a certain date.
This will find all records after Jan 1 2013.
EDIT:
The 2nd awk splits each individual json record into key-value pairs using a sub-string starting with ", followed by either a : or ,, and an optional ending ".
So in your example it will split on either ",", ":" or ":.
All odd fields are keys, and all even fields are values (hence $4 being the date in your example). We then check whether $4 (the date) is after 2013-01-01.
I noticed I made a mistake with the optional " (it should be followed by ? instead of *) in the split, which I have now corrected, and I also used the printf function to display the values.