I want to match keys in a JSON string using grep in a Linux shell. My objective is to remove the JSON keys so that the values come out as CSV. Please help me with the regex. I tried "(.*?)": against this input:
{"field1":"value1","field2":"value2"}
But the above regex matches "field1": and then "value1","field2":
So basically it shouldn't match groups containing a comma. I know this should be done in Python or Java, but I want to avoid deploying an application on that specific server. Also, internet access has been revoked from this server, along with many other restrictions, so I cannot install any new tools or commands. Is it possible?
You can try the following regex:
"([^"]+?)"\s*:
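It matches any run of one or more non-quote characters enclosed in double quotes (" ") and followed by a : (allowing optional whitespace before the colon). Because [^"] cannot cross a quote, the match can no longer span "value1","field2".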
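To see what this pattern does on the sample input, here is a minimal sketch in Python (used only for illustration; the name `strip_keys` is made up for this example). It assumes flat JSON on one line whose values never contain the sequence `":`, which would be mistaken for the end of a key.

```python
import re

# Same pattern as in the answer: a quoted run of non-quote characters,
# optional whitespace, then a colon.
KEY_PATTERN = re.compile(r'"([^"]+?)"\s*:')


def strip_keys(json_line):
    """Remove every "key": prefix and the outer braces from a flat JSON line."""
    without_keys = KEY_PATTERN.sub("", json_line)
    return without_keys.strip().strip("{}")


if __name__ == "__main__":
    print(strip_keys('{"field1":"value1","field2":"value2"}'))
```

Values that themselves contain quotes or colons would need a real JSON parser; the regex only works for simple cases like the one in the question.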
I have a very big HTML file (talking about 20MB) and I need to remove from the file a large amount of nodes of the form:
<tr><td>SPECIFIC-STRING</td><td>RANDOM-STRING</td><td>RANDOM-STRING</td></tr><tr><td style="padding-top:0" colspan="3">RANDOM-STRING</td></tr>
The file I need to work on is basically made of thousands of these strings, and I only need to remove those that have a specific first string, for instance, all those with the first string being "banana":
<tr><td>banana</td><td>RANDOM-STRING</td><td>RANDOM-STRING</td></tr><tr><td style="padding-top:0" colspan="3">RANDOM-STRING</td></tr>
I tried achieving this opening the file in Geany and using the replace feature with this regex:
<tr><td>banana<\/td><td>(.*)<\/td><td>(.*)<\/td><\/tr><tr><td(.*)<\/td><\/tr>
but the console reported that it removed only X occurrences, when I know there are far more occurrences than that in the file.
Firefox, Chrome and Brackets fail even to display the HTML code of the file due to its size. I can't think of another way to do this because of my inexperience with HTML.
You could use a stream editor, which, as the name suggests, streams the file content and thus never loads the whole file into main memory.
A popular editor is sed. It does support RegEx.
Your command would have the following structure.
sed -i -E 's/SEARCH_REGEX/REPLACEMENT/g' INPUTFILE
-E for support of extended RegEx
-i for in-place editing mode
s denotes that you want to replace values
g is for global. By default sed replaces only the first occurrence on each line, so to replace all occurrences you must provide g
SEARCH_REGEX is the RegEx you need to find the substrings you want to replace
REPLACEMENT is the value you want to replace all matches with
INPUTFILE is the file sed will read line by line, doing the replacement for you.
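The line-by-line behaviour described above can be sketched in Python as follows. This is only an illustration of the streaming idea, not a replacement for sed; the function name `substitute_lines` is hypothetical. Note also that Python's `re` syntax is not identical to the extended regular expressions used by `sed -E`: for example, lazy quantifiers such as `.*?` are a Python/Perl-style feature and are not part of POSIX extended regular expressions.

```python
import re


def substitute_lines(lines, search_regex, replacement):
    """Yield each input line with every match replaced (like s/.../.../g).

    `lines` can be any iterable of strings; passing an open file object
    means only one line is held in memory at a time.
    """
    pattern = re.compile(search_regex)
    for line in lines:
        yield pattern.sub(replacement, line)


if __name__ == "__main__":
    sample = ["foo bar foo\n", "no match here\n"]
    for out in substitute_lines(sample, r"foo", "baz"):
        print(out, end="")
```

As with sed, this only avoids loading the whole file if the file actually contains line breaks; a 20MB file consisting of a single line is still read in one piece.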
While regex may not be the best tool for this kind of job, try this adjustment to your pattern:
<tr><td>banana<\/td><td>(.*?)<\/td><td>(.*?)<\/td><\/tr><tr><td(.*?)<\/td><\/tr>
That makes your .* matches lazy. I suspect the greedy patterns are consuming too much, so that a single match swallows several rows at once.
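The difference between the greedy and the lazy pattern can be checked with a small Python experiment. This is a sketch under assumptions: the helper `block` and the cell contents x, y, z are invented test data, and Geany's regex engine may not behave exactly like Python's `re`. It shows one plausible reason for the low replacement count: when several rows sit on the same line, one greedy match runs from the first banana row to the last closing tag.

```python
import re

# Greedy and lazy variants of the pattern from the question. The slashes
# are left unescaped because Python does not need "\/".
GREEDY = r'<tr><td>banana</td><td>(.*)</td><td>(.*)</td></tr><tr><td(.*)</td></tr>'
LAZY = r'<tr><td>banana</td><td>(.*?)</td><td>(.*?)</td></tr><tr><td(.*?)</td></tr>'


def block(first):
    """Build one two-row entry shaped like the ones in the question."""
    return ('<tr><td>' + first + '</td><td>x</td><td>y</td></tr>'
            '<tr><td style="padding-top:0" colspan="3">z</td></tr>')


# Three entries on a single line: banana, apple, banana.
html = block("banana") + block("apple") + block("banana")

greedy_result = re.sub(GREEDY, "", html)  # one match removes everything
lazy_result = re.sub(LAZY, "", html)      # two matches, apple entry kept

if __name__ == "__main__":
    print(len(re.findall(GREEDY, html)), "greedy match(es)")
    print(len(re.findall(LAZY, html)), "lazy match(es)")
```

With the greedy pattern the apple entry is deleted too, which is worse than a low count: data that should have stayed is lost.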
I am looking for an elegant way to parse a text file (i.e. a log file containing source and destination IPs and lots of other data) keeping each line intact, and replacing all IPv4 addresses with the same IP followed by a comma and the GeoIP country code of that IP.
I have tried doing this in bash, sed, perl, and python. I tried a hundred perl one-liners and never quite got it because substitution like s/original/replacement/g doesn't want to execute GeoIP lookup in the substitution field. For example:
perl -pe 's/([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})/($1,system(geoiplookup $1))/g' < log.csv
results in:
"srcip=(110.110.110.110,system(geoiplookup 110.110.110.110))"
instead of executing geoiplookup.
I've tried this with backticks as well as exec, lots of different punctuation, with the same result.
In Python I tried some code that looks like:
rexp_ip = r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
repl = { rexp_ip: rexp_ip+".test" }
---
while line:
line = i.readline()
print(re.sub(rexp_ip, lambda m: str(repl.get(m.group())), line))
It seems pretty close but I'm not sure whether I'm on the right track here.
I would be open to bash, sed, awk, perl, python, or any other solution.
This seems fairly simple to me and I may be over-thinking it!
I am guessing I'm not the first person who has tried this and maybe I'm 'reinventing the wheel' here.
Any insight would be appreciated.
I may have solved my own problem using perl with the /e switch, which evaluates the replacement as Perl code instead of treating it as a plain string:
$ perl -lpe 's/([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})/(`printf $1;geoiplookup $1`)/eg' < log.csv
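The Python attempt in the question was close: `re.sub` accepts a function as the replacement, which plays the same role as Perl's /e switch. Below is a minimal sketch. The lookup is a hypothetical stand-in (`lookup_country` and the table `FAKE_COUNTRIES` with the placeholder codes XX and YY are invented for illustration); a real version would call an actual GeoIP tool or library at that point.

```python
import re

# Four dot-separated groups of 1-3 digits; like the pattern in the
# question, this does not check that each group is <= 255.
IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

# Hypothetical stand-in for a real GeoIP lookup. The codes are
# placeholders, not real country assignments.
FAKE_COUNTRIES = {"110.110.110.110": "XX", "8.8.8.8": "YY"}


def lookup_country(ip):
    """Return a country code for ip, or '??' if it is unknown."""
    return FAKE_COUNTRIES.get(ip, "??")


def annotate(line, lookup=lookup_country):
    """Replace every IPv4 address in line with 'address,country'."""
    return IPV4.sub(lambda m: m.group(0) + "," + lookup(m.group(0)), line)


if __name__ == "__main__":
    print(annotate("srcip=110.110.110.110 dstip=8.8.8.8"))
```

Because the replacement is computed per match, the rest of each line is left intact, which is what the question asks for.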
I am trying to use the jq command line JSON processor https://shapeshed.com/jq-json/ (which works great) to process a JSON file that seems to have been made using some poor choices.
Normally your id and value in the JSON file would not contain any periods such as:
{"id":"d9s7g9df7sd9","name":"Tacos"}
To get Tacos from the file you would do the following in bash:
echo $json | jq -r '.name'
This will give you Tacos (There may be some extra code missing from that example but you get the point.)
I have a JSON file that looks like this:
{"stat.blah":123,"stat.taco":495,"stat.yum... etc.
Notice how they decided to use a period in the identifying field associated with the value? This makes using jq very difficult because it associates the period as a separator to dig down into child values in the JSON. Sure, I could first load my file, replace all "." with "_" and that would fix the problem, but this seems like a really dumb and hackish solution. I have no way to change how the initial JSON file is generated. I just have to deal with it. Is there a way in bash I can do some special escape to make it ignore the period?
Thanks
Use the generic object index syntax, e.g.:
.["stat.taco"]
If you use the generic object syntax, e.g. .["stat.taco"], then chaining is done either using pipes as usual, or without the dot, e.g.
.["stat.taco"]["inner.key"]
If your jq is sufficiently recent, then you can use the chained-dot notation by quoting the keys with special characters, e.g.
."stat.taco"."inner.key"
You can also mix-and-match except that expressions such as: .["stat.taco"].["inner.key"] are not (as of jq 1.6) supported.
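For comparison, the bracket form mirrors how most programming languages index an object by an arbitrary string key. A minimal Python sketch follows; the two top-level keys come from the question, while the nested document with "inner.key" is a hypothetical illustration of the chained form.

```python
import json

# Keys taken from the question; the dots are part of the key names.
raw = '{"stat.blah": 123, "stat.taco": 495}'
stats = json.loads(raw)

# Equivalent of jq's .["stat.taco"]: the dot is not a path separator.
taco = stats["stat.taco"]

# Hypothetical nested document, equivalent of .["stat.taco"]["inner.key"].
nested = json.loads('{"stat.taco": {"inner.key": 1}}')
inner = nested["stat.taco"]["inner.key"]

if __name__ == "__main__":
    print(taco, inner)
```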
I'm making a query via Oracle SQLcl. I am spooling into a .json file.
The correct data is presented from the query, but the format is strange.
Starting off as:
SET ENCODING UTF-8
SET SQLFORMAT JSON
SPOOL content.json
Followed by a query, this produces a JSON file as requested.
However, how do I remove the outer structure, meaning this part:
{"results":[{"columns":[{"name":"ID","type":"NUMBER"},
{"name":"LANGUAGE","type":"VARCHAR2"},{"name":"LOCATION","type":"VARCHAR2"},{"name":"NAME","type":"VARCHAR2"}],"items": [
// Here is the actual data I want to see in the file exclusively
]
I only want to spool everything in the items array, not including that key itself.
Is it possible to set this as a parameter before querying? Reading the Oracle docs has not yielded any answers, hence asking here.
This is how I handle it.
After spooling the output to a file, I use the jq command to recreate the file with only the items:
ssh cat file.json | jq --compact-output --raw-output '.results[0].items' > items.json
This uses the jq tool: https://stedolan.github.io/jq/
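If jq is not available, the same post-processing step can be sketched in Python. This assumes the spooled file contains exactly one JSON document with the structure shown in the question; the function name `extract_items` and the sample row are made up for illustration.

```python
import json


def extract_items(spooled_text):
    """Return only the items array of the first result set."""
    document = json.loads(spooled_text)
    return document["results"][0]["items"]


if __name__ == "__main__":
    # Shape taken from the question; the single item row is hypothetical.
    sample = ('{"results":[{"columns":[{"name":"ID","type":"NUMBER"}],'
              '"items":[{"id":1}]}]}')
    print(json.dumps(extract_items(sample)))
```

Writing the result back out with `json.dumps` gives a file that holds only the array, matching what the jq filter `.results[0].items` produces.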
I have working code for parsing JSON output in KornShell by treating it as a string of characters. The issue is that the vendor keeps changing the position of the field that I am interested in. I understand that JSON can be parsed by key-value pairs.
Is there something out there that can do this? I am interested in a specific field, and I would like to use it to run checks on the status of another REST API call.
My sample json output is like this:
JSONDATA value :
{
"status": "success",
"job-execution-id": 396805,
"job-execution-user": "flexapp",
"job-execution-trigger": "RESTAPI"
}
I would need the job-execution-id value to monitor this job through the rest of the script.
I am using the following command to parse it:
RUNJOB=$(print ${DATA} |cut -f3 -d':'|cut -f1 -d','| tr -d [:blank:]) >> ${LOGDIR}/${LOGFILE}
The problem with this is that it splits on : and therefore depends on the position of the field, which the vendor has been known to change between releases.
So I am trying to see if I can use a utility out there that would always give me the key-value pair of "job-execution-id": 396805, no matter where it is in the json output.
I started looking at jsawk, and it requires the js interpreter to be installed on our machines which I don't want. Any hint on how to go about finding which RPM that I need to solve it?
I am using RHEL5.5.
Any help is greatly appreciated.
The ast-open project has libdss (and a dss wrapper) which supposedly could be used with ksh. Documentation is sparse and is limited to a few messages on the ast-user mailing list.
The regression tests for libdss contain some json and xml examples.
I'll try to find more info.
Python is included by default with CentOS so one thing you could do is pass your JSON string to a Python script and use Python's JSON parser. You can then grab the value written out by the script. An example you could modify to meet your needs is below.
Note that by specifying other dictionary keys in the Python script you can get any of the values you need without having to worry about the order changing.
Python script:
# get_job_execution_id.py
# The try/except is because you'll probably have Python 2.4 on CentOS 5.5,
# and the straight "import json" statement won't work unless you have Python 2.6+.
try:
    import json
except ImportError:
    import simplejson as json
import sys

# The JSON text is passed as the first command-line argument.
json_data = sys.argv[1]
data = json.loads(json_data)
# Look the value up by key, so the field order in the JSON does not matter.
job_execution_id = data['job-execution-id']
sys.stdout.write(str(job_execution_id))
Kornshell script that executes it:
#!/bin/ksh
# get_job_execution_id.sh
JSON_DATA='{"status":"success","job-execution-id":396805,"job-execution-user":"flexapp","job-execution-trigger":"RESTAPI"}'
# The script name must match the Python file above.
EXECUTION_ID=`python get_job_execution_id.py "$JSON_DATA"`
echo "$EXECUTION_ID"