Convert JSON file to R object in R

I am trying to convert a JSON file to an R object. The format of the data is as follows:
{
"id": "xyz",
"root": {
"author": {
"name": "xyz",
"email": "xyx#xyz.org",
"date": "2014-10-08T00:10:30Z"
},
"authorer": {
"name": "xyz",
"email": "xyx#xyz.org",
"date": "2014-10-08T00:11:30Z"
},
"message": "This a test json",
"root": {
"id": "xyz1",
"url": "xyz"
},
"url": "xyz",
"message_count": 0
},
"url": "xyz",
"html_url": "xyz",
"comments_url": "abc",
"author": null,
"authorer": null,
"parent": [
{
"id": "xyz3",
"url": "xyz",
"html_url": "xyz"
}
]
}
After this, a similar record begins with the same format. This is the code I wrote in R:
install.packages("rjson")
library(rjson)
df <- fromJSON(paste(readLines("file.json"), collapse=""))
View(df)
How can I make this file readable in R? I want to see the fields as columns like this:
id root/author/name root/author/email root/author/date root/authorer/name
See this example of how the input and output should look: http://konklone.io/json/?id=dfeae96a607c7541b8fe
I have provided a new link with two rows: http://konklone.io/json/?id=3b01a02e17ec4fde3357
Thanks a lot

Is this what you want?
json <- '{
"id": "xyz",
"root": {
"author": {
"name": "xyz",
"email": "xyx#xyz.org",
"date": "2014-10-08T00:10:30Z"
},
"authorer": {
"name": "xyz",
"email": "xyx#xyz.org",
"date": "2014-10-08T00:11:30Z"
},
"message": "This a test json",
"root": {
"id": "xyz1",
"url": "xyz"
},
"url": "xyz",
"message_count": 0
},
"url": "xyz",
"html_url": "xyz",
"comments_url": "abc",
"author": null,
"authorer": null,
"parent": [
{
"id": "xyz3",
"url": "xyz",
"html_url": "xyz"
}
]
}'
out <- jsonlite::fromJSON(json)
out[vapply(out, is.null, logical(1))] <- "none"
data.frame(out, stringsAsFactors = FALSE)[,1:5]
id root.author.name root.author.email root.author.date root.authorer.name
1 xyz xyz xyx#xyz.org 2014-10-08T00:10:30Z xyz
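The column names the question asks for (root/author/name, ...) are just the key paths joined with "/". Below is a minimal sketch of that flattening written in Python with only the standard library. It is not the R answer above; it only shows how each nested key becomes one column. The function name flatten is hypothetical. Numbering list elements by position (parent/0/id) is also an assumption, since the question does not say how arrays should be named.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested JSON into one dict whose keys are '/'-joined paths.

    List elements are addressed by position (e.g. 'parent/0/id'); this
    naming for arrays is an assumption, not taken from the question.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        return {prefix: obj}

    flat = {}
    for key, value in items:
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, (dict, list)) and value:
            flat.update(flatten(value, path))  # recurse into non-empty containers
        else:
            flat[path] = value  # scalars, null and empty containers become cells
    return flat


# Trimmed-down version of one record from the question
record = json.loads("""
{"id": "xyz",
 "root": {"author": {"name": "xyz", "email": "xyx#xyz.org"}},
 "author": null,
 "parent": [{"id": "xyz3", "url": "xyz"}]}
""")
row = flatten(record)
for column, cell in row.items():
    print(column, cell)
```

Each record becomes one row. With several records, the table's columns would be the union of the keys of all the flattened dicts.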


Check if a key exists and return another key

I need help with jq syntax to return the GitLab job ID if the job contains an artifact. The JSON output looks like this (I removed a lot of unrelated info and replaced it with [...]):
[{
"id": 3219589880,
"status": "success",
"stage": "test",
"name": "job_with_no_artifact",
"ref": "main",
"tag": false,
"coverage": null,
"allow_failure": false,
"created_at": "2022-10-24T18:21:25.119Z",
"started_at": "2022-10-24T18:21:25.986Z",
"finished_at": "2022-10-24T18:21:38.464Z",
"duration": 12.478682,
"queued_duration": 0.499786,
"user": {
"id": 123456789,
[...]
},
"commit": {
"id": "5e0e1f287d20daf2036a3ca71c656dce55999265",
[...]
"pipeline": {
"id": 123456789,
[...]
"project": {
"ci_job_token_scope_enabled": false
},
"artifacts": [],
"runner": {
"id": 12270859,
[...]
},
"artifacts_expire_at": null,
"tag_list": []
}, {
"id": 3219589878,
"status": "success",
"stage": "test",
"name": "create_artifact_job_2",
"ref": "main",
"tag": false,
"coverage": null,
"allow_failure": false,
"created_at": "2022-10-24T18:21:25.111Z",
"started_at": "2022-10-24T18:21:25.922Z",
"finished_at": "2022-10-24T18:21:39.090Z",
"duration": 13.168405,
"queued_duration": 0.464364,
"user": {
"id": 123456789,
[...]
},
"commit": {
"id": "5e0e1f287d20daf2036a3ca71c656dce55999265",
[...]
},
"pipeline": {
"id": 675641982,
[...],
"project": {
"ci_job_token_scope_enabled": false
},
"artifacts_file": {
"filename": "artifacts.zip",
"size": 223
},
"artifacts": [{
"file_type": "archive",
"size": 223,
"filename": "artifacts.zip",
"file_format": "zip"
}, {
"file_type": "metadata",
"size": 153,
"filename": "metadata.gz",
"file_format": "gzip"
}],
"runner": {
"id": 12270845,
[...]
},
"artifacts_expire_at": "2022-10-25T18:21:35.859Z",
"tag_list": []
}, {
"id": 3219589876,
"status": "success",
"stage": "test",
"name": "create_artifact_job_1",
"ref": "main",
"tag": false,
"coverage": null,
"allow_failure": false,
"created_at": "2022-10-24T18:21:25.103Z",
"started_at": "2022-10-24T18:21:25.503Z",
"finished_at": "2022-10-24T18:21:41.407Z",
"duration": 15.904028,
"queued_duration": 0.098837,
"user": {
"id": 123456789,
[...]
},
"commit": {
"id": "5e0e1f287d20daf2036a3ca71c656dce55999265",
[...]
},
"pipeline": {
"id": 123456789,
[...]
},
"web_url": "WEB_URL",
"project": {
"ci_job_token_scope_enabled": false
},
"artifacts_file": {
"filename": "artifacts.zip",
"size": 217
},
"artifacts": [{
"file_type": "archive",
"size": 217,
"filename": "artifacts.zip",
"file_format": "zip"
}, {
"file_type": "metadata",
"size": 152,
"filename": "metadata.gz",
"file_format": "gzip"
}],
"runner": {
"id": 12270857,
},
"artifacts_expire_at": "2022-10-25T18:21:37.808Z",
"tag_list": []
}]
I've been trying to do either of the following using jq:
Check whether the artifacts_file key exists in each element and, if it does, return the job id (i.e. .[].id).
Check whether the artifacts array is empty in each element and, if it is, return the job id.
In both cases I can do the first part, but I am not sure how to return the .id key.
Related Stack Overflow questions that I've been trying to adapt to my case:
jq - return array value if its length is not null
How to check for presence of 'key' in jq before iterating over the values
What I have so far is jq '[.[].artifacts[]|select(length > 0)] | .[]', which returns all the artifacts found, but not the .id of the job.
Checking the existence of a field using has:
.[] | select(has("artifacts_file")).id
3219589878
3219589876
Checking if a field is an empty array by comparing it to []:
.[] | select(.artifacts == []).id
3219589880
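The two jq filters above can be translated directly into Python. This helps if the job list is already being processed in a script. Below is a sketch using only the json module. The function names are hypothetical, and the inline sample is a trimmed-down copy of the question's data.

```python
import json


def ids_with_artifacts_file(jobs):
    # Same selection as: .[] | select(has("artifacts_file")).id
    return [job["id"] for job in jobs if "artifacts_file" in job]


def ids_with_empty_artifacts(jobs):
    # Same selection as: .[] | select(.artifacts == []).id
    # A missing key gives None here (null in jq), which is not equal to [].
    return [job["id"] for job in jobs if job.get("artifacts") == []]


# Trimmed-down version of the job list from the question
jobs = json.loads("""
[{"id": 3219589880, "artifacts": []},
 {"id": 3219589878, "artifacts_file": {"filename": "artifacts.zip", "size": 223},
  "artifacts": [{"file_type": "archive"}]},
 {"id": 3219589876, "artifacts_file": {"filename": "artifacts.zip", "size": 217},
  "artifacts": [{"file_type": "archive"}]}]
""")
print(ids_with_artifacts_file(jobs))   # [3219589878, 3219589876]
print(ids_with_empty_artifacts(jobs))  # [3219589880]
```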

Add a few values based on an existing CSV and append them to an existing JSON in Python

We have one CSV file and one JSON file.
Based on the values in the CSV, we need to modify the JSON.
For instance:
Input CSV:
myID,goID1,goID2,goID3
a123-b456-c789,10.0.0.0/16,10.1.0.0/16,10.2.0.0/16
a123-b456-c789,11.0.0.0/16,11.1.0.0/16,11.2.0.0/16
Input JSON:
[
{
"id": "123",
"name": "test1",
"goValues": [
{
"id": "456",
"name": "10.3.0.0",
"myID": "a123-b456-c789",
"status": "active",
"goID": "10.3.0.0/16"
},
{
"id": "789",
"name": "10.4.0.0",
"myID": "a123-b456-c789",
"status": "active",
"goID": "10.4.0.0/16"
}
]
}
]
Now I need to add (update) the extra goValues taken from the CSV. Inside goValues, id and status are generated later.
All we need to do is append the values of name, goID and myID.
name should be the goID without the subnet mask; goID and myID are copied as-is.
Convert this to JSON as below:
{
"name": "10.0.0.0",
"myID": "a123-b456-c789",
"goID": "10.0.0.0/16"
},
{
"name": "10.1.0.0",
"myID": "a123-b456-c789",
"goID": "10.1.0.0/16"
},
{
"name": "10.2.0.0",
"myID": "a123-b456-c789",
"goID": "10.2.0.0/16"
}
and append it to the input JSON:
[
{
"id": "123",
"name": "test1",
"goValues": [
{
"id": "456",
"name": "10.3.0.0",
"myID": "a123-b456-c789",
"status": "active",
"goID": "10.3.0.0/16"
},
{
"id": "789",
"name": "10.4.0.0",
"myID": "a123-b456-c789",
"status": "active",
"goID": "10.4.0.0/16"
},
{
"name": "10.0.0.0",
"myID": "a123-b456-c789",
"goID": "10.0.0.0/16"
},
{
"name": "10.1.0.0",
"myID": "a123-b456-c789",
"goID": "10.1.0.0/16"
},
{
"name": "10.2.0.0",
"myID": "a123-b456-c789",
"goID": "10.2.0.0/16"
}
]
}
]
Try this:
import json

import pandas as pd

csv = pd.read_csv('csv.csv')
with open('json.json') as f:
    j = json.load(f)

# Every column whose name contains 'goID' becomes a new goValues entry
for idx, row in csv.iterrows():
    for goID in row.filter(like='goID'):
        j[0]['goValues'].append({
            'name': goID.split('/')[0],  # address without the subnet mask
            'myID': row['myID'],
            'goID': goID,
        })

with open('json.json', 'w') as f:
    json.dump(j, f, indent=2)
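If pandas is not available, the same loop can be written with the standard csv module. Below is a sketch under that assumption. The function name append_go_values is hypothetical. Like the pandas answer, it puts every new entry into the first object of the array (doc[0]). In a real script, csv_text would come from open('csv.csv', newline='').read().

```python
import csv
import io
import json


def append_go_values(doc, csv_text):
    """Append one goValues entry per goID* cell of every CSV row.

    Like the pandas answer, this assumes all new entries belong to the
    first object in the JSON array (doc[0]).
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, go_id in row.items():
            if column.startswith("goID"):
                doc[0]["goValues"].append({
                    "name": go_id.split("/")[0],  # drop the /16 subnet mask
                    "myID": row["myID"],
                    "goID": go_id,
                })
    return doc


csv_text = """myID,goID1,goID2,goID3
a123-b456-c789,10.0.0.0/16,10.1.0.0/16,10.2.0.0/16
a123-b456-c789,11.0.0.0/16,11.1.0.0/16,11.2.0.0/16
"""
doc = [{"id": "123", "name": "test1", "goValues": [
    {"id": "456", "name": "10.3.0.0", "myID": "a123-b456-c789",
     "status": "active", "goID": "10.3.0.0/16"},
]}]
print(json.dumps(append_go_values(doc, csv_text), indent=2))
```

csv.DictReader keeps the header order, so the new entries are appended in the same goID1, goID2, goID3 order as in the CSV.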

JSON parsing in Logstash

I can't parse this JSON with Logstash. Could someone help me?
It seems that Logstash can't read it in the way it is structured.
Is there Ruby code to parse this?
I cannot extract the fields nested in the square brackets.
[
{
"capacity": 0,
"created_at": "2021-04-06T16:18:34+02:00",
"decisions": [
{
"duration": "22h16m4.141220361s",
"id": 842,
"origin": "CAPI",
"scenario": "crowdsecurity/http-bad-user-agent",
"scope": "ip",
"simulated": false,
"type": "ban",
"value": "3.214.184.223/32"
},
.
.
.
{
"duration": "22h16m4.195897491s",
"id": 904,
"origin": "CAPI",
"scenario": "crowdsecurity/http-backdoors-attempts",
"scope": "ip",
"simulated": false,
"type": "ban",
"value": "51.68.11.195/32"
}
],
"events": null,
"events_count": 0,
"id": 12,
"labels": null,
"leakspeed": "",
"machine_id": "N/A",
"message": "",
"scenario": "update : +63/-0 IPs",
"scenario_hash": "",
"scenario_version": "",
"simulated": false,
"source": {
"scope": "Community blocklist",
"value": ""
},
"start_at": "2021-04-06 16:18:34.750588276 +0200 +0200",
"stop_at": "2021-04-06 16:18:34.750588717 +0200 +0200"
}
]
require 'json'
JSON.parse(yourString)
would likely be what you're looking for.
The json module is part of Ruby's standard library and is described in its documentation.
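The approach in the answer is to parse the whole top-level array first and then walk the nested decisions list. Below is the same idea sketched in Python rather than Ruby, for readers scripting outside Logstash. The function name decision_rows and the output key entry_id are hypothetical. The other field names come from the sample above.

```python
import json


def decision_rows(raw):
    """Yield one flat dict per element of each entry's nested 'decisions' list."""
    for entry in json.loads(raw):  # the top level is a JSON array
        for decision in entry.get("decisions") or []:  # tolerate null/missing
            yield {
                "entry_id": entry["id"],
                "scenario": decision["scenario"],
                "type": decision["type"],
                "value": decision["value"],
                "duration": decision["duration"],
            }


# Trimmed-down version of the sample from the question
raw = """[{"id": 12, "decisions": [
  {"duration": "22h16m4.141220361s", "id": 842,
   "scenario": "crowdsecurity/http-bad-user-agent",
   "type": "ban", "value": "3.214.184.223/32"},
  {"duration": "22h16m4.195897491s", "id": 904,
   "scenario": "crowdsecurity/http-backdoors-attempts",
   "type": "ban", "value": "51.68.11.195/32"}]}]"""
for row in decision_rows(raw):
    print(row["value"], row["scenario"])
```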

Nested JSON - store values in CSV

I am trying to convert a nested JSON file into CSV. The data comes from a darts API, and the structure is always the same. Nevertheless, I have trouble flattening and storing the values in a CSV because of the nested structure.
JSON:
{
"summaries": [{
"sport_event": {
"id": "sr:sport_event:12967512",
"start_time": "2017-11-11T13:15:00+00:00",
"start_time_confirmed": true,
"sport_event_context": {
"sport": {
"id": "sr:sport:22",
"name": "Darts"
},
"category": {
"id": "sr:category:104",
"name": "International"
},
"competition": {
"id": "sr:competition:597",
"name": "Grand Slam of Darts"
},
"season": {
"id": "sr:season:47332",
"name": "Grand Slam of Darts 2017",
"start_date": "2017-11-11",
"end_date": "2017-11-20",
"year": "2017",
"competition_id": "sr:competition:597"
},
"stage": {
"order": 1,
"type": "league",
"phase": "stage_1",
"start_date": "2017-11-11",
"end_date": "2017-11-15",
"year": "2017"
},
"round": {
"number": 1
},
"groups": [{
"id": "sr:league:29766",
"name": "Grand Slam of Darts 2017, Group G",
"group_name": "G"
}]
},
"coverage": {
"live": true
},
"competitors": [{
"id": "sr:competitor:35936",
"name": "Smith, Michael",
"abbreviation": "SMI",
"qualifier": "home"
}, {
"id": "sr:competitor:83895",
"name": "Wilson, James",
"abbreviation": "WIL",
"qualifier": "away"
}]
},
"sport_event_status": {
"status": "closed",
"match_status": "ended",
"home_score": 5,
"away_score": 3,
"winner_id": "sr:competitor:35936"
}
}, {
"sport_event": {
"id": "sr:sport_event:12967508",
"start_time": "2017-11-11T13:40:00+00:00",
"start_time_confirmed": true,
"sport_event_context": {
"sport": {
"id": "sr:sport:22",
"name": "Darts"
},
"category": {
"id": "sr:category:104",
"name": "International"
},
"competition": {
"id": "sr:competition:597",
"name": "Grand Slam of Darts"
},
"season": {
"id": "sr:season:47332",
"name": "Grand Slam of Darts 2017",
"start_date": "2017-11-11",
"end_date": "2017-11-20",
"year": "2017",
"competition_id": "sr:competition:597"
},
"stage": {
"order": 1,
"type": "league",
"phase": "stage_1",
"start_date": "2017-11-11",
"end_date": "2017-11-15",
"year": "2017"
},
"round": {
"number": 1
},
"groups": [{
"id": "sr:league:29764",
"name": "Grand Slam of Darts 2017, Group F",
"group_name": "F"
}]
},
"coverage": {
"live": true
},
"competitors": [{
"id": "sr:competitor:70916",
"name": "Bunting, Stephen",
"abbreviation": "BUN",
"qualifier": "home"
}, {
"id": "sr:competitor:191262",
"name": "de Zwaan, Jeffrey",
"abbreviation": "DEZ",
"qualifier": "away"
}]
},
"sport_event_status": {
"status": "closed",
"match_status": "ended",
"home_score": 5,
"away_score": 4,
"winner_id": "sr:competitor:70916"
}
}]
}
So for each sport_event I would like to store the variables:
"start_time"
from "season" the variable "name"
from "competitors" both "id" and "name"
from "sport_event_status" the "winner_id"
I have already tried to flatten the json file with this code:
import json

with open(r'path of file.json') as f:
    data = json.load(f)

def flatten(data):
    # Print every key/value pair, recursing into nested objects
    for key, value in data.items():
        print(str(key) + '->' + str(value))
        if isinstance(value, dict):
            flatten(value)
        elif isinstance(value, list):
            for val in value:
                # only nested objects are walked; strings and lists are skipped
                if isinstance(val, dict):
                    flatten(val)

flatten(data)
print(data)
This actually prints out the following:
id->sr:season:47332
name->Grand Slam of Darts 2017
start_date->2017-11-11
end_date->2017-11-20
year->2017
competition_id->sr:competition:597
Now my question is how to store the values I mentioned above in a CSV file.
Thanks in advance for your support.
Using jq, you basically just have to transcribe your specification, adding a bit of context and taking care of an embedded array. Note that sport_event_status is a sibling of sport_event, not a child, so its winner_id is captured first:
.summaries[]
| .sport_event_status.winner_id as $winner      # from "sport_event_status" the "winner_id"
| .sport_event                                  # Your specification:
| [.start_time,                                 # start_time
   .sport_event_context.season.name]            # from "season" the variable "name"
  + [.competitors[] | .id, .name]               # from "competitors" both "id" and "name"
  + [$winner]
| @csv
Invocation, e.g.:
jq -rf program.jq my.json
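Since the question started from Python, here is a sketch of the same row layout using only json and csv. The function name summaries_to_csv is hypothetical. The columns follow the order of the jq program: start_time, season name, each competitor's id and name, then winner_id. To write a file, use with open('out.csv', 'w', newline='') as f: f.write(summaries_to_csv(json.load(open('my.json')))).

```python
import csv
import io
import json


def summaries_to_csv(data):
    """One CSV row per summary: start_time, season name, each competitor's
    id and name, then winner_id (the column order of the jq program)."""
    out = io.StringIO()
    writer = csv.writer(out)
    for summary in data["summaries"]:
        event = summary["sport_event"]
        row = [event["start_time"],
               event["sport_event_context"]["season"]["name"]]
        for competitor in event["competitors"]:
            row += [competitor["id"], competitor["name"]]
        # sport_event_status sits next to sport_event, not inside it
        row.append(summary["sport_event_status"]["winner_id"])
        writer.writerow(row)
    return out.getvalue()


# Trimmed-down version of the first summary from the question
data = json.loads("""{"summaries": [{
  "sport_event": {
    "start_time": "2017-11-11T13:15:00+00:00",
    "sport_event_context": {"season": {"name": "Grand Slam of Darts 2017"}},
    "competitors": [{"id": "sr:competitor:35936", "name": "Smith, Michael"},
                    {"id": "sr:competitor:83895", "name": "Wilson, James"}]},
  "sport_event_status": {"winner_id": "sr:competitor:35936"}}]}""")
print(summaries_to_csv(data), end="")
```

Names such as "Smith, Michael" contain a comma, so csv.writer quotes them automatically. Building these lines by hand with string joins would produce broken columns.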

Convert nested JSON to CSV for a Google Sheets JSON API

I want to convert my JSON to CSV so that I can upload it to Google Sheets and use it as a JSON API. Whenever my data changes, I will just edit it in Google Sheets. But I'm having problems converting my JSON file to CSV, because the variable names change whenever I convert it. I'm using https://toolslick.com/csv-to-json-converter to convert my JSON file to CSV.
What is the best way to convert nested JSON to CSV?
JSON
{
"options": [
{
"id": "1",
"value": "Jumbo",
"shortcut": "J",
"textColor": "#FFFFFF",
"backgroundColor": "#00000"
},
{
"id": "2",
"value": "Hot",
"shortcut": "D",
"textColor": "#FFFFFF",
"backgroundColor": "#FFFFFF"
}
],
"categories": [
{
"id": "1",
"order": 1,
"name": "First Category",
"active": true
},
{
"id": "2",
"order": 2,
"name": "Second Category",
"shortcut": "MT",
"active": true
}
],
"products": [
{
"id": "03c6787c-fc2a-4aa8-93a3-5e0f0f98cfb2",
"categoryId": "1",
"name": "First Product",
"shortcut": "First",
"options": [
{
"optionId": "1",
"price": 23
},
{
"optionId": "2",
"price": 45
}
],
"active": true
},
{
"id": "e8669cea-4c9c-431c-84ba-0b014f0f9bc2",
"categoryId": "2",
"name": "Second Product",
"shortcut": "Second",
"options": [
{
"optionId": "1",
"price": 11
},
{
"optionId": "2",
"price": 20
}
],
"active": true
}
],
"discounts": [
{
"id": "1",
"name": "S",
"type": 1,
"amount": 20,
"active": true
},
{
"id": "2",
"name": "P",
"type": 1,
"amount": 20,
"active": true
},
{
"id": "3",
"name": "G",
"type": 2,
"amount": 5,
"active": true
}
]
}
Using Python, this can be done fairly easily. This code may help you get started.
import csv
import json

# The file holds a single JSON document, so load it in one go
# (json.loads per line only works for one-object-per-line files)
with open('your_json_file_here.json') as file:
    data = json.load(file)

# Pick the list you want as CSV rows, e.g. data['products']
records = data['products']

with open('create_new_file.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['header1', 'header2'])
    for record in records:
        writer.writerow((record['specific_col_name1'], record['specific_col_name2']))
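A sheet is a flat table, so one common approach is to write one table per nested list. For products, that means one row per (product, option) pair. Below is a sketch under that assumption. The function names and the column names productId and option are hypothetical. optionId is resolved to the option's value ("Jumbo", "Hot") from the top-level options list.

```python
import csv
import io


def products_to_rows(data):
    """One row per (product, option) pair, with optionId resolved to the
    option's display value from the top-level 'options' list."""
    option_values = {opt["id"]: opt["value"] for opt in data["options"]}
    rows = []
    for product in data["products"]:
        for opt in product["options"]:
            rows.append({
                "productId": product["id"],
                "categoryId": product["categoryId"],
                "name": product["name"],
                "option": option_values.get(opt["optionId"], ""),
                "price": opt["price"],
            })
    return rows


def rows_to_csv(rows):
    # Header comes from the first row's keys; all rows share the same keys here
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Trimmed-down version of the JSON from the question
data = {
    "options": [{"id": "1", "value": "Jumbo"}, {"id": "2", "value": "Hot"}],
    "products": [{"id": "03c6787c-fc2a-4aa8-93a3-5e0f0f98cfb2", "categoryId": "1",
                  "name": "First Product",
                  "options": [{"optionId": "1", "price": 23},
                              {"optionId": "2", "price": 45}]}],
}
print(rows_to_csv(products_to_rows(data)), end="")
```

categories and discounts are already flat lists and could each get their own sheet. However, their objects do not all share the same keys (only the second category has a shortcut), so their field list would need to be the union of all keys, not just the first row's.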