JSON to CSV conversion Linux terminal - json

I have the following example.json. How can I parse it to csv in order to get the mean value (between ** mean_value **).
I want something like in example.csv:
305152,277504,320512
[
{
"name": "stats",
"columns": [
"time",
"mean"
],
"points": [
[
1444038496000,
**305152**
],
[
1444038494000,
**277504**
],
[
1444038492000,
**320512**
]
]
}
]

In python it looks like this
import json
results = []
with open('example.json', 'r') as f:
content = json.loads(f.read())
for element in content:
results.append(','.join([str(y[1]) for y in element['points']]))
with open('example.csv', 'w') as f:
f.write('\n'.join(results))

Related

Receive IPs from JSON file

I am trying to get all IPs from a JSON file using Python 2.7.5
However I can not manage to do it correctly.
Do someone have an advice how I can receive all IPs from ('addressPrefixes') in a txt file?
Here is the code I already got to download the json file:
import urllib
import json
from urllib import urlopen
testfile = urllib.URLopener()
testfile.retrieve("https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-
DA13A5DE5B63/ServiceTags_Public_20210426.json", "AzureIPs.json")
print("---SUCCESSFULLY RECEIVED MICROSOFT AZURE IPS---")
with open('AzureIPs.json','r') as f:
data = json.load(f)
the JSON file contains many IPs and IP Ranges and looks like this:
{
"changeNumber": 145,
"cloud": "Public",
"values": [
{
"name": "ActionGroup",
"id": "ActionGroup",
"properties": {
"changeNumber": 9,
"region": "",
"regionId": 0,
"platform": "Azure",
"systemService": "ActionGroup",
"addressPrefixes": [
"13.66.60.119/32",
"13.66.143.220/30",
"13.66.202.14/32",
"13.66.248.225/32",
"13.66.249.211/32",
"13.67.10.124/30",
"13.69.109.132/30",
"13.71.199.112/30",
"13.77.53.216/30",
"13.77.172.102/32",
"13.77.183.209/32",
"13.78.109.156/30",
"13.84.49.247/32",
"2603:1030:c06:400::978/125",
"2603:1030:f05:402::178/125",
"2603:1030:1005:402::178/125",
"2603:1040:5:402::178/125",
"2603:1040:207:402::178/125",
"2603:1040:407:402::178/125",
"2603:1040:606:402::178/125",
"2603:1040:806:402::178/125",
"2603:1040:904:402::178/125",
"2603:1040:a06:402::178/125",
"2603:1040:b04:402::178/125",
"2603:1040:c06:402::178/125",
"2603:1040:d04:800::f8/125",
"2603:1040:f05:402::178/125",
"2603:1040:1104:400::178/125",
"2603:1050:6:402::178/125",
"2603:1050:403:400::1f8/125"
],
"networkFeatures": [
"API",
"NSG",
"UDR",
"FW"
]
}
},
{
"name": "ApplicationInsightsAvailability",
"id": "ApplicationInsightsAvailability",
"properties": {
"changeNumber": 2,
"region": "",
"regionId": 0,
"platform": "Azure",
"systemService": "ApplicationInsightsAvailability",
"addressPrefixes": [
"13.86.97.224/27",
"13.86.98.0/27",
"13.86.98.48/28",
"13.86.98.64/28",
"20.37.156.64/27",
"20.37.192.80/29",
"20.38.80.80/28",
"20.40.104.96/27",
"20.40.104.128/27",
"20.40.124.176/28",
"20.40.124.240/28",
"20.40.125.80/28",
"20.40.129.32/27",
"20.40.129.64/26",
"20.40.129.128/27",
"20.42.4.64/27",
"20.42.35.32/28",
"20.42.35.64/26",
"20.42.35.128/28",
"20.42.129.32/27",
"20.43.40.80/28",
"20.43.64.80/29",
"20.43.128.96/29",
"20.45.5.160/27",
"20.45.5.192/26",
"20.189.106.64/29",
"23.100.224.16/28",
"23.100.224.32/27",
"23.100.224.64/26"
],
"networkFeatures": [
"API",
"NSG",
"UDR",
"FW"
]
}
},
{
"name": "AzureActiveDirectory",
"id": "AzureActiveDirectory",
"properties": {
"changeNumber": 8,
"region": "",
"regionId": 0,
"platform": "Azure",
"systemService": "AzureAD",
"addressPrefixes": [
"13.64.151.161/32",
"13.66.141.64/27",
"13.67.9.224/27",
"13.69.66.160/27",
"13.69.229.96/27",
"13.70.73.32/27"
],
"networkFeatures": [
"API",
"NSG",
"UDR",
"FW",
"VSE"
]
}
}
Thank you for your time.
import urllib
import json
from urllib import urlopen
testfile = urllib.URLopener()
testfile.retrieve("https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-
DA13A5DE5B63/ServiceTags_Public_20210426.json", "AzureIPs.json")
print("---SUCCESSFULLY RECEIVED MICROSOFT AZURE IPS---")
with open('AzureIPs.json','r') as f:
data = json.load(f)
################# CHANGES AFTER THIS LINE #################
ips = []
values = data['values']
for block in values:
ips.append(block.properties.addressPrefixes)
However you will get 2D array using this approach, if you need 1D array and not separate block of IPs from each corresponding block in values, you can use following code to flatten the array.
import numpy as np
2DArray = np.array(ips)
1DArray = 2DArray.flatten()

Combine multiple JSON files, and parse into CSV

I have about 100 JSON files, all titled with different dates and I need to merge them into one CSV file that has headers "date", "real_name", "text".
There are no dates listed in the JSON itself, and the real_name is nested. I haven't worked with JSON in a while and am a little lost.
The basic structure of the JSON looks more or less like this:
Filename: 2021-01-18.json
[
{
"client_msg_id": "xxxx",
"type": "message",
"text": "THIS IS THE TEXT I WANT TO PULL",
"user": "XXX",
"user_profile": {
"first_name": "XXX",
"real_name": "THIS IS THE NAME I WANT TO PULL",
"display_name": "XXX",
"is_restricted": false,
"is_ultra_restricted": false
},
"blocks": [
{
"type": "rich_text",
"block_id": "yf=A9",
}
]
}
]
So far I have
import glob
read_files = glob.glob("*.json")
output_list = []
all_items = []
for f in read_files:
with open(f, "rb") as infile:
output_list.append(json.load(infile))
data = {}
for obj in output_list[]
data['date'] = f
data['text'] = 'text'
data['real_name'] = 'real_name'
all_items.append(data)
Once you've read the JSON object, just index into the dictionaries for the data. You might need obj[0]['text'], etc., if your JSON data is really in a list in each file, but that seems odd and I'm assuming your data was pasted from output_list after you'd collected the data. So assuming your file content is exactly like below:
{
"client_msg_id": "xxxx",
"type": "message",
"text": "THIS IS THE TEXT I WANT TO PULL",
"user": "XXX",
"user_profile": {
"first_name": "XXX",
"real_name": "THIS IS THE NAME I WANT TO PULL",
"display_name": "XXX",
"is_restricted": false,
"is_ultra_restricted": false
},
"blocks": [
{
"type": "rich_text",
"block_id": "yf=A9",
}
]
}
test.py:
import json
import glob
from pathlib import Path
read_files = glob.glob("*.json")
output_list = []
all_items = []
for f in read_files:
with open(f, "rb") as infile:
output_list.append(json.load(infile))
data = {}
for obj in output_list:
data['date'] = Path(f).stem
data['text'] = obj['text']
data['real_name'] = obj['user_profile']['real_name']
all_items.append(data)
print(all_items)
Output:
[{'date': '2021-01-18', 'text': 'THIS IS THE TEXT I WANT TO PULL', 'real_name': 'THIS IS THE NAME I WANT TO PULL'}]

Convert array of JSON objects to string in pyspark

I have one requirement in which I need to create a custom JSON from the columns returned from one PySpark dataframe. So I wrote one UDF like the below which will return a JSON in String format from UDF for each row.
Parameter "entities" are in the array of JSON format.
def halResponse(entities, admantx, copilot_id):
json_resp = "{\"analyzedContent\": {"+json.dumps(entities)+"}}"
return json_resp
But in the response, I am not getting proper JSON i.e instead of proper key: value pair, I am just getting values(actual values replace with * for security purpose), not key and value.
Find the sample response:
"analyzedContents": [
{
"entities": [
[
"******",
*,
*********,
[
[
"***********",
"***********",
"***********",
[
"*****************"
],
**********
]
],
"**************"
]
]
}
]
}
Please help me to resolve this issue. After fixing, I should get the below sample response
"analyzedContents": [
{
"entities": [
[
"key":******",
"key":*,
"key":*********,
[
[
"key":"***********",
"key":"***********",
"key":"***********",
[
"key":"*****************"
],
"key":**********
]
],
"key":"**************"
]
]
}
]
}
Try this without using an UDF:
import pyspark.sql.functions as F
df2 = df.withColumn(
'response',
F.concat(
F.lit("{\"analyzedContent\": {"),
F.to_json(F.col("entities")),
F.lit("}}")
)
)

Convert json fetched into dataframe using R

I've json like below, which i got from below URL:
{
"info" : {
"1484121600" : [
212953175.053333,212953175.053333,null
],
"1484125200" : [
236203014.133333,236203014.133333,236203014.133333
],
"1484128800" : [
211414832.968889,null,211414832.968889
],
"1484132400" : [
208604573.791111,208604573.791111,208604573.791111
],
"1484136000" : [
231358374.288889,231358374.288889,231358374.288889
],
"1484139600" : [
210529301.097778,210529301.097778,210529301.097778
],
"1484143200" : [
212009682.04,null,212009682.04
],
"1484146800" : [
232364759.566667,232364759.566667,232364759.566667
],
"1484150400" : [
218138788.524444,218138788.524444,218138788.524444
],
"1484154000" : [
218883301.282222,218883301.282222,null
],
"1484157600" : [
237874583.771111,237874583.771111,237874583.771111
],
"1484161200" : [
216227081.924444,null,216227081.924444
],
"1484164800" : [
227102054.082222,227102054.082222,null
]
},
"summary" : "data",
"end" : 1484164800,
"start": 1484121600
}
I'm fetching this json from some url using jsonlite package in R like below:
library(jsonlite)
input_data <- fromJSON(url)
timeseries <- input_data[['info']] # till here code is fine
abc <- data.frame(ds = names(timeseries[[1]]),
y = unlist(timeseries[[1]]), stringsAsFactors = FALSE)
(something is wrong in above line)
I need to convert this data in timeseries variable into data frame; which will have index column as the epoch time and no. of columns in dataframe will depend upon no. of values in array and all arrays will have same no. of values for sure. But no. of values in array can be 1 0r 2 or etc; it is not fixed. Like in below example array size is 3 for all.
for eg : dataframe should look like:
index y1 y2 y3
1484121600 212953175.053333 212953175.053333 null
1484125200 236203014.133333 236203014.133333 236203014.133333
Please suggest how do I do this in R. I'm new to it.
JSON with only 1 item in array:
{
"info": {
"1484121600": [
212953175.053333
],
"1484125200": [
236203014.133333
],
"1484128800": [
211414832.968889
],
"1484132400": [
208604573.791111
],
"1484136000": [
231358374.288889
],
"1484139600": [
210529301.097778
],
"1484143200": [
212009682.04
],
"1484146800": [
232364759.566667
],
"1484150400": [
218138788.524444
],
"1484154000": [
218883301.282222
],
"1484157600": [
237874583.771111
],
"1484161200": [
216227081.924444
],
"1484164800": [
227102054.082222
]
},
"summary": "data",
"end": 1484164800,
"start": 1484121600
}
Consider binding the list of json values to a matrix with sapply(), then transpose columns to rows with t(), and finally convert to dataframe with data.frame()
abc <- data.frame(t(sapply(timeseries, c)))
colnames(abc) <- gsub("X", "y", colnames(abc))
abc
# y1 y2 y3
# 1484121600 212953175 212953175 NA
# 1484125200 236203014 236203014 236203014
# 1484128800 211414833 NA 211414833
# 1484132400 208604574 208604574 208604574
# 1484136000 231358374 231358374 231358374
# 1484139600 210529301 210529301 210529301
# 1484143200 212009682 NA 212009682
# 1484146800 232364760 232364760 232364760
# 1484150400 218138789 218138789 218138789
# 1484154000 218883301 218883301 NA
# 1484157600 237874584 237874584 237874584
# 1484161200 216227082 NA 216227082
# 1484164800 227102054 227102054 NA

render as JSON clipping of the zeros of the floating point value

I have a map which is like this, which has to be rendered as JSON to the output.
def formatedResult = [
results:[
[ Name:foo, sex:m, salary:171.900 ],
[ Name:bar, sex:m, salary:171.900 ]
]
]
I am rendering this response as
withFormat {
json {
render formatedResult as JSON
}
}
which produces the following result.
{
results: [{
Name: "foo",
sex: "m",
salary: 171.9
}, {
Name: "bar",
sex: "m",
salary: 171.9
}]
}
But is clipping off the zeros from the salary. What should I do to get the JSON with out clipping off the zeros?
If you are hard coding your values as in your example put them like this:
def formatedResult = [
results:[
[ Name:foo, sex:m, salary:"171.900" ],
[ Name:bar, sex:m, salary:"171.900" ]
]
]
Or if you are getting them from variable use toString() method to convert them to string.
def formatedResult = [
results:[
[ Name:foo, sex:m, salary:salary.toString()],
[ Name:bar, sex:m, salary:salary.toString()]
]
]
Finally your render:
withFormat {
json {
render formatedResult as JSON
}
}