I'm very new to Python, so please forgive me if this is a silly question. I have been trying to loop over a set of .json files and extract certain information from each one (specifically the date and one value in particular) in order to create a time series. Because I have over 300 files, I would like this to be done automatically. I have managed to print the data, but I have failed to write it to a text file that would be readable in something like Excel.
Please find below both an example of the .json files I am trying to extract from and my code so far. Thanks!
{
"AcquasitionInfo": {
"Date": {
"Day": 27,
"Month": 3,
"Year": 2011
},
"EarthSunDistance": 0.9977766,
"SolarAzimuth": 154.94013617,
"SolarZenith": 53.1387049,
"Time": {
"Hour": 11,
"Minute": 0,
"Second": 21
},
"sensorAzimuth": 0.0,
"sensorZenith": 0.0
},
"FileInfo": {
"CLOUD_MASK": "LS5TM_20110327_lat53lon354_r23p204_clouds.kea",
"FileBaseName": "LS5TM_20110327_lat53lon354_r23p204",
"IMAGE_DEM": "LS5TM_20110327_lat53lon354_r23p204_dem.kea",
"METADATA": "LS5TM_20110327_lat53lon354_r23p204_meta.json",
"ProviderMetadata": "LT05_L1TP_204023_20110327_20161208_01_T1_MTL.txt",
"RADIANCE": "LS5TM_20110327_lat53lon354_r23p204_vmsk_mclds_rad.kea",
"RADIANCE_WHOLE": "LS5TM_20110327_lat53lon354_r23p204_vmsk_rad.kea",
"SREF_6S_IMG": "LS5TM_20110327_lat53lon354_r23p204_vmsk_mclds_topshad_rad_srefdem.kea",
"STD_SREF_IMG": "LS5TM_20110327_lat53lon354_r23p204_vmsk_mclds_topshad_rad_srefdem_stdsref.kea",
"THERMAL_BRIGHT": "LS5TM_20110327_lat53lon354_r23p204_vmsk_thrad_thermbright.kea",
"THERMAL_BRIGHT_WHOLE": "LS5TM_20110327_lat53lon354_r23p204_vmsk_thrad_thermbright.kea",
"THERM_RADIANCE_WHOLE": "LS5TM_20110327_lat53lon354_r23p204_vmsk_thermrad.kea",
"TOA": "LS5TM_20110327_lat53lon354_r23p204_vmsk_mclds_rad_toa.kea",
"TOA_WHOLE": "LS5TM_20110327_lat53lon354_r23p204_vmsk_rad_toa.kea",
"TOPO_SHADOW_MASK": "LS5TM_20110327_lat53lon354_r23p204_toposhad.kea",
"VALID_MASK": "LS5TM_20110327_lat53lon354_r23p204_valid.kea",
"VIEW_ANGLE": "LS5TM_20110327_lat53lon354_r23p204_viewangle.kea"
},
"ImageInfo": {
"CellSizeRefl": 30.0,
"CellSizeTherm": 30.0,
"CloudCover": 52.0,
"CloudCoverLand": 79.0
},
"LocationInfo": {
"Geographical": {
"BBOX": {
"BLLat": 52.06993,
"BLLon": -5.34028,
"BRLat": 52.08621,
"BRLon": -1.72003,
"TLLat": 54.09075,
"TLLon": -5.45257,
"TRLat": 54.10827,
"TRLon": -1.65856
},
"CentreLat": 53.10330325240661,
"CentreLon": -3.5429440927905724
},
"Projected": {
"BBOX": {
"BLX": 354735.0,
"BLY": 5776815.0,
"BRX": 572985.0,
"BRY": 5776815.0,
"TLX": 354735.0,
"TLY": 5992035.0,
"TRX": 572985.0,
"TRY": 5992035.0
},
"CentreX": 463860.0,
"CentreY": 5884425.0,
"VPOLY": {
"MaxXX": 572985.0,
"MaxXY": 5950185.0,
"MaxYX": 405795.0,
"MaxYY": 5992035.0,
"MinXX": 354735.0,
"MinXY": 5819025.0,
"MinYX": 521775.0,
"MinYY": 5776815.0
}
}
},
"ProductsInfo": {
"ARCSIProducts": [
"CLOUDS",
"DOSAOTSGL",
"STDSREF",
"METADATA"
],
"ARCSI_AOT_RANGE_MAX": 0.5,
"ARCSI_AOT_RANGE_MIN": 0.05,
"ARCSI_AOT_VALUE": 0.5,
"ARCSI_CLOUD_COVER": 0.627807080745697,
"ARCSI_LUT_ELEVATION_MAX": 1100,
"ARCSI_LUT_ELEVATION_MIN": -100,
"ProcessDate": {
"Day": 11,
"Month": 7,
"Year": 2018
},
"ProcessTime": {
"Hour": 7,
"Minute": 24,
"Second": 55
}
},
"SensorInfo": {
"ARCSISensorName": "LS5TM",
"Path": 204,
"Row": 23,
"SensorID": "TM",
"SpacecraftID": "LANDSAT_5"
},
"SoftwareInfo": {
"Name": "ARCSI",
"URL": "http://www.rsgislib.org/arcsi",
"Version": "3.1.4"
} }
import glob
import json
jsonfile = glob.glob('*.json')
with open(jsonfile[0]) as f:
    data = json.load(f)
print(data["AcquasitionInfo"]["Date"]["Day"])
print(data["AcquasitionInfo"]["Date"]["Month"])
print(data["AcquasitionInfo"]["Date"]["Year"])
print(data["ProductsInfo"]["ARCSI_AOT_VALUE"])
with open('data.txt', 'w') as outfile:
    json.dump(["ProductsInfo"]["ARCSI_AOT_VALUE"], outfile)
You forgot the data variable in the last line:
import glob
import json
jsonfile = glob.glob('*.json')
with open(jsonfile[0]) as f:
    data = json.load(f)
print(data["AcquasitionInfo"]["Date"]["Day"])
print(data["AcquasitionInfo"]["Date"]["Month"])
print(data["AcquasitionInfo"]["Date"]["Year"])
print(data["ProductsInfo"]["ARCSI_AOT_VALUE"])
with open('data.txt', 'w') as outfile:
    json.dump(data["ProductsInfo"]["ARCSI_AOT_VALUE"], outfile)
EDIT:
To process every .json file in the current directory, you can do it like this:
import json
import os
for file in os.listdir("."):
    if file.endswith(".json"):
        with open(file) as f:
            data = json.load(f)
        with open('data.txt', 'a') as outfile:
            json.dump(data["ProductsInfo"]["ARCSI_AOT_VALUE"], outfile)
            outfile.write(';')
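The loop above records only the AOT value, while the question asks for a date next to each value in a file Excel can open. The following is a minimal sketch of one way to do that, not a definitive implementation. The function names extract_record and build_time_series and the output column names are made up for illustration, and it assumes every file has the same layout as the sample shown in the question.

```python
import csv
import glob
import json
import os
from datetime import date


def extract_record(path):
    """Return (acquisition date, AOT value) from one metadata file."""
    with open(path) as f:
        data = json.load(f)
    # "AcquasitionInfo" is spelled this way in the sample file.
    d = data["AcquasitionInfo"]["Date"]
    acquired = date(d["Year"], d["Month"], d["Day"])
    return acquired, data["ProductsInfo"]["ARCSI_AOT_VALUE"]


def build_time_series(folder, out_csv):
    """Write one 'date,aot_value' row per .json file in folder, oldest first."""
    paths = glob.glob(os.path.join(folder, "*.json"))
    records = sorted(extract_record(p) for p in paths)
    with open(out_csv, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["date", "aot_value"])  # hypothetical column names
        for acquired, aot in records:
            writer.writerow([acquired.isoformat(), aot])
    return records
```

Calling build_time_series(".", "timeseries.csv") would then produce a CSV with ISO-formatted dates, which Excel and most plotting tools can read directly. Opening the output once in "w" mode also avoids the problem with "a" mode, where re-running the script keeps appending to the old results.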
I am trying to get all IPs from a JSON file using Python 2.7.5.
However, I cannot manage to do it correctly.
Does anyone have advice on how I can write all IPs (from 'addressPrefixes') to a txt file?
Here is the code I already have to download the JSON file:
import urllib
import json
from urllib import urlopen
testfile = urllib.URLopener()
testfile.retrieve("https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20210426.json", "AzureIPs.json")
print("---SUCCESSFULLY RECEIVED MICROSOFT AZURE IPS---")
with open('AzureIPs.json','r') as f:
    data = json.load(f)
The JSON file contains many IPs and IP ranges and looks like this:
{
"changeNumber": 145,
"cloud": "Public",
"values": [
{
"name": "ActionGroup",
"id": "ActionGroup",
"properties": {
"changeNumber": 9,
"region": "",
"regionId": 0,
"platform": "Azure",
"systemService": "ActionGroup",
"addressPrefixes": [
"13.66.60.119/32",
"13.66.143.220/30",
"13.66.202.14/32",
"13.66.248.225/32",
"13.66.249.211/32",
"13.67.10.124/30",
"13.69.109.132/30",
"13.71.199.112/30",
"13.77.53.216/30",
"13.77.172.102/32",
"13.77.183.209/32",
"13.78.109.156/30",
"13.84.49.247/32",
"2603:1030:c06:400::978/125",
"2603:1030:f05:402::178/125",
"2603:1030:1005:402::178/125",
"2603:1040:5:402::178/125",
"2603:1040:207:402::178/125",
"2603:1040:407:402::178/125",
"2603:1040:606:402::178/125",
"2603:1040:806:402::178/125",
"2603:1040:904:402::178/125",
"2603:1040:a06:402::178/125",
"2603:1040:b04:402::178/125",
"2603:1040:c06:402::178/125",
"2603:1040:d04:800::f8/125",
"2603:1040:f05:402::178/125",
"2603:1040:1104:400::178/125",
"2603:1050:6:402::178/125",
"2603:1050:403:400::1f8/125"
],
"networkFeatures": [
"API",
"NSG",
"UDR",
"FW"
]
}
},
{
"name": "ApplicationInsightsAvailability",
"id": "ApplicationInsightsAvailability",
"properties": {
"changeNumber": 2,
"region": "",
"regionId": 0,
"platform": "Azure",
"systemService": "ApplicationInsightsAvailability",
"addressPrefixes": [
"13.86.97.224/27",
"13.86.98.0/27",
"13.86.98.48/28",
"13.86.98.64/28",
"20.37.156.64/27",
"20.37.192.80/29",
"20.38.80.80/28",
"20.40.104.96/27",
"20.40.104.128/27",
"20.40.124.176/28",
"20.40.124.240/28",
"20.40.125.80/28",
"20.40.129.32/27",
"20.40.129.64/26",
"20.40.129.128/27",
"20.42.4.64/27",
"20.42.35.32/28",
"20.42.35.64/26",
"20.42.35.128/28",
"20.42.129.32/27",
"20.43.40.80/28",
"20.43.64.80/29",
"20.43.128.96/29",
"20.45.5.160/27",
"20.45.5.192/26",
"20.189.106.64/29",
"23.100.224.16/28",
"23.100.224.32/27",
"23.100.224.64/26"
],
"networkFeatures": [
"API",
"NSG",
"UDR",
"FW"
]
}
},
{
"name": "AzureActiveDirectory",
"id": "AzureActiveDirectory",
"properties": {
"changeNumber": 8,
"region": "",
"regionId": 0,
"platform": "Azure",
"systemService": "AzureAD",
"addressPrefixes": [
"13.64.151.161/32",
"13.66.141.64/27",
"13.67.9.224/27",
"13.69.66.160/27",
"13.69.229.96/27",
"13.70.73.32/27"
],
"networkFeatures": [
"API",
"NSG",
"UDR",
"FW",
"VSE"
]
}
}
Thank you for your time.
import urllib
import json
from urllib import urlopen
testfile = urllib.URLopener()
testfile.retrieve("https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20210426.json", "AzureIPs.json")
print("---SUCCESSFULLY RECEIVED MICROSOFT AZURE IPS---")
with open('AzureIPs.json','r') as f:
    data = json.load(f)
################# CHANGES AFTER THIS LINE #################
ips = []
values = data['values']
for block in values:
    # json.load returns plain dicts, so use key lookups rather than attribute access
    ips.append(block['properties']['addressPrefixes'])
However, this approach gives you a list of lists (one list of prefixes per block in values). If you need a single flat list instead, you can flatten it. NumPy's flatten does not help here because the inner lists have different lengths, so itertools.chain is used instead:
from itertools import chain
flat_ips = list(chain.from_iterable(ips))
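The question also asks for the prefixes to end up in a txt file, which the snippets above stop short of. Here is a small sketch of that last step, under a few assumptions: it is written for Python 3 (the question uses 2.7.5), the function names collect_prefixes and write_prefixes are invented for illustration, and the file is assumed to follow the "values" / "properties" / "addressPrefixes" layout shown in the sample.

```python
import json


def collect_prefixes(service_tags):
    """Return every address prefix from all entries under 'values', in file order."""
    prefixes = []
    for block in service_tags.get("values", []):
        # extend() adds the individual strings, so the result stays flat
        prefixes.extend(block["properties"]["addressPrefixes"])
    return prefixes


def write_prefixes(json_path, txt_path):
    """Read the downloaded JSON and write one prefix per line to txt_path."""
    with open(json_path) as f:
        data = json.load(f)
    prefixes = collect_prefixes(data)
    with open(txt_path, "w") as out:
        out.write("\n".join(prefixes) + "\n")
    return len(prefixes)
```

With the file from the question this could be called as write_prefixes("AzureIPs.json", "AzureIPs.txt"). Using extend instead of append builds the flat list directly, so no separate flattening step is needed.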
I would like to extract only a small fraction of my JSON response into a .csv file. However, I need to go 4 levels deep and I am currently only able to go 3 levels deep. My goal is to have a .csv with 3 columns (campaign_id, campaign_name, cost_per_click) and a line for each of my campaigns.
Original JSON
{
"318429215527453": {
"conversion_events": {
"data": [
{
"id": "djdfhdf",
"name": "Total",
"cost": 328.14,
"metrics_breakdown": {
"data": [
{
"campaign_id": 2364,
"campaign_name": "uk",
"cost_per_click": 1345
},
{
"campaign_id": 7483,
"campaign_name": "fr",
"cost_per_click": 756
},
{
"campaign_id": 8374,
"campaign_name": "spain",
"cost_per_click": 545
},
{
"campaign_id": 2431,
"campaign_name": "ge",
"cost_per_click": 321
}
],
"paging": {
"cursors": {
"after": "MjUZD"
},
"next": "https://graph.facebook.com/v9.0/xxxx"
}
}
}
],
"summary": {
"count": 1,
"metric_date_range": {
"date_range": {
"begin_date": "2021-01-09T00:00:00+0100",
"end_date": "2021-02-08T00:00:00+0100",
"time_zone": "Europe/Paris"
},
"prior_period_date_range": {
"begin_date": "2020-12-10T00:00:00+0100",
"end_date": "2021-01-09T00:00:00+0100"
}
}
}
},
"id": "xxx"
}
}
reformated.py
import json
with open('campaigns.json') as json_file:
    data = json.load(json_file)
reformated_json = data['318429215527453']['conversion_events']['data']
with open('data.json', 'w') as outfile:
    json.dump(reformated_json, outfile)
I tried to add ['metrics_breakdown'] or another ['data'] at the end of reformated_json but I am getting TypeError: list indices must be integers or slices, not str.
{
"id": "djdfhdf",
"name": "Total",
"cost": 328.14,
"metrics_breakdown": {
"data": [
{
"campaign_id": 2364,
"campaign_name": "uk",
"cost_per_click": 1345,
},
{
"campaign_id": 7483,
"campaign_name": "fr",
"cost_per_click": 756,
},
{
"campaign_id": 8374,
"campaign_name": "spain",
"cost_per_click": 545,
},
{
"campaign_id": 2431,
"campaign_name": "ge",
"cost_per_click": 321,
},
],
"paging": {
"cursors": {
"after": "MjUZD"
},
"next": "https://graph.facebook.com/v9.0/xxxx"
}
}
}
]
import csv
import json
from typing import Dict, List, Union # typing for easy development
# read json function
def read_json(json_path: str) -> Union[Dict, List]:
    with open(json_path, 'r') as file_io:
        return json.load(file_io)
# write csv function
def write_csv(data: List[Dict], csv_path: str) -> None:
    with open(csv_path, 'w') as file:
        fieldnames = set().union(*data)
        writer = csv.DictWriter(file, fieldnames=fieldnames,
                                lineterminator='\n')
        writer.writeheader()
        writer.writerows(data)
# parse campaigns using a comprehension
def parse_campaigns(data: Dict) -> List[Dict]:
    return [row
            for value in data.values()  # first level (conversion events)
            for root_data in value['conversion_events']['data']  # conversion events/data
            for row in root_data['metrics_breakdown']['data']]  # data/metrics_breakdown/data
json_data = read_json('./campaigns.json')
campaign_data = parse_campaigns(json_data)
write_csv(campaign_data, 'campaigns.csv')
campaigns.csv (I copied the data to multiple root dictionary objects):
cost_per_click,campaign_id,campaign_name
1345,2364,uk
756,7483,fr
545,8374,spain
321,2431,ge
1345,2364,uk
756,7483,fr
545,8374,spain
321,2431,ge
The first data subkey contains a single-element list. Dereference with [0] to get the element, then fetch the next layers of keys. Then a DictWriter can be used to write the CSV lines:
import json
import csv
with open('campaigns.json') as json_file:
    data = json.load(json_file)
items = data['318429215527453']['conversion_events']['data'][0]['metrics_breakdown']['data']
with open('data.csv', 'w', newline='') as outfile:
    w = csv.DictWriter(outfile,fieldnames=items[0].keys())
    w.writeheader()
    w.writerows(items)
Output:
campaign_id,campaign_name,cost_per_click
2364,uk,1345
7483,fr,756
8374,spain,545
2431,ge,321
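To make the TypeError from the question concrete, here is a short sketch that reproduces it on a trimmed-down copy of the response and then shows the list index resolving it. The trimmed response dict and the variable names are for illustration only; the key names are taken from the JSON in the question.

```python
# Trimmed-down copy of the response from the question (two campaigns only).
response = {
    "318429215527453": {
        "conversion_events": {
            "data": [
                {
                    "name": "Total",
                    "metrics_breakdown": {
                        "data": [
                            {"campaign_id": 2364, "campaign_name": "uk", "cost_per_click": 1345},
                            {"campaign_id": 7483, "campaign_name": "fr", "cost_per_click": 756},
                        ]
                    },
                }
            ]
        }
    }
}

events = response["318429215527453"]["conversion_events"]["data"]

# events is a list, so looking up a string key raises the error from the question.
try:
    events["metrics_breakdown"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Indexing the list first yields a dict, which does accept string keys.
rows = events[0]["metrics_breakdown"]["data"]
names = [row["campaign_name"] for row in rows]
```

A quick way to spot this kind of problem is to check type(events) at the level where the lookup fails: square brackets in the JSON mean a list, which needs an integer index or a loop before the next string key.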
I want to extract part of an existing JSON file based on a list of keys and save it into another JSON file.
For example:
{
"211": {
"year": "2020",
"field": "chemistry"
},
"51": {
"year": "2019",
"field":"physics"
},
"5": {
"year": "2014",
"field":"Literature"
}
Let's say the list is [5, 51].
The output JSON file should contain:
{
"5": {
"year": "2014",
"field":"Literature"
},
"51": {
"year": "2019",
"field":"physics"
}
}
}
It should not contain data for key 211
I believe this will work for you:
import json
# Open input file and deserialize JSON to a dict
with open("input_file.json", "r", encoding="utf8") as read_file:
    input_file_dict = json.load(read_file)
# List of id's you want
inc_id_list = [5,51,2,101]
output_dict = dict()
# Iterate over the input file JSON and only add the items in the list
for id in input_file_dict.keys():
    if int(id) in inc_id_list:
        output_dict[id] = input_file_dict.get(id)
# Serialize output dict to JSON and write to output file
with open('output_file.json', 'w') as output_json_file:
    json.dump(output_dict, output_json_file)
Try this:
l= {
"211": {
"year": "2020",
"field": "chemistry"
},
"51": {
"year": "2019",
"field":"physics"
},
"5": {
"year": "2014",
"field":"Literature"
} }
for i,v in l.items():
    if(i!="211"):
        print(i,v)
Your JSON is a dictionary (with a missing closing brace at the end), so this is easily done with a comprehension:
mydict = { "211": { "year": "2020", "field": "chemistry" }, "51": { "year": "2019", "field":"physics" }, "5": { "year": "2014", "field":"Literature" }}
incList = [5, 51]
myAnswer = {k:v for (k,v) in mydict.items() if int(k) in incList}
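One detail the comprehension does not cover: the expected output in the question lists key 5 before key 51, which is the order of the list rather than the order of the input file. If that ordering matters, a sketch like the following iterates over the wanted ids instead. The function name pick_keys is made up for illustration, and it relies on dicts keeping insertion order (Python 3.7+).

```python
import json


def pick_keys(records, wanted_ids):
    """Return a new dict with only the wanted ids, in the order of wanted_ids."""
    picked = {}
    for wanted in wanted_ids:
        key = str(wanted)      # JSON object keys are always strings
        if key in records:     # ids missing from the input are skipped
            picked[key] = records[key]
    return picked


mydict = {
    "211": {"year": "2020", "field": "chemistry"},
    "51": {"year": "2019", "field": "physics"},
    "5": {"year": "2014", "field": "Literature"},
}
selected = pick_keys(mydict, [5, 51])
output_text = json.dumps(selected, indent=2)
```

Looking up each wanted id directly also avoids calling int() on every key in the file, which would fail if the file ever contained a non-numeric key.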
I am working on extracting information from a JSON file in Ruby. How can I extract just the numbers next to the word 'score' from the following text file? For example, I want to get 0.6748984055823062, 0.6280145725181376, and so on.
{
"sentiment_analysis": [
{
"positive": [
{
"sentiment": "Popular",
"topic": "games",
"score": 0.6748984055823062,
"original_text": "Popular games",
"original_length": 13,
"normalized_text": "Popular games",
"normalized_length": 13,
"offset": 0
},
{
"sentiment": "engaging",
"topic": "pop culture-inspired games",
"score": 0.6280145725181376,
"original_text": "engaging pop culture-inspired games",
"original_length": 35,
"normalized_text": "engaging pop culture-inspired games",
"normalized_length": 35,
"offset": 370
},
"negative": [
{
"sentiment": "get sucked into",
"topic": "the idea of planning",
"score": -0.7923352042939829,
"original_text": "Students get sucked into the idea of planning",
"original_length": 45,
"normalized_text": "Students get sucked into the idea of planning",
"normalized_length": 45,
"offset": 342
},
{
"sentiment": "be daunted",
"topic": null,
"score": -0.5734506634410159,
"original_text": "initially be daunted",
"original_length": 20,
"normalized_text": "initially be daunted",
"normalized_length": 20,
"offset": 2104
},
What I have managed so far is to read the file and parse its text into a hash using the JSON library.
require 'json'
json = JSON.parse(json_string)
Using the JSON class:
Importing a file:
require "json"
file = File.open "/path/to/your/file.json"
data = JSON.load file
Optionally, you can close it now:
file.close
The file looks like this:
{
"title": "Facebook",
"url": "https://www.facebook.com",
"posts": [
"lemon-car",
"dead-memes"
]
}
The file is now able to be read like this:
data["title"]
=> "Facebook"
data.keys
=> ["title", "url", "posts"]
data['posts']
=> ["lemon-car", "dead-memes"]
data["url"]
=> "https://www.facebook.com"
Hope this helped!
Parse Data from File:
data_hash = JSON.parse(File.read('file-name-to-be-read.json'))
Then just map over the data!
reviews = data_hash['sentiment_analysis'].first
reviews.map do |sentiment, reviews|
  puts "#{sentiment} #{reviews.map { |review| review['score'] }}"
end
I think this is the simplest answer.
You can use Array#map to collect the reviews.
reviews = json['sentiment_analysis'][0]
positive_reviews = reviews['positive']
negative_reviews = reviews['negative']
positive_reviews.map { |review| review['score'] }
=> [0.6748984055823062, 0.6280145725181376]
negative_reviews.map { |review| review['score'] }
=> [-0.7923352042939829, -0.5734506634410159]
Hope this helps!
I'm building a gatling 2.1.3 scenario and I need to extract data from a json body.
Example of the body:
[
{
"objectId": "FirstFoo",
"pvrId": "413",
"type": "foo",
"name": "the first name",
"fooabilities": {
"foo1": true,
"foo2": true
},
"versions": [23, 23, 23, 23, 23, 23, 24, 23, 23],
"logo": [
{
"type": "firstlogo",
"width": 780,
"height": 490,
"url": "firstlogos/HD/{resolution}.png"
}
]
},
{
"objectId": "SecondFoo",
"pvrId": "414",
"type": "foo",
"name": "the second name",
"fooabilities": {
"foo1": true,
"foo2": false
},
"versions": [23, 23, 23, 23, 23, 23, 24, 23, 23],
"logo": [
{
"type": "secondlogo",
"width": 780,
"height": 490,
"url": "secondlogos/HD/{resolution}.png"
}
]
}
]
and I have this code trying to extract the data:
exec(
  http("get object")
    .get(commons.base_url_ws + "/my-resource/2.0/object/")
    .headers(commons.headers_ws_session).asJSON
    .check(jsonPath("$..*").findAll.saveAs("MY_RESULT"))) (1)
.exec(session => {
  foreach("${MY_RESULT}", "result") { (2)
    exec(session => {
      val result= session("result").as[Map[String, Any]]
      val objectId = result("objectId")
      val version = result("version")
      session.set("MY_RESULT_INFO", session("MY_RESULT_INFO").as[List[(String,Int)]] :+ Tuple2(objectId, version))
    })
  }
  session
})
My goal is:
To extract the objectId and the 9th value from the version array.
I want it to look as Vector -> [(id1, version1),(id2, version2)] in the session to reuse later in another call to the API.
My concerns are:
(1) Is this going to create entries in the session with the complete sub-objects? In other answers I saw that it was always a map that was saved ("id" = [{...}]), and here I do not have ids.
(2) In the logs, I see that the session is loaded with a lot of data, but this foreach is never called. What could cause this?
I am a beginner in Scala, so there may be issues I did not see.
I have looked into this issue: Gatling - Looping through JSON array and it is not exactly answering my case.
I found a way to do it with a regex.
.check(regex("""(?:"objectId"|"version"):"(.*?)",.*?(?:"objectId"|"version"):\[(?:.*?,){9}([0-9]*?),.*?\]""").ofType[(String, String)].findAll saveAs ("OBJECTS")))
I can then use this:
foreach("${OBJECTS}", "object") {
  exec(
    http("Next API call")
      .get(commons.base_url_ws + "/my-resource/2.0/foo/${object._1}/${object._2}")
  [...]
}