AWS DynamoDB Issues adding values to existing table - json

I have already created a table called Sensors with Sensor as the hash key. I am trying to add items to the table from my .json file. The items in my file look like this:
{
"Sensor": "Room Sensor",
"SensorDescription": "Turns on lights when person walks into room",
"ImageFile": "rmSensor.jpeg",
"SampleRate": "1000",
"Locations": "Baltimore, MD"
}
{
"Sensor": "Front Porch Sensor",
"SensorDescription": " ",
"ImageFile": "fpSensor.jpeg",
"SampleRate": "2000",
"Locations": "Los Angeles, CA"
}
There are 20 different sensors in the file. I was using the following command:
aws dynamodb batch-write-item \
--request-items file://sensorList.json \
--return-consumed-capacity TOTAL
I get the following error:
Error parsing parameter '--request-items': Invalid JSON: Extra data: line 9 column 1 (char 189)
I've tried adding --table-name Sensors to the command line, and it says Unknown options: --table-name, Sensors. I've tried put-item and a few others. I am trying to understand what my errors are, what I need to change in my .json if anything, and what I need to change in my command line. Thanks!

Your input file is not valid JSON. You are missing a comma to separate the two objects, and you need to enclose everything in brackets [ ... ] so the file holds a single JSON array.
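Note also that batch-write-item has no --table-name option; the table name is a key inside the request-items document, and each attribute value needs a DynamoDB type descriptor. As a sketch (only the first item shown, with "S" marking string attributes), sensorList.json would need to look something like this for batch-write-item:

{
    "Sensors": [
        {
            "PutRequest": {
                "Item": {
                    "Sensor": {"S": "Room Sensor"},
                    "SensorDescription": {"S": "Turns on lights when person walks into room"},
                    "ImageFile": {"S": "rmSensor.jpeg"},
                    "SampleRate": {"S": "1000"},
                    "Locations": {"S": "Baltimore, MD"}
                }
            }
        }
    ]
}

The command then needs no table option at all:

aws dynamodb batch-write-item \
--request-items file://sensorList.json \
--return-consumed-capacity TOTAL

Keep in mind that batch-write-item accepts at most 25 put/delete requests per call, so your 20 sensors fit in a single request.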

Dask how to open json with list of dicts

I'm trying to open a bunch of JSON files using read_json, in order to get a DataFrame like the following:
ddf.compute()
id owner pet_id
0 1 "Charlie" "pet_1"
1 2 "Charlie" "pet_2"
3 4 "Buddy" "pet_3"
but I'm getting an error. My code is the following:
import pandas as pd
import dask.dataframe as dd

_meta = pd.DataFrame(
    columns=["id", "owner", "pet_id"]
).astype({
    "id": int,
    "owner": "object",
    "pet_id": "object"
})
ddf = dd.read_json("mypets/*.json", meta=_meta)
ddf.compute()
*** ValueError: Metadata mismatch found in `from_delayed`.
My JSON files looks like
[
{
"id": 1,
"owner": "Charlie",
"pet_id": "pet_1"
},
{
"id": 2,
"owner": "Charlie",
"pet_id": "pet_2"
}
]
As far as I understand, the problem is that I'm passing a list of dicts, so I'm looking for the right way to specify it in the meta= argument.
P.S.:
I also tried doing it in the following way
{
"id": [1, 2],
"owner": ["Charlie", "Charlie"],
"pet_id": ["pet_1", "pet_2"]
}
But Dask interprets the data incorrectly:
ddf.compute()
id owner pet_id
0 [1, 2] ["Charlie", "Charlie"] ["pet_1", "pet_2"]
1 [4] ["Buddy"] ["pet_3"]
The invocation you want is the following:
dd.read_json("data.json", meta=meta,
blocksize=None, orient="records",
lines=False)
which can be largely gleaned from the docstring.
meta looks OK from your code
blocksize must be None, since you have a whole JSON object per file and cannot split the file
orient "records" means list of objects
lines=False means this is not line-delimited JSON, i.e. you are not assuming that a newline character means a new record (line-delimited is the more common case for Dask)
So why the error? Probably Dask split your file on some newline character, so a partial record got parsed, which did not match your given meta.
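Putting the pieces together, a minimal end-to-end sketch (assuming your files live under mypets/ as in the question):

import pandas as pd
import dask.dataframe as dd

meta = pd.DataFrame(columns=["id", "owner", "pet_id"]).astype(
    {"id": int, "owner": "object", "pet_id": "object"}
)

ddf = dd.read_json(
    "mypets/*.json",
    meta=meta,
    blocksize=None,    # one whole JSON document per file; never split on newlines
    orient="records",  # each file holds a list of record objects
    lines=False,       # not line-delimited JSON
)
print(ddf.compute())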

How to evaluate JSON Path with fields that contain quotes inside a value?

I have a NiFi flow that takes JSON files and evaluates a JSON Path argument against them. It works perfectly except when dealing with records that contain Korean text. The Jayway JSONPath evaluator does not seem to recognize the escape "\" before the double quote character in the headline field. Here is an example (newlines added to help with formatting):
{"data": {"body": "[이데일리 김관용 기자] 우리 군이 2018 남북정상회담을 앞두고 남
북간 군사적 긴장\r\n완화와 평화로운 회담 분위기 조성을 위해 23일 0시를 기해 군사분계선
(MDL)\r\n일대에서의 대북확성기 방송을 중단했다.\r\n\r\n국방부는 이날 남북정상회담 계기
대북확성기 방송 중단 관련 내용을 발표하며\r\n“이번 조치가 남북간 상호 비방과 선전활동을
중단하고 ‘평화, 새로운 시작’을\r\n만들어 나가는 성과로 이어지기를 기대한다”고 밝혔
다.\r\n\r\n전방부대 우리 군 장병이 대북확성기 방송을 위한 장비를 점검하고 있다.\r\n[사
진=국방부공동취재단]\r\n\r\n\r\n\r\n▶ 당신의 생활 속 언제 어디서나 이데일리 \r\n▶
스마트 경제종합방송 ‘이데일리 TV’ | 모바일 투자정보 ‘투자플러스’\r\n▶ 실시간 뉴스와
속보 ‘모바일 뉴스 앱’ | 모바일 주식 매매 ‘MP트래블러Ⅱ’\r\n▶ 전문가를 위한 국내 최상의
금융정보단말기 ‘이데일리 마켓포인트 3.0’ | ‘이데일리 본드웹 2.0’\r\n▶ 증권전문가방송
‘이데일리 ON’ 1666-2200 | ‘ON스탁론’ 1599-2203\n<ⓒ종합 경제정보 미디어 이데일리 -
무단전재 & 재배포 금지> \r\n",
"mimeType": "text/plain",
"language": "ko",
"headline": "국방부 \"軍 대북확성기 방송, 23일 0시부터 중단\"",
"id": "EDYM00251_1804232beO/5WAUgdlYbHS853hYOGrIL+Tj7oUjwSYwT"}}
If this object is in my file, the JSON path evaluates to blanks for all the path arguments. Is there a way to force the Jayway engine to recognize the "\"? It appears to function correctly in other languages.
I was able to resolve this after understanding the difference between definite and indefinite paths. The Jayway GitHub README points out that the following will make a path indefinite and return a list:
When evaluating a path you need to understand the concept of when a path is definite. A path is indefinite if it contains:
.. - a deep scan operator
?(<expression>) - an expression
[<number>, <number> (, <number>)] - multiple array indexes
Indefinite paths always returns a list (as represented by current JsonProvider).
My JSON looked like the following:
{
"Version":"14",
"Items":[
{"data": {"body": "[이데일리 ... \r\n",
"mimeType": "text/plain",
"language": "ko",
"headline": "국방부 \"軍 ... 중단\"",
"id": "1"}
},
{"data": {"body": "[이데일리 ... \r\n",
"mimeType": "text/plain",
"language": "ko",
"headline": "국방부 \"軍 ... 중단\"",
"id": "2"}
...
}
]
}
The JSON path selector I was using ($.data.headline) did not grab the values as I expected. It instead returned null values.
Changing it to $.Items[*].data.headline or $..data.headline returns a list of each headline.
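The question is about Jayway (Java), but the same definite/indefinite distinction is easy to sanity-check from Python with the jsonpath-ng library (a swapped-in tool, not Jayway itself; doc.json is assumed to hold the sample document above):

import json
from jsonpath_ng import parse

with open("doc.json") as f:
    doc = json.load(f)

# definite path: looks for 'data' directly at the root, so it matches nothing here
print([m.value for m in parse("$.data.headline").find(doc)])  # -> []

# indefinite path: the wildcard walks every entry of Items and returns a list
print([m.value for m in parse("$.Items[*].data.headline").find(doc)])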

How do you print multiple key values from sub keys in a .json file?

I'm pulling a list of AMI IDs from my AWS account, and it's being written into a JSON file.
The JSON looks basically like this:
{
"Images": [
{
"CreationDate": "2017-11-24T11:05:32.000Z",
"ImageId": "ami-XXXXXXXX"
},
{
"CreationDate": "2017-11-24T11:05:32.000Z",
"ImageId": "ami-aaaaaaaa"
},
{
"CreationDate": "2017-10-24T11:05:32.000Z",
"ImageId": "ami-bbbbbbb"
},
{
"CreationDate": "2017-10-24T11:05:32.000Z",
"ImageId": "ami-cccccccc"
},
{
"CreationDate": "2017-12-24T11:05:32.000Z",
"ImageId": "ami-ddddddd"
},
{
"CreationDate": "2017-12-24T11:05:32.000Z",
"ImageId": "ami-eeeeeeee"
}
]
}
My code looks like this so far after gathering the info and writing it to a .json file locally:
# writes json output to file...
# (response is the result of the earlier describe-images call)
import json

print('writing to response.json...')
with open('response.json', 'w') as outfile:
    json.dump(response, outfile, ensure_ascii=False, indent=4,
              sort_keys=True, separators=(',', ': '))

# searches the file...
print('opening response.json...')
with open("response.json") as f:
    file_parsed = json.load(f)
The next part I'm stuck on is how to iterate through the file and print only the CreationDate and ImageId values.
print('printing CreationDate and ImageId...')
for ami in file_parsed['Images']:
    #print ami['CreationDate'] #THIS WORKS
    #print ami['ImageId'] #THIS WORKS
    #print ami['CreationDate']['ImageId']
The last line there gives me this no matter how I have tried it: TypeError: string indices must be integers
My desired output is something like this:
2017-11-24T11:05:32.000Z ami-XXXXXXXX
Ultimately, what I'm looking to do is iterate through the entries that are a certain date or older and deregister those AMIs. So would I be converting these to a list or a dict?
Pretty much not a programmer here, so don't drown me.
TIA
You have already parsed the JSON; for the desired output you just need to concatenate the 'CreationDate' and 'ImageId' values, like this:
for ami in file_parsed['Images']:
    print(ami['CreationDate'] + " " + ami['ImageId'])
ami['CreationDate'] evaluates to a string, and string indices must be integers, which is why chaining ['CreationDate']['ImageId'] leads to the TypeError. Your other two commented lines, however, were correct.
To check whether a date is older, you can use the datetime module: take the CreationDate (which is a string), convert it to a datetime object, create another datetime for your cutoff date, and compare the two.
Something to this effect:
from datetime import datetime

def checkIfOlder(isoformat, targetDate):
    # strptime parses the ISO-style string into a datetime object
    parsedDate = datetime.strptime(isoformat, '%Y-%m-%dT%H:%M:%S.%fZ')
    return parsedDate <= targetDate

certainDate = datetime(2017, 11, 30)  # Or whichever date you want
So in your for loop:
for ami in file_parsed['Images']:
    creationDate = ami['CreationDate']
    if checkIfOlder(creationDate, certainDate):
        pass  # write code to deregister AMIs here
Resources that would help are Python's datetime documentation and, in particular, the strftime/strptime directives. HTH!
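Putting it all together, a minimal sketch of the deregistration step (an illustration, not part of the original answer: it assumes boto3 with configured credentials, and the actual deregister call is left commented out as a dry run):

import json
from datetime import datetime

import boto3

with open("response.json") as f:
    file_parsed = json.load(f)

ec2 = boto3.client("ec2")
cutoff = datetime(2017, 11, 30)  # deregister anything created on or before this date

for ami in file_parsed["Images"]:
    created = datetime.strptime(ami["CreationDate"], "%Y-%m-%dT%H:%M:%S.%fZ")
    if created <= cutoff:
        print("would deregister", ami["ImageId"], ami["CreationDate"])
        # ec2.deregister_image(ImageId=ami["ImageId"])  # uncomment to actually deregister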

Hive Table Error loading json data

json file:
{
"DocId":"ABC",
"User":{
"Id":1234,
"Username":"sam1234",
"Name":"Sam",
"ShippingAddress":{
"Address1":"123 Main St.",
"Address2":null,
"City":"Durham",
"State":"NC"
},
"Orders":[{
"ItemId":6789,
"OrderDate":"11/11/2012"
},
{
"ItemId":4352,
"OrderDate":"12/12/2012"
}
]
}
}}
schema:
create external table sample_json(
    DocId string,
    User struct<Id:int,Username:string,Name:string,
                ShippingAddress:struct<Address1:string,Address2:string,City:string,State:string>,
                Orders:array<struct<ItemId:int,OrderDate:string>>>)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
location '/user/babu/sample_json';
--loading data to the hive table
load data inpath '/user/samplejson/samplejson.json' into table sample_json;
Error:
When I fire the select query
select * from sample_json;
Exception:
Failed with exception
java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException:
org.codehaus.jackson.JsonParseException: Unexpected end-of-input:
expected close marker for OBJECT (from [Source:
java.io.StringReader@8c3770; line: 1, column: 0]) at [Source:
java.io.StringReader@8c3770; line: 1, column: 3]
First, ensure that the JSON file is valid (for example with http://jsonlint.com), and then remove any newline characters or unwanted spaces from the JSON file before loading it into the Hive table. Also, drop and recreate the table if you have already loaded JSON files containing newline characters into it.
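If you want to strip the newlines programmatically rather than by hand, a small illustrative sketch (not part of the original answer; it assumes the file is named samplejson.json):

import json

# load the pretty-printed file, then rewrite it as a single compact line
# so the Hive SerDe sees exactly one JSON document per line
with open("samplejson.json") as f:
    doc = json.load(f)

with open("samplejson_oneline.json", "w") as f:
    json.dump(doc, f, separators=(",", ":"))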
Following is the input you can try
{"DocId":"ABC",
"User":{"Id":1234,
"Username":"sam1234",
"Name":"Sam",
"ShippingAddress":{"Address1":"123 Main St.","Address2":null,"City":"Durham","State":"NC"},
"Orders":[{"ItemId":6789,"OrderDate":"11/11/2012"},
{"ItemId":4352,"OrderDate":"12/12/2012"}
]
}
}
1. Remove the newlines from the JSON file:
{"DocId": "ABC", "Userdetails": {"Id": 1234, "Username": "sam1234", "Name": "Sam", "ShippingAddress": {"Address1": "123 Main St.", "Address2": null, "City": "Durham", "State": "NC" }, "Orders":[{"ItemId": 6789, "OrderDate": "11/11/2012"}, {"ItemId": 4352, "OrderDate": "12/12/2012"}]}}
2. Change User to Userdetails, since User is a reserved identifier in Hive (check the error I got when using it).
3. Use either location or load data inpath, because both do the same work; location does not create a folder in HDFS, while load data inpath does.
following are the commands :
hive>
create external table sample_json(DocId string, Userdetails struct<Id:int,Username:string,Name:string,ShippingAddress:struct<Address1:string,Address2:string,City:string,State:string>,Orders:array<struct<ItemId:int,OrderDate:string>>>) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' location '/user/admin';
OK
Time taken: 0.13 seconds
hive>
select * from sample_json;
OK
sample_json.docid sample_json.userdetails
ABC {"id":1234,"username":"sam1234","name":"Sam","shippingaddress":{"address1":"123 Main St.","address2":null,"city":"Durham","state":"NC"},"orders":[{"itemid":6789,"orderdate":"11/11/2012"},{"itemid":4352,"orderdate":"12/12/2012"}]}
Time taken: 0.106 seconds, Fetched: 1 row(s)

Importing JSON file into Firebase error

I keep getting an error when uploading/importing my JSON file into Firebase. I initially had an Excel spreadsheet that I saved as a CSV file, then I used a CSV-to-JSON converter.
I validated the JSON file (which has the .json extension) with a couple of online tools.
Still, I'm getting an error.
Here is an example of my JSON:
{
"Rk": 1,
"Tm": "SEA",
"H/A": "H",
"DOW": "Sun",
"Opp": "CLE",
"QB": "Russell Wilson",
"Grade": "BLUE",
"Def mu pts": 4,
"Inj status": 0,
"Notes": "Got to wonder if not having a proven power RB under center will negatively impact Wilson's production.",
"TFS $50K": "$8,300",
"Init sal": "$8,300",
"Var": "$0",
"WC": 0
}
The issue is your keys.
Firebase keys must be:
UTF-8 encoded, and cannot contain . $ # [ ] / or ASCII control characters 0-31 or 127
Your "TFS $50K" key (which contains $) and your "H/A" key (which contains /) are the problems.
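If you would rather fix the data than rename the columns by hand, a small sketch (illustrative only; the file names and the underscore replacement are my own choices):

import json
import re

# characters Firebase forbids in keys: . $ # [ ] / and ASCII control chars 0-31 and 127
FORBIDDEN = re.compile(r"[.$#\[\]/\x00-\x1f\x7f]")

def sanitize_keys(obj):
    """Recursively replace forbidden key characters with '_'."""
    if isinstance(obj, dict):
        return {FORBIDDEN.sub("_", k): sanitize_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [sanitize_keys(v) for v in obj]
    return obj

with open("data.json") as f:
    data = json.load(f)

with open("data_clean.json", "w") as f:
    json.dump(sanitize_keys(data), f, indent=2)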