Python open() in write mode giving wrong encoding in JSON

I have a python dictionary that looks like this:
{'id': 5677240, 'name': 'Conjunto de Panelas antiaderentes com 05 Peças Paris', 'quantity': 21, 'price': '192.84', 'category': 'Panelas'}
But when I try to write it to a JSON file, the encoding gets mangled:
{"id": 5677240, "name": "Conjunto de Panelas antiaderentes com 05 Pe\u00e7as Paris", "quantity": 21, "price": "192.84", "category": "Panelas"}
I've already tried passing encoding='utf-8' and setting locale to false, but neither helped.

You will have to disable ensure_ascii while writing to a file.
import json
data = {'id': 5677240, 'name': 'Conjunto de Panelas antiaderentes com 05 Peças Paris', 'quantity': 21, 'price': '192.84', 'category': 'Panelas'}
with open("a.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False)
Output in file:
{"id": 5677240, "name": "Conjunto de Panelas antiaderentes com 05 Peças Paris", "quantity": 21, "price": "192.84", "category": "Panelas"}
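The effect of the flag is easy to see with json.dumps alone; a minimal sketch:

```python
import json

name = "Peças"

# The default (ensure_ascii=True) escapes every non-ASCII character to \uXXXX
escaped = json.dumps(name)
# ensure_ascii=False keeps the characters as-is
plain = json.dumps(name, ensure_ascii=False)

print(escaped)  # "Pe\u00e7as"
print(plain)    # "Peças"
```

Both outputs are equally valid JSON; ensure_ascii only controls how non-ASCII characters are written, not what they decode back to.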

Related

Get category of movie from json struct using spark scala

I have a df_movies DataFrame with a genres column that looks like JSON:
|genres |
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 37, 'name': 'Western'}]
How can I extract the first 'name' value?
way #1
df_movies.withColumn("genres_extract",
  regexp_extract(col("genres"), """'name': (\w+)""", 1)).show(false)
way #2
df_movies.withColumn("genres_extract",
  regexp_extract(col("genres"), """[{'id':\s\d,\s'name':\s(\w+)""", 1))
Expected: Action
You can use get_json_object function:
Seq("""[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 37, "name": "Western"}]""")
.toDF("genres")
.withColumn("genres_extract", get_json_object(col("genres"), "$[0].name" ))
.show()
+--------------------+--------------+
| genres|genres_extract|
+--------------------+--------------+
|[{"id": 28, "name...| Action|
+--------------------+--------------+
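Outside Spark, the same extraction is trivial with the standard json module; this is only an illustration of what the path $[0].name selects, not Spark code:

```python
import json

genres = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 37, "name": "Western"}]'

# json.loads parses the array; [0]["name"] mirrors the JSON path $[0].name
first_name = json.loads(genres)[0]["name"]
print(first_name)  # Action
```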
Another possibility is using the from_json function together with a user-defined schema. This allows you to "unwrap" the JSON structure into a dataframe with all of the data in there, so that you can use it however you want!
Something like the following:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val df = Seq("""[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 37, "name": "Western"}]""")
  .toDF("genres")

// Creating the necessary schema for the from_json function
val moviesSchema = ArrayType(
  new StructType()
    .add("id", StringType)
    .add("name", StringType)
)

// Parsing the JSON string into our schema, exploding the column to make one row
// per JSON object in the array, and then selecting the wanted columns,
// unwrapping the parsedMovies column into separate columns
val parsedDf = df
  .withColumn("parsedMovies", explode(from_json(col("genres"), moviesSchema)))
  .select("parsedMovies.*")

parsedDf.show(false)
+---+---------+
| id| name|
+---+---------+
| 28| Action|
| 12|Adventure|
| 37| Western|
+---+---------+

Getting a syntax error: how do I remove quotes from json.dumps()?

I'm getting information from an external API, and I get an error when I try to update the database because the string formatting is incorrect.
There is a mixture of double quotes and single quotes, caused by the apostrophe in O' in the first line, which must tell Python to use double quotes.
To get around this I tried json.dumps(), which gives the result below and causes a syntax error when inserting into the database because of the single quotes around every object.
[
'{"author_name": "Michael P O\'Shaughnessy", "rating": 4, "text": "Stayed for a midweek"}',
'{"author_name": "camille williams", "rating": 5, "text": "We looked around and found"}',
'{"author_name": "natasha sevrugina", "rating": 5, "text": "Stayed at the "}',
'{"author_name": "niamh kelly", "rating": 5, "text": "Great hotel in a central location"}',
'{"author_name": "Janette Wade", "rating": 5, "text": "Excellent staff
\\ud83d\\udc4f\\n\\nJanette\\nSpiritual Ceremonies"}'
]
It should look like this to be valid JSON:
[
{"author_name": "Michael P O'Shaughnessy", "rating": 4, "text": "Stayed for a midweek night"},
{"author_name": "camille williams", "rating": 5, "text": "We looked around and found"}
]
Without json.dumps() the return from the API is as follows (notice only the first line is changed):
[
{'author_name': "Michael P O'Shaughnessy", 'rating': 4, 'text': "Stayed for a midweek night"}
{'author_name': 'camille williams', 'rating': 5, 'text': 'We looked around and found that was great."},
{'author_name': 'natasha sevrugina', 'rating': 5, 'text': 'Stayed at the Hotel '},
{'author_name': 'niamh kelly', 'rating': 5, 'text': 'Great hotel in a central location. '},
{'author_name': 'Janette Wade', 'rating': 5, 'text': 'Modern rooms. Great room service. '}
]
This also gives a syntax error because of the mixture of single and double quotes.
Here are my calls to the api:
review_author = dictionary['result']['reviews'][i]['author_name']
review_rating = dictionary['result']['reviews'][i]['rating']
review_text = dictionary['result']['reviews'][i]['text']
dict_keys = ["author_name", "rating", "text"]
res_dict = {dict_keys[0]: review_author, dict_keys[1]: review_rating, dict_keys[2]: review_text}
bus_reviews.append(json.dumps(res_dict))
How can I remove the single quotes produced by json.dumps()?
I changed this
bus_reviews.append(json.dumps(res_dict))
to
bus_reviews.append(res_dict)
and moved json.dumps() into the return statement:
return json.dumps(bus_reviews)
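The underlying issue is that serializing each dict separately produces a list of JSON *strings*, which then get quoted again when the whole list is serialized. Collect plain dicts and serialize once at the end. A sketch with stand-in review data (the real records come from the API):

```python
import json

# Hypothetical stand-in for the API response
reviews = [
    {"author_name": "Michael P O'Shaughnessy", "rating": 4, "text": "Stayed for a midweek"},
    {"author_name": "camille williams", "rating": 5, "text": "We looked around and found"},
]

# Wrong: dumping each dict separately gives a list of JSON strings,
# and serializing that list quotes every element a second time
wrong = [json.dumps(r) for r in reviews]

# Right: collect plain dicts and serialize the whole list once
right = json.dumps(reviews)

# right is one valid JSON document that round-trips cleanly
parsed = json.loads(right)
print(parsed[0]["author_name"])  # Michael P O'Shaughnessy
```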

multiple JSON objects into R from one txt file

I am very new to JSON files. I scraped a txt file with a few million JSON objects such as:
{
"created_at":"Mon Oct 14 21:04:25 +0000 2013",
"default_profile":true,
"default_profile_image":true,
"description":"...",
"followers_count":5,
"friends_count":560,
"geo_enabled":true,
"id":1961287134,
"lang":"de",
"name":"Peter Schmitz",
"profile_background_color":"C0DEED",
"profile_background_image_url":"http://abs.twimg.com/images/themes",
"utc_offset":-28800,
...
}
{
"created_at":"Fri Oct 17 20:04:25 +0000 2015",
...
}
I want to extract the columns into a data frame in R:
Variable Value
created_at X
default_profile Y
…
In general, similar to what is done here (multiple JSON objects in one file, extracted with Python). If anyone has an idea or a suggestion, help would be much appreciated! Thank you!
Here is an example on how you could approach it with two objects. I assume you were able to read the JSON from a file, otherwise see here.
myjson = '{"created_at": "Mon Oct 14 21:04:25 +0000 2013", "default_profile": true,
"default_profile_image": true, "description": "...", "followers_count":
5, "friends_count": 560, "geo_enabled": true, "id": 1961287134, "lang":
"de", "name": "Peter Schmitz", "profile_background_color": "C0DEED",
"profile_background_image_url": "http://abs.twimg.com/images/themes", "utc_offset": -28800}
{"created_at": "Mon Oct 15 21:04:25 +0000 2013", "default_profile": true,
"default_profile_image": true, "description": "...", "followers_count":
5, "friends_count": 560, "geo_enabled": true, "id": 1961287134, "lang":
"de", "name": "Peter Schmitz", "profile_background_color": "C0DEED",
"profile_background_image_url": "http://abs.twimg.com/images/themes", "utc_offset": -28800}
'
library("rjson")
# Split the text into a list of all JSON objects. I chose '!x!x!' pretty randomly;
# there may be better ways of keeping the brackets while splitting.
my_json_objects = head(strsplit(gsub('\\}','\\}!x!x!', myjson),'!x!x!')[[1]],-1)
# read the text as JSON objects
json_data <- lapply(my_json_objects, function(x) {fromJSON(x)})
# Transform to dataframes
json_data <- lapply(json_data, function(x) {data.frame(val=unlist(x))})
Output:
[[1]]
val
created_at Mon Oct 14 21:04:25 +0000 2013
default_profile TRUE
default_profile_image TRUE
description ...
followers_count 5
friends_count 560
geo_enabled TRUE
id 1961287134
lang de
name Peter Schmitz
profile_background_color C0DEED
profile_background_image_url http://abs.twimg.com/images/themes
utc_offset -28800
[[2]]
val
created_at Mon Oct 15 21:04:25 +0000 2013
default_profile TRUE
default_profile_image TRUE
description ...
followers_count 5
friends_count 560
geo_enabled TRUE
id 1961287134
lang de
name Peter Schmitz
profile_background_color C0DEED
profile_background_image_url http://abs.twimg.com/images/themes
utc_offset -28800
Hope this helps!
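One caveat: splitting on every '}' is fragile, since it also breaks inside nested objects (Twitter user records often contain them). A more robust approach, sketched here in Python rather than R since the question already points at a Python solution, is json.JSONDecoder.raw_decode, which consumes exactly one object at a time and reports where it ended:

```python
import json

# Two back-to-back JSON objects, as in the scraped file
raw = '{"id": 1, "name": "Peter"}\n{"id": 2, "name": "Paula"}'

decoder = json.JSONDecoder()
objects, pos = [], 0
while pos < len(raw):
    # raw_decode returns the parsed object and the index just past it
    obj, end = decoder.raw_decode(raw, pos)
    objects.append(obj)
    pos = end
    # skip whitespace/newlines between objects
    while pos < len(raw) and raw[pos].isspace():
        pos += 1

print(objects)
```

Because the decoder tracks brace nesting itself, this works even when objects contain nested objects or '}' inside strings.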

JSON Formatting error

I am getting this error while trying to import this JSON into a Google BigQuery table:
file-00000000: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. (error code: invalid)
JSON parsing error in row starting at position 0 at file: file-00000000. Start of array encountered without start of object. (error code: invalid)
This is the JSON
[{'instrument_token': 11192834, 'average_price': 8463.45, 'last_price': 8471.1, 'last_quantity': 75, 'buy_quantity': 1065150, 'volume': 5545950, 'depth': {'buy': [{'price': 8471.1, 'quantity': 300, 'orders': 131072}, {'price': 8471.0, 'quantity': 300, 'orders': 65536}, {'price': 8470.95, 'quantity': 150, 'orders': 65536}, {'price': 8470.85, 'quantity': 75, 'orders': 65536}, {'price': 8470.7, 'quantity': 225, 'orders': 65536}], 'sell': [{'price': 8471.5, 'quantity': 150, 'orders': 131072}, {'price': 8471.55, 'quantity': 375, 'orders': 327680}, {'price': 8471.8, 'quantity': 1050, 'orders': 65536}, {'price': 8472.0, 'quantity': 1050, 'orders': 327680}, {'price': 8472.1, 'quantity': 150, 'orders': 65536}]}, 'ohlc': {'high': 8484.1, 'close': 8336.45, 'low': 8422.35, 'open': 8432.75}, 'mode': 'quote', 'sell_quantity': 998475, 'tradeable': True, 'change': 1.6151959167271395}]
http://jsonformatter.org/ also gives a parse error for this JSON block. I need help understanding where the formatting is wrong; this is the JSON from a REST API.
This is not valid JSON. JSON uses double quotes, not single quotes. Also, True should be true.
If I had to guess, I would guess that this is Python code being passed off as JSON. :-)
I suspect that even once this is made into correct JSON, it's not the format Google BigQuery is expecting. From https://cloud.google.com/bigquery/data-formats#json_format, it looks like you should have a text file with one JSON object per line. Try just this:
{"mode": "quote", "tradeable": true, "last_quantity": 75, "buy_quantity": 1065150, "depth": {"buy": [{"quantity": 300, "orders": 131072, "price": 8471.1}, {"quantity": 300, "orders": 65536, "price": 8471.0}, {"quantity": 150, "orders": 65536, "price": 8470.95}, {"quantity": 75, "orders": 65536, "price": 8470.85}, {"quantity": 225, "orders": 65536, "price": 8470.7}], "sell": [{"quantity": 150, "orders": 131072, "price": 8471.5}, {"quantity": 375, "orders": 327680, "price": 8471.55}, {"quantity": 1050, "orders": 65536, "price": 8471.8}, {"quantity": 1050, "orders": 327680, "price": 8472.0}, {"quantity": 150, "orders": 65536, "price": 8472.1}]}, "change": 1.6151959167271395, "average_price": 8463.45, "ohlc": {"close": 8336.45, "high": 8484.1, "open": 8432.75, "low": 8422.35}, "instrument_token": 11192834, "last_price": 8471.1, "sell_quantity": 998475, "volume": 5545950}
Even once the OP's record is made into valid JSON, it wouldn't work with BigQuery, and here's why:
BigQuery expects newline-delimited JSON: objects {}, one object per line.
This means you cannot supply a list [] as JSON records and expect BigQuery to detect it. You must always have one JSON object per line.
Finally, I highly recommend reading up on the different forms of JSON structure at json.org.
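Since the record is really a Python literal (single quotes, True), one way to turn it into BigQuery-ready newline-delimited JSON is ast.literal_eval plus json.dumps, one object per line. A sketch with a shortened version of the record:

```python
import ast
import json

# Shortened version of the Python-repr "JSON" from the question
raw = "[{'instrument_token': 11192834, 'tradeable': True, 'last_price': 8471.1}]"

# ast.literal_eval safely parses Python literals (single quotes, True/False)
records = ast.literal_eval(raw)

# Newline-delimited JSON: one object per line, no enclosing array
ndjson = "\n".join(json.dumps(r) for r in records)
print(ndjson)
```

ast.literal_eval only evaluates literals, so unlike eval it cannot run arbitrary code from the API response.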

Load JSON array into Pig

I have a json file with the following format
[
{
"id": 2,
"createdBy": 0,
"status": 0,
"utcTime": "Oct 14, 2014 4:49:47 PM",
"placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia",
"longitude": 77.5983817,
"latitude": 12.9832418,
"createdDate": "Sep 16, 2014 2:59:03 PM",
"accuracy": 5,
"loginType": 1,
"mobileNo": "0000005567"
},
{
"id": 4,
"createdBy": 0,
"status": 0,
"utcTime": "Oct 14, 2014 4:52:48 PM",
"placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia",
"longitude": 77.5983817,
"latitude": 12.9832418,
"createdDate": "Oct 8, 2014 5:24:42 PM",
"accuracy": 5,
"loginType": 1,
"mobileNo": "0000005566"
}
]
When I try to load the data into Pig using the JsonLoader class, I get an error like "Unexpected end-of-input: expected close marker for OBJECT".
a = LOAD '/user/root/jsoneg/exp.json' USING JsonLoader('id:int,createdBy:int,status:int,utcTime:chararray,placeName:chararray,longitude:double,latitude:double,createdDate:chararray,accuracy:double,loginType:double,mobileNo:chararray');
b = foreach a generate $0,$1,$2;
dump b;
I also faced a similar kind of problem some time back; later I came to know that Pig's native JsonLoader does not support multi-line JSON format. It always expects each JSON record to be on a single line.
Instead of the native JsonLoader, I suggest you use the elephant-bird JSON loader. It is pretty good with JSON formats.
You can download the jars from the below link
http://www.java2s.com/Code/Jar/e/elephant.htm
I changed your input to single-line format and loaded it through elephant-bird as below.
input.json
{"test":[{"id": 2,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:49:47 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Sep 16, 2014 2:59:03 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005567"},{"id": 4,"createdBy": 0,"status": 0,"utcTime": "Oct 14, 2014 4:52:48 PM","placeName": "21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia","longitude": 77.5983817,"latitude": 12.9832418,"createdDate": "Oct 8, 2014 5:24:42 PM","accuracy": 5,"loginType": 1,"mobileNo": "0000005566"}]}
PigScript:
REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/tmp/elephant-bird-pig-4.1.jar';
A = LOAD 'input.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
B = FOREACH A GENERATE FLATTEN($0#'test');
C = FOREACH B GENERATE FLATTEN($0) AS mymap;
D = FOREACH C GENERATE mymap#'id',mymap#'placeName',mymap#'status';
DUMP D;
Output:
(2,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)
(4,21/F, Cunningham Main Rd, Sampangi Rama NagarBengaluruKarnatakaIndia,0)
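If you'd rather keep the native JsonLoader, you can also preprocess the file into one object per line before loading. A Python sketch of that conversion, assuming the input is a JSON array like the one in the question (shortened here):

```python
import json

# Shortened stand-in for the pretty-printed JSON array in exp.json
array_text = '[{"id": 2, "mobileNo": "0000005567"}, {"id": 4, "mobileNo": "0000005566"}]'
records = json.loads(array_text)

# Rewrite as one compact JSON object per line, the layout Pig's JsonLoader expects
one_per_line = "\n".join(json.dumps(r) for r in records)
print(one_per_line)
```

Writing one_per_line back to HDFS gives a file the native JsonLoader can read without the multi-line parse error.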