We have a directory /our_jsons that has the files:
file1.json
{"team": 1, "leagueId": 1, "name": "the ballers"}
{"team": 2, "leagueId": 1, "name": "the hoopers"}
file2.json
{"team": 3, "leagueId": 1, "name": "the gamerrs"}
{"team": 4, "leagueId": 1, "name": "the drivers"}
file3.json
{"team": 5, "leagueId": 1, "name": "the jumpers"}
{"team": 6, "leagueId": 1, "name": "the riserss"}
We need to stack these into a single file, output_file.json, that simply contains every record from the files in our directory, stacked on top of one another:
output_file.json
{"team": 1, "leagueId": 1, "name": "the ballers"}
{"team": 2, "leagueId": 1, "name": "the hoopers"}
{"team": 3, "leagueId": 1, "name": "the gamerrs"}
{"team": 4, "leagueId": 1, "name": "the drivers"}
{"team": 5, "leagueId": 1, "name": "the jumpers"}
{"team": 6, "leagueId": 1, "name": "the riserss"}
Is this possible with a bash command on Mac / Linux? We're hoping this is easier than combining ordinary JSON files, because these are NDJSON files and truly just need to be stacked on top of one another. Our full data is much larger (~10GB split over 100+ newline-delimited JSON files), and we're hoping to find a reasonably performant solution (under 2-5 minutes) if possible. I just installed jq and am reading its docs, and will update if we come up with a solution.
EDIT:
It looks like jq . our_jsons/* > output_file.json concatenates the files, but the output is not NDJSON: each record is pretty-printed across several lines, so the file is neither one record per line nor a single valid JSON document...
cat tmp/* | jq -c '.' > tmp/output_file.json appears to get the job done!
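Because NDJSON is line-oriented, the files can also be stacked without parsing any JSON at all, which is the same idea as a plain cat. Below is a minimal Python sketch of that approach. The function name stack_ndjson and the pattern parameter are made up for illustration, and it assumes every input file is already valid NDJSON. It skips the output file itself in case it lives in the same directory, and adds a newline after any file that does not end with one. It has not been timed on 10GB of data.

```python
import shutil
from pathlib import Path


def stack_ndjson(src_dir, out_path, pattern="*.json"):
    """Concatenate every NDJSON file in src_dir into out_path, byte for byte.

    Returns the number of input files that were copied.
    """
    out_path = Path(out_path)
    # Collect inputs first, skipping the output file if it matches the pattern.
    files = sorted(
        p for p in Path(src_dir).glob(pattern)
        if p.resolve() != out_path.resolve()
    )
    with open(out_path, "wb") as out:
        for path in files:
            with open(path, "rb") as src:
                # Stream in 1 MiB chunks so large files never sit fully in memory.
                shutil.copyfileobj(src, out, 1024 * 1024)
                # If the file did not end with a newline, add one so the
                # next file's first record starts on its own line.
                if src.tell() > 0:
                    src.seek(-1, 2)
                    if src.read(1) != b"\n":
                        out.write(b"\n")
    return len(files)
```

Files are copied in sorted name order (so file10.json would come before file2.json); that only matters if the record order in the output is significant.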
According to How to preserve integer data type when exporting to JSON?, it is not currently possible to preserve integer types when exporting from BigQuery to JSON. This minor detail about BigQuery --> GCS JSON exports has been causing us many problems. The result of one of our table exports is a newline-delimited JSON that looks like this:
{"leagueId": "1", "name": "the ballers"}
{"team": "2", "leagueId": "1", "name": "the hoopers"}
{"team": "3", "leagueId": "1", "name": "the gamerrs"}
{"team": "4", "leagueId": "1", "name": "the drivers"}
{"team": "5", "leagueId": "1", "name": "the jumpers"}
{"team": "6", "leagueId": "1", "name": "the riserss"}
team and leagueId should both be ints, and we'd like to modify this NDJSON by converting these strings back into ints. The output we're going for is:
{"leagueId": 1, "name": "the ballers"}
{"team": 2, "leagueId": 1, "name": "the hoopers"}
{"team": 3, "leagueId": 1, "name": "the gamerrs"}
{"team": 4, "leagueId": 1, "name": "the drivers"}
{"team": 5, "leagueId": 1, "name": "the jumpers"}
{"team": 6, "leagueId": 1, "name": "the riserss"}
Assuming we know / have a list/array of the columns that need to be converted from strings into ints [team, leagueId], how can we do this conversion? Is this possible with (a) a bash command using a tool like jq, or (b) is there some python solution? Our full NDJSON is ~10GB in size, and performance is important as this is a step in our daily data-ingestion pipeline.
Edit: How to convert a string to an integer in a JSON file using jq? - trying to use this post to help. Have come up with jq '.team | tonumber' tmp/testNDJSON.json, but this simply returns 1 2 3 4 5 6, not an updated JSON, and only handles one key, not multiple keys.
Edit2: jq -c '{leagueId: .leagueId | tonumber, team: .team | tonumber, name: .name}' tmp/testNDJSON.json > tmp/new_output.json this would work if not for the missing team value in the first JSON... getting closer.
You can use an if expression to handle the records where team is missing:
jq -c 'if .team then {leagueId: .leagueId | tonumber, team: .team | tonumber, name: .name}
else {leagueId: .leagueId | tonumber, name: .name} end '
More on conditionals: https://stedolan.github.io/jq/manual/v1.6/#ConditionalsandComparisons
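For option (b) in the question, the same conversion can be sketched in Python by streaming the file one line at a time and casting only the keys that are present. The names cast_keys_to_int and convert_file are hypothetical, and the sketch assumes each listed key, when present, holds a string such as "1" that int() accepts. Pure Python may well be slower than jq on a 10GB file; that has not been measured here.

```python
import json


def cast_keys_to_int(line, int_keys=("team", "leagueId")):
    """Return one NDJSON record with the listed keys cast to int when present."""
    record = json.loads(line)
    for key in int_keys:
        # Skip keys that are missing (or null), like "team" in the first record.
        if record.get(key) is not None:
            record[key] = int(record[key])
    return json.dumps(record, ensure_ascii=False)


def convert_file(src_path, dst_path, int_keys=("team", "leagueId")):
    """Stream src_path line by line so the whole file is never held in memory."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():  # ignore blank lines
                dst.write(cast_keys_to_int(line, int_keys) + "\n")
```

Unlike the jq filter above, this does not need one branch per combination of missing keys, because every key in the list is checked independently.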
I have a DataFrame with TEDx talks as rows. It has a column 'ratings' in JSON format that looks like this (the column describes how the audience rated each talk):
[{"id": 7, "name": "Funny", "count": 19645}, {"id": 1, "name": "Beautiful", "count": 4573}, {"id": 9, "name": "Ingenious", "count": 6073}, ..........]
[{"id": 7, "name": "Funny", "count": 544}, {"id": 3, "name": "Courageous", "count": 139}, {"id": 2, "name": "Confusing", "count": 62}, {"id": 1, "name": "Beautiful", "count": 58}, ........]
Obviously the order of the descriptive words (name) is not the same for each talk. Each word has an id (the same across all talks) and a count specific to each talk.
I am interested in extracting three new integer columns holding the counts for funny, inspiring, and confusing for each talk.
Among other things, I tried this:
df['ratings'] = df['ratings'].map(lambda x: dict(eval(x)))
In return I get this error:
File "C:/Users/Paul/Google Drive/WEEK4/ted-talks/w4e1.py", line 30, in
df['ratings'] = df['ratings'].map(lambda x: dict(eval(x)))
ValueError: dictionary update sequence element #0 has length 3; 2 is required
I've been trying several different ways, but haven't been able to even get values out of the JSON-formatted column properly. Any suggestions?
You can use a list comprehension with flattening, and convert the string representation to a list of dicts with ast.literal_eval, which is a safer solution than eval:
import pandas as pd
import ast
df = pd.DataFrame({'ratings': ['[{"id": 7, "name": "Funny", "count": 19645}, {"id": 1, "name": "Beautiful", "count": 4573}, {"id": 9, "name": "Ingenious", "count": 6073}]', '[{"id": 7, "name": "Funny", "count": 544}, {"id": 3, "name": "Courageous", "count": 139}, {"id": 2, "name": "Confusing", "count": 62}, {"id": 1, "name": "Beautiful", "count": 58}]']})
print (df)
                                             ratings
0  [{"id": 7, "name": "Funny", "count": 19645}, {...
1  [{"id": 7, "name": "Funny", "count": 544}, {"i...
df1 = pd.DataFrame([y for x in df['ratings'] for y in ast.literal_eval(x)])
print (df1)
   id        name  count
0   7       Funny  19645
1   1   Beautiful   4573
2   9   Ingenious   6073
3   7       Funny    544
4   3  Courageous    139
5   2   Confusing     62
6   1   Beautiful     58
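The flattened frame above loses track of which talk each row came from, while the question asks for one count per talk. The original dict(eval(x)) fails because each element of the list is a three-key dict, and dict() expects two-item pairs. A minimal sketch of a per-row helper follows; the name rating_counts is made up, and it assumes the rating names are spelled exactly "Funny", "Inspiring" and "Confusing", with a missing name counted as 0.

```python
import ast


def rating_counts(ratings_str, names=("Funny", "Inspiring", "Confusing")):
    """Map each requested rating name to its count, using 0 when it is absent."""
    # literal_eval turns the string into a list of dicts without running code.
    by_name = {item["name"]: item["count"] for item in ast.literal_eval(ratings_str)}
    return {name: by_name.get(name, 0) for name in names}
```

With pandas, something like df['funny'] = df['ratings'].map(lambda s: rating_counts(s)['Funny']) should then attach one integer column per word, independent of the order of the words in each row.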
I need to insert data into MongoDB, but the JSON I am getting has multiple values in every field, and I don't know how to split them so that each set of values goes into a separate document.
I want to insert the array data as separate objects in MongoDB:
{
"activity_template_id": [
1,
2,
3,
4,
5,
7
],
"done_date": [
"2019-08-10",
"2019-08-10",
"2019-08-10",
"0000-01-01",
"0000-01-01",
"0000-01-01"
],
"is_prescribed": [
"N",
"N",
"N",
"N",
"N",
"Y"
],
"material_id": [
1,
5,
21,
10,
14,
0
],
"qty": [
"1",
"1",
"1",
"0",
"0",
"0"
],
"unit_id": [
1,
1,
25,
0,
0,
0
]
}
(As far as I know) there is no feature in MongoDB itself that would process input data like that. You would do that in application code before calling MongoDB.
If there is no separate application, you can use standard JavaScript functions within the mongo shell to do that.
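As an illustration of doing the split in application code, here is a minimal sketch in Python rather than mongo shell JavaScript. The helper name split_into_documents is hypothetical, and it assumes every field holds an array of the same length, with the values at index i belonging to the same document.

```python
def split_into_documents(payload):
    """Turn {"a": [1, 2], "b": ["x", "y"]} into [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]."""
    keys = list(payload)
    lengths = {len(payload[key]) for key in keys}
    if len(lengths) > 1:
        # Refuse to guess how to pair values when the arrays disagree in length.
        raise ValueError("all arrays must have the same length")
    # zip(*columns) walks the arrays in parallel, one tuple per document.
    columns = (payload[key] for key in keys)
    return [dict(zip(keys, values)) for values in zip(*columns)]
```

The resulting list could then be passed to a driver call such as pymongo's collection.insert_many(docs); which driver the application uses is not stated in the question, so that part is an assumption.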
Hello, I'm having trouble writing a loop with if/else in Python. I need my if to check whether each product has a "quantity" key: if it does, leave the product as it is, otherwise add "quantity": 0.
I want my for loop to check that "quantity" is present and add it if it's not,
but I have no idea how to write this for/if/else combination.
data = json.load(json_data)
for product in data:
if product ["quantity"] in data
else 'w' product ["quantity":0]
Hopefully the result will then be written out with this:
with open('br2.json', 'w', encoding='utf8') as json_data:
    json_data.write(json.dumps(data, ensure_ascii=False))
    json_data.close()
I want it to go over a json like this
[{"id": 2162952, "name": "Kit Gamer acer - Notebook + Headset + Mouse", "price": 25599.0, "category": "Eletrônicos"},
{"id": 3500957, "name": "Monitor 29 LG FHD Ultrawide com 1000:1 de contraste", "quantity": 18, "price": 1559.4, "category": "Eletrônicos"},
{"id": 1911864, "name": "Mouse Gamer Predator cestus 510 Fox Preto", "price": 699.0, "category": "Acessórios"}]
And return it like this
[{"id": 2162952, "name": "Kit Gamer acer - Notebook + Headset + Mouse", "quantity": 0, "price": 25599.0, "category": "Eletrônicos"},
{"id": 3500957, "name": "Monitor 29 LG FHD Ultrawide com 1000:1 de contraste", "quantity": 18, "price": 1559.4, "category": "Eletrônicos"},
{"id": 1911864, "name": "Mouse Gamer Predator cestus 510 Fox Preto", "quantity": 0, "price": 699.0, "category": "Acessórios"}]
If product is a dictionary, you can check whether the key "quantity" is missing and, in that case, add it with the value 0:
if "quantity" not in product:
    product["quantity"] = 0
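Put together with the loop and the file handling from the question, the check can be sketched as below. The function names and the src_path/dst_path parameters are placeholders (the question only names the output file, br2.json), and dict.setdefault is used as a one-line equivalent of the "not in" test.

```python
import json


def add_missing_quantity(products, default=0):
    """Give every product dict a "quantity" key, keeping existing values."""
    for product in products:
        # setdefault only writes the key when it is not already present.
        product.setdefault("quantity", default)
    return products


def fill_file(src_path, dst_path):
    """Read a JSON array of products, fill in missing quantities, write it back out."""
    with open(src_path, encoding="utf8") as src:
        data = json.load(src)
    with open(dst_path, "w", encoding="utf8") as dst:
        # ensure_ascii=False keeps characters like "ô" readable in the output.
        json.dump(add_missing_quantity(data), dst, ensure_ascii=False)
```

The new key is appended at the end of each dict rather than placed between "name" and "price"; key order carries no meaning in JSON, so consumers should not depend on it.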
My JSON file looks something like this (it's huge, so this is just simplified):
{
"foo": {
"id": [
20,
1,
3,
4,
60,
1
],
"times": [
330.89,
5.33,
353.89,
33.89,
14.5,
207.5
]
},
"poo": {
"id": [
20,
1,
3,
4,
60,
1
],
"times": [
3.5,
323.89,
97.7,
154.5,
27.5,
265.60
]
}
}
My actual JSON file is similar to the one above, but much more complex. What I want to do is use the "times" and "id" data to perform an action for the right "id" at the exact time. The id and times arrays are mapped to each other (entries at the same index belong together). Is there a way to get the right id for each time and perform an action without too many complicated loops?
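One way to avoid index bookkeeping is to pair the two arrays with zip and flatten everything into a single list sorted by time. The sketch below assumes the structure shown above (each top-level key holds parallel "id" and "times" arrays, in valid JSON without trailing commas). The name schedule is made up, and what "performing an action" means is left to the caller.

```python
def schedule(data):
    """Flatten {"foo": {"id": [...], "times": [...]}, ...} into
    (time, group, id) tuples sorted by time."""
    events = []
    for group, series in data.items():
        # zip pairs each id with the time at the same index.
        for item_id, t in zip(series["id"], series["times"]):
            events.append((t, group, item_id))
    return sorted(events)
```

A single loop such as for t, group, item_id in schedule(data) then visits every id in time order, which may be enough to trigger the action at the right moment.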