Warning: I'm new to MongoDB and JSON.
I have a log file which contains JSON datasets. A single file has multiple JSON formats because it captures clickstream data. Here is an example of one log file:
[
{
"username":"",
"event_source":"server",
"name":"course.activated",
"accept_language":"",
"time":"2016-10-12T01:02:07.443767+00:00",
"agent":"python-requests/2.9.1",
"page":null,
"host":"courses.org",
"session":"",
"referer":"",
"context":{
"user_id":null,
"org_id":"X",
"course_id":"3T2016",
"path":"/api/enrollment"
},
"ip":"160.0.0.1",
"event":{
"course_id":"3T2016",
"user_id":11,
"mode":"audit"
},
"event_type":"activated"
},
{
"username":"VTG",
"event_type":"/api/courses/3T2016/",
"ip":"161.0.0.1",
"agent":"Mozilla/5.0",
"host":"courses.org",
"referer":"http://courses.org/16773734",
"accept_language":"en-AU,en;q=0.8,en-US;q=0.6,en;q=0.4",
"event":"{\"POST\": {}, \"GET\": {}}",
"event_source":"server",
"context":{
"course_user_tags":{
},
"user_id":122,
"org_id":"X",
"course_id":"3T2016",
"path":"/api/courses/3T2016/"
},
"time":"2016-10-12T00:51:57.756468+00:00",
"page":null
}
]
Now I want to store this data in MongoDB. So here are my novice questions:
Do I need to parse the file and then split it into two datasets before storing them in MongoDB? If yes, is there a simple program to do this, given that my file has multiple dataset formats?
Is there some magic in MongoDB that can split the various datasets when we upload the file?
First of all, your JSON needs to be valid: the individual event objects must sit inside a single top-level array, as in the formatted example below. Note that mongorestore restores BSON dumps produced by mongodump; to load a plain JSON file, use mongoimport with the --jsonArray flag:
mongoimport --host hostname --port 27017 --db <database_name> --collection <collection_name> --file pathtojsonfile --jsonArray
For more information refer to https://docs.mongodb.com/manual/reference/program/mongoimport/
Formatted JSON: identical to the example in the question; the two event objects simply need to be wrapped in a single top-level array ([ ... ]).
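If you do want to split the two event shapes into separate collections before importing, this can be done with a few lines of Python. A minimal sketch, assuming the log is a single top-level JSON array like the one above (sample records trimmed for brevity):

```python
import json

# Two trimmed sample records: an activation event whose "event" field
# is a nested object, and a request log whose "event" field is a
# JSON-encoded string.
records = [
    {"name": "course.activated", "event_type": "activated",
     "event": {"course_id": "3T2016", "user_id": 11, "mode": "audit"}},
    {"username": "VTG", "event_type": "/api/courses/3T2016/",
     "event": "{\"POST\": {}, \"GET\": {}}"},
]

# Split by the shape of the "event" field
activations = [r for r in records if isinstance(r.get("event"), dict)]
request_logs = [r for r in records if isinstance(r.get("event"), str)]

print(len(activations), len(request_logs))  # prints "1 1"
```

Each list can then be inserted into its own collection (e.g. via pymongo's insert_many). That said, MongoDB is schemaless, so storing both shapes in one collection also works.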
I have POST body data like this:
"My data": [{
"Data": {
"var1": 6.66,
"var2": 8.88
},
"var3": 9
}],
Here, when I post this body, "My data" is sent just once. I want the "My data" block to repeat a random number of times, from 1 to 10. For example, if the random value is 2, "My data" should appear twice.
Help appreciated!
If you need to generate more blocks like this one:
{
"Data": {
"var1": 6.66,
"var2": 8.88
},
"var3": 9
}
It can be done using a JSR223 PreProcessor and the following code:
def myData = []
1.upto(2, {
def entry = [:]
entry.put('Data', [var1: 6.66, var2: 8.88])
entry.put('var3', 9)
myData.add(entry)
})
vars.put('myData', new groovy.json.JsonBuilder(myData).toPrettyString())
log.info(vars.get('myData'))
The above example will generate 2 blocks.
If you want 10, change the 2 in the 1.upto(2, { line to 10. For a random count between 1 and 10, as asked, you can replace it with something like new java.util.Random().nextInt(10) + 1.
The generated data can then be accessed as ${myData} wherever needed.
More information:
Apache Groovy - Parsing and producing JSON
Apache Groovy - Why and How You Should Use It
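For comparison outside JMeter, the same randomised-repetition idea can be sketched in plain Python (the 1-to-10 range is taken from the question):

```python
import json
import random

# Pick a random repeat count between 1 and 10 inclusive
n = random.randint(1, 10)

# Build the "My data" array with n copies of the block
my_data = [{"Data": {"var1": 6.66, "var2": 8.88}, "var3": 9} for _ in range(n)]

body = json.dumps({"My data": my_data}, indent=2)
print(body)
```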
I want to generate a schema from a newline-delimited JSON file, where each row has variable key/value pairs. File size can vary from 5 MB to 25 MB.
Sample Data:
{"col1":1,"col2":"c2","col3":100.75}
{"col1":2,"col3":200.50}
{"col1":3,"col3":300.15,"col4":"2020-09-08"}
Expected Schema:
[
{"name": "col1", "type": "INTEGER"},
{"name": "col2", "type": "STRING"},
{"name": "col3", "type": "FLOAT"},
{"name": "col4", "type": "DATE"}
]
Notes:
There is no scope to use any external tool, as files are loaded into an inbound location dynamically. The code will be triggered as soon as a file arrives and will then perform the schema comparison.
Your first problem is that JSON does not have a date type, so you will get str there.
What I would do, if I were you, is this:
import json

# Wherever your input comes from
inp = """{"col1":1,"col2":"c2","col3":100.75}
{"col1":2,"col3":200.50}
{"col1":3,"col3":300.15,"col4":"2020-09-08"}"""

schema = {}

# Split it at newlines
for line in inp.split('\n'):
    # each line contains a "dict"
    tmp = json.loads(line)
    for key in tmp:
        # if we have not seen the key before, add it
        if key not in schema:
            schema[key] = type(tmp[key])
        # otherwise check the type
        else:
            if schema[key] != type(tmp[key]):
                raise Exception("Schema mismatch")

# format however you like
out = []
for item in schema:
    out.append({"name": item, "type": schema[item].__name__})

print(json.dumps(out, indent=2))
I'm using python types for simplicity, but you can write your own function to get the type, e.g. if you want to check if a string is actually a date.
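To get the exact labels from the expected schema (INTEGER, STRING, FLOAT, DATE), you can map the Python types to names and sniff ISO-formatted date strings. A sketch under those assumptions (the TYPE_NAMES mapping and the date heuristic are mine, not part of any standard):

```python
import json
from datetime import date

# Assumed mapping from Python types to the schema labels in the question
TYPE_NAMES = {int: "INTEGER", float: "FLOAT", str: "STRING", bool: "BOOLEAN"}

def classify(value):
    # JSON has no date type, so dates arrive as strings;
    # treat ISO-formatted strings (YYYY-MM-DD) as DATE
    if isinstance(value, str):
        try:
            date.fromisoformat(value)
            return "DATE"
        except ValueError:
            return "STRING"
    return TYPE_NAMES[type(value)]

lines = [
    '{"col1":1,"col2":"c2","col3":100.75}',
    '{"col1":2,"col3":200.50}',
    '{"col1":3,"col3":300.15,"col4":"2020-09-08"}',
]

schema = {}
for line in lines:
    for key, value in json.loads(line).items():
        # keep the first type seen for each column
        schema.setdefault(key, classify(value))

out = [{"name": k, "type": t} for k, t in schema.items()]
print(out)
# [{'name': 'col1', 'type': 'INTEGER'}, {'name': 'col2', 'type': 'STRING'},
#  {'name': 'col3', 'type': 'FLOAT'}, {'name': 'col4', 'type': 'DATE'}]
```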
I have JSON like this below:
{
"id": 1,
"interviewer": "hengtw1",
"incidenttwg1": {
"id": 5,
"child_occupation": [
6
],
},
}
How can I access the child_occupation array? All I tried is incidenttwg1['child_occupation'] or ['incidenttwg1']['child_occupation'], but it still doesn't work.
Any help? Thanks...
First check whether your string is valid JSON in Python, and refer to the json module documentation to learn more about the JSON encoder and decoder.
import json

# Decoding JSON: json.loads() expects a string, not a dict
data = json.loads('{"id": 1, "interviewer": "hengtw1", "incidenttwg1": {"id": 5, "child_occupation": [6]}}')

print(data["incidenttwg1"]["child_occupation"])
# this will print [6] (list)

print(data["incidenttwg1"]["child_occupation"][0])
# this will print 6 (list item)
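One likely reason the original attempt failed: the JSON in the question contains trailing commas (after the inner object and the outer object), which json.loads rejects. A quick sketch that strips them with a regex before parsing (good enough for simple inputs like this one, not a general JSON5 parser):

```python
import json
import re

raw = """{
  "id": 1,
  "interviewer": "hengtw1",
  "incidenttwg1": {
    "id": 5,
    "child_occupation": [6],
  },
}"""

# Drop any comma that directly precedes a closing brace or bracket
cleaned = re.sub(r',\s*([}\]])', r'\1', raw)

data = json.loads(cleaned)
print(data["incidenttwg1"]["child_occupation"][0])  # prints 6
```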
I'm looking to create a Python 3 list of the locations from the JSON file city.list.json downloaded from OpenWeatherMap (http://bulk.openweathermap.org/sample/city.list.json.gz). The file passes http://json-validator.com/ but I cannot figure out how to correctly open the file and create a list of the values of the key 'name'. I keep hitting json.loads errors about io.TextIOWrapper etc.
I created a short test file
[
{
"id": 707860,
"name": "Hurzuf",
"country": "UA",
"coord": {
"lon": 34.283333,
"lat": 44.549999
}
}
,
{
"id": 519188,
"name": "Novinki",
"country": "RU",
"coord": {
"lon": 37.666668,
"lat": 55.683334
}
}
]
Is there a way to parse this and create a list ["Hurzuf", "Novinki"] ?
You should use json.load() instead of json.loads(): json.load() reads from a file object, while json.loads() expects a string. I named my test file file.json and here is the code:
import json

with open('file.json', mode='r') as f:
    # First, read the JSON file and store its content in a Python variable
    # using the json.load() function
    json_data = json.load(f)

# json_data now contains a list of dictionaries
# (the top-level JSON value is an array of objects)

# Create a result list in which we will store the names
result_list = []

# Iterate over each dictionary in the list
for json_dict in json_data:
    # Append each name value to the result list
    result_list.append(json_dict['name'])

print(result_list)  # ['Hurzuf', 'Novinki']

# Shorter solution using a list comprehension
result_list = [json_dict['name'] for json_dict in json_data]
print(result_list)  # ['Hurzuf', 'Novinki']

You simply iterate over the dictionaries in the list and pick out the value stored under the key 'name'.
I keep getting an error when uploading/importing my JSON file into Firebase. I initially had an Excel spreadsheet that I saved as a CSV file, then I used a CSV-to-JSON converter.
I validated the JSON file (which has the .json extension) with a couple of online tools.
Though, I'm still getting an error.
Here is an example of my JSON:
{
"Rk": 1,
"Tm": "SEA",
"H/A": "H",
"DOW": "Sun",
"Opp": "CLE",
"QB": "Russell Wilson",
"Grade": "BLUE",
"Def mu pts": 4,
"Inj status": 0,
"Notes": "Got to wonder if not having a proven power RB under center will negatively impact Wilson's production.",
"TFS $50K": "$8,300",
"Init sal": "$8,300",
"Var": "$0",
"WC": 0
}
The issue is your keys.
Firebase keys must be UTF-8 encoded and cannot contain . $ # [ ] / or ASCII control characters 0-31 or 127.
Your "TFS $50K" and "H/A" keys are the problem.
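If you need to keep those columns, one option is to rewrite the offending keys before upload. A hedged sketch (the underscore replacement is my choice, not a Firebase convention):

```python
import re

# Characters Firebase forbids in keys: . $ # [ ] / and ASCII control
# characters 0-31 and 127
FORBIDDEN = re.compile(r'[.$#\[\]/\x00-\x1f\x7f]')

def sanitize_key(key):
    return FORBIDDEN.sub('_', key)

record = {"Rk": 1, "H/A": "H", "TFS $50K": "$8,300"}
clean = {sanitize_key(k): v for k, v in record.items()}
print(clean)  # {'Rk': 1, 'H_A': 'H', 'TFS _50K': '$8,300'}
```

Only the keys are rewritten; values like "$8,300" are fine as they are.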