I am trying to access a JSON object which is stored as a gzipped (.gz) file on a website. I would like to do this directly with urllib if possible.
This is what I have tried:
import urllib.request
import gzip
import json

# get the zip file
test = urllib.request.Request('http://files.tmdb.org/p/exports/movie_ids_01_27_2021.json.gz')
# unzip and read
with gzip.open(test, 'rt', encoding='UTF-8') as zipfile:
    my_object = json.loads(zipfile)
but this fails with:
TypeError: filename must be a str or bytes object, or a file
Is it possible to read the JSON directly like this (i.e. without downloading the file locally)?
Thank you.
Use the requests library (pip install requests if you don't have it).
Then use the following code:
import requests
r = requests.get('http://files.tmdb.org/p/exports/movie_ids_01_27_2021.json.gz')
print(r.content)
r.content will be the binary content of the gzip file; it will consume 11,352,985 bytes of memory (10.8 MB) because the data needs to be kept somewhere.
Then you can use
gzip.decompress(r.content)
to decompress the gzip binary and get the data. That will consume considerably more memory after decompression.
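Putting the two steps together, here is a minimal sketch (the URL is the one from the question; whether you call json.loads once on the whole text or per line depends on the file, since the TMDB ID exports appear to be newline-delimited JSON with one object per line):

import gzip
import json
import requests

url = 'http://files.tmdb.org/p/exports/movie_ids_01_27_2021.json.gz'
r = requests.get(url)

# Decompress the gzip payload in memory and decode it to text.
text = gzip.decompress(r.content).decode('utf-8')

# If the export is newline-delimited JSON (one object per line),
# parse it line by line; for a single JSON document use json.loads(text).
my_objects = [json.loads(line) for line in text.splitlines() if line]
print(len(my_objects))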
I am relatively new to AWS S3. I am calling an API to load JSON data directly into an S3 bucket, from which the data will be read by Snowflake. After researching, I found that using Boto3 we can load data into S3 directly. The code will look something like below; however, one thing I am not sure about is what I should put for the key, as there is no object created in my S3 bucket yet. Also, what is good practice for loading JSON data to S3? Do I need to encode the JSON data to 'UTF-8', as done here by SO user Uwe Bretschneider?
Thanks in advance!
Python code:
import json
import urllib.request
import boto3

data = urllib.request.urlopen("https://api.github.com/users?since=100").read()
output = json.loads(data)
print(output)  # Checking the data

s3 = boto3.client('s3')
s3.put_object(
    Body=json.dumps(output),
    Bucket='I_HAVE_BUCKET_NAME',
    Key='your_key_here'
)
With put_object you are creating a new object in the bucket, so there is no existing key to refer to.
The key is just like a file name in a file system. You can specify whatever name you like, such as my-data.json or some-dir/my-data.json. You can find out more at https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html.
As for the encoding, it's always good to specify it explicitly IMO, just to make sure your source data is properly encoded too.
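For illustration, here is a minimal sketch of the upload with an explicit key and UTF-8 encoding (the bucket name and the key 'github/users-since-100.json' are just placeholders):

import json
import urllib.request
import boto3

# Fetch the JSON from the API and parse it.
data = urllib.request.urlopen("https://api.github.com/users?since=100").read()
output = json.loads(data)

s3 = boto3.client('s3')
s3.put_object(
    Body=json.dumps(output).encode('utf-8'),  # explicit UTF-8 encoding
    Bucket='I_HAVE_BUCKET_NAME',              # placeholder bucket name
    Key='github/users-since-100.json'         # any key you like; it acts as the "file name"
)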
I have downloaded my Facebook data. I got the data in the form of JSON files.
Now I am trying to read these JSON files into NetworkX, but I can't find any function to read a graph from a JSON file into NetworkX.
In another post, I found information about reading a graph from JSON, but there the JSON file had earlier been created from NetworkX using json.dump().
In my case, however, I have downloaded the data from Facebook. Is there any function to read a graph from a JSON file into NetworkX?
Unlike Pandas tables or NumPy arrays, JSON files have no rigid structure, so one can't write a function that converts any JSON file into a NetworkX graph. If you want to construct a graph based on JSON, you have to pick out the needed information yourself. You can load a file with the json.loads function, extract all nodes and edges according to your own rules, and then put them into your graph with the add_nodes_from and add_edges_from functions.
For example, for a Facebook JSON file you could write something like this:
import json
import networkx as nx

# Load the downloaded Facebook JSON
with open('fbdata.json') as f:
    json_data = json.loads(f.read())

G = nx.DiGraph()
# Nodes: the author name of each entry
G.add_nodes_from(
    elem['from']['name']
    for elem in json_data['data']
)
# Edges: from the author's id to the entry's id
G.add_edges_from(
    (elem['from']['id'], elem['id'])
    for elem in json_data['data']
)
nx.draw(
    G,
    with_labels=True
)
And you get the resulting graph.
I have recorded my script to upload a JSON file through JMeter, but now I am facing a problem while uploading the JSON file, which is on my local drive, through JMeter.
I have already done the following steps:
Either use the full path to the file you're uploading, or copy it to JMeter's "bin" folder
Don't forget to tick the Use multipart/form-data for HTTP POST box
Make sure you provide the correct Parameter name and MIME Type
I am getting an exception in my response data:
{"Exception":"Object reference not set to an instance of an object."}
For me it works like this (see the JSON example). Maybe your server prefers some specific headers, e.g. Content-Type and so on?
Why are you using a JSON file?
If you are facing problems with generating random JSON data, here is a JSR223 PreProcessor:
import org.apache.commons.io.FileUtils;

def time = vars.get("stratup_time")
File myFile = File.createTempFile("upload-", "")
// Write a small JSON payload containing the time to the temp file
FileUtils.write(myFile, "{\"test\":\"$time\"}", "UTF-8")
// Store the file name in a variable.
vars.put("filename", myFile.getCanonicalPath())
And PostProcessor:
import org.apache.commons.io.FileUtils;
// Delete file and do not throw error
FileUtils.deleteQuietly(new File( vars.get("filename")));
In order to be able to record the file upload event you need to copy your .json file to JMeter's "bin" folder; this way the HTTP(S) Test Script Recorder will be able to capture the request and generate the necessary HTTP Request sampler. It will also populate the HTTP Header Manager.
More information: Recording File Uploads with JMeter
With Spark 1.6.2, reading a gzip compressed JSON file from a normal file-system:
val df = sqlContext
  .read
  .json("file:///data/blablacar/transactions.json.gz")
  .count()
Will use a single task on a single worker.
But if I save the file:
sc.textFile("file:///data/blablacar/transactions.json.gz")
  .saveAsTextFile("file:///user/blablacar/transactions")
sqlContext.read.json("file:///user/blablacar/transactions")
  .count()
Will execute the first job on a single task, but the JSON decoding on several (which is good!).
Why didn't Spark unzip the file in memory and distribute the JSON decoding across several tasks in the first case?
Why didn't Spark unzip the file in memory and distribute the JSON decoding across several tasks in the first case?
Because gzip compression is not splittable, the file has to be loaded as a whole on a single machine. If you want parallel reads:
Don't use gzip at all, or
Use gzip compression on smaller files comparable to split size, or
Unpack files yourself before you pass them to Spark.
Calling .repartition(8) did the trick!
val rdd = sc.textFile("file:///data/blablacar/transactions.json.gz")
  .repartition(8)
sqlContext.read.json(rdd)
  .count()
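For reference, a rough PySpark equivalent of the same workaround (paths and the partition count of 8 are taken from the snippet above; assumes sc and sqlContext are already defined, as in a Spark 1.6 pyspark shell):

# Read the gzipped file (still one task, since gzip is not splittable),
# then repartition so the JSON decoding runs on several tasks.
rdd = sc.textFile("file:///data/blablacar/transactions.json.gz").repartition(8)
df = sqlContext.read.json(rdd)
print(df.count())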
I have a URL on my Raspberry Pi device server. How can I parse it?
I have used the urllib2 module to get the contents of the URL from the server.
I want to use JSON for parsing the URL's contents.
So if I understand properly, you have a JSON file with a URL that you want to pass to your Python module.
I used this a while back:
import json

with open('yourjsonfilewithurl.json') as json_data_file:
    data = json.load(json_data_file)
Then use the data like this; what follows depends on the number of elements you have set up in your JSON config file:
myurl = (data["details"][0]["url"])
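If the URL itself points to an endpoint that returns JSON, a minimal sketch for fetching and parsing it could look like this (using Python 3's urllib.request; in Python 2 the equivalent module is urllib2, and myurl is the value read from the config file above):

import json
import urllib.request

# Fetch the URL taken from the config file and parse its JSON response.
with urllib.request.urlopen(myurl) as response:
    payload = json.loads(response.read().decode('utf-8'))

print(payload)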
Hope this helps.