Convert CSV to JSON using Python - json

This is my code, and it converts the file successfully. However, when I import the resulting JSON into Firebase, it reports that the JSON file is invalid.
import csv
import json
csvfile = open('C:/Users/Senior/seaborn-data/Denver DatasetCleaning Finalize.csv', 'r')
jsonfile = open('C:/Users/Senior/seaborn-data/Denver DatasetCleaning Finalize.json', 'w')
fieldnames = ("OFFENSE_CODE ","OFFENSE_CATEGORY_ID","FIRST_OCCURRENCE_DATE","DATE","YEAR","MONTH","DAY","TIME","HOUR","MINUTE","INCIDENT_ADDRESS","GEO_LON","GEO_LAT","NEIGHBORHOOD_ID")
reader = csv.DictReader( csvfile, fieldnames)
for row in reader:
    json.dump(row, jsonfile)
    jsonfile.write('\n')

Each call to json.dump writes one complete JSON document. Several JSON documents concatenated one after another do not add up to a single valid JSON document, which is why the import is rejected.
What you probably want is to read the entire CSV into a list first, then call json.dump once on that list.
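As a minimal sketch of that suggestion: the helper name csv_to_json, the temporary file names, and the sample rows below are made up for illustration. It also assumes the first CSV row is a header; if it is not, pass fieldnames to DictReader as in the question.

```python
import csv
import json
import os
import tempfile


def csv_to_json(csv_path, json_path):
    """Read every CSV row into a list, then write the list as one JSON array."""
    with open(csv_path, newline="", encoding="utf-8") as csv_file:
        # With no explicit fieldnames, DictReader takes its keys from the header row.
        rows = list(csv.DictReader(csv_file))
    with open(json_path, "w", encoding="utf-8") as json_file:
        json.dump(rows, json_file, indent=2)  # one call, one JSON document
    return rows


# Demo on a throwaway file with made-up sample rows.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "sample.csv")
json_path = os.path.join(tmp_dir, "sample.json")
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    f.write("OFFENSE_CODE,YEAR\n2399,2017\n5441,2018\n")

csv_to_json(csv_path, json_path)

# The output parses as a whole because it is a single JSON array.
with open(json_path, encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded)
```

Note that every value comes out as a string, because the csv module does not infer types.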

Related

Removing header from json and leave json array

I have a json file in the form
{"total_rows":1000,"rows":[{data},{data},{data}]}
and I just want
[{data},{data},{data}]
I know pandas can load the rows into a DataFrame like this:
import json
import pandas as pd
file_reading = json.loads(open(url).read())
df = pd.DataFrame.from_dict(file_reading['rows'])
print(df)
But I am hoping for a way to write the output as a JSON array instead. It's a big dataset, so I don't want to loop.
You opened a file without closing it. Nothing fancy is needed; the JSON object simply translates into a dictionary in Python:
with open(url) as fp:
    file_reading = json.load(fp)
df = pd.DataFrame(file_reading["rows"])
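If the goal is a JSON array on disk rather than a DataFrame, one possible sketch is to dump the "rows" list directly, with no explicit loop. The helper name extract_rows, the file names, and the sample data are assumptions made for illustration.

```python
import json
import os
import tempfile


def extract_rows(src_path, dst_path, key="rows"):
    """Load a JSON object and write only the list stored under `key`."""
    with open(src_path, encoding="utf-8") as src:
        document = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(document[key], dst)  # writes [{...}, {...}] without the wrapper
    return document[key]


# Demo with a made-up document in the same shape as the question's file.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "with_header.json")
dst_path = os.path.join(tmp_dir, "rows_only.json")
with open(src_path, "w", encoding="utf-8") as f:
    json.dump({"total_rows": 2, "rows": [{"id": 1}, {"id": 2}]}, f)

extract_rows(src_path, dst_path)

with open(dst_path, encoding="utf-8") as f:
    result = json.load(f)
print(result)
```

This still loads the whole file into memory, which is usually fine unless the dataset is larger than available RAM.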

python api json dict in dataframe

I want to scrape data at the county level from https://apidocs.covidactnow.org
However I could only get a dataframe with one line for each county, and data for each date is stored within a dictionary in each row/county. I would like to access this data and store it in long format (= have one row per county-date).
import requests
import pandas as pd
import os
if __name__ == '__main__':
    os.chdir('/home/username/Desktop/')
    url = 'https://api.covidactnow.org/v2/counties.timeseries.json?apiKey=ENTER_YOUR_KEY'
    response = requests.get(url).json()
    data = pd.DataFrame(response)
This seems like a trivial question, but I've tried for hours. What would be the best way to achieve this?
Do you mean something like this?
import requests
url = 'https://api.covidactnow.org/v2/states.timeseries.csv?apiKey=YOURAPIKEY'
response = requests.get(url)
csv_response = response.text
# csv_response is a plain string; it can then be parsed as CSV
For converting the string to CSV, see "python parsing string to csv format".
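A minimal sketch of that string-to-CSV step, using only the standard library. The sample text and its column names are made up; they stand in for response.text and are not the API's actual columns.

```python
import csv
import io

# Stand-in for response.text; the columns are hypothetical.
csv_response = "state,date,cases\nCO,2021-01-01,10\nCO,2021-01-02,12\n"

# io.StringIO wraps the string in a file-like object that the csv module can read.
reader = csv.DictReader(io.StringIO(csv_response))
records = list(reader)  # one dict per CSV row

print(records[0])
```

With pandas, the same wrapped string can be passed to pd.read_csv(io.StringIO(csv_response)) to get a DataFrame directly.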

load a json file containing list of strings

I have a json file containing a list of strings like this:
['Hello\nHow are you?', 'What is your name?\nMy name is john']
I have to read this file and store it as a list of strings, but I am confused about how to read a JSON file like this. I also need to use UTF-8 encoding.
Let's assume the file contains one or more lines like the one shown above. Here is my suggestion (remember to replace the file name test.json with your own):
import ast
with open("test.json", "r", encoding="utf-8") as input_file:
    line_list = input_file.readlines()
all_texts = [item for sublist in line_list for item in ast.literal_eval(sublist)]
print(all_texts)
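The file you have shown is not in JSON format: JSON strings must use double quotes, not single quotes. In any case, to read a JSON file, open it and pass the file object to json.load (json.loads expects the JSON text itself, not a file path):
import json
with open('path/to/file.json', encoding='utf-8') as f:
    jsonObj = json.load(f)
This returns the Python object that corresponds to the top-level JSON value (a list for a JSON array, a dictionary for a JSON object) and stores it in jsonObj.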
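For comparison, here is a hedged sketch of what a valid JSON version of that list looks like and how it round-trips with UTF-8. The temporary file path is an assumption made so the example can run anywhere.

```python
import json
import os
import tempfile

texts = ["Hello\nHow are you?", "What is your name?\nMy name is john"]

path = os.path.join(tempfile.mkdtemp(), "test.json")

# json.dump writes double-quoted strings and escapes the newlines as \n.
with open(path, "w", encoding="utf-8") as f:
    json.dump(texts, f, ensure_ascii=False)

# Reading it back yields a Python list of str, with real newline characters.
with open(path, "r", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded)
```

ensure_ascii=False keeps non-ASCII characters readable in the file instead of writing them as \uXXXX escapes; both forms load back to the same strings.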

How to save JSON data fetched from URL in PySpark?

I have fetched some .json data from API.
import urllib2
test=urllib2.urlopen('url')
print test
How can I save it as a table or data frame? I am using Spark 2.0.
This is how I succeeded importing .json data from web into df:
from pyspark.sql import SparkSession, functions as F
from urllib.request import urlopen
spark = SparkSession.builder.getOrCreate()
url = 'https://web.url'
jsonData = urlopen(url).read().decode('utf-8')
rdd = spark.sparkContext.parallelize([jsonData])
df = spark.read.json(rdd)
For this, you can do some research and try using sqlContext. Here is some sample code:
>>> df2 = sqlContext.jsonRDD(test)
>>> df2.first()
See the documentation for more details:
https://spark.apache.org/docs/1.6.2/api/python/pyspark.sql.html
Adding to Rakesh Kumar's answer, the way to do it in Spark 2.0 is described here:
http://spark.apache.org/docs/2.1.0/sql-programming-guide.html#data-sources
As an example, the following creates a DataFrame based on the content of a JSON file:
# spark is an existing SparkSession
df = spark.read.json("examples/src/main/resources/people.json")
# Displays the content of the DataFrame to stdout
df.show()
Note that the file that is offered as a json file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. For more information, please see JSON Lines text format, also called newline-delimited JSON. As a consequence, a regular multi-line JSON file will most often fail.
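To illustrate the JSON Lines layout described above, here is a small standard-library sketch that writes a list of records with one self-contained JSON object per line. The helper name to_json_lines, the file name, and the sample records are made up for illustration; the sketch does not call Spark itself.

```python
import json
import os
import tempfile


def to_json_lines(records, path):
    """Write each record as one self-contained JSON object per line."""
    with open(path, "w", encoding="utf-8") as out:
        for record in records:
            # json.dumps without indent never emits a raw line break.
            out.write(json.dumps(record) + "\n")


records = [{"name": "Michael"}, {"name": "Andy", "age": 30}]
path = os.path.join(tempfile.mkdtemp(), "people.jsonl")
to_json_lines(records, path)

with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()

# Every line parses on its own, which is what a line-oriented reader relies on.
parsed = [json.loads(line) for line in lines]
print(lines)
```

A file in this shape is valid line by line but, taken as a whole, is not one JSON document.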
from pyspark import SparkFiles
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Project").getOrCreate()
# On GitHub, click "Raw" and copy that URL
zip_url = "https://raw.githubusercontent.com/spark-examples/spark-scala-examples/master/src/main/resources/zipcodes.json"
spark.sparkContext.addFile(zip_url)
zip_df = spark.read.json("file://" + SparkFiles.get("zipcodes.json"))

Python: Build DataFrame from parts of JSON response

I am trying to develop an application to retrieve stock prices (in JSON) and then do some analysis on them. My problem is with getting the JSON response into a pandas DataFrame where I can work. Here is my code:
'''
References
http://stackoverflow.com/questions/6862770/python-3-let-json-object-accept-bytes-or-let-urlopen-output-strings
'''
import json
import pandas as pd
from urllib.request import urlopen
#set API call
url = "https://www.quandl.com/api/v3/datasets/WIKI/AAPL.json?start_date=2017-01-01&end_date=2017-01-31"
#make call and receive response
response = urlopen(url).read().decode('utf8')
dataresponse = json.loads(response)
#check incoming
#print(dataresponse)
df = pd.read_json(dataresponse)
print(df)
The application errors at df = pd.read_json... with error TypeError: Expected String or Unicode.
So I reckon this is the first hurdle.
The second is getting where I need to. The JSON response contains only two arrays I am interested in, column_names and data. How do I extract only these two and put into a pandas DataFrame?
To answer your first question, pd.read_json takes a JSON string directly, so you should be doing this:
pd.read_json(response)
But instead, considering how the data is structured, it's best to first convert the JSON string to a dictionary containing the data:
d = json.loads(response)
Then simply build the dataframe from d['dataset']['data'] and d['dataset']['column_names']:
pd.DataFrame(data=d['dataset']['data'], columns=d['dataset']['column_names'])
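To make the pairing of column_names and data concrete, here is a pandas-free sketch of the same idea. The dictionary below is a made-up stand-in for the parsed response, with invented values and only three columns.

```python
# Hypothetical stand-in for json.loads(response); the values are invented.
d = {
    "dataset": {
        "column_names": ["Date", "Open", "Close"],
        "data": [
            ["2017-01-31", 121.15, 121.35],
            ["2017-01-30", 120.93, 121.63],
        ],
    }
}

columns = d["dataset"]["column_names"]

# Each inner list in "data" is one row; zip pairs its values with the column names.
records = [dict(zip(columns, row)) for row in d["dataset"]["data"]]

print(records[0])
```

The DataFrame constructor does the same pairing internally when it receives data= and columns=, so this is only meant to show how the two arrays relate.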