My data from the web won't save as Excel using DataFrame - json

I use an API to get data from the web, and I successfully get the data! I want to save the data in an Excel file, but something must be wrong, because the save fails even though no error is raised. Can anyone tell me how to fix it? Thanks! Wish you a good life!
Here is my code:
import requests
import json
import pandas as pd

# input() already returns a string, so eval() is unnecessary; keeping the
# code as a plain string (not a one-element list) also simplifies the checks
Function_code = input("Please enter the dataset code: ")
Year_of_start_time = input("Please enter the start year: ")
Month_of_start_time = input("Please enter the start month: ")
Year_of_end_time = input("Please enter the end year: ")
Month_of_end_time = input("Please enter the end month: ")

url = "https://nstatdb.dgbas.gov.tw/dgbasAll/webMain.aspx?sdmx/"

# Pick the dimension string that matches the dataset code
Dimension = ""
if Function_code == "A11010208010":
    Dimension = "1+2.1+2+3+4..M"
elif Function_code == "A093102020":
    Dimension = ".1+2+3+4+5+6+7+8+9+10+11+12+13+14+15+16..M"
elif Function_code == "A018203010":
    Dimension = "1+2+3.1+2+3+4+5+6+7+8+9+10+11+12+13+14+15+16+17+18+19+20..M"
elif Function_code == "A093005010":
    Dimension = "1+2+3.1+2+3+4+5+6+7+8..M"

# The original wrapped Function_code in a list and then str()-ed it, which
# put "['...']" into the URL; using the plain string builds a valid request
r = requests.get(url + Function_code + "/" + Dimension
                 + "&startTime=" + Year_of_start_time + "-" + Month_of_start_time
                 + "&endTime=" + Year_of_end_time + "-" + Month_of_end_time)
print(r.text)

list_of_dicts = r.json()
print(type(r))
print(type(list_of_dicts))

df = pd.DataFrame(list_of_dicts)
df.to_excel('list_of_dicts.xlsx')
I want to save the data I crawled from the web into an Excel file.
By the way, the data are 3-dimensional.
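Since the SDMX response is nested (3-dimensional), pd.DataFrame(r.json()) may leave dicts inside the cells, which does not export cleanly to Excel. Here is a minimal sketch of flattening nested records with pandas.json_normalize, using a made-up structure for illustration (the real key names in the DGBAS response will differ; print r.json() to see them):

import pandas as pd

# Hypothetical nested structure for illustration only
data = {
    "series": [
        {"name": "A", "observations": [{"period": "2020-01", "value": 1.2}]},
        {"name": "B", "observations": [{"period": "2020-01", "value": 3.4}]},
    ]
}

# One row per observation, with the series name carried along as a column,
# so the result is a flat 2-D table that to_excel can write
df = pd.json_normalize(data["series"], record_path="observations", meta=["name"])
df.to_excel("list_of_dicts.xlsx", index=False)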

Related

python api json dict in dataframe

I want to scrape data at the county level from https://apidocs.covidactnow.org.
However, I can only get a dataframe with one row for each county, and the data for each date is stored within a dictionary in each row/county. I would like to access this data and store it in long format (i.e. one row per county-date).
import requests
import pandas as pd
import os

if __name__ == '__main__':
    os.chdir('/home/username/Desktop/')
    url = 'https://api.covidactnow.org/v2/counties.timeseries.json?apiKey=ENTER_YOUR_KEY'
    response = requests.get(url).json()
    data = pd.DataFrame(response)
This seems like a trivial question, but I've tried for hours. What would be the best way to achieve this?
Do you mean something like that?
import requests

url = 'https://api.covidactnow.org/v2/states.timeseries.csv?apiKey=YOURAPIKEY'
response = requests.get(url)
csv_response = response.text
# Then you can transform the STRING to CSV
Check this for string to CSV --> python parsing string to csv format
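To finish that last step, here is a minimal sketch of parsing the returned string into a DataFrame with io.StringIO. It assumes the counties timeseries is also exposed as CSV (check the API docs for the exact endpoint), which would give the long format directly:

import io
import requests
import pandas as pd

url = 'https://api.covidactnow.org/v2/counties.timeseries.csv?apiKey=YOURAPIKEY'
csv_text = requests.get(url).text

# Wrap the string in a file-like object so pandas can parse it as a CSV;
# the timeseries endpoints are already long format (one row per entity-date)
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())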

not able to display the text within the a tag while web scraping using BeautifulSoup

I am trying to get the duration of a particular song using .text, but the output is blank while the name of the song and the artist are displayed.
from bs4 import BeautifulSoup
import requests
import csv

source = requests.get("https://gaana.com/artist/arijit-singh/latest/asc").text
soup = BeautifulSoup(source, "lxml")

with open("arijit_singh_new_update.csv", "w") as arijit_csv_file:
    arijit_csv_file_writer = csv.writer(arijit_csv_file)
    arijit_csv_file_writer.writerow(["title", "artists", "duration"])
    title_tag = soup.find("div", class_="playlist_thumb_det")
    title = title_tag.a.text
    composer_tag = soup.find("li", class_="s_artist p_artist desktop")
    composer = composer_tag.a.text
    duration_tag = soup.find("li", class_="s_duration")
    duration = duration_tag.a.text
    print(duration)
To save the song titles, artists and duration to csv file, you can use this example:
import csv
import requests
from bs4 import BeautifulSoup

url = "https://gaana.com/artist/arijit-singh/latest/asc"
soup = BeautifulSoup(requests.get(url).content, "lxml")

with open('arijit_singh_new_update.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["title", "artists", "duration"])
    for song, artist, duration in zip(
            soup.select('.s_title a[data-type="playSong"]'),
            soup.select('.s_artist [data-type="playSong"]'),
            soup.select('.s_duration [data-type="playSong"]')):
        writer.writerow([song.text, artist.text, duration.text])
This creates the csv file arijit_singh_new_update.csv, which you can open in LibreOffice to check the result.

Cannot plot candlestick data from Huobi json data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
from matplotlib.finance import candlestick_ohlc
import matplotlib.dates as mdates
import datetime as dt
import requests

dataLink = 'http://api.huobi.com/staticmarket/btc_kline_015_json.js'
r = requests.get(dataLink)  # r is a response object.
quotes = pd.DataFrame.from_records(r.json())  # fetches dataset

# Timestamps carry three extra trailing digits, so strip them before parsing
quotes[0] = pd.to_datetime(quotes[0].str[:-3], format='%Y%m%d%H%M%S')

# Naming columns
quotes.columns = ["Date", "Open", "High", "Low", "Close", "Vol"]

# Converting dates column to float values
quotes['Date'] = quotes['Date'].map(mdates.date2num)

# Making plot
fig = plt.figure()
fig.autofmt_xdate()
ax1 = plt.subplot2grid((6, 1), (0, 0), rowspan=6, colspan=1)

# Converts raw mdate numbers to dates
ax1.xaxis_date()
plt.xlabel("Date")
print(quotes)

# Making candlestick plot (the original had a stray comma that turned the
# call into a tuple and a syntax error)
candlestick_ohlc(ax1, quotes.values, width=1, colorup='g', colordown='k', alpha=0.75)
plt.show()
I'm trying to plot a candlestick chart from the JSON data provided by Huobi, but I can't sort the dates out and the plot looks horrible. Can you explain, in fairly simple terms that a novice might understand, what I am doing wrong? The code above is what I have.
Thanks in advance.
You can put the fig.autofmt_xdate() at some point after calling the candlestick function; this will make the dates look nicer.
Concerning the plot itself, you may decide to make the bars a bit smaller, width=0.01, such that they won't overlap.
You may then also decide to zoom in a bit, to actually see what's going on in the chart, either interactively, or programmatically,
ax1.set_xlim(dt.datetime(2017, 4, 17, 8), dt.datetime(2017, 4, 18, 0))
This boiled down to a question of how wide to make the candlesticks given the granularity of the data as determined by the period & length parameters of the json feed. You just have to fiddle around with the width parameter in candlestick_ohlc() until the graph looks right...
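Putting those suggestions together, here is a minimal sketch of the corrected plotting section. It reuses the quotes DataFrame built in the question, and the width and zoom window are example values to tune, not fixed answers:

import datetime as dt
import matplotlib.pyplot as plt
from matplotlib.finance import candlestick_ohlc

fig = plt.figure()
ax1 = plt.subplot2grid((6, 1), (0, 0), rowspan=6, colspan=1)
ax1.xaxis_date()

# Narrow candles so 15-minute bars don't overlap; tune to the feed's granularity
candlestick_ohlc(ax1, quotes.values, width=0.01, colorup='g', colordown='k', alpha=0.75)

# Zoom in on a single day so individual candles are visible
ax1.set_xlim(dt.datetime(2017, 4, 17, 8), dt.datetime(2017, 4, 18, 0))

# Rotate the date labels after the candles are drawn so they format nicely
fig.autofmt_xdate()
plt.xlabel("Date")
plt.show()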

How to save JSON data fetched from URL in PySpark?

I have fetched some .json data from API.
import urllib2
test=urllib2.urlopen('url')
print test
How can I save it as a table or data frame? I am using Spark 2.0.
This is how I succeeded importing .json data from web into df:
from pyspark.sql import SparkSession, functions as F
from urllib.request import urlopen
spark = SparkSession.builder.getOrCreate()
url = 'https://web.url'
jsonData = urlopen(url).read().decode('utf-8')
rdd = spark.sparkContext.parallelize([jsonData])
df = spark.read.json(rdd)
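If the goal is to have it as a table as well, here is a short follow-up sketch (the table and view names are just examples):

# Register the DataFrame as a temporary view for SQL queries in this session
df.createOrReplaceTempView("json_data")
spark.sql("SELECT * FROM json_data").show()

# Or persist it as a managed table in the metastore
df.write.mode("overwrite").saveAsTable("json_table")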
For this you can do some research and try using sqlContext. Here is sample code:
>>> df2 = sqlContext.jsonRDD(test)
>>> df2.first()
Moreover, visit the link below and check for more details:
https://spark.apache.org/docs/1.6.2/api/python/pyspark.sql.html
Adding to Rakesh Kumar's answer, the way to do it in Spark 2.0 is:
http://spark.apache.org/docs/2.1.0/sql-programming-guide.html#data-sources
As an example, the following creates a DataFrame based on the content of a JSON file:
# spark is an existing SparkSession
df = spark.read.json("examples/src/main/resources/people.json")
# Displays the content of the DataFrame to stdout
df.show()
Note that the file that is offered as a json file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. For more information, please see JSON Lines text format, also called newline-delimited JSON. As a consequence, a regular multi-line JSON file will most often fail.
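If you do have a regular multi-line JSON file, newer Spark versions (2.2+) can still read it by enabling the multiLine option; a short sketch:

# Treat the whole file as one JSON document instead of one document per line
df = spark.read.option("multiLine", True).json("examples/src/main/resources/people.json")
df.show()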
from pyspark import SparkFiles
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Project").getOrCreate()

# Click on "Raw" in GitHub and then copy the URL
zip_url = "https://raw.githubusercontent.com/spark-examples/spark-scala-examples/master/src/main/resources/zipcodes.json"
spark.sparkContext.addFile(zip_url)

zip_df = spark.read.json("file://" + SparkFiles.get("zipcodes.json"))

Python3 - TypeError: the JSON object must be str, not 'bytes'

I'm a python beginner (working only with python3 so far) and I'm trying to present some code using the curses library to my classmates.
I got the code from a python/curses tutorial and it runs without problems in python2. In python3 it doesn't, and I get the error in the title.
Searching through the already asked questions, I found several solutions to this, but since I'm an absolute beginner with coding, I have no idea how to apply those in my specific code.
This is the code working in python2:
import curses
from urllib2 import urlopen
from HTMLParser import HTMLParser
from simplejson import loads

def get_new_joke():
    joke_json = loads(urlopen('http://api.icndb.com/jokes/random').read())
    return HTMLParser().unescape(joke_json['value']['joke']).encode('utf-8')
Using the new modules in python3:
import curses
import json
import urllib
from html.parser import HTMLParser

def get_new_joke():
    joke_json = loads(urlopen('http://api.icndb.com/jokes/random').read())
    return HTMLParser().unescape(joke_json['value']['joke']).encode('utf-8')
Furthermore, I tried to incorporate this solution into my code:
Python 3, let json object accept bytes or let urlopen output strings
response = urllib.request.urlopen('http://api.icndb.com/jokes/random')
str_response = joke_json.readall().decode('utf-8')
obj = json.loads(str_response)
I've tried around for hours now, but it tells me "json" is not defined.
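For reference, here is a minimal Python 3 sketch of the same function that avoids both errors: decode the bytes before json.loads, and use html.unescape (Python 3.4+) in place of the Python 2 HTMLParser().unescape:

import json
import html
from urllib.request import urlopen

def get_new_joke():
    # urlopen returns bytes in Python 3, so decode before parsing as JSON
    raw = urlopen('http://api.icndb.com/jokes/random').read()
    joke_json = json.loads(raw.decode('utf-8'))
    # html.unescape replaces the removed HTMLParser().unescape()
    return html.unescape(joke_json['value']['joke'])

print(get_new_joke())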