Trying to convert str object to DataFrame: web scraping with python - json

System: WIN10
IDE: MS VSCode
Language: Python version 3.7.3
Library: pandas version 1.0.1
Data source: https://hoopshype.com/salaries/#hh-tab-team-payroll
Dataset: team payrolls
I am having an issue when trying to convert a str(table) into a DataFrame; I think I am grossly missing something here. The sample code is supplied below, and it keeps putting only the top row of data into the DataFrame. I think I am missing some other transformation step.
Steps taken:
searched the net for various tutorials
tried splitting the data first and then running the conversion on it, with no luck
Code:
# import Libraries
import pandas as pd
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate
# data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# setting: For output purposes to show all columns
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 2000)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
# parsing the webpage
url = 'https://hoopshype.com/salaries/#hh-tab-team-payroll'
r = requests.get(url)
data = r.text
# create a BeautifulSoup object
soup = BeautifulSoup(r.content,'lxml')
soup.prettify()
# team salaries by year = tsby
table = soup.find_all('table')[0]
nbatsby = pd.read_html(str(table))
# this is where I am stuck
df = pd.DataFrame(nbatsby)
df.head(100)

pandas has read_html, which can scrape an HTML table and convert it to a DataFrame directly:
import pandas as pd
import requests
r = requests.get("https://hoopshype.com/salaries/#hh-tab-team-payroll")
data = pd.read_html(r.text, attrs = {'class': 'hh-salaries-ranking-table'})[0]
print(data)
Output:
Unnamed: 0 Team 2019/20 2020/21 2021/22 2022/23 2023/24 2024/25
0 1.0 Portland $XXX,XXX,XXX $XXXX,XXX,XXX $XX,XXX,XXX $XX,XXX,XXX $XX,XXX,XXX $XX,XXX,XXX
1 2.0 Miami $XXX,XXX,XXX $XX,XXX,XXX $XX,XXX,XXX $XX,XXX,XXX $0 $0
................................................................................................................
28 29.0 Indiana $XXX,XXX,XXX $XXX,XXX,XXX $XX,XXX,XXX $XX,XXX,XXX $XX,XXX,XXX $0
29 30.0 New York $XXX,XXX,XXX $XX,XXX,XXX $XX,XXX,XXX $0 $0 $0
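For reference, the reason the original code ended up with a single row is that pd.read_html returns a list of DataFrames (one per table it finds), and pd.DataFrame(nbatsby) wraps that list instead of unpacking it. A minimal sketch of the fix, keeping the BeautifulSoup step from the question:
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://hoopshype.com/salaries/#hh-tab-team-payroll'
soup = BeautifulSoup(requests.get(url).content, 'lxml')
table = soup.find_all('table')[0]

# read_html returns a list of DataFrames; take the first element
# instead of wrapping the whole list in pd.DataFrame()
df = pd.read_html(str(table))[0]
print(df.head())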


I am trying to pass values to a function from the reliability library, but without much success. Here are my sample data and code.

My sample data is as follows:
Tag_Typ              Alpha_Estimate  Beta_Estimate  PM01_Avg_Cost  PM02_Avg_Cost
OLK-AC-101-14A_PM01  497.665         0.946584       1105.635       462.3833775
OLK-AC-103-01_PM01   288.672         0.882831       1303.8875      478.744375
OLK-AC-1105-01_PM01  164.282         0.787158       763.4475758    512.185814
OLK-AC-236-05A_PM01  567.279         0.756839       640.718        450.3277778
OLK-AC-276-05A_PM01  467.53          0.894773       1536.78625     439.78
This is my sample code:
import pandas as pd
import numpy as np
from reliability.Repairable_systems import optimal_replacement_time
import matplotlib.pyplot as plt
data = pd.read_excel (r'C:\Users\\EU_1_EQ_PM01_Cost.xlsx')
data_frame = pd.DataFrame(data, columns= ['Alpha_Estimate','Beta_Estimate','PM01_Avg_Cost','PM02_Avg_Cost'])
Alpha_Est=pd.DataFrame(data, columns= ['Alpha_Estimate'])
Beta_Est=pd.DataFrame(data, columns= ['Beta_Estimate'])
PM_Est=pd.DataFrame(data, columns= ['PM02_Avg_Cost'])
CM_Est=pd.DataFrame(data, columns= ['PM01_Avg_Cost'])
optimal_replacement_time(cost_PM=PM_Est, cost_CM=CM_Est, weibull_alpha=Alpha_Est, weibull_beta=Beta_Est,q=0)
plt.show()
I need to loop through the value set for each tag and pass those values to the Optimal replacement function to return the results.
[Sample Output]
ValueError: Can only compare identically-labeled DataFrame objects
I would appreciate any suggestions on how I can pass the values of the PM cost, PPM cost, and the distribution parameters alpha and beta in the function as I iterate through the tag-type and print the results for each tag. Thanks.
The core of your question is how to iterate through a list in Python. This will achieve what you're after:
import pandas as pd
from reliability.Repairable_systems import optimal_replacement_time
df = pd.read_excel(io=r"C:\Users\Matthew Reid\Desktop\sample_data.xlsx")
alpha = df["Alpha_Estimate"].tolist()
beta = df["Beta_Estimate"].tolist()
CM = df["PM01_Avg_Cost"].tolist()
PM = df["PM02_Avg_Cost"].tolist()
ORT = []
for i in range(len(alpha)):
    ort = optimal_replacement_time(cost_PM=PM[i], cost_CM=CM[i], weibull_alpha=alpha[i], weibull_beta=beta[i], q=0)
    ORT.append(ort.ORT)
print('List of the optimal replacement times:\n',ORT)
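If you also want each result keyed by its tag, a variant of the same loop (a sketch, assuming the column names from the sample data above and the question's file path) can pair Tag_Typ with the returned ORT:
import pandas as pd
from reliability.Repairable_systems import optimal_replacement_time

df = pd.read_excel(r"C:\Users\EU_1_EQ_PM01_Cost.xlsx")  # path as in the question

results = {}
for row in df.itertuples(index=False):
    ort = optimal_replacement_time(cost_PM=row.PM02_Avg_Cost,
                                   cost_CM=row.PM01_Avg_Cost,
                                   weibull_alpha=row.Alpha_Estimate,
                                   weibull_beta=row.Beta_Estimate,
                                   q=0)
    results[row.Tag_Typ] = ort.ORT  # same .ORT attribute used above

for tag, t in results.items():
    print(tag, '->', t)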
On a separate note, all of your beta values are less than 1. This means the hazard rate is decreasing (aka. infant mortality / early life failures). When you run the above script, each iteration will print the warning:
"WARNING: weibull_beta is < 1 so the hazard rate is decreasing, therefore preventative maintenance should not be conducted."
If you have any further questions, you know how to contact me :)

How to upload and read a zip file containing training and testing image data in Google Colab from my PC

I am new to Google Colab. I am implementing pretrained VGG16 and ResNet50 models using PyTorch, but I am unable to load my file and read it, as it returns an error that no directory is found.
I have uploaded the data through the Files panel, and I have also uploaded it using
from google.colab import files
uploaded = files.upload()
The file got uploaded, but when I try to unzip it (it is a zip file) using
!unzip content/cropped_months
it says
no file found
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision.transforms import *
from torch.optim import lr_scheduler
from torch.autograd import Variable
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy
from google.colab import files
uploaded = files.upload()
!unzip content/cropped_months
data_dir = 'content/cropped_months'

#Define transforms for the training data and testing data
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([transforms.Resize(256),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

#pass transform here-in
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

#data loaders
trainloader = torch.utils.data.DataLoader(train_data, batch_size=8, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=8, shuffle=True)

print("Classes: ")
class_names = train_data.classes
print(class_names)
First error:
unzip: cannot find or open content/cropped_months,
       content/cropped_months.zip or content/cropped_months.ZIP.
Second error:
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
in ()
     16
     17 #pass transform here-in
---> 18 train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
     19 test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)
     20

2 frames
/usr/local/lib/python3.6/dist-packages/torchvision/datasets/folder.py in _find_classes(self, dir)
    114         if sys.version_info >= (3, 5):
    115             # Faster and available in Python 3.5 and above
--> 116             classes = [d.name for d in os.scandir(dir) if d.is_dir()]
    117         else:
    118             classes = [d for d in os.listdir(dir) if os.path.isdir(os.path.join(dir, d))]

FileNotFoundError: [Errno 2] No such file or directory: 'content/cropped_months (1)/train'
You are probably trying to access the wrong path. In my notebook, the file was uploaded to the working directory.
Use google.colab.files to upload the zip.
from google.colab import files
files.upload()
Upload your file. Google Colab will display where it was saved:
Saving dummy.zip to dummy.zip
Then just run !unzip:
!unzip dummy.zip
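Applied to the question's data, the whole flow might look like this (a sketch, assuming the uploaded archive is named cropped_months.zip and unpacks into a cropped_months folder with train and test subfolders):
from google.colab import files

uploaded = files.upload()  # choose cropped_months.zip in the dialog

# The upload lands in the current working directory (/content on Colab),
# so unzip it from there instead of from a relative 'content/...' path
!unzip -q cropped_months.zip -d /content

data_dir = '/content/cropped_months'
# now datasets.ImageFolder(data_dir + '/train', ...) can find the folders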
I think you can use the PySurvival library, which is compatible with Torch. Here is the link:
https://square.github.io/pysurvival/miscellaneous/save_load.html

How to convert <class 'bs4.element.ResultSet'> to JSON in Python using the built-in json.dumps

How do I convert to JSON format? I am getting an error "is not JSON serializable".
The following is my program:
from urllib2 import urlopen as uReq
import re
from bs4 import BeautifulSoup, Comment
import requests
import json
my_url='https://uae.dubizzle.com/en/property-for-rent/residential/apartmentflat/?filters=(neighborhoods.ids=123)&page=1'
uClient=uReq(my_url)
page_html= uClient.read()
page_soup=BeautifulSoup(page_html, 'html.parser')
comments = page_soup.findAll(text=lambda text:isinstance(text, Comment))
[comment.extract() for comment in comments]
json_output= page_soup.find_all("script",type="application/ld+json",string=re.compile("SingleFamilyResidence")) #find_all("script", "application/ld+json")
#comments = json_output.findAll(text=lambda text:isinstance(text, Comment))
#[comment.extract() for comment in comments]
#json_output.find_all(text="<script type=""application/ld+json"">").replaceWith("")
#print json_output
jsonD = json.dumps(json_output)
uClient.close()
[{"#context":"http://schema.org","#type":"SingleFamilyResidence","name":"Spacious 2BHK For Rent in Damascus Street Al Qusais","url":"https://dubai.dubizzle.com/property-for-rent/residential/apartmentflat/2018/4/29/spacious-two-bed-room-available-for-rent-i-2/","address":{"#type":"PostalAddress","addressLocality":"Dubai","addressRegion":"Dubai"},"":{"#type":"Product","name":"Spacious 2BHK For Rent in Damascus Street Al Qusais","url":"https://dubai.dubizzle.com/property-for-rent/residential/apartmentflat/2018/4/29/spacious-two-bed-room-available-for-rent-i-2/","offers":{"#type":"Offer","price":49000,"priceCurrency":"AED"}},"floorSize":1400,"numberOfRooms":2,"image":"https://dbzlpvfeeds-a.akamaihd.net/images/user_images/2018/04/29/80881784_CP_photo.jpeg","geo":{"#type":"GeoCoordinates","latitude":55.3923,"longitude":25.2893}}, {"#context":"http://schema.org","#type":"SingleFamilyResidence","name":"Fully Furnished 2 Bed Room Flat -Al Qusais","url":"https://dubai.dubizzle.com/property-for-rent/residential/apartmentflat/2017/10/9/fully-furnished-brand-new-2-bed-room-flat--2/","address":{"#type":"PostalAddress","addressLocality":"Dubai","addressRegion":"Dubai"},"":{"#type":"Product","name":"Fully Furnished 2 Bed Room Flat -Al Qusais","url":"https://dubai.dubizzle.com/property-for-rent/residential/apartmentflat/2017/10/9/fully-furnished-brand-new-2-bed-room-flat--2/","offers":{"#type":"Offer","price":70000,"priceCurrency":"AED"}},"floorSize":1400,"numberOfRooms":2,"image":"https://dbzlpvfeeds-a.akamaihd.net/images/user_images/2018/09/05/84371522_CP_photo.jpeg","geo":{"#type":"GeoCoordinates","latitude":55.3959,"longitude":25.2959}}]
I added another wrapper of BeautifulSoup and got the expected JSON by first getting the text with the .get_text() method and then using json.loads. Thank you.
from urllib2 import urlopen as uReq
import re
from bs4 import BeautifulSoup, Comment
import requests
import json
my_url='https://uae.dubizzle.com/en/property-for-rent/residential/apartmentflat/?filters=(neighborhoods.ids=123)&page=1'
uClient=uReq(my_url)
page_html= uClient.read()
page_soup=BeautifulSoup(page_html, 'lxml')# 'html.parser')
json_output= BeautifulSoup(str(page_soup.find_all("script",type="application/ld+json",string=re.compile("SingleFamilyResidence"))), 'lxml')#find_all("script", "application/ld+json")
json_text=json_output.get_text()
json_data = json.loads(json_text)
print json_data
uClient.close()
First convert the bs4.element.ResultSet to a string, then convert it to JSON:
json_data = json.dumps(str(json_output))
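A Python 3 variant of the same idea (a sketch, reusing the question's URL and search string): instead of re-wrapping the ResultSet in BeautifulSoup, parse each ld+json script tag's text individually, which yields plain dicts that json.dumps can serialize.
import json
import re
import requests
from bs4 import BeautifulSoup

my_url = 'https://uae.dubizzle.com/en/property-for-rent/residential/apartmentflat/?filters=(neighborhoods.ids=123)&page=1'
page_soup = BeautifulSoup(requests.get(my_url).text, 'html.parser')

scripts = page_soup.find_all("script", type="application/ld+json",
                             string=re.compile("SingleFamilyResidence"))
listings = [json.loads(tag.string) for tag in scripts]  # each tag's text is valid JSON

print(json.dumps(listings, indent=2))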

How to convert Fantasy Premier League Data from JSON to CSV?

I am new to Python, and as part of my thesis work I am trying to convert JSON to CSV. I am able to download the data as JSON, but when I write it back out using dictionaries it does not convert the JSON to CSV with every column.
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
import matplotlib.pyplot as plt
import numpy as np
import requests
from pprint import pprint
import csv
from time import sleep
s1='https://fantasy.premierleague.com/drf/element-summary/'
print s1
players = []
for player_link in range(1,450,1):
    link = s1+""+str(player_link)
    print link
    r = requests.get(link)
    print r
    player = r.json()
    players.append(player)
    sleep(1)
with open('C:\Users\dell\Downloads\players_new2.csv', 'w') as f:  # Just use 'w' mode in 3.x
    w = csv.DictWriter(f, player.keys())
    w.writeheader()
    for player in players:
        w.writerow(player)
I have uploaded the expected output (dec_15_expected.csv) and the program output under the file name "player_new_wrong_output.csv":
https://drive.google.com/drive/folders/0BwKYmRU_0K6tZUljd3Q0aG1LT0U?usp=sharing
It would be a great help if someone could tell me what I am doing wrong.
Converting JSON to CSV is simple with pandas. Try this:
import pandas as pd
df=pd.read_json("input.json")
df.to_csv('output.csv')
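If the per-player responses are nested, a further option (a sketch, using the same element-summary endpoint as the question; it assumes a pandas version that exposes json_normalize at the top level) is to flatten each response before writing the CSV:
import pandas as pd
import requests
from time import sleep

s1 = 'https://fantasy.premierleague.com/drf/element-summary/'
frames = []
for player_id in range(1, 450):
    data = requests.get(s1 + str(player_id)).json()
    # json_normalize expands nested dicts into dotted column names;
    # list-valued fields remain as single columns
    frames.append(pd.json_normalize(data))
    sleep(1)

pd.concat(frames, ignore_index=True).to_csv('players_new2.csv', index=False)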

How to separate text from twitter streaming JSON responses and run analysis on text with python?

I am trying to use the Twitter API to run sentiment analysis on the text. I am running into the issue that I do not understand how to separate the text from each tweet and run the sentiment polarity analysis provided by the TextBlob library. Furthermore, I would like this to only pull back English tweets. The output is in JSON.
Here is the code to produce the tweets based on keywords (in this case "usd", "euro", "loonie") and my lame attempt at storing the text and using the result in a variable:
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import json
import re
import pandas as pd
import matplotlib.pyplot as plt
#Variables that contains the user credentials to access Twitter API
access_token = "xxxx"
access_token_secret = "xxxx"
consumer_key = "xxxx"
consumer_secret = "xxxx"
#This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):
    def on_data(self, data):
        print data
        return True
    def on_error(self, status):
        print status

if __name__ == '__main__':
    #This handles Twitter authentication and the connection to Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    #This line filters Twitter Streams to capture data by the keywords: 'euro', 'dollar', 'loonie'
    stream.filter(track=['euro', 'dollar', 'loonie', ] )

tweets_data_path = stream.filter
tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
    try:
        tweet = json.loads(line)
        tweets_data.append(tweet)
    except:
        continue
print len(tweets_data)
tweets['text'] = map(lambda tweet: tweet['text'], tweets_data)
wiki = TextBlob(tweets['text'])
r = wiki.sentiment.polarity
print r
This is what the output looks like:
{"created_at":"Sun Jun 14 23:43:31 +0000 2015","id":610231121016524801,"id_str":"610231121016524801","text":"RT #amirulimannn: RM6 diperlukan utk tukar kpd 1Pound.\nRM3 diperlukan utk tukar kpd 1S'pore Dollar.\n\nGraf matawang jatuh. Tak sedih ke? htt\u2026","source":"\u003ca href=\"http://twitter.com/download/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":42642877,"id_str":"42642877","name":"Wny","screen_name":"waaannnyyy","location":"Dirgahayu Darul Makmur","url":null,"description":"Aku serba tiada, aku kekurangan.","protected":false,"verified":false,"followers_count":320,"friends_count":239,"listed_count":1,"favourites_count":4344,"statuses_count":34408,"created_at":"Tue May 26 15:10:28 +0000 2009","utc_offset":28800,"time_zone":"Kuala Lumpur","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"FFFFFF","profile_background_image_url":"http://pbs.twimg.com/profile_background_images/433201191825047553/PM76m-v2.jpeg","profile_background_image_url_https":"https://pbs.twimg.com/profile_background_images/433201191825047553/PM76m-v2.jpeg","profile_background_tile":true,"profile_link_color":"DD2E44","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"EFEFEF","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/609402965795835904/mm6jjRRO_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/609402965795835904/mm6jjRRO_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/42642877/1415486321","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Sat Jun 13 03:33:29 +0000 2015","id":609564219495706624,"id_str":"609564219495706624","text":"RM6 diperlukan utk tukar kpd 1Pound.\nRM3 diperlukan utk tukar kpd 1S'pore Dollar.\n\nGraf matawang jatuh. Tak sedih ke? 
http://t.co/dum4skb6uK","source":"\u003ca href=\"http://twitter.com/download/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":481856658,"id_str":"481856658","name":"seorang iman","screen_name":"amirulimannn","location":"+06MY","url":"http://instagram.com/amirulimannn","description":"I wanna drown myself in a bottle of her perfume","protected":false,"verified":false,"followers_count":723,"friends_count":834,"listed_count":2,"favourites_count":4810,"statuses_count":50981,"created_at":"Fri Feb 03 07:49:55 +0000 2012","utc_offset":28800,"time_zone":"Kuala Lumpur","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"AD0A20","profile_background_image_url":"http://pbs.twimg.com/profile_background_images/378800000139426816/61DHBnYy.jpeg","profile_background_image_url_https":"https://pbs.twimg.com/profile_background_images/378800000139426816/61DHBnYy.jpeg","profile_background_tile":false,"profile_link_color":"E36009","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"24210E","profile_text_color":"89B5A2","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/592744790283911169/dW7S73WA_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/592744790283911169/dW7S73WA_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/481856658/1428379855","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":1321,"favorite_count":229,"entities":{"hashtags":[],"trends":[],"urls":[],"user_mentions":[],"symbols":[],"media":[{"id":609564142886760448,"id_str":"609564142886760448","indices":[118,140],"media_url":"http://pbs.twimg.com/media/CHWbW7yUsAAyAEw.jpg","media_url_https":"https://pbs.twimg.com/media/CHWbW7yUsAAyAEw.jpg","url":"http://t.co/dum4skb6uK","display_url":"pic.twitter.com/dum4skb6uK","expanded_url":"http://twitter.com/amirulimannn/status/609564219495706624/photo/1","type":"photo","sizes":{"small":{"w":340,"h":340,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"medium":{"w":600,"h":600,"resize":"fit"},"large":{"w":1024,"h":1024,"resize":"fit"}}}]},"extended_entities":{"media":[{"id":609564142886760448,"id_str":"609564142886760448","indices":[118,140],"media_url":"http://pbs.twimg.com/media/CHWbW7yUsAAyAEw.jpg","media_url_https":"https://pbs.twimg.com/media/CHWbW7yUsAAyAEw.jpg","url":"http://t.co/dum4skb6uK","display_url":"pic.twitter.com/dum4skb6uK","expanded_url":"http://twitter.com/amirulimannn/status/609564219495706624/photo/1","type":"photo","sizes":{"small":{"w":340,"h":340,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"medium":{"w":600,"h":600,"resize":"fit"},"large":{"w":1024,"h":1024,"resize":"fit"}}}]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"in"},"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"trends":[],"urls":[],"user_mentions":[{"screen_name":"amirulimannn","name":"seorang 
iman","id":481856658,"id_str":"481856658","indices":[3,16]}],"symbols":[],"media":[{"id":609564142886760448,"id_str":"609564142886760448","indices":[139,140],"media_url":"http://pbs.twimg.com/media/CHWbW7yUsAAyAEw.jpg","media_url_https":"https://pbs.twimg.com/media/CHWbW7yUsAAyAEw.jpg","url":"http://t.co/dum4skb6uK","display_url":"pic.twitter.com/dum4skb6uK","expanded_url":"http://twitter.com/amirulimannn/status/609564219495706624/photo/1","type":"photo","sizes":{"small":{"w":340,"h":340,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"medium":{"w":600,"h":600,"resize":"fit"},"large":{"w":1024,"h":1024,"resize":"fit"}},"source_status_id":609564219495706624,"source_status_id_str":"609564219495706624"}]},"extended_entities":{"media":[{"id":609564142886760448,"id_str":"609564142886760448","indices":[139,140],"media_url":"http://pbs.twimg.com/media/CHWbW7yUsAAyAEw.jpg","media_url_https":"https://pbs.twimg.com/media/CHWbW7yUsAAyAEw.jpg","url":"http://t.co/dum4skb6uK","display_url":"pic.twitter.com/dum4skb6uK","expanded_url":"http://twitter.com/amirulimannn/status/609564219495706624/photo/1","type":"photo","sizes":{"small":{"w":340,"h":340,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"medium":{"w":600,"h":600,"resize":"fit"},"large":{"w":1024,"h":1024,"resize":"fit"}},"source_status_id":609564219495706624,"source_status_id_str":"609564219495706624"}]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"in","timestamp_ms":"1434325411453"}
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import json
# Variables that contains the user credentials to access Twitter API
access_token = ''
access_token_secret = ''
consumer_key = ''
consumer_secret = ''
# This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):
    def on_data(self, data):
        json_load = json.loads(data)
        texts = json_load['text']
        coded = texts.encode('utf-8')
        s = str(coded)
        print(s[2:-1])
        return True
    def on_error(self, status):
        print(status)

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = Stream(auth, StdOutListener())

# This line filters Twitter Streams to capture data by the keywords: 'euro', 'dollar', 'loonie'
stream.filter(track=['euro', 'dollar', 'loonie', ], languages=['en'])
For your original question about the JSON: you can load the data as it streams using json.loads(). The reason for the other steps is so you don't get a charmap error when extracting the data from Twitter into Python. The reason for s[2:-1] is to get rid of the extra characters left over from encoding to UTF-8.
For English-only tweets, you can also filter directly from the stream using languages=['en'].
I'm not familiar with the TextBlob library, but you can store the text in several ways: just write your information to a file, and when you run TextBlob, read directly from that file. You can replace print(s[2:-1]) or add to it:
myFile = open('text.csv', 'a')
myFile.write(s[2:-1])
myFile.write('\n')  # adds a line between tweets
myFile.close()
You can read it using file = open('text.csv', 'r') to do your sentiment analysis. Don't forget to add file.close() anytime you open a file.
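To close the loop with TextBlob, a sketch of the sentiment step (assuming one tweet's text per line in text.csv, as written above):
from textblob import TextBlob

with open('text.csv', 'r') as f:
    for line in f:
        text = line.strip()
        if text:
            polarity = TextBlob(text).sentiment.polarity  # -1.0 (negative) to 1.0 (positive)
            print(polarity, text[:60])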