In Scrapy, how to separate items in the output JSON file - json

I am new to Scrapy and have run into a problem. While crawling websites I receive several JSON responses (that part already works). I want to load them into items and write them to a single JSON file, but the output file is not what I expected.
The item class looks like this:
class USLPlayer(scrapy.Item):
    ln = scrapy.Field()
    fn = scrapy.Field()
    ...
The original json file structure looks like this:
{"players":{"4752569":{"ln":"Musa","fn":"Yahaya", .... ,"apprvd":"59750"}, "4801435":{"ln":"Ackley","fn":"Brian", ... ,"apprvd":"59750"}, ...}}
The result I expect looks like this:
{"item" :{"ln":"Musa","fn":"Yahaya", .... ,"apprvd":"59750"}},{"item": {"ln":"Ackley","fn":"Brian", ... ,"apprvd":"59750"}, ...
Basically, I want every item to be a separate entry in the output.
The code that fills the items is:
players = json.loads(plain_text)
for id, player in players["players"].items():
    # One item per player, filled with all of that player's fields
    item = USLPlayer()
    for key, value in player.items():
        item[key] = value
    yield item
Is there any way I can output the JSON file in the format I expect? Thank you very much for any answers.

Have you tried the JSON lines feed exporter?
It will output your items as JSON objects one per line. Then, reading the list of players from the file is as easy as using json.loads on each line.
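As a rough sketch of the reading side: assuming the spider was run with something like `scrapy crawl <spider> -o players.jl` (the file name `players.jl` is a placeholder, and the sample lines below only mimic the two players from the question with the elided fields left out), the file could be loaded back like this:

```python
import json
import os
import tempfile

# Sample JSON lines content shaped like the two players in the question.
# The elided fields ("...") are simply left out here.
sample_lines = [
    '{"ln": "Musa", "fn": "Yahaya", "apprvd": "59750"}',
    '{"ln": "Ackley", "fn": "Brian", "apprvd": "59750"}',
]


def read_json_lines(path):
    """Return one dict per non-empty line of a JSON lines file."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Write the sample to a temporary file; "players.jl" is a hypothetical name.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "players.jl")
with open(path, "w", encoding="utf-8") as handle:
    handle.write("\n".join(sample_lines) + "\n")

players = read_json_lines(path)
print(players[0]["ln"], players[1]["ln"])
```

Each line is parsed on its own, so a large export does not need to be valid as one big JSON document.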

Related

How to retrieve data from an external nested json file on seed.rb

I want to retrieve data from an external nested JSON file in my seed.rb.
The JSON looks like this:
{"people":[{"name":"John", "age":"23"}, {"name":"Jack", "age":"25"}]}
I saw a solution on GitHub but it only works on non-nested JSON.
Let's say you have JSON file db/seeds.json:
{"people":[{"name":"John", "age":"23"}, {"name":"Jack", "age":"25"}]}
You can use it like this in your db/seeds.rb:
seeds = JSON.parse(File.read(Rails.root.join("db", "seeds.json")), symbolize_names: true)
User.create(seeds[:people])
Here seeds[:people] is an array of hashes containing the user attributes.
If you have:
json_data = {"people":[{"name":"John", "age":"23"}, {"name":"Jack", "age":"25"}]}
and you access:
json_data[:people]
you'll get an array:
[{:name=>"John", :age=>"23"}, {:name=>"Jack", :age=>"25"}]
If you want to use this array to populate a model, you can do:
People.create(json_data[:people])
If you want to read each item's values, you can iterate through the data, like this:
json_data[:people].each {|p| puts p[:name], p[:age]}

Saving an array in JSON in ionic

I wrote the code below to combine two arrays and save them as a JSON file.
In this code, seg is an array of numbers produced elsewhere in my code. info is also an array; it contains some data and is followed by the seg array.
Defining variable types:
seg: Array<any> = [];
info: Array<any>=[];
final: Array<{info:any, Seg:any}>=[];
Pushing values into the arrays and combining them:
this.info.push({date_created: 25 , description: 'aaa', year:'2015'});
this.final.push({info: this.info ,Seg:this.seg});
this.file.writeFile(this.file.externalApplicationStorageDirectory, 'test.json', JSON.stringify(this.final));
The resulting file looks something like this:
[{"info":[{"date_created":25,"description":"aaa","year":"2015"}],"Seg":[2,3,4,5]}]
As you can see, the info data is placed between two brackets, so in the JSON file it is treated as a list, not as a record.
Does anyone know how I can remove the brackets around the info array?
Should I change the variable's type from an array to something else?
You can declare final as a single object instead of an array, so that it is stored as a record:
seg: Array<any> = [];
info: Array<any> = [];
final: {info: any, Seg: any} = {info: null, Seg: null};
this.final.Seg = this.seg;
this.final.info = this.info;

How to omit the header when using Spark to read a CSV file?

I am trying to use Spark to read a csv file in jupyter notebook. So far I have
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[4]").getOrCreate()
reviews_df = spark.read.option("header","true").csv("small.csv")
reviews_df.collect()
This is what reviews_df looks like:
[Row(reviewerID=u'A1YKOIHKQHB58W', asin=u'B0001VL0K2', overall=u'5'),
Row(reviewerID=u'A2YB0B3QOHEFR', asin=u'B000JJSRNY', overall=u'5'),
Row(reviewerID=u'AAI0092FR8V1W', asin=u'B0060MYKYY', overall=u'5'),
Row(reviewerID=u'A2TAPSNKK9AFSQ', asin=u'6303187218', overall=u'5'),
Row(reviewerID=u'A316JR2TQLQT5F', asin=u'6305364206', overall=u'5')...]
But each row of the data frame contains the column names. How can I reformat the data so that it becomes:
[(u'A1YKOIHKQHB58W', u'B0001VL0K2', u'5'),
(u'A2YB0B3QOHEFR', u'B000JJSRNY', u'5')....]
A DataFrame always returns Row objects, which is why calling collect() on it shows:
Row(reviewerID=u'A1YKOIHKQHB58W', asin=u'B0001VL0K2', overall=u'5')
To get what you want, you can do:
reviews_df.rdd.map(lambda row: (row.reviewerID, row.asin, row.overall)).collect()
This returns the values of each row as a tuple.
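A small illustration of why this works: a PySpark Row is a subclass of tuple, so converting it with tuple() drops the field names. The sketch below uses collections.namedtuple as a stand-in for Row (an assumption made only so that it runs without a Spark installation); with a real DataFrame the equivalent would presumably be reviews_df.rdd.map(tuple).collect().

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row so the sketch runs without Spark.
# Like Row, a namedtuple instance is a tuple with named fields.
Row = namedtuple("Row", ["reviewerID", "asin", "overall"])

rows = [
    Row(reviewerID="A1YKOIHKQHB58W", asin="B0001VL0K2", overall="5"),
    Row(reviewerID="A2YB0B3QOHEFR", asin="B000JJSRNY", overall="5"),
]

# tuple(row) keeps the values in order and discards the field names.
plain = [tuple(row) for row in rows]
print(plain)
```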

Python: How to combine the results of a for loop, extracted from a JSON file, into one output?

I extracted some values from a JSON file and the result is as follows.
[92.21372509012637]
[92.21372509012637]
[240.3532913296266]
[240.3532913296266]
[240.3532913296266]
[240.3532913296266]
I would like to get the result in a single list, as follows.
[92.21372509012637, 92.21372509012637, 240.3532913296266, 240.3532913296266, 240.3532913296266, 240.3532913296266]
The following is part of my code.
for i in range(len(response_i['objcontent'][0]['rowvalues'])):
    lat = response_i['objcontent'][0]['rowvalues'][i][0]
    decoded=base64.b64decode(lat)
    if len(decoded)<9:
        a=struct.unpack('d',decoded)
        result=[]
        for i in a:
            result.append(i)
        print (result)
Does anyone know how I can fix this issue?
Thank you.
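One possible fix, sketched under assumptions: the list is re-created and printed on every pass of the loop, so creating it once before the loop and printing it once afterwards should give a single list. The response_i structure and the encode helper below are hypothetical stand-ins for the real JSON response, built only so the sketch runs on its own.

```python
import base64
import struct


def encode(value):
    """Hypothetical helper: pack a float as an 8-byte double, then base64 it."""
    return base64.b64encode(struct.pack('d', value)).decode('ascii')


# Stand-in for the decoded JSON response from the question.
response_i = {'objcontent': [{'rowvalues': [
    [encode(92.21372509012637)],
    [encode(240.3532913296266)],
]}]}

result = []  # created once, before the loop
for row in response_i['objcontent'][0]['rowvalues']:
    decoded = base64.b64decode(row[0])
    if len(decoded) < 9:
        # struct.unpack returns a tuple; extend adds its values to the list
        result.extend(struct.unpack('d', decoded))

print(result)  # printed once, after the loop
```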

How to get the source file of an RDD in Spark

I am playing with Spark RDDs and JSON files, and I am doing something like the following:
val uisJson5 = sqlContext.read.json(
  sc.textFile("s3n://localtion/*")
    .filter(line =>
      line.contains("\"xyz\":\"A\"")
        && line.contains("\"id\":\"adasdfasdfasd\"")
    ))
uisJson5.show()
I would also like to know which source JSON files the results come from. Is there any way I can do this?
Edit:
I was able to do it using the code below:
val uisJson1 = sc.textFile("s3n://localtion/*")
  .filter(line => line.contains("\"xyz\":\"A\"")
    && line.contains("\"id\":\"adasdfasdfasd\""))
uisJson1.collect().foreach(println)
You are looking for wholeTextFiles along with flatMapValues.
wholeTextFiles lets you read a directory containing multiple small text files, and returns each of them as (filename, content) pairs. This is in contrast with textFile, which would return one record per line in each file.
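To make the data flow concrete, here is a plain-Python emulation (not Spark code) of what wholeTextFiles followed by flatMapValues would give you. The file names and contents are made up, and flat_map_values is a hypothetical helper that mirrors the behaviour of the Spark method: it keeps the key (the file name) and expands the value (the content) into one record per line.

```python
# Stand-in for what wholeTextFiles returns: (filename, content) pairs.
# File names and contents are invented for illustration.
files = [
    ("s3n://bucket/a.json", '{"xyz":"A","id":"1"}\n{"xyz":"B","id":"2"}'),
    ("s3n://bucket/b.json", '{"xyz":"A","id":"3"}'),
]


def flat_map_values(pairs, func):
    """Plain-Python analogue of flatMapValues: keep the key, expand the value."""
    return [(key, item) for key, value in pairs for item in func(value)]


# One (filename, line) pair per line, so every line keeps its source file.
lines_with_source = flat_map_values(files, str.splitlines)

# Same kind of substring filter as in the question, applied to the line only.
matches = [(name, line) for name, line in lines_with_source if '"xyz":"A"' in line]
for name, line in matches:
    print(name, line)
```

Since every record carries its file name, the filter no longer loses track of where a matching line came from.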