I am trying to aggregate a bunch of JSON files into a single one, for three sources and three years. So far I have only managed to do it the tedious way, but I am sure it could be done in a smarter and more elegant manner.
json1 <- lapply(readLines("NYT_1989.json"), fromJSON)
json2 <- lapply(readLines("NYT_1990.json"), fromJSON)
json3 <- lapply(readLines("NYT_1991.json"), fromJSON)
json4 <- lapply(readLines("WP_1989.json"), fromJSON)
json5 <- lapply(readLines("WP_1990.json"), fromJSON)
json6 <- lapply(readLines("WP_1991.json"), fromJSON)
json7 <- lapply(readLines("USAT_1989.json"), fromJSON)
json8 <- lapply(readLines("USAT_1990.json"), fromJSON)
json9 <- lapply(readLines("USAT_1991.json"), fromJSON)
jsonl <- list(json1, json2, json3, json4, json5, json6, json7, json8, json9)
Note that the year range, 1989 to 1991, is the same for all three sources. Any ideas? Thanks!
PS: Example of the data inside each file:
{"date": "December 31, 1989, Sunday, Late Edition - Final", "body": "Frigid temperatures across much of the United States this month sent demand for heating oil soaring, providing a final upward jolt to crude oil prices. Some spot crude traded at prices up 40 percent or more from a year ago. Will these prices hold? Five experts on oil offer their views. That's assuming the economy performs as expected - about 1 percent growth in G.N.P. The other big uncertainty is the U.S.S.R. If their production drops more than 4 percent, prices could stengthen. ", "title": "Prospects;"}
{"date": "December 31, 1989, Sunday, Late Edition - Final", "body": "DATELINE: WASHINGTON, Dec. 30 For years, experts have dubbed Czechoslovakia's spy agency the ''two Czech'' service. But he cautioned against euphoria. ''The Soviets wouldn't have relied on just official cooperation,'' he said. ''It would be surprising if they haven't unilaterally penetrated friendly services with their own agents, too.'' ", "title": "Upheaval in the East: Espionage;"}
{"date": "December 31, 1989, Sunday, Late Edition - Final", "body": "SURVIVING the decline in the economy will be the overriding issue for 1990, say leaders of the county's business community. Successful Westchester business owners will face and overcome these risks and obstacles. Westchester is a land of opportunity for the business owner. ", "title": "Coping With the Economic Prospects of 1990"}
Here you go:
require(jsonlite)
filelist <- c("NYT_1989.json","NYT_1990.json","NYT_1991.json",
"WP_1989.json", "WP_1990.json","WP_1991.json",
"USAT_1989.json","USAT_1990.json","USAT_1991.json")
newJSON <- sapply(filelist, function(x) fromJSON(readLines(x)))
Read in just the body entry from each line of the input file.
You asked how to read in just a subset of the JSON file. The data referenced isn't actually a single JSON document; it is newline-delimited JSON (one object per line), hence we modify the input to fromJSON() so it parses correctly. We then dereference the result with fromJSON()$body to extract just the body variable.
filelist <- c("./data/NYT_1989.json", "./data/NYT_1990.json")
newJSON <- sapply(filelist, function(x) fromJSON(sprintf("[%s]", paste(readLines(x), collapse = ",")), flatten = FALSE)$body)
newJSON
Results
> filelist <- c("./data/NYT_1989.json", "./data/NYT_1990.json")
> newJSON <- sapply(filelist, function(x) fromJSON(sprintf("[%s]", paste(readLines(x), collapse = ",")), flatten = FALSE)$body)
> newJSON
./data/NYT_1989.json
[1,] "Frigid temperatures across much of the United States this month sent demand for heating oil soaring, providing a final upward jolt to crude oil prices. Some spot crude traded at prices up 40 percent or more from a year ago. Will these prices hold? Five experts on oil offer their views. That's assuming the economy performs as expected - about 1 percent growth in G.N.P. The other big uncertainty is the U.S.S.R. If their production drops more than 4 percent, prices could stengthen. "
[2,] "DATELINE: WASHINGTON, Dec. 30 For years, experts have dubbed Czechoslovakia's spy agency the ''two Czech'' service. But he cautioned against euphoria. ''The Soviets wouldn't have relied on just official cooperation,'' he said. ''It would be surprising if they haven't unilaterally penetrated friendly services with their own agents, too.'' "
[3,] "SURVIVING the decline in the economy will be the overriding issue for 1990, say leaders of the county's business community. Successful Westchester business owners will face and overcome these risks and obstacles. Westchester is a land of opportunity for the business owner. "
./data/NYT_1990.json
[1,] "Blue temperatures across much of the United States this month sent demand for heating oil soaring, providing a final upward jolt to crude oil prices. Some spot crude traded at prices up 40 percent or more from a year ago. Will these prices hold? Five experts on oil offer their views. That's assuming the economy performs as expected - about 1 percent growth in G.N.P. The other big uncertainty is the U.S.S.R. If their production drops more than 4 percent, prices could stengthen. "
[2,] "BLUE1: WASHINGTON, Dec. 30 For years, experts have dubbed Czechoslovakia's spy agency the ''two Czech'' service. But he cautioned against euphoria. ''The Soviets wouldn't have relied on just official cooperation,'' he said. ''It would be surprising if they haven't unilaterally penetrated friendly services with their own agents, too.'' "
[3,] "GREEN4 the decline in the economy will be the overriding issue for 1990, say leaders of the county's business community. Successful Westchester business owners will face and overcome these risks and obstacles. Westchester is a land of opportunity for the business owner. "
You might find the following apply tutorial useful:
Datacamp: R tutorial on the Apply family of functions
I also recommend reading:
R Inferno - Chapter 4 - Over-Vectorizing
Trust me when I say this free online book has helped me a lot. It has also confirmed I am an idiot on multiple occasions :-)
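For comparison, the same trick used above — joining the newline-delimited records with commas and wrapping them in brackets so they parse as one JSON array — can be sketched in Python. The two inline records below are stand-ins for lines read from one of the question's files:

```python
import json

# Stand-ins for lines read from a newline-delimited file like NYT_1989.json.
lines = [
    '{"date": "d1", "body": "b1", "title": "t1"}',
    '{"date": "d2", "body": "b2", "title": "t2"}',
]

# Same idea as sprintf("[%s]", paste(readLines(x), collapse = ",")):
# comma-join the records and wrap them in [] to form a valid JSON array.
records = json.loads("[%s]" % ",".join(lines))
bodies = [r["body"] for r in records]
print(bodies)  # ['b1', 'b2']
```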
Related
I am using a HuggingFace summariser pipeline and I noticed that if I train a model for 3 epochs and then run evaluation on all 3 epochs with fixed random seeds, I get different results depending on whether I restart the Python console 3 times or load the different models (one per epoch) onto the same summariser object in a loop. I would like to understand why this strange behaviour occurs.
While my results are based on ROUGE score on a large dataset, I have made this small reproducible example to show this issue. Instead of using the weights of the same model at different training epochs, I decided to demonstrate using two different summarization models, but the effect is the same. Grateful for any help.
Notice how in the first run I first use the facebook/bart-large-cnn model and then the lidiya/bart-large-xsum-samsum model without restarting the Python terminal. In the second run I only use the lidiya/bart-large-xsum-samsum model and get a different output (which should not be the case).
NOTE: this reproducible example won't work on a CPU-only machine, as CPU runs don't seem sensitive to torch.use_deterministic_algorithms(True) and may give different results every time; it should be reproduced on a GPU.
FIRST RUN
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch
# random text taken from UK news website
text = """
The veteran retailer Stuart Rose has urged the government to do more to shield the poorest from double-digit inflation, describing the lack of action as “horrifying”, with a prime minister “on shore leave” leaving a situation where “nobody is in charge”.
Responding to July’s 10.1% headline rate, the Conservative peer and Asda chair said: “We have been very, very slow in recognising this train coming down the tunnel and it’s run quite a lot of people over and we now have to deal with the aftermath.”
Attacking a lack of leadership while Boris Johnson is away on holiday, he said: “We’ve got to have some action. The captain of the ship is on shore leave, right, nobody’s in charge at the moment.”
Lord Rose, who is a former boss of Marks & Spencer, said action was needed to kill “pernicious” inflation, which he said “erodes wealth over time”. He dismissed claims by the Tory leadership candidate Liz Truss’s camp that it would be possible for the UK to grow its way out of the crisis.
"""
seed = 42
torch.cuda.manual_seed_all(seed)
torch.use_deterministic_algorithms(True)
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
model.eval()
summarizer = pipeline(
"summarization", model=model, tokenizer=tokenizer,
num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
)
output = summarizer(text, truncation=True)
tokenizer = AutoTokenizer.from_pretrained("lidiya/bart-large-xsum-samsum")
model = AutoModelForSeq2SeqLM.from_pretrained("lidiya/bart-large-xsum-samsum")
model.eval()
summarizer = pipeline(
"summarization", model=model, tokenizer=tokenizer,
num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
)
output = summarizer(text, truncation=True)
print(output)
output from lidiya/bart-large-xsum-samsum model should be
[{'summary_text': 'The UK economy is in crisis because of inflation. The government has been slow to react to it. Boris Johnson is on holiday.'}]
SECOND RUN (you must restart python to conduct the experiment)
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch
text = """
The veteran retailer Stuart Rose has urged the government to do more to shield the poorest from double-digit inflation, describing the lack of action as “horrifying”, with a prime minister “on shore leave” leaving a situation where “nobody is in charge”.
Responding to July’s 10.1% headline rate, the Conservative peer and Asda chair said: “We have been very, very slow in recognising this train coming down the tunnel and it’s run quite a lot of people over and we now have to deal with the aftermath.”
Attacking a lack of leadership while Boris Johnson is away on holiday, he said: “We’ve got to have some action. The captain of the ship is on shore leave, right, nobody’s in charge at the moment.”
Lord Rose, who is a former boss of Marks & Spencer, said action was needed to kill “pernicious” inflation, which he said “erodes wealth over time”. He dismissed claims by the Tory leadership candidate Liz Truss’s camp that it would be possible for the UK to grow its way out of the crisis.
"""
seed = 42
torch.cuda.manual_seed_all(seed)
torch.use_deterministic_algorithms(True)
tokenizer = AutoTokenizer.from_pretrained("lidiya/bart-large-xsum-samsum")
model = AutoModelForSeq2SeqLM.from_pretrained("lidiya/bart-large-xsum-samsum")
model.eval()
summarizer = pipeline(
"summarization", model=model, tokenizer=tokenizer,
num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
)
output = summarizer(text, truncation=True)
print(output)
output should be
[{'summary_text': 'The government has been slow to deal with inflation. Stuart Rose has urged the government to do more to shield the poorest from double-digit inflation.'}]
Why is the first output different from the second one?
You need to re-seed the generators after the bart-large-cnn pipeline runs. Otherwise the first pipeline consumes state from the seeded generator, so your lidiya model sees a different random stream across the two scripts and produces different outputs.
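The effect is the same as with any seeded generator: the first sampling run advances the RNG state, so the second model draws from a shifted stream unless you re-seed in between. A minimal sketch with Python's `random` module standing in for the CUDA generator:

```python
import random

random.seed(42)
first = random.random()   # "first pipeline" consumes generator state
second = random.random()  # "second pipeline" now sees a shifted stream

random.seed(42)
fresh = random.random()   # after re-seeding, the stream restarts

assert fresh == first     # re-seeding reproduces the original draw
assert second != fresh    # without re-seeding, the draws differ
```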
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch
# random text taken from UK news website
text = """
The veteran retailer Stuart Rose has urged the government to do more to shield the poorest from double-digit inflation, describing the lack of action as “horrifying”, with a prime minister “on shore leave” leaving a situation where “nobody is in charge”.
Responding to July’s 10.1% headline rate, the Conservative peer and Asda chair said: “We have been very, very slow in recognising this train coming down the tunnel and it’s run quite a lot of people over and we now have to deal with the aftermath.”
Attacking a lack of leadership while Boris Johnson is away on holiday, he said: “We’ve got to have some action. The captain of the ship is on shore leave, right, nobody’s in charge at the moment.”
Lord Rose, who is a former boss of Marks & Spencer, said action was needed to kill “pernicious” inflation, which he said “erodes wealth over time”. He dismissed claims by the Tory leadership candidate Liz Truss’s camp that it would be possible for the UK to grow its way out of the crisis.
"""
seed = 42
torch.cuda.manual_seed_all(seed)
torch.use_deterministic_algorithms(True)
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
model.eval()
summarizer = pipeline(
"summarization", model=model, tokenizer=tokenizer,
num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
)
output = summarizer(text, truncation=True)
seed = 42
torch.cuda.manual_seed_all(seed)
torch.use_deterministic_algorithms(True)
tokenizer = AutoTokenizer.from_pretrained("lidiya/bart-large-xsum-samsum")
model = AutoModelForSeq2SeqLM.from_pretrained("lidiya/bart-large-xsum-samsum")
model.eval()
summarizer = pipeline(
"summarization", model=model, tokenizer=tokenizer,
num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
)
output = summarizer(text, truncation=True)
print(output)
I have 4 JSON files spread across two folders, folder1 and folder2. Each JSON file contains the date, the body and the title.
folder1.json:
{"date": "December 31, 1989, Sunday, Late Edition - Final", "body": "Frigid temperatures across much of the United States this month sent demand for heating oil soaring, providing a final upward jolt to crude oil prices. That's assuming the economy performs as expected - about 1 percent growth in G.N.P. The other big uncertainty is the U.S.S.R. If their production drops more than 4 percent, prices could stengthen. ", "title": "Prospects;"}
{"date": "December 31, 1989, Sunday, Late Edition - Final", "body": "DATELINE: WASHINGTON, Dec. 30 For years, experts have dubbed Czechoslovakia's spy agency the ''two Czech'' service. Agents of the Office for the Protection of State Secrets got one check from Prague, the pun goes, and another from their real bosses at K.G.B. headquarters in Moscow. Roy Godson, head of the Washington-based National Strategy Information Center and a well-known intelligence scholar, called any democratic change ''a net loss'' for Soviet intelligence. But he cautioned against euphoria. ''The Soviets wouldn't have relied on just official cooperation,'' he said. ''It would be surprising if they haven't unilaterally penetrated friendly services with their own agents, too.'' ", "title": "Upheaval in the East: Espionage;"}
folder2.json:
{"date": "December 31, 1989, Sunday, Late Edition - Final", "body": "SURVIVING the decline in the economy will be the overriding issue for 1990, say leaders of the county's business community. But facing business owners are numerous problems, from taxes and regulations at all levels of government to competition from other businesses in and out of Westchester. Successful Westchester business owners will face and overcome these risks and obstacles. Westchester is a land of opportunity for the business owner. ", "title": "Coping With the Economic Prospects of 1990"}
{"date": "December 29, 1989, Friday, Late Edition - Final", "body": "Eastern Airlines said yesterday that it was laying off 600 employees, mostly managers, and cutting wages by 10 percent or 20 percent for about half its work force. Thomas J. Matthews, Eastern's senior vice president of human resources, estimated that the measures would save the carrier about $100 million a year. Eastern plans to rebuild by making Atlanta its primary hub and expects to operate about 75 percent of its flights from there. ", "title": "Eastern Plans Wage Cuts, 600 Layoffs"}
I would like to create a common list from all these JSON files, but only with the body of each article. So far I am trying the following:
json1 <- lapply(readLines("folder1.json"), fromJSON)
json2 <- lapply(readLines("folder2.json"), fromJSON)
jsonl <- list(json1$body, json2$body)
But it is not working. Any suggestions?
Andres Azqueta
Solution:
You need to dereference the result of fromJSON() inside the sapply() to retrieve only the body:
fromJSON()$body
Note: I am assuming the file format from your previous question.
The point being that the file format is pseudo-JSON, hence the modified fromJSON() call below.
OK, let's step through an example:
Stage 1: Concatenate JSON files into 1
filelist <- c("./data/NYT_1989.json", "./data/NYT_1990.json")
newJSON <- sapply(filelist, function(x) fromJSON(sprintf("[%s]", paste(readLines(x), collapse = ",")), flatten = FALSE))
newJSON[2]# Extract bodies
newJSON[5]# Extract bodies
Output
filelist <- c("./data/NYT_1989.json", "./data/NYT_1990.json")
> newJSON <- sapply(filelist, function(x) fromJSON(sprintf("[%s]", paste(readLines(x), collapse = ",")), flatten = FALSE))
> newJSON[2]# Extract bodies
[[1]]
[1] "Frigid temperatures across much of the United States this month sent demand for heating oil soaring, providing a final upward jolt to crude oil prices. Some spot crude traded at prices up 40 percent or more from a year ago. Will these prices hold? Five experts on oil offer their views. That's assuming the economy performs as expected - about 1 percent growth in G.N.P. The other big uncertainty is the U.S.S.R. If their production drops more than 4 percent, prices could stengthen. "
[2] "DATELINE: WASHINGTON, Dec. 30 For years, experts have dubbed Czechoslovakia's spy agency the ''two Czech'' service. But he cautioned against euphoria. ''The Soviets wouldn't have relied on just official cooperation,'' he said. ''It would be surprising if they haven't unilaterally penetrated friendly services with their own agents, too.'' "
[3] "SURVIVING the decline in the economy will be the overriding issue for 1990, say leaders of the county's business community. Successful Westchester business owners will face and overcome these risks and obstacles. Westchester is a land of opportunity for the business owner. "
> newJSON[5]# Extract bodies
[[1]]
[1] "Blue temperatures across much of the United States this month sent demand for heating oil soaring, providing a final upward jolt to crude oil prices. Some spot crude traded at prices up 40 percent or more from a year ago. Will these prices hold? Five experts on oil offer their views. That's assuming the economy performs as expected - about 1 percent growth in G.N.P. The other big uncertainty is the U.S.S.R. If their production drops more than 4 percent, prices could stengthen. "
[2] "BLUE1: WASHINGTON, Dec. 30 For years, experts have dubbed Czechoslovakia's spy agency the ''two Czech'' service. But he cautioned against euphoria. ''The Soviets wouldn't have relied on just official cooperation,'' he said. ''It would be surprising if they haven't unilaterally penetrated friendly services with their own agents, too.'' "
[3] "GREEN4 the decline in the economy will be the overriding issue for 1990, say leaders of the county's business community. Successful Westchester business owners will face and overcome these risks and obstacles. Westchester is a land of opportunity for the business owner. "
Stage 2: Concatenate and extract the body from all files...
Note the fromJSON()$body dereference in the code line below...
filelist <- c("./data/NYT_1989.json", "./data/NYT_1990.json")
newJSON <- sapply(filelist, function(x) fromJSON(sprintf("[%s]", paste(readLines(x), collapse = ",")), flatten = FALSE)$body)
newJSON
Output
> filelist <- c("./data/NYT_1989.json", "./data/NYT_1990.json")
> newJSON <- sapply(filelist, function(x) fromJSON(sprintf("[%s]", paste(readLines(x), collapse = ",")), flatten = FALSE)$body)
> newJSON
./data/NYT_1989.json
[1,] "Frigid temperatures across much of the United States this month sent demand for heating oil soaring, providing a final upward jolt to crude oil prices. Some spot crude traded at prices up 40 percent or more from a year ago. Will these prices hold? Five experts on oil offer their views. That's assuming the economy performs as expected - about 1 percent growth in G.N.P. The other big uncertainty is the U.S.S.R. If their production drops more than 4 percent, prices could stengthen. "
[2,] "DATELINE: WASHINGTON, Dec. 30 For years, experts have dubbed Czechoslovakia's spy agency the ''two Czech'' service. But he cautioned against euphoria. ''The Soviets wouldn't have relied on just official cooperation,'' he said. ''It would be surprising if they haven't unilaterally penetrated friendly services with their own agents, too.'' "
[3,] "SURVIVING the decline in the economy will be the overriding issue for 1990, say leaders of the county's business community. Successful Westchester business owners will face and overcome these risks and obstacles. Westchester is a land of opportunity for the business owner. "
./data/NYT_1990.json
[1,] "Blue temperatures across much of the United States this month sent demand for heating oil soaring, providing a final upward jolt to crude oil prices. Some spot crude traded at prices up 40 percent or more from a year ago. Will these prices hold? Five experts on oil offer their views. That's assuming the economy performs as expected - about 1 percent growth in G.N.P. The other big uncertainty is the U.S.S.R. If their production drops more than 4 percent, prices could stengthen. "
[2,] "BLUE1: WASHINGTON, Dec. 30 For years, experts have dubbed Czechoslovakia's spy agency the ''two Czech'' service. But he cautioned against euphoria. ''The Soviets wouldn't have relied on just official cooperation,'' he said. ''It would be surprising if they haven't unilaterally penetrated friendly services with their own agents, too.'' "
[3,] "GREEN4 the decline in the economy will be the overriding issue for 1990, say leaders of the county's business community. Successful Westchester business owners will face and overcome these risks and obstacles. Westchester is a land of opportunity for the business owner. "
require(RJSONIO)
json_1<- fromJSON("~/folder1/1.json")
json_2<- fromJSON("~/folder2/2.json")
jsonl <- list(json_1$body, json_2$body)
I have several JSON files with texts grouped into date, body and title. As an example consider:
{"date": "December 31, 1990, Monday, Late Edition - Final", "body": "World stock markets begin 1991 facing the threat of a war in the Persian Gulf, recessions or economic slowdowns around the world, and dismal earnings -- the same factors that drove stock markets down sharply in 1990. Finally, there is the problem of the Soviet Union, the wild card in everyone's analysis. It is a country whose problems could send stock markets around the world reeling if something went seriously awry. With Russia about to implode, that just adds to the risk premium, said Mr. Dhar. LOAD-DATE: December 30, 1990 ", "title": "World Markets;"}
{"date": "December 30, 1992, Sunday, Late Edition - Final", "body": "DATELINE: CHICAGO Gleaming new tractors are becoming more familiar sights on America's farms. Sales and profits at the three leading United States tractor makers -- Deere & Company, the J.I. Case division of Tenneco Inc. and the Ford Motor Company's Ford New Holland division -- are all up, reflecting renewed agricultural prosperity after the near-depression of the early and mid-1980's. But the recovery in the tractor business, now in its third year, is fragile. Tractor makers hope to install computers that can digest this information, then automatically concentrate the application of costly fertilizer and chemicals on the most productive land. Within the next 15 years, that capability will be commonplace, predicted Mr. Ball. LOAD-DATE: December 30, 1990 ", "title": "All About/Tractors;"}
I have three different newspapers with separate files containing all the texts produced for the period 1989 - 2016. My ultimate goal is to combine all the texts into a single corpus. I have done it in Python using the pandas library and I am wondering if it could be done similarly in R. Here is my Python code with the loop:
appended_data = []
for i in range(1989, 2017):
    df0 = pd.DataFrame([json.loads(l) for l in open('NYT_%d.json' % i)])
    df1 = pd.DataFrame([json.loads(l) for l in open('USAT_%d.json' % i)])
    df2 = pd.DataFrame([json.loads(l) for l in open('WP_%d.json' % i)])
    appended_data.append(df0)
    appended_data.append(df1)
    appended_data.append(df2)
Use jsonlite::stream_in to read your files and jsonlite::rbind_pages (the successor to rbind.pages) to combine them.
There are many options in R for reading JSON files and converting them to a data.frame/data.table.
Here is one using jsonlite and data.table:
library(data.table)
library(jsonlite)
res <- lapply(1989:2016,function(i){
ff <- c('NYT_%d.json','USAT_%d.json' ,'WP_%d.json')
list_files_paths <- sprintf(ff,i)
rbindlist(lapply(list_files_paths,fromJSON))
})
Here res is a list of data.table. If you want to aggregate all data.table in a single data.table:
rbindlist(res)
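Since the question started from pandas, the equivalent Python aggregation can be sketched as below. In the real case you would loop over open("%s_%d.json" % (src, year)) for src in ("NYT", "USAT", "WP") and year in range(1989, 2017); here two in-memory file objects stand in for the files:

```python
import io
import json
import pandas as pd

def read_ndjson(fh):
    """Parse one JSON object per line into a DataFrame."""
    return pd.DataFrame(json.loads(line) for line in fh)

# In-memory stand-ins for the newline-delimited newspaper files.
f1 = io.StringIO('{"date": "d1", "body": "b1", "title": "t1"}\n')
f2 = io.StringIO('{"date": "d2", "body": "b2", "title": "t2"}\n')

# Concatenate all per-file frames into one corpus.
corpus = pd.concat([read_ndjson(f) for f in (f1, f2)], ignore_index=True)
print(corpus["body"].tolist())  # ['b1', 'b2']
```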
Use ndjson::stream_in to read them in faster and flatter than jsonlite::stream_in :-)
I saw several posts about converting JSON into CSV in R, but I get the following error after running this code:
mydf <- fromJSON(file = "C:/users/emily/destop/data.json")
Error in fromJSON(file = "C:/users/emily/destop/data.json") : argument "txt" is missing, with no default
I downloaded the jsonlite package.
It seems that the command doesn't read the JSON data correctly, or cannot convert it into a CSV file.
A few sample data from data.json looks as follows:
{"reviewerID": "A3TS466QBAWB9D", "asin": "0014072149", "reviewerName": "Silver Pencil", "helpful": [0, 0], "reviewText": "If you are a serious violin student on a budget, this edition has it all: Piano accompaniment, low price, urtext solo parts, and annotations by Maestro David Oistrakh where Bach's penmanship is hard to decipher. Additions (in dashes) are easily distinguishable from the original bowings. This is a delightful concerto that all intermediate level violinists should perform with a violin buddy. Get your copy, today, along with \"The Green Violin; Theory, Ear Training, and Musicianship for Violinists\" book to prepare for this concerto and for more advanced playing!", "overall": 5.0, "summary": "Perform it with a friend, today!", "unixReviewTime": 1370476800, "reviewTime": "06 6, 2013"}
{"reviewerID": "A3BUDYITWUSIS7", "asin": "0041291905", "reviewerName": "joyce gabriel cornett", "helpful": [0, 0], "reviewText": "This is and excellent edition and perfectly true to the orchestral version! It makes playing Vivaldi a joy! I uses this for a wedding and was totally satisfied with the accuracy!", "overall": 5.0, "summary": "Vivalldi's Four Seasons", "unixReviewTime": 1381708800, "reviewTime": "10 14, 2013"}
{"reviewerID": "A2HR0IL3TC4CKL", "asin": "0577088726", "reviewerName": "scarecrow \"scarecrow\"", "helpful": [0, 0], "reviewText": "this was written for Carin Levine in 2008, but not premiered until 2011 at the Musica Viva Fest in Munich. .the work's premise maybe about the arduousness, nefarious of existence, how we all \"Work\" at life, at complexity, at densities of differing lifeworlds. Ferneyhough's music might suggest that these multiple dimensions of an ontology exist in diagonals across differing spectrums of human cognition, how we come to think about an object,aisthetic shaped ones, and expressionistic ones as his music.The work has a nice classical shape,and holds the \"romantic\" at bay; a mere 7 plus minutes for Alto Flute, a neglected wind cadre member.The work has gorgeous arresting moments with a great bounty of extended timbres-pointillistic bursts\" Klangfarben Sehr Kraeftig\" that you do grow weary of; it is almost predictable now hearing these works; you know they will inhabit a dense timbral space of mega-speed lines tossed in all registers;still one listens; that gap Freud speaks about, that we know we need this at some level. . .the music slowed at times for structural rigour.. . we have a dramatic causality at play in the subject,(the work's title) the arduousness of pushing, aspiring, working toward something; pleasurable illuminating or not, How about emancipation from itself;I guess we need forget the production of surplus value herein,even with the rebellions in current urban areas today; It has no place. . . these constructions are leftovers from modernity, the \"gods\" still hover. . .\"gods\" that Ferneyhough has no power to dispel. . . . 
All are now commonplace for the new music literature of music.This music still sound quite stunning, marvelous and evocative today,it is simple at some level, direct, unencumbered and violent with spit-tongues,gratuitous lines, fluttertongue,percussive slap-keys,tremoli wistful glissandi harmonics, fast filigreed lines, and simply threadbare melos, an austere fragment of what was a melody. . .Claudio Arrau said someplace that the performer the musician must emanate a \"work\" while playing music, a \"struggle\", aesthetic or otherwise, Sviatoslav Richter thought this grotesque, to look at a musician playing the great music. It was ugly for him. . .You can hear Ms.Levine on youtube playing her work, she is quite convincing, you always need to impart an authority,succored in an emotive focus that the music itself has not succumbed to your own possession. You play the music, it doesn't \"play\" you. . . I'd hope though that music with this arduous construction and structural vigour that it would in fact come to possess the performer. . .it is one of the last libidinal pleasures remaining. . .", "overall": 5.0, "summary": "arduous indeed!", "unixReviewTime": 1371168000, "reviewTime": "06 14, 2013"}
{"reviewerID": "A2DHYD72O52WS5", "asin": "0634029231", "reviewerName": "Amazon Customer \"RCC\"", "helpful": [0, 0], "reviewText": "Greg Koch is a knowledgable and charismatic host. He is seriously fun to watch. The main problem with the video is the format. The lack of on-screen tab is a serious flaw. You have to watch very carefully, have a good understanding of the minor pentatonic, and basic foundation of blues licks to even have a chance at gleening anything from this video.If you're just starting out, pick up the IN THE STYLE OF series. While this series has its limitations (incomplete songs due to copyright, no doubt), it has on screen tab and each lick is played at a reasonably slow speed. In addition, their web site has downloadable tab.However, if you can hold your own in the minor pentatonic, give this a try. It is quite a workout and you'll find yourself a better player having taken on the challenge.", "overall": 3.0, "summary": "GREAT! BUT NOT FOR BEGINNERS.", "unixReviewTime": 1119571200, "reviewTime": "06 24, 2005"}
{"reviewerID": "A1MUVHT8BONL5K", "asin": "0634029347", "reviewerName": "Amazon Customer \"clapton music fan\"", "helpful": [2, 12], "reviewText": "I bought this DVD and I'm returning it. The description and editorial review are misleading. This is NOT a Clapton video. Certainly some clips from Clapton, but generally this is a \"how to\" video. Same applies to Clapton The Early Years!", "overall": 2.0, "summary": "NOT CLAPTION MUSIC VIDEO! A Learn How To Play Guitar LIKE Clapton", "unixReviewTime": 1129334400, "reviewTime": "10 15, 2005"}
Eventually I would like to read the data correctly and convert the data into csv file.
The following should work
library(RJSONIO)
# test2.json is a sample json file
# I removed the reviewText field to keep it short
# Also tested this out with 1 row of data
D2 <- RJSONIO::fromJSON("./test2.json")
# convert the numeric vector helpful to one string
D2$helpful <- paste(D2$helpful, collapse = " ")
D2
reviewerID asin reviewerName helpful
[1,] "A3TS466QBAWB9D" "0014072149" "Silver Pencil" "0 0"
D3 <- do.call(cbind, D2)
write.csv(D3, "D3.csv")
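If the aim is simply "newline-delimited JSON in, CSV out", the same conversion can also be sketched in Python with only the standard library. The two shortened records below stand in for lines of data.json, and the list-valued "helpful" field is flattened to a string, just as the R answer does:

```python
import csv
import io
import json

# Shortened stand-ins for lines of data.json.
lines = [
    '{"reviewerID": "A3TS466QBAWB9D", "helpful": [0, 0], "overall": 5.0}',
    '{"reviewerID": "A3BUDYITWUSIS7", "helpful": [0, 0], "overall": 5.0}',
]
rows = [json.loads(l) for l in lines]
for r in rows:
    # Flatten the list-valued "helpful" field to one string, e.g. "0 0".
    r["helpful"] = " ".join(str(x) for x in r["helpful"])

out = io.StringIO()  # in practice, open("data.csv", "w", newline="")
writer = csv.DictWriter(out, fieldnames=["reviewerID", "helpful", "overall"])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```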
I am looking to merge 150 small JSON files (all formatted the same way with same variables) which I have imported into R via jsonlite.
The problem is that each file imports as a list of 1. I can get an individual file to convert to a dataframe, but cannot find a way to convert all of them systematically.
The goal is to merge them all into a single dataset.
An example from a JSON file:
{
"data": [
{
"EventId": "20020528X00745",
"narrative": "NTSB investigators may not have traveled in support of this investigation and used data provided by various sources to prepare this aircraft accident report.During the dark night cross-country flight, while at a cruise altitude of 2.000 feet msl, the pilot initiated a climb to 3,000 feet. A few minutes later, the engine's rpm dropped 200-300 rpm. The 67-hour pilot increased throttle to check for an rpm response. Subsequently, the engine lost power, and a forced landing was initiated. While approaching to land, the pilot noticed trees in front of the airplanes flight path and started looking for another place to land, but couldn't see anything because it was too dark. Subsequently, the aircraft impacted tress coming to rest upright. An examination of the engine under the supervision of an FAA inspector, revealed the left magneto's internal gears did not rotate with the engine. Removal of the left magneto revealed only one of two rubber drive isolators inside the ignition harness cap. Internal inspection revealed the contact points on the left hand side of the magneto did not open on rotation. Further examination of the airplane, displayed the ignition key turned to the left magneto only. The pilot reported to the NTSB investigator-in-charge, that he did not touch any switch while exiting the aircraft.",
"probable_cause": "The pilot's failure to set the ignition key to the both magnetos position, which resulted in a loss of engine power. Contributing factors were the failure of the left magneto, the lack of suitable terrain for the forced landing, and the dark night."
},
{
"EventId": "20090414X14441",
"narrative": "NTSB investigators used data provided by various entities, including, but not limited to, the Federal Aviation Administration and/or the operator and did not travel in support of this investigation to prepare this aircraft accident report.The pilot was following a highway to the northwest at 10,000 feet mean sea level. He crossed the mountain pass between 700 and 1,000 feet above ground level climbing slowly. Once on the west side of the pass, approaching the base of some cliffs, they encountered a strong down draft and the airspeed dropped rapidly and the airplane started to descend. The pilot reports that he attempted to keep the airspeed at 85 knots and climb but, that the airplane continued to lose altitude. He checked the engine instruments and did not note any degradation of engine performance. The airplane continued to descend. The pilot executed a forced landing in approximately the center of the valley ahead of them. The pilot reported that there were no preimpact mechanical malfunctions or failures. Based on the temperature and pressure readings from the closest weather reporting station, the density altitude at the accident site was about 9,200 feet.",
"probable_cause": "The pilot's encounter with a windshear/downdraft that exceeded the climb performance capabilities of the airplane."
},
Importing with fromJSON("file_000.json") creates a "large list".
After import, df <- file_000.json$data produces a data frame with 3 variables.
However, I do not know of a way to create 150 new dfs from the large list inputs. I have tried apply, do.call, functions, loops.
Two more that work for individual data frames, but don't get me to the 150 I need:
test2 <- as.data.frame(file_000.json$data)
test3 <- unnest(file_000.json)
library(dplyr)
library(jsonlite)
x <- '{
"data": [
{
"EventId": "20020528X00745",
"narrative": "NTSB investigators",
"probable_cause": "The pilots failure"
},
{
"EventId": "asdfasfasfasfasdasdf",
"narrative": "NTSB investigators",
"probable_cause": "The pilots failure"
},
{
"EventId": "asdfafsdf",
"narrative": "NTSB investigators",
"probable_cause": "The pilots failure"
}
]
}
'
files <- replicate(10, tempfile(fileext = ".json"))
for (i in seq_along(files)) cat(x, file = files[i])
dplyr::bind_rows(lapply(files, function(z) {
jsonlite::fromJSON(z)$data
}))
#> Source: local data frame [30 x 3]
#>
#> EventId narrative probable_cause
#> (chr) (chr) (chr)
#> 1 20020528X00745 NTSB investigators The pilots failure
#> 2 asdfasfasfasfasdasdf NTSB investigators The pilots failure
#> 3 asdfafsdf NTSB investigators The pilots failure
#> 4 20020528X00745 NTSB investigators The pilots failure
#> 5 asdfasfasfasfasdasdf NTSB investigators The pilots failure
#> 6 asdfafsdf NTSB investigators The pilots failure
#> 7 20020528X00745 NTSB investigators The pilots failure
#> 8 asdfasfasfasfasdasdf NTSB investigators The pilots failure
#> 9 asdfafsdf NTSB investigators The pilots failure
#> 10 20020528X00745 NTSB investigators The pilots failure
#> .. ... ... ...
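To apply the same pattern to the real 150 files, the tempfile vector can be replaced with a directory listing. A sketch under assumptions: the directory name, file names, and one-row JSON body below are placeholders for your actual files:

```r
library(dplyr)
library(jsonlite)

# Write three demo files into a scratch directory to stand in for the
# real 150 (same {"data": [...]} shape as the accident reports above).
x <- '{"data": [{"EventId": "a", "narrative": "n", "probable_cause": "p"}]}'
d <- file.path(tempdir(), "json_files")
dir.create(d, showWarnings = FALSE)
for (i in 1:3) cat(x, file = file.path(d, sprintf("file_%03d.json", i)))

# List every .json file in the directory and row-bind the "data"
# element of each one into a single data frame.
files <- list.files(d, pattern = "\\.json$", full.names = TRUE)
all_events <- dplyr::bind_rows(lapply(files, function(z) {
  jsonlite::fromJSON(z)$data
}))
```

With the real data, point list.files() at the folder holding the 150 files and the rest is unchanged; bind_rows() matches columns by name, so files with the same variables stack cleanly.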