Make a dataframe from a json list - json

I would like to convert JSON data into a data frame in R. I tried the data.tree package, but I only get a data.frame filled with NA:
library(dplyr)
library(jsonlite)
library(data.tree)
library(magrittr)
data<-fromJSON("http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/nama_gdp_c?precision=1&geo=EU28&unit=EUR_HAB&time=2010&time=2011&indic_na=B1GM")
repos<-as.Node(data)
repos %>% ToDataFrameTable(valeur=function(x) x$repos$value,annee= function(x) x$repos$dimension$time$category$label)
I tried this too:
repos %>% ToDataFrameTable(valeur=function(x) x$value,annee= function(x) x$dimension$time$category$label)
But this only returns two empty columns.
I also tried this directly:
as.data.frame(valeur=data$value,annee=data$dimension$time$category$label)
but I get this:
"Error in as.data.frame(valeur = data$value, annee = data$dimension$time$category$label) : argument "x" is missing, with no default"
Does anyone know how to do this?

How about this?
library(rjson)
js <- fromJSON(file="http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/nama_gdp_c?precision=1&geo=EU28&unit=EUR_HAB&time=2010&time=2011&indic_na=B1GM")
df <- data.frame(years=stack(js$dimension$time$category$label)$value,
value=stack(js$value)$value,
country=stack(js$dimension$geo$category$label)$value)
df
Output is:
years value country
1 2010 24400 European Union (28 countries)
2 2011 25100 European Union (28 countries)
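The error in the question arises because as.data.frame() expects an existing object as its first argument x, whereas data.frame() builds columns from named arguments. A minimal base R sketch of the same idea follows; the js list here is a hand-written stand-in for the parsed response (an assumption, it only mirrors the two fields used), so it runs without a network call.

```r
# Hypothetical stand-in for the parsed JSON (values taken from the output above).
js <- list(
  value = list("0" = 24400, "1" = 25100),
  dimension = list(
    time = list(category = list(label = list("2010" = "2010", "2011" = "2011")))
  )
)

# data.frame() takes named columns; unlist() turns each named list into a vector.
df <- data.frame(
  years = unlist(js$dimension$time$category$label, use.names = FALSE),
  value = unlist(js$value, use.names = FALSE),
  stringsAsFactors = FALSE
)
df
```

unlist() does the same job here as stack(...)$value in the answer, without creating the intermediate two-column data frame.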

Related

Issue loading HTML Table into R

I want to load the table at the bottom of the following webpage into R, either as a dataframe or table: https://www.lawschooldata.org/school/Yale%20University/18. My first instinct was to use the readHTMLTable function in the XML package
library(XML)
url <- "https://www.lawschooldata.org/school/Yale%20University/18"
##warning message after next line
table <- readHTMLTable(url)
table
However, this returns an empty list and gives me the following warning:
Warning message:XML content does not seem to be XML: ''
I also tried adapting code from the question "Scraping html tables into R data frames using the XML package". This worked for 5 of the 6 tables on the page, but for the 6th table, which is the one I am interested in, it returned only the header row and one row repeating the header values. Code below:
library(XML)
library(RCurl)
library(rlist)
theurl <- getURL("https://www.lawschooldata.org/school/Yale%20University/18",.opts = list(ssl.verifypeer = FALSE) )
tables <- readHTMLTable(theurl)
##generates a list of the 6 tables on the page
tables <- list.clean(tables, fun = is.null, recursive = FALSE)
##takes the 6th table, which is the one I am interested in
applicanttable <- tables[[6]]
##the problem is that this 6th table returns just the header row and one row of values
##equal to those in the header row
head(applicanttable)
Any insights would be greatly appreciated! For reference, I have also consulted the following posts that appear to have similar goals, but could not find a solution there:
Scraping html tables into R data frames using the XML package
Extracting html table from a website in R
The data is dynamically pulled from a nested JavaScript array, within a script tag when JavaScript runs in the browser. This doesn't happen when you use rvest to retrieve the non-rendered content (as seen in view-source).
You can regex out the appropriate nested array and then reconstruct the table by splitting out the rows, adding the appropriate headers, and performing some data manipulation on various columns, e.g. some columns contain HTML which needs to be parsed to obtain the desired value.
Some columns, e.g. Name, contain values that read_html could interpret as file paths, so I use htmltidy to ensure they are handled as valid HTML.
N.B. If you use RSelenium then the page will render and you can just grab the table directly without reconstructing it.
TODO:
There are still some data type manipulations you could choose to apply to a few columns.
There is some more logic to be applied to ensure only the name is returned in the Name column. Take the case of df$Name[10]: this returns "Character and fitness issues" instead of Anxiousboy, because the required value actually sits in element.nextSibling.nextSibling of the p tag that is selected. These infrequent edge cases need some additional logic. In this case, you might test for a particular string being returned and then fall back to re-parsing with an XPath expression.
R:
library(rvest)
#> Loading required package: xml2
#> Warning: package 'xml2' was built under R version 4.0.3
library(stringr)
library(htmltidy)
#> Warning: package 'htmltidy' was built under R version 4.0.3
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
get_value <- function(input) {
value <- tidy_html(input) %>%
read_html() %>%
html_node("a, p, span") %>%
html_text(trim = T)
result <- ifelse(is.na(value), input, value)
return(result)
}
tidy_result <- function(result) {
return(gsub("<.*", "", result))
}
page <- read_html("https://www.lawschooldata.org/school/Yale%20University/18")
s <- page %>% toString()
headers <- page %>%
html_nodes("#applicants-table th") %>%
html_text(trim = T)
s <- stringr::str_extract(s, regex("DataTable\\(\\{\n\\s+data:(.*\\n\\]\\n\\])", dotall = T)) %>%
gsub("\n", "", .)
rows <- stringr::str_extract_all(s, regex("(\\[.*?\\])", dotall = T))[[1]] %>% as.list()
df <- sapply(rows, function(x) {
stringr::str_match_all(x, "'(.*?)'")[[1]][, 2]
}) %>%
t() %>%
as_tibble(.name_repair = "unique")
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> * `` -> ...3
#> * `` -> ...4
#> * `` -> ...5
#> * ...
names(df) <- headers
df <- df %>%
rowwise() %>%
mutate(across(c("Name", "GRE", "URM", "$$$$"), .f = get_value)) %>%
mutate_at(c("Result"), tidy_result)
write.csv(df, "Yale Applications.csv")
Created on 2021-06-23 by the reprex package (v0.3.0)
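The row-splitting step of the approach above can be tried offline on a small string. This is only a sketch in base R: the html fragment, the names in it, and the column names are made up for illustration, and the real page embeds a much larger array with HTML inside some cells.

```r
# Hypothetical page fragment with a nested JavaScript array inside a script tag.
html <- "<script>$('#t').DataTable({ data: [['Ann','170','3.9'],['Bob','165','3.7']] });</script>"

# Each inner [...] group that contains no further brackets is one table row.
rows <- regmatches(html, gregexpr("\\[[^\\[\\]]*\\]", html, perl = TRUE))[[1]]

# Within a row, every single-quoted token is one cell; strip the quotes afterwards.
cells <- lapply(rows, function(r) regmatches(r, gregexpr("'[^']*'", r))[[1]])
cells <- lapply(cells, function(x) gsub("'", "", x))

# Stack the rows into a data frame and attach (hypothetical) headers.
df <- as.data.frame(do.call(rbind, cells), stringsAsFactors = FALSE)
names(df) <- c("Name", "LSAT", "GPA")
df
```

All columns come back as character, so numeric columns still need as.numeric() afterwards, which matches the data type TODO noted in the answer.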

How to download a html table with inconsistent number of columns in R?

I'm currently trying to download a table from the following URL:
url1<-"http://iambweb.ams.or.at/ambweb/showcognusServlet?tabkey=3643193&regionDisplay=%C3%96sterreich&export=html&outputLocale=de"
I downloaded and saved the file as .xls, because I thought it was an Excel file, with the following code:
temp <- paste0(tempfile(), ".xls")
download.file(url1, destfile = temp, mode = "wb")
First I tried to read it in R as an Excel file, but it seems to be HTML (although Excel can open it):
dfAMS <- read_excel(path = temp, sheet = "Sheet1", range = "I7:I37")
Therefore:
df <- read_html(temp)
Unfortunately I'm now stuck, because the following lines of code don't give me the intended result (a nice table, or at least the column I7:I37 from the .xls):
dfAMS <- html_node(df, "table") %>% html_table(fill = T) %>% tibble::as_tibble()
dplyr::glimpse(df)
I'm pretty sure the solution is rather simple, but I can't find it...
Thanks in advance!
Klamsi, the URL points to an HTML file renamed to have a ".xls" extension. This is a somewhat common practice among webmasters. Try it yourself by renaming the ".xls" extension to ".html".
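One way to check this before picking a reader is to look at the first lines of the downloaded file. The following is a rough sketch in base R; the temporary file and its contents are invented for the demonstration, and the pattern is a simple heuristic rather than a full file type detector.

```r
# Create a small HTML file with a misleading .xls extension (demo data only).
tmp <- tempfile(fileext = ".xls")
writeLines("<html><body><table><tr><td>1</td></tr></table></body></html>", tmp)

# Read the first few lines as text and look for typical HTML markers.
first <- tolower(paste(readLines(tmp, n = 5, warn = FALSE), collapse = " "))
looks_like_html <- grepl("<html|<table|<!doctype", first)
looks_like_html
```

If the check is TRUE, read_html() is the appropriate reader; a genuine .xls file is binary and would not contain these markers at the start.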
A second problem is that the html has a very messy table configuration. The table of interest is the fifth table in the document.
This is a workaround to obtain the values for the overall population (or "range A7:B37, I7:K37")
library(rvest) # read_html(), html_table()
library(dplyr) # select() and the %>% pipe
url <- "http://iambweb.ams.or.at/ambweb/showcognusServlet?tabkey=3643193&regionDisplay=%C3%96sterreich&export=html&outputLocale=en"
df <- read_html(url) %>%
html_table(header = TRUE, fill = TRUE) %>%
.[[5]] %>% #Extract the fifth table in the list
as.data.frame() %>%
.[,c(1:11)] %>%
select(1:2, 9:11)
names <- unlist(df[1,])
names[1:2] <- c("item", "Bundesland")
colnames(df) <- names
df <- df[-1,]
df %>% head()
item Bundesland Bestand Veränderung zum VJ absolut Veränderung zum VJ in %
2 Arbeitslosigkeit Bgld 7119 -973 -0.120242214532872
3 Arbeitslosigkeit Ktn 16564 -2160 -0.115359965819269
4 Arbeitslosigkeit NÖ 46342 -6095 -0.116234719758949
5 Arbeitslosigkeit OÖ 29762 -4649 -0.135102147569091
6 Arbeitslosigkeit Sbg 11173 -643 -0.0544177386594448
7 Arbeitslosigkeit Stmk 28677 -5602 -0.1634236704688
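The header fix-up at the end of the workaround (promote the first data row to column names, then drop it) can be shown in isolation. This is a small sketch in base R on a hand-made data frame; the single data row reuses values from the output above, and the final numeric conversion is an extra step not present in the answer.

```r
# Toy table whose real header ended up in the first data row.
df <- data.frame(
  X1 = c("item", "Arbeitslosigkeit"),
  X2 = c("Bundesland", "Bgld"),
  X3 = c("Bestand", "7119"),
  stringsAsFactors = FALSE
)

# Use the first row as column names, then remove it from the data.
colnames(df) <- unlist(df[1, ], use.names = FALSE)
df <- df[-1, , drop = FALSE]

# The values were read as text, so convert numeric columns explicitly.
df$Bestand <- as.numeric(df$Bestand)
df
```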

Reading complex json data as dataframe in R

I have the following json data:
json_data <- data.frame(changedContent=c('{"documents":[],"images":[],"profileCommunications":[],"shortListedProfiles":[],"matrimonyUser":{"createdBy":null,"parentMatrimonyUserId":0,"userSalutationVal":"Mr.","matrimonyUserCode":"173773","matrimonyUserName":"SUDIPTO DEB BARMAN","emailAddress":"sudipto06#yahoo.com","contactNumber":"9434944429","emailOTP":"","mobilePhoneOTP":"","isEmailOTPVerified":1,"isMobilePhoneOTPverified":1,"isHideContact":null,"isHideEmail":null,"lastLogInTime":null,"userSessionDtlId":null,"modifiedBy":4444,"modifiedOn":1440167133028,"isDeleted":null,"isActive":1,"isAllowedLogin":null,"numberOfChildProfile":null,"matrimonyUserTypeId":100000006,"matrimonyUserTypeVal":"Online Customer","onlineStatusFlag":null,"lastSystemTransactionDateTime":null,"isLive":null,"mobileCountryCode":0,"userStatusIdValue":"Registered and Verified","crmUserStatusIdValue":null,"deactivateReasonIdValue":null,"deactivateReason":null,"matrimonyUserId":165614,"userSalutationId":100001617,"userStatusId":100002760,"crmUserStatusId":null,"deactivateReasonId":null,"createdOn":null},"aboutMes":[],"partnerPreference":{"isSubcastDealbreaker":null,"isOccupationDealbreaker":null,"isIndustryDealbreaker":null,"isIncomeDealbreaker":null,"isHeightDealbreaker":null,"isBodyTypeDealbreaker":null,"isHivDealbreaker":null,"isFamilyTypeDealbreaker":null,"isFamilyIncomeDealbreaker":null,"isDrinkingDealbreaker":null,"locationTypeIds":null,"isLocationTypeDealbreaker":null,"isLocationNameDealbreaker":null,"locationNameOthers":"","isMaritalStatusDealbreaker":null,"isSmokingDealbreaker":null,"isFoodHabitsDealbreaker":null,"isGothraDealbreaker":null,"isManglikDealbreaker":null,"isProfileCreatedbyDealbreaker":null,"religionIdsValues":"","casteIdsValues":null,"motherTongueIdsValues":"","minimumEducationValues":"","occupationIdsValues":"","industryIdsValues":"","bodyTypeIdsValues":"","hivIdValue":null,"familyTypeIdsValues":"","familyIncomeValues":"","drinkingIdValues":"","locationNameIdsValues":null,"mari
talStatusIdsValues":"","smokingIdsValues":"","foodHabitsIdsValues":"","gothraIdsValues":"","manglikIdValue":null,"profileCreatedbyValues":"","heightFrom":null,"heightTo":null,"createdBy":4444,"userSessionDtlId":null,"modifiedBy":4444,"modifiedOn":1440167133115,"isDeleted":null,"isActive":1,"partnerPreferenceId":2757,"isReligionDealbreaker":null,"casteIds":null,"isCasteDealbreaker":null,"isMotherTongueDealbreaker":null,"subcaste":"","religionIds":null,"motherTongueIds":null,"minimumEducation":null,"occupationIds":null,"industryIds":null,"bodyTypeIds":null,"income":null,"incomeValues":"","familyIncome":null,"hivId":0,"familyTypeIds":null,"drinkingId":null,"locationNameIds":"","maritalStatusIds":null,"smokingIds":null,"foodHabitsIds":null,"gothraIds":null,"manglikId":0,"profileCreatedby":null,"adbCount":0,"fifCount":0,"ageFrom":null,"ageTo":null,"isAgeDealbreaker":null,"isminimumEducationDealbreaker":null,"userId":165614,"createdOn":1440167133115,"height":null},"profileAgentDtl":{"campaignId":"","acquirerCode":0,"createdBy":4444,"modifiedBy":4444,"modifiedOn":1440167133110,"isDeleted":null,"isActive":1,"relationshipMangerId":0,"sourceCode":100000004,"userId":165614,"createdOn":1440167133110,"idOdNo":"","relationshipMangerName":null,"relationshipMangerContact":"","profileAgentDtlId":2757,"dateOfEntry":1437935400000,"formSerialNo":"3661","sourceCodeVal":null,"agentCode":null,"acquirerCodeVal":null,"agentName":"","agentMobileNo":"","adBookingNo":""},"profileBasicRegistrationDtl":{"sourceId":null,"createdBy":4444,"userSessionDtlId":null,"modifiedBy":4444,"modifiedOn":1440167133109,"isDeleted":null,"isActive":1,"genderId":100000596,"priorityId":100001671,"profileCreatedById":100000590,"webSourceId":100001672,"dob":null,"genderVal":"Male","userId":165614,"profileCompleteness":null,"createdOn":1440167133109,"profileDtlId":2757,"nickName":null,"relation":null,"regViewersCount":null,"guestViewersCount":null,"trustScore":20,"webSourceVal":"Newspaper 
","priorityVal":"Medium","profileCreatedByval":"Self","fieldContentModerationStatusId":null,"photoModerationStatusId":null,"documentModerationStatusId":null,"isPhotoHide":null,"isHoroscopeHide":null},"profileAstrologyDtl":{"createdBy":4444,"userSessionDtlId":null,"modifiedBy":4444,"modifiedOn":1440167133111,"isDeleted":null,"isActive":1,"userId":165614,"createdOn":1440167133111,"profileAstrologyDtlId":2757,"gothraId":0,"gaanId":0,"nakshatraId":0,"sunSignId":0,"moonSignId":0,"manglikFlagId":0,"placeOfBirth":"0","timeOfBirth":null,"isPreferredPartnerDtl":null,"gothraVal":"","gaanVal":"","nakshatraVal":"","sunSignVal":"","moonSignVal":"","manglikFlagVal":""},"profileFamilyDtl":{"permanentAddress":null,"createdBy":4444,"userSessionDtlId":null,"modifiedBy":4444,"modifiedOn":1440167133111,"isDeleted":null,"isActive":1,"familyIncome":0.0,"fathersStatusId":0,"mothersStatusId":0,"fathersOccupationId":0,"mothersOccupationId":0,"mothersIndustryId":null,"fathersIndustryId":null,"familyTypeId":0,"familyValueId":0,"familyKindId":0,"familyStatusId":0,"userId":165614,"createdOn":1440167133111,"moderatedOn":null,"profileFamilyDtlId":2757,"fathersName":"","fathersStatusVal":null,"motherName":"","mothersStatusVal":null,"numberOfSibling":0,"shortRefModerationStatus":null,"fathersOccupationVal":null,"mothersOccupationVal":null,"familyTypeVal":null,"familyValueVal":null,"familyKindVal":null,"familyStatusVal":null,"mothersIndustryVal":null,"fathersIndustryVal":null,"familyIncomeVal":"","moderatedBy":null,"moderatorRemarks":null,"ref1fullName":null,"ref1relationship":null,"ref1emailId":null,"ref1phoneNo":null,"ref1remarks":null,"ref2fullName":null,"ref2relationship":null,"ref2emailId":null,"ref2phoneNo":null,"ref2remarks":null},"profileLifestyleDtl":{"createdBy":4444,"userSessionDtlId":null,"modifiedBy":4444,"modifiedOn":1440167133110,"isDeleted":null,"isActive":1,"favouriteBooksTypeIds":null,"favouriteHobbiesTypeIds":null,"favouriteMoviesTypeIds":null,"favouriteMusicTypeIds":null,"favouri
teSportsTypeIds":null,"livingInHouseTypeId":0,"vehicleTypeOwnedId":0,"petsId":0,"drinkingStatusId":0,"numberOfKids":0,"userId":165614,"createdOn":1440167133110,"moderatedOn":null,"moderatedBy":null,"isModerated":null,"moderatorRemarks":null,"profileLifestyleDtlId":2757,"smokingStatusId":0,"foodHabitsId":0,"financialPlansId":0,"retirementPlansId":0,"vehicleDescription":null,"vehicleNumber":0,"childrenDesiredId":null,"isReligionImportantFlagId":null,"religiousBeliefs":0,"smokingStatusVal":"","drinkingStatusVal":null,"foodHabitsVal":"","financialPlansVal":null,"retirementPlansVal":null,"vehicleTypeOwnedVal":null,"livingInHouseTypeVal":null,"petsVal":null,"childrenDesiredVal":null,"favouriteBooksTypeVals":"","favouriteMoviesTypeVals":"","favouriteMusicTypeVals":"","favouriteSportsTypeVals":"","favouriteHobbiesTypeVals":"","isReligionImportantFlagVal":null,"religiousBeliefsVal":"","favouriteHobbiesRating":null,"favouriteHobbiesDescription":null,"noOfKidsVal":null},"profileOccupationEducationDtl":{"highestSpecializationVal":null,"highestSpecializationOthersVal":"","createdBy":4444,"userSessionDtlId":null,"modifiedBy":4444,"modifiedOn":1440167133110,"isDeleted":null,"isActive":1,"highestEducationId":null,"occupationId":null,"designationId":null,"incomeCurrencyId":null,"education2id":0,"education3id":0,"specialization2id":0,"specialization3id":0,"highestSpecializationId":null,"industryId":null,"annualIncome":null,"userId":165614,"createdOn":1440167133110,"moderatedOn":null,"moderatedBy":null,"isModerated":null,"moderatorRemarks":null,"highestEducationVal":null,"occupationVal":null,"industryVal":null,"incomeCurrencyVal":null,"designationVal":null,"education3val":null,"education2val":null,"specialization2val":null,"specialization2othersVal":"","specialization3val":null,"specialization3othersVal":"","additionalQualification":null,"professionalQualification":null,"occupationOthersVal":"","departmentId":null,"employmentSectorId":null,"companyName":"","highestEducationInstituteVa
l":null,"education2instituteVal":"0","education3instituteVal":"","professionalQualificationVal":null,"departmentVal":null,"employmentSectorVal":null,"annualIncomeVal":null,"profileOccupationEducationDtlId":2757,"schoolName2":"","schoolName1":"","education2instituteId":null,"education3instituteId":null},"profilePersonalDtl":{"createdBy":4444,"userSessionDtlId":null,"modifiedBy":4444,"modifiedOn":1440167133110,"isDeleted":null,"isActive":1,"familyOriginId":0,"stateId":null,"countryId":null,"numChildrenProspect":0,"countryVal":null,"stateVal":null,"landmark":null,"locationVal":null,"userId":165614,"locationId":null,"religionId":100000598,"createdOn":1440167133110,"isPreferredPartnerDtl":null,"maritalStatusId":null,"maritalStatusVal":null,"subCaste":"","profilePersonalDtlId":2757,"motherTongueId":100000618,"casteId":null,"marryOutsideCasteId":0,"familyOriginVal":null,"facebookHandle":"","linkedInHandle":"","twiterHandle":null,"googlePlus":null,"casteText":"Kshatriya","homeTownText":"0","religionVal":"Hindu","motherTongueVal":"Bengali","marryOutsideCasteVal":"","isSocialMediaVerified":null,"numChildrenProspectVal":null,"locality":null},"profilePhysicalAttributesDtl":{"createdBy":4444,"userSessionDtlId":null,"modifiedBy":4444,"modifiedOn":1440167133110,"isDeleted":null,"isActive":1,"hivId":0,"bodyTypeId":0,"complexionId":0,"bloodGroupId":0,"userId":165614,"createdOn":1440167133110,"height":null,"isPreferredPartnerDtl":null,"hairColourId":0,"eyeColourId":0,"hairLengthId":0,"physicalStatusId":null,"disabilitiesVal":"","hivVal":"","knownAilmentVal":"","bodyTypeVal":null,"complexionVal":null,"hairColourVal":"","eyeColourVal":"","hairLengthVal":"","physicalStatusVal":null,"bloodGroupVal":null,"profilePhysicalAttributesDtlId":2757,"weight":null},"profileSiblingsDtl":null,"profileImageDtl":null,"notes":[{"createdBy":4444,"modifiedBy":4444,"modifiedOn":1440167133115,"isDeleted":null,"isActive":1,"userId":165614,"createdOn":1440167133115,"profileNotesDtlId":3499,"notesDescription"
:""}],"references":[],"relationOthers":[],"photoIdentificationDetails":null,"preModAboutMes":[{"answer":"null ","preModerationAboutMeId":1439283144614540579,"moderationStatus":1,"createdBy":4444,"questionVal":null,"userSessionDtlId":null,"modifiedBy":4444,"modifiedOn":1440167133092,"isActive":1,"isAnswerChange":0,"userId":165614,"questionId":1,"createdOn":1440167133092},{"answer":"null ","preModerationAboutMeId":1439283144614540580,"moderationStatus":1,"createdBy":4444,"questionVal":null,"userSessionDtlId":null,"modifiedBy":4444,"modifiedOn":1440167133093,"isActive":1,"isAnswerChange":0,"userId":165614,"questionId":2,"createdOn":1440167133093},{"answer":"null ","preModerationAboutMeId":1439283144614540581,"moderationStatus":1,"createdBy":4444,"questionVal":null,"userSessionDtlId":null,"modifiedBy":4444,"modifiedOn":1440167133094,"isActive":1,"isAnswerChange":0,"userId":165614,"questionId":3,"createdOn":1440167133094},{"answer":"null ","preModerationAboutMeId":1439283144614540582,"moderationStatus":1,"createdBy":4444,"questionVal":null,"userSessionDtlId":null,"modifiedBy":4444,"modifiedOn":1440167133094,"isActive":1,"isAnswerChange":0,"userId":165614,"questionId":4,"createdOn":1440167133094}],"preModContent":[{"preModerationContentId":1439307323336466240,"isChangeMatrimonyUserName":null,"isChangeLocality":0,"isChangeLandmark":0,"permanentAddress":"Dev Barman,Mayapur,PO-Talbagicha,Kharadpur-721306","isChangePermanentAddress":1,"nameOfInstitutionHighestEducation":"0","highestSpecializationVal":null,"highestSpecializationOthersVal":null,"createdBy":4444,"matrimonyUserName":"SUDIPTO DEB 
BARMAN","userSessionDtlId":null,"modifiedBy":4444,"modifiedOn":1440167133104,"isDeleted":null,"isActive":1,"highestEducationId":0,"occupationId":0,"designationId":0,"incomeCurrencyId":null,"highestSpecializationId":0,"industryId":0,"annualIncome":0.0,"stateId":100000269,"countryId":100000101,"dob":520972200000,"countryVal":null,"stateVal":null,"landmark":"","userId":165614,"moderationStatusId":null,"createdOn":1440167133104,"isChangeNameOfInstitutionHighestEducation":0,"isChangeHighestSpecialization":0,"highestEducationVal":null,"isChangeHighestEducation":0,"occupationVal":null,"isChangeOccupation":0,"industryVal":"","isChangeIndustry":0,"incomeCurrencyVal":null,"isChangeIncomeCurrency":0,"customerTypeId":null,"customerTypeVal":null,"isChangeCustomerType":1,"isChangeDob":1,"maritalStatusId":100000900,"maritalStatusVal":"Never Married","isChangeMaritalStatus":1,"isChangeCountry":1,"isChangeState":1,"cityId":0,"isChangeCity":0,"cityVal":null,"isChangeAnnualIncome":0,"designationVal":null,"isChangeDesignation":0,"subCaste":null,"hometown":null,"isChangeSubCaste":null,"isChangeHometown":null,"ref1fullName":null,"isChangeRef1fullName":null,"ref1relationship":null,"isChangeRef1relationship":null,"ref1emailId":null,"isChangeRef1emailId":null,"ref1phoneNo":null,"isChangeRef1phoneNo":null,"ref1remarks":null,"isChangeRef1remarks":null,"ref2fullName":null,"isChangeRef2fullName":null,"ref2relationship":null,"isChangeRef2relationship":null,"ref2emailId":null,"isChangeRef2emailId":null,"ref2phoneNo":null,"isChangeRef2phoneNo":null,"ref2remarks":null,"isChangeRef2remarks":null,"typeOfCustomer":null,"isChangeTypeOfCustomer":null,"highestEducationInstituteId":null,"typeOfCustomerId":100000006,"locality":""}],"preModReferences":[],"preModShortReferences":[{"moderationStatus":null,"createdBy":4444,"userSessionDtlId":null,"modifiedBy":4444,"modifiedOn":1440167133099,"isDeleted":null,"isActive":1,"userId":165614,"createdOn":1440167133099,"isModerated":null,"premoderationprofileImageDtlI
d":1772,"ref1fullName":"","isChangeRef1fullName":0,"ref1relationship":"","isChangeRef1relationship":0,"ref1emailId":"","isChangeRef1emailId":0,"ref1phoneNo":null,"isChangeRef1phoneNo":0,"ref1remarks":null,"ref2fullName":"","isChangeRef2fullName":0,"ref2relationship":"","isChangeRef2relationship":0,"ref2emailId":"","isChangeRef2emailId":0,"ref2phoneNo":null,"isChangeRef2phoneNo":0,"ref2remarks":null}],"paymentTransactions":[],"userPlanMappings":[],"userFeatureMappings":[],"userPlanMapping":null,"blockedProfiles":[],"notMyTypeProfiles":[]}')
I want to convert the above into a convenient data frame with one row per matrimonyUserId. I have tried a few things but have been unable to get the desired format.
Assuming you can wrangle the JSON data into a nested list:
# jsontext is a placeholder for a character string holding the JSON,
# e.g. as.character(json_data$changedContent[1])
x <- jsonlite::fromJSON(jsontext)
I've found it's easiest to parse complex list structures by using the pipe operator and frequently checking the structure (limited to 1 or 2 levels).
str1 <- function(x) str(x, 1)
str2 <- function(x) str(x, 2)
# for pipe operator
library("magrittr")
x %>% str1
x %>% .[[1]] %>% str2
Etc.
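Once the structure is known, one sub-object can be turned into a single data frame row. The following is a minimal base R sketch, assuming the JSON has already been parsed into a nested list; x below is a tiny hand-written stand-in that reuses a few field names from the question. JSON null values arrive as NULL, which a data frame cannot hold, so they are replaced with NA first.

```r
# Hand-written stand-in for the parsed list (only a few fields, for illustration).
x <- list(
  documents = list(),
  matrimonyUser = list(
    matrimonyUserId = 165614,
    matrimonyUserTypeVal = "Online Customer",
    createdBy = NULL,
    isActive = 1
  )
)

# Inspect only the top level before drilling down.
str(x, max.level = 1)

# Replace NULL with NA so every field has length one, then build a one-row data frame.
user <- lapply(x$matrimonyUser, function(v) if (is.null(v)) NA else v)
row <- as.data.frame(user, stringsAsFactors = FALSE)
row
```

With several JSON strings, the same steps could be applied to each and the resulting rows combined with do.call(rbind, ...).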

Substring in Data Frame R

I have data from a GPS log like this (each row of a data frame column holds one such record):
{"mAccuracy":20.0,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":21677339000000,"mExtras":{"networkLocationSource":"cached","networkLocationType":"wifi","noGPSLocation":{"mAccuracy":20.0,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":21677339000000,"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1811956,"mLongitude":126.9104909,"mProvider":"network","mSpeed":0.0,"mTime":1402801381486},"travelState":"stationary"},"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1811956,"mLongitude":126.9104909,"mProvider":"network","mSpeed":0.0,"mTime":1402801381486,"timestamp":1402801665.512}
The problem is that I only need the latitude and longitude values, so I thought I could use substring with sapply to process every row of the data frame.
But I am not sure this approach is elegant: with substring, e.g. substr("abcdef", 2, 4), I would need to count how many characters come before "mLatitude". Can anybody suggest a fast way to process this?
Thanks to @mnel for the answer; it works, but I still have a problem.
Based on mnel's answer, I created a function like this:
fgps <- function(x) {
out <- fromJSON(x)
c(out$mExtras$noGPSLocation$mLatitude,
out$mExtras$noGPSLocation$mLongitude)
}
and then this is my data :
gpsdata <- head(dfallgps[,4],2)
[1] "{\"mAccuracy\":23.128,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":76437488000000,\"mExtras\":{\"networkLocationSource\":\"cached\",\"networkLocationType\":\"wifi\",\"noGPSLocation\":{\"mAccuracy\":23.128,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":76437488000000,\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1779956,\"mLongitude\":126.9089661,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402894224187},\"travelState\":\"stationary\"},\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1779956,\"mLongitude\":126.9089661,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402894224187,\"timestamp\":1402894517.425}"
[2] "{\"mAccuracy\":1625.0,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":77069916000000,\"mExtras\":{\"networkLocationSource\":\"cached\",\"networkLocationType\":\"cell\",\"noGPSLocation\":{\"mAccuracy\":1625.0,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":77069916000000,\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1811881,\"mLongitude\":126.9084072,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402894857416},\"travelState\":\"stationary\"},\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1811881,\"mLongitude\":126.9084072,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402894857416,\"timestamp\":1402894857.519}"
When I run sapply, why do the input strings still show up in the result instead of just the extracted values?
sapply(gpsdata, function(gpsdata) fgps(gpsdata))
{"mAccuracy":23.128,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":76437488000000,"mExtras":{"networkLocationSource":"cached","networkLocationType":"wifi","noGPSLocation":{"mAccuracy":23.128,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":76437488000000,"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1779956,"mLongitude":126.9089661,"mProvider":"network","mSpeed":0.0,"mTime":1402894224187},"travelState":"stationary"},"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1779956,"mLongitude":126.9089661,"mProvider":"network","mSpeed":0.0,"mTime":1402894224187,"timestamp":1402894517.425}
[1,] 35.178
[2,] 126.909
{"mAccuracy":1625.0,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":77069916000000,"mExtras":{"networkLocationSource":"cached","networkLocationType":"cell","noGPSLocation":{"mAccuracy":1625.0,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":77069916000000,"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1811881,"mLongitude":126.9084072,"mProvider":"network","mSpeed":0.0,"mTime":1402894857416},"travelState":"stationary"},"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1811881,"mLongitude":126.9084072,"mProvider":"network","mSpeed":0.0,"mTime":1402894857416,"timestamp":1402894857.519}
[1,] 35.18119
[2,] 126.90841
I want the result to look like this:
[1] 35.178 126.909
[2] 35.18119 126.90841
Thank you
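The long strings in that output are names, not data: sapply() defaults to USE.NAMES = TRUE, so when its input is a character vector it labels each result with the input string. A minimal base R sketch of one way to get a plain two-column result is shown below; the shortened JSON strings and the regex-based fgps_stub() are stand-ins invented so the example runs without a JSON package, and with the real data fgps() from the question would be passed instead.

```r
# Shortened stand-ins for the GPS records (only the two fields of interest).
gpsdata <- c(
  '{"mLatitude":35.1779956,"mLongitude":126.9089661}',
  '{"mLatitude":35.1811881,"mLongitude":126.9084072}'
)

# Hypothetical helper: pull one numeric field out of the string with a regex.
get_num <- function(x, key) {
  as.numeric(sub(paste0('.*"', key, '":([-0-9.]+).*'), "\\1", x))
}
fgps_stub <- function(x) c(get_num(x, "mLatitude"), get_num(x, "mLongitude"))

# USE.NAMES = FALSE drops the input strings; t() puts one record per row.
res <- t(sapply(gpsdata, fgps_stub, USE.NAMES = FALSE))
res
```

Wrapping the input in unname() or calling unname() on the result would have a similar effect.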
Your data appears to be in JSON format, so use RJSONIO::fromJSON to parse it.
E.g.:
txt <- "{\"mAccuracy\":20.0,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":21677339000000,\"mExtras\":{\"networkLocationSource\":\"cached\",\"networkLocationType\":\"wifi\",\"noGPSLocation\":{\"mAccuracy\":20.0,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":21677339000000,\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1811956,\"mLongitude\":126.9104909,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402801381486},\"travelState\":\"stationary\"},\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1811956,\"mLongitude\":126.9104909,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402801381486,\"timestamp\":1402801665.512}"
Then process:
library(RJSONIO)
out <- fromJSON(txt)
out$mLongitude
#[1] 126.9105
out$mLatitude
#[1] 35.1812
# to process multiple values
tt <- rep(txt,2)
myData <- lapply(tt, fromJSON)
latlong <- do.call(rbind,lapply(myData, `[` ,c('mLatitude','mLongitude')))
# or using rbindlist from data.table
library(data.table)
latlong <- rbindlist(lapply(myData, `[` ,c('mLatitude','mLongitude')))
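Note that do.call(rbind, ...) on plain lists yields a matrix whose cells are list elements rather than numbers. If an ordinary numeric data frame is wanted, each record can be converted before binding. The sketch below uses base R only; myData is a hand-written stand-in for the parsed records (an assumption), using coordinate values from the question.

```r
# Hand-written stand-in for the list of parsed records.
myData <- list(
  list(mAccuracy = 23.128, mLatitude = 35.1779956, mLongitude = 126.9089661),
  list(mAccuracy = 1625, mLatitude = 35.1811881, mLongitude = 126.9084072)
)

keep <- c("mLatitude", "mLongitude")

# Turn each record into a one-row data frame, then stack the rows.
latlong <- do.call(rbind, lapply(myData, function(x) as.data.frame(x[keep])))
latlong
```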

Importing data from a JSON file into R [duplicate]

Is there a way to import data from a JSON file into R? More specifically, the file is an array of JSON objects with string fields, objects, and arrays. The RJSON Package isn't very clear on how to deal with this http://cran.r-project.org/web/packages/rjson/rjson.pdf.
First install the rjson package:
install.packages("rjson")
Then:
library("rjson")
json_file <- "http://api.worldbank.org/country?per_page=10&region=OED&lendingtype=LNX&format=json"
json_data <- fromJSON(paste(readLines(json_file), collapse=""))
Update: since version 0.2.1, the file argument can be used directly:
json_data <- fromJSON(file=json_file)
jsonlite will import the JSON into a data frame. It can optionally flatten nested objects. Nested arrays will be data frames.
> library(jsonlite)
> winners <- fromJSON("winners.json", flatten=TRUE)
> colnames(winners)
[1] "winner" "votes" "startPrice" "lastVote.timestamp" "lastVote.user.name" "lastVote.user.user_id"
> winners[,c("winner","startPrice","lastVote.user.name")]
winner startPrice lastVote.user.name
1 68694999 0 Lamur
> winners[,c("votes")]
[[1]]
ts user.name user.user_id
1 Thu Mar 25 03:13:01 UTC 2010 Lamur 68694999
2 Thu Mar 25 03:13:08 UTC 2010 Lamur 68694999
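The dotted column names above come from joining the names along each nested path. Base R's unlist() builds names the same way, which gives a quick way to see the idea without any package. This is only an analogy, not how jsonlite implements flatten, and the rec list is a hand-written stand-in for one parsed record.

```r
# Hand-written stand-in for one parsed record (nested objects as nested lists).
rec <- list(
  winner = "68694999",
  startPrice = 0,
  lastVote = list(
    timestamp = 1269486788526,
    user = list(name = "Lamur", user_id = "68694999")
  )
)

# unlist() walks the nesting and joins the names on each path with ".".
flat <- unlist(rec)
names(flat)

# One-row data frame; note that unlist() coerced every value to character.
flat_df <- as.data.frame(as.list(flat), stringsAsFactors = FALSE)
flat_df
```

Because everything becomes character here, numeric columns would need converting back, which is one reason to prefer jsonlite's flatten for real data.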
An alternative package is RJSONIO. To convert a nested list, lapply can help:
l <- fromJSON('[{"winner":"68694999", "votes":[
{"ts":"Thu Mar 25 03:13:01 UTC 2010", "user":{"name":"Lamur","user_id":"68694999"}},
{"ts":"Thu Mar 25 03:13:08 UTC 2010", "user":{"name":"Lamur","user_id":"68694999"}}],
"lastVote":{"timestamp":1269486788526,"user":
{"name":"Lamur","user_id":"68694999"}},"startPrice":0}]'
)
m <- lapply(
l[[1]]$votes,
function(x) c(x$user['name'], x$user['user_id'], x['ts'])
)
m <- do.call(rbind, m)
gives information on the votes in your example.
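If a data frame is preferred over the matrix that rbind produces, each vote can be converted to a one-row data frame first. The following sketch uses base R only; l is a hand-written stand-in for the parsed JSON (an assumption about its shape), and [[ is used for access because it works whether the parser returned user as a list or as a named character vector.

```r
# Hand-written stand-in for the parsed JSON shown above.
l <- list(list(
  winner = "68694999",
  votes = list(
    list(ts = "Thu Mar 25 03:13:01 UTC 2010",
         user = list(name = "Lamur", user_id = "68694999")),
    list(ts = "Thu Mar 25 03:13:08 UTC 2010",
         user = list(name = "Lamur", user_id = "68694999"))
  ),
  startPrice = 0
))

# One data frame row per vote, then stack them.
votes <- do.call(rbind, lapply(l[[1]]$votes, function(v) {
  data.frame(
    name = v[["user"]][["name"]],
    user_id = v[["user"]][["user_id"]],
    ts = v[["ts"]],
    stringsAsFactors = FALSE
  )
}))
votes
```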
If the URL uses https, as with Amazon S3, fetch the content with getURL from the RCurl package first:
json <- fromJSON(getURL('https://s3.amazonaws.com/bucket/my.json'))
First install the RJSONIO and RCurl packages:
install.packages("RJSONIO")
install.packages("RCurl")
Then try the code below, using RJSONIO, in the console:
library(RJSONIO)
library(RCurl)
json_file = getURL("https://raw.githubusercontent.com/isrini/SI_IS607/master/books.json")
json_file2 = RJSONIO::fromJSON(json_file)
head(json_file2)
Load the packages:
library(httr)
library(jsonlite)
I have had issues converting JSON to a data frame/CSV. In my case I did:
Token <- "245432532532"
source_url <- "http://......."
header_type <- "application/json"
full_token <- paste0("Bearer ", Token)
response <- GET(source_url, add_headers(Authorization = full_token, Accept = header_type), timeout(120), verbose())
text_json <- content(response, as = "text", encoding = "UTF-8")
jfile <- fromJSON(text_json)
df <- as.data.frame(jfile)
Then write df to CSV.
In this format it should be easy to split it into multiple CSV files if needed.
The important part is that content() should return the body as text (as = "text") so that fromJSON can parse it.
Load the httr package
library(httr)
Get the url
url <- "http://www.omdbapi.com/?apikey=72bc447a&t=Annie+Hall&y=&plot=short&r=json"
resp <- GET(url)
Print content of resp as text
content(resp, as = "text")
Print content of resp
content(resp)
Use content() to get the content of resp, but this time do not specify
a second argument. R figures out automatically that you're dealing
with a JSON, and converts the JSON to a named R list.