Flatten nested JSON columns in Pandas

I'm trying to find an easy way to flatten a nested JSON present in a dataframe column. The dataframe column looks as follows:
stock Name Annual
x Tesla {"0": {"date": "2020","dateFormatted": "2020-12-31","sharesMln": "3856.2405","shares": 3856240500},"1": {"date": "2019","dateFormatted": "2019-12-31","sharesMln": "3856.2405","shares": 3856240500}}
y Google {"0": {"date": "2020","dateFormatted": "2020-12-31","sharesMln": "2526.4506","shares": 2526450600},"1": {"date": "2019","dateFormatted": "2019-12-31","sharesMln": "2526.4506","shares": 2526450600},"2": {"date": "2018","dateFormatted": "2018-12-31","sharesMln": "2578.0992","shares": 2578099200}}
z Big Apple {}
How do I convert the above dataframe to:
Stock Name date dateFormatted sharesMln shares
x Tesla 2020 2020-12-31 3856.2405 3856240500
x Tesla 2019 2019-12-31 3856.2405 3856240500
y Google 2020 2020-12-31 2526.4506 2526450600
y Google 2019 2019-12-31 2526.4506 2526450600
y Google 2018 2018-12-31 2578.0992 2578099200
z Big Apple None None None None
I've tried using pd.json_normalize(dataframe['Annual'], max_level=1) but I'm struggling to get the desired result shown above.
Any pointers will be appreciated.

Get the values from the dicts and turn each element of the resulting lists into its own row with explode (the original index is duplicated). Then expand the nested dicts (the values of your outer dict) into columns. Finally, join your original dataframe with the new dataframe.
>>> df
stock Name Annual
0 x Tesla {'0': {'date': '2020', 'dateFormatted': '2020-...
1 y Google {'0': {'date': '2020', 'dateFormatted': '2020-...
2 z Big Apple {}
data = df['Annual'].apply(lambda x: x.values()) \
                   .explode() \
                   .apply(pd.Series)
df = df.join(data).drop(columns='Annual')
Output:
>>> df
stock Name date dateFormatted sharesMln shares
0 x Tesla 2020 2020-12-31 3856.2405 3.856240e+09
0 x Tesla 2019 2019-12-31 3856.2405 3.856240e+09
1 y Google 2020 2020-12-31 2526.4506 2.526451e+09
1 y Google 2019 2019-12-31 2526.4506 2.526451e+09
1 y Google 2018 2018-12-31 2578.0992 2.578099e+09
2 z Big Apple NaN NaN NaN NaN
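Putting the steps above together, here is a self-contained sketch using a copy of the sample data. Two details here are my own additions, not part of the original answer: the `or [{}]` fallback keeps an all-NaN placeholder row for empty dicts, and reset_index at the end replaces the duplicated index with a fresh one.

```python
import pandas as pd

df = pd.DataFrame({
    "stock": ["x", "y", "z"],
    "Name": ["Tesla", "Google", "Big Apple"],
    "Annual": [
        {"0": {"date": "2020", "dateFormatted": "2020-12-31", "sharesMln": "3856.2405", "shares": 3856240500},
         "1": {"date": "2019", "dateFormatted": "2019-12-31", "sharesMln": "3856.2405", "shares": 3856240500}},
        {"0": {"date": "2020", "dateFormatted": "2020-12-31", "sharesMln": "2526.4506", "shares": 2526450600},
         "1": {"date": "2019", "dateFormatted": "2019-12-31", "sharesMln": "2526.4506", "shares": 2526450600},
         "2": {"date": "2018", "dateFormatted": "2018-12-31", "sharesMln": "2578.0992", "shares": 2578099200}},
        {},  # no annual data for Big Apple
    ],
})

# One list element per year; `or [{}]` keeps a placeholder for empty dicts
rows = df["Annual"].apply(lambda d: list(d.values()) or [{}]).explode()

# Build a frame from the inner dicts, keeping the duplicated index for the join
data = pd.DataFrame(rows.tolist(), index=rows.index)

out = df.drop(columns="Annual").join(data).reset_index(drop=True)
print(out)
```

The join expands each original row once per year entry, so Tesla appears twice, Google three times, and Big Apple once with NaN in every expanded column.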

Related

Convert JSON keys to set in different columns with preset headers in pandas

I want to implement dataframe expansion from JSON data with a preset set of columns, collating all the keys within the JSON data to be set as column headers.
id |Name | MonthWise
0 |ABC |{'102022':{'val':100, 'count':1}}
1 |XYZ |{'102022':{'val':20,'count':5},'092022':{'val':20,'count':2}}
2 |DEF |{}
3 |PQR |{'082022':{'val':50,'count':3}}
Here df contains a MonthWise column holding JSON objects, which need to be transposed into 12 different columns named 'MMYYYY' (a year of monthly data).
Something like:
id |Name |042022.val | 042022.count |....|102022.val | 102022.count|....| 032023.val| 032023.count
0 |ABC |nan|nan|....|100|1|....|nan|nan
1 |XYZ |nan|nan|....|20|5|....|nan|nan
2 |DEF |nan|nan|....|nan|nan|....|nan|nan
3 |PQR |nan|nan|....|nan|nan|....|nan|nan
I have tried df['MonthWise'].apply(pd.json_normalize(x, max_level=1)) but without success.
There is no need for apply in this case; you can use pd.json_normalize on the whole column as follows:
import pandas as pd
# sample data
df = pd.DataFrame({
    'id': [0, 1, 2, 3],
    'Name': ['ABC', 'XYZ', 'DEF', 'PQR'],
    'MonthWise': [{'102022': {'val': 100, 'count': 1}},
                  {'102022': {'val': 20, 'count': 5}, '092022': {'val': 20, 'count': 2}},
                  {},
                  {'082022': {'val': 50, 'count': 3}}]
})
# normalize
result = pd.concat([df[['id','Name']], pd.json_normalize(df['MonthWise'])], axis=1)
This returns
id Name 102022.val 102022.count 092022.val 092022.count 082022.val 082022.count
0 0 ABC 100.0 1.0 NaN NaN NaN NaN
1 1 XYZ 20.0 5.0 20.0 2.0 NaN NaN
2 2 DEF NaN NaN NaN NaN NaN NaN
3 3 PQR NaN NaN NaN NaN 50.0 3.0
(I believe that the expected result in your original post is inconsistent with the input dataframe)
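If you do want the full preset of 12 monthly headers regardless of which months appear in the data, you can reindex the normalized frame against a generated column list. A sketch, assuming the April 2022 through March 2023 range implied by the expected output (042022 ... 032023):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "Name": ["ABC", "XYZ", "DEF", "PQR"],
    "MonthWise": [
        {"102022": {"val": 100, "count": 1}},
        {"102022": {"val": 20, "count": 5}, "092022": {"val": 20, "count": 2}},
        {},
        {"082022": {"val": 50, "count": 3}},
    ],
})

# Generate 'MMYYYY.val' / 'MMYYYY.count' headers for Apr 2022 .. Mar 2023
months = pd.date_range("2022-04-01", "2023-03-01", freq="MS").strftime("%m%Y")
cols = [f"{m}.{field}" for m in months for field in ("val", "count")]

# Normalize, then reindex so missing months become all-NaN columns
norm = pd.json_normalize(df["MonthWise"].tolist()).reindex(columns=cols)
result = pd.concat([df[["id", "Name"]], norm], axis=1)
```

reindex keeps the columns that json_normalize produced and fills the absent months with NaN, giving the fixed 24-column monthly layout.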

Undefined columns selected using panelvar package

Has anyone used panelvar in R?
I'm currently using the panelvar package in R, and I'm getting this error:
Error in `[.data.frame`(data, , c(colnames(data)[panel_identifier], required_vars)) :
undefined columns selected
And my syntax currently is:
model1 <- pvargmm(
  dependent_vars = c("Change.."),
  lags = 2,
  exog_vars = c("Price"),
  transformation = "fd",
  data = base1,
  panel_identifier = c("id", "t"),
  steps = c("twostep"),
  system_instruments = FALSE,
  max_instr_dependent_vars = 99,
  min_instr_dependent_vars = 2L,
  collapse = FALSE)
I don't know why my panel_identifier is not working; it's pretty similar to the example given by the panelvar package, yet it doesn't work. I want to point out that base1 is in data.frame format. Any ideas? Also, my data is structured like this:
head(base1)
id t country DDMMYY month month_text day Date_txt year Price Open
1 1 1296 China 1-4-2020 4 Apr 1 Apr 01 2020 12588.24 12614.82
2 1 1295 China 31-3-2020 3 Mar 31 Mar 31 2020 12614.82 12597.61
High Low Vol. Change..
1 12775.83 12570.32 NA -0.0021
2 12737.28 12583.05 NA 0.0014
Thanks in advance!
Check the documentation of the package and the SSRN paper. For me it helped to ensure that all input formats are identical (you can check this with the str(base1) command). For example, they write:
library(panelvar)
data("Dahlberg")
ex1_dahlberg_data <-
  pvargmm(dependent_vars = .......
When I look at it I get
>str(Dahlberg)
'data.frame': 2385 obs. of 5 variables:
$ id : Factor w/ 265 levels "114","115","120",..: 1 1 1 1 1 1 1 1 1 2 ...
$ year : Factor w/ 9 levels "1979","1980",..: 1 2 3 4 5 6 7 8 9 1 ...
$ expenditures: num 0.023 0.0266 0.0273 0.0289 0.0226 ...
$ revenues : num 0.0182 0.0209 0.0211 0.0234 0.018 ...
$ grants : num 0.00544 0.00573 0.00566 0.00589 0.00559 ...
For example, the input data must be a plain data.frame (in my case it had additional classes such as tibble or data.table). I resolved this by calling as.data.frame() on it.

Storing birthday in database, what type to use if there's missing values

I am collecting sports data for a database I will be creating soon, and one of the columns I have for one of my tables is player birthdays. I am using R to scrape and collect the data, and here is what it currently looks like:
head(my_df)
Name BirthDate Age Place Height Weight
1 Mike NA NA NA
2 Austin August 17, 1994 22.295 Roseville, California 73 210
3 Koby January 19, 1997 20.140 Wolfforth, Texas 70 255
4 Ty 1991 26.000 Amarillo, Texas 70 165
5 Cole NA NA NA
6 Jeff July 24, 1995 21.320 Boulder, Colorado 72 200
dput(head(my_df))
structure(list(Name = c("Mike", "Austin", "Koby", "Ty", "Cole",
"Jeff"), BirthDate = c("", "August 17, 1994", "January 19, 1997",
"1991", "", "July 24, 1995"), Age = c(NA, 22.295, 20.14, 26,
NA, 21.32), Place = c("", "Roseville, California", "Wolfforth, Texas",
"Amarillo, Texas", "", "Boulder, Colorado"), Height = c(NA, 73,
70, 70, NA, 72), Weight = c(NA, 210L, 255L, 165L, NA, 200L)), .Names = c("Name",
"BirthDate", "Age", "Place", "Height", "Weight"), row.names = 25:30, class = "data.frame")
The BirthDate column has two inconsistencies that make appropriate formatting difficult:
some values are entirely missing
for some rows, I have the birth year but am missing the day and month.
I'm relatively new with databases and am trying to plan ahead with this. Does anybody have any thoughts on the best way to handle this? To be more specific, how can I write the R code to format the column the right way?
Not a complete answer, but a direction to try...
Genealogy databases, which have similar problems, will often have an extra column (a sort of meta-date column) that expresses confidence in, or quality of, the birth-date column. This can then be used at query-time to mediate the results returned. Could be "estimated", "not given", "year only" to suit your app.
If we intend to do calculations on the date as a date, or on year and month columns (for example, the age of a player, or how many players were born in January), then we can separate it into 3 numeric columns.
do.call(rbind,
        lapply(my_df$BirthDate, function(i){
          x <- rev(unlist(strsplit(i, " ")))
          res <- data.frame(
            BirthDate = i,
            YYYY = as.numeric(x[1]),
            DD = as.numeric(sub(",", "", x[2], fixed = TRUE)),
            MM = match(x[3], month.name))
          res$DOB <- as.Date(paste(res$DD, res$MM, res$YYYY, sep = "/"),
                             format = "%d/%m/%Y")
          res
        }))
# BirthDate YYYY DD MM DOB
# 1 NA NA NA <NA>
# 2 August 17, 1994 1994 17 8 1994-08-17
# 3 January 19, 1997 1997 19 1 1997-01-19
# 4 1991 1991 NA NA <NA>
# 5 NA NA NA <NA>
# 6 July 24, 1995 1995 24 7 1995-07-24

Using JSON schema as column headers in dataframe

OK, as per a previous question (here) I've now managed to read a load of JSON data into R and to get the data into a data frame. Here's the code:
getCall <- GET("http://long-url.com",
               authenticate("myusername", "password"))
contJSON <- content(getCall)
contJSON <- sub("\n\r\n", "", contJSON)
df1 <- fromJSON(sprintf("[%s]", gsub("\n", ",", contJSON)), asText = TRUE)
df <- data.frame(matrix(unlist(df1), nrow = 31, byrow = T))
This gets me a data frame that looks as follows:
head(df[,1:8])
X1 X2 X3 X4 X5 X6 X7 X8
1 2013-05-01 33682 11838 8023 3815 84 177.000000 177.000000
2 2013-05-02 32622 11626 7945 3681 58 210.000000 210.000000
3 2013-05-03 28467 11102 7786 3316 56 186.000000 186.000000
4 2013-05-04 20884 9031 6670 2361 51 7.000000 7.000000
5 2013-05-05 20481 8782 6390 2392 58 1.000000 1.000000
6 2013-05-06 25175 10019 7082 2937 62 24.000000 24.000000
However, there are no column names in my data frame. When I search for "names" in my JSON object R returns "NULL" so that doesn't give me anything useful.
I am wondering if there is any simple way (that might be repeatable on more general cases) to get the names for the column headers from the JSON schema.
I'm aware there are similar questions elsewhere on the site, but this one did not appear to be covered.
EDIT: As per the comment, here is the structure of the contJSON object.
"{\"metricDate\":\"2013-05-01\",\"pageCountTotal\":\"33682\",\"landCountTotal\":\"11838\",\"newLandCountTotal\":\"8023\",\"returnLandCountTotal\":\"3815\",\"spiderCountTotal\":\"84\",\"goalCountTotal\":\"177.000000\",\"callGoalCountTotal\":\"177.000000\",\"callCountTotal\":\"237.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.50\",\"callConversionPerc\":\"74.68\"}\n{\"metricDate\":\"2013-05-02\",\"pageCountTotal\":\"32622\",\"landCountTotal\":\"11626\",\"newLandCountTotal\":\"7945\",\"returnLandCountTotal\":\"3681\",\"spiderCountTotal\":\"58\",\"goalCountTotal\":\"210.000000\",\"callGoalCountTotal\":\"210.000000\",\"callCountTotal\":\"297.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.81\",\"callConversionPerc\":\"70.71\"}\n{\"metricDate\":\"2013-05-03\",\"pageCountTotal\":\"28467\",\"landCountTotal\":\"11102\",\"newLandCountTotal\":\"7786\",\"returnLandCountTotal\":\"3316\",\"spiderCountTotal\":\"56\",\"goalCountTotal\":\"186.000000\",\"callGoalCountTotal\":\"186.000000\",\"callCountTotal\":\"261.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.68\",\"callConversionPerc\":\"71.26\"}\n{\"metricDate\":\"2013-05-04\",\"pageCountTotal\":\"20884\",\"landCountTotal\":\"9031\",\"newLandCountTotal\":\"6670\",\"returnLandCountTotal\":\"2361\",\"spiderCountTotal\":\"51\",\"goalCountTotal\":\"7.000000\",\"callGoalCountTotal\":\"7.000000\",\"callCountTotal\":\"44.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.08\",\"callConversionPerc\":\"15.91\"}\n{\"metricDate\":\"2013-05-05\",\"pageCountTotal\":\"20481\",\"landCountTotal\":\"8782\",\"newLandCountTotal\":\"6390\",\"returnLandCountTotal\":\"2392\",\"spiderCountTotal\":\"58\",\"goalCountTotal\":\"1.000000\",\"callGoalCountTotal\":\"1.000000\",\"callCountTotal\":\"8.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.01\",\"callConversionPerc\":\"12.50\"}\n{\"metricDate\":\"2013-05-06\",\"pageCountTotal\":\"25175\",\"landC
ountTotal\":\"10019\",\"newLandCountTotal\":\"7082\",\"returnLandCountTotal\":\"2937\",\"spiderCountTotal\":\"62\",\"goalCountTotal\":\"24.000000\",\"callGoalCountTotal\":\"24.000000\",\"callCountTotal\":\"47.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.24\",\"callConversionPerc\":\"51.06\"}\n{\"metricDate\":\"2013-05-07\",\"pageCountTotal\":\"35892\",\"landCountTotal\":\"12615\",\"newLandCountTotal\":\"8391\",\"returnLandCountTotal\":\"4224\",\"spiderCountTotal\":\"62\",\"goalCountTotal\":\"239.000000\",\"callGoalCountTotal\":\"239.000000\",\"callCountTotal\":\"321.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.89\",\"callConversionPerc\":\"74.45\"}\n{\"metricDate\":\"2013-05-08\",\"pageCountTotal\":\"34106\",\"landCountTotal\":\"12391\",\"newLandCountTotal\":\"8389\",\"returnLandCountTotal\":\"4002\",\"spiderCountTotal\":\"90\",\"goalCountTotal\":\"221.000000\",\"callGoalCountTotal\":\"221.000000\",\"callCountTotal\":\"295.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.78\",\"callConversionPerc\":\"74.92\"}\n{\"metricDate\":\"2013-05-09\",\"pageCountTotal\":\"32721\",\"landCountTotal\":\"12447\",\"newLandCountTotal\":\"8541\",\"returnLandCountTotal\":\"3906\",\"spiderCountTotal\":\"54\",\"goalCountTotal\":\"207.000000\",\"callGoalCountTotal\":\"207.000000\",\"callCountTotal\":\"280.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.66\",\"callConversionPerc\":\"73.93\"}\n{\"metricDate\":\"2013-05-10\",\"pageCountTotal\":\"29724\",\"landCountTotal\":\"11616\",\"newLandCountTotal\":\"8063\",\"returnLandCountTotal\":\"3553\",\"spiderCountTotal\":\"139\",\"goalCountTotal\":\"207.000000\",\"callGoalCountTotal\":\"207.000000\",\"callCountTotal\":\"301.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.78\",\"callConversionPerc\":\"68.77\"}\n{\"metricDate\":\"2013-05-11\",\"pageCountTotal\":\"22061\",\"landCountTotal\":\"9660\",\"newLandCountTotal\":\"6971\",\"ret
urnLandCountTotal\":\"2689\",\"spiderCountTotal\":\"52\",\"goalCountTotal\":\"3.000000\",\"callGoalCountTotal\":\"3.000000\",\"callCountTotal\":\"40.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.03\",\"callConversionPerc\":\"7.50\"}\n{\"metricDate\":\"2013-05-12\",\"pageCountTotal\":\"23341\",\"landCountTotal\":\"9935\",\"newLandCountTotal\":\"6960\",\"returnLandCountTotal\":\"2975\",\"spiderCountTotal\":\"45\",\"goalCountTotal\":\"0.000000\",\"callGoalCountTotal\":\"0.000000\",\"callCountTotal\":\"12.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.00\",\"callConversionPerc\":\"0.00\"}\n{\"metricDate\":\"2013-05-13\",\"pageCountTotal\":\"36565\",\"landCountTotal\":\"13583\",\"newLandCountTotal\":\"9277\",\"returnLandCountTotal\":\"4306\",\"spiderCountTotal\":\"69\",\"goalCountTotal\":\"246.000000\",\"callGoalCountTotal\":\"246.000000\",\"callCountTotal\":\"324.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.81\",\"callConversionPerc\":\"75.93\"}\n{\"metricDate\":\"2013-05-14\",\"pageCountTotal\":\"35260\",\"landCountTotal\":\"13797\",\"newLandCountTotal\":\"9375\",\"returnLandCountTotal\":\"4422\",\"spiderCountTotal\":\"59\",\"goalCountTotal\":\"212.000000\",\"callGoalCountTotal\":\"212.000000\",\"callCountTotal\":\"283.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.54\",\"callConversionPerc\":\"74.91\"}\n{\"metricDate\":\"2013-05-15\",\"pageCountTotal\":\"35836\",\"landCountTotal\":\"13792\",\"newLandCountTotal\":\"9532\",\"returnLandCountTotal\":\"4260\",\"spiderCountTotal\":\"94\",\"goalCountTotal\":\"187.000000\",\"callGoalCountTotal\":\"187.000000\",\"callCountTotal\":\"258.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.36\",\"callConversionPerc\":\"72.48\"}\n{\"metricDate\":\"2013-05-16\",\"pageCountTotal\":\"33136\",\"landCountTotal\":\"12821\",\"newLandCountTotal\":\"8755\",\"returnLandCountTotal\":\"4066\",\"spiderCountTotal\":\"65\",\"goalCount
Total\":\"192.000000\",\"callGoalCountTotal\":\"192.000000\",\"callCountTotal\":\"260.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.50\",\"callConversionPerc\":\"73.85\"}\n{\"metricDate\":\"2013-05-17\",\"pageCountTotal\":\"29564\",\"landCountTotal\":\"11721\",\"newLandCountTotal\":\"8191\",\"returnLandCountTotal\":\"3530\",\"spiderCountTotal\":\"213\",\"goalCountTotal\":\"166.000000\",\"callGoalCountTotal\":\"166.000000\",\"callCountTotal\":\"222.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.42\",\"callConversionPerc\":\"74.77\"}\n{\"metricDate\":\"2013-05-18\",\"pageCountTotal\":\"23686\",\"landCountTotal\":\"9916\",\"newLandCountTotal\":\"7335\",\"returnLandCountTotal\":\"2581\",\"spiderCountTotal\":\"56\",\"goalCountTotal\":\"5.000000\",\"callGoalCountTotal\":\"5.000000\",\"callCountTotal\":\"34.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.05\",\"callConversionPerc\":\"14.71\"}\n{\"metricDate\":\"2013-05-19\",\"pageCountTotal\":\"23528\",\"landCountTotal\":\"9952\",\"newLandCountTotal\":\"7184\",\"returnLandCountTotal\":\"2768\",\"spiderCountTotal\":\"57\",\"goalCountTotal\":\"1.000000\",\"callGoalCountTotal\":\"1.000000\",\"callCountTotal\":\"14.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.01\",\"callConversionPerc\":\"7.14\"}\n{\"metricDate\":\"2013-05-20\",\"pageCountTotal\":\"37391\",\"landCountTotal\":\"13488\",\"newLandCountTotal\":\"9024\",\"returnLandCountTotal\":\"4464\",\"spiderCountTotal\":\"69\",\"goalCountTotal\":\"227.000000\",\"callGoalCountTotal\":\"227.000000\",\"callCountTotal\":\"291.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.68\",\"callConversionPerc\":\"78.01\"}\n{\"metricDate\":\"2013-05-21\",\"pageCountTotal\":\"36299\",\"landCountTotal\":\"13174\",\"newLandCountTotal\":\"8817\",\"returnLandCountTotal\":\"4357\",\"spiderCountTotal\":\"77\",\"goalCountTotal\":\"164.000000\",\"callGoalCountTotal\":\"164.000000\",\"call
CountTotal\":\"221.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.24\",\"callConversionPerc\":\"74.21\"}\n{\"metricDate\":\"2013-05-22\",\"pageCountTotal\":\"34201\",\"landCountTotal\":\"12433\",\"newLandCountTotal\":\"8388\",\"returnLandCountTotal\":\"4045\",\"spiderCountTotal\":\"76\",\"goalCountTotal\":\"195.000000\",\"callGoalCountTotal\":\"195.000000\",\"callCountTotal\":\"262.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.57\",\"callConversionPerc\":\"74.43\"}\n{\"metricDate\":\"2013-05-23\",\"pageCountTotal\":\"32951\",\"landCountTotal\":\"11611\",\"newLandCountTotal\":\"7757\",\"returnLandCountTotal\":\"3854\",\"spiderCountTotal\":\"68\",\"goalCountTotal\":\"167.000000\",\"callGoalCountTotal\":\"167.000000\",\"callCountTotal\":\"231.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.44\",\"callConversionPerc\":\"72.29\"}\n{\"metricDate\":\"2013-05-24\",\"pageCountTotal\":\"28967\",\"landCountTotal\":\"10821\",\"newLandCountTotal\":\"7396\",\"returnLandCountTotal\":\"3425\",\"spiderCountTotal\":\"106\",\"goalCountTotal\":\"167.000000\",\"callGoalCountTotal\":\"167.000000\",\"callCountTotal\":\"203.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.54\",\"callConversionPerc\":\"82.27\"}\n{\"metricDate\":\"2013-05-25\",\"pageCountTotal\":\"19741\",\"landCountTotal\":\"8393\",\"newLandCountTotal\":\"6168\",\"returnLandCountTotal\":\"2225\",\"spiderCountTotal\":\"78\",\"goalCountTotal\":\"0.000000\",\"callGoalCountTotal\":\"0.000000\",\"callCountTotal\":\"28.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.00\",\"callConversionPerc\":\"0.00\"}\n{\"metricDate\":\"2013-05-26\",\"pageCountTotal\":\"19770\",\"landCountTotal\":\"8237\",\"newLandCountTotal\":\"6009\",\"returnLandCountTotal\":\"2228\",\"spiderCountTotal\":\"79\",\"goalCountTotal\":\"0.000000\",\"callGoalCountTotal\":\"0.000000\",\"callCountTotal\":\"8.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"
conversionPerc\":\"0.00\",\"callConversionPerc\":\"0.00\"}\n{\"metricDate\":\"2013-05-27\",\"pageCountTotal\":\"26208\",\"landCountTotal\":\"9755\",\"newLandCountTotal\":\"6779\",\"returnLandCountTotal\":\"2976\",\"spiderCountTotal\":\"82\",\"goalCountTotal\":\"26.000000\",\"callGoalCountTotal\":\"26.000000\",\"callCountTotal\":\"40.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.27\",\"callConversionPerc\":\"65.00\"}\n{\"metricDate\":\"2013-05-28\",\"pageCountTotal\":\"36980\",\"landCountTotal\":\"12463\",\"newLandCountTotal\":\"8226\",\"returnLandCountTotal\":\"4237\",\"spiderCountTotal\":\"132\",\"goalCountTotal\":\"208.000000\",\"callGoalCountTotal\":\"208.000000\",\"callCountTotal\":\"276.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.67\",\"callConversionPerc\":\"75.36\"}\n{\"metricDate\":\"2013-05-29\",\"pageCountTotal\":\"34190\",\"landCountTotal\":\"12014\",\"newLandCountTotal\":\"8279\",\"returnLandCountTotal\":\"3735\",\"spiderCountTotal\":\"90\",\"goalCountTotal\":\"179.000000\",\"callGoalCountTotal\":\"179.000000\",\"callCountTotal\":\"235.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.49\",\"callConversionPerc\":\"76.17\"}\n{\"metricDate\":\"2013-05-30\",\"pageCountTotal\":\"33867\",\"landCountTotal\":\"11965\",\"newLandCountTotal\":\"8231\",\"returnLandCountTotal\":\"3734\",\"spiderCountTotal\":\"63\",\"goalCountTotal\":\"160.000000\",\"callGoalCountTotal\":\"160.000000\",\"callCountTotal\":\"219.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.34\",\"callConversionPerc\":\"73.06\"}\n{\"metricDate\":\"2013-05-31\",\"pageCountTotal\":\"27536\",\"landCountTotal\":\"10302\",\"newLandCountTotal\":\"7333\",\"returnLandCountTotal\":\"2969\",\"spiderCountTotal\":\"108\",\"goalCountTotal\":\"173.000000\",\"callGoalCountTotal\":\"173.000000\",\"callCountTotal\":\"226.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.68\",\"callConversionPerc\":\"76.55\"
}\n\r\n"
One thing that works is to split on newlines, call fromJSON on each row, and then recombine the result.
contJSON <- sub("\n\r\n", "", contJSON) #as before
rowJSON <- strsplit(contJSON, "\n")[[1]]
row <- lapply(rowJSON, fromJSON)
as.data.frame(do.call(rbind, row))

Matlab's nanmean() function not working with dimensions other than 1

Take this example from the mathworks help of nanmean():
X = magic(3);
X([1 6:9]) = repmat(NaN,1,5)
X =
NaN 1 NaN
3 5 NaN
4 NaN NaN
>> y = nanmean(X,2)
??? Error using ==> nanmean
Too many input arguments.
Why is it showing error even when the docs say the mean can be taken in any dimension dim of X as y = nanmean(X,dim)? Thanks.
I ran exactly the code you have and I got no error. In particular, here is what I ran:
>> X = magic(3);
X([1 6:9]) = repmat(NaN,1,5)
X =
NaN 1 NaN
3 5 NaN
4 NaN NaN
>> y = nanmean(X,2)
y =
1
4
4
>> which nanmean
C:\Program Files\MATLAB\R2010b\toolbox\stats\stats\nanmean.m
The only thing I can think of is that you have a different version of nanmean.m on your path. Try a which nanmean and see if it points into the stats toolbox.
Here is the reason:
If X contains a vector of all NaN values along some dimension, the vector is empty once the NaN values are removed, so the sum of the remaining elements is 0. Since the mean involves division by 0, its value is NaN. The output NaN is not a mean of NaN values.
Look at:
http://www.mathworks.com/help/toolbox/stats/nanmean.html