Convert JSON keys into separate columns with preset headers in pandas

I want to expand a dataframe from JSON data into columns with preset headers, collating all the keys within the JSON data to serve as the column headers:
id |Name | MonthWise
0 |ABC |{'102022':{'val':100, 'count':1}}
1 |XYZ |{'102022':{'val':20,'count':5},'092022':{'val':20,'count':2}}
2 |DEF |{}
3 |PQR |{'082022':{'val':50,'count':3}}
Here df contains a MonthWise column holding JSON objects, which needs to be transposed into 12 monthly column pairs named 'MMYYYY.val'/'MMYYYY.count' (a year of data).
Something like:
id |Name |042022.val | 042022.count |....|102022.val | 102022.count|....| 032023.val| 032023.count
0 |ABC |nan|nan|....|100|1|....|nan|nan
1 |XYZ |nan|nan|....|20|5|....|nan|nan
2 |DEF |nan|nan|....|nan|nan|....|nan|nan
3 |PQR |nan|nan|....|nan|nan|....|nan|nan
I have tried df['MonthWise'].apply(lambda x: pd.json_normalize(x, max_level=1)), but with no success.

There is no need for apply in this case; you can use pd.json_normalize directly on the column:
import pandas as pd

# sample data
df = pd.DataFrame({
    'id': [0, 1, 2, 3],
    'Name': ['ABC', 'XYZ', 'DEF', 'PQR'],
    'MonthWise': [
        {'102022': {'val': 100, 'count': 1}},
        {'102022': {'val': 20, 'count': 5}, '092022': {'val': 20, 'count': 2}},
        {},
        {'082022': {'val': 50, 'count': 3}},
    ],
})

# normalize the dict column and glue it back onto the id/Name columns
result = pd.concat([df[['id', 'Name']], pd.json_normalize(df['MonthWise'])], axis=1)
This returns
id Name 102022.val 102022.count 092022.val 092022.count 082022.val 082022.count
0 0 ABC 100.0 1.0 NaN NaN NaN NaN
1 1 XYZ 20.0 5.0 20.0 2.0 NaN NaN
2 2 DEF NaN NaN NaN NaN NaN NaN
3 3 PQR NaN NaN NaN NaN 50.0 3.0
(I believe that the expected result in your original post is inconsistent with the input dataframe)
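If you also need the full preset header set (042022.val through 032023.count), with NaN where a month is absent, you can reindex the normalized columns. A minimal sketch, assuming the preset window runs April 2022 through March 2023 as in your expected output:

# preset 'MMYYYY' headers for the assumed April 2022 - March 2023 window
months = pd.date_range('2022-04-01', '2023-03-01', freq='MS').strftime('%m%Y')
preset = [f'{m}.{field}' for m in months for field in ('val', 'count')]
# reindex inserts all-NaN columns for months missing from the data
wide = pd.json_normalize(df['MonthWise']).reindex(columns=preset)
result = pd.concat([df[['id', 'Name']], wide], axis=1)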

Related

Flatten nested JSON columns in Pandas

I'm trying to find an easy way to flatten a nested JSON present in a dataframe column. The dataframe column looks as follows:
stock Name Annual
x Tesla {"0": {"date": "2020","dateFormatted": "2020-12-31","sharesMln": "3856.2405","shares": 3856240500},"1": {"date": "2019","dateFormatted": "2019-12-31","sharesMln": "3856.2405","shares": 3856240500}}
y Google {"0": {"date": "2020","dateFormatted": "2020-12-31","sharesMln": "2526.4506","shares": 2526450600},"1": {"date": "2019","dateFormatted": "2019-12-31","sharesMln": "2526.4506","shares": 2526450600},"2": {"date": "2018","dateFormatted": "2018-12-31","sharesMln": "2578.0992","shares": 2578099200}}
z Big Apple {}
How do I convert the above dataframe to:
Stock Name date dateFormatted sharesMln shares
x Tesla 2020 2020-12-31 3856.2405 3856240500
x Tesla 2019 2019-12-31 3856.2405 3856240500
y Google 2020 2020-12-31 2526.4506 2526450600
y Google 2019 2019-12-31 2526.4506 2526450600
y Google 2018 2018-12-31 2578.0992 2578099200
z Big Apple None None None None
I've tried using pd.json_normalize(dataframe['Annual'], max_level=1) but I'm struggling to get the desired result shown above.
Any pointers will be appreciated.
Get the values from the dicts and turn each element into its own row with explode (the original index is duplicated). Then expand each nested dict (the values of the outer dict) into columns with apply(pd.Series). Finally, join the new dataframe back to the original.
>>> df
stock Name Annual
0 x Tesla {'0': {'date': '2020', 'dateFormatted': '2020-...
1 y Google {'0': {'date': '2020', 'dateFormatted': '2020-...
2 z Big Apple {}
data = df['Annual'].apply(lambda x: x.values()) \
                   .explode() \
                   .apply(pd.Series)
df = df.join(data).drop(columns='Annual')
Output:
>>> df
stock Name date dateFormatted sharesMln shares
0 x Tesla 2020 2020-12-31 3856.2405 3.856240e+09
0 x Tesla 2019 2019-12-31 3856.2405 3.856240e+09
1 y Google 2020 2020-12-31 2526.4506 2.526451e+09
1 y Google 2019 2019-12-31 2526.4506 2.526451e+09
1 y Google 2018 2018-12-31 2578.0992 2.578099e+09
2 z Big Apple NaN NaN NaN NaN
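If the duplicated row labels and the scientific-notation shares bother you, a small follow-up sketch (assuming the df produced above):

df = df.reset_index(drop=True)               # unique row labels again
df['shares'] = df['shares'].astype('Int64')  # nullable int keeps the NaN row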

How can I merge/join multiple columns from two dataframes, depending on a matching pattern

I would like to merge two dataframes based on matching values in the chromosome column. I made various attempts in R and BASH, e.g. with data.table, tidyverse, and merge(). Could someone help by providing alternative solutions in R, BASH, Python, Perl, etc. for solving this problem? I would like to merge on the chromosome information and retain both the counts and RXNs.
NOTE: These two DFs are not aligned and I am also curious what happens if some values are missing.
Thanks and Cheers:
DF1:
Chromosome;RXN;ID
1009250;q9hxn4;NA
1010820;p16256;NA
31783;p16588;"PNTOt4;PNTOt4pp"
203;3-DEHYDROQUINATE-DEHYDRATASE-RXN;"DHQTi;DQDH"
DF2:
Chromosome;Count1;Count2;Count3;Count4;Count5
203;1;31;1;0;0;0
1010820;152;7;0;11;4
1009250;5;0;0;17;0
31783;1;0;0;0;0;0
Expected Result:
Chromosome;RXN;Count1;Count2;Count3;Count4;Count5
1009250;q9hxn4;5;0;0;17;0
1010820;p16256;152;7;0;11;4
31783;p16588;1;0;0;0;0
203;3-DEHYDROQUINATE-DEHYDRATASE-RXN;1;31;1;0;0;0
As bash was mentioned in the text body, I offer you an awk solution. The dataframes are in files df1 and df2:
$ awk '
BEGIN {
    FS = OFS = ";"          # input and output field delimiters
}
NR == FNR {                 # process df1
    a[$1] = $2              # hash into an array: 1st field is the key, 2nd the value
    next                    # move on to the next record
}
{                           # process df2
    $2 = (a[$1] OFS $2)     # prepend the RXN field to the 2nd field of df2
}1' df1 df2                 # 1 is the output command; mind the file order
The last two lines could perhaps be written more clearly:
...
{
    print $1, a[$1], $2, $3, $4, $5, $6
}' df1 df2
Output:
Chromosome;RXN;Count1;Count2;Count3;Count4;Count5
203;3-DEHYDROQUINATE-DEHYDRATASE-RXN;1;31;1;0;0;0
1010820;p16256;152;7;0;11;4
1009250;q9hxn4;5;0;0;17;0
31783;p16588;1;0;0;0;0;0
Output follows the order of df2. Chromosomes present in df1 but not in df2 will not be included; chromosomes in df2 but not in df1 will be output with an empty RXN field. Also, if there are duplicate chromosomes in df1, the last one wins. This can be fixed if it is an issue.
If I understand your request correctly, this should do it in Python. I've made the Chromosome column into the index of each DataFrame.
import pandas as pd
from io import StringIO

txt1 = '''Chromosome;RXN;ID
1009250;q9hxn4;NA
1010820;p16256;NA
31783;p16588;"PNTOt4;PNTOt4pp"
203;3-DEHYDROQUINATE-DEHYDRATASE-RXN;"DHQTi;DQDH"'''
txt2 = """Chromosome;Count1;Count2;Count3;Count4;Count5;Count6
203;1;31;1;0;0;0
1010820;152;7;0;11;4
1009250;5;0;0;17;0
31783;1;0;0;0;0;0"""

df1 = pd.read_csv(StringIO(txt1), sep=';', index_col=0, header=0)
df2 = pd.read_csv(StringIO(txt2), sep=';', index_col=0, header=0)
DF1:
RXN ID
Chromosome
1009250 q9hxn4 NaN
1010820 p16256 NaN
31783 p16588 PNTOt4;PNTOt4pp
203 3-DEHYDROQUINATE-DEHYDRATASE-RXN DHQTi;DQDH
DF2:
Count1 Count2 Count3 Count4 Count5 Count6
Chromosome
203 1 31 1 0 0 0.0
1010820 152 7 0 11 4 NaN
1009250 5 0 0 17 0 NaN
31783 1 0 0 0 0 0.0
result = pd.concat([df1.sort_index(), df2.sort_index()], axis=1)
print(result)
RXN ID Count1 Count2 Count3 Count4 Count5 Count6
Chromosome
203 3-DEHYDROQUINATE-DEHYDRATASE-RXN DHQTi;DQDH 1 31 1 0 0 0.0
31783 p16588 PNTOt4;PNTOt4pp 1 0 0 0 0 0.0
1009250 q9hxn4 NaN 5 0 0 17 0 NaN
1010820 p16256 NaN 152 7 0 11 4 NaN
The concat command also handles mismatched indices by simply filling in NaN values for columns of e.g. df1 if df2 doesn't have the same index, and vice versa.
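To write the merged result back out in the question's semicolon-separated layout (dropping the ID column, which the expected output omits), something like:

result.drop(columns='ID').to_csv('merged.csv', sep=';')  # hypothetical output path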

Read_json populates with empty lists; how to remove those rows

I've got a Pandas dataframe created with pd.read_json(). When I read the data in, a few cells contain just an empty list or None, and I want to detect the rows with [] or None in certain columns. For example:
feat 1 feat 2 feat 3
0 [] [] 5
1 6 8 3
2 None 10 NaN
I want to remove rows 0 and 2 because they have None/NaN/empty lists. How can I do this with Pandas?
You can applymap the [] and None to NaN:
Note: replace works for the None but not for the []; this solution seems to be a little fragile (hence the use of the negation ~).
In [11]: df.applymap(lambda x: x == [] or x is None)
Out[11]:
feat 1 feat 2 feat 3
0 True True False
1 False False False
2 True False False
In [12]: df.where(~df.applymap(lambda x: x == [] or x is None))
Out[12]:
feat 1 feat 2 feat 3
0 NaN NaN 5
1 6 8 3
2 NaN 10 NaN
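The mask above only flags the offending cells; to actually drop rows 0 and 2, combine it with any(axis=1). A minimal sketch (in pandas >= 2.1, applymap is deprecated in favour of DataFrame.map):

# flag cells that are [], None, or NaN, then keep only fully clean rows
bad = df.applymap(lambda x: x == [] or x is None) | df.isna()
clean = df[~bad.any(axis=1)]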

Using JSON schema as column headers in dataframe

OK, as per a previous question (here), I've now managed to read a load of JSON data into R and get it into a data frame. Here's the code:
getCall <- GET("http://long-url.com",
               authenticate("myusername", "password"))
contJSON <- content(getCall)
contJSON <- sub("\n\r\n", "", contJSON)
df1 <- fromJSON(sprintf("[%s]", gsub("\n", ",", contJSON)), asText = TRUE)
df <- data.frame(matrix(unlist(df1), nrow = 31, byrow = T))
Which gets me a data frame that looks as follows:-
head(df[,1:8])
X1 X2 X3 X4 X5 X6 X7 X8
1 2013-05-01 33682 11838 8023 3815 84 177.000000 177.000000
2 2013-05-02 32622 11626 7945 3681 58 210.000000 210.000000
3 2013-05-03 28467 11102 7786 3316 56 186.000000 186.000000
4 2013-05-04 20884 9031 6670 2361 51 7.000000 7.000000
5 2013-05-05 20481 8782 6390 2392 58 1.000000 1.000000
6 2013-05-06 25175 10019 7082 2937 62 24.000000 24.000000
However, there are no column names in my data frame. When I search for "names" in my JSON object, R returns NULL, so that doesn't give me anything useful.
I am wondering if there is a simple way (one that might be repeatable in more general cases) to get the column header names from the JSON schema.
I'm aware there are similar questions elsewhere on the site, but this one did not appear to be covered.
EDIT: As per the comment, here is the structure of the contJSON object.
"{\"metricDate\":\"2013-05-01\",\"pageCountTotal\":\"33682\",\"landCountTotal\":\"11838\",\"newLandCountTotal\":\"8023\",\"returnLandCountTotal\":\"3815\",\"spiderCountTotal\":\"84\",\"goalCountTotal\":\"177.000000\",\"callGoalCountTotal\":\"177.000000\",\"callCountTotal\":\"237.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.50\",\"callConversionPerc\":\"74.68\"}\n{\"metricDate\":\"2013-05-02\",\"pageCountTotal\":\"32622\",\"landCountTotal\":\"11626\",\"newLandCountTotal\":\"7945\",\"returnLandCountTotal\":\"3681\",\"spiderCountTotal\":\"58\",\"goalCountTotal\":\"210.000000\",\"callGoalCountTotal\":\"210.000000\",\"callCountTotal\":\"297.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.81\",\"callConversionPerc\":\"70.71\"}\n{\"metricDate\":\"2013-05-03\",\"pageCountTotal\":\"28467\",\"landCountTotal\":\"11102\",\"newLandCountTotal\":\"7786\",\"returnLandCountTotal\":\"3316\",\"spiderCountTotal\":\"56\",\"goalCountTotal\":\"186.000000\",\"callGoalCountTotal\":\"186.000000\",\"callCountTotal\":\"261.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.68\",\"callConversionPerc\":\"71.26\"}\n{\"metricDate\":\"2013-05-04\",\"pageCountTotal\":\"20884\",\"landCountTotal\":\"9031\",\"newLandCountTotal\":\"6670\",\"returnLandCountTotal\":\"2361\",\"spiderCountTotal\":\"51\",\"goalCountTotal\":\"7.000000\",\"callGoalCountTotal\":\"7.000000\",\"callCountTotal\":\"44.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.08\",\"callConversionPerc\":\"15.91\"}\n{\"metricDate\":\"2013-05-05\",\"pageCountTotal\":\"20481\",\"landCountTotal\":\"8782\",\"newLandCountTotal\":\"6390\",\"returnLandCountTotal\":\"2392\",\"spiderCountTotal\":\"58\",\"goalCountTotal\":\"1.000000\",\"callGoalCountTotal\":\"1.000000\",\"callCountTotal\":\"8.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.01\",\"callConversionPerc\":\"12.50\"}\n{\"metricDate\":\"2013-05-06\",\"pageCountTotal\":\"25175\",\"landCountTotal\":\"10019\",\"newLandCountTotal\":\"7082\",\"returnLandCountTotal\":\"2937\",\"spiderCountTotal\":\"62\",\"goalCountTotal\":\"24.000000\",\"callGoalCountTotal\":\"24.000000\",\"callCountTotal\":\"47.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.24\",\"callConversionPerc\":\"51.06\"}\n{\"metricDate\":\"2013-05-07\",\"pageCountTotal\":\"35892\",\"landCountTotal\":\"12615\",\"newLandCountTotal\":\"8391\",\"returnLandCountTotal\":\"4224\",\"spiderCountTotal\":\"62\",\"goalCountTotal\":\"239.000000\",\"callGoalCountTotal\":\"239.000000\",\"callCountTotal\":\"321.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.89\",\"callConversionPerc\":\"74.45\"}\n{\"metricDate\":\"2013-05-08\",\"pageCountTotal\":\"34106\",\"landCountTotal\":\"12391\",\"newLandCountTotal\":\"8389\",\"returnLandCountTotal\":\"4002\",\"spiderCountTotal\":\"90\",\"goalCountTotal\":\"221.000000\",\"callGoalCountTotal\":\"221.000000\",\"callCountTotal\":\"295.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.78\",\"callConversionPerc\":\"74.92\"}\n{\"metricDate\":\"2013-05-09\",\"pageCountTotal\":\"32721\",\"landCountTotal\":\"12447\",\"newLandCountTotal\":\"8541\",\"returnLandCountTotal\":\"3906\",\"spiderCountTotal\":\"54\",\"goalCountTotal\":\"207.000000\",\"callGoalCountTotal\":\"207.000000\",\"callCountTotal\":\"280.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.66\",\"callConversionPerc\":\"73.93\"}\n{\"metricDate\":\"2013-05-10\",\"pageCountTotal\":\"29724\",\"landCou
ntTotal\":\"11616\",\"newLandCountTotal\":\"8063\",\"returnLandCountTotal\":\"3553\",\"spiderCountTotal\":\"139\",\"goalCountTotal\":\"207.000000\",\"callGoalCountTotal\":\"207.000000\",\"callCountTotal\":\"301.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.78\",\"callConversionPerc\":\"68.77\"}\n{\"metricDate\":\"2013-05-11\",\"pageCountTotal\":\"22061\",\"landCountTotal\":\"9660\",\"newLandCountTotal\":\"6971\",\"returnLandCountTotal\":\"2689\",\"spiderCountTotal\":\"52\",\"goalCountTotal\":\"3.000000\",\"callGoalCountTotal\":\"3.000000\",\"callCountTotal\":\"40.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.03\",\"callConversionPerc\":\"7.50\"}\n{\"metricDate\":\"2013-05-12\",\"pageCountTotal\":\"23341\",\"landCountTotal\":\"9935\",\"newLandCountTotal\":\"6960\",\"returnLandCountTotal\":\"2975\",\"spiderCountTotal\":\"45\",\"goalCountTotal\":\"0.000000\",\"callGoalCountTotal\":\"0.000000\",\"callCountTotal\":\"12.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.00\",\"callConversionPerc\":\"0.00\"}\n{\"metricDate\":\"2013-05-13\",\"pageCountTotal\":\"36565\",\"landCountTotal\":\"13583\",\"newLandCountTotal\":\"9277\",\"returnLandCountTotal\":\"4306\",\"spiderCountTotal\":\"69\",\"goalCountTotal\":\"246.000000\",\"callGoalCountTotal\":\"246.000000\",\"callCountTotal\":\"324.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.81\",\"callConversionPerc\":\"75.93\"}\n{\"metricDate\":\"2013-05-14\",\"pageCountTotal\":\"35260\",\"landCountTotal\":\"13797\",\"newLandCountTotal\":\"9375\",\"returnLandCountTotal\":\"4422\",\"spiderCountTotal\":\"59\",\"goalCountTotal\":\"212.000000\",\"callGoalCountTotal\":\"212.000000\",\"callCountTotal\":\"283.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.54\",\"callConversionPerc\":\"74.91\"}\n{\"metricDate\":\"2013-05-15\",\"pageCountTotal\":\"35836\",\"landCountTotal\":\"13792\",\"newLandCountTotal\":\"9532\",\"returnLandCountTotal\":\"4260\",\"spiderCountTotal\":\"94\",\"goalCountTotal\":\"187.000000\",\"callGoalCountTotal\":\"187.000000\",\"callCountTotal\":\"258.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.36\",\"callConversionPerc\":\"72.48\"}\n{\"metricDate\":\"2013-05-16\",\"pageCountTotal\":\"33136\",\"landCountTotal\":\"12821\",\"newLandCountTotal\":\"8755\",\"returnLandCountTotal\":\"4066\",\"spiderCountTotal\":\"65\",\"goalCountTotal\":\"192.000000\",\"callGoalCountTotal\":\"192.000000\",\"callCountTotal\":\"260.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.50\",\"callConversionPerc\":\"73.85\"}\n{\"metricDate\":\"2013-05-17\",\"pageCountTotal\":\"29564\",\"landCountTotal\":\"11721\",\"newLandCountTotal\":\"8191\",\"returnLandCountTotal\":\"3530\",\"spiderCountTotal\":\"213\",\"goalCountTotal\":\"166.000000\",\"callGoalCountTotal\":\"166.000000\",\"callCountTotal\":\"222.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.42\",\"callConversionPerc\":\"74.77\"}\n{\"metricDate\":\"2013-05-18\",\"pageCountTotal\":\"23686\",\"landCountTotal\":\"9916\",\"newLandCountTotal\":\"7335\",\"returnLandCountTotal\":\"2581\",\"spiderCountTotal\":\"56\",\"goalCountTotal\":\"5.000000\",\"callGoalCountTotal\":\"5.000000\",\"callCountTotal\":\"34.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.05\",\"callConversionPerc\":\"14.71\"}\n{\"metricDate\":\"2013-05-19\",\"pageCountTotal\":\"23528\",\"landCountTotal\":\"9952\",\"newLandCountTotal\":\"7184\",\"returnLandCountTotal
\":\"2768\",\"spiderCountTotal\":\"57\",\"goalCountTotal\":\"1.000000\",\"callGoalCountTotal\":\"1.000000\",\"callCountTotal\":\"14.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.01\",\"callConversionPerc\":\"7.14\"}\n{\"metricDate\":\"2013-05-20\",\"pageCountTotal\":\"37391\",\"landCountTotal\":\"13488\",\"newLandCountTotal\":\"9024\",\"returnLandCountTotal\":\"4464\",\"spiderCountTotal\":\"69\",\"goalCountTotal\":\"227.000000\",\"callGoalCountTotal\":\"227.000000\",\"callCountTotal\":\"291.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.68\",\"callConversionPerc\":\"78.01\"}\n{\"metricDate\":\"2013-05-21\",\"pageCountTotal\":\"36299\",\"landCountTotal\":\"13174\",\"newLandCountTotal\":\"8817\",\"returnLandCountTotal\":\"4357\",\"spiderCountTotal\":\"77\",\"goalCountTotal\":\"164.000000\",\"callGoalCountTotal\":\"164.000000\",\"callCountTotal\":\"221.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.24\",\"callConversionPerc\":\"74.21\"}\n{\"metricDate\":\"2013-05-22\",\"pageCountTotal\":\"34201\",\"landCountTotal\":\"12433\",\"newLandCountTotal\":\"8388\",\"returnLandCountTotal\":\"4045\",\"spiderCountTotal\":\"76\",\"goalCountTotal\":\"195.000000\",\"callGoalCountTotal\":\"195.000000\",\"callCountTotal\":\"262.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.57\",\"callConversionPerc\":\"74.43\"}\n{\"metricDate\":\"2013-05-23\",\"pageCountTotal\":\"32951\",\"landCountTotal\":\"11611\",\"newLandCountTotal\":\"7757\",\"returnLandCountTotal\":\"3854\",\"spiderCountTotal\":\"68\",\"goalCountTotal\":\"167.000000\",\"callGoalCountTotal\":\"167.000000\",\"callCountTotal\":\"231.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.44\",\"callConversionPerc\":\"72.29\"}\n{\"metricDate\":\"2013-05-24\",\"pageCountTotal\":\"28967\",\"landCountTotal\":\"10821\",\"newLandCountTotal\":\"7396\",\"returnLandCountTotal\":\"3425\",\"spiderCountTotal\":\"106\",\"goalCountTotal\":\"167.000000\",\"callGoalCountTotal\":\"167.000000\",\"callCountTotal\":\"203.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.54\",\"callConversionPerc\":\"82.27\"}\n{\"metricDate\":\"2013-05-25\",\"pageCountTotal\":\"19741\",\"landCountTotal\":\"8393\",\"newLandCountTotal\":\"6168\",\"returnLandCountTotal\":\"2225\",\"spiderCountTotal\":\"78\",\"goalCountTotal\":\"0.000000\",\"callGoalCountTotal\":\"0.000000\",\"callCountTotal\":\"28.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.00\",\"callConversionPerc\":\"0.00\"}\n{\"metricDate\":\"2013-05-26\",\"pageCountTotal\":\"19770\",\"landCountTotal\":\"8237\",\"newLandCountTotal\":\"6009\",\"returnLandCountTotal\":\"2228\",\"spiderCountTotal\":\"79\",\"goalCountTotal\":\"0.000000\",\"callGoalCountTotal\":\"0.000000\",\"callCountTotal\":\"8.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.00\",\"callConversionPerc\":\"0.00\"}\n{\"metricDate\":\"2013-05-27\",\"pageCountTotal\":\"26208\",\"landCountTotal\":\"9755\",\"newLandCountTotal\":\"6779\",\"returnLandCountTotal\":\"2976\",\"spiderCountTotal\":\"82\",\"goalCountTotal\":\"26.000000\",\"callGoalCountTotal\":\"26.000000\",\"callCountTotal\":\"40.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"0.27\",\"callConversionPerc\":\"65.00\"}\n{\"metricDate\":\"2013-05-28\",\"pageCountTotal\":\"36980\",\"landCountTotal\":\"12463\",\"newLandCountTotal\":\"8226\",\"returnLandCountTotal\":\"4237\",\"spiderCountTotal\":\"132\",\"goalCountTotal\":\"208.000000\",\"c
allGoalCountTotal\":\"208.000000\",\"callCountTotal\":\"276.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.67\",\"callConversionPerc\":\"75.36\"}\n{\"metricDate\":\"2013-05-29\",\"pageCountTotal\":\"34190\",\"landCountTotal\":\"12014\",\"newLandCountTotal\":\"8279\",\"returnLandCountTotal\":\"3735\",\"spiderCountTotal\":\"90\",\"goalCountTotal\":\"179.000000\",\"callGoalCountTotal\":\"179.000000\",\"callCountTotal\":\"235.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.49\",\"callConversionPerc\":\"76.17\"}\n{\"metricDate\":\"2013-05-30\",\"pageCountTotal\":\"33867\",\"landCountTotal\":\"11965\",\"newLandCountTotal\":\"8231\",\"returnLandCountTotal\":\"3734\",\"spiderCountTotal\":\"63\",\"goalCountTotal\":\"160.000000\",\"callGoalCountTotal\":\"160.000000\",\"callCountTotal\":\"219.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.34\",\"callConversionPerc\":\"73.06\"}\n{\"metricDate\":\"2013-05-31\",\"pageCountTotal\":\"27536\",\"landCountTotal\":\"10302\",\"newLandCountTotal\":\"7333\",\"returnLandCountTotal\":\"2969\",\"spiderCountTotal\":\"108\",\"goalCountTotal\":\"173.000000\",\"callGoalCountTotal\":\"173.000000\",\"callCountTotal\":\"226.000000\",\"onlineGoalCountTotal\":\"0.000000\",\"conversionPerc\":\"1.68\",\"callConversionPerc\":\"76.55\"}\n\r\n"
One approach that works is to split on newlines, call fromJSON on each row, then recombine the results:
contJSON <- sub("\n\r\n", "", contJSON) #as before
rowJSON <- strsplit(contJSON, "\n")[[1]]
row <- lapply(rowJSON, fromJSON)
as.data.frame(do.call(rbind, row))
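As an aside, this payload is newline-delimited JSON, which pandas can read directly in Python, picking up the column names from the keys; a hedged sketch:

import pandas as pd
from io import StringIO

# contJSON is the newline-delimited JSON string shown above
df = pd.read_json(StringIO(contJSON), lines=True)  # keys become column names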

Matlab's nanmean( ) function not working with dimensions other than 1

Take this example from the mathworks help of nanmean():
X = magic(3);
X([1 6:9]) = repmat(NaN,1,5)
X =
NaN 1 NaN
3 5 NaN
4 NaN NaN
>> y = nanmean(X,2)
??? Error using ==> nanmean
Too many input arguments.
Why does it show an error when the docs say the mean can be taken along any dimension dim of X with y = nanmean(X,dim)? Thanks.
I ran exactly the code you have and I get no error. In particular, here is what I ran:
>> X = magic(3);
X([1 6:9]) = repmat(NaN,1,5)
X =
NaN 1 NaN
3 5 NaN
4 NaN NaN
>> y = nanmean(X,2)
y =
1
4
4
>> which nanmean
C:\Program Files\MATLAB\R2010b\toolbox\stats\stats\nanmean.m
The only thing I can think of is that you have a different version of nanmean.m on your path. Try which nanmean and see if it points into the stats toolbox.
Here is the reason:
If X contains a vector of all NaN values along some dimension, the vector is empty once the NaN values are removed, so the sum of the remaining elements is 0. Since the mean involves division by 0, its value is NaN. The output NaN is not a mean of NaN values.
Look at:
http://www.mathworks.com/help/toolbox/stats/nanmean.html
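For readers working in Python, the same all-NaN-slice behavior can be demonstrated with NumPy's nanmean; a sketch mirroring the example above:

import numpy as np

X = np.array([[np.nan, 1, np.nan],
              [3, 5, np.nan],
              [4, np.nan, np.nan]])
print(np.nanmean(X, axis=1))  # [1. 4. 4.], NaNs ignored per row
col = np.nanmean(X, axis=0)   # third column is all NaN -> NaN (with a RuntimeWarning)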