Error in eval(expr, envir, enclos) while using Predict function - regression

When I try to run predict() on the dataset, it keeps giving me error -
Error in eval(expr, envir, enclos) : object 'LoanRange' not found
Here is the part of dataset -
LoanRange Loan.Type N WAFICO WALTV WAOrigRev WAPTValue
1 0-99999 Conventional 109 722.5216 63.55385 6068.239 0.6031879
2 0-99999 FHA 30 696.6348 80.00100 7129.650 0.5623650
3 0-99999 VA 13 698.6986 74.40525 7838.894 0.4892977
4 100000-149999 Conventional 860 731.2333 68.25817 6438.330 0.5962638
5 100000-149999 FHA 285 673.2256 82.42225 8145.068 0.5211495
6 100000-149999 VA 125 704.1686 87.71306 8911.461 0.5020074
7 150000-199999 Conventional 1291 738.7164 70.08944 8125.979 0.6045117
8 150000-199999 FHA 403 672.0891 84.65318 10112.192 0.5199632
9 150000-199999 VA 195 694.1885 90.77495 10909.393 0.5250807
10 200000-249999 Conventional 1162 740.8614 70.65027 8832.563 0.6111419
11 200000-249999 FHA 348 667.6291 85.13457 11013.856 0.5374226
12 200000-249999 VA 221 702.9796 91.76759 11753.642 0.5078298
13 250000-299999 Conventional 948 742.0405 72.22742 9903.160 0.6106858
Following is the code used for predicting count data N after determining the overdispersion-
model2=glm(N~Loan.Type+WAFICO+WALTV+WAOrigRev+WAPTValue, family=quasipoisson(link = "log"), data = DF)
summary(model2)
This is what I have done to create a sequence of count and use predict function-
countaxis <- seq (0,1500,150)
Y <- predict(model2, list(N=countaxis, type = "response")
At this step, I get the error -
Error in eval(expr, envir, enclos) : object 'LoanRange' not found
Can someone please point me where is the problem here.

Think about what exactly you are trying to predict. You are providing the predict function values of N (via countaxis), but in fact the way you set up your model, N is your response variable and the remaining variables are the predictors. That's why R is asking for LoanRange. It actually needs values for LoanRange, Loan.Type, ..., WAPTValue in order to predict N. So you need to feed predict inputs that let the model try to predict N.
For example, you could do something like this:
# create some fake data to predict N
newdata1 = data.frame(rbind(c("0-99999", "Conventional", 722.5216, 63.55385, 6068.239, 0.6031879),
c("150000-199999", "VA", 12.5216, 3.55385, 60.239, 0.0031879)))
colnames(newdata1) = c("LoanRange" ,"Loan.Type", "WAFICO" ,"WALTV" , "WAOrigRev" ,"WAPTValue")
# ensure that numeric variables are indeed numeric and not factors
newdata1$WAFICO = as.numeric(as.character(newdata1$WAFICO))
newdata1$WALTV = as.numeric(as.character(newdata1$WALTV))
newdata1$WAPTValue = as.numeric(as.character(newdata1$WAPTValue))
newdata1$WAOrigRev = as.numeric(as.character(newdata1$WAOrigRev))
# make predictions - this will output values of N
predict(model2, newdata = newdata1, type = "response")

Related

The number of MiniBatchSize always 1 even when i change it in minibatchqueue parameters to 32? in MATLAB

I follow the example in Matlab for CNN with Multiple Outputs
Train Network with Multiple Outputs
https://www.mathworks.com/help/deeplearning/ug/train-network-with-multiple-outputs.html
But with a colour image as input data with two output as below:
Name Size Bytes Class Attributes
XTrain 128x128x3x2016 792723456 double
Name Size Bytes Class Attributes
yTrain1 1x2016 2230 categorical
Name Size Bytes Class Attributes
yTrain2 1x2016 16128 double
I define minibatchqueue object as below:
mbq = minibatchqueue(dsTrain,...
'MiniBatchSize',32,...
'MiniBatchFcn', #preprocessData,...
'MiniBatchFormat',{'SSCB','',''});
But the number of Batch size in output is always 1 as
mbq =
minibatchqueue with 3 outputs and properties:
Mini-batch creation:
MiniBatchSize: 1
PartialMiniBatch: 'return'
MiniBatchFcn: #preprocessDataJwan
DispatchInBackground: 0
Outputs:
OutputCast: {'single' 'single' 'single'}
OutputAsDlarray: [1 1 1]
MiniBatchFormat: {'SSCB' '' ''}
OutputEnvironment: {'auto' 'auto' 'auto'}
even I check the data size
[dlX,dlY1,dlY2] = next(mbq);
size(dlX)
size(dlY1)
size(dlY2)
ans =
128 128 3 2016
ans =
2 2016
ans =
1 2016
please, can anyone help me with that?

How to use HuggingFace nlp library's GLUE for CoLA

I've been trying to use the HuggingFace nlp library's GLUE metric to check whether a given sentence is a grammatical English sentence. But I'm getting an error and is stuck without being able to proceed.
What I've tried so far;
reference and prediction are 2 text sentences
!pip install transformers
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
reference="Security has been beefed across the country as a 2 day nation wide curfew came into effect."
prediction="Security has been tightened across the country as a 2-day nationwide curfew came into effect."
import nlp
glue_metric = nlp.load_metric('glue',name="cola")
#Using BertTokenizer
encoded_reference=tokenizer.encode(reference, add_special_tokens=False)
encoded_prediction=tokenizer.encode(prediction, add_special_tokens=False)
glue_score = glue_metric.compute(encoded_prediction, encoded_reference)
Error I'm getting;
ValueError Traceback (most recent call last)
<ipython-input-9-4c3a3ce7b583> in <module>()
----> 1 glue_score = glue_metric.compute(encoded_prediction, encoded_reference)
6 frames
/usr/local/lib/python3.6/dist-packages/nlp/metric.py in compute(self, predictions, references, timeout, **metrics_kwargs)
198 predictions = self.data["predictions"]
199 references = self.data["references"]
--> 200 output = self._compute(predictions=predictions, references=references, **metrics_kwargs)
201 return output
202
/usr/local/lib/python3.6/dist-packages/nlp/metrics/glue/27b1bc63e520833054bd0d7a8d0bc7f6aab84cc9eed1b576e98c806f9466d302/glue.py in _compute(self, predictions, references)
101 return pearson_and_spearman(predictions, references)
102 elif self.config_name in ["mrpc", "qqp"]:
--> 103 return acc_and_f1(predictions, references)
104 elif self.config_name in ["sst2", "mnli", "mnli_mismatched", "mnli_matched", "qnli", "rte", "wnli", "hans"]:
105 return {"accuracy": simple_accuracy(predictions, references)}
/usr/local/lib/python3.6/dist-packages/nlp/metrics/glue/27b1bc63e520833054bd0d7a8d0bc7f6aab84cc9eed1b576e98c806f9466d302/glue.py in acc_and_f1(preds, labels)
60 def acc_and_f1(preds, labels):
61 acc = simple_accuracy(preds, labels)
---> 62 f1 = f1_score(y_true=labels, y_pred=preds)
63 return {
64 "accuracy": acc,
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in f1_score(y_true, y_pred, labels, pos_label, average, sample_weight, zero_division)
1097 pos_label=pos_label, average=average,
1098 sample_weight=sample_weight,
-> 1099 zero_division=zero_division)
1100
1101
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in fbeta_score(y_true, y_pred, beta, labels, pos_label, average, sample_weight, zero_division)
1224 warn_for=('f-score',),
1225 sample_weight=sample_weight,
-> 1226 zero_division=zero_division)
1227 return f
1228
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in precision_recall_fscore_support(y_true, y_pred, beta, labels, pos_label, average, warn_for, sample_weight, zero_division)
1482 raise ValueError("beta should be >=0 in the F-beta score")
1483 labels = _check_set_wise_labels(y_true, y_pred, average, labels,
-> 1484 pos_label)
1485
1486 # Calculate tp_sum, pred_sum, true_sum ###
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in _check_set_wise_labels(y_true, y_pred, average, labels, pos_label)
1314 raise ValueError("Target is %s but average='binary'. Please "
1315 "choose another average setting, one of %r."
-> 1316 % (y_type, average_options))
1317 elif pos_label not in (None, 1):
1318 warnings.warn("Note that pos_label (set to %r) is ignored when "
ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].
However, I'm able to get results (pearson and spearmanr) for 'stsb' with the same workaround as given above.
Some help and a workaround for(cola) this is really appreciated. Thank you.
In general, if you are seeing this error with HuggingFace, you are trying to use the f-score as a metric on a text classification problem with more than 2 classes. Pick a different metric, like "accuracy".
For this specific question:
Despite what you entered, it is trying to compute the f-score. From the example notebook, you should set the metric name as:
metric_name = "pearson" if task == "stsb" else "matthews_correlation" if task == "cola" else "accuracy"

what type of model can i use to train this data

I have downloaded and labeled data from
http://archive.ics.uci.edu/ml/datasets/pamap2+physical+activity+monitoring
my task is to gain an insight into the data from what is given, I have round 34 attributes in a data frame(all clean no nan values)
and want to train a model based on one target attribute 'heart_rate' given the rest of the attributes(all are numbers of a participant performing various activities )
I wanted to use Linear regression model but can not use my dataframe for some reason, however, I do not mind starting from 0 if you think I am doing it wrong
my DataFrame columns:
> Index(['timestamp', 'activity_ID', 'heart_rate', 'IMU_hand_temp',
> 'hand_acceleration_16_1', 'hand_acceleration_16_2',
> 'hand_acceleration_16_3', 'hand_gyroscope_rad_7',
> 'hand_gyroscope_rad_8', 'hand_gyroscope_rad_9',
> 'hand_magnetometer_μT_10', 'hand_magnetometer_μT_11',
> 'hand_magnetometer_μT_12', 'IMU_chest_temp', 'chest_acceleration_16_1',
> 'chest_acceleration_16_2', 'chest_acceleration_16_3',
> 'chest_gyroscope_rad_7', 'chest_gyroscope_rad_8',
> 'chest_gyroscope_rad_9', 'chest_magnetometer_μT_10',
> 'chest_magnetometer_μT_11', 'chest_magnetometer_μT_12',
> 'IMU_ankle_temp', 'ankle_acceleration_16_1', 'ankle_acceleration_16_2',
> 'ankle_acceleration_16_3', 'ankle_gyroscope_rad_7',
> 'ankle_gyroscope_rad_8', 'ankle_gyroscope_rad_9',
> 'ankle_magnetometer_μT_10', 'ankle_magnetometer_μT_11',
> 'ankle_magnetometer_μT_12', 'Intensity'],
> dtype='object')
first 5 rows:
timestamp activity_ID heart_rate IMU_hand_temp hand_acceleration_16_1 hand_acceleration_16_2 hand_acceleration_16_3 hand_gyroscope_rad_7 hand_gyroscope_rad_8 hand_gyroscope_rad_9 ... ankle_acceleration_16_1 ankle_acceleration_16_2 ankle_acceleration_16_3 ankle_gyroscope_rad_7 ankle_gyroscope_rad_8 ankle_gyroscope_rad_9 ankle_magnetometer_μT_10 ankle_magnetometer_μT_11 ankle_magnetometer_μT_12 Intensity
2928 37.66 lying 100.0 30.375 2.21530 8.27915 5.58753 -0.004750 0.037579 -0.011145 ... 9.73855 -1.84761 0.095156 0.002908 -0.027714 0.001752 -61.1081 -36.8636 -58.3696 low
2929 37.67 lying 100.0 30.375 2.29196 7.67288 5.74467 -0.171710 0.025479 -0.009538 ... 9.69762 -1.88438 -0.020804 0.020882 0.000945 0.006007 -60.8916 -36.3197 -58.3656 low
2930 37.68 lying 100.0 30.375 2.29090 7.14240 5.82342 -0.238241 0.011214 0.000831 ... 9.69633 -1.92203 -0.059173 -0.035392 -0.052422 -0.004882 -60.3407 -35.7842 -58.6119 low
2931 37.69 lying 100.0 30.375 2.21800 7.14365 5.89930 -0.192912 0.019053 0.013374 ... 9.66370 -1.84714 0.094385 -0.032514 -0.018844 0.026950 -60.7646 -37.1028 -57.8799 low
2932 37.70 lying 100.0 30.375 2.30106 7.25857 6.09259 -0.069961 -0.018328 0.004582 ... 9.77578 -1.88582 0.095775 0.001351 -0.048878 -0.006328 -60.2040 -37.1225 -57.8847 low
if you check the timestamp attribute you will see that the data acquired is in milliseconds so it might be a good idea to use the data from this dataframe as in every 2-5 seconds and train the model
also as an option, I want to use as one of these models for this task Linear,polynomial, multiple linear, agglomerative clustering and kmeans clustering.
my code:
target = subject1.DataFrame(data.target, columns=["heart_rate"])
X = df
y = target[“heart_rate”]
lm = linear_model.LinearRegression()
model = lm.fit(X,y)
predictions = lm.predict(X)
print(predictions)[0:5]
Error:
AttributeError Traceback (most recent call last)
<ipython-input-93-b0c3faad3a98> in <module>()
3 #heart_rate
4 # Put the target (housing value -- MEDV) in another DataFrame
----> 5 target = subject1.DataFrame(data.target, columns=["heart_rate"])
c:\python36\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5177 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5178 return self[name]
-> 5179 return object.__getattribute__(self, name)
5180
5181 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'DataFrame'
for fixing the error I have used:
subject1.columns = subject1.columns.str.strip()
but still did not work
Thank you, sorry if I was not precise enough.
Try this:
X = df.drop("heart_rate", axis=1)
y = df[[“heart_rate”]]
X=X.apply(zscore)
test_size=0.30
seed=7
X_train, X_test, y_train, y_test=train_test_split(X, y, test_size=test_size, random_state=seed)
lm = linear_model.LinearRegression()
model = lm.fit(X,y)
predictions = lm.predict(X)
print(predictions)[0:5]

Dimension problem when converting a MATLAB .m script into an Octave compatible syntax

I want to run a MATLAB script M-file to reconstruct a point cloud in Octave. Therefore I had to rewrite some parts of the code to make it compatible with Octave. Actually the M-file works fine in Octave (I don't get any errors) and also the plotted point cloud looks good at first glance, but it seems that the variables are only half the size of the original MATLAB variables. In the attached screenshots you can see what I mean.
Octave:
MATLAB:
You can see that the dimension of e.g. M in Octave is 1311114x3 but in MATLAB it is 2622227x3. The actual number of rows in my raw file is 2622227 as well.
Here you can see an extract of the raw file (original data) that I use.
Rotation angle Measured distance
-0,090 26,295
-0,342 26,294
-0,594 26,294
-0,846 26,295
-1,098 26,294
-1,368 26,296
-1,620 26,296
-1,872 26,296
In MATLAB I created my output variable as follows.
data = table;
data.Rotationangle = cell2mat(raw(:, 1));
data.Measureddistance = cell2mat(raw(:, 2));
As there is no table function in Octave I wrote
data = cellfun(#(x)str2num(x), strrep(raw, ',', '.'))
instead.
Octave also has no struct2array function, so I had to replace it as well.
In MATLAB I wrote.
data = table2array(data);
In Octave this was a bit more difficult to do. I had to create a struct2array function, which I did by means of this bug report.
%% Create a struct2array function
function retval = struct2array (input_struct)
%input check
if (~isstruct (input_struct) || (nargin ~= 1))
print_usage;
endif
%convert to cell array and flatten/concatenate output.
retval = [ (struct2cell (input_struct)){:}];
endfunction
clear b;
b.a = data;
data = struct2array(b);
Did I make a mistake somewhere and could someone help me to solve this problem?
edit:
Here's the part of my script where I'm using raw.
delimiter = '\t';
startRow = 5;
formatSpec = '%s%s%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'HeaderLines' ,startRow-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
fclose(fileID);
%% Convert the contents of columns containing numeric text to numbers.
% Replace non-numeric text with NaN.
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
raw(1:length(dataArray{col}),col) = mat2cell(dataArray{col}, ones(length(dataArray{col}), 1));
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
for col=[1,2]
% Converts text in the input cell array to numbers. Replaced non-numeric
% text with NaN.
rawData = dataArray{col};
for row=1:size(rawData, 1)
% Create a regular expression to detect and remove non-numeric prefixes and
% suffixes.
regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\.]*)+[\,]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\.]*)*[\,]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
try
result = regexp(rawData(row), regexstr, 'names');
numbers = result.numbers;
% Detected commas in non-thousand locations.
invalidThousandsSeparator = false;
if numbers.contains('.')
thousandsRegExp = '^\d+?(\.\d{3})*\,{0,1}\d*$';
if isempty(regexp(numbers, thousandsRegExp, 'once'))
numbers = NaN;
invalidThousandsSeparator = true;
end
end
% Convert numeric text to numbers.
if ~invalidThousandsSeparator
numbers = strrep(numbers, '.', '');
numbers = strrep(numbers, ',', '.');
numbers = textscan(char(numbers), '%f');
numericData(row, col) = numbers{1};
raw{row, col} = numbers{1};
end
catch
raw{row, col} = rawData{row};
end
end
end
You don't see any raw in my workspaces because I clear all temporary variables before I reconstruct my point cloud.
Also my original data in row 1311114 and 1311115 look normal.
edit 2:
As suggested here is a small example table to clarify what I want and what MATLAB does with the table2array function in my case.
data =
-0.0900 26.2950
-0.3420 26.2940
-0.5940 26.2940
-0.8460 26.2950
-1.0980 26.2940
-1.3680 26.2960
-1.6200 26.2960
-1.8720 26.2960
With the struct2array function I used in Octave I get the following array.
data =
-0.090000 26.295000
-0.594000 26.294000
-1.098000 26.294000
-1.620000 26.296000
-2.124000 26.295000
-2.646000 26.293000
-3.150000 26.294000
-3.654000 26.294000
If you compare the Octave array with my original data, you can see that every second row is skipped. This seems to be the reason for 1311114 instead of 2622227 rows.
edit 3:
I tried to solve my problem with the suggestions of #Tasos Papastylianou, which unfortunately was not successful.
First I did the variant with a struct.
data = struct();
data.Rotationangle = [raw(:,1)];
data.Measureddistance = [raw(:,2)];
data = cell2mat( struct2cell (data ).' )
But this leads to the following structure in my script. (Unfortunately the result is not what I would like to have as shown in edit 2. Don't be surprised, I only used a small part of my raw file to accelerate the run of my script, so here are only 769 lines.)
[766,1] = -357,966
[767,1] = -358,506
[768,1] = -359,010
[769,1] = -359,514
[1,2] = 26,295
[2,2] = 26,294
[3,2] = 26,294
[4,2] = 26,296
Furthermore I get the following error.
error: unary operator '-' not implemented for 'cell' operands
error: called from
Cloud_reconstruction at line 137 column 11
Also the approach with the dataframe octave package didn't work. When I run the following code it leads to the error you can see below.
dataframe2array = #(df) cell2mat( struct(df).x_data );
pkg load dataframe;
data = dataframe();
data.Rotationangle = [raw(:, 1)];
data.Measureddistance = [raw(:, 2)];
dataframe2array(data)
error:
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 147 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 106 column 20
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 176 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 106 column 20
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 147 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 176 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
error: RHS(_,2): but RHS has size 768x1
error: called from
df_matassign at line 179 column 11
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
Both error messages refer to the following part of my script where I'm doing the reconstruction of the point cloud in cylindrical coordinates.
distLaserCenter = 47; % Distance between the pipe centerline and the blind zone in mm
m = size(data,1); % Find the length of the first dimension of data
zincr = 0.4/360; % z increment in mm per deg
data(:,1) = -data(:,1);
for i = 1:m
data(i,2) = data(i,2) + distLaserCenter;
if i == 1
data(i,3) = 0;
elseif abs(data(i,1)-data(i-1)) < 100
data(i,3) = data(i-1,3) + zincr*(data(i,1)-data(i-1));
else abs(data(i,1)-data(i-1)) > 100;
data(i,3) = data(i-1,3) + zincr*(data(i,1)-(data(i-1)-360));
end
end
To give some background information for a better understanding. The script is used to reconstruct a pipe as a point cloud. The surface of the pipe was scanned from inside with a laser and the laser measured several points (distance from laser to the inner wall of the pipe) at each deg of rotation. I hope this helps to understand what I want to do with my script.
Not sure exactly what you're trying to do, but here's a toy example of how a struct could be used in an equivalent manner to a table:
matlab:
data = table;
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
table2array(data)
octave:
data = struct();
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
cell2mat( struct2cell (data ).' )
Note the transposition operation (.') before passing the result to cell2mat, since in a table, the 'fieldnames' are arranged horizontally in columns, whereas the struct2cell ends up arranging what used to be the 'fieldnames' as rows.
You might also be interested in the dataframe octave package, which performs similar functions to matlab's table (or in fact, R's dataframe object): https://octave.sourceforge.io/dataframe/ (you can install this by typing pkg install -forge dataframe in your console)
Unfortunately, the way to display the data as an array is still not ideal (see: https://stackoverflow.com/a/55417141/4183191), but you can easily convert that into a tiny function, e.g.
dataframe2array = #(df) cell2mat( struct(df).x_data );
Your code can then become:
pkg load dataframe;
data = dataframe();
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
dataframe2array(data)

Is it possible to write a table to a file in JSON format in R?

I'm making word frequency tables with R and the preferred output format would be a JSON file. sth like
{
"word" : "dog",
"frequency" : 12
}
Is there any way to save the table directly into this format? I've been using the write.csv() function and convert the output into JSON but this is very complicated and time consuming.
set.seed(1)
( tbl <- table(round(runif(100, 1, 5))) )
## 1 2 3 4 5
## 9 24 30 23 14
library(rjson)
sink("json.txt")
cat(toJSON(tbl))
sink()
file.show("json.txt")
## {"1":9,"2":24,"3":30,"4":23,"5":14}
or even better:
set.seed(1)
( tab <- table(letters[round(runif(100, 1, 26))]) )
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 2 4 3 2 5 4 3 5 3 9 4 7 2 2 2 5 5 5 6 5 3 7 3 2 1
sink("lets.txt")
cat(toJSON(tab))
sink()
file.show("lets.txt")
## {"a":1,"b":2,"c":4,"d":3,"e":2,"f":5,"g":4,"h":3,"i":5,"j":3,"k":9,"l":4,"m":7,"n":2,"o":2,"p":2,"q":5,"r":5,"s":5,"t":6,"u":5,"v":3,"w":7,"x":3,"y":2,"z":1}
Then validate it with http://www.jsonlint.com/ to get pretty formatting. If you have multidimensional table, you'll have to work it out a bit...
EDIT:
Oh, now I see, you want the dataset characteristics sink-ed to a JSON file. No problem, just give us a sample data, and I'll work on a code a bit. Practically, you need to carry out the data into desirable format, hence convert it to JSON. list should suffice. Give me a sec, I'll update my answer.
EDIT #2:
Well, time is relative... it's a common knowledge... Here you go:
( dtf <- structure(list(word = structure(1:3, .Label = c("cat", "dog",
"mouse"), class = "factor"), frequency = c(12, 32, 18)), .Names = c("word",
"frequency"), row.names = c(NA, -3L), class = "data.frame") )
## word frequency
## 1 cat 12
## 2 dog 32
## 3 mouse 18
If dtf is a simple data frame, yes, data.frame, if it's not, coerce it! Long story short, you can do:
toJSON(as.data.frame(t(dtf)))
## [1] "{\"V1\":{\"word\":\"cat\",\"frequency\":\"12\"},\"V2\":{\"word\":\"dog\",\"frequency\":\"32\"},\"V3\":{\"word\":\"mouse\",\"frequency\":\"18\"}}"
I though I'll need some melt with this one, but simple t did the trick. Now, you only need to deal with column names after transposing the data.frame. t coerces data.frames to matrix, so you need to convert it back to data.frame. I used as.data.frame, but you can also use toJSON(data.frame(t(dtf))) - you'll get X instead of V as a variable name. Alternatively, you can use regexp to clean the JSON file (if needed), but it's a lousy practice, try to work it out by preparing the data.frame.
I hope this helped a bit...
These days I would typically use the jsonlite package.
library("jsonlite")
toJSON(mydatatable, pretty = TRUE)
This turns the data table into a JSON array of key/value pair objects directly.
RJSONIO is a package "that allows conversion to and from data in Javascript object notation (JSON) format". You can use it to export your object as a JSON file.
library(RJSONIO)
writeLines(toJSON(anobject), "afile.JSON")