How to create a correlogram for a data set (200+ observations, 25+ columns) containing both categorical and numerical variables? - csv

I need a correlogram for a data set (CSV file) with both categorical and numerical variables.
I tried to draw it with the ggcorrplot package and got the error "x should be numeric".
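
One possible way around the "x should be numeric" error is to expand the categorical columns into 0/1 dummy columns before computing correlations. The sketch below is only an illustration under assumptions: the data frame and its column names (height, weight, group, smoker) are made up, and Pearson correlation on dummy columns is a rough measure of association, not the only reasonable choice.

```r
# Hypothetical mixed data set: two numeric and two categorical columns.
df <- data.frame(
  height = c(170, 165, 180, 175, 160, 185, 172, 168),
  weight = c(65, 59, 80, 77, 52, 90, 70, 63),
  group  = factor(c("a", "b", "c", "a", "b", "c", "a", "b")),
  smoker = factor(c("yes", "no", "no", "yes", "no", "yes", "no", "no"))
)

# model.matrix() turns factors into 0/1 dummy columns, so every column is numeric.
X <- model.matrix(~ . - 1, data = df)

# Correlation matrix of the all-numeric design matrix.
corr <- cor(X)

# Draw the correlogram only if ggcorrplot is installed.
if (requireNamespace("ggcorrplot", quietly = TRUE)) {
  print(ggcorrplot::ggcorrplot(corr))
}
```

With a real CSV, `df` would come from `read.csv()` with the categorical columns converted to factors first.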

Related

How to plot multivariate binary data

I have a dataset of 78 variables, all of which are binary (0 and 1). I want to plot the data in one graph. Originally I planned to use PCA, but I think it won't work since PCA requires numerical input data (is that right?). Any suggestions on what kind of data visualization to use for this type of data? Thank you very much.
I use Python and R.
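
As a rough illustration of the PCA idea raised in the question: 0/1 columns are numeric, so `prcomp()` will run on them. The sketch below uses simulated data (the 50 rows and the 0.5 probability are assumptions, not the asker's data), and whether PCA is the most informative view of binary data is a separate question.

```r
set.seed(1)

# Simulated stand-in for the real data: 50 observations of 78 binary variables.
X <- matrix(rbinom(50 * 78, size = 1, prob = 0.5), nrow = 50, ncol = 78)

# PCA on the centered 0/1 matrix; scaling is skipped so that a constant
# column, if one exists in real data, does not cause an error.
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Scatter plot of the observations on the first two principal components.
plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2", pch = 16)
```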

Created factors with EFA, tried regressing (lm) with control variables - Error message "variable lengths differ"

EFA first-timer here!
I ran an Exploratory Factor Analysis (EFA) on a data set ("df1" = 1320 observations) with 50 variables by creating a subset with relevant variables only that have no missing values ("df2" = 301 observations).
I was able to filter 4 factors (19 variables in total).
Now I would like to take those 4 factors and regress them with control variables.
For instance: Factor 1 (df2$fa1) describes job satisfaction.
I would like to control for age and marital status.
Fa1Regression <- lm(df2$fa1 ~ df1$age + df1$marital)
However I receive the error message:
Error in model.frame.default(formula = df2$fa1 ~ df1$age + :
variable lengths differ (found for 'df1$age')
What can I do to run the regression correctly? Can I delete observations from df1 that are nonexistent in df2 so that the variable lengths are the same?
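lm fails here because df2$fa1 has 301 values while df1$age has 1320, so the variables in the formula do not line up row by row. The controls need to come from the same 301 observations as the factor scores. For regressing a latent factor on other variables you can also use the lavaan package, where the model statement refers to column names only, for example myModel <- 'fa1 ~ x1 + x2 + x3', fitted with data = df2.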
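
One way to get matching variable lengths is to bring the controls into the same rows as the factor scores before calling lm. The sketch below is a minimal illustration with made-up data; the `id` column used for matching is an assumption, and any key that identifies the same observation in both data frames would do.

```r
# Hypothetical full data set with the control variables.
df1 <- data.frame(
  id      = 1:20,
  age     = seq(25, 63, by = 2),
  marital = factor(rep(c("single", "married"), 10))
)

# Hypothetical subset holding the factor scores for complete cases only.
df2 <- data.frame(
  id  = 1:8,
  fa1 = c(0.3, -1.2, 0.8, 0.1, -0.5, 1.4, -0.9, 0.6)
)

# Keep only the rows of df1 that also exist in df2, matched by id.
merged <- merge(df2, df1[, c("id", "age", "marital")], by = "id")

# All variables now come from one data frame, so their lengths agree.
Fa1Regression <- lm(fa1 ~ age + marital, data = merged)
summary(Fa1Regression)
```

If df2 was created by subsetting df1 and still contains age and marital, the merge is unnecessary and `lm(fa1 ~ age + marital, data = df2)` should be enough.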

Support vector regression based GIS analysis

I'm new here and I would really appreciate some help. I have a dataset that includes geographical information (longitude, latitude, ...) and I want to predict some quantities from it using Support Vector Regression, but I don't know how to perform this task. I have the following questions:
Is there specific preprocessing I need to go through?
Does SVR treat a geographic dataset as a normal data set, or are there specifics in terms of tools and treatment?
Are there any recommended predictive analytics tools (including SVR) for geographical data?
This solution is for the situation where you want to extract the independent variable from a raster at the locations of the dependent variable.
If you already have all the dependent and independent data with their corresponding locations, you can simply use the svm function in R and then pass new raster or vector data to the predict function for prediction. Alternatively, you can take the estimated coefficients into the raster calculator in GIS, multiply them by the corresponding independent variables, and get your predicted raster.
For spatial data in R you can simply do the following.
First of all, support vector regression can be used to predict real values, and you can use library("e1071") in R to run this algorithm.
You can import your dataset as a CSV with lat and long columns.
Then transform your data.frame into a spatial data frame:
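library(sp)      # SpatialPoints, CRS, SpatialPointsDataFrame
library(raster)  # extract
# Read data (file.choose() opens a file dialog on any platform)
dat <- read.csv(file.choose())
# Convert the data to spatial points (x and y are the coordinate columns)
dat_sp <- SpatialPoints(cbind(dat$x, dat$y))
# Add your geographical reference system
dat_crs <- CRS("+proj=utm +zone=39 +datum=WGS84")
# Create a SpatialPointsDataFrame for dat
dat_spdf <- SpatialPointsDataFrame(coords = dat_sp, data = dat, proj4string = dat_crs)
plot(dat_spdf, col = 'blue', cex = 1, pch = 16, axes = TRUE)
# Extract the raster value at each point (my_raster stands for your own raster layer)
dat_spdf$ref <- extract(my_raster, dat_spdf)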
You can then extract values from a raster or from whatever data holds your independent variable.
Finally, you can use the following code in R.
svm(dependent ~ ., data = independent)
But you really need an intuition for what SVR is and how to evaluate the result.
You can also show your result as a final raster map.
You can use the toolbox package or the raster package.
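
The fitting and prediction step can be sketched as follows. This is only an illustration under assumptions: the e1071 package is installed, the data are simulated, and the column names (lon, lat, elevation, target) are hypothetical stand-ins for your own variables.

```r
library(e1071)  # assumed to be installed; provides svm()

set.seed(42)
n <- 100

# Simulated training data: coordinates plus one covariate.
train <- data.frame(
  lon       = runif(n, 0, 10),
  lat       = runif(n, 0, 10),
  elevation = runif(n, 100, 500)
)
# Simulated dependent variable with a little noise.
train$target <- 0.01 * train$elevation + sin(train$lon) + rnorm(n, sd = 0.1)

# Fit support vector regression with a radial kernel.
fit <- svm(target ~ lon + lat + elevation, data = train,
           type = "eps-regression", kernel = "radial")

# Predict at new locations; the columns must match the training predictors.
new_points <- data.frame(lon = c(2.5, 7.5), lat = c(3, 8),
                         elevation = c(200, 400))
pred <- predict(fit, newdata = new_points)

# In-sample RMSE as a first, optimistic check of the fit.
rmse <- sqrt(mean((predict(fit, train) - train$target)^2))
```

Here SVR treats longitude and latitude like any other numeric predictor; a held-out set or cross-validation would give a more honest error estimate than the in-sample RMSE.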

How to get residuals as variable in multiple regression in R?

I am trying to extract regression residuals as a variable in my data so that I can use them for further analysis in my dataset.
I use the following code:
fresid<-lm(var1 ~ var2+var3+var4+var5, data=female)
This creates an object in the global environment, but I am unable to get the residuals into my data.
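
A minimal sketch of one way to do this, assuming a made-up `female` data frame with fewer predictors than in the question: `residuals()` returns the residuals of the fitted model, and `na.action = na.exclude` pads them with NA for rows that were dropped, so the result has one value per row.

```r
# Hypothetical data with one missing value in the response.
female <- data.frame(
  var1 = c(5.1, 6.3, NA, 7.8, 8.1, 9.4, 10.2, 11.0),
  var2 = c(1, 2, 3, 4, 5, 6, 7, 8),
  var3 = c(2.0, 1.5, 3.1, 2.2, 4.0, 3.3, 5.1, 4.4)
)

# na.exclude keeps track of dropped rows so residuals line up with the data.
fresid <- lm(var1 ~ var2 + var3, data = female, na.action = na.exclude)

# Store the residuals as a new column; row 3 gets NA.
female$resid <- residuals(fresid)
```

Without `na.exclude`, the residual vector is shorter than the data frame whenever rows contain missing values, and the assignment fails.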

How to use Gnuplot to create histogram from binned data from CSV file?

I have a CSV file which is generated by a process that outputs the data in pre-defined bins (say from -100 to +100 in steps of 10). So, each line looks somewhat like this:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
i.e. 20 comma separated values, the first representing the frequency in the range -100 to -90, while the last represents the frequency between 90 to 100.
The problem is, Gnuplot seems to require the raw data for it to be able to generate a histogram, whereas I have only the frequency distribution. How do I proceed in this case? I'm looking for the simplest possible histogram, that perhaps displays the data using vertical bars.
You already have histogram data, so you don't need "set histogram".
Generate the x-values from the line numbers (this assumes one frequency per line) and plot them with boxes; the +5 centers each box on its bin:
plot dataf using (($0-10)*10+5):1 with boxes