I have a dataset of 78 variables, all of which are binary (0 and 1). I want to plot the data in one graph. Originally I planned to use PCA, but I think it won't work since PCA requires numerical input data (is that right?). Any suggestions for what kind of data visualization to use for this type of data? Thank you very much.
I work in Python and R.
My question may sound easy; however, I am having difficulty creating an HDF5 dataset from my 3D medical images, which are saved in NIfTI (.nii) format for both the images and the manual segmentations (label files). My questions are:
1. The blob shape in pycaffe is N*C*H*W (N*C*D*H*W for 3D data); is the order different in matcaffe? For example, in pycaffe the data blob shape would be 1*1*60*320*320 for a grayscale volume with 60 slices of 320-by-320. I tried a Matlab script to create the HDF5 dataset for the 3D data, and hdf5info reports the order 320*320*60*1*1 for both data and label. How should I change the order in the Matlab code so that it is readable in pycaffe?
2. Is there any Python code for creating the HDF5 database for 3D data?
3. If I create the HDF5 data in Matlab and use the list file in pycaffe, will it cause issues?
Thanks
Matlab arranges the elements of N-D arrays from the first dimension to the last (column-major, like Fortran), while caffe and HDF5 store arrays from the last dimension to the first (row-major).
This answer explains how to modify the Matlab code to write row-major arrays to HDF5 files.
You might also find this answer useful for your problem.
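On question 2 (Python code for HDF5 with 3D data): below is a minimal sketch using h5py; the file names and the all-zero arrays are placeholders for your own volumes. NumPy arrays are row-major, which is what caffe expects, so no permutation is needed on the Python side:

import h5py
import numpy as np

# Placeholder volumes, shaped N*C*D*H*W = 1*1*60*320*320 as in your example.
data = np.zeros((1, 1, 60, 320, 320), dtype=np.float32)   # image volume
label = np.zeros((1, 1, 60, 320, 320), dtype=np.float32)  # manual segmentation

with h5py.File('train.h5', 'w') as f:
    f.create_dataset('data', data=data)
    f.create_dataset('label', data=label)

# Caffe's HDF5Data layer reads the file names from a plain-text list file.
with open('train_h5_list.txt', 'w') as f:
    f.write('train.h5\n')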
I'm new here and I would really like some help. I have a dataset that includes geographical information (longitude, latitude, ...) and I want to predict some quantities from this dataset with Support Vector Regression (SVR), but I don't know how to perform this task. I have the following inquiries:
Is there specific preprocessing I need to go through?
Does SVR treat a geographic dataset like a normal dataset, or are there specifics in terms of tools and treatment?
Are there any recommended predictive analytics tools (including SVR) for geographical data?
The solution given here is for the situation where you want to extract the independent variables from a raster at the locations of your dependent variable.
But if you already have all your dependent and independent data with their corresponding locations, you can simply use the svm function in R and then pass new raster or vector data to the predict function for prediction. Alternatively, you can take the model's estimated coefficients, multiply them with the corresponding independent-variable rasters in a GIS raster calculator, and obtain your predicted raster that way.
For spatial data in R, you can do the following.
First of all, support vector regression can be used to predict real values, and you can use library("e1071") in R to run this algorithm.
Import your dataset as a CSV along with the lat and long columns.
Transform your data.frame into a SpatialPointsDataFrame:
# Load the required packages
library(sp)      # spatial classes
library(raster)  # raster handling and extract()
library(e1071)   # svm()

# Read the data
dat <- read.csv(choose.files())

# Convert the coordinates to SpatialPoints
dat_sp <- SpatialPoints(cbind(dat$x, dat$y))

# Add your geographical reference system
dat_crs <- CRS("+proj=utm +zone=39 +datum=WGS84")

# Create a SpatialPointsDataFrame for dat
dat_spdf <- SpatialPointsDataFrame(coords = dat_sp, data = dat, proj4string = dat_crs)
plot(dat_spdf, col = 'blue', cex = 1, pch = 16, axes = TRUE)

# Extract the raster values at the point locations ('r' is your raster object)
dat_spdf$ref <- extract(r, dat_spdf)
This extracts the values of your independent variable onto the points from a raster (or whatever source you have).
Finally, you can fit the model in R:
model <- svm(dependent ~ ., data = dat_spdf@data)  # 'dependent' is your response column
But you really need an intuition about what SVR is and how to evaluate the results.
You can also show your result as a final raster map; a GIS toolbox or the raster package can be used for this.
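If you prefer Python, here is a minimal sketch of the same idea with scikit-learn; the file and column names (points.csv, lon, lat, target) are assumptions. The coordinates are treated as ordinary numeric features, and scaling matters because SVR is sensitive to feature scales:

import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Assumed CSV layout: lon, lat, and a numeric target column.
df = pd.read_csv("points.csv")
X = df[["lon", "lat"]]   # coordinates used as plain numeric features
y = df["target"]         # the quantity to predict

# Scale the features, then fit an RBF-kernel SVR.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
model.fit(X, y)

# Predict at new coordinates the same way:
predictions = model.predict(X)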
I'm totally new to caffe and I'm trying to convert a TensorFlow model to caffe.
I have a tuple whose shape is a little complex, since it stores some word vectors.
This is the shape of the tuple data:
data[0]: a list, [684, 84], storing the sentence vectors;
data[1]: a list, [684, 84], storing the position vectors;
data[2]: a matrix, [684, 10], storing the aspects of the sentences;
data[3]: a matrix, [1, 684], storing the label of each sentence;
data[4]: a number, storing the max length of the sentences.
Each row represents a sentence, which is also one sample of the dataset.
In TF, I return the whole tuple from a function I wrote myself:
train_data = read_data(FLAGS.train_data, source_count, source_word2idx)
I noticed that caffe always requires a data layer before training, but I have no idea how to convert my data to LMDB format, or how to just feed it into the model as a tuple or matrix.
By the way, I'm using pycaffe.
Could anyone help?
Thanks a lot!
There's no particular magic; all you need to do is write an input routine that reads the file and returns the data in the format expected for train_data. You do not need to pre-convert your data to LMDB or any other format; just write read_data to accept your current input format and hand the model the format it requires.
We can't help you much beyond that: you haven't specified the model's input format at all, and you've given us only the shapes of the input data (no internal structure or semantics). Treat it as a plain data-reorganization problem: figure out how to rearrange your input data into the format the model expects.
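For illustration, a minimal pycaffe sketch of such an input routine; the prototxt name and the 'data' blob name are assumptions, while read_data is your own loader from the question. The arrays are copied straight into the net's input blobs, with no LMDB involved:

import numpy as np
import caffe

# Your own loader, exactly as in the TensorFlow version.
train_data = read_data(FLAGS.train_data, source_count, source_word2idx)

net = caffe.Net('model.prototxt', caffe.TEST)  # assumed prototxt name

# Reshape the input blob to match the data, then copy the array in.
sentences = np.asarray(train_data[0], dtype=np.float32)  # shape (684, 84)
net.blobs['data'].reshape(*sentences.shape)
net.blobs['data'].data[...] = sentences

out = net.forward()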
I want to know whether I can use Spatial Join functions to visualize a dataset based on two variables.
My CSV has 541,000 rows and I'm trying to build a visualization in Zeppelin with Spark that minimizes the number of points drawn.
All the examples I've seen are for GIS systems, but that is not the type of data I have.
My CSV looks like this:
id, variableX, variableY, type.
I'm trying to apply Spatial Join logic to variableX and variableY.
Thank you.
spark-highcharts might do what you want.
It's too much to plot half a million points directly; some aggregation or filtering is needed, and spark-highcharts will do the aggregation automatically.
For 2-dimensional data, chart types like line, area, or spline can be used.
For 3-dimensional data, chart types like arearange or scatter can be used.
The following code plots the bank data provided in the Zeppelin Tutorial. It draws a spline chart whose xAxis uses the age column and whose yAxis uses the aggregated average balance:
import com.knockdata.spark.highcharts._
import com.knockdata.spark.highcharts.model._
highcharts(bank.series("name" -> "age", "y" -> avg($"balance")).orderBy($"age")).
  xAxis(new XAxis("age").typ("category")).
  chart(Chart.spline).
  plot()
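If you would rather aggregate yourself before plotting, here is a minimal PySpark sketch; the file name and the bin width of 10 are assumptions, while the column names come from your CSV. It collapses the 541,000 rows onto a coarse grid whose cell counts you can plot with any chart:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# Snap variableX and variableY onto a grid (bin width 10 is an assumption),
# then count the points per cell; plot the counts instead of the raw rows.
binned = (df
    .withColumn("xBin", F.floor(F.col("variableX") / 10) * 10)
    .withColumn("yBin", F.floor(F.col("variableY") / 10) * 10)
    .groupBy("xBin", "yBin")
    .count())

binned.show()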
I have a CSV file which is generated by a process that outputs the data in pre-defined bins (say from -100 to +100 in steps of 10). So, each line looks somewhat like this:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
i.e. 20 comma-separated values, the first representing the frequency in the range -100 to -90, and the last the frequency between 90 and 100.
The problem is, Gnuplot seems to require the raw data to generate a histogram, whereas I only have the frequency distribution. How do I proceed in this case? I'm looking for the simplest possible histogram, perhaps one that displays the data using vertical bars.
You already have histogram data, so you shouldn't use gnuplot's histogram mode at all.
Generate the x-values from the line numbers and do a simple plot with boxes. Note that $0 counts data lines, so this assumes one frequency per line; transpose your comma-separated row first (e.g. with tr ',' '\n'):
dataf = 'hist.dat'   # your transposed data file
set boxwidth 10
plot dataf using (($0 - 9.5) * 10):1 with boxes