How to split regression data when converting to HDF5 (Caffe)

I am using Shai's code to convert my data to HDF5, but the converted data is too large: it exceeds Caffe's size limit (2 GB).
So my question is: how should we split the data?
Is it enough to convert the data in separate parts, split however we want?

It would be helpful to have more information to give a better answer. Please give the actual error that Caffe gives you. Also, what exactly is larger than 2GB? Is it the txt file?
HDF5 is not very efficient. I would recommend either using the database layer (http://caffe.help/manual/layers/data.html) or writing your own customized Python layer (http://caffe.help/manual/layers/python.html).
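If the 2 GB limit turns out to be the problem, one common workaround is to write several smaller HDF5 files and list them, one path per line, in the text file that the HDF5 data layer reads as its source. The following is a minimal sketch, not a definitive implementation. It assumes the data fits in memory as NumPy arrays and that the datasets are named `data` and `label` (these names have to match the layer's top blobs). The helper names, the file prefix and the 1 GB target per file are hypothetical choices.

```python
def chunk_ranges(n_samples, bytes_per_sample, max_bytes=1 << 30):
    """Return (start, stop) index pairs so that each chunk stays under max_bytes."""
    per_chunk = max(1, max_bytes // bytes_per_sample)
    return [(start, min(start + per_chunk, n_samples))
            for start in range(0, n_samples, per_chunk)]


def write_hdf5_chunks(data, labels, prefix="train", max_bytes=1 << 30):
    """Write data/labels (NumPy arrays, same first dimension) to several .h5 files.

    Also writes '<prefix>_h5_list.txt', the file list that the HDF5 data
    layer's `source` would point to. Returns the list of .h5 paths.
    """
    import h5py  # imported here so chunk_ranges can be used without h5py

    # Assumption: every sample has the same size as the first one.
    bytes_per_sample = data[0].nbytes + labels[0].nbytes
    ranges = chunk_ranges(len(data), bytes_per_sample, max_bytes)
    paths = []
    for i, (start, stop) in enumerate(ranges):
        path = "{}_{:03d}.h5".format(prefix, i)
        with h5py.File(path, "w") as f:
            f.create_dataset("data", data=data[start:stop])
            f.create_dataset("label", data=labels[start:stop])
        paths.append(path)
    with open(prefix + "_h5_list.txt", "w") as f:
        f.write("\n".join(paths) + "\n")
    return paths
```

For example, `chunk_ranges(10, 100, 300)` splits ten 100-byte samples into chunks of at most three samples each.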

Related

CatBoostClassifier: predictions from the model's exported Python file differ from predictions made directly with the model

I want to verify that the results predicted from the exported file are consistent with those predicted directly by the model.
I use the exported Python file, which contains the CatBoostClassifier model description, to predict the result:
But the result predicted directly by the model is 2.175615211102761. I have verified that the two results differ for multiple samples. I want to know why this happens and how to solve it.
float_sample and cat_sample look like

Supplementary question: the results predicted with the Python model file described in the CatBoost tutorial differ from those predicted directly by the model.

Reading a .csv file that contains JSON with MATLAB

So I have a .csv file that contains dataset information, and the data seems to be described in JSON. I want to read it with MATLAB. Here is one example line (of 7000 in total):
imagename.jpg,"[[{""name"":""nose"",""position"":[2911.68,1537.92]},{""name"":""left eye"",""position"":[3101.76,544.32]},{""name"":""right eye"",""position"":[2488.32,544.32]},{""name"":""left ear"",""position"":null},{""name"":""right ear"",""position"":null},{""name"":""left shoulder"",""position"":null},{""name"":""right shoulder"",""position"":[190.08,1270.08]},{""name"":""left elbow"",""position"":null},{""name"":""right elbow"",""position"":[181.44,3231.36]},{""name"":""left wrist"",""position"":[2592,3093.12]},{""name"":""right wrist"",""position"":[2246.4,3965.76]},{""name"":""left hip"",""position"":[3006.72,3360.96]},{""name"":""right hip"",""position"":[155.52,3412.8]},{""name"":""left knee"",""position"":null},{""name"":""right knee"",""position"":null},{""name"":""left ankle"",""position"":[2350.08,4786.56]},{""name"":""right ankle"",""position"":[1460.16,5019.84]}]]","[[{""segment"":[[0,17.28],[933.12,5175.36],[0,5166.72],[0,2306.88]]}]]",https://imageurl.jpg,
If I use the Import tool, I am able to separate the data into four columns using , as the delimiter:
Image File Name,Key Points,Segmentation,Image URL,
imagename.jpg,
"[[{""name"":""nose"",""position"":[2911.68,1537.92]},{""name"":""left eye"",""position"":[3101.76,544.32]},{""name"":""right eye"",""position"":[2488.32,544.32]},{""name"":""left ear"",""position"":null},{""name"":""right ear"",""position"":null},{""name"":""left shoulder"",""position"":null},{""name"":""right shoulder"",""position"":[190.08,1270.08]},{""name"":""left elbow"",""position"":null},{""name"":""right elbow"",""position"":[181.44,3231.36]},{""name"":""left wrist"",""position"":[2592,3093.12]},{""name"":""right wrist"",""position"":[2246.4,3965.76]},{""name"":""left hip"",""position"":[3006.72,3360.96]},{""name"":""right hip"",""position"":[155.52,3412.8]},{""name"":""left knee"",""position"":null},{""name"":""right knee"",""position"":null},{""name"":""left ankle"",""position"":[2350.08,4786.56]},{""name"":""right ankle"",""position"":[1460.16,5019.84]}]]",
"[[{""segment"":[[0,17.28],[933.12,5175.36],[0,5166.72],[0,2306.88]]}]]",
https://imageurl.jpg,
But I have trouble using the tool to decompose the data any further. Of course, the ideal would be to separate the data in code.
I hope someone can point me to how to do this or to the tools I need. I have seen other questions, but they don't seem to fit my particular case.
Thank you very much!

You can parse a JSON string into a MATLAB structure using the following command: structure1 = matlab.internal.webservices.fromJSON(json_string)
You can create a JSON string from a MATLAB structure using the following command: json_string = matlab.internal.webservices.toJSON(structure1)

JSONlab is what you want. It has a 'loadjson' function that takes a char array of JSON data and returns a struct with all the data.
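The idea is the same in any language: let a CSV parser undo the doubled quotes, then hand the resulting field to a JSON parser. Since the other code on this page is Python, here is a minimal Python sketch of that two-step parse (it is not MATLAB code), using a shortened version of the example row from the question. The function name and the layout of the returned dictionary are illustrative assumptions.

```python
import csv
import io
import json


def parse_row(line):
    """Split one CSV line, then decode the JSON stored in columns 2 and 3."""
    # The trailing comma in the sample yields an empty fifth field; ignore it.
    name, keypoints_json, segments_json, url = next(csv.reader(io.StringIO(line)))[:4]
    # The sample wraps the list of key points in an extra outer list.
    keypoints = {kp["name"]: kp["position"] for kp in json.loads(keypoints_json)[0]}
    segments = json.loads(segments_json)
    return {"image": name, "keypoints": keypoints, "segments": segments, "url": url}


sample = ('imagename.jpg,"[[{""name"":""nose"",""position"":[2911.68,1537.92]},'
          '{""name"":""left ear"",""position"":null}]]",'
          '"[[{""segment"":[[0,17.28],[933.12,5175.36]]}]]",https://imageurl.jpg,')

row = parse_row(sample)
print(row["keypoints"]["nose"])      # [2911.68, 1537.92]
print(row["keypoints"]["left ear"])  # None (JSON null)
```

In MATLAB the second step would be the JSON decoding call mentioned in the answers above, applied to the text of each column.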

How to pre-process category info in deep learning (Keras) training input data?

I have category information in my training input data, and I am wondering what the best way to normalize it is.
The category information is things like "city", "gender", etc.
I'd like to use Keras to handle the process.

scikit-learn has a preprocessing library with functions to normalize or scale your data.
This video gives an example of how to preprocess data that will be used for training a model with Keras. The preprocessing here is done with the library mentioned above.
As shown in the video, with scikit-learn's MinMaxScaler class you can specify a range that you want your data to be transformed into, and then fit your data to that range using the MinMaxScaler.fit_transform() function.
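Note that min-max scaling applies to numeric columns. Text categories such as "city" or "gender" have to be turned into numbers first, and a common choice is one-hot encoding (scikit-learn offers OneHotEncoder for this). The following is a minimal, dependency-free sketch of both steps. The function names and sample values are made up for illustration.

```python
def one_hot_encode(values):
    """Map each category to a 0/1 vector; returns (vectors, ordered categories)."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    vectors = []
    for value in values:
        vec = [0] * len(categories)
        vec[index[value]] = 1
        vectors.append(vec)
    return vectors, categories


def min_max_scale(values, low=0.0, high=1.0):
    """Linearly rescale numeric values into [low, high]."""
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1.0  # avoid division by zero for constant columns
    return [low + (v - vmin) * (high - low) / span for v in values]


cities = ["Paris", "Tokyo", "Paris", "Lima"]
ages = [20, 40, 30, 60]

city_vectors, city_names = one_hot_encode(cities)
print(city_names)           # ['Lima', 'Paris', 'Tokyo']
print(city_vectors)         # [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
print(min_max_scale(ages))  # [0.0, 0.5, 0.25, 1.0]
```

The one-hot vectors can then be concatenated with the scaled numeric columns to form the input rows for a Keras model.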

How to train an FCN using my own images?

I have a problem training an FCN with Caffe.
I prepared my images (original image data and segmented image data), e.g. as .jpg files.
Then I want to convert my data to LMDB using convert_imageset.exe, but it expects each record as an image (array) with an integer label, whereas my data pairs an image (array) with a label that is itself an array.
How can I convert my own images for FCN?

Edit the convert_imageset sources to handle your desired format. They are in caffe/tools/convert_imageset.cpp.

JSON: opening the Yelp Dataset Challenge data set

I am interested in data mining and I am writing my thesis about it. For my thesis I want to use the Yelp Dataset Challenge data set, but I cannot open it because it is in JSON format and almost 2 GB in size. The website says that the dataset can be opened in Python using mrjob, but I am not very good at programming. I searched online and looked at some of the code Yelp provides on GitHub, but I could not find an article or anything else that clearly explains how to open the dataset.
Can you please tell me step by step how to open this file, and maybe how to convert it to CSV?
https://www.yelp.com.tr/dataset_challenge
https://github.com/Yelp/dataset-examples

The data is in .tar format. When you extract it, you get another file. Rename that file to .tar and extract it again, and you will get all the JSON files.

Yes, you can use pandas. Take a look:
from io import StringIO
import pandas as pd
# read the entire file into a Python list of lines (one JSON object per line)
with open('yelp_academic_dataset_review.json', 'r', encoding='utf-8') as f:
    data = f.readlines()
# remove the trailing "\n" from each line
data = [line.rstrip() for line in data]
# join the per-line objects into one JSON array
data_json_str = "[" + ",".join(data) + "]"
# now, load it into pandas (StringIO because newer pandas versions expect a file-like object)
data_df = pd.read_json(StringIO(data_json_str))
Now 'data_df' contains the Yelp data ;)
In case you want to convert it directly to CSV, you can use this script:
https://github.com/Yelp/dataset-examples/blob/master/json_to_csv_converter.py
I hope this helps.

To process huge JSON files, use a streaming parser.
Many of these files aren't a single JSON document but a stream of JSON documents (known as "jsons format", often with one object per line). A regular JSON parser will then consider everything but the first entry to be junk.
With a streaming parser, you can start reading the file, process parts of it, write them to the desired output, and then continue reading.
There is no single JSON-to-CSV conversion.
Thus, you will not find a general conversion utility; you have to customize the conversion for your needs.
The reason is that JSON is a tree but CSV is not. There is no definitive, efficient conversion from trees to table rows. I'd stick with JSON unless you are always extracting the same few attributes from the tree.
Start coding: to succeed with this amount of data, you need to become a better programmer.
