Generating a DATASET using OMNeT++ - csv

After creating a simulation using OMNeT++ with the help of the INET framework, I have generated .vec and .sca files. The problem is that when I import the .csv file, I don't get some important features such as the event number, time, etc. The features that I do get are shown in this picture:
[screenshot: my output]
So please, what should I do to make these features appear in my output?

It may be possible to get the event number (see https://doc.omnetpp.org/omnetpp/manual/#sec:ana-sim:vector-eventnum-recording), but it is not exported to CSV by default.
You are already getting the vector time in the value "vectime".
Each vector has a comma-separated list of times in vectime and a corresponding comma-separated list of values in vecvalue.
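As a rough illustration only (this is plain string handling, not an OMNeT++ API; the column names vectime/vecvalue and the list delimiter depend on how you export), pairing the two lists up once the row has been read from the CSV could look like this in Java:
import java.util.ArrayList;
import java.util.List;

// Sketch: turn one exported vector row into (time, value) pairs.
// Assumes the vectime and vecvalue cells have already been read as strings;
// the internal delimiter may be "," or " " depending on the export tool.
public class VectorRow {
    public static List<double[]> toTimeValuePairs(String vectime, String vecvalue, String delimiter) {
        String[] times = vectime.trim().split(delimiter);
        String[] values = vecvalue.trim().split(delimiter);
        List<double[]> pairs = new ArrayList<>();
        for (int i = 0; i < Math.min(times.length, values.length); i++) {
            pairs.add(new double[] { Double.parseDouble(times[i]), Double.parseDouble(values[i]) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical cell contents from one row of the exported CSV.
        for (double[] p : toTimeValuePairs("0.1,0.2,0.3", "5,7,6", ",")) {
            System.out.println("t=" + p[0] + "  value=" + p[1]);
        }
    }
}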

receive Excel data and turn into objects to format a JSON

I have a solution that helps me create a wizard to fill in some data and turn it into JSON. The problem now is that I have to receive an xlsx file and turn specific data from it into JSON - not all the data, but only the fields I want, which are documented in the last link.
In this link: https://stackblitz.com/edit/xlsx-to-json I can access the Excel data and turn it into an object (when I print document.getElementById('output').innerHTML = JSON.parse(dataString); it shows [object Object]).
I want to implement this solution and automatically get the fields specified in config.ts, but I can't get it to work. For now, I have these in my HTML and app-component.ts:
https://stackblitz.com/edit/angular-xbsxd9 (It's probably not compiling but it's to show the code only)
It wasn't quite clear what you were asking, but based on the assumption that what you are trying to do is:
Given the data in the spreadsheet that is uploaded
Use a config that holds the list of column names you want returned in the JSON when the user clicks to download
Based on this, I've created a fork of your sample here -> Forked StackBlitz
What I've done is:
use the map operator on the array returned from the sheet_to_json method
Within the map, loop through each key of the record (each key being a column in this case).
If a column in the row is defined in the propertymap file (config), then return it.
This approach strips out all the columns you don't care about up front, so that by the time the user clicks to download the file, only the columns you want are returned. If you need to keep the original columns, you can move this logic somewhere more convenient for you.
I also augmented the property map a little to give you more granular control over how the data in the returned JSON is formatted, i.e. numbers are not treated as strings in the final output. You can use this as a template if it suits your needs for any additional formatting.
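The fork itself is in TypeScript, but the filtering step is language-agnostic; as a rough sketch of just that idea in Java (rows and allowedColumns are hypothetical stand-ins for the sheet_to_json output and the config.ts property map):
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: drop every column that is not listed in the config.
public class ColumnFilter {
    public static List<Map<String, Object>> keepConfiguredColumns(
            List<Map<String, Object>> rows, Set<String> allowedColumns) {
        List<Map<String, Object>> filtered = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> kept = new LinkedHashMap<>();
            for (Map.Entry<String, Object> cell : row.entrySet()) {
                if (allowedColumns.contains(cell.getKey())) {
                    kept.put(cell.getKey(), cell.getValue());
                }
            }
            filtered.add(kept);
        }
        return filtered;
    }
}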
Hope it helps.

Spark - load numbers from a CSV file with non-US number format

I have a CSV file which I want to convert to Parquet for further processing. Using
sqlContext.read()
.format("com.databricks.spark.csv")
.schema(schema)
.option("delimiter",";")
.(other options...)
.load(...)
.write()
.parquet(...)
works fine when my schema contains only Strings. However, some of the fields are numbers that I'd like to be able to store as numbers.
The problem is that the file arrives not as an actual "csv" but as a semicolon-delimited file, and the numbers are formatted with German notation, i.e. a comma is used as the decimal separator.
For example, what in the US would be 123.01 is stored in this file as 123,01.
Is there a way to force reading the numbers in a different Locale, or some other workaround that would allow me to convert this file without first converting the CSV file to a different format? I looked in the Spark code, and one nasty thing that seems to be causing the issue is in CSVInferSchema.scala line 268 (Spark 2.1.0): the parser enforces US formatting rather than, e.g., relying on the Locale set for the JVM, or allowing this to be configured somehow.
I thought of using a UDT but got nowhere with that - I can't work out how to make it let me handle the parsing myself (couldn't really find a good example of using UDT...).
Any suggestions on a way of achieving this directly, i.e. at the parsing step, or will I be forced to do an intermediate conversion and only then convert it into Parquet?
For anybody else who might be looking for an answer - the workaround I went with (in Java) for now is:
JavaRDD<Row> convertedRDD = sqlContext.read()
.format("com.databricks.spark.csv")
.schema(stringOnlySchema)
.option("delimiter",";")
.(other options...)
.load(...)
.javaRDD()
.map ( this::conversionFunction );
sqlContext.createDataFrame(convertedRDD, schemaWithNumbers).write().parquet(...);
The conversion function takes a Row and needs to return a new Row with fields converted to numerical values as appropriate (or, in fact, this could perform any conversion). Rows in Java can be created by RowFactory.create(newFields).
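As an illustration only, one possible shape for that conversion function, assuming a hypothetical two-column layout (a String key followed by a German-formatted decimal); adapt the indices and target types to the real schema, and note that the enclosing class must be Serializable for the map to run on the executors:
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

// Sketch: parse "123,01"-style numbers with the German locale and rebuild the Row.
// The two-column layout (String key, Double amount) is made up for illustration.
private Row conversionFunction(Row stringRow) {
    NumberFormat german = NumberFormat.getInstance(Locale.GERMANY);
    try {
        String key = stringRow.getString(0);
        double amount = german.parse(stringRow.getString(1)).doubleValue();
        return RowFactory.create(key, amount);
    } catch (ParseException e) {
        // Pick a policy for malformed numbers: null, a default value, or rethrow.
        return RowFactory.create(stringRow.getString(0), null);
    }
}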
I'd be happy to hear any other suggestions on how to approach this, but for now this works. :)

SuperCsvException CellProcessorNumber processing google contacts csv exports

I am using SuperCSV to process contact CSV files from different sources.
The number of columns is the same and there is a header in the file, so I want to use CsvBeanReader.
As different sources have different columns and header titles, I am dynamically building the cell processor array based on the number of columns identified in the header.
I was struggling for a few hours with a SuperCsvException telling me there was a mismatch between the number of processors and the number of columns in some particular files, which all happen to be CSV exports from the Google Mail contacts application, before I noticed these files had data rows ending with a useless trailing comma whereas the header row does not.
I solved the problem by catching the first SuperCsvException and adding the extra cell processor at that point, but I was wondering whether this last comma is present in other types of CSV files and whether SuperCSV has any option that would keep the power of CsvBeanReader while allowing for this trailing comma.
I would consider using CsvListReader.read() to get a list of string values. If the length of the list then tells you what to do, you can apply an array of processors using Util.executeCellProcessors(), which takes the list of strings and the cell processors as input.
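A rough, untested sketch of that approach (buildProcessorsFromHeader is a hypothetical stand-in for the dynamic processor construction described in the question, and contacts.csv is a placeholder file name):
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvListReader;
import org.supercsv.prefs.CsvPreference;
import org.supercsv.util.Util;

// Read each row as raw strings first, then pick a processor array that matches
// the actual row length (padding with an Optional() for the trailing empty
// column produced by the Google contacts export).
try (CsvListReader reader = new CsvListReader(new FileReader("contacts.csv"),
        CsvPreference.STANDARD_PREFERENCE)) {
    String[] header = reader.getHeader(true);
    CellProcessor[] base = buildProcessorsFromHeader(header); // hypothetical helper
    List<String> row;
    while ((row = reader.read()) != null) {
        CellProcessor[] processors = base;
        if (row.size() == base.length + 1) {
            processors = Arrays.copyOf(base, base.length + 1);
            processors[base.length] = new Optional(); // swallow the extra empty cell
        }
        List<Object> processed = new ArrayList<>();
        Util.executeCellProcessors(processed, row, processors,
                reader.getLineNumber(), reader.getRowNumber());
        // map "processed" onto your bean or handle it here
    }
}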

Random selection from CSV file in Jmeter

I have a very large CSV file (8000+ items) of URLs that I'm reading with a CSV Data Set Config element. It is populating the path of an HTTP Request sampler and iterating through with a while controller.
This is fine, except that I want each user (thread) to pick a random URL from the CSV URL list. What I don't want is each thread using the CSV items sequentially.
I was able to achieve this with a Random Order Controller with multiple HTTP Request samplers; however, 8000+ HTTP samplers really bogged down JMeter to an unusable state. So this is why I put the HTTP sampler URLs in the CSV file. It doesn't appear that I can use the Random Order Controller with the CSV file data, however. So how can I achieve random CSV data item selection per thread?
There is another way to achieve this:
Create a separate thread group.
Then, depending on what you want to achieve:
add a (random) loop count -> this will set a start offset for the thread group that does the work, or
add a loop count (or forever) plus a timer and let it loop while the other thread group is running. This thread group will then read a 'pseudo' random line.
It's not really random - the file is still read sequentially - but your work thread makes jumps in the file. It worked for me ;-)
There's no random selection function when reading CSV data. The reason is that you would need to read the whole file into memory first to do this, and that's a bad idea with a load test tool (any load test tool).
Other commercial tools solve this problem by automatically re-processing the data. In JMeter you can achieve the same manually by simply sorting the data using an arbitrary field. If you sort by, say, Surname, then the result is an effectively random distribution.
Note: if you ensure the default All Threads sharing mode is set for the CSV Data Set Config, then the data will be unique within the scope of the JMeter process.
The new Random CSV Data Set Config from BlazeMeter plugin should perfectly fit your needs.
As other answers have stated, the reason you're not able to select a line at random is that you would have to read the whole file into memory, which is inefficient.
Rather than trying to get JMeter to handle this on the fly, why not just randomise the file order itself before you start the test?
A scripting language such as perl makes short work of this:
cat unrandom.csv | perl -MList::Util=shuffle -e 'print shuffle<STDIN>' > random.csv
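If Perl isn't handy, the same one-off pre-shuffle can be done with a few lines of Java (file names are placeholders; keep a header row out of the shuffle if your file has one):
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

// One-off utility: shuffle the lines of unrandom.csv into random.csv before the test.
public class ShuffleCsv {
    public static void main(String[] args) throws Exception {
        List<String> lines = Files.readAllLines(Paths.get("unrandom.csv"));
        Collections.shuffle(lines);
        Files.write(Paths.get("random.csv"), lines);
    }
}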
For my case:
single column
small dataset
non-changing CSV
I just discarded using a CSV file and, following https://stackoverflow.com/a/22042337/6463291, used a BeanShell PreProcessor instead, something like this:
String[] query = new String[]{"csv_element1", "csv_element2", "csv_element3"};
Random random = new Random();
int i = random.nextInt(query.length);
vars.put("randomOption",query[i]);
Performance seems OK; if you have the same issue, you can try this out.
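If hard-coding the values becomes awkward, the same idea can read the small, single-column file instead; a sketch for a JSR223 PreProcessor (urls.csv is a placeholder path, and for anything but a small file you would want to cache the lines rather than re-read them every iteration):
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Random;

// Read the single-column CSV and expose one random line as ${randomUrl}.
List lines = Files.readAllLines(Paths.get("urls.csv"));
int i = new Random().nextInt(lines.size());
vars.put("randomUrl", (String) lines.get(i));
The HTTP Request path can then reference ${randomUrl}, just as the snippet above uses ${randomOption}.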
I am not sure if this will work, but I will suggest it anyway.
Why not divide your URLs into 100 different CSV files? Then in each thread you generate a random number and use that number to identify the CSV file to read using the __CSVRead function.
http://jmeter.apache.org/usermanual/functions.html#_CSVRead
Now the only part I am not sure about is whether the __CSVRead function reopens the file every time or shares the same file handle across the threads.
You may want to try it. Please share your findings.
A much more straightforward solution:
In the CSV file, add another column (say B).
Apply the =RAND() function in the first cell of column B (say B1). This will create a random float number.
Drag the corner of cell B1 down to apply it to all the corresponding URLs.
Sort by column B; your URLs will now be in random order.
Delete column B.

CSV format for OpenCV machine learning algorithms

Machine learning algorithms in OpenCV appear to use data read in CSV format. See for example this cpp file. The data is read into an OpenCV machine learning class CvMLData using the following code:
CvMLData data;
data.read_csv( filename );
However, there does not appear to be any readily available documentation on the required format for the csv file. Does anyone know how the csv file should be arranged?
Other (non-OpenCV) programs tend to have a line per training example, and begin with an integer or string indicating the class label.
If I read the source for that class, particularly the str_to_flt_elem function, and the class documentation, I conclude that valid formats for individual items in the file are:
1. Anything that can be parsed to a double by strtod
2. A question mark (?) or the empty string, to represent missing values
3. Any string that doesn't parse to a double
Items 1 and 2 are only valid for features; anything matched by item 3 is assumed to be a class label, and as far as I can deduce the order of the items doesn't matter. The read_csv function automatically assigns each column in the CSV file the correct type, and (if you want) you can override the labels with set_response_index. Delimiter-wise, you can use the default (,) or set it to whatever you like by calling set_delimiter before read_csv (as long as you don't use the decimal point).
So, for example, this should work for 6 data points in 3 classes with 3 features per point:
A,1.2,3.2e-2,+4.1
A,3.2,?,3.1
B,4.2,,+0.2
B,4.3,2.0e3,.1
C,2.3,-2.1e+3,-.1
C,9.3,-9e2,10.4
You can move your text label to any column you want, or even have multiple text labels.