SuperCsvException on the number of cell processors when processing Google Contacts CSV exports

I am using SuperCsv to process contact CSV files from different sources.
The number of columns is constant within each file and there is a header row, so I want to use the CsvBeanReader.
As different sources have different columns and header titles, I build the cellProcessors array dynamically, based on the number of columns identified in the header.
I struggled for a few hours with a SuperCsvException reporting a mismatch between the number of processors and the number of columns for some particular files, all of which happened to be CSV exports from the Google Mail contacts application. Then I noticed that in these files the data rows end with a useless trailing comma, whereas the header row does not.
I solved the problem by catching the first SuperCsvException and adding the extra cell processor at that point, but I was wondering whether this trailing comma also appears in other types of CSV files, and whether SuperCsv has an option that keeps the power of CsvBeanReader while tolerating the trailing comma.

I would consider using CsvListReader.read() to get a list of string values. Once the length of the list tells you what to do, you can apply an array of processors using Util.executeCellProcessors(), which takes the list of strings and the cell processors as input.
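The length check described above does not depend on Super CSV itself. Below is a stdlib-only Java sketch of it (not Super CSV's API; the class and method names are hypothetical), assuming rows contain no quoted commas so that a naive split is enough. A row that is exactly one cell longer than the header and ends in an empty cell is treated as having the Google Contacts trailing comma and is trimmed before the processors are applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: tolerates one trailing empty cell (the Google Contacts case). */
class TrailingCommaFix {

    /** Naive split that assumes no quoted commas; the -1 limit keeps trailing empty cells. */
    static List<String> splitRow(String line) {
        return new ArrayList<>(Arrays.asList(line.split(",", -1)));
    }

    /**
     * Returns the row as-is when it already has headerSize cells, or without its last
     * cell when the row is exactly one cell too long and that cell is empty.
     * Any other length mismatch is rejected.
     */
    static List<String> normalize(List<String> row, int headerSize) {
        if (row.size() == headerSize) {
            return row;
        }
        if (row.size() == headerSize + 1 && row.get(headerSize).isEmpty()) {
            return new ArrayList<>(row.subList(0, headerSize));
        }
        throw new IllegalArgumentException(
                "expected " + headerSize + " cells but got " + row.size() + ": " + row);
    }

    public static void main(String[] args) {
        List<String> header = splitRow("Name,E-mail,Phone");
        List<String> row = normalize(splitRow("Alice,alice@example.com,555-0100,"), header.size());
        System.out.println(row); // [Alice, alice@example.com, 555-0100]
    }
}
```

In a real reader you would compute the header size once, then pass each normalized row together with the processor array to the processing step described in the answer.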

Related

Generating a dataset using OMNeT++

After creating a simulation using OMNeT++ with the help of the INET framework, I generated .vec and .sca files. The problem is that when I import the .csv file, I don't get some important features such as the event number, time, etc.; the only features I get are the ones shown in my output screenshot.
What should I do to make these features appear in my output?
It may be possible to get the event number (see https://doc.omnetpp.org/omnetpp/manual/#sec:ana-sim:vector-eventnum-recording), but it is not exported to CSV by default.
You are already getting the vector time, in the "vectime" column.
Each vector has a space-separated list of times in vectime and a corresponding space-separated list of values in vecvalue.
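Pairing the two lists is a simple index-by-index zip. A minimal Java sketch, assuming the vectime and vecvalue cells of one row have already been read from the CSV as strings. The splitting regex accepts spaces and/or commas, so it does not depend on which separator your export uses. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that pairs up the vectime and vecvalue cells of one exported vector row. */
class VectorCells {

    /** One recorded sample: simulation time and the value recorded at that time. */
    static final class Sample {
        final double time;
        final double value;

        Sample(double time, double value) {
            this.time = time;
            this.value = value;
        }

        @Override
        public String toString() {
            return time + " -> " + value;
        }
    }

    /** Splits a cell on whitespace and/or commas and parses every token as a double. */
    static double[] parseList(String cell) {
        String trimmed = cell.trim();
        if (trimmed.isEmpty()) {
            return new double[0];
        }
        String[] tokens = trimmed.split("[\\s,]+");
        double[] out = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = Double.parseDouble(tokens[i]);
        }
        return out;
    }

    /** Pairs the i-th time with the i-th value; both lists must have the same length. */
    static List<Sample> zip(String vectime, String vecvalue) {
        double[] times = parseList(vectime);
        double[] values = parseList(vecvalue);
        if (times.length != values.length) {
            throw new IllegalArgumentException(
                    "vectime has " + times.length + " entries but vecvalue has " + values.length);
        }
        List<Sample> samples = new ArrayList<>();
        for (int i = 0; i < times.length; i++) {
            samples.add(new Sample(times[i], values[i]));
        }
        return samples;
    }

    public static void main(String[] args) {
        // Made-up cell contents, only to show the pairing.
        for (Sample s : zip("0.1 0.25 0.7", "3 5 8")) {
            System.out.println(s);
        }
    }
}
```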

Receive Excel data and turn it into objects to format a JSON

I have a solution that helps me create a wizard to fill in some data and turn it into JSON. The problem now is that I have to receive an .xlsx file and turn specific data from it into JSON: not all the data, only the fields I want, which are documented in the last link.
Using this link: https://stackblitz.com/edit/xlsx-to-json I can access the Excel data and turn it into an object (when I print it with document.getElementById('output').innerHTML = JSON.parse(dataString); it shows [object Object]).
I want to implement this solution and automatically get the fields specified in config.ts, but I can't get it to work. For now, I have this in my HTML and app-component.ts:
https://stackblitz.com/edit/angular-xbsxd9 (it probably doesn't compile; it's only there to show the code)
It wasn't quite clear what you were asking, but I'm assuming that what you are trying to do is:
Given the data in the spreadsheet that is uploaded
Use a config that holds the list of column names you want returned in the JSON when the user clicks to download
Based on this, I've created a fork of your sample here -> Forked StackBlitz
What I've done is:
Use the map operator on the array returned from the sheet_to_json method.
Within the map, loop through each key of the record (each key being a column in this case).
If a column in the row is defined in the property map file (config), return it.
This approach strips out all the columns you don't care about up front, so that by the time the user clicks to download the file, only the columns you want are returned. If you need to keep the original columns, you can move this logic somewhere more convenient for you.
I also extended the property map a little to give you more granular control over how the data is formatted in the returned JSON, e.g. so that numbers are not treated as strings in the final output. You can use it as a template for any additional formatting, if it suits your needs.
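The per-row filtering described above is language-independent. The following is a minimal Java sketch of the same idea, not the code in the forked StackBlitz (which is TypeScript). It assumes each spreadsheet row arrives as a map from column header to cell text, and that the property map (a hypothetical enum-based stand-in for config.ts) lists the wanted columns with a target type. Unlisted columns are dropped, and NUMBER columns are converted so they are not emitted as strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of filtering spreadsheet rows through a property map. */
class ColumnFilter {

    /** Hypothetical target types a property-map entry can request. */
    enum Kind { STRING, NUMBER }

    /** Keeps only the columns listed in propertyMap; NUMBER columns become Double values. */
    static Map<String, Object> filterRow(Map<String, String> row, Map<String, Kind> propertyMap) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            Kind kind = propertyMap.get(cell.getKey());
            if (kind == null) {
                continue; // column not in the config: strip it
            }
            if (kind == Kind.NUMBER) {
                out.put(cell.getKey(), Double.valueOf(cell.getValue()));
            } else {
                out.put(cell.getKey(), cell.getValue());
            }
        }
        return out;
    }

    /** Applies filterRow to every row, like mapping over the array of records. */
    static List<Map<String, Object>> filterRows(List<Map<String, String>> rows, Map<String, Kind> propertyMap) {
        return rows.stream().map(r -> filterRow(r, propertyMap)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("Name", "Ann");
        row.put("Age", "42");
        row.put("Notes", "not wanted");

        Map<String, Kind> propertyMap = new LinkedHashMap<>();
        propertyMap.put("Name", Kind.STRING);
        propertyMap.put("Age", Kind.NUMBER);

        System.out.println(filterRow(row, propertyMap)); // {Name=Ann, Age=42.0}
    }
}
```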
Hope it helps.

When I send a CSV file to MarkLogic, why doesn't it overwrite the previous document?

I am sending the following CSV file to MarkLogic:
id,first_name,last_name,email,country,ip_address
5,Shawn,Grant,sgrant0#51.la,Liberia,37.194.161.124
5,Joshua,Fields,jfields1#godaddy.com,Colombia,54.224.238.176
5,Johnny,Bell,jbell2#t.co,Finland,159.38.61.122
through MLCP, using the following command:
C:\mlcp-9.0.3\bin>mlcp.bat import -host localhost -port 9636 -username admin -password admin -input_file_path D:\test.csv -input_file_type delimited_text -document_type json
What happened?
When I checked Query Console, I had one JSON document with the following information:
id,first_name,last_name,email,country,ip_address
5,Shawn,Grant,sgrant0#51.la,Liberia,37.194.161.124
What did I expect?
By default, the first column of the CSV is used as the document URI when creating the JSON/XML documents. Since I am sending 3 rows with the same id, the document should contain the latest information (i.e. the 3rd row), right?
My assumption:
Since I am sending all three rows at once through MLCP, we can't say which one reaches the MarkLogic database first.
Let me know whether my assumption is right or wrong.
Thanks
MLCP is designed to be as fast as possible. For CSV files, it processes the rows using many threads (and even splits the file if you pass the split option). As a result, there is no guarantee that the rows are processed in any particular order. You may be able to tune MLCP's settings to use a single thread and not split the file to get the result you want, but in that case you lose some of the power of MLCP.
Second, an observation: from how I interpret your problem statement, you are adding quite a bit of overhead by inserting and overwriting unneeded documents. Why not sort and filter your initial CSV file down to one record per ID and save your computer from doing more work?
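Filtering the CSV to one record per ID before running mlcp can be a small pre-processing step. A minimal Java sketch, assuming the id is the first column, the file has a header row, and no field contains a quoted comma. It keeps the last row seen for each id, which matches the "latest information" the question expects. The class name and file names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-processing step: one row per id, last occurrence wins. */
class DedupeById {

    /** Returns the header line followed by the last row seen for each id (first column). */
    static List<String> lastRowPerId(List<String> lines) {
        List<String> out = new ArrayList<>();
        if (lines.isEmpty()) {
            return out;
        }
        out.add(lines.get(0)); // keep the header row
        // put() on an existing key replaces the value but keeps the key's first position
        Map<String, String> byId = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String id = line.split(",", 2)[0]; // naive: assumes no quoted commas
            byId.put(id, line);
        }
        out.addAll(byId.values());
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical file names): java DedupeById test.csv test-deduped.csv
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), lastRowPerId(lines));
    }
}
```

The deduplicated file can then be passed to the same mlcp import command, and each id produces exactly one document regardless of thread order.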

Editing JSON - Add Attribute

I'm getting dumps of a slew of JSON files, each containing data from the day/period it was pulled. Most of the JSON files I'm dealing with are a lot larger than this one, but I figured a smaller one would be easier to work with.
{"playlists":[{"uri":"spotify:user:11130196075:playlist:1Ov4b3NkyzIMwfY9E8ixpE","listeners":366,"streams":386,"dateAdded":"2016-02-24","newListeners":327,"title":"#Covers","owner":"Saga Prommeedet"},{"uri":"spotify:user:mickeyrose30:playlist:2Ov4b3NkyzIMwfY9E8ixpE","listeners":229,"streams":263,"dateAdded":"removed","newListeners":154,"title":"bestcovers2016","owner":"Mickey Rose"}],"top":2,"total":53820}
What I'm essentially trying to do is add a date attribute to each row of data, so that when I combine multiple JSON files and put them through an analytical tool, each row of data is associated with the correct date. My first thought was to write it like this:
{"playlists":[{"uri":"spotify:user:11130196075:playlist:1Ov4b3NkyzIMwfY9E8ixpE","listeners":366,"streams":386,"dateAdded":"2016-02-24","newListeners":327,"title":"#Covers","owner":"Saga Prommeedet"},{"uri":"spotify:user:mickeyrose30:playlist:2Ov4b3NkyzIMwfY9E8ixpE","listeners":229,"streams":263,"dateAdded":"removed","newListeners":154,"title":"bestcovers2016","owner":"Mickey Rose"}],"top":2,"total":53820,"date":072617}
My reasoning is that the "top" and "total" attributes already show up on each row of data (with their values repeated on each row) when I put the file through an analytical tool like Tableau.
I have also been editing and saving the files in Brackets, and testing things with this converter (https://konklone.io/json/).
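In JavaScript:
var m = JSON.parse(json_string);   // json_string holds the contents of one dump
m["date"] = "20170804";            // top-level attribute, alongside "top" and "total"
// or, to tag every playlist row instead:
// m.playlists.forEach(function (p) { p.date = "20170804"; });
var output = JSON.stringify(m);    // serialize back to a JSON string and save it
This is a very simple way to do it.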

Random selection from CSV file in JMeter

I have a very large CSV file (8000+ items) of URLs that I'm reading with a CSV Data Set Config element. It populates the path of an HTTP Request sampler, and a While Controller iterates through it.
This works, except that I want each user (thread) to pick a random URL from the CSV URL list; I don't want each thread consuming the CSV items sequentially.
I was able to achieve this with a Random Order Controller containing multiple HTTP Request samplers; however, 8000+ HTTP samplers bogged JMeter down to an unusable state, which is why I put the sampler URLs in the CSV file. It doesn't appear that I can use the Random Order Controller with CSV file data, though. So how can I achieve random CSV item selection per thread?
There is another way to achieve this:
create a separate thread group;
then, depending on what you want to achieve, either:
add a (random) loop count -> this sets a start offset for the thread group that does the work, or
add a loop count (or Forever) and a timer, and let it loop while the other thread group is running; the working thread group then reads a 'pseudo'-random line.
It's not really random, since the file is still read sequentially, but your working threads jump around in the file. It worked for me ;-)
There's no option for random selection when reading CSV data. The reason is that you would need to read the whole file into memory first, and that's a bad idea with a load test tool (any load test tool).
Other commercial tools solve this problem by automatically re-processing the data. In JMeter you can achieve the same manually by simply sorting the data on an arbitrary field: if you sort by, say, surname, the result is effectively a random distribution.
Note: if you keep the CSV Data Set Config's sharing mode at its default, All threads, the data will be unique within the scope of the JMeter process.
The new Random CSV Data Set Config plugin from BlazeMeter should fit your needs perfectly.
As other answers have stated, the reason you can't select a line at random is that you would have to read the whole file into memory, which is inefficient.
Rather than trying to get JMeter to handle this on the fly, why not just randomise the file order itself before you start the test?
A scripting language such as Perl makes short work of this:
cat unrandom.csv | perl -MList::Util=shuffle -e 'print shuffle<STDIN>' > random.csv
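If Perl is not available, the same pre-test shuffle can be done in Java with Collections.shuffle. A minimal sketch, assuming the file fits in memory (reading it once before the test is fine; the concern above is about doing so during the load test). The keepHeader flag is a hypothetical addition for files whose first line holds column names; main passes false to mirror the Perl one-liner. File names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical pre-test step: writes the CSV lines back in random order. */
class ShuffleCsv {

    /** Returns a shuffled copy of lines; if keepHeader is true, line 0 stays first. */
    static List<String> shuffled(List<String> lines, boolean keepHeader, Random rnd) {
        int start = (keepHeader && !lines.isEmpty()) ? 1 : 0;
        List<String> body = new ArrayList<>(lines.subList(start, lines.size()));
        Collections.shuffle(body, rnd); // uniform random permutation of the data lines
        List<String> out = new ArrayList<>(lines.subList(0, start));
        out.addAll(body);
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical file names): java ShuffleCsv unrandom.csv random.csv
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), shuffled(lines, false, new Random()));
    }
}
```

Passing a seeded Random makes a shuffled order reproducible across test runs, which can help when comparing results.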
For my case:
single column
small dataset
Non-changing CSV
I dropped the CSV entirely, following https://stackoverflow.com/a/22042337/6463291, and used a BeanShell PreProcessor instead, something like this:
import java.util.Random;
// values that previously lived in the CSV file
String[] query = new String[]{"csv_element1", "csv_element2", "csv_element3"};
Random random = new Random();
int i = random.nextInt(query.length);   // random index from 0 to query.length - 1
vars.put("randomOption", query[i]);     // read it in samplers as ${randomOption}
Performance seems OK; if you have the same situation, you can try this out.
I'm not sure this will work, but I'll suggest it anyway.
Why not divide your URLs into 100 different CSV files? Then, in each thread, generate a random number and use it to pick which CSV file to read with the __CSVRead function:
http://jmeter.apache.org/usermanual/functions.html#__CSVRead
The only part I'm not sure about is whether the __CSVRead function reopens the file every time or shares the same file handle across threads.
You may want to try it. Please share your findings.
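Splitting the URL list is a one-off step before the test. A minimal Java sketch, assuming the source file has no header row and using a hypothetical naming scheme (urls_0.csv through urls_99.csv), with lines distributed round-robin so the files differ in size by at most one line. Whether __CSVRead then behaves as hoped remains the open question above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-test step: splits one URL list into several smaller CSV files. */
class SplitCsv {

    /** Distributes lines over `parts` buckets round-robin, so bucket sizes differ by at most one. */
    static List<List<String>> splitRoundRobin(List<String> lines, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int i = 0; i < lines.size(); i++) {
            buckets.get(i % parts).add(lines.get(i));
        }
        return buckets;
    }

    /** Writes bucket i to dir/prefix_i.csv (hypothetical naming scheme). */
    static void writeParts(Path dir, String prefix, List<List<String>> buckets) throws IOException {
        for (int i = 0; i < buckets.size(); i++) {
            Files.write(dir.resolve(prefix + "_" + i + ".csv"), buckets.get(i));
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): java SplitCsv urls.csv outputDir
        List<String> urls = Files.readAllLines(Paths.get(args[0]));
        writeParts(Paths.get(args[1]), "urls", splitRoundRobin(urls, 100));
    }
}
```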
A much more straightforward solution:
Open the CSV file in a spreadsheet such as Excel and add another column (say B).
Enter the =RAND() function in the first cell of column B (say B1). This generates a random floating-point number.
Drag the corner of B1 down to apply the formula to all the corresponding URLs.
Sort by column B.
Your URLs are now in random order.
Delete column B.