I've trained a classifier in Weka, and I'm able to use it on test data. Additionally, I can opt to display the classifier's predictions in the log window for this test data.
However, for my current project, it would be convenient to get this data in CSV format. Is this possible in Weka? Is it only possible from the command line (something I'll eventually move towards)?
I could always save the entire buffer result to a text file, but in that case, I would have to parse the file and remove all the "noise" (which isn't really noise, but you get the point).
So, to conclude, is there any way to output Weka's predictions for a test set to a CSV file ?
Edit: as the answer below shows, there is an option to do this. However, it is only available in Weka 3.7 and above!
I assume you use Weka's Explorer. In the Classify tab, click on More options..., then click on Output predictions and select CSV. Now click on the box showing CSV, and a window opens where you can fill in the properties for writing to a CSV file. Click on outputFile, select a folder, and type a filename (note: you must supply a filename). Running a new test will now save the prediction results to your CSV file.
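If you eventually move to the command line, the same CSV prediction output should be reachable there in Weka 3.7+. A sketch from memory (the classifier, file names, and exact option syntax are assumptions; verify them against your version's documentation):

java weka.classifiers.trees.J48 -t train.arff -T test.arff -classifications "weka.classifiers.evaluation.output.prediction.CSV -file predictions.csv"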
In Weka 3.6.x you can right-click your model, choose "Visualize classifier errors" and Save the data (including the prediction) from there.
If you use the Weka KnowledgeFlow to build models (easier than the Explorer), there are CSV data sinks you can use to save the results as a CSV file.
How can I edit a large JSON manually?
I have a large JSON file, about 100 MB. I'd like to manually inspect some attributes, and then add more attributes to some of the objects.
I'd start off by looking at a subset of the file, say, the first 100 objects, and gradually scale up to maybe 250, then a thousand, etc.
Can someone suggest a language or software (I'm running Windows) that excels at this task?
Some previous suggestions that aren't working or can't work:
Sublime - Could never load the file. Loading bar forever. Had to kill.
NotePad++ - Could never load. Froze. Had to kill.
Anything online - The data is confidential.
More Python and Jupyter information.
import json

with open(path, 'r') as f:
    data = json.load(f)

for i, (k, v) in enumerate(data.items()):
    print(i, k, v)
    if i == 2:
        break
This causes an error. I think it has to do with Jupyter, but I'm not sure:
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.
Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
That makes me wonder if going about it this way is just dumb.
Possible Solutions
Build a custom app using TKinter
Just don't use a Jupyter Notebook
What you can do is write a simple GUI program. Use Tkinter to create a window with a text area to show the JSON, a text box where you input how many objects you want to see, a button named Next (or something similar) to page forward, and one more button to save. The functionality for each of these items is described below; a sketch follows the list.
First, you read the complete JSON in Python and make it a dict.
Next button - This keeps iterating based on the value in the text box. You could write a custom generator that yields the required number of values at a time.
Save button - This saves the current JSON into a new file, or, if you can, you could write a function to update the current JSON directly.
Text area - This takes the dictionary, converts it to JSON, and shows the output from the Next button's generator.
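A minimal sketch of this approach, assuming the file fits in memory; names like PATH, show_next, and the output filename are illustrative, so treat it as a starting point rather than a finished tool:

import json
import tkinter as tk

PATH = "data.json"  # hypothetical input path

# Read the complete JSON into a dict up front.
with open(PATH, "r") as f:
    data = json.load(f)

def chunks(d):
    # A custom generator: yield one (key, value) pair at a time so the
    # Next button can keep paging through the dict across clicks.
    for item in d.items():
        yield item

gen = chunks(data)

root = tk.Tk()
root.title("JSON pager")

count_box = tk.Entry(root)  # how many objects to show per click
count_box.insert(0, "100")
count_box.pack()

text = tk.Text(root, width=100, height=30)
text.pack()

def show_next():
    # Clear the text area and show the next n objects from the generator.
    n = int(count_box.get())
    text.delete("1.0", tk.END)
    for _, (k, v) in zip(range(n), gen):
        text.insert(tk.END, json.dumps({k: v}, indent=2) + "\n")

def save():
    # Save the (possibly modified) dict to a new JSON file.
    with open("edited.json", "w") as f:
        json.dump(data, f, indent=2)

tk.Button(root, text="Next", command=show_next).pack()
tk.Button(root, text="Save", command=save).pack()

root.mainloop()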
If you are using Linux (or have an opportunity to transfer the file to a *nix machine), you might wish to check the number of lines in the file via
wc -l myfile.json
Let's say, for the purpose of simplicity, that your file has 2,530,000 lines and you wish to split it into chunks of 100k lines each. You can use any of the commands available in your distro to split the file into the desired chunks and then edit them, one by one.
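For example, with GNU coreutils (a sketch; check man split for your system's options):

split -l 100000 myfile.json myfile_part_

This writes 100k-line chunks named myfile_part_aa, myfile_part_ab, and so on.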
If you are comfortable with going the "Linux way", check out some of the hints given on other topics, e.g.
edit multi-GB file when vi editor doesn't work
I hope it helps!
The only viewer I have used that works on large files (I have had files up to 250 MB) is Dadroit. It is fast to view and comes with search.
Now, to edit, I use vi: I search for the location and make local edits. Vim or another simple editor should work on Windows. Have you tried VS Code? 100 MB shouldn't be too large for it.
Another awesome terminal tool for viewing and editing data is VisiData. I have had mixed luck with it working on JSON files.
Not the best answer, but the problem with reading the JSON seems limited to Jupyter notebooks (or perhaps to the limitations of my laptop).
Working in Spyder or running from the command line circumvents the Jupyter error mentioned in the original question.
It'd be great if someone knew how to tweak Jupyter to avoid this problem (sorry, I'm not sure how yet).
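Based on the config variable named in the error message itself, one workaround is to raise the limit when starting the server:

jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10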
For an editor, try Notepad++.
For a language, try Python.
Since you haven't given your data structure, I can't give a more specific answer.
I've been using Pentaho Data Integration lately, and I currently intend to use it in a project I'm on. The help I'm looking for is the following:
There can be a variable number of CSV file inputs in a folder.
Is there a way to get all the .csv files (which operator or series of operators) using Pentaho?
After this step I believe what I have to do is pretty simple, as I only have to merge those files together.
Thanks
Use the Text File Input step. It allows selecting folders using a regular expression and can handle CSV files.
Add the "Get File Names" step before the "CSV file input" step. When the CSV step has input, then a field appears in the configuration dialog allowing you to get the filename from the incoming stream.
I'm trying to get geometry data from a large quantity of shapefiles into a database (Google Datastore). The thing is, I don't need to work with maps; I just need the numerical coordinates. Ideally I'd like CSV, but any plain text would be workable. I have a Mac and have been able to get QGIS installed (I also tried uDig, but the interface was baffling). While it is easy to load a .shp file into QGIS as a vector layer, I'm lost as to how to export the geometry, or even whether it is possible.
Does anyone know how to extract plain text geometry from a shp file? Ideally with QGIS, but any method would be appreciated.
The "You can simply right-click the layer entry in QGIS and select "Save as"" approach was right
But the "GEOMETRY=AS_WKT" in the OGR layer option was missing.
I may also be a good idea to convert the coordinate system to WGS 84, as CSV are usually expected not to be projected (and shapefile sometimes are)
You can simply right-click the layer entry in QGIS and select "Save as".
In the dialog, there's an option to save as "CSV".
There are plenty of options to refine the format of the generated CSV file, as well as there are many other file formats to choose from.
Update:
See here for a solution: https://gis.stackexchange.com/a/8846
Outdated Response:
It is possible, in a sort of roundabout way...
Open the attribute table for the layer you want to save.
Select all rows.
Copy the rows.
Paste them into a spreadsheet.
Save the spreadsheet as a CSV.
Unfortunately there is no way to do this directly in QGIS.
See here for more details:
https://gis.stackexchange.com/questions/8844/get-list-of-coordinates-for-points-in-a-layer/8911#8911
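Since any method would do, here is a minimal script-based sketch using the pyshp library (an assumption, not something mentioned above; pip install pyshp) that dumps point coordinates to CSV:

import csv
import shapefile  # pyshp

# Hypothetical file names; adjust to your data.
reader = shapefile.Reader("parcels.shp")

with open("coords.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["shape_id", "x", "y"])
    for i, shape in enumerate(reader.shapes()):
        # shape.points is a list of (x, y) coordinate pairs.
        for x, y in shape.points:
            writer.writerow([i, x, y])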
I am trying to export all the plots of my NetLogo model after simulation runs in CSV format with the primitive export-all-plots.
I haven't yet found a way to open this CSV file with an external reader in order to get clearer plots. I tried gnuplot, but it looks like it's not able to open the CSV format created by NetLogo:
"export-plots data (NetLogo 5.0.5)"
^
"C:\results\interface.csv", line 1: invalid command
How can I open csv plots with an external reader?
There are a few complicating factors about NetLogo's plot export format. First, there's a three-line header at the beginning (plus an empty line after) that just gives information about the model and when the data was generated. Next, there's data about the model settings and the plot state (pen colors and such). Finally, there's the data itself, which is itself somewhat complicated by the fact that you can have multiple pens per plot. So I'm not surprised gnuplot couldn't read it as is.
The tables are quite easy to use in a GUI spreadsheet application, like Excel, LibreOffice Calc, or Gnumeric. You can just select the data you want and generate the plots.
To do this at the command line, I'm afraid you might have to write a script to read the file in. This should be pretty easy in something like Python or R: just skip the metadata lines and use a CSV parser to read in the rest.
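A minimal sketch of that approach in Python (the file name and the number of metadata lines to skip are assumptions; inspect your own export to find where the plot data actually starts):

import csv

SKIP = 4  # e.g. the three header lines plus the blank line; adjust after inspecting the file

with open("interface.csv", newline="") as f:
    for _ in range(SKIP):
        next(f)  # skip the metadata described above
    rows = list(csv.reader(f))

for row in rows[:5]:
    print(row)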
You might also try using BehaviorSpace to generate the data, but make sure to use the table output. It lets you generate the data from many runs at once, and the format is a little more consistent. There are still six lines of metadata at the top, but you can just delete those. I believe this is more the standard practice in NetLogo.
I want to create .csv files with the Report Generation Toolkit in Labview.
They must actually be .csv files which can be opened with Notepad or something similar.
Creating a .csv is not that hard, it's just a matter of adding the extension to the file name that's going to be created.
If I create a .csv file this way, it opens nicely in Excel, just the way it should, but if I open it in Notepad it shows all kinds of characters and doesn't even come close to the data I wrote to the file.
I create the files with the Labview code below:
Link to image (can't post the image yet because I've got too few points)
I know .csv files can be created with the Write to Spreadsheet VI but I would like to use the Report Generation Toolkit because it's pretty easy to add columns and rows to the file and that is something I really need.
You can use the Robust CSV package on the lavag.org forum to read and write 2D arrays to and from CSV files.
http://lavag.org/files/file/239-robust-csv/
Calling a file "csv" does not make it a CSV file. I have never used the toolkit to generate an Excel file, but I'm assuming it creates an XLS or XLSX file regardless of what extension you give it, which is why you're seeing gibberish (probably XLS, since it has been around for a while, and I believe XLSX is XML, not binary).
I'm not sure what your problem is with the Write to Spreadsheet VI. It has an append input, so I assume you can use that to at least add rows directly to a file, although I can't say I've ever tried it. I would prefer handling all the data in memory explicitly, where you can easily use the array functions to add rows or columns to the array and then overwrite the entire file.