D3, DC, CSV: How to stop scientific notation from rounding numbers in JS

I'm loading multiple .csv files, which I then merge on the basis of a common ID. In some cases, the ID is an integer > 12000000000000.
Initially I was loading the files as comma-separated CSV, but I switched to DSV in the hope of fixing the problem. I'm using queue.js, so my loading code looks like this:
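queue()
    .defer(d3.dsv("|", "text/plain"), 'portfolio2.csv')
    .defer(d3.dsv("|", "text/plain"), 'ratings2.csv')
    .await(ready); // ready(error, portfolio, ratings): hypothetical callback that merges the two datasets (not shown)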
Although the files have a .csv extension, they are pipe-delimited and load fine.
I think the problem is that D3, JavaScript, or some part of my code loads the values in scientific notation and somehow overwrites the original data with a rounded number. Where I have an ID of 12000000110858, once I load the webpage, the number in the CSV file changes to 12000000000000. No line of code overwrites this ID field; it's as though the act of loading the file is enough by itself.
I've tried both comma- and pipe-delimited files, and both suffer from the same problem.
Of course, my app then falls over because it can't perform the matching. Is anyone familiar with this problem, and does anyone know of a solution?
Thanks for any help.
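A minimal sketch of one defensive approach, written in Python as a stand-in for the browser code: parse both pipe-delimited files and join them with the ID kept as a string, so no numeric conversion ever touches it. The `id` column name and the `merge_on_id` helper are hypothetical. For what it's worth, 12000000110858 is well below 2^53, so an IEEE 754 double (which is what a JavaScript number is) can store it exactly.

```python
import csv
import io
from typing import Dict, List


def merge_on_id(left_text: str, right_text: str, key: str = "id",
                delimiter: str = "|") -> List[Dict[str, str]]:
    """Inner-join two delimited tables on `key`, comparing the key as text."""
    left = csv.DictReader(io.StringIO(left_text, newline=""), delimiter=delimiter)
    right = csv.DictReader(io.StringIO(right_text, newline=""), delimiter=delimiter)
    # Index the right-hand rows by their raw ID string (last row wins on duplicates).
    by_id = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = by_id.get(row[key])  # exact string match: no parsing, so no rounding
        if match is not None:
            merged.append({**row, **match})
    return merged
```

Usage would be along the lines of `merge_on_id(portfolio_text, ratings_text)`, where the two arguments hold the contents of `portfolio2.csv` and `ratings2.csv`.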

Related

Edit a large JSON file

How can I edit a large JSON file manually?
I have a large JSON file, about 100 MB. I'd like to manually inspect some attributes, and then add more attributes to some of the objects.
I'd start off by looking at a subset of the file, for example the first 100 objects. I'd then gradually scale up to maybe 250, then a thousand, and so on.
Can someone suggest a language or software (I'm running Windows) that excels at this task?
Some previous suggestions that haven't worked or can't work:
Sublime - Could never load the file. Loading bar forever. Had to kill.
NotePad++ - Could never load. Froze. Had to kill.
Anything online - The data is confidential.
More Python and Jupyter information.
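import json

# `path` points to the ~100 MB JSON file
with open(path, 'r') as f:
    data = json.load(f)

# print the first three key/value pairs, then stop
for i, (k, v) in enumerate(data.items()):
    print(i, k, v)
    if i == 2:
        break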
This causes an error. I think it has to do with Jupyter, but I'm not sure:
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.
Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
That makes me wonder if going about it this way is just dumb.
Possible Solutions
Build a custom app using Tkinter
Just don't use a Jupyter Notebook
One option is to write a simple GUI program. Use Tkinter to create a window containing a text area that shows the JSON, a text box where you enter how many objects you want to see, a button (named Next or similar) to show the next batch, and another button to save. Each item works as follows:
First, read the complete JSON file in Python into a dict.
Next button: iterates through the data in steps of the number entered in the text box. You could write a custom generator that yields that many items at a time.
Save button: saves the current JSON to a new JSON file or, if you prefer, calls a function you write that updates the current JSON file directly.
Text area: takes the dictionary items yielded by the Next button's generator, converts them back to JSON, and displays them.
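The Next button and text area logic can be sketched without the GUI itself. This minimal Python version assumes the top level of the JSON is an object (a dict, as in the question's `data.items()` loop); `batches` and `render` are hypothetical helper names, and the Tkinter widgets are left out.

```python
import json
from itertools import islice
from typing import Any, Dict, Iterator, List, Tuple


def batches(data: Dict[str, Any], size: int) -> Iterator[List[Tuple[str, Any]]]:
    """Yield successive lists of at most `size` (key, value) pairs from `data`."""
    items = iter(data.items())
    while True:
        chunk = list(islice(items, size))
        if not chunk:
            return
        yield chunk


def render(chunk: List[Tuple[str, Any]]) -> str:
    """Turn one batch back into pretty-printed JSON for the text area."""
    return json.dumps(dict(chunk), indent=2)
```

In the GUI, the Next button's callback could keep one generator, e.g. `gen = batches(data, int(entry.get()))` for a hypothetical `Entry` widget named `entry`, and put `render(next(gen))` into the text area, handling `StopIteration` when the data runs out.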
If you are using Linux (or can transfer the file to a *nix machine), you can check the number of lines in the file with
wc -l myfile.json
Say, for simplicity, that your file has 2,530,000 lines and you want to split it into chunks of 100,000 lines each. You can use any of the tools available on your distro (split, for example) to cut the file into chunks of the desired size and then edit them one by one.
If you are comfortable going the "Linux way", check out some of the hints given in other topics, e.g.:
edit multi-GB file when vi editor doesn't work
I hope it helps!
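On Windows, where `wc` and `split` are typically not available out of the box, the same line-based split can be done in a few lines of Python. This is a minimal sketch with hypothetical helper names and output file names (`part_0000.json`, ...). Unless the file has one JSON object per line, the pieces will generally not be valid JSON on their own, so this suits viewing and hand-editing rather than loading.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List


def chunk_lines(lines: Iterable[str], lines_per_chunk: int) -> Iterator[List[str]]:
    """Group an iterable of lines into lists of at most `lines_per_chunk` lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, lines_per_chunk))
        if not chunk:
            return
        yield chunk


def split_file(src: str, lines_per_chunk: int, out_dir: str) -> List[str]:
    """Write numbered part files into `out_dir` and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    # newline="" keeps the original line endings untouched
    with open(src, "r", encoding="utf-8", newline="") as f:
        for index, chunk in enumerate(chunk_lines(f, lines_per_chunk)):
            part = out / f"part_{index:04d}.json"
            with open(part, "w", encoding="utf-8", newline="") as g:
                g.writelines(chunk)
            paths.append(str(part))
    return paths
```

With the numbers above, `split_file("myfile.json", 100000, "parts")` would produce 26 files, the last one holding the remaining 30,000 lines.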
The only viewer I have used that works on large files (I've had files up to 250 MB) is Dadroit. It is fast and includes search.
For editing, I use vi: I search for the location and make local edits. Vim or another simple editor should work on Windows. Have you tried VS Code? 100 MB shouldn't be too large for it.
Another great terminal tool for viewing and editing data is VisiData, although I have had mixed luck using it with JSON files.
Not the best answer, but the problem with reading the JSON seems to be limited to Jupyter Notebook (or perhaps to my laptop's limitations).
Working in Spyder or running from the command line circumvents the Jupyter error mentioned in the original question.
It'd be great if someone knew how to tweak Jupyter to avoid this problem (sorry, I'm not sure how yet).
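The IOPub error is triggered by the amount of output sent to the browser, and the message itself names the setting (`--NotebookApp.iopub_data_rate_limit`) that controls it. An alternative that leaves the configuration alone is simply to print less. The sketch below assumes that a few very large values are what floods the output; `preview_lines` is a hypothetical helper that keeps only a truncated `repr` of each value.

```python
from typing import Any, Dict, List


def preview_lines(data: Dict[str, Any], n: int = 3, max_chars: int = 200) -> List[str]:
    """Describe the first `n` key/value pairs, cutting each value's repr at `max_chars`."""
    lines = []
    for i, (k, v) in enumerate(data.items()):
        if i >= n:
            break
        text = repr(v)
        if len(text) > max_chars:
            # keep the start of the value and report how long it really was
            text = text[:max_chars] + f"... ({len(text)} chars)"
        lines.append(f"{i} {k} {text}")
    return lines
```

In the notebook, `print("\n".join(preview_lines(data)))` replaces the original loop.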
For an editor, try Notepad++.
For a language, try Python.
Since you haven't given your data structure, I can't give a more specific answer.

Puppet - CSV file header

I'm writing a Puppet (3.6.2) module that reads data fields from a CSV file via the extlookup function, and I cannot figure out how to tell extlookup that the first line is the header. Does extlookup support this? If not, can anyone recommend an external function I could import and use?
Thanks,
PS: Yes, I know about Hiera and keeping the data in YAML or JSON files, but my requirement is CSV files only.
Brandon
The behavior of extlookup() is pretty well documented. It makes no special provision for column headers, which are by no means an inherent feature of the CSV format. Indeed, if your header line is not readable as a data line, then your file is not CSV at all.
Supposing that your file is indeed valid CSV, the absolute simplest solution would be to ignore the issue. It presents a problem only if the first column heading duplicates an actual or potential data name. If it does not, then you will never look up or use the pseudo-value represented by the first row.
If your file in fact is not CSV on account of its first line, or if the first column name conflicts with a real data name, then it seems the next best alternative would be to just remove that line, or to avoid creating it in the first place. I don't see any reason why one of these should not be possible.
"I know about Hiera, and having the data in YAML or JSON files, but my requirement is CSV files only."
How sad. Do be aware that extlookup() has long been deprecated, and it was removed from Puppet 4.
I'm inclined to suggest you implement a translator from CSV to Hiera-friendly YAML, and use Hiera in your module. Alternatively, Hiera supports custom backends, and it's not too hard to write one. I am unaware of an existing CSV backend for Hiera, but you could write one. Ignoring a header line would then be under your control, and you would simultaneously achieve a measure of future-proofing.
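The CSV-to-YAML translator could start out as something like this minimal Python sketch. It assumes the extlookup-style layout (key in the first column, value or values in the remaining columns), drops the header line, and produces YAML by relying on the fact that JSON strings and arrays are also valid YAML. `csv_to_hiera_yaml` is a hypothetical name; treat the output as a starting point to check against your own Hiera setup.

```python
import csv
import io
import json
from typing import List


def csv_to_hiera_yaml(csv_text: str, skip_header: bool = True) -> str:
    """Translate extlookup-style CSV (key, value[, more values...]) into Hiera YAML."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    if skip_header and rows:
        rows = rows[1:]  # drop the column-heading line
    lines: List[str] = ["---"]
    for row in rows:
        if not row:
            continue  # skip blank lines
        key, values = row[0], row[1:]
        # one value column -> scalar; several -> list (assumed to mirror extlookup's multi-column case)
        value = values[0] if len(values) == 1 else values
        # JSON strings and arrays are also valid YAML, so json.dumps handles quoting
        lines.append(f"{json.dumps(key)}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"
```

Because the header handling lives in the translator, the CSV file can keep its heading row while Hiera only ever sees the data rows.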

"Inconsistent number of matrix lines compared to the number of labels" runtime exception error when importing large CSV file into Gephi

The full error is "java.lang.RuntimeException: java.lang.Exception: Inconsistent number of matrix lines compared to the number of labels."
I am trying to pull an adjacency matrix stored in a CSV file into Gephi so that I can use its modularity optimization tool and make a really slick chart of my data. I compiled the data in Excel (yes, it took forever) and saved it as CSV, and then I opened the file in Notepad and used Ctrl + H to replace all commas with semicolons (and saved it as a CSV file again). My dataset is 5,654 x 5,654 cells, not counting the labels. It is an r-neighborhood graph with r = .6299 (80th percentile and above).
I searched Google and Stack Overflow and found only one solution for my error message: remove all the spaces in the file. I used Ctrl + H again to remove all spaces, but I received the same error message when I tried to upload the "spaceless" CSV file. Just to double-check that saving it as CSV didn't cause an issue, I opened the CSV in Excel. The file opened correctly, but I do not have much experience with CSV files, so I do not know if anything was off. All the records seemed to be separated by semicolons instead of commas, and I did not see any spaces.
Is it the size of my file? I am currently struggling through learning some Python and R, and I would be open to creating this adjacency matrix CSV file in either of those environments and then feeding it to Gephi. I just need a dependable solution that works without bogging my computer down in Excel all afternoon and allows me to be the "slick graph superhero" of my office.
Not a direct answer to your problem, but there is also the Excel/CSV import spigot, in case it is useful. Otherwise, you could try importing the network with NodeXL, saving it in GraphML format, and then opening that file in Gephi.
A good tip from http://social-dynamics.org/gephi-faq/:
A. One thing to try is removing any extra spaces from your csv file. Sometimes these trip up the import. Open the csv file using a simple text editor like NotePad or TextEdit, and then use find/replace to remove any spaces. Save the adjacency matrix and then try importing it again.
Removing spaces helped me to fix the issue.
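Since the error complains that the number of matrix lines does not match the number of labels, it may help to build the file programmatically and check it before importing. A minimal Python sketch, assuming the layout Gephi's matrix import expects is a first row of labels preceded by an empty corner cell, followed by one row per node that starts with that node's label, with `;` as the delimiter as in the question. Both helper names are hypothetical.

```python
import csv
import io
from typing import List, Sequence


def adjacency_csv_text(labels: Sequence[str], matrix: Sequence[Sequence[float]],
                       delimiter: str = ";") -> str:
    """Render a square matrix: a label row with an empty corner cell, then one labelled row per node."""
    n = len(labels)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("matrix must be len(labels) x len(labels)")
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter=delimiter)
    writer.writerow([""] + list(labels))
    for label, row in zip(labels, matrix):
        writer.writerow([label] + list(row))
    return buf.getvalue()


def check_adjacency_text(text: str, delimiter: str = ";") -> List[str]:
    """List problems that could make matrix lines and labels disagree."""
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    if not rows:
        return ["file is empty"]
    labels, body = rows[0][1:], rows[1:]
    problems = []
    if len(body) != len(labels):
        problems.append(f"{len(body)} matrix lines but {len(labels)} labels")
    for line_no, row in enumerate(rows, start=1):
        if line_no > 1 and len(row) != len(labels) + 1:
            problems.append(f"line {line_no}: {len(row)} fields, expected {len(labels) + 1}")
        if any(" " in field for field in row):
            problems.append(f"line {line_no}: contains spaces")
    return problems
```

When saving the generated text, open the file with `newline=""` so the `\r\n` line endings written by the csv module are not doubled on Windows. A trailing blank line would show up in the check as an extra matrix line with 0 fields.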

export plots with netlogo

I am trying to export all the plots of my NetLogo model to CSV after simulation runs, using the export-all-plots primitive.
I haven't yet found a way to open this CSV file with an external program to get clearer plots. I tried gnuplot, but it doesn't seem able to read the CSV format NetLogo creates:
"export-plots data (NetLogo 5.0.5)"
^
"C:\results\interface.csv", line 1: invalid command
How can I open these CSV plot exports with an external program?
There are two complicating factors about NetLogo's plot export format. First, there's a three-line header at the beginning (plus an empty line after it) that just gives information about the model and when the data was generated. Next, there's data about the model settings and the plot state (pen colors and such). Finally, there's the data itself, which is somewhat complicated by the fact that you can have multiple pens per plot. So I'm not surprised gnuplot couldn't read it as is.
The tables are quite easy to use in a GUI spreadsheet application such as Excel, LibreOffice Calc, or Gnumeric. You can just select the data you want and generate the plots.
To do this at the command line, I'm afraid you might have to write a script to read it in. This should be pretty easy in something like Python or R. Just skip the metadata lines, and use a CSV parser to read in the rest.
You might also try using BehaviorSpace to generate the data, but make sure to use the table output. It lets you generate the data from many runs at once, and the format is a little more consistent. There are still 6 lines of metadata at the top, but you can just delete them. I believe this is the more standard practice in NetLogo.
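Following the "skip the metadata, then use a CSV parser" advice, here is a minimal Python sketch for BehaviorSpace table output. It assumes, as described above, that the first 6 lines are metadata and the next line is the column header; `read_table` is a hypothetical helper, and cells that parse as numbers become floats while everything else stays text. For `export-all-plots` files the number of leading lines differs and multiple pens complicate the layout, so the skip count would need adjusting by hand.

```python
import csv
import io
from typing import Dict, List


def _to_number(value: object) -> object:
    """Convert a cell to float when possible, otherwise return it unchanged."""
    try:
        return float(value)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return value


def read_table(text: str, metadata_lines: int = 6) -> List[Dict[str, object]]:
    """Skip the metadata lines at the top, then parse the rest as a CSV table."""
    f = io.StringIO(text, newline="")
    for _ in range(metadata_lines):
        f.readline()
    return [{key: _to_number(value) for key, value in raw.items()}
            for raw in csv.DictReader(f)]
```

The resulting list of dicts can then be handed to whatever plotting tool you prefer.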

Creating a CSV file with the Report Generation Toolkit in Labview

I want to create .csv files with the Report Generation Toolkit in LabVIEW.
They must be real .csv files that can be opened with Notepad or something similar.
Creating a .csv file is not that hard; it's just a matter of adding the extension to the name of the file that's going to be created.
If I create a .csv file this way, it opens nicely in Excel just as it should, but if I open it in Notepad, it shows all kinds of characters and doesn't even come close to the data I wrote to the file.
I create the files with the LabVIEW code below:
Link to image (I can't post images yet because I have too few points)
I know .csv files can be created with the Write to Spreadsheet VI, but I would like to use the Report Generation Toolkit because it makes it easy to add columns and rows to the file, which is something I really need.
You can use the Robust CSV package on the lavag.org forum to read and write 2D arrays to CSV files.
http://lavag.org/files/file/239-robust-csv/
Calling a file "csv" does not make it a CSV file. I've never used the toolkit to generate an Excel file, but I'm assuming it creates an XLS or XLSX file regardless of the extension you give it, which is why you're seeing gibberish. (XLS is a binary format, and XLSX, although XML-based, is stored as a ZIP archive, so either one looks like gibberish in Notepad.)
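To check what the toolkit actually wrote, you can look at the file's first bytes. A minimal Python sketch relying only on the well-known file signatures: XLSX files are ZIP archives starting with `PK\x03\x04`, and legacy XLS files are OLE2 compound files starting with `D0 CF 11 E0 A1 B1 1A E1`. A plain CSV has neither signature and normally contains no NUL bytes (a UTF-16 text file would, so treat "unknown binary" as a hint rather than proof). The function names are hypothetical.

```python
ZIP_MAGIC = b"PK\x03\x04"                        # XLSX files are ZIP archives
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary XLS (OLE2 compound file)


def sniff_spreadsheet(head: bytes) -> str:
    """Classify a file by its first bytes."""
    if head.startswith(ZIP_MAGIC):
        return "zip (likely xlsx)"
    if head.startswith(OLE2_MAGIC):
        return "ole2 (likely xls)"
    if b"\x00" not in head:
        return "text (possibly csv)"
    return "unknown binary"


def sniff_file(path: str, n: int = 512) -> str:
    """Read the first `n` bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_spreadsheet(f.read(n))
```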
I'm not sure what your problem is with the Write to Spreadsheet VI. It has an append input, so I assume you can use it to at least add rows directly to a file, although I can't say I've ever tried it. I would prefer to handle all the data in memory explicitly, where you can easily use the array functions to add rows or columns to the array, and then overwrite the entire file.
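The in-memory approach translates roughly to the following Python sketch, a text-language analogue of the LabVIEW array functions rather than LabVIEW code; the helper names are hypothetical. The table lives in a 2D list, rows and columns are added in memory, and the whole file is rewritten at the end.

```python
import csv
import io
from typing import List, Sequence

Table = List[List[str]]


def add_row(table: Table, row: Sequence[object]) -> None:
    """Append one row, converting each cell to text."""
    table.append([str(cell) for cell in row])


def add_column(table: Table, column: Sequence[object]) -> None:
    """Append one cell to every existing row; the column must match the row count."""
    if len(column) != len(table):
        raise ValueError("column length must equal the number of rows")
    for row, cell in zip(table, column):
        row.append(str(cell))


def to_csv_text(table: Table) -> str:
    """Render the whole table as CSV text, ready to overwrite the file."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(table)
    return buf.getvalue()
```

Writing the result with `open(path, "w", newline="")` overwrites the previous file in one step, so the file on disk is always a plain-text CSV that Notepad can read.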