Is there an easy way to save a Google Ngram result
http://books.google.com/ngrams/
as a csv?
So that I get a list like
1900 peace 500000 times
1901 peace 540000 times
and so on?
I downloaded their raw data but have no idea how to handle it. When I open those CSV files in OpenOffice, I can't even see a single word.
It can be done, and it's actually quite easy. Generate the graph you want on the Google Ngram viewer, then use your browser's function to show the page source code (this might be hidden under advanced or developer options). Then in the code (probably on line 297), you will find the data simply listed. You can use any word processor and/or spreadsheet software to clean up the data and export them as CSV.
No. You'd have to go to their raw datasets, where finding what you want would be daunting, or you could try Microsoft Research's N-Gram service.
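For the raw datasets mentioned above, here is a minimal sketch of filtering one downloaded file down to a single word and writing the year/word/count list the question asks for. It assumes the raw files are tab-separated text whose first three columns are the ngram, the year, and the match count; that layout is an assumption about the dataset version, so check a few lines of your file first. The function names and the sample rows are hypothetical.

```python
import csv
import io


def ngram_rows(lines, word):
    """Yield (year, word, match_count) for every row matching `word`.

    Assumes tab-separated rows whose first three columns are
    ngram, year, match_count (extra columns are ignored).
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] == word:
            yield int(fields[1]), fields[0], int(fields[2])


def write_csv(rows, out):
    """Write the filtered rows as comma-separated values with a header."""
    writer = csv.writer(out)
    writer.writerow(["year", "word", "count"])
    writer.writerows(rows)


# Tiny in-memory sample standing in for one of the downloaded files.
sample = io.StringIO(
    "peace\t1900\t500000\t1200\n"
    "peace\t1901\t540000\t1250\n"
    "war\t1900\t700000\t1300\n"
)
out = io.StringIO()
write_csv(ngram_rows(sample, "peace"), out)
print(out.getvalue())
```

With a real file, pass an open file object instead of the in-memory sample. Reading line by line keeps memory use low even when the file is far too large for a spreadsheet.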
In the Google Forms CSV download, in the US/UK/Ireland locale, options selected in multiple-response questions are separated by commas. But so are the questions, which makes the lines impossible to parse.
E.g.
Q1 could be a single-response question, and the Q2 options could be these three:
This, that
Those
Theirs, them, and other
In the CSV we might get
Q1, Q2
Response1, Those, Theirs, them, and other
which of course looks like four responses to Q2 rather than two.
HOWEVER in a European (Swedish) locale, the options are separated by semicolons, which is much easier to parse:
Q1,Q2
Response1, Those; Theirs, them, and other
SO: can we specify different separators and if so how?
I do not have programmatic access to the account, all I can get is a CSV download from the owner so I'm interested to see if there are settings I can ask them to use.
Details
Docs, Microsoft Windows.
I asked this question on the Google Forms community with no reply:
https://support.google.com/docs/thread/13554553?hl=en
There is a solution.
"I do not have programmatic access to the account, all I can get is a CSV download from the owner so I'm interested to see if there are settings I can ask them to use."
You just ask them to:
1. Temporarily change from the US/UK/Ireland locale to a European locale.
2. Download the CSV file.
3. Change back to the US/UK/Ireland locale.
4. Send you the new CSV file.
After you receive the file, adjust your own locale accordingly if you want to import it into a sheet; if not, you are done.
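If the file is processed with a script rather than a sheet, a minimal sketch of reading the semicolon-delimited download could look like this. It assumes the fields are quoted in the usual CSV way, so that commas inside an option do not split the field; the function name and the sample text are hypothetical.

```python
import csv
import io


def parse_responses(text, multi_columns, item_sep=";"):
    """Parse Forms-style CSV text; split the named columns on item_sep."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in multi_columns:
            # Each multiple-response cell becomes a list of selected options.
            row[col] = [item.strip() for item in row[col].split(item_sep)]
        rows.append(row)
    return rows


# Hypothetical download: fields quoted, options separated by semicolons.
sample = 'Q1,Q2\n"Response1","Those;Theirs, them, and other"\n'
for row in parse_responses(sample, ["Q2"]):
    print(row["Q1"], row["Q2"])
```

The csv module handles the commas between questions, and the second split only touches the columns you name, so an option such as "Theirs, them, and other" stays in one piece.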
Marikamitsos made me try once again, and I discovered that the US and Ireland locales now use the semicolon as the item delimiter.
That's good, but it would have been helpful if that change had been flagged somewhere.
I don't know when it happened.
How can I edit a large JSON manually?
I have a large JSON file, about 100 MB. I'd like to manually inspect some attributes, and then add more attributes to some of the objects.
I'd start off by looking at a subset of the file, say the first 100 objects. I'd then gradually scale up to maybe 250, then a thousand, etc.
Can someone suggest a language or software (I'm running Windows) that excels at this task?
Some previous suggestions that aren't working or can't work:
Sublime - Could never load the file. Loading bar forever. Had to kill.
NotePad++ - Could never load. Froze. Had to kill.
Anything online - The data is confidential.
More Python and Jupyter information.
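import json

# path: location of the JSON file (not shown)
with open(path, 'r') as f:
    data = json.load(f)
# Print only the first three key/value pairs
for i, (k, v) in enumerate(data.items()):
    print(i, k, v)
    if i == 2:
        break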
Causes an error. I think it has to do with Jupyter, but I'm not sure.
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.
Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
That makes me wonder if going about it this way is just dumb.
Possible Solutions
Build a custom app using Tkinter
Just don't use a Jupyter Notebook
What you can do is write a simple GUI program. Use Tkinter to create a window with a text area to show the JSON, a text box where you enter how many objects you want to see, a button named Next (or similar) to show the next batch, and one more button to save. The functionality of each item is as follows.
First, read the complete JSON in Python into a dict.
Next button: steps through the data according to the value in the text box. You could write a custom generator that yields as many objects as requested.
Save button: saves the current JSON to a new file or, if you can, write a function that updates the current JSON file directly.
Text area: take the dictionary yielded by the Next button's generator, convert it to JSON, and show it.
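The custom generator behind the Next button can be sketched without any GUI code. This is a minimal sketch, assuming the loaded JSON is a dict at the top level; the name batches and the sample data are hypothetical.

```python
import json
from itertools import islice


def batches(data, size):
    """Yield successive dicts holding at most `size` items of `data`."""
    items = iter(data.items())
    while True:
        chunk = dict(islice(items, size))
        if not chunk:
            return
        yield chunk


# Hypothetical stand-in for the dict returned by json.load().
data = {f"obj{i}": {"value": i} for i in range(5)}
pages = batches(data, 2)
first = next(pages)           # what the first "Next" click would show
first["obj0"]["note"] = "ok"  # edits reach `data`: values are shared, not copied
print(json.dumps(first, indent=2))
```

Because each batch holds references to the original objects, attributes added while browsing are already present in data when the Save button calls json.dump on it.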
If you are using Linux (or can transfer the file to a *nix machine), you can check the number of lines in the file with
wc -l myfile.json
Let's say, for simplicity, that your file has 2530000 lines and you wish to split it into chunks of 100k lines each. You can use any of the commands available in your distro (split, for example) to break the file into chunks of the desired size and then edit them one by one.
If you are comfortable with going the "Linux way", check out some of the hints given in other topics, e.g.
edit multi-GB file when vi editor doesn't work
I hope it helps!
The only viewer I have used that works on large files (I have had files of up to 250 MB) is Dadroit. It is fast and comes with search.
Now, to edit, I use vi. I search for the location and make local edits. Vim or another simple editor should work on Windows. Have you tried VS Code? 100 MB shouldn't be too large for it.
The other awesome terminal tool for viewing and editing data is VisiData. I have had mixed luck with it on JSON files.
Not the best answer, but the problem with reading the JSON seems limited to Jupyter Notebooks (or even the limitations of my laptop).
Working in Spyder or running from the command line circumvents the Jupyter error mentioned in the original question.
It'd be great if someone knew how to tweak Jupyter to avoid this problem (sorry, I'm not sure how yet).
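The error message in the question names the setting involved, so one route is to raise `--NotebookApp.iopub_data_rate_limit` when starting the notebook server. Another is to print less: if each value in the file is large, printing three of them in full may already exceed the limit. The following is a minimal sketch of a truncated preview; the function name, the width of 200 characters, and the sample data are assumptions for illustration.

```python
import json
from itertools import islice


def preview(data, count=3, width=200):
    """Return short one-line summaries of the first `count` items."""
    lines = []
    for i, (key, value) in enumerate(islice(data.items(), count)):
        text = json.dumps(value)
        if len(text) > width:
            # Keep only the start of very long values.
            text = text[:width] + "..."
        lines.append(f"{i} {key} {text}")
    return lines


# Hypothetical data in which each value is large.
data = {f"obj{i}": list(range(10_000)) for i in range(10)}
for line in preview(data):
    print(line)
```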
For an editor, try Notepad++.
For a language, try Python.
Since you haven't given your data structure, I can't give a more detailed answer.
I have a folder with hundreds of files that were saved in a format specific to one piece of software (in this case Qualisys Track Manager; the file format is .qtm).
This software has the option of exporting the files to another format such as TSV, MAT, C3D,...
My problem: I want to export all my files to TSV format, but the only way I know is to open the software and go to File->Export->To TSV. Doing this for hundreds of files is time-consuming, so I was thinking of writing a script that would open my files in the software and run the export automatically.
But I have no clue how to do this. I was thinking of writing a script in Notepad++ and running it in the command window, so that I would get all the files in TSV format.
[EDIT] After some research I think a batch script or a PowerShell script may help me, but I have no idea how to run the software's commands automatically, or if it is even possible... (I am using Windows 10)
It is highly likely that .qtm is a proprietary file format, which PowerShell or a batch script would not understand. Unless the file can be read in a known way (text, XML, etc.), they would not be able to convert it.
I googled it, and it seems QTM has a REST API. That would be your best chance. I'm not sure whether the documentation is publicly available; I didn't find it. I'd recommend contacting their support to ask for the REST API documentation, whether the REST API can handle this task, and sample code to get you started.
Then you can make REST API calls with Invoke-RestMethod in a loop from PowerShell.
I'm trying to get geometry data from a large quantity of shapefiles into a database (Google Datastore). The thing is, I don't need to work with maps, I just need the coordinates, so I would like just the numerical coordinates. Ideally I'd like to use CSV, but any plain text would be workable. I have a Mac and have been able to get QGIS installed (I also tried udig but the interface was baffling). While it is easy to load a shp file into QGIS as a vector layer, I'm lost as to how to export the geometry, or even if it is possible.
Does anyone know how to extract plain text geometry from a shp file? Ideally with QGIS, but any method would be appreciated.
The approach "You can simply right-click the layer entry in QGIS and select "Save as"" was right, but the "GEOMETRY=AS_WKT" OGR layer option was missing.
It may also be a good idea to convert the coordinate system to WGS 84, as CSV files are usually expected to be unprojected (and shapefiles sometimes are projected).
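Once the geometry has been exported as WKT, the numbers still have to be pulled out of strings such as "POINT (2.5 -1)". Here is a minimal sketch, assuming the export puts the geometry in a column named WKT and that the geometries are 2D; the attribute column and the sample rows are hypothetical.

```python
import csv
import io
import re

# Matches integers, decimals, and exponent notation, with optional sign.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def coordinates(wkt):
    """Return the (x, y) pairs found in a 2D WKT string, in order."""
    values = [float(n) for n in NUMBER.findall(wkt)]
    return list(zip(values[0::2], values[1::2]))


# Hypothetical export: a WKT column followed by one attribute column.
sample = 'WKT,name\n"POLYGON ((0 0,10 0,10 5,0 0))",lot A\n"POINT (2.5 -1)",well\n'
for row in csv.DictReader(io.StringIO(sample)):
    print(row["name"], coordinates(row["WKT"]))
```

This flattens every geometry into one list of points, so the boundaries between polygon rings or multi-part geometries are lost. If those matter, a WKT parser is a better fit than a regular expression.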
You can simply right-click the layer entry in QGIS and select "Save as".
In the dialog, there's an option to save as "CSV".
There are plenty of options to refine the format of the generated CSV file, and there are many other file formats to choose from.
Update:
See here for a solution: https://gis.stackexchange.com/a/8846
Outdated Response:
It is possible, in a sort of roundabout way...
1. Open the attribute table for the layer you want to save.
2. Select all rows.
3. Copy the rows.
4. Paste into a spreadsheet.
5. Save the spreadsheet as a CSV.
Unfortunately there is no way to do this directly in QGIS.
See here for more details:
https://gis.stackexchange.com/questions/8844/get-list-of-coordinates-for-points-in-a-layer/8911#8911
I am trying to export all the plots of my NetLogo model after simulation runs in a csv format with the primitive export-all-plots.
I haven't yet found a way to open this CSV file with an external reader in order to get clearer plots. I tried gnuplot, but it doesn't seem able to open the CSV format created by NetLogo:
"export-plots data (NetLogo 5.0.5)"
^
"C:\results\interface.csv", line 1: invalid command
How can I open csv plots with an external reader?
There are a few complicating factors in NetLogo's plot export format. First, there's a three-line header at the beginning (plus an empty line after it) that just gives information about the model and when the data was generated. Next, there's data about the model settings and the plot state (pen colors and such). Finally, there's the data itself, which is somewhat complicated by the fact that you can have multiple pens per plot. So I'm not surprised gnuplot couldn't read it as is.
The tables are quite easy to use in a GUI spreadsheet application like Excel, LibreOffice Calc, or Gnumeric. You can just select the data you want and generate the plots.
To do this at the command line, I'm afraid you might have to write a script to read it in. This should be pretty easy in something like Python or R. Just skip the metadata lines, and use a CSV parser to read in the rest.
You might also try using BehaviorSpace to generate the data, but make sure to use the table output. It lets you generate the data from many runs at once, and the format is a little more consistent. There are still 6 lines of metadata at the top, but you can just delete those. I believe this is closer to standard practice in NetLogo.
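Reading the BehaviorSpace table output in Python can be sketched as follows: skip the 6 metadata lines mentioned above and hand the rest to a CSV parser. The sample file contents and the column names in it are hypothetical, so substitute the names from your own experiment.

```python
import csv
import io
from itertools import islice


def read_table(lines, skip=6):
    """Skip `skip` metadata lines, then parse the rest as a CSV table."""
    body = islice(lines, skip, None)
    return list(csv.DictReader(body))


# Hypothetical table output: 6 metadata lines, a header row, then data.
sample = io.StringIO(
    '"BehaviorSpace results (NetLogo 5.0.5)"\n'
    '"model.nlogo"\n'
    '"experiment"\n'
    '"01/01/2014 12:00:00:000 +0000"\n'
    '"min-pxcor","max-pxcor","min-pycor","max-pycor"\n'
    '"-16","16","-16","16"\n'
    '"[run number]","[step]","count turtles"\n'
    '"1","0","100"\n'
    '"1","1","104"\n'
)
rows = read_table(sample)
steps = [int(r["[step]"]) for r in rows]
counts = [int(r["count turtles"]) for r in rows]
print(steps, counts)
```

The two lists can then be written out as plain columns for gnuplot or passed to any plotting library.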