Download CSV problem in Julia - ArgumentError: Symbol name may not contain \0

I'm trying to get a simple csv file from a url in Julia using Downloads and CSV without success.
This is what I've done so far:
using Downloads, CSV, DataFrames
url = "https://r-data.pmagunia.com/system/files/datasets/dataset-85141.csv"
f = Downloads.download(url)
df = CSV.read(f, DataFrame)
But I get the following error: ArgumentError: Symbol name may not contain \0
I've tried using normalizenames, but also without success:
f = Downloads.download(url)
df = CSV.File(f, normalizenames=true)
But then I get Invalid UTF-8 string as an error message.
When I simply download the file manually and read it from my PC with CSV.read, I get no errors.

The server is serving that file with Content-Encoding: gzip, i.e. the transferred data is compressed and the client is expected to decompress it. You can try this out yourself on the command line, since curl does not decompress by default:
$ curl https://r-data.pmagunia.com/system/files/datasets/dataset-85141.csv
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
However, if you pass the --compressed flag:
$ curl --compressed https://r-data.pmagunia.com/system/files/datasets/dataset-85141.csv
"time","Nile"
1871,1120
1872,1160
1873,963
[...]
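You can also check the response headers directly; this should show the header in question (a quick sanity check, output abbreviated):
$ curl -sI https://r-data.pmagunia.com/system/files/datasets/dataset-85141.csv | grep -i content-encoding
content-encoding: gzip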
Downloads.jl uses libcurl and I can't find much mention of handling of compressed content in the Downloads.jl repository.
To fix this for now, you can upgrade to v0.9.4 of CSV.jl, which handles gzipped CSV files transparently.
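If you are on an older version, updating is just a Pkg call; a sketch (the exact invocation depends on your environment):
using Pkg
Pkg.update("CSV")                       # bring CSV.jl up to the latest compatible version
# or pin the specific version mentioned above:
# Pkg.add(name="CSV", version="0.9.4")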
If updating is not an option, you can use CodecZlib.jl manually:
using Downloads, CSV, DataFrames, CodecZlib
url = "https://r-data.pmagunia.com/system/files/datasets/dataset-85141.csv"
f = Downloads.download(url)
df = open(fh -> CSV.read(GzipDecompressorStream(fh), DataFrame), f)
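If you'd rather avoid reopening the temporary file, a variant of the same idea is to decompress in memory; a sketch, assuming the file comfortably fits in RAM:
using Downloads, CSV, DataFrames, CodecZlib
url = "https://r-data.pmagunia.com/system/files/datasets/dataset-85141.csv"
bytes = transcode(GzipDecompressor, read(Downloads.download(url)))  # gunzip the raw download
df = CSV.read(bytes, DataFrame)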

Related

Opensmile: unreadable csv file while extracting prosody features from wav file

I am extracting prosody features from an audio file using the Windows version of openSMILE. It runs successfully and an output CSV is generated, but when I open the CSV it shows some rows that are not readable. I used this command to extract the prosody features:
SMILEXtract -C \opensmile-3.0-win-x64\config\prosody\prosodyShs.conf -I audio_sample_01.wav -O prosody_sample1.csv
And the output of the CSV is unreadable binary data.
I even tried the sample wave file given in the example audio folder of the openSMILE directory, and the output is the same (not readable). Can someone help me identify where the problem actually is, and how I can fix it?
You need to enable the csvSink component in the configuration file to make it work. The file config\prosody\prosodyShs.conf that you are using does not have this component defined and always writes binary output.
You can verify that it is the standard binary output this way: omit the -O parameter from your command so it becomes SMILEXtract -C \opensmile-3.0-win-x64\config\prosody\prosodyShs.conf -I audio_sample_01.wav, and execute it. You will get an output.htk file which is exactly the same as prosody_sample1.csv.
How do you output CSV? You can take a look at the example configuration in opensmile-3.0-win-x64\config\demo\demo1_energy.conf, where a csvSink component is defined.
You can find more information in the official documentation:
Get started page of the openSMILE documentation
The section on configuration files
Documentation for cCsvSink
This is how I solved the issue. First I added the csvSink component to the list of component instances:
instance[csvSink].type = cCsvSink
Next I added the configuration parameters for this instance.
[csvSink:cCsvSink]
reader.dmLevel = energy
filename = \cm[outputfile(O){output.csv}:file name of the output CSV file]
delimChar = ;
append = 0
timestamp = 1
number = 1
printHeader = 1
\{../shared/standard_data_output_lldonly.conf.inc}
Now if you run this file it will throw errors, because reader.dmLevel = energy depends on waveframes. So the final changes would be:
[energy:cEnergy]
reader.dmLevel = waveframes
writer.dmLevel = energy
[int:cIntensity]
reader.dmLevel = waveframes
[framer:cFramer]
reader.dmLevel=wave
writer.dmLevel=waveframes
Further reference on how to write openSMILE configuration files can be found here.

error finding and uploading a file in octave

I tried converting my .csv file to .dat format and loading the file into Octave. It throws an error:
unable to find file filename
I also tried to load the file in .csv format using the syntax
x = csvread(filename)
and it throws the error:
'filename' undefined near line 1 column 13.
I also tried opening the file in the editor and loading it from there, and now it shows me:
warning: load: 'filepath' found by searching load path
error: load: unable to determine file format of 'Salary_Data.dat'.
How can I load my data?
>> load Salary_Data.dat
error: load: unable to find file Salary_Data.dat
>> Salary_Data
error: 'Salary_Data' undefined near line 1 column 1
>> Salary_Data
error: 'Salary_Data' undefined near line 1 column 1
>> Salary_Data
error: 'Salary_Data' undefined near line 1 column 1
>> x = csvread(Salary_Data)
error: 'Salary_Data' undefined near line 1 column 13
>> x = csvread(Salary_Data.csv)
error: 'Salary_Data' undefined near line 1 column 13
>> load Salary_Data.dat
warning: load: 'C:/Users/vaith/Desktop\Salary_Data.dat' found by searching load path
error: load: unable to determine file format of 'Salary_Data.dat'
>> load Salary_Data.csv
warning: load: 'C:/Users/vaith/Desktop\Salary_Data.csv' found by searching load path
error: load: unable to determine file format of 'Salary_Data.csv'
Salary_Data.csv
YearsExperience,Salary
1.1,39343.00
1.3,46205.00
1.5,37731.00
2.0,43525.00
2.2,39891.00
2.9,56642.00
3.0,60150.00
3.2,54445.00
3.2,64445.00
3.7,57189.00
3.9,63218.00
4.0,55794.00
4.0,56957.00
4.1,57081.00
4.5,61111.00
4.9,67938.00
5.1,66029.00
5.3,83088.00
5.9,81363.00
6.0,93940.00
6.8,91738.00
7.1,98273.00
7.9,101302.00
8.2,113812.00
8.7,109431.00
9.0,105582.00
9.5,116969.00
9.6,112635.00
10.3,122391.00
10.5,121872.00
Ok, you've stumbled through a whole pile of issues here.
It would help if you didn't give us error messages without the commands that produced them.
The first message means you were telling Octave to open something called filename and it couldn't find anything called filename. Did you define the variable filename? Your second command and the error message suggest you didn't.
Do you know what Octave's working directory is? Is it the same as where the file is located? From the response to your load commands, I'd guess not. The file is located at C:/Users/vaith/Desktop. Octave's working directory is probably somewhere else.
(Try the pwd command and see what it tells you. Use the file browser or the cd command to navigate to the same location as the file. The help pwd and help cd commands would also provide useful information.)
The load command, used as a command (load file.txt) can take an input that is or isn't defined as a string. A function format (load('file.txt') or csvread('file.txt')) must be a string input, hence the quotes around file.txt. So all of your csvread input commands thought you were giving it variable names, not filenames.
Last, the fact that load couldn't read your data isn't overly surprising. Octave is trying to guess what kind of file it is and how to load it. I assume you tried help load to see what the different command options are? You can give it different options to help Octave figure it out. If it actually is a csv file though, and is all numbers not text, then csvread might still be your best option if you use it correctly. help csvread would be good information for you.
It looks from your data like you have a header line that is probably confusing the load command. For data formatted this simply, the csvread command can bring it in. It will replace your header text with zeros.
So, first, navigate to the location of the file:
>> cd C:/Users/vaith/Desktop
then open the file:
>> mydata = csvread('Salary_Data.csv')
mydata =
0.00000 0.00000
1.10000 39343.00000
1.30000 46205.00000
1.50000 37731.00000
2.00000 43525.00000
...
If you plan to reuse the filename, you can assign it to a variable, then open the file:
>> myfile = 'Salary_Data.csv'
myfile = Salary_Data.csv
>> mydata = csvread(myfile)
mydata =
0.00000 0.00000
1.10000 39343.00000
1.30000 46205.00000
1.50000 37731.00000
2.00000 43525.00000
...
Notice how the filename is stored and used as a string with quotation marks, but the variable name is not. Also, csvread converted the non-numeric header data to zeros. The help for csvread and dlmread shows you how to change it to something other than zero, or to skip a certain number of rows. If you want to preserve the text, you'll have to use some other input function.
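For example, textscan can keep the header as text and read the numbers separately; a sketch against the same Salary_Data.csv:
>> fid = fopen('Salary_Data.csv', 'r');
>> header = strsplit(fgetl(fid), ',');              % header row as text: {'YearsExperience', 'Salary'}
>> cols = textscan(fid, '%f%f', 'Delimiter', ',');  % numeric columns as a cell array
>> fclose(fid);
>> years = cols{1}; salary = cols{2};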

Input error: Expected '--nodes' to have at least 1 valid item, but had 0 []

I've read plenty of articles about this issue on here, but I still can't seem to get around this issue. I've been trying to use Neo4j-import on some large genome data CSVs I have, but it doesn't seem to recognise the files. My command line input is as follows:
user@LenovoPC ~/.config/Neo4j Desktop/Application/neo4jDatabases/database-2f182948-e170-45b1-b9f4-19d236ff5d43/installation-3.5.1 $ \
bin/neo4j-import --into data/databases/graph.db --id-type string \
--nodes:Allele variants.csv --nodes:Chromosome chromosome.csv --nodes:Phenotype phenotypes.csv \
--nodes:Sample samples.csv --relationships:BELONGS_TO variant_chromosomes.csv \
--relationships: sample_phenotypes.csv --relationships:ALTERNATIVE_TO variant_variants.csv \
--relationships:HAS sample_variants50-99.csv.gz
But I'm getting the following error:
WARNING: neo4j-import is deprecated and support for it will be removed in a future version of Neo4j; please use neo4j-admin import instead.
Input error: Expected '--nodes' to have at least 1 valid item, but had 0 []
Caused by:Expected '--nodes' to have at least 1 valid item, but had 0 []
java.lang.IllegalArgumentException: Expected '--nodes' to have at least 1 valid item, but had 0 []
at org.neo4j.kernel.impl.util.Validators.lambda$atLeast$6(Validators.java:144)
at org.neo4j.helpers.Args.validated(Args.java:670)
at org.neo4j.helpers.Args.interpretOptionsWithMetadata(Args.java:637)
at org.neo4j.tooling.ImportTool.extractInputFiles(ImportTool.java:623)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:445)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:380)
I included the file path, as I'm using Neo4j Desktop and am not sure if it has a different file structure. My csv files are stored in the import folder (but I also have copies in the current folder and the graph.db folder, just in case).
The import directory is as follows:
user@LenovoPC ~/.config/Neo4j Desktop/Application/neo4jDatabases/database-2f182948-e170-45b1-b9f4-19d236ff5d43/installation-3.5.1/import $ dir
chromosomes.csv samples.csv variants.csv
phenotypes.csv sample_variants50-99.csv.gz variants.csv.gz
sample_phenotypes.csv variant_chromosomes.csv
variant_variants.csv
I can only assume that it's my filepath, but I've tried quite a few alternatives and had no luck at all. If anyone could shed some light on what the issue is, I would really appreciate it!
Best is to cd into the Neo4j Desktop installation directory and place the csv files into the import folder.
then you can do:
cd ~/.config/Neo4j\ Desktop/Application/neo4jDatabases/database-2f182948-e170-45b1-b9f4-19d236ff5d43/installation-3.5.1
bin/neo4j-import --into data/databases/graph.db --id-type string \
--nodes:Allele import/variants.csv \
--nodes:Chromosome import/chromosome.csv \
--nodes:Phenotype import/phenotypes.csv \
--nodes:Sample import/samples.csv \
--relationships:BELONGS_TO import/variant_chromosomes.csv \
--relationships import/sample_phenotypes.csv \
--relationships:ALTERNATIVE_TO import/variant_variants.csv \
--relationships:HAS import/sample_variants50-99.csv.gz
Some more notes:
HAS is a pretty generic relationship type
I left off the colon here: --relationships import/sample_phenotypes.csv; I'm not sure if you have the rel-type in the file
Is this a single file? --relationships:HAS import/sample_variants50-99.csv.gz
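For reference, if you do keep the relationship type in the file itself (the --relationships variant without a type), the CSV needs a :TYPE column. A hypothetical header and first row for sample_phenotypes.csv could look like:
:START_ID,:END_ID,:TYPE
sample1,phenotype1,HAS_PHENOTYPE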

how to read 7z json file in R

I cannot find an answer for how to load a 7z file in R. I can't use this:
s <- system("7z e -o <path> <archive>")
because of error 127. Maybe that's because I'm on Windows? However, the 7z file opens when I click it in TotalCommander.
I'm trying something like this:
con <- gzfile(path, 'r')
ff <- readLines(con, encoding = "UTF-8")
h <- fromJSON(ff)
I get this error:
Error: parse error: trailing garbage
7z¼¯' ãSp‹ Ë:ô–¦ÐÐY#4U¶å¿ç’
(right here) ------^
The encoding is clearly wrong; when I load this file uncompressed, it's fine without specifying the encoding. Moreover, it's twice as long. I have thousands of 7z files and need to read them one by one in a loop: read, analyze, move on. Could anyone give me some hints on how to do this effectively?
When uncompressed, it works easily using:
library(jsonlite)
f <- read_json(path, simplifyVector = T)
EDIT
There are many JSON files in one 7z archive. The above error is probably caused by the parser reading the raw data of the whole file. I don't know how to link these files or specify the connection attributes.
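Error 127 from system() usually means the shell could not find the 7z executable at all. A sketch of a possible loop on Windows, assuming 7-Zip lives at its default install path (the path, archive name, and file layout here are assumptions to adapt):
library(jsonlite)
seven_zip <- "C:/Program Files/7-Zip/7z.exe"    # assumed install location
out_dir <- tempfile("extract"); dir.create(out_dir)
system2(seven_zip, c("e", shQuote("archive.7z"), paste0("-o", out_dir), "-y"))  # extract all files
files <- list.files(out_dir, pattern = "\\.json$", full.names = TRUE)
dats <- lapply(files, read_json, simplifyVector = TRUE)  # one parsed object per JSON file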

How to auto format JSON on save in Vim

To be honest, Go has spoiled me. With Go I got used to having a strict formatting standard that is enforced by my editor (vim) and is almost universally accepted and followed by everybody else on the team and around the world.
I wanted to format JSON files on save the same way.
Question: How to auto format/indent/lint json files on save in vim.
In one command, try this:
execute '%!python -m json.tool' | w
You could then add your own key binding to make it a simpler keystroke. Of course, for this to work, you need to have Python installed on your machine.
If you are keen on using an external tool and you are doing some work with JSON, I would suggest using jq:
https://stedolan.github.io/jq/
Then you can execute :%!jq . inside vim, which will replace the current buffer with the output of jq.
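If you use this often, a key mapping saves typing; a sketch (pick whatever keys you prefer):
nnoremap <leader>jq :%!jq .<CR>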
%!python -m json.tool
or
%!python -c "import json, sys, collections; print json.dumps(json.load(sys.stdin, object_pairs_hook=collections.OrderedDict), ensure_ascii=False, indent=4)"
you can add this to your vimrc:
com! FormatJSON %!python -m json.tool
then you can use :FormatJSON to format json files
Thanks mMontu and Jose B, this is what I ended up doing:
WARNING this will overwrite your buffer. So if you OPEN a json file that already has a syntax error, you will lose your whole file (or can lose it).
Add this line to your ~/.vimrc
" Ali: to indent json files on save
autocmd FileType json autocmd BufWritePre <buffer> %!python -m json.tool
you need to have python on your machine, of course.
EDIT: this next one should not overwrite your buffer if your json has error. Which makes it the correct answer, but since I don't have a good grasp of Vim script or shell for that matter, I present it as an experimental thing that you can try if you are feeling lucky. It may depend on your shell too. You are warned.
" Ali: to indent json files on save
autocmd FileType json autocmd BufWritePre <buffer> %!python -m json.tool 2>/dev/null || echo <buffer>
A search for JSON plugins on vim.org returned this:
jdaddy.vim : JSON manipulation and pretty printing
It has the following in its description:
gqaj "pretty prints" (wraps/indents/sorts keys/otherwise cleans up)
the JSON construct under the cursor.
If it does the formatting you are expecting then you could create an autocmd BufWritePre to format when saving.
Here is my solution. It doesn't exactly address the "on save" part of the question, but if you perform this action before saving, it will output errors you can then fix before saving.
Also, it depends on only one external tool -- jq -- which has become the gold standard of unix shell JSON processing tools. And you probably already have it installed (macOS and Linux/Unix only; I don't know how this would behave on Windows).
Basically, it's just:
ggVG!jq '.'
That will highlight the entire JSON document then run it through jq which will just parse it for correctness, reformat it (e.g. fix any indents, etc), and spit the output back into the Vim editor.
If you want to parse only part of the document, you can highlight that part manually by pressing v or V and then run
!jq '.'
The benefit here is that you can fix subsections of your document this way.
Vim Autoformat
https://github.com/Chiel92/vim-autoformat
There is this Vim plugin which supports multiple auto format and indent schemes as well as extending with custom formatters per filetype.
https://github.com/Chiel92/vim-autoformat#default-formatprograms
Note:
You will need to have nodejs and js-beautify installed, as vim-autoformat uses these as the default external tools.
npm install -g js-beautify
Another solution is to use coc-format-json.
I did some organizing (though some of it has nothing to do with Vim): you can write the script yourself on Neovim!
solution1: neovim
1-1: write the script yourself
Neovim allows Python3 plugins to be defined by placing Python files or packages in rplugin/python3/ in a runtimepath folder.
in my case
- init.vim
- rplugin/python3/[your_py_file_set].py
- rplugin/python3/fmt_file.py
The fmt_file.py is as follows:
# rplugin/python3/fmt_file.py
import pynvim
import json

@pynvim.plugin
class Plugin:
    __slots__ = ('vim',)

    def __init__(self, vim):
        self.vim = vim

    @pynvim.command('FormatJson', nargs='*', range='')
    def format_json(self, args, rg):
        """
        USAGE::

            :FormatJson
        """
        try:
            buf = self.vim.current.buffer
            json_content: str = '\n'.join(buf[:])
            dict_content: dict = json.loads(json_content)
            new_content: str = json.dumps(dict_content, indent=4, sort_keys=True)
            buf[:] = new_content.split('\n')
        except Exception as e:
            self.vim.current.line = str(e)
Afterwards run :UpdateRemotePlugins from within Nvim once to generate the necessary Vim script that makes your plugin available (and you'd best restart Neovim).
Then you open the JSON file you want to format and type :FormatJson on the command line. All done.
Don't forget to tell Vim where your Python is:
" init.vim
let g:python3_host_prog = '...\python.exe'
and pip install pynvim
1-2: use tool.py
where tool.py is located at Lib/json/tool.py:
:%!python -m json.tool
solution2: command line
If you already have Python installed, you can run this from the command line:
python -m json.tool "test.json" >> "output.json"
solution3: python
I wrote a simple script for these things.
"""
USAGE::
python fmt_file.py fmt-json "C:\test\test.json"
python fmt_file.py fmt-json "C:\test\test.json" --out_path="abc.json"
python fmt_file.py fmt-json "test.json" --out_path="abc.json"
"""
import click # pip install click
from click.types import File
import json
from pathlib import Path
#click.group('json')
def gj():
...
#gj.command('fmt-json')
#click.argument('file_obj', type=click.File('r', encoding='utf-8'))
#click.option('--out_path', default=None, type=Path, help='output path')
def format_json(file_obj: File, out_path: Path):
new_content = ''
with file_obj as f:
buf_list = [_ for _ in f]
if buf_list:
json_content: str = '\n'.join(buf_list)
dict_content: dict = json.loads(json_content)
new_content: str = json.dumps(dict_content, indent=4, sort_keys=True)
if new_content:
with open(out_path if out_path else Path('./temp.temp_temp.json'),
'w', encoding='utf-8') as f:
f.write(new_content)
def main():
for register_group in (gj,):
register_group()
if __name__ == '__main__':
main()
You can search for the 'vim-json-line-format' plugin. Open a file in Normal mode, move your cursor onto the JSON line, then use <leader>pj to show the formatted JSON by printing it, or <leader>wj to change the text to the formatted JSON.
Invalid JSON cannot be formatted!
Use ALE to auto-format on save
Configure ALE to format JSON
Add the following to .vim/vimfiles/after/ftplugin/json.vim:
let b:ale_fix_on_save = 1 " Fix files when they are saved.
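ALE also needs to know which fixer to run. Assuming you have jq installed (ALE ships a jq fixer for JSON buffers), something like this in the same file should work:
let b:ale_fixers = ['jq'] " Format JSON buffers with jq.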