To be honest, Go has spoiled me. With Go I got used to having a strict formatting standard that is enforced by my editor (Vim) and is accepted and followed by almost everybody else on the team and around the world.
I wanted to format JSON files on save the same way.
Question: How do I auto-format/indent/lint JSON files on save in Vim?
In one command, try this:
execute '%!python -m json.tool' | w
You could then add your own key binding to make it a simpler keystroke. Of course, for this to work, you need to have Python installed on your machine.
If you are keen on using an external tool and you are doing some work with JSON, I would suggest using jq:
https://stedolan.github.io/jq/
Then you can execute :%!jq . inside Vim, which will replace the current buffer with the output of jq.
%!python -m json.tool
or
%!python -c "import json, sys, collections; print json.dumps(json.load(sys.stdin, object_pairs_hook=collections.OrderedDict), ensure_ascii=False, indent=4)"
You can add this to your vimrc:
com! FormatJSON %!python -m json.tool
then you can use :FormatJSON to format JSON files
Thanks mMontu and Jose B, this is what I ended up doing:
WARNING: this will overwrite your buffer, so if you open a JSON file that already has a syntax error, you can lose your whole file.
Add this line to your ~/.vimrc
" Ali: to indent json files on save
autocmd FileType json autocmd BufWritePre <buffer> %!python -m json.tool
You need to have Python on your machine, of course.
EDIT: this next one should not overwrite your buffer if your JSON has an error, which would make it the correct answer, but since I don't have a good grasp of Vim script, or shell for that matter, I present it as an experimental thing that you can try if you are feeling lucky. It may depend on your shell too. You are warned.
" Ali: to indent json files on save
autocmd FileType json autocmd BufWritePre <buffer> %!python -m json.tool 2>/dev/null || echo <buffer>
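If you would rather not rely on shell-specific || behavior, another option (my own sketch, not from the original answer) is to filter the buffer through a tiny Python script that only replaces the text when the JSON actually parses, and otherwise passes the input through untouched:
# format_json_safe.py -- hypothetical helper script
# Reads JSON on stdin; prints it pretty-printed if it parses,
# otherwise echoes the input back unchanged so the buffer survives.
import json
import sys

text = sys.stdin.read()
try:
    print(json.dumps(json.loads(text), indent=4))
except ValueError:
    sys.stdout.write(text)
You would then point the BufWritePre autocmd at %!python format_json_safe.py instead of json.tool.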
A search for JSON plugins on vim.org returned this:
jdaddy.vim : JSON manipulation and pretty printing
It has the following on description:
gqaj "pretty prints" (wraps/indents/sorts keys/otherwise cleans up)
the JSON construct under the cursor.
If it does the formatting you are expecting then you could create an autocmd BufWritePre to format when saving.
Here is my solution. It doesn't exactly address the "on save" part of the question, but if you perform this action before saving, it will output errors that you can then fix before saving.
Also, it depends on only one external tool -- jq -- which has become the gold standard of Unix shell JSON processing tools, and which you probably already have installed (macOS and Linux/Unix only; I don't know how this would behave on Windows).
Basically, it's just:
ggVG!jq '.'
That will highlight the entire JSON document, then run it through jq, which will parse it for correctness, reformat it (e.g. fix any indentation), and spit the output back into the Vim buffer.
If you want to parse only part of the document, you can highlight that part manually by pressing v or V and then run
!jq '.'
The benefit here is that you can fix subsections of your document this way.
Vim Autoformat
https://github.com/Chiel92/vim-autoformat
This Vim plugin supports multiple auto-format and indent schemes, and can be extended with custom formatters per filetype.
https://github.com/Chiel92/vim-autoformat#default-formatprograms
Note:
You will need to have Node.js and js-beautify installed, as vim-autoformat uses these as the default external tools.
npm install -g js-beautify
Another solution is to use coc-format-json.
I did some organizing (though some of it has nothing to do with Vim), including how to write the script yourself for Neovim!
Solution 1: Neovim
1-1: Write the script yourself
Neovim allows Python 3 plugins to be defined by placing Python files or packages in rplugin/python3/ in a runtimepath folder.
in my case
- init.vim
- rplugin/python3/[your_py_file_set].py
- rplugin/python3/fmt_file.py
The fmt_file.py is as follows:
# rplugin/python3/fmt_file.py
import pynvim
import json


@pynvim.plugin
class Plugin:
    __slots__ = ('vim',)

    def __init__(self, vim):
        self.vim = vim

    @pynvim.command('FormatJson', nargs='*', range='')
    def format_json(self, args, rg):
        """
        USAGE::

            :FormatJson
        """
        try:
            buf = self.vim.current.buffer
            json_content: str = '\n'.join(buf[:])
            dict_content: dict = json.loads(json_content)
            new_content: str = json.dumps(dict_content, indent=4, sort_keys=True)
            buf[:] = new_content.split('\n')
        except Exception as e:
            self.vim.current.line = str(e)
Afterwards, run :UpdateRemotePlugins from within Neovim once, to generate the Vimscript necessary to make your plugin available (and you'd best restart Neovim).
Then open the JSON file you want to format and type :FormatJson on the command line. All done.
Don't forget to tell Neovim where your Python is:
" init.vim
let g:python3_host_prog = '...\python.exe'
and pip install pynvim
1-2: Use tool.py
where tool.py is located at Lib/json/tool.py in your Python installation:
:%!python -m json.tool
Solution 2: Command line
If you already have Python installed, you can open a command line and run:
python -m json.tool "test.json" >> "output.json"
Solution 3: Python
I wrote a simple script for this:
"""
USAGE::
python fmt_file.py fmt-json "C:\test\test.json"
python fmt_file.py fmt-json "C:\test\test.json" --out_path="abc.json"
python fmt_file.py fmt-json "test.json" --out_path="abc.json"
"""
import click # pip install click
from click.types import File
import json
from pathlib import Path
#click.group('json')
def gj():
...
#gj.command('fmt-json')
#click.argument('file_obj', type=click.File('r', encoding='utf-8'))
#click.option('--out_path', default=None, type=Path, help='output path')
def format_json(file_obj: File, out_path: Path):
new_content = ''
with file_obj as f:
buf_list = [_ for _ in f]
if buf_list:
json_content: str = '\n'.join(buf_list)
dict_content: dict = json.loads(json_content)
new_content: str = json.dumps(dict_content, indent=4, sort_keys=True)
if new_content:
with open(out_path if out_path else Path('./temp.temp_temp.json'),
'w', encoding='utf-8') as f:
f.write(new_content)
def main():
for register_group in (gj,):
register_group()
if __name__ == '__main__':
main()
You can search for the 'vim-json-line-format' plugin. Open a file in Normal mode, move your cursor onto the JSON line, and use <leader>pj to show the formatted JSON by printing it, or <leader>wj to replace the text with formatted JSON.
Invalid JSON cannot be formatted!
Use ALE to auto-format on save
Configure ALE to format JSON
add the following to .vim/vimfiles/after/ftplugin/json.vim:
let b:ale_fix_on_save = 1 " Fix files when they are saved.
Related
I am extracting prosody features from an audio file using the Windows version of openSMILE. It runs successfully and an output CSV is generated, but when I open the CSV, it shows some rows that are not readable. I used this command to extract the prosody features:
SMILEXtract -C \opensmile-3.0-win-x64\config\prosody\prosodyShs.conf -I audio_sample_01.wav -O prosody_sample1.csv
And the output of csv looks like this:
[screenshot of the unreadable CSV output omitted]
I even tried the sample wave file provided in the example audio folder of the openSMILE directory, and the output is the same (not readable). Can someone help me identify where the problem actually is, and how I can fix it?
You need to enable the csvSink component in the configuration file to make it work. The file config\prosody\prosodyShs.conf that you are using does not have this component defined and always writes binary output.
You can verify that it is the standard binary output in this way: omit the -O parameter from your command so it becomes SMILEXtract -C \opensmile-3.0-win-x64\config\prosody\prosodyShs.conf -I audio_sample_01.wav and execute it. You will get an output.htk file which is exactly the same as prosody_sample1.csv.
How do you output a CSV? You can take a look at the example configuration in opensmile-3.0-win-x64\config\demo\demo1_energy.conf, where a csvSink component is defined.
You can find more information in the official documentation:
Get started page of the openSMILE documentation
The section on configuration files
Documentation for cCsvSink
This is how I solved the issue. First I added the csvSink component to the list of component instances: instance[csvSink].type = cCsvSink
Next I added the configuration parameters for this instance.
[csvSink:cCsvSink]
reader.dmLevel = energy
filename = \cm[outputfile(O){output.csv}:file name of the output CSV file]
delimChar = ;
append = 0
timestamp = 1
number = 1
printHeader = 1
\{../shared/standard_data_output_lldonly.conf.inc}
Now if you run this file it will throw errors, because reader.dmLevel = energy depends on waveframes. So the final changes would be:
[energy:cEnergy]
reader.dmLevel = waveframes
writer.dmLevel = energy
[int:cIntensity]
reader.dmLevel = waveframes
[framer:cFramer]
reader.dmLevel=wave
writer.dmLevel=waveframes
Further reference on how to write openSMILE configuration files can be found here.
I have a huge newline-delimited JSON file input.json which looks like this:
{ "name":"a.txt", "content":"...", "other_keys":"..."}
{ "name":"b.txt", "content":"...", "something_else":"..."}
{ "name":"c.txt", "content":"...", "etc":"..."}
...
How can I split it into multiple text files, where file names are taken from "name" and file content is taken from "content"? Other keys can be ignored. I am currently toying with the jq tool, without luck.
The key to an efficient, jq-based solution is to pipe the output of jq (invoked with the -c option) to a program such as awk to perform the actual writing of the output files.
jq -c '.name, .content' input.json |
awk 'fn {print > fn; close(fn); fn=""; next;}
{fn=$0; sub(/^"/,"",fn); sub(/"$/,"",fn);}'
Warnings
Blindly relying on the JSON input for the file names has some risks,
e.g.
what if the same "name" is specified more than once?
if a file already exists, the above program will simply append to it.
Also, somewhere along the line, the validity of .name as a filename should be checked.
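A minimal sketch of such a check, as a hypothetical Python helper (my own addition, not part of the jq/awk pipeline above), would accept only plain file names that cannot escape the output directory:
import os

def safe_name(name: str) -> bool:
    # Accept only plain file names: non-empty, not absolute, containing no
    # path separators, and not the special entries "." or "..".
    return (
        bool(name)
        and not os.path.isabs(name)
        and os.sep not in name
        and name not in ('.', '..')
    )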
Related answers on SO
This question has been asked and answered on SO in slightly different forms before,
see e.g. Split a JSON file into separate files
jq doesn't have the output capabilities to create the desired files after grouping the objects; you'll need to use another language with a JSON library. An example using Python:
import json
import fileinput
for line in fileinput.input():  # Read from standard input or filename arguments
    d = json.loads(line)
    with open(d['name'], "a") as f:
        print(d['content'], file=f)
This has the drawback of repeatedly opening and closing each file, but it's simple. A more complex, but more efficient, example would use an exit stack context manager.
import json
import fileinput
import contextlib
with contextlib.ExitStack() as es:
    files = {}
    for line in fileinput.input():
        d = json.loads(line)
        file_name = d['name']
        if file_name not in files:
            files[file_name] = es.enter_context(open(file_name, "w"))
        print(d['content'], file=files[file_name])
Put briefly, files are opened and cached as they are discovered. Once the loop completes (or in the event of an exception), the exit stack ensures all files previously opened are properly closed.
If there's a chance that there will be too many files to have open simultaneously, you'll have to use the simple-but-inefficient code, though you could implement something even more complex that just keeps a small, fixed number of files open at any given time, reopening them in append mode as necessary. Implementing that is beyond the scope of this answer, though.
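For completeness, here is a rough sketch of that middle-ground idea (my own addition, not from the original answer): cap the number of simultaneously open files, evict the least recently used handle when the cap is hit, and reopen evicted files in append mode so their earlier content is preserved.
import json
import fileinput
from collections import OrderedDict

MAX_OPEN = 8                 # arbitrary cap on simultaneously open files
open_files = OrderedDict()   # name -> file object, ordered by recency of use
seen = set()                 # names written at least once during this run

def get_file(name):
    # Return an open handle for name, evicting the least recently used
    # handle when the cache is full. The first open truncates the file;
    # reopens after eviction use append mode to keep earlier content.
    if name in open_files:
        open_files.move_to_end(name)
        return open_files[name]
    if len(open_files) >= MAX_OPEN:
        _, oldest = open_files.popitem(last=False)
        oldest.close()
    mode = "a" if name in seen else "w"
    seen.add(name)
    f = open(name, mode)
    open_files[name] = f
    return f

for line in fileinput.input():
    d = json.loads(line)
    print(d['content'], file=get_file(d['name']))

for f in open_files.values():
    f.close()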
The following jq-based solution ensures that the output in the JSON files is pretty-printed, but ignores any input object whose .content equals the JSON string "IGNORE ME":
jq 'if .content == "IGNORE ME"
then "Skipping IGNORE ME" | stderr | empty
else .name, .content, "IGNORE ME" end' input.json |
awk '/^"IGNORE ME"$/ {close(fn); fn=""; next}
fn {print >> fn; next}
{fn=$0; sub(/^"/,"",fn); sub(/"$/,"",fn);}'
Is it possible to find out, in a .tcl script, what Python version is installed? In other words, how can I tell what Python version is in the default path from a .tcl script?
The Tcl Wiki doesn't include useful information about this.
Currently I am calling a Python script which prints sys.version and parsing its output.
.py
import sys
def find_version():
    version = sys.version
    version = version.split()[0].split('.')
    version = version[0] + '.' + version[1]
    print(version)

if __name__ == '__main__':
    find_version()
.tcl
set file "C://find_python_version.py"
set output [exec python $file]
I would use Python's sys.version_info because I can format the version string in any way I like:
set pythonVersion [exec python -c {import sys; print("%d.%d.%d" % sys.version_info[:3])}]
puts "Python version: $pythonVersion"
Output:
Python version: 2.7.15
A couple of notes:
A Python script (in curly braces) following the -c flag will print out the version in the form x.y.z; you can format it any way you like
The value of sys.version_info is a tuple-like object with several elements (see the documentation). I am interested only in the first 3 elements, hence sys.version_info[:3] (see the short illustration after these notes)
The print statement/function with parentheses will work with both Python 2 and Python 3
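A small illustration of that slicing, runnable with any modern Python (just a demonstration, not part of the Tcl answer):
import sys

# sys.version_info behaves like a tuple, e.g. (3, 9, 7, 'final', 0);
# the first three fields give the familiar major.minor.micro form.
print(sys.version_info)
print("%d.%d.%d" % sys.version_info[:3])   # e.g. 3.9.7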
A simple enough approach seems to be to parse the result of python --version:
proc pythonVersion {{pythonExecutable "python"}} {
    # Tricky point: Python 2.7 writes version info to stderr!
    set info [exec $pythonExecutable --version 2>@1]
    if {[regexp {^Python ([\d.]+)$} $info --> version]} {
        return $version
    }
    error "failed to parse output of $pythonExecutable --version: '$info'"
}
Testing on this system:
% pythonVersion
3.6.8
% pythonVersion python2.7
2.7.15
Looks OK to me.
I have a CSV file and I wish to understand its encoding. Is there a menu option in Microsoft Excel that can help me detect it,
or do I need to make use of a programming language like C# or PHP to deduce it?
You can use Notepad++ to evaluate a file's encoding without needing to write code. The evaluated encoding of the open file will display on the bottom bar, far right side. The encodings supported can be seen by going to Settings -> Preferences -> New Document/Default Directory and looking in the drop down.
On Linux systems, you can use the file command. It will give the correct encoding.
Sample:
file blah.csv
Output:
blah.csv: ISO-8859 text, with very long lines
If you use Python, you can just print the opened file object to check the encoding a CSV file will be read with (note that this shows the encoding Python will use by default, not necessarily the file's actual encoding). For example:
with open('file_name.csv') as f:
    print(f)
The output is something like this:
<_io.TextIOWrapper name='file_name.csv' mode='r' encoding='utf8'>
You can also use the Python chardet library:
# install the chardet library
!pip install chardet
# import the chardet library
import chardet
# use the detect method to find the encoding
# 'rb' means read in the file as binary
with open("test.csv", 'rb') as file:
print(chardet.detect(file.read()))
Use chardet https://github.com/chardet/chardet (documentation is short and easy to read).
Install Python, then pip install chardet, and finally use its command-line tool (chardetect).
I tested it under GB2312 and it's pretty accurate. (Make sure you have at least a few characters; a sample with only one character may easily fail.)
file is not reliable, as you can see.
Or you can execute this in a Python console or in a Jupyter Notebook:
import csv
data = open("file.csv","r")
data
You will see information about the data object like this:
<_io.TextIOWrapper name='arch.csv' mode='r' encoding='cp1250'>
As you can see, it contains encoding information.
CSV files have no headers indicating the encoding.
You can only guess by looking at:
the platform / application the file was created on
the bytes in the file (see the sketch below)
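One of the few byte-level clues you can check directly is a Unicode byte order mark at the start of the file. A sketch (the BOM constants come from Python's standard codecs module; most CSV files have no BOM at all):
import codecs

# Known byte order marks and the encodings they imply. UTF-32 must be
# checked before UTF-16 because their BOMs share a common prefix.
BOMS = [
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF32_LE, 'utf-32-le'),
    (codecs.BOM_UTF32_BE, 'utf-32-be'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
]

def sniff_bom(path):
    # Return the encoding implied by a BOM, or None when there is no BOM.
    with open(path, 'rb') as f:
        head = f.read(4)
    for bom, encoding in BOMS:
        if head.startswith(bom):
            return encoding
    return None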
In 2021, emoticons are widely used, but many import tools fail to import them. The chardet library is often recommended in the answers above, but the lib does not handle emoticons well.
import csv

icecream = '🍦'

with open('test.csv', 'w') as f:
    wf = csv.writer(f)
    wf.writerow(['ice cream', icecream])
import chardet

with open('test.csv', 'rb') as f:
    print(chardet.detect(f.read()))
{'encoding': 'Windows-1254', 'confidence': 0.3864823918622268, 'language': 'Turkish'}
This gives UnicodeDecodeError while trying to read the file with this encoding.
The default encoding on Mac is UTF-8. It's included explicitly here but that wasn't even necessary... but on Windows it might be.
with open('test.csv', 'r', encoding='utf-8') as f:
    print(f.read())
ice cream,🍦
The file command also picked this up
file test.csv
test.csv: UTF-8 Unicode text, with CRLF line terminators
My advice in 2021, if the automatic detection goes wrong: try UTF-8 before resorting to chardet.
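A minimal sketch of that advice (assuming chardet is installed): try UTF-8 first and only fall back to detection when decoding fails.
import chardet

def read_text(path):
    # Optimistically decode as UTF-8; fall back to chardet's guess on failure.
    try:
        with open(path, encoding='utf-8') as f:
            return f.read()
    except UnicodeDecodeError:
        with open(path, 'rb') as f:
            guess = chardet.detect(f.read())['encoding']
        with open(path, encoding=guess) as f:
            return f.read()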
In Python, you can try:
from encodings.aliases import aliases
import pandas as pd

alias_values = set(aliases.values())

for encoding in alias_values:
    try:
        df = pd.read_csv("test.csv", encoding=encoding)
        print('successful', encoding)
    except Exception:
        pass
As mentioned by #3724913 (Jitender Kumar), using the file command (it also works in WSL on Windows), I was able to get the encoding information of a CSV file by executing file --exclude encoding blah.csv (using info available in man file), as plain file blah.csv won't show the encoding info on my system.
import os

import chardet
import pandas as pd


def read_csv(path: str, size: float = 0.10) -> pd.DataFrame:
    """
    Reads the CSV file located at path and returns it as a Pandas DataFrame,
    detecting the encoding from a sample of the file first.

    Args:
        path (str): The path to the CSV file.
        size (float): The fraction of the file to be used for detecting the
            encoding. Defaults to 0.10.

    Returns:
        pd.DataFrame: The CSV file as a Pandas DataFrame.

    Raises:
        UnicodeError: If the encoding of the file cannot be detected with the
            initial size, the function will retry with a larger size (increased
            by 0.20) until the encoding can be detected or an error is raised.
    """
    try:
        byte_size = int(os.path.getsize(path) * size)
        with open(path, "rb") as rawdata:
            result = chardet.detect(rawdata.read(byte_size))
        return pd.read_csv(path, encoding=result["encoding"])
    except UnicodeError:
        return read_csv(path=path, size=size + 0.20)
I just added a function above to find the correct encoding and read the CSV at the given file path. I thought it would be useful.
Just add the encoding argument that matches the file you're trying to open:
open('example.csv', encoding='UTF8')
I have working code for parsing JSON output in KornShell by treating it as a string of characters. The issue I have is that the vendor keeps changing the position of the field that I am interested in. I understand that in JSON we can parse by key-value pairs.
Is there something out there that can do this? I am interested in a specific field, and I would like to use it to run checks on the status of another REST API call.
My sample JSON output is like this:
JSONDATA value :
{
"status": "success",
"job-execution-id": 396805,
"job-execution-user": "flexapp",
"job-execution-trigger": "RESTAPI"
}
I would need the job-execution-id value to monitor this job through the rest of the script.
I am using the following command to parse it:
RUNJOB=$(print ${DATA} |cut -f3 -d':'|cut -f1 -d','| tr -d [:blank:]) >> ${LOGDIR}/${LOGFILE}
The problem with this is that it is field-delimited by :, and the field position has been known to change between vendor releases.
So I am trying to see if I can use a utility that would always give me the key-value pair of "job-execution-id": 396805, no matter where it is in the JSON output.
I started looking at jsawk, but it requires the js interpreter to be installed on our machines, which I don't want. Any hints on how to find which RPM I need to solve this?
I am using RHEL5.5.
Any help is greatly appreciated.
The ast-open project has libdss (and a dss wrapper) which supposedly could be used with ksh. Documentation is sparse and is limited to a few messages on the ast-user mailing list.
The regression tests for libdss contain some json and xml examples.
I'll try to find more info.
Python is included by default with CentOS so one thing you could do is pass your JSON string to a Python script and use Python's JSON parser. You can then grab the value written out by the script. An example you could modify to meet your needs is below.
Note that by specifying other dictionary keys in the Python script you can get any of the values you need without having to worry about the order changing.
Python script:
# get_job_execution_id.py
# The try/except is because you'll probably have Python 2.4 on CentOS 5.5,
# and the straight "import json" statement won't work unless you have Python 2.6+.
try:
    import json
except ImportError:
    import simplejson as json
import sys

json_data = sys.argv[1]
data = json.loads(json_data)
job_execution_id = data['job-execution-id']
sys.stdout.write(str(job_execution_id))
KornShell script that executes it:
#!/bin/ksh
# get_job_execution_id.sh
JSON_DATA='{"status":"success","job-execution-id":396805,"job-execution-user":"flexapp","job-execution-trigger":"RESTAPI"}'
EXECUTION_ID=`python get_job_execution_id.py "$JSON_DATA"`
echo $EXECUTION_ID