Creating TAB-separated values from .xlsx with ssconvert

Due to commas within cell values, I am not able to use the ssconvert utility for .xls(x) to .csv conversion.
Is there a possibility to create tab-separated values directly from xlsx with ssconvert (command line spreadsheet format converter)?
ssconvert infile.xlsx outfile.tsv raises the error:
Unable to guess exporter to use
Hence, I have tried to generate a raw text file by specifying some export options, in particular the separator:
ssconvert -O 'separator=\t format=raw' infile.xlsx outfile.txt
which results in output like value1\tvalue2\tvalue3, i.e., the string \t is not translated into a tab character.

When called from a shell script: put a real tab character between the quotes, e.g.:
ssconvert -O 'separator=" " format=raw' infile.xlsx outfile1.txt
ssconvert -O "separator=\" \" format=raw" infile.xlsx outfile2.txt
ssconvert -O "separator=' ' format=raw" infile.xlsx outfile3.txt
To type this directly into an interactive shell, pressing Ctrl-V before the Tab key might be necessary.
Pasting this with the mouse will probably fail, as the tab will be replaced by spaces.
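If the conversion runs from a bash script, ANSI-C quoting can also produce the literal tab without typing it; a minimal sketch, assuming bash (other shells may lack $'...'):
# $'...' makes bash translate \t into a real tab before ssconvert sees it
ssconvert -O $'separator="\t" format=raw' infile.xlsx outfile.txt
# or build the separator with printf and splice it in
sep=$(printf '\t')
ssconvert -O "separator=\"$sep\" format=raw" infile.xlsx outfile.txt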


Calling Imagemagick from awk?

I have a CSV of image details I want to loop over in a bash script. awk seems like an obvious choice to loop over the data.
For each row, I want to take the values, and use them to do Imagemagick stuff. The following isn't working (obviously):
awk -F, '{ magick "source.png" "$1.jpg" }' images.csv
GNU AWK excels at processing structured text data. Although it can be used to invoke commands via its system function, it is less handy for that than some other languages; Python, for example, has a standard-library module called subprocess, which is more feature-rich.
If you wish to use awk for this task anyway, then I suggest preparing output to be fed into the bash command. Say you have file.txt with the following content
file1.jpg,file1.bmp
file2.png,file2.bmp
file3.webp,file3.bmp
and the files listed in the 1st column exist in the current working directory, you wish to convert them to the files named in the 2nd column, and you have access to the convert command; then you might do
awk 'BEGIN{FS=","}{print "convert \"" $1 "\" \"" $2 "\""}' file.txt | bash
which is equivalent to starting bash and doing
convert "file1.jpg" "file1.bmp"
convert "file2.png" "file2.bmp"
convert "file3.webp" "file3.bmp"
Observe that I have used literal " to enclose the filenames, so it should work with names containing spaces. Disclaimer: it might fail for names containing special characters, e.g. ".
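If the names may contain characters that are special to the shell, a plain bash loop avoids generating shell code altogether; a rough sketch equivalent to the pipeline above:
# read the two comma-separated columns directly;
# the file names are never re-parsed as shell code
while IFS=, read -r src dst; do
    convert "$src" "$dst"
done < file.txt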

Bash: source file executes code inside strings

(Premise:)
Hi, I'm trying to read a JSON file whose content has to be copied into a variable in a file that I'm going to need later; according to Google, the only way to do this is by using source. I need to avoid command execution during the sourcing, though, and, being a JSON file, all those quotes are causing me a bit of a headache.
Because of this I need to be sure that each and every one of them is escaped so it is treated as plain text. I tried using a sed command like this:
sed -e "s/'/\\\'/g" -e 's/"/\\"/g'
Checking the file again, I can see that every single quote has been escaped apart from the outer ones,
e.g.: {{"foo":"bar"},{"bar":"foo"}} -> VAR='{{\"foo\":\"bar\"},{\"bar\":\"foo\"}}'
Somehow, when I execute the sourcing, I get a lot of errors about commands and directories not existing.
(Question:)
Do you have any idea of what's going on? Is this even a viable way to reach my goal? Is there any better way? If what I'm asking is not possible, is there any other way to use a file as a string variable store?
(Goal generic desired behaviour:)
read json
create conf file
attach json content to variable declaration string ("VAR=$(readthejson)")
attach variable declaration string to conf file
source conf file
use var as a string
(Trial specific desired behaviour:)
a=$( sed -e "s/'/\\\'/g" -e 's/"/\\"/g' myjson.json )
echo "LOCK='$a'" >> file
The lines above successfully fill my file with the JSON content and escape all quotes
(example from a package-lock.json file:)
LOCK='{
\"name\": \"Undecided\",
\"version\": \"0.0.1\",
\"lockfileVersion\": 2,
\"requires\": true,
\"packages\": {
\"\": {
\"version\": \"0.0.1\",
\"dependencies\": {
\"#capacitor/android\": \"3.1.2\",
\"#capacitor/app\": \"1.0.2\",
\"#capacitor/core\": \"3.1.2\",
\"#capacitor/haptics\": \"1.0.2\",
...
At this point I would expect that sourcing file would result in my strings being loaded into my script and being usable like so:
source file
echo "$LOCK"
output:
{
"name": "Undecided",
"version": "0.0.1",
"lockfileVersion": 2,
"requires": true,
"packages": {
"": {
"version": "0.0.1",
"dependencies": {
"#capacitor/android": "3.1.2",
"#capacitor/app": "1.0.2",
"#capacitor/core": "3.1.2",
"#capacitor/haptics": "1.0.2",
(Actual behaviour:)
The script escapes everything as needed. However, when I source it, it outputs this:
usage: install [-bCcpSsv] [-B suffix] [-f flags] [-g group] [-m mode]
[-o owner] file1 file2
install [-bCcpSsv] [-B suffix] [-f flags] [-g group] [-m mode]
[-o owner] file1 ... fileN directory
install -d [-v] [-g group] [-m mode] [-o owner] directory ...
file: line 2128: },: command not found
file: line 2129: "node_modules/@hapi/bourne":: No such file or directory
file: line 2130: "version":: command not found
file: line 2131: "resolved":: command not found
file: line 2132: "integrity":: command not found
file: line 2133: "deprecated":: command not found
file: line 2134: },: command not found
file: line 2135: "node_modules/@hapi/hoek":: No such file or directory
file: line 2136: "version":: command not found
file: line 2137: "resolved":: command not found
file: line 2138: "integrity":: command not found
file: line 2139: "deprecated":: command not found
file: line 2140: },: command not found
file: line 2141: "node_modules/@hapi/joi":: No such file or directory
file: line 2142: "version":: command not found
file: line 2143: "resolved":: command not found
file: line 2144: "integrity":: command not found
file: line 2145: "deprecated":: command not found
file: line 2146: "dependencies":: command not found
file: line 2147: "@hapi/address":: No such file or directory
file: line 2148: "@hapi/bourne":: No such file or directory
file: line 2149: "@hapi/hoek":: No such file or directory
file: line 2150: "@hapi/topo":: No such file or directory
file: line 2151: syntax error near unexpected token `}'
file: line 2151: ` }'
It looks like it's ignoring backslashes, or that they get swapped with non-escaped quotes and then sourced.
I mean, I would expect echo "\"lorem\"\"ipsum\"" to result in "lorem""ipsum", not in
lorem command not found
ipsum command not found
Disclaimer: I'm not asking you to code for me or debug my code (it's sad I really have to specify this).
Is there any better way?
If you want to write a value to a file to be sourced later, whatever the variable's value, I like to use declare -p.
lock=$(cat myjson.json)
declare -p lock >> file_to_be_sourced.sh
declare is specific to Bash, and it will always output a properly quoted string that can be sourced later. Another way is to use printf "%q".
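A minimal sketch of the printf "%q" route, assuming the file is myjson.json and the variable should be called LOCK:
lock=$(< myjson.json)
# %q re-quotes the value so the assignment survives sourcing unchanged
printf 'LOCK=%q\n' "$lock" > file_to_be_sourced.sh
source file_to_be_sourced.sh
echo "$LOCK"    # prints the original JSON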
Do you have any idea of what's going on?
"s/'/\\\'/g" part is wrong. If you start with ' quotes, you have replace ' with '\''. \ is literal when inside ' quotes. And remove the s/"/\\"/g part - just keep inside ' quotes. The difference is cool to observe when using /bin/printf which is from coreutils vs printf as bash builtin - they use different quoting "style":
$ var="a\"b'c"
$ echo "$var"
a"b'c
$ printf "%q\n" "$var"
a\"b\'c
$ /bin/printf "%q\n" "$var"
'a"b'\''c'
$ declare -p var
declare -- var="a\"b'c"
If what I'm asking is not possible, is there any other way to use a file as a string variable store?
If it's a "store" with many strings, you could use an associative array.
declare -A mapvar
mapvar["abc"]=$(< file1.json)
mapvar["def"]=$(< file2.json)
declare -p mapvar > file_to_be_sourced.sh
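Reading it back is then just a matter of sourcing the file, e.g.:
source file_to_be_sourced.sh
printf '%s\n' "${mapvar[abc]}"    # prints the contents of file1.json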
I finally got to a solution.
Apparently, if you echo "Something='$a'" it doesn't matter whether you have single, double, triple, escaped or whatever quotes inside that variable: it just doesn't care and uses the double ones.
For example, in my case it just ignored the outermost ' and considered everything else as a command.
My solution was simply to:
echo "Something=\"$a\""
And now it's treated as an actual string.
I'm pretty confused about why, but that's probably caused by how bash handles quotes.

Splitting a CSV column-wise with no unique delimiters in a shell script

I have a CSV with multiple rows, some of which look like this:
"ABC","Unfortunately, system has failed"," - Error in system"
"DEF","Check the button labelled "WARNING"","Warning in system"
"XYZ","Everything is okay","No errors"
I need to split these lines and extract the individual columns.
I run a loop over the rows and extract the 2nd column as
awk -F , '{print $2}' $line
where $line represents each row. However, I end up getting incorrect values. For example, while trying to fetch the 2nd column of the 1st row, the above command gives me "Unfortunately
and not "Unfortunately, system has failed"
I understand that my strings have both commas and quotes in them, which makes it harder to split on a delimiter. Is there anything else I can try?
Using GNU awk and FPAT:
$ gawk '
BEGIN {
FPAT="([^,]*)|(\"[^\"]+\")"
}
{
print $2
}' file
Output:
"Unfortunately, system has failed"
"Check the button labelled "WARNING""
"Everything is okay"
It's not a complete CSV parser; for example, newlines inside quotes are not handled - you need to deal with them yourself (check NF and combine records). More about FPAT:
https://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html
If you want to get rid of those quotes:
$ gawk '
BEGIN {
FPAT="([^,]+)|(\"[^\"]+\")"
}
{
for(i=1;i<=NF;i++) # loop all fields
gsub(/^"|"$/,"",$i) # remove quotes surrounding fields
print $2
}' file
Output sample:
...
Check the button labelled "WARNING"
...
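As an aside: if your gawk is 5.3 or newer, it also has a --csv option for proper CSV input parsing; note, though, that the sample input above is not strictly valid CSV (the inner quotes around WARNING would have to be doubled), so the result may differ from the FPAT approach:
gawk --csv '{ print $2 }' file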
If you want to put your input data into a 3x3 table, you can do it with awk:
awk -v FS=',[^ ]' -v OFS="|" '{print $1, $2, $3}' file
ABC"|Unfortunately, system has failed"| - Error in system"
"DEF"|Check the button labelled "WARNING""|Warning in system"
"XYZ"|Everything is okay"|No errors"
The code: why set FS=',[^ ]'? So that the comma in ", system" (a comma followed by a space) is not treated as a separator.

Bash: Base64 encode 1 column in a very large .csv and output to new file

I've tried using the code below, but the CSV file has over 80 million lines (roughly 25 GB) and some of the special characters seem to break the echo command. The CSV has 2 columns separated by a comma.
ex:
blah, blah2
data1,data2
line3,fd$$#$%T%^Y%&$$B
somedata,%^&%^&%^&^
The goal is to take that second column and base64-encode it, to get it ready to import into a SQL db. I'm doing a base64 encode on the second column so there's Unicode support etc. and no character will corrupt the db.
I'm looking for a more efficient way of doing this that won't break on special chars etc.
awk -F "," '
{
"echo "$2" | base64" | getline x
print $1, x
}
' OFS=',' input.csv > base64.csv
Error:
sh: 1: Syntax error: word unexpected (expecting ")") :
not foundrf :
not found201054 :
not foundth :
not foundz09
| base64' (Too many open files)ut.csv FNR=1078) fatal: cannot open pipe `echo q1w2e3r4
The problem is that you're not quoting the argument to echo in the awk script.
But there's no need to use awk for this; bash can parse the file directly.
while IFS=, read -r col1 col2
do
base64=$(base64 <<<"$col2")
echo "$col1, $base64"
done < input.csv > base64.csv
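Two caveats worth noting, assuming GNU coreutils: base64 wraps its output at 76 columns, which would split long values across lines, and echo/here-strings append a newline that gets encoded too. A variant that avoids both:
while IFS=, read -r col1 col2; do
    # printf '%s' avoids the trailing newline; -w0 disables line wrapping (GNU coreutils)
    printf '%s,%s\n' "$col1" "$(printf '%s' "$col2" | base64 -w0)"
done < input.csv > base64.csv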
Try something like this in your MySQL command-line client:
LOAD DATA LOCAL INFILE '/tmp/filename.txt' INTO TABLE tbl FIELDS TERMINATED BY ','
You can reorder fields if needed and apply special expressions if you need to remove special characters, concatenate strings, convert date format, etc. If you still really need base64 conversion, MySQL versions 5.6 and later have a native function for that (TO_BASE64()), while there is a UDF for the older ones. See base64 encode in MySQL
However, as long as your columns do not have commas, LOAD DATA INFILE will be able to handle it, and you can save some disk space by avoiding the conversion.
For details on how to use LOAD DATA INFILE, see MySQL manual: https://dev.mysql.com/doc/refman/5.7/en/load-data.html
You will need to authenticate to MySQL as a user with the LOAD privilege, and have the local-infile option enabled (e.g. by passing --local-infile=1 on the command line).
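If you want MySQL to do the encoding itself, LOAD DATA also accepts a SET clause; a sketch with hypothetical table and column names (tbl, col1, col2):
mysql --local-infile=1 -u user -p dbname <<'SQL'
LOAD DATA LOCAL INFILE '/tmp/filename.txt' INTO TABLE tbl
  FIELDS TERMINATED BY ','
  (col1, @raw)
  SET col2 = TO_BASE64(@raw);
SQL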
The goal is to take that second column and base64
With the awk getline function:
awk -F',[[:space:]]*' '{ cmd="echo \042"$2"\042 | base64"; cmd | getline v;
close(cmd); print $1","v }' input.csv > base64.csv
The base64.csv contents (for your current input):
blah,YmxhaDIK
data1,ZGF0YTIK
line3,ZmQyNzMwOCMkJVQlXlklJjI3MzA4Qgo=
somedata,JV4mJV4mJV4mXgo=

Shell: Replacing each New Line "\n" character with "\\n"

I'm inserting a git diff of changed files into a JSON object to send using a curl request.
The problem is that it doesn't like the newline characters being inserted into the JSON, but I'm not sure how to get around that. The translate tool didn't work; this perl solution I'm using is close, but it just replaces them with spaces:
changedfiles=$(git diff --name-only $3..$4 | perl -p -e 's/\n/ /')
and changing it to this didn't help:
changedfiles=$(git diff --name-only $3..$4 | perl -p -e 's/\n/\\n/')
Can anyone point me in the right direction? It doesn't need to use perl, it just needs to work
(...being simple would be nice too)
Instead of trying to do ad-hoc escaping for characters that your immediate testing finds problematic, how about using an actual JSON library that handles all of them in a solid way?
Here's an example in bash using inlined python:
python -c '
import json
import sys
print(json.dumps({"data": sys.argv[1]}))
' "$(git diff --name-only $3..$4)"
It prints the JSON object { "data": "your command output here" } with standards-compliant escaping.
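If jq happens to be installed, it offers a similarly robust one-liner; a sketch that emits the changed files as a JSON array of properly escaped strings:
# -R reads each line as a raw string; the second jq collects (-s) them into an array
git diff --name-only "$3".."$4" | jq -R . | jq -s .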
This is what I think you want to do to get a quoted list of files separated by commas (i.e. for inserting into a JSON string):
git diff --name-only $3..$4 | perl -p -e 's/(.*)/"$1",/;s/\n//;s/""/","/'
This works if your files don't contain double quotes or special characters that need to be JSON escaped.
First, we put each file in quotes followed by a comma, then remove newlines, then change the "" between files to ",". This is kind of a hack, though. Somewhat better might be:
git diff --name-only $3..$4 | perl -0777 -p -e 's/(.*)\n/"$1",/g;s/,$//'
Here we slurp in the whole input (-0777), newlines and all, do our substitution, and remove the final comma.
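For example, with two hypothetical file names (one containing a space):
$ printf 'a.txt\nb c.txt\n' | perl -0777 -p -e 's/(.*)\n/"$1",/g;s/,$//'
"a.txt","b c.txt"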