Saving specific Excel sheet as .csv

I am trying to figure out how to save a specific Excel sheet as CSV via the command line on Linux.
I am able to save the first sheet with the command below:
libreoffice --headless --convert-to csv --outdir /tmp /tmp/test.xls
It seems that there should be a way to specify the sheet I want to save, but I am not able to find one.
Is there a way to save it via LibreOffice?

I know OP has probably moved on by now but since this was the first result in my search, I figured I'd take a stab at leaving an answer that works and is actually usable for the next googler.
First, LibreOffice still only lets you save the first sheet. If that is all you need, then try libreoffice --convert-to csv Test.ods. Interestingly, the GUI does the same thing, only letting you export the active sheet. So it's not so much that the terminal is being ignored as that this is simply a limitation of LibreOffice.
I needed to extract several sheets into separate csv files, so "active sheet only" didn't cut it for me. After seeing that this answer only suggested a macro, I kept looking. I found a few ways to get at the other sheets in various places after leaving this page, but I don't recall any of them letting you extract a specific sheet (unless it was some random GitHub tool that I skipped over).
I liked the method of using the Gnumeric spreadsheet application because it is in most central repos and doesn't involve converting to xls / xlsx first. However, there are a few caveats to be aware of.
First, if you want to extract only one sheet without knowing the sheet name ahead of time, this won't work. If you do know the sheet name ahead of time, or are OK with extracting all the sheets, then this works fairly well. The sheet name can also be used to name the output files, so it's not completely lost, which is nice.
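If you don't know the sheet names ahead of time, you can read them out of the file first. An .ods file is a zip archive whose content.xml lists each sheet as a table:table element with a table:name attribute. Here is a minimal Python sketch using only the standard library; the function names are hypothetical, and it assumes a normal spreadsheet layout (sheets directly under office:body/office:spreadsheet).

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by OpenDocument spreadsheets (.ods).
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
}


def sheet_names_from_content_xml(content_xml):
    """Return sheet names, in tab order, from the text of an .ods content.xml."""
    root = ET.fromstring(content_xml)
    tables = root.findall("office:body/office:spreadsheet/table:table", NS)
    return [t.get(f"{{{NS['table']}}}name") for t in tables]


def list_ods_sheet_names(path):
    """Open an .ods file (a zip archive) and list its sheet names."""
    with zipfile.ZipFile(path) as archive:
        return sheet_names_from_content_xml(archive.read("content.xml"))
```

Each name returned by list_ods_sheet_names("Test.ods") could then be passed to ssconvert as a sheet=... option.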
Second, if you want the quoting style to match what you'd get by manually exporting from the LibreOffice GUI, you will need to forget the term "csv" and think in terms of "txt" until you finish the conversion (i.e. convert to .txt files, then rename them). If you don't care about an exact match on quoting style, this doesn't matter. I will show both ways below. If you don't know what a quoting style is: in CSV, when a value contains spaces or commas, you put quotes around the cell value so that its commas aren't mistaken for the commas that separate fields. Some programs quote everything, others quote only when a value contains spaces and/or commas, and others don't quote at all (or only quote for commas?).
Last, there seems to be a difference in precision between converting via LibreOffice and via Gnumeric's ssconvert tool. It's not enough to matter for most people or most use cases, but it's still worth noting. In my original ods file, I had a formula that took the average of 3 cells containing 58.14, 59.1, and 59.05 respectively. This average came to 58.7633333333333 when I exported via the LibreOffice GUI. With ssconvert, the same value was 58.76333333333333 (i.e. it had one more decimal place than the LibreOffice version). I didn't really care for my purposes, but if you need to match LibreOffice exactly or don't want the extra precision, then it might matter.
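Those two strings are most likely the same double printed with a different number of significant digits, rather than two different calculations. A quick Python check, assuming (not verified against either program's source) that LibreOffice writes 15 significant digits and ssconvert 16 for this value:

```python
# The three cell values from the example above.
values = [58.14, 59.1, 59.05]
average = sum(values) / len(values)

# Assumption: the exporters differ only in how many significant digits they
# write for the same double, 15 for LibreOffice and 16 for ssconvert.
libreoffice_style = f"{average:.15g}"
ssconvert_style = f"{average:.16g}"

print(libreoffice_style)
print(ssconvert_style)
```

The 15-digit string reproduces the LibreOffice value exactly; the 16-digit one has the extra trailing digit seen in the ssconvert output.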
From man ssconvert, we have the following options:
-S, --export-file-per-sheet: Export a file for each sheet if the exporter only supports one sheet at a time. The output filename is treated as a template in which sheet number is substituted for %n, sheet name is substituted for %s, and sheet object name is substituted for %o in case of graph export. If there are no substitutions, a default of ".%n" is added.
-O, --export-options=optionsstring : Specify parameters for the chosen exporter. optionsstring is a list of parameter=value pairs, separated by spaces. The parameter names and values allowed are specific to the exporter and are documented below. Multiple parameters can be specified
During my testing, the -O options were ignored if I specified the output file with a .csv extension. But if I used .txt then they worked fine.
I'm not covering them all and I'm paraphrasing so read the man page if you want more details. But some of the options you can provide in the optionsstring are as follows:
sheet: Name of the sheet. You can repeat this option for multiple sheets. In my testing, using indexes did NOT work.
separator: the field separator. If you want true comma-separated values, set this to a comma.
format: I'll be using raw because I want the unformatted values. If you need something special for dates, etc., read the man page.
quoting-mode: when to quote values; can be always, auto, or never. If you want to mimic LibreOffice as closely as possible, choose never.
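If you drive ssconvert from a script, it helps to build the -O string in one place. A minimal Python sketch, assuming ssconvert is on your PATH; ssconvert_command is a hypothetical helper that only uses the -S/-O flags and the option names listed above, and it does not try to quote values that contain spaces.

```python
import shlex


def ssconvert_command(source, out_template, sheets=(), separator=",",
                      fmt="raw", quoting_mode="never"):
    """Build an ssconvert argument list using -S and -O.

    Each name in `sheets` becomes its own sheet=... parameter; leave it
    empty to export every sheet. Values containing spaces are not handled.
    """
    params = [f"sheet={name}" for name in sheets]
    params += [f"separator={separator}", f"format={fmt}",
               f"quoting-mode={quoting_mode}"]
    return ["ssconvert", "-S", "-O", " ".join(params), source, out_template]


cmd = ssconvert_command("Test.ods", "test_b_%s.txt", sheets=["mysheet2"])
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) avoids shell quoting issues entirely. Keep the .txt output template, since in my testing the -O options were ignored for .csv output.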
So let's get to a terminal.
# install gnumeric on fedora
$ sudo dnf install -y gnumeric
# install gnumeric on ubuntu/mint/debian
$ sudo apt-get install -y gnumeric
# use the ssconvert util from gnumeric to do the conversion
# let it do the default quoting - this will NOT match LibreOffice
# in this example, I am just exporting 1 named sheet using
# -S, --export-file-per-sheet
$ ssconvert -S -O 'sheet=mysheet2' Test.ods test_a_%s.csv
$ ls *.csv
test_a_mysheet2.csv
# same thing but more closely mimicking LibreOffice output
$ ssconvert -S -O 'sheet=mysheet2 separator=, format=raw quoting-mode=never' Test.ods test_b_%s.txt
$ mv test_b_mysheet2.txt test_b_mysheet2.csv
# Q: But what if I don't know the sheet names?
# A: then you'll need to export everything
# notice the 'sheet' option is removed from the
# list of -O options vs previous command
$ ssconvert -S -O 'separator=, format=raw quoting-mode=never' Test.ods test_c_%n_%s.txt
$ ls test_c*
test_c_0_mysheet.txt test_c_3_yoursheet2.txt
test_c_1_mysheet2.txt test_c_4_yoresheet.txt
test_c_2_yoursheet.txt test_c_5_holysheet.txt
# Now to rename all those *.txt files to *.csv
$ prename 's/\.txt$/.csv/' test_c_*.txt
$ ls test_c*
test_c_0_mysheet.csv test_c_3_yoursheet2.csv
test_c_1_mysheet2.csv test_c_4_yoresheet.csv
test_c_2_yoursheet.csv test_c_5_holysheet.csv
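prename (the Perl rename script) may not be installed, or may go by a different name, on some distributions. A portable alternative is a few lines of Python; rename_txt_to_csv is a hypothetical helper, and the glob pattern is assumed to match the test_c_ files from the example above.

```python
import tempfile
from pathlib import Path


def rename_txt_to_csv(directory, pattern="test_c_*.txt"):
    """Rename files matching `pattern` in `directory` from .txt to .csv.

    Only the final suffix changes (Path.with_suffix), so a ".txt" that
    happens to appear earlier in a sheet name is left alone.
    Returns the new paths in sorted order.
    """
    renamed = []
    for path in sorted(Path(directory).glob(pattern)):
        target = path.with_suffix(".csv")
        path.rename(target)
        renamed.append(target)
    return renamed


# Demo in a throwaway directory, using two of the file names from above.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("test_c_0_mysheet.txt", "test_c_1_mysheet2.txt"):
        (Path(tmp) / name).write_text("a,b\n")
    new_names = [p.name for p in rename_txt_to_csv(tmp)]

print(new_names)
```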

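Command (the macro below must be saved as Module1 in a Basic library named Library1 under My Macros, and the spreadsheet path must be absolute, so let the shell expand $HOME rather than using ~):
soffice --headless "macro:///Library1.Module1.ConvertSheet($HOME/Desktop/Software/OpenOffice/examples/input/Test1.ods, Sheet2)"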
Code:
Sub ConvertSheet( SpreadSheetPath as String, SheetNameSeek as String)
REM IN SpreadSheetPath is the FULL PATH and file
REM IN SheetNameSeek is the name of the sheet to find and convert to CSV
Dim Doc As Object
Dim Dummy()
SheetNameSeek=trim(SheetNameSeek)
If (Not GlobalScope.BasicLibraries.isLibraryLoaded("Tools")) Then
GlobalScope.BasicLibraries.LoadLibrary("Tools")
End If
REM content of an opened window can be replaced with the help of the frame parameter and SearchFlags:
SearchFlags = com.sun.star.frame.FrameSearchFlag.CREATE + _
com.sun.star.frame.FrameSearchFlag.ALL
REM Set up a propval object to store the filter properties
Dim Propval(1) as New com.sun.star.beans.PropertyValue
Propval(0).Name = "FilterName"
Propval(0).Value = "Text - txt - csv (StarCalc)"
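Propval(1).Name = "FilterOptions"
REM 44 = field separator (comma), 34 = text delimiter ("), 76 = UTF-8 character set, 1 = first line (used on import)
Propval(1).Value = "44,34,76,1"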
Url=ConvertToUrl(SpreadSheetPath)
Doc = StarDesktop.loadComponentFromURL(Url, "MyFrame", SearchFlags, Dummy)
FileN=FileNameoutofPath(Url)
BaseFilename = Tools.Strings.GetFileNameWithoutExtension(FileN)
DirLoc=DirectoryNameoutofPath(ConvertFromUrl(Url),"/")+"/"
Sheets = Doc.Sheets
NumSheets = Sheets.Count - 1
For J = 0 to NumSheets
SheetName = Sheets(J).Name
if (SheetName = SheetNameSeek) then
Doc.getCurrentController.setActiveSheet(Sheets(J))
Filename = DirLoc + BaseFilename + "."+ SheetName + ".csv"
FileURL = convertToURL(Filename)
Doc.StoreAsURL(FileURL, Propval())
end if
Next J
Doc.close(true)
End Sub

I ended up using xlsx2csv
Version 0.7.8 supports general xlsx files pretty well. It allows you to specify the tab by number and by name.
It does not do a good job on macros and complicated multi-sheet documents, but it does a very good job on regular multi-sheet xlsx documents.
Unfortunately, xlsx2csv does not support password-protected xlsx files, so for those I still have to use the Win32::OLE Perl module and run it in a Windows environment.
From what I can see, LibreOffice still does not have the ability to select the tab via the command line.
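To find out which tab number or name to ask for, you can list the tabs of an .xlsx file directly: it is a zip archive, and xl/workbook.xml lists the sheets in tab order. A standard-library Python sketch; the function names are hypothetical, and whether xlsx2csv counts tabs the same way (from 1, in workbook order) is an assumption worth checking against its --help output.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the SpreadsheetML elements in xl/workbook.xml.
MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def tabs_from_workbook_xml(workbook_xml):
    """Map 1-based tab positions to sheet names, in workbook order."""
    root = ET.fromstring(workbook_xml)
    sheets = root.findall(f"{MAIN}sheets/{MAIN}sheet")
    return {pos: sheet.get("name") for pos, sheet in enumerate(sheets, start=1)}


def list_xlsx_tabs(path):
    """Open an .xlsx file (a zip archive) and list its tabs."""
    with zipfile.ZipFile(path) as archive:
        return tabs_from_workbook_xml(archive.read("xl/workbook.xml"))
```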

Related

Recursively Replace One Windows Path w/ Another in Text Files

I have a large amount of text files stored on a Red Hat server that contain explicit Windows paths. Today, that path has changed and I would like to change the text files to reflect the new path. As they are Windows paths, they all contain single backslashes. I would like to maintain the single backslashes if possible.
I wanted to ask what the best method to perform this string replacement would be. I have made backups of folders so that I may test on a smaller scale before applying to the larger scale that will affect my group members.
Example:
Change $oldPath to $newPath in all *.py files recursively contained in current directory.
i.e. $oldPath\common\file_referenced should become $newPath\common\file_referenced
Robustly using any awk in any shell on every Unix box and regardless of which characters your old or new directory paths contain and whether or not the final directory in either old or new could be a substring of another existing directory name:
$ cat file
\old\fashioned\common\file_referenced
$ oldPath='\old\fashioned'
$ newPath='\new\fangled\etc'
$ awk '
BEGIN { old=ARGV[1]; new=ARGV[2]; ARGV[1]=ARGV[2]="" }
index($0"\\",old"\\")==1 { $0=new substr($0,length(old)+1) }
1' "$oldPath" "$newPath" file
\new\fangled\etc\common\file_referenced
To update all .py files in a directory you could use GNU awk's -i inplace, or you could do for i in *.py; do awk '...' "$oldPath" "$newPath" "$i" > tmp && mv tmp "$i"; done, or, since *.py only covers the current directory, you could use find and/or xargs to recurse, etc.: any of the common Unix ways to process multiple files with any command.
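If a scripting language is easier than combining awk with find, the same replacement can be done recursively in Python. A minimal sketch, assuming the files are UTF-8 text; replace_windows_path and the settings.py demo file are hypothetical. Like the awk answer, the old path only matches when followed by a backslash, so a sibling directory such as \old\fashioned2 is left alone; unlike the awk answer, it replaces matches anywhere on a line, not only at the start.

```python
import tempfile
from pathlib import Path


def replace_windows_path(root, old, new, pattern="*.py"):
    """Replace directory path `old` with `new` in files matching `pattern`
    anywhere under `root`. Returns the list of files that were changed."""
    changed = []
    for path in Path(root).rglob(pattern):
        # newline="" keeps CRLF line endings from Windows-edited files intact.
        with open(path, encoding="utf-8", newline="") as f:
            text = f.read()
        # str.replace is literal, so the backslashes need no regex escaping.
        updated = text.replace(old + "\\", new + "\\")
        if updated != text:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(updated)
            changed.append(path)
    return changed


# Demo on a throwaway tree.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "pkg" / "settings.py"
    sample.parent.mkdir()
    sample.write_text("\\old\\fashioned\\common\\file_referenced\n"
                      "\\old\\fashioned2\\other\n")
    changed = replace_windows_path(tmp, r"\old\fashioned", r"\new\fangled\etc")
    result = sample.read_text()

print(result)
```

Try it on one of your backup folders first, as you planned.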

How to copy or move multiple files with same extension?

So I am trying to move a bunch of files with similar extensions from /home/ to /root/
Code I tried is
file copy /home/*.abc.xyz /root/
Also tried
set infile [glob -nocomplain /home/*.abc.xyz ]
if { [llength $infile] > 0 } {
file copy $infile /root/
}
No success.
Your two attempts fail for different reasons:
There is no wildcard expansion in arguments to file copy, or any Tcl command, for that matter: file copy /home/*.abc.xyz /root/. This will look for a single source with a literal * in its filename.
glob -nocomplain /home/*.abc.xyz is fine for collecting the sources, but glob returns a list of sources, and file copy requires each source to be passed as a separate argument, not as a single list. To expand a single list value of source files into multiple separate arguments, use the Tcl expansion operator {*}.
Therefore:
set infiles [glob -nocomplain /home/*.abc.xyz]
if {[llength $infiles]} {
file copy {*}$infiles /root/
}
For a one-line answer (without -nocomplain, glob raises an error if nothing matches):
file copy {*}[glob /home/*.abc.xyz] /root/
The file copy (and file rename) commands have two forms (see the file manual page). The first form copies a single file to a new target. The second form copies all the file name arguments into a directory; this form insists that the directory name be the last argument, and you may have an arbitrary number of source file names preceding it. Also, file copy does not do glob expansion on its arguments, so as you rightly surmised, you also need to use the glob command to obtain a list of the files to copy. The problem is that the glob command returns a list of file names and you passed that list as a single argument, i.e.
file copy $infile /root/
passes the list as a single argument and so the file copy command thinks it is dealing with the first form and attempts to find a file whose name matches that of the entire list. This file probably doesn't exist. Placing the error message in your question would have helped us to know for sure.
So what you want to do is take the list of files contained in the infile variable and expand it into separate argument words. Since this is a common situation, Tcl has some syntax to help (assuming you are not using some ancient version of Tcl). Try using the command:
file copy {*}$infile /root/
in place of your first attempt and see if that helps the situation.
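For comparison, here is the same collect-then-copy task as a minimal Python sketch (copy_matching is a hypothetical helper). Copying one file per loop iteration sidesteps the single-list-argument problem entirely, and an empty match list is simply skipped, much like glob -nocomplain.

```python
import glob
import os
import shutil
import tempfile


def copy_matching(pattern, target_dir):
    """Copy every file matching `pattern` into `target_dir`.

    No match is not an error: nothing is copied. Returns the copied
    base names, sorted.
    """
    sources = glob.glob(pattern)
    for source in sources:
        shutil.copy(source, target_dir)
    return sorted(os.path.basename(s) for s in sources)


# Demo with throwaway source and target directories.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for name in ("a.abc.xyz", "b.abc.xyz", "c.txt"):
        open(os.path.join(src, name), "w").close()
    copied = copy_matching(os.path.join(src, "*.abc.xyz"), dst)
    in_target = sorted(os.listdir(dst))

print(copied)
```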

awk processing files with different extensions

I have to process multiple CSV and TXT files in one awk script. My cmd file on windows looks like: gawk -f script.awk *.csv *.txt > output.file
I'd like to use this cmd file as I don't want to type the command into the command prompt every time I want to run the script. I would like to perform different tasks for the different file types. I have tried some things inside the script file like if (match(FILENAME, ".csv")) && (FNR > 1) but none of them worked. I have about 4-5 CSV files and a lot of (1000+) TXT files; these are all input files. All the CSV files have the same schema, one column between quotes. Example:
"Player"
"adigabor"
I want to ignore the first line of all the input CSV files when processing them and add each record w/o the quotes into an array and after that I'd like to process the TXT files which I can do just fine, my problem is that I couldn't perform the different tasks with the different input file extensions in one script.
It would be extremely useful if you told us in what way "none of them were working" so we're not just guessing but here goes anyway:
The main problem with match(FILENAME, ".csv") is it'll match csv preceded by any char anywhere in the file name. To get files that end in literally .csv you want:
match(FILENAME,/\.csv$/)
but you don't need to call a function for that:
FILENAME ~ /\.csv$/
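To see the difference between the two patterns, here is a quick check with Python's re module on some made-up file names. Python is used only for illustration; for these simple patterns, "." and "$" behave the same way as in awk's regular expressions.

```python
import re

names = ["players.csv", "mycsv.txt", "data.csv.bak", "list.CSV"]

# Unescaped "." matches any character, and the match may be anywhere.
loose = [n for n in names if re.search(".csv", n)]
# Escaped dot, anchored at the end: only names ending in literally ".csv".
strict = [n for n in names if re.search(r"\.csv$", n)]

print(loose)   # ['players.csv', 'mycsv.txt', 'data.csv.bak']
print(strict)  # ['players.csv']
```

Note that both patterns are case-sensitive, so list.CSV matches neither; on Windows you may want to lowercase FILENAME first.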
So your script would look like:
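FILENAME ~ /\.csv$/ {
if ( FNR > 1 ) {
# CSV stuff: strip the quotes and remember the value
gsub(/"/, "")
players[$0]
}
next
}
{
# TXT stuff goes here; the CSV files come first on the command line,
# so players[] is already filled, e.g. test ($1 in players)
}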
If you still can't do whatever you're trying to do then edit your question to include sample input files (at least one of each small .csv and .txt files) and expected output along with a better explanation of what you are trying to do.

Using arrays in a for loop, in bash [duplicate]

This question already has answers here:
bash script, create array of all files in a directory
(3 answers)
I am currently working on a bash script where I must download files from our mySQL database, host them somewhere different, then update the database with the new location for the image. The last portion is my problem area, creating the array full of filenames and iterating through them, replacing the file names in the database as we go.
For whatever reason I keep getting these kinds of errors:
not found/X2b6qZP.png: 1: /xxx/images/X2b6qZP.png: ?PNG /xxx/images/X2b6qZP.png: 2: /xxx/images/X2b6qZP.png: : not found
/xxx/images/X2b6qZP.png: 1: /xxx/images/X2b6qZP.png: Syntax error: word unexpected (expecting ")")
files=$($DOWNLOADDIRECTORY/*)
files=$(${files[@]##*/})
# Iterate through the file names in the download directory, and assign the new values to the detail table.
for file in "${files[@]}"
do
mysql -h ${HOST} -u ${USER} -p${PASSWORD} ${DBNAME} "UPDATE crm_category_detail SET detail_value = 'http://xxx.xxx.x.xxx/img/$file' WHERE detail_value LIKE '%imgur.com/$file'"
done
You are trying to execute a glob as a command. The syntax to use arrays is array=(tokens):
files=("$DOWNLOADDIRECTORY"/*)
files=("${files[@]##*/}")
You are also trying to run your script with sh instead of bash.
Do not run sh file or use #!/bin/sh. Arrays are not supported in sh.
Instead use bash file or #!/bin/bash.
What's going on right here?
files=$($DOWNLOADDIRECTORY/*)
I don't think this is doing what you think it is doing.
According to this answer, you want to omit the first $ to get an array of files.
files=($DOWNLOADDIRECTORY/*)
I just wrote a sample script
#!/bin/bash
alist=(/*)
printf '%s\n' "${alist[@]}"
Output
/bin
/boot
/data
/dev
/dist
/etc
/home
/lib
....
Your assignments are not creating arrays. You need arrayname=( values for array ) as the notation. Hence:
files=( "$DOWNLOADDIRECTORY"/* )
files=( "${files[@]##*/}" )
The first line will give you all the names in the directory specified by $DOWNLOADDIRECTORY. The second carefully removes the directory prefix.
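To make the ${files[@]##*/} step less magical, here is a small Python model of bash's # (remove shortest matching prefix) and ## (remove longest matching prefix). It's only a sketch: fnmatchcase stands in for bash's pattern matching, which agrees for simple patterns like */ but not for every extended-glob feature, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def remove_shortest_prefix(value, pattern):
    """Model of bash ${value#pattern}: drop the shortest leading match."""
    for end in range(len(value) + 1):
        if fnmatchcase(value[:end], pattern):
            return value[end:]
    return value


def remove_longest_prefix(value, pattern):
    """Model of bash ${value##pattern}: drop the longest leading match."""
    for end in range(len(value), -1, -1):
        if fnmatchcase(value[:end], pattern):
            return value[end:]
    return value


path = "/xxx/images/X2b6qZP.png"
print(remove_shortest_prefix(path, "*/"))  # xxx/images/X2b6qZP.png
print(remove_longest_prefix(path, "*/"))   # X2b6qZP.png
```

So ## with */ keeps only what follows the last slash, which is why it strips the whole directory prefix.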
I've used spaces after ( and before ) for clarity; the shell neither requires nor objects to them. I used double quotes around the variable name and expansions to keep things sane when names contain spaces etc.
Although it isn't immediately obvious why you might do this, its advantage over many alternatives is that it preserves spaces etc in file names.
You could just loop directly over the files:
for file in "$DOWNLOADDIRECTORY"/*; do
file="${file##*/}" # or file=$(basename "$file")
# MySQL stuff
done
Some quoting added in case of spaces in paths.

Natural ordering files in directory into a cell array using Octave

I have files being generated by another program/user that have names such as "jh-1.txt, jh-2.txt, ..., jh-100.txt, ..., jh-1024.txt". I'm extracting a column from these files, manipulating the data, and outputting to a new matrix. The only problem is that Octave is using ASCII ordering and not natural ordering when reading in the files. Thus, the output matrix is not ordered in a natural way. My question is, can Octave sort file names in a natural order? I'm getting file names in the standard method:
fileDirectory = '/path/to/directory';
filePattern = fullfile(fileDirectory, '*.txt'); % Selects only the txt files.
dataFiles = dir(filePattern); % Gets the info from the txt files in the directory.
baseFileName = {dataFiles.name}'; % Gets all the txt file names.
I can't rename the files because this is a script for another user. They are on a Windows machine and already have Octave installed with Cygwin, and I don't want to make them use the command line more than they have to because they are unfamiliar with it. Alternatively, it would be nice to have the output with the file names in a column, but I haven't figured that one out either (I'm a bit of a noob with Octave myself). That way the user could use Excel (which they are familiar with) to sort the columns.
I don't think there's a built-in natural sort in Octave. However, there is a natural sort submission on MathWorks' File Exchange. I've not used it, but the comments imply it works in Octave too.
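The idea behind a natural sort is small: split each name into text and digit runs and compare the digit runs as numbers. The sketch below is Python rather than Octave, so it shows the technique, not a drop-in fix; natural_key is a hypothetical name, and porting the same split-and-compare approach to Octave is left as an exercise.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so digits compare numerically."""
    return [int(chunk) if chunk.isdecimal() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


files = ["jh-100.txt", "jh-2.txt", "jh-1024.txt", "jh-1.txt"]
print(sorted(files))                   # ['jh-1.txt', 'jh-100.txt', 'jh-1024.txt', 'jh-2.txt']
print(sorted(files, key=natural_key))  # ['jh-1.txt', 'jh-2.txt', 'jh-100.txt', 'jh-1024.txt']
```

Because re.split with a capturing group always alternates text and digit chunks, the keys line up position by position, so strings are only ever compared with strings and numbers with numbers.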