Combine CSV files in Windows (.cmd or .bat file preferably)

I may at various times have .csv files I need to combine. They have the same headers and column layout. I just need a simple way to combine them in Windows 7. The user may not always have Excel installed.
A .cmd macro would be great, but the ones I found online don't work.
The best I've got so far is this:
"open a command window ("cmd.exe") and type the following two lines (no brackets)
cd "Desktop\[csv-files]"
type *.csv > my-new-file.csv"
Where the files to be combined are in Desktop\[csv-files].
BUT - it seems to create duplicates (or in some cases triplicates) of the combined entries. For instance, I tested with 2 files that have 23 and 26 unique entries respectively. I got out a file with 100 entries and at least one entry repeated 3 times.
Right now the .csv files I am testing are only ~25 entries long, but in time they could be thousands or more.

Sounds like the problem is using *.csv while redirecting the output to a .csv file in the same folder. The *.csv wildcard also matches my-new-file.csv, so type finds the output file and types it into itself. You could give the output file a different extension until the type command finishes, then rename it. Something like:
cd "Desktop\[csv-files]"
type *.csv > my-new-file.txt
ren my-new-file.txt my-new-file.csv
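For comparison, here is the same idea as a minimal Python sketch, assuming Python happens to be available on the machine (the answer itself needs only cmd.exe). The function name concat_csv is hypothetical; the point is only that the input list is built first and the output file is left out of it.

```python
from pathlib import Path


def concat_csv(folder, out_name="my-new-file.csv"):
    """Concatenate every .csv in `folder` into `out_name` without reading the output itself."""
    folder = Path(folder)
    out_path = folder / out_name
    # Build the input list BEFORE opening the output, and exclude the output
    # name in case a copy is left over from an earlier run.
    inputs = sorted(
        p for p in folder.glob("*.csv") if p.name.lower() != out_name.lower()
    )
    with out_path.open("wb") as out:
        for path in inputs:
            out.write(path.read_bytes())
    return [p.name for p in inputs]
```

Like type *.csv, this copies every file whole, so the header of each file still appears in the output.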
You can also skip the header of each file after the first, so that you don't end up with file headers throughout the middle of the output file. Try the following:
@echo off
setlocal ENABLEDELAYEDEXPANSION
set cnt=1
cd "Desktop\[csv-files]"
for %%i in (*.csv) do (
if !cnt!==1 (
for /f "delims=" %%j in ('type "%%i"') do echo %%j >> my-new-file.txt
) else (
for /f "skip=1 delims=" %%j in ('type "%%i"') do echo %%j >> my-new-file.txt
)
set /a cnt+=1
)
endlocal
ren my-new-file.txt my-new-file.csv
Explanation:
I used ENABLEDELAYEDEXPANSION to make sure the cnt variable is properly evaluated. When delayed expansion is enabled, you use ! to distinguish variables instead of %. So to evaluate the cnt variable, you use !cnt! instead of %cnt%. Delaying expansion makes it wait to evaluate the value of cnt until the moment that it is used. Without it, %cnt% is expanded only once, when the whole parenthesised for block is parsed, so every iteration would see the value cnt had before the loop started. If you enable delayed expansion and use !cnt!, each iteration sees the current value.
By setting cnt to 1, we can run different code for the 1st .csv file that is processed. The code includes all lines from the 1st .csv file, but skips the first line of all subsequent .csv files.
I used a nested for loop. The outer for cycles through all .csv files in the current folder. The inner for loop executes the type "%%i" command, where %%i is the name of the .csv file. Each line of the file is processed individually as %%j, which is passed to the echo %%j command. echo would normally print the value for %%j to the command prompt window. However, you can redirect the output to a file using > or >>. The > redirector overwrites the output file with the new value. The >> redirector appends the new value to the output file. Since each line of each file, and each file is being processed individually, we must use the >> redirector to push all content into a single file.
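When using the for /f command, each line is broken into individual parts (tokens) using the specified delimiters. The default delimiters are space and tab. If I didn't include "delims=", and asked for three tokens with tokens=1-3, then the text This is fun would be broken into the following (with no tokens option at all, only the first part, This, would reach %%j and the rest of the line would be lost):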
%%j = This
%%k = is
%%l = fun
We want to process the whole line from the .csv file all-at-once. By setting the delimiter to nothing ("delims="), the whole line can be processed using %%j.
For more specific help about how the for command works, type for /? at a command prompt.
endlocal reverts the environment to its state at the point where setlocal was used. Any variables you declared are removed, and extensions are set back to their prior value.
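If it helps to see the logic outside of batch syntax, the loop described above can be sketched in Python (an assumption: Python is not part of the original answer, and merge_keep_first_header is a made-up name). The index test plays the role of the cnt variable.

```python
from pathlib import Path


def merge_keep_first_header(folder, out_name="my-new-file.csv"):
    """Merge all .csv files in `folder`, keeping the header line of the first file only."""
    folder = Path(folder)
    inputs = sorted(
        p for p in folder.glob("*.csv") if p.name.lower() != out_name.lower()
    )
    with (folder / out_name).open("w", newline="") as out:
        for index, path in enumerate(inputs):
            with path.open(newline="") as src:
                for line_no, line in enumerate(src):
                    # Same rule as the cnt check: skip line 1 of every file after the first.
                    if index > 0 and line_no == 0:
                        continue
                    # Make sure a file without a final newline does not fuse
                    # its last row with the first row of the next file.
                    if not line.endswith(("\n", "\r")):
                        line += "\n"
                    out.write(line)
    return len(inputs)
```

One difference from the batch version: for /f silently drops empty lines, while this sketch copies them through unchanged.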

Related

Autosum column in csv using batch

I have created a batch file to combine all CSV files in a folder. Below is my batch file code.
@ECHO OFF
SET first=y
SET newfile=Summary.csv
for %%F in (*.csv) do IF NOT %%F==%newfile% (
if defined first (
COPY /y "%%F" %newfile% >nul
set "first="
) else (
FOR /f "skip=1delims=" %%i IN (%%F) DO >> %newfile% ECHO %%i
)
)
My question is: how do I extend the code to add an autosum for every column?
Below is my example CSV file after I run the batch file.
Name,A4 Used,A3 Used,Others
A,23,9,2
B,61,41,0
C,5,85,7
I need to create an autosum for every column like example below.
Name,A4 Used,A3 Used,Others
A,23,9,2
B,61,41,0
C,5,85,7
Total,89,135,9
Any idea guys?
This task can be done with the following commented batch code, provided the contents of the processed CSV files fit the limitations noted below:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem Exit this batch file if current directory does not contain any CSV file.
if not exist *.csv goto EndBatch
rem The summary CSV file is created first in directory for temporary
rem files to avoid that outer FOR loop below tries to process also
rem the summary file. The summary file is created with header row.
set "NewFile=%TEMP%\Summary.csv"
echo Name,A4 Used,A3 Used,Others>"%NewFile%"
rem Make sure there is no summary CSV file in current directory
rem from a previous execution of this batch file in this directory.
del "Summary.csv" 2>nul
rem Initialize the environment variables for total sum of each column.
set "TotalColumn2=0"
set "TotalColumn3=0"
set "TotalColumn4=0"
rem The outer loop is executed for each CSV file in current directory.
rem The inner loop reads each CSV file line by line. The first line is
rem always skipped. Skipped are also empty lines and lines starting with
rem a semicolon. All other lines are split up into four substrings using
rem comma as separator (delimiter).
for %%I in (*.csv) do (
for /F "usebackq skip=1 tokens=1-4 delims=," %%A in ("%%I") do (
if not "%%D" == "" (
set /A TotalColumn2+=%%B
set /A TotalColumn3+=%%C
set /A TotalColumn4+=%%D
>>"%NewFile%" echo %%A,%%B,%%C,%%D
) else (
del "%NewFile%"
echo ERROR: A line in "%%I" has not four comma separated values.
echo/
pause
goto EndBatch
)
)
)
rem Append to summary file the total sums and move the summary file
rem from temporary files directory to current directory. If that fails
rem unexpected, delete the summary file in temporary files directory.
>>"%NewFile%" echo Total,%TotalColumn2%,%TotalColumn3%,%TotalColumn4%
move "%NewFile%" "Summary.csv" >nul
if errorlevel 1 (
del "%NewFile%"
echo ERROR: Could not move Summary.csv to "%CD%".
echo/
pause
)
:EndBatch
endlocal
Please note the limitations of Windows command interpreter:
Arithmetic expressions can be done only with 32-bit signed integers which means the value range is limited from -2147483648 to 2147483647. There is no support for floating point arithmetic.
Command FOR interprets a sequence of delimiters as one delimiter on splitting up a line into substrings. So a row like D,80,,20 in a CSV file results in loop variable A gets assigned D, loop variable B gets assigned 80, loop variable C gets assigned 20 and loop variable D has nothing assigned. In this case the batch file exits with an error message.
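Both limitations come from cmd.exe itself, not from the task. As a point of comparison only, here is a small Python sketch of the totals step (Python is an assumption here, and totals_row is a hypothetical name). Python integers are not limited to 32 bits, and the csv module keeps empty fields in place. Counting an empty cell as 0 is my own choice for the sketch, not something the question specifies.

```python
import csv


def totals_row(lines, label="Total"):
    """Sum every column after the first; `lines` holds CSV text lines, header first."""
    reader = csv.reader(lines)
    header = next(reader)
    totals = [0] * (len(header) - 1)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(row)}: {row}")
        for i, cell in enumerate(row[1:]):
            # Assumption: an empty cell counts as 0 instead of being an error.
            totals[i] += int(cell) if cell.strip() else 0
    return [label] + totals
```

With the sample data from the question this returns ["Total", 89, 135, 9], and a row like D,80,,20 keeps its 20 in the fourth column.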
For understanding the used commands and how they work, open a command prompt window, execute there the following commands, and read entirely all help pages displayed for each command very carefully.
del /?
echo /?
endlocal /?
for /?
goto /?
if /?
move /?
pause /?
rem /?
set /?
setlocal /?
Read also the Microsoft article about Using Command Redirection Operators.

.bat file to automatically add column to pipeline csv and populate with today's date

I'm very new to .bat files and have excitedly created some to copy, move and rename documents.
Despite searching, I'm getting stuck with a more complex command, largely because the document I'm trying to modify is pipeline delimited rather than 'normal' csv...
My question: can I, and if so how do I, take an existing pipe-delimited csv that always has the same number of columns and add a column onto the end with today's date (DD/MM/YYYY) in it for every row?
$ awk -F, 'NF>1{$0 = "\"YYYY-MM-DD\"" FS $0}1' file
sed 'N;s/","/","YYYY-MM-DD HH:MM:SS","/5' file
I can't seem to get anything to even modify the document at the moment :-(
Your batch attempt isn't that bad:
@echo off
for /f "delims=" %%a in ('type "file.csv"') do (
>>"fileout.csv" echo.%%a|%time%
)
Just a few adjustments:
@echo off
(for /f "delims=" %%a in (file.csv) do (
echo(%%a^|%time%
))>"fileout.csv"
for /f is able to process the contents of a file directly (no need for type, although it works fine)
Redirecting (>>) inside the loop is slow, because the file will be opened, and closed every time you write to it (although it works). It's much faster to only once open/close the file (especially with large files).
echo. is not secure (although it works fine in most circumstances), best option is echo(.
The pipe, | is a special character and in this case needs escaping with the caret, ^.
Note: for /f skips empty lines, so they will not be in the new file. The same goes for lines that start with ; (the default EOL character).
Edit for "adding |Date to the Header":
@echo off
<file.csv set /p header=
>fileout.csv echo %header%^|DATE
(for /f "skip=1 delims=" %%a in (file.csv) do (
echo(%%a^|%date%
))>>"fileout.csv"
<file.csv set /p header= is a way to read just one (the first) line of a file to a variable. Write it back with the new data appended (or leave it unchanged - your comment isn't quite clear about that). Use skip=1 to skip the first line with further processing.
Don't forget to change ))>"fileout.csv" to ))>>"fileout.csv".
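For reference, the header-plus-date logic can be sketched in Python as follows (assuming Python is an option; add_date_column and its parameters are names I made up). Unlike %date%, whose format depends on the regional settings of the machine, strftime always produces DD/MM/YYYY here.

```python
from datetime import date


def add_date_column(lines, today=None, header_label="DATE"):
    """Append |DATE to the first line and |DD/MM/YYYY to every other line."""
    stamp = (today or date.today()).strftime("%d/%m/%Y")
    result = []
    for index, line in enumerate(lines):
        line = line.rstrip("\r\n")
        # First line is the header, everything after it is data.
        suffix = header_label if index == 0 else stamp
        result.append(f"{line}|{suffix}")
    return result
```

The today parameter exists only so the output can be checked with a fixed date; leave it out to get the current date.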

Read all the csv file in a folder and only showed a single header in the output file

I would like to read all the csv files in the folder and compile them using an awk file. Below is the code that I have written:
@echo off
del c_1.csv
setlocal ENABLEDELAYEDEXPANSION
set file2=*.csv
set outputfile=c_1.csv
REM get header:
set /p header=<%outputfile%
for %%i in (*.csv) do (
if not exist %header% (
nawk -f "c_1.awk" *.csv >> c_1.csv
)
if exist %header% (
nawk -f more +1 "c_1.awk" *.csv >> c_1.csv
)
)
echo done!
setlocal
pause
goto:eof
But the header is still printed in my output file, and extra, incorrect data is printed as well. Your help will be appreciated. Thanks
Will this not do what you want?
nawk "FNR==1 && NR!=1{next;}{print}" *.csv>c_1.csv
Idea taken from here.
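In case the awk condition is unfamiliar: NR counts lines across all input files and FNR counts lines within the current file, so FNR==1 && NR!=1 is true for the first line of every file except the very first one. A rough Python equivalent (my own sketch, with a hypothetical function name) looks like this:

```python
def cat_skip_repeated_headers(files):
    """Yield lines from each file in turn, dropping line 1 of every file except the first."""
    nr = 0  # awk's NR: lines read so far across all files
    for name in files:
        with open(name, newline="") as handle:
            # fnr mirrors awk's FNR: the line number within the current file
            for fnr, line in enumerate(handle, start=1):
                nr += 1
                if fnr == 1 and nr != 1:
                    continue  # same test as FNR==1 && NR!=1 {next}
                yield line
```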
Edit: It seems I understood your request wrongly (I didn't properly read the question and assumed you were concatenating files while retaining only the header of the first). You appear to be running an awk script, c_1.awk, on all csv files in the current directory. If the header of a csv doesn't match the input from outputfile, you intend to 'compile' the entire file; if it does match, you want to bypass that header.
The main problem with your batch-file lies with the fact that if exist doesn't tell you if the content of %header% is empty, for that you'll need If Defined header. That said, as you have already deleted the input file, your set /p command would output an error The system cannot find the file specified. and header will still not be defined.
I think that what you should really do is adjust your awk script such that it takes the header to match as an input parameter. That would be much better than trying to check the content in a different language then run one of two awk commands depending upon that content.

Batch File that analyze and present data from csv files

I want to create a .bat file that will present the last row of every .csv files that the file name start with "Togo".
The batch file will be located in the same folder as the .csv files.
To output should be the:
[File Name]
[Last Row Data]
This batch file should always run and test the .csv files every 5 minutes.
SO is not a free code-writing service. Your question is likely to be deleted or closed since you have not shown any attempt to solve your problem.
That having been said, it's difficult to start in batch, so here's a solution.
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET "sourcedir=U:\sourcedir\t w o"
FOR /f "delims=" %%a IN (
'dir /b /a-d "%sourcedir%\togo*.csv" '
) DO (
FOR /f "usebackqdelims=" %%q IN ("%sourcedir%\%%a") DO SET "line=%%q"
ECHO %%a !line!
)
GOTO :EOF
The first two lines turn off batch's debugging display (show-the-command, then execute it) and invoke a mode where access to variables that have changed within a "code block" (parenthesised series of commands) is enabled (normally, it's disabled and only the value at the time the if (or whatever) is encountered is available.)
Line 3 sets a variable called sourcedir and assigns it a value. The enclosing quotes ensure that trailing spaces are not included in the value assigned. I've deliberately used a directoryname that includes spaces because that's a common problem and it proves the batch in my test regime. Your directoryname would be different - simply substitute that. The directoryname . means "the current directory"
Lines 4-6 could be combined to one - just stylistics. It means "perform a directory scan, no directory names (/a-d) in basic form (/b) that is, names-only, of the directory whose name is in the variable sourcedir and whose names fit the pattern togo*.csv. Process each resultant line by ignoring the default delimiters and assigning the result (ie the entire line of the directory list, ie the filenames) to the metavariable (loop-control variable) %%a.
The next line reads each line of the file built from the source-directory name and the filename currently in %%a. delims is set to nothing so the entire line will be assigned to the metavariable %%q, and the usebackq option tells cmd that the parenthesised string is a quoted filename, not a string (or a single-quoted command-to-be-executed as in the first for). The variable line will then be set with each successive line from the file, so line will hold the last line from the file when the for...%%q... ends.
The following line shows the filename in %%a and the text from the last line of that file in line.
Note the difference - %%x to access the contents of a metavariable, %var% to access the contents of a variable, but !var! to access the changed value (if delayedexpansion has been invoked).
The goto :eof means "go to the physical end-of-file". CMD understands :eof to mean "physical end-of-file".
So - cut-and-paste to a file named whatever.bat and then run by simply entering
whatever
In general,
for /?
will yield help for the for command, and this holds for most batch commands. Look on SO for thousands of examples.
You may also examine
timeout
cls
choice
for clues about how to achieve your every 5 minutes ambition. You might want to run this from the task scheduler to get an every 5 minutes display - many ways to achieve the same thing.
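To make the "keep overwriting until only the last line is left" idea concrete, here is the same thing as a short Python sketch (Python and the name last_rows are assumptions, not part of the batch solution). A deque with maxlen=1 does what the repeated SET "line=%%q" does: each new line pushes out the previous one.

```python
from collections import deque
from pathlib import Path


def last_rows(folder, pattern="Togo*.csv"):
    """Return {file name: last non-empty line} for files in `folder` matching `pattern`."""
    result = {}
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="") as handle:
            non_empty = (line.rstrip("\r\n") for line in handle if line.strip())
            # maxlen=1 keeps only the most recent item, i.e. the last line.
            tail = deque(non_empty, maxlen=1)
        result[path.name] = tail[0] if tail else ""
    return result
```

Empty lines are skipped on purpose, to match what for /f does in the batch version.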

Automating CSV file merging and cleaning

I'm not a code guy and have spent whole day trying to get this done without success, hoping I can get some help from the experts.
I have a folder called Vehicles, within which are two sub-folders - Automobiles and Trucks. Each of sub-folders contain two CSV files which have identical (to that sub-folder) headers/structure.
What I'm trying to accomplish:
1. Take the two CSV files in the Automobiles folder, merge them without duplicating headers and name the merged file Automobiles.csv
2. Delete all rows in Automobiles.csv where the 6th column (header is Fuel_Type) is "Diesel" (without the quotes), then move the file from the sub-folder to the main Vehicles folder.
3. Take the two CSV files in the Trucks folder, merge them without duplicating headers and name the merged file Trucks.csv
4. For the merged file in the Trucks folder, remove all rows where the 6th column (header is "Fuel_Type") is "Diesel" (without the quotes), then move the file from the sub-folder to the main Vehicles folder.
Obviously if someone can help with 1 and 2 I can manipulate it for 3 and 4.
BONUS POINTS :) take the Automobiles.csv and Trucks.csv files and create Vehicles.xls file with Automobiles and Trucks tabs.
A few details - the files are pretty large: each CSV can be up to 350 thousand rows x 150 columns and up to 200 MB in size. All the batch scripts that I tried to put together for removing headers seemed to freeze with larger files.
Due to user permissions on work computers would strongly prefer to use something that is native to Windows7/8 and doesn't require additional software, but would consider other options if nothing native is available.
I am going to assume all csv files contain the same header information, in the same order. This is how I would do it in PowerShell v2:
Function Merge-Vehicles
{
param(
[string]$PathToCsv1,
[string]$PathToCsv2,
[string]$ExportPath
)
$regex = "^(?:d|D)(?:i|I)(?:e|E)(?:s|S)(?:e|E)(?:l|L)$"
$CSV = Import-Csv $PathToCsv1
$CSV += Import-Csv $PathToCsv2
$CSV | Where-Object { ( $_.'Fuel_Type' -notmatch "$regex") } | Export-Csv -Path $ExportPath -NoTypeInformation
}
First the function imports the csv files from the user-defined paths and combines the imported objects into an array. It then keeps only the objects whose 'Fuel_Type' property is not the string "Diesel" (in any letter case). The objects that remain are exported as a csv to a user-defined path; -NoTypeInformation stops Export-Csv from writing a #TYPE line above the header.
To call the function for Automobiles
Merge-Vehicles -PathToCsv1 C:\Vehicles\Automobiles\csv1.csv -PathToCsv2 C:\Vehicles\Automobiles\csv2.csv -ExportPath C:\Vehicles\Automobiles.csv
And for Trucks
Merge-Vehicles -PathToCsv1 C:\Vehicles\Trucks\csv1.csv -PathToCsv2 C:\Vehicles\Trucks\csv2.csv -ExportPath C:\Vehicles\Trucks.csv
I don't have excel installed so I can't help with the workbook creation.
@ECHO OFF
SETLOCAL
SET "sourcedir=U:\vehicles\automobiles"
SET "destdir=U:\vehicles"
FOR /f "usebackqdelims=" %%a IN ("%sourcedir%\file1.csv" "%sourcedir%\file2.csv") DO (
ECHO(%%a>"%destdir%\vehicles.csv"
ECHO(%%a>"%destdir%\vehicles_censored.csv"
GOTO donehdrs
)
:donehdrs
FOR /f "skip=1tokens=1-6*usebackqdelims=," %%a IN ("%sourcedir%\file1.csv" "%sourcedir%\file2.csv") DO (
ECHO(%%a,%%b,%%c,%%d,%%e,%%f,%%g>>"%destdir%\vehicles.csv"
IF "%%f" neq "Diesel" ECHO(%%a,%%b,%%c,%%d,%%e,%%f,%%g>>"%destdir%\vehicles_censored.csv"
)
GOTO :EOF
You don't really indicate whether you want the intermediate files.
Note that this will not work properly if any field before field 6 contains a "quoted string containing commas" nor if any field before field 6 is completely empty (ie. on lines of the form data1,data2,,data4,data5,data6,data7)
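The two failure cases are easier to see side by side. The Python sketch below is purely illustrative: for_f_tokens is only a rough imitation of how FOR /F splits a line (my approximation, not cmd's real parser), compared with a real CSV parser from the standard library.

```python
import csv


def for_f_tokens(line, delim=","):
    """Rough imitation of FOR /F splitting: runs of delimiters collapse, so empty fields vanish."""
    return [token for token in line.split(delim) if token]


def csv_fields(line):
    """Split one line with a real CSV parser: empty fields and quoted commas survive."""
    return next(csv.reader([line]))
```

For data1,data2,,data4,data5,data6,data7 the imitation puts data7 in the sixth position, while the CSV parser still has data6 there. For a,"b,c",d the imitation yields four pieces and the CSV parser three.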
Insufficient info to improve without examining a plethora of possibilities...
It might even be easier to simply use copy to merge the two files, then use an editor to remove the superfluous header line.
Batch files are inherently slow, and this is evident when large data files are processed. I think the method below is the fastest way to perform this process using a Batch file.
@echo off
setlocal EnableDelayedExpansion
cd C:\Vehicles
call :MergeFiles Automobiles > Automobiles.csv
call :MergeFiles Trucks > Trucks.csv
goto :EOF
:MergeFiles
rem Enter to the given sub-folder (Automobiles or Trucks)
pushd %1
rem Process the (two) CSV files existent here
set "keepHeader=1"
for %%f in (*.csv) do (
ECHO Processing file: "%%~Ff" > CON
call :ProcessFile < "%%f"
set "keepHeader="
)
popd
exit /B
:ProcessFile
set /P "line="
if defined keepHeader echo(!line!
SET LINES=0
rem Reset errorlevel
ver > NUL
:loop
set /P "line="
if errorlevel 1 exit /B
for /F "tokens=6 delims=," %%a in ("!line!") do (
if "%%a" neq "Diesel" echo(!line!
)
REM Optional part: show progress
SET /A "LINES+=1, MOD100=LINES%%100"
IF %MOD100% EQU 0 ECHO Lines processed: %LINES% > CON
goto loop
Note: this method fails if input lines are longer than 1023 characters, but according to your data (350 thousand rows in a file 200 MB in size) each row has approximately 600 bytes.
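If installing something is ever an option, the same merge-and-filter can be done row by row without a line-length limit. Below is a minimal Python sketch under that assumption; the function name, parameters and paths are hypothetical. It streams the files, so memory use stays small even at 200 MB per file, and it finds the column by its header name instead of its position.

```python
import csv


def merge_without_diesel(inputs, out_path, column="Fuel_Type", banned="Diesel"):
    """Stream `inputs` into `out_path`: one header, no rows where `column` equals `banned`."""
    kept = 0
    header_written = False
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in inputs:
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)
                if header is None:
                    continue  # completely empty file
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                col = header.index(column)  # raises ValueError if the column is missing
                for row in reader:
                    if not row:
                        continue  # skip blank lines
                    if len(row) > col and row[col] == banned:
                        continue  # drop the Diesel rows
                    writer.writerow(row)
                    kept += 1
    return kept
```

A call would look something like merge_without_diesel(["Automobiles/csv1.csv", "Automobiles/csv2.csv"], "Automobiles.csv"), with the real file names substituted. The comparison is case-sensitive, matching the batch versions above.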