I have a csv file with commas used to separate values. I want to replace commas with semicolons via batch, but leave the commas that are inside quotations.
So for example:
012,ABC,"DE,FG",345
must become:
012;ABC;"DE,FG";345
How can I do that via Batch?
If you happen to have the JREPL.BAT regular expression text processing utility (v7.9 or later), then you can use:
jrepl "," ";" /p "([\c\q]+)|\q.*?\q" /prepl "$1?{$0}:$0" /f "test.csv" /o -
Use call jrepl if you put the command within a batch script.
The original file will be overwritten. You can substitute a new file name for - if you don't want to overwrite the original.
JREPL.BAT is pure script (hybrid JScript/batch) that runs natively on any Windows machine from XP onward - no 3rd party .exe file required.
The JREPL solution works by performing the replacement in two steps.
1) The /P option breaks each line into unquoted strings and quoted strings. The /PREPL option passes unquoted strings on to the normal FIND/REPLACE, and quoted strings are preserved as is.
2) The main FIND/REPLACE substitutes ; for ,
It is possible to reliably accomplish this with pure batch using a variant of a technique developed by jeb at 'Pretty print' windows %PATH% variable - how to split on ';' in CMD shell. However, any pure batch solution will be significantly slower than a hybrid solution like JREPL.BAT or ParseCSV.bat, or a PowerShell solution.
Here is a batch script derived from jeb's technique - simply pass the name of the CSV file as the one and only argument. The original file will be overwritten. It should be trivial to modify the script to write the output to a new file instead. See jeb's post for an overview of how this seemingly magical technique works.
@echo off
setlocal disableDelayedExpansion
>"%~1.new" (
for /f usebackq^ delims^=^ eol^= %%A in ("%~1") do (
set "ln=%%A"
call :repl
)
)
move /y "%~1.new" "%~1" >nul
exit /b
:repl
set "ln=%ln:"=""%"
set "ln=%ln:^=^^%"
set "ln=%ln:&=^&%"
set "ln=%ln:|=^|%"
set "ln=%ln:<=^<%"
set "ln=%ln:>=^>%"
set "ln=%ln:,=^,^,%"
set ln=%ln:""="%
set "ln=%ln:"=""%"
set "ln=%ln:,,=;%"
set "ln=%ln:^,^,=,%"
set "ln=%ln:""="%"
setlocal enableDelayedExpansion
echo(!ln!
exit /b
The script should be able to process almost any valid CSV file input. The only restrictions are:
Empty lines are stripped from the output (should not be a problem with CSV)
Line lengths are limited to around 8 kb. The exact limit is dependent on how many intermediate substitutions must be performed.
PowerShell is probably the better solution, but you can use a neat hybrid batch file called ParseCSV.bat. It allows you to specify the input and output delimiters. The input delimiter is a comma by default, so you only need to specify the output delimiter.
ParseCSV.bat /o:; <"file.csv" >"filenew.csv"
This possible alternative appears to work with the single line example you've provided:
@Echo Off
If Not Exist "file.csv" Exit/B
(For /F "Delims=" %%A In ('FindStr "^" "file.csv"') Do (Set "$="
For %%B In (%%A) Do Call Set "$=%%$%%;%%B"
Call Echo %%$:~1%%))>"filenew.csv"
I'm very new to .bat files and have excitedly created some to copy, move and rename documents.
Despite searching, I'm getting stuck with a more complex command, largely because the document I'm trying to modify is pipeline delimited rather than 'normal' csv...
My question: Can I, and if I can, how do I take an existing pipeline delimited csv that always has the same number of columns and add a column onto the end with today's date (DD/MM/YYYY) in it for every row?
$ awk -F, 'NF>1{$0 = "\"YYYY-MM-DD\"" FS $0}1' file
sed 'N;s/","/","YYYY-MM-DD HH:MM:SS","/5' file
I can't seem to get anything to even modify the document at the moment :-(
Your batch attempt isn't that bad:
@echo off
for /f "delims=" %%a in ('type "file.csv"') do (
>>"fileout.csv" echo.%%a|%time%
)
Just a few adjustments:
@echo off
(for /f "delims=" %%a in (file.csv) do (
echo(%%a^|%time%
))>"fileout.csv"
for /f is able to process the contents of a file directly (no need for type, although it works fine)
Redirecting (>>) inside the loop is slow, because the file is opened and closed every time you write to it (although it works). It's much faster to open and close the file only once, especially with large files.
echo. is not secure (although it works fine in most circumstances), best option is echo(.
The pipe, | is a special character and in this case needs escaping with the caret, ^.
Note: for /f skips empty lines, so they will not be in the new file. The same applies to lines that start with ; (the default EOL character).
Edit for "adding |Date to the Header":
@echo off
<file.csv set /p header=
>fileout.csv echo %header%^|DATE
(for /f "skip=1 delims=" %%a in (file.csv) do (
echo(%%a^|%date%
))>>"fileout.csv"
<file.csv set /p header= is a way to read just one (the first) line of a file to a variable. Write it back with the new data appended (or leave it unchanged - your comment isn't quite clear about that). Use skip=1 to skip the first line with further processing.
Don't forget to change ))>"fileout.csv" to ))>>"fileout.csv".
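One caveat with %date%: its layout follows the regional settings of the machine, so it is not guaranteed to come out as DD/MM/YYYY. A minimal sketch of one way around that, assuming powershell.exe is available on the PATH. The variable name today is made up for this example, and the file names are the ones used above.

```batch
@echo off
setlocal
rem Ask PowerShell for the date in a fixed layout. The backslashes make the
rem slashes literal, so the regional date separator is not substituted.
for /f "delims=" %%d in ('powershell -NoProfile -Command "Get-Date -Format dd\/MM\/yyyy"') do set "today=%%d"

rem Copy the header with a new column name, then append the date to each data row.
<"file.csv" set /p "header="
>"fileout.csv" echo(%header%^|DATE
(for /f "usebackq skip=1 delims=" %%a in ("file.csv") do (
  echo(%%a^|%today%
))>>"fileout.csv"
```

Starting PowerShell once costs a moment, but it happens before the loop, so it should not matter much even for large files.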
I'm trying to delete by batch or vbs text delimited by commas (CSV) that are always in the same position. It would not affect the first line, only lines 2 onwards.
Example text from file:
Code,Batch,File #,Reg Hours,O/T,Cost Number,Rate,Earnings,Earnings,Memo Code,Memo Amount,Earnings Code,Earnings Amount,Hours Code,Hours Amount,Earnings Code,Earnings Amount,Adjust Code,Adjust Amount
ABC,123,3980 ,78.52,,12331,10.00,,,,,,,,
ABC,123,4026 ,29.38,,12331,10.00,,,,,,,,
ABC,123,5065 ,64.46,,12331,10.00,,,,,,,,
ABC,123,5125 ,80.00, 0.54,12331,11.00,,,,,,,,
I would like to end up with text:
Code,Batch,File #,Reg Hours,O/T,Cost Number,Rate,Earnings,Earnings,Memo Code,Memo Amount,Earnings Code,Earnings Amount,Hours Code,Hours Amount,Earnings Code,Earnings Amount,Adjust Code,Adjust Amount
ABC,123,3980 ,78.52,,12331,,,,,,,,,
ABC,123,4026 ,29.38,,12331,,,,,,,,,
ABC,123,5065 ,64.46,,12331,,,,,,,,,
ABC,123,5125 ,80.00, 0.54,12331,,,,,,,,,
The only difference is the Rate area. It is the 7th separated value from the left, or 9th from the right. The first line remains intact.
Is there a way for the batch/vbs to determine the comma separated value position, delete the value or replace it with 'nothing', and ignore the first line?
For this example, we can assume the file will always be named file.csv, and located in D:\location - 'D:\location\file.csv'
Thank you!
@ECHO Off
SETLOCAL ENABLEDELAYEDEXPANSION
SET "sourcedir=U:\sourcedir"
SET "destdir=U:\destdir"
SET "filename1=%sourcedir%\q46534752.txt"
SET "outfile=%destdir%\outfile.txt"
:: Remove the output file
DEL "%outfile%" >NUL 2>nul
:: To reproduce the first line intact
FOR /f "usebackqdelims=" %%a IN ("%filename1%") DO >"%outfile%" ECHO %%a&GOTO hdrdone
:hdrdone
(
REM to process the header line, remove the "skip=1" from the "for...%%a" command
FOR /f "usebackqskip=1delims=" %%a IN ("%filename1%") DO (
REM step 1 - replace all commas with "|," to separate separators
SET "line=%%a"
SET "line=!line:,=|,!"
FOR /f "tokens=1-7*delims=|" %%A IN ("!line!") DO (
SET "line=%%A%%B%%C%%D%%E%%F%%H"
ECHO !line:^|=!
)
)
)>>"%outfile%"
GOTO :EOF
You would need to change the settings of sourcedir and destdir to suit your circumstances.
I used a file named q46534752.txt containing your data for my testing.
Produces the file defined as %outfile%
Processing of the header line is an issue. The code as presented should do as you ask, but it seems illogical to retain the column name in the resultant file when the process is intended to remove that column. To process the header line also, delete the first for line and remove the skip=1 (which skips the first line) from the second.
The fundamental issue is that batch treats a string of delimiters as a single delimiter, so it's necessary to separate those delimiters. This is not possible against a metavariable, but can be done within a loop by transferring the metavariable into an ordinary environment variable (line) and performing the string-replace ceremony on that ordinary variable in delayed expansion mode.
So - replace each , with |,, then process the resultant string using | as a delimiter. Note that the metavariable is in a different case for the second for - one of the few occasions where cmd is case-sensitive. Reconstruct the string, omitting column 7 (%%G) and using the * token meaning the eighth token (%%H) receives the remainder-of-line after the highest explicitly-mentioned token number (7) and echo it after removing remaining | characters.
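The delimiter behaviour described above is easy to see in isolation. This is only a small sketch with a made-up sample line (A,,C,D), not part of the script itself:

```batch
@echo off
setlocal EnableDelayedExpansion
set "line=A,,C,D"

rem Plain split on commas: the empty second field vanishes.
for /f "tokens=1-3 delims=," %%a in ("%line%") do echo plain:  [%%a] [%%b] [%%c]

rem Put a pipe in front of every comma, then split on the pipe instead.
set "line=!line:,=|,!"
for /f "tokens=1-3 delims=|" %%a in ("!line!") do echo padded: [%%a] [%%b] [%%c]
```

The first loop should report [A] [C] [D], because the two adjacent commas count as one delimiter. The second should report [A] [,] [,C]: every field now keeps its leading comma, so the empty field survives as a token of its own.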
Note that it is normal policy to refuse code-requests on SO, and only respond in a manner to fix faulty code. In this case however, succeeding browsers may find this response to be the key to doing a similar task and hence refrain from posting unnecessarily. Also, I'm bored witless.
I'm not a code guy and have spent whole day trying to get this done without success, hoping I can get some help from the experts.
I have a folder called Vehicles, within which are two sub-folders - Automobiles and Trucks. Each of sub-folders contain two CSV files which have identical (to that sub-folder) headers/structure.
What I'm trying to accomplish:
1) Take the two CSV files in Automobiles folder merge them without duplicating headers and name the merged file as Automobiles.csv
2) Delete all rows in Automobiles.csv where 6th column (header is Fuel_Type) is "Diesel" (without the quotes) then move the file from sub-folder to main Vehicles folder.
3) Take the two CSV files in Trucks folder merge them without duplicating headers and name merge file as Trucks.csv
4) For merged file in trucks folder remove all rows where 6th column (header is "Fuel_Type") is "Diesel" (without the quotes) then move the file from sub-folder to main Vehicles folder.
Obviously if someone can help with 1 and 2 I can manipulate it for 3 and 4.
BONUS POINTS :) take the Automobiles.csv and Trucks.csv files and create Vehicles.xls file with Automobiles and Trucks tabs.
A few details - the files are pretty large; each CSV can be up to 350 thousand rows x 150 columns and 200 MB in size. All the Batch scripts that I tried to put together for removing headers seemed to freeze with larger files.
Due to user permissions on work computers would strongly prefer to use something that is native to Windows7/8 and doesn't require additional software, but would consider other options if nothing native is available.
I am going to assume all csv files contain the same header information, in the same order. This is how I would do it in PowerShell v2:
Function Merge-Vehicles
{
param(
[string]$PathToCsv1,
[string]$PathToCsv2,
[string]$ExportPath
)
$regex = "^(?:d|D)(?:i|I)(?:e|E)(?:s|S)(?:e|E)(?:l|L)$"
$CSV = Import-Csv $PathToCsv1
$CSV += Import-Csv $PathToCsv2
$CSV | Where-Object { ( $_.'Fuel_Type' -notmatch "$regex") } | Export-Csv -Path $ExportPath -NoTypeInformation
}
First the function imports the csv files from the user-defined paths and combines the imported objects into an array. It then keeps only the objects whose 'Fuel_Type' property is not the string "Diesel". The objects that remain are exported as a csv to a user-defined path.
To call the function for Automobiles
Merge-Vehicles -PathToCsv1 C:\Vehicles\Automobiles\csv1.csv -PathToCsv2 C:\Vehicles\Automobiles\csv2.csv -ExportPath C:\Vehicles\Automobiles.csv
And for Trucks
Merge-Vehicles -PathToCsv1 C:\Vehicles\Trucks\csv1.csv -PathToCsv2 C:\Vehicles\Trucks\csv2.csv -ExportPath C:\Vehicles\Trucks.csv
I don't have excel installed so I can't help with the workbook creation.
@ECHO OFF
SETLOCAL
SET "sourcedir=U:\vehicles\automobiles"
SET "destdir=U:\vehicles"
FOR /f "usebackqdelims=" %%a IN ("%sourcedir%\file1.csv" "%sourcedir%\file2.csv") DO (
ECHO(%%a>"%destdir%\vehicles.csv"
ECHO(%%a>"%destdir%\vehicles_censored.csv"
GOTO donehdrs
)
:donehdrs
FOR /f "skip=1tokens=1-6*usebackqdelims=," %%a IN ("%sourcedir%\file1.csv" "%sourcedir%\file2.csv") DO (
ECHO(%%a,%%b,%%c,%%d,%%e,%%f,%%g>>"%destdir%\vehicles.csv"
IF "%%f" neq "Diesel" ECHO(%%a,%%b,%%c,%%d,%%e,%%f,%%g>>"%destdir%\vehicles_censored.csv"
)
GOTO :EOF
You don't really indicate whether you want the intermediate files.
Note that this will not work properly if any field before field 6 contains a "quoted string containing commas" nor if any field before field 6 is completely empty (ie. on lines of the form data1,data2,,data4,data5,data6,data7)
Insufficient info to improve without examining a plethora of possibilities...
It might even be easier to simply use copy to merge the two files, then use an editor to remove the superfluous header line.
Batch files are inherently slow, and this is evident when large data files are processed. I think the method below is the fastest way to perform this process using a Batch file.
@echo off
setlocal EnableDelayedExpansion
cd C:\Vehicles
call :MergeFiles Automobiles > Automobiles.csv
call :MergeFiles Trucks > Trucks.csv
goto :EOF
:MergeFiles
rem Enter to the given sub-folder (Automobiles or Trucks)
pushd %1
rem Process the (two) CSV files existent here
set "keepHeader=1"
for %%f in (*.csv) do (
ECHO Processing file: "%%~Ff" > CON
call :ProcessFile < "%%f"
set "keepHeader="
)
popd
exit /B
:ProcessFile
set /P "line="
if defined keepHeader echo(!line!
SET LINES=0
rem Reset errorlevel
ver > NUL
:loop
set /P "line="
if errorlevel 1 exit /B
for /F "tokens=6 delims=," %%a in ("!line!") do (
if "%%a" neq "Diesel" echo(!line!
)
REM Optional part: show progress
SET /A "LINES+=1, MOD100=LINES%%100"
IF %MOD100% EQU 0 ECHO Lines processed: %LINES% > CON
goto loop
Note: this method fails if input lines are longer than 1023 characters, but according to your data (350 thousand rows in a file 200 MB in size) each row is approximately 600 bytes.
I may at various times have .csv files I need to combine. They have the same headers and column layout. I just need a simple way to combine them in Windows 7. The user may not always have excel installed.
A .cmd macro would be great, but the ones I found online don't work.
The best I've got so far is this:
"open a command window ("cmd.exe") and type the following two lines (no brackets)
cd "Desktop\[csv-files]"
type *.csv > my-new-file.csv"
Where the files to be combined are in Desktop\[csv-files].
BUT - it seems to create duplicates (or in some cases triplicates) of the combined entries. For instance, I tested with 2 files that have 23 and 26 unique entries respectively. I got out a file with 100 entries and at least one entry repeated 3 times.
Right now the .csv files I am testing are only ~25 entries long, but in time they could be thousands or more.
Sounds like you have an issue with using *.csv and redirecting the output to a .csv file in the same folder. DOS seems to be finding the my-new-file.csv file because of the *.csv and is typing it into itself... You could use a different output filename extension until after the type command finishes, then you could rename the output file... Something like:
cd "Desktop\[csv-files]"
type *.csv > my-new-file.txt
ren my-new-file.txt my-new-file.csv
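If you would rather write straight to a .csv name, another option is to loop over the files and skip the output file by name. This is only a sketch under the same folder assumption as above. The variable name out is made up for the example.

```batch
@echo off
setlocal
cd "Desktop\[csv-files]"
set "out=my-new-file.csv"
rem The redirection creates the output file before the loop starts, so *.csv
rem will list it too. The name test below keeps it from being typed into itself.
(for %%i in (*.csv) do (
  if /i not "%%i"=="%out%" type "%%i"
))>"%out%"
```

Like the plain type version, this copies the header line of every input file.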
You can also skip the header of each file after the first, so that you don't end up with file headers throughout the middle of the output file. Try the following:
@echo off
setlocal ENABLEDELAYEDEXPANSION
set cnt=1
cd "Desktop\[csv-files]"
for %%i in (*.csv) do (
if !cnt!==1 (
for /f "delims=" %%j in ('type "%%i"') do echo %%j>> my-new-file.txt
) else (
for /f "skip=1 delims=" %%j in ('type "%%i"') do echo %%j>> my-new-file.txt
)
set /a cnt+=1
)
endlocal
ren my-new-file.txt my-new-file.csv
Explanation:
I used ENABLEDELAYEDEXPANSION to make sure the cnt variable is properly evaluated. When delayed expansion is enabled, you use ! to delimit variables instead of %. So to evaluate the cnt variable, you use !cnt! instead of %cnt%. Delayed expansion waits to evaluate cnt until the moment the command that uses it runs. Without it, %cnt% inside a parenthesized block (such as the body of the for loop) is expanded once, when the whole block is parsed, so every iteration sees the value cnt had before the loop started. If you enable delayed expansion and use !cnt!, it always evaluates to the current value.
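A tiny sketch that shows the two forms side by side. The loop values a, b and c are arbitrary placeholders.

```batch
@echo off
setlocal EnableDelayedExpansion
set cnt=1
for %%i in (a b c) do (
  echo %%i: percent=%cnt% delayed=!cnt!
  set /a cnt+=1
)
```

The percent form should print 1 on all three lines, because the whole parenthesized block is read before the loop runs, while the delayed form should print 1, 2 and 3.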
By setting cnt to 1, we can run different code for the 1st .csv file that is processed. The code includes all lines from the 1st .csv file, but skips the first line of all subsequent .csv files.
I used a nested for loop. The outer for cycles through all .csv files in the current folder. The inner for loop executes the type "%%i" command, where %%i is the name of the .csv file. Each line of the file is processed individually as %%j, which is passed to the echo %%j command. echo would normally print the value for %%j to the command prompt window. However, you can redirect the output to a file using > or >>. The > redirector overwrites the output file with the new value. The >> redirector appends the new value to the output file. Since each line of each file, and each file is being processed individually, we must use the >> redirector to push all content into a single file.
When using the for /f command, each line is broken into tokens using the specified delimiters. The default delimiters are space and tab, and by default only the first token is kept, so without "delims=" the text This is fun would yield just This in %%j. Adding "tokens=1-3" would break it into the following:
%%j = This
%%k = is
%%l = fun
We want to process the whole line from the .csv file all-at-once. By setting the delimiter to nothing ("delims="), the whole line can be processed using %%j.
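The effect of the options can be tried directly on a literal string. This is a minimal sketch, separate from the merge script:

```batch
@echo off
rem Default delimiters and no tokens option: only the first word is captured.
for /f %%j in ("This is fun") do echo default: %%j
rem Asking for three tokens fills j, k and l in turn.
for /f "tokens=1-3" %%j in ("This is fun") do echo tokens:  %%j + %%k + %%l
rem An empty delimiter set keeps the line in one piece.
for /f "delims=" %%j in ("This is fun") do echo whole:   %%j
```

The three lines should show This, then This + is + fun, then the complete This is fun.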
For more specific help about how the for command works, type for /? at a command prompt.
endlocal reverts the environment to its state at the point where setlocal was used. Any variables you declared are removed, and extensions are set back to their prior value.