How to replace commas with semicolons except for commas in Quotes? - csv

I have a csv file with commas used to separate values. I want to replace commas with semicolons via batch, but leave the commas that are inside quotations.
So for example:
012,ABC,"DE,FG",345
must become:
012;ABC;"DE,FG";345
How can I do that via Batch?

If you happen to have the JREPL.BAT regular expression text processing utility (v7.9 or later), then you can use:
jrepl "," ";" /p "([\c\q]+)|\q.*?\q" /prepl "$1?{$0}:$0" /f "test.csv" /o -
Use call jrepl if you put the command within a batch script.
The original file will be overwritten. You can substitute a new file name for - if you don't want to overwrite the original.
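Putting those two notes together, a call from inside a batch script that leaves the original file alone might look like the sketch below. The output name test_new.csv is purely illustrative, and the sketch assumes JREPL.BAT can be found via PATH; the options are the same ones used in the command above.

```batch
@echo off
rem Sketch only: assumes JREPL.BAT (v7.9 or later) is reachable via PATH.
rem "test_new.csv" is a hypothetical output name; "test.csv" is left untouched.
call jrepl "," ";" /p "([\c\q]+)|\q.*?\q" /prepl "$1?{$0}:$0" /f "test.csv" /o "test_new.csv"
```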
JREPL.BAT is pure script (hybrid JScript/batch) that runs natively on any Windows machine from XP onward - no 3rd party .exe file required.
The JREPL solution works by performing the replacement in two steps.
1) The /P option breaks each line into unquoted strings and quoted strings. The /PREPL option passes unquoted strings on to the normal FIND/REPLACE, and quoted strings are preserved as is.
2) The main FIND/REPLACE substitutes ; for ,
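The same idea, replacing commas only in the unquoted stretches of a line, can be illustrated in plain batch with a character-by-character loop. This is not how JREPL works internally; it is a deliberately slow sketch for a single hard-coded sample line. The variable names (in, out, inQuotes, q) are invented for the example, and the sample line is assumed to contain no ! characters.

```batch
@echo off
setlocal enableDelayedExpansion
rem Hard-coded sample line taken from the question
set "in=012,ABC,"DE,FG",345"
rem q holds a single quote character, used for comparisons below
set "q=""
set "out="
set "inQuotes=0"

:nextChar
if not defined in goto :done
rem Take the first remaining character and drop it from the input
set "ch=!in:~0,1!"
set "in=!in:~1!"
rem Every quote character toggles the inside/outside state
if "!ch!"=="!q!" set /a "inQuotes=1-inQuotes"
rem Only commas outside quotes become semicolons
if "!ch!"=="," if !inQuotes!==0 set "ch=;"
set "out=!out!!ch!"
goto :nextChar

:done
echo(!out!
```

For the sample line this should print 012;ABC;"DE,FG";345. A doubled quote inside a quoted field toggles the state twice, so it leaves the state unchanged.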
It is possible to reliably accomplish this with pure batch using a variant of a technique developed by jeb at 'Pretty print' windows %PATH% variable - how to split on ';' in CMD shell. However, any pure batch solution will be significantly slower than hybrid solutions like JREPL.BAT or ParseCSV.bat, or a PowerShell solution.
Here is a batch script derived from jeb's technique - simply pass the name of the CSV file as the one and only argument. The original file will be overwritten. It should be trivial to modify the script to write the output to a new file instead. See jeb's post for an overview of how this seemingly magical technique works.
@echo off
setlocal disableDelayedExpansion
>"%~1.new" (
for /f usebackq^ delims^=^ eol^= %%A in ("%~1") do (
set "ln=%%A"
call :repl
)
)
move /y "%~1.new" "%~1" >nul
exit /b
:repl
set "ln=%ln:"=""%"
set "ln=%ln:^=^^%"
set "ln=%ln:&=^&%"
set "ln=%ln:|=^|%"
set "ln=%ln:<=^<%"
set "ln=%ln:>=^>%"
set "ln=%ln:,=^,^,%"
set ln=%ln:""="%
set "ln=%ln:"=""%"
set "ln=%ln:,,=;%"
set "ln=%ln:^,^,=,%"
set "ln=%ln:""="%"
setlocal enableDelayedExpansion
echo(!ln!
exit /b
The script should be able to process almost any valid CSV file input. The only restrictions are:
Empty lines are stripped from the output (should not be a problem with CSV)
Line lengths are limited to around 8 kb. The exact limit is dependent on how many intermediate substitutions must be performed.

Powershell is probably the better solution but you can use a neat hybrid batch file called ParseCSV.bat. It allows you to specify the input and output delimiters. The input delimiter uses a comma by default. So you only need to specify the output delimiter.
ParseCSV.bat /o:; <"file.csv" >"filenew.csv"

This possible alternative appears to work with the single line example you've provided:
@Echo Off
If Not Exist "file.csv" Exit/B
(For /F "Delims=" %%A In ('FindStr "^" "file.csv"') Do (Set "$="
For %%B In (%%A) Do Call Set "$=%%$%%;%%B"
Call Echo %%$:~1%%))>"filenew.csv"

Related

.bat file to automatically add column to pipeline csv and populate with today's date

I'm very new to .bat files and have excitedly created some to copy, move and rename documents.
Despite searching, I'm getting stuck with a more complex command, largely because the document I'm trying to modify is pipeline delimited rather than 'normal' csv...
My question: can I, and if so how do I, take an existing pipeline delimited csv that always has the same number of columns and add a column onto the end with today's date (DD/MM/YYYY) in it for every row?
$ awk -F, 'NF>1{$0 = "\"YYYY-MM-DD\"" FS $0}1' file
sed 'N;s/","/","YYYY-MM-DD HH:MM:SS","/5' file
I can't seem to get anything to even modify the document at the moment :-(
Your batch attempt isn't that bad:
@echo off
for /f "delims=" %%a in ('type "file.csv"') do (
>>"fileout.csv" echo.%%a|%time%
)
Just a few adjustments:
@echo off
(for /f "delims=" %%a in (file.csv) do (
echo(%%a^|%time%
))>"fileout.csv"
for /f is able to process the contents of a file directly (no need for type, although it works fine)
Redirecting (>>) inside the loop is slow, because the file is opened and closed every time you write to it (although it works). It's much faster to open and close the file only once (especially with large files).
echo. is not secure (although it works fine in most circumstances), best option is echo(.
The pipe, | is a special character and in this case needs escaping with the caret, ^.
Note: for /f skips empty lines, so they will not be in the new file. The same goes for lines that start with ; (the default EOL character).
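If the skipped lines matter, one common workaround is to let findstr /n number every line, so that no line is empty or starts with ;, and then strip the number again. The following is only a sketch under that approach. It reuses the file names from above, leaves empty lines empty instead of appending the column to them, and uses %date%, whose format depends on the regional settings.

```batch
@echo off
setlocal disableDelayedExpansion
rem Sketch: findstr /n prefixes every line with "N:", so for /f no longer
rem drops empty lines or lines starting with ; and the prefix is removed
rem again with the *: substitution. Delayed expansion is toggled per line
rem so that ! characters in the data survive.
(for /f "delims=" %%a in ('findstr /n "^" "file.csv"') do (
  set "ln=%%a"
  setlocal enableDelayedExpansion
  set "ln=!ln:*:=!"
  if defined ln set "ln=!ln!|%date%"
  echo(!ln!
  endlocal
))>"fileout.csv"
```

The price is an extra findstr process and a setlocal/endlocal pair per line, so this variant will likely be slower than the simple loop.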
Edit for "adding |Date to the Header":
@echo off
<file.csv set /p header=
>fileout.csv echo %header%^|DATE
(for /f "skip=1 delims=" %%a in (file.csv) do (
echo(%%a^|%date%
))>>"fileout.csv"
<file.csv set /p header= is a way to read just one (the first) line of a file to a variable. Write it back with the new data appended (or leave it unchanged - your comment isn't quite clear about that). Use skip=1 to skip the first line with further processing.
Don't forget to change ))>"fileout.csv" to ))>>"fileout.csv".

how to remove a text before a character or string on a CSV file using batch

I have a CSV file that I want to modify using batch to remove a string. Basically, I have the following:
randomID1, randomID2, randomID3, networkinterface, othercolumn1, othercolumn2,
abc123AAB, 098189909, 999181818, net on Server123, FORCED, anotherthing,
abc2455aB, 848449388, 123131232, LocalNet on SEV1, FORCED, otherlessstuff,
My relevant characters are Server123 and SEV1, so I need to convert the above to
randomID1, randomID2, randomID3, networkinterface, othercolumn1, othercolumn2,
abc123AAB, 098189909, 999181818, Server123, FORCED, anotherthing,
abc2455aB, 848449388, 123131232, SEV1, FORCED, otherlessstuff,
This means removing 'net on ' and 'LocalNet on ' strings.
How can I do this?
Batch language is far from ideal for this, but here's a basic script to simply line-by-line remove occurrences of "net on " and "LocalNet on " from input.txt and save the result as output.txt:
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
TYPE NUL > output.txt
FOR /F "delims=" %%L IN (input.txt) DO (
SET LINE=%%L
SET LINE=!LINE:LocalNet on =!
ECHO/!LINE:net on =!>> output.txt
)
Refinements are possible and may be needed. E.g., this won't work if the file contains reserved characters such as &. And it's not case sensitive. The latter is the reason the "LocalNet on " substitution is done before the "net on " substitution which is a substring when case insensitive. There's nothing CSV specific here because from your question that doesn't appear to be required. But if for example you needed to treat different comma-separated tokens differently, that can be done with a "delims=," option and some extra code.
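As a rough sketch of that last point, the variant below splits each line on commas and edits only the fourth field, removing everything up to and including the first " on ". The variable names col and new are invented for the example. It assumes every line has at least four fields, that no field is empty (for /f collapses consecutive delimiters), and that the data contains no quoted commas and no special characters such as & or !.

```batch
@echo off
setlocal enableDelayedExpansion
rem Sketch: tokens 1-4 go to %%A..%%D, the untouched rest of the line to %%E.
rem Only the 4th field is edited; all other fields are written back as read.
>"output.txt" (
  for /f "usebackq tokens=1-4* delims=," %%A in ("input.txt") do (
    set "col=%%D"
    set "new=!col:* on =!"
    if not "!new!"=="!col!" set "new= !new!"
    echo(%%A,%%B,%%C,!new!,%%E
  )
)
```

The if line restores the leading space that the substitution removes, so a header field such as " networkinterface", which contains no " on ", passes through unchanged.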

Batch: Convert .csv to tab-delimited text, only some fields are quoted, contain commas between quotes (eBay order file)

I'm trying to convert the eBay File Exchange download into a tab-delimited format my shipping software can read.
If each and every column were quoted, this would be easy--but they're not. Only some columns (name, item listing title, etc) are quoted, and some quoted columns contain commas. The rest are bare of quotes.
I need a way to parse and convert this in a .bat file, but using comma as a delimiter splits the quoted fields if they contain a comma too, giving me unusable data. I'm certain there's a simple fix for this, I just can't figure it out.
Eric J is correct - solving this kind of problem with batch is not simple. But it is possible :-)
The main problem is how to differentiate between quoted and unquoted commas - jeb solved a similar problem with quoted vs. unquoted semicolons at 'Pretty print' windows %PATH% variable - how to split on ';' in CMD shell. The code below looks very different, but the fundamental concept is the same.
The code below should work for pretty much any CSV as long as all lines are less than ~8000 bytes long. Batch variable values are limited to 8191 bytes, and some characters are temporarily expanded to two bytes.
The code assumes there are not any existing TABs within the CSV file.
It does not modify any existing quotes.
As I say, the code should work, but it will be painfully SLOW if you have a large file. You would be much better off with a .NET solution as Eric J suggested.
@echo off
setlocal disableDelayedExpansion
set "file=optionalPathinfo\yourFile.csv"
:: Define a TAB variable
for /f "delims=" %%A in (
'forfiles /p "%~dp0." /m "%~nx0" /c "cmd /c echo(0x09"'
) do set "TAB=%%A"
:: Read each line from CSV, convert it, and write to new file with .new extension
>"%file%.new" (
for /f usebackq^ delims^=^ eol^= %%A in ("%file%") do (
set "line=%%A"
call :processLine
)
)
exit /b
:processLine
setlocal enableDelayedExpansion
:: Protect problem characters
set "line=!line:#=#A!"
set "line=!line:^=#K!"
set "line=!line:&=#M!"
set "line=!line:|=#P!"
set "line=!line:<=#L!"
set "line=!line:>=#G!"
:: Mark commas with leading caret (escape)
set "line=!line:,=^,!"
:: Remove mark from unquoted commas, but first temporarily
:: disable delayed expansion to protect any ! characters
setlocal disableDelayedExpansion
set ^"line=%line%"
setlocal enableDelayedExpansion
:: Protect remaining marked commas
set "line=!line:^,=#C!"
:: Convert remaining commas to TAB
set "line=!line:,=%TAB%!"
:: Restore protected characters
set "line=!line:#C=,!"
set "line=!line:#G=>!"
set "line=!line:#L=<!"
set "line=!line:#P=|!"
set "line=!line:#M=&!"
set "line=!line:#K=^!"
set "line=!line:#A=#!"
:: Write modified line
echo(!line!
exit /b
There's a further complication: A field with a quote and a comma will also have the quote escaped:
Jim "Smitty" Smith, Jr.
would be represented in the CSV file as
"Jim ""Smitty"" Smith, Jr."
This is not the kind of problem that is easily solved in a batch file. However, there is preexisting functionality to deal with the CSV format that can be used from any .NET compatible language including Powershell. If that is an option, have a look at
http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
For information on calling the .NET methods to read CSV files from Powershell, have a look at
http://blogs.msdn.com/b/mattbie/archive/2010/02/23/how-to-call-net-and-win32-methods-from-powershell-and-your-troubleshooting-packs.aspx

Find Special Characters in multiple csv files with a batch file

I have multiple csv files that I need to search for special characters like the exclamation mark (!). If the character is found, the information between the commas should be deleted, using a .bat file. The email address always seems to be where people screw up. Example: 233dd123dde3,Valid,boxer,Nov-13,Philip Smith,andrew!@myaxxus.net,16666
suggestion with sed for Windows:
sed -i.bak "s/[^,]*![^,]*//" *.csv
That seems like a drastic measure to completely drop the entire value if a single character exists, but it can be done.
Note that you must account for the fact that the first value does not have a leading comma, and the last value does not have a trailing comma.
This solution will not properly handle quoted values containing commas.
I'm using a hybrid JScript/batch utility called REPL.BAT that performs a regex search and replace on stdin and writes the result to stdout. It is pure script that works on any modern Windows from XP onward - no 3rd party executable required. Full documentation is embedded within the utility.
Assuming that REPL.BAT is in your current directory, or better yet, somewhere within your path:
@echo off
for %%F in (*.csv) do (
type "%%F" | repl "(^|,)[^,]*![^,]*(,|$)" "$1$2" >"%%F.new"
move /y "%%F.new" "%%F" >nul
)
EDIT
Now that I see Endoro's sed solution, I realize that the default greedy match means you don't have to explicitly match the commas. The following simpler regex works just as well:
@echo off
for %%F in (*.csv) do (
type "%%F" | repl "[^,]*![^,]*" "" >"%%F.new"
move /y "%%F.new" "%%F" >nul
)

Creating a batch/visual basic script to put a random quote into an html file

At work our end-users are on Windows XP and using Outlook Express. Whenever a user composes an email or replies to one, Outlook Express "reads" a static html file located on c:\, and uses the content as a signature. This works perfectly fine.
Now my coworker gave me a simple text(.txt) file with 100+ lines, each line containing a "motivational quote".
My objective is to somehow have a random quote extracted from this text file, and inserted into the static html-signature file.
Since I am limited to what XP natively supports and can't install any additional software such as python, I assume either batch or vbscript would be the proper choice (if not only). I imagine a script which is executed via. the Windows Task Scheduler every 15 minutes or so, which randomly reads a line from the .txt-file, and updates it into the static html-signature file.
Is this possible in any way, or are neither batch nor vbscript capable of doing something like that?
Any help or advice will be GREATLY appreciated :)
You can create a signature template that has embedded variables that are replaced by delayed expansion. Any exclamation point ! or caret ^ literals must be encoded as variables as well:
!QUOTE! = The random quote
!X! = exclamation point literal
!C! = caret literal (probably not needed)
Additional variables could be added to the template as needed.
Here is a trivial HTML template as an example
<!X!doctype html>
<html>
<head>
<title>Random Quote</title>
</head>
<body>
<p><strong>!QUOTE!</strong></p>
</body>
</html>
The following batch file will select a random quote from the quote file and write out the signature file after replacing the variables in the template.
EDIT - I improved performance and slightly altered the limitations by using FOR /F to read the quote line instead of SET /P.
@echo off
setlocal disableDelayedExpansion
::Define the files
set quoteFile="quotes.txt"
set signatureTemplate="template.txt"
set signatureFile="signature.html"
::Define constants for ! and ^ substitutions in template
set "X=!"
set "C=^"
::Count the number of quotes
for /f %%N in ('find /c /v "" ^<%quoteFile%') do set quoteCount=%%N
::Pick a random number of quotes to skip
set /a "skip=%random% %% %quoteCount%"
::Load the selected quote into a variable
if %skip% gtr 0 (set skip=skip=%skip%) else (set skip=)
for /f "usebackq %skip% delims=" %%A in (%quoteFile%) do (
set quote=%%A
goto :break
)
:break
::Read the signature template and write the signature file
::Delayed expansion will automatically replace !quote!, !X! and !C!
setlocal enableDelayedExpansion
>%signatureFile% (
for /f "usebackq delims=" %%A in (%signatureTemplate%) do echo %%A
)
There are a few limitations to the script as written:
Template lines that are blank or begin with ; will be skipped
The quotes file must not have any blank lines or lines that start with ;
Here's a batch script that will get a random line from a file in one pass and then print it to the console and write it to a file.
So where I have echo !LINE! is where you'd write your HTML file. It's actually kind of painful in batch because >, <, %, ^, !, and other characters are special and need to be escaped, most of them with ^ in front (% is doubled as %% instead).
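@echo off
REM OUT_FILE is used below but was never defined; this file name is an assumption.
SET OUT_FILE=new_signature_file.html
SET /A LINE_NUM=0
SET LINE=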
setlocal EnableDelayedExpansion
for /F "delims=" %%l in (random_lines.txt) do (
call:rand 0 !LINE_NUM!
IF !RAND_NUM! LSS 1 (
SET LINE=%%l
)
SET /A LINE_NUM=!LINE_NUM! + 1
)
echo !LINE!
echo ^<^^!doctype html^>^<html^>^<head^>^<title^>Random Quote^</title^>^</head^> > %OUT_FILE%
echo ^<body^>^<p^>^<strong^>!LINE!^</strong^>^</p^> >> %OUT_FILE%
echo ^</body^>^</html^> >> %OUT_FILE%
goto :EOF
REM rand()
REM Input: %1 is min, %2 is max
REM Output: RAND_NUM is set to a random number from min through max.
:rand
SET /A RAND_NUM=%RANDOM% * (%2 - %1 + 1) / 32768 + %1
goto :EOF
Alternatively, and probably better, instead of putting the HTML inside the batch file, you can keep it in a separate file in two pieces. The glue that joins the two pieces and makes a complete HTML file is the line you picked. For example, I can create a file called sig_file_header.txt that contains this:
<!doctype html>
<html>
<head>
<title>Random Quote</title>
</head>
<body>
<p><strong>
Then I can create a file called sig_file_footer.txt with this:
</strong></p>
</body>
</html>
Notice that when I put these files together, header followed by footer, I get a full HTML document. So when I put them together, I can cram the line the script picked in there and get a full HTML document with the line in it.
Doing that is easy. Replace the 4 lines starting with echo !LINE! in the script above with the following 3:
type sig_file_header.txt > new_signature_file.html
echo !LINE! >> new_signature_file.html
type sig_file_footer.txt >> new_signature_file.html
You'll need to:
Get the count of lines in the file.
Get a random number n between 1 and the number of lines.
Get the nth line from the file.
That should be reasonably simple to accomplish with a batch script. Unfortunately, I don't know any Windows batch script, so I can't provide any more advice.