Find Special Characters in multiple csv files with a batch file - csv

I have multiple CSV files that I need to search for special characters such as the exclamation mark (!). If the character is found, I want to delete the information between the commas, using a .bat file. The email address always seems to be where people make mistakes. Example: 233dd123dde3,Valid,boxer,Nov-13,Philip Smith,andrew!#myaxxus.net,16666

suggestion with sed for Windows:
sed -i.bak "s/[^,]*![^,]*//" *.csv

That seems like a drastic measure to completely drop the entire value if a single character exists, but it can be done.
Note that you must account for the fact that the first value does not have a leading comma, and the last value does not have a trailing comma.
This solution will not properly handle quoted values containing commas.
I'm using a hybrid JScript/batch utility called REPL.BAT that performs a regex search and replace on stdin and writes the result to stdout. It is pure script that works on any modern Windows from XP onward - no 3rd party executable required. Full documentation is embedded within the utility.
Assuming that REPL.BAT is in your current directory, or better yet, somewhere within your path:
@echo off
for %%F in (*.csv) do (
type "%%F" | repl "(^|,)[^,]*![^,]*(,|$)" "$1$2" >"%%F.new"
move /y "%%F.new" "%%F" >nul
)
EDIT
Now that I see Endoro's sed solution, I realize that the default greedy match means you don't have to explicitly match the commas. The following simpler regex works just as well:
@echo off
for %%F in (*.csv) do (
type "%%F" | repl "[^,]*![^,]*" "" >"%%F.new"
move /y "%%F.new" "%%F" >nul
)
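For comparison, here is a minimal Python sketch (not batch, and unrelated to REPL.BAT) of the same "blank the whole value" idea. It parses each line with the standard csv module, so quoted values containing commas are handled, which the regex solutions above do not do. The function name blank_fields_with is made up for this illustration, and the sketch assumes the files follow the csv module's default dialect.

```python
import csv
import io

def blank_fields_with(text, bad="!"):
    """Return CSV text in which every field containing `bad` is emptied."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        # Keep clean fields, replace offending ones with an empty string.
        writer.writerow(["" if bad in field else field for field in row])
    return out.getvalue()

sample = "233dd123dde3,Valid,boxer,Nov-13,Philip Smith,andrew!#myaxxus.net,16666\n"
print(blank_fields_with(sample), end="")
# prints 233dd123dde3,Valid,boxer,Nov-13,Philip Smith,,16666
```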

Related

How to replace commas with semicolons except for commas in Quotes?

I have a csv file with commas used to separate values. I want to replace commas with semicolons via batch, but leave the commas that are inside quotations.
So for example:
012,ABC,"DE,FG",345
must become:
012;ABC;"DE,FG";345
How can I do that via Batch?
If you happen to have the JREPL.BAT regular expression text processing utility (v7.9 or later), then you can use:
jrepl "," ";" /p "([\c\q]+)|\q.*?\q" /prepl "$1?{$0}:$0" /f "test.csv" /o -
Use call jrepl if you put the command within a batch script.
The original file will be overwritten. You can substitute a new file name for - if you don't want to overwrite the original.
JREPL.BAT is pure script (hybrid JScript/batch) that runs natively on any Windows machine from XP onward - no 3rd party .exe file required.
The JREPL solution works by performing the replacement in two steps.
1) The /P option breaks each line into unquoted strings and quoted strings. The /PREPL option passes unquoted strings on to the normal FIND/REPLACE, and quoted strings are preserved as is.
2) The main FIND/REPLACE substitutes ; for ,
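The two steps can be illustrated outside of JREPL with a short Python sketch. This is not how JREPL.BAT is implemented; it only mirrors the idea of splitting a line into quoted and unquoted pieces and running the replacement on the unquoted ones. The name commas_to_semicolons is hypothetical.

```python
import re

# A piece is either a quoted string or a run of non-quote characters.
PIECE = re.compile(r'"[^"]*"|[^"]+')

def commas_to_semicolons(line):
    """Replace commas with semicolons, leaving quoted strings untouched."""
    def convert(match):
        piece = match.group(0)
        # Step 1: quoted pieces are preserved as is.
        if piece.startswith('"'):
            return piece
        # Step 2: the plain find/replace runs on unquoted pieces only.
        return piece.replace(",", ";")
    return PIECE.sub(convert, line)

print(commas_to_semicolons('012,ABC,"DE,FG",345'))
# prints 012;ABC;"DE,FG";345
```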
It is possible to reliably accomplish this with pure batch using a variant of a technique developed by jeb at 'Pretty print' windows %PATH% variable - how to split on ';' in CMD shell. However, any pure batch solution will be significantly slower than hybrid solutions like JREPL.BAT, ParseCSV.bat, or a PowerShell solution.
Here is a batch script derived from jeb's technique - simply pass the name of the CSV file as the one and only argument. The original file will be overwritten. It should be trivial to modify the script to write the output to a new file instead. See jeb's post for an overview of how this seemingly magical technique works.
@echo off
setlocal disableDelayedExpansion
>"%~1.new" (
for /f usebackq^ delims^=^ eol^= %%A in ("%~1") do (
set "ln=%%A"
call :repl
)
)
move /y "%~1.new" "%~1" >nul
exit /b
:repl
set "ln=%ln:"=""%"
set "ln=%ln:^=^^%"
set "ln=%ln:&=^&%"
set "ln=%ln:|=^|%"
set "ln=%ln:<=^<%"
set "ln=%ln:>=^>%"
set "ln=%ln:,=^,^,%"
set ln=%ln:""="%
set "ln=%ln:"=""%"
set "ln=%ln:,,=;%"
set "ln=%ln:^,^,=,%"
set "ln=%ln:""="%"
setlocal enableDelayedExpansion
echo(!ln!
exit /b
The script should be able to process almost any valid CSV file input. The only restrictions are:
Empty lines are stripped from the output (should not be a problem with CSV)
Line lengths are limited to around 8 kb. The exact limit is dependent on how many intermediate substitutions must be performed.
Powershell is probably the better solution but you can use a neat hybrid batch file called ParseCSV.bat. It allows you to specify the input and output delimiters. The input delimiter uses a comma by default. So you only need to specify the output delimiter.
ParseCSV.bat /o:; <"file.csv" >"filenew.csv"
This possible alternative appears to work with the single line example you've provided:
@Echo Off
If Not Exist "file.csv" Exit/B
(For /F "Delims=" %%A In ('FindStr "^" "file.csv"') Do (Set "$="
For %%B In (%%A) Do Call Set "$=%%$%%;%%B"
Call Echo %%$:~1%%))>"filenew.csv"

how to remove a text before a character or string on a CSV file using batch

I have a CSV file that I want to modify using batch to remove a string. Basically, I have the following:
randomID1, randomID2, randomID3, networkinterface, othercolumn1, othercolumn2,
abc123AAB, 098189909, 999181818, net on Server123, FORCED, anotherthing,
abc2455aB, 848449388, 123131232, LocalNet on SEV1, FORCED, otherlessstuff,
The relevant strings are Server123 and SEV1, so I need to convert the above to:
randomID1, randomID2, randomID3, networkinterface, othercolumn1, othercolumn2,
abc123AAB, 098189909, 999181818, Server123, FORCED, anotherthing,
abc2455aB, 848449388, 123131232, SEV1, FORCED, otherlessstuff,
This means removing 'net on ' and 'LocalNet on ' strings.
How can I do this?
Batch language is far from ideal for this, but here's a basic script to simply line-by-line remove occurrences of "net on " and "LocalNet on " from input.txt and save the result as output.txt:
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
TYPE NUL > output.txt
FOR /F "delims=" %%L IN (input.txt) DO (
SET LINE=%%L
SET LINE=!LINE:LocalNet on =!
ECHO/!LINE:net on =!>> output.txt
)
Refinements are possible and may be needed. E.g., this won't work if the file contains reserved characters such as &. And it's not case sensitive. The latter is the reason the "LocalNet on " substitution is done before the "net on " substitution which is a substring when case insensitive. There's nothing CSV specific here because from your question that doesn't appear to be required. But if for example you needed to treat different comma-separated tokens differently, that can be done with a "delims=," option and some extra code.
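As a rough illustration of treating one comma-separated token differently from the rest, here is a small Python sketch (not batch). It touches only the fourth column and keeps whatever follows the first " on " in it. The function name strip_prefix_in_column and its parameters are hypothetical, and unlike the batch substitution above it is case sensitive and assumes no quoted fields containing commas.

```python
def strip_prefix_in_column(line, column=3, marker=" on "):
    """Drop everything up to and including `marker` in one column of a line."""
    fields = line.split(",")
    if column < len(fields):
        field = fields[column]
        if marker in field:
            # Preserve the leading whitespace that follows the comma.
            indent = field[: len(field) - len(field.lstrip())]
            fields[column] = indent + field.split(marker, 1)[1]
    return ",".join(fields)

print(strip_prefix_in_column(
    "abc123AAB, 098189909, 999181818, net on Server123, FORCED, anotherthing,"))
# prints abc123AAB, 098189909, 999181818, Server123, FORCED, anotherthing,
```

The header line has no " on " in its fourth column, so it passes through unchanged.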

Batch: Convert .csv to tab-delimited text, only some fields are quoted, contain commas between quotes (eBay order file)

I'm trying to convert the eBay File Exchange download into a tab-delimited format my shipping software can read.
If each and every column were quoted, this would be easy--but they're not. Only some columns (name, item listing title, etc) are quoted, and some quoted columns contain commas. The rest are bare of quotes.
I need a way to parse and convert this in a .bat file, but using comma as a delimiter splits the quoted fields if they contain a comma too, giving me unusable data. I'm certain there's a simple fix for this, I just can't figure it out.
Eric J is correct - solving this kind of problem with batch is not simple. But it is possible :-)
The main problem is how to differentiate between quoted and unquoted commas - jeb solved a similar problem with quoted vs. unquoted semicolons at 'Pretty print' windows %PATH% variable - how to split on ';' in CMD shell. The code below looks very different, but the fundamental concept is the same.
The code below should work for pretty much any CSV as long as all lines are less than ~8000 bytes long. Batch variable values are limited to 8191 bytes, and some characters are temporarily expanded to two bytes.
The code assumes there are not any existing TABs within the CSV file.
It does not modify any existing quotes.
As I say, the code should work, but it will be painfully SLOW if you have a large file. You would be much better off with a .NET solution as Eric J suggested.
@echo off
setlocal disableDelayedExpansion
set "file=optionalPathinfo\yourFile.csv"
:: Define a TAB variable
for /f "delims=" %%A in (
'forfiles /p "%~dp0." /m "%~nx0" /c "cmd /c echo(0x09"'
) do set "TAB=%%A"
:: Read each line from CSV, convert it, and write to new file with .new extension
>"%file%.new" (
for /f usebackq^ delims^=^ eol^= %%A in ("%file%") do (
set "line=%%A"
call :processLine
)
)
exit /b
:processLine
setlocal enableDelayedExpansion
:: Protect problem characters
set "line=!line:#=#A!"
set "line=!line:^=#K!"
set "line=!line:&=#M!"
set "line=!line:|=#P!"
set "line=!line:<=#L!"
set "line=!line:>=#G!"
:: Mark commas with leading caret (escape)
set "line=!line:,=^,!"
:: Remove mark from unquoted commas, but first temporarily
:: disable delayed expansion to protect any ! characters
setlocal disableDelayedExpansion
set ^"line=%line%"
setlocal enableDelayedExpansion
:: Protect remaining marked commas
set "line=!line:^,=#C!"
:: Convert remaining commas to TAB
set "line=!line:,=%TAB%!"
:: Restore protected characters
set "line=!line:#C=,!"
set "line=!line:#G=>!"
set "line=!line:#L=<!"
set "line=!line:#P=|!"
set "line=!line:#M=&!"
set "line=!line:#K=^!"
set "line=!line:#A=#!"
:: Write modified line
echo(!line!
exit /b
There's a further complication: A field with a quote and a comma will also have the quote escaped:
Jim "Smitty" Smith, Jr.
would be represented in the CSV file as
"Jim ""Smitty"" Smith, Jr."
This is not the kind of problem that is easily solved in a batch file. However, there is preexisting functionality to deal with the CSV format that can be used from any .NET compatible language including Powershell. If that is an option, have a look at
http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
For information on calling the .NET methods to read CSV files from Powershell, have a look at
http://blogs.msdn.com/b/mattbie/archive/2010/02/23/how-to-call-net-and-win32-methods-from-powershell-and-your-troubleshooting-packs.aspx
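To show what a CSV-aware parser does with the escaped-quote case, here is a minimal sketch in Python (not .NET or batch), using only the standard csv module. The function name csv_to_tab and the sample order line are invented for illustration. Unlike the batch script above, this sketch removes the enclosing quotes and unescapes doubled quotes, and it assumes no field contains a TAB or a line break.

```python
import csv
import io

def csv_to_tab(text):
    """Parse CSV text and return the same rows joined with TAB characters."""
    rows = csv.reader(io.StringIO(text))
    return "".join("\t".join(row) + "\n" for row in rows)

sample = '1001,"Jim ""Smitty"" Smith, Jr.",widget\n'
print(csv_to_tab(sample), end="")
# prints 1001<TAB>Jim "Smitty" Smith, Jr.<TAB>widget
```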

Combine CSV files in windows (.cmd or .bat file preferably)

I may at various times have .csv files I need to combine. They have the same headers and column layout. I just need a simple way to combine them in Windows 7. The user may not always have excel installed.
A .cmd macro would be great, but the ones I found online don't work.
The best I've got so far is this:
"open a command window ("cmd.exe") and type the following two lines (no brackets)
cd "Desktop\[csv-files]"
type *.csv > my-new-file.csv"
Where the files to be combined are in Desktop\[csv-files].
BUT - it seems to create duplicates (or in some cases triplicates) of the combined entries. For instance, I tested with 2 files that have 23 and 26 unique entries respectively. I got out a file with 100 entries and at least one entry repeated 3 times.
Right now the .csv files I am testing are only ~25 entries long, but in time they could be thousands or more.
Sounds like you have an issue with using *.csv and redirecting the output to a .csv file in the same folder. DOS seems to be finding the my-new-file.csv file because of the *.csv and is typing it into itself... You could use a different output filename extension until after the type command finishes, then you could rename the output file... Something like:
cd "Desktop\[csv-files]"
type *.csv > my-new-file.txt
ren my-new-file.txt my-new-file.csv
You can also skip the header of each file after the first, so that you don't end up with file headers throughout the middle of the output file. Try the following:
@echo off
setlocal ENABLEDELAYEDEXPANSION
set cnt=1
cd "Desktop\[csv-files]"
for %%i in (*.csv) do (
if !cnt!==1 (
for /f "delims=" %%j in ('type "%%i"') do echo %%j >> my-new-file.txt
) else (
for /f "skip=1 delims=" %%j in ('type "%%i"') do echo %%j >> my-new-file.txt
)
set /a cnt+=1
)
endlocal
ren my-new-file.txt my-new-file.csv
Explanation:
I used ENABLEDELAYEDEXPANSION to make sure the cnt variable is properly evaluated. When delayed expansion is enabled, you can reference variables with ! instead of %, so cnt is read as !cnt! instead of %cnt%. Delayed expansion waits to evaluate cnt until the moment the line executes. Without it, %cnt% inside a parenthesized block such as the for loop is expanded once, when the whole block is parsed, so every iteration would see the value cnt had before the loop started. With delayed expansion enabled, !cnt! always evaluates to the current value.
By setting cnt to 1, we can run different code for the 1st .csv file that is processed. The code includes all lines from the 1st .csv file, but skips the first line of all subsequent .csv files.
I used a nested for loop. The outer for cycles through all .csv files in the current folder. The inner for loop executes the type "%%i" command, where %%i is the name of the .csv file. Each line of the file is processed individually as %%j, which is passed to the echo %%j command. echo would normally print the value for %%j to the command prompt window. However, you can redirect the output to a file using > or >>. The > redirector overwrites the output file with the new value. The >> redirector appends the new value to the output file. Since each line of each file, and each file is being processed individually, we must use the >> redirector to push all content into a single file.
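When using the for /f command, each line of output is broken into tokens using the specified delimiters. The default delimiters are space and tab. If I didn't include "delims=", then the text This is fun would be split at the spaces, and because only the first token is returned by default, %%j would contain just This. With "tokens=1-3" added, the tokens would be assigned as follows: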
%%j = This
%%k = is
%%l = fun
We want to process the whole line from the .csv file all-at-once. By setting the delimiter to nothing ("delims="), the whole line can be processed using %%j.
For more specific help about how the for command works, type for /? at a command prompt.
endlocal reverts the environment to its state at the point where setlocal was used. Any variables you declared are removed, and extensions are set back to their prior value.
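For readers who can run Python rather than a .cmd file, the same logic (keep the first file whole, skip the header line of every later file, and never read the output file back in) can be sketched as follows. The function name combine_csv and the default output name are assumptions for this example, and every file is assumed to have exactly one header line.

```python
import glob
import os

def combine_csv(folder, output_name="my-new-file.csv"):
    """Concatenate all .csv files in `folder`, keeping only the first header."""
    output_path = os.path.join(folder, output_name)
    # Collect inputs first and exclude the output file itself, which is
    # the cause of the duplicated entries described in the question.
    inputs = sorted(
        path for path in glob.glob(os.path.join(folder, "*.csv"))
        if os.path.abspath(path) != os.path.abspath(output_path)
    )
    with open(output_path, "w", newline="") as out:
        for index, path in enumerate(inputs):
            with open(path, newline="") as src:
                lines = src.readlines()
            # Skip the header line of every file after the first.
            for line in (lines if index == 0 else lines[1:]):
                # Guard against a missing newline at the end of a file.
                out.write(line if line.endswith("\n") else line + "\n")
    return output_path
```

Calling combine_csv with the folder that holds the files writes my-new-file.csv into that folder and returns its path. Running it a second time does not duplicate rows, because the previous output is filtered out of the input list.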

Quick and dirty way to parse a mozilla firefox json file

I want to scrape all of the url's out of my .json bookmark backup that firefox creates and output a .txt file.
Here's a sample of one of the objects located in the file:
{"index":1,"title":"Bookmarks Toolbar","id":3,"parent":1,"dateAdded":1219177758531250,"lastModified":1288873459187000,"annos":[{"name":"bookmarkProperties/description","flags":0,"expires":4,"mimeType":null,"type":3,"value":"Add bookmarks to this folder to see them displayed on the Bookmarks Toolbar"}],"type":"text/x-moz-place-container","root":"toolbarFolder","children":[{"title":"","id":25,"parent":3,"dateAdded":1224693644437500,"lastModified":1236888979406250,"annos":[{"name":"placesInternal/GUID","flags":0,"expires":4,"mimeType":null,"type":3,"value":"{f6066e21-10ff-46a2-af7a-2891f8dca345}0"}],"type":"text/x-moz-place","uri":"http://www.google.com/"}
These objects are comma-separated and should all contain at least one member that contains a string whose value is the url of the bookmark.
Here's a sample of what the .txt file would have in it:
http://www.google.com
http://www.yahoo.com
http://www.etc.com
Ideally, I'm interested in seeing if this can be pulled off using any scripting tools available within a generic Windows XP "environment".
If Windows can't cut it, what would be the quickest & easiest solution to this?
Is there a website or program that can do pattern matching or regex to parse the file and do a search & replace, before I go and install something like ActivePerl or Strawberry Perl and write a script for it?
Another way I found is the method at the following site:
http://forums.mozillazine.org/viewtopic.php?f=38&t=1057265&sid=66d981cc79d1ff63644e0cdd5b665a37
Basically you do the following:
(1) Create a firefox bookmark with the following as the location:
javascript:(function(){var E=document.getElementsByTagName('PRE')[0],T=E.innerHTML,i=0,r1,r2;t=new Array();while(/("uri":"([^"]*)")/g.exec(T)){r1=RegExp.$1;r2=RegExp.$2;if(/^https?:/.exec(r2)){t[i++]='['+(i)+']:<a href='+r2+'>'+r2+'<\/a>';}}with(window.open().document){for(i=0;t[i];i++)write(t[i]+'<br>');close();}})();
(2) Open a blank firefox tab.
(3) drag your firefox json file into the blank tab, this should open the json file.
(4) goto your bookmark you created in step 1.
(5) you should have a list of "clickable urls" for all your bookmarks.
If you have Excel, it's probably easy to do a text to columns split
http://office.microsoft.com/en-us/excel-help/split-names-by-using-convert-text-to-columns-HA001149851.aspx
on ". Given the format (order of fields) is always the same, you should have the URLs somewhere near the last column.
I haven't tested this.
NOTE: Verify/correct all below file paths to match your system.
@Echo Off
Rem FFExportBookmarks.bat
SetLocal EnableDelayedExpansion
Set JSONFile="%APPDATA%\Mozilla\Firefox\Profiles\xyz42pdq.default\bookmarkbackups\Bookmarks.json"
Set FavOut="%USERPROFILE%\My Documents\FFBookmarks.txt"
Set JSONTemp="%Temp%\JSONTemp.txt"
Echo.> %JSONTemp%
Set JSONTemp1="%Temp%\JSONTemp1.txt"
Echo.> %JSONTemp1%
For /f "UseBackQ Delims=" %%N In (`Type %JSONFile%`) Do (
Set JSONInput=%%N
Rem Filter double " and other delimiters
Set JSONInput=!JSONInput:"=!
Set JSONInput=!JSONInput: =!
Set JSONInput=!JSONInput:^,= !
Set JSONInput=!JSONInput:[= !
Set JSONInput=!JSONInput:]= !
Set JSONInput=!JSONInput:{= !
Set JSONInput=!JSONInput:}= !
For %%K In (!JSONInput!) Do For /f "Tokens=1,2 Delims=:" %%X In ("%%K") Do (
If /i "%%X"=="uri" Echo %%Y >> %FavOut%
)
)
Start "" %FavOut%
It wasn't very quick, but it's plenty dirty!
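If installing a scripting language is acceptable, as the question contemplates, a real JSON parser avoids the string juggling entirely. Below is a minimal Python sketch, assuming the backup is a tree of objects whose bookmarks carry a "uri" member and whose folders carry a "children" list, as in the sample object from the question. The function name collect_uris is made up, and the sample string is a trimmed-down version of that object.

```python
import json

def collect_uris(node):
    """Recursively gather every "uri" value from a parsed bookmarks tree."""
    uris = []
    if isinstance(node, dict):
        if isinstance(node.get("uri"), str):
            uris.append(node["uri"])
        # Folders hold their bookmarks and subfolders under "children".
        for child in node.get("children") or []:
            uris.extend(collect_uris(child))
    elif isinstance(node, list):
        for item in node:
            uris.extend(collect_uris(item))
    return uris

sample = ('{"title":"Bookmarks Toolbar","children":'
          '[{"title":"","uri":"http://www.google.com/"}]}')
print("\n".join(collect_uris(json.loads(sample))))
# prints http://www.google.com/
```

For a real backup, the parsed input would come from json.load on the opened .json file, and the joined result would be written to the .txt file instead of printed.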