Deleting/replacing characters delimited by commas - csv

I'm trying to delete by batch or vbs text delimited by commas (CSV) that are always in the same position. It would not affect the first line, only lines 2 onwards.
Example text from file:
Code,Batch,File #,Reg Hours,O/T,Cost Number,Rate,Earnings,Earnings,Memo Code,Memo Amount,Earnings Code,Earnings Amount,Hours Code,Hours Amount,Earnings Code,Earnings Amount,Adjust Code,Adjust Amount
ABC,123,3980 ,78.52,,12331,10.00,,,,,,,,
ABC,123,4026 ,29.38,,12331,10.00,,,,,,,,
ABC,123,5065 ,64.46,,12331,10.00,,,,,,,,
ABC,123,5125 ,80.00, 0.54,12331,11.00,,,,,,,,
I would like to end up with text:
Code,Batch,File #,Reg Hours,O/T,Cost Number,Rate,Earnings,Earnings,Memo Code,Memo Amount,Earnings Code,Earnings Amount,Hours Code,Hours Amount,Earnings Code,Earnings Amount,Adjust Code,Adjust Amount
ABC,123,3980 ,78.52,,12331,,,,,,,,,
ABC,123,4026 ,29.38,,12331,,,,,,,,,
ABC,123,5065 ,64.46,,12331,,,,,,,,,
ABC,123,5125 ,80.00, 0.54,12331,,,,,,,,,
The only difference is the Rate area. It is the 7th separated value from the left, or 9th from the right. The first line remains intact.
Is there a way for the batch/vbs to determine the comma separated value position, delete the value or replace it with 'nothing', and ignore the first line?
For this example, we can assume the file will always be named file.csv, and located in D:\location - 'D:\location\file.csv'
Thank you!

@ECHO Off
SETLOCAL ENABLEDELAYEDEXPANSION
SET "sourcedir=U:\sourcedir"
SET "destdir=U:\destdir"
SET "filename1=%sourcedir%\q46534752.txt"
SET "outfile=%destdir%\outfile.txt"
:: Remove the output file
DEL "%outfile%" >NUL 2>nul
:: To reproduce the first line intact
FOR /f "usebackqdelims=" %%a IN ("%filename1%") DO >"%outfile%" ECHO %%a&GOTO hdrdone
:hdrdone
(
REM to process the header line, remove the "skip=1" from the "for...%%a" command
FOR /f "usebackqskip=1delims=" %%a IN ("%filename1%") DO (
REM step 1 - replace all commas with "|," to separate separators
SET "line=%%a"
SET "line=!line:,=|,!"
FOR /f "tokens=1-7*delims=|" %%A IN ("!line!") DO (
SET "line=%%A%%B%%C%%D%%E%%F%%H"
ECHO !line:^|=!
)
)
)>>"%outfile%"
GOTO :EOF
You would need to change the settings of sourcedir and destdir to suit your circumstances.
I used a file named q46534752.txt containing your data for my testing.
Produces the file defined as %outfile%
Processing of the header line is an issue. The code as presented should do as you ask, but it seems illogical to retain the column name in the resultant file when the process is intended to remove that column. To process the header line also, delete the first for line and remove the skip=1 (which skips the first line) from the second.
The fundamental issue is that batch treats a string of delimiters as a single delimiter, so it's necessary to separate those delimiters. This is not possible against a metavariable, but can be done within a loop by transferring the metavariable into an ordinary environment variable (line) and performing the string-replace ceremony on that ordinary variable in delayed expansion mode.
So - replace each , with |,, then process the resultant string using | as a delimiter. Note that the metavariable is in a different case for the second for - one of the few occasions where cmd is case-sensitive. Reconstruct the string, omitting column 7 (%%G) and using the * token meaning the eighth token (%%H) receives the remainder-of-line after the highest explicitly-mentioned token number (7) and echo it after removing remaining | characters.
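As a minimal sketch of the delimiter behaviour described above (the sample string a,,c,d is made up for illustration), the following can be saved as a batch file and run. The empty second field should disappear, so the three tokens shown are a, c and d:

```batch
@ECHO OFF
SETLOCAL
REM Consecutive commas are treated as ONE delimiter by FOR /F,
REM so the empty field between "a" and "c" is not counted as a token.
FOR /f "tokens=1-3 delims=," %%A IN ("a,,c,d") DO ECHO [%%A] [%%B] [%%C]
```

This is why the commas have to be separated by another character before the line is tokenised.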
Note that it is normal policy to refuse code-requests on SO, and only respond in a manner to fix faulty code. In this case however, succeeding browsers may find this response to be the key to doing a similar task and hence refrain from posting unnecessarily. Also, I'm bored witless.

Related

Batch Script - Delete Columns in csv

I need a batch script that will remove all columns in a csv except columns 1, 2 and 5.
My Code:
(for /f "tokens=1,2,5 delims=;" %%i in (Input.csv) do echo %%i,%%j,%%k) > Output.csv
Input CSV
1;2;3;4;5;6;7;8;9;10
10160;"Some Name";"Something:0.8";;5;;;;;XY
Expected Output:
1;2;5
10160;"Some Name";5
Real Output
1,2,5
10160,"Some Name",XY
Does anyone have any idea why it keeps the tenth column in the second line instead of the fifth?
SETLOCAL ENABLEDELAYEDEXPANSION
(FOR /f "delims=" %%b IN (Input.csv) DO SET "line=%%b"&SET "line=!line:;;=; ;!"&for /f "tokens=1,2,5 delims=;" %%i in ("!line:;;=; ;!") do echo %%i,%%j,%%k)
The problem is that a sequence of delimiters is considered as a single delimiter, so you need to change each delimiter pair so that it contains a string, and repeat the operation for any remaining delimiter-pairs.
Obviously, you would need to take action to take care of a reported field that now contains a single space, and this will alter any quoted field that contains ;;
Note also that any data containing ! or % is likely to be corrupted and certain other symbols (such as &) may also yield unexpected results. If the data is restricted to alphamerics, spaces, commas, etc. it should be fine.
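To see why the replacement has to be repeated, here is a small sketch using a made-up line a;;;b that contains two adjacent empty fields. A single pass leaves one ;; pair behind, because the replacement does not re-scan text it has just produced; the second pass catches it:

```batch
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
REM Hypothetical sample line with two adjacent empty fields
SET "line=a;;;b"
REM First pass: the leading pair of semicolons is split, one pair remains
SET "line=!line:;;=; ;!"
ECHO after pass 1: !line!
REM Second pass: the remaining pair is split as well
SET "line=!line:;;=; ;!"
ECHO after pass 2: !line!
```

After the second pass every field contains at least a space, so FOR /F counts the columns as expected.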

How to replace commas with semicolons except for commas in Quotes?

I have a csv file with commas used to separate values. I want to replace commas with semicolons via batch, but leave the commas that are inside quotations.
So for example:
012,ABC,"DE,FG",345
must become:
012;ABC;"DE,FG";345
How can I do that via Batch?
If you happen to have the JREPL.BAT regular expression text processing utility (v7.9 or later), then you can use:
jrepl "," ";" /p "([\c\q]+)|\q.*?\q" /prepl "$1?{$0}:$0" /f "test.csv" /o -
Use call jrepl if you put the command within a batch script.
The original file will be overwritten. You can substitute a new file name for - if you don't want to overwrite the original.
JREPL.BAT is pure script (hybrid JScript/batch) that runs natively on any Windows machine from XP onward - no 3rd party .exe file required.
The JREPL solution works by performing the replacement in two steps.
1) The /P option breaks each line into unquoted strings and quoted strings. The /PREPL option passes unquoted strings on to the normal FIND/REPLACE, and quoted strings are preserved as is.
2) The main FIND/REPLACE substitutes ; for ,
It is possible to reliably accomplish this with pure batch using a variant of a technique developed by jeb at 'Pretty print' windows %PATH% variable - how to split on ';' in CMD shell, although any pure batch solution will be significantly slower than hybrid solutions like JREPL.BAT, ParseCSV.bat, or a powershell solution.
Here is a batch script derived from jeb's technique - simply pass the name of the CSV file as the one and only argument. The original file will be overwritten. It should be trivial to modify the script to write the output to a new file instead. See jeb's post for an overview of how this seemingly magical technique works.
@echo off
setlocal disableDelayedExpansion
>"%~1.new" (
for /f usebackq^ delims^=^ eol^= %%A in ("%~1") do (
set "ln=%%A"
call :repl
)
)
move /y "%~1.new" "%~1" >nul
exit /b
:repl
set "ln=%ln:"=""%"
set "ln=%ln:^=^^%"
set "ln=%ln:&=^&%"
set "ln=%ln:|=^|%"
set "ln=%ln:<=^<%"
set "ln=%ln:>=^>%"
set "ln=%ln:,=^,^,%"
set ln=%ln:""="%
set "ln=%ln:"=""%"
set "ln=%ln:,,=;%"
set "ln=%ln:^,^,=,%"
set "ln=%ln:""="%"
setlocal enableDelayedExpansion
echo(!ln!
exit /b
The script should be able to process almost any valid CSV file input. The only restrictions are:
Empty lines are stripped from the output (should not be a problem with CSV)
Line lengths are limited to around 8 kb. The exact limit is dependent on how many intermediate substitutions must be performed.
Powershell is probably the better solution but you can use a neat hybrid batch file called ParseCSV.bat. It allows you to specify the input and output delimiters. The input delimiter uses a comma by default. So you only need to specify the output delimiter.
ParseCSV.bat /o:; <"file.csv" >"filenew.csv"
This possible alternative appears to work with the single line example you've provided:
@Echo Off
If Not Exist "file.csv" Exit/B
(For /F "Delims=" %%A In ('FindStr "^" "file.csv"') Do (Set "$="
For %%B In (%%A) Do Call Set "$=%%$%%;%%B"
Call Echo %%$:~1%%))>"filenew.csv"

Batch File that analyze and present data from csv files

I want to create a .bat file that will present the last row of every .csv file whose file name starts with "Togo".
The batch file will be located in the same folder as the .csv files.
The output should be:
[File Name]
[Last Row Data]
This batch file should always run and test the .csv files every 5 minutes.
SO is not a free code-writing service. Your question is likely to be deleted or closed since you have not shown any attempt to solve your problem.
That having been said, it's difficult to start in batch, so here's a solution.
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET "sourcedir=U:\sourcedir\t w o"
FOR /f "delims=" %%a IN (
'dir /b /a-d "%sourcedir%\togo*.csv" '
) DO (
FOR /f "usebackqdelims=" %%q IN ("%sourcedir%\%%a") DO SET "line=%%q"
ECHO %%a !line!
)
GOTO :EOF
The first two lines turn off batch's debugging display (show-the-command, then execute it) and invoke a mode where access to variables that have changed within a "code block" (parenthesised series of commands) is enabled (normally, it's disabled and only the value at the time the if (or whatever) is encountered is available.)
Line 3 sets a variable called sourcedir and assigns it a value. The enclosing quotes ensure that trailing spaces are not included in the value assigned. I've deliberately used a directoryname that includes spaces because that's a common problem and it proves the batch in my test regime. Your directoryname would be different - simply substitute that. The directoryname . means "the current directory"
Lines 4-6 could be combined into one - the split is purely stylistic. They mean "perform a directory scan, excluding directory names (/a-d), in basic form (/b), that is, names only, of the directory whose name is in the variable sourcedir, for names that fit the pattern togo*.csv". Each resultant line is processed by ignoring the default delimiters and assigning the result (i.e. the entire line of the directory list, i.e. the filename) to the metavariable (loop-control variable) %%a.
The next line reads each line of the file built from the source-directory name and the filename currently in %%a. delims is set to nothing, so the entire line is assigned to the metavariable %%q, and the usebackq option tells cmd that the parenthesised string is a quoted filename, not a string (or a single-quoted command-to-be-executed as in the first for). The variable line is then set from each successive line of the file, so line holds the last line of the file when the for...%%q... ends.
The following line shows the filename in %%a and the text from the last line of that file in line.
Note the difference - %%x to access the contents of a metavariable, %var% to access the contents of a variable, but !var! to access the changed value (if delayedexpansion has been invoked).
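A minimal sketch of that difference, using a made-up variable named var that is changed inside a parenthesised block:

```batch
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET "var=before"
FOR %%x IN (1) DO (
  SET "var=after"
  REM The percent form was substituted when the whole block was parsed
  ECHO percent form: %var%
  REM The exclamation-mark form is evaluated when the line is executed
  ECHO bang form:    !var!
)
```

The first ECHO should report before and the second after, which is why the script above uses !line! rather than %line% inside the loop.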
The goto :eof means "go to the physical end-of-file". CMD understands :eof to mean "physical end-of-file".
So - cut-and-paste to a file named whatever.bat and then run by simply entering
whatever
In general,
for /?
will yield help for the for command, and this holds for most batch commands. Look on SO for thousands of examples.
You may also examine
timeout
cls
choice
for clues about how to achieve your every 5 minutes ambition. You might want to run this from the task scheduler to get an every 5 minutes display - many ways to achieve the same thing.
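One possible shape for the repeating display, as a sketch only: it assumes the script above was saved as whatever.bat in the current directory, and it runs until interrupted with Ctrl-C.

```batch
@ECHO OFF
:again
REM Clear the screen, then show the last row of each togo*.csv file
CLS
CALL whatever.bat
REM Wait 300 seconds (5 minutes); /nobreak ignores ordinary key presses
TIMEOUT /t 300 /nobreak >NUL
GOTO again
```

A scheduled task would avoid keeping a console window open, at the cost of a little more setup.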

Batch: Convert .csv to tab-delimited text, only some fields are quoted, contain commas between quotes (eBay order file)

I'm trying to convert the eBay File Exchange download into a tab-delimited format my shipping software can read.
If each and every column were quoted, this would be easy--but they're not. Only some columns (name, item listing title, etc) are quoted, and some quoted columns contain commas. The rest are bare of quotes.
I need a way to parse and convert this in a .bat file, but using comma as a delimiter splits the quoted fields if they contain a comma too, giving me unusable data. I'm certain there's a simple fix for this, I just can't figure it out.
Eric J is correct - solving this kind of problem with batch is not simple. But it is possible :-)
The main problem is how to differentiate between quoted and unquoted commas - jeb solved a similar problem with quoted vs. unquoted semicolons at 'Pretty print' windows %PATH% variable - how to split on ';' in CMD shell. The code below looks very different, but the fundamental concept is the same.
The code below should work for pretty much any CSV as long as all lines are less than ~8000 bytes long. Batch variable values are limited to 8191 bytes, and some characters are temporarily expanded to two bytes.
The code assumes there are not any existing TABs within the CSV file.
It does not modify any existing quotes.
As I say, the code should work, but it will be painfully SLOW if you have a large file. You would be much better off with a .NET solution as Eric J suggested.
@echo off
setlocal disableDelayedExpansion
set "file=optionalPathinfo\yourFile.csv"
:: Define a TAB variable
for /f "delims=" %%A in (
'forfiles /p "%~dp0." /m "%~nx0" /c "cmd /c echo(0x09"'
) do set "TAB=%%A"
:: Read each line from CSV, convert it, and write to new file with .new extension
>"%file%.new" (
for /f usebackq^ delims^=^ eol^= %%A in ("%file%") do (
set "line=%%A"
call :processLine
)
)
exit /b
:processLine
setlocal enableDelayedExpansion
:: Protect problem characters
set "line=!line:#=#A!"
set "line=!line:^=#K!"
set "line=!line:&=#M!"
set "line=!line:|=#P!"
set "line=!line:<=#L!"
set "line=!line:>=#G!"
:: Mark commas with leading caret (escape)
set "line=!line:,=^,!"
:: Remove mark from unquoted commas, but first temporarily
:: disable delayed expansion to protect any ! characters
setlocal disableDelayedExpansion
set ^"line=%line%"
setlocal enableDelayedExpansion
:: Protect remaining marked commas
set "line=!line:^,=#C!"
:: Convert remaining commas to TAB
set "line=!line:,=%TAB%!"
:: Restore protected characters
set "line=!line:#C=,!"
set "line=!line:#G=>!"
set "line=!line:#L=<!"
set "line=!line:#P=|!"
set "line=!line:#M=&!"
set "line=!line:#K=^!"
set "line=!line:#A=#!"
:: Write modified line
echo(!line!
exit /b
There's a further complication: A field with a quote and a comma will also have the quote escaped:
Jim "Smitty" Smith, Jr.
would be represented in the CSV file as
"Jim ""Smitty"" Smith, Jr."
This is not the kind of problem that is easily solved in a batch file. However, there is preexisting functionality to deal with the CSV format that can be used from any .NET compatible language including Powershell. If that is an option, have a look at
http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
For information on calling the .NET methods to read CSV files from Powershell, have a look at
http://blogs.msdn.com/b/mattbie/archive/2010/02/23/how-to-call-net-and-win32-methods-from-powershell-and-your-troubleshooting-packs.aspx

How to drop all but last cell in CSV using CMD

my goal is to write a script that will monitor process memory usage and run % based comparison on it to determine if there is a memory leak in the said process.
I am using the following command to get the memory usage of the process:
tasklist /fi "imagename eq %PROCESS%" /FO csv | findstr K
SAMPLE:
"cmd.exe","11640","Console","1","3,160 K"
This gives me a CSV file with last cell being the memory usage. I have two problems that I need help with.
Problem 1) How do I drop all but the last cell so that I can then assign the Kb used to a variable for comparison.
Problem 2) How do I get rid of the comma in the number? That kind of makes using comma as delim hard :/
Is there a better command than tasklist for this? I just need the raw number that the program is using, it can be in KB or MB.
Id love to be able to not have dependencies, but if I have to have dependencies I can include them with the batch.
Also is there any way for findstr to not return the entire line?
Thanks for any help! Ive been trying to get this solved for two days now with not much luck.
@ECHO OFF
SETLOCAL
FOR /f "delims=" %%i IN (memcsv.csv) DO CALL :process %%i
GOTO :EOF
:process
SET memsize=%~5
SET memsize=%memsize:,=%
ECHO memsize found = %memsize%
GOTO :eof
This should get your output into a variable called memsize.
It uses a file memcsv.csv as input, but you could replace memcsv.csv with
'tasklist /fi "imagename eq %PROCESS%" /FO csv ^| findstr K'
to operate directly on the output of FINDSTR. Your resultant line would thus be
FOR /f "delims=" %%i IN ('tasklist /fi "imagename eq %PROCESS%" /FO csv ^| findstr K') DO CALL :process %%i
which, for ease of legibility could be entered as
FOR /f "delims=" %%i IN (
'tasklist /fi "imagename eq %PROCESS%" /FO csv ^| findstr K'
) DO CALL :process %%i
Note that the line-breaks are specific - before and after the single-quote.
Also that the single-quotes are REQUIRED and that there is a caret (^) before the pipe (|) which tells cmd that the pipe is part of the command to be executed, not part of the FOR command
Edit to add explanation of HOW.
The output of the tasklist...|findstr... can be used as input to a for/f as if it was a file. All you need do is to surround the command with SINGLE-QUOTES and ensure that redirectors like | < > are "escaped" by a caret.
FOR /F "reads" the "file" line-by-line, assigning (by default) the first "token" in the line to the "metavariable" (the loop-control variable, %%i in the above case). This behaviour can be modified by the addition of control-clauses to the FOR/F. You may use "tokens=x,y,z" for instance to assign token number x, number y and number z to %%i, %%j, %%k respectively.
TOKENS are counted from 1 and have a value of the line contents up to a (series of) delimiter(s). By default, the delimiters are spaces and TABs; other characters such as commas and semicolons can be nominated with a delims= clause, so a line
TOKEN_ONE TOKEN_2,TOKEN_THREE;Token_FOUR
when seen by
for /f "tokens=1,3,4 delims=,; " %%i in (filecontainingaboveline) do
would set %%i=TOKEN_ONE %%j=TOKEN_THREE %%k=Token_FOUR
Using "delims=" turns OFF the delimiters and hence the ENTIRE line is assigned to the metavariable.
HENCE, in the above code, the entire line is assigned to %%i and delivered to the subroutine :process.
From :process's point-of-view, it has been given the argument "cmd.exe","11640","Console","1","3,160 K" which it interprets as a sequence of 5 parameters separated by commas - and a comma (or any other separator) WITHIN "quotes" is data, not a separator.
Parameter number 5 is accessed by %5 - and that is "3,160 K" - including the quotes and comma.
The variable is set to the value of the fifth parameter - the tilde (~) means "remove enclosing quotes." Hence memsize acquires a value of 3,160 K
The next SET replaces the string after the colon in the nominated variable with the string after the = - replace commas with nothing, and assign the result to the memsize variable.
The goto :eof means "go to the physical end-of-file". It is very specific - the colon MUST be present. Reaching end-of-file terminates a subroutine or batch-process.
To remove the last 2 characters of the variable, you could use
SET var=%var:~0,-2%
where var is the variable-name.
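As a small sketch of that substring form, assuming memsize already holds the sample value with its comma removed (3160 K), the trailing space and K can be dropped like this:

```batch
@ECHO OFF
SETLOCAL
REM Sample value as it looks after the commas have been removed
SET "memsize=3160 K"
REM Keep everything from offset 0 up to, but not including, the last 2 characters
SET "memsize=%memsize:~0,-2%"
ECHO memsize is now [%memsize%]
```

The square brackets in the ECHO are only there to make any stray spaces visible.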
SEE
SET /?
from the prompt for documentation.
Also GOTO /? and FOR/? for more details on these commands...