Replace column values from a csv file with batch - csv

I have a comma-separated CSV file like this:
ID,USER_ID, COL3_STR, COL4_INT
id1,username1,exampleA, 5
id2,username1,exampleB,0
id3,username2,NULL,-1
id4,username3,,3,false,20
Each value in the 2nd column, USER_ID, must be replaced with testusername (except the header "USER_ID"). The values differ, so I can't search for a fixed string.
My idea was to use a for-loop and get the second token from each line to get the username. For example:
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET "currentDir=%~dp0"
SET "srcfile=%currentDir%inputfile.csv"
SET "outfile=%currentDir%result.csv"
for /f "usebackq tokens=2 delims=," %%A IN ("%srcfile%") DO (
ECHO %%A
)
ECHO done
PAUSE
Output:
USER_ID
username1
username1
username2
username3
So the 2nd column of the (new) csv file must look like:
USER_ID
testusername
testusername
testusername
testusername
I saw another question with a helpful answer.
Example: When each username is "admin":
(
for /f "delims=" %%A in (%srcfile%) do (
set "line=%%A"
for /f "tokens=2 delims=," %%B in ("admin") do set "line=!line:%%B=testuser!"
echo !line!
)
)>%outfile%
But this only works for a fixed string. It's my first batch script and I don't know how to adapt this to my situation. I hope somebody can help me.
Must work for Windows 7 and 10.

You need all the tokens (for writing the modified file), not just the second one:
for /f "usebackq tokens=1,2,* delims=," %%A in ("%srcfile%") do echo %%A,testuser,%%C
(where * is "the rest of the line, undelimited"). %%B would be the username, so just write the replacement string instead.
You could use an if statement to process the first line differently, or you process it separately:
<"%srcfile%" set /p header=
(
echo %header%
for /f "usebackq skip=1 tokens=1,2,* delims=," %%A in ("%srcfile%") do echo %%A,testuser,%%C
) > "%outfile%"
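For comparison only, since the question needs batch: the `tokens=1,2,*` split corresponds to cutting each line at its first two separators and keeping the remainder as-is. The minimal Python sketch below assumes plain lines without quoted fields; `replace_second_column` is a hypothetical name. Unlike `for /f`, which treats adjacent delimiters as one, Python's `str.split` keeps empty fields, so a line such as `id4,username3,,3,false,20` keeps its empty third field.

```python
def replace_second_column(lines, new_value="testuser", sep=","):
    """Return the lines with field 2 replaced; the header line is kept as-is."""
    out = []
    for i, line in enumerate(lines):
        if i == 0:
            out.append(line)  # header: written unchanged, like the set /p step
            continue
        # maxsplit=2 mirrors tokens=1,2,*: first field, second field, rest of line
        parts = line.split(sep, 2)
        if len(parts) >= 2:
            parts[1] = new_value
        out.append(sep.join(parts))
    return out


sample = [
    "ID,USER_ID, COL3_STR, COL4_INT",
    "id1,username1,exampleA, 5",
    "id3,username2,NULL,-1",
    "id4,username3,,3,false,20",
]
for row in replace_second_column(sample):
    print(row)
```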

The following script (let us call it repl_2nd.bat) replaces the values in the second column of a CSV file and correctly handles empty fields (where separators occur next to each other like ,,):
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_FILE=%~1" & rem // (path to input file; `%~1` is first argument)
set "_NVAL=testusername" & rem // (new string for second column)
set "_SEPC=," & rem // (separator character usually `,`)
rem // Especially handle first line:
< "%_FILE%" (
set "HEAD=" & set /P HEAD=""
setlocal EnableDelayedExpansion
echo(!HEAD!
endlocal
)
rem // Read input file line by line:
for /F usebackq^ skip^=1^ delims^=^ eol^= %%L in ("%_FILE%") do (
rem // Store current line string:
set "LINE=%%L"
rem // Toggle delayed expansion to avoid loss of `!`:
setlocal EnableDelayedExpansion
rem /* Replace each separator `,` by `","` and enclose whole line string in `""`,
rem    resulting in all items to become quoted, even empty ones, hence avoiding
rem adjacent separators, which would become collapsed to one by `for /F`;
rem then split the edited line string at the first and second separators: */
for /F "tokens=1,2,* delims=%_SEPC% eol=%_SEPC%" %%A in (^""!LINE:%_SEPC%="^%_SEPC%"!"^") do (
rem /* Unquote the first item, then join a separator and the replacement string;
rem then remove the outer pair of quotes from the remaining line string: */
endlocal & set "REPL=%%~A%_SEPC%%_NVAL%" & set "REST=%%~C"
rem // Append the remaining line string with `","` replaced by separators `,`:
setlocal EnableDelayedExpansion & echo(!REPL!%_SEPC%!REST:"%_SEPC%"=%_SEPC%!
)
endlocal
)
endlocal
exit /B
To use the script on a file in the current working directory, use this command line:
repl_2nd.bat "inputfile.csv"
To store the output to another file, use the following command line:
repl_2nd.bat "inputfile.csv" > "outputfile.csv"
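To see why the quoting trick matters, here is a simplified Python model (not batch) of the two behaviours involved. `for_f_tokens` roughly models how `for /F` treats a run of separators as a single one, and `quote_every_field` performs the script's step of replacing each `,` with `","` and wrapping the line in quotes. All helper names are hypothetical, and the model assumes the data contains no `"` characters of its own; the batch script would likely break on those too.

```python
def for_f_tokens(line, sep=","):
    """Rough model of for /F: a run of separators counts as one, so empty fields vanish."""
    return [token for token in line.split(sep) if token != ""]


def quote_every_field(line, sep=","):
    """The script's trick: replace each separator by "," and wrap the line in quotes."""
    return '"' + line.replace(sep, '"' + sep + '"') + '"'


def unquote(token):
    """Like %%~A: remove one pair of surrounding quotes, if present."""
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return token[1:-1]
    return token


def replace_second(line, new_value="testusername", sep=","):
    # every field is now at least "", so no two separators are adjacent any more
    fields = [unquote(t) for t in for_f_tokens(quote_every_field(line, sep), sep)]
    if len(fields) >= 2:
        fields[1] = new_value
    return sep.join(fields)


line = "id4,username3,,3,false,20"
print(for_f_tokens(line))     # the empty COL3 field is lost
print(replace_second(line))   # the empty field is preserved
```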

Related

Sort CSV Records

I am trying to sort a csv file on a specific column using batch scripting.
The csv file has about 22 columns and column L(10) contains zip codes. There are multiple records with the same zip code and I need to sort these records in ascending numerical order.
This is what I've done so far:
for /F "tokens=1-22 delims=," %%a in (test.csv) do (
rem Define the sorting column in next line: %%a=1, %%b=2, etc...
set "line["%%l"]=%%d,%%f,%%l"
)
for /F "tokens=1* delims==" %%a in ('set line[') do echo %%b >> result2.txt
This is my result. It is removing records with duplicate zip codes. I should see multiple rows with the same zip code, but with different names of course.
"John","Doe","12078"
"John","Doe3","12095"
"John","Doe5","12197"
FOR %%f in (*csv) do (
SET CurrentFile=%%f
SET /a NumLines=0
For /f %%j in ('Find "" /v /c ^< !CurrentFile!') Do (
Set /a NumLines=%%j
(set row=%~1) & (set last=%~1)
For /F "tokens=4-7 delims=," %%D in ('type !CurrentFile!') do (
if not defined row (set row=%%D %%F) else (set last=%%D %%F)
)
echo.
echo. Filename: !CurrentFile!
echo. Record Count: !NumLines!
echo. First Record Name:!row!
echo. Last Record Name: !last!
) >> Result.txt
)
ENDLOCAL
setlocal EnableDelayedExpansion
for /F "tokens=1-22 delims=," %%a in (test.csv) do (
rem Define the sorting column in next *three lines*: %%a=1, %%b=2, etc...
if not defined V%%~l set "V%%~l=1000"
set /A "V%%~l+=1"
set "line[%%~l!V%%~l!]=%%d,%%f,%%l"
)
for /F "tokens=1* delims==" %%a in ('set line[') do echo %%b >> result2.txt
If there are multiple records with the same zip code, then it is necessary to identify each one of them. This solution uses a variable called V<zip code> as a counter for the records that share each zip code. The value of this variable is then joined to the zip code itself to create a unique key for each record. The program assumes that there are at most 999 records with the same zip code; if that is not enough, just add a zero to the 1000 in the if not defined V%%~l set "V%%~l=1000" line.
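A minimal Python illustration of why the counter is needed (the sample rows and both function names are made up for illustration): keying a dictionary by zip code alone lets duplicates overwrite each other, just like assigning the same `line[...]` variable twice, while appending a per-zip counter that starts at 1000 makes every key unique. Sorting the keys as strings stands in for the alphabetical order in which `set line[` lists the variables, which matches numeric order only if all zip codes have the same length.

```python
def naive_sort(rows, col):
    """Key = zip code only: later rows with the same zip overwrite earlier ones."""
    keyed = {}
    for fields in rows:
        keyed[fields[col]] = fields
    return [keyed[k] for k in sorted(keyed)]


def sort_with_counter(rows, col):
    """Key = zip code + per-zip counter starting at 1000, like V<zip> in the answer."""
    counters = {}
    keyed = {}
    for fields in rows:
        zip_code = fields[col]
        counters[zip_code] = counters.get(zip_code, 1000) + 1
        keyed[zip_code + str(counters[zip_code])] = fields
    # string order of the keys stands in for the order in which `set line[` lists them
    return [keyed[k] for k in sorted(keyed)]


rows = [
    ["John", "Doe", "12095"],
    ["Jane", "Roe", "12078"],
    ["John", "Doe3", "12095"],
]
print(naive_sort(rows, 2))
print(sort_with_counter(rows, 2))
```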
@ECHO OFF
SETLOCAL
SET "sourcedir=U:\sourcedir"
SET "destdir=U:\destdir"
SET "filename1=%sourcedir%\q56588370.txt"
SET "outfile=%destdir%\outfile.txt"
SET "sortfile=%destdir%\sortfile.txt"
SET /a sortcol=3
(
FOR /f "usebackqdelims=" %%a IN ("%filename1%") DO (
rem full line in %%a
SET "fullline=%%a"
CALL :sub %%a
)
)>"%sortfile%"
(
FOR /f "tokens=1*delims=+" %%a IN (' sort "%sortfile%"') DO (
ECHO %%b
)
)>"%outfile%"
DEL "%sortfile%"
GOTO :EOF
:sub
IF %sortcol% neq 1 FOR /L %%z IN (2,1,%sortcol%) DO SHIFT
ECHO %1+%fullline%
GOTO :eof
You would need to change the settings of sourcedir and destdir to suit your circumstances.
I used a file named q56588370.txt containing some dummy data for my testing.
Produces the file defined as %outfile%. %sortfile% is simply a temporary file having whatever name you desire within reason.
Retrieve each line of your file, and assign its content to a variable fullline, then execute the subroutine :sub with each line, passing the entire line as a parameter. Since each line must be a comma-separated list of items which may either be a quoted string or a string which doesn't contain spaces or commas, it can be decoded by the subroutine, so all that is required is to shift the parameter-list (columnrequired - 1) times and the required sort-data is in %1.
The subroutine outputs %1, followed by a delimiter and the entire line originally read, into a temporary file (parenthesising a series of statements and redirecting sends the data that normally appears on the screen to the redirection destination). The temporary file is then sorted, and the data prefixed to each line is removed using the chosen delimiter.
This way, more than one column could be chosen and the data manipulated as required; for instance, locally "zip codes" are 4-digit (and can begin with 0), other countries use other formats, and the ever-popular extension code that may be appended to a ZIP code can be recorded and processed.
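The same decorate, sort, undecorate idea as a Python sketch, for comparison only. `csv.reader` plays the role of the parameter parsing in `:sub`, so a quoted field such as `"Do, or not"` stays one item; `sort_csv_lines` is a hypothetical name, and the sample lines are the answer's test data. One difference: Python's sort is stable and here compares only the key, so records with the same zip code keep their input order, whereas `sort` compares the whole prefixed line, so the order within one zip code can differ from the batch version.

```python
import csv


def sort_csv_lines(lines, sortcol):
    """Sort raw CSV lines by the 1-based column sortcol (like SET /a sortcol=3)."""
    decorated = []
    for line in lines:
        fields = next(csv.reader([line]))  # quotes respected: "Do, or not" is one field
        decorated.append((fields[sortcol - 1], line))  # prefix the key, like %1+line
    decorated.sort(key=lambda pair: pair[0])  # stable sort on the key only
    return [line for _, line in decorated]  # drop the prefix again


test_data = [
    '"John","Doe","12345","moredata 1"',
    '"John","Do, or not","12345","moredata 2"',
    '"John","Doe 4","12344","moredata 3"',
    '"John","Doe 5","12345","moredata 4"',
    '"John","Doe 6","12345","moredata 5"',
    '"John","Doe 7","12344","moredata 6"',
]
for line in sort_csv_lines(test_data, 3):
    print(line)
```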
Here's my test data:
"John","Doe","12345","moredata 1"
"John","Do, or not","12345","moredata 2"
"John","Doe 4","12344","moredata 3"
"John","Doe 5","12345","moredata 4"
"John","Doe 6","12345","moredata 5"
"John","Doe 7","12344","moredata 6"
and output:
"John","Doe 4","12344","moredata 3"
"John","Doe 7","12344","moredata 6"
"John","Do, or not","12345","moredata 2"
"John","Doe 5","12345","moredata 4"
"John","Doe 6","12345","moredata 5"
"John","Doe","12345","moredata 1"

replace string in csv file using batch file

I have a csv file with 18 fields. I need to copy the file to a txt file, delete the first four lines, replace the data in field #8, and save the file with a new name.
The data in field #8 is an integer (for example, 1, 2, 3, etc). Each integer needs to be replaced with a separate value (for example, I need to replace 1 with 1005 and 3 with 1008). I am trying to modify/fix the following batch file:
@echo off
More +4 datatest.csv > datacopy.txt
( FOR /f "tokens=8 delims=," %%h in (datacopy.txt) do (
if "%%h"=="3" (echo 1008) else (
echo %%a %%b %%c
)
)
)>paygoinvoice.txt
@echo on
With only one token selected, you get only one column (%%h).
Parsing the output of the more command directly means there is no need for a temporary file.
Depending on how many integers you need to replace, you may use a pseudo-array with the integer as the index.
You may either get all columns separately (tokens=1-18, %%A..%%R) or gather the rest (*) into one FOR variable.
@echo off & Setlocal EnableDelayedExpansion
( FOR /f "tokens=1-8* delims=," %%A in ('More +4 datatest.csv') do (
Set "H=%%H"
if "%%H"=="1" Set "H=1005"
if "%%H"=="3" Set "H=1008"
echo %%A,%%B,%%C,...,!H!,%%I
)
)>paygoinvoice.txt
@echo on
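For comparison, the pseudo-array idea corresponds to a lookup table. A minimal Python sketch, assuming plain comma-separated lines with no quoted fields; the mapping uses the example values from the question (1 to 1005, 3 to 1008), while the sample lines and the name `convert` are made up for illustration. Values without an entry are passed through unchanged.

```python
REPLACEMENTS = {"1": "1005", "3": "1008"}  # example values from the question


def convert(lines, skip=4, field=8, mapping=REPLACEMENTS, sep=","):
    """Drop the first `skip` lines (like More +4) and map the value in `field` (1-based)."""
    out = []
    index = field - 1
    for line in lines[skip:]:
        fields = line.split(sep)  # unlike for /f, empty fields are kept
        if len(fields) > index:
            # look the value up; anything without an entry passes through unchanged
            fields[index] = mapping.get(fields[index], fields[index])
        out.append(sep.join(fields))
    return out


sample = ["header 1", "header 2", "header 3", "header 4",
          "a,b,c,d,e,f,g,1,x", "a,b,c,d,e,f,g,2,x", "a,b,c,d,e,f,g,3,x"]
for line in convert(sample):
    print(line)
```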

Batch comparing 2 different csv files

I haven't found anything on the internet, so I need your help.
I have 2 CSV files that I would like to compare.
The first one looks like this:
"Name","PrimarySmtpAddress","EmailAddresses"
The second one looks like this:
"Name","$_.TotalItemSize.Value.ToMB()"
The output file must show which names are in both the first and the second file.
As output, I want a file with all the data from the first file, but with "$_.TotalItemSize.Value.ToMB()" added at the end of each line.
For example, it would produce something like:
"Name","PrimarySmtpAddress","EmailAddresses","$_.TotalItemSize.Value.ToMB()",
I may not be very clear because my English is not perfect.
Can you please help me? I'm not very good at scripting.
Thank you very much.
Edit:
REM @echo off
setlocal enabledelayedexpansion
set var1
set var2
for /f "tokens=1 delims=," %%A in (file2.txt) do (
set var1=%%A
echo %var1%
for /f "tokens=1 delims=," %%B in (file1.txt) do (
set var2=%%B
echo %var2%
if ("%var1%"=="%var2%")
(
echo equal var
)
else
(
echo not equal var
)
pause
)
)
pause
It looks like the IF is not working
For each line in 1.csv, look up the name in 2.csv and print the combined line.
The regex may look a bit strange to you; it breaks down as follows:
/rc:: r=use Regex, c:=use string (necessary, as there could be spaces)
^: "Start of string"
\": a literal "
%%~a: the name without quotes
\": another literal "
,: a literal , (optional)
"tokens=1,* delims=," means "put the first token into %%m and all the rest into %%n"
Note: there is no IF. It's replaced by findstr, which extracts just the line you need.
Note: this may be slow with big files, because 2.csv is read once for every line in 1.csv.
@echo off
setlocal enabledelayedexpansion
for /f "tokens=1,* delims=," %%a in (1.csv) do (
for /f "tokens=1,* delims=," %%m in ('findstr /rc:"^\"%%~a\"," 2.csv') do (
echo %%a,%%b,%%n
)
)
Names that aren't in both files are skipped.
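The slowness comes from `findstr` rescanning 2.csv once per line of 1.csv. For comparison only (not batch), here is a Python sketch that reads the second file once into a dictionary keyed by name. The sample rows and the name `join_by_name` are made up for illustration. When a name occurs more than once in the second file, this sketch keeps only the first match, whereas the `findstr` loop would print one line per match.

```python
import csv


def join_by_name(rows1, rows2):
    """Append the extra columns of rows2 to every row of rows1 with the same name (field 1)."""
    extra = {}
    for row in rows2:  # read the second file only once
        extra.setdefault(row[0], row[1:])  # keep the first match per name
    joined = []
    for row in rows1:
        if row[0] in extra:  # names that are not in both files are skipped
            joined.append(row + extra[row[0]])
    return joined


file1 = [
    '"Name","PrimarySmtpAddress","EmailAddresses"',
    '"alice","alice@example.com","smtp:alice@example.com"',
    '"bob","bob@example.com","smtp:bob@example.com"',
]
file2 = [
    '"Name","$_.TotalItemSize.Value.ToMB()"',
    '"alice","120"',
]
result = join_by_name(list(csv.reader(file1)), list(csv.reader(file2)))
for row in result:
    print(",".join('"' + field + '"' for field in row))
```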

Batch: Fill array by text file with escaped characters

I've got two arrays to fill from a text file. One with UTF-8 umlauts and one with escaped.
all_headings_html_umlauts_escaped.txt
^&Uuml^;berblick
^&Auml^;pfel
^&Ouml^;sterreich
all_headings_utf8_umlauts.txt
Überblick
Äpfel
Österreich
My batch file:
@echo off
:: Build array to iterate through
set /A n=0
for /F "usebackq delims=" %%a in ("all_headings_utf8_umlauts.txt") do (
set /A n+=1
REM call echo %%n%%
call set arrayutfeight[%%n%%]=%%a
call set o=%%n%%
)
for /L %%i in (1,1,%o%) do call echo %%arrayutfeight[%%i]%%
pause
:: Build array to iterate through
set /A p=0
for /F "usebackq delims=" %%b in ("all_headings_html_umlauts_escaped.txt") do (
set /A p+=1
REM call echo %%k%%
call set arrayhtmlescaped[%%p%%]=%%b
call set q=%%p%%
)
for /L %%i in (1,1,%q%) do call echo %%arrayhtmlescaped[%%i]%%
pause
The output of the first array works perfectly, as it should, but the output of the second one is "ECHO is off" three times.
Any ideas why and how I can solve this issue? I really need as an output in my batch file from the array ^&Uuml^;berblick...
The management of the ^ caret character is complicated in a Batch file. In certain cases, this character is duplicated when it appears in a line. In this way, the call set "arrayhtmlescaped[%%p%%]=%%b" line stores two carets for each one in the file, so the extra carets must be removed. The simplest way to do that is with Delayed Expansion, but in the echo command the carets are outside quotes, so each caret must be escaped with an additional one.
@echo off
setlocal EnableDelayedExpansion
:: Build array to iterate through
set /A p=0
for /F "usebackq delims=" %%b in ("all_headings_html_umlauts_escaped.txt") do (
set /A p+=1
REM call echo %%k%%
call set "arrayhtmlescaped[%%p%%]=%%b"
call set q=%%p%%
)
for /L %%i in (1,1,%q%) do echo !arrayhtmlescaped[%%i]:^^^^=^^!
pause
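A simplified Python model of the caret handling described above, under the answer's assumption that the `call set` line stores two carets for every caret in the file; both function names are hypothetical. The echo line's `^^^^=^^` amounts to replacing each pair of carets with a single caret once its own escaping is taken into account, and doing that after doubling every caret gives back the original text.

```python
def call_set_store(text):
    """Simplified model of the `call set` line: every caret ends up doubled."""
    return text.replace("^", "^^")


def strip_doubled(text):
    """Model of the effective !var:^^=^! substitution: each pair of carets becomes one."""
    return text.replace("^^", "^")


for heading in ["^&Uuml^;berblick", "^&Auml^;pfel", "^&Ouml^;sterreich"]:
    stored = call_set_store(heading)
    print(stored, "->", strip_doubled(stored))
```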

How to split CSV column values into other columns using batch script

I'm new to batch files and this is a tricky question. In stores.csv there is a column called 'Image' which stores vertical-line-delimited image URLs as values. There are also additional columns called 'AltImage2', 'AltImage3', etc. How can I split the vertical-line-delimited string into columns that start with 'AltImage' for each row in the CSV? 'AltImage' columns only go to AltImage5, and there may not be five image URLs in a given row. I would also like to keep the first image URL in the 'Image' column if possible.
Example of headers and single row of data:
Company,Title,Image,AltImage2,AltImage3,AltImage4,AltImage5
Testco,U2X40,image1.png|image2.png|image3.png
Desired result after running batch:
Company,Title,Image,AltImage2,AltImage3,AltImage4,AltImage5
Testco,U2X40,image1.png,image2.png,image3.png
So far I've tried this:
for /f "tokens=3 delims=, " %%a in ("stores.csv") do (
echo run command here "%%a"
)
But I cannot even echo the values in the Image column.
Here is a solution using Bash script (unfortunately I need batch): How do I split a string on a delimiter in Bash?
@echo off
setlocal
< stores.csv (
rem Read and write the header
set /P "header="
call echo %%header%%
rem Process the rest of lines
for /F "tokens=1-3 delims=|" %%a in ('findstr "^"') do echo %%a,%%b,%%c
)
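I think this handles your parsing problem. Pay attention to the quotes: without the usebackq option, ("stores.csv") is treated as a literal string rather than a file name, so drop the quotes (or add usebackq).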
for /f "skip=1 tokens=3 delims=," %%a in (stores.csv) do for /f "tokens=1-5 delims=|" %%b in ("%%a") do echo %%b %%c %%d %%e %%f
Here's a fuller solution to play with. There may be a more elegant way to handle optional commas. And you'll have to handle directing the output to whichever place is appropriate.
@echo off
setlocal enabledelayedexpansion
echo Company,Title,Image,AltImage2,AltImage3,AltImage4,AltImage5
for /f "skip=1 tokens=3 delims=," %%a in (stores.csv) do (
for /f "tokens=1-5 delims=|" %%b in ("%%a") do (
set line=%%b
if not "%%c"=="" set line=!line!,
set line=!line!%%c
if not "%%d"=="" set line=!line!,
set line=!line!%%d
if not "%%e"=="" set line=!line!,
set line=!line!%%e
if not "%%f"=="" set line=!line!,
set line=!line!%%f
echo !line!
)
)
Read the file line by line and replace | with , (you have to escape the | and use delayed expansion):
@echo off
setlocal enabledelayedexpansion
(
for /f "delims=" %%a in (old.csv) do (
set line=%%a
echo !line:^|=,!
)
)>new.csv
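For comparison, a Python sketch of the split (not batch), assuming that, as in the example, the AltImage columns are still empty in the input and the Image column is the third field; URLs beyond AltImage5 are dropped and quoted fields are not handled. `split_images` and `process` are hypothetical names.

```python
def split_images(line, image_col=3, max_images=5, sep=","):
    """Split the |-separated Image field (1-based image_col) into separate columns.

    Assumes the AltImage columns are still empty in the input, as in the example,
    so everything after the Image field is discarded.
    """
    fields = line.split(sep)
    if len(fields) < image_col:
        return line  # no Image field on this line: leave it unchanged
    head = fields[:image_col - 1]  # Company, Title
    images = fields[image_col - 1].split("|")[:max_images]  # Image + AltImage2..5
    return sep.join(head + images)


def process(lines):
    """Keep the header line, split the images in every other line."""
    return lines[:1] + [split_images(line) for line in lines[1:]]


stores = [
    "Company,Title,Image,AltImage2,AltImage3,AltImage4,AltImage5",
    "Testco,U2X40,image1.png|image2.png|image3.png",
]
for line in process(stores):
    print(line)
```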