I am trying to sort a csv file on a specific column using batch scripting.
The CSV file has about 22 columns, and column L (10) contains zip codes. There are multiple records with the same zip code, and I need to sort these records in ascending numerical order.
This is what I've done so far:
for /F "tokens=1-22 delims=," %%a in (test.csv) do (
rem Define the sorting column in next line: %%a=1, %%b=2, etc...
set "line["%%l"]=%%d,%%f,%%l"
)
for /F "tokens=1* delims==" %%a in ('set line[') do echo %%b >> result2.txt
This is my result. It drops records that share a zip code. I should see multiple rows with the same zip code, but with different names, of course.
"John","Doe","12078"
"John","Doe3","12095"
"John","Doe5","12197"
FOR %%f in (*.csv) do (
SET CurrentFile=%%f
SET /a NumLines=0
For /f %%j in ('Find "" /v /c ^< !CurrentFile!') Do (
Set /a NumLines=%%j
(set row=%~1) & (set last=%~1)
For /F "tokens=4-7 delims=," %%D in ('type !CurrentFile!') do (
if not defined row (set row=%%D %%F) else (set last=%%D %%F)
)
echo.
echo. Filename: !CurrentFile!
echo. Record Count: !NumLines!
echo. First Record Name:!row!
echo. Last Record Name: !last!
) >> Result.txt
)
ENDLOCAL
setlocal EnableDelayedExpansion
for /F "tokens=1-22 delims=," %%a in (test.csv) do (
rem Define the sorting column in next *three lines*: %%a=1, %%b=2, etc...
if not defined V%%~l set "V%%~l=1000"
set /A "V%%~l+=1"
set "line[%%~l!V%%~l!]=%%d,%%f,%%l"
)
for /F "tokens=1* delims==" %%a in ('set line[') do echo %%b >> result2.txt
If there are multiple records with the same zip code, then each one of them must be identified individually. This solution uses a variable called V<zip code> as a counter for the records with that zip code. The value of this counter is then appended to the zip code itself to create a unique key for each record, so no record overwrites another. The counter starts at 1000 so that every suffix has the same number of digits; since set lists the line[...] variables in alphabetical order, equal-width keys keep records with the same zip code in their original order. The program assumes there are at most 999 records with the same zip code; if that is not enough, just add a zero to the initial value in the if not defined V%%~l set "V%%~l=1000" line.
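A minimal self-contained sketch of the same counter-key idea, using a made-up list of zip codes in place of the CSV column (the loop body mirrors the line[...] logic above, with %%z standing in for %%~l). The variable order is only there to collect the listing:

```batch
@echo off
setlocal EnableDelayedExpansion
rem Made-up zip codes, including duplicates, in file order
for %%z in (12095 12078 12095 12078) do (
  rem One counter per zip code, starting at 1000 so every suffix has 4 digits
  if not defined V%%z set "V%%z=1000"
  set /a "V%%z+=1"
  rem Zip code plus counter gives a unique key, so no record overwrites another
  set "line[%%z!V%%z!]=%%z"
)
rem SET lists the variables sorted by name, i.e. by zip code, then by counter
set "order="
for /f "tokens=1* delims==" %%a in ('set line[') do set "order=!order! %%a"
echo Sorted keys:!order!
```

It should print Sorted keys: line[120781001] line[120781002] line[120951001] line[120951002], showing that both records for each duplicated zip code survive, in their original order.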
@ECHO OFF
SETLOCAL
SET "sourcedir=U:\sourcedir"
SET "destdir=U:\destdir"
SET "filename1=%sourcedir%\q56588370.txt"
SET "outfile=%destdir%\outfile.txt"
SET "sortfile=%destdir%\sortfile.txt"
SET /a sortcol=3
(
FOR /f "usebackqdelims=" %%a IN ("%filename1%") DO (
rem full line in %%a
SET "fullline=%%a"
CALL :sub %%a
)
)>"%sortfile%"
(
FOR /f "tokens=1*delims=+" %%a IN (' sort "%sortfile%"') DO (
ECHO %%b
)
)>"%outfile%"
DEL "%sortfile%"
GOTO :EOF
:sub
IF %sortcol% neq 1 FOR /L %%z IN (2,1,%sortcol%) DO SHIFT
ECHO %1+%fullline%
GOTO :eof
You would need to change the settings of sourcedir and destdir to suit your circumstances.
I used a file named q56588370.txt containing some dummy data for my testing.
The script produces the file defined as %outfile%. %sortfile% is simply a temporary file; any reasonable name will do.
Read each line of your file, assign its content to the variable fullline, then call the subroutine :sub for each line, passing the entire line as its parameters. Since each line must be a comma-separated list of items, each of which is either a quoted string or a string containing no spaces or commas, the subroutine can decode it, so all that is required is to shift the parameter list (sortcol - 1) times; the required sort data is then in %1.
Output %1, followed by a delimiter (+) and the entire line originally read, into a temporary file (parenthesising a series of statements and redirecting them sends the data that would normally appear on the screen to the redirection destination), sort that file, and remove the prefix from each line by splitting on the chosen delimiter. The delimiter must therefore not occur in the sort column itself.
This way, more than one column could be chosen, and the data manipulated as required. For instance, locally "zip codes" are 4 digits (and can begin with 0), other countries use other formats, and the ever-popular extension code that may be appended to a ZIP can also be recorded and processed.
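As a minimal sketch of a two-column key, the snippet below builds a key from column 3 (the zip code) followed by column 2 (the name) for one sample record. It swaps the CALL/SHIFT subroutine for a plain FOR loop, which splits the line in much the same way (commas and spaces separate items, quoted items stay whole). Unlike a parameter list, though, a plain FOR treats * and ? as file wildcards, so this assumes the fields contain neither. The names col, zip, name and key are illustrative only:

```batch
@echo off
setlocal EnableDelayedExpansion
rem Sample record in the same format as the test data (illustrative only)
set "fullline="John","Doe 4","12344","moredata 3""
rem A plain FOR splits the line at commas and spaces but keeps quoted items
rem whole; the ~ modifier strips their quotes. Assumes no * or ? in the fields.
set /a col=0
set "zip="
set "name="
for %%t in (!fullline!) do (
  set /a col+=1
  if !col! equ 2 set "name=%%~t"
  if !col! equ 3 set "zip=%%~t"
)
rem Composite key: primary column 3 (zip), secondary column 2 (name)
set "key=!zip!_!name!"
rem Same shape as the :sub output, ready for the split on the + delimiter
echo !key!+!fullline!
```

Inside the main loop, the echoed line would go to %sortfile% exactly as the :sub output does.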
Here's my test data:
"John","Doe","12345","moredata 1"
"John","Do, or not","12345","moredata 2"
"John","Doe 4","12344","moredata 3"
"John","Doe 5","12345","moredata 4"
"John","Doe 6","12345","moredata 5"
"John","Doe 7","12344","moredata 6"
And the output:
"John","Doe 4","12344","moredata 3"
"John","Doe 7","12344","moredata 6"
"John","Do, or not","12345","moredata 2"
"John","Doe 5","12345","moredata 4"
"John","Doe 6","12345","moredata 5"
"John","Doe","12345","moredata 1"
This is my first time working with batch files. I am trying to extract certain columns from the original CSV and write the output to a new CSV. The following code is what I wrote based on this link:
https://stackoverflow.com/a/17557532/16034206
@echo off
setlocal EnableDelayedExpansion
Rem for /f "skip=1 usebackq tokens=1,2,10,11 delims=," %%i in (sample.csv) do @echo %%i,%%j,%%k,%%l >>output.csv
echo "Your script is starting..."
FOR /F "skip=1 usebackq delims=" %%L in (sample.csv) DO (
set "line=%%L,,,,,,,,"
set "line=#!line:,=,#!"
FOR /F "tokens=1,2,10,11 delims=," %%a in ("!line!") DO (
set "param1=%%a"
set "param2=%%b"
set "param10=%%c"
set "param11=%%d"
set "param1=!param1:~1!"
set "param2=!param2:~1!"
set "param10=!param10:~1!"
set "param11=!param11:~1!"
if "%%~A"=="RH" echo !param1!, !param2!, !param10!, !param11! >> output.csv
)
)
echo "Your script has completed"
I want to check that param1 contains the substring "@gmail.com" AND that param10 starts with the string "100" before writing that row of 4 columns to the CSV.
I checked how to use an if statement at this link: https://stackoverflow.com/a/17474377/10671013
but I have not found anything on SO about checking whether a string contains a substring or starts with a given string. Please advise.
Remove the substring you are looking for from the first column and compare the result with the original string; if they differ, the string contains the substring, and you then check the first three characters of the other column. Note that this substring substitution is case-insensitive:
if not "!param1:@gmail.com=!" == "!param1!" if "!param10:~0,3!" == "100" echo ...
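A minimal, self-contained sketch of that test, with made-up values in place of the parsed CSV columns. Delayed expansion must be enabled, as it is in the question's script, and the variable match is hypothetical:

```batch
@echo off
setlocal EnableDelayedExpansion
rem Made-up sample values standing in for the parsed columns
set "param1=jane.doe@gmail.com"
set "param10=1004711"
set "match=no"
rem Contains: deleting "@gmail.com" changes param1 only if the substring is present
rem Starts with: the first three characters of param10 must be "100"
if not "!param1:@gmail.com=!"=="!param1!" if "!param10:~0,3!"=="100" set "match=yes"
echo match=!match!
```

In the real loop, the set "match=yes" part would be replaced by the echo that writes the row.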
I have a comma-separated CSV file like this:
ID,USER_ID, COL3_STR, COL4_INT
id1,username1,exampleA, 5
id2,username1,exampleB,0
id3,username2,NULL,-1
id4,username3,,3,false,20
Each value in the 2nd column, USER_ID, must be replaced with testusername (except the header "USER_ID"). The values differ, so I can't search for a fixed string.
My idea was to use a for loop to get the second token of each line, which is the username. For example:
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET currentDir=%~dp0
SET "srcfile=%currentDir%\inputfile.csv"
SET "outfile=%currentDir%\result.csv"
for /f "tokens=2 delims=," %%A IN (%srcfile%) DO (
ECHO %%A)
ECHO done
PAUSE
Output:
USER_ID
username1
username1
username2
username3
So the 2nd column of the (new) csv file must look like:
USER_ID
testusername
testusername
testusername
testusername
I saw another question with a helpful answer.
Example: When each username is "admin":
(
for /f "delims=" %%A in (%srcfile%) do (
set "line=%%A"
for /f "tokens=2 delims=," %%B in ("admin") do set "line=!line:%%B=testuser!"
echo !line!
)
)>%outfile%
But this only works for a fixed string. It's my first batch script, and I don't know how to adapt it to my situation. I hope somebody can help me.
Must work for Windows 7 and 10.
You need all the tokens (for writing the modified file), not just the second one:
for /f "tokens=1,2,* delims=," %%A in (%srcfile%) do echo %%A,testuser,%%C
(where * is "the rest of the line, undelimited"). %%B would be the username, so just write the replacement string instead.
You could use an if statement to process the first line differently, or process it separately:
<"%srcfile%" set /p header=
(
echo %header%
for /f "skip=1 tokens=1,2,* delims=," %%A in (%srcfile%) do echo %%A,testuser,%%C
) > "%outfile%"
The following script (let us call it repl_2nd.bat) replaces the values in the second column of a CSV file and correctly handles empty fields (where separators occur next to each other like ,,):
#echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_FILE=%~1" & rem // (path to input file; `%~1` is first argument)
set "_NVAL=testusername" & rem // (new string for second column)
set "_SEPC=," & rem // (separator character usually `,`)
rem // Especially handle first line:
< "%_FILE%" (
set "HEAD=" & set /P HEAD=""
setlocal EnableDelayedExpansion
echo(!HEAD!
endlocal
)
rem // Read input file line by line:
for /F usebackq^ skip^=1^ delims^=^ eol^= %%L in ("%_FILE%") do (
rem // Store current line string:
set "LINE=%%L"
rem // Toggle delayed expansion to avoid loss of `!`:
setlocal EnableDelayedExpansion
rem /* Replace each separator `,` by `","` and enclose the whole line string in `""`,
rem so that all items become quoted, even empty ones, hence avoiding
rem adjacent separators, which `for /F` would collapse into one;
rem then split the edited line string at the first and second separators: */
for /F "tokens=1,2,* delims=%_SEPC% eol=%_SEPC%" %%A in (^""!LINE:%_SEPC%="^%_SEPC%"!"^") do (
rem /* Unquote the first item, then join a separator and the replacement string;
rem then remove the outer pair of quotes from the remaining line string: */
endlocal & set "REPL=%%~A%_SEPC%%_NVAL%" & set "REST=%%~C"
rem // Append the remaining line string with `","` replaced by separators `,`:
setlocal EnableDelayedExpansion & echo(!REPL!%_SEPC%!REST:"%_SEPC%"=%_SEPC%!
)
endlocal
)
endlocal
exit /B
To use the script on a file in the current working directory, use this command line:
repl_2nd.bat "inputfile.csv"
To store the output to another file, use the following command line:
repl_2nd.bat "inputfile.csv" > "outputfile.csv"
I have a simple batch file that writes the following information to C:\Program Files (x86)\Data\ori.csv: division, originator name
@echo off
CHOICE /C NS /M "Please Choose Division:"
echo.
if errorlevel 1 set division=A8-NN
if errorlevel 2 set division=A8-NS
:PROMPT
set /P ori= "Add %division% Originator? [(Y)=yes / (N)=No] "
IF /I "%ori%" NEQ "N" goto add (
) else (
goto exit
)
:add
set /p oriname= "Please Enter %division% Originator Name "
echo Division %division% Originator %oriname% has been Sucessfully added
echo %division%,%oriname% >>C:\%programfiles(x86)%\data\Ori.csv
echo.
goto prompt
:exit
pause
The CSV output should look like this, for example:
A8-NN,Chris
A8-NN,Alfredo
A8-NS,Joe
A8-NN,Patrick
A8-NS,Ann
etc
The data in this CSV will change every 2 months (new people are assigned to each division).
My problem is that I want a separate batch file to read the data from ori.csv and, for a specific division, use the choice command to choose one originator.
So far I have done this:
CHOICE /C NS /M "Please Choose Division:"
echo.
if errorlevel 1 set division=A8-NN
if errorlevel 2 set division=A8-NS
count=
for /f "tokens=1-20* delims=," %%a in ('type "C:\%programfiles(x86)%\data\Ori.csv"') do (
if %%a== ("%division%)
set b = %%b
set "count=!count!+1"
echo %count% %%b
)
)
What I tried to do is store a number identifier in the %count% variable and the originator name in the %b variable. How can I use those two variables as input to a choice command?
You'll need a slightly different approach. After some clarification in the comments, I would recommend that you move to set /p here, simply because we never know how many options there will be: you say max 20, but what if tomorrow there are suddenly 27? I would rather be safe than sorry :)
@echo off & set cnt=0
setlocal enabledelayedexpansion
for /f "usebackq tokens=1* delims=," %%a in ("%programfiles(x86)%\data\Ori.csv") do (
if not defined %%a (
set /a cnt+=1
set "%%a=%%a"
set "!cnt!=%%a"
echo !cnt!. %%a
)
)
set /p "oper=Please choose Division (1 to !cnt!): "
if not %oper% gtr !cnt! (
set cnt=0
for /f "usebackq tokens=1-20* delims=," %%a in ("%programfiles(x86)%\data\Ori.csv") do if "%%a" == "!%oper%!" set /a cnt+=1 & echo !cnt! %%b
)
I have a csv file with 18 fields. I need to copy the file to a txt file, delete the first four lines, replace the data in field #8, and save the file with a new name.
The data in field #8 is an integer (for example, 1, 2, 3, etc). Each integer needs to be replaced with a separate value (for example, I need to replace 1 with 1005 and 3 with 1008). I am trying to modify/fix the following batch file:
@echo off
More +4 datatest.csv > datacopy.txt
( FOR /f "tokens=8 delims=," %%h in (datacopy.txt) do (
if "%%h"=="3" (echo 1008) else (
echo %%a %%b %%c
)
)
)>paygoinvoice.txt
@echo on
With only one token selected, you'll get only one column (%%h).
By parsing the output of the more command directly, there is no need for a temporary file.
Depending on how many integers need replacing, you may use a pseudo-array with the integer as an index/pointer.
You may either get all columns separately (tokens=1-18, %%A..%%R) or gather the rest (*) in one for variable.
#echo off & Setlocal EnableDelayedExpansion
( FOR /f "tokens=1-8* delims=," %%A in ('More +4 datatest.csv') do (
Set "H=%%H"
if "%%H"=="1" Set "H=1005"
if "%%H"=="3" Set "H=1008"
echo %%A,%%B,%%C,...,!H!,%%I
)
)>paygoinvoice.txt
@echo on
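The pseudo-array idea from the list above can be sketched as follows: the replacement values live in variables named map[1], map[3], and so on, so adding a new mapping means adding one set line rather than another if. The names map and result are hypothetical, and the simple FOR over 1 2 3 only stands in for the %%H values the FOR /F loop would read; in the real loop, the set "H=%%H" and if defined map[%%H] lines would replace the hard-coded if lines.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical lookup table (pseudo-array): map[old value]=new value
set "map[1]=1005"
set "map[3]=1008"
set "result="
rem Stand-in for the field #8 values that FOR /F would read from the file
for %%H in (1 2 3) do (
  rem Keep the value unless the table has an entry for it
  set "H=%%H"
  if defined map[%%H] set "H=!map[%%H]!"
  set "result=!result! !H!"
)
echo Mapped values:!result!
```

Values without an entry (2 here) pass through unchanged.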
I'm new to batch files and this is a tricky question. In stores.csv there is a column called 'Image' which stores vertical-line-delimited image URLs as values. There are also additional columns called 'AltImage2', 'AltImage3', etc. How can I split the vertical-line-delimited string into columns that start with 'AltImage' for each row in the CSV? 'AltImage' columns only go to AltImage5, and there may not be five image URLs in a given row. I would also like to keep the first image URL in the 'Image' column if possible.
Example of headers and single row of data:
Company,Title,Image,AltImage2,AltImage3,AltImage4,AltImage5
Testco,U2X40,image1.png|image2.png|image3.png
Desired result after running batch:
Company,Title,Image,AltImage2,AltImage3,AltImage4,AltImage5
Testco,U2X40,image1.png,image2.png,image3.png
So far I've tried this:
for /f "tokens=3 delims=, " %%a in ("stores.csv") do (
echo run command here "%%a"
)
But I cannot even echo the values in the Image column.
Here is a solution using a Bash script (unfortunately, I need batch): How do I split a string on a delimiter in Bash?
@echo off
setlocal
< stores.csv (
rem Read and write the header
set /P "header="
call echo %%header%%
rem Process the rest of lines
for /F "tokens=1-3 delims=|" %%a in ('findstr "^"') do echo %%a,%%b,%%c
)
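I think this handles your parsing problem. Pay attention to quotes and the usebackq option: without usebackq, a quoted name in the parentheses of for /f, such as ("stores.csv"), is treated as a literal string rather than a file name.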
for /f "skip=1 tokens=3 delims=," %%a in (stores.csv) do for /f "tokens=1-5 delims=|" %%b in ("%%a") do echo %%b %%c %%d %%e %%f
Here's a fuller solution to play with. There may be a more elegant way to handle optional commas. And you'll have to handle directing the output to whichever place is appropriate.
@echo off
setlocal enabledelayedexpansion
echo Company,Title,Image,AltImage2,AltImage3,AltImage4,AltImage5
for /f "skip=1 tokens=1-3 delims=," %%x in (stores.csv) do (
rem Outer tokens: Company, Title, and the pipe-delimited Image list
for /f "tokens=1-5 delims=|" %%b in ("%%z") do (
set "line=%%x,%%y,%%b"
if not "%%c"=="" set "line=!line!,%%c"
if not "%%d"=="" set "line=!line!,%%d"
if not "%%e"=="" set "line=!line!,%%e"
if not "%%f"=="" set "line=!line!,%%f"
echo !line!
)
)
Read the file line by line and replace | with , (you have to escape the | and use delayed expansion):
@echo off
setlocal enabledelayedexpansion
(
for /f "delims=" %%a in (old.csv) do (
set line=%%a
echo !line:^|=,!
)
)>new.csv