append multiple .csv files - csv

I have several thousand .csv files in a folder and am trying to use cmd to append them. Each file contains the same table, with a header at the top and notes at the bottom. For example,
121030_2003.csv
121030_2004.csv
...
121031_2003.csv
121031_2004.csv
...
I tried copy *.csv all.csv from cmd and I would like to add code for the resulting file to have:
the header reported only once at the beginning, and possibly no notes
an additional column, reporting the name of the source file to keep track of it.

I think you can use cat in Linux.
e.g. for appending 1.txt 2.txt 3.txt to an output file 0.txt
cat 1.txt 2.txt 3.txt > 0.txt
CSV files are plain text, just like txt files, so the same approach works in your case, except that cmd uses 'type' in place of 'cat'.
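Plain concatenation with cat or type repeats the header and keeps the notes, so it does not cover the two extra requirements in the question. A minimal sketch in Python of one way to handle them, assuming every file has exactly one header row and a known, fixed number of trailing note lines. The function name merge_csv, the footer_lines parameter, and the source_file column name are hypothetical, chosen for illustration:

```python
import csv
from pathlib import Path


def merge_csv(folder, out_path, footer_lines=0):
    """Merge every .csv in `folder` into `out_path`.

    Writes the header once, drops the last `footer_lines` rows of each
    file (the trailing notes), and appends a `source_file` column.
    """
    out_path = Path(out_path)
    # Collect the sources first and skip the output file itself,
    # in case it lives in the same folder.
    sources = sorted(
        p for p in Path(folder).glob("*.csv")
        if p.resolve() != out_path.resolve()
    )
    header_written = False
    with out_path.open("w", newline="") as out:
        writer = csv.writer(out)
        for src in sources:
            with src.open(newline="") as handle:
                rows = list(csv.reader(handle))
            if not rows:
                continue  # empty file: nothing to copy
            header, body = rows[0], rows[1:]
            if footer_lines:
                body = body[:-footer_lines]
            if not header_written:
                writer.writerow(header + ["source_file"])
                header_written = True
            for row in body:
                writer.writerow(row + [src.name])
```

Calling merge_csv(".", "all.csv", footer_lines=1) would merge the files in the current folder, assuming one note line per file. Each file is read fully into memory, which is fine for small tables but would need a streaming approach for very large ones.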

Related

How to copy or move multiple files with same extension?

So I am trying to move a bunch of files with similar extensions from /home/ to /root/
Code I tried is
file copy /home/*.abc.xyz /root/
Also tried
set infile [glob -nocomplain /home/*.abc.xyz ]
if { [llength $infile] > 0 } {
file copy $infile /root/
}
No success.
Your two attempts fail for different reasons:
There is no wildcard expansion in arguments to file copy, or any Tcl command, for that matter: file copy /home/*.abc.xyz /root/. This will look for a single source with a literal * in its filename.
glob -nocomplain /home/*.abc.xyz is fine for collecting the sources, but glob returns a list of sources. file copy requires each source to be passed as a separate argument, not all of them as a single one. To expand a single collection value of source files into multiple separate arguments, use the Tcl expansion operator {*}
Therefore:
set infiles [glob -nocomplain *.tcl]
if {[llength $infiles]} {
    file copy {*}$infiles /tmp/tgt/
}
For a 1-line answer:
file copy {*}[glob /home/*.abc.xyz] /root/.
The file copy (and file rename) commands have two forms (hence the reference to the manual page in the comment). The first form copies a single file to a new target. The second form copies all the file name arguments into a directory; it requires the directory name to be the last argument, preceded by any number of source file names. Also, file copy does not do glob expansion on its arguments, so, as you rightly surmised, you need the glob command to obtain the list of files to copy. The problem is that glob returns a list of file names and you passed that list as a single argument, i.e.
file copy $infile /root/
passes the list as a single argument and so the file copy command thinks it is dealing with the first form and attempts to find a file whose name matches that of the entire list. This file probably doesn't exist. Placing the error message in your question would have helped us to know for sure.
So what you want to do is take the list of files contained in the infile variable and expand it into separate argument words. Since this is a common situation, Tcl has some syntax to help (assuming you are not using some ancient version of Tcl). Try using the command:
file copy {*}$infile /root/
in place of your first attempt and see if that helps the situation.

Combine multiple csv files with same prefix in CMD?

I have a folder with several csv files. They are labeled like this:
Name1-year.csv
Name1-year.csv
Name2-year.csv
Name2-year.csv
Name3-year.csv
etc...
I know you can use copy *.csv combined.csv to combine all the csv files in a directory, but is there a way to combine files with the same prefix?
You can write
copy prefix* copiedfile.csv
To filter for a certain prefix use this:
copy "prefix*.csv" "combined.csv" /B
The /B tells copy to treat the destination as a binary file, so no EOF/SUB character (ASCII 0x1A) is appended.
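The copy command above handles one prefix per invocation. To combine every prefix group in one pass, a minimal sketch in Python follows. It assumes the prefix is the part of the file name before the first '-', as in Name1-year.csv. The function name combine_by_prefix and the _combined.csv output naming are hypothetical. Like copy /B, it joins the bytes as-is, so any header line is repeated once per source file:

```python
from collections import defaultdict
from pathlib import Path


def combine_by_prefix(folder, out_dir):
    """Concatenate files that share a prefix (text before the first '-')."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).glob("*-*.csv")):
        prefix = path.name.split("-", 1)[0]
        groups[prefix].append(path)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for prefix, paths in groups.items():
        target = out_dir / f"{prefix}_combined.csv"
        # Binary mode mirrors copy /B: bytes are joined unchanged.
        with target.open("wb") as out:
            for path in paths:
                out.write(path.read_bytes())
        written.append(target)
    return written
```

The output names contain no '-', so a later run does not pick up its own results even if out_dir is the source folder.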

Merge csv in another folder location

I'm trying to merge csv files into one text file using a batch file.
My batch file is located in C:\Users\aallen and the CSV files are located in C:\Users\aallen\Test
The batch file only works when it's located in the same folder as the CSV files.
I have tried the following commands with no joy:
1) cd "C:\Users\aallen\Test" copy *csv test.csv
2) copy "C:\Users\aallen\Test" *csv test.csv
What am I missing?
Collecting the information from the question and comments: you want to combine several CSV files into one, but keep the header line only once.
more +1 can display a file while skipping its first line (see more /?), but more +1 *.csv skips the first line of the first file only and keeps it in all the others (just the opposite of what you need). So you have to process one file after the other with a for loop and check for the first file yourself (this can be done with a flag variable, named first here). Redirect the whole loop to your result file.
@echo off
set "first=yes"
(for %%a in ("C:\Users\aallen\Test\*.csv") do (
    if defined first (
        type "%%a"
        set "first="
    ) else (
        more +1 "%%a"
    )
))>"d:\new location\test.csv"
Note: at the command line, more prints just one screen and then pauses until you press a key. In a batch file it doesn't (well, to be honest, it does, but only after ~65000 lines; I hope your files are shorter).

File not found: trying to append csv data in Stata

I have some .csv files in the same directory and I am trying to append these in Stata. But when I use append, Stata cannot find the next file. My code is the following:
cd "C:\mydir"
insheet using "file1.csv", clear
append using "file2.csv"
With the last line, I obtain the following error:
file file2.csv not found
I have more expertise with R and I know this procedure is similar to rbind.
You can't append a .csv file directly to the Stata dataset produced by insheet; append expects Stata datasets. Read each .csv file with insheet and save it as a Stata file, then insheet the last one and append the saved Stata files to it.

awk processing files with different extensions

I have to process multiple CSV and TXT files in one awk script. My cmd file on windows looks like: gawk -f script.awk *.csv *.txt > output.file
I'd like to use this cmd file as I don't want to always type into the command prompt whenever I want to run the script. I would like to perform different tasks with the different file types. I have tried some stuff inside the script file like if (match(FILENAME, ".csv")) && (FNR > 1) but none of them were working. I have about 4-5 CSV files and a lot of (like 1000+) TXT files, these are all input files. The content of the CSV files are all in the same schema, one column between quotes. Example:
"Player"
"adigabor"
I want to ignore the first line of every input CSV file and add each record, without the quotes, to an array. After that I'd like to process the TXT files, which I can do just fine. My problem is that I couldn't perform different tasks for the different input file extensions in one script.
It would be extremely useful if you told us in what way "none of them were working" so we're not just guessing but here goes anyway:
The main problem with match(FILENAME, ".csv") is it'll match csv preceded by any char anywhere in the file name. To get files that end in literally .csv you want:
match(FILENAME,/\.csv$/)
but you don't need to call a function for that:
FILENAME ~ /\.csv$/
So your script would look like:
FILENAME ~ /\.csv$/ {
    if ( FNR > 1 ) {
        # do CSV stuff
    }
    next
}
{
    # do TXT stuff
}
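The difference between the unanchored ".csv" and the anchored /\.csv$/ is easy to check in isolation. A small sketch using Python's re module rather than awk (the two engines treat these particular patterns the same way); the file names are made up for illustration:

```python
import re

names = ["players.csv", "mycsv_notes.txt", "scores.txt"]

# Unanchored, with an unescaped dot: "." matches any character,
# so "ycsv" inside "mycsv_notes.txt" counts as a hit.
loose = [n for n in names if re.search(".csv", n)]

# Escaped dot plus end-of-string anchor: only a real .csv extension matches.
strict = [n for n in names if re.search(r"\.csv$", n)]

print(loose)   # ['players.csv', 'mycsv_notes.txt']
print(strict)  # ['players.csv']
```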
If you still can't do whatever you're trying to do then edit your question to include sample input files (at least one of each small .csv and .txt files) and expected output along with a better explanation of what you are trying to do.