Adding header to all .csv files in folder and include filename - csv

I'm a command line newbie and I'm trying to figure out how I can add a header to multiple .csv files. The new header should have the following: 'TaxID' and 'filename'
I've tried multiple commands like sed, ed, awk, and echo, but when one of them worked it only changed the first file it found (I used *.csv in my command), and I could only manage this for TaxID.
Can anyone help me to get the filename into the header as well and do this for all my csv files?
(Note, I'm using a Mac)
Thank you!

Here's one way to do it, there are certainly others:
$ for i in *.csv;do echo $i;cp "$i" "$i.bak" && { echo "TaxID,$i"; cat "$i.bak"; } >"$i";done
Here's a sample run:
$ cat file1.csv
1,2
3,4
$ cat file2.csv
a,b
c,d
$ for i in *.csv;do echo $i;cp "$i" "$i.bak" && { echo "TaxID,$i"; cat "$i.bak"; } >"$i";done
file1.csv
file2.csv
$ cat file1.csv.bak
1,2
3,4
$ cat file1.csv
TaxID,file1.csv
1,2
3,4
$ cat file2.csv.bak
a,b
c,d
$ cat file2.csv
TaxID,file2.csv
a,b
c,d
Breaking it down:
$ for i in *.csv; do
This loops over all the files ending in .csv in the current directory. Each will be put in the shell variable i in turn.
echo $i;
This just echoes the current filename so you can see the progress. This can be safely left out.
cp "$i" "$i.bak"
Copy the current file (whose name is in i) to a backup. This is both to preserve the file if something goes awry, and gives subsequent commands something to copy from.
&&
Only run the subsequent commands if the cp succeeds. If you can't make a backup, don't continue.
{
Start a group command.
echo "TaxID,$i";
Output the desired header.
cat "$i.bak";
Output the original file.
}
End the group command.
>"$i";
Redirect the output of the group command (the new header and the contents of the original file) to the original file. This completes one file.
done
Finish the loop over all the files.
For fun, here are a couple of other ways (one of which JRD beat me to), including one using ed!
$ for i in *.csv;do echo $i;perl -p -i.bak -e 'print "TaxID,$ARGV\n" if $. == 1' "$i";done
$ for i in *.csv;do echo $i;echo -e "1i\nTaxID,$i\n.\nw\nq\n" | ed "$i";done
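And since the question mentions having tried awk, here is one more hedged sketch of the same idea using awk; the temporary file name ($i.tmp) is my own choice, not something from the answers above:
for i in *.csv; do
  # Emit the new header first, then the original contents, into a temporary file;
  # only overwrite the original if that succeeded.
  awk -v fname="$i" 'BEGIN { printf "TaxID,%s\n", fname } { print }' "$i" > "$i.tmp" \
    && mv "$i.tmp" "$i"
done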

Here is one way in perl that modifies the files in place, adding a header of TaxID,{filename} and skipping the header if it looks like one already exists.
ls
a.csv b.csv
cat a.csv
1,a.txt
2,b.txt
cat b.csv
3,c.txt
4,d.txt
ls *.csv | xargs -I{} -n 1 \
perl -p -i -e 'print "TaxID,{}\n" if !m#^TaxID# && !$h; $h = 1;' {}
cat a.csv
TaxID,a.csv
1,a.txt
2,b.txt
cat b.csv
TaxID,b.csv
3,c.txt
4,d.txt
You may want to create some backups of your files, or run on a few sample copies before running in earnest.
Explanatory:
List all files in the directory with a .csv extension
ls *.csv
"Pipe" the output of ls command into xargs so the perl command can run for each file. -I{} allows the filename to be subsequently referenced with {}. -n tells xargs to only pass 1 file at a time to perl.
| xargs -I{} -n 1
-p print each line of the input (file)
-i modifying the file in place
-e execute the following code
perl -p -i -e
Perl will implicitly loop over each line of the file and print it (due to -p). Print the header if we have not printed the header already and the current line doesn't already look like a header.
'print "TaxID,{}\n" if !m#^TaxID# && !$h; $h = 1;'
This is replaced with the filename.
{}
All told, in this example the commands that actually get run (after xargs substitutes the filename for {}) would be:
perl -p -i -e 'print "TaxID,a.csv\n" if !m#^TaxID# && !$h; $h = 1;' a.csv
perl -p -i -e 'print "TaxID,b.csv\n" if !m#^TaxID# && !$h; $h = 1;' b.csv

Related

How should I dockerize a CMS, such that MySQL works nice with git?

I want to dockerize a MODX application for development and store it in git (again, for development.) There's a solution here, but all the MySQL files are now in binary, plus the database cares about their permissions. I'd like to either
put all of MySQL's data in a single massive binary file, so I don't have to care about permissions and I can put it in LFS, or
somehow export the database to an SQL file on container shutdown and import it on launch, so I can use diffs.
So I've actually implemented a partial solution to your problem (though I'm still learning to use docker, this potential solution encapsulates everything else).
I use MODx as my CMS of choice; however, theoretically this should work for other CMSes too.
In my git workflow, I have a pre-commit hook set to mysqldump the database into a series of SQL files, which when implementing in production represent the inputs into mysql to recreate the entire database.
Some of the code in the following example is not directly related to the answer. It is also worth noting that I personally add a final column to each table of the database that separates different rows of data into different branches of the git repo (because my chosen workflow involves three parallel branches, for local development, staging, and production respectively).
The sample code below is a pre-commit hook from one of my older projects, which I no longer use, but the same code is still largely in use elsewhere (with a few exceptions unrelated to this post). It goes far beyond the question, because it is verbatim from my repo, but perhaps it might spark some inspiration.
In this example you'll also see references to "lists", which are text files containing the various individual repos and some settings; these are imploded into bash associative arrays, which requires bash 4.0 or higher. There is also a reference to 'mysql-defaults', a text file that contains my database credentials so the script can run without interruption.
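(As a purely hypothetical illustration of what such a mysql-defaults option file might contain; the credentials and port below are placeholders, not the ones used here:)
[client]
user=dbuser
password=dbpassword
host=127.0.0.1
port=8889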
#!/bin/bash
# Set repository and script variables
REPO_NAME='MODX';REPO_FOLDER='modx';REPO_KEY='modx';REPO_TYPE='MODX';
declare -A REPO_PREFIX_COUNTS=();  # associative array keyed by table prefix (bash 4.0+)
MODULES_STRING=$(cat /Users/cjholowatyj/Dev/modules-list | tr "\n" " ");MODULES_ARRAY=(${MODULES_STRING});# echo ${MODULES_ARRAY[1]};
PROJECTS_STRING=$(cat /Users/cjholowatyj/Dev/projects-list | tr "\n" " ");PROJECTS_ARRAY=(${PROJECTS_STRING});# echo ${PROJECTS_ARRAY[1]};
THEMES_STRING=$(cat /Users/cjholowatyj/Dev/themes-list | tr "\n" " ");THEMES_ARRAY=(${THEMES_STRING});# echo ${THEMES_ARRAY[1]};
alias mysql='/Applications/MAMP/Library/bin/mysql --defaults-file=.git/hooks/mysql-defaults';
alias dump='/Applications/MAMP/Library/bin/mysqldump --defaults-file=.git/hooks/mysql-defaults';
alias dump-compact='/Applications/MAMP/Library/bin/mysqldump --defaults-file=.git/hooks/mysql-defaults --no-create-info --skip-add-locks --skip-disable-keys --skip-comments --skip-extended-insert --compact';
shopt -s expand_aliases
# Print status message in terminal console
/bin/echo "Running ${REPO_NAME} Pre-Commits...";
# Switch to repository directory
# shellcheck disable=SC2164
cd "/Users/cjholowatyj/Dev/${REPO_FOLDER}/";
# Fetch database tables dedicated to this repository
mysql -N information_schema -e "select table_name from tables where table_schema = 'ka_local2019' and table_name like '${REPO_KEY}_%'" | tr '\n' ' ' > sql/${REPO_KEY}_tables.txt;
tablesExist=$(wc -c "sql/${REPO_KEY}_tables.txt" | awk '{print $1}')
# Reset pack_ sql files
if [[ -f sql/pack_structure.sql ]]; then rm sql/pack_structure.sql; fi
if [[ -f sql/pack_data.sql ]]; then rm sql/pack_data.sql; fi
touch sql/pack_structure.sql
touch sql/pack_data.sql
dump --add-drop-database --no-create-info --no-data --skip-comments --databases ka_local2019 >> sql/pack_structure.sql
# Process repository tables & data
if [[ ${tablesExist} -gt 0 ]]; then
dump --no-data --skip-comments ka_local2019 --tables `cat sql/${REPO_KEY}_tables.txt` >> sql/pack_structure.sql
dump-compact ka_local2019 --tables `cat sql/${REPO_KEY}_tables.txt` --where="flighter_key IS NULL" >> sql/pack_data.sql
sed -i "" "s/AUTO_INCREMENT=[0-9]+[ ]//g" sql/pack_structure.sql
fi
dump-compact ka_local2019 --where="flighter_key='${REPO_KEY}'" >> sql/pack_data.sql
isLocalHead=$(grep -c cjholowatyj .git/HEAD);
if [[ ${isLocalHead} = 1 ]]; then
dump-compact ka_local2019 --where="flighter_key='${REPO_KEY}-local'" >> sql/pack_data.sql
sed -i "" "s/\.\[${REPO_KEY}-local]//g" sql/pack_data.sql
fi
isDevelopHead=$(grep -c develop .git/HEAD);
if [[ ${isDevelopHead} = 1 ]]; then
dump-compact ka_local2019 --where="flighter_key='${REPO_KEY}-develop'" >> sql/pack_data.sql
sed -i "" "s/\.\[${REPO_KEY}-develop]//g" sql/pack_data.sql
sed -i "" "s/ka_local2019/ka_dev2019/g" sql/pack_structure.sql
sed -i "" "s/ka_local2019/ka_dev2019/g" sql/pack_structure.sql
fi
isReleaseHead=$(grep -c release .git/HEAD);
if [[ ${isReleaseHead} = 1 ]]; then
dump-compact ka_local2019 --where="flighter_key='${REPO_KEY}-release'" >> sql/pack_data.sql
sed -i "" "s/\.\[${REPO_KEY}-release]//g" sql/pack_data.sql
sed -i "" "s/ka_local2019/ka_rel2019/g" sql/pack_structure.sql
sed -i "" "s/ka_local2019/ka_rel2019/g" sql/pack_structure.sql
fi
# Create master structure sql file for this repository (and delete it once again if it is empty)
awk '/./ { e=0 } /^$/ { e += 1 } e <= 1' < sql/pack_structure.sql > sql/${REPO_KEY}_structure.sql
structureExists=$(wc -c "sql/${REPO_KEY}_structure.sql" | awk '{print $1}')
if [[ ${structureExists} -eq 0 ]]; then rm sql/${REPO_KEY}_structure.sql; fi
# Create master sql data file in case the entire database needs to be rebuilt from scratch
awk '/./ { e=0 } /^$/ { e += 1 } e <= 1' < sql/pack_data.sql > sql/all_${REPO_KEY}_data.sql
# Commit global repository sql files
git add sql/all_${REPO_KEY}_data.sql
if [[ ${structureExists} -gt 0 ]]; then git add sql/${REPO_KEY}_structure.sql; fi
# Deleting any existing sql files to recreate them fresh below
if [[ -f sql/create_modx_data.sql ]]; then rm sql/create_modx_data.sql; fi
if [[ -f sql/create_flighter_data.sql ]]; then rm sql/create_flighter_data.sql; fi
for i in "${MODULES_ARRAY[#]}"
do
if [[ -f sql/create_${i}_data.sql ]]; then rm sql/create_${i}_data.sql; fi
done
if [[ -f sql/create_${REPO_KEY}_data.sql ]]; then rm sql/create_${REPO_KEY}_data.sql; fi
# Parse global repository data and separate out data filed by table prefix
lastPrefix='';
lastTable='';
while IFS= read -r iLine;
do
thisLine="${iLine}";
thisPrefix=$(echo ${thisLine} | grep -oEi '^INSERT INTO `([0-9a-zA-Z]+)_' | cut -d ' ' -f 3 | cut -d '`' -f 2 | cut -d '_' -f 1);
thisTable=$(echo ${thisLine} | grep -oEi '^INSERT INTO `([0-9a-zA-Z_]+)`' | cut -d ' ' -f 3 | cut -d '`' -f 2);
if [[ $(echo -n ${thisPrefix} | wc -m) -gt 0 ]]; then
if [[ -n "${REPO_PREFIX_COUNTS[$thisPrefix]}" ]]; then
if [[ ${REPO_PREFIX_COUNTS[$thisPrefix]} -lt 1 ]]; then
if [[ -f sql/create_${thisPrefix}_data.sql ]]; then rm sql/create_${thisPrefix}_data.sql; fi
touch "sql/create_${thisPrefix}_data.sql";
fi
REPO_PREFIX_COUNTS[$thisPrefix]=0;
fi
REPO_PREFIX_COUNTS[$thisPrefix]+=1;
echo "${thisLine}" >> sql/create_${thisPrefix}_data.sql;
if [[ ${thisTable} != ${lastTable} ]]; then
if [[ ${thisPrefix} != ${lastPrefix} ]]; then
if [[ -f sql/delete_${thisPrefix}_data.sql ]]; then rm sql/delete_${thisPrefix}_data.sql; fi
touch "sql/delete_${thisPrefix}_data.sql";
fi
if [[ $(echo -n ${thisTable} | wc -m) -gt 0 ]]; then
echo "DELETE FROM \`${thisTable}\` WHERE \`flighter_key\` LIKE '${REPO_KEY}%';" >> sql/delete_${thisPrefix}_data.sql
fi
fi
# Add previous prefix sql file to git if lastPrefix isn't ''
if [[ $(echo -n ${lastPrefix} | wc -m) -gt 0 ]]; then
git add "sql/create_${lastPrefix}_data.sql";
git add "sql/delete_${lastPrefix}_data.sql";
fi
fi
lastPrefix=${thisPrefix};
lastTable=${thisTable};
done < sql/all_${REPO_KEY}_data.sql
# Add previous prefix sql file to git for the final lastPrefix value
git add "sql/create_${lastPrefix}_data.sql";
git add "sql/delete_${lastPrefix}_data.sql";
# Clean up unused files
rm "sql/${REPO_KEY}_tables.txt";
rm "sql/pack_data.sql";
rm "sql/pack_structure.sql";
git add sql/;
A couple of nuances worth noting:
(1) My code strips out all the auto_increment cursors from each table, because they were creating a lot of unnecessary changes in the sql files, which made commits more complicated.
(2) My code also strips out the database name itself, because on the production server I will be specifying the database to use, it is not the same database name as the one I use for local development, and we don't want the data going to the wrong place.
(3) This workflow also separates the database structure and the data itself in the files committed to git, which may be a source of confusion if you didn't already pick up on that.
On the flip side, when deploying the project on a server, I also have code which iterates through all the *.sql files and imports them into my database one at a time. I won't share the exact code for security reasons, but the general gist is... mysql mysql_database < database_file.sql
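As a rough, hedged sketch only (not the author's actual deployment script; the database name, defaults file, and directory layout are assumptions based on the hook above), the import side could look something like this:
# Structure files are expanded (and therefore imported) before the data files.
for f in sql/*_structure.sql sql/create_*_data.sql; do
    [ -f "$f" ] || continue   # skip a pattern that matched nothing
    mysql --defaults-file=mysql-defaults production_db < "$f"
done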

Getting unexpected end of file with Control-M characters

I have a simple script which gives me an unexpected end of file. Everything seems good to me:
#!/bin/bash
me="$(basename "$(test -L "$0" && readlink "$0" || echo "$0")")"
if [ $# -ge 5 ]; then
echo "OK"
else
echo "$me <arg1> <arg2> <arg3> <arg4> <arg5>"
fi
After checking with the OP in the comments, it turned out the script contained Control-M (carriage return) characters. Use tr -d '\r' < Input_file > temp_file && mv temp_file Input_file (put your script's actual name in place of Input_file), run that command, and you should be good then.
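If you want to confirm the carriage returns are (or are no longer) there, a quick hedged check with standard tools, where script.sh stands in for your script's name:
grep -c $'\r' script.sh    # counts the lines that contain a carriage return
cat -v script.sh | head    # carriage returns show up as ^M at the ends of lines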

while loop calling function but only for first line, Serverlist.txt contains multiple server details

I am trying to collect logs. Serverlist.txt contains server details such as root 10.0.0.1 22 TestServer. When I run the script it only reads the first line and exits; it does not work for the further lines. Below is my script.
newdate1=`date -d "yesterday" '+%b %d' | sed 's/0/ /g'`
newdate2=`date -d "yesterday" '+%d/%b/%Y'`
newdate3=`date -d "yesterday" '+%y%m%d'`
DL=/opt/$newdate3
Serverlist=/opt/Serverlist.txt
serverlog()
{
mkdir -p $DL/$NAME
ssh -p$PORT $USER@$IP "cat /var/log/messages*|grep '$newdate1'"|cat > $DL/$NAME/messages.log
}
while read USER IP PORT NAME
do
serverlog
sleep 1;
done <<<"$Serverlist"
Use < instead of <<<. <<< is a Here String substitution. The right side is evaluated, and then the result is read by the loop as standard input:
$ FILE="my_file"
$ cat $FILE
First line
Last line
$ while read LINE; do echo $LINE; done <$FILE
First line
Last line
$ set -x
$ while read LINE; do echo $LINE; done <<<$FILE
+ read LINE
+ echo my_file
my_file
+ read LINE
$ while read LINE; do echo $LINE; done <<<$(ls /home)
++ ls /home
+ read LINE
+ echo antxon install lost+found
antxon install lost+found
+ read LINE
$
I got the answer from another link:
you can use the "-n" option with ssh. ssh normally reads from standard input, which swallows the rest of the loop's input; -n prevents that, so the loop is not cut short and you will get the desired result.
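Putting both fixes together, a hedged sketch of the corrected script (same variable names as the original; not tested against the original environment) might look like:
serverlog()
{
    mkdir -p "$DL/$NAME"
    # -n stops ssh from reading the loop's standard input (the server list),
    # so the while loop sees every line instead of only the first one.
    ssh -n -p"$PORT" "$USER@$IP" "cat /var/log/messages* | grep '$newdate1'" > "$DL/$NAME/messages.log"
}
while read -r USER IP PORT NAME
do
    serverlog
    sleep 1
done < "$Serverlist"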

Extract a table from a MySQL dump with tac, grep and sed...in reverse

I was following this guide on extracting a table from a mysql dump with grep, so I wouldn't have to restore all 50GB of data to have a peek at one table. The two main commands to pull the table are:
grep -n "Table structure" [MySQL_dump_filename].sql
which gets the line numbers for table definitions, then
sed -n '[starting_line_number],[ending_line_number] p' [MySQL_dump_filename].sql > [table_output_filename].sql
I would like to search the .sql dump in reverse order though, as what I need is towards the end of the file and it will take quite a while to grep through the first 48GB of data. I'm on OS X and installed tac (via brew as noted here). But is it possible to set up the command to accomplish this and have it quit after sed grabs the needed lines? If not, I might as well grep from the beginning and not use tac at all, just wait it out. Or Ctrl-C once I see the file populated in another terminal.
Example run:
$ tac dump.sql | grep -n "Table structure"
...
751:-- Table structure for table `answer`
779:-- Table structure for table `template`
806:-- Table structure for table `resource`
...
But of course those are the line numbers counted in reverse, so if you need the 'template' table you would need something like sed -n '752,779 p' counted from the end of the file; otherwise you'll get the wrong lines (sed counts from the beginning of the file).
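One hedged way to turn those reverse (tac) line numbers back into forward ones, and to make sed quit as soon as it has printed the lines it needs (752 and 779 come from the sample run above; dump.sql and template_output.sql are placeholders):
total=$(wc -l < dump.sql)        # total number of lines in the dump
start=$(( total - 779 + 1 ))     # forward line number of the `template` table header
end=$(( total - 752 + 1 ))       # forward line number of the last line before the `answer` table header
sed -n -e "${start},${end}p" -e "${end}q" dump.sql > template_output.sql   # q makes sed stop at line $end
sed still has to read from the beginning of the file, but the q command means it stops as soon as the wanted range has been printed instead of scanning the remaining gigabytes.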
a few quick pointers:
dd can help you skip the first N bytes/blocks very quickly if you are sure those first N GB are not useful
after skipping, there is no need to 1) grep to find the line number and then 2) sed to print from line number n (which reads the huge remainder twice): you could directly do:
awk '/beginningpattern/,/endpattern/ { print $0 ; }' # warning: incomplete syntax, better read about awk and its prowess. You can do all sorts of neat stuff; a fleshed-out sketch follows below.
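A hedged, fleshed-out version of that idea for a mysqldump file (the table name template and the output file name are placeholders, and the patterns assume the standard mysqldump section markers shown in the question):
# Print everything from the `template` table's header comment up to the line before
# the next table's header, then exit so awk stops reading the rest of the huge dump.
awk '/^-- Table structure for table `template`/ { p=1 }
     p && /^-- Table structure for table/ && !/`template`/ { exit }
     p { print }' dump.sql > template_output.sql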
Here is a more streamlined way to know where all table definitions begin and end.
For a give file rolando.sql, create a script that does the following:
DAT=rolando.sql
TBLMAP=tblmap.txt
TBLLST=${TBLMAP}.lst
TBLTMP=${TBLMAP}.tmp
RUNMAP=DisplayTables.sh
grep -n "^-- Table structure" ${DAT} |sed 's/:/ /'| awk '{print $1}' > ${TBLTMP}
grep -n "^) ENGINE=" ${DAT} |sed 's/:/ /'| awk '{print $1}' >> ${TBLTMP}
sort -n < ${TBLTMP} > ${TBLLST}
rm -f ${TBLTMP}
rm -f ${TBLMAP}
POS=1
for X in `cat ${TBLLST}`
do
(( POS = 1 - POS ))
if [ ${POS} -eq 0 ]
then
(( Y = X - 2 ))
fi
if [ ${POS} -eq 1 ]
then
echo "${Y},${X}" >> ${TBLMAP}
fi
done
rm -f ${TBLLST}
echo "Table Structures From ${DAT}"
for XY in `cat ${TBLMAP}`
do
echo "sed -n '${XY}p' ${DAT}" >> ${RUNMAP}
done
chmod +x ${RUNMAP}
./${RUNMAP}
This script will output every create table statement for you. It will include the DROP TABLE statements also. If you do not want the drop table statements, use this one:
DAT=rolando.sql
TBLMAP=tblmap.txt
TBLLST=${TBLMAP}.lst
TBLTMP=${TBLMAP}.tmp
RUNMAP=DisplayTables.sh
grep -n "^CREATE TABLE" ${DAT} | sed 's/:/ /' | awk '{print $1}' > ${TBLTMP}
grep -n "^) ENGINE=" ${DAT} | sed 's/:/ /' | awk '{print $1}' >> ${TBLTMP}
sort -n < ${TBLTMP} > ${TBLLST}
rm -f ${TBLTMP}
rm -f ${TBLMAP}
POS=1
for X in `cat ${TBLLST}`
do
(( POS = 1 - POS ))
if [ ${POS} -eq 0 ]
then
(( Y = X ))
fi
if [ ${POS} -eq 1 ]
then
echo "${Y},${X}" >> ${TBLMAP}
fi
done
rm -f ${TBLLST}
echo echo "Table Structures From ${DAT}" > ${RUNMAP}
for XY in `cat ${TBLMAP}`
do
echo "sed -n '${XY}p' ${DAT}" >> ${RUNMAP}
done
chmod +x ${RUNMAP}
./${RUNMAP}
Give it a Try !!!

How do I find files that do not end with a newline/linefeed?

How can I list normal text (.txt) filenames, that don't end with a newline?
e.g.: list (output) this filename:
$ cat a.txt
asdfasdlsad4randomcharsf
asdfasdfaasdf43randomcharssdf$
and don't list (output) this filename:
$ cat b.txt
asdfasdlsad4randomcharsf
asdfasdfaasdf43randomcharssdf
$
(The difference: a.txt's last line has no trailing newline, so the next prompt lands right after it, while b.txt ends with a newline.)
Use pcregrep, a Perl Compatible Regular Expressions version of grep which supports a multiline mode using -M flag that can be used to match (or not match) if the last line had a newline:
pcregrep -LMr '\n\Z' .
In the above example we are saying to search recursively (-r) in current directory (.) listing files that don't match (-L) our multiline (-M) regex that looks for a newline at the end of a file ('\n\Z')
Changing -L to -l would list the files that do have newlines in them.
pcregrep can be installed on MacOS with the homebrew pcre package: brew install pcre
Ok it's my turn, I give it a try:
find . -type f -print0 | xargs -0 -L1 bash -c 'test "$(tail -c 1 "$0")" && echo "No new line at end of $0"'
If you have ripgrep installed:
rg -Ul '[^\n]\z'
That regular expression matches any character which is not a newline, followed by the end of the file; the -U flag enables multiline mode, which is needed for a pattern that refers to \n and matches at the very end of the file.
Give this a try:
find . -type f -exec sh -c '[ -z "$(sed -n "\$p" "$1")" ]' _ {} \; -print
It will print filenames of files that end with a blank line. To print files that don't end in a blank line change the -z to -n.
If you are using 'ack' (http://beyondgrep.com) as an alternative to grep, you can just run this:
ack -v '\n$'
It actually searches all lines that don't match (-v) a newline at the end of the line.
The best oneliner I could come up with is this:
git grep --cached -Il '' | xargs -L1 bash -c 'if test "$(tail -c 1 "$0")"; then echo "No new line at end of $0"; exit 1; fi'
This uses git grep, because in my use-case I want to ensure files committed to a git branch have ending newlines.
If this is required outside of a git repo, you can of course just use grep instead.
grep -RIl '' . | xargs -L1 bash -c 'if test "$(tail -c 1 "$0")"; then echo "No new line at end of $0"; exit 1; fi'
Why do I use grep? Because you can easily filter out binary files with -I.
Then the usual xargs/tail thingy found in other answers, with the addition of exiting with 1 if a file has no newline, so this can be used in a pre-commit git hook or in CI.
This should do the trick:
#!/bin/bash
for file in `find $1 -type f -name "*.txt"`;
do
nlines=`tail -n 1 $file | grep '^$' | wc -l`
if [ $nlines -eq 1 ]
then echo $file
fi
done;
Call it this way: ./script dir
E.g. ./script /home/user/Documents/ -> lists all text files in /home/user/Documents ending with \n.
This is kludgy; someone surely can do better:
for f in `find . -name '*.txt' -type f`; do
if test `tail -c 1 "$f" | od -c | head -n 1 | tail -c 3` != \\n; then
echo $f;
fi
done
N.B. this answers the question in the title, which is different from the question in the body (which is looking for files that end with \n\n I think).
Most solutions on this page do not work for me (FreeBSD 10.3 amd64). Ian Will's OSX solution does almost always work, but is pretty difficult to follow : - (
There is an easy solution that almost always works too (if $f is the file):
sed -i '' -e '$a\' "$f"
There is a major problem with the sed solution : it never gives you the
opportunity to just check (and not append a newline).
Both the above solutions fail for DOS files. I think the most
portable/scriptable solution is probably the easiest one,
which I developed myself : - )
Here is that elementary sh script which combines file/unix2dos/tail. In production, you will likely need to quote $f as "$f", including where the tail output is fetched (and embedded into the shell variable named last).
if file $f | grep 'ASCII text' > /dev/null; then
  if file $f | grep 'CRLF' > /dev/null; then
    # DOS (CRLF) file: convert to unix line endings, add the missing newline, convert back
    type unix2dos > /dev/null || exit 1
    dos2unix $f
    last="`tail -c1 $f`"
    [ -n "$last" ] && echo >> $f
    unix2dos $f
  else
    # plain unix file: append a newline only if the last byte is not already one
    last="`tail -c1 $f`"
    [ -n "$last" ] && echo >> $f
  fi
fi
Hope this helps someone.
This example
Works on macOS (BSD) and GNU/Linux
Uses standard tools: find, grep, sh, file, tail, od, tr
Supports paths with spaces
Oneliner:
find . -type f -exec sh -c 'file -b "{}" | grep -q text' \; -exec sh -c '[ "$(tail -c 1 "{}" | od -An -a | tr -d "[:space:]")" != "nl" ]' \; -print
More readable version
Find under current directory
Regular files
That 'file' (brief mode) considers text
Whose last byte (tail -c 1) is not represented by od's named character "nl"
And print their paths
#!/bin/sh
find . \
-type f \
-exec sh -c 'file -b "{}" | grep -q text' \; \
-exec sh -c '[ "$(tail -c 1 "{}" | od -An -a | tr -d "[:space:]")" != "nl" ]' \; \
-print
Finally, a version with a -f flag to fix the offending files (requires bash).
#!/bin/bash
# Finds files without final newlines
# Pass "-f" to also fix those files
fix_flag="$([ "$1" == "-f" ] && echo -true || echo -false)"
find . \
-type f \
-exec sh -c 'file -b "{}" | grep -q text' \; \
-exec sh -c '[ "$(tail -c 1 "{}" | od -An -a | tr -d "[:space:]")" != "nl" ]' \; \
-print \
$fix_flag \
-exec sh -c 'echo >> "{}"' \;
Another option:
$ find . -name "*.txt" -print0 | xargs -0I {} bash -c '[ -z "$(tail -n 1 {})" ] && echo {}'
Since your question has the perl tag, I'll post an answer which uses it:
find . -type f -name '*.txt' -exec perl check.pl {} +
where check.pl is the following:
#!/bin/perl
use strict;
use warnings;
foreach (@ARGV) {
open(FILE, $_);
seek(FILE, -2, 2);
my $c;
read(FILE,$c,1);
if ( $c ne "\n" ) {
print "$_\n";
}
close(FILE);
}
This perl script just opens the files passed as parameters, one at a time, and reads only the next-to-last character; if it is not a newline character, it prints out the filename, otherwise it does nothing.
This example works for me on OSX (many of the above solutions did not)
for file in `find . -name "*.java"`
do
result=`od -An -tc -j $(( $(ls -l $file | awk '{print $5}') - 1 )) $file`
last_char=`echo $result | sed 's/ *//'`
if [ "$last_char" != "\n" ]
then
#echo "Last char is .$last_char."
echo $file
fi
done
Here is another example using a few basic shell commands, which:
allows you to filter by extension (e.g. | grep '\.md$' filters only the md files)
lets you pipe more grep commands to extend the filter (such as exclusions: | grep -v '\.git' to exclude the files under .git)
lets you use the full power of grep parameters for more filters or inclusions
The code basically iterates (for) over all the files matching your chosen criteria (grep) and, if the last character of a file (-n "$(tail -c -1 "$file")") is not a newline, it prints the file name (echo "$file").
The verbose code:
for file in $(find . | grep '\.md$')
do
if [ -n "$(tail -c -1 "$file")" ]
then
echo "$file"
fi
done
A bit more compact:
for file in $(find . | grep '\.md$')
do
[ -n "$(tail -c -1 "$file")" ] && echo "$file"
done
and, of course, the 1-liner for it:
for file in $(find . | grep '\.md$'); do [ -n "$(tail -c -1 "$file")" ] && echo "$file"; done