Insert text from bash script to mysql database and losing format - mysql

I'm writing a script that runs a backup with rsync and inserts the rsync output into a database.
It works well except for one little thing: the text in the DB appears as a single line, with no newlines, runs of spaces, or tabs. However, if I print the query to the screen, it looks fine, exactly how I want it to appear in the DB cell.
Can anyone help with this issue?
Here is the script:
#!/bin/bash
DATUM=$(date +%Y-%m-%d)
IDO=$(date +%H:%M:%S)
LOG_FAJL="/media/2TB/MENTES_PC/log/"$DATUM"_"$IDO"_rsync_2TB.log"
LOG_FAJL_SED="/media/2TB/MENTES_PC/log/"$DATUM"_"$IDO"_rsync_2TB_SED.log"
CEL="/media/BACKUP_2TB/"
FORRAS="/media/2TB/"
# rsync options must stay separate words, and excludes match relative to the transfer root
KIZARAS1=(--exclude='lost+found/' --exclude='.Trash-1000/')
DB_USER="user"
DB_PASSWORD="secret"
DB="xx"
TABLA="table"
if [ ! -f "$LOG_FAJL" ];
then
printf "%b" "\\\\n\\n\\t\\t\\t\\t\\t\\t\\t\\t##########################################" > "$LOG_FAJL" # more fancy header...
fi
printf "%b" "\\n\\n\\n A mentés készült: $(date +%Y-%m-%d_%H:%M:%S) \\n" >> "$LOG_FAJL"
printf "%b" " *************************************\\n" >> "$LOG_FAJL"
KEZDES=$(date +%s)
printf "%b" "\\n "$FORRAS" mentése ide: "$CEL >> "$LOG_FAJL"
printf "%b" "\\n ==========================================\\n\\n" >> "$LOG_FAJL"
rsync -azvh --delete "$KIZARAS1" --stats --human-readable $FORRAS $CEL >> "$LOG_FAJL"
VEGE=$(date +%s)
ELTELT=$(($VEGE-$KEZDES))
printf "%b" "\\n\\n A mentés tartott: " >> "$LOG_FAJL"
printf "%02dh:%02dm:%02ds" "$(($ELTELT/3600))" "$(($ELTELT%3600/60))" "$(($ELTELT%60))" >> "$LOG_FAJL"
printf "%b" "-ig\\n ********************************" >> "$LOG_FAJL"
cat "$LOG_FAJL" | sed "s/'/\\\'/g" > "$LOG_FAJL_SED"
QUERY="INSERT INTO "$TABLA" (datum,ido,log) VALUES ('"$DATUM"','"$IDO"','"$(cat "$LOG_FAJL_SED")"');"
echo "$QUERY"
mysql -u"$DB_USER" -p"$DB_PASSWORD" "$DB" -e "$QUERY"

Curious which application you are using to view the inserted data: MySQL Workbench, SQuirreL, etc.?
I suspect that the newlines/carriage returns are actually there, and the app is just not showing them.
You can use mysqldump to extract the data to a file and see if the newlines are in the output.
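For example, using the names from the script above: mysqldump escapes stored newlines as \n inside its INSERT statements, and batch-mode (piped) mysql output escapes them the same way, so either of these quick checks shows whether the newlines were really stored:
mysqldump -u user -p xx table | grep -o '\\n' | wc -l
mysql -u user -p xx -e "SELECT log FROM table" | head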

Related

How should I dockerize a CMS, such that MySQL works nice with git?

I want to dockerize a MODX application for development and store it in git (again, for development). There's a solution here, but all the MySQL files are in binary form, plus the database cares about their permissions. I'd like to either:
put all of MySQL's data in a single massive binary file, so I don't have to care about permissions and can put it in LFS, or
somehow export the database to an SQL file on container shutdown and import it on launch, so I can use diffs.
So I've actually implemented a partial solution to your problem (though I'm still learning to use Docker, this potential solution encapsulates everything else).
I use MODx as my CMS of choice; however, theoretically this should work for other CMSes too.
In my git workflow, I have a pre-commit hook set to mysqldump the database into a series of SQL files, which when implementing in production represent the inputs into mysql to recreate the entire database.
Some of the code in the following example is not directly related to the answer. It is also worth noting that I personally add a final column to each table of the database that separates different rows of data into different branches of the git repo (because my chosen workflow involves 3 parallel branches, for local development, staging, and production respectively).
The sample code below is a pre-commit hook from one of my older projects, which I no longer use, though much the same code is still in use (with a few exceptions unrelated to this post). It goes far beyond the question, because it is verbatim from my repo, but perhaps it might spark some inspiration.
In this example you'll also see references to "lists", which are text files containing the various individual repos and some settings; these are read into bash arrays (the prefix counter below is an associative array, which requires bash 4.0 or higher). There is also a reference to 'mysql-defaults', a text file that contains my database credentials so the script can run without interruption.
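For reference, the defaults file is just an INI-style text file that both mysql and mysqldump accept via --defaults-file; a minimal sketch (values are placeholders):
[client]
user=dbuser
password=dbpass
host=localhost
Keeping the credentials there means they never appear on the command line or in your shell history.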
#!/bin/bash
# Set repository and script variables
REPO_NAME='MODX';REPO_FOLDER='modx';REPO_KEY='modx';REPO_TYPE='MODX';
declare -A REPO_PREFIX_COUNTS=(); # associative array (bash 4+), keyed by table prefix
MODULES_STRING=$(cat /Users/cjholowatyj/Dev/modules-list | tr "\n" " ");MODULES_ARRAY=(${MODULES_STRING});# echo ${MODULES_ARRAY[1]};
PROJECTS_STRING=$(cat /Users/cjholowatyj/Dev/projects-list | tr "\n" " ");PROJECTS_ARRAY=(${PROJECTS_STRING});# echo ${PROJECTS_ARRAY[1]};
THEMES_STRING=$(cat /Users/cjholowatyj/Dev/themes-list | tr "\n" " ");THEMES_ARRAY=(${THEMES_STRING});# echo ${THEMES_ARRAY[1]};
alias mysql='/Applications/MAMP/Library/bin/mysql --defaults-file=.git/hooks/mysql-defaults';
alias dump='/Applications/MAMP/Library/bin/mysqldump --defaults-file=.git/hooks/mysql-defaults';
alias dump-compact='/Applications/MAMP/Library/bin/mysqldump --defaults-file=.git/hooks/mysql-defaults --no-create-info --skip-add-locks --skip-disable-keys --skip-comments --skip-extended-insert --compact';
shopt -s expand_aliases
# Print status message in terminal console
/bin/echo "Running ${REPO_NAME} Pre-Commits...";
# Switch to repository directory
# shellcheck disable=SC2164
cd "/Users/cjholowatyj/Dev/${REPO_FOLDER}/";
# Fetch database tables dedicated to this repository
mysql -N information_schema -e "select table_name from tables where table_schema = 'ka_local2019' and table_name like '${REPO_KEY}_%'" | tr '\n' ' ' > sql/${REPO_KEY}_tables.txt;
tablesExist=$(wc -c "sql/${REPO_KEY}_tables.txt" | awk '{print $1}')
# Reset pack_ sql files
if [[ -f sql/pack_structure.sql ]]; then rm sql/pack_structure.sql; fi
if [[ -f sql/pack_data.sql ]]; then rm sql/pack_data.sql; fi
touch sql/pack_structure.sql
touch sql/pack_data.sql
dump --add-drop-database --no-create-info --no-data --skip-comments --databases ka_local2019 >> sql/pack_structure.sql
# Process repository tables & data
if [[ ${tablesExist} -gt 0 ]]; then
dump --no-data --skip-comments ka_local2019 --tables `cat sql/${REPO_KEY}_tables.txt` >> sql/pack_structure.sql
dump-compact ka_local2019 --tables `cat sql/${REPO_KEY}_tables.txt` --where="flighter_key IS NULL" >> sql/pack_data.sql
sed -i "" "s/AUTO_INCREMENT=[0-9]+[ ]//g" sql/pack_structure.sql
fi
dump-compact ka_local2019 --where="flighter_key='${REPO_KEY}'" >> sql/pack_data.sql
isLocalHead=$(grep -c cjholowatyj .git/HEAD);
if [[ ${isLocalHead} = 1 ]]; then
dump-compact ka_local2019 --where="flighter_key='${REPO_KEY}-local'" >> sql/pack_data.sql
sed -i "" "s/\.\[${REPO_KEY}-local]//g" sql/pack_data.sql
fi
isDevelopHead=$(grep -c develop .git/HEAD);
if [[ ${isDevelopHead} = 1 ]]; then
dump-compact ka_local2019 --where="flighter_key='${REPO_KEY}-develop'" >> sql/pack_data.sql
sed -i "" "s/\.\[${REPO_KEY}-develop]//g" sql/pack_data.sql
sed -i "" "s/ka_local2019/ka_dev2019/g" sql/pack_structure.sql
sed -i "" "s/ka_local2019/ka_dev2019/g" sql/pack_structure.sql
fi
isReleaseHead=$(grep -c release .git/HEAD);
if [[ ${isReleaseHead} = 1 ]]; then
dump-compact ka_local2019 --where="flighter_key='${REPO_KEY}-release'" >> sql/pack_data.sql
sed -i "" "s/\.\[${REPO_KEY}-release]//g" sql/pack_data.sql
sed -i "" "s/ka_local2019/ka_rel2019/g" sql/pack_structure.sql
sed -i "" "s/ka_local2019/ka_rel2019/g" sql/pack_structure.sql
fi
# Create master structure sql file for this repository (and delete it once again if it is empty)
awk '/./ { e=0 } /^$/ { e += 1 } e <= 1' < sql/pack_structure.sql > sql/${REPO_KEY}_structure.sql
structureExists=$(wc -c "sql/${REPO_KEY}_structure.sql" | awk '{print $1}')
if [[ ${structureExists} -eq 0 ]]; then rm sql/${REPO_KEY}_structure.sql; fi
# Create master sql data file in case the entire database needs to be rebuilt from scratch
awk '/./ { e=0 } /^$/ { e += 1 } e <= 1' < sql/pack_data.sql > sql/all_${REPO_KEY}_data.sql
# Commit global repository sql files
git add sql/all_${REPO_KEY}_data.sql
if [[ ${structureExists} -gt 0 ]]; then git add sql/${REPO_KEY}_structure.sql; fi
# Deleting any existing sql files to recreate them fresh below
if [[ -f sql/create_modx_data.sql ]]; then rm sql/create_modx_data.sql; fi
if [[ -f sql/create_flighter_data.sql ]]; then rm sql/create_flighter_data.sql; fi
for i in "${MODULES_ARRAY[#]}"
do
if [[ -f sql/create_${i}_data.sql ]]; then rm sql/create_${i}_data.sql; fi
done
if [[ -f sql/create_${REPO_KEY}_data.sql ]]; then rm sql/create_${REPO_KEY}_data.sql; fi
# Parse global repository data and separate out data filed by table prefix
lastPrefix='';
lastTable='';
while IFS= read -r iLine;
do
thisLine="${iLine}";
thisPrefix=$(echo ${thisLine} | grep -oEi '^INSERT INTO `([0-9a-zA-Z]+)_' | cut -d ' ' -f 3 | cut -d '`' -f 2 | cut -d '_' -f 1);
thisTable=$(echo ${thisLine} | grep -oEi '^INSERT INTO `([0-9a-zA-Z_]+)`' | cut -d ' ' -f 3 | cut -d '`' -f 2);
if [[ $(echo -n ${thisPrefix} | wc -m) -gt 0 ]]; then
if [[ -n "${REPO_PREFIX_COUNTS[$thisPrefix]}" ]]; then
if [[ ${REPO_PREFIX_COUNTS[$thisPrefix]} -lt 1 ]]; then
if [[ -f sql/create_${thisPrefix}_data.sql ]]; then rm sql/create_${thisPrefix}_data.sql; fi
touch "sql/create_${thisPrefix}_data.sql";
fi
REPO_PREFIX_COUNTS[$thisPrefix]=0;
fi
(( REPO_PREFIX_COUNTS[$thisPrefix] += 1 ));
echo "${thisLine}" >> sql/create_${thisPrefix}_data.sql;
if [[ ${thisTable} != ${lastTable} ]]; then
if [[ ${thisPrefix} != ${lastPrefix} ]]; then
if [[ -f sql/delete_${thisPrefix}_data.sql ]]; then rm sql/delete_${thisPrefix}_data.sql; fi
touch "sql/delete_${thisPrefix}_data.sql";
fi
if [[ $(echo -n ${thisTable} | wc -m) -gt 0 ]]; then
echo "DELETE FROM \`${thisTable}\` WHERE \`flighter_key\` LIKE '${REPO_KEY}%';" >> sql/delete_${thisPrefix}_data.sql
fi
fi
# Add previous prefix sql file to git if lastPrefix isn't ''
if [[ $(echo -n ${lastPrefix} | wc -m) -gt 0 ]]; then
git add "sql/create_${lastPrefix}_data.sql";
git add "sql/delete_${lastPrefix}_data.sql";
fi
fi
lastPrefix=${thisPrefix};
lastTable=${thisTable};
done < sql/all_${REPO_KEY}_data.sql
# Add previous prefix sql file to git for the final lastPrefix value
git add "sql/create_${lastPrefix}_data.sql";
git add "sql/delete_${lastPrefix}_data.sql";
# Clean up unused files
rm "sql/${REPO_KEY}_tables.txt";
rm "sql/pack_data.sql";
rm "sql/pack_structure.sql";
git add sql/;
A couple of nuances worth noting:
(1) My code strips all the AUTO_INCREMENT cursors out of each table, because they were creating a lot of unnecessary changes in the SQL files, which ended up making commits more complicated.
(2) My code also strips out the database name itself, because on the production server I specify the database to use; it is not the same name as the one I use for local development, and we don't want the data going to the wrong place.
(3) This workflow separates the database structure and the data itself into different committed files, which may be a source of confusion if you didn't already pick up on that.
On the flip side, when deploying the project to a server, I also have code which iterates through all the *.sql files and imports them into my database one at a time. I won't share the exact code for security reasons, but the general gist is... mysql mysql_database < database_file.sql
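A minimal sketch of that import loop (under my assumptions: structure files load before data files, and the database name is a placeholder):
#!/bin/bash
# Recreate the database from the committed SQL files: structure first, then data.
for f in sql/*_structure.sql; do
  [ -f "$f" ] || continue  # skip unmatched globs
  mysql --defaults-file=mysql-defaults mysql_database < "$f"
done
for f in sql/create_*_data.sql; do
  [ -f "$f" ] || continue
  mysql --defaults-file=mysql-defaults mysql_database < "$f"
done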

Use arguments as a part of variable inside of function

If I type:
function chk_is_it_started(){
PROCC_NAME_$1="my_process_$1";
echo "PROCC_NAME_$1 is: $PROCC_NAME_$1";
PID_FILE_OF_APP_$1="/run/pidfile_$PROCC_NAME_$1.pid"
PATH_OF_PROCCESS_NAME_$1=`ps -aux|grep $PROCC_NAME_$1|grep -v grep|awk -F" " '{print $12}'`
PID_NUMBER_OF_APP_$1=`ps -aux|grep $PROCC_NAME_$1|grep -v grep|awk -F" " '{print $2}'`
NUMBER_OF_OCCURENCE_$1=`echo ${#PID_NUMBER_OF_APP_$1[@]}`
if [[ "$NUMBER_OF_OCCURENCE_$1" == 0 ]];then
echo -e "Proccess isn't started..\nNow process $PATH_OF_PROCCESS_NAME_$1 is running and I'm creating a PID file..."
python /emu/script/$PROCC_NAME_$1.py & disown & echo $! > $PID_FILE_OF_APP_$1
else
echo "Proccess is STARTRED"
fi
}
chk_is_it_started blabla;
I get this error:
root#orangepipc:~# chk_is_it_started blabla;
Could not find the database of available applications, run update-command-not-found as root to fix this
PROCC_NAME_blabla=my_process_blabla: command not found
PROCC_NAME_blabla is: blabla
-bash: PID_FILE_OF_APP_blabla=/run/pidfile_blabla.pid: No such file or directory
Could not find the database of available applications, run update-command-not-found as root to fix this
PATH_OF_PROCCESS_NAME_blabla=: command not found
Could not find the database of available applications, run update-command-not-found as root to fix this
PID_NUMBER_OF_APP_blabla=: command not found
-bash: ${#PID_NUMBER_OF_APP_$1[@]}: bad substitution
Could not find the database of available applications, run update-command-not-found as root to fix this
NUMBER_OF_OCCURENCE_blabla=: command not found
Proccess is STARTRED
But it is not!
Where am I making the mistake?
If I use the code without a function, it works!
Thanks
I found the solution...
function chk_is_it_started(){
PROCC_NAME="dht22_$1"
# echo "PROCC_NAME_$1 is: $PROCC_NAME"
PID_FILE_OF_APP="/run/pidfile_$PROCC_NAME.pid"
# echo "PID_FILE_OF_APP is: $PID_FILE_OF_APP"
PATH_OF_PROCCESS_NAME=`ps -aux|grep $PROCC_NAME|grep -v grep|awk -F" " '{print $12}'`
# echo "PATH_OF_PROCCESS_NAME is: $PATH_OF_PROCCESS_NAME"
PID_NUMBER_OF_APP=`ps -aux|grep $PROCC_NAME|grep -v grep|awk -F" " '{print $2}'`
# echo "PID_NUMBER_OF_APP is $PID_NUMBER_OF_APP"
PID_NUMBER_OF_APP=( $PID_NUMBER_OF_APP )
# echo "PID_NUMBER_OF_APP is $PID_NUMBER_OF_APP"
NUMBER_OF_OCCURENCE=`echo ${#PID_NUMBER_OF_APP[@]}`
# echo "NUMBER_OF_OCCURENCE is: $NUMBER_OF_OCCURENCE"
if [[ "$NUMBER_OF_OCCURENCE" == 0 ]];then
echo -e "Proccess isn't started..\nNow process $PATH_OF_PROCCESS_NAME is running and I create a PID file..."
python /emu/script/$PROCC_NAME.py & disown & echo $! > $PID_FILE_OF_APP
# exit
else
echo "Proccess is STARTRED"
fi
if [[ "$NUMBER_OF_OCCURENCE" > 1 ]];then
echo -e "Process $PROCC_NAME.py is started more than 1x"
echo -e "Now killing all proccess one by one"
while [ "$NUMBER_OF_OCCURENCE" != "1" ];
do
echo "Usao sam u while"
PID_NUMBER_OF_APP=`ps -aux|grep $PROCC_NAME|grep -v grep|awk -F" " '{print $2}'`
echo "PID_NUMBER_OF_APP is: $PID_NUMBER_OF_APP"
PID_NUMBER_OF_APP=( $PID_NUMBER_OF_APP )
echo "PID_NUMBER_OF_APP is $PID_NUMBER_OF_APP"
NUMBER_OF_OCCURENCE=`echo ${#PID_NUMBER_OF_APP[@]}`
echo "NUMBER_OF_OCCURENCE is: $NUMBER_OF_OCCURENCE"
kill $PID_NUMBER_OF_APP
rm -fr $PID_FILE_OF_APP
done
echo -e "Starting process $PROCC_NAME.py and creating a PID file..."
python /emu/script/$PROCC_NAME.py & echo $! > $PID_FILE_OF_APP
fi
}
chk_is_it_started bla1
chk_is_it_started bla2
Btw, saluting the user who gave my question a -1 vote :)
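For completeness: if you really do want the argument embedded in the variable name, bash can do it with declare and namerefs (bash 4.3+). A minimal sketch (the variable here is made up for illustration):
chk_is_it_started() {
  declare -g "PROCC_NAME_$1=my_process_$1"  # create the dynamically named variable
  declare -n REF="PROCC_NAME_$1"            # nameref: read/write it via the computed name
  echo "PROCC_NAME_$1 is: $REF"
}
chk_is_it_started blabla
Plain variables or arrays, as in the fix above, are usually the simpler choice, though.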

How can I execute MySQL commands line by line from bash and capture the output?

If there is an alternative to bash, I will appreciate it too.
I have a large dump of MySQL commands (over 10 GB).
When restoring the dump I get a few warnings and occasionally an error. I need to execute those commands and process all the warnings and errors, preferably automatically.
mysql --show-warnings
tee logfile.log
source dump.sql
The logfile will contain many lines saying each command was successful, and will display some warnings, particularly about truncated columns. But since the original file has tens of thousands of very large INSERTs, the log is not particularly helpful. Besides, this requires some kind of supervised interaction, so I cannot schedule it from a crontab, for example.
#!/bin/bash
echo "tee logfile.log" > script.sql
echo "source $1" > script.sql
mysql --show-warnings < script.sql > tmpfile.log 2>&1
cat tmpfile.log >> logfile.log
The tee command doesn't work in this batch environment. I can capture all the warnings, but I cannot figure out which command produced each warning.
So I came up with this small monstrosity:
#!/bin/bash
ERRFILE=$(basename "$0" .sh).err.log
LOGFILE=$(basename "$1" .sql).log
log_action() {
WARN=$(cat)
[ -z "$WARN" ] || echo -e "Line ${1}: ${WARN}\n${2}" >> "$LOGFILE"
}
echo 0 > "$ERRFILE"
log_error() {
ERNO=$(cat "$ERRFILE")
ERR=$(cat)
[ -z "$ERR" ] || echo -e "*** ERROR ***\nLine ${1}: ${ERR}\n${2}" >> "$LOGFILE"
(( ERNO++ ))
echo $ERNO > "$ERRFILE"
}
COUNT=0
COMMAND=''
echo -e "**** BEGIN $(date +%Y-%m-%d\ %H:%M:%S)\n" > "$LOGFILE"
exec 4> >(log_action $COUNT "$COMMAND")
exec 5> >(log_error $COUNT "$COMMAND")
exec 3> >(mysql --show-warnings >&4 2>&5)
while IFS='' read -r LINE || [[ -n "$LINE" ]]
do
(( COUNT++ ))
[ ${#LINE} -eq 0 ] && continue # discard blank lines
[ "${LINE:0:2}" = "--" ] && continue # discard comments
COMMAND+="$LINE" # build command
[ "${LINE: -1}" != ";" ] && continue # if not finnished keep building
echo $COMMAND >&3 # otherwise execute
COMMAND=''
done < "$1"
exec 3>&-
exec 5>&-
exec 4>&-
echo -e "**** END $(date +%Y-%m-%d\ %H:%M:%S)\n" >> "$LOGFILE"
ERRS=$(cat "$ERRFILE")
[ "ERRS" = 0 ] || echo "${ERRS} Errors." >&2
This scans the file at $1 and sends the commands to an open MySQL connection at &3. That part is working fine.
The capture of warnings and errors is not working though.
It only records the first error.
It only records the first warning.
I haven't found a good way to pass the line number $COUNT and the offending command $COMMAND to the recording functions.
In the log, the only error appears after the timestamps, and the only warning after the error, which is not the chronology of the script.
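For what it's worth, one way to tie every warning and error to its statement is to drop the coprocess plumbing and run one mysql invocation per statement. This is slow for a 10 GB dump and loses per-session state (SET, USE, open transactions), but as a sketch of the idea (mydb is a placeholder):
#!/bin/bash
COUNT=0
COMMAND=''
while IFS='' read -r LINE || [[ -n "$LINE" ]]
do
  (( COUNT++ ))
  [ ${#LINE} -eq 0 ] && continue
  [ "${LINE:0:2}" = "--" ] && continue
  COMMAND+="$LINE"
  [ "${LINE: -1}" != ";" ] && continue
  # run the finished statement; warnings and errors arrive on the same capture
  OUT=$(mysql --show-warnings mydb -e "$COMMAND" 2>&1) || echo "Error at line $COUNT" >&2
  [ -n "$OUT" ] && printf 'Line %d: %s\n%s\n' "$COUNT" "$COMMAND" "$OUT" >> logfile.log
  COMMAND=''
done < "$1"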

Adding header to all .csv files in folder and include filename

I'm a command line newbie and I'm trying to figure out how I can add a header to multiple .csv files. The new header should have the following: 'TaxID' and 'filename'
I've tried multiple commands like sed, ed, awk, and echo, but when one worked it only changed the first file it found (I used *.csv in my command), and I could only manage this for TaxID.
Can anyone help me to get the filename into the header as well and do this for all my csv files?
(Note, I'm using a Mac)
Thank you!
Here's one way to do it, there are certainly others:
$ for i in *.csv;do echo $i;cp "$i" "$i.bak" && { echo "TaxID,$i"; cat "$i.bak"; } >"$i";done
Here's a sample run:
$ cat file1.csv
1,2
3,4
$ cat file2.csv
a,b
c,d
$ for i in *.csv;do echo $i;cp "$i" "$i.bak" && { echo "TaxID,$i"; cat "$i.bak"; } >"$i";done
file1.csv
file2.csv
$ cat file1.csv.bak
1,2
3,4
$ cat file1.csv
TaxID,file1.csv
1,2
3,4
$ cat file2.csv.bak
a,b
c,d
$ cat file2.csv
TaxID,file2.csv
a,b
c,d
Breaking it down:
$ for i in *.csv; do
This loops over all the files ending in .csv in the current directory. Each will be put in the shell variable i in turn.
echo $i;
This just echoes the current filename so you can see the progress. This can be safely left out.
cp "$i" "$i.bak"
Copy the current file (whose name is in i) to a backup. This is both to preserve the file if something goes awry, and gives subsequent commands something to copy from.
&&
Only run the subsequent commands if the cp succeeds. If you can't make a backup, don't continue.
{
Start a group command.
echo "TaxID,$i";
Output the desired header.
cat "$i.bak";
Output the original file.
}
End the group command.
>"$i";
Redirect the output of the group command (the new header and the contents of the original file) to the original file. This completes one file.
done
Finish the loop over all the files.
For fun, here are a couple of other ways (one of which JRD beat me to), including one using ed!
$ for i in *.csv;do echo $i;perl -p -i.bak -e 'print "TaxID,$ARGV\n" if $. == 1' "$i";done
$ for i in *.csv;do echo $i;echo -e "1i\nTaxID,$i\n.\nw\nq\n" | ed "$i";done
Here is one way in perl that modifies the files in place by adding a header of TaxID,{filename}, skipping the header if it thinks one already exists.
ls
a.csv b.csv
cat a.csv
1,a.txt
2,b.txt
cat b.csv
3,c.txt
4,d.txt
ls *.csv | xargs -I{} -n 1 \
perl -p -i -e 'print "TaxID,{}\n" if !m#^TaxID# && !$h; $h = 1;' {}
cat a.csv
TaxID,a.csv
1,a.txt
2,b.txt
cat b.csv
TaxID,b.csv
3,c.txt
4,d.txt
You may want to create some backups of your files, or run on a few sample copies before running in earnest.
Explanatory:
List all files in the directory with a .csv extension
ls *.csv
"Pipe" the output of ls command into xargs so the perl command can run for each file. -I{} allows the filename to be subsequently referenced with {}. -n tells xargs to only pass 1 file at a time to perl.
| xargs -I{} -n 1
-p print each line of the input (file)
-i modifying the file in place
-e execute the following code
perl -p -i -e
Perl will implicitly loop over each line of the file and print it (due to -p). Print the header if we have not printed the header already and the current line doesn't already look like a header.
'print "TaxID,{}\n" if !m#^TaxID# && !$h; $h = 1;'
This is replaced with the filename.
{}
All told, in this example the commands to be run would be:
perl -p -i -e 'print "TaxID,{}\n" if !m#^TaxID# && !$h; $h = 1;' a.csv
perl -p -i -e 'print "TaxID,{}\n" if !m#^TaxID# && !$h; $h = 1;' b.csv

Extract a table from a MySQL dump with tac, grep and sed...in reverse

I was following this guide on extracting a table from a mysql dump with grep, so I wouldn't have to restore all 50GB of data to have a peek at one table. The two main commands to pull the table are:
grep -n "Table structure" [MySQL_dump_filename].sql
which gets the line numbers for table definitions, then
sed -n '[starting_line_number],[ending_line_number] p' [MySQL_dump_filename].sql > [table_output_filename].sql
I would like to search the .sql dump in reverse order though, as what I need is towards the end of the file, and it would take quite a while to grep through the first 48 GB of data. I'm on OS X and installed tac (via brew, as noted here). But is it possible to set up the command to accomplish this and have it quit after sed grabs the needed lines? If not, I might as well grep from the beginning and not use tac at all, just wait it out, or Ctrl-C once I see the file populated in another terminal.
Example run:
$ tac dump.sql | grep -n "Table structure"
...
751:-- Table structure for table `answer`
779:-- Table structure for table `template`
806:-- Table structure for table `resource`
...
But of course those are the line numbers in reverse, so if you need the 'template' table you would need sed -n '752,779 p', counted from the end of the file; otherwise you'll get the wrong lines (sed counts from the beginning of the file).
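If you do want to reuse those reverse-order numbers against the original file, line n counted from the end is line total - n + 1 counted from the start; for the 'template' table above, roughly:
TOTAL=$(wc -l < dump.sql)
sed -n "$((TOTAL - 779 + 1)),$((TOTAL - 752 + 1)) p" dump.sql > template.sql
(sed still reads from the top, but only once.)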
A few quick pointers:
dd can help you skip the first N bytes/blocks very fast if you are sure those first N GB are not useful.
After skipping, there is no need to 1) grep to find the line numbers and then 2) sed to skip to line number n (reading the huge remainder twice): you can do it directly with:
awk '/beginningpattern/,/endpattern/ { print $0 }' # warning: incomplete syntax; read up on awk and its prowess. You can do all sorts of neat things.
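For instance, to pull one table's definition straight out of the dump, exiting as soon as it is found so the rest of the 50 GB is never read (the patterns assume the standard mysqldump comment format, and 'template' is an example name):
awk '/^-- Table structure for table `template`/ {p=1} p {print} p && /^\) ENGINE=/ {exit}' dump.sql > template_structure.sql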
Here is a more streamlined way to know where all table definitions begin and end.
For a given file rolando.sql, create a script that does the following:
DAT=rolando.sql
TBLMAP=tblmap.txt
TBLLST=${TBLMAP}.lst
TBLTMP=${TBLMAP}.tmp
RUNMAP=DisplayTables.sh
grep -n "^-- Table structure" ${DAT} |sed 's/:/ /'| awk '{print $1}' > ${TBLTMP}
grep -n "^) ENGINE=" ${DAT} |sed 's/:/ /'| awk '{print $1}' >> ${TBLTMP}
sort -n < ${TBLTMP} > ${TBLLST}
rm -f ${TBLTMP}
rm -f ${TBLMAP}
POS=1
for X in `cat ${TBLLST}`
do
(( POS = 1 - POS ))
if [ ${POS} -eq 0 ]
then
(( Y = X - 2 ))
fi
if [ ${POS} -eq 1 ]
then
echo "${Y},${X}" >> ${TBLMAP}
fi
done
rm -f ${TBLLST}
echo "Table Structures From ${DAT}"
for XY in `cat ${TBLMAP}`
do
echo "sed -n '${XY}p' ${DAT}" >> ${RUNMAP}
done
chmod +x ${RUNMAP}
./${RUNMAP}
This script will output every CREATE TABLE statement for you, and it will include the DROP TABLE statements too. If you do not want the drop table statements, use this one:
DAT=rolando.sql
TBLMAP=tblmap.txt
TBLLST=${TBLMAP}.lst
TBLTMP=${TBLMAP}.tmp
RUNMAP=DisplayTables.sh
grep -n "^CREATE TABLE" ${DAT} | sed 's/:/ /' | awk '{print $1}' > ${TBLTMP}
grep -n "^) ENGINE=" ${DAT} | sed 's/:/ /' | awk '{print $1}' >> ${TBLTMP}
sort -n < ${TBLTMP} > ${TBLLST}
rm -f ${TBLTMP}
rm -f ${TBLMAP}
POS=1
for X in `cat ${TBLLST}`
do
(( POS = 1 - POS ))
if [ ${POS} -eq 0 ]
then
(( Y = X ))
fi
if [ ${POS} -eq 1 ]
then
echo "${Y},${X}" >> ${TBLMAP}
fi
done
rm -f ${TBLLST}
echo echo "Table Structures From ${DAT}" > ${RUNMAP}
for XY in `cat ${TBLMAP}`
do
echo "sed -n '${XY}p' ${DAT}" >> ${RUNMAP}
done
chmod +x ${RUNMAP}
./${RUNMAP}
Give it a Try !!!