My goal is to analyse web-application logs, using MySQL as the database. First I filter out useless information with awk to generate a filtered log, then I use LOAD DATA to import that log into MySQL.
My problem is: the original logs are generated every 10 minutes, all day long. How can I generate the filtered logs as soon as new web-application logs appear? And once new filtered logs are generated, how can I import those files into MySQL automatically?
the original logs:
20150414/0900.log
20150414/0910.log
I've put together a little script that should illustrate one way to do it. It uses awk to keep track of the files that have already been read. Whenever the number of log files has grown since the last check, the script extracts the new names and appends them to a "readFiles" file, which the awk program consults to make sure a file has not been processed before.
Please make sure your system does not delete old logs while this is running, and be careful to split the "readFiles" control file, or create a new one each day, so it does not grow too large.
# this will give you today's date
date +%Y%m%d
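For instance, that value can be used to restrict the search to today's directory (matching the 20150414/0900.log layout above); the today variable here is only an illustration and is not part of the script below:
today=$(date +%Y%m%d)
ls ./"$today"/*.log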
This is the code:
echo "x" > readFiles
lastnum=0
num=0
count=0
while true
do
echo "LOOKING FOR NEW FILES. LASTCOUNT="$lastcount
count=`ls ./2015*/*.log | wc -l`
echo $count
if [ $count -gt $lastnum ]
then
lastnum=$count
`ls ./2015*/*.log | awk -F"/" 'BEGIN {
while(( getline < "readFiles") > 0 ) {
readedFiles[$0]
}}
{if(!($0 in readedFiles)){print $0}}
'`>> readFiles
echo "WAITING RESTART"
sleep 10
else
echo "NO NEW FILES FOUND"
sleep 10
fi
done
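The loop above only records which files are new; each of them still has to be filtered and loaded. A minimal sketch of that step, assuming a hypothetical filter.awk containing your existing awk filter, a database logdb and a table weblog (all placeholder names; adjust credentials and field delimiters to your schema, and note that LOAD DATA LOCAL requires local_infile to be enabled):
while read -r logfile
do
    filtered="/tmp/$(basename "$logfile").filtered"
    awk -f filter.awk "$logfile" > "$filtered"
    mysql logdb -e "LOAD DATA LOCAL INFILE '$filtered' INTO TABLE weblog FIELDS TERMINATED BY ' ' LINES TERMINATED BY '\n'"
done < newFiles
Here newFiles stands for the list of names the awk step printed before it was appended to readFiles (for example captured with tee).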
Instead of writing a script to monitor the logs, I use inotify-tools to trigger scripts on filesystem events; just a few lines get things done.
while true
do
    NOW=$(date +"%Y%m%d")   # recompute each pass so the watch follows the current day's directory
    inotifywait -r -e create,move /rsynclog/logs/$NOW && \
        /rsynclog/logs/generate.sh
done
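If generate.sh needs to know which file triggered the event, inotifywait can also print it; a sketch, assuming generate.sh accepts the path as its first argument (the answer does not show that script's contents):
inotifywait -m -r -e create,move --format '%w%f' /rsynclog/logs/$NOW |
while read -r newfile
do
    /rsynclog/logs/generate.sh "$newfile"
done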
In a project, we use 2 IDEs. The project contains hundreds of code files and hundreds of special JSON files which are constantly reread and rewritten by these IDEs. While we used a single IDE, this wasn't a problem: the files were always written the same way. Unfortunately, different IDEs save JSON with different ordering, which leads to dozens of changes for Git and a uselessly bloated diff. These files are important and must not be excluded via .gitignore, but they rarely actually change, so this can probably be handled manually.
So, is there a terminal command to quickly undo/unselect changes for a specific file extension? Or maybe it is possible for Git to track changes in the JSON files without considering the ordering?
I also had the idea of using a custom script to reorder the JSONs, but that would consume too much CPU and also cause the IDEs to reread the files, which is also bad.
Update
I found the following command from another SO question:
git checkout main -- $(git ls-files -- "*.yy")
This workaround isn't handy but basically solves the problem. If anybody knows how to make GIT ignore JSON ordering, it would be great!
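For the JSON files in question, the same pattern would presumably be:
git checkout main -- $(git ls-files -- "*.json")
(assuming, as in the command above, that main is the branch to restore from).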
One way to temporarily ignore changes to the json files is to tell git to assume they haven't changed:
git update-index --assume-unchanged file-to-ignore.json
And only when you want to commit, tell git to really look at the file again:
git update-index --no-assume-unchanged file-to-ignore.json
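Since the question mentions hundreds of such files, the flag can also be applied to every tracked JSON file at once (and reversed the same way with --no-assume-unchanged):
git ls-files -z -- "*.json" | xargs -0 git update-index --assume-unchanged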
Another option would be to use a pre-commit-hook to sort the json only when committing.
I'd make a git pre-commit hook to make sure all JSONs are always formatted the same way; for example, in .git/hooks/pre-commit put
#!/bin/sh
php git/precommit_hook.php
exit $?
and if you're on a unix system, make sure the hook is executable: chmod +x .git/hooks/pre-commit
and in git/precommit_hook.php put
<?php
declare (strict_types = 1);
if(PHP_VERSION_ID < 70300) {
fwrite(STDERR, "PHP 7.3 or higher is required to run this script");
exit(1);
}
$changed_files = explode("\x00", rtrim(shell_exec("git diff --name-only --cached -z"), "\x00"));
foreach ($changed_files as $file) {
if(!file_exists($file)) {
// File was deleted, skip it
continue;
}
$ext = pathinfo($file, PATHINFO_EXTENSION);
if ($ext === "json") {
$json = json_decode(file_get_contents($file), true);
if (json_last_error() !== JSON_ERROR_NONE) {
fwrite(STDERR, "JSON Error: " . json_last_error_msg() . " in $file, will not format it\n");
continue;
}
$json = json_encode($json, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
file_put_contents($file, $json, LOCK_EX);
}
}
Now all *.json files will be committed formatted by PHP's json_encode with the flags JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR,
no matter which IDE you use :)
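Note that re-encoding alone keeps the existing key order; if the noisy diffs come from the IDEs writing keys in different orders, a similar hook can normalise that with jq instead (a sketch assuming jq is installed; -S sorts object keys):
#!/bin/sh
# sort the keys of every staged *.json file with jq, then restage it
git diff --name-only --cached -- '*.json' | while read -r f
do
    [ -f "$f" ] || continue                      # skip deleted files
    jq -S . "$f" > "$f.tmp" && mv "$f.tmp" "$f" && git add "$f"
done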
I've got a MySQL DB set up on my system for local testing, and I'm monitoring the tables to see when a change is made.
Step 1 - Go to DIR
cd /usr/local/mysql-5.7.16-osx10.11-x86_64/data/blog_atom_tables/
Step 2 - Run Script
watchDB
Where watchDB() is (slightly modified for readability)...
function watchDB() {
    declare -A aa                        # associative array of table names and their md5 hashes
    declare k                            # holder for current md5
    while true; do                       # run forever
        # loop through all table files within the directory
        for i in *.ibd; do
            k=$(sudo md5 -q "$i")        # md5 of file (table)
            t=$(echo "$i" | cut -f 1 -d '.')
            if [[ ${aa[$t]} == "" ]]; then
                # table has not been hashed yet
                aa[$t]=$k
            elif [[ ${aa[$t]} != "$k" ]]; then
                # table has been hashed before and the md5 differs (i.e. table changed)
                echo "$i"
                aa[$t]=$k
            fi
        done
    done
}
TL;DR Loop through all the table files within the directory, save a copy of each md5, and continue looping through checking for a change.
I don't need to see what rows/columns have been changed, only that the table itself is different. For the most part, this works exactly as I want, but calculating the md5 for every table takes a noticeable amount of time. For only 25 tables, it takes between 3 and 5 seconds to execute each loop.
Is there a quicker way to do this, other than md5? I'd use something like cmp, but I need to save a reference of the current state of the file, so I have something to compare it against.
This is only about 1/6 of the total tables that will eventually be in there, so any improvement on speed is welcome.
While it's not really checking the content of the file, you could use file system attributes as a simple way to monitor for changes. Unless the filesystem is mounted with the timestamps disabled, you can monitor the access time and modification time timestamps:
stat -f "%m" <filename>
The filesystem driver knows when reads and writes occur and subsequently updates the timestamps.
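As a rough sketch, the watchDB loop above could compare modification times instead of hashes (stat -f "%m" is the BSD/macOS form shown here; on Linux the equivalent is stat -c "%Y"):
function watchDB() {
    declare -A mt                        # table file name -> last seen modification time
    while true; do
        for i in *.ibd; do
            t=$(stat -f "%m" "$i")       # seconds since epoch of last modification
            if [[ -n ${mt[$i]} && ${mt[$i]} != "$t" ]]; then
                echo "$i"                # table file changed
            fi
            mt[$i]=$t
        done
        sleep 1                          # avoid a tight busy loop
    done
}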
Hi & thanks in advance.
I'm trying to update a column (version) in a MySQL table from a Bash script.
I've populated a variable with the version numbers, but it fails after applying the first version in the list.
CODE:
UP_VER=`seq ${DB_VER} ${LT_VER} | sed '1d'`
UP_DB=`echo "UPDATE client SET current_db_vers='${UP_VER}' WHERE client_name='${CLIENT}'" | ${MYSQL_ID}`
while read -r line
do
${UP_DB}
if [[ "${OUT}" -eq "0" ]]; then
echo "Database upgraded.."
else
echo "Failed to upgrade.."
exit 1
fi
done < "${UP_VER}"
Thanks
Hopefully solved... My $UP_VER is in a row, not a column.
You're misunderstanding what several shell constructs do:
var=`command` # This executes the command immediately, and stores
# its result (NOT the command itself) in the variable
... < "${UP_VER}" # Treats the contents of $UP_VER as a filename, and tries
# to use that file as input
if [[ "${OUT}" -eq "0" ]]; then # $OUT is not defined anywhere
... current_db_vers='${UP_VER}' ... # this sets current_db_vers to the entire
# list of versions at once
Also, in the shell it's best to use lowercase (or mixed-case) variable names to avoid conflicts with the variables that have special meanings (which are all uppercase).
To fix the first problem, my recommendation is don't try to store shell commands in variables, it doesn't work right. (See BashFAQ #50: I'm trying to put a command in a variable, but the complex cases always fail!.) Either use a function, or just write the command directly where it's going to be executed. In this case I'd vote for just putting it directly where it's going to be executed. BTW, you're making the same mistake with ${MYSQL_ID}, so I'd recommend fixing that as well.
For the second problem, you can use <<< "${UP_VER}" to feed a variable's contents as input (although this is a bashism, and not available in generic posix shells). But in this case I'd just use a for loop:
for ((ver=db_ver+1; ver<=lt_ver; ver++)); do
For the third problem, the simplest way to test the success of a command is to put it directly in the if:
if somecommand; then
echo "Database upgraded.."
else # ... etc
So, here's my take at a rewrite:
mysql_id() {
# appropriate function definition goes here...
}
for ((ver=db_ver+1; ver<=lt_ver; ver++)); do
if echo "UPDATE client SET current_db_vers='${ver}' WHERE client_name='${client}'" | mysql_id; then
echo "Database upgraded.."
else
echo "Failed to upgrade.."
exit 1
fi
done
... but I'm not sure I understand what it's supposed to do. It seems to be updating current_db_vers one number at a time until it reaches $lt_ver... but why not set it directly to $lt_ver in a single UPDATE?
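For reference, that single-statement version might look like the sketch below; mysql_id here is still just a hypothetical wrapper around the mysql client (adjust credentials and database name to your setup):
mysql_id() {
    # hypothetical wrapper around the mysql client
    mysql --batch --skip-column-names -u "$db_user" -p"$db_pass" "$db_name"
}

if echo "UPDATE client SET current_db_vers='${lt_ver}' WHERE client_name='${client}'" | mysql_id; then
    echo "Database upgraded.."
else
    echo "Failed to upgrade.."
    exit 1
fi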
try something like :
done <<< "${UP_VER}"
I have a directory with about 2.5 million files and is over 70 GB.
I want to split this into subdirectories, each with 1000 files in them.
Here's the command I've tried using:
i=0; for f in *; do d=dir_$(printf %03d $((i/1000+1))); mkdir -p $d; mv "$f" $d; let i++; done
That command works for me on a small scale, but I can leave it running for hours on this directory and it doesn't seem to do anything.
I'm open to doing this any way via the command line: Perl, Python, etc. Just whichever way would be the fastest to get this done...
I suspect that if you checked, you'd notice your program was actually moving the files, albeit really slowly. Launching a program is rather expensive (at least compared to making a system call), and you do so three or four times per file! As such, the following should be much faster:
perl -e'
my $base_dir_qfn = ".";
my $i = 0;
my $dir_qfn;
opendir(my $dh, $base_dir_qfn)
or die("Can'\''t open dir \"$base_dir_qfn\": $!\n");
while (defined( my $fn = readdir($dh) )) {
next if $fn =~ /^(?:\.\.?|dir_\d+)\z/;
my $qfn = "$base_dir_qfn/$fn";
if ($i % 1000 == 0) {
$dir_qfn = sprintf("%s/dir_%03d", $base_dir_qfn, int($i/1000)+1);
mkdir($dir_qfn)
or die("Can'\''t make directory \"$dir_qfn\": $!\n");
}
rename($qfn, "$dir_qfn/$fn")
or do {
warn("Can'\''t move \"$qfn\" into \"$dir_qfn\": $!\n");
next;
};
++$i;
}
'
Note: ikegami's helpful Perl-based answer is the way to go - it performs the entire operation in a single process and is therefore much faster than the Bash + standard utilities solution below.
A bash-based solution needs to avoid loops in which external utilities are called in order to perform reasonably.
Your own solution calls two external utilities and creates a subshell in each loop iteration, which means that you'll end up creating about 7.5 million processes(!) in total.
The following solution avoids loops, but, given the sheer number of input files, will still take quite a while to complete (you'll end up creating 4 processes for every 1000 input files, i.e., ca. 10,000 processes in total):
printf '%s\0' * | xargs -0 -n 1000 bash -O nullglob -c '
  dirs=( dir_*/ )
  dir=dir_$(printf %04d $(( 1 + ${#dirs[@]} )))
  mkdir "$dir"; mv "$@" "$dir"' -
printf '%s\0' * prints a NUL-separated list of all files in the dir.
Note that since printf is a Bash builtin rather than an external utility, the max. command-line length as reported by getconf ARG_MAX does not apply.
xargs -0 -n 1000 invokes the specified command with chunks of 1000 input filenames.
Note that xargs -0 is nonstandard, but supported on both Linux and BSD/OSX.
Using NUL-separated input robustly passes filenames without fear of inadvertently splitting them into multiple parts, and even works with filenames with embedded newlines (though such filenames are very rare).
bash -O nullglob -c executes the specified command string with option nullglob turned on, which means that a globbing pattern that matches nothing will expand to the empty string.
The command string counts the output directories created so far, so as to determine the name of the next output dir with the next higher index, creates the next output dir, and moves the current batch of (up to) 1000 files there.
If the directory is not in use, I suggest the following:
find . -maxdepth 1 -type f | split -l 1000 -d -a 5
This will create about 2500 list files named x00000, x00001, ... (-a 5 just makes sure there are 5 digits, although 4 would work too). You can then move the 1000 files listed in each list file into a corresponding directory.
Perhaps set -o noclobber to eliminate the risk of overwrites in case of a name clash.
to move the files, it's easier to use brace expansion to iterate over file names
for c in x{00000..02500};
do d="d$c";
mkdir $d;
cat $c | xargs -I f mv f $d;
done
Moving files around is always a challenge. IMHO all the solutions presented so far have some risk of destroying your files. This may be because the challenge sounds simple, but there is a lot to consider and to test when implementing it.
We must also not underestimate the efficiency of the solution as we are potentially handling a (very) large number of files.
Here is a script that I carefully and intensively tested with my own files. But of course, use it at your own risk!
This solution:
is safe with filenames that contain spaces.
does not use xargs -L because this will easily result in "Argument list too long" errors
is based on Bash 4 and does not depend on awk, sed, tr etc.
scales well with the number of files to move.
Here is the code:
if [[ "${BASH_VERSINFO[0]}" -lt 4 ]]; then
echo "$(basename "$0") requires Bash 4+"
exit -1
fi >&2
opt_dir=${1:-.}
opt_max=1000
readarray files <<< "$(find "$opt_dir" -maxdepth 1 -mindepth 1 -type f)"
moved=0 dirnum=0 dirname=''
for ((i=0; i < ${#files[@]}; ++i))
do
if [[ $((i % opt_max)) == 0 ]]; then
((dirnum++))
dirname="$opt_dir/$(printf "%02d" $dirnum)"
fi
# chops the LF printed by "find"
file=${files[$i]::-1}
if [[ -n $file ]]; then
[[ -d $dirname ]] || mkdir -v "$dirname" || exit
mv "$file" "$dirname" || exit
((moved++))
fi
done
echo "moved $moved file(s)"
For example, save this as split_directory.sh. Now let's assume you have 2001 files in some/dir:
$ split_directory.sh some/dir
mkdir: created directory some/dir/01
mkdir: created directory some/dir/02
mkdir: created directory some/dir/03
moved 2001 file(s)
Now the new reality looks like this:
some/dir contains 3 directories and 0 files
some/dir/01 contains 1000 files
some/dir/02 contains 1000 files
some/dir/03 contains 1 file
Calling the script again on the same directory is safe and returns almost immediately:
$ split_directory.sh some/dir
moved 0 file(s)
Finally, let's take a look at the special case where we call the script on one of the generated directories:
$ time split_directory.sh some/dir/01
mkdir: created directory 'some/dir/01/01'
moved 1000 file(s)
real 0m19.265s
user 0m4.462s
sys 0m11.184s
$ time split_directory.sh some/dir/01
moved 0 file(s)
real 0m0.140s
user 0m0.015s
sys 0m0.123s
Note that this test ran on a fairly slow, veteran computer.
Good luck :-)
This is probably slower than a Perl program (1 minute for 10.000 files) but it should work with any POSIX compliant shell.
#! /bin/sh
nd=0
nf=0
/bin/ls | \
while read file;
do
case $(expr $nf % 1000) in
0)
nd=$(/usr/bin/expr $nd + 1)
dir=$(printf "dir_%04d" $nd)
mkdir $dir
;;
esac
mv "$file" "$dir/$file"
nf=$(/usr/bin/expr $nf + 1)
done
With bash, you can use arithmetic expansion $((...)).
And of course this idea can be improved by using xargs - should not take longer than ~ 45 sec for 2.5 million files.
nd=0
ls | xargs -L 1000 echo | \
while read cmd;
do
nd=$((nd+1))
dir=$(printf "dir_%04d" $nd)
mkdir $dir
mv $cmd $dir
done
I would use the following from the command line:
find . -maxdepth 1 -type f |split -l 1000
for i in `ls x*`
do
mkdir dir$i
mv `cat $i` dir$i 2>/dev/null &
done
The key is the "&", which runs each mv in the background.
Thanks to karakfa for the split idea.
I'm trying to write a bash script that, among other things, extracts information from a mysql database. I tried the following to extract a file from entry 20:
mysql -se "select file_column_name from table where id=20;" >file.txt
That gave me a file.txt with the file name, not the file contents. How would I get the actual blob into file.txt?
Turn the value in file.txt into a variable and then use it as you need to? i.e.
blobFile=$(cat file.txt)
echo "----- contents of $blobFile ---------"
cat $blobFile
# copy the file somewhere else
scp $blobFile user@Remote:/path/to/remote/loc/for/blobFile
# look for info in blobfile
grep specialInfo $blobFile
# etc ...
Is that what you want/need to do?
I hope this helps.
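If the column actually stores the blob bytes themselves rather than a path (the question does not say which), one option is to have the server write the raw value with SELECT ... INTO DUMPFILE; this requires the FILE privilege and writes the file on the database host. your_table and file_column_name below follow the placeholders used in the question:
# sketch: dump the raw column value to a file on the MySQL server host
mysql -se "SELECT file_column_name FROM your_table WHERE id=20 INTO DUMPFILE '/tmp/blob_20.bin'"
A client-side alternative that avoids the FILE privilege is mysql --batch --raw --skip-column-names -e "SELECT ..." > file.bin, which writes the value without escape conversion.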