Fastest way to search a string in a mysql database - mysql

When searching for a string in a MySQL database without knowing which table or column it might be in, I usually resort to
mysqldump --extended-insert=FALSE --complete-insert=TRUE dbname | grep SomeString
That dumps a lot of unneeded data, ignores indexes invented for that exact purpose, and makes it rather hard to locate the results. Is there a more convenient and more performant way? (Short of installing third-party software.)
About duplicate questions: as a comment pointed out, I am not restricted to SQL queries and will accept any solution, be it a bash script, some CLI feature I might have missed, or grepping the DB files as suggested.
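For reference, one way to stay inside MySQL without third-party tools is to generate per-column queries from information_schema (a sketch; dbname and SomeString are placeholders, and credentials are assumed to come from ~/.my.cnf or extra flags). Note that a leading-wildcard LIKE still cannot use ordinary indexes; only a FULLTEXT index queried via MATCH ... AGAINST would:

DB=dbname
NEEDLE=SomeString
mysql -N -e "SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='$DB'" |
while read -r tbl col; do
    # count matching rows per column; errors on non-comparable types are ignored
    n=$(mysql -N -e "SELECT COUNT(*) FROM \`$DB\`.\`$tbl\` WHERE \`$col\` LIKE '%$NEEDLE%'" 2>/dev/null)
    [ "${n:-0}" -gt 0 ] && echo "$DB.$tbl.$col: $n row(s)"
done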

I found a find command that lists the files containing a specific string:
find [path of data folder of database]/*.ibd -type f -exec grep -i '[search string]' {} +
When the string is found, it outputs:
Binary file [path]/[table].ibd matches
It outputs nothing when the string is not found.
The parts mean:
find : the Linux find command
path : the path to the database folder under mysql/data, the directory which contains the table's .ibd files
-type f : only search files
-exec : execute this while listing
grep -i : case-insensitive text search
string : the string to search for
For more background, read this:
https://serverfault.com/a/343177
Same usage here:
http://www.valleyprogramming.com/blog/linux-find-grep-commands-exec-combine-search
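A variant of the same trick that lists just the matching tables by name, instead of grep's "Binary file ... matches" lines (a sketch; the datadir path is a placeholder):

for f in /var/lib/mysql/dbname/*.ibd; do
    # strings extracts the printable text from the binary .ibd pages
    strings "$f" | grep -qi 'SomeString' && echo "$f"
done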

Related

Recursively Replace One Windows Path w/ Another in Text Files

I have a large amount of text files stored on a Red Hat server that contain explicit Windows paths. Today, that path has changed and I would like to change the text files to reflect the new path. As they are Windows paths, they all contain single backslashes. I would like to maintain the single backslashes if possible.
I wanted to ask what the best method to perform this string replacement would be. I have made backups of folders so that I may test on a smaller scale before applying to the larger scale that will affect my group members.
Example:
Change $oldPath to $newPath in all *.py files recursively contained in current directory.
i.e. $oldPath\common\file_referenced should become $newPath\common\file_referenced
Robustly, using any awk in any shell on every Unix box, regardless of which characters your old or new directory paths contain, and whether or not the final directory in either old or new could be a substring of another existing directory name:
$ cat file
\old\fashioned\common\file_referenced
$ oldPath='\old\fashioned'
$ newPath='\new\fangled\etc'
$ awk '
BEGIN { old=ARGV[1]; new=ARGV[2]; ARGV[1]=ARGV[2]="" }
index($0"\\",old"\\")==1 { $0=new substr($0,length(old)+1) }
1' "$oldPath" "$newPath" file
\new\fangled\etc\common\file_referenced
To update all .py files in a directory you could use GNU awk for -i inplace, or you could do for i in *.py; do awk '...' old new "$i" > tmp && mv tmp "$i"; done, or you could use find and/or xargs, etc. - any of the common Unix ways to process multiple files with any command.
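Spelled out, that loop variant might look like this (a sketch; the tmp file name is arbitrary):

for i in *.py; do
    awk '
        BEGIN { old=ARGV[1]; new=ARGV[2]; ARGV[1]=ARGV[2]="" }
        index($0"\\",old"\\")==1 { $0 = new substr($0,length(old)+1) }
        1
    ' "$oldPath" "$newPath" "$i" > tmp && mv tmp "$i"
done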

Algorithm to delete every files in a directory, except some in a given list

Assume we have a directory with a structure like this; I marked directories with (+) and files with (-):
rootdir
  +a
    +a1
      -f1
      -f2
    +a2
      -f3
  +b
    +b1
      +b2
        -f4
        -f5
      -f6
    +b3
      -f7
      -f8
and a given list of files like
/a/a1/f1
/b/b1/b2/f5
/b/b3/f7
I am struggling to find a way to remove every file inside the root, except the ones in the given list. After the program has executed, the root directory should look like this:
rootdir
  +a
    +a1
      -f1
  +b
    +b1
      +b2
        -f5
    +b3
      -f7
This example is just to make the problem easier to understand. In reality, the given list includes around four thousand files, and the root directory is about 15 GB with hundreds of thousands of files inside.
It would be easy to search inside a folder and remove the files that match a given list; let's just say we need to solve the reverse problem, keeping the files that match the given list.
Programs written in Perl/Python are preferred.
First, store your list of files you want to keep inside an associative container like a Python dict or a map of some kind.
Second, simply iterate (in Python, os.walk) over the entire directory structure, and every time you see a file, check if it is in the associative container of paths to keep. If not, delete it (in Python, os.unlink).
Alternatively:
First, create a temporary directory on the same filesystem.
Second, move (os.renames, which generates new subdirectories as needed) all the "keep" files to the temporary directory, with the same structure.
Third, overwrite (os.removedirs followed by os.rename, or just shutil.move) the original directory with the temporary one.
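A rough shell sketch of that move-and-swap idea (assumes keep_list.txt holds the paths relative to rootdir, e.g. /a/a1/f1, and that the temporary tree sits on the same filesystem; all names are placeholders):

tmp=$(mktemp -d)                        # temporary tree; ideally on the same filesystem
while IFS= read -r rel; do
    mkdir -p "$tmp$(dirname "$rel")"    # recreate the subdirectory structure
    mv "rootdir$rel" "$tmp$rel"         # move each keep-file across
done < keep_list.txt
rm -rf rootdir && mv "$tmp" rootdir     # swap the trees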
The os.walk path:
import os

keep = set(['/a/a1/f1', '/b/b1/b2/f5', '/b/b3/f7'])

for dirpath, dirnames, filenames in os.walk('./'):
    for name in filenames:
        # strip the leading '.' so './a/a1/f1' matches the keep set's '/a/a1/f1'
        path = os.path.join(dirpath, name).lstrip('.')
        print('check ' + path)
        if path not in keep:
            print('delete ' + path)
        else:
            print('keep ' + path)
It doesn't do anything except inform you.
I don't think os.walk is too slow, and it gives you the option of keeping files by regex patterns or any other criteria.
Here is working code for your problem. Note the membership test is inverted relative to a plain delete list: we keep what is in the set and delete everything else, matching the question.
import os

def list_files(directory):
    for root, dirs, files in os.walk(directory):
        for name in files:
            yield os.path.join(root, name)

files_to_keep = {'/home/vedang/Desktop/a.out',
                 '/home/vedang/Desktop/ABC/temp.txt'}  # keep a set instead of a list for faster lookups

for f in list_files('/home/vedang/Desktop'):
    if f not in files_to_keep:
        os.unlink(f)
Here is a function which accepts a set of files you wish to keep and the root directory from which you wish to begin deleting files.
It's a classic recursive depth-first search that removes empty directories after deleting all the unwanted files:
import os

def delete_files(keep_list: set, curr_dir):
    files = os.listdir(curr_dir)
    for f in files:
        path = f"{curr_dir}/{f}"
        if os.path.isfile(path):
            if path not in keep_list:
                os.remove(path)
        elif os.path.islink(path):
            os.unlink(path)
        elif os.path.isdir(path):
            delete_files(keep_list, path)
    # after the pass, drop this directory if it is now empty
    files = os.listdir(curr_dir)
    if not files:
        os.rmdir(curr_dir)
Here I've got a solution from a different angle. Suppose we are in a Linux environment.
First,
find . -type f
gets a long list of every file path under the root.
Second, given the keep list (at your volume, say thousands of entries), we can just append it to the previous output, then sort and count duplicates; paths that appear twice are on the keep list:
| sort | uniq -c | grep -v "^ *2 "
which leaves the to-delete list (note that uniq -c prefixes each line with its count, which still has to be stripped off; see the corrected sketch below).
And third,
| xargs rm
to actually do the deletion.
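Put together and cleaned up, the pipeline could look like this (a sketch; keep_list.txt is a placeholder and must contain the same ./-prefixed relative paths that find prints, and -d '\n' assumes GNU xargs). uniq -u prints only the non-duplicated lines, so keep-list entries, which appear twice, drop out with no count prefix left to strip:

cd rootdir
{ find . -type f; cat keep_list.txt; } | sort | uniq -u | xargs -d '\n' rm --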

Using arrays in a for loop, in bash [duplicate]

I am currently working on a bash script where I must download files from our MySQL database, host them somewhere different, then update the database with the new location for each image. The last portion is my problem area: creating the array of filenames and iterating through them, replacing the file names in the database as we go.
For whatever reason I keep getting these kinds of errors:
not found/X2b6qZP.png: 1: /xxx/images/X2b6qZP.png: ?PNG /xxx/images/X2b6qZP.png: 2: /xxx/images/X2b6qZP.png: : not found
/xxx/images/X2b6qZP.png: 1: /xxx/images/X2b6qZP.png: Syntax error: word unexpected (expecting ")")
files=$($DOWNLOADDIRECTORY/*)
files=$(${files[@]##*/})
# Iterate through the file names in the download directory, and assign the new values to the detail table.
for file in "${files[#]}"
do
mysql -h ${HOST} -u ${USER} -p${PASSWORD} ${DBNAME} "UPDATE crm_category_detail SET detail_value = 'http://xxx.xxx.x.xxx/img/$file' WHERE detail_value LIKE '%imgur.com/$file'"
done
You are trying to execute a glob as a command. The syntax to use arrays is array=(tokens):
files=("$DOWNLOADDIRECTORY"/*)
files=("${files[#]##*/}")
You are also trying to run your script with sh instead of bash.
Do not run sh file or use #!/bin/sh. Arrays are not supported in sh.
Instead use bash file or #!/bin/bash.
What's going on right here?
files=$($DOWNLOADDIRECTORY/*)
I don't think this is doing what you think it is doing.
According to this answer, you want to omit the first $ to get an array of files:
files=($DOWNLOADDIRECTORY/*)
I just wrote a sample script:
#!/bin/bash
alist=(/*)
printf '%s\n' "${alist[@]}"
Output
/bin
/boot
/data
/dev
/dist
/etc
/home
/lib
....
Your assignments are not creating arrays. You need arrayname=( values for array ) as the notation. Hence:
files=( "$DOWNLOADDIRECTORY"/* )
files=( "${files[@]##*/}" )
The first line gives you all the names in the directory specified by $DOWNLOADDIRECTORY. The second carefully removes the directory prefix.
I've used spaces after ( and before ) for clarity; the shell neither requires nor objects to them. I used double quotes around the variable name and expansions to keep things sane when names contain spaces etc.
Although it isn't immediately obvious why you might do this, its advantage over many alternatives is that it preserves spaces etc. in file names.
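Putting the corrections together, the whole loop might look like this (a sketch; HOST, USER, etc. are the question's own variables). Note the question's mysql invocation was also missing -e, without which the SQL string is parsed as a database name:

#!/bin/bash
files=( "$DOWNLOADDIRECTORY"/* )
files=( "${files[@]##*/}" )    # strip directory prefixes
for file in "${files[@]}"; do
    mysql -h "$HOST" -u "$USER" -p"$PASSWORD" "$DBNAME" -e \
        "UPDATE crm_category_detail
         SET detail_value = 'http://xxx.xxx.x.xxx/img/$file'
         WHERE detail_value LIKE '%imgur.com/$file'"
done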
You could just loop directly over the files:
for file in "$DOWNLOADDIRECTORY"/*; do
file="${file##*/}" # or file=$(basename "$file")
# MySQL stuff
done
Some quoting added in case of spaces in paths.

Squid StoreId rewrite

I'm trying to configure my proxy to de-duplicate some cached files.
Some sites add a query string at the end of the URL, so the same file is cached multiple times. E.g.:
http://download.oracle.com/otn-pub/java/jdk/7u75-b13/jre-7u75-linux-x64.tar.gz?AuthParam=kjzeghfhrehbfgjernf
http://download.oracle.com/otn-pub/java/jdk/7u75-b13/jre-7u75-linux-x64.tar.gz?AuthParam=jzehrguihegeijhpijf
I would like to create a rewrite rule for StoreID like this:
^http:\/\/download\.oracle\.com\/otn\-pub\/java\/([a-zA-Z0-9\/\.\-\_]+\.(tar\.gz)) http://download.oracle.com/otn-pub/java/$1
but I haven't found documentation about how to do that.
OK, so after long research I found the answer to my question. I'm writing it up here in case someone else has the same question.
First of all, I installed Squid 3.4, the first version which supports StoreID rewriting.
Second, after reading the StoreID documentation:
wiki.squid-cache.org/Features/StoreID
wiki.squid-cache.org/Features/StoreID/DB
and a lot of Google searching, I found this Perl program: http://pastebin.ca/2422099. It takes a database file as its first argument; you can find examples in the second link above. In that file I added a line like the one above:
^http:\/\/download\.oracle\.com\/otn\-pub\/java\/([a-zA-Z0-9\/\.\-\_]+\.(tar\.gz)) http://download.oracle.com/otn-pub/java/$1
Third, in my squid.conf, I added these lines:
store_id_program /usr/local/squid/store-id.pl /usr/local/squid/store_id_db
store_id_children 5 startup=1
store_id_program is the path to the Perl helper, with the database file as its argument.
store_id_children sets the number of subprocesses allowed to the program: a maximum of 5, with 1 started at the beginning.
In the same squid.conf I replaced this line:
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
with
refresh_pattern -i cgi-bin 0 0% 0
to allow caching URLs with query strings.
Last, I made sure that store-id.pl has execute ('x') permission.
Hope this helps :)
PS: One gotcha: in the db file, you must have two columns separated by a tab (not a space). To be sure, you can use this command (found in the docs):
cat dbfile | sed -r -e 's/\s+/\t/g' | sed '/^\#/d' > cleaned_db_file
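Since that tab matters, appending rules with printf avoids the space-versus-tab trap entirely (a sketch; file paths as in the config above):

printf '%s\t%s\n' \
    '^http:\/\/download\.oracle\.com\/otn\-pub\/java\/([a-zA-Z0-9\/\.\-\_]+\.(tar\.gz))' \
    'http://download.oracle.com/otn-pub/java/$1' >> /usr/local/squid/store_id_db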

Parse ClamAV logs in Bash script using Regex to insert in MySQL

Morning/Evening all,
I've got a problem: I'm making a script for work that uses ClamAV to scan for malware and then puts its results in MySQL, by running grep and awk over the resulting ClamAV logs to turn the relevant parts into variables. The problem is that while I've handled the summary fine, the syntax of the detection lines makes things slightly more difficult. I'm no regex expert by any means and this is a bit of a learning experience, so there is probably a far better way of doing it than mine!
The lines I'm trying to parse look like these:
/net/nas/vol0/home/recep/SG4rt.exe: Worm.SomeFool.P FOUND
/net/nas/vol0/home/recep/SG4rt.exe: moved to '/srv/clamav/quarantine/SG4rt.exe'
As far as I was able to establish, I need a positive lookbehind to match what comes before and after the colon, without actually matching the colon or the space after it, and I can't see a clear way of doing it in RegExr without it thinking I'm trying to look for two colons. To make matters worse, we sometimes get these too...
WARNING: Can't open file /net/nas/vol0/home/laser/samples/sample1.avi: Permission denied
The end result is that I can build a MySQL query that inserts the path, the malware found, and where it was moved to; or, if there was an error, the path and the error encountered, so as to convert each element to a variable in a while statement.
I've done the scan summary as follows:
Summary looks like:
----------- SCAN SUMMARY -----------
Known viruses: 329
Engine version: 0.97.1
Scanned directories: 17350
Scanned files: 50342
Infected files: 3
Total errors: 1
Data scanned: 15551.73 MB
Data read: 16382.67 MB (ratio 0.95:1)
Time: 3765.236 sec (62 m 45 s)
Parsing like this:
SCANNED_DIRS=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Scanned directories" | awk '{gsub("Scanned directories: ", "");print}')
SCANNED_FILES=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Scanned files" | awk '{gsub("Scanned files: ", "");print}')
INFECTED=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Infected files" | awk '{gsub("Infected files: ", "");print}')
DATA_SCANNED=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Data scanned" | awk '{gsub("Data scanned: ", "");print}')
DATA_READ=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Data read" | awk '{gsub("Data read: ", "");print}')
TIME_TAKEN=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Time" | awk '{gsub("Time: ", "");print}')
END_TIME=$(date +%s)
mysql -u scanner_parser --password=removed sc_live -e "INSERT INTO bs.live.bs_jobstat VALUES (NULL, '$CURRTIME', '$PID', '$IY', '$SCANNED_DIRS', '$SCANNED_FILES', '$INFECTED', '$DATA_SCANNED', '$DATA_READ', '$TIME_TAKEN', '$END_TIME');"
rm -f /srv/clamav/$IY-scan-$LOGTIME.log
Some of those variables are from other parts of the script and can be ignored. The reason I'm doing this is to save logfile clutter and have a simple web based overview of the status of the system.
Any clues? Am I going about all this the wrong way? Thanks for help in advance, I do appreciate it!
From what I can determine from the question, it seems like you are asking how to distinguish the lines you want from the logger lines that start with WARNING, ERROR, INFO.
You can do this without getting too fancy with lookahead or lookbehind. Just grep for lines beginning with
"/net/nas/vol0/home/recep/SG4rt.exe: "
then using awk you can extract the remainder of the line. Or you can gsub the prefix out like you are doing in the summary processing section.
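A rough awk sketch of that extraction for the detection lines (assumes the log layout shown in the question; the log path is a placeholder). Only lines ending in FOUND are selected, so the WARNING lines are skipped:

awk -F': ' '/ FOUND$/ {
    sig = $2
    sub(/ FOUND$/, "", sig)    # drop the trailing marker
    printf "path=%s signature=%s\n", $1, sig
}' /srv/clamav/scan.log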
As far as the question about processing the summary goes, what strikes me most is that you are processing the entire file multiple times, each time pulling out one kind of line. For tasks like this, I would use Perl, Ruby, or Python and make one pass through the file, collecting the pieces of each line after the colon, storing them in regular programming language variables (not env variables), and forming the MySQL insert string using interpolation.
Bash is great for some things but IMHO you are justified in using a more general scripting language (Perl, Python, Ruby come to mind).
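That said, the single-pass idea also works without leaving the shell, shown here in awk rather than Perl/Python (a sketch; the log path is a placeholder):

awk -F': ' '
    NF == 2 { s[$1] = $2 }    # collect every "Key: value" summary line in one pass
    END {
        printf "dirs=%s files=%s infected=%s time=%s\n",
            s["Scanned directories"], s["Scanned files"],
            s["Infected files"], s["Time"]
    }
' /srv/clamav/scan.log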