Apache: log storage into MySQL - mysql

Method 1: Pipe Log
I recently read an article about how to save Apache logs in a MySQL database. Briefly, the idea is to pipe each log entry to MySQL:
# Format log as a MySQL query
LogFormat "INSERT INTO apache_logs \
set ip='%h',\
datetime='%{%Y-%m-%d %H:%M:%S}t',\
status='%>s',\
bytes_sent='%B',\
content_type='%{Content-Type}o',\
url_requested='%r',\
user_agent='%{User-Agent}i',\
referer='%{Referer}i';" \
mysql_custom_log
# execute queries
CustomLog "|/usr/bin/mysql -h 127.0.0.1 -u log_user -plog_pass apache_logs" mysql_custom_log
# save queries to log file
CustomLog logs/mysql_custom_log mysql_custom_log
Question
It seems that unsanitized user input (i.e. user_agent and referer) would be passed directly to MySQL.
Therefore, is this method vulnerable to SQL injection? If so, is it possible to harden it?
Method 2: Apache module
mod_log_sql is an Apache module that seems to do something similar, i.e. it "logs all requests to a database". According to the documentation, the module has several advantages:
power of data extraction with SQL-based log
more configurable and flexible than the standard module [mod_log_config]
links are kept alive in between queries to save speed and overhead
any failed INSERT commands are preserved to a local file
no more tasks like log rotation
no need to collate/interleave the many separate logfiles
However, despite all these advantages, mod_log_sql doesn't seem to be popular:
the documentation doesn't mention one production level user
few discussions through the web
several periods without a maintainer
Which sounds like a warning to me (although I might be wrong).
Questions
Any known reason why this module doesn't seem to be popular?
Is it vulnerable to SQL injection? If so, is it possible to harden it?
Which method should have better performance?

The pipe-log method is better because it creates a stream between your log and your database, which directly benefits insert and search performance. Another advantage of the piped log is that you can point it at a NoSQL backend optimized for inserting and searching via specific queries; one example is the ELK Stack: Elasticsearch + Logstash (log parser + stream) + Kibana.
I would recommend some reading on that: https://www.guru99.com/elk-stack-tutorial.html
Regarding your question about SQL injection: it depends on how you communicate with your database, regardless of the type of database or the method used to store your log. You need to secure that channel, for example with parameterized queries (placeholders/tokens) instead of concatenating raw values, as in the sketch below.
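As an illustration only (a minimal sketch, not the article's method): instead of piping raw SQL into the mysql client, Apache could pipe a plain tab-separated log line into a small script that binds each field as a parameter. The field list, script path and credentials below are hypothetical, and the mysql-connector-python package is assumed:
#!/usr/bin/env python3
# Hypothetical pipe target for a tab-separated LogFormat, e.g.:
#   LogFormat "%h\t%{%Y-%m-%d %H:%M:%S}t\t%>s\t%B\t%{Referer}i\t%{User-Agent}i" tab_log
#   CustomLog "|/usr/local/bin/log_to_mysql.py" tab_log
# Values are bound as parameters, so a hostile User-Agent or Referer
# cannot change the SQL statement that gets executed.
import sys
import mysql.connector

conn = mysql.connector.connect(host="127.0.0.1", user="log_user",
                               password="log_pass", database="apache_logs")
cur = conn.cursor()
sql = ("INSERT INTO apache_logs (ip, datetime, status, bytes_sent, referer, user_agent) "
       "VALUES (%s, %s, %s, %s, %s, %s)")

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 6:              # skip malformed lines
        cur.execute(sql, fields)      # values are escaped by the driver
        conn.commit()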
As for the Apache module, its intent was essentially the same piped logging, but the most recent activity on it appears to date from 2006 and the documentation is not user friendly.

Related

How to restore a MySQL database "safely"

How should you go about restoring (and backing) up a MySQL database "safely"? By "safely" I mean: the restore should create/overwrite a desired database, but not risk altering anything outside that database.
I have already read https://dev.mysql.com/doc/refman/5.7/en/backup-types.html.
I have external users. They & I may want to exchange backups for restore. We do not have a commercial MySQL Enterprise Backup, and are not looking for a third-party commercial offering.
In Microsoft SQL Server there are BACKUP and RESTORE commands. BACKUP creates a file containing just the database you want; both its rows and all its schema/structure are included. RESTORE accepts such a file, and creates or overwrites its structure. The user can restore to a same-named database, or specify a different database name. This kind of behaviour is just what I am looking for.
In MySQL I have come across 3 possibilities:
Most people seem to use mysqldump to create a "dump file", and mysql to read that back in. The dump file contains a list of arbitrary MySQL statements, which are simply executed by mysql. This is quite unacceptable: the file could contain any SQL statements. (Limiting access rights of restoring user to try to ensure it cannot do anything "naughty" is not acceptable.) There is also the issue that the user may have created the dump file with the "Include CREATE Schema" option (MySQL Workbench), which hard-codes the original database name for recreation. This "dump" approach is totally unsuitable to me, and I find it surprising that anyone would use it in a production environment.
I have come across MySQL's SELECT ... INTO OUTFILE and LOAD DATA INFILE statements. At least they do not contain SQL code to execute. However, they look like a lot of work, deal with one table at a time rather than the whole database, and don't deal with the structure of the tables; you have to know that yourself when restoring. There is a mysqlimport helper command-line utility, but I don't see anything for the export side, and I don't see how to restore a complete database with it.
The last is to use what MySQL refers to as "Physical (Raw)" rather than "Logical" backups. This works on the database directories and files themselves. It is the equivalent of SQL Server's detach/attach method for backing up/restoring. But, as per https://dev.mysql.com/doc/refman/5.7/en/backup-types.html, it has all sorts of caveats, e.g. "Backups are portable only to other machines that have identical or similar hardware characteristics" (I have no idea about my users' environments, e.g. which operating system or hardware they run) and "Backups can be performed while the MySQL server is not running. If the server is running, it is necessary to perform appropriate locking so that the server does not change database contents during the backup" (let alone restores).
So can anything satisfy (what I regard as) my modest requirements, as outlined above, for MySQL backup/restore? Am I really the only person who finds the above 3 as the only, yet unacceptable, possible solutions?
1 - mysqldump - I use this quite a bit, usually in environments where I am handling all the details myself. I do have one configuration where I use that to send copies of a development database - to be dumped/restored in its entirety - to other developers. It is probably the fastest solution, has some reasonable configuration options (e.g., to include/exclude specific tables) and generates very functional SQL code (e.g., each INSERT batch is small enough to avoid locking/speed issues). For a "replace entire database" or "replace key tables in a specific database" solution, it works very well. I am not too concerned about the "arbitrary SQL commands" problem - if that is an issue then you likely have other issues with users trying to "do their own thing".
2 - SELECT ... INTO OUTFILE and LOAD DATA INFILE - The problem with these is that if you have any really big tables then the LOAD DATA INFILE statement can cause problems because it is trying to load everything all at once. You also have to add code to create (if needed) or empty the tables before LOAD DATA.
3 - Physical (raw) file transfer. This can work but under limited circumstances. I had one situation with a multi-gigabyte database and decided to compress the raw files, move them to the new machine, uncompress and just tell MySQL "everything is already there". It mostly worked well. But I would not recommend it for any unattended/end-user process due to the MANY possible problems.
What do I recommend?
1 - mysqldump - live with its limitations and risks, set up a script to call mysqldump and compress the file (I am pretty sure there are options in mysqldump to do the compression automatically), include the date in the file name so that there is less confusion as the files are sent around, and make a simple script for users to load the file (a sketch of such a script follows after point 2).
2 - Write your own program. I have done this a few times. This is more work initially but allows you to control every aspect of the process and transfer a file that only contains data without any actual SQL code (see the second sketch below). You can control the specific database, tables, etc. One catch is that if you make any changes to the table structure, indexes, etc. you will need to make sure that information is somehow transmitted to the receiving program so that it can change the structures as needed - that is not a problem with mysqldump as it normally replaces the tables, creating the new structures, indexes, etc. This can be written in any language that can connect to MySQL - it does not have to be the same language as your application.
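Two minimal sketches, one per recommendation. Database, table and credential names are hypothetical; mysqldump on the PATH and the mysql-connector-python package are assumed, and neither is meant as a finished tool.
Recommendation 1 - call mysqldump, compress the output and date-stamp the file:
#!/usr/bin/env python3
import gzip
import shutil
import subprocess
from datetime import date

DB = "mydb"
OUTFILE = "/var/backups/{}-{}.sql.gz".format(DB, date.today().isoformat())

proc = subprocess.Popen(["mysqldump", "-u", "backup_user", "-pbackup_pass", DB],
                        stdout=subprocess.PIPE)
with gzip.open(OUTFILE, "wb") as f:
    shutil.copyfileobj(proc.stdout, f)   # stream the dump through gzip
if proc.wait() != 0:
    raise RuntimeError("mysqldump failed for " + DB)
Recommendation 2 - transfer pure data (no SQL statements) as CSV and load it back with parameterized INSERTs:
#!/usr/bin/env python3
import csv
import mysql.connector

TABLES = {"customers": ["id", "name", "email"]}    # hypothetical tables/columns

def export_data(conn, path):
    # Write each table's rows to <path>/<table>.csv - data only, no SQL.
    cur = conn.cursor()
    for table, cols in TABLES.items():
        cur.execute("SELECT {} FROM {}".format(", ".join(cols), table))
        with open("{}/{}.csv".format(path, table), "w", newline="") as f:
            csv.writer(f).writerows(cur.fetchall())

def import_data(conn, path):
    # Load each CSV back with parameterized INSERTs; tables must already exist.
    # NULLs, binary columns and structure changes need extra handling in practice.
    cur = conn.cursor()
    for table, cols in TABLES.items():
        sql = "INSERT INTO {} ({}) VALUES ({})".format(
            table, ", ".join(cols), ", ".join(["%s"] * len(cols)))
        with open("{}/{}.csv".format(path, table), newline="") as f:
            cur.executemany(sql, list(csv.reader(f)))
    conn.commit()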
If you're not going to use third-party tools (like innobackupex, for example) then you're limited to using ... mysqldump, which is in the mysql package.
I can't understand why it is not acceptable to you, i.e. why you don't like SQL commands in those dumps. Best practice, when restoring a single database into a server that already contains other databases, is to use a separate user with rights to write only to the restored database. Then even if the user performing the restore changed the SQL commands and tried to write to another database, they would not be able to; a sketch of setting up such a user follows below.
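For example (a sketch only: the user, password and database names are made up, and MySQL 5.7+ is assumed for CREATE USER IF NOT EXISTS):
import mysql.connector

# Connect as an administrator and create a user that may only touch one database.
admin = mysql.connector.connect(host="127.0.0.1", user="root", password="admin_pass")
cur = admin.cursor()
cur.execute("CREATE DATABASE IF NOT EXISTS restored_db")
cur.execute("CREATE USER IF NOT EXISTS 'restore_user'@'%' IDENTIFIED BY 'restore_pass'")
cur.execute("GRANT ALL PRIVILEGES ON restored_db.* TO 'restore_user'@'%'")
cur.execute("FLUSH PRIVILEGES")
# A dump replayed as restore_user can now only affect restored_db; statements
# that target any other database will fail with a privilege error.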
When doing a raw backup (a physical copy of the database files) you need to have the instance down, i.e. the MySQL server not running. "Similar hardware" also means you need the same directory layout as the source server (unless you change my.cnf before starting the server and put all the files into the right directories).
When coming to MySQL, try not to compare it to SQL Server - it has a totally different approach and philosophy.
But if you do convince yourself to use a third-party tool, I recommend innobackupex from Percona, which by the way is free.
The export tool that complements mysqlimport is mysqldump --tab. This outputs tab-delimited data files, just as SELECT ... INTO OUTFILE does. It also outputs the table structure in much smaller .sql files. So there are two files for each table.
Once you recreate your tables from the .sql files, you can use mysqlimport to import all the data files. You can even use the mysqlimport --use-threads option to make it load multiple data files in parallel.
This way you have more control over which schema to load the data into, and it should run a lot faster than loading a large SQL dump.
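A sketch of that workflow (the database name and dump directory are hypothetical; the server must be allowed to write the data files into the chosen directory, i.e. secure_file_priv permitting, and client credentials are assumed to come from an option file such as ~/.my.cnf):
import glob
import subprocess

DB = "mydb"
DUMP_DIR = "/var/backups/mydb"

# 1. One <table>.sql (structure) and one <table>.txt (data) file per table.
subprocess.run(["mysqldump", "--tab=" + DUMP_DIR, DB], check=True)

# 2. Recreate the table structures first by feeding the .sql files to mysql.
for sql_file in sorted(glob.glob(DUMP_DIR + "/*.sql")):
    with open(sql_file, "rb") as f:
        subprocess.run(["mysql", DB], stdin=f, check=True)

# 3. Load all the data files, several in parallel.
subprocess.run(["mysqlimport", "--use-threads=4", "--local", DB]
               + sorted(glob.glob(DUMP_DIR + "/*.txt")), check=True)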

How to find (decode) PostgreSQL query from Wireshark File?

I am on Linux platform with PostgreSQL 5.5. I am trying to monitor all traffic related to PostgreSQL between Master and Slave. To that end, I used Wireshark to monitor the traffic. Then, I started PostgreSQL and ran three queries (Create table Hello, Create table Bye & inserted an image to PostgreSQL database). During queries, I ran Wireshark on Master just to capture the traffic between Master and Slave.
But there is one problem with the PostgreSQL traffic captured using Wireshark. All the traffic is sent/received in TCP packets and the payload is in an encoded form; I can't read that data. I want to find, in the Wireshark capture, the three queries that I ran against the PostgreSQL database.
What is the best way to go about finding queries of PostgreSQL?
On the other hand, I ran the same queries against a MySQL database and repeated the experiment described above. I can easily read all three queries in the Wireshark dump because they are not in coded form.
The Wireshark file of the PostgreSQL experiment is available at Wireshark-File. I need to find the above three queries in that Wireshark file.
About File:
192.168.50.11 is the source machine from where I inserted queries to remote PostgreSQL's Master server. 192.168.50.12 is the IP of Master's server. 192.168.50.13 is the slave's IP address. Queries were executed from .11 and inserted into .12 and then replicated to .13 using Master-Slave approach.
Pointers will be very welcome.
You are probably using WAL-based replication (the default) which means you can't.
This involves shipping the transaction logs between machines. They are the actual on-disk representation of the data.
There are alternative trigger-based replication methods (slony etc) and the new logical replication.
Neither will let you recreate the complete original query as I understand it, but would let you get closer.
There are systems which duplicate the queries on nodes (like MySQL) but they aren't quite the same thing.
If you want to know exactly what queries are running on the master, turn on query logging and monitor the logs instead.
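For completeness, a small sketch of that last suggestion (assumes PostgreSQL 9.4+ for ALTER SYSTEM and a superuser connection; the connection details are reused from the question):
import psycopg2

con = psycopg2.connect(host="192.168.50.12", database="postgres",
                       user="postgres", password="faban")
con.autocommit = True                      # ALTER SYSTEM cannot run inside a transaction
cur = con.cursor()
cur.execute("ALTER SYSTEM SET log_statement = 'all';")
cur.execute("SELECT pg_reload_conf();")    # apply without restarting the server
# Every statement is now written to the PostgreSQL log on the master.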
Solution to my own problem:
I found the solution to my own question.
I used Python code to run queries against the remote PostgreSQL database, and the following line (psycopg2) to connect to it:
con = psycopg2.connect(host="192.168.50.12", database="postgres", user="postgres", password="faban")
If you use the above approach, all the data is sent in encrypted form (the connection negotiates SSL by default when the server supports it). If you use the approach given below, with sslmode=disable, all the data is sent unencrypted and you can easily read every query in Wireshark.
con = psycopg2.connect("host=192.168.50.12 dbname=postgres user=postgres password=faban sslmode=disable")
The same applies to C code (libpq) as well.
Unencrypted (readable in Wireshark):
sprintf(conninfo, "dbname=postgres hostaddr=192.168.50.12 user=postgres password=faban sslmode=disable");
Encrypted:
sprintf(conninfo, "dbname=postgres hostaddr=192.168.50.12 user=postgres password=faban");

Pushing data to client whenever a database field changes

I'm using socket.io to send data from my database to the client. But my code sends data to the client every second, even if the data is the same. How can I send data only when a field in the DB has changed, rather than every second?
Here is my code: http://pastebin.com/kiTNHgnu
With MySQL there is no easy/simple way to get notified of changes. A couple of options include:
If you have access to the server the database is running on, you could stream MySQL's binlog through some sort of parser and then check for events that modify the desired table/column. There are already such binlog parsers on npm if you want to go this route (see the sketch after this answer).
Use a MySQL trigger to call out to a UDF (user-defined function). This is a little tricky because there aren't a whole lot of these for your particular need. There are some that could be useful however, such as mysql2redis which pushes to a Redis queue which would work if you already have Redis installed somewhere. There is a STOMP UDF for various queue implementations that support that wire format. There are also other UDFs such as log_error which writes to a file and sys_exec which executes arbitrary commands on the server (obviously dangerous so it should be an absolute last resort). If none of these work for you, you may have to write your own UDF which does take quite some time (speaking from experience) if you're not already familiar with the UDF C interface.
I should also note that UDFs could introduce delays in triggered queries depending on how long the UDF takes to execute.
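As an illustration of the binlog route (in Python rather than Node, purely as a sketch; assumes the third-party python-mysql-replication package, row-based binary logging enabled on the server, and a hypothetical watched table):
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent)

MYSQL = {"host": "127.0.0.1", "port": 3306, "user": "repl_user", "passwd": "repl_pass"}

stream = BinLogStreamReader(
    connection_settings=MYSQL,
    server_id=100,                  # must be unique among replication clients
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
    only_tables=["watched_table"],  # hypothetical table to monitor
    blocking=True,                  # wait for new events instead of returning
    resume_stream=True)

for event in stream:
    for row in event.rows:
        # A change happened: push it to the connected clients here,
        # instead of polling the table every second.
        print(event.table, row)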

History of queries in MySql

Is there any way to see the queries that are executed against my MySQL database?
For example:
I have an application (OTRS) that allows me to generate reports according to the frames I desire. I would like to know which queries the application runs against the database, because I will use them to integrate with other reporting software.
Is this possible?
Yes, you can enable logging in your MySQL server. There are several types of logs you can use, depending on what you want to log, ranging from errors only or slow queries up to logs that record everything done on your server.
See the full doc here
Although, as Nir says, MySQL can log all queries (you should be looking at the general log, or the slow log configured with a threshold of 0 seconds), this will show all the queries being run; on a production system it may prove difficult to match what you are doing in your browser with specific entries in the log. (See the sketch at the end of this answer for how to enable the zero-threshold slow log.)
The reason I suggest using the slow query log is that there are tools available which will remove the parameters from the queries, allowing you to see what SQL code is running more frequently.
If you have some proficiency in Perl it should be straightforward to output the queries from OTRS itself - all of its queries are processed via a database abstraction layer.
(Presumably you are aware that the schema is published)
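For reference, a small sketch of enabling that zero-threshold slow log at runtime (assumes the mysql-connector-python package and an account with sufficient privileges; this logs every query, so only do it temporarily):
import mysql.connector

con = mysql.connector.connect(host="127.0.0.1", user="root", password="admin_pass")
cur = con.cursor()
cur.execute("SET GLOBAL slow_query_log = 'ON'")
cur.execute("SET GLOBAL long_query_time = 0")   # a threshold of 0 seconds logs every query
cur.execute("SET GLOBAL log_output = 'FILE'")   # or 'TABLE' to log into mysql.slow_log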

MySQL log reader

So, I'm trying to analyse some of my program's MySQL queries. However, while I've got MySQL general query logging turned on and can view the log file in a text editor (e.g. Notepad++), the program writes thousands of lines of queries a minute, so I could do with a slightly better program for reading the logs. Things that would be nice:
Better syntax highlighting.
Real-time updating.
Doesn't get too slow when looking at long files.
Handles random binary sequences in the log without breaking.
Any suggestions?
Edit: Windows-7 compatible programmes only
You can try using tail -f <file_path>. That will follow the log as it's appended to.
Additionally, you could give multitail a try. It supports syntax highlighting (through regex).
pt-query-digest from the Percona Toolkit (formerly Maatkit; Maatkit will not be developed any further, so switch to the Percona Toolkit). Don't use it as a 'live' inspector, though - just as a bulk tool.
Use the MySQL log tables, i.e. the general log and the slow query log, which you can then query with SQL (see the sketch at the end of this answer).
Update your mysql config file with:
general_log = 1
slow_query_log = 1
long_query_time = 2   # seconds; queries slower than this go to the slow log
log_output = TABLE
OR
You can use MySQL Administrator to view the logs (general log, slow query log, error log).
OR
You can also view the log file using the TextPad editor; it can handle reading and writing files larger than a GB.
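With log_output = TABLE as configured above, the general log can also be read with a plain query instead of a log viewer - a minimal sketch, assuming the mysql-connector-python package:
import mysql.connector

con = mysql.connector.connect(host="127.0.0.1", user="root", password="admin_pass")
cur = con.cursor()
cur.execute("SELECT event_time, argument FROM mysql.general_log "
            "ORDER BY event_time DESC LIMIT 50")
for event_time, argument in cur.fetchall():
    print(event_time, argument)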
So far, from testing out a bunch of programmes, the best option I've found is baretail, which has good real-time updating and handles large files reasonably well. It could do with better MySQL-specific syntax highlighting, but it's not bad.
Alternatively, it turns out that there are actually options in Notepad++ (in Preferences: MISC) to turn on real-time updating, but this doesn't work well unless the Notepad++ window has focus.
There's also a Windows implementation of tail.