Persistence & Performance of libapache2-mod-log-sql - mysql

I have been working on a requirement for our Apache 2 logs to be recorded to a MySQL database, instead of the usual text log files.
I had no difficulty with the setup and configuration, and it works as expected; however, there is a bit of information that I cannot find (or it may very well be that I am searching for the wrong thing).
Is there anyone out there who uses (or even likes to use) libapache2-mod-log-sql and can tell me more about its connection to MySQL? Is the connection persistent? What kind of resource impact should I expect?
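For reference, the kind of table the module logs into looks roughly like this (a trimmed-down, illustrative sketch with assumed column names; the module ships its own full schema, so treat this as nothing more than orientation):

    -- Illustrative subset of a mod_log_sql-style access-log table.
    CREATE TABLE access_log (
        remote_host    VARCHAR(50),
        remote_user    VARCHAR(50),
        request_method VARCHAR(10),
        request_uri    VARCHAR(255),
        status         SMALLINT UNSIGNED,
        bytes_sent     INT UNSIGNED,
        referer        VARCHAR(255),
        agent          VARCHAR(255),
        time_stamp     INT UNSIGNED  -- request time as a Unix timestamp
    );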
The two questions above (connection persistence and resource impact) are core to my research, yet information on them is remarkably hard to find.
Thanks in advance.

Related

Error accessing Cosmos through Hive

Quoted verbatim from:
https://ask.fiware.org/question/84/cosmos-error-accessing-hive/
As the answer in the quoted FIWARE Q&A entry suggests, the problem has since been fixed; the fix is described here: https://ask.fiware.org/question/79/cosmos-database-privacy/. However, other issues seem to have arisen related to the solution. Namely: over an SSH connection, typing the hive command results in the following error: https://cloud.githubusercontent.com/assets/13782883/9439517/0d24350a-4a68-11e5-9a46-9d8a24e016d4.png. The HiveQL queries themselves work fine (through SSH) regardless of the error message.
When launching exactly the same HiveQL queries remotely (each of which worked flawlessly two weeks ago), the requests time out even with absurdly long time windows (10 minutes). The most basic commands ('use $username;', 'show tables;') also time out.
(The thrift client is: https://github.com/garamon/php-thrift-hive-client)
Since Cosmos usage is an integral part of our project, it is of the utmost importance for us to know whether this is a temporary issue caused by the fixes or a permanent change in remote availability (we could not identify any relevant changes in the documentation).
Apart from fixing the issue you mention, we moved to a HiveServer2 deployment instead of the old Hive server (HiveServer1), which had several performance drawbacks due to, indeed, its usage of Thrift (in particular, only one connection could be served at a time). HiveServer2 now allows for parallel queries.
That being said, the client you are using is most probably not valid anymore, since it was likely designed specifically to work with a HiveServer1 instance. The good news is that there seem to be several other PHP client implementations for HiveServer2, such as https://github.com/QwertyManiac/hive-hs2-php-thrift (the first entry I found when searching on Google).
What is true is that this is not officially documented anywhere (it is only mentioned in this other Stack Overflow question). So, nice catch! I'll add it to the documentation immediately.

MySQL Workbench Safety

I'm a little (very) paranoid. I work with Workbench for my own projects as well as for work. One of the things I am completely frightened of is erroneously running a dangerous SQL command like DELETE FROM x while connected to my work's remote server (thinking it is my development machine). The question, then: is there a way to configure Workbench to prevent you from making stupid (and usually tiredness-induced) mistakes, or is there an alternative that does? Or is it just a matter of being more careful?
Getting a confirmation from the user, as suggested in the comments, is probably not a good solution, depending on how frequently you send queries. After the first 20-30 such confirmation dialogs you get tired of them and just click them away.
A much better way is to establish two simple habits:
Give your users only the absolute minimum of privileges they need to do their work. This limits the damage they can cause (see the sketch after this list).
Make backups. There's no question if you need a backup, only when.
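For the first habit, a minimal sketch of what such an account could look like (the user, password, and schema names here are made up):

    -- A day-to-day account for the production server: it can read and add
    -- data, but a stray DELETE or DROP TABLE fails with a privilege error.
    CREATE USER 'dev_me'@'%' IDENTIFIED BY 'choose-a-password';
    GRANT SELECT, INSERT ON work_db.* TO 'dev_me'@'%';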

How to keep databases synchronized between hosting account and a local testing server?

I have several databases hosted on a shared server, and a local testing server which I use for development.
I would like to keep both sets of databases somewhat synchronized (more or less daily).
My ideas for solving the problem all seem very clumsy. Anyway, for reference, here is what I have considered so far:
Make a database dump of the online databases, trash the local databases, and recreate them from the dump. It's a lot of work and requires a lot of download time (which guarantees I won't do it as often as I would like).
Write a small web service to access the new data, and write a small application locally to communicate with said web service, download the newest data, and update the local databases.
Both solutions sound like a lot of work for a problem that is probably already solved a zillion times over. Or maybe it's even an existing feature which I completely overlooked.
Is there an easy way to keep databases more or less in sync? Ideally something that I can set up once, schedule, and forget about.
I am using MySQL 5 (MyISAM) databases on both servers.
=============
Edit: I had a look at replication, but it seems that I can't go that route because the shared hosting does not give me enough control over the server itself (I have most permissions on my databases, but not on the MySQL server itself).
I only need to keep the data synchronized, nothing else. Is there any other solution that doesn't require full control over the server?
Edit 2:
Sorry, I forgot to mention I am running on a LAMP stack on the shared server, so Windows-only solutions won't work.
I am surprised to see that there is no obvious off-the-shelf solution for this problem.
Have you considered replication? It's not to be trifled with, but it may be what you want. See here for more details: http://dev.mysql.com/doc/refman/5.0/en/replication-configuration.html
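To give a rough idea of the slave-side setup (a sketch only; it assumes binary logging, server IDs, and a replication user are already configured on the master, which is exactly the server-level access shared hosting rarely grants, and the host and credentials here are made up):

    -- Point the local slave at the hosted master and start replicating.
    CHANGE MASTER TO
        MASTER_HOST='db.example.com',
        MASTER_USER='repl',
        MASTER_PASSWORD='choose-a-password',
        MASTER_LOG_FILE='mysql-bin.000001',
        MASTER_LOG_POS=4;
    START SLAVE;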
Take a look at the Microsoft Sync Framework - you will need to code in .NET, but it can resolve your issues.
http://msdn.microsoft.com/en-in/sync/default(en-us).aspx
Here is a sample for SQL Server, but it can be adapted to MySQL as well using the ADO.NET provider for MySQL.
http://code.msdn.microsoft.com/sync/Release/ProjectReleases.aspx?ReleaseId=4835
For this to work you will need additional tables in your MySQL database for change tracking and anchors (keeping track of the last synchronization), sketched below, but you won't need full control as long as you can access the database.
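The extra tables might look something like this (a sketch with invented names; the framework's own provisioning differs, but the principle is one tracking table per synced table plus an anchor table remembering the last sync point):

    -- Per-table change tracking: when each row last changed, plus tombstones
    -- for deletes, so the sync job knows what to apply on the other side.
    CREATE TABLE customers_tracking (
        id          INT NOT NULL PRIMARY KEY,
        last_change TIMESTAMP NOT NULL,
        is_deleted  TINYINT(1) NOT NULL DEFAULT 0
    );

    -- Anchor table: the high-water mark of the last successful sync.
    CREATE TABLE sync_anchor (
        table_name VARCHAR(64) NOT NULL PRIMARY KEY,
        last_sync  DATETIME NOT NULL
    );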
Replication would have been simpler :), but this might just work in your case.

SQLite concurrency issue a deal breaker?

I am looking at databases for a home project (ASP.NET MVC) which I might host eventually. After reading a similar question here on Stack Overflow I have decided to go with MySQL.
However, the ease of use and deployment of SQLite is tempting, and I would like to confirm my reasons before I write it off completely.
My goal is to maintain user status messages (like Twitter). This would mean mostly a single table with user-id/status-message pairs, with read, insert, and delete operations on status messages. No modification is necessary.
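Concretely, I have something like this in mind (a sketch; the names are illustrative):

    -- One row per status message; the app only reads, inserts, and deletes.
    CREATE TABLE status_messages (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id    INTEGER NOT NULL,
        message    TEXT NOT NULL,
        created_at TEXT NOT NULL DEFAULT (datetime('now'))
    );
    CREATE INDEX idx_status_user ON status_messages(user_id);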
After reading the following paragraph, I have decided that SQLite can't work for me. I DO have a simple database, but since ALL my transactions work with the SAME table, I might face some problems.
SQLite uses reader/writer locks on the entire database file. That means if any process is reading from any part of the database, all other processes are prevented from writing any other part of the database. Similarly, if any one process is writing to the database, all other processes are prevented from reading any other part of the database.
Is my understanding naive? Would SQLite work fine for me? Also, does MySQL offer something that SQLite doesn't when working with ASP.NET MVC? Ease of development in Visual Studio, maybe?
If you're willing to wait half a month, the next SQLite release intends to support write-ahead logging, which should allow for more concurrency (readers would no longer block the writer, nor the writer block readers).
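Going by the draft documentation, switching a database over should be a one-line pragma once that release ships (a sketch):

    -- Enable write-ahead logging; the setting is persistent and applies to
    -- the database file itself, so it only needs to be issued once.
    PRAGMA journal_mode=WAL;

    -- Revert to the default rollback journal if needed.
    PRAGMA journal_mode=DELETE;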
I've been unable to get even the simple concurrency SQLite claims to support to work - even after asking on SO a couple of times.
Edit
Since I wrote the above, I have been able to get concurrent writes and reads to work with SQLite. It appears I was not properly disposing of NHibernate sessions; putting using blocks around all code that created sessions solved the problem.
/Edit
But it's probably fine for your application, especially with the write-ahead logging that user380361 mentions.
Small footprint, single-file installation, fast, works well with NHibernate, free, public domain - a very nice product in almost all respects!

We're using JDBC+XMLRPC+Tomcat+MySQL to execute potentially large MySQL queries. What is a better way?

I'm working on a Java-based project that has a client program which needs to connect to a MySQL database on a remote server. This was implemented as follows:
Use JDBC to write the SQL queries to be executed, which are then hosted as a servlet using Apache Tomcat and made accessible via XML-RPC. The client code uses XML-RPC to remotely execute these JDBC-based functions. This allows us to keep our MySQL database non-public, restricts use to the pre-defined functions, and allows Tomcat to manage the database transactions (which I've been told is better than letting MySQL do it alone, but I really don't understand why). However, this approach requires a lot of boilerplate code, and Tomcat is a huge memory hog on our server.
I'm looking for a better way to do this. One way I'm considering is to make the MySQL database publicly accessible, rewrite the JDBC-based code as stored procedures, and restrict public use to these procedures only. The problem I see with this is that translating all the JDBC code into stored procedures will be difficult and time-consuming. I'm also not too familiar with MySQL's permissions. Can one grant access to a stored procedure that performs SELECT statements on a table while denying arbitrary SELECT statements on that same table?
Any other ideas are welcome, as are thoughts and/or suggestions on the stored procedure solution.
Thank you!
You can probably get the RAM upgraded in your server for less than the cost of even a few days' development time, so don't write any code if that's all you're getting from the exercise. Also, just because the memory is used inside of Tomcat, it doesn't mean that Tomcat itself is using it. The memory could be used up by data or by technical flaws in your code.
If you've tried additional RAM and it is still being eaten up, then that smells like a coding issue, so I'd suggest using a profiler or log data to work out what the root cause is before changing anything. If the cause is large data sets, then using the database directly will only delay the inevitable; instead you'd need to look at things like paging, summarisation, client-side caching, or redesigning clients to reduce the use of expensive queries. Using a profiler, or simply reviewing the code base, will also tell you if something is creating too many objects (especially strings or XML nodes) or leaking memory.
Boilerplate code can be avoided by refactoring creatively, and it's good that you want to avoid repetition. It's unclear how much structure you already have, but with a little work it's easy to centralise boilerplate JDBC calls. There is no fundamental reason JDBC code should be repeated; perhaps you could tell us what code is being repeated?
Finally, I'll venture that there are many good reasons to put a web tier over your database. Flexibility (of deployment), compatibility, control (over the SQL) and security are all good reasons to keep the web tier.
MySQL 5.0.3+ does have an EXECUTE privilege that you can grant (without granting SELECT privileges) that should allow you to get the functionality you seek.
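For illustration, a sketch of how that can look (the schema, table, and account names are made up; the key points are SQL SECURITY DEFINER on the procedure and granting the client account EXECUTE only):

    -- Created by a privileged user. SECURITY DEFINER (the default) makes the
    -- procedure run with the definer's rights, so callers need no SELECT.
    DELIMITER //
    CREATE PROCEDURE app.get_user(IN p_id INT)
    SQL SECURITY DEFINER
    BEGIN
        SELECT id, name FROM app.users WHERE id = p_id;
    END //
    DELIMITER ;

    -- The client can call the procedure but cannot run ad-hoc SELECTs.
    GRANT EXECUTE ON PROCEDURE app.get_user TO 'client'@'%';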
However, note this MySQL bug report with JDBC (and a lot of other drivers, for that matter):
When calling the [procedure] with JDBC, I get "java.sql.SQLException: Driver requires declaration of procedure to either contain a '\nbegin' or '\n' to follow argument declaration, or SELECT privilege on mysql.proc to parse column types."
The workaround is:
See "noAccessToProcedureBodies" in Connector/J 5.0.3 for a somewhat hackish, non-JDBC-compliant workaround.
I am sure you could implement your solution without much boilerplate, especially using something like Spring's remoting. Also, how much memory is Tomcat eating? I frankly believe that if it's just doing what you are describing, it could work in less than 128 MB (a conservative guess).
Your alternative is the "correct by the book" way of solving the problem. I say build a prototype and see how it works. The major problems you could run into are:
MySQL having some important gotcha in this regard
MySQL's Stored Procedure support being too primitive and forcing you to do a lot of work
Some other strange hiccup
I'm probably one of those MySQL haters, so the situation might be better than I think.