MySQL tracking system

I have to implement a tracking system backed by a MySQL database. The system will track many apps, with at least 5 events tracked for each app (e.g. how many users clicked on link x, how many users visited page y). Some apps will have millions of users, so a few thousand updates/second is not a far-fetched assumption.
Another component of the system will have to compute some statistical info that should be updated every minute. The system should also record past values of those statistics.
The approach a friend of mine suggested was to log every event in a log table and have a cron job that runs every minute and computes the desired info and updates a stats table.
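Roughly what that would look like, just to make it concrete (table and column names are placeholders, not a final schema):

    -- raw event log, one row per hit
    CREATE TABLE event_log (
        id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        app_id   INT UNSIGNED    NOT NULL,
        event    VARCHAR(64)     NOT NULL,
        user_id  INT UNSIGNED    NOT NULL,
        created  TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- per-minute aggregates written by the cron job, keeping history
    CREATE TABLE event_stats (
        app_id   INT UNSIGNED NOT NULL,
        event    VARCHAR(64)  NOT NULL,
        minute   DATETIME     NOT NULL,
        hits     INT UNSIGNED NOT NULL,
        PRIMARY KEY (app_id, event, minute)
    );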
This sounds reasonable to me. Are there better alternatives?
Thanks.

I've logged to a MySQL log table with a cron that crunches it.
I generally use InnoDB tables in my apps, but for the log table I made it MyISAM and used INSERT DELAYED ... queries.
MyISAM doesn't provide all the goodies of InnoDB, but I believe it is slightly faster (for that reason).
The main thing you are worried about is database locking when your cron is running, but using INSERT DELAYED gets around that problem for the most part.
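For example, reusing the placeholder tables sketched in the question (this is only the shape of it, not production code), the writers do:

    INSERT DELAYED INTO event_log (app_id, event, user_id)
    VALUES (42, 'click_link_x', 1001);

and the per-minute cron crunches what it has seen, then clears the rows it processed:

    SET @cutoff = NOW() - INTERVAL 1 MINUTE;

    INSERT INTO event_stats (app_id, event, minute, hits)
    SELECT app_id, event,
           DATE_FORMAT(created, '%Y-%m-%d %H:%i:00'),
           COUNT(*)
    FROM event_log
    WHERE created < @cutoff
    GROUP BY app_id, event, DATE_FORMAT(created, '%Y-%m-%d %H:%i:00')
    ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits);

    DELETE FROM event_log WHERE created < @cutoff;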

If your hit rate is too high for even INSERT DELAYED into a MyISAM table to handle, you may want to keep recent hits in memory (memcache can come in handy, or a custom daemon you can write) and periodically process the hits from memory into the database stats table, aggregated.
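With that setup, the periodic flush from memory becomes one aggregated statement instead of thousands of single-row inserts, something along these lines (names and values are placeholders again):

    INSERT INTO event_stats (app_id, event, minute, hits)
    VALUES
        (42, 'click_link_x', '2009-06-01 10:15:00', 1833),
        (42, 'visit_page_y', '2009-06-01 10:15:00',  974)
    ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits);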

I would really recommend using an existing log analyzer on the logs your web server already produces. One example is Webalizer. Even better, in my opinion, is an external system such as Google Analytics. This works better since it keeps working even with intermediate systems such as load balancers and caches in place.

Related

Best way to handle a MySQL table with millions of records under heavy updates and reads

I have a table of about 5 million records, which is updated a lot (about 10,000 of them every minute).
On the other side, I have to read that table a lot. Fortunately, I don't need the data from "right now" and I can "cache" it (a possible solution based on this is below), but it can be no more than 20 seconds old.
That could lead to table locking, and I'm afraid of that... any solutions?
I thought about leaving the table just for updates and making a VIEW to copy all the data to a table dedicated just to reads, but it's a big table and it would take too much time.
Any other ideas?
There are a lot of things you've not told us about: the structure of the table, the nature of the updates, the database engine, the underlying hardware, the constraints of cost and time, the required resiliency of the solution, and the testing you have already done to try to identify the core issues. This is not an invitation to share all of that - we can't advise on capacity planning, and describing even the basic steps of performance tuning goes way beyond the scope of a post here.
I expect people will vote to close this, but I will give you a few pointers:
Build the capability to test the traffic volume and measure the performance so that you're not wasting your time in your tuning efforts.
Table locking should only be an issue with MyISAM, not the other engines.
Use the handler API to connect to the database, ideally via an event-based daemon which can aggregate the updates into fewer logical operations.
Pay attention to how you configure your storage.
Set up asynchronous replication to a slave node and do your reads there.
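For the last point, the rough shape of it (hostnames and credentials are placeholders; the master needs log-bin and a unique server-id configured first):

    -- on the slave; the file/position come from SHOW MASTER STATUS on the master
    CHANGE MASTER TO
        MASTER_HOST     = 'db-master.example.com',
        MASTER_USER     = 'repl',
        MASTER_PASSWORD = 'repl_password',
        MASTER_LOG_FILE = 'mysql-bin.000001',
        MASTER_LOG_POS  = 4;
    START SLAVE;

    -- point your reads at the slave; you said 20 seconds stale is acceptable,
    -- so keep an eye on Seconds_Behind_Master
    SHOW SLAVE STATUS\G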

Reduce database writes with memcached

I would like to convert my stats tracking system so it doesn't write to the database directly, as we're hitting bottlenecks.
We're currently using memcached for certain aspects of the site, and I wanted to use it for storing stats and committing them to the MySQL DB periodically.
The issue, however, lies in the number of items (in the millions) for which stats could potentially be collected between the cron job runs that commit them to the database. Other than running a SELECT * FROM data, checking for the existence of every single memcache key, and then updating the table... is there any other way to do this?
(I'm not saying the below is gospel; this is just my gut feeling. As said later on, I don't have the specifics of your system :) And obviously no offence meant etc :) )
I would advise against using memcached for this. Memcached is built to quickly retrieve values that you've fetched before, not to store values. The big difference is that if your cache gets full, you'll lose your data.
Normally you'd simply have no data in your cache and re-collect it from the source, which is impossible in this case. That alone would be reason enough for me to try to dissuade you from this.
Now you say the major problem is the MySQL connection limit you are hitting. If you do simple stuff (like what we talked about in the comments: the INSERT DELAYED), it's just a case of increasing the limit. You should have enough capacity to let your scripts/users go to the database once, say "this should eventually be added", and then go away. If your users can't even open one connection for that, there's a serious resource problem you probably won't fix by adding extra layers of cache.
Obviously it's hard to say without any specs of the system, software and hardware, but my suggestion would be to see if you can just let them open their connections by increasing the limit and fiddling with the server variables a bit, instead of monkey-patching your system by using memcached as an in-between layer.
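To be concrete, checking and raising the limit looks like this (500 is just an illustration; size it to your RAM):

    SHOW VARIABLES LIKE 'max_connections';
    SHOW GLOBAL STATUS LIKE 'Max_used_connections';   -- how close you actually get to the limit

    SET GLOBAL max_connections = 500;   -- also put it in my.cnf so it survives a restart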
I had a similar issue with statistics data. But please don't use memcached for it. You can't be sure that ALL your items will be moved to the DB. You can lose data and/or double-process data.
You should analyse your bottleneck in terms of how much data you are writing/reading and how many connections you need, and then switch to something scalable like Hadoop, Cassandra, Scribe, or other systems.
You need to provide additional information on the platform that you are running: O/S, database (version), storage engine, RAM, CPU (if possible)?
Are you inserting into a single table or more than one table?
Can you disable the indexes on the tables you are inserting into? Maintaining indexes slows down inserts (see the sketch below).
Are you running any triggers or stored procedures to compute values as you insert the raw data?
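On the index point above: if the table is MyISAM, a bulk load can be wrapped like this (table name is hypothetical, and this only affects non-unique indexes):

    ALTER TABLE raw_stats DISABLE KEYS;
    -- ... bulk inserts here ...
    ALTER TABLE raw_stats ENABLE KEYS;   -- indexes are rebuilt in one pass afterwards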

Can sqlite3 handle 30 concurrent update requests gracefully?

We like the simplicity of sqlite3 but are concerned about its ability to handle concurrent updates gracefully. Our web app is for about 30 users (50 maximum) who have rights to update, and a number of web users (let's say 500) who can only read the page. Those 30 (50) users will likely not update simultaneously. Daily updates to the DB should be no more than 1000 (counting saving one DB record into a table as ONE update) on a regular basis. The update activity most likely happens during the 9am-5pm working hours.
Since sqlite3 locks the whole DB for an update (not sure whether it locks for read requests), our question is: is sqlite3 powerful enough to handle the concurrent updates gracefully in our situation without throwing exceptions?
Thanks so much.
I think you already have enough information about how SQLite works, so the answer to your question is "yes", it can handle it. But the real question is: what will the performance be? It depends on the frequency of updates/inserts to your database. Updates will lock the database and keep reads waiting.
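If you do end up staying with SQLite despite that, two settings are worth knowing about (assuming a reasonably recent SQLite, 3.7+ for WAL):

    PRAGMA journal_mode = WAL;     -- readers no longer block behind a writer
    PRAGMA busy_timeout = 5000;    -- wait up to 5 seconds for a lock instead of failing immediately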
Let's say the performance is acceptable and you use it. What if your database gets corrupted? Even the most advanced DBMSs can end up with corrupted data. There can be many reasons for this, from server shutdowns to bugs. If your SQLite file gets corrupted, as far as I know it is harder to recover the database.
I'd strongly suggest not taking the risk, and using a non-embedded DBMS instead.

Most efficient method of logging data to MySQL

We have a service which sees several hundred simultaneous connections throughout the day, peaking at about 2000, for about 3 million hits a day, and growing. With each request I need to log 4 or 5 pieces of data to MySQL. We originally used the logging that came with the app we were using, but it was terribly inefficient, ran my DB server at >3x the average CPU load, and would eventually bring the server to its knees.
At this point we are going to add our own logging to the application (PHP). The only option I have for logging data is the MySQL DB, as this is the only common resource available to all of the HTTP servers. This data will be mostly writes; however, every day we generate reports based on the data, then crunch and archive the old data.
What recommendations can be made to ensure that I don't take down our services with logging data?
The solution we went with was to create an archive table, then regularly (every 15 minutes, on an app server) crunch the data and move it into the tables used to generate reports. The archive table of course has no indexes; the tables the reports are generated from have several.
Some stats on this approach:
Short Version: >360 times faster
Long Version:
The original code/model did direct inserts into the indexed table, and the average insert took .036 seconds; using the new code/model, inserts took less than .0001 seconds (I was not able to get an accurate fix on the insert time, so I had to measure 100,000 inserts and average the total). The post-processing (crunch) took an average of 12 seconds for several tens of thousands of records. Overall we were greatly pleased with this approach, and so far it has worked incredibly well for us.
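The crunch itself is nothing fancy; conceptually it is just the following (table and column names are made up for illustration; it assumes an auto-increment id on the archive table and a unique key on the report table's grouped columns):

    SET @cutoff = (SELECT MAX(id) FROM log_archive);

    INSERT INTO report_hits (logged_day, page, hits)
    SELECT DATE(logged_at), page, COUNT(*)
    FROM log_archive
    WHERE id <= @cutoff
    GROUP BY DATE(logged_at), page
    ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits);

    DELETE FROM log_archive WHERE id <= @cutoff;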
Based on what you describe, I recommend you leverage the fact that you don't need to read this data immediately and pursue a "periodic bulk commit" route. That is, buffer the logging data in RAM on the app servers and do periodic bulk commits. If you have multiple application nodes, some sort of randomized approach would help even more (e.g., commit updated info every 5 +/- 2 minutes).
The main drawback with this approach is that if an app server fails, you lose the buffered data. However, that's only bad if (a) you absolutely need all of the data and (b) your app servers crash regularly. Small chance that both are true, but in the event they are, you can simply persist your buffer to local disk (temporarily) on an app server if that's really a concern.
The main idea is:
buffering the data
periodic bulk commits (leveraging some sort of randomization in a distributed system would help)
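A "bulk commit" here just means one multi-row INSERT per flush instead of thousands of single-row ones, e.g. (table and column names are illustrative):

    INSERT INTO request_log (ts, url, user_id, status) VALUES
        ('2011-03-01 12:00:01', '/a', 17, 200),
        ('2011-03-01 12:00:01', '/b', 93, 200),
        ('2011-03-01 12:00:02', '/a', 17, 404);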
Another approach is to stop opening and closing connections if possible (e.g., keep longer lived connections open). While that's likely a good first step, it may require a fair amount of work on your part on a part of the system that you may not have control over. But if you do, it's worth exploring.

How many databases can MySQL handle?

My MySql server currently has 235 databases. Should I worry?
They all have same structure with MyISAM tables.
The hardware is a virtual machine with 2 GB RAM running on a Quad-Core AMD Opteron 2.2GHz.
Recently cPanel sent me an email saying that MySQL had failed and a restart was made.
New databases are expected to be created, and I wonder whether I should add more memory or simply add another virtual machine.
The "databases" in mysql are really catalogues, is has no effect on its limits whether you put all the tables in one or each in its own.
The main problem is the table cache. Without tuning it, you're going to have the default table cache (=64 typically), which means you will be closing a table every time you open one. This is incredibly bad.
Except in MyISAM, it's even worse, because closing a table throws its key blocks out of the key cache, which means subsequent index lookups or scans will be reading actual blocks from disc, which is horrible and slow and really needs to be avoided.
My advice is:
If possible, immediately increase the table cache to more than the total number of tables (see the sketch after this list).
Watch the global status variable Opened_tables; if it increases rapidly, that is bad.
Carry out performance and robustness testing on the same hardware in a non-production environment (if you are not doing so already).
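As a sketch of the first two points (the variable is table_cache on older servers and table_open_cache on newer ones; 512 is only an example, size it above your total table count):

    SHOW GLOBAL STATUS LIKE 'Opened_tables';   -- should grow slowly once the server is warmed up
    SHOW VARIABLES LIKE 'table%cache%';
    SHOW VARIABLES LIKE 'key_buffer_size';     -- the MyISAM key cache mentioned above

    SET GLOBAL table_open_cache = 512;         -- table_cache on older versions; persist it in my.cnf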
(reposting my comment for better visibility)
Thank you all for your comments. The system is something similar to Google Analytics. Visits to users' websites are logged into a "master" table. A native application monitors the master table, processes the registered visits, and writes them to each user's own database. Each user has their own DB; this was decided on for sharding. Various reports and statistics are run for each user, and it is faster if they only run on a specific DB (less data). I know this is not the best setup, but we have to deal with it for a while.
I don't believe there is a hard limit; the only things really limiting you will be your hardware and the traffic these databases will be getting.
You seem to have very little memory, which probably means you don't have massive numbers of connections...
You should start by profiling usage for each database (or set of databases, depending on how they are used of course).
My suggestion - MySQL (or any database server for that matter) could use more memory. You can never have enough.
You are doing it wrong.
Comment with some specifics about your databases, and we can probably fill you in on where your design went wrong.