I have a dating website on which I send daily alerts and log them in ALERTS_LOG.
CREATE TABLE `ALERTS_LOG` (
`RECEIVERID` mediumint(11) unsigned NOT NULL DEFAULT '0',
`MATCHID` mediumint(11) unsigned NOT NULL DEFAULT '0',
`DATE` smallint(6) NOT NULL DEFAULT '0',
KEY `RECEIVER` (`RECEIVER`),
KEY `USER` (`USER`)
) ENGINE=MRG_MyISAM DEFAULT CHARSET=latin1 INSERT_METHOD=LAST UNION=(`ALERTS_LOG110`,`ALERTS_LOG111`,`ALERTS_LOG112`)
Insertion logic: I have created a MERGE table, and each sub-table (such as ALERTS_LOG110) stores up to 15 days of records. On the 1st and 16th of every month I create a new table and change the definition of the MERGE table.
Example: INSERT_METHOD=LAST UNION=(ALERTS_LOG111,ALERTS_LOG112,ALERTS_LOG113).
Advantage:
Deletion of old data is super fast.
Issues with this approach:
1. When I change the definition, the site often goes down, because the indexes have to be loaded into the cache again and all SELECT queries get stuck.
2. Locking issues because of too many concurrent inserts and selects.
So, should I look at MongoDB to solve this issue?
No, not really. Re-engineering your application to use two different database types because of performance on this log table seems like a poor choice.
It's not really clear why you have so many entries being logged, but on the face of it you might like to look into partitioning in MySQL: partition your table by day or week and then drop those partitions. Deletion is still super fast and there would be no downtime for it because you won't be changing object names every day.
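As a rough illustration of the partitioning suggestion, the Python sketch below only builds the MySQL statements involved; it does not connect to a server. It assumes that the DATE column holds an increasing integer day number and that ALERTS_LOG has first been converted from a MERGE table into a single regular table. The function names and the p<boundary> partition naming are hypothetical.

```python
def initial_partitioning(table, column, boundaries):
    """Build an ALTER TABLE that range-partitions `table` on an integer column.

    Each boundary b creates a partition holding rows with column < b
    (and >= the previous boundary).
    """
    parts = ",\n".join(
        f"  PARTITION p{b} VALUES LESS THAN ({b})" for b in boundaries
    )
    return f"ALTER TABLE `{table}` PARTITION BY RANGE (`{column}`) (\n{parts}\n);"


def add_partition(table, boundary):
    """Statement that appends a new partition at the high end of the range."""
    return (
        f"ALTER TABLE `{table}` ADD PARTITION "
        f"(PARTITION p{boundary} VALUES LESS THAN ({boundary}));"
    )


def drop_partition(table, boundary):
    """Statement that discards a whole partition, replacing a bulk DELETE."""
    return f"ALTER TABLE `{table}` DROP PARTITION p{boundary};"


if __name__ == "__main__":
    # Hypothetical day numbers: one partition per 15-day window.
    print(initial_partitioning("ALERTS_LOG", "DATE", [15, 30, 45]))
    print(add_partition("ALERTS_LOG", 60))   # run on the 1st / 16th
    print(drop_partition("ALERTS_LOG", 15))  # expire the oldest window
```

The table name stays the same throughout, so nothing like the UNION list of the MERGE table has to be redefined when a window is added or expired.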
Related
I have a large table called "queue". It has 12 million records right now.
CREATE TABLE `queue` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`userid` varchar(64) DEFAULT NULL,
`action` varchar(32) DEFAULT NULL,
`target` varchar(64) DEFAULT NULL,
`name` varchar(64) DEFAULT NULL,
`state` int(11) DEFAULT '0',
`timestamp` int(11) DEFAULT '0',
`errors` int(11) DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `idx_unique` (`userid`,`action`,`target`),
KEY `idx_userid` (`userid`),
KEY `idx_state` (`state`)
) ENGINE=InnoDB;
Multiple PHP workers (150) use this table simultaneously.
They select a record, perform a network request using the selected data and then delete the record.
I get mixed execution times from the select and delete queries. Is the delete command locking the table?
What would be the best approach for this scenario?
SELECT record + NETWORK request + DELETE the record
SELECT record + NETWORK request + MARK record as completed + DELETE completed records using a cron from time to time (I don't want an even bigger table).
Note: The queue gets new records every minute but the INSERT query is not the issue here.
Any help is appreciated.
"Don't queue it, just do it". That is, if the tasks are rather fast, it is better to simply perform the action and not queue it. Databases don't make good queuing mechanisms.
DELETE does not lock an InnoDB table. However, you can write a DELETE that behaves as if it does. Let's see your actual SQL so we can work on improving it.
12M records? That's a huge backlog; what's up?
Shrink the datatypes so that the table is not gigabytes:
action is only a small set of possible values? Normalize it down to a 1-byte ENUM or TINYINT UNSIGNED.
Ditto for state -- surely it does not need a 4-byte code?
There is no need for INDEX(userid) since there is already an index (UNIQUE) starting with userid.
If state has only a few values, the index won't be used. Let's see your enqueue and dequeue queries so we can discuss how to either get rid of that index or make it 'composite' (and useful).
What's the current value of MAX(id)? Is it threatening to exceed your current limit of about 4 billion for INT UNSIGNED?
How does PHP use the queue? Does it hang onto an item via an InnoDB transaction? That defeats any parallelism! Or does it change state? Show us the code; perhaps the lock & unlock can be made less invasive. It should be possible to run a single autocommitted UPDATE to grab a row and its id. Then, later, do an autocommitted DELETE with very little impact.
I do not see a good index for grabbing a pending item. Again, let's see the code.
150 seems like a lot -- have you experimented with fewer? They may be stumbling over each other.
Is the Slowlog turned on (with a low value for long_query_time)? If so, I wonder what is the 'worst' query. In situations like this, the answer may be surprising.
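The "single autocommitted UPDATE to grab a row, then an autocommitted DELETE" idea above can be sketched as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 as a stand-in for MySQL, a trimmed-down queue table, a hypothetical owner column that stores a per-claim token, and a hypothetical composite index on (state, id) for finding the next pending row.

```python
import sqlite3
import uuid


def make_queue():
    # isolation_level=None puts sqlite3 in autocommit mode:
    # every statement commits on its own, no long-lived transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        """CREATE TABLE queue (
               id     INTEGER PRIMARY KEY AUTOINCREMENT,
               userid TEXT,
               action TEXT,
               target TEXT,
               state  INTEGER DEFAULT 0,   -- 0 = pending, 1 = claimed
               owner  TEXT DEFAULT NULL    -- hypothetical claim token
           )"""
    )
    # Hypothetical composite index so "oldest pending row" is cheap to find.
    conn.execute("CREATE INDEX idx_state_id ON queue (state, id)")
    return conn


def claim(conn):
    """Atomically mark one pending row as ours; return it, or None if empty."""
    token = uuid.uuid4().hex
    # In MySQL the same claim could be written without the subquery:
    #   UPDATE queue SET state = 1, owner = ?
    #   WHERE state = 0 ORDER BY id LIMIT 1
    cur = conn.execute(
        "UPDATE queue SET state = 1, owner = ? "
        "WHERE id = (SELECT id FROM queue WHERE state = 0 ORDER BY id LIMIT 1)",
        (token,),
    )
    if cur.rowcount == 0:
        return None
    return conn.execute(
        "SELECT id, userid, action, target FROM queue WHERE owner = ?",
        (token,),
    ).fetchone()


def finish(conn, row_id):
    """Remove a processed row with a single short statement."""
    conn.execute("DELETE FROM queue WHERE id = ?", (row_id,))
```

A worker would call claim(), perform its network request with no transaction open, and then call finish() with the returned id.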
We have a sports shopping website that recommends products to users. Our query recommends products by doing a JOIN on three tables to the following effect: (1) what sports a user is interested in, (2) what products are part of that sport, and (3) eliminate products the user has already bought. We have three tables currently. The response time is 3 seconds.
In an effort to make the query response faster, we are proposing combining two tables into one table. The attached image shows the proposed logic. My question is:
is the proposed query even possible as a single query
if all else is equal, will the proposed logic be faster than the current logic - even if it is a small amount?
We are on AWS MySQL RDS. All indexes have been done correctly. Please don't discuss migrating to Redis, MemSQL etc.; I am just interested at this stage in understanding whether the proposed logic will be faster.
Thank you for your help!!
CREATEs
CREATE TABLE UserPreferences (
UserPreferenceId int(11) NOT NULL AUTO_INCREMENT,
UserId int(11) NOT NULL,
FamilyId int(11) NOT NULL,
InsertedDate datetime NOT NULL,
PRIMARY KEY (UserPreferenceId),
KEY userID (UserId),
KEY FamilyId (FamilyId),
KEY user (UserId),
KEY fk_UserPreferences_1 (FamilyId)
) ENGINE=InnoDB AUTO_INCREMENT=261 DEFAULT CHARSET=utf8
CREATE TABLE ArticleToFamily (
ArticleToFamilyId int(10) unsigned NOT NULL AUTO_INCREMENT,
ArticleId int(11) DEFAULT NULL,
FamilyId int(11) unsigned NOT NULL,
InsertedDate datetime DEFAULT NULL,
Confidence int(11) NOT NULL DEFAULT '0',
Rank int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (ArticleToFamilyId),
KEY ArticleIdAndFamilyId (ArticleId,FamilyId),
KEY FamilyId (FamilyId)
) ENGINE=InnoDB AUTO_INCREMENT=19795572 DEFAULT CHARSET=latin1
CREATE TABLE ItemsUserHasBought (
ItemsUserHasBoughtId int(11) NOT NULL AUTO_INCREMENT,
UserId int(11) NOT NULL,
ArticleId int(11) NOT NULL,
BuyDate datetime NOT NULL,
InsertedDate datetime NOT NULL,
UpdatedDate datetime NOT NULL,
Status char(1) NOT NULL DEFAULT '1',
PRIMARY KEY (ItemsUserHasBoughtId),
KEY ArticleId (ArticleId)
) ENGINE=InnoDB AUTO_INCREMENT=367 DEFAULT CHARSET=latin1
Don't do it.
Combining tables usually means denormalization of some kind, which is not the direction you want to be moving in a relational database. It's rarely side-effect free and often fails to achieve the desired gains. All in all, something to avoid, to be done only when all other avenues are exhausted.
Instead, check your indexes on the three tables that you have. It's likely that adding a foreign key in the right place can easily make this query run in a fraction of its current time. Unfortunately, until we know what indexes you're already using, we can't be any more specific about how to improve it. It's also possible you're doing the right things here, and are really hitting a wall in terms of what your server is able to do... but probably not.
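Since the actual query was not posted, the following is only a guess at its shape: a join from UserPreferences to ArticleToFamily with a NOT EXISTS anti-join against ItemsUserHasBought. It runs on Python's built-in sqlite3 as a stand-in for MySQL, with the tables cut down to the joined columns. The three composite indexes and their names are assumptions about what "the right place" might look like, not something confirmed by the question.

```python
import sqlite3


def build_shop_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE UserPreferences    (UserId INTEGER NOT NULL, FamilyId INTEGER NOT NULL);
        CREATE TABLE ArticleToFamily    (ArticleId INTEGER, FamilyId INTEGER NOT NULL);
        CREATE TABLE ItemsUserHasBought (UserId INTEGER NOT NULL, ArticleId INTEGER NOT NULL);

        -- Hypothetical composite indexes matching each lookup in the query.
        CREATE INDEX idx_pref_user_family    ON UserPreferences (UserId, FamilyId);
        CREATE INDEX idx_family_article      ON ArticleToFamily (FamilyId, ArticleId);
        CREATE INDEX idx_bought_user_article ON ItemsUserHasBought (UserId, ArticleId);

        -- Tiny sample data: user 1 likes family 10 and already bought article 100.
        INSERT INTO UserPreferences    VALUES (1, 10);
        INSERT INTO ArticleToFamily    VALUES (100, 10), (101, 10), (102, 20);
        INSERT INTO ItemsUserHasBought VALUES (1, 100);
        """
    )
    return conn


def recommend(conn, user_id):
    """Articles in the user's preferred families that the user has not bought."""
    rows = conn.execute(
        """
        SELECT DISTINCT af.ArticleId
        FROM UserPreferences AS up
        JOIN ArticleToFamily AS af ON af.FamilyId = up.FamilyId
        WHERE up.UserId = ?
          AND NOT EXISTS (
                SELECT 1
                FROM ItemsUserHasBought AS b
                WHERE b.UserId = up.UserId
                  AND b.ArticleId = af.ArticleId
          )
        ORDER BY af.ArticleId
        """,
        (user_id,),
    )
    return [r[0] for r in rows]
```

Note that ItemsUserHasBought as posted is indexed on ArticleId only, so a composite index starting with UserId is one candidate worth checking with EXPLAIN on the real data.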
If indexes don't help, the next place I'd usually look is a materialized/indexed view. This is supported by Sql Server, Oracle, Postgresql, and most other modern database server engines. Sadly, like Windowing Functions, the APPLY/lateral join operation, and correct NULL handling, indexed views are among the many parts of ansi sql where MySql lags behind other dbs. MySql is sadly becoming more and more of a joke with each passing year... but then that's probably all part of Oracle's plan since the Sun acquisition. If you really want an open source DB, Postgresql has outclassed MySql for years now in pretty much every category. MySql is living now off of its old momentum; it's popular because it's been popular, and is therefore widely available among the low-cost web hosts, but not at all because it's better.
Don't get me wrong: MySql used to be a great option. Postgresql hardly existed, and Oracle and Sql Server weren't any better back then and priced out of reach for most small businesses. But Oracle, Sql Server, Postgresql, and others have all moved on in ways that MySql hasn't. Postgresql, specifically, has gotten easier to manage while MySql has lost some of the simplicity that gave it an advantage, without picking up enough features that really matter.
But anyone can be an armchair architect, and I've editorialized way too much already. Given wholesale database change isn't likely to be an option for you by now anyway, take a long close look at your indexes. It's a good bet you'll be able to fix your problem that way. And if you can't, you can always throw more hardware at your server. Because MySql is cheaper, right?
I have a service in which users may "like" content posted by other users. Currently, the system doesn't filter out content that the user has already liked, which is undesirable behavior. I have a table called LikeRecords which stores a userID, a contentID, and a timePlaced timestamp. The idea is to use this table to filter content that a user has already liked when choosing what to display.
The thing is, I'm a MySQL amateur, and don't understand scaling and maintenance well. Even though I only have about 1,500 users, this table already has 45,000 records. I'm worried that as my service grows to tens or hundreds of thousands of users, this table will explode into millions and become slow since the filter operation would be called very frequently.
Is there a better design pattern I could use here, or a maintenance technique I should use?
EDIT: Here is the query for building the table in question:
CREATE TABLE `likerecords` (
`likeID` int(11) NOT NULL AUTO_INCREMENT,
`userID` int(10) unsigned NOT NULL,
`orderID` int(11) NOT NULL,
`timePlaced` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`special` tinyint(1) NOT NULL,
PRIMARY KEY (`likeID`)
) ENGINE=InnoDB AUTO_INCREMENT=44775 DEFAULT CHARSET=latin1
I would be using it to filter results in other tables, such as an "orders" table.
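One possible way to keep this kind of filter cheap as the table grows is sketched below. It is only an assumption about a workable design, not a tested recommendation for this service: a UNIQUE key on (userID, orderID) both blocks duplicate likes and serves as the lookup index for an anti-join. The sketch uses Python's built-in sqlite3 as a stand-in for MySQL (where INSERT OR IGNORE would be INSERT IGNORE), and the orders table and function names are hypothetical.

```python
import sqlite3


def build_likes_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (orderID INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE likerecords (
            likeID  INTEGER PRIMARY KEY AUTOINCREMENT,
            userID  INTEGER NOT NULL,
            orderID INTEGER NOT NULL,
            UNIQUE (userID, orderID)   -- one like per user per item
        );
        INSERT INTO orders VALUES (1, 'a'), (2, 'b'), (3, 'c');
        """
    )
    return conn


def like(conn, user_id, order_id):
    """Record a like; a repeated like is silently ignored by the UNIQUE key."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO likerecords (userID, orderID) VALUES (?, ?)",
            (user_id, order_id),
        )


def unliked_orders(conn, user_id):
    """Orders the given user has not liked yet (LEFT JOIN ... IS NULL)."""
    rows = conn.execute(
        """
        SELECT o.orderID
        FROM orders AS o
        LEFT JOIN likerecords AS l
               ON l.orderID = o.orderID AND l.userID = ?
        WHERE l.likeID IS NULL
        ORDER BY o.orderID
        """,
        (user_id,),
    )
    return [r[0] for r in rows]
```

With such an index each check is a lookup on (userID, orderID) rather than a scan, so the cost depends far more on the index than on the total row count.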
I have a database table that is giving me headaches: errors when inserting lots of data. Let me break down what exactly happens, and I'm hoping someone will have some insight into how I can get this figured out.
Basically I have a table that has 11+ million records in it and it's growing every day. We track how many times a user views a video and their progress in that video. You can see the structure below. Our setup is a master DB with two slaves attached to it. Nightly we run a cron script that compiles statistical data out of this table into a couple of other tables we use just for reporting. These cron scripts only run SELECT statements on the slave and do the inserts into our statistical tables on the master (so they propagate down). Like clockwork, every time we run this script it locks up our production table. I thought moving the SELECT to a slave would fix this issue, and since the cron doesn't even write to the main table but to other tables, I'm now perplexed about what could possibly cause this lockup.
It's almost as if every time a large read runs on the main table (master or slave), it locks up the master. As soon as the cron is complete, the table goes back to normal performance.
My question touches several aspects of InnoDB. I've had thoughts that indexing might be causing this issue, but maybe it's other InnoDB settings that I'm not fully understanding. As you would imagine, I want to keep the master from getting this lockup. I don't really care if the slave is pegged during this script run as long as it won't affect my master DB. Is this something that can happen with slave/master relationships in MySQL?
For reference, the tables that receive the compiled information are stats_daily and stats_grouped.
The biggest issue I have here, to restate a little, is that I don't understand what can cause the locking like this. Taking the reads off the master and just doing inserts into another table doesn't seem like it should do anything on the master original table. I can watch the errors start streaming in, however, 3 minutes after the script starts and it will end immediately when the script stops.
The table I'm working with is below.
CREATE TABLE IF NOT EXISTS `stats` (
`ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`VID` int(10) unsigned NOT NULL DEFAULT '0',
`UID` int(10) NOT NULL DEFAULT '0',
`Position` smallint(10) unsigned NOT NULL DEFAULT '0',
`Progress` decimal(3,2) NOT NULL DEFAULT '0.00',
`ViewCount` int(10) unsigned NOT NULL DEFAULT '0',
`DateFirstView` int(10) unsigned NOT NULL DEFAULT '0', -- Unix timestamp
`DateLastView` int(10) unsigned NOT NULL DEFAULT '0', -- Unix timestamp
PRIMARY KEY (`ID`),
KEY `VID` (`VID`,`UID`),
KEY `UID` (`UID`),
KEY `DateLastView` (`DateLastView`),
KEY `ViewCount` (`ViewCount`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=15004624 ;
Does anyone have any thoughts or ideas on this?
UPDATE:
The errors I get from the master DB
MysqlError: Lock wait timeout exceeded; try restarting transaction
Uncaught exception 'Exception' with message 'invalid query UPDATE stats SET VID = '13156', UID = '73859', Position = '0', Progress = '0.8', ViewCount = '1', DateFirstView = '1375789950', DateLastView = '1375790530' WHERE ID = 14752456
The update query fails because of the locking. The query is actually valid. I'll get 100s of these and afterwards I can randomly copy/paste these queries and they will work.
UPDATE 2
Queries and Explains from Cron Script
Query Ran on the Slave (leaving php variables in curly brackets for reference):
SELECT
VID,
COUNT(ID) as ViewCount,
DATE_FORMAT(FROM_UNIXTIME(DateLastView), '%Y-%m-%d') AS YearMonthDay,
{$today} as DateModified
FROM stats
WHERE DateLastView >= {$start_date} AND DateLastView <= {$end_date}
GROUP BY YearMonthDay, VID
EXPLAIN of the SELECT statement
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE stats range DateLastView DateLastView 4 NULL 25242 Using where; Using temporary; Using filesort
That result set is looped and inserted into the compiled table. Unfortunately I don't have support for batched inserts with this (I tried) so I have to loop through these one at a time instead of sending a batch of 100 or 500 to the server at a time. This is inserted into the master DB.
foreach ($results as $result)
{
$query = "INSERT INTO stats_daily (VID, ViewCount, YearMonthDay, DateModified) VALUES ({$result->VID}, {$result->ViewCount}, '{$result->YearMonthDay}', {$today})";
DoQuery($query);
}
The GROUP BY is the culprit. Apparently MySQL decides to use a temporary table in this case (perhaps because the table has exceeded some limit) which is very inefficient.
I ran into similar problems, but found no clear solution. You could consider splitting your stats table into two tables, a 'daily' and a 'history' table. Run your query on the 'daily' table, which only contains entries from the latest 24 hours or whatever your interval is, then clean up the table.
To get the info into your permanent 'history' table, either write your stats into both tables from code, or copy them over from daily into history before cleanup.
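The split into a small recent table and a permanent history table could look roughly like this. It is a sketch under assumptions rather than a drop-in fix: it uses Python's built-in sqlite3 instead of MySQL, runs everything on a single server (the question reads from a slave and writes to the master), and the table names stats_recent and stats_history are hypothetical names for the 'daily' and 'history' tables of raw rows. stats_daily is the compiled table from the question, without its DateModified column.

```python
import sqlite3


def build_stats_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE stats_recent  (ID INTEGER PRIMARY KEY, VID INTEGER, UID INTEGER, DateLastView INTEGER);
        CREATE TABLE stats_history (ID INTEGER PRIMARY KEY, VID INTEGER, UID INTEGER, DateLastView INTEGER);
        CREATE TABLE stats_daily   (VID INTEGER, ViewCount INTEGER, YearMonthDay TEXT);
        """
    )
    return conn


def roll_up(conn):
    """Aggregate the small recent table, archive its rows, then empty it.

    All three steps run in one transaction, so a failure leaves the
    recent table untouched.
    """
    with conn:
        # One set-based INSERT ... SELECT instead of a per-row INSERT loop.
        # date(x, 'unixepoch') is SQLite's counterpart of
        # DATE_FORMAT(FROM_UNIXTIME(x), '%Y-%m-%d'); it works in UTC.
        conn.execute(
            """
            INSERT INTO stats_daily (VID, ViewCount, YearMonthDay)
            SELECT VID, COUNT(*), date(DateLastView, 'unixepoch')
            FROM stats_recent
            GROUP BY date(DateLastView, 'unixepoch'), VID
            """
        )
        conn.execute("INSERT INTO stats_history SELECT * FROM stats_recent")
        conn.execute("DELETE FROM stats_recent")
```

Because the aggregation only ever touches the rows of the current interval, the GROUP BY works on a small input regardless of how large the history table grows.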
I have these table structures and, while they work, using EXPLAIN on certain SQL queries gives 'Using temporary; Using filesort' on one of the tables. This might hamper performance once the table is populated with thousands of data. Below are the table structures and an explanation of the system.
CREATE TABLE IF NOT EXISTS `jobapp` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fullname` varchar(50) NOT NULL,
`icno` varchar(14) NOT NULL,
`status` tinyint(1) NOT NULL DEFAULT '1',
`timestamp` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `icno` (`icno`)
) ENGINE=MyISAM;
CREATE TABLE IF NOT EXISTS `jobapplied` (
`appid` int(11) NOT NULL,
`jid` int(11) NOT NULL,
`jobstatus` tinyint(1) NOT NULL,
`timestamp` int(10) NOT NULL,
KEY `jid` (`jid`),
KEY `appid` (`appid`)
) ENGINE=MyISAM;
Query I tried which gives aforementioned statement:
EXPLAIN SELECT japp.id, japp.fullname, japp.icno, japp.status, japped.jid, japped.jobstatus
FROM jobapp AS japp
INNER JOIN jobapplied AS japped ON japp.id = japped.appid
WHERE japped.jid = '85'
AND japped.jobstatus = '2'
AND japp.status = '2'
ORDER BY japp.`timestamp` DESC
This system is for recruiting new staff. Once registration is opened, hundreds of applicants will register at the same time. They are allowed to select 5 different jobs. Later on, at the end of the registration session, the admin will go through each job one by one. I have used a single table (jobapplied) to store 2 items (applicant id, job id) to record who applied for what. This is the table which causes the aforementioned statement. I realize this table has no PRIMARY key, but I just can't figure out any other way for the admin to later look up who has applied for which job.
Any advice on how can I optimize the table?
Apart from the missing indexes and primary keys others have mentioned . . .
"This might hamper performance once the table is populated with thousands of data."
You seem to be assuming that the query optimizer will use the same execution plan on a table with thousands of rows as it will on a table with just a few rows. Optimizers don't work like that.
The only reliable way to tell how a particular vendor's optimizer will execute a query on a table with thousands of rows--which is still a small table, and will probably easily fit in memory--is to
1. load a scratch version of the database with thousands of rows
2. "explain" the query you're interested in
FWIW, the last test I ran like this involved close to a billion rows--about 50 million in each of about 20 tables. The execution plan for that query--which included about 20 left outer joins--was a lot different than it was for the sample data (just a few thousand rows).
You are ordering by jobapp.timestamp, but there is no index on timestamp, so the filesort (and probably the temporary table) will be necessary. Try adding an index on timestamp to jobapp, something like KEY timid (timestamp,id).
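The general effect of an index that covers the ORDER BY column can be observed with a small experiment. The sketch below is an illustration on SQLite (via Python's built-in sqlite3), not on MySQL, so the plan output differs from MySQL's EXPLAIN: SQLite reports a sort step as "USE TEMP B-TREE FOR ORDER BY" where MySQL reports "Using filesort". It also simplifies the query to the jobapp table alone and uses a hypothetical index on (status, timestamp), a variant of the index suggested above.

```python
import sqlite3

# Single-table simplification of the query from the question.
QUERY = "SELECT id, fullname FROM jobapp WHERE status = 2 ORDER BY timestamp DESC"


def build_jobapp_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE jobapp (
               id        INTEGER PRIMARY KEY,
               fullname  TEXT NOT NULL,
               icno      TEXT NOT NULL,
               status    INTEGER NOT NULL DEFAULT 1,
               timestamp INTEGER NOT NULL
           )"""
    )
    return conn


def plan_details(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def needs_sort(conn, sql):
    """True if the plan contains a separate sort step for ORDER BY."""
    return any("TEMP B-TREE FOR ORDER BY" in d for d in plan_details(conn, sql))


if __name__ == "__main__":
    conn = build_jobapp_db()
    print(plan_details(conn, QUERY))   # plan before adding the index
    # Equality column first, then the ORDER BY column.
    conn.execute("CREATE INDEX idx_status_timestamp ON jobapp (status, timestamp)")
    print(plan_details(conn, QUERY))   # plan after adding the index
```

For the original two-table join the optimizer may still choose to sort, depending on which table it reads first, so the real query should be re-checked with EXPLAIN on realistic data volumes.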