LINQ query with FirstOrDefault vs ToArray - mysql

Using MySQL 5.6 and the following table structure:
CREATE TABLE `dataitem` (
`AI` int(11) unsigned NOT NULL AUTO_INCREMENT,
`ID` binary(16) NOT NULL,
`OwnerID` binary(16) NOT NULL,
`DataItemTimeUtc` datetime NOT NULL,
`DataItemTimeLocal` datetime NOT NULL,
`DataItemTimeMicroSeconds` int(11) NOT NULL,
`DataItemArrivalTimeUtc` datetime NOT NULL DEFAULT '0001-01-01 00:00:00',
`DataItemTimeTimeZoneID` binary(16) NOT NULL,
`QuestionID` binary(16) NOT NULL,
`QuestionHistoryID` binary(16) DEFAULT NULL,
`QuestionAbsolutePositionID` varchar(1000) COLLATE utf8_unicode_ci DEFAULT NULL,
`GroupSessionIDString` varchar(250) COLLATE utf8_unicode_ci DEFAULT NULL,
`DataItemType` int(11) NOT NULL,
`DataEntryDevice` varchar(250) COLLATE utf8_unicode_ci DEFAULT NULL,
`DataEntryDeviceCradle` varchar(250) COLLATE utf8_unicode_ci DEFAULT NULL,
`DataItemXml` longtext COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`AI`),
UNIQUE KEY `dataitem_ID_UQ_Idx` (`ID`),
KEY `dataitem_OwnerID_Idx` (`OwnerID`),
KEY `dataitem_DataItemTimeUtc_Idx` (`DataItemTimeUtc`),
KEY `dataitem_QuestionID_Idx` (`QuestionID`),
KEY `dataitem_QuestionHistoryID_Idx` (`QuestionHistoryID`),
KEY `dataitem_QuestionAbsolutePositionID_Idx` (`QuestionAbsolutePositionID`(255)),
KEY `dataitem_DataItemType_Idx` (`DataItemType`)
) ENGINE=InnoDB AUTO_INCREMENT=23467 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I am experiencing something that I am struggling to understand. The following query causes a fatal error because it takes forever to execute:
Guid patientid = new Guid("cfed2acf-acbd-4ab2-8c23-7ab0b3a8cfa3");
var latestRecord = (from f in QueryHelper.GetEntityTable<DataItem>()
                    where f.OwnerID == patientid
                          && f.QuestionAbsolutePositionID == "5867FF5EC08B9C0422EFD1359B2802B29A8E167952D381EC70AE53CE6D4C9318"
                    orderby f.DataItemTimeUtc descending
                    select f.ID).FirstOrDefault();
However, if I change .FirstOrDefault() to .ToArray(), the query runs in a flash and returns 2 results. Can someone explain this?
SQL query generated from .ToArray():
SELECT t0.`ID`
FROM `DataItem` AS t0
WHERE ((t0.`OwnerID` = #p0) AND (t0.`QuestionAbsolutePositionID` = #p1))
ORDER BY t0.`DataItemTimeUtc` DESC
-- p0 = [cfed2acf-acbd-4ab2-8c23-7ab0b3a8cfa3]
-- p1 = [5867FF5EC08B9C0422EFD1359B2802B29A8E167952D381EC70AE53CE6D4C9318]
SQL query generated from .FirstOrDefault():
SELECT t0.`ID`
FROM `DataItem` AS t0
WHERE ((t0.`OwnerID` = #p0) AND (t0.`QuestionAbsolutePositionID` = #p1))
ORDER BY t0.`DataItemTimeUtc` DESC
LIMIT 0, 1
-- p0 = [cfed2acf-acbd-4ab2-8c23-7ab0b3a8cfa3]
-- p1 = [5867FF5EC08B9C0422EFD1359B2802B29A8E167952D381EC70AE53CE6D4C9318]

First, a likely explanation of the difference: with the LIMIT 1 that FirstOrDefault() adds, the optimizer is probably tempted to walk the DataItemTimeUtc index from the newest row backwards, checking each row against the WHERE clause; that can take "forever" if the matching rows are rare or old. Without the LIMIT it filters via the OwnerID index first.
Now figure out why QuestionAbsolutePositionID needs to be 1000 characters long. If it can be less than 256, make it so. If not, ask yourself whether it can be changed to CHARACTER SET ascii. It looks like hex, which works fine with ascii. (Rarely do "ids" include accented letters, Cyrillic, Japanese, etc.) If neither of those fixes is possible, can you upgrade to MySQL 5.7?
Once you have fixed the problem of index size (above), add this 'composite' (and 'covering') index; it should speed up the query:
INDEX(OwnerID, QuestionAbsolutePositionID, DataItemTimeUtc, ID)
(The first two columns can be in either order.)
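Written as DDL, and assuming the column can be shrunk to 255 characters and switched to ascii (either fix alone may be enough; the index name here is illustrative), this might look like:

```sql
ALTER TABLE dataitem
  MODIFY QuestionAbsolutePositionID VARCHAR(255) CHARACTER SET ascii DEFAULT NULL,
  ADD INDEX dataitem_Owner_QAPID_Time_Idx
      (OwnerID, QuestionAbsolutePositionID, DataItemTimeUtc, ID);
```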
If it does not help, then we need to discuss the #variables.

Error while I'm trying to partition a table

Here is my posts table:
CREATE TABLE `posts` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`img` varchar(255) COLLATE utf8_croatian_ci NOT NULL,
`vid` varchar(255) COLLATE utf8_croatian_ci NOT NULL,
`title` varchar(255) COLLATE utf8_croatian_ci NOT NULL,
`subtitle` varchar(255) COLLATE utf8_croatian_ci NOT NULL,
`auth` varchar(54) COLLATE utf8_croatian_ci NOT NULL,
`story` longtext COLLATE utf8_croatian_ci NOT NULL,
`tags` varchar(255) COLLATE utf8_croatian_ci NOT NULL,
`status` varchar(100) COLLATE utf8_croatian_ci NOT NULL,
`moder` varchar(50) COLLATE utf8_croatian_ci NOT NULL,
`rec` varchar(50) COLLATE utf8_croatian_ci NOT NULL,
`pos` varchar(50) COLLATE utf8_croatian_ci NOT NULL,
`inde` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=117 DEFAULT CHARSET=utf8 COLLATE=utf8_croatian_ci
I want to make two partitions in order to improve query performance.
First partition should contain all non-archive rows.
Second partition - all archive rows.
ALTER TABLE posts
PARTITION BY LIST COLUMNS (status)
(
PARTITION P1 VALUES IN ('admin', 'moder', 'public', 'rec'),
PARTITION P2 VALUES IN ('archive')
);
phpmyadmin error:
Static analysis:
1 errors were found during analysis.
Unrecognized alter operation. (near "" at position 0)
MySQL said:
#1503 - A PRIMARY KEY must include all columns in the table's partitioning function
Any help?
What queries are you trying to speed up? Since the only index you currently have is on id, WHERE id=... or WHERE id BETWEEN ... AND ... are the only queries that will be fast. And the partitioning you suggest will not help much with other queries.
You seem to have only dozens of rows; don't consider partitioning unless you expect to have at least a million rows.
status has only 5 values? Then make it ENUM('archive', 'admin', 'moder', 'public', 'rec') NOT NULL. That will take 1 byte instead of many.
If you will be querying on date and/or status and/or auth, then let's talk about indexes, especially 'composite' indexes on such. And, to achieve the "archive" split you envision, put status as the first column in the index.
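A sketch of both suggestions together (the ENUM conversion and a composite index with status first; the index name and the choice of date as the second column are illustrative):

```sql
ALTER TABLE posts
  MODIFY status ENUM('archive','admin','moder','public','rec') NOT NULL,
  ADD INDEX status_date_idx (status, `date`);
```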

Clear MySQL Slow Query Log - Rails

The following query ends up in the slow query log.
APN::Notification.
select('apn_notifications.*, devices.device_uid').
joins('INNER JOIN apn_devices ON (apn_notifications.device_id = apn_devices.id) INNER JOIN devices ON (apn_devices.device_id = devices.id)').
where(['apn_notifications.sent_at IS NULL AND apn_notifications.badge > 0 AND devices.customer_id = ? AND devices.device_type IN (?)', customer.id, Object::Device.platform_device_types('ios')])
The output of EXPLAIN
EXPLAIN for: SELECT apn_notifications.*, devices.device_uid FROM `apn_notifications` INNER JOIN apn_devices ON (apn_notifications.device_id = apn_devices.id) INNER JOIN devices ON (apn_devices.device_id = devices.id) WHERE (apn_notifications.disabled_at IS NULL) AND (apn_notifications.sent_at IS NULL AND apn_notifications.badge > 0 AND devices.customer_id = 1 AND devices.device_type IN ('iphone4','ipad','iphone3'))
The Output of 'show create table apn_notifications'
| apn_notifications | CREATE TABLE `apn_notifications` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`device_id` int(11) DEFAULT NULL,
`errors_nb` int(11) DEFAULT NULL,
`device_language` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`sound` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`alert` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`badge` int(11) DEFAULT NULL,
`custom_properties` text COLLATE utf8_unicode_ci,
`sent_at` datetime DEFAULT NULL,
`disabled_at` datetime DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_apn_notifications_on_device_id` (`device_id`)
) ENGINE=InnoDB AUTO_INCREMENT=12984412 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
The apn_notifications table has 1.5 million records, so adding an index takes a long time. What is the best way to get this query out of the slow log?
Also, from MySQL 5.6 on, adding an index will not result in any downtime. Am I right?
Often "composite" indexes are better:
apn_notifications: INDEX(device_id, sent_at, badge)
apn_notifications: INDEX(sent_at, badge)
devices: INDEX(customer_id, device_type)
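Written out as DDL, those suggestions would look something like this (index names are arbitrary):

```sql
ALTER TABLE apn_notifications ADD INDEX device_sent_badge (device_id, sent_at, badge);
ALTER TABLE apn_notifications ADD INDEX sent_badge (sent_at, badge);
ALTER TABLE devices ADD INDEX customer_type (customer_id, device_type);
```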
apn_devices is a many-to-many mapping? If so, check that it is following the guidelines here.
I added the following indexes and that reduced lots of time.
add_index :apn_notifications, :sent_at
add_index :apn_notifications, :badge
NOTE: Indexes were already added for the foreign keys.

MySQL indexing optimization

I have the following table, element:
CREATE TABLE `element` (
`eid` bigint(22) NOT NULL AUTO_INCREMENT,
`tag_name` varchar(45) COLLATE utf8_bin DEFAULT NULL,
`text` text COLLATE utf8_bin,
`depth` tinyint(2) DEFAULT NULL,
`classes` tinytext COLLATE utf8_bin,
`webarchiver_uniqueid` int(11) DEFAULT NULL,
`created` datetime DEFAULT NULL,
`updated` datetime DEFAULT NULL,
`rowstatus` char(1) COLLATE utf8_bin DEFAULT 'A',
PRIMARY KEY (`eid`)
) ENGINE=InnoDB AUTO_INCREMENT=12090 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Column details and current index details are given above. Almost 90% of queries on this table are like:
select * from element
where tag_name = 'XXX'
and text = 'YYYY'
and depth = 20
and classes = 'ZZZZZ'
and rowstatus = 'A'
What would be the most optimal way to create index on this table? The table has around 60k rows.
Change classes from TINYTEXT to VARCHAR(255) (or some more reasonable size), then have
INDEX(tag_name, depth, classes)
with the columns in any order. I left out rowstatus because it smells like a column that is likely to change. (Anyway, a flag does not add much to an index.)
You can't include TEXT or BLOB columns in an index without specifying a prefix length, and a 'prefix' index is rarely worth it.
Since a PRIMARY KEY is a UNIQUE key, DROP INDEX eid_UNIQUE.
Is there some reason for picking "binary" / "utf8_bin" for all the character fields?
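A sketch of the suggested change, assuming 255 characters is enough for classes (the index name is illustrative):

```sql
ALTER TABLE element
  MODIFY classes VARCHAR(255) COLLATE utf8_bin DEFAULT NULL,
  ADD INDEX tag_depth_classes (tag_name, depth, classes);
```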

How to optimize data aggregation in day of week, time of day for a period of one month back

Here is the situation:
I have a SAAS Application which is a simple RSS Feed reader. I think most people know what this is - users subscribing to RSS feeds and then reading items from them. Nothing new. One feed can have many subscribers.
I've implemented some statistics for the users, but I don't think I've chosen the right approach, because things are getting slower by the hour as the number of users and feeds grows.
Here's what I'm doing now:
At every hour get the total number of articles for each feed:
SELECT COUNT(*) FROM articles WHERE feed_id=?
Get the previous value to calculate the delta (this is getting a little slow):
SELECT value FROM feeds_stats WHERE feed_id=? AND name='total_articles' ORDER BY date DESC LIMIT 1
Insert the new value and delta:
INSERT INTO feeds_stats (date,feed_id,name,value,delta) VALUES ('".date("Y-m-d H:i:s",$global_timestamp)."','".$feed_id."','total_articles','".$value."','".($value-$old_value)."')
For every user get his feeds and for each feed get the number of articles he has read:
SELECT COUNT(*) FROM users_articles ua JOIN articles a ON a.id=ua.article_id WHERE a.feed_id='%s' AND ua.user_id='%s' AND ua.read=1
users_articles is a table which holds the read state of each article per user
Then again get the delta:
SELECT value FROM users_feeds_stats WHERE user_id='?' AND feed_id='?' AND name='total_reads' ORDER BY date DESC LIMIT 1
And insert the new value + delta:
INSERT INTO users_feeds_stats (date,user_id,feed_id,name,value,delta) VALUES ('".date("Y-m-d H:i:s",$global_timestamp)."','".$user_id."','".$feed_id."','total_reads','".$value."','".($value-$old_value)."')
When all feeds for the user have been processed, the aggregation part begins:
This is a bit tricky and I think there should be a lot of room for optimization here.
Here is the actual aggregation function in PHP:
<?php
function aggregate_user_stats($user_id = false, $feed_id = false){
    global $global_timestamp;
    // defined dimensions
    $feed_types[0] = array("days_back" => 31, "group_by" => "DATE_FORMAT(date, '%Y-%m-%d')");
    $feed_types[1] = array("days_back" => 31, "group_by" => "WEEKDAY(date)+1");
    $feed_types[2] = array("days_back" => 31, "group_by" => "HOUR(date)");
    $where = "";
    if($user_id){
        $where = " WHERE id=".$user_id;
    }
    $feed_where = "";
    $getusers = mysql_query("SELECT id FROM users".$where) or die(__LINE__." ".mysql_error());
    while($user = mysql_fetch_assoc($getusers)){
        if($feed_id){
            $feed_where = " AND feed_id=".$feed_id;
        }
        $getfeeds = mysql_query("SELECT feed_id FROM subscriptions WHERE user_id='".$user["id"]."' AND active=1".$feed_where) or die(__LINE__." ".mysql_error());
        // NOTE: the feed row gets its own variable ($feed); the original code reused
        // $row for both this loop and the inner result loop, so the DELETE/REPLACE
        // statements below ran with a clobbered feed_id.
        while($feed = mysql_fetch_assoc($getfeeds)){
            foreach($feed_types as $tab => $type){
                $getdata = mysql_query("
                    SELECT ".$type["group_by"]." AS date, name, SUM(delta) AS delta
                    FROM feeds_stats
                    WHERE feed_id = '".$feed["feed_id"]."' AND name='total_articles'
                      AND date > DATE_SUB(NOW(), INTERVAL ".$type["days_back"]." DAY)
                    GROUP BY name, ".$type["group_by"]."
                    UNION
                    SELECT ".$type["group_by"]." AS date, name, SUM(delta) AS delta
                    FROM users_feeds_stats
                    WHERE user_id = '".$user["id"]."' AND feed_id = '".$feed["feed_id"]."' AND name='total_reads'
                      AND date > DATE_SUB(NOW(), INTERVAL ".$type["days_back"]." DAY)
                    GROUP BY name, ".$type["group_by"]."
                ") or die(__LINE__." ".mysql_error());
                $data = array();
                while($row = mysql_fetch_assoc($getdata)){
                    $data[$row["date"]][$row["name"]] = $row["delta"];
                }
                if(count($data)){
                    db_start_trx();
                    mysql_query("DELETE FROM stats_feeds_over_time WHERE feed_id='".$feed["feed_id"]."' AND user_id='".$user["id"]."' AND tab='".$tab."'") or die(__LINE__." ".mysql_error());
                    foreach($data as $time => $keys){
                        mysql_query("REPLACE INTO stats_feeds_over_time (feed_id,user_id,tab,date,total_articles,total_reads,total_favs) VALUES ('".$feed["feed_id"]."','".$user["id"]."','".$tab."','".$time."','".$keys["total_articles"]."','".$keys["total_reads"]."','".$keys["total_favs"]."')") or die(__LINE__." ".mysql_error());
                    }
                    db_commit_trx();
                }
            }
        }
    }
}
Some notes:
Edit: Here are the DDL's of the involved tables:
CREATE TABLE `articles` (
`id` INTEGER(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`feed_id` INTEGER(11) UNSIGNED NOT NULL,
`date` INTEGER(10) UNSIGNED NOT NULL,
`date_updated` INTEGER(11) UNSIGNED NOT NULL,
`title` VARCHAR(1000) COLLATE utf8_general_ci NOT NULL DEFAULT '',
`url` VARCHAR(2000) COLLATE utf8_general_ci NOT NULL DEFAULT '',
`author` VARCHAR(200) COLLATE utf8_general_ci NOT NULL DEFAULT '',
`hash` CHAR(32) COLLATE utf8_general_ci NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
UNIQUE KEY `feed_id_hash` (`feed_id`, `hash`),
KEY `date` (`date`),
KEY `url` (`url`(255))
)ENGINE=InnoDB
AUTO_INCREMENT=0
CHARACTER SET 'utf8' COLLATE 'utf8_general_ci'
COMMENT='';
CREATE TABLE `users_articles` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`user_id` INTEGER(11) UNSIGNED NOT NULL,
`article_id` INTEGER(11) UNSIGNED NOT NULL,
`subscription_id` INTEGER(11) UNSIGNED NOT NULL,
`read` TINYINT(4) UNSIGNED NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `user_id` (`user_id`, `article_id`),
KEY `article_id` (`article_id`),
KEY `subscription_id` (`subscription_id`)
)ENGINE=InnoDB
CHECKSUM=1 AUTO_INCREMENT=0
CHARACTER SET 'utf8' COLLATE 'utf8_general_ci'
COMMENT='';
CREATE TABLE `feeds_stats` (
`id` INTEGER(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`feed_id` INTEGER(11) UNSIGNED NOT NULL,
`date` DATETIME NOT NULL,
`name` VARCHAR(50) COLLATE utf8_general_ci NOT NULL DEFAULT '',
`value` INTEGER(11) NOT NULL,
`delta` INTEGER(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`),
KEY `feed_id` (`feed_id`),
KEY `date` (`date`)
)ENGINE=InnoDB
AUTO_INCREMENT=0
CHARACTER SET 'utf8' COLLATE 'utf8_general_ci'
COMMENT='';
CREATE TABLE `users_feeds_stats` (
`id` INTEGER(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`user_id` INTEGER(11) UNSIGNED NOT NULL DEFAULT '0',
`feed_id` INTEGER(11) UNSIGNED NOT NULL,
`date` DATETIME NOT NULL,
`name` VARCHAR(50) COLLATE utf8_general_ci NOT NULL DEFAULT '',
`value` INTEGER(11) NOT NULL,
`delta` INTEGER(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`),
KEY `feed_id` (`feed_id`),
KEY `user_id` (`user_id`),
KEY `date` (`date`)
)ENGINE=InnoDB
AUTO_INCREMENT=0
CHARACTER SET 'utf8' COLLATE 'utf8_general_ci'
COMMENT='';
CREATE TABLE `stats_feeds_over_time` (
`feed_id` INTEGER(11) UNSIGNED NOT NULL,
`user_id` INTEGER(11) NOT NULL,
`tab` INTEGER(11) NOT NULL,
`date` VARCHAR(30) COLLATE utf8_general_ci NOT NULL DEFAULT '',
`total_articles` DOUBLE(9,2) UNSIGNED NOT NULL,
`total_reads` DOUBLE(9,2) UNSIGNED NOT NULL,
`total_favs` DOUBLE(9,2) UNSIGNED NOT NULL,
PRIMARY KEY (`feed_id`, `user_id`, `tab`, `date`)
)ENGINE=InnoDB
AUTO_INCREMENT=0
CHARACTER SET 'utf8' COLLATE 'utf8_general_ci'
COMMENT='';
At the end of the aggregation function there is a REPLACE into the table stats_feeds_over_time. This table holds just the records that will be displayed on the graph, so the actual graphing process does not involve heavy queries.
Finally here are the graphs produced by this:
I would be glad if someone point me in the right direction on where and how to optimize this solution, even if it means to ditch MySQL for statistics.
I have long experience with RRDTool, but here the situation is different, because of the "Time of day", "Day of week" aggregations.
I don't know how important the queries you wish to optimize are relative to the other queries you run on the same set of tables. I will assume you want these queries optimized first.
Seeing that all the queries filter on feed_id in their WHERE clauses, I would try to partition the articles table on that column:
CREATE TABLE `articles` (
`id` INTEGER(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`feed_id` INTEGER(11) UNSIGNED NOT NULL,
-- etc.
)ENGINE=InnoDB
AUTO_INCREMENT=0
CHARACTER SET 'utf8' COLLATE 'utf8_general_ci'
COMMENT=''
PARTITION BY KEY(feed_id)
PARTITIONS 10;
The number of partitions (10 above) can be tuned to your needs, yet must be above 1 to have any impact. You might want a larger number to make your SELECT queries faster. However, any query that does not filter on feed_id will be slowed down by this arrangement.
The same process can be applied to other tables with columns often used as discriminants in queries.
However, since your first two queries are executed for all the feeds, you could rewrite them as follows:
SELECT feed_id, COUNT(feed_id)
FROM articles
GROUP BY feed_id
SELECT fs.feed_id, fs.value
FROM feeds_stats fs
JOIN (SELECT feed_id, MAX(date) AS maxdate
      FROM feeds_stats
      WHERE name = 'total_articles'
      GROUP BY feed_id) latest
  ON latest.feed_id = fs.feed_id
 AND latest.maxdate = fs.date
WHERE fs.name = 'total_articles'
Both of these retrieve the results for all feeds at once, which frees you from running the queries for each individual feed. Using these queries makes the partitioning counterproductive, so you will have to choose between the two.
The good point of partitioning: any query filtering on one particular value of feed_id (or any other column used for partitioning) will see a significant boost. The bad point is that all other queries will be slowed down.
The good point of the second solution is that it has no impact on other queries.
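Whichever route you take, the per-feed delta lookups would likely also benefit from composite indexes rather than the separate single-column keys in the current DDL. A sketch (index names are arbitrary):

```sql
ALTER TABLE feeds_stats
  ADD INDEX feed_name_date (feed_id, name, `date`);
ALTER TABLE users_feeds_stats
  ADD INDEX user_feed_name_date (user_id, feed_id, name, `date`);
```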

Accidentally wrote a weird SQL query. What happened?

I ran this sql query in my database:
update payments set method = 'paysafecard' AND amount = 25 WHERE payment_id IN (1,2,3,4,5,...)
Of course I meant set method = 'paysafecard', amount = 25.
However, I ran it in phpMyAdmin and it reported that rows were affected. Running it again reported 0 rows affected.
I don't know what may have changed in the database. What could this have done?
My table looks like this:
CREATE TABLE IF NOT EXISTS `payments` (
`payment_id` int(11) NOT NULL AUTO_INCREMENT,
`method_unique_id` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`method` enum('moneybookers','paypal','admin','wallet','voucher','sofortueberweisung','bitcoin','paysafecard','paymentwall') COLLATE utf8_unicode_ci NOT NULL,
`method_tid` int(11) DEFAULT NULL,
`uid` int(11) NOT NULL,
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`plan` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`expires_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`amount` decimal(8,2) NOT NULL,
`currency` enum('EUR','USD','BTC') COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`payment_id`),
UNIQUE KEY `method` (`method`,`method_tid`),
UNIQUE KEY `method_unique_id` (`method_unique_id`,`method`),
KEY `expires_at` (`expires_at`),
KEY `uid` (`uid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=8030 ;
I am running
-- Server version: 5.1.41
-- PHP Version: 5.3.2-1ubuntu4.11
This would result in the method field being set to 0 for all of the records matching the WHERE clause (for an ENUM column, the numeric value 0 is stored as the special empty-string error value '').
It is interpreted as the following:
set method = ('paysafecard' AND amount = 25)
This is a logical AND, which evaluates to a boolean for these records: the string 'paysafecard' is cast to the number 0, which is false, so the whole expression is 0 regardless of amount. That also explains why the second run reported 0 rows affected: the columns already held the values being assigned, and amount was never actually changed.
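To illustrate (the payment IDs are taken from the question; the first statement is what effectively ran, the second what was intended):

```sql
-- What effectively ran: method receives the boolean result, i.e. 0
UPDATE payments
SET method = ('paysafecard' AND amount = 25)
WHERE payment_id IN (1,2,3,4,5);

-- What was intended: two separate assignments
UPDATE payments
SET method = 'paysafecard',
    amount = 25
WHERE payment_id IN (1,2,3,4,5);
```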