Keeping last seen values in a summary table - mysql

I have two tables:
parameters holds every para_id with its name and is always kept up to date with all parameters.
CREATE TABLE `parameters` (
`para_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL DEFAULT '',
PRIMARY KEY (`para_id`),
UNIQUE KEY `idx_parameters_name` (`name`)
) ENGINE=InnoDB;
processing holds a new chunk of data every 5 minutes.
CREATE TABLE `processing` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`t_ns` bigint(20) unsigned NOT NULL DEFAULT '0',
`para_id` int(10) unsigned NOT NULL DEFAULT '0',
`value` varchar(1024) NOT NULL DEFAULT '',
`isanchor` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `data` (`para_id`,`t_ns`)
) ENGINE=InnoDB;
I want to maintain a table actual_values holding the last seen value of each parameter (if it occurred in processing). New para_ids are added to actual_values with an INSERT IGNORE before the update. Currently I use these queries:
INSERT IGNORE INTO actual_values (para_id) (SELECT DISTINCT para_id FROM parameters);
UPDATE actual_values a
JOIN processing p ON a.para_id = p.para_id
SET a.value = (SELECT p.value FROM processing p WHERE a.para_id = p.para_id ORDER BY t_ns DESC LIMIT 1);
I suspect this is not the optimal way to do it; it takes quite a long time. Does anyone have a better suggestion?
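A minimal sketch of a set-based alternative, assuming actual_values is simply (para_id PRIMARY KEY, value), since its definition isn't shown. Instead of running a correlated ORDER BY ... LIMIT 1 subquery for every joined processing row, it computes MAX(t_ns) per para_id once, joins back to fetch that row's value, and upserts all results in a single statement. Python's sqlite3 module stands in for MySQL so the example is self-contained (ON CONFLICT needs SQLite 3.24 or later); in MySQL the last clause would presumably become ON DUPLICATE KEY UPDATE value = VALUES(value).

```python
import sqlite3

# Assumed schema: the question never shows actual_values, so
# (para_id PRIMARY KEY, value) is a guess.
SCHEMA = """
CREATE TABLE parameters (para_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE processing (
    id INTEGER PRIMARY KEY,
    t_ns INTEGER NOT NULL,
    para_id INTEGER NOT NULL,
    value TEXT NOT NULL,
    isanchor INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX data ON processing (para_id, t_ns);
CREATE TABLE actual_values (para_id INTEGER PRIMARY KEY, value TEXT);
"""


def refresh_actual_values(conn: sqlite3.Connection) -> None:
    """Store the most recent processing.value for each para_id in actual_values."""
    # Same role as the INSERT IGNORE step: every known parameter gets a row.
    conn.execute(
        "INSERT OR IGNORE INTO actual_values (para_id) SELECT para_id FROM parameters"
    )
    # Compute MAX(t_ns) per para_id once (the (para_id, t_ns) index should help),
    # join back to fetch that row's value, and upsert all results in one pass.
    # SQLite needs the "WHERE 1" to parse INSERT ... SELECT ... ON CONFLICT.
    conn.execute(
        """
        INSERT INTO actual_values (para_id, value)
        SELECT p.para_id, p.value
        FROM processing AS p
        JOIN (SELECT para_id, MAX(t_ns) AS max_t
              FROM processing
              GROUP BY para_id) AS latest
          ON latest.para_id = p.para_id AND latest.max_t = p.t_ns
        WHERE 1
        ON CONFLICT(para_id) DO UPDATE SET value = excluded.value
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO parameters (para_id, name) VALUES (?, ?)",
    [(1, "temp"), (2, "pressure"), (3, "unused")],
)
conn.executemany(
    "INSERT INTO processing (t_ns, para_id, value) VALUES (?, ?, ?)",
    [(100, 1, "20.1"), (200, 1, "20.7"), (150, 2, "1.01")],
)
refresh_actual_values(conn)
print(conn.execute("SELECT para_id, value FROM actual_values ORDER BY para_id").fetchall())
# [(1, '20.7'), (2, '1.01'), (3, None)]
```

Parameter 3 never appeared in processing, so it keeps a NULL value, matching the "if it occurred in processing" requirement.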

Related

MySQL indexing for INSERT INTO... ON DUPLICATE UPDATE

To give you some context, we have a huge table in our database with well over 15 million rows.
We are executing an INSERT INTO ... ON DUPLICATE KEY UPDATE query on this table, which takes more than 20 minutes to complete.
Example query:
INSERT INTO table1 (date_time, block_start, block_end, tx_id, tz_id, z_id, interval_span,
interval_id, updated, req, imp, cli)
VALUES ('2018-02-02 15:55:00', '2018-02-02 15:55:00', '2018-02-02 15:59:59', '51530',
'51530', '8005', '5', '1631', '2018-02-02 15:58:50', '1', '0', '0')
ON DUPLICATE KEY
UPDATE req = req + 1, imp = imp + 0, cli = cli + 0
The table structure is as follows:
CREATE TABLE `table1` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`date_time` datetime NOT NULL,
`interval_span` int(10) unsigned NOT NULL,
`interval_id` int(10) unsigned NOT NULL,
`block_start` datetime NOT NULL,
`block_end` datetime NOT NULL,
`tx_id` int(10) unsigned NOT NULL,
`tz_id` int(10) unsigned NOT NULL,
`z_id` int(10) unsigned NOT NULL,
`req` int(10) unsigned NOT NULL DEFAULT '0',
`imp` int(10) unsigned NOT NULL DEFAULT '0',
`cli` int(10) unsigned NOT NULL DEFAULT '0',
`updated` datetime NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `iaz_table1` (`block_start`,`tx_id`,`z_id`),
KEY `tx_id` (`tx_id`,`date_time`),
KEY `z_id` (`z_id`,`date_time`),
KEY `date_time` (`date_time`),
KEY `block_start` (`block_start`)
) ENGINE=InnoDB AUTO_INCREMENT=257679784 DEFAULT CHARSET=utf8
How can I improve the speed of this insert? I need to achieve execution time of less than 5 seconds.
Sounds like you do not have a PRIMARY KEY or UNIQUE KEY included in the list of columns: (date_time, block_start, block_end, tx_id, tz_id, z_id, interval_span, interval_id, updated, req, imp, cli).
A table definition would be helpful. It looks like all fields are strings, but many of them could probably be integers. Integer comparisons are much faster than VARCHAR comparisons, and integers take up far less space. See this post:
SQL SELECT speed int vs varchar
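To illustrate the mechanism the first answer refers to: the duplicate check relies on a PRIMARY KEY or UNIQUE index whose columns are all in the insert list, which in the posted table is iaz_table1 (block_start, tx_id, z_id). The sketch below is a reduced stand-in using Python's sqlite3 and SQLite's ON CONFLICT upsert (SQLite syntax, not MySQL's). The helper record_hit is hypothetical, it binds tx_id and z_id as integers rather than quoted strings, and adding the incoming counts via excluded.* generalizes the constant req + 1 from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-in for table1: only the columns the upsert needs.
conn.execute(
    """
    CREATE TABLE table1 (
        id INTEGER PRIMARY KEY,
        block_start TEXT NOT NULL,
        tx_id INTEGER NOT NULL,
        z_id INTEGER NOT NULL,
        req INTEGER NOT NULL DEFAULT 0,
        imp INTEGER NOT NULL DEFAULT 0,
        cli INTEGER NOT NULL DEFAULT 0,
        UNIQUE (block_start, tx_id, z_id)  -- plays the role of iaz_table1
    )
    """
)

# On a clash in the unique key, add the new counts to the existing row.
UPSERT = """
INSERT INTO table1 (block_start, tx_id, z_id, req, imp, cli)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (block_start, tx_id, z_id) DO UPDATE SET
    req = req + excluded.req,
    imp = imp + excluded.imp,
    cli = cli + excluded.cli
"""


def record_hit(conn, block_start, tx_id, z_id, req=1, imp=0, cli=0):
    """Hypothetical helper: ids are bound as integers, not quoted strings."""
    conn.execute(UPSERT, (block_start, tx_id, z_id, req, imp, cli))


record_hit(conn, "2018-02-02 15:55:00", 51530, 8005)
record_hit(conn, "2018-02-02 15:55:00", 51530, 8005)
print(conn.execute("SELECT block_start, tx_id, z_id, req FROM table1").fetchall())
# [('2018-02-02 15:55:00', 51530, 8005, 2)]
```

The second call hits the unique key and increments req on the existing row instead of inserting a new one.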

MySQL gives NULL for a record when JOIN and COUNT are used, but a plain SELECT works fine. Why?

Table 1
CREATE TABLE IF NOT EXISTS `com_msg` (
`msg_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`msg_to` int(10) NOT NULL,
`msg_from` int(10) NOT NULL,
`msg_new` tinyint(1) unsigned NOT NULL DEFAULT '1',
`msg_content` varchar(300) NOT NULL,
`msg_date` date NOT NULL,
`bl_sender` tinyint(1) unsigned NOT NULL DEFAULT '0',
`bl_recip` tinyint(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`msg_id`),
UNIQUE KEY `msg_id` (`msg_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Table 2
CREATE TABLE IF NOT EXISTS `ac_vars` (
`user_id` int(10) unsigned NOT NULL,
`ac_ballance` smallint(3) unsigned NOT NULL DEFAULT '0',
`prof_views` mediumint(8) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`user_id`),
UNIQUE KEY `id` (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
When I use query :
SELECT ac_ballance, prof_views, COUNT( msg_id ) AS messages
FROM ac_vars
INNER JOIN com_msg ON user_id = msg_to
WHERE user_id =".$userid." AND com_msg.msg_new =1;
I get:
ac_ballance = NULL (incorrect)
prof_views = NULL (incorrect)
messages = 0 (correct)
But a SELECT on ac_vars alone returns the correct values. What is the correct way to do this?
You want rows from the table ac_vars even when there's no corresponding row in the table com_msg.
So you must use a LEFT JOIN:
SELECT ac_ballance, prof_views, COUNT( msg_id ) AS messages
FROM ac_vars
LEFT JOIN com_msg
ON user_id = msg_to AND com_msg.msg_new =1
WHERE user_id =".$userid.";
Please note that the condition
com_msg.msg_new =1
has to be part of the JOIN condition, not the WHERE clause: when no row in com_msg fulfills it, the LEFT JOIN fills the com_msg columns with NULL, and a WHERE filter on msg_new would then discard the ac_vars row as well.
Note
Adding
GROUP BY ac_ballance, prof_views
is not needed in MySQL, because the values in those columns depend directly on user_id and the WHERE clause allows only a single row.
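A small reproduction of the difference, using Python's sqlite3 as a stand-in for MySQL (SQLite likewise returns a single aggregate row with NULL in the non-aggregated columns when the join produces no rows). The schemas are trimmed to the relevant columns, and the user id is passed as a bound parameter instead of being concatenated into the SQL string; in the PHP code that would correspond to a prepared statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ac_vars (
        user_id INTEGER PRIMARY KEY,
        ac_ballance INTEGER NOT NULL DEFAULT 0,
        prof_views INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE com_msg (
        msg_id INTEGER PRIMARY KEY,
        msg_to INTEGER NOT NULL,
        msg_new INTEGER NOT NULL DEFAULT 1
    );
    INSERT INTO ac_vars VALUES (42, 150, 7);
    -- User 42 has one message, but it has already been read.
    INSERT INTO com_msg (msg_to, msg_new) VALUES (42, 0);
    """
)

INNER = """
SELECT ac_ballance, prof_views, COUNT(msg_id) AS messages
FROM ac_vars
INNER JOIN com_msg ON user_id = msg_to
WHERE user_id = ? AND com_msg.msg_new = 1
"""

LEFT = """
SELECT ac_ballance, prof_views, COUNT(msg_id) AS messages
FROM ac_vars
LEFT JOIN com_msg ON user_id = msg_to AND com_msg.msg_new = 1
WHERE user_id = ?
"""

# The inner join yields no rows, so the aggregate row has NULL columns.
print(conn.execute(INNER, (42,)).fetchone())  # (None, None, 0)
# The LEFT JOIN keeps the ac_vars row; COUNT(msg_id) skips the NULL msg_id.
print(conn.execute(LEFT, (42,)).fetchone())   # (150, 7, 0)
```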

How to optimize this mysql join on large table?

I have a project where the admin needs to create multiple newsletters with some crawled posts from the web.
I insert the posts into the posts table after crawling has completed and assign each one a feed_id to identify its source. This is the structure of the posts table (truncated):
CREATE TABLE `posts` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`feed_id` int(11) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`identifier` varchar(255) DEFAULT NULL,
`published` timestamp NULL DEFAULT NULL,
`content` longtext,
...
...
`is_unread` int(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Every admin (user) has access to one or more "feeds". On the newsletter creation page I want to show them a list of posts from the feeds they are allowed to see, along with a button to put each post into specific categories of that newsletter. If the user previously selected a post, I should show that and let them remove it from the category. So I have some other tables too: newsletters, categories, newsletter_post and category_post. Here are their structures:
newsletters:
CREATE TABLE `newsletters` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`sent_at` timestamp NULL DEFAULT NULL,
`title` varchar(255) DEFAULT NULL,
`date` date DEFAULT NULL,
`topic_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
categories:
CREATE TABLE `categories` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`topic_id` int(11) NOT NULL,
`title` varchar(255) DEFAULT NULL,
`slug` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
newsletter_post:
CREATE TABLE `newsletter_post` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`newsletter_id` int(11) NOT NULL,
`post_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
category_post:
CREATE TABLE `category_post` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`category_id` int(11) NOT NULL,
`post_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
So I'm using this query to find posts for the allowed feeds and check the status if a post is in a specific category of this specific newsletter:
SELECT DISTINCT `posts`.`id`, `published`, `posts`.`title`, `posts`.`content`, `source_name`, `category_id`, `newsletter_id`, `link_href`, categories.title as category_title
FROM `posts`
LEFT JOIN `category_post` ON `posts`.`id` = `category_post`.`post_id`
LEFT JOIN `categories` ON `categories`.`id` = `category_post`.`category_id`
LEFT JOIN `newsletter_post` ON `posts`.`id` = `newsletter_post`.`post_id`
LEFT JOIN `newsletters` ON `newsletters`.`id` = `newsletter_post`.`newsletter_id`
WHERE `feed_id` IN (6, 7) ORDER BY `posts`.`published` DESC LIMIT 40 OFFSET 0
The problem is that this query is horribly unoptimized. My posts table gets up to 50,000 new rows each month, each row holding 3~10 KB of data on average, so sometimes when I run the query (which the admin runs frequently to build the newsletter, paginate, etc.) MySQL shows this error: too much rows to join, etc., and most of the time it's really slow.
The reason I'm doing all this in one query is that I want the result in a single JSON response, so I can show it to the user quickly without additional requests.
I'd like to know whether there is a better way to write this query, or whether indexes or something else would help.
Thank you in advance for your help.
Index your posts table on
(feed_id, published)
so the data is already organized for your WHERE clause and pre-sorted to help your ORDER BY.
For read queries under heavy demand, InnoDB is very inefficient. I recommend using a NoSQL database, but if you don't want to, or the cost of switching is too high, you can try this:
1) As Sallar Kaboli told you, you have to index the columns your JOINs use. For example:
CREATE INDEX index1 ON newsletter_post (post_id);
2) Use only the columns you actually need.
That is, only fetch the columns you really use in the SELECT part of the query.
I hope this helps.
To complement the other answers, I suggest changing these types in the posts table:
1) Change feed_id to a smaller integer type such as SMALLINT UNSIGNED. Do you really have more than 65,535 feeds?
2) Change is_unread to BIT instead of int(1). This may not speed up the query in your question, but given the field name, BIT is the appropriate type.
Another improvement: don't use the default int(11) for every numeric or id field; assign more specific types. Smaller types also make your indexes smaller. Note that the number in parentheses is only a display width: int(4) and int(11) are both 4-byte INT columns, so the real saving comes from types such as SMALLINT (2 bytes) or TINYINT (1 byte), as long as their range fits your data.
For example, an index on a SMALLINT column is smaller, and therefore cheaper to scan and cache, than one on an INT column.
Please create the following indexes:
1) `post_id` in `category_post`
2) `post_id` in `newsletter_post`
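One further option, not taken from the answers above: pick the page of post ids first (the (feed_id, published) index serves this), then LEFT JOIN the category and newsletter data for just those rows, so the joins never touch the rest of the posts table. A minimal sketch under assumptions: Python's sqlite3 stands in for MySQL, the tables are cut down to the columns used, the index names are hypothetical, and the join to newsletters is dropped because only newsletter_post.newsletter_id is selected. As with the original query, a post in several categories or newsletters still yields several rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        feed_id INTEGER NOT NULL,
        published TEXT,
        title TEXT
    );
    CREATE TABLE categories (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE category_post (
        id INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL
    );
    CREATE TABLE newsletter_post (
        id INTEGER PRIMARY KEY,
        newsletter_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL
    );
    -- Indexes along the lines of those suggested in the answers.
    CREATE INDEX idx_posts_feed_published ON posts (feed_id, published);
    CREATE INDEX idx_category_post_post ON category_post (post_id);
    CREATE INDEX idx_newsletter_post_post ON newsletter_post (post_id);
    """
)

# Step 1 (derived table): choose one page of post ids using only posts.
# Step 2: attach category/newsletter info for those ids alone.
PAGE_QUERY = """
SELECT p.id, p.published, p.title,
       cp.category_id, c.title AS category_title, np.newsletter_id
FROM (SELECT id
      FROM posts
      WHERE feed_id IN (6, 7)
      ORDER BY published DESC
      LIMIT ? OFFSET ?) AS page
JOIN posts AS p ON p.id = page.id
LEFT JOIN category_post AS cp ON cp.post_id = p.id
LEFT JOIN categories AS c ON c.id = cp.category_id
LEFT JOIN newsletter_post AS np ON np.post_id = p.id
ORDER BY p.published DESC
"""

conn.executemany(
    "INSERT INTO posts (id, feed_id, published, title) VALUES (?, ?, ?, ?)",
    [
        (1, 6, "2020-01-01", "a"),
        (2, 7, "2020-01-03", "b"),
        (3, 8, "2020-01-04", "c"),  # feed 8 is not requested
        (4, 6, "2020-01-02", "d"),
    ],
)
conn.execute("INSERT INTO categories (id, title) VALUES (10, 'Tech')")
conn.execute("INSERT INTO category_post (category_id, post_id) VALUES (10, 2)")
conn.execute("INSERT INTO newsletter_post (newsletter_id, post_id) VALUES (5, 2)")

rows = conn.execute(PAGE_QUERY, (2, 0)).fetchall()  # LIMIT 2 OFFSET 0
for row in rows:
    print(row)
```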

SQL: Refactoring a multi-join query

I have a query that should be quite simple and yet it causes me a lot of headaches.
I have a simple ads system that requires filtering ads according to a few variables.
I need to limit the number of views/clicks per day and the total number of views/clicks for a given ad. Also each ad is linked to one or more slots in which the ad can appear. I have a table that saves the statistics that I need about each ad. Note that the statistics table changes very frequently.
These are the tables that I'm using:
CREATE TABLE `t_ads` (
`id` int(10) unsigned NOT NULL auto_increment,
`name` varchar(255) NOT NULL,
`content` text NOT NULL,
`is_active` tinyint(1) unsigned NOT NULL,
`start_date` date NOT NULL,
`end_date` date NOT NULL,
`max_views` int(10) unsigned NOT NULL,
`type` tinyint(3) unsigned NOT NULL default '0',
`refresh` smallint(5) unsigned NOT NULL default '0',
`max_clicks` int(10) unsigned NOT NULL,
`max_daily_clicks` int(10) unsigned NOT NULL,
`max_daily_views` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `t_ad_slots` (
`id` int(10) unsigned NOT NULL auto_increment ,
`name` varchar(255) NOT NULL,
`width` int(10) unsigned NOT NULL,
`height` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `t_ads_to_slots` (
`ad_id` int(10) unsigned NOT NULL,
`slot_id` int(10) unsigned NOT NULL,
`value` int(10) unsigned NOT NULL,
PRIMARY KEY (`ad_id`,`slot_id`),
KEY `slot_id` (`slot_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `t_ads_to_slots`
ADD CONSTRAINT `t_ads_to_slots_ibfk_1` FOREIGN KEY (`ad_id`) REFERENCES `t_ads` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
ADD CONSTRAINT `t_ads_to_slots_ibfk_2` FOREIGN KEY (`slot_id`) REFERENCES `t_ad_slots` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION;
CREATE TABLE `t_ad_stats` (
`ad_id` int(10) unsigned NOT NULL,
`slot_id` int(10) unsigned NOT NULL,
`date` date NOT NULL,
`views` int(10) unsigned NOT NULL,
`unique_views` int(10) unsigned NOT NULL,
`clicks` int(10) unsigned NOT NULL default '0',
PRIMARY KEY (`ad_id`,`slot_id`,`date`),
KEY `slot_id` (`slot_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `t_ad_stats`
ADD CONSTRAINT `t_ad_stats_ibfk_1` FOREIGN KEY (`ad_id`) REFERENCES `t_ads` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
ADD CONSTRAINT `t_ad_stats_ibfk_2` FOREIGN KEY (`slot_id`) REFERENCES `t_ad_slots` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION;
This is the query I use to get ads for a given slot. (In this example I hard-coded 20 as the slot id and 0, 1, 2 as the ad types; in practice these values come from the PHP script that invokes the query.)
SELECT `ads`.`content`, `slots`.`value`, `ads`.`id`, `ads`.`refresh`, `ads`.`type`,
SUM(`total_stats`.`views`) AS "total_views",
SUM(`total_stats`.`clicks`) AS "total_clicks"
FROM (`t_ads` AS `ads`,
`t_ads_to_slots` AS `slots`)
LEFT JOIN `t_ad_stats` AS `total_stats`
ON `total_stats`.`ad_id` = `ads`.`id`
LEFT JOIN `t_ad_stats` AS `daily_stats`
ON (`daily_stats`.`ad_id` = `ads`.`id`) AND
(`daily_stats`.`date` = CURDATE())
WHERE (`ads`.`id` = `slots`.`ad_id`) AND
(`ads`.`type` IN(0,1,2)) AND
(`slots`.`slot_id` = 20) AND
(`ads`.`is_active` = 1) AND
(`ads`.`end_date` >= NOW()) AND
(`ads`.`start_date` <= NOW()) AND
((`ads`.`max_views` = 0) OR
(`ads`.`max_views` > "total_views")) AND
((`ads`.`max_clicks` = 0) OR
(`ads`.`max_clicks` > "total_clicks")) AND
((`ads`.`max_daily_clicks` = 0) OR
(`ads`.`max_daily_clicks` > IFNULL(`daily_stats`.`clicks`,0))) AND
((`ads`.`max_daily_views` = 0) OR
(`ads`.`max_daily_views` > IFNULL(`daily_stats`.`views`,0)))
GROUP BY (`ads`.`id`)
I believe this query is self-explanatory, even though it's quite long. Note that the MySQL version I'm using is 5.0.51a-community. It seems to me that the big issue here is the double join to the stats table (I did that so I could get both the data from a specific record and the sum over multiple records).
How would you implement this query to get better performance? (Note that I can't switch away from InnoDB.)
Hopefully everything is clear about my question, but if that is not the case, please ask and I will clarify.
Thanks in advance,
Kfir
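A sketch of one way to restructure the query, under stated assumptions. Two problems in the posted query are worth noting: in MySQL's default SQL mode, "total_views" in double quotes is a string literal rather than a reference to the SUM alias (and select-list aliases can't be used in WHERE anyway), and joining t_ad_stats twice multiplies the rows being summed. Pre-aggregating the totals and today's figures in two derived tables, each returning at most one row per ad, avoids both. This assumes the limits apply per ad across all slots, as in the original; Python's sqlite3 stands in for MySQL, the schema is trimmed, a literal date replaces CURDATE(), and t_ad_stats uses the (ad_id, date, slot_id) key order suggested in the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t_ads (
        id INTEGER PRIMARY KEY, content TEXT NOT NULL, is_active INTEGER NOT NULL,
        start_date TEXT NOT NULL, end_date TEXT NOT NULL,
        type INTEGER NOT NULL DEFAULT 0, refresh INTEGER NOT NULL DEFAULT 0,
        max_views INTEGER NOT NULL, max_clicks INTEGER NOT NULL,
        max_daily_views INTEGER NOT NULL, max_daily_clicks INTEGER NOT NULL
    );
    CREATE TABLE t_ads_to_slots (
        ad_id INTEGER NOT NULL, slot_id INTEGER NOT NULL, value INTEGER NOT NULL,
        PRIMARY KEY (ad_id, slot_id)
    );
    CREATE TABLE t_ad_stats (
        ad_id INTEGER NOT NULL, slot_id INTEGER NOT NULL, date TEXT NOT NULL,
        views INTEGER NOT NULL, clicks INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (ad_id, date, slot_id)
    );
    """
)

# Each derived table returns at most one row per ad, so there is no fan-out
# and the outer query needs no GROUP BY.
AD_QUERY = """
SELECT a.content, s.value, a.id, a.refresh, a.type,
       COALESCE(tot.views, 0) AS total_views,
       COALESCE(tot.clicks, 0) AS total_clicks
FROM t_ads AS a
JOIN t_ads_to_slots AS s ON s.ad_id = a.id
LEFT JOIN (SELECT ad_id, SUM(views) AS views, SUM(clicks) AS clicks
           FROM t_ad_stats GROUP BY ad_id) AS tot ON tot.ad_id = a.id
LEFT JOIN (SELECT ad_id, SUM(views) AS views, SUM(clicks) AS clicks
           FROM t_ad_stats WHERE date = :today
           GROUP BY ad_id) AS daily ON daily.ad_id = a.id
WHERE s.slot_id = :slot
  AND a.type IN (0, 1, 2)
  AND a.is_active = 1
  AND a.start_date <= :today AND a.end_date >= :today
  -- 0 still means "no limit", as in the original schema
  AND (a.max_views = 0 OR a.max_views > COALESCE(tot.views, 0))
  AND (a.max_clicks = 0 OR a.max_clicks > COALESCE(tot.clicks, 0))
  AND (a.max_daily_views = 0 OR a.max_daily_views > COALESCE(daily.views, 0))
  AND (a.max_daily_clicks = 0 OR a.max_daily_clicks > COALESCE(daily.clicks, 0))
"""

TODAY = "2024-05-10"  # stands in for CURDATE()
conn.executemany(
    "INSERT INTO t_ads VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [  # id, content, is_active, start, end, type, refresh, max_v, max_c, max_dv, max_dc
        (1, "ad one", 1, "2024-01-01", "2024-12-31", 0, 0, 100, 0, 0, 0),
        (2, "over total", 1, "2024-01-01", "2024-12-31", 1, 0, 50, 0, 0, 0),
        (3, "over daily", 1, "2024-01-01", "2024-12-31", 2, 0, 0, 0, 0, 5),
        (4, "inactive", 0, "2024-01-01", "2024-12-31", 0, 0, 0, 0, 0, 0),
    ],
)
conn.executemany(
    "INSERT INTO t_ads_to_slots VALUES (?, ?, ?)",
    [(1, 20, 1), (2, 20, 1), (3, 20, 1), (4, 20, 1)],
)
conn.executemany(
    "INSERT INTO t_ad_stats VALUES (?, ?, ?, ?, ?)",
    [
        (1, 20, "2024-05-09", 60, 1), (1, 20, TODAY, 30, 2),  # 90 views < 100
        (2, 20, "2024-05-09", 60, 0),                        # 60 views >= 50
        (3, 20, TODAY, 10, 5),                               # 5 clicks today >= 5
    ],
)
rows = conn.execute(AD_QUERY, {"slot": 20, "today": TODAY}).fetchall()
print(rows)  # [('ad one', 1, 1, 0, 0, 90, 3)]
```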
Add indexes to the following columns:
t_ads.is_active
t_ads.start_date
t_ads.end_date
Change the order of the primary key on t_ad_stats to:
(`ad_id`,`date`,`slot_id`)
or add a composite index to t_ad_stats on
(`ad_id`, `date`)
Instead of using 0 to mean "no limit", use 2147483647 to mean no limit, so you can change things like:
((`ads`.`max_views` = 0) OR (`ads`.`max_views` > "total_views"))
to
(`ads`.`max_views` > "total_views")
You could greatly improve this if you kept running totals instead of calculating them each time.
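A minimal sketch of the running-totals idea, assuming a hypothetical t_ad_totals table with one row per ad and a hypothetical record_event helper: each recorded view or click updates the daily row in t_ad_stats and the per-ad total in the same transaction, so the ad-selection query can read totals directly instead of summing the whole history. Python's sqlite3 stands in for MySQL; in MySQL the same upserts would presumably be written as INSERT ... ON DUPLICATE KEY UPDATE views = views + VALUES(views).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t_ad_stats (
        ad_id INTEGER NOT NULL, slot_id INTEGER NOT NULL, date TEXT NOT NULL,
        views INTEGER NOT NULL DEFAULT 0, clicks INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (ad_id, date, slot_id)
    );
    -- Hypothetical running-totals table: one row per ad.
    CREATE TABLE t_ad_totals (
        ad_id INTEGER PRIMARY KEY,
        views INTEGER NOT NULL DEFAULT 0,
        clicks INTEGER NOT NULL DEFAULT 0
    );
    """
)


def record_event(conn, ad_id, slot_id, day, views=0, clicks=0):
    """Add one event's counts to the daily row and to the per-ad running total."""
    with conn:  # both upserts commit together, or neither does
        conn.execute(
            """
            INSERT INTO t_ad_stats (ad_id, slot_id, date, views, clicks)
            VALUES (?, ?, ?, ?, ?)
            ON CONFLICT (ad_id, date, slot_id) DO UPDATE SET
                views = views + excluded.views,
                clicks = clicks + excluded.clicks
            """,
            (ad_id, slot_id, day, views, clicks),
        )
        conn.execute(
            """
            INSERT INTO t_ad_totals (ad_id, views, clicks) VALUES (?, ?, ?)
            ON CONFLICT (ad_id) DO UPDATE SET
                views = views + excluded.views,
                clicks = clicks + excluded.clicks
            """,
            (ad_id, views, clicks),
        )


record_event(conn, 7, 20, "2024-05-10", views=1)
record_event(conn, 7, 20, "2024-05-10", views=1, clicks=1)
record_event(conn, 7, 21, "2024-05-11", views=1)
print(conn.execute("SELECT views, clicks FROM t_ad_totals WHERE ad_id = 7").fetchone())
# (3, 1)
```

The trade-off is one extra small write per event in exchange for a constant-cost read when choosing ads, which fits a workload where ads are selected far more often than the limits change.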
Expanding on a comment above, I believe the following columns should be indexed:
ads.id
ads.type
ads.start_date
ads.end_date
daily_stats.date
As well as these:
slots.slot_id
ads.is_active
And these as well:
ads.max_views
ads.max_clicks
ads.max_daily_clicks
ads.max_daily_views
daily_stats.clicks
daily_stats.views
Note that indexing these columns will speed up your SELECTs but slow down your INSERTs, since the indexes need updating as well. You don't have to apply all of this at once, though: do it incrementally and see how performance shakes out for both selects and inserts. If you cannot find a good middle ground, I would suggest denormalization.

mySQL view for two different tables?

I have a big problem creating a view in MySQL:
Table A in database DB1:
CREATE TABLE `a` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'internal ID',
`account` VARCHAR(10) NOT NULL DEFAULT '0',
`filename` VARCHAR(50) NOT NULL,
`filesize` BIGINT(15) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
)
ENGINE=InnoDB
Table B in database DB2:
CREATE TABLE `b` (
`archive_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`archive_datetime` DATETIME,
`id` INT(10) UNSIGNED NOT NULL,
`account` VARCHAR(10) NOT NULL DEFAULT '0',
`filename` VARCHAR(50) NOT NULL,
`filesize` BIGINT(15) NOT NULL DEFAULT '0',
PRIMARY KEY (`archive_id`)
)
ENGINE=Archive
Rows from table A are automatically transferred to table B by a BEFORE DELETE trigger.
I need a view that gives me all entries from table a and table b as if they were still in one table of the same database. Columns archive_id and archive_datetime can be ignored in the view as they are not needed for this scenario.
You could use UNION:
SELECT * FROM a UNION SELECT * FROM b;
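You just have to replace * with the desired columns: b has two extra columns (archive_id and archive_datetime), so SELECT * on both tables would not give the UNION matching column counts.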
SELECT id, account, filename, filesize FROM a UNION ALL SELECT id, account, filename, filesize FROM b
Surely I must be missing something?
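A self-contained sketch of the whole setup (archive trigger plus view), using Python's sqlite3 with both tables in one in-memory database. In MySQL the tables live in different databases, so the view would need to qualify them as DB1.a and DB2.b. UNION ALL is used because plain UNION additionally de-duplicates the combined rows, which is extra work not needed here. The view name all_files and trigger name a_archive are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (
        id INTEGER PRIMARY KEY,
        account TEXT NOT NULL DEFAULT '0',
        filename TEXT NOT NULL,
        filesize INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE b (
        archive_id INTEGER PRIMARY KEY,
        archive_datetime TEXT,
        id INTEGER NOT NULL,
        account TEXT NOT NULL DEFAULT '0',
        filename TEXT NOT NULL,
        filesize INTEGER NOT NULL DEFAULT 0
    );
    -- Copy a row into the archive just before it is deleted from a.
    CREATE TRIGGER a_archive BEFORE DELETE ON a
    BEGIN
        INSERT INTO b (archive_datetime, id, account, filename, filesize)
        VALUES (datetime('now'), OLD.id, OLD.account, OLD.filename, OLD.filesize);
    END;
    -- List the shared columns explicitly: b has two extra columns.
    CREATE VIEW all_files AS
        SELECT id, account, filename, filesize FROM a
        UNION ALL
        SELECT id, account, filename, filesize FROM b;
    """
)

conn.executemany(
    "INSERT INTO a (id, account, filename, filesize) VALUES (?, ?, ?, ?)",
    [(1, "acc1", "x.txt", 10), (2, "acc1", "y.txt", 20), (3, "acc2", "z.txt", 30)],
)
conn.execute("DELETE FROM a WHERE id = 2")
print(conn.execute("SELECT id, filename FROM all_files ORDER BY id").fetchall())
# [(1, 'x.txt'), (2, 'y.txt'), (3, 'z.txt')]
```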