How to aggregate data without group by - mysql

I am having a little bit of a situation here.
The environment
I have a database for series here.
One table for the series itself, one for the season connected to the series table, one for the episodes connected to the seasons table.
Since there are air dates for different countries I have another table called `episode_data` which looks like the following:
CREATE TABLE IF NOT EXISTS `episode_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`episode_id` int(11) NOT NULL,
`country` char(3) NOT NULL,
`title` varchar(255) NOT NULL,
`date` date NOT NULL,
`tba` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `episode_id` (`episode_id`),
KEY `date` (`date`),
KEY `country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Now I am trying to collect the last aired episodes from each series in the database using the following query:
SELECT
*
FROM
`episode_data` ed
WHERE
`ed`.`date` < CURDATE( ) &&
`ed`.`date` != '1970-01-01' &&
`ed`.`series_id` = 1
GROUP BY
`ed`.`country` DESC
ORDER BY
`ed`.`date` DESC
Since I have everything normalized, I replaced 'episode_id' with 'series_id' in the query above to make it less complicated.
What I am trying to accomplish
I want to have the last aired episodes for each country which are actually announced (ed.date != '1970-01-01') as the returning result of one query.
What's the problem
I know now (I searched Google and found answers here, but none that work for me) that the ordering takes place AFTER grouping, so my "date" ordering is completely useless.
The other problem is that the query above is working, but always takes those entries with the lowest id matching my conditions, because those are the first ones in the tables index.
What is the question?
How may I accomplish the above? I do not know if grouping is the right way to do it. If there is no "one liner", I think the only way is a subquery, which I want to avoid since, as far as I know, it is slower than a one liner with the right indexes set.
I hope everything you need is in here :)
Example data
INSERT INTO `episode_data` (`id`, `episode_id`, `country`, `title`, `date`, `tba`) VALUES
(4942, 2471, 'de', 'Väter und Töchter', '2013-08-06', 0),
(4944, 2472, 'de', 'Neue Perspektiven', '2013-08-13', 0),
(5013, 2507, 'us', 'Into the Deep', '2013-08-06', 0),
(5015, 2508, 'us', 'The Mirror Has Three Faces', '2013-08-13', 0);
Attention!
This is the original table data with "EPISODE_ID" not "SERIES_ID".
The data I want are those with closest dates to today, which are here 4944 and 5015.

If you want the last aired date for each country, then use this aggregation:
SELECT country, max(date) as lastdate
FROM `episode_data` ed
WHERE `ed`.`date` < CURDATE( ) AND
`ed`.`date` != '1970-01-01' AND
`ed`.`series_id` = 1
GROUP BY `ed`.`country`;
If you are trying to get the episode_id and title as well, you can use group_concat() and substring_index():
SELECT country, max(date) as lastdate,
substring_index(group_concat(episode_id order by date desc), ',', 1
) as episode_id,
substring_index(group_concat(title order by date desc separator '|'), '|', 1
) as title
FROM `episode_data` ed
WHERE `ed`.`date` < CURDATE( ) AND
`ed`.`date` != '1970-01-01' AND
`ed`.`series_id` = 1
GROUP BY `ed`.`country`;
Note that this uses a different separator for the title, under the assumption that it might have a comma.
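An alternative to the string-based approach is to join the table against the per-country maximum date. The following is only a sketch under stated assumptions: it runs on Python's built-in sqlite3 module instead of MySQL, the helper name latest_per_country is made up for illustration, the series_id filter is left out because the sample table has no such column, and "today" is passed in as a parameter so the result is reproducible. If two episodes of one country share the latest date, both are returned.

```python
import sqlite3

# Sample rows copied from the question (id, episode_id, country, title, date, tba).
SAMPLE_ROWS = [
    (4942, 2471, "de", "Väter und Töchter", "2013-08-06", 0),
    (4944, 2472, "de", "Neue Perspektiven", "2013-08-13", 0),
    (5013, 2507, "us", "Into the Deep", "2013-08-06", 0),
    (5015, 2508, "us", "The Mirror Has Three Faces", "2013-08-13", 0),
]


def latest_per_country(rows, today):
    """Return the most recently aired row per country (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE episode_data ("
        "id INTEGER PRIMARY KEY, episode_id INTEGER, country TEXT,"
        " title TEXT, date TEXT, tba INTEGER)"
    )
    con.executemany("INSERT INTO episode_data VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Inner query: latest announced air date per country.
    # Outer query: fetch the full rows that carry exactly that date.
    query = """
        SELECT ed.id, ed.country, ed.title, ed.date
        FROM episode_data AS ed
        JOIN (
            SELECT country, MAX(date) AS lastdate
            FROM episode_data
            WHERE date < :today AND date != '1970-01-01'
            GROUP BY country
        ) AS latest
          ON latest.country = ed.country AND latest.lastdate = ed.date
        ORDER BY ed.country
    """
    result = con.execute(query, {"today": today}).fetchall()
    con.close()
    return result


for row in latest_per_country(SAMPLE_ROWS, "2013-08-20"):
    print(row)
```

With the sample data this picks ids 4944 and 5015, the rows the question asks for. The same SELECT should carry over to MySQL with CURDATE() in place of the :today parameter.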

Related

how to count items inside group_concat method with mysql query

I have one to many table relationship :
one user for multiple event
one event for multiple event_attribute
Now I group by userId and want to know the count for each event attribute.
I am using group_concat like this:
group_concat(
concat(event_event_attribute.event_attr_id,
count( distinct event_event_attribute.value)
) group by event_attr_id)
)
group by userId
So here, I first group by userId, then group concat event-attribute, at least I hope to have :
(attr1, 10),(attr2, 30)....
all in one row.
But this does not work at all
Any suggestions?
To be more specific, this is the DB schema I am using:
CREATE TABLE `user` (
`id` int(11) NOT NULL,
`name` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
);
CREATE TABLE `event` (
`id` int(11) NOT NULL,
`name` varchar(45) DEFAULT NULL,
`user_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `event_attr` (
`id` int(11) NOT NULL,
`att_name` varchar(45) DEFAULT NULL,
`event_id` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `user` VALUES (1,'user1'),(2,'user2'),(3,'user3');
INSERT INTO `event` VALUES (1,'event1',1),(2,'event2',1),(3,'event3',1),(4,'event4',2),(5,'event5',2),(6,'event6',3);
INSERT INTO `event_attr` VALUES (1,'att1','1'),(2,'att2','1'),(3,'att3','1'),(4,'att1','2'),(5,'att2',NULL);
Now if I am running:
select u.id, group_concat(e.name)
from user u
join event e on u.id=e.user_id
group by u.id
I will get:
1 event1,event2,event3
2 event4,event5
3 event6
That is fine. But one step further, I need to know the count for each event_att for each user, such as:
1 event_att1:3;event_att2:2
2 event_att3:1
Then it is not possible. Can I use just one query to get above expected response?
It should be the other way round: concat the aggregated values instead of aggregating the concat.
select concat (group_concat(event_event_attribute.event_attr_id )
,' - ',
count( distinct event_event_attribute.value) )
from event_event_attribute
group by userid
Otherwise you may need a subquery to obtain the count grouped by event_attr_id:
select user_id, group_concat(
concat(event_attr_id, ',', count_value)
)
from (
select user_id, event_event_attribute.event_attr_id, count( distinct event_event_attribute.value) count_value
from event_event_attribute
group by user_id, event_attr_id
) t
group by user_id
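The two-level aggregation above can be tried against the schema from the question. This is a minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL; the helper names build_demo_db and attr_counts_per_user are made up for illustration. SQLite spells the concatenation as || and takes the separator as the second argument of GROUP_CONCAT, where MySQL would use group_concat(concat(att_name, ':', cnt) separator ';').

```python
import sqlite3


def build_demo_db():
    """Load the event and event_attr sample rows from the question."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT, user_id INTEGER)")
    con.execute("CREATE TABLE event_attr (id INTEGER PRIMARY KEY, att_name TEXT, event_id TEXT)")
    con.executemany(
        "INSERT INTO event VALUES (?, ?, ?)",
        [(1, "event1", 1), (2, "event2", 1), (3, "event3", 1),
         (4, "event4", 2), (5, "event5", 2), (6, "event6", 3)],
    )
    con.executemany(
        "INSERT INTO event_attr VALUES (?, ?, ?)",
        [(1, "att1", "1"), (2, "att2", "1"), (3, "att3", "1"),
         (4, "att1", "2"), (5, "att2", None)],
    )
    return con


def attr_counts_per_user(con):
    """Count attributes per user first, then concatenate the counts per user."""
    query = """
        SELECT user_id, GROUP_CONCAT(att_name || ':' || cnt, ';') AS summary
        FROM (
            SELECT e.user_id AS user_id, a.att_name AS att_name, COUNT(*) AS cnt
            FROM event AS e
            JOIN event_attr AS a ON CAST(a.event_id AS INTEGER) = e.id
            GROUP BY e.user_id, a.att_name
        ) AS per_attr
        GROUP BY user_id
    """
    result = {}
    for user_id, summary in con.execute(query):
        # Parse "att1:2;att2:1" back into a dict; the order inside the string is not guaranteed.
        pairs = (item.split(":") for item in summary.split(";"))
        result[user_id] = {name: int(count) for name, count in pairs}
    return result


print(attr_counts_per_user(build_demo_db()))
```

Users without any attribute rows drop out because of the inner join; a LEFT JOIN from event would keep them with an empty summary.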

Correctly optimising MySQL data for date range queries?

I have a table with lots of numeric data. I need to query this to get the closest row for a specific date_added and name.
My problem is that this data is not ordered by date, so when returning results I need to include ORDER BY date_added (or it doesn't return the correct row). Currently doing this takes a good 90 seconds to run because of this ORDER BY condition.
Are there any ways I can further optimise this? I've already indexed the date_added and name columns, so I'm not really sure what else can be done. I considered creating a new table with the data reordered in date_added order, but this isn't practical as new entries need to be added regularly.
I've stored the numeric data as decimal as it can potentially be very small, very large or both. Perhaps storing this data in a different way would be more efficient?
Add a compound index on (name, date_added). With that index in place, the query above can run without a filesort.
An alternative way for the query:
SELECT date_added, data_1, data_2, data_3, data_4, data_5, data_6, data_7, data_8, data_9
FROM numeric_data
WHERE date_added =
(select min(date_added) from numeric_data where date_added >= '2018-05-03 11:00:00' and name = 'aaa')
and name = 'aaa'
limit 1;
Fiddle: http://sqlfiddle.com/#!9/4e8d89/1 .
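To illustrate the compound index, here is a small sketch, assuming SQLite (via Python's sqlite3) in place of MySQL, a cut-down numeric_data table with a single data column, and made-up sample rows. The index name idx_name_date and the helper names are hypothetical. In MySQL the index could be added with ALTER TABLE numeric_data ADD INDEX (name, date_added).

```python
import sqlite3

QUERY = (
    "SELECT date_added, data_1 FROM numeric_data "
    "WHERE name = ? AND date_added >= ? "
    "ORDER BY date_added LIMIT 1"
)


def build_demo_db():
    """Create a reduced numeric_data table with a (name, date_added) index."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE numeric_data ("
        "id INTEGER PRIMARY KEY, date_added TEXT NOT NULL,"
        " name TEXT NOT NULL, data_1 REAL NOT NULL)"
    )
    # Equality column first, range/sort column second.
    con.execute("CREATE INDEX idx_name_date ON numeric_data (name, date_added)")
    con.executemany(
        "INSERT INTO numeric_data (date_added, name, data_1) VALUES (?, ?, ?)",
        [
            ("2018-05-03 10:00:00", "aaa", 1.0),  # rows deliberately not in date order
            ("2018-05-03 12:30:00", "aaa", 2.0),
            ("2018-05-03 11:15:00", "aaa", 3.0),
            ("2018-05-03 11:05:00", "bbb", 4.0),
        ],
    )
    return con


def closest_row(con, name, since):
    """Return the first row for `name` at or after `since`."""
    return con.execute(QUERY, (name, since)).fetchone()


con = build_demo_db()
print(closest_row(con, "aaa", "2018-05-03 11:00:00"))
# Show which access path SQLite picks; the wording differs between versions.
for step in con.execute("EXPLAIN QUERY PLAN " + QUERY, ("aaa", "2018-05-03 11:00:00")):
    print(step)
```

Because the index starts with the equality column and continues with the sort column, the engine can seek to the first matching entry and stop after one row instead of sorting all matches.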
You can use range partitioning:
https://dev.mysql.com/doc/refman/5.7/en/partitioning-range.html
You need to define your partitions according to the date range you have.
CREATE TABLE `numeric_data` (
`id` int(255) NOT NULL AUTO_INCREMENT,
`date_added` datetime NOT NULL,
`name` varchar(8) COLLATE utf8mb4_unicode_ci NOT NULL,
`data_1` decimal(30,17) NOT NULL,
`data_2` decimal(30,17) NOT NULL,
`data_3` decimal(30,17) NOT NULL,
`data_4` decimal(30,17) NOT NULL,
`data_5` decimal(30,17) NOT NULL,
`data_6` decimal(30,17) NOT NULL,
`data_7` decimal(30,17) NOT NULL,
`data_8` decimal(30,17) NOT NULL,
`data_9` decimal(30,17) NOT NULL,
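-- MySQL requires the partitioning column to be part of every unique key, including the primary key
PRIMARY KEY (`id`, `date_added`),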
KEY `date_added` (`date_added`),
KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=60000000 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
PARTITION BY RANGE( TO_DAYS(date_added) ) (
PARTITION p1 VALUES LESS THAN (TO_DAYS('2018-01-01')),
PARTITION p2 VALUES LESS THAN (TO_DAYS('2018-02-01')),
PARTITION p3 VALUES LESS THAN (TO_DAYS('2018-03-01')),
PARTITION p4 VALUES LESS THAN (TO_DAYS('2018-04-01')),
PARTITION future VALUES LESS THAN MAXVALUE
);
The query below will only use partition "future":
SELECT date_added, data_1, data_2, data_3, data_4, data_5, data_6, data_7, data_8, data_9
FROM numeric_data
WHERE date_added >= '2018-05-03 11:00:00'
AND name = 'aaa'
ORDER BY date_added LIMIT 1

Optimize a query

How can I make this query respond faster? The average response time is approximately 0.2 s (8039 records in my items table and 81 records in my tracking table).
Query
SELECT a.name, b.cnt FROM `items` a LEFT JOIN
(SELECT guid, COUNT(*) cnt FROM tracking WHERE
date > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day ) GROUP BY guid) b ON
a.`id` = b.guid WHERE a.`type` = 'streaming' AND a.`state` = 1
ORDER BY b.cnt DESC LIMIT 15 OFFSET 75
Tracking table structure
CREATE TABLE `tracking` (
`id` bigint(11) NOT NULL AUTO_INCREMENT,
`guid` int(11) DEFAULT NULL,
`ip` int(11) NOT NULL,
`date` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `i1` (`ip`,`guid`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=4303 DEFAULT CHARSET=latin1;
Items table structure
CREATE TABLE `items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`guid` int(11) DEFAULT NULL,
`type` varchar(255) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`embed` varchar(255) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`description` text,
`tags` varchar(255) DEFAULT NULL,
`date` int(11) DEFAULT NULL,
`vote_val_total` float DEFAULT '0',
`vote_total` float(11,0) DEFAULT '0',
`rate` float DEFAULT '0',
`icon` text CHARACTER SET ascii,
`state` int(11) DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=9258 DEFAULT CHARSET=latin1;
Your query, as written, doesn't make much sense. It produces all possible combinations of rows in your two tables and then groups them.
You may want this:
SELECT a.*, b.cnt
FROM `items` a
LEFT JOIN (
SELECT guid, COUNT(*) cnt
FROM tracking
WHERE `date` > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day)
GROUP BY guid
) b ON a.guid = b.guid
ORDER BY b.cnt DESC
The high-volume data in this query come from the relatively large tracking table. So, you should add a compound index to it, using the columns (date, guid). This will allow your query to random-access the index by date and then scan it for guid values.
ALTER TABLE tracking ADD INDEX guid_summary (`date`, guid);
I suppose you'll see a nice performance improvement.
Pro tip: Don't use SELECT *. Instead, give a list of the columns you want in your result set. For example,
SELECT a.guid, a.name, a.description, b.cnt
Why is this important?
First, it makes your software more resilient against somebody adding columns to your tables in the future.
Second, it tells the MySQL server to sling around only the information you want. That can improve performance really dramatically, especially when your tables get big.
Since tracking has significantly fewer rows than items, I will propose the following.
SELECT i.name, c.cnt
FROM
(
SELECT guid, COUNT(*) cnt
FROM tracking
WHERE date > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day )
GROUP BY guid
) AS c
JOIN items AS i ON i.id = c.guid
WHERE i.type = 'streaming'
AND i.state = 1
ORDER BY c.cnt DESC
LIMIT 15 OFFSET 75
It will fail to display any items for which cnt is 0. (Your version displays the items with NULL for the count.)
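The difference between the two join types can be seen on a tiny data set. This is a sketch only, assuming SQLite (via Python's sqlite3) instead of MySQL, made-up rows, and no date filter; the helper names are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Three items, of which only two have tracking rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    con.execute("CREATE TABLE tracking (id INTEGER PRIMARY KEY, guid INTEGER)")
    con.executemany("INSERT INTO items VALUES (?, ?)", [(1, "alpha"), (2, "beta"), (3, "gamma")])
    con.executemany("INSERT INTO tracking (guid) VALUES (?)", [(1,), (1,), (2,)])
    return con


def counts(con, join_kind):
    """Join items to the per-guid counts using JOIN or LEFT JOIN."""
    if join_kind not in ("JOIN", "LEFT JOIN"):
        raise ValueError("join_kind must be 'JOIN' or 'LEFT JOIN'")
    query = f"""
        SELECT i.name, c.cnt
        FROM items AS i
        {join_kind} (
            SELECT guid, COUNT(*) AS cnt FROM tracking GROUP BY guid
        ) AS c ON c.guid = i.id
        ORDER BY c.cnt DESC, i.name
    """
    return con.execute(query).fetchall()


con = build_demo_db()
print(counts(con, "JOIN"))       # untracked item 'gamma' is dropped
print(counts(con, "LEFT JOIN"))  # 'gamma' is kept with a NULL count
```

If untracked items must appear in the result, keep the LEFT JOIN and wrap the count in COALESCE(c.cnt, 0) where a zero is wanted instead of NULL.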
Composite indexes needed:
items: The PRIMARY KEY(id) is sufficient.
tracking: INDEX(date, guid) -- "covering"
Other issues:
If ip is an IP-address, it needs to be INT UNSIGNED. But that covers only IPv4, not IPv6.
It seems like date is not just a "date", but really a date+time. Please rename it to avoid confusion.
float(11,0) -- Don't use FLOAT for integers. Don't use (m,n) on FLOAT or DOUBLE. INT UNSIGNED makes more sense here.
OFFSET is naughty when it comes to performance -- it must scan over the skipped records. But, in your query, there is no way to avoid collecting all the possible rows, sorting them, stepping over 75, and only finally delivering 15 rows. (And, with no more than 81, it won't be a full 15.)
What version are you using? There have been important changes to the Optimization of LEFT JOIN ( SELECT ... ). Please provide EXPLAIN SELECT for each query under discussion.

SQL VIEW simplification/solution faster Querys

I'm trying to break down and rewrite a view that was created by a long-gone developer. The query takes well over three minutes to run, I'm assuming because of all the CONCATs.
CREATE VIEW `active_users_over_time` AS
select
`users_activity`.`date` AS `date`,
time_format(
addtime(
concat(`users_activity`.`date`,' ',`users_activity`.`time`),
concat('0 ',sec_to_time(`users_activity`.`duration_checkout`),'.0')
),'%H:%i:%s') AS `time`,
`users_activity`.`username` AS `username`,
count(addtime(concat(`users_activity`.`date`,' ',`users_activity`.`time`),
concat('0 ',sec_to_time(`users_activity`.`duration_checkout`),'.0'))) AS `checkouts`
from `users_activity`
group by
concat(
addtime(
concat(`users_activity`.`date`,' ',`users_activity`.`time`),
concat('0 ',sec_to_time(`users_activity`.`duration_checkout`),'.0')
),
`users_activity`.`username`);
The data comes from the SQL table:
CREATE TABLE `users_activity` (
`id` int(10) unsigned NOT NULL auto_increment,
`featureid` smallint(5) unsigned NOT NULL,
`date` date NOT NULL,
`time` time NOT NULL,
`duration_checkout` int unsigned NOT NULL,
`update_date` date NOT NULL,
`username` varchar(255) NOT NULL,
`checkout` smallint(5) unsigned NOT NULL,
`licid` smallint(5) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `featureid_licid_username` (`featureid`,`licid`,`date`,`time`,`username`),
FOREIGN KEY(featureid) REFERENCES features(id)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
I'm having a hard time deciphering exactly what is needed and what isn't.
Anyone have any ideas? Thanks.
I think this does everything that the original query did, skipping a bunch of redundant steps:
select `date`
, `time`
, `username`
, count(1) as `checkouts`
from
(
select
`users_activity`.`date` AS `date`
,time_format(
addtime(`users_activity`.`date`,`users_activity`.`time`)
+ interval `users_activity`.`duration_checkout` second
,'%H:%i:%s'
) AS `time`
,`users_activity`.`username` AS `username`
from `users_activity`
) x
group by `username`, `date`, `time`
You may also want to look at what indexes are on the table to see if optimisations can be made elsewhere (e.g. if you don't already have an index on the username and date fields you'd get a lot of benefit for this query by adding one).
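To make clear what the original view computes, here is a plain-Python sketch of its grouping logic, assuming the nested CONCAT/ADDTIME calls simply add duration_checkout seconds to the combined date and time. The function name and the sample rows are made up for illustration; this is not a replacement for the SQL.

```python
from collections import Counter
from datetime import date, datetime, time, timedelta


def checkouts_by_end_time(rows):
    """Count rows per (checkout end timestamp, username).

    Each row is (date, time, duration_checkout_seconds, username).
    """
    counts = Counter()
    for day, start, duration_checkout, username in rows:
        # Same value the view builds with ADDTIME(CONCAT(date, ' ', time), ...).
        end = datetime.combine(day, start) + timedelta(seconds=duration_checkout)
        counts[(end, username)] += 1
    return [
        (end.strftime("%Y-%m-%d %H:%M:%S"), username, n)
        for (end, username), n in sorted(counts.items())
    ]


# Hypothetical sample rows: two of ann's checkouts end at the same moment.
SAMPLE = [
    (date(2024, 1, 1), time(10, 0, 0), 60, "ann"),
    (date(2024, 1, 1), time(10, 0, 30), 30, "ann"),
    (date(2024, 1, 1), time(10, 0, 0), 60, "bob"),
]

for row in checkouts_by_end_time(SAMPLE):
    print(row)
```

Seen this way, the view counts how many checkouts of a user end at the same moment, so the string concatenation in the GROUP BY is only a way of building a composite key.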
You can start by rewriting the GROUP BY clause from this:
group by
concat(
addtime(
concat(`users_activity`.`date`,' ',`users_activity`.`time`),
concat('0 ',sec_to_time(`users_activity`.`duration_checkout`),'.0')
),
`users_activity`.`username`);
to this one:
GROUP BY `users_activity`.`date`,
`users_activity`.`time`,
`users_activity`.`duration_checkout`,
`users_activity`.`username`
This change should give some slight savings on converting dates to strings and concatenating them, and the result of the query shouldn't change.
Then you may consider creating a composite index on GROUP BY columns.
According to this link: http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html
The most important preconditions for using indexes for GROUP BY are that all GROUP BY columns reference attributes from the same index
It means, that if we create the following index:
CREATE INDEX idx_name ON `users_activity`(
`date`,`time`,`duration_checkout`,`username`
);
then MySQL might use it to optimize the GROUP BY (but there is no guarantee).

mysql select record containing highest value, joining on range of columns containing nulls

Here's what I'm working with:
CREATE TABLE IF NOT EXISTS `rate` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`client_company` int(11) DEFAULT NULL,
`client_group` int(11) DEFAULT NULL,
`client_contact` int(11) DEFAULT NULL,
`role` int(11) DEFAULT NULL,
`date_from` datetime DEFAULT NULL,
`hourly_rate` decimal(18,2) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `rate` (`id`, `client_company`, `client_group`,
`client_contact`, `role`, `date_from`, `hourly_rate`)
VALUES
(4, NULL, NULL, NULL, 3, '2012-07-30 14:48:16', 115.00),
(5, 3, NULL, NULL, 3, '2012-07-30 14:51:38', 110.00),
(6, 3, NULL, NULL, 3, '2012-07-30 14:59:20', 112.00);
This table stores chargeout rates for clients; the idea being that, when looking for the correct rate for a job role, we'd first look for a rate matching the given role and client contact, then if no rate was found, would try to match the role and the client group (or 'department'), then the client company, and finally looking for a global rate for just the role itself. Fine.
Rates can change over time, so the table may contain multiple entries matching any given combination of role, company, group and client contact: I want a query that will only return me the latest one for each distinct combination.
Given that I asked a near-identical question only days ago, and that this topic seems fairly frequent in various guises, I can only apologise for my slow-wittedness and ask once again for someone to explain why the query below is returning all three of the records above and not, as I want it to, only the records with IDs 4 and 6.
Is it something to do with my trying to join based on columns containing NULL?
SELECT
rate.*,
newest.id
FROM rate
LEFT JOIN rate AS newest ON(
rate.client_company = newest.client_company
AND rate.client_contact = newest.client_contact
AND rate.client_group = newest.client_group
AND rate.role= newest.role
AND newest.date_from > rate.date_from
)
WHERE newest.id IS NULL
FWIW, the problem WAS joining NULL columns. The vital missing ingredient was COALESCE:
SELECT
rate.*,
newest.id
FROM rate
LEFT JOIN rate AS newest ON(
COALESCE(rate.client_company,1) = COALESCE(newest.client_company,1)
AND COALESCE(rate.client_contact,1) = COALESCE(newest.client_contact,1)
AND COALESCE(rate.client_group,1) = COALESCE(newest.client_group,1)
AND COALESCE(rate.role,1) = COALESCE(newest.role,1)
AND newest.date_from > rate.date_from
)
WHERE newest.id IS NULL
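One caveat with COALESCE(col, 1) is that a real id of 1 becomes indistinguishable from NULL. A NULL-safe comparison avoids the sentinel: MySQL has the <=> operator for this. The sketch below is an assumption-laden illustration that runs on SQLite (via Python's sqlite3), where the IS operator plays the role of <=>; the helper name latest_rate_ids is made up.

```python
import sqlite3

# Sample rows copied from the question.
RATES = [
    (4, None, None, None, 3, "2012-07-30 14:48:16", 115.00),
    (5, 3, None, None, 3, "2012-07-30 14:51:38", 110.00),
    (6, 3, None, None, 3, "2012-07-30 14:59:20", 112.00),
]


def latest_rate_ids(eq):
    """Run the anti-join with '=' (plain) or 'IS' (NULL-safe in SQLite)."""
    if eq not in ("=", "IS"):
        raise ValueError("eq must be '=' or 'IS'")
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE rate ("
        "id INTEGER PRIMARY KEY, client_company INTEGER, client_group INTEGER,"
        " client_contact INTEGER, role INTEGER, date_from TEXT, hourly_rate REAL)"
    )
    con.executemany("INSERT INTO rate VALUES (?, ?, ?, ?, ?, ?, ?)", RATES)
    # Keep a row only if no newer row exists for the same combination.
    query = f"""
        SELECT rate.id
        FROM rate
        LEFT JOIN rate AS newest
          ON rate.client_company {eq} newest.client_company
         AND rate.client_contact {eq} newest.client_contact
         AND rate.client_group {eq} newest.client_group
         AND rate.role {eq} newest.role
         AND newest.date_from > rate.date_from
        WHERE newest.id IS NULL
        ORDER BY rate.id
    """
    ids = [row[0] for row in con.execute(query)]
    con.close()
    return ids


print(latest_rate_ids("="))   # NULL = NULL never matches, so every row survives
print(latest_rate_ids("IS"))  # NULL-safe comparison keeps only the latest rows
```

With plain = all three ids come back, which is the behaviour described in the question; with the NULL-safe comparison only ids 4 and 6 remain.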