I have a problem where I'm trying to get counts of data associated with multiple events occurring on separate days.
Let's say I'm following a group of plane-spotters, each of whom may spot many planes from around the world on any one day, while based at some particular airport. I'd like to produce a list consisting of one row per spotter per day, with columns for the spotter's ID, the date, how many planes he (it's always "he", right?) spotted on that day, how many individual airlines the planes belonged to, and how many countries the airlines belonged to. So, I'd like to have results like this:
+-----------+------------+---------+----------+-----------+
| | | Planes | | Airline |
| SpotterID | Date | spotted | Airlines | Countries |
+-----------+------------+---------+----------+-----------+
| 1234 | 2017-04-15 | 28 | 11 | 4 |
+-----------+------------+---------+----------+-----------+
| 1234 | 2017-04-16 | 65 | 19 | 7 |
+-----------+------------+---------+----------+-----------+
| 5678 | 2017-04-22 | 39 | 14 | 6 |
+-----------+------------+---------+----------+-----------+
| 6677 | 2017-04-28 | 74 | 29 | 9 |
+-----------+------------+---------+----------+-----------+
My test tables (MySQL 5.7.17) are sketched out below (I may have forgotten a couple of indexes).
The plane-spotting events table is defined as:
CREATE TABLE `SpottingEvents` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`SpotterID` int(11) NOT NULL,
`SpotDateTime` datetime NOT NULL,
`PlaneID` int(11) NOT NULL,
`Notes` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `indx_SpotterID_PlaneID_DateTime` (`SpotterID`,`PlaneID`,`SpotDateTime`),
KEY `indx_SpotterID_DateTime` (`SpotterID`,`SpotDateTime`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8
The planes table is defined as:
CREATE TABLE `Planes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`PlaneID` int(11) NOT NULL,
`PlaneTypeID` int(11) NOT NULL,
`AirlineID` int(11) NOT NULL,
`Notes` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `indx_PlaneID_AirlineID` (`PlaneID`,`AirlineID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
The airlines table is defined as:
CREATE TABLE `Airlines` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`AirlineID` int(11) NOT NULL,
`AirlineName` varchar(100) NOT NULL,
`CountryID` int(11) NOT NULL,
`Notes` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `indx_AirlineID` (`AirlineID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
And the countries table is defined as:
CREATE TABLE `Countries` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`CountryID` int(11) NOT NULL,
`CountryName` varchar(100) NOT NULL,
`Notes` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `indx_CountryID` (`CountryID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
My problem is with the last two columns, the Airlines and Countries counts. I've tried several ways to do this, including the following:
select distinct sev.SpotDateTime, sev.SpotterID, count(*) as planes_count,
(select count(*) from ( select distinct cnt.CountryID
from Countries as cnt
inner join Airlines as aln on aln.CountryID = cnt.CountryID
inner join Planes as pl on pl.AirlineID = aln.AirlineID
where pl.PlaneID = sev.PlaneID ) as t1) as countries_count
from SpottingEvents as sev
# where sev.SpotterID = 1234
group by SpotDateTime
order by SpotDateTime
which fails with the error Unknown column 'sev.planeID' in 'where clause'.
Since I'm no expert, I'm just not getting the intended results. So, how do I achieve them?
You are looking for COUNT(DISTINCT). Join the tables needed, then count.
select
se.spotterid,
date(se.spotdatetime) as spot_date,
count(*) as planes,
count(distinct p.airlineid) as airlines,
count(distinct a.countryid) as countries
from spottingevents se
join planes p on p.planeid = se.planeid
join airlines a on a.airlineid = p.airlineid
group by se.spotterid, date(se.spotdatetime)
order by date(se.spotdatetime), se.spotterid;
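To check the COUNT(DISTINCT) approach without a MySQL server, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module, with SQLite standing in for MySQL. The tables are cut down to the columns the query uses, and the sample rows are made up for illustration; SQLite's date() plays the role of MySQL's DATE() here.

```python
import sqlite3

# SQLite stands in for MySQL; tables keep only the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SpottingEvents (SpotterID INT, SpotDateTime TEXT, PlaneID INT);
    CREATE TABLE Planes   (PlaneID INT, AirlineID INT);
    CREATE TABLE Airlines (AirlineID INT, CountryID INT);

    -- Hypothetical sample data.
    INSERT INTO Airlines VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO Planes   VALUES (100, 10), (101, 10), (102, 11), (103, 12);
    INSERT INTO SpottingEvents VALUES
        (1234, '2017-04-15 09:00:00', 100),
        (1234, '2017-04-15 10:30:00', 101),
        (1234, '2017-04-15 14:00:00', 102),
        (1234, '2017-04-16 08:00:00', 103),
        (5678, '2017-04-15 12:00:00', 103);
""")

def spotter_daily_counts(conn):
    """One row per spotter per day: planes, distinct airlines, distinct countries."""
    return conn.execute("""
        SELECT se.SpotterID,
               date(se.SpotDateTime)       AS spot_date,
               count(*)                    AS planes,
               count(DISTINCT p.AirlineID) AS airlines,
               count(DISTINCT a.CountryID) AS countries
        FROM SpottingEvents se
        JOIN Planes   p ON p.PlaneID   = se.PlaneID
        JOIN Airlines a ON a.AirlineID = p.AirlineID
        GROUP BY se.SpotterID, date(se.SpotDateTime)
        ORDER BY spot_date, se.SpotterID
    """).fetchall()

for row in spotter_daily_counts(conn):
    print(row)
```

Spotter 1234 saw three planes on 2017-04-15, belonging to two airlines (10 and 11) that are both in country 1, so that row comes out as (1234, '2017-04-15', 3, 2, 1).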
My question is in reference to this question.
Say I have three tables, user, country and user_activity, whose table structures are given below:
user:
|---------------------------------------------|
| id | fname | lname | status |
|---------------------------------------------|
country:
|--------------------------------|
| id | name | status |
|--------------------------------|
user_activity:
|-------------------------------------------------------------------------|
| id | user_id | activity_type | country_id | location | status |
|-------------------------------------------------------------------------|
Say I create the user table like this:
DROP TABLE IF EXISTS `user`;
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fname` varchar(255) NOT NULL,
`lname` varchar(255) NOT NULL,
`status` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `iduser` (`id`,`status`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8;
Then I create the country table like this:
CREATE TABLE `country` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`status` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idcountry` (`id`,`status`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8;
I want to create a table user_activity in which the status of a user_activity record changes according to the status of the user whenever user_activity.user_id matches user.id.
At the same time, the status of a user_activity record should also change according to the status of the country whenever user_activity.country_id matches country.id.
How can I do that? How can I achieve the objective at the database level, instead of setting it via PHP scripting?
I currently join 5 tables to select 20 objects to show the user; unfortunately, if I use GROUP BY and ORDER BY, it gets really slow.
An example query looks like this:
SELECT r.name, l.name, o.typ, o.id, persons, children, description, rating, totalratings, minprice, picture FROM angebote as a
JOIN objekte as o ON a.fid_objekt = o.id
JOIN regionen as r ON a.fid_region = r.id
JOIN laender as l ON a.fid_land = l.id
WHERE l.slug="aegypten" AND a.letztes_angebot >= 1
GROUP BY a.fid_objekt ORDER BY rating DESC LIMIT 0,20
The EXPLAIN output for the query shows this:
+------+-------------+-------+--------+----------------------------+------------+---------+---------------------------------------+--------+--------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+--------+----------------------------+------------+---------+---------------------------------------+--------+--------------------------------------------------------+
| 1 | SIMPLE | l | ref | PRIMARY,slug | slug | 767 | const | 1 | Using index condition; Using temporary; Using filesort |
| 1 | SIMPLE | o | ALL | PRIMARY | NULL | NULL | NULL | 186779 | Using join buffer (flat, BNL join) |
| 1 | SIMPLE | a | ref | unique_key,letztes_angebot | unique_key | 8 | ferienhaeuser.o.id,ferienhaeuser.l.id | 1 | Using where |
| 1 | SIMPLE | r | eq_ref | PRIMARY | PRIMARY | 4 | ferienhaeuser.a.fid_region | 1 | |
+------+-------------+-------+--------+----------------------------+------------+---------+---------------------------------------+--------+--------------------------------------------------------+
So it looks like no key is used for the table objekte, and profiling shows 2.7 s spent in "Copying to tmp table".
Instead of FROM angebote or JOIN objekte, I tried using (SELECT * GROUP BY id), but unfortunately this doesn't help.
The fields used for WHERE, ORDER BY and GROUP BY are also indexed.
I think I missed some basic concept here and any help will be appreciated.
Since I most probably made a mistake with the tables, here are their definitions:
Objekte
CREATE TABLE `objekte` (
`id` int(11) NOT NULL,
`typ` varchar(50) NOT NULL,
`persons` int(11) NOT NULL,
`children` int(11) NOT NULL,
`description` text NOT NULL,
`rating` float NOT NULL,
`totalratings` int(11) NOT NULL,
`minprice` float NOT NULL,
`picture` varchar(255) NOT NULL,
`last_offer` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `minprice` (`minprice`),
KEY `rating` (`rating`),
KEY `last_offer` (`last_offer`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Angebote

CREATE TABLE `angebote` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fid_objekt` int(11) NOT NULL,
`fid_land` int(11) NOT NULL,
`fid_region` int(11) NOT NULL,
`fid_subregion` int(11) NOT NULL,
`letztes_angebot` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_key` (`fid_objekt`,`fid_land`,`fid_region`,`fid_subregion`),
KEY `letztes_angebot` (`letztes_angebot`),
KEY `fid_objekt` (`fid_objekt`),
KEY `fid_land` (`fid_land`),
KEY `fid_region` (`fid_region`),
KEY `fid_subregion` (`fid_subregion`)
) ENGINE=InnoDB AUTO_INCREMENT=2433073 DEFAULT CHARSET=utf8

laender, regionen, subregionen (same structure)
CREATE TABLE `laender` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`iso` varchar(2) NOT NULL,
`name` varchar(255) NOT NULL,
`slug` varchar(255) NOT NULL,
`letztes_angebot` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `iso` (`iso`),
KEY `slug` (`slug`),
KEY `letztes_angebot` (`letztes_angebot`)
) ENGINE=InnoDB AUTO_INCREMENT=107 DEFAULT CHARSET=utf8
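First of all, this is a non-standard GROUP BY: it selects columns that are neither grouped nor aggregated. As such, it will stop working when you upgrade to MySQL 5.7, where ONLY_FULL_GROUP_BY is enabled by default (as of 5.7.5).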
The biggest problem is that no index is used on the objekte table. To make matters worse, you are ordering on the rating field of that table, but its index is still not being used. A possible solution is to create a composite index like this:
CREATE INDEX objekte_idx ON objekte(id,rating);
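You do not need GROUP BY here, because you are not using any aggregate functions, so remove it from the query; doing so will improve performance. Note, however, that without GROUP BY an object with several matching angebote rows will appear once per row. There is also no need to specify the 0 offset in LIMIT.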
SELECT r.name, l.name, o.typ, o.id, persons, children, description, rating, totalratings, minprice, picture FROM angebote as a
JOIN objekte as o ON a.fid_objekt = o.id
JOIN regionen as r ON a.fid_region = r.id
JOIN laender as l ON a.fid_land = l.id
WHERE l.slug="aegypten" AND a.letztes_angebot >= 1
ORDER BY rating DESC LIMIT 20
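The duplicate-row caveat is easy to see on a toy data set. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the tables keep only the columns the query needs, and the rows (including the region names) are hypothetical, with object 1 offered in two regions of the same country.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objekte  (id INTEGER PRIMARY KEY, rating REAL);
    CREATE TABLE laender  (id INTEGER PRIMARY KEY, slug TEXT);
    CREATE TABLE regionen (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE angebote (id INTEGER PRIMARY KEY, fid_objekt INT, fid_land INT,
                           fid_region INT, letztes_angebot INT);

    -- Hypothetical sample data: object 1 has offers in two regions of one country.
    INSERT INTO laender  VALUES (1, 'aegypten');
    INSERT INTO regionen VALUES (1, 'Region A'), (2, 'Region B');
    INSERT INTO objekte  VALUES (1, 4.5), (2, 3.9);
    INSERT INTO angebote VALUES (1, 1, 1, 1, 1), (2, 1, 1, 2, 1), (3, 2, 1, 1, 1);
""")

def object_ids(group_by):
    """Object id of every result row, with or without GROUP BY a.fid_objekt."""
    group_clause = "GROUP BY a.fid_objekt" if group_by else ""
    sql = f"""
        SELECT o.id FROM angebote AS a
        JOIN objekte  AS o ON a.fid_objekt = o.id
        JOIN regionen AS r ON a.fid_region = r.id
        JOIN laender  AS l ON a.fid_land   = l.id
        WHERE l.slug = 'aegypten' AND a.letztes_angebot >= 1
        {group_clause}
        ORDER BY o.rating DESC LIMIT 20
    """
    return [row[0] for row in conn.execute(sql)]

print(object_ids(group_by=False))  # object 1 appears once per matching offer
print(object_ids(group_by=True))   # one row per object
```

So dropping GROUP BY is only safe when each object matches at most one angebote row for the given filters.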
I need some help with my SQL request. I have this table:
CREATE TABLE `contact` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`first_name` varchar(65) DEFAULT NULL,
`last_name` varchar(65) DEFAULT NULL,
`phonenumber` varchar(45) NOT NULL,
`avinumber` varchar(45) DEFAULT NULL,
`date` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`source` varchar(45) NOT NULL,
`nbrtryies` int(11) NOT NULL DEFAULT '0',
`treaty_at` date DEFAULT NULL,
`comment` varchar(255) DEFAULT '',
`status` varchar(45) DEFAULT 'KO',
`campagne_id` int(11) NOT NULL,
`treated_by` int(11) DEFAULT NULL,
`nodetree_id` int(11) DEFAULT NULL,
`avi` varchar(45) DEFAULT NULL,
`sessionid` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`),
KEY `fk_contact_campagne1_idx` (`campagne_id`),
KEY `fk_contact_agent1_idx` (`treated_by`),
CONSTRAINT `fk_contact_agent` FOREIGN KEY (`treated_by`) REFERENCES `user` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_contact_campagne1` FOREIGN KEY (`campagne_id`) REFERENCES `campagne` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=82 DEFAULT CHARSET=latin1;
I need to count the total number of rows, as well as the number of rows where the column treated_by has a certain value, and group the result by that column.
This is what I did, but it doesn't seem to work:
SELECT
co.treated_by AS userId,
COUNT(*) AS treated,
SUM(CASE WHEN ca.userId=1 THEN 1 ELSE 0 END) AS total
FROM
contact AS co
INNER JOIN
campagne AS ca ON ca.id = co.campagne_id
WHERE
(co.date BETWEEN '2013-07-09' AND '2014-08-15')
AND co.treated_by IN (2 , 40)
GROUP BY co.treated_by
This is what I got:
----------------------------
| userId | treated | total |
----------------------------
| 2 | 5 | 5 |
----------------------------
| 40 | 3 | 3 |
----------------------------
And I need something like:
----------------------------
| userId | treated | total |
----------------------------
| 2 | 5 | 20 |
----------------------------
| 40 | 3 | 20 |
----------------------------
Thanks a lot for your help
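With GROUP BY co.treated_by, each group contains only the rows counted by COUNT(*) AS treated, that is 5 and 3. SUM(CASE WHEN ca.userId=1 THEN 1 ELSE 0 END) AS total is computed over those same rows, so it can never exceed 5 and 3; getting 20 and 20 this way is impossible.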
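If "total" is meant to be the number of all contacts in the date range (an assumption; the question only shows the expected value 20), one way to get it next to the per-user counts is a scalar subquery, which is evaluated independently of the GROUP BY. This minimal sketch uses Python's built-in sqlite3 module as a stand-in for MySQL, keeps only the columns involved, and uses hypothetical rows: 20 contacts in range (5 treated by user 2, 3 by user 40, 12 untreated) plus one out-of-range contact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE campagne (id INTEGER PRIMARY KEY);
    CREATE TABLE contact (id INTEGER PRIMARY KEY, date TEXT,
                          campagne_id INT NOT NULL, treated_by INT);
    INSERT INTO campagne VALUES (1);
""")

# Hypothetical data: 20 contacts inside the date range ...
in_range = [2] * 5 + [40] * 3 + [None] * 12
conn.executemany(
    "INSERT INTO contact (date, campagne_id, treated_by) "
    "VALUES ('2014-01-01 10:00:00', 1, ?)",
    [(t,) for t in in_range],
)
# ... and one for user 2 outside it, which must not be counted anywhere.
conn.execute(
    "INSERT INTO contact (date, campagne_id, treated_by) "
    "VALUES ('2015-01-01 10:00:00', 1, 2)"
)

result = conn.execute("""
    SELECT co.treated_by AS userId,
           COUNT(*) AS treated,
           -- scalar subquery: counted over the whole date range, not per group
           (SELECT COUNT(*) FROM contact AS c2
             WHERE c2.date BETWEEN '2013-07-09' AND '2014-08-15') AS total
    FROM contact AS co
    INNER JOIN campagne AS ca ON ca.id = co.campagne_id
    WHERE co.date BETWEEN '2013-07-09' AND '2014-08-15'
      AND co.treated_by IN (2, 40)
    GROUP BY co.treated_by
    ORDER BY co.treated_by
""").fetchall()
print(result)
```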
If the condition is on co.treated_by, then why does the logic contain ca.userId?
Perhaps this will work:
SELECT co.treated_by AS userId,
COUNT(*) AS treated,
SUM(co.treated_by = 1) AS total
FROM contact co INNER JOIN
campagne ca
ON ca.id = co.campagne_id
WHERE (co.date BETWEEN '2013-07-09' AND '2014-08-15') AND
co.treated_by IN (2 , 40)
GROUP BY co.treated_by;
Note that I also replaced the CASE expression for total with a simpler version supported by MySQL: a comparison evaluates to 1 or 0, so it can be summed directly.
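The shorter form works because a comparison such as treated_by = 1 evaluates to 1, 0 or NULL, and SUM ignores NULLs. SQLite behaves the same way here, so Python's built-in sqlite3 module can serve as a quick stand-in for MySQL to compare both forms on a few hypothetical values, including an untreated (NULL) contact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (treated_by INT)")
# Hypothetical values; None models an untreated contact.
conn.executemany("INSERT INTO contact VALUES (?)", [(1,), (1,), (2,), (40,), (None,)])

short_form, case_form = conn.execute("""
    SELECT SUM(treated_by = 1),                             -- comparison yields 1, 0 or NULL
           SUM(CASE WHEN treated_by = 1 THEN 1 ELSE 0 END)  -- explicit CASE version
    FROM contact
""").fetchone()
print(short_form, case_form)
```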
mysql> explain
select c.userEmail,f.customerId
from comments c
inner join flows f
on (f.id = c.typeId)
inner join users u
on (u.email = c.userEmail)
where c.addTime >= 1372617000
and c.addTime <= 1374776940
and c.type = 'flow'
and c.automated = 0;
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
| 1 | SIMPLE | f | index | PRIMARY | customerId | 4 | NULL | 144443 | Using index |
| 1 | SIMPLE | c | ref | userEmail_idx,addTime,automated,typeId | typeId | 198 | f.id,const | 1 | Using where |
| 1 | SIMPLE | u | eq_ref | email | email | 386 | c.userEmail | 1 | Using index |
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
How do I make the above query faster? It constantly shows up in the slow query log.
Indexes present:
id is the auto incremented primary key of the flows table.
customerId of flows table.
userEmail of comments table.
composite index (typeId,type) on comments table.
email of users table (unique)
automated of comments table.
addTime of comments table.
Number of rows:
1. flows - 150k
2. comments - 500k (half of them have automated = 1 and the other half automated = 0; type is 'flow' for all rows except 500)
3. users - 50
Table schemas:
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(128) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=56 DEFAULT CHARSET=utf8
CREATE TABLE `comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`userEmail` varchar(128) DEFAULT NULL,
`content` mediumtext NOT NULL,
`addTime` int(11) NOT NULL,
`typeId` int(11) NOT NULL,
`automated` tinyint(4) NOT NULL,
`type` varchar(64) NOT NULL,
PRIMARY KEY (`id`),
KEY `userEmail_idx` (`userEmail`),
KEY `addTime` (`addTime`),
KEY `automated` (`automated`),
KEY `typeId` (`typeId`,`type`)
) ENGINE=InnoDB AUTO_INCREMENT=572410 DEFAULT CHARSET=utf8
CREATE TABLE `flows` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` varchar(32) NOT NULL,
`status` varchar(128) NOT NULL,
`customerId` int(11) NOT NULL,
`createTime` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `flowType_idx` (`type`),
KEY `customerId` (`customerId`),
KEY `status` (`status`),
KEY `createTime` (`createTime`)
) ENGINE=InnoDB AUTO_INCREMENT=134127 DEFAULT CHARSET=utf8
You have the indexes required to perform the joins efficiently. However, MySQL appears to be joining the tables in a less efficient order: the EXPLAIN output shows a full index scan of the flows table, which is then joined to the comments table.
It will probably be more efficient to read the comments table first, that is, in the order you specified in your query, so that the set of comments is restricted by the predicates you supplied before the join (probably what you intended).
Running OPTIMIZE TABLE or ANALYZE TABLE can improve the decisions the query optimiser makes, particularly on tables that have had extensive changes.
If the query optimiser still gets it wrong you can force tables to be read in the order you specify in the query by beginning your statement with SELECT STRAIGHT_JOIN or by changing the INNER JOIN to STRAIGHT_JOIN.
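STRAIGHT_JOIN is MySQL-specific, so it cannot be tried in other engines directly. As a rough analogue (a swapped-in technique, not MySQL's), SQLite documents that the left table of a CROSS JOIN is always placed in an outer loop relative to the right table. The sketch below uses Python's built-in sqlite3 module, tables cut down to the query's columns, hypothetical index names and a few made-up rows; it writes the joins in comments, flows, users order and reads the plan back to confirm that comments is read first. In MySQL itself you would write SELECT STRAIGHT_JOIN ... instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
    CREATE TABLE flows    (id INTEGER PRIMARY KEY, customerId INT NOT NULL);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, userEmail TEXT, addTime INT NOT NULL,
                           typeId INT NOT NULL, automated INT NOT NULL, type TEXT NOT NULL);
    CREATE INDEX idx_addTime ON comments(addTime);
    CREATE INDEX idx_typeId  ON comments(typeId, type);

    -- Hypothetical sample rows.
    INSERT INTO users    VALUES (1, 'a@example.com');
    INSERT INTO flows    VALUES (7, 42);
    INSERT INTO comments VALUES (1, 'a@example.com', 1373000000, 7, 0, 'flow');
""")

# CROSS JOIN fixes SQLite's loop order: comments, then flows, then users.
QUERY = """
    SELECT c.userEmail, f.customerId
    FROM comments AS c
    CROSS JOIN flows AS f
    CROSS JOIN users AS u
    WHERE f.id = c.typeId
      AND u.email = c.userEmail
      AND c.addTime BETWEEN 1372617000 AND 1374776940
      AND c.type = 'flow'
      AND c.automated = 0
"""

def loop_aliases(plan_lines, aliases=("c", "f", "u")):
    """Alias mentioned by each SCAN/SEARCH plan line, outermost loop first."""
    order = []
    for line in plan_lines:
        if line.startswith(("SCAN", "SEARCH")):
            padded = f" {line} "
            order.extend(a for a in aliases if f" {a} " in padded)
    return order

rows = conn.execute(QUERY).fetchall()
# Each EXPLAIN QUERY PLAN row has four columns; the last one is the description.
plan = [r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
order = loop_aliases(plan)
print(rows, order)
```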
Example table content
'main'
| id | total |
| 1 | 10 |
| 2 | 20 |
'timed'
| id | id_main | date_from | date_to | total |
| 1 | 2 | 2012-03-29 | 2012-04-29 | 50 |
Desired result
| id | total |
| 1 | 10 |
| 2 | 50 |
Query that doesn't quite work
SELECT main.id AS id, COALESCE(timed.total, main.total) AS total
FROM main
LEFT JOIN timed
ON main.id = timed.id_main
WHERE SYSDATE() BETWEEN timed.date_from AND timed.date_to
Result
| id | total |
| 2 | 50 |
In both 'main' and 'timed', the 'total' field will never be NULL.
Some 'main' records will have no related 'timed' records and others will have a few, but their date ranges ('date_from' to 'date_to') never overlap.
Table 'main' is large, but there will only ever be two or three related records in 'timed'.
CREATE TABLE `main` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`total` decimal(10,2) unsigned NOT NULL DEFAULT '0.00',
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
INSERT INTO `main` VALUES (1,10);
INSERT INTO `main` VALUES (2,20);
CREATE TABLE `timed` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`id_main` int(11) unsigned NOT NULL DEFAULT '0',
`date_from` date DEFAULT NULL,
`date_to` date DEFAULT NULL,
`total` decimal(10,2) unsigned NOT NULL DEFAULT '0.00',
PRIMARY KEY (`id`),
KEY `link` (`id_main`)
) ENGINE=InnoDB;
INSERT INTO `timed` VALUES (1,2,'2012-03-29','2012-03-30',50);
ALTER TABLE `timed`
ADD CONSTRAINT `link` FOREIGN KEY (`id_main`)
REFERENCES `main` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;
Sorry for my english.
You should move the date condition in the join condition:
SELECT main.id AS id, COALESCE(timed.total, main.total) AS total
FROM main
LEFT JOIN timed
ON main.id = timed.id_main and SYSDATE() BETWEEN timed.date_from AND timed.date_to
In your query, the unmatched rows are filtered out by the WHERE condition: for them timed.date_from and timed.date_to are NULL, so SYSDATE() can't be between them :)
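The difference between putting the date test in WHERE and putting it in ON can be reproduced with Python's built-in sqlite3 module as a stand-in for MySQL. In this minimal sketch the totals are stored as integers for simplicity, and a fixed date passed as a parameter replaces SYSDATE() so the results are repeatable; the sample rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main  (id INTEGER PRIMARY KEY, total INT NOT NULL);
    CREATE TABLE timed (id INTEGER PRIMARY KEY, id_main INT NOT NULL,
                        date_from TEXT, date_to TEXT, total INT NOT NULL);
    INSERT INTO main  VALUES (1, 10), (2, 20);
    INSERT INTO timed VALUES (1, 2, '2012-03-29', '2012-04-29', 50);
""")

def totals(today, date_check_in_on):
    """(id, total) per main row; `today` is a fixed date standing in for SYSDATE()."""
    date_check = "? BETWEEN t.date_from AND t.date_to"
    if date_check_in_on:
        # Date test is part of the join: unmatched main rows survive with NULLs.
        join_cond, where = f"m.id = t.id_main AND {date_check}", ""
    else:
        # Date test in WHERE: rows with NULL t.date_from/t.date_to are discarded.
        join_cond, where = "m.id = t.id_main", f"WHERE {date_check}"
    sql = f"""
        SELECT m.id, COALESCE(t.total, m.total) AS total
        FROM main AS m
        LEFT JOIN timed AS t ON {join_cond}
        {where}
        ORDER BY m.id
    """
    return conn.execute(sql, (today,)).fetchall()

print(totals("2012-04-01", date_check_in_on=False))  # condition in WHERE
print(totals("2012-04-01", date_check_in_on=True))   # condition in ON
```

Outside the timed period, the ON version falls back to main.total for every row, which is the behaviour the question asks for.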