What should I use instead of IN? - mysql

I have a query like this:
SELECT DISTINCT devices1_.id AS id27_, devices1_.createdTime AS createdT2_27_, devices1_.deletedOn AS deletedOn27_,
devices1_.deviceAlias AS deviceAl4_27_, devices1_.deviceName AS deviceName27_, devices1_.deviceTypeId AS deviceT21_27_,
devices1_.equipmentVendor AS equipmen6_27_, devices1_.exceptionDetail AS exceptio7_27_, devices1_.hardwareVersion AS hardware8_27_,
devices1_.ipAddress AS ipAddress27_, devices1_.isDeleted AS isDeleted27_, devices1_.loopBack AS loopBack27_,
devices1_.modifiedTime AS modifie12_27_, devices1_.osVersion AS osVersion27_, devices1_.productModel AS product14_27_,
devices1_.productName AS product15_27_, devices1_.routerType AS routerType27_, devices1_.rundate AS rundate27_,
devices1_.serialNumber AS serialN18_27_, devices1_.serviceName AS service19_27_, devices1_.siteId AS siteId27_,
devices1_.siteIdA AS siteIdA27_, devices1_.status AS status27_, devices1_.creator AS creator27_, devices1_.lastModifier AS lastMod25_27_
FROM goldenvariation goldenconf0_
INNER JOIN devices devices1_ ON goldenconf0_.deviceId=devices1_.id
CROSS JOIN devices devices2_
WHERE goldenconf0_.deviceId=devices2_.id
AND (goldenconf0_.classType = 'policy-options')
AND DATE(goldenconf0_.rundate)=DATE('2014-04-14 00:00:00')
AND devices2_.isDeleted=0
AND EXISTS (SELECT DISTINCT(deviceId) FROM goldenvariation goldenconf3_
WHERE (goldenconf3_.goldenVariationType = 'MISMATCH')
AND (goldenconf3_.classType = 'policy-options')
AND DATE(goldenconf3_.rundate)=DATE('2014-04-14 00:00:00'))
AND EXISTS (SELECT DISTINCT (deviceId) FROM goldenvariation goldenconf4_
WHERE (goldenconf4_.goldenVariationType = 'MISSING')
AND (goldenconf4_.classType = 'policy-options')
AND DATE(goldenconf4_.rundate)=DATE('2014-04-14 00:00:00'));
It's taking too much time; how can I rewrite the query to make it faster?
The table structure of goldenvariation is:
CREATE TABLE `goldenvariation` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`classType` VARCHAR(255) DEFAULT NULL,
`createdTime` DATETIME DEFAULT NULL,
`goldenValue` LONGTEXT,
`goldenXpath` VARCHAR(255) DEFAULT NULL,
`isMatched` TINYINT(1) DEFAULT NULL,
`modifiedTime` DATETIME DEFAULT NULL,
`pathValue` LONGTEXT,
`rundate` DATETIME DEFAULT NULL,
`value` LONGTEXT,
`xpath` VARCHAR(255) DEFAULT NULL,
`deviceId` BIGINT(20) DEFAULT NULL,
`goldenXpathId` BIGINT(20) DEFAULT NULL,
`creator` INT(10) UNSIGNED DEFAULT NULL,
`lastModifier` INT(10) UNSIGNED DEFAULT NULL,
`goldenVariationType` VARCHAR(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `FK6804472AD99F2D15` (`deviceId`),
KEY `FK6804472A98002838` (`goldenXpathId`),
KEY `FK6804472A27C863B` (`creator`),
KEY `FK6804472A3617A57C` (`lastModifier`),
KEY `rundateindex` (`rundate`),
KEY `varitionidindex` (`id`),
KEY `classTypeindex` (`classType`),
CONSTRAINT `FK6804472A27C863B` FOREIGN KEY (`creator`) REFERENCES `users` (`userid`),
CONSTRAINT `FK6804472A3617A57C` FOREIGN KEY (`lastModifier`) REFERENCES `users` (`userid`),
CONSTRAINT `FK6804472A98002838` FOREIGN KEY (`goldenXpathId`) REFERENCES `goldenconfigurationxpath` (`id`),
CONSTRAINT `FK6804472AD99F2D15` FOREIGN KEY (`deviceId`) REFERENCES `devices` (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1868865 DEFAULT CHARSET=latin1;
And the EXPLAIN plan of the query is:
"1" "PRIMARY" "goldenconf0_" "ref" "FK6804472AD99F2D15,classTypeindex" "classTypeindex" "258" "const" "179223" "Using where; Using temporary"
"1" "PRIMARY" "devices2_" "eq_ref" "PRIMARY,deviceindex" "PRIMARY" "8" "cmdb.goldenconf0_.deviceId" "1" "Using where"
"1" "PRIMARY" "devices1_" "eq_ref" "PRIMARY,deviceindex" "PRIMARY" "8" "cmdb.goldenconf0_.deviceId" "1" ""
"3" "DEPENDENT SUBQUERY" "goldenconf4_" "index_subquery" "FK6804472AD99F2D15,classTypeindex" "FK6804472AD99F2D15" "9" "func" "19795" "Using where"
"2" "DEPENDENT SUBQUERY" "goldenconf3_" "index_subquery" "FK6804472AD99F2D15,classTypeindex" "FK6804472AD99F2D15" "9" "func" "19795" "Using where"

Replace each EXISTS subquery with an INNER JOIN, for example:
INNER JOIN goldenvariation goldenconf4_
ON goldenconf4_.deviceId = goldenconf0_.deviceId
AND (goldenconf4_.goldenVariationType = 'MISSING')
AND (goldenconf4_.classType = 'policy-options')
AND DATE(goldenconf4_.rundate)=DATE('2014-04-14 00:00:00')
Change the other EXISTS in the same way. I think this should work much faster. Also, a small tip from me: try to use shorter aliases. Your query is really hard to read.
SELECT DISTINCT
devices1_.id AS id27_,
devices1_.createdTime AS createdT2_27_,
devices1_.deletedOn AS deletedOn27_,
devices1_.deviceAlias AS deviceAl4_27_,
devices1_.deviceName AS deviceName27_,
devices1_.deviceTypeId AS deviceT21_27_,
devices1_.equipmentVendor AS equipmen6_27_,
devices1_.exceptionDetail AS exceptio7_27_,
devices1_.hardwareVersion AS hardware8_27_,
devices1_.ipAddress AS ipAddress27_,
devices1_.isDeleted AS isDeleted27_,
devices1_.loopBack AS loopBack27_,
devices1_.modifiedTime AS modifie12_27_,
devices1_.osVersion AS osVersion27_,
devices1_.productModel AS product14_27_,
devices1_.productName AS product15_27_,
devices1_.routerType AS routerType27_,
devices1_.rundate AS rundate27_,
devices1_.serialNumber AS serialN18_27_,
devices1_.serviceName AS service19_27_,
devices1_.siteId AS siteId27_,
devices1_.siteIdA AS siteIdA27_,
devices1_.status AS status27_,
devices1_.creator AS creator27_,
devices1_.lastModifier AS lastMod25_27_
FROM goldenvariation goldenconf0_
INNER JOIN devices devices1_ ON goldenconf0_.deviceId=devices1_.id
INNER JOIN goldenvariation a on a.deviceId = goldenconf0_.deviceId and a.goldenVariationType = 'MISMATCH'
INNER JOIN goldenvariation b on b.deviceId = goldenconf0_.deviceId and b.goldenVariationType = 'MISSING'
WHERE (goldenconf0_.classType = 'policy-options')
AND DATE(goldenconf0_.rundate) = '2014-04-14'
AND devices1_.isDeleted=0
Try this one. It should work much faster than your query. You joined a table using CROSS JOIN, but not even one column from it was used in the SELECT.

You are looking for devices associated with the goldenvariation table via EXISTS. I would start with that table to get the distinct IDs, then join to your devices table. Also, when you wrap the date column in a conversion function, the predicate can't take advantage of an index (even if rundate is part of one).
INDEX... ( classType, rundate, goldenVariationType, deviceId )
Change the date clause to >= ? AND < ?+1 day. That way you get the entire range from 00:00:00 to 23:59:59 of the same day, and the index can use the date component without converting it for every record.
Also, you are joining the devices table TWICE on the same matching id from the goldenvariation table (devices1_ and devices2_ on the same id), which is wasteful and gains nothing.
Your devices table should have an index ON (id, isDeleted).
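For example, the two suggestions above could be created like this (the index names are just illustrative placeholders):
-- index names are placeholders; pick names that fit your own conventions
CREATE INDEX gv_class_run_type_device ON goldenvariation (classType, rundate, goldenVariationType, deviceId);
CREATE INDEX devices_id_isdeleted ON devices (id, isDeleted);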
SELECT
d1.id AS id27,
d1.createdTime AS createdT2_27,
d1.deletedOn AS deletedOn27,
d1.deviceAlias AS deviceAl4_27_,
d1.deviceName AS deviceName27_,
d1.deviceTypeId AS deviceT21_27_,
d1.equipmentVendor AS equipmen6_27_,
d1.exceptionDetail AS exceptio7_27_,
d1.hardwareVersion AS hardware8_27_,
d1.ipAddress AS ipAddress27_,
d1.isDeleted AS isDeleted27_,
d1.loopBack AS loopBack27_,
d1.modifiedTime AS modifie12_27_,
d1.osVersion AS osVersion27_,
d1.productModel AS product14_27_,
d1.productName AS product15_27_,
d1.routerType AS routerType27_,
d1.rundate AS rundate27_,
d1.serialNumber AS serialN18_27_,
d1.serviceName AS service19_27_,
d1.siteId AS siteId27_,
d1.siteIdA AS siteIdA27_,
d1.status AS status27_,
d1.creator AS creator27_,
d1.lastModifier AS lastMod25_27_
from
( SELECT distinct
gv.deviceID
from
goldenVariation gv
where
gv.classType = 'policy-options'
AND gv.runDate >= '2014-04-14'
AND gv.runDate < '2014-04-15'
AND gv.goldenVariationType IN ( 'MISSING', 'MISMATCH' )) PQ
JOIN devices d1
ON PQ.deviceId = d1.id
AND d1.isDeleted = 0

Yes, the query could be rewritten to improve performance (though it looks like a query generated by Hibernate, and getting Hibernate to use a different query can be a challenge.)
How sure are you that this query is returning the resultset you expect? Because the query is rather odd.
In terms of performance, dollars to donuts, it's the repeated execution of the dependent subqueries that is really eating your lunch (and your lunchbox). It looks like MySQL is using the index on the deviceId column to satisfy those subqueries, and that doesn't look like the most appropriate index.
We notice that there are two JOIN operations to the devices table; there is no reason this table needs to be joined twice. Both JOIN operations require a match to the deviceId column of goldenvariation; the second join to the devices table just does additional filtering with isDeleted=0. The keywords INNER and CROSS don't have any impact on the statement at all, and the second join to the devices table isn't really a "cross" join, it's really an inner join. (We prefer to see the join predicates in an ON clause rather than in the WHERE clause.)
The DATE() function wrapped around the rundate column disables an index range scan operation. These predicates can be rewritten to take advantage of an appropriate index.
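For example, the date filter can be rewritten as a range so an index on rundate can be used (the same rewrite applies to all three DATE(...) predicates):
-- instead of: AND DATE(goldenconf0_.rundate) = DATE('2014-04-14 00:00:00')
AND goldenconf0_.rundate >= '2014-04-14'
AND goldenconf0_.rundate <  '2014-04-14' + INTERVAL 1 DAY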
The DISTINCT(deviceId) in the SELECT list of an EXISTS subquery is very strange. Firstly, DISTINCT is a keyword, not a function, so there's no need for parentheses around deviceId. But beyond that, it doesn't matter what is returned in the SELECT list of an EXISTS subquery; it could just be SELECT 1.
It's odd to see an EXISTS predicate whose subquery doesn't reference any expression from the outer query (i.e. it isn't a correlated subquery). It's valid syntax. With a correlated subquery, MySQL performs that query for each and every row returned by the outer query, and the EXPLAIN output (DEPENDENT SUBQUERY) suggests MySQL is doing the same thing here; it didn't recognize any optimization.
The way those EXISTS predicates are written, if there isn't a 'policy-options' row with 'MISMATCH' AND there isn't a 'policy-options' row with 'MISSING' (for the specified date), then the query will not return any rows. If a row of each type is found (for the specified date), then ALL of the 'policy-options' rows for that date are returned. (It's syntactically valid, but it's rather odd.)
Assuming that the id column on the devices table is UNIQUE (i.e. it's the PRIMARY KEY or there's a UNIQUE index on that column), the DISTINCT keyword is unnecessary on the outermost query. (From the EXPLAIN output, it looks like MySQL already optimized away the usual DISTINCT operations; that is, MySQL recognized that the DISTINCT keyword is unnecessary.)
But bottom line, it's the dependent subqueries that are killing performance, together with the absence of suitable indexes and the predicate on the date column being wrapped in a function.
To answer your question, yes, this query can be rewritten to return an equivalent resultset more efficiently. (It's not entirely clear that the query is returning the resultset you expect.)
SELECT d1.id AS id27_
, d1.createdTime AS createdT2_27_
, d1.deletedOn AS deletedOn27_
, d1.deviceAlias AS deviceAl4_27_
, d1.deviceName AS deviceName27_
, d1.deviceTypeId AS deviceT21_27_
, d1.equipmentVendor AS equipmen6_27_
, d1.exceptionDetail AS exceptio7_27_
, d1.hardwareVersion AS hardware8_27_
, d1.ipAddress AS ipAddress27_
, d1.isDeleted AS isDeleted27_
, d1.loopBack AS loopBack27_
, d1.modifiedTime AS modifie12_27_
, d1.osVersion AS osVersion27_
, d1.productModel AS product14_27_
, d1.productName AS product15_27_
, d1.routerType AS routerType27_
, d1.rundate AS rundate27_
, d1.serialNumber AS serialN18_27_
, d1.serviceName AS service19_27_
, d1.siteId AS siteId27_
, d1.siteIdA AS siteIdA27_
, d1.status AS status27_
, d1.creator AS creator27_
, d1.lastModifier AS lastMod25_27_
FROM devices d1
JOIN (SELECT g.deviceId
FROM goldenvariation g
CROSS
JOIN (SELECT 1
FROM goldenvariation x3
WHERE x3.goldenVariationType = 'MISMATCH'
AND x3.classType = 'policy-options'
AND x3.rundate >= '2014-04-14'
AND x3.rundate < '2014-04-14' + INTERVAL 1 DAY
LIMIT 1
) t3
CROSS
JOIN (SELECT 1
FROM goldenvariation x4
WHERE x4.goldenVariationType = 'MISSING'
AND x4.classType = 'policy-options'
AND x4.rundate >= '2014-04-14'
AND x4.rundate < '2014-04-14' + INTERVAL 1 DAY
LIMIT 1
) t4
WHERE g.classType = 'policy-options'
AND g.rundate >= '2014-04-14'
AND g.rundate < '2014-04-14' + INTERVAL 1 DAY
GROUP BY g.deviceId
) t2
ON t2.deviceId = d1.id
WHERE d1.isDeleted=0
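As for the missing indexes, one plausible pair for this rewritten query (names are illustrative) would be:
-- supports the t3 / t4 existence checks: equality on classType and goldenVariationType, range on rundate
CREATE INDEX gv_class_type_rundate ON goldenvariation (classType, goldenVariationType, rundate);
-- supports the outer derived table t2: equality on classType, range on rundate, grouping by deviceId
CREATE INDEX gv_class_rundate_device ON goldenvariation (classType, rundate, deviceId);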

Related

MYSQL ERROR CODE: 1288 - can't update with join statement

Thanks for past help.
While doing an update using a join, I am getting the 'Error Code: 1288. The target table _____ of the UPDATE is not updatable' error and can't figure out why. I can update the table with a simple update statement (UPDATE sales.customerABC SET contractID = 'x';) but can't using a join like this:
UPDATE (
SELECT * #where '*' contains columns a.uniqueID and a.contractID
FROM sales.customerABC
WHERE contractID IS NULL
) as a
LEFT JOIN (
SELECT uniqueID, contractID
FROM sales.tblCustomers
WHERE contractID IS NOT NULL
) as b
ON a.uniqueID = b.uniqueID
SET a.contractID = b.contractID;
If I change that update statement to a SELECT such as:
SELECT * FROM (
SELECT *
FROM opwSales.dealerFilesCTS
WHERE pcrsContractID IS NULL
) as a
LEFT JOIN (
SELECT uniqueID, pcrsContractID
FROM opwSales.dealerFileLoad
WHERE pcrsContractID IS NOT NULL
) as b
ON a."Unique ID" = b.uniqueID;
the result table would contain these columns:
a.uniqueID, a.contractID, b.uniqueID, b.contractID
59682204, NULL, NULL, NULL
a3e8e81d, NULL, NULL, NULL
cfd1dbf9, NULL, NULL, NULL
5ece009c, , 5ece009c, B123
5ece0d04, , 5ece0d04, B456
5ece7ab0, , 5ece7ab0, B789
cfd21d2a, NULL, NULL, NULL
cfd22701, NULL, NULL, NULL
cfd23032, NULL, NULL, NULL
I pretty much have all database privileges and can't find restrictions with the table reference data. Can't find much information online concerning the error code, either.
Thanks in advance guys.
You cannot update a sub-select because it's not a "real" table - MySQL cannot easily determine how the sub-select assignment maps back to the originating table.
Try:
UPDATE customerABC
JOIN tblCustomers USING (uniqueID)
SET customerABC.contractID = tblCustomers.contractID
WHERE customerABC.contractID IS NULL AND tblCustomers.contractID IS NOT NULL
Notes:
you can use a plain inner JOIN instead of a LEFT JOIN, since you want uniqueID to exist and not be null in both tables. A LEFT JOIN would generate extra NULL rows from tblCustomers, only to have them shot down by the WHERE requirement that tblCustomers.contractID be not NULL. Since they allow more stringent restrictions on indexes, inner JOINs tend to be more efficient than LEFT JOINs.
since the field has the same name in both tables you can replace ON (a.field1 = b.field1) with the USING (field1) shortcut.
you obviously strongly want a covering index with (uniqueID, contractID) on both tables to maximize efficiency (see the example after these notes)
this is so not going to work unless you have "real" tables for the update. The "tblCustomers" may be a view or a subselect, but customerABC may not. You might need a more complicated JOIN to pull out a complex WHERE which might be otherwise hidden inside a subselect, if the original 'SELECT * FROM customerABC' was indeed a more complex query than a straight SELECT. What this boils down to is, MySQL needs a strong unique key to know what it needs to update, and it must be in a single table. To reliably update more than one table I think you need two UPDATEs inside a properly write-locked transaction.
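For the covering indexes mentioned above, something along these lines should do (index names are placeholders):
CREATE INDEX customerABC_uid_contract ON customerABC (uniqueID, contractID);
CREATE INDEX tblCustomers_uid_contract ON tblCustomers (uniqueID, contractID);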

mysql with few tables, subquery on one large table performs slow

We are experiencing slow performance with a query on a MySQL database, and we are not sure whether the query is wrong or maybe MySQL or the server is not good enough.
The query, which uses a subquery, returns some project details (3 fields) and the filename of the latest picture taken by an online camera.
Info
Table 'projects' contains 40 records.
Table 'cameras' contains approx 40 records (1 project, multiple cameras possible)
Table 'cameraimages' contains around 250000 (250 thousand) records. (1 camera can have thousands of images)
Engine is InnoDb
Database size is about 100Mb approx
No indexes are added yet.
Version number mysql 8.0.15
This is the query
SELECT
pj.title,
pj.description,
pj.city,
(SELECT cmi.filename
FROM cameras cm
LEFT JOIN cameraimages cmi ON cmi.cameraId = cm.id
WHERE cm.projectId = pj.id
ORDER BY cmi.dateRecording DESC
LIMIT 0,1) as latestfilename
FROM
projects pj
It takes 40-50 seconds to return this data.
That is too long for a webpage, and I think it should not take nearly that long.
We tested the same query on another server, to compare. Same data, same query.
That takes 25 seconds.
My questions are:
Is this query too 'heavy/bad', and if it is, what query would perform better?
Is there a way, or what should I check, to find out why this query runs better on an older/other server?
Hope someone can give some advice.
Thnx!
Additional info
CREATE TABLE `cameras` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`guid` varchar(50) DEFAULT NULL,
`title` varchar(50) DEFAULT NULL,
`longitude` double DEFAULT NULL,
`latitude` double DEFAULT NULL,
`status` smallint(6) DEFAULT NULL,
`cameraUid` varchar(20) DEFAULT NULL,
`cameraFriendlyName` varchar(50) DEFAULT NULL,
`projectId` int(11) DEFAULT NULL,
`dateCreated` datetime DEFAULT NULL,
`dateModified` datetime DEFAULT NULL,
`address` varchar(100) DEFAULT NULL,
`city` varchar(50) DEFAULT NULL,
`createArchive` smallint(6) DEFAULT '0',
`createDaily` smallint(6) DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=88 DEFAULT CHARSET=latin1
The combination of columns cameraId, dateRecording is unique.
One camera takes one picture at a time.
You're using a so-called dependent subquery. That's slow.
I guess cameraimages.id is a primary key for your cameraimages table. That's a guess. You didn't provide enough information in your question to answer it with certainty.
I also guess that the dateRecording values in cameraimages are in the same order as your autoincrementing primary key id values. That is, I guess you INSERT a record to that table at the time each image is captured.
Let's break this down.
You want the id of the most recent image from each project. How can you get that? Write a subquery to retrieve the largest, most recent id for each project.
SELECT cm.projectId,
MAX(cmi.id) imageId
FROM cameras cm
JOIN cameraimages cmi ON cmi.cameraId = cm.id
GROUP BY cm.projectId
That subquery does the heavy lifting of searching your big table. It does it just once, not for every project, so it won't take as long.
Then put that subquery into your query to retrieve the columns you need.
SELECT
pj.title,
pj.description,
pj.city,
cmi.filename latestfilename
FROM projects pj
JOIN (
SELECT cm.projectId,
MAX(cmi.id) imageId
FROM cameras cm
JOIN cameraimages cmi ON cmi.cameraId = cm.id
GROUP BY cm.projectId
) latest ON pj.id = latest.projectId
JOIN cameraimages cmi ON cmi.id = latest.imageId
This has a series of JOINs making a chain from projects to the latest subquery and from there to cameraimages.
This depends on cameraimages.id values being in chronological order. It can still be done if they aren't in that order with a more elaborate query.
Indexes:
cm: INDEX(projectId, id)
cmi: INDEX(cameraId, dateRecording, filename)
cmi: INDEX(cameraId, id)
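Written out as DDL, those would be something like (index names are placeholders):
CREATE INDEX cameras_project_id ON cameras (projectId, id);
CREATE INDEX cmi_camera_date_filename ON cameraimages (cameraId, dateRecording, filename);
CREATE INDEX cmi_camera_id ON cameraimages (cameraId, id);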
When cameraimages.id values aren't in chronological order, we need to work with the latest dateRecording values.
This is going to require a sequence of subqueries. So, rather than nesting them, let's use MySQL 8+ Common Table Expressions. It's a big query.
WITH
ProjectCameraImage AS (
/* a virtual version of the cameraimages table including projectId */
SELECT cmi.id, cmi.dateRecording, cm.projectId, cmi.cameraId
FROM cameras cm
JOIN cameraimages cmi ON cm.id = cmi.cameraId
),
LatestDate AS (
/* the latest date for each entry in ProjectCameraImage */
/* Notice how this uses MAX rather than ORDER BY ... DESC LIMIT 1 */
SELECT projectId, cameraId,
MAX(dateRecording) dateRecording
FROM ProjectCameraImage
GROUP BY projectId, cameraId
),
ProjectCameraLatest AS (
/* the cameraimage.id values for the latest images in ProjectCameraImage */
SELECT ProjectCameraImage.id,
ProjectCameraImage.projectId,
ProjectCameraImage.cameraId,
ProjectCameraImage.dateRecording
FROM ProjectCameraImage
JOIN LatestDate
ON ProjectCameraImage.projectId = LatestDate.projectId
AND ProjectCameraImage.cameraId = LatestDate.cameraId
AND ProjectCameraImage.dateRecording = LatestDate.dateRecording
),
LatestProjectDate AS (
/* the latest data for each entry in ProjectCameraLatest */
SELECT projectId,
MAX(dateRecording) dateRecording
FROM ProjectCameraLatest
GROUP BY projectId
),
ProjectLatest AS (
/* the cameraimage.id values for the latest images in ProjectCameraLatest */
SELECT ProjectCameraLatest.id,
ProjectCameraLatest.projectId
FROM ProjectCameraLatest
JOIN LatestProjectDate
ON ProjectCameraLatest.projectId = LatestProjectDate.projectId
AND ProjectCameraLatest.dateRecording = LatestProjectDate.dateRecording
)
/* the main query */
SELECT pj.title,
pj.description,
pj.city,
cmi.filename latestfilename
FROM projects pj
JOIN ProjectLatest ON pj.id = ProjectLatest.projectId
JOIN cameraimages cmi ON ProjectLatest.id = cmi.id;
It's big because we have to go through two different cycles of finding the cameraimages.id value with the largest dateRecording.
Edit: The heavy lifting, in terms of searching your tables, happens in the second common table expression (CTE), the one called LatestDate. I suggest adding an index to your cameraimages table as follows to give it a boost.
CREATE INDEX cmi_cameraid_daterec
ON cameraimages (cameraId, dateRecording DESC);
That compound index should allow random access by cameraId, then quick access to the latest date. Notice that it also should help the ProjectCameraLatest CTE.
You can test the performance of this by changing the last SELECT, the one in the main query, to just SELECT * FROM LatestDate;. And to see whether / how it uses the index try using EXPLAIN or EXPLAIN ANALYZE: use EXPLAIN SELECT * FROM LatestDate; as the main query.
You may learn some useful things about indexes if you run EXPLAIN with and without the index.
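For example, a sketch of that check using only the first two CTEs (MySQL 8 accepts EXPLAIN in front of a WITH query):
EXPLAIN
WITH ProjectCameraImage AS (
  SELECT cmi.id, cmi.dateRecording, cm.projectId, cmi.cameraId
  FROM cameras cm
  JOIN cameraimages cmi ON cm.id = cmi.cameraId
),
LatestDate AS (
  SELECT projectId, cameraId, MAX(dateRecording) AS dateRecording
  FROM ProjectCameraImage
  GROUP BY projectId, cameraId
)
SELECT * FROM LatestDate;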

Query results take too long. Is there a better way to write this MySQL query?

I am trying to optimize a mysql query that works perfectly but is taking way too long. My inventory table is nearly 300,000 records (not too bad). I am not sure if using a subquery or join or additional index would speed up my results. I do have the district_id columns indexed in both the students and inventory tables.
Basically, the query below pulls all the inventory of all students in a teacher's roster. So it first has to search the students table to find which students are in the teacher's roster, then has to search the inventory table for each student. So if a teacher has 30+ students it can be a lot of searches through the inventory and each student can have 30+ pieces of inventory. Any advice would be helpful!
SELECT inventory.inventory_id, items.title, items.isbn, items.item_num,
items.price, conditions.condition_name, inventory.check_out,
inventory.check_in, inventory.student_id, inventory.teacher_id
FROM inventory, conditions, items, students
WHERE students.teacher_id = '$teacher_id'
AND students.district_id = $district_id
AND inventory.student_id = students.s_number
AND inventory.district_id = $district_id
AND inventory.item_id = items.item_id
AND items.consumable !=1
AND conditions.condition_id = inventory.condition_id
ORDER BY inventory.student_id, inventory.inventory_id
Here is the table structure:
CREATE TABLE `inventory` (
`id` int(11) NOT NULL,
`inventory_id` varchar(10) CHARACTER SET utf8 NOT NULL DEFAULT '0',
`item_id` int(6) NOT NULL DEFAULT '0',
`district_id` int(2) NOT NULL DEFAULT '0',
`condition_id` int(1) NOT NULL DEFAULT '0',
`check_out` date NOT NULL DEFAULT '0000-00-00',
`check_in` date NOT NULL DEFAULT '0000-00-00',
`student_id` varchar(10) CHARACTER SET utf8 NOT NULL DEFAULT '0',
`teacher_id` varchar(6) CHARACTER SET utf8 NOT NULL DEFAULT '0',
`acquisition_date` date NOT NULL DEFAULT '0000-00-00',
`notes` text CHARACTER SET utf8 NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
First you rewrite this to use explicit JOINs:
SELECT inventory.inventory_id,
items.title, items.isbn, items.item_num, items.price,
conditions.condition_name,
inventory.check_out, inventory.check_in,
inventory.student_id, inventory.teacher_id
FROM inventory
JOIN conditions ON (conditions.condition_id = inventory.condition_id)
JOIN items ON (inventory.item_id = items.item_id AND items.consumable != 1)
JOIN students ON (inventory.student_id = students.s_number)
WHERE students.teacher_id = '$teacher_id'
AND students.district_id = $district_id
AND inventory.district_id = $district_id
ORDER BY inventory.student_id, inventory.inventory_id
Then you examine the JOINs. For example this:
JOIN items ON (inventory.item_id = items.item_id AND items.consumable != 1)
means that the items table needs to be scanned on item_id and consumable, which might be a constant. It is always better to not use negative conditions if possible. But at the very least you index items on item_id (unless it's already the primary key, as is likely). If consumable can assume, say, values 0, 1, 2, 3, then you go:
JOIN items ON (inventory.item_id = items.item_id AND items.consumable IN (0, 2, 3))
and use CREATE INDEX to add an index on consumable.
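For instance (the index name is a placeholder, and this only pays off if consumable is reasonably selective):
CREATE INDEX items_consumable ON items (consumable);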
You may notice that a few columns from inventory are always used in the other JOINs, and there are also some constant constraints.
So another useful index could be
CREATE INDEX ... ON inventory(district_id, student_id, item_id, condition_id)
Another useful index would be
ON students(teacher_id, district_id, student_id, s_number)
which allows immediately restricting the WHERE on the involved students, and retrieve the information required by the JOINs without ever loading the table, just using the index.
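Spelled out, those two suggestions look something like this (index names are placeholders; the column list for students simply follows the suggestion above):
CREATE INDEX inventory_district_student_item_cond ON inventory (district_id, student_id, item_id, condition_id);
CREATE INDEX students_teacher_district_number ON students (teacher_id, district_id, student_id, s_number);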
Switch to InnoDB! Some of what I am about to say is less efficient in InnoDB.
SELECT i.inventory_id,
items.title, items.isbn, items.item_num, items.price,
c.condition_name,
i.check_out, i.check_in, i.student_id, i.teacher_id
FROM inventory AS i
JOIN conditions AS c ON c.condition_id = i.condition_id
JOIN items ON i.item_id = items.item_id
JOIN students AS s ON i.student_id = s.s_number
WHERE s.teacher_id = '$teacher_id'
AND s.district_id = $district_id
AND i.student_id = s.s_number
AND i.district_id = $district_id
AND items.consumable != 1
ORDER BY i.student_id, i.inventory_id
To help the Optimizer if it would like to start with students:
students: INDEX(district_id, teacher_id, s_number)
Note: this is also "covering", thereby avoiding bouncing between index BTree and data BTree. (What is the PK of students? Please provide SHOW CREATE TABLE.)
If consuming the ORDER BY is better:
inventory: INDEX(district_id, student_id, inventory_id)
Also needed:
items: (item_id) -- probably already the PRIMARY KEY?
conditions: (condition_id) -- probably already the PRIMARY KEY?
Verify or add those 4 indexes. (The Optimizer will dynamically choose what to do.)
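A sketch of how to verify and add them (skip any that already exist; index names are placeholders):
SHOW INDEX FROM students;
SHOW INDEX FROM inventory;
ALTER TABLE students  ADD INDEX students_district_teacher_snum (district_id, teacher_id, s_number);
ALTER TABLE inventory ADD INDEX inventory_district_student_inv (district_id, student_id, inventory_id);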

SQL alternative to sub-query in FROM

I have a table containing user to user messages. A conversation has all messages between two users. I am trying to get a list of all the different conversations and display only the last message sent in the listing.
I am able to do this with a SQL sub-query in FROM.
CREATE TABLE `messages` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`from_user_id` bigint(20) DEFAULT NULL,
`to_user_id` bigint(20) DEFAULT NULL,
`type` smallint(6) NOT NULL,
`is_read` tinyint(1) NOT NULL,
`is_deleted` tinyint(1) NOT NULL,
`text` longtext COLLATE utf8_unicode_ci NOT NULL,
`heading` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`created_at_utc` datetime DEFAULT NULL,
`read_at_utc` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
);
SELECT * FROM
(SELECT * FROM `messages` WHERE TYPE = 1 AND
(from_user_id = 22 OR to_user_id = 22)
ORDER BY created_at_utc DESC
) tb
GROUP BY from_user_id, to_user_id;
SQL Fiddle:
http://www.sqlfiddle.com/#!2/845275/2
Is there a way to do this without a sub-query?
(writing a DQL which supports sub-queries only in 'IN')
You seem to be trying to get the last contents of messages to or from user 22 with type = 1. Your method is explicitly not guaranteed to work, because the extra columns (not in the GROUP BY) can come from arbitrary rows. As explained in the documentation:
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses.
The query that you want is more along the lines of this (assuming that you have an auto-incrementing id column for messages):
select m.*
from (select m.from_user_id, m.to_user_id, max(m.id) as max_id
from messages m
where m.type = 1 and (m.from_user_id = 22 or m.to_user_id = 22)
group by m.from_user_id, m.to_user_id
) lm join
messages m
on lm.max_id = m.id;
Or this:
select m.*
from messages m
where m.type = 1 and (m.from_user_id = 22 or m.to_user_id = 22) and
not exists (select 1
from messages m2
where m2.type = m.type and m2.from_user_id = m.from_user_id and
m2.to_user_id = m.to_user_id and
m2.created_at_utc > m.created_at_utc
);
For this latter query, an index on messages(type, from_user_id, to_user_id, created_at_utc) would help performance.
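For example (the index name is a placeholder):
CREATE INDEX messages_type_from_to_created ON messages (type, from_user_id, to_user_id, created_at_utc);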
Since this is a rather specific type of data query which goes outside common ORM use cases, DQL isn't really fit for this - it's optimized for walking well-defined relationships.
For your case however Doctrine fully supports native SQL with result set mapping. Using a NativeQuery with ResultSetMapping like this you can easily use the subquery this problem requires, and still map the results on native Doctrine entities, allowing you to still profit from all caching, usability and performance advantages.
Samples found here.
If you mean to get all conversations and all their last messages, then a subquery is necessary.
SELECT a.* FROM messages a
INNER JOIN (
SELECT
MAX(created_at_utc) as max_created,
from_user_id,
to_user_id
FROM messages
GROUP BY from_user_id, to_user_id
) b ON a.created_at_utc = b.max_created
AND a.from_user_id = b.from_user_id
AND a.to_user_id = b.to_user_id
And you could append the where condition as you like.
THE SQL FIDDLE.
I don't think your original query was even doing this correctly. Not sure what the GROUP BY was being used for, other than maybe to try to return only a single (unpredictable) result.
Just add a limit clause:
SELECT * FROM `messages`
WHERE `type` = 1 AND
(`from_user_id` = 22 OR `to_user_id` = 22)
ORDER BY `created_at_utc` DESC
LIMIT 1
For optimum query performance you need indexes on the following fields:
type
from_user_id
to_user_id
created_at_utc
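Created as listed, one single-column index per field (names are placeholders); a single composite index, as suggested in the previous answer, is another option:
CREATE INDEX messages_type ON messages (type);
CREATE INDEX messages_from_user ON messages (from_user_id);
CREATE INDEX messages_to_user ON messages (to_user_id);
CREATE INDEX messages_created_at ON messages (created_at_utc);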

optimising and scaling mysql structure + queries for large mailing groups

So I have a system that stores contacts and allows them to be put into groups. These groups can be defined by criteria (everyone with surname 'smith'), or by explicitly adding / excluding people.
The problem I am having is that when I list the mailing groups, I need to count how many contacts are in each one. This number can change as contacts are added / removed from the contacts table. On small groups / amounts of contacts it is fine; however, with around 50k contacts it runs into problems.
An example query I use for this is as follows:
SELECT COUNT(c_id) FROM contacts, mgroups
LEFT JOIN mgroups_explicit ON mg_id = me_mg_id
WHERE mgroups.site_id = '10'
AND mg_id = '20'
AND me_c_id = c_id
AND contacts.site_id = '10'
OR (contacts.site_id = '10' AND ( c_tags LIKE '%tag1%')) AND c_id NOT IN
( SELECT mex_c_id FROM mgroups_exclude WHERE c_id = mex_c_id ) GROUP BY c_id
The criteria table does not feature in this query, as the problem presents itself when large groups are created explicitly rather than with criteria. This is required because criteria-based groups grow or shrink on the fly as you modify your contacts, whereas explicit ones are generally set in stone. So in this case, if you explicitly add 20k contacts to a group, it adds 20k rows to the table marked with that mg_id as a foreign key.
This basically takes ages / times out / gets the wrong number / generally doesn't work very well. I either need to figure out a more efficient query, or figure out a better way to store everything.
Any ideas?
The 5 main tables that make up the database
contacts - where the actual contacts reside
Field Type Null Default Comments
c_id int(8) No
site_id int(6) No
c_email varchar(500) No
c_source varchar(255) No
c_subscribed tinyint(1) No 0
c_special tinyint(1) No 0
c_domain text No
c_title varchar(12) No
c_name varchar(128) No
c_surname varchar(128) No
c_company varchar(128) No
c_jtitle text No
c_ad1 text No
c_ad2 text No
c_ad3 text No
c_county varchar(64) No
c_city varchar(128) No
c_postcode varchar(32) No
c_lat varchar(100) No
c_lng varchar(100) No
c_country varchar(64) No
c_tel varchar(20) No
c_mob varchar(20) No
c_dob date No
c_registered datetime No
c_updated datetime No
c_twitter varchar(255) No
c_facebook varchar(255) No
c_tags text No
c_special_1 text No
c_special_2 text No
c_special_3 text No
c_special_4 text No
c_special_5 text No
c_special_6 text No
c_special_7 text No
c_special_8 text No
mgroups - basic mailing group info
Field Type Null Default Comments
mg_id int(8) No
site_id int(6) No
mg_name varchar(255) No
mg_created datetime No
mgroups_criteria - criteria for said mailing groups
Field Type Null Default Comments
mc_id int(8) No
site_id int(6) No
mc_mg_id int(8) No
mc_criteria text No
mgroups_exclude - anyone to exclude from criteria
Field Type Null Default Comments
mex_id int(8) No
site_id int(6) No
mex_c_id int(8) No
mex_mg_id int(8) No
mgroups_explicit - anyone to explicitly add without the use of criteria
Field Type Null Default Comments
me_id int(8) No
site_id int(6) No
me_c_id int(8) No
me_mg_id int(8) No
And the indexes / EXPLAIN of the query. I must admit, indexes are not my strong point; any improvements?
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY mgroups ALL PRIMARY,mg_id NULL NULL NULL 9 Using temporary; Using filesort
1 PRIMARY mgroups_explicit ref me_mg_id me_mg_id 4 engine_4.mgroups.mg_id 8750
1 PRIMARY contacts ALL PRIMARY,c_id NULL NULL NULL 86012 Using where; Using join buffer
2 DEPENDENT SUBQUERY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE noticed after reading const table...
I don't see any indexes in the schema above; you do have indexes, don't you?
Run an EXPLAIN on the query:
EXPLAIN
SELECT COUNT(c_id) FROM
contacts, mgroups LEFT JOIN mgroups_explicit ON mg_id = me_mg_id
WHERE
mgroups.site_id = '10'
AND mg_id = '20'
AND me_c_id = c_id
AND contacts.site_id = '10'
OR (contacts.site_id = '10'
AND ( c_tags LIKE '%tag1%'))
AND c_id NOT IN (SELECT mex_c_id FROM mgroups_exclude WHERE c_id = mex_c_id ) GROUP BY c_id
That will tell you which indexes are being used, how many records it has to sort through, etc.
DC
Right, so I got this answered elsewhere (huge thanks to Hambut_Bulge), so for the sake of it being useful to anyone else, here's the solution:
First things first: you're mixing old- and new-style (ANSI) joins in the same query. This is considered a bad idea in SQL circles. By old style I mean we write a query with a join along these lines:
SELECT a.column_name, b.column2
FROM table1 a, second_table b
WHERE a.id_key = b.fid_key
AND b.some_other_criteria = 'Y';
In the newer ANSI style we'd rewrite the above to this:
SELECT a.column_name, b.column2
FROM table1 a INNER JOIN second_table b ON a.id_key = b.fid_key
WHERE b.some_other_criteria = 'Y';
It's neater and easier to read which bits are join conditions and which are WHERE clauses. It's also best to get into the habit of using ANSI style, as old-style support may (at some point) be discontinued.
Also, try to be consistent in your use of dot notation and/or aliases. Again, it makes big queries easier to read.
Back to your problem query: I began converting it into ANSI style and straight away noticed that you don't have a join condition between contacts and mgroups. This means the optimizer will create a cross join (also called a cartesian product), which is probably not something you want. A cross join (in case you didn't know) joins every row in the contacts table with every row in the mgroups table, so if you have 50,000 rows in contacts and 20,000 rows in mgroups you're going to get a joined result set containing 1,000,000,000 rows!
The other thing that is going to slow this query drastically is the subquery on mgroups_exclude. A correlated subquery like this is executed once for each row in the outer query, e.g.:
SELECT a.column1
FROM table1 a
WHERE a.id_key NOT IN ( SELECT * FROM table2 b WHERE a.id_key = b.fid_key);
Assume that table1 has 2,000,000 rows and table2 has 500,000. For each and every row in the outer query (table1) the database is going to have to do a full scan on the inner query. So to get a result the database will have to read 1,000,000,000,000 rows, and we may only be interested in 1,000! It will not touch any indexes no matter what.
To get around this we can use a left join (also called a left outer join) on the two tables.
SELECT a.column1
FROM table1 a LEFT JOIN table2 b ON a.id_key = b.fid_key
WHERE b.fid_key IS NULL;
An outer join does not require each record in the joined tables to have a matching record. So the example above we'd get all the records from table1 even if there is no match on table2. For non-matched records the database returns a NULL and we can test for that in the where clause. Now the optimizer can scan the indexes on the two tables id_key fields (assuming there are any), resulting in a much faster query.
So, to wrap up, I'd rewrite your original query thus:
SELECT COUNT( a.c_id )
FROM contacts a
INNER JOIN mgroups b ON a.c_id = b.mg_id
LEFT JOIN mgroups_explicit c ON b.mg_id = c.me_mg_id
LEFT JOIN mgroups_exclude d ON a.c_id = d.mex_c_id
WHERE b.mg_id = '20'
AND a.site_id = '10'
AND a.c_tags LIKE '%tag1%'
AND d.mex_c_id IS NULL
GROUP BY c_id;
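On the indexing question raised above, a hedged starting point based on the join columns in this rewrite (names are placeholders; adjust to the filters you actually use):
CREATE INDEX contacts_site_cid ON contacts (site_id, c_id);
CREATE INDEX me_group_contact ON mgroups_explicit (me_mg_id, me_c_id);
CREATE INDEX mex_contact ON mgroups_exclude (mex_c_id);
-- note: c_tags LIKE '%tag1%' has a leading wildcard, so no ordinary B-tree index will help that predicate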