MySQL 5.6: Complex Query with Grouped Parameters - mysql

I'm far from a MySQL expert, and I'm struggling with a relatively complicated query.
I have two tables:
A Data table with columns as follows:
`Location` bigint(20) unsigned NOT NULL,
`Source` bigint(20) unsigned NOT NULL,
`Param` bigint(20) unsigned NOT NULL,
`Type` bigint(20) unsigned NOT NULL,
`InitTime` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`ValidTime` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`Value` double DEFAULT NULL
A Location Group table with columns as follows:
`Group` bigint(20) unsigned NOT NULL,
`Location` bigint(20) unsigned NOT NULL,
The data table stores data of interest, where each 'value' is valid for a particular 'validtime'. However, the data in the table comes from a calculation which is run periodically. The initialisation time at which the calculation is run is stored in the 'inittime' field. A given calculation with particular inittime may result in, say 10 values being output with valid times (A - J). A more recent calculation, with a more recent inittime, may result in another 10 values being output with valid times (B - K). There is therefore an overlap in available values. I always want a result set of Values and ValidTimes for the most recent inittime (i.e. max(inittime)).
I can determine the most recent inittime using the following query:
SELECT MAX(InitTime)
FROM Data
WHERE
Location = 100060 AND
Source = 10 AND
Param = 1 AND
Type = 1;
This takes 0.072 secs to execute.
However, using this as a sub-query to retrieve data from the Data table results in an execution time of 45 seconds (it's a pretty huge table, but not super ridiculous).
Sub-Query:
SELECT Location, ValidTime, Value
FROM Data data
WHERE Source = 10
AND Location IN (SELECT Location FROM LocationGroup WHERE `Group` = 3)
AND InitTime = (SELECT max(data2.InitTime) FROM Data data2 WHERE data.Location = data2.Location AND data.Source = data2.Source AND data.Param = data2.Param AND data.Type = data2.Type)
ORDER BY Location, ValidTime ASC;
(Snipped ValidTime qualifiers for brevity)
I know there's likely some optimisation that would help here, but I'm not sure where to start. Instead, I created a stored procedure to effectively perform the MAX(InitTime) query, but because the MAX(InitTime) is determined by a combo of Location, Source, Param and Type, I need to pass in all the locations that comprise a particular group. I implemented a cursors-based stored procedure for this before realising there must be an easier way.
Putting aside the question of optimisation via indices, how could I efficiently perform a query on the data table using the most recent InitTime for a given location group, source, param and type?
Thanks in advance!

MySQL sometimes does a poor job of optimizing IN with a subquery, so I would rewrite that condition as EXISTS. Indexes should also help. I would write the query as:
SELECT d.Location, d.ValidTime, d.Value
FROM Data d
WHERE d.Source = 10 AND
EXISTS (SELECT 1 FROM LocationGroup lg WHERE d.Location = lg.Location and lg.Group = 3) AND
d.InitTime = (SELECT max(d2.InitTime)
FROM Data d2
WHERE d.Location = d2.Location AND
d.Source = d2.Source AND
d.Param = d2.Param AND
d.Type = d2.Type
)
ORDER BY d.Location, d.ValidTime ASC;
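For this query, you want indexes on Data(Location, Source, Param, Type, InitTime), LocationGroup(Location, `Group`), and Data(Source, Location, ValidTime). The first serves the correlated MAX(InitTime) lookup, the second the EXISTS check, and the third the outer filter on Source and the sort.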
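As a minimal sketch of what the EXISTS plus correlated-MAX query returns, the snippet below runs it against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, the rows are made up for illustration, and the index names (data_latest, lg_loc) are hypothetical. It says nothing about MySQL timings; it only checks that rows from the older InitTime are dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Data (
    Location INTEGER NOT NULL, Source INTEGER NOT NULL,
    Param INTEGER NOT NULL, Type INTEGER NOT NULL,
    InitTime TEXT NOT NULL, ValidTime TEXT NOT NULL, Value REAL
);
CREATE TABLE LocationGroup ("Group" INTEGER NOT NULL, Location INTEGER NOT NULL);
-- Hypothetical index names; the column lists follow the answer above.
CREATE INDEX data_latest ON Data (Location, Source, Param, Type, InitTime);
CREATE INDEX lg_loc ON LocationGroup (Location, "Group");

INSERT INTO LocationGroup VALUES (3, 100060), (4, 100061);
-- Two overlapping runs for location 100060, one run for 100061 (made-up data).
INSERT INTO Data VALUES
  (100060, 10, 1, 1, '2020-01-01 00:00:00', '2020-01-01 01:00:00', 1.0),
  (100060, 10, 1, 1, '2020-01-01 00:00:00', '2020-01-01 02:00:00', 2.0),
  (100060, 10, 1, 1, '2020-01-01 01:00:00', '2020-01-01 02:00:00', 2.5),
  (100060, 10, 1, 1, '2020-01-01 01:00:00', '2020-01-01 03:00:00', 3.5),
  (100061, 10, 1, 1, '2020-01-01 01:00:00', '2020-01-01 03:00:00', 9.0);
""")

rows = conn.execute("""
SELECT d.Location, d.ValidTime, d.Value
FROM Data d
WHERE d.Source = 10
  AND EXISTS (SELECT 1 FROM LocationGroup lg
              WHERE lg.Location = d.Location AND lg."Group" = 3)
  AND d.InitTime = (SELECT MAX(d2.InitTime) FROM Data d2
                    WHERE d2.Location = d.Location AND d2.Source = d.Source
                      AND d2.Param = d.Param AND d2.Type = d.Type)
ORDER BY d.Location, d.ValidTime
""").fetchall()

# Only the two rows from the newer run of location 100060 remain.
print(rows)
```

Location 100061 is filtered out because it belongs to group 4, and the two rows from the 00:00 run are filtered out because a later InitTime exists for the same Location, Source, Param and Type.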

Related

MYSQL ERROR CODE: 1288 - can't update with join statement

Thanks for past help.
While doing an update using a join, I am getting 'Error Code: 1288. The target table _____ of the UPDATE is not updatable' and can't figure out why. I can update the table with a simple update statement (UPDATE sales.customerABC SET contractID = 'x';) but can't when using a join like this:
UPDATE (
SELECT * #where '*' contains columns a.uniqueID and a.contractID
FROM sales.customerABC
WHERE contractID IS NULL
) as a
LEFT JOIN (
SELECT uniqueID, contractID
FROM sales.tblCustomers
WHERE contractID IS NOT NULL
) as b
ON a.uniqueID = b.uniqueID
SET a.contractID = b.contractID;
If I change that update statement to a SELECT such as:
SELECT * FROM (
SELECT *
FROM opwSales.dealerFilesCTS
WHERE pcrsContractID IS NULL
) as a
LEFT JOIN (
SELECT uniqueID, pcrsContractID
FROM opwSales.dealerFileLoad
WHERE pcrsContractID IS NOT NULL
) as b
ON a."Unique ID" = b.uniqueID;
the result table would contain these columns:
a.uniqueID, a.contractID, b.uniqueID, b.contractID
59682204, NULL, NULL, NULL
a3e8e81d, NULL, NULL, NULL
cfd1dbf9, NULL, NULL, NULL
5ece009c, , 5ece009c, B123
5ece0d04, , 5ece0d04, B456
5ece7ab0, , 5ece7ab0, B789
cfd21d2a, NULL, NULL, NULL
cfd22701, NULL, NULL, NULL
cfd23032, NULL, NULL, NULL
I pretty much have all database privileges and can't find restrictions with the table reference data. Can't find much information online concerning the error code, either.
Thanks in advance guys.
You cannot update a sub-select because it's not a "real" table - MySQL cannot easily determine how the sub-select assignment maps back to the originating table.
Try:
UPDATE customerABC
JOIN tblCustomers USING (uniqueID)
SET customerABC.contractID = tblCustomers.contractID
WHERE customerABC.contractID IS NULL AND tblCustomers.contractID IS NOT NULL
Notes:
you can use a plain (inner) JOIN instead of a LEFT JOIN, since you want uniqueID to exist and be non-null in both tables. A LEFT JOIN would generate extra NULL rows from tblCustomers, only to have them shot down by the requirement that tblCustomers.contractID be not NULL. Because an inner JOIN leaves the optimizer free to choose the join order and to filter on either table's indexes, it tends to be more efficient than a LEFT JOIN.
since the field has the same name in both tables you can replace ON (a.field1 = b.field1) with the USING (field1) shortcut.
you will want a covering index on (uniqueID, contractID) on both tables to maximize efficiency.
this will not work unless the table being updated is a "real" table. tblCustomers may be a view or a subselect, but customerABC may not. If the original 'SELECT * FROM customerABC' was in fact a more complex query than a straight SELECT, you might need a more complicated JOIN to pull out the WHERE conditions that were otherwise hidden inside the subselect. What this boils down to is that MySQL needs a strong unique key to know which rows to update, and it must be in a single table. To reliably update more than one table I think you need two UPDATEs inside a properly write-locked transaction.
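To illustrate the point that the UPDATE target must be the base table itself, here is a small sketch run from Python against in-memory SQLite. This is an assumption-laden stand-in: the multi-table UPDATE ... JOIN syntax shown above is MySQL-specific, so the sketch expresses the same update as a correlated subquery on the real customerABC table, a form both engines accept. The uniqueID and contractID values are taken from the sample output in the question; the table definitions are simplified guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, assumed schemas: only the two columns the update touches.
CREATE TABLE customerABC  (uniqueID TEXT PRIMARY KEY, contractID TEXT);
CREATE TABLE tblCustomers (uniqueID TEXT PRIMARY KEY, contractID TEXT);
INSERT INTO customerABC  VALUES ('59682204', NULL), ('5ece009c', NULL), ('5ece0d04', NULL);
INSERT INTO tblCustomers VALUES ('5ece009c', 'B123'), ('5ece0d04', 'B456'), ('cfd1dbf9', NULL);
""")

# The target is the base table, not a derived table, so it is updatable.
conn.execute("""
UPDATE customerABC
SET contractID = (SELECT t.contractID FROM tblCustomers t
                  WHERE t.uniqueID = customerABC.uniqueID)
WHERE contractID IS NULL
  AND EXISTS (SELECT 1 FROM tblCustomers t
              WHERE t.uniqueID = customerABC.uniqueID
                AND t.contractID IS NOT NULL)
""")

updated = conn.execute(
    "SELECT uniqueID, contractID FROM customerABC ORDER BY uniqueID"
).fetchall()
print(updated)
```

The EXISTS condition plays the role of the inner JOIN: rows without a matching, non-null contractID in tblCustomers (such as 59682204) are left untouched.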

mysql with few tables, subquery on one large table performs slow

We are experiencing slow performance with a query on a MySQL database, and we are not sure whether the query is wrong or whether MySQL or the server is not good enough.
The query with a subquery returns some project details (3 fields) and the filename of the latest picture taken by an online camera.
Info
Table 'projects' contains 40 records.
Table 'cameras' contains approx 40 records (1 project, multiple cameras possible)
Table 'cameraimages' contains around 250000 (250 thousand) records. (1 camera can have thousands of images)
Engine is InnoDb
Database size is about 100Mb approx
No indexes are added yet.
Version number mysql 8.0.15
This is the query
SELECT
pj.title,
pj.description,
pj.city,
(SELECT cmi.filename
FROM cameras cm
LEFT JOIN cameraimages cmi ON cmi.cameraId = cm.id
WHERE cm.projectId = pj.id
ORDER BY cmi.dateRecording DESC
LIMIT 0,1) as latestfilename
FROM
projects pj
It takes 40-50 seconds to return this data.
That is too long for a web page, and I think it should not take that long at all.
We tested the same query on another server, to compare. Same data, same query.
That takes 25 seconds.
My questions are:
Is this query too 'heavy/bad', and if it is, what query would perform better?
Is there a way, or what should I check, to find out why this query runs better on an older/other server?
Hope someone can give some advice.
Thnx!
Additional info
CREATE TABLE `cameras` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`guid` varchar(50) DEFAULT NULL,
`title` varchar(50) DEFAULT NULL,
`longitude` double DEFAULT NULL,
`latitude` double DEFAULT NULL,
`status` smallint(6) DEFAULT NULL,
`cameraUid` varchar(20) DEFAULT NULL,
`cameraFriendlyName` varchar(50) DEFAULT NULL,
`projectId` int(11) DEFAULT NULL,
`dateCreated` datetime DEFAULT NULL,
`dateModified` datetime DEFAULT NULL,
`address` varchar(100) DEFAULT NULL,
`city` varchar(50) DEFAULT NULL,
`createArchive` smallint(6) DEFAULT '0',
`createDaily` smallint(6) DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=88 DEFAULT CHARSET=latin1
The column pair (cameraId, dateRecording) is unique.
One camera takes one picture at a time.
You're using a so-called dependent (correlated) subquery: it is re-run for every row of projects, and each run has to join and sort the camera images. That's slow.
I guess cameraimages.id is the primary key of your cameraimages table. That's a guess. You didn't provide enough information in your question to say so with certainty.
I also guess that the dateRecording values in cameraimages are in the same order as your autoincrementing primary key id values. That is, I guess you INSERT a record to that table at the time each image is captured.
Let's break this down.
You want the id of the most recent image from each project. How can you get that? Write a subquery to retrieve the largest, most recent id for each project.
SELECT cm.projectId,
MAX(cmi.id) imageId
FROM cameras cm
JOIN cameraimages cmi ON cmi.cameraId = cm.id
GROUP BY cm.projectId
That subquery does the heavy lifting of searching your big table. It does it just once, not for every project, so it won't take as long.
Then put that subquery into your query to retrieve the columns you need.
SELECT
pj.title,
pj.description,
pj.city,
cmi.filename latestfilename
FROM projects pj
JOIN (
SELECT cm.projectId,
MAX(cmi.id) imageId
FROM cameras cm
JOIN cameraimages cmi ON cmi.cameraId = cm.id
GROUP BY cm.projectId
) latest ON pj.id = latest.projectId
JOIN cameraimages cmi ON cmi.id = latest.imageId
This has a series of JOINs making a chain from projects to the latest subquery and from there to cameraimages.
This depends on cameraimages.id values being in chronological order. It can still be done if they aren't in that order with a more elaborate query.
Indexes:
cm: INDEX(projectId, id)
cmi: INDEX(cameraId, dateRecording, filename)
cmi: INDEX(cameraId, id)
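The MAX(id) approach can be checked on a toy dataset. The sketch below uses Python with in-memory SQLite as a stand-in for MySQL 8. The projects and cameraimages schemas are assumptions (the question only shows cameras), the rows are invented, and the index names are hypothetical. It relies on the same guess as above: cameraimages.id increases in chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed minimal schemas; only the columns the query needs.
CREATE TABLE projects (id INTEGER PRIMARY KEY, title TEXT, description TEXT, city TEXT);
CREATE TABLE cameras (id INTEGER PRIMARY KEY, projectId INTEGER);
CREATE TABLE cameraimages (id INTEGER PRIMARY KEY, cameraId INTEGER,
                           dateRecording TEXT, filename TEXT);
CREATE INDEX cm_project ON cameras (projectId, id);
CREATE INDEX cmi_camera_id ON cameraimages (cameraId, id);

INSERT INTO projects VALUES (1, 'Bridge', 'North span', 'Utrecht'),
                            (2, 'Tower', 'Core', 'Delft');
INSERT INTO cameras VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO cameraimages VALUES
  (1, 10, '2019-05-01 08:00:00', 'a.jpg'),
  (2, 20, '2019-05-01 08:05:00', 'b.jpg'),
  (3, 11, '2019-05-01 08:10:00', 'c.jpg'),
  (4, 20, '2019-05-01 08:15:00', 'd.jpg');
""")

rows = conn.execute("""
SELECT pj.title, pj.description, pj.city, cmi.filename AS latestfilename
FROM projects pj
JOIN (SELECT cm.projectId, MAX(cmi.id) AS imageId   -- one pass over the big table
      FROM cameras cm
      JOIN cameraimages cmi ON cmi.cameraId = cm.id
      GROUP BY cm.projectId) latest ON pj.id = latest.projectId
JOIN cameraimages cmi ON cmi.id = latest.imageId    -- fetch the filename by primary key
ORDER BY pj.id
""").fetchall()
print(rows)
```

Project 1 has two cameras (10 and 11), and the grouped subquery still yields a single image per project because it groups on projectId rather than cameraId.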
When cameraimages.id values aren't in chronological order, we need to work with the latest dateRecording values.
This is going to require a sequence of subqueries. So, rather than nesting them, let's use MySQL 8+ Common Table Expressions. It's a big query.
WITH
ProjectCameraImage AS (
/* a virtual version of the cameraimages table including projectId */
SELECT cmi.id, cmi.dateRecording, cm.projectId, cmi.cameraId
FROM cameras cm
JOIN cameraimages cmi ON cm.id = cmi.cameraId
),
LatestDate AS (
/* the latest date for each entry in ProjectCameraImage */
/* Notice how this uses MAX rather than ORDER BY ... DESC LIMIT 1 */
SELECT projectId, cameraId,
MAX(dateRecording) dateRecording
FROM ProjectCameraImage
GROUP BY projectId, cameraId
),
ProjectCameraLatest AS (
/* the cameraimage.id values for the latest images in ProjectCameraImage */
SELECT ProjectCameraImage.id,
ProjectCameraImage.projectId,
ProjectCameraImage.cameraId,
ProjectCameraImage.dateRecording
FROM ProjectCameraImage
JOIN LatestDate
ON ProjectCameraImage.projectId = LatestDate.projectId
AND ProjectCameraImage.cameraId = LatestDate.cameraId
AND ProjectCameraImage.dateRecording = LatestDate.dateRecording
),
LatestProjectDate AS (
/* the latest date for each project in ProjectCameraLatest */
SELECT projectId,
MAX(dateRecording) dateRecording
FROM ProjectCameraLatest
GROUP BY projectId
),
ProjectLatest AS (
/* the cameraimage.id values for the latest images in ProjectCameraLatest */
SELECT ProjectCameraLatest.id,
ProjectCameraLatest.projectId
FROM ProjectCameraLatest
JOIN LatestProjectDate
ON ProjectCameraLatest.projectId = LatestProjectDate.projectId
AND ProjectCameraLatest.dateRecording = LatestProjectDate.dateRecording
)
/* the main query */
SELECT pj.title,
pj.description,
pj.city,
cmi.filename latestfilename
FROM projects pj
JOIN ProjectLatest ON pj.id = ProjectLatest.projectId
JOIN cameraimages cmi ON ProjectLatest.id = cmi.id;
It's big because we have to go through two different cycles of finding the cameraimages.id value with the largest dateRecording.
Edit: The heavy lifting, in terms of searching your tables, happens in the second common table expression (CTE), the one called LatestDate. I suggest adding an index to your cameraimages table as follows to give it a boost.
CREATE INDEX cmi_cameraid_daterec
ON cameraimages (cameraId, dateRecording DESC);
That compound index should allow random access by cameraId, then quick access to the latest date. Notice that it also should help the ProjectCameraLatest CTE.
You can test the performance of this by changing the last SELECT, the one in the main query, to just SELECT * FROM LatestDate;. And to see whether / how it uses the index try using EXPLAIN or EXPLAIN ANALYZE: use EXPLAIN SELECT * FROM LatestDate; as the main query.
You may learn some useful things about indexes if you run EXPLAIN with and without the index.

SQL alternative to sub-query in FROM

I have a table containing user to user messages. A conversation has all messages between two users. I am trying to get a list of all the different conversations and display only the last message sent in the listing.
I am able to do this with a SQL sub-query in FROM.
CREATE TABLE `messages` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`from_user_id` bigint(20) DEFAULT NULL,
`to_user_id` bigint(20) DEFAULT NULL,
`type` smallint(6) NOT NULL,
`is_read` tinyint(1) NOT NULL,
`is_deleted` tinyint(1) NOT NULL,
`text` longtext COLLATE utf8_unicode_ci NOT NULL,
`heading` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`created_at_utc` datetime DEFAULT NULL,
`read_at_utc` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
);
SELECT * FROM
(SELECT * FROM `messages` WHERE TYPE = 1 AND
(from_user_id = 22 OR to_user_id = 22)
ORDER BY created_at_utc DESC
) tb
GROUP BY from_user_id, to_user_id;
SQL Fiddle:
http://www.sqlfiddle.com/#!2/845275/2
Is there a way to do this without a sub-query?
(writing a DQL which supports sub-queries only in 'IN')
You seem to be trying to get the last contents of messages to or from user 22 with type = 1. Your method is explicitly not guaranteed to work, because the extra columns (not in the group by) can come from arbitrary rows. As explained in the documentation:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause. Sorting of the result set
occurs after values have been chosen, and ORDER BY does not affect
which values within each group the server chooses.
The query that you want is more along the lines of this (assuming that you have an auto-incrementing id column for messages):
select m.*
from (select m.from_user_id, m.to_user_id, max(m.id) as max_id
from messages m
where m.type = 1 and (m.from_user_id = 22 or m.to_user_id = 22)
group by m.from_user_id, m.to_user_id
) lm join
messages m
on lm.max_id = m.id;
Or this:
select m.*
from messages m
where m.type = 1 and (m.from_user_id = 22 or m.to_user_id = 22) and
not exists (select 1
from messages m2
where m2.type = m.type and m2.from_user_id = m.from_user_id and
m2.to_user_id = m.to_user_id and
m2.created_at_utc > m.created_at_utc
);
For this latter query, an index on messages(type, from_user_id, to_user_id, created_at_utc) would help performance.
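A minimal sketch of the NOT EXISTS form, run from Python on in-memory SQLite as a stand-in for MySQL. The messages table is cut down to the columns the query uses, the rows are invented, and the index name msg_latest is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (id INTEGER PRIMARY KEY, from_user_id INTEGER, to_user_id INTEGER,
                       type INTEGER NOT NULL, text TEXT NOT NULL, created_at_utc TEXT);
CREATE INDEX msg_latest ON messages (type, from_user_id, to_user_id, created_at_utc);
INSERT INTO messages VALUES
  (1, 22, 5, 1, 'hi',         '2014-01-01 10:00:00'),
  (2, 22, 5, 1, 'you there?', '2014-01-01 11:00:00'),
  (3, 5, 22, 1, 'yes',        '2014-01-01 12:00:00'),
  (4, 22, 7, 1, 'hello',      '2014-01-02 09:00:00'),
  (5, 22, 7, 2, 'system',     '2014-01-03 09:00:00'),
  (6, 8, 9, 1, 'unrelated',   '2014-01-04 09:00:00');
""")

rows = conn.execute("""
SELECT m.id, m.from_user_id, m.to_user_id, m.text
FROM messages m
WHERE m.type = 1 AND (m.from_user_id = 22 OR m.to_user_id = 22)
  -- keep a row only if no later message exists for the same (type, from, to)
  AND NOT EXISTS (SELECT 1 FROM messages m2
                  WHERE m2.type = m.type
                    AND m2.from_user_id = m.from_user_id
                    AND m2.to_user_id = m.to_user_id
                    AND m2.created_at_utc > m.created_at_utc)
ORDER BY m.id
""").fetchall()
print(rows)
```

Note that each direction counts as its own group here, exactly as in the queries above: the exchange between users 22 and 5 produces one row for 22 to 5 and another for 5 to 22.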
Since this is a rather specific type of data query which goes outside common ORM use cases, DQL isn't really fit for this - it's optimized for walking well-defined relationships.
For your case however Doctrine fully supports native SQL with result set mapping. Using a NativeQuery with ResultSetMapping like this you can easily use the subquery this problem requires, and still map the results on native Doctrine entities, allowing you to still profit from all caching, usability and performance advantages.
Samples found here.
If you mean to get all conversations and all their last messages, then a subquery is necessary.
SELECT a.* FROM messages a
INNER JOIN (
SELECT
MAX(created_at_utc) as max_created,
from_user_id,
to_user_id
FROM messages
GROUP BY from_user_id, to_user_id
) b ON a.created_at_utc = b.max_created
AND a.from_user_id = b.from_user_id
AND a.to_user_id = b.to_user_id
And you could append the where condition as you like.
THE SQL FIDDLE.
I don't think your original query was even doing this correctly. Not sure what the GROUP BY was being used for other than maybe try to only return a single (unpredictable) result.
Just add a limit clause:
SELECT * FROM `messages`
WHERE `type` = 1 AND
(`from_user_id` = 22 OR `to_user_id` = 22)
ORDER BY `created_at_utc` DESC
LIMIT 1
For optimum query performance you need indexes on the following fields:
type
from_user_id
to_user_id
created_at_utc

Strange query results from MySQL

I have a query that I'm testing on my database, but for some weird reason, and randomly, it returns a different set of results. Interestingly, there are only two distinct result-sets that it returns, from thousands of rows, and the query will randomly return one or the other, but nothing else.
Is there a reason the query only returns one of two datasets? Query and schema below.
My goal is to select the fastest laps for a given track, in a given time period, but only the fastest lap for each user (so there are always 10 different users in the top 10).
Most of the time the correct results are returned, but randomly, a totally different result set is returned.
SELECT `lap`.`ID`, `lap`.`qualificationTime`, `lap`.`userId`
FROM `lap`
WHERE (lap.trackID =4)
AND (lap.raceDateTime >= "2013-07-25 10:00:00")
AND (lap.raceDateTime < "2013-08-04 23:59:59")
AND (isTestLap =0)
GROUP BY `userId`
ORDER BY `qualificationTime` ASC
LIMIT 10
Schema:
CREATE TABLE IF NOT EXISTS `lap` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`userId` int(11) DEFAULT NULL,
`trackId` int(11) DEFAULT NULL,
`raceDateTime` datetime NOT NULL,
`qualificationTime` decimal(7,4) DEFAULT '0.0000',
`isTestLap` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`ID`)
(DB create script trimmed of un-needed columns)
You are using a (mis)feature of MySQL called hidden columns. As others have pointed out, you are allowed to put columns in the select statement that are not in the group by. But, the returned values are arbitrary, and not even guaranteed to be the same from one run to the next.
The solution is to find the minimum (fastest) qualification time for each user. Then join this information back to get the other fields. Here is one way:
select l.*
from (SELECT userId, min(qualificationtime) as minqf
FROM lap
WHERE (lap.trackID =4)
AND (lap.raceDateTime >= "2013-07-25 10:00:00")
AND (lap.raceDateTime < "2013-08-04 23:59:59")
AND (isTestLap =0)
GROUP BY `userId`
) lu join
lap l
on lu.userId = l.userId and lu.minqf = l.qualificationtime
ORDER BY l.`qualificationTime` ASC
LIMIT 10
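Here is a small sketch of the aggregate-then-join-back idea, run from Python on in-memory SQLite as a stand-in for MySQL. The lap rows are invented, and the join matches on both userId and the minimum time so that one user's best time cannot pick up another user's lap. qualificationTime is stored as REAL here rather than decimal(7,4), which is an assumption made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lap (ID INTEGER PRIMARY KEY, userId INTEGER, trackId INTEGER,
                  raceDateTime TEXT NOT NULL, qualificationTime REAL,
                  isTestLap INTEGER NOT NULL DEFAULT 0);
INSERT INTO lap VALUES
  (1, 1, 4, '2013-07-26 10:00:00', 31.5,  0),
  (2, 1, 4, '2013-07-27 10:00:00', 30.25, 0),  -- user 1's best
  (3, 2, 4, '2013-07-28 10:00:00', 30.25, 0),  -- same time as user 1's best
  (4, 2, 4, '2013-07-29 10:00:00', 29.5,  0),  -- user 2's best
  (5, 1, 4, '2013-07-30 10:00:00', 28.0,  1),  -- test lap, ignored
  (6, 3, 5, '2013-07-30 10:00:00', 27.0,  0);  -- other track, ignored
""")

rows = conn.execute("""
SELECT l.ID, l.userId, l.qualificationTime
FROM (SELECT userId, MIN(qualificationTime) AS minqf
      FROM lap
      WHERE trackId = 4
        AND raceDateTime >= '2013-07-25 10:00:00'
        AND raceDateTime <  '2013-08-04 23:59:59'
        AND isTestLap = 0
      GROUP BY userId) lu
JOIN lap l ON l.userId = lu.userId AND l.qualificationTime = lu.minqf
ORDER BY l.qualificationTime ASC
LIMIT 10
""").fetchall()
print(rows)
```

Lap 3 shows why the userId condition matters: its time equals user 1's minimum, so a join on the time alone would return it as well. The outer lap table is not filtered by track in this sketch, so a user who set the identical time on another track would also match unless those filters are repeated there.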
You are selecting lap.ID, lap.qualificationTime and lap.userId, but you are not GROUPing BY them. You can only select fields you group by, or else aggregate functions on the other fields (MIN, MAX, AVG, etc). Otherwise, results are undefined.
I think you mean that the values for lap.ID and lap.qualificationTime sometimes differ. That is the expected behaviour for MySQL: because you group by userId only, you cannot know which row's values will be returned for the other fields. MySQL may pick different values depending on which row of the group it happens to read first or last.
I would check something like this:
SELECT `l1`.`qualificationTime`, `l1`.`userId`,
(SELECT l2.ID FROM `lap` AS l2 WHERE l2.`userId` = l1.userId AND
l2.qualificationTime = min(l1.`qualificationTime`))
FROM `lap` AS `l1`
WHERE (l1.trackID =4)
AND (l1.raceDateTime >= "2013-07-25 10:00:00")
AND (l1.raceDateTime < "2013-08-04 23:59:59")
AND (isTestLap =0)
GROUP BY `userId`
ORDER BY `qualificationTime` ASC
LIMIT 10
It's likely to be your ORDER BY on a decimal entity, and how the DB stores this and then retrieves it.

Doing some calculations in mysql, numbers off when using GROUP BY

I'm running the following query to get the stats for a user, based on which I pay them.
SELECT hit_paylevel, sum(hit_uniques) as day_unique_hits
, (sum(hit_uniques)/1000)*hit_paylevel as day_earnings
, hit_date
FROM daily_hits
WHERE hit_user = 'xxx' AND hit_date >= '2011-05-01' AND hit_date < '2011-06-01'
GROUP BY hit_user
The table in question looks like this:
CREATE TABLE IF NOT EXISTS `daily_hits` (
`hit_itemid` varchar(255) NOT NULL,
`hit_mainid` int(11) NOT NULL,
`hit_user` int(11) NOT NULL,
`hit_date` date NOT NULL,
`hit_hits` int(11) NOT NULL DEFAULT '0',
`hit_uniques` int(11) NOT NULL,
`hit_embed` int(11) NOT NULL,
`hit_paylevel` int(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`hit_itemid`,`hit_date`),
KEY `hit_user` (`hit_user`),
KEY `hit_mainid` (`hit_mainid`,`hit_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The problem in the calculation has to do with the hit_paylevel which acts as a multiplier. Default is one, the other option is 2 or 3, which essentially doubles or triples the earnings for that day.
If I loop through the days, the daily day_earnings is correct; it's just that when I group them, it calculates everything as paylevel 1. This happens if the user was paylevel 1 in the beginning and was later upgraded to a higher level. If the user is pay level 2 from the start, it also calculates everything correctly.
Shouldn't this be sum(hit_uniques * hit_paylevel) / 1000?
Like @Denis said:
Change the query to
SELECT hit_paylevel, sum(hit_uniques) as day_unique_hits
, sum(hit_uniques * hit_paylevel) / 1000 as day_earnings
, hit_date
FROM daily_hits
WHERE hit_user = 'xxx' AND hit_date >= '2011-05-01' AND hit_date < '2011-06-01'
GROUP BY hit_user;
Why this fixes the problem
With hit_paylevel outside the sum, MySQL first sums all hit_uniques and then multiplies the total by a single, arbitrarily chosen hit_paylevel.
That is not what you want. If you put both columns inside the sum, MySQL pairs each row's hit_uniques with that row's hit_paylevel.
The dangers of group by
This is an important thing to remember on MySQL.
The group by clause works differently from other databases.
On MSSQL (or Oracle or PostgreSQL) you would have gotten an error
non-aggregate expression must appear in group by clause
Or words to that effect.
In your original query hit_paylevel is not in an aggregate (sum) and it's also not in the group by clause, so MySQL just picks a value at random.
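The arithmetic can be worked through on two made-up days: 1000 uniques at paylevel 1 and 2000 uniques at paylevel 2. The sketch below uses Python with in-memory SQLite as a stand-in for MySQL. The user id 42 is hypothetical, and MIN/MAX stand in for the two values MySQL could arbitrarily pick when hit_paylevel sits outside the sum. Dividing by 1000.0 avoids SQLite's integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_hits (hit_user INTEGER NOT NULL, hit_date TEXT NOT NULL,
                         hit_uniques INTEGER NOT NULL,
                         hit_paylevel INTEGER NOT NULL DEFAULT 1);
INSERT INTO daily_hits VALUES
  (42, '2011-05-01', 1000, 1),   -- day at paylevel 1
  (42, '2011-05-02', 2000, 2);   -- day after the upgrade to paylevel 2
""")

per_row, with_lowest, with_highest = conn.execute("""
SELECT SUM(hit_uniques * hit_paylevel) / 1000.0,        -- multiply per row, then sum
       SUM(hit_uniques) / 1000.0 * MIN(hit_paylevel),   -- total times paylevel 1
       SUM(hit_uniques) / 1000.0 * MAX(hit_paylevel)    -- total times paylevel 2
FROM daily_hits
WHERE hit_user = 42 AND hit_date >= '2011-05-01' AND hit_date < '2011-06-01'
GROUP BY hit_user
""").fetchone()

print(per_row, with_lowest, with_highest)
```

Multiplying per row gives (1000 * 1 + 2000 * 2) / 1000 = 5, while multiplying the 3000 total by a single paylevel gives either 3 or 6, neither of which matches the day-by-day figure.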