We are experiencing slow performance with a query on a MySQL database, and we are not sure whether the query is wrong or whether MySQL or the server is simply not up to the task.
The query uses a subquery and returns some project details (3 fields) plus the filename of the latest picture taken by an online camera.
Info
Table 'projects' contains 40 records.
Table 'cameras' contains approx. 40 records (1 project can have multiple cameras).
Table 'cameraimages' contains around 250,000 (250 thousand) records (1 camera can have thousands of images).
Engine is InnoDB.
Database size is approx. 100 MB.
No indexes have been added yet.
MySQL version is 8.0.15.
This is the query
SELECT
pj.title,
pj.description,
pj.city,
(SELECT cmi.filename
FROM cameras cm
LEFT JOIN cameraimages cmi ON cmi.cameraId = cm.id
WHERE cm.projectId = pj.id
ORDER BY cmi.dateRecording DESC
LIMIT 0,1) as latestfilename
FROM
projects pj
It takes 40-50 seconds to return this data.
That is too long for a webpage, but I don't think it should take anywhere near that long.
We tested the same query on another server, to compare. Same data, same query.
That takes 25 seconds.
My questions are:
Is this query too 'heavy'/bad, and if it is, what query would perform better?
Is there a way, or what should I check, to find out why this query runs better on an older/other server?
Hope someone can give some advice.
Thanks!
Additional info
CREATE TABLE `cameras` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`guid` varchar(50) DEFAULT NULL,
`title` varchar(50) DEFAULT NULL,
`longitude` double DEFAULT NULL,
`latitude` double DEFAULT NULL,
`status` smallint(6) DEFAULT NULL,
`cameraUid` varchar(20) DEFAULT NULL,
`cameraFriendlyName` varchar(50) DEFAULT NULL,
`projectId` int(11) DEFAULT NULL,
`dateCreated` datetime DEFAULT NULL,
`dateModified` datetime DEFAULT NULL,
`address` varchar(100) DEFAULT NULL,
`city` varchar(50) DEFAULT NULL,
`createArchive` smallint(6) DEFAULT '0',
`createDaily` smallint(6) DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=88 DEFAULT CHARSET=latin1
The combination of columns cameraId, dateRecording is unique.
One camera takes one picture at a time.
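For reference, a minimal sketch of how that uniqueness could be enforced in the schema, assuming it isn't declared yet (the index name is hypothetical); such an index would also help the latest-image lookups:
ALTER TABLE cameraimages
  ADD UNIQUE INDEX uq_cameraimages_camera_date (cameraId, dateRecording);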
You're using a so-called dependent subquery. That's slow.
I guess cameraimages.id is a primary key for your cameraimages table. That's a guess; you didn't provide enough information in your question to answer it with certainty.
I also guess that the dateRecording values in cameraimages are in the same order as your autoincrementing primary key id values. That is, I guess you INSERT a record to that table at the time each image is captured.
Let's break this down.
You want the id of the most recent image from each project. How can you get that? Write a subquery to retrieve the largest, most recent id for each project.
SELECT cm.projectId,
MAX(cmi.id) imageId
FROM cameras cm
JOIN cameraimages cmi ON cmi.cameraId = cm.id
GROUP BY cm.projectId
That subquery does the heavy lifting of searching your big table. It does it just once, not for every project, so it won't take as long.
Then put that subquery into your query to retrieve the columns you need.
SELECT
pj.title,
pj.description,
pj.city,
cmi.filename latestfilename
FROM projects pj
JOIN (
SELECT cm.projectId,
MAX(cmi.id) imageId
FROM cameras cm
JOIN cameraimages cmi ON cmi.cameraId = cm.id
GROUP BY cm.projectId
) latest ON pj.id = latest.projectId
JOIN cameraimages cmi ON cmi.id = latest.imageId
This has a series of JOINs making a chain from projects to the latest subquery and from there to cameraimages.
This depends on cameraimages.id values being in chronological order. It can still be done if they aren't in that order with a more elaborate query.
Indexes:
cm: INDEX(projectId, id)
cmi: INDEX(cameraId, dateRecording, filename)
cmi: INDEX(cameraId, id)
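Spelled out as CREATE INDEX statements, those suggestions could look like this (the index names are hypothetical):
CREATE INDEX idx_cameras_project_id ON cameras (projectId, id);
CREATE INDEX idx_cameraimages_camera_date_file ON cameraimages (cameraId, dateRecording, filename);
CREATE INDEX idx_cameraimages_camera_id ON cameraimages (cameraId, id);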
When cameraimages.id values aren't in chronological order, we need to work with the latest dateRecording values.
This is going to require a sequence of subqueries. So, rather than nesting them, let's use MySQL 8+ Common Table Expressions. It's a big query.
WITH
ProjectCameraImage AS (
/* a virtual version of the cameraimages table including projectId */
SELECT cmi.id, cmi.dateRecording, cm.projectId, cmi.cameraId
FROM cameras cm
JOIN cameraimages cmi ON cm.id = cmi.cameraId
),
LatestDate AS (
/* the latest date for each entry in ProjectCameraImage */
/* Notice how this uses MAX rather than ORDER BY ... DESC LIMIT 1 */
SELECT projectId, cameraId,
MAX(dateRecording) dateRecording
FROM ProjectCameraImage
GROUP BY projectId, cameraId
),
ProjectCameraLatest AS (
/* the cameraimage.id values for the latest images in ProjectCameraImage */
SELECT ProjectCameraImage.id,
ProjectCameraImage.projectId,
ProjectCameraImage.cameraId,
ProjectCameraImage.dateRecording
FROM ProjectCameraImage
JOIN LatestDate
ON ProjectCameraImage.projectId = LatestDate.projectId
AND ProjectCameraImage.cameraId = LatestDate.cameraId
AND ProjectCameraImage.dateRecording = LatestDate.dateRecording
),
LatestProjectDate AS (
/* the latest date for each project in ProjectCameraLatest */
SELECT projectId,
MAX(dateRecording) dateRecording
FROM ProjectCameraLatest
GROUP BY projectId
),
ProjectLatest AS (
/* the cameraimage.id values for the latest images in ProjectCameraLatest */
SELECT ProjectCameraLatest.id,
ProjectCameraLatest.projectId
FROM ProjectCameraLatest
JOIN LatestProjectDate
ON ProjectCameraLatest.projectId = LatestProjectDate.projectId
AND ProjectCameraLatest.dateRecording = LatestProjectDate.dateRecording
)
/* the main query */
SELECT pj.title,
pj.description,
pj.city,
cmi.filename latestfilename
FROM projects pj
JOIN ProjectLatest ON pj.id = ProjectLatest.projectId
JOIN cameraimages cmi ON ProjectLatest.id = cmi.id;
It's big because we have to go through two different cycles of finding the cameraimages.id value with the largest dateRecording.
Edit: The heavy lifting, in terms of searching your tables, happens in the second common table expression (CTE), the one called LatestDate. I suggest adding an index to your cameraimages table as follows to give it a boost.
CREATE INDEX cmi_cameraid_daterec
ON cameraimages (cameraId, dateRecording DESC);
That compound index should allow random access by cameraId, then quick access to the latest date. Notice that it also should help the ProjectCameraLatest CTE.
You can test the performance of this by changing the final SELECT, the one in the main query, to just SELECT * FROM LatestDate. To see whether and how it uses the index, prefix the whole statement (WITH clause included) with EXPLAIN, or with EXPLAIN ANALYZE on MySQL 8.0.18 and later.
You may learn some useful things about indexes if you run EXPLAIN with and without the index.
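As a rough sketch of that test, with the CTE bodies abbreviated to the two that matter (plain EXPLAIN is used here because EXPLAIN ANALYZE requires MySQL 8.0.18 or later):
-- EXPLAIN prefixes the whole statement, WITH clause included
EXPLAIN
WITH ProjectCameraImage AS (
  SELECT cmi.id, cmi.dateRecording, cm.projectId, cmi.cameraId
  FROM cameras cm
  JOIN cameraimages cmi ON cm.id = cmi.cameraId
),
LatestDate AS (
  SELECT projectId, cameraId, MAX(dateRecording) AS dateRecording
  FROM ProjectCameraImage
  GROUP BY projectId, cameraId
)
SELECT * FROM LatestDate;
Run it with and without the cmi_cameraid_daterec index to compare the plans.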
Related
I have a query which sometimes runs really fast and sometimes incredibly slowly depending on the number of results that match a full text boolean search within the query.
The query also contains a subquery.
Without the subquery the main query is always fast.
The subquery by itself is also always fast.
But together they are very slow.
Removing the full text search from the WHERE clause and instead ordering by the full text search is really fast.
So it's only slow when the full text search is used within a WHERE clause.
That's the simple readable overview, exact queries are below.
I've included the schema at the bottom although it will be difficult to replicate without my dataset which unfortunately I can't share.
I've included the counts and increments in the example queries to give some indication of the data size involved.
I actually have a solution: simply accept a result which includes irrelevant data and then filter out that data in PHP. But I'd like to understand why my queries are performing poorly and how I might be able to resolve the issue in MySQL.
In particular, I'm confused why it's fast with the full text search in an ORDER BY but not with it in the WHERE.
The query I want (slow)
I've got a query that looks like this:
select
*,
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
from
`app_records`
where
`id` in (
select
distinct(app_record_parents.record_id)
from
`app_group_records`
inner join `app_record_parents`
on `app_record_parents`.`parent_id` = `app_group_records`.`record_id`
where
`group_id` = 3
)
and
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE)
order by
`relevance_score` desc
limit
10;
This query takes 10 seconds.
This is too long for this sort of query, I need to be looking at milliseconds.
But the two queries run really fast when run by themselves.
The sub select by itself
select distinct(app_record_parents.record_id)
from
`app_group_records`
inner join
`app_record_parents`
on `app_record_parents`.`parent_id` = `app_group_records`.`record_id`
where
`group_id` = 3
The sub select by itself takes 7ms with 2600 results.
The main query without the sub select
select
*,
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
from
`app_records`
where
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE)
order by
`relevance_score` desc
limit
10;
The main query without the sub select takes 6ms with 2971 possible results (obviously there's a limit 10 there).
It's faster with fewer results
The same query but matching against "Old Traf" rather than "Old Tra" takes 300ms.
The number of results are obviously different when using "Old Traf" vs "Old Tra".
Results of full query
"Old Tra": 9
"Old Traf": 2
Records matching the full text search
"Old Tra": 2971
"Old Traf": 120
Removing the where solves the issue
Removing the WHERE and returning all records sorted by the relevance score is really fast and still gives me the experience I'd like:
select
*,
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
from
`app_records`
where
`id` in (
select
distinct(app_record_parents.record_id)
from
`app_group_records`
inner join `app_record_parents`
on `app_record_parents`.`parent_id` = `app_group_records`.`record_id`
where
`group_id` = 3
)
order by
`relevance_score` desc
limit
10;
But then I need to filter out irrelevant results in code
I'm using this in PHP, so I can now filter my results to remove any that have a 0 relevance score (if there are only 2 matches, for instance, 8 random results with a relevance score of 0 will still be included, since I'm not using a WHERE):
$results = array_filter($results, function ($result) {
    return $result->relevance_score > 0;
});
Obviously this is really quick, so it's not really a problem, and I do have a fix as outlined above. But I still don't understand why my queries are slow.
It's clear that the number of possible results from the full text search is causing an issue, but exactly why, and how to get around it, is beyond me.
Table Schema
Here are my tables
CREATE TABLE `app_records` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`type` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `app_models_name_IDX` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=960004 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE `app_record_parents` (
`record_id` int(10) unsigned NOT NULL,
`parent_id` int(10) unsigned DEFAULT NULL,
KEY `app_record_parents_record_id_IDX` (`record_id`) USING BTREE,
KEY `app_record_parents_parent_id_IDX` (`parent_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE `app_group_records` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`group_id` int(10) unsigned NOT NULL,
`record_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=31 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
A note on what the queries are doing
The subquery is getting a list of record_ids that belong to group_id 3.
So while there are 960004 records in app_records, there are only 2600 which belong to group 3, and it is against these 2600 that I'm trying to query for names that match "Old Tra".
So the subquery is getting a list of these 2600 record_ids, and then I'm doing a WHERE id IN <subquery> to get the relevant results from app_records.
EDIT: Using joins is equally slow
Just to add: using joins has the same issue, taking 10 seconds for "Old Tra", 400ms for "Old Traf", and being very fast when not using a full text search in a WHERE.
SELECT
app_records.*,
MATCH (NAME) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
FROM
`app_records`
INNER JOIN app_record_parents ON app_records.id = app_record_parents.record_id
INNER JOIN app_group_records ON app_group_records.record_id = app_record_parents.parent_id
WHERE
`group_id` = 3
AND MATCH (NAME) AGAINST ('Old Tra*' IN BOOLEAN MODE)
GROUP BY
app_records.id
LIMIT
10;
app_record_parents
Has no PRIMARY KEY; hence may have unnecessary duplicate pairs.
Does not have optimal indexes.
See this for several tips.
Perhaps app_group_records is also many-many?
Are you searching for Old Tra* anywhere in name? If not, then why not use WHERE name LIKE 'Old Tra%'? In this case, add INDEX(name).
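A sketch of that alternative, valid only when the search term is anchored at the start of name (the index name is hypothetical):
CREATE INDEX idx_app_records_name ON app_records (name);

SELECT *
FROM app_records
WHERE name LIKE 'Old Tra%'
  AND id IN (SELECT DISTINCT app_record_parents.record_id
             FROM app_group_records
             JOIN app_record_parents
               ON app_record_parents.parent_id = app_group_records.record_id
             WHERE group_id = 3)
LIMIT 10;
Note that LIKE gives no relevance score, so the ORDER BY relevance_score from the original query would be dropped.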
Note: When FULLTEXT is involved, it is picked first. Please provide EXPLAIN SELECT to confirm this.
This formulation may be faster:
select *,
MATCH (r.name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
from `app_records` AS r
WHERE MATCH (r.name) AGAINST ('Old Tra*' IN BOOLEAN MODE)
AND EXISTS ( SELECT 1
FROM app_group_records AS gr
JOIN app_record_parents AS rp ON rp.parent_id = gr.record_id
WHERE gr.group_id = 3
AND r.id = rp.record_id )
ORDER BY relevance_score DESC
LIMIT 10
Indexes:
gr: (group_id, record_id) -- in this order
r: nothing but the FULLTEXT will be used
rp: (record_id, parent_id) -- in this order
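Sketched as DDL (index names are hypothetical):
CREATE INDEX idx_agr_group_record ON app_group_records (group_id, record_id);
CREATE INDEX idx_arp_record_parent ON app_record_parents (record_id, parent_id);
-- per the note above about app_record_parents lacking a PRIMARY KEY, the second
-- index could instead be a composite primary key (if parent_id has no NULLs),
-- which would also eliminate duplicate pairs:
-- ALTER TABLE app_record_parents ADD PRIMARY KEY (record_id, parent_id);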
I am trying to optimize a MySQL query that works perfectly but is taking way too long. My inventory table is nearly 300,000 records (not too bad). I am not sure whether a subquery, a join, or an additional index would speed up my results. I do have the district_id columns indexed in both the students and inventory tables.
Basically, the query below pulls all the inventory of all students in a teacher's roster. So it first has to search the students table to find which students are in the teacher's roster, then has to search the inventory table for each student. So if a teacher has 30+ students it can be a lot of searches through the inventory and each student can have 30+ pieces of inventory. Any advice would be helpful!
SELECT inventory.inventory_id, items.title, items.isbn, items.item_num,
items.price, conditions.condition_name, inventory.check_out,
inventory.check_in, inventory.student_id, inventory.teacher_id
FROM inventory, conditions, items, students
WHERE students.teacher_id = '$teacher_id'
AND students.district_id = $district_id
AND inventory.student_id = students.s_number
AND inventory.district_id = $district_id
AND inventory.item_id = items.item_id
AND items.consumable !=1
AND conditions.condition_id = inventory.condition_id
ORDER BY inventory.student_id, inventory.inventory_id
Here is the table structure:
CREATE TABLE `inventory` (
`id` int(11) NOT NULL,
`inventory_id` varchar(10) CHARACTER SET utf8 NOT NULL DEFAULT '0',
`item_id` int(6) NOT NULL DEFAULT '0',
`district_id` int(2) NOT NULL DEFAULT '0',
`condition_id` int(1) NOT NULL DEFAULT '0',
`check_out` date NOT NULL DEFAULT '0000-00-00',
`check_in` date NOT NULL DEFAULT '0000-00-00',
`student_id` varchar(10) CHARACTER SET utf8 NOT NULL DEFAULT '0',
`teacher_id` varchar(6) CHARACTER SET utf8 NOT NULL DEFAULT '0',
`acquisition_date` date NOT NULL DEFAULT '0000-00-00',
`notes` text CHARACTER SET utf8 NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
First you rewrite this to use explicit JOINs:
SELECT inventory.inventory_id,
items.title, items.isbn, items.item_num, items.price,
conditions.condition_name,
inventory.check_out, inventory.check_in,
inventory.student_id, inventory.teacher_id
FROM inventory
JOIN conditions ON (conditions.condition_id = inventory.condition_id)
JOIN items ON (inventory.item_id = items.item_id AND items.consumable != 1)
JOIN students ON (inventory.student_id = students.s_number)
WHERE students.teacher_id = '$teacher_id'
AND students.district_id = $district_id
AND inventory.district_id = $district_id
ORDER BY inventory.student_id, inventory.inventory_id
Then you examine the JOINs. For example this:
JOIN items ON (inventory.item_id = items.item_id AND items.consumable != 1)
means that the items table needs to be scanned on item_id plus the constant condition on consumable. It is always better not to use negative conditions if possible. But at the very least you index items on item_id (unless it's already the primary key, as is likely). If consumable can assume, say, values 0, 1, 2, 3, then you go:
JOIN items ON (inventory.item_id = items.item_id AND items.consumable IN (0, 2, 3))
and use CREATE INDEX to add an index on consumable.
You may notice that a few columns from inventory are always used in the other JOINs, and there are also some constant constraints.
So another useful index could be
CREATE INDEX ... ON inventory(district_id, student_id, item_id, condition_id)
Another useful index would be
ON students(teacher_id, district_id, student_id, s_number)
which allows immediately restricting the WHERE on the involved students, and retrieve the information required by the JOINs without ever loading the table, just using the index.
Switch to InnoDB! Some of what I am about to say is less efficient in InnoDB.
SELECT i.inventory_id,
items.title, items.isbn, items.item_num, items.price,
c.condition_name,
i.check_out, i.check_in, i.student_id, i.teacher_id
FROM inventory AS i
JOIN conditions AS c ON c.condition_id = i.condition_id
JOIN items ON i.item_id = items.item_id
JOIN students AS s ON i.student_id = s.s_number
WHERE s.teacher_id = '$teacher_id'
AND s.district_id = $district_id
AND i.student_id = s.s_number
AND i.district_id = $district_id
AND items.consumable != 1
ORDER BY i.student_id, i.inventory_id
To help the Optimizer if it would like to start with students:
students: INDEX(district_id, teacher_id, s_number)
Note: this is also "covering", thereby avoiding bouncing between index BTree and data BTree. (What is the PK of students? Please provide SHOW CREATE TABLE.)
If consuming the ORDER BY is better:
inventory: INDEX(district_id, student_id, inventory_id)
Also needed:
items: (item_id) -- probably already the PRIMARY KEY?
conditions: (condition_id) -- probably already the PRIMARY KEY?
Verify or add those 4 indexes. (The Optimizer will dynamically choose what to do.)
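One way to verify and add them, sketched (the index names are hypothetical):
-- see what already exists
SHOW INDEX FROM students;
SHOW INDEX FROM inventory;

-- the two composite indexes suggested above
CREATE INDEX idx_students_district_teacher ON students (district_id, teacher_id, s_number);
CREATE INDEX idx_inventory_district_student ON inventory (district_id, student_id, inventory_id);

-- items.item_id and conditions.condition_id are probably already PRIMARY KEYs;
-- if not, index them too.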
I have a table containing user to user messages. A conversation has all messages between two users. I am trying to get a list of all the different conversations and display only the last message sent in the listing.
I am able to do this with a SQL sub-query in FROM.
CREATE TABLE `messages` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`from_user_id` bigint(20) DEFAULT NULL,
`to_user_id` bigint(20) DEFAULT NULL,
`type` smallint(6) NOT NULL,
`is_read` tinyint(1) NOT NULL,
`is_deleted` tinyint(1) NOT NULL,
`text` longtext COLLATE utf8_unicode_ci NOT NULL,
`heading` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`created_at_utc` datetime DEFAULT NULL,
`read_at_utc` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
);
SELECT * FROM
(SELECT * FROM `messages` WHERE TYPE = 1 AND
(from_user_id = 22 OR to_user_id = 22)
ORDER BY created_at_utc DESC
) tb
GROUP BY from_user_id, to_user_id;
SQL Fiddle:
http://www.sqlfiddle.com/#!2/845275/2
Is there a way to do this without a sub-query?
(writing a DQL which supports sub-queries only in 'IN')
You seem to be trying to get the latest message to or from user 22 with type = 1. Your method is explicitly not guaranteed to work, because the extra columns (not in the GROUP BY) can come from arbitrary rows. As explained in the documentation:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause. Sorting of the result set
occurs after values have been chosen, and ORDER BY does not affect
which values within each group the server chooses.
The query that you want is more along the lines of this (assuming that you have an auto-incrementing id column for messages):
select m.*
from (select m.from_user_id, m.to_user_id, max(m.id) as max_id
from messages m
where m.type = 1 and (m.from_user_id = 22 or m.to_user_id = 22)
group by m.from_user_id, m.to_user_id
) lm join
messages m
on lm.max_id = m.id;
Or this:
select m.*
from messages m
where m.type = 1 and (m.from_user_id = 22 or m.to_user_id = 22) and
not exists (select 1
from messages m2
where m2.type = m.type and m2.from_user_id = m.from_user_id and
m2.to_user_id = m.to_user_id and
m2.created_at_utc > m.created_at_utc
);
For this latter query, an index on messages(type, from_user_id, to_user_id, created_at_utc) would help performance.
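For reference, that index might be created like this (the name is hypothetical):
CREATE INDEX idx_messages_type_from_to_created
  ON messages (type, from_user_id, to_user_id, created_at_utc);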
Since this is a rather specific type of data query which goes outside common ORM use cases, DQL isn't really fit for this - it's optimized for walking well-defined relationships.
For your case however Doctrine fully supports native SQL with result set mapping. Using a NativeQuery with ResultSetMapping like this you can easily use the subquery this problem requires, and still map the results on native Doctrine entities, allowing you to still profit from all caching, usability and performance advantages.
Samples found here.
If you mean to get all conversations and all their last messages, then a subquery is necessary.
SELECT a.* FROM messages a
INNER JOIN (
SELECT
MAX(created_at_utc) as max_created,
from_user_id,
to_user_id
FROM messages
GROUP BY from_user_id, to_user_id
) b ON a.created_at_utc = b.max_created
AND a.from_user_id = b.from_user_id
AND a.to_user_id = b.to_user_id
And you could append the where condition as you like.
I don't think your original query was even doing this correctly. I'm not sure what the GROUP BY was being used for, other than maybe to try to return only a single (unpredictable) result.
Just add a limit clause:
SELECT * FROM `messages`
WHERE `type` = 1 AND
(`from_user_id` = 22 OR `to_user_id` = 22)
ORDER BY `created_at_utc` DESC
LIMIT 1
For optimum query performance you need indexes on the following fields:
type
from_user_id
to_user_id
created_at_utc
I'm running the following query to get the stats for a user, based on which I pay them.
SELECT hit_paylevel, sum(hit_uniques) as day_unique_hits
, (sum(hit_uniques)/1000)*hit_paylevel as day_earnings
, hit_date
FROM daily_hits
WHERE hit_user = 'xxx' AND hit_date >= '2011-05-01' AND hit_date < '2011-06-01'
GROUP BY hit_user
The table in question looks like this:
CREATE TABLE IF NOT EXISTS `daily_hits` (
`hit_itemid` varchar(255) NOT NULL,
`hit_mainid` int(11) NOT NULL,
`hit_user` int(11) NOT NULL,
`hit_date` date NOT NULL,
`hit_hits` int(11) NOT NULL DEFAULT '0',
`hit_uniques` int(11) NOT NULL,
`hit_embed` int(11) NOT NULL,
`hit_paylevel` int(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`hit_itemid`,`hit_date`),
KEY `hit_user` (`hit_user`),
KEY `hit_mainid` (`hit_mainid`,`hit_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The problem in the calculation has to do with hit_paylevel, which acts as a multiplier. The default is 1; the other options are 2 or 3, which essentially double or triple the earnings for that day.
If I loop through the days, the daily day_earnings is correct; it's just that when I group them, it calculates everything as pay level 1. This happens if the user was pay level 1 in the beginning and was later upgraded to a higher level. If the user is pay level 2 from the start, it also calculates everything correctly.
Shouldn't this be sum(hit_uniques * hit_paylevel) / 1000?
Like @Denis said:
Change the query to
SELECT hit_paylevel, sum(hit_uniques) as day_unique_hits
, sum(hit_uniques * hit_paylevel) / 1000 as day_earnings
, hit_date
FROM daily_hits
WHERE hit_user = 'xxx' AND hit_date >= '2011-05-01' AND hit_date < '2011-06-01'
GROUP BY hit_user;
Why this fixes the problem
Doing the hit_paylevel multiplication outside the SUM first sums all hit_uniques and then picks a random hit_paylevel to multiply it by.
Not what you want. If you do both columns inside the SUM, MySQL will pair up the correct hit_uniques and hit_paylevel values.
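A tiny worked example, using two hypothetical rows for one user (1000 uniques at pay level 1, then 1000 uniques at pay level 2):
SELECT SUM(hit_uniques)                       AS day_unique_hits, -- 2000
       SUM(hit_uniques * hit_paylevel) / 1000 AS day_earnings     -- (1000*1 + 1000*2)/1000 = 3
FROM (SELECT 1000 AS hit_uniques, 1 AS hit_paylevel
      UNION ALL
      SELECT 1000, 2) AS t;
With the multiplier outside the SUM, the result would instead be (2000/1000) * whichever hit_paylevel the server happens to pick, i.e. 2 or 4, and which one you get is indeterminate.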
The dangers of group by
This is an important thing to remember about MySQL.
The GROUP BY clause works differently than in other databases.
On MSSQL (or Oracle or PostgreSQL) you would have gotten an error:
non-aggregate expression must appear in group by clause
Or words to that effect.
In your original query hit_paylevel is not in an aggregate (sum) and it's also not in the group by clause, so MySQL just picks a value at random.
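If you would rather have MySQL reject such queries too, the ONLY_FULL_GROUP_BY SQL mode does exactly that (and it is enabled by default from MySQL 5.7.5 onwards); a sketch:
-- enable it for the current session (note: this replaces any other modes in the session's list)
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';
-- with this mode on, the original query fails with error 1055 instead of
-- silently picking an arbitrary hit_paylevel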
I have a table I need to perform a horrible query on (not my db design).
The table is simple:
`id` bigint(255) NOT NULL AUTO_INCREMENT,
`poster` varchar(30) NOT NULL,
`chattext` varchar(255) NOT NULL,
`timeposted` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`type` int(3) NOT NULL DEFAULT '0',
`towho` varchar(30) DEFAULT NULL
My current query does what I want, but at a horrible cost to the server. I'm running near max CPU with only a handful of users. Sadly, every 5 seconds they do this pull.
Current query:
SELECT c.poster, c.chattext, c.type, c.towho, c.timeposted, u.utype, u.locz
FROM chat c
LEFT JOIN users u ON c.poster=u.name
WHERE c.type!=4
UNION (
SELECT c.poster, c.chattext, c.type, c.towho, c.timeposted, u.utype, u.locz
FROM chat c
LEFT JOIN users u ON c.poster=u.name
WHERE c.type=4 AND c.towho='USERNAME')
UNION (
SELECT c.poster, c.chattext, c.type, c.towho, c.timeposted, u.utype, u.locz
FROM chat c
LEFT JOIN users u ON c.poster=u.name
WHERE c.type=4 AND c.poster='USERNAME')
ORDER BY timeposted DESC LIMIT 0, 25
This gets performed every 5 seconds by all users online. As you can see, it quickly becomes a resource hog.
I'm used to MSSQL, so I should be able to grasp the concepts for MySQL, and the syntax hasn't been too different. This query was given to me without the user tables added, so I think my mindset is stuck on making this work rather than finding a better way.
I think I'm doing this the wrong/complicated way. So any assistance in improving performance is appreciated.
I think this simpler query is equivalent to yours, isn't it?
SELECT c.poster, c.chattext, c.type, c.towho, c.timeposted, u.utype, u.locz
FROM chat c
LEFT JOIN users u ON c.poster=u.name
WHERE c.type!=4
OR
(c.type=4 AND c.towho='USERNAME')
OR
(c.type=4 AND c.poster='USERNAME')
ORDER BY timeposted DESC LIMIT 0, 25