Query on two tables for one report (Advanced) - mysql

I'm having some trouble with an advanced SQL query, and it's been a long time since I've worked with SQL databases. We use MySQL.
Background:
We will be working with two tables:
"Transactions Table"
table: expire_history
+---------------+-----------------------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-----------------------------+------+-----+-------------------+-------+
| m_id | int(11) | NO | PRI | 0 | |
| m_a_ordinal | int(11) | NO | PRI | 0 | |
| a_expired_date| datetime | NO | PRI | | |
| a_state | enum('EXPIRED','UNEXPIRED') | YES | | NULL | |
| t_note | text | YES | | NULL | |
| t_updated_by | varchar(40) | NO | | | |
| t_last_update | timestamp | NO | | CURRENT_TIMESTAMP | |
+---------------+-----------------------------+------+-----+-------------------+-------+
"Information Table"
table: information
+---------------------+---------------+------+-----+---------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+---------------+------+-----+---------------------+-------+
| m_id | int(11) | NO | PRI | 0 | |
| m_a_ordinal | int(11) | NO | PRI | 0 | |
| a_type | varchar(15) | YES | MUL | NULL | |
| a_class | varchar(15) | YES | MUL | NULL | |
| a_state | varchar(15) | YES | MUL | NULL | |
| a_publish_date | datetime | YES | | NULL | |
| a_expire_date | date | YES | | NULL | |
| a_updated_by | varchar(20) | NO | | | |
| a_last_update | timestamp | NO | | CURRENT_TIMESTAMP | |
+---------------------+---------------+------+-----+---------------------+-------+
We have a set of fields in one table that describe the record. Each record is identified by an m_id (the person) and an ordinal (a person can have multiple records). So for instance, my m_id could be 1, and I could have multiple ordinals (1, 2, 3, 4, etc.), each with its own individual set of data. The m_id and the m_a_ordinal form a composite key in the "information" table, and the m_id, m_a_ordinal, and a_expired_date fields in the "transactions" table form a composite key as well.
Essentially when we expire a record, the a_state field in the information table is updated to expired. At the same time, a record is created in the transactions table with the m_id, m_a_ordinal, and a_expired_date. We've found in the past that people get impatient and can click a button twice, so through some previous help I've managed to narrow down the most recent transaction for each expired record using the following query:
SELECT e1.m_id, e1.m_a_ordinal, e1.a_expired_date, e1.t_note, e1.t_updated_by
FROM expire_history e1
INNER JOIN (SELECT m_id, m_a_ordinal, MAX(a_expired_date) AS a_expired_date
FROM expire_history GROUP BY m_id, m_a_ordinal) e2
ON (e2.m_id = e1.m_id AND e2.m_a_ordinal = e1.m_a_ordinal AND e2.a_expired_date = e1.a_expired_date)
WHERE e2.a_expired_date > '2008-05-15 00:00:00' ORDER BY e1.a_expired_date;
Seems simple enough, right?
Let's add some complexity. Each record in the "information" table has a "natural expiration date" as well. The original developer of our software, however, didn't code it to change the state of the record to "expired" once it has reached its natural expiration date. It also does not write a transaction to the transaction table once it's expired (which I understand, because that table only keeps records of ones that were expired by a person, as opposed to automagically). Also, when a record is expired manually, the original expiration date does not change. This is why this is so complicated :P
Essentially I need to build a report that shows all aspects of expiration, whether it was expired manually, or naturally.
This report should take the data from the query above and combine it with another query on the "information" table: if a_expire_date <= CURDATE(), show the record, except if the record exists in the query above on expire_history, in which case show the record from expire_history instead.
A rough structure of the raw logic is as follows:
for x in record_total
if (m_id, m_a_ordinal) exists in expire_history
display (m_id, m_a_ordinal, a_expired_date, a_state)
else if (m_id, m_a_ordinal) exists in information AND a_expire_date <= CURDATE
display (m_id, m_a_ordinal, a_expire_date, a_state)
end if
x++
I hope that this is concise enough.
Thanks for any help you can provide!

SELECT i.m_id, i.m_a_ordinal,
COALESCE(e1.a_expired_date, i.a_expire_date) AS Expire_DT,
COALESCE(e1.t_note, 'insert related item column'),
COALESCE(e1.t_updated_by, i.a_updated_by) AS Updated_By
FROM information i
LEFT JOIN
(SELECT m_id, m_a_ordinal, MAX(a_expired_date) AS a_expired_date
FROM expire_history GROUP BY m_id, m_a_ordinal) e2
ON e2.m_id = i.m_id
AND e2.m_a_ordinal = i.m_a_ordinal
LEFT JOIN expire_history e1
ON (e1.m_id = e2.m_id
AND e1.m_a_ordinal = e2.m_a_ordinal
AND e1.a_expired_date = e2.a_expired_date)
WHERE COALESCE(e2.a_expired_date, i.a_expire_date) > '2008-05-15 00:00:00'
ORDER BY Expire_DT;
The syntax may be off a bit, as I don't have time to test it, but I hope you can get the gist from this.
COALESCE simply returns the first non-NULL value in a series of values. If you're only dealing with two values, IFNULL works as well.
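As a rough, runnable illustration of the idea (prefer the latest manual expiration, otherwise fall back to the natural expiration date), here is a sketch using Python's sqlite3 module as a stand-in for MySQL. The table and column names follow the question; the sample rows, the `how` column, and the `:today` parameter are assumptions made for the example (in MySQL you would likely use CURDATE() instead of a bound date).

```python
import sqlite3

# SQLite stands in for MySQL here; sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE information (m_id INT, m_a_ordinal INT, a_expire_date TEXT,
                          PRIMARY KEY (m_id, m_a_ordinal));
CREATE TABLE expire_history (m_id INT, m_a_ordinal INT, a_expired_date TEXT,
                             PRIMARY KEY (m_id, m_a_ordinal, a_expired_date));
INSERT INTO information VALUES
  (1, 1, '2008-06-01'),   -- expired manually (button clicked twice)
  (1, 2, '2008-06-10'),   -- expired naturally, no history row
  (2, 1, '2999-01-01');   -- not expired at all
INSERT INTO expire_history VALUES
  (1, 1, '2008-05-20 10:00:00'),
  (1, 1, '2008-05-20 10:00:05');
""")

REPORT_SQL = """
SELECT i.m_id, i.m_a_ordinal,
       COALESCE(e.a_expired_date, i.a_expire_date) AS expire_dt,
       CASE WHEN e.a_expired_date IS NULL THEN 'natural' ELSE 'manual' END AS how
FROM information i
LEFT JOIN (SELECT m_id, m_a_ordinal, MAX(a_expired_date) AS a_expired_date
           FROM expire_history
           GROUP BY m_id, m_a_ordinal) e
       ON e.m_id = i.m_id AND e.m_a_ordinal = i.m_a_ordinal
WHERE e.a_expired_date IS NOT NULL          -- manually expired
   OR i.a_expire_date <= :today             -- or past its natural date
ORDER BY i.m_id, i.m_a_ordinal
"""

# One row per expired record: the double click collapses to its latest entry.
rows = conn.execute(REPORT_SQL, {"today": "2008-07-01"}).fetchall()
```

Because the derived table already holds one row per (m_id, m_a_ordinal), the LEFT JOIN cannot multiply rows from information, and COALESCE only falls through to a_expire_date when no manual expiration exists.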

Related

MySQL - How can you select multiple columns on a nested IFNULL...GROUP_CONCAT() condition?

I have a web application which is connected to a MySQL (5.5.64-MariaDB) database.
One of the queries is as follows:
SELECT
d.id,
d.label AS display_label,
d.anchor,
r.id AS regulation_id,
IFNULL(
(SELECT GROUP_CONCAT(value) FROM display_substances `ds`
WHERE `ds`.`display_id` = `d`.`id`
AND ds.substance_id = 1 -- For example, substance ID = 1
GROUP BY `ds`.`display_id`
), "Not Listed"
) `display_value` FROM displays `d`
JOIN groups g ON d.group_id = g.id
JOIN regulations r ON g.regulation_id = r.id
An example of the output is as follows:
+-----+------------------------------------+------------------------------------------------------------------------------------------+
| id | name | display_value |
+-----+------------------------------------+------------------------------------------------------------------------------------------+
| 4 | techfunction | Intermediate / monomer; Corrosion inhibitor / anodiser / galvaniser; Catalyst; Additive |
| 323 | russia_chemsafety_register_display | Not Listed |
| 733 | peru_pcb_display | Not Listed |
+-----+------------------------------------+------------------------------------------------------------------------------------------+
This query does what we need. For explanatory purposes:
There are 2 tables, displays and display_substances
The query is obtaining display_substances.value for each displays.id
If there is no corresponding display_substances.value then the string "Not Listed" (refer to query above) is returned. If there is a corresponding value then display_substances.value is returned. So in the example data above, IDs 323 and 733 refer to a scenario where there is no corresponding entry, therefore we want "Not Listed". Conversely ID 4 does have a value ("Intermediate / monomer; Corrosion inhibitor / anodiser / galvaniser; Catalyst; Additive") so we get that.
The table structures are as follows:
DESCRIBE displays;
+----------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+----------------+
| id | smallint(5) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(127) | NO | | NULL | |
| label | varchar(255) | NO | | NULL | |
+----------+----------------------+------+-----+---------+----------------+
DESCRIBE display_substances;
+--------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-----------------------+------+-----+---------+----------------+
| id | mediumint(8) unsigned | NO | PRI | NULL | auto_increment |
| display_id | smallint(5) unsigned | NO | MUL | NULL | |
| substance_id | mediumint(8) unsigned | NO | MUL | NULL | |
| value | text | NO | | NULL | |
| automated | tinyint(4) | YES | | NULL | |
+--------------+-----------------------+------+-----+---------+----------------+
I want to be able to return display_substances.automated (refer to table structure above) as a column from my query. But I can't see how to do this.
The reference to the display_substances table is ds, so I cannot use that in the initial SELECT statement because at that point there's no alias. Equally there is no JOIN condition that would make it possible, because not every row returned obtains data from display_substances (i.e. those that are "Not Listed" are not getting anything from that table).
If I want an additional column next to display_value in the sample output above that shows display_substances.automated, or NULL if it doesn't exist, how can I achieve that?
For reference the automated field either contains a 1 (to represent data that has been obtained through automated processes by our application), or NULL if it isn't automated.
"there is no JOIN condition that would make it possible, because not every row returned obtains data from display_substances"
For this case you can use a LEFT JOIN:
SELECT d.id, d.label display_label, d.anchor, r.id regulation_id,
COALESCE(ds.value, 'Not Listed') display_value,
ds.automated
FROM displays d
INNER JOIN groups g ON d.group_id = g.id
INNER JOIN regulations r ON g.regulation_id = r.id
LEFT JOIN (
SELECT display_id, GROUP_CONCAT(value) value, MAX(automated) automated
FROM display_substances
WHERE substance_id = 1
GROUP BY display_id
) ds ON ds.display_id = d.id
I used MAX(automated) as the returned column, but you can use GROUP_CONCAT(automated) just like you do for value and also COALESCE():
COALESCE(ds.automated, 'Not Listed')
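To see the LEFT JOIN on an aggregated derived table in action, here is a small sketch run against SQLite from Python (SQLite is only a stand-in for MySQL/MariaDB). The schema is trimmed to the columns that matter, the groups and regulations joins are left out, and the sample rows and the `value_list` alias are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE displays (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE display_substances (id INTEGER PRIMARY KEY, display_id INT,
                                 substance_id INT, value TEXT, automated INT);
INSERT INTO displays VALUES (4, 'techfunction'), (323, 'russia_register');
INSERT INTO display_substances (display_id, substance_id, value, automated) VALUES
  (4, 1, 'Catalyst', 1),
  (4, 1, 'Additive', NULL),
  (4, 2, 'Other substance', NULL);
""")

rows = conn.execute("""
SELECT d.id,
       COALESCE(ds.value_list, 'Not Listed') AS display_value,
       ds.automated
FROM displays d
LEFT JOIN (SELECT display_id,
                  GROUP_CONCAT(value) AS value_list,
                  MAX(automated)      AS automated
           FROM display_substances
           WHERE substance_id = 1      -- filter inside the derived table
           GROUP BY display_id) ds
       ON ds.display_id = d.id
ORDER BY d.id
""").fetchall()
```

Display 4 comes back with both values for substance 1 concatenated and automated = 1 (MAX ignores NULLs), while display 323 comes back as 'Not Listed' with automated NULL, because the LEFT JOIN keeps it even though the derived table has no row for it.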

MySql LEFT OUTER JOIN causing duplicate rows

I'm running a query to grab the first 10 profiles (think of a profile as an article that is shown when a shop opens and holds information about that shop). I'm using a LEFT OUTER JOIN to select all images that belong to the profile's PK.
I'm running the following query; the main part I'm trying to focus on is the JOIN. I won't post the whole query as it's just a whole bunch of 'table'.'colname' = 'table.colname'.
But here is where the magic happens during my outer join.
LEFT JOIN `content_image` AS `image` ON `profile`.`content_ptr_id` = `image`.`content_id`
Full Query:
I've formatted like this so everyone can see the query without scrolling endlessly to the right.
select `profile`.`content_ptr_id` AS `profile.content_ptr_id`,
`profile`.`body` AS `profile.body`,
`profile`.`web_site` AS `profile.web_site`,
`profile`.`email` AS `profile.email`,
`profile`.`hours` AS `profile.hours`,
`profile`.`price_range` AS `profile.price_range`,
`profile`.`price_range_high` AS `profile.price_range_high`,
`profile`.`primary_category_id` AS `profile.primary_category_id`,
`profile`.`business_contact_email` AS `profile.business_contact_email`,
`profile`.`business_contact_phone` AS `profile.business_contact_phone`,
`profile`.`show_in_directory` AS `profile.show_in_directory`,
`image`.`id` AS `image.id`,
`image`.`content_id` AS `image.content_id`,
`image`.`type` AS `image.type`,
`image`.`order` AS `image.order`,
`image`.`caption` AS `image.caption`,
`image`.`author_id` AS `image.author_id`,
`image`.`image` AS `image.image`,
`image`.`link_url` AS `image.link_url`
FROM content_profile AS profile
LEFT JOIN `content_image` AS `image` ON `profile`.`content_ptr_id` = `image`.`content_id`
GROUP BY profile.content_ptr_id
LIMIT 10, 12
Is there a way I can group my results per profile? E.g all images will show in the one profile result? I can't use group by as I'm getting an error
Error: ER_WRONG_FIELD_WITH_GROUP: Expression #12 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'broadsheet.image.id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by]
code: 'ER_WRONG_FIELD_WITH_GROUP',
errno: 1055,
sqlState: '42000',
index: 0 }
Is there a possible way around this group by error or another query I could run?
Tables:
content_image
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| content_id | int(11) | NO | MUL | NULL | |
| type | varchar(255) | NO | | NULL | |
| order | int(11) | NO | | NULL | |
| caption | longtext | NO | | NULL | |
| author_id | int(11) | YES | MUL | NULL | |
| image | varchar(255) | YES | | NULL | |
| link_url | varchar(200) | YES | | NULL | |
+------------+--------------+------+-----+---------+----------------+
content_profile
+------------------------+----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+----------------------+------+-----+---------+-------+
| content_ptr_id | int(11) | NO | PRI | NULL | |
| body | longtext | NO | | NULL | |
| web_site | varchar(200) | NO | | NULL | |
| email | varchar(75) | NO | | NULL | |
| menu | longtext | NO | | NULL | |
| hours | longtext | NO | | NULL | |
| price_range | smallint(5) unsigned | YES | MUL | NULL | |
| price_range_high | smallint(5) unsigned | YES | | NULL | |
| primary_category_id | int(11) | NO | | NULL | |
| business_contact_name | varchar(255) | NO | | NULL | |
| business_contact_email | varchar(75) | NO | | NULL | |
| business_contact_phone | varchar(20) | NO | | NULL | |
| show_in_directory | tinyint(1) | NO | | NULL | |
+------------------------+----------------------+------+-----+---------+-------+
From reading your question, I think you don't have a grasp of how the GROUP BY clause works.
So the short summary of my answer is: learn the fundamentals of the GROUP BY clause.
I will use only a small number of columns to make the explanation easier.
The first problem with your query is that you are not using the GROUP BY clause properly. When using GROUP BY, every selected column must be in the GROUP BY clause, be wrapped in an aggregate function, or (under MySQL's ONLY_FULL_GROUP_BY mode, as your error message says) be functionally dependent on the GROUP BY columns.
Let's suppose these are the only columns you are selecting:
profile.content_ptr_id
profile.body
profile.web_site
image.id
image.content_id
And the query looked like this:
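SELECT profile.content_ptr_id, profile.body, profile.web_site, image.id, image.content_id
FROM ...
GROUP BY profile.content_ptr_id
This query will error out because you did not specify how to consolidate multiple rows into one row for the image.id and image.content_id columns. (Under ONLY_FULL_GROUP_BY, MySQL accepts profile.body and profile.web_site here because they are functionally dependent on the primary key profile.content_ptr_id, which is why your error message complains about image.id; other databases may reject them too.) The database cannot guess how you want to consolidate the remaining columns: you can either group by them or use aggregate functions such as MIN(), MAX(), COUNT(), etc.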
So one solution to fix the error raised in the query above would be the following:
SELECT profile.content_ptr_id, profile.body, profile.web_site, image.id, image.content_id
FROM ...
GROUP BY profile.content_ptr_id, profile.body, profile.web_site, image.id, image.content_id
Here, I put all the columns in the group by clause which makes the query group and select all the unique combinations of profile.content_ptr_id, profile.body, profile.web_site, image.id, image.content_id columns.
Following is an example query which does not have all the columns included in the group by clause:
Let's say you want to find out how many images there are for each of the profiles. You can use a query such as the following:
SELECT profile.content_ptr_id, profile.body, profile.web_site, COUNT(image.id)
FROM ...
GROUP BY profile.content_ptr_id, profile.body, profile.web_site
This query lets you find out how many images there are for every unique combination of profile.content_ptr_id, profile.body, profile.web_site columns.
Be aware that in my previous two examples, all the columns that are selected are either included in the group by clause or are selected with an aggregate function. This is a rule all queries need to follow when using the group by clause, otherwise an error will be raised by the database.
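The counting query can be tried out with a few lines of Python and SQLite (used here purely as a stand-in for MySQL). The two tables are cut down to a handful of columns, and the second profile, which has no images, is an invented row added to show what the LEFT JOIN does in that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content_profile (content_ptr_id INTEGER PRIMARY KEY, body TEXT, web_site TEXT);
CREATE TABLE content_image (id INTEGER PRIMARY KEY, content_id INT);
INSERT INTO content_profile VALUES (100, 'body1', 'web1'), (200, 'body2', 'web2');
INSERT INTO content_image VALUES (1, 100), (2, 100);
""")

# Every selected column is either grouped or aggregated.
image_counts = conn.execute("""
SELECT p.content_ptr_id, p.body, p.web_site, COUNT(i.id) AS image_count
FROM content_profile p
LEFT JOIN content_image i ON i.content_id = p.content_ptr_id
GROUP BY p.content_ptr_id, p.body, p.web_site
ORDER BY p.content_ptr_id
""").fetchall()
```

COUNT(i.id) only counts non-NULL values, so the profile without images gets a count of 0 rather than 1, which is usually what you want from a LEFT JOIN.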
Now, let's get on to answering your question:
"Is there a way I can group my results per profile? E.g all images will show in the one profile result?"
I will use the following mock data to explain:
profile
+----------------+--------------+---------------+
| content_ptr_id | body | web_site |
+----------------+--------------+---------------+
| 100 | body1 | web1 |
+----------------+--------------+---------------+
image
+--------+-------------+
| id | content_id |
+--------+-------------+
| iid1 | 100 |
| iid2 | 100 |
+--------+-------------+
The following is what the result looks like if you do the join without any GROUP BY:
SELECT profile.content_ptr_id, profile.body, profile.web_site, image.id, image.content_id
FROM ...
+----------------+--------------+---------------+--------+-------------+
| content_ptr_id | body | web_site | id | content_id |
+----------------+--------------+---------------+--------+-------------+
| 100 | body1 | web1 | iid1 | 100 |
| 100 | body1 | web1 | iid2 | 100 |
+----------------+--------------+---------------+--------+-------------+
You can't achieve your objective of grouping your results per profile (combining to only show one line per profile) by grouping by all the columns as the result will be the same:
SELECT profile.content_ptr_id, profile.body, profile.web_site, image.id, image.content_id
FROM ...
GROUP BY profile.content_ptr_id, profile.body, profile.web_site, image.id, image.content_id
will return
+----------------+--------------+---------------+--------+-------------+
| content_ptr_id | body | web_site | id | content_id |
+----------------+--------------+---------------+--------+-------------+
| 100 | body1 | web1 | iid1 | 100 |
| 100 | body1 | web1 | iid2 | 100 |
+----------------+--------------+---------------+--------+-------------+
The question you need to answer is how you want to display the non-unique columns you want to combine - in this case image.id. You can use count, but this will only return you a number. If you want to display all the text, you can use GROUP_CONCAT() which will concatenate all the values delimited by comma by default. If you use GROUP_CONCAT() the result will look like the following:
SELECT profile.content_ptr_id, profile.body, profile.web_site, GROUP_CONCAT(image.id), GROUP_CONCAT(image.content_id)
FROM ...
GROUP BY profile.content_ptr_id, profile.body, profile.web_site
This query will return:
+----------------+-------+----------+------------------------+--------------------------------+
| content_ptr_id | body  | web_site | GROUP_CONCAT(image.id) | GROUP_CONCAT(image.content_id) |
+----------------+-------+----------+------------------------+--------------------------------+
| 100            | body1 | web1     | iid1,iid2              | 100,100                        |
+----------------+-------+----------+------------------------+--------------------------------+
If GROUP_CONCAT() is what you want to use for all the image columns, then go ahead, but doing this for many columns consolidating many rows may make the table less readable. But either way, I would suggest you read some articles to familiarise yourself with how the GROUP BY clause works.
Remove the GROUP BY clause.
I suspect you didn't want to do a GROUP BY operation, given that the expression in the group by is the PRIMARY KEY of the content_profile table.
What is up with all the single quotes? Those are used to enclose string literals, not identifiers.
Thanks for sparing us from "scrolling endlessly to the right".
Are you aware that spaces and linebreaks can be included in the SQL text, without altering the meaning of the statement? The parser can easily deal with extra whitespace, and adding the extra whitespace to format the statement can make it much easier for a human reader to decipher.
It's not at all clear why the statement is skipping over the first ten rows, and then returning the next twelve. Very strange.
SELECT p.content_ptr_id AS `profile.content_ptr_id`
, p.body AS `profile.body`
, p.web_site AS `profile.web_site`
, p.email AS `profile.email`
, p.hours AS `profile.hours`
, p.price_range AS `profile.price_range`
, p.price_range_high AS `profile.price_range_high`
, p.primary_category_id AS `profile.primary_category_id`
, p.business_contact_email AS `profile.business_contact_email`
, p.business_contact_phone AS `profile.business_contact_phone`
, p.show_in_directory AS `profile.show_in_directory`
, i.id AS `image.id`
, i.content_id AS `image.content_id`
, i.type AS `image.type`
, i.order AS `image.order`
, i.caption AS `image.caption`
, i.author_id AS `image.author_id`
, i.image AS `image.image`
, i.link_url AS `image.link_url`
FROM `content_profile` p
LEFT
JOIN `content_image` i
ON i.content_id = p.content_ptr_id
ORDER
BY p.content_ptr_id
, i.id
Because content_id is not unique in the content_image table, duplicate rows from content_profile are the expected result.
If your code can't handle the "duplicate" rows, i.e. identifying when the row that was just fetched has the same value for content_ptr_id as the previous row, then your SQL shouldn't do a join operation that creates the duplicated values.
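If the application is written in Python, handling the "duplicate" rows can be fairly short. The sketch below assumes the rows arrive ordered by content_ptr_id (as the ORDER BY above ensures) and are plain tuples of (content_ptr_id, body, image_id); that tuple layout, the sample values, and the `collapse_profiles` helper are hypothetical and only meant to illustrate the idea.

```python
from itertools import groupby
from operator import itemgetter

# Rows as they might come back from the LEFT JOIN, already ordered by
# content_ptr_id: (content_ptr_id, body, image_id). Values are made up.
joined_rows = [
    (100, "body1", 1),
    (100, "body1", 2),
    (200, "body2", None),   # profile without images: image columns are NULL
]

def collapse_profiles(rows):
    """Fold consecutive rows with the same profile into one dict each."""
    profiles = []
    for (ptr_id, body), group in groupby(rows, key=itemgetter(0, 1)):
        # Skip the NULL image produced by the LEFT JOIN for image-less profiles.
        image_ids = [image_id for _, _, image_id in group if image_id is not None]
        profiles.append({"content_ptr_id": ptr_id, "body": body, "images": image_ids})
    return profiles

profiles = collapse_profiles(joined_rows)
```

itertools.groupby only merges adjacent rows, which is why the ordering of the result set matters here.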

MySQL - Select everything from one table, but only first matching value in second table

I'm feeling a little rusty with creating queries in MySQL. I thought I could solve this, but I'm having no luck and searching around doesn't result in anything similar...
Basically, I have two tables. I want to select everything from one table and the matching row from the second table. However, I only want to have the first result from the second table. I hope that makes sense.
The rows in the daily_entries table are unique. There will be one row for each day, but maybe not every day. The second table, notes, contains many rows, each of which is associated with ONE row from daily_entries.
Below are examples of my tables;
Table One
mysql> desc daily_entries;
+----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+----------------+
| eid | int(11) | NO | PRI | NULL | auto_increment |
| date | date | NO | | NULL | |
| location | varchar(100) | NO | | NULL | |
+----------+--------------+------+-----+---------+----------------+
Table Two
mysql> desc notes;
+---------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------+------+-----+---------+----------------+
| task_id | int(11) | NO | PRI | NULL | auto_increment |
| eid | int(11) | NO | MUL | NULL | |
| notes | text | YES | | NULL | |
+---------+---------+------+-----+---------+----------------+
What I need to do is select all entries from notes, with only one result from daily_entries.
Below is an example of how I want it to look:
+----------------------------------------------+---------+------------+----------+-----+
| notes | task_id | date | location | eid |
+----------------------------------------------+---------+------------+----------+-----+
| Another note | 3 | 2014-01-02 | Home | 2 |
| Enter a note. | 1 | 2014-01-01 | Away | 1 |
| This is a test note. To see what happens. | 2 | | Away | 1 |
| Testing another note | 4 | | Away | 1 |
+----------------------------------------------+---------+------------+----------+-----+
4 rows in set (0.00 sec)
Below is the query that I currently have:
SELECT notes.notes, notes.task_id, daily_entries.date, daily_entries.location, daily_entries.eid
FROM daily_entries
LEFT JOIN notes ON daily_entries.eid=notes.eid
ORDER BY daily_entries.date DESC
Below is an example of how it looks with my query:
+----------------------------------------------+---------+------------+----------+-----+
| notes | task_id | date | location | eid |
+----------------------------------------------+---------+------------+----------+-----+
| Another note | 3 | 2014-01-02 | Home | 2 |
| Enter a note. | 1 | 2014-01-01 | Away | 1 |
| This is a test note. To see what happens. | 2 | 2014-01-01 | Away | 1 |
| Testing another note | 4 | 2014-01-01 | Away | 1 |
+----------------------------------------------+---------+------------+----------+-----+
4 rows in set (0.00 sec)
At first I thought I could simply GROUP BY daily_entries.date, however that returned only the first row of each matching set. Can this even be done? I would greatly appreciate any help someone can offer. Using Limit at the end of my query obviously limited it to the value that I specified, but applied it to everything which was to be expected.
Basically, there's nothing wrong with your query. I believe it is exactly what you need because it is returning the data you want. Don't look at it as duplicating your daily_entries rows; look at it as returning every note together with its associated daily_entry.
Of course, you can achieve what you described in your question (there's an answer already that solves this issue), but think twice before you do it, because such nested queries can add noticeable overhead on your database server.
I'd recommend keeping your query as simple as possible, with a single LEFT JOIN (which is all you need), and then letting consuming applications manipulate the data and present it the way they need to.
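As a sketch of what that could look like on the application side, assuming the consumer is Python and the rows come back as dicts in which rows sharing an eid are adjacent (the `blank_repeated_dates` helper and the sample data are hypothetical):

```python
def blank_repeated_dates(rows):
    """Return copies of rows with `date` set to None after the first row of each eid.

    Assumes rows sharing an eid are adjacent in the input.
    """
    result = []
    previous_eid = object()  # sentinel that equals no real eid
    for row in rows:
        shown = dict(row)    # copy, so the caller's rows stay untouched
        if row["eid"] == previous_eid:
            shown["date"] = None
        previous_eid = row["eid"]
        result.append(shown)
    return result

query_rows = [
    {"eid": 2, "date": "2014-01-02", "notes": "Another note"},
    {"eid": 1, "date": "2014-01-01", "notes": "Enter a note."},
    {"eid": 1, "date": "2014-01-01", "notes": "Testing another note"},
]
display_rows = blank_repeated_dates(query_rows)
```

The query stays a plain LEFT JOIN, and the presentation rule (show the date once per entry) lives where it is easy to change.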
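Use MySQL's non-standard GROUP BY functionality (this relies on ONLY_FULL_GROUP_BY being disabled, and which row each group returns is not guaranteed):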
SELECT n.notes, n.task_id, de.date, de.location, de.eid
FROM notes n
LEFT JOIN (select * from
(select * from daily_entries ORDER BY date DESC) x
group by eid) de ON de.eid = n.eid
You need to identify the first note of each entry explicitly. This example joins to a derived table holding the smallest task_id per eid and shows the date only on that row:
SELECT n.notes, n.task_id,
CASE WHEN n.task_id = nmin.min_task_id THEN de.date END AS date,
de.location, de.eid
FROM daily_entries de LEFT JOIN
notes n
ON de.eid = n.eid LEFT JOIN
(select eid, min(task_id) as min_task_id
from notes
group by eid
) nmin
on nmin.eid = de.eid
ORDER BY de.date DESC, n.task_id;

Get distinct results from several tables

I need to implement mysql query to calculate space used by user's mailbox.
A message thread may have multiple messages (reply, follow up) by 2 parties
(sender/recipient) and is tagged with one or more tags (Inbox, Sent etc.).
The following conditions have to be met:
a) user is either recipient OR author of the message;
b) message IS TAGGED by any of the tags: 1,2,3,4;
c) distinct records only, ie if the thread, containing messages is tagged with
more than one of the 4 tags (for example 1 and 4: Inbox and Sent) the calculation
is done on one tag only
I have tried the following query but I am not able to get distinct values - the
subject/body values are duplicated:
SELECT SUM(LENGTH(subject)+LENGTH(body)) AS sum
FROM om_msg_message omm
JOIN om_msg_index omi ON omm.mid = omi.mid
JOIN om_msg_tags_index omti ON omi.thread_id = omti.thread_id AND omti.uid = user_id
WHERE (omi.recipient = user_id OR omi.author = user_id) AND omti.tag_id IN (1,2,3,4)
GROUP BY omi.mid;
Structure of the tables:
om_msg_message - fields subject and body are the ones to be calculated
+--------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+----------------+
| mid | int(10) unsigned | NO | PRI | NULL | auto_increment |
| subject | varchar(255) | NO | | NULL | |
| body | longtext | NO | | NULL | |
| timestamp | int(10) unsigned | NO | | NULL | |
| reply_to_mid | int(10) unsigned | NO | | 0 | |
+--------------+------------------+------+-----+---------+----------------+
om_msg_index
+-----+-----------+-----------+--------+--------+---------+
| mid | thread_id | recipient | author | is_new | deleted |
+-----+-----------+-----------+--------+--------+---------+
| 1 | 1 | 1392 | 1211 | 0 | 0 |
| 2 | 1 | 1211 | 1392 | 1 | 0 |
+-----+-----------+-----------+--------+--------+---------+
om_msg_tags_index
+--------+------+-----------+
| tag_id | uid | thread_id |
+--------+------+-----------+
| 1 | 1211 | 1 |
| 4 | 1211 | 1 |
| 1 | 1392 | 1 |
| 4 | 1392 | 1 |
+--------+------+-----------+
Here's another solution:
SELECT SUM(LENGTH(omm.subject) + LENGTH(omm.body)) as totalLength
FROM om_msg_message omm
JOIN om_msg_index omi
ON omi.mid = omm.mid
AND (omi.recipient = user_id OR omi.author = user_id)
JOIN (SELECT DISTINCT thread_id
FROM om_msg_tags_index
WHERE uid = user_id
AND tag_id IN (1, 2, 3, 4)) omti
ON omti.thread_id = omi.thread_id
I'm assuming that:
user_id is a parameter marker/host variable, being queried for an individual user.
You want the total of all messages per user, not the total length of each message (which is what the GROUP BY clause in your version was getting you).
That mid in both om_msg_message and om_msg_index is unique.
So, your problem comes from the IN clause: it matches more than one tag row per thread. I'm not a MySQL guru, but in T-SQL you could move that condition into a WHERE clause with an EXISTS subquery, so the join doesn't produce two rows. You need to compensate for the fact that two rows with different tag IDs are associated with each row of your primary join data.
The way I could do it cross-platform would be with four left-joins that linked tables then demanded a non-null value for 1, 2, 3, or 4. Fairly inefficient; I'm sure there's a better way to do it in MySQL, but now that you know what the problem is you might know a better solution.
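For what it's worth, the EXISTS idea can be sketched as follows. The example runs on SQLite from Python as a stand-in for MySQL; the index and tag rows are taken from the question, while the message subjects and bodies, the `:uid` parameter, and the `mailbox_size` helper are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE om_msg_message (mid INTEGER PRIMARY KEY, subject TEXT, body TEXT);
CREATE TABLE om_msg_index (mid INT, thread_id INT, recipient INT, author INT);
CREATE TABLE om_msg_tags_index (tag_id INT, uid INT, thread_id INT);
INSERT INTO om_msg_message VALUES (1, 'Hi', 'abc'), (2, 'Re: Hi', 'defgh');
INSERT INTO om_msg_index VALUES (1, 1, 1392, 1211), (2, 1, 1211, 1392);
INSERT INTO om_msg_tags_index VALUES (1, 1211, 1), (4, 1211, 1), (1, 1392, 1), (4, 1392, 1);
""")

MAILBOX_SIZE_SQL = """
SELECT COALESCE(SUM(LENGTH(omm.subject) + LENGTH(omm.body)), 0) AS total_length
FROM om_msg_message omm
JOIN om_msg_index omi ON omi.mid = omm.mid
WHERE (omi.recipient = :uid OR omi.author = :uid)
  AND EXISTS (SELECT 1
              FROM om_msg_tags_index omti
              WHERE omti.thread_id = omi.thread_id
                AND omti.uid = :uid
                AND omti.tag_id IN (1, 2, 3, 4))
"""

def mailbox_size(user_id):
    """Total subject + body length of the user's tagged messages."""
    return conn.execute(MAILBOX_SIZE_SQL, {"uid": user_id}).fetchone()[0]
```

EXISTS only asks whether at least one matching tag row is present, so a thread tagged with both 1 and 4 still contributes each message exactly once.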

Fast complex query to select bookings

I'm trying to write a query to get a course's information and the number of bookings and attendees. Each course can have many bookings and each booking can have many attendees.
We already have a working report, but it uses multiple queries to get the required information. One to get the courses, one to get the bookings, and one to get the number of attendees. This is very slow because of the size that the database has grown to.
There are a number of extra conditions for the reports:
Bookings must be made more than 5 minutes ago, or have been confirmed
The booking must not be cancelled
The course must not be marked as deleted
The course's venue and location must be LIKE a search string
Courses with no bookings must appear in the results
This is the table structure: (I've omitted the unneeded information. All fields are not null and have no default)
mysql> DESCRIBE first_aid_courses;
+------------------+--------------+-----+----------------+
| Field | Type | Key | Extra |
+------------------+--------------+-----+----------------+
| id | int(11) | PRI | auto_increment |
| course_date | date | | |
| region_id | int(11) | | |
| location | varchar(255) | | |
| venue | varchar(255) | | |
| number_of_spaces | int(11) | | |
| deleted | tinyint(1) | | |
+------------------+--------------+-----+----------------+
mysql> DESCRIBE first_aid_bookings;
+-----------------------+--------------+-----+----------------+
| Field | Type | Key | Extra |
+-----------------------+--------------+-----+----------------+
| id | int(11) | PRI | auto_increment |
| first_aid_course_id | int(11) | | |
| placed | datetime | | |
| confirmed | tinyint(1) | | |
| cancelled | tinyint(1) | | |
+-----------------------+--------------+-----+----------------+
mysql> DESCRIBE first_aid_attendees;
+----------------------+--------------+-----+----------------+
| Field | Type | Key | Extra |
+----------------------+--------------+-----+----------------+
| id | int(11) | PRI | auto_increment |
| first_aid_booking_id | int(11) | | |
+----------------------+--------------+-----+----------------+
mysql> DESCRIBE regions;
+----------+--------------+-----+----------------+
| Field | Type | Key | Extra |
+----------+--------------+-----+----------------+
| id | int(11) | PRI | auto_increment |
| name | varchar(255) | | |
+----------+--------------+-----+----------------+
I need to select the following:
Course ID: first_aid_courses.id
Date: first_aid_courses.course_date
Region regions.name
Location: first_aid_courses.location
Bookings: COUNT(first_aid_bookings)
Attendees: COUNT(first_aid_attendees)
Spaces Remaining: COUNT(first_aid_bookings) - COUNT(first_aid_attendees)
This is what I have so far:
SELECT `first_aid_courses`.*,
COUNT(`first_aid_bookings`.`id`) AS `bookings`,
COUNT(`first_aid_attendees`.`id`) AS `attendees`
FROM `first_aid_courses`
LEFT JOIN `first_aid_bookings`
ON `first_aid_courses`.`id` =
`first_aid_bookings`.`first_aid_course_id`
LEFT JOIN `first_aid_attendees`
ON `first_aid_bookings`.`id` =
`first_aid_attendees`.`first_aid_booking_id`
WHERE ( `first_aid_courses`.`location` LIKE '%$search_string%'
OR `first_aid_courses`.`venue` LIKE '%$search_string%' )
AND `first_aid_courses`.`deleted` = 0
AND ( `first_aid_bookings`.`placed` > '$five_minutes_ago'
AND `first_aid_bookings`.`cancelled` = 0
OR `first_aid_bookings`.`confirmed` = 1 )
GROUP BY `first_aid_courses`.`id`
ORDER BY `course_date` DESC
It's not quite working; can anyone help me with writing the correct query? Also, there are 1000s of rows in this database, so any help on making it fast is appreciated (like which fields to index).
OK, I've answered my own question. Sometimes it helps to ask a question for you to figure out the answer.
SELECT `first_aid_courses`.*,
`regions`.`name` AS `region_name`,
COUNT(DISTINCT `first_aid_bookings`.`id`) AS `bookings`,
COUNT(`first_aid_attendees`.`id`) AS `attendees`
FROM `first_aid_courses`
JOIN `regions`
ON `first_aid_courses`.`region_id` = `regions`.`id`
LEFT JOIN `first_aid_bookings`
ON `first_aid_courses`.`id` =
`first_aid_bookings`.`first_aid_course_id`
LEFT JOIN `first_aid_attendees`
ON `first_aid_bookings`.`id` =
`first_aid_attendees`.`first_aid_booking_id`
WHERE ( `first_aid_courses`.`location` LIKE '%$search_string%'
OR `first_aid_courses`.`venue` LIKE '%$search_string%' )
AND `first_aid_courses`.`deleted` = 0
AND ( `first_aid_bookings`.`cancelled` = 0
AND `first_aid_bookings`.`confirmed` = 1 )
GROUP BY `first_aid_courses`.`id`
ORDER BY `course_date` ASC
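One thing to watch with this shape of query: conditions on first_aid_bookings in the WHERE clause discard the NULL rows the LEFT JOIN produces, so courses without bookings disappear from the result. A possible variation, sketched here on SQLite from Python as a stand-in for MySQL, moves those conditions into the ON clause. The schema is trimmed, the sample rows are invented, the "placed" condition is left out for brevity, and the search string is passed as a bound parameter rather than interpolated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE first_aid_courses (id INTEGER PRIMARY KEY, location TEXT, deleted INT);
CREATE TABLE first_aid_bookings (id INTEGER PRIMARY KEY, first_aid_course_id INT,
                                 confirmed INT, cancelled INT);
CREATE TABLE first_aid_attendees (id INTEGER PRIMARY KEY, first_aid_booking_id INT);
INSERT INTO first_aid_courses VALUES (1, 'Leeds', 0), (2, 'Leeds', 0);
INSERT INTO first_aid_bookings VALUES (10, 1, 1, 0), (11, 1, 1, 1);
INSERT INTO first_aid_attendees VALUES (100, 10), (101, 10), (102, 11);
""")

rows = conn.execute("""
SELECT c.id,
       COUNT(DISTINCT b.id) AS bookings,   -- each booking repeats once per attendee
       COUNT(a.id)          AS attendees
FROM first_aid_courses c
LEFT JOIN first_aid_bookings b
       ON b.first_aid_course_id = c.id
      AND b.cancelled = 0                  -- booking filters live in the ON clause
      AND b.confirmed = 1
LEFT JOIN first_aid_attendees a ON a.first_aid_booking_id = b.id
WHERE c.deleted = 0
  AND c.location LIKE :pattern
GROUP BY c.id
ORDER BY c.id
""", {"pattern": "%Leeds%"}).fetchall()
```

Course 1 reports one booking with two attendees (the cancelled booking and its attendee are excluded), and course 2 still appears with zero bookings and zero attendees.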
This is completely untested, but maybe try selecting a count of non-null rows for bookings and attendees, like this:
SUM(IF(`first_aid_bookings`.`id` IS NOT NULL, 1, 0)) AS `bookings`,
SUM(IF(`first_aid_attendees`.`id` IS NOT NULL, 1, 0)) AS `attendees`
Unless you have them and just didn't show them, take a good look at indexes; without them you lose an order of magnitude in performance on any query that references anything but the primary key.
Another major performance hit is the LIKE '%nnn%' pattern: a leading wildcard prevents MySQL from using an ordinary index for that condition.
Would it be possible to do something with those?
But with some good indexes, this query should be fine if you have the hardware to back it up.
I have queries doing LIKE on tables with millions of rows. It's not a problem if the rest of the query can eliminate any unnecessary matches.
You could go for subqueries to lessen the scope for the LIKE queries.