How can I refine this query? - mysql

You might want to have a look at my previous question.
My database schema looks like this
candidate 1
    job 1
        company
        skills
    job 2
        company
        skills
candidate 2
    etc
Here's my database:
mysql> describe jobs;
+--------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------+------+-----+---------+----------------+
| job_id | int(11) | NO | PRI | NULL | auto_increment |
| candidate_id | int(11) | NO | MUL | NULL | |
| company_id | int(11) | NO | MUL | NULL | |
| start_date | date | NO | MUL | NULL | |
| end_date | date | NO | MUL | NULL | |
+--------------+---------+------+-----+---------+----------------+
.
mysql> describe candidates;
+----------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+----------+------+-----+---------+----------------+
| candidate_id | int(11) | NO | PRI | NULL | auto_increment |
| candidate_name | char(50) | NO | MUL | NULL | |
| home_city | char(50) | NO | MUL | NULL | |
+----------------+----------+------+-----+---------+----------------+
.
mysql> describe companies;
+-------------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+---------------+------+-----+---------+----------------+
| company_id | int(11) | NO | PRI | NULL | auto_increment |
| company_name | char(50) | NO | MUL | NULL | |
| company_city | char(50) | NO | MUL | NULL | |
| company_post_code | char(50) | NO | | NULL | |
| latitude | decimal(11,8) | NO | | NULL | |
| longitude | decimal(11,8) | NO | | NULL | |
+-------------------+---------------+------+-----+---------+----------------+
.
Note that I should probably call this skill_usage, as it indicates when a skill was used on a job.
mysql> describe skills;
+----------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| skill_id | int(11) | NO | MUL | NULL | |
| job_id | int(11) | NO | MUL | NULL | |
+----------+---------+------+-----+---------+-------+
.
mysql> describe skill_names;
+------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+----------------+
| skill_id | int(11) | NO | PRI | NULL | auto_increment |
| skill_name | char(32) | NO | MUL | NULL | |
+------------+----------+------+-----+---------+----------------+
So far, my MySQL query looks like this:
SELECT DISTINCT can.candidate_id,
can.candidate_name,
can.candidate_city,
j.job_id,
j.company_id,
DATE_FORMAT(j.start_date, "%b %Y") AS start_date,
DATE_FORMAT(j.end_date, "%b %Y") AS end_date,
s.skill_id
FROM candidates AS can
INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id
INNER JOIN companies AS co ON j.company_id = co.company_id
INNER JOIN skills AS s ON s.job_id = j.job_id
INNER JOIN skill_names AS sn ON s.skill_id = s.skill_id
AND sn.skill_id = s.skill_id
ORDER by can.candidate_id, j.job_id
I am getting output like this, but am not satisfied with it
+--------------+----------------+---------------------+--------+------------+------------+------------+----------+
| candidate_id | candidate_name | candidate_city | job_id | company_id | start_date | end_date | skill_id |
+--------------+----------------+---------------------+--------+------------+------------+------------+----------+
| 1 | Pamela Brown | Cardiff | 1 | 3 | 2019-01-01 | 2019-08-31 | 1 |
| 1 | Pamela Brown | Cardiff | 1 | 3 | 2019-01-01 | 2019-08-31 | 2 |
| 1 | Pamela Brown | Cardiff | 1 | 3 | 2019-01-01 | 2019-08-31 | 1 |
| 1 | Pamela Brown | Cardiff | 2 | 2 | 2018-06-01 | 2019-01-31 | 3 |
| 1 | Pamela Brown | Cardiff | 3 | 1 | 2017-11-01 | 2018-06-30 | 4 |
| 1 | Pamela Brown | Cardiff | 3 | 1 | 2017-11-01 | 2018-06-30 | 5 |
| 1 | Pamela Brown | Cardiff | 3 | 1 | 2017-11-01 | 2018-06-30 | 6 |
| 1 | Pamela Brown | Cardiff | 4 | 3 | 2016-08-01 | 2017-11-30 | 1 |
| 2 | Christine Hill | Salisbury | 5 | 2 | 2018-02-01 | 2019-05-31 | 3 |
Now, I would like to restrict the search, by specifying "skill", like Python, C, C++, UML, etc and company names
The user will enter something like Python AND C++ into a skill search box (and/or Microsoft OR Google into a company name search box).
How do I feed that into my query? Please bear in mind that each skill ID has a job Id associated with it. Maybe I first need to convert the skill names from the search (in this case Python C++) into skill Ids? Even so, how do I include that in my query?
To make a few things clearer:
both the skills & company search box can be empty, which I will interpret as "return everything"
search terms can include the keywords AND and OR, with grouping brackets (NOT is not required). I am happy enough to parse that in PHP & turn it into a MySQL query term (my difficulty is only with SQL, not PHP)
It looks like I made a start, with that INNER JOIN skills AS s ON s.job_id = j.job_id, which I think will handle a search for a single skill, given its ... name ? ... Id?
I suppose my question would be how would that query look if, for example, I wanted to restrict the results to anyone who had worked at Microsoft OR Google and has the skills Python AND C++?
If I get an example for that, I can extrapolate, but, at this point, I am unsure whether I want more INNER JOINs or WHERE clauses.
I think that I want to extend that second last line AND sn.skill_id = s.skill_id by parsing the skills search string, in my example Python AND C++, and generating some SQL along the lines of AND (s.skill_id = X), where X is the skill Id for Python, BUT I don't know how to handle Python AND C++, or something more complex, like Python AND (C OR C++) ...
Update
Just to be clear, the users are technical and expect to be able to enter complex searches. E.g for skills: (C AND kernel)OR (C++ AND realtime) OR (Doors AND (UML OR QT)).
Final update
The requirements just changed. The person that I am coding this for just told me that if a candidate matches the skill search on any job that he ever worked, then I ought to return ALL jobs for that candidate.
That sounds counter-intuitive to me, but he swears that that is what he wants. I am not sure it can even be done in a single query (I am considering multiple queries; a first to get the candidates with matching skills, then a second to get all of their jobs).

The first thing I'd say is that your original query probably needs an outer join on the skills table - as it stands, it only retrieves people whose job has a skill (which may not be all jobs). You say that "both the skills & company search box can be empty, which I will interpret as return everything" - this version of the query will not return everything.
Secondly, I'd rename your "skills" table to "job_skills", and your "skill_names" to "skills" - it's more consistent (your companies table is not called company_names).
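The join to skill_names in the query you show has two conditions. The first, s.skill_id = s.skill_id, compares the column with itself and is always true; only AND sn.skill_id = s.skill_id actually links the two tables. Is that intentional?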
To answer your question: I would present the skills to your users in some kind of pre-defined list in your PHP, associated with a skill_id. You could have all skills listed with check boxes, or allow the user to start typing and use AJAX to search for skills matching the text. This solves a UI problem (what if the user tries to search for a skill that doesn't exist?), and makes the SQL slightly easier.
Your query then becomes:
SELECT DISTINCT can.candidate_id,
can.candidate_name,
can.candidate_city,
j.job_id,
j.company_id,
DATE_FORMAT(j.start_date, "%b %Y") AS start_date,
DATE_FORMAT(j.end_date, "%b %Y") AS end_date,
s.skill_id
FROM candidates AS can
INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id
INNER JOIN companies AS co ON j.company_id = co.company_id
INNER JOIN skills AS s ON s.job_id = j.job_id
INNER JOIN skill_names AS sn ON sn.skill_id = s.skill_id
WHERE s.skill_id IN (?, ?, ?)
OR s.skill_id IN (?)
ORDER by can.candidate_id, j.job_id
You need to bind the skill Ids your users selected to the question-mark placeholders (for example, with a prepared statement).
EDIT
The problem with allowing users to enter the skills as free text is that you then have to deal with case conversion, white space and typos. For instance, is "python " the same as "Python"? Your user probably intends it to be, but you can't do a simple comparison with skill_name. If you want to allow free text, one solution might be to add a "normalized" skill_name column in which you store the name in a consistent format (e.g. "all upper case, stripped of whitespace"), and you normalize your input values in the same way, then compare to that normalized column. In that case, the "in clause" becomes something like:
AND s.skill_id in (select skill_id from skill_names where skill_name_normalized in (?, ?, ?))
The boolean logic you mention - (C OR C++) AND (Agile) - gets pretty tricky. You end up writing a "visual query builder". You may want to Google this term - there are some good examples.
You've narrowed down your requirements somewhat (I may misunderstand). I believe your requirements are
I want to be able to specify zero or more filters.
A filter consists of one or more ANDed skill groups.
A skill group consists of one or more skills.
Filters are ORed together to create a query.
To make this concrete, let's use your example - (A and (B OR C)) OR (D AND (E OR F)). There are two filters: (A and (B OR C)) and (D AND (E OR F)). The first filter has two skill groups: A and (B OR C).
It's hard to explain the suggestion in text, but you could create a UI that allows users to specify individual "filters". Each "filter" would allow the user to specify one or more "in clauses", joined with an "and". You could then convert this into SQL - again, using your example, the SQL query becomes
SELECT DISTINCT can.candidate_id,
can.candidate_name,
can.candidate_city,
j.job_id,
j.company_id,
DATE_FORMAT(j.start_date, "%b %Y") AS start_date,
DATE_FORMAT(j.end_date, "%b %Y") AS end_date,
s.skill_id
FROM candidates AS can
INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id
INNER JOIN companies AS co ON j.company_id = co.company_id
INNER JOIN skills AS s ON s.job_id = j.job_id
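INNER JOIN skill_names AS sn ON sn.skill_id = s.skill_id
WHERE
-- one joined row holds a single skill_id, so each skill group is tested per job with EXISTS
(EXISTS (SELECT 1 FROM skills AS x WHERE x.job_id = j.job_id AND x.skill_id IN (A))
AND EXISTS (SELECT 1 FROM skills AS x WHERE x.job_id = j.job_id AND x.skill_id IN (B, C)))
OR
(EXISTS (SELECT 1 FROM skills AS x WHERE x.job_id = j.job_id AND x.skill_id IN (D))
AND EXISTS (SELECT 1 FROM skills AS x WHERE x.job_id = j.job_id AND x.skill_id IN (E, F)))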
ORDER by can.candidate_id, j.job_id
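Because each joined row carries exactly one skill_id, a condition such as Python AND C++ has to be checked per job rather than per row. The following is a minimal sketch of one way to do that, and of the "return ALL jobs for a matching candidate" requirement from the final update. It runs the SQL against an in-memory SQLite database as a stand-in for MySQL, the sample rows are made up, and it assumes that both skills must appear on the same job at one of the named companies.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE candidates  (candidate_id INTEGER PRIMARY KEY, candidate_name TEXT);
CREATE TABLE companies   (company_id INTEGER PRIMARY KEY, company_name TEXT);
CREATE TABLE jobs        (job_id INTEGER PRIMARY KEY, candidate_id INTEGER, company_id INTEGER);
CREATE TABLE skills      (skill_id INTEGER, job_id INTEGER);
CREATE TABLE skill_names (skill_id INTEGER PRIMARY KEY, skill_name TEXT);

-- Made-up sample data, for illustration only.
INSERT INTO candidates  VALUES (1, 'Pamela Brown'), (2, 'Christine Hill');
INSERT INTO companies   VALUES (1, 'Microsoft'), (2, 'Google'), (3, 'Acme');
INSERT INTO skill_names VALUES (1, 'Python'), (2, 'C++'), (3, 'UML');
INSERT INTO jobs        VALUES (1, 1, 1), (2, 1, 3), (3, 2, 2), (4, 2, 3);
INSERT INTO skills      VALUES (1, 1), (2, 1), (3, 2), (1, 3), (3, 4);
""")

# Inner query: candidates with at least one job at Microsoft OR Google
# that used Python AND C++ (one EXISTS per ANDed skill).
# Outer query: every job of those candidates.
query = """
SELECT can.candidate_id, can.candidate_name, j.job_id, co.company_name
FROM candidates AS can
JOIN jobs AS j       ON j.candidate_id = can.candidate_id
JOIN companies AS co ON co.company_id = j.company_id
WHERE can.candidate_id IN (
    SELECT mj.candidate_id
    FROM jobs AS mj
    JOIN companies AS mc ON mc.company_id = mj.company_id
    WHERE mc.company_name IN (?, ?)
      AND EXISTS (SELECT 1 FROM skills AS s
                  JOIN skill_names AS sn ON sn.skill_id = s.skill_id
                  WHERE s.job_id = mj.job_id AND sn.skill_name = ?)
      AND EXISTS (SELECT 1 FROM skills AS s
                  JOIN skill_names AS sn ON sn.skill_id = s.skill_id
                  WHERE s.job_id = mj.job_id AND sn.skill_name = ?)
)
ORDER BY can.candidate_id, j.job_id
"""
rows = conn.execute(query, ("Microsoft", "Google", "Python", "C++")).fetchall()
for row in rows:
    print(row)
```

In this sample only Pamela Brown has a job with both skills, so both of her jobs come back, including the Acme job that matches neither filter. An OR between skills would become an OR between the EXISTS terms.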

Building a bit off previous comments and answers... if handling input like
(A and (B OR C)) OR (D AND (E OR F))
is the blocker you could try moving some of the conditional logic out of the joins and filter instead.
WHERE (
((sn.skill_id LIKE 'A') AND ((sn.skill_id LIKE ('B')) OR (sn.skill_id LIKE('C'))))
AND ((co.company_id IN (1,2,3)) AND ((can.city = 'Springfield') OR (j.city LIKE('Mordor'))))
)
You can build your query string based on user input: look up the Ids for the selected values, put them into the string, and conditionally build as many filters as you like. Think about setting up add_and_filter and add_or_filter functions to construct the <db>.<field> <CONDITION> <VALUE> statements.
$qs = "";
$qs .= "select val from table";
...
$qs .= " WHERE ";
if($userinput){ $qs .= add_and_filter($userinput); }
Alternatively, look at a map/reduce pattern rather than trying to do it all in SQL.
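The filter-building idea can be sketched as a small recursive function. This is only an illustration, written in Python rather than PHP, and it assumes the search box has already been parsed into a nested structure. The names Expr, SKILL_EXISTS and build_filter are hypothetical, and the generated fragment assumes the outer query aliases the jobs table as j. Values are passed as bound parameters instead of being concatenated into the string.

```python
from typing import List, Tuple, Union

# A parsed search is either a skill name (str) or a tuple (operator, children),
# where operator is "AND" or "OR". This representation is an assumption.
Expr = Union[str, Tuple[str, list]]

# One EXISTS test per skill, checked against the job row aliased as j.
SKILL_EXISTS = (
    "EXISTS (SELECT 1 FROM skills AS s "
    "JOIN skill_names AS sn ON sn.skill_id = s.skill_id "
    "WHERE s.job_id = j.job_id AND sn.skill_name = ?)"
)


def build_filter(expr: Expr) -> Tuple[str, List[str]]:
    """Return (sql_fragment, params) for a nested AND/OR skill expression."""
    if isinstance(expr, str):
        return SKILL_EXISTS, [expr]
    op, children = expr
    if op not in ("AND", "OR"):
        raise ValueError(f"unsupported operator: {op}")
    parts: List[str] = []
    params: List[str] = []
    for child in children:
        sql, child_params = build_filter(child)
        parts.append(sql)
        params.extend(child_params)
    # Parentheses preserve the grouping the user typed.
    return "(" + f" {op} ".join(parts) + ")", params


# Python AND (C OR C++)
where_sql, where_params = build_filter(("AND", ["Python", ("OR", ["C", "C++"])]))
print(where_sql)
print(where_params)
```

The fragment can be appended after WHERE, with where_params handed to the database driver in the same order. An empty search box would simply skip the call.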

Related

MySQL - How can you select multiple columns on a nested IFNULL...GROUP_CONCAT() condition?

I have a web application which is connected to a MySQL (5.5.64-MariaDB) database.
One of the queries is as follows:
SELECT
d.id,
d.label AS display_label,
d.anchor,
r.id AS regulation_id,
IFNULL(
(SELECT GROUP_CONCAT(value) FROM display_substances `ds`
WHERE `ds`.`display_id` = `d`.`id`
AND ds.substance_id = 1 -- For example, substance ID = 1
GROUP BY `ds`.`display_id`
), "Not Listed"
) `display_value` FROM displays `d`
JOIN groups g ON d.group_id = g.id
JOIN regulations r ON g.regulation_id = r.id
An example of the output is as follows:
+-----+------------------------------------+------------------------------------------------------------------------------------------+
| id | name | display_value |
+-----+------------------------------------+------------------------------------------------------------------------------------------+
| 4 | techfunction | Intermediate / monomer; Corrosion inhibitor / anodiser / galvaniser; Catalyst; Additive |
| 323 | russia_chemsafety_register_display | Not Listed |
| 733 | peru_pcb_display | Not Listed |
+-----+------------------------------------+------------------------------------------------------------------------------------------+
This query does what we need. For explanatory purposes:
There are 2 tables, displays and display_substances
The query is obtaining display_substances.value for each displays.id
If there is no corresponding display_substances.value then the string "Not Listed" (refer to query above) is returned. If there is a corresponding value then display_substances.value is returned. So in the example data above, IDs 323 and 733 refer to a scenario where there is no corresponding entry, therefore we want "Not Listed". Conversely ID 4 does have a value ("Intermediate / monomer; Corrosion inhibitor / anodiser / galvaniser; Catalyst; Additive") so we get that.
The table structures are as follows:
DESCRIBE displays;
+----------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+----------------+
| id | smallint(5) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(127) | NO | | NULL | |
| label | varchar(255) | NO | | NULL | |
+----------+----------------------+------+-----+---------+----------------+
DESCRIBE display_substances;
+--------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-----------------------+------+-----+---------+----------------+
| id | mediumint(8) unsigned | NO | PRI | NULL | auto_increment |
| display_id | smallint(5) unsigned | NO | MUL | NULL | |
| substance_id | mediumint(8) unsigned | NO | MUL | NULL | |
| value | text | NO | | NULL | |
| automated | tinyint(4) | YES | | NULL | |
+--------------+-----------------------+------+-----+---------+----------------+
I want to be able to return display_substances.automated (refer to table structure above) as a column from my query. But I can't see how to do this.
The reference to the display_substances table is ds, so I cannot use that in the initial SELECT statement because at that point there's no alias. Equally there is no JOIN condition that would make it possible, because not every row returned obtains data from display_substances (i.e. those that are "Not Listed" are not getting anything from that table).
If I want an additional column next to display_value in the sample output above that shows display_substances.automated, or NULL if it doesn't exist, how can I achieve that?
For reference the automated field either contains a 1 (to represent data that has been obtained through automated processes by our application), or NULL if it isn't automated.
"there is no JOIN condition that would make it possible, because not every row returned obtains data from display_substances"
For this case you can use a LEFT JOIN:
SELECT d.id, d.label display_label, d.anchor, r.id regulation_id,
COALESCE(ds.value, 'Not Listed') display_value,
ds.automated
FROM displays d
INNER JOIN groups g ON d.group_id = g.id
INNER JOIN regulations r ON g.regulation_id = r.id
LEFT JOIN (
SELECT display_id, GROUP_CONCAT(value) value, MAX(automated) automated
FROM display_substances
WHERE substance_id = 1
GROUP BY display_id
) ds ON ds.display_id = d.id
I used MAX(automated) as the returned column, but you can use GROUP_CONCAT(automated) just like you do for value and also COALESCE():
COALESCE(ds.automated, 'Not Listed')
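To see the LEFT JOIN on an aggregated subquery in action, here is a minimal sketch that runs a cut-down version of the query against an in-memory SQLite database (standing in for MariaDB). The groups and regulations joins are left out for brevity, and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite stands in for MariaDB; only the two relevant tables are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE displays (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE display_substances (
    id INTEGER PRIMARY KEY, display_id INTEGER, substance_id INTEGER,
    value TEXT, automated INTEGER);

-- Made-up sample data: display 4 has a value for substance 1, display 323 has none.
INSERT INTO displays VALUES (4, 'techfunction'), (323, 'russia_chemsafety_register_display');
INSERT INTO display_substances VALUES
    (1, 4, 1, 'Catalyst', 1),
    (2, 4, 2, 'Additive', NULL);
""")

# The subquery aggregates per display first; the LEFT JOIN then keeps displays
# with no matching row and leaves ds.value and ds.automated as NULL for them.
query = """
SELECT d.id,
       COALESCE(ds.value, 'Not Listed') AS display_value,
       ds.automated
FROM displays AS d
LEFT JOIN (
    SELECT display_id, GROUP_CONCAT(value) AS value, MAX(automated) AS automated
    FROM display_substances
    WHERE substance_id = ?
    GROUP BY display_id
) AS ds ON ds.display_id = d.id
ORDER BY d.id
"""
rows = conn.execute(query, (1,)).fetchall()
for row in rows:
    print(row)
```

Display 323 has no row for substance 1, so it comes back as 'Not Listed' with automated left as NULL (None in Python).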

Optimizing a conditional join in MySQL that depends on the character length of the source table

I'm using MySQL 5.7 and I'm trying to do a join with one of my source tables to a reference table in order to get the appropriate corresponding values. However, I'd like the join to be conditional so it can match according to the length of the value found in the source column.
Source Table
|---------------------|------------------|
| Company_Name | NAICS_Code |
|---------------------|------------------|
| Chem Inc | 325 |
|---------------------|------------------|
| Joe's Farming | 1112 |
|---------------------|------------------|
Reference Table
|--------------------|---------------------------|--------------------|---------------------------|
| NAICS_Code_3_Digit | NAICS_Code_3D_Description | NAICS_Code_4_Digit | NAICS_Code_4D_Description |
|--------------------|---------------------------|--------------------|---------------------------|
| 325                | Chemicals                 | 3252               | Resin and Rubber          |
|--------------------|---------------------------|--------------------|---------------------------|
| 111                | Crop Production           | 1112               | Fruit and Nuts            |
|--------------------|---------------------------|--------------------|---------------------------|
Final Table
|---------------|------------|---------------------------|---------------------------|
| Company_Name  | NAICS_Code | NAICS_Code_3D_Description | NAICS_Code_4D_Description |
|---------------|------------|---------------------------|---------------------------|
| Chem Inc      | 325        | Chemicals                 | NULL                      |
|---------------|------------|---------------------------|---------------------------|
| Joe's Farming | 1112       | Crop Production           | Fruit and Nuts            |
|---------------|------------|---------------------------|---------------------------|
While I'm able to write a query that works, it takes an extremely long time, and I'm curious whether there is a better way. Here's what I've got so far:
SELECT src.Company_Name,
src.NAICS_Code,
CASE
WHEN LENGTH(src.NAICS_Code) < 3 THEN NULL
ELSE ref.NAICS_Code_3D_Description
END AS NAICS_Code_3D_Description,
CASE
WHEN LENGTH(src.NAICS_Code) < 4 THEN NULL
ELSE ref.NAICS_Code_4D_Description
END AS NAICS_Code_4D_Description
FROM source_table AS src
LEFT JOIN reference_table AS ref ON CASE
WHEN LENGTH(src.NAICS_Code) = 4
AND src.NAICS_Code = ref.NAICS_Code_4_Digit THEN 1
WHEN LENGTH(src.NAICS_Code) = 3
AND src.NAICS_Code = ref.NAICS_Code_3_Digit THEN 1
ELSE 0
END = 1;
It might be more efficient to left join twice:
this avoids the need for the complicated logic in the on clause of the join
conditions are exclusive so it will not generate duplicates in the resultset
then you can use coalesce() in the select clause
So:
select
s.company_name,
s.naics_code,
coalesce(r1.naics_code_3d_description, r2.naics_code_3d_description) naics_code_3d_description,
r2.naics_code_4d_description
from source_table s
left join reference_table r1 on r1.naics_code_3_digit = s.naics_code
left join reference_table r2 on r2.naics_code_4_digit = s.naics_code
If you want to exclude source rows that did not match the reference table, you can add a where clause, like:
where r1.naics_code_3_digit is not null or r2.naics_code_4_digit is not null
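The double left join can be tried out with the sample rows from the question. This is a minimal sketch against an in-memory SQLite database (standing in for MySQL 5.7). The extra 3253 reference row is made up, to show one caveat: if several 4-digit rows share a 3-digit code, the 3-digit join needs a DISTINCT subquery to avoid duplicating source rows. Codes are stored as text here, which is an assumption.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; codes are stored as TEXT (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_table (Company_Name TEXT, NAICS_Code TEXT);
CREATE TABLE reference_table (
    NAICS_Code_3_Digit TEXT, NAICS_Code_3D_Description TEXT,
    NAICS_Code_4_Digit TEXT, NAICS_Code_4D_Description TEXT);

INSERT INTO source_table VALUES ('Chem Inc', '325'), ('Joe''s Farming', '1112');
INSERT INTO reference_table VALUES
    ('325', 'Chemicals', '3252', 'Resin and Rubber'),
    ('325', 'Chemicals', '3253', 'Pesticides'),        -- made-up extra row
    ('111', 'Crop Production', '1112', 'Fruit and Nuts');
""")

# r1 matches 3-digit source codes, r2 matches 4-digit source codes.
# DISTINCT collapses the repeated 3-digit rows so r1 cannot multiply the output.
query = """
SELECT s.Company_Name,
       s.NAICS_Code,
       COALESCE(r1.NAICS_Code_3D_Description, r2.NAICS_Code_3D_Description)
           AS NAICS_Code_3D_Description,
       r2.NAICS_Code_4D_Description
FROM source_table AS s
LEFT JOIN (SELECT DISTINCT NAICS_Code_3_Digit, NAICS_Code_3D_Description
           FROM reference_table) AS r1
       ON r1.NAICS_Code_3_Digit = s.NAICS_Code
LEFT JOIN reference_table AS r2
       ON r2.NAICS_Code_4_Digit = s.NAICS_Code
ORDER BY s.Company_Name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each join condition is a plain equality on one column, so an index on each of the two digit columns can be used, which the CASE expression in the ON clause tends to prevent.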

optimizing a query to use a join between two tables and translate rows to columns

I know this question has been asked before, but none of the answers I found suit my needs and, what's more, I am unable to tailor a proper solution by myself.
I explain the situation:
2 tables (user, user_preferences)
In the first one there are, as you probably guessed, the first name, last name, id and login (there is more data, but these are the ones I need), and in the second one we have user_id, preferences_key and preferences_value.
If I run my query:
select a.id, a.login, a.first_name, a.last_name, b.preferences_key from users a, user_preferences b where a.id=b.user_id and b.preferences_key like 'msg%';
I receive back an answer like this:
+----+---------+---------------+---------------+----------------------+
| id | login | first_name | last_name | preferences_key |
+----+---------+---------------+---------------+----------------------+
| 4 | usrn1 | User1 | NumberOne | msg002 |
| 7 | usrn5 | User5 | NumberFive | msg001 |
| 7 | usrn5 | User5 | NumberFive | msg002 |
| 10 | usrn9 | User0 | NumberNine | msg002 |
+----+---------+---------------+---------------+----------------------+
I'm trying to figure out how to switch from this view to this one:
+----+---------+---------------+---------------+--------+--------+
| id | login | first_name | last_name | msg001 | msg002 |
+----+---------+---------------+---------------+--------+--------+
| 4 | usrn1 | User1 | NumberOne | No | Yes |
| 7 | usrn5 | User5 | NumberFive | Yes | Yes |
| 10 | usrn9 | User0 | NumberNine | No | Yes |
+----+---------+---------------+---------------+--------+--------+
Any suggestions would be much appreciated and, by the way, if you can add some explanation I'll appreciate it even more.
Thank you
As far as I know, there isn't an easy way to pivot a table the way you want.
There is the following manual approach by JOINing to the same table multiple times. Something like the following should work:
SELECT
a.id, a.login, a.first_name, a.last_name,
IF(b1.preferences_key IS NULL, 'No', 'Yes') msg001,
IF(b2.preferences_key IS NULL, 'No', 'Yes') msg002
FROM
users a
LEFT JOIN user_preferences b1
ON b1.user_id = a.id
AND b1.preferences_key = 'msg001'
LEFT JOIN user_preferences b2
ON b2.user_id = a.id
AND b2.preferences_key = 'msg002';
If this doesn't help, check out MySQL pivot table.
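Another common way to turn rows into columns is conditional aggregation: group by user and collapse each key into its own column with MAX(CASE ...). The sketch below is one possible variant rather than the approach above. It runs against an in-memory SQLite database (standing in for MySQL), and the sample rows, including user 8 who has no msg preference, are made up. Unlike the LEFT JOIN version, it only returns users that have at least one msg% key, which matches the desired output in the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT, first_name TEXT, last_name TEXT);
CREATE TABLE user_preferences (user_id INTEGER, preferences_key TEXT, preferences_value TEXT);

INSERT INTO users VALUES
    (4, 'usrn1', 'User1', 'NumberOne'),
    (7, 'usrn5', 'User5', 'NumberFive'),
    (8, 'usrn6', 'User6', 'NumberSix');
INSERT INTO user_preferences VALUES
    (4, 'msg002', 'x'), (7, 'msg001', 'x'), (7, 'msg002', 'x'), (8, 'theme', 'dark');
""")

# Each CASE yields 'Yes' on the rows holding that key and 'No' otherwise;
# MAX picks 'Yes' if any row in the group has it, because 'Yes' sorts after 'No'.
query = """
SELECT a.id, a.login,
       MAX(CASE WHEN b.preferences_key = 'msg001' THEN 'Yes' ELSE 'No' END) AS msg001,
       MAX(CASE WHEN b.preferences_key = 'msg002' THEN 'Yes' ELSE 'No' END) AS msg002
FROM users AS a
JOIN user_preferences AS b ON b.user_id = a.id
WHERE b.preferences_key LIKE 'msg%'
GROUP BY a.id, a.login
ORDER BY a.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Adding another message key means adding one more MAX(CASE ...) column; the table is still read only once.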

MySQL - Select everything from one table, but only first matching value in second table

I'm feeling a little rusty with creating queries in MySQL. I thought I could solve this, but I'm having no luck and searching around doesn't result in anything similar...
Basically, I have two tables. I want to select everything from one table and the matching row from the second table. However, I only want to have the first result from the second table. I hope that makes sense.
The rows in the daily_entries table are unique. There will be one row for each day, but maybe not every day. The second table, notes, contains many rows, each of which is associated with ONE row from daily_entries.
Below are examples of my tables;
Table One
mysql> desc daily_entries;
+----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+----------------+
| eid | int(11) | NO | PRI | NULL | auto_increment |
| date | date | NO | | NULL | |
| location | varchar(100) | NO | | NULL | |
+----------+--------------+------+-----+---------+----------------+
Table Two
mysql> desc notes;
+---------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------+------+-----+---------+----------------+
| task_id | int(11) | NO | PRI | NULL | auto_increment |
| eid | int(11) | NO | MUL | NULL | |
| notes | text | YES | | NULL | |
+---------+---------+------+-----+---------+----------------+
What I need to do, is select all entries from notes, with only one result from daily_entries.
Below is an example of how I want it to look:
+----------------------------------------------+---------+------------+----------+-----+
| notes | task_id | date | location | eid |
+----------------------------------------------+---------+------------+----------+-----+
| Another note | 3 | 2014-01-02 | Home | 2 |
| Enter a note. | 1 | 2014-01-01 | Away | 1 |
| This is a test note. To see what happens. | 2 | | Away | 1 |
| Testing another note | 4 | | Away | 1 |
+----------------------------------------------+---------+------------+----------+-----+
4 rows in set (0.00 sec)
Below is the query that I currently have:
SELECT notes.notes, notes.task_id, daily_entries.date, daily_entries.location, daily_entries.eid
FROM daily_entries
LEFT JOIN notes ON daily_entries.eid=notes.eid
ORDER BY daily_entries.date DESC
Below is an example of how it looks with my query:
+----------------------------------------------+---------+------------+----------+-----+
| notes | task_id | date | location | eid |
+----------------------------------------------+---------+------------+----------+-----+
| Another note | 3 | 2014-01-02 | Home | 2 |
| Enter a note. | 1 | 2014-01-01 | Away | 1 |
| This is a test note. To see what happens. | 2 | 2014-01-01 | Away | 1 |
| Testing another note | 4 | 2014-01-01 | Away | 1 |
+----------------------------------------------+---------+------------+----------+-----+
4 rows in set (0.00 sec)
At first I thought I could simply GROUP BY daily_entries.date, however that returned only the first row of each matching set. Can this even be done? I would greatly appreciate any help someone can offer. Using Limit at the end of my query obviously limited it to the value that I specified, but applied it to everything which was to be expected.
Basically, there's nothing wrong with your query. I believe it is exactly what you need, because it returns the data you want. Don't look at it as duplicating your daily_entries; look at it as returning every note with its associated daily_entry.
Of course, you can achieve what you described in your question (there's already an answer that solves this), but think twice before you do it, because such nested queries can add noticeable overhead on your database server.
I'd recommend to keep your query as simple as possible with one single LEFT JOIN (which is all you need) and then let consuming applications manipulate the data and present it the way they need to.
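Use MySQL's non-standard GROUP BY behavior (this relies on ONLY_FULL_GROUP_BY being disabled, and MySQL does not guarantee which row each group returns):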
SELECT n.notes, n.task_id, de.date, de.location, de.eid
FROM notes n
LEFT JOIN (select * from
(select * from daily_entries ORDER BY date DESC) x
group by eid) de ON de.eid = n.eid
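You need to do these queries with explicit filtering for the first row. This example uses a join to find the first note of each entry and shows the date only on that row:
SELECT n.notes, n.task_id, IF(nmin.min_task_id IS NULL, NULL, de.date) AS date, de.location, de.eid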
FROM daily_entries de LEFT JOIN
notes n
ON de.eid = n.eid LEFT JOIN
(select n.eid, min(task_id) as min_task_id
from notes n
group by n.eid
) nmin
on n.task_id = nmin.min_task_id
ORDER BY de.date DESC;
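The "show the date only on the first note of each entry" idea can be checked with the sample rows from the question. This is a minimal sketch against an in-memory SQLite database (standing in for MySQL). It assumes that "first" means the lowest task_id per eid, and the alias first_note is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; rows mirror the question's sample output.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_entries (eid INTEGER PRIMARY KEY, date TEXT, location TEXT);
CREATE TABLE notes (task_id INTEGER PRIMARY KEY, eid INTEGER, notes TEXT);

INSERT INTO daily_entries VALUES (1, '2014-01-01', 'Away'), (2, '2014-01-02', 'Home');
INSERT INTO notes VALUES
    (1, 1, 'Enter a note.'),
    (2, 1, 'This is a test note. To see what happens.'),
    (3, 2, 'Another note'),
    (4, 1, 'Testing another note');
""")

# first_note holds the lowest task_id per entry; the CASE keeps the date only
# on that note and returns NULL on every other note of the same entry.
query = """
SELECT n.notes, n.task_id,
       CASE WHEN n.task_id = first_note.min_task_id THEN de.date END AS date,
       de.location, de.eid
FROM notes AS n
JOIN daily_entries AS de ON de.eid = n.eid
JOIN (SELECT eid, MIN(task_id) AS min_task_id
      FROM notes
      GROUP BY eid) AS first_note
  ON first_note.eid = n.eid
ORDER BY de.date DESC, n.task_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Blanking the repeated date is arguably a presentation concern, so doing the same thing in the consuming application remains a reasonable alternative.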

Get distinct results from several tables

I need to implement mysql query to calculate space used by user's mailbox.
A message thread may have multiple messages (reply, follow up) by 2 parties
(sender/recipient) and is tagged with one or more tags (Inbox, Sent etc.).
The following conditions have to be met:
a) user is either recipient OR author of the message;
b) message IS TAGGED by any of the tags: 1,2,3,4;
c) distinct records only, ie if the thread, containing messages is tagged with
more than one of the 4 tags (for example 1 and 4: Inbox and Sent) the calculation
is done on one tag only
I have tried the following query but I am not able to get distinct values - the
subject/body values are duplicated:
SELECT SUM(LENGTH(subject)+LENGTH(body)) AS sum
FROM om_msg_message omm
JOIN om_msg_index omi ON omm.mid = omi.mid
JOIN om_msg_tags_index omti ON omi.thread_id = omti.thread_id AND omti.uid = user_id
WHERE (omi.recipient = user_id OR omi.author = user_id) AND omti.tag_id IN (1,2,3,4)
GROUP BY omi.mid;
Structure of the tables:
om_msg_message - fields subject and body are the ones to be calculated
+--------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+----------------+
| mid | int(10) unsigned | NO | PRI | NULL | auto_increment |
| subject | varchar(255) | NO | | NULL | |
| body | longtext | NO | | NULL | |
| timestamp | int(10) unsigned | NO | | NULL | |
| reply_to_mid | int(10) unsigned | NO | | 0 | |
+--------------+------------------+------+-----+---------+----------------+
om_msg_index
+-----+-----------+-----------+--------+--------+---------+
| mid | thread_id | recipient | author | is_new | deleted |
+-----+-----------+-----------+--------+--------+---------+
| 1 | 1 | 1392 | 1211 | 0 | 0 |
| 2 | 1 | 1211 | 1392 | 1 | 0 |
+-----+-----------+-----------+--------+--------+---------+
om_msg_tags_index
+--------+------+-----------+
| tag_id | uid | thread_id |
+--------+------+-----------+
| 1 | 1211 | 1 |
| 4 | 1211 | 1 |
| 1 | 1392 | 1 |
| 4 | 1392 | 1 |
+--------+------+-----------+
Here's another solution:
SELECT SUM(LENGTH(omm.subject) + LENGTH(omm.body)) as totalLength
FROM om_msg_message omm
JOIN om_msg_index omi
ON omi.mid = omm.mid
AND (omi.recipient = user_id OR omi.author = user_id)
JOIN (SELECT DISTINCT thread_id
FROM om_msg_tags_index
WHERE uid = user_id
AND tag_id IN (1, 2, 3, 4)) omti
ON omti.thread_id = omi.thread_id
I'm assuming that:
user_id is a parameter marker/host variable, being queried for an individual user.
You want the total of all messages per user, not the total length of each message (which is what the GROUP BY clause in your version was getting you).
That mid in both om_msg_message and om_msg_index is unique.
So, your problem is the IN clause. I'm not a MySQL guru, but in T-SQL you could move the tag check into a WHERE clause with an EXISTS subquery, so that the join doesn't produce two rows. You need to compensate for the fact that you have two rows with different tag Ids associated with each row of your primary join data.
The way I could do it cross-platform would be with four left-joins that linked tables then demanded a non-null value for 1, 2, 3, or 4. Fairly inefficient; I'm sure there's a better way to do it in MySQL, but now that you know what the problem is you might know a better solution.
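The EXISTS idea also works in MySQL. The sketch below compares it with the plain join on the sample rows from the question, using an in-memory SQLite database as a stand-in. The subject and body texts are made up, and the :uid named parameter is a feature of the Python driver, not of MySQL itself.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; subject/body texts are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE om_msg_message (mid INTEGER PRIMARY KEY, subject TEXT, body TEXT);
CREATE TABLE om_msg_index (mid INTEGER, thread_id INTEGER, recipient INTEGER, author INTEGER);
CREATE TABLE om_msg_tags_index (tag_id INTEGER, uid INTEGER, thread_id INTEGER);

INSERT INTO om_msg_message VALUES (1, 'hi', 'hello'), (2, 're', 'ok');
INSERT INTO om_msg_index VALUES (1, 1, 1392, 1211), (2, 1, 1211, 1392);
INSERT INTO om_msg_tags_index VALUES (1, 1211, 1), (4, 1211, 1), (1, 1392, 1), (4, 1392, 1);
""")

# Plain join: every message is repeated once per matching tag row.
naive = """
SELECT SUM(LENGTH(omm.subject) + LENGTH(omm.body))
FROM om_msg_message AS omm
JOIN om_msg_index AS omi ON omi.mid = omm.mid
JOIN om_msg_tags_index AS omti
  ON omti.thread_id = omi.thread_id AND omti.uid = :uid
WHERE (omi.recipient = :uid OR omi.author = :uid)
  AND omti.tag_id IN (1, 2, 3, 4)
"""

# EXISTS only asks whether at least one matching tag exists, so each message counts once.
with_exists = """
SELECT SUM(LENGTH(omm.subject) + LENGTH(omm.body))
FROM om_msg_message AS omm
JOIN om_msg_index AS omi ON omi.mid = omm.mid
WHERE (omi.recipient = :uid OR omi.author = :uid)
  AND EXISTS (SELECT 1 FROM om_msg_tags_index AS omti
              WHERE omti.thread_id = omi.thread_id
                AND omti.uid = :uid
                AND omti.tag_id IN (1, 2, 3, 4))
"""
params = {"uid": 1211}
naive_total = conn.execute(naive, params).fetchone()[0]
exists_total = conn.execute(with_exists, params).fetchone()[0]
print(naive_total, exists_total)
```

With two tags on the thread, the plain join counts every message twice, while the EXISTS version counts each message once. Note that MySQL's LENGTH() counts bytes, so CHAR_LENGTH() may be the better fit if character counts are wanted.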