MySQL SELECT to VIEW returns different results

I have put together a small SQL query that pulls data from one table and presents it under new column names. The SQL looks like this:
SELECT course_id AS course, NOW() as datum,
(SELECT COUNT(*) FROM users_courses WHERE course_id = course) AS antal_registrerade,
(SELECT COUNT(*) FROM users_courses WHERE status = 1 AND course_id = course) AS antal_aktiva,
(SELECT COUNT(*) FROM users_courses WHERE status = 3 AND course_id = course) AS antal_avklarade
FROM users_courses GROUP BY course_id
The above query returns the following:
| course | datum | antal_registrerade | antal_aktiva | antal_avklarade |
-----------------------------------------------------------------------------------------
| 31 | 2016-01-12 16:24:58 | 142 | 19 | 83 |
| 38 | 2016-01-12 16:24:58 | 826 | 45 | 49 |
| 39 | 2016-01-12 16:24:58 | 2 | 2 | NULL |
| 43 | 2016-01-12 16:24:58 | 169 | 29 | 32 |
| 44 | 2016-01-12 16:24:58 | 11 | 4 | 2 |
| 45 | 2016-01-12 16:24:58 | 67 | 8 | 7 |
| 46 | 2016-01-12 16:24:58 | 2 | 1 | 1 |
All good, right? Just like I wanted it. But when I save this query as a view and run the view, the result is different. I get the same data in every row, except for the course and datum columns.
| course | datum | antal_registrerade | antal_aktiva | antal_avklarade |
-----------------------------------------------------------------------------------------
| 31 | 2016-01-12 16:24:58 | 1219 | 108 | 174 |
| 38 | 2016-01-12 16:24:58 | 1219 | 108 | 174 |
| 39 | 2016-01-12 16:24:58 | 1219 | 108 | 174 |
| 43 | 2016-01-12 16:24:58 | 1219 | 108 | 174 |
| 44 | 2016-01-12 16:24:58 | 1219 | 108 | 174 |
| 45 | 2016-01-12 16:24:58 | 1219 | 108 | 174 |
| 46 | 2016-01-12 16:24:58 | 1219 | 108 | 174 |
Does anyone have any idea why this is? The SQL found in the saved view looks like this:
SELECT `database`.`users_courses`.`course_id` AS `course`,now() AS `datum`,
(SELECT COUNT(0) from `database`.`users_courses` where (`database`.`users_courses`.`course_id` = `database`.`users_courses`.`course_id`)) AS `antal_registrerade`,
(SELECT COUNT(0) from `database`.`users_courses` where ((`database`.`users_courses`.`status` = 1) and (`database`.`users_courses`.`course_id` = `database`.`users_courses`.`course_id`))) AS `antal_aktiva`,
(SELECT COUNT(0) from `database`.`users_courses` where ((`database`.`users_courses`.`status` = 3) and (`database`.`users_courses`.`course_id` = `database`.`users_courses`.`course_id`))) AS `antal_avklarade`
FROM `database`.`users_courses`
GROUP BY `database`.`users_courses`.`course_id`

This is much simpler to express using conditional aggregation:
SELECT course_id AS course, NOW() as datum,
COUNT(*) as antal_registrerade,
SUM(status = 1) as antal_aktiva,
SUM(status = 3) AS antal_avklarade
FROM users_courses
GROUP BY course_id;
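This should fix the problem with your results.
The saved code for the view has the correlation clause wrong: the alias course was expanded to users_courses.course_id, so each subquery compares the inner table's course_id with itself. That condition is always true, so every row shows the totals for the whole table (1219 is the sum of the seven per-course counts in your first result). My guess is that the table has no separate course column, so what goes into the view is not exactly your first query. In any case, the simpler query avoids the issue.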
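To check that reasoning on a small scale, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, as are the invented sample rows and the aliases uc and i). It runs the conditional aggregation from above, the same counts as correlated subqueries with explicit table aliases, and the always-true condition that ended up in the saved view.

```python
import sqlite3

# SQLite stands in for MySQL here (an assumption); the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_courses (user_id INTEGER, course_id INTEGER, status INTEGER)")
conn.executemany(
    "INSERT INTO users_courses VALUES (?, ?, ?)",
    [(1, 31, 1), (2, 31, 3), (3, 31, 2), (4, 38, 1), (5, 38, 1), (6, 39, 3)],
)

# Conditional aggregation: a comparison evaluates to 1 or 0, so SUM counts matches.
conditional = conn.execute("""
    SELECT course_id AS course,
           COUNT(*)        AS antal_registrerade,
           SUM(status = 1) AS antal_aktiva,
           SUM(status = 3) AS antal_avklarade
    FROM users_courses
    GROUP BY course_id
    ORDER BY course_id
""").fetchall()

# Correlated subqueries with explicit aliases keep inner and outer rows apart.
aliased = conn.execute("""
    SELECT uc.course_id AS course,
           (SELECT COUNT(*) FROM users_courses i WHERE i.course_id = uc.course_id),
           (SELECT COUNT(*) FROM users_courses i WHERE i.status = 1 AND i.course_id = uc.course_id),
           (SELECT COUNT(*) FROM users_courses i WHERE i.status = 3 AND i.course_id = uc.course_id)
    FROM users_courses uc
    GROUP BY uc.course_id
    ORDER BY uc.course_id
""").fetchall()

# The saved view compares the inner column with itself, which is always true.
broken = conn.execute("""
    SELECT course_id,
           (SELECT COUNT(*) FROM users_courses
            WHERE users_courses.course_id = users_courses.course_id)
    FROM users_courses
    GROUP BY course_id
    ORDER BY course_id
""").fetchall()

print(conditional)
print(aliased)
print(broken)
```

The first two result lists are identical, while the third shows the full table count (6) for every course, which mirrors the repeated 1219 in the view output.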

Related

How to get the opposite of a join?

I am trying to get the rows that don't exist in one table where one table called schedules (match_week, player_home_id, player_away_id) and the other table called match (match_week, Winner_id, Defeated_id) are joined. The players look at their schedule and play a match. I am trying to get a list of the scheduled matches that do not exist in the match table. The IDs in the match table can be in either column Winner_id or Defeated_id.
I have reviewed a number of Stack Exchange examples, but most use "IS NULL" and I don't have null values. I have used a Join that does give the output of the matches played. I would like the matches that have not been played.
CSV - wp_schedule_test
+----+------------+--------------+--------------+-----------------+-----------------+
| ID | match_week | home_player1 | away_player1 | player1_home_id | player1_away_id |
+----+------------+--------------+--------------+-----------------+-----------------+
| 1 | WEEK 1 | James Rives | Dale Hemme | 164 | 169 |
| 2 | WEEK 1 | John Head | David Foster | 81 | 175 |
| 3 | WEEK 1 | John Dalton | Eric Simmons | 82 | 23 |
| 4 | WEEK 2 | John Head | James Rives | 81 | 164 |
| 5 | WEEK 2 | Dale Hemme | John Dalton | 169 | 82 |
| 6 | WEEK 2 | David Foster | Eric Simmons | 175 | 23 |
| 7 | WEEK 3 | John Dalton | James Rives | 82 | 164 |
| 8 | WEEK 3 | John Head | Eric Simmons | 81 | 23 |
| 9 | WEEK 3 | Dale Hemme | David Foster | 169 | 175 |
| 10 | WEEK 4 | Eric Simmons | James Rives | 23 | 164 |
| 11 | WEEK 4 | David Foster | John Dalton | 175 | 82 |
| 12 | WEEK 4 | Dale Hemme | John Head | 169 | 81 |
+----+------------+--------------+--------------+-----------------+-----------------+
CSV - wp_match_scores_test
+----+------------+------------+------------+
| ID | match_week | player1_id | player2_id |
+----+------------+------------+------------+
| 5 | WEEK 1 | 82 | 23 |
| 20 | WEEK 1 | 164 | 169 |
| 21 | WEEK 2 | 164 | 81 |
| 25 | WEEK 2 | 82 | 169 |
| 61 | WEEK 3 | 175 | 169 |
| 62 | WEEK 4 | 175 | 82 |
| 69 | WEEK 2 | 175 | 23 |
| 85 | WEEK 3 | 164 | 82 |
| 86 | WEEK 4 | 164 | 23 |
+----+------------+------------+------------+
The output of the MySQL query below is the list of matches that have been played. I am trying to figure out how to list the matches from the schedule table that have not been played.
CSV - MySQL Output
+------------+------------+------------+
| match_week | player1_id | player2_id |
+------------+------------+------------+
| WEEK 1 | 164 | 169 |
| WEEK 1 | 82 | 23 |
| WEEK 2 | 164 | 81 |
| WEEK 2 | 82 | 169 |
| WEEK 2 | 175 | 23 |
| WEEK 3 | 175 | 169 |
| WEEK 3 | 164 | 82 |
| WEEK 4 | 175 | 82 |
| WEEK 4 | 164 | 23 |
+------------+------------+------------+
MYSQL
select DISTINCT ms.match_week, ms.player1_id , ms.player2_id FROM
wp_match_scores_test ms
JOIN wp_schedules_test s
ON (s.player1_home_id = ms.player1_id or s.player1_away_id =
ms.player2_id)
Order by ms.match_week
The expected output is:
CSV - Desired Output
+------------+----------------+----------------+
| match_week | player_home_id | player_away_id |
+------------+----------------+----------------+
| WEEK 1 | 81 | 175 |
| WEEK 3 | 81 | 23 |
| WEEK 4 | 169 | 81 |
+------------+----------------+----------------+
The added code I would like to use is
SELECT s.*
FROM wp_schedules_test s
WHERE NOT EXISTS
(select DISTINCT ms.match_week, ms.player1_id , ms.player2_id FROM
wp_match_scores_test ms
JOIN wp_schedules_test s
ON (s.player1_home_id = ms.player1_id or s.player1_away_id =
ms.player2_id)
Order by ms.match_week)
Unfortunately, the output yields "No Rows"
You can use a LEFT JOIN to achieve the desired results, joining the two tables on matching player ids (noting that player id values in wp_match_scores_test can correspond to either player1_home_id or player1_away_id in wp_schedules_test). If there is no match, the result table will have NULL values from the wp_match_scores_test table values, and you can use that to select the matches which have not been played:
SELECT sch.*
FROM wp_schedule_test sch
LEFT JOIN wp_match_scores_test ms
ON (ms.player1_id = sch.player1_home_id
OR ms.player2_id = sch.player1_home_id)
AND (ms.player1_id = sch.player1_away_id
OR ms.player2_id = sch.player1_away_id)
WHERE ms.ID IS NULL
Output:
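+----+------------+--------------+--------------+-----------------+-----------------+
| ID | match_week | home_player1 | away_player1 | player1_home_id | player1_away_id |
+----+------------+--------------+--------------+-----------------+-----------------+
| 2 | Week 1 | John Head | David Foster | 81 | 175 |
| 8 | Week 3 | John Head | Eric Simmons | 81 | 23 |
| 12 | Week 4 | Dale Hemme | John Head | 169 | 81 |
+----+------------+--------------+--------------+-----------------+-----------------+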
Note that you can also use a NOT EXISTS query, using the same condition as I used in the JOIN:
SELECT sch.*
FROM wp_schedule_test sch
WHERE NOT EXISTS (SELECT *
FROM wp_match_scores_test ms
WHERE (ms.player1_id = sch.player1_home_id
OR ms.player2_id = sch.player1_home_id)
AND (ms.player1_id = sch.player1_away_id
OR ms.player2_id = sch.player1_away_id))
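The output of this query is the same. Note though that the conditions in the subquery may have to be evaluated for every row of wp_schedule_test, which can make this query less efficient than the LEFT JOIN equivalent, depending on the MySQL version and the available indexes. EXPLAIN shows what the optimizer actually does with each form.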
Demo on dbfiddle
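For anyone who wants to try both forms without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in (an assumption; the name columns are left out and the PLAYED string is just a hypothetical helper). The rows are copied from the question. It also shows why the NOT EXISTS attempt in the question returned no rows: its subquery never refers to the outer row, so it finds rows for every schedule entry.

```python
import sqlite3

# SQLite stands in for MySQL (an assumption); rows are copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_schedule_test (ID INTEGER, match_week TEXT,
                                   player1_home_id INTEGER, player1_away_id INTEGER);
    CREATE TABLE wp_match_scores_test (ID INTEGER, match_week TEXT,
                                       player1_id INTEGER, player2_id INTEGER);
""")
conn.executemany("INSERT INTO wp_schedule_test VALUES (?, ?, ?, ?)", [
    (1, "WEEK 1", 164, 169), (2, "WEEK 1", 81, 175), (3, "WEEK 1", 82, 23),
    (4, "WEEK 2", 81, 164), (5, "WEEK 2", 169, 82), (6, "WEEK 2", 175, 23),
    (7, "WEEK 3", 82, 164), (8, "WEEK 3", 81, 23), (9, "WEEK 3", 169, 175),
    (10, "WEEK 4", 23, 164), (11, "WEEK 4", 175, 82), (12, "WEEK 4", 169, 81),
])
conn.executemany("INSERT INTO wp_match_scores_test VALUES (?, ?, ?, ?)", [
    (5, "WEEK 1", 82, 23), (20, "WEEK 1", 164, 169), (21, "WEEK 2", 164, 81),
    (25, "WEEK 2", 82, 169), (61, "WEEK 3", 175, 169), (62, "WEEK 4", 175, 82),
    (69, "WEEK 2", 175, 23), (85, "WEEK 3", 164, 82), (86, "WEEK 4", 164, 23),
])

# A scheduled match counts as played when both players appear in one score row.
PLAYED = """(ms.player1_id = sch.player1_home_id OR ms.player2_id = sch.player1_home_id)
        AND (ms.player1_id = sch.player1_away_id OR ms.player2_id = sch.player1_away_id)"""

left_join_ids = [row[0] for row in conn.execute(f"""
    SELECT sch.ID FROM wp_schedule_test sch
    LEFT JOIN wp_match_scores_test ms ON {PLAYED}
    WHERE ms.ID IS NULL
    ORDER BY sch.ID""")]

not_exists_ids = [row[0] for row in conn.execute(f"""
    SELECT sch.ID FROM wp_schedule_test sch
    WHERE NOT EXISTS (SELECT 1 FROM wp_match_scores_test ms WHERE {PLAYED})
    ORDER BY sch.ID""")]

# Uncorrelated variant: the subquery ignores the outer row, so it always
# returns rows and NOT EXISTS filters out every schedule entry.
uncorrelated_ids = [row[0] for row in conn.execute("""
    SELECT sch.ID FROM wp_schedule_test sch
    WHERE NOT EXISTS (SELECT 1 FROM wp_match_scores_test ms)""")]

print(left_join_ids, not_exists_ids, uncorrelated_ids)
```

Both correlated forms return schedule IDs 2, 8 and 12, and the uncorrelated one returns nothing.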

select all rows with distinct and condition in MySQL

Hi guys, I am new to MySQL and I have a problem with a query. I am trying to write a query that returns all records from a table, but when two records have the same date I need only the one of the two that has manual_selection = 1.
So the result should be all records from my table except id = 1401 and id = 1549.
[Image: my table]
I tried to get these records with queries like these:
SELECT * FROM project.score WHERE project_id = 358
AND crawled_at IN(SELECT crawled_at FROM project.score WHERE project_id = 358
AND manual_selection = 1 GROUP BY crawled_at)
ORDER BY crawled_at;
SELECT * FROM project.score WHERE project_id = 358
GROUP BY crawled_at HAVING manual_selection = 1;
but every attempt returns only the rows with manual_selection = 1. I have no idea how to keep all rows and, where "crawled_at" is duplicated, pick only the one with manual_selection = 1. Can someone help me?
Try this:
select main.id, main.project_id, main.crawled_at, main.score, main.manual_selection
from dcdashboard.moz_optimization_keywords as main
left join dcdashboard.moz_optimization_keywords as non_manual_selection on non_manual_selection.crawled_at = main.crawled_at and non_manual_selection.manual_selection != 1
group by main.crawled_at;
Result with data set from question:
+------+------------+---------------------+-------+------------------+
| id | project_id | crawled_at | score | manual_selection |
+------+------------+---------------------+-------+------------------+
| 807 | 360 | 2016-02-06 00:00:00 | 76 | 0 |
| 1001 | 360 | 2016-02-20 00:00:00 | 76 | 0 |
| 223 | 360 | 2016-11-28 00:00:00 | 76 | 0 |
| 224 | 360 | 2016-12-05 00:00:00 | 76 | 0 |
| 670 | 360 | 2016-12-19 00:00:00 | 76 | 0 |
| 1164 | 360 | 2017-04-19 00:00:00 | 78 | 1 |
| 1400 | 360 | 2017-09-13 00:00:00 | 96 | 1 |
| 1548 | 360 | 2017-09-15 00:00:00 | 96 | 1 |
+------+------------+---------------------+-------+------------------+
8 rows in set (0.00 sec)
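A different way to express the rule from the question, shown here only as a hedged sketch: keep a row if it is the manual selection, or if no row with the same crawled_at is a manual selection. This is not the GROUP BY approach above but a NOT EXISTS filter. It runs on Python's built-in sqlite3 module as a stand-in for MySQL (an assumption), and the rows are hypothetical because the original table image is not available; only the ids 1400, 1401, 1548 and 1549 come from the thread.

```python
import sqlite3

# SQLite stands in for MySQL (an assumption); the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE score (id INTEGER PRIMARY KEY, project_id INTEGER,
                crawled_at TEXT, score INTEGER, manual_selection INTEGER)""")
conn.executemany("INSERT INTO score VALUES (?, ?, ?, ?, ?)", [
    (1164, 358, "2017-04-19 00:00:00", 78, 0),
    (1400, 358, "2017-09-13 00:00:00", 96, 1),
    (1401, 358, "2017-09-13 00:00:00", 90, 0),
    (1548, 358, "2017-09-15 00:00:00", 96, 1),
    (1549, 358, "2017-09-15 00:00:00", 91, 0),
])

# Keep a row when it is the manual selection itself, or when no row
# with the same project and date has manual_selection = 1.
rows = conn.execute("""
    SELECT s.id
    FROM score s
    WHERE s.project_id = 358
      AND (s.manual_selection = 1
           OR NOT EXISTS (SELECT 1 FROM score m
                          WHERE m.project_id = s.project_id
                            AND m.crawled_at = s.crawled_at
                            AND m.manual_selection = 1))
    ORDER BY s.crawled_at, s.id
""").fetchall()
kept_ids = [r[0] for r in rows]
print(kept_ids)
```

With this sample data the query keeps 1164, 1400 and 1548 and drops 1401 and 1549. If two rows share a date and neither is a manual selection, both are kept; the question does not say what should happen in that case.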

Query to get data from four tables

I have the following schema:
purchase_order
+-------------------+----------------------+
| purchase_order_id | purchase_order |
+-------------------+----------------------+
| 54 | Purchase Order 12345 |
| 56 | po-laptop-hp-3 |
| 57 | po-laptop-hp-1 |
+-------------------+----------------------+
purchase_order_detail
+--------------------------+-------------------+---------+------------------+
| purchase_order_detail_id | purchase_order_id | item_id | ordered_quantity |
+--------------------------+-------------------+---------+------------------+
| 61 | 54 | 279 | 500 |
| 62 | 54 | 286 | 700 |
| 63 | 56 | 279 | 43 |
| 64 | 57 | 279 | 43 |
| 65 | 57 | 286 | 43 |
| 66 | 57 | 287 | 43 |
+--------------------------+-------------------+---------+------------------+
delivery_order
+-------------------+--------------------------+-------------------+
| delivery_order_id | purchase_order_detail_id | recieved_quantity |
+-------------------+--------------------------+-------------------+
| 62 | 61 | 250 |
| 63 | 62 | 300 |
| 64 | 63 | 34 |
| 65 | 64 | 34 |
| 66 | 65 | 34 |
| 67 | 66 | 34 |
| 68 | 61 | 34 |
| 69 | 61 | 34 |
+-------------------+--------------------------+-------------------+
stock
+----------+-------------------+------------+----------+------------------+---------------+
| stock_id | delivery_order_id | project_id | quantity | initial_quantity | stock_type_id |
+----------+-------------------+------------+----------+------------------+---------------+
| 12 | 62 | 1 | 60 | 60 | 1 |
| 13 | 63 | 1 | 120 | 120 | 1 |
| 14 | 63 | 1 | 50 | 50 | 1 |
| 15 | 64 | 1 | 12 | 12 | 1 |
| 16 | 62 | 1 | 120 | 120 | 1 |
| 17 | 62 | 1 | 12 | 12 | 1 |
+----------+-------------------+------------+----------+------------------+---------------+
I have written this query, but it returns duplicate results:
SELECT po.created_on
, po.purchase_order
, i.item_name
, u.unit_name
, pod.ordered_quantity
, do.recieved_quantity
, do.recieved_on
, po.remarks
FROM purchase_order po
, purchase_order_detail pod
, delivery_order do
, stock s
, item i
, unit u
WHERE u.unit_id = i.unit_id
AND i.item_id = pod.item_id
AND po.purchase_order_id = pod.purchase_order_id
AND pod.purchase_order_detail_id = do.purchase_order_detail_id
AND do.delivery_order_id = s.delivery_order_id
AND s.project_id = 1
ORDER BY po.purchase_order_id
, pod.item_id
;
The results
+---------------------+----------------------+------------+-----------+------------------+-------------------+---------------------+---------------------------------------+
| created_on | purchase_order | item_name | unit_name | ordered_quantity | recieved_quantity | recieved_on | remarks |
+---------------------+----------------------+------------+-----------+------------------+-------------------+---------------------+---------------------------------------+
| 2015-02-24 22:48:15 | Purchase Order 12345 | HP Laptops | Unit | 500 | 250 | 2015-02-21 00:00:00 | Adding first Purchase Order as a Test |
| 2015-02-24 22:48:15 | Purchase Order 12345 | HP Laptops | Unit | 500 | 250 | 2015-02-21 00:00:00 | Adding first Purchase Order as a Test |
| 2015-02-24 22:48:15 | Purchase Order 12345 | Lenovo | Unit | 700 | 300 | 2015-02-21 00:00:00 | Adding first Purchase Order as a Test |
| 2015-02-24 22:48:15 | Purchase Order 12345 | Lenovo | Unit | 700 | 300 | 2015-02-21 00:00:00 | Adding first Purchase Order as a Test |
| 2015-02-24 22:55:40 | po-laptop-hp-3 | HP Laptops | Unit | 43 | 34 | 2015-02-21 00:00:00 | dfgsdfgsd |
+---------------------+----------------------+------------+-----------+------------------+-------------------+---------------------+---------------------------------------+
The relationship is one-to-many from top to bottom.
What I want to get is each purchase_order, the ordered quantity of each of its items, the total received quantity, and the quantity in stock where project_id = 1 from stock.
I am expecting something like this:
+-------------------+---------+------------------+---------------+----------+
| purchase_order_id | item_id | ordered_quantity | totalReceived | quantity |
+-------------------+---------+------------------+---------------+----------+
| 54 | 279 | 500 | 314 | 192 |
| 54 | 286 | 700 | 300 | 170 |
| 56 | 279 | 43 | 34 | 12 |
+-------------------+---------+------------------+---------------+----------+
EDIT
Thank you for clearing up the mistake in my first part. I realize now that we cannot do all calculations in a single query (because we group on different columns in various parts) so I started by writing individual subqueries and joining them together. The steps went something like this:
Get the total received quantity for each purchase_order_detail_id from the delivery_order table.
Join that subquery with the delivery_order table itself to get the totalReceived for the various delivery_order_id values.
Join that result set with the purchase_order_detail table to get the purchase_order_id, item_id, and ordered_quantity for each delivery_order_id.
We now have a result set including the delivery_order_id, purchase_order_id, item_id, ordered_quantity, and totalReceived. The last two things are:
Get the SUM() of quantity for each delivery_order_id from the stock table.
Join that with our above result set on the condition that delivery_order_id matches (so we will only get one row) and that project_id is 1 (so we only get the necessary delivery_order_id values). I put that condition in the WHERE clause of the sum subquery.
Here is your final query:
SELECT tmp1.purchase_order_id, tmp1.item_id, tmp1.ordered_quantity, tmp1.totalReceived, tmp2.quantity
FROM(
SELECT tmp.delivery_order_id, pod.purchase_order_id, pod.item_id, pod.ordered_quantity, tmp.totalReceived
FROM purchase_order_detail pod
JOIN(
SELECT do.delivery_order_id, tmp.purchase_order_detail_id, tmp.totalReceived
FROM delivery_order do
JOIN(
SELECT do.purchase_order_detail_id, SUM(do.recieved_quantity) AS totalReceived
FROM delivery_order do
GROUP BY do.purchase_order_detail_id) tmp ON tmp.purchase_order_detail_id = do.purchase_order_detail_id)
tmp ON tmp.purchase_order_detail_id = pod.purchase_order_detail_id) tmp1
JOIN(
SELECT s.delivery_order_id, SUM(quantity) AS quantity
FROM stock s
WHERE s.project_id = 1
GROUP BY s.delivery_order_id) tmp2 ON tmp2.delivery_order_id = tmp1.delivery_order_id;
Here is the SQL Fiddle. It shows all of the intermediate steps too, if you'd like to see how the results came together individually.
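The idea of aggregating each table separately before joining can also be sketched in a slightly flatter form. The version below is a variant, not the query above: it sums stock per purchase_order_detail_id instead of per delivery_order_id, so each order line yields exactly one row. It runs on Python's built-in sqlite3 module as a stand-in for MySQL (an assumption), with rows copied from the question; the aliases rec, stk and d are hypothetical and the unused stock columns are left out.

```python
import sqlite3

# SQLite stands in for MySQL (an assumption); rows are copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchase_order_detail (purchase_order_detail_id INTEGER,
        purchase_order_id INTEGER, item_id INTEGER, ordered_quantity INTEGER);
    CREATE TABLE delivery_order (delivery_order_id INTEGER,
        purchase_order_detail_id INTEGER, recieved_quantity INTEGER);
    CREATE TABLE stock (stock_id INTEGER, delivery_order_id INTEGER,
        project_id INTEGER, quantity INTEGER);
""")
conn.executemany("INSERT INTO purchase_order_detail VALUES (?, ?, ?, ?)", [
    (61, 54, 279, 500), (62, 54, 286, 700), (63, 56, 279, 43),
    (64, 57, 279, 43), (65, 57, 286, 43), (66, 57, 287, 43)])
conn.executemany("INSERT INTO delivery_order VALUES (?, ?, ?)", [
    (62, 61, 250), (63, 62, 300), (64, 63, 34), (65, 64, 34),
    (66, 65, 34), (67, 66, 34), (68, 61, 34), (69, 61, 34)])
conn.executemany("INSERT INTO stock VALUES (?, ?, ?, ?)", [
    (12, 62, 1, 60), (13, 63, 1, 120), (14, 63, 1, 50),
    (15, 64, 1, 12), (16, 62, 1, 120), (17, 62, 1, 12)])

rows = conn.execute("""
    SELECT pod.purchase_order_id, pod.item_id, pod.ordered_quantity,
           rec.totalReceived, stk.quantity
    FROM purchase_order_detail pod
    -- total received per order line, aggregated before joining
    JOIN (SELECT purchase_order_detail_id, SUM(recieved_quantity) AS totalReceived
          FROM delivery_order
          GROUP BY purchase_order_detail_id) rec
      ON rec.purchase_order_detail_id = pod.purchase_order_detail_id
    -- stock for project 1 per order line, also aggregated before joining
    JOIN (SELECT d.purchase_order_detail_id, SUM(s.quantity) AS quantity
          FROM stock s
          JOIN delivery_order d ON d.delivery_order_id = s.delivery_order_id
          WHERE s.project_id = 1
          GROUP BY d.purchase_order_detail_id) stk
      ON stk.purchase_order_detail_id = pod.purchase_order_detail_id
    ORDER BY pod.purchase_order_id, pod.item_id
""").fetchall()
for row in rows:
    print(row)
```

On the sample data this gives three rows, for (54, 279), (54, 286) and (56, 279). The received total for purchase_order_detail_id 61 comes out as 318 (250 + 34 + 34), not the 314 shown in the expected output of the question.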
Try modifying your query to use DISTINCT and OUTER JOINs instead of cartesian ("comma") joins.
SELECT DISTINCT po.created_on
, po.purchase_order
, i.item_name
, u.unit_name
, pod.ordered_quantity
, do.recieved_quantity
, do.recieved_on
, po.remarks
FROM purchase_order po
LEFT JOIN purchase_order_detail pod USING (purchase_order_id)
LEFT JOIN delivery_order do USING (purchase_order_detail_id)
LEFT JOIN stock s USING (delivery_order_id)
LEFT JOIN item i USING (item_id)
LEFT JOIN unit u USING (unit_id)
ORDER BY po.purchase_order_id
, pod.item_id
;

Using a calendar table to interpolate values across a date range

BACKGROUND
I am working on a project where I need to capture the 30 day average of values for some id# then use this average to determine if some new value is anomalous. For the purposes of this question, we can assume I only need a 10-day average since the solutions are probably similar. I currently have two tables: history which holds the actual values that I have recorded for specific id# numbers by day but can have some missing days and calendar a date table that has all of the days that I need in my 30 day average.
create table history (
day date not null,
id bigint not null,
category int not null,
value int not null default '0',
primary key (day, id, category),
key category (category)
);
create table calendar (
day date not null primary key
);
I would like to take the existing data that I have in the history table and fill in the missing data by either copying forward a previous value or copying back a forward value. E.g given this data in the history table:
+------------+-----------+----------+-------+
| day | id | category | value |
+------------+-----------+----------+-------+
| 2015-02-19 | 159253663 | 364 | 212 |
| 2015-02-20 | 159253663 | 364 | 211 |
| 2015-02-22 | 159253663 | 364 | 199 |
| 2015-02-23 | 159253663 | 364 | 192 |
| 2015-02-24 | 159253663 | 364 | 213 |
+------------+-----------+----------+-------+
Note: there is no entry for 2015-02-21
I would like to fill in enough data so that I can compute the 10-day average i.e. copy the oldest value (2015-02-19) back to the beginning of my 10-day range then fill in the missing 2015-02-21 value with the previous day's value. The result would be this (stars mark the newly added rows):
+------------+-----------+----------+-------+
| day | id | category | value |
+------------+-----------+----------+-------+
| 2015-02-14 | 159253663 | 364 | 212 | *
| 2015-02-15 | 159253663 | 364 | 212 | *
| 2015-02-16 | 159253663 | 364 | 212 | *
| 2015-02-17 | 159253663 | 364 | 212 | *
| 2015-02-18 | 159253663 | 364 | 212 | *
| 2015-02-19 | 159253663 | 364 | 212 |
| 2015-02-20 | 159253663 | 364 | 211 |
| 2015-02-21 | 159253663 | 364 | 211 | *
| 2015-02-22 | 159253663 | 364 | 199 |
| 2015-02-23 | 159253663 | 364 | 192 |
| 2015-02-24 | 159253663 | 364 | 213 |
+------------+-----------+----------+-------+
ATTEMPT
My initial thought was to left join to a calendar table that has the date ranges I need, when I do that I get something like this:
select c.day, h.id, h.value
from calendar c
left join history h using (day)
where c.day between curdate() - interval 10 day and curdate();
+------------+-----------+----------+-----------+
| day | id | category | value |
+------------+-----------+----------+-----------+
| 2015-02-14 | NULL | NULL | NULL |
| 2015-02-15 | NULL | NULL | NULL |
| 2015-02-16 | NULL | NULL | NULL |
| 2015-02-17 | NULL | NULL | NULL |
| 2015-02-18 | NULL | NULL | NULL |
| 2015-02-19 | 159253663 | 364 | 212 |
| 2015-02-19 | 159253690 | 364 | 222 |
| 2015-02-20 | 159253663 | 364 | 211 |
| 2015-02-20 | 159253690 | 364 | 221 |
| 2015-02-21 | NULL | NULL | NULL |
| 2015-02-22 | 159253663 | 364 | 199 |
| 2015-02-22 | 159253690 | 364 | 209 |
| 2015-02-23 | 159253663 | 364 | 192 |
| 2015-02-23 | 159253690 | 364 | 202 |
| 2015-02-24 | 159253663 | 364 | 213 |
| 2015-02-24 | 159253690 | 364 | 213 |
+------------+-----------+----------+-----------+
I am not sure where to proceed from this point, because I need an entry for each day for each distinct id#. This join only returns a single row for a missing day. I am looking for a better approach. I would like to push as much of the work as possible onto the MySQL server, but can do some things programmatically. Any/all ideas or suggestions are welcome.
Here is a SQLFiddle that has the DDL definitions I am testing with: http://sqlfiddle.com/#!2/cc206/2
The following uses @ user variables and in-statement assignments to roll backward the value (and id):
SET @lastval = 0, @lastid = 0;
SELECT c.day, @lastid := COALESCE(h.id,@lastid) id, @lastval := COALESCE(h.value,@lastval) VALUE, h.id id1,h.value v1
FROM (SELECT DISTINCT c.day,h.id FROM history h, calendar c) c
LEFT JOIN history h ON h.day = c.day AND h.id = c.id
WHERE c.day BETWEEN CURDATE() - INTERVAL 10 DAY AND CURDATE()
ORDER BY COALESCE(h.id,@lastid),c.day DESC
The sub-query seems to be necessary; I have never been too sure why (some queries need it, some don't).
If it looks like the results are in the wrong order, you might have to add:
SET optimizer_switch='block_nested_loop=off';
before the statement, as the block nested loop optimisation can change the order in which MySQL collects the rows.
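If relying on user variables and row order feels fragile, the fill rule from the question can also be written with correlated subqueries. This is a different technique from the one above, shown only as a sketch: take the most recent recorded value on or before each calendar day, and fall back to the earliest later value for days before the first record. It runs on Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, with dates stored as ISO text and a fixed date range instead of CURDATE()); the alias k is hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL (an assumption); dates are stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (day TEXT NOT NULL, id INTEGER NOT NULL,
        category INTEGER NOT NULL, value INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (day, id, category));
    CREATE TABLE calendar (day TEXT NOT NULL PRIMARY KEY);
""")
conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?)", [
    ("2015-02-19", 159253663, 364, 212), ("2015-02-20", 159253663, 364, 211),
    ("2015-02-22", 159253663, 364, 199), ("2015-02-23", 159253663, 364, 192),
    ("2015-02-24", 159253663, 364, 213)])
conn.executemany("INSERT INTO calendar VALUES (?)",
                 [(f"2015-02-{d:02d}",) for d in range(14, 25)])

filled = conn.execute("""
    SELECT c.day, k.id, k.category,
           COALESCE(
               -- most recent recorded value on or before this day
               (SELECT h.value FROM history h
                WHERE h.id = k.id AND h.category = k.category AND h.day <= c.day
                ORDER BY h.day DESC LIMIT 1),
               -- otherwise copy back the earliest later value
               (SELECT h.value FROM history h
                WHERE h.id = k.id AND h.category = k.category AND h.day > c.day
                ORDER BY h.day ASC LIMIT 1)
           ) AS value
    FROM calendar c
    CROSS JOIN (SELECT DISTINCT id, category FROM history) k
    WHERE c.day BETWEEN '2015-02-14' AND '2015-02-24'
    ORDER BY k.id, k.category, c.day
""").fetchall()
values = [row[3] for row in filled]
print(values)
```

For the sample id this produces one row per day from 2015-02-14 to 2015-02-24, with 212 copied back to the first five days and 211 carried forward to 2015-02-21, matching the desired table in the question. The cross join guarantees a row for every (id, category) pair on every day.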

MySQL select one field from an Attributes table WHERE condition is in multiple rows

I started here:
MySQL select one field from table WHERE condition is in multiple rows
This works fine - thank you!
The additional complexity is that I need to search within multiple attributes in a single search.
Here's a data snapshot. The attribute_ids are:
1 - language
18 - phone1
19 - phone2
20 - phone3
Sample data
+-----+------------+--------------+------------------------+
| id | contact_id | attribute_id | stored_attribute_value |
+-----+------------+--------------+------------------------+
| 15 | 1 | 1 | english |
| 83 | 5 | 1 | english |
| 153 | 9 | 1 | english |
| 197 | 11 | 1 | english |
| 250 | 3 | 1 | english |
| 267 | 13 | 1 | tagalog |
| 303 | 15 | 1 | spanish |
| 374 | 19 | 1 | spanish |
| 469 | 17 | 1 | spanish |
| 490 | 21 | 1 | spanish |
| 507 | 7 | 1 | english |
| 9 | 1 | 18 | 983-296-3660 |
| 77 | 5 | 18 | 123-300-3985 |
| 147 | 9 | 18 | 215-857-7105 |
| 191 | 11 | 18 | 123-216-8501 |
| 244 | 3 | 18 | 478-786-4450 |
| 261 | 13 | 18 | 802-118-7211 |
| 297 | 15 | 18 | 998-370-4612 |
| 367 | 19 | 18 | 203-435-4023 |
| 463 | 17 | 18 | 945-519-5355 |
| 481 | 21 | 18 | 425-675-8912 |
| 501 | 7 | 18 | 123-712-6946 |
| 11 | 1 | 19 | 123-653-3722 |
| 79 | 5 | 19 | 396-609-5772 |
| 149 | 9 | 19 | 261-899-1470 |
| 193 | 11 | 19 | 673-452-9545 |
| 246 | 3 | 19 | 760-700-5826 |
| 263 | 13 | 19 | 123-701-7931 |
| 299 | 15 | 19 | 123-445-5874 |
| 369 | 19 | 19 | 711-657-8183 |
| 465 | 17 | 19 | 123-130-2816 |
| 483 | 21 | 19 | 123-391-1234 |
| 503 | 7 | 19 | 123-568-1263 |
| 485 | 21 | 20 | 123-428-6610 |
+-----+------------+--------------+------------------------+
So if I were to search for all contacts with language 'english' and phone1 like '123%', the query would be:
SELECT `contact_id`
FROM (`contact_attribute_value`)
WHERE (`attribute_id` = '18' AND `stored_attribute_value` LIKE '123%')
OR (`attribute_id` = '1' AND `stored_attribute_value` = 'english')
GROUP BY `contact_id` HAVING COUNT(*) = 2
And I would get 3 results returned: 5, 7, and 11 which is correct.
The challenge is that I want to create a generic phone field in the search interface so that if a user searches for a phone number, they search all three phone fields simultaneously.
So, I wrote the following query:
SELECT `contact_id`
FROM (`contact_attribute_value`)
WHERE (`attribute_id` = '18' AND `stored_attribute_value` LIKE '123%')
OR (`attribute_id` = '19' AND `stored_attribute_value` LIKE '123%')
OR (`attribute_id` = '20' AND `stored_attribute_value` LIKE '123%')
OR (`attribute_id` = '1' AND `stored_attribute_value` = 'english')
GROUP BY `contact_id` HAVING COUNT(*) = 2
Conceptually that works, but there are conditions in which it breaks.
The first condition is when a contact has language of 'english' and two phone numbers that match like '123%'. Here the contact gets a count of 3 and does not show up in the results.
The second condition is when a contact has a language not equal to 'english' and also has two phone numbers that match like '123%'. In this case, the contact gets a count of 2 and shows up in the results, but that's not what is desired.
I'm sure there's a "hard coded" way of trapping for these conditions in this scenario, but the set of attributes and possible searches is quite large so I need a generalizable solution.
Thanks in advance!
Just taking into account your example and the conditions you explained, I would say that the easiest way to solve this without doing two queries, is with a subquery:
SELECT DISTINCT contact_id
FROM contact_attribute_value AS c
WHERE c.attribute_id IN (18, 19, 20)
AND c.stored_attribute_value LIKE '123%'
AND EXISTS (SELECT *
FROM contact_attribute_value AS c1
WHERE c1.contact_id = c.contact_id AND c1.attribute_id = 1
AND c1.stored_attribute_value = 'english')
With this query, we first check whether the contact has any number that starts with 123, and then the subquery checks whether the contact has english as their language.
The DISTINCT keyword removes duplicates, so there is no need for grouping anymore.
If I understand you correctly try
SELECT c1.contact_id
FROM contact_attribute_value c1
LEFT JOIN contact_attribute_value c2
ON c1.contact_id = c2.contact_id
AND c2.attribute_id = '18'
LEFT JOIN contact_attribute_value c3
ON c1.contact_id = c3.contact_id
AND c3.attribute_id = '19'
LEFT JOIN contact_attribute_value c4
ON c1.contact_id = c4.contact_id
AND c4.attribute_id = '20'
WHERE c1.attribute_id = '1' AND c1.stored_attribute_value = 'english'
AND (c2.stored_attribute_value LIKE '123%'
OR c3.stored_attribute_value LIKE '123%'
OR c4.stored_attribute_value LIKE '123%')
SQLFiddle
UPDATE Improved version with HAVING that uses conditional count
SELECT `contact_id`
FROM `contact_attribute_value`
GROUP BY `contact_id`
HAVING SUM(CASE WHEN `attribute_id` = '1'
AND `stored_attribute_value` = 'english' THEN 1 ELSE 0 END) = 1
AND (SUM(CASE WHEN `attribute_id` = '18'
AND `stored_attribute_value` LIKE '123%' THEN 1 ELSE 0 END)
+SUM(CASE WHEN `attribute_id` = '19'
AND `stored_attribute_value` LIKE '123%' THEN 1 ELSE 0 END)
+SUM(CASE WHEN `attribute_id` = '20'
AND `stored_attribute_value` LIKE '123%' THEN 1 ELSE 0 END)) > 0
;
SQLFiddle
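Since the question asks for something generalizable, one way to think about the conditional HAVING is as one aggregate per search criterion, each answering "does any row of this contact match?". The sketch below uses MAX over a boolean expression for that, which is a variation on the SUM(CASE ...) version above. It runs on Python's built-in sqlite3 module as a stand-in for MySQL (an assumption) with a subset of the question's sample rows, and compares the result with the COUNT(*) = 2 approach.

```python
import sqlite3

# SQLite stands in for MySQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE contact_attribute_value (id INTEGER, contact_id INTEGER,
                attribute_id INTEGER, stored_attribute_value TEXT)""")
# A subset of the question's sample rows (contacts 1, 5, 7, 9 and 21).
conn.executemany("INSERT INTO contact_attribute_value VALUES (?, ?, ?, ?)", [
    (15, 1, 1, "english"), (9, 1, 18, "983-296-3660"), (11, 1, 19, "123-653-3722"),
    (83, 5, 1, "english"), (77, 5, 18, "123-300-3985"), (79, 5, 19, "396-609-5772"),
    (507, 7, 1, "english"), (501, 7, 18, "123-712-6946"), (503, 7, 19, "123-568-1263"),
    (153, 9, 1, "english"), (147, 9, 18, "215-857-7105"), (149, 9, 19, "261-899-1470"),
    (490, 21, 1, "spanish"), (481, 21, 18, "425-675-8912"),
    (483, 21, 19, "123-391-1234"), (485, 21, 20, "123-428-6610"),
])

# Approach from the question: count matching rows and require exactly two.
count_based = [r[0] for r in conn.execute("""
    SELECT contact_id FROM contact_attribute_value
    WHERE (attribute_id IN (18, 19, 20) AND stored_attribute_value LIKE '123%')
       OR (attribute_id = 1 AND stored_attribute_value = 'english')
    GROUP BY contact_id HAVING COUNT(*) = 2
    ORDER BY contact_id""")]

# One MAX(...) per search criterion: each is 1 if any row of the contact matches.
per_criterion = [r[0] for r in conn.execute("""
    SELECT contact_id FROM contact_attribute_value
    GROUP BY contact_id
    HAVING MAX(attribute_id = 1 AND stored_attribute_value = 'english') = 1
       AND MAX(attribute_id IN (18, 19, 20) AND stored_attribute_value LIKE '123%') = 1
    ORDER BY contact_id""")]

print(count_based)
print(per_criterion)
```

The count-based query returns contacts 1, 5 and 21: it misses contact 7 (english with two matching phones) and wrongly includes contact 21 (spanish with two matching phones). The per-criterion query returns 1, 5 and 7. Adding another search field then means adding one more MAX(...) = 1 term to the HAVING clause.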