Sorry if my title is odd, but I'm not even sure how to word my problem in this description let alone a short title.
We have 1000 users. 400 of them are new. 500 of them have updated their profiles with the new fields we added. 100 have not updated their profiles.
When I try to pull data on a specific field I get 900 results.
Select j1.question, j1.response
FROM table1 t1
JOIN table2 j1 on t1.id_user = j1.iduser AND j1.idquestion IN (26)
This is missing the 100 users that haven't updated their profile using the new profile questions.
When I try to pull data on that specific field to include the old profile question that was similar I get 1500 results.
Select j1.question, j1.response
FROM table1 t1
JOIN table2 j1 on t1.id_user = j1.iduser AND (j1.idquestion IN (26) OR j1.idquestion IN (8))
This pulls the 900 results from 26 as well as the original 600 users result from 8.
So my question is, how do I only get the data of the idquestion IN (26) and then the 100 left over from idquestion IN (8)?
This will get you the 100 users that were 'missing' in your first query. I am not sure I understand what you want with idquestion 8.
SELECT t1.question, t1.response
FROM table1 t1
LEFT OUTER JOIN table2 j1 on t1.id_user = j1.iduser
AND j1.idquestion IN (26)
WHERE j1.iduser IS NULL
SELECT IF(q26.question IS NOT NULL, q26.question, q8.question) as question, IF(q26.response IS NOT NULL, q26.response, q8.response) as response
FROM table1 t1
LEFT JOIN table2 as q26
ON
(t1.id_user = q26.iduser
AND q26.idquestion = 26)
LEFT JOIN table2 as q8
ON
(t1.id_user = q8.iduser
AND q8.idquestion = 8);
This should work. Starting with table1 and left joining ensures you get one answer for each user. Joining q26 will join the q26 values if they exist and null otherwise. Joining q8 will do the same in additional columns.
You end up with table1 with some columns that only apply to question26 (or null), followed by columns that only apply to question 8 (or null). Then, if you use IF() in your selects, you can choose the right columns.
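To see the double LEFT JOIN fallback on something runnable, here is a minimal sketch using Python's sqlite3 module with an in-memory database. The three sample users and their answers are made up for illustration, and COALESCE is swapped in for MySQL's IF() because it expresses the same "first non-NULL value" choice and runs in both engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id_user INTEGER PRIMARY KEY);
    CREATE TABLE table2 (iduser INTEGER, idquestion INTEGER,
                         question TEXT, response TEXT);
    INSERT INTO table1 VALUES (1), (2), (3);
    -- hypothetical data: user 1 answered both questions,
    -- user 2 only the old one (8), user 3 only the new one (26)
    INSERT INTO table2 VALUES
        (1, 8,  'old', 'old answer 1'),
        (1, 26, 'new', 'new answer 1'),
        (2, 8,  'old', 'old answer 2'),
        (3, 26, 'new', 'new answer 3');
""")

# Prefer question 26 and fall back to question 8 only when 26 is missing.
fallback_rows = conn.execute("""
    SELECT t1.id_user,
           COALESCE(q26.question, q8.question) AS question,
           COALESCE(q26.response, q8.response) AS response
    FROM table1 t1
    LEFT JOIN table2 AS q26
           ON t1.id_user = q26.iduser AND q26.idquestion = 26
    LEFT JOIN table2 AS q8
           ON t1.id_user = q8.iduser AND q8.idquestion = 8
    ORDER BY t1.id_user
""").fetchall()

for row in fallback_rows:
    print(row)
```

Each user comes back exactly once here because the sample data holds at most one row per user and question; if a user could have several rows for the same question, the joins would multiply rows again.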
I want to count the number of times a behaviour happens in an hour before an event. This requires joining two tables. The first table has the time of the event and the time of an hour prior, the second table has each instance of a behaviour timestamped.
I am currently trying to join the tables with a left join, but because of the WHERE command used to specify to only count the behaviours within the hour period I think it is reducing the number of rows returned. I would like a count for the number of the behaviours in the hour prior to the event even if it is zero.
My code looks something like this:
SELECT Table1.*,
COUNT(Table2.`BehaviourTime`) AS CountBefore
FROM Table1
LEFT JOIN Table2 on Table1.`Date` = Table2.`Date` AND Table1.`GroupRef` =
Table2.`GroupRef`
WHERE Table2.`BehaviourTime` BETWEEN Table1.`HourBefore` AND
Table1.`EventTime`
GROUP BY `Date`, `GroupRef`, `Session`
I am expecting a count for each row of Table1 (as each row represents an event), but am getting fewer rows returned each time. I need to do similar variations of this for different behaviours.
Is there a problem with my code, or a simpler way for me to get the output required without losing data?
Thanks for your help!
UPDATE
Expected result (1917 rows returned):
Session Date GroupRef HourBefore EventTime CountBefore
A 1/1/19 XX 09:32:46 10:32:46 3
P 1/1/19 XX 15:55:02 16:55:02 0
A 4/1/19 XX 06:49:12 07:49:12 8
....
Actual result is returning 1306 rows, with no counts of zero.
Session Date GroupRef HourBefore EventTime CountBefore
A 1/1/19 XX 09:32:46 10:32:46 3
A 4/1/19 XX 06:49:12 07:49:12 8
.....
You should move the condition on the left-joined table into the ON clause:
SELECT Table1.*,
COUNT(Table2.`BehaviourTime`) AS CountBefore
FROM Table1
LEFT JOIN Table2 on Table1.`Date` = Table2.`Date` AND Table1.`GroupRef` =
Table2.`GroupRef`
AND Table2.`BehaviourTime` BETWEEN Table1.`HourBefore` AND Table1.`EventTime`
GROUP BY `Date`, `GroupRef`, `Session`
Otherwise, if you leave a condition on the left-joined table's columns in the WHERE clause, the join behaves like an INNER JOIN: unmatched rows have NULL in those columns, fail the condition, and are filtered out.
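The difference between filtering in WHERE and filtering in ON can be checked with a small sketch. This uses Python's sqlite3 module rather than MySQL, and the two events and three behaviour rows are invented sample data; only the placement of the BETWEEN condition changes between the two runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Session TEXT, Date TEXT, GroupRef TEXT,
                         HourBefore TEXT, EventTime TEXT);
    CREATE TABLE Table2 (Date TEXT, GroupRef TEXT, BehaviourTime TEXT);
    -- hypothetical events: session A has behaviours in its hour, session P has none
    INSERT INTO Table1 VALUES
        ('A', '2019-01-01', 'XX', '09:32:46', '10:32:46'),
        ('P', '2019-01-01', 'XX', '15:55:02', '16:55:02');
    INSERT INTO Table2 VALUES
        ('2019-01-01', 'XX', '09:40:00'),
        ('2019-01-01', 'XX', '10:00:00'),
        ('2019-01-01', 'XX', '10:30:00');
""")

# Same query twice; {placement} decides whether the time filter
# is applied after the join (WHERE) or as part of the join (AND ... in ON).
QUERY = """
    SELECT t1.Session, COUNT(t2.BehaviourTime) AS CountBefore
    FROM Table1 t1
    LEFT JOIN Table2 t2
           ON t1.Date = t2.Date AND t1.GroupRef = t2.GroupRef
    {placement} t2.BehaviourTime BETWEEN t1.HourBefore AND t1.EventTime
    GROUP BY t1.Date, t1.GroupRef, t1.Session
    ORDER BY t1.Session
"""

filter_in_where = conn.execute(QUERY.format(placement="WHERE")).fetchall()
filter_in_on = conn.execute(QUERY.format(placement="AND")).fetchall()

print(filter_in_where)  # session P is dropped
print(filter_in_on)     # session P is kept with a count of 0
```

With the condition in ON, session P survives the join as a single row of NULLs, and COUNT over a NULL column gives 0.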
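If CountBefore still shows no zero values, you could try an IFNULL check (although COUNT itself returns 0 rather than NULL when nothing matches, so this is usually unnecessary):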
SELECT Table1.*, ifnull(COUNT(Table2.`BehaviourTime`),0) AS CountBefore
FROM Table1
LEFT JOIN Table2 on Table1.`Date` = Table2.`Date`
AND Table1.`GroupRef` = Table2.`GroupRef`
AND Table2.`BehaviourTime`
BETWEEN Table1.`HourBefore` AND Table1.`EventTime`
GROUP BY `Date`, `GroupRef`, `Session`
Or try using SUM with a CASE expression that checks for NULL:
SELECT Table1.*
, sum(case when table2.`BehaviourTime` is null then 0 else 1 end ) AS CountBefore
FROM Table1
LEFT JOIN Table2 on Table1.`Date` = Table2.`Date`
AND Table1.`GroupRef` = Table2.`GroupRef`
AND Table2.`BehaviourTime`
BETWEEN Table1.`HourBefore` AND Table1.`EventTime`
GROUP BY `Date`, `GroupRef`, `Session`
I'm really struggling with this query and I hope somebody can help.
I am querying across multiple tables to get the dataset that I require. The following query is an anonymised version:
SELECT main_table.id,
sub_table_1.field_1,
main_table.field_1,
main_table.field_2,
main_table.field_3,
main_table.field_4,
main_table.field_5,
main_table.field_6,
main_table.field_7,
sub_table_2.field_1,
sub_table_2.field_2,
sub_table_2.field_3,
sub_table_3.field_1,
sub_table_4.field_1,
sub_table_4.field_2
FROM main_table
INNER JOIN sub_table_4 ON sub_table_4.id = main_table.id
INNER JOIN sub_table_2 ON sub_table_2.id = main_table.id
INNER JOIN sub_table_3 ON sub_table_3.id = main_table.id
INNER JOIN sub_table_1 ON sub_table_1.id = main_table.id
WHERE sub_table_4.field_1 = '' AND sub_table_4.field_2 = '0' AND sub_table_2.field_1 != ''
The query works. The problem is that sub_table_1 has a revision number (int 11). Currently I get duplicate records with different revision numbers and different versions of sub_table_1.field_1, which is to be expected, but I want to limit the result set to the latest revision number, giving me only the latest sub_table_1.field_1, and I really cannot figure it out!
Can anybody lend me a hand?
Many Thanks.
It's always important to remember that a JOIN can be on a subquery as well as a table. You could build a subquery that returns the results you want to see then, once you've got the data you want, join it in the parent query.
It's hard to 'tailor' an answer that's specific to your problem, as it's too obfuscated (as you admit) to know what the data and tables really look like, but as an example:
Say table1 has four fields: id, revision_no, name and stuff. You want to return a distinct list of name values, with their latest version of stuff (which, we'll pretend varies by revision). You could do this in isolation as:
select t.* from table1 t
inner join
(SELECT name, max(revision_no) maxr
FROM table1
GROUP BY name) mx
on mx.name = t.name
and mx.maxr = t.revision_no;
(Note: see fiddle at the end)
That would return each individual name with the latest revision of stuff.
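As a rough, runnable check of that idea, the sketch below loads a few invented rows into an in-memory SQLite database through Python's sqlite3 module and runs the same join against the MAX(revision_no) subquery. The names and the contents of stuff are placeholders, not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, revision_no INTEGER, name TEXT, stuff TEXT);
    -- hypothetical rows: 'alpha' has two revisions, 'beta' has one
    INSERT INTO table1 VALUES
        (1, 1, 'alpha', 'first draft'),
        (1, 2, 'alpha', 'second draft'),
        (2, 1, 'beta',  'only draft');
""")

# The subquery finds the highest revision per name; joining back on
# (name, revision_no) keeps only the row that carries that revision.
latest_rows = conn.execute("""
    SELECT t.name, t.revision_no, t.stuff
    FROM table1 t
    INNER JOIN (SELECT name, MAX(revision_no) AS maxr
                FROM table1
                GROUP BY name) mx
            ON mx.name = t.name
           AND mx.maxr = t.revision_no
    ORDER BY t.name
""").fetchall()

for row in latest_rows:
    print(row)
```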
Once you've got that nailed down, you could then swap out
INNER JOIN sub_table_1 ON sub_table_1.id = main_table.id
....with....
INNER JOIN (select t.* from table1 t
inner join
(SELECT name, max(revision_no) maxr
FROM table1
GROUP BY name) mx
on mx.name = t.name
and mx.maxr = t.revision_no) sub_table_1
ON sub_table_1.id = main_table.id
...which would allow a join with a recordset that is more tailored to that which you want to join (again, don't get hung up on the actual query I've used, it's just there to demonstrate the method).
There may well be more elegant ways to achieve this, but it's sometimes good to start with a simple solution that's easier to replicate, then simplify it once you've got the general understanding of the what and why nailed down.
Hope that helps - as I say, it's as specific as I could offer without having an idea of the real data you're using.
(for the sake of reference, here is a fiddle with a working version of the above example query)
In your case, where you only need one column from the table, make this a subquery in your select clause instead of a join. You get the latest revision by ordering by revision number descending and limiting the result to one row.
SELECT
main_table.id,
(
select sub_table_1.field_1
from sub_table_1
where sub_table_1.id = main_table.id
order by revision_number desc
limit 1
) as sub_table_1_field_1,
main_table.field_1,
...
FROM main_table
INNER JOIN sub_table_4 ON sub_table_4.id = main_table.id
INNER JOIN sub_table_2 ON sub_table_2.id = main_table.id
INNER JOIN sub_table_3 ON sub_table_3.id = main_table.id
WHERE sub_table_4.field_1 = ''
AND sub_table_4.field_2 = '0'
AND sub_table_2.field_1 != '';
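Here is a cut-down sketch of the correlated subquery, run through Python's sqlite3 module. It keeps only main_table and sub_table_1 (the other joins are left out for brevity), and the rows are invented. One behaviour worth noticing: unlike the original INNER JOIN, a main_table row with no revisions at all is still returned, with NULL in the subquery column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_table (id INTEGER PRIMARY KEY, field_1 TEXT);
    CREATE TABLE sub_table_1 (id INTEGER, revision_number INTEGER, field_1 TEXT);
    -- hypothetical rows: id 1 has three revisions, id 2 has one, id 3 has none
    INSERT INTO main_table VALUES (1, 'widget'), (2, 'gadget'), (3, 'gizmo');
    INSERT INTO sub_table_1 VALUES
        (1, 1, 'v1'), (1, 2, 'v2'), (1, 3, 'v3'),
        (2, 1, 'only');
""")

# For each main_table row, pick field_1 from its highest-numbered revision.
latest_field_rows = conn.execute("""
    SELECT main_table.id,
           (SELECT sub_table_1.field_1
            FROM sub_table_1
            WHERE sub_table_1.id = main_table.id
            ORDER BY revision_number DESC
            LIMIT 1) AS sub_table_1_field_1,
           main_table.field_1
    FROM main_table
    ORDER BY main_table.id
""").fetchall()

for row in latest_field_rows:
    print(row)
```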
Say I need to pull data from several tables like so:
item 1 - from table 1
item 2 - from table 1
item 3 - from table 1 - but select only max value of item 3 from table 1
item 4 - from table 2 - but select only max value of item 4 from table 2
My query is pretty simple:
select
a.item 1,
a.item 2,
b.item 3,
c.item 4
from table 1 a
left join (select b.key_item, max(item 3) from table 1, group by key_item) b on a.key_item = b.key_item
left join (select c.key_item, max(item 4) from table 2, group by key_item) c on c.key_item = a.key_item
I am not sure if my methodology of pulling just a single max item from a table is the most efficient. Assume both tables are over a million rows. My actual SQL runs forever using this setup.
EDIT: I changed the group by clause to reflect comments made. I hope it makes a bit of sense now?
Your best bet is to add an index on table1 and table2, as follows:
ALTER TABLE table1
ADD INDEX `GoodIndexName1` (`key_item`,`item3`);
ALTER TABLE table2
ADD INDEX `GoodIndexName2` (`key_item`,`item4`);
This will allow you to use queries as described in the MySQL documentation for finding the rows holding the group-wise maximum, which appears to be what you are looking for.
Your original (edited) query should work:
select
a.item1,
a.item2,
b.item3,
c.item4
from table1 a
LEFT OUTER JOIN (
SELECT
key_item,
MAX(item3) AS item3
FROM table1
GROUP BY key_item
) b
ON a.key_item = b.key_item
LEFT OUTER JOIN (
SELECT
key_item,
MAX(item4) AS item4
FROM table2
GROUP BY key_item
) c
ON c.key_item = a.key_item
and if that performs slowly after adding the indexes, try the following too:
SELECT
a.item1,
a.item2,
b.item3,
c.item4
FROM table1 a
LEFT OUTER JOIN table1 b
ON b.key_item = a.key_item
LEFT OUTER JOIN table1 larger_b
ON larger_b.key_item = b.key_item
AND larger_b.item3 > b.item3
LEFT OUTER JOIN table2 c
ON c.key_item = a.key_item
LEFT OUTER JOIN table2 larger_c
ON larger_c.key_item = c.key_item
AND larger_c.item4 > c.item4
WHERE larger_b.key_item IS NULL
AND larger_c.key_item IS NULL
(I have modified the table and column names only slightly, so that they conform to correct MySQL syntax. )
I work with queries that use the above structure all the time, and they perform very efficiently with indexes like the one I provided.
That said, usually I am using INNER JOINs on the b and c tables, but I don't see why your query should have any issues.
If you do experience performance problems still, report the data types of the key_item columns for each table, as if you try to join on different data types, you will generally get poor performance.
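To make the two approaches above easier to compare, here is a small sketch for a single table, run in SQLite through Python's sqlite3 module. The rows and the index name idx_key_item3 are made up, and SQLite's planner is not MySQL's, so this only shows that the GROUP BY form and the "no larger row exists" self-join return the same group-wise maxima. It says nothing about which is faster on a million rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (key_item INTEGER, item3 INTEGER);
    -- composite index in the same column order as the one suggested above
    CREATE INDEX idx_key_item3 ON table1 (key_item, item3);
    -- hypothetical rows
    INSERT INTO table1 VALUES (1, 10), (1, 30), (1, 20), (2, 5);
""")

# Approach 1: aggregate per key.
max_by_group = conn.execute("""
    SELECT key_item, MAX(item3) AS item3
    FROM table1
    GROUP BY key_item
    ORDER BY key_item
""").fetchall()

# Approach 2: keep a row only if no row with the same key has a larger item3.
max_by_antijoin = conn.execute("""
    SELECT b.key_item, b.item3
    FROM table1 b
    LEFT OUTER JOIN table1 larger_b
           ON larger_b.key_item = b.key_item
          AND larger_b.item3 > b.item3
    WHERE larger_b.key_item IS NULL
    ORDER BY b.key_item
""").fetchall()

print(max_by_group)
print(max_by_antijoin)
```

One difference to keep in mind: if two rows tie for the maximum within a key, the self-join form returns both of them, while the GROUP BY form returns one row per key.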
Update 1
I discovered when the wrong behaviour occurs. If the view is composed of two tables, only the fields from the first table have values inside the subquery. I don't know why, but if I change the JOIN order, it works. As soon as I try to match another field with the second table, it returns NULL again.
Update 2
I've created a working example here: http://sqlfiddle.com/#!2/d4eb97/1
Update 3
The same example works in a newer MySQL version (5.6.6) so maybe there is a bug in the 5.5 - http://sqlfiddle.com/#!9/4e140/2
I have a schema in which I ended up writing a query like this:
SELECT view.user,
(
SELECT tableA.user
FROM tableA
LEFT JOIN tableB ON tableA.id = tableB.tableA_id
WHERE tableA.user = view.user
LIMIT 1
) as b_user
FROM view
WHERE view.user = 1
What I'm doing here is simple:
1. Select two fields from view (view is a MySQL view, not a real table).
2. The second field is a subquery of:
2.1 The field user of the table tableA
2.2 Left join with the table tableB on the relational field (there are no rows in tableB yet)
2.3 Only where the tableA user is the same as in the view
2.4 Limit 1, just for this example
3. Limit results to user = 1
The strange thing here is that in some situations the field b_user is NULL, but the data is ok.
Any one of three changes makes it work:
fix 1
Hard-coding the user id makes it work:
SELECT view.user,
(
SELECT tableA.user
FROM tableA
LEFT JOIN tableB ON tableA.id = tableB.tableA_id
WHERE tableA.user = 1
LIMIT 1
) as b_user
FROM view
WHERE view.user = 1
fix 2
Removing the left join also makes it work:
SELECT view.user,
(
SELECT tableA.user
FROM tableA
WHERE tableA.user = view.user
LIMIT 1
) as b_user
FROM view
WHERE view.user = 1
fix 3
Another option is not to use the MySQL view:
SELECT view_table_a.user,
(
SELECT tableA.user
FROM tableA
LEFT JOIN tableB ON tableA.id = tableB.tableA_id
WHERE tableA.user = view_table_a.user
LIMIT 1
) as b_user
FROM view_table_a INNER JOIN view_table_b ON condition
WHERE view_table_a.user = 1
I am not able to reproduce this by recreating a new database schema manually; it only happens in my current setup, which I cannot expose here for security reasons.
Why does the subquery return NULL values? I need to make the first query work, since I can't use any of the three fixes.
Why have the subquery in the first place? I like subqueries, they are very handy things to have around. But they shouldn't be used if they don't have to be. Queries can get complicated enough with no help from us.
You are looking for a particular user from the main table (the fact that it is really a view is irrelevant) then using the same User value to join with TableA and then optionally joining to TableB using the ID value associated with that user:
select rs.Origin, a.Origin as Same_Origin
from requests_status rs
join assignments a
on a.employee = rs.employee
and a.origin = rs.origin
left join assignments_author aa
on aa.assignment = a.id
where rs.employee = 1;
Then I noticed that in your fiddles, you create the assignments_author table but never populate it. But that doesn't really matter because you left join to it. But you don't use any data from that table. So in actuality, you don't need that table in your query at all. Thus the equivalent query would be:
select rs.Origin, a.Origin as Same_Origin
from requests_status rs
join assignments a
on a.employee = rs.employee
and a.origin = rs.origin
where rs.employee = 1;
I don't know why you get a NULL in one but not the other. But since the query above returns the same answer in both fiddles and it is the expected results, my work here is finished.
I assume this is a bug, maybe this one (http://bugs.mysql.com/bug.php?id=52051) because the query fails in MySQL 5.5 (http://sqlfiddle.com/#!2/d4eb97/1) but works in 5.6 (http://sqlfiddle.com/#!9/4e140/2)
Got a query running one sum from a different table, which works perfectly (and obtained from this forum as well):
SELECT
R.REP_ID as repid, R.REP_DESBREV as repdesc,
IFNULL(SUM(RD.REPDATA_CANT), 0) as cant
FROM
REPUESTOS R
LEFT JOIN
REP_DATA RD ON RD.REPDATA_REPID = R.REP_ID
GROUP BY
RD.REPDATA_REPID
Now, the thing is that I'd like to add an extra column that obtains the total inventory (something like
IFNULL(SUM(I.INV_CANT), 0) as inv)
FROM table INVENTARIO I
WHERE I.INV_REPID = R.REP_ID
This value can be obtained by means of a JOIN, in the exact same way we got the first statement that works, but I have not found the way to include BOTH SUMS in just one query.
Any ideas? THANKS!
Try this query example
select
t1.id,
ifnull(sum(t1.my_column),0) as Sum1,
ifnull(other.total,0) as Sum2
from my_table as t1
left join (select id , sum(other_table_column) as total from other_table group by id) as other
on t1.id = other.id
group by t1.id
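A sketch of the same idea with the asker's table names, run in SQLite through Python's sqlite3 module: each detail table is summed in its own subquery before it is joined, so neither sum can be inflated by rows from the other table. The sample rows are invented, and the aliases d, i, cant_total and inv_total are placeholders. For contrast, the second query joins both detail tables directly and sums afterwards, which lets every REP_DATA row pair up with every INVENTARIO row for the same part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE REPUESTOS (REP_ID INTEGER PRIMARY KEY, REP_DESBREV TEXT);
    CREATE TABLE REP_DATA (REPDATA_REPID INTEGER, REPDATA_CANT INTEGER);
    CREATE TABLE INVENTARIO (INV_REPID INTEGER, INV_CANT INTEGER);
    -- hypothetical rows: part 1 has 2 data rows and 3 inventory rows,
    -- part 2 has no data rows and 1 inventory row
    INSERT INTO REPUESTOS VALUES (1, 'bolt'), (2, 'nut');
    INSERT INTO REP_DATA VALUES (1, 5), (1, 7);
    INSERT INTO INVENTARIO VALUES (1, 10), (1, 20), (1, 30), (2, 4);
""")

# Sum each detail table on its own, then join the one-row-per-part results.
pre_aggregated = conn.execute("""
    SELECT R.REP_ID,
           IFNULL(d.cant_total, 0) AS cant,
           IFNULL(i.inv_total, 0)  AS inv
    FROM REPUESTOS R
    LEFT JOIN (SELECT REPDATA_REPID, SUM(REPDATA_CANT) AS cant_total
               FROM REP_DATA
               GROUP BY REPDATA_REPID) d
           ON d.REPDATA_REPID = R.REP_ID
    LEFT JOIN (SELECT INV_REPID, SUM(INV_CANT) AS inv_total
               FROM INVENTARIO
               GROUP BY INV_REPID) i
           ON i.INV_REPID = R.REP_ID
    ORDER BY R.REP_ID
""").fetchall()

# Join both detail tables directly, then sum: rows multiply before summing.
joined_then_summed = conn.execute("""
    SELECT R.REP_ID,
           IFNULL(SUM(RD.REPDATA_CANT), 0) AS cant,
           IFNULL(SUM(I.INV_CANT), 0)      AS inv
    FROM REPUESTOS R
    LEFT JOIN REP_DATA RD ON RD.REPDATA_REPID = R.REP_ID
    LEFT JOIN INVENTARIO I ON I.INV_REPID = R.REP_ID
    GROUP BY R.REP_ID
    ORDER BY R.REP_ID
""").fetchall()

print(pre_aggregated)
print(joined_then_summed)
```

In this sample, part 1 really has 12 units in REP_DATA and 60 in INVENTARIO. The direct join produces 2 x 3 = 6 rows for it, so its sums come out as 36 and 120.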
I just implemented the following statement:
SELECT
R.REP_PARNUM as parnum,
R.REP_ID as repid,
R.REP_DESBREV as repdesc,
IFNULL(SUM(RD.REPDATA_CANT),0) as cant,
IFNULL(SUM(I.INV_CANT),0) as intt
FROM REPUESTOS R
LEFT JOIN REP_DATA RD ON RD.REPDATA_REPID = R.REP_ID
LEFT JOIN INVENTARIO I ON I.INV_REPID = R.REP_ID
GROUP BY R.REP_PARNUM
It actually runs, but the problem is that it's giving me some weird values. For example, some of the values in the first sum column (REPDATA_CANT) are shown with -3, i.e. if the real result is 20, it shows 17 and so on. In the second sum column (INV_CANT), it actually MULTIPLIES (in some rows, not all) the value by 3. Very weird behaviour, do you have an idea why?