mysql query returns an empty result set - mysql

I have been tasked with a query I am having problems with. Here is the query:
Given a user id and a month, produce a list containing student name, list of files they own (largest to smallest) including total number of files and number of bytes used in a month specified.
Here is what I have so far:
(Select * from htmp_cs368
Join roster_cs368 ON htmp_cs368.userId =
roster_cs368.lastName Where htmp_cs368.userId =
(SELECT lastName FROM roster_cs368 WHERE userId = 'userId' AND htmp_cs368.monthIn = 'monthIn'))
UNION
(Select * from atmp_cs368
JOIN roster_cs368 ON atmp_cs368.userId =
roster_cs368.userId Where roster_cs368.userId =
'userId' AND atmp_cs368.monthIn = 'monthIn') ORDER BY fileSize DESC;
I am getting an empty result set, even though my tables are populated. I am hoping someone can correct my mistakes.
I have included my schema:
mysql> select * from roster_cs368
-> ;
+--------+-----------+-----------+
| userId | firstName | lastName |
+--------+-----------+-----------+
| apn7cf | Allen | Newton |
| atggg3 | andrew | goebel |
Primary key is userId
mysql> select * from htmp_cs368;
+------------+----------+------------+----------+----------+-------+------+-------+----------------------+
| filePerms | numLinks | userId | idGroup | fileSize | monthIn | day | time | fileName |
+------------+----------+------------+----------+----------+-------+------+-------+----------------------+
| drwx------ | 2 | schulte | faculty | 289 | Nov | 7 | 2011 | Java |
| -rw-r--r-- | 1 | schulte | faculty | 136 | Apr | 29 | 2012 | LD |
| drwxr-xr-x | 3 | schulte | faculty | 177 | Mar | 20 | 2012 | Upgrade |
No primary key here
select * from atmp_cs368;
+------------+----------+--------------+----------+----------+-------+------+-------+-----------------------------+
| filePerms | numLinks | userId | idGroup | fileSize | monthIn | day | time | fileName |
+------------+----------+--------------+----------+----------+-------+------+-------+-----------------------------+
| drwxr-xr-x | 2 | remierm | 203 | 245 | Sep | 17 | 14:40 | 148360_sun_studio_12 |
| drwx---rwx | 31 | antognolij | sasl | 2315 | Oct | 24 | 12:28 | 275 |
| -rwx------ | 1 | kyzvdb | student | 36 | Sep | 19 | 13:05 | 275hh |
No primary key here either.
I have had very little experience with mysql. I also have to come up with:
If no user id is specified, all files, if no month specified, all users and if neither specified, all months and users.
I am stuck and at a loss. I appreciate any help! Thanks!

You seem to have a number of problems in the SQL.
First
Join roster_cs368 ON htmp_cs368.userId = roster_cs368.lastName
You try to join the userId field to the lastName field, which definitely won't work. It should be userId in both tables.
Then
WHERE userId = 'userId' AND htmp_cs368.monthIn = 'monthIn'
Assuming those really are literal strings, they won't match anything in the table. You need to use a parameterized query, and substitute question marks in the SQL, as in
WHERE userId = ? AND htmp_cs368.monthIn = ?
and provide the actual values to be used in the Java code.
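As a rough illustration of the placeholder approach, here is a minimal sketch in Python using the standard sqlite3 module as a stand-in for MySQL (the answer mentions Java, where a PreparedStatement plays the same role). The table and column names come from the question; the helper files_for and the file rows inserted for apn7cf are hypothetical, made up only so the query has something to match.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE roster_cs368 (userId TEXT PRIMARY KEY, firstName TEXT, lastName TEXT);
    CREATE TABLE htmp_cs368 (userId TEXT, fileSize INTEGER, monthIn TEXT, fileName TEXT);
    INSERT INTO roster_cs368 VALUES ('apn7cf', 'Allen', 'Newton');
    -- Hypothetical file rows, invented for this sketch.
    INSERT INTO htmp_cs368 VALUES ('apn7cf', 289, 'Nov', 'Java'),
                                  ('apn7cf', 136, 'Apr', 'LD'),
                                  ('apn7cf', 177, 'Nov', 'Upgrade');
""")

def files_for(user_id, month):
    """Hypothetical helper: list one user's files for one month, largest first."""
    sql = """
        SELECT r.lastName, t.fileName, t.fileSize
        FROM htmp_cs368 t JOIN roster_cs368 r ON t.userId = r.userId
        WHERE t.userId = ? AND t.monthIn = ?
        ORDER BY t.fileSize DESC
    """
    # The driver substitutes the tuple values for the two question marks.
    return conn.execute(sql, (user_id, month)).fetchall()

literal_rows = files_for("userId", "monthIn")  # the literal strings from the question
real_rows = files_for("apn7cf", "Nov")         # actual values
print(literal_rows)
print(real_rows)
```

Passing the literal strings 'userId' and 'monthIn' matches nothing, which is consistent with the empty set described in the question.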
I think you're looking for something along these lines (untested, but this will give you a starting point)
List of files
select r.lastName, r.firstName, t.fileName, t.fileSize
from htmp_cs368 t join roster_cs368 r on t.userId=r.userId
where t.userId=? and t.monthIn=?
order by fileSize desc
Summary:
select r.lastName, r.firstName, count(t.fileName), sum(t.fileSize)
from htmp_cs368 t join roster_cs368 r on t.userId=r.userId
where t.userId=? and t.monthIn=?
group by t.userId
This is a simple approach that does not take into account files appearing and disappearing during a month, but you don't seem to have data in your tables for this.
Also, it's not clear what atmp_cs368 is for, or why the time column in one table seems to have year values.

As pointed out by others, you seem to have a number of problems in your SQL. I don't think it will even compile.
Try:
SELECT r.userId, files.*
FROM roster_cs368 AS r
JOIN (
Select * from htmp_cs368 WHERE userId = ? AND monthIn = ?
UNION
Select * from atmp_cs368 Where userId = ? AND monthIn = ?
) AS files ON files.userId = r.userId
ORDER BY files.fileSize DESC;
You need only one JOIN. This lists users and all their files. And take care to equate apples to apples (userId != lastName).
Now, to get the count of files, the total file size, and so on, you effectively need a GROUP BY. But you cannot "easily" list the files and get the count of files in the same query. It will have to be one way or the other. For just the count, you can use Jim's solution.

This JOIN looks a tad suspicious...
JOIN roster_cs368 ON htmp_cs368.userId = roster_cs368.lastName
Even if userId in htmp_cs368 has an equivalent value in the lastName column of roster_cs368, this is very bad form. JOINS should typically be done on like-named columns.
If these two columns are unrelated (it's hard to tell when roster_cs368 also has a userId column), then that would be at least part of your problem.
Also, htmp_cs368.monthIn = 'monthIn' doesn't make sense. This won't match anything in that column either.

Related

"GROUP BY" on MariaDB behaves differently from MySQL

I have been told many times that the same queries will work on MariaDB just as they do on MySQL... until I ran into this problem.
Recently, I have been trying to migrate an application from MySQL (InnoDB) to MariaDB (XtraDB).
Although MariaDB runs MySQL queries without needing any changes, I was surprised to discover that the same queries actually behave quite differently on the two platforms, particularly with ORDER BY and GROUP BY.
For an example:
MyTable
=======
+----+----------+---------------------+-----------+
| id | parentId | creationDate | name |
+----+----------+---------------------+-----------+
| 1 | 2357 | 2017-01-01 06:03:40 | Anna |
+----+----------+---------------------+-----------+
| 2 | 5480 | 2017-01-02 07:13:20 | Becky |
+----+----------+---------------------+-----------+
| 3 | 2357 | 2017-01-03 08:20:12 | Christina |
+----+----------+---------------------+-----------+
| 4 | 2357 | 2017-01-03 08:20:15 | Dorothy |
+----+----------+---------------------+-----------+
| 5 | 5480 | 2017-01-04 09:25:45 | Emma |
+----+----------+---------------------+-----------+
| 6 | 1168 | 2017-01-05 10:30:10 | Fiona |
+----+----------+---------------------+-----------+
| 7 | 5480 | 2017-01-05 10:33:23 | Gigi |
+----+----------+---------------------+-----------+
| 8 | 1168 | 2017-01-06 12:46:34 | Heidi |
+----+----------+---------------------+-----------+
| 9 | 1168 | 2017-01-06 12:46:34 | Irene |
+----+----------+---------------------+-----------+
| 10 | 2357 | 2017-01-07 14:58:37 | Jane |
+----+----------+---------------------+-----------+
| 11 | 2357 | 2017-01-07 14:58:37 | Katy |
+----+----------+---------------------+-----------+
Basically what I want to get from a query is the latest records from each GROUPing (i.e. parentId). By latest, I mean MAX(creationDate) and MAX(id)
So, for the above example, since there are only three different parentId values, I am hoping to get:
+----+----------+---------------------+-----------+
| id | parentId | creationDate | name |
+----+----------+---------------------+-----------+
| 11 | 2357 | 2017-01-07 14:58:37 | Katy |
+----+----------+---------------------+-----------+
| 9 | 1168 | 2017-01-06 12:46:34 | Irene |
+----+----------+---------------------+-----------+
| 7 | 5480 | 2017-01-05 10:33:23 | Gigi |
+----+----------+---------------------+-----------+
Originally the application has queries similar to this fashion:
SELECT * FROM
( SELECT * FROM `MyTable` WHERE `parentId` IN (...)
ORDER BY `creationDate` DESC, `id` DESC ) AS `t`
GROUP BY `parentId`;
On MySQL, this works, since the inner query will order and then the outer query gets the first of each GROUP from the result of the inner query. The outer query basically obeys ordering of the inner query.
But on MariaDB, the outer query will ignore the ordering of the inner query result. I get this on MariaDB instead:
+----+----------+---------------------+-----------+
| id | parentId | creationDate | name |
+----+----------+---------------------+-----------+
| 1 | 2357 | 2017-01-01 06:03:40 | Anna |
+----+----------+---------------------+-----------+
| 2 | 5480 | 2017-01-02 07:13:20 | Becky |
+----+----------+---------------------+-----------+
| 6 | 1168 | 2017-01-05 10:30:10 | Fiona |
+----+----------+---------------------+-----------+
To achieve the same behaviour on MariaDB, I have come up with something like this. (Not sure if this is accurate though.)
SELECT `t1`.* FROM `MyTable` `t1` LEFT JOIN `MyTable` `t2` ON (
`t1`.`parentId` = `t2`.`parentId`
AND `t2`.`parentId` IN (...)
AND `t1`.`creationDate` <= `t2`.`creationDate`
AND `t1`.`id` < `t2`.`id`
) WHERE `t2`.`id` IS NULL;
Now the problem is... If I am going to rewrite the queries, I have to rewrite hundreds of them, and they are all somehow a little bit different from each other.
I wonder if anyone here has any ideas that would allow me to make the fewest changes possible.
Thank you all in advance.
Yeah, this is a link-only answer. But the links are to the MariaDB site.
Here is another discussion of the 'incompatibility': https://mariadb.com/kb/en/mariadb/group-by-trick-has-been-optimized-away/
Technically speaking, MySQL implemented an extension to the ANSI standard. Much later, it decided to remove it, so I think you will find that MySQL has migrated toward MariaDB.
Here is list of "fast" ways to do group-wise max, which is probably what you are trying to do: https://mariadb.com/kb/en/mariadb/groupwise-max-in-mariadb/
Your first query would probably work in MySQL, but its behavior is not documented: you are grouping by parentId but selecting non-aggregated columns with *, and the value of any of those non-aggregated columns is undefined. If the value you get is the first value encountered, it's just a "matter of luck".
It is true that, even if it cannot be considered correct, I have never seen this "trick" fail on MySQL (and here on Stack Overflow there are plenty of upvoted answers suggesting you use it), but MariaDB uses a different optimizer and you cannot rely on undocumented MySQL behavior.
Your second query needs a little adjustment:
and (
`t1`.`creationDate` < `t2`.`creationDate`
or (
`t1`.`creationDate` = `t2`.`creationDate`
and `t1`.`id` < `t2`.`id`
)
)
because first you are ordering by creation date, then if more than one record share the same creation date you are getting the one with the highest id.
There are other ways to write the same query, e.g.
select * from mytable
where id in (
select max(m.id)
from mytable m inner join (
select parentID, max(creationDate) as max_cd
from mytable
group by ParentID
) t on m.parentID = t.parentID and m.creationDate = t.max_cd
group by m.parentID, m.creationDate
)
but every query needs to be rewritten separately.
Edit
Your example is a little more complicated because you are ordering by both creationDate and id. Let me explain better. First thing to do, for every parentID you have to get the last creationDate:
select parentID, max(creationDate) as max_cd
from MyTable
group by parentID
then for every max creationDate you have to get the highest id:
select t.parentID, t1.max_cd, max(t.id) as max_id
from
MyTable t inner join (
select parentID, max(creationDate) as max_cd
from MyTable
group by parentID
) t1 on t.parentID = t1.parentID and t.creationDate = t1.max_cd
group by t.parentID, t1.max_cd
then you have to get all records whose id is returned by this query. In this particular context, a LEFT JOIN of the table with itself should be easier to write and should perform better.
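To check the self-join variant with the adjusted tie-break condition against the sample data from the question, here is a small sketch using Python's sqlite3 module as a stand-in for MariaDB. The IN (...) filter from the question is left out because its values are elided there; everything else follows the table shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (id INTEGER PRIMARY KEY, parentId INTEGER, "
    "creationDate TEXT, name TEXT)"
)
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", [
    (1, 2357, "2017-01-01 06:03:40", "Anna"),
    (2, 5480, "2017-01-02 07:13:20", "Becky"),
    (3, 2357, "2017-01-03 08:20:12", "Christina"),
    (4, 2357, "2017-01-03 08:20:15", "Dorothy"),
    (5, 5480, "2017-01-04 09:25:45", "Emma"),
    (6, 1168, "2017-01-05 10:30:10", "Fiona"),
    (7, 5480, "2017-01-05 10:33:23", "Gigi"),
    (8, 1168, "2017-01-06 12:46:34", "Heidi"),
    (9, 1168, "2017-01-06 12:46:34", "Irene"),
    (10, 2357, "2017-01-07 14:58:37", "Jane"),
    (11, 2357, "2017-01-07 14:58:37", "Katy"),
])

# Keep a row only if no other row with the same parentId is "later":
# a later creationDate, or the same creationDate with a higher id.
latest = conn.execute("""
    SELECT t1.id, t1.parentId, t1.name
    FROM MyTable t1
    LEFT JOIN MyTable t2
      ON t1.parentId = t2.parentId
     AND (t1.creationDate < t2.creationDate
          OR (t1.creationDate = t2.creationDate AND t1.id < t2.id))
    WHERE t2.id IS NULL
    ORDER BY t1.parentId
""").fetchall()
print(latest)
```

This returns Irene, Katy and Gigi, the three rows the question asks for, and it does not depend on how the optimizer treats an ORDER BY inside a subquery.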

Issue with grouping?

I asked earlier about a solution to my problem which worked however now when I'm trying to get some information from a second table (that stores more information) I'm running into a few issues.
My tables are as follows
Users
+----+----------------------+---------------+------------------+
| id | username | primary_group | secondary_groups |
+----+----------------------+---------------+------------------+
| 1 | Username1 | 3 | 7,10 |
| 2 | Username2 | 7 | 3,5,10 |
| 3 | LongUsername | 1 | 3,7 |
| 4 | Username3 | 1 | 3,10 |
| 5 | Username4 | 7 | |
| 6 | Username5 | 5 | 3,7,10 |
| 7 | Username6 | 2 | 7 |
| 8 | Username7 | 4 | |
+----+----------------------+---------------+------------------+
Profile
+----+---------------+------------------+
| id | facebook | steam |
+----+---------------+------------------+
| 1 | 10049424151 | 11 |
| 2 | 10051277183 | 55 |
| 3 | 10051281183 | 751 |
| 4 | | 735 |
| 5 | 10051215770 | 4444 |
| 6 | 10020210531 | 50415 |
| 7 | 10021056938 | 421501 |
| 8 | 10011547143 | 761 |
+----+---------------+------------------+
My SQL is as follows (based off the previous thread)
SELECT u.id, u.username, p.id, p.facebook, p.steam
FROM users u, profile p
WHERE p.id=u.id AND FIND_IN_SET( '7', secondary_groups )
OR primary_group = 7
GROUP BY u.id
The problem is my output is displayed as below
+----+----------------------+-------------+-------+
| id | username | facebook | steam |
+----+----------------------+-------------+-------+
| 1 | Username1 | 10049424151 | 11 |
| 2 | Username2 | 10051277183 | 55 |
| 3 | LongUsername | 10051281183 | 751 |
| 4 | Username4 | 10051215770 | 4444 |
| 5 | Username5 | 10049424151 | 11 |
| 6 | Username6 | 10049424151 | 55 |
+----+----------------------+-------------+-------+
I'm guessing that the problem is that user rows with a primary_group of 7 are getting matched to all profile rows. Remove the GROUP BY, and you'll be able to better see what is happening.
But that's just a guess. It's not clear what you are attempting to achieve.
I suspect you are getting tripped up with the order of precedence of the AND and OR. (The AND operator has a higher order of precedence than OR operator. That means the AND will be evaluated before the OR.)
The quick fix is to just add some parens, to override the default order of operations. Something like this:
WHERE p.id=u.id AND ( FIND_IN_SET('7',secondary_groups) OR primary_group = 7 )
-- ^ ^
The parens will cause the OR operation to be evaluated (as either TRUE, FALSE or NULL) and then the result from that will be evaluated in the AND.
Without the parens, it's the same as if the parens were here:
WHERE ( p.id=u.id AND FIND_IN_SET('7',secondary_groups) ) OR primary_group = 7
-- ^ ^
The AND condition is evaluated first, and the result of that is then combined with the OR. This is what is causing user rows with a primary_group of 7 to be matched to profile rows with different id values.
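The effect of the parentheses can be reproduced with the sample tables from the question. The sketch below uses Python's sqlite3 module as a stand-in for MySQL; since SQLite has no FIND_IN_SET, a minimal Python replacement is registered under that name (an assumption of this sketch, not MySQL's own implementation), and only the steam column of profile is kept.

```python
import sqlite3

def find_in_set(needle, haystack):
    # Minimal stand-in for MySQL's FIND_IN_SET: 1-based position, or 0 if absent.
    items = haystack.split(",") if haystack else []
    return items.index(needle) + 1 if needle in items else 0

conn = sqlite3.connect(":memory:")
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.execute("CREATE TABLE users (id INTEGER, username TEXT, "
             "primary_group INTEGER, secondary_groups TEXT)")
conn.execute("CREATE TABLE profile (id INTEGER, steam INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", [
    (1, "Username1", 3, "7,10"), (2, "Username2", 7, "3,5,10"),
    (3, "LongUsername", 1, "3,7"), (4, "Username3", 1, "3,10"),
    (5, "Username4", 7, ""), (6, "Username5", 5, "3,7,10"),
    (7, "Username6", 2, "7"), (8, "Username7", 4, ""),
])
conn.executemany("INSERT INTO profile VALUES (?, ?)", [
    (1, 11), (2, 55), (3, 751), (4, 735),
    (5, 4444), (6, 50415), (7, 421501), (8, 761),
])

base = "SELECT u.id, p.id FROM users u, profile p WHERE "
# As written in the question: AND binds tighter than OR.
without_parens = conn.execute(
    base + "p.id = u.id AND FIND_IN_SET('7', secondary_groups) OR primary_group = 7"
).fetchall()
# With the OR grouped explicitly.
with_parens = conn.execute(
    base + "p.id = u.id AND (FIND_IN_SET('7', secondary_groups) OR primary_group = 7)"
).fetchall()

mismatched = [(u, p) for u, p in without_parens if u != p]
print(len(without_parens), len(with_parens), len(mismatched))
```

Without the parentheses, the two users whose primary_group is 7 are paired with every profile row, which is where the wrong facebook and steam values in the grouped output come from.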
A few pointers on style:
avoid the old-school comma operator for join operations, and use the newer JOIN syntax
place the join predicates (conditions) in the ON clause, other filtering criteria in the WHERE clause
qualify all column references
As an example:
SELECT u.id
, u.username
, p.id
, p.facebook
, p.steam
FROM users u
JOIN profile p
ON p.id = u.id
WHERE u.primary_group = 7
OR FIND_IN_SET('7',u.secondary_groups)
ORDER BY u.id
We only need a GROUP BY clause if we want to "collapse" rows. If the id column is unique in both the users and profile tables, then there's no need for a GROUP BY u.id. We can add an ORDER BY clause if we want rows returned in a particular sequence.
I don't know exactly what you want to do with the output, but you can't group information like this. MySQL isn't really a classic programming language; it's more like a powerful tool for set mathematics. So if you want to get information based on correlations between two or more tables, first write a SELECT statement that contains the raw data you want to work with, like this:
SELECT * FROM users u INNER JOIN profile p ON p.id=u.id
GROUP BY u.id;
Now you select the relevant data with a WHERE clause:
SELECT * FROM users u INNER JOIN profile p ON p.id=u.id WHERE
FIND_IN_SET( '7', secondary_groups ) OR primary_group = 7
GROUP BY u.id;
Now you should see grouped joined tables profile and users, and can start mining data. For example, if you want to count items in these groups, just add count function in SELECT and so on.
When debugging SQL, I highly recommend these steps:
1.) First, write down all correlations between the data and all foreign keys between tables, so you will know whether your selection is fully deterministic. You can then start JOINing tables from left to right.
2.) Try small pieces of the query on a model database. Then you will see which selections work and which don't do what you expected.
I think @SIDU has it in the comments: You are experiencing a Boolean order of operations problem. See also SQL Logic Operator Precedence: And and Or
For example:
SELECT 0 AND 0 OR 1 AS test;
+------+
| test |
+------+
| 1 |
+------+
When writing complex statements with both AND and OR, use parentheses. The operator order problem is leading you to do an unintended cross join that's being masked by your GROUP BY. You shouldn't need a GROUP BY for that statement.
Although I don't personally care for the style @spencer7593 suggests in his answer (using INNER JOIN, etc.), it does have the advantage of preventing or identifying errors early for people new to SQL, so it's something to consider.

Mysql query with multiple LEFT join issue

I want to get this kind of result from a mysql query:
Name | Asked Date | Granted Date | Duration
Joe | 2016-07-01 10:02:01 | 2016-07-01 10:02:05 | 10
Ben | 2016-07-01 10:04:24 | 2016-07-01 10:04:26 | 12
....
All entries are stored in a table that looks like this:
id | action_date | action_type | unique_instance | name
12 | 2016-07-01 10:02:01 | Asked | 6546532161654 | Joe
13 | 2016-07-01 10:02:06 | Granted | 6546532161654 | Joe
14 | 2016-07-01 10:05:12 | Asked | 6546532161654 | Ben
15 | 2016-07-01 10:05:15 | Granted | 6546532161654 | Ben
16 | 2016-07-01 10:06:06 | Finished | 6546532161654 | Joe
I've tried this query, but it didn't work:
Select table.name as Name,
table.action_date as Asked,
g.action_date as Granted,
TIMESTAMPDIFF(SECOND, g.action_date, q.action_date) as Duration
FROM table
LEFT JOIN table g ON table.unique_instance = g.unique_instance AND g.action_type = 'Granted'
LEFT JOIN table q ON table.unique_instance = q.unique_instance AND q.action_type = 'Finished'
WHERE table.action_type = 'Asked'
AND table.unique_instance = '6546532161654'
GROUP BY table.action_date;
As stated in the comments.
Your database design seems incorrect, as you are not even making use of your ID.
You can't identify which record is unique or which records are related to each other. You could still use the name, but it's not unique either, since in the future you will have more records with the same name.
My suggestion would be to make action_type and ID the primary key together. That way you can always reuse the same ID and change the action_type accordingly.
Example (note that you can't have more than 1 record containing this):
ID: 1, action_type: Asked
ID: 1, action_type: Granted
ID: 1, action_type: Finished
In order to debug your code, this query should return just 1 record:
select * from table where unique_instance = '6546532161654' AND action_type = 'Granted'
But since unique_instance is not unique after all, it is going to grab other unwanted data.
Note: Your query is correct.

List Last record of each item in mysql

Each item (an item is identified by Serial) in my table has many records, and I need to get the last record of each item, so I ran the code below:
SELECT ID,Calendar,Serial,MAX(ID)
FROM store
GROUP BY Serial DESC
I expected it to show one record per item, with every column taken from that item's last record, but the result looks like this:
-------------------------------------------------------------+
ID | Calendar | Serial | MAX(ID) |
-------------------------------------------------------------|
7031053 | 2016-05-14 14:05:14 79.5 | N10088 | 7031056 |
7053346 | 2016-05-14 15:17:28 79.8 | N10078 | 7053346 |
7051349 | 2016-05-14 15:21:29 86.1 | J20368 | 7051349 |
7059144 | 2016-05-14 15:50:27 89.6 | J20367 | 7059144 |
7045551 | 2016-05-14 15:15:15 89.2 | J20366 | 7045551 |
7056243 | 2016-05-14 15:25:34 85.2 | J20358 | 7056245 |
7042652 | 2016-05-14 15:18:33 83.9 | J20160 | 7042652 |
7039753 | 2016-05-14 11:48:16 87 | J20158 | 7039753 |
7036854 | 2016-05-14 15:18:35 87.5 | J20128 | 7036854 |
7033955 | 2016-05-14 15:20:45 83.4 | 9662 | 7033955 |
-------------------------------------------------------------+
the problem is why for example in record related to Serial N10088 the ID is "7031053", but MAX(ID) is "7031056"? or also for J20358?
each row must show last record of each item but in my output it is not true!
If you want the row with the max value, then you need a join or some other mechanism.
Here is a simple way using a correlated subquery:
select s.*
from store s
where s.id = (
select max(s2.id)
from store s2
where s2.serial = s.serial
);
Your query uses a (mis)feature of MySQL that generates lots of confusion and is not particularly helpful: you have columns in the select that are not in the group by. What value do these get?
Well, in most databases the answer is simple: the query generates an error as ANSI specifies. MySQL pulls the values for the additional columns from indeterminate matching rows. That is rarely what the writer of the query intends.
For performance, add an index on store(serial, id).
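A quick way to try the correlated subquery is sketched below with Python's sqlite3 module standing in for MySQL. The ids and serials are taken from the output in the question; the table is reduced to those two columns, and the index name idx_store_serial_id is made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the store table: only the columns the query needs.
conn.execute("CREATE TABLE store (id INTEGER PRIMARY KEY, serial TEXT)")
conn.executemany("INSERT INTO store VALUES (?, ?)", [
    (7031053, "N10088"), (7031056, "N10088"),
    (7056243, "J20358"), (7056245, "J20358"),
    (7053346, "N10078"),
])
# The suggested composite index; the name is hypothetical.
conn.execute("CREATE INDEX idx_store_serial_id ON store (serial, id)")

# For each row, keep it only if its id is the largest id for its serial.
last_rows = conn.execute("""
    SELECT s.id, s.serial
    FROM store s
    WHERE s.id = (SELECT MAX(s2.id) FROM store s2 WHERE s2.serial = s.serial)
    ORDER BY s.serial
""").fetchall()
print(last_rows)
```

Here the id returned for N10088 is 7031056 and for J20358 it is 7056245, so the whole row belongs to the same record as the maximum id.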
try this one.
SELECT MAX(id), tbl.*
FROM store tbl
GROUP BY Serial
You can try with this also...
SELECT ID,Calendar,Serial
FROM store s0
where ID = (
SELECT MAX(id)
FROM store s1
WHERE s1.serial = s0.serial
);

MySQL query - only exact result or every choice

I've a query that I need some help with -
As part of a form I've got a serial number field that is populated if there is a serial number, blank if it's not, or no result if it's an invalid serial number.
select *
from cust_site_contract as cs
where cs.serial_no = 'C20050' or (cs.serial_no <> 'C20050' and if(cs.serial_no = 'C20050',1,0)=0)
limit 10;
Here's a sample of the regular data:
+----------------------+-----------+-----------+-----------
| idcust_site_contract | system_id | serial_no | end_date
+----------------------+-----------+-----------+-----------
| 561315 | SH001626 | C19244 | 2009-12-21
| 561316 | SH001626 | C19244 | 2010-06-30
| 561317 | SH002125 | C19671 | 2010-05-31
| 561318 | SH001766 | C14781 | 2010-09-25
| 561319 | SH001766 | C14781 | 2011-02-15
| 561320 | SH002059 | C19020 | 2008-07-09
| 561321 | SH002639 | C18889 | 2008-03-31
| 561322 | SH002639 | C18889 | 2008-06-30
| 561323 | SH002715 | C20051 | 2010-04-30
| 561324 | SH002719 | C20057 | 2010-04-30
And an exact result would look something like this:
| 561487 | SH002837 | C20050 | 2012-07-04
I was writing this as a subquery so I could match the system_ids to customer and contract names, but realised I was getting garbage pretty early on.
I'm tempted to try and simplify it by saying the third case might not hold true (i.e. if it's an invalid serial number, allow the choice of any customer name and simply flag it in the data)
Has anyone got any ideas of where I'm going wrong? The combination of conditions is clearly wrong, and I can't work out how to make each side of the or statement mutually exclusive
Even if I try to evaluate only the if(sn = 'blah') I get the wrong result for obvious reasons, but can't think of a sane way to express it.
Many thanks
Scott
If there is no contract with a serial number of C20050, this query will return all rows; otherwise, it will return only the one row where serial_no is C20050:
SELECT a.*
FROM cust_site_contract a
INNER JOIN
(
SELECT COUNT(*) AS rowexists
FROM cust_site_contract
WHERE serial_no = 'C20050'
) b ON b.rowexists = 0
UNION ALL
(
SELECT *
FROM cust_site_contract
WHERE serial_no = 'C20050'
LIMIT 1
)
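The "exact match, otherwise everything" behaviour of this query can be tried out with the sketch below, which uses Python's sqlite3 module as a stand-in for MySQL. SQLite does not accept a parenthesised SELECT with LIMIT as a UNION member, so that branch is wrapped in a derived table instead; the helper contracts_for, the reduced column list, and the serial X99999 are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust_site_contract "
             "(idcust_site_contract INTEGER, system_id TEXT, serial_no TEXT)")
conn.executemany("INSERT INTO cust_site_contract VALUES (?, ?, ?)", [
    (561317, "SH002125", "C19671"),
    (561323, "SH002715", "C20051"),
    (561487, "SH002837", "C20050"),
])

def contracts_for(serial_no):
    """Hypothetical helper: one matching row if the serial exists, else all rows."""
    sql = """
        SELECT a.*
        FROM cust_site_contract a
        JOIN (SELECT COUNT(*) AS rowexists
              FROM cust_site_contract
              WHERE serial_no = ?) b ON b.rowexists = 0
        UNION ALL
        SELECT * FROM (SELECT * FROM cust_site_contract
                       WHERE serial_no = ? LIMIT 1)
    """
    return conn.execute(sql, (serial_no, serial_no)).fetchall()

exact = contracts_for("C20050")     # serial exists: only the second branch returns rows
fallback = contracts_for("X99999")  # made-up serial: only the first branch returns rows
print(exact)
print(len(fallback))
```

Only one branch of the UNION ALL can produce rows for a given serial, because the first branch is joined against a count that must be zero.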
If you just write the query as below, you will get an empty result if the serial number doesn't exist or is invalid.
select cs.serial_no from cust_site_contract as cs where cs.serial_no = 'C20050'