MySQL - displaying query result counts without losing actual result entries

Sorry if this is a really dumb question but I'm not too familiar with MYSQL syntax.
I've historically been running:
USE hg19;
SELECT DISTINCT (name2), txStart, txEnd
FROM refGene
WHERE name2 LIKE '[genename]';
which would output all entries and it was fine, except if I was looking for an entry that didn't exist, I would get a blank (which just so happened to be the same result if I disconnected from the server). This was leading to a bunch of downstream issues when I couldn't actually detect if an entry didn't exist vs my internet disconnected me.
So instead I decided to try:
USE hg19;
SELECT *, count(*) AS results
FROM (
SELECT DISTINCT (name2), txStart, txEnd
FROM refGene
WHERE name2 LIKE 'TP53'
) a;
This would now give me a 0 for results if it didn't exist (and if it didn't connect it'd remain blank). However, now for whatever reason it only displays one entry no matter what (If I query for TP53 for example, it should have two distinct entries -> however, it will give me results:2 but only display one of them). Is there a way around this? I would still like to have it displaying all distinct results.

COUNT() is an aggregate function that works on groups of rows. Most databases reject a query that mixes an aggregate with non-aggregated columns when there is no GROUP BY clause. MySQL accepts it (unless ONLY_FULL_GROUP_BY is enabled in sql_mode), collapses the result to a single row, and returns arbitrary values in the non-aggregated columns, as you've seen.
To get your desired result, you only have to invert your logic and use a LEFT JOIN:
SELECT
a.results,
b.*
FROM
(SELECT COUNT(*) results FROM refGene r1 WHERE r1.name2 LIKE 'TP53') a
LEFT JOIN (
SELECT
*
FROM
refGene r2
WHERE
name2 LIKE 'TP53'
) b
ON
a.results IS NOT NULL;
If your "main" query returns no rows, there will be a 0 (zero) in the results column and NULL values in the columns of your "main" query.
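To see the behaviour described above without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The sample rows are made up, the table is cut down to three columns, and the `lookup` helper is hypothetical. Unlike the answer's query, this variant counts and returns the DISTINCT rows, since that is what the question asked for.

```python
import sqlite3

# Made-up sample data; the real refGene table has many more columns and rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE refGene (name2 TEXT, txStart INTEGER, txEnd INTEGER)")
conn.executemany(
    "INSERT INTO refGene VALUES (?, ?, ?)",
    [
        ("TP53", 100, 200),
        ("TP53", 100, 200),  # exact duplicate, removed by DISTINCT
        ("TP53", 150, 250),
        ("BRCA1", 300, 400),
    ],
)

# The count is computed first, then the matching rows are LEFT JOINed onto it,
# so the count row survives even when nothing matches.
QUERY = """
SELECT a.results, b.name2, b.txStart, b.txEnd
FROM (
    SELECT COUNT(*) AS results
    FROM (SELECT DISTINCT name2, txStart, txEnd
          FROM refGene WHERE name2 LIKE ?) AS d
) AS a
LEFT JOIN (
    SELECT DISTINCT name2, txStart, txEnd
    FROM refGene WHERE name2 LIKE ?
) AS b
ON a.results IS NOT NULL
ORDER BY b.txStart
"""


def lookup(gene):
    """Hypothetical helper: return (results, name2, txStart, txEnd) rows."""
    return conn.execute(QUERY, (gene, gene)).fetchall()


print(lookup("TP53"))  # two rows, each carrying results = 2
print(lookup("NOPE"))  # one row: results = 0 and NULLs for the gene columns
```

A missing gene therefore yields exactly one row with a 0, which is distinguishable from the empty output you get when the connection drops.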

Related

Mysql DISTINCT with more than one column (remove duplicates)

My table is called training_session.
I am trying to print out some information from my data, but I do not want any duplicates. I can't get it to work; can someone tell me what I'm doing wrong?
SELECT DISTINCT athlete_id AND duration FROM training_session
SELECT DISTINCT athlete_id, duration FROM training_session
It works perfectly if I use only one column, but when I add another, it does not work.
I think you misunderstood the use of DISTINCT.
There is a big difference between using DISTINCT and GROUP BY.
Both have a similar goal, but they serve different purposes.
You use DISTINCT if you want to show a set of columns without repeating any row. That means you don't care about calculations or aggregate (group) functions. DISTINCT applies to the whole row, so it will show different results as you keep adding columns to your SELECT.
You use GROUP BY if you want one row per distinct value of certain selected columns and you use group functions to calculate data related to each group. Therefore you use GROUP BY if you want to use group functions.
You can check the available group functions at this link:
https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html
EDIT 1:
It seems like you are trying to get the "latest" of a certain athlete, I'll assume the current scenario if there is no ID.
Here is my alternate solution:
SELECT a.athlete_id ,
( SELECT b.duration
FROM training_session as b
WHERE b.athlete_id = a.athlete_id -- connect
ORDER BY [latest column to sort] DESC
LIMIT 1
) last_duration
FROM training_session as a
GROUP BY a.athlete_id
ORDER BY a.athlete_id
This construct is a correlated scalar subquery in the SELECT list. With the help of LIMIT 1, it returns only the topmost record. A scalar subquery must return at most one row, or else MySQL raises an error; if it returns no row, the value is NULL.
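As a rough illustration of that subquery, here is a sketch run against SQLite through Python's sqlite3 module. The `session_date` column is an assumption standing in for the unspecified "latest" column in the answer, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# session_date is a hypothetical column used only to define "latest".
conn.execute(
    "CREATE TABLE training_session "
    "(athlete_id INTEGER, duration INTEGER, session_date TEXT)"
)
conn.executemany(
    "INSERT INTO training_session VALUES (?, ?, ?)",
    [
        (1, 30, "2024-01-01"),
        (1, 45, "2024-02-01"),  # latest session for athlete 1
        (2, 60, "2024-01-15"),  # latest session for athlete 2
        (2, 20, "2024-01-10"),
    ],
)

# One row per athlete; the subquery picks the duration of the newest session.
latest = conn.execute(
    """
    SELECT a.athlete_id,
           (SELECT b.duration
            FROM training_session AS b
            WHERE b.athlete_id = a.athlete_id
            ORDER BY b.session_date DESC
            LIMIT 1) AS last_duration
    FROM training_session AS a
    GROUP BY a.athlete_id
    ORDER BY a.athlete_id
    """
).fetchall()

print(latest)  # [(1, 45), (2, 60)]
```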
MySQL's DISTINCT clause is used to filter out duplicate rows from the result set.
If your query was SELECT DISTINCT athlete_id FROM training_session then your output would be:
athlete_id
----------
1
2
3
4
5
6
As soon as you add another column to your query (in your example, the column called duration), DISTINCT applies to the combination of both columns, so every distinct (athlete_id, duration) pair is returned, hence the results you're getting. In other words, the query is working correctly.
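A small sketch of the difference, using Python's sqlite3 module with invented rows (SQLite treats DISTINCT and AND the same way for this purpose). It also shows why the `athlete_id AND duration` attempt misbehaves: AND is a logical operator, so that expression yields a single boolean column rather than two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE training_session (athlete_id INTEGER, duration INTEGER)")
conn.executemany(
    "INSERT INTO training_session VALUES (?, ?)",
    [(1, 30), (1, 30), (1, 45), (2, 30), (2, 30)],
)

# DISTINCT over one column: one row per athlete.
one_column = conn.execute(
    "SELECT DISTINCT athlete_id FROM training_session ORDER BY athlete_id"
).fetchall()

# DISTINCT over two columns: one row per (athlete_id, duration) pair.
two_columns = conn.execute(
    "SELECT DISTINCT athlete_id, duration FROM training_session "
    "ORDER BY athlete_id, duration"
).fetchall()

# "athlete_id AND duration" is a single boolean expression, not two columns.
and_expression = conn.execute(
    "SELECT DISTINCT athlete_id AND duration FROM training_session"
).fetchall()

print(one_column)      # [(1,), (2,)]
print(two_columns)     # [(1, 30), (1, 45), (2, 30)]
print(and_expression)  # [(1,)]
```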

"Query input must contain at least one table or query" error - nested queries, MS Access

Essentially, Access sometimes (?) requires a reference to a table even when the query does not need one. E.g.
Query input must contain atleast one table or query
The question I have is why query q1 SELECT 1 executes just fine and gives me 1 row-1 column resultant table with 1 as the value but query q2 SELECT * FROM q1 produces the aforementioned error?
When I change q1 to SELECT 1 from dummy_table where dummy_table is a dummy table with dummy value, q2 runs fine.
Why is q1's internal structure in any way relevant to q2? q1 works just fine on its own. Does q2 "unroll" q1 and then compile the statement
SELECT * FROM (SELECT 1) (which on its own produces the same error)? Can I somehow force Access not to peek into the parent query's internal structure?
Also, why does SELECT * FROM (SELECT 1) give an error while SELECT 1 works fine?
Access will only accept a query without a FROM clause when the "naked" SELECT is used in isolation, not as part of another query.
As you discovered, SELECT 1 is valid when it is the entire statement. But Access complains "Query input must contain at least one table or query" if you attempt to use that "naked" SELECT in another query such as SELECT q.* FROM (SELECT 1) AS q;
Similarly, although SELECT 1 and SELECT 2 are both valid when used alone, attempting to UNION them triggers the same error:
SELECT 1
UNION ALL
SELECT 2
There is no way to circumvent that error. As you also discovered, saving the "naked" SELECT as a named query, and then using the named query in another still triggers the error. It's just a limitation of the Access db engine, and it's been that way with every Access version I've used (>= Access 2000).
We ran into this issue today with error 3067 when trying to add some records to query results using a UNION query.
This doesn't work:
SELECT
UserID, UserName
FROM USERS
UNION SELECT
0, 'Add User...'
But as pointed out in the original question, if you use a valid table name, you can work around the issue.
Simply adjust the code to select a single record (TOP 1) from any table. Here I use MSysObjects because that should always exist and have records.
SELECT
UserID, UserName
FROM USERS
UNION SELECT TOP 1
0, 'Add User...'
FROM MSysObjects
Even though we are technically not using any data from the "dummy" table, it satisfies the compiler requirements for our union query and returns the desired results.

Union query to combine results of 3 tables

I am relatively new to coding so please have patience.
I am trying to combine data from 3 tables. I have managed to get some data back but it isn't what i need. Please see my example below.
select oid, rrnhs, idnam, idfnam, dte1, ta
as 'access type' from person
left join
(select fk_oid, min(dte), dte1, ta
from
((Select fk_oid,min(accessdate) as dte, accessdate1 as dte1, accesstype as ta
from vascularpdaccess
where isnull(accesstype)=false group by fk_oid)
union
(Select fk_oid, min(hpdate) as dte, hpdate as dte1, HPACCE as ta
from hdtreatment
where isnull(hptype)=false group by fk_oid)) as bla
group by fk_oid) as access
on person.oid=access.fk_oid
where person.rrnhs in (1000010000, 2000020000, 3000030000)
My understanding of a union is that the columns have to be of the same data type, but I have two problems. The first is that accesstype and hpacce combine into the same column as expected, but I don't want to actually see the hpacce data (I don't know if this is even possible).
Secondly, the idea of the query is to pull back a patient's 'accesstype' date at the first date of hpdate.
I don't know if this even makes sense to you guys, but I'm hoping someone can help. Y'all are usually pretty nifty!
Thanks in advance!
Mikey
All queries in a UNION need to have the same number of columns in the SELECT statement. It looks like your first query has the max number of columns, so you will need to "pad" the other to have the same number of columns. You can use NULL AS col to create a column with all null values.
To answer the question (I think) you were asking... for a UNION or UNION ALL set operation, you are correct: the number of columns and the datatypes of the columns returned must match.
But it is possible to return a literal as an expression in the SELECT list. For example, if you don't want to return the value of the HPACCE column, you can replace it with a literal or a NULL. If that column is a character datatype (we can't tell from the information provided in the question), you could use, for example, a literal empty string '' AS ta in place of HPACCE AS ta.
SELECT fk_oid
, MIN(HPDATE) AS dte
, hpdate AS dte1
, NULL AS ta
-- -------------------- ^^^^
FROM hdtreatment
Some other notes:
The predicate ISNULL(foo)=FALSE can be more simply expressed as foo IS NOT NULL.
The UNION set operator will remove duplicate rows. If that's not necessary, you could use a UNION ALL set operator.
The subsequent GROUP BY fk_oid operation on the inline view bla is going to collapse rows, but it's indeterminate which row the values of dte1 and ta will come from (i.e. there is no guarantee those values will be from the row that had the "minimum" value of dte). Other databases will throw an exception/error with this statement, along the lines of "non-aggregate in SELECT list not in GROUP BY". But this is allowed (without error or warning) by a MySQL-specific extension to GROUP BY behavior. (We can get MySQL to behave like other databases and throw an error if we specify a value for sql_mode that includes ONLY_FULL_GROUP_BY.)
The predicate on the outer query doesn't get pushed down into the inline view bla. The view bla is going to be materialized for every fk_oid, and that could be a performance issue on large sets.
Also, qualifying all column references would make the statement easier to read. And, that will also insulate the statement from throwing an "ambiguous column" error in the future, when a column named (e.g.) ta or dte1 is added to the person table.
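A few of the notes above can be sketched with Python's sqlite3 module: padding a UNION branch with NULL AS ta, using IS NOT NULL, and the difference between UNION and UNION ALL. The table layouts are cut down to the columns named in the question and the rows are invented, so treat this as an illustration, not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed layouts containing only the columns used in the question.
conn.execute(
    "CREATE TABLE vascularpdaccess (fk_oid INTEGER, accessdate TEXT, accesstype TEXT)"
)
conn.execute(
    "CREATE TABLE hdtreatment (fk_oid INTEGER, hpdate TEXT, hptype TEXT, hpacce TEXT)"
)
conn.executemany(
    "INSERT INTO vascularpdaccess VALUES (?, ?, ?)",
    [(1, "2020-01-05", "fistula"), (2, "2020-03-01", None)],
)
conn.executemany(
    "INSERT INTO hdtreatment VALUES (?, ?, ?, ?)",
    [(1, "2020-02-01", "HD", "line"), (1, "2020-02-01", "HD", "line")],
)

# The second branch pads the ta column with NULL so hpacce is never shown.
TEMPLATE = """
SELECT fk_oid, accessdate AS dte, accesstype AS ta
FROM vascularpdaccess
WHERE accesstype IS NOT NULL
{set_operator}
SELECT fk_oid, hpdate AS dte, NULL AS ta
FROM hdtreatment
WHERE hptype IS NOT NULL
ORDER BY fk_oid, dte
"""

union_all_rows = conn.execute(TEMPLATE.format(set_operator="UNION ALL")).fetchall()
union_rows = conn.execute(TEMPLATE.format(set_operator="UNION")).fetchall()

print(union_all_rows)  # keeps both identical hdtreatment rows
print(union_rows)      # duplicate rows removed
```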

MySQL : Count returning double the number of entries when using distinct

So I do a count like so
select distinct count(prod.id) from product as prod....
I get back 175590
I do a select like so
select distinct prod.id from product as prod.... (rest of the query is exactly the same)
and I limit it. Now if I limit the query to return anything over the half way point it returns nothing. It appears as if count is returning double the number of entries each time.
Does anyone know of anything that may be causing this?
Thanks
Tracey
The DISTINCT keyword tells MySQL to strip the duplicate rows from the result set. Because SELECT COUNT(prod.id) returns a single row (I guess this, I cannot tell for sure until I see the complete query), adding DISTINCT in front of COUNT() does not change its behaviour in any way.
What you probably want is SELECT COUNT(DISTINCT prod.id) and that's a totally different thing. It removes the duplicate values of prod.id before counting them.
Your first query is counting how many prod.id's there are.
Your second query is showing all distinct prod.id's.
This is quite different.
If you were to do the second query without the distinct key word the number would be the same.
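The rest of the original query is not shown, so the following sketch assumes a hypothetical join (to an invented product_tag table) that repeats every product id twice. It uses Python's sqlite3 module to show how the two forms can differ by exactly a factor of two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER)")
# product_tag is a made-up table; joining to it duplicates each product row.
conn.execute("CREATE TABLE product_tag (product_id INTEGER, tag TEXT)")
conn.executemany("INSERT INTO product VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO product_tag VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "a"), (2, "b"), (3, "a"), (3, "b")],
)

JOIN = "FROM product AS prod JOIN product_tag AS t ON t.product_id = prod.id"

# DISTINCT is applied to the single result row, so it changes nothing.
distinct_count = conn.execute(f"SELECT DISTINCT COUNT(prod.id) {JOIN}").fetchone()[0]

# DISTINCT inside COUNT removes duplicate ids before counting.
count_distinct = conn.execute(f"SELECT COUNT(DISTINCT prod.id) {JOIN}").fetchone()[0]

# The row-returning query the count was being compared against.
distinct_ids = conn.execute(f"SELECT DISTINCT prod.id {JOIN} ORDER BY prod.id").fetchall()

print(distinct_count)     # 6
print(count_distinct)     # 3
print(len(distinct_ids))  # 3
```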

MySQL: Include COUNT of SELECT Query Results as a Column (Without Grouping)

I have a simple report sending framework that basically does the following things:
It performs a SELECT query, it makes some text-formatted tables based on the results, it sends an e-mail, and it performs an UPDATE query.
This system is a generalization of an older one, in which all of the operations were hard coded. However, in pushing all of the logic of what I'd like to do into the SELECT query, I've run across a problem.
Before, I could get most of the information for my text tables by saying:
SELECT Name, Address FROM Databas.Tabl WHERE Status='URGENT';
Then, when I needed an extra number for the e-mail, also do:
SELECT COUNT(*) FROM Databas.Tabl WHERE Status='URGENT' AND TimeLogged='Noon';
Now, I no longer have the luxury of multiple SELECT queries. What I'd like to do is something like:
SELECT Tabl.Name, Tabl.Address, COUNT(Results.UID) AS Totals
FROM Databas.Tabl
LEFT JOIN Databas.Tabl Results
ON Tabl.UID = Results.UID
AND Results.TimeLogged='Noon'
WHERE Status='URGENT';
This, at least in my head, says to get a total count of all the rows that were SELECTed and also have some conditional.
In reality, though, this gives me the "1140 - Mixing of GROUP columns with no GROUP columns illegal if no GROUP BY" error. The problem is, I don't want to GROUP BY. I want this COUNT to redundantly repeat the number of results that SELECT found whose TimeLogged='Noon'. Or I want to remove the AND clause and include, as a column in the result of the SELECT statement, the number of results that that SELECT statement found.
GROUP BY is not the answer, because that causes it to get the COUNT of only the rows who have the same value in some column. And COUNT might not even be the way to go about this, although it's what comes to mind. FOUND_ROWS() won't do the trick, since it needs to be part of a secondary query, and I only get one (plus there's no LIMIT involved), and ROW_COUNT() doesn't seem to work since it's a SELECT statement.
I may be approaching it from the wrong angle entirely. But what I want to do is get COUNT-type information about the results of a SELECT query, as well as all the other information that the SELECT query returned, in one single query.
=== Here's what I've got so far ===
SELECT Tabl.Name, Tabl.Address, Results.Totals
FROM Databas.Tabl
LEFT JOIN (SELECT COUNT(*) AS Totals, 0 AS Bonus
FROM Databas.Tabl
WHERE TimeLogged='Noon'
GROUP BY NULL) Results
ON 0 = Results.Bonus
WHERE Status='URGENT';
This does use sub-SELECTs, which I was initially hoping to avoid, but now realize that hope may have been foolish. Plus it seems like the COUNTing SELECT sub-queries will be less costly than the main query since the COUNT conditionals are all on one table, but the real SELECT I'm working with has to join on multiple different tables for derived information.
The key realizations are that I can GROUP BY NULL, which will return a single result so that COUNT(*) will actually catch everything, and that I can force a correlation to this column by just faking a Bonus column with 0 on both tables.
It looks like this is the solution I will be using, but I can't actually accept it as an answer until tomorrow. Thanks for all the help.
SELECT Tabl.Name, Tabl.Address, Results.Totals
FROM Databas.Tabl
LEFT JOIN (SELECT COUNT(*) AS Totals, 0 AS Bonus
FROM Databas.Tabl
WHERE TimeLogged='Noon'
GROUP BY NULL) Results
ON 0 = Results.Bonus
WHERE Status='URGENT';
I figured this out thanks to ideas generated by multiple answers, although it's not actually the direct result of any one. Why this does what I need has been explained in the edit of the original post, but I wanted to be able to resolve the question with the proper answer in case anyone else wants to perform this silly kind of operation. Thanks to all who helped.
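For anyone who wants to try the idea quickly, here is a sketch of the same pattern in SQLite via Python's sqlite3 module. It is a variation, not the exact query above: it uses a CROSS JOIN to a one-row derived table instead of GROUP BY NULL and a fake Bonus column, it also filters the count on Status to mirror the original two-query version, and it drops the Databas qualifier. The rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tabl "
    "(UID INTEGER, Name TEXT, Address TEXT, Status TEXT, TimeLogged TEXT)"
)
conn.executemany(
    "INSERT INTO Tabl VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ann", "1 Main St", "URGENT", "Noon"),
        (2, "Bob", "2 Oak Ave", "URGENT", "Morning"),
        (3, "Cy", "3 Elm Rd", "LOW", "Noon"),
        (4, "Di", "4 Pine Ln", "URGENT", "Noon"),
    ],
)

# The derived table r always has exactly one row, so the CROSS JOIN simply
# repeats its Totals value on every row of the main query.
report = conn.execute(
    """
    SELECT t.Name, t.Address, r.Totals
    FROM Tabl AS t
    CROSS JOIN (
        SELECT COUNT(*) AS Totals
        FROM Tabl
        WHERE Status = 'URGENT' AND TimeLogged = 'Noon'
    ) AS r
    WHERE t.Status = 'URGENT'
    ORDER BY t.UID
    """
).fetchall()

print(report)
# [('Ann', '1 Main St', 2), ('Bob', '2 Oak Ave', 2), ('Di', '4 Pine Ln', 2)]
```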
You could probably do a union instead. You'd have to add a column to the original query and select 0 in it, then UNION that with your second query, which returns a single column. To do that, the second query must also select empty fields to match the first.
SELECT 0 AS Cnt, Name, Address FROM Databas.Tabl WHERE Status='URGENT'
UNION ALL
SELECT COUNT(*) AS Cnt, '' AS Name, '' AS Address FROM Databas.Tabl WHERE Status='URGENT' AND TimeLogged='Noon';
It's a bit of a hack, but what you're trying to do isn't ideal...
Does this do what you need?
SELECT Tabl.Name ,
Tabl.Address ,
COUNT(Results.UID) AS GrandTotal,
COUNT(CASE WHEN Results.TimeLogged='Noon' THEN 1 END) AS NoonTotal
FROM Databas.Tabl
LEFT JOIN Databas.Tabl Results
ON Tabl.UID = Results.UID
WHERE Status ='URGENT'
GROUP BY Tabl.Name,
Tabl.Address
WITH ROLLUP;
The API you're using to access the database should be able to report to you how many rows were returned - say, if you're running perl, you could do something like this:
my $sth = $dbh->prepare("SELECT Name, Address FROM Databas.Tabl WHERE Status='URGENT'");
my $rv = $sth->execute();
my $rows = $sth->rows;
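The same idea in Python, as a minimal sketch with the built-in sqlite3 module standing in for a MySQL driver. The `fetch_urgent` helper and the sample row are hypothetical. The point is that the client code can count the rows it fetched, while a failed connection or query raises an exception instead of returning an empty result.

```python
import sqlite3


def fetch_urgent(conn):
    """Hypothetical helper: return (rows, row_count).

    Raises sqlite3.Error if the query cannot be executed, so an empty
    result and a failed query are never confused.
    """
    rows = conn.execute(
        "SELECT Name, Address FROM Tabl WHERE Status = 'URGENT'"
    ).fetchall()
    return rows, len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tabl (Name TEXT, Address TEXT, Status TEXT)")
conn.execute("INSERT INTO Tabl VALUES ('Ann', '1 Main St', 'URGENT')")

rows, row_count = fetch_urgent(conn)
print(row_count)  # 1
```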
I don't believe grouping by Tabl.id would mess up the results. Give it a try and see if that's what you want.