How to form a subquery - mysql

I think I need a subquery for this, and while I have read what subqueries are, I have not found help on how to write a subquery. I am interested in learning how to fish, but I also would like a fish soon, please :)
Simple, 1 table of data:
lastname, (found or not found boolean)
I want to generate some stats, across the whole alphabet, of who has been found.
Desired results:
A : 5 of 16 found, or about 31 percent
B : 2 of 4 found, or about 50 percent
C : 30 of 90 found, or about 30 percent
etc
I can form simple SQL, I need help with forming the subquery, if that's what is needed here.
I can write a query to list how many were found by the first letter of the last name:
select substring(lastname,1,1) as lastinitial, count(*) from members where found !=0 and found is not null group by lastinitial;
I can write a query to list how many total there are, by last initial:
select substring(lastname,1,1) as lastinitial, count(*) from members group by lastinitial;
But how do I combine the two queries to yield the desired result? Thanks for the help.

You probably don't need sub-query for this. The grouping can give you both found and not found for each name. Just add "found" to the grouping and you will get two records for each name, one for found and another for not found. You also don't need another query for the total, just add the found and not found together.
SELECT SUBSTRING(lastname,1,1) AS lastinitial,
(CASE WHEN found = 1 THEN 1 ELSE 0 END) AS found_val,
COUNT(lastname) AS found_count
FROM members
GROUP BY lastinitial, found_val;
If you want to have both of the found and not found in one row for each letter, try this:
SELECT found_list.lastinitial, found_count, not_found_count
FROM (
SELECT SUBSTRING(lastname,1,1) AS lastinitial, COUNT(lastname) AS found_count
FROM members
WHERE found = 1
GROUP BY lastinitial
) AS found_list,
(
SELECT SUBSTRING(lastname,1,1) AS lastinitial, COUNT(lastname) AS not_found_count
FROM members
WHERE found IS NULL OR found = 0
GROUP BY lastinitial
) AS not_found_list
WHERE found_list.lastinitial = not_found_list.lastinitial
As you can see, the first query is much shorter, more elegant, and also performs faster.

Related

MS Access count query does not produce wanted results

I have a table (tblExam) showing exam data score designed as follow:
Exam Name: String
Score: number(pecent)
Basically I am trying to pull the records by Exam name where the score are less than a specific amount (0.695 in my case).
I am using the following statement to get the results:
SELECT DISTINCTROW tblExam.name, Count(tblExam.name) AS CountOfName
FROM tblExam WHERE (((tblExam.Score)<0.695))
GROUP BY tblExam.name;
This works fine but does not display the exam that have 0 records more than 0.695; in other words I am getting this:
Exam Name count
firstExam 2
secondExam 1
thirdExam 3
The count of 0 and any exams with score above 0.695 do not show up. What I would like is something like this:
Exam Name count
firstExam 2
secondExam 1
thirdExam 3
fourthExam 0
fifthExam 0
sixthExam 2
.
..
.etc...
I hope that I am making sense here. I think that I need somekind of LEFT JOIN to display all of the exam name but I can not come up with the proper syntax.
It seems you want to display all name groups and, within each group, the count of Score < 0.695. So I think you should move < 0.695 from the WHERE to the Count() expression --- actually remove the WHERE clause.
SELECT
e.name,
Count(IIf(e.Score < 0.695, 1, Null)) AS CountOfName
FROM tblExam AS e
GROUP BY e.name;
That works because Count() counts only non-Null values. You could use Sum() instead of Count() if that seems clearer:
Sum(IIf(e.Score < 0.695, 1, 0)) AS CountOfName
Note DISTINCTROW is not useful in a GROUP BY query, because the grouping makes the rows unique without it. So I removed DISTINCTROW from the query.
Do I detect a contradiction? The query calls for results <0.695 but your text says you are also looking for results >0.695. Perhaps I don't understand. Does this give you what you are looking for:
SELECT DISTINCTROW tblExam.ExamName, Count(tblExam.ExamName) AS CountOfExamName
FROM tblExam
WHERE (((tblExam.Score)<0.695 Or (tblExam.Score)>0.695))
GROUP BY tblExam.ExamName;

MySQL ORDER BY Column = value AND distinct?

I'm getting grey hair by now...
I have a table like this.
ID - Place - Person
1 - London - Anna
2 - Stockholm - Johan
3 - Gothenburg - Anna
4 - London - Nils
And I want to get the result where all the different persons are included, but I want to choose which Place to order by.
For example. I want to get a list where they are ordered by LONDON and the rest will follow, but distinct on PERSON.
Output like this:
ID - Place - Person
1 - London - Anna
4 - London - Nils
2 - Stockholm - Johan
Tried this:
SELECT ID, Person
FROM users
ORDER BY FIELD(Place,'London'), Person ASC "
But it gives me:
ID - Place - Person
1 - London - Anna
4 - London - Nils
3 - Gothenburg - Anna
2 - Stockholm - Johan
And I really dont want Anna, or any person, to be in the result more then once.
This is one way to get the specified output, but this uses MySQL specific behavior which is not guaranteed:
SELECT q.ID
, q.Place
, q.Person
FROM ( SELECT IF(p.Person<=>#prev_person,0,1) AS r
, #prev_person := p.Person AS person
, p.Place
, p.ID
FROM users p
CROSS
JOIN (SELECT #prev_person := NULL) i
ORDER BY p.Person, !(p.Place<=>'London'), p.ID
) q
WHERE q.r = 1
ORDER BY !(q.Place<=>'London'), q.Person
This query uses an inline view to return all the rows in a particular order, by Person, so that all of the 'Anna' rows are together, followed by all the 'Johan' rows, etc. The set of rows for each person is ordered by, Place='London' first, then by ID.
The "trick" is to use a MySQL user variable to compare the values from the current row with values from the previous row. In this example, we're checking if the 'Person' on the current row is the same as the 'Person' on the previous row. Based on that check, we return a 1 if this is the "first" row we're processing for a a person, otherwise we return a 0.
The outermost query processes the rows from the inline view, and excludes all but the "first" row for each Person (the 0 or 1 we returned from the inline view.)
(This isn't the only way to get the resultset. But this is one way of emulating analytic functions which are available in other RDBMS.)
For comparison, in databases other than MySQL, we could use SQL something like this:
SELECT ROW_NUMBER() OVER (PARTITION BY t.Person ORDER BY
CASE WHEN t.Place='London' THEN 0 ELSE 1 END, t.ID) AS rn
, t.ID
, t.Place
, t.Person
FROM users t
WHERE rn=1
ORDER BY CASE WHEN t.Place='London' THEN 0 ELSE 1 END, t.Person
Followup
At the beginning of the answer, I referred to MySQL behavior that was not guaranteed. I was referring to the usage of MySQL User-Defined variables within a SQL statement.
Excerpts from MySQL 5.5 Reference Manual http://dev.mysql.com/doc/refman/5.5/en/user-variables.html
"As a general rule, other than in SET statements, you should never assign a value to a user variable and read the value within the same statement."
"For other statements, such as SELECT, you might get the results you expect, but this is not guaranteed."
"the order of evaluation for expressions involving user variables is undefined."
Try this:
SELECT ID, Place, Person
FROM users
GROUP BY Person
ORDER BY FIELD(Place,'London') DESC, Person ASC;
You want to use group by instead of distinct:
SELECT ID, Person
FROM users
GROUP BY ID, Person
ORDER BY MAX(FIELD(Place, 'London')), Person ASC;
The GROUP BY does the same thing as SELECT DISTINCT. But, you are allowed to mention other fields in clauses such as HAVING and ORDER BY.

Multiple LEFT JOINs to self with criteria to produce distribution

Although several . questions . come . close . to what I want (and as I write this stackoverflow has suggested several more, none of which quite capture my problem), I just don't seem to be able to find my way out of the SQL thicket.
I have a single table (let's call it the user_classification_fct) that has three fields: user, week, and class (e.g. user #1 in week #1 had a class of 'Regular User', while user #2 in week #1 has a class of 'Infrequent User'). (As an aside, I have implemented classes as INTs, but wanted to work with something legible in the form of VARCHAR while I sorted out the SQL.)
What I want to do is produce a summary report of how user behaviour is changing in aggregate along the lines of:
There were 50 users who were regular users in both week 1 and week 2 and ...
There were 10 users who were regular users in week 1, but fell to infrequent users in week 2
There were 5 users who went from infrequent in week 1 to regular in week 2
... and so on ...
What makes this slightly more tricky is that user #5000 might only have started using the service in week 2 and so have no record in the table for week 1. In that case, I'd want to see a NULL FOR week 1 and a 'Regular User' (or whatever is appropriate) for week 2. The size of the table is not strictly relevant, but with 5 weeks' worth of data I'm looking at 42 million rows, so I do not want to insert 4 'fake' rows of 'Non-User' for someone who only starts using the service in week 5 or something.
To me this seems rather obviously like a case for using a LEFT or RIGHT JOIN in MySQL because the NULL should come through on the 'missing' record.
I have tried using both WHERE and AND conditions on the LEFT JOINs and am just not getting the 'right' answers (i.e. I either get no NULL values at all in the case of trailing WHERE conditions, or my counts are far, far too high for the number of distinct users (which is ca. 10 million) in the case of the AND constraints used below). Here's was my last attempt to get this working:
SELECT
ucf1.class_nm AS 'Class in 2012/15',
ucf2.class_nm AS 'Class in 2012/16',
ucf3.class_nm AS 'Class in 2012/17',
ucf4.class_nm AS 'Class in 2012/18',
ucf5.class_nm AS 'Class in 2012/19',
count(*) AS 'Count'
FROM
user_classification_fct ucf5
LEFT JOIN user_classification_fct ucf4
ON ucf5.user_id=ucf4.user_id
AND ucf5.week_key=201219 AND ucf4.week_key=201218
LEFT JOIN user_classification_fct ucf3
ON ucf4.user_id=ucf3.user_id
AND ucf4.week_key=201218 AND ucf3.week_key=201217
LEFT JOIN user_classification_fct ucf2
ON ucf3.user_id=ucf2.user_id
AND ucf3.week_key=201217 AND ucf2.week_key=201216
LEFT JOIN user_classification_fct ucf1
ON ucf2.user_id=ucf1.user_id
AND ucf2.week_key=201216 AND ucf1.week_key=201215
GROUP BY 1,2,3,4,5;
In looking at the various other questions on stackoverflow.com, it may well be that I need to perform the queries one-at-a-time and UNION the result sets together or use parentheses to chain them one-to-another, but those approaches are not ones that I'm familiar with (yet) and I can't even get a single LEFT JOIN (i.e. week 5 to week 1, dropping all the other weeks of data) to return something useful.
Any tips would be much, much appreciated and I would really appreciate suggestions that work in MySQL as switching database products is not an option.
You can do this with a group by. I would start by summarizing all the possible combinations for the five weeks as:
select c_201215, c_201216, c_201217, c_201218, c_201219,
count(*) as cnt
from (select user_id,
max(case when week_key=201215 then class_nm end) as c_201215,
max(case when week_key=201216 then class_nm end) as c_201216,
max(case when week_key=201217 then class_nm end) as c_201217,
max(case when week_key=201218 then class_nm end) as c_201218,
max(case when week_key=201219 then class_nm end) as c_201219
from user_classification_fct ucf
group by user_id
) t
group by c_201215, c_201216, c_201217, c_201218, c_201219
This may solve your problem. If you have 5 classes (including NULL), then this will return at most 5^5 or 3,125 rows.
This fits into Excel, so you can do the final processing there. Alternatively, you can still use the database.
If you want to extract pairs of weeks, then I would suggest putting the above into a temporary table, say "t". And doing a series of extracts with unions:
select *
from ((select '201215' as weekstart, c_201215, c_201216, sum(cnt) as cnt
from t
group by c_201215, c_201216
) union all
(select '201216', c_201216, c_201217, sum(cnt) as cnt
from t
group by c_201216, c_201217
) union all
(select '201217', c_201217, c_201218, sum(cnt) as cnt
from t
group by c_201217, c_201218
) union all
(select '201218', c_201218, c_201219, sum(cnt) as cnt
from t
group by c_201218, c_201219
)
) tg
order by 1, cnt desc
I suggest putting it in a subquery because you don't want to message around with common-subquery optimizations on such a large table. You'll get to your final answer by summarizing first, and then bringing the data together.

Complex MySQL COUNT query

Evening folks,
I have a complex MySQL COUNT query I am trying to perform and am looking for the best way to do it.
In our system, we have References. Each Reference can have many (or no) Income Sources, each of which can be validated or not (status). We have a Reference table and an Income table - each row in the Income table points back to Reference with reference_id
On our 'Awaiting' page (the screen that shows each Income that is yet to be validated), we show it grouped by Reference. So you may, for example, see Mr John Smith has 3 Income Sources.
We want it to show something like "2 of 3 Validated" beside each row
My problem is writing the query that figures this out!
What I have been trying to do is this, using a combination of PHP and MySQL to bridge the gap where SQL (or my knowledge) falls short:
First, select a COUNT of the number of incomes associated with each reference:
SELECT `reference_id`, COUNT(status) AS status_count
FROM (`income`)
WHERE `income`.`status` = 0
GROUP BY `reference_id`
Next, having used PHP to generate a WHERE IN clause, proceed to COUNT the number of confirmed references from these:
SELECT `reference_id`, COUNT(status) AS status_count
FROM (`income`)
WHERE `reference_id` IN ('8469', '78969', '126613', ..... etc
AND status = 1
GROUP BY `reference_id`
However this doesn't work. It returns 0 rows.
Any way to achieve what I'm after?
Thanks!
In MySQL, you can SUM() on a boolean expression to get a count of the rows where that expression is true. You can do this because MySQL treats true as the integer 1 and false as the integer 0.
SELECT `reference_id`,
SUM(`status` = 1) AS `validated_count`,
COUNT(*) AS `total_count`
FROM `income`
GROUP BY `reference_id`

Getting a sorted distinct list from mySQL

Goal
I'l like to get a list of unique FID's ordered by the the one which has most recently been changed. In this sample table it should return FIDs in the order of 150, 194, 122
Example Data
ID FID changeDate
----------------------------------------------
1 194 2010-04-01
2 122 2010-04-02
3 194 2010-04-03
4 150 2010-04-04
My Attempt
I thought distinct and order by would do the trick. I initially tried:
SELECT distinct `FID` FROM `tblHistory` WHERE 1 ORDER BY changeDate desc
# Returns 150, 122, 194
using GROUP BY has the same result. I'm just barely a SQL amateur, and I'm a bit hung up. What seems to be happening is the aggregating functions find the first occurrence of each and then perform the sort.
Is there a way I can get the result I want straight from mySQL or do I have to grab all the data and then sort it in the PHP?
This worked for me:
SELECT FID
FROM tblHistory
GROUP BY FID
ORDER BY MAX(changeDate) DESC;
Okay, been reading questions on the site since I asked this one, and came up with an answer that seems to work, although I hope perhaps someone else may shed light on a simpler way:
SELECT t1.FID, t1.CD
FROM (
SELECT FID, max(changeDate) as CD
FROM sorted
GROUP BY FID
) as t1
WHERE 1
order by t1.CD desc
Seems to get the results I expect. I didn't know subqueries existed until a few minuets ago. I'm a real SQL newbie.
select * from tbl A
where changeDate =
(select max(changeDate) from tbl where tbl.fid = A.fid)
order by changeDate desc