MYSQL rows to columns using CASE, GROUP BY... what about different versions? - mysql

This is my first post here.
My question is similar to a previous thread albeit different:
mysql converting multiple rows into columns in a single row
What I have is really a large form. There are many forms (sheets, really) and each has the same setup. Each form has labels and values, but the values in the forms can be changed and the forms only display the ''latest'' values. The database has a few tables but those important here are the field_labels and the field_values. These two are linked as one might suspect and, the field_value table has a ''date'' column.
Now, what I wan't to do is to select the field_label.id, and the latest value (field_value.fv_value). First I thought this might work fine with CASE but the problem is, CASE stops searching the table immediately after it finds a hit that matches, I want to select the latest hit, not just the first one matched.
The only good idea I had so far is to use a subquery and reform the value table by ordering it first by the (linked) id of the labels, and then by the ''date'' of the value. Here's what I got
SELECT T.msdsid,
field_label.id,
(CASE WHEN field_label.id = 1 THEN T.fv_value ELSE NULL END) AS value
FROM (SELECT * FROM field_value ORDER BY field_value.fl_id,field_value.date DESC) AS T
LEFT JOIN field_label ON(T.fl_id=field_label.id)
GROUP BY T.refid;
Now, this does do exactly what I want, but... is there a better way?
Thanks in advance.

This query will show you the latest values (record) for each field_value.fl_id:
SELECT fv1.* FROM field_value fv1
JOIN (SELECT fl_id, MAX(date) date FROM field_value GROUP BY fl_id) fv2
ON fv1.fl_id = fv2.fl_id AND fv1.date = fv2.date;
Try this query, play with it, and add it into your query.

Related

Mysql DISTINCT with more than one column (remove duplicates)

My database is called: (training_session)
I try to print out some information from my data, but I do not want to have any duplicates. I do get it somehow, may someone tell me what I do wrong?
SELECT DISTINCT athlete_id AND duration FROM training_session
SELECT DISTINCT athlete_id, duration FROM training_session
It works perfectly if i use only one column, but when I add another. it does not work.
I think you misunderstood the use of DISTINCT.
There is big difference between using DISTINCT and GROUP BY.
Both have some sort of goal, but they have different purpose.
You use DISTINCT if you want to show a series of columns and never repeat. That means you dont care about calculations or group function aggregates. DISTINCT will show different RESULTS if you keep adding more columns in your SELECT (if the table has many columns)
You use GROUP BY if you want to show "distinctively" on a certain selected columns and you use group function to calculate the data related to it. Therefore you use GROUP BY if you want to use group functions.
Please check group functions you can use in this link.
https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html
EDIT 1:
It seems like you are trying to get the "latest" of a certain athlete, I'll assume the current scenario if there is no ID.
Here is my alternate solution:
SELECT a.athlete_id ,
( SELECT b.duration
FROM training_session as b
WHERE b.athlete_id = a.athlete_id -- connect
ORDER BY [latest column to sort] DESC
LIMIT 1
) last_duration
FROM training_session as a
GROUP BY a.athlete_id
ORDER BY a.athlete_id
This syntax is called IN-SELECT subquery. With the help of LIMIT 1, it shows the topmost record. In-select subquery must have 1 record to return or else it shows error.
MySQL's DISTINCT clause is used to filter out duplicate recordsets.
If your query was SELECT DISTINCT athlete_id FROM training_session then your output would be:
athlete_id
----------
1
2
3
4
5
6
As soon as you add another column to your query (in your example, the column called duration) then each record resulting from your query are unique, hence the results you're getting. In other words the query is working correctly.

Not selecting duplicates in join / where query

I've been trying to learn MySQL, and I'm having some trouble creating a join query to not select duplicates.
Basically, here's where I'm at :
SELECT atable.phonenumber, btable.date
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
However, in my database, there is the possibility of having duplicate rows in column atable.phonenumber.
For example (added asterisks for clarity)
phonenumber | date
-------------|-----------
*555-681-2105 | 2015-08-12
555-425-5161 | 2015-08-15
331-484-7784 | 2015-08-17
*555-681-2105 | 2015-08-25
.. and so on.
I tried using SELECT DISTINCT but that doesn't work. I also was looking through other solutions which recommended GROUP BY, but that threw an error, most likely because of my WHERE clause and condition. Not really sure how I can easily accomplish this.
DISTINCT applies to the whole row being returned, essentially saying "I want only unique rows" - any row value may participate in making the row unique
You are getting phone numbers duplicated because you're only looking at the column in isolation. The database is looking at phone number and also date. The rows you posted have different dates, and these hence cause the rows to be different
I suggest you do as the commenter recommended and decide what you want to do with the dates. If you want the latest date for a phone number, do this:
SELECT atable.phonenumber, max(btable.date)
FROM battle
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber
When you write a query that uses grouping, you will get a set of rows where there is only one set of value combinations for anything that is in the group by list. In this case, only unique phone numbers. But, because you want other values as well (I.e. Date) you MUST use what's called an aggregate function, to specify what you want to do with all the various values that aren't part of the unique set. Sometimes it will be MAX or MIN, sometimes it will be SUM, COUNT, AVG and so on.
if you're familiar with hash tables or dictionaries from elsewhere in programming, this is what a group by is: it maps a set of values (a key) to a list of rows that have those key values, and then the aggregating function is applied to any of the values in the list associated with the key
The simple rule when using group by (and one that MySQL will do implicitly for you) is to write queries thus:
SELECT
List,
of,
columns,
you,
want,
in,
unique,
combination,
FN(List),
FN(of),
FN(columns),
FN(you),
FN(want),
FN(aggregating)
FROM table
GROUP BY
List,
of,
columns,
you,
want,
in,
unique,
combination
i.e. You can copy paste from your select list to your group list. MySQL does this implicitly for you if you don't do it (i.e. If you use one or more aggregate functions like max in your select list, but forget or omit the group by clause- it will take everything that isn't in an agggregate function and run the grouping as if you'd written it). Whether group by is hence largely redundant is often debated, but there do exist other things you can do with a group by, such as rollup, cube and grouping sets. Also you can group on a column, if that column is used in a deterministic function, without having to group on the result of he deterministic function. Whether there is any point to doing so is a debate for another time :)
You should add GROUP BY, and an aggregate to the date field, something like this:
SELECT atable.phonenumber, MAX(btable.date)
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber
This will return the maximum date, hat is the latest date...

Getting different results from group by and distinct

this is my first post here since most of the time I already found a suitable solution :)
However this time nothing seems to help properly.
Im trying to migrate information from some mysql Database I have just read-only access to.
My problem is similar to this one: Group by doesn't give me the newest group
I also need to get the latest information out of some tables but my tables have >300k entries therefore checking whether the "time-attribute-value" is the same as in the subquery (like suggested in the first answer) would be too slow (once I did "... WHERE EXISTS ..." and the server hung up).
In addition to that I can hardly find the important information (e.g. time) in a single attribute and there never is a single primary key.Until now I did it like it was suggested in the second answer by joining with subquery that contains latest "time-attribute-entry" and some primary keys but that gets me in a huge mess after using multiple joins and unions with the results.
Therefore I would prefer using the having statement like here: Select entry with maximum value of column after grouping
But when I tried it out and looked for a good candidate as the "time-attribute" I noticed that this queries give me two different results (more = 39721, less = 37870)
SELECT COUNT(MATNR) AS MORE
FROM(
SELECT DISTINCT
LAB_MTKNR AS MATNR,
LAB_STG AS FACH,
LAB_STGNR AS STUDIENGANG
FROM
FKT_LAB
) AS TEMP1
SELECT COUNT(MATNR) AS LESS
FROM(
SELECT
LAB_MTKNR AS MATNR,
LAB_STG AS FACH,
LAB_STGNR AS STUDIENGANG,
LAB_PDATUM
FROM
FKT_LAB
GROUP BY
LAB_MTKNR,
LAB_STG,
LAB_STGNR
HAVING LAB_PDATUM = MAX(LAB_PDATUM)
)AS TEMP2
Although both are applied to the same table and use "GROUP BY" / "SELECT DISTINCT" on the same entries.
Any ideas?
If nothing helps and I have to go back to my mess I will use string variables as placeholders to tidy it up but then I lose the overview of how many subqueries, joins and unions I have in one query... how many temproal tables will the server be able to cope with?
Your second query is not doing what you expect it to be doing. This is the query:
SELECT COUNT(MATNR) AS LESS
FROM (SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG, LAB_PDATUM
FROM FKT_LAB
GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
HAVING LAB_PDATUM = MAX(LAB_PDATUM)
) TEMP2;
The problem is the having clause. You are mixing an unaggregated column (LAB_PDATUM) with an aggregated value (MAX(LAB_PDATAUM)). What MySQL does is choose an arbitrary value for the column and compare it to the max.
Often, the arbitrary value will not be the maximum value, so the rows get filtered. The reference you give (although an accepted answer) is incorrect. I have put a comment there.
If you want the most recent value, here is a relatively easy way:
SELECT COUNT(MATNR) AS LESS
FROM (SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG,
max(LAB_PDATUM) as maxLAB_PDATUM
FROM FKT_LAB
GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
) TEMP2;
It does not, however, affect the outer count.

RoR selecting newest "distinct by column" rows

I have a table which I use to log item price change over time.
I'm trying to write a method which grabs the entire set of items (without duplicates), together with their latest prices.
That means that a row with an item_id of 2 may appear several times inside my table, and a row with an item_id of 3 may appear several times inside the table etc', but the result should only include them once, with their latest price
I'm trying to figure out a way (without using Item.find_by_sql() if possible), to return the entire set of items and their latest prices.
Currently I have the following:
SELECT * FROM
(SELECT * FROM item_logs
ORDER BY created_at DESC) inner_table
GROUP BY item_id
It does work, but it seems wrong to do it like this, I guess i'm looking for a more elegant way to do this, since current implementation requires me to use find_by_sql which is not very flexible.
not sure it's any better, but another option would be use joins:
ItemLog.joins(
'join (select item_id, max(created_at) as created_at from item_logs group by 1)
as i on i.item_id = item_logs.item_id and i.created_at = item_logs.created_at'
)
longer than your find_by_sql solution, could be a more expensive query on your database, but keeps the result as an active record relation so you can chain other methods on.

MySQL: Include COUNT of SELECT Query Results as a Column (Without Grouping)

I have a simple report sending framework that basically does the following things:
It performs a SELECT query, it makes some text-formatted tables based on the results, it sends an e-mail, and it performs an UPDATE query.
This system is a generalization of an older one, in which all of the operations were hard coded. However, in pushing all of the logic of what I'd like to do into the SELECT query, I've run across a problem.
Before, I could get most of the information for my text tables by saying:
SELECT Name, Address FROM Databas.Tabl WHERE Status='URGENT';
Then, when I needed an extra number for the e-mail, also do:
SELECT COUNT(*) FROM Databas.Tabl WHERE Status='URGENT' AND TimeLogged='Noon';
Now, I no longer have the luxury of multiple SELECT queries. What I'd like to do is something like:
SELECT Tabl.Name, Tabl.Address, COUNT(Results.UID) AS Totals
FROM Databas.Tabl
LEFT JOIN Databas.Tabl Results
ON Tabl.UID = Results.UID
AND Results.TimeLogged='Noon'
WHERE Status='URGENT';
This, at least in my head, says to get a total count of all the rows that were SELECTed and also have some conditional.
In reality, though, this gives me the "1140 - Mixing of GROUP columns with no GROUP columns illegal if no GROUP BY" error. The problem is, I don't want to GROUP BY. I want this COUNT to redundantly repeat the number of results that SELECT found whose TimeLogged='Noon'. Or I want to remove the AND clause and include, as a column in the result of the SELECT statement, the number of results that that SELECT statement found.
GROUP BY is not the answer, because that causes it to get the COUNT of only the rows who have the same value in some column. And COUNT might not even be the way to go about this, although it's what comes to mind. FOUND_ROWS() won't do the trick, since it needs to be part of a secondary query, and I only get one (plus there's no LIMIT involved), and ROW_COUNT() doesn't seem to work since it's a SELECT statement.
I may be approaching it from the wrong angle entirely. But what I want to do is get COUNT-type information about the results of a SELECT query, as well as all the other information that the SELECT query returned, in one single query.
=== Here's what I've got so far ===
SELECT Tabl.Name, Tabl.Address, Results.Totals
FROM Databas.Tabl
LEFT JOIN (SELECT COUNT(*) AS Totals, 0 AS Bonus
FROM Databas.Tabl
WHERE TimeLogged='Noon'
GROUP BY NULL) Results
ON 0 = Results.Bonus
WHERE Status='URGENT';
This does use sub-SELECTs, which I was initially hoping to avoid, but now realize that hope may have been foolish. Plus it seems like the COUNTing SELECT sub-queries will be less costly than the main query since the COUNT conditionals are all on one table, but the real SELECT I'm working with has to join on multiple different tables for derived information.
The key realizations are that I can GROUP BY NULL, which will return a single result so that COUNT(*) will actually catch everything, and that I can force a correlation to this column by just faking a Bonus column with 0 on both tables.
It looks like this is the solution I will be using, but I can't actually accept it as an answer until tomorrow. Thanks for all the help.
SELECT Tabl.Name, Tabl.Address, Results.Totals
FROM Databas.Tabl
LEFT JOIN (SELECT COUNT(*) AS Totals, 0 AS Bonus
FROM Databas.Tabl
WHERE TimeLogged='Noon'
GROUP BY NULL) Results
ON 0 = Results.Bonus
WHERE Status='URGENT';
I figured this out thanks to ideas generated by multiple answers, although it's not actually the direct result of any one. Why this does what I need has been explained in the edit of the original post, but I wanted to be able to resolve the question with the proper answer in case anyone else wants to perform this silly kind of operation. Thanks to all who helped.
You could probably do a union instead. You'd have to add a column to the original query and select 0 in it, then UNION that with your second query, which returns a single column. To do that, the second query must also select empty fields to match the first.
SELECT Cnt = 0, Name, Address FROM Databas.Tabl WHERE Status='URGENT'
UNION ALL
SELECT COUNT(*) as Cnt, Name='', Address='' FROM Databas.Tabl WHERE Status='URGENT' AND TimeLogged='Noon';
It's a bit of a hack, but what you're trying to do isn't ideal...
Does this do what you need?
SELECT Tabl.Name ,
Tabl.Address ,
COUNT(Results.UID) AS GrandTotal,
COUNT(CASE WHEN Results.TimeLogged='Noon' THEN 1 END) AS NoonTotal
FROM Databas.Tabl
LEFT JOIN Databas.Tabl Results
ON Tabl.UID = Results.UID
WHERE Status ='URGENT'
GROUP BY Tabl.Name,
Tabl.Address
WITH ROLLUP;
The API you're using to access the database should be able to report to you how many rows were returned - say, if you're running perl, you could do something like this:
my $sth = $dbh->prepare("SELECT Name, Address FROM Databas.Tabl WHERE Status='URGENT'");
my $rv = $sth->execute();
my $rows = $sth->rows;
Grouping by Tabl.id i dont believe would mess up the results. Give it a try and see if thats what you want.