Join or SubQuery? - mysql

I've got a MySQL table that records the addon titles used on various websites, including a version number. For example:
AddonName | Website ID | Version
ZZZ | 1 | 3.3
ZZZ | 2 | 3.4
ZZZ | 3 | 3.4
ZZZ | 4 | 3.1
YYY | 1 | 1.1
YYY | 2 | 1.1
YYY | 3 | 1.1
YYY | 4 | 1.2
I'd like to create a query that returns each distinct AddonName together with the total count, the count of sites using the latest version, and the count of sites using an out-of-date version.
i.e.:
Name | Total Addons | Up to Date | Out of Date
ZZZ | 4 | 2 | 2
YYY | 4 | 1 | 3
I can't figure out how to get this type of data returned, even though the information is all there. I tried using JOIN queries, but didn't have any success.
If it helps make things easier, I can add a 'latest' enum field to the table, to mark rows as up-to-date or out-of-date when they are imported.

Assuming the maximum value is the latest version, try this:
select t1.AddonName,
count(*) as total_Addon,
sum(case when t1.version=t2.version then 1 else 0 end) as up_to_date,
sum(case when t1.version!=t2.version then 1 else 0 end) as out_of_date
from table1 t1
inner join(
select AddonName,max(version) as version
from table1 group by AddonName
)t2 on t1.AddonName=t2.AddonName
group by t1.AddonName
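To sanity-check this query on the sample data from the question, here is a minimal sketch that runs it against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption for illustration), the column WebsiteID is written without the space, and Version is stored as a number; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (AddonName TEXT, WebsiteID INTEGER, Version REAL)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [("ZZZ", 1, 3.3), ("ZZZ", 2, 3.4), ("ZZZ", 3, 3.4), ("ZZZ", 4, 3.1),
     ("YYY", 1, 1.1), ("YYY", 2, 1.1), ("YYY", 3, 1.1), ("YYY", 4, 1.2)],
)

# Join each row to its addon's maximum version, then count matches and non-matches.
rows = conn.execute("""
    select t1.AddonName,
           count(*) as total_Addon,
           sum(case when t1.version = t2.version then 1 else 0 end) as up_to_date,
           sum(case when t1.version != t2.version then 1 else 0 end) as out_of_date
    from table1 t1
    inner join (
        select AddonName, max(version) as version
        from table1 group by AddonName
    ) t2 on t1.AddonName = t2.AddonName
    group by t1.AddonName
    order by t1.AddonName desc
""").fetchall()
print(rows)  # [('ZZZ', 4, 2, 2), ('YYY', 4, 1, 3)]
```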

Try:
SELECT your_table.AddonName,
COUNT(`Website ID`) AS `Total Addons`,
COUNT(IF(Version = your_table_max.max_version, 1, NULL)) AS `Up to Date`,
COUNT(IF(Version <> your_table_max.max_version, 1, NULL)) AS `Out of Date`
FROM your_table
INNER JOIN (SELECT MAX(Version) as max_version, AddonName
FROM your_table group by AddonName) your_table_max
ON your_table_max.AddonName = your_table.AddonName
GROUP BY your_table.AddonName;

Assuming the latest version is the maximum value in the last column (Version):
select t.name, count(*) as TotalAddons,
sum(t.version = tt.maxv) as UpToDate,
sum(t.version <> tt.maxv) as OutOfDate
from t join
(select name, max(version) as maxv
from t
group by name
) tt
on t.name = tt.name
group by t.name;
This calculates the maximum version number for each name in a subquery. It then uses that information for the outer aggregation.
This assumes that version is a number. If it is a string, where 1.10 should count as later than 1.2 (plain string comparison would rank '1.2' higher), then a similar approach is:
select t.name, count(*) as TotalAddons,
sum(t.version = t.maxv) as UpToDate,
sum(t.version <> t.maxv) as OutOfDate
from (select t.*,
(select version
from t t2
where t2.name = t.name
order by length(version) desc, version desc
limit 1
) as maxv
from t
) t
group by t.name;
Of course, this also works for numbers.
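Ordering by length(version) first is a heuristic rather than a general version comparison: it picks '1.10' over '1.9', but it also picks '1.10' over '2.0', because the longer string always wins. A small sketch showing both cases, using SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption); the addon names AAA and BBB and their versions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, version TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    ("AAA", "1.2"), ("AAA", "1.10"), ("AAA", "1.9"),  # hypothetical rows
    ("BBB", "2.0"), ("BBB", "1.10"),
])


def picked_latest(name):
    """Version chosen by 'order by length(version) desc, version desc limit 1'."""
    return conn.execute(
        "select version from t where name = ? "
        "order by length(version) desc, version desc limit 1", (name,)
    ).fetchone()[0]


def plain_max(name):
    """Version chosen by a plain string max(version), for comparison."""
    return conn.execute("select max(version) from t where name = ?", (name,)).fetchone()[0]


print(picked_latest("AAA"), plain_max("AAA"))  # 1.10 1.9
print(picked_latest("BBB"), plain_max("BBB"))  # 1.10 2.0
```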

Try this; it should solve your problem:
select AddonName,
count(AddonName) as countAdd,
(select count(Version) from test1 as t where t.AddonName = test1.AddonName and t.Version = max(test1.Version)) as upToDate,
(select count(Version) from test1 as t where t.AddonName = test1.AddonName and t.Version <> max(test1.Version)) as outOfDate
from test1 GROUP BY AddonName;

Once you add a Latest table (one row per AddonName with its LatestVersion), try the following, substituting your own table names:
select
TableName.AddonName as Name,
count(*) as TotalAddons,
count(case TableName.Version when Latest.LatestVersion
then 1 else null end) as UpToDate,
count(*) - count(case TableName.Version when Latest.LatestVersion
then 1 else null end) as OutOfDate
from TableName join Latest
on TableName.AddonName = Latest.AddonName
group by TableName.AddonName
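Here is a minimal sketch of the Latest-table idea, run in SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption). The Latest table layout (one row per AddonName with a LatestVersion column) is inferred from the join above, and the rows are the question's sample data. Because a column alias such as UpToDate cannot be referenced elsewhere in the same SELECT list, OutOfDate is computed by repeating the conditional COUNT; COUNT skips the NULLs that the CASE produces for non-matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per site in TableName, one row per addon in Latest.
conn.execute("CREATE TABLE TableName (AddonName TEXT, WebsiteID INTEGER, Version TEXT)")
conn.execute("CREATE TABLE Latest (AddonName TEXT PRIMARY KEY, LatestVersion TEXT)")
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?, ?)",
    [("ZZZ", 1, "3.3"), ("ZZZ", 2, "3.4"), ("ZZZ", 3, "3.4"), ("ZZZ", 4, "3.1"),
     ("YYY", 1, "1.1"), ("YYY", 2, "1.1"), ("YYY", 3, "1.1"), ("YYY", 4, "1.2")],
)
conn.executemany("INSERT INTO Latest VALUES (?, ?)", [("ZZZ", "3.4"), ("YYY", "1.2")])

rows = conn.execute("""
    select TableName.AddonName as Name,
           count(*) as TotalAddons,
           -- CASE yields NULL for non-matching rows, and COUNT ignores NULLs
           count(case TableName.Version when Latest.LatestVersion then 1 end) as UpToDate,
           count(*) - count(case TableName.Version when Latest.LatestVersion then 1 end) as OutOfDate
    from TableName join Latest on TableName.AddonName = Latest.AddonName
    group by TableName.AddonName
    order by Name desc
""").fetchall()
print(rows)  # [('ZZZ', 4, 2, 2), ('YYY', 4, 1, 3)]
```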

WITH cte
AS
(
SELECT t.AddonName, MAX(t.Version) AS latest_version
FROM Table1 t
GROUP BY t.AddonName
)
SELECT t.AddonName, COUNT(t.WebsiteID) AS total_addons,
SUM
(
CASE WHEN t.version = cte.latest_version
THEN 1 ELSE 0 END
) AS up_to_date,
SUM
(
CASE WHEN t.version <> cte.latest_version
THEN 1 ELSE 0 END
) AS out_of_date
FROM Table1 t
JOIN cte ON t.AddonName = cte.AddonName
GROUP BY t.AddonName

Related

Sort records based on string

Please consider the table below
Id F1 F2
---------------------------
1 Nima a
2 Eli a
3 Arian a
4 Ava b
5 Arsha b
6 Rozhan c
7 Zhina c
I want to sort the records by column F2 so that they show one record from each category (a, b, c) in turn:
Id F1 F2
---------------------------
1 Nima a
5 Arsha b
6 Rozhan c
2 Eli a
4 Ava b
7 Zhina c
3 Arian a
NOTE: a, b, c could be anything. It should take the first record from each category, then the second record from each category, and so on.
I have tried JOIN and GROUP BY, but with no success.
MySQL version 5.7 – Syed Saqlain
SELECT id, f1, f2
FROM ( SELECT t1.id, t1.f1, t1.f2, COUNT(*) cnt
FROM test t1
JOIN test t2 ON t1.f2 = t2.f2 AND t1.id >= t2.id
GROUP BY t1.id, t1.f1, t1.f2 ) t3
ORDER BY cnt, f2;
https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=8138bd9ab5be36ba534a258d20b2e555
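The self-join gives each row a rank equal to the number of rows in the same f2 category whose id is less than or equal to its own, so sorting by that rank and then by f2 interleaves the categories. A quick check with the question's data, using SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption). Note that the ranking follows id, so Ava (4) comes before Arsha (5), whereas the question's sample output lists them the other way round.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, f1 TEXT, f2 TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    (1, "Nima", "a"), (2, "Eli", "a"), (3, "Arian", "a"), (4, "Ava", "b"),
    (5, "Arsha", "b"), (6, "Rozhan", "c"), (7, "Zhina", "c"),
])

# cnt = position of the row inside its f2 category (1 for the lowest id).
ids = [row[0] for row in conn.execute("""
    SELECT id, f1, f2
    FROM ( SELECT t1.id, t1.f1, t1.f2, COUNT(*) cnt
           FROM test t1
           JOIN test t2 ON t1.f2 = t2.f2 AND t1.id >= t2.id
           GROUP BY t1.id, t1.f1, t1.f2 ) t3
    ORDER BY cnt, f2
""")]
print(ids)  # [1, 4, 6, 2, 5, 7, 3]
```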
A ROW_NUMBER() alternative for older versions of MySQL. This query works on versions 5.5, 5.6 and 5.7.
-- MySQL (v5.7)
SELECT t.id, t.f1, t.f2
FROM (SELECT @row_no:=CASE WHEN @db_names=d.f2 THEN @row_no+1 ELSE 1 END AS row_number
, @db_names:= d.f2
, d.f2
, d.f1
, d.id
FROM test d,
(SELECT @row_no := 0,@db_names:='') x
ORDER BY d.f2, d.id) t
ORDER BY t.row_number, t.f2
See the fiddle at https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=02dbb0086a6dd7c926d55a690bffbd06
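The user variables act as a running counter over the rows sorted by f2: the counter goes up by one while f2 stays the same and resets to 1 when f2 changes. Here is the same logic as a plain Python sketch (not MySQL; the function name interleave is hypothetical), sorting by f2 and then id so the numbering inside each category is deterministic:

```python
# Rows of the question's table as (id, f1, f2) tuples.
ROWS = [(1, "Nima", "a"), (2, "Eli", "a"), (3, "Arian", "a"), (4, "Ava", "b"),
        (5, "Arsha", "b"), (6, "Rozhan", "c"), (7, "Zhina", "c")]


def interleave(rows):
    """Return ids ordered by (position within f2 category, f2)."""
    numbered = []
    prev_f2, row_no = None, 0  # play the roles of the db_names and row_no variables
    for row_id, _f1, f2 in sorted(rows, key=lambda r: (r[2], r[0])):  # ORDER BY f2, id
        row_no = row_no + 1 if f2 == prev_f2 else 1  # restart at 1 on a new category
        prev_f2 = f2
        numbered.append((row_no, f2, row_id))
    # Outer query: ORDER BY row_number, f2
    numbered.sort(key=lambda r: (r[0], r[1]))
    return [row_id for _, _, row_id in numbered]


print(interleave(ROWS))  # [1, 4, 6, 2, 5, 7, 3]
```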
You can use window functions in the order by:
select t.*
from t
order by row_number() over (partition by f2 order by id),
f2;
The row_number() function (as used above) numbers the rows within each value of f2 sequentially, starting at 1.
In older versions of MySQL, you can use a correlated subquery instead:
order by (select count(*) from t t2 where t2.f2 = t.f2 and t2.id <= t.id),
f2;
For performance, you want an index on (f2, id).
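Both variants can be compared on the question's data with SQLite through Python's sqlite3 module (a stand-in for MySQL, assumed here; SQLite 3.25 or later is needed for row_number()). The sketch also creates the (f2, id) index suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, f1 TEXT, f2 TEXT)")
conn.execute("CREATE INDEX t_f2_id ON t (f2, id)")  # index on (f2, id)
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "Nima", "a"), (2, "Eli", "a"), (3, "Arian", "a"), (4, "Ava", "b"),
    (5, "Arsha", "b"), (6, "Rozhan", "c"), (7, "Zhina", "c"),
])

# Window function in ORDER BY (MySQL 8.0+, SQLite 3.25+).
window_ids = [r[0] for r in conn.execute(
    "select t.* from t order by row_number() over (partition by f2 order by id), f2")]

# Correlated subquery for older versions.
subquery_ids = [r[0] for r in conn.execute(
    "select t.* from t "
    "order by (select count(*) from t t2 where t2.f2 = t.f2 and t2.id <= t.id), f2")]

print(window_ids)                  # [1, 4, 6, 2, 5, 7, 3]
print(subquery_ids == window_ids)  # True
```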

SQL: select a count distinct for entries with higher ID than previous and conditions met

Say that I have the following data in a table:
ID ENTRY NAME ENTRY_ID
6 REMOVE ALICE 333
5 ADD JOHN 333
4 REMOVE JOHN 222
3 ADD ALICE 222
2 ADD AMANDA 111
1 ADD JOHN 111
I am trying to count everyone whose latest entry, the one with the highest ENTRY_ID, is an "ADD".
So in this case the count I am looking for is 2: "JOHN" has an "ADD" in 333 and "AMANDA" has an "ADD" in 111, and neither has a later ENTRY_ID with a "REMOVE". "ALICE" is not supposed to be counted, because her newest (highest) ENTRY_ID is a "REMOVE".
How can I most easily achieve this?
You can use window functions:
select count(*)
from (
select t.*, row_number() over(partition by name order by entry_id desc) rn
from mytable t
) t
where rn = 1 and entry = 'ADD'
Or using first_value():
select count(*) cnt
from (
select t.*, first_value(entry) over(partition by name order by entry_id desc) last_entry
from mytable t
) t
where last_entry = 'ADD'
This requires MySQL 8.0. In earlier versions, one option uses a correlated subquery for filtering:
select count(*)
from mytable t
where
t.entry = 'ADD'
and t.entry_id = (select max(t1.entry_id) from mytable t1 where t1.name = t.name)
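A quick check of these queries on the sample data, using SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption; SQLite 3.25 or later is needed for window functions). Numbering each name's rows with order by entry_id desc makes rn = 1 the latest entry; numbering in ascending order instead picks each name's first entry and wrongly counts ALICE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, entry TEXT, name TEXT, entry_id INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", [
    (6, "REMOVE", "ALICE", 333), (5, "ADD", "JOHN", 333), (4, "REMOVE", "JOHN", 222),
    (3, "ADD", "ALICE", 222), (2, "ADD", "AMANDA", 111), (1, "ADD", "JOHN", 111),
])


def count_latest_add(direction):
    """Count names whose rn = 1 row (ordered by entry_id in `direction`) is an ADD."""
    return conn.execute(f"""
        select count(*) from (
            select t.*, row_number() over (partition by name order by entry_id {direction}) rn
            from mytable t
        ) t
        where rn = 1 and entry = 'ADD'""").fetchone()[0]


# Correlated-subquery version for MySQL versions without window functions.
subquery_count = conn.execute("""
    select count(*) from mytable t
    where t.entry = 'ADD'
      and t.entry_id = (select max(t1.entry_id) from mytable t1 where t1.name = t.name)
""").fetchone()[0]

print(count_latest_add("desc"), subquery_count)  # 2 2
print(count_latest_add("asc"))  # 3 (first entry per name, not the latest)
```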
You can get the list using aggregation:
select name
from t
group by name
having max(entry_id) = max(case when entry = 'ADD' then entry_id end);
This gets all names where the entry id of "ADD" matches the last entry id.
You can use a subquery and get the count:
select count(*)
from (select name
from t
group by name
having max(entry_id) = max(case when entry = 'ADD' then entry_id end)
) t;
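A check of the aggregation approach on the sample data, using SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption). For ALICE the highest entry_id is 333 but her highest ADD entry_id is 222, so the HAVING condition drops her.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, entry TEXT, name TEXT, entry_id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (6, "REMOVE", "ALICE", 333), (5, "ADD", "JOHN", 333), (4, "REMOVE", "JOHN", 222),
    (3, "ADD", "ALICE", 222), (2, "ADD", "AMANDA", 111), (1, "ADD", "JOHN", 111),
])

# MAX ignores the NULLs that CASE returns for REMOVE rows.
having = "having max(entry_id) = max(case when entry = 'ADD' then entry_id end)"
names = [r[0] for r in conn.execute(
    f"select name from t group by name {having} order by name")]
latest_add_count = conn.execute(
    f"select count(*) from (select name from t group by name {having}) t").fetchone()[0]
print(names, latest_add_count)  # ['AMANDA', 'JOHN'] 2
```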
Otherwise, I might suggest a correlated subquery:
select count(*)
from t
where t.entry = 'ADD' and
t.entry_id = (select max(t2.entry_id) from t t2 where t2.name = t.name);

update a column in sql using two conditions

I have a table in MySQL in which I want to update a column called status to 'Duplicate' for some records. I want to mark a record as 'Duplicate' based on two conditions:
The record's customer ID appears more than once.
Among the records sharing a customer ID, those that don't have the most recent modified date are marked as duplicates.
I have tried the below code but it gives me an error:
UPDATE test_sql_duplicate
SET status = 'Duplicate'
WHERE test_sql_duplicate.modi_date NOT IN (
SELECT *, max(modi_date)
FROM test_sql_duplicate
GROUP BY cust_id
HAVING COUNT(cust_id > 1)
You could use a LEFT JOIN anti-join pattern to identify the records to update. Basically, a subquery identifies the latest record for each customer, and the update then excludes the matching records:
UPDATE test_sql_duplicate t
LEFT JOIN (
SELECT cust_id, MAX(modi_date) modi_date FROM test_sql_duplicate GROUP BY cust_id
) m ON m.cust_id = t.cust_id and m.modi_date = t.modi_date
SET t.status = 'Duplicate'
WHERE m.cust_id IS NULL
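SQLite, used here through Python's sqlite3 module as a stand-in for MySQL (an assumption), does not accept MySQL's multi-table UPDATE ... JOIN syntax, so this sketch runs the same LEFT JOIN as a SELECT to list the rows the update would touch. The rows for cust_id 444 are the sample data given later in this thread; the single row for cust_id 555 and the row_id column are hypothetical additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_sql_duplicate (row_id INTEGER, cust_id INTEGER, modi_date TEXT, status TEXT)"
)
conn.executemany("INSERT INTO test_sql_duplicate VALUES (?, ?, ?, NULL)", [
    (1, 444, "2019-10-28"), (2, 444, "2019-10-28"), (3, 444, "2019-10-29"),
    (4, 444, "2019-10-30"), (5, 444, "2019-10-30"),
    (6, 555, "2019-10-01"),  # hypothetical customer with a single row
])

# Rows with no match in the "latest row per customer" subquery are the ones to flag.
flagged = [r[0] for r in conn.execute("""
    SELECT t.row_id
    FROM test_sql_duplicate t
    LEFT JOIN (
        SELECT cust_id, MAX(modi_date) modi_date FROM test_sql_duplicate GROUP BY cust_id
    ) m ON m.cust_id = t.cust_id AND m.modi_date = t.modi_date
    WHERE m.cust_id IS NULL
    ORDER BY t.row_id
""")]
print(flagged)  # [1, 2, 3]
```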
I suspect we may be wanting a query something like this:
UPDATE TEST_SQL_DUPLICATE t
JOIN (
SELECT n.cust_id
, MAX(n.modi_date) AS max_modi_date
FROM TEST_SQL_DUPLICATE n
GROUP
BY n.cust_id
HAVING COUNT(n.cust_id) > 1
) d
ON d.cust_id = t.cust_id
AND d.max_modi_date > t.modi_date
SET t.status = 'Duplicate'
Given sample data:
_row cust_id modi_date
------- ----------
1 444 2019-10-28
2 444 2019-10-28
3 444 2019-10-29
4 444 2019-10-30
5 444 2019-10-30
the query in this answer would flag rows 1 through 3, setting the status column to 'Duplicate'. Rows 4 and 5 would not be marked, because they both have the same (maximum) modi_date.
We would also achieve the same result if we omitted the HAVING clause from the inline view query.
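The claim about the HAVING clause can be checked by running the join as a SELECT in SQLite through Python's sqlite3 module (a stand-in for MySQL, assumed here, since SQLite lacks MySQL's UPDATE ... JOIN). Without HAVING, a single-row customer (the hypothetical cust_id 555, with a hypothetical row_id column) still survives the join, but its only row is never older than its own maximum, so it is not flagged either.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_sql_duplicate (row_id INTEGER, cust_id INTEGER, modi_date TEXT)")
conn.executemany("INSERT INTO test_sql_duplicate VALUES (?, ?, ?)", [
    (1, 444, "2019-10-28"), (2, 444, "2019-10-28"), (3, 444, "2019-10-29"),
    (4, 444, "2019-10-30"), (5, 444, "2019-10-30"),
    (6, 555, "2019-10-01"),  # hypothetical customer with a single row
])


def flagged(having_clause):
    """row_ids the JOIN in the UPDATE would match, run as a SELECT."""
    return [r[0] for r in conn.execute(f"""
        SELECT t.row_id
        FROM test_sql_duplicate t
        JOIN (
            SELECT n.cust_id, MAX(n.modi_date) AS max_modi_date
            FROM test_sql_duplicate n
            GROUP BY n.cust_id
            {having_clause}
        ) d ON d.cust_id = t.cust_id AND d.max_modi_date > t.modi_date
        ORDER BY t.row_id
    """)]


print(flagged("HAVING COUNT(n.cust_id) > 1"))  # [1, 2, 3]
print(flagged(""))                              # [1, 2, 3]
```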
Here is a quick and dirty way (it needs MySQL 8.0 or later, because it uses window functions):
UPDATE test_sql_duplicate SET status = 'Duplicate'
WHERE (cust_id, modi_date) IN (
SELECT t.id, t.date FROM (
SELECT
modi_date date,
cust_id id,
COUNT(*) OVER(PARTITION BY cust_id) cnt,
MAX(modi_date) OVER(PARTITION BY cust_id) maxDate
FROM test_sql_duplicate
) t
WHERE t.date < t.maxDate AND t.cnt > 1);

How do I Compare columns of records from the same table?

Here is my testing table data:
Testing
ID Name Payment_Date Fee Amt
1 BankA 2016-04-01 100 20000
2 BankB 2016-04-02 200 10000
3 BankA 2016-04-03 100 20000
4 BankB 2016-04-04 300 20000
I am trying to compare the Name, Fee and Amt fields of each record with those of the other records. If another record has the same values, I'd like to mark the record with something like 'Y'. Here is the expected result:
ID Name Payment_Date Fee Amt SameDataExistYN
1 BankA 2016-04-01 100 20000 Y
2 BankB 2016-04-02 200 10000 N
3 BankA 2016-04-03 100 20000 Y
4 BankB 2016-04-04 300 20000 N
I have tried the two methods below, but I am looking for other solutions so I can pick the best one for my work.
Method 1.
select t.*, iif((select count(*) from testing where name=t.name and fee=t.fee and amt=t.amt)=1,'N','Y') as SameDataExistYN from testing t
Method 2.
select t.*, case when ((b.Name = t.Name)
and (b.Fee = t.Fee) and (b.Amt = t.Amt)) then 'Y' else 'N' end as SameDataExistYN
from testing t
left join ( select Name, Fee, Amt
from testing
Group By Name, Fee, Amt
Having count(*)>1 ) as b on b.Name = t.Name
and b.Fee = t.Fee
and b.Amt = t.Amt
There are several approaches, with differences in performance characteristics.
One option is to run a correlated subquery. This approach is best suited if you have a suitable index, and you are pulling a relatively small number of rows.
SELECT t.id
, t.name
, t.payment_date
, t.fee
, t.amt
, ( SELECT 'Y'
FROM testing s
WHERE s.name = t.name
AND s.fee = t.fee
AND s.amt = t.amt
AND s.id <> t.id
LIMIT 1
) AS SameDataExist
FROM testing t
WHERE ...
LIMIT ...
The correlated subquery in the SELECT list will return a Y when there is at least one "matching" row found. If no "matching" row is found, the SameDataExist column will have a value of NULL. To convert the NULL to an 'N', you could wrap the subquery in an IFNULL() function.
Your method 2 is a workable approach. The expression in the SELECT list doesn't need to do all those comparisons; they have already been done in the join predicates. All you need to know is whether a matching row was found, and testing one of the columns for NULL/NOT NULL is sufficient.
SELECT t.id
, t.name
, t.payment_date
, t.fee
, t.amt
, IF(s.name IS NOT NULL,'Y','N') AS SameDataExists
FROM testing t
LEFT
JOIN ( -- tuples that occur in more than one row
SELECT r.name, r.fee, r.amt
FROM testing r
GROUP BY r.name, r.fee, r.amt
HAVING COUNT(1) > 1
) s
ON s.name = t.name
AND s.fee = t.fee
AND s.amt = t.amt
WHERE ...
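A check of the LEFT JOIN version on the question's data, using SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption). SQLite has no IF() function, so CASE is swapped in for it, and the WHERE placeholder is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE testing (id INTEGER, name TEXT, payment_date TEXT, fee INTEGER, amt INTEGER)"
)
conn.executemany("INSERT INTO testing VALUES (?, ?, ?, ?, ?)", [
    (1, "BankA", "2016-04-01", 100, 20000), (2, "BankB", "2016-04-02", 200, 10000),
    (3, "BankA", "2016-04-03", 100, 20000), (4, "BankB", "2016-04-04", 300, 20000),
])

result = conn.execute("""
    SELECT t.id,
           CASE WHEN s.name IS NOT NULL THEN 'Y' ELSE 'N' END AS SameDataExists
    FROM testing t
    LEFT JOIN (  -- tuples that occur in more than one row
        SELECT r.name, r.fee, r.amt
        FROM testing r
        GROUP BY r.name, r.fee, r.amt
        HAVING COUNT(1) > 1
    ) s ON s.name = t.name AND s.fee = t.fee AND s.amt = t.amt
    ORDER BY t.id
""").fetchall()
print(result)  # [(1, 'Y'), (2, 'N'), (3, 'Y'), (4, 'N')]
```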
You could also make use of an EXISTS (correlated subquery)
Check this out
Select statement to find duplicates on certain fields
Not sure how to mark this as a dupe...
Here is another method, but I think you have to run tests on your data to find out which is best:
SELECT
t.*,
CASE WHEN EXISTS(
SELECT * FROM testing WHERE id <> t.id AND Name = t.Name AND Fee = t.Fee AND Amt = t.Amt
) THEN 'Y' ELSE 'N' END SameDataExistYN
FROM
testing t
;
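Both this EXISTS version and the correlated subquery from the earlier answer, wrapped in IFNULL(), produce the expected Y/N column on the question's data. A sketch using SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE testing (id INTEGER, name TEXT, payment_date TEXT, fee INTEGER, amt INTEGER)"
)
conn.executemany("INSERT INTO testing VALUES (?, ?, ?, ?, ?)", [
    (1, "BankA", "2016-04-01", 100, 20000), (2, "BankB", "2016-04-02", 200, 10000),
    (3, "BankA", "2016-04-03", 100, 20000), (4, "BankB", "2016-04-04", 300, 20000),
])

# EXISTS: is there another row (different id) with the same name, fee and amt?
exists_flags = [r[0] for r in conn.execute("""
    SELECT CASE WHEN EXISTS (
               SELECT * FROM testing
               WHERE id <> t.id AND name = t.name AND fee = t.fee AND amt = t.amt
           ) THEN 'Y' ELSE 'N' END
    FROM testing t ORDER BY t.id
""")]

# Scalar subquery returning 'Y' or NULL, with IFNULL turning NULL into 'N'.
ifnull_flags = [r[0] for r in conn.execute("""
    SELECT IFNULL((SELECT 'Y' FROM testing s
                   WHERE s.name = t.name AND s.fee = t.fee AND s.amt = t.amt AND s.id <> t.id
                   LIMIT 1), 'N')
    FROM testing t ORDER BY t.id
""")]

print(exists_flags, ifnull_flags)  # ['Y', 'N', 'Y', 'N'] ['Y', 'N', 'Y', 'N']
```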
select t.name, t.fee, t.amt, if(count(*) > 1, 'Y', 'N') as SameDataExistYN from testing t group by t.name, t.fee, t.amt

Fetch 2nd Highest value from MySQL DB with GROUP BY

I have a table tbl_patient and I want to fetch the last 2 visits of each patient in order to check whether the patient's condition is improving or degrading.
tbl_patient
id | patient_ID | visit_ID | patient_result
1 | 1 | 1 | 5
2 | 2 | 1 | 6
3 | 2 | 3 | 7
4 | 1 | 2 | 3
5 | 2 | 3 | 2
6 | 1 | 3 | 9
I tried the query below to fetch the last visit of each patient:
SELECT MAX(id), patient_result FROM `tbl_patient` GROUP BY `patient_ID`
Now I want to fetch the second-to-last visit of each patient, but my query gives me this error:
(#1242 - Subquery returns more than 1 row)
SELECT id, patient_result FROM `tbl_patient` WHERE id <(SELECT MAX(id) FROM `tbl_patient` GROUP BY `patient_ID`) GROUP BY `patient_ID`
Where am I going wrong?
select p1.patient_id, p2.maxid id1, max(p1.id) id2
from tbl_patient p1
join (select patient_id, max(id) maxid
from tbl_patient
group by patient_id) p2
on p1.patient_id = p2.patient_id and p1.id < p2.maxid
group by p1.patient_id
id1 is the ID of the last visit and id2 is the ID of the second-to-last visit.
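A check of this query on the question's data, using SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption). A patient with only one visit would not appear in the result, because the join requires an id lower than that patient's maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_patient (id INTEGER, patient_ID INTEGER, visit_ID INTEGER, patient_result INTEGER)"
)
conn.executemany("INSERT INTO tbl_patient VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7), (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9),
])

# p2 holds each patient's highest id; the outer MAX picks the highest id below it.
rows = conn.execute("""
    select p1.patient_id, p2.maxid id1, max(p1.id) id2
    from tbl_patient p1
    join (select patient_id, max(id) maxid
          from tbl_patient
          group by patient_id) p2
      on p1.patient_id = p2.patient_id and p1.id < p2.maxid
    group by p1.patient_id
    order by p1.patient_id
""").fetchall()
print(rows)  # [(1, 6, 4), (2, 5, 3)]
```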
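Your first query doesn't get the last visits: it returns patient_result values 5 and 6 instead of 9 and 2, because patient_result is not aggregated, so MySQL takes it from an arbitrary row in each group rather than from the row with MAX(id).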
You can try this query:
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
union
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
where id not in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
GROUP BY patient_ID)
order by 1,2
SELECT t1.patient_ID, MAX(t1.id) AS second_last_id FROM `tbl_patient` t1
JOIN (SELECT MAX(id) AS max_id, patient_ID FROM `tbl_patient` GROUP BY `patient_ID`) t2
ON t1.patient_ID = t2.patient_ID
WHERE t1.id < t2.max_id GROUP BY t1.`patient_ID`
There are a couple of approaches to getting the specified resultset returned in a single SQL statement.
Unfortunately, most of those approaches yield rather unwieldy statements.
The more elegant-looking statements tend to perform poorly (or unbearably) on large sets, while the statements that perform better tend to look less elegant.
Three of the most common approaches make use of:
correlated subquery
inequality join (nearly a Cartesian product)
two passes over the data
Here's an approach that uses two passes over the data, using MySQL user variables, which basically emulates the analytic ROW_NUMBER() OVER (PARTITION BY ...) function available in other DBMSs:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM (
SELECT p.id
, p.patient_id
, p.visit_id
, p.patient_result
, @rn := if(@prev_patient_id = patient_id, @rn + 1, 1) AS rn
, @prev_patient_id := patient_id AS prev_patient_id
FROM tbl_patient p
JOIN (SELECT @rn := 0, @prev_patient_id := NULL) i
ORDER BY p.patient_id DESC, p.id DESC
) t
WHERE t.rn <= 2
Note that this involves an inline view, which means there's going to be a pass over all the data in the table to create a "derived table". Then the outer query will run against the derived table. So this is essentially two passes over the data.
This query can be tweaked a bit to improve performance, by eliminating the duplicated value of the patient_id column returned by the inline view. But I show it as above, so we can better understand what is happening.
This approach can be rather expensive on large sets, but is generally MUCH more efficient than some of the other approaches.
Note also that this query will return a row for a patient_id even if only one id value exists for that patient; it does not restrict the result to patients that have at least two rows.
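On MySQL 8.0 or later, the per-patient counter that the user variables build can be written directly with the ROW_NUMBER() window function. This sketch swaps that function in for the user variables and runs it in SQLite through Python's sqlite3 module (a stand-in for MySQL, assumed here; SQLite 3.25 or later), keeping the two highest ids per patient from the question's data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_patient (id INTEGER, patient_ID INTEGER, visit_ID INTEGER, patient_result INTEGER)"
)
conn.executemany("INSERT INTO tbl_patient VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7), (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9),
])

# rn restarts at 1 for each patient, counting from the highest id downwards.
rows = conn.execute("""
    SELECT t.patient_id, t.id
    FROM (
        SELECT p.*,
               ROW_NUMBER() OVER (PARTITION BY p.patient_id ORDER BY p.id DESC) AS rn
        FROM tbl_patient p
    ) t
    WHERE t.rn <= 2
    ORDER BY t.patient_id, t.id
""").fetchall()
print(rows)  # [(1, 4), (1, 6), (2, 3), (2, 5)]
```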
It's also possible to get an equivalent resultset with a correlated subquery:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
WHERE ( SELECT COUNT(1) AS cnt
FROM tbl_patient p
WHERE p.patient_id = t.patient_id
AND p.id >= t.id
) <= 2
ORDER BY t.patient_id ASC, t.id ASC
Note that this is making use of a "dependent subquery", which basically means that for each row returned from t, MySQL is effectively running another query against the database. So, this will tend to be very expensive (in terms of elapsed time) on large sets.
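On the question's data, the dependent subquery keeps exactly the two highest ids for each patient. A sketch using SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_patient (id INTEGER, patient_ID INTEGER, visit_ID INTEGER, patient_result INTEGER)"
)
conn.executemany("INSERT INTO tbl_patient VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7), (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9),
])

# For each row, count the same patient's rows with an id at least as high; keep counts 1 and 2.
rows = conn.execute("""
    SELECT t.patient_id, t.id
    FROM tbl_patient t
    WHERE ( SELECT COUNT(1)
            FROM tbl_patient p
            WHERE p.patient_id = t.patient_id
              AND p.id >= t.id
          ) <= 2
    ORDER BY t.patient_id ASC, t.id ASC
""").fetchall()
print(rows)  # [(1, 4), (1, 6), (2, 3), (2, 5)]
```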
As another approach, if there are relatively few id values for each patient, you might be able to get by with an inequality join:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
LEFT
JOIN tbl_patient p
ON p.patient_id = t.patient_id
AND t.id < p.id
GROUP
BY t.id
, t.patient_id
, t.visit_id
, t.patient_result
HAVING COUNT(p.id) < 2
Note that this will create a nearly Cartesian product for each patient. For a limited number of id values per patient, this won't be too bad. But if a patient has hundreds of id values, the intermediate result can be huge, on the order of O(n^2).
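A subtlety with the LEFT JOIN: the latest visit has no later row, but the outer join still produces one row for it, so COUNT(1) is 1 for both the latest and the second-latest visit and 2 for the third-latest. Counting only matched rows with COUNT(p.id) and keeping those with fewer than 2 later visits returns just the last two. A check using SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_patient (id INTEGER, patient_ID INTEGER, visit_ID INTEGER, patient_result INTEGER)"
)
conn.executemany("INSERT INTO tbl_patient VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7), (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9),
])


def last_two(having):
    """Rows kept by the inequality join with the given HAVING condition."""
    return conn.execute(f"""
        SELECT t.patient_id, t.id
        FROM tbl_patient t
        LEFT JOIN tbl_patient p
          ON p.patient_id = t.patient_id AND t.id < p.id
        GROUP BY t.id, t.patient_id, t.visit_id, t.patient_result
        HAVING {having}
        ORDER BY t.patient_id, t.id
    """).fetchall()


print(last_two("COUNT(1) <= 2"))    # [(1, 1), (1, 4), (1, 6), (2, 2), (2, 3), (2, 5)]
print(last_two("COUNT(p.id) < 2"))  # [(1, 4), (1, 6), (2, 3), (2, 5)]
```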
Try this:
SELECT patient_ID, MAX(id) AS second_last_id FROM tbl_patient AS tp WHERE id < (SELECT MAX(id) FROM tbl_patient AS tp_max WHERE tp_max.patient_ID = tp.patient_ID) GROUP BY patient_ID
Why not simply use...
GROUP BY `patient_ID` DESC LIMIT 2
... and do the rest in the next step?