Select only unique records from multiple columns - mysql

I have a table that logs downloads by IP, version and platform. Looking at the table manually I see a lot of duplicates where all 3 of those values are the same. (user is probably just impatient) I'd like to use a SELECT statement that filters out the duplicates and only returns one of the entries if all 3 of those values are the same. Even more advanced, if possible, I also have a date/time field that uses CURRENT_TIMESTAMP. Would be nice if I could include duplicates if they are from different days, but not different times. So I can see if the same user is downloading again on a different day.
I'm mainly just trying to get statistics on how many unique people download each version each day. The structure of the DB table is simple...
key (AUTO_INCREMENT), date (CURRENT_TIMESTAMP), ip, user_agent, platform, version
The software has a Windows and Mac version (platform) and I offer both the current version and a few distinct past versions that were before major changes.

Just group by the fields you want to exclude from being duplicated, like
SELECT ip, platform, version, COUNT(*) AS number_of_tries, max(download_date) AS last_download_date
FROM downloads
GROUP BY ip, platform, version, DATE(download_date)
It would then be relatively easy to do some more advanced filtering over the result grouping by day, etc.
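As a rough sketch of that follow-up grouping, the snippet below runs the query and a per-day statistic against a small in-memory database. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL; the table name downloads, the column download_date and the sample rows are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE downloads (ip TEXT, platform TEXT, version TEXT, download_date TEXT)"
)
conn.executemany(
    "INSERT INTO downloads VALUES (?, ?, ?, ?)",
    [
        ("1.1.1.1", "win", "2.0", "2024-01-01 10:00:00"),
        ("1.1.1.1", "win", "2.0", "2024-01-01 10:00:05"),  # impatient repeat
        ("1.1.1.1", "win", "2.0", "2024-01-02 09:00:00"),  # same user, next day
        ("2.2.2.2", "mac", "2.0", "2024-01-01 11:00:00"),
    ],
)

# One row per ip/platform/version/day, with the number of attempts that day.
per_user_day = conn.execute(
    """
    SELECT ip, platform, version, DATE(download_date) AS day,
           COUNT(*) AS number_of_tries
    FROM downloads
    GROUP BY ip, platform, version, DATE(download_date)
    ORDER BY day, ip
    """
).fetchall()

# Unique downloaders per day, version and platform.
daily_stats = conn.execute(
    """
    SELECT DATE(download_date) AS day, version, platform,
           COUNT(DISTINCT ip) AS unique_ips
    FROM downloads
    GROUP BY DATE(download_date), version, platform
    ORDER BY day, version, platform
    """
).fetchall()

for row in daily_stats:
    print(row)
```

COUNT(DISTINCT ip) is what collapses the repeated attempts, so the second query does not depend on the first.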

In MySQL 8.0+ you can use row_number(), partitioning by all three columns plus the day:
select * from (select *,
row_number() over (partition by ip, platform, version, date(datetime) order by datetime) rn
from table_name
) a where a.rn = 1

Is this what you want? It returns the first record on each date for the ip/platform/version combination:
select t.*
from <tablename> t
where t.datetime = (select min(t2.datetime)
from <tablename> t2
where t2.ip = t.ip and
t2.platform = t.platform and
t2.version = t.version and
date(t2.datetime) = date(t.datetime)
);
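A minimal way to try the correlated-subquery version, assuming SQLite (through Python's sqlite3) in place of MySQL. The table name downloads, the column downloaded_at and the rows are hypothetical stand-ins for the placeholders above.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE downloads (
        id INTEGER PRIMARY KEY,
        downloaded_at TEXT,
        ip TEXT,
        platform TEXT,
        version TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO downloads (downloaded_at, ip, platform, version) VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01 10:00:00", "1.1.1.1", "win", "2.0"),  # id 1: first of the day
        ("2024-01-01 10:00:05", "1.1.1.1", "win", "2.0"),  # id 2: duplicate, dropped
        ("2024-01-02 09:00:00", "1.1.1.1", "win", "2.0"),  # id 3: new day, kept
        ("2024-01-01 10:01:00", "1.1.1.1", "win", "1.9"),  # id 4: other version, kept
    ],
)

# Keep only the earliest row per ip/platform/version/day.
first_per_day = conn.execute(
    """
    SELECT t.id, t.ip, t.platform, t.version, t.downloaded_at
    FROM downloads AS t
    WHERE t.downloaded_at = (
        SELECT MIN(t2.downloaded_at)
        FROM downloads AS t2
        WHERE t2.ip = t.ip
          AND t2.platform = t.platform
          AND t2.version = t.version
          AND DATE(t2.downloaded_at) = DATE(t.downloaded_at)
    )
    ORDER BY t.id
    """
).fetchall()

kept_ids = [row[0] for row in first_per_day]
print(kept_ids)
```

If two rows in the same group share the exact same earliest timestamp, this form returns both of them.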

Related

How to select records with a count >30?

So I have this data set (down below) and I'm simply trying to gather all data based on records in field 1 that have a count of more than 30 (meaning a distinct brand that has 30+ record entries) that's it lol!
I've been trying a lot of different distinct, count, etc. type of queries but I'm falling short. Any help is appreciated :)
Data Set
By using GROUP BY and HAVING you can achieve this. The subquery finds the brands that have at least 30 rows; to select more columns, just add them to the outer SELECT.
SELECT Mens_Brand FROM your_table
WHERE Mens_Brand IN (SELECT Mens_Brand
FROM your_table
GROUP BY Mens_Brand
HAVING COUNT(Mens_Brand)>=30)
You can simply use a window function (requires mysql 8 or mariadb 10.2) for this:
select Mens_Brand, Mens_Price, Shoe_Condition, Currency, PK
from (
select Mens_Brand, Mens_Price, Shoe_Condition, Currency, PK, count(1) over (partition by Mens_Brand) brand_count
from your_table
) counted where brand_count >= 30
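The two approaches can be compared side by side in a small sketch. It assumes SQLite 3.25+ (through Python's sqlite3) in place of MySQL, uses a threshold of 3 instead of 30 to keep the sample data short, and the rows themselves are invented.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (PK INTEGER PRIMARY KEY, Mens_Brand TEXT, Mens_Price REAL)"
)
brands = ["Nike"] * 3 + ["Adidas"] * 2 + ["Puma"]
conn.executemany(
    "INSERT INTO your_table (Mens_Brand, Mens_Price) VALUES (?, ?)",
    [(brand, 50.0 + i) for i, brand in enumerate(brands)],
)

MIN_ROWS = 3  # the question uses 30; 3 keeps the demo data small

# GROUP BY / HAVING picks the qualifying brands, the outer query returns their rows.
via_having = conn.execute(
    """
    SELECT PK, Mens_Brand, Mens_Price
    FROM your_table
    WHERE Mens_Brand IN (
        SELECT Mens_Brand
        FROM your_table
        GROUP BY Mens_Brand
        HAVING COUNT(*) >= ?
    )
    ORDER BY PK
    """,
    (MIN_ROWS,),
).fetchall()

# Window function: attach the per-brand count to every row, then filter on it.
via_window = conn.execute(
    """
    SELECT PK, Mens_Brand, Mens_Price
    FROM (
        SELECT PK, Mens_Brand, Mens_Price,
               COUNT(*) OVER (PARTITION BY Mens_Brand) AS brand_count
        FROM your_table
    ) AS counted
    WHERE brand_count >= ?
    ORDER BY PK
    """,
    (MIN_ROWS,),
).fetchall()

print(via_having == via_window)
```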

Latest data sorted by latest date and test run

I was wondering if there is an easy way to sort a table view, so you only get every latest different test run, sorted by date.
I have a table with a date, engine (It's like a project name for a test, and it changes depending on tests), and the ID of every run.
SELECT
`testjob`.`id` AS `Testjobid`,
`testjob`.`engine` AS `Engine`,
`testjob`.`StartTime`
FROM TestRun
ORDER BY `testjob`.`StartTime` DESC
So after the SQL is run, this table will be shown:
But actually I only need this:
NOTE: This table updates every day, and the engine names will also change over time, so I can't just type the three engine names and get the latest data that way.
The only solution I can think of, would be making an SQL to get every engine name, and then make a loop to run through the table for every engine, but I hope there is a better way than this?
You can use correlated subquery :
select tr.*
from TestRun tr
where tr.StartTime = (select max(tr1.StartTime) from TestRun tr1 where tr1.engine = tr.engine);
In newer version, you can use row_number() :
select tr.*
from (select tr.*, row_number() over (partition by tr.engine order by tr.StartTime desc) as seq
from TestRun tr
) tr
where tr.seq = 1;
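If there are ties on StartTime within an engine, the correlated subquery returns every tied row, so it would need to be modified (for example with a tie-breaking column and limit 1) to return just one.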
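How the two queries behave when StartTime ties can be checked with a small sketch. It assumes SQLite 3.25+ (through Python's sqlite3) rather than MySQL, an id column used as tie-breaker, and made-up rows.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestRun (id INTEGER PRIMARY KEY, engine TEXT, StartTime TEXT)")
conn.executemany(
    "INSERT INTO TestRun (id, engine, StartTime) VALUES (?, ?, ?)",
    [
        (1, "alpha", "2024-01-01 08:00:00"),
        (2, "alpha", "2024-01-02 08:00:00"),
        (3, "alpha", "2024-01-02 08:00:00"),  # ties with id 2
        (4, "beta", "2024-01-01 09:00:00"),
    ],
)

# Correlated subquery: every row whose StartTime equals the engine's maximum.
latest_with_ties = conn.execute(
    """
    SELECT tr.id
    FROM TestRun AS tr
    WHERE tr.StartTime = (
        SELECT MAX(tr1.StartTime) FROM TestRun AS tr1 WHERE tr1.engine = tr.engine
    )
    ORDER BY tr.id
    """
).fetchall()

# row_number(): exactly one row per engine; id DESC breaks ties deterministically.
latest_one_each = conn.execute(
    """
    SELECT id
    FROM (
        SELECT tr.*,
               ROW_NUMBER() OVER (
                   PARTITION BY tr.engine
                   ORDER BY tr.StartTime DESC, tr.id DESC
               ) AS seq
        FROM TestRun AS tr
    ) AS ranked
    WHERE seq = 1
    ORDER BY id
    """
).fetchall()

tied_ids = [row[0] for row in latest_with_ties]
single_ids = [row[0] for row in latest_one_each]
print(tied_ids, single_ids)
```

Adding the id to the window's ORDER BY is a design choice of this sketch; without a tie-breaker, which of the tied rows gets seq = 1 is not defined.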

How can I convert/fix this WITH statement in SQL?

I have this query but apparently, the WITH statement has not been implemented in some database systems as yet. How can I rewrite this query to achieve the same result.
Basically what this query is supposed to do is to provide the branch names of all the branches in a database whose deposit total is less than the average of all the branches put together.
WITH branch_total (branch_name, value) AS (
SELECT branch_name, SUM(balance) FROM account
GROUP BY branch_name
),
branch_total_avg (value) AS (
SELECT AVG(value) FROM branch_total
)
SELECT branch_name
FROM branch_total, branch_total_avg
WHERE branch_total.value < branch_total_avg.value;
Can this be written any other way without the WITH? Please help.
WITH syntax was introduced as a new feature of MySQL 8.0. You have noticed that it is not supported in earlier versions of MySQL. If you can't upgrade to MySQL 8.0, you'll have to rewrite the query using subqueries like the following:
SELECT branch_total.branch_name
FROM (
SELECT branch_name, SUM(balance) AS value FROM account
GROUP BY branch_name
) AS branch_total
CROSS JOIN (
SELECT AVG(value) AS value FROM (
SELECT SUM(balance) AS value FROM account GROUP BY branch_name
) AS sums
) AS branch_total_avg
WHERE branch_total.value < branch_total_avg.value;
In this case, the WITH syntax doesn't provide any advantage, so you might as well write it this way.
Another approach, which may be more efficient because it can probably avoid the use of temporary tables in the query, is to split it into two queries:
SELECT AVG(value) INTO @avg FROM (
SELECT SUM(balance) AS value FROM account GROUP BY branch_name
) AS sums;
SELECT branch_name, SUM(balance) AS value FROM account
GROUP BY branch_name
HAVING value < @avg;
This approach is certainly easier to read and debug, and there's some advantage to writing more straightforward code, to allow more developers to maintain it without having to post on Stack Overflow for help.
Another way to rewrite this query:
SELECT branch_name
FROM account
GROUP BY branch_name
HAVING SUM(balance) < (SELECT AVG(value)
FROM (SELECT branch_name, SUM(balance) AS value
FROM account
GROUP BY branch_name) t1)
As you can see from this code the account table has nearly the same aggregate query run against it twice, once at the outer level and again nested two levels deep.
The benefit of the WITH clause is that you can write that aggregate query once, give it a name, and use it as many times as needed. Additionally, a smart DB engine will only run that subfactored query once but use the results as often as needed.
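To check that the WITH form and the nested-subquery form agree, here is a small sketch. It assumes SQLite (through Python's sqlite3) instead of MySQL, and the account rows are invented.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the balances are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (branch_name TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [
        ("A", 100), ("A", 150),  # total 250
        ("B", 50),               # total 50
        ("C", 400), ("C", 150),  # total 550
    ],
)

# WITH form: the per-branch totals are named once and reused.
with_cte = conn.execute(
    """
    WITH branch_total (branch_name, value) AS (
        SELECT branch_name, SUM(balance) FROM account GROUP BY branch_name
    ),
    branch_total_avg (value) AS (
        SELECT AVG(value) FROM branch_total
    )
    SELECT branch_total.branch_name
    FROM branch_total, branch_total_avg
    WHERE branch_total.value < branch_total_avg.value
    ORDER BY branch_total.branch_name
    """
).fetchall()

# Same result without WITH: the aggregate is written out twice.
without_cte = conn.execute(
    """
    SELECT branch_name
    FROM account
    GROUP BY branch_name
    HAVING SUM(balance) < (
        SELECT AVG(value)
        FROM (SELECT SUM(balance) AS value FROM account GROUP BY branch_name) AS t1
    )
    ORDER BY branch_name
    """
).fetchall()

print(with_cte, without_cte)
```

With these rows the average branch total is about 283.3, so branches A and B fall below it.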

SQL request to both group responses and retrieve only the last line of each

I have a table without IDs. it has 3 columns: the name of a computer, its status (on/off) at the moment of the poll, and the timestamp of the insertion.
if I run
select * from computers group by name;
I get a line for each computer (there are 200 different ones), but these lines don't always hold the latest entry for it.
I then tried
select * from computers group by name order by timestamp asc;
But I get incoherent responses (some recent timestamps, some old ones... no idea why).
It's basically the same problem as here : SQL: GROUP BY records and then get last record from each group?, but I don't have ids to help :(
You can write:
SELECT computers.name,
computers.status,
computers.timestamp
FROM ( SELECT name,
MAX(timestamp) AS max_timestamp
FROM computers
GROUP
BY name
) AS t
JOIN computers
ON computers.name = t.name
AND computers.timestamp = t.max_timestamp
;
The above uses this subquery to find the greatest timestamp for each name:
SELECT name,
MAX(timestamp) AS max_timestamp
FROM computers
GROUP
BY name
;
and then it gathers fields from computers whose name and timestamp match something that the subquery returned.
The reason that your order by clause has no effect is that it comes too "late": it's used to order records that are going to be returned, after it's already determined that they will be returned. To quote from §11.16.3 "GROUP BY and HAVING with Hidden Columns" in the MySQL 5.6 Reference Manual on this subject:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values the server chooses.
Another way is to write a correlated subquery, and dispense with the GROUP BY entirely. This:
SELECT name, status, timestamp
FROM computers AS c1
WHERE NOT EXISTS
( SELECT 1
FROM computers
WHERE name = c1.name
AND timestamp > c1.timestamp
)
;
finds all rows in computers that haven't been superseded by more-recent rows with the same name. The same approach can be done with a join:
SELECT c1.name, c1.status, c1.timestamp
FROM computers AS c1
LEFT
OUTER
JOIN computers AS c2
ON c2.name = c1.name
AND c2.timestamp > c1.timestamp
WHERE c2.name IS NULL
;
which is less clear IMHO, but may perform better.
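The three formulations can be run against the same data to confirm they agree. This is only a sketch: it assumes SQLite (through Python's sqlite3) in place of MySQL, and the rows are made up. It says nothing about relative performance.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE computers (name TEXT, status TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO computers VALUES (?, ?, ?)",
    [
        ("pc1", "on", "2024-01-01 08:00:00"),
        ("pc1", "off", "2024-01-02 08:00:00"),
        ("pc2", "off", "2024-01-01 08:00:00"),
        ("pc2", "on", "2024-01-03 08:00:00"),
    ],
)

# 1. Join against the per-name maximum timestamp.
via_max_join = conn.execute(
    """
    SELECT computers.name, computers.status, computers.timestamp
    FROM (
        SELECT name, MAX(timestamp) AS max_timestamp
        FROM computers
        GROUP BY name
    ) AS t
    JOIN computers
      ON computers.name = t.name
     AND computers.timestamp = t.max_timestamp
    ORDER BY computers.name
    """
).fetchall()

# 2. NOT EXISTS: keep rows that no newer row of the same name supersedes.
via_not_exists = conn.execute(
    """
    SELECT c1.name, c1.status, c1.timestamp
    FROM computers AS c1
    WHERE NOT EXISTS (
        SELECT 1
        FROM computers AS c2
        WHERE c2.name = c1.name
          AND c2.timestamp > c1.timestamp
    )
    ORDER BY c1.name
    """
).fetchall()

# 3. Anti-join: LEFT JOIN to newer rows and keep rows that found none.
via_anti_join = conn.execute(
    """
    SELECT c1.name, c1.status, c1.timestamp
    FROM computers AS c1
    LEFT OUTER JOIN computers AS c2
      ON c2.name = c1.name
     AND c2.timestamp > c1.timestamp
    WHERE c2.name IS NULL
    ORDER BY c1.name
    """
).fetchall()

print(via_max_join)
```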

MySQL Query in a View

I have the following query I use and it works great:
SELECT * FROM
(
SELECT * FROM `Transactions` ORDER BY DATE DESC
) AS tmpTable
GROUP BY Machine
ORDER BY Machine ASC
What's not great, is when I try to create a view from it. It says that subqueries cannot be used in a view, which is fine - I've searched here and on Google and most people say to break this down into multiple views. Ok.
I created a view that orders by date, and then a view that just uses that view to group by and order by machines - the results however, are not the same. It seems to have taken the date ordering and thrown it out the window.
Any and all help will be appreciated, thanks.
This ended up being the solution, after hours of trying, apparently you can use a subquery on a WHERE but not FROM?
CREATE VIEW something AS
SELECT * FROM Transactions AS t
WHERE Date =
(
SELECT MAX(Date)
FROM Transactions
WHERE Machine = t.Machine
)
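A quick way to try the view is sketched below. It assumes SQLite (through Python's sqlite3) instead of MySQL, so it does not reproduce MySQL's restriction on subqueries in a view's FROM clause; the view name latest_per_machine, the Amount column and the rows are invented.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transactions (Machine TEXT, Date TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?, ?)",
    [
        ("m1", "2024-01-01", 10),
        ("m1", "2024-01-03", 30),  # latest for m1
        ("m2", "2024-01-02", 20),  # only row for m2
    ],
)

# The subquery sits in WHERE, not FROM, so it can be part of the view definition.
conn.execute(
    """
    CREATE VIEW latest_per_machine AS
    SELECT * FROM Transactions AS t
    WHERE Date = (
        SELECT MAX(Date)
        FROM Transactions
        WHERE Machine = t.Machine
    )
    """
)

latest = conn.execute(
    "SELECT Machine, Date, Amount FROM latest_per_machine ORDER BY Machine"
).fetchall()
print(latest)
```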
You don't need a subquery here. You want to have the latest date in the group of machines, right?
So just do
SELECT
t.*, MAX(date)
FROM Transactions t
GROUP BY Machine
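ORDER BY Machine ASC /*before MySQL 8.0 this line is redundant, since GROUP BY sorted implicitly when no other sort column or direction was specified; from 8.0 on it is needed*/
A GROUP BY is used together with an aggregate function (in your case MAX()) anyway. Note that the other columns in t.* are not guaranteed to come from the row with the latest date, and with ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5) this query is rejected.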
Alternatively you can also specify multiple columns in the ORDER BY clause.
SELECT
*
FROM
Transactions
GROUP BY Machine
ORDER BY Date DESC, Machine ASC
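may also give you what you want, but ORDER BY only sorts the rows after one row per group has already been chosen, so it does not guarantee that the latest row of each machine is returned. Using the MAX() function is definitely the better way to go here.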
Actually I have never used a GROUP BY without an aggregate function.