How to select records with a count >30? - mysql

So I have this data set (below) and I'm simply trying to gather all rows for the values in field 1 that have a count of more than 30 (meaning a distinct brand that has 30+ records). That's it!
I've been trying a lot of different DISTINCT, COUNT, etc. queries, but I'm falling short. Any help is appreciated :)
Data Set

By using GROUP BY and HAVING you can achieve this. The subquery finds the brands with at least 30 rows; to return more columns, add them to the outer SELECT list.
SELECT Mens_Brand FROM your_table
WHERE Mens_Brand IN (SELECT Mens_Brand
FROM your_table
GROUP BY Mens_Brand
HAVING COUNT(Mens_Brand)>=30)
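A minimal sketch of the GROUP BY / HAVING subquery above, run against an in-memory SQLite database from Python as a stand-in for MySQL. The sample rows and the reduced column set are assumptions made for illustration; only the table and column names come from the thread.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (PK INTEGER PRIMARY KEY, Mens_Brand TEXT, Mens_Price REAL)"
)

# Hypothetical sample data: 30 Nike rows and 5 Adidas rows.
rows = [("Nike", 100.0)] * 30 + [("Adidas", 80.0)] * 5
conn.executemany("INSERT INTO your_table (Mens_Brand, Mens_Price) VALUES (?, ?)", rows)

# The subquery keeps brands with at least 30 rows; the outer query
# returns every row that belongs to one of those brands.
query = """
SELECT PK, Mens_Brand, Mens_Price
FROM your_table
WHERE Mens_Brand IN (
    SELECT Mens_Brand
    FROM your_table
    GROUP BY Mens_Brand
    HAVING COUNT(*) >= 30
)
"""
selected = conn.execute(query).fetchall()
brands = {row[1] for row in selected}
print(len(selected), brands)
```

Only the brand with 30 rows survives the filter; the 5-row brand is dropped entirely.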

You can simply use a window function (requires MySQL 8 or MariaDB 10.2) for this:
select Mens_Brand, Mens_Price, Shoe_Condition, Currency, PK
from (
select Mens_Brand, Mens_Price, Shoe_Condition, Currency, PK, count(1) over (partition by Mens_Brand) brand_count
from your_table
) counted where brand_count >= 30
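The window-function variant can be sketched the same way. This assumes a SQLite build with window function support (3.25 or later) as a stand-in for MySQL 8, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (PK INTEGER PRIMARY KEY, Mens_Brand TEXT, Mens_Price REAL)"
)

# Hypothetical sample data: 31 Nike rows and 2 Puma rows.
rows = [("Nike", 100.0)] * 31 + [("Puma", 60.0)] * 2
conn.executemany("INSERT INTO your_table (Mens_Brand, Mens_Price) VALUES (?, ?)", rows)

# COUNT(*) OVER (PARTITION BY ...) attaches the per-brand row count to
# every row, so the outer query can filter on it without a second scan.
query = """
SELECT PK, Mens_Brand, Mens_Price, brand_count
FROM (
    SELECT PK, Mens_Brand, Mens_Price,
           COUNT(*) OVER (PARTITION BY Mens_Brand) AS brand_count
    FROM your_table
) AS counted
WHERE brand_count >= 30
"""
counted_rows = conn.execute(query).fetchall()
print(len(counted_rows))
```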

Related

Suggest an optimised mysql query

I have a table with user transactions. I need to select users whose transactions total more than 100 000 in a single day. Currently what I'm doing is gathering all user ids and executing
SELECT sum(amt) as amt from users where date = date("Y-m-d") AND user_id=id;
for each id and checking whether the amt > 100k or not.
Since it's a large table, it takes a lot of time to execute. Can someone suggest an optimised query?
This will do:
SELECT sum(amt) as amt, user_id from users
where date = date("Y-m-d")
GROUP BY user_id
HAVING sum(amt) > 100000; -- the 100 000 threshold from the question
What about filtering the records first and then applying the sum, like below?
select SUM(amt),user_id from (
SELECT amt,user_id from users where date = date("Y-m-d")
)tmp
group by user_id having sum(amt)>100000
What datatype is amt? If it's anything but a basic integral type (e.g. int, long, number, etc.) you should consider converting it. Decimal types are faster than they used to be, but integral types are faster still.
Consider adding indexes on the date and user_id field, if you haven't already.
You can combine the aggregation and filtering in a single query...
SELECT user_id, SUM(amt) AS total_amt
FROM users
WHERE date=date(...)
GROUP BY user_id
HAVING total_amt > 100000
The only optimization that can be done in your query is by applying primary key on user_id column to speed up filtering.
As far as other answers posted which say to apply GROUP BY on filtered records, it won't have any effect as WHERE CLAUSE is executed first in SQL logical query processing phases.
Check here
You could use MySQL sub-queries to let MySQL handle all the iterations. For example, you could structure your query like this:
select user_data.user_id, user_data.total_amt from
(
select sum(amt) as total_amt, user_id from users where date = date("Y-m-d") group by user_id
) as user_data
where user_data.total_amt > 100000;
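The single-query approach from these answers can be sketched as follows. This is an illustration under stated assumptions: SQLite from Python stands in for MySQL, the rows, the fixed date and the index name are invented, and the date is passed as a bound parameter instead of the PHP-style date("Y-m-d") call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, amt INTEGER, date TEXT)")

# Hypothetical transactions; user 1 crosses 100 000 on 2024-01-15.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, 60000, "2024-01-15"),
        (1, 50000, "2024-01-15"),
        (2, 99999, "2024-01-15"),
        (3, 200000, "2024-01-14"),
    ],
)

# Index on (date, user_id), as suggested in the answers (name is made up).
conn.execute("CREATE INDEX idx_users_date_user ON users (date, user_id)")

# One query for all users: filter by day, aggregate per user,
# then keep only the groups whose total exceeds the threshold.
big_spenders = conn.execute(
    """
    SELECT user_id, SUM(amt) AS total_amt
    FROM users
    WHERE date = ?
    GROUP BY user_id
    HAVING SUM(amt) > 100000
    """,
    ("2024-01-15",),
).fetchall()
print(big_spenders)
```

This replaces the per-user loop with one statement, so the database scans the day's rows once.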

MySQL - SELECT all columns WHERE one column is DISTINCT

I'm very sorry if the question seems too basic.
I've surfed entire Internet and StackOverflow for a finished solution, and did not find anything that I can understand, and can't write it myself, so have to ask it here.
I have a MySQL database.
It has a table named "posted".
It has 8 columns.
I need to output this result:
SELECT DISTINCT link FROM posted WHERE ad='$key' ORDER BY day, month
But I need not only the "link" column, but also other columns for this row.
Like for every row returned with this query I also need to know its "id" in the table, "day" and "month" values etc.
Please tell me what should I read to make it, or how to make it.
Please keep it as simple as possible, as I'm not an expert in MySQL.
Edit:
I tried this:
SELECT DISTINCT link,id,day,month FROM posted WHERE ad='$key' ORDER BY day, month
It doesn't work. It returns too many rows. Say there are 10 rows with same links, but different day/month/id. This script will return all 10, and I want only the first one (for this link).
The problem comes from instinctively believing that DISTINCT is a local pre-modifier for a column.
Hence, you "should" be able to type
XXbadXX SELECT col1, DISTINCT col2 FROM mytable XXbadXX
and have it return unique values for col2. Sadly, no. DISTINCT is actually a global post-modifier for SELECT, that is, as opposed to SELECT ALL (returning all answers) it is SELECT DISTINCT (returning all unique answers). So a single DISTINCT acts on ALL the columns that you give it.
This makes it real hard to use DISTINCT on a single column, while getting the other columns, without doing major extremely ugly backflips.
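The correct answer is to use a GROUP BY on the columns that you want to have unique answers: SELECT col1, col2 FROM mytable GROUP BY col2 will give you arbitrary unique col2 rows, with their col1 data as well. Note that this relies on a MySQL extension: since MySQL 5.7.5 the default ONLY_FULL_GROUP_BY mode rejects the query unless col1 is aggregated or functionally dependent on col2.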
I tried this:
SELECT DISTINCT link,id,day,month FROM posted
WHERE ad='$key' ORDER BY day, month
It doesn't work. It returns too many rows. Say there are 10 rows with
same links, but different day/month/id. This script will return all
10, and I want only the first one (for this link).
What you're asking doesn't make sense.
Either you want the distinct value of all of link, id, day, month, or you need to find a criterion to choose which of the values of id, day, month you want to use, if you just want at most one distinct value of link.
Otherwise, what you're after is similar to MySQL's hidden columns in GROUP BY/HAVING statements, which is non-standard SQL, and can actually be quite confusing.
You could in fact use a GROUP BY link if it made sense to pick any row for a given link value.
Alternatively, you could use a sub-select to pick the row with the minimal id for each link value (as described in this answer):
SELECT link, id, day, month FROM posted
WHERE (link, id) IN
(SELECT link, MIN(id) FROM posted WHERE ad='$key' GROUP BY link)
SELECT Id, Link, Day, Month FROM Posted
WHERE Id IN(
SELECT Min(Id) FROM Posted GROUP BY Link)
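A small sketch of the MIN(id) sub-select idea, using SQLite from Python as a stand-in for MySQL. The sample rows are hypothetical, and the ad filter is passed as a bound parameter rather than interpolated as '$key'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posted (id INTEGER PRIMARY KEY, link TEXT, ad TEXT, day INTEGER, month INTEGER)"
)

# Hypothetical rows: two links appear more than once for ad 'k'.
conn.executemany(
    "INSERT INTO posted VALUES (?, ?, ?, ?, ?)",
    [
        (1, "a.com", "k", 5, 1),
        (2, "a.com", "k", 6, 1),
        (3, "b.com", "k", 2, 2),
        (4, "b.com", "other", 1, 1),
        (5, "c.com", "other", 1, 1),
    ],
)

# The sub-select picks the smallest id per link for the given ad;
# the outer query fetches the full row for each of those ids.
first_rows = conn.execute(
    """
    SELECT id, link, day, month
    FROM posted
    WHERE id IN (
        SELECT MIN(id) FROM posted WHERE ad = ? GROUP BY link
    )
    ORDER BY month, day
    """,
    ("k",),
).fetchall()
print(first_rows)
```

Here "first" means lowest id; a different criterion needs a different aggregate in the sub-select.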
SELECT OTHER_COLUMNS FROM posted WHERE link in (
SELECT DISTINCT link FROM posted WHERE ad='$key' )
ORDER BY day, month
If what you're asking is to only show rows whose link appears exactly once, then you can use the following:
SELECT * FROM posted WHERE link NOT IN
(SELECT link FROM posted GROUP BY link HAVING COUNT(LINK) > 1)
Again this is assuming that you want to cut out anything that has a duplicate link.
I think the best solution would be to do a subquery and then join that to the table. The sub query would return the primary key of the table. Here is an example:
select *
from (
SELECT row_number() over(partition by link order by day, month) row_id
, posted.*
FROM posted
WHERE ad='$key'
) x
where x.row_id = 1
What this does: the row_number function assigns a numerical sequence within each partition, with one partition per distinct link in the result.
By keeping only the rows where row_id = 1, you return only one row for each link.
The way you change what link gets marked "1" is through the order-by clause in the row_number function.
Hope this helps.
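The row_number approach can be tried out with the sketch below. It assumes a SQLite build with window functions (3.25 or later) in place of MySQL 8 or SQL Server, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posted (id INTEGER PRIMARY KEY, link TEXT, ad TEXT, day INTEGER, month INTEGER)"
)

# Hypothetical rows: a.com was posted twice for ad 'k'.
conn.executemany(
    "INSERT INTO posted VALUES (?, ?, ?, ?, ?)",
    [
        (1, "a.com", "k", 20, 3),
        (2, "a.com", "k", 5, 1),
        (3, "b.com", "k", 2, 2),
    ],
)

# ROW_NUMBER() restarts at 1 for each link; the ORDER BY inside OVER()
# decides which row of each link gets number 1.
one_per_link = conn.execute(
    """
    SELECT id, link, day, month
    FROM (
        SELECT posted.*,
               ROW_NUMBER() OVER (PARTITION BY link ORDER BY day, month) AS row_id
        FROM posted
        WHERE ad = ?
    ) AS x
    WHERE x.row_id = 1
    ORDER BY link
    """,
    ("k",),
).fetchall()
print(one_per_link)
```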
SELECT DISTINCT link,id,day,month FROM posted WHERE ad='$key' ORDER BY day, month
OR
SELECT link,id,day,month FROM posted WHERE ad='$key' ORDER BY day, month
If you want all columns where link is unique:
SELECT * FROM posted WHERE link in
(SELECT link FROM posted WHERE ad='$key' GROUP BY link);
What you want is the following:
SELECT DISTINCT * FROM posted WHERE ad='$key' GROUP BY link ORDER BY day, month
If there are, for example, 4 rows where link is the same, it will pick only one (I assume the first one).
I had a similar problem; maybe this helps someone. For example, for a table with 3 columns:
SELECT * FROM DataTable WHERE Data_text = 'test' GROUP BY Data_Name ORDER BY Data_Name ASC
or
SELECT Data_Id, Data_Text, Data_Name FROM DataTable WHERE Data_text = 'test' GROUP BY Data_Name ORDER BY Data_Name ASC
Two ways work for me.
SELECT a.* FROM orders a INNER JOIN (SELECT course,MAX(id) as id FROM orders WHERE admission_id=".$id." GROUP BY course ) AS b ON a.course = b.course AND a.id = b.id
With the Above Query you will get unique records with where condition
In MySQL you can simply use "group by". Below will select ALL, with a DISTINCT "col"
SELECT *
FROM tbl
GROUP BY col
Select the minimum of the date column so that you get only one row per link, e.g.:
select link, min(datecolumn) from posted WHERE ad='$key' GROUP BY link ORDER BY min(datecolumn)
Or, if your date column is a timestamp, convert it to a date and perform DISTINCT on link, so that you get distinct link values based on the date instead of the datetime.

Mysql COUNT, GROUP BY and ORDER BY

This sounds quite simple but I just can't figure it out.
I have a table orders (id, username, telephone_number).
I want to get number of orders from one user by comparing the last 8 numbers in telephone_number.
I tried using SUBSTR(telephone_number, -8), I've searched and experimented a lot, but still I can't get it to work.
Any suggestions?
Untested:
SELECT
COUNT(*) AS cnt,
Orders.*
FROM
Orders
GROUP BY
SUBSTR(telephone_number, -8)
ORDER BY
cnt DESC
The idea:
Select COUNT(*) (i.e., the number of rows in each group) and all fields from Orders (*).
GROUP BY the last eight digits of telephone_number. [1]
Optionally, ORDER BY the number of rows in each group, descending.
[1] If you plan to do this type of query often, some kind of index on the last part of the phone number could be desirable. How this could be best implemented depends on the concrete values stored in the field.
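As a rough illustration of grouping on the last eight digits, here is a sketch that runs on SQLite from Python instead of MySQL. SQLite's substr accepts a negative start position in the same way as MySQL's SUBSTR, and the phone numbers are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, username TEXT, telephone_number TEXT)"
)

# Hypothetical orders: the same number written with different prefixes.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "ann", "+4712345678"),
        (2, "ann", "004712345678"),
        (3, "bob", "87654321"),
        (4, "ann", "12345678"),
    ],
)

# Group on the last eight characters so prefix variants count as one number.
counts = conn.execute(
    """
    SELECT substr(telephone_number, -8) AS last8, COUNT(*) AS cnt
    FROM orders
    GROUP BY substr(telephone_number, -8)
    ORDER BY cnt DESC
    """
).fetchall()
print(counts)
```

Selecting only the grouped expression and the count avoids the question of which row's other columns would be returned per group.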
-- Memory intensive.
SELECT COUNT(*) FROM `orders` WHERE `telephone_number` REGEXP '12345678$'
OR
-- The same, but better and quicker.
SELECT COUNT(*) FROM `orders` WHERE `telephone_number` LIKE '%12345678'
You can use the below query to get last 8 characters from a column values.
select right(rtrim(First_Name),8) FROM [ated].[dbo].[Employee]

Fast MAX, GROUP BY on the concatenation of multiple columns

I have a table with 4 columns: name, date, version, and value. There's a composite index on all four, in that order. It has 20M rows: 2,000 names, approx. 1,000 dates per name, approx. 10 versions per date.
I'm trying to get a list that give for all names the highest date, the highest version on that date, and the associated value.
When I do
SELECT name,
MAX(date)
FROM table
GROUP BY name
I get good performance and the database uses the composite index
However, when I join the table to this in order to get the MAX(version) per name, the query takes ages. There must be a way to get the result in about the same order of magnitude of time as the SELECT statement above? It can easily be done by using the index.
Try this: (I know it needs a few syntax tweaks for MySQL... ask for them and I will find them)
INSERT INTO #TempTable
SELECT name, MAX(Date) as Date
FROM table
Group By name
select table.name, table.date, max(table.version) as version
from table
inner join #TempTable on table.name = #temptable.name and table.date = #temptable.date
group by table.name, table.date
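The two-step idea (latest date per name, then highest version on that date) can also be written with derived tables instead of a temp table. The sketch below uses SQLite from Python as a stand-in; the table name readings, the index name and the sample rows are assumptions, and it says nothing about how MySQL would plan the query on 20M rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (name TEXT, date TEXT, version INTEGER, value INTEGER)")
conn.execute("CREATE INDEX idx_readings ON readings (name, date, version, value)")

# Hypothetical rows for two names.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        ("a", "2024-01-01", 1, 10),
        ("a", "2024-01-02", 1, 20),
        ("a", "2024-01-02", 2, 30),
        ("b", "2024-01-05", 3, 40),
        ("b", "2024-01-04", 9, 50),
    ],
)

# d: latest date per name. v: highest version on that latest date.
# The outer join back to readings fetches the matching value.
latest = conn.execute(
    """
    SELECT r.name, r.date, r.version, r.value
    FROM readings r
    JOIN (
        SELECT r2.name, r2.date, MAX(r2.version) AS version
        FROM readings r2
        JOIN (
            SELECT name, MAX(date) AS date FROM readings GROUP BY name
        ) d ON r2.name = d.name AND r2.date = d.date
        GROUP BY r2.name, r2.date
    ) v ON r.name = v.name AND r.date = v.date AND r.version = v.version
    ORDER BY r.name
    """
).fetchall()
print(latest)
```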

What's faster, SELECT DISTINCT or GROUP BY in MySQL?

If I have a table
CREATE TABLE users (
id int(10) unsigned NOT NULL auto_increment,
name varchar(255) NOT NULL,
profession varchar(255) NOT NULL,
employer varchar(255) NOT NULL,
PRIMARY KEY (id)
)
and I want to get all unique values of profession field, what would be faster (or recommended):
SELECT DISTINCT u.profession FROM users u
or
SELECT u.profession FROM users u GROUP BY u.profession
?
They are essentially equivalent to each other (in fact this is how some databases implement DISTINCT under the hood).
If one of them is faster, it's going to be DISTINCT. This is because, although the two are the same, a query optimizer would have to catch the fact that your GROUP BY is not taking advantage of any group members, just their keys. DISTINCT makes this explicit, so you can get away with a slightly dumber optimizer.
When in doubt, test!
If you have an index on profession, these two are synonyms.
If you don't, then use DISTINCT.
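GROUP BY in MySQL sorts results (true for MySQL 5.7 and earlier; MySQL 8.0 dropped the implicit sort and later the GROUP BY ... DESC syntax). You can even do: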
SELECT u.profession FROM users u GROUP BY u.profession DESC
and get your professions sorted in DESC order.
DISTINCT creates a temporary table and uses it for storing duplicates. GROUP BY does the same, but sorts the distinct results afterwards.
So
SELECT DISTINCT u.profession FROM users u
is faster, if you don't have an index on profession.
All of the answers above are correct, for the case of DISTINCT on a single column vs GROUP BY on a single column.
Every db engine has its own implementation and optimizations, and if you care about the very little difference (in most cases) then you have to test against specific server AND specific version! As implementations may change...
BUT, if you select more than one column in the query, then the DISTINCT is essentially different! Because in this case it will compare ALL columns of all rows, instead of just one column.
So if you have something like:
// This will NOT return unique by [id], but unique by (id,name)
SELECT DISTINCT id, name FROM some_query_with_joins
// This will select unique by [id].
SELECT id, name FROM some_query_with_joins GROUP BY id
It is a common mistake to think that DISTINCT keyword distinguishes rows by the first column you specified, but the DISTINCT is a general keyword in this manner.
So people you have to be careful not to take the answers above as correct for all cases... You might get confused and get the wrong results while all you wanted was to optimize!
Go for the simplest and shortest if you can -- DISTINCT seems to be more what you are looking for only because it will give you EXACTLY the answer you need and only that!
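To see the multi-column difference concretely, here is a small sketch on SQLite from Python. The table name joined stands in for some_query_with_joins and the rows are made up; MIN(name) is used so the grouped query has a well-defined name per id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE joined (id INTEGER, name TEXT)")

# Hypothetical join result: id 1 appears with two different names.
conn.executemany("INSERT INTO joined VALUES (?, ?)", [(1, "a"), (1, "b"), (2, "c")])

# DISTINCT applies to the whole (id, name) pair, so both id=1 rows stay.
distinct_rows = conn.execute(
    "SELECT DISTINCT id, name FROM joined ORDER BY id, name"
).fetchall()

# GROUP BY id yields one row per id; the aggregate chooses the name.
grouped_rows = conn.execute(
    "SELECT id, MIN(name) FROM joined GROUP BY id ORDER BY id"
).fetchall()

print(distinct_rows)
print(grouped_rows)
```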
Well, DISTINCT can be slower than GROUP BY on some occasions in Postgres (I don't know about other DBs).
tested example:
postgres=# select count(*) from (select distinct i from g) a;
count
10001
(1 row)
Time: 1563,109 ms
postgres=# select count(*) from (select i from g group by i) a;
count
10001
(1 row)
Time: 594,481 ms
http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks_I
so be careful ... :)
GROUP BY is more expensive than DISTINCT, since GROUP BY sorts the result while DISTINCT avoids it. But if you want GROUP BY to behave the same as DISTINCT, add ORDER BY NULL:
SELECT DISTINCT u.profession FROM users u
is equal to
SELECT u.profession FROM users u GROUP BY u.profession order by null
It seems that the queries are not exactly the same. At least for MySQL.
Compare:
describe select distinct productname from northwind.products
describe select productname from northwind.products group by productname
The second query gives additionally "Using filesort" in Extra.
In MySQL, "Group By" uses an extra step: filesort. I realize DISTINCT is faster than GROUP BY, and that was a surprise.
After heavy testing we came to the conclusion that GROUP BY is faster
SELECT sql_no_cache
opnamegroep_intern
FROM telwerken
WHERE opnemergroep IN (7,8,9,10,11,12,13) group by opnamegroep_intern
635 total, 0.0944 seconds
Showing records 0 - 29 (635 total, query took 0.0484 sec)
SELECT sql_no_cache
distinct (opnamegroep_intern)
FROM telwerken
WHERE opnemergroep IN (7,8,9,10,11,12,13)
635 total, 0.2117 seconds (almost 100% slower)
Showing records 0 - 29 (635 total, query took 0.3468 sec)
(more of a functional note)
There are cases when you have to use GROUP BY, for example if you wanted to get the number of employees per employer:
SELECT u.employer, COUNT(u.id) AS "total employees" FROM users u GROUP BY u.employer
In such a scenario DISTINCT u.employer doesn't work right. Perhaps there is a way, but I just do not know it. (If someone knows how to make such a query with DISTINCT please add a note!)
Here is a simple approach which will print the elapsed time for each of the two queries.
DECLARE @t1 DATETIME;
DECLARE @t2 DATETIME;
SET @t1 = GETDATE();
SELECT DISTINCT u.profession FROM users u; --Query with DISTINCT
SET @t2 = GETDATE();
PRINT 'Elapsed time (ms): ' + CAST(DATEDIFF(millisecond, @t1, @t2) AS varchar);
SET @t1 = GETDATE();
SELECT u.profession FROM users u GROUP BY u.profession; --Query with GROUP BY
SET @t2 = GETDATE();
PRINT 'Elapsed time (ms): ' + CAST(DATEDIFF(millisecond, @t1, @t2) AS varchar);
OR try SET STATISTICS TIME (Transact-SQL)
SET STATISTICS TIME ON;
SELECT DISTINCT u.profession FROM users u; --Query with DISTINCT
SELECT u.profession FROM users u GROUP BY u.profession; --Query with GROUP BY
SET STATISTICS TIME OFF;
It simply displays the number of milliseconds required to parse, compile, and execute each statement as below:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.
SELECT DISTINCT will always be the same as, or faster than, a GROUP BY. On some systems (e.g. Oracle), GROUP BY might be optimized to be the same as DISTINCT for most queries. On others (such as SQL Server), DISTINCT can be considerably faster.
This is not a rule.
For each query, try DISTINCT and GROUP BY separately, compare the time each takes to complete, and use the faster one.
In my project I sometimes use GROUP BY and sometimes DISTINCT.
If you don't have to do any group functions (sum, average etc in case you want to add numeric data to the table), use SELECT DISTINCT. I suspect it's faster, but i have nothing to show for it.
In any case, if you're worried about speed, create an index on the column.
If the problem allows it, try EXISTS, since it's optimized to stop as soon as a result is found (and doesn't buffer any response). So, if you are just trying to normalize data for a WHERE clause like this
SELECT * FROM SOMETHING S WHERE S.ID IN ( SELECT DISTINCT DCR.SOMETHING_ID FROM DIFF_CARDINALITY_RELATIONSHIP DCR ) -- to keep same cardinality
A faster response would be:
SELECT * FROM SOMETHING S WHERE EXISTS ( SELECT 1 FROM DIFF_CARDINALITY_RELATIONSHIP DCR WHERE DCR.SOMETHING_ID = S.ID )
This isn't always possible but when available you will see a faster response.
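A quick check that the IN and EXISTS forms return the same rows, sketched on SQLite from Python with invented data. It only shows equivalence of results; whether EXISTS is actually faster depends on the engine and the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE something (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE diff_cardinality_relationship (something_id INTEGER)")

# Hypothetical data: id 1 is referenced twice, id 2 never, id 3 once.
conn.executemany("INSERT INTO something VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO diff_cardinality_relationship VALUES (?)", [(1,), (1,), (3,)]
)

# IN with a DISTINCT subquery.
in_rows = conn.execute(
    """
    SELECT s.id FROM something s
    WHERE s.id IN (
        SELECT DISTINCT dcr.something_id FROM diff_cardinality_relationship dcr
    )
    ORDER BY s.id
    """
).fetchall()

# Correlated EXISTS: one match per outer row is enough.
exists_rows = conn.execute(
    """
    SELECT s.id FROM something s
    WHERE EXISTS (
        SELECT 1 FROM diff_cardinality_relationship dcr
        WHERE dcr.something_id = s.id
    )
    ORDER BY s.id
    """
).fetchall()

print(in_rows, exists_rows)
```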
In MySQL I have found that GROUP BY will treat NULL as distinct, while DISTINCT does not.
I took the exact same DISTINCT query, removed the DISTINCT, and added the selected fields as the GROUP BY, and I got many more rows due to one of the fields being NULL.
So I tend to believe that there is more to DISTINCT in MySQL.