MySQL Select duplicates with LEAST condition - mysql

I'm trying to find duplicates and select the row with the lowest combination of values in a table.
So far I can only select the lowest value of a single column using MIN(). I thought it would be easy to just replace MIN with LEAST and change the columns.
Here's a layout:
CREATE TABLE `index`.`products` ( `id` INT NOT NULL AUTO_INCREMENT , `name` VARCHAR(10) NOT NULL , `price` INT NOT NULL , `availability` INT NOT NULL , PRIMARY KEY (`id`)) ENGINE = InnoDB;
INSERT INTO `products` (`id`, `name`, `price`, `availability`) VALUES
(NULL, 'teste', '10', '1'),
(NULL, 'teste', '5', '2'),
(NULL, 'teste', '3', '3');
The simplified layout:
id - name - price - availability
1 - teste - 10 - 1
2 - teste - 5 - 2
3 - teste - 3 - 3
using the following query:
select name, MIN(price) from products group by name having count(*) > 1
gets me the lowest price. I'm trying to get the lowest price and the lowest availability:
select name, LEAST(price, availability) from products group by name having count(*) > 1
This doesn't work.
Clarification: I want to select the row with the lowest price and lowest availability. In this case it should be the first one, I guess.
I should clarify that 1 = available, 2 = not available and 3 = coming soon.

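LEAST() is a scalar function that compares its arguments within a single row, so it cannot pick a row out of a group. The statement to select the lowest price among the rows with the best availability is: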
set sql_mode=only_full_group_by;
SELECT
name, MIN(price), availability
FROM
products
JOIN
(
SELECT
name, MIN(availability) availability
FROM
products
GROUP BY name
) as x
USING (name , availability)
GROUP BY name , availability;
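To check the statement against more than one available row, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The fourth row (price 7, availability 1) is invented for illustration, and the join is spelled with ON instead of USING, which should be equivalent but is not the answer's exact text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        price INTEGER NOT NULL,
        availability INTEGER NOT NULL
    );
    INSERT INTO products (name, price, availability) VALUES
        ('teste', 10, 1),
        ('teste', 5, 2),
        ('teste', 3, 3),
        ('teste', 7, 1);  -- hypothetical extra row, not in the question
""")

# Step 1 (subquery x): best (lowest) availability per name.
# Step 2 (outer query): lowest price among the rows with that availability.
rows = conn.execute("""
    SELECT p.name, MIN(p.price) AS price, p.availability
    FROM products p
    JOIN (
        SELECT name, MIN(availability) AS availability
        FROM products
        GROUP BY name
    ) x ON x.name = p.name AND x.availability = p.availability
    GROUP BY p.name, p.availability
""").fetchall()

print(rows)
```

With the question's original three rows only, the same query returns the row with price 10 and availability 1.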

Related

Why is the `ORDER BY RAND()` statement not working in my query?

I have a database where I save information about my products. I use a query for getting those products from my table. The query looks like this:
SELECT * FROM products WHERE stock > 0 ORDER BY RAND();
This query returns all the products that have stock > 0 in a random order, and it works ok. However, now I want to get those products with stock = 0, but I want them to appear at the end of the query (also in a random way but always after products that have stock > 0). So I tried a new query which looks like this:
(SELECT * FROM products WHERE stock > 0 ORDER BY RAND())
UNION
(SELECT * FROM products WHERE stock = 0 ORDER BY RAND());
...this query returns the zero-stock products at the end, but it seems to ignore the ORDER BY RAND() statement and I always get them in the same order. So my question is: how can I get a random order from the query while maintaining the condition that zero-stock products come last?
You don't need UNION:
SELECT *
FROM products
ORDER BY stock = 0, RAND();
The condition stock = 0 in the ORDER BY clause makes sure that the zero-stock products are placed last and the 2nd level of sorting with RAND() randomizes the rows in each of the 2 groups.
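A minimal sketch of the two-level sort, using Python's built-in sqlite3 module as a stand-in for MySQL. SQLite spells the function RANDOM() instead of RAND(); the table contents, including the 'Screws' row, are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, stock INTEGER NOT NULL, product TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO products (stock, product) VALUES (?, ?)",
    [(10, "Timber"), (12, "Nails"), (0, "Glue"),
     (0, "Left handed wrench."), (3, "Screws")],  # sample rows are assumptions
)

def shuffled_products(conn):
    # stock = 0 evaluates to 0 (false) or 1 (true), so in-stock rows sort first;
    # RANDOM() then shuffles the rows inside each of the two groups.
    return conn.execute(
        "SELECT stock, product FROM products ORDER BY stock = 0, RANDOM()"
    ).fetchall()

rows = shuffled_products(conn)
print(rows)
```

Each call returns the in-stock products first and the zero-stock products last, with the order inside each group changing between calls.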
Use a CASE expression to create a field to order by, e.g.:
CREATE TABLE IF NOT EXISTS `products` (
`id` int(6) unsigned NOT NULL,
`stock` int(3) unsigned NOT NULL,
`product` varchar(200) NOT NULL,
PRIMARY KEY (`id`,`product`)
) DEFAULT CHARSET=utf8;
INSERT INTO `products` (`id`, `stock`, `product`) VALUES
('1', '10', 'Timber'),
('2', '12', 'Nails'),
('1', '0', 'Glue'),
('1', '0', 'Left handed wrench.');
And run
SELECT stock, product, case when stock > 0 then 1 else 2 end as SetOrder
FROM products
ORDER BY SetOrder, RAND()
This returns (the order within each SetOrder group varies between runs):
stock | product | SetOrder
10 | Timber | 1
12 | Nails | 1
0 | Glue | 2
0 | Left handed wrench. | 2

How to count multiple columns grouping by rows in MySQL?

I have two tables, "keywords" and "stats" and want to know per keyword how many results each merchant has. So one row per keyword.
Desired result, e.g.:
KWD | RESULTS Amazon | RESULTS eBay
test | 3 | 5
second | 6 | 2
The tables:
create table keywords
(
ID mediumint unsigned auto_increment
primary key,
KEYWORD varchar(255) null
);
create table stats
(
MERCHANT_ID tinyint unsigned not null,
TYPE_ID mediumint unsigned not null comment 'the ID of the corresponding type. E.g. kw_id from keywords',
RESULTS smallint unsigned null,
DATE date not null,
primary key (DATE, MERCHANT_ID, TYPE_ID)
)
comment 'How many results does each merchant have per search?';
Sample data:
-- keywords
insert into test.keywords (ID, KEYWORD) values (1, 'testing');
insert into test.keywords (ID, KEYWORD) values (2, 'blablub');
-- stats
insert into test.stats (MERCHANT_ID, TYPE_ID, RESULTS, DATE) values (1, 1, 33, '2021-07-06');
insert into test.stats (MERCHANT_ID, TYPE_ID, RESULTS, DATE) values (1, 2, 3, '2021-07-06');
insert into test.stats (MERCHANT_ID, TYPE_ID, RESULTS, DATE) values (2, 1, 22, '2021-07-06');
insert into test.stats (MERCHANT_ID, TYPE_ID, RESULTS, DATE) values (2, 2, 6, '2021-07-06');
The query:
select
kwd.KEYWORD,
mss.MERCHANT_ID,
mss.RESULTS
from keywords kwd
LEFT JOIN stats mss ON mss.TYPE_ID = kwd.ID
where
date = 20210705
group by kwd.ID
There are about 10 merchants. Is it possible to get one row per keyword and have the number of results per merchant in separate columns?
Try something like this:
select
kwd.KEYWORD,
-- MERCHANT_ID is a tinyint, so compare against numeric IDs (assuming 1 = Amazon, 2 = eBay)
SUM(IF(mss.MERCHANT_ID = 1, mss.RESULTS, 0)) as `amazon_sum`,
SUM(IF(mss.MERCHANT_ID = 2, mss.RESULTS, 0)) as `eBay_sum`
from keywords kwd
LEFT JOIN stats mss ON mss.TYPE_ID = kwd.ID
where
date = 20210705
group by kwd.ID
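The conditional aggregation can be tried against the question's sample data with a minimal sketch like the one below, which uses Python's built-in sqlite3 module as a stand-in for MySQL. It assumes merchant IDs 1 and 2 are the two merchants of interest, uses CASE instead of MySQL's IF(), and moves the date filter into the ON clause so that keywords without stats for that day would still appear; the column aliases merchant_1 and merchant_2 are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keywords (ID INTEGER PRIMARY KEY, KEYWORD TEXT);
    CREATE TABLE stats (
        MERCHANT_ID INTEGER NOT NULL,
        TYPE_ID INTEGER NOT NULL,
        RESULTS INTEGER,
        DATE TEXT NOT NULL,
        PRIMARY KEY (DATE, MERCHANT_ID, TYPE_ID)
    );
    INSERT INTO keywords (ID, KEYWORD) VALUES (1, 'testing'), (2, 'blablub');
    INSERT INTO stats (MERCHANT_ID, TYPE_ID, RESULTS, DATE) VALUES
        (1, 1, 33, '2021-07-06'),
        (1, 2, 3, '2021-07-06'),
        (2, 1, 22, '2021-07-06'),
        (2, 2, 6, '2021-07-06');
""")

# One SUM(CASE ...) column per merchant turns rows into columns.
rows = conn.execute("""
    SELECT kwd.KEYWORD,
           SUM(CASE WHEN mss.MERCHANT_ID = 1 THEN mss.RESULTS ELSE 0 END) AS merchant_1,
           SUM(CASE WHEN mss.MERCHANT_ID = 2 THEN mss.RESULTS ELSE 0 END) AS merchant_2
    FROM keywords kwd
    LEFT JOIN stats mss
           ON mss.TYPE_ID = kwd.ID AND mss.DATE = ?
    GROUP BY kwd.ID, kwd.KEYWORD
    ORDER BY kwd.ID
""", ("2021-07-06",)).fetchall()

print(rows)
```

For about ten merchants the same pattern repeats with one SUM(CASE ...) column per merchant ID.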

Paginating results from union of two tables

I have a problem with paginating two large tables:
Receipts table: id, receipt_date, record_details (650k records)
Z Reports table: id, receipt_date, record_details (88k records)
What I want to do is to sort both of these tables by receipt_date and union, after that I want to paginate them. Currently I have this SQL (not exactly but the main idea is this):
SELECT c.id, c.receipt_date, c.coltype FROM (
SELECT a.id, a.receipt_date, 'receipt' AS coltype
FROM `terminal_receipts` a
WHERE `a`.`deleted` IS NULL
UNION ALL
SELECT b.id, b.receipt_date, 'zreport' AS coltype
FROM `z_reports` b WHERE `b`.`deleted` IS NULL
) c
ORDER BY receipt_date desc LIMIT 50 OFFSET 0
This way, the server selects all records from two tables, orders them by date and then applies the pagination.
But as the row counts increase, this query will take longer to complete. Is there another algorithm that gets the same result without depending on the table sizes?
There is a technique called the seek method (also known as keyset pagination). It requires a set of columns that uniquely identifies each row; that set is then used in a predicate that continues the search from the last row of the previous page.
Here is a very simple example:
CREATE TABLE IF NOT EXISTS `docs` (
`id` int(6) unsigned NOT NULL,
`rev` int(3) unsigned NOT NULL,
`content` varchar(200) NOT NULL,
PRIMARY KEY (`id`,`rev`)
) DEFAULT CHARSET=utf8;
INSERT INTO `docs` (`id`, `rev`, `content`) VALUES
('1', '1', 'The earth is flat'),
('2', '1', 'One hundred angels can dance on the head of a pin'),
('1', '2', 'The earth is flat and rests on a bull\'s horn'),
('1', '3', 'The earth is like a ball.');
SELECT *
FROM `docs`
ORDER BY rev, id
LIMIT 2;
SELECT *
FROM `docs`
WHERE (rev, id) > (1, 2) # let's use the last row from the previous select with the predicate
ORDER BY rev, id
LIMIT 2;
SELECT *
FROM `docs`
WHERE (rev, id) > (3, 1) # same idea
ORDER BY rev, id
LIMIT 2;
An index on the columns used in the predicate and the ORDER BY will further speed up pagination.
I hope this helps.
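Applied to the two tables from the question, the seek method might look like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL, with made-up sample rows. It assumes (receipt_date, coltype, id) is unique across the union, and the helper names make_db and fetch_page are hypothetical. The row-value comparison is expanded into plain AND/OR conditions because the sort mixes descending and ascending columns.

```python
import sqlite3

UNION_SQL = """
    SELECT id, receipt_date, 'receipt' AS coltype
    FROM terminal_receipts WHERE deleted IS NULL
    UNION ALL
    SELECT id, receipt_date, 'zreport' AS coltype
    FROM z_reports WHERE deleted IS NULL
"""

def make_db():
    # Illustrative data only: 12 receipts and 5 z reports with overlapping dates.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE terminal_receipts (id INTEGER PRIMARY KEY, receipt_date TEXT, deleted TEXT);
        CREATE TABLE z_reports (id INTEGER PRIMARY KEY, receipt_date TEXT, deleted TEXT);
    """)
    conn.executemany(
        "INSERT INTO terminal_receipts VALUES (?, ?, NULL)",
        [(i, "2021-01-%02d" % (i % 5 + 1)) for i in range(1, 13)],
    )
    conn.executemany(
        "INSERT INTO z_reports VALUES (?, ?, NULL)",
        [(i, "2021-01-%02d" % (i % 3 + 1)) for i in range(1, 6)],
    )
    return conn

def fetch_page(conn, last=None, size=5):
    """Return the next page after `last`, a (receipt_date, coltype, id) tuple."""
    where, params = "", []
    if last is not None:
        date, coltype, row_id = last
        # Rows strictly after `last` in (receipt_date DESC, coltype ASC, id ASC) order.
        where = """WHERE receipt_date < ?
                   OR (receipt_date = ? AND (coltype > ? OR (coltype = ? AND id > ?)))"""
        params = [date, date, coltype, coltype, row_id]
    sql = f"""SELECT receipt_date, coltype, id FROM ({UNION_SQL}) c {where}
              ORDER BY receipt_date DESC, coltype, id LIMIT ?"""
    return conn.execute(sql, params + [size]).fetchall()

conn = make_db()
first = fetch_page(conn)
second = fetch_page(conn, last=first[-1])
print(first)
print(second)
```

Each call passes the last row of the previous page instead of an OFFSET. On large MySQL tables, the same predicate and LIMIT would probably also have to be applied inside each branch of the UNION ALL so that each branch can use its own index on receipt_date.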

Get the id of the row with the least value, grouped by another column

I ran into a problem trying to pull one action per user, the one with the lowest priority. The priority is an integer computed from the content of other columns.
This is the initial query:
SELECT
CASE
...
END AS dummy_priority,
id,
user_id
FROM
actions
Result :
id user_id priority
1 2345 1
2 2345 3
3 2999 5
4 2999 2
5 3000 10
Desired result :
id user_id priority
1 2345 1
4 2999 2
5 3000 10
To get what I want, I tried:
SELECT x.id, x.user_id, MIN(x.priority)
FROM (
SELECT
CASE
...
END AS priority,
id,
user_id
FROM
actions
) x
GROUP BY x.user_id
Which didn't work:
Error Code: 1055. Expression #1 of SELECT list is not in GROUP BY
clause and contains nonaggregated column 'x.id' which is not
functionally dependent on columns in GROUP BY clause;
Most examples I found extract just the user_id and priority and then do an inner join on both of them to get the row, but I can't do that since (priority, user_id) isn't unique.
A simple verifiable example would be:
CREATE TABLE `actions` (
`id` int(11) NOT NULL,
`user_id` int(11) DEFAULT NULL,
`priority` int(11) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `actions` (`id`, `user_id`, `priority`) VALUES
(1, 2345, 1),
(2, 2345, 3),
(3, 2999, 5),
(4, 2999, 2),
(5, 3000, 10);
How can I extract the desired result (please keep in mind that this table is a subquery)?
The proper way to do this would involve a subquery of some sort, and that would require repeating the CASE definition.
Here is another method, using the substring_index()/group_concat() trick:
SELECT SUBSTRING_INDEX(GROUP_CONCAT(x.id ORDER BY x.priority), ',', 1) as id,
x.user_id, MIN(x.priority)
FROM (SELECT (CASE ...
END) AS priority,
id, user_id
FROM actions a
) x
GROUP BY x.user_id;
And that proper way in full...
SELECT x...
, CASE...x... priority
FROM my_table x
JOIN
( SELECT user_id
, MIN(CASE...) priority
FROM my_table
GROUP
BY user_id
) y
ON y.user_id = x.user_id
AND y.priority = CASE...x...;
This should work:
SELECT act.id, act.user_id, act.priority
FROM actions act
INNER JOIN
(SELECT
user_id, MIN(priority) AS priority
FROM
actions
GROUP BY user_id) pri
ON act.user_id = pri.user_id AND act.priority = pri.priority
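Because (user_id, priority) is not unique, the join can return several rows per user when priorities tie. One possible way to handle that, sketched here with Python's built-in sqlite3 module as a stand-in for MySQL, is to aggregate once more and keep the smallest id among the tied rows. Row 6 is a hypothetical tie added for illustration, and picking the smallest id is an assumed tie-breaking rule, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actions (id INTEGER NOT NULL, user_id INTEGER, priority INTEGER);
    INSERT INTO actions (id, user_id, priority) VALUES
        (1, 2345, 1),
        (2, 2345, 3),
        (3, 2999, 5),
        (4, 2999, 2),
        (5, 3000, 10),
        (6, 2345, 1);  -- hypothetical tie with row 1, not in the question
""")

# Inner query: lowest priority per user.
# Outer query: join back to find the matching rows, then MIN(id) breaks ties.
rows = conn.execute("""
    SELECT MIN(a.id) AS id, a.user_id, a.priority
    FROM actions a
    JOIN (
        SELECT user_id, MIN(priority) AS priority
        FROM actions
        GROUP BY user_id
    ) m ON m.user_id = a.user_id AND m.priority = a.priority
    GROUP BY a.user_id, a.priority
    ORDER BY a.user_id
""").fetchall()

print(rows)
```

Every selected column is either aggregated or listed in GROUP BY, so the statement should also pass MySQL's ONLY_FULL_GROUP_BY check that produced error 1055 in the question.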

Mysql calculating percentage of repeat rows

I am trying to calculate the share of repeat customers in my system per restaurant. A repeat customer is a user (identified by email address, eo_email) who has ordered more than once from that restaurant. Examples follow the schema.
Here is the table that represents my restaurants
CREATE TABLE IF NOT EXISTS `lf_restaurants` (
`r_id` int(8) NOT NULL AUTO_INCREMENT,
`r_name` varchar(128) NOT NULL,
PRIMARY KEY (`r_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
INSERT INTO `lf_restaurants` (`r_id`, `r_name`) VALUES
('1', 'Restaurant X'),
('2', 'Cafe Y');
And this is my orders table
CREATE TABLE IF NOT EXISTS `ecom_orders` (
`eo_id` mediumint(9) NOT NULL AUTO_INCREMENT,
`eo_ref_id` varchar(12) NOT NULL,
`eo_email` varchar(255) NOT NULL,
`eo_order_parent` int(11) NOT NULL,
PRIMARY KEY (`eo_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
INSERT INTO `ecom_orders` (`eo_id`, `eo_ref_id`, `eo_email`, `eo_order_parent`) VALUES
('', '', 'a#a.com', '1'),
('', '', 'a#a.com', '1'),
('', '', 'a#a.com', '1'),
('', '', 'a#a.com', '1'),
('', '', 'a#a.com', '1'),
('', '', 'b#b.com', '1'),
('', '', 'b#b.com', '1'),
('', '', 'c#c.com', '1'),
('', '', 'd#d.com', '1'),
('', '', 'e#e.com', '1'),
('', '', 'a#a.com', '2'),
('', '', 'c#c.com', '2'),
('', '', 'c#c.com', '2'),
('', '', 'e#e.com', '2');
So Restaurant X (r_id 1) has 10 orders. Users a#a.com and b#b.com have ordered from that restaurant multiple times, and c#c.com, d#d.com, and e#e.com have only ordered once, so it would need to return 40%
Cafe Y (r_id 2) has 4 orders. User c#c.com has ordered twice, users a#a.com and e#e.com have only ordered once, so it would need to return 33%
I am not sure posting what I have so far will help much, as I keep running into 'Subquery has more than 1 result', or, if I wrap that subquery in its own dummy query with a count, it won't let me use fields I need from the main query such as r_id. But here goes:
SELECT r_name,
(SELECT COUNT(*) AS cnt_users
FROM (
SELECT *
FROM ecom_orders
WHERE eo_order_parent = r_id
GROUP BY eo_email
) AS cnt_dummy
) AS num_orders,
(SELECT COUNT(*) AS cnt
FROM ecom_orders
WHERE eo_order_parent = r_id
GROUP BY eo_order_parent, eo_email
) AS num_rep_orders
FROM lf_restaurants
ORDER BY num_orders DESC
The num_orders subquery says it doesn't recognise r_id; I am guessing this is due to the order in which things are executed.
The num_rep_orders subquery comes back as multiple rows, but I really want it to come back with a single value. I could do that by writing it like the num_orders subquery, but then I would run into the "r_id does not exist" problem.
So my question is: how do I get the values I need without running into "subquery has more than 1 row" and "r_id does not exist"?
Then from those 2 values I can work out the percentage and all should be gravy :) Any help much appreciated!
Okay. So let's start with getting the number of repeating customers.
SELECT eo_order_parent, eo_email, COUNT(eo_email) AS orders FROM ecom_orders
GROUP BY eo_order_parent, eo_email
HAVING orders > 1;
And the total number of distinct customers:
SELECT eo_order_parent, COUNT(DISTINCT eo_email) AS customers FROM ecom_orders
GROUP BY eo_order_parent;
But we can do this in one go:
SELECT eo_order_parent,
SUM(CASE WHEN orders > 1 THEN 1 ELSE 0 END) AS repeats,
SUM(1) AS total FROM
(
SELECT eo_order_parent, eo_email, COUNT(*) AS orders FROM ecom_orders
GROUP BY eo_order_parent, eo_email
) AS eo_group_1
GROUP BY eo_order_parent;
This gives:
+-----------------+---------+-------+
| eo_order_parent | repeats | total |
+-----------------+---------+-------+
| 1 | 2 | 5 |
| 2 | 1 | 3 |
+-----------------+---------+-------+
2 rows in set (0.00 sec)
Then 2/5 is your 40%, and 1/3 is 33%.
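The percentage itself can be computed in the same statement. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL, loaded with the order data from the question. The repeat_pct column name and the rounding to one decimal are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ecom_orders ("
    "eo_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "eo_email TEXT NOT NULL, "
    "eo_order_parent INTEGER NOT NULL)"
)
orders = (
    [("a#a.com", 1)] * 5 + [("b#b.com", 1)] * 2
    + [("c#c.com", 1), ("d#d.com", 1), ("e#e.com", 1)]
    + [("a#a.com", 2), ("c#c.com", 2), ("c#c.com", 2), ("e#e.com", 2)]
)
conn.executemany(
    "INSERT INTO ecom_orders (eo_email, eo_order_parent) VALUES (?, ?)", orders
)

# Inner query: orders per (restaurant, customer) pair.
# Outer query: customers with more than one order, all customers, and the ratio.
rows = conn.execute("""
    SELECT eo_order_parent,
           SUM(CASE WHEN orders > 1 THEN 1 ELSE 0 END) AS repeats,
           COUNT(*) AS total,
           ROUND(100.0 * SUM(CASE WHEN orders > 1 THEN 1 ELSE 0 END) / COUNT(*), 1)
               AS repeat_pct
    FROM (
        SELECT eo_order_parent, eo_email, COUNT(*) AS orders
        FROM ecom_orders
        GROUP BY eo_order_parent, eo_email
    ) AS eo_group_1
    GROUP BY eo_order_parent
    ORDER BY eo_order_parent
""").fetchall()

for row in rows:
    print(row)
```

Multiplying by 100.0 rather than 100 keeps the division in floating point, which matters in SQLite; MySQL's / operator already returns a decimal result.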
The following query computes the number of repeat customers, the total number of orders and the total number of customers per restaurant:
SELECT
u.r_id,
u.r_name,
SUM(u.no_orders > 1) AS repeats,
SUM(u.no_orders) AS orders,
COUNT(u.eo_email) AS customers
FROM (
SELECT
r.r_id,
r.r_name,
o.eo_email,
COUNT(o.eo_id) AS no_orders
FROM lf_restaurants r
LEFT JOIN ecom_orders o ON o.eo_order_parent = r.r_id
GROUP BY r.r_id, r.r_name, o.eo_email
) u
GROUP BY
u.r_id, u.r_name;
The subquery first computes the number of orders per customer/restaurant pair, so it has to group by both the restaurant and the email. From this the outer query computes the number of repeat customers, the total number of orders and the number of customers per restaurant. You can also compute the percentage (but this does not have to be done in the query).