MySQL query result does not change when using ORDER BY - mysql

I create a table like:
CREATE TABLE my_table
(
value int(20)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And I insert some data:
mysql> SELECT * FROM my_table;
+-------+
| value |
+-------+
| 0 |
| 1 |
| 2 |
| 3 |
+-------+
When I execute SELECT COUNT(value), value FROM my_table; and SELECT COUNT(value), value FROM my_table ORDER BY value DESC;, they both show:
+--------------+-------+
| COUNT(value) | value |
+--------------+-------+
| 4 | 0 |
+--------------+-------+
My question is: why is the right-hand column always 0? Why doesn't ORDER BY value DESC make any difference here?

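ORDER BY is applied to the result rows after they have been generated. When you use an aggregate function like COUNT() without GROUP BY, it aggregates all the selected rows and produces a single result row. Any non-aggregated column in that row takes its value from an indeterminate input row, and the ORDER BY clause has no effect on which row that is. (With the ONLY_FULL_GROUP_BY SQL mode enabled, which is the default since MySQL 5.7.5, the server rejects such a query instead of picking a value.)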

ORDER BY sorts the result rows. What you are looking for is MAX() or MIN() on the value column:
SELECT COUNT(value), MAX(value) FROM my_table;
SELECT COUNT(value), MIN(value) FROM my_table;
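As a rough check of the behaviour described above, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the original thread only uses the mysql client). It only shows that the aggregated queries return one well-defined row.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the table mirrors my_table above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (value INTEGER)")
conn.executemany("INSERT INTO my_table (value) VALUES (?)", [(0,), (1,), (2,), (3,)])

# Every selected column is an aggregate, so the single result row is well defined.
max_row = conn.execute("SELECT COUNT(value), MAX(value) FROM my_table").fetchone()
min_row = conn.execute("SELECT COUNT(value), MIN(value) FROM my_table").fetchone()

print(max_row)  # (4, 3)
print(min_row)  # (4, 0)
```

Because both columns are aggregates, there is no leftover column whose value depends on which input row the server happened to read.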

Related

How to maintain the sort at insert-select scripts?

We have a table called tblINUser, which has many records and occupies a vast amount of space. In an attempt to reduce the space used, we create a table called tblINUserSortByFilterType, which contains all the possible text values of the SortByFilter field, and we create a foreign key in tblINUser that references this value numerically. The data is sharded across several databases, so it would be great to sort the values identically across databases. This was the first attempt:
CREATE TABLE MC.tblINUserSortByFilterType(
pkINUserSortByFilterTypeID SMALLINT(6) PRIMARY KEY AUTO_INCREMENT,
SortByFilter varchar(45) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'first',
INDEX(SortByFilter)
);
INSERT INTO MC.tblINUserSortByFilterType(SortByFilter)
SELECT DISTINCT SortByFilter
FROM MC.tblINUser
ORDER BY SortByFilter = 'first';
ALTER TABLE MC.tblINUser
ADD COLUMN fkINUserSortByFilterTypeID SMALLINT(6) DEFAULT 1,
ADD INDEX (fkINUserSortByFilterTypeID);
UPDATE MC.tblINUser INUser
JOIN MC.tblINUserSortByFilterType INUserSortByFilterType
ON INUser.SortByFilter = INUserSortByFilterType.SortByFilter
SET INUser.fkINUserSortByFilterTypeID = INUserSortByFilterType.pkINUserSortByFilterTypeID;
ALTER TABLE MC.tblINUser
DROP COLUMN SortByFilter;
You may argue, correctly, that the sort has only one criterion, ORDER BY SortByFilter = 'first', and that ORDER BY SortByFilter = 'first', SortByFilter would be an obvious improvement. That criticism is fair. Still, even if the order is chaotic from the second record onward, it would be reasonable to expect that the very first inserted record would be first. Unfortunately, this is not the case. Running select * from MC.tblINUserSortByFilterType; yields
+----------------------------+----------------------------+
| pkINUserSortByFilterTypeID | SortByFilter |
+----------------------------+----------------------------+
| 5 | first |
| 4 | first-ASC |
| 3 | last |
| 1 | none |
| 2 | StatTeacher.IsActive DESC |
+----------------------------+----------------------------+
As we can see, not even this expectation is met, since first has an id of 5. An improvement is achieved by changing the inserts to
INSERT INTO MC.tblINUserSortByFilterType(SortByFilter)
SELECT DISTINCT SortByFilter
FROM MC.tblINUser
WHERE SortByFilter = 'first';
INSERT INTO MC.tblINUserSortByFilterType(SortByFilter)
SELECT DISTINCT SortByFilter
FROM MC.tblINUser
WHERE SortByFilter <> 'first';
and then the same selection gives this result:
+----------------------------+----------------------------+
| pkINUserSortByFilterTypeID | SortByFilter |
+----------------------------+----------------------------+
| 1 | first |
| 3 | first-ASC |
| 4 | last |
| 2 | none |
| 5 | StatTeacher.IsActive DESC |
+----------------------------+----------------------------+
5 rows in set (0.00 sec)
As we can see, first now correctly receives a value of 1. Yet if we run the same inserts over different copies of the database, the order of the subsequent rows might be unreliable. So, how could we ensure that the records are inserted in the exact order that the following query yields?
SELECT DISTINCT SortByFilter
FROM MC.tblINUser
ORDER BY SortByFilter = 'first', SortByFilter;
I know that we can solve this by
using a cursor for the insert
looping the records received
inserting them individually
But that would have as many insert statements as the number of records the above query yields. Is there a way to achieve the same with a single command?
"it would be reasonable to expect that the very first inserted record would be first"
I don't think so. You used ORDER BY SortByFilter = 'first', which evaluates to 0 for every value except 'first' and to 1 for 'first'. The value 1 sorts after the value 0, so the entry 'first' ends up last. The other values all tie on 0, so their relative order is unspecified.
Demo:
mysql> create table mytable (SortByFilter varchar(64));
Query OK, 0 rows affected (0.02 sec)
mysql> insert into mytable values ('first'), ('first-ASC'),
('last'), ('none'), ('StatTeacher.IsActive DESC');
Query OK, 5 rows affected (0.01 sec)
Records: 5 Duplicates: 0 Warnings: 0
mysql> select SortByFilter='first', SortByFilter from mytable
order by SortByFilter = 'first';
+----------------------+---------------------------+
| SortByFilter='first' | SortByFilter |
+----------------------+---------------------------+
| 0 | first-ASC |
| 0 | last |
| 0 | none |
| 0 | StatTeacher.IsActive DESC |
| 1 | first |
+----------------------+---------------------------+
I suggest not relying on automatic sorting. Be specific about the sort order of every value. Here's one way to do it:
mysql> select field(SortByFilter, 'first', 'first-ASC',
'none', 'StatTeacher.IsActive DESC', 'last') AS SortOrder,
SortByFilter
from mytable order by SortOrder;
+-----------+---------------------------+
| SortOrder | SortByFilter |
+-----------+---------------------------+
| 1 | first |
| 2 | first-ASC |
| 3 | none |
| 4 | StatTeacher.IsActive DESC |
| 5 | last |
+-----------+---------------------------+
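To connect the explicit ordering above back to the single-statement insert the question asks about, here is a hedged sketch. It uses Python's sqlite3 module as a stand-in for MySQL, and since SQLite has no FIELD(), a CASE expression plays the same role. The table names src and lookup are made up for the illustration. The point is only that an INSERT ... SELECT with a fully specified ORDER BY hands out the auto-increment ids in that order.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; src and lookup are hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (SortByFilter TEXT)")
conn.executemany(
    "INSERT INTO src (SortByFilter) VALUES (?)",
    [("none",), ("last",), ("first",), ("first-ASC",),
     ("StatTeacher.IsActive DESC",), ("first",)],
)
conn.execute(
    "CREATE TABLE lookup (id INTEGER PRIMARY KEY AUTOINCREMENT, SortByFilter TEXT)"
)

# CASE is used in place of MySQL's FIELD(): every value gets an explicit rank,
# so no two rows tie and the insert order is fully determined.
conn.execute("""
    INSERT INTO lookup (SortByFilter)
    SELECT SortByFilter
    FROM (SELECT DISTINCT SortByFilter FROM src)
    ORDER BY CASE SortByFilter
                 WHEN 'first' THEN 1
                 WHEN 'first-ASC' THEN 2
                 WHEN 'none' THEN 3
                 WHEN 'StatTeacher.IsActive DESC' THEN 4
                 WHEN 'last' THEN 5
             END
""")

lookup_rows = conn.execute(
    "SELECT id, SortByFilter FROM lookup ORDER BY id"
).fetchall()
for row in lookup_rows:
    print(row)
```

Running the same statement against another copy of the data should give the same ids, as long as the ORDER BY leaves no ties.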
To get the rows in a particular order, you must use an ORDER BY. That is straightforward to do if the object of the ORDER BY is a string and you want alphabetical order, or it is numeric and you want it in numeric order. Ditto for the reverse by using DESC.
For some unusual ordering, here is one trick:
ORDER BY FIND_IN_SET(my_column, "first,second,third,fourth")
Another:
ORDER BY my_column != 'first', my_column
That will list 'first' first, then the rest in alphabetical order. (I am assuming my_column is a VARCHAR.) Similarly, this lists 'last' last:
ORDER BY my_column = 'last', my_column
Note that a boolean expression evaluates to 0 (for false) or 1 (for true); I am then depending on the sort order of 0 and 1.
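The 0/1 trick can be seen directly by selecting the boolean expression next to the column. This is a small sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption; SQLite also evaluates comparisons to 0 or 1). The sample values are all lowercase at the start so that case-sensitivity differences between the two engines do not matter.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; mytable/my_column follow the names above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (my_column TEXT)")
conn.executemany(
    "INSERT INTO mytable (my_column) VALUES (?)",
    [("none",), ("last",), ("first",), ("first-ASC",)],
)

# The comparison is 0 for 'first' and 1 for everything else, so 'first' leads.
first_rows = conn.execute(
    "SELECT my_column != 'first', my_column FROM mytable "
    "ORDER BY my_column != 'first', my_column"
).fetchall()

# The comparison is 1 only for 'last', so 'last' is pushed to the end.
last_rows = conn.execute(
    "SELECT my_column = 'last', my_column FROM mytable "
    "ORDER BY my_column = 'last', my_column"
).fetchall()

print(first_rows)  # [(0, 'first'), (1, 'first-ASC'), (1, 'last'), (1, 'none')]
print(last_rows)   # [(0, 'first'), (0, 'first-ASC'), (0, 'none'), (1, 'last')]
```

The second sort key (the column itself) is what makes the order of the remaining rows deterministic.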

How to specify to use the modified field from select statement in group by (not the original one)

I have a query like:
SELECT
DATE_FORMAT(`create`, "%d.%m.%Y") AS `create`
FROM table
GROUP by `create`;
Is it possible to specify that GROUP BY should use the modified "create" from the SELECT statement, instead of the original table field value?
In other words, you can refer to the table field as table.create, but how do you refer to the one from the select, something like select.create?
create is a reserved keyword in MySQL. To use it as an identifier, delimit it with back-ticks, as you define the column alias and as you reference it in your GROUP BY clause:
SELECT
DATE_FORMAT(`create`, '%d.%m.%Y') AS `create`
FROM `table`
GROUP by `create`
Update: Testing this, I see what you mean. It seems to use the column create of the base table instead of the alias. If I have two rows each on the same day but with different times, I'd expect those to be grouped together, but they are not.
mysql> insert into `table` values
('2021-04-01 12:34:56'),
('2021-04-01 14:34:56'),
('2021-05-01 14:56:59'),
('2021-05-01 09:56:59');
mysql> SELECT DATE_FORMAT(`create`, "%d.%m.%Y") AS `create`
FROM `table`
GROUP by `create`;
+------------+
| create |
+------------+
| 01.04.2021 |
| 01.04.2021 |
| 01.05.2021 |
| 01.05.2021 |
+------------+
You could force it by putting the query with the date_format into a subquery:
mysql> SELECT `create`
FROM (
SELECT DATE_FORMAT(`create`, '%d.%m.%Y') AS `create`
FROM `table`
) as t
GROUP BY `create`;
+------------+
| create |
+------------+
| 01.04.2021 |
| 01.05.2021 |
+------------+
Or you could make sure your column alias is different from the column name of the base table:
mysql> SELECT DATE_FORMAT(t.`create`, "%d.%m.%Y") AS `created`
FROM `table` t GROUP by `created`;
+------------+
| created |
+------------+
| 01.04.2021 |
| 01.05.2021 |
+------------+
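The "use a different alias" approach can be sketched in a self-contained way with Python's sqlite3 module standing in for MySQL (an assumption). SQLite has no DATE_FORMAT(), so strftime() is used here instead, and the table name events is made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; events is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (`create` TEXT)")
conn.executemany(
    "INSERT INTO events (`create`) VALUES (?)",
    [("2021-04-01 12:34:56",), ("2021-04-01 14:34:56",),
     ("2021-05-01 14:56:59",), ("2021-05-01 09:56:59",)],
)

# The alias `created` does not collide with the base column `create`,
# so GROUP BY can only mean the formatted value.
grouped = conn.execute("""
    SELECT strftime('%d.%m.%Y', `create`) AS created
    FROM events
    GROUP BY created
    ORDER BY created
""").fetchall()

print(grouped)  # [('01.04.2021',), ('01.05.2021',)]
```

Repeating the full expression in GROUP BY instead of the alias would be another way to remove the ambiguity.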

Add column with count(*)

When I do "select count(*) from users", it returns the data in the following format:
mysql> select count(*) from users;
+----------+
| count(*) |
+----------+
| 100 |
+----------+
1 row in set (0.02 sec)
I would like to get the data in the following format instead.
+---------+----------+
| key | count |
+---------+----------+
| my_count| 100 |
+---------+----------+
The reason is to feed this data to a pre-built widget which expects the data in the above format.
Is there a way to do this in SQL?
I tried various options such as "group by" but couldn't get it working.
mysql> select count(*) from users;
+---------+----------+
| key | count |
+---------+----------+
| my_count| 100 |
+---------+----------+
Just add a string literal to your select clause:
SELECT 'my_count' AS `key`, COUNT(*) AS count
FROM users;
Note that key is a reserved keyword in MySQL, so we must escape it using backticks.
If you intended to use GROUP BY, then you probably want a query like this (assuming users has a column named key):
SELECT `key`, COUNT(*) AS count
FROM users
GROUP BY `key`;
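For the widget case, the string-literal query can be checked end to end. This sketch uses Python's sqlite3 module as a stand-in for MySQL (an assumption), with a made-up users table of 100 rows, and turns the single result row into a key/count mapping.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the users table here is sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO users (id) VALUES (?)", [(i,) for i in range(1, 101)])

# The literal 'my_count' becomes the first column; backticks quote the aliases.
cursor = conn.execute("SELECT 'my_count' AS `key`, COUNT(*) AS `count` FROM users")
columns = [d[0] for d in cursor.description]
row = cursor.fetchone()

widget_input = dict(zip(columns, row))
print(widget_input)  # {'key': 'my_count', 'count': 100}
```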

Get random posts without scanning the whole database [duplicate]

How can I get random posts without scanning the whole database?
As far as I know, MySQL's ORDER BY RAND() scans the whole table.
Is there any other way to do this without scanning the whole database?
A tiny modification of @squeamish ossifrage's solution using primary key values, assuming that the table has a numeric primary key:
SELECT *
FROM delete_me
WHERE id >= Round( Rand() *
( SELECT Max( id ) FROM delete_me ))
LIMIT 1
For a table containing more than 50,000 rows, the query runs in under 100 milliseconds:
mysql> SELECT id, table_schema, table_name
FROM delete_me
WHERE id >= Round( Rand() *
( SELECT Max( id ) FROM delete_me ))
LIMIT 1;
+-----+--------------------+------------+
| id | table_schema | table_name |
+-----+--------------------+------------+
| 173 | information_schema | PLUGINS |
+-----+--------------------+------------+
1 row in set (0.01 sec)
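One caveat with putting RAND() directly in the WHERE clause is that MySQL evaluates it again for every row it examines, so the threshold keeps changing during the scan. A variation, sketched here under assumptions, is to pick the random threshold once in application code and then ask for the first id at or above it. The sketch uses Python's sqlite3 module as a stand-in for MySQL, a hypothetical helper named random_row, and sample data with gaps in the ids.

```python
import random
import sqlite3

# In-memory SQLite stands in for MySQL; only even ids exist, to simulate gaps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE delete_me (id INTEGER PRIMARY KEY, txt TEXT)")
conn.executemany(
    "INSERT INTO delete_me (id, txt) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(2, 1001, 2)],
)


def random_row(conn):
    """Pick one threshold up front, then fetch the first row at or above it."""
    max_id = conn.execute("SELECT MAX(id) FROM delete_me").fetchone()[0]
    if max_id is None:  # empty table
        return None
    threshold = random.randint(1, max_id)
    # ORDER BY id makes "first row at or above the threshold" explicit;
    # with a primary key this is an index lookup, not a table scan.
    return conn.execute(
        "SELECT id, txt FROM delete_me WHERE id >= ? ORDER BY id LIMIT 1",
        (threshold,),
    ).fetchone()


picked = random_row(conn)
print(picked)
```

Rows that follow a large gap in the id sequence are picked more often than others, so this is only approximately uniform unless the ids are evenly spread.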
A lot of people seem to be convinced that ORDER BY RAND() is somehow able to produce results without scanning the whole table.
Well it isn't. In fact, it's liable to be slower than ordering by column values, because MySQL has to call the RAND() function for each row.
To demonstrate, I made a simple table of half a million MD5 hashes:
mysql> select count(*) from delete_me;
+----------+
| count(*) |
+----------+
| 500000 |
+----------+
1 row in set (0.00 sec)
mysql> explain delete_me;
+-------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| txt | text | NO | | NULL | |
+-------+------------------+------+-----+---------+----------------+
2 rows in set (0.12 sec)
mysql> select * from delete_me limit 4;
+----+----------------------------------+
| id | txt |
+----+----------------------------------+
| 1 | 9b912c03d87991b71955a6cd4f81a299 |
| 2 | f1b7ddeb1c1a14265a620b8f2366a22e |
| 3 | 067b39538b767e2382e557386cba37d9 |
| 4 | 1a27619c1d2bb8fa583813fdd948e94c |
+----+----------------------------------+
4 rows in set (0.00 sec)
Using ORDER BY RAND() to choose a random row from this table takes my computer 1.95 seconds.
mysql> select * from delete_me order by rand() limit 1;
+--------+----------------------------------+
| id | txt |
+--------+----------------------------------+
| 446149 | b5f82dd78a171abe6f7bcd024bf662e8 |
+--------+----------------------------------+
1 row in set (1.95 sec)
But ordering the text fields in ascending order takes just 0.8 seconds.
mysql> select * from delete_me order by txt asc limit 1;
+-------+----------------------------------+
| id | txt |
+-------+----------------------------------+
| 88583 | 00001e65c830f5b662ae710f11ae369f |
+-------+----------------------------------+
1 row in set (0.80 sec)
Since the id values in this table are numbered sequentially starting from 1, I can choose a random row much more quickly like this:
mysql> select * from delete_me where id=floor(1+rand()*500000) limit 1;
+-------+----------------------------------+
| id | txt |
+-------+----------------------------------+
| 37600 | 3b8aaaf88af68ca0c6eccff7e61e897a |
+-------+----------------------------------+
1 row in set (0.02 sec)
But in the general case, I would suggest using the method proposed by Mike in the page linked to by @deceze.
My suggestion for this kind of requirement is to use an MD5 hash.
Add a CHAR(32) field to your DB table and create an index for it.
Populate it for every record with an MD5 hash of anything (maybe the value from the ID column or just any old random number; it doesn't matter too much as long as each record is different).
Now you can query the table like so:
SELECT * FROM myTable WHERE md5Col > MD5(NOW()) LIMIT 1
This will give you a single random record without having to scan the whole table. The table has a random sort order thanks to the MD5 values. MD5 is great for this because it's quick and randomly distributed.
Caveats:
If the MD5 from your SELECT query results in a hash that is larger than the last record in your table, you might get no records from the query. If that happens, you can always re-query it with a new hash.
Having a fixed MD5 hash on each record means that the records are in a fixed order. This isn't really an issue if you're only ever fetching a single record at a time, but if you're using it to fetch groups of records, it may be noticeable. You can of course correct this if you want by rehashing records as you load them.
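The MD5-column idea and its first caveat can be sketched as follows, using Python's sqlite3 and hashlib modules as a stand-in for MySQL (an assumption). Two things differ from the query above and are choices made for this sketch: an ORDER BY md5Col is added so that the row just above the probe is returned, and an empty result wraps around to the smallest hash instead of retrying with a new probe. The helper name pick_after is made up.

```python
import hashlib
import sqlite3


def md5_hex(text):
    """Return the 32-character hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# In-memory SQLite stands in for MySQL; md5Col is filled from the id here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, md5Col TEXT)")
conn.execute("CREATE INDEX idx_md5 ON myTable (md5Col)")
conn.executemany(
    "INSERT INTO myTable (id, md5Col) VALUES (?, ?)",
    [(i, md5_hex(str(i))) for i in range(1, 201)],
)


def pick_after(conn, probe):
    """Return the row with the smallest hash above probe, wrapping around."""
    row = conn.execute(
        "SELECT id, md5Col FROM myTable WHERE md5Col > ? ORDER BY md5Col LIMIT 1",
        (probe,),
    ).fetchone()
    if row is None:
        # The probe was above every stored hash: wrap to the smallest one.
        row = conn.execute(
            "SELECT id, md5Col FROM myTable ORDER BY md5Col LIMIT 1"
        ).fetchone()
    return row


smallest = conn.execute(
    "SELECT id, md5Col FROM myTable ORDER BY md5Col LIMIT 1"
).fetchone()
wrapped = pick_after(conn, "f" * 32)          # nothing sorts above this probe
regular = pick_after(conn, md5_hex("probe"))  # an arbitrary probe value

print(wrapped == smallest)
print(regular)
```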

MySQL Group By Sum by Day

I have a table with
id int pk auto_inc | created int(11) | amount int | user_id int
I want to create a list of rows grouped by day totalling the amount field.
I have tried this:
SELECT created, sum(amount) as amount, id FROM total_log WHERE user_id = $this->user_id GROUP BY DAY(created)
This doesn't give the right results. They are getting grouped into one row.
The date is converted from dd/mm/yyyy format and saved as a Unix timestamp like 1349046000.
SELECT
DATE(FROM_UNIXTIME(created)) as d,
sum(amount) as amount
FROM total_log
WHERE user_id = $this->user_id
GROUP BY d
DAY() expects a date or datetime value, and MySQL cannot interpret a plain integer like this one as a date:
mysql> select day(1349046000);
+-----------------+
| day(1349046000) |
+-----------------+
| NULL |
+-----------------+
1 row in set, 1 warning (0.00 sec)
mysql> show warnings;
+---------+------+----------------------------------------+
| Level | Code | Message |
+---------+------+----------------------------------------+
| Warning | 1292 | Incorrect datetime value: '1349046000' |
+---------+------+----------------------------------------+
1 row in set (0.00 sec)
So all of your rows will have NULL for day(some_int_value), and they'll all be in the same group.
I would suggest using a date or datetime type for that column instead.
Also, columns not in the group by clause should not be referenced in the select statement, unless an aggregating function is used on them.
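Try this. Note that DAY() returns only the day of the month (1 to 31), so rows from different months that share a day number end up in the same group; group by DATE(FROM_UNIXTIME(created)) to keep calendar days separate: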
SELECT
DAY(DATE(FROM_UNIXTIME(created))),
sum(amount) as amount
FROM total_log
WHERE user_id = $this->user_id
GROUP BY DAY(DATE(FROM_UNIXTIME(created)))
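The difference between grouping by calendar date and grouping by day of month can be seen with a few sample rows. This sketch uses Python's sqlite3 module as a stand-in for MySQL (an assumption): date(created, 'unixepoch') takes the place of DATE(FROM_UNIXTIME(created)) and strftime('%d', ...) takes the place of DAY(). SQLite converts in UTC here, whereas FROM_UNIXTIME uses the session time zone. The sample amounts and the helper ts are made up, and the user id is passed as a bound parameter.

```python
import sqlite3
from datetime import datetime, timezone


def ts(year, month, day, hour):
    """Unix timestamp for the given UTC date and hour (sample-data helper)."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp())


# In-memory SQLite stands in for MySQL; total_log mirrors the table above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE total_log ("
    "id INTEGER PRIMARY KEY, created INTEGER, amount INTEGER, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO total_log (created, amount, user_id) VALUES (?, ?, ?)",
    [
        (ts(2012, 10, 1, 9), 10, 1),
        (ts(2012, 10, 1, 18), 5, 1),
        (ts(2012, 10, 2, 8), 7, 1),
        (ts(2012, 11, 1, 12), 3, 1),
        (ts(2012, 10, 1, 9), 99, 2),  # another user, filtered out below
    ],
)

# Group by the full calendar date: one row per day.
by_date = conn.execute(
    """
    SELECT date(created, 'unixepoch') AS d, SUM(amount) AS amount
    FROM total_log
    WHERE user_id = ?
    GROUP BY d
    ORDER BY d
    """,
    (1,),
).fetchall()

# Group by day of month only: 1 October and 1 November collapse together.
by_day_of_month = conn.execute(
    """
    SELECT strftime('%d', created, 'unixepoch') AS dom, SUM(amount) AS amount
    FROM total_log
    WHERE user_id = ?
    GROUP BY dom
    ORDER BY dom
    """,
    (1,),
).fetchall()

print(by_date)          # [('2012-10-01', 15), ('2012-10-02', 7), ('2012-11-01', 3)]
print(by_day_of_month)  # [('01', 18), ('02', 7)]
```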