Get a single random row from a MySQL table with enumerated rows - mysql

I've got a table with an auto-incremented ID in MySQL. I am always adding to this table, never deleting rows or setting the ID to NULL, so I am pretty sure there are no holes. This is the table structure:
CREATE TABLE mytable (
id smallint(5) unsigned NOT NULL AUTO_INCREMENT,
data1 varchar(200) DEFAULT NULL,
data2 varchar(30) DEFAULT NULL,
PRIMARY KEY (id),
UNIQUE KEY data (data1,data2)
)
I want to pick up a random row from the table. I am using this:
select * from mytable where id=(select floor(1 + rand() * ((select max(id) from mytable) - 1)));
But sometimes I get nothing, sometimes one row, sometimes two. Replacing max(id) with count(*) or count(id) did not help. I understand it may be because rand() is evaluated for each row. As suggested in a similar question, I used this query:
select * from mytable cross join (select @rand := rand()) const where id=floor(1 + @rand*((select count(*) from mytable)-1));
But I still get an empty set sometimes. Same goes for this:
select * from mytable cross join (select @rand := rand()) const where id=floor(@rand*(select count(*) from mytable)+1);
I am looking for a fast way to do this, so that it won't take long on big tables. ORDER BY rand() LIMIT 1 is not an option for me. Can this be done with a single query?
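One workaround, sketched below under the question's own assumption that the ids are contiguous from 1 to MAX(id): evaluate RAND() once in a separate statement, so it cannot be re-evaluated per row. It is two statements rather than one, but each is a cheap index lookup.
-- Evaluate RAND() once, outside the row-matching query, then fetch the row by id.
SET @rand_id = (SELECT FLOOR(1 + RAND() * MAX(id)) FROM mytable);
SELECT * FROM mytable WHERE id = @rand_id;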

Related

Quickly Select Random Rows With Where Condition

Is it possible to quickly select random rows from a table, while also using a where condition?
Example:
SELECT * FROM geo WHERE placeRef = 1 ORDER BY RAND() LIMIT 1
This can take 10+ seconds.
I found this, which is sometimes quick, sometimes very slow:
(SELECT *
FROM geo
INNER JOIN ( SELECT RAND() * ( SELECT MAX( nameRef ) FROM geo ) AS ID ) AS t ON geo.nameRef >= t.ID
WHERE geo.placeRef = 1
ORDER BY geo.nameRef
LIMIT 1)
This provides a quick result, but only if there is no extra where condition.
This is the create table:
CREATE TABLE `geo` (
`nameRef` int(8) DEFAULT NULL,
`placeRef` mediumint(7) unsigned DEFAULT NULL,
`category` enum('continent','country','region','subregion') COLLATE utf8_bin DEFAULT NULL,
`parentRef` mediumint(7) DEFAULT NULL,
`incidence` int(9) unsigned NOT NULL,
`percent` decimal(11,9) unsigned DEFAULT NULL,
`ratio` int(11) NOT NULL,
`rank` mediumint(7) unsigned DEFAULT NULL,
KEY `placeRef_rank` (`placeRef`,`rank`),
KEY `nameRef_category` (`nameRef`,`category`),
KEY `nameRef_parentRef` (`nameRef`,`parentRef`),
KEY `nameRef_placeRef` (`nameRef`,`placeRef`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin
N.B. this table has around 550 million rows.
Desired query: query the table where placeRef = x and quickly return one random row.
Issue: a query like SELECT * FROM geo WHERE placeRef = 1 can provide up to about 15 million results. So selecting a single random row is slow.
That technique's speed is variable because it depends on where the matching rows happen to lie in the table.
The quick fix may be to add this index, assuming that nameRef is the PRIMARY KEY for the table:
INDEX(placeRef, nameRef)
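Concretely, that would be something like (the index name here is illustrative):
ALTER TABLE geo ADD INDEX placeRef_nameRef (placeRef, nameRef);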
Let's discuss this further after:
You provide SHOW CREATE TABLE geo
You read http://mysql.rjweb.org/doc.php/random
There are (currently) 3 indexes that make this subquery very fast (because of the leading nameRef):
( SELECT MAX( nameRef ) FROM geo )
After that, my suggestion of (placeRef, nameRef) will kick in for these:
WHERE geo.placeRef = 1
geo.nameRef >= t.ID
I think the resulting query should be consistently fast.
This is pulling a result in 1/100th of a second:
SELECT * FROM geo WHERE placeRef = 1 AND nameRef >= CEIL( RAND() * ( SELECT MAX( nameRef ) FROM geo ) ) LIMIT 1
This works well if you have an index on both of the columns you want to query. However, you may need to build a new table that is randomly ordered. In my table the nameRefs tend to be grouped by country, which causes the random results to be drawn from only a handful of rows, since most of the matches cluster around the same ids. I needed to create a new table, ordered randomly with ORDER BY RAND(), where each row has a unique id. Now I search this much smaller summary table with:
SELECT * FROM geoSummary WHERE placeRef = 1 AND id >= CEIL( RAND() * ( SELECT MAX( id ) FROM geoSummary ) ) LIMIT 1
To avoid running that SELECT MAX query every time, I save the maximum id in server-side code, generate the random number there, and run:
SELECT * FROM geoSummary WHERE placeRef = 1 AND id >= :random_number LIMIT 1
This provides truly random results.
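For reference, a minimal sketch of how such a summary table could be built; the column types and the (placeRef, id) index are assumptions, not details from the original answer. The ORDER BY RAND() pass is expensive on a 550-million-row table, but it is a one-time cost.
CREATE TABLE geoSummary (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  placeRef MEDIUMINT UNSIGNED,
  nameRef INT,
  INDEX (placeRef, id)
);
-- Copy the relevant columns over in random order, so ids are decorrelated from geography.
INSERT INTO geoSummary (placeRef, nameRef)
SELECT placeRef, nameRef FROM geo ORDER BY RAND();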

Efficient way to list ranges of consecutive records

I have a table set up like so:
CREATE TABLE `cn` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`type` int(3) unsigned NOT NULL,
`number` int(10) NOT NULL,
`desc` varchar(64) NOT NULL,
`datetime` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB
number is usually but not necessarily unique.
Most of the table consists of rows with consecutive number entries.
e.g.
101010, 101011, 101012, etc.
I've been trying to find an efficient way to list ranges of consecutive numbers so I can find out where numbers are "missing" easily. What I'd like to do is list the start number, end number, and number of consecutive rows. Since there can be duplicates, I am using SELECT DISTINCT(number) to avoid duplicates.
I've not been having much luck - most of the questions of this type deal with dates and have been hard to generalize. One query was executing forever, so that was a no go. This answer is sort of close but not quite. It uses a CROSS JOIN, which sounds like a recipe for disaster when you have millions of records.
What would be the best way to do this? Some answers use joins, which I'm skeptical of performance-wise. Right now there are only 50,000 rows, but it will be millions of records within a few days, so every ounce of performance matters.
The eventual pseudoquery I have in mind is something like:
SELECT DISTINCT(number) FROM cn WHERE type = 1 GROUP BY [consecutive...] ORDER BY number ASC
This is a gaps-and-islands problem. You can solve it using the difference between row_number() and number to define groups; gaps are identified by changes in that difference:
select type, min(number) first_number, max(number) last_number, count(*) no_records
from (
select cn.*, row_number() over(order by number) rn
from cn
where type = 1
) c
group by type, number - rn
Note: window functions are available in MySQL 8.0 and MariaDB 10.3 onwards.
In earlier versions, you can emulate row_number() with a session variable:
select type, min(number) first_number, max(number) last_number, count(*) no_records
from (
select c.*, @rn := @rn + 1 rn
from (select * from cn where type = 1 order by number) c
cross join (select @rn := 0) r
) c
group by type, number - rn
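Since number can contain duplicates (as the question notes) and the difference trick assumes each number appears exactly once, you may want to deduplicate first; a sketch for MySQL 8.0:
select min(number) first_number, max(number) last_number, count(*) no_records
from (
select number, row_number() over(order by number) rn
from (select distinct number from cn where type = 1) d
) c
group by number - rn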

MySQL logic for prioritizing matching pattern/text

I have a table of incidents that have a short_description. I'm trying to assign them to a category based on the text in that short_description (not ideal, I know, but I'm working with an existing system and don't have much control). So I created a lookup table with search_text to look for, and the category value that should be assigned to the incident. In some cases more than one search_text value matches the short_description. I want to use the priority field to choose the highest priority (lowest priority value, such as 1) when this happens. I feel like maybe this involves a window function or something, but I'm not sure how to approach it.
Can someone help me with the changes needed in the logic below?
Thanks!
The query below returns two results, but I want it to just return one (Cluster) because Cluster is priority 1, and Disk is priority 2. I only want one record per snumber (incident).
CREATE TABLE snow_incident_s_2 (
SNUMBER varchar(40) DEFAULT NULL,
SHORT_DESCRIPTION varchar(200) DEFAULT NULL
);
INSERT INTO snow_incident_s_2 (snumber, short_description) values ('INC15535802','Prognosis::ADMINCLUSTER [5251]::CmaDiskPartitionNearFull, PROGNOSIS:ADMINCLUSTER');
CREATE TABLE lkp_incident_category_2 (
incident_category_id smallint(6) NOT NULL AUTO_INCREMENT,
incident_type varchar(10) DEFAULT NULL,
category varchar(100) NOT NULL,
search_text varchar(200) NOT NULL,
priority smallint(6) DEFAULT NULL,
PRIMARY KEY (incident_category_id)
);
INSERT INTO lkp_incident_category_2 (incident_type, category, search_text, priority) values ('INC','Cluster','Cluster',1);
INSERT INTO lkp_incident_category_2 (incident_type, category, search_text, priority) values ('INC','Disk','Disk',2);
SELECT
inc.snumber,
inc.short_description,
ic.search_text,
ic.category
FROM
snow_incident_s_2 inc
LEFT JOIN
lkp_incident_category_2 ic ON inc.short_description LIKE CONCAT('%', ic.search_text, '%')
AND ic.incident_type = 'INC'
Window functions don't exist in MySQL prior to 8.0, but you can mimic them using variables.
I have been trying to test this, but sqlfiddle is not cooperating, so here is what I believe should work for you. Give it a try.
SELECT *
FROM (
SELECT
inc.snumber
,inc.short_description
,ic.search_text
,ic.category
,(@rownum := IF(@incnum=inc.snumber,@rownum+1,1)) as IncidentRowNum
,@incnum := inc.snumber
FROM
snow_incident_s_2 inc
LEFT JOIN lkp_incident_category_2 ic
ON inc.short_description LIKE CONCAT('%', ic.search_text, '%')
AND ic.incident_type = 'INC'
CROSS JOIN (SELECT @rownum := 0, @incnum := '') var
ORDER BY
ic.priority
) t
WHERE
t.IncidentRowNum = 1
;
Fully tested and functional; here is a sqlfiddle of it: http://sqlfiddle.com/#!9/75ff1/7
Here is how to use a window function.
I have to say that ON inc.short_description LIKE CONCAT('%', ic.search_text, '%') will be slow. If this is done once in a while (say, every day) as an ad-hoc query it will be fine, but if it runs often (every hour, every minute, or more) you'll want to materialize the existence of these matches, since you can't build an index for them unless you use a full-text query or a NoSQL solution.
SELECT snumber, short_description, search_text, category
FROM (
SELECT snumber, short_description, search_text, category,
ROW_NUMBER() OVER (partition by snumber order by priority asc) as rn
FROM (
SELECT
inc.snumber,
inc.short_description,
ic.search_text,
ic.category,
ic.priority
FROM snow_incident_s_2 inc
LEFT JOIN lkp_incident_category_2 ic
ON inc.short_description LIKE CONCAT('%', ic.search_text, '%')
AND ic.incident_type = 'INC'
) X
) Y
WHERE RN = 1
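If the matching does have to run frequently, one option for an indexable search is the full-text route mentioned above; a sketch (the index name is illustrative, and note that MATCH ... AGAINST matches on word boundaries, so unlike LIKE '%...%' it will not find Cluster inside ADMINCLUSTER):
ALTER TABLE snow_incident_s_2 ADD FULLTEXT INDEX ft_short_desc (short_description);
-- The AGAINST argument must be a constant string, so this serves one search_text at a time.
SELECT snumber, short_description
FROM snow_incident_s_2
WHERE MATCH(short_description) AGAINST ('+Cluster' IN BOOLEAN MODE);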

SELECT index of single row in table

I have a MYSQL table which stores teams.
Table structure:
CREATE TABLE teams (
id int(11) NOT NULL AUTO_INCREMENT,
name varchar(28) COLLATE utf8_unicode_ci NOT NULL,
UNIQUE KEY id (id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1;
Sample data:
INSERT INTO teams VALUES
(1, 'one'),
(2, 'two'),
(3, 'three'),
(4, 'four'),
(5, 'five');
Use:
SELECT id, name, id as rowNumber FROM teams WHERE id = 4
Returns the correct rowNumber, as there really are three rows in front of it in the table. But this only works as long as I don't remove a row.
Example:
Let's say I DELETE FROM teams WHERE id = 3;
When I now use SELECT id, name, id as rowNumber FROM teams WHERE id = 4, the result is wrong, as there are now only two rows (ids 1 and 2) in front of it in the table.
How can I get the "real" row number/index ordered by id from one specific row?
You are returning id as rowNumber, so it is simply returning the id column value. Why do you expect it to be different?
I think you want to define a @curRow variable to get the row number, and use a subquery as below:
SELECT * from
(SELECT ID,
NAME,
@curRow := @curRow + 1 AS rowNumber
FROM Teams t
JOIN (SELECT @curRow := 0) curr
ORDER by t.ID asc) as ordered_team
WHERE ordered_team.id = 4;
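On MySQL 8.0 and later, a window function gives the same result without session variables; a sketch:
SELECT id, name, rowNumber
FROM (
SELECT id, name, ROW_NUMBER() OVER (ORDER BY id) AS rowNumber
FROM teams
) t
WHERE id = 4;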
It's not an elegant way, but it works in plain SQL:
SELECT
t.id,
t.name,
(SELECT COUNT(*)+1 FROM teams WHERE id < t.id) as row_number
FROM teams t
WHERE t.id = 4
Why do you bother with row indexes inside the persistence layer?
If you really need to rely on the "index" of the stored tuples, you could introduce a variable and increment it in the query or program code for each row.
EDIT:
Just found this one: With MySQL, how can I generate a column containing the record index in a table?

How to select last N Rows without using an Index

I have a query that contains several conditions to extract data from a table of 5 million rows. A composite index has been built to partially cover some of these conditions, to the extent that I am not able to cover the sorting with an index:
SELECT columns FROM Table WHERE conditions='conditions' ORDER BY id DESC LIMIT N;
The id itself is an auto-increment column. The above query can be very slow (4-5 s) as filesort is being used. By removing the ORDER BY clause, I am able to speed up the query by up to 4 times. However, the data extracted will be mostly old data.
Since post-processing can be carried out to sort the extracted data, I am more interested in extracting data from roughly the latest N rows from the resultset. My question is, is there a way to do something like this:
SELECT columns FROM Table WHERE conditions='conditions' LIMIT -N;
Since I do not really need a sort and I know that there is very high likelihood that the bottom N rows contain newer data.
Here you go. Keep in mind that there should be no problem in using ORDER BY with any indexed columns, including id.
SET @seq := 0;
SELECT `id`
FROM (
SELECT @seq := @seq + 1 AS `seq`, `id`
FROM `Table`
WHERE `condition` = 'whatever'
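-- NB: without an ORDER BY in this subquery, the order in which rows get numbered is not guaranteed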
)t1
WHERE t1.seq
BETWEEN (
(
SELECT COUNT( * )
FROM `Table`
WHERE `condition` = 'whatever'
) -49
)
AND (
SELECT COUNT( * )
FROM `Table`
WHERE `condition` = 'whatever'
);
You can replace the "-49" with an expression like: -1 * ($quantity_desired -1);
Also check out this answer as it might help you:
https://stackoverflow.com/a/725439/631764
And here's another one:
https://stackoverflow.com/a/1441164/631764
Grab the last "few" rows using a between:
SELECT columns
FROM Table
WHERE conditions = 'conditions'
AND id between (select max(id) from Table) - 50 AND (select max(id) from Table)
ORDER BY id
DESC LIMIT N;
This example gets the last 50 rows, but the id index will be used efficiently. The other conditions and ordering will then be only over 50 rows. Should work a treat.
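One caveat: if fewer than N of the last 50 ids satisfy the conditions, the query returns fewer than N rows, so the window may need widening; a sketch with the same placeholder names:
SELECT columns
FROM Table
WHERE conditions = 'conditions'
AND id BETWEEN (SELECT MAX(id) FROM Table) - 500 AND (SELECT MAX(id) FROM Table)
ORDER BY id DESC
LIMIT N;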