GROUP_CONCAT but with limits to get more than one row - mysql

I am developing a small jumbled-words game for users on a PtokaX DC hub I manage. For this, I'm storing the list of words inside a MySQL table. Table schema is as follows:
CREATE TABLE `jumblewords` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`word` CHAR(15) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `word` (`word`)
)
COMMENT='List of words to be used for jumble game.'
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
Now, in the game engine, I want to fetch 20 random words as a single string. I can achieve this with a query similar to this:
SELECT GROUP_CONCAT(f.word SEPARATOR ', ' )
FROM ( SELECT j.word AS word
FROM jumblewords j
ORDER BY RAND()
LIMIT 20) f
but I have to execute this statement every time the list expires (all 20 words have been put before the user).
Can I modify this query so that it fetches more than one row of results like the one generated by the query above?

Probably an easier way to solve this problem is to store the random words in a temporary table and extract the values later. A stored procedure would be perfect for that.
DELIMITER //
DROP PROCEDURE IF EXISTS sp_jumblewords //
CREATE PROCEDURE sp_jumblewords(no_lines INT)
BEGIN
    -- Session-scoped table that holds the pre-built word lines
    DROP TABLE IF EXISTS tmp_jumblewords;
    CREATE TEMPORARY TABLE tmp_jumblewords (
        `word` VARCHAR(340) NOT NULL
    );
    -- Insert no_lines rows, each a comma-separated list of 20 random words
    REPEAT
        INSERT INTO tmp_jumblewords
        SELECT GROUP_CONCAT(f.word SEPARATOR ', ')
        FROM ( SELECT j.word AS word
               FROM jumblewords j
               ORDER BY RAND()
               LIMIT 20 ) f;
        SET no_lines = no_lines - 1;
    UNTIL no_lines = 0
    END REPEAT;
    SELECT * FROM tmp_jumblewords;
END //
DELIMITER ;
CALL sp_jumblewords(20);
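Since the temporary table survives for the rest of the connection after the CALL above, the game engine can pull one prepared line per round instead of re-running the RAND() query each time. A rough sketch (without an ORDER BY the specific row is arbitrary, which is fine here because every line is random):
SELECT word FROM tmp_jumblewords LIMIT 1;   -- hand one line of 20 words to the game
DELETE FROM tmp_jumblewords LIMIT 1;        -- shrink the pool so lines are not reused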

Related

Select all tables with same columns by suffix and merging them into new one

I have an arbitrary set of tables with exactly the same structure (columns) and all of the tables have the same suffix _data.
What I've tried:
CREATE TABLE `global_data` AS SELECT * FROM (SELECT * FROM `v_data` UNION ALL SELECT * FROM `x_data` UNION ALL SELECT * FROM `z_data` UNION ALL SELECT * FROM `d_data`) X GROUP BY ('id') ORDER BY 1
But as a result I'm getting only a single row, even without an auto-increment column, whereas I need all the rows that exist in each of the tables, plus an auto-increment column.
So what I need is an SQL query for:
Select all tables by suffix.
Create a new table with the merged table values, where duplicates are skipped and the remaining unique rows are merged into the new table.
The result table should have an id column with Unique and AutoIncrement attributes.
This answer uses a prepared statement; lots of people here will give you loads of grief about it, so make sure you are aware of the risks of SQL injection.
-- Create some tables, drop them if they exist already.
DROP TABLE IF EXISTS Table1_Data;
CREATE TABLE Table1_Data
(
Id INTEGER,
StoredValued VARCHAR(10)
);
DROP TABLE IF EXISTS Table2_Data;
CREATE TABLE Table2_Data
(
Id INTEGER,
StoredValued VARCHAR(10)
);
DROP TABLE IF EXISTS Table3_Data;
CREATE TABLE Table3_Data
(
Id INTEGER,
StoredValued VARCHAR(10)
);
DROP TABLE IF EXISTS Table4_Data;
CREATE TABLE Table4_Data
(
Id INTEGER,
StoredValued VARCHAR(10)
);
DROP TABLE IF EXISTS Result;
CREATE TABLE Result
(
Id INTEGER,
StoredValued VARCHAR(10)
);
-- Insert some data into the tables
INSERT INTO Table1_Data VALUES (1,'Test'),(2,'Testy'),(3,'Testing');
INSERT INTO Table2_Data VALUES (1,'Foo'),(2,'Fooby'),(3,'Foober');
INSERT INTO Table3_Data VALUES (1,'Bar'),(2,'oobar'),(3,'Barbo');
INSERT INTO Table4_Data VALUES (1,'Bar'),(2,'Testy'),(3,'JubJub');
-- Create a statement to execute
SELECT CONCAT('INSERT INTO Result',GROUP_CONCAT(' SELECT * FROM ',TABLE_SCHEMA,'.',TABLE_NAME SEPARATOR ' UNION ')) INTO @query
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME LIKE '%_Data';
-- Execute the statement
PREPARE stmt1 FROM @query;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;
-- Get the results from our new table.
SELECT *
FROM Result;
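For illustration, with the four sample tables above (assuming they live in a schema named test; the query actually scans every schema visible in INFORMATION_SCHEMA), the generated @query comes out roughly as:
INSERT INTO Result
SELECT * FROM test.Table1_Data UNION
SELECT * FROM test.Table2_Data UNION
SELECT * FROM test.Table3_Data UNION
SELECT * FROM test.Table4_Data
Because it is UNION rather than UNION ALL, exact duplicate rows (such as (1,'Bar'), which appears in both Table3_Data and Table4_Data) land in Result only once; the unique auto-increment id column asked for would still have to be added to Result separately.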

MySQL Find distinct pair of values per group in time interval

I have the following table in MySQL:
CREATE TABLE `events` (
`pv_name` varchar(60) COLLATE utf8mb4_bin NOT NULL,
`time_stamp` bigint(20) unsigned NOT NULL,
`event_type` varchar(40) COLLATE utf8mb4_bin NOT NULL,
`has_data` tinyint(1) NOT NULL,
`data` json DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin ROW_FORMAT=COMPRESSED;
ALTER TABLE `events`
ADD PRIMARY KEY (`pv_name`,`time_stamp`), ADD KEY `has_data` (`has_data`,`pv_name`,`time_stamp`);
I have been struggling to construct an efficient query to find each pv_name that has at least one change in value in a given time interval.
I believe that the query I currently have is inefficient because it finds all of the distinct values in the given time interval for each pv_name, instead of stopping as soon as it finds more than one:
SELECT events.pv_name
FROM events
WHERE events.time_stamp > 0 AND events.time_stamp < 9999999999999999999
GROUP BY events.pv_name
HAVING COUNT(DISTINCT JSON_EXTRACT(events.data, '$.value')) > 1;
To avoid this I am considering breaking the count and distinct parts into separate steps, since the documentation says that:
When combining LIMIT row_count with DISTINCT, MySQL stops as soon as
it finds row_count unique rows.
Is there an efficient query to find a pair of distinct values for each pv_name in a given time interval, that does not have to find all of the distinct values for each pv_name in a given time interval?
EDIT @Rick James
I am essentially trying to find a faster, non-cursor-based solution for this:
SET @old_sql_mode=@@sql_mode, sql_mode='STRICT_ALL_TABLES';
DELIMITER //
DROP PROCEDURE IF EXISTS check_for_change //
CREATE PROCEDURE check_for_change(IN t0_in bigint(20) unsigned, IN t1_in bigint(20) unsigned)
BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE current_pv_name VARCHAR(60);
DECLARE cur CURSOR FOR SELECT DISTINCT pv_name FROM events;
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done = TRUE;
SET @t0_in := t0_in;
SET @t1_in := t1_in;
IF @t0_in > @t1_in THEN
SET @temp := @t0_in;
SET @t0_in := @t1_in;
SET @t1_in := @temp;
END IF;
DROP TEMPORARY TABLE IF EXISTS has_change;
CREATE TEMPORARY TABLE has_change (
pv_name varchar(60) NOT NULL,
PRIMARY KEY (pv_name)
) ENGINE=Memory DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
OPEN cur;
label1: LOOP
FETCH cur INTO current_pv_name;
IF done THEN
LEAVE label1;
END IF;
INSERT INTO has_change
SELECT current_pv_name
FROM (
SELECT DISTINCT JSON_EXTRACT(events.data, '$.value') AS distinct_value
FROM events
WHERE events.pv_name = current_pv_name
AND events.has_data = 1
AND events.time_stamp > @t0_in AND events.time_stamp < @t1_in
LIMIT 2 ) AS t
HAVING COUNT(t.distinct_value) = 2;
END LOOP;
CLOSE cur;
END //
DELIMITER ;
SET sql_mode=@old_sql_mode;
The optimization here is in the application of the limit on the number of distinct values to find for each pv_name.
There is no LIMIT, so the quote does not apply. (Or at least, I think not.)
COUNT(DISTINCT ...) will, in some cases do a "loose scan", which is better than reading every row. For example,
SELECT name
FROM tbl
GROUP BY name
HAVING COUNT(DISTINCT foo) > 3;
together with INDEX(name, foo) would probably leapfrog through the index to do the COUNT DISTINCT of foos for each name. Granted, this is not "stopping at 3" as you requested.
You can demonstrate the above by doing
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
to see whether or not the Handler_read counts are on the order of the size of the table.
The loose scan is not applicable to your particular query for multiple reasons.
Bottom line: "No, you can't achieve your goal".
Also, the stored routine you wrote will probably be much slower than simply accepting the overhead of a full scan.
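If altering the table is an option, a possible direction not covered above (a sketch only; it assumes MySQL 5.7+ generated columns are available and that $.value is a short scalar) is to materialize the JSON value into an indexed generated column, so the per-pv_name check in the cursor procedure can be answered from a covering index instead of decompressing rows and parsing JSON:
ALTER TABLE events
ADD COLUMN value_ex VARCHAR(64)
GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(data, '$.value'))) STORED,
ADD INDEX pv_has_time_value (pv_name, has_data, time_stamp, value_ex);
-- The inner SELECT of the procedure then reads only the index:
SELECT DISTINCT value_ex
FROM events
WHERE pv_name = 'example_pv'   -- hypothetical pv_name
AND has_data = 1
AND time_stamp > @t0_in AND time_stamp < @t1_in
LIMIT 2;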

MySQL: How to optimize this simple GROUP BY+ORDER BY query?

I have one mysql table:
CREATE TABLE IF NOT EXISTS `test` (
`Id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`SenderId` int(10) unsigned NOT NULL,
`ReceiverId` int(10) unsigned NOT NULL,
`DateSent` datetime NOT NULL,
`Notified` tinyint(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`Id`),
KEY `ReceiverId_SenderId` (`ReceiverId`,`SenderId`),
KEY `SenderId` (`SenderId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
The table is populated with 10,000 random rows for testing using the following procedure:
DELIMITER //
CREATE DEFINER=`root`@`localhost` PROCEDURE `FillTest`(IN `cnt` INT)
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE intSenderId INT;
DECLARE intReceiverId INT;
DECLARE dtDateSent DATE;
DECLARE blnNotified INT;
WHILE (i<=cnt) DO
SET intSenderId = FLOOR(1 + (RAND() * 50));
SET intReceiverId = FLOOR(51 + (RAND() * 50));
SET dtDateSent = str_to_date(concat(floor(1 + rand() * (12-1)),'-',floor(1 + rand() * (28 -1)),'-','2008'),'%m-%d-%Y');
SET blnNotified = FLOOR(1 + (RAND() * 2))-1;
INSERT INTO test (SenderId, ReceiverId, DateSent, Notified)
VALUES(intSenderId,intReceiverId,dtDateSent, blnNotified);
SET i=i+1;
END WHILE;
END//
DELIMITER ;
CALL `FillTest`(10000);
The problem:
I need to write a query which will group by SenderId, ReceiverId and return the highest Id of each group, taking the first 100 of those rows ordered by Id in ascending order.
I played with GROUP BY, ORDER BY and MAX(Id), but the query was too slow, so I came up with this query:
SELECT SQL_NO_CACHE t1.*
FROM test t1
LEFT JOIN test t2 ON (t1.ReceiverId = t2.ReceiverId AND t1.SenderId = t2.SenderId AND t1.Id < t2.Id)
WHERE t2.Id IS NULL
ORDER BY t1.Id ASC
LIMIT 100;
The above query returns the correct data, but it becomes too slow when the test table has more than 150,000 rows. On 150,000 rows the above query needs 7 seconds to complete. I expect the test table to have between 500,000 and 1M rows, and the query needs to return the correct data in less than 3 seconds. If it's not possible to fetch the correct data in less than 3 seconds, then I need it to fetch the data using the fastest query possible.
So, how can the above query be optimized so that it runs faster?
Reasons why this query may be slow:
It's a lot of data. Lots of it may be returned. It returns the last record for each SenderId/ReceiverId combination.
The distribution of the data (many Sender/Receiver combinations, or relatively few of them but with multiple 'versions').
The whole result set must be sorted by MySQL, because you need the first 100 records, sorted by Id.
These make it hard to optimize this query without restructuring the data. A few suggestions to try:
- You could try using NOT EXISTS, although I doubt if it would help.
SELECT SQL_NO_CACHE t1.*
FROM test t1
WHERE NOT EXISTS
(SELECT 'x'
FROM test t2
WHERE t1.ReceiverId = t2.ReceiverId AND t1.SenderId = t2.SenderId AND t1.Id < t2.Id)
ORDER BY t1.Id ASC
LIMIT 100;
- You could try using proper indexes on ReceiverId, SenderId and Id. Experiment with creating a combined index on the three columns. Try two versions, one with Id being the first column, and one with Id being the last.
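For example, the two combined-index variants could be created like this (index names are arbitrary):
ALTER TABLE test ADD INDEX id_first (Id, ReceiverId, SenderId);
ALTER TABLE test ADD INDEX id_last (ReceiverId, SenderId, Id);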
With slight database modifications:
- You could save a combination of SenderId/ReceiverId in a separate table with a LastId pointing to the record you want (a sketch follows below).
- You could save a 'PreviousId' with each record, keeping it NULL for the last record per Sender/Receiver. You only need to query the records where previousId is null.
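A minimal sketch of the first suggestion, with hypothetical names (the ? placeholders stand for the inserted row's SenderId and ReceiverId, supplied by the application or a trigger):
CREATE TABLE test_last (
SenderId INT UNSIGNED NOT NULL,
ReceiverId INT UNSIGNED NOT NULL,
LastId INT UNSIGNED NOT NULL,
PRIMARY KEY (ReceiverId, SenderId)
) ENGINE=InnoDB;
-- Kept current on every insert into test
INSERT INTO test_last (SenderId, ReceiverId, LastId)
VALUES (?, ?, LAST_INSERT_ID())
ON DUPLICATE KEY UPDATE LastId = VALUES(LastId);
-- The report then becomes a plain indexed join instead of a self-join
SELECT t.*
FROM test_last l
JOIN test t ON t.Id = l.LastId
ORDER BY t.Id ASC
LIMIT 100;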

MySQL Row counter in Update statement

The following MySQL statement works fine and returns the row number of each result as row. What I want to do now is set the column pos to the value of "row" using an UPDATE statement, since I don't want to loop over thousands of records with single queries.
Any ideas?
SELECT @row := @row + 1 AS row, u.ID, u.pos
FROM user u, (SELECT @row := 0) r
WHERE u.year<=2010
ORDER BY u.pos ASC LIMIT 0,10000
There is a risk in using user-defined variables:
In a SELECT statement, each select expression is evaluated only when sent to the client. This means that in a HAVING, GROUP BY, or ORDER BY clause, referring to a variable that is assigned a value in the select expression list does not work as expected:
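For reference, the single-statement version the question is aiming at would look roughly like this; it relies on the per-row assignment order of the user variable, which is the kind of behaviour the quote above warns is not guaranteed (and assigning user variables inside statements is deprecated as of MySQL 8.0), so treat it as a sketch only:
SET @row := 0;
UPDATE user
SET pos = (@row := @row + 1)
WHERE year <= 2010
ORDER BY pos ASC
LIMIT 10000;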
A safer method would be:
create table tmp_table
(
pos int(10) unsigned not null auto_increment,
user_id int(10) not null default 0,
primary key (pos)
);
insert into tmp_table
select null, u.ID
from user u
where u.year<=2010
order by YOUR_ORDERING_DECISION
limit 0, 10000;
alter table tmp_table add index (user_id);
update user, tmp_table
set user.pos=tmp_table.pos
where user.id=tmp_table.user_id;
drop table tmp_table;

How to split a single row in to multiple columns in mysql

Simply asking: is there any function available in MySQL to split single-row elements into multiple columns?
I have a table with the fields user_id, user_name, user_location.
Here a user can add multiple locations. I am imploding the locations and storing them in the table as a single row using PHP.
When I show the user records in a grid view, I have a problem with pagination because I display the records by splitting user_locations. So I need to split user_locations (a single row into multiple columns).
Is there any function available in MySQL to split and count the records by the character (%)?
For example, user_location holds US%UK%JAPAN%CANADA.
How can I split this record into 4 columns?
I also need to check the count value (4). Thanks in advance.
First normalize the string, removing empty locations and making sure there's a % at the end:
select replace(concat(user_location,'%'),'%%','%') as str
from YourTable where user_id = 1
Then we can count the number of entries with a trick. Replace '%' with '% ', and count the number of spaces added to the string. For example:
select length(replace(str, '%', '% ')) - length(str)
as LocationCount
from (
select replace(concat(user_location,'%'),'%%','%') as str
from YourTable where user_id = 1
) normalized
Using substring_index, we can add columns for a number of locations:
select length(replace(str, '%', '% ')) - length(str)
as LocationCount
, substring_index(substring_index(str,'%',1),'%',-1) as Loc1
, substring_index(substring_index(str,'%',2),'%',-1) as Loc2
, substring_index(substring_index(str,'%',3),'%',-1) as Loc3
from (
select replace(concat(user_location,'%'),'%%','%') as str
from YourTable where user_id = 1
) normalized
For your example US%UK%JAPAN%CANADA, this prints:
LocationCount  Loc1  Loc2  Loc3
4              US    UK    JAPAN
So you see it can be done, but parsing strings isn't one of SQL's strengths.
The "right thing" would be splitting the locations off to another table and establish a many-to-many relationship between them.
create table users (
id int not null auto_increment primary key,
name varchar(64)
);
create table locations (
id int not null auto_increment primary key,
name varchar(64)
);
create table users_locations (
id int not null auto_increment primary key,
user_id int not null,
location_id int not null,
unique index user_location_unique_together (user_id, location_id)
);
Then, ensure referential integrity either using foreign keys (and InnoDB engine) or triggers.
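With that schema in place, the per-user location list and count come from joins instead of string parsing, roughly like this:
SELECT u.id, u.name,
COUNT(l.id) AS location_count,
GROUP_CONCAT(l.name ORDER BY l.name SEPARATOR ', ') AS locations
FROM users u
LEFT JOIN users_locations ul ON ul.user_id = u.id
LEFT JOIN locations l ON l.id = ul.location_id
GROUP BY u.id, u.name;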
This should do it:
DELIMITER $$
DROP PROCEDURE IF EXISTS `CSV2LST`$$
CREATE DEFINER=`root`@`%` PROCEDURE `CSV2LST`(IN csv_ TEXT)
BEGIN
SET @s=CONCAT('select \"',REPLACE(csv_,',','\" union select \"'),'\";');
PREPARE stmt FROM @s;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END$$
DELIMITER ;
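A quick usage example; note the procedure splits on commas, so the question's %-separated value would need a REPLACE first:
CALL CSV2LST(REPLACE('US%UK%JAPAN%CANADA', '%', ','));
-- returns one row per value: US, UK, JAPAN, CANADA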
You should do this in your client application, not on the database.
When you make an SQL query you must statically specify the columns you want to get; that is, you tell the DB the columns you want in your result set BEFORE executing it. For instance, if you have a datetime stored, you may do something like select month(birthday), year(birthday) from ..., so in this case we split the column birthday into 2 other columns, but which columns we will get is specified in the query.
In your case, you would have to get exactly that US%UK%JAPAN%CANADA string from the database and then split it later in your software, i.e.:
/* get data from database */
/* ... */
$user_location = ... /* extract the field from the resultset */
$user_locations = explode("%", $user_location);
This is a bad design. If you can change it, store the data in 2 tables:
table users: id, name, surname ...
table users_location: user_id (fk), location
users_location would have a foreign key to users through the user_id field.
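A minimal sketch of that layout (InnoDB so the foreign key is actually enforced):
CREATE TABLE users (
id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(64) NOT NULL,
surname VARCHAR(64)
) ENGINE=InnoDB;
CREATE TABLE users_location (
user_id INT UNSIGNED NOT NULL,
location VARCHAR(64) NOT NULL,
PRIMARY KEY (user_id, location),
FOREIGN KEY (user_id) REFERENCES users (id)
) ENGINE=InnoDB;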