I have a table keywords with columns keyword and weight. My goal is to randomly select one keyword, taking its weight (probability) into account. I found two ways to solve this; the second is more elegant (and consumes fewer resources), but I can't get it to run. See for yourself.
The table and records:
CREATE TABLE IF NOT EXISTS `keywords` (
`keyword` varchar(100) COLLATE utf8_bin NOT NULL,
`weight` int(11) NOT NULL,
UNIQUE KEY `keywords` (`keyword`),
KEY `rate` (`weight`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
INSERT INTO `keywords` (`keyword`, `weight`) VALUES
('google', 50),
('microsoft', 20),
('apple', 10),
('yahoo', 5),
('bing', 5),
('xing', 5),
('cool', 5);
Query 1
Consumes more resources; I work with 5k+ records. Source: Why would this MySQL query using rand() return no results about a third of the time?:
SELECT * FROM `keywords` ORDER BY -LOG(1.0 - RAND()) / weight LIMIT 1
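For readers who want to see why Query 1 respects the weights, here is a minimal Python sketch of the same idea: every row draws the key -ln(1 - u) / weight and the smallest key wins. The function name pick_weighted_by_key and the fixed seed are illustrative assumptions, not part of the original query.

```python
import math
import random

def pick_weighted_by_key(weights, rng=random):
    """Return the keyword whose random key -ln(1 - u) / weight is smallest."""
    best_keyword, best_key = None, math.inf
    for keyword, weight in weights.items():
        u = rng.random()                    # uniform in [0, 1)
        key = -math.log(1.0 - u) / weight   # exponential variate with rate = weight
        if key < best_key:
            best_keyword, best_key = keyword, key
    return best_keyword

# Same records as the keywords table above.
weights = {"google": 50, "microsoft": 20, "apple": 10,
           "yahoo": 5, "bing": 5, "xing": 5, "cool": 5}

rng = random.Random(42)                     # fixed seed, only to make the run repeatable
counts = {keyword: 0 for keyword in weights}
for _ in range(10_000):
    counts[pick_weighted_by_key(weights, rng)] += 1
```

Over many draws the share of each keyword should approach weight / SUM(weight). The cost is the same as in the SQL version: one random key per row on every call.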
Query 2
Sums up the weights into @weight_sum. Sets @weight_point to a RAND()-based number within that range. Loops through all records, subtracting weight from @weight_pos and setting @keyword to the current keywords.keyword until @weight_pos < 0; then it keeps that keyword. Source: Random Weighted Choice in T-SQL
SET @keyword = 0;
SET @weight_sum = (SELECT SUM(weight) FROM keywords);
SET @rand = RAND();
SET @weight_point = ROUND(((@weight_sum - 1) * @rand + 1), 0);
SET @weight_pos = @weight_point;
SELECT
keyword,
weight,
@keyword:=CASE
WHEN @weight_pos < 0 THEN @keyword
ELSE keyword
END AS test,
(@weight_pos:=(@weight_pos - weight)) AS curr_weight,
@weight_point,
@keyword,
@weight_pos,
@rand,
@weight_sum
FROM
keywords;
See phpmyadmin results here http://postimg.org/image/stgpd776f/
My Question
How do I get the value in @keyword, or what the test column holds in the end? Adding a SELECT @keyword afterwards doesn't change anything.
OK, I think my question was more or less a basic MySQL question. I achieved what I wanted by wrapping the above SELECT statement in another SELECT that filters the inner result for the row I was looking for. Sorry for bothering you. Here is the query:
SET @keyword = 0;
SET @weight_sum = (SELECT SUM(weight) FROM keywords);
SET @rand = RAND();
SET @weight_point = ROUND(((@weight_sum - 1) * @rand + 1), 0);
SET @weight_pos = @weight_point;
SELECT t.test FROM (
SELECT
keyword,
weight,
@keyword:=CASE
WHEN @weight_pos < 0 THEN @keyword
ELSE keyword
END AS test,
(@weight_pos:=(@weight_pos - weight)) AS curr_weight,
@weight_point,
-- @keyword,
@weight_pos,
@rand,
@weight_sum
FROM
keywords
) AS t
WHERE
t.curr_weight < 0
LIMIT
1;
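The cumulative-weight walk that this query performs can be sketched outside SQL as follows. This is only an illustration under stated assumptions: the function name pick_weighted_cumulative is made up, and the random point is drawn as an integer in [0, total) rather than with the ROUND(...) expression from the question, so that every unit of weight is equally likely and the counter always drops below zero on some row.

```python
import random

def pick_weighted_cumulative(rows, rng=random):
    """rows: list of (keyword, weight) pairs with positive integer weights."""
    total = sum(weight for _, weight in rows)
    point = rng.randrange(total)      # integer in [0, total)
    for keyword, weight in rows:
        point -= weight               # same role as the running counter in the query
        if point < 0:                 # first row that drives the counter below zero
            return keyword
    raise AssertionError("unreachable when all weights are positive")

rows = [("google", 50), ("microsoft", 20), ("apple", 10),
        ("yahoo", 5), ("bing", 5), ("xing", 5), ("cool", 5)]
chosen = pick_weighted_cumulative(rows, random.Random(7))
```

Only one random number is needed per call, which is why this approach is cheaper than sorting every row by a random key.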
I need to count how many characters are equal in two text columns (same size, in the same table).
For example:
RowNum: Template: Answers:
------- --------- --------
1 ABCDEABCDEABCDE ABCDAABCDBABCDC
2 EDAEDAEDAEDAEDA EDBEDBEDBEDBEDB
SELECT SOME_COUNT_FUNCTION (Template, Answers) should return:
RowNum: Result:
------- -------
1 12
2 10
The database is MySQL.
Not exactly MySQL, but here's something that works in SQL Server. Maybe it'll translate over.
DROP TABLE IF EXISTS #tmp
CREATE TABLE #tmp (
[RowNum] INT IDENTITY(1,1) PRIMARY KEY,
[Template] NVARCHAR(20),
[Answer] NVARCHAR(20),
[Result] INT
)
INSERT INTO #tmp
VALUES ('ABCDEABCDEABCDE','ABCDAABCDBABCDC', NULL),
('EDAEDAEDAEDAEDA','EDBEDBEDBEDBEDB', NULL)
--SELECT * FROM #tmp
DECLARE @current_template NVARCHAR(50) -- Variable to hold the current template
, @current_answer NVARCHAR(50) -- Variable to hold the current answer
, @template_char CHAR(1) -- Char for template letter
, @answer_char CHAR(1) -- Char for answer letter
, @word_index INT -- Index (position) within each word
, @match_counter INT -- Match counter for each word
, @max_iter INT = (SELECT TOP 1 RowNum FROM #tmp ORDER BY RowNum DESC) -- Max iterations
, @row_idx INT = (SELECT TOP 1 RowNum FROM #tmp) -- Minimum RowNum as initial row index value.
WHILE (@row_idx <= @max_iter)
BEGIN
SET @match_counter = 0 -- Reset match counter for each row
SET @word_index = 1 -- Reset word index for each row
SET @current_template = (SELECT [Template] FROM #tmp WHERE RowNum = @row_idx)
SET @current_answer = (SELECT [Answer] FROM #tmp WHERE RowNum = @row_idx)
WHILE (@word_index <= LEN(@current_template))
BEGIN
SET @template_char = SUBSTRING(@current_template, @word_index, 1)
SET @answer_char = SUBSTRING(@current_answer, @word_index, 1)
IF (@answer_char = @template_char)
BEGIN
SET @match_counter += 1
END
SET @word_index += 1
END
UPDATE #tmp
SET Result = @match_counter
WHERE RowNum = @row_idx
SET @row_idx += 1
END
Get values from the temp table:
SELECT * FROM #tmp
Output:
RowNum Template Answer Result
1 ABCDEABCDEABCDE ABCDAABCDBABCDC 12
2 EDAEDAEDAEDAEDA EDBEDBEDBEDBEDB 10
If you are running MySQL 8.0, you can use a recursive query to compare the strings character by character:
with recursive chars as (
select rownum, template, answers, 1 idx, 0 res from mytable
union all
select
rownum,
template,
answers,
idx + 1,
res + ( substr(template, idx, 1) = substr(answers, idx, 1) )
from chars
where idx <= least(char_length(template), char_length(answers))
)
select rownum, max(res) result from chars group by rownum order by rownum
In the CTE (the with clause), the anchor (the query before union all) selects the whole table. The recursive member (the query after union all) compares the characters at the current position (idx), increments the result (res) if they match, and advances to the next position until the shorter string is exhausted. The outer query then just aggregates by rownum.
Demo on DB Fiddle:
rownum | result
-----: | -----:
1 | 12
2 | 10
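As a cross-check of the numbers above, the position-by-position comparison can be sketched in a few lines of Python. The function name count_matching_positions is an assumption for illustration. Note that Python compares characters case-sensitively, whereas the result in MySQL depends on the collation of the columns.

```python
def count_matching_positions(template, answers):
    """Count the positions at which both strings hold the same character.

    zip() stops at the shorter string, like least(char_length(...)) in the CTE.
    """
    return sum(1 for t, a in zip(template, answers) if t == a)

rows = [
    (1, "ABCDEABCDEABCDE", "ABCDAABCDBABCDC"),
    (2, "EDAEDAEDAEDAEDA", "EDBEDBEDBEDBEDB"),
]
results = {rownum: count_matching_positions(t, a) for rownum, t, a in rows}
# results == {1: 12, 2: 10}
```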
Please bear in mind that this query will not perform well against a large dataset. Other slightly more efficient solutions exist (typically using a numbers table instead of a recursive CTE), but basically, as commented by Gordon Linoff, you do want to fix your data structure if you need to run such queries. You should store each character in a separate row, along with its rownum and its index in the string. Materialize the proper data structure, and then you won't need to generate it on the fly in each and every query.
I'm having trouble understanding how triggers execute in MySQL and I'm banging my head on the wall, because I can't seem to find out why it doesn't work.
I have the following trigger
CREATE TRIGGER Insert_Products BEFORE INSERT ON `Products`
FOR EACH ROW BEGIN
DECLARE x_ProductID INT;
SET x_ProductID = NEW.`ProductID`;
SET NEW.`PriceExVAT` = (
SELECT
ROUND(p.`Price` * 100 / (100 + v.`VATPercentage`), 2) as priceexvat
FROM
`Products` p
LEFT JOIN
`VAT` v ON p.`VATID` = v.`VATID`
WHERE p.`ProductID` = x_ProductID); -- also tried inserting NEW.`ProductID` directly into this line
END $$
However, it populates my rows with NULL instead of the correct values. Yet running the same expression as a SELECT query returns the correct values, i.e.:
SELECT
ROUND(p.`Price` * 100 / (100 + v.`VATPercentage`), 2) as x_value
FROM
`Products` p
LEFT JOIN
`VAT` v ON p.`VATID` = v.`VATID`
WHERE p.`ProductID` = 1;
I tried putting it in an AFTER INSERT trigger, but that resulted in a different error. What am I not seeing, how should I fix this?
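In a BEFORE INSERT trigger the new row is not in the table yet, so the subquery on `Products` finds no row and returns NULL. You do not need to query the Products table for the values of the row being inserted: apart from the auto-increment id, they are available in NEW, and you can use them directly: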
SET NEW.`PriceExVAT` = (
SELECT ROUND(NEW.`Price` * 100 / (100 + v.`VATPercentage`), 2) as priceexvat
FROM `VAT` v
WHERE NEW.`VATID` = v.`VATID`
)
You can do the same in a BEFORE UPDATE trigger.
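To make the arithmetic of the trigger concrete, here is a small Python sketch of the same formula, Price * 100 / (100 + VATPercentage) rounded to two decimals. It assumes that Price and VATPercentage are exact decimal values (the column types are not shown in the question); the function name price_ex_vat is made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

def price_ex_vat(price, vat_percentage):
    """Price without VAT, rounded to 2 decimals; None if no VAT rate is known."""
    if vat_percentage is None:
        # Mirrors the SQL behaviour: no matching VAT row yields NULL.
        return None
    net = price * 100 / (100 + vat_percentage)
    return net.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

example = price_ex_vat(Decimal("121.00"), Decimal("21"))   # Decimal("100.00")
```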
I tried to write an SQL function that generates an unused unique ID in the range between 1000000 and 4294967295. I need numeric values, so UUID() and the like are not a solution. It doesn't sound that difficult, but for some reason the code below does not work when called within an INSERT statement on a table as the value for the primary key (not auto_increment, of course). The statement is like INSERT INTO table (id, content) VALUES ((SELECT getRandomID(0,0)), 'blabla bla');
(Since default values are not allowed in such functions, I simply pass 0 for each argument and set it to the desired value inside the function.)
Called once and separately from INSERT or Python code, everything is fine. Called several times, something weird happens, and not only the whole process but also the server might hang within REPEAT. It is then not even possible to kill or restart the process; I have to reboot the machine -.-
It also seems to have only a few random values ready for me, since the same values appear again and again after some calls, although I thought that the inner rand() would be a sufficient seed for the outer rand().
Called from Python, the loop starts to hang after some rounds, although the very first one in my tests always produces a useful, new ID and should therefore quit after the first round. Why? Well, the table is empty, so SELECT COUNT(*)... returns 0, which is the signal for leaving the loop, but it doesn't leave.
Any ideas?
I'm running MariaDB 10.something on SLES 12.2. Here is the exported source code:
DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `getRandomID`(`rangeStart` BIGINT UNSIGNED, `rangeEnd` BIGINT UNSIGNED) RETURNS bigint(20) unsigned
READS SQL DATA
BEGIN
DECLARE rnd BIGINT unsigned;
DECLARE i BIGINT unsigned;
IF rangeStart is null OR rangeStart < 1 THEN
SET rangeStart = 1000000;
END IF;
IF rangeEnd is null OR rangeEnd < 1 THEN
SET rangeEnd = 4294967295;
END IF;
SET i = 0;
r: REPEAT
SET rnd = FLOOR(rangeStart + RAND(RAND(FLOOR(1 + rand() * 1000000000))*10) * (rangeEnd - rangeStart));
SELECT COUNT(*) INTO i FROM `table` WHERE `id` = rnd;
UNTIL i = 0 END REPEAT r;
RETURN rnd;
END$$
DELIMITER ;
A slight improvement:
SELECT COUNT(*) INTO i FROM `table` WHERE `id` = rnd;
UNTIL i = 0 END REPEAT r;
-->
UNTIL NOT EXISTS( SELECT 1 FROM `table` WHERE id = rnd ) END REPEAT r;
Don't pass any argument to RAND -- that is for establishing a repeatable sequence of random numbers.
mysql> SELECT RAND(123), RAND(123), RAND(), RAND()\G
*************************** 1. row ***************************
RAND(123): 0.9277428611440052
RAND(123): 0.9277428611440052
RAND(): 0.5645420109522921
RAND(): 0.12561983719991504
1 row in set (0.00 sec)
So simplify to
SET rnd = FLOOR(rangeStart + RAND() * (rangeEnd - rangeStart));
If you want to include rangeEnd in the possible outputs, add 1:
SET rnd = FLOOR(rangeStart + RAND() * (rangeEnd - rangeStart + 1));
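The retry loop with the simplified, unseeded random draw can be sketched in Python like this. It is an illustration only: get_random_id and the used_ids set (standing in for the id column of the table) are assumptions, and randint(a, b) plays the role of FLOOR(rangeStart + RAND() * (rangeEnd - rangeStart + 1)), which includes both ends of the range.

```python
import random

def get_random_id(used_ids, range_start=1_000_000, range_end=4_294_967_295, rng=random):
    """Draw random IDs in [range_start, range_end] until an unused one turns up."""
    while True:
        candidate = rng.randint(range_start, range_end)   # both ends inclusive
        if candidate not in used_ids:                      # same role as NOT EXISTS(...)
            return candidate

used = {1_000_000, 1_000_001}
new_id = get_random_id(used)
```

As in the SQL function, the loop never ends if every ID in the range is already taken, so the range should stay much larger than the number of rows.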
I'd like some help in optimizing the following query:
SELECT DISTINCT TOP (@NumberOfResultsRequested) dbo.FilterRecentSearchesTitles(OriginalSearchTerm) AS SearchTerms
FROM UserSearches
WHERE WebsiteID = @WebsiteID
AND LEN(OriginalSearchTerm) > 20
--AND dbo.FilterRecentSearchesTitles(OriginalSearchTerm) NOT IN (SELECT KeywordUrl FROM PopularSearchesBaseline WHERE WebsiteID = @WebsiteID)
GROUP BY OriginalSearchTerm, GeoID
It runs fine without the line that is commented out. I have an index set on UserSearches.OriginalSearchTerm, WebsiteID, and PopularSearchesBaseline.KeywordUrl, but the query still runs slow with this line in there.
-- UPDATE --
The function used is as follows:
ALTER FUNCTION [dbo].[FilterRecentSearchesTitles]
(
@SearchTerm VARCHAR(512)
)
RETURNS VARCHAR(512)
AS
BEGIN
DECLARE @Ret VARCHAR(512)
SET @Ret = dbo.RegexReplace('[0-9]', '', REPLACE(@SearchTerm, '__s', ''), 1, 1)
SET @Ret = dbo.RegexReplace('\.', '', @Ret, 1, 1)
SET @Ret = dbo.RegexReplace('\s{2,}', ' ', @Ret, 1, 1)
SET @Ret = dbo.RegexReplace('\sv\s', ' ', @Ret, 1, 1)
RETURN(@Ret)
END
Using the Regular Expression Workbench code.
However, as I mentioned - without the line that is currently commented out it runs fine.
Any other suggestions?
I am going to guess that dbo.FilterRecentSearchesTitles(OriginalSearchTerm) is a function. My suggestion would be to see about rewriting it into a table valued function so you can return a table that could be joined on.
Otherwise you are calling that function for each row you are trying to return which is going to cause your problems.
If you cannot rewrite the function, then why not create a stored proc that will only execute it once, similar to this:
SELECT DISTINCT TOP (@NumberOfResultsRequested) dbo.FilterRecentSearchesTitles(OriginalSearchTerm) AS SearchTerms
INTO #temp
FROM UserSearches
WHERE WebsiteID = @WebsiteID
SELECT *
FROM #temp
WHERE SearchTerms NOT IN (SELECT KeywordUrl
FROM PopularSearchesBaseline
WHERE WebsiteID = @WebsiteID)
Then you get your records into a temp table after executing the function once and then you select on the temp table.
I might try to use a persisted computed column in this case:
ALTER TABLE UserSearches ADD FilteredOriginalSearchTerm AS dbo.FilterRecentSearchesTitles(OriginalSearchTerm) PERSISTED
You will probably have to add WITH SCHEMABINDING to your function (and the RegexReplace function) like so:
ALTER FUNCTION [dbo].[FilterRecentSearchesTitles]
(
@SearchTerm VARCHAR(512)
)
RETURNS VARCHAR(512)
WITH SCHEMABINDING -- You will need this so the function is considered deterministic
AS
BEGIN
DECLARE @Ret VARCHAR(512)
SET @Ret = dbo.RegexReplace('[0-9]', '', REPLACE(@SearchTerm, '__s', ''), 1, 1)
SET @Ret = dbo.RegexReplace('\.', '', @Ret, 1, 1)
SET @Ret = dbo.RegexReplace('\s{2,}', ' ', @Ret, 1, 1)
SET @Ret = dbo.RegexReplace('\sv\s', ' ', @Ret, 1, 1)
RETURN(@Ret)
END
This makes your query look like this:
SELECT DISTINCT TOP (@NumberOfResultsRequested) FilteredOriginalSearchTerm AS SearchTerms
FROM UserSearches
WHERE WebsiteID = @WebsiteID
AND LEN(OriginalSearchTerm) > 20
AND FilteredOriginalSearchTerm NOT IN (SELECT KeywordUrl FROM PopularSearchesBaseline WHERE WebsiteID = @WebsiteID)
GROUP BY OriginalSearchTerm, GeoID
Which could potentially be optimized for speed (if necessary) with a join instead of not in, or maybe different indexing (perhaps on the computed column, or some covering indexes). Also, DISTINCT with a GROUP BY is somewhat of a code smell to me, but it could be legit.
Instead of using the function in the SELECT, I modified the INSERT query to include this function. That way, I avoid calling the function for every row when I later want to retrieve the data.
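The idea of filtering once at write time rather than on every read can be sketched in Python as below. Everything here is an assumption for illustration: the regex steps imitate the four RegexReplace calls under the guess that its arguments are (pattern, replacement, input, ...), the meaning of the trailing 1, 1 options is unknown and ignored, and the list of dicts merely stands in for the UserSearches table.

```python
import re

def filter_title(search_term):
    """Rough Python counterpart of FilterRecentSearchesTitles (assumed semantics)."""
    ret = search_term.replace("__s", "")
    ret = re.sub(r"[0-9]", "", ret)      # drop digits
    ret = re.sub(r"\.", "", ret)         # drop dots
    ret = re.sub(r"\s{2,}", " ", ret)    # collapse runs of whitespace
    ret = re.sub(r"\sv\s", " ", ret)     # drop a standalone "v"
    return ret

def insert_search(table, original_term):
    """Store the filtered value next to the original at insert time."""
    table.append({"OriginalSearchTerm": original_term,
                  "FilteredSearchTerm": filter_title(original_term)})

def recent_titles(table, popular):
    """Reads only compare stored values; the filter function is not called again."""
    return sorted({row["FilteredSearchTerm"] for row in table
                   if len(row["OriginalSearchTerm"]) > 20
                   and row["FilteredSearchTerm"] not in popular})

table = []
for term in ["Arsenal v Chelsea 2010 highlights",
             "Liverpool v Everton 2011 goals",
             "short 1"]:
    insert_search(table, term)
titles = recent_titles(table, popular={"Liverpool Everton goals"})
```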
I have about 200 words. I want to see how many times those words occur in a column of a table.
E.g., say we have a table test with a column statements that has two rows:
How are you. It's been long since I met you.
I am fine how are you.
Now I want to find the occurrences of the words "you" and "how". The output should be something like:
word count
you 3
how 2
since "you" has 3 and how has 2 occurrences in the two rows.
How can I do this?
You can do it like this:
Split the phrase and put all items in a different table;
Remove all punctuation;
Make a select using the created table and the words that you want to identify.
The way I would approach this is to write a little user defined function to give me the number of times one string appears in another with some allowances for:
upper and lower case
common punctuation
I would then create a table with all of the words that I wish to search for, i.e. your list of 200. Then use the function to count the number of occurrences of each word in every phrase, put that in an inline view and then sum the results up by search word.
Hence:
User Defined Function
DELIMITER $$
CREATE FUNCTION `get_word_count`(phrase VARCHAR(500),word VARCHAR(255), delimiter VARCHAR(1)) RETURNS int(11)
READS SQL DATA
BEGIN
DECLARE cur_position INT DEFAULT 1 ;
DECLARE remainder TEXT;
DECLARE cur_string VARCHAR(255);
DECLARE delimiter_length TINYINT UNSIGNED;
DECLARE total INT;
DECLARE result DOUBLE DEFAULT 0;
DECLARE string2 VARCHAR(255);
SET remainder = replace(phrase,'!',' ');
SET remainder = replace(remainder,'.',' ');
SET remainder = replace(remainder,',',' ');
SET remainder = replace(remainder,'?',' ');
SET remainder = replace(remainder,':',' ');
SET remainder = replace(remainder,'(',' ');
SET remainder = lower(remainder);
SET string2 = concat(delimiter,trim(word),delimiter);
SET delimiter_length = CHAR_LENGTH(delimiter);
SET cur_position = 1;
WHILE CHAR_LENGTH(remainder) > 0 AND cur_position > 0 DO
SET cur_position = INSTR(remainder, delimiter);
IF cur_position = 0 THEN
SET cur_string = remainder;
ELSE
SET cur_string = concat(delimiter,LEFT(remainder, cur_position - 1),delimiter);
END IF;
IF TRIM(cur_string) != '' THEN
set result = result + (select instr(string2,cur_string) > 0);
END IF;
SET remainder = SUBSTRING(remainder, cur_position + delimiter_length);
END WHILE;
RETURN result;
END$$
DELIMITER ;
You might have to play with this function a little depending on what allowances you need to make for punctuation and case. Hopefully you get the idea here though!
Populate tables
create table search_word
(id int unsigned primary key auto_increment,
word varchar(250) not null
);
insert into search_word (word) values ('you');
insert into search_word (word) values ('how');
insert into search_word (word) values ('to');
insert into search_word (word) values ('too');
insert into search_word (word) values ('the');
insert into search_word (word) values ('and');
insert into search_word (word) values ('world');
insert into search_word (word) values ('hello');
create table phrase_to_search
(id int unsigned primary key auto_increment,
phrase varchar(500) not null
);
insert into phrase_to_search (phrase) values ("How are you. It's been long since I met you");
insert into phrase_to_search (phrase) values ("I am fine how are you?");
insert into phrase_to_search (phrase) values ("Oh. Not bad. All is ok with the world, I think");
insert into phrase_to_search (phrase) values ("I think so too!");
insert into phrase_to_search (phrase) values ("You know what? I think so too!");
Run Query
select word,sum(word_count) as total_word_count
from
(
select phrase,word,get_word_count(phrase,word," ") as word_count
from search_word
join phrase_to_search
) t
group by word
order by total_word_count desc;
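To check what such a query should return, the same counting can be sketched in Python. This is only an illustration: count_words is a made-up name, and the tokenizer (lowercase letters and apostrophes) is one possible choice for handling case and punctuation.

```python
import re
from collections import Counter

def count_words(phrases, search_words):
    """Count every occurrence of each search word across all phrases."""
    wanted = {word.lower() for word in search_words}
    counts = Counter({word: 0 for word in wanted})
    for phrase in phrases:
        # Lowercase first, then split on anything that is not a letter or apostrophe.
        for token in re.findall(r"[a-z']+", phrase.lower()):
            if token in wanted:
                counts[token] += 1
    return counts

phrases = ["How are you. It's been long since I met you.",
           "I am fine how are you."]
totals = count_words(phrases, ["you", "how"])
# totals["you"] == 3 and totals["how"] == 2
```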
Here is a solution:
SELECT SUM(total_count) as total, value
FROM (
SELECT count(*) AS total_count, REPLACE(REPLACE(REPLACE(x.value,'?',''),'.',''),'!','') as value
FROM (
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) value
FROM table_name t CROSS JOIN
(
SELECT a.N + b.N * 10 + 1 n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
ORDER BY n
) n
WHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))
ORDER BY value
) AS x
GROUP BY x.value
) AS y
GROUP BY value
Here is the full working fiddle: http://sqlfiddle.com/#!2/17481a/1
First we run a query to extract all words, as explained here by @peterm (follow his instructions if you want to customize the total number of words processed). Then we turn that into a subquery and COUNT and GROUP BY the value of each word. Finally, another query on top of that groups again after a REPLACE strips the accompanying punctuation, so that variants such as hello and hello! are counted together.
Below is a simple solution for the case where you only need to check a few specific words rather than build complete statistics. Note that each query counts the rows containing the string, not the total number of occurrences, and that LIKE '%how%' also matches longer words such as 'somehow':
SELECT COUNT(*) FROM `words` WHERE `row1` LIKE '%how%';
SELECT COUNT(*) FROM `words` WHERE `row1` LIKE '%you%';