Customized cyclic number design in database - mysql

I need a database design (MySQL 8.0+) that supports a cyclic number series from 1 to a given maximum. For example, with a maximum of 3, successive queries should return 1, 2, 3, 1, 2, 3, ... in turn. My version works, but I am looking for a better, possibly more native, approach. Many thanks.
My scripts are below.
CREATE TABLE IF NOT EXISTS `cyclic_series_number` (
`category` VARCHAR(100) NOT NULL,
`sn` int NOT NULL,
`max` int NOT NULL,
PRIMARY KEY (`category`)
);
Next, insert two records. The first one is the record used for testing.
REPLACE INTO `cyclic_series_number` (`category`, `sn`, `max`)
VALUES ('testing', 1, 3), ('ticket', 1, 999);
SELECT * FROM `cyclic_series_number`;
+--------------------------+
| cyclic_series_number |
+---+-----------+----+-----+
| # | category | sn | max |
+---+-----------+----+-----+
| 1 | 'testing' | 1 | 3 |
+---+-----------+----+-----+
| 2 | 'ticket' | 1 | 999 |
+---+-----------+----+-----+
Finally, a stored procedure.
The idea is to increment the number (sn = sn + 1), wrap back to 1 once sn has reached max, and return the new value.
All of this logic runs in a single UPDATE statement.
DROP PROCEDURE IF EXISTS `get_new_sn`;
DELIMITER //
CREATE PROCEDURE get_new_sn(IN input_category varchar(100))
BEGIN
SET @latest_sn = -1;
UPDATE `cyclic_series_number`
SET `sn` = (@latest_sn := case `sn` when `max` then 1 else `sn` + 1 end)
WHERE `category` = input_category;
SELECT @latest_sn;
END //
DELIMITER ;
The test results show that the stored procedure works.
CALL get_new_sn('testing'); -- 2
CALL get_new_sn('testing'); -- 3
CALL get_new_sn('testing'); -- 1
CALL get_new_sn('testing'); -- 2
CALL get_new_sn('testing'); -- 3
CALL get_new_sn('testing'); -- 1
-- ...
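The wrap-around rule used by the procedure can be checked outside the database. The following is a minimal Python sketch, assuming an in-memory dict stands in for the cyclic_series_number table; the function name next_sn is hypothetical, and concurrency (which the single UPDATE takes care of in MySQL) is ignored here.

```python
# In-memory stand-in for the cyclic_series_number table: category -> [sn, max].
series = {"testing": [1, 3], "ticket": [1, 999]}

def next_sn(category):
    """Mirror CASE sn WHEN max THEN 1 ELSE sn + 1 END and return the new value."""
    sn, max_sn = series[category]
    new_sn = 1 if sn == max_sn else sn + 1
    series[category][0] = new_sn
    return new_sn

results = [next_sn("testing") for _ in range(6)]
print(results)  # [2, 3, 1, 2, 3, 1]
```

The printed sequence matches the six CALL results listed above, and the 'ticket' row is left untouched.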
References
StackOverflow mysql-how-to-set-a-local-variable-in-an-update-statement-syntax

UPDATE sourcetable
JOIN ( SELECT id, 1 + (ROW_NUMBER() OVER (ORDER BY id) - 1) % 99 AS num
FROM sourcetable ) subquery
ON sourcetable.id = subquery.id
SET sourcetable.num = subquery.num;
where 99 is the upper limit.

Keeping your stored procedure,
change the line that starts:
SET sn = (@latest_sn := case sn when ... .. ...
to something like:
SET sn = (sn + 1) % max;
The modulo operator returns the remainder after division, so as long as sn + 1 is less than max, the remainder is sn + 1. Once sn + 1 = max, the remainder is 0 and the series starts over. This also means that the series runs from 0 to max - 1, so max needs to be 1 higher than the highest allowed value: if sn can be 99 but not 100, then max = 100.
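To compare update rules, here is a small Python sketch (the helper name cycle is made up for illustration). It suggests that (sn + 1) % max cycles through 0 to max - 1, while sn % max + 1 is one possible variant that cycles through 1 to max, which is the series the question asks for.

```python
def cycle(step, start, count):
    """Apply step repeatedly, starting from start, and collect the values."""
    values, sn = [], start
    for _ in range(count):
        sn = step(sn)
        values.append(sn)
    return values

MAX = 3
zero_based = cycle(lambda sn: (sn + 1) % MAX, 1, 6)  # rule from the answer
one_based = cycle(lambda sn: sn % MAX + 1, 1, 6)     # variant that never yields 0

print(zero_based)  # [2, 0, 1, 2, 0, 1]
print(one_based)   # [2, 3, 1, 2, 3, 1]
```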

Related

MySQL - copy or update rows with a change within one table

I have a database table like this one:
group | detailsID | price
EK | 1 | 1.40
EK | 2 | 1.50
EK | 3 | 1.60
H | 1 | 2.40
H | 2 | 2.50
Now I want to copy the data from group "EK" to group "H": the prices for detailsID 1 and 2 must be updated, and the entry for detailsID 3 must be inserted for group "H".
How can I do that with one or two MySQL queries?
Thanks!
We can try doing an INSERT INTO ... SELECT with ON DUPLICATE KEY UPDATE:
INSERT INTO yourTable (`group`, detailsID, price)
SELECT 'H', detailsID, price
FROM yourTable t
WHERE `group` = 'EK'
ON DUPLICATE KEY UPDATE price = t.price;
But this assumes that a unique key exists on (`group`, detailsID). Without such a key, this approach will not work.
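The effect of INSERT ... ON DUPLICATE KEY UPDATE can be pictured with a dictionary keyed on the unique key. This is only a rough Python sketch, assuming (group, detailsID) is unique and using the sample rows from the question:

```python
# Rows keyed by the assumed unique key (group, detailsID) -> price.
table = {
    ("EK", 1): 1.40, ("EK", 2): 1.50, ("EK", 3): 1.60,
    ("H", 1): 2.40, ("H", 2): 2.50,
}

# Copy every EK row to H: existing keys are overwritten (the UPDATE branch),
# missing keys are created (the INSERT branch).
for (group, details_id), price in list(table.items()):
    if group == "EK":
        table[("H", details_id)] = price

h_rows = {key[1]: price for key, price in table.items() if key[0] == "H"}
print(h_rows)  # {1: 1.4, 2: 1.5, 3: 1.6}
```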
As an alternative, I might do this in two steps. First, remove the H group records, then insert the updated H records you expect.
DELETE
FROM yourTable
WHERE `group` = 'H';
INSERT INTO yourTable (`group`, detailsID, price)
SELECT 'H', detailsID, price
FROM yourTable
WHERE `group` = 'EK';
I use the above approach because a single update can't handle your requirement, since new records also need to be inserted.
Note that you should avoid naming your columns and tables using reserved MySQL keywords such as GROUP.
You can also try the following, implemented as a stored procedure. It is simple and not difficult to understand. You may need to adjust the data types and optimize the code to suit your requirements.
DELIMITER $$;
DROP PROCEDURE IF EXISTS update_H $$;
CREATE PROCEDURE update_H()
BEGIN
DECLARE finished INTEGER DEFAULT 0;
DECLARE `group_col` varchar(255) DEFAULT "";
DECLARE `detaildid_col` varchar(255) DEFAULT "";
DECLARE `price_col` varchar(255) DEFAULT "";
DECLARE H_FOUND INTEGER DEFAULT 0;
DECLARE pull_data CURSOR FOR select `group`, `detaildid`, `price` from test.newtab WHERE `group` = 'EK';
DECLARE CONTINUE HANDLER FOR NOT FOUND SET finished = 1;
OPEN pull_data;
traverse_data: LOOP
FETCH pull_data INTO group_col, detaildid_col, price_col;
IF finished = 1 THEN
LEAVE traverse_data;
END IF;
SET H_FOUND = (SELECT count(*) from test.newtab where `group` = 'H' AND `detaildid` = detaildid_col);
IF ( H_FOUND = 1 ) THEN
UPDATE test.newtab SET `price` = price_col where `group` = 'H' AND `detaildid` = detaildid_col;
ELSE
INSERT INTO test.newtab (`group`, `detaildid`, `price`) VALUES ('H', detaildid_col, price_col);
END IF;
END LOOP traverse_data;
CLOSE pull_data;
END $$;
DELIMITER ;
You can call this procedure by executing, call update_H();

MySQL duplicate data removal with loop

I have a table called Positions which has data like this:
Id PositionId
1 'a'
2 'a '
3 'b '
4 'b'
Some of them have trailing spaces, so my idea is to remove those duplicates. This is not the actual table, just an example; the real table has much more data.
So I created a procedure that iterates over the PositionIds, compares them, and removes one of them if they match once trimmed:
CREATE PROCEDURE remove_double_positions()
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE current VARCHAR(255);
DECLARE previous VARCHAR(255) DEFAULT NULL;
DECLARE positionCur CURSOR FOR SELECT PositionId FROM Positions ORDER BY PositionId;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN positionCur;
clean_duplicates: LOOP
FETCH positionCur INTO current;
IF done THEN
LEAVE clean_duplicates;
END IF;
IF previous LIKE current THEN
DELETE FROM Positions WHERE PositionId = current;
END IF;
SET previous = current;
END LOOP clean_duplicates;
CLOSE positionCur;
END
For some reason it reports that 2 rows were affected but actually deletes all 4 of them, and I don't know why. Could you help me?
From the manual (https://dev.mysql.com/doc/refman/5.7/en/string-comparison-functions.html#operator_like), under the LIKE operator: "Per the SQL standard, LIKE performs matching on a per-character basis, thus it can produce results different from the = comparison operator: ... In particular, trailing spaces are significant, which is not true for CHAR or VARCHAR comparisons performed with the = operator:"
mysql> SELECT 'a' = 'a ', 'a' LIKE 'a ';
+------------+---------------+
| 'a' = 'a ' | 'a' LIKE 'a ' |
+------------+---------------+
| 1 | 0 |
+------------+---------------+
1 row in set (0.00 sec)
This is true when = or like is used in where or case.
Your procedure would work as desired if you amended the delete bit to
IF trim(previous) = trim(current) THEN
DELETE FROM Positions WHERE PositionId like current;
END IF;
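The difference between the two comparisons can be emulated in a few lines. This is a simplified Python sketch, assuming a PAD SPACE collation (the MySQL 8.0 default collation utf8mb4_0900_ai_ci is NO PAD, so = behaves differently there) and a LIKE pattern without wildcards; both helper names are made up, and case-insensitivity is ignored.

```python
def eq_pad_space(a, b):
    """Emulate '=' under a PAD SPACE collation: trailing spaces are ignored."""
    return a.rstrip(" ") == b.rstrip(" ")

def like_no_wildcards(value, pattern):
    """Emulate LIKE for a pattern without % or _: every character counts."""
    return value == pattern

print(eq_pad_space("a", "a "))       # True
print(like_no_wildcards("a", "a "))  # False
```

This is why a DELETE ... WHERE PositionId = current can remove both 'a' and 'a ', while LIKE only removes the exact value.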
Here is another solution without a cursor or a procedure. I've checked it on Oracle. Hope it helps.
DELETE FROM positions
WHERE id IN ( SELECT t1.id
FROM positions t1,
positions t2
WHERE t1.positionId = TRIM(t2.positionId)
AND t1.positionId != t2.positionId
);
UPDATE
Some crazy things are going on with MySQL: there is a problem with a blank at the end of a string, and there is error 1093.
Here is my solution, checked with MySQL 5.5.9:
CREATE TABLE positions (
id INT NOT NULL,
positionid VARCHAR(2) NOT NULL
);
INSERT INTO positions VALUES
( 1, 'a'),
( 2, 'a '),
( 3, 'b'),
( 4, 'b ');
DELETE FROM positions
WHERE id IN ( SELECT t3.id FROM
(SELECT t2.id
FROM positions t1,
positions t2
WHERE t1.positionid = t2.positionid
AND LENGTH(t1.positionid) = 1
AND length(t2.positionid) = 2
) t3
);
mysql> SELECT * from positions;
+----+------------+
| id | positionid |
+----+------------+
| 1 | a |
| 3 | b |
+----+------------+
2 rows in set (0.00 sec)
mysql>
Wrapping the subquery in a second, derived SELECT inside the DELETE avoids error 1093.
Hope this helps.

How to compare two comma separate fields and get the count in MySQL

Hi all, I have a MySQL table that has a field of comma-separated values:
id res
=============================
1 hh_2,hh_5,hh_6
------------------------------
2 hh_3,hh_5,hh_4
------------------------------
3 hh_6,hh_8,hh_7
------------------------------
4 hh_2,hh_7,hh_4
------------------------------
Please see the example above. I need to compare each row's 'res' values with the 'res' values of the other rows and display the count of matches. Please help me get the count.
For example,
in the first row, 'hh_2' also exists in the fourth row, so the count is 2; likewise, every value needs to be compared across all rows.
I have written a function that works for me, but the table is very big. It has millions of records, so performance is a problem. Checking one record against 50000 records takes 25 seconds. If my input is 60 rows, it takes one hour. Please help me optimize this.
CREATE FUNCTION `combine_two_field`(s1 CHAR(96), s3 TEXT) RETURNS int(11)
BEGIN
DECLARE ndx INT DEFAULT 0;
DECLARE icount INT DEFAULT 0;
DECLARE head1 char(10);
DECLARE head2 char(10);
DECLARE head3 char(10);
WHILE ndx <= LENGTH(s1) DO
SET head1 = SUBSTRING_INDEX(s3, ',', 1);
SET s3 = SUBSTRING(s3, LENGTH(head1) + 1 + @iSeparLen);
SET head2 = SUBSTRING_INDEX(s1, ',', 1);
SET s1 = SUBSTRING(s1, LENGTH(head2) + 1 + @iSeparLen);
IF (head1 = head2) THEN
SET icount = icount + 1;
END IF;
SET ndx = ndx + 1;
END WHILE;
RETURN icount;
END
The table is also very large, so I want to reduce the fetch time as well.
UPDATE QUERY:
DROP PROCEDURE IF EXISTS `pcompare7` $$
CREATE DEFINER=`root`@`localhost` PROCEDURE `pcompare7`(IN in_analysis_id INT(11))
BEGIN
drop table if exists `tmp_in_results`;
CREATE TEMPORARY TABLE `tmp_in_results` (
`t_id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
`r_id` bigint(11) NOT NULL,
`r_res` char(11) NOT NULL,
PRIMARY KEY (`t_id`),
KEY r_res (r_res)
)
ENGINE = InnoDB;
SELECT splite_snp(r_snp,id,ruid) FROM results WHERE technical_status = 1 and critical_status = 1 and autosomal_status = 1 and gender_status != "NO CALL" and analys_id = in_analysis_id;
-- SELECT * FROM tmp_in_results;
-- COmpare Functionality
SELECT a.t_id, b.id, SUM(IF(FIND_IN_SET(a.r_res, b.r_snp), 1, 0)) FROM tmp_in_results a CROSS JOIN results b GROUP BY a.t_id, b.id;
END $$
Function that fills the temp table:
DROP FUNCTION IF EXISTS `splite_snp` $$
CREATE DEFINER=`root`@`localhost` FUNCTION `splite_snp`(s1 TEXT, in_id bigint(96), ruid char(11)) RETURNS tinyint(1)
BEGIN
DECLARE ndx INT DEFAULT 0;
DECLARE icount INT DEFAULT 0;
DECLARE head1 TEXT;
DECLARE head2 TEXT;
DECLARE intpos1 char(10);
DECLARE intpos2 char(10);
DECLARE Separ char(3) DEFAULT ',';
DECLARE iSeparLen INT;
SET @iSeparLen = LENGTH( Separ );
WHILE s1 != '' DO
SET intpos1 = SUBSTRING_INDEX(s1, ',', 1);
SET s1 = SUBSTRING(s1, LENGTH(intpos1) + 1 + @iSeparLen);
INSERT INTO tmp_in_results(r_id,r_res) VALUES(in_id,intpos1);
END WHILE;
RETURN TRUE;
END $$
New table structure
pc_input
id in_res in_id
=============================
1 hh_2 1000
------------------------------
2 hh_3 1000
------------------------------
3 hh_6 1001
------------------------------
4 hh_2 1001
------------------------------
res_snp
id r_res r_id
=============================
1 hh_2 999
------------------------------
2 hh_3 999
------------------------------
3 hh_9 999
------------------------------
4 hh_2 998
------------------------------
5 hh_6 998
------------------------------
6 hh_9 998
------------------------------
Result:
in_id r_id matches_count
=============================
1000 999 2 (hh_2,hh_3)
------------------------------
1000 998 1 (hh_2)
------------------------------
1001 999 1 (hh_2)
------------------------------
1001 998 2 (hh_2,hh_6)
------------------------------
I have added separate indexes on both tables: on in_res and in_id, and on r_res and r_id.
QUERY:
SELECT b.r_id,count(*) FROM pc_input AS a INNER JOIN results_snps AS b ON (b.r_snp = a.in_snp) group by a.in_id,b.r_id;
But the MySQL server froze. Could you please suggest another approach or a way to optimize my query?
EXPLAIN TABLE: res_snp
Field Type Null Key Default Extra
id bigint(11) NO PRI NULL auto_increment
r_snp varchar(50) NO MUL NULL
r_id bigint(11) NO MUL NULL
EXPLAIN TABLE: pc_input
Field Type Null Key Default Extra
id bigint(11) NO PRI NULL auto_increment
in_snp varchar(55) NO MUL NULL
in_id bigint(11) NO MUL NULL
Explain Query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE a ALL in_snp NULL NULL NULL 192 Using temporary; Using filesort
1 SIMPLE b ref r_snp r_snp 52 rutgers22042014.a.in_snp 2861 Using where
This is possible, but nasty. A properly normalised database would be far easier, but sometimes you have to work with an existing database.
Something like this should do it (not tested). It uses a couple of subqueries to generate the numbers from 0 to 9, combined to allow a range from 0 to 99. This range is then used with SUBSTRING_INDEX to split the string up, along with DISTINCT to eliminate the duplicates that this would otherwise generate (I assume there should be no duplicates on any line; if there are, they can be removed, but it gets more complicated). The result is then used as a subquery to do the counts.
SELECT aRes, COUNT(*)
FROM
(
SELECT DISTINCT sometable.id, SUBSTRING_INDEX(SUBSTRING_INDEX(sometable.res, ',', 1 + units.i + tens.i * 10), ',', -1) AS aRes
FROM sometable
CROSS JOIN (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
) Sub1
GROUP BY aRes
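To see why the nested SUBSTRING_INDEX calls pick out one item and why DISTINCT is needed, here is a Python sketch. The function substring_index is only an approximation of the MySQL function for a non-empty delimiter, and nth_item is a made-up helper name.

```python
def substring_index(s, delim, count):
    """Approximate MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""

def nth_item(s, n):
    """Return the n-th comma-separated item (1-based), as the nested calls do."""
    return substring_index(substring_index(s, ",", n), ",", -1)

row = "hh_2,hh_5,hh_6"
items = [nth_item(row, n) for n in range(1, 6)]
print(items)               # ['hh_2', 'hh_5', 'hh_6', 'hh_6', 'hh_6']
print(sorted(set(items)))  # ['hh_2', 'hh_5', 'hh_6']
```

Once n exceeds the number of items, the last item is returned again and again, which is the duplication that DISTINCT removes in the query.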
EDIT - now tested:-
http://www.sqlfiddle.com/#!2/0ef59/4
EDIT - Possible solution. Hopefully will be acceptably quick.
First extract your input rows into a temp table:-
CREATE TEMPORARY TABLE tmp_record
(
unique_id INT NOT NULL AUTO_INCREMENT,
id INT,
res varchar(25),
PRIMARY KEY (unique_id),
KEY `res` (`res`)
);
Load the above up with your test data
INSERT INTO tmp_record (unique_id, id, res)
VALUES
(1, 1, 'hh_2'),
(2, 1, 'hh_5'),
(3, 1, 'hh_6'),
(4, 2, 'hh_3'),
(5, 2, 'hh_5'),
(6, 2, 'hh_4');
Then you can do a join as follows.
SELECT a.id, b.id, SUM(IF(FIND_IN_SET(a.res, b.res), 1, 0))
FROM tmp_record a
CROSS JOIN sometable b
GROUP BY a.id, b.id
This joins every input row with every row of your main table and checks whether the individual input res is in the comma-separated list. If it is, the IF returns 1, otherwise 0. Those values are then summed, grouped by the two ids.
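The same counting logic can be sketched in Python to check what the join should return. This assumes the six rows loaded into tmp_record above and the four sample rows from the question as the main table; it illustrates the logic only and says nothing about MySQL performance.

```python
from collections import defaultdict

# (id, res) pairs as loaded into tmp_record in the example above.
tmp_record = [(1, "hh_2"), (1, "hh_5"), (1, "hh_6"),
              (2, "hh_3"), (2, "hh_5"), (2, "hh_4")]
# id -> comma-separated res, as in the question's table.
sometable = {1: "hh_2,hh_5,hh_6", 2: "hh_3,hh_5,hh_4",
             3: "hh_6,hh_8,hh_7", 4: "hh_2,hh_7,hh_4"}

counts = defaultdict(int)
for a_id, a_res in tmp_record:            # CROSS JOIN: every pair of rows
    for b_id, b_res in sometable.items():
        # FIND_IN_SET(a.res, b.res) is non-zero when a.res is one of the items.
        counts[(a_id, b_id)] += 1 if a_res in b_res.split(",") else 0

print(counts[(1, 1)], counts[(1, 4)], counts[(2, 3)])  # 3 1 0
```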
Not tested but hopefully this should work. I am unsure on performance (which might be slow as you are dealing with a LOT of potential records).
Note that temp tables only last for the length of time the connection to the database exists. If you need to do this over several scripts then you will probably need to create a normal table (and remember to drop it when you have finished with it)

How to query a MySql table to display the root and its subchild.

UserID UserName ParentID TopID
1 abc Null Null
2 edf 1 1
3 gef 1 1
4 huj 3 1
5 jdi 4 1
6 das 2 1
7 new Null Null
8 gka 7 7
TopID and ParentID both refer to UserID.
I want to get a user record together with its child and sub-child records. Here UserID 1 is the root and its children are UserID 2 and UserID 3. So if the user ID is 1, I have to display all the records from UserID 1 to UserID 6, since they are all children and sub-children of the root. Similarly, for UserID 3 I have to display UserID 3, its child UserID 4, and UserID 4's child UserID 5.
If the user ID is 3,
the output should be:
Userid Username
3 gef
4 huj
5 jdi
I will know the UserID and the TopID, so how can I write the query to achieve the above result?
SELECT UserID, UserName FROM tbl_User WHERE ParentID=3 OR UserID=3 And TopID=1;
With the above query I am able to display UserID 3 and UserID 4, but not UserID 5. I am stuck here and need help. Thanks.
It is technically possible to do recursive hierarchical queries in MySQL using stored procedures.
Here is one adapted to your scenario:
CREATE TABLE `user` (
`UserID` int(16) unsigned NOT NULL,
`UserName` varchar(32),
`ParentID` int(16) DEFAULT NULL,
`TopID` int(16) DEFAULT NULL,
PRIMARY KEY (`UserID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO user VALUES (1, 'abc', NULL, NULL), (2, 'edf', 1, 1), (3, 'gef', 1, 1),
(4, 'huj', 3, 1), (5, 'jdi', 4, 1), (6, 'das', 2, 1), (7, 'new', NULL, NULL),
(8, 'gka', 7, 7);
DELIMITER $$
DROP PROCEDURE IF EXISTS `Hierarchy` $$
CREATE PROCEDURE `Hierarchy` (IN GivenID INT, IN initial INT)
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE next_id INT;
-- CURSOR TO LOOP THROUGH RESULTS --
DECLARE cur1 CURSOR FOR SELECT UserID FROM user WHERE ParentID = GivenID;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
-- CREATE A TEMPORARY TABLE TO HOLD RESULTS --
IF initial=1 THEN
-- MAKE SURE TABLE DOESN'T CONTAIN OUTDATED INFO IF IT EXISTS (USUALLY ON ERROR) --
DROP TABLE IF EXISTS OUT_TEMP;
CREATE TEMPORARY TABLE OUT_TEMP (userID int, UserName varchar(32));
END IF;
-- ADD OURSELF TO THE TEMPORARY TABLE --
INSERT INTO OUT_TEMP SELECT UserID, UserName FROM user WHERE UserID = GivenID;
-- AND LOOP THROUGH THE CURSOR --
OPEN cur1;
read_loop: LOOP
FETCH cur1 INTO next_id;
-- NO ROWS FOUND, LEAVE LOOP --
IF done THEN
LEAVE read_loop;
END IF;
-- NEXT ROUND --
CALL Hierarchy(next_id, 0);
END LOOP;
CLOSE cur1;
-- THIS IS THE INITIAL CALL, LET'S GET THE RESULTS --
IF initial=1 THEN
SELECT * FROM OUT_TEMP;
-- CLEAN UP AFTER OURSELVES --
DROP TABLE OUT_TEMP;
END IF;
END $$
DELIMITER ;
CALL Hierarchy(3,1);
+--------+----------+
| userID | UserName |
+--------+----------+
| 3 | gef |
| 4 | huj |
| 5 | jdi |
+--------+----------+
3 rows in set (0.07 sec)
Query OK, 0 rows affected (0.07 sec)
CALL Hierarchy(1,1);
+--------+----------+
| userID | UserName |
+--------+----------+
| 1 | abc |
| 2 | edf |
| 6 | das |
| 3 | gef |
| 4 | huj |
| 5 | jdi |
+--------+----------+
6 rows in set (0.10 sec)
Query OK, 0 rows affected (0.10 sec)
Time to point out some caveats:
Since this is recursively calling a stored procedure, you need to increase the size of max_sp_recursion_depth, which has a max value of 255 (defaults to 0).
My results on a non-busy server with the limited test data (10 tuples of the user table) took 0.07-0.10 seconds to complete. The performance is such that it might be best to put the recursion in your application layer.
I didn't take advantage of your TopID column, so there might be a logic flaw. But the two test-cases gave me the expected results.
Disclaimer: This example was just to show that it can be done in MySQL, not that I endorse it in any way. Stored procedures, temporary tables and cursors are perhaps not the best way to solve this problem.
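If the recursion is moved to the application layer, as suggested in the caveats, it could look roughly like the following Python sketch. It assumes the whole user table has already been fetched into a dict; the function name hierarchy simply mirrors the procedure name.

```python
# UserID -> (UserName, ParentID), copied from the sample data.
users = {1: ("abc", None), 2: ("edf", 1), 3: ("gef", 1), 4: ("huj", 3),
         5: ("jdi", 4), 6: ("das", 2), 7: ("new", None), 8: ("gka", 7)}

def hierarchy(given_id):
    """Return given_id followed by all of its descendants, depth-first."""
    result = [given_id]
    for user_id, (_, parent_id) in sorted(users.items()):
        if parent_id == given_id:
            result.extend(hierarchy(user_id))
    return result

print(hierarchy(3))  # [3, 4, 5]
print(hierarchy(1))  # [1, 2, 6, 3, 4, 5]
```

The order of the ids matches the two CALL outputs shown above.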
This is not a very clean implementation, but since you only need the children and sub-children, either of these might work:
Query1:
SELECT UserID, UserName
FROM tbl_user
WHERE ParentID = 3 OR UserID = 3
UNION
SELECT UserID, UserName
FROM tbl_user
WHERE ParentID IN (SELECT UserID
FROM tbl_user
WHERE ParentID = 3);
Query 2:
SELECT UserID, UserName
FROM tbl_user
WHERE UserID = 3
OR ParentID = 3
OR ParentID IN (SELECT UserID
FROM tbl_user
WHERE ParentID = 3);
EDIT 1: Alternatively, you may modify your table structure to make it more convenient to query all children of a particular category. Please follow this link to read more on storing hierarchical data in MySQL.
Also, you may think on storing your data hierarchically in a tree-like fashion that is very well explained in this article.
Please note that each method has its trade-offs with respect to retrieving desired results vs adding/removing categories but I'm sure you'll enjoy the reading.
This is one of the best articles I've seen for explaining the "Modified Preorder Tree Traversal" method of storing tree-like data in a SQL-style database.
http://www.sitepoint.com/hierarchical-data-database/
The MPTT stuff starts on page 2.
Essentially, you store a "Left" and a "Right" value for each node in the tree, in such a manner that to get all children of ParentA, you get the Left and Right for ParentA, then
SELECT *
FROM TableName
WHERE Left > ParentLeft
AND Right < ParentRight
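As an illustration of how the Left and Right values relate to the tree, here is a Python sketch that numbers the tree rooted at UserID 1 from the earlier question and then applies the same range condition. The names number_tree and bounds are made up, and a real table would store these values in columns.

```python
# parent -> children, taken from the user table in the question (tree rooted at 1).
children = {1: [2, 3], 2: [6], 3: [4], 4: [5], 5: [], 6: []}

def number_tree(root):
    """Assign (left, right) to each node with a depth-first walk."""
    bounds, counter = {}, 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter
        for child in children[node]:
            visit(child)
        counter += 1
        bounds[node] = (left, counter)

    visit(root)
    return bounds

bounds = number_tree(1)
parent_left, parent_right = bounds[3]
descendants = sorted(node for node, (left, right) in bounds.items()
                     if left > parent_left and right < parent_right)
print(bounds[3], descendants)  # (6, 11) [4, 5]
```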
To get all of the parents of the selected child (user_id = 3 in this example):
SELECT
@parent_id AS _user_id,
user_name,
(
SELECT @parent_id := parent_id
FROM users
WHERE user_id = _user_id
) AS parent
FROM (
-- initialize variables
SELECT
@parent_id := 3
) vars,
users u
WHERE @parent_id <> 0;
To get all of the children of a selected user_id
SELECT ui.user_id AS 'user_id', ui.user_name AS 'user_name', parent_id
FROM
(
SELECT connect_by_parent(user_id) AS user_id
FROM (
SELECT
@start_user := 3,
@user_id := @start_user
) vars, users
WHERE @user_id IS NOT NULL
) uo
JOIN users ui ON ui.user_id = uo.user_id
This requires the following function
CREATE FUNCTION connect_by_parent(value INT) RETURNS INT
NOT DETERMINISTIC
READS SQL DATA
BEGIN
DECLARE _user_id INT;
DECLARE _parent_id INT;
DECLARE _next INT;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET @user_id = NULL;
SET _parent_id = @user_id;
SET _user_id = -1;
IF @user_id IS NULL THEN
RETURN NULL;
END IF;
LOOP
SELECT MIN(user_id)
INTO @user_id
FROM users
WHERE parent_id = _parent_id
AND user_id > _user_id;
IF @user_id IS NOT NULL OR _parent_id = @start_user THEN
RETURN @user_id;
END IF;
SELECT user_id, parent_id
INTO _user_id, _parent_id
FROM users
WHERE user_id = _parent_id;
END LOOP;
END
This example heavily uses session variables which many sql users may be unfamiliar with, so here's a link that may provide some insight: session variables

MySql: Count amount of times the words occur in a column

For instance, if I have data in a column like this
data
I love book
I love apple
I love book
I hate apple
I hate apple
How can I get result like this
I = 5
love = 3
hate = 2
book = 2
apple = 3
Can we achieve this with MySQL?
Here is a solution only using a query:
SELECT SUM(total_count) as total, value
FROM (
SELECT count(*) AS total_count, REPLACE(REPLACE(REPLACE(x.value,'?',''),'.',''),'!','') as value
FROM (
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) value
FROM table_name t CROSS JOIN
(
SELECT a.N + b.N * 10 + 1 n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
ORDER BY n
) n
WHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))
ORDER BY value
) AS x
GROUP BY x.value
) AS y
GROUP BY value
Here is the full working fiddle: http://sqlfiddle.com/#!2/17481a/1
First we run a query to extract all the words, as explained here by @peterm (follow his instructions if you want to customize the total number of words processed). Then we turn that into a subquery, COUNT and GROUP BY the value of each word, and put another query on top of that to merge words that differ only by an attached punctuation sign, e.g. hello = hello!, using REPLACE.
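For comparison, the same split, clean and count steps can be sketched in Python on the sample rows from the question. This only illustrates the logic; it strips ?, . and ! from the ends of each word, which is a simplification of the nested REPLACE calls.

```python
from collections import Counter

sentences = ["I love book", "I love apple", "I love book",
             "I hate apple", "I hate apple"]

counts = Counter()
for sentence in sentences:
    for word in sentence.split():
        # Strip the signs ?, . and ! from both ends of the word.
        counts[word.strip("?.!")] += 1

print(dict(counts))
# {'I': 5, 'love': 3, 'book': 2, 'apple': 3, 'hate': 2}
```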
If you want to perform such kind of text analysis, I would recommend using something like lucene, to get the termcount for each term in the document.
This query is going to take a long time to run if your table is of any decent size. It may be better to keep track of the counts in a separate table and update that table as values are inserted or, if real time results are not necessary, to only run this query every so often to update the counts table and pull your data from it. That way, you're not spending minutes to get data from this complex query.
Here's what I have for you so far. It's a good start. The only thing you need to do is modify it to iterate through the words in each row. You could use a cursor or a subquery.
Create test table:
create table tbl(str varchar(100) );
insert into tbl values('data');
insert into tbl values('I love book');
insert into tbl values('I love apple');
insert into tbl values('I love book');
insert into tbl values('I hate apple');
insert into tbl values('I hate apple');
Pull data from test table:
SELECT DISTINCT str AS Word, COUNT(str) AS Frequency FROM tbl GROUP BY str;
create a user defined function like this and use it in your query
DELIMITER $$
CREATE FUNCTION `getCount`(myStr VARCHAR(1000), myword VARCHAR(100))
RETURNS INT
BEGIN
DECLARE cnt INT DEFAULT 0;
DECLARE result INT DEFAULT 1;
WHILE (result > 0) DO
SET result = INSTR(myStr, myword);
IF(result > 0) THEN
SET cnt = cnt + 1;
SET myStr = SUBSTRING(myStr, result + LENGTH(myword));
END IF;
END WHILE;
RETURN cnt;
END$$
DELIMITER ;
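The loop in getCount counts non-overlapping occurrences of a substring. A Python sketch of the same idea follows; get_count is a hypothetical name, the empty-word guard is an addition, and unlike INSTR on non-binary strings the Python version is case-sensitive.

```python
def get_count(text, word):
    """Count non-overlapping occurrences of word in text, like getCount."""
    if not word:
        return 0  # guard added here: an empty word would never advance
    count = 0
    pos = text.find(word)
    while pos != -1:
        count += 1
        pos = text.find(word, pos + len(word))
    return count

print(get_count("I love book, but book sucks", "book"))  # 2
print(get_count("pineapple and apple", "apple"))         # 2
```

The second call shows that this is a substring match: "apple" is also counted inside "pineapple", so whole-word counting needs an extra splitting step.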
Hope it helps
Refer This
The split-string procedure is not my work. You can find it here:
http://forge.mysql.com/tools/tool.php?id=4
I wrote you the rest of code.
drop table if exists mytable;
create table mytable (
id int not null auto_increment primary key,
mytext varchar(1000)
) engine = myisam;
insert into mytable (mytext)
values ('I love book,but book sucks!What do you,think about it? me too'),('I love apple! it rulez.,No, it sucks a lot!!!'),('I love book'),('I hate apple!!! Me too.,!'),('I hate apple');
drop table if exists mywords;
create table mywords (
id int not null auto_increment primary key,
word varchar(50)
) engine = myisam;
delimiter //
drop procedure if exists split_string //
create procedure split_string (
in input text
, in `delimiter` varchar(10)
)
sql security invoker
begin
declare cur_position int default 1 ;
declare remainder text;
declare cur_string varchar(1000);
declare delimiter_length tinyint unsigned;
drop temporary table if exists SplitValues;
create temporary table SplitValues (
value varchar(1000) not null
) engine=myisam;
set remainder = input;
set delimiter_length = char_length(delimiter);
while char_length(remainder) > 0 and cur_position > 0 do
set cur_position = instr(remainder, `delimiter`);
if cur_position = 0 then
set cur_string = remainder;
else
set cur_string = left(remainder, cur_position - 1);
end if;
if trim(cur_string) != '' then
insert into SplitValues values (cur_string);
end if;
set remainder = substring(remainder, cur_position + delimiter_length);
end while;
end //
delimiter ;
delimiter //
drop procedure if exists single_words//
create procedure single_words()
begin
declare finish int default 0;
declare str varchar(200);
declare cur_table cursor for select replace(replace(replace(replace(mytext,'!',' '),',',' '),'.',' '),'?',' ') from mytable;
declare continue handler for not found set finish = 1;
truncate table mywords;
open cur_table;
my_loop:loop
fetch cur_table into str;
if finish = 1 then
leave my_loop;
end if;
call split_string(str,' ');
insert into mywords (word) select * from SplitValues;
end loop;
close cur_table;
end;//
delimiter ;
call single_words();
select word,count(*) as word_count
from mywords
group by word;
+-------+------------+
| word | word_count |
+-------+------------+
| a | 1 |
| about | 1 |
| apple | 3 |
| book | 3 |
| but | 1 |
| do | 1 |
| hate | 2 |
| I | 5 |
| it | 3 |
| lot | 1 |
| love | 3 |
| me | 2 |
| No | 1 |
| rulez | 1 |
| sucks | 2 |
| think | 1 |
| too | 2 |
| What | 1 |
| you | 1 |
+-------+------------+
19 rows in set (0.00 sec)
The code must be improved in order to consider any punctuation but this is the general idea.