How to use MySQL REGEXP in the WHERE of a JOIN statement

I have two tables A and B
Table A has columns: ID and POST
Table B has columns: ID, POST_ID and UPPERS
I want to select all records where a.POST matches the regex
'\\[cd(i|b)?(=[a-z0-9]+)?\\].+\\[/cd(i|b)?\\]'
and JOIN table B on a.ID = b.POST_ID where b.UPPERS matches the regex
'(\\|[0-9]+\\![0-9]{4}[-]+[0-9]{2}[-]+[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},){1,}'
I came up with the following statement, but it does not return any rows even when the columns contain content matching the regexes:
SELECT a.*, b.*
FROM a
JOIN b ON b.POST_ID = a.ID
WHERE a.POST RLIKE '\\[cd(i|b)?(=[a-z0-9]+)?\\].+\\[/cd(i|b)?\\]'
  AND b.UPPERS REGEXP '(\\|[0-9]+\\![0-9]{4}[-]+[0-9]{2}[-]+[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},){1,}'
Summary:
I want to select records where a user has posted content that matches this regex
'\\[cd(i|b)?(=[a-z0-9]+)?\\].+\\[/cd(i|b)?\\]'
and then check whether that very post has received at least two ups (or likes), using the regex
'(\\|[0-9]+\\![0-9]{4}[-]+[0-9]{2}[-]+[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},){2,}'
which can be broken down as simply:
a prefix pipe: |
a user id: [0-9]+
an exclamation mark: !
a datetime: [0-9]{4}[-]+[0-9]{2}[-]+[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}
and a suffix: ,
NOTE: the {2,} is simply there to require that the match occurs at least that many times
Please can someone point me in the right direction as to what I am doing wrong.
Sample table data:
Table A
ID | POST
23 | [cd=plain]6h+#gtyr[/cd]   (match found)
24 | [cd]65#%gte2!iu[/cd]      (match found)
25 | [cdi]*tre&y^g82u[/cdi]    (match found)
26 | *tre&y^g82u               (no match)
27 | rtyure99                  (no match)
28 | [cdb]aha87ulchr[/cdb]     (match found)
Table B
ID | POST_ID | UPPERS
4  | 24      | |98!2018-02-10 22:43:03,|35!2018-02-08 20:42:09,|3!2018-02-05 02:05:07,
5  | 26      | |2!2018-02-10 22:43:03,|30!2018-02-08 20:42:09,
6  | 25      | |21!2018-02-10 22:43:03,
7  | 27      | |23!2018-02-10 22:43:03,|11!2018-02-08 20:42:09,
NOTE: POST_ID in table B is a foreign key referencing ID of table A

If you don't mind, I'm going to answer the question that lies beneath your actual question. I'm sure we could work through why the regular expression is not working as you expect, but it raises the question: why use regular expressions for such a simple task?
It happens a lot that people just use a database to stash data in the same format it appears in the code. But if you take a little time to break down your data in a meaningful way, you can unlock a lot of power from humble MySQL.
Think about the question you want this query to answer:
Which posts that match certain criteria have been upped?
As you already realized, that suggests two tables - one to store information about the posts, and another to store information about who upped them. To make your queries fast and easy, think about which attributes of the information are going to show up in your where clause.
You want posts that are enclosed by certain markup. To make your search more efficient, put the markup tag in its own column:
CREATE TABLE `posts` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`tag` enum('cd','cdi','cdb') DEFAULT NULL,
`tag_value` varchar(11) DEFAULT NULL,
`content` text NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
For the data you list above, the table might look something like:
+----+------+-----------+-------------+
| id | tag  | tag_value | content     |
+----+------+-----------+-------------+
| 23 | cd   | plain     | 6h+#gtyr    |
| 24 | cd   | NULL      | 65#%gte2!iu |
| 25 | cdi  | NULL      | *tre&y^g82u |
| 26 | NULL | NULL      | *tre&y^g82u |
| 27 | NULL | NULL      | rtyure99    |
| 28 | cdb  | NULL      | aha87ulchr  |
+----+------+-----------+-------------+
It takes a little more work to put your data IN (this is where your regex powers are better applied, as you create the INSERT), but now you can do all sorts of things with it quite easily. I used an ENUM for the tag column, because that is extra-fast to search on. If you have a large number of tags or don't know what they will all be, you can just use a VARCHAR instead.
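For example, if your application code parses the markup out of a post before storing it, the INSERT might look something like this (a sketch built from one of the sample rows above; the parsing itself happens in your application):
-- hypothetical insert after the application has parsed "[cd=plain]6h+#gtyr[/cd]"
INSERT INTO posts (id, tag, tag_value, content)
VALUES (23, 'cd', 'plain', '6h+#gtyr');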
So how do you track UPPERS? That part gets very easy. All you need is a table with a row for each time someone ups something:
CREATE TABLE `uppers` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`post_id` int(11) DEFAULT NULL,
`time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Currently when someone ups something, you have to go find the relevant record, append new data to it, then save it back. Now you can just slap a record into the table. The time will be set automatically; all you need to insert is the user_id and post_id. Some of your data might look like:
+----+---------+---------+---------------------+
| id | user_id | post_id | time                |
+----+---------+---------+---------------------+
|  2 |      98 |      24 | 2018-02-10 15:23:03 |
|  3 |      35 |      24 | 2018-02-10 15:23:23 |
|  4 |      27 |      24 | 2018-02-10 15:23:43 |
|  5 |       2 |      26 | 2018-02-10 15:24:16 |
|  6 |      30 |      26 | 2018-02-10 15:24:28 |
+----+---------+---------+---------------------+
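Recording an up then becomes a single-row insert; the time column fills itself in (a sketch with made-up ids):
-- user 98 ups post 24; `time` defaults to CURRENT_TIMESTAMP
INSERT INTO uppers (user_id, post_id) VALUES (98, 24);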
Now you can harness the power of the MySQL engine to capture all the information you need:
All posts with the desired tags:
SELECT * FROM posts where tag IN ('cd', 'cdi', 'cdb')
All post with the desired tags and at least one up:
SELECT posts.*, uppers.user_id, uppers.time
FROM posts
INNER JOIN uppers ON posts.id = uppers.post_id
WHERE tag IN ('cd', 'cdi', 'cdb')
That will return a row for each post-upper combination. The INNER JOIN means it will not return any posts that don't have a match in the uppers table. This may be what you are looking for, but if you want to group the ups together by post ID, you can ask MySQL to group them for you:
SELECT posts.*, COUNT(uppers.user_id)
FROM posts
INNER JOIN uppers ON posts.id = uppers.post_id
WHERE tag IN ('cd', 'cdi', 'cdb')
GROUP BY posts.id
If you want to rule out duplicate ups by the same user, you can easily count only unique user IDs for each post:
SELECT posts.*, COUNT(DISTINCT uppers.user_id)
FROM posts
INNER JOIN uppers ON posts.id = uppers.post_id
WHERE tag IN ('cd', 'cdi', 'cdb')
GROUP BY posts.id
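And since the summary in your question asks for posts with at least two ups, a HAVING clause on that grouped count gets you there (a sketch building on the query above):
SELECT posts.*, COUNT(DISTINCT uppers.user_id) AS ups
FROM posts
INNER JOIN uppers ON posts.id = uppers.post_id
WHERE tag IN ('cd', 'cdi', 'cdb')
GROUP BY posts.id
HAVING COUNT(DISTINCT uppers.user_id) >= 2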
There are many functions like COUNT() you can use to work with the data that gets grouped together. You could use MAX(uppers.time) to get the time of the most recent up for a post, or GROUP_CONCAT() to put the values together in one long string.
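For instance, a sketch combining both:
SELECT posts.id,
       MAX(uppers.time) AS last_upped,
       GROUP_CONCAT(uppers.user_id ORDER BY uppers.time) AS upped_by
FROM posts
INNER JOIN uppers ON posts.id = uppers.post_id
WHERE tag IN ('cd', 'cdi', 'cdb')
GROUP BY posts.id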
The bottom line is that by breaking your data down into its fundamental pieces, you allow MySQL (or any other relational database) to work much more efficiently, and life gets much, much easier.

Related

Reduce number of joins in mysql

I have 12 fixed tables (group, local, element, sub_element, service, ...), each table with different numbers of rows.
The 'id_' column in every table is a primary key (int). The other columns are of datatype varchar(20). The maximum number of rows in these tables is 300.
Each table was created in this way:
CREATE TABLE `group`
(
  id_G int NOT NULL,
  name_group varchar(20) NOT NULL,
  PRIMARY KEY (id_G)
);
|.......GROUP.......|  |.......LOCAL.......|  |......SERVICE........|
| id_G | name_group |  | id_L | name_local |  | id_S | name_service |
+------+------------+  +------+------------+  +------+--------------+
|    1 | group1     |  |    1 | local1     |  |    1 | service1     |
|    2 | group2     |  |    2 | local2     |  |    2 | service2     |
And I have one table that combines all these tables depending on the user's selections.
The 'id_' values from the fixed tables selected by the user are recorded into this table.
This table was created in this way:
CREATE TABLE `event`
(
  id_E int NOT NULL,
  event_name varchar(20) NOT NULL,
  id_G int NOT NULL,
  id_L int NOT NULL,
  ...
  PRIMARY KEY (id_E)
);
The table (event) looks like this:
|....................EVENT.....................|
| id_E | event_name  | id_G | id_L | ... | id_S |
+------+-------------+------+------+-----+------+
|    1 | master1     |    1 |    1 | ... |    3 |
|    2 | master2     |    2 |    2 | ... |    6 |
This table grows every day, and now it has thousands of rows.
Column id_E is the primary key (int), and event_name is varchar(20).
In addition to the id_E and event_name columns, this table has 12 other columns that came from the fixed tables.
Every time I need to retrieve information from the event table in a readable form, I need to do about 12 joins.
My query looks like this, where I need to retrieve all columns from the event table:
SELECT event_name, name_group, name_local ..., name_service
FROM event
INNER JOIN `group` on event.id_G = `group`.id_G
INNER JOIN local on event.id_L = local.id_L
...
INNER JOIN service on event.id_S = service.id_S
WHERE event.id_S = 7 (for example)
This slows down my system performance. Is there a way to reduce the number of joins? I've heard about using natural keys, but I think this is not a good idea for my case, thinking about future maintenance.
My queries are taking about 7 seconds and I need to reduce this time.
I changed the WHERE clause and this had no effect, so I am sure that the problem is that the query has so many joins.
Could someone give some help? Thanks a lot...
MySQL has a great keyword, STRAIGHT_JOIN, which might be what you are looking for. First, I have to assume each of your lookup tables (id/description) already has an index on the ID column, since that is the primary key.
Your event table is the one you are querying as the primary basis of the details and joining to the lookups per their respective IDs. As long as your WHERE clause applicable to the EVENT table is optimized, such as the ID you are looking for, it SHOULD be virtually instantaneous.
If it is not, then it might be that MySQL is trying to think for you and taking one of the secondary lookup tables as the primary basis of the query for whatever reason, such as a much lower record count. In this case, add the keyword and try it:
SELECT STRAIGHT_JOIN ... rest of your query
This tells MySQL to do the query in the order you gave it, thus the event table first and its WHERE clause on the ID. It should find that one thing, then grab all the corresponding lookup descriptions from the other tables.
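Applied to the query from the question, that would look something like this (a sketch; the only change is the keyword after SELECT, with `group` quoted because it is a reserved word):
SELECT STRAIGHT_JOIN event_name, name_group, name_local, name_service
FROM event
INNER JOIN `group` ON event.id_G = `group`.id_G
INNER JOIN local ON event.id_L = local.id_L
INNER JOIN service ON event.id_S = service.id_S
WHERE event.id_S = 7;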
Create indexes; concretely, use compound indexes. For instance, start by creating a compound index for event and group:
on the event table, create one for (event id, group id);
then, on the group table, create another one for the next relation (group id, local id);
on local, do the same with service, and so on...

For each set of keywords in one table, find all matching hits in a second table

Disclaimer: I'm using MySQL with 2 tables. So far I've found solutions to my issue when individual groups are queried one at a time using IN() but nothing that allows me to do the whole table at once without looping over multiple queries.
I have two tables:
CREATE TABLE WordGroups (
wgId int NOT NULL AUTO_INCREMENT,
groupId int NOT NULL,
word varchar(255) NOT NULL,
PRIMARY KEY (wgId)
);
Which keeps track of groups of keywords (word to groupId), and
CREATE TABLE ArticleWords (
awId int NOT NULL AUTO_INCREMENT,
articleId int NOT NULL,
word varchar(255) NOT NULL,
PRIMARY KEY (awId)
);
which keeps track of the keywords within an article.
I am attempting to build a single query which can take the groups of words, and return for each group all articles which contain AT LEAST all of those words.
I realize if I look for one group at a time in a single query this is very simple, however I can't seem to figure out how to make a single query result in the set of all matching subsets.
For example imagine that the two tables have the following data:
WordGroups
groupId | word
-----------------
1 | B
1 | A
2 | C
2 | E
3 | F
ArticleWords
articleId | word
-----------------
1 | A
1 | C
1 | B
2 | C
3 | A
3 | B
3 | F
4 | C
4 | E
4 | F
The resulting query would return:
groupId | articleId
1 | 1
1 | 3
2 | 4
3 | 3
3 | 4
Since those articles contain at least all the words from those groups.
I've attempted an intersection of the two tables using an inner join, but that matches incomplete groups of words, resulting in the row:
groupId | articleId
2 | 2
That row shows up in the result only because Article 2 contains the word "C". I'm open to ideas, as I've only dabbled in less serious MySQL, and this has been eluding me all week.
Any help is much appreciated. I'm at the point where I'm wondering if I'm trying to make SQL do something it isn't meant to do. I have a very long query which works for WordGroups of up to 6 words, but it is very exact and not scalable; this query would need to work for any size of WordGroup to be feasible.
Thank you for reading!
Here is one method; it uses group_concat() for the comparison:
select wg.groupId, aw.articleId
from ArticleWords aw
join WordGroups wg
  on wg.word = aw.word
join (select gw.groupId, group_concat(gw.word order by gw.word) as words
      from WordGroups gw
      group by gw.groupId
     ) wgw
  on wgw.groupId = wg.groupId
group by aw.articleId, wgw.words
having group_concat(aw.word order by aw.word) = wgw.words;
Here is a SQL Fiddle.
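An alternative that avoids comparing concatenated strings is to count, per (group, article) pair, how many of the group's words the article matches and require that count to equal the group's size; a sketch against the same tables:
SELECT wg.groupId, aw.articleId
FROM WordGroups wg
JOIN ArticleWords aw ON aw.word = wg.word
GROUP BY wg.groupId, aw.articleId
HAVING COUNT(DISTINCT aw.word) = (SELECT COUNT(DISTINCT w2.word)
                                  FROM WordGroups w2
                                  WHERE w2.groupId = wg.groupId);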

Mysql 3 tables, multiple counts grouped by, but gets stuck

I'm running into some trouble with a query. I'm trying to retrieve some data from a big database where 3 tables are involved.
These tables contain data about adds: in a backend website, the administrator can manage which local adds he wants displayed, their position, etc. They are organized in 3 tables. One of them contains all the data relevant to the add itself (name, date of availability, date of expiration, etc.). Then there are 2 more tables which contain extra info, just about views or clicks.
So I have only 15 adds, which have multiple clicks and multiple views.
The click and view tables each register a new row per event. So, when a click is registered, a new row is added where addid_click identifies the click and addid references addid in adds_table (views work the same way). So, for instance, add (1) will have 2 views and 2 clicks while add (2) will have 1 view and 1 click.
My idea is to get, for each add, how many clicks and views it had in total.
I have 3 tables like these:
adds_table              adds_clicks_table        adds_views_table
+-------+-----------+   +-------------+-------+  +-------------+-------+
| addid | name      |   | addid_click | addid |  | addid_views | addid |
+-------+-----------+   +-------------+-------+  +-------------+-------+
|     1 | add_name1 |   |           1 |     1 |  |           1 |     1 |
|     2 | add_name2 |   |           2 |     2 |  |           2 |     1 |
|     3 | add_name3 |   |           3 |     1 |  |           3 |     2 |
+-------+-----------+   +-------------+-------+  +-------------+-------+
CREATE TABLE `bwm_adds` (
`addid` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
...
PRIMARY KEY (`addid`)
) ENGINE=InnoDB AUTO_INCREMENT=16 DEFAULT CHARSET=utf8
CREATE TABLE `bwm_adds_clicks` (
`add_clickid` int(19) NOT NULL AUTO_INCREMENT,
`addid` int(11) NOT NULL,
...
PRIMARY KEY (`add_clickid`)
) ENGINE=InnoDB AUTO_INCREMENT=3374 DEFAULT CHARSET=utf8
CREATE TABLE `bwm_adds_views` (
`add_viewsid` int(19) NOT NULL AUTO_INCREMENT,
`addid` int(11) NOT NULL,
...
PRIMARY KEY (`add_viewsid`)
) ENGINE=InnoDB AUTO_INCREMENT=2078738 DEFAULT CHARSET=utf8
The result would be a single table where I retrieve, for each add (addid), how many clicks and how many views it had.
I need a query that returns something like this:
+-------+---------+-----------+
| addid | clicks  | views     |
+-------+---------+-----------+
|     1 | 123123  | 235457568 |
|     2 | 5124123 | 435345234 |
|     3 | 123541  | 453563623 |
+-------+---------+-----------+
I tried to execute a query, but it gets stuck and loads for an undefined time... I'm pretty sure my query is failing, because if I remove one of the counts it displays the data very fast.
SELECT a.addid, COUNT(ac.addid_clicks) as 'clicks', COUNT(av.addid_views) as 'views'
FROM `adds_table` a
LEFT JOIN `adds_clicks_table` ac ON a.addid = ac.addid_click
LEFT JOIN `adds_views_table` av ON ac.addid_click = av.addid_views
GROUP BY a.addid
MySQL keeps loading all the time; any idea what I'm missing?
By the way, I found this post, which treats almost the same problem I have; you can see my query is very similar to the first answer, but I get the Loading message all the time. No errors, just Loading.
Edit: I misplaced the numbers and got confused. The tables are fixed now and I added some explanation about them.
Edit 2: I updated the post with the SHOW CREATE TABLE definitions.
Edit 3: Is there any way to optimise this query? It seems to retrieve the result I want, but the MySQL server cancels the query because it takes more than 30 seconds to execute.
SELECT a.addid,
(SELECT COUNT(addid) FROM add_clicks where addid = a.addid) as clicks,
(SELECT COUNT(addid) FROM add_views where addid = a.addid) as views
FROM adds a ORDER BY a.addid;
If those are really your tables (one column, plus an auto_inc), then there is no meaningful information justifying having 3 tables instead of 1:
CREATE TABLE `bwm_adds` (
`addid` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
clicks INT UNSIGNED NOT NULL,
views INT UNSIGNED NOT NULL,
PRIMARY KEY (`addid`)
) ENGINE=InnoDB AUTO_INCREMENT=16 DEFAULT CHARSET=utf8
and then UPDATE ... SET views = views + 1 (etc) rather than inserting into the other tables.
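Spelled out, that counter update is a one-liner (a sketch against the combined table above):
-- bump the view counter for add 1 (do the same for clicks when a click comes in)
UPDATE bwm_adds SET views = views + 1 WHERE addid = 1;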
If you have an old version of MySQL,
SELECT a.addid,
( SELECT COUNT(addid_click)
FROM `adds_clicks_table`
WHERE addid = a.addid
) AS 'clicks',
( SELECT COUNT(addid_views)
FROM `adds_views_table`
WHERE addid = a.addid
) AS 'views'
FROM adds_table AS a
For 5.6 and later, this might be faster:
SELECT a.addid, c.clicks, v.views
FROM `adds_table` a
LEFT JOIN ( SELECT addid, COUNT(*) AS clicks FROM adds_clicks_table GROUP BY addid ) AS c USING(addid)
LEFT JOIN ( SELECT addid, COUNT(*) AS views FROM adds_views_table GROUP BY addid ) AS v USING(addid)
If you get NULLs but prefer 0s, then wrap the value in IFNULL(..., 0).
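For example (a sketch applying that to the derived-table version above):
SELECT a.addid, IFNULL(c.clicks, 0) AS clicks, IFNULL(v.views, 0) AS views
FROM adds_table a
LEFT JOIN ( SELECT addid, COUNT(*) AS clicks FROM adds_clicks_table GROUP BY addid ) AS c USING(addid)
LEFT JOIN ( SELECT addid, COUNT(*) AS views FROM adds_views_table GROUP BY addid ) AS v USING(addid)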
If you need to discuss further, please provide SHOW CREATE TABLE and EXPLAIN SELECT ...
I ended up with a solution to my problem. The table I was trying to read was too big because of the badly engineered database: in adds_views_table, a new row is added for each view, ending up with almost 3 million rows and a table that weighs almost 35% of the entire database (326 MB).
When phpMyAdmin tried to execute the query, it loaded forever and never showed a result because of a timeout limit applied to MySQL. Changing this value would help, but it wasn't viable for retrieving that data and displaying it on a website (it implies the website or data wouldn't load until the query has finished executing).
That problem was fixed by creating an index on addid in adds_table. Also, for some reason the query is faster if subqueries are used. The query ended up like this:
SELECT a.addid,
       (SELECT COUNT(addid) FROM adds_clicks_table WHERE addid = a.addid) AS 'clicks',
       (SELECT COUNT(addid) FROM adds_views_table WHERE addid = a.addid) AS 'views'
FROM adds_table a
ORDER BY a.addid;
Thanks to @Rick James, who posted a similar query; I ended up modifying it to get the data I needed.
Forgive my horrible English.

Identifying faulty data in MySQL table

Apologies for asking a question that may have answers in some form or another on here, but I was unable to make any of those solutions work for me.
I have the following query:
SELECT `user_id`, `application_id`, `unallocated_date`, `check_in_date`, `check_out_date`
FROM `student_room`
WHERE `user_id` = 17225
ORDER BY `application_id` DESC
It produces the following result:
user_id | application_id | unallocated_date    | check_in_date       | check_out_date
--------+----------------+---------------------+---------------------+---------------
  17225 |          30782 | 2018-02-04 14:32:29 | NULL                | NULL
  17225 |          30782 | 2018-02-04 14:32:49 | NULL                | NULL
  17225 |          30782 | 2018-02-04 14:32:51 | NULL                | NULL
  17225 |          30782 | NULL                | NULL                | NULL
  17225 |          30782 | NULL                | 2018-02-04 14:41:54 | NULL
The fourth row in the result is a fault in my data; it should look similar to the first three rows. Rows like those occur when a student is allocated a new room and the previous one needs to be unallocated. In this case, the unallocation in row 4 did not actually happen, due to either a historical bug in the system I am working on or user error, but most likely the former.
How can I identify ALL such rows? My attempts with GROUP BY and HAVING did not work, as I checked where all three date fields were NULL, but it did not pick up this particular user - so I was doing something wrong. My original query was:
SELECT COUNT(user_id) AS `count`, user_id FROM `student_room`
WHERE `unallocated_date` IS NULL
AND `check_in_date` IS NULL
AND `check_out_date` IS NULL
GROUP BY `user_id`
HAVING COUNT(user_id) > 1
ORDER BY `user_id` ASC
I tried various INNER JOIN attempts too, but I did not get any of them right...
The rows that I am interested in will have at least one entry with all three dates NULL, but also one where there is a check_in_date that is NOT NULL, as per this example. If I only had the first four rows, then the data could be correct, but the fifth row's presence makes the fourth row a faulty record - it should've been given an "unallocated_date" value at the time of the allocation of the room in the fifth row, which for some reason did not happen.
Together with a friend of mine, we made the following query, which works. I have now learned that you can use EXISTS in MySQL; I had seen it used when dropping or creating tables, but never like this. This query ended up solving the problem:
SELECT cte.user_id, COUNT(*)
FROM (
SELECT sro.user_id
FROM student_room AS sro
WHERE sro.unallocated_date IS NULL
AND sro.check_in_date IS NULL
AND sro.check_out_date IS NULL
AND EXISTS (
SELECT *
FROM student_room AS sri
WHERE sri.user_id = sro.user_id
AND sri.student_room_id > sro.student_room_id
)
ORDER BY user_id DESC
)
AS cte
GROUP BY cte.user_ID
ORDER BY COUNT(*) DESC
This query is the result of more than an hour of tinkering with the erroneous records, so apologies if it appears not to match the question's requirements 100%, but it does solve the problem for me.

MYSQL, PHP, order by not working, primary key

I am generating a mySQL query from PHP.
Part of the query re-orders a table based on some variables (which do not include the primary key).
The code doesn't produce errors; however, the table is not sorted.
I echoed out the SQL code and it looks correct. I tried running it directly in phpMyAdmin, and it also runs without error, but the table is still not sorted as requested.
alter table anavar order by dset_name, var_id;
I am pretty sure that this has to do with the fact that I have a primary key variable (UID) which is not present in the sort.
Both before and after running the query, the table remains ordered by UID. Deleting UID and re-running the query results in a correctly sorted table, but this seems like an overkill solution.
Any suggestions?
create table t2
( id int auto_increment primary key,
someInt int not null,
thing varchar(100) not null,
theWhen datetime not null,
key(theWhen) -- creates an index on theWhen
);
-- my table now has 2 indexes on it
-- see it by running `show indexes from t2`
-- truncate table t2;
insert t2(someInt,thing,theWhen) values
(17,'chopstick','2016-05-08 13:00:00'),
(14,'alligator','2016-05-01'),
(11,'snail','2016-07-08 19:00:00');
select * from t2; -- returns in physical order (the primary key `id`)
select * from t2 order by thing; -- returns via thing, which has no index anyway
select * from t2 order by theWhen,thing; -- partial index use
note that indexes aren't even used until you have a significant number of rows in a db anyway
Edit (new data comes in)
insert t2 (someInt,thing,theWhen) values (777,'apple',now());
select t2.id, t2.thing, t2.theWhen, @rnk:=@rnk+1 as `rank`
from t2
cross join (select @rnk:=0) xParams
order by thing;
+----+-----------+---------------------+------+
| id | thing     | theWhen             | rank |
+----+-----------+---------------------+------+
|  2 | alligator | 2016-05-01 00:00:00 |    1 |
|  4 | apple     | 2016-09-04 15:04:50 |    2 |
|  1 | chopstick | 2016-05-08 13:00:00 |    3 |
|  3 | snail     | 2016-07-08 19:00:00 |    4 |
+----+-----------+---------------------+------+
Focus on the fact that you can maintain your secondary indices and generate a rank on the fly whenever you want.