I want to get one record at a time from a joined table, but I don't want the tables to be joined in full every time.
The actual tables are as follows.
table contents -- stores content information.
+----+-----------+--------+----------+---------------------+
| id | name      | status | priority | last_registered_day |
+----+-----------+--------+----------+---------------------+
| 1  | content_1 | 0      | 1        | 2020/10/10 11:20:20 |
| 2  | content_2 | 2      | 1        | 2020/10/10 11:21:20 |
| 3  | content_3 | 2      | 2        | 2020/10/10 11:22:20 |
+----+-----------+--------+----------+---------------------+
table clusters -- stores cluster information
+----+-----------+
| id | name      |
+----+-----------+
| 1  | cluster_1 |
| 2  | cluster_2 |
+----+-----------+
table content_cluster -- each record indicates that one content is on one cluster
+------------+------------+---------------------+
| content_id | cluster_id | last_update_date    |
+------------+------------+---------------------+
| 1          | 1          | 2020-10-01T11:30:00 |
| 2          | 2          | 2020-10-01T11:30:00 |
| 3          | 1          | 2020-10-01T10:30:00 |
| 3          | 2          | 2020-10-01T10:30:00 |
+------------+------------+---------------------+
By specifying a cluster_id, I want to get one content name at a time where contents.status = 2 and the (content_id, cluster_id) pair exists in content_cluster. The SQL query is something like the following.
SELECT contents.name
FROM contents
JOIN content_cluster
  ON contents.id = content_cluster.content_id
WHERE contents.status = 2
  AND content_cluster.cluster_id = <cluster_id>
ORDER BY contents.priority, contents.last_registered_day, contents.name
LIMIT 1;
However, I don't want the tables to be joined as a whole every time as I have to do it frequently and the tables are large. Is there any efficient way to do this? I can add some indices to the tables. What should I do?
I would try writing the query like this:
SELECT c.name
FROM contents c
WHERE EXISTS (SELECT 1
              FROM content_cluster cc
              WHERE cc.content_id = c.id
                AND cc.cluster_id = <cluster_id>
             )
  AND c.status = 2
ORDER BY c.priority, c.last_registered_day, c.name
LIMIT 1;
Then create the following indexes:
contents(status, priority, last_registered_day, name, id)
content_cluster(content_id, cluster_id).
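For reference, a sketch of the corresponding DDL (the index names are placeholders I made up):

-- Covering index so the WHERE + ORDER BY can be satisfied by an index scan
ALTER TABLE contents
    ADD INDEX idx_contents_cover (status, priority, last_registered_day, name, id);

-- Lets the EXISTS subquery be answered from the index alone
ALTER TABLE content_cluster
    ADD INDEX idx_cc_lookup (content_id, cluster_id);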
The goal is for the execution plan to scan the index on contents and, for each row, look up whether there is a match in content_cluster. The query stops at the first match.
I can't guarantee that this will generate that plan (avoiding the sort), but it is worth a try.
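One way to check is with EXPLAIN (a sketch, substituting a literal 2 for <cluster_id>):

EXPLAIN
SELECT c.name
FROM contents c
WHERE EXISTS (SELECT 1
              FROM content_cluster cc
              WHERE cc.content_id = c.id
                AND cc.cluster_id = 2)
  AND c.status = 2
ORDER BY c.priority, c.last_registered_day, c.name
LIMIT 1;

If the Extra column does not show "Using filesort", the ORDER BY is being satisfied by the index.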
This query can easily be optimized by applying the correct indexes. Apply the ALTER statements below and let me know whether performance has improved considerably:
ALTER TABLE contents
    ADD INDEX idx_1 (id),
    ADD INDEX idx_2 (status);

ALTER TABLE content_cluster
    ADD INDEX idx_1 (content_id),
    ADD INDEX idx_2 (cluster_id);
If a content can be in multiple clusters and the number of clusters can change, I think a join like this is the best solution.
You could try splitting your contents table into separate tables, each containing the contents of a specific cluster, but they would need to be updated frequently.
I'm trying to update one MySQL table based on information from another.
My original table looks like:
id | value
------------
1 | hello
2 | fortune
3 | my
4 | old
5 | friend
And the tobeupdated table looks like:
uniqueid | id | value
---------------------
1 | | something
2 | | anything
3 | | old
4 | | friend
5 | | fortune
I want to update id in tobeupdated with the id from original based on value (strings stored in a VARCHAR(32) field).
The updated table will hopefully look like:
uniqueid | id | value
---------------------
1 | | something
2 | | anything
3 | 4 | old
4 | 5 | friend
5 | 2 | fortune
I have a query that works, but it's very slow:
UPDATE tobeupdated, original
SET tobeupdated.id = original.id
WHERE tobeupdated.value = original.value
This maxes out my CPU and eventually leads to a timeout with only a fraction of the updates performed (there are several thousand values to match). I know matching by value will be slow, but this is the only data I have to match them together.
Is there a better way to update values like this? I could create a third table for the merged results, if that would be faster?
I tried MySQL - How can I update a table with values from another table?, but it didn't really help. Any ideas?
UPDATE tobeupdated
INNER JOIN original ON (tobeupdated.value = original.value)
SET tobeupdated.id = original.id
That should do it, and it is really doing exactly what yours is. However, I prefer the JOIN syntax for joins rather than multiple WHERE conditions; I think it's easier to read.
As for running slow, how large are the tables? You should have indexes on tobeupdated.value and original.value
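For example, something like this (index names are placeholders):

-- value is a VARCHAR(32), so the whole column fits comfortably in the key
ALTER TABLE original    ADD INDEX idx_original_value (value);
ALTER TABLE tobeupdated ADD INDEX idx_tobeupdated_value (value);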
EDIT:
We can also simplify the query:
UPDATE tobeupdated
INNER JOIN original USING (value)
SET tobeupdated.id = original.id
USING is shorthand for when both tables of a join have an identically named column (here, value), i.e. an equi-join - http://en.wikipedia.org/wiki/Join_(SQL)#Equi-join
It depends on how those tables are used, but you might consider putting INSERT and UPDATE triggers on the original table. When an insert or update happens, update the second table based on just that one row from the original table. It will be quicker.
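A minimal sketch of the insert case, assuming the table and column names from the question (the trigger name is made up; an analogous AFTER UPDATE trigger would handle updates):

CREATE TRIGGER trg_original_after_insert
AFTER INSERT ON original
FOR EACH ROW
    -- Propagate the new row's id to any matching rows in tobeupdated
    UPDATE tobeupdated
    SET id = NEW.id
    WHERE value = NEW.value;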
This information is very condensed.
There are 2 tables.
article
-----------------------------------
|id | weight | text |
-----------------------------------
|1 | 10 | blah |
|2 | 100 | blah |
|3 | 50 | blah |
|4 | 1000 | blah |
-----------------------------------
read
-----------------------------------
| user_id | article_id |
-----------------------------------
| 1 | 4 |
| 1 | 2 |
| 1 | 3 |
| 2 | 3 |
| 2 | 4 |
-----------------------------------
I want to get unread articles using the query below (very condensed):
SELECT a.*
FROM article a
LEFT OUTER JOIN `read` r ON r.article_id = a.id AND r.user_id = 1
WHERE r.article_id IS NULL
ORDER BY a.weight DESC
LIMIT 10
Important information:
- the number of rows in the read table stays under 1000 per user (old data is removed)
- the weight column in the article table changes frequently (so the ordering is not fixed)
The problem is (once the number of users is over 1M):
- getting unread articles via the read table (NOT IN vs. outer join is not the important part)
- the number of rows in the read table will be over 1G
It works well so far (the read table currently has about 100M rows), but I have to prepare the next step because the number of users is increasing rapidly.
What is the best approach for a large service in this case?
(sharding? partitioning the table? or redesigning the architecture?)
Thanks in advance
Add a column to article. It will be a flag saying whether the article is read/unread. (Do not make it a user count or a timestamp; that would slow down the subsequent steps.)
Whenever a user reads an article, check the flag and change it if needed.
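That check-and-change step could be as simple as this (a sketch; I am assuming the flag is a TINYINT named flag, with 0 = unread and 1 = read):

-- Mark the article as read, but only if it is not already flagged
UPDATE article
SET flag = 1
WHERE id = ? AND flag = 0;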
Have INDEX(flag, weight, id) -- this will let your query run almost instantly. This should be OK on that million-row table.
A problem: since you are purging (after 1000), some "read" articles can become "unread". To deal with this, batch the purging and gather the distinct list of articles that got purged. Then do the tedious task of re-computing the flag, but just for those articles. INDEX(article_id) will help; use EXISTS ( SELECT * FROM `read` WHERE article_id = $aid ). (This can probably be turned into a batch operation rather than one aid at a time.)
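A sketch of that re-computation for a single purged article ($aid is a placeholder, as above; note `read` needs backticks because READ is a reserved word in MySQL):

-- Flip the article back to "unread" only if no user still has it in `read`
UPDATE article
SET flag = 0
WHERE id = $aid
  AND NOT EXISTS ( SELECT * FROM `read` WHERE article_id = $aid );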
Another problem: secondary keys on billion-row tables are costly -- they may lead to a lot of I/O. Before attempting to address this problem, please provide SHOW CREATE TABLE for both tables, plus any other common SELECTs. Picking the right index(es) and datatypes is very important to performance in billion-row tables.
The point is to use the index as much as possible.
SELECT a.*
FROM article a
LEFT JOIN `read` r ON r.article_id = a.id AND r.user_id = 1
WHERE r.article_id IS NULL
ORDER BY a.weight DESC
LIMIT 10
Edit:
The concern for you is the data size of the read table, and we have to reduce it. For that we have multiple options:
MySQL partitions: create partitions on ranges of user_id (maybe 100K users per partition); see the sketch after this list.
Create multiple tables: similar to partitioning, but you keep the data in different databases (even on different DB servers). Based on the user_id, you decide which table/database to join against.
Also, you can think of archiving old data periodically; the application should be smart enough to decide whether it needs to query the archived tables or the live table.
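A sketch of the range-partitioning option mentioned above (boundaries are illustrative; MySQL requires the partitioning column to be part of every unique key, hence the composite primary key):

CREATE TABLE `read` (
    user_id    INT NOT NULL,
    article_id INT NOT NULL,
    PRIMARY KEY (user_id, article_id)
)
PARTITION BY RANGE (user_id) (
    PARTITION p0   VALUES LESS THAN (100000),
    PARTITION p1   VALUES LESS THAN (200000),
    PARTITION p2   VALUES LESS THAN (300000),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);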
I am working on a social network website, so I expect a lot of users.
I need to save tags (key | counter) for every user, and I wonder which is better: 1) one table per user, 2) one single large table, or 3) several split tables.
1) This is an example of the many-tables implementation.
table userid_tags (every user has its own table)
key | counter
----- ---------
tag1 | 3
tag2 | 1
tag3 | 10
Query 1: SELECT * FROM userid_tags WHERE key='tag1'
Query 2: SELECT * FROM userid_tags
2) The single-table implementation:
table tags
key | counter | user_id
----- ------------------
tag1 | 3 | 20022
tag2 | 1 | 20022
tag2 | 10 | 31234
Query 1: SELECT * FROM tags WHERE key='tag1' AND user_id='20022'
Query 2: SELECT * FROM tags WHERE user_id='20022'
3) The split-tables implementation:
table 1000_tags (user_id from 1 to 1000)
key | counter | user_id
----- ------------------
tag1 | 3 | 122
tag2 | 1 | 122
tag2 | 10 | 734
table 21000_tags (user_id from 20000 to 21000)
key | counter | user_id
----- ------------------
tag1 | 3 | 20022
tag2 | 1 | 20022
tag2 | 10 | 20234
Query 1: SELECT * FROM 21000_tags WHERE key='tag1' AND user_id='20022'
Query 2: SELECT * FROM 21000_tags WHERE user_id='20022'
Question for 3): what is a good split size? I used 1000 users, following instinct.
2 is the right answer. Think about how you are going to maintain one table per user, or one table per 1000 users. How will you create/update/delete the tables? What if you have to make mass changes? How will you figure out which table you need to select from? Even if you can, what if you need to select from more than one of those tables simultaneously (e.g. to get the tags for two users)?
Having the tables split up won't give you much of a performance benefit as it is. It's true that if the tables grow very large, inserts may become slower because MySQL has to maintain the keys, but as long as you have the appropriate keys, lookups should be very fast.
Another similar solution would be to have a table for tags, a table for users, and a mapping table between them. This keeps the tag cardinality small, and if you use an auto_increment surrogate key in both tables, the key length for both will be small, which should make lookups as fast as possible with no restrictions on the relation (i.e. no need to figure out which other tables to join on for other users).
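A sketch of that three-table layout (all names are illustrative):

CREATE TABLE users (
    id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);

CREATE TABLE tags (
    id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    UNIQUE KEY uq_tags_name (name)
);

-- Mapping table: one row per (user, tag) pair with its counter
CREATE TABLE user_tags (
    user_id INT UNSIGNED NOT NULL,
    tag_id  INT UNSIGNED NOT NULL,
    counter INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, tag_id)
);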
Using option 2 is the correct way to handle this. You can still use partitions within the table, though. All the information about using partitions can be found in the MySQL documentation.
Splitting the table into partitions of a thousand users each would look something like:
CREATE TABLE tags (`key` VARCHAR(50), counter INT, user_id INT)
PARTITION BY KEY (user_id) PARTITIONS 1000;
If the user_id were 21001, you could search directly in the correct partition, something like:
SELECT * FROM tags PARTITION (p22);
because id 21001 would be in the 22nd partition. Check the documentation for more information.
I have a large database with two tables: stat and total.
An example of the relation is the following:
STAT:
+----+-------------+
| ID | total event |
+----+-------------+
| 7  | 2           |
| 8  | 1           |
+----+-------------+
TOTAL:
+----+-------------+
| ID | Event       |
+----+-------------+
| 7  | "hello"     |
| 7  | "everybody" |
| 8  | "hi"        |
+----+-------------+
This is a very simplified version; also consider that the STAT table can have 500K records, and for each STAT row there can be about 200 TOTAL rows.
Currently, if I run a simple SELECT query on table TOTAL, the system is terribly slow.
Could anyone give me some advice on the design of the TOTAL table? Is it possible to tell MySQL that the id column is already sorted, so that there is no reason to scan all the rows to the end to find, for example, id=7?
Add INDEX(ID) to your tables (both), if you have not already.
SELECT COUNT(*) FROM TOTAL WHERE ID=7 -> if ID is indexed, this will be fast.
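For example (index names are placeholders):

ALTER TABLE STAT  ADD INDEX idx_stat_id (ID);
ALTER TABLE TOTAL ADD INDEX idx_total_id (ID);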
You can add an index, and furthermore you can partition your table.
As per @ypercube's comment, tables are not stored in a sorted state, so one cannot "tell" this to the database. However, you can add an index on the tables to make them faster to search.
One important thing to check: it looks like TOTAL.ID is intended as a foreign key. If so, the table TOTAL should have its own primary key called ID; rename the existing column to STAT_ID instead, so it is obvious what it is a foreign key for. Then add an index on STAT_ID.
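A sketch of those changes (RENAME COLUMN needs MySQL 8.0; on older versions use CHANGE COLUMN with the full column definition):

ALTER TABLE TOTAL RENAME COLUMN ID TO STAT_ID;
ALTER TABLE TOTAL ADD INDEX idx_total_stat_id (STAT_ID);

-- Optionally, enforce the relationship as well
ALTER TABLE TOTAL
    ADD CONSTRAINT fk_total_stat FOREIGN KEY (STAT_ID) REFERENCES STAT (ID);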
Lastly, as a point of style, I recommend that you make your table and column names case-insensitive and write them in lower case. It makes SQL easier to read when keywords are in upper case and database objects are in lower case.