In what situations should I use DB Triggers? - mysql

I have two tables, one which stores articles and the number of votes it has received:
| article_id | visitor_votes | member_votes | voting_opened |
-------------------------------------------------------------
|          1 |            12 |          394 | Y             |
|          3 |            94 |         5821 | Y             |
I also have another table which keeps track of which user voted for which article
| vote_id | user_id | article_id | date      |
-----------------------------------------------
|       1 |      12 |          1 | 7/28/2012 |
|       2 |      23 |          3 | 7/28/2012 |
One user can only place one vote per transaction. I currently use a trigger that increments the number of votes in the articles table every time a record is inserted into the votes table. Is this good practice, or should I be doing this in my application (a PHP web-based site)?

I also want to stop voting after a certain number of votes (voting_opened = N). Should I use a trigger to check whether the total votes (visitor_votes + member_votes) >= 6000 and then update the article row to set voting_opened = N? Or is this something I should be doing in my application as well?

I need a solution that is scalable, because I will have thousands of votes for possibly hundreds of articles, and I don't want to run into a case where the number of votes goes over the threshold because an update didn't happen quickly enough. Can someone shed some light on this scenario please?
Thank you!
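For reference, an insert trigger like the one the question describes might look roughly like this (a sketch only: the votes table name, the trigger name, and which counter to bump are assumptions, since the question does not show its actual trigger):
DELIMITER $$
CREATE TRIGGER votes_after_insert
AFTER INSERT ON votes
FOR EACH ROW
BEGIN
    -- member_votes is assumed here; choosing between visitor_votes and
    -- member_votes would depend on information not shown in the question
    UPDATE articles
    SET member_votes = member_votes + 1
    WHERE article_id = NEW.article_id;
END$$
DELIMITER ;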

Both solutions are valid and should work equally well.
You can try something like this in the application:
UPDATE articles SET
    visitor_votes = visitor_votes + 1,
    voting_opened = IF(visitor_votes + member_votes >= 6000, 'N', 'Y')
WHERE
    article_id = xxxx
    AND voting_opened = 'Y'
Then check the affected-row count, and if it is > 0, insert the row into the votes table.
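Putting it together, the whole vote could run in one transaction (a sketch; the votes table name and the literal values are placeholders, and the affected-row check happens in PHP, e.g. via mysqli_affected_rows() or PDOStatement::rowCount()):
START TRANSACTION;

UPDATE articles SET
    visitor_votes = visitor_votes + 1,
    voting_opened = IF(visitor_votes + member_votes >= 6000, 'N', 'Y')
WHERE article_id = 1
  AND voting_opened = 'Y';

-- only run the INSERT if the application saw an affected-row count > 0:
INSERT INTO votes (user_id, article_id, date) VALUES (12, 1, CURDATE());

COMMIT;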

Related

How to optimize an update query for multiple rows using MySQL and PHP

I have a table that has around 80,000 records. It has 4 columns:
| id     | code   | size  | qty  |
+--------+--------+-------+------+
| 1      | 4735   | M     | 5    |
| 2      | 8452   | L     | 2    |
...
| 81456  | 9145   | XS    | 13   |
The code column is unique.
I have to update the qty twice a day.
For that I'm using this query:
UPDATE stock SET qty = CASE id
WHEN 1 THEN 10
WHEN 2 THEN 8
...
WHEN 2500 THEN 20
END
WHERE id IN (1,2,...,2500);
I am splitting the query to update 2500 stocks at a time using PHP.
Here is (in seconds) how much it takes for each 2500 stocks to update:
[0]7.11
[1]11.30
[2]19.86
[3]27.01
[4]36.25
[5]44.21
[6]51.44
[7]61.03
[8]71.53
[9]81.14
[10]89.12
[11]99.99
[12]111.46
[13]121.86
[14]131.19
[15]136.94
[END]137
As you can see, it takes between 5 and 9 seconds to update 2500 products, which I think is quite a lot.
What can I change to speed things up?
Thank you!
Because the times seem to be getting longer the further along you get, I'd expect you need an index on the id field, as it looks suspiciously like it's doing a full table scan. You can create the index something like this:
CREATE INDEX my_first_index ON stock(id);
(I am having to add this as an answer because I can't make comments, I know it is more of a comment!!)
** EDIT **
I re-read and see your issue is bigger. I still think there is a chance that putting an index on id would fix it, but a better solution would be to have a new table for the id-to-quantity mappings; let's call it qty_mapping:
| id     | qty  |
+--------+------+
| 1      | 10   |
| 2      | 8    |
...
| 2500   | 20   |
Make sure to index id, and then you can change your update to:
UPDATE stock SET qty = (SELECT qm.qty FROM qty_mapping qm WHERE qm.id = stock.id);
It should be able to update the whole 80,000 records in next to no time.
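One caveat worth noting (my observation, not part of the answer): with the subquery form, any stock row whose id has no match in qty_mapping would get its qty set to NULL. A join-based update over the same assumed tables only touches matching rows:
UPDATE stock s
JOIN qty_mapping qm ON qm.id = s.id
SET s.qty = qm.qty;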

How can I implement a viewed system for my website's posts?

Here is my current structure:
// posts
+----+--------+----------+-----------+------------+
| id | title  | content  | author_id | date_time  |
+----+--------+----------+-----------+------------+
| 1  | title1 | content1 | 435       | 1468111492 |
| 2  | title2 | content2 | 657       | 1468113910 |
| 3  | title3 | content3 | 712       | 1468113791 |
+----+--------+----------+-----------+------------+
// viewed
+----+---------------+---------+------------+
| id | user_id_or_ip | post_id | date_time  |
+----+---------------+---------+------------+
| 1  | 324           | 1       | 1468111493 |
| 2  | 546           | 3       | 1468111661 |
| 3  | 135.54.12.1   | 1       | 1468111691 |
| 5  | 75            | 1       | 1468112342 |
| 6  | 56.26.32.1    | 2       | 1468113190 |
| 7  | 56.26.32.1    | 3       | 1468113194 |
| 5  | 75            | 2       | 1468112612 |
+----+---------------+---------+------------+
Here is my query:
SELECT p.*,
(SELECT count(*) FROM viewed WHERE post_id = :id) AS total_viewed
FROM posts p
WHERE id = :id
Currently I'm faced with huge data in the viewed table. Well, what's wrong with my table structure (or database design)? In other words, how can I improve it?
A website like Stack Overflow has almost 12 million posts. Each post has (on average) 500 views. So the number of rows in viewed should be:
12,000,000 * 500 = 6,000,000,000 rows
Hah :-) .. Honestly I cannot even read that number (and it keeps growing every second). Well, how does Stack Overflow handle the number of views for each post? Will it always calculate count(*) from viewed every time a post is shown?
You are not likely to need partitioning, redis, nosql, etc, until you have many millions of rows. Meanwhile, let's see what we can do with what you do have.
Let's start by dissecting your query. I see WHERE id=... but no LIMIT or ORDER BY. Let's add to your table
INDEX(id, date_time)
and use
WHERE id = :id
ORDER BY date_time DESC
LIMIT 10
Any index is sorted by what is indexed. That is, the 10 rows you are looking for are adjacent to each other. Even if the data has been pushed out of cache, there will probably be only one block needed to provide those 10 rows.
But a "row" in a secondary index in InnoDB does not contain the data to satisfy SELECT *. The index "row" contains a pointer to the actual 'data' row. So, there will be 10 lookups to get them.
As for view count, let's implement that a different way:
CREATE TABLE ViewCounts (
    post_id ...,
    ct MEDIUMINT UNSIGNED NOT NULL,
    PRIMARY KEY (post_id)
) ENGINE=InnoDB;
Now, given a post_id, it is very efficient to drill down the BTree to find the count. JOINing this table to the other, we get the individual counts with another 10 lookups.
So, you say, "why not put them in the same table"? The reason is that ViewCounts is changing so frequently that those actions will clash with other activity on Postings. It is better to keep them separate.
Even though we hit a couple dozen blocks, that is not bad compared to scanning millions of rows. And, this kind of data is somewhat "cacheable". Recent postings are more frequently accessed. Popular users are more frequently accessed. So, 100GB of data can be adequately cached in 10GB of RAM. Scaling is all about "counting the disk hits".
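To sketch how this might be used (the upsert and the join below are my own illustration, using the ViewCounts table from the answer and the posts table from the question): on every page view the counter can be maintained with a single upsert, and the original correlated subquery can become a join.
INSERT INTO ViewCounts (post_id, ct)
VALUES (:id, 1)
ON DUPLICATE KEY UPDATE ct = ct + 1;

SELECT p.*, vc.ct AS total_viewed
FROM posts p
JOIN ViewCounts vc ON vc.post_id = p.id
WHERE p.id = :id;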

MySQL: Performance on user settings stored in one row (flat) versus multiple rows (key-value pairs)

Which of the following two queries would be faster? more performant?
Setup A
userSetting table just includes all parameters as columns
userSettingId | userId | marketingEmail | weeklyEmail | pushNotifications
--------------------------------------------------------------------------
          120 |      1 |              1 |           1 | 0
select userSetting.userId, user.email
from userSetting
INNER JOIN user ON userSetting.userId = user.userId
where marketingEmail = 1;
or
Setup B
userBoolSetting table keeps key/value pairs, where the value is a boolean, 0 or 1
userBoolSettingId | userId | description       | value
-------------------------------------------------------
              121 |      1 | marketingEmail    | 1
              122 |      1 | weeklyEmail       | 1
              123 |      1 | pushNotifications | 0
select userBoolSetting.userId, user.email
from userBoolSetting
INNER JOIN user ON userBoolSetting.userId = user.userId
where description = 'marketingEmail'
AND value = 1;
Also, for the sake of clarity, I'd be looking at performance on a somewhat larger table than these examples. Which query would be most performant for a larger data set, say 50-100 parameters, not just the 3 shown?
As has been pointed out many times in this forum, splaying an array of things across columns is not good because it is unmaintainable, etc.
Performance of fetching a hundred rows is not bad.
Throwing them into a JSON string is another option.
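A rough sketch of that JSON option (my own illustration; it assumes MySQL 5.7+ and a settings column added to the user table):
ALTER TABLE user ADD COLUMN settings JSON;

UPDATE user
SET settings = JSON_OBJECT('marketingEmail', 1, 'weeklyEmail', 1, 'pushNotifications', 0)
WHERE userId = 1;

SELECT userId, email
FROM user
WHERE JSON_EXTRACT(settings, '$.marketingEmail') = 1;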

Select userids from mysql table if user has at least 3 fields filled

I have a MySQL user table that holds user data like this:
userid | title    | content
----------------------------------
     1 | about    | I am from ...
     1 | location | Norway
     1 | name     | Mark
     1 | website  |
     2 | about    |
     2 | location |
     2 | name     |
     2 | website  |
3 | ...
As you see the content is empty for userid 2, and also for many more users in the table.
My goal is to select only the userids that have at least 3 fields filled. All others should be ignored.
As my MySQL knowledge is still weak, I could not find a solution for this. I only found the opposite, and only with a count: Find the count of EMPTY or NULL columns in a MySQL table
What is the magic mysql query? Any help appreciated, thank you.
You would use aggregation and a having clause for this:
select u.userId
from users u
where content > '' and content is not null
group by u.userId
having count(*) >= 3;
I added the non-blank check as well as the null check. The null check is redundant, but it makes the intention clearer.

MySQL - how to optimize query to count votes

Just after some opinions on the best way to achieve the following outcome:
I would like to store in my MySQL database products which can be voted on by users (each vote is worth +1). I also want to be able to see how many times in total a user has voted.
To my simple mind, the following table structure would be ideal:
table: product       table: user          table: user_product_vote
+----+-------------+ +----+-------------+ +----+------------+---------+
| id | product     | | id | username    | | id | product_id | user_id |
+----+-------------+ +----+-------------+ +----+------------+---------+
| 1  | bananas     | | 1  | matthew     | | 1  | 1          | 2       |
| 2  | apples      | | 2  | mark        | | 2  | 2          | 2       |
| .. | ..          | | .. | ..          | | .. | ..         | ..      |
This way I can do a COUNT of the user_product_vote table for each product or user.
For example, when I want to look up bananas and the number of votes to show on a web page I could perform the following query:
SELECT p.product AS product, COUNT( v.id ) as votes
FROM product p
LEFT JOIN user_product_vote v ON p.id = v.product_id
WHERE p.id =1
If my site became hugely successful (we can all dream) and I had thousands of users voting on thousands of products, I fear that performing such a COUNT with every page view would be highly inefficient in terms of server resources.
A more simple approach would be to have a 'votes' column in the product table that is incremented each time a vote is added.
table: product
+----+-------------+-------+
| id | product     | votes |
+----+-------------+-------+
| 1  | bananas     | 2     |
| 2  | apples      | 5     |
| .. | ..          | ..    |
While this is more resource-friendly, I lose data (e.g. I can no longer prevent a person from voting twice, as there is no record of their voting activity).
My questions are:
i) am I being overly worried about server resources and should just stick with the three table option? (ie. do I need to have more faith in the ability of the database to handle large queries)
ii) is there a more efficient way of achieving the outcome without losing information?
You can never be too worried about resources. When you first start building an application you should always have resources, space, speed, etc. in mind; if your site's traffic grows dramatically and you never built with resources in mind, you will start running into problems.
As for the vote system, personally I would keep the votes like so:
table: product       table: user          table: user_product_vote
+----+-------------+ +----+-------------+ +----+------------+---------+
| id | product     | | id | username    | | id | product_id | user_id |
+----+-------------+ +----+-------------+ +----+------------+---------+
| 1  | bananas     | | 1  | matthew     | | 1  | 1          | 2       |
| 2  | apples      | | 2  | mark        | | 2  | 2          | 2       |
| .. | ..          | | .. | ..          | | .. | ..         | ..      |
Reasons:
Firstly, user_product_vote does not contain text, blobs, etc.; it's purely integer, so it takes up fewer resources anyway.
Secondly, you have more of a doorway to new features within your application, such as total votes in the last 24 hours, highest-rated product over the past 24 hours, etc.
Take this example for instance:
table: user_product_vote
+----+------------+---------+-----------+------+
| id | product_id | user_id | vote_type | time |
+----+------------+---------+-----------+------+
| 1  | 1          | 2       | product   |224.. |
| 2  | 2          | 2       | page      |218.. |
| .. | ..         | ..      | ..        | ..   |
And a simple query:
SELECT COUNT(id) AS total FROM user_product_vote WHERE vote_type = 'product' AND time BETWEEN ... AND ... ORDER BY time DESC LIMIT 20
Another thing: if a user voted at 1 AM and then tried to vote again at 2 PM, you can easily check when they last voted and whether they should be allowed to vote again.
There are so many opportunities that you will be missing if you stick with your incremental example.
In regards to your count(), no matter how much you optimize your queries it would not really make a difference on a large scale.
With an extremely large user base, your resource usage will be looked at from a different perspective, such as load balancers, server settings, Apache, caching, etc.; there's only so much you can do with your queries.
If my site became hugely successful (we can all dream) and I had thousands of users voting on thousands of products, I fear that performing such a COUNT with every page view would be highly inefficient in terms of server resources.
Don't waste your time solving imaginary problems. mysql is perfectly able to process thousands of records in fractions of a second - this is what databases are for. Clean and simple database and code structure is far more important than the mythical "optimization" that no one needs.
Why not mix and match both? Simply keep the final counts in the product and user tables, so that you don't have to count every time, and keep the votes table, so that there is no double voting.
Edit:
To explain it a bit further: the product and user tables will each have a column called "votes". Every time an insert into user_product_vote succeeds, increment the relevant user and product records. This avoids duplicate votes, and you won't have to run the complex count query every time either.
Edit:
Also, I am assuming that you have created a unique index on (product_id, user_id); in that case any duplicate attempt will automatically fail and you won't have to check the table before inserting. You will just need to make sure the insert query ran successfully and that you got a valid value for the "id" in the form of insert_id.
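A sketch of that setup (the unique key name and the literal ids are placeholders of mine):
ALTER TABLE user_product_vote
    ADD UNIQUE KEY uq_product_user (product_id, user_id);

INSERT INTO user_product_vote (product_id, user_id) VALUES (1, 2);
-- if the insert did not fail with a duplicate-key error, bump the cached totals:
UPDATE product SET votes = votes + 1 WHERE id = 1;
UPDATE user SET votes = votes + 1 WHERE id = 2;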
You have to balance the desire for your site to perform quickly (for which the second schema would be best) against the ability to count votes for specific users and prevent double voting (for which I would choose the first schema). Because you are only using integer columns in the user_product_vote table, I don't see how performance could suffer too much. Many-to-many relationships are common, as you have implemented with user_product_vote. If you do want to count votes for specific users and prevent double voting, a user_product_vote table is the only clean way I can think of implementing it, as anything else could result in sparse records, duplicate records, and all kinds of bad things.
You don't want to update the product table directly with an aggregate every time someone votes - this will lock product rows which will then affect other queries which are using products.
Assuming that not all product queries need to include the votes column, you could keep a separate productvotes table which would retain the running totals, and keep your userproductvote table as a means to enforce your per-user, per-product voting business rules and for auditing.
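A sketch of that separation (the column types and the exact names are my assumptions beyond what the answer mentions):
CREATE TABLE productvotes (
    product_id INT UNSIGNED NOT NULL PRIMARY KEY,
    votes INT UNSIGNED NOT NULL DEFAULT 0
) ENGINE=InnoDB;

-- a vote then touches only the audit table and this counter table, never the product row:
INSERT INTO userproductvote (product_id, user_id) VALUES (1, 2);
UPDATE productvotes SET votes = votes + 1 WHERE product_id = 1;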