What is the right way to build a user favourites table (performance)? - MySQL

I guess the title isn't very descriptive, so I will explain. I have a table called users_favs that stores which posts a user has liked, which posts they have favourited, and the same for comments. The info is stored as a serialized array (or JSON, it doesn't matter).
Question: which is better? Keep it like this, or make 4 tables, one for each of these fields, and store plain user_id => post_id pairs instead of a serialized value?
My concern about the second option is that after some time these tables will be GIGANTIC. Also, I will need 4 queries (or JOINs) to get all of the info from them.

Keeping it in 1 table means that you'll only need 1 table access and 0 joins to get all the data. Storing it in 4 tables means at least 1 table access and n-1 joins when you need n fields of information. Your result set at the end of the query will probably be the same, so the amount of data sent over the network is independent of your table structure.

I presume a scenario where you have data for fav_categories and the other columns are null, and similarly for the columns fav_posts, liked_posts and liked_comments. So there is a high probability that in each row only three columns will have data most of the time (id, user_id and one of the rest). If my assumptions and the use cases are right, then I would definitely go for four tables.
To add to the above: you can always choose whether you want to make the design read-friendly or write-friendly.
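A minimal sketch of the separate-tables option, using Python's built-in sqlite3 module as a stand-in for MySQL so that it runs on its own. The table names liked_posts and fav_posts are hypothetical (the other two tables would look the same). A composite primary key on (user_id, post_id) keeps each pair unique and serves lookups by user, and UNION ALL fetches all of a user's favourites in one round trip rather than one query per table.

```python
import sqlite3

# SQLite stands in for MySQL; liked_posts and fav_posts are hypothetical names.
SCHEMA = """
CREATE TABLE liked_posts (
    user_id INTEGER NOT NULL,
    post_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, post_id)  -- one row per (user, post) pair
);
CREATE TABLE fav_posts (
    user_id INTEGER NOT NULL,
    post_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, post_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO liked_posts VALUES (?, ?)", [(1, 10), (1, 11), (2, 10)])
conn.executemany("INSERT INTO fav_posts VALUES (?, ?)", [(1, 11)])


def has_liked(conn, user_id, post_id):
    # Point lookup on the composite primary key.
    row = conn.execute(
        "SELECT 1 FROM liked_posts WHERE user_id = ? AND post_id = ?",
        (user_id, post_id),
    ).fetchone()
    return row is not None


def all_favourites(conn, user_id):
    # One round trip instead of one query per table; the first column says which list.
    return conn.execute(
        """
        SELECT 'fav_post', post_id FROM fav_posts WHERE user_id = ?
        UNION ALL
        SELECT 'liked_post', post_id FROM liked_posts WHERE user_id = ?
        ORDER BY 1, 2
        """,
        (user_id, user_id),
    ).fetchall()


print(has_liked(conn, 2, 11))   # False
print(all_favourites(conn, 1))  # [('fav_post', 11), ('liked_post', 10), ('liked_post', 11)]
```

Each table grows by one narrow row per like or favourite, which an index handles far better than one ever-growing serialized value that has to be rewritten on every change.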

Related

SQL: only get rows that match the full number in a comma-separated list

I'm working on something that shows shops under a specific category. However, I have an issue because I store the categories of a shop as category ids in a single field of the record, like "1,5,12". The problem is that if I want to show shops with category 2, it mistakes 12 for category 2. This is the SQL right now:
SELECT * FROM shops WHERE shop_cats LIKE '%".$sqlid."%' LIMIT 8
Is there a way to split the shop_cats value on commas in SQL, so that it checks the full number? The only way I can think of is to get all the shops and do it in PHP, but I don't like that, as it will take too many resources.
This is a really, really bad way to store categories, for many reasons:
You are storing numbers as strings.
You cannot declare proper foreign key relationships.
A (normal) column in a table should have only one value.
SQL has poor string functions.
The resulting queries cannot take advantage of indexes.
The proper way to store this information in a database is using a junction table, with one row per shop and per category.
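A minimal sketch of such a junction table, again with Python's sqlite3 standing in for MySQL. The names shop_categories and idx_shop_categories_category and the sample rows are hypothetical. Because the lookup is an exact integer comparison with a bound parameter (rather than concatenating $sqlid into the SQL string), category 2 can no longer match 12, foreign keys can be declared, and the reverse index lets "shops in category X" use an index.

```python
import sqlite3

# SQLite stands in for MySQL; shop_categories and the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE shops (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Junction table: one row per (shop, category) pair.
CREATE TABLE shop_categories (
    shop_id     INTEGER NOT NULL REFERENCES shops(id),
    category_id INTEGER NOT NULL REFERENCES categories(id),
    PRIMARY KEY (shop_id, category_id)
);
-- Reverse index so "shops in category X" can use an index too.
CREATE INDEX idx_shop_categories_category ON shop_categories (category_id, shop_id);
""")
conn.executemany("INSERT INTO categories VALUES (?, ?)",
                 [(1, "Food"), (2, "Books"), (5, "Toys"), (12, "Garden")])
conn.executemany("INSERT INTO shops VALUES (?, ?)", [(1, "Corner Shop"), (2, "Bookworm")])
# Shop 1 was previously stored as "1,5,12"; shop 2 as "2".
conn.executemany("INSERT INTO shop_categories VALUES (?, ?)",
                 [(1, 1), (1, 5), (1, 12), (2, 2)])


def shops_in_category(conn, category_id, limit=8):
    # Exact integer match, so category 2 cannot match 12.
    return conn.execute(
        """
        SELECT s.id, s.name
        FROM shops AS s
        JOIN shop_categories AS sc ON sc.shop_id = s.id
        WHERE sc.category_id = ?
        ORDER BY s.id
        LIMIT ?
        """,
        (category_id, limit),
    ).fetchall()


print(shops_in_category(conn, 2))   # [(2, 'Bookworm')]
print(shops_in_category(conn, 12))  # [(1, 'Corner Shop')]
```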
Sometimes, we are stuck with other people's really bad design decisions. If this is your case, then you can use FIND_IN_SET():
WHERE FIND_IN_SET($sqlid, shop_cats) > 0
But you should really fix the data structure.
If you can, the correct solution is to normalize the table, i.e. have a separate row per shop and category instead of a comma-separated list.
If you can't, this should do the work:
SELECT * FROM shops WHERE CONCAT(',' , shop_cats , ',') LIKE '%,".$sqlid.",%' LIMIT 8
The table shops does not follow 1NF (First Normal Form), i.e. every column should hold exactly one value. To fix that, you need to create another table, called a pivot (or junction) table, which relates the two tables (or entities). But to answer your question, the SQL query below should do the trick:
SELECT * FROM shops WHERE concat(',',shop_cats,',') LIKE '%,".$sqlid.",%' LIMIT 8
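The reason the comma-wrapping trick works can be checked on a toy table. This sketch uses Python's sqlite3 as a stand-in for MySQL; SQLite concatenates with || where MySQL uses CONCAT(), and the sample rows are made up. Wrapping both the stored list and the searched id in commas means only a whole id between delimiters can match. It still assumes there are no spaces after the commas, and a pattern with a leading % cannot use an index, so it remains a workaround.

```python
import sqlite3

# SQLite stands in for MySQL; sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shops (id INTEGER PRIMARY KEY, shop_cats TEXT)")
conn.executemany("INSERT INTO shops VALUES (?, ?)", [(1, "1,5,12"), (2, "2"), (3, "3,2")])


def naive(conn, cat_id):
    # The original approach: a substring match, so "2" also matches "12".
    pattern = "%" + str(int(cat_id)) + "%"
    return [r[0] for r in conn.execute(
        "SELECT id FROM shops WHERE shop_cats LIKE ? ORDER BY id", (pattern,))]


def delimited(conn, cat_id):
    # Wrap the list and the id in commas so only whole ids match.
    # MySQL equivalent: CONCAT(',', shop_cats, ',') LIKE '%,2,%'
    pattern = "%," + str(int(cat_id)) + ",%"
    return [r[0] for r in conn.execute(
        "SELECT id FROM shops WHERE ',' || shop_cats || ',' LIKE ? ORDER BY id", (pattern,))]


print(naive(conn, 2))      # [1, 2, 3]  (shop 1 matched only because of "12")
print(delimited(conn, 2))  # [2, 3]
```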

Storing list of users associated with a certain item

This is a bit hard to explain.
But I have built an app where users create what I like to call 'raffles', and other users then subscribe to them.
I have a table for the raffles, and I could add a column of type text to it and store all the users in it, separated by commas (,),
or I could create a separate table where users are added and associated with the raffle via another field called 'raffle_id' or something like it.
I'm not sure how efficient either of these methods will be in the long run or when scaling.
Some advice would be appreciated.
I would recommend against storing your user information in CSV format. The main reason is that CSV will make querying the table by user difficult. It will also make updates difficult. SQL databases were designed to handle relational data using tables. So in your case I would design the raffles table to look like this:
raffles (raffle_id, user_id)
And the data might look like this:
raffle_id  user_id
1          1
1          3
1          7
2          1
2          2
2          3
2          6
In other words, each record corresponds to a single raffle-user relation. Assuming that you only have a few dozen users and raffles happen every so often, this should scale fine. And if this raffles table ever gets too large at a much later date, you can archive a portion of it.
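A minimal sketch of that join table with Python's sqlite3 standing in for MySQL, loaded with the sample data from the answer above. The table name raffle_users is hypothetical; it leaves room for a separate raffles table holding per-raffle details. Listing a raffle's users, listing a user's raffles, and unsubscribing one user are each a single indexed statement, whereas a CSV column would need string parsing and a full rewrite of the value.

```python
import sqlite3

# SQLite stands in for MySQL; raffle_users is a hypothetical name for the join table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raffle_users (
    raffle_id INTEGER NOT NULL,
    user_id   INTEGER NOT NULL,
    PRIMARY KEY (raffle_id, user_id)  -- a user can subscribe to a raffle only once
);
-- Reverse index for "which raffles is this user in?"
CREATE INDEX idx_raffle_users_user ON raffle_users (user_id, raffle_id);
""")
# The sample data from the answer above.
conn.executemany("INSERT INTO raffle_users VALUES (?, ?)",
                 [(1, 1), (1, 3), (1, 7), (2, 1), (2, 2), (2, 3), (2, 6)])


def users_in(conn, raffle_id):
    rows = conn.execute(
        "SELECT user_id FROM raffle_users WHERE raffle_id = ? ORDER BY user_id", (raffle_id,))
    return [r[0] for r in rows]


def raffles_of(conn, user_id):
    rows = conn.execute(
        "SELECT raffle_id FROM raffle_users WHERE user_id = ? ORDER BY raffle_id", (user_id,))
    return [r[0] for r in rows]


def unsubscribe(conn, raffle_id, user_id):
    # A single-row delete; with a CSV column this would mean rewriting the whole string.
    conn.execute("DELETE FROM raffle_users WHERE raffle_id = ? AND user_id = ?",
                 (raffle_id, user_id))


print(users_in(conn, 1))    # [1, 3, 7]
print(raffles_of(conn, 3))  # [1, 2]
```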
See "What is the best way to add users to multiple groups in a database?"
Raffles are the "Groups". "UserInGroup" becomes UserInRaffle, your join table.

Redshift Usage - 1 row by 400 columns per user or (20-400) rows by 4 columns per user

We are building an analytics engine which has to store an attribute preference score for each user. We are expecting 400 attributes, and they may change (at what frequency is not known yet). We are planning to store this in Redshift.
My question is:
Should we store it as 1 row per user with 400 columns (1 column for each attribute),
or should we go for a table structure like
(uid, attribute id, attribute value, preference score), which will be (20-400) rows by 4 columns?
Which kind of storage would lead to better performance in Redshift?
Should we really consider NoSQL for this?
Note:
1. This is a backend for a real-time application with an increasing number of users.
2. For processing, the above table has to be read with the entire information of all attributes for one user, i.e. indirectly creating a 1*400 matrix at runtime.
Please help me decide which design would be ideal for such a use case. Thank you.
You can go for tables like the ones given in this example and then use bitwise functions:
http://docs.aws.amazon.com/redshift/latest/dg/r_bitwise_examples.html
For your problem, I would suggest a two-table design. It's more painful in the beginning but will help in the future.
The first table would be a key-value table, which stores all the base data and is future-proof: you can add or remove attributes, and this table will keep working.
The second table has N columns (400 in your case) and can be built from the first table. For the second table, you can start with a bare minimum set of columns, let's say only 50 of those 400, so that querying it is really fast. The structure of this table can be refreshed periodically to match the current reporting requirements. And you will always have the base table in case you need to backfill any data.
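One way to build the wide table from the key-value table is conditional aggregation (MAX(CASE ...) grouped by user), which is plain SQL. The sketch below runs it through Python's sqlite3 as a stand-in for Redshift; the table and column names are hypothetical, and the attribute_value column from the question is left out for brevity. The list of attribute ids decides which columns the wide table gets, so it can start with a small subset and be rebuilt when reporting needs change.

```python
import sqlite3

# sqlite3 stands in for Redshift; table/column names are hypothetical and the
# attribute_value column from the question is left out for brevity.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE user_attribute_scores (
    uid              INTEGER NOT NULL,
    attribute_id     INTEGER NOT NULL,
    preference_score REAL    NOT NULL,
    PRIMARY KEY (uid, attribute_id)
)""")
conn.executemany("INSERT INTO user_attribute_scores VALUES (?, ?, ?)",
                 [(1, 1, 0.9), (1, 2, 0.1), (2, 1, 0.4), (2, 3, 0.7)])


def build_wide_table(conn, attribute_ids):
    # Conditional aggregation: one output column per selected attribute,
    # NULL where a user has no score for that attribute.
    # int() keeps the generated SQL limited to numeric ids.
    cols = ",\n".join(
        f"MAX(CASE WHEN attribute_id = {int(a)} THEN preference_score END) AS attr_{int(a)}"
        for a in attribute_ids
    )
    conn.execute("DROP TABLE IF EXISTS user_scores_wide")
    conn.execute(
        f"CREATE TABLE user_scores_wide AS\n"
        f"SELECT uid,\n{cols}\n"
        f"FROM user_attribute_scores GROUP BY uid"
    )


build_wide_table(conn, [1, 2, 3])  # e.g. start with a small subset of the 400
for row in conn.execute("SELECT * FROM user_scores_wide ORDER BY uid"):
    print(row)
# (1, 0.9, 0.1, None)
# (2, 0.4, None, 0.7)
```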

How to efficiently design MySQL database for my particular case

I am developing a forum in PHP and MySQL. I want to make my forum as efficient as I can.
I have made these two tables
tbl_threads
tbl_comments
Now, the problem is that there is a Like and a Dislike button under each comment. I have to store the user_name of whoever clicked the Like or Dislike button, together with the comment_id. I have made a column user_likes and a column user_dislikes in tbl_comments to store the comma-separated user_names. But on this forum, I have read that this is not an efficient way. I have been advised to create a third table to store the Likes and Dislikes, so that my database design complies with 1NF.
But the problem is: if I make a third table tbl_user_opinion with two fields like this
1. comment_id
2. type (like or dislike)
will I have to run as many SQL queries as there are comments on my page to get the like and dislike data for each comment? Won't that be inefficient? I think there is some confusion on my part here. Can someone clarify this?
You have a relational schema like this:
There are two ways to solve this. The first, "clean" one is to build your "like" table and do COUNT(*)s on the appropriate column.
The second one is to store a counter in each comment, indicating how many ups and downs there have been.
If you want to check whether a specific user has voted on a comment, you only have to check one entry, which you can easily handle as its own query and merge the two results outside of your database. (For this, use a query returning the comment_id and the kind of vote the user has cast for each comment in a specific thread.)
Your approach with a comma-separated list does not perform well, because you cannot parse it without extra logic or a huge amount of string parsing. If you have a database, use it!
("One Information - One Dataset"!)
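The "clean" variant does not need one query per comment: a single GROUP BY query returns the counts for every comment on the page, and a second query returns the current user's own votes to merge in application code. A sketch with Python's sqlite3 standing in for MySQL; tbl_comments, tbl_user_opinion, comment_id, user_name and type come from the question, while thread_id, body and the sample rows are assumptions.

```python
import sqlite3

# sqlite3 stands in for MySQL; thread_id, body and the sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_comments (comment_id INTEGER PRIMARY KEY, thread_id INTEGER NOT NULL, body TEXT);
-- One row per vote; a user has at most one opinion per comment.
CREATE TABLE tbl_user_opinion (
    comment_id INTEGER NOT NULL REFERENCES tbl_comments(comment_id),
    user_name  TEXT    NOT NULL,
    type       TEXT    NOT NULL CHECK (type IN ('like', 'dislike')),
    PRIMARY KEY (comment_id, user_name)
);
""")
conn.executemany("INSERT INTO tbl_comments VALUES (?, ?, ?)",
                 [(1, 7, "first"), (2, 7, "second"), (3, 7, "third")])
conn.executemany("INSERT INTO tbl_user_opinion VALUES (?, ?, ?)",
                 [(1, "ann", "like"), (1, "bob", "like"), (1, "cy", "dislike"), (2, "ann", "dislike")])


def vote_counts_for_thread(conn, thread_id):
    # One query for the whole page: LEFT JOIN keeps comments without votes,
    # SUM(CASE ...) counts each vote type per comment.
    return conn.execute("""
        SELECT c.comment_id,
               SUM(CASE WHEN o.type = 'like' THEN 1 ELSE 0 END)    AS likes,
               SUM(CASE WHEN o.type = 'dislike' THEN 1 ELSE 0 END) AS dislikes
        FROM tbl_comments AS c
        LEFT JOIN tbl_user_opinion AS o ON o.comment_id = c.comment_id
        WHERE c.thread_id = ?
        GROUP BY c.comment_id
        ORDER BY c.comment_id""", (thread_id,)).fetchall()


def user_votes_in_thread(conn, thread_id, user_name):
    # comment_id -> vote type for one user, to merge with the counts in PHP.
    rows = conn.execute("""
        SELECT o.comment_id, o.type
        FROM tbl_user_opinion AS o
        JOIN tbl_comments AS c ON c.comment_id = o.comment_id
        WHERE c.thread_id = ? AND o.user_name = ?""", (thread_id, user_name))
    return dict(rows.fetchall())


print(vote_counts_for_thread(conn, 7))       # [(1, 2, 1), (2, 0, 1), (3, 0, 0)]
print(user_votes_in_thread(conn, 7, "ann"))  # {1: 'like', 2: 'dislike'}
```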
The comma-separated list violates the principle of atomicity, and therefore 1NF. You'll have a hard time maintaining referential integrity and, for the most part, querying as well.
Here is one way to do it in a normalized fashion:
This is very clustering-friendly: it groups up-votes belonging to the same comment physically close together (ditto for down-votes), making the following query rather efficient:
SELECT
COMMENT.COMMENT_ID,
<other COMMENT fields>,
COUNT(DISTINCT UP_VOTE.USER_ID) - COUNT(DISTINCT DOWN_VOTE.USER_ID) SCORE
FROM COMMENT
LEFT JOIN UP_VOTE
ON COMMENT.COMMENT_ID = UP_VOTE.COMMENT_ID
LEFT JOIN DOWN_VOTE
ON COMMENT.COMMENT_ID = DOWN_VOTE.COMMENT_ID
WHERE
COMMENT.COMMENT_ID = <whatever>
GROUP BY
COMMENT.COMMENT_ID,
<other COMMENT fields>;
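The query above can be tried on a toy dataset. In this sketch Python's sqlite3 stands in for MySQL, the <other COMMENT fields> placeholders are dropped, and the two plain COUNT columns are added only for illustration. Because both LEFT JOINs hang off the same comment, a comment with 2 up-votes and 3 down-votes produces 2 x 3 = 6 joined rows, which is why the score counts DISTINCT user ids; that intermediate growth is also a reason to measure on realistic data.

```python
import sqlite3

# sqlite3 stands in for MySQL; the sample votes are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COMMENT (COMMENT_ID INTEGER PRIMARY KEY, BODY TEXT);
CREATE TABLE UP_VOTE (COMMENT_ID INTEGER NOT NULL, USER_ID INTEGER NOT NULL,
                      PRIMARY KEY (COMMENT_ID, USER_ID));
CREATE TABLE DOWN_VOTE (COMMENT_ID INTEGER NOT NULL, USER_ID INTEGER NOT NULL,
                        PRIMARY KEY (COMMENT_ID, USER_ID));
INSERT INTO COMMENT VALUES (1, 'busy'), (2, 'quiet');
INSERT INTO UP_VOTE VALUES (1, 10), (1, 11);
INSERT INTO DOWN_VOTE VALUES (1, 20), (1, 21), (1, 22);
""")

QUERY = """
SELECT
    COMMENT.COMMENT_ID,
    COUNT(DISTINCT UP_VOTE.USER_ID) - COUNT(DISTINCT DOWN_VOTE.USER_ID) AS SCORE,
    COUNT(UP_VOTE.USER_ID)   AS JOINED_UP_ROWS,    -- inflated by the join fan-out
    COUNT(DOWN_VOTE.USER_ID) AS JOINED_DOWN_ROWS   -- likewise
FROM COMMENT
LEFT JOIN UP_VOTE ON COMMENT.COMMENT_ID = UP_VOTE.COMMENT_ID
LEFT JOIN DOWN_VOTE ON COMMENT.COMMENT_ID = DOWN_VOTE.COMMENT_ID
WHERE COMMENT.COMMENT_ID = ?
GROUP BY COMMENT.COMMENT_ID
"""

print(conn.execute(QUERY, (1,)).fetchone())  # (1, -1, 6, 6)
print(conn.execute(QUERY, (2,)).fetchone())  # (2, 0, 0, 0)
```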
Please measure on realistic amounts of data whether that works fast enough for you. If not, then denormalize the model: cache the total score in the COMMENT table and keep it current through triggers every time a new row is inserted into or deleted from the *_VOTE tables.
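A minimal sketch of that denormalization, written in SQLite trigger syntax and run through Python's sqlite3. MySQL's CREATE TRIGGER also uses FOR EACH ROW and NEW/OLD, but other details differ, so treat this as an outline rather than drop-in MySQL DDL. The SCORE column and the trigger names are hypothetical. Each insert or delete on a vote table adjusts the cached total, so reading a score becomes a primary-key lookup.

```python
import sqlite3

# SQLite trigger syntax; SCORE and the trigger names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COMMENT (COMMENT_ID INTEGER PRIMARY KEY, SCORE INTEGER NOT NULL DEFAULT 0);
CREATE TABLE UP_VOTE (COMMENT_ID INTEGER NOT NULL, USER_ID INTEGER NOT NULL,
                      PRIMARY KEY (COMMENT_ID, USER_ID));
CREATE TABLE DOWN_VOTE (COMMENT_ID INTEGER NOT NULL, USER_ID INTEGER NOT NULL,
                        PRIMARY KEY (COMMENT_ID, USER_ID));

CREATE TRIGGER up_vote_added AFTER INSERT ON UP_VOTE FOR EACH ROW
BEGIN UPDATE COMMENT SET SCORE = SCORE + 1 WHERE COMMENT_ID = NEW.COMMENT_ID; END;
CREATE TRIGGER up_vote_removed AFTER DELETE ON UP_VOTE FOR EACH ROW
BEGIN UPDATE COMMENT SET SCORE = SCORE - 1 WHERE COMMENT_ID = OLD.COMMENT_ID; END;
CREATE TRIGGER down_vote_added AFTER INSERT ON DOWN_VOTE FOR EACH ROW
BEGIN UPDATE COMMENT SET SCORE = SCORE - 1 WHERE COMMENT_ID = NEW.COMMENT_ID; END;
CREATE TRIGGER down_vote_removed AFTER DELETE ON DOWN_VOTE FOR EACH ROW
BEGIN UPDATE COMMENT SET SCORE = SCORE + 1 WHERE COMMENT_ID = OLD.COMMENT_ID; END;

INSERT INTO COMMENT (COMMENT_ID) VALUES (1);
""")


def score(conn, comment_id):
    # Reading the cached total is a single primary-key lookup.
    return conn.execute("SELECT SCORE FROM COMMENT WHERE COMMENT_ID = ?",
                        (comment_id,)).fetchone()[0]


conn.executemany("INSERT INTO UP_VOTE VALUES (?, ?)", [(1, 10), (1, 11)])
conn.execute("INSERT INTO DOWN_VOTE VALUES (1, 20)")
print(score(conn, 1))  # 1

conn.execute("DELETE FROM UP_VOTE WHERE COMMENT_ID = 1 AND USER_ID = 10")
print(score(conn, 1))  # 0
```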
If you also need to get which comments a particular user voted on, you'll need indexes on *_VOTE {USER_ID, COMMENT_ID}, i.e. the reverse of the primary/clustering key above.[1]
[1] This is one of the reasons why I didn't go with just one VOTE table containing an additional field that can be either 1 (for up-vote) or -1 (for down-vote): it's less efficient to cover with secondary indexes.

Need help on MySQL Database Structure

I have 200 users, and each user will eventually have a "reviewINFO" table with certain data.
Each user will have a review every 3 to 4 months,
so every review creates a new row inside the "reviewINFO" table.
This is where I'm stuck. I'm not sure whether I need to serialize a table inside each row or not.
Example (-> means "links to"):
"USER1reviewINFO"
    row1 -> USER1table1
    row2 -> USER1table2
    row3 -> USER1table3
    row4 -> USER1table4
    row5 -> USER1table5
"USER2reviewINFO"
    row1 -> USER2table1
    row2 -> USER2table2
    row3 -> USER2table3
    row4 -> USER2table4
    row5 -> USER2table5
Using this method, there will be a couple of thousand rows within two years, and I think it's harder to manage.
"USERxtablex" is a table with a dynamic number of rows holding children's names, ages and a boolean.
What I'm thinking of doing is serializing each USERxtable into its corresponding row.
Please help, as I would not like to make this complicated or inefficient.
Generally, you should never have to serialize data of this nature into a table row to accomplish your goal (which I am assuming is an implicit link between a user and a review).
What you need to do is key the reviews by a user_id such that all the reviews are packaged in one table, and relate numerically back to the users table.
Assuming you have an AUTO_INCREMENT primary key in the user table, all you would need is a user_id field in the reviews table that represents what user the review relates to. There is no need for a separate structure for each user, if that's what you are suggesting. Reviews can have date fields as well, so you can perform queries for a specific year or window of time.
You can then use a JOIN query to select out your data set relating to a particular user or review, and apply the usual WHERE clause to determine what result set you want to fetch.
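A minimal sketch of that layout, with Python's sqlite3 standing in for MySQL. All table and column names (users, reviews, review_children, review_date, flag) and the sample rows are hypothetical. The per-user "USERxtablex" tables become one review_children table keyed by review_id, so each child row belongs to exactly one review, and a single JOIN with a date window pulls one user's data for a given year.

```python
import sqlite3

# sqlite3 stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL);
CREATE TABLE reviews (
    review_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    review_date TEXT    NOT NULL  -- ISO 'YYYY-MM-DD', so text order is date order
);
CREATE INDEX idx_reviews_user_date ON reviews (user_id, review_date);
-- Replaces the per-user "USERxtablex" tables: one row per child per review.
CREATE TABLE review_children (
    review_id  INTEGER NOT NULL REFERENCES reviews(review_id),
    child_name TEXT    NOT NULL,
    age        INTEGER NOT NULL,
    flag       INTEGER NOT NULL CHECK (flag IN (0, 1))  -- the boolean column
);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "USER1"), (2, "USER2")])
conn.executemany("INSERT INTO reviews VALUES (?, ?, ?)",
                 [(1, 1, "2023-01-15"), (2, 1, "2023-05-02"), (3, 2, "2023-02-10")])
conn.executemany("INSERT INTO review_children VALUES (?, ?, ?, ?)",
                 [(1, "Sam", 7, 1), (2, "Sam", 7, 1), (2, "Alex", 4, 0), (3, "Kim", 9, 0)])


def children_for_user_in_year(conn, user_id, year):
    # JOIN users -> reviews -> review_children, filtered by user and a date window.
    return conn.execute(
        """
        SELECT u.name, r.review_date, rc.child_name, rc.age, rc.flag
        FROM users AS u
        JOIN reviews AS r ON r.user_id = u.user_id
        JOIN review_children AS rc ON rc.review_id = r.review_id
        WHERE u.user_id = ? AND r.review_date BETWEEN ? AND ?
        ORDER BY r.review_date, rc.child_name
        """,
        (user_id, f"{year}-01-01", f"{year}-12-31"),
    ).fetchall()


for row in children_for_user_in_year(conn, 1, 2023):
    print(row)
```

A few thousand rows in tables like these is very small for MySQL; the index on (user_id, review_date) keeps per-user, per-period lookups cheap as the data grows.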