Best way to store a user's "favorites" in MySQL

I have a photo gallery. I want to add an "Add to favorites" button so a user can add another user to his/her favorites. I then want each user to be able to view his list of favorite users, as well as to see the list of users who added him to their favorites.
I found two ways, and the first is:
faver_id  faved_id
1         10
1         31
1         24
10        1
10        24
I don't like this method because of 1) a lot of repetition, and 2) a very large table in the future (if I have just 1001 users and each likes the other 1000 users, that's 1,001,000 records), which I suppose will slow down my database.
The second way is:
user_id  favs
1        1 23 34 56 87 23
10       45 32 67 54 34 88 101
I can take these favs and explode() them in PHP, or check whether a user likes some other user with a MySQL query like: select count(user_id) from users where favs LIKE '% 23 %' and user_id=10;
But I feel the second way is not very "correct" in MySQL terms.
Can you advise me?

Think about this. Your argument against using the first approach is that your tables might get too big, but you then go on to say that if you use the second approach you could run a wildcard query to find fields which contain something.
The second approach forces a full table scan and cannot use an index. With the first approach, you just slap indexes on each of your columns and you're good to go. The first approach scales much, much, much better than the second one. Since scaling seems to be your only concern with the first, I think the answer is obvious.
Go with the first approach. Many-to-Many tables are used everywhere, and for good reason.
Edit:
Another problem is that the second approach hands a lot of the work of maintaining the database off to the application. This is fine in some cases, but the cases you're talking about are things that the database excels at. You would only be reinventing the wheel, and badly.
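For illustration, here is a minimal sketch of the first approach in MySQL, using the column names from the question (the table name and index name are my assumptions):

CREATE TABLE favorites (
    faver_id INT UNSIGNED NOT NULL,   -- the user who clicked "Add to favorites"
    faved_id INT UNSIGNED NOT NULL,   -- the user who was added
    PRIMARY KEY (faver_id, faved_id), -- prevents duplicates and serves "whom did user X favorite?"
    KEY idx_faved (faved_id)          -- serves "who favorited user X?"
);

-- Users that user 10 added to favorites:
SELECT faved_id FROM favorites WHERE faver_id = 10;

-- Users who added user 24 to their favorites:
SELECT faver_id FROM favorites WHERE faved_id = 24;

Both lookups are resolved from an index; no LIKE '%...%' scan is needed.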

Definitely go with the first way.

Well, the second way is not that easy when you want to remove or make changes, but it's all right in terms of MySQL.
Though Joomla even stores different kinds of data in a single field called params.


Horizontal vs vertical data approach in MySQL

I am creating an analytics module in our 'Tours & Travels' application.
These are the steps a user goes through in our application:
Step 1: The user searches tours for a city.
Step 2: The user views the details of a tour.
Step 3: If the user finds the perfect tour, he/she books it.
Step 4: While booking the tour, the user enters passenger details.
Step 5: The user reviews the final data.
Step 6: The user pays online and the tour gets booked.
Now I want to store each user activity in our system for analysis purposes. For this I have the table structure below:
Id   user_id  tour_id  city_id  searched_at          viewed_at            entered_pax_info_at  reviewed_at          booked_at
151  34       678      1290     2021-03-14 12:00:00  2021-03-14 12:05:00  2021-03-14 12:10:00  2021-03-14 12:15:00  2021-03-14 12:20:00
Now, while analyzing the data from this structure, an admin user may want data filtered on any of the columns below:
searched_at
viewed_at
entered_pax_info_at
reviewed_at
booked_at
E.g. an admin user can ask for data like: "Give me a report of tour 'ABC' that got booked from Jan 2021 to March 2021."
Now, to make such searches efficient on huge amounts of data, I would have to put an index on each of the columns mentioned above. With those indexes there would be no efficiency problem while reading the data, but they would cost me on write and update operations.
To counter the above problem I am considering the table structure below:
id  user_id  tour_id  city_id  activity_type  date
50  34       678      1290     searched       2021-03-14 12:00:00
51  34       678      1290     viewed         2021-03-14 12:05:00
52  34       678      1290     pax_info       2021-03-14 12:10:00
53  34       678      1290     reviewed       2021-03-14 12:15:00
54  34       678      1290     booked         2021-03-14 12:20:00
Now, to make searches efficient on this table structure, I may only have to put indexes on the activity_type and date columns.
But as I see it, the disadvantage of this structure is that it takes more space than the first approach.
I am left confused about which approach (of the two above, or any other) will be future proof in terms of scalability and efficiency.
Any help sorting this out would be appreciated.
Your second alternative is far better than your first. It allows your system to be flexible about the number of steps you will analyze, for one thing. Normalized (vertical) tables almost always scale up better than denormalized (horizontal) tables.
And about the space used by your tables and indexes? Fuggedaboudit! Disk / SSD space is really cheap, and getting exponentially cheaper by the month.
Unless your system already has tens of millions of rows AND your database administrator is pressuring you to denormalize your tables for performance's sake, do not worry about the size of your tables. Seriously.
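As a sketch of what the second (vertical) structure could look like in MySQL, using the question's column names but with date renamed to activity_date for clarity; the composite indexes are my assumption, not something given in the question:

CREATE TABLE user_activity (
    id            BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id       INT UNSIGNED NOT NULL,
    tour_id       INT UNSIGNED NOT NULL,
    city_id       INT UNSIGNED NOT NULL,
    activity_type VARCHAR(20) NOT NULL,  -- 'searched', 'viewed', 'pax_info', 'reviewed', 'booked'
    activity_date DATETIME NOT NULL,
    KEY idx_type_date (activity_type, activity_date),
    KEY idx_tour_type_date (tour_id, activity_type, activity_date)
);

-- "Report of tour 678 booked from Jan 2021 to March 2021":
SELECT *
FROM user_activity
WHERE tour_id = 678
  AND activity_type = 'booked'
  AND activity_date >= '2021-01-01'
  AND activity_date <  '2021-04-01';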
The analytic database should not be the operational database. In fact, I often work with analytic databases that are batch loaded and rarely -- if ever -- see updates. Typically, analysts don't like their data changing under them while they are solving a problem.
In other words, either you need to rethink your approach or you have not described the full problem.
The first table you described looks like a good summary table for users that might be quite appropriate for analysts. It is not appropriate as an operational store for the data. In the world I live in, people are not so consistent about their searches. They search for the best tour in one city, find the price and other details, go back and check others. And so on. This is "navigation" and "path analysis", which your structure does not allow.
Such a summary table can be produced in a batch process. Even on a relatively large amount of data, that might take just a minute or two and it might be sufficient to do it once per day. If so, problem solved. There are no updates. The indexes are the ones needed on the analytic side.
On the other hand, there is a lot of analysis that this structure does not support. For instance, how many cities did a user look at before deciding on the final city? Well, maybe you could eke out the answer to that question.
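For example, a nightly batch could rebuild the wide summary table from the normalized activity rows. A sketch, assuming the hypothetical user_activity table above and a user_tour_summary table shaped like the question's first structure:

-- Batch job: rebuild the wide summary from the normalized activity log.
TRUNCATE TABLE user_tour_summary;

INSERT INTO user_tour_summary
    (user_id, tour_id, city_id, searched_at, viewed_at, entered_pax_info_at, reviewed_at, booked_at)
SELECT user_id, tour_id, city_id,
       MAX(CASE WHEN activity_type = 'searched' THEN activity_date END),
       MAX(CASE WHEN activity_type = 'viewed'   THEN activity_date END),
       MAX(CASE WHEN activity_type = 'pax_info' THEN activity_date END),
       MAX(CASE WHEN activity_type = 'reviewed' THEN activity_date END),
       MAX(CASE WHEN activity_type = 'booked'   THEN activity_date END)
FROM user_activity
GROUP BY user_id, tour_id, city_id;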
I think you need 2 tables:
one for browsing activity. In this situation, you probably should not even identify the user; let them be anonymous.
one for booking, paying, etc. (Probably more than one table, due to normalization, etc.)
The browsing table probably only gets INSERTs, and lots of them. If there will be many millions of rows, then we should talk about "summary tables". You don't have to decide on what, exactly, they summarize, but instead wait until the admins have requested some "reports".
The booking table(s) will have fewer INSERTs and possibly more UPDATEs than INSERTs.
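A rough sketch of that two-table split (all names are assumptions, and the booking side would normally be several normalized tables):

-- High-volume, insert-only browsing log; the user is not identified.
CREATE TABLE browse_log (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    tour_id    INT UNSIGNED NOT NULL,
    city_id    INT UNSIGNED NOT NULL,
    action     ENUM('searched', 'viewed') NOT NULL,
    created_at DATETIME NOT NULL
);

-- Lower-volume booking data, tied to the user and updated as the booking progresses.
CREATE TABLE booking (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id    INT UNSIGNED NOT NULL,
    tour_id    INT UNSIGNED NOT NULL,
    status     ENUM('pax_info', 'reviewed', 'booked', 'paid') NOT NULL,
    updated_at DATETIME NOT NULL
);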

Two sets of sequences: how do I reset them when a record is deleted?

The other similar questions do not solve my problem. This has nothing to do with the PK.
My app, built with Web2Py, is for salespeople to make quotes. I have products with a monthly cost, products with a purchase cost, and some with both.
The output is two separate tables (monthly and purchase), and the salespeople want to be able to change the order in which products appear on the quote. The products also need to be numbered sequentially in the output.
However, a product in an offer may have only a monthly cost, only a purchase cost, or both. The order columns look like this:
1  0
2  1
3  0
0  2
0  3
0  4
Which is all fine until a product needs to be removed from an offer.
If, for example, the second item is deleted, I need to update both sequences.
The sequences are not very long: 20 items at most.
Is there a better way to store the ordering? If not, is there a neater solution than retrieving and updating every record in an offer?
The short answer is no. Thanks CL for the link.
This solves the problem, but as the article states, it is a method that will eventually break; however, that is very unlikely in this situation. A user would have to spend a very long time changing the order for it to happen, and each sequence is only available to one user and his management.
I have decided to use the divide-by-2 method, as it reduces the database load, and to follow up with maintenance that re-does the sequences, as suggested in the article.
Considering the scope of the project, I feel the solution fits.
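For the record, a sketch of the midpoint ("divide by 2") idea in plain MySQL; the table and column names are made up, and in practice Web2Py's DAL would issue the equivalent queries. Orders are stored with gaps (100, 200, 300, ...), a move only touches one row, and the occasional renumbering is the maintenance step mentioned above:

-- Move item 7 between the rows currently ordered 100 and 200:
UPDATE quote_item
SET monthly_order = (100 + 200) DIV 2   -- 150; no other rows need updating
WHERE id = 7;

-- Maintenance (MySQL 8.0+): renumber one offer's monthly sequence back to clean gaps.
UPDATE quote_item qi
JOIN (
    SELECT id, 100 * ROW_NUMBER() OVER (ORDER BY monthly_order) AS new_order
    FROM quote_item
    WHERE offer_id = 42 AND monthly_order > 0
) ranked ON ranked.id = qi.id
SET qi.monthly_order = ranked.new_order;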

The right way to plan my database

I'm creating a music sharing site, so each user can set up his account, add songs, etc.
I would like to add the ability for users to give points to one another based on whether they like the song.
For example, user1 has some songs in his collection; user2 likes a song, so he clicks "I like", resulting in a point for user1.
Now I would like to know whether my idea of creating a "points" table in my database is right and correct.
I decided to create a separate table to hold data about points. This table would have an id column, columns for who gave the point to whom, a song id column, a date column, etc. My concern is that this table will have a row for every single point that has ever been given.
Of course it's nice to have all this specific info, but I'm not sure if this is the right way to go, or whether I'm wasting resources, space, and so on.
Maybe I could redesign my songs table to have an additional points column, and I would just count how many points each song has.
I need some advice on this. Maybe I shouldn't really worry about my design, optimization and scalability, since today's technology is so fast and powerful and database queries are nearly instant...
IMO, it's better to use a transactional table to track the points given to a user based on their song lists. Consider how Stack Overflow (SO) works: if you up-vote a question or an answer, you can remove your vote at a later time. If SO used a summation column, it would be impossible to support this type of functionality.
I wouldn't worry too much about the number of rows in your points table, as it will probably be pretty narrow: 10 columns at the most, generously. Not to mention the table would be a pivot table between users, so it would consist mostly of int values.
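A sketch of such a transactional points table (the names and column choices are mine, not from the question):

CREATE TABLE song_points (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    from_user  INT UNSIGNED NOT NULL,   -- who gave the point
    to_user    INT UNSIGNED NOT NULL,   -- who received it (the song's owner)
    song_id    INT UNSIGNED NOT NULL,
    created_at DATETIME NOT NULL,
    UNIQUE KEY uniq_vote (from_user, song_id)  -- one point per user per song
);

-- Taking the point back later, SO-style, is a single row:
DELETE FROM song_points WHERE from_user = 7 AND song_id = 99;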
Part of the issue is really simple. If you need to know
who gave a point
to whom
for which song
on which date
then you need to record all that information.
Wasn't that simple?
If you only need to know the totals, then you can just store the totals.
As for scale, say you have 20,000 users, each with an average of 200 songs. Let's say 1 in 10 songs gets any up votes, averaging 30 per song. That's 4 million user-songs; 400,000 of them get up votes, and at 30 per song you have 12 million rows. That's not that many. If the table gets too many rows, partitioning on "to whom" would speed things up a lot.
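If the site mostly displays totals, they can still be derived (or cached) from the detail rows; a sketch against the hypothetical song_points table above:

-- Points per song for the songs owned by user 123:
SELECT song_id, COUNT(*) AS points
FROM song_points
WHERE to_user = 123
GROUP BY song_id;

-- Grand total for user 123's profile page:
SELECT COUNT(*) AS total_points FROM song_points WHERE to_user = 123;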

Where to split Directory Groupings? A-F | G-K | L-P

I'm looking to build a "quick link" directory access widget.
e.g. (option 1)
0-9 | A-F | G-K | L-P | Q-U | V-Z
Where each would be a link into the sub-chunk of the directory starting with those characters. The widget itself would be used in multiple places for looking up contacts, companies, projects, etc.
Now, for the programming part... I want to know if I should split as above...
0-9 | A-F | G-K | L-P | Q-U | V-Z
10+    6     5     5     5     5
This split is fairly even and logically grouped, but what I'm interested to know is whether there is a more optimal split based on the quantity of typical results starting with each letter (option 2).
e.g. very few items will start with "Q".
(Note: this is currently for a "North American/English" deployment.)
Does anyone have any stats that would back up reasons to split differently?
Likewise, for usability, how do users like/dislike this type of thing? I know that mentally, if I am looking for, say, "S", it takes me a second to recall that it falls in the Q-U section.
Would it be better to do a big list like this? (option 3)
#|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
I would suggest one link per letter, hiding the letters that don't have any results (if that doesn't ask for too much processing power).
As a user I would most definitely prefer one link per letter.
But better (for me as a user) would be a search box.
I think you're splitting the wrong things. You shouldn't evenly split letters, you should evenly split the results (as best as you can).
If you want 20 results per page, and A has 28 while B-C have 15, you'll want to have
A
B-C
and so on.
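To split by result counts you first need the distribution of first characters; a quick sketch against a hypothetical contacts table:

-- How many entries start with each character (digits can be lumped into "0-9" in the application):
SELECT UPPER(LEFT(name, 1)) AS first_char, COUNT(*) AS entries
FROM contacts
GROUP BY first_char
ORDER BY first_char;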
Additionally, you might have to consider why you are using alphabet chunking instead of something a bit more contextual. The problem with alphabet chunking is that users have to know the name of what they are looking for, and that name has to be the same as yours.
EDIT: We've tested this in lab conditions, and users locate information in chunks split by result count versus chunks split by number of letters in pretty much the same time.
EDIT 2: Chunking by letters almost always tests poorly. Think about whether there are better ways to do this.
Well, one of the primary usability considerations is evenly-distributed groups, so either your current idea (0-9, A-F, etc.) would work well, or the list with each individual letter. Having inconsistently-sized groups is a definite no-no for a user interface.
You probably definitely don't want to split across a number - that is, something like
0-4 | 5-B | ...
Besides that, I'd say just see where your data lies. Write a program to do groupings of two, three, four, five, etc. letters and see what the most even split is for each grouping. Pick the one that seems nicest. If you have sparse data, then having one link per letter might be annoying when there are only 1 or 2 directories under a given letter.
Then again, it depends what a typical user will be looking for. I can't tell what that might be just from your description - are they just navigating a directory tree?
I almost always use the last option, since it is by far the easiest to navigate for a user. Use it if you have enough room, and the other one if you have a limited amount of screen real estate.

Is it a good idea to have Multiple FKs in one field in MySQL?

I'm setting up a database and I'm interested in having a facebook-like friends system.
My original plan was to have a table like so:
uid  friends
4    30,23,12,33
30   54,92,108
with all these numbers being FKs to tables with user information.
I was told that this is inadvisable and practically impossible, since MySQL only handles FKs well when a column holds a single value, not a list.
So maybe something like this?
uid(PK)  friend
4        30
4        23
4        12
30       54
30       92
30       108
etc.
Won't this leave me with an enormous number of rows? (tens of thousands?)
Is the first technique not worth it in terms of time and efficiency?
Tens of thousands of rows is peanuts, even for MySQL. There is no other way to model a many-to-many relationship. You will have indexes on these ids, which perform better by many orders of magnitude than a substring comparison.
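For illustration, a sketch of the second layout with real foreign keys, assuming a users table whose primary key is an INT UNSIGNED id (the constraint and index names are made up):

CREATE TABLE friends (
    uid       INT UNSIGNED NOT NULL,
    friend_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (uid, friend_id),
    KEY idx_friend (friend_id),
    CONSTRAINT fk_friends_uid    FOREIGN KEY (uid)       REFERENCES users (id),
    CONSTRAINT fk_friends_friend FOREIGN KEY (friend_id) REFERENCES users (id)
);

-- All friends of user 4 -- an indexed lookup, no substring matching:
SELECT friend_id FROM friends WHERE uid = 4;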
I would say that the second way is indeed the "right" way to do it - and ultimately superior to the first way you mentioned in almost every way. And yes, it will leave you with an enormous number of rows.
If indexed, it should still be very fast though - up to a point (maybe hundreds of thousands or perhaps a few million rows). Beyond that and you'll want to start looking into partitioning or other more advanced techniques.
Not worth the time and efficiency? You will gain much more efficiency if you use the second method.
For a slightly more theoretical explanation of why this is bad, read http://en.wikipedia.org/wiki/First_normal_form