Let's assume there is a table with these columns:
-personID,
-personName,
-personInterests
There is also another table, which stores the interests:
-interestID
-interestName
One person can have multiple interests, so I put the serialize()-d or JSON representation of the interest array into the personInterests field. Each entry is not a string like "reading", but rather an index into the interests table, which stores the possible interests. Something like multiple foreign keys in one field.
The best way would be to use foreign keys, but it is not possible to put multiple references in one field...
How do I query such a structure without REGEX or splitting the field's content in software? If putting indexes into one field is not the way to go, how can I achieve a structure like this?
Storing multiple indexes or any other references in one field is strongly discouraged.
You have to create what I call a "rendezvous" table.
In your case it has:
- ID
- UserID (foreign key)
- InterestID (foreign key)
Every person can have multiple interests, so when a person adds a new interest, you just add a new row to this table. The row references the person and the desired interest through foreign keys that are NOT NULL.
On large-scale projects with many possible combinations, it is advisable not to give this table its own ID column, but to make the two foreign keys the composite primary key instead. Duplicates then become impossible, the table index is smaller, and lookups consume less computing power.
So the best solution is this:
- UserID (foreign key AND primary key)
- InterestID (foreign key AND primary key)
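The structure above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module in place of MySQL so that it runs anywhere, the table name person_interest and the helper functions are made up for the example, and the column names follow the question rather than the UserID naming used in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE person (
    personID   INTEGER PRIMARY KEY,
    personName TEXT NOT NULL
);
CREATE TABLE interest (
    interestID   INTEGER PRIMARY KEY,
    interestName TEXT NOT NULL
);
-- Junction table (hypothetical name): one row per (person, interest) pair,
-- no separate ID column; the two foreign keys form the primary key.
CREATE TABLE person_interest (
    personID   INTEGER NOT NULL REFERENCES person(personID),
    interestID INTEGER NOT NULL REFERENCES interest(interestID),
    PRIMARY KEY (personID, interestID)
);
INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO interest VALUES (10, 'reading'), (20, 'chess');
INSERT INTO person_interest VALUES (1, 10), (1, 20), (2, 20);
""")


def interests_of(person_id):
    """Return the interest names of one person with a plain join, no REGEX."""
    rows = conn.execute(
        """SELECT i.interestName
           FROM person_interest pi
           JOIN interest i ON i.interestID = pi.interestID
           WHERE pi.personID = ?
           ORDER BY i.interestName""",
        (person_id,),
    ).fetchall()
    return [name for (name,) in rows]


def add_interest(person_id, interest_id):
    """Insert one pair; return False if it is a duplicate or references a missing row."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO person_interest VALUES (?, ?)",
                (person_id, interest_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Adding an interest is a single INSERT, and the composite primary key rejects a second identical pair without any application-side check.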
I believe the only way you can implement this is to create a third table, which will actually get updated by a trigger (Similar to what Gabor Dani advised)
Table1
-personID,
-personName,
-personInterests
Table2
-interestID
-interestName
Table3
-personInterestID (AutoIncrement Field)
-personID
-interestID
Then you need to write a trigger that populates Table3 from the personInterests field. A stored procedure may be needed, because you will have to loop through all the values in the field.
I need the advice of someone who has greater experience.
I have an associative entity in my database, like that:
Table2-> CustomerID, ServiceID, DateSub
Since the same customer (with PK, for example, 1111) can request the same service (with PK, for example, 3) more than once, but never on the same date, the composite PK of Table2 can't be just (CustomerID, ServiceID).
Now I have 2 options:
1- "DateSub" also becomes part of the primary key, so the PK of Table2 will be (CustomerID, ServiceID, DateSub)
2- Create a dedicated PK for the associative entity (for example, Table2ID), so that CustomerID and ServiceID are just FKs
Which of the two approaches would you follow, and why? Thank you
First of all, you need to decide whether the combination of the CustomerID, ServiceID and DateSub columns is required to be unique. If so, you should go for the first option.
Otherwise I would go for second option.
With the first option, if DateSub is of the DATE data type, you will not be able to insert the same service for a customer twice on the same day. If it is DATETIME, that is possible.
If you want to reference this composite primary key as a foreign key in any other table, you need to carry all three columns there too.
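A minimal sketch of the first option, assuming the three-column composite key. It runs on Python's built-in sqlite3 rather than MySQL; the table name subscription and the helper subscribe are hypothetical, and the date is stored as ISO text because SQLite has no DATE type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE subscription (          -- hypothetical name for "Table2"
    CustomerID INTEGER NOT NULL,
    ServiceID  INTEGER NOT NULL,
    DateSub    TEXT    NOT NULL,     -- ISO date 'YYYY-MM-DD' (stand-in for DATE)
    PRIMARY KEY (CustomerID, ServiceID, DateSub)
)
""")


def subscribe(customer_id, service_id, date_sub):
    """Insert one subscription; return False if the composite key already exists."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO subscription VALUES (?, ?, ?)",
                (customer_id, service_id, date_sub),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = subscribe(1111, 3, "2024-01-15")     # accepted
same_day = subscribe(1111, 3, "2024-01-15")  # rejected: same customer, service and date
next_day = subscribe(1111, 3, "2024-01-16")  # accepted: different date
```

The same customer and service can appear on different dates, while a repeat on the same date is refused by the key itself.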
I tend to prefer the PK be "natural". You have 3 columns that, together, can uniquely define each row. I would consider using it.
The next question is what order to put the 3 columns in. This depends on the common queries. Please provide them.
An index (including the PK) can only be used starting from its leftmost column(s). It may be desirable to have some secondary key(s) for efficient access by other columns. Again, let's see the queries.
If you have a lot of secondary indexes, it may be better to have a surrogate, AUTO_INCREMENT "id" as the PK. Again, let's see the queries.
If you ever use a date range, then it is probably best to have DateSub last in any index. (There are rare exceptions.)
How many rows in the table?
The table is ENGINE=InnoDB, correct?
Reminder: The PRIMARY KEY is a Unique key, which is an INDEX.
DateSub is of datatype DATE, correct?
I am in a situation where I have to store key -> value pairs in a table that records which users have voted for which products.
UserId ProductID
1 2345
1 1786
6 657
2 1254
1 2187
As you can see, the userId keeps repeating, and so can the productId. I want to know the best way to represent this data. Also, is it necessary to use a primary key here? I've searched a lot but have not been able to find anything that addresses my exact problem. Any help would be appreciated. Thank you.
If you want to enforce that a given user can vote for a given product at most once, create a unique constraint over both columns:
ALTER TABLE mytable ADD UNIQUE INDEX (UserId, ProductID);
Although you can use these two columns together as a key, your app code is often simpler if you define a separate, typically auto increment, key column, but the decision to do this depends on which app code language/library you use.
If you have any tables that hold a foreign key reference to this table, and you intend to use referential integrity, those tables and the SQL used to define the relationship will also be simpler if you create a separate key column. Otherwise you end up carting multiple columns around instead of just one.
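The two suggestions above (a unique constraint over both columns plus a separate auto-increment key) can be combined as in the following sketch. It assumes Python's built-in sqlite3 instead of MySQL; the table name vote, the index name and the helper cast_vote are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vote (                                  -- hypothetical table name
    id        INTEGER PRIMARY KEY AUTOINCREMENT,     -- separate surrogate key
    UserId    INTEGER NOT NULL,
    ProductID INTEGER NOT NULL
);
-- One vote per user per product, enforced by the database.
CREATE UNIQUE INDEX vote_user_product ON vote (UserId, ProductID);
""")


def cast_vote(user_id, product_id):
    """Record a vote; return False if this user already voted for this product."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO vote (UserId, ProductID) VALUES (?, ?)",
                (user_id, product_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# The sample rows from the question, followed by one repeated vote.
sample = [(1, 2345), (1, 1786), (6, 657), (2, 1254), (1, 2187)]
results = [cast_vote(u, p) for u, p in sample]
repeat = cast_vote(1, 2345)
```

Repeating values in either column are fine; only a repeated pair is rejected.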
I want to have a lookup table that links two of the same things to each other. Say I have a 'Person' table and I want to look up the relationship between two people. Column one of the lookup will be 'PersonId1', column two 'PersonId2', and the third column 'Relationship'. Since the relationship goes both ways, I don't need duplicate records with the PersonIds switched. Is there any way to make MySQL enforce uniqueness on PersonId1 and PersonId2 combinations regardless of which order they're in?
Does that make sense?
Short answer: No.
Longer answer: You could set up a trigger to swap the order of the two person ids if the second were smaller than the first, then write them, and use a composite key.
Even longer answer: Not all interpersonal relationships are commutative (not all relationships go both ways). What about the "Employee" or "Mother" relationships? Even the "Friend" relationship, which is presumably peer-to-peer, might be better represented if you had separate rows saying A is B's Friend and B is A's Friend. So maybe you want a three-field composite key on this table.
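The "swap the order, then use a composite key" idea can be sketched as below. Note the substitution: instead of a trigger, this sketch orders the two ids in application code and adds a CHECK constraint so that unordered pairs cannot be stored at all. It assumes Python's built-in sqlite3 rather than MySQL, and the table name relationship and the helper relate are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE relationship (                 -- hypothetical table name
    PersonId1    INTEGER NOT NULL,
    PersonId2    INTEGER NOT NULL,
    Relationship TEXT    NOT NULL,
    PRIMARY KEY (PersonId1, PersonId2),
    CHECK (PersonId1 < PersonId2)           -- only the ordered form may be stored
)
""")


def relate(a, b, relationship):
    """Store the pair with the smaller id first; return False if it is rejected."""
    low, high = sorted((a, b))
    try:
        with conn:
            conn.execute(
                "INSERT INTO relationship VALUES (?, ?, ?)",
                (low, high, relationship),
            )
        return True
    except sqlite3.IntegrityError:
        return False


forward = relate(7, 3, "friend")   # stored as (3, 7)
backward = relate(3, 7, "friend")  # same pair in the other order: rejected
selfpair = relate(5, 5, "friend")  # rejected by the CHECK constraint
```

This only suits relationships that really are symmetric; for directed ones such as "Mother", separate rows per direction are the better fit, as the answer notes.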
You mean you want each combination of the PersonId1 and PersonId2 columns to be unique (regardless of the Relationship column)? If so, you can use a composite key (multi-column key).
Here's an example:
CREATE TABLE Person (
PersonId1 INT,
PersonId2 INT,
PRIMARY KEY (PersonId1, PersonId2)
)
+1 for the composite PK. To prevent duplicate combinations in either order, an extra varchar column holding, for example, personid1+personid2, with a unique constraint on it, may be a solution...
See also: person data model example
I'm designing a db table that will save a list of a user's favorited food items.
I created a favorite table with the following schema:
id, user_id, food_id
user_id and food_id will be foreign keys linking to other tables.
I'm just wondering if this is efficient and scalable, because if a user has multiple favorite things, then it will need multiple rows of data.
i.e. if a user has 5 favorited food items, then it will take five rows to save the list for that user.
Is this efficient and scalable? What's the best way to optimize this schema?
Thanks in advance!
tldr; This is called a "join table" and is the correct and scalable approach to model M-M relationships in a relational database. (Depending upon the constraints used it can also model 1-M/1-1 relationships in a "no NULL FK" schema.)
However, I contend that the id column should be omitted here so that the table is only user_id, food_id. The PK will be (user_id, food_id) in this case.
Unlike other tables, where surrogate (aka auto-increment) PKs are sometimes argued for, a surrogate PK generally only adds clutter in a join table as it has a very natural compound PK.
While the PK itself is compound in this case, each "joined" table only relates back by part of the PK. Depending upon queries performed it might also be beneficial to add covering indices on food_id or (food_id, user_id).
Eliminate Surrogate Key: Unless you have a specific reason for the surrogate key id, exclude it from the table.
Fine-tune Indexing: At this point, you just have a composite primary key that is the combination of the two foreign keys. In which order should the PK fields be?
If your application(s) predominantly execute queries such as: "for given user, give me foods", then PK should be {user_id, food_id}.
If the predominant query is "for given food, give me users", then the PK should be {food_id, user_id}.
If both query "directions" are common, add a UNIQUE INDEX that has the same fields as the PK, but in the opposite order. So you'll have the PK on {user_id, food_id} and an index on {food_id, user_id}.
Note that InnoDB tables are clustered, which eliminates (in this case "unnecessary") table heap. Yet, the secondary index discussed above will not cause a double-lookup (since it fully covers the query), nor have a hidden overhead of PK fields (since it indexes the same fields as PK, just in opposite order).
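A rough way to see both query directions being served by an index is sketched below. It assumes Python's built-in sqlite3 instead of MySQL/InnoDB; WITHOUT ROWID is used only to approximate a table clustered on its primary key, and the index name favorite_food_user and the helper plan are made up for the example. The exact wording of the query plan differs between SQLite versions, so only the index name is relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE favorite (
    user_id INTEGER NOT NULL,
    food_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, food_id)      -- serves "for given user, give me foods"
) WITHOUT ROWID;                        -- table stored in primary key order
-- Same fields in the opposite order: serves "for given food, give me users".
CREATE UNIQUE INDEX favorite_food_user ON favorite (food_id, user_id);
INSERT INTO favorite VALUES (1, 100), (1, 200), (2, 100);
""")


def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


foods_plan = plan("SELECT food_id FROM favorite WHERE user_id = 1")
users_plan = plan("SELECT user_id FROM favorite WHERE food_id = 100")
users = [u for (u,) in conn.execute(
    "SELECT user_id FROM favorite WHERE food_id = 100 ORDER BY user_id"
)]
```

Printing foods_plan and users_plan shows which access path each direction takes; in MySQL the equivalent check would be EXPLAIN on the same two queries.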
For more on designing a junction table, take a look at this post.
In my opinion, you can optimize your table in the following ways:
- As a relation table with 2 foreign keys, it does not need the "id" field.
- Use the InnoDB engine for your table.
- Name your relation table "user_2_food", which makes its purpose clearer.
- Use data types that are as small as possible, i.e. "smallint" is better than "int" when the value range allows it, and don't forget the "UNSIGNED" attribute.
Creating the below three Tables will result in an efficient design.
users : userId, username, userdesc
foods : foodId, foodname, fooddesc
userfoodmapping : ufid, userid, foodid, rowstate
The purpose of rowstate is that if the user later stops liking that food, its state is set to -1.
You have 2 options in my opinion:
- Get rid of the ID field, but in that case make your other two keys (combined) the primary key.
- Keep your ID key as the primary key for your table.
In either case, I think this is a proper approach. Once you get into a problem of inefficiency, then you will look at probably how to load part of the table or any other technique. This would do for now.
This is more of a design problem than a programming one.
I have a table where I store details about retail products:
Name Barcode BarcodeFormat etc...
----------------------------------------
(Name, Barcode, BarcodeFormat) are three columns that together uniquely identify a record in the table (candidate key). However, I have other tables that need an FK to this one, so I introduced an auto_increment column itemId and made that the PK.
My question is - should I have the PK as (itemId, Name, Barcode, BarcodeFormat) or would it be better to have PK(itemId) and UNIQUE(Name, Barcode, BarcodeFormat).
My primary concern is performance in terms of INSERT and SELECT operations but comments on size are also welcome.
I'm using an InnoDB table with MySQL.
Definitely: PK(itemId) and UNIQUE(Name, Barcode, BarcodeFormat).
You don't want the hassle of using a multi-part key for all your joins etc
You may one day have rows without barcode values which then won't be unique, so you don't want uniqueness hard-wired into your model (you can easily drop the unique without breaking any relationships etc)
The constraint on uniqueness is a business-level issue, not a database entity one: You'll always need a key, but you may not always need the business rule of uniqueness
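The recommended layout, PK(itemId) with a separate UNIQUE over the three business columns, might look like the sketch below. It assumes Python's built-in sqlite3 rather than MySQL; the child table stock, the index name item_natural_key and the helper add_item are hypothetical. The last step shows that the uniqueness rule can be dropped without touching the foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE item (
    itemId        INTEGER PRIMARY KEY AUTOINCREMENT,   -- surrogate key used by all FKs
    Name          TEXT NOT NULL,
    Barcode       TEXT,
    BarcodeFormat TEXT
);
-- Business rule kept separate from the key, so it can be removed later.
CREATE UNIQUE INDEX item_natural_key ON item (Name, Barcode, BarcodeFormat);
CREATE TABLE stock (                                   -- hypothetical referencing table
    itemId   INTEGER NOT NULL REFERENCES item(itemId), -- single-column foreign key
    quantity INTEGER NOT NULL
);
""")


def add_item(name, barcode, barcode_format):
    """Insert an item and return its itemId, or None if the unique rule rejects it."""
    try:
        with conn:
            cur = conn.execute(
                "INSERT INTO item (Name, Barcode, BarcodeFormat) VALUES (?, ?, ?)",
                (name, barcode, barcode_format),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


first_id = add_item("Milk 1L", "5901234123457", "EAN-13")
with conn:
    conn.execute("INSERT INTO stock VALUES (?, ?)", (first_id, 12))

duplicate = add_item("Milk 1L", "5901234123457", "EAN-13")   # rejected while the rule exists

with conn:
    conn.execute("DROP INDEX item_natural_key")               # business rule removed
after_drop = add_item("Milk 1L", "5901234123457", "EAN-13")  # now accepted
```

The stock table joins on one integer column throughout, and its foreign key is unaffected when the unique index is dropped.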
Unless you have millions of products, or very high throughput requirements it won't make much difference in terms of performance.
My preference is to have a surrogate PK (i.e. the auto increment column, your second option of PK(itemId) and UNIQUE(Name, Barcode, BarcodeFormat) ) because this is easier to manage if business keys change.
You have two candidate keys. We call the three-column compound key the 'natural key' and the auto_increment column (in this case) the 'surrogate key'. Both require unique constraints ('unique' in lower case to denote logical) at the database level.
Optionally, one candidate key may be designated 'primary'. The choice of which key (if any) should get this designation is arbitrary. Beware of anyone giving you definitive advice on this matter!
If you already add an itemId then you should use that as PK and have the other three columns with a UNIQUE.
If you don't have an itemId, then you could use the other columns as the PK, but it may become difficult to carry them everywhere. In this case that is not ideal, because a product is an entity and should have an id. If it were just a relationship, then it would be acceptable not to have an id column.