MySQL Unique Combination Across Many Columns - mysql

I want to have a unique pairing of columns in MySQL however it can be in any order...
So if I have three columns
Person1 Person2 Person3
and they all are FOREIGN keys to a person and I want these only to be allowed once in the table...
The row:
1 2 3
is the same as
3 2 1
is the same as
2 3 1
etc etc...so just doing UNIQUE(Person1, Person2, Person3) does not accomplish this task I do not think.

My solution (given this schema) would be to make the order predictable. The easiest way is to store the ids in ascending order, so that the first column always contains the smallest value, the second the next smallest, and so on. With that convention a plain UNIQUE(Person1, Person2, Person3) is enough: the rows are easier to query and you will not be able to add two rows with the same members.
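A minimal sketch of that idea, using Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere. The table name person_group and the helper add_group are made up for illustration. The CHECK constraint is an extra safeguard that is not part of the answer above; as far as I know MySQL only enforces CHECK from 8.0.16 on, so on older versions the sorting has to be trusted to the application.

```python
import sqlite3

# SQLite stands in for MySQL here; table and helper names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person_group (
        Person1 INTEGER NOT NULL,
        Person2 INTEGER NOT NULL,
        Person3 INTEGER NOT NULL,
        UNIQUE (Person1, Person2, Person3),
        -- Reject rows that were not sorted before insertion.
        CHECK (Person1 < Person2 AND Person2 < Person3)
    )
    """
)


def add_group(conn, a, b, c):
    """Insert the three ids in ascending order; return False if rejected."""
    ordered = sorted((a, b, c))
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO person_group VALUES (?, ?, ?)", ordered)
        return True
    except sqlite3.IntegrityError:
        # Raised for a duplicate combination or a failed CHECK.
        return False
```

Calling add_group(conn, 1, 2, 3) and then add_group(conn, 3, 2, 1) stores one row and rejects the second call, because both are normalized to (1, 2, 3) before they reach the unique index. Note that the strict `<` in the CHECK also forbids the same person appearing twice in one row.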
Since your question is pretty generic, my answer is vague. If you provide an example of what you are trying to accomplish, there might be better ways. In my experience, when you find yourself having columns representing the same type of relation it's often better to normalize. Going from friend1, friend2, ... friendN to a table of friends per user, for example.

Related

Storing and referencing an immutable ordered list in a relational database

Background: I have a database containing parents and children's names (this is a simplification of the actual data, but the analogy is close enough).
Task: The database has to store an ordered list of children's names for each parent.
Assumptions:
The database will contain millions of parents, possibly more.
Parents typically have no more than 4 or 5 children, but rare (and even extreme) cases have to be supported as well.
Children's names (as well as their ordering) tend to repeat a lot. So parents should reference some children_names_list_id instead of keeping a copy of the actual names.
Children's names as well as their ordering for a particular parent are immutable.
Insertion of new parents will be very frequent. When a new parent is inserted with a list of his children, if such a list of names already exists in the database, the new parent should reference the existing list identifier.
Queries about names and their ordering should be possible (for example: find all parents that named a child "Bob" after naming a child "Alice", or find all parents that named a child "Alice", then had two more children, with the third named "Carol", etc.)
Questions:
What's the best way to store such lists? The solution should be robust and support fast parent insertions.
How should a parent reference the lists?
Current (proposed) solution:
My current approach is to have a table that maps children names to integer name ids (names are long, integers are short).
Then store name lists in the following tuples: <list_id> <order> <name_id> so the list table will look like this:
<list_id> <order> <name_id>
1 1 123
1 2 345
1 3 678
2 1 901
3 1 123
3 2 901
The example table contains three lists: [123,345,678], [901], [123,901] which might correspond to something like: ["Alice", "Bob", "Carol"], ["Dave"], ["Alice", "Dave"]
The parents table will then have a children_list_id column that references the list_id column.
This solution seems to be robust, except for two issues:
I'm not sure whether insertions will be fast enough (looking up whether an existing list already exists seems like it could be slow), but other approaches seem to be less robust or (much) harder to query.
The key to the name list table is composed of both the list_id and the order columns; the parents table has to reference only the list_id which should be a foreign key, but since list_id isn't a key by itself in the list table, an additional table of lists, in which list_id is key is needed. This seems cumbersome.
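To make the first concern concrete, here is one way the "does this exact list already exist?" lookup could be written against the <list_id> <order> <name_id> layout. It is only a sketch: SQLite (via Python's sqlite3) stands in for MySQL, the table name name_lists, the column name ord and the helper find_list_id are my own, and list 3 is stored as positions 1 and 2 so that it matches [123, 901].

```python
import sqlite3

# SQLite stands in for MySQL; name_lists mirrors <list_id> <order> <name_id>.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE name_lists (
        list_id INTEGER NOT NULL,
        ord     INTEGER NOT NULL,   -- "order" is a reserved word, hence "ord"
        name_id INTEGER NOT NULL,
        PRIMARY KEY (list_id, ord)
    );
    INSERT INTO name_lists VALUES
        (1, 1, 123), (1, 2, 345), (1, 3, 678),
        (2, 1, 901),
        (3, 1, 123), (3, 2, 901);
    """
)


def find_list_id(conn, name_ids):
    """Return the list_id whose rows equal name_ids exactly, or None."""
    if not name_ids:
        return None
    n = len(name_ids)
    # One "(ord = ? AND name_id = ?)" test per wanted (position, name) pair.
    match = " OR ".join("(ord = ? AND name_id = ?)" for _ in name_ids)
    pairs = [v for pair in enumerate(name_ids, start=1) for v in pair]
    # A list matches when it has exactly n rows and all n rows are wanted.
    row = conn.execute(
        f"""
        SELECT list_id
        FROM name_lists
        GROUP BY list_id
        HAVING COUNT(*) = ?
           AND SUM(CASE WHEN {match} THEN 1 ELSE 0 END) = ?
        """,
        [n] + pairs + [n],
    ).fetchone()
    return row[0] if row else None
```

As written this groups over the whole table, so it illustrates the logic rather than the performance; with millions of lists you would presumably want to narrow the candidates first (for example by the first name in the list) before comparing.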
Alternate solution:
The table of lists will store implicit ordering in the columns:
<list_id> <name_1> <name_2> <name_3> <name_4> ... <name_100>
1 111 222 333 null
2 444 null
3 555 111 null
In this table, list_id will be the primary key.
The parents table will keep the list_id as a foreign key.
This solution is somewhat less robust (how many columns do I create? 10? 20? 50?), but makes insertions much quicker. And since the list_id is a key, no additional tables are needed. A possible downside however is that some queries become much more complicated since they have to reference multiple columns.
Thanks!
The list table is over-design. Just have a Parents table, a Names table and a ParentChildren table. The ParentChildren table is just like your list table, except for a few details. It would look like:
<ParentId> <Order> <NameId>
1 1 123
1 2 345
1 3 678
2 1 901
3 1 123
3 2 901
I don't see a particular savings to storing independent lists. Just store the children for each parent.
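For what it's worth, the ordering queries from the question stay simple with this layout. The sketch below (SQLite via Python's sqlite3 standing in for MySQL) finds parents who named one child after another using a self-join on ParentChildren. The column name Ord, the helper name, and the name mapping (123 = Alice, 345 = Bob, 678 = Carol, 901 = Dave, taken from the question's example) are assumptions for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; "Ord" replaces the reserved word "Order".
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Names (
        NameId INTEGER PRIMARY KEY,
        Name   TEXT NOT NULL UNIQUE
    );
    CREATE TABLE ParentChildren (
        ParentId INTEGER NOT NULL,
        Ord      INTEGER NOT NULL,
        NameId   INTEGER NOT NULL REFERENCES Names (NameId),
        PRIMARY KEY (ParentId, Ord)
    );
    INSERT INTO Names VALUES
        (123, 'Alice'), (345, 'Bob'), (678, 'Carol'), (901, 'Dave');
    INSERT INTO ParentChildren VALUES
        (1, 1, 123), (1, 2, 345), (1, 3, 678),
        (2, 1, 901),
        (3, 1, 123), (3, 2, 901);
    """
)


def parents_with_later_child(conn, first_name, later_name):
    """Parents who named a child first_name and, later, a child later_name."""
    rows = conn.execute(
        """
        SELECT DISTINCT a.ParentId
        FROM ParentChildren AS a
        JOIN ParentChildren AS b
          ON b.ParentId = a.ParentId AND b.Ord > a.Ord
        JOIN Names AS na ON na.NameId = a.NameId
        JOIN Names AS nb ON nb.NameId = b.NameId
        WHERE na.Name = ? AND nb.Name = ?
        ORDER BY a.ParentId
        """,
        (first_name, later_name),
    ).fetchall()
    return [r[0] for r in rows]
```

Changing `b.Ord > a.Ord` to `b.Ord = a.Ord + 3` would cover the "Alice, then two more children, then Carol" style of question.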

Data normalization - mysql

I searched a bit, and my case looks like the usual "multiple tables vs comma separated values" question, but it is slightly different. I have used both approaches before, but for a new requirement I am a bit confused, because around 3-4 columns would contain comma-separated values. What would you suggest?
Example (Existing Scenario):
Student Table
---------------------
id | name | subjectsEnrolled | gamesEnrolled | BooksIssued | absentOn
1 | Naya | yoga, maths | cricket, football | aka#date, baka#date, caka#date | 20121010, 20130103
2 | sait | science, botany | football, chess, abc | prea#date, sunbaka#date | 20121010, 20130103
Possible Fixes I have thought
Student Table (as student master data)
id name moredetailsofstudent like address & blah blah
1 Naya
2 sait
then having another table absentTable
StudentId absentOn
1 20121010
1 20121011
2 20121010
2 20121011
1 20121012
then a master table "booksTable" with unique ids of books, plus a booksIssueTable
booksIssueTable
bookid issuedby dated
4 1 a date
2 1 a date
and almost the same for games (2 tables) and subjects (2 tables), stored the same way as I am doing above for books.
gameid enrolledby
4 1
2 12
4 10
3 12
and so on...
Am I going in the right direction? That is what I want to know from the database and coding experts here.
Note: I am not building a school management system; this is just an example, as I couldn't post real data. In the real scenario the main student table is large, around 10-20 lac (1-2 million) rows, so absentTable, booksIssueTable, gamesEnrollTable and subjectEnrollTable could each easily hold 10 lac records * 1-10, i.e. around 30-90 lac rows, if I go with this solution. Please suggest.
Update
So far I have learnt that I am going in the right direction. What would you suggest for the dates in absentTable? Should I create a master table for the dates too, and then use its dateid to link to absentTable?
Yes, normalizing is better than storing comma-separated values. I gave a popular answer for the pitfalls of comma-separated values here: Is storing a delimited list in a database column really that bad?
As for your other question, no I wouldn't make a lookup table for simple date values. The scenario where that might be useful would be when you have many rows in AbsentTable that have the same date, and you would want to change the value for all these rows by updating a single row somewhere else. But I expect your AbsentTable would never need to do this.
By the way, using an id to refer to an entry in another table has nothing to do with normalization. You can use natural values both as the primary key of the referenced table, and the foreign key in the referencing table, and still both tables comply with normal form.
You may also be interested in what I have written about comma-separated values and database normalization in my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
You're doing exactly what's expected to solve your problem. Even with large amounts of data, proper m-n tables will serve you much better than values separated by commas in the same field.
So good job in finding that solution for yourself. Can't suggest any better.
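As a rough illustration of the normalized layout discussed in this thread, here is a sketch with the date stored directly in absentTable (no lookup table for dates). SQLite via Python's sqlite3 stands in for MySQL, the table name student and the helper absent_students are hypothetical, and the sample rows come from the question.

```python
import sqlite3

# SQLite stands in for MySQL; in MySQL absentOn would be a DATE column.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE absentTable (
        StudentId INTEGER NOT NULL REFERENCES student (id),
        absentOn  TEXT NOT NULL,           -- stored as 'YYYY-MM-DD'
        PRIMARY KEY (StudentId, absentOn)  -- one row per student per day
    );
    INSERT INTO student VALUES (1, 'Naya'), (2, 'sait');
    INSERT INTO absentTable VALUES
        (1, '2012-10-10'), (1, '2012-10-11'), (1, '2012-10-12'),
        (2, '2012-10-10'), (2, '2012-10-11');
    """
)


def absent_students(conn, day):
    """Names of the students absent on the given day, ordered by id."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM absentTable AS a
        JOIN student AS s ON s.id = a.StudentId
        WHERE a.absentOn = ?
        ORDER BY s.id
        """,
        (day,),
    ).fetchall()
    return [r[0] for r in rows]
```

A question like "who was absent on a given day" becomes a plain indexed lookup here, whereas with a comma-separated absentOn column it would need string matching on every row. The composite primary key also stops the same absence from being recorded twice.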

Find first, second, third, and so forth record per person

I have a 1 to many relationship between people and notes about them. There can be 0 or more notes per person.
I need to bring all the notes together into a single field. Since there are not going to be many people with notes, and I plan to bring in only the first 3 notes per person, I thought I could do this using at most 3 queries to gather all my information.
My problem is in getting the MySQL query together to get the first, second, etc. note per person.
I have a query that lets me know how many notes each person has and I have that in my table. I tried something like
SELECT
f_note, f_person_id
FROM
t_person_table,
t_note_table
WHERE
t_person_table.f_number_of_notes > 0
AND t_person_table.f_person_id = t_note_table.f_person_id
GROUP BY
t_person_table.f_person_id
LIMIT 1 OFFSET 0
I had hoped to run this up to 3 times changing the OFFSET to 1 and then 2 but all I get is just one note coming back, not one note per person.
I hope this is clear, if not read on for an example:
I have 3 people in the table. One person (A) has 0 notes, one (B) with 1 and one (C) with 2.
First I would get the first note for person B and C and insert those into my person table note field.
Then I would get the second note for person C and add that to the note field in the person table.
In the end I would have notes for persons B and C, where the note field for person C would be a concatenation of their 2 notes.
Welcome to SO. The thing you're trying to do, selecting the three most recent items from a table for each person mentioned, is not easy in MySQL. But it is possible. See this question, and my answer to it:
Select number of rows for each group where two column values makes one group
Once you have a query giving you the three rows, you can use GROUP_CONCAT() ... GROUP BY to aggregate the note fields.
You can get one note per person using a nested query like this:
SELECT
f_person_id,
(SELECT f_note
FROM t_note_table
WHERE t_person_table.f_person_id = t_note_table.f_person_id
LIMIT 1) AS note
FROM
t_person_table
WHERE
t_person_table.f_number_of_notes > 0
Note that tables in SQL have no defined inherent order, so you should use some form of ORDER BY in the subquery. Otherwise, your results might be random, and repeated runs asking for different notes might unexpectedly return the same data.
If you only aim for a concatenation of notes in any case, then you can use the GROUP_CONCAT function to combine all notes into a single column.
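Putting the pieces from these answers together, a sketch of "first three notes per person, combined into one string" could look like this. It uses SQLite via Python's sqlite3 as a stand-in for MySQL. The f_note_id column is an assumption (the question does not say what defines "first"), and the sample data gives person C four notes, unlike the question's example, so that the limit of three is visible. The concatenation is done in Python to keep the order deterministic; in MySQL, GROUP_CONCAT with its own ORDER BY could do that step in SQL.

```python
import sqlite3

# SQLite stands in for MySQL; f_note_id is a hypothetical ordering column.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t_person_table (f_person_id INTEGER PRIMARY KEY);
    CREATE TABLE t_note_table (
        f_note_id   INTEGER PRIMARY KEY,
        f_person_id INTEGER NOT NULL REFERENCES t_person_table (f_person_id),
        f_note      TEXT NOT NULL
    );
    INSERT INTO t_person_table VALUES (1), (2), (3);   -- persons A, B, C
    INSERT INTO t_note_table VALUES
        (10, 2, 'b1'),
        (11, 3, 'c1'), (12, 3, 'c2'), (13, 3, 'c3'), (14, 3, 'c4');
    """
)

# Keep a note only if fewer than three notes of the same person precede it.
FIRST_THREE = """
    SELECT n.f_person_id, n.f_note
    FROM t_note_table AS n
    WHERE (SELECT COUNT(*)
           FROM t_note_table AS earlier
           WHERE earlier.f_person_id = n.f_person_id
             AND earlier.f_note_id < n.f_note_id) < 3
    ORDER BY n.f_person_id, n.f_note_id
"""


def first_notes(conn, separator="; "):
    """Map each person with notes to their first three notes, joined."""
    combined = {}
    for person_id, note in conn.execute(FIRST_THREE):
        combined.setdefault(person_id, []).append(note)
    return {pid: separator.join(notes) for pid, notes in combined.items()}
```

This needs a single query rather than one per OFFSET, and people without notes simply do not appear in the result.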

SQL schema design question around relating 2 different types of ID's to one piece of information

I'm working on redesigning some parts of our schema, and I'm running into a problem where I just don't know a good clean way of doing something. I have an event table such as:
Events
--------
event_id
for each event, there could be n groups or users associated with it. So there's a table relating Events to Users to reflect that one to many relationship such as:
EventUsers
----------
event_id
user_id
The problem is that we also have a concept of groups. We want to potentially tie n groups to an event in addition to users. So, that user_id column isn't sufficient, because we need to store potentially either a user_id or a group_id.
I've thought of a variety of ways to handle this, but they all seem like a big hack. For example, I could make that a participant_id and put in a participant_type column such as:
EventUsers
----------
event_id
participant_id
participant_type
and if I wanted to get the events that user_id 10 was a part of, it could be something like:
select event_id
from EventUsers
where participant_id = 10
and participant_type = 1
(assuming that somewhere participant_type 1 was defined to be a User). But I don't like that from a philosophical point of view because when I look at the data, I don't know what the number in participant_id means unless I also look at the value in participant_type.
I could also change EventUsers to be something like:
EventParticipants
-----------------
event_id
user_id
group_id
and allow the values of user_id and group_id to be NULL if that record is dealing with the other type of information.
Of course, I could just break EventUsers and we'll call it EventGroups into 2 different tables but I'd like to keep who is tied to an event stored in one single place if there's a good logical way to do it.
So, am I overlooking a good way to accomplish this?
Tables Events, Users and Groups represent the basic entities. They are related by EventUsers, GroupUsers and EventGroups. You need to union results together, e.g. the attendees for an event are:
select user_id
from EventUsers
where event_id = @event_id
union
select GU.user_id
from EventGroups as EG inner join
GroupUsers as GU on GU.group_id = EG.group_id
where EG.event_id = @event_id
Don't be shy about creating additional tables to represent different types of things. It is often easier to combine them, e.g. with union, than to try to sort out a mess of vague data.
Of course, I could just break EventUsers and we'll call it EventGroups into 2 different tables
This is the good logical way to do it. Create a junction table for each many-to-many relationship; one for events and users, the other for events and groups.
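Here is a small runnable sketch of the two-junction-table design, with the attendees of an event collected via a union over direct users and group members. SQLite via Python's sqlite3 stands in for MySQL; the sample ids and the helper name attendees are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; one junction table per many-to-many relation.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EventUsers (
        event_id INTEGER NOT NULL,
        user_id  INTEGER NOT NULL,
        PRIMARY KEY (event_id, user_id)
    );
    CREATE TABLE EventGroups (
        event_id INTEGER NOT NULL,
        group_id INTEGER NOT NULL,
        PRIMARY KEY (event_id, group_id)
    );
    CREATE TABLE GroupUsers (
        group_id INTEGER NOT NULL,
        user_id  INTEGER NOT NULL,
        PRIMARY KEY (group_id, user_id)
    );
    INSERT INTO EventUsers  VALUES (1, 10), (2, 11);
    INSERT INTO EventGroups VALUES (1, 100);
    INSERT INTO GroupUsers  VALUES (100, 10), (100, 12), (200, 13);
    """
)


def attendees(conn, event_id):
    """Users tied to an event directly or through one of its groups."""
    rows = conn.execute(
        """
        SELECT user_id
        FROM EventUsers
        WHERE event_id = :event_id
        UNION                      -- UNION removes duplicate user ids
        SELECT GU.user_id
        FROM EventGroups AS EG
        JOIN GroupUsers AS GU ON GU.group_id = EG.group_id
        WHERE EG.event_id = :event_id
        ORDER BY 1
        """,
        {"event_id": event_id},
    ).fetchall()
    return [r[0] for r in rows]
```

In this sample, user 10 is attached to event 1 both directly and through group 100, yet shows up once in the result, and every column holds exactly one kind of id.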
There's no single correct answer to this question (although I'm sure if you look hard enough you'll find some purists who believe that their approach is the correct one).
Personally, I'm a fan of the second approach because it allows you to give columns names that accurately reflect the data they contain. This makes your SELECT statements (in particular when it comes to joining) a bit easier to understand. Yeah, you'll end up with a bunch of NULL values in the column that is unused, but that's not really a big deal.
However, if you'll be joining on this table a lot, it might be wise to go with the first approach, so that the column you join on is consistently the same. Also, if you anticipate new types of participant being added in the future, which would result in a third column in EventParticipants, then you might want to go with the first approach to keep the table narrow.

Storing lots of user information into one MySQL table?

I'm working on a application where each user has to fill out an extensive profile for themselves.
The first part of the user profile consists of about 25 or so fields of general information
The next section of the user profile is a section where they evaluate themselves on a set list of criteria, e.g. "Rate how good you are at cooking", and then they tick a radio button from one to five. There is also a check box that they can check if they are 'extra interested' in the activity/subject they rated themselves on.
There are about 40 of these that they rate themselves on.
So my question is, how should I store this information, should there be columns in my users table for every field and item? This would be nearly 70 fields
or should I setup a table for user_profile, and user_self_evaluation, and have the columns for each in there and have a one-one relationship with the users?
Use separate tables. That way, when you update only the self-evaluation, you do not need to touch the user_profile table. The idea is to keep the frequently updated fields in one table and the rarely updated ones in another. If the table becomes large and the username/password live in a separate table, lookups by userid / username won't be affected by the volume of update queries, nor will you bring the whole site down if you alter the self_evaluation table.
But if you are planning to add new evaluations, I'd suggest a different design:
user_profile table with the 25 profile fields
self_evaluations table, with id and name, and any meta information about the question; with 1 record per evaluation
user_profile_evaluation with userid, evaluationid, score, extra - with one record for each evaluation of the user.
This way your schema will be much more flexible and you won't need to alter the table in order to add another evaluation.
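A sketch of that three-table idea, limited to the two evaluation tables. SQLite via Python's sqlite3 stands in for MySQL; the column types, the 1-5 CHECK and the helpers rate and ratings are my assumptions. INSERT OR REPLACE is SQLite syntax; in MySQL the closest equivalents would be REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3

# SQLite stands in for MySQL; helper names and column types are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE self_evaluations (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE user_profile_evaluation (
        userid       INTEGER NOT NULL,
        evaluationid INTEGER NOT NULL REFERENCES self_evaluations (id),
        score        INTEGER NOT NULL CHECK (score BETWEEN 1 AND 5),
        extra        INTEGER NOT NULL DEFAULT 0,   -- 1 = 'extra interested'
        PRIMARY KEY (userid, evaluationid)         -- one rating per question
    );
    INSERT INTO self_evaluations VALUES (1, 'cooking'), (2, 'singing');
    """
)


def rate(conn, userid, evaluation, score, extra=False):
    """Store or overwrite one user's rating for one named evaluation."""
    with conn:
        conn.execute(
            """
            INSERT OR REPLACE INTO user_profile_evaluation
            SELECT ?, id, ?, ? FROM self_evaluations WHERE name = ?
            """,
            (userid, score, int(extra), evaluation),
        )


def ratings(conn, userid):
    """Return {evaluation name: (score, extra)} for one user."""
    rows = conn.execute(
        """
        SELECT e.name, u.score, u.extra
        FROM user_profile_evaluation AS u
        JOIN self_evaluations AS e ON e.id = u.evaluationid
        WHERE u.userid = ?
        """,
        (userid,),
    ).fetchall()
    return {name: (score, extra) for name, score, extra in rows}
```

Adding a 41st criterion is then just a new row in self_evaluations, with no ALTER TABLE on a 70-column table.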
or should I setup a table for user_profile, and user_self_evaluation,
and have the columns for each in there and have a one-one relationship
with the users?
Yes, this is how you should do it, if you know you won't expand the table in the future. The other option (one table with roughly 70 columns) is a poor choice.
If you think you will expand the evaluations in the future, then you can do it like this:
user_self_evaluation table
user_id | evaluation_type | evaluation_value
1 | cooking | 5
1 | singing | 3
2 | cooking | 2
2 | singing | 5
Make the combination (user_id, evaluation_type) unique or the primary key, so that each user has at most one value per evaluation type.
I think the latter one is better; a table with 70 columns is unwieldy and only gets worse as you try to manage it.
When every question is multiple choice you can also add numbers in one field for each answer.
Let's say you've got four questions with 4 choices:
You could save them as 1433 in one column called answers (the first question is answer 1, the second answer 4, the third answer 3, and last but not least question 4 is answer 3).
Just giving you some choices here.
But if I had to choose between one-one relationship and 1 table, I would choose one one relationship because it's easier to manage later on.