MySQL: Table entry representing all 'ids'? - mysql

I have a single value in a table that I want selected every time a query is made on the table.
Let me break is down.
I have the following entry:
Instead of making a new entry for every different user_id, can I use some kind is primitive to represent ALL user_ids instead of specific ids? Example below:
For reasons that I would rather not take the time to explain, this is what I need. Is there any way to do this?
Thanks in advance!

If I'm correct in assuming that that means you want tag_id linked to every user_id (as some sort of a catch-all clause), you have a few ways of going about it. Depending on your application, you can simply request it to add a row for tag_id = 1 whenever you add a user. If you would, however, want to do it in a single row, well ... it kind of misrepresents the relational model.
You could, presumably use the NULL special "value" (essentially, declare it without a value) and then check in your application logic with
WHERE user_id = [uid] OR user_id IS NULL
or some such. I'd prefer keeping the relations intact with the former approach, however; you lose foreign keys (although using NULL won't violate the constraint) and similar constraints if you don't.

Related

what is the best practice - a new column or a new table?

I have a users table, that contains many attributes like email, username, password, phone, etc.
I would like to save a new type of data (integer), let's call it "superpower", but only very few users will have it. the users table contains 10K+ records, while fewer than 10 users will have a superpower (for all others it will be null).
So my question is which of the following options is more correct and better in terms of performance:
add another column in the users table called "superpower", which will be null for almost all users
have a new table calles users_superpower, which will at most contains 10 records and will map users to superpowers.
some things i have thought about:
a. the first option seems wasteful of space, but it really just an ingeger...
b. the second option will require a left join every time i query the users...
c. will the answer change if "superpower" data was 5 columns, for example?
note: i'm using hibenate and mysql, if it changes the answer
This might be a matter of opinion. My viewpoint on this follows:
If superpower is an attribute of users and you are not in the habit of adding attributes, then you should add it as a column. 10,000*4 additional bytes is not very much overhead.
If superpower is just one attribute and you might add others, then I would suggest using JSON or another EAV table to store the value.
If superpower is really a new type of user with other attributes and dates and so on, then create another table. In this table, the primary key can be the user_id, making the joins between the tables even more efficient.
I would go with just adding a new boolean field in your user entity which keeps track of whether or not that user has superpowers.
Appreciate that adding a new table and linking it requires the creation of a foreign key in your current users table, and this key will be another column taking up space. So it doesn't really get around avoiding storage. If you just want a really small column to store whether a user has superpowers, you can use a boolean variable, which would map to a MySQL BIT(1) column. Because this is a fixed width column, NULL values would still take up a single bit of space, but this not a big storage concern most likely as compared to the rest of your table.

Integer values for status fields

Often I find myself creating 'status' fields for database tables. I set these up as TINYINT(1) as more than often I only need a handful of status values. I cross-reference these values to array-lookups in my code, an example is as follows:
0 - Pending
1 - Active
2 - Denied
3 - On Hold
This all works very well, except I'm now trying to create better database structures and realise that from a database point of view, these integer values don't actually mean anything.
Now a solution to this may be to create separate tables for statuses - but there could be several status columns across the database and to have separate tables for each status column seems a bit of overkill? (I'd like each status to start from zero - so having one status table for all statuses wouldn't be ideal for me).
Another option is to use the ENUM data type - but there are mixed opinions on this. I see many people not recommending to use ENUM fields.
So what would be the way to go? Do I absolutely need to be putting this data in to its own table?
I think the best approach is to have a single status table for each kind of status. For example, order_status ("placed", "paid", "processing", "completed") is qualitatively different from contact_status ("received", "replied", "resolved"), but the latter might work just as well for customer contacts as for supplier contacts.
This is probably already what you're doing — it's just that your "tables" are in-memory arrays rather than database tables.
As I really agree with "ruakh" on creating another table structured as id statusName which is great. However, I would like to add that for such a table you can still use tinyint(1) for the id field. as tinyint accepts values from 0 to 127 which would cover all status cases you might need.
Can you add (or remove) a status value without changing code?
If yes, then consider a separate lookup table for each status "type". You are already treating this data in a generic way in your code, so you should have a generic data structure for it.
I no, then keep the ENUM (or well-documented integer). You are treating each value in a special way, so there isn't much purpose in trying to generalize the data model.
(I'd like each status to start from zero - so having one status table for all statuses wouldn't be ideal for me
You should never mix several distinct sets of values within the same lookup table (regardless of your "zero issue"). Reasons:
A simple FOREIGN KEY alone won't be able to prevent referencing a value from the wrong set.
All values are forced into the same type, which may not always be desirable.
That's such a common anti-pattern that it even has a name: "one true lookup table".
Instead, keep each lookup "type" within a separate table. That way, FKs work predictably and you can tweak datatypes as necessary.

database storing multiple types of data, but need unique ids globally

a while ago, i asked about how to implement a REST api. i have since made headway with that, but am trying to fit my brain around another idea.
in my api, i will have multiple types of data, such as people, events, news, etc.
now, with REST, everything should have a unique id. this id, i take it, should be unique to the whole system, and not just each type of data.
for instance, there should not be a person with id #1 and a news item with id of #1. ultimately, these two things would be given different ids altogether: person #1 with unique id of #1 and news item #1 with unique id #2, since #1 was taken by a person.
in a database, i know that you can create primary keys that automatically increment. the problem is, usually you have a table for each data "type", and if you set the auto increment for each table individually, you will get "duplicate" ids (yes, the ids are still unique in their own table, but not the whole DB).
is there an easy way to do this? for instance, can all of these tables be set to work off of one incrementer (the only way i could think of how to put it), or would it require creating a table that holds these global ids, and ties them to a table and the unique id in that table?
You could use a GUID, they will be unique everywhere (for all intents and purposes anyway).
http://en.wikipedia.org/wiki/Globally_unique_identifier
+1 for UUIDs (note that GUID is a particular Microsoft implementation of a UUID standard)
There is a built-in function uuid() for generating UUID as text. You may probably prefix it with table name so that you may easily recognize it later.
Each call to uuid() will generate you a fresh new value (as text). So with the above method of prefixing, the INSERT query may look like this:
INSERT INTO my_table VALUES (CONCAT('my_table-', UUID()), ...)
And don't forget to make this column varchar of large enough size and of course create an index for it.
now, with REST, everything should have a unique id. this id, i take
it, should be unique to the whole system, and not just each type of
data.
That's simply not true. Every resource needs to have a unique identifier, yes, but in an HTTP system, for example, that means a unique URI. /people/1 and /news/1 are unique URI's. There is no benefit (and in fact quite a lot of pain, as you are discovering) from constraining the system such that /news/1 has to instead be /news/0983240-2309843-234802/ in order to avoid conflict.

unnecessary normalization

My friend and I are building a website and having a major disagreement. The core of the site is a database of comments about 'people.' Basically people can enter comment and they can enter the person the comment is about. Then viewers can search the database for words that are in the comment or parts of the person name. It is completely user generated. For example, if someone wants to post a comment on a mispelled version of a person's name, they can, and that's OK. So there may be multiple spellings of different people listed as several different entries (some with middle name, some with nickname, some mispelled, etc.), but this is all OK. We don't care if people make comments about random people or imaginary people.
Anyway, the issue is about how we are structuring the database. Right now it is just one table with the comment ID as the primary key, and then there is a field for the 'person' the comment is about:
comment ID - comment - person
1 - "he is weird" - John Smith
2 - "smelly girl" - Jenny
3 - "gay" - John Smith
4 - "owes me $20" - Jennyyyyyyyyy
Everything is working fine. Using the database, I am able to create pages that list all the 'comments' for a particular 'person.' However, he is obsessed that the database isn't normalized. I read up on normalization and learned that he was wrong. The table IS currently normalized, because the comment ID is unique and dictates the 'comment' and the 'person.' Now he is insistant that 'person' should have it's OWN table because it is a 'thing.' I don't think it is necessary, because even though 'person' really is the bigger container (one 'person' can have many 'comments' about them), the database seems to operate just fine with 'person' being an attribute of the comment ID. I use various PHP calls for different SQL selections to make it magically appear more sophisticated on the output and the different way the user can search and see results, but in reality, the set-up is quite simple. I am now letting users rank comments with thumbs up and thumbs down, and I keep a 'score' as another field on the same table.
I feel that there is currently no need to have a separate table for just unique 'person' entries because the 'persons' don't have their own 'score' or any of their own attributes. Only the comments do. My friend is so insistant that it is necessary for efficiency. Finally I said, "OK, if you want me to create a separate table and let 'person' be it's own field, then what would be the second field? Because if a table has just a single column, it seems pointless. I agree that we may later create a need to give 'person' it's own table, but we can deal with that then." He then said that strings can't be primary keys, and that we would convert the 'persons' in the current table to numbers, and the numbers would be the primary key in the new 'person' table. To me this seems unnecessary and it would make the current table harder to read. He also thinks it will be impossible to create the second table later, and that we need to anticipate now that we might need it for something later.
Who is right?
In my opinion your friend is right.
Person should live in a different table and you should try to normalize. Don't overdo-it, though.
In the long run you may want to do more things with your site, say you want to attach multiple files to a person (ie. pictures) you'll be very thankfull then for the normalization.
Creating a new table for person and using the key of that table in place of the person attribute has nothing to do with normalization. It may be a good idea for other reasons but doing so does not make the database "more normalized" than not doing it. So you are right: as far as normalization is concerned, creating another table is unnecessary.
I would vote for your friend. I like to normalize and plan for the future and even if you never need it, this normalization is so easy to do it literally takes no time. You can create a view that you query in order to make your SQL cleaner and eliminate the need for you to join the tables yourself.
If you have already reached all of your capabilities and have no plans for expansion of capabilities I think you leave it as it is.
If you plan to add more, namely allowing people to have accounts, or anything really, I think it might be smart to separate your data into Person, Comments tables. Its not hard and makes expanding your functionality easier.
You're right.
Person may be a thing in general, but not in your model. If you were going to hassle people into properly identifying the person they're talking about, a Person table would be necessary. For example, if the comments were only about persons already registered in the database.
But here it looks like you have an unstructured data, without identity; and that nothing/nobody is interested in making sure whether "jenny" and "jennyyy" are in fact the same person, not to mentionned "jenny doe", and "my cousin"...
Well, there are two schools of thought. One says, create your data model in the most normalized way possible, then de-normalize if you need more efficiency. The other is basically "do the minimum work necessary for the job, then change it as your requirements change". Also known as YAGNI (You aren't going to need it).
It all depends on where you see this going. If this is all it will be, then your approach is probably fine. If you intend to improve it with new features over time, then your friend is right.
If you never intend to associate the person column with a user or anything else and data apparently needs no consistency or data integrity checks, just why is this in a relational database at all? Wouldn't this be a use case for a nosql database? Or am I missing something?
Normalization is all about functional dependencies (FD's). You need to identify all of the
FD's that exist among the attributes of your data model before it can be fully normalized.
Lets review what you have:
Any given instance of a CommentId functionally determines the Person (FD: CommentId -> Person)
Any given instance of a CommentId functionally determines the Comment (FD: CommentId -> Comment)
Any given instance of a CommentId functionally determines the UserId (FD: CommentId -> UserId)
Any given instance of a CommentId functionally determines the Score (FD: CommentId -> Score)
Everything here is a dependant attribute on CommentId and
CommentId alone. This might lead you to the belief that a relation (table) containing all of, or a subset of, the
above attributes must be normalized.
First thing to ask yourself is why did you create the CommentId attribute anyway? Strictly speaking,
this is a manufactured attribute - it does not relate to anything 'real'. CommentId is
commonly referred to as a surrogate key. A surrogate key is just a made up value that stands in
for a unique value set corresponding to some other group of attributes. So what group of attributes is CommentId
a surrogate for? We can figure that
out by asking the following questions and adding new FD's to the model:
1) Does a Comment have to be unique? If so the FD: Comment -> CommentId must be true.
2) Can the same Comment be made multiple times as long as it is about a different Person? If so, then
FD: Person + Comment -> CommentId must be true and the FD in 1 above is false.
3) Can the same Comment be made multiple times about the same Person provided it was made by
different UserId's? If so, the FDs in 1 and 2 cannot be true but
FD: Person + Comment + UserId -> CommentId may be true.
4) Can the same Comment be made multiple times about the same Person by the same UserId but
have different Scores? This implies FD: Person + Comment + UserId' + Score -> CommentId is true and the others are false.
Exactly one of the above 4 FD's above must be true. Whichever it is affects how your data model is normalized.
Suppose FD: Person + Comment + UserId -> CommentId turns out to be true. The logical
consequences are that:
Person + Comment + UserId and CommentId serve as equivalent keys with respect to Score
Score should be put in a relation with one but not both of its keys (to avoid transitive dependencies).
The obvious choice would be CommentId since it was specifically created as a surrogate.
A relation comprised of: CommentId, Person, Comment, UserId is needed to tie the
Key to its surrogate.
From a theoretical point of view, the surrogate key CommentId is not
required to make your data model or database work. However, its presence may affect how relations are constructed.
Creation of surrogate keys is a practical issue of some importance.
Consider what might happen if you choose to not use a surrogate key but the full
attribute set Person + Comment + UserId in its place, especially if it was required
on multiple tables as a foreign or primary key:
Comment might add a lot of space overhead
to your database because it is repeated in multiple tables. It is probably more than a couple of characters long.
What happens if someone chooses to edit a Comment? That change needs to be propagated
to all tables where Comment is part of a key. Not a pretty sight!
Indexing long complex keys can take a lot of space and/or make for slow update performance
The value assigned to a surrogate key never changes, no matter what you do to the values
associated to the attributes that it determines. Updating the dependant attributes is now
limited to the one table defining the surrogate key. This is of huge practical significance.
Now back to whether you should be creating a surrogate for Person. Does Person live
on the left hand side of many, or any, FDs? If it does, its value will propogate through your
database and there is a case for creating a surrogate for it. Whether Person is a text or numeric attribute is irrelevant to the choice of creating a surrogate key.
Based on what you have said, there is at best a weak argument to create a
surrogate for Person. This argument is based on the suspicion that its value may at some point become a key or part of a key at some point in the future.
Here's the deal. Whenever you create something, you want to make sure that it has room to grow. You want to try to anticipate future projects and future advancements for your program. In this scenario, you're right in saying that there is no need currently to add a persons table that just holds 1 field (not counting the ID, assuming you have an int ID field and a person name). However, in the future, you may want to have other attributes for such people, like first name, last name, email address, date added, etc.
While over-normalizing is certainly harmful, I personally would create another, larger table to hold the person with additional fields so that I can easily add new features in the future.
Whenever you're dealing with users, there should be a dedicated table. Then you can just join the tables and refer to that user's ID.
user -> id | username | password | email
comment -> id | user_id | content
SQL to join the comments to the users:
SELECT user.username, comment.content FROM user JOIN comment WHERE user.id = comment.user_id;
It'll make it so much easier in the future when you want to find information about that specific user. The amount of extra effort is negligible.
Concerning the "score" for each comment, that should also be a separate table as well. That way you can connect a user to a "like" or "dislike."
With this database, you might feel that it is okay but there may be some problem in the future when you want the users to know more from the database.Suppose you want to know about the number of comments made on a person with the name='abc'.In this case ,you will have to go through the entire table of comments and keep counting.In place of this, you can have an attribute called 'count' for every person and increment it whenever a comment is made on that person.
As far as normalization is concerned,it is always better to have a normalized database because it reduces redundancy and makes the database intuitive to understand. If you are expecting that your database will go large in future then normalization must be present.

Best way to link a table to 2 different keys?

I'm designing a mySQL DB and I'm having the following issue:
Say I have a wall_posts table. Walls can belong to either an event or a user.
Hence the wall_posts table must references either event_id or user_id (foreign key constraint).
What is the best way to build such a relationship, considering I must always be able to know who the walls belong to ... ?
I've been considering using 2 tables, such as event_wall_posts and user_wall_posts so one got an event_id field and the other a user_id one, but I believe there must be something much better than such a redundant workaround ...
Is there a way to use an intermediate table to link the wall_posts to either an event_id or a user_id ?
Thanks in advance,
Edit : seems there is not a clear design to do this and both approach seem okay, so,
which one will be the fastest if there is a lots of data ?
Is it preferable to have 2 separates table (so queries might be faster, since there will be twice less data in tables ...), or is it still preferable to have a better OO approach with a single wall_posts table referencing a wall table (and then both users and events will have a uniquewall_id`)
Why is it redundant? You won't write code twice to handle them, you will use the same code, just change the name of the table in the SQL.
Another reason to take this approach is that some time in the future you will discover you need new different fields for each entity.
What you're talking about is called an exclusive arc and it's not a good practice. It's hard to enforce referential integrity. You're probably better off using, in an object sense, a common supertype.
That can be modelled in a couple of ways:
A supertype table called, say, wall. Two subtype tables (user_wall and event_wall) that link to a user and event respectively as the owner. The wall_posts table links to the supertype table; or
Putting both entity types into one table and having a type column. That way you're not linking to two separate tables.
Go for the simplest solution: add both an event_id and a user_id column to the wall_posts table. Use constraints to enforce that one of them is null, and the other is not.
Anything more complex smells like overnormalization to me :)
A classical approach to this problem is:
Create a table called wall_container and keep properties common to both users and events in it
Reference both users and events to wall_container
Reference wall_posts to wall_container
However, this is not very efficient and it's not guaranteed that this wall_container doesn't containt records that are not either a user or an event.
SQL is not particularly good in handling multiple inheritance.
Your wall and event has their own unique IDs .. right?? then their is no need for another table . let the wall_post table have a attribute as origin which will direct to the record of whatever the record is event's or users. '
If the wall and event may have same ID then make a table with three attributes origin(primary), ID number and type. There ID number will be what you set, type defining what kind of entity does the ID represent and origin will be a new ID which you will generate maybe adding different prefix. In case of same ID the origin table will help you immensely for other things to other than wall posts.