I am building an application that will have one table of clients with an auto-increment INT id field. Then I have an HTML "case" form where the user has to choose a client from a dropdown and then add some info about the "case", which will go into another table.
That means the clients will have ids of 1, 2, 3 and so on. I would like each case id to append a decimal counter to the id of the client chosen from the dropdown. So for client number 2: 2.1, 2.2 and so on. For client number 3: 3.1, 3.2, etc.
What is the best way to add that case field in SQL? If I choose DECIMAL for the case id field, I get the number 3.4 stored as 3.400, because I chose DECIMAL(4,3) (MySQL) for testing. I need that many decimals because the number of cases can go into the hundreds, so I cannot trim them. I'm struggling with the MySQL field types and how to approach this problem.
I'd appreciate some guidance.
The only thing I can think of is to pass the id of the client, build id + "." + 1, and store it as DECIMAL(1,1) (MySQL). Will that auto-increment to 1.2 and so on?
The MySQL auto-increment mechanism only increments by whole integers. Sorry, that's the way it is implemented.
The best way to design your Case table in MySQL is this:
CREATE TABLE Cases (
case_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
client_id INT NOT NULL,
...other attributes of the case...
FOREIGN KEY (client_id) REFERENCES Client (client_id)
);
It will have one auto-increment counter for the table, and all clients will need to share this number. This means the case numbers won't always be consecutive for a given client, and they won't start at 1 for each client. Sorry, that's the way auto-increment works in MySQL.
The question has been asked many times with some variation of, "how can I make an auto-increment that renumbers for each group?" You could read the MAX(case_id) for the client for which you need to insert a case, and then use that max case_id + 1 in your INSERT. In other words, forget about using the auto-increment feature, and calculate the id yourself.
You have to lock the table while doing this to avoid race conditions; two concurrent users could be inserting at the same time, and read the same value for MAX(case_id) and try to insert the same value.
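The MAX + 1 approach with locking could be sketched roughly like this. This is only an illustration under assumptions: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for MySQL, the per-client counter lives in a hypothetical case_no column next to client_id, and BEGIN IMMEDIATE plays the role of the table lock. The "2.1" style label is built for display only and is never stored.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so that
# transactions can be opened and closed explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE clients (
    client_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE cases (
    client_id INTEGER NOT NULL REFERENCES clients (client_id),
    case_no   INTEGER NOT NULL,           -- hypothetical per-client counter
    note      TEXT,
    PRIMARY KEY (client_id, case_no)      -- rejects duplicate numbers
);
INSERT INTO clients (client_id, name) VALUES (2, 'Acme'), (3, 'Globex');
""")


def add_case(conn, client_id, note):
    """Insert a case with the next per-client number; return a display label."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading MAX
    try:
        (next_no,) = conn.execute(
            "SELECT COALESCE(MAX(case_no), 0) + 1 FROM cases WHERE client_id = ?",
            (client_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO cases (client_id, case_no, note) VALUES (?, ?, ?)",
            (client_id, next_no, note),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return f"{client_id}.{next_no}"


labels = [add_case(conn, 2, "first"), add_case(conn, 2, "second"), add_case(conn, 3, "other")]
print(labels)  # ['2.1', '2.2', '3.1']
```

Keeping client_id and the counter in separate integer columns means there is no fixed number of decimal places to outgrow, and WHERE client_id = 3 still works as a plain indexed lookup.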
Your plan of using decimal numbers will lead to problems.
What if one day you have a client with more than 999 cases? You'd have to reformat all your case id's, not only for the client with 1000 cases, but for all clients. Any references to the case id's that you had sent out in paper statements and reports would become invalid.
How would you do an SQL query to search for all cases for a given client? If you had client_id in its own column, it would be a query like SELECT ... FROM Cases WHERE client_id = 3, but if you have to do a query like ... WHERE case_id BETWEEN 3.000 AND 3.999, it's less clear and harder to optimize. It's also harder to explain to a new programmer you hire for the project. If you end up extending the id format to 4 digits past the decimal, you'd have to rewrite all these SQL queries.
Don't do it. This is the best piece of advice I can give you.
You are trying to use what were called "Intelligent Codes" back in the 80s.
They went out of fashion for good reasons: very expensive to develop, unmaintainable, limited ranges, you name it. Stay away from them and use normal foreign keys instead. They give you all the flexibility you'll need when the application grows.
TL;DR: Is this design correct and how should I query it?
Let's say we have history tables for city and address designed like this:
CREATE TABLE city_history (
id BIGINT UNSIGNED NOT NULL PRIMARY KEY,
name VARCHAR(128) NOT NULL,
history_at DATETIME NOT NULL,
obj_id INT UNSIGNED NOT NULL
);
CREATE TABLE address_history (
id BIGINT UNSIGNED NOT NULL PRIMARY KEY,
city_id INT NULL,
building_no VARCHAR(10) NULL,
history_at DATETIME NOT NULL,
obj_id INT UNSIGNED NOT NULL
);
Original tables are pretty much the same except for history_id and obj_id (city: id, name; address: id, city_id, building_no). There's also a foreign key relation between city and address (city_id).
History tables are populated on every change of the original entry (create, update, delete) with the exact state of the entry at given time.
obj_id holds id of original object - no foreign key, because original entry can be deleted and history entries can't. history_at is the time of creation of history entry.
History entries are created for every table independently - change in city name creates city_history entry but does not create address_history entry.
So to see what was the state of the whole address with city (e.g. on printed documents) at any T1 point in time, we take from both history tables most recent entries for given obj_id created before T1, right?
With this design, in theory we should be able to see the state of a single address with its city at any given point in time. Could anyone help me create such a query for a given address id and time? Please note that there could be multiple records with the exact same timestamp.
There is also a need to create a report for showing every change of state of given address in given time period with entries like "city_name, building_no, changed_at". Is it something that can be created with SQL query? Performance doesn't matter here so much, such reports won't be generated so often.
The above report will probably be needed in an interactive version where user can filter results e.g. by city name or building number. Is it still possible to do in SQL?
In reality address table and address_history table have 4 more foreign keys that should be joined in report (street, zip code, etc.). Wouldn't the query be like ten pages long to provide all the needed functionality?
I've tried to build some queries, play with greatest-n-per-group, but I don't think I'm getting anywhere with this. Is this design really OK for my use cases (if so, can you please provide some queries for me to play with to get where I want?)? Or should I rethink the whole design?
Any help appreciated.
(My answer copied from here, since that question never marked an answer as accepted.)
My normal "pattern" in (very)pseudo code:
Table A: a_id (PK), a_stuff
Table A_history: a_history_id (PK), a_id(FK referencing A.a_id), valid_from, valid_to, a_stuff
Triggers on A:
On insert: insert values into A_history with valid_from = now, and valid_to = null.
On update: set valid_to = now for last history record of a_id; and do the same insert from the "on insert" trigger with the updated values for the row.
On delete: set valid_to = now for last history record of a_id.
In this scenario, you'd query history with "x >= from and x < to" (not BETWEEN, as a previous record's "to" value should match the next record's "from" value).
Additionally, this pattern also makes "change log" reports easier.
Without a table dedicated to change logging, the relevant records can be found just by SELECT * FROM A_history WHERE valid_from BETWEEN [reporting interval] OR valid_to BETWEEN [reporting interval].
If there is a central change log table, the triggers can just be modified to include log entry inserts as well. (Unless log entries include "meta" data such as reason for change, who changed, etc... obviously).
Note: This pattern can be implemented without triggers. Using a stored procedure, or even just multiple queries in code, can actually negate the need for the non-history table.
The history table's "a_id" would need to be replaced with whatever uniquely identifies the record normally though; it could still be an id value, but these values would need to be synthesized when inserting, and known when updating or deleting.
Queries:
(if not new) UPDATE the most recent entry's valid_to.
(if not deleting) INSERT new entry
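As a rough sketch of the trigger-free variant and the point-in-time lookup, assuming SQLite through Python's sqlite3 rather than MySQL, ISO-formatted timestamp strings, and hypothetical helper names save and state_at:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE a_history (
    a_history_id INTEGER PRIMARY KEY,
    a_id         INTEGER NOT NULL,
    valid_from   TEXT NOT NULL,   -- ISO timestamps compare correctly as text
    valid_to     TEXT,            -- NULL marks the currently valid record
    a_stuff      TEXT NOT NULL
)""")


def save(conn, a_id, a_stuff, now):
    """Record a new state for a_id; pass a_stuff=None to record a delete."""
    with conn:  # both statements commit together
        # (if not new) close the most recent entry
        conn.execute(
            "UPDATE a_history SET valid_to = ? WHERE a_id = ? AND valid_to IS NULL",
            (now, a_id),
        )
        # (if not deleting) insert the new entry
        if a_stuff is not None:
            conn.execute(
                "INSERT INTO a_history (a_id, valid_from, valid_to, a_stuff) "
                "VALUES (?, ?, NULL, ?)",
                (a_id, now, a_stuff),
            )


def state_at(conn, a_id, t):
    """Return a_stuff as it was at time t: valid_from <= t < valid_to."""
    row = conn.execute(
        "SELECT a_stuff FROM a_history "
        "WHERE a_id = ? AND valid_from <= ? AND (valid_to IS NULL OR ? < valid_to)",
        (a_id, t, t),
    ).fetchone()
    return row[0] if row else None


save(conn, 1, "Springfield", "2020-01-01 00:00:00")   # create
save(conn, 1, "Shelbyville", "2020-06-01 00:00:00")   # update
save(conn, 1, None, "2021-01-01 00:00:00")            # delete

print(state_at(conn, 1, "2020-03-15 12:00:00"))  # Springfield
print(state_at(conn, 1, "2020-06-01 00:00:00"))  # Shelbyville
print(state_at(conn, 1, "2021-02-01 00:00:00"))  # None
```

The half-open interval is what makes the boundary case unambiguous: at exactly 2020-06-01 00:00:00 only the newer record matches.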
This is a very "traditional" Problem, when it comes down to versioning (or monitoring) of changes to a certain row.
There are various "solutions", each having its own drawback and advantage.
The following "statements" are a result of my experience; they are not perfect, nor do I claim they are the "only ones"!
1.) Creating a "history table": That's the worst idea of all. You would always need to take into account which table you need to query, depending on the DATA that should be queried. That's a "chicken-and-egg" problem...
2.) Using ONE table with ONE (increasing) "revision" number: That's a better approach, but it will get "hard" to query: determining the "most recent row" per "id" is very costly no matter which approach is used.
My personal experience is that following the pattern of a "doubly linked list" is the best way to solve this when it comes down to millions of records:
3.) Maintain two columns among every entity, let's say prev_version_id and next_version_id. prev_version_id points to NULL, if there is no previous version. next_version_id points to NULL if there is no later version.
This approach would require you to ALWAYS perform two actions upon an update:
Create the new row
Update the old row's reference (next_version_id) to the just-inserted row.
However, when your database has grown to something like 100 million rows, you will be very happy that you have chosen this path:
Querying the "Oldest" Version is as simple as querying where ISNULL(prev_version_id) and entity_id = 5
Querying the "Latest" Version is as simple as querying where ISNULL(next_version_id) and entity_id = 5
Getting a full version history will just target the entity_id=5 of the data-table, sortable by either prev_version_id or next_version_id.
The very often neglected fact: the first two queries will also work to get a list of ALL first versions or of ALL most recent versions of an entity, in about NO TIME! (Don't underestimate how "costly" it can be to determine the most recent version of an entity otherwise! Believe me, when "testing", everything seems equally fine, but the real struggle starts when live data with millions of records is used.)
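A minimal sketch of the two-step update and the "oldest"/"latest" lookups, assuming SQLite via Python's sqlite3 and a hypothetical entity_versions table (IS NULL is used in place of MySQL's ISNULL()):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE entity_versions (
    version_id      INTEGER PRIMARY KEY,
    entity_id       INTEGER NOT NULL,
    prev_version_id INTEGER REFERENCES entity_versions (version_id),
    next_version_id INTEGER REFERENCES entity_versions (version_id),
    payload         TEXT NOT NULL
)""")


def add_version(conn, entity_id, payload):
    """Create the new row, then point the old latest row at it."""
    with conn:  # both actions happen in one transaction
        row = conn.execute(
            "SELECT version_id FROM entity_versions "
            "WHERE entity_id = ? AND next_version_id IS NULL",
            (entity_id,),
        ).fetchone()
        prev_id = row[0] if row else None
        new_id = conn.execute(
            "INSERT INTO entity_versions "
            "(entity_id, prev_version_id, next_version_id, payload) "
            "VALUES (?, ?, NULL, ?)",
            (entity_id, prev_id, payload),
        ).lastrowid
        if prev_id is not None:
            conn.execute(
                "UPDATE entity_versions SET next_version_id = ? WHERE version_id = ?",
                (new_id, prev_id),
            )
    return new_id


def latest(conn, entity_id):
    return conn.execute(
        "SELECT payload FROM entity_versions "
        "WHERE next_version_id IS NULL AND entity_id = ?",
        (entity_id,),
    ).fetchone()[0]


def oldest(conn, entity_id):
    return conn.execute(
        "SELECT payload FROM entity_versions "
        "WHERE prev_version_id IS NULL AND entity_id = ?",
        (entity_id,),
    ).fetchone()[0]


for entity_id, payload in [(5, "v1"), (5, "v2"), (7, "x1"), (5, "v3")]:
    add_version(conn, entity_id, payload)

# Latest version of every entity, without any per-group MAX() subquery.
all_latest = conn.execute(
    "SELECT entity_id, payload FROM entity_versions "
    "WHERE next_version_id IS NULL ORDER BY entity_id"
).fetchall()
print(latest(conn, 5), oldest(conn, 5), all_latest)
```

In a real table you would probably want an index covering (entity_id, next_version_id); that detail is my assumption, not something stated above.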
cheers,
dognose
I have a MySQL table that stores user phone numbers:
user_id | user_phonenumber
----------------------------
id1 | 555-123456789
I want to allow the user to store multiple phonenumbers and I don't want to limit the number of numbers a user can be associated with.
What's the best way of structuring my data, and how would a query work in PDO?
For example, should I store them all in the same field with comma separators and then parse the output when the query is returned, or should I use another table and have each row as a separate number with common user_ids? How would a lookup work then (please provide example code if possible)?
Thanks
Generally, RDBMSs are designed to access fields and rows. Everything gets much harder once you break the one-value-per-field logic.
I mean once you start storing more than one value in a single field.
But you know your system's future. It may be that you never have to access, for example, just the first phone number; if you can handle the field everywhere as a blob, then storing several values in a single field can be fine.
Anyway, if this is not homework or a similarly short-lived task, you should choose the one phone number per record approach.
I mean something like this can be future-proof:
create table user_phonenumbers(
id integer auto_increment primary key,
user_id integer references user(id),
phonenumber varchar(32)
);
Yes, use another table to store the user phone numbers.
Use an INNER JOIN on user_id to look them up.
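Since the question asked for example code: here is a rough sketch of the lookup, using Python's sqlite3 and an in-memory database instead of PDO and MySQL, so treat it as an illustration of the query shape only. The name column and the sample rows are made up; the same SELECT with a bound parameter should carry over to a PDO prepared statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL                 -- hypothetical column for the demo
);
CREATE TABLE user_phonenumbers (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES user (id),
    phonenumber VARCHAR(32) NOT NULL
);
INSERT INTO user (id, name) VALUES (1, 'alice'), (2, 'bob');
INSERT INTO user_phonenumbers (user_id, phonenumber) VALUES
    (1, '555-123456789'),
    (1, '555-987654321'),
    (2, '555-000000000');
""")


def phone_numbers(conn, user_name):
    """Return every phone number stored for the named user, oldest first."""
    rows = conn.execute(
        """
        SELECT p.phonenumber
        FROM user AS u
        INNER JOIN user_phonenumbers AS p ON p.user_id = u.id
        WHERE u.name = ?
        ORDER BY p.id
        """,
        (user_name,),
    ).fetchall()
    return [number for (number,) in rows]


print(phone_numbers(conn, "alice"))  # ['555-123456789', '555-987654321']
```

Each number is one row, so adding or removing a number is a plain INSERT or DELETE and nothing has to be parsed out of a comma-separated string.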
I am in the process of rewriting a company's entire system. The original developer was a bit silly and generated ID numbers for each customer report randomly in his database. Each ID number is up to 7 digits long - but could be anything.
I am migrating over all his old data to our new, far more logically structured database. I obviously want to use a MySQL auto-increment for our ID field. However, it's vital that we keep the old ID numbers as customers still phone up each day with those to reference against.
Ideally, the perfect scenario would be we go live December 1st - everything up to December 1st is all randomly IDed, and from December 1st onwards they automatically increment starting at the highest random ID in the old database.
Is such a thing possible with MySQL without any issues? I am currently using two columns - one, our logical autoincrementing ID, and a second column called old_id which was being used during migration. But we need the call centre staff to only be using one ID or mass confusion will ensue.
Thanks!
If you want to continue numbering from the highest random value, just changing the field to auto-increment should be enough. The normal behaviour is that MySQL won't change ids that are already set, and it starts numbering from the highest value + 1.
If you want to start from a specific value (say 10,000,000) you can set
ALTER TABLE theTableInQuestion AUTO_INCREMENT=10000000
Of course, be sure to create backups and test, but it should not pose any problems at all. (Note that the old records will be stored in order of the id-field, which is random, and won't reflect the creation order.)
As you need to keep the old IDs, I'm going to assume that you're going to create a new column for autoincrement ID that will become your primary key but keep the existing ID column and rename it (to old_id, maybe?). I'm also going to assume you record when a customer signed up.
If you make your old ID column nullable (allow NULL as a valid value) then you can simply check whether or not the old ID column is NULL. If it's not NULL then treat that as the ID, otherwise use the autoincrement column.
Finding a customer:
SELECT *
FROM customer
WHERE (id = /*Put your ID here*/ AND reg_date >= /*Put the date the new regime starts here*/)
OR (old_id = /*Put your ID here*/ AND reg_date < /*Put the date the new regime starts here*/)
This will occasionally return 2 rows so you'll have to use some other criteria to uniquely identify the customer in question.
As for associating an old customer with other tables in the database, you can always use the new ID internally throughout the entire DB once it's generated. You will have to update tables that are using the old ID as the foreign key, obviously.
UPDATE target_table
JOIN customer ON target_table.cust_id = customer.old_id
SET target_table.cust_id = customer.id;
(Note: The above is just a quick and dirty query that hasn't been tested! I'd suggest testing on a copy of the database before you try it for real!)
I have several tables like Buyers, Shops, Brands, Money_Collectors, etc.
Each one of those has a default value, e.g. the default Buyer is David, the default Shop is Ebay, and so on.
I would like to save those default values in a database (so that user could change them).
I thought about adding an is_default column to each one of the tables, but it seems inefficient because only one row in each table may be the default.
Then I thought that the best would be to have Defaults table that will contain all the default values. This table will have 1 row and N columns, where N is the number of the default values:
Defaults table:
buyer shop brand money_collector
----- ---- ----- ---------------
David Ebay Dell NULL (no default value)
But, this seems to be not the best approach because the table structure changes when a new default value is added.
What would be the best approach to store default values ?
Just to be clear.
The best way is a column on each table that the dropdowns are sourced from.
And here's why...
"Shouldn't I worry about space when saving data in a database?"
The short answer is no. The longer answer is what you should worry about is performance. Focusing on space will lead you to do very bad things.
Bad things that you'll do if space is a concern.
You'll bury meaning into Primary Keys. i.e. Smart Keys.
You'll try to store multiple values in one column.
You'll index too little.
(No doubt we could create a list of 50 bad practices which save space)
suppose there are 50 shops (select box with 50 possible values). In this case, to store the default shop you need 50 boolean fields,
Well it's ONE Boolean column. It exists on each row.
Let me ask you this. If you created a table with 1 date column and inserted 1 row, how much space would you use on disk?
If you said 7 or 8 bytes then you're off by about 1000 times.
The smallest unit of disk space is a block. Blocks are typically 8 KB (they can be as small as 2 KB or as large as 32 KB in general; no nitpicking here, the actual limits are unimportant).
Let's say you have 8 KB blocks; then your 1-column, 1-row table takes 8 KB. If you insert another 999 rows it will still take up 8 KB. (Again, no nitpicking; there is overhead per block and per row. It's an example.)
So in your look up table with 50 store names, the likelihood that adding 50 bytes to the size of the table forces you to expand from 1 block to 2 is slim to none and completely irrelevant.
On the other hand, your default table will certainly take up at least one additional block.
But the worst hit to PERFORMANCE is that your call to fill a drop down will need two round trips to the database, one to get the list, one to get the default. (yes, you may be able to do this in one but go with it)
So you've saved exactly zero space and doubled your network traffic.
You see what I'm saying.
Another crucial reason to stop worrying about space is that you're giving up clarity. Think of the developer you're going to hire to run this app. When he joins the team and looks at the database, imagine the two scenarios.
There's a Boolean column named Default_value
There's a table with no relationships to anything that's named Default_Values
You ask him to build a new form with a dropdown for 'store'.
In scenario 1 he finds the store table, wires up the dropdown to a simple query of the table and uses the default_value field to select the initial value.
In scenario 2, without some training, how would he know to look for a separate table? Maybe he'd see the table but by the time you're hiring, your datamodel now has hundreds of tables.
Again, a little contrived but the point is salient. Clarity in the database is well, well worth a byte per row.
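To make scenario 1 concrete, here is a small sketch assuming SQLite via Python's sqlite3, a hypothetical shops table, and the is_default column name from the question. One query feeds both the option list and the preselected value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shops (
    shop_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    is_default INTEGER NOT NULL DEFAULT 0   -- 1 on the default row, else 0
);
INSERT INTO shops (name, is_default) VALUES
    ('Amazon', 0), ('Ebay', 1), ('Etsy', 0);
""")


def dropdown(conn):
    """One round trip: return (all option names, the preselected name)."""
    rows = conn.execute(
        "SELECT name, is_default FROM shops ORDER BY name"
    ).fetchall()
    options = [name for name, _ in rows]
    selected = next((name for name, flag in rows if flag), None)
    return options, selected


def set_default_shop(conn, name):
    """Flag exactly the named shop and clear the flag everywhere else."""
    with conn:
        conn.execute("UPDATE shops SET is_default = (name = ?)", (name,))


print(dropdown(conn))  # (['Amazon', 'Ebay', 'Etsy'], 'Ebay')
set_default_shop(conn, "Etsy")
print(dropdown(conn))  # (['Amazon', 'Ebay', 'Etsy'], 'Etsy')
```

Doing the change as a single UPDATE keeps the "only one default per table" rule intact without a separate clearing step.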
Technical stuff
I'm not a MySQL guy, but in Oracle a null column at the end of a row takes no additional space. In Oracle I would use a Varchar2(1), let 'T' = Default, and leave the others null. That would have the effect of only using 1 additional byte in total, and not per row. YMMV with MySQL; you can pose that question separately if you can't Google the answer.
But the time to worry about that is on millions of rows, not hundreds. Any table which feeds a dropdown will never be big enough to start worrying about extra bytes.
What if you create an XML and then store that XML in the table in an XML column. The XML column would contain the XML, and the XML could have tags of tables and a sub node of default values.
You should rather create a table with two columns and n rows.
Defaults table:
buyer, David
shop, Ebay
brand, Dell
This way you can add new values without having to change table structure
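A quick sketch of that two-column layout, assuming SQLite via Python's sqlite3; the column names name and value and both helper functions are placeholders of mine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE defaults (
    name  TEXT PRIMARY KEY,   -- which default this row holds, e.g. 'shop'
    value TEXT                -- NULL means "no default value"
);
INSERT INTO defaults (name, value) VALUES
    ('buyer', 'David'), ('shop', 'Ebay'), ('brand', 'Dell');
""")


def get_default(conn, name):
    """Return the stored default, or None if there is no row or no value."""
    row = conn.execute(
        "SELECT value FROM defaults WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


def set_default(conn, name, value):
    """Insert or overwrite one default; the table structure never changes."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO defaults (name, value) VALUES (?, ?)",
            (name, value),
        )


print(get_default(conn, "shop"))             # Ebay
print(get_default(conn, "money_collector"))  # None
set_default(conn, "money_collector", "Carol")
print(get_default(conn, "money_collector"))  # Carol
```

The trade-off is that every value is stored as text and nothing ties a stored name like 'Ebay' to an actual row in the shops table.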
You can create a catalog table (some kind of metadata table) containing the default values as strings for the desired table columns. Then you can use the convert function for getting the appropriate value. Below is a sample table definition (Transact-SQL was used):
create table dbo.cat_default_values
(
id_column varchar(30) not null,
id_table varchar(30) not null,
datatype varchar(30) not null,
value varchar(100) not null,
f_creation datetime not null,
usr_creation char(8) null,
primary key clustered (id_column, id_table)
)
declare @defaultValueInt int,
@defaultValueVarchar varchar(30)
select @defaultValueInt = convert(int, value)
from cat_default_values where id_column = 'defColumInteger' and id_table = 'table1'
select @defaultValueVarchar = value
from cat_default_values where id_column = 'defColumVarchar' and id_table = 'table1'
First of all, what you are trying to store is not metadata, so I would not invent an external data store (coupled with extra code) to hold it.
I assume you have PK sequence generation logic under your control. I would pick a magic number x and insert a record with _id = x into each table to serve as the default value. Then, to show the user the default value, you can handle it in your query in a uniform way, or in application logic at insert time. The good thing about this is that you have access to the default value all the time without writing any extra logic, and the default value of every table can be maintained using the same code (templating ;)
(From the lessons W3c learned from modeling schema information of XML using DTD.)
Only catch is this logic should be made explicit either using some extensive documentation or could be hard imposed by using a trigger.
Looking for a scalable, flexible and fast database design for a 'Build your own form' style website, e.g. Wufoo.
Rules:
User has only 1 Form they can build
User can create their own fields or choose from 'standard' fields
User's 1 Form has as many fields as the user wants
Values can be the sibling of another value, e.g. a photo value could have name, location, width, height as sibling values
Special Rules:
User can submit their form a maximum of 5 times a day
Value's Date is important
Flexibility to report on values (for single user, across all users, 1 field, many fields) is very important -- data visualization (most will be chronologically based e.g. all photos for July 2009 for all users).
Table "users"
uid
Table "field_user" - assign a field to a users form
fid
uid
weight - int - used to order the fields on the users form
Table "fields"
fid
creator_uid - int - the field 'creator'
label - varchar - e.g. Email
value_type - varchar - used to determine what field in the 'values' table will be filled in (e.g. if 'int' then values of this field will submit data into the values.type_int field - and all other .type_x fields will be NULL).
field_type - varchar - e.g. 'email' - used for special conditions e.g. validation rules
Table "values"
vid
parent_vid
fid
uid
date - date
date_group - int - value 1-5 (user may submit max of 5 forms per day)
type_varchar - varchar
type_text - text
type_int - int
type_float - float
type_bool - bool
type_date - date
type_timestamp - timestamp
I understand that this approach will mean records in the 'Value' table will only have 1 piece of data with other .type_x fields containing NULL's... but from my understanding this design will be the 'fastest' solution (less queries, less join tables)
At OSCON yesterday, Josh Berkus gave a good tutorial on DB design, and he spent a good fraction of it mercilessly tearing into such "EAV"il tables; you should be able to find his slides on the OSCON site soon, and eventually the audio recording of his whole tutorial online (the latter will probably take a while).
You'll need a join per attribute (multiple instances of the values table, one per attribute you're fetching or updating) so I don't know what you mean by "less join tables". Joining many instances of the same table isn't a particularly fast operation, and your design makes indices nearly unfeasible and unusable.
At least as a minor improvement use per-type separate tables for your attributes' values (maybe some indexing might be applicable in that case, though with MySQL's limitation to one index per query per table even that's somewhat dubious).
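To illustrate the "join per attribute" point, here is a cut-down sketch of the question's schema, assuming SQLite via Python's sqlite3, made-up sample rows, and only two of the type_x columns. Getting a photo's name and width back as one row already needs the values table twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fields (
    fid        INTEGER PRIMARY KEY,
    label      TEXT NOT NULL,
    value_type TEXT NOT NULL
);
-- "values" is quoted because VALUES is an SQL keyword.
CREATE TABLE "values" (
    vid          INTEGER PRIMARY KEY,
    fid          INTEGER NOT NULL REFERENCES fields (fid),
    uid          INTEGER NOT NULL,
    date         TEXT NOT NULL,
    type_varchar TEXT,
    type_int     INTEGER
);
INSERT INTO fields (fid, label, value_type) VALUES
    (1, 'name', 'varchar'), (2, 'width', 'int');
INSERT INTO "values" (fid, uid, date, type_varchar, type_int) VALUES
    (1, 10, '2009-07-01', 'sunset.jpg', NULL),
    (2, 10, '2009-07-01', NULL, 800),
    (1, 11, '2009-07-02', 'beach.jpg', NULL),
    (2, 11, '2009-07-02', NULL, 640);
""")

# One instance of "values" per attribute: n for the name, w for the width.
rows = conn.execute("""
    SELECT n.uid, n.type_varchar AS name, w.type_int AS width
    FROM "values" AS n
    JOIN "values" AS w ON w.uid = n.uid AND w.date = n.date
    WHERE n.fid = 1 AND w.fid = 2
    ORDER BY n.uid
""").fetchall()
print(rows)  # [(10, 'sunset.jpg', 800), (11, 'beach.jpg', 640)]
```

Every further attribute in the result adds another self-join of the same shape, which is where the query length and the cost come from.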
You should really look into schema-free DBs like CouchDB; problems like this are exactly what these types of DBs are meant to solve.
y'know, create table, alter, add a column, etc are operations you can do at run time in many modern rdbms implementations. Why be EAVil? Especially if you are using dynamic sql.
It's not for the fainthearted. I recall an implementation at Boeing which resulted in 70,000 tables in a database.
Obviously there are pitfalls in dynamic table creation, but they also exist for EAV tables. Things like two attributes for the same key expressing the same fact. Or transitive dependencies and other normalization gotchas. So why not at least leverage the power of the RDBMS on your behalf?
I agree with john owen.
Dynamically creating a query from the schema is a small price to pay compared to querying EAV tables, especially if the tables are large.
Usually table columns are considered an "interface". A design that relies on a dynamically changing interface is usually bad, but EAV data is a special case where you don't have many options. You have to choose between slow unintuitive queries or dynamic schema.