CREATE TABLE auto append default columns? - mysql

In MySQL, is it possible to append default columns after creation, or create them automatically? A brief overview is this:
All tables must have 5 fields that are standardized across our databases (created_on, created_by, row_status, etc.). It's sometimes hard for developers to remember to do this, and/or it's not done uniformly. Going forward we'd like to automate the task somehow. Does anyone know if it's possible to create some sort of internal MySQL script that will automatically append a set of columns to a table?
After reading through some responses, I'd rephrase the question a bit: rather than making it an automatic task (i.e. for EVERY table), make it a function that can be user-triggered to go through and check for said columns, adding them if missing. I'm pretty confident this is out of SQL's scope and would require a scripting language; that's not a huge issue, but it would have been preferable to keep things encapsulated within SQL.

I'm not very aware of MySQL specific data modeling tools, but there's no infrastructure to add columns to every table ever created in a database. Making this an automatic behavior would get messy too, when you think about situations where someone added the columns but there were typos. Or what if you have tables that are allowed to go against business practice (the columns you listed would typically be worthless on code tables)...
Development environments are difficult to control, but the best means of controlling this is by delegating the responsibility and permissions to as few people as possible. That is: there may be 5 developers, but only one of them can apply scripts to TEST/PROD/etc., so it's their responsibility to review the table scripts for correctness.

I would say first: don't do that.
Make a separate audit table and link it with triggers.
Otherwise, you will need to feed your table construction through a procedure or other application that will create what you want.
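As a sketch of the audit-table-plus-trigger approach, with hypothetical names (`orders` as the audited table, `audit_log` as the shared audit table) — the trigger would be repeated per audited table:

```sql
CREATE TABLE audit_log (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    table_name  VARCHAR(64) NOT NULL,
    row_id      INT         NOT NULL,
    action      VARCHAR(10) NOT NULL,
    changed_by  VARCHAR(77) NOT NULL,
    changed_on  TIMESTAMP   NOT NULL DEFAULT CURRENT_TIMESTAMP
);

DELIMITER //
CREATE TRIGGER orders_after_insert
AFTER INSERT ON orders
FOR EACH ROW
BEGIN
    -- record who created the row and when, outside the business table
    INSERT INTO audit_log (table_name, row_id, action, changed_by)
    VALUES ('orders', NEW.id, 'INSERT', CURRENT_USER());
END//
DELIMITER ;
```

This keeps the standard audit fields out of every business table while still capturing them uniformly.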

I'd first defer to Randy's answer - this info is probably better extracted elsewhere.
That said, if you're set on adding the columns, ALTER TABLE is probably what you're looking for. You might consider also including some extra logic to determine which columns are missing for each table.
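For example, the "which tables are missing the column" check can be driven from `information_schema` (the column names mirror the question; repeat or parameterize per standard column):

```sql
-- Find base tables in the current schema missing a `created_on` column
SELECT t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.columns c
       ON c.table_schema = t.table_schema
      AND c.table_name   = t.table_name
      AND c.column_name  = 'created_on'
WHERE t.table_schema = DATABASE()
  AND t.table_type   = 'BASE TABLE'
  AND c.column_name IS NULL;

-- Then, for each table reported, apply the missing column:
ALTER TABLE some_table ADD COLUMN created_on TIMESTAMP NULL;
```

A small script in any language can loop over the first query's results and issue the `ALTER TABLE` statements.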

Related

How to implement custom fields in database

I need to implement custom fields in my database so every user can add any fields he wants to his form/entities.
The user should be able to filter or/and sort his data by any custom field.
I want to work with MySQL because the rest of my data is very suitable to SQL. So, unless you have a great idea, SQL will be preferred over NoSQL.
We thought about few solutions:
JSON field - Great for a dynamic schema. Can be filtered and sorted. The problem is that it is slower than regular columns.
Dynamic indexes could solve that, but it is too risky to add indexes dynamically.
Key-value table - A simple solution but a really slow one. You can't index it properly and the queries are awful.
Static placeholder columns - Create N columns and hold a map of each field to its placeholder. A good solution in terms of performance, but it makes the DB unreadable and the number of columns is limited.
Any thoughts how to improve any of the solutions or any idea for a new solution?
As many of the commenters have remarked, there is no easy answer to this question. Depending on which trade-offs you're willing to make, I think the JSON solution is neatest - it's "native" to MySQL, so easiest to explain and understand.
However, given that you write that the columns are specified only at set up time, by technically proficient people, you could, of course, have the set-up process include an "alter table" statement to add new columns. Your database access code and all the associated view logic would then need to be configurable too; it's definitely non-trivial.
However...it's a proven solution. Magento and Drupal, for instance, have admin screens for adding attributes to the business entities, which in turn adds columns to the relational database.
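If you go the JSON route in MySQL 5.7+, a generated column lets you index the custom fields users filter on most often. Table and field names here are illustrative:

```sql
CREATE TABLE entities (
    id            INT AUTO_INCREMENT PRIMARY KEY,
    name          VARCHAR(100) NOT NULL,
    custom_fields JSON,
    -- extract a frequently-filtered custom field so it can be indexed
    cf_city       VARCHAR(50) AS (custom_fields->>'$.city') STORED,
    INDEX idx_cf_city (cf_city)
);

-- Filtering/sorting on an un-indexed custom field still works, just slower:
SELECT *
FROM entities
WHERE custom_fields->>'$.color' = 'red'
ORDER BY custom_fields->>'$.priority';
```

This is a middle ground between the pure JSON and static-placeholder options: the schema stays dynamic, but hot fields get real indexes.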

update target table given DateCreated and DateUpdated columns in source table

What is the most efficient way of updating a target table given the fact that the source table contains a DateTimeCreated and DateTimeUpdated column?
I would like to keep the source and target in sync while avoiding a
truncate. I am looking for a best-practice pattern for this situation.
I'll avoid a best practice answer but give enough detail to make an appropriate choice. There are two main methods with which you might update a table in SSIS, avoiding a TRUNCATE - LOAD:
1) Use an OLE DB Command
This method is good if:
you have a reliable DateTimeUpdated column,
there are not many rows to update,
there are not a lot of columns to update
there are not many added columns in the dataflow (i.e. derived column transforms)
and the update statement is fairly straightforward.
This method performs poorly with many columns because it performs a row-by-row update. Relying on an audit date column can be a great way to reduce the number of rows to update, but it can also cause problems if rows are updated in the source system and the audit column is not changed. I recommend only trusting it if it is maintained by a trigger or you can be certain that no human can perform updates on the table.
Additionally, this component falls short when there are a lot of columns to map or a lot of transforms going on in the data flow. For example, if you are converting all string columns from unicode to non-unicode, you may have many additional columns in the mix that will make mapping and maintenance a pain. The mapping tool in this component is good for about 10 columns; it starts to get confusing very quickly after that, especially because you are mapping to numbered parameters rather than column names.
Lastly, if you are doing anything complex in the update statement, it is better suited for SQL code rather than maintaining it in the components editor which has no intellisense and is generally painful to use.
2) Stage the data and perform the update in Execute SQL task after the data flow
This method is good for all the reasons that the OLEDB command is bad for, but has some disadvantages as well. There is more code to maintain:
a couple of t-sql tasks,
a proc
and a staging table
This also means it takes more time to set up. However, it performs very well, and the code is far easier to read and understand. Ongoing maintenance is simpler as well.
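As a sketch, the Execute SQL task after the data flow typically runs one set-based statement like this (table and column names are illustrative):

```sql
-- After the data flow lands rows in dbo.stage_customer, one set-based
-- UPDATE replaces the row-by-row OLE DB Command.
UPDATE t
SET    t.name            = s.name,
       t.email           = s.email,
       t.DateTimeUpdated = s.DateTimeUpdated
FROM   dbo.target_customer AS t
JOIN   dbo.stage_customer  AS s
       ON s.customer_id = t.customer_id
WHERE  s.DateTimeUpdated > t.DateTimeUpdated;

TRUNCATE TABLE dbo.stage_customer;  -- clear staging for the next run
```

The date predicate keeps the update limited to rows that actually changed, which is where most of the performance win comes from.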
Please see my notes from this other question that I happened to answer today on the same subject: SSIS Compare tables content and update another

Should I be using triggers to connect two related but heavily denormalized tables?

I've never used triggers before, but this seems like a solid use case. I'd like to know if triggers are what I should be using, and if so, I could use a little hand-holding on how to go about it.
Essentially I have two heavily denormalized tables, goals and users_goals. Both have title columns (VARCHAR) that duplicate the title data. Thus, there will be one main goal of "Learn how to use triggers", and many (well, maybe not many in this case) users' goals with the same title. The architecture of the site demands that this be the case.
I haven't had a need to have a relationship between these two tables just yet. I link from individual users' goals to the main goals, but simply do so with a query by title, (with an INDEX on the title column). Now I need to have a third table that relates these two tables, but it only needs to be eventually consistent. There would be two columns, both FOREIGN KEYs, goal_id and users_goal_id.
Are triggers the way to go with this? And if so, what would that look like?
Yes you could do this using triggers, but the exact implementation depends on your demands.
If you want to rebuild all your queries so they join on goal_id instead of the title, you can just build that. If you need to keep the titles in sync as well, that's an extra.
First for the join. You stated that one goal has many user goals. Does that mean that each user goal belongs to only one goal? If so, you don't need the extra table. You can just add a column goal_id to your user_goals table. Make sure there is a foreign key constraint (I hope you're using InnoDB tables), so you can enforce referential integrity.
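That change is a straightforward ALTER TABLE plus a one-time backfill (assuming `goals.id` is the primary key; other names follow the question):

```sql
-- Requires InnoDB for the foreign key constraint to be enforced
ALTER TABLE users_goals
    ADD COLUMN goal_id INT NULL,
    ADD CONSTRAINT fk_users_goals_goal
        FOREIGN KEY (goal_id) REFERENCES goals (id);

-- One-time backfill by matching on the duplicated title:
UPDATE users_goals ug
JOIN   goals g ON g.title = ug.title
SET    ug.goal_id = g.id;
```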
Then the trigger. I'm not exactly sure how to write them on MySQL. I do use triggers a lot on Oracle, but only seldom on MySQL. Anyway, I'd suggest you build three triggers:
Update trigger on goals table. This trigger should update related user_goals table when the title is modified.
Update trigger on the user_goals table. If user_goals.title is modified, this trigger should check if the title in the goals table differs from the new title in user_goals. If so, you have two options:
Exception: Don't allow the title to be modified in the user_goals child table.
Update: Allow the title to be changed. Update the parent record in goals. The trigger on goals will update the other related user_goals for you.
You could also silently ignore the change by changing the value back in the trigger, but that wouldn't be a good idea.
Insert trigger on user_goals. Easiest option is to query the title of the specified goal_id and don't allow inserting another value for title. You could opt to update goals if a title is given.
Insert trigger on goals. No need for this one.
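A minimal sketch of the first trigger in MySQL syntax (assuming the goal_id column described above exists on user_goals):

```sql
DELIMITER //
CREATE TRIGGER goals_after_update
AFTER UPDATE ON goals
FOR EACH ROW
BEGIN
    -- null-safe comparison: fire only when the title actually changed
    IF NOT (NEW.title <=> OLD.title) THEN
        UPDATE user_goals
        SET    title   = NEW.title
        WHERE  goal_id = NEW.id;
    END IF;
END//
DELIMITER ;
```

The "exception" option in the second trigger would use `SIGNAL` (MySQL 5.5+) to reject the update with an error.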
No, you should never use triggers at all if you can avoid it.
Triggers are an anti-pattern to me; they have the effect of "doing stuff behind the programmer's back".
Imagine a future maintainer of your application needs to do something, if they are not aware of the trigger (imagine they haven't checked your database schema creation scripts in detail), then they could spend a long, long time trying to work out why this happens.
If you need to have several pieces of client-side code updating the tables, consider making them use a stored procedure; document this in the code maintenance manual (and comments etc) to ensure that future developers do the same.
If you can get away with it, just write a common routine on the client side which is always called to update the shared column(s).
Even with triggers, nothing guarantees that the columns are always in sync, so you will need to implement a periodic process that checks this anyway. They will otherwise go out of sync sooner or later (maybe just because some operations engineer decides to start doing manual updates; maybe one table gets restored from a backup and the other doesn't).

Pros and Cons for CreatedDate and ModifiedDate columns in all database tables

What are the pros and cons? When should we have them and when we shouldn't?
UPDATE
What is this comment in an update SP auto generated with RepositoryFactory? Does it have to do anything with above columns not present?
--The [dbo].[TableName] table doesn't have a timestamp column. Optimistic concurrency logic cannot be generated
If you don't need historical information about your data, adding these columns will fill space unnecessarily and cause fewer records to fit on a page.
If you do or might need historical information then this might not be enough for your needs anyway. You might want to consider using a different system such as ValidFrom and ValidTo, and never modify or delete the data in any row, just mark it as no longer valid and create a new row.
See Wikipedia for more information on different schemes for keeping historic information about your data. The method you proposed is similar to Type 3 on that page and suffers from the same drawback that only information about the last change is recorded. I suggest you read some of the other methods too.
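A minimal sketch of a ValidFrom/ValidTo scheme, with illustrative names — rows are never updated in place, and an open-ended ValidTo marks the current version:

```sql
CREATE TABLE customer_history (
    customer_id INT          NOT NULL,
    name        VARCHAR(100) NOT NULL,
    ValidFrom   DATETIME     NOT NULL,
    ValidTo     DATETIME     NOT NULL DEFAULT '9999-12-31',
    PRIMARY KEY (customer_id, ValidFrom)
);

-- A "change" closes the current row and inserts a new one:
UPDATE customer_history
SET    ValidTo = NOW()
WHERE  customer_id = 42 AND ValidTo = '9999-12-31';

INSERT INTO customer_history (customer_id, name, ValidFrom)
VALUES (42, 'New Name', NOW());
```

Unlike a single ModifiedDate column, this preserves every prior version, at the cost of more storage and slightly more complex queries.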
All I can say is that data (or full blown audit tables) has helped me find what or who caused a major data problem. All it takes is one use to convince you that it is good to spend the extra time to keep these fields up-to-date.
I don't usually do it for tables that are only populated through a single automated process and no one else has write permissions to the table. And usually it isn't needed for lookup tables which users generally can't update either.
There are pretty much no cons to having them, so if there is any chance you will need them, then add them.
People may mention performance or storage concerns but,
in reality they will have little to no effect on SELECT performance with modern hardware, and properly specified SELECT clauses
there can be a minor impact on write performance, but this will likely only be a concern in OLTP-type systems, and that is exactly the case where you usually want these kinds of columns
if you are at the point where adding columns like this are a dealbreaker in terms of performance, then you are likely looking at moving away from SQL databases as a storage platform
With CreatedDate, I almost always set it up with a default value of GetDate(), so I never have to think about it. When building out my schema, I will add both of these columns unless it is a lookup table with no GUI for administering it, because I know it is unlikely the data will be kept up to date if modified manually.
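For illustration, in SQL Server terms (matching the GetDate() default mentioned above; table and column names are made up):

```sql
CREATE TABLE dbo.Widget (
    WidgetId     INT IDENTITY PRIMARY KEY,
    Name         NVARCHAR(100) NOT NULL,
    CreatedDate  DATETIME NOT NULL DEFAULT GETDATE(),  -- set once, automatically
    ModifiedDate DATETIME NULL
);

-- ModifiedDate has no automatic hook in SQL Server, so set it in each
-- UPDATE statement (or via an AFTER UPDATE trigger):
UPDATE dbo.Widget
SET    Name = 'Renamed',
       ModifiedDate = GETDATE()
WHERE  WidgetId = 1;
```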
Some DBMSs provide other means to capture this information automatically, for example Oracle Flashback or Microsoft Change Tracking / Change Data Capture. Those methods also capture more detail than just the latest modification date.
That column type timestamp is misleading. It has nothing to do with time; it is rowversion. It is widely used for optimistic concurrency, example here
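A minimal sketch of how rowversion drives optimistic concurrency in SQL Server (table names and variables are illustrative):

```sql
CREATE TABLE dbo.Account (
    AccountId INT PRIMARY KEY,
    Balance   MONEY NOT NULL,
    RowVer    ROWVERSION  -- updated automatically on every write
);

-- The client reads (AccountId, Balance, RowVer), then updates only if the
-- row is unchanged; @original_rowver holds the value read earlier.
UPDATE dbo.Account
SET    Balance = @new_balance
WHERE  AccountId = @id
  AND  RowVer   = @original_rowver;
-- If @@ROWCOUNT = 0, someone else changed the row first; retry or report.
```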

What is the right way to do flexible columns in database?

I'm storing data in a database where users are able to add and remove columns ("fake" columns). How do I implement this efficiently?
The best way would be to implement the data structure vertically instead of the normal horizontal layout.
This can be done using something like
TableAttribute
AttributeID
AttributeType
AttributeValue
This vertical design is mostly used in applications where users can create their own custom forms and fields (if I recall correctly, the DevExpress form layout allows you to create custom layouts). Mostly used in CRM applications, it is easily modified in production and easily maintainable, but can greatly decrease SQL performance once the data set becomes very large.
EDIT:
This will depend on how far you wish to take it. You can set it up per form/table, adding attributes that describe the actual control (lookup, combo, datetime, etc.), the position of the controls, and allowed values (min/max/allow null).
This can become a very cumbersome task, but will greatly depend on your actual needs.
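A minimal sketch of that vertical (EAV) layout, with illustrative names; the pivot query shows why performance degrades as the data set grows:

```sql
CREATE TABLE table_attribute (
    attribute_id    INT AUTO_INCREMENT PRIMARY KEY,
    entity_id       INT         NOT NULL,   -- row of the "real" table
    attribute_name  VARCHAR(64) NOT NULL,
    attribute_type  VARCHAR(20) NOT NULL,   -- 'text', 'number', 'date', ...
    attribute_value VARCHAR(255),
    UNIQUE KEY uq_entity_attr (entity_id, attribute_name)
);

-- Pivoting attributes back into columns is where the cost shows up:
SELECT entity_id,
       MAX(CASE WHEN attribute_name = 'color' THEN attribute_value END) AS color,
       MAX(CASE WHEN attribute_name = 'size'  THEN attribute_value END) AS size
FROM   table_attribute
GROUP  BY entity_id;
```

Every "column" added this way becomes another conditional aggregate (or self-join) in every read query, which is the scaling problem the answers below warn about.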
I'd think you could allow that at the user-permission level (grant the ALTER privilege on the appropriate tables) and then restrict what types of data can be added/deleted using your presentation layer.
But why add columns? Why not a linked table?
Allowing users to define columns is generally a poor choice as they don't know what they are doing or how to relate it properly to the other data. Sometimes people use the EAV approach to this and let them add as many columns as they want, but this quickly gets out of control and causes performance issues and difficulty in querying the data.
Others take the approach of having a table with user-defined columns and give them a set number of columns they can define. This works better performance-wise but is more limiting in terms of how many new columns they can define.
In any event, you should severely restrict who can define the additional columns to system admins only (who can be at the client level). It is a better idea to actually talk to users in the design phase and see what they need. You will find that you can properly design a system that has 90+% of what the customer needs if you actually talk to them (and not just to managers either, but to users at all levels of the organization).
I know it is common in today's world to slough off our responsibility to design by saying we are making things flexible, but I've had to use and provide DBA support for many of these systems, and the more flexible they try to make the design, the harder it is for the users to use and the more the users hate the system.