Due to bad design, I have a database that contains data in one table that really should be split up into two tables.
The table provides data for two different models. I distinguish between those models using a table field called type.
I use this to say: if type == MODEL_A, do foo; if type == MODEL_B, do bar.
Depending on the type of the concrete record (type: MODEL_A or MODEL_B), I only use a subset of the columns for MODEL_A and the remaining subset for MODEL_B. Therefore, many columns always contain NULL.
I believe they should be split up into a MODEL_A table and a MODEL_B table.
How should I go about this in Rails/ActiveRecord, without dropping the existing data?
This is a pretty broad question, so my answer will focus on procedure rather than specific code.
1. Create a new table for the MODEL_B data. Name it MODEL_B_TABLE (for example).
2. Rename the original table (if necessary), since it will now be used only for MODEL_A data.
3. Run a query to pull all MODEL_B data from the original table and place it into the new MODEL_B_TABLE.
4. Update your application to pull from the correct database tables.
5. Remove the unneeded data from the original table (since that specific data now exists in the MODEL_B_TABLE).
6. Test, test, test!
7. Upload the changes to a staging server and run the migrations.
8. If all looks well on the staging server, push to production. If not, start over from step 6.
That would be an appropriate procedure to avoid data loss for a production server. Proper testing is paramount! Ensure you make backups of all your data before pushing into production.
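As a rough illustration of steps 1, 3, and 5, the data move might look like the following (MySQL syntax; the table and column names are made up, since the question does not show the real schema):

-- Step 1: a new table holding only the MODEL_B columns (names assumed).
CREATE TABLE model_b_records (
    id        INT AUTO_INCREMENT PRIMARY KEY,
    bar_field VARCHAR(255)
);

-- Step 3: copy the MODEL_B rows over.
INSERT INTO model_b_records (id, bar_field)
SELECT id, bar_field
FROM model_a_records
WHERE type = 'MODEL_B';

-- Step 5: once the copy is verified, remove those rows from the original.
DELETE FROM model_a_records
WHERE type = 'MODEL_B';

In a Rails project these statements would normally live in a migration, so that staging and production run exactly the same steps.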
I wanted to ask for some advice in structuring the SQL database I am creating.
UPDATE: Storing Data in MySQL as JSON seems to clearly indicate that storing JSON in MySQL is not a smart choice.
My initial idea was to create a table for each user, named '{user_id}', with the following columns:
Datetime
Entry (one-digit int)
Comment (reasonably short string)
Activity (one word)
However, I have read that creating a table for each user is not advisable because it's unmanageable in the long run.
Specifically, I wanted to know how I could put all the information that I would have put in the '{user_id}' table in the user's row of my 'members' table.
I had a few ideas, but don't know how good they are:
Storing the user data as a JSON object (converted to a string) in an additional column 'data' of the 'members' table. Would that become unmanageable in the long run too (due to the JSON string becoming too long)?
Storing the user data in various additional columns of the 'members' table, maybe one for each of the parameters listed above (each of them being an array)
Storing the user data in various additional columns of the 'members' table, maybe one for each of the parameters listed above (each of them being a dictionary or some other data structure)
Are there any other better ways, or better data storage types than JSON objects?
What would be a good way of storing this information? Isn't handling the arrays/dictionaries going to become unmanageable over time when they become very big?
(One thing to keep in mind is that the 'data' entries would have to be modified daily and easily accessed.)
I think you may simply want a single additional table, maybe called "activities", with a foreign key "user" to the "members" table.
Then, for each row in each of the per-user tables that you were originally thinking of, you have a row in the activities table, with the value of "user" being the user in question. Since each row is of relatively small, bounded size, one would expect the database to handle it well, and efficiency issues can be addressed by indexing. Basically, I am agreeing with @MikeNakis.
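A minimal sketch of that schema (MySQL syntax; the column names, types, and sizes are my assumptions based on the fields listed in the question):

CREATE TABLE activities (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    user       INT NOT NULL,          -- foreign key to the members table
    created_at DATETIME NOT NULL,
    entry      TINYINT NOT NULL,      -- the one-digit int
    comment    VARCHAR(255) NULL,     -- the reasonably short string
    activity   VARCHAR(50) NULL,      -- the one-word activity
    FOREIGN KEY (user) REFERENCES members(id),
    INDEX idx_activities_user (user)  -- keeps per-user lookups fast
);

Daily writes are then ordinary INSERTs and UPDATEs against this one table, and "all activity for user X" is a single indexed query.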
Can I use Yii2 Faker (and if yes, how) to fill an entire table (all columns) with random data for n records, without knowing the table structure? Can Faker check the schema and do this for me, or do I have to write my own code that uses it in this scenario?
For example, I want to test how large my database will become when I feed it with, say, millions of records. Since my database contains many tables, and each table has a different structure, I would like to use something automated rather than writing my own code for each table and each structure.
Is this possible with Faker or possibly any other extension to Yii2?
Take a look at Gii: it goes through all the columns on a table to generate code, so the same schema-inspection approach applies here. You can also figure out which columns are foreign keys and get data from the referenced tables.
I do not know of anything that does this for you automatically, but it is doable.
One thing: you have to fill the tables in a specific order, otherwise it will not work, especially with the foreign keys.
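As an illustration of what such code needs to discover first (MySQL syntax; 'my_db' and 'my_table' are placeholders), the column types and foreign keys can be read from INFORMATION_SCHEMA:

-- Which columns does the table have, and of which types?
SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'my_db'
  AND TABLE_NAME = 'my_table';

-- Which columns are foreign keys, and what do they reference?
-- (Needed to pick the fill order and to reuse valid key values.)
SELECT COLUMN_NAME, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = 'my_db'
  AND TABLE_NAME = 'my_table'
  AND REFERENCED_TABLE_NAME IS NOT NULL;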
I am working on a project that is an upgrade of an existing system.
The existing DB structure must be kept intact as there is a system reading the same DB that will be used ontop of the new system.
I am building a new CMS / management system using a PHP framework that expects to see every DB table's autoincrement ID field named simply "id". I do not want to modify the PHP to deal with anything other than "id" as this field name; trust me, it would be a massive task.
The existing DB has non-standard autoincrement ID field naming, e.g.:
"iBmsId" (schema: i = INT, Bms = the name of the table, Id = ID).
Is there anything I can do to the DB itself to make a duplicate of the "iBmsId" column, to create a matched column called simply "id" that has the corresponding INT values? This way my new system will function as expected without having to do a serious re-write, and at the same time still have the existing system able to communicate with the DB?
In this situation you can just use a VIEW :)
http://dev.mysql.com/doc/refman/5.0/en/create-view.html
A view in a DBMS is like a virtual table (unless it is materialized). Views add a new abstraction layer, which can support independence between how you use the DB and how it is implemented. A view can also increase security, for example by hiding some fields or by making the view read-only.
Notice: in order to add a view transparently, you can rename the original table and create the view with the original table's name. This lets you avoid modifications to existing code.
You can read here how to create an updatable and insertable view (which can behave like a normal table).
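A sketch of that transparent setup, assuming from the column name that the table is called Bms (the real name may differ):

-- Rename the original table, then put a view in its place under the
-- old name, so the existing system keeps working unchanged.
RENAME TABLE Bms TO Bms_base;

CREATE VIEW Bms AS
SELECT b.*, b.iBmsId AS id
FROM Bms_base b;

Note that because iBmsId now appears twice in the select list, MySQL will not treat this view as insertable; the next answer covers that caveat.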
If only one system at a time is modifying the value, then you can use a view:
create view v_table as
select t.*, t.iBmsId as id
from my_table t;
Presumably, an auto-incremented value is not going to be updated, so this should be safe. However, keep in mind that:
To be more specific, a view is not updatable if it contains any of the following:
. . .
Multiple references to any column of a base table.
This could affect other columns that you might want to treat the same way.
I have an existing application (with MySQL DB).
I just got a new requirement where I need to delete some records from one of the main entities. I don't want to apply a hard delete here, as it's risky for the whole application. If I use a soft delete, I have to add another field, is_deleted, and because of that I have to update all my queries (like where is_deleted = '0').
Please let me know if there is a smarter way to handle this situation. I would have to change half of my queries if I introduce a new flag to handle deletes.
Your application can run without any changes. MySQL follows the ANSI-SPARC architecture. With an external schema you achieve Codd's rule 9, "Logical data independence":
Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data independence.
You can rename your tables and create views with original table names. A sample:
Let's suppose a table named my_data:
RENAME TABLE my_data TO my_data_flagged;

ALTER TABLE my_data_flagged
ADD COLUMN is_deleted BOOLEAN NOT NULL DEFAULT 0;

CREATE VIEW my_data AS
SELECT *
FROM my_data_flagged
WHERE is_deleted = 0;
Another way is to create a trigger and keep a copy of the erased rows in an independent table.
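A rough sketch of that trigger variant (MySQL; the archive table and the explicit column list are assumptions, since a trigger has to copy the columns one by one):

-- An archive table with the same structure as the original.
CREATE TABLE my_data_archive LIKE my_data_flagged;

DELIMITER //
CREATE TRIGGER my_data_before_delete
BEFORE DELETE ON my_data_flagged
FOR EACH ROW
BEGIN
    -- The columns must be listed explicitly; id and name are placeholders.
    INSERT INTO my_data_archive (id, name)
    VALUES (OLD.id, OLD.name);
END//
DELIMITER ;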
Four suggestions:
1. Instead of using a bit called is_deleted, use a datetime called something like deleted_date: leave it NULL while the record is active, and set it to the deletion timestamp otherwise. This way you also know when a particular record was deleted.
2. Instead of updating half of your queries to exclude deleted records, create a view that does this filtering, and then update your queries to use this view instead of applying the filtering everywhere.
3. If the soft-deleted records are involved in any type of relationship, you may have to create triggers to ensure that active records can't have a parent that is flagged as deleted.
4. Think ahead to how you want to eventually hard-delete these soft-deleted records, and make sure that you have the appropriate integrity checks in place before performing the hard-delete.
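Suggestions 1 and 2 together might look like this (table and column names are illustrative):

ALTER TABLE my_data
ADD COLUMN deleted_date DATETIME NULL DEFAULT NULL;

-- Existing queries switch to this view and never see deleted rows.
CREATE VIEW my_data_active AS
SELECT *
FROM my_data
WHERE deleted_date IS NULL;

-- A "delete" then becomes:
UPDATE my_data SET deleted_date = NOW() WHERE id = 42;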
We are currently thinking about different ways to implement custom fields for our web application. Users should be able to define custom fields for certain entities and fill in/view this data (and possibly query the data later on).
I understand that there are different ways to implement custom fields (e.g. using a name/value table or using alter table etc.) and we are currently favoring using ALTER TABLE to dynamically add new user fields to the database.
After browsing through other related SO topics, I couldn't find any big drawbacks of this solution. In contrast, having the option to query the data in a fast way (e.g. by directly using SQL's WHERE clause) is a big advantage for us.
Are there any drawbacks you could think of with implementing custom fields this way? We are talking about a web application that is used by up to 100 users at the same time (not concurrent requests) and can use both MySQL and MS SQL Server databases.
Just as an update, we decided to add new columns via ALTER TABLE to the existing database table to implement custom fields. After some research and tests, this looks like the best solution for most database engines. A separate table with meta information about the custom fields provides the needed information to manage, query and work with the custom fields.
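For the record, the shape of that solution might look roughly like this (MySQL syntax; all names are invented, as the question does not show the real schema):

-- Adding a user-defined field is a DDL statement...
ALTER TABLE entities ADD COLUMN custom_birthday DATE NULL;

-- ...plus a row in a meta table that tells the application which
-- custom fields exist, so it can manage and query them.
CREATE TABLE custom_field_meta (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    table_name  VARCHAR(64)  NOT NULL,
    column_name VARCHAR(64)  NOT NULL,
    label       VARCHAR(100) NOT NULL,
    data_type   VARCHAR(20)  NOT NULL
);

INSERT INTO custom_field_meta (table_name, column_name, label, data_type)
VALUES ('entities', 'custom_birthday', 'Birthday', 'DATE');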
The first drawback I see is that you need to grant your application service ALTER rights.
This implies that your security model needs careful attention, as the application will be able not only to add fields but also to drop and rename them, and to create some tables (at least for MySQL).
Secondly, how would you distinguish fields that are required per user? Or can the fields created by user A be accessed by user B?
Note that the number of columns may also grow significantly. If every user adds 2 fields, the 100 users mentioned already mean 200 extra fields.
Personally, I would use one of the two approaches or a mix of them:
Using a serialized field
I would add one text field to the table in which I would store a serialized dictionary or dictionaries:
{
    user_1: {key1: val1, key2: val2, ...},
    user_2: {key1: val1, key2: val2, ...},
    ...
}
The drawback is that the values are not easily searchable.
Using a multi-type name/value table
fields table:
user_id: int
field_name: varchar(100)
type: enum('INT', 'REAL', 'STRING')
values table:
field_id: int
row_id: int # the main table row id
int_value: int
float_value: float
text_value: text
Of course, it requires a join and is a bit more complicated to implement but far more generic and, if indexed properly, quite efficient.
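Spelled out as DDL, that name/value variant could look like this (a sketch; the listing above only names the fields, so the key and type choices are my assumptions):

CREATE TABLE fields (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    user_id    INT NOT NULL,
    field_name VARCHAR(100) NOT NULL,
    type       ENUM('INT', 'REAL', 'STRING') NOT NULL
);

CREATE TABLE field_values (
    field_id    INT NOT NULL,
    row_id      INT NOT NULL,    -- the main table row id
    int_value   INT NULL,
    float_value FLOAT NULL,
    text_value  TEXT NULL,
    PRIMARY KEY (field_id, row_id),
    FOREIGN KEY (field_id) REFERENCES fields(id)
);

-- A query picks the value column that matches the field's type:
SELECT v.row_id, v.int_value
FROM fields f
JOIN field_values v ON v.field_id = f.id
WHERE f.field_name = 'age' AND f.type = 'INT';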
I see nothing wrong with adding new custom fields to the database table.
With this approach, the specific/most appropriate type can be used, i.e. need an int field? Define it as int. With a name/value type table, by contrast, you'd be storing multiple data types as one type (probably nvarchar), unless you complete that name/value table with multiple columns of different types and populate the appropriate one, but that is a bit horrible.
Also, adding new columns makes querying easier; there is no need to involve a join to a new name/value table.
It may not feel as generic, but I feel that's better than having a "one-size fits all" name/value table.
From an SQL Server point of view (2005 onwards)....
An alternative would be to create one "custom data" field of type XML. This would be truly generic and would require no field creation or a separate name/value table. It also has the benefit that not all records have to have the same custom data (i.e. the one field is common, but what it contains doesn't have to be). I'm not 100% sure of the performance impact, but XML data can be indexed.
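A sketch of that XML approach (SQL Server syntax; the table, element, and index names are made up):

-- One generic column holds whatever custom data each record needs.
ALTER TABLE entities ADD custom_data XML NULL;

-- XML columns can be indexed (the table needs a clustered primary key)...
CREATE PRIMARY XML INDEX ix_entities_custom_data
ON entities (custom_data);

-- ...and queried with XQuery:
SELECT id
FROM entities
WHERE custom_data.value('(/fields/age)[1]', 'int') > 30;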