Reading this article,
http://weblogs.asp.net/manavi/archive/2010/12/28/inheritance-mapping-strategies-with-entity-framework-code-first-ctp5-part-2-table-per-type-tpt.aspx
and doing a simple test myself, I feel that the database isn't very well protected from a data-integrity point of view.
Let's take the example from that page:
1 base class and 2 subclasses:
base = BillingDetail
subclasses = CreditCards, BankAccounts
That gives 3 tables, with BillingDetail's primary key shared by CreditCards and BankAccounts.
Say that from the application we create a CreditCards entity and save it to the database. We get a row in BillingDetail plus a row with the same ID in CreditCards.
If we then go to the database and manually create a row in BankAccounts with the same ID, it now also shows up as a BankAccounts entity in the application.
I mean, is this behavior... wrong?
(Not that I am overly concerned about this, but wouldn't it have been better to add something like a trigger or constraint in the database to ensure integrity?)
You can always create your own trigger if you feel you need additional integrity. The approach you describe is indeed allowed by the database, but once you make such a change your EF application will fail whenever it tries to read the corrupted record.
The reason EF does it this way is that both code first and model first are aimed at scenarios where the database exists only for your application, so you are not expected to deal with modifications made from elsewhere.
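For illustration, here is a minimal sketch of such a trigger in MySQL syntax (the table and column names, including BillingDetailId, are assumptions based on the linked example; the EF article itself targets SQL Server, where trigger syntax differs):

DELIMITER $$

CREATE TRIGGER trg_bankaccounts_exclusive
BEFORE INSERT ON BankAccounts
FOR EACH ROW
BEGIN
    -- reject a BankAccounts row whose id is already used by a CreditCards row
    IF EXISTS (SELECT 1 FROM CreditCards
               WHERE BillingDetailId = NEW.BillingDetailId) THEN
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'This BillingDetail is already a CreditCard';
    END IF;
END$$

DELIMITER ;

A mirror trigger on CreditCards would be needed for the other direction.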
I hope this isn't going to be too broad a question or too subjective (and please correct me if it is), but what is the best practice, or the most sensible way, of populating the rows of a child table with data that MUST exist when a parent record is created?
I have come across something like this database structure in the past (this is not a real database design, it is just something I threw together in MySQL Workbench to give you the general idea; I know there are errors in it).
When a test is applied to a 'sample' (i.e. a record is added to the 'SampleTest' table), all of the 'components' that comprise that test need to be added to the 'SampleResult' table, almost as if the 'test' and 'component' tables contain a template for the 'SampleTest' and 'SampleResult' tables.
What is the best way to ensure that the 'SampleResult' table gets populated with all the records it needs when a test is applied to a sample?
Example data:
The Test table contains a test called Metals.
The Metals test is made up of several Component records: Arsenic, Iron, Aluminium, etc.
The Metals test is added to a sample by adding a record to the SampleTest Table.
At that point the SampleResult table needs to have records linking to the three components: Arsenic, Iron and Aluminium.
Should this be dealt with by the database itself or is this a front-end issue?
The two options are:
Implement in the database in the form of a trigger, where the trigger would look something like this:
CREATE TRIGGER ins_test_components
AFTER INSERT ON SampleTest
FOR EACH ROW
BEGIN
    -- copy every component belonging to the newly applied test
    -- (assumes the Component table carries a test_id foreign key)
    INSERT INTO SampleResult (component_id, ...)
    SELECT Component.id, ...
    FROM Component
    WHERE Component.test_id = NEW.test_id;
END
The trigger above inserts all of the components of the test whose id appears in the NEW row that was inserted into SampleTest.
Here's a tutorial on triggers.
Implement this within the application, but make sure it is wrapped in a transaction, so that if something goes wrong the whole operation is aborted and the database remains in a consistent state. This is definitely not a front-end issue, since you need to add rows to a database table as a side effect of another INSERT.
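If you take the application route, the statements might look roughly like this (a minimal sketch; the sample_id/test_id values and column names are placeholders, not taken from your actual schema):

START TRANSACTION;

-- apply test 7 to sample 42
INSERT INTO SampleTest (sample_id, test_id) VALUES (42, 7);
SET @new_sample_test := LAST_INSERT_ID();

-- create one result row per component of that test
INSERT INTO SampleResult (sample_test_id, component_id)
SELECT @new_sample_test, c.id
FROM Component c
WHERE c.test_id = 7;

COMMIT;  -- or ROLLBACK if any statement failed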
I am designing a database to record experiment results. Basically, an experiment has several input parameters and an output response, so the data table will look like the following:
run_id  parameter_1  parameter_2  ...  parameter_n  response
1       ...          ...          ...  ...          ...
2       ...          ...          ...  ...          ...
...
However, the structure of this table is not fixed, since different experiments have different numbers of columns. The question is: when a user sets up a new experiment, is it a good idea to create its data table dynamically on the fly? If not, what is an elegant solution for this? Thanks.
When I find myself trying to dynamically create tables at runtime, it usually means I need another table to resolve a relationship between entities. In short, I would recommend treating your input parameters as a separate entity and storing them in a separate table.
It sounds like your entities are:
experiment
runs of an experiment, each consisting of a response and one or more input parameters
The relationships between the entities are:
One experiment to zero or more runs
One run to one or more input parameter values (one to many)
This last relationship will require an additional table to resolve. You can have a separate table that stores your input parameters, and associate the input parameters with a run_id. This table could look like:
run_parameter_id | run_id_fk | parameter_keyword | parameter_value
Where run_id_fk is a foreign key to the appropriate row in the Runs table (described in your question). The parameter_keyword is just used to keep track of the name of the parameter (parameter_1_exp1, parameter_2_exp1, etc).
Your queries to read/write from the database now become a bit more complicated (needing a join), but no longer reliant on creating tables on the fly.
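As a rough sketch (assuming a run table whose primary key is run_id and which also stores the response; all names here are illustrative, not prescriptive):

CREATE TABLE run_parameter (
    run_parameter_id  INT AUTO_INCREMENT PRIMARY KEY,
    run_id_fk         INT NOT NULL,
    parameter_keyword VARCHAR(100) NOT NULL,   -- e.g. 'temperature', 'pressure'
    parameter_value   DOUBLE,
    FOREIGN KEY (run_id_fk) REFERENCES run (run_id)
);

-- read back one run together with all of its parameters
SELECT r.run_id, r.response, p.parameter_keyword, p.parameter_value
FROM run r
JOIN run_parameter p ON p.run_id_fk = r.run_id
WHERE r.run_id = 1;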
Let me know if this is unclear and I can provide a potential database diagram.
This is something that has bothered me for a long time, and I have still been unable to find an answer.
I have a huge system with a lot of different features. What they all have in common, of course, is that my users can
create, update, read & delete
different parts of my system.
For simplicity's sake, let's say I have an application that has the following features:
Document administration
Video administration
User administration
Salary administration
(Please note that I picked these at random, just to make the point that each of these would have its own separate tables and they are not necessarily connected.)
Now I wish to create some sort of logging system, so that whenever someone creates, updates or deletes an entity it will be recorded.
As far as I can see, I can do this in two ways.
1.
Create a logging table for each of the 4 features in my system. However, with this method I am required to create a logging table for each new feature I add to the system. I would also have to combine data from X number of tables if I wanted to produce a combined log, which could potentially be a huge task!
2.
I could create something like the following:
However, once again I would have to add a column for each new feature I add.
So my question is: what is the best way to design the logging database architecture?
Or is there an easier way?
Instead of one target_xx column for each feature, you could do it this way:
target_id | target_type
1         | video
4         | document
5         | user
2         | user
Or, even better, have a separate table of target types and insert only the corresponding type id into target_type.
Something like this:
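As a rough sketch, with purely illustrative table and column names (these are assumptions, not a prescribed schema):

CREATE TABLE target_type (
    target_type_id INT AUTO_INCREMENT PRIMARY KEY,
    name           VARCHAR(50) NOT NULL      -- 'video', 'document', 'user', ...
);

CREATE TABLE activity_log (
    log_id         INT AUTO_INCREMENT PRIMARY KEY,
    user_id        INT NOT NULL,             -- who performed the action
    action         ENUM('create','update','delete') NOT NULL,
    target_type_id INT NOT NULL,             -- what kind of entity was touched
    target_id      INT NOT NULL,             -- which row of that entity's table
    logged_at      DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (target_type_id) REFERENCES target_type (target_type_id)
);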
If you want to capture the creation and update date for each table, I would just use MySQL's column DEFAULT and ON UPDATE behaviour. You can define the fields like this for a table:
ALTER TABLE `my_table`
ADD COLUMN CreateDate DATETIME DEFAULT CURRENT_TIMESTAMP,
ADD COLUMN LastModifiedDate DATETIME ON UPDATE CURRENT_TIMESTAMP;
You can add these 2 fields to all your tables. If you want to use one central table for logging instead (which might be more difficult to manage, because you always need joins, and performance may be worse), then I would work with triggers.
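A trigger feeding such a central log table could look roughly like this (a sketch only; the change_log table, the video table and its id column are assumptions used for illustration):

DELIMITER $$

CREATE TRIGGER trg_video_after_update
AFTER UPDATE ON video
FOR EACH ROW
BEGIN
    -- one trigger like this per table and per event (INSERT/UPDATE/DELETE)
    INSERT INTO change_log (table_name, row_id, action, changed_at)
    VALUES ('video', NEW.id, 'update', NOW());
END$$

DELIMITER ;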
This is a general question which would really help me "connect the dots" in my studies.
I am currently doing exercises working with DAO and learning how to add tables automatically. Although I have been working with databases for many years, I wonder in what kind of scenarios it would be advantageous to use this feature. When is it necessary to add tables to a database automatically? Up until now, in all my experience, the tables I need have always been defined from the beginning, and I can't think of a situation where I could have benefited from using this feature. For example, I frequently use delete queries to clear tables and re-populate them, but when would it be necessary to actually "create" a new table?
Yes, I have seen a scenario where new tables were created on the fly (either via a SQL CREATE statement or just DAO). With a shared database on a server, the application called for importing Excel data that a particular user was responsible for, so a table was created on the fly. With multiple users, changes in staff, the need to keep data independent, and so on, we could give each user their own table (named after their user id), along with interfaces to do whatever they wanted with their own data. Not a typical scenario, but it worked well for this application.
I am a bit rusty with MySQL and am trying to jump back in, so sorry if this is too easy a question.
I basically created a data model that has a "Master" table with required fields for a name and an IDcode, and then a "Details" table with a foreign key of IDcode.
Now here's where it's getting tricky. I am entering:
INSERT INTO Details (Name, UpdateDate) Values (name, updateDate)
I get an error saying that IDcode on Details doesn't have a default value; when I add one, it then complains that the field 'Master_IDcode' doesn't have a default value.
It all makes sense, but I'm wondering if there's an easy way to do what I am trying to do. I want to add data into Details, and if no IDcode exists yet I want an entry to be added to the Master table as well. The problem is that I first have to add the name to the Master table, wait for a unique ID to be generated (for IDcode), look that value up, and then add it to the query that inserts the detail data. As you can imagine, the queries will probably get quite long, since I have many tables.
Is there an easier way, where every time I add something it checks by name whether the foreign key exists and, if not, adds it to all the tables it is linked to? Is there a standard way people do this? I can't imagine that, with all the complex databases out there, people haven't figured out an easier way.
Sorry if this question doesn't make sense. I can add more information if needed.
P.S. This may be a separate question, but I have heard of Django for Python and that it helps create queries; would it help in my situation?
Thanks so much in advance :-)
(decided to expand on the comments above and put it into an answer)
I suggest creating a set of staging tables in your database (one for each data set/file).
Then use LOAD DATA INFILE (or insert the rows in batches) into those staging tables.
Make sure you drop indexes before the load, and re-create what you need after the data is loaded.
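For example (a sketch only; the staging table layout, file name and delimiters are placeholders you would adapt to your own files):

-- staging table mirrors the layout of the incoming file; no indexes yet
CREATE TABLE staging_orders (
    country_code CHAR(2),
    order_ref    VARCHAR(50),
    amount       DECIMAL(12,2)
);

LOAD DATA INFILE '/tmp/orders.csv'
INTO TABLE staging_orders
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(country_code, order_ref, amount);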
You can then make a single pass over the staging table to create the missing master records. For example, let's say that one of your staging tables contains a country code that should be used as a master ID. You could add the missing master records by doing something along the lines of:
insert
into master_table(country_code)
select distinct s.country_code
from staging_table s
left join master_table m on(s.country_code = m.country_code)
where m.country_code is null;
Then you can proceed and insert the rows into the "real" tables, knowing that all detail rows reference a valid master record.
If you need to pull in reference information along with the data (such as translating some code), you can do this with a simple join. Likewise, if you want to filter rows against some other table, that is now very easy.
insert
into real_table_x(
`key`
,colA
,colB
,colC
,computed_column_not_present_in_staging_table
,understandableCode
)
select x.`key`
,x.colA
,x.colB
,x.colC
,(x.colA + x.colB) / x.colC
,c.understandableCode
from staging_table_x x
join code_translation c on(x.strange_code = c.strange_code);
This approach is a very efficient one and it scales very nicely. Variations of the above are commonly used in the ETL part of data warehouses to load massive amounts of data.
One caveat with MySQL is that it doesn't support hash joins, a join mechanism that is very well suited to joining two tables in full. MySQL uses nested loops instead, which means that you need to index the join columns very carefully.
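For instance, for the INSERT ... SELECT above you would want an index on the lookup column of the translation table (names as in the example above):

-- the nested-loop join probes this index once per staging row
ALTER TABLE code_translation ADD INDEX idx_strange_code (strange_code);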
InnoDB tables with their clustering feature on the primary key can help to make this a bit more efficient.
One last point. When you have the staging data inside the database, it is easy to add some analysis of the data and put aside "bad" rows in a separate table. You can then inspect the data using SQL instead of wading through CSV files in your editor.
I don't think there's a one-step way to do this.
What I do is issue a
INSERT IGNORE INTO master (..) VALUES (..)
to the master table, which will either create the row if it doesn't exist or do nothing, and then issue a
SELECT id FROM master where someUniqueAttribute = ..
The other option would be stored procedures/triggers, but they are still pretty new in MySQL and I doubt whether this would help performance.
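Put together, the pattern could look roughly like this (a sketch; the column names on Master and Details are assumptions based on the question, and IGNORE only helps if Master.Name has a UNIQUE index):

-- create the master row only if this name is not there yet
INSERT IGNORE INTO Master (Name) VALUES ('Some name');

-- look up the generated IDcode for that name
SELECT IDcode FROM Master WHERE Name = 'Some name';

-- use the returned IDcode when inserting the detail row
INSERT INTO Details (Master_IDcode, Name, UpdateDate)
VALUES (123, 'Some detail', NOW());   -- 123 = the IDcode returned above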