Database design for keeping track of experiment data - mysql

I am designing a database to record experiment results. Basically, an experiment has several input parameters and an output response. Therefore, the data table will look like following:
run_id parameter_1 parameter_2 ... parameter_n response
1 ... ... ... ...
2 ... ... ... ...
.
.
.
However, the structure of this table is not determinant since different experiments have different number of columns. Then the question is: when a user instantiate an experiment, is it a good idea to create data table dynamically on the fly? Otherwise, what is the elegant solution for that? Thanks.

When I find myself trying to dynamically create tables during runtime, it usually means I need another table to resolve a relationship between entities. In short, I would recommend treating your input parameters as a separate entity and store them in a separate table.
It sounds like your entities are:
experiment
runs of an experiment, which consist of a response and one or more:
input parameters
The relationships between entities is:
One experiment to zero or more runs
One run to one or more input parameter values (one to many)
This last relationship will require an additional table to resolve. You can have a separate table that stores your input parameters, and associate the input parameters with a run_id. This table could look like:
run_parameter_id ... run_id_fk ... parameter_keyword ... parameter_value
Where run_id_fk is a foreign key to the appropriate row in the Runs table (described in your question). The parameter_keyword is just used to keep track of the name of the parameter (parameter_1_exp1, parameter_2_exp1, etc).
Your queries to read/write from the database now become a bit more complicated (needing a join), but no longer reliant on creating tables on the fly.
Let me know if this is unclear and I can provide a potential database diagram.

Related

Performance of modelling inheritance in database using superclass table

My Question, is actually a question about the usability / performance of a concept / idea I had:
The Setup:
Troughout my Database, two (actually three) fields always re-appear constantly: title and description (and created). The title is always a VARCHAR(100) and the description always a TEXT.
Now, to simplify those tables, I thought about something (and changed it in that way): Wouldnt it be more useful to just create a table named content, with id, title, description and created as only fields, and always point to that table from all others?
Example:
table tab has id, key and content_id (instead of title, description and created)
table chapter has id, story_id and content_id (" ")
etc
The Question:
Everything works fine so far, but my only fear is performance. Will I run into a bottleneck, doing it this way, or should I be fine? I have about 23 different tables pointing to content right now, and some of them will hold user-defined content (journals, comments, etc) - so the number of entries in content could get quite high.
Is this setup better, or equal to having title and description in every separate table?
Edit: And if it turns out to be a bad idea, what are alternatives to mantain/copying certain fields like title and description into ~25 tables?
Thanks in advance for the help!
There is no clear answer for your question because it mainly depends on usage of the tables, so just consider following points:
How often will you need write to the tables? In case of many inserts/updates having data in one big table can cause problems because all write operations will target the same table.
How often do you need data stored in table with common data? If title or description are not needed most of the time for your select this can be OK. If you need title every time then take into account that you wile always have to JOIN table with common data.
How do you manage your database schema? It can be easier to write some simple tool for creation/checking table structure. In MySQL you can easily access data dictionary with DESCRIBE table_name or through INFORMATION_SCHEMA database.
I'm working on project with 700+ tables where some of the fields have to be present in every table (when was record created, timestamp of last modification). We have simple script that helps with this, because having all data in one table would be disastrous.

Each user has different 'structure' using only one table

I'm trying to do it like this:
Every single user can choose fields (like structures on MySQL) where this fields can handle their respective value, it's like doing a DB inside a DB.
But how can I do it using a single table?
(not talking about user accounts etc where I should be able to use a pointer to his own "structure")
Do something like: varchar Key where register something like "Name:asd" where PHP explode : to get the respective structure ('name' in this case) and the respective value? ('asd')
Use BLOB? can someone turn the light on for me? I don't know how to do something where works better than my current explanation...
I know my text is confuse and sorry for any bad english.
EDIT:
Also, they could add multiple keys/"structures" where accepts a new value
And they are not able to see the Database or Tables, they still normal users
My server does not support Postogre
In my opinion you should create two tables.
with the user info
with 3 fields (userid, key and value)
Each user has 1 record in the first table. Each user can have 0 or more records in the second table. This will ensure you can still search the data and that users can easily add more key/value pairs when needed.
Don't start building a database in a database. In this case, since the user makes the field by himself there is no relation between the fields as I understand? In that case it would make sense to take a look at the NoSQL databases since they seem to fit very good for this kind of situations.
Another thing to check is something like:
http://www.postgresql.org/docs/8.4/static/hstore.html
Do not try to build tables like: records, fields, field types etc. That's a bad practice and should not be needed.
For a more specific answer on your wishes we need a bit more info about the data the user is storing.
While i think the rational answer to this question is the one given by PeeHaa, if you really want the data to fit into one table you could try saving a serialized PHP array in one of the fields. Check out serialize and unserialize
Generates a storable representation of a value
This is useful for storing or passing PHP values around without losing
their type and structure.
This method is discouraged as it is not at all scalable.
Use a table with key-value pairs. So three columns:
user id
key ("name")
value ("asd")
Add an index on user id, so that you can query a user's attributes easily. If you wanted to query all users with the same properties, then you could add a second index on key and/or value.
Hope you are using a programming language also to get the data and present them.
You can have a single table which has a varchar field. Then you store the serialized data of the field structure and their value in that field. When you want to get the structure, query the data and De-serialize that varchar field data.
As per my knowledge every programming language supports serialization and De-serialization.
Edited : This is not a scalable option.

Implementing Comments and Likes in database

I'm a software developer. I love to code, but I hate databases... Currently, I'm creating a website on which a user will be allowed to mark an entity as liked (like in FB), tag it and comment.
I get stuck on database tables design for handling this functionality. Solution is trivial, if we can do this only for one type of thing (eg. photos). But I need to enable this for 5 different things (for now, but I also assume that this number can grow, as the whole service grows).
I found some similar questions here, but none of them have a satisfying answer, so I'm asking this question again.
The question is, how to properly, efficiently and elastically design the database, so that it can store comments for different tables, likes for different tables and tags for them. Some design pattern as answer will be best ;)
Detailed description:
I have a table User with some user data, and 3 more tables: Photo with photographs, Articles with articles, Places with places. I want to enable any logged user to:
comment on any of those 3 tables
mark any of them as liked
tag any of them with some tag
I also want to count the number of likes for every element and the number of times that particular tag was used.
1st approach:
a) For tags, I will create a table Tag [TagId, tagName, tagCounter], then I will create many-to-many relationships tables for: Photo_has_tags, Place_has_tag, Article_has_tag.
b) The same counts for comments.
c) I will create a table LikedPhotos [idUser, idPhoto], LikedArticles[idUser, idArticle], LikedPlace [idUser, idPlace]. Number of likes will be calculated by queries (which, I assume is bad). And...
I really don't like this design for the last part, it smells badly for me ;)
2nd approach:
I will create a table ElementType [idType, TypeName == some table name] which will be populated by the administrator (me) with the names of tables that can be liked, commented or tagged. Then I will create tables:
a) LikedElement [idLike, idUser, idElementType, idLikedElement] and the same for Comments and Tags with the proper columns for each. Now, when I want to make a photo liked I will insert:
typeId = SELECT id FROM ElementType WHERE TypeName == 'Photo'
INSERT (user id, typeId, photoId)
and for places:
typeId = SELECT id FROM ElementType WHERE TypeName == 'Place'
INSERT (user id, typeId, placeId)
and so on... I think that the second approach is better, but I also feel like something is missing in this design as well...
At last, I also wonder which the best place to store counter for how many times the element was liked is. I can think of only two ways:
in element (Photo/Article/Place) table
by select count().
I hope that my explanation of the issue is more thorough now.
The most extensible solution is to have just one "base" table (connected to "likes", tags and comments), and "inherit" all other tables from it. Adding a new kind of entity involves just adding a new "inherited" table - it then automatically plugs into the whole like/tag/comment machinery.
Entity-relationship term for this is "category" (see the ERwin Methods Guide, section: "Subtype Relationships"). The category symbol is:
Assuming a user can like multiple entities, a same tag can be used for more than one entity but a comment is entity-specific, your model could look like this:
BTW, there are roughly 3 ways to implement the "ER category":
All types in one table.
All concrete types in separate tables.
All concrete and abstract types in separate tables.
Unless you have very stringent performance requirements, the third approach is probably the best (meaning the physical tables match 1:1 the entities in the diagram above).
Since you "hate" databases, why are you trying to implement one? Instead, solicit help from someone who loves and breathes this stuff.
Otherwise, learn to love your database. A well designed database simplifies programming, engineering the site, and smooths its continuing operation. Even an experienced d/b designer will not have complete and perfect foresight: some schema changes down the road will be needed as usage patterns emerge or requirements change.
If this is a one man project, program the database interface into simple operations using stored procedures: add_user, update_user, add_comment, add_like, upload_photo, list_comments, etc. Do not embed the schema into even one line of code. In this manner, the database schema can be changed without affecting any code: only the stored procedures should know about the schema.
You may have to refactor the schema several times. This is normal. Don't worry about getting it perfect the first time. Just make it functional enough to prototype an initial design. If you have the luxury of time, use it some, and then delete the schema and do it again. It is always better the second time.
This is a general idea
please donĀ“t pay much attention to the field names styling, but more to the relation and structure
This pseudocode will get all the comments of photo with ID 5
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "comment"
This pseudocode will get all the likes or users who liked photo with ID 5
(you may use count() to just get the amount of likes)
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "like"
as far as i understand. several tables are required. There is a many to many relation between them.
Table which stores the user data such as name, surname, birth date with a identity field.
Table which stores data types. these types may be photos, shares, links. each type must has a unique table. therefore, there is a relation between their individual tables and this table.
each different data type has its table. for example, status updates, photos, links.
the last table is for many to many relation storing an id, user id, data type and data id.
Look at the access patterns you are going to need. Do any of them seem to made particularly difficult or inefficient my one design choice or the other?
If not favour the one that requires the fewer tables
In this case:
Add Comment: you either pick a particular many/many table or insert into a common table with a known specific identifier for what is being liked, I think client code will be slightly simpler in your second case.
Find comments for item: here it seems using a common table is slightly easier - we just have a single query parameterised by type of entity
Find comments by a person about one kind of thing: simple query in either case
Find all comments by a person about all things: this seems little gnarly either way.
I think your "discriminated" approach, option 2, yields simpler queries in some cases and doesn't seem much worse in the others so I'd go with it.
Consider using table per entity for comments and etc. More tables - better sharding and scaling. It's not a problem to control many similar tables for all frameworks I know.
One day you'll need to optimize reads from such structure. You can easily create agragating tables over base ones and lose a bit on writes.
One big table with dictionary may become uncontrollable one day.
Definitely go with the second approach where you have one table and store the element type for each row, it will give you a lot more flexibility. Basically when something can logically be done with fewer tables it is almost always better to go with fewer tables. One advantage that comes to my mind right now about your particular case, consider you want to delete all liked elements of a certain user, with your first approach you need to issue one query for each element type but with the second approach it can be done with only one query or consider when you want to add a new element type, with the first approach it involves creating a new table for each new type but with the second approach you shouldn't do anything...

adding data to interrelated tables..easier way?

I am a bit rusty with mysql and trying to jump in again..So sorry if this is too easy of a question.
I basically created a data model that has a table called "Master" with required fields of a name and an IDcode and a then a "Details" table with a foreign key of IDcode.
Now here's where its getting tricky..I am entering:
INSERT INTO Details (Name, UpdateDate) Values (name, updateDate)
I get an error: saying IDcode on details doesn't have a default value..so I add one then it complains that Field 'Master_IDcode' doesn't have a default value
It all makes sense but I'm wondering if there's any easy way to do what I am trying to do. I want to add data into details and if no IDcode exists, I want to add an entry into the master table. The problem is I have to first add the name to the fund Master..wait for a unique ID to be generated(for IDcode) then figure that out and add it to my query when I enter the master data. As you can imagine the queries are going to probably get quite long since I have many tables.
Is there an easier way? where everytime I add something it searches by name if a foreign key exists and if not it adds it on all the tables that its linked to? Is there a standard way people do this? I can't imagine with all the complex databases out there people have not figured out a more easier way.
Sorry if this question doesn't make sense. I can add more information if needed.
p.s. this maybe a different question but I have heard of Django for python and that it helps creates queries..would it help my situation?
Thanks so much in advance :-)
(decided to expand on the comments above and put it into an answer)
I suggest creating a set of staging tables in your database (one for each data set/file).
Then use LOAD DATA INFILE (or insert the rows in batches) into those staging tables.
Make sure you drop indexes before the load, and re-create what you need after the data is loaded.
You can then make a single pass over the staging table to create the missing master records. For example, let's say that one of your staging table contains a country code that should be used as a masterID. You could add the master record by doing something along the lines of:
insert
into master_table(country_code)
select distinct s.country_code
from staging_table s
left join master_table m on(s.country_code = m.country_code)
where m.country_code is null;
Then you can proceed and insert the rows into the "real" tables, knowing that all detail rows references a valid master record.
If you need to get reference information along with the data (such as translating some code) you can do this with a simple join. Also, if you want to filter rows by some other table this is now also very easy.
insert
into real_table_x(
key
,colA
,colB
,colC
,computed_column_not_present_in_staging_table
,understandableCode
)
select x.key
,x.colA
,x.colB
,x.colC
,(x.colA + x.colB) / x.colC
,c.understandableCode
from staging_table_x x
join code_translation c on(x.strange_code = c.strange_code);
This approach is a very efficient one and it scales very nicely. Variations of the above are commonly used in the ETL part of data warehouses to load massive amounts of data.
One caveat with MySQL is that it doesn't support hash joins, which is a join mechanism very suitable to fully join two tables. MySQL uses nested loops instead, which mean that you need to index the join columns very carefully.
InnoDB tables with their clustering feature on the primary key can help to make this a bit more efficient.
One last point. When you have the staging data inside the database, it is easy to add some analysis of the data and put aside "bad" rows in a separate table. You can then inspect the data using SQL instead of wading through csv files in yuor editor.
I don't think there's one-step way to do this.
What I do is issue a
INSERT IGNORE (..) values (..)
to the master table, wich will either create the row if it doesn't exist, or do nothing, and then issue a
SELECT id FROM master where someUniqueAttribute = ..
The other option would be stored procedures/triggers, but they are still pretty new in MySQL and I doubt wether this would help performance.

Merging tables from two Access databases into one new common

I have this assignment that I think someone should be able to help me. I have 5 ACCESS databases wvrapnaoh.accdb, wvrappaul.accdb, ....etc. These databases have about 45 tables each and 15 forms. The good part is the structure, the name and the fields of each table in all the databases are all the same except the data or the records are different. For example I have a stress table in wvrapnoah as well as wvrappaul with the same fields in both tables but different data or records.
So, I need to merge all these five into a new Access database that will have the same structure as the 5 databases but will include the complete data that is all the records from the 5 databases merged into this new database.The same applies to even the 15 forms. It does not seem to be having a primary key I guess. I was planning to add a field for each table that would give me the name of the database as well from which it was merged. Example I will add a DBName field in Wvrapnoah in all the tables and add the name Noah in that field for all the records in each table. I basically need to automate this code.
I need a script (VBA or anything) so that the guys creating these databases can just run this script the next time and merge the databases.
Talking about the 'table' part of the problem:
Questions
Are the databases / table names defined or you don't know them?
Are you able to use linked tables?
I believe the straightforward way to merge all of them is to link all tables into a single access DB and then run a UNION ALL query. It would be something like this:
SELECT "HANK", *
FROM MyTableHank
UNION ALL
SELECT "JOHN", *
FROM MyTableJohn;
Notice I defined a field to identify the origin of the data being merged ("HANK", "JOHN"), as you suggested above.
About the forms, I believe you'll need to import them and then review the whole code. It basically depends on what the forms are doing. If they're query-based won't be a big deal (importing / fixing the queries, will make the form works). However, if the forms are related to the tables, you'll have more work to do.