I've recently gone through the process of revamping my database, normalising a lot of entities. Obviously I now have a few more tables than I had. A lot of the data I use on the website is read-only, so it is simple to denormalise using a view. However, there are entities that could benefit from denormalised retrieval but still need to be updated.
Here's an example.
A User may be a Member
A Member may have a Profile
A Member may have an Account
In addition I have 3 further lookup tables.
In total there are 3 tables for User and 4 tables for Member.
Ideally, I can create 2 views from the above tables.
However, User needs to be updated, as do the entities belonging to Member. Additionally, there are 6 separate tables associated with Users/Members (e.g. FavouriteCategories) that also need to be retrieved and updated from time to time.
I'm struggling to come up with the best, most efficient way of doing this.
I could simply not use views and bring all the entities and lookups into the model, but I would be reliant on EF to produce the retrieval queries. The material I've read suggests that EF is not the best at dealing with joined data.
I could add both the view and tables, using the tables for updates only. This seems sloppy due to the duplication, complication of the model, as well as underutilising the EF model functionality.
Maybe I could use the readonly view for data retrieval and create stored procs. I believe that the process of using EF with stored procs is a bit of a hack, so I'd probably keep the stored procs distinct from EF and simply pass params and call the SP via traditional methods. This again seems like a bit of a halfway house.
I'm not that experienced with .NET or EF, so I would appreciate some solid advice on either the methods I've referred to above or any better technique to achieve this. I don't want to go hacking the edmx file at this stage because... well, it's just wrong.
I have a few entities that would benefit from the right solution. The User example is amongst the simplest, so there's a lot to gain from the right approach.
Help and advice would be very much appreciated.
Do you want to use EF? If yes, use either the first approach (no views at all, letting EF handle everything) or the last approach (views for reading, with stored procedures mapped for insert, update and delete operations).
Combining mapped views for reading and mapped tables for modifications is possible as well but it is mostly the first solution (allowing EF to handle everything) with additional views for some query optimization.
You will not find cleaner approaches. All the mentioned approaches are valid solutions for your problem. The only question is whether you want to write the SQL yourself (views and stored procedures) or let EF do that.
The worst approach is using EF for querying and manually calling stored procedures for updating, but in some cases it can also be useful.
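To make the "view for reading, tables for writing" split concrete, here is a minimal sketch. It is not EF: it uses Python's built-in sqlite3 module as a stand-in, and the table, column and function names (app_user, member, profile, member_summary, update_bio, read_summary) are all hypothetical. It only illustrates the shape of the design, where reads go through one denormalised view and writes go to the normalised tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalised tables (names are assumptions).
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE member (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES app_user(id),
        display_name TEXT NOT NULL
    );
    CREATE TABLE profile (
        member_id INTEGER PRIMARY KEY REFERENCES member(id),
        bio TEXT
    );
    -- Denormalised, read-only shape used for retrieval.
    CREATE VIEW member_summary AS
        SELECT u.id AS user_id, u.email, m.display_name, p.bio
        FROM app_user u
        JOIN member m ON m.user_id = u.id
        LEFT JOIN profile p ON p.member_id = m.id;
""")
conn.execute("INSERT INTO app_user (id, email) VALUES (1, 'ann@example.com')")
conn.execute("INSERT INTO member (id, user_id, display_name) VALUES (10, 1, 'Ann')")
conn.execute("INSERT INTO profile (member_id, bio) VALUES (10, 'Hello')")


def update_bio(member_id, bio):
    """Writes go to the base table, never to the view."""
    conn.execute("UPDATE profile SET bio = ? WHERE member_id = ?", (bio, member_id))


def read_summary(user_id):
    """Reads go through the denormalised view."""
    return conn.execute(
        "SELECT email, display_name, bio FROM member_summary WHERE user_id = ?",
        (user_id,),
    ).fetchone()


update_bio(10, "Updated bio")
summary = read_summary(1)
print(summary)
```

Because the view is computed from the base tables, the update is visible on the next read without any extra synchronisation.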
I have a MySQL Database and I need to create a Mongo Database (I don't care about keeping any data).
So are there any good practices for designing the structure (mongoose.Schema) based on the relational tables of MySQL?
For example, the SQL database has a table users and a table courses with a 1:n relation. Should I also create two collections in MongoDB, or would it be better to create a new field courses: [] inside the user document and create only the user collection?
The schema definition should be driven by the use cases of the application.
Under which conditions is the data accessed and modified? Which is the leading entity?
e.g. When a user is loaded do you always also want to know the courses of the user? This would be an argument for embedding.
Can you update courses without knowing all of its users, e.g. update the name of a course? Do you want to list an overview of all courses? This would be an argument for extracting into an own collection.
So there is no general guideline for such a migration, because the use cases cannot be derived from the schema definition alone.
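The two options can be sketched as plain data. This is only an illustration using Python dictionaries in place of MongoDB documents; the field names (name, title, user_id) and the helper functions are assumptions, not taken from the question's schema.

```python
# Option A: embed the courses inside the user document.
user_embedded = {
    "_id": 1,
    "name": "Ann",
    "courses": [{"title": "Algebra"}, {"title": "Biology"}],
}

# Option B: two collections, with each course referencing its user.
users = [{"_id": 1, "name": "Ann"}]
courses = [
    {"_id": 100, "title": "Algebra", "user_id": 1},
    {"_id": 101, "title": "Biology", "user_id": 1},
]


def course_titles_embedded(user):
    """Loading the user already gives the courses: one document read."""
    return [course["title"] for course in user["courses"]]


def course_titles_referenced(user_id, course_collection):
    """A second lookup is needed to find the user's courses."""
    return [c["title"] for c in course_collection if c["user_id"] == user_id]


def rename_course(course_collection, course_id, new_title):
    """With a separate collection, a course is updated without touching users."""
    for course in course_collection:
        if course["_id"] == course_id:
            course["title"] = new_title


embedded_titles = course_titles_embedded(user_embedded)
referenced_titles = course_titles_referenced(1, courses)
rename_course(courses, 100, "Linear Algebra")
```

Option A favours "load a user with all courses"; option B favours "list or edit courses on their own", which matches the questions asked above.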
If you don't care about data, the best approach is to redesign it from scratch.
NoSQLs differ from RDBMS in many ways so direct mapping will hardly be efficient and in many cases not possible at all.
The first thing you need to answer for yourself (and probably mention in the question) is why you need to change the database in the first place. There are different kinds of problems that Mongo can solve better than SQL, and they require different data models. None of them come for free, so you will need to understand the tradeoffs.
You can start from the very simple rule: in SQL you model your data after your business objects and describe relations between them, in Mongo you model data after queries that you need to respond to. As soon as you grasp the idea it will let you ask answerable questions.
It may be worth reading https://www.mongodb.com/blog/post/building-with-patterns-a-summary as a starting point.
An old yet still quite useful resource is https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1. Just keep in mind it was written a long time ago, when Mongo did not have many of the v4+ features. Nevertheless, it describes the philosophy of Mongo data modelling with simple examples. It hasn't changed much since then.
I need to create dynamic tables in the database on the fly. For example, in the database I will have tables named:
Table
Column
DataType
TextData
NumberData
DateTimedata
BitData
Here I can add a table in the table named table, then I can add all the columns to that table in the columns table and associate a datatype to each column.
Basically I want to create tables without actually creating a table in the database. Is this even possible? If so, can you direct me to the right place so I can research? Also, I would prefer SQL Server or any free database software.
Thanks
What you are describing is an entity-attribute-value model (EAV). It is a very poor way to design a data model.
Although the data model is quite flexible, querying such a data model is quite complicated. You frequently end up having to self-join a table n times if you want to select or filter on n different attributes. That gets slow, and becomes hard to optimize, rather quickly.
Plus, you generally end up building a lot of functionality that the database or your ORM would provide.
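The self-join cost is easier to see side by side. The sketch below uses Python's sqlite3 module with a hypothetical single EAV table (attribute_value) and an equivalent conventional table (person); all names and data are made up for illustration. Filtering on two attributes and returning a third needs three aliases of the same EAV table, while the conventional table needs a plain WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- EAV layout: one row per (entity, attribute) pair.
    CREATE TABLE attribute_value (
        entity_id INTEGER NOT NULL,
        attribute TEXT NOT NULL,
        value TEXT,
        PRIMARY KEY (entity_id, attribute)
    );
    INSERT INTO attribute_value VALUES
        (1, 'name', 'Ann'), (1, 'city', 'Oslo'), (1, 'status', 'active'),
        (2, 'name', 'Bob'), (2, 'city', 'Oslo'), (2, 'status', 'inactive');

    -- Conventional layout: one row per entity, one column per attribute.
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT, status TEXT);
    INSERT INTO person VALUES
        (1, 'Ann', 'Oslo', 'active'), (2, 'Bob', 'Oslo', 'inactive');
""")

# EAV: every attribute that is read or filtered on needs its own join.
eav_rows = conn.execute("""
    SELECT n.value
    FROM attribute_value AS n
    JOIN attribute_value AS c
      ON c.entity_id = n.entity_id AND c.attribute = 'city'
    JOIN attribute_value AS s
      ON s.entity_id = n.entity_id AND s.attribute = 'status'
    WHERE n.attribute = 'name' AND c.value = 'Oslo' AND s.value = 'active'
""").fetchall()

# Conventional table: the same question is a single-table filter.
flat_rows = conn.execute(
    "SELECT name FROM person WHERE city = 'Oslo' AND status = 'active'"
).fetchall()

print(eav_rows, flat_rows)
```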
I'm not sure what the real problem you're having is, but the solution you proposed is the "database within a database" antipattern which makes so many people cringe.
Depending on how you're querying your data, if you were to structure things like you're planning, you'd either need a bunch of piece-wise queries which are joined in the middleware (slow) or one monster monolithic query (either slow or creates massive index bloat), if one is even possible.
If you must create tables on the fly, learn the CREATE TABLE, ALTER TABLE, and DROP TABLE DDL statements for the particular database engine you're using. Better yet, find an ORM that will do this for you. If your real problem is that you need to store unstructured data, check out MongoDB, Redis, or some of the other NoSQL variants.
My final advice is to write up the actual problem you're trying to solve as a separate question, and you'll probably learn a lot more.
Doing this with documents might be easier. Perhaps you should look at a NoSQL solution such as MongoDB.
Or you can still create the temporary tables, but use a cron job to build them every %% hours and rename each one to the correct name after the queries are done, so your site stays up.
What you are trying to achieve is not bad, but you must go about it in a logical way.
*sorry for my bad english
I did something like this in LedgerSMB. While we use EAV modelling for a few things (where the flexibility is needed and the sort of querying we are doing is straight-forward, for example menu nodes use this in part), in general, you want to stay away from this as much as possible.
A better approach is to do all of what you are doing except for the data columns. Then you can (shock of shocks) just create the tables. This gives you a catalog of what you have added so your app knows this (and you can diff from the system catalogs if you ever have to check!) but at the same time you get actual relational modelling.
What we did in LedgerSMB was to have stored procedures that would accept a name and check whether a table called 'extends_' || name exists. If so, they would add a column with the required datatype and write this to the application catalogs. This gives us relational modelling of extended attributes. At load time, the application loads the application catalogs and writes queries as appropriate at appropriate points to load/save the data. It works pretty well, actually.
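A rough sketch of that idea follows. It is not LedgerSMB's code: it uses Python's sqlite3 module instead of stored procedures, and the catalog table (extension_catalog), the function name and the whitelist of types are all assumptions made for illustration. The point is that the column is really added to a real table, and the catalog only records what was added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Application catalog: records which columns were added at run time.
    CREATE TABLE extension_catalog (
        table_name TEXT NOT NULL,
        column_name TEXT NOT NULL,
        data_type TEXT NOT NULL,
        PRIMARY KEY (table_name, column_name)
    );
    -- A real table that is allowed to be extended.
    CREATE TABLE extends_customer (customer_id INTEGER PRIMARY KEY);
""")

ALLOWED_TYPES = {"TEXT", "INTEGER", "REAL"}  # assumed whitelist


def add_extended_attribute(name, column, data_type):
    """Add a real column to extends_<name> and record it in the catalog.

    Returns False when no extension table exists for the given name.
    """
    if not (name.isidentifier() and column.isidentifier()):
        raise ValueError("names must be plain identifiers")
    if data_type not in ALLOWED_TYPES:
        raise ValueError("unsupported data type")
    table = "extends_" + name
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    if exists is None:
        return False
    # Identifiers cannot be bound as parameters, hence the validation above.
    conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}" {data_type}')
    conn.execute(
        "INSERT INTO extension_catalog VALUES (?, ?, ?)", (table, column, data_type)
    )
    return True


added = add_extended_attribute("customer", "loyalty_tier", "TEXT")
missing = add_extended_attribute("supplier", "rating", "INTEGER")
columns = [row[1] for row in conn.execute('PRAGMA table_info("extends_customer")')]
catalog = conn.execute("SELECT * FROM extension_catalog").fetchall()
```

At load time an application could read extension_catalog to learn which extra columns to include in its queries.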
I have three tables as seen in this image and I want to present it as seen in the last table. I can't figure out how to solve it - right now I'm using three nested loops to display it.
First I loop through Customer to display all of them. Inside this loop I have a loop that goes through OrderCustom and inside that I check if there is a CustomerOrderCustom with the right Customer_id and OrderCustom_id.
Not only am I using a lot of queries, but the view also shows OrderCustom items that no Customer is using, in this case Zip Code. I'm using MySQL 5.
This is an entity-attribute-value database design. It is not relational in design and you will not be able to manipulate it with relational operations (such as JOINs) except for the most trivial examples.
If you are determined to store this non-relational data in a relational database you'll be dependent on either your own code or some EAV-based object serialization and deserialization library for whatever programming language you're using. SQL will be of little use to you.
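As a small illustration of the "your own code" route, the sketch below fetches the rows once and turns them into one dictionary per customer in application code. It uses Python's sqlite3 module rather than MySQL and PHP, the table names follow the question, and the column names (name, label, value) and sample data are assumptions since the original image is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE OrderCustom (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE CustomerOrderCustom (
        Customer_id INTEGER, OrderCustom_id INTEGER, value TEXT
    );
    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO OrderCustom VALUES (1, 'Phone'), (2, 'Fax'), (3, 'Zip Code');
    INSERT INTO CustomerOrderCustom VALUES (1, 1, '555-0100'), (2, 2, '555-0199');
""")

# One query instead of a query per customer per attribute.
rows = conn.execute("""
    SELECT c.name, oc.label, coc.value
    FROM Customer c
    LEFT JOIN CustomerOrderCustom coc ON coc.Customer_id = c.id
    LEFT JOIN OrderCustom oc ON oc.id = coc.OrderCustom_id
    ORDER BY c.id, oc.id
""").fetchall()

# Deserialise the attribute rows into one dict per customer.
customers = {}
for name, label, value in rows:
    attributes = customers.setdefault(name, {})
    if label is not None:  # customers with no attributes still get an entry
        attributes[label] = value

# Only attributes that at least one customer uses become display columns.
used_labels = sorted({label for attrs in customers.values() for label in attrs})
print(customers, used_labels)
```

With this sample data the unused attribute (Zip Code) never becomes a column, because columns are derived from the values that exist rather than from the full attribute list.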
If you are really required to use a model like this for this project (that is, you cannot adopt a relational model) then, if it is not too late in the development process, I would suggest abandoning SQL and reading up on XML, XPath, and XSLT which are probably a better fit for storing and recovering data in which each entry can have a different structure.
Bonus Article: "Why Entity-Attribute-Value is bad"
I have a table that has a lot of fields that are foreign keys referencing a related table. I am writing a script in PHP that will do the db queries. When I query this table for its data I need to know the values associated with these keys not the key.
How do most people go about this?
A 101 way to do this would be to query this table for its data including the foreign keys and then query the related tables to get each key's value. This could be a lot of queries (~10).
Question 1: I think I could write 1 query with a bunch of joins. Would that be better?
This approach also requires the querying script to know which table fields are foreign keys. Since I have many tables like this but all with different fields, this means writing nice generic functions is hard. MySQL InnoDB tables allow for foreign constraints. I know the database has these set up correctly.
Question 2: What about the idea of querying the table, identifying what the constraints are, and then matching them up using whatever process I decide on from Question 1? I like this idea but never see it being used in code, which makes me think it's not a good idea for some reason. I would use something like SHOW CREATE TABLE tbl_name; to find what constraints/relationships exist for that table.
Thank you for any suggestions or advice.
You talk about writing "nice generic functions", but I think you are thinking a little TOO generic here.
Personally I would just write a query with a bunch of joins in it. If you want to abstract all that join logic away and not have to worry about it, then you should probably look at using an ORM instead of writing the SQL directly.
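For comparison, here is a minimal sketch of both approaches, written with Python's sqlite3 module rather than PHP and MySQL. The tables (orders, status, country) and their columns are hypothetical. The first function issues one extra query per foreign key per row; the second gets the same result in a single joined query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE country (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        status_id INTEGER REFERENCES status(id),
        country_id INTEGER REFERENCES country(id)
    );
    INSERT INTO status VALUES (1, 'shipped'), (2, 'pending');
    INSERT INTO country VALUES (1, 'Norway'), (2, 'Sweden');
    INSERT INTO orders VALUES (1, 1, 1), (2, 2, 2);
""")


def fetch_naive():
    """The '101' approach: look up every key with its own query."""
    result = []
    for order_id, status_id, country_id in conn.execute(
        "SELECT id, status_id, country_id FROM orders ORDER BY id"
    ).fetchall():
        status = conn.execute(
            "SELECT label FROM status WHERE id = ?", (status_id,)
        ).fetchone()[0]
        country = conn.execute(
            "SELECT label FROM country WHERE id = ?", (country_id,)
        ).fetchone()[0]
        result.append((order_id, status, country))
    return result


def fetch_joined():
    """One query: the database resolves the keys through joins."""
    return conn.execute("""
        SELECT o.id, s.label, c.label
        FROM orders o
        JOIN status s ON s.id = o.status_id
        JOIN country c ON c.id = o.country_id
        ORDER BY o.id
    """).fetchall()


naive_rows = fetch_naive()
joined_rows = fetch_joined()
```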
At some level, the system should run queries using joins, whether those queries are written explicitly by the application programmer or generated automatically by the data access layer. Option 1 is definitely better than the naive option. As for some other query creation options (by no means an exhaustive list):
You could abstract out all database operations, much as PDO abstracts out connecting and query operations (i.e. preparing & executing queries). Use this to get table metadata, including foreign keys, which could then be used to construct queries automatically.
You could write object specifications in some other format (e.g. XML) and a class that would use that to both generate PHP classes and database tables. You find this more in Enterprise applications than smaller projects. This option has more overhead than others, and thus isn't suitable if you only have a few classes to model. Occurrences of this option might also be a consequence of Conway's Law, which I first heard as Richard Fairly's variant: "The structure of a system reflects the structure of the organization that built it."
You could take a LINQ-like approach. In PHP, this would mean writing numerous functions or methods that the application programmer can chain together which would create a query. The application programmers are ultimately responsible for joining tables, though they never write a JOIN themselves.
Rather than thinking about how to create the queries, a better problem approach is to think about how to interface the DB and the application. This leads to patterns such as Data Mapper and Active Record that fall into the category of Object-Relational mapping (ORM). Note that some patterns (such as AR), other ORM techniques and even ORM itself have issues of their own. Any of the above query creation options can be used in the implementation of a data access pattern.
The problem with using SHOW CREATE TABLE is it doesn't work with most (all?) other RDBMSs. If you want to marry your app to MySQL, go ahead, but the decision could haunt you.
What kind of record counts are you working with, both in the main data table(s) and the lookup tables?
As a general rule, you should join the lookup tables to the main table. If you have an excessive amount of joins and there aren't many UDFs involved here, there's a pretty good chance the table should be normalized a bit more. If the normalization is fine and the main data table is really wide, you could still split the table to multiple tables with 1:1 relationships so as to separate the frequently accessed columns from the infrequently accessed columns.
MySQL includes support for the ANSI catalog INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS. You could use that to gather information on the FK relationships that exist.
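The metadata-driven idea can be sketched as follows. This example does not query MySQL's INFORMATION_SCHEMA; it swaps in SQLite's PRAGMA foreign_key_list through Python's sqlite3 module so that it runs without a server. The tables, the assumption that every lookup table has a label column, and the helper names are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE country (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        status_id INTEGER REFERENCES status(id),
        country_id INTEGER REFERENCES country(id)
    );
    INSERT INTO status VALUES (1, 'shipped');
    INSERT INTO country VALUES (1, 'Norway');
    INSERT INTO orders VALUES (1, 1, 1);
""")


def foreign_keys(table):
    """Return sorted (column, referenced_table, referenced_column) tuples.

    PRAGMA foreign_key_list yields (id, seq, table, from, to, ...) per key.
    """
    rows = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
    return sorted((row[3], row[2], row[4]) for row in rows)


def build_join_query(table, label_column="label"):
    """Build a SELECT that replaces each foreign key with the lookup label.

    Assumes every referenced table exposes the same label column.
    """
    selects = [f'"{table}".id']
    joins = []
    for column, ref_table, ref_column in foreign_keys(table):
        selects.append(f'"{ref_table}"."{label_column}" AS "{ref_table}"')
        joins.append(
            f'LEFT JOIN "{ref_table}" '
            f'ON "{ref_table}"."{ref_column}" = "{table}"."{column}"'
        )
    return (
        f'SELECT {", ".join(selects)} FROM "{table}" '
        + " ".join(joins)
        + f' ORDER BY "{table}".id'
    )


fks = foreign_keys("orders")
query = build_join_query("orders")
resolved_rows = conn.execute(query).fetchall()
print(query)
```

The table name is interpolated into SQL here, so in real use it should come from trusted configuration and never from user input.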
Also, if there are combinations of joins you use frequently, create views or stored procedures based on those common operations.
Since MySQL started supporting stored procedures, I've never really used them. Partly because I'm not a great query writer, partly because I often work with DBAs who make those choices for me, partly because I'm just comfy with What I Know.
In terms of doing data selection, specifically when considering a select that is essentially a de-normalization (joins) and aggregate (avg or max, subqueries w/counts, etc) selection of data, what is the right choice in MySQL 5.x? A view? Or a stored procedure?
Views I'm comfortable with - you know what your SELECT query is supposed to look like, so you just create that, make sure it's indexed and whatnot, then just do a CREATE VIEW [View] AS SELECT [...]. Then, in my application, I treat the view as a read-only table - it represents a de-normalized version of my normalized data.
What are the disadvantages here - if any? And what would change (gains or losses) if I moved that exact same SELECT statement into a stored procedure?
I'm hoping to find some good 'under the hood' info that has been difficult to find while googling this topic but really I welcome all comments and answers.
In my opinion, stored procedures should be used solely for data manipulation when the same routine needs to be used by several different applications, or for ETL between databases or tables, and nothing more. Basically, do as much in code as you can until you run into the DRY principle or what you are doing is simply moving data from one place to another within the DB.
Views can be used to provide an alternate or simplified "view" into the data. As such, I would go with a view as you are not really manipulating the data as much as finding a different method of displaying it.
Not sure if it's an either/or choice. Stored procedures can do a wide variety of things that views would struggle with (think populating data in a temp table, then running a cursor on it, then doing aggregation and returning a result set).
Views on the other hand can hide complex sql / access rights and present a modified view of the schema.
I think both have a place in the scheme of things and both are useful for a successful schema implementation.
I use views for de-normalisation or output formatting and stored procedures for filtering and data manipulation (things that require parameter inputs) or iteration (cursors).
I often access a view inside a stored procedure when both de-normalisation and filtering are required.
One thing to note, at least with MySQL: view results are stored in a temporary table and, unlike in most decent database engines, this table is not indexed. So views are great for simplifying queries when your program is going to grab all of the results from the view. However, if you are then searching the results of that view based on parameters, it is incredibly slow, especially if there are millions of records to sift through, and even worse if the view is built on top of other views and so on.
With a stored procedure, however, you can pass those search parameters in and run the query directly against the underlying (indexed) tables. The downside is that the results need to be fetched every time the procedure is run, which may also occur with a view anyway, depending on server configuration.
So basically, if you're using a view, try to minimise the number of results (if you then need to search it); otherwise use a stored procedure.