I have been assigned to a project where I need to work with a medium-sized database. When I opened the database I saw that it is poorly designed and really should be split into more tables than it currently has. No normalization has been applied at all!
The problem is that the database already holds data for almost 500 users. When I break up the old database, the existing users will lose their data.
I must copy this data into the newly designed tables of the new database (even though not all fields will match). I don't think there is a tool to automate this, but is there one?
Are there any best practices to follow for this kind of work?
Is the schema really a problem, or do you just want to fix it because it isn't in third normal form?
Anyway, I'd create an entirely new database with the desired, normalized schema and write some import routines.
If the database was / is heavily used, I'd create some views to maintain read compatibility (the views would have the same names and columns as the former tables); that way, all you have to change are the insert/update parts and, of course, the connection strings.
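A rough sketch of both ideas, assuming a hypothetical old table users that gets split into normalized people and addresses tables (all names here are invented for illustration):

    -- Import routine: copy the existing data into the new, normalized tables.
    INSERT INTO new_db.people (id, name, email)
    SELECT id, name, email FROM old_db.users;

    INSERT INTO new_db.addresses (person_id, street, city)
    SELECT id, street, city FROM old_db.users;

    -- Read-compatibility view: same name and columns as the former table,
    -- so existing SELECT statements keep working against the new schema.
    CREATE VIEW new_db.users AS
    SELECT p.id, p.name, p.email, a.street, a.city
    FROM new_db.people AS p
    JOIN new_db.addresses AS a ON a.person_id = p.id;

The import runs once during the cut-over; the view stays around until the read paths have been migrated.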
The question being asked is:
Is there any tool that can transform a non-normalized database into a normalized database while preserving all its contents?
The answer is: no.
You have to fine-tune the database design and optimization to your own needs.
I am working on a Django project that shows images. The database will hold image links and descriptions (keywords), and I don't want duplicate entries for the image link. Which is the best database to use for this? Should I go with MySQL to avoid duplicate entries? But I really like the JSON structure of MongoDB; I think it is easy to handle. Please suggest a good one.
You are comparing two different paradigms here.
MySQL
- You have to decide on the schema beforehand. If you later need to change the design, a migration also comes into the picture.
- On the other hand, you get relational benefits.
MongoDB
- You can expand fields flexibly on the go, or merge more generic classifications into a single collection (the counterpart of a table in MySQL).
- With that, you also give up database-level checks. Since your database accepts heterogeneous data, you cannot at the same time ask it to enforce a fixed structure. Newer versions have introduced something along these lines, but it is only a start.
You first need to decide how your application needs to scale. If your incoming data is structured and you want to leverage a relational interpretation of it, go for MySQL; if you expect to keep scaling the application and foresee structural changes, go for MongoDB.
Ease of coding may differ, but this kind of choice should be driven more by the type of application.
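For example, the duplicate-link requirement from the question maps directly onto one of those relational, database-level checks in MySQL. A minimal sketch, with assumed table and column names:

    -- A unique index makes the database itself reject duplicate image links.
    CREATE TABLE images (
        id          INT AUTO_INCREMENT PRIMARY KEY,
        image_link  VARCHAR(255) NOT NULL,  -- short enough to index; older MySQL may need a prefix index
        description TEXT,
        UNIQUE KEY uq_image_link (image_link)
    );

    -- The first insert succeeds; repeating it fails with a duplicate-key error.
    INSERT INTO images (image_link, description)
    VALUES ('http://example.com/cat.jpg', 'a cat');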
Going for a NoSQL DB has its own advantages: scaling will no longer be a problem, and neither will the performance of the DB.
MongoDB would be a good option since you are storing image links and keywords. NoSQL databases are well suited to non-transactional data, and in this case the image links will not change at all, or will change very rarely.
To avoid storing duplicate entries, create an index on the key that stores the image link and make it a unique index. You can also use a text index on the keywords to make your searches faster.
I need to implement custom fields in my database so that every user can add any fields he wants to his forms/entities.
The user should be able to filter and/or sort his data by any custom field.
I want to work with MySQL because the rest of my data is very well suited to SQL. So, unless you have a great idea, SQL will be preferred over NoSQL.
We have thought about a few solutions:
JSON field - Great for a dynamic schema. Can be filtered and sorted. The problem is that it is slower than regular columns.
Dynamic indexes could solve that, but is it too risky to add indexes dynamically?
Key-value table - A simple solution but a really slow one. You can't index it properly and the queries are awful.
Static placeholder columns - Create N columns and keep a map from each custom field to its placeholder column. Good in terms of performance, but it makes the database unreadable and the number of columns is limited.
Any thoughts on how to improve any of these solutions, or any ideas for a new one?
As many of the commenters have remarked, there is no easy answer to this question. Depending on which trade-offs you're willing to make, I think the JSON solution is neatest - it's "native" to MySQL, so easiest to explain and understand.
However, given that you write that the columns are specified only at set-up time, by technically proficient people, you could, of course, have the set-up process include an "alter table" statement to add new columns. Your database access code and all the associated view logic would then need to be configurable too; it's definitely non-trivial.
However... it's a proven solution. Magento and Drupal, for instance, have admin screens for adding attributes to the business entities, which in turn adds columns to the relational database.
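A hedged sketch of how the JSON option combines with indexing in MySQL 5.7+ (the table and field names are assumptions): instead of adding indexes fully dynamically, you can expose a frequently filtered custom field as a generated column and index that.

    -- Custom fields live in a JSON column.
    CREATE TABLE entities (
        id            INT AUTO_INCREMENT PRIMARY KEY,
        custom_fields JSON
    );

    -- Expose one custom field as a generated column and index it.
    ALTER TABLE entities
        ADD COLUMN cf_priority INT
            GENERATED ALWAYS AS (CAST(custom_fields->>'$.priority' AS SIGNED)) STORED,
        ADD INDEX idx_cf_priority (cf_priority);

    -- Filtering and sorting on that custom field can now use the index.
    SELECT id FROM entities WHERE cf_priority = 3 ORDER BY cf_priority;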
I'm currently developing an API for a company that didn't do a very good job of maintaining a good test database with test data. The MySQL database structure is quite big and complicated, and the live database should be around 160-200 GB.
Because I'm quite lazy and don't want to create test data for all the tables from scratch, I was wondering what would be the best way to turn such a big database into a smaller test database that keeps all data and relationships in a correct form. Is there an easy way to do this with some kind of script that inspects the database model and knows what data it needs to keep or delete when reducing the database to a smaller size?
Or am I doomed and have to go through the tedious task of creating my own test data?
Take a look at Jailer which describes itself as a "Database Subsetting and Browsing Tool". It is specifically designed to select a subset of data, following the database relationships/constraints to include all related rows from linked tables. To limit the amount of data you export, you can set a WHERE clause on the table you are exporting.
The issue of scrubbing your test data to remove customer data is still there, but this will be easier once you have a smaller subset to work with.
In addition to Liath's recommendation:
Maybe it's a hard way, but you can just export your schema (no data) and then write a stored procedure that iterates over your (original) tables and runs a simple:
INSERT INTO dest_table (fields) SELECT fields FROM origin_table WHERE (`external keys already inserted`) LIMIT 100;
or something along those lines.
Thanks to Liath: for the `external keys already inserted` part, you have to add a filter to ensure that every foreign key of this table already exists in your test database. That also means you need to iterate over your tables in an order that respects the foreign keys.
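A hedged sketch of what that foreign-key filter might look like, assuming hypothetical live and test schemas (live_db, test_db) and an orders table that references customers:

    -- Copy up to 100 orders, but only those whose customer has already
    -- been copied into the test database.
    INSERT INTO test_db.orders (id, customer_id, total)
    SELECT o.id, o.customer_id, o.total
    FROM live_db.orders AS o
    WHERE o.customer_id IN (SELECT c.id FROM test_db.customers AS c)
    LIMIT 100;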
Another way is to export your data and edit the SQL dump file to remove the unwanted data (a really hard way).
I would suggest that no matter how thorough you are, the risk of getting live customer details into a test database is too high. What happens if you accidentally email or charge a real customer for something you're testing?!
There are a number of products out there, such as RedGate's Data Generator, which will create test data for you based on your schema (there is a free trial, I believe, so you can check that it meets your needs before committing).
Your other alternative? Hire a temp to enter data all day!
ETA: Sorry - I just saw that you're looking at MySQL rather than MSSQL, which probably rules out the tool I recommended. A quick Google search produces similar results.
I need to create dynamic tables in the database on the fly. For example, in the database I will have tables named:
Table
Column
DataType
TextData
NumberData
DateTimedata
BitData
Here I can register a table in the table named Table, then add all of that table's columns to the Column table and associate a data type with each column.
Basically, I want to create tables without actually creating tables in the database. Is this even possible? If so, can you direct me to the right place so I can research it? Also, I would prefer SQL Server or any free database software.
Thanks
What you are describing is an entity-attribute-value model (EAV). It is a very poor way to design a data model.
Although the data model is quite flexible, querying such a data model is quite complicated. You frequently end up having to self-join a table n times if you want to select or filter on n different attributes. That gets slow, and rather hard to optimize, relatively quickly.
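A quick sketch of what that looks like, using a hypothetical value table keyed by entity and column (names are invented; the layout from the question splits values by type into separate tables, but the join pattern is the same):

    -- Filtering on just two attributes already needs two self-joins.
    SELECT v1.entity_id
    FROM AttributeValue AS v1
    JOIN AttributeValue AS v2 ON v2.entity_id = v1.entity_id
    WHERE v1.column_id = 101 AND v1.text_value   = 'red'  -- attribute 1: colour = 'red'
      AND v2.column_id = 102 AND v2.number_value > 10;    -- attribute 2: size > 10

Every additional attribute in the filter adds another join, which is where the slowdown comes from.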
Plus, you generally end up building a lot of functionality that the database or your ORM would provide.
I'm not sure what the real problem you're having is, but the solution you proposed is the "database within a database" antipattern which makes so many people cringe.
Depending on how you're querying your data, if you were to structure things the way you're planning, you'd either need a bunch of piecewise queries that are joined in the middleware (slow), or one monster monolithic query (either slow or prone to massive index bloat), if such a query is even possible.
If you must create tables on the fly, learn the CREATE TABLE, ALTER TABLE, and DROP TABLE DDL statements for the particular database engine you're using. Better yet, find an ORM that will do this for you. If your real problem is that you need to store unstructured data, check out MongoDB, Redis, or some of the other NoSQL variants.
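For reference, a minimal sketch of those DDL statements (generic SQL; the table and column names are invented):

    CREATE TABLE user_defined_form_1 (
        id   INT PRIMARY KEY,
        name VARCHAR(100)
    );
    ALTER TABLE user_defined_form_1 ADD notes VARCHAR(255);
    DROP TABLE user_defined_form_1;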
My final advice is to write up the actual problem you're trying to solve as a separate question, and you'll probably learn a lot more.
Doing this with documents might be easier. Perhaps you should look at a NoSQL solution such as MongoDB.
Or you can still create the temporary tables, but use a cron job to recreate them every %% hours and rename them to the correct names after the queries are done, so your site stays up.
What you are trying to achieve is not bad, but you must use it in a logically correct way.
*Sorry for my bad English.
I did something like this in LedgerSMB. While we use EAV modelling for a few things (where the flexibility is needed and the sort of querying we are doing is straightforward; for example, menu nodes use this in part), in general you want to stay away from it as much as possible.
A better approach is to do all of what you are doing except for the data columns. Then you can (shock of shocks) just create the tables. This gives you a catalog of what you have added, so your app knows it (and you can diff against the system catalogs if you ever have to check!), but at the same time you get actual relational modelling.
What we did in LedgerSMB was to have stored procedures that check whether a table named 'extends_' || <name supplied> exists. If so, they add a column with the required data type and record this in the application catalogs. This gives us relational modelling of extended attributes. At load time, the application loads the application catalogs and writes queries as appropriate, at the appropriate points, to load and save the data. It works pretty well, actually.
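A minimal sketch of that pattern (this is not LedgerSMB's actual code; the table, column, and catalog names are hypothetical, and the syntax is MySQL/PostgreSQL-style):

    -- 1. A side table that extends the base entity, e.g. customers.
    CREATE TABLE extends_customers (
        customer_id INT PRIMARY KEY,
        FOREIGN KEY (customer_id) REFERENCES customers (id)
    );

    -- 2. Add the requested attribute as a real, typed column.
    ALTER TABLE extends_customers ADD COLUMN loyalty_tier VARCHAR(20);

    -- 3. Record it in an application catalog so the application knows the field exists.
    INSERT INTO application_catalog (base_table, column_name, data_type)
    VALUES ('customers', 'loyalty_tier', 'VARCHAR(20)');

The attribute stays a normal column (typed, indexable, joinable), while the catalog row is what the application reads at load time to build its queries.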
This is a complex problem, so I'm going to try to simplify it.
I have a MySQL instance on my server hosting a number of schemas for different purposes. The schemas are structured generally (not perfectly) in an EAV fashion. I need to transition information into and out of that structure on a regular basis.
Example1: in order to present the information on a web page, I get the information, stick it into a complex object, which I then pass via JSON to the web page, where I convert the JSON into a complex JavaScript object, which I then present with KnockoutJS and similar things.
Conclusion: This resulted in a lot of logic being put into multiple places so that I could associate the values on the page with the values in the database.
Example2: in order to allow users to import information from a PDF, I have a lot of information stored in PDF form fields. In this case I didn't write the PDF, though, so the form fields aren't named in a way that makes all of this logic easy to write three or more times for CRUD.
Conclusion: This resulted in my copying a list of the PDF form fields to a table in the database, so that I could then somehow associate them with where their data should be placed. The problem that arose is that the fields on the PDF would need to be associated with schema.table.column, and the only way I found to store that information was as a VARCHAR.
Neither of the examples are referring to a small amount of data (something like 6 tables in example 1 and somewhere around 1400 pdf form fields in example 2). Given Example1 and the resulting logic being stored multiple places, it seemed logical to build Example2, where I could store the relationships between data in the database where they could be accessed and changed consistently and for all involved methods.
Now, it's quite possible I'm just being stupid and all of my googling hasn't turned up an easy way to associate this data with the correct schema.table.column. If that is the case, then telling me the right way to do it is the simple answer here.
However, and this is where I get confused: I have always been told that you never want to store information about a database in the database, especially not as strings (VARCHAR). This seems wrong on so many levels, and I just can't figure out whether I'm being stupid and it's better to follow Example1, or whether there's some trick about database structure that I've missed.
Not sure where you got "... never ... store information about a database in the database". With an EAV model it is normal to store the metamodel (the entity types and their allowable attributes) in the database itself so that it is self-describing. If you had to change the metamodel, would you rather change code or a few rows in a table?
The main drawback to EAV databases is that you lose the ability to do simple joins. Join-type operations become much more complex. Like everything else in life, you make tradeoffs depending on your requirements. I have seen self-describing EAV architectures used very successfully.
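For the PDF-field case specifically, a hedged sketch of such a self-describing mapping table (the names here are hypothetical, not taken from the original schemas):

    -- Maps each PDF form field to the place its value should be stored.
    CREATE TABLE pdf_field_mapping (
        pdf_field_name VARCHAR(255) PRIMARY KEY,  -- name of the field in the PDF form
        target_table   VARCHAR(64)  NOT NULL,     -- table that holds the value
        target_column  VARCHAR(64)  NOT NULL      -- column within that table
    );

    INSERT INTO pdf_field_mapping (pdf_field_name, target_table, target_column)
    VALUES ('applicant_name_1', 'applicants', 'full_name');

The CRUD code then reads this mapping once and builds its queries from it, so the association lives in one place instead of being repeated in every layer.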