Translate data in MySQL database into another with slightly different structure? - mysql

I did some research on this and couldn't find many introductory resources for a beginner so I'm looking for a basic understanding here of how the process works. The problem I'm trying to solve is as such: I want to move data from an old database to a new one with a slightly different structure, possibly mutating the data a little bit in the process. Without going into the nitty gritty detail.. what are the general steps involved in doing this?
From what I gathered I would either be...
writing a ton of SQL queries manually (eesh)
using some complex tool that may be overkill for what I'm doing
There is a lot of data in the database so writing INSERT queries from a SQL dump seems like a nightmare. What I was looking for is some way to write a simple program that inserts logic like for each row in the table "posts", take the value of the "body" attribute and put it in the "post-body attribute of the new database or something like that. I'm also looking for functionality like append a 0 to the data in the "user id" column then insert it in the new database (just an example, the point is to mutate the data slightly).
In my head I can easily construct the logic of how the migration would go very easily (definitely not rocket science here).. but I'm not sure how to make this happen on a computer to iterate over the ridiculous amount of data without doing it manually. What is the general process for doing this, and what tools might a beginner want to use? Is this even a good idea for someone who has never done it before?
Edit: by request, here is an example of a mutation I'd like to perform:
Old database: table "posts" with an attribute post_body that is a varchar 255.
New database: table "posts" with an attribute body" that is a text datatype.
Want to take post-body from the old one and put it in body in the new one. Realize they are different datatypes but they are both technically strings and should be fine to convert, right? Etc. a bunch of manipulations like this.

Usually, the most time-consuming step of a database conversion is understanding both the old and the new structure, and establishing the correspondance of fields in each structure.
Compared to that, the time it takes to write the corresponding SQL query is ridiculously short.
for each row in the table "posts", take the value of the "body" attribute and put it in the "post-body attribute of the new database
INSERT INTO newdb.postattribute (id, attribute, value)
SELECT postid, 'post-body', body FROM olddb.post;
In fact, the tool that allows such data manipulation is... SQL! Really, this is already a very high-level language.

Related

How to implement custom fields in database

I need to implement a custom fields in my database so every user can add any fields he wants to his form/entities.
The user should be able to filter or/and sort his data by any custom field.
I want to work with MySQL because the rest of my data is very suitable to SQL. So, unless you have a great idea, SQL will be preferred over NoSQL.
We thought about few solutions:
JSON field - Great for dynamic schema. Can be filtered and sorted. The problem is that it is slower then regular columns.
Dynamic indexes can solve that but is it too risky to add indexes dynamically.
Key-value table - A simple solution but a really slow one. You can't index it properly and the queries are awful.
Static placeholder columns - Create N columns and hold a map of each field to its placeholder. - A good solution in terms of performance but it makes the DB not readable and it has limited columns.
Any thoughts how to improve any of the solutions or any idea for a new solution?
As many of the commenters have remarked, there is no easy answer to this question. Depending on which trade-offs you're willing to make, I think the JSON solution is neatest - it's "native" to MySQL, so easiest to explain and understand.
However, given that you write that the columns are specified only at set up time, by technically proficient people, you could, of course, have the set-up process include an "alter table" statement to add new columns. Your database access code and all the associated view logic would then need to be configurable too; it's definitely non-trivial.
However...it's a proven solution. Magento and Drupal, for instance, have admin screens for adding attributes to the business entities, which in turn adds columns to the relational database.

Store Miscellaneous Data in DB Table Row

Let's assume I need to store some data of unknown amount within a database table. I don't want to create extra tables, because this will take more time to get the data. The amount of data can be different.
My initial thought was to store data in a key1=value1;key2=value2;key3=value3 format, but the problem here is that some value can contain ; in its body. What is the best separator in this case? What other methods can I use to be able to store various data in a single row?
The example content of the row is like data=2012-05-14 20:07:45;text=This is a comment, but what if I contain a semicolon?;last_id=123456 from which I can then get through PHP an array with corresponding keys and values after correctly exploding row text with a seperator.
First of all: You never ever store more than one information in only one field, if you need to access them separately or search by one of them. This has been discussed here quite a few times.
Assuming you allwas want to access the complete collection of information at once, I recommend to use the native serialization format of your development environment: e.g. if it is PHP, use serialze().
If it is cross-plattform, JSON might be a way to go: Good JSON encoding/decoding libraries exist for something like all environments out there. The same is true for XML, but int his context the textual overhead of XML is going to bite a bit.
On a sidenote: Are you sure, that storing the data in additional tables is slower? You might want to benchmark that before finally deciding.
Edit:
After reading, that you use PHP: If you don't want to put it in a table, stick with serialize() / unserialize() and a MEDIUMTEXT field, this works perfectly, I do it all the time.
EAV (cringe) is probably the best way to store arbitrary values like you want, but it sounds like you're firmly against additional tables for whatever reason. In light of that, you could just save the result of json_encode in the table. When you read it back, just json_decode to get it back into an array.
Keep in mind that if you ever need to search for anything in this field, you're going to have to use a SQL LIKE. If you never need to search this field or join it to anything, I suppose it's OK, but if you do, you've totally thrown performance out the window.
it can be the quotes who separate them .
key1='value1';key2='value2';key3='value3'
if not like that , give your sql example and we can see how to do it.

DB Structure - Storage of relationships

This is a complex problem, so I'm going to try to simplify it.
I have a mysql instance on my server hosting a number of schemas for different purposes. The schemas are structured generally (not perfectly) in a EAV fashion. I need to transition information into and out of that structure on a regular basis.
Example1: in order to present the information on a webpage, I get the information, stick it into a complex object, which I then pass via json to the webpage, where I convert the json into a complex javascript object, which I then present with knockoutjs and similar things.
Conclusion: This resulted in a lot of logic being put into multiple places so that I could associate the values on the page with the values in the database.
Example2: in order to allow users to import information from a pdf, I have a lot of information stored in pdf form fields. In this case, I didn't write the pdf though, so the form fields aren't named in such a way that all of this logic is easy enough to write 3 or more times for CRUD.
Conclusion: This resulted in my copying a list of the pdf form fields to a table in the database, so that I could then somehow associate them with where their data should be placed. The problem that arose is that the fields on the pdf would need to associate with schema.table.column and the only way I found to store that information was via a VARCHAR
Neither of the examples are referring to a small amount of data (something like 6 tables in example 1 and somewhere around 1400 pdf form fields in example 2). Given Example1 and the resulting logic being stored multiple places, it seemed logical to build Example2, where I could store the relationships between data in the database where they could be accessed and changed consistently and for all involved methods.
Now, it's quite possible I'm just being stupid and all of my googling hasn't come across that there's an easy way to associate this data with the correct schema.table.column If this is the case, then telling me the right way to do that is the simple answer here.
However, and this is where I get confused. I have always been told that you never want to store information about a database in the database, especially not as strings (varchar). This seems wrong on so many levels and I just can't figure out if I'm being stupid, and it's better to follow Example1 or if there's some trick here that I've missed about database structure.
Not sure where you got "... never ... store information about a database in the database". With an EAV model it is normal to store the metamodel (the entity types and their allowable attributes) in the database itself so that it is self-describing. If you had to change the metamodel, would you rather change code or a few rows in a table?
The main drawback to EAV databases is that you lose the ability to do simple joins. Join-type operations become much more complex. Like everything else in life, you make tradeoffs depending on your requirements. I have seen self-describing EAV architectures used very successfully.

Performance of MySql Xml functions?

I am pretty excited about the new Mysql XMl Functions.
Now I can finally embed something like "object oriented" documents in my oldschool relational database.
For an example use-case consider a user who sings up at your website using facebook connect.
You can fetch an object for the user using the graph api, and get nice information. This information however can vary vastly. Some fields may or may not be set, some may be added over time and so on.
Well if you are just intersted in very special fields (for example friends relations, gender, movies...), you can project them into your relational database scheme.
However using the XMl functions you could store the whole object inside a field and then your different models can access the data using the ExtractValue function. You can store everything right away without needing to worry what you will need later.
But what will the performance be?
For example I have a table with 50 000 entries which represent useres.
I have an enum field that states "male", "female" (or various other genders to be politically correct).
The performance of for example fetching all males will be very fast.
But what about something like WHERE ExtractValue(userdata, '/gender/') = 'male' ?
How will the performance vary if the object gets bigger?
Can I maby somehow put an Index on specified xpath selections?
How do field types work together with this functions/performance. Varchar/blob?
Do I need fulltext indexes?
To sum up my question:
Mysql XML functins look great. And I am sure they are really great if you just want to store structured data that you fetch and analyze further in your application.
But how will they stand battle in procedures where there are internal scans/sorting/comparision/calculations performed on them?
Can Mysql replace document oriented databases like CouchDB/Sesame?
What are the gains and trade offs of XML functions?
How and why are they better/worse than a dynamic application that stores various data as attributes?
For example a key/value table with an xpath as key and the value as value connected to the document entity.
Anyone made any other experiences with it or has noticed something mentionable?
I tend to make comments similar to Pekka's, but I think the reason we cannot laugh this off is your statement "This information however can vary vastly." That means it is not realistic to plan to parse it all and project it into the database.
I cannot answer all of your questions, but I can answer some of them.
Most notably I cannot tell you about performance on MySQL. I have seen it in SQL Server, tested it, and found that SQL Server performs in memory XML extractions very slowly, to me it seemed as if it were reading from disk, but that is a bit of an exaggeration. Others may dispute this, but that is what I found.
"Can Mysql replace document oriented databases like CouchDB/Sesame?" This question is a bit over-broad but in your case using MySQL lets you keep ACID compliance for these XML chunks, assuming you are using InnoDB, which cannot be said automatically for some of those document oriented databases.
"How and why are they better/worse than a dynamic application that stores various data as attributes?" I think this is really a matter of style. You are given XML chunks that are (presumably) documented and MySQL can navigate them. If you just keep them as-such you save a step. What would be gained by converting them to something else?
The MySQL docs suggest that the XML file will go into a clob field. Performance may suffer on larger docs. Perhaps then you will identify sub-documents that you want to regularly break out and put into a child table.
Along these same lines, if there are particular sub-docs you know you will want to know about, you can make a child table, "HasDocs", do a little pre-processing, and populate it with names of sub-docs with their counts. This would make for faster statistical analysis and also make it faster to find docs that have certain sub-docs.
Wish I could say more, hope this helps.

Implementing a database structure for generic objects

I'm building a PHP/MySQL website and I'm currently working on my database design. I do have some database and MySQL experience, but I've never structured a database from scratch for a real world application which hopefully is going to get some good traffic, so I'd love to hear advices from people who've already done it, in order to avoid common mistakes. I hope my explanations are not too confusing.
What I need
In my application, the user should be able to write a post (title + text), then create an "object" (which can be anything, like a video, or a song, etc.) and attach it to the post. The site has a list of predefined object types the user can create, and I should be able to add new types in the future. The user should also have the ability to see the object's details in a dedicated page and add a comment to it - the same applies to posts.
What I tried
I created an objects table with these fields: oid, type, name and date. This table contains records for anything the user should be able to add comments to (i.e. posts and objects). Then I created a postmeta table which contains additional post data (such as text, author, last edit date, etc.), a videometa table for data about the "video" object (URL, description, etc.), and so on. A postobject table (pid,oid) links objects to posts. Additionally, there's a comments table which contains the comment text, the author and the ID of the object it refers to.
Since the list of object types is predefined and is probably not going to change (though I still need the ability to add a type easily at any time without changing the app's code structure or the database design), and it is relatively small, it's not a problem to create a "meta" table for each type and make a corresponding PHP class in my application to handle it.
Finally, a page on the site needs to show a list of all the posts including the objects attached to it, sorted by date. So I get all the records from the objects table with type "post" and join it with postmeta to get the post metadata. Then I query postobject to get all the objects attached to this post, and comments to get all the comments.
The questions
Does this make any sense? Is it any good to design a database in this way for a real world site? I need to join quite a few tables to get all the data I need, and the objects table is going to become huge since it contains almost every item (only the type, name and creation date, though) - this is to keep the database and the app code flexible, but does it work in the real world, or is it too expensive in the long term? Am I thinking about it in the wrong way with this kind of OOP approach?
More specifically: suppose I need to list all the posts, including their attached objects and metadata. I would need to join these tables, at least: posts, postmeta, postobject and {$objecttype}meta (not to mention an users table to get all posts by a specific user, for example). Would I get poor performance doing this, even if I'm using only numeric indexes?
Also, I considered using a NoSQL database (MongoDB) for this project (thanks to Stuart Ellis advice). Apparently it seems much more suitable since I need some flexibility here. But my doubt is: metadata for my objects includes a lot of references to other records in the database. So how would I avoid data duplication if I can't use JOIN? Should I use DBRef and the techniques described here? How do they compare to MySQL JOINs used in the structure described above in terms of performance?
I hope these questions do make any sense. This is my first project of this kind and I just want to avoid to make huge mistakes before I launch it and find out I need to rework the design completely.
I'm not a NoSQL person, but I wonder whether this particular case might actually be handled best with a document database (MongoDB or CouchDB). Various type of objects with metadata attached sounds like the kind of scenario that MongoDB is designed for.
FWIW, you've got a couple of issues with your table and field naming that might bite you later. For example, type and date are rather generic, and also reserved words. You've also mixed singular and plural table names, which will throw any automatic object mapping.
Whichever database you use, it's a good idea to find an existing set of database naming conventions and apply it from the start - this will help you avoid subtle issues and ensure that your naming stays consistent. I tend to use the Rails naming conventions ATM, because they are well-known and fairly sensible.
Or you could store the object contents as a file, outside of the database, if you're concerned about the database space.
If you store anything in the database, you already have the object type in objects; so you could just add object_contents table with a long binary field to store the object. You don't need to create a new table for each new type.
I've seen a lot of JOIN's in real world web application (5 to 10). Objects table may get large, but that's indices are for. So far, I don't see anything wrong in your database. BTW, what felt strange to me - one post, one object, and separate comments for each? No ability to mix pictures with text?