I have an XML file with a schema defined in it.
The schema has several nested elements (e.g., Family (root) -> Family Members (list of sub-nodes)).
What would be the easiest way to break this down into a MySQL database with multiple tables? Preferably an automated tool/GUI to handle this process. I am trying to avoid writing dedicated code to parse the file and extract the data, an approach that was common in other related questions.
I am using a Mac, so Windows-only tools are not relevant.
MySQL has LOAD XML as a statement, which is quite nice if your data can be formatted to match its specification. It's hard to tell whether that would work for your dataset without seeing more of it.
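As a rough illustration, assume a hypothetical layout in which each <family> element holds <member> sub-nodes, with the values stored as attributes (all names here are made up). A short Python script using the standard xml.etree.ElementTree module can flatten the nested document into the <row column="value" .../> shape that LOAD XML reads. It produces one flat document per target table and copies the parent key into each child row:

```python
import xml.etree.ElementTree as ET

# Hypothetical input: families (root records) with nested member sub-nodes.
SOURCE = """<families>
  <family id="1" surname="Smith">
    <member name="Ann" age="40"/>
    <member name="Bob" age="12"/>
  </family>
  <family id="2" surname="Jones">
    <member name="Cy" age="33"/>
  </family>
</families>"""

def to_load_xml_rows(source: str) -> tuple[str, str]:
    """Flatten the nested document into one <row .../> document per table."""
    root = ET.fromstring(source)
    families = ET.Element("resultset")
    members = ET.Element("resultset")
    for fam in root.iter("family"):
        ET.SubElement(families, "row", id=fam.get("id"), surname=fam.get("surname"))
        for member in fam.iter("member"):
            # Copy the parent key down so every member row points at its family.
            ET.SubElement(members, "row", family_id=fam.get("id"),
                          name=member.get("name"), age=member.get("age"))
    return (ET.tostring(families, encoding="unicode"),
            ET.tostring(members, encoding="unicode"))

families_xml, members_xml = to_load_xml_rows(SOURCE)
print(members_xml)
# Each document would be saved to a file and loaded with, e.g.:
#   LOAD XML LOCAL INFILE 'family_member.xml' INTO TABLE family_member
#   ROWS IDENTIFIED BY '<row>';
```

The table name family_member in the comment is hypothetical. LOAD XML matches row attributes to column names, so the target tables must already exist with those columns.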
The first thing you would have to do is create a MySQL schema based on the XML schema. There are some projects that do this, but it's worth noting that not everything that can be described in XSD can be implemented in SQL.
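A minimal sketch of that first step follows. It assumes an XSD that only uses nested xs:sequence elements with the xs prefix. Real schemas with attributes, xs:choice, named types or references need much more. Every complex element becomes a table with a surrogate key, and every nested complex element becomes a child table with a foreign key back to its parent. The schema, the type mapping and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical XSD: a family element containing a repeating member element.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="family">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="surname" type="xs:string"/>
        <xs:element name="member" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="age" type="xs:int"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Deliberately tiny mapping keyed on the literal "xs:" prefix; unknown types fall back to TEXT.
TYPE_MAP = {"xs:string": "VARCHAR(255)", "xs:int": "INT", "xs:date": "DATE"}

def q(name):
    return f"`{name}`"  # backtick-quote identifiers so reserved words are safe

def tables_for(element, parent=None, out=None):
    """Emit CREATE TABLE for a complex element, recursing into complex children."""
    out = [] if out is None else out
    table = element.get("name")
    cols = [f"{q(table + '_id')} INT AUTO_INCREMENT PRIMARY KEY"]
    if parent is not None:
        cols.append(f"{q(parent + '_id')} INT NOT NULL")
    nested = []
    for child in element.findall(f"{XS}complexType/{XS}sequence/{XS}element"):
        if child.find(f"{XS}complexType") is not None:
            nested.append(child)  # nested structure becomes a child table
        else:
            sql_type = TYPE_MAP.get(child.get("type"), "TEXT")
            cols.append(f"{q(child.get('name'))} {sql_type}")
    if parent is not None:
        cols.append(f"FOREIGN KEY ({q(parent + '_id')}) REFERENCES {q(parent)}({q(parent + '_id')})")
    out.append(f"CREATE TABLE {q(table)} (\n  " + ",\n  ".join(cols) + "\n);")
    for child in nested:
        tables_for(child, table, out)
    return out

statements = tables_for(ET.fromstring(XSD).find(f"{XS}element"))
print("\n".join(statements))
```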
You could use XSLT, regular expressions, or an editor to get the data into the shape you want, then do an import. If you have to use a DOM parser to convert your XML to CSVs to load into MySQL, it's not too tough at all.
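For the DOM-parser route, a sketch in Python with the standard xml.dom.minidom module might look like this. The <book>/<author> layout and the column names are assumptions for illustration. Repeated <author> elements go to their own CSV, keyed by the book's isbn attribute.

```python
import csv
import io
from xml.dom import minidom

# Hypothetical input: books, each with one or more authors.
SOURCE = """<library>
  <book isbn="111"><title>XSLT</title><author>Kay</author></book>
  <book isbn="222"><title>SQL</title><author>Date</author><author>Darwen</author></book>
</library>"""

def text_of(node):
    """Concatenate the direct text children of a DOM element."""
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)

def xml_to_csvs(source):
    """Return (books_csv, authors_csv): one CSV per target table."""
    doc = minidom.parseString(source)
    books, authors = io.StringIO(), io.StringIO()
    book_out, author_out = csv.writer(books), csv.writer(authors)
    book_out.writerow(["isbn", "title"])
    author_out.writerow(["isbn", "author"])
    for book in doc.getElementsByTagName("book"):
        isbn = book.getAttribute("isbn")
        book_out.writerow([isbn, text_of(book.getElementsByTagName("title")[0])])
        for author in book.getElementsByTagName("author"):
            author_out.writerow([isbn, text_of(author)])  # one row per author
    return books.getvalue(), authors.getvalue()

books_csv, authors_csv = xml_to_csvs(SOURCE)
print(authors_csv)
# Written to files (open them with newline=""), each CSV could be loaded with e.g.:
#   LOAD DATA LOCAL INFILE 'authors.csv' INTO TABLE book_author
#   FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
#   LINES TERMINATED BY '\r\n' IGNORE 1 LINES;
```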
You're essentially asking how to automate the process of (relational) normalization, and that's very difficult if you're only starting from an instance. For example, if your instance has
<book>
<author>Kay</author>
</book>
there's no way of knowing whether a book can have multiple authors, which would affect the SQL table structure.
If you've got a schema then you can do better, but it's still not ideal, because inferring the non-hierarchic relationships from an XSD is going to be pretty difficult. Apart from anything else, there are usually cross-document relationships which XSD can't describe: it's unusual to put all your data in one giant XML document.
To do this job properly, you really need to reverse-engineer the object model, and that requires a semantic understanding of the data, not just syntactic manipulation.
Related
I need help from you experts on database development practices. I have a few questions regarding MySQL databases:
Is there a way in MySQL to define a database and its structure in XML and then convert it into a fully functional MySQL database?
Is it possible to generate the XML source file from question 1 (see above) from an existing MySQL database?
As far as I know, XML is not suitable for developing database structures. However, can we say that XML is a language for representing hierarchical structures, and since a MySQL database also has a hierarchical structure, XML is in fact suitable for database development?
Thank you very much!
You can certainly store XML data in MySQL. You can also use any number of approaches to convert hierarchical XML data into individual relational database field representations.
I would, however, say that if you just want to work with intact XML documents, you might look at going the NoSQL route, which is really better suited for this type of data storage. You might also consider JSON as the storage format, as it is more concise (saving space and transmission bandwidth) and is more closely aligned with the popular NoSQL data stores out there.
1) Yeah, there is a way, but you should check out MongoDB if you want a dynamic database structure; it was developed with that in mind. Also, unless you need the RSS features of XML or something similar, you might want to consider using JSON as the format for your documents.
2) JSON and MongoDB work very well together to quickly and easily get documents in and out of the DB. You can technically do it in MySQL as well, but you might spend more time scripting in PHP or Ruby to get the format you want.
3) You could use XML to describe your DB structure because of its loose structure, but I'm not sure it would be intuitively clear to others. It's hard to say; it really depends on how you implement it and how complicated your DB structure is going to be.
I have a few XML files containing data for a research project which I need to run some statistics on. The amount of data is close to 100GB.
The structure is not so complex (it could be mapped to perhaps 10 tables in a relational model), and given the nature of the problem, this data will never be updated again; I only need it available in a place where it's easy to run queries on it.
I've read about XML databases and the possibility of running XPath-style queries on them, but I have never used them and I'm not very comfortable with them. Having the data in a relational database would be my preferred choice.
So, I'm looking for a way to convert the data stored in XML into a relational database (think of a big .sql file similar to the one generated by mysqldump, but anything else would do).
The ultimate goal is to be able to run SQL queries for crunching the data.
After some research I'm almost convinced I have to write it on my own.
But I feel this is a common problem, and therefore there should be a tool which already does that.
So, do you know of any tool that would transform XML data into a relational database?
PS1:
My idea would be something like (it can work differently, but just to make sure you get my point):
Analyse the data structure (based on the XML files themselves, or on an XSD)
Build the relational database (tables, keys) based on that structure
Generate SQL statements to create the database
Generate SQL statements to fill in the data
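The four steps above can be sketched in Python for the simplest case, assuming a hypothetical record element whose children are plain text elements. Structure is guessed from the instance alone: a child tag that ever appears twice in one record gets its own table. Everything is typed TEXT, and the whole document is loaded into memory, so at 100 GB a streaming parser would be needed instead. The function names are made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def sql_str(value):
    # Minimal MySQL string-literal escaping (default sql_mode); a real tool
    # should prefer parameter binding in the database driver.
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"

def analyse(records):
    """Step 1: guess structure from the instance by counting child tags per record."""
    max_count = {}
    for rec in records:
        for tag, n in Counter(child.tag for child in rec).items():
            max_count[tag] = max(n, max_count.get(tag, 0))
    # A tag never seen twice is *assumed* single-valued; the data cannot prove it.
    single = [t for t, n in max_count.items() if n == 1]
    repeated = [t for t, n in max_count.items() if n > 1]
    return single, repeated

def generate_sql(source, record_tag):
    records = ET.fromstring(source).findall(f".//{record_tag}")
    single, repeated = analyse(records)
    # Steps 2-3: one table for the record, one child table per repeated tag.
    cols = ["id INT PRIMARY KEY"] + [f"{c} TEXT" for c in single]
    stmts = [f"CREATE TABLE {record_tag} ({', '.join(cols)});"]
    for tag in repeated:
        stmts.append(f"CREATE TABLE {record_tag}_{tag} ({record_tag}_id INT, {tag} TEXT);")
    # Step 4: INSERTs, using each record's position as a surrogate key.
    for i, rec in enumerate(records, start=1):
        values = [str(i)] + [sql_str(rec.findtext(c, default="")) for c in single]
        stmts.append(f"INSERT INTO {record_tag} VALUES ({', '.join(values)});")
        for tag in repeated:
            for el in rec.findall(tag):
                stmts.append(f"INSERT INTO {record_tag}_{tag} VALUES ({i}, {sql_str(el.text or '')});")
    return stmts

SOURCE = """<library>
  <book><title>XSLT</title><author>Kay</author></book>
  <book><title>SQL</title><author>Date</author><author>Darwen</author></book>
</library>"""
print("\n".join(generate_sql(SOURCE, "book")))
```

Run on a single-author <book> alone, author becomes an ordinary column of book, because the data cannot show that a second author is possible.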
PS2:
I've seen some posts here on SO, but I still couldn't find a solution.
Microsoft's "Xml Bulk Load" tool seems to do something in that direction, but I don't have a MS SQL Server.
Databases are not the only way to search data. I can highly recommend Apache Solr.
Strategies to Implement search on XML file
Keep your raw data as XML and search it using the Solr index
Importing XML files of the right format into a MySQL database is easy:
https://dev.mysql.com/doc/refman/5.6/en/load-xml.html
This means you typically have to transform your XML data into that format. How you do this depends on the complexity of the transformation, which programming languages you know, and whether you want to use XSLT (which is most probably a good idea).
From your former answers it seems you know Python, so http://xmlsoft.org/XSLT/python.html may be the right thing for you to start with.
Take a look at StAX instead of XSD for analyzing/extracting data. It's stream-based and can deal with huge XML files.
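StAX is a Java API. Python's standard library has a comparable pull-style streaming parser in xml.etree.ElementTree.iterparse, and that is what this sketch uses instead. It assumes record elements sit directly under the root and that each record's children are simple text elements. The function name is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def stream_records(source, record_tag):
    """Yield one dict per record element, discarding each record once handled.

    Assumes records are direct children of the document's root element.
    """
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # first event is the root's "start"
    for event, elem in events:
        if event == "end" and elem.tag == record_tag:
            yield {child.tag: child.text for child in elem}
            root.clear()  # drop finished records so memory use stays flat

sample = io.BytesIO(b"<rows><row><a>1</a><b>x</b></row><row><a>2</a><b>y</b></row></rows>")
for record in stream_records(sample, "row"):
    print(record)  # in practice: buffer rows and insert them in batches
```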
If you feel comfortable with Perl, I've had pretty good luck with the XML::Twig module for processing really big XML files.
Basically, all you need is to set up a few twig handlers and import your data into MySQL using DBI/DBD::mysql.
There is a pretty good example on xmltwig.org.
If you're comfortable with commercial products, you might want to have a look at Data Wizard for MySQL by the SQL Maestro Group.
This application is targeted especially at exporting and, of course, importing data from/to MySQL databases. This also includes XML import. You can download a 30-day trial to check whether this is what you are looking for.
I have to admit that I have not used their MySQL product line yet, but I had a good user experience with their Firebird Maestro and SQLite Maestro products.
I am aware of the batch LOAD XML technique, e.g. Load XML Update Table--MySQL.
Can MySQL insert/replace rows directly from XML? I'd like to pass an XML string to MySQL.
Something like "REPLACE INTO user XML VALUES", maybe even using AS to map the tags to the column names?
The primary thing is that I don't want to parse the XML in my code; I'd like MySQL to handle this. I don't have a file; I have the XML as a string.
I have looked and found there are some XML Functions:
12.11. XML Functions
The XML functions can do XPath, but I think this is a little fiddly, as I have a 1:1 mapping from the XML to the table structure, so I'd just like to be able to say: hey MySQL, insert the values in the XML string into the table.
Is this possible?
In a nutshell, No.
What you're looking for is an XML storage engine for MySQL. There has never been one created officially, and I have never seen a third-party one either (but feel free to google).
If you really want to achieve this, then the closest you would get is to look for an alternative (R)DBMS, but that might not support the type of queries you wish to perform, may involve a bit of a learning curve, would no doubt require that you use a server with superuser access, and could potentially mean refactoring a lot of your code.
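For a strict 1:1 mapping, the XPath route through the XML functions mentioned in the question can be confined to a single statement. MySQL's ExtractValue() takes the XML string as a bound parameter, plus one XPath expression per column. The Python helper below is hypothetical and only builds the SQL text and parameters. It assumes a layout like <user><id>7</id><name>Ann</name></user>, where element names equal column names, and a DB-API driver with %s placeholders such as PyMySQL.

```python
def build_replace(table, root_tag, columns, xml_string):
    """Build a REPLACE ... SELECT in which MySQL's ExtractValue() pulls each
    column out of the XML string. Table/column names are interpolated into
    the SQL, so they must come from trusted code, never from the XML."""
    selects = ", ".join(f"ExtractValue(%s, '/{root_tag}/{col}')" for col in columns)
    sql = f"REPLACE INTO {table} ({', '.join(columns)}) SELECT {selects}"
    params = (xml_string,) * len(columns)  # the same XML is bound once per column
    return sql, params

xml = "<user><id>7</id><name>Ann</name></user>"
sql, params = build_replace("user", "user", ["id", "name"], xml)
print(sql)
# With a DB-API driver that uses %s placeholders (e.g. PyMySQL):
#   cursor.execute(sql, params)
```

ExtractValue() returns an empty string when no matching element is found, so absent values arrive as '' rather than NULL.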
This is maybe more of an implementation question. I wonder: if I were to make a tool to convert one relational database to another kind of database, what would the approach be?
If, for example, I want to convert the data and structure of a MySQL database to MSSQL, would I need to use regular expressions to parse the SQL file? Or could I convert it to XML or JSON first and parse from that structure into my target database?
Using existing tools for converting MySQL to MSSQL or anything similar is out of scope, since I want to know how it is actually done.
Well it's kind of a broad question, but generally speaking, having your own abstract representation of the structure and data would be a good thing, because you could extend your system "easily" by writing importers and exporters, and actually decouple your code a little by abstracting the relational db concepts into your own format.
The importers would "reverse engineer" a given database by converting it to your own representation (as you say, XML/JSON, or even your own query language, which would be better I guess). Then the exporters would just convert from your format to the requested SQL dialect. No regular expressions, nothing else "hardcoded".
This will allow you to extend your system and support a bigger number of sources and targets, and also handle errors like some SQL features from a "source" not supported in the selected "target".
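A minimal sketch of that importer/model/exporter split in Python uses an intentionally tiny neutral model. The JSON importer, the Column/Table classes and the three generic type names are hypothetical. Each exporter is just a dialect entry, so MySQL and MSSQL differ only in identifier quoting, type names and identity syntax (AUTO_INCREMENT versus IDENTITY(1,1)).

```python
import json
from dataclasses import dataclass

@dataclass
class Column:
    name: str
    type: str              # generic type name: "integer", "string" or "text"
    identity: bool = False

@dataclass
class Table:
    name: str
    columns: list[Column]

def from_json(doc: str) -> Table:
    """Importer: read the neutral JSON representation into the model."""
    data = json.loads(doc)
    return Table(data["name"], [Column(**c) for c in data["columns"]])

DIALECTS = {
    "mysql": {
        "quote": lambda n: f"`{n}`",
        "types": {"integer": "INT", "string": "VARCHAR(255)", "text": "TEXT"},
        "identity": "AUTO_INCREMENT",
    },
    "mssql": {
        "quote": lambda n: f"[{n}]",
        "types": {"integer": "INT", "string": "NVARCHAR(255)", "text": "NVARCHAR(MAX)"},
        "identity": "IDENTITY(1,1)",
    },
}

def to_ddl(table: Table, dialect: str) -> str:
    """Exporter: render the model as CREATE TABLE for one target dialect."""
    d = DIALECTS[dialect]
    cols = []
    for c in table.columns:
        parts = [d["quote"](c.name), d["types"][c.type]]  # KeyError = unsupported type
        if c.identity:
            # This toy model treats an identity column as the primary key.
            parts += [d["identity"], "PRIMARY KEY"]
        cols.append(" ".join(parts))
    return f"CREATE TABLE {d['quote'](table.name)} ({', '.join(cols)});"

doc = '{"name": "users", "columns": [{"name": "id", "type": "integer", "identity": true}, {"name": "bio", "type": "text"}]}'
users = from_json(doc)
print(to_ddl(users, "mysql"))
print(to_ddl(users, "mssql"))
```

Adding a target means adding one more dialect entry. A generic type that a dialect does not map raises a KeyError, which is the natural place to report a source feature the target does not support.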
My 2 cents, hope it helps!
So I'm starting to learn XML. It seems like a simple flat-file data system whose output you can view by using a server-side language of your choice and some parsing. I don't really see the benefit of using XML over storing values in a database and doing the same kind of parsing. I mean, it would seem that databases would be faster.
So what can you really do with XML that you can't/shouldn't do with a database? Is XML really that useful?
XML is an interchange format first and foremost. It allows you to transport structured data between programs, servers, or people, and retain a common parser and schema system.
XML of course can be horribly misused or overused.
This question is too broad (i.e. there are too many aspects in which they differ), yet the main reason for XML is not even data storage. It was designed as the ultimate common platform for data exchange, with defined rules for how data is organised. Thus you can read/write valid XML on almost every platform and in almost every language.
XML is designed to be human-readable. XML can easily be opened and read in a text editor. Some XML readers support folding, which also helps with seeing the hierarchical organization of your data.
If you're processing files that's a different story. I think databases often have the option of exporting to XML.
You can move your data from one type of database to another (for example, from MS-SQL to MySQL) by using XML.
Or send data from one application to another, which is done in many web applications.
I think it can be very useful for this.
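A small sketch of moving a table between databases through XML uses two in-memory SQLite databases from Python's standard library as stand-ins for, say, an MS-SQL source and a MySQL target. The table and function names are hypothetical. Values travel as text, and the target's column types convert them back. NULL and empty strings are not distinguished in this simple format.

```python
import sqlite3
import xml.etree.ElementTree as ET

def export_table(conn, table):
    """Dump every row of `table` as <table><row><col>value</col>...</row></table>."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table name assumed trusted
    names = [d[0] for d in cur.description]
    root = ET.Element(table)
    for values in cur:
        row = ET.SubElement(root, "row")
        for name, value in zip(names, values):
            cell = ET.SubElement(row, name)
            if value is not None:  # NULL (and '') end up as an empty element
                cell.text = str(value)
    return ET.tostring(root, encoding="unicode")

def import_table(conn, xml_text):
    """Insert the exported rows into an existing table with the same name."""
    root = ET.fromstring(xml_text)
    for row in root:
        cols = [cell.tag for cell in row]
        marks = ", ".join("?" for _ in cols)
        conn.execute(f"INSERT INTO {root.tag} ({', '.join(cols)}) VALUES ({marks})",
                     [cell.text for cell in row])
    conn.commit()

# Two in-memory SQLite databases stand in for the source and target servers.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE person (id INTEGER, name TEXT)")
source.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
payload = export_table(source, "person")
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE person (id INTEGER, name TEXT)")
import_table(target, payload)
print(target.execute("SELECT id, name FROM person ORDER BY id").fetchall())
```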
I think it is comparison of apples to oranges...
There are a lot of uses for XML, but it is not primarily used for storing data. It is a very loosely coupled data structure compared to databases.
One of the many uses of XML, which I encounter very frequently, is exchanging data from one program to another. Because it is a very simple format, one program can create an XML file in Java and another can parse (read) the XML file in VB/C#/Python/Cocoa or any other language.
One such use of XML is web services, where client programs can call (execute) code residing on servers, and both requests and responses are in XML.
So one can say that strong feature of XML is interoperability.
On the other hand, databases are mainly used for storing and retrieving data. Databases are extremely powerful at fast retrieval/insertion of values in tables, where XML will fail badly, because most of the time XML files have to be read serially, as opposed to tables residing in databases.
XML can contain highly complex tree data structures that cannot be easily represented in relational databases.
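One generic, if unwieldy, way to get an arbitrary XML tree into a relational table anyway is an adjacency-list ("edge") table of (id, parent_id, tag, text) rows, sketched below in Python. It ignores attributes and any text that follows child elements. Reading a subtree back needs recursive queries (for example WITH RECURSIVE in MySQL 8.0), which illustrates why the mapping is awkward. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def shred(xml_text):
    """Flatten an arbitrary XML tree into (id, parent_id, tag, text) rows."""
    rows = []

    def walk(elem, parent_id):
        node_id = len(rows) + 1  # ids assigned in document (pre-)order
        text = (elem.text or "").strip() or None
        rows.append((node_id, parent_id, elem.tag, text))
        for child in elem:
            walk(child, node_id)

    walk(ET.fromstring(xml_text), None)
    return rows

for row in shred("<family><member>Ann</member><member>Bob</member></family>"):
    print(row)
```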
XML is also useful for representing documents (Word docs for example or HTML).
The thing that's so appealing about XML is that it is quite simple to create.
Python is a great language for converting text files into XML for example.
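For instance, a CSV file with a header line can be turned into XML with Python's standard csv and xml.etree.ElementTree modules. This sketch assumes the header names are valid XML element names. The root and record tag names are arbitrary defaults.

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_xml(text, root_tag="records", row_tag="record"):
    """Turn CSV with a header line into <records><record><col>...</col></record></records>."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(text)):
        record = ET.SubElement(root, row_tag)
        for field, value in row.items():
            ET.SubElement(record, field).text = value  # one child element per column
    return ET.tostring(root, encoding="unicode")

print(csv_to_xml("name,age\nAnn,40\nBob,12\n"))
```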
XML vs databases is a false dichotomy, because you can store XML in databases. Though it's true that a simple XML document can sometimes be used for an application that would otherwise have needed a database.
If you're dealing with documents (like articles in technical journals) then your only real choice is between XML and some proprietary equivalent. This of course is the problem that XML was originally invented to solve.
XML is also used extensively for data messaging. It supplanted EDI and ASN.1 in this role because it can handle all the complex data that EDI and ASN.1 can handle, but is itself much simpler. More recently we've seen JSON taking over some of this role, especially for "private" (as distinct from standardised) protocols, because JSON is simpler still, and works better with general-purpose programming languages.
XML, like any successful technology, has also been used extensively for problems where it isn't really needed. That's not a misuse, any more than it is a misuse of this forum to send a plain text message in a field that is capable of holding richly formatted text, or to ride my bicycle on a road that's engineered to take 40-ton lorries: once the technology is in place, you might as well use it.