XML flat file vs. relational database backend - mysql

Most projects now need some form of a database. When someone says database, I usually think relational databases, but I still hear about flat file XML databases.
What parameters do you take into consideration when deciding between a "real" database and a flat-file XML database? When should one be used over the other, and under what circumstances should I never consider using a flat-file (or, vice versa, a relational) database?

There is no such thing as an XML flat-file database. Flat XML files are not databases: they lack higher-level features such as indexes, so searches and analytical queries over larger datasets have to scan the whole file.
XML databases are another topic and have their uses (content management, document storage in general: complicated schemata that you don't care much about from the database point of view).
Flat files are fine for things like settings (smaller files), but a real database is a real database. ACID guarantees are hard to provide with flat files.
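To make the index point concrete, here is a minimal sketch, not a benchmark. It uses Python's built-in sqlite3 as a stand-in for any relational engine (the thread is about MySQL; SQLite is my substitution so the example runs without a server). The user data, table name, and index name are made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical data set: 10,000 users, each with an id and a name.
rows = [(i, f"user{i}") for i in range(10_000)]

# Flat XML approach: build the document a flat file would contain.
root = ET.Element("users")
for user_id, name in rows:
    ET.SubElement(root, "user", id=str(user_id), name=name)

def find_in_xml(name):
    # Linear scan: with no index, the cost grows with the number of records.
    for elem in root.iter("user"):
        if elem.get("name") == name:
            return int(elem.get("id"))
    return None

# Database approach: an index lets the engine jump to the matching row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
conn.execute("CREATE INDEX idx_users_name ON users (name)")

def find_in_db(name):
    row = conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None

# Ask SQLite how it plans to run the query; the plan text names the index it uses.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE name = ?", ("user9999",)
).fetchall()
plan_text = " ".join(str(col) for step in plan for col in step)
```

Both functions return the same answer; the difference is that the XML version has to visit elements one by one, while the query plan shows the database going through idx_users_name.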

To add to Rachel's answer, also consider:
concurrency
read vs. write
If you have something simple that's going to be read often and is not going to change much it might be more optimal to use a flat file and save the overhead.
On the other hand, if you have to support multiple connections that are going to be adding and updating the data you'll want to use a database.

A few parameters to consider are:
Amount of Data
Complexity of Data
Relationship between Data
If we have a small amount of data with low complexity and no inter-relationships, then a flat file is a reasonable choice. In real applications this is rarely the case, which is why relational databases are used so often.

XML file is not a database. Read Joel's "Back to basics" article to see the difference.

Related

Where to store large number of JSON files

We are in the process of setting up a web application (a start-up at present). The web application will quickly grow in terms of the number of JSON files that it needs to handle. We are probably talking about 5-10 million files. The individual JSON files are not particularly large, maybe in the region of 150K per file. Files are unlikely to be accessed concurrently, since individual users have their own set of files.
The question I would like to put out there is simply how to best store the JSON files. Is a CDN best where links are stored in a relational database? Or should I jump on the bandwagon and go down the route of a NoSQL database? Or maybe there are other solutions I haven't thought about???
I'm really looking for some good advice, ideally from someone with experience of large databases.
Many thanks in advance!!!!
Markus
I would consider looking into MongoDB since it already stores its documents in a json format.
You could also stick it into a regular relational db, but the nice thing about working with json documents in mongo is that you will have query capabilities against the documents, so that you don't have to load the entire document always.
If all you want is quick access to a write-once-read-many type of storage, then you can also consider DBM. It is fast, cheap, reliable.
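A minimal sketch of the DBM idea, using Python's standard dbm module as one possible implementation. The store location, key naming scheme, and helper function names are all hypothetical.

```python
import dbm
import json
import os
import tempfile

# Hypothetical store location; a temp directory keeps the sketch self-contained.
store_path = os.path.join(tempfile.mkdtemp(), "documents")

def put_document(key, document):
    # DBM stores bytes, so serialize the JSON document before writing.
    with dbm.open(store_path, "c") as db:  # "c": create the store if missing
        db[key.encode("utf-8")] = json.dumps(document).encode("utf-8")

def get_document(key):
    # Look the document up by key and decode it back into Python objects.
    with dbm.open(store_path, "r") as db:
        return json.loads(db[key.encode("utf-8")].decode("utf-8"))

put_document("user42/report1", {"title": "Q1", "values": [1, 2, 3]})
loaded = get_document("user42/report1")
```

This is a plain key-value lookup: you get the whole document back by key, with no way to query inside it, which is the trade-off against something like MongoDB.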
Assuming you will compress the file contents, JSON-ness is probably a nonfactor from storage perspective.
Reliability - can you tolerate some statistical loss? If not, an all-or-nothing DB is the only choice left. If so, filesystem-based storage may be an alternative. Filesystems are not as fanatical as a DB about whole-data integrity checks. And they are much better supported. Serving files is easier, but keeping track of versions takes more design-time effort. A common enough pattern is to serve product images and other collateral out of the filesystem while keeping other data in an RDBMS.
If you consider CDN -> relational DB then could also consider CDN -> {filesystem, inode}, keeping filesystems balanced explicitly in terms of file count.
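One way to keep directories balanced by file count is to derive the directory from a hash of the file's id. This is only a sketch of that idea, not something the answer above prescribes; the function name, the two-level layout, and the "doc<N>" ids are my assumptions.

```python
import hashlib
from collections import Counter
from pathlib import PurePosixPath

def shard_path(file_id, levels=2, width=2):
    # Hash the id so files spread evenly regardless of naming patterns.
    digest = hashlib.sha256(file_id.encode("utf-8")).hexdigest()
    # Use the leading hex characters as nested directory names: ab/cd/<id>.json
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(*parts, f"{file_id}.json")

# Rough balance check over hypothetical ids: files per top-level directory.
counts = Counter(shard_path(f"doc{i}").parts[0] for i in range(100_000))
```

With two levels of two hex characters there are 65,536 leaf directories, so 10 million files would average roughly 150 per directory. The path can be recomputed from the id, so the database only needs to store the id.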
A NoSQL database, like MongoDB, might have restart and recovery times beyond your tolerance levels. Otherwise it's a great tool. Many RDBMSs have raw partition support for much better IO. At 150KB one must use a TEXT or CLOB field, which is just a minor annoyance.
HTH. Will appreciate if you shared back what you actually used.

XML or MySQL for User Database?

Might seem a strange question but would there be a performance benefit in using XML for a database rather than MySQL and tables?
To put this into context, I will be creating a website that has user profiles. I know more XML than MySQL, and I know most people will use MySQL as standard, but I was wondering if anyone could throw some pennies this way about how the two compare, and whether this suggestion is as outrageous as it might be to anyone who understands big O notation...
The bigger the XML file, the more memory you use, because you have to load the entire file into RAM while your script runs.
Say a typical small MySQL database is about 4 MB. Take that as a 4 MB XML file, loaded from disk into RAM on every page view. With about 25 visitors at any given moment, that's already 100 MB, and if they flick through a lot of pages it quickly adds up to 1 GB of RAM.
Not to mention you'll add about 1 second to page load every time, if not longer.
Not to mention continuous disk load for reading and writing changed values, and concurrency issues when two visitors want to update the same XML file.
You don't have these problems with an SQL server.
MySQL has indexes, and it's optimized for the binary values you will be storing. All you have with an XML file is a plain file, and any optimizations (caching, indexing, anything you can think of) will be up to you to implement.
XML is a great format for transport, and everybody speaks it, but you do not want to use it for storage.
And if you already know XML, but not yet MySQL, I would say you're ahead of the game. You'll probably find writing SQL queries and fetching the results more straightforward than working with XML data.
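As a rough illustration of that last point, here is the same question ("how many profiles per country?") answered both ways. This is a sketch using Python's built-in sqlite3 in place of MySQL so it runs without a server; the profile fields and sample data are invented.

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical profile data in XML form.
profiles_xml = """
<profiles>
  <profile id="1"><name>Ann</name><country>DE</country></profile>
  <profile id="2"><name>Bob</name><country>US</country></profile>
  <profile id="3"><name>Cid</name><country>DE</country></profile>
</profiles>
"""

# XML: parse the whole document, then write the grouping logic by hand.
root = ET.fromstring(profiles_xml)
xml_counts = Counter(p.findtext("country") for p in root.iter("profile"))

# SQL: load the same records into a table and let the query do the grouping.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO profiles VALUES (?, ?, ?)",
    [(int(p.get("id")), p.findtext("name"), p.findtext("country"))
     for p in root.iter("profile")],
)
sql_counts = dict(
    conn.execute("SELECT country, COUNT(*) FROM profiles GROUP BY country")
)
```

For three records the difference is cosmetic. The point is that filtering, grouping, and sorting are part of the query language, while with a plain XML file each of them is code you write and maintain yourself.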
As far as I can see, there are several XML DB solutions available. These appear in a simple Google search:
http://exist-db.org/exist/index.xml;jsessionid=1dowedwdr9hsanbcvdcom8aka
http://basex.org/
http://www.oracle.com/technetwork/database/features/xmldb/index.html
http://www.sedna.org/
So all that matters here is the speed of development. If you're mostly familiar with XML, then using one of those could shorten development time.
However, there are plenty of relational DB ORM products, depending on the programming language, that take on most of the development effort and make it easy to use a database for a web site. So if you don't have specific needs for your web site, you might go with any of the options above.
It depends on the structure of your database. This question can't be answered definitively without knowing anything about your data. Any comparison of XML versus a relational database depends heavily on which data you choose and what type of operations you plan.
For example, suppose you want to store, index, and query more than a million rows, and each row has the same fields. That's a simple, fixed structure, the same for all records. It's a perfect fit for a relational database and can be stored in a single table. Relational databases handle such fixed records very efficiently.
Well, there are two main questions here.
First, if you're going to use a database, you have a choice between an XML database and a relational database. The choice depends primarily on the nature of your data (especially its complexity, but also the way in which it is used).
Then you have the choice between using a database and using a simple file (for example an XML file). That choice depends primarily on the quantity of data and the transaction throughput.
Since you haven't told us much about the nature of the data or its quantity or the throughput requirements, it's hard to advise you specifically on either question.

What are some cons of storing html in a database for use?

Although it's very easy to do a search about the topic, it's not as easy to come to a conclusion. What are some cons of storing HTML in a database for use?
HTML is static, and querying the data from a database uses database resources; database resources are typically among the more restricted on moderate to heavy use systems, therefore it makes sense to not store HTML in the database, but to place it on the filesystem, where it can be retrieved without using critical resources.
In the broadest sense, HTML is a document markup language and serves to structure data into a document. The database on the other hand should contain raw data organized along its logical relations. Documents use formatting and may present data redundantly, but the true, underlying data is always fixed. Thus you should store the most immediate, raw form of data that you possibly can, and retrieve it in meaningful ways using both the query language itself to create suitable views for your purposes, and other, output-specific data processing to generate documents.
Of course you may like to cache the result of an output formatting operation, and you may choose to store the cache in a database, too. That's fine of course. But concerning the raw payload data, I would always go for the above.
That depends on the use of the HTML in the database. If it's data that you only ever access as a blob (meaning you never/rarely query the contents of the HTML), then I think it can be a good idea in some cases. Then the question is essentially the same as "Should I store files in xyz format in my database?" And the answer to questions like that depends on several things:
How large are the files? Would storing them on the filesystem, with just their filename/path in the DB be more efficient?
Do you need to replicate the data to other servers? If so, then storing raw files in the DB may be easier than on the FS, if you already have DB-sync infrastructure in place.
What are your query uses like? Are they friendlier to a DB or a file system storage?
Now, if you're talking about storing HTML data that you frequently have to query, that changes the game entirely.
Any database normalization nazi would tell you never to do it. But there might be cases when it's useful. For instance, if you're using some sort of full-text searching engine, you may want that in a database--or in whatever form the full-text search engine uses.

Loading XML "Cache" Versus Querying DB. The Drawbacks?

For a read-only application, I am currently storing data in a relational database, but rather than querying it via the app, I am doing a nightly write of the data, including its relationships, to an XML file.
Granted, it is not a lot of data -- the XML represents fewer than 1000 objects.
Then, through client side code, I am loading that data, and "querying" it as necessary.
No write operations are required -- the app's sole function is search and display.
I've developed the app in such a way that whether it queries the db or the loaded xml can be switched very easily, and so I am able to compare performance.
I find that e.g. full text search (as such) is instant, etc, with the loaded XML approach.
However, I know there are drawbacks to this approach, and I would greatly appreciate it if any of you could help me flesh out when and why this is or is not a valid approach.
Thanks in advance.
When you load XML into a good XML processing engine, it constructs appropriate data structures to speed up XPath queries or the tree traversal in general.
When you keep the data in a relational database and query it, the query optimizer builds a query plan which will access data in some optimized way too.
Which method best suits you completely depends on the nature of your queries.
Note that loading an XML document and parsing it on each client call may be quite expensive, and unless you use some kind of an application server which keeps the parsed XML tree in memory, a database query will most probably be a better way since with 1000 records your whole table will fit into the cache.
It sounds valid to me.
The drawbacks would come if the size of the data was prohibitively large and threatened your available memory, or if it was shared in such a way that thread safety was an issue.
But you say it's small and read-only, so it sounds fine to me. Keeping the data close to where it's needed is something that every hardware designer would understand.
You say it's stored as XML, but I assume you read the file once per day, parse and store it in an in-memory DOM object, and query it using XPath. Is the XPath performance adequate for your needs? That would be my only concern.
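A minimal sketch of the "parse once, keep the tree in memory, query with XPath" pattern described in these answers. It assumes server-side Python with the standard ElementTree module (which supports only a limited XPath subset); the file layout, element names, and function names are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Parsed tree plus the file modification time it was built from.
_cache = {"mtime": None, "root": None}

def load_catalog(path):
    # Re-parse only when the file changes, e.g. after the nightly export.
    mtime = os.path.getmtime(path)
    if _cache["mtime"] != mtime:
        _cache["root"] = ET.parse(path).getroot()
        _cache["mtime"] = mtime
    return _cache["root"]

def titles_in_category(path, category):
    root = load_catalog(path)
    # Attribute predicates are part of ElementTree's XPath subset.
    items = root.findall(f".//item[@category='{category}']")
    return [item.findtext("title") for item in items]

# Hypothetical export file standing in for the nightly XML dump.
demo_path = os.path.join(tempfile.mkdtemp(), "export.xml")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write(
        "<catalog>"
        "<item category='book'><title>Dune</title></item>"
        "<item category='film'><title>Alien</title></item>"
        "<item category='book'><title>Emma</title></item>"
        "</catalog>"
    )

books = titles_in_category(demo_path, "book")
```

The parse cost is paid once per file change rather than once per request, which is the condition under which the in-memory approach stays competitive with a database query.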
It all comes down to resource management. If you have the resources to run queries, that is the better road, because your data is "live", as opposed to caching an XML file and then parsing it. As for performance, unless you are querying tens of millions of rows of data I wouldn't be too concerned. We have a box with 60+ clients that run queries constantly all day, and it performs quite well with everything running. XML parsing can be more stressful to the server than a query most of the time.

DB or flat file?

Just one little question.
Should I put the HTML textarea content into a DB or a flat file? The textarea content can be very long (like an article or a project).
Thanks for helping.
Silvio.
Use a TEXT column type in the DB instead of a string/VARCHAR type.
Note: TEXT columns in MySQL are limited to 65,535 bytes (about 64 KB).
EDIT
Or look at MySQL's LONGBLOB or LONGTEXT data types. They can store up to 4 gigabytes of binary or textual data, respectively.
Here are some pros and cons off the top of my head (I've done it both ways with varying levels of success):
Database Pros
Well known model for reading / writing data.
If the rest of your application is based on a database, this solution fits nicely.
The db has concurrency mechanisms already in place.
As long as you back up your database, your documents are backed up too (and in sync with the state of your database).
Database Cons
Theoretically, it is less efficient to pull a file from a database than directly from the file system. The db server has to read from disk, translate into its network protocol, etc.
The serving of these files from the db is a potential scalability bottleneck if only using one database server.
Flat File Pros
Dirt simple operations: Write, Delete, Read.
Presumably low overhead. If serving from web, the server can just be pointed to the files.
Flat File Cons
You have to deal with concurrency operations (what if one user wants to write to the file while another is reading, etc). This may or may not be an issue in your case.
It's one more type of information to back up / keep in sync / maintain.
You have to deal with security of the files as a separate issue from the rest of the data in the database.
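On the flat-file concurrency point: one common way to keep a reader from ever seeing a half-written file is to write to a temporary file and then swap it into place. A minimal sketch, assuming the files live on a local filesystem; the function name and file names are hypothetical.

```python
import os
import tempfile

def atomic_write_text(path, text):
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new content to a temporary file in the same directory...
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())
        # ...then swap it into place in one step, so a reader opening the
        # path gets either the old content or the new, never a mix.
        os.replace(tmp_path, path)
    except Exception:
        # Clean up the temporary file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

# Hypothetical article file standing in for the saved textarea content.
article_path = os.path.join(tempfile.mkdtemp(), "article.txt")
atomic_write_text(article_path, "first draft")
atomic_write_text(article_path, "second draft")
with open(article_path, encoding="utf-8") as f:
    current = f.read()
```

This only protects readers from partial writes. Two users saving at the same time still race, and the last write wins, which is the kind of problem a database's concurrency mechanisms handle for you.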
Database without a doubt.
There's not much point in flat files: when you start expanding, which you will, you have to create more files, and it's easier to lose track.
Start with a database, and when you grow you can do things much faster and make more complex selections using Structured Query Language; the name says it all.