Data dump filetype for not-yet-existent SQL database - mysql

A friend wants to start scraping data for a data-heavy site he wants me to try to build. I'm a (relatively new) Rails developer and don't know much about the data side of all this. If he's contracting out the scraping, any idea what sort of format I can/should get the data in so that it's easy to import into a PostgreSQL database once I get the site started up?
Hope this isn't too vague a question. I don't know where to start looking for this.

The CSV file format is compatible with almost any database system, and it is quite a good starting point. Even if you change your mind later about which database system you'll use, you won't have to worry too much about changing the format.
If you're thinking about data mining, a NoSQL database system (MongoDB, CouchDB, etc.) may be a better fit; in that case, JSON works well as the file format.
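For the PostgreSQL side mentioned in the question, a CSV dump can usually be loaded in a single step with COPY. Here is a minimal sketch in Python using psycopg2; the table, columns, file name and connection details are all placeholders, not anything from the actual project:

    import psycopg2

    # Hypothetical schema and credentials; adjust to the real database.
    conn = psycopg2.connect(dbname="scrape_db", user="app", password="secret", host="localhost")
    cur = conn.cursor()

    cur.execute("""
        CREATE TABLE IF NOT EXISTS listings (
            id    serial PRIMARY KEY,
            name  text,
            price numeric
        )
    """)

    # Stream the CSV straight into the table; expects a header row in the file.
    with open("listings.csv") as f:
        cur.copy_expert("COPY listings (name, price) FROM STDIN WITH CSV HEADER", f)

    conn.commit()
    conn.close()

In a Rails app the same file could also be walked row by row with ActiveRecord, but for a large one-off import the COPY route is usually much faster.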

Related

Ruby on Rails - Database or Excel

I am currently doing a project in Ruby on Rails and I have been presented with a dilemma.
The users of my system will be uploading an Excel spreadsheet. The issue is whether I should read straight from this spreadsheet into my front end, or load it into my MySQL database first and serve the front end from there.
I have asked numerous people about this and have researched online to no avail.
Any help would be much appreciated.
The Excel file is not a database. If you need to allow it as source input, parse it, copy the data into a real database and connect to it.
The database is more flexible and efficient for querying and processing information.
I can think of two benefits, or rather options, of having them upload the Excel spreadsheet for processing by your back end.
1) Tracking purposes: you keep a record of who sent what and what the back end did with it. Consider that other formats or versions could be introduced later; it may be important to keep the original files in order to identify what went wrong, or to work out how to handle a new format.
2) The front-end route, on the other hand, offloads processing from the back end. But that means the browser app could get fairly complex, and depending on your spreadsheet, if it has many relationships, sending that data up to the server could get complicated. If it is simply a flat spreadsheet, say plain rows without totals or tax calculations, then loading it in the browser and sending those rows up to the server might be an advantage, if offloading processing matters at all.
However, point 2 is really diluted by point 1, which to me is more important for future migration of this service. So I personally would choose uploading the file and processing it on the back end.
Update
As you clarified in the comments, you are asking about using Excel on the back end as a database. In that case I agree with Simone Carletti's answer here: a real database gives you much more flexibility, more tools and more performance. With Excel you are stuck loading a file, parsing it into some structure and saving it back, whereas a database (MySQL, MongoDB, ...) gives you far more power for structuring and querying the data, with only the overhead of managing database connections. You might want to write a small sample both ways to evaluate; the database solution will probably win you over.
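As a rough illustration of the "parse it on the back end and copy it into a real database" approach, here is a sketch in Python (the Rails equivalent would use a gem such as roo, but the shape is the same). The .xlsx layout, table and column names are invented for the example:

    import openpyxl
    import mysql.connector

    conn = mysql.connector.connect(user="app", password="secret", database="school")
    cur = conn.cursor()

    # Assumes a sheet with a header row followed by name, grade, year columns.
    wb = openpyxl.load_workbook("upload.xlsx", read_only=True)
    ws = wb.active

    rows = []
    for name, grade, year in ws.iter_rows(min_row=2, values_only=True):
        rows.append((name, grade, year))

    cur.executemany("INSERT INTO students (name, grade, year) VALUES (%s, %s, %s)", rows)
    conn.commit()
    conn.close()

Once the data is in MySQL, the front end only ever talks to the database, and the uploaded file can be archived for the tracking purposes described in point 1.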

Best file format to import data into remote MySQL from local software

Let me clarify what I am looking for. I am building a dynamic website for a school on a PHP/MySQL platform, and it will need certain data to be updated from the offline software that the school uses locally. I am in touch with the software provider, who is ready to supply the data in whatever format I require.
So what would be the best format: the easiest, fastest and most reliable? Would it be CSV, Excel, or something else? Mostly I will need the student details, grades and so on from the local database. What else will I need to take care of, in case any of you have done a similar project?
Looking forward to replies.
CSV is the easiest to handle. If you have access to the database server itself, you don't even need to process it in PHP: get the file into a format MySQL understands and load it with LOAD DATA INFILE.
Excel is not at all easy to import and can be mangled in all kinds of ways that aren't obvious. It's best avoided unless you absolutely need to support it.
If you're building a new application, I really hope you use a popular PHP framework as a foundation. Some of these include modules for handling CSV data, saving you the trouble of having to do that yourself.
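A sketch of the LOAD DATA INFILE route. The statement itself can be run from the mysql client; the Python wrapper below is only for illustration (the same SQL works from PHP's PDO or mysqli). The table, columns and server-side file path are assumptions:

    import mysql.connector

    conn = mysql.connector.connect(user="app", password="secret", database="school")
    cur = conn.cursor()

    # The CSV must already be on the database server, in a directory mysqld may read
    # (see the secure_file_priv setting), and the account needs the FILE privilege.
    cur.execute("""
        LOAD DATA INFILE '/var/lib/mysql-files/students.csv'
        INTO TABLE students
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES
        (name, grade, year)
    """)

    conn.commit()
    conn.close()

If the file cannot be placed on the server, LOAD DATA LOCAL INFILE reads it from the client side instead, provided local_infile is enabled on both ends.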

Database development questions MySQL

I need help from you experts about practices regarding database development. I have a few questions regarding MySQL databases:
Is there a way to define a database and its structure in an XML format and then convert that into a fully functional MySQL database?
Is it possible to generate the XML source file from question 1 (see above) based on an existing MySQL database?
As far as I know, XML is not considered suitable for developing database structures. But since XML is a language for describing hierarchical structures, and a MySQL database also exhibits a hierarchical structure, can we say that it is in fact suitable for database development?
Thank you very much!
You can certainly store XML data in MySQL, and there are any number of approaches for converting hierarchical XML data into individual relational database fields.
I would say, however, that if you just want to work with intact XML documents, you might look at going the NoSQL route, which is better suited to that type of data storage. You might also consider JSON as the storage format, since it is more concise (saving space and transmission bandwidth) and more aligned with the popular NoSQL data stores out there.
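If the "intact documents" route is taken, the NoSQL side can be as small as turning each XML document into a dict and storing it. A minimal sketch with pymongo; the file, database and collection names are made up, and the converter deliberately ignores attributes and repeated tags:

    import xml.etree.ElementTree as ET
    from pymongo import MongoClient

    def element_to_dict(elem):
        # Naive conversion: drops attributes, keeps only the last of repeated tags.
        children = list(elem)
        if not children:
            return elem.text
        return {child.tag: element_to_dict(child) for child in children}

    root = ET.parse("record.xml").getroot()
    doc = {root.tag: element_to_dict(root)}

    client = MongoClient("mongodb://localhost:27017")
    client.mydb.records.insert_one(doc)  # stored server-side as BSON (JSON-like)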
1) Yeah, there is a way, but you should check out MongoDB if you want a dynamic database structure; it was developed with that in mind. Also, unless you need the RSS features of XML or something similar, you might want to consider using JSON as the format for your documents.
2) JSON and MongoDB work very well together for quickly and easily getting documents in and out of the DB. You can technically do it in MySQL as well, but you might spend more time scripting in PHP or Ruby to get the format you want.
3) You could use XML to describe your DB structure because of its loose structure, but I'm not sure it would be intuitively clear to others. Hard to say; it really depends on how you implement it and how complicated your DB structure is going to be.
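On question 2 specifically: mysqldump can already emit an XML representation of an existing database (schema plus data), which covers the "generate the XML from an existing MySQL database" direction. A small sketch; the database name and credentials are placeholders:

    import subprocess

    # Dump an existing database as XML using mysqldump's --xml flag.
    with open("school.xml", "w") as out:
        subprocess.run(
            ["mysqldump", "--xml", "--user=app", "--password=secret", "school"],
            stdout=out,
            check=True,
        )

Going the other way, from XML back into tables, is what LOAD XML and the approaches in the next question are about.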

How to convert data stored in XML files into a relational database (MySQL)?

I have a few XML files containing data for a research project which I need to run some statistics on. The amount of data is close to 100GB.
The structure is not very complex (it could be mapped to perhaps 10 tables in a relational model), and given the nature of the problem, this data will never be updated again; I only need it available somewhere it's easy to run queries against.
I've read about XML databases and the possibility of running XPath-style queries on them, but I've never used them and I'm not so comfortable with the idea. Having the data in a relational database would be my preferred choice.
So, I'm looking for a way to convert the data stored in XML into a relational database (think of a big .sql file similar to the one generated by mysqldump, but anything else would do).
The ultimate goal is to be able to run SQL queries for crunching the data.
After some research I'm almost convinced I have to write it on my own.
But I feel this is a common problem, and therefore there should be a tool which already does that.
So, do you know of any tool that would transform XML data into a relational database?
PS1:
My idea would be something like (it can work differently, but just to make sure you get my point):
Analyse the data structure (based on the XML files themselves, or on an XSD)
Build the relational database (tables, keys) based on that structure
Generate SQL statements to create the database
Generate SQL statements to fill in the data
PS2:
I've seen some posts here in SO but still I couldn't find a solution.
Microsoft's "Xml Bulk Load" tool seems to do something in that direction, but I don't have a MS SQL Server.
Databases are not the only way to search data. I can highly recommend Apache Solr
Strategies to Implement search on XML file
Keep your raw data as XML and search it using the Solr index
Importing XML files of the right format into a MySQL database is easy:
https://dev.mysql.com/doc/refman/5.6/en/load-xml.html
This means you typically have to transform your XML data into that kind of format. How you do this depends on the complexity of the transformation, which programming languages you know, and whether you want to use XSLT (which is most probably a good idea).
From your former answers it seems you know Python, so http://xmlsoft.org/XSLT/python.html may be the right thing for you to start with.
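To make "that kind of format" concrete: LOAD XML can consume <row> elements whose child tags match the target table's columns, so the transformation step only has to reshape the source records accordingly. A small Python sketch of that reshaping (the source element name, the columns and the table are assumptions; an XSLT stylesheet would do the same job declaratively):

    import xml.etree.ElementTree as ET

    # Reshape source records into <row> elements that LOAD XML understands.
    src = ET.parse("source.xml").getroot()
    out = ET.Element("resultset")

    for record in src.iter("measurement"):                # assumed source element name
        row = ET.SubElement(out, "row")
        for col in ("sample_id", "reading", "taken_at"):  # assumed target columns
            ET.SubElement(row, col).text = record.findtext(col, default="")

    ET.ElementTree(out).write("rows.xml", encoding="utf-8", xml_declaration=True)

    # Then, in MySQL:
    #   LOAD XML INFILE '/var/lib/mysql-files/rows.xml'
    #   INTO TABLE measurements ROWS IDENTIFIED BY '<row>';

For 100GB of input this whole-tree approach would need to be done file by file or replaced with a streaming parser, as the next answers suggest.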
Take a look at StAX instead of XSD for analyzing and extracting the data. It's stream-based and can deal with huge XML files.
If you feel comfortable with Perl, I've had pretty good luck with the XML::Twig module for processing really big XML files.
Basically, all you need is to set up a few twig handlers and import your data into MySQL using DBI/DBD::mysql.
There is a pretty good example on xmltwig.org.
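The key point of the StAX/XML::Twig suggestions is streaming: handle one record at a time and throw it away, so 100GB never has to fit in memory. Since Python has already come up in this thread, here is a comparable sketch using the standard library's iterparse with batched inserts; the element name, table and columns are placeholders, not taken from the real data:

    import xml.etree.ElementTree as ET
    import mysql.connector

    conn = mysql.connector.connect(user="app", password="secret", database="research")
    cur = conn.cursor()
    insert_sql = "INSERT INTO measurements (sample_id, reading, taken_at) VALUES (%s, %s, %s)"

    batch = []
    for _, elem in ET.iterparse("data.xml", events=("end",)):
        if elem.tag != "measurement":                    # assumed record element
            continue
        batch.append((elem.findtext("sample_id"),
                      elem.findtext("reading"),
                      elem.findtext("taken_at")))
        elem.clear()                                     # discard the parsed record to keep memory low
        if len(batch) >= 1000:                           # insert in chunks, not row by row
            cur.executemany(insert_sql, batch)
            conn.commit()
            batch.clear()

    if batch:
        cur.executemany(insert_sql, batch)
        conn.commit()
    conn.close()

The same structure maps almost one to one onto XML::Twig's twig_handlers with DBI on the Perl side.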
If you are comfortable with commercial products, you might want to have a look at Data Wizard for MySQL by the SQL Maestro Group.
This application is targeted specifically at exporting and, of course, importing data from/to MySQL databases, and that includes XML import. You can download a 30-day trial to check whether it is what you are looking for.
I have to admit that I have not used their MySQL product line yet, but I had a good user experience with their Firebird Maestro and SQLite Maestro products.

FileMaker Pro export and import to MySQL via PHP

Could anyone advise me or direct me to a site that explains the best way to go about this? I'm sure I could figure it out with a lot of time invested, but I'm just looking for a jump start. I don't want to use the migration tool either, as I just want to put FMP XML files on the server and have it create new MySQL databases based on the FMPXML results provided.
Thanks.
Technically you can write an XSLT to transform the XML files into SQL. It's pretty much straightforward for data (except data in container fields), and with some effort you can even transfer the schema from DDR reports (though I doubt it's worth it for a single project).
Which version of MySQL? Version 6 has LOAD XML (the statement later shipped in MySQL 5.5 as well), which will make things easy for you.
If not, then you are dealing with stored procedures, which can be a pain. If you need v5, it might make sense to install MySQL 6, get the data in there using LOAD XML, then do a mysqldump, which you can import into v5.
Here is a good link:
http://dev.mysql.com/tech-resources/articles/xml-in-mysql5.1-6.0.html