Should you zip files when saving blobs to SQL?

I have a JSON file that I want to save as a blob in Microsoft SQL Server.
The pro of zipping is the space saved; the con is the readability that gets lost.
I want to know whether T-SQL has any optimization in which it compresses blobs on its own. I know that some databases work this way, like Vertica (which is columnar) or Postgres (which compresses large values behind the scenes), for example.

I personally would not compress them if I wanted to be able to search them. I do not believe SQL Server compresses a blob on its own. I know for a fact that even very large VARCHAR columns do not compress on their own, so I would not expect a blob to. However, there is built-in compression you can turn on:
https://blogs.msdn.microsoft.com/sqlserverstorageengine/2015/12/08/built-in-functions-for-compressiondecompression-in-sql-server-2016/
https://learn.microsoft.com/en-us/sql/relational-databases/data-compression/enable-compression-on-a-table-or-index?view=sql-server-2017
There are some advantages to it, but usually at the cost of CPU. So if I were you, I'd probably not zip up the files before putting them in SQL, but I might compress the tables I store them in. It would depend on exactly what the data is: JSON probably gets a lot of space back on compression, but a .jpeg would not.
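To illustrate the first link, here is a minimal sketch of the COMPRESS/DECOMPRESS built-ins (SQL Server 2016+); the table and column names are hypothetical:

    -- Hypothetical table; COMPRESS returns GZIP-compressed VARBINARY(MAX).
    CREATE TABLE dbo.JsonDocs (
        Id      INT IDENTITY PRIMARY KEY,
        Payload VARBINARY(MAX) NOT NULL  -- compressed JSON
    );

    INSERT INTO dbo.JsonDocs (Payload)
    VALUES (COMPRESS(N'{"name": "example", "size": 42}'));

    -- DECOMPRESS also returns VARBINARY(MAX), so CAST it back to text to read it.
    SELECT CAST(DECOMPRESS(Payload) AS NVARCHAR(MAX)) AS Json
    FROM dbo.JsonDocs;

Note that this is per-value compression done at insert time, which is different from the row/page table compression described in the second link.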
An option I have used in the past is to simply store my files on a content server somewhere, and store in SQL the metadata about the file (name, tags, path to where I stored it, file extension, etc.). That way my data is easy to get at and put there, and I simply use SQL to look it up. Additionally, when the files were large text files, this allowed me to use Lucene indexes from Solr to build a full-text-searchable solution, since the data wasn't stuffed into a SQL table. Just an idea! :)
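As a rough sketch of that metadata table (all names here are hypothetical; adjust to your needs):

    -- Hypothetical metadata table; the files themselves live on a content server.
    CREATE TABLE dbo.FileMetadata (
        Id            INT IDENTITY PRIMARY KEY,
        FileName      NVARCHAR(255)  NOT NULL,
        FileExtension NVARCHAR(16)   NOT NULL,
        StoragePath   NVARCHAR(1024) NOT NULL,  -- where the file lives
        Tags          NVARCHAR(MAX)  NULL
    );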
One more thought: if I were to store big JSON files in SQL, I would probably choose VARCHAR(MAX) or NVARCHAR(MAX) as my datatype. Anytime I have tried to use TEXT, IMAGE, etc., I would later run into some kind of SQL error when I tried to do a tricky query. Microsoft has deprecated TEXT, NTEXT, and IMAGE in favor of VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX).
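A side benefit of NVARCHAR(MAX), sketched below with hypothetical names, is that on SQL Server 2016+ the stored JSON stays queryable through the built-in JSON functions:

    -- ISJSON/JSON_VALUE require SQL Server 2016+.
    CREATE TABLE dbo.RawJson (
        Id  INT IDENTITY PRIMARY KEY,
        Doc NVARCHAR(MAX) NOT NULL CHECK (ISJSON(Doc) = 1)
    );

    SELECT JSON_VALUE(Doc, '$.name') AS Name
    FROM dbo.RawJson;

This would not work on a compressed blob without DECOMPRESSing it first, which is exactly the searchability trade-off mentioned above.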

Related

How to read a CSV file from a database?

I have uploaded a CSV file into the database as byte[] data, using Hibernate and Spring. I have three columns in my database (id, file name, and byte[] data). I would like to display the third column, which is a CSV file, on a front page. Any help?
From the database perspective, this is nothing more than querying the table by ID and reading the BLOB.
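In SQL terms, under the three-column schema described in the question (table and column names assumed here), it is just:

    -- Hypothetical names matching the question's description.
    SELECT file_data        -- the byte[] column, stored as a BLOB
    FROM uploaded_files
    WHERE id = 42;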
The tricky thing is that unformatted BLOBs/CLOBs are usually read using streams of some sort, and are rarely available as straight columns. It depends on your database implementation; Hibernate can sometimes hide the complexity, but it is something to watch out for.
In terms of displaying it, it's a matter of taking the returned BLOB and throwing it into something that forces it to display. A quick search found this, but that is really the more difficult part of the problem; the database side is fairly boring here.

Can Lucene, Sphinx (or any other engine) index binary data?

I already have a SQL Server 2008 based app in production, where I am using full-text search by storing the binary (along with the file extension). This means the same column can store doc, xls, pdf, docx, etc. I went for that approach (knowing it would be insert-costly) because I have varied files which can be uploaded, and I don't want to run into the madness of converting text from various types of files (xls, xlsx, doc, docx, pdf, etc.). Also, I am not aware of any free tools which can do that for me. I don't want to use the filesystem, as that would be insecure and maintenance would be costly.
Now I am looking at the ease (or difficulty) of moving to MySQL. I do have some options for full-text search in MySQL, for example: MySQL full-text search (which does not index binary), Sphinx, and Solr.
I found this question, which is closest to what I need... although I guess Sphinx doesn't index binary data. However, by using SphinxSE I can query the MySQL tables and Sphinx to get a related result set (in the same connection); I hope that understanding is correct (see the sketch below). But I am not sure of the performance. Can someone add more insight?
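For reference, a SphinxSE query does look roughly like the sketch below (the index name and connection string are placeholders; check the Sphinx documentation for the exact column requirements):

    -- A SphinxSE table is a proxy to a searchd index, not real storage.
    -- Its first three columns must be: id, weight, query (with an index on query).
    CREATE TABLE t1 (
        id     INTEGER UNSIGNED NOT NULL,
        weight INTEGER NOT NULL,
        query  VARCHAR(3072) NOT NULL,
        INDEX(query)
    ) ENGINE=SPHINX CONNECTION="sphinx://localhost:9312/myindex";

    -- Full-text hits from Sphinx joined to regular MySQL rows in one query.
    SELECT d.*, t1.weight
    FROM t1
    JOIN documents d ON d.id = t1.id
    WHERE t1.query = 'search terms;mode=any';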
From what I have heard, integrating Lucene with MySQL is difficult.
My need is to fetch ranked results based on criteria which can be structured (stored in the RDBMS) and unstructured (textual data which shall be indexed).
Also, is there any other option which looks more suitable in my situation?
Have a look at ElasticSearch (it uses Lucene under the hood, like Solr); I think it may do what you require. I haven't needed document indexing myself, though, so I have not tried it.
See here though for more information
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-attachment-type.html
It uses Apache Tika to convert the documents to indexable content (the same way SQL Server does with IFilter plugins).

How to convert data stored in XML files into a relational database (MySQL)?

I have a few XML files containing data for a research project which I need to run some statistics on. The amount of data is close to 100GB.
The structure is not so complex (it could be mapped to perhaps 10 tables in a relational model), and given the nature of the problem, this data will never be updated again; I only need it available somewhere it's easy to run queries against.
I've read about XML databases and the possibility of running XPath-style queries on them, but I have never used them and I'm not so comfortable with the idea. Having the data in a relational database would be my preferred choice.
So, I'm looking for a way to convert the data stored in XML into a relational database (think of a big .sql file similar to the one generated by mysqldump, but anything else would do).
The ultimate goal is to be able to run SQL queries for crunching the data.
After some research I'm almost convinced I have to write it on my own.
But I feel this is a common problem, and therefore there should be a tool which already does that.
So, do you know of any tool that would transform XML data into a relational database?
PS1:
My idea would be something like this (it could work differently, but just to make sure you get my point):
1. Analyse the data structure (based on the XML files themselves, or on an XSD)
2. Build the relational model (tables, keys) based on that structure
3. Generate SQL statements to create the database
4. Generate SQL statements to fill in the data
(See the sketch after this list for the kind of output I mean.)
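A sketch of what such a generated .sql file might contain, for a hypothetical <person> element with name and email children:

    -- Hypothetical output of such a tool.
    CREATE TABLE person (
        id    INT PRIMARY KEY,
        name  VARCHAR(100),
        email VARCHAR(255)
    );

    INSERT INTO person (id, name, email)
    VALUES (1, 'Alice', 'alice@example.org');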
PS2:
I've seen some posts here on SO, but still I couldn't find a solution.
Microsoft's "Xml Bulk Load" tool seems to do something in that direction, but I don't have a MS SQL Server.
Databases are not the only way to search data. I can highly recommend Apache Solr.
Strategy to implement search on XML files:
Keep your raw data as XML and search it using a Solr index.
Importing XML files of the right format into a MySQL database is easy:
https://dev.mysql.com/doc/refman/5.6/en/load-xml.html
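A minimal sketch of what that looks like (the file and table names are made up; LOAD XML is available in MySQL 5.5+):

    CREATE TABLE measurements (
        id    INT PRIMARY KEY,
        name  VARCHAR(100),
        value DOUBLE
    );

    -- Children (or attributes) of each <row> element map to columns by name.
    LOAD XML LOCAL INFILE '/data/measurements.xml'
    INTO TABLE measurements
    ROWS IDENTIFIED BY '<row>';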
This means you typically have to transform your XML data into that kind of format. How you do this depends on the complexity of the transformation, which programming languages you know, and whether you want to use XSLT (which is most probably a good idea).
From your former answers it seems you know Python, so http://xmlsoft.org/XSLT/python.html may be the right thing for you to start with.
Take a look at StAX instead of XSD for analyzing/extracting the data. It's stream-based and can deal with huge XML files.
If you feel comfortable with Perl, I've had pretty good luck with the XML::Twig module for processing really big XML files.
Basically, all you need is to set up a few twig handlers and import your data into MySQL using DBI/DBD::mysql.
There is a pretty good example on xmltwig.org.
If you are comfortable with commercial products, you might want to have a look at Data Wizard for MySQL by the SQL Maestro Group.
This application is targeted especially at exporting and, of course, importing data from/to MySQL databases. This also includes XML import. You can download a 30-day trial to check whether this is what you are looking for.
I have to admit that I have not used their MySQL product line yet, but I had a good user experience with their Firebird Maestro and SQLite Maestro products.

Storing an image in an image column of SQL Server: is it better than storing the image in a folder on the website?

I have to display images on a website. I can store the images in a folder on my website, and I can also store them in an image column of SQL Server.
So which way of storing images is better: in a folder, or in an image column of SQL Server?
1. Which way of storing an image and retrieving it is faster?
With SQL Server 2008, while you can store BLOB data, it's best to avoid it. I've done it in the past, grudgingly, and it did have negative performance implications. Unless you have some constraint which prevents it, use the file system. That's what it's built for, and it's much faster.
As @Martin Smith pointed out, you could use FILESTREAM. We started storing our files using FILESTREAM so that we could also add full-text indexing and allow users to search not only the data, but also the files on our site. It is also nice because we can easily move our files along with the database to other environments (Dev, Test).
A nice FILESTREAM article: Here
Also, please use varbinary(max) if you are going to store the images in the DB. The image data type is deprecated and is going to be removed in a future version.
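A rough sketch combining both points (FILESTREAM requires a FILESTREAM-enabled instance and a FILESTREAM filegroup, and the table needs a ROWGUIDCOL column; all names here are hypothetical):

    CREATE TABLE dbo.SiteImages (
        Id        UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
        FileName  NVARCHAR(255) NOT NULL,
        ImageData VARBINARY(MAX) FILESTREAM NULL  -- bytes stored on the NTFS filesystem
    );

Without FILESTREAM, a plain VARBINARY(MAX) column works the same way but keeps the bytes inside the data file.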
--S

Storing Base64 PNG in MySQL

I am using Sencha Touch to capture data from a user on an iPad. This includes a standard form (name, email, etc.) as well as the customer's signature (see the plugin here).
Essentially, the plugin takes the coordinates from the user's signature and gives me back Base64 PNG data.
Once I have the signature data, I want to store it. My two questions are:
1. Should I store the Base64 data in my (MySQL) database along with the rest of the user's information, or should I create a static file and link as necessary?
2. If storing in the database is the way to go, what data type should I use?
There's no need to Base64-encode the image. MySQL is perfectly capable of storing binary data. Just make sure you use a BLOB field type, and not TEXT. TEXT fields are subject to character-set translation, which could trash your .png data; BLOB fields are not translated.
As well, Base64 encoding increases the size of the data by around 35%, so you'd be wasting a large chunk of space for no benefit whatsoever.
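A minimal sketch of decoding at insert time (the table and column names are made up; FROM_BASE64() needs MySQL 5.6+, and on older versions you would decode in application code instead):

    CREATE TABLE signatures (
        user_id   INT PRIMARY KEY,
        signature BLOB NOT NULL  -- raw PNG bytes, not Base64 text
    );

    -- Decode the plugin's Base64 payload to binary before storing it.
    INSERT INTO signatures (user_id, signature)
    VALUES (1, FROM_BASE64('iVBORw0KGgoAAAANSUhEUg'));  -- a (truncated) Base64 PNG header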
However, it's generally a bad idea to store images in the database. You do have the advantage of the image always being "right there", but it makes for absolutely huge dumps at backup time and all kinds of fun trying to get the image out and displayed in your app/web page.
It's invariably better to store it externally in a file named after the record's primary key, for ease of access/verification.
Just save the files in a BLOB field. Such a PNG file shouldn't be larger than 1 KB if you turn on some optimizations (grayscale or B/W).
Storing files outside the DB seems easy, but there are things to consider:
backup,
additional replication if multi-server,
security - access rights to the files directory, but also to the files themselves,
no transactions - e.g. the DB insert succeeds but the file write fails,
the need to distribute files across multiple directories to avoid large directory listings (depends on filesystem capabilities).
A BLOB will store Base64; it will get you what you need. Storing it in the database gives you built-in relational capabilities that you would have to code yourself if you stored it in a static file. Hope this helps. Good luck, sir.
Edit: Mark's right about binary vs. Base64.
Set your field to the BLOB data type; it stores a base64EncodedString perfectly.