How to retrieve original PDF stored as MySQL mediumblob?

A table containing almost four thousand records includes a mediumblob field for each record that contains the record's associated PDF report. Under both MySQL Workbench and phpMyAdmin the relevant DOCUMENT column displays the data as a BLOB button or link. In the case of phpMyAdmin the link also indicates the size of the data the Blob contains.
The issue is that when the Blob button/link is clicked, MySQL Workbench's SQL Editor only displays the raw Blob data, while in phpMyAdmin the link only allows the Blob data to be saved as a .bin file instead of displaying or saving it as a viewable PDF file. All previous attempts to retrieve the original PDFs using PHP have failed - see the related earlier thread: Extract Pdf from MySql Dump Saved as Text.
The filename field in the table shows that all the stored files are PDF files. Further research and tests indicate that the mediumblob data has been stored as application/octet-stream.
My question is how can the original PDFs be retrieved as readable PDFs? Is it possible for a .bin file saved from the database to be converted or used to recover the original PDF file?
Any assistance would be greatly appreciated.
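For reference, this is roughly the sort of PHP I have been using to pull the raw blobs out for inspection. It only writes the bytes out exactly as stored; the table name, credentials and output folder are placeholders, while DOCUMENT and filename are the columns described above:

<?php
// Sketch only: dump each DOCUMENT blob to disk exactly as stored so the raw
// bytes can be examined. Table name, credentials and output folder are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=reports_db;charset=utf8mb4', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

if (!is_dir('blob_dump')) {
    mkdir('blob_dump');
}

$stmt = $pdo->query('SELECT filename, DOCUMENT FROM reports');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // No character-set conversion or escaping; just the blob bytes as-is.
    file_put_contents('blob_dump/' . basename($row['filename']), $row['DOCUMENT']);
}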

In line with my assumption and Isaac's suggestion, the only solution was to speak to one of the software developers. It transpires that the documents were zipped using a third-party library, and the header removed, before being stored in the database.
The third-party library used is version 2.0.50727 of Chilkat, available from www.chilkatsoft.com. That version no longer appears to be available, but hopefully at least one of the later versions may do the job.
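I have no confirmation yet of exactly what the library produced or which header was stripped, but before buying a licence it is easy to try PHP's built-in zlib variants against one of the dumped blobs. This is purely a speculative sketch and the file name is a placeholder:

<?php
// Speculative sketch: try the common deflate/zlib/gzip variants on the raw
// stored bytes and keep whichever result looks like a PDF. There is no
// guarantee any of these matches what the Chilkat library actually produced.
$raw = file_get_contents('blob_dump/example.bin');

$candidates = [
    'raw deflate' => @gzinflate($raw),
    'zlib'        => @gzuncompress($raw),
    'gzip'        => @gzdecode($raw),
];

foreach ($candidates as $label => $data) {
    if ($data !== false && substr((string) $data, 0, 4) === '%PDF') {
        file_put_contents('recovered.pdf', $data);
        echo "Recovered a PDF using $label decompression\n";
        break;
    }
}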
Thanks again for everyone's input and assistance.

Based on the discussion in the comments, it sounds like you'll need to either refer to the original source code or consult with the original developer to determine exactly how the data was stored.
Using phpMyAdmin to download the mediumblob data as a file will produce a .bin file in many cases. I don't actually recall how it determines the content type (a PNG file, for instance, will download with a .png extension, but most other binary files, PDFs included, simply download as .bin when phpMyAdmin isn't sure what the extension should be). So the behavior you're seeing from phpMyAdmin is expected and correct, but since the .bin file doesn't work when it's renamed to .pdf, something has probably gone wrong with the import and upload.
BLOB data is often stored in a pretty standardized way, but it seems your data doesn't follow that method.
Without seeing the code directly, we can't tell exactly what happened when the data was stored and would only be guessing.
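One quick check that doesn't need any guessing: look at the first few bytes of one of the downloaded .bin files and see what they actually are. A rough sketch (the file name is a placeholder):

<?php
// Sketch: identify a downloaded .bin file by its leading "magic" bytes
// instead of its extension. The file name is a placeholder.
$bytes = file_get_contents('download.bin', false, null, 0, 8);

if (substr($bytes, 0, 4) === '%PDF') {
    echo "Looks like a PDF; renaming to .pdf should be enough\n";
} elseif (substr($bytes, 0, 2) === 'PK') {
    echo "Looks like ZIP data (or an OOXML/ODF container)\n";
} elseif (substr($bytes, 0, 2) === "\x1f\x8b") {
    echo "Looks like gzip-compressed data\n";
} else {
    echo 'Unrecognised header: ' . bin2hex($bytes) . "\n";
}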

Related

Azure Synapse Dedicated Pool COPY INTO function fails due to base64 encode image in CSV file

I am using Azure Synapse Link for Dynamics 365. It automatically exports data from Dynamics 365 in CSV format into blob storage/data lake. I use the COPY INTO function to load the data into a Dedicated Pool instance. However, the contact model has recently started failing.
I investigated the issue and found that the cause was a field containing an image encoded as base64 text. I only copy selected fields from the CSV files and this is not one of them, but it still causes the copy to fail. I manually updated the CSV file to exclude this data from the one row where it was found and it then worked fine.
The error message associated with the error is:
The column is too long in the data file for row 1328, column 32.
This is supposed to be an automated process so I do not want to be manually editing CSV files when this occurs. Are there any parameters that I can add to the COPY INTO function to prevent this error? I tried using MAXERRORS but that made no difference.
The only other thing that I could think of is to write a script (maybe an Azure Function?) that checks the file for this issue and corrects it. Maybe there is a simpler approach though?
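If it does come down to a script, the logic itself is simple: blank out any oversized field before COPY INTO runs. A rough sketch of the idea (shown in PHP only for illustration; the file names and the 8000-character threshold are assumptions):

<?php
// Sketch of the pre-processing idea: blank out any field longer than a
// threshold so the row no longer breaks COPY INTO. File names and the
// 8000-character limit are assumptions for illustration.
$maxLen = 8000;
$in  = fopen('contact.csv', 'r');
$out = fopen('contact_clean.csv', 'w');

while (($row = fgetcsv($in)) !== false) {
    foreach ($row as $i => $field) {
        if (strlen((string) $field) > $maxLen) {
            $row[$i] = '';   // drop the oversized (base64 image) value
        }
    }
    fputcsv($out, $row);
}

fclose($in);
fclose($out);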

Tableau isn't converting my csv data source to tables

When I import a csv into Tableau, it keeps the same format as the original csv file (a single column with every label in it). How can I make Tableau separate the columns based on the commas?
I can't see why this is happening, since in every tutorial I checked Tableau already converts the .csv to a tabular format.
Here's what I get
Note: I'm using Tableau's trial version.
Sometimes when you open a csv in Excel it can mess with the formatting like your image shows. If you think you opened it in Excel before connecting, try downloading your source again and connecting first with Tableau. If that doesn't work, I happen to have this dataset in a .tde if you would like to use that. vgsales.tde
Edit: Thinking regional settings might be a factor.
Click the dropdown on the right of the data source. Select Text File Properties
To get this window:
Can you match these settings?

Creating a CSV file with the Report Generation Toolkit in Labview

I want to create .csv files with the Report Generation Toolkit in Labview.
They must actually be .csv files which can be opened with Notepad or something similar.
Creating a .csv is not that hard, it's just a matter of adding the extension to the file name that's going to be created.
If I create a .csv file this way it opens nicely in Excel just the way it should, but if I open it in Notepad it shows all kinds of characters and doesn't even come close to the data I wrote to the file.
I create the files with the Labview code below:
Link to image (can't post the image yet because I've got too few points)
I know .csv files can be created with the Write to Spreadsheet VI but I would like to use the Report Generation Toolkit because it's pretty easy to add columns and rows to the file and that is something I really need.
You can use the Robust CSV package on the lavag.org forum to read and write 2D arrays to CSV files.
http://lavag.org/files/file/239-robust-csv/
Calling a file "csv" does not make it a CSV file. I never used the toolkit to generate an Excel file, but I'm assuming it creates an XLS or XLSX file, regardless of what extension you give it, which is why you're seeing gibberish (probably XLS, since it's been around for a while and I believe XLSX is XML, not binary).
I'm not sure what your problem is with the write spreadsheet VI. It has an append input, so I assume you can use that to at least add rows directly to a file, although I can't say I ever tried it. I would prefer handling all the data in memory explicitly, where you can easily use the array functions to add rows or columns to the array and then overwrite the entire file.

Streaming CSV to browser

I'm busy building a website for a client using classic ASP (it will reside on an old server) which is going to be used internally only.
The admin is able to view a paginated table of data and export it to CSV. This works fine when I save the CSV data to a CSV file, but I have now been asked to avoid creating the file if possible and to generate the CSV in memory instead.
I have my doubts that this is possible but I might be completely wrong. Is there any way to send the CSV data to the browser so that it will open in Excel, rather than having to create a CSV file and link to it as I am currently doing?
TIA
John
Response.ContentType = "text/csv" will help you here. In the past I've paired that with a rewrite rule so that the URL is something like foo.com/example.csv but there are plenty of other ideas to be found in the following question: Response Content type as CSV
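The question is classic ASP, but just to illustrate the shape of the response involved, here is the same pattern sketched in PHP; in ASP the equivalent is setting Response.ContentType (plus a Content-Disposition header) and then writing the rows straight to the response:

<?php
// Sketch only (shown in PHP, not ASP): send CSV straight to the browser
// without creating a file on disk. The column names and rows are made up.
header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="export.csv"');

$out = fopen('php://output', 'w');   // write directly into the response body
fputcsv($out, ['id', 'name', 'amount']);
fputcsv($out, ['1', 'Example', '9.99']);
fclose($out);
exit;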

MS Office no longer works as BLOB

Hi, does anyone know why MS Office files such as .doc, .docx and .xls can no longer be viewed when retrieved from a MySQL database where they are stored as BLOBs?
The .doc and .docx files used to download and open without any problem, but now the file format is no longer recognised.
I'd like to ditto your problem. Images and plain text files upload/download fine from a MySQL BLOB field, but .doc and .docx files seem to be corrupted. I've read a rumour somewhere of MySQL truncating the last 4 bits, but I can't verify that.
I have used xvi32 (a hex editor) to compare local originals of files with versions downloaded from BLOB/LONGBLOB fields. It seems that extra bytes, which I think represent a CRLF, are appended; as far as I can work out this happens when Windows writes the file. This doesn't seem to be a problem for some graphic formats, which are to some extent fault-tolerant, but the Office XML format files are corrupted by this extra data.
I have tried using ob_clean() and ob_flush() [that is, in PHP] before printing/echoing the file contents, but the files are still corrupted as far as Office is concerned.
I know this is an old thread but I would appreciate any solutions anyone might have found since it was last updated.
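For reference, the sort of download script where those stray bytes usually creep in (or can be kept out) looks roughly like this; table and column names are placeholders, and the key points are that nothing is output before the headers and the script exits straight after the blob:

<?php
// Sketch: send a BLOB to the browser without any stray bytes before or after it.
// Table and column names are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'password');
$stmt = $pdo->prepare('SELECT filename, data FROM documents WHERE id = ?');
$stmt->execute([(int) $_GET['id']]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);

// Throw away anything already buffered (BOMs, whitespace from includes, etc.).
while (ob_get_level() > 0) {
    ob_end_clean();
}

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . $row['filename'] . '"');
header('Content-Length: ' . strlen($row['data']));
echo $row['data'];
exit; // nothing after this, so no trailing newline from the script can be appended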
Did you try with a short txt file instead of .doc and see if the contents are different than what you expected?