How to load HTML data into SQL Server (non-table format)? - html

I'm posting here because I couldn't find any such scenario on the web so far. I have a webpage that contains a set of reports in both XLS and PDF formats. I need to download the Excel files from the page and load them into my database. I wish I could use the URL of the XLS file directly, but the naming convention may change every time (Sales_Quarter1.xlsx can become Sales_Q1.xlsx the next year). The only thing that stays constant in the following example is "Sales for Calendar Year". I need to look up the file that corresponds to this text and download it before loading it into a database table.
I would like to know from experts whether this is possible.
<li>
<sub>Sales for Calendar Year 2015--All Countries </sub>
<a href="/Data/Downloads/Documents/Sales/Sales_Quarter1.xlsx">
<sub>[XLS]</sub></a><sub> , <sub>[PDF]</sub><sub>​</sub></sub>
</li>
PS: I am using SQL Server 2014.
Thanks!

Have a look at Integration Services. Create a package that pulls the web page with a Script Task, using variables to hold the local filenames of the downloaded HTML file and Excel file (you will also have to parse the link out of the HTML file). Then use an Excel Source next in your package.
The variable for the Excel filename used in the Script Task will need to be set to ReadWrite as well.
You can also schedule the resulting package via a SQL Agent job if you plan to run this on a recurring basis, placing any additional logic into the script or the execution paths.
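The link-parsing step is the part that depends on the constant text rather than the filename. An SSIS Script Task would be written in C# or VB, so the following is only a rough Python sketch of the logic, not SSIS code. It assumes each report sits in its own `<li>` as in the question's snippet; the names `ReportLinkFinder` and `find_excel_links` and the base URL in the usage note are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ReportLinkFinder(HTMLParser):
    """Collect hrefs found inside any <li> whose text contains a given label."""

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.in_item = False
        self.text = []      # text fragments of the current <li>
        self.hrefs = []     # hrefs of <a> tags in the current <li>
        self.matches = []   # hrefs from items whose text contained the label

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_item, self.text, self.hrefs = True, [], []
        elif tag == "a" and self.in_item:
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_data(self, data):
        if self.in_item:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self.in_item:
            if self.label in "".join(self.text):
                self.matches.extend(self.hrefs)
            self.in_item = False


def find_excel_links(html, label, base_url):
    """Return absolute URLs of .xls/.xlsx links listed next to the label."""
    finder = ReportLinkFinder(label)
    finder.feed(html)
    return [urljoin(base_url, href) for href in finder.matches
            if href.lower().endswith((".xls", ".xlsx"))]
```

Calling `find_excel_links(page_html, "Sales for Calendar Year", "http://example.com/reports/")` (a placeholder base URL) would return the absolute URL of the workbook regardless of how the file is named that year. The file could then be fetched with something like `urllib.request.urlretrieve` before the Excel Source reads it.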

Related

SSIS power query source -> connection manager description

Since the Excel source has constant problems with truncating either numbers or text (I can't get it to work properly with mixed data in one column), I figured the Power Query source would be the answer.
I managed to import one file.
Now I am trying to iterate over all files in the folder.
The problem is the description of the connection manager: can I somehow use wildcards for all files? Otherwise it crashes with an error about incorrect credentials.
As for the connection manager itself, there is no problem, since I can use expressions with variables.
As far as I know, the Power Query source is still in preview and is very limited compared with the functions available in, for example, Power BI Desktop.
In your case, build the query in Power BI Desktop: select new source > from Folder, do the transformations, and then copy and paste the code into the SSIS Power Query source. That way you don't have to resort to wildcards in the SSIS flow to iterate over files in the same folder.

SSIS empty Excel columns causing error

Using Microsoft Visual Studio Community 2015.
Goal of project
-create "*\temp\email" directory
-start program to extract all emails that include xls attachments to the previously created folder
-use a Foreach Loop to cycle through each file in the folder, process it, and load it into a SQL table.
The problem I am running into is caused either by a blank Excel document (which is occasionally sent from a remote location) or by some of the original xls reports containing only 5 columns instead of the 6 that I have mapped now. Is there any way to separate files that include the correct columns from those that do not match?
** As long as these two problems do not exist, I can run the SSIS package and everything runs without issue.
Control flow:
File System Task (creates directory) ---> Execute Process Task (xls extraction) --> Foreach Loop (Data Flow Task "email2Sql")
Data flow:
Excel Source (uses expression ExcelFilePath, @[User::filepath]), DelayValidation = True
(Columns are initially set to F1-F6 and are mapped to, for example, a,b,c,d,e,f. The older files that get mixed in only include a,b,c,d,e.) This is where I want to be able to separate the xls files.
Conditional Split transformation (column names are not in row 1; this helps remove "null" values)
OLE DB Destination (SQL table)
Sorry for the amount of reading, but for the first post I tried to include anything that I thought may be relevant.
There are some tools out there which would allow you to open the excel doc and read it. However, I think the simplest thing to do would be to use SSIS out of the box:
1 - Add a File System Task after the Data Flow Task that reads the file.
2 - Set the precedence constraint from the Data Flow Task to the File System Task to "Failure". This causes the File System Task to fire only when the Data Flow Task fails.
3 - Set the File System Task to move the "bad" files to another folder.
This will allow you to loop through all the files and move the failed ones. Ultimately, the package will end in failure. If you don't want that behavior, you can change the ForceExecutionResult property to Success. However, it might be good to know that there were problems with some files so that they can be addressed.
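Outside of SSIS, the same "try to load, move the file on failure" pattern can be sketched in a few lines of Python. This is an illustration of the control flow only: `process_folder` and the `load_file` callback are hypothetical names, and `load_file` stands in for whatever reads the workbook and raises when the columns don't match.

```python
import shutil
from pathlib import Path


def process_folder(folder, bad_folder, load_file):
    """Run load_file on every .xls file; move files that raise into bad_folder.

    Returns the names of the files that failed, so the caller can still
    report that some inputs need attention.
    """
    bad_folder = Path(bad_folder)
    bad_folder.mkdir(parents=True, exist_ok=True)
    failed = []
    # sorted() materializes the listing before any file is moved.
    for path in sorted(Path(folder).glob("*.xls")):
        try:
            load_file(path)  # hypothetical loader; raises on a bad file
        except Exception:
            shutil.move(str(path), str(bad_folder / path.name))
            failed.append(path.name)
    return failed
```

Returning the list of failed names plays the same role as letting the package end in failure: the good files are loaded, and the bad ones are set aside but not silently ignored.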

adding text to an xls file with bash script

I'm trying to understand if it's possible to write to an xls file with a bash script. Situation is outlined below.
I have a cronjob that runs every Monday, generates an xls, and emails it to my client. This xls is filled with data from a MySQL DB. When the report is empty and the client attempts to open it, it shows as corrupt. Originally I addressed this issue by excluding empty files from the email with an if statement. However, the constraint is that all 4 reports must reach the client, empty or not.
So my question is, can I simply add a row of text at the top with a bash script so the file is never "empty"? I'm not an expert in bash scripting by any means, so feedback here would be great. Thanks!
Tony
I'm not aware of any pure bash implementation for writing XLS files. There are solutions in other languages such as Perl, Python, or PHP. If you think outside the box, there is another option available to you. You mentioned that you currently use an if statement to skip attaching empty files. Create a blank spreadsheet in a program like MS Excel, optionally enter some text in A1 like "No records", save it, and transfer it to a known location on the server that runs the cronjob. Rather than skipping the attachment, whenever you detect an empty file in your if statement, just attach the blank "No records" template XLS file. You may need to copy the template to a temporary location before attaching it if you need to rename the file.
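As a rough sketch of the template idea (in Python rather than bash, since the existing script isn't shown), the selection logic could look like this. It assumes "empty" means a missing or zero-byte file, which may not match how your current if statement detects it; `attachment_for` and its parameters are names invented for the example.

```python
import os
import shutil


def attachment_for(report_path, template_path, tmp_dir):
    """Return the file to attach: the report, or a renamed template copy if empty.

    Assumption: an "empty" report is one that is missing or has zero bytes.
    """
    if os.path.exists(report_path) and os.path.getsize(report_path) > 0:
        return report_path
    os.makedirs(tmp_dir, exist_ok=True)
    # Copy the "No records" template under the report's own filename so the
    # client still receives a file with the expected name.
    fallback = os.path.join(tmp_dir, os.path.basename(report_path))
    shutil.copyfile(template_path, fallback)
    return fallback
```

The copy goes to a temporary directory so the template itself is never renamed or overwritten.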

How can I add file locations to a database after they are uploaded using a Perl CGI script?

I have a CGI program I have written using Perl. One of its functions is to upload pics to the server.
All of it is working well, including adding all kinds of info to a MySQL db. My question is: How can I get the uploaded pic files' locations and names added to the db?
I would rather do that than change the script to actually upload the pics to the db. I have heard horror stories of uploading binary files to databases.
Since I am new to all of this, I am at a loss. Have tried doing some research and web searches for 3 weeks now with no luck. Any suggestions or answers would be greatly appreciated. I would really hate to have to manually add all the locations/names to the db.
I am using: a Perl CGI script, MySQL db, Linux server and the files are being uploaded to the server. I AM NOT looking to add the actual files to the db. Just their location(s).
It sounds like you have your method complete where you take the upload, make it a string, and toss it into MySQL, similar to reading a file in as a string. However, since CGI gives you a filehandle rather than a filename to read, you are wondering where that file actually is.
If you're using CGI.pm, the upload and uploadInfo methods, the param for the upload, and the upload private files will help you deal with the upload file sources. Where the files are stashed after the remote client and the CGI are done usually isn't permanent and, at a minimum, is volatile.
You've got a bunch of uploaded files that need to be added to the db? It should be trivial to dash off a one-off script to loop through all the files and insert the details into the DB. If they're all in one spot, then a simple opendir()/readdir() type loop would catch them all; otherwise you can make a list of file paths and loop over that.
If you're talking about recording new uploads on the server, then it would be something along these lines:
user uploads file to server
script extracts any wanted/needed info from the file (name, size, mime-type, checksums, etc...)
start database transaction
insert file info into database
retrieve ID of new record
move uploaded file to final resting place, using the ID as its filename
if everything goes fine, commit the transaction
Using the ID as the filename solves the worries of filename collisions and new uploads overwriting previous ones. And if you store the uploads somewhere outside of the site's webroot, then the only access to the files will be via your scripts, providing you with complete control over downloads.
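The steps above could be sketched as follows. This is Python with the standard-library `sqlite3` module standing in for Perl and MySQL, purely so the example runs on its own; the `uploads` table, its columns, and the `record_upload` name are assumptions made up for the illustration.

```python
import os
import shutil
import sqlite3


def record_upload(conn, uploaded_path, original_name, store_dir):
    """Insert file details, then move the file to store_dir named by the new ID.

    Assumes a table: uploads(id INTEGER PRIMARY KEY, original_name, size).
    """
    os.makedirs(store_dir, exist_ok=True)
    size = os.path.getsize(uploaded_path)
    try:
        # sqlite3 opens a transaction implicitly before the INSERT.
        cur = conn.execute(
            "INSERT INTO uploads (original_name, size) VALUES (?, ?)",
            (original_name, size),
        )
        file_id = cur.lastrowid  # ID of the new record
        final_path = os.path.join(store_dir, str(file_id))
        shutil.move(uploaded_path, final_path)
        conn.commit()  # only reached if the insert and the move both worked
    except Exception:
        conn.rollback()  # leave no record behind for a file that was not stored
        raise
    return file_id, final_path
```

Because the stored filename is just the row ID, the location can always be rebuilt from `store_dir` and the ID, while the original name lives only in the database.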

How to extract data from excel file into the database in Mysql at runtime in asp.net?

I am creating a website, and I want to let users upload an Excel file and then save that Excel data into a MySQL database at runtime.
Kindly help me in performing this task.
you can mail me at...."amiteshsinha09#rediffmail.com"
thank you
Amitesh
You can query the data in the Excel sheet using Open XML.
Using this instead of running Excel via interop is faster, more stable, and saves you licence costs.
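To show why Excel itself isn't needed: an .xlsx file is a zip archive of XML parts, so it can be read with ordinary zip and XML tools. The sketch below is Python using only the standard library, not the .NET Open XML SDK you would use in ASP.NET. It makes several assumptions: the file is .xlsx (not legacy .xls), the sheet lives at `xl/worksheets/sheet1.xml` (common but not guaranteed), and cells are either plain values or shared strings. `read_first_sheet` is a made-up name.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet and shared-string parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_first_sheet(xlsx_file, sheet_part="xl/worksheets/sheet1.xml"):
    """Return the rows of one worksheet as lists of cell strings.

    Empty cells that are absent from the XML are not padded, and inline
    strings and formulas are not handled; this is a sketch, not a full reader.
    """
    with zipfile.ZipFile(xlsx_file) as zf:
        shared = []
        if "xl/sharedStrings.xml" in zf.namelist():
            root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            for si in root.findall("m:si", NS):
                # A shared string may be split across several <t> runs.
                shared.append("".join(t.text or "" for t in si.iter(f"{{{NS['m']}}}t")))
        sheet = ET.fromstring(zf.read(sheet_part))
    rows = []
    for row in sheet.findall("m:sheetData/m:row", NS):
        values = []
        for cell in row.findall("m:c", NS):
            v = cell.find("m:v", NS)
            text = v.text if v is not None and v.text is not None else ""
            if cell.get("t") == "s" and text:
                text = shared[int(text)]  # t="s" means an index into shared strings
            values.append(text)
        rows.append(values)
    return rows
```

Each returned row could then be passed to a parameterized INSERT against the MySQL table.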