Copy/Move files in PDI / Spoon yields 'is not a file' error - pdi

I am trying to automate weekly generation of a database. As a first step in this process, I need to obtain a set of files from network location M:\. The process is as follows:
Delete any possibly remaining old source files from my local folder (REMOVE_OLD_FILES).
Obtain the names of the required files using regular expressions (GET_FILES).
Copy the files from the network location to my local folder for further processing (COPY/MOVE FILES)
Step 3 is where I run into trouble, I frequently receive the below error:
Error processing files. Exception : org.apache.commons.vfs.FileNotFoundException: Could not read from "file:///M:/FILESOURCE/FILENAME.zip" because it is a not a file.
However, when I manually locatae the 'erroneous' file on the network location and try to open or copy it, there are no problems. If I then re-run the Spoon job, no errors occur for this file (although the next file might lead to an error).
So far, I have verified that steps 1 and 2 run correctly: more specifically, there are no errors in the file names returned from step 2.
Obviously, I would prefer not having to manually open all the files first to ensure that Spoon can correctly copy them. Does anyone have an idea what might be causing this behaviour?
For completeness, below are the parameters selected in the COPY/MOVE FILES step.

I was facing same issue with different clients and finally i tried with some basic approach and it got resolved. It might help in your case as well.
Also, other users can follow this rule.
Just try this: Create all required folder with Spoon Job "Create a Folder" and inactive/delete those hops from your job or transformation once your folders are created.
This is because, user you are using to delete the file/s is not recognized as Windows User. Once your folder is in place you can remove "Create a Folder" steps from your Job.

The path to the file is wrong. If you are running spoon in a Windows environment you should use the Windows format for filepaths. Try changing from
"file:///M:/FILESOURCE/FILENAME.zip"
To
"M:\FILESOURCE\FILENAME.zip"
By the way, it will only work if M: is an actual drive in the machine. If you want to access a file in the network you should use the network path to the shared folder, this way:
"\\MachineName\M$\FILESOURCE\FILENAME.zip"
or
"\\MachineName\FILESOURCE\FILENAME.zip"
If you try to access a file in a network mounted drive it won't work.

Related

Managing a large SPSS (*.sav) file (4.2 GB)

I have received an SPSS file from survey fielded by another company that allegedly only contains ~1500 respondents, but the file size somehow has ballooned 4.2GB. My hunch is that the reason for this is that the file was from a global survey and the 1500 records that have been selected are from the US only so there are a series of blank variables, metadata for those variables that are included in this file and may also be in multiple languages/alphabets.
I only need a subset of this data, and can likely work with it if I removed the metadata but my issue has been that I can't get the damn thing open to cut down on the number of variables. I have been using the tools at my disposal to try the following workarounds, though I'm sure there are better options:
Opening the file using PSPP (freeware SPSS) - this causes the PSPP to stop responding
Using the R command read.spss (from the foreign package) to write a .csv - this claims that the file has a duplicate variable name and won't proceed further
Using the R command spss.system.file to write a .csv - when I tried this, R has spend a lot of time thinking as it as it attempts to run this and has been running for a couple hours with no apparent success.
Using the PSPP text conversion tool (https://pspp.benpfaff.org/) to create either a dictionary or a .csv file - both of these options crash after the file has completed uploading.
I've gone back to the other company to try have them work on reducing the file size, however I wasn't sure if anyone else had any ideas to do either of the following:
Open the file using another program/converter that could turn it into a .csv or other similarly skinny file format
Use another program to at least read only the variable names included in the file so that I can provide the other company with the specific variables I need
The following command from PSPP should do what you need:
$ pspp-convert originalFile.sav output.csv
In case it doesn't, please provide terminal error message.

The connection "C:\\<path>\\*.txt" is not found. This error is thrown by Connections collection when the specific conn element is not found

I developed a SSIS package that creates several .txt files. These files are zipped and then the .txt files need to be removed. Using a foreach file enumerator, I loop through all the .txt files for a specific folder. The folder is retrieved from a variable in configuration and looks something like: C:\Folder\
The foreach loop uses: *.txt to gather all .txt files, does not traverse subfolder and uses the full qualified name.
In the Variable Mappings the "FileName" variable gets filled with the 0 index.
Within the foreachloop I use a File system task.
This task removes the .txt files which are generated before, using the FileName variable that is filled in the loop.
On the development machine this runs like a charm. All greens, no problem at all. Now I copy the package and the configuration file to the test environment. A basic version without the file removing was running perfectly fine here. I replaced the package. Nothing big.
Now I run the SQl Server Agent Job and it starts running. I can see all the text files appearing, and disappearing after it created the zipfiles. However, when all files are removed the package results with errors. Namely the error shown above in the title.
I tried looking for the connectionmanager that might have been removed
Looked for connection managers named in the config that don't exist in the package.
No such thing found. Annoying part is that the package is fully functioning, but still results with the error.
EDIT: I noticed that if I run the package using the execute package utility with the dev. config it gives the same errors.
Hopefully someone is able to help me out.
Thanks in advance!
I managed to "fix" the issue. Remove the File System Component responsible for deleting the files. Then add it again and configure it again.
I think this happens if you accidentally change General parameters before changing the Operation parameter. It holds the metadata to irrelevant parameters and upon execution says: "Wait, you defined this parameter but I don't need it, but I'm checking for it anyway, and it's not there!"
It's a bug for sure

Zip the contents of a folder in SSIS

I am trying to zip the contents of a Folder in SSIS, there are files and folders in the source folder and I need to zip them all individually. I can get the files to zip fine my problem is the folders.
I have to use 7.zip to create the zipped packages.
Can anyone point me to a good tutorial. I haven't been able to implement any of the samples that I have found.
Thanks
This is how I have configured it.
Its easy to configure but the trick is in constructing the Arguments. Though you see the Arguments as static in the screenshot, its actually coming from a variable and that variable is set in the Arguments expression of Execute Process Task.
I presume you will have this Execute Process task in a For Each File Ennumerator with Traverse SubFolders checked.
Once you have this basic setup in place, all you need to do is work on building the arguments to do the zipping, how you want them. A good place to find all the command line arguments is here.
Finally, the only issue I ran into was not providing a working directory in the command line arguments for 7zip. The package used to run fine on my dev environment but used to fail when running on the server via a SQL job. This was because 7zip didn't have access to the 'Temp' folder on the SQL Server, which it uses by default as the 'working directory'. I got round this problem by specifying the 'working directory as follows at the end of the command line arguments, using the -ws switch:
For e.g:
a -t7z DestinationFile.7z SourceFile -wS:YourTempDirectoryToWhichTheSQLAgentHasRights

How can I add file locations to a database after they are uploaded using a Perl CGI script?

I have a CGI program I have written using Perl. One of its functions is to upload pics to the server.
All of it is working well, including adding all kinds of info to a MySQL db. My question is: How can I get the uploaded pic files location and names added to the db?
I would rather that instead of changing the script to actually upload the pics to the db. I have heard horror stories of uploading binary files to databases.
Since I am new to all of this, I am at a loss. Have tried doing some research and web searches for 3 weeks now with no luck. Any suggestions or answers would be greatly appreciated. I would really hate to have to manually add all the locations/names to the db.
I am using: a Perl CGI script, MySQL db, Linux server and the files are being uploaded to the server. I AM NOT looking to add the actual files to the db. Just their location(s).
It sounds like you have your method complete where you take the upload, make it a string and toss it unto mysql similar to reading file in as a string. However since your given a filehandle versus a filename to read by CGI. You are wondering where that file actually is.
If your using CGI.pm, the upload, uploadInfo, the param for the upload, and upload private files will help you deal with the upload file sources. Where they are stashed after the remote client and the CGI are done isn't permanent usually and a minimum is volatile.
You've got a bunch of uploaded files that need to be added to the db? Should be trivial to dash off a one-off script to loop through all the files and insert the details into the DB. If they're all in one spot, then a simple opendir()/readdir() type loop would catch them all, otherwise you can make a list of file paths to loop over and loop over that.
If you've talking about recording new uploads in the server, then it would be something along these lines:
user uploads file to server
script extracts any wanted/needed info from the file (name, size, mime-type, checksums, etc...)
start database transaction
insert file info into database
retrieve ID of new record
move uploaded file to final resting place, using the ID as its filename
if everything goes file, commit the transaction
Using the ID as the filename solves the worries of filename collisions and new uploads overwriting previous ones. And if you store the uploads somewhere outside of the site's webroot, then the only access to the files will be via your scripts, providing you with complete control over downloads.

Why doesn't SSIS ftp task receive file?

I'm running an FTP task inside of SSIS to receive a file and the task executes successfully yet no file is returned to the local folder that I specified. Where did the file go? How can I make the FTP task download a file to the location that I need it at?
Make sure you have the FTP connection Manager and File Manager set up, and other stuff correct as per the list here (MSDN)
operation
"OverwriteFileAtDestination"
IsASCIITransfer
I've had this problem when I first used an FTP task, but can't recall exactly what I had to do. Double check every setting and also make sure a plain old command line FTP works too to ensure it's only an SSIS issue
Is it possible you have an expression or variable set up that puts the file in another folder? This might explain why the file is not where you expect it.
Also, check to make certain that you're not pulling from a UNIX/Linux box, as this is a known issue.