Open a CSV file from S3 using Roo on Heroku

I'm putting together a rake task to grab a CSV file stored on S3 and do some processing on it. I use Roo to process the files. The product is running on Heroku so I do not have local storage for physical files.
The CSV processing I am running works perfectly from the website if I upload the file.
The rake task looks like this when I pass just the contents of the file:
desc "Checks uploads bucket on S3 and processes for imports"
task s3_import: [:environment] do
  # depends on the environment task to load dev/prod env first
  # get an S3 connection
  s3 = AWS::S3.new(access_key_id: ENV['AWS_ACCESS_KEY_ID'], secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'])
  # get the uploads bucket
  bucket = s3.buckets[ENV['UPLOAD_BUCKET']]
  # check each object in the bucket
  bucket.objects.each do |obj|
    # uri = obj.url_for(:read)
    # puts uri
    puts obj.key
    file = obj.read
    User.import(file)
  end
end
I have also tried passing just the URL of the object, but that does not work either.
Basically, when uploading through the page, this works to open the files:
Roo::Csv.new(file.path)
So I can get the URL of the file, and I can get its contents into a variable, but I can't work out how to get Roo to open it without a physical file on disk.
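One approach (a sketch, not confirmed by the question beyond the path-based `Roo::Csv.new(path)` constructor shown above): spool the S3 object's contents into a Tempfile and hand Roo the tempfile's path. Heroku's ephemeral filesystem does allow temporary files for the lifetime of the dyno process, so a Tempfile created inside the rake task is fine. The `csv_from_string` helper name is mine:

```ruby
require 'tempfile'

# Write a string (e.g. the result of obj.read from S3) to a Tempfile
# so that path-based APIs like Roo::Csv.new(path) can open it.
def csv_from_string(contents)
  tmp = Tempfile.new(['s3_upload', '.csv'])
  tmp.write(contents)
  tmp.flush
  tmp.rewind
  tmp
end

tmp = csv_from_string("name,email\nAlice,alice@example.com\n")
# In the rake task this would become:
#   csv = Roo::Csv.new(tmp.path)
#   User.import(csv)
puts File.read(tmp.path)
```

Keep a reference to the Tempfile object (not just its path) until processing is done, so it is not garbage-collected and unlinked early.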

SSIS FTP won't send files

I am currently trying to send a single file to an FTP site using an SSIS FTP task, which is failing with the error:
[FTP Task] Error: Unable to send files using "FTP Connection Manager".
Using the SSIS FTP Task, I am trying to send a file from a local drive to the FTP root folder. I have all the permissions to read and write. I have tried deleting the file on the FTP server (successful), moving the file using WinSCP (successful), and creating a directory on the FTP site (successful). However, when I try to send from a local folder to the root of the FTP site, I get the above-mentioned error. My local path includes the file I want to send:
C:\WebSites\Files\Import\SFTP\Extract\CE_CS_Shipment.xlsx
Is there anything else I can try? I am at my wits' end.
I seem to have found a workaround, because the FTP Task in SSIS just wasn't budging.
I create the file on the FTP site, then copy the stream, and that works perfectly.
FtpWebRequest request = (FtpWebRequest)WebRequest.Create(sftpServer + result);
request.Credentials = new NetworkCredential(sftpUser, strFTPPassword);
request.Method = WebRequestMethods.Ftp.UploadFile;

using (Stream fileStream = File.OpenRead(archiveFolder.Replace("//", "/") + @"/" + result))
using (Stream ftpStream = request.GetRequestStream())
{
    fileStream.CopyTo(ftpStream);
}
This now works perfectly; I'm posting it in case someone else is struggling with the same problem.

How do I disable the creation of the artifact-[build-no].json file?

When I build a Maven project from GitHub using Cloud Build (resulting in jar files in a bucket), I get an extra file uploaded to my bucket that lists what files have been built (artifacts-[build-no].json). The file has a unique name for every build, so the bucket fills up with loads of unwanted files. Is there a way to disable the creation of that file?
I think the json is only generated when using the artifacts flag, such as:
artifacts:
  objects:
    location: 'gs://$PROJECT_ID/'
    paths: ['hello']
You could manually push to a bucket in a step with the gsutil cloud builder, without the helper syntax. This would avoid the json creation.
https://github.com/GoogleCloudPlatform/cloud-builders/tree/master/gsutil
# Upload it into a GCS bucket.
- name: 'gcr.io/cloud-builders/gsutil'
  args: ['cp', 'gopath/bin/hello', 'gs://$PROJECT_ID/hello']
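For context, the manual gsutil step replaces the artifacts block entirely; a minimal cloudbuild.yaml might look like this (the build step is an assumption borrowed from the Go example above, so substitute your own mvn step):

```yaml
steps:
# Build the binary (example build step; substitute your mvn step).
- name: 'gcr.io/cloud-builders/go'
  args: ['install', 'hello']
# Upload it into a GCS bucket. With no artifacts: block,
# Cloud Build writes no artifacts-[build-no].json manifest.
- name: 'gcr.io/cloud-builders/gsutil'
  args: ['cp', 'gopath/bin/hello', 'gs://$PROJECT_ID/hello']
```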

How to avoid AWS S3 changing file content-type from text/html?

I'm using S3 to host a static website for a portfolio. For each webpage the .html extension is removed, so AWS automatically categorizes the file as binary/octet-stream. The problem is that when you access a page such as myportfolio.com/contact, instead of rendering, the file just downloads into the downloads folder.
By going to
[Object] → properties → metadata → content-type
I am able to change the type to text/html which causes the file to render instead of downloading.
Now, after making the change to all the files and re-uploading them using the AWS CLI:
aws s3 sync . s3://[bucketname]
the files went back to the old content-type. How do I permanently set the content type on these files?
Have you tried setting the --content-type flag? For example:
aws s3 sync . s3://[bucketname] --content-type "text/html"
More info on this can be found in the sync documentation.
UPDATE: try this command for deployment:
aws s3 sync --acl public-read --sse --delete . s3://[bucketname]
--acl sets the files to be public-read, --sse stores the files encrypted, and --delete removes files that are not in your local folder.

How to download some files from heroku

I made a one-dyno app on Heroku using Node.js.
The dyno's task is to collect data and create a JSON file daily,
but I don't know how to download those files locally.
I tried
http://myappname.heroku.com/filename.json
but failed.
Heroku is new to me, so please don't treat me like an advanced user.
You cannot do this.
If your code is writing a JSON file to the Heroku server daily, that file is almost instantly being deleted, so there is no way you can download it.
Heroku dynos are ephemeral. This means that any data you 'save' to the filesystem will be deleted almost instantly. If you need to save files, you should save them to a file service like Amazon S3, then download them from there.
Save your JSON file to the /public folder.
Ensure that your app.js has the following:
app.use(express.static(__dirname + '/public'))
Now, you should be able to access:
http://myappname.heroku.com/filename.json

Get the latest updated file from FTP Folder

Kindly see this screencast to get a better idea of our requirement:
https://www.screenr.com/QmDN
We want to automate the text datasource generation and the connection to MS Excel, to make it easier for the end user to connect the text datasource (CSV) to MS Excel so that they can generate their own reports.
The steps I have in mind:
1. Use the WinSCP FTP client with scripting.
2. Write a script to get the most recently updated file from the FTP folder (or, instead, download all generated files from FTP to a shared folder on the network).
3. Get the most recent version of the generated CSV file.
4. Rename the file to the standard naming convention. This must be the name used in MS Excel as the CSV text datasource.
5. Delete all other files.
I developed a sample script that WinSCP can use to download the files from the FTP folder:
# Automatically abort script on errors
option batch abort
# Disable overwrite confirmations that conflict with the previous
option confirm off
# Connect
open CSOD
# Change remote directory
cd /Reports/CAD
# Force binary mode transfer
option transfer binary
# Download file to the local directory d:\
#get "Training Attendance Data - Tarek_22_10_21_2014_05_05.CSV" "D:\MyData\Business\Talent Management System\Reports\WinCSP\"
get "*.CSV" "D:\MyData\Business\Talent Management System\Reports\WinCSP\Files\"
# Disconnect
close
exit
Then, I can schedule the above code to run periodically using this command:
winscp.com /script=example.txt
The above sample works fine, but the main problem is how to identify the most recent file so that I can rename it and delete all the others.
Appreciate your help.
Tarek
Just add the -latest switch to the get command:
get -latest "*.CSV" "D:\MyData\Business\Talent Management System\Reports\WinCSP\Files\"
For more details, see the WinSCP article Downloading the most recent file.
You don't specify the language you use; here is a Ruby script that downloads the most recent file from an FTP path, just to demonstrate how easily and tersely this can be done with a scripting language like Ruby.
require 'net/ftp'

Net::FTP.open('url of ftpsite') do |ftp|
  ftp.login("username", "password")
  path = "/private/transfer/*.*"
  # file[55..-1] gives the filename part of the string returned by LIST
  most_recent_file = ftp.list(path)[2..-1]
    .sort_by { |file| ftp.mtime(file[55..-1]) }
    .reverse.first[55..-1]
  puts "downloading #{most_recent_file}"
  ftp.getbinaryfile(most_recent_file, File.basename(most_recent_file))
  puts "done"
end
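The answers above cover downloading; steps 4 and 5 of the question (rename the newest CSV to the standard name and delete the rest) could be sketched in Ruby like this. The method and argument names are hypothetical, not from the question:

```ruby
require 'fileutils'

# Keep only the most recently modified CSV in `folder`:
# rename it to `standard_name` and delete every other CSV.
def keep_latest_csv(folder, standard_name)
  csvs = Dir.glob(File.join(folder, '*.{csv,CSV}'))
  return if csvs.empty?
  # File.mtime picks out the most recently modified file
  latest = csvs.max_by { |f| File.mtime(f) }
  (csvs - [latest]).each { |f| File.delete(f) }
  FileUtils.mv(latest, File.join(folder, standard_name))
end
```

Run after the WinSCP download completes, e.g. `keep_latest_csv('D:/MyData/.../Files', 'Standard.csv')`, so Excel always finds the same datasource filename.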