SSIS Integration With GCP

We are trying to upload files to GCP (Cloud Storage, via the console), but upload performance is hurting our data transfer: being cloud hosted, it takes much longer to upload than comparable Azure or AWS services. Any suggestions?

There could be multiple reasons why you are experiencing slow uploads. Verify and troubleshoot along the following lines:
Is this purely a case of network bandwidth/congestion? Try upgrading your network bandwidth. If your use case is more enterprise in nature, have you explored Dedicated Interconnect?
Try disabling versioning, encryption, and other miscellaneous object-store features before the upload; they influence upload speed.
Are you copying data to the region closest to where your bucket is located? If not, consider relocating the bucket.
Have you considered a multi-file (parallel) upload or a compressed-file upload strategy? This can also result in faster upload speeds.
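To illustrate the last point, here is a minimal sketch (using only the Python standard library; the file names are made up for the demo) of compressing a file with gzip before upload, so there are simply fewer bytes to push over the wire:

```python
import gzip
import os
import shutil

def gzip_file(src_path, dest_path):
    """Compress src_path with gzip; return (original size, compressed size)."""
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return os.path.getsize(src_path), os.path.getsize(dest_path)

# Demo: repetitive data (like CSV exports) shrinks dramatically before upload.
with open("sample.csv", "w") as f:
    f.write("id,value\n" + "1,hello\n" * 10_000)

orig, packed = gzip_file("sample.csv", "sample.csv.gz")
```

Whether this helps depends on how compressible your data is; already-compressed media (JPEG, MP4) gains almost nothing.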

Related

Is it possible to fetch data from my website which is live to the simple mysql database in my system?

Suppose a user of my website, which is live on the internet, uploads a PDF. Is there a way for those uploaded files to be stored directly in the MySQL database on my own system (laptop)?
To refine the question: does it matter whether one uses a MySQL database on a local system (localhost) or on a live website to store the data? Will the database fail to store data if the website is hosted online?
If any part of the question is unclear, please say so.
Thank you.
There are a lot of nuances to your question, and I'll try to address as many of them as I can.
I would not store files directly in the database. You certainly can, but in general you're going to get better performance and other ancillary benefits from storing files as a file in the file system. Store metadata in the database, including at the very least the file name and path on disk (perhaps you want to store more, like the uploader's account information, the size, a long-form text description, and so on, but at least store the path and filename). Then, in your application, fetch the filename from the database and serve the file instead of a database BLOB. One reason is that MySQL performance can really suffer if you don't do this properly.
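A minimal sketch of that pattern, using SQLite as a stand-in for MySQL (the table and function names are hypothetical; the idea carries over directly to a MySQL table):

```python
import os
import sqlite3

# Metadata lives in the database; the bytes live on disk.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE uploads (
        id INTEGER PRIMARY KEY,
        filename TEXT NOT NULL,
        path TEXT NOT NULL,
        size_bytes INTEGER NOT NULL
    )
""")

def save_upload(upload_dir, filename, data):
    """Write the file to disk and record its metadata; return the new row id."""
    os.makedirs(upload_dir, exist_ok=True)
    path = os.path.join(upload_dir, filename)
    with open(path, "wb") as f:
        f.write(data)
    cur = conn.execute(
        "INSERT INTO uploads (filename, path, size_bytes) VALUES (?, ?, ?)",
        (filename, path, len(data)),
    )
    return cur.lastrowid

def fetch_upload(upload_id):
    """Look up the path in the database, then serve the bytes from disk."""
    path, = conn.execute(
        "SELECT path FROM uploads WHERE id = ?", (upload_id,)
    ).fetchone()
    with open(path, "rb") as f:
        return f.read()

row_id = save_upload("uploads", "report.pdf", b"%PDF-1.4 demo bytes")
```

The database stays small and fast, and the file itself can be served (or synced) by any ordinary file tool.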
Let's say you decide to defy my suggestion and store the file as BLOB data in your database. How can you replicate that to your laptop? Your laptop isn't going to be powered on and connected to the internet all the time; even if you had a server at home running 24 hours a day, your hosting provider should still have better uptime than your home does. What should happen to an upload if you were hosting the database on your laptop and it was off (or rebooting for system updates)? So you should host the database at the hosting provider and sync it to your local machine. MySQL provides several methods for this: replication, export and import of .sql files, or exporting binary logs. Each has tradeoffs that you'll want to consider depending on your needs.
But remember how I said you can get other ancillary benefits from storing the file on the file system directly? One of those is that you can rely on file transfer techniques to get the file to your local machine. SFTP, SCP, SyncThing, WebDAV, and any other way you can imagine transferring files can be used to get the remote file to your local system. You wouldn't automatically get the database metadata, but that didn't seem like much of a requirement from your question, so you'd have easy access to the file as uploaded, as quickly as you want.
So there are plenty of ways to accomplish this, and without more details on your question it's tough to recommend a solution, but you have plenty of options available.

Unattended download from Google Cloud Storage or Google Drive

First, the system architecture:
Server: Running IIS ASP and delivering data to a hundred or so WinXP+ clients in the field upon automated requests from those clients. Data sent from the server is large graphic or video files. If a file is placed on the server by a user, these remote clients will "discover" it and download the file.
Clients: As stated above, the clients are remote unattended boxes that fetch content from the server. The end purpose is digital signage.
Problem: All clients hitting the server at the same time makes for slow transfers of large files - not enough bandwidth.
Solution (I think): Use Google Cloud Storage or Google Drive to hold the files and have the clients request (automated and unattended) those files. I think Google would have a higher available bandwidth (at least the NSA thinks so).
Questions:
Which is a better solution between Google Cloud Storage and Google Drive?
Is it possible to use Windows PowerShell or WScript to run scripts to interact with Google? Reason is that I need to avoid installing new software on the client machines that might require user interaction.
Yes, you can use PowerShell as long as you can fetch HTTPS URLs. The OAuth flow might be tricky to get working; follow the examples for installed apps.
Definitely use Cloud Storage instead of Drive. Drive is not meant to scale with simultaneous downloads and has several quotas, so with Drive you would need to implement exponential backoff, etc.
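For reference, exponential backoff is straightforward to sketch (a minimal illustration with a fake flaky download standing in for a rate-limited Drive request):

```python
import random
import time

def with_backoff(fn, max_tries=5, base_delay=1.0):
    """Retry fn with exponential backoff plus jitter, as rate-limited APIs expect."""
    for attempt in range(max_tries):
        try:
            return fn()
        except Exception:
            if attempt == max_tries - 1:
                raise
            # Wait base_delay * 2^attempt, plus random jitter to avoid
            # all clients retrying in lockstep.
            time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay)

# Demo: a download that fails twice (e.g. 403 rateLimitExceeded) then succeeds.
calls = {"n": 0}
def flaky_download():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("403 rateLimitExceeded")
    return b"file-bytes"

result = with_backoff(flaky_download, base_delay=0.01)
```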
Yes, you can use either Drive or Cloud Storage. I would go for Drive over Cloud Storage because:
It's free; Cloud Storage will cost you, and so you have to worry about your credit card expiring.
It's easier to program, since it's a simple HTTP GET to retrieve your files.
You need to think about your security model. With Drive you could (NB: not should) make the files public. Provided your clients can be informed of the URL, there is no OAuth to worry about. If you need better security, install a Refresh Token on each client. Before each download, your client will make a call to Google to convert the refresh token to an access token. I suggest prototyping without OAuth to begin with; then if (a) it fits, and (b) you need more security, add OAuth.
The Drive web app gives you your management console for the downloadable files. If you use Cloud Storage, you'll need to write your own.
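The refresh-token-to-access-token step mentioned above is a plain HTTPS POST to Google's token endpoint. A minimal sketch of building that request with only the standard library (the client id, secret, and token values here are placeholders):

```python
from urllib.parse import urlencode

TOKEN_URL = "https://oauth2.googleapis.com/token"

def build_refresh_request(client_id, client_secret, refresh_token):
    """Build the POST body that exchanges a long-lived refresh token
    for a short-lived access token."""
    body = urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    })
    return TOKEN_URL, body.encode("ascii")

url, body = build_refresh_request("my-client-id", "my-secret", "my-refresh-token")
# POST `body` to `url` (Content-Type: application/x-www-form-urlencoded);
# the JSON response carries the access_token to use on the download request.
```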
The quota issue is discussed in "Google Drive as a video hosting/streaming platform?".
Because the quota isn't documented, we can only guess at the restrictions. It seems to be per-file bandwidth, so the larger the file, the fewer the downloads. A simple workaround is to use the copy API (https://developers.google.com/drive/v2/reference/files/copy) to make multiple copies of the file.
You have other options too. Since these are simply static files, you could host them on Google Sites or Google App Engine. You could also store them in the App Engine datastore, which has a free quota.
Finally, you could even consider a BitTorrent approach.

Storage options for diskless servers [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
I am trying to build a neural network simulation running on several high-CPU diskless instances. I am planning to use a persistent disk to store my simulation code and training data and mount them on all server instances. It is basically a map reduce kind of task (several nodes working on the same training data, the results of all nodes need to be collected to one single results file).
My only question now is: what are my options for (permanently) saving the simulation results from the different servers (either at some points during the simulation, or once at the end)? Ideally I would write them to the single persistent disk mounted on all servers, but that is not possible because I can only mount it read-only on more than one server.
What is the smartest (and cheapest) way to collect all simulation results of all servers back to one persistent disk?
Google Cloud Storage is a great way to permanently store information in the Google Cloud. All you need to do is enable that product for your project, and you'll be able to access Cloud Storage directly from your Compute Engine virtual machines. If you create your instances with the 'storage-rw' service account, access is even easier because you can use the gsutil command built into your virtual machines without needing to do any explicit authorization.
To be more specific, go to the Google Cloud Console, select the project with which you'd like to use Compute Engine and Cloud Storage and make sure both those services are enabled. Then use the 'storage-rw' service account scope when creating your virtual machine. If you use gcutil to create your VM, simply add the --storage_account_scope=storage-rw (there's also an intuitive way to set the service account scope if you're using the Cloud Console to start your VM). Once your VM is up and running you can use the gsutil command freely without worrying about doing interactive login or OAuth steps. You can also script your usage by integrating any desired gsutil requests into your application (gsutil will also work in a startup script).
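If you'd rather drive gsutil from your simulation code than from a shell, a small sketch of composing and running the upload command (the bucket and object names here are hypothetical; on a VM created with the storage-rw scope no extra auth step is needed):

```python
import subprocess

def gsutil_upload_cmd(local_path, bucket, object_name):
    """Compose the gsutil command that copies a local result file to GCS."""
    return ["gsutil", "cp", local_path, f"gs://{bucket}/{object_name}"]

def upload_results(local_path, bucket, object_name):
    """Shell out to gsutil; raises CalledProcessError if the copy fails."""
    subprocess.run(gsutil_upload_cmd(local_path, bucket, object_name), check=True)

cmd = gsutil_upload_cmd("results.csv", "my-sim-results", "run-01/results.csv")
```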
More background on the service account features of GCE can be found here.
Marc's answer is definitely best for long-term storage of results. Depending on your I/O and reliability needs, you can also set up one server as an NFS server, and use it to mount the volume remotely on your other nodes.
Typically, the NFS server would be your "master node", and it can serve both binaries and configuration. Workers would periodically re-scan the directories exported from the master to pick up new binaries or configuration. If you don't need a lot of disk I/O (you mentioned neural simulation, so I'm presuming the data set fits in memory, and you only output final results), it can be acceptably fast to simply write your output to NFS directories on your master node, and then have the master node backup results to some place like GCS.
The main advantage of using NFS over GCS is that NFS offers familiar filesystem semantics, which can help if you're using third-party software that expects to read files off filesystems. It's pretty easy to sync down files from GCS to local storage periodically, but does require running an extra agent on the host.
The disadvantages of setting up NFS are that you probably need to sync UIDs between hosts, NFS can be a security hole, (I'd only expose NFS on my private network, not to anything outside 10/8) and that it will require installing additional packages on both client and server to set up the shares. Also, NFS will only be as reliable as the hosting machine, while an object store like GCS or S3 will be implemented with redundant servers and possibly even geographic diversity.
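If you do go the shared-directory route, one practical detail: have each worker write to a temporary file and then rename it into place, so the master never reads a half-written result. A minimal sketch (the directory and file names are made up; rename within one directory is atomic on POSIX filesystems, and effectively so on common NFS setups):

```python
import os
import tempfile

def atomic_write(shared_dir, name, data):
    """Write to a temp file in the target directory, then rename into place."""
    os.makedirs(shared_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=shared_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # os.replace is atomic within a directory, so readers see either
        # nothing or the complete file -- never a partial write.
        os.replace(tmp_path, os.path.join(shared_dir, name))
    except BaseException:
        os.unlink(tmp_path)
        raise

atomic_write("shared_results", "node-07.out", b"loss=0.031")
```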
If you want to stay in the Google product space, how about Google Cloud Storage?
Otherwise, I've used S3 and boto for these kinds of tasks.
As a more general option, you're asking for some sort of general object store. Google, as noted in previous responses, makes a nice package, but nearly all cloud providers provide some storage option. Make sure your cloud provider has BOTH key options: a volume store (a store for data similar to a virtual disk) and an object store (a key/value store). Both have their strengths and weaknesses. Volume stores are drop-in replacements for virtual disks: if you can use stdio, you can likely use a remote volume store. The problem is, they often have the structure of a disk; if you want anything more than that, you're asking for a database. The object store is a "middle ground" between the disk and the database. It's fast, and semi-structured.
I'm an OpenStack user myself -- first, because it provides both storage families, and second, because it's supported by a variety of vendors, so if you decide to move from vendor A to vendor B, your code can remain unchanged. You can even run a copy of it on your own machines (go to www.openstack.org). Note, however, that OpenStack does like memory. You're not going to run your private cloud on a 4GB laptop! Consider two 16GB machines.

improving loading performance by hosting images on different server

If you're hosting a blog on a shared server and you're concerned about page loading time, would it at least theoretically be better to host the images in an Amazon S3 bucket and then just link to them?
As Google apparently takes page loading into consideration, will this possibly improve search rankings?
I am not sure about search rankings, but it really improves your page load speed. This is commonly known as a CDN. You can use http://wordpress.org/extend/plugins/w3-total-cache/, which has an option to select a CDN, including Amazon's servers. Managing images between your WordPress install and the Amazon server is pretty easy using this plugin. Thanks
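The core of what such a plugin does is simple URL rewriting: point image URLs at the CDN host so browsers fetch them there instead of from the shared server. A small sketch (the CDN hostname is a made-up placeholder):

```python
from urllib.parse import urlparse, urlunparse

CDN_HOST = "media.example-cdn.net"  # hypothetical CDN/S3-backed hostname

def to_cdn(url):
    """Rewrite an image URL so it is served from the CDN host."""
    parts = urlparse(url)
    return urlunparse(parts._replace(netloc=CDN_HOST))

cdn_url = to_cdn("http://myblog.example.com/wp-content/uploads/photo.jpg")
```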
Yes, good idea. It will improve your website's performance by moving your media files off your main web server. This could be as simple as creating a sub-domain that points to a host that serves your media files.
Amazon S3 also provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web.
Generally, software developers use Amazon S3 in applications that need the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of websites.

uploaded files - database vs filesystem, when using Grails and MySQL

I know this is something of a "classic question", but does the MySQL/Grails stack (deployed on Tomcat) put a new spin on how to approach storage of users' uploaded files?
I like using the database for everything (simpler architecture; scaling is just scaling the database). But using the filesystem means we don't lard up MySQL with binary files. Some might also argue that Apache (httpd) is faster than Tomcat at serving binary files, although I've seen numbers showing that just putting Tomcat at the front of your site can be faster than using an Apache (httpd) proxy.
How should I choose where to place user's uploaded files?
Thanks for your consideration, time and thought.
I don't know if one can make general observations about this kind of decision, since it's really down to what you are trying to do and how high up the priority list NFRs like performance and response time are to your application.
If you have lots of users uploading lots of binary files, and a system serving large numbers of those uploaded files, then the costs of storing files in the database include:
Large size binary files
Costly queries
The benefits are:
Atomic commits
Scaling comes with the database (though with MySQL there are some issues with multi-node setups, etc.)
Less fiddly and complicated code to manage filesystems, etc.
Given the same user load, if you store to the filesystem you will need to address:
Scaling
File name management (a user uploads the same file name twice, etc.)
Creating corresponding records in the DB to map to the files on disk (and the code surrounding all that)
Looking after your Apache configs so they serve from the filesystem
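The file name management point above is usually solved by never storing under the user's original name. A minimal sketch (function name is hypothetical): keep the extension, store under a unique sanitized name, and record the original name in the DB metadata record.

```python
import os
import re
import uuid

def safe_stored_name(original_name):
    """Return a unique on-disk name that keeps only a sanitized extension,
    so two users uploading report.pdf never collide (or traverse paths)."""
    ext = os.path.splitext(original_name)[1].lower()
    ext = re.sub(r"[^.a-z0-9]", "", ext)  # strip anything suspicious
    return f"{uuid.uuid4().hex}{ext}"

a = safe_stored_name("report.pdf")
b = safe_stored_name("report.pdf")
```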
We had a similar problem to solve as this for our Grails site where the content editors are uploading hundreds of pictures a day. We knew that driving all that demand through the application when it could be better used doing other processing was wasteful (given that the expected demand for pages was going to be in the millions per week we definitely didn't want images to cripple us).
We ended up creating an upload-to-filesystem solution. For each uploaded file a DB metadata record was created and managed in tandem with the upload process (and conversely, that record was read when generating the GSP content link to the image). We served requests off disk through Apache, directly based on the link requested by the browser. But, and there is always a but, remember that with filesystems the content only exists per machine.
We had the headache of making sure images got re-synchronised onto every server, since unlike a DB which sits behind the cluster and enables the cluster behave uniformly, files are bound to physical locations on a server.
Another problem you might run up against with filesystems is folder content size. When you start having folders where there are literally tens of thousands of files in them, the folder scan at the OS level starts to really drag. To avert this problem we had to write code which managed image uploads into yyyy/MM/dd/image.name.jpg folder structures, so that no one folder accumulated hundreds of thousands of images.
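That date-sharding scheme is a few lines of code. A minimal sketch (the root directory is a placeholder):

```python
from datetime import date

def sharded_path(root, filename, today=None):
    """Spread uploads across yyyy/MM/dd folders so no single directory
    accumulates hundreds of thousands of files."""
    d = today or date.today()
    return f"{root}/{d.year:04d}/{d.month:02d}/{d.day:02d}/{filename}"

p = sharded_path("/var/media", "image.name.jpg", today=date(2011, 3, 7))
```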
What I'm implying is that while we got the performance we wanted by not using the DB for BLOB storage, that comes at the cost of development overhead and systems management.
Just as an additional suggestion: JCR (eg. Jackrabbit) - a Java Content Repository. It has several benefits when you deal with a lot of binary content. The Grails plugin isn't stable yet, but you can use Jackrabbit with the plain API.
Another thing to keep in mind is that if your site ever grows beyond one application server, you need to access the same files from all app servers. Now all app servers have access to the database, either because that's a single server or because you have a cluster. Now if you store things in the file system, you have to share that, too - maybe NFS.
Even if you upload files to the filesystem, all the files get the same permissions, so any logged-in user can access any other user's file just by entering the URL (since all files have the same permissions). If you instead plan to give each user a directory, the files carry the permissions of the Apache user (i.e., whatever the server process runs as). You would have to su to root, create a user, and upload files into those directories; and accessing those files could end up requiring the user's group to be added to the server's group.
If I choose to use the filesystem to store binary files, is there an easier solution than this? How do you manage access to those files per user, and maintain the permissions? Does Spring's ACL help, or do we have to create a permission group for each user? I am totally fine with the filesystem URL. My only concern is starting a separate process (chmod and such), using something like ProcessBuilder to run operating system commands (or is there a better solution?). And what about permissions?
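One common way out of the OS-permission maze is to not rely on filesystem permissions at all: keep the upload directory outside the web root, and serve file bytes through the application only after an ownership check against the DB metadata. A toy sketch of that idea (the dict stands in for a database table; names are hypothetical):

```python
# path -> owning user; in a real app this is a query against the uploads table.
OWNERS = {"uploads/alice/cv.pdf": "alice"}

def can_read(user, path):
    """Application-level ACL: only the recorded owner may fetch the file."""
    return OWNERS.get(path) == user

allowed = can_read("alice", "uploads/alice/cv.pdf")
denied = can_read("bob", "uploads/alice/cv.pdf")
```

With this approach every file on disk can keep one uniform permission set (readable only by the server process), and no chmod/ProcessBuilder calls are needed per user.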