I'm developing a web application that will start with 200 GB of data to store. Over the years, the same application could reach 1 TB, perhaps 2 TB in 5 years.
What I want is for clients to upload files to the server, and for the server to then upload the files to Google Drive, persisting the webViewLink in the database. It already works this way on localhost.
I know of two authentication options for the Google Drive API: a client account and a service account.
The Service Account option fits better for me because I want the server, not the client, to have control of the files.
But a Service Account can store very little data, and the storage limit can't be increased. The limit is something around 15 GB I guess, not sure.
If the Service Account won't help me, what options do I have to store 2 TB of data or more? Should I find another way to store the files?
I'd like to keep using Google. If there isn't any option using the Google Drive API, please suggest anything else for this scenario.
You have a couple of options.
1. Use a regular account instead of a Service Account. You will still need to pay for the storage, but it will work and you'll have everything in a single account. From your comment that you want the server, not the client, to control the files, I suspect you have looked at the OAuth quickstart examples and concluded that only end users can grant access. That's not the case. It's perfectly valid, and really quite simple, for your server app to grant access to an account it controls. See "How do I authorise an app (web or installed) without user intervention?" for how to do this.
2. Use multiple Service Accounts and shard your data across them. The various accounts could all share their folders with a kind of master account, which would then have a coherent view of the entire corpus.
Personally I'd go with option 1 because it's the easiest to set up and manage.
Either way, make sure you understand how Google will want to charge you for the storage. For example, although each Service Account has a free quota, it is ultimately owned by the regular user that created it and the standard user quota limits and charges probably apply to that user.
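If you did go with option 2, the sharding step could look roughly like the sketch below. It is only an illustration: the account names in `SERVICE_ACCOUNTS` and the `pick_account` helper are hypothetical, and it says nothing about how each account would actually call the Drive API.

```python
import hashlib

# Hypothetical identifiers for the service accounts; real ones would come from config.
SERVICE_ACCOUNTS = ["sa-0", "sa-1", "sa-2", "sa-3"]


def pick_account(file_key: str, accounts=SERVICE_ACCOUNTS) -> str:
    """Map a file key to one account using a stable hash.

    The same key always lands on the same account, and keys spread
    roughly evenly across the accounts.
    """
    digest = hashlib.sha256(file_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(accounts)
    return accounts[index]
```

Note that adding an account later changes the modulo and remaps most keys, so it is probably safer to store the chosen account in the database next to the webViewLink than to recompute it on every read.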
I am doing some research for a mobile app I want to develop, and was wondering whether I could get feedback on the following architecture. Within my future app users should be able to authenticate and register themselves via the mobile app and retrieve and use their settings after a successful authentication.
What I am looking for is an architecture in which user accounts are managed by AWS Cognito, but all application related information is stored in a MySQL database hosted somewhere else.
Why host the database outside of AWS? Because of high costs / vendor lock-in / for the sake of learning about architecture rather than going all-in on AWS or Azure
Why not build the identity management myself? Because in the end I want to focus on the app and not spend a lot of energy on something that AWS can already provide (yeah I know, not quite in line with my last argument above, but otherwise all my time goes into the database AND IAM).
One of my assumptions in this design (please correct me if I am wrong) is that it is only possible to retrieve data from a MySQL database with 'fixed credentials'. Therefore, I don't want the app (the user's device) to make these queries (but do this on the server instead) as the credentials to the database would otherwise be stored on the device.
Also, to make it (nearly) impossible for users to run queries on the database with a fake identity, I want the server to retrieve the User ID from AWS Cognito (rather than using the ID token from the device) and use this in the SQL query. This should protect the service from a fake user ID being injected by the device/user.
Are there functionalities I have missed in any of these components that could make my design less complicated or which could improve the flow?
Is that API (the one in step 3) managed by AWS API Gateway? If so, your Cognito user pool can be set as the Authorizer in API Gateway, and the gateway will then take care of token verification automatically (Authorizers enable you to control access to your APIs using Amazon Cognito User Pools or a Lambda function).
You can also do the token verification in a Lambda if you need to verify something else in the token.
Regarding the connection between Node.js (assuming that is an AWS Lambda) and the database, that will work fine, but keep security in mind, as your customers' data will travel outside AWS. Try to use a tool like AWS Secrets Manager to keep your database passwords safe, and rotate them from time to time.
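Whichever component verifies the token, the server-side query can then take the user ID only from the verified claims and never from a value the device sends separately. A minimal sketch of that idea follows, written in Python for brevity. It assumes the token has already been verified (by the API Gateway authorizer or in your own code), uses the `sub` claim as the user ID, and uses an in-memory SQLite database as a stand-in for MySQL. The `settings` table and the function names are made up for illustration.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway database with a hypothetical settings table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE settings (user_id TEXT, name TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO settings VALUES (?, ?, ?)",
        [("user-a", "theme", "dark"), ("user-b", "theme", "light")],
    )
    return conn


def load_settings(conn: sqlite3.Connection, verified_claims: dict) -> dict:
    """Return the settings of the user named in already-verified token claims."""
    user_id = verified_claims["sub"]
    # The placeholder keeps the value out of the SQL text, so it cannot inject SQL.
    rows = conn.execute(
        "SELECT name, value FROM settings WHERE user_id = ?", (user_id,)
    ).fetchall()
    return dict(rows)
```

MySQL drivers typically use `%s` rather than `?` as the placeholder, but the pattern is the same.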
My problem is that I need to build a cloud storage for my customers/clients/users, who can log in to my Cloud Storage Service.
I need to understand how such services work in the back end, how they're developed, and how I can build my solution on a server where I can thin provision my hard drive, let users see their data, etc. What resources and articles can I use, and what skills are required? Or is there software for this, like WordPress is for websites?
Some additional points to better understand the problem:
How does Google Drive or Dropbox work in the background? Do they create a folder directory or a disk drive partition for each user?
Some of what I have in my mind: I develop a website where users purchase a plan of say 10 GB. The site then sends the userId, password, plan information to my Cloud Server, where I can assign storage to him.
At first, I thought to solve the problem with a root folder, where each new user will have a folder of his own. But that's where I found my first stumbling block: how to assign a size limit to a folder?
I also need to use the free storage (that the user is not using) to allocate to other users. And I don't think that can be done in directories (correct me if I'm wrong).
So far, I've searched about cloud storage, folder sizing, thin provisioning, public cloud, private cloud, etc. Most of the courses I've found so far teach about Amazon, Google, etc. However, I need to host my own cloud service.
Someone suggested to use Nextcloud or Syncthing, but they are not what I'm looking for (according to my understanding).
1- Syncthing works off of a peer-to-peer architecture rather than a client-server architecture.
2- NextCloud, from what I get, offers cloud storage for myself.
I apologize for the long post, but I'm in a real bind here. Any help will be much appreciated.
Nextcloud does what you want. You can create group folders and give permissions to registered users or groups. You can share files or folders with external users. It is not only for single private users. There are Nextcloud instances with thousands of users or more.
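On the folder size limit and the reuse of unused space raised in the question: one common approach is to enforce quotas in the application's own bookkeeping rather than with partitions. The sketch below illustrates that idea only. It is not how Nextcloud, Google Drive or Dropbox implement it, and `QuotaLedger` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


class QuotaExceeded(Exception):
    """Raised when an upload would break a plan limit or fill the disk."""


@dataclass
class QuotaLedger:
    physical_capacity: int                     # bytes actually available on disk
    plans: dict = field(default_factory=dict)  # user_id -> bytes sold to the user
    used: dict = field(default_factory=dict)   # user_id -> bytes currently stored

    def add_user(self, user_id: str, plan_bytes: int) -> None:
        self.plans[user_id] = plan_bytes
        self.used[user_id] = 0

    def reserve(self, user_id: str, size: int) -> None:
        """Account for an upload of `size` bytes, or raise QuotaExceeded."""
        if self.used[user_id] + size > self.plans[user_id]:
            raise QuotaExceeded("plan limit reached")
        if sum(self.used.values()) + size > self.physical_capacity:
            raise QuotaExceeded("no physical space left")
        self.used[user_id] += size
```

Because space is only counted when it is actually used, the sum of all plans can exceed the physical capacity, which is the thin provisioning effect: what one user leaves unused remains available to the others.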
I'm working on a free educational web app for school music teachers and students that will allow them to collaborate and share mp3 recordings. Since earning revenue is not the goal, I'm looking for ways to reduce file storage costs. A single teacher assignment might produce hundreds of recorded responses. Instead of saving these recordings to my own storage (or to a service like Amazon's S3), I was wondering if there are any cloud storage services that teachers could sign up for - similar to something like Google Drive - and which they could then give my server app access to for storing their class's recordings. I'd still manage the info for the recordings and other data in a single database on my own server, but I'd save any large files to the shared storage provided to me by each teacher. I haven't found any examples of this sort of thing with services like Google Drive or Dropbox, but if it's possible with those or any other services, I'd appreciate a link to some info. The expectation would be that a teacher could pay the storage company for its service according to the school's usage. The service would have to be simple for teachers to sign up for and provide me access to, which I think puts some of the developer-oriented services out of reach.
Suggestions for different strategies are also welcome. I'd prefer not to handle financial transactions (so I don't want to rent space to people).
First, the system architecture:
Server: Running IIS ASP and delivering data to a hundred or so WinXP+ clients in the field upon automated requests from those clients. Data sent from the server is large graphic or video files. If a file is placed on the server by a user, these remote clients will "discover" it and download the file.
Clients: As stated above, the clients are remote unattended boxes that fetch content from the server. The end purpose is digital signage.
Problem: All clients hitting the server at the same time makes for slow transfers of large files - not enough bandwidth.
Solution (I think): Use Google Cloud Storage or Google Drive to hold the files and have the clients request (automated and unattended) those files. I think Google would have a higher available bandwidth (at least the NSA thinks so).
Questions:
Which is a better solution between Google Cloud Storage and Google Drive?
Is it possible to use Windows PowerShell or WScript to run scripts to interact with Google? Reason is that I need to avoid installing new software on the client machines that might require user interaction.
Yes, you can use PowerShell, as long as you can fetch data over HTTPS. The OAuth flow might be tricky to get working; follow the examples for installed apps.
Definitely use Cloud Storage instead of Drive. Drive is not meant to scale with simultaneous downloads and has several quotas, so with Drive you would need to implement exponential backoff, etc.
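For reference, exponential backoff means retrying a failed request after waits that double each time, usually with a little random jitter. A minimal sketch follows, in Python for brevity (the same loop is easy to write in PowerShell). `RetryableError` and `with_backoff` are hypothetical names, and which errors count as retryable depends on the API you are calling.

```python
import random
import time


class RetryableError(Exception):
    """Stand-in for a quota or rate-limit error from the remote service."""


def with_backoff(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying on RetryableError with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Wait 1s, 2s, 4s, ... plus up to 1s of jitter so that many
            # clients do not all retry at the same instant.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
```

The `sleep` parameter exists only so the waiting can be replaced in tests.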
Yes, you can use Drive or Cloud Storage. I would go for Drive over Cloud Storage, because:
It's free. Cloud Storage will cost you, so you have to worry about your credit card expiring.
It's easier to program since it's a simple http GET to retrieve your files
You need to think about your security model. With Drive you could (NB: could, not should) make the files public. Provided your clients can be informed of the URL, there is then no OAuth to worry about. If you need better security, install a Refresh Token on each client. Before each download, your client will make a call to Google to convert the refresh token to an access token. I suggest prototyping without OAuth to begin with. Then if (a) it fits, and (b) you need more security, add OAuth.
The Drive web app gives you your management console for the downloadable files. If you use Cloud Storage, you'll need to write your own.
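The refresh token step mentioned above is a single HTTPS POST to Google's OAuth 2.0 token endpoint. A rough sketch follows, in Python for brevity (from PowerShell it would be the same POST). The function names are hypothetical, and the client ID, client secret and refresh token are assumed to have been obtained beforehand.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://oauth2.googleapis.com/token"


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build the POST that trades a refresh token for an access token."""
    body = urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "grant_type": "refresh_token",
    }).encode("ascii")
    return urllib.request.Request(TOKEN_URL, data=body, method="POST")


def fetch_access_token(client_id, client_secret, refresh_token):
    """Send the request and return the short-lived access token."""
    request = build_refresh_request(client_id, client_secret, refresh_token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["access_token"]
```

The access token then goes in an `Authorization: Bearer ...` header on the download request. Bear in mind that anyone who can read a client's disk can read the secrets stored on it.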
The quota issue is discussed here: "Google Drive as a video hosting/streaming platform?"
Because the quota isn't documented, we can only guess at what the restrictions are. It seems to be bandwidth for a given file, so the larger the file, the fewer the number of downloads. A simple workaround is to use the copy API https://developers.google.com/drive/v2/reference/files/copy to make multiple copies of the file.
You have other options too. Since these are simply static files, you could host them on Google Sites or Google App Engine. You could also store them within the App Engine datastore, which has a free quota.
Finally, you could even consider a BitTorrent approach.
I have a simple REST JSON API for other websites/apps to access some of my website's database (through a PHP gateway). Basically the service works like this: call example.com/fruit/orange, and the server returns JSON information about the orange. Here is the problem: I only want websites I permit to access this service. With a simple API key system, any website could quickly obtain a key by copying it from an authorized website's (potentially) client-side code. I have looked at OAuth, but it seems a little complicated for what I am doing. Solutions?
You should use OAuth.
There are actually two OAuth specifications, the 3-legged version and the 2-legged version. The 3-legged version is the one that gets most of the attention, and it's not the one you want to use.
The good news is that the 2-legged version does exactly what you want: it allows an application to grant access to another via either a shared secret key (very similar to Amazon's Web Service model; you will use the HMAC-SHA1 signing method) or via a public/private key system (use signing method: RSA-SHA1). The bad news is that it's not nearly as well supported yet as the 3-legged version, so you may have to do a bit more work than you otherwise might have to right now.
Basically, 2-legged OAuth just specifies a way to "sign" (compute a hash over) several fields which include the current date, a random number called "nonce," and the parameters of your request. This makes it very hard to impersonate requests to your web service.
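To make the signing step concrete, here is a simplified sketch of an HMAC-SHA1 signature in the style of OAuth 1.0, written in Python for brevity. It follows the general shape of the spec (sorted, percent-encoded parameters joined with the method and URL) but is not a complete or verified implementation; in practice you would use an existing OAuth library. The `sign_request` name and its arguments are my own.

```python
import base64
import hashlib
import hmac
import secrets
import time
from urllib.parse import quote


def _enc(value) -> str:
    """Percent-encode everything except the unreserved characters."""
    return quote(str(value), safe="-._~")


def sign_request(method, url, params, consumer_key, consumer_secret,
                 timestamp=None, nonce=None):
    """Return the oauth_* parameters, including the computed signature."""
    oauth = {
        "oauth_consumer_key": consumer_key,
        "oauth_nonce": nonce or secrets.token_hex(16),
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": str(int(time.time()) if timestamp is None else timestamp),
        "oauth_version": "1.0",
    }
    # The request parameters and the oauth_* fields are signed together.
    all_params = {**params, **oauth}
    normalized = "&".join(
        f"{_enc(k)}={_enc(v)}" for k, v in sorted(all_params.items())
    )
    base_string = "&".join([method.upper(), _enc(url), _enc(normalized)])
    # 2-legged: there is no token secret, so the key ends with a bare "&".
    key = f"{_enc(consumer_secret)}&".encode("ascii")
    digest = hmac.new(key, base_string.encode("ascii"), hashlib.sha1).digest()
    oauth["oauth_signature"] = base64.b64encode(digest).decode("ascii")
    return oauth
```

The server repeats the same computation with its copy of the secret and rejects requests whose signature differs, whose timestamp is too old, or whose nonce it has already seen.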
OAuth is slowly but surely becoming an accepted standard for this kind of thing -- you'll be best off in the long run if you embrace it because people can then leverage the various libraries available for doing that.
It's more elaborate than you would initially want to get into, but the good news is that a lot of people have spent a lot of time on it, so you know you haven't forgotten anything. A great example is that very recently Twitter found a gap in the OAuth security which the community is currently working on closing. If you'd invented your own system, you'd have to figure out all this stuff on your own.
Good luck!
Chris
OAuth is not the solution here.
OAuth is for when you have end users and don't want 3rd party apps to handle end user passwords. When to use OAuth:
http://blog.apigee.com/detail/when_to_use_oauth/
Go for simple api-key.
And take additional measures if there is a need for a more secure solution.
Here is some more info, http://blog.apigee.com/detail/do_you_need_api_keys_api_identity_vs._authorization/
If someone's client side code is compromised, they should get a new key. There's not much you can do if their code is exposed.
You can, however, be stricter by requiring the IP addresses of authorized servers to be registered in your system for the given key. This adds an extra step and may be overkill.
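The check itself is small. A minimal sketch, in Python for brevity (the same logic ports directly to PHP), with a hypothetical `ALLOWED_NETWORKS` table mapping each API key to the addresses registered for it:

```python
import ipaddress

# Hypothetical registrations: API key -> networks it may be used from.
ALLOWED_NETWORKS = {
    "key-abc": ["203.0.113.10/32", "198.51.100.0/24"],
}


def is_allowed(api_key, remote_addr, allowed=ALLOWED_NETWORKS) -> bool:
    """Return True only if remote_addr is registered for api_key."""
    networks = allowed.get(api_key)
    if not networks:
        return False  # unknown key, or a key with nothing registered
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # not a valid IP address at all
    return any(addr in ipaddress.ip_network(net) for net in networks)
```

If the service sits behind a proxy or load balancer, `remote_addr` has to be the real client address as reported by that proxy, not the proxy's own address.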
I'm not sure what you mean by using a "simple API key", but you should be using some kind of authentication that has private keys (known only to client and server), and then perform some kind of checksum algorithm on the data to ensure that the client is indeed who you think it is and that the data has not been modified in transit. Amazon AWS is a great example of how to do this.
I think it may be a little strict to guarantee that code has not been compromised on your clients' side. I think it is reasonable to place responsibility on your clients for the security of their own data. Of course this assumes that an attacker can only mess up that client's account.
Perhaps you could keep a log of which IPs requests come from for a particular account and, if a new IP comes along, flag the account, send an email to the client, and ask them to authorize that IP. I don't know, maybe something like that could work.
Basically you have two options: either restrict access by IP or use an API key. Both options have their positive and negative sides.
Restriction by IP
This can be a handy way to restrict access to your service. You can define exactly which 3rd party services will be allowed to access your service without forcing them to implement any special authentication features. The problem with this method, however, is that if the 3rd party service is written, for example, entirely in JavaScript, then the IP of the incoming request won't be the 3rd party service's server IP but the user's IP, as the request is made by the user's browser and not the server. Using IP restriction will hence make it impossible to write client-driven applications and forces all requests to go through a server with proper access rights. Remember that IP addresses can also be spoofed.
API key
The advantage of API keys is that you do not have to maintain a list of known IPs. You do have to maintain a list of API keys, but it's easier to automate their maintenance. Basically how this works is that you have two keys, for example a user id and a secret password. Each method request to your service should provide an authentication hash consisting of the request parameters, the user id and a hash of these values (where the secret password is used as the hash salt). This way you can both authenticate and restrict access. The problem with this is, once again, that if the 3rd party service is written as client-driven (for example JavaScript or ActionScript), then anyone can parse out the user id and secret salt values from the code.
Basically, if you want to be sure that only the few services you've specifically defined will be allowed to access your service, then your only option is to use IP restriction and hence force them to route all requests via their servers. If you use an API key, you have no way to enforce this.
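For the API key scheme described above, the server-side check might look like the sketch below, written in Python for brevity. It swaps the "hash with the secret as salt" for an HMAC, which is the usual keyed-hash construction, and the `SECRETS` table and function names are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical store of per-user secrets; in practice this lives in your database.
SECRETS = {"user-42": b"s3cret"}


def compute_auth_hash(user_id: str, params: dict, secret: bytes) -> str:
    """Keyed hash over the user id and the sorted request parameters."""
    message = urlencode(sorted({**params, "user_id": user_id}.items()))
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(user_id, params, supplied_hash, secrets_by_user=SECRETS) -> bool:
    """Recompute the hash on the server and compare in constant time."""
    secret = secrets_by_user.get(user_id)
    if secret is None:
        return False
    expected = compute_auth_hash(user_id, params, secret)
    return hmac.compare_digest(expected.encode("utf-8"),
                               supplied_hash.encode("utf-8"))
```

The caller computes the same hash with its copy of the secret and sends it along with the user id and parameters. As noted above, this only helps when the secret stays on the caller's server and never ships in client-side code.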
All of the security built around IPs seems to create a giant obstacle for users before they even get connected. Symbian 60s has the fullest capability to leave an untraced, reliable and secure signal in the midst of multiple users (using Opera Handler UI 6.5, Opera Mini v8 and 10), along with the coded UIs and a completely filled-in network set-up. Why restrict other features when a method for making a faster link has finally been found? Keeping better-identified accounts, and properly monitoring each 'true account' (whether users are on track with paying their bills and whether they have an unexpired maintaining balance), would create a faster internet link for the popular/signatured mobile industry. Why put hard security features in front of users before they get to the site, when a monthly visit to their accounts might erase all connectivity issues? No mobile user should be able to 'get connected' if they have unpaid bills. Why not provide an 'all in one' registration/application account, programmed and fixed to the OS (perhaps an e-mail account), with a 'monitoring capability' to check whether they are paying or not (password issues should be handed to another department)? If they are not, turn off their account and its other link features. Each user has their own interests in where they get hooked daily; locking or turning them off due to unpaid bills may prompt them to re-subscribe and discipline them into becoming more responsible users, and an account may even expire if it is not maintained. Monthly monitoring or accessing of an identified 'true account', in collaboration with the network provider, produces higher privacy than always asking for users' 'name' and 'password', 'location', and 'permissions' to view their data services. IPs already mark users' first identity, or 'find the location of the users', so it seems unnecessary to place this in browsers' pre-searches; why not use 'Obtaining data' or 'Processing data'?