When I upload a file to Google Drive, does Google segment that file into smaller pieces and store them on different servers? For example, if I upload fileA to my Drive, does Google divide fileA into smaller chunks and store them on different servers (serverA, serverB, etc.)?
Or is a given file uploaded to a particular server without being broken into smaller chunks? For example, if I upload fileA, will it be stored on serverA (or some other server) without segmentation?
Or are all the files I upload to my Drive stored on one particular server?
The service hosted by Google is backed by GFS, the Google File System.
I was not sure until I spotted this post from someone who appears to be a Google employee.
Edit:
The claim that Google uses GFS is actually an inference, based on how well the filesystem fits the model of Google Drive. An article published on Ars Technica describes this in more detail.
Of course, no one except former and current Google employees knows what is behind it, but the fact that someone who claims to be one of them specifically points the OP of the question I linked to GFS is another strong reason to believe that GFS (and maybe BigTable) is backing Google Drive.
Edit 2:
Since I was seeing downvotes and the topic actually interests me a lot, I decided to enrich this answer with the following arguments:
1) Google's infrastructure strategy is to build clusters of inexpensive commodity hardware, designed with the expectation that components will fail. This was one of the motivations behind the GFS design.
2) The observation that GFS fits Google's infrastructure, services, and internal software (among which MapReduce is an important actor) so well has led other people to the conclusion that it is backing pretty much all of their services. See:
http://www.slideshare.net/hasanveldstra/the-anatomy-of-the-google-architecture-fina-lv11.
An interview with Jeff Dean also strongly backs this intuition, and explains point 1) in more depth.
3) This doesn't have any actual significance, but I found it fun: some users actually ended up with files having the extension .gfs in Drive (it would be unexpected, but somewhat unfortunate, if this gfs actually referred to MS Groove files).
I don't think we'll ever see an actual former/current employee validate this statement, and of course many of the details of what is backing Drive will stay hidden, but the intuition has strong roots.
I want to develop an academic papers repository system in which students can create/edit their docs. In this regard I NEED to take over the Docs interface (e.g. a user CANNOT change the font size). Digging into Docs (https://developers.google.com/apps-script/guides/docs), I've not found anything that allows me to get that job done.
The user will log into my app (it doesn't matter whether he or she has a Google Account), because I want full control over the docs.
So, I need to create an app with those requirements. Is it possible with Google Docs?
Best regards,
Romualdo Rubens de Freitas.
As Bryan said in the comment, you can't disable the formatting tools. The best you could do is write a script that continuously "reformats" your document, setting font styles, sizes, and other formatting parameters back to the values you choose.
This script could run on a timer so it executes periodically. I've written such a script for spreadsheets to keep the presentation the way I want, and it works nicely. I guess it will be a bit trickier in documents because there are a lot more formatting parameters, but it should be possible.
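A minimal sketch of such a script, assuming a time-driven trigger; the style values and function names are illustrative choices, not anything prescribed by the Docs service:

```javascript
// Sketch: a time-driven Apps Script that periodically re-applies a fixed
// style to the document body. The style values below are examples only.

// Pure helper returning the attribute map we want to enforce. The keys
// match the names of the DocumentApp.Attribute enum values.
function desiredStyle() {
  return {
    FONT_FAMILY: 'Arial',
    FONT_SIZE: 11,
    BOLD: false,
    ITALIC: false
  };
}

// Re-apply the style, overriding anything the user may have changed.
function enforceFormatting() {
  var body = DocumentApp.getActiveDocument().getBody();
  body.setAttributes(desiredStyle());
}

// One-time setup: run enforceFormatting every minute.
function installTrigger() {
  ScriptApp.newTrigger('enforceFormatting')
    .timeBased()
    .everyMinutes(1)
    .create();
}
```

Note that this only resets formatting after the fact; between trigger runs the user will briefly see their own changes.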
My real-time document allows the user to edit the file name within the editor (much like Google's own apps). I represent this as a collaborative string so all collaborators see the file renames as soon as possible.
I'm trying to determine the best and most efficient way to keep this collaborative string in sync with the actual file name. There are two scenarios to consider:
In Editor Changes
If a user edits the document name within the editor, we need to use the Drive API to push that change out to the file on Google Drive. To avoid race conditions, it is best if only one of the collaborators pushes the change out. The easiest way to do this seems to be to check whether the rename event was local.
I also found it best to add a delay so we are not pushing the rename out to the Drive API on every character change. If a few seconds pass with no further name changes, the change is pushed out. This all seems to work well.
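The delay described above is essentially a debounce. A sketch in plain JavaScript, where pushFn stands in for the actual Drive API call (the names here are hypothetical, not part of any API):

```javascript
// Debounced rename pusher (sketch). pushFn is invoked only after the name
// has been stable for delayMs milliseconds; every new edit resets the timer,
// so rapid typing results in a single API call with the final name.
function makeRenamePusher(pushFn, delayMs) {
  var timer = null;
  return function onNameEdited(newName) {
    if (timer !== null) clearTimeout(timer);
    timer = setTimeout(function () {
      timer = null;
      pushFn(newName); // e.g. a Drive API files.update call with the new title
    }, delayMs);
  };
}
```

With delayMs set to a few thousand milliseconds, a burst of keystrokes collapses into one push of the final value.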
External Changes
The harder case, and the one I'd like advice on, is when the file name is changed externally; for example, when the user renames the file within the Drive interface itself. We want this change to update our collaborative string to match.
My application is entirely client-side, so I can't use webhook push notifications. My only solution is to poll the file name every X seconds (currently 10). But this presents the following problems:
It is API-intensive. If you have 4 collaborators who keep the screen open for 8 hours, that is 11,520 API calls. If my app has lots of users with lots of documents, I could see this pushing me past my API limits.
To avoid race conditions (and reduce API calls), we only want one collaborator to check for changes and update the collaborative string if the file name has changed. But how do we pick one when collaborators might join or leave at any time? Currently, each collaborator checks whether it is the "leader" any time the set of collaborators changes; the "leader" is the collaborator whose session id is the highest. This seems to work, but it all feels fairly hacky. Also, if collaborators join close together, I wonder whether a race condition might cause multiple collaborators to think they are the leader.
Is there an easier way? A Realtime API function I am missing?
It would be ideal if the Realtime API just provided a method that stored the document name. Any time the Realtime API checks for mutations, it could grab the latest document name.
I think you've identified the options. There isn't currently any built-in functionality to sync it via the Realtime API specifically.
Personally, I'd back off the poll time a lot. It's probably not critical that the title is always exactly up to date, so polling every few minutes is probably sufficient and would greatly reduce your QPS.
In terms of identifying a "leader", I can't think of anything better than something deterministic based on the session id. As long as each collaborator rechecks on every session join/leave event, I don't think there should be any issues.
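The deterministic session-id scheme can be sketched as a pure function (the names are illustrative); each collaborator runs it locally whenever the collaborator list changes:

```javascript
// Sketch: deterministic "leader" election. Each collaborator evaluates this
// locally whenever someone joins or leaves. The session whose id sorts
// highest is the leader, so everyone reaches the same answer without any
// extra coordination messages.
function isLeader(mySessionId, allSessionIds) {
  var highest = allSessionIds.slice().sort()[allSessionIds.length - 1];
  return mySessionId === highest;
}
```

Because every collaborator sorts the same list, at most one of them can conclude it is the leader for any given membership snapshot; brief disagreement is only possible while join/leave events are still propagating.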
It's no secret that application logs can grow well beyond the limits of naive log viewers, and the desired viewer functionality (say, filtering the log on a condition, highlighting particular message types, splitting it into sublogs based on a field value, merging several logs along a time axis, or bookmarking) is beyond the abilities of large-file text viewers.
I wonder:
Whether decent specialized applications exist (I haven't found any)
What functionality might one expect from such an application? (I'm asking because my student is writing such an application, and the functionality above has already been implemented to a usable extent.)
I've been using Log Expert lately.
(Screenshot: http://www.log-expert.de/images/stories/logexpertshowcard.gif)
It can take a while to load large files, but it will in fact load them. I couldn't find the file size limit (if there is any) on the site, but just to test it, I loaded a 300 MB log file, so it can at least go that high.
Windows Commander has a built-in program called Lister which works very quickly for any file size. I've used it with GBs worth of log files without a problem.
http://www.ghisler.com/lister/
A slightly more powerful tool I sometimes use is Universal Viewer from http://www.uvviewsoft.com/.
This is so vague it's ridiculous, but who knows...
We have got this client who will not budge: they are supplying PDF files auto-generated by their own software. These files don't import into our (printing) lab management software, made by Kodak.
So I emailed Kodak the error log and the relevant files, and got this back:
DP2 supports the importing of PDFs from Adobe Illustrator and QuarkXPress.
Some of the capabilities when importing PDFs as ORDER ITEMS are that the images can be modified,
color corrected, or replaced. To accomplish this, the PDF is disassembled. PDFs from Illustrator and Quark
contain additional information that tells us where everything goes and how, thus enough information for
us to reassemble the PDF. While other applications do generate PDFs, they don't contain this additional
information.
After speaking with a third-party 'expert', we need to consider another third-party RIP software package that's fairly expensive. So before I go ahead, I thought I'd ask whether anyone has experience with this stuff.
Cheers
That's a tough one. PDFs can be created in so many different ways that it's hard to tell exactly what any given PDF may be composed of. Personally, I'd try some different PDF editors first to see if you can extract the data you need before going the expensive route.
E.g. Foxit offers a PDF editor (I think it's free, or cheap in any case).
Darknight
So, I want to import, export, and modify the database. I have read that I have to do this via XML, but I don't really understand their documentation system, and I haven't found any good tutorials out there that explain it. I am slowly reading the very expensive and short book, which somewhat answers my questions, but I crave more.
As a second question, I want to have an order system where I can send out information or emails with my own code. I assume this would be some type of plug-in that would override, or be called at, a certain point. Any info would be helpful.
Some parts of the Magento data can be imported/exported via the backend (System -> Import/Export), namely products and customers.
If you want to deal with the complete DB - use your DB tool of choice (I prefer mysqldump).
When dealing with exported CSVs, use OpenOffice; in my experience, it handles the separator characters better than Excel.
As for your second question: as far as I understand, you will have to develop a module if you want to do something different from the existing functionality while keeping the original mail functions. If you don't want to (or don't have to) keep the original functions, you can opt to override the module instead, which is much easier as far as I can see. A Google search for "overriding magento module" should turn up at least one decent tutorial.
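For the override approach, the usual mechanism in Magento 1 is a class rewrite declared in a custom module's config.xml. A sketch, where the Mycompany/Mymodule names and the custom class are hypothetical placeholders for your own module:

```xml
<!-- app/code/local/Mycompany/Mymodule/etc/config.xml (hypothetical names) -->
<config>
    <global>
        <models>
            <core>
                <rewrite>
                    <!-- Tell Magento to load our class instead of
                         Mage_Core_Model_Email_Template -->
                    <email_template>Mycompany_Mymodule_Model_Email_Template</email_template>
                </rewrite>
            </core>
        </models>
    </global>
</config>
```

Your replacement class would then extend the original model and override only the methods whose behavior you want to change.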
I found what I was looking for here:
(on magento site: Resources -> Magento Core API -> Product API or whichever API you want)
The problem is that there is no Order API yet (or none that I've seen).
http://www.magentocommerce.com/wiki/doc/webservices-api/api/catalog_product#examples
This details how you'd write an external PHP script to obtain, edit, or delete products (or anything else that has an API).
Modules still look daunting, but I am reading through the (very thin) Magento book (the only one available).
I hope this helps someone else.