HTML5 File API - slicing or not?

There are some nice examples about file uploading at HTML5 Rocks, but there's something that isn't clear enough for me.
As far as I can see, the example code about file slicing gets a specific part of the file and then reads it. As the note says, this is helpful when dealing with large files.
The example about monitoring uploads also notes that this is useful when uploading large files.
Am I safe without slicing the file? I mean server-side problems, memory, etc. Chrome doesn't support File.slice() currently, and I don't want to use a bloated jQuery plugin if possible.

Both Chrome and FF support File.slice(), but it has been prefixed as File.webkitSlice() / File.mozSlice() since its semantics changed some time ago. There's another example of using it here to read part of a .zip file. The new semantics are:
Blob.webkitSlice(
  in long long start,
  in long long end,
  in DOMString contentType
);
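If you want to handle the prefixes without browser sniffing, here's a minimal feature-detection sketch (assuming file is a File object from an input element; prefixed names as of the time of writing):
// pick whichever slice variant the browser exposes
var slice = File.prototype.slice ||
            File.prototype.webkitSlice ||
            File.prototype.mozSlice;
var blob = slice.call(file, 0, 1024); // first 1 KB of the file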
Are you safe without slicing it? Sure, but remember that you're reading the whole file into memory. The HTML5Rocks tutorial suggests chunking the upload as a potential performance improvement. With some decent server logic, you could also recover from a failed upload more easily. The user wouldn't have to re-try an entire 500MB file if it failed at 99% :)

This is the way to slice the file to pass as blobs:
function readBlob() {
  var files = document.getElementById('files').files;
  var file = files[0];
  var ONEMEGABYTE = 1048576;
  var start = 0;
  var stop = ONEMEGABYTE;
  var remainder = file.size % ONEMEGABYTE;
  var blkcount = Math.floor(file.size / ONEMEGABYTE);
  if (remainder != 0) blkcount = blkcount + 1;
  for (var i = 0; i < blkcount; i++) {
    var reader = new FileReader();
    // the last block may be smaller than one megabyte
    if (i == (blkcount - 1) && remainder != 0) {
      stop = start + remainder;
    }
    // slicing the file
    var blob = file.webkitSlice(start, stop);
    reader.readAsBinaryString(blob);
    start = stop;
    stop = stop + ONEMEGABYTE;
  } // end of loop
} // end of readBlob
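To actually send each slice, one option is to POST each blob separately so a failed chunk can be retried on its own. A rough sketch (the /upload endpoint and the X-Chunk-* headers are assumptions for illustration, not part of the example above):
function uploadChunk(blob, index, total, fileName) {
  var xhr = new XMLHttpRequest();
  xhr.open('POST', '/upload', true); // hypothetical server endpoint
  xhr.setRequestHeader('X-Chunk-Index', String(index)); // hypothetical headers the server
  xhr.setRequestHeader('X-Chunk-Total', String(total)); // would use to reassemble the file
  xhr.setRequestHeader('X-File-Name', fileName);
  xhr.onload = function () {
    if (xhr.status !== 200) {
      uploadChunk(blob, index, total, fileName); // retry just this chunk, not the whole file
    }
  };
  xhr.send(blob);
}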


Forge chunk upload .NET Core

I have a question about uploading large objects to a Forge bucket. I know that I need to use the /resumable API, but how can I get the file when I have only the filename? In this code, what exactly is FILE_PATH? Generally, should I save the file on the server first and then upload it to the bucket?
private static dynamic resumableUploadFile()
{
    Console.WriteLine("*****begin uploading large file");
    string path = FILE_PATH;
    if (!File.Exists(path))
        path = @"..\..\..\" + FILE_PATH;
    // total size of the file
    long fileSize = new System.IO.FileInfo(path).Length;
    // size of each piece, say 2 MB
    long chunkSize = 2 * 1024 * 1024;
    // number of pieces
    long nbChunks = (long)Math.Round(0.5 + (double)fileSize / (double)chunkSize);
    // record a global response for the next function
    ApiResponse<dynamic> finalRes = null;
    using (FileStream streamReader = new FileStream(path, FileMode.Open))
    {
        // unique id of this upload session
        string sessionId = RandomString(12);
        for (int i = 0; i < nbChunks; i++)
        {
            // start byte position of this piece
            long start = i * chunkSize;
            // end byte position of this piece; the last piece ends
            // at the end of the file
            long end = Math.Min(fileSize, (i + 1) * chunkSize) - 1;
            // tell Forge about this piece
            string range = "bytes " + start + "-" + end + "/" + fileSize;
            // length of this piece
            long length = end - start + 1;
            // read this piece from the file stream
            byte[] buffer = new byte[length];
            MemoryStream memoryStream = new MemoryStream(buffer);
            int nb = streamReader.Read(buffer, 0, (int)length);
            memoryStream.Write(buffer, 0, nb);
            memoryStream.Position = 0;
            // upload the piece to the Forge bucket
            ApiResponse<dynamic> response = objectsApi.UploadChunkWithHttpInfo(BUCKET_KEY,
                FILE_NAME, (int)length, range, sessionId, memoryStream,
                "application/octet-stream");
            finalRes = response;
            if (response.StatusCode == 202)
            {
                Console.WriteLine("one piece has been uploaded");
                continue;
            }
            else if (response.StatusCode == 200)
            {
                Console.WriteLine("the last piece has been uploaded");
            }
            else
            {
                // any error
                Console.WriteLine(response.StatusCode);
                break;
            }
        }
    }
    return finalRes;
}
FILE_PATH is the path where you stored the file on your server.
You should upload your file to your server first. Why? Because when you upload your file to the Autodesk Forge server you need an internal token, which should be kept secret (that's why you keep it on your server); you don't want someone to take that token and mess up your Forge account.
The code you pasted from this article is more about uploading from a server where the file is already stored - either for caching purposes or because the server is using/modifying those files.
As Paxton.Huynh said, FILE_PATH there contains the location on the server where the file is stored.
If you just want to upload the chunks to Forge through your server (to keep the credentials and internal access token secret), like a proxy, then it's probably better to pass those chunks straight on to Forge instead of storing the file on the server first and then passing it on, which is what the sample code you referred to is doing.
See e.g. this, though it's in NodeJS: https://github.com/Autodesk-Forge/forge-buckets-tools/blob/master/server/data.management.js#L171
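For illustration, here's a minimal Express sketch of that proxy idea. Everything here is an assumption for illustration: getInternalToken() is a hypothetical helper returning the 2-legged token, and the /resumable endpoint plus the Content-Range/Session-Id headers should be checked against the current Forge documentation:
const express = require('express');
const fetch = require('node-fetch');
const app = express();
const BUCKET_KEY = process.env.FORGE_BUCKET; // your bucket key
app.put('/api/upload/:objectName', express.raw({ type: '*/*', limit: '10mb' }), async (req, res) => {
  const token = await getInternalToken(); // hypothetical helper; the token never leaves the server
  // forward the chunk bytes and range headers straight to Forge without touching disk
  const forgeRes = await fetch(
    'https://developer.api.autodesk.com/oss/v2/buckets/' + BUCKET_KEY +
      '/objects/' + req.params.objectName + '/resumable',
    {
      method: 'PUT',
      headers: {
        'Authorization': 'Bearer ' + token,
        'Content-Range': req.get('Content-Range'),
        'Session-Id': req.get('Session-Id'),
        'Content-Type': 'application/octet-stream'
      },
      body: req.body // raw chunk bytes from the browser
    }
  );
  res.status(forgeRes.status).end();
});
app.listen(3000);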

Why does MusicProperties->Year always return the current year?

I'm trying to get the music properties for each file in the music library by using the StorageFolder APIs.
After calling GetFilesAsync(CommonFileQuery::OrderByName) on my music library, I'm iterating over the resulting IVectorView^ and calling StorageFile->Properties->GetMusicPropertiesAsync() for each file. This is inherently slow, but I have to do it this way since QueryOptions are not supported on Windows Phone for some reason.
Anyway, after completing that task every property is correct except for MusicProperties->Year, which is 2014 for every single one of well over 900 music files on my phone. Here's a short code snippet:
create_task(lib->GetFilesAsync(Search::CommonFileQuery::OrderByName))
    .then([](IVectorView<StorageFile^>^ songFiles)
{
    auto taskPtr = std::make_shared<std::vector<task<Song>>>(songFiles->Size);
    for (size_t i = 0, len = songFiles->Size; i < len; ++i)
    {
        StorageFile^ song = songFiles->GetAt(i);
        (*taskPtr)[i] = create_task(song->Properties->GetMusicPropertiesAsync())
            .then([](MusicProperties^ props)
        {
            Song s;
            s.album = std::wstring(props->Album->Data());
            s.artist = std::wstring(props->Artist->Data());
            s.title = std::wstring(props->Title->Data());
            s.track = props->TrackNumber;
            s.year = props->Year;
            return s;
        });
    }
    // further processing is done in a when_all continuation after the song tasks have completed
});
Song is just a plain struct to save my result temporarily and convert it to JSON later on, but that Year property is freaking me out. Has anybody else encountered this issue, and is there any other way to retrieve the proper release year from a music file?

HTML5 FileReader API crashes chrome 17 when reading large file as slice

I'm trying to read a large file (3 GB) in 100 MB slices.
function sliceMe() {
  var file = document.getElementById('files').files[0],
      fr = new FileReader;
  var chunkSize = document.getElementById('txtSize').value;
  chunkSize = 1048576;
  var chunks = Math.ceil(file.size / chunkSize);
  var chunk = 0;
  document.getElementById('byte_range').innerHTML = "";
  function loadNext() {
    var start, end,
        blobSlice = File.prototype.mozSlice || File.prototype.webkitSlice;
    start = chunk * chunkSize;
    if (start > file.size)
      start = end + 1;
    end = start + (chunkSize - 1) >= file.size ? file.size : start + (chunkSize - 1);
    fr.onload = function(e) {
      if (++chunk <= chunks) {
        document.getElementById('byte_range').innerHTML += chunk + " " +
          ['Read bytes: ', start, ' - ', end,
           ' of ', file.size, ' byte file'].join('') + "<br>";
        //console.info(chunk);
        loadNext(); // shortcut here
      }
    };
    fr.readAsArrayBuffer(blobSlice.call(file, start, end));
  }
  loadNext();
}
The code above works as expected in Firefox and in Chrome 16, but in the Chrome 17 and 18 dev versions the browser crashes after reading 1 GB of data.
Is this a known issue in Chrome 17?
I had the same problem reading in a 1.8 GB file. Watching the task manager, chrome.exe would take up to 1.5 GB of memory and then crash. My solution was to use a JavaScript worker with FileReaderSync instead of FileReader. The worker runs in a separate thread, and FileReaderSync only works inside a worker.
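A minimal sketch of that approach (assuming the vendor-prefixed slice methods of that era; FileReaderSync exists only inside workers):
// worker.js - reads the file chunk by chunk off the main thread
self.onmessage = function (e) {
  var file = e.data;
  var chunkSize = 1048576; // 1 MB per read
  var slice = file.slice || file.webkitSlice || file.mozSlice;
  var reader = new FileReaderSync();
  for (var start = 0; start < file.size; start += chunkSize) {
    var end = Math.min(start + chunkSize, file.size);
    var buf = reader.readAsArrayBuffer(slice.call(file, start, end));
    // ...process buf; each chunk can be garbage-collected before the next read
  }
  self.postMessage('done');
};
// main page - a File object can be passed to a worker via postMessage
var worker = new Worker('worker.js');
worker.postMessage(document.getElementById('files').files[0]);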
You need to change your algorithm so that it adjusts the chunk size at run time according to the file size; Google Chrome crashes when the loop runs continuously.

LoaderMax Error opening image URL from wordpress

Error opening URL 'http://test.myweb.com/wp-content/uploads/2010/12/premi_logo.jpg ?purpose=audit&gsCacheBusterID=1313725194183'
var picloader:LoaderMax = new LoaderMax({name:"mainQueue", onProgress:EventLoader, onComplete:ImageLoaded});
for (var i:uint; i < news_xml.news.length(); i++)
{
    picloader.append(new ImageLoader(news_xml.news[i].imagelink.@path, {name:'photo_' + i}));
}
//picloader.auditSize = false;
picloader.load();
It looks like maybe you have a space (" ") at the end of the URL that you're passing into the ImageLoader.
Also, if you want to avoid the file size auditing, you can set LoaderMax.defaultAuditSize = false. There are some tips and tricks at http://www.greensock.com/loadermax-tips/ and dedicated forums at http://forums.greensock.com

Methods for deleting blank (or nearly blank) pages from TIFF files

I have something like 40 million TIFF documents, all 1-bit single page duplex. In about 40% of cases, the back image of these TIFFs is 'blank' and I'd like to remove them before I do a load to a CMS to reduce space requirements.
Is there a simple method to look at the data content of each page and delete it if it falls under a preset threshold, say 2% 'black'?
I'm technology agnostic on this one, but a C# solution would probably be the easiest to support. Problem is, I've no image manipulation experience so don't really know where to start.
Edit to add: The images are old scans and so are 'dirty', so this is not expected to be an exact science. The threshold would need to be set to avoid the chance of false positives.
You probably should:
open each image
iterate through its pages (using Bitmap.GetFrameCount / Bitmap.SelectActiveFrame methods)
access bits of each page (using Bitmap.LockBits method)
analyze contents of each page (simple loop)
if the contents are worthwhile then copy the data to another image (Bitmap.LockBits and a loop)
This task isn't particularly complex but will require some code to be written. This site contains some samples that you may search for, using the method names as keywords.
P.S. I assume that all of the images can be successfully loaded into a System.Drawing.Bitmap.
You can do something like that with DotImage (disclaimer, I work for Atalasoft and have written most of the underlying classes that you'd be using). The code to do it will look something like this:
public void RemoveBlankPages(Stream stm)
{
    List<int> blanks = new List<int>();
    if (GetBlankPages(stm, blanks)) {
        // all pages blank - delete file? Skip? Your choice.
    }
    else {
        // memory stream is convenient - maybe a temp file instead?
        using (MemoryStream ostm = new MemoryStream()) {
            // pulls out all the blanks and writes to the temp stream
            stm.Seek(0, SeekOrigin.Begin);
            RemoveBlanks(blanks, stm, ostm);
            CopyStream(ostm, stm); // copies first stm to second, truncating at end
        }
    }
}
private bool GetBlankPages(Stream stm, List<int> blanks)
{
    TiffDecoder decoder = new TiffDecoder();
    ImageInfo info = decoder.GetImageInfo(stm);
    for (int i = 0; i < info.FrameCount; i++) {
        try {
            stm.Seek(0, SeekOrigin.Begin);
            using (AtalaImage image = decoder.Read(stm, i, null)) {
                if (IsBlankPage(image)) blanks.Add(i);
            }
        }
        catch {
            // bad page - skip? could also try to remove the bad page:
            blanks.Add(i);
        }
    }
    return blanks.Count == info.FrameCount;
}
private bool IsBlankPage(AtalaImage image)
{
    // you might want to configure the command to do noise removal and black border
    // removal (or not) first.
    BlankPageDetectionCommand command = new BlankPageDetectionCommand();
    BlankPageDetectionResults results = command.Apply(image) as BlankPageDetectionResults;
    return results.IsImageBlank;
}
private void RemoveBlanks(List<int> blanks, Stream source, Stream dest)
{
    // blanks needs to be sorted low to high, which it will be if generated from above
    TiffDocument doc = new TiffDocument(source);
    int totalRemoved = 0;
    foreach (int page in blanks) {
        doc.Pages.RemoveAt(page - totalRemoved);
        totalRemoved++;
    }
    doc.Save(dest);
}
You should note that blank page detection is not as simple as "are all the pixels white(-ish)?" since scanning introduces all kinds of interesting artifacts. To get the BlankPageDetectionCommand, you would need the Document Imaging package.
Are you interested in shrinking the files or just want to avoid people wasting their time viewing blank pages? You can do a quick and dirty edit of the files to rid yourself of known blank pages by just patching the second IFD to be 0x00000000. Here's what I mean - TIFF files have a simple layout if you're just navigating through the pages:
TIFF Header (4 bytes)
First IFD offset (4 bytes - typically points to 0x00000008)
IFD:
    Number of tags (2 bytes)
    {individual TIFF tags} (12 bytes each)
    Next IFD offset (4 bytes)
Just patch the "next IFD offset" to a value of 0x00000000 to "unlink" pages beyond the current one.
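For example, here's a quick sketch of that patch in Node (since the question is technology agnostic; assumes classic TIFF rather than BigTIFF, and patches the file in place):
const fs = require('fs');
function unlinkPagesAfterFirst(path) {
  const fd = fs.openSync(path, 'r+');
  const header = Buffer.alloc(8);
  fs.readSync(fd, header, 0, 8, 0);
  // bytes 0-1: byte order ("II" = little-endian, "MM" = big-endian)
  const little = header.toString('ascii', 0, 2) === 'II';
  const readU32 = (buf, off) => little ? buf.readUInt32LE(off) : buf.readUInt32BE(off);
  const readU16 = (buf, off) => little ? buf.readUInt16LE(off) : buf.readUInt16BE(off);
  // bytes 4-7: offset of the first IFD
  const ifdOffset = readU32(header, 4);
  // the IFD starts with a 2-byte tag count; each tag entry is 12 bytes
  const countBuf = Buffer.alloc(2);
  fs.readSync(fd, countBuf, 0, 2, ifdOffset);
  const tagCount = readU16(countBuf, 0);
  // the 4-byte "next IFD offset" follows the tag array; zero it to unlink page 2 onward
  const nextIfdPos = ifdOffset + 2 + tagCount * 12;
  fs.writeSync(fd, Buffer.alloc(4), 0, 4, nextIfdPos);
  fs.closeSync(fd);
}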