How to organize more than 40,000 mp3 files in WP8? - windows-phone-8

I want to ship about 45,000 mp3 files with my WP8 app.
All of them are sound effects smaller than 5 KB each,
but together they add up to more than 150 MB.
I can think of two ways to store them:
Store each mp3 as a separate file. Logically this needs just over 150 MB, but on disk it actually takes more than 220 MB.
Pack all of them into one binary file, maybe with a structure like this:
first 4 bytes: length of the mp3 file name;
next byte[]: the mp3 file name;
next 4 bytes: length of the mp3 data;
next byte[]: the actual mp3 content;
and repeat this, appending every file to the one blob.
This needs only about 150 MB, but then I have to seek to the position of each mp3 file myself.
Which one do you think is better? I prefer the second solution, but I haven't found any API that can seek from offset 0 up to offset 150*1024*1024, and maybe this will raise performance issues.

I'd use the second option, together with an index, and a BinaryReader. You can seek to a position in the file, something like this:
byte[] mp3File;

// Get the combined file.
var file = await dataFolder.OpenStreamForReadAsync("combined_mp3s.bin");
using (var binReader = new BinaryReader(file))
{
    // Seek to the entry's offset (computed from your index / your math).
    binReader.BaseStream.Seek(offset, SeekOrigin.Begin);

    var fileName = binReader.ReadString();   // length-prefixed string read
    // or you could read the length on your own and then read the characters

    var mp3FileLen = binReader.ReadInt32();
    mp3File = binReader.ReadBytes(mp3FileLen);
}
I would suggest you keep a hash/dictionary of the file names and the starting position of each file's data, stored separately, so your application can quickly locate the contents of a file without doing a scan.
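As a rough sketch of that idea (the index file name and its layout here are just assumptions, not an existing API): suppose that while concatenating the mp3s you also write a small "combined_mp3s.idx" file containing, per entry, a length-prefixed name followed by an Int64 offset. Loading it into a dictionary then looks something like this:
var index = new Dictionary<string, long>();

using (var idxStream = await dataFolder.OpenStreamForReadAsync("combined_mp3s.idx"))
using (var idxReader = new BinaryReader(idxStream))
{
    while (idxReader.BaseStream.Position < idxReader.BaseStream.Length)
    {
        string name = idxReader.ReadString();   // length-prefixed, as BinaryWriter.Write(string) produces
        long offset = idxReader.ReadInt64();    // where this mp3's entry starts in combined_mp3s.bin
        index[name] = offset;
    }
}

// Later, look up the offset and seek straight to it with the BinaryReader code above:
// long offset = index["some_effect.mp3"];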
Downloading
You may also want to consider breaking the huge file into smaller files for a better download experience. You could append to the combined file as the contents of a group or of an individual file become available on the phone.

The problem with your second alternative is that by avoiding the filesystem, you lose a great deal of convenience that the filesystem would afford you.
Perhaps most importantly, with your proposed data structure you can no longer retrieve a file with a particular name without scanning (potentially) the entire blob.
The extra space in the filesystem is probably well used.

Related

PsychoPy: how to avoid storing variables in the CSV file?

When I run my PsychoPy experiment, PsychoPy saves a CSV file that contains my trials and the values of my variables.
Among these, there are some variables I would like NOT to be included. There are some variables I decided to include in the CSV, but many others automatically ended up in it.
Is there a way to manually force (from the code block) the exclusion of some variables from the CSV?
Is there a way to decide the order of the saved columns/variables in the CSV?
It is not really important, and I know I could just create an output file myself without using the one from PsychoPy, or easily clean it up afterwards, but I was just curious.
PsychoPy spits out all the variables it thinks you could need. If you want to drop some of them, that is a task for the analysis stage, and is easily done in any processing pipeline. Unless you are analysing data in a spreadsheet (which you really shouldn't), the number of columns in the output file shouldn't really be an issue. The philosophy is that you shouldn't back yourself into a corner by discarding data at the recording stage - what about the reviewer who asks about the influence of a variable that you didn't think was important?
If you are using the Builder interface, the saving of onset & offset times for each component is optional, and is controlled in the "data" tab of each component dialog.
The order of variables is also not under direct control of the user, but again, can be easily manipulated at the analysis stage.
As you note, you can of course write code to save custom output files of your own design.
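If you do end up trimming the file after the fact, it is only a few lines with pandas; this is a minimal sketch, and the file and column names are made up:
import pandas as pd

df = pd.read_csv("my_experiment.csv")                     # the CSV PsychoPy saved
df = df.drop(columns=["frameRate", "expName"])            # drop the variables you don't want to keep
df = df[["participant", "trials.thisN", "key_resp.rt"]]   # select and reorder the columns you do want
df.to_csv("my_experiment_clean.csv", index=False)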
There is a special block called session_variable_order: [var1, var2, var3] in the experiment_config.yaml file, which you probably should be using; also, consider these methods (called on your ExperimentHandler / TrialHandler instances):
from psychopy import data
data.ExperimentHandler.saveAsWideText(fileName='exp_handler.csv', delim='\t', sortColumns=False, encoding='utf-8')
data.TrialHandler.saveAsText(fileName='trial_handler.txt', delim=',', encoding='utf-8', dataOut=('n', 'all_mean', 'all_raw'), summarised=False)
Notice the sortColumns and dataOut params.

Can I modify the total number of pages in a multi-page TIFF?

I am receiving data from a camera and saving each image as a page in a multi-page tiff. I can set that each file has e.g. 100 pages and I am calling:
TIFFSetField(out, TIFFTAG_PAGENUMBER, page_number, total_pages);
However, if I am unable to write data to disk fast enough, I will stop the acquisition. At that point I may have written only 50 out of 100 pages into the multi-page tiff. The file then reports the total number of pages as 100, but only 50 pages have actually been written. Some applications will report 100 pages, but for pages 51-100 there will be no data and the images will appear black.
Therefore I need to update the total_pages value, at the moment I stop writing to disk, to the number of the last page actually written.
Can this be done at all? Is the total_pages value written once into a common header which I could update and fix the file in this way, or is this value written into each page which means I would have to edit each page that has already been written to disk? Or is there any better approach how to handle this?
Actually, the solution is quite simple.
Once your image stream has ended and before you close the file, you have to iterate through all the directories (images in the multi-page tiff) and update the TIFFTAG_PAGENUMBER to the total pages written.
The catch is you have to do it before you close the tiff by calling TIFFClose. Once a TIFF is closed, its TAGS cannot be edited anymore. (see http://www.libtiff.org/libtiff.html):
Note that unlike the stdio library TIFF image files may not be opened for both reading and writing; there is no support for altering the contents of a TIFF file.
if (pagesTotal - pagesWritten > 0)
{
    for (int i = 0; i < pagesWritten; i++)
    {
        int retVal = TIFFSetDirectory(out, i);
        retVal = TIFFSetField(out, TIFFTAG_PAGENUMBER, i, pagesWritten);
        retVal = TIFFWriteDirectory(out);
    }
}
TIFFClose(out);
TIFFClose(out);
pagesTotal is the number of pages that we intended to write into this multi-page file
pagesWritten is the number of pages that we actually wrote into the file

Copying fits-file data and/or header into a new fits-file

A similar question was asked before, but it was asked in an ambiguous way and used different code.
My problem: I want to make an exact copy of a .fits file header into a new file. (I need to process a FITS file in such a way that I change the data, keep the header the same, and save the result in a new file.) Here is a short example, just demonstrating the tools I use and the discrepancy I arrive at:
data_old, header_old = fits.getdata("input_file.fits", header=True)
fits.writeto('output_file.fits', data_old, header_old, overwrite=True)
I would now expect the files to be exact copies of each other (headers and data of both being the same). But if I check for differences, e.g. in this way -
fits.printdiff("input_file.fits", "output_file.fits")
I see that the two files are not exact copies of each other. The report says:
...
Files contain different numbers of HDUs:
a: 3
b: 2
Primary HDU:
Headers contain differences:
Headers have different number of cards:
a: 54
b: 4
...
Extension HDU 1:
Headers contain differences:
Keyword GCOUNT has different comments:
...
Why is there no exact copy? How can I make an exact copy of a header (and/or the data)? Is a keyword being forgotten? Is there an alternative, simple way of copy-pasting a FITS file header?
If you just want to update the data array in an existing file while preserving the rest of the structure, have you tried the update function?
The only issue with that is it doesn't appear to have an option to write to a new file rather than update the existing file (maybe it should have this option). However, you can still use it by first copying the existing file, and then updating the copy.
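Something along those lines might look like this (a sketch; new_data stands in for your processed array):
import shutil
from astropy.io import fits

# Make a byte-for-byte copy first, so every HDU and header card is preserved ...
shutil.copyfile("input_file.fits", "output_file.fits")

# ... then replace only the data of the HDU you processed (here the primary HDU, ext=0).
fits.update("output_file.fits", new_data, ext=0)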
Alternatively, you can do things more directly using the object-oriented API. Something like:
with fits.open(filename) as hdu_list:
    hdu = hdu_list[<name or index of the HDU to update>]
    hdu.data = <new ndarray>
    # or hdu.data[<some index>] = <some value>, i.e. just directly modify the existing array
    hdu.writeto('updated.fits')  # to write just that HDU to a new file, or
    # hdu_list.writeto('updated.fits')  # to write all HDUs, including the updated one, to a new file
There's nothing not "pythonic" about this :)

WinRT: Reading and deserializing a large number of files takes too much time

I have a Windows Store application which manages a collection of objects and stores them in the application's local folder. Those objects are serialized on the file system using JSON. As I need to be able to edit and persist those items individually, I opted for individual files for each object instead of one large file. Objects are stored following this pattern:
Local Folder
|
--- db
|
--- AB283376-7057-46B4-8B91-C32E663EC964
| |
| --- AB283376-7057-46B4-8B91-C32E663EC964.json
| --- AB283376-7057-46B4-8B91-C32E663EC964.jpg
|
--- B506EFC5-E853-45E6-BA32-64193BB49ACD
| |
| --- B506EFC5-E853-45E6-BA32-64193BB49ACD.json
| --- B506EFC5-E853-45E6-BA32-64193BB49ACD.jpg
|
...
Each object has its own folder node, which contains the JSON-serialized object and any other associated resources.
Everything was fine when I ran some write, read and delete tests. Where it got complicated is when I tried to load a large collection of objects at application startup. I estimated the largest number of items one would store at 10,000. So I wrote 10,000 entries and then tried to load them... it took the application more than 3 minutes to complete the operation, which of course is unacceptable.
So my questions are: What could be optimized in the code I wrote for reading and deserializing objects (below)? Is there a way to implement a paging system so loading would be dynamic in my WinRT application? Is my storage method (the pattern above) too heavy in terms of IO/CPU? Am I missing something in WinRT?
public async Task<IEnumerable<Release>> GetReleases()
{
    List<Release> items = new List<Release>();
    var dbFolder = await ApplicationData.Current.LocalFolder.CreateFolderAsync(dbName, CreationCollisionOption.OpenIfExists);

    foreach (var releaseFolder in await dbFolder.GetFoldersAsync())
    {
        var releaseFile = await releaseFolder.GetFileAsync(releaseFolder.DisplayName + ".json");
        var stream = await releaseFile.OpenAsync(FileAccessMode.Read);
        using (var inStream = stream.GetInputStreamAt(0))
        {
            DataContractJsonSerializer serializer = new DataContractJsonSerializer(typeof(Release));
            Release release = (Release)serializer.ReadObject(inStream.AsStreamForRead());
            items.Add(release);
        }
        stream.Dispose();
    }
    return items;
}
Thanks for your help.
NB: I already had a look at SQLite and I don't need such a sophisticated system.
Supposedly JSON.NET is better than the built-in serializers. If you are not sending the data over the wire, then the quickest way is binary serialization rather than JSON or XML. Finally, think about whether you really need to load all the data when your application starts. Serialize your data as a list of binary records and create an index that will allow you to quickly jump to the range of records you actually need to use.
As Filip already mentioned, you probably don't need to load all data at startup. Even if you really want to show all the items in the first page (showing 10,000 items at once to a user doesn't sound like a good idea to me), you don't need to have all their properties available: usually only a couple of them are shown in the list, you need the rest of them when the user navigates to individual item details. You could have a separate "index" file containing only the data you need for the list. This does mean duplication, but it will help you with performance.
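A rough sketch of that idea, assuming a hypothetical ReleaseSummary type that holds only the fields the list page shows, all serialized into a single db/index.json that is read once at startup (full Release objects are then loaded lazily when an item is opened):
[DataContract]
public class ReleaseSummary
{
    [DataMember] public string Id { get; set; }      // the folder/file name, e.g. "AB283376-..."
    [DataMember] public string Title { get; set; }   // whatever the list actually displays
}

public async Task<List<ReleaseSummary>> GetReleaseIndexAsync()
{
    var dbFolder = await ApplicationData.Current.LocalFolder.CreateFolderAsync(dbName, CreationCollisionOption.OpenIfExists);
    var indexFile = await dbFolder.GetFileAsync("index.json");

    using (var stream = await indexFile.OpenStreamForReadAsync())
    {
        var serializer = new DataContractJsonSerializer(typeof(List<ReleaseSummary>));
        return (List<ReleaseSummary>)serializer.ReadObject(stream);   // one file, one read
    }
}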
Although you've mentioned that you don't need SQLite because it is too sophisticated for your needs, you really should take a closer look at it. It is designed to efficiently handle structured data such as yours. I'm pretty sure that if you switch to it, the performance will be much better and your code might even end up simpler. Try it out.
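For a sense of how little code that is, here is a sketch using the sqlite-net library (written from memory; the Release columns shown are made up):
public class Release
{
    [PrimaryKey] public string Id { get; set; }
    public string Title { get; set; }
    public string Json { get; set; }   // or map the real properties to individual columns
}

var db = new SQLiteConnection(Path.Combine(ApplicationData.Current.LocalFolder.Path, "releases.db"));
db.CreateTable<Release>();

// Load one page of 50 items instead of deserializing 10,000 files at startup.
var page = db.Table<Release>().Skip(0).Take(50).ToList();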

Construct an Iterator

Let's say you want to construct an Iterator that spits out File objects. What type of data do you usually provide to the constructor of such an Iterator?
an array of pre-constructed File objects, or
simply raw data (a multidimensional array for instance), and let the Iterator create File objects on the fly when iterated through?
Edit:
Although my question was actually meant to be as general as possible, it seems my example is a bit too broad to tackle in general terms, so I'll elaborate a bit more. The File objects I'm talking about are actually file references from a database. See these two tables:
folder
| id | folderId | name |
------------------------------------
| 1 | null | downloads |
file
| id | folderId | name |
------------------------------------
| 1 | 1 | instructions.pdf |
They reference actual folders and files on a filesystem.
Now, I created a FileManager object. This will be able to return a listing of folders and files. For instance:
FileManager::listFiles( Folder $folder );
... would return an Iterator of File objects (or, come to think of it, rather FileReference objects) from the database.
So what my question boils down to is:
If the FileManager object constructs the Iterator in listFiles(), would you do something like this (pseudo code):
listFiles( Folder $folder )
{
    // let's assume the following returns a multidimensional array of rows
    $filesData = $db->fetch( $sqlForFetchingFilesFromFolder );

    // let the Iterator take care of constructing the FileReference objects with each iteration
    return FileIterator( $filesData );
}
or (pseudo code):
listFiles( Folder $folder )
{
    // let's assume the following returns a multidimensional array of rows
    $filesData = $db->fetch( $sqlForFetchingFilesFromFolder );

    $files = array();
    for each( $filesData as $fileData )
    {
        $files.push( new FileReference( $fileData ) );
    }

    // provide the Iterator with precomposed FileReference objects
    return FileIterator( $files );
}
Hope this clarifies things a bit.
What is your "File" object meant to be? An open handle to a file, or a representation of a file system path which can be opened in turn?
It would generally be a bad idea to open all the files at once - after all, part of the point of using an iterator is that you only access one object at a time. Your iterator could yield one open file at a time, and let the caller take responsibility for closing it, although again that might be slightly odd to use.
Your requirements aren't clear, to be honest - in my experience, most iterators which yield a series of files use something like Directory.GetFiles(pattern) - you don't pass them the raw data at all, you pass them something which they can use to find the data for you.
It's not obvious what you're trying to get at - it feels like you're trying to ask a general question, but you haven't provided enough information to let us advise you. It's like asking, "Do I want to use a string or an integer?" without giving any context.
EDIT: I would probably push all of that logic into FileIterator, personally. Otherwise it's hard to see what value it's really providing. In a language like C# or Python you wouldn't need a separate class in the first place - you'd just use a generator of some description. In that sense this question isn't language agnostic :(
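For instance, in C# the lazy variant of listFiles() collapses into a single iterator method; this is only a sketch, and FetchFilesForFolder is a made-up data-access call:
public IEnumerable<FileReference> ListFiles(Folder folder)
{
    // Fetch the raw rows for this folder from the database.
    foreach (var fileData in db.FetchFilesForFolder(folder))
    {
        // A FileReference is only constructed when the caller actually iterates this far.
        yield return new FileReference(fileData);
    }
}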
What exactly is your iterator supposed to do? Write data to files? Create them?
An iterator is a pattern for iterating through data, which means providing sequential access to data in a uniform way, not mutating it.
I find the question to be unclear.
Are we talking Iterator or Factory?
To me, an Iterator operates on a pre-existing collection of things and allows the caller to work on each thing in turn.
When you say "spits out", do you mean it lets the client work with one file from a pre-existing set of files, or do you mean that you are iterating over some data and intend to store that data in files you are generating? If we are generating, then we've got a File factory.
My guess is that you are intending to process some files in a file system. I think that your Iterator is akin to a Directory: it can give you the next file it knows about. So I construct the "Directory" by passing enough data to let it know which files you mean (could be just an OS path, could be some kind of "find" expression, a list of ftp-like references, etc.) and expect it to give me the next File as I iterate.
----updated following question clarification
I think that the key question here is when the individual files should be opened. The Iterator itself will reasonably return a File object corresponding to an open file handle, and the caller can then just work with the file. But internally, should the iterator be working against a list of pre-opened files, or against a list of file references, with each file being opened as the iterator's next() is called?
I think we should do the latter, because there is overhead in having an open file, hence we should open the files only when we need them.
That leads to one other point: who closes the file? We can't afford to keep them all open. Perhaps the iterator should close each file as next() is called. This implies that the iterator itself needs a close() method to allow tidy-up of the currently open file. Alternatively, we need to explicitly document that closing is the client's responsibility.