Construct an Iterator - language-agnostic

Let's say you want to construct an Iterator that spits out File objects. What type of data do you usually provide to the constructor of such an Iterator?
an array of pre-constructed File objects, or
simply raw data (a multidimensional array, for instance), and let the Iterator create File objects on the fly as it is iterated over?
Edit:
Although my question was actually meant to be as general as possible, it seems my example is a bit too broad to tackle in general terms, so I'll elaborate a bit more. The File objects I'm talking about are actually file references from a database. See these two tables:
folder
| id | folderId | name |
------------------------------------
| 1 | null | downloads |
file
| id | folderId | name |
------------------------------------
| 1 | 1 | instructions.pdf |
They reference actual folders and files on a filesystem.
Now, I created a FileManager object. This will be able to return a listing of folders and files. For instance:
FileManager::listFiles( Folder $folder );
... would return an Iterator of File objects (or, come to think of it, rather FileReference objects) from the database.
So what my question boils down to is:
If the FileManager object constructs the Iterator in listFiles(), would you do something like this (pseudo code):
listFiles( Folder $folder )
{
    // let's assume the following returns a multidimensional array of rows
    $filesData = $db->fetch( $sqlForFetchingFilesFromFolder );
    // let the Iterator take care of constructing the FileReference objects with each iteration
    return new FileIterator( $filesData );
}
or (pseudo code):
listFiles( Folder $folder )
{
    // let's assume the following returns a multidimensional array of rows
    $filesData = $db->fetch( $sqlForFetchingFilesFromFolder );
    $files = array();
    foreach( $filesData as $fileData )
    {
        $files[] = new FileReference( $fileData );
    }
    // provide the Iterator with precomposed FileReference objects
    return new FileIterator( $files );
}
Hope this clarifies things a bit.

What is your "File" object meant to be? An open handle to a file, or a representation of a file system path which can be opened in turn?
It would generally be a bad idea to open all the files at once - after all, part of the point of using an iterator is that you only access one object at a time. Your iterator could yield one open file at a time, and let the caller take responsibility for closing it, although again that might be slightly odd to use.
Your requirements aren't clear, to be honest - in my experience, most iterators which yield a series of files use something like Directory.GetFiles(pattern) - you don't pass them the raw data at all, you pass them something which they can use to find the data for you.
It's not obvious what you're trying to get at - it feels like you're trying to ask a general question, but you haven't provided enough information to let us advise you. It's like asking, "Do I want to use a string or an integer?" without giving any context.
EDIT: I would probably push all of that logic into FileIterator, personally. Otherwise it's hard to see what value it's really providing. In a language like C# or Python you wouldn't need a separate class in the first place - you'd just use a generator of some description. In that sense this question isn't language agnostic :(
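A rough sketch of what that generator might look like in Python (purely illustrative; the fetch_file_rows call and the FileReference fields are placeholders, not part of the original question):

class FileReference:
    def __init__(self, row):
        # row is assumed to be a dict like {"id": 1, "folderId": 1, "name": "instructions.pdf"}
        self.id = row["id"]
        self.folder_id = row["folderId"]
        self.name = row["name"]

def list_files(folder_id, db):
    # db.fetch_file_rows stands in for whatever query returns the file rows
    for row in db.fetch_file_rows(folder_id):
        yield FileReference(row)  # constructed lazily, one per iteration

# caller:
# for ref in list_files(1, db):
#     print(ref.name)

The generator is both the iterator and the place where the construction logic lives, which is what "push all of that logic into FileIterator" amounts to here.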

What exactly is your iterator supposed to do? Write data to files? Create them?
An iterator is a pattern for iterating through data, which means providing sequential access to data in a uniform way, not mutating it.

I find the question to be unclear.
Are we talking Iterator or Factory?
To me an Iterator is operating on a pre-existing collection of things and allows the caller to work on each thing in turn.
When you say "Spits Out" do you mean allows the client to work with one file from a pre-existing set of files or do you mean that you are iterating some data and intend to store that data in files you are generting. If we are geneating, then we've got a File factory.
My guess is that you are intending to process some files in a file system. I think that your Iterator is akin to a Directory: it can give you the next file it knows about. So I construct the "Directory" by passing enough data to allow it to know which files you mean (could be just an OS path, could be some kind of "find" expression, a list of ftp-like references, etc.) and expect it to give me the next File as I iterate.
----updated following question clarification
I think that the key question here is when the individual files should be opened. The Iterator itself will reasonably return a File object corresponding to an open file handle, and the caller can then just work with the file. But internally, should the iterator be working against a list of pre-opened files or a list of file references, with each file being opened as the iterator's next() is used?
I think we should do the latter, because there is overhead in having an open file, hence we should open the files only when we need them.
That leads to one other point: who closes the file? We can't afford to keep them all open. Perhaps the iterator should close each file as next() is called. This implies that the iterator itself needs a close() method to allow tidy-up of the currently open file. Alternatively, we need to explicitly document that closing is the client's responsibility.
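A small sketch of that lazy open-and-close behaviour (Python, purely illustrative; the file paths and the processing loop are hypothetical):

def open_files(paths):
    for path in paths:
        handle = open(path, "rb")  # opened only when the iteration reaches it
        try:
            yield handle
        finally:
            handle.close()  # closed as soon as the caller asks for the next file

# only one file is open at any moment:
# for f in open_files(["instructions.pdf", "other.pdf"]):
#     process(f)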

Related

interpreting a json string

I have an object in my database, following a file upload, that looks like this:
a:1:{s:4:"file";a:3:{s:7:"success";b:1;s:8:"file_url";a:2:{i:0;s:75:"http://landlordsplaces.com/wp-content/uploads/2021/01/23192643-threepersons.jpg";i:1;s:103:"http://landlordsplaces.com/wp-content/uploads/2021/01/364223-two-female-stick-figures.jpg";}s:9:"file_path";a:2:{i:0;s:93:"/var/www/vhosts/landlordsplaces.com/httpdocs/wp-content/uploads/2021/01/23192643-threepersons.jpg";i:1;s:121:"/var/www/vhosts/landlordsangel.com/httpdocs/wp-content/uploads/2021/01/364223-two-female-stick-figures.jpg";}}}
I am trying, with no success, to extract the two jpg URLs programmatically from the object so I can show the images on the site. Tried assigning parse(object) but that isn't helping. I just need to get the URLs out.
Thank you in anticipation of any general direction
What you're looking at is not a JSON string. It is a serialized PHP object. If this database entry was created by Forminator, you should use the Forminator API to retrieve the needed form entry. Its get_entry method is, I suspect, what you're looking for (I have never used Forminator), but in any case, you should look for a method that will return that database entry as a PHP object containing your needed URLs.
In case it is ever of any help to anyone: the answer to the question was based on John's input. The API has the classes to handle that without needing to understand the data structure.
Forminator_API::initialize();
$form_id = 1449; // ID of a form
$entry_id = 3; // ID of an entry
$entry = Forminator_API::get_entry( $form_id, $entry_id );
$file_url = $entry->meta_data['upload-1']['value']['file']['file_url'];
$file_path = $entry->meta_data['upload-1']['value']['file']['file_path'];
var_dump($entry); //contains paths and urls
Hope someone benefits.

PsychoPy: how to avoid storing variables in the CSV file?

When I run my PsychoPy experiment, PsychoPy saves a CSV file that contains my trials and the values of my variables.
Among these are some variables I would like NOT to be included. There are some variables which I deliberately included in the CSV, but many others automatically ended up in it.
Is there a way to manually force (from the code block) the exclusion of some variables from the CSV?
Is there a way to decide the order of the saved columns/variables in the CSV?
It is not really important, and I know I could just create my own output file without using the one from PsychoPy, or easily clean it up afterwards, but I was just curious.
PsychoPy spits out all the variables it thinks you could need. If you want to drop some of them, that is a task for the analysis stage, and is easily done in any processing pipeline. Unless you are analysing data in a spreadsheet (which you really shouldn't), the number of columns in the output file shouldn't really be an issue. The philosophy is that you shouldn't back yourself into a corner by discarding data at the recording stage - what about the reviewer who asks about the influence of a variable that you didn't think was important?
If you are using the Builder interface, the saving of onset & offset times for each component is optional, and is controlled in the "data" tab of each component dialog.
The order of variables is also not under direct control of the user, but again, can be easily manipulated at the analysis stage.
As you note, you can of course write code to save custom output files of your own design.
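To give a concrete picture of the analysis-stage route, dropping and reordering columns afterwards takes only a few lines in a pandas-based pipeline (a sketch only; the file and column names below are invented):

import pandas as pd

df = pd.read_csv("participant_01.csv")

# drop columns you do not care about, ignoring any that happen to be absent
df = df.drop(columns=["frameRate", "expName"], errors="ignore")

# put the interesting columns first and keep the rest in their original order
first = ["participant", "trial_type", "response", "rt"]
df = df[[c for c in first if c in df.columns] +
        [c for c in df.columns if c not in first]]

df.to_csv("participant_01_clean.csv", index=False)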
There is a special block called session_variable_order: [var1, var2, var3] in the experiment_config.yaml file, which you probably should be using; also, you should consider these methods:
from psychopy import data
data.ExperimentHandler.saveAsWideText(fileName = 'exp_handler.csv', delim='\t', sortColumns = False, encoding = 'utf-8')
data.TrialHandler.saveAsText(fileName = 'trial_handler.txt', delim=',', encoding = 'utf-8', dataOut = ('n', 'all_mean', 'all_raw'), summarised = False)
notice the sortColumns and dataOut params

Program Structure to meet Martin's Clean Code Function Criteria

I have been reading Clean Code by Robert C. Martin and have a basic (but fundamental) question about functions and program structure.
The book emphasizes that functions should:
be brief (like 10 lines, or less)
do one, and only one, thing
I am a little unclear on how to apply this in practice. For example, I am developing a program to:
load a baseline text file
parse baseline text file
load a test text file
parse test text file
compare parsed test with parsed baseline
aggregate results
I have tried two approaches, but neither seems to meet Martin's criteria:
APPROACH 1
set up a main() function that centrally commands the other functions in the workflow. But then main() can end up being very long (violates #1) and is obviously doing many things (violates #2). Something like this:
main()
{
    // manage steps, one at a time, from start to finish
    baseFile = loadFile("baseline.txt");
    parsedBaseline = parseFile(baseFile);
    testFile = loadFile("test.txt");
    parsedTest = parseFile(testFile);
    comparisonResults = compareFiles(parsedBaseline, parsedTest);
    aggregateResults(comparisonResults);
}
APPROACH 2
use main() to trigger a function "cascade". But each function calls a dependency, so it still seems like each is doing more than one thing (violates #2?). For example, calling the aggregation function internally triggers the results comparison. The flow also seems backwards, as it starts with the end goal and calls dependencies as it goes. Something like this:
main()
{
    // trigger end result, and let functions internally manage
    aggregateResults("baseline.txt", "test.txt");
}

aggregateResults(baseFile, testFile)
{
    comparisonResults = compareFiles(baseFile, testFile);
    // aggregate results here
    return theAggregatedResult;
}

compareFiles(baseFile, testFile)
{
    parsedBase = parseFile(baseFile);
    parsedTest = parseFile(testFile);
    // compare parsed files here
    return theFileComparison;
}

parseFile(filename)
{
    loadedFile = loadFile(filename);
    // parse loadedFile here
    return theParsedFile;
}

loadFile(filename)
{
    // load the file here
    return theLoadedFile;
}
Obviously functions need to call one another. So what is the right way to structure a program to meet Martin's criteria, please?
I think you are interpreting rule 2 wrong by not taking context into account. The main() function only does one thing, and that is everything, i.e. running the whole program. Let's say you have a convert_abc_file_xyz_file(source_filename, target_filename); then this function should only do the one thing its name (and arguments) implies: converting a file of format abc into one of format xyz. Of course, on a lower level there are many things to be done to achieve this. For instance, reading the source file (read_abc_file(…)), converting the data from format abc into format xyz (convert_abc_to_xyz(…)), and then writing the converted data into a new file (write_xyz_file(…)).
The second approach is wrong, as it becomes impossible to write functions that only do one thing, because every function does all the other things in the "cascaded" calls. In the first approach it is possible to test or reuse single functions, i.e. just call read_abc_file() to read a file. If that function calls convert_abc_to_xyz(), which in turn calls write_xyz_file(), that is not possible anymore.
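A sketch of that decomposition (Python, using the placeholder abc/xyz names from above; the conversion step itself is a stand-in):

def read_abc_file(source_filename):
    # one thing: load the raw abc data
    with open(source_filename) as f:
        return f.read()

def convert_abc_to_xyz(abc_data):
    # one thing: transform abc data into xyz data (stand-in logic)
    return abc_data.upper()

def write_xyz_file(xyz_data, target_filename):
    # one thing: persist the xyz data
    with open(target_filename, "w") as f:
        f.write(xyz_data)

def convert_abc_file_xyz_file(source_filename, target_filename):
    # its one thing is the whole conversion, expressed as three named steps
    abc_data = read_abc_file(source_filename)
    xyz_data = convert_abc_to_xyz(abc_data)
    write_xyz_file(xyz_data, target_filename)

Each helper can be tested or reused on its own, which is exactly what the cascaded version gives up.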

Append string to a Flash AS3 shared object: For science

So I have a little flash app I made for an experiment where users interact with the app in a lab, and the lab logs the interactions.
The app currently traces a timestamp and a string when the user interacts; it's a useful little data log in the console:
trace(Object(root).my_date + ": User selected the cupcake.");
But I need to move away from using traces that show up in the debug console, because it won't work outside of the developer environment of Flash CS6.
I want to make a log, instead, in an SO ("Shared Object", the little locally saved Flash cookies). Ya' know, one of these deals:
submit.addEventListener("mouseDown", sendData);
function sendData(evt:Event):void
{
    var so:SharedObject = SharedObject.getLocal("experimentalflashcookieWOWCOOL");
    so.data.Title = Title.text;
    so.data.Comments = Comments.text;
    so.data.Image = Image.text;
    so.flush();
}
I don't want to create any kind of architecture or server interaction, just append my timestamps and strings to an SO. Screw complexity! I intend to use all 100kb of the SO allocation with pride!
But I have absolutely no clue how to append data to the shared object. (Cough)
Any ideas how I could create a log file out of a shared object? I'll be logging about 200 lines per SO, so it'd be awkward to generate new variable names for each line and then save the variable after 4 hours of use. Appending to a single variable would be awesome.
You could just replace your so.data.Title line with this:
so.data.Title = (so.data.Title is String) ? so.data.Title + Title.text : Title.text; //check whether so.data.Title is a String, if it is append to it, if not, overwrite it/set it
Please consider not using a capitalized first letter for instance names (as in Title). In ActionScript (and most C-based languages) instance names / variables are usually written with a lowercase first letter.

WinRT: Reading and deserializing a large number of files takes too much time

I have a Windows Store application which manages a collection of objects and stores them in the application local folder. Those objects are serialized on the file system using JSON. As I need to be able to edit and persist those items individually, I opted for individual files for each object instead of one large file. Objects are stored following this pattern:
Local Folder
|
--- db
|
--- AB283376-7057-46B4-8B91-C32E663EC964
| |
| --- AB283376-7057-46B4-8B91-C32E663EC964.json
| --- AB283376-7057-46B4-8B91-C32E663EC964.jpg
|
--- B506EFC5-E853-45E6-BA32-64193BB49ACD
| |
| --- B506EFC5-E853-45E6-BA32-64193BB49ACD.json
| --- B506EFC5-E853-45E6-BA32-64193BB49ACD.jpg
|
...
Each object has its own folder node, which contains the JSON-serialized object and any other resources.
Everything was fine when I ran some write, read, and delete tests. Where it got complicated is when I tried to load large collections of objects at application startup. I estimated the largest number of items one would store at 10,000. So I wrote 10,000 entries and then tried to load them... it took the application more than 3 minutes to complete the operation, which of course is unacceptable.
So my questions are: What could be optimized in the code I wrote for reading and deserializing objects (code below)? Is there a way to implement a paging system so loading would be dynamic in my WinRT application? Is my storage method (pattern above) too heavy in terms of IO/CPU? Am I missing something in WinRT?
public async Task<IEnumerable<Release>> GetReleases()
{
    List<Release> items = new List<Release>();
    var dbFolder = await ApplicationData.Current.LocalFolder.CreateFolderAsync(dbName, CreationCollisionOption.OpenIfExists);
    foreach (var releaseFolder in await dbFolder.GetFoldersAsync())
    {
        var releaseFile = await releaseFolder.GetFileAsync(releaseFolder.DisplayName + ".json");
        var stream = await releaseFile.OpenAsync(FileAccessMode.Read);
        using (var inStream = stream.GetInputStreamAt(0))
        {
            DataContractJsonSerializer serializer = new DataContractJsonSerializer(typeof(Release));
            Release release = (Release)serializer.ReadObject(inStream.AsStreamForRead());
            items.Add(release);
        }
        stream.Dispose();
    }
    return items;
}
Thanks for your help.
NB: I already had a look at SQLite and I don't need such a sophisticated system.
Supposedly JSON.NET is better than the built-in serializers. If you are not sending the data over the wire, then the quickest way is to do binary serialization rather than JSON or XML. Finally, think about whether you really need to load all the data when your application starts. Serialize your data as a list of binary records and create an index that will allow you to quickly jump to the range of records you actually need to use.
As Filip already mentioned, you probably don't need to load all data at startup. Even if you really want to show all the items in the first page (showing 10,000 items at once to a user doesn't sound like a good idea to me), you don't need to have all their properties available: usually only a couple of them are shown in the list, you need the rest of them when the user navigates to individual item details. You could have a separate "index" file containing only the data you need for the list. This does mean duplication, but it will help you with performance.
Although you've mentioned that you don't need SQLite because it is too sophisticated for your needs, you really should take a closer look at it. It is designed to efficiently handle structured data such as yours. I'm pretty sure that if you switch to it, the performance will be much better and your code might end up even simpler in the end. Try it out.
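If you do stay with plain files, the separate "index" file idea mentioned above is simple to sketch (shown here in Python only to illustrate the concept, not as WinRT code; the title and thumbnail fields are invented):

import json, os

def rebuild_index(db_root):
    # write one small file containing only the fields the list page shows
    index = []
    for item_id in os.listdir(db_root):
        item_file = os.path.join(db_root, item_id, item_id + ".json")
        if not os.path.isfile(item_file):
            continue  # skip index.json and anything that is not an item folder
        with open(item_file) as f:
            item = json.load(f)
        index.append({"id": item_id,
                      "title": item.get("title"),
                      "thumbnail": item_id + ".jpg"})
    with open(os.path.join(db_root, "index.json"), "w") as f:
        json.dump(index, f)

def load_list(db_root):
    # startup now reads a single small file instead of thousands
    with open(os.path.join(db_root, "index.json")) as f:
        return json.load(f)

The index has to be rewritten (or patched) whenever an item is added, edited, or removed, which is the duplication trade-off mentioned above.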