Splitting a feature collection by system index in Google Earth Engine?

Splitting a feature collection by system index in Google Earth Engine? - gis

I am trying to export a large feature collection from GEE. I realize that the Python API allows for this more easily than the Java does, but given a time constraint on my research, I'd like to see if I can extract the feature collection in pieces and then append the separate CSV files once exported.
I tried to use a filtering function to perform the task, one that I've seen used before with image collections. Here is a mini example of what I am trying to do
Given a feature collection of 10 spatial points called "points" I tried to create a new feature collection that includes only the first five points:
var points_chunk1 = points.filter(ee.Filter.rangeContains('system:index', 0, 5));
When I execute this function, I receive the following error: "An internal server error has occurred"
I am not sure why this code is not executing as expected. If you know more than I do about this issue, please advise on alternative approaches to splitting my sample, or on where the error in my code lurks.
Many thanks!

system:index is actually ID given by GEE for the feature and it's not supposed to be used like index in an array. I think JS should be enough to export a large featurecollection but there is a way to do what you want to do without relying on system:index as that might not be consistent.
First, it would be a good idea to know the number of features you are dealing with. This is because generally when you use size().getInfo() for large feature collections, the UI can freeze and sometimes the tab becomes unresponsive. Here I have defined chunks and collectionSize. It should be defined in client side as we want to do Export within the loop which is not possible in server size loops. Within the loop, you can simply creating a subset of feature starting from different points by converting the features to list and changing the subset back to feature collection.
var chunk = 1000;
var collectionSize = 10000
for (var i = 0; i<collectionSize;i=i+chunk){
var subset = ee.FeatureCollection(fc.toList(chunk, i));
Export.table.toAsset(subset, "description", "/asset/id")
}

Related

How to fix a query in functions within foundry which is hiting ObjectSet:PagingAboveConfiguredLimitNotAllowed?

I have phonorgraph object with billions of rows and we are querying it through object set service
for example, I want to get all DriverLicences from certain city.
#Function()
public getDriverLicences(city: string): ObjectSet<DriverLicences> {
let drivers = Objects.search().DriverLicences().filter(row => row.city.exactMatch(city));
return drivers ;
}
I am facing this error when I am trying query it from slate:
ERROR 400: {"errorCode":"INVALID_ARGUMENT","errorName":"ObjectSet:PagingAboveConfiguredLimitNotAllowed","errorInstanceId":"0000-000","parameters":{}}
I understand that I am probably retrieving more than 100 000 results but I need all the results because of the implemented logic in the front is a complex slate dashboard built by another team that we cannot re-factor.

The issue here is that, specifically in the Slate <> Function connector, there is a "translation layer" that serializes the contents of the object set and provides a response data structure that materializes the property:value pairs for each object in the set.
This clearly doesn't work for large object sets where throwing so much data into the browser is likely to overwhelm the resources allocated to the tab.
From context it seems like you might be migrating an existing Slate app over to Functions; in the current version, how is the query limiting the number of results returned? It certainly must not be returning several 100 thousand results for further processing on the front end? (And if so, that might be an anti-pattern to consider addressing).
As for options that you could currently explore, you can sort your object set and then specify a smaller limit to return:
Objects.search().DriverLicences().filter(row => row.city.exactMatch(city)).orderBy(date_of_issue).take(100)
You'll find a few more details in the Functions documentation Reference entry on Ontology API: Object Sets in the section on Ordering and limiting.
You can even make a work around for the (current) lack of paging when return an ObjectSet to Slate by using the last value from the property ordered on (i.e. date_of_issue) as a filter in the subsequent request and return the next N objects.
This can work if you need a Slate table or HTML widget that renders on set of results then, on a user action, gets the next page.

WinRT: Reading and deserializaing large amount of files takes too much time

I have a Windows Store application which manages collection of objects and stores them in the application local folder. Those objects are serialized on the file system using JSON. As I need to be able to edit and persist those items individually I opted for individual files for each objects instead of one large file. Objects are stored following this pattern:
Local Folder
|
--- db
|
--- AB283376-7057-46B4-8B91-C32E663EC964
| |
| --- AB283376-7057-46B4-8B91-C32E663EC964.json
| --- AB283376-7057-46B4-8B91-C32E663EC964.jpg
|
--- B506EFC5-E853-45E6-BA32-64193BB49ACD
| |
| --- B506EFC5-E853-45E6-BA32-64193BB49ACD.json
| --- B506EFC5-E853-45E6-BA32-64193BB49ACD.jpg
|
...
Each object has its folder node which will contains the JSON serialized object and other eventual resources.
Everything was fine when I made some writing, reading, deleting test. Where it got complicated is when I tried to load up large collections of object on application startup. I estimated that the largest amount of item one would store to 10000. So I wrote 10000 entries and then tried to load it... more than 3 minutes to the application to complete the operation, which of course is unacceptable.
So my questions are, What could be optimized in the code I made for reading and deserializing objects (code below)? Is there a way to implement a paging system so loading would be dynamic in my WinRT application? Is my storage method (pattern above) too heavy for in terms of IO/CPU? Am I missing something in WinRT?
public async Task<IEnumerable<Release>> GetReleases()
{
List<Release> items = new List<Release>();
var dbFolder = await ApplicationData.Current.LocalFolder.CreateFolderAsync(dbName, CreationCollisionOption.OpenIfExists);
foreach (var releaseFolder in await dbFolder.GetFoldersAsync())
{
var releaseFile = await releaseFolder.GetFileAsync(releaseFolder.DisplayName + ".json");
var stream = await releaseFile.OpenAsync(FileAccessMode.Read);
using (var inStream = stream.GetInputStreamAt(0))
{
DataContractJsonSerializer serializer = new DataContractJsonSerializer(typeof(Release));
Release release = (Release)serializer.ReadObject(inStream.AsStreamForRead());
items.Add(release);
}
stream.Dispose();
}
return items;
}
Thanks for your help.
NB: I already had a look as SQLite and I don't need such a sophisticated system.

Supposedly JSON.NET is better than the built in things. If you are not sending the data over the wire, then the quickest way is to do binary serialization rather than JSON or XML. Finally - think if you really need to load all the data when your application starts. Serialize your data as a list of binary records and create an index that will allow you to quickly jump to the range of records you actually need to use.

As Filip already mentioned, you probably don't need to load all data at startup. Even if you really want to show all the items in the first page (showing 10,000 items at once to a user doesn't sound like a good idea to me), you don't need to have all their properties available: usually only a couple of them are shown in the list, you need the rest of them when the user navigates to individual item details. You could have a separate "index" file containing only the data you need for the list. This does mean duplication, but it will help you with performance.
Although you've mentioned, you don't need SQLite as it is too sophisticated for your needs, you really should take a closer look at it. It is designed to efficiently handle structured data such as yours. I'm pretty sure if you switch to it, the performance will be much better and your code might end up even simpler in the end. Try it out.

What's happening when I use for(i in object) in AS3?

To iterate over the properties of an Object in AS3 you can use for(var i:String in object) like this:
Object:
var object:Object = {
thing: 1,
stuff: "hats",
another: new Sprite()
};
Loop:
for(var i:String in object)
{
trace(i + ": " + object[i]);
}
Result:
stuff: hats
thing: 1
another: [object Sprite]
The order in which the properties are selected however seems to vary and never matches anything that I can think of such as alphabetical property name, the order in which they were created, etc. In fact if I try it a few different times in different places, the order is completely different.
Is it possible to access the properties in a given order? What is happening here?

I'm posting this as an answer just to compliment BoltClock's answer with some extra insight by looking directly at the flash player source code. We can actually see the AVM code that specifically provides this functionality and it's written in C++. We can see inside ArrayObject.cpp the following code:
// Iterator support - for in, for each
Atom ArrayObject::nextName(int index)
{
AvmAssert(index > 0);
int denseLength = (int)getDenseLength();
if (index <= denseLength)
{
AvmCore *core = this->core();
return core->intToAtom(index-1);
}
else
{
return ScriptObject::nextName (index - denseLength);
}
}
As you can see when there is a legitimate property (object) to return, it is looked up from the ScriptObject class, specifically the nextName() method. If we look at those methods within ScriptObject.cpp:
Atom ScriptObject::nextName(int index)
{
AvmAssert(traits()->needsHashtable());
AvmAssert(index > 0);
InlineHashtable *ht = getTable();
if (uint32_t(index)-1 >= ht->getCapacity()/2)
return nullStringAtom;
const Atom* atoms = ht->getAtoms();
Atom m = ht->removeDontEnumMask(atoms[(index-1)<<1]);
if (AvmCore::isNullOrUndefined(m))
return nullStringAtom;
return m;
}
We can see that indeed, as people have pointed out here that the VM is using a hash table. However in these functions there is a specific index supplied, which would suggest, at first glance, that there must then be specific ordering.
If you dig deeper (I won't post all the code here) there are a whole slew of methods from different classes involved in the for in/for each functionality and one of them is the method ScriptObject::nextNameIndex() which basically pulls up the whole hash table and just starts providing indices to valid objects within the table and increments the original index supplied in the argument, so long as the next value points to a valid object. If I'm right in my interpretation, this would be the cause behind your random lookup and I don't believe there would be any way here to force a standardized/ordered map in these operations.
Sources
For those of you who might want to get the source code for the open source portion of the flash player, you can grab it from the following mercurial repositories (you can download a snapshop in zip like github so you don't have to install mercurial):
http://hg.mozilla.org/tamarin-central - This is the "stable" or "release" repository
http://hg.mozilla.org/tamarin-redux - This is the development branch. The most recent changes to the AVM will be found here. This includes the support for Android and such. Adobe is still updating and open sourcing these parts of the flash player, so it's good current and official stuff.
While I'm at it, this might be of interest as well: http://code.google.com/p/redtamarin/. It's a branched off (and rather mature) version of the AVM and can be used to write server-side actionscript. Neat stuff and has a ton of information that gives insight into the workings of the AVM so I thought I'd include it too.

This behavior is documented (emphasis mine):
The for..in loop iterates through the properties of an object, or the elements of an array. For example, you can use a for..in loop to iterate through the properties of a generic object (object properties are not kept in any particular order, so properties may appear in a seemingly random order)
How the properties are stored and retrieved appears to be an implementation detail, which isn't covered in the documentation. As ToddBFisher mentions in a comment, though, a data structure commonly used to implement associative arrays is the hash table. It's even mentioned in this page about associative arrays in AS3, and if you inspect the AVM code as shown by Ascension Systems, you'll find exactly such an implementation. As described, there is no notion of order or sorting in typical hash tables.
I don't believe there is a way to access the properties in a specific order unless you store that order somehow.

Is having a single massive class for all data storage OK?

I have created a class that I've been using as the storage for all listings in my applications. The class allows me to "sign" an object to a listing (which can be created on the fly via the sign() method like so):
manager.sign(myObject, "someList");
This stores the index of the element (using it's unique id) in the newly created or previously created listing "someList" as well as the object in a 2D array. So for example, I might end up with this:
trace(_indexes["someList"][objectId]); // 0 - the object is the first in this list
trace(_instances["someList"]); // [object MyObject]
The class has another two methods:
find(signature:String):Array
This method returns an array via slice() containing all of the elements signed with the given signature.
findFirst(signature:String):Object
This method just returns the first object in a given listing
So to retrieve myObject I can either go:
trace(find("someList")[0]); or trace(findFirst("someList"));
Finally, there is an unsign() function which will remove an object from a given listing. This function basically:
Stores the result of pop() in the specified listing against a variable.
Uses the stored index to quickly replace the specified object with the pop()'d item.
Deletes the stored index for the specified object and updates the index for the pop()'d item.
Through all this, using unsign() will remove an object extremely quickly from a listing of any size.
Now this is all well and good, but I've had some thoughts which are making me consider how good this really is? I mean being able to easily list, remove and access lists of anything I want throughout the application like this is awesome - but is there a catch?
A couple of starting thoughts I have had are:
So far I haven't implemented support for listings that are private and only accessible via a given class.
Memory - this doesn't seem very memory efficient. Then again, neither is creating arrays for everything I want to store individually either. Just seems.. Larger.. Somehow.
Any insights?
I've uploaded the class here in case the above doesn't make much sense: https://projectavian.com/AviManager.as

Your solution seems pretty solid. If you're looking to modify it to be a bit more extensible and handle rights management, you might consider moving all those individually indexed properties to a value object for your AV elements. You could perform operations like "sign" and "unsign" internally in the VOs, or check for access rights. Your management class could monitor the collection of these VOs, pass them around, perform the method calls, and the objects would hold the state in a bit more readable format.
Really, though, this is entering into a coding style discussion. Your method works and it's not particularly inefficient. Just make sure the code is readable, encapsulated, and extensible and you're good.

Trouble finding DataGrid to implement for over 10,000 records with pagination, filtering, and clickable links for each row

I have tried using a few different data grids (FlexiGrid, ExtJs Grid, and YUI DataGrid) and have found YUI to work the best as far as documentation and features available. However, I am having difficulty setting up the data source. When I try to set it up using JSON, it takes too long, or times out. I have already maxed out the memory usage in the php.ini file. There will be many more records in the future as well.
I need to select data to populate the grid based on the user that is currently logged in. Once this information populates the grid, I need each id to be click-able and take me to a different page, or populate information in a div on the same page.
Does anyone have suggestions on loading 25 – 50 records at a time of dynamic data? I have tried implementing the following example to do what I want: YUI Developer Example
I cannot get the data grid to show at all. I have changed the data instance to the following.
// DataSource instance
var curDealerNumber = YAHOO.util.Dom.getElementsByClassName('dealer_number', 'input');
var ds_path = + "lib/php/json_proxy.php?dealernumber='" + curDealerNumber + "'";
var myDataSource = new YAHOO.util.DataSource("ds_path");
myDataSource.responseType = YAHOO.util.DataSource.TYPE_JSON;
myDataSource.responseSchema = {
resultsList: "records",
fields: [
{key:"id", parser:"number"},
{key:"user_dealername"},
{key:"user_dealeraccttype"},
{key:"workorder_num", parser:"number"},
{key:"segment_num", parser:"number"},
{key:"status"},
{key:"claim_type"},
{key:"created_at"},
{key:"updated_at"}
],
metaFields: {
totalRecords: "totalRecords" // Access to value in the server response
}
};
Any help is greatly appreciated, and sorry if this seems similar to other posts, but I searched and still could not resolve my problem. Thank you!

It's hard to troubleshoot without a repro case, but I'd suggest turning on logging to see where the problem might be:
load datatable-debug file
load logger
either call YAHOO.widget.Logger.enableBrowserConsole() to output logs to your browser's JS console (i.e., Firebug), or call new YAHOO.widget.LogReader() to output logs to the screen.
Also make sure the XHR request and response are well-formed with Firebug or similar tool.
Finally, when working with large datasets, consider
pagination
enabling renderLoopSize (http://developer.yahoo.com/yui/datatable/#renderLoop)
chunking data loads into multiple requests (http://developer.yahoo.com/yui/examples/datatable/dt_xhrjson.html).
There is no one-size-fits-all solution for everyone, but hopefully you can find the right set of tweaks for your use case.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Splitting a feature collection by system index in Google Earth Engine? - gis

Related

How to fix a query in functions within foundry which is hiting ObjectSet:PagingAboveConfiguredLimitNotAllowed?

WinRT: Reading and deserializaing large amount of files takes too much time

What's happening when I use for(i in object) in AS3?

Is having a single massive class for all data storage OK?

Trouble finding DataGrid to implement for over 10,000 records with pagination, filtering, and clickable links for each row

Categories

Resources