Using Other Data Sources for cubism.js - json

I like the user experience of cubism, and would like to use this on top of a backend we have.
I've read the API docs and some of the code; most of this seems to be abstracted away. How exactly could I begin to use other data sources?
I have a data store covering about 6k individual machines, with 5-minute precision on around 100 or so stats.
I would like to query a web app with a specific identifier for a machine and then render a cubism-style dashboard by querying a specific Mongo data store.
Writing the web app or the querying of Mongo isn't the issue.
The issue is more that cubism seems to require querying whatever data store you use for each individual data point (say you have 100 stats across a window of a week... that gets expensive).
Is there another way I could leverage this tool to look at data that gets loaded using something similar to the code below?
var data = [];
// note: concat() returns a new array rather than modifying in place
d3.json("/initial", function(json) { data = data.concat(json); });
d3.json("/update", function(json) { data.push(json); });

Cubism takes care of initialization and updates for you: the initial request covers the full visible window (start to stop, typically 1,440 data points), while subsequent requests fetch only the few most recent values (7 data points).
Take a look at context.metric for how to implement a new data source. The simplest possible implementation is like this:
var foo = context.metric(function(start, stop, step, callback) {
  d3.json("/data", function(data) {
    if (!data) return callback(new Error("unable to load data"));
    callback(null, data);
  });
});
You would extend this to change the "/data" URL as appropriate, passing in the start, stop and step times, and whatever else you want to use to identify a metric. For example, both Cube and Graphite use a metric expression as an additional query parameter.
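For instance, here is a rough sketch, not anything cubism prescribes: the "/stats" endpoint and its machine, stat, start, stop and step query parameters are assumptions about your own web app, and the response is assumed to be a JSON array of numbers, one per step.

// Sketch only: endpoint name, parameter names and response format are assumptions.
function machineMetric(machine, stat) {
  return context.metric(function(start, stop, step, callback) {
    d3.json("/stats"
        + "?machine=" + encodeURIComponent(machine)
        + "&stat=" + encodeURIComponent(stat)
        + "&start=" + (+start)   // start/stop are Dates; coerce to epoch milliseconds
        + "&stop=" + (+stop)
        + "&step=" + step, function(data) {
      if (!data) return callback(new Error("unable to load data"));
      callback(null, data);      // an array of values, one per step
    });
  }, stat); // optional second argument gives the metric a display name
}

var cpuUser = machineMetric("machine-0042", "cpu.user");

Each stat for a machine would then be one such metric, so a dashboard of 100 stats issues 100 window-sized requests up front and only small incremental requests afterwards.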

Related

Splitting a feature collection by system index in Google Earth Engine?

I am trying to export a large feature collection from GEE. I realize that the Python API allows for this more easily than the JavaScript API does, but given a time constraint on my research, I'd like to see if I can extract the feature collection in pieces and then append the separate CSV files once exported.
I tried to use a filtering function to perform the task, one that I've seen used before with image collections. Here is a mini example of what I am trying to do.
Given a feature collection of 10 spatial points called "points" I tried to create a new feature collection that includes only the first five points:
var points_chunk1 = points.filter(ee.Filter.rangeContains('system:index', 0, 5));
When I execute this function, I receive the following error: "An internal server error has occurred"
I am not sure why this code is not executing as expected. If you know more than I do about this issue, please advise on alternative approaches to splitting my sample, or on where the error in my code lurks.
Many thanks!
system:index is actually an ID assigned by GEE to each feature, and it's not supposed to be used like an index into an array. I think the JS API should be enough to export a large FeatureCollection, but there is a way to do what you want without relying on system:index, as that might not be consistent.
First, it would be a good idea to know the number of features you are dealing with, because calling size().getInfo() on large feature collections can freeze the UI and sometimes make the tab unresponsive. Here I have defined chunk and collectionSize as client-side variables, since we want to call Export inside the loop, which is not possible in server-side loops. Within the loop, you simply create a subset of features starting at different offsets by converting the collection to a list and wrapping the slice back into a FeatureCollection.
var chunk = 1000;
var collectionSize = 10000;
for (var i = 0; i < collectionSize; i = i + chunk) {
  var subset = ee.FeatureCollection(fc.toList(chunk, i));
  Export.table.toAsset(subset, "description", "/asset/id");
}
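If you'd rather not hard-code collectionSize, a possible variation (just a sketch; fc is the full collection as above and the asset path is a placeholder) is to fetch the size asynchronously with evaluate(), which avoids the UI freeze mentioned above, and run the export loop inside its callback:

// Sketch: evaluate() is the asynchronous counterpart of getInfo(), so the tab
// stays responsive while the size is computed on the server.
var chunk = 1000;
fc.size().evaluate(function(collectionSize, error) {
  if (error) {
    print(error);
    return;
  }
  for (var i = 0; i < collectionSize; i = i + chunk) {
    var subset = ee.FeatureCollection(fc.toList(chunk, i));
    // 'users/your_user/points_chunk_...' is a placeholder asset id
    Export.table.toAsset(subset, 'points_chunk_' + i, 'users/your_user/points_chunk_' + i);
  }
});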

Sequelize / mysql using 100% CPU when create or update data

I need to update or create data in a MySQL table from a large array (a few thousand objects) with Sequelize.
When I run the following code it uses up almost all the CPU of my DB server (vserver, 2 GB RAM / 2 CPUs) and clogs my app for a few minutes until it's done.
Is there a better way to do this with Sequelize? Can this be done in the background somehow, or as a bulk operation, so it doesn't affect my app's performance?
data.forEach(function(item) {
  var query = {
    'itemId': item.id,
    'networkId': item.networkId
  };
  db.model.findOne({
    where: query
  }).then(function(storedItem) {
    try {
      if (!!storedItem) {
        storedItem.update(item);
      } else {
        db.model.create(item);
      }
    } catch (e) {
      console.log(e);
    }
  });
});
The first line of your sample code, data.forEach(...), makes a whole mess of calls to your function(item) {}. The code inside that function fires off, in turn, a whole mess of asynchronously completing operations.
Try using the async package https://caolan.github.io/async/docs.htm and doing this:
var async = require('async');
...
async.mapSeries(data, function(item, callback) {...
It should allow each iteration of your function (which iterates once per item in your data array) to complete before starting the next one. Paradoxically enough, doing them one at a time will probably make them finish faster. It will certainly avoid soaking up your resources.
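To make that concrete, here is a sketch of the question's upsert logic wrapped in async.mapSeries; the final-callback error handling is my assumption about how you would want failures reported.

var async = require('async');

// process the items strictly one at a time
async.mapSeries(data, function(item, done) {
  db.model.findOne({
    where: { itemId: item.id, networkId: item.networkId }
  }).then(function(storedItem) {
    // update the existing row, or create a new one
    return storedItem ? storedItem.update(item) : db.model.create(item);
  }).then(function(result) {
    done(null, result);   // tell async this item is finished
  }).catch(done);         // pass the error to async and stop the series
}, function(err, results) {
  if (err) console.log(err);
});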
Weeks later I found the actual reason for this (and unfortunately using async didn't really help after all). It was as simple as it was stupid: I didn't have a MySQL index on itemId, so with every iteration the whole table was scanned, which (obviously) caused the high CPU load.
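For reference, a sketch of how such an index could be declared directly in the Sequelize model definition; the model name and attribute types are placeholders, while the index columns mirror the lookup in the question. The index is created when the table is synced, or via a manual ALTER TABLE.

// Sketch: a composite index on the columns used by findOne's where clause,
// so the lookup becomes an index lookup instead of a full table scan.
// "sequelize"/"Sequelize" and the attribute types are assumed from a typical setup.
var Model = sequelize.define('model', {
  itemId:    Sequelize.INTEGER,
  networkId: Sequelize.INTEGER
  // ... other attributes
}, {
  indexes: [
    { fields: ['itemId', 'networkId'] }
  ]
});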

Service Worker slow response times

In the Google Chrome browser on Windows and Android (I haven't tested others yet), the response time from a service worker increases linearly with the number of items stored in that specific cache storage when you use the Cache.match() function with the following option:
ignoreSearch = true
Dividing items across multiple caches helps, but it is not always convenient to do so, and even a small increase in the number of items stored makes a big difference in response times. According to my measurements, response time roughly doubles for every tenfold increase in the number of items in the cache.
The official answer to my question in the Chromium issue tracker reveals that the problem is a known performance issue with the Cache Storage implementation in Chrome, which only happens when you use Cache.match() with the ignoreSearch parameter set to true.
As you might know, ignoreSearch is used to disregard query parameters in the URL while matching the request against responses in the cache. Quote from MDN:
...whether to ignore the query string in the url. For example, if set to
true the ?value=bar part of http://example.com/?value=bar would be ignored
when performing a match.
Since it is not really convenient to stop using query parameter matching, I have come up with the following workaround, and I am posting it here in the hope that it will save someone time:
// if the request has query parameters, `hasQuery` will be set to `true`
var hasQuery = event.request.url.indexOf('?') != -1;

event.respondWith(
  caches.match(event.request, {
    // ignore query section of the URL based on our variable
    ignoreSearch: hasQuery,
  })
  .then(function(response) {
    // handle the response
  })
);
This works great because it handles every request with a query parameter correctly while handling others still at lightning speed. And you do not have to change anything else in your application.
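For completeness, here is the snippet above dropped into a full fetch handler; the network fallback at the end is my assumption about what "handle the response" typically looks like.

self.addEventListener('fetch', function(event) {
  // if the request has query parameters, `hasQuery` will be set to `true`
  var hasQuery = event.request.url.indexOf('?') != -1;

  event.respondWith(
    caches.match(event.request, { ignoreSearch: hasQuery })
      .then(function(response) {
        // serve from the cache, otherwise fall back to the network
        return response || fetch(event.request);
      })
  );
});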
According to the guy in that bug report, the issue was tied to the number of items in a cache. I made a solution and took it to the extreme, giving each resource its own cache:
var cachedUrls = [
  /* CACHE INJECT FROM GULP */
];

// update the cache
// don't worry StackOverflow, I call this only when the site tells the SW to update
function fetchCache() {
  return Promise.all(
    // for all urls
    cachedUrls.map(function(url) {
      // add a cache
      return caches.open('resource:' + url).then(function(cache) {
        // add the url
        return cache.add(url);
      });
    })
  );
}
In the project we have here, there are static resources served with high cache expirations set, and we use query parameters (repository revision numbers, injected into the html) only as a way to manage the [browser] cache.
It didn't really work to use your solution to selectively use ignoreSearch, since we'd have to use it for all static resources anyway so that we could get cache hits!
However, not only did I dislike this hack, but it still performed very slowly.
Okay, so, given that it was only a specific set of resources I needed ignoreSearch for, I decided to take a different route: just remove the parameters from the request URLs manually, instead of relying on ignoreSearch.
self.addEventListener('fetch', function(event) {
  // find urls that only have numbers as parameters
  // yours will obviously differ, my queries to ignore were just repo revisions
  var shaved = event.request.url.match(/^([^?]*)[?]\d+$/);
  // extract the url without the query
  shaved = shaved && shaved[1];

  event.respondWith(
    // try to get the url from the cache.
    // if this is a resource, use the shaved url,
    // otherwise use the original request
    // (I assume it [can] contain post-data and stuff)
    caches.match(shaved || event.request).then(function(response) {
      // respond
      return response || fetch(event.request);
    })
  );
});
I had the same issue, and the previous approaches caused some errors for requests that should use ignoreSearch:false. An easy approach that worked for me was to simply apply ignoreSearch:true to certain requests by using url.includes('A') && ... See the example below:
self.addEventListener("fetch", function(event) {
var ignore
if(event.request.url.includes('A') && event.request.url.includes('B') && event.request.url.includes('C')){
ignore = true
}else{
ignore = false
}
event.respondWith(
caches.match(event.request,{
ignoreSearch:ignore,
})
.then(function(cached) {
...
}

WinRT: Reading and deserializing a large number of files takes too much time

I have a Windows Store application which manages a collection of objects and stores them in the application's local folder. The objects are serialized on the file system using JSON. As I need to be able to edit and persist those items individually, I opted for an individual file for each object instead of one large file. Objects are stored following this pattern:
Local Folder
|
--- db
|
--- AB283376-7057-46B4-8B91-C32E663EC964
| |
| --- AB283376-7057-46B4-8B91-C32E663EC964.json
| --- AB283376-7057-46B4-8B91-C32E663EC964.jpg
|
--- B506EFC5-E853-45E6-BA32-64193BB49ACD
| |
| --- B506EFC5-E853-45E6-BA32-64193BB49ACD.json
| --- B506EFC5-E853-45E6-BA32-64193BB49ACD.jpg
|
...
Each object has its own folder node, which contains the JSON-serialized object and any other associated resources.
Everything was fine when I ran some write, read and delete tests. Where it got complicated is when I tried to load large collections of objects on application startup. I estimated the largest number of items one would store at 10,000, so I wrote 10,000 entries and then tried to load them... it took the application more than 3 minutes to complete the operation, which of course is unacceptable.
So my questions are: What could be optimized in the code I wrote for reading and deserializing objects (code below)? Is there a way to implement a paging system so loading would be dynamic in my WinRT application? Is my storage method (pattern above) too heavy in terms of IO/CPU? Am I missing something in WinRT?
public async Task<IEnumerable<Release>> GetReleases()
{
    List<Release> items = new List<Release>();
    var dbFolder = await ApplicationData.Current.LocalFolder.CreateFolderAsync(dbName, CreationCollisionOption.OpenIfExists);

    foreach (var releaseFolder in await dbFolder.GetFoldersAsync())
    {
        var releaseFile = await releaseFolder.GetFileAsync(releaseFolder.DisplayName + ".json");
        using (var stream = await releaseFile.OpenAsync(FileAccessMode.Read))
        using (var inStream = stream.GetInputStreamAt(0))
        {
            DataContractJsonSerializer serializer = new DataContractJsonSerializer(typeof(Release));
            Release release = (Release)serializer.ReadObject(inStream.AsStreamForRead());
            items.Add(release);
        }
    }

    return items;
}
Thanks for your help.
NB: I already had a look at SQLite and I don't need such a sophisticated system.
Supposedly JSON.NET is better than the built-in serializers. If you are not sending the data over the wire, then the quickest option is binary serialization rather than JSON or XML. Finally, consider whether you really need to load all the data when your application starts. Serialize your data as a list of binary records and create an index that will allow you to quickly jump to the range of records you actually need.
As Filip already mentioned, you probably don't need to load all the data at startup. Even if you really want to show all the items on the first page (showing 10,000 items at once to a user doesn't sound like a good idea to me), you don't need to have all their properties available: usually only a couple of them are shown in the list, and you need the rest only when the user navigates to an individual item's details. You could have a separate "index" file containing only the data you need for the list. This does mean duplication, but it will help you with performance.
Although you've mentioned that you don't need SQLite because it is too sophisticated for your needs, you really should take a closer look at it. It is designed to efficiently handle structured data such as yours. I'm pretty sure that if you switch to it, the performance will be much better and your code might end up even simpler in the end. Try it out.

Trouble finding DataGrid to implement for over 10,000 records with pagination, filtering, and clickable links for each row

I have tried using a few different data grids (FlexiGrid, ExtJS Grid, and YUI DataGrid) and have found YUI to work best as far as documentation and available features go. However, I am having difficulty setting up the data source. When I try to set it up using JSON, it takes too long or times out. I have already maxed out the memory usage in the php.ini file. There will be many more records in the future as well.
I need to select data to populate the grid based on the user that is currently logged in. Once this information populates the grid, I need each id to be click-able and take me to a different page, or populate information in a div on the same page.
Does anyone have suggestions on loading 25-50 records of dynamic data at a time? I have tried implementing the following example to do what I want: YUI Developer Example
I cannot get the data grid to show at all. I have changed the data instance to the following.
// DataSource instance
var curDealerNumber = YAHOO.util.Dom.getElementsByClassName('dealer_number', 'input');
var ds_path = "lib/php/json_proxy.php?dealernumber='" + curDealerNumber + "'";
var myDataSource = new YAHOO.util.DataSource(ds_path);
myDataSource.responseType = YAHOO.util.DataSource.TYPE_JSON;
myDataSource.responseSchema = {
  resultsList: "records",
  fields: [
    { key: "id", parser: "number" },
    { key: "user_dealername" },
    { key: "user_dealeraccttype" },
    { key: "workorder_num", parser: "number" },
    { key: "segment_num", parser: "number" },
    { key: "status" },
    { key: "claim_type" },
    { key: "created_at" },
    { key: "updated_at" }
  ],
  metaFields: {
    totalRecords: "totalRecords" // Access to value in the server response
  }
};
Any help is greatly appreciated, and sorry if this seems similar to other posts, but I searched and still could not resolve my problem. Thank you!
It's hard to troubleshoot without a repro case, but I'd suggest turning on logging to see where the problem might be:
load datatable-debug file
load logger
either call YAHOO.widget.Logger.enableBrowserConsole() to output logs to your browser's JS console (e.g., Firebug), or call new YAHOO.widget.LogReader() to output logs to the screen.
Also make sure the XHR request and response are well-formed with Firebug or similar tool.
Finally, when working with large datasets, consider
pagination
enabling renderLoopSize (http://developer.yahoo.com/yui/datatable/#renderLoop)
chunking data loads into multiple requests (http://developer.yahoo.com/yui/examples/datatable/dt_xhrjson.html).
There is no one-size-fits-all solution for everyone, but hopefully you can find the right set of tweaks for your use case.
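As a rough starting point, the debugging and pagination setup suggested above might look like the sketch below; the container id, column definitions and page size are placeholders, and server-side paging would additionally need the initialRequest/generateRequest wiring from the linked chunked-loading example.

// Sketch only: "containerDiv", myColumnDefs and the 25-row page size are placeholders.
// Assumes the datatable-debug, logger and paginator modules have been loaded.
YAHOO.widget.Logger.enableBrowserConsole();     // send log output to the JS console
var myLogReader = new YAHOO.widget.LogReader(); // or render the log on the page

var myDataTable = new YAHOO.widget.DataTable("containerDiv", myColumnDefs, myDataSource, {
  // client-side pagination, 25 rows per page
  paginator: new YAHOO.widget.Paginator({ rowsPerPage: 25 })
});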