MongoDB MapReduce much slower than MySQL GROUP BY

I am trying to evaluate which database system to use for a new project.
At the moment I am comparing MySQL and MongoDB for the task at hand.
I have about 5 million records with 350 numeric fields each, and I have to use this data to provide different granularity levels for some graph plotting.
I pumped the data into MongoDB and into MySQL, and in MySQL I generated interim tables at 1/10th, 1/100th and 1/1000th of the original granularity. The application then chooses the table that best matches the current task and queries the data there.
With this technique I can get the data fast enough (< 100 ms).
The SQL query I use is:
SELECT from_unixtime(CAST(FLOOR(MIN(STAMP/1000)) AS SIGNED INTEGER)),
MIN(RING),MIN(STATE),CAST(FLOOR(MIN(STAMP)) as SIGNED INTEGER),AVG(w21030401)
FROM project1 GROUP BY FLOOR((stamp - 1181589892000)/60000);
I use the identical query for creating the interim tables. The only difference is that there are 350 wXXXXXX fields.
INSERT INTO project1_10 (TTIME,RING,STATE,STAMP,w21030401,.........)
SELECT from_unixtime(CAST(FLOOR(MIN(STAMP/1000)) AS SIGNED INTEGER)),
MIN(RING),MIN(STATE),CAST(FLOOR(MIN(STAMP)) as SIGNED INTEGER),AVG(w21030401),.......
FROM project1 GROUP BY FLOOR((stamp - 1181589892000)/60000);
Then I tried to do the same thing with MongoDB.
I pumped all the data into MongoDB and got 4.8 million documents of the form:
{ "_id" : ObjectId("50040b3f0cf2872a8d3af90d"), "TTIME" :
ISODate("2008-11-30T06:40:07Z"), "STAMP" : NumberLong("1228027207000"),
"STATE" : 2531, "RING" : 1, "w13010096" : 34.991, "w13010097" : 1.432,
"w23010001" : 292, "w18030180" : 84, "w18030380" : 95, "w21030002" : 51.113,
"w21030005" : 60.321, "w21030004" : 274.662, "w21030008" : 149.629,
"w21030009" : 126.565, "w21030010" : 576.296, ........... }
Then I tried to generate the interim documents with the following mapReduce:
keylist = [ 'w21030401', 'w13011114', .... ];
m = function () {
    var result = {};
    result['STAMP'] = this['STAMP'];
    result['RING']  = this['RING'];
    result['TTIME'] = this['TTIME'];
    result['STATE'] = this['STATE'];
    for (var idx in keylist) {
        var key = keylist[idx];   // keylist holds the wXXXXXXXX field names (passed in via scope)
        if (key in this) {
            result[key] = this[key];
            result['cnt_' + key] = 1;
        }
    }
    var zone = Math.floor((this['STAMP'] - 1171004118000) / 1000000);
    emit(zone, result);
};
r = function (name, values) {
    var result = {};
    result['STAMP'] = values[0]['STAMP'];
    result['RING']  = values[0]['RING'];
    result['TTIME'] = values[0]['TTIME'];
    result['STATE'] = values[0]['STATE'];
    for (var idx in keylist) {
        result[keylist[idx]] = 0;
        result['cnt_' + keylist[idx]] = 0;
    }
    for (var i = 0; i < values.length; i++) {
        if (values[i]['STAMP'] < result['STAMP']) {
            result['STAMP'] = values[i]['STAMP'];
            result['TTIME'] = values[i]['TTIME'];
        }
        if (values[i]['RING'] < result['RING']) {
            result['RING'] = values[i]['RING'];
        }
        if (values[i]['STATE'] < result['STATE']) {
            result['STATE'] = values[i]['STATE'];
        }
        for (var idx in keylist) {
            var key = keylist[idx];
            if (key in values[i]) {
                result[key] += values[i][key];
                result['cnt_' + key] += values[i]['cnt_' + key];
            }
        }
    }
    return result;
};
f = function (who, val) {
    var result = {};
    result['STAMP'] = val['STAMP'];
    result['RING']  = val['RING'];
    result['TTIME'] = val['TTIME'];
    result['STATE'] = val['STATE'];
    for (var idx in keylist) {
        var key = keylist[idx];
        if (key in val) {
            result[key] = val[key] / val['cnt_' + key];
        }
    }
    return result;
};
db.project1.mapReduce( m, r, { finalize : f, scope: { keylist: keylist }, out : {replace : 'project1_100'} , jsMode : false });
MySQL took 210 seconds to create the interim table; MongoDB took about 4 hours.
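For reference, the same 1,000,000 ms bucketing can also be expressed without server-side JavaScript using MongoDB's aggregation framework, which generally runs much faster than mapReduce. A minimal sketch, assuming a MongoDB version that supports $floor (3.2+) and showing only one of the 350 fields:

db.project1.aggregate([
  {
    $group: {
      _id:   { $floor: { $divide: [ { $subtract: [ "$STAMP", 1171004118000 ] }, 1000000 ] } },
      STAMP: { $min: "$STAMP" },
      TTIME: { $min: "$TTIME" },
      RING:  { $min: "$RING" },
      STATE: { $min: "$STATE" },
      w21030401: { $avg: "$w21030401" }   // repeat one $avg entry per wXXXXXXXX field
    }
  },
  { $out: "project1_100" }                // write the result collection, like out: {replace: ...}
], { allowDiskUse: true });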
My question is:
Is MongoDB not suitable for my problem, do I just need bigger hardware for MongoDB than for MySQL, or did I do something wrong with my MapReduce?
Thanks
Peter

Related

What is the problem with the following code?

I have an array of facilities. I want to have the indexes of the facilities which are selected and allocated. In the end, I want a CSV output which shows me each of the facilities. But instead of showing them like [24 15 30 ...] I want to separate them like [24,25,30,...]. The following code gives me an error. Is it possible to let me know what the problem is?
The errors are: 1. element "string" does not exist in an OPL model; 2. the element hub has never been used (but as you can see, I did use it).
{int} hub = { s | s in facilities : y[s] == 1 };

//Output in a CSV file
execute {
  string hubs = "[";
  for (var i = 0; i < hub.length - 1; i++) {
    hubs += hub[i] + ",";
  }
  hubs += hub[hub.length - 1] + "]";
  var f = new IloOplOutputFile("1.csv");
  f.writeln("Facilities");
  f.writeln(hubs);
  f.close();
}
{int} facilities = asSet(1..3);
int y[facilities] = [1, 0, 1];
{int} hub = { s | s in facilities : y[s] == 1 };

//Output in a CSV file
execute {
  var f = new IloOplOutputFile("1.csv");
  f.writeln("Facilities =");
  var hubs = "[";
  for (var i in hub) {
    hubs += i + ",";
  }
  hubs += "]";
  f.writeln(hubs);
  f.close();
}
This will give:
Facilities =
[1,3,]
PS:

{int} facilities = asSet(1..3);
int y[facilities] = [1, 0, 1];
{int} hub = { s | s in facilities : y[s] == 1 };

//Output in a CSV file
execute {
  var f = new IloOplOutputFile("1.csv");
  f.writeln("Facilities =");
  var hubs = "[";
  for (var i in hub) {
    hubs += i;
    if (i != Opl.last(hub)) hubs += ",";
  }
  hubs += "]";
  f.writeln(hubs);
  f.close();
}
gives
Facilities =
[1,3]

Use a Sequelize findAll query in a for loop and merge the results

I'm coding an open-source project for a university course.
It is a function that splits the input keyword on commas and searches another table for each value.
Here is some example data:
Python,CPP,Csharp
var keyword = result[0].keyword;
var keyword_arr = [];
var keyword_split = keyword.split(',');

for (var i in keyword_split) {
  keyword_arr.push(keyword_split[i]);
}
I have succeeded in splitting the keywords on commas as shown above, but when I try to run the findAll query in a loop with Sequelize,
"Error: Can not set headers after they are sent."
is returned and the query is not executed.
I want to output the merged results. What should I do?
my code is
for (i = 0; i < keyword_arr.length; i++) {
  query += models.contents.findAll({
    where: { keyword: { like: '%' + keyword_arr[i] + '%' } },
    raw: true
  });
}
Regards.
You were heading in the right direction, but here is how you can do it:
queries = [];
for (i = 0; i < keyword_arr.length; i++) {
  queries.push({ keyword: { like: '%' + keyword_arr[i] + '%' } });
}

models.contents.findAll({
  where: {
    $or: queries
  },
  raw: true
}).then(results => {
  console.log(results); // <---- Check this
});
NOTES:
models.contents.findAll() // <---- returns a promise
You can't just combine the promises with += as they are not strings or numbers.
In your case it creates and runs a separate query for each tag, which is not the proper way to do it; you should combine the tags and create a single query as I did.
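Side note: the $or / like string aliases shown above were removed in Sequelize v5, so on newer versions the same single combined query would use the Op symbols. A minimal sketch, assuming the same models.contents model and keyword_arr from the question:

const { Op } = require('sequelize');

// Build one LIKE condition per keyword, then OR them together in a single findAll.
const conditions = keyword_arr.map(kw => ({ keyword: { [Op.like]: '%' + kw + '%' } }));

models.contents.findAll({
  where: { [Op.or]: conditions },
  raw: true
}).then(results => {
  console.log(results); // merged result set from one query
});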

JSON contains array of nulls with three objects - expected only three objects

Using a standalone Google Apps Script and a Google Spreadsheet. I have this script which returns as JSON an array of nulls plus three objects, but I expected to get only the three objects. It's a search: when a zipcode is searched, the script returns any matches. It returns the matches successfully, but it also returns a null for every row that was not a match, in the order the rows appear on the Google Sheet. To try it out, run the function testDoGetWithZipcode().
I don't know if I'm supposed to get those nulls, whether they matter, or how I can fix them. It doesn't seem to match anything I've learned about JSON so far, but before asking this I did an hour-and-a-half Lynda.com course on JavaScript and JSON, read the JSON.org website, and read the Mozilla documentation on JSON. I've adjusted variables in all of the functions because at first I thought the problem was in formatOrganization(), but now I'm completely stumped.
var s = SpreadsheetApp.openById("1280aUAvFoUDP2rtpCFS2JYR7TuQNYcd5gm8QudukiGc");
var sheet = s.getSheetByName("RAP - Data");
var data = sheet.getDataRange().getValues();
var headings = data[0];

function zipcodeQuery(zipcode) {
  var zipcodeArray = [];
  for (var i = 1; i < data.length; i++) {
    if (zipcode === data[i][4].toString()) {
      zipcodeArray.push(data[i]);
    }
  }
  return zipcodeArray;
}

function formatOrganization(rowData) {
  var organization = {};
  for (var i = 0; i < headings.length; i++) {
    Logger.log('Headings: ' + headings[i]);
    organization[headings[i].toString()] = rowData[i];
  }
  return organization;
}
function executeZipcodeQuery(request) {
  var zipcodes = request.parameters.zipcode;
  // The object to be returned as JSON
  var response = {
    organizations: []
  };
  // Fill the organizations dictionary with the requested organizations
  for (var i = 0; i < zipcodes.length; i++) {
    var sheetData = zipcodeQuery(zipcodes[i]);
    if (sheetData !== undefined) {
      for (var orgIndex = 0; orgIndex < sheetData.length; orgIndex++) {
        var org = formatOrganization(sheetData[orgIndex]);
        if (org !== undefined) {
          Logger.log('Org object: ' + org);
          if (typeof org === 'object') {
            //FIXME
            var orgId = parseInt(org.Id);
            Logger.log('Org Id: ' + orgId);
            response.organizations[orgId] = org;
            //response.organizations.push({orgId : org});
          }
        }
      }
    }
  }
  if (response.organizations.length > 0) {
    return ContentService.createTextOutput(JSON.stringify(response.organizations));
  } else {
    return ContentService.createTextOutput('Invalid Request. zipcode(s) not found.');
  }
}
function testDoGetWithZipcode() {
  var testRequest = {"parameter":{"zipcode":"19132"},"contextPath":"","contentLength":-1,"queryString":"zipcode=19132","parameters":{"zipcode":["19132"]}};
  var textResult = doGet(testRequest);
  textResult.setMimeType(ContentService.MimeType.JSON);
  Logger.log('Mime Type: ' + textResult.getMimeType());
  Logger.log('Result content: ' + textResult.getContent());
}
The return I get is this (abridged, because there are over 180 rows in the spreadsheet and every one of them is represented in the return by either null or an object):
[
  null,
  ....
  null,
  {
    "Id": 61,
    "Category": "Day / Drop in Centers",
    "Organization Name": "Philadelphia Recovery Community Center (PRCC)",
    "Address": "1701 W Lehigh Ave, Philadelphia, PA 19132",
    "Zip Code": 19132,
    "Days": "Mon, Tues, Thurs, Fri: 12-8pm, Wed: 9-5pm, Sat: 9-1pm",
    "Time: Open": "",
    "Time: Close": "",
    "People Served": "Women, Men, Families",
    "Description": "Case management, outpatient treatment, youth programs, training programs",
    "Phone Number": "215-223-7700"
  },
  ....
  null,
  {
    "Id": 81,
    "Category": "Emergency Shelter",
    "Organization Name": "Station House",
    "Address": "2601 N Broad St, Philadelphia, PA 19132",
    "Zip Code": 19132,
    "Days": "",
    "Time: Open": "",
    "Time: Close": "",
    "People Served": "Men",
    "Description": "After hours reception for single men\n 2601 N. Broad Street\n After 4 pm",
    "Phone Number": "215-225-9230"
  },
  null,
  ...
]
Your original object is this:
response = {
organizations : []
}
The value of the key/value pair for organizations is an array, but you are using notation as if organizations were an object.
response.organizations[orgId] = org
You could push a value into the array with:
response.organizations.push(org);
I'd probably try something like this:
var tempObject = {}; //Reset every time
tempObject[orgId] = org;
response.organizations.push(tempObject);
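For completeness, the nulls come from the fact that response.organizations[orgId] = org with a large numeric index turns the array into a sparse array, and JSON.stringify() serializes every empty slot as null. A tiny hypothetical demo you can paste into the script editor:

function demoSparseArray() {
  var arr = [];
  arr[3] = { Id: 3 };              // slots 0-2 are now empty
  Logger.log(JSON.stringify(arr)); // logs [null,null,null,{"Id":3}]
}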

Parsing Google Maps JSON data for Geocoding in JQ (Not JQuery)

I am trying to get the country and city names from lat and long values with jq.
Here is the full example JSON:
https://maps.googleapis.com/maps/api/geocode/json?latlng=55.397563,10.39870099999996&sensor=false
I pasted the returned JSON into jqplay and tried to select the country and city names, but the closest I got is
.results[0].address_components[].short_name
How can I select only the nodes where "types" : [ "country", "political" ]?
Thanks
It's unclear to me what exactly you're looking for. Each result has a set of types, and each address component also has a set of types. Which one did you want? We can write a filter that matches what you attempted, but considering the data it will be completely useless to you: the only item that contains the types you listed is just a country name.
Anyway, assuming you wanted to get a result object that has the types "country" and "political", use the contains() filter.
.results | map(
  select(
    .types | contains(["country", "political"])
  )
)
Otherwise you'll need to clarify what exactly you want from this data set, ideally with an example of the expected results.
I wrote a function to do this.
/**
 * geocodeResponse is an object full of address data.
 * This function will "fish" for the right value.
 *
 * example: type = 'postal_code' =>
 *   geocodeResponse.address_components[5].types[1] = 'postal_code'
 *   geocodeResponse.address_components[5].long_name = '1000'
 *
 * type = 'route' =>
 *   geocodeResponse.address_components[1].types[1] = 'route'
 *   geocodeResponse.address_components[1].long_name = 'Wetstraat'
 */
function addresComponent(type, geocodeResponse, shortName) {
  for (var i = 0; i < geocodeResponse.address_components.length; i++) {
    for (var j = 0; j < geocodeResponse.address_components[i].types.length; j++) {
      if (geocodeResponse.address_components[i].types[j] == type) {
        if (shortName) {
          return geocodeResponse.address_components[i].short_name;
        }
        else {
          return geocodeResponse.address_components[i].long_name;
        }
      }
    }
  }
  return '';
}
Here is how to use it, for example:
...
myGeocoder.geocode({'latLng': marker.getPosition()}, function(results, status) {
  if (status == google.maps.GeocoderStatus.OK && results[1]) {
    var country = addresComponent('country', results[1], true);
    var postal_code = addresComponent('postal_code', results[1], true);
    ...
  }
});
...
I used it here: saving marker data into db
Assign the JSON to a results variable: var results = {your json}.
Then try this:
for (var idx in results.results) {
  var address = results.results[idx].address_components;
  for (var elmIdx in address) {
    if (address[elmIdx].types.indexOf("country") > -1 &&
        address[elmIdx].types.indexOf("political") > -1) {
      address[elmIdx].short_name // this is the country code (short name)
      address[elmIdx].long_name  // this is the country name
    }
  }
}
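Since the question also asks for the city, here is a hedged sketch that reuses the same indexOf approach to pick out both values in one pass. It assumes the city carries Google's usual "locality" type, which is not guaranteed for every region:

function countryAndCity(results) {
  var found = { country: '', city: '' };
  for (var idx in results.results) {
    var address = results.results[idx].address_components;
    for (var elmIdx in address) {
      if (address[elmIdx].types.indexOf("country") > -1) {
        found.country = address[elmIdx].long_name;   // full country name
      }
      if (address[elmIdx].types.indexOf("locality") > -1) {
        found.city = address[elmIdx].long_name;      // city name
      }
    }
  }
  return found;
}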

ScriptDB object size calculation

I'm trying to estimate the limits of my current GAS project. I use ScriptDB to chunk out processing to get around the 6 min execution limit. If I have an object like
var userObj = {
  id:      '',   // user email address
  count:   0,    // integer 1-1000
  trigger: '',   // trigger ID
  label:   '',   // string, ~30 chars or less
  folder:  '',   // Google Drive folder ID
  sendto:  '',   // 'true' or 'false'
  shareto: ''    // 'true' or 'false'
};
How would I calculate the size that this object takes up in the DB? I would like to project how many of these objects can exist concurrently before I reach the 200MB limit for our domain.
Whenever you've got a question about google-apps-script that isn't about the API, try searching for javascript questions first. In this case, I found JavaScript object size, and tried out the accepted answer in apps-script. (Actually, the "improved" accepted answer.) I've made no changes at all, but have reproduced it here with a test function so you can just cut & paste to try it out.
Here's what I got with the test stud object, in the debugger.
Now, it's not perfect - for instance, it doesn't factor in the size of the keys you'll use in ScriptDB. Another answer took a stab at that. But since your object contains some potentially huge values, such as an email address which can be 256 characters long, the key lengths may be of little concern.
// https://stackoverflow.com/questions/1248302/javascript-object-size/11900218#11900218
function roughSizeOfObject( object ) {
  var objectList = [];
  var stack = [ object ];
  var bytes = 0;

  while ( stack.length ) {
    var value = stack.pop();

    if ( typeof value === 'boolean' ) {
      bytes += 4;
    }
    else if ( typeof value === 'string' ) {
      bytes += value.length * 2;
    }
    else if ( typeof value === 'number' ) {
      bytes += 8;
    }
    else if ( typeof value === 'object' && objectList.indexOf( value ) === -1 ) {
      objectList.push( value );
      for ( i in value ) {
        stack.push( value[ i ] );
      }
    }
  }
  return bytes;
}
function Marks() {
  this.maxMarks = 100;
}

function Student() {
  this.firstName = "firstName";
  this.lastName = "lastName";
  this.marks = new Marks();
}

function test() {
  var stud = new Student();
  var studSize = roughSizeOfObject(stud);
  debugger;
}
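A hypothetical back-of-the-envelope projection for the userObj from the question: plug sample values of roughly the expected size into the object, measure it once with roughSizeOfObject(), and divide the result into the 200 MB quota. The sample values below are made up for illustration.

function estimateUserObjCapacity() {
  var sample = {
    id:      'a.fairly.long.user.name@example-domain.com', // email address
    count:   1000,                                          // integer 1-1000
    trigger: 'aBcDeFgHiJkLmNoPqRsTuVwX',                    // trigger ID (opaque string)
    label:   'a label of roughly thirty chars',             // ~30 char string
    folder:  '0B9x8FakeDriveFolderId1234567890',            // Google Drive folder ID
    sendto:  'true',
    shareto: 'false'
  };
  var bytesPerObject = roughSizeOfObject(sample);
  var quotaBytes = 200 * 1024 * 1024;                       // 200 MB domain limit from the question
  Logger.log(bytesPerObject + ' bytes per object, roughly ' +
             Math.floor(quotaBytes / bytesPerObject) + ' objects before the quota');
}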