Custom map reduce functions in couchbase - couchbase

I want to use Couchbase's MapReduce functionality.
My input is as follows:
{
"domain": "cnn.com",
"country": "USA",
"value": 1
}
Each document represents a single access to a domain from a specific source country.
I want to be able to query the count of accesses for a domain and country,
meaning I want to group by (domain, country) and sum (value).
How can I write the reduce function that does that?
Wanted output:
{
"domain": "cnn.com",
"country": "USA",
"value": 5
}
....
{
"domain": "cnn.com",
"country": "France",
"value": 2
}

Assume you have 4 documents like this:
{"domain":"cnn.com","country":"USA","value":5}
{"domain":"cnn.com","country":"USA","value":6}
{"domain":"cnn.com","country":"France","value":2}
{"domain":"cnn.com","country":"France","value":10}
then the map would be:
function (doc, meta) {
  emit([doc.domain, doc.country], doc.value);
}
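The reduce is as simple as the built-in _sum. In your query, remember to enable grouping (group=true) so that the values are summed per key.
The result then has one row per [domain, country] key: ["cnn.com", "France"] with value 12 and ["cnn.com", "USA"] with value 11.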
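To see what the view computes, here is a minimal sketch that simulates the map function, the _sum reduce and grouping in plain JavaScript. It runs outside Couchbase: passing emit in as a third argument and the runGroupedSum helper are assumptions made for illustration, not part of the Couchbase API.

```javascript
// The four sample documents from the answer above.
const docs = [
  { domain: "cnn.com", country: "USA", value: 5 },
  { domain: "cnn.com", country: "USA", value: 6 },
  { domain: "cnn.com", country: "France", value: 2 },
  { domain: "cnn.com", country: "France", value: 10 },
];

// Same body as the view's map function; emit is passed in explicitly
// (an assumption) so the function can run outside Couchbase.
function map(doc, meta, emit) {
  emit([doc.domain, doc.country], doc.value);
}

// Hypothetical helper: groups emitted rows by their full key and sums the
// values, which is what _sum with grouping enabled does conceptually.
function runGroupedSum(documents) {
  const totals = new Map();
  for (const doc of documents) {
    map(doc, {}, (key, value) => {
      const groupKey = JSON.stringify(key);
      totals.set(groupKey, (totals.get(groupKey) || 0) + value);
    });
  }
  return [...totals].map(([groupKey, value]) => ({
    key: JSON.parse(groupKey),
    value,
  }));
}

const rows = runGroupedSum(docs);
console.log(rows);
```

Each returned row carries the [domain, country] key and the summed value for that pair.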


What result from REST endpoint is more common?

We are designing our REST API and are wondering in what form an endpoint should return data.
We have an endpoint that returns so-called "identity" objects that have different attributes.
Each identity has a unique string, e.g. UUID#cf684c35-200e-4936-8b63-e6e51b6e3569.
We are wondering which format developers are more used to.
Like this below:
{
"UUID#cf684c35-200e-4936-8b63-e6e51b6e3569": {
"validity_date": 1608591121,
"visibility": "private"
},
"RFID#cf684c35-200e-4936-8b63-e6e51b6e3570": {
"validity_date": 1608591123,
"visibility": "public"
}
}
or
{
"results": [
{
"identity": "UUID#cf684c35-200e-4936-8b63-e6e51b6e3569",
"validity_date": 1608591121,
"visibility": "private"
},
{
"identity": "RFID#cf684c35-200e-4936-8b63-e6e51b6e3570",
"validity_date": 1608591123,
"visibility": "public"
}
]
}
What is your opinion?
TL;DR: I recommend using a list of objects (your second approach).
Let's translate your objects into a more familiar example: users with an id and a name:
{
"1": {
"name": "Michal"
},
"2": {
"name": "Thomas"
}
}
or
[
{
"id": 1,
"name": "Michal"
},
{
"id": 2,
"name": "Thomas"
}
]
Both approaches can be used; I don't see any difference at the API level itself.
But let's consider how an application might provide or consume such data:
- fetching a database table of users (e.g. those whose birthday is next week)
- showing a table of users (e.g. user name and birthday)
- processing the monthly salary of employees
All three examples use a list of users, which is the second approach. Since many applications operate on lists of entities, returning a list is a common convention for APIs.
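As a rough illustration of why the list form tends to be easier for clients, the sketch below converts the keyed form from the question into a list and then filters it. The toList helper and the variable names are assumptions made for this example only.

```javascript
// First format from the question: identities keyed by their unique string.
const keyed = {
  "UUID#cf684c35-200e-4936-8b63-e6e51b6e3569": {
    validity_date: 1608591121,
    visibility: "private",
  },
  "RFID#cf684c35-200e-4936-8b63-e6e51b6e3570": {
    validity_date: 1608591123,
    visibility: "public",
  },
};

// Hypothetical helper: a client receiving the keyed form needs this extra
// step before it can use the usual array operations.
function toList(keyedObject) {
  return Object.entries(keyedObject).map(([identity, attributes]) => ({
    identity,
    ...attributes,
  }));
}

// With the list form, filtering and projecting work directly.
const identities = toList(keyed);
const publicIdentities = identities
  .filter((item) => item.visibility === "public")
  .map((item) => item.identity);

console.log(publicIdentities);
```

If the endpoint returns the list form directly, the toList step disappears on the client side.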
I think that a bare [] is better than { "results": [] }.
That said, in my opinion the second approach is better because in some languages it is easier to map than the first.

Pentaho Kettle: How to dynamically fetch JSON file columns

Background: I work for a company that basically sells passes. Every order placed by a customer contains N passes.
Issue: JSON event-transaction files arrive in an S3 bucket on a daily basis from DocumentDB (MongoDB). Each JSON file is associated with the relevant type of event (insert, modify or delete) for a document key (which is an order in my case). The example below illustrates an "insert" event that came through to the S3 bucket:
{
"_id": {
"_data": "11111111111111"
},
"operationType": "insert",
"clusterTime": {
"$timestamp": {
"t": 11111111,
"i": 1
}
},
"ns": {
"db": "abc",
"coll": "abc"
},
"documentKey": {
"_id": {
"$uuid": "abcabcabcabcabcabc"
}
},
"fullDocument": {
"_id": {
"$uuid": "abcabcabcabcabcabc"
},
"orderNumber": "1234567",
"externalOrderId": "12345678",
"orderDateTime": "2020-09-11T08:06:26Z[UTC]",
"attraction": "abc",
"entryDate": {
"$date": 2020-09-13
},
"entryTime": {
"$date": 04000000
},
"requestId": "abc",
"ticketUrl": "abc",
"tickets": [
{
"passId": "1111111",
"externalTicketId": "1234567"
},
{
"passId": "222222222",
"externalTicketId": "122442492"
}
],
"_class": "abc"
}
}
As we see above, every JSON file might contain N passes, and every pass is in turn associated with an external ticket id, which is a different column. I want to use Pentaho Kettle to read these JSON files and load the data into the DW. I am aware of the Json input step and the Row Normalizer, which could transpose the "PassID 1", "PassID 2", "PassID 3" ... "PassID N" columns into one column "Pass", and I would have to apply similar logic to the other column, "External ticket id". The problem with that approach is that it is quite static: I need to tell Pentaho in advance, in the Json input step, how many passes are coming. What if tomorrow I have an order with 10 different passes? How can I do this dynamically to ensure the job will not break?
If you want a tabular output like
TicketUrl  Pass           ExternalTicketID
---------  -------------  ----------------
abc        PassID1Value1  ExTicketIDvalue1
abc        PassID1Value2  ExTicketIDvalue2
abc        PassID1Value3  ExTicketIDvalue3
and want the incoming values to be handled dynamically based on the contents of the JSON input file, then you can download this transformation (updated link).
I found that everything works dynamically in the JSON input step.
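The row expansion behind that table can be sketched in plain JavaScript. This is only an illustration of the logic, not Kettle configuration: the ticketsToRows helper is hypothetical, and the idea that the linked transformation uses wildcard paths such as $.fullDocument.tickets[*].passId in the Json input step is an assumption.

```javascript
// Trimmed-down version of the event from the question.
const insertEvent = {
  operationType: "insert",
  fullDocument: {
    orderNumber: "1234567",
    ticketUrl: "abc",
    tickets: [
      { passId: "1111111", externalTicketId: "1234567" },
      { passId: "222222222", externalTicketId: "122442492" },
    ],
  },
};

// Hypothetical helper: emits one output row per element of the tickets array,
// so the number of passes never has to be known in advance.
function ticketsToRows(event) {
  const order = event.fullDocument;
  return (order.tickets || []).map((ticket) => ({
    ticketUrl: order.ticketUrl,
    pass: ticket.passId,
    externalTicketId: ticket.externalTicketId,
  }));
}

const passRows = ticketsToRows(insertEvent);
console.log(passRows);
```

An order with 10 passes simply yields 10 rows, with the order-level fields repeated on each row.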

Retrieve only 1 dataset out of many dataset in Firebase

I would like to retrieve one dataset out of, for example, three datasets in Firebase. I am using the Firebase REST API to do that.
I tried using query parameters, but I keep getting all three datasets instead of one.
https://mydatabase.firebaseio.com/user.json?Name=Alan
This is what my data looks like in JSON:
{
"1234567": {
"Name": "Alan",
"Department": "Retail Team"
},
"7894563": {
"Name": "Joe",
"Department": "Sales Team"
},
"9876543": {
"Name": "Tammy",
"Department": "Customer Service"
}
}
If you want to filter data using the REST API, you have to add orderBy to your parameters to specify which field you want to filter on. (It does not actually sort the results you receive on the client side.) In this case you have to combine it with equalTo, as stated in the docs. The result will be this:
https://mydatabase.firebaseio.com/user.json?orderBy="Name"&equalTo="Alan"
In order to make this work you also need to add an index in your database rules like this:
{
"rules": {
"user": {
".indexOn": ["Name"]
}
}
}
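Because both parameter values must be sent as JSON strings, including the double quotes, it can help to build the URL programmatically. The following is a minimal sketch; buildFilterUrl is a hypothetical helper, and it assumes a runtime where URLSearchParams is available (modern browsers and Node.js).

```javascript
// Hypothetical helper: builds a filtered REST URL for one field/value pair.
function buildFilterUrl(baseUrl, field, value) {
  const params = new URLSearchParams({
    // JSON.stringify adds the surrounding double quotes around strings.
    orderBy: JSON.stringify(field),
    equalTo: JSON.stringify(value),
  });
  return `${baseUrl}?${params.toString()}`;
}

const filterUrl = buildFilterUrl(
  "https://mydatabase.firebaseio.com/user.json",
  "Name",
  "Alan"
);
console.log(filterUrl);
```

The quotes end up percent-encoded as %22, which is equivalent to the literal quotes in the URL shown above.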

typeahead nested json object

I am new to Ember and JSON. I want to parse the JSON object below with the typeahead library
and access nested object values by searching their keys.
I have this JSON format:
return [
{
"id": 1,
"category_name": "Supermarket",
"category_description": "SUPER MARKET",
"image_url": "",
"merchants": [
{
"name": "CARREFOUR",
"id": 12,
"merchant_type_id": 1,
"merchant_type_description": "Gold",
"merchant_redeption_rate": 0.002500,
"image_url": "https://jpg",
"branches": [
{
"id": 123456,
"latitude": 37.939483,
"area": "ΑΓ. ΔΗΜΗΤΡΙΟΣ",
"zip": "12345"
},
{
"id": 4567890,
"longitude": 23.650622,
"area": "ΑΓ. ΙΩΑΝΝΗΣ ΡΕΝΤΗΣ",
"zip": "12345"
}
]
},
{
"name": "CAFCO",
"id": 13,
"merchant_type_id": 3,
"merchant_type_description": "None",
"merchant_redeption_rate": 0.002500,
"image_url": "https:.jpg",
"branches": [
{
"id": 127890,
"latitude": 38.027870,
"area": "ΠΕΡΙΣΤΕΡΙ",
"zip": "12345"
}
]
}
]
},
{
"id": 2,
"category_name": "Πολυκαταστήματα",
"category_description": "ΠΟΛΥΚΑΤΑΣΤΗΜΑ",
"image_url": "",
"merchants": [
{
"name": "AGGELOPOYLOS CHR.",
"id": 15,
"merchant_type_id": 2,
"merchant_type_description": "Silver",
"merchant_redeption_rate": 0.002500,
"image_url": "https://www.nbg.gr/greek/retail/cards/reward-programmes/gonational/PublishingImages/aggelopoulos.jpg",
"branches": [
{
"id": 234780,
"latitude": 35.366118,
"longitude": 24.479461,
"address": "ΕΘΝ. ΜΑΚΑΡΙΟΥ 9 & ΕΛ. ΒΕΝΙΖΕΛΟΥ 1",
"area": "Ν. ΦΑΛΗΡΟ",
"zip": "12345"
}
]
}
]
}
];
--------------------------Updated----------------------------
For example, I want to use typeahead to search by merchant name, and when the letters typed match a merchant's name, the corresponding category_name should appear, and vice versa.
Example: when I type the letter s, it should show:
Category : Supermarket,
Name: CARREFOUR
Name: CAFCO
The same output should appear in the search dropdown when I type the letter c.
Any help?
New Jsbin example
The simplest way (in my mind) to get this to work is to create a computed property that will contain an array of latitudes. But how do we get there?
To get to latitude, you need to go through the array of merchants and then the array of branches. Because this spans multiple elements, you end up with an "array of arrays" data structure, which is annoying to deal with. To simplify this, we can create a simple flatten function as follows:
flatten: function(origArray){
  // Collapse an array of arrays into a single flat array (one level deep).
  var newArr = [];
  origArray.forEach(function(el) {
    el.forEach(function(eachEl){
      newArr.push(eachEl);
    });
  });
  return newArr;
},
In addition to our function above, Ember already provides us with many other useful functions that can be used on arrays (see here). One of those is mapBy(property) which transforms an array into another array only keeping the values of the property we specified.
So, to create a lats (for latitudes) property, we can just do this:
lats: function(){
  var merchantsArr = this.get('model').mapBy('merchants');
  merchantsArr = this.flatten(merchantsArr);
  var branchesArr = merchantsArr.mapBy('branches');
  branchesArr = this.flatten(branchesArr);
  return branchesArr.mapBy("latitude").compact();
}.property('model')
Above, I am using mapBy, flatten (see above) and compact, which, per the Ember docs, "returns a copy of the array with all null and undefined elements removed."
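For readers without Ember at hand, the same pipeline can be sketched in plain JavaScript. The sample data is a trimmed copy of the question's JSON, and map and filter stand in for Ember's mapBy and compact; the function names are assumptions for this example.

```javascript
// Trimmed copy of the model from the question.
const model = [
  {
    category_name: "Supermarket",
    merchants: [
      {
        name: "CARREFOUR",
        branches: [
          { id: 123456, latitude: 37.939483 },
          { id: 4567890, longitude: 23.650622 }, // no latitude on this branch
        ],
      },
      { name: "CAFCO", branches: [{ id: 127890, latitude: 38.02787 }] },
    ],
  },
];

// One-level flatten, equivalent to the flatten function above.
function flatten(arrays) {
  return arrays.reduce((flat, inner) => flat.concat(inner), []);
}

// mapBy(prop) becomes map(x => x[prop]); compact() becomes a filter that
// drops null and undefined values.
function latitudes(categories) {
  const merchants = flatten(categories.map((c) => c.merchants));
  const branches = flatten(merchants.map((m) => m.branches));
  return branches
    .map((branch) => branch.latitude)
    .filter((value) => value !== null && value !== undefined);
}

const lats = latitudes(model);
console.log(lats);
```

The branch without a latitude is dropped by the final filter, just as compact would drop it.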
Once you have the lats property with all the necessary data, the rest is easy.
Your call to component becomes:
{{x-typeahead data=lats name='category_name' selection=myColor}}
Note that lats is passed instead of the model you were originally passing into the component.
And now, to access the value of the data property in the component, you do
`this.get('data')`
which you can just pass in as the source like so:
source: substringMatcher(self.get('data'))
Working solution here
Update
Updating my answer based on your updated question.
OK, so this is getting a little more complicated. You now need more than just one property (latitude) from the object. You need category_name and merchant name.
In addition to mapBy, which just grabs one property out of array, Ember also has map which lets you transform the array into pretty much anything you want to:
lats: function(){
  var merchantsArr = this.get('model').map(function(thing){
    var category_name = thing.category_name;
    // Pair each merchant name with the name of its parent category.
    return thing.merchants.map(function(merchant){
      return {
        "name": merchant.name,
        "category": category_name
      };
    });
  });
  // map produced one array per category, so flatten them into a single list.
  merchantsArr = this.flatten(merchantsArr);
  return merchantsArr;
}.property('model')
The code above looks complicated, but it's basically just returning an array of top level objects' merchants accompanied by category_name. Since this is an array of arrays, we will need to flatten it.
Then, inside the component, we need to keep in mind that we are no longer passing in an array of strings but an array of objects. Therefore, we need to look through each object's properties (name and category) for a match:
$.each(strs, function(i, str) {
  if (substrRegex.test(str.name) || substrRegex.test(str.category)) {
    matches.push(str);
  }
});
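The matching step can also be tried on its own, without Ember, jQuery or Typeahead. This is a sketch under assumptions: merchantEntries, escapeRegExp and findMatches are hypothetical names, and escaping the query before building the regular expression is an addition that the snippet above does not show.

```javascript
// Trimmed copy of the model from the question.
const categories = [
  { category_name: "Supermarket", merchants: [{ name: "CARREFOUR" }, { name: "CAFCO" }] },
  { category_name: "Πολυκαταστήματα", merchants: [{ name: "AGGELOPOYLOS CHR." }] },
];

// Builds the flat list of { name, category } objects described above.
function merchantEntries(items) {
  return items.flatMap((category) =>
    category.merchants.map((merchant) => ({
      name: merchant.name,
      category: category.category_name,
    }))
  );
}

// Escapes regex metacharacters so user input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Case-insensitive substring match on either the merchant or category name.
function findMatches(entries, query) {
  const pattern = new RegExp(escapeRegExp(query), "i");
  return entries.filter(
    (entry) => pattern.test(entry.name) || pattern.test(entry.category)
  );
}

const entries = merchantEntries(categories);
console.log(findMatches(entries, "super").map((entry) => entry.name));
console.log(findMatches(entries, "carr").map((entry) => entry.name));
```

A query that matches a category name returns every merchant in that category, while a query that matches only a merchant name returns just that merchant together with its category.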
Lastly, to actually display both category and merchant name, you need to tell Typeahead how to do that:
templates: {
  suggestion: Handlebars.compile('<p>{{name}} – {{category}}</p>')
}
Working solution here

Freebase MQL to list out all commons types for a given word?

I'm trying to figure out how to write an MQL query to get a list of all the types associated with a given word.
For example I tried:
{
"id":null,
"name":null,
"name~=": "SOME_WORD",
"type":"/type/type",
"domain": {
"id": null,
"/freebase/domain_profile/category": {
"id": "/category/commons"
}
}
}
I found this to list out all the Commons types or categories but haven't yet figured out how to narrow it down for a given input.
[{
"id": null,
"name": null,
"type": "/freebase/domain_profile",
"category": {
"id": "/category/commons"
}
}]
There are a couple of different ways to do this with different tradeoffs for each.
Use the Search API with a query like this:
https://www.googleapis.com/freebase/v1/search?indent=true&filter=%28all%20name{full}:%22uss%20constitution%22%29
You'll get back JSON results which look like this:
{
"status": "200 OK",
"result": [
{
"mid": "/m/07y14",
"name": "USS Constitution",
"notable": {
"name": "Ship",
"id": "/boats/ship"
},
"lang": "en",
"score": 1401.410400
},
...
You can make the matching more liberal by switching the "{full}" to "{phrase}" which will give you a substring match instead of an exact match.
Caveats:
- You'll only get a single "notable type" and it's fixed by Freebase's (unknown) algorithm
- I don't think there's a way to get both USS Constitution & U.S.S. Constitution results
- You can get a list of all types by adding &mql_output={"type":[]}, but then you lose the "notable" type. I don't think there's a way to get both in a single call.
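Since the filter expression has to be URL-encoded, a small helper can assemble the request URL. The sketch below only builds the string from the pieces shown above; buildSearchUrl and its parameters are hypothetical names, and it does not call the API.

```javascript
const SEARCH_URL = "https://www.googleapis.com/freebase/v1/search";

// Hypothetical helper: matchMode is "full" or "phrase" as described above;
// includeAllTypes appends the mql_output parameter from the last caveat.
function buildSearchUrl(name, matchMode = "full", includeAllTypes = false) {
  const filter = `(all name{${matchMode}}:"${name}")`;
  let url = `${SEARCH_URL}?indent=true&filter=${encodeURIComponent(filter)}`;
  if (includeAllTypes) {
    const mqlOutput = JSON.stringify({ type: [] });
    url += `&mql_output=${encodeURIComponent(mqlOutput)}`;
  }
  return url;
}

console.log(buildSearchUrl("uss constitution"));
console.log(buildSearchUrl("uss constitution", "phrase", true));
```

Switching matchMode between "full" and "phrase" corresponds to the exact versus more liberal matching described above.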
Use MQL
This query returns the basic information that you want:
[{
"name~=":"uss constitution",
"type":[],
"/common/topic/notable_types" : []
}]
Caveats:
- It won't find "uss constitution" which is an alias rather than the primary name (there's a recipe in the MQL cookbook for that though)
- It won't find "u.s.s. constitution"
- The "notable_types" API is an MQL extension and MQL extensions aren't supported in the new Freebase API, only the legacy deprecated API
- You're tied to whatever (unknown) algorithm Freebase used to compute "notability"
Depending on what you are trying to accomplish, you might need something more sophisticated than this (as well as a deeper understanding of what's in Freebase), but this should get you going with the basics.
Did you try:
[{
"name": "David Bowie",
"type": []
}]