Denormalization vs Child/Parent & Nesting - mysql

We are designing an Elasticsearch model for events, their schedules, and the venues where the events take place.
Examples of queries we might need:
Find events that are concerts, between 1/7/2017 and 7/7/2017
Find artists who perform in London where the event is a theatre play
Find events that are movies and have a score > 70%
Find users who attend the event AwesomeEvent
Find venues whose locality is London and that have any event planned from today onward
I've read the Elasticsearch documentation, a few articles, and some Stack Overflow questions, but I'm still not sure about our model because it's very specific.
Examples of possible usage:
1) Using the nested pattern
{
  "title": "Event",
  "body": "This great event is going to be...",
  "Schedules": [
    {
      "name": "Schedule 1",
      "start": "7.7.2017",
      "end": "8.7.2017"
    },
    {
      "name": "Schedule 2",
      "start": "10.7.2017",
      "end": "11.7.2017"
    }
  ],
  "Performers": [
    {
      "name": "Performer 1",
      "genre": "Rock"
    },
    {
      "name": "Performer 2",
      "genre": "Pop"
    }
  ],
  ...
}
Pros:
Flatter model that sticks to the "key: value" approach
Each entity carries all of its information by itself
Cons:
A lot of redundant data
More complex entities
2) Parent/child relation between the following entities (simplified)
{
"title": "Event",
"body": "This great event is going to be...",
}
{
"title": "Schedule",
"start": "7.7.2017",
"end": "8.7.2017"
}
{
"name": "Performer",
"genre": "Rock"
}
Pros:
Avoids duplicating data
Cons:
More joins (even though parent and child are stored on the same shard)
The model is not as flat, and I'm not sure about the performance
So far we have a relational database, where the model works fine but isn't fast enough. For example, in the case of a cinema, one event (movie) can have thousands of schedules in different localities, and we want very fast responses for the filtering described in the first part.
I would appreciate any suggestions toward a proper design of the data model. I would also be glad to have my assumptions reviewed (some of them are probably wrong).

It's hard to denormalize your data. For example, the number of performers in an event is unknown; so if you were to have specific fields for performers, you would need performer1.firstname, performer1.lastname, performer2.firstname, performer2.lastname, etc. However, if you use a nested field instead, you simply define a nested field Performers under the event index with the correct sub-field mappings, and then you can add as many performers as you want. This will let you look up events by performer or performers by event. The same applies to the rest of the indices.
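For illustration, here is a minimal sketch of such a nested mapping and of one of the sample queries against it; the index name events and the typeless (7.x-style) mapping request are assumptions, and the field names simply follow the example above:
PUT /events
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "body": { "type": "text" },
      "Schedules": {
        "type": "nested",
        "properties": {
          "name": { "type": "keyword" },
          "start": { "type": "date" },
          "end": { "type": "date" }
        }
      },
      "Performers": {
        "type": "nested",
        "properties": {
          "name": { "type": "keyword" },
          "genre": { "type": "keyword" }
        }
      }
    }
  }
}
GET /events/_search
{
  "query": {
    "nested": {
      "path": "Schedules",
      "query": {
        "range": { "Schedules.start": { "gte": "2017-07-01", "lte": "2017-07-07" } }
      }
    }
  }
}
The second request sketches the "events between 1/7/2017 and 7/7/2017" query from the question: the nested query matches events that have at least one schedule starting in that range.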
As for parent-child vs. nested: parent-child gives you more independence, since child documents are stored as separate documents and can be added or updated without reindexing the parent (though they are routed to the same shard as the parent). Nested fields can additionally set the include_in_parent option to automatically copy (denormalize) the nested fields into the parent document for you.
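For comparison, here is a rough sketch of the parent/child approach as it looks in current Elasticsearch versions (6.x and later), where the relation is modeled with a join field inside a single index; the field and relation names here are made up:
PUT /events
{
  "mappings": {
    "properties": {
      "event_relation": {
        "type": "join",
        "relations": { "event": ["schedule", "performer"] }
      }
    }
  }
}
GET /events/_search
{
  "query": {
    "has_child": {
      "type": "schedule",
      "query": {
        "range": { "start": { "gte": "2017-07-01", "lte": "2017-07-07" } }
      }
    }
  }
}
Here each schedule is indexed as its own document (routed to its parent event's shard), and has_child returns the events that have at least one matching schedule.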

Related

How to Structure REST API If It Contains Different Structures?

Currently, I have an API endpoint that needs to return a list of items in JSON format. However, each of the items returned has a different structure.
E.g. think of a feed API, where the structure of each item within the feed can be very different.
Is it standard to return an API response with multiple items, each with a different structure?
Below is a made-up sample to show the different structures.
Store, Candy, and Personnel in the example are logically the same kind of thing in my case (3 different items). However, the structure underneath can be very different, with different key-value pairs, different levels of nesting, etc.
[
  {
    "store": {
      "book": [
        {
          "category": "reference",
          "author": "Nigel Rees",
          "title": "Sayings of the Century",
          "price": 8.95
        },
        {
          "category": "fiction",
          "author": "Evelyn Waugh",
          "title": "Sword of Honour",
          "price": 12.99
        }
      ],
      "bicycle": {
        "color": "red",
        "price": 19.95
      }
    }
  },
  {
    "candy": {
      "type": "chocolate",
      "manufacturer": "Hershey's",
      "cost": 10.00,
      "reduced_cost": 9.00
    }
  },
  {
    "Personnel": {
      "name": "chocolate",
      "profile": {
        "key1": "value1",
        "key2": "value2",
        "something": {
          "key1": "value1",
          "key2": "value2"
        }
      }
    }
  }
]
There are no strict rules in REST about how you design your payloads. However, there are certainly things to consider when doing so. Without knowing the specifics of your needs it's hard to give specific advice, but in general, when it comes to designing a JSON REST API, here is what I think about.
1) On average, how large will my payload be? We don't want to pass large amounts of data on each request; this will make your application extremely slow and perhaps even unusable on mobile devices. For me, the limit in the absolute worst case is 1 MB, and maybe even that is too high. If you find your payload is too large, break it down into separate resources: for example, rather than including the books in the response to your stores resource, just reference the unique IDs of the books, which can be accessed through /stores/books/{id} (see the sketch after this list).
2) Is it simple enough that a person who stumbles across the resource can understand the general use of it? The simpler an API is, the more useful it is for users. If the structure is really complex, breaking it into several resources may be a better option.
3) This point somewhat balances point 1. Try to reduce the number of requests needed to get a certain piece of data as much as possible (while still considering the two points above). Excessively breaking payloads down into separate resources also reduces performance.
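As a made-up illustration of point 1 (the path and ids are hypothetical), the stores resource could reference its books by id instead of embedding them:
GET /stores/1
{
  "id": 1,
  "bicycle": {
    "color": "red",
    "price": 19.95
  },
  "books": [
    "/stores/books/8",
    "/stores/books/9"
  ]
}
Clients that need the book details then follow the referenced URLs in separate requests.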

What is the impact (performance-wise) of using LINQ statements like Where, GroupJoin, etc. in a mobile app in Xamarin.Forms

Although the question might sound a bit vague and misleading, I will try to explain it.
In Xamarin.Forms, I would like to present a list of products. The data come from an API call that delivers JSON.
The format of the data is as follows: a list of products and a list of sizes for each product. An example is the following:
{
  "product": {
    "id": 1,
    "name": "P1",
    "imageUrl": "http://www.image.com"
  }
}
{
  "sizes": [
    {
      "productId": 1,
      "size": "S",
      "price": 10
    },
    {
      "productId": 1,
      "size": "M",
      "price": 12
    }
  ]
}
It seems to me that I have 2 options:
The first is to deliver the data from the API call in the above format and transform it into the list I want to present by using LINQ's GroupJoin (hence the title of my question).
The second option is to deliver the finalized list as JSON and just present it in the mobile application without any transformation.
The first option delivers a smaller amount of data but uses a LINQ statement to restructure it; the second option delivers a larger amount of data, but the data is already structured in the desired way.
Obviously, delivering less data is preferable (the first option), but my question is: will the use of a LINQ GroupJoin "kill" the performance of the application?
Just for clarification, the list presented in the mobile application will have 2 items, which will be the following:
P1 - size: S - price: 10
P1 - size: M - price: 12
Thanks
I've had rather complex sets of LINQ statements; I think the most lists I was working with at once was six, with a few thousand items in a couple of those lists and hundreds or fewer in the others, joined and filtered with Where, and the performance impact was negligible. This was in a Xamarin.Forms PCL on Android/iOS.
(I did manage really bad performance once when I was calling LINQ on a LINQ on a LINQ, rather than calling LINQ on a list; i.e. I had to ensure I called ToList() on a given LINQ query before trying to use it in another join, which is understandable given LINQ's deferred/lazy execution.)
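For what it's worth, here is a minimal sketch of the GroupJoin the question describes; Product and ProductSize are hypothetical DTOs for the JSON above, the sample data is hard-coded, and the inputs are materialized List<T>s, per the caveat about deferred execution above:
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical DTOs matching the JSON payloads above.
record Product(int Id, string Name, string ImageUrl);
record ProductSize(int ProductId, string Size, decimal Price);

class Demo
{
    static void Main()
    {
        var products = new List<Product> { new(1, "P1", "http://www.image.com") };
        var sizes = new List<ProductSize> { new(1, "S", 10), new(1, "M", 12) };

        // Join each product to its sizes, then flatten to one display row
        // per product/size pair.
        var rows = products
            .GroupJoin(sizes,
                       p => p.Id,
                       s => s.ProductId,
                       (p, sizeGroup) => sizeGroup.Select(
                           s => $"{p.Name} - size: {s.Size} - price: {s.Price}"))
            .SelectMany(group => group)
            .ToList();

        rows.ForEach(Console.WriteLine);
        // P1 - size: S - price: 10
        // P1 - size: M - price: 12
    }
}
On a list this small the transformation cost is effectively zero; the size of the network payload matters far more.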

How to create a view in Couchbase with multiple WHERE and OR clauses

I'm new to Couchbase, and I'm looking for a solution to scale my social network. Couchbase looks very interesting, especially its easy-to-scale features.
But I'm struggling to create a view for a specific kind of document.
My documents look like this:
{
"id": 9476182,
"authorid": 86498,
"content": "some text here",
"uid": 41,
"accepted": "N",
"time": "2014-12-09 09:58:03",
"type": "testimonial"
}
{
"id": 9476183,
"authorid": 85490,
"content": "some text here",
"uid": 41,
"accepted": "Y",
"time": "2014-12-09 10:44:01",
"type": "testimonial"
}
What I'm looking for is a view that would be equivalent to this SQL query:
SELECT * FROM bucket WHERE (uid='$uid' AND accepted='Y') OR
(uid='$uid' AND authorid='$logginid')
This way I could fetch all of a user's testimonials, including the ones not yet approved, if the user viewing the testimonials page is the owner of that page; otherwise, show all of the given user's testimonials where accepted == "Y", plus the not-yet-approved testimonials written by the user who is viewing the page.
If you could give me some tips about this I'd be very grateful.
Unlike SQL, you cannot directly pass input parameters into views; however, you can emulate this to some extent by filtering on key ranges.
While this does not exactly match the SQL, I would suggest you simply filter testimonials by the user ID and then do the rest of the filtering on the client side. I am making the assumption that in most cases there will not even be any pending testimonials, and therefore you will not really end up with a lot of unnecessary data.
Note that it is possible to do this filtering entirely in views; however, it would require one of the following:
Bigger keys, OR
Multiple views, OR
Multiple queries
In general it is recommended to keep the emitted keys small, as this improves performance, so it's better to stick with the above-mentioned solution.
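As a minimal sketch of that suggestion (the design-document and view names are made up), a Couchbase view that emits the profile owner's uid could look like this, leaving the accepted/authorid check to the client:
{
  "views": {
    "testimonials_by_uid": {
      "map": "function (doc, meta) { if (doc.type == 'testimonial') { emit(doc.uid, null); } }"
    }
  }
}
Query it with ?key=41 (the uid of the profile being viewed), then keep only the documents where accepted == "Y" or authorid equals the logged-in user's id.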

Map or Array for RESTful design of finite, unordered collection?

A coworker and I are in a heated debate regarding the design of a REST service. For most of our API, GET calls to collections return something like this:
GET /resource
[
{ "id": 1, ... },
{ "id": 2, ... },
{ "id": 3, ... },
...
]
We now must implement a call to a collection of properties whose identifying attribute is "name" (not "id" as in the example above). Furthermore, there is a finite set of properties and the order in which they are sent will never matter. The spec I came up with looks like this:
GET /properties
[
{ "name": "{PROPERTY_NAME}", "value": "{PROPERTY_VALUE}", "description": "{PROPERTY_DESCRIPTION}" },
{ "name": "{PROPERTY_NAME}", "value": "{PROPERTY_VALUE}", "description": "{PROPERTY_DESCRIPTION}" },
{ "name": "{PROPERTY_NAME}", "value": "{PROPERTY_VALUE}", "description": "{PROPERTY_DESCRIPTION}" },
...
]
My coworker thinks it should be a map:
GET /properties
{
"{PROPERTY_NAME}": { "value": "{PROPERTY_VALUE}", "description": "{PROPERTY_DESCRIPTION}" },
"{PROPERTY_NAME}": { "value": "{PROPERTY_VALUE}", "description": "{PROPERTY_DESCRIPTION}" },
"{PROPERTY_NAME}": { "value": "{PROPERTY_VALUE}", "description": "{PROPERTY_DESCRIPTION}" },
...
}
I cite consistency with the rest of the API as the reason to format the response collection my way, while he cites that this particular collection is finite and the order does not matter. My question is, which design best adheres to RESTful design and why?
IIRC how you return the properties of a resource does not matter in a RESTful approach.
http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm
From an API client's point of view I would prefer your solution, since it explicitly states that the name of a property is XYZ.
Your coworker's solution implies that the key is the name, but how would I know that for sure (without reading the API documentation)? Try not to assume anything about your consuming clients; just because you know what it means (and it is probably easy enough to guess) does not make it obvious to your clients.
On top of that, it could break consuming clients if you ever decide to change that value from a name back to an ID, which in this case you have already done in the past. Then all the clients would need to change their code, whereas with your solution they would not, unless they need the newly added id (or some other property).
To me the approach would depend on how you need to use the data. Are the property names known beforehand by the consuming system, such that a map lookup could be used to directly access the record you want without needing to iterate over each item? Would there be a method such as...
GET /properties/{PROPERTY_NAME}
If you need to look up properties by name and that sort of method is NOT available, then I would agree with the map approach; otherwise, I would go with the array approach to provide consistent results when querying the resource for the full collection.
I think returning a map is fine as long as the result is not paginated or sorted server side.
If you need the result to be paginated and sorted on the server side, going for the list approach is a much safer bet, as not all clients might preserve the order of a map.
In fact, in JavaScript there is no built-in guarantee that maps will stay sorted (see also https://stackoverflow.com/a/5467142/817385).
The client would need to implement some logic to restore the sort order, which can become especially painful when server and client are using different collations for sorting.
Example
// server sent response sorted with german collation
var map = {
'ä':{'first':'first'},
'z':{'second':'second'}
}
// but we sort the keys with the default Unicode collation algorithm
Object.keys(map).sort().forEach(function(key){console.log(map[key])})
// Object {second: "second"}
// Object {first: "first"}
A bit late to the party, but for whoever stumbles upon this with similar struggles...
I would definitely agree that consistency is very important and would generally say that an array is the most appropriate way to represent a list. Also, APIs should be designed to be useful in general, preferably without optimizing for a specific use case. Sure, it could make implementing the use case you're facing today a bit easier, but it will probably make you want to hit yourself when you're implementing a different one tomorrow. All that being said, for quite a few applications the map-shaped response would of course just be easier (and possibly faster) to work with.
Consider:
GET /properties
[
{ "name": "{PROPERTY_NAME}", "value": "{PROPERTY_VALUE}", "description": "{PROPERTY_DESCRIPTION}" },
...
]
and
GET /properties/*
{
"{PROPERTY_NAME}": { "value": "{PROPERTY_VALUE}", "description": "{PROPERTY_DESCRIPTION}" },
...
}
So / gives you a list whereas /* gives you a map. You might read the * in /* as a wildcard for the identifier, so you're actually requesting the entities rather than the collection. The keys in the response map are simply the expansions of that wildcard.
This way you can maintain consistency across your API while the client can still enjoy the map-format response when preferred. Also you could probably implement both options with very little extra code on your server side.

Filter posts by multiple tags to return posts that have all those tags, with good performance

StackOverflow lets you search for posts by tags, and lets you filter by an intersection of tags, e.g. ruby x mysql x tags. But typically it's inefficient to retrieve such lists from MySQL using multiple joins on the taggings. What's a more performant way to implement filter-by-multiple-tags queries?
Is there a good NoSQL approach to this problem?
In a NoSQL or document-oriented scenario, you'd have the actual tags as part of your document, likely stored as a list. Since you've tagged this question with "couchdb", I'll use that as an example.
A "post" document in CouchDB might look like:
{
"_id": <generated>,
"question": "Question?",
"answers": [... list of answers ...],
"tags": ["mysql", "tagging", "joins", "nosql", "couchdb"]
}
Then, to generate a view keyed by tags:
{
  "_id": "_design/tags",
  "language": "javascript",
  "views": {
    "all": {
      "map": "function(doc) { emit(doc.tags, null); }"
    }
  }
}
In CouchDB, you can issue an HTTP POST with multiple keys, if you wish. An example is in the documentation. Using that technique, you would be able to search by multiple tags.
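For illustration only (the database name and tag values are made up), such a request could look like the following; note that with the map function above, each key must be a complete tags array exactly as emitted:
POST /posts/_design/tags/_view/all
{
  "keys": [
    ["mysql", "tagging", "joins", "nosql", "couchdb"]
  ]
}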
Note: Setting the value to null, above, helps keep the views small. Use include_docs=true in your query if you want to see the actual documents as well.