Best DynamoDB Structure for Implementation - json

I am working on a "save" application: a user can open an article and click save to store it in their profile. Instead of a relational database, the application currently uses DynamoDB. Each article has a specific type. The table is currently structured like this:
user-id [string][DynamoDBHashKey]
type-of-article [string] [DynamoDBRangeKey]
json [string]
user-id is the unique identifier for the user, type-of-article is, well, the type of the article, and json holds all the saved articles of that type in JSON format:
[{article-id: timestamp}, {article-id: timestamp}]
(the first entry is article #1, the second entry is article #2)
article-id is (again) the article's unique identifier and timestamp is the time at which that article was saved.
Note: this was done before DynamoDB started supporting JSON documents as Maps and Lists. The code is not mine; it already existed.
So when the application needs to remove a saved article, it calls DynamoDB to get the JSON, modifies it and stores it again. Adding a new article works the same way. A problem appeared when I wanted to display all the articles ordered by timestamp: I had to fetch every type and merge the results into a dictionary to sort them. (In the user profile I need to show all saved articles, regardless of type, sorted.) The application now takes more than 700 to 900 ms to respond.
Personally I don't think this is the best approach, so I'm thinking of rewriting the existing code to use the newer DynamoDB features (Lists and Maps). My idea for the DynamoDB structure is:
user-id [string] [DynamoDBHashKey]
saved-articles [List]
    article-type_1
        article_1 [Map] {id: article-id, timestamp: date}
        article_2 [Map] {id: article-id, timestamp: date}
    article-type_2
        article_1 [Map] {id: article-id, timestamp: date}
But I'm relatively new to DynamoDB. I wrote some test code to store this in DynamoDB using Lists and Maps, both with the low-level API and with the Object Persistence Model.
Now, my question is: is this a better approach? If not, why not, and what would be a better approach?
With this structure I think I can use the low-level API to get only the saved-articles of article-type #2, or fetch everything when I need all of them.

I would stick with a solution that is a bit more NoSQL-like. With NoSQL databases, nested data models and frequent updates to existing records are often indicators that your data model can be optimized. I really see two objects that your application is using, 'users' and 'articles'. I would avoid both a nested data model and updates to existing records by doing the following:
'user' table
    user id as hash key
'article' table
    article id as hash key
    timestamp as range key
    user id (used in the global secondary index described below)
    article type and any other attributes as non-key attributes
You'd also have a global secondary index on the article table that would allow you to search for articles by user id. Assuming you want a user's articles sorted by date, it would look something like this:
user id as hash key
timestamp as range key
article id as projected attribute
With this model, you never need to go back and edit existing records; you just add 'edited' records as new records and take the one with the most recent timestamp as the current version.
One thing to remember with NoSQL is that storage space is cheap and reads are cheap, but editing existing records is usually an expensive and undesirable operation.
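As a rough illustration of the item-per-saved-article layout described above, here is a minimal in-memory sketch in Python. It does not call DynamoDB at all: the list `article_table` and the functions `save_article` and `query_by_user` are hypothetical stand-ins, and the sample ids and timestamps are made up. It only shows that each save is an append and that the "all saved articles, sorted by time" view becomes a single lookup by user id.

```python
# Hypothetical in-memory stand-in for the 'article' table:
# one item per saved article, never rewritten after insertion.
article_table = []


def save_article(user_id, article_id, article_type, timestamp):
    """Append a new item instead of modifying an existing record."""
    article_table.append({
        "article_id": article_id,      # hash key of the table
        "timestamp": timestamp,        # range key of the table
        "user_id": user_id,            # hash key of the secondary index
        "article_type": article_type,  # non-key attribute
    })


def query_by_user(user_id, newest_first=True):
    """Emulate a query on the (user id, timestamp) secondary index."""
    items = [item for item in article_table if item["user_id"] == user_id]
    return sorted(items, key=lambda item: item["timestamp"], reverse=newest_first)


# Made-up sample data: three saves for user u1, one for user u2.
save_article("u1", "a1", "news", 100)
save_article("u1", "a2", "sports", 300)
save_article("u2", "a3", "news", 200)
save_article("u1", "a4", "news", 200)

saved = [item["article_id"] for item in query_by_user("u1")]
```

In a real table the sort would be done by the index's range key rather than in application code, so the profile page would not need to merge per-type results by hand.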

Related

Firebase Database: how to compare two values

In my Firebase database, I have a data structure similar to this:
The post ID (1a3b3c4d5e) is generated by the childByAutoId() function.
The user ID (fn394nf9u3) is the UID of the user.
In my app, I have a UILabel (author) and I would like to update it with the 'full name' of the user who created the post.
Since I have a reference to the post ID in the users part of the database, I assume there must be some code (if statement?) to check if the value exists and if so, update the label.
Can you help with that?
While it is possible to do the query (ref.child("Users").queryOrdered(byChild: "Posts/1a3b3c4d5e").queryEqual(toValue:true)), you will need to have an index on each specific user's posts to allow this query to run efficiently. This is not a feasible strategy.
As usual when working with NoSQL databases: if you need to do something that your current data model doesn't allow, change your data model to allow the use-case.
In this case that can mean either adding the UID of the user to each post, or alternatively adding the user name to each post (as Andre suggests) and deciding if and how you deal with user name changes.
Storing such relational data in both directions, to allow efficient lookups in both directions, is very common in NoSQL databases such as Firebase and Firestore. In fact I wrote a separate answer about dealing with many-to-many relations.
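To make the "add the UID of the user to each post" option concrete, here is a small sketch that models the database as nested Python dictionaries rather than using the Firebase SDK. The field name `authorUid`, the function `author_name` and the sample user name are assumptions made for illustration; only the post ID and user ID come from the question.

```python
# Hypothetical snapshot of the database after the data model change:
# every post carries the UID of its author ("authorUid" is an assumed name).
db = {
    "Posts": {
        "1a3b3c4d5e": {
            "title": "Hello User",
            "description": "Thus the post is here",
            "authorUid": "fn394nf9u3",
        },
    },
    "Users": {
        "fn394nf9u3": {
            "full name": "Jane Doe",  # made-up sample value
            "Posts": {"1a3b3c4d5e": True},
        },
    },
}


def author_name(database, post_id):
    """Resolve a post's author with two direct key lookups, no query needed."""
    post = database["Posts"].get(post_id)
    if post is None:
        return None
    user = database["Users"].get(post["authorUid"])
    return user["full name"] if user else None


label_text = author_name(db, "1a3b3c4d5e")
```

With the link stored on the post, the label can be filled from two reads by key instead of a query across all users.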
If you can change the structure, that would be best, because I don't think the current database structure is well organized.
You should add one more key, createdBy, inside the Post node, so the structure would actually be:
{description:"Thus the post is here", title:"Hello User", createdBy:"Javed Multani"}
Once you do this, it will be very easy to get the user's details.
OR
A less clean solution:
When you show a post from the post node of Firebase, you will get the auto-generated post ID, for example:
1a3b3c4d5e
First fetch only the posts. Then, once that data has been fetched and parsed, fetch the users and search each user's posts with a condition like postId == UserPostId; when a match is found, take the fullname value from that user.

Couchbase 4.0 Data Modelling

I have an application with entities like User, Message and MessageFeatures. Each User can have many messages and each message has a MessageFeatures entity. Currently the relational model is expressed as:
User{
UUID id
String email
...
}
Message{
UUID id,
UUID userId
String text
....
}
MessageFeatures{
UUID id
UUID messageId
UUID userId
PrimitiveObject feature1
....
PrimitiveObject featureN
}
The most important queries are:
Get all messages for user
Get all message features for a user
Get message by uuid
Get/Update message feature by uuid
Get message feature by message uuid
Less important (can be slow) queries are:
Get message features where user_id = someuuid and featureX = value
Get all/count user uuids for which featureX = value
update message features set featureX = newValue where featureX = oldValue
While evaluating Couchbase I have been unable to arrive at a proper data model. I do not think putting all messages and message features for a user in a single document is a good idea, because the size will keep increasing; based on current data it will easily reach 4-5 MB for two years of data. Also, to maintain consistency I can update only one message feature at a time, as atomicity is per document.
If I do not place them in a single document, they will be scattered around the cluster, and queries like "get all messages/message features of a user" will result in scatter and gather.
I have checked out global secondary indexes and N1QL, but even if I index the user_uuid field of messages it will only help in fetching the message_uuids of that user; loading all the messages will still result in scatter and gather.
Is there a way to force all messages and message features of a user_uuid onto the same physical node without embedding them in the same document, something like hash tags in Redis?
You should translate the relational model above directly to Couchbase. You should create GSI indexes for all the relationships (id fields). Use EXPLAIN to make sure every query uses an index. For direct lookup by id, use USE KEYS.
Scatter/gather in Couchbase means something different than what you describe. It is when a single index scan has to visit several nodes and then merge the scan results (distributed index). Instead, each GSI index lives on a single node, so GSI indexes avoid scatter/gather.
Finally, note that Couchbase is fast at key-value fetches even across nodes, so you do not need to worry about locality of data.

Couchbase - Splitting a JSON object into many key-value entries - performance improvement?

Say my Couchbase DB has millions of user objects, each user object contains some primitive fields (score, balance etc.)
And say I read & write most of those fields on every server request.
I see 2 options of storing the User object in Couchbase:
A single JSON object mapped to a user key (e.g. user_555)
Mapping each field into a separate entry (e.g. score_555 and balance_555)
Option 1 - Single CB lookup, JSON parsing
Option 2 - Twice the lookups, less parsing if any
How can I tell which one is better in terms of performance?
What if I had 3 fields? What if 4? Does it make a difference?
Thanks
Eyal
Think about your data structure and access patterns first before worrying if json parsing or extra lookups will add overhead to your system.
From my perspective and experience, I would try to model documents based upon logical object groupings, so I would store 'user' attributes together. If you were to store each field separately, you'd have to do a series of lookups whenever you wanted to provide a client or service with a full overview of the player profile.
I've used Couchbase as the main data store for a social mobile game. We store 90% of user data in a user document, which contains all the relevant fields such as score, level, progress etc. For the majority of operations, such as a new score or upgrades, we want to deal with the whole User object in the application layer, so it makes sense to inflate the user object from the Couchbase document, alter or read what we need, and then persist it again if there have been changes.
The only time we have id references to other documents is for player purchases, where we have an array of ids that each reference a separate purchase. We do this because we wanted richer information on each purchase (date of transaction, transaction id, product type etc.) that isn't relevant to the user document. When a purchase is made, we verify it's legitimate, then add it to the User inventory and create the separate purchase document.
So our structure is:
UserDoc:
-Fields specific to a User (score,level,progress,friends,inventory)
-Arrays of IDs pointing to specific purchases
The only time I'd consider splitting out some specific fields as you outlined above would be if your user document got seriously large but I think it'd be best to divide documents up per groupings of data as opposed to specific fields.
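The structure above can be sketched roughly as follows. This is a plain Python illustration that uses a dictionary as a stand-in for the bucket, not the Couchbase SDK; the helpers `put`, `get` and `record_purchase`, the key names and the sample values are all assumptions.

```python
import json

# Hypothetical key-value store standing in for the bucket: key -> JSON string.
store = {}


def put(key, doc):
    store[key] = json.dumps(doc)


def get(key):
    return json.loads(store[key])


# One document holds the fields that are read and written together.
put("user_555", {
    "score": 10,
    "level": 2,
    "progress": 0.4,
    "friends": [],
    "inventory": [],
    "purchases": [],  # array of ids pointing to separate purchase documents
})


def record_purchase(user_key, purchase_key, details):
    """Create the purchase document, then update the user document once."""
    put(purchase_key, details)
    user = get(user_key)
    user["purchases"].append(purchase_key)
    user["inventory"].append(details["product"])
    put(user_key, user)


record_purchase("user_555", "purchase_1",
                {"product": "gem_pack", "transaction_id": "t-1"})
user = get("user_555")
```

Reading the profile is a single fetch of `user_555`; the purchase details are only loaded when something actually needs them.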
Hope that helped!

How to get the latest document in couchbase bucket?

I have an activity bucket in my Couchbase DB and I need to retrieve the latest document for different types. My initial approach was this:
Document Format : [ id , { val.time ,val.type, val.load } ]
I then wrote different views that each map a specific val.type, and used a reduce to get the latest val.time. However, I have the problem of the views not being updated (apparently because the map is only called on new or changed documents, whereas this approach needs all of the documents to be mapped and reduced).
What is the best practice / approach for time-based data on Couchbase ( NoSQL ) databases ?
You can access the documents by time with a simple view like this:
Convert your time to an integer value, e.g. using JavaScript's Date.parse() method. http://www.quackit.com/javascript/javascript_date_and_time_functions.cfm
Use this integer value as the key to your view. You can then access the first or last document.
If you always need the last document it's faster to negate the time value, so the largest times are at the start of the index.
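The effect of negating the time value can be shown with a short sketch. A Couchbase view's map function would be written in JavaScript; the Python below only imitates the sorted index that such a view produces. The function `view_key`, the sample documents and the assumption that times are UTC ISO strings are all illustrative.

```python
from datetime import datetime, timezone


def view_key(iso_time):
    """Return the negated epoch-millisecond value used as the index key."""
    dt = datetime.fromisoformat(iso_time).replace(tzinfo=timezone.utc)
    return -int(dt.timestamp() * 1000)


# Made-up activity documents.
docs = [
    {"id": "a", "time": "2015-01-01T10:00:00", "type": "login"},
    {"id": "b", "time": "2015-03-01T09:00:00", "type": "logout"},
    {"id": "c", "time": "2015-02-01T08:30:00", "type": "login"},
]

# The index is kept sorted by key, so the most recent document of a type
# is the first row when the key is the negated time.
index = sorted((view_key(d["time"]), d["id"]) for d in docs if d["type"] == "login")
latest_id = index[0][1]
```

Fetching the latest document then amounts to reading the first row of the index, with no reduce step.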
If you are using a development design document, it is expected that adding new keys into your bucket may have no effect on the view. Since only a subset of keys from the bucket are going into the Map/Reduce pipeline, adding a key that is not going into the subset would trigger no update to your view.
Either evaluate it on a production design document (by clicking the "publish" button), or try adding more keys.

Properly structuring scriptDb for an Inventory System

We have built an Inventory and Inspection Manager for our gear. Each item type can be on a different schedule, and we can have multiple items of the same type. We serialize all items for tracking. My question is how I can structure the scriptDb to have only one object per serial number. Currently I'm storing every inspection and movement separately and just iterating through them by serial number. Is there a proper way to structure it like the following without overwriting a previous entry inside the inspHx section of the object?
{serialNumber, itemType, manufacturer, expDate, inspHx{multiple entries}}
If I get the item by serialNumber.inspHx and save a new record into it, it deletes the previous object in the inspHx. How can I keep adding new records to the inspHx of one serialNumber?
Thanks for the advice.
Database design can be a bit of an art and there is usually more than one way of reaching your goal. That said:
Instead of trying to store everything in a single object you might have two object types. There might be two buckets, so to speak, one for each data type. You would need to add an additional parameter to each object identifying its type or bucket. Type: meta or Type: inspHx
1. A metadata object that contains the information about the item that rarely changes. There should be only one of these per serial number.
2. Inspection objects, one for each inspection, with date, status, etc.
Each type needs a common element or KEY which in this case would be the serial number.
When querying you would do two queries using the serial number for meta data and serial number plus any constraints on the inspection objects.
For a bit more see the Tables section at: https://developers.google.com/apps-script/scriptdb#tables
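The two-object layout described above could look roughly like this. The sketch is plain Python with a list standing in for the store, not the ScriptDb API; the helpers `save` and `query`, the field values and the dates are hypothetical.

```python
# Hypothetical stand-in for the store: a flat list of objects.
db = []


def save(obj):
    """Store a new object; nothing already saved is overwritten."""
    db.append(dict(obj))


def query(**criteria):
    """Return every object whose fields match all given criteria."""
    return [o for o in db if all(o.get(k) == v for k, v in criteria.items())]


# One metadata object per serial number (rarely changes).
save({"type": "meta", "serialNumber": "SN-001", "itemType": "harness",
      "manufacturer": "Acme", "expDate": "2030-01-01"})

# One inspection object per inspection, linked by the same serial number.
save({"type": "inspHx", "serialNumber": "SN-001", "date": "2024-01-10", "status": "pass"})
save({"type": "inspHx", "serialNumber": "SN-001", "date": "2024-07-10", "status": "fail"})

# Two queries: one for the metadata, one for the inspection history.
meta = query(type="meta", serialNumber="SN-001")
history = query(type="inspHx", serialNumber="SN-001")
```

Because each inspection is its own object, adding a new one never touches the earlier entries.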