Couchbase: map multiple documents with key prefix

I have the following keys in my Couchbase database:
content_HASHKEY
content_info_HASHKEY
In the above example the HASHKEY is unique and the same for both documents, but the prefix and the document contents differ. For my web client I do not want to make multiple Couchbase calls to retrieve the two pieces of information related to the same HASHKEY. What is the most efficient way to retrieve both documents in a single call to Couchbase?
EDIT:
Do I actually need the HASHKEY to be a property inside each of the documents as well, in order to JOIN them?
content_HASHKEY
HashKeyVal : HASHKEY
content_info_HASHKEY
HashKeyVal : HASHKEY
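For reference, a minimal sketch of what a single-request fetch by key could look like, assuming the (2.x-style) Python SDK's bulk get; the bucket name and HASHKEY value below are illustrative, not from the question:

from couchbase.bucket import Bucket

bucket = Bucket('couchbase://localhost/mybucket')  # bucket name is illustrative
hashkey = 'HASHKEY'

keys = ['content_' + hashkey, 'content_info_' + hashkey]

# one bulk request addressed purely by key -- the keys are built from the prefix
# plus the HASHKEY, so this access path does not rely on a property inside the documents
results = bucket.get_multi(keys)
content = results['content_' + hashkey].value
content_info = results['content_info_' + hashkey].value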

Related

Handling big JSONs in Azure Data Factory

I'm trying to use ADF for the following scenario:
a JSON file containing an array of similar objects is uploaded to an Azure Storage blob
this JSON is read by ADF with a Lookup Activity and uploaded via a Web Activity to an external sink
I cannot use the Copy Activity, because I need to create a JSON payload for the Web Activity, so I have to look up the array and paste it in like this (payload of the Web Activity):
{
"some field": "value",
"some more fields": "value",
...
"items": #{activity('GetJsonLookupActivity').output.value}
}
The Lookup Activity has a known limitation: an upper limit of 5000 rows at a time. If the JSON is larger, only the top 5000 rows will be read and everything else will be ignored.
I know this, so I have a system that chops payloads into chunks of 5000 rows before uploading to storage. But I'm not the only user, so there's a valid concern that someone else will try uploading bigger files and the pipeline will silently pass with a partial upload, while the user would obviously expect all rows to be uploaded.
I've come up with two concepts for a workaround, but I don't see how to implement either:
Is there any way for me to check if the JSON file is too large and fail the pipeline if so? The Lookup Activity doesn't seem to allow row counting, and the Get Metadata Activity only returns the size in bytes.
Alternatively, the MSDN docs propose a workaround of copying data in a foreach loop. But I cannot figure out how I'd use Lookup to first get rows 1-5000 and then 5001-10000 etc. from a JSON. It's easy enough with SQL using OFFSET N FETCH NEXT 5000 ROWS ONLY, but how to do it with a JSON?
You can't set an index range (1-5,000, 5,001-10,000) when you use the Lookup Activity. In my opinion, the workaround mentioned in the doc doesn't mean you can use the Lookup Activity with pagination.
My workaround is to write an Azure Function that gets the total length of the JSON array before the data transfer. Inside the Azure Function, divide the data into separate temporary sub-files, paginated like sub1.json, sub2.json, and so on, then output an array containing the file names.
Grab that array with a ForEach Activity and execute a Lookup Activity inside the loop, with the file path set as a dynamic value. Then run the next Web Activity.
Surely my idea could be improved. For example, if the total length of the JSON array is under the 5000-row limit, you could just return {"NeedIterate":false}. Evaluate that response with an If Condition Activity to decide which way to go next: if the value is false, execute the Lookup Activity directly. Everything can be split between the two branches.
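For illustration, a minimal sketch of the chunking step such an Azure Function could perform; the sub1.json naming and the 5000-row limit come from the description above, everything else is an assumption:

import json

def split_into_chunks(payload, chunk_size=5000):
    # split the uploaded JSON array into sub-arrays of at most chunk_size rows
    items = json.loads(payload)
    chunks = []
    for start in range(0, len(items), chunk_size):
        name = 'sub%d.json' % (start // chunk_size + 1)
        chunks.append((name, items[start:start + chunk_size]))
    return chunks

# the function would upload each chunk back to blob storage and return something
# like {"files": ["sub1.json", "sub2.json", ...]} for the ForEach Activity to iterate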

Store multiple authors in to couchbase database

I am a newbie to "couchbase server". What I am looking for is to store 10 author names in Couchbase, one after another. Could someone please tell me whether the structure should be a single "author" document with multiple values:
{ id : 1, name : "Author 1" }, { id : 2, name : "Author 2" }
or whether I should store Author 1 in one document and Author 2 in another document?
If the latter, how can I increment the id automatically before the "insert" command?
You can store all authors in a single document:
{
  "doctype": "Authors",
  "AuthorNames": [
    {
      "id": 1,
      "Name": "author1"
    },
    {
      "id": 2,
      "Name": "author2"
    }
    ... and so on
  ]
}
If you want to increase the ID, one option is to enter one author name at a time, each in a new document, but then the ID will be randomly generated and will not be in incremental order.
In Couchbase, think more about how your application will use the data than about how you want to store it. For example, will your application need to get all 10 authors all of the time? If so, then one document might be worthwhile. Perhaps your application only ever needs to read/write one of the authors at a time; then you might want to put each in its own document, but with an object key pattern that lets you get the object really fast. Objects that are used often are kept in the managed cache; other objects that are not used often may fall out of the managed cache, and that is ok.
The other factor is your read-to-write ratio on this data.
So like I said, it depends on how your application will be reading and writing your data. Use this as the guidance for how your data should be stored.
The single JSON document is pretty straightforward. The more advanced schema design, where each author is in its own document and you access them via object key, might be a bit more complicated, but ultimately faster and more scalable, depending on the factors I already pointed out. I will lay out an example schema and some possibilities.
For the authors, I might create each author JSON document with an object key like this:
authors::ID
Where ID is a value I keep in a special incrementer object that I will call authors::incrementer. Think of that object as a key-value pair holding just an integer that happens to be the upper bound of an array. Couchbase SDKs include a special function to increment exactly this kind of integer object. With this, my application can put together that object key very quickly. If I want to go after the 5th author, I do a read by object key for "authors::5". If I need to get 10, I do a parallelized bulk get for authors::1 through authors::10. If I want to get all the authors, I read the incrementer object to get that integer and then do a parallelized bulk get. This way I can get them in order, or in whatever order I feel like, and I am accessing them by object key, which is VERY fast in Couchbase.
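A rough sketch of that pattern, assuming the older 2.x-style Python SDK (the counter/upsert/get_multi spellings vary between SDK versions, so treat this as illustrative):

from couchbase.bucket import Bucket

bucket = Bucket('couchbase://localhost/default')  # connection details are illustrative

def add_author(name):
    # atomically bump the incrementer object to reserve the next id
    next_id = bucket.counter('authors::incrementer', delta=1, initial=1).value
    bucket.upsert('authors::%d' % next_id, {'id': next_id, 'Name': name})
    return next_id

def get_all_authors():
    count = bucket.get('authors::incrementer').value
    keys = ['authors::%d' % i for i in range(1, count + 1)]
    # bulk get by object key -- no view or query involved
    results = bucket.get_multi(keys)
    return [results[k].value for k in keys]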
All this being said, I could use a view to query this data, or the upcoming "SQL for Documents" in Couchbase 4.0, or I can mix and match, sometimes querying and sometimes getting objects by their key. Key access will ALWAYS be faster. It is the difference between asking a question and then going to get the object, versus simply knowing the answer and getting it immediately.

Handling JSON posts in Yesod

An AngularJS client is sending a JSON post to a Yesod server to update a person record. The post can contain the following fields, each of which is optional; the client can send any subset of them:
firstName
lastName
...
active
To limit the discussion a bit, let's assume the client, at the moment, only wants to toggle activity, so it will send just the active value (it specifically wants to keep the rest intact) and the message will be:
{
active: 0
}
On the server we know the id of the person from the URL (e.g. /api/v1.0/person/1), but the client does not send a complete Person entity, so the usual:
person <- requireJsonBody :: Handler Person
_ <- runDB $ update personId ...
will not work here. It would seem a more flexible approach is needed. Maybe something along the lines of:
mapToUpdate :: PersonInfo -> [Update PersonInfo]
where PersonInfo is an instance of FromJSON and is defined to match Person but has all the fields of type Maybe a. However, that seems totally contrary to DRY.
So to wrap this up: assuming again that the client can send any subset of a Person's fields, how would one handle such a use case nicely in Yesod?
You could imagine even more horrifying scenarios. For example, one JSON post might need to be mapped to an update of multiple database entities (API entities do not have to map 1:1 to database entities).
I've never tried this, but here's a theoretical approach:
Grab the current value from the database
Serialize that value to an aeson Value by calling toJSON
Write some kind of "update" algorithm that merges two Values together, e.g. mergeValues :: Value -> Value -> Value
Merge the original entity with the value uploaded by the user
Try to parse the resulting value with parseJSON
If it succeeds, use replace to put it back into the database
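Purely as an illustration of the merge step's semantics (written in Python here, not against aeson's Value type, so this is only a sketch of the idea):

def merge_values(original, patch):
    # fields present in the patch overwrite the stored ones; everything else is kept
    merged = dict(original)
    merged.update(patch)
    return merged

# merge_values({"firstName": "Ann", "active": 1}, {"active": 0})
# => {"firstName": "Ann", "active": 0}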

Edit json object by lua script in redis

I want to edit my JSON object before it comes back from the Redis server.
In my Redis server I have 4 keys:
user:1 {"Id":"1","Name":"Gholzam","Posts":"[]"}
user:1:post:1 {"PostId":"1","Content":"Test content"}
user:1:post:2 {"PostId":"2","Content":"Test content"}
user:1:post:3 {"PostId":"3","Content":"Test content"}
I want to get the following content back via a Lua script. How?
{"Id":"1","Name":"Gholzam","Posts":"[{"PostId":"1","Content":"Test
content"},{"PostId":"1","Content":"Test
content"},{"PostId":"1","Content":"Test content"}]}
The choice of client here is largely irrelevant; the important thing to do is figure out the data storage. You say you have 4 keys, but it is not obvious to me how, given user:1, we know what the posts are. Common approaches there include:
have a set called user:1:posts (or something similar) which contains either the full keys (user:1:post:1, etc) or the relative keys (1, etc)
have a hash called user:1:posts (or something similar) which contains the posts keyed by their id
I'd be tempted to use the latter approach, as it is more direct - so I might have:
user:1, a string with contents {"Id":"1","Name":"Gholzam","Posts":"[]"}
user:1:posts, a hash with 3 pairs:
key 1 with value {"PostId":"1","Content":"Test content"}
key 2 with value {"PostId":"2","Content":"Test content"}
key 3 with value {"PostId":"3","Content":"Test content"}
Then you can use hgetall or hvals to get the posts easily.
The second part is how to manipulate JSON at the server. The good news here is that Redis provides access to JSON tools inside Lua via cjson.
I am an expert in neither cjson nor Lua; however, frankly my advice is: don't do this. IMO, Redis works best if you let it focus on what it is great at: storage and retrieval. You probably can bend it to your whim, but I would be very tempted to do any JSON manipulation outside of Redis.
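That said, if you do want to assemble the document inside Redis, a minimal sketch along those lines could look like the following; it assumes the user:1 / user:1:posts hash layout suggested above, and uses redis-py plus an embedded Lua script as just one possible wiring:

import json
import redis

r = redis.Redis()

# sample data in the suggested layout
r.set('user:1', json.dumps({'Id': '1', 'Name': 'Gholzam', 'Posts': []}))
for i in (1, 2, 3):
    r.hset('user:1:posts', str(i), json.dumps({'PostId': str(i), 'Content': 'Test content'}))

# Lua run inside Redis: decode the user, attach the posts from the hash, return merged JSON
MERGE_POSTS = """
local user = cjson.decode(redis.call('GET', KEYS[1]))
local posts = {}
for _, raw in ipairs(redis.call('HVALS', KEYS[2])) do
    posts[#posts + 1] = cjson.decode(raw)
end
user['Posts'] = posts
return cjson.encode(user)
"""

merged = r.eval(MERGE_POSTS, 2, 'user:1', 'user:1:posts')
print(json.loads(merged))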

Advice on best data structure to use in Redis?

I'm new to redis, and would like to start storing an object that's currently JSON in redis instead. But I need some advice on the best data structure to use.
Basically, the object stores information about which user has looked at which page. Here's the JSON:
all_pageviews = {
'unique_user_session_id_1' : { 'page' : 2, 'country' : 'DE' },
'unique_user_session_id_2': { 'page' : 2, 'country' : 'FR' }
...
};
I've been using a JSON object with the user IDs as keys because that way I can ensure the keys are unique, which is important for various reasons in my app.
I'm going to want to query it efficiently in the following ways:
By user: get all data related to unique_user_session_id_2.
By page: get all user objects related to page number 2.
Any thoughts on what would be the best redis structure to use? Ordering doesn't matter for the purposes of my app, but querying efficiently does.
Please let me know if I have explained myself badly, or if you need more information. Thanks!
To look up data in redis by multiple keys, you'll have to use multiple structures.
I would use a hash to map user_session_ids to the JSON string, and a sorted set to map pages to user_session_ids.
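A small sketch of that layout with redis-py; the key names pageviews and pageviews_by_page are illustrative:

import json
import redis

r = redis.Redis()

def record_pageview(session_id, page, country):
    # hash: session id -> JSON blob, for the "by user" lookup
    r.hset('pageviews', session_id, json.dumps({'page': page, 'country': country}))
    # sorted set: member = session id, score = page number, for the "by page" lookup
    r.zadd('pageviews_by_page', {session_id: page})

def by_user(session_id):
    raw = r.hget('pageviews', session_id)
    return json.loads(raw) if raw else None

def by_page(page):
    ids = r.zrangebyscore('pageviews_by_page', page, page)
    return {sid.decode(): json.loads(r.hget('pageviews', sid)) for sid in ids}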