How do I remove blob/clob columns from Couchbase tables?

I have a use case that requires processing random rows from Couchbase tables, and I need to remove the blob/clob columns from the results before processing them further.
I'm very new to Couchbase and have been following the documentation.
I've set up a test cluster on Couchbase Server and manually added blob data to a document:
{
  "experience": 14248,
  "hitpoints": 9223372036854775807,
  "jsonType": "player",
  "level": 141,
  "loggedIn": true,
  "name": "Aaron1",
  "uuid": "78edf902-7dd2-49a4-99b4-1c94ee286a33",
  "image": {
    "#type": "blob",
    "content_type": "image/jpeg",
    "digest": "sha1-4xlj1AKFgLdzcD7a1pVChrVTJIc=",
    "length": 3888349
  }
}
I have the following questions:
Couchbase has a blob datatype, but I can't find good documentation or examples for a clob datatype. Does Couchbase support a clob datatype? If yes, could you provide a sample dataset as an example?
While filtering out blob columns in my utility, I look for fields that have the "#type" and "content_type" attributes. Is this approach correct, or am I at risk of missing some blob columns? (A minimal sketch of the check I have in mind follows below.)
Could anyone please help me understand this, or point me to the right documentation?
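To make the second question concrete, the field check I have in mind looks roughly like this Jackson-based sketch; stripBlobFields is just an illustrative helper of mine, not a Couchbase API, and it only keys off the "#type": "blob" marker shown above:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class BlobFieldFilter {
    // Removes every nested object that carries the "#type": "blob" marker,
    // recursing into ordinary sub-objects; arrays are left untouched here.
    static void stripBlobFields(ObjectNode node) {
        List<String> toRemove = new ArrayList<>();
        Iterator<String> names = node.fieldNames();
        while (names.hasNext()) {
            String name = names.next();
            JsonNode child = node.get(name);
            if (child.isObject()) {
                JsonNode marker = child.get("#type");
                if (marker != null && "blob".equals(marker.asText())) {
                    toRemove.add(name);                  // looks like a blob reference
                } else {
                    stripBlobFields((ObjectNode) child); // plain sub-object: recurse
                }
            }
        }
        toRemove.forEach(node::remove);
    }
}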

Related

Architectural Decision: How to structure a big JSON response

I'm working on an app that will generate a potentially very big JSON payload; in my tests this was 8000 rows. This is because it is an aggregation of a year's worth of data, and it is required for displaying details in the UI.
For example:
"voice1": {
"sum": 24000,
"items": [
{
"price": 2000,
"description": "desc1",
"date": "2021-11-01T00:00:00.000Z",
"info": {
"Id": "85fda619bbdc40369502ec3f792ae644",
"address": "add2",
"images": {
"icon": "img.png",
"banner": null
}
}
},
{
"price": 2000,
"description": "desc1",
"date": "2021-11-01T00:00:00.000Z",
"info": {
"Id": "85fda619bbdc40369502ec3f792ae644",
"address": "add2",
"images": {
"icon": "img.png",
"banner": null
}
}
}
]
},
The point is that I can potentially have 10 voices, and dozens of items for each.
I was wondering if you could point me to some best practices, or share some tips, because I have the feeling this can be done better.
It sounds like you are finding out that JSON is a rather verbose format (not as bad as XML, but still very verbose). If you are worried about the size of messages between server and client, you have a few options:
JSON compresses rather well; you can see how most tokens repeat many times. So make sure to gzip (or Snappy-compress) payloads before sending them to clients. This will drastically reduce the size, at some performance cost for inflating/deflating.
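As a rough illustration of this first option, here is a sketch using plain java.util.zip, with a made-up repetitive payload standing in for the real aggregated response:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class GzipJsonDemo {
    public static void main(String[] args) throws IOException {
        // 8000 near-identical items, mimicking the aggregated response
        String json = "{\"price\":2000,\"description\":\"desc1\"}".repeat(8000);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8)); // compress the UTF-8 bytes
        }
        System.out.printf("raw: %d bytes, gzipped: %d bytes%n",
                json.getBytes(StandardCharsets.UTF_8).length, buffer.size());
    }
}

Because the tokens repeat so heavily, the gzipped size is typically a small fraction of the raw payload.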
The other alternative is not to use JSON for transfer but a more optimized format. One of the best options here is FlatBuffers. It does require you to provide schemas of the data you are sending, but it is an optimized binary format with minimal overhead. It will also drastically speed up your application because it removes the need for serialization/deserialization, which takes significant time for JSON. Another popular, but slightly slower, alternative is Protobuf.
The only thing immediately obvious to me is that you would likely want to make a list of voices (like you have for items) rather than voice1, voice2, etc.
Beyond that, it really just depends on the structure of the data you start with (to create the JSON) and the structure of the data or code at the destination (and possibly also the method of transferring data, if size is a concern). If you're doing a significant amount of processing on either end to encode/decode the JSON, that can suggest there's a simpler way to structure the data. Can you share some additional context or examples of the overall process?

Pact Consumer / Provider based on data type and not on data value

We are currently using Pact-Broker in our Spring Boot application with really good results for our integration tests.
Our tests with Pact are based on calling a REST API and comparing the response with the value from our provider, always in JSON format.
Our problem is that the values to compare live in a DB where the data changes quite often, which forces us to update the tests very often.
Do you know if it is possible to validate by data type only?
What we would like to try is to validate that the JSON is well formed and that the data types match. For example, if our REST API gives this output:
[
  {
    "action": "VIEW",
    "id": 1,
    "module": "A",
    "section": "pendingList",
    "state": null
  },
  {
    "action": "VIEW",
    "id": 2,
    "module": "B",
    "section": "finished",
    "state": null
  }
]
For example, what we would like to validate from the previous output is the following:
The JSON is well formed.
All the key/value pairs exist, based on the model.
Each value matches a specific data type; for example, the key action exists in all entries and contains a string.
Do you know if this can be accomplished with Pact-Broker? I searched the documentation but did not find any example of how to do it.
Thanks a lot in advance.
Best regards.
Absolutely! Pact will always do the first two things without any extra work.
What you are talking about is referred to as flexible matching [1]. You don't want to match the value, but the type (or a regex). Given you are using Spring Boot, you may want to look at the various matchers available for Pact JVM [2]; a short sketch follows the references below.
I'm not sure if you meant it, but just for clarity, Pact and Pact Broker are separate things. Pact is the Open Source contract-testing framework, and Pact Broker [3] is a tool to help share and collaborate on those contracts with the team.
[1] https://docs.pact.io/getting_started/matching
[2] https://github.com/DiUS/pact-jvm/tree/master/consumer/pact-jvm-consumer#dsl-matching-methods
[3] https://github.com/pact-foundation/pact_broker/
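Here is a minimal consumer-side sketch of that flexible matching with the pact-jvm consumer DSL (package and method names vary slightly between pact-jvm versions, so treat this as an outline; the example values are just placeholders from the question, and only the types are asserted):

import au.com.dius.pact.consumer.dsl.DslPart;
import au.com.dius.pact.consumer.dsl.PactDslJsonArray;

class PermissionsContract {
    // Each element of the array is matched by type, not by value:
    // "action" must be a string, "id" an integer, "state" null, etc.
    static DslPart expectedBody() {
        return PactDslJsonArray.arrayEachLike()
                .stringType("action", "VIEW")
                .integerType("id", 1L)
                .stringType("module", "A")
                .stringType("section", "pendingList")
                .nullValue("state")
                .closeObject();
    }
}

You would plug this DslPart into the interaction's expected response body; the provider then only has to return data of the right shape, not the exact values stored in the DB.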

How do I expand multi-dimensional JSON in Power BI

I have a number of JSON sources I wish to import into Power BI. The format is such that foreign keys can have 0, 1, or many values, and they store both the ID of the related table and its name. An example of one entry in one of the JSON files is:
{
  "ID": "5bb68fde9088104f8c2a85be",
  "Name": "name here",
  "Date": "2018-10-04T00:00:00Z",
  "Account": {
    "ID": "5bb683509088104f8c2a85bc",
    "Name": "name here"
  },
  "Amount": 38.21,
  "Received": true
}
Some tables are much more complex, but for the most part they follow this sort of format for foreign keys. In Power BI, I pull in the JSON, convert it to a table, and expand the column to view the top level, but any lower levels, such as these foreign keys, are represented as lists. How do I pull them out into each row? I can extract values, but that duplicates rows.
I have googled multiple times for this and tried to follow what others have posted but can't seem to get anything to work.

Logic in JSON Schemas

Is there an already established way of incorporating logic into a JSON Schema?
For example, if I had JSON like the following:
{
  "Gross Pay": "100",
  "Hours": "5",
  "Rate": "20"
}
And I have a schema requiring these 3 fields. If I wanted to ensure that "Gross Pay" equals "Hours" x "Rate", where would be the best place to incorporate such logic?
No, you can't describe this type of assertion with JSON Schema. See the validation keywords; there is nothing suitable there. There are keywords like minimum or exclusiveMaximum, but they won't let you express Gross Pay = Hours * Rate.
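If you do need that rule, one common place for it is ordinary application code run after (or alongside) schema validation. A minimal Jackson-based sketch of such a check, assuming string-encoded numbers as in the example:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

class GrossPayCheck {
    public static void main(String[] args) throws Exception {
        String json = "{\"Gross Pay\":\"100\",\"Hours\":\"5\",\"Rate\":\"20\"}";
        JsonNode doc = new ObjectMapper().readTree(json);
        // The schema can only check presence and types; the arithmetic rule
        // has to live in code like this instead.
        double grossPay = doc.get("Gross Pay").asDouble();
        double hours = doc.get("Hours").asDouble();
        double rate = doc.get("Rate").asDouble();
        System.out.println("Gross Pay == Hours * Rate ? "
                + (Math.abs(grossPay - hours * rate) < 1e-9));
    }
}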

Elasticsearch Reindex or Flag Deleted Type Property

This is related to my original question here:
Elasticsearch Delete Mapping Property
From that post, it seems you are going to have to "reindex" your data. What is a safe strategy for doing this?
To summarize from the original post, I am trying to take the mapping from:
{
  "propVal1": {
    "type": "double",
    "index": "analyzed"
  },
  "propVal2": {
    "type": "string",
    "analyzer": "keyword"
  },
  "propVal3": {
    "type": "string",
    "analyzer": "keyword"
  }
}
to this:
{
  "propVal1": {
    "type": "double",
    "index": "analyzed"
  },
  "propVal2": {
    "type": "string",
    "analyzer": "keyword"
  }
}
Removing all data for the property that was dropped.
I have been contemplating using the REST API for this. This seems dangerous though since you are going to need to synchronize state with the client application making the REST calls, i.e. you need to send all of your documents to the client, modify them, and send them back.
What would be ideal is a server-side operation that could move and transform types around. Does something like this exist, or am I missing something obvious about "reindexing"?
Another approach would be to flag the data as no longer valid. Are there any built-in flags for this in the mapping, or is it necessary to create an auxiliary type to define whether another type property is valid?
You can have a look at the elasticsearch-reindex plugin.
A more manual option is to use the scan & scroll API to read back your original content and the bulk API to index it into a new index or type, as sketched below.
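A rough sketch of that scan & scroll + bulk approach with the pre-2.x Java client; the client instance, the old_index and new_index names, and the dropped propVal3 field are all assumptions for the example:

import java.util.Map;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

class ReindexWithoutProperty {
    // Copies every document from old_index to new_index, dropping propVal3
    // along the way; "client" is an already connected client.
    static void reindex(Client client) {
        SearchResponse scroll = client.prepareSearch("old_index")
                .setSearchType(SearchType.SCAN)      // scan: no scoring, just retrieval
                .setScroll(new TimeValue(60000))
                .setQuery(QueryBuilders.matchAllQuery())
                .setSize(100)                        // hits per shard per scroll page
                .execute().actionGet();

        while (true) {
            scroll = client.prepareSearchScroll(scroll.getScrollId())
                    .setScroll(new TimeValue(60000))
                    .execute().actionGet();
            if (scroll.getHits().getHits().length == 0) {
                break;                               // no more pages
            }
            BulkRequestBuilder bulk = client.prepareBulk();
            for (SearchHit hit : scroll.getHits().getHits()) {
                Map<String, Object> source = hit.getSource();
                source.remove("propVal3");           // drop the removed property
                bulk.add(client.prepareIndex("new_index", hit.getType(), hit.getId())
                        .setSource(source));
            }
            bulk.execute().actionGet();
        }
    }
}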
Lastly, how did you get your docs into Elasticsearch in the first place? If you already have a data source somewhere, just re-run the same process as before.
If you don't want any downtime, put an alias on top of your old index, and once reindexing is done, just move the alias to the new index.
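The alias switch itself could look something like this with the same Java client (my_alias, old_index, and new_index are made-up names for the example):

import org.elasticsearch.client.Client;

class AliasSwitch {
    // Repoints the alias at the freshly reindexed index in a single request,
    // so readers querying the alias never see a missing index.
    static void moveAlias(Client client) {
        client.admin().indices().prepareAliases()
                .removeAlias("old_index", "my_alias")
                .addAlias("new_index", "my_alias")
                .execute().actionGet();
    }
}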