How to create a simple view or query a subdocument - Couchbase

I read the docs, but I can't seem to get my head around this (I'm a SQL guy).
1) I loaded a JSON file using cbdocloader:
[
{
"ID": "9e78f4a6-4061-48aa-8154-0b738d93461b",
"More fields": ""
}
]
2) There is now an object in my bucket called values100 (that was the name of the file).
3) How do I access the data in this object through a query or a view?
SELECT * FROM mybucket returns 1 result that contains all the rows I loaded, but I really want to query the individual rows in that bucket. Should I create a view? Should I query a view? My question is #3, but I am confused.

I believe there are a couple of things going on:
a. cbdocloader expects each document to be in its own file. Because your file holds a single JSON array, the whole array was loaded as one Couchbase document. To get the behavior you want, each record should instead be in its own file; the tool will then create multiple Couchbase documents, which can be indexed. I'm not sure there is a way to split the file using the tool; you may have to write a script to do it for you.
b. Couchbase is intended to be a document database, not a SQL database. As such, the way you would access a document in a majority of the cases is via the document id, which should have some significance to your application. This is not to say you can't look up the document id in an index, but you may find a SQL database to work better if you plan to do a lot of complex queries. If you need help on creating an index, please post a new question.
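The splitting script mentioned in point a could look roughly like the Python sketch below. It assumes the input is a top-level JSON array like the one in the question and that the "ID" field is a suitable file name; the function name and output layout are made up for illustration, and how cbdocloader derives document keys from the resulting files should be checked against its documentation.

```python
import json
from pathlib import Path


def split_json_array(source: Path, out_dir: Path, id_field: str = "ID") -> list[Path]:
    """Write each element of a top-level JSON array to its own .json file."""
    records = json.loads(source.read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")

    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, record in enumerate(records):
        # Name the file after the record's ID when it has one, else its position.
        if isinstance(record, dict) and id_field in record:
            name = str(record[id_field])
        else:
            name = str(index)
        target = out_dir / f"{name}.json"
        target.write_text(json.dumps(record, indent=2), encoding="utf-8")
        written.append(target)
    return written
```

Calling `split_json_array(Path("values100.json"), Path("docs"))` would leave one file per record in `docs/`, which can then be passed to the loader instead of the original file.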

Querying a database record from flowfile content to retrieve data using Apache NiFi

My scenario is as follows.
From one process I retrieve data from a table.
id,user_name
1,sachith
2,nalaka
I need to retrieve account details from the account_details table for these ids.
I have tried various database-related processors, but none of them take their input from flowfile content.
How can I retrieve records for only these ids?
Use the flow below:
ExecuteSQL (account_details)
-> ConvertAvroToJSON
-> EvaluateJsonPath
-> AttributesToJSON
(here you take only the id and ignore the rest)
Take a look at the LookupRecord using a DatabaseRecordLookupService controller service. That should allow you to use the id field to look up additional fields from a database and add them to the outgoing records. This is a common "enrichment" pattern, where the lookups can be done against databases, CSV files, etc.
You can use the QueryRecord processor to query data in flowfiles. Configure a record reader and a record writer on the processor so that it can parse the incoming content and serialize the result. To define a query, add a dynamic property whose name is the name of the query and whose value is the SQL statement. The processor then exposes a relationship with the same name as the property, which you can connect to the next processor.
The query syntax is Apache Calcite SQL.
You can find further explanation here

Batch Processing - OData

I want to group multiple operations into a single HTTP request payload.
I have an API key that allows me to make GET requests and return tables in a database as JSON blocks. Certain attributes are 'expandable', and OData (Open Data Protocol) allows you to 'expand' multiple attributes within the "CompanyA" table (i.e. Marketing, Sales, HR):
http://api.blahblah.com/odata/CompanyA?apikey=b8blachblahblachc&$expand=Marketing,Sales,HR
I would like to select multiple tables (the request above only contains one table, CompanyA) and understand this is possible via "Batch Requests":
https://www.odata.org/documentation/odata-version-3-0/batch-processing/
I find the documentation above, alongside Microsoft's, hard to map onto what I want.
I wanted it to be as simple as the request below, but I know it is not, and I can't figure out how to get there:
http://api.blahblah.com/odata/CompanyA,CompanyB,CompanyC?apikey=b8blachblahblachc
The end goal is to have one JSON file that contains the details of each table in the DB, rather than having to write each individual query and save each result to a file, as below:
http://api.blahblah.com/odata/CompanyA?apikey=b8blachblahblachc
http://api.blahblah.com/odata/CompanyB?apikey=b8blachblahblachc
http://api.blahblah.com/odata/CompanyC?apikey=b8blachblahblachc
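One way to reach the stated end goal without a batch endpoint is to script the individual requests and merge the results. The Python sketch below does only that: it is not an OData `$batch` request, and whether this particular service supports `$batch` is unknown. The host is the placeholder from the question, and `collect_tables` and `fetch_json` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://api.blahblah.com/odata"  # placeholder host from the question


def fetch_json(url: str):
    """Issue a GET request and decode the JSON response body."""
    with urlopen(url) as response:
        return json.load(response)


def collect_tables(tables, api_key, fetch=fetch_json):
    """Fetch each table with its own GET and merge the results by table name."""
    merged = {}
    for table in tables:
        url = f"{BASE_URL}/{table}?{urlencode({'apikey': api_key})}"
        merged[table] = fetch(url)
    return merged
```

The merged dictionary can then be written to a single file with `json.dump(collect_tables(["CompanyA", "CompanyB", "CompanyC"], api_key), open("all_tables.json", "w"))`. The `fetch` parameter exists so the HTTP call can be swapped out, for example in tests.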

Best DynamoDB Structure for Implementation

I am working on a "save" feature: the user can go to an article and click save to store it in their profile. Instead of a relational database, the application currently uses DynamoDB. Each article has a specific article type. The current structure is:
user-id [string][DynamoDBHashKey]
type-of-article [string] [DynamoDBRangeKey]
json [string]
user-id is the unique identifier for the user, type-of-article is the type of the article, and json holds all the saved articles of that type in JSON format:
[{article-id: timestamp}, {article-id: timestamp}]
 ^ Article #1             ^ Article #2
article-id is the article's unique identifier, and timestamp is the time at which that article was saved.
Note: this was done before DynamoDB started supporting JSON documents as Maps and Lists, and the code is not mine; it was already in place.
So when the application needs to remove an article from the saved list, it calls DynamoDB to get the JSON, modifies it, and stores it again. Adding a new article works the same way. A problem appeared when I wanted to display all the articles ordered by timestamp: I had to fetch all the types and merge them into a dictionary to sort them. (In the user profile I need to show all saved articles, regardless of type, sorted.) The application now takes more than 700 to 900 ms to respond.
Personally, I don't think this is the best approach, so I'm thinking of rewriting the existing code to use the newer DynamoDB features (Lists and Maps). My idea for the structure in DynamoDB is this:
user-id [string] [DynamoDBHashKey]
saved-articles [List]
    article-type_1
        article_1 [Map] {id: article-id, timestamp: date}
        article_2 [Map] {id: article-id, timestamp: date}
    article-type_2
        article_1 [Map] {id: article-id, timestamp: date}
I'm relatively new to DynamoDB, though. I wrote some test code to store this structure using Lists and Maps, both with the low-level API and with the Object Persistence Model.
My question is: is this a better approach? If not, why, and what would be better?
With this structure, I think I can use the low-level API to fetch only the saved articles of article-type_2, or fetch everything when I need it all.
I would stick with a solution that is a bit more NoSQL-like. In NoSQL databases, nested data models and frequent updates to existing records are often indicators that your data model can be optimized. I really see 2 objects that your application is using, 'users' and 'articles'. I would avoid a nested data model and avoid updating existing records by doing the following:
'user' table
    user id as hash key
'article' table
    article id as hash key
    timestamp as range key
    user id (used in the global secondary index described below)
    article type and any other attributes as non-key attributes
You'd also have a global secondary index on the article table that lets you search for articles by user id. Assuming you want a user's articles sorted by date, it would look something like this:
    user id as hash key
    timestamp as range key
    article id as projected attribute
With this model, you never need to go back and edit existing records: you add 'edited' records as new records and take the one with the most recent timestamp as the current version.
One thing to remember with NoSQL is that storage space is cheap and reads are cheap, but editing existing records is usually an expensive and undesirable operation.
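As a rough illustration of the article table and its index, the sketch below writes them out as Python dictionaries shaped like the keyword arguments of the low-level DynamoDB `CreateTable` and `Query` operations. The attribute names (`article_id`, `saved_at`, `user_id`), the index name `by_user`, and the on-demand billing mode are assumptions made for the example, not something taken from the existing application.

```python
# Table definition: article id as hash key, timestamp as range key,
# plus a global secondary index keyed by user id and timestamp.
ARTICLE_TABLE = {
    "TableName": "article",
    "AttributeDefinitions": [
        {"AttributeName": "article_id", "AttributeType": "S"},
        {"AttributeName": "saved_at", "AttributeType": "N"},
        {"AttributeName": "user_id", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "article_id", "KeyType": "HASH"},
        {"AttributeName": "saved_at", "KeyType": "RANGE"},
    ],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "by_user",
            "KeySchema": [
                {"AttributeName": "user_id", "KeyType": "HASH"},
                {"AttributeName": "saved_at", "KeyType": "RANGE"},
            ],
            # KEYS_ONLY still includes the base table's keys, so article_id
            # is available in the index without an explicit projection.
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",
}


def saved_articles_query(user_id: str) -> dict:
    """Keyword arguments for a Query on the index, newest article first."""
    return {
        "TableName": ARTICLE_TABLE["TableName"],
        "IndexName": "by_user",
        "KeyConditionExpression": "user_id = :uid",
        "ExpressionAttributeValues": {":uid": {"S": user_id}},
        "ScanIndexForward": False,  # descending order on the range key
    }
```

With boto3 installed and credentials configured, these could be passed as `boto3.client("dynamodb").create_table(**ARTICLE_TABLE)` and `client.query(**saved_articles_query("some-user"))`. The point of the design is that the "all saved articles, sorted by time" view becomes a single index query instead of a fetch-merge-sort in the application.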

How to get the latest document in a Couchbase bucket?

I have an activity bucket in my Couchbase database, and I need to retrieve the latest document for each of several types. My initial approach was this:
Document format: [ id , { val.time, val.type, val.load } ]
I then wrote a separate view to map each val.type and used a reduce to get the latest val.time. However, the views are not being updated (apparently the map function is only called on new or changed documents, whereas this approach needs all of the documents to be mapped and reduced).
What is the best practice for time-based data in Couchbase (NoSQL) databases?
You can access the documents by time with a simple view:
Convert your time to an integer value, e.g. using JavaScript's Date.parse() method. http://www.quackit.com/javascript/javascript_date_and_time_functions.cfm
Use this integer value as the key of your view. You can then access the first or last document.
If you always need the last document, it's faster to negate the time value, so that the largest times are at the start of the index.
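The effect of the negated key can be illustrated outside Couchbase. The actual map function of a view would be written in JavaScript; the Python sketch below only mimics how an index sorted by key behaves. The sample documents and the `view_key` helper are made up for the example, and treating naive timestamps as UTC is an assumption.

```python
from datetime import datetime, timezone


def view_key(iso_time: str) -> int:
    """Negated epoch milliseconds, so newer documents sort first."""
    moment = datetime.fromisoformat(iso_time)
    if moment.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return -int(moment.timestamp() * 1000)


docs = [
    {"id": "a", "time": "2015-03-01T10:00:00", "type": "cpu"},
    {"id": "b", "time": "2015-03-02T09:30:00", "type": "cpu"},
    {"id": "c", "time": "2015-02-27T23:59:00", "type": "cpu"},
]

# An index is kept in ascending key order; with negated keys the newest
# document is the first entry, so a query limited to one row returns it.
index = sorted(docs, key=lambda doc: view_key(doc["time"]))
latest = index[0]
```

To get the latest document per type rather than overall, the same idea extends to a compound key of type followed by the negated time.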
If you are using a development design document, it is expected that adding new keys into your bucket may have no effect on the view. Since only a subset of keys from the bucket are going into the Map/Reduce pipeline, adding a key that is not going into the subset would trigger no update to your view.
Either evaluate it on a production design document (by clicking the "publish" button), or try adding more keys.

Should I convert my MySQL database to JSON file?

I created a quiz website. The questions and answers are fixed, and the data usually doesn't change.
When a user loads a quiz page, a query is made to the MySQL database to retrieve the question data (including answer, related image data etc). There are about 2000 questions total in the database. The query to the database is made based on the question's unique ID.
I would like to speed up the page loading time. I've read about making a query to the MySQL database and then converting the data into JSON format, but that seems like it would make the process longer. Should I convert the MySQL database into a single JSON file, then have the website's quiz page directly query the JSON file rather than the MySQL database to grab the question data?
There's no need for JSON here unless you want to make AJAX requests.
This is a common scenario for web apps. The following is one of many possible approaches (assuming MySQL and PHP):
1. Load the single question page with $questionID in the URL (and read it with $_GET['questionID']), or pass it via $_POST
2. Query the database with a prepared statement, binding (int)$questionID to the placeholder:
SELECT *
FROM questions
WHERE question_id = ?
LIMIT 1
3. Build your HTML with the info returned by your query
4. Display your info to the user
5. On $_POST, check for right/wrong answers against the answers table, querying by $questionID and $answerID (you might need to loop through them)
6. Store the results in $_SESSION so you can show final results on the last page.
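The lookup in step 2 can be sketched in a runnable form. The example below uses Python with an in-memory SQLite database as a stand-in for PHP and MySQL, so that it runs without a server; the column names and sample rows are invented for illustration. The part that carries over is the integer cast plus a bound parameter instead of string interpolation.

```python
import sqlite3


def get_question(conn: sqlite3.Connection, raw_id: str):
    """Look up one question by the id taken from the request."""
    question_id = int(raw_id)  # raises ValueError on non-numeric input
    return conn.execute(
        "SELECT question_id, body, answer FROM questions "
        "WHERE question_id = ? LIMIT 1",
        (question_id,),  # bound parameter, never interpolated into the SQL
    ).fetchone()


# Stand-in data; the real site would connect to its MySQL database instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE questions (question_id INTEGER PRIMARY KEY, body TEXT, answer TEXT)"
)
conn.executemany(
    "INSERT INTO questions VALUES (?, ?, ?)",
    [(1, "2 + 2 = ?", "4"), (2, "Capital of France?", "Paris")],
)
```

Here `get_question(conn, "2")` returns the matching row as a tuple, and an id with no match returns `None`. A lookup on the primary key like this is an indexed read, so with about 2000 rows the query itself is unlikely to be the slow part of the page load.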
Hope that helps,