Couchbase query to shard data into collections

I have millions of records in a legacy Couchbase bucket. After migration to Couchbase 7.0 they will be copied into the _default scope and _default collection. I now need to shard the data across collections, say one collection per month and year. What query should I write to move data from bucketname._default._default into the respective collection for each month, i.e. a collection named collection_month_year? The documents already have a createdDate field.
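To illustrate what I am after, here is a rough sketch for a single month using a recent Python SDK (the scope name, credentials and the assumption that createdDate is an ISO 8601 string are all placeholders; the target collections would already exist):

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

cluster = Cluster("couchbase://localhost",
                  ClusterOptions(PasswordAuthenticator("user", "password")))

# Copy the January 2022 documents from the migrated _default collection
# into a per-month collection, keeping the original document keys.
stmt = """
INSERT INTO `bucketname`.`monthly`.`collection_01_2022` (KEY k, VALUE v)
SELECT META(d).id AS k, d AS v
FROM `bucketname`.`_default`.`_default` AS d
WHERE DATE_PART_STR(d.createdDate, 'year') = 2022
  AND DATE_PART_STR(d.createdDate, 'month') = 1
"""
rows = list(cluster.query(stmt))  # run to completion; repeat per month/year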

Related

Best way to get data from 2 different databases (MSSQL Server and MySQL)

I have a website that uses 2 different DBMSs (MSSQL Server and MySQL). There is a table named product in both databases, and the 2 tables share the same product ID.
In MSSQL Server: I store price and quantity.
In MySQL: I store name, size,...
Now, every time I query products, I do it like this:
- Connect to MySQL -> query products in a loop -> inside every iteration, connect to MSSQL Server to get the remaining data for that product.
I know this is a bad approach, so I'm looking for a better way to get what I want, since I think my website is slower because of this kind of query.
Can you help me with pseudo-code or an explanation? Thank you.
You are right! Having your website data in 2 different database technologies is not optimal.
Until you fix that, one workaround could be (assuming we are not talking about millions of records):
User selects product A or product category X on the website.
Get all data for product A or the products of category X from SQL Server and store it in memory (e.g. in a C# DataSet or Python data frame).
Get all data for product A or the products of category X from MySQL and store it in memory (e.g. in a C# DataSet or Python data frame).
Join the 2 in-memory objects on Product ID (see the sketch after this list).
Use this combined dataset to display your website.
If required, update (commit) data to the databases at the end of the session (you will need to consider how to deal with dirty-read scenarios).
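A rough Python sketch of the fetch-and-join steps, assuming pandas plus the mysql.connector and pyodbc drivers (connection details, table and column names are placeholders):

import pandas as pd
import pyodbc
import mysql.connector

# MySQL side: name, size, ... (the category column is assumed to live here).
my = mysql.connector.connect(host="localhost", user="user", password="pw", database="shop")
cur = my.cursor()
cur.execute("SELECT product_id, name, size FROM product WHERE category = %s", ("X",))
details = pd.DataFrame(cur.fetchall(), columns=["product_id", "name", "size"])

# SQL Server side: price and quantity for the same product ids.
mssql = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                       "SERVER=localhost;DATABASE=shop;UID=user;PWD=pw")
ids = details["product_id"].tolist()
placeholders = ",".join("?" * len(ids))
mcur = mssql.cursor()
mcur.execute(f"SELECT product_id, price, quantity FROM product "
             f"WHERE product_id IN ({placeholders})", ids)
prices = pd.DataFrame([tuple(r) for r in mcur.fetchall()],
                      columns=["product_id", "price", "quantity"])

# Join the two in-memory frames on the shared product id; use the result for display.
products = details.merge(prices, on="product_id", how="left")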

Data sync from MySQL to NoSQL key-value store

I have a legacy system with MySQL at the backend and Python as the primary programming language.
Recently we have a scenario where we need to display a dashboard with information from the MySQL database. The data in the table changes every second.
This can be thought of as similar to a bidding application where people bid constantly. Every time a user bids, a record goes into the database. When a user updates his bid, it overwrites the previous value.
I also have a few clients who monitor this dashboard, which updates the statistics.
I need to order this data in real time, as people bid in real time.
I would prefer not to run queries against MySQL, because at any second I may have a few thousand clients querying the database. This would put load on the database.
Please advise.
If you need to collect and order data in real time you should be looking at the atomic ordered map and ordered list operations in Aerospike.
I have examples of using KV-ordered maps at rbotzer/aerospike-cdt-examples.
You could use a similar approach, with the user's ID as the map key and the bid as a list with the structure [1343, { foo: bar, ts: 1234, this: that} ]. The bid amount in cents (an integer) is the first element of the list; all the other information sits in a map in the second element position.
This will allow you to update a user's bid with a single map operation, get back the user's bid with a single operation, order by rank (on the bid amount) to get the top bids, get all the bids in a particular range, etc. You would have one record per auctioned item, with all of its bids in this KV-sorted map.
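A rough sketch of that shape with the Python client (namespace, set and bin names are examples; the repo above has tested versions of these map operations):

import aerospike
from aerospike_helpers.operations import map_operations as mo

client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()
key = ("test", "auctions", "item_42")  # one record per auctioned item

# Upsert one user's bid: map key = user id, map value = [amount_cents, {details}].
client.operate(key, [
    mo.map_put("bids", "user_1001", [1343, {"foo": "bar", "ts": 1234}],
               map_policy={"map_order": aerospike.MAP_KEY_ORDERED}),
])

# Read back the top 3 bids, ranked on the bid amount (the first list element).
_, _, bins = client.operate(key, [
    mo.map_get_by_rank_range("bids", -3, 3, aerospike.MAP_RETURN_KEY_VALUE),
])
print(bins["bids"])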

MySQL JSON datatype to store historical price

I am thinking about a possible solution for saving historical price data in MySQL 5.7+ with the JSON datatype, instead of adding a new row per historical price.
Case description:
I have a table of current product prices and product descriptions called "Products". Product prices can change after a few days, and I want to save the historical price changes using a JSON array:
JSON Base structure:
PriceChangeDate (date)
PriceChanged (float)
PromoType (tinyint)
PromoDesc (nvarchar(50))
The idea is to save the historical data into a separate table called "HistoricalProductPrices" that includes ProductID (a relation to the "Products" table), DateCreated and a JSON column holding that product's price history with the fields I described.
In some cases I will need the whole price history of a product, so I will just fetch the whole JSON and display it in a report. Sometimes I will need a specific date or range of historical prices, so I guess I will just fetch the historical data from the JSON and look for the desired "PriceChangeDate".
This will also save me tons of daily inserts; instead I will only need to update the JSON with the new product data.
What do you guys think about this method to save historical data?
You asked:
What do you guys think about this method to save historical data?
With respect, I think it's a terrible idea. If you do this with MySQL or any other RDBMS, the next person who has to work on your code will stick nails in a puppet that looks like you. Seriously.
Adding new rows for new events (like a stock trade) or new days is what RDBMSs do. They do it very well indeed.
The entire point of a SQL database is to allow rapid updating, searching and aggregation of data in many columns. RDBMS systems can store and search millions upon millions of rows of data without breaking a sweat. Putting many records together in a single BLOB defeats all that search technology.
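For comparison, the conventional design for your case is one row per price change, and a date or date-range lookup is then just an indexed range scan; a rough sketch (column types and example values are assumptions):

import mysql.connector

conn = mysql.connector.connect(host="localhost", user="user", password="pw", database="shop")
cur = conn.cursor()

# One row per price change: the conventional relational design.
cur.execute("""
    CREATE TABLE IF NOT EXISTS HistoricalProductPrices (
        ProductID       INT NOT NULL,
        PriceChangeDate DATE NOT NULL,
        PriceChanged    DECIMAL(10,2) NOT NULL,
        PromoType       TINYINT,
        PromoDesc       VARCHAR(50),
        PRIMARY KEY (ProductID, PriceChangeDate)
    )
""")

# A specific date range for one product is a plain indexed range scan.
cur.execute(
    "SELECT PriceChangeDate, PriceChanged, PromoType, PromoDesc "
    "FROM HistoricalProductPrices "
    "WHERE ProductID = %s AND PriceChangeDate BETWEEN %s AND %s "
    "ORDER BY PriceChangeDate",
    (42, "2017-01-01", "2017-03-31"))
history = cur.fetchall()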
If you really want to use JSON documents for storage, you may want to investigate MongoDB. It has indexing that works inside JSON documents.
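For instance, with pymongo you could embed the price history per product and still index a field inside it (collection and field names are just examples):

from pymongo import MongoClient

db = MongoClient()["shop"]

# One document per product, with the price history embedded as an array.
db.product_prices.insert_one({
    "product_id": 42,
    "history": [
        {"price_change_date": "2017-01-05", "price": 9.99, "promo_type": 1},
    ],
})

# Index a field inside the embedded array; queries on it can use the index.
db.product_prices.create_index("history.price_change_date")
matches = db.product_prices.find({"history.price_change_date": "2017-01-05"})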

Using a junction table with RestKit to map relationships in Core Data

I have three different pairs of objects connected with a many-to-many relationship. One of these is
Product <<--->> Discount
Product contains some general data like a product category, name and identifier. Discounts link to Products and have price data and dates marking the time frame the discount is valid, among other things.
I have successfully used RKEntityMappings and RKResponseDescriptors to fetch these objects from a REST API (implemented with Node.js), and these all work fine.
My problem is that we have a third API endpoint that returns the relationships between these objects (Product and Discount) in an array, like this:
[
{"discountid":1,"productid":3},
{"discountid":1,"productid":2},
{"discountid":108,"productid":29}
]
I know that I could create an object in the middle and then map that as an entity onto the aforementioned JSON output; however, since I do not have any other data to store in it, I'd prefer not to.
If I skipped Core Data and used SQLite I could just create my own junction table, but after reading numerous posts advising against that and searching for hours and hours I'm asking: is there a way I can use this junction-table output to map the relationships between two objects in Core Data using RestKit?

Couchbase - Splitting a JSON object into many key-value entries - performance improvement?

Say my Couchbase DB has millions of user objects, and each user object contains some primitive fields (score, balance etc.).
And say I read & write most of those fields on every server request.
I see 2 options for storing the User object in Couchbase:
A single JSON object mapped to a user key (e.g. user_555)
Mapping each field to a separate entry (e.g. score_555 and balance_555)
Option 1 - a single CB lookup, plus JSON parsing
Option 2 - twice the lookups, but less parsing, if any
How can I tell which one is better in terms of performance?
What if I had 3 fields? What if 4? Does it make a difference?
Thanks
Eyal
Think about your data structure and access patterns first, before worrying about whether JSON parsing or extra lookups will add overhead to your system.
From my perspective and experience I would try to model documents based upon logical object groupings, so I would store 'user' attributes together. If you were to store each field separately, you'd have to do a series of lookups whenever you wanted to provide a client or service with a full overview of the player profile.
I've used Couchbase as the main data store for a social mobile game; we store 90% of user data in a user document, which contains all the relevant fields such as score, level and progress. For the majority of operations, such as a new score or an upgrade, we want to be dealing with the whole User object in the application layer, so it makes sense to inflate the user object from the CB document, alter/read what we need and then persist it again if there have been changes.
The only time we have ID references to other documents is in the form of player purchases, where we have an array of IDs that each reference a separate purchase document. We do this because we wanted richer information on each purchase (date of transaction, transaction ID, product type etc.) that isn't relevant to the user document: when a purchase is made we verify it's legitimate, then add it to the User inventory and create the separate purchase document.
So our structure is (a rough sketch follows below):
UserDoc:
- Fields specific to a User (score, level, progress, friends, inventory)
- Arrays of IDs pointing to specific purchases
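As a rough illustration of that layout (all field names and values here are made up):

# One document per user, plus one small document per purchase it references.
user_555 = {
    "type": "user",
    "score": 12800,
    "level": 7,
    "progress": {"world": 3, "stage": 2},
    "friends": ["user_777", "user_901"],
    "inventory": ["sword_small", "potion_health"],
    "purchases": ["purchase_555_001", "purchase_555_002"],  # ids of separate docs
}

purchase_555_001 = {
    "type": "purchase",
    "user_id": "user_555",
    "transaction_id": "GPA.1234-5678-9012",
    "product_type": "coin_pack_large",
    "purchased_at": "2015-06-01T10:22:31Z",
}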
The only time I'd consider splitting out specific fields as you outlined above would be if your user document got seriously large, but I think it's best to divide documents up per grouping of data as opposed to per specific field.
Hope that helped!