Managing video watch history of users in Cassandra / MySQL

I was using a MySQL database, but one of my history tables is growing very fast (already more than 300 million rows), which is making the database slow and backups difficult to create. So I decided to move just that table to Cassandra. It's my first time using Cassandra. In MySQL I'm storing user_id, video_id, watch_secs, watch_counter and a timestamp; (user_id, video_id) is a unique composite key, and watch_secs and watch_counter are incremented if the row already exists. I tried the following with Cassandra:
CREATE TABLE IF NOT EXISTS history
(
user_id int,
video_id int,
watch_secs int,
watch_counter int,
last_updated timestamp,
history_timestamp timestamp,
PRIMARY KEY ((user_id, video_id))
);
CREATE TABLE IF NOT EXISTS history_counter
(
user_id int,
video_id int,
watch_secs counter,
watch_counter counter,
PRIMARY KEY ((user_id, video_id))
);
I have created two tables: one with counter columns for incrementing the seconds and the watch count, and another holding the same data with timestamps, because of the limitations that come with counter columns.
That is working well for storing data, but I have two issues, with deleting and with reading:
I want to fetch the last 10 history entries for a specific user. I tried a query, but it needs both user_id and video_id in the WHERE clause.
I want to delete history rows by video_id.
So the main issue is fetching or deleting data with only part of the partition key, which is not working, and I can't find any solution.
I would really appreciate your help, and I can use any other database that fits this better, or any solution within this one.
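In Cassandra, reads like this are usually handled by designing a table around the query rather than the other way round. As a rough sketch (the table name and clustering layout are only illustrative, not something from the post), partitioning by user_id and clustering by timestamp makes "last 10 for a user" a single-partition read:
-- Sketch: one partition per user, newest rows first
CREATE TABLE IF NOT EXISTS history_by_user
(
user_id int,
history_timestamp timestamp,
video_id int,
watch_secs int,
watch_counter int,
PRIMARY KEY ((user_id), history_timestamp, video_id)
) WITH CLUSTERING ORDER BY (history_timestamp DESC, video_id ASC);
-- Last 10 entries for one user
SELECT video_id, watch_secs, watch_counter
FROM history_by_user
WHERE user_id = ?
LIMIT 10;
Deleting by video_id alone would still typically need either a second table keyed by video_id or a secondary index, since Cassandra only filters efficiently within a partition.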

To fetch the last 10 entries for a specific user:
SELECT ...
FROM history
WHERE user_id = ?
ORDER BY history_timestamp DESC
LIMIT 10
and add this index to the history table:
INDEX(user_id, history_timestamp)
That probably needs a JOIN using video_id to some other table to get the names of the 10 videos.
(What is history_counter for? The current state of someone viewing a video? Something else?)
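In MySQL DDL, that index suggestion would look something like this (the index name is arbitrary):
-- Composite index so the WHERE user_id ... ORDER BY history_timestamp query can use it
ALTER TABLE history
ADD INDEX idx_user_time (user_id, history_timestamp);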

Related

Performance of an UPDATE query compared to DELETE + INSERT

I have two tables: Shop and Product
Table Shop
(id INT AUTO_INCREMENT,
shop_id INT,
PRIMARY KEY(id)
);
Table Product
(
product_id INT AUTO_INCREMENT,
p_name VARCHAR(100),
p_price INT,
shop_id INT,
PRIMARY KEY(product_id),
FOREIGN KEY(shop_id) REFERENCES Shop(id)
);
On the server I'm using Node and the mysql2 package for queries.
On the client side, I'm displaying all Products that are related to a specific Shop in a table.
The user can change Products, and when they press Save, requests are made sending the new data, which is then stored.
The user can either change existing Products or add new ones.
But I have concerns about how this will behave with a relatively large number of products per shop. Let's say there are 1000 of them.
Newly created data is marked with the flag saved_in_db=false.
Existing data that was changed is marked with changed=true.
I considered a few approaches:
On the server, filter the array of records received from the client and INSERT into the DB the newly created ones that are not stored yet. But to UPDATE the existing Products, I need to create a bunch of UPDATE Product SET p_name = ? WHERE product_id = ? queries and execute them at once.
Take all Products with the specified shop_id, DELETE them, and INSERT the new bulk of data, without separating already existing records from changed ones.
In this approach, I see two cons.
First - the client sends the same full amount of data to the server every time.
Second - running out of ids in the DB. If there are 10 shops with 1000 Products each, and users frequently update records, every save, even if only one record was added or changed, will burn through around 1000 auto-increment ids.
Is executing a bunch of UPDATE queries one after another the only way to update a number of records in the DB?
You could INSERT...ON DUPLICATE KEY UPDATE.
INSERT INTO Product (product_id, p_name)
VALUES (123, 'newname1'), (456, 'newname2'), (789, 'newname3'), ...more...
ON DUPLICATE KEY UPDATE p_name = VALUES(p_name);
This does not change the primary key values, it only updates the columns you tell it to.
You must include the product IDs in the INSERT VALUES, because that's how it detects that you're inserting a row that already exists in the table.
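As a usage sketch with the other columns from the question's schema (the ids and values here are made up), the same statement can upsert several columns at once for rows that already exist:
-- Upsert name and price for known product ids; shop_id is included so brand-new rows still attach to a shop
INSERT INTO Product (product_id, p_name, p_price, shop_id)
VALUES (123, 'newname1', 150, 1), (456, 'newname2', 200, 1)
ON DUPLICATE KEY UPDATE
p_name = VALUES(p_name),
p_price = VALUES(p_price);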

MySQL database empty column values vs additional identifying table

Sorry, not sure if the question title reflects the real question, but here goes:
I'm designing a system which has a standard orders table, but with additional previous and next columns.
The question is which approach for the foreign keys is better.
Here I have a basic table with the columns previous and next, which are self-referencing foreign keys. The problem with this table is that the first placed order doesn't have previous and next values, so those fields are left empty. If I have, say, 10,000 records and 30% of them have those columns empty, that's 3,000 rows, which is quite a lot I think, and I also expect the numbers to grow; in, let's say, a year it could come to 30,000 rows with empty columns, and I am not sure if that's OK.
The solution I've come up with is a main table plus 2 other tables which have foreign keys to that table. In this case those 2 additional tables are identifying tables and nothing more, and there are no longer rows with empty columns.
So the question is which solution is better when considering query speed, table optimization, and common good practices, or maybe there's an even better one that I don't know of? (P.S. I am using MySQL with the InnoDB engine.)
If your aim is to do order sets, you could simply add a new table for that, and just have a single column as a foreign key to that table in the order table.
The orders could also include a rank column to indicate in which order orders belonging to the same set come.
create table order_sets (
id int not null auto_increment,
-- customer related data, etc...
primary key (id)
);
create table orders (
id int not null auto_increment,
name varchar(100),
quantity int,
set_id int,
set_rank int,
primary key (id),
foreign key (set_id) references order_sets (id)
);
Then inserting a new order means updating the rank of all other orders which come after in the same set, if any.
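As a sketch of that insert path (the set id and rank here are made up for illustration):
-- Make room at rank 3 within set 42, then insert the new order there
UPDATE orders SET set_rank = set_rank + 1 WHERE set_id = 42 AND set_rank >= 3;
INSERT INTO orders (name, quantity, set_id, set_rank)
VALUES ('new order', 1, 42, 3);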
Likewise, for grouping queries, things are way easier than having to follow prev and next links. I'm pretty sure you will need these queries, and the performance will be much better that way.

MySQL Multiple Unique Keys Across Tables

I have a table like this:
CREATE TABLE `Appointment` (
id INT NOT NULL AUTO_INCREMENT,
user_id INT NOT NULL,
doctor_slot_id INT NOT NULL,
date DATE NOT NULL,
PRIMARY KEY(id),
FOREIGN KEY(user_id) REFERENCES user(id),
FOREIGN KEY(doctor_slot_id) REFERENCES doctor_slot(id)
);
I want a user to not be able to arrange an appointment with a doctor more than once in a day. So I want to add a unique constraint between doctor_id and user_id, but with this structure I can't. I tried these, which are not valid SQL syntax:
UNIQUE(user_id, doctor_slot.doctor_id)
and
UNIQUE(user_id, doctor_slot(doctor_id))
and
UNIQUE(user_id, doctor_id(doctor_slot))
But as you know, they didn't work. Are there any suggestions you can make?
Based on your comment about what the doctor_slot is, it would seem you have a bit of an issue with your schema design. There should be no reason for you to store both a slot_id and a date in the appointment table: the doctor_slot already has a date component, so storing the date in the appointment table is redundant and could become problematic to keep in sync.
Of course, without the date on this table it is impossible to enforce a unique constraint in the database for this table.
My recommendation for any type of calendar-based app like this would be to first create a date table. I usually use a script like the one here: http://www.dwhworld.com/2010/08/date-dimension-sql-scripts-mysql/ to create this date table. Having such a table allows you to use a simple date_id to reference all kinds of different information about a date (this is a technique commonly used in data warehouses). As long as you use this date_id in all the other tables where you need dates, it is extremely simple to look up dates in any fashion you desire (by day of week, month, week number, whether it is a weekday or not, etc.).
You could use a similar concept to build your time slots. Maybe make a table with 96 entries (24 hours × 4 fifteen-minute slots) to represent 15-minute intervals - obviously you can change this to whatever interval you like.
You could then build your appointment table like this:
appointment_id
user_id
doctor_id
date_id
time_start_id <= time slot for appointment start
time_end_id <= time slot for appointment end
I don't see a separate need for a doctor_slots table here. If you want to track open doctor slots, you could also do that in this table by simply having user_id = NULL until the slot is filled.
This would allow you to enforce a unique index on user_id and date_id.
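A sketch of that layout (the date and time-slot tables referenced by the FKs are assumed, since they aren't defined here):
CREATE TABLE appointment (
appointment_id INT NOT NULL AUTO_INCREMENT,
user_id INT NULL, -- NULL while the slot is still open
doctor_id INT NOT NULL,
date_id INT NOT NULL, -- FK to the date dimension table
time_start_id INT NOT NULL, -- FK to the time-slot table
time_end_id INT NOT NULL, -- FK to the time-slot table
PRIMARY KEY (appointment_id),
UNIQUE KEY uq_user_date (user_id, date_id)
);
Note that a MySQL unique index allows multiple NULLs, so several open (user_id = NULL) slots on the same date will not conflict with each other.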

MySQL database formatting

I am currently developing a database storage solution for product inventory information for the company I work for. I am using MySQL, and I am having a hard time coming up with an efficient, feasible format for the data storage.
As it works right now, we have ~25000 products to keep track of. For each product, there are about 20 different categories we need to track information for (quantity available, price, etc.). This report is downloaded and updated every 3-4 days, and it is stored and updated in Excel right now.
My problem is that the only solution I have come up with so far is to create separate tables for each of the categories mentioned above, using foreign keys based on the product SKUs, and cascading updates to each respective table. However, this method would require every table to add 24000 rows each time the program is run, since each product needs to be updated for the date it was run. The problem with this is that the data will be stored for around a year, so the tables will grow an extensive amount. My research into other database formats has yielded some examples, but none on this scale; they are geared towards adding maybe 100 rows a day.
Does anybody know of, or have any ideas about, a suitable way to set up this kind of database, or is the method I described above suitable and within the limits of MySQL tables?
Thanks,
Mike
25,000 rows is nothing to MySQL, or to a flat file for that matter. Do not initially worry about data volume. I've worked on many retail database schemas, and products are usually defined by either a static or an arbitrary-length set of attributes. Your data quantity ends up not being that far off either way.
Static:
create table products (
product_id integer primary key auto_increment
, product_name varchar(255) -- or whatever
, attribute1_id integer -- FK to attributes
, attribute2_id integer -- FK to attributes
, ...
, attributeX_id integer -- FK to attributes
);
create table attributes (
attribute_id integer primary key -- whatever
, attribute_type varchar(50) -- Category?
, attribute_value varchar(255)
);
Or, you obviously:
create table products (
product_id integer primary key auto_increment
, product_name varchar(255) -- or whatever
);
create table product_attributes (
product_id integer
, attribute_id integer
-- other stuff you want, like date of assignment
, primary key (product_id, attribute_id)
);
create table attributes (
attribute_id integer primary key -- whatever
, attribute_type varchar(50) -- Category?
, attribute_value varchar(255)
);
I would not hesitate to shove a few hundred million records into a basic structure like either.
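As a usage sketch against the second structure (the product id is made up), pulling all attribute values for one product is a couple of joins:
-- All attribute values for a single product in the products / product_attributes / attributes layout
SELECT p.product_name, a.attribute_type, a.attribute_value
FROM products p
JOIN product_attributes pa ON pa.product_id = p.product_id
JOIN attributes a ON a.attribute_id = pa.attribute_id
WHERE p.product_id = 123;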

Would an index on a table be beneficial in this case?

I've got a table with many rows; currently no fields are unique. I've got a userid field and a gameid field, and some other columns storing information on the games the users have played. As a user plays a game the score is updated, so there are quite a lot of update queries happening on this table and it's starting to get pretty large.
Would it be beneficial to add another field that is indexed, storing a string such as userid_gameid, which would then make updates on the table faster if I write my update queries with WHERE index = '10_10' (for example)?
Thanks
Don't add a noddy field for userid + gameid; instead, create an index that includes both columns. If the two columns taken together are intended to be unique, then make this the primary key of the table.
CREATE INDEX myIndex ON myTable (userid, gameid)
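If the (userid, gameid) pair really is unique, the primary-key route mentioned above could look like this (the score column in the UPDATE is assumed, since the question doesn't name its other columns):
ALTER TABLE myTable ADD PRIMARY KEY (userid, gameid);
-- Updates can then filter on both columns directly, no combined string column needed
UPDATE myTable SET score = score + 10 WHERE userid = 10 AND gameid = 10;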