INSERT/DELETE entire row vs UPDATE BOOL field? - mysql

My normalized DB has a table where a 1-N relationship exists for attributes related to the user. One of these attributes involves storing a yes/no binary status. This attribute is relatively frequently updated (TRUE to FALSE then back to TRUE) but also frequently retrieved.
Table
user_id (FK user table) | value_id (regular int) | yes_no (bool)
A user has multiple rows with different value_id, but these are always retrieved as an entire set, i.e. SELECT * FROM table WHERE user_id=ID
I'm thinking a bulk SELECT like that would benefit from having fewer rows to return if all the FALSE rows were deleted from the table instead of being updated.
However, I understand that updating a single field would also be less taxing than repeated INSERTs and DELETEs.
Thoughts appreciated!

DELETE/INSERT will require MySQL to maintain the indexes and update statistics (you have FKs, which means indexes).
If you have many records in the table and change the data often, this will be expensive.
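For example, keeping the rows and just flipping the flag could look like this (a sketch only; the table name user_attribute and the index name are made up, since the question doesn't name the table):

-- composite index: bulk retrieval by user and the targeted update both use it
ALTER TABLE user_attribute ADD INDEX idx_user_value (user_id, value_id);

-- flip the flag in place instead of deleting and re-inserting the row
UPDATE user_attribute SET yes_no = 0 WHERE user_id = ? AND value_id = ?;

-- the bulk retrieval stays the same
SELECT * FROM user_attribute WHERE user_id = ?;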

Related

Is one partitioned table or multiple tables better for response time, for a website where operations (insert, update, delete) are used frequently?

I have a MySQL database hosted with a web hosting company (Hostinger); the database is used from a mobile app through PHP APIs.
There are many tables.
I will show the important tables with only their important columns, to make them easier to understand:
user(id, username, password, balance, state);
cardsTrans(id, user_id, number, password, price, state);
customersTrans(id, user_id, location, state);
posTrans(id, user_id, number, state);
I thought about creating one table instead of these three transaction tables, which would look like:
allTransaction(id, user_id, target_id, type, card_number, card_pass, location);
I know that there is redundancy and some columns will be null, and I could normalize this table, but normalization would lead to many joins when querying the data, and I am interested in response time.
To explain the main idea: the user can do three types of transactions (each type has its own table). In the combined design, these transactions would be stored in the allTransaction table, with user_id as a foreign key to the users table and target_id as a foreign key to another table, which one depending on the type.
The other columns also depend on the type and may be set to null.
What I want is to determine which is better for response time and performance while users are using the app. The DML operations (insert, update, delete) are applied frequently on these tables, as are many queries, usually filtering by user_id and target_id.
If I use one table, it will have a very large number of rows and many null values in each row, which will slow the queries and take a lot of storage.
If the table has an index, the index will slow down insert and update operations.
Would creating a partition per user on the table, without indexes, be better for response time for any operation (select, insert, update, or delete), or is creating multiple tables (a table per user) better? The expected number of users is between 500 and 5000.
I searched and found this similar question MySQL performance: multiple tables vs. index on single table and partitions
But it isn't quite the same context, since I am interested in response time first and then overall performance, and my database is hosted on a hosting server rather than on the same device as the mobile app.
Can anyone tell me which is better and why?
As a general rule:
Worst: Multiple tables
Better: Builtin PARTITIONing
Best: Neither, just better indexing.
If you want to talk specifically about your case, please provide SHOW CREATE TABLE and the main SELECTs, DELETEs, etc.
It is possible to "over-normalize".
three types of transactions (each type has its own table)
That can be tricky. It may be better to have one table for transactions.
"Response time" -- Are you expecting hundreds of writes per second?
take a lot of storage.
Usually proper indexing (especially with 'composite' indexes) makes table size not a performance issue.
partition per user on the table
That is no faster than having an index starting with user_id.
If the table has an index, the index will slow down insert and update operations.
The burden on writes is much less than the benefit on reads. Do not avoid indexes for that reason.
(I can be less vague if you provide tentative CREATE TABLEs and SQL statements.)
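To make the composite-index point concrete, the idea is something along these lines (a sketch only; it assumes the single allTransaction table from the question, and the index name is made up):

ALTER TABLE allTransaction
  ADD INDEX idx_user_type_target (user_id, type, target_id);

-- a typical lookup by user and type can then be satisfied via the index
SELECT * FROM allTransaction WHERE user_id = ? AND type = 'card';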
Instead of trying to predict the future, use the simplest schema that will work for now and be prepared to change it as you learn more from actual use. This means avoiding scattering assumptions about the schema throughout the code. Look into the concept of Schema Migrations to safely change your schema, and the Repository Pattern to hide the details of how things are stored. 5000 users is not a lot (unless they will all be using the system at the same time).
For now, go with the design that provides the strongest referential integrity. That means as many not null columns as possible. While you're developing the product, you're going to be introducing bugs which might accidentally insert nulls where there should be a value. Referential integrity provides another layer of protection.
For example, if you have a single AllTransactions table where some fields might or might not be filled in depending on the type of transaction, your schema has to make all these columns nullable. The schema then cannot protect you from accidentally inserting a null value.
But if you have individual CardTransactions, CustomerTransactions, and PosTransactions tables, their schemas can be constrained to ensure all the necessary fields are always filled in. This will catch many different sorts of bugs.
A variation on this is to have a single UserTransaction table which stores all the generic information about a user transaction (user_id, timestamp) and then join tables for each type of transaction. Here's a sketch.
user_transactions
id bigint primary key auto_increment
user_id integer not null references users on delete cascade
-- Fields common to every transaction below
state enum(...) not null
price numeric not null
created_at timestamp not null default current_timestamp()
card_transactions
user_transaction_id bigint not null references user_transactions on delete cascade
card_id integer not null references cards on delete cascade
..any other fields for card transactions...
pos_transactions
user_transaction_id bigint not null references user_transactions on delete cascade
pos_id integer not null references pos on delete cascade
..any other fields for POS transactions...
This provides full referential integrity. You can't make a card transaction without a card. You can't make a POS transaction without a POS. Any fields required by a card transaction can be set not null. Any fields required by a POS transaction can be set not null.
Getting all transactions for a user is a simple indexed query.
select *
from user_transactions
where user_id = ?
And if you only want one type, join through the type-specific table; that is also a simple indexed query.
select *
from card_transactions ct
join user_transactions ut on ut.id = ct.user_transaction_id
where ut.user_id = ?

How to structure a large table and its transactions in a database?

I have two big tables for example:
'tbl_items' and 'tbl_items_transactions'
The first table keeps some item metadata, which may have 20 (varchar) columns, with millions of rows; the second table keeps each transaction on the first table.
For example, if a user inserts a new record into tbl_items, then a new record is automatically added to tbl_items_transactions with the same data plus the date, username, and transaction type, to keep a history of each row.
So in the above scenario the two tables have the same columns, but tbl_items_transactions has 3 extra columns (date, username, transaction_type) to keep the tbl_items history.
Now assume we have 1000 users that want to insert, update, and delete tbl_items records through a web application, so these two tables grow very quickly (maybe a billion rows in tbl_items_transactions).
I have tried MySQL, MariaDB, and PostgreSQL. They are very good, but once the tables grow and millions of rows have been inserted, they are slow for some SELECT queries on tbl_items_transactions; sometimes PostgreSQL is faster than MySQL or MariaDB.
Now I think I'm doing something wrong. If you were me, would you use MariaDB or PostgreSQL or something like that, and would you structure your database the way I did?
Your setup is wrong.
You should not duplicate the columns from tbl_items in tbl_items_transactions; rather, you should have a foreign key in the latter table pointing to the former.
That way data integrity is preserved, and tbl_items_transactions will be much smaller. This technique is called normalization.
To speed up queries when the tables get large, define indexes on them that match your WHERE and JOIN conditions.
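A rough sketch of what that could look like (the column names and types are illustrative guesses, not the actual schema):

CREATE TABLE tbl_items_transactions (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  item_id BIGINT UNSIGNED NOT NULL,                        -- FK instead of copied item columns
  username VARCHAR(64) NOT NULL,
  transaction_type ENUM('insert','update','delete') NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (item_id) REFERENCES tbl_items (id),
  INDEX idx_item_date (item_id, created_at)                -- matches lookups of an item's history
);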

Most efficient database design for this data

My database knowledge is reasonable, I would say. I'm using MySQL (InnoDB) for this and have done some Postgres work as well. Anyway...
I have a large amount of Yes or No questions.
A large amount of people can contribute to the same poll.
A user can choose either option and this will be recorded in the database.
A user can change their mind later and swap choices, which will require an update to the stored data.
My current plan for storing this data:
POLLID, USERID, DECISION, TIMESTAMP
Obviously user data is in another table.
To add their choice, I would have to query to see if they have voted before, and insert if not, otherwise update.
If I want to see the poll results, I would need to iterate through all decisions (albeit indexed portions) every time someone wants to view the poll.
My questions are
Is there any more efficient way to store/query this?
Would I have an index on POLLID, or POLLID & USERID (maybe just a unique constraint)? Or other?
Additional side question: Why don't I have an option to choose HASH vs BTREE indexes on my tables like I would in Postgres?
The design sounds good, a few ideas:
A table for polls: poll id, question.
A table for choices: choice id, text.
A table to link polls to choices: poll id->choice ids.
A table for users: user details, user ids.
A votes table: (user id, poll id), choice id, time stamp. (brackets are a unique pair)
Inserting/updating for a single user will work fine, as you can just check if an entry exists for the user id and the poll id.
You can view the results much more easily than by iterating, using COUNT.
e.g.: SELECT COUNT(*) FROM votes WHERE pollid = id AND decision = choiceid
That would tell you how many people voted for "choiceid" in the poll "pollid".
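To get the totals for every choice of a poll in one query, a grouped count also works (a sketch, assuming the votes table above with poll_id and choice_id columns):

SELECT choice_id, COUNT(*) AS votes
FROM votes
WHERE poll_id = ?
GROUP BY choice_id;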
Late Edit:
This is pseudocode for inserting if the row doesn't exist and updating if it does:
IF EXISTS (SELECT * FROM TableName WHERE UserId='Uid' AND PollId = 'pollid')
UPDATE TableName SET (set values here) WHERE UserId='Uid' AND PollId = 'pollid'
ELSE
INSERT INTO TableName VALUES (insert values here)
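In MySQL specifically, the same upsert can be written as a single statement, provided there is a unique key on the (user id, poll id) pair. A sketch using assumed column names for the votes table described above:

INSERT INTO votes (user_id, poll_id, choice_id, voted_at)
VALUES (?, ?, ?, NOW())
ON DUPLICATE KEY UPDATE choice_id = VALUES(choice_id), voted_at = NOW();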

Best solution for saving boolean values and saving cpu and memory on searches

What is the best way to store boolean values in a database if you want better query performance and minimal memory use on SELECT statements?
For example:
I have a table with 36 fields, 30 of which hold boolean values (zero or one), and I need to search for records where certain boolean fields have true values.
SELECT * FROM `myTable`
WHERE
`field_5th` = 1
AND `field_12th` = 1
AND `field_20` = 1
AND `field_8` = 1
Is there any solution?
If you want to store boolean values or flags there are basically three options:
Individual columns
This is reflected in your example above. The advantage is that you will be able to put indexes on the flags you intend to use most often for lookups. The disadvantage is that this will take up more space (since the minimum column size that can be allocated is 1 byte.)
However, if your column names are really going to be field_20, field_21, etc., then this is absolutely NOT the way to go. Numbered columns are a sign you should use either of the other two methods.
Bitmasks
As was suggested above you can store multiple values in a single integer column. A BIGINT column would give you up to 64 possible flags.
Setting individual flags would look something like:
UPDATE table SET flags = flags | b'100';
UPDATE table SET flags = flags | b'10000';
Then the field would look something like: 10100
That would represent having two flag values set. To query for any particular flag value set, you would do
SELECT flags FROM table WHERE flags & b'100';
The advantage of this is that your flags are very compact space-wise. The disadvantage is that you can't place indexes on the field which would help improve the performance of searching for specific flags.
One-to-many relationship
This is where you create another table, and each row there would have the id of the row it's linked to, and the flag:
CREATE TABLE main (
  main_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
);
CREATE TABLE flag (
main_id INT UNSIGNED NOT NULL,
name VARCHAR(16)
);
Then you would insert multiple rows into the flag table.
The advantage is that you can use indexes for lookups, and you can have any number of flags per row without changing your schema. This works best for sparse values, where most rows do not have a value set. If every row needs all flags defined, then this isn't very efficient.
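For example, finding every main row that has a given flag set becomes an indexed join. A sketch using the tables above; the added index and the flag name 'field_5' are illustrative:

-- index so lookups by flag name don't scan the whole flag table
ALTER TABLE flag ADD INDEX idx_name (name, main_id);

SELECT m.*
FROM flag f
JOIN main m ON m.main_id = f.main_id
WHERE f.name = 'field_5';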
For a performance comparison you can read a blog post I wrote on the topic:
Set Performance Compare
Also when you ask which is "Best" that's a very subjective question. Best at what? It all really depends on what your data looks like and what your requirements are and how you want to query it.
Keep in mind that if you want to do a query like:
SELECT * FROM table WHERE some_flag=true
Indexes will only help you if few rows have that value set. If most of the rows in the table have some_flag=true, then MySQL will ignore the index and do a full table scan instead.
How many rows of data are you querying over? You can store the boolean values in a single integer and use bit operations to test for them. That isn't indexable, but storage is very compact. Using TINYINT fields with indexes, MySQL would pick one index to use and scan from there.

MySQL tables relationship and the use of md5 hash

I have a MySQL DB with 2 tables:
sample_name (stores name of a file, multiple names for same sample_hash);
sample_hash (stores the hashes of a file, will not store duplicate md5);
(all tables have an id int unsigned NOT NULL auto_increment)
My first option for relating these two tables is to create an md5 column in both tables and join on it. However, this seems to have a downside, as I will be duplicating a varchar(32), which can be a waste of space with millions of records.
My second option is to calculate the file hashes first, grab the mysql_insert_id() from the sample_hash table, and insert that into the sample_name table. This works if the hash in the sample_hash table is new, because then I have mysql_insert_id() at my disposal.
But if the hash already exists in samples_db, I don't want to store the hash again, so I will have no mysql_insert_id().
Is there an alternative, other than searching for the id of a given md5, in order to store it in the sample_name table when the md5 already exists? If so, how can I do that?
From the requirements that you describe, there is no need for the sample_hash table at all.
You can keep the hashes in the sample_name table and do all your lookups of hash values in that table.
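A sketch of what the merged table might look like (column names and sizes are assumptions, not taken from the question):

CREATE TABLE sample_name (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  md5 CHAR(32) NOT NULL,
  INDEX idx_md5 (md5)   -- all hash lookups go through this index
);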