My database knowledge is reasonable, I would say; I'm using MySQL (InnoDB) for this and have done some Postgres work as well. Anyway...
I have a large number of Yes or No questions.
A large number of people can contribute to the same poll.
A user can choose either option and this will be recorded in the database.
Users can change their minds later and swap choices, which will require an update to the stored data.
My current plan for storing this data:
POLLID, USERID, DECISION, TIMESTAMP
Obviously user data is in another table.
To add a choice, I would have to query to see whether they have voted before, then insert if not, otherwise update.
If I want to see the poll results, I would need to iterate through all the decisions (albeit indexed portions) every time someone wants to see the poll.
My questions are
Is there any more efficient way to store/query this?
Would I have an index on POLLID, or on POLLID & USERID (maybe just a unique constraint)? Or something else?
Additional side question: Why don't I have an option to choose HASH vs BTREE indexes on my tables like I would in Postgres?
The design sounds good, a few ideas:
A table for polls: poll id, question.
A table for choices: choice id, text.
A table to link polls to choices: poll id->choice ids.
A table for users: user details, user ids.
A votes table: (user id, poll id), choice id, time stamp (the bracketed columns form a unique pair; a DDL sketch follows).
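A minimal DDL sketch of that votes table, assuming InnoDB and illustrative column names:

CREATE TABLE votes (
    userid    INT UNSIGNED NOT NULL,
    pollid    INT UNSIGNED NOT NULL,
    choiceid  INT UNSIGNED NOT NULL,
    votetime  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (userid, pollid),           -- the unique pair
    KEY idx_poll_choice (pollid, choiceid)  -- supports per-poll tallies
) ENGINE=InnoDB;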
Inserting/updating for a single user will work fine, as you can just check if an entry exists for the user id and the poll id.
You can get the results much more easily than by iterating: use COUNT.
e.g.: SELECT COUNT(*) FROM votes WHERE pollid = ? AND choiceid = ?
That would tell you how many people voted for the given choiceid in the poll pollid.
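To fetch the whole result set for a poll in one pass instead of one COUNT per choice, group by the choice (same illustrative names as above):

SELECT choiceid, COUNT(*) AS votes
FROM votes
WHERE pollid = ?
GROUP BY choiceid;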
Late Edit:
This is pseudocode for inserting if a row doesn't exist and updating if it does:
IF EXISTS (SELECT * FROM TableName WHERE UserId='Uid' AND PollId = 'pollid')
UPDATE TableName SET (set values here) WHERE UserId='Uid' AND PollId = 'pollid'
ELSE
INSERT INTO TableName VALUES (insert values here)
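In MySQL specifically, the unique (user id, poll id) pair lets you collapse that check-then-write into a single statement. A sketch, assuming the illustrative column names from the DDL above:

INSERT INTO votes (userid, pollid, choiceid)
VALUES (?, ?, ?)
ON DUPLICATE KEY UPDATE choiceid = VALUES(choiceid);

If the user has not voted yet, the row is inserted; if they have, their existing row is updated to the new choice.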
I have a MySQL database hosted with a website hosting company (Hostinger); the database is used by a mobile app through PHP APIs.
There are many tables.
I will show the important tables, with only the important columns written as objects, to make them easier to understand:
user(id, username, password, balance, state);
cardsTrans(id, user_id, number, password, price, state);
customersTrans(id, user_id, location, state);
posTrans(id, user_id, number, state);
I thought about creating one table instead of these three transaction tables, which would look like this:
allTransaction(id, user_id, target_id, type, card_number, card_pass, location);
I know that there is redundancy and some columns will be null, and I could normalize this table, but normalization would produce many joins when querying the data, and I am interested in response time.
To explain the main idea: the user can perform three types of transactions (each type has a different table). These transactions would be stored in the allTransaction table with user_id as a foreign key from the users table and target_id as a foreign key to another table, determined by the type.
The other columns also depend on the type and may be set to null.
What I want to determine is which option gives better response time and performance while users are using the app. DML operations (insert, update, delete) are applied frequently to these tables, and there are also very many queries, usually by user_id and target_id.
If I use one table, it will have a very large number of rows and many null values in each row, so it will slow the queries and take a lot of storage.
If the table has an index, the index will slow down insert and update operations.
Would creating a partition per user on the table, without indexes, give better response time for any operation (select, insert, update, or delete), or is creating multiple tables (a table per user) better? The expected number of users is between 500 and 5000.
I searched and found this similar question: MySQL performance: multiple tables vs. index on single table and partitions.
But it is not quite the same context, since I am interested in response time first and then overall performance; also, my database is hosted on a hosting server, not on the same device as the mobile app.
Can anyone tell me which is better, and why?
As a general rule:
Worst: Multiple tables
Better: Builtin PARTITIONing
Best: Neither, just better indexing.
If you want to talk specifically about your case, please provide SHOW CREATE TABLE and the main SELECTs, DELETEs, etc.
It is possible to "over-normalize".
three types of transactions (each type has a different table)
That can be tricky. It may be better to have one table for transactions.
"Response time" -- Are you expecting hundreds of writes per second?
take a lot of storage.
Usually proper indexing (especially with 'composite' indexes) makes table size not a performance issue.
partition per user on the table
That is no faster than having an index starting with user_id.
If the table has an index, the index will slow down insert and update operations.
The burden on writes is much less than the benefit on reads. Do not avoid indexes for that reason.
(I can be less vague if you provide tentative CREATE TABLEs and SQL statements.)
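As a concrete illustration of the composite-index point (a sketch only; the table and column names follow the allTransaction outline in the question):

ALTER TABLE allTransaction
    ADD INDEX idx_user_target (user_id, target_id);

-- the usual lookups are then index-driven:
SELECT * FROM allTransaction WHERE user_id = ? AND target_id = ?;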
Instead of trying to predict the future, use the simplest schema that will work for now and be prepared to change it when you learn more from actual use. This means avoiding scattering assumptions about the schema around the code. Look into the concept of Schema Migrations to safely change your schema and the Repository Pattern to hide the details of how things are stored. 5000 users is not a lot (unless they will all be using the system at the same time).
For now, go with the design that provides the strongest referential integrity. That means as many not null columns as possible. While you're developing the product, you're going to introduce bugs which might accidentally insert nulls where a value should go. Referential integrity provides another layer of protection.
For example, if you have a single AllTransactions table which might have some fields filled in and some not, depending on the type of transaction, your schema has to make all of those columns nullable. The schema cannot protect you from accidentally inserting a null value.
But if you have individual CardTransactions, CustomerTransactions, and PosTransactions tables, their schemas can be constrained to ensure all the necessary fields are always filled in. This will catch many different sorts of bugs.
A variation on this is to have a single UserTransaction table which stores all the generic information about a user transaction (user_id, timestamp) and then join tables for each type of transaction. Here's a sketch.
user_transactions
id bigint primary key auto_increment
user_id integer not null references users on delete cascade
-- Fields common to every transaction below
state enum(...) not null
price numeric not null
created_at timestamp not null default current_timestamp()
card_transactions
user_transaction_id bigint not null references user_transactions on delete cascade
card_id integer not null references cards on delete cascade
...any other fields for card transactions...
pos_transactions
user_transaction_id bigint not null references user_transactions on delete cascade
pos_id integer not null references pos on delete cascade
...any other fields for POS transactions...
This provides full referential integrity. You can't make a card transaction without a card. You can't make a POS transaction without a POS. Any fields required by a card transaction can be set not null. Any fields required by a POS transaction can be set not null.
Getting all transactions for a user is a simple indexed query.
select *
from user_transactions
where user_id = ?
And if you only want one type, join through that type's table; this is also a simple indexed query.
select *
from card_transactions ct
join user_transactions ut on ut.id = ct.user_transaction_id
where ut.user_id = ?
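Creating a card transaction then touches both tables, so it is wrapped in a single transaction. A sketch under the column names assumed above (the literal values are placeholders):

START TRANSACTION;
INSERT INTO user_transactions (user_id, state, price)
    VALUES (1, 'completed', 9.99);       -- 'completed' is just an assumed enum value
INSERT INTO card_transactions (user_transaction_id, card_id)
    VALUES (LAST_INSERT_ID(), 42);       -- ties the child row to the row just inserted
COMMIT;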
I have a MySQL table of Users, and a table of Actions performed by the Users (linked to that User by the primary key, userid). The Actions table has an auto-incrementing key, indx. Whenever I add a new row to that table, I then update the latest column of the relevant Users row with the indx of the row I just added to the Actions table. So, something like:
INSERT INTO actions(indx,actionname,userid) VALUES(default, "myaction", 1);
UPDATE users SET latest=LAST_INSERT_ID() WHERE userid=1;
The idea being that I can check for updates for a User by seeing if the latest is higher then the last time I checked.
My issue is that if more than one connection is open on the database and they both try to add an Action for the same User at the same time, connection 2 could conceivably run its INSERT and UPDATE between the INSERT and UPDATE of connection 1, and the latest entry of the user they're both trying to update would no longer hold the indx of the most recent Actions entry.
I've been reading up on transactions, isolation levels, etc., but haven't really found a way around this (though my understanding of how these work is pretty shaky, so maybe I just misunderstood). I think I need a way to lock the Actions table until the Users table is updated. This application only gets used by a few hundred users at most, so I don't think the performance hit from momentarily locking the table will be too bad.
So is that something that can be done in MySQL? Is there a better solution? I imagine this general pattern must be pretty common: one table with a bunch of varieties of rows, and a second table with a row that tracks metadata for each variety in table A and needs to be updated atomically each time the first table is changed. So I'm hoping there's a solution that isn't too complex.
Use SELECT ... FOR UPDATE to lock the row, in order to serialize access to the table and prevent race conditions:
START TRANSACTION;
SELECT any_column FROM users WHERE userid=1 FOR UPDATE;
INSERT INTO actions(indx,actionname,userid) VALUES(default, "myaction", 1);
UPDATE users SET latest=LAST_INSERT_ID() WHERE userid=1;
COMMIT;
However, this will slow down your INSERT rate, because all these transactions from all sessions will be serialized.
The better option is not to store the last ID in the users table at all. Just use SELECT MAX(indx) FROM actions WHERE userid = ? everywhere this number is required. With an index on actions(userid) this query will be very fast (assuming that indx is the primary key of that table), and the inserts will not be slowed down.
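For example, a sketch using the question's column names:

ALTER TABLE actions ADD INDEX idx_actions_userid (userid);

-- InnoDB secondary indexes carry the primary key (indx), so this is
-- answered straight from the index:
SELECT MAX(indx) AS latest FROM actions WHERE userid = 1;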
Here is my sample query:
SELECT userid,count(*)
FROM hits
GROUP BY userid
My table is something like this
id | userid | time ...etc
Where id is the primary key and I use this table to store every visit on a page.
Which means my table has 200,000+ rows.
For a given userid, let's say X, I want to find out its rank in that query, i.e. how many users have visited the page more often than the user with that userid.
I know there are many questions like this, but they aren't the same, because my query has a GROUP BY.
I tried quite a few answers here; some don't return anything at all, while others take 5-10 minutes. I need it to be faster.
Please ask in the comments if anything needs clarifying.
Thanks
A COUNT/GROUP BY in a query that's expected to return multiple rows will get progressively slower, because the query still has to touch every row in the table. Generally, if you expect to run reports like this often, and you expect your table to keep growing, you should begin rolling that value up into a cached counter (so you still store every hit, but you also increment a counter on that user's record). This also raises the question of whether you have an index on your userid column and a foreign key into your users table, which should speed that query up considerably.
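A sketch of the cached-counter idea (the hit_count column on the users table is an assumption, not something already in your schema):

-- on every page hit, still insert the hits row, and also bump the counter:
UPDATE users SET hit_count = hit_count + 1 WHERE id = 123;

-- rank of user 123 = 1 + the number of users with a larger counter:
SELECT COUNT(*) + 1 AS user_rank
FROM users
WHERE hit_count > (SELECT hit_count FROM users WHERE id = 123);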
I have the following MySQL table:
myTable:
id int auto_increment
voucher int not null
id_user int null
I've populated the voucher field with values from 1 to 100000, so I have 100000 records. When a user clicks a button on a PHP page, I need to allocate a record to that user, so I do something like this:
update myTable set id_user=XXX where
voucher=(SELECT * FROM (SELECT MIN(voucher) FROM myTable WHERE id_user is null) v);
The problem is that I don't use locks, and I should, because if two users click at the same moment I risk assigning the same voucher to two different people (two updates on the same record, so I lose one user).
I think there must be a correct way to do this; can you help me, please?
Thanks!
If you truly want to serialize your process, you can grab a LOCK TABLES tablename WRITE at the start of your transaction, and UNLOCK TABLES when done.
If you are using InnoDB and transactions, keep in mind that LOCK TABLES implicitly commits any active transaction, so the documented pattern is to disable autocommit (SET autocommit = 0), then LOCK TABLES, do the work, COMMIT, and only then UNLOCK TABLES.
I am not advocating this method, as there is usually a better way of handling it; however, if you need a quick and dirty solution, this will work with a minimal amount of code changes.
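For completeness, a row-level alternative that avoids a full table lock; this is only a sketch, relying on InnoDB row locks, with 123 standing in for the user's id:

START TRANSACTION;
SELECT voucher FROM myTable
    WHERE id_user IS NULL
    ORDER BY voucher
    LIMIT 1
    FOR UPDATE;                 -- locks the lowest free voucher row
-- the application reads that voucher value, then binds it here:
UPDATE myTable SET id_user = 123 WHERE voucher = ?;
COMMIT;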
If I am building a multi-shop e-commerce solution and want the orders table to maintain a shop based sequential ID, what is the best way of doing this?
For instance, imagine these order IDs in sequence:
UK0001
UK0002
UK0003
DE0001
UK0004
DE0002
etc.
Through a grouped PK ID (MySQL / MyISAM): MySQL will manage this automatically if a country field and an auto-incrementing ID field are used together. But MyISAM has some inherent problems, such as table locking, and this feature seems to be available only in MyISAM, so moving database engine would not be possible with this solution.
Programmatically: let's say we have two fields: order_id (global auto-increment PK column managed by the DB) and order_number (country-specific sequential ID field maintained through code); the table also has a shop_id column to associate orders with shops.
So, after the new order record has been created, the DB engine has assigned an ID to the new record, and the newly created order ID has been retrieved in code as the variable $newID:
select order_number+1 as new_order_number from orders where order_id < $newID and shop_id = 'UK' order by order_id desc limit 1
(this is pseudo code / SQL, btw)
Questions:
Is this a feasible solution? Or is there a better, more efficient way to do this?
When the table has 1 million+ records in it, will the additional query overhead per order submission cause problems, or not?
It seems there'd be a chance of order_number clashes if two orders are placed for the same country and get processed simultaneously. Is this a possibility? If so, is there a way of protecting against it? (Perhaps a unique index and a transaction?)
Look forward to your help!
Thanks
Yes, you are definitely on the right track. Set the order ID field as UNIQUE and do exactly what you were planning. Then add a catch: if the row is not added because of the UNIQUE (duplicate-key) error, recompute the number and run the insert again. That ensures the same order ID is never inserted twice at the same time.
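A sketch of that pattern, assuming a unique index on the (shop_id, order_number) pair (names taken from the question's outline; other order columns are omitted):

ALTER TABLE orders ADD UNIQUE KEY uq_shop_order (shop_id, order_number);

-- compute and insert in one statement (MySQL allows INSERT ... SELECT on the same table):
INSERT INTO orders (shop_id, order_number)
SELECT 'UK', COALESCE(MAX(order_number), 0) + 1
FROM orders
WHERE shop_id = 'UK';
-- if two submissions race, one fails with duplicate-key error 1062: catch it and retry.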