Scenario:
Designing a chat room where many users can chat at the same time. All chats need to be saved. Whenever a user logs in, he should be able to see all the previous chats.
Here is one example of the table that can be used for storing the chats:
CREATE TABLE chat
(
chat_id int NOT NULL auto_increment,
posted_on datetime NOT NULL,
userid int NOT NULL,
message text NOT NULL,
PRIMARY KEY (chat_id),
FOREIGN KEY(userid) references users(userid) on update cascade on delete cascade
);
For retrieving chats in the proper order, I need some primary key in the table in which I am storing the chats. So, if I use the above table for storing chats, then I cannot store more than 2147483647 chats. Obviously, I can use some datatype with a huge range, like unsigned bigint, but even that will have some limit.
But as the scenario says the chats to be saved can be infinite, what kind of table should I make? Should I make some other primary key?
Please help me sort out a solution. I wonder how Google or Facebook manage to save every chat.
If you weren't using MySQL, a primary key of the user id and a timestamp would probably work fine. But MySQL's timestamp only resolves to one second. (See below for recent changes that affect this answer.) There are a few ways to get around that.
- Let application code handle a primary key violation by waiting a second, then resubmitting.
- Let application code provide a higher-precision timestamp, and store it as a sortable CHAR(n), like '2011-01-01 03:45:46.987' (see the sketch below).
- Switch to a DBMS that supports microsecond timestamps.
All that application code needs to be server-side code if you intend to write a query that presents rows ordered by timestamp.
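For the second option, a minimal sketch, assuming the application always writes a zero-padded, fixed-width string so that lexical order equals chronological order (the column layout mirrors the question's table):
CREATE TABLE chat
(
userid int NOT NULL,
posted_on char(23) NOT NULL, -- app-supplied, e.g. '2011-01-01 03:45:46.987'
message text NOT NULL,
PRIMARY KEY (userid, posted_on) -- sorts correctly as text: zero-padded, most-significant-first
);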
Later
The current version of MySQL supports fractional seconds in timestamps.
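Concretely, on MySQL 5.6.4 or later you could give the column fractional precision and use the composite key directly. A sketch, assuming microsecond resolution is enough to keep rows distinct per user:
CREATE TABLE chat
(
userid int NOT NULL,
posted_on datetime(6) NOT NULL, -- microsecond precision
message text NOT NULL,
PRIMARY KEY (userid, posted_on),
FOREIGN KEY (userid) REFERENCES users(userid) ON UPDATE CASCADE ON DELETE CASCADE
);
This key never runs out the way an integer surrogate does, and ORDER BY posted_on returns chats in chronological order. Two messages from the same user in the same microsecond would still collide, so the resubmit-on-violation idea above still applies.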
Related
I'm designing an application and I need to create a user registration system. I have the following table structure
What doesn't click for me is whether I should separate all password-related columns into another table, PASSWORD, and connect it to the main user table with a foreign key. That said, passwords are currently derived with a key derivation algorithm, meaning two passwords wouldn't yield the same output digest. Still, I wonder whether keeping the user table as it is, or moving the password-related columns behind a foreign key, would improve performance in any way?
You seem to have an interest in historical passwords. That suggests that you have the wrong data model. It sounds like you want a type-2 table -- one that keeps track of passwords over time:
create table user_passwords (
user_password_id int auto_increment primary key,
user_id int not null,
password varchar(100),
eff_date datetime not null,
end_date datetime,
constraint fk_user_passwords_user_id foreign key (user_id) references users(user_id)
);
When a user changes the password, you would then insert a new row into this table, setting the eff_date of the new row and closing out the end_date of the old one.
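A password change then becomes two statements, sketched here with a made-up user_id and a placeholder digest:
-- close out the currently active password row
update user_passwords
set end_date = now()
where user_id = 42 and end_date is null;
-- insert the new password as the open-ended current row
insert into user_passwords (user_id, password, eff_date, end_date)
values (42, '<derived digest>', now(), null);
The active password is then simply the row whose end_date is null.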
Note: The purpose of doing this is not for performance. The purpose is to accurately represent the data that you seem to need for your application.
This doesn't include the "trials". I'm not sure what that really means and it probably doesn't need to be kept historically, so that can stay in the users table.
The Citus documentation for the master_get_table_metadata function states:
part_storage_type: Type of storage used for the table. May be ‘t’ (standard table), ‘f’ (foreign table) or ‘c’ (columnar table).
But I searched the entire documentation and found no examples of how to work with tables distributed using the ‘f’ (foreign table) partition storage type.
I suppose the initial foreign table could be created using:
CREATE FOREIGN TABLE audit (
id integer NOT NULL,
ctime timestamp without time zone DEFAULT now() NOT NULL,
site_id integer NOT NULL,
client_id integer,
done_time timestamp without time zone,
status text DEFAULT 'NEW' NOT NULL,
file_id character varying(16) DEFAULT ''::character varying NOT NULL
) SERVER mysql_svr OPTIONS (dbname 'constructor', table_name 'audit');
But how do I distribute such a table after creating it? How will the shards be created?
Update
I have found this:
FOREIGN (‘f’) — Indicates that shard stores foreign data. (Used by distributed file_fdw tables)
So my question remains: is it possible to use other foreign data wrappers, such as mysql_fdw?
Creating distributed foreign tables has only partial support right now within Citus.
Let's take your example:
CREATE FOREIGN TABLE audit (
id integer NOT NULL,
ctime timestamp without time zone DEFAULT now() NOT NULL,
site_id integer NOT NULL,
client_id integer,
done_time timestamp without time zone,
status text DEFAULT 'NEW' NOT NULL,
file_id character varying(16) DEFAULT ''::character varying NOT NULL
) SERVER mysql_svr
OPTIONS (dbname 'constructor', table_name 'audit');
You can now distribute this using:
SELECT * FROM master_create_distributed_table('audit', 'id', 'append');
And create shards using:
SELECT master_create_worker_shards('audit', <shard_count>);
However, each shard created on the worker node will inherit the same options as the master node. Thus, each shard will point, in this example, to dbname 'constructor', and foreign table 'audit'. There would be limited value in creating such a distribution, since even though Citus will issue parallel queries, they will all again be sent to a single node and table.
To construct a more useful example, let's say you already have some (let's say 8) sharded MySQL tables, e.g. audit_1, audit_2, ..., audit_8.
You can construct the same table as above, and create a distributed setup like so:
SELECT * FROM master_create_distributed_table('audit', 'id', 'append');
And create shards using:
SELECT master_create_worker_shards('audit', 8);
You would now need to log into each Citus worker node, and update each shard to point to its relevant MySQL shard.
e.g.:
ALTER FOREIGN TABLE audit_100208 OPTIONS (SET table_name 'audit_1');
If you have tables spread across multiple nodes or databases, you'd need to manually create specific servers for each foreign node on every Citus worker node.
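For instance, with mysql_fdw the per-worker setup might look like the following (a sketch; the host name and credentials are placeholders, and mysql_svr matches the server name used above):
-- run on each Citus worker node
CREATE EXTENSION mysql_fdw;
CREATE SERVER mysql_svr
FOREIGN DATA WRAPPER mysql_fdw
OPTIONS (host 'mysql-shard-host-1', port '3306');
CREATE USER MAPPING FOR CURRENT_USER
SERVER mysql_svr
OPTIONS (username 'citus', password '...');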
There are caveats here to be careful of. For one, we marked the distribution as 'append', because we don't know the underlying distribution of the foreign table. If you use hash, you may get wrong partition pruning via Citus. There may be other caveats too, as this isn't a use-case we actively support or have tested. From a historical perspective, we primarily used this as a proof-of-concept to try reading flat-files spread across multiple nodes.
Edit
Adding responses to the other questions by Eugen.
Also, please note, such Q/A is best suited for the mailing list here:
https://groups.google.com/forum/#!forum/citus-users
By 'partial support', I meant we will push down the foreign table creation, but will not automatically map different foreign table settings to different shards.
SQL and PostgreSQL have a wide range of features, and we don't currently support all of them. We are compiling a list of available features, but in the meantime let us know if there are any features you are interested in.
We do automatically create shards with storage-type 'f', when you issue master_create_distributed_table.
I have read these two questions:
Is there a MySQL option/feature to track history of changes to records?
How to version control a record in a database
I understood how the version system should work, but I have a particular question for my situation.
For example, I have this table:
Let us say that this table has around 4000 records in it. I will display 100 records at a time to the user, based on a preset configuration, for example all records where A's value is foo.
The user now has the possibility to change any of the 100 records; for example, let us say that he changes 4 records and leaves the other 96 untouched.
My question is:
If the user changes only 4 records from the preset configuration, what is the best way to keep track of the changes and, besides that, of the configurations (the way the 100 records looked on a particular date, before the change)?
I can add start_date and end_date fields to keep track of the configurations in another table, but it doesn't feel right to fill a table with 100 records of which only 4 changed, just to be able to know how the configuration looked at a certain date and which records changed relative to the version from that date. In the end I would have hundreds of duplicated rows that differ only in the date field. What is the ideal solution for this situation?
Later Edit:
The main idea is to obtain something like this:
I want to be able to see each configuration version (version 1, 2, 3, 4) as of its specific creation date. Each configuration contains the old rows (from the previous configuration version) plus the rows modified by the user in the new version.
Based on our chat discussion, and this link as a talking point, consider the following schema and expand upon it.
-- drop table bom
create table bom
( -- Bill of Materials
bomId int auto_increment primary key
-- other high level fields
);
-- drop table bomVersion
create table bomVersion
( -- Bill of Materials / version
id int auto_increment primary key,
bomId int not null,
-- other high level fields
version int not null, -- you need to figure out how to increment this, and it is not an auto inc here
description varchar(1000), -- ie: let's add a Floppy Drive
creationDate datetime not null,
unique key(bomId,version), -- no dupes
-- yes, (bomId,version) could be the PK but I chose not to
CONSTRAINT fk_version_bom FOREIGN KEY (bomId) REFERENCES bom(bomId)
);
-- drop table bvDetails;
create table bvDetails
( -- Bill of Materials / Version / Details
id int auto_increment primary key,
bvId int not null,
lineNumber int not null, -- if ordering is important
partId int not null,
qty int not null, -- I am no BOM expert, is this stuff in there?
price decimal(12,2) not null, -- I am no BOM expert, is this stuff in there?
-- FK constraints back to Part table and bvId, below shows one of them
CONSTRAINT fk_det_bomversion FOREIGN KEY (bvId) REFERENCES bomVersion(id)
);
One of the biggest challenges is how to capture changes to Part descriptions. So, in that link at the very top, suppose the Case SX1040 has a change in description from Easy Access to Easy Access / Well vented.
In that case a re-print of a BOM (one that was supposed to be nailed down by ISO standards) is going to change. That is not good. So you need an audit, a history, of changes to rows that are textual, and you need to save those ids (such as for the Part Number). To be clear: though you can have a Parts table, also have a PartsHistory table (and the ids from the latter go in the BOM).
The numerics like price and qty are fine to save as in the above schema. It is the textual history changes that are problematic, and you need to solve that as described in the previous paragraph.
Note: I once wrote a system where, in the case of changes to the text columns, we would keep all the revisions in the same table and have only 1 row (say, for that part) marked as active='Y' for any given item. This way a join to a separate history table was not necessary. Either way, you have the flexibility from your GUI to select whichever version you want. Remember, from an audit standpoint you need an updatedBy (personId) and an updatedDt in these tables.
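To illustrate reading a version back, here is a hypothetical query against the schema above (the bomId and version values are made up):
-- reconstruct one version of a BOM, lines in order
select v.version, v.description, v.creationDate,
d.lineNumber, d.partId, d.qty, d.price
from bomVersion v
join bvDetails d on d.bvId = v.id
where v.bomId = 1 and v.version = 3
order by d.lineNumber;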
Edit
Your question just changed. See the new column in table bomVersion.
I have a table that does not require a primary key. It consists of 4 columns: email, date, time, and message. Each time a user logs in, logs out, or does any particular action that is important, I log the email, date, time and action (message). Currently the table is set up with email as the primary key, but I am unable to insert more than one record with the same email. I suppose I could use time as the PK, but there is the possibility that two actions fall at the same time. How can I just use the table without a PK? I have tried turning it off for the email column but it does not allow me to.
Yes: as you have defined the email field as your primary key, it can hold unique data only, and no duplication is allowed.
So you have two options:
1: Remove the email field as the primary key
2: Add a new integer field as the primary key with auto increment (I would prefer this one)
You could use a natural primary key that is the combination of Email + Date + Time + Action. That combination would be unique: the same user cannot perform 2 different actions at the same instant. That will help you keep the integrity of your information.
Hope this helps you.
To make a decision on a table's primary key, one may start by considering these points (applicable to InnoDB):
How the data is going to be accessed after it is written (if you don't query it, why store it?). If you care about read performance, you should query your data by the primary key, since for InnoDB the primary key is the only possible clustered index.
The data is stored ordered by the primary key, so if you care about write performance, you should write data ideally ordered by your primary key, which happens automatically if you use an auto_increment. Also, a table for which you don't explicitly specify a primary key gets a hidden auto-increment field which you won't be able to access, i.e. you get less for the same cost.
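Putting the two answers together, a minimal sketch of the log table with a surrogate auto-increment key (column names and lengths are assumptions based on the question):
CREATE TABLE user_log
(
id int unsigned NOT NULL auto_increment, -- surrogate key: cheap, insert-ordered
email varchar(255) NOT NULL,
log_date date NOT NULL,
log_time time NOT NULL,
message varchar(255) NOT NULL, -- the action performed
PRIMARY KEY (id),
KEY idx_email (email) -- secondary index for per-user lookups
);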
I have a portal model. Every user gets their own portal. Now, a user can customize various strings across multiple pages on that portal to show their customers. Overall there are around 50 strings, and an average user changes around 7 of them from the defaults.
One way to go about it is to use one table per page, pushing the strings for a page into that table as columns, and mapping those tables to the portal. However, this would create 5 additional tables, models, and corresponding management of forms. Also, these would be very sparse tables.
Is there a better way to do this?
Why have a separate table for each page? It's very likely that terms will be shared across pages, and when you add a new page to the app you would then have to add a new table?!
My preference would just be to have one table, with 'term' being the value used in your HTML and 'val' being the user's preference:
create table user_strings (
user_id int not null,
term varchar(50) not null, -- lengths here are illustrative
val varchar(255) not null,
primary key (user_id, term),
foreign key (user_id) references user(id) on delete cascade,
index (term)
);
If you are not using composite primary keys, then add the usual surrogate auto-increment id instead.
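Reading a string back is then a single indexed lookup; a hypothetical example (the term name welcome_message is made up):
-- fetch one user's override; the application falls back to its
-- built-in default when no row comes back
select val
from user_strings
where user_id = 42
and term = 'welcome_message';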