MySQL InnoDB engine indexing (B-Tree)

I have the following user registration table:
ID INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
User_name VARCHAR(35) NOT NULL,
Full_name VARCHAR(55) NOT NULL,
User_vars VARCHAR(255) NOT NULL,
BirthDay DATE NOT NULL,
password CHAR(70) NOT NULL,
Security_hint VARCHAR(27) NOT NULL,
Email VARCHAR(225) NOT NULL,
userType ENUM ('a','b','c','d') NOT NULL DEFAULT 'a',
Signup_Date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
activation TINYINT(1) UNSIGNED NOT NULL,
Ip BINARY(16) NOT NULL,
PRIMARY KEY (ID),
UNIQUE KEY User_name (User_name,Email)
I am really confused as to whether I SHOULD index the fields BirthDay, Signup_Date, Full_name, User_vars, and userType, as this information will have to be displayed on every page the user visits.
User_vars is a field that the user might update on a daily (or even weekly) basis; I think that indexing this field might not be a good idea, as indexes slow down write operations; correct me if I'm wrong. I have many other fields like this one in other tables (fields that users update on a daily basis and whose updates have to be displayed on every page they visit), and I don't know if they have to be indexed as well.
Note: I know that I can query them once and then cache them; however, I am afraid that the first query to the table will be slow to return the result set.
Many thanks

For performance, note that fixed-size rows can speed up table scans, and only CHAR-type columns are fixed-size; VARCHAR and the TEXT types (including TINYTEXT) are not. Based on the table, the likely queries are lookups on User_name and Email. Change those to CHAR and index them.
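A sketch of what that might look like (the question never names the table, so users is a placeholder here; note that the existing UNIQUE KEY (User_name, Email) already serves User_name-only lookups via the leftmost-prefix rule, so only Email needs an index of its own):
ALTER TABLE users
MODIFY User_name CHAR(35) NOT NULL,
MODIFY Email CHAR(225) NOT NULL,
ADD INDEX idx_email (Email);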

How to generate a unique UID from user's email in RESTful API?

I am creating a web service for a website, in which I have to generate a UID from the user's email/name. This process should happen at the user signup step only, like the unique IDs we have on Twitter. I have a few questions in mind:
Should this be the responsibility of the client or of the web service?
I think it should be the web service's.
If the web service is responsible, then what should the logic be for generating a UID from myname@example.com?
One solution could be to extract myname from the email and append the user_id to the end. But for an auto-generated user_id (MySQL AUTO_INCREMENT) this cannot work, since the id is not known until after the row is inserted. Also, the whole idea of using a UID is to hide the integer user_id from the URL in the browser, so this solution would again expose the user_id.
Another solution could be to append some random numbers to the end of myname and, if a ConstraintViolation occurs, retry with another number. But this could take a lot of time just for the user signup operation.
What is the ideal and efficient way to handle this requirement?
This is my MySQL table schema:
CREATE TABLE IF NOT EXISTS `user` (
`user_id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`unique_id` VARCHAR(45) NOT NULL,
`email` VARCHAR(45) NOT NULL,
`password` VARCHAR(500) NULL,
`first_name` VARCHAR(45) NOT NULL,
`last_name` VARCHAR(45) NOT NULL,
`created_on` DATETIME NOT NULL,
`gender` VARCHAR(2) NULL COMMENT 'M - male\nF - female\nO - other',
UNIQUE INDEX `unique_id_UNIQUE` (`unique_id` ASC),
UNIQUE INDEX `email_UNIQUE` (`email` ASC),
PRIMARY KEY (`user_id`))
ENGINE = InnoDB;
You can use PHP's uniqid() function to create the user's unique ID, and you can save it in your user table's unique_id column. It will be unique.
You can use below code to generate some unique ids.
<?php
// uniqid() returns the current time in microseconds as a hex string,
// prefixed with the given string, e.g. "username5f3a1c2b9d4e7".
echo uniqid("username");
?>
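Note that uniqid() is time-based rather than random, so it does not guarantee uniqueness under concurrent signups. Alternatively, MySQL itself can generate the value with its built-in UUID() function; a minimal sketch against the schema above (all values are placeholders):
INSERT INTO `user` (unique_id, email, password, first_name, last_name, created_on)
VALUES (UUID(), 'myname@example.com', 'hashed-password', 'My', 'Name', NOW());
UUID() returns a 36-character string, which fits the VARCHAR(45) unique_id column.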

Longest MySQL queries for worst case testing

I have a big MySQL database (about one million entries are planned) and I want to test its performance by creating the worst query (longest calculation time) I am able to.
For now it is a database with two tables:
CREATE TABLE user (ID BIGINT NOT NULL AUTO_INCREMENT,
createdAt DATETIME NULL DEFAULT NULL,
lastAction DATETIME NULL DEFAULT NULL,
ip TEXT NULL DEFAULT NULL,
browser TEXT NULL DEFAULT NULL,
PRIMARY KEY (ID));

CREATE TABLE evt (ID BIGINT AUTO_INCREMENT,
UID BIGINT NULL DEFAULT NULL,
timeStamp DATETIME NULL DEFAULT NULL,
name TEXT NULL DEFAULT NULL,
PRIMARY KEY (ID),
FOREIGN KEY (UID)
REFERENCES user(ID));
It's populated and running locally, so no network connection is involved.
Are there any rules of thumb on how to create horrible queries?
My worst query for now was:
SELECT user.browser, evt.name, count(*) as AmountOfActions
FROM evt
JOIN user ON evt.UID = user.ID
GROUP BY user.browser, evt.name
ORDER BY AmountOfActions DESC
The number one cost in a query is disk hits. So, make a table big enough so that it cannot be cached in RAM. And/or do a cross-join (etc) such that an intermediate table is too big to be cached in RAM.
A common problem on this forum is lots of joins followed by a group by. Or lots of joins, plus an order by on the big intermediate result.
Here's a double-whammy -- join two tables (each too big to be cached) on a UUID.
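For instance, a sketch combining those ingredients with the two tables above (deliberately pathological; with a million rows in each table the cross join's intermediate result is on the order of 10^12 rows, so expect it to churn disk rather than finish quickly):
SELECT u.browser, e.name, COUNT(*) AS AmountOfActions
FROM user u
CROSS JOIN evt e -- no join condition: every user row pairs with every evt row
GROUP BY u.browser, e.name
ORDER BY AmountOfActions DESC;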

Add an effective index on a huge table

I have a MySQL database table with more than 34M rows (and growing).
CREATE TABLE `sensordata` (
`userID` varchar(45) DEFAULT NULL,
`instrumentID` varchar(10) DEFAULT NULL,
`utcDateTime` datetime DEFAULT NULL,
`dateTime` datetime DEFAULT NULL,
`data` varchar(200) DEFAULT NULL,
`dataState` varchar(45) NOT NULL DEFAULT 'Original',
`gps` varchar(45) DEFAULT NULL,
`location` varchar(45) DEFAULT NULL,
`speed` varchar(20) NOT NULL DEFAULT '0',
`unitID` varchar(5) NOT NULL DEFAULT '1',
`parameterID` varchar(5) NOT NULL DEFAULT '1',
`originalData` varchar(200) DEFAULT NULL,
`comments` varchar(45) DEFAULT NULL,
`channelHashcode` varchar(12) DEFAULT NULL,
`settingHashcode` varchar(12) DEFAULT NULL,
`status` varchar(7) DEFAULT 'Offline',
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=98772 DEFAULT CHARSET=utf8
I access this table from multiple threads (at least 400 threads) every minute to insert data into the table.
As the table grew, it was getting slower to read and write the data. One SELECT query used to take about 25 seconds, so I added a unique index:
UNIQUE INDEX idx_userInsDate (userID, instrumentID, utcDateTime)
This reduced the read time from 25 seconds to some milliseconds but it has increased the insert time as it has to update the index for each record.
Also, if I run SELECT queries from multiple threads at the same time, the queries take too long to return the data.
This is an example query
SELECT dateTime FROM sensordata WHERE userID = 'someUserID' AND instrumentID = 'someInstrumentID' AND dateTime BETWEEN 'startDate' AND 'endDate' ORDER BY dateTime ASC;
Can someone please help me improve the table schema or add an effective index to improve the performance?
Thank you in advance.
A PRIMARY KEY is a UNIQUE key. Toss the redundant UNIQUE(id) !
Is id referenced by any other tables? If not, then get rid of it altogether. Instead have just
PRIMARY KEY ( userID, instrumentID, utcDateTime)
That is, if that triple is guaranteed to be unique. You mentioned DST -- use the datatype TIMESTAMP instead of DATETIME. Doing that, you can convert to DATETIME if needed, thereby eliminating one of the columns.
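A sketch of that change (an assumption on my part that the triple really is unique and that nothing references id; MySQL will promote the three columns to NOT NULL when they become the primary key):
ALTER TABLE sensordata
DROP INDEX id_UNIQUE,
DROP PRIMARY KEY,
DROP COLUMN id,
ADD PRIMARY KEY (userID, instrumentID, utcDateTime);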
That one index (the PK) takes virtually no space since it is "clustered" with the data in InnoDB.
Your table is awfully fat with all those VARCHARs. For example, status can be reduced to a 1-byte ENUM. Others can be normalized. Things like speed can be either a 4-byte FLOAT or some smaller DECIMAL, depending on how much range and precision you need.
With 34M wide rows, you have probably recently exceeded the cacheability of the RAM you have. By making the row narrower, you will postpone that overflow.
Why attack the indexes? Every UNIQUE (including PRIMARY) index is checked before allowing the row to be inserted. By getting it down to 1 index, that minimizes the cost there. (InnoDB really needs a PRIMARY KEY.)
INT is 4 bytes. Do you have a billion instruments? Maybe instrumentID could be SMALLINT UNSIGNED, which is 2 bytes, with a max of 64K? Think about all the other IDs.
You have 400 INSERTs/minute, correct? That is not bad. If you get to 400/second, we need to have a different talk.
("Fill factor" is not tunable in MySQL because it does not make much difference.)
How much RAM do you have? What is the setting for innodb_buffer_pool_size? Optimal is somewhere around 70% of available RAM.
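For reference, a sketch of how to check it (the 6G figure is only an illustration of that ~70% rule for a dedicated 8 GB server):
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
-- then set it in my.cnf under [mysqld]:
-- innodb_buffer_pool_size = 6G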
Let's see your main queries; there may be other issues to address.
It's not the indexes at fault here; it's your data types. As the size of the data on disk grows, the speed of all operations decreases. Indexes can certainly help speed up selects, provided your data is properly structured, but it appears that it isn't:
CREATE TABLE `sensordata` (
`userID` int, /* shouldn't this have a foreign key constraint? */
`instrumentID` int,
`utcDateTime` datetime DEFAULT NULL,
`dateTime` datetime DEFAULT NULL,
/* what exactly are you putting here? Are you sure it's not causing any redundancy? */
`data` varchar(200) DEFAULT NULL,
/* your states will be a finite number of elements. They can be represented by constants in your code or a set of values in a related table */
`dataState` int,
/* what's this? Sounds like what you are saving in location */
`gps` varchar(45) DEFAULT NULL,
`location` point,
`speed` float,
`unitID` int DEFAULT '1',
/* as above */
`parameterID` int NOT NULL DEFAULT '1',
/* are you sure this is different from data? */
`originalData` varchar(200) DEFAULT NULL,
`comments` varchar(45) DEFAULT NULL,
`channelHashcode` varchar(12) DEFAULT NULL,
`settingHashcode` varchar(12) DEFAULT NULL,
/* as above and isn't this the same as */
`status` int,
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=98772 DEFAULT CHARSET=utf8
First of all: avoid VARCHARs for indexes, and especially for IDs. Character keys make each index entry much wider than an integer key, so the index grows larger and every lookup touches more pages.
Second: your SELECT filters on dateTime, but your index is on utcDateTime. Only the userID and instrumentID parts of the index can be used; the utcDateTime part is ignored.
Advice: change your data types for the IDs, and change your index to match the query (dateTime, not utcDateTime).
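A sketch of that index change (the new index is deliberately non-unique, since local times can repeat across a DST change; the name is made up):
ALTER TABLE sensordata
DROP INDEX idx_userInsDate,
ADD INDEX idx_user_inst_dt (userID, instrumentID, dateTime);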
Using an index decreases your performance on inserts. Unfortunately, there is no such thing as a fill factor for indexes in MySQL right now, so the best thing you can do is keep the indexes as small as possible.
Another approach on heavily loaded databases with random access would be: write to an unindexed table, read from an indexed one. At given intervals, build the indexes and swap the tables (this may require a third table for the index creation, leaving the other two untouched in the meantime).
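A minimal sketch of that swap (all table names here are hypothetical):
-- writes go to sensordata_write, which has no secondary indexes;
-- periodically build an indexed copy and swap it in atomically
CREATE TABLE sensordata_new LIKE sensordata_write;
ALTER TABLE sensordata_new ADD INDEX idx_user_inst_dt (userID, instrumentID, dateTime);
INSERT INTO sensordata_new SELECT * FROM sensordata_write;
RENAME TABLE sensordata_read TO sensordata_old, sensordata_new TO sensordata_read;
DROP TABLE sensordata_old;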

Simple SELECT query takes a long time on a very large table in a MySQL database in a C# application

I am using a MySQL database in my ASP.NET C# web application. The MySQL Server version is 5.7 and there is 8 GB of RAM in the PC. When I execute a SELECT query against the table, it takes a long time: a simple SELECT query takes around 42 seconds across 1 crore (10 million) records in the table. I have also added indexes to the table. How can I fix this?
The following is my table structure.
CREATE TABLE `smstable_read` (
`MessageID` int(11) NOT NULL AUTO_INCREMENT,
`ApplicationID` int(11) DEFAULT NULL,
`Api_userid` int(11) DEFAULT NULL,
`ReturnMessageID` varchar(255) DEFAULT NULL,
`Sequence_Id` int(11) DEFAULT NULL,
`messagetext` longtext,
`adtextid` int(11) DEFAULT NULL,
`mobileno` varchar(255) DEFAULT NULL,
`deliverystatus` int(11) DEFAULT NULL,
`SMSlength` int(11) DEFAULT NULL,
`DOC` varchar(255) DEFAULT NULL,
`DOM` varchar(255) DEFAULT NULL,
`BatchID` int(11) DEFAULT NULL,
`StudentID` int(11) DEFAULT NULL,
`SMSSentTime` varchar(255) DEFAULT NULL,
`SMSDeliveredTime` varchar(255) DEFAULT NULL,
`SMSDeliveredTimeTicks` decimal(28,0) DEFAULT '0',
`SMSSentTimeTicks` decimal(28,0) DEFAULT '0',
`Sent_SMS_Day` int(11) DEFAULT NULL,
`Sent_SMS_Month` int(11) DEFAULT NULL,
`Sent_SMS_Year` int(11) DEFAULT NULL,
`smssent` int(11) DEFAULT '1',
`Batch_Name` varchar(255) DEFAULT NULL,
`User_ID` varchar(255) DEFAULT NULL,
`Year_ID` int(11) DEFAULT NULL,
`Date_Time` varchar(255) DEFAULT NULL,
`IsGroup` double DEFAULT NULL,
`Date_Time_Ticks` decimal(28,0) DEFAULT NULL,
`IsNotificationSent` int(11) DEFAULT NULL,
`Module_Id` double DEFAULT NULL,
`Doc_Batch` decimal(28,0) DEFAULT NULL,
`SMS_Category_ID` int(11) DEFAULT NULL,
`SID` int(11) DEFAULT NULL,
PRIMARY KEY (`MessageID`),
KEY `index2` (`ReturnMessageID`),
KEY `index3` (`mobileno`),
KEY `BatchID` (`BatchID`),
KEY `smssent` (`smssent`),
KEY `deliverystatus` (`deliverystatus`),
KEY `day` (`Sent_SMS_Day`),
KEY `month` (`Sent_SMS_Month`),
KEY `year` (`Sent_SMS_Year`),
KEY `index4` (`ApplicationID`,`SMSSentTimeTicks`),
KEY `smslength` (`SMSlength`),
KEY `studid` (`StudentID`),
KEY `batchid_studid` (`BatchID`,`StudentID`),
KEY `User_ID` (`User_ID`),
KEY `Year_Id` (`Year_ID`),
KEY `IsNotificationSent` (`IsNotificationSent`),
KEY `isgroup` (`IsGroup`),
KEY `SID` (`SID`),
KEY `SMS_Category_ID` (`SMS_Category_ID`),
KEY `SMSSentTimeTicks` (`SMSSentTimeTicks`)
) ENGINE=MyISAM AUTO_INCREMENT=16513292 DEFAULT CHARSET=utf8;
The following is my select query:
SELECT messagetext, SMSSentTime, StudentID, batchid,
User_ID,MessageID,Sent_SMS_Day, Sent_SMS_Month,
Sent_SMS_Year,Module_Id,Year_ID,Doc_Batch
FROM smstable_read
WHERE StudentID=977 AND SID = 8582 AND MessageID>16013282
You need to learn about compound indexes and covering indexes. Read about those things.
Your query is slow because it's doing a half-scan of the table. It uses the primary key to find the first row with a qualifying MessageID, then looks at every row of the table to find matching rows.
Your filter criteria are StudentID = constant, SID = constant AND MessageID > constant. That means you need those three columns, in that order, in an index. The first two filter criteria will random-access your index to the correct place. The third criterion will scan the index starting right after the constant value in your query. It's called an Index Range Scan operation, and it's quite efficient.
ALTER TABLE smstable_read
ADD INDEX StudentSidMessage (StudentId, SID, MessageId);
This compound index should make your query efficient. Notice that in MyISAM, the primary key column of a table should appear in compound indexes. That's cool in this case because it's also part of your query criteria.
If this query is used very frequently, you could make a covering index: you could add the other columns of the query (the ones mentioned in your SELECT clause) to the index.
But, unfortunately you have defined your messageText column with a longtext data type. That allows for each message to contain up to four gigabytes. (Why? Is this really SMS data? There's a limit of 160 bytes per message in SMS. Four gigabytes >> 160 bytes.)
Now the point of a covering index is to allow the query to be satisfied entirely from the index, without referring back to the table. But when you include a longtext or any other LOB column in an index, it only contains a subset of the data. So the point of the covering index is lost.
If I were you I would change my table so messageText was a VARCHAR(255) data type, and then create this covering index:
ALTER TABLE smstable_read
ADD INDEX StudentSidMessage (StudentId, SID, MessageId,
SMSSentTime, batchid,
User_ID, Sent_SMS_Day, Sent_SMS_Month,
Sent_SMS_Year,Module_Id,Year_ID,Doc_Batch,
messageText);
(Notice that you should put variable-length items last in the index if you can.)
If you can't change your application to handle VARCHAR(255) then go with the first index I mentioned.
Pro tip: putting lots of single-column indexes on MySQL tables rarely helps SELECT performance and always harms INSERT and UPDATE performance. You need an index on your primary key, and you need indexes to support the queries you run. Extra indexes are harmful.
It looks like your database is not properly indexed and not even properly normalized. Normalizing your database will go a long way toward speeding up all your queries, particularly in view of the fact that MySQL generally uses only one index per table in a query. Even though you have lots of indexes, most of them cannot be used.
Your current query filters on StudentID, SID, and MessageID. The last is an inequality comparison, so an index will not be very effective for it, but the other two columns are equality comparisons. I suggest an index like this:
KEY `studid` (`StudentID`,`SID`)
Follow that up by dropping your existing index on SID. If you find that you don't want to drop it because it's used in another query, that's further evidence that your table is in desperate need of normalization.
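A sketch of those changes (note the table already has a single-column index named studid, so it is dropped and recreated as the compound index in one statement):
ALTER TABLE smstable_read
DROP INDEX studid, -- old single-column StudentID index
ADD INDEX studid (StudentID, SID),
DROP INDEX SID; -- single-column SID index, now redundant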
Too many indexes slow down inserts and add a little overhead to each SELECT, because the query planner needs more effort to figure out which index to use.

What is better for a members database?

Hi, and thanks for reading my post. I am having a little trouble with my members database in MySQL.
I have it set up already, but recently another person told me my members table is slow and useless if I intend to have lots of members!
I have looked it over a lot of times and did some Google searches, but I don't see anything wrong with it; maybe that's because I am new at this? Can one of you SQL experts look it over and tell me what's wrong with it please :)
--
-- Table structure for table `members`
--
CREATE TABLE IF NOT EXISTS `members` (
`userid` int(9) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(20) NOT NULL DEFAULT '',
`password` longtext,
`email` varchar(80) NOT NULL DEFAULT '',
`gender` int(1) NOT NULL DEFAULT '0',
`ipaddress` varchar(80) NOT NULL DEFAULT '',
`joinedon` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`acctype` int(1) NOT NULL DEFAULT '0',
`acclevel` int(1) NOT NULL DEFAULT '0',
`birthdate` date DEFAULT NULL,
`warnings` int(1) NOT NULL DEFAULT '0',
`banned` int(1) NOT NULL DEFAULT '0',
`enabled` int(1) NOT NULL DEFAULT '0',
`online` int(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`userid`),
UNIQUE KEY `username` (`username`),
UNIQUE KEY `emailadd` (`email`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=19 ;
It's going to be a site with FAQs/tips for games. I do expect to get lots of members at some point later on, but I thought I would ask to make sure it's all OK. Thanks again, peace.
Did the other person explain why they think it is slow and useless?
Here are a few things that I think could be improved:
email should be longer - off the top of my head, 320 should be long enough for most email addresses, but you might want to look that up.
If the int(1) fields are simple on/off fields, then they could be tinyint(1) or bool instead.
As @cularis points out, the ipaddress field might not be the appropriate type. INT UNSIGNED is better than VARCHAR for IPv4. You can use INET_ATON() and INET_NTOA() for conversion (there is a small sketch after this list). See:
Best Field Type for IP address?
How to store IPv6-compatible address in a relational database
As @Delan Azabani points out, your password field is too long for the value you are storing. MD5 produces a 32-character string, so varchar(32) would be sufficient. Better, you could switch to the more secure SHA-2 and use the MySQL SHA2() function (also sketched after this list).
Look into using the InnoDB storage engine instead of MyISAM. It offers foreign key constraints, row-level locking and transactions, amongst other things. See Should you move from MyISAM to InnoDB?
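Putting the ipaddress and password points into SQL, a minimal sketch (it assumes any existing string data is migrated first, and 'secret' stands in for a real password):
-- ipaddress: store IPv4 as a 4-byte integer
ALTER TABLE members MODIFY ipaddress INT UNSIGNED NOT NULL DEFAULT 0;
SELECT INET_ATON('192.0.2.1'); -- 3221225985
SELECT INET_NTOA(3221225985); -- '192.0.2.1'
-- password: store a fixed-length SHA-256 hex digest instead of the raw value
ALTER TABLE members MODIFY password CHAR(64) NULL;
INSERT INTO members (username, email, password)
VALUES ('alice', 'alice@example.com', SHA2('secret', 256));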
I don't think it's necessarily slow, but I did notice that, among all the other text fields where you used varchar, you used longtext for the password field. This suggests you are going to store the password itself in the database -- don't do this!
Always take a fixed-length cryptographic hash (using, for example, SHA-1 or SHA-2) of the user's password, and put that into the database. That way, if your database server is compromised, the users' passwords are not exposed.
Apart from what @Delan said, I noted that:
The joinedon column is defined with ON UPDATE CURRENT_TIMESTAMP. If you want to record only the date joined, the field should not be updated whenever the record is updated (see the sketch below).
The ipaddress column is VARCHAR(80). If you store IPv4-type addresses, this is far longer than needed.
An empty string ('') as DEFAULT for NOT NULL columns is not good if the intention is for the field to hold a real value (other than '').
An empty string ('') as DEFAULT for UNIQUE fields contradicts the constraint enforced, if your intention is to have a unique value (other than '').
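Putting the first point into SQL, a sketch of the corrected definition (it assumes nothing in the application relies on the auto-update behaviour):
ALTER TABLE members
MODIFY joinedon TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP;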