Best table/index structure for data with time-varying association - mysql

My company uses a number of data acquisition devices (DAQs) to monitor the output of solar panels on a test site. Each DAQ has a unique serial number, and each solar panel has a unique serial number. Occasionally we swap out the panels for new panels, and occasionally the DAQs fail and need to be replaced with new ones with different serial numbers.
My question is, what is the best table structure for queries to see all of the data for a particular solar panel's serial number, given that it can be on different DAQs at different times?
I'm currently using the following table structure:
CREATE TABLE relationships (
id int(11) NOT NULL AUTO_INCREMENT,
daqID char(4) NOT NULL,
dtdateFirst datetime NOT NULL,
dtdateLast datetime NOT NULL,
PanelType varchar(20) NOT NULL,
sgcucode varchar(45) NOT NULL,
serial varchar(15) NOT NULL,
ptype varchar(15) NOT NULL,
PRIMARY KEY (id),
KEY daqID (daqID),
KEY gcuidx (sgcucode),
KEY serialidx (serial),
KEY fullidx (sgcucode,daqID,serial,dtdateFirst,dtdateLast)
) ENGINE=InnoDB AUTO_INCREMENT=135 DEFAULT CHARSET=utf8
CREATE TABLE data (
id int(11) NOT NULL AUTO_INCREMENT,
dtdate datetime NOT NULL,
daqID char(4) NOT NULL,
Varray text NOT NULL,
Iarray text NOT NULL,
Iavg float NOT NULL,
Pmp float NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY id_UNIQUE (id),
UNIQUE KEY dupliData (dtdate,daqID,Iavg),
KEY idxDaqDate (daqID,dtdate),
KEY idxDate (dtdate),
KEY idxPmp (Pmp)
) ENGINE=InnoDB AUTO_INCREMENT=14027571 DEFAULT CHARSET=utf8
The "relationships" table matches the daqID to a panel serial number for a given time span (from "dtdateFirst" to "dtdateLast"). So the important columns for this question are: daqID, dtdateFirst, dtdateLast, serial. Also of some importance is the "sgcucode". This column indicates which test site the modules are on. It is used by a dashboard so that we can cycle through various sites which log data to the same table.
Data is constantly being logged to the "data" table from the DAQ devices. I then use the relationships table to correlate the serial number of the solar panel with the correct daqID for the time in question.
By far the most common query is to collect all of the data in the "data" table for a given day and display it in a dashboard.
This is the query I use to do this:
SELECT relationships.serial as title, dtdate as time , Pmp as Value, relationships.ptype as type
FROM data INNER JOIN (relationships) ON (relationships.daqID=data.daqID)
AND dtdate BETWEEN DATE_FORMAT(example_date, '%Y-%m-%d 05:00:00') AND DATE_FORMAT(example_date, '%Y-%m-%d 21:00:00')
WHERE relationships.dtdateFirst <= dtdate
AND relationships.dtdateLast >= dtdate
AND sgcucode="example_code";
Given these conditions, is this the best solution? I probably have redundant indexes. I am still learning about database design, so any suggestions for improvement would be greatly appreciated!

Related

Add an effective index on a huge table

I have a MySQL database table with more than 34M rows (and growing).
CREATE TABLE `sensordata` (
`userID` varchar(45) DEFAULT NULL,
`instrumentID` varchar(10) DEFAULT NULL,
`utcDateTime` datetime DEFAULT NULL,
`dateTime` datetime DEFAULT NULL,
`data` varchar(200) DEFAULT NULL,
`dataState` varchar(45) NOT NULL DEFAULT 'Original',
`gps` varchar(45) DEFAULT NULL,
`location` varchar(45) DEFAULT NULL,
`speed` varchar(20) NOT NULL DEFAULT '0',
`unitID` varchar(5) NOT NULL DEFAULT '1',
`parameterID` varchar(5) NOT NULL DEFAULT '1',
`originalData` varchar(200) DEFAULT NULL,
`comments` varchar(45) DEFAULT NULL,
`channelHashcode` varchar(12) DEFAULT NULL,
`settingHashcode` varchar(12) DEFAULT NULL,
`status` varchar(7) DEFAULT 'Offline',
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=98772 DEFAULT CHARSET=utf8
I access this table from multiple threads (at least 400 threads) every minute to insert data into the table.
As the table was growing, it was getting slower to read and write the data. One SELECT query used to take about 25 seconds, then I added a unique index
UNIQUE INDEX idx_userInsDate ( userID,instrumentID,utcDateTime)
This reduced the read time from 25 seconds to some milliseconds but it has increased the insert time as it has to update the index for each record.
Also, if I run a SELECT query from multiple threads at the same time, the queries take too long to return the data.
This is an example query
Select dateTime from sensordata WHERE userID = 'someUserID' AND instrumentID = 'someInstrumentID' AND dateTime between 'startDate' AND 'endDate' order by dateTime asc;
Can someone please help me improve the table schema or add an effective index to improve the performance?
Thank you in advance
A PRIMARY KEY is a UNIQUE key. Toss the redundant UNIQUE(id) !
Is id referenced by any other tables? If not, then get rid of it altogether. Instead have just
PRIMARY KEY ( userID, instrumentID, utcDateTime)
That is, if that triple is guaranteed to be unique. You mentioned DST -- use the datatype TIMESTAMP instead of DATETIME. Doing that, you can convert to DATETIME if needed, thereby eliminating one of the columns.
That one index (the PK) takes virtually no space since it is "clustered" with the data in InnoDB.
Your table is awfully fat with all those VARCHARs. For example, status can be reduced to a 1-byte ENUM. Others can be normalized. Things like speed can be either a 4-byte FLOAT or some smaller DECIMAL, depending on how much range and precision you need.
With 34M wide rows, you have probably recently exceeded the cacheability of the RAM you have. By making the row narrower, you will postpone that overflow.
Why attack the indexes? Every UNIQUE (including PRIMARY) index is checked before allowing the row to be inserted. By getting it down to 1 index, that minimizes the cost there. (InnoDB really needs a PRIMARY KEY.)
INT is 4 bytes. Do you have a billion instruments? Maybe instrumentID could be SMALLINT UNSIGNED, which is 2 bytes, with a max of 64K? Think about all the other IDs.
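As a rough sketch of where this advice leads, a slimmed-down table might look like the following. The table name sensordata_slim, the integer ID types, and the second ENUM value 'Online' are assumptions for illustration, and only a few of the original columns are shown.

```sql
-- Sketch only: a narrower table with the natural triple as clustered primary key.
-- sensordata_slim, the integer ID types and the 'Online' value are hypothetical.
CREATE TABLE sensordata_slim (
  userID       INT UNSIGNED      NOT NULL,  -- assumes users are normalized to numeric ids
  instrumentID SMALLINT UNSIGNED NOT NULL,  -- 2 bytes, up to 65535 instruments
  utcDateTime  TIMESTAMP         NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `data`       VARCHAR(200)      DEFAULT NULL,
  speed        FLOAT             NOT NULL DEFAULT 0,
  status       ENUM('Offline','Online') NOT NULL DEFAULT 'Offline',  -- 1 byte
  PRIMARY KEY (userID, instrumentID, utcDateTime)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- The example query, rewritten to filter on the column that is in the key.
-- The ids and dates are placeholders.
SELECT utcDateTime
FROM sensordata_slim
WHERE userID = 42
  AND instrumentID = 7
  AND utcDateTime BETWEEN '2016-01-01 00:00:00' AND '2016-01-02 00:00:00'
ORDER BY utcDateTime;
```

This only works if the triple really is unique, as noted above. If it is not, the second INSERT with the same triple will fail with a duplicate-key error.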
You have 400 INSERTs/minute, correct? That is not bad. If you get to 400/second, we need to have a different talk.
("Fill factor" is not tunable in MySQL because it does not make much difference.)
How much RAM do you have? What is the setting for innodb_buffer_pool_size? Optimal is somewhere around 70% of available RAM.
Let's see your main queries; there may be other issues to address.
It's not the indexes at fault here. It's your data types. As the size of the data on disk grows, the speed of all operations decreases. Indexes can certainly help speed up selects, provided your data is properly structured, but it appears that it isn't.
CREATE TABLE `sensordata` (
`userID` int, /* shouldn't this have a foreign key constraint? */
`instrumentID` int,
`utcDateTime` datetime DEFAULT NULL,
`dateTime` datetime DEFAULT NULL,
/* what exactly are you putting here? Are you sure it's not causing any redundancy? */
`data` varchar(200) DEFAULT NULL,
/* your states will be a finite number of elements. They can be represented by constants in your code or a set of values in a related table */
`dataState` int,
/* what's this? Sounds like what you are saving in location */
`gps` varchar(45) DEFAULT NULL,
`location` point,
`speed` float,
`unitID` int DEFAULT '1',
/* as above */
`parameterID` int NOT NULL DEFAULT '1',
/* are you sure this is different from data? */
`originalData` varchar(200) DEFAULT NULL,
`comments` varchar(45) DEFAULT NULL,
`channelHashcode` varchar(12) DEFAULT NULL,
`settingHashcode` varchar(12) DEFAULT NULL,
/* as above and isn't this the same as */
`status` int,
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=98772 DEFAULT CHARSET=utf8
1st of all: Avoid varchars for indexed columns, and especially for IDs. String keys are wider than integer keys and slower to compare, so every index that contains them gets bigger and slower.
2nd: Your select uses dateTime, your index is set to utcDateTime. It will only take userID and instrumentID and ignore the utcDateTime-Part.
Advice: Change your data types for the ids and change your index to match the query (dateTime, not utcDateTime).
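A minimal sketch of an index that matches the example query, assuming the table is otherwise left as in the question. The index name idx_userInsLocalDate and the date literals are placeholders.

```sql
-- Composite index: the two equality columns first, then the column used
-- for the range filter and the ORDER BY.
ALTER TABLE sensordata
  ADD INDEX idx_userInsLocalDate (userID, instrumentID, `dateTime`);

-- Ask the optimizer which index it would use for the example query.
EXPLAIN
SELECT `dateTime`
FROM sensordata
WHERE userID = 'someUserID'
  AND instrumentID = 'someInstrumentID'
  AND `dateTime` BETWEEN '2016-01-01 00:00:00' AND '2016-01-02 00:00:00'
ORDER BY `dateTime` ASC;
```

Since the query only selects dateTime, this index also contains everything the query needs, so the table rows may not have to be read at all.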
Using an index decreases your performance on inserts. Unfortunately, there is no such thing as a fill factor for indexes in MySQL right now, so the best thing you can do is keep the indexes as small as possible.
Another approach on heavily loaded databases with random access would be: write to an unindexed table, read from an indexed one. At a given time, build the indexes and swap the tables (may require a third table for the index creation while leaving the other ones untouched in between).
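One way the swap could be sketched, under several assumptions: sensordata_read (indexed, used by readers) and sensordata_write (same columns, only the id primary key, used by writers) are hypothetical names, the column list is shortened for brevity, and rows that arrive in sensordata_write while the build is running still need to be handled by whatever scheduling you put around this.

```sql
-- 1. Build a fresh indexed copy next to the live read table.
CREATE TABLE sensordata_build LIKE sensordata_read;  -- copies columns and indexes
INSERT INTO sensordata_build SELECT * FROM sensordata_read;

-- 2. Append the rows collected in the write table. id is left out so that
--    AUTO_INCREMENT assigns fresh values in the build table.
INSERT INTO sensordata_build (userID, instrumentID, utcDateTime, `dateTime`, `data`)
SELECT userID, instrumentID, utcDateTime, `dateTime`, `data`
FROM sensordata_write;

-- 3. Swap the tables in one atomic statement, then drop the old copy.
RENAME TABLE sensordata_read  TO sensordata_old,
             sensordata_build TO sensordata_read;
DROP TABLE sensordata_old;
```

RENAME TABLE with several pairs is performed as one atomic operation, so readers see either the old table or the new one, never a missing table.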

Simple select query takes more time in very large table in MySQL database in C# application

I am using a MySQL database in my ASP.NET web application written in C#. The MySQL Server version is 5.7 and the PC has 8 GB of RAM. When I execute a select query against a table with 1 crore records (10 million records), it is slow: a simple select takes around 42 seconds. I have already added indexes to the table. How can I fix this?
The following is my table structure.
CREATE TABLE `smstable_read` (
`MessageID` int(11) NOT NULL AUTO_INCREMENT,
`ApplicationID` int(11) DEFAULT NULL,
`Api_userid` int(11) DEFAULT NULL,
`ReturnMessageID` varchar(255) DEFAULT NULL,
`Sequence_Id` int(11) DEFAULT NULL,
`messagetext` longtext,
`adtextid` int(11) DEFAULT NULL,
`mobileno` varchar(255) DEFAULT NULL,
`deliverystatus` int(11) DEFAULT NULL,
`SMSlength` int(11) DEFAULT NULL,
`DOC` varchar(255) DEFAULT NULL,
`DOM` varchar(255) DEFAULT NULL,
`BatchID` int(11) DEFAULT NULL,
`StudentID` int(11) DEFAULT NULL,
`SMSSentTime` varchar(255) DEFAULT NULL,
`SMSDeliveredTime` varchar(255) DEFAULT NULL,
`SMSDeliveredTimeTicks` decimal(28,0) DEFAULT '0',
`SMSSentTimeTicks` decimal(28,0) DEFAULT '0',
`Sent_SMS_Day` int(11) DEFAULT NULL,
`Sent_SMS_Month` int(11) DEFAULT NULL,
`Sent_SMS_Year` int(11) DEFAULT NULL,
`smssent` int(11) DEFAULT '1',
`Batch_Name` varchar(255) DEFAULT NULL,
`User_ID` varchar(255) DEFAULT NULL,
`Year_ID` int(11) DEFAULT NULL,
`Date_Time` varchar(255) DEFAULT NULL,
`IsGroup` double DEFAULT NULL,
`Date_Time_Ticks` decimal(28,0) DEFAULT NULL,
`IsNotificationSent` int(11) DEFAULT NULL,
`Module_Id` double DEFAULT NULL,
`Doc_Batch` decimal(28,0) DEFAULT NULL,
`SMS_Category_ID` int(11) DEFAULT NULL,
`SID` int(11) DEFAULT NULL,
PRIMARY KEY (`MessageID`),
KEY `index2` (`ReturnMessageID`),
KEY `index3` (`mobileno`),
KEY `BatchID` (`BatchID`),
KEY `smssent` (`smssent`),
KEY `deliverystatus` (`deliverystatus`),
KEY `day` (`Sent_SMS_Day`),
KEY `month` (`Sent_SMS_Month`),
KEY `year` (`Sent_SMS_Year`),
KEY `index4` (`ApplicationID`,`SMSSentTimeTicks`),
KEY `smslength` (`SMSlength`),
KEY `studid` (`StudentID`),
KEY `batchid_studid` (`BatchID`,`StudentID`),
KEY `User_ID` (`User_ID`),
KEY `Year_Id` (`Year_ID`),
KEY `IsNotificationSent` (`IsNotificationSent`),
KEY `isgroup` (`IsGroup`),
KEY `SID` (`SID`),
KEY `SMS_Category_ID` (`SMS_Category_ID`),
KEY `SMSSentTimeTicks` (`SMSSentTimeTicks`)
) ENGINE=MyISAM AUTO_INCREMENT=16513292 DEFAULT CHARSET=utf8;
The following is my select query:
SELECT messagetext, SMSSentTime, StudentID, batchid,
User_ID,MessageID,Sent_SMS_Day, Sent_SMS_Month,
Sent_SMS_Year,Module_Id,Year_ID,Doc_Batch
FROM smstable_read
WHERE StudentID=977 AND SID = 8582 AND MessageID>16013282
You need to learn about compound indexes and covering indexes. Read about those things.
Your query is slow because it's doing a half-scan of the table. It uses the primary key to find the first row with a qualifying MessageID, then looks at every row of the table to find matching rows.
Your filter criteria are StudentID = constant, SID = constant AND MessageID > constant. That means you need those three columns, in that order, in an index. The first two filter criteria will random-access your index to the correct place. The third criterion will scan the index starting right after the constant value in your query. It's called an Index Range Scan operation, and it's quite efficient.
ALTER TABLE smstable_read
ADD INDEX StudentSidMessage (StudentId, SID, MessageId);
This compound index should make your query efficient. Notice that in MyISAM, the primary key column of a table should appear in compound indexes. That's cool in this case because it's also part of your query criteria.
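To check whether the optimizer actually picks the new index, you could run the query through EXPLAIN. This is only a sketch of the check. What you would hope to see (not guaranteed, it depends on your data and statistics) is StudentSidMessage in the key column and range in the type column.

```sql
-- Same query as in the question, prefixed with EXPLAIN.
EXPLAIN
SELECT messagetext, SMSSentTime, StudentID, BatchID,
       User_ID, MessageID, Sent_SMS_Day, Sent_SMS_Month,
       Sent_SMS_Year, Module_Id, Year_ID, Doc_Batch
FROM smstable_read
WHERE StudentID = 977
  AND SID = 8582
  AND MessageID > 16013282;
```

If the rows column still shows a number in the millions, the index is not being used the way you intended.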
If this query is used very frequently, you could make a covering index: you could add the other columns of the query (the ones mentioned in your SELECT clause) to the index.
But, unfortunately you have defined your messageText column with a longtext data type. That allows for each message to contain up to four gigabytes. (Why? Is this really SMS data? There's a limit of 160 characters per message in SMS. Four gigabytes >> 160 characters.)
Now the point of a covering index is to allow the query to be satisfied entirely from the index, without referring back to the table. But when you include a longtext or any other LOB column in an index, it only contains a subset of the data. So the point of the covering index is lost.
If I were you I would change my table so messageText was a VARCHAR(255) data type, and then create this covering index:
ALTER TABLE smstable_read
ADD INDEX StudentSidMessage (StudentId, SID, MessageId,
SMSSentTime, batchid,
User_ID, Sent_SMS_Day, Sent_SMS_Month,
Sent_SMS_Year,Module_Id,Year_ID,Doc_Batch,
messageText);
(Notice that you should put variable-length items last in the index if you can.)
If you can't change your application to handle VARCHAR(255) then go with the first index I mentioned.
Pro tip: putting lots of single-column indexes on MySQL tables rarely helps SELECT performance and always harms INSERT and UPDATE performance. You need an index on your primary key, and you need indexes to support the queries you run. Extra indexes are harmful.
It looks like your database is not properly indexed and not even properly normalized. Normalizing your database will go a long way toward speeding up all your queries, particularly because MySQL generally uses only one index per table in a query. Even though you have lots of indexes, they cannot all be used.
Your current query filters on StudentID, SID, and MessageID. The last is an inequality comparison, so an index will not be as effective for it, but the other two columns are equality comparisons. I suggest an index like this:
KEY `studid` (`StudentID`,`SID`)
Follow that up by dropping your existing index on SID. If you find that you don't want to drop it because it's used in another query, that is further evidence that your table is in desperate need of normalization.
Too many indexes slow down inserts and add a little overhead to each SELECT because the query planner needs more effort to figure out which index to use.

What is the best way to limit a query with sorted results on a Closure table with a depth field in MySQL?

While researching hierarchical data persistence I was led to closure tables, and I pieced together this comment structure based on that research.
Queries for creating new nodes in the closure table were easy enough for me to grasp and fetching data for descendants via a JOIN on the closure table is simple enough.
However, I would like to expand upon that and get results back sorted and limited by both number of parents/children down through a depth of x.
I'm trying to keep things timely/efficient (I expect comments table to get very large) by making use of foreign keys and indexes. I am shooting for an all in one query that can do what I ask in the title, but am not opposed to breaking it up to increase speed/efficiency.
Current table structures:
CREATE TABLE `comments` (
`comment_id` int(11) UNSIGNED PRIMARY KEY,
`reply_to` int(11) UNSIGNED NOT NULL DEFAULT '0',
`user_id` int(11) UNSIGNED NOT NULL,
`comment_time` int(11) NOT NULL,
`comment` mediumtext NOT NULL,
FOREIGN KEY (`user_id`) REFERENCES users(`user_id`)
) Engine=InnoDB
CREATE TABLE `comments_closure`(
`ancestor_id` int(11) UNSIGNED NOT NULL,
`descendant_id` int(11) UNSIGNED NOT NULL,
`length` tinyint(3) UNSIGNED NOT NULL DEFAULT '0',
PRIMARY KEY(`ancestor_id`, `descendant_id`),
KEY `tree_adl`(`ancestor_id`, `descendant_id`, `length`),
KEY `tree_dl`(`descendant_id`, `length`),
FOREIGN KEY (`ancestor_id`) REFERENCES comments(`comment_id`),
FOREIGN KEY (`descendant_id`) REFERENCES comments(`comment_id`)
) Engine=InnoDB
A clearer summary of what I'm trying to do: fetch 20 comments that share an ancestor_id, sorted by time, while also fetching each one's comments 2 levels deeper (keeping these limited to a much smaller number, such as 2), also sorted by time.
I'm not looking to always sort by time, however, and would also like to fetch results sorted by their comment_id. Is it possible to do all this in a single query? I'm not quite sure where to begin.
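For illustration only, the first half of this (20 direct replies to one comment) might be sketched as below. It assumes that length = 1 marks a direct reply, that "sorted by time" means newest first, and that comment 1 is the parent of interest. Fetching a limited number of deeper replies per row in the same statement is not attempted here.

```sql
-- 20 direct replies to one comment, newest first.
SELECT c.comment_id, c.user_id, c.comment_time, c.comment
FROM comments_closure AS cc
JOIN comments AS c ON c.comment_id = cc.descendant_id
WHERE cc.ancestor_id = 1        -- placeholder parent comment
  AND cc.length = 1             -- assumed: direct replies only
ORDER BY c.comment_time DESC    -- swap for c.comment_id to sort by id
LIMIT 20;
```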

Track database table changes

I'm trying to implement a way to track changes to a table named user and another named report_to. Below are their definitions:
CREATE TABLE `user`
(
`agent_eid` int(11) NOT NULL,
`agent_id` int(11) DEFAULT NULL,
`agent_pipkin_id` int(11) DEFAULT NULL,
`first_name` varchar(45) NOT NULL,
`last_name` varchar(45) NOT NULL,
`team_id` int(11) NOT NULL,
`hire_date` date NOT NULL,
`active` bit(1) NOT NULL,
`agent_id_req` bit(1) NOT NULL,
`agent_eid_req` bit(1) NOT NULL,
`agent_pipkin_req` bit(1) NOT NULL,
PRIMARY KEY (`agent_eid`),
UNIQUE KEY `agent_eid_UNIQUE` (`agent_eid`),
UNIQUE KEY `agent_id_UNIQUE` (`agent_id`),
UNIQUE KEY `agent_pipkin_id_UNIQUE` (`agent_pipkin_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
CREATE TABLE `report_to`
(
`agent_eid` int(11) NOT NULL,
`report_to_eid` int(11) NOT NULL,
PRIMARY KEY (`agent_eid`),
UNIQUE KEY `agent_eid_UNIQUE` (`agent_eid`),
KEY `report_to_report_fk_idx` (`report_to_eid`),
CONSTRAINT `report_to_agent_fk` FOREIGN KEY (`agent_eid`) REFERENCES `user` (`agent_eid`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `report_to_report_fk` FOREIGN KEY (`report_to_eid`) REFERENCES `user` (`agent_eid`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8
What can change and needs to be tracked is user.team_id, user.active and report_to.report_to_eid. What I currently have implemented is a table that is populated via an update trigger on user that tracks team changes. That table is defined as:
CREATE TABLE `user_team_changes`
(
`agent_id` int(11) NOT NULL,
`date_changed` date NOT NULL,
`old_team_id` int(11) NOT NULL,
`begin_date` date NOT NULL,
PRIMARY KEY (`agent_id`,`date_changed`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
This works fine for just tracking team changes. I'm able to use joins and a union to populate a history view that tracks that change over time for the individual users. The issue of complexity rises when I try to implement tracking for the other two change types.
I have thought about creating additional tables similar to the one tracking changes for teams, but I worry about performance hits due to the joins that will be required.
Another way I have considered is creating a table similar to a view that I have that details the current user state (it joins all necessary user data together from 4 tables), then insert a record on update with a valid until date field added. My concern with that is the amount of space this could take.
We will be using the user change history quite a bit as we will be running YTD, MTD, PMTD and time interval reports with it on an almost daily basis.
Out of the two options I am considering, which would be the best for my given situation?
The options you've presented:
using triggers to populate transaction-log tables.
including a new table with effective-date columns in the schema and tracking changes by inserting new rows.
Either one of these will work. You can add logging triggers to other tables without causing any trouble.
What distinguishes these two choices? The first one is straightforward, once you get your triggers debugged.
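A minimal sketch of what a second logging trigger could look like, here for user.active. The table user_active_changes, its columns, and the trigger name are hypothetical, modelled loosely on user_team_changes from the question.

```sql
-- Hypothetical log table for changes to user.active.
CREATE TABLE user_active_changes (
  change_id  INT      NOT NULL AUTO_INCREMENT,
  agent_eid  INT      NOT NULL,
  changed_at DATETIME NOT NULL,
  old_active BIT(1)   NOT NULL,
  new_active BIT(1)   NOT NULL,
  PRIMARY KEY (change_id),
  KEY agent_changed (agent_eid, changed_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- DELIMITER is a mysql command-line client directive, not SQL; other
-- clients have their own way of sending a multi-statement trigger body.
DELIMITER //
CREATE TRIGGER user_active_audit
AFTER UPDATE ON `user`
FOR EACH ROW
BEGIN
  -- Log only when the tracked column actually changed.
  IF NEW.active <> OLD.active THEN
    INSERT INTO user_active_changes (agent_eid, changed_at, old_active, new_active)
    VALUES (OLD.agent_eid, NOW(), OLD.active, NEW.active);
  END IF;
END//
DELIMITER ;
```

A trigger of the same shape on report_to would cover report_to_eid. Keeping one narrow log table per tracked attribute keeps each join small and indexable on (agent_eid, changed_at).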
The second choice seems to me that it will create denormalized redundant data. That is never good. I would opt not to do that. It is possible with judicious combinations of views and effective-date columns to create history tables that are viewable as the present state of the system. To learn about this, look at Prof. R. T. Snodgrass's excellent book, Developing Time-Oriented Database Applications in SQL. http://www.cs.arizona.edu/~rts/publications.html If you have time to do an excellent engineering (over-engineering?) job on this project you might consider this approach.
The data volume you've mentioned will not cause intractable performance problems on any modern server hardware platform. If you do get slowdowns on JOIN operations, it's almost certain that the addition of appropriate indexes will completely fix them, as long as you declare all your DATE, DATETIME, and TIMESTAMP fields NOT NULL. (NULL values can mess up indexing and searching).
Hope this helps.

Database Design - Catalogue - Range - Product

Can anyone suggest a database design for the following:
A user can make a catalogue
Within a catalogue a user can make a range - i.e. a range of products
Within a range a user can add multiple products
Within a range a user can add multiple ranges -> range->range->range all with products in them.
I currently have in my database -
catalogue_range with - id, name, description
and
catalogue_product with - id, range_id, name, description
can anyone see what I'm trying to produce?
My aim is to be able to make multiple catalogue ranges within a catalogue range and add multiple products to each of these catalogue ranges.
Here is my current SQL:
`catalogue_range` (
`id` char(40) NOT NULL,
`profile_id` char(40) NOT NULL,
`type` enum('pdf','db') DEFAULT NULL,
`status` enum('new','draft','live') NOT NULL,
`name` varchar(64) NOT NULL,
`description` varchar(1000) NOT NULL,
`updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `profile_id` (`profile_id`)
)
`catalogue_product` (
`id` char(40) NOT NULL,
`catalogue_id` char(40) NOT NULL,
`order` smallint(5) unsigned NOT NULL,
`name` varchar(50) NOT NULL,
`description` varchar(250) NOT NULL,
PRIMARY KEY (`id`),
KEY `catalogue_id` (`catalogue_id`)
)
Thanks in advance.
catalogue(catalogue id, your private attributes)
product(product id, #catalogue id, your private attributes)
range(range id, #range id parent, your private attributes)
product range(#product id, #range id)
You will need stored procedures/applicative algorithms to compile:
the list of products in a range (this needs a recursive query, and older MySQL versions don't offer hierarchical queries the way Oracle does)
the list of ranges of a catalogue/range
Hope it helps.
S.
Assuming that a product can only exist in one catalogue at a time, your design is almost alright as it is. What you are missing is a recursive foreign key on catalogue_range. Add something like the following to your catalogue_range table definition:
`parent_range_id` char(40) NULL,
FOREIGN KEY (`parent_range_id`) REFERENCES catalogue_range(`id`)
The top level range(s) for any given user will have a NULL parent_range_id, others will refer to the containing range. Note that hierarchies aren't necessarily easy to work with in SQL. You may also want to look into techniques for making hierarchical data more SQL-friendly, such as nested sets.
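If you are on MySQL 8.0 or later, a recursive common table expression is one way to walk such a hierarchy without nested sets. The sketch below assumes the parent_range_id column suggested above has been added, that catalogue_product.catalogue_id holds the id of the range a product belongs to, and that 'some-range-id' stands in for a real id.

```sql
-- All products in a range and in every range nested below it.
WITH RECURSIVE subtree AS (
  -- Anchor: the range we start from.
  SELECT id, parent_range_id, name
  FROM catalogue_range
  WHERE id = 'some-range-id'
  UNION ALL
  -- Recursive step: ranges whose parent is already in the subtree.
  SELECT r.id, r.parent_range_id, r.name
  FROM catalogue_range AS r
  JOIN subtree AS s ON r.parent_range_id = s.id
)
SELECT p.id, p.name, s.name AS range_name
FROM catalogue_product AS p
JOIN subtree AS s ON p.catalogue_id = s.id
ORDER BY s.name, p.`order`;
```

On older MySQL versions the same walk has to be done in a stored procedure or in application code, one level at a time.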