MySQL Merged table duplicates - mysql

Here is what I currently have:
Archive tables (one for each year, 2008-2011) and 4 newly created tables for 2012, broken into quarters. All of these tables, including the new ones, have the same structure and keys. The naming convention for these is ARCHIVE_PLAYS. I then have a "live" table (called PLAYS) for current data, and a merged table that combines all of the tables so that I can run reports. The issue I have, which I didn't have before, is that this merged table is showing duplicates. The tables have the same primary keys, so this shouldn't be the case, right? It must have something to do with the new tables I just created, as I didn't have this issue before.
Structure:
**COMPANY**
COMPANY.MERGED_PLAYS
COMPANY.ARCHIVE_PLAYS_2008
COMPANY.ARCHIVE_PLAYS_2009
COMPANY.ARCHIVE_PLAYS_2010
COMPANY.ARCHIVE_PLAYS_2011
COMPANY.ARCHIVE_PLAYS_2012Q1
COMPANY.ARCHIVE_PLAYS_2012Q2
COMPANY.ARCHIVE_PLAYS_2012Q3
COMPANY.ARCHIVE_PLAYS_2012Q4
**COMPANY2**
COMPANY2.PLAYS
Each table, with the exception of the Merged_Plays, has the following Create:
CREATE TABLE `ARCHIVE_PLAYS_2011` (
`ENTRY_ID` BIGINT(20) NOT NULL,
`NODE_ID` VARCHAR(48) NOT NULL,
`HW_ID` VARBINARY(64) NOT NULL,
`LOG_DAY` DATE NOT NULL,
`ROW_NUMBER` INT(11) NOT NULL,
`NODE_NAME` VARCHAR(128) NOT NULL,
`FILE_NAME` VARCHAR(1024) NOT NULL,
`PRESENTATION_NAME` VARCHAR(1024) NULL DEFAULT NULL,
`SMIL_SEQUENCE_ID` VARCHAR(256) NULL DEFAULT NULL,
`SMIL_CONTENT_ID` VARCHAR(256) NULL DEFAULT NULL,
`PLAY_TIME_MS` BIGINT(20) NOT NULL,
`PLAY_TIME` TIME NOT NULL,
`STATUS_CODE` VARCHAR(48) NULL DEFAULT NULL,
`NUM_SCREENS_CONNECTED_AND_ON` INT(11) NULL DEFAULT NULL,
`NUM_SPEAKERS_CONNECTED_AND_ON` INT(11) NULL DEFAULT NULL,
`SCREEN_LAYOUT_MATCHES` CHAR(1) NULL DEFAULT NULL,
`ENTRY_PROCESSED` CHAR(1) NULL DEFAULT NULL,
`FILE_PATH` VARCHAR(1024) NULL DEFAULT NULL,
PRIMARY KEY (`NODE_ID`, `LOG_DAY`, `ROW_NUMBER`),
INDEX `PLAYLOG_ENTRY_ID` (`ENTRY_ID`),
INDEX `PLAYLOG_LOG_DAY` (`LOG_DAY`),
INDEX `PLAYLOG_LOG_DAY_PLAY_TIME` (`LOG_DAY`, `PLAY_TIME`),
INDEX `PLAYLOG_FILE_NAME` (`FILE_NAME`(600)),
INDEX `PLAYLOG_NODE_NAME` (`NODE_NAME`),
INDEX `PLAYLOG_FILE_NAME_NODE_NAME` (`FILE_NAME`(600), `NODE_NAME`),
INDEX `PLAYLOG_ENTRY_ID_PROCESSED` (`ENTRY_ID`, `ENTRY_PROCESSED`)
)
COLLATE='latin1_swedish_ci'
ENGINE=MyISAM;

A primary key only guarantees uniqueness within a single table; a merged table does not enforce it across the tables it combines. You must have duplicate records across multiple tables. Make sure you have deleted all of the 2012 data from the live table, and make sure there are no duplicates between any of the quarter tables.
Also, if the records are 100% duplicates, a UNION between all of your tables (instead of UNION ALL) will return unique results; however, this will decrease query performance.
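One way to locate the offending rows, sketched with the table and key names from the question. The first query lists key values that occur more than once in the merged table; the second checks a single pair of underlying tables for overlap (the pair chosen here is only an example, and would be repeated for other pairs). `ROW_NUMBER` is quoted in backticks because it is a reserved word in MySQL 8.0.

```sql
-- Key values that appear more than once across the underlying tables.
SELECT `NODE_ID`, `LOG_DAY`, `ROW_NUMBER`, COUNT(*) AS copies
FROM COMPANY.MERGED_PLAYS
GROUP BY `NODE_ID`, `LOG_DAY`, `ROW_NUMBER`
HAVING COUNT(*) > 1;

-- Rows present in both a 2012 quarter table and the live table.
SELECT a.`NODE_ID`, a.`LOG_DAY`, a.`ROW_NUMBER`
FROM COMPANY.ARCHIVE_PLAYS_2012Q1 AS a
JOIN COMPANY2.PLAYS AS p
  ON  p.`NODE_ID`    = a.`NODE_ID`
  AND p.`LOG_DAY`    = a.`LOG_DAY`
  AND p.`ROW_NUMBER` = a.`ROW_NUMBER`;
```

If the second query returns rows, those rows were copied into the archive without being removed from the live table.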

Related

Indexing columns for faster querying in MySQL 5.6 or higher

I'm building a real estate app. I have a table called properties, which is the main table holding the columns common to all types of properties (lands, apartments, etc.), and then a specific table for each property type, since each type has some specific columns. Here is the properties table:
CREATE TABLE `properties` (
`property_id` int(11) NOT NULL AUTO_INCREMENT,
`property_type` int(11) DEFAULT NULL,
`property_title` varchar(255) NOT NULL,
`property_description` varchar(1000) NOT NULL,
`country_id` int(11) NOT NULL,
`city_id` int(11) NOT NULL,
`city_location_id` int(11) NOT NULL,
`price` int(11) DEFAULT NULL,
`area` decimal(7,2) DEFAULT NULL,
`latitude` decimal(10,8) DEFAULT NULL,
`longitude` decimal(11,8) DEFAULT NULL,
`entry_date` datetime NOT NULL,
`last_modification_date` datetime NOT NULL,
PRIMARY KEY (`property_id`)
)
And here is the apartments table, for example:
CREATE TABLE `apartments` (
`apartment_id` INT NOT NULL COMMENT '',
`num_of_bedrooms` INT NULL COMMENT '',
`num_of_bathrooms` INT NULL COMMENT '',
`num_of_garages` INT NULL COMMENT '',
PRIMARY KEY (`apartment_id`) COMMENT '',
CONSTRAINT `properties_apartments_fk`
FOREIGN KEY (`apartment_id`)
REFERENCES `aqar_world`.`properties` (`property_id`)
ON DELETE CASCADE
ON UPDATE NO ACTION);
Now, the user can filter a search on almost any of these columns or a combination of them (price, area, area and price, number of bedrooms and location, and so on), so how should I design my indexing strategy for these columns? Another point is that property_description and property_title are text, so I'll have to add a fulltext index on each of them, right? There is also a join between these two tables, and between them and some other tables (like an agents table, for example).
I've read that since MySQL 5.6 the optimizer can make use of multiple indexes, so you can put an index on each column, but I don't know if that is right. Please advise, since I'm not that good at taking care of DB performance.
MySQL 5.7 has JSON tricks. MariaDB 10 has Dynamic Columns with similar tricks.
The main principle: expose the more useful fields as real columns and throw the more obscure fields into a JSON or Dynamic column. Then let MySQL filter on the former, while your app takes care of further filtering on the latter.
More discussion.
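A minimal sketch of that principle, assuming MySQL 5.7.8 or later for the JSON type. The table name `properties_json_sketch`, the column `extra_attributes`, the index name and the literal values are all hypothetical; which fields count as "more useful" depends on the application.

```sql
-- Commonly filtered fields stay as real, indexable columns;
-- rarely filtered ones go into a single JSON column (hypothetical layout).
CREATE TABLE properties_json_sketch (
  property_id      INT NOT NULL AUTO_INCREMENT,
  property_type    INT DEFAULT NULL,
  city_id          INT NOT NULL,
  price            INT DEFAULT NULL,
  area             DECIMAL(7,2) DEFAULT NULL,
  extra_attributes JSON DEFAULT NULL,  -- e.g. {"num_of_garages": 2}
  PRIMARY KEY (property_id),
  INDEX idx_city_price (city_id, price)
);

-- MySQL narrows the rows using the exposed, indexed columns...
SELECT property_id, price, area, extra_attributes
FROM properties_json_sketch
WHERE city_id = 42
  AND price BETWEEN 100000 AND 200000;
-- ...and the application applies any remaining filters to extra_attributes.
```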

Storing customer specific details in MYSQL? Without new tables

I'm looking at storing customer details upon registering for my service. The service in question is a booking system, and each user needs to have his/her own calendar system which keeps records of all bookings (arrival date/time, name, price, etc.). Is there a way of storing all this user-specific information in a single table, linked only by userID?
CREATE TABLE IF NOT EXISTS `bookings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`room_number` varchar(50) NOT NULL,
`name` varchar(20) NOT NULL,
`arrival` date NOT NULL,
`depart` date NOT NULL,
`nights` int(11) NOT NULL,
`price` decimal(11,2) NOT NULL,
`date_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
)
Each user would need to store this information. Surely I would need to create a whole new table for each user? (Which I know is just plain slow and wrong.)
You don't want to store a separate table for each user (except under some very specific requirements which are rather unusual). Your table is missing a userId. Something like:
CREATE TABLE IF NOT EXISTS `bookings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`UserId` int(11) NOT NULL,
`room_number` varchar(50) NOT NULL,
`name` varchar(20) NOT NULL,
`arrival` date NOT NULL,
`depart` date NOT NULL,
`nights` int(11) NOT NULL,
`price` decimal(11,2) NOT NULL,
`date_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
FOREIGN KEY (UserId) references users(UserId)
);
Don't worry about the number of rows in the table. SQL databases are designed to handle millions of rows for most applications. In fact, splitting the data among multiple tables would introduce some major problems (notably partially filled pages) that could greatly reduce performance.
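With a single shared table, one user's calendar is simply a filtered query. A sketch, assuming the `bookings` definition above; the index name `idx_user_arrival`, the user id and the date range are illustrative only.

```sql
-- Hypothetical composite index so a single user's bookings can be found
-- and returned in arrival order without scanning other users' rows.
ALTER TABLE bookings ADD INDEX idx_user_arrival (UserId, arrival);

-- All bookings for user 42 arriving in January 2024.
SELECT room_number, name, arrival, depart, nights, price
FROM bookings
WHERE UserId = 42
  AND arrival >= '2024-01-01'
  AND arrival <  '2024-02-01'
ORDER BY arrival;
```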

Which column(s) to index in MySQL

I'm trying to optimize the following table. According to phpMyAdmin, several stats regarding table scans are high, and indexes do not exist or are not being used (Handler read rnd next: 5.7 M).
1.
$query = "
SELECT * FROM apps_discrep
WHERE discrep_station = '$station'
AND discrep_date = '$date'
ORDER BY discrep_timestart";
2.
$query = "
SELECT * FROM apps_discrep
WHERE discrep_date BETWEEN '$keyword' AND '$keyword3'
AND (discrep_station like '$keyword2%') ORDER BY discrep_date";
Would it be correct to index discrep_station, discrep_date, and discrep_timestart?
Currently, only the primary unique index on the auto-increment ID exists.
-- Table structure
CREATE TABLE `apps_discrep` (
`index` int(11) NOT NULL AUTO_INCREMENT,
discrep_station varchar(5) NOT NULL,
discrep_timestart time NOT NULL,
discrep_timestop time NOT NULL,
discrep_date date NOT NULL,
discrep_datetime timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
discrep_show varchar(31) NOT NULL,
discrep_text text NOT NULL,
discrep_by varchar(11) NOT NULL,
discrep_opr varchar(11) NOT NULL,
email_traffic varchar(3) NOT NULL,
email_techs varchar(3) NOT NULL,
email_promos varchar(3) NOT NULL,
email_spots varchar(3) NOT NULL,
eas_row varchar(11) NOT NULL,
PRIMARY KEY (`index`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
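It looks to me like both queries can be served by the same BTREE index, since a composite index can also be used through its left-most columns.
Consider this MySQL doc page as a reference.
ALTER TABLE apps_discrep ADD KEY `key1` (`discrep_station`, `discrep_date`, `discrep_timestart`) USING BTREE;
Your first query will use all 3 columns of the index: the two equality conditions for the lookup and discrep_timestart for the ordering. The second query can use the index for the prefix match on discrep_station; because LIKE 'abc%' is a range condition, MySQL considers no further key parts for the range, so the discrep_date condition is applied as a filter on the matching entries.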
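How each query actually uses the index can be checked with EXPLAIN once the key exists. A sketch with made-up literal values in place of the PHP variables; in the output, the `key` column names the chosen index and `key_len` indicates how many bytes (and therefore how many leading columns) of it are used.

```sql
-- Query 1: equality on station and date, ordered by start time.
EXPLAIN
SELECT * FROM apps_discrep
WHERE discrep_station = 'KABC'
  AND discrep_date = '2012-06-01'
ORDER BY discrep_timestart;

-- Query 2: date range plus a prefix match on station.
EXPLAIN
SELECT * FROM apps_discrep
WHERE discrep_date BETWEEN '2012-06-01' AND '2012-06-30'
  AND discrep_station LIKE 'KA%'
ORDER BY discrep_date;
```

If the second query turns out to be the slow one, comparing its plan against an alternative index that leads with discrep_date would show which order suits the data better.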

MySQL index help - which is faster?

What I'm dealing with:
I have a project which uses ActiveCollab 2, and the database structure is new to me - practically everything gets stored to a project_objects table and has a recursively hierarchical relationship:
Record 1234 might be type "Ticket" with parent_id of 123
Record 123 might be type "Category" with parent_id of 12
Record 12 might be type "Milestone" and so on.
Currently there are upwards of 450,000 records in this table and many of the queries in the code reference the name field which does NOT have an index on it. An example value might be Design or Development.
This might be an example query:
SELECT * FROM project_objects WHERE type = "Ticket" and name = "Design"
My problem:
I have a query that is taking upwards of 12-15 seconds, and I have a feeling it's from that name column lacking an index and requiring a full scan. My understanding of indexes is that if I add one to the name field, it'll speed up the reads but slow down the inserts and updates. Does the index need to be rebuilt completely every time a record is added or updated, or is it just altered/appended? I don't want to optimize this query with an index if it means drastically slowing down other parts of the code base which depend on faster writes.
My question:
Assume 100 reads and 100 writes per day, which is more likely to be a faster process for MySQL - executing the above query on the above table without the index or having to rebuild the index every time a record is added?
I don't have the knowledge or authority to start running benchmarks, but I would like to offer a suggestion to the client without sounding like a complete novice. Thanks!
EDIT: Here is the table:
CREATE TABLE `project_objects` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`source` varchar(50) DEFAULT NULL,
`type` varchar(30) NOT NULL DEFAULT 'ProjectObject',
`module` varchar(30) NOT NULL DEFAULT 'system',
`project_id` int(10) unsigned NOT NULL DEFAULT '0',
`milestone_id` int(10) unsigned DEFAULT NULL,
`parent_id` int(10) unsigned DEFAULT NULL,
`parent_type` varchar(30) DEFAULT NULL,
`name` varchar(150) DEFAULT NULL,
`body` longtext,
`tags` text,
`state` tinyint(4) NOT NULL DEFAULT '0',
`visibility` tinyint(4) NOT NULL DEFAULT '0',
`priority` tinyint(4) DEFAULT NULL,
`created_on` datetime DEFAULT NULL,
`created_by_id` smallint(5) unsigned NOT NULL DEFAULT '0',
`created_by_name` varchar(100) DEFAULT NULL,
`created_by_email` varchar(100) DEFAULT NULL,
`updated_on` datetime DEFAULT NULL,
`updated_by_id` smallint(5) unsigned DEFAULT NULL,
`updated_by_name` varchar(100) DEFAULT NULL,
`updated_by_email` varchar(100) DEFAULT NULL,
`due_on` date DEFAULT NULL,
`completed_on` datetime DEFAULT NULL,
`completed_by_id` smallint(5) unsigned DEFAULT NULL,
`completed_by_name` varchar(100) DEFAULT NULL,
`completed_by_email` varchar(100) DEFAULT NULL,
`comments_count` smallint(5) unsigned DEFAULT NULL,
`has_time` tinyint(1) unsigned NOT NULL DEFAULT '0',
`is_locked` tinyint(3) unsigned DEFAULT NULL,
`estimate` float(9,2) DEFAULT NULL,
`start_on` date DEFAULT NULL,
`start_on_text` varchar(50) DEFAULT NULL,
`due_on_text` varchar(50) DEFAULT NULL,
`workflow_status` int(4) DEFAULT NULL,
`varchar_field_1` varchar(255) DEFAULT NULL,
`varchar_field_2` varchar(255) DEFAULT NULL,
`integer_field_1` int(11) DEFAULT NULL,
`integer_field_2` int(11) DEFAULT NULL,
`float_field_1` double(10,2) DEFAULT NULL,
`float_field_2` double(10,2) DEFAULT NULL,
`text_field_1` longtext,
`text_field_2` longtext,
`date_field_1` date DEFAULT NULL,
`date_field_2` date DEFAULT NULL,
`datetime_field_1` datetime DEFAULT NULL,
`datetime_field_2` datetime DEFAULT NULL,
`boolean_field_1` tinyint(1) unsigned DEFAULT NULL,
`boolean_field_2` tinyint(1) unsigned DEFAULT NULL,
`position` int(10) unsigned DEFAULT NULL,
`version` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `type` (`type`),
KEY `module` (`module`),
KEY `project_id` (`project_id`),
KEY `parent_id` (`parent_id`),
KEY `created_on` (`created_on`),
KEY `due_on` (`due_on`),
KEY `milestone_id` (`milestone_id`)
) ENGINE=InnoDB AUTO_INCREMENT=993109 DEFAULT CHARSET=utf8
As @Ray points out, indexes do not have to be rebuilt on every INSERT, UPDATE or DELETE operation. So, if you only want to improve the efficiency of this (or similar) queries, add an index on either (name, type) or (type, name).
Since you already have an index on (type) alone, I would add the first one:
ALTER TABLE project_objects
ADD INDEX name_type_IDX
(name, type) ;
It may take a few seconds on a busy server but it has to be done once and then all the queries with conditions like yours will benefit. It may also improve efficiency of several other types of queries that involve name only or name and type:
WHERE name = 'Design' AND type = 'Ticket' -- your query
WHERE name = 'Design' -- condition on `name` only
GROUP BY name -- group by `name`
WHERE name LIKE 'Design%' -- range condition on `name` only
WHERE name = 'Design' -- equality condition on `name`
  AND type LIKE 'Ticket%' -- and range condition on `type`
WHERE name = 'Design' -- equality condition on `name`
  GROUP BY type -- and group by `type`
GROUP BY name -- group by `name`
  , type -- and `type`
The insert cost of adding a single index on the name column is most likely negligible: it will probably amount to a constant-time increase, probably no more than a few milliseconds. You will eat up some extra disk space, but that's usually not a concern. It is nothing like the multiple seconds you're experiencing on select performance.
Add the index, enjoy the performance improvement.
BTW: Indexes aren't 'rebuilt' on every insert. They're usually implemented in B-Trees and unless you're deleting frequently, should require very little re-balancing once you get larger than a few levels (and rebalancing with little depth is pretty cheap).
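Without a formal benchmark, the effect of the index can still be compared on the server itself. A sketch, assuming the `name_type_IDX` index from the ALTER TABLE statement above has already been created; the `IGNORE INDEX` hint hides it from the optimizer for one statement so the two plans can be set side by side.

```sql
-- Plan with the new index available to the optimizer.
EXPLAIN
SELECT * FROM project_objects
WHERE type = 'Ticket' AND name = 'Design';

-- Plan for the same query with the new index hidden, for comparison.
EXPLAIN
SELECT * FROM project_objects IGNORE INDEX (name_type_IDX)
WHERE type = 'Ticket' AND name = 'Design';
```

The `rows` column of each plan gives the optimizer's estimate of how many rows it has to examine; a large drop there is a reasonable sign that the index is doing its job.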

Avoid UNION for two almost identical tables in MySQL

I'm not very good at MySQL, and I'm going to write a query to count the messages sent by a user, based on the message type and the is_auto field.
Messages can be of type "small text message" or "newsletter". I created two entities with a few fields that differ between them. The important one is messages_count, which is absent from the newsletter table and is used in the query:
CREATE TABLE IF NOT EXISTS `small_text_message` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`messages_count` int(11) NOT NULL,
`username` varchar(255) NOT NULL,
`method` varchar(255) NOT NULL,
`content` longtext,
`sent_at` datetime DEFAULT NULL,
`status` varchar(255) NOT NULL,
`recipients_count` int(11) NOT NULL,
`customers_count` int(11) NOT NULL,
`sheduled_at` datetime DEFAULT NULL,
`sheduled_for` datetime DEFAULT NULL,
`is_auto` tinyint(1) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
And:
CREATE TABLE `newsletter` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`subject` varchar(78) DEFAULT NULL,
`content` longtext,
`sent_at` datetime DEFAULT NULL,
`status` varchar(255) NOT NULL,
`recipients_count` int(11) NOT NULL,
`customers_count` int(11) NOT NULL,
`sheduled_at` datetime DEFAULT NULL,
`sheduled_for` datetime DEFAULT NULL,
`is_auto` tinyint(1) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
I ended up with a UNION query. Can this query be shortened or optimized since the only difference is messages_count that should be always 1 for newsletter?
SELECT
CONCAT('sms_', IF(is_auto = 0, 'user' , 'auto')) AS subtype,
SUM(messages_count * (customers_count + recipients_count)) AS count
FROM small_text_message WHERE status <> 'pending' AND user_id = 1
GROUP BY is_auto
UNION
SELECT
CONCAT('newsletter_', IF(is_auto = 0, 'user' , 'auto')) AS subtype,
SUM(customers_count + recipients_count) AS count
FROM newsletter WHERE status <> 'pending' AND user_id = 1
GROUP BY is_auto
I don't see any easy way to avoid a UNION (or UNION ALL) operation that will return the specified result set.
I would recommend you use a UNION ALL operator in place of the UNION operator. Then the execution plan will not include the step that eliminates duplicate rows. (You already have GROUP BY operations on each query, and there is no way that those two queries can produce an identical row.)
Otherwise, your query looks fine just as it is written.
(It's always a good thing to consider the question, might there be a better way? To get the result set you are asking for, from the schema you have, your query looks about as good as it's going to get.)
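For reference, this is the query from the question with only the set operator changed to UNION ALL; nothing else is altered. The `count` alias is quoted in backticks purely as a precaution.

```sql
SELECT
  CONCAT('sms_', IF(is_auto = 0, 'user', 'auto')) AS subtype,
  SUM(messages_count * (customers_count + recipients_count)) AS `count`
FROM small_text_message
WHERE status <> 'pending' AND user_id = 1
GROUP BY is_auto
UNION ALL  -- skips the duplicate-elimination step
SELECT
  CONCAT('newsletter_', IF(is_auto = 0, 'user', 'auto')) AS subtype,
  SUM(customers_count + recipients_count) AS `count`
FROM newsletter
WHERE status <> 'pending' AND user_id = 1
GROUP BY is_auto;
```

Because the two halves produce subtypes with different prefixes (`sms_` and `newsletter_`), no row from one half can equal a row from the other, so the result is the same as with UNION.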
If you are looking for more general DB advice, I recommend restructuring the tables to factor the common elements into one table, perhaps called outbound_communication or something similar, holding all of your common fields, and then having "sub tables" for the specific types to host the fields which are unique to each type. It does mean a simple JOIN is necessary to select all of the fields you want, but then again, it's normalized and actually makes situations like this one easier (one table holds all of the entities of interest). Additionally, you have the option of writing that JOIN just once as a view, and then your existing code would not even need to change: it would see the two tables as if they had never changed.
CREATE TABLE IF NOT EXISTS `outbound_communication` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content` longtext,
`sent_at` datetime DEFAULT NULL,
`status` varchar(255) NOT NULL,
`recipients_count` int(11) NOT NULL,
`customers_count` int(11) NOT NULL,
`sheduled_at` datetime DEFAULT NULL,
`sheduled_for` datetime DEFAULT NULL,
`is_auto` tinyint(1) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
CREATE TABLE `small_text_message` (
`outbound_communication_id` int(11) NOT NULL,
`messages_count` int(11) NOT NULL,
`username` varchar(255) NOT NULL,
`method` varchar(255) NOT NULL,
PRIMARY KEY (`outbound_communication_id`),
FOREIGN KEY (outbound_communication_id)
REFERENCES outbound_communication(id)
) ENGINE=InnoDB;
CREATE TABLE `newsletter` (
`outbound_communication_id` int(11) NOT NULL,
`subject` varchar(78) DEFAULT NULL,
PRIMARY KEY (`outbound_communication_id`),
FOREIGN KEY (outbound_communication_id)
REFERENCES outbound_communication(id)
) ENGINE=InnoDB;
Then selecting a text message looks like this:
SELECT *
FROM outbound_communication AS parent
JOIN small_text_message
ON parent.id = small_text_message.outbound_communication_id
WHERE parent.id = 1234;
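The view mentioned above could be sketched as follows, assuming the parent table is named `outbound_communication` and the child key column `outbound_communication_id`, as in the query just shown. The view name `small_text_message_full` is hypothetical; for existing code to keep working unchanged, the view would instead have to take over the old table name, with the sub table renamed.

```sql
-- Presents parent and child columns as one flat row per text message.
CREATE VIEW small_text_message_full AS
SELECT parent.id,
       child.messages_count,
       child.username,
       child.method,
       parent.content,
       parent.sent_at,
       parent.status,
       parent.recipients_count,
       parent.customers_count,
       parent.sheduled_at,
       parent.sheduled_for,
       parent.is_auto,
       parent.user_id
FROM outbound_communication AS parent
JOIN small_text_message AS child
  ON child.outbound_communication_id = parent.id;

-- Callers can then query it like the original single table.
SELECT * FROM small_text_message_full WHERE id = 1234;
```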
The nature of the query is inherently the union of the data from the small text message and the newsletter tables, so the UNION query is the only realistic formulation. There's no join of relevance between the two tables, for example.
So, I think you're very much on the right lines with your query.
Why are you worried about a UNION?