simple select query optimise - mysql

I have a table defined as follows:
CREATE TABLE IF NOT EXISTS `cards` (
`ID` int(11) NOT NULL,
`Name` varchar(200) NOT NULL,
`WorkerID` varchar(20) NOT NULL,
`pic` varchar(200) NOT NULL,
`expDate` bigint(20) NOT NULL,
`reminderSent` tinyint(4) NOT NULL,
`regNum` varchar(8) NOT NULL,
`cardType` varchar(200) NOT NULL
) ENGINE=MyISAM AUTO_INCREMENT=92 DEFAULT CHARSET=latin1;
ALTER TABLE `cards`
ADD PRIMARY KEY (`ID`), ADD KEY `cardsWorkerID_idx` (`WorkerID`);
But running:
explain
SELECT pic, expDate, Name, ID, cardType, regNum FROM cards WHERE workerID= 18
tells me it is scanning the entire table, even though I added an index to the workerID field. Can anyone explain what I'm missing?

The use of indexes depends on the size of the data. It also depends on the types used for the comparison. If you have a small table, then the SQL engine might decide that a scan is more efficient than using the index. This is particularly true if the table fits on a single data page.
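In your case, though, the problem may well be data conversion. WorkerID is a varchar, but the query compares it with the number 18, so MySQL has to convert each stored string to a number before comparing, and it generally cannot use the index on the string column for that. Use a constant of the appropriate type for the comparison: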
SELECT pic, expDate, Name, ID, cardType, regNum
FROM cards
WHERE workerID = '18';
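One way to check this is to compare the plans for the two forms of the query. This is only a sketch: the exact EXPLAIN output depends on the MySQL version and on how many rows the table holds, and with very few rows the optimizer may still choose a scan.

```sql
-- Numeric literal: WorkerID is a varchar, so each stored value is converted
-- to a number for the comparison and the index cannot be used for a lookup.
EXPLAIN
SELECT pic, expDate, Name, ID, cardType, regNum
FROM cards
WHERE WorkerID = 18;

-- String literal: same type as the column, so cardsWorkerID_idx is a candidate.
EXPLAIN
SELECT pic, expDate, Name, ID, cardType, regNum
FROM cards
WHERE WorkerID = '18';
```

Look at the type and key columns of each plan: type ALL with key NULL means a full table scan, while type ref with key cardsWorkerID_idx means the index is being used.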


is a field in a fulltext key indexed and fast to use in join?

I am currently working with a database that was auto generated by a tool (and is used in production)
(I will only speak about what is interesting for the question)
I have three tables : user, movie and userMovie.
The command show create table user returns something like:
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(255) DEFAULT NULL,
`email` varchar(255) DEFAULT NULL,
`password` varchar(255) DEFAULT NULL,
`other_field_1` varchar(255) DEFAULT NULL, -- not actual field name
PRIMARY KEY (`id`),
FULLTEXT KEY `SEARCH_USERS` (`username`,`other_field_1`)
)
The command show create table movie returns something like:
CREATE TABLE `movie` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`link` varchar(255) DEFAULT NULL,
`another_field_1` varchar(255) DEFAULT NULL, -- not actual field name
`another_field_2` varchar(255) DEFAULT NULL, -- not actual field name
PRIMARY KEY (`id`),
FULLTEXT KEY `SEARCH_MOVIES` (`name`,`link`,`another_field_1`,`another_field_2`)
)
The command show create table userMovie returns something like:
CREATE TABLE `userMovie` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`Name` varchar(255) DEFAULT NULL,
`user` int(11) DEFAULT NULL,
`field1` varchar(255) DEFAULT NULL, -- not actual field name
`field2` varchar(255) DEFAULT NULL, -- not actual field name
`field3` varchar(255) DEFAULT NULL, -- not actual field name
PRIMARY KEY (`id`),
FULLTEXT KEY `SEARCH_USER_MOVIE` (`Name`,`field1`,`field2`,`field3`)
)
Obviously, there are several issues with this schema, the main ones being:
There is no foreign key,
The field userMovie.Name contains the name of the movie, not the id
I'm well aware of the inconsistency risk, but I know less about the potential performance issues. In particular, there are a lot of records in the userMovie table, and we have to join it quite often with the movie table (and the user table).
However, as userMovie.Name is in the "FULLTEXT KEY", does that mean it is indexed?
By the way, I think that only the tool previously mentioned had a use for this, and it can probably be removed if needed.
I would like to know if there is a performance issue and ways to improve it. (It would also be awesome if the modifications I make are "safe", as I don't want to break anything.)
The column(s) in a FULLTEXT index are usable only for MATCH...AGAINST.
If you also want a BTree index on the column(s), provide a separate INDEX.
You can do
WHERE MATCH(`Name`,`field1`,`field2`,`field3`) AGAINST("...")
AND field4 > 123
Or even
WHERE MATCH(`Name`,`field1`,`field2`,`field3`) AGAINST("...")
AND name = 'abc'
However, this second format makes little sense. Usually a column is searched by either FULLTEXT or a regular index, not both.
What is the intent of the table userMovie? The name sounds like a many-to-many mapping table (eg, which movies each user has watched), but the columns do not reflect that.
To address a "performance issue", we need to see the SELECTs -- they have performance issues, not the schema. They guide what indexes are useful.
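As a sketch of the separate-index suggestion applied to the tables in the question: the index names below are made up, and the sample query is only a guess at what a typical join looks like, since the actual SELECTs were not posted. Adding an index does not change query results, but building it on a large table takes time and extra disk space.

```sql
-- Regular (BTree) indexes alongside the existing FULLTEXT keys.
-- Index names are hypothetical.
ALTER TABLE `userMovie` ADD INDEX `idx_usermovie_name` (`Name`);
ALTER TABLE `userMovie` ADD INDEX `idx_usermovie_user` (`user`);
ALTER TABLE `movie` ADD INDEX `idx_movie_name` (`name`);

-- Hypothetical join on the movie name; compare the plan before and after
-- adding the indexes.
EXPLAIN
SELECT m.`id`, m.`name`, um.`field1`
FROM `userMovie` AS um
JOIN `movie` AS m ON m.`name` = um.`Name`
WHERE um.`user` = 42;
```

If the server rejects an index on a varchar(255) column as too long (this depends on the character set and storage engine settings), a prefix index such as (`Name`(191)) is one possible workaround.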

Indexing columns for faster querying in MySQL 5.6 or higher

I'm building a real estate app. I have a table called properties, which is the main table holding all the common columns (10 columns) for all types of properties (lands, apartments, etc.), and then I have a specific table for each property type, since each type has some specific columns. Here is the properties table:
CREATE TABLE `properties` (
`property_id` int(11) NOT NULL AUTO_INCREMENT,
`property_type` int(11) DEFAULT NULL,
`property_title` varchar(255) NOT NULL,
`property_description` varchar(1000) NOT NULL,
`country_id` int(11) NOT NULL,
`city_id` int(11) NOT NULL,
`city_location_id` int(11) NOT NULL,
`price` int(11) DEFAULT NULL,
`area` decimal(7,2) DEFAULT NULL,
`latitude` decimal(10,8) DEFAULT NULL,
`longitude` decimal(11,8) DEFAULT NULL,
`entry_date` datetime NOT NULL,
`last_modification_date` datetime NOT NULL,
PRIMARY KEY (`property_id`)
)
and here is the apartments for example:
CREATE TABLE `apartments` (
`apartment_id` INT NOT NULL COMMENT '',
`num_of_bedrooms` INT NULL COMMENT '',
`num_of_bathrooms` INT NULL COMMENT '',
`num_of_garages` INT NULL COMMENT '',
PRIMARY KEY (`apartment_id`) COMMENT '',
CONSTRAINT `properties_apartments_fk`
FOREIGN KEY (`apartment_id`)
REFERENCES `aqar_world`.`properties` (`property_id`)
ON DELETE CASCADE
ON UPDATE NO ACTION);
Now the user can filter a search on almost any of these columns or a combination of them (price, area, area and price, number of bedrooms and location, and so on), so how should I plan my indexing strategy? Another point is that property_description and property_title are text, so I'll have to add a fulltext index on each of them, right? There is also a join between these two tables, and between them and some other tables (the agents table, for example).
I've read that since MySQL 5.6 the optimizer can make use of multiple indexes, so you can put an index on each column, but I don't know if that is right. Please advise, since I'm not that good at taking care of DB performance.
MySQL 5.7 has a JSON data type. MariaDB 10 has Dynamic Columns, which allow similar tricks.
The main principle: Expose the more useful fields; throw the more obscure fields into JSON or Dynamic columns. Then let MySQL filter on the former, and your app takes care of further filtering on the latter.
More discussion.
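A minimal sketch of that principle, assuming MySQL 5.7 or later. The extra column, the index name, the JSON keys, and the filter values are all illustrative and not part of the original schema.

```sql
-- Hypothetical JSON column for the rarely filtered, type-specific attributes.
ALTER TABLE properties ADD COLUMN extra JSON NULL;

-- Index the commonly filtered columns (an illustrative choice).
ALTER TABLE properties ADD INDEX idx_city_price (city_id, price);

-- Store the more obscure fields as JSON.
UPDATE properties
SET extra = JSON_OBJECT('num_of_garages', 2, 'floor', 3)
WHERE property_id = 1;

-- MySQL filters on the exposed columns; the application reads `extra`
-- from each returned row and applies any remaining filters itself.
SELECT property_id, property_title, price, area, extra
FROM properties
WHERE city_id = 7
  AND price BETWEEN 100000 AND 250000;
```

Which columns count as "more useful" is a judgment call that depends on the filters users actually apply most often.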

MySQL-Query too slow

I am using the following tables in my MySQL-Database:
--
-- Table structure for table `company`
--
CREATE TABLE IF NOT EXISTS `company` (
`numb` varchar(4) NOT NULL,
`cik` varchar(30) NOT NULL,
`sNumber` varchar(30) NOT NULL,
`street1` varchar(255) NOT NULL,
`street2` varchar(255) NOT NULL,
`city` varchar(255) NOT NULL,
`state` varchar(100) NOT NULL,
`zip` varchar(100) NOT NULL,
`phone` varchar(255) NOT NULL,
`name` varchar(255) NOT NULL,
`dateChanged` varchar(30) NOT NULL,
`name2` varchar(255) NOT NULL,
`seriesId` varchar(30) NOT NULL,
`symbol` varchar(10) NOT NULL,
`exchange` varchar(20) NOT NULL,
PRIMARY KEY (`cik`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `company` (`numb`, `cik`, `sNumber`, `street1`, `street2`, `city`, `state`, `zip`, `phone`, `name`, `dateChanged`, `name2`, `seriesId`, `symbol`, `exchange`) VALUES
('6798', 'abc', '953551121', '701 AVENUE', '', 'GLENDALE', 'CA', '91201-2349', '818-244-8080', '', '', 'Public Store', '', 'PSA', 'NYSE')
--
-- Table structure for table `data`
--
CREATE TABLE IF NOT EXISTS `data` (
`id` int(100) NOT NULL AUTO_INCREMENT,
`number` varchar(100) NOT NULL,
`elementname` mediumtext NOT NULL,
`date` varchar(100) NOT NULL,
`elementvalue` longtext NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=18439;
INSERT INTO `data` (`id`, `number`, `elementname`, `date`, `elementvalue`) VALUES
(1, '0001393311-10-000004', 'StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest', '2009-12-31', '3399777000')
--
-- Table structure for table `filing`
--
CREATE TABLE IF NOT EXISTS `filing` (
`number` varchar(100) NOT NULL,
`file_number` varchar(100) NOT NULL,
`type` varchar(100) NOT NULL,
`amendment` tinyint(1) NOT NULL,
`date` varchar(100) NOT NULL,
`cik` varchar(30) NOT NULL,
PRIMARY KEY (`accession_number`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `filing` (`number`, `file_number`, `type`, `amendment`, `date`, `cik`) VALUES
('0001393311-10-000004', '001-33519', '10-K', 0, '2009-12-31', '0000751653'),
('0000751652-10-000006', '001-08796', '10-K', 0, '2009-12-31', '0000751652')
The data table has around 22.000 entries, filing and company tables have around 400 entries each. I want to operate the database with a lot more entries in the future.
I perform the following query, which selects the newest item with a given type:
SELECT data.elementname, data.elementvalue, company.name2 FROM data
JOIN filing ON data.number = filing.number
JOIN company ON filing.cik = company.cik
WHERE elementname IN ('Elem1', 'Elem2', 'Elem3', 'Elem4', 'Elem5', 'ElemN')
AND number IN (
SELECT number
FROM filing
WHERE filing.cik IN ('cik1', 'cik2', 'cikN')
AND filing.type = '1L'
GROUP BY filing.cik
)
It takes between ~0.28 and 0.4 seconds to complete, which appears to be very slow.
When I perform the query without the following line
WHERE filing.cik IN ('cik1', 'cik2', 'cikN')
it takes only ~0.035 seconds.
Any idea how to speed the query up or optimize the table structure? The table is growing rapidly and it's already too slow.
First off, the table structure you posted for filing is incorrect, as the primary key column you specified (accession_number) doesn't exist in the table. I'll assume you mean number. Additionally, you didn't specify the table definition for company, which makes trying to provide advice for this somewhat difficult.
However, both of the comments are correct. You need some indexes. Based on the query, you should probably add some of the following indexes:
ALTER TABLE company ADD INDEX ( cik );
ALTER TABLE data ADD INDEX ( number );
I would also recommend taking a look at whether data.elementname actually needs to be a MEDIUMTEXT, which is a pretty huge column. If the rest of the data looks like the example data you provided, you should probably change it into a varchar. TEXT columns can cause some serious performance penalties due to the way they're stored.
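A possible way to make that change, sketched under the assumption that every existing elementname fits in 255 characters. The length is a guess based on the sample row, and the first query is there to check it.

```sql
-- Longest value currently stored; confirm it fits before shrinking the column.
SELECT MAX(CHAR_LENGTH(elementname)) AS longest_name
FROM data;

-- Assumed length of 255; adjust it to the result above.
ALTER TABLE data MODIFY elementname VARCHAR(255) NOT NULL;
```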
Additionally, your PRIMARY KEY number columns, which are currently strings, look as though they could be reformatted into different columns that are actually of type INT. Keep in mind that VARCHAR PRIMARY KEY columns will not be as efficient as INTs, just because they're so much bigger.
Lastly, 22k rows isn't all that much data. You should take a look at your my.cnf settings. Your key_buffer_size value may be too small to fit the indexes entirely in memory. Additionally, you may want to consider using InnoDB for these tables, combined with an innodb_buffer_pool_size value that'll keep everything in memory.
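To see what the server is currently using, and to try the engine change, something along these lines could work. The 64 MB figure is only a placeholder, not a recommendation.

```sql
-- Current cache sizes (in bytes).
SHOW VARIABLES LIKE 'key_buffer_size';          -- MyISAM index cache
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';  -- InnoDB data and index cache

-- The MyISAM key cache can be resized at runtime; 64 MB is a placeholder.
SET GLOBAL key_buffer_size = 64 * 1024 * 1024;

-- Convert the tables to InnoDB (each statement rebuilds the table).
ALTER TABLE data ENGINE = InnoDB;
ALTER TABLE filing ENGINE = InnoDB;
ALTER TABLE company ENGINE = InnoDB;
```

A value set with SET GLOBAL is lost on restart, so a setting that turns out to help would also need to go into my.cnf.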

Avoid UNION for two almost identical tables in MySQL

I'm not very good at MySQL and I'm going to write a query to count messages sent by a user, based on the message type and the is_auto field.
Messages can be of type "small text message" or "newsletter". I created two entities with a few fields that differ between them. The important one is messages_count, which is absent from the newsletter table and is used in the query:
CREATE TABLE IF NOT EXISTS `small_text_message` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`messages_count` int(11) NOT NULL,
`username` varchar(255) NOT NULL,
`method` varchar(255) NOT NULL,
`content` longtext,
`sent_at` datetime DEFAULT NULL,
`status` varchar(255) NOT NULL,
`recipients_count` int(11) NOT NULL,
`customers_count` int(11) NOT NULL,
`sheduled_at` datetime DEFAULT NULL,
`sheduled_for` datetime DEFAULT NULL,
`is_auto` tinyint(1) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
And:
CREATE TABLE `newsletter` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`subject` varchar(78) DEFAULT NULL,
`content` longtext,
`sent_at` datetime DEFAULT NULL,
`status` varchar(255) NOT NULL,
`recipients_count` int(11) NOT NULL,
`customers_count` int(11) NOT NULL,
`sheduled_at` datetime DEFAULT NULL,
`sheduled_for` datetime DEFAULT NULL,
`is_auto` tinyint(1) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
I ended up with a UNION query. Can this query be shortened or optimized since the only difference is messages_count that should be always 1 for newsletter?
SELECT
CONCAT('sms_', IF(is_auto = 0, 'user' , 'auto')) AS subtype,
SUM(messages_count * (customers_count + recipients_count)) AS count
FROM small_text_message WHERE status <> 'pending' AND user_id = 1
GROUP BY is_auto
UNION
SELECT
CONCAT('newsletter_', IF(is_auto = 0, 'user' , 'auto')) AS subtype,
SUM(customers_count + recipients_count) AS count
FROM newsletter WHERE status <> 'pending' AND user_id = 1
GROUP BY is_auto
I don't see any easy way to avoid a UNION (or UNION ALL) operation, that will return the specified result set.
I would recommend you use a UNION ALL operator in place of the UNION operator. Then the execution plan will not include the step that eliminates duplicate rows. (You already have GROUP BY operations on each query, and there is no way that those two queries can produce an identical row.)
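Applied to the query from the question, the only change is the operator between the two SELECTs:

```sql
SELECT
  CONCAT('sms_', IF(is_auto = 0, 'user', 'auto')) AS subtype,
  SUM(messages_count * (customers_count + recipients_count)) AS count
FROM small_text_message
WHERE status <> 'pending' AND user_id = 1
GROUP BY is_auto
UNION ALL  -- no duplicate-elimination step; the subtype prefixes already differ
SELECT
  CONCAT('newsletter_', IF(is_auto = 0, 'user', 'auto')) AS subtype,
  SUM(customers_count + recipients_count) AS count
FROM newsletter
WHERE status <> 'pending' AND user_id = 1
GROUP BY is_auto;
```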
Otherwise, your query looks fine just as it is written.
(It's always a good thing to consider the question, might there be a better way? To get the result set you are asking for, from the schema you have, your query looks about as good as it's going to get.)
If you are looking for more general DB advice, I recommend restructuring the tables to factor the common elements into one table, perhaps called outbound_communication or something, holding all of your common fields, and then having "sub tables" for the specific types to hold the fields that are unique to each type. It does mean a simple JOIN is necessary to select all of the fields you want, but then again, it's normalized and actually makes situations like this one easier (one table holds all of the entities of interest). Additionally, you have the option of writing that JOIN just once as a "view", and then your existing code would not even need to change; it would see the two tables as if they had never changed.
CREATE TABLE IF NOT EXISTS `outbound_communication` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content` longtext,
`sent_at` datetime DEFAULT NULL,
`status` varchar(255) NOT NULL,
`recipients_count` int(11) NOT NULL,
`customers_count` int(11) NOT NULL,
`sheduled_at` datetime DEFAULT NULL,
`sheduled_for` datetime DEFAULT NULL,
`is_auto` tinyint(1) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
CREATE TABLE `small_text_message` (
`outbound_communication_id` int(11) NOT NULL,
`messages_count` int(11) NOT NULL,
`username` varchar(255) NOT NULL,
`method` varchar(255) NOT NULL,
PRIMARY KEY (`outbound_communication_id`),
FOREIGN KEY (`outbound_communication_id`)
REFERENCES `outbound_communication` (`id`)
) ENGINE=InnoDB;
CREATE TABLE `newsletter` (
`outbound_communication_id` int(11) NOT NULL,
`subject` varchar(78) DEFAULT NULL,
PRIMARY KEY (`outbound_communication_id`),
FOREIGN KEY (`outbound_communication_id`)
REFERENCES `outbound_communication` (`id`)
) ENGINE=InnoDB;
Then selecting a text msg is like this:
SELECT *
FROM outbound_communication AS parent
JOIN small_text_message
ON parent.id = small_text_message.outbound_communication_id
WHERE parent.id = 1234;
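The view mentioned above might look like the following. The view name is hypothetical, and the column list is spelled out so that the key column appears only once in the result.

```sql
-- Hypothetical view that hides the JOIN from the calling code.
CREATE VIEW small_text_message_full AS
SELECT
  parent.id, parent.content, parent.sent_at, parent.status,
  parent.recipients_count, parent.customers_count,
  parent.sheduled_at, parent.sheduled_for, parent.is_auto, parent.user_id,
  sms.messages_count, sms.username, sms.method
FROM outbound_communication AS parent
JOIN small_text_message AS sms
  ON parent.id = sms.outbound_communication_id;

-- The view can then be queried like a single table.
SELECT * FROM small_text_message_full WHERE id = 1234;
```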
The nature of the query is inherently the union of the data from the small text message and the newsletter tables, so the UNION query is the only realistic formulation. There's no join of relevance between the two tables, for example.
So, I think you're very much on the right lines with your query.
Why are you worried about a UNION?

MYSQL: Find and delete similar records - Updated with example

I'm trying to dedup a table, where I know there are 'close' (but not exact) rows that need to be removed.
I have a single table, with 22 fields, and uniqueness can be established through comparing 5 of those fields. Of the remaining 17 fields, (including the unique key), there are 3 fields that cause each row to be unique, meaning the dedup proper method will not work.
I was looking at the multi table delete method outlined here: http://blog.krisgielen.be/archives/111 but I can't make sense of the final line of code (AND M1.cd*100+M1.track > M2.cd*100+M2.track) as I am unsure what the cd*100 part achieves...
Can anyone assist me with this? I suspect I could do better exporting the whole thing to Python, doing something with it, then re-importing it, but then (1) I'm stuck with knowing how to dedup the string anyway, and (2) I had to break the records into chunks to be able to import them into MySQL, as it was timing out after 300 seconds, so it turned into a whole debacle to get into MySQL in the first place. (I am very much a novice at both MySQL and Python.)
The table is a dump of some 40 log files from some testing. The test set for each log is some 20,000 files. The repeating values are either the test conditions, the file name/parameters or the results of the tests.
SHOW CREATE TABLE:
CREATE TABLE `t1` (
`DROID_V` int(1) DEFAULT NULL,
`Sig_V` varchar(7) DEFAULT NULL,
`SPEED` varchar(4) DEFAULT NULL,
`ID` varchar(7) DEFAULT NULL,
`PARENT_ID` varchar(10) DEFAULT NULL,
`URI` varchar(10) DEFAULT NULL,
`FILE_PATH` varchar(68) DEFAULT NULL,
`NAME` varchar(17) DEFAULT NULL,
`METHOD` varchar(10) DEFAULT NULL,
`STATUS` varchar(14) DEFAULT NULL,
`SIZE` int(10) DEFAULT NULL,
`TYPE` varchar(10) DEFAULT NULL,
`EXT` varchar(4) DEFAULT NULL,
`LAST_MODIFIED` varchar(10) DEFAULT NULL,
`EXTENSION_MISMATCH` varchar(32) DEFAULT NULL,
`MD5_HASH` varchar(10) DEFAULT NULL,
`FORMAT_COUNT` varchar(10) DEFAULT NULL,
`PUID` varchar(15) DEFAULT NULL,
`MIME_TYPE` varchar(24) DEFAULT NULL,
`FORMAT_NAME` varchar(10) DEFAULT NULL,
`FORMAT_VERSION` varchar(10) DEFAULT NULL,
`INDEX` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`INDEX`)
) ENGINE=MyISAM AUTO_INCREMENT=960831 DEFAULT CHARSET=utf8
The only unique field is the primary key, `INDEX`.
Unique records can be established by looking at DROID_V, Sig_V, SPEED, NAME and PUID.
Of the ~900,000 rows, I have about 10,000 dups that are either a single duplicate of a record or up to 6 repetitions of the record.
Row examples: As Is
5;"v37";"slow";"10266";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/7";"image/tiff";"Tagged Ima";"3";"191977"
5;"v37";"slow";"10268";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/8";"image/tiff";"Tagged Ima";"4";"191978"
5;"v37";"slow";"10269";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/9";"image/tiff";"Tagged Ima";"5";"191979"
5;"v37";"slow";"10270";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/10";"image/tiff";"Tagged Ima";"6";"191980"
5;"v37";"slow";"12766";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/7";"image/tiff";"Tagged Ima";"3";"193977"
5;"v37";"slow";"12768";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/8";"image/tiff";"Tagged Ima";"4";"193978"
5;"v37";"slow";"12769";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/9";"image/tiff";"Tagged Ima";"5";"193979"
5;"v37";"slow";"12770";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/10";"image/tiff";"Tagged Ima";"6";"193980"
Row Example: As It should be
5;"v37";"slow";"10266";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/7";"image/tiff";"Tagged Ima";"3";"191977"
5;"v37";"slow";"10268";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/8";"image/tiff";"Tagged Ima";"4";"191978"
5;"v37";"slow";"10269";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/9";"image/tiff";"Tagged Ima";"5";"191979"
5;"v37";"slow";"10270";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/10";"image/tiff";"Tagged Ima";"6";"191980"
Please note, you can see from the index column at the end that I have cut out some other rows; I have only identified a very small set of repeating rows. Please let me know if you need any more 'noise' from the rest of the DB.
Thanks.
I figured out a fix using the COUNT function. I had been using COUNT(*), which just returned everything in the table; by using COUNT(DISTINCT NAME) instead, I am able to weed out the dup rows that fit the dup criteria (as set out by the field selection in the WHERE clause).
Example:
SELECT `PUID`,`DROID_V`,`SIG_V`,`SPEED`, COUNT(distinct NAME) as Hit FROM sourcelist, main_small WHERE sourcelist.SourcePUID = 'MyVariableHere' AND main_small.NAME = sourcelist.SourceFileName
GROUP BY `PUID`,`DROID_V`,`SIG_V`,`SPEED` ORDER BY `DROID_V` ASC, `SIG_V` ASC, `SPEED`;
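If the duplicates do need to be physically removed rather than just excluded from counts, a multi-table DELETE along the lines of the linked article is one option. This is a sketch, not something tested against the real data: it treats rows as duplicates when the five columns named in the question match, and it keeps the row with the lowest INDEX in each group. In the linked article, cd*100+track appears to serve a similar purpose, folding two columns into a single number so that one row of each pair can be singled out; here the auto-increment INDEX column already provides that ordering. The index name idx_dedup is made up.

```sql
-- Hypothetical helper index so the self-join can look up matching rows
-- instead of scanning the whole table for each row.
ALTER TABLE t1 ADD INDEX idx_dedup (DROID_V, Sig_V, SPEED, NAME, PUID);

-- Preview: rows that have an earlier twin and would be deleted.
SELECT DISTINCT dup.`INDEX`
FROM t1 AS dup
JOIN t1 AS keep
  ON  keep.DROID_V = dup.DROID_V
  AND keep.Sig_V   = dup.Sig_V
  AND keep.SPEED   = dup.SPEED
  AND keep.NAME    = dup.NAME
  AND keep.PUID    = dup.PUID
  AND keep.`INDEX` < dup.`INDEX`;

-- Delete every row for which an earlier row with the same five values exists.
-- Note: = never matches NULL; use <=> if NULLs should count as equal.
DELETE dup
FROM t1 AS dup
JOIN t1 AS keep
  ON  keep.DROID_V = dup.DROID_V
  AND keep.Sig_V   = dup.Sig_V
  AND keep.SPEED   = dup.SPEED
  AND keep.NAME    = dup.NAME
  AND keep.PUID    = dup.PUID
  AND keep.`INDEX` < dup.`INDEX`;
```

Running the preview first, and taking a copy of the table before the DELETE, makes it easier to confirm that only the intended rows are affected.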