Improving select speed - MySQL - very large tables

Newbie to MySQL and SQL in general - so please be gentle :-)
I have a table with a very high number of rows. The table is:
create table iostat (
pkey int not null auto_increment,
serverid int not null,
datestr char(15) default 'NULL',
esttime int not null default 0,
rs float not null default 0.0,
ws float not null default 0.0,
krs float not null default 0.0,
kws float not null default 0.0,
wait float not null default 0.0,
actv float not null default 0.0,
wsvct float not null default 0.0,
asvct float not null default 0.0,
pctw int not null default 0,
pctb int not null default 0,
device varchar(50),
avgread float not null default 0.0,
avgwrit float not null default 0.0,
primary key (pkey),
index i_serverid (serverid),
index i_esttime (esttime),
index i_datestr (datestr),
index i_rs (rs),
index i_ws (ws),
index i_krs (krs),
index i_kws (kws),
index i_wait (wait),
index i_actv (actv),
index i_wsvct (wsvct),
index i_asvct (asvct),
index i_pctb (pctb),
index i_device (device),
index i_servdate (serverid, datestr),
index i_servest (serverid, esttime)
)
engine = MyISAM
data directory = '${IOSTATdatadir}'
index directory = '${IOSTATindexdir}'
;
Right now the table has 834,317,203 rows.
Yes - I need all the data. The highest level organization of the data is by the collection date (datestr). It is a CHAR instead of a date to preserve the specific date format I use for the various load, extract, and analysis scripts.
Each day adds about 16,000,000 rows.
One of the operations I would like to speed up is (Limit is generally 50 but ranges from 10 to 250):
create table TMP_TopLUNsKRead
select
krs, device, datestr, esttime
from
iostat
where
${WHERECLAUSE}
order by
krs desc limit ${Limit};
WHERECLAUSE is:
serverid = 29 and esttime between X and Y and device like '%t%'
where X and Y are timestamps spanning anywhere from 4 minutes to 24 hours.
I'd prefer not to change the DB engine. MyISAM lets me put data and indexes on separate drives, which gave me a significant overall performance gain. It's also a total of 1.6 billion rows, which would take an insane amount of time to reload.

device like '%t%'
This is the killer. The leading % forces a scan of the whole column (or of the whole index, if the column is indexed) rather than an index lookup. See if you can do without the leading %.
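For illustration (not part of the original answer, and 'sd' is just a made-up device prefix), a pattern anchored at the start is index-friendly while the leading wildcard is not:
-- can use index i_device: a range scan over the 'sd...' prefix
select device from iostat where device like 'sd%';
-- cannot use i_device for lookups: every device value (or index entry) must be examined
select device from iostat where device like '%t%';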

Without knowing what's in your ${WHERECLAUSE} it's impossible to help you. You are correct that this is a huge table.
But here is an observation that might help: A compound covering index on
(krs, device, datestr, esttime)
might speed up the ordering and extraction of your subset of data.
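If you want to try that, the DDL would be something like the sketch below (the index name is mine; building it over 800+ million rows will take considerable time and disk space):
alter table iostat add index i_cover_krs (krs, device, datestr, esttime);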


MySQL - Join row with the next N smaller rows

I have a table:
id  timestamp
1   1
23  2
12  4
45  6
3   7
4   8
I need this result:
major  minor
1      2
1      4
1      6
2      4
2      6
2      7
I need to join each number with the 3 numbers that come right after it in timestamp order (the next 3 larger values). Since these numbers are inserted out of order, I can't use the ids.
Because the numbers are not at regular intervals either, I cannot set a fixed range to find the maximum number to join with.
Solutions I have:
I could create a temp table and use an auto increment id to do this.
I can do this for a single number, and write a script to iterate through the table. This is the query for it (Going with this for now, till something better comes up):
SELECT *
FROM
  (SELECT id major_id, timestamp major_timestamp
   FROM timestamps
   WHERE interval_id = 7
   ORDER BY timestamp DESC
   LIMIT 1) timestamps_major
LEFT JOIN
  (SELECT id minor_id, timestamp minor_timestamp
   FROM timestamps
   WHERE timestamp < (SELECT timestamp
                      FROM timestamps
                      WHERE interval_id = 7
                      ORDER BY timestamp DESC
                      LIMIT 1)
   ORDER BY timestamp DESC
   LIMIT 2) timestamps_minor
ON major_timestamp > minor_timestamp
This just needs to be done for all numbers once, and then once per day to calculate and store a moving average. So speed is not an issue.
Wondering what is the best way to approach this. Thanks.
EDIT:
This is the actual table with timestamps and ids. The example I posted is just simplified for the sake of the question.
CREATE TABLE `timestamps` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`interval_id` tinyint(3) unsigned NOT NULL,
`timestamp` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `interval_timestamp` (`interval_id`,`timestamp`),
KEY `interval_id` (`interval_id`),
KEY `timestamp` (`timestamp`),
CONSTRAINT `timestamps_ibfk_1` FOREIGN KEY (`interval_id`) REFERENCES `intervals` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=75157 DEFAULT CHARSET=latin1
Here's a possible solution:
SELECT *
FROM mytable major inner join mytable minor
ON minor.timestamp > major.timestamp
WHERE (SELECT COUNT(*) FROM mytable m WHERE m.timestamp < minor.timestamp and m.timestamp > major.timestamp) < 3
ORDER BY major.timestamp, minor.timestamp
I'm definitely not confident this is the cleanest solution (and I didn't do anything to handle "ties" for equal timestamps), but it does do what you want so it might be something to build off of at a minimum.
All I am doing is joining the tables then counting the number of rows "between" the major and minor so that I don't get too many.
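As a side note (not from the original answers, and assuming MySQL 8.0+, whose window functions were not available when this was asked), the same result can be sketched with ROW_NUMBER(), joining each row to the three rows that follow it in timestamp order; ties are handled no better than above:
WITH ordered AS (
  SELECT `timestamp`, ROW_NUMBER() OVER (ORDER BY `timestamp`) AS rn
  FROM timestamps
)
SELECT major.`timestamp` AS major, minor.`timestamp` AS minor
FROM ordered AS major
JOIN ordered AS minor
  ON minor.rn BETWEEN major.rn + 1 AND major.rn + 3
ORDER BY major.`timestamp`, minor.`timestamp`;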

MySql Data size [duplicate]

What is the size of an int(11) column in MySQL, in bytes?
And what is the maximum value that can be stored in such a column?
An INT will always be 4 bytes no matter what length is specified.
TINYINT = 1 byte (8 bit)
SMALLINT = 2 bytes (16 bit)
MEDIUMINT = 3 bytes (24 bit)
INT = 4 bytes (32 bit)
BIGINT = 8 bytes (64 bit).
The length just specifies how many characters the mysql command-line client pads to when displaying data. 12345 stored as int(3) will still show as 12345; stored as int(10) it would still display as 12345, but you would have the option of padding it out to the full ten digits. For example, if you added ZEROFILL it would display as 0000012345.
... and the maximum value will be 2147483647 (Signed) or 4294967295 (Unsigned)
INT(x) makes a difference only in terms of display, that is, it shows the number using x digits, and x is not restricted to 11. You pair it with ZEROFILL, which prepends zeros until the value matches your length.
So, for any number of x in INT(x)
if the stored value has fewer digits than x, ZEROFILL will prepend zeros.
INT(5) ZEROFILL with the stored value of 32 will show 00032
INT(5) with the stored value of 32 will show 32
INT with the stored value of 32 will show 32
if the stored value has more digits than x, it will be shown as it is.
INT(3) ZEROFILL with the stored value of 250000 will show 250000
INT(3) with the stored value of 250000 will show 250000
INT with the stored value of 250000 will show 250000
The actual value stored in the database is not affected, the size is still the same, and any calculation behaves normally.
This also applies to BIGINT, MEDIUMINT, SMALLINT, and TINYINT.
According to the documentation, int(11) takes 4 bytes (32 bits) of space, with a maximum value of 2^31 - 1 = 2147483647 and a minimum value of -2^31 = -2147483648. One bit is used for the sign.
As others have said, the minimum/maximum values the column can store and how much storage it takes in bytes are defined only by the type, not the length.
A lot of these answers say that the (11) part only affects the display width, which is mostly, but not exactly, true.
A definition of int(2) with no zerofill specified will:
still accept a value of 100
still display a value of 100 when output (not 0 or 00)
the display width will be the width of the largest value being output from the select query.
The only thing the (2) does is take effect when zerofill is also specified:
a value of 1 will be shown as 01.
When displaying values, the column will always have the width of the maximum possible value the column could take (10 digits for an integer), instead of the minimum width required to display the largest value the column actually needs to show in that specific select query, which could be much smaller.
The column can still take, and show, a value exceeding the length, but such values will not be prefixed with 0s.
The best way to see all the nuances is to run:
CREATE TABLE `mytable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`int1` int(10) NOT NULL,
`int2` int(3) NOT NULL,
`zf1` int(10) ZEROFILL NOT NULL,
`zf2` int(3) ZEROFILL NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `mytable`
(`int1`, `int2`, `zf1`, `zf2`)
VALUES
(10000, 10000, 10000, 10000),
(100, 100, 100, 100);
select * from mytable;
which will output:
+----+-------+-------+------------+-------+
| id | int1  | int2  | zf1        | zf2   |
+----+-------+-------+------------+-------+
|  1 | 10000 | 10000 | 0000010000 | 10000 |
|  2 |   100 |   100 | 0000000100 |   100 |
+----+-------+-------+------------+-------+
This answer is tested against MySQL 5.7.12 for Linux and may or may not vary for other implementations.
What is the size of column of int(11) in mysql in bytes?
(11) - this attribute of the int data type has nothing to do with the size of the column. It is just the display width of the integer data type. From 11.1.4.5, Numeric Type Attributes:
MySQL supports an extension for optionally specifying the display width of integer data types in parentheses following the base keyword for the type. For example, INT(4) specifies an INT with a display width of four digits.
A good explanation for this can be found here
To summarize: the number N in int(N) is often confused with the maximum size allowed for the column, as it would be in the case of varchar(N). But this is not the case with integer data types: the number N in the parentheses is not the maximum size for the column, but simply a parameter telling MySQL what width to display the column at when the table's data is viewed via the MySQL console (when you're using the ZEROFILL attribute).
The number in brackets will tell MySQL how many zeros to pad incoming integers with. For example: If you're using ZEROFILL on a column that is set to INT(5) and the number 78 is inserted, MySQL will pad that value with zeros until the number satisfies the number in brackets. i.e. 78 will become 00078 and 127 will become 00127. To sum it up: The number in brackets is used for display purposes.
In a way, the number in brackets is kind of useless unless you're using the ZEROFILL attribute.
So the size of the int remains the same, i.e. -2147483648 to 2147483647 for signed and 0 to 4294967295 for unsigned (roughly 2.15 billion and 4.29 billion, which is one of the reasons developers remain unaware of the story behind the number N in parentheses: it hardly affects the database unless it contains over 2 billion rows), and in terms of bytes it is 4 bytes.
For more information on Integer Types size/range, refer to MySQL Manual
In MySQL an integer int(11) is 4 bytes in size, which equals 32 bits.
Signed range: -2^31 to 2^31 - 1
= -2147483648 to 2147483647
Unsigned range: 0 to 2^32 - 1
= 0 to 4294967295
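As a quick check of those ranges (the table and values below are mine, and the failing insert assumes strict SQL mode, where out-of-range values raise an error instead of being clamped):
CREATE TABLE range_demo (s INT, u INT UNSIGNED);
INSERT INTO range_demo VALUES (2147483647, 4294967295);  -- both at the top of their ranges: OK
INSERT INTO range_demo VALUES (2147483648, 0);           -- 2^31 is out of range for signed INT: error in strict mode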
Though this answer is unlikely to be seen, I think the following clarification is worth making:
the (n) behind an integer data type in MySQL is specifying the display width
the display width does NOT limit the length of the number returned from a query
the display width DOES control how many zeros are filled in for a zero-filled column, so that the total number of digits matches the display width (so long as the actual number does not exceed the display width, in which case the number is shown as is)
the display width is also meant as a useful tool for developers to know what length the value should be padded to
A BIT OF DETAIL
the display width is, apparently, intended to provide some metadata about how many zeros to display in a zero filled number.
It does NOT actually limit the length of a number returned from a query if that number goes above the display width specified.
To know what range is actually allowed for an integer data type in MySQL, see the manual for the types TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT.
So, having said the above, you can expect the display width to have no effect on the results of a standard query, unless the columns are specified as ZEROFILL columns
OR
in the case where the data is being pulled into an application and that application reads the display width to use for some other sort of padding.
Primary Reference: https://blogs.oracle.com/jsmyth/entry/what_does_the_11_mean
According to this book:
MySQL lets you specify a “width” for integer types, such as INT(11). This is meaningless for most applications: it does not restrict the legal range of values, but simply specifies the number of characters MySQL’s interactive tools will reserve for display purposes. For storage and computational purposes, INT(1) is identical to INT(20).
I think the max value of int(11) is 4294967295.
4294967295 is the answer because int(11) shows a maximum of 11 digits, IMO.

MySQL indexing and Using filesort

This is related to my last problem. I added two new columns to the listings table, one for a composed view count, views_point (incremented every 100 views), and one for the publish date by hour, publishedon_hourly (year-month-day hour only), to make some unique values.
This is my new table:
CREATE TABLE IF NOT EXISTS `listings` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`type` tinyint(1) NOT NULL DEFAULT '1',
`hash` char(32) NOT NULL,
`source_id` int(10) unsigned NOT NULL,
`link` varchar(255) NOT NULL,
`short_link` varchar(255) NOT NULL,
`cat_id` mediumint(5) NOT NULL,
`title` mediumtext NOT NULL,
`description` mediumtext,
`content` mediumtext,
`images` mediumtext,
`videos` mediumtext,
`views` int(10) unsigned NOT NULL DEFAULT '0',
`views_point` int(10) unsigned NOT NULL DEFAULT '0',
`comments` int(11) DEFAULT '0',
`comments_update` int(11) NOT NULL DEFAULT '0',
`editor_id` int(11) NOT NULL DEFAULT '0',
`auther_name` varchar(255) DEFAULT NULL,
`createdby_id` int(10) NOT NULL,
`createdon` int(20) NOT NULL,
`editedby_id` int(10) NOT NULL,
`editedon` int(20) NOT NULL,
`deleted` tinyint(1) NOT NULL,
`deletedon` int(20) NOT NULL,
`deletedby_id` int(10) NOT NULL,
`deletedfor` varchar(255) NOT NULL,
`published` tinyint(1) NOT NULL DEFAULT '1',
`publishedon` int(11) unsigned NOT NULL,
`publishedon_hourly` int(10) unsigned NOT NULL DEFAULT '0',
`publishedby_id` int(10) NOT NULL,
PRIMARY KEY (`id`),
KEY `hash` (`hash`),
KEY `views_point` (`views_point`),
KEY `listings` (`publishedon_hourly`,`published`,`cat_id`,`source_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=FIXED AUTO_INCREMENT=365513 ;
When I run a query like this:
SELECT *
FROM listings
WHERE (`publishedon_hourly` BETWEEN
UNIX_TIMESTAMP( '2015-09-5 00:00:00' )
AND UNIX_TIMESTAMP( '2015-10-5 12:00:00' ))
AND (published =1)
AND cat_id IN ( 1, 2, 3, 4, 5 )
ORDER BY `views_point` DESC
LIMIT 10
It works great; the EXPLAIN shows the views_point index being used.
But when I change the date range from month to day like this:
SELECT *
FROM listings
WHERE (`publishedon_hourly` BETWEEN
UNIX_TIMESTAMP( '2015-09-5 00:00:00' )
AND UNIX_TIMESTAMP( '2015-09-5 12:00:00' ))
AND (published =1)
AND cat_id IN ( 1, 2, 3, 4, 5 )
ORDER BY `views_point` DESC
LIMIT 10
Then the query becomes slow and the filesort appears. Does anyone know the reason, and how can I fix it?
Here is a data sample (from the slow query):
INSERT INTO `listings` (`id`, `type`, `hash`, `source_id`, `link`, `short_link`, `cat_id`, `title`, `description`, `content`, `images`, `videos`, `views`, `views_point`, `comments`, `comments_update`, `editor_id`, `auther_name`, `createdby_id`, `createdon`, `editedby_id`, `editedon`, `deleted`, `deletedon`, `deletedby_id`, `deletedfor`, `published`, `publishedon`, `publishedon_hourly`, `publishedby_id`) VALUES
(94189, 1, '44a46d128ce730c72927b19c445ab26e', 8, 'http://Larkin.com/sapiente-laboriosam-omnis-tempore-aliquam-qui-nobis', '', 5, 'And Alice was more and.', 'So they got settled down again very sadly and quietly, and.', 'Dormouse. ''Fourteenth of March, I think it so quickly that the Gryphon only answered ''Come on!'' and ran the faster, while more and more sounds of broken glass, from which she concluded that it was looking down at them, and then a voice sometimes choked with sobs, to sing this:-- ''Beautiful Soup, so rich and green, Waiting in a natural way. ''I thought you did,'' said the Dormouse, without considering at all what had become of it; and as it.', NULL, '', 200, 19700, 0, 0, 0, 'Max', 0, 1441442729, 0, 0, 0, 0, 0, '', 1, 1441442729, 1441440000, 0),
(19030, 1, '3438f6a555f2ce7fdfe03cee7a52882a', 3, 'http://Romaguera.com/voluptatem-rerum-quia-sed', '', 2, 'Dodo said, ''EVERYBODY.', 'I wish I hadn''t to bring but one; Bill''s got the.', 'I wonder what they''ll do well enough; don''t be particular--Here, Bill! catch hold of this remark, and thought to herself. (Alice had no idea what Latitude or Longitude I''ve got to the confused clamour of the other queer noises, would change to dull reality--the grass would be offended again. ''Mine is a long way. So she went on. ''I do,'' Alice said nothing; she had succeeded in curving it down ''important,'' and some were birds,) ''I suppose so,''.', NULL, '', 800, 19400, 0, 0, 0, 'Antonio', 0, 1441447567, 0, 0, 0, 0, 0, '', 1, 1441447567, 1441447200, 0),
(129247, 4, '87d2029a300d8b4314508786eb620a24', 10, 'http://Ledner.com/', '', 4, 'I ever saw one that.', 'The Cat seemed to be a person of authority among them,.', 'I BEG your pardon!'' she exclaimed in a natural way again. ''I wonder what was the same height as herself; and when she looked down at her feet as the question was evidently meant for her. ''I can tell you my history, and you''ll understand why it is I hate cats and dogs.'' It was all dark overhead; before her was another long passage, and the blades of grass, but she had sat down a very little! Besides, SHE''S she, and I''m sure I have dropped them, I wonder?'' As she said to herself; ''his eyes are so VERY tired of being all alone here!'' As she said to itself ''Then I''ll go round a deal.', NULL, '', 1000, 19100, 0, 0, 0, 'Drake', 0, 1441409756, 0, 0, 0, 0, 0, '', 1, 1441409756, 1441407600, 0),
(264582, 2, '5e44fe417f284f42c3b10bccd9c89b14', 8, 'http://www.Dietrich.info/laboriosam-quae-eaque-aut-dolorem', '', 2, 'Alice asked in a very.', 'THINK; or is it directed to?'' said the Mock Turtle,.', 'I can listen all day to such stuff? Be off, or I''ll have you executed.'' The miserable Hatter dropped his teacup and bread-and-butter, and then unrolled the parchment scroll, and read as follows:-- ''The Queen will hear you! You see, she came upon a little of the players to be lost, as she spoke--fancy CURTSEYING as you''re falling through the wood. ''It''s the stupidest tea-party I.', NULL, '', 800, 18700, 0, 0, 0, 'Kevin', 0, 1441441192, 0, 0, 0, 0, 0, '', 1, 1441441192, 1441440000, 0),
(44798, 1, '567cc77ba88c05a4a805dc667816a30c', 14, 'http://www.Hintz.com/distinctio-nulla-quia-incidunt-facere-reprehenderit-sapiente-sint.html', '', 5, 'The Cat seemed to Alice.', 'And the moral of that is--"Be what you mean,'' said Alice..', 'Alice very politely; but she felt very lonely and low-spirited. In a little faster?" said a sleepy voice behind her. ''Collar that Dormouse,'' the Queen said severely ''Who is it directed to?'' said the Footman, and began staring at the Footman''s head: it just at first, but, after watching it a violent blow underneath her chin: it had no pictures or conversations in it, ''and what is the capital of Paris, and Paris is the same thing, you know.'' ''I DON''T.', NULL, '', 300, 17600, 0, 0, 0, 'Rocio', 0, 1441442557, 0, 0, 0, 0, 0, '', 1, 1441442557, 1441440000, 0),
(184472, 1, 'f852e3ed401c7c72c5a9609687385f65', 14, 'https://www.Schumm.biz/voluptatum-iure-qui-dicta-modi-est', '', 4, 'Alice replied, so.', 'I should have liked teaching it tricks very much, if--if.', 'NEVER come to the Dormouse, not choosing to notice this question, but hurriedly went on, ''What''s your name, child?'' ''My name is Alice, so please your Majesty,'' said Two, in a great thistle, to keep back the wandering hair that WOULD always get into her face. ''Wake up, Alice dear!'' said her sister; ''Why, what a dear quiet thing,'' Alice went on, spreading out the answer to shillings and pence. ''Take off your hat,'' the King had said that day. ''No, no!'' said the Gryphon. ''They can''t have anything to say, she simply bowed, and took the watch and looked at it again: but he could.', NULL, '', 900, 17600, 0, 0, 0, 'Billy', 0, 1441407837, 0, 0, 0, 0, 0, '', 1, 1441407837, 1441407600, 0),
(344246, 2, '09dc73287ff642cfa2c97977dc42bc64', 6, 'http://www.Cole.com/sit-maiores-et-quam-vitae-ut-fugiat', '', 1, 'IS the use of a.', 'And when I learn music.'' ''Ah! that accounts for it,'' said.', 'Gryphon answered, very nearly carried it out loud. ''Thinking again?'' the Duchess by this time.) ''You''re nothing but a pack of cards, after all. I needn''t be so stingy about it, you know--'' ''But, it goes on "THEY ALL RETURNED FROM HIM TO YOU,"'' said Alice. ''Call it what you mean,'' the March Hare, ''that "I breathe when I breathe"!'' ''It IS the same side of WHAT? The other guests had taken his watch out of it, and talking over its head. ''Very uncomfortable for the first to speak. ''What size do you like to go and get.', NULL, '', 600, 16900, 0, 0, 0, 'Enrico', 0, 1441406107, 0, 0, 0, 0, 0, '', 1, 1441406107, 1441404000, 0),
(19169, 1, '116c443b5709e870248c93358f9a328e', 12, 'http://www.Gleason.com/et-vero-optio-exercitationem-aliquid-optio-consectetur', '', 4, 'Let this be a lesson to.', 'Sir, With no jury or judge, would be very likely to eat.', 'I wonder who will put on your head-- Do you think I can find them.'' As she said this, she was quite out of sight before the end of every line: ''Speak roughly to your little boy, And beat him when he sneezes; For he can EVEN finish, if he had never heard of such a subject! Our family always HATED cats: nasty, low, vulgar things! Don''t let him know she liked them best, For this must ever be A secret, kept from all the creatures wouldn''t be so kind,'' Alice replied, so eagerly that the way I want to get very tired of being upset, and their curls got entangled together. Alice was not a regular rule: you invented it just grazed his nose, you know?'' ''It''s the thing Mock Turtle would be only.', NULL, '', 700, 16800, 0, 0, 0, 'Unique', 0, 1441407961, 0, 0, 0, 0, 0, '', 1, 1441407961, 1441407600, 0),
(192679, 1, '06a33747b5c95799034630e578e53dc5', 10, 'http://www.Pouros.com/qui-id-molestias-non-dolores-non', '', 5, 'Rabbit just under the.', 'KNOW IT TO BE TRUE--" that''s the jury-box,'' thought Alice,.', 'Mock Turtle, who looked at Two. Two began in a hoarse, feeble voice: ''I heard every word you fellows were saying.'' ''Tell us a story.'' ''I''m afraid I can''t tell you how it was too dark to see what I should say "With what porpoise?"'' ''Don''t you mean by that?'' said the King; and as it was indeed: she was now more than Alice could not make out exactly what they WILL do next! As for pulling me out of court! Suppress him! Pinch him! Off with his head!"'' ''How dreadfully savage!'' exclaimed Alice. ''That''s the first witness,'' said the Duchess. ''Everything''s got a moral, if only you can find it.'' And she squeezed herself up and ran the faster, while more and more faintly came, carried on the end of every line:.', NULL, '', 800, 15900, 0, 0, 0, 'Gene', 0, 1441414720, 0, 0, 0, 0, 0, '', 1, 1441414720, 1441411200, 0),
(251878, 4, '3eafacc53f86c8492c309ca2772fbfe9', 5, 'http://www.Schinner.info/tempora-et-est-qui-nulla', '', 2, 'NOT!'' cried the Mouse,.', 'Twinkle, twinkle--"'' Here the Queen till she heard the.', 'Alice and all of them even when they hit her; and the sounds will take care of the gloves, and she dropped it hastily, just in time to begin at HIS time of life. The King''s argument was, that she had forgotten the Duchess to play croquet with the Dormouse. ''Write that down,'' the King added in an undertone to the fifth bend, I think?'' ''I had NOT!'' cried the Mouse, sharply and very neatly and simply arranged; the only difficulty was, that if something wasn''t done about it in less than a pig, my dear,'' said Alice, a little wider. ''Come, it''s pleased so far,'' said the Gryphon. ''Do you play croquet with the glass table and the King hastily said, and went by without noticing her. Then followed the Knave ''Turn them over!'' The Knave of.', NULL, '', 500, 15900, 0, 0, 0, 'Demarcus', 0, 1441414681, 0, 0, 0, 0, 0, '', 1, 1441414681, 1441411200, 0);
In your first query, the ORDER BY is done using the views_point INDEX, because it was used in the WHERE part of the query and therefore in MySQL can be used for sorting.
In the second query, MySQL resolves the WHERE part using a different index, listing_pcs. This cannot be used to satisfy the ORDER BY condition. MySQL uses filesort instead, which is the best option if an index cannot be used.
MySQL only uses indexes to sort if the index is the same as that used in the WHERE condition. This is what the manual means by:
In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:
The key used to fetch the rows is not the same as the one used in the ORDER BY:
SELECT * FROM t1 WHERE key2=constant ORDER BY key1;
So what can you do:
Try increasing your sort_buffer_size config option to make filesorting as effective as possible. Large results that are too big for the sort buffer cause MySQL to break the sort down into chunks, which is slower.
Force MySQL to choose a different index. It’s worth noting that different MySQL versions choose default indexes differently. Version 5.1, for example, is pretty bad as the Query Optimizer had been vastly re-written for this release and needed lots of refinement. Version 5.6 is pretty good.
SELECT *
FROM listings
FORCE INDEX (views_point)
WHERE (`publishedon_hourly` BETWEEN
UNIX_TIMESTAMP( '2015-09-5 00:00:00' )
AND UNIX_TIMESTAMP( '2015-09-5 12:00:00' ))
AND (published =1)
AND cat_id IN ( 1, 2, 3, 4, 5 )
ORDER BY `views_point` DESC
LIMIT 10
It seems to be some kind of news database, so think about doing some sort of news archiving every month.
Consider this solution; it's not the best, but it may help.
Add these columns to the listings table:
publishedmonth tinyint(2) UNSIGNED NOT NULL DEFAULT '0'
publishedyear tinyint(2) UNSIGNED NOT NULL DEFAULT '0'
publishedminute mediumint(6) UNSIGNED NOT NULL DEFAULT '0'
Add this index key to the listings table:
ADD KEY published_month (publishedmonth,publishedyear,publishedminute)
During inserts, set these values from PHP code:
publishedmonth will hold date('n')
publishedyear will hold date('y')
publishedminute will hold date('jHi')
Load a huge number of records, then test this query:
SELECT * FROM listings WHERE publishedmonth = 2 AND publishedyear = 17 ORDER BY publishedminute
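In DDL terms, the suggestion above amounts to something like the following sketch (only the columns and key already listed; the single ALTER is my arrangement):
ALTER TABLE listings
  ADD COLUMN publishedmonth TINYINT(2) UNSIGNED NOT NULL DEFAULT '0',
  ADD COLUMN publishedyear TINYINT(2) UNSIGNED NOT NULL DEFAULT '0',
  ADD COLUMN publishedminute MEDIUMINT(6) UNSIGNED NOT NULL DEFAULT '0',
  ADD KEY published_month (publishedmonth, publishedyear, publishedminute);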
The EXPLAIN says listings_pcs, but the SHOW CREATE TABLE does not list that index. Are we missing something?
Don't use SELECT * if you only need a few columns. In particular the TEXT columns will prevent one form of performance speedup during the query.
Subqueries to work out part of the query usually slow things down. However, in your case (lots of MEDIUMTEXT being fetched, and use of LIMIT), it may be efficient to get the ids in a subquery first, then fetch the bulky columns ("lazy eval"). See below.
A range value (publishedon_hourly) is better off last, not first, in an index.
Starting an index with = column (published) is usually best.
The Optimizer chooses, sometimes incorrectly, to focus on the ORDER BY instead of the WHERE. (Neither is very productive in your case).
INDEX(published, views_point) may avoid the sort, while helping some with the WHERE.
Having a flag (published) that is always tested in queries adds to the complexity and inefficiency of the schema.
BETWEEN is inclusive, so the second query is actually scanning 12 hours plus one second.
Splitting a date into year+month+day usually hurts more than helps.
Do not set sort_buffer_size bigger than, say, 1% of RAM. Otherwise, you may encounter other problems.
FORCE INDEX may help today, but then hurt tomorrow when the constants change. Caveat emptor.
It is often better to put "click_count" or "likes" or "upvotes" into a separate table. This separates rapidly changing counters from the bulky, relatively static, data. Hence, there is less interference between the two.
If you do the above, simply remove non-published rows from the counter table, thereby simplifying several things.
Most people vilify the filesort, but it is usually other things that are the villains -- in your case, the number and size of rows.
Please provide EXPLAIN FORMAT=JSON SELECT ...; there may be some interesting clues.
Your findings are odd enough to warrant filing a bug at bugs.mysql.com.
I would add these indexes with the columns in the order given, and see what the Optimizer picks:
INDEX(published, views_point) -- aiming at the ORDER BY, plus picking up '='
INDEX(published, cat_id, publishedon_hourly) -- possibly the best for WHERE
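As concrete DDL (the index names are mine), those two suggestions would look something like:
ALTER TABLE listings
  ADD INDEX idx_pub_points (published, views_point),
  ADD INDEX idx_pub_cat_hour (published, cat_id, publishedon_hourly);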
Or, maybe, the "lazy eval" of
SELECT L.*
FROM listings AS L
JOIN (
SELECT id
FROM listings
WHERE `publishedon_hourly` BETWEEN UNIX_TIMESTAMP(...)
AND UNIX_TIMESTAMP(...)
AND published = 1
AND cat_id IN ( 1, 2, 3, 4, 5 )
ORDER BY `views_point` DESC
LIMIT 10
) AS s ON L.id = s.id
ORDER BY views_point DESC
-- with
INDEX(published, cat_id, publishedon_hourly, views_point, id)
Notes:
The subquery will be "Using Index"; that is, the index is covering.
There will be two filesorts. One is in the subquery, but it works from the index, not the bulky texts. The other sorts only 10 rows, although they are bulky.
Very odd behavior. It is hard to see why views_point would not be used for the sort operation without seeing the data in question. You can try giving MySQL an index hint to use views_point for the sort, like this:
SELECT * FROM listings
USE INDEX FOR ORDER BY (`views_point`)
WHERE
(
`publishedon_hourly` BETWEEN UNIX_TIMESTAMP( '2015-09-5 00:00:00' )
AND UNIX_TIMESTAMP( '2015-09-5 12:00:00' )
)
AND (published =1)
AND cat_id IN ( 1, 2, 3, 4, 5 )
ORDER BY `views_point` DESC LIMIT 10
The query optimizer is not perfect. This is one of those cases where it makes the wrong decision; that happens in some borderline cases. If the data in your table changed even by a small amount, it might use the other index and run the faster query.
If you don't want to wait for that, you can change your listing_pcs index. It includes source_id, which you are not using, so why not replace it with views_point?
KEY `listings` (`publishedon_hourly`,`published`,`views_point`,`cat_id`)
Also, using tinyint(1) is not much use for speed or for saving space; it still takes one full byte. The same goes for mediumint(5): it takes 3 bytes. Consider combining deleted, type, cat_id, and published into one column and putting an index on that one column.

How can I speed up this MySQL query that finds the closest locations to a given latitude/longitude?

I have a zip code table in my database which is used in conjunction with a business table to find businesses matching certain criteria that are closest to a specified zip code. The first thing I do is grab just the latitude and longitude, since they're used in a couple of places on the page. I use:
$zipResult = mysql_fetch_array(mysql_query("SELECT latitude,longitude FROM zipCodes WHERE zipCode='".mysql_real_escape_string($_SESSION['zip'])."' Limit 1"));
$latitude = $zipResult['latitude'];
$longitude = $zipResult['longitude'];
$radius = 100;
$lon1 = $longitude - $radius / abs(cos(deg2rad($latitude))*69);
$lon2 = $longitude + $radius / abs(cos(deg2rad($latitude))*69);
$lat1 = $latitude - ($radius/69);
$lat2 = $latitude + ($radius/69);
From there, I generate the query:
$query2 = "Select * From (SELECT business.*,zipCodes.longitude,zipCodes.latitude,
(3956 * 2 * ASIN ( SQRT (POWER(SIN((zipCodes.latitude - $latitude)*pi()/180 / 2),2) + COS(zipCodes.latitude* pi()/180) * COS($latitude *pi()/180) * POWER(SIN((zipCodes.longitude - $longitude) *pi()/180 / 2), 2) ) )) as distance FROM business INNER JOIN zipCodes ON (business.listZip = zipCodes.zipCode)
Where business.active = 1
And (3958*3.1415926*sqrt((zipCodes.latitude-$latitude)*(zipCodes.latitude-$latitude) + cos(zipCodes.latitude/57.29578)*cos($latitude/57.29578)*(zipCodes.longitude-$longitude)*(zipCodes.longitude-$longitude))/180) <= '$radius'
And zipCodes.longitude between $lon1 and $lon2 and zipCodes.latitude between $lat1 and $lat2
GROUP BY business.id ORDER BY distance) As temp Group By category_id ORDER BY distance LIMIT 18";
Which turns out something like:
Select *
From (SELECT business.*,zipCodes.longitude,zipCodes.latitude, (3956 * 2 * ASIN ( SQRT (POWER(SIN((zipCodes.latitude - 39.056784)*pi()/180 / 2),2) + COS(zipCodes.latitude* pi()/180) * COS(39.056784 *pi()/180) * POWER(SIN((zipCodes.longitude - -84.343573) *pi()/180 / 2), 2) ) )) as distance
FROM business
INNER JOIN zipCodes ON (business.listZip = zipCodes.zipCode)
Where business.active = 1
And (3958*3.1415926*sqrt((zipCodes.latitude-39.056784)*(zipCodes.latitude-39.056784) + cos(zipCodes.latitude/57.29578)*cos(39.056784/57.29578)*(zipCodes.longitude--84.343573)*(zipCodes.longitude--84.343573))/180) <= '100'
And zipCodes.longitude between -86.2099407074 and -82.4772052926
and zipCodes.latitude between 37.6075086377 and 40.5060593623
GROUP BY business.id
ORDER BY distance) As temp
Group By category_id
ORDER BY distance
LIMIT 18
The code runs and executes just fine, but it takes just over a second to complete (usually around 1.1 seconds). However, I've been told that in some browsers the page loads slowly. I have tested this in multiple browsers and multiple versions of those browsers without ever seeing an issue. Still, I figure that if I can get the execution time down it will help either way. The problem is I do not know what else I can do to cut down on the execution time. The zip code table already came with preset indexes which I assume are good (and they contain the columns I'm using in my queries). I've added indexes to the business table as well, though I'm not too knowledgeable about them. But I've made sure to include at least the fields used in the WHERE clause, and maybe a couple more.
If I need to add my indexes to this question just let me know. If you see something in the query itself I can improve also please let me know.
Thanks,
James
EDIT
Table structure for the business table:
CREATE TABLE IF NOT EXISTS `business` (
`id` smallint(6) unsigned NOT NULL AUTO_INCREMENT,
`active` tinyint(3) unsigned NOT NULL,
`featured` enum('yes','no') NOT NULL DEFAULT 'yes',
`topFeatured` tinyint(1) unsigned NOT NULL DEFAULT '0',
`category_id` smallint(5) NOT NULL DEFAULT '0',
`listZip` varchar(12) NOT NULL,
`name` tinytext NOT NULL,
`address` tinytext NOT NULL,
`city` varchar(128) NOT NULL,
`state` varchar(32) NOT NULL DEFAULT '',
`zip` varchar(12) NOT NULL,
`phone` tinytext NOT NULL,
`alt_phone` tinytext NOT NULL,
`website` tinytext NOT NULL,
`logo` tinytext NOT NULL,
`index_logo` tinytext NOT NULL,
`large_image` tinytext NOT NULL,
`description` text NOT NULL,
`views` int(5) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `featured` (`featured`,`topFeatured`,`category_id`,`listZip`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=3085 ;
SQL Fiddle
http://sqlfiddle.com/#!2/2e26ff/1
EDIT 2014-03-26 09:09
I've updated my query, but the shorter query actually takes about .2 seconds longer to execute every time.
Select * From (
SELECT Distinct business.id, business.name, business.large_image, business.logo, business.address, business.city, business.state, business.zip, business.phone, business.alt_phone, business.website, business.description, zipCodes.longitude, zipCodes.latitude, (3956 * 2 * ASIN ( SQRT (POWER(SIN((zipCodes.latitude - 39.056784)*pi()/180 / 2),2) + COS(zipCodes.latitude* pi()/180) * COS(39.056784 *pi()/180) * POWER(SIN((zipCodes.longitude - -84.343573) *pi()/180 / 2), 2) ) )) as distance
FROM business
INNER JOIN zipCodes ON (business.listZip = zipCodes.zipCode)
Where business.active = 1
And zipCodes.longitude between -86.2099407074 and -82.4772052926
And zipCodes.latitude between 37.6075086377 and 40.5060593623
GROUP BY business.category_id
HAVING distance <= '50'
ORDER BY distance
) As temp LIMIT 18
There is already an index on the zip code, latitude, and longitude fields in the zip codes table: all three in one composite index, and each with its own index. That's just how the table came when purchased.
I had updated the listZip data type to match the zip code table's zip data type yesterday.
I did take out the GROUP BY business.id and replace it with DISTINCT, but left the GROUP BY business.category_id because I only want one business per category.
Also, I started getting the 0.2 second execution difference as soon as I changed the query to use the HAVING clause instead of the math formula in the WHERE clause. I did try using WHERE distance <= 50 in the outer query, but that didn't speed anything up either. Also, using 50 miles instead of 100 miles doesn't seem to affect this particular query either.
Thanks for all of the suggestions so far though.
Put indexes on zipCodes.longitude and zipCodes.latitude. That should help a lot.
See here for more information. http://www.plumislandmedia.net/mysql/haversine-mysql-nearest-loc/
Edit: you need an index in the zipCodes table on longitude alone, or one starting with longitude. It looks to me like you should try a composite index on
(longitude, latitude, zipCode)
for best results.
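In DDL form (the index name is mine), that would be something like:
ALTER TABLE zipCodes ADD INDEX idx_lon_lat_zip (longitude, latitude, zipCode);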
Make the data types of zipCodes.zipCode and business.listZip the same, so the join will be more efficient. If those data types are different, MySQL will typecast one to the other as it does the join, and the join will be inefficient. Make sure business.listZip has an index.
You are misusing GROUP BY. (Did you maybe mean SELECT DISTINCT?) It makes no sense unless you also use an aggregate function like MAX(). In a similar vein, see if you can get rid of the * in SELECT business.*, and instead give a list of the columns you need.
100 miles is a very wide search radius. Narrow it a bit to speed things up.
You're computing the great circle distance twice. You surely can recast the query to do it once.
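One way to compute it once (a sketch; the poster's later edit does something similar with HAVING, while here the haversine expression is aliased in a derived table and filtered in the outer query; the literal coordinates are just the example values from above):
SELECT *
FROM (
    SELECT business.id, business.name, zipCodes.latitude, zipCodes.longitude,
           (3956 * 2 * ASIN(SQRT(
               POWER(SIN((zipCodes.latitude - 39.056784) * PI() / 180 / 2), 2)
               + COS(zipCodes.latitude * PI() / 180) * COS(39.056784 * PI() / 180)
                 * POWER(SIN((zipCodes.longitude - -84.343573) * PI() / 180 / 2), 2)
           ))) AS distance
    FROM business
    INNER JOIN zipCodes ON business.listZip = zipCodes.zipCode
    WHERE business.active = 1
      AND zipCodes.longitude BETWEEN -86.2099407074 AND -82.4772052926
      AND zipCodes.latitude BETWEEN 37.6075086377 AND 40.5060593623
) AS with_distance
WHERE distance <= 100
ORDER BY distance
LIMIT 18;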

How to insert floats into MySQL and then query equality

Wowee... does MySQL work with floats or not?!
1) I insert a float into mysql field
price = 0.1
2) I run the below query:
select * from buy_test where price = 0.1
WOW! I get no results
3) I run the below query:
select * from buy_test where price < 0.1
I get no results
4) I run the below query
select * from buy_test where price > 0.1
YAY! I get results, but no... I wanted where price = 0.1
How do I insert a float into MySQL so that I can query a float in MySQL by equality?
Thanks
CREATE TABLE `buy_test` (
`user_id` varchar(45) DEFAULT NULL,
`order_id` varchar(100) NOT NULL,
`price` float DEFAULT NULL,
`insert_time` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`order_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1$$
That's because 0.1 doesn't exist exactly in floating-point arithmetic.
It would take an infinite number of digits to write the exact value of 0.1 in binary (just like it would take an infinite number of digits to write the exact value of 10/3 in decimal).
In your table, you are storing the price with a 'float' type, which is represented on 32 bits. The value 0.1 is rounded to 0.100000001490116119384765625 (which is the nearest representation of 0.1 in the float type format).
When you request all rows where the price is equal to 0.1, I strongly suspect the interpreter uses the double type, or at least a more precise type than float.
But let's assume it's using the 64-bit double type.
In the double type, 0.1 is rounded to 0.1000000000000000055511151231257827021181583404541015625 .
When the engine makes the comparison, it effectively evaluates:
if (0.100000001490116119384765625 ==
0.1000000000000000055511151231257827021181583404541015625) ...
which is obviously false. But the comparison is true for the > operator.
I'm pretty sure that this where clause would work: "where price = 0.100000001490116119384765625"
By the way, when the result of your query tells you that the price is "0.1", it's a lie. The value is rounded to be "beautifully displayed".
There is no real solution to your problem; anyone familiar with floating-point arithmetic problems will discourage you from using equality comparisons on floats.
You can use an epsilon in your query instead.
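For example (a sketch; the tolerance is arbitrary and should be chosen to match the precision you actually need for prices):
select * from buy_test where abs(price - 0.1) < 0.000001;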
There is a very interesting article named "What Every Computer Scientist Should Know About Floating-Point Arithmetic"; you can find it here:
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html