I have a complicated issue, but rather than go into the specifics I have simplified it to the following.
Let's say we are trying to build a system where users can apply for priority levels on various services on a per-zip-code basis. This system would have four tables like so...
CREATE TABLE `zip_code` (
`zip` varchar(7) NOT NULL DEFAULT '',
`lat` float NOT NULL DEFAULT '0',
`long` float NOT NULL DEFAULT '0',
PRIMARY KEY (`zip`,`lat`,`long`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `user` (
`user_id` int(10) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `service` (
`service_id` int(10) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`service_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `service_priority` (
`user_id` int(10) NOT NULL,
`service_id` int(10) NOT NULL,
`zip` varchar(7) NOT NULL,
`priority` tinyint(1) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Now let's also say that we have 45,000 zip codes, a few hundred services and a few thousand users, and that no user can have the same priority level as another user for the same service in the same zip code.
I need a query that, given a particular zip code, radius, service, and user_id, will return the highest available priority level for all other zip codes within that radius for that service.
I would also like to hear any suggestions for restructuring this data.
The problem I see here is that as the user base grows, the service_priority table is going to get huge: in theory 45,000 rows bigger for every user, although in practice probably only 10,000 rows bigger.
What can i do to mitigate these problems?
Switch to InnoDB.
zip_code table should probably have PRIMARY KEY(zip) unless you really want multiple rows for a given zip.
"no user can have the same priority level as another user for the same service in the same zip code" -- can be enforced by
service_priority : UNIQUE(service_id, user_id, zip)
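For reference, a minimal sketch of adding that constraint to the service_priority table above (the key name is illustrative):
ALTER TABLE service_priority
    ADD UNIQUE KEY uq_service_user_zip (service_id, user_id, zip);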
Then your query may look something like
SELECT sp.*
FROM ( SELECT b.zip
FROM ( SELECT lat, `long` FROM zip_code WHERE zip = '$zip' ) AS a
JOIN zip_code AS b
WHERE ... < $radius
) AS z
JOIN service_priority AS sp
WHERE sp.zip = z.zip
AND sp.user_id = $user_id
AND sp.service_id = $service_id
ORDER BY sp.priority DESC
LIMIT 1
Notes:
The index, above, is also tailored for this query.
The innermost query gets the one lat/lng for the center point.
The middle query focuses on finding the nearby zips. See the tag I added to find many questions discussing how to do that; a sketch of one approach follows these notes.
The outer query then filters results based on user and service.
Finally, the highest priority row is picked.
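The "... < $radius" condition is deliberately left open above; as one common illustration, the middle query could use the haversine great-circle formula. A minimal sketch, assuming the zip_code columns above and a radius in miles (3959 is the Earth's radius in miles; use 6371 for kilometres):
-- haversine distance between the centre point (a) and each candidate zip (b)
SELECT b.zip
FROM ( SELECT lat, `long` FROM zip_code WHERE zip = '$zip' ) AS a
JOIN zip_code AS b
WHERE 3959 * 2 * ASIN(SQRT(
        POWER(SIN(RADIANS(b.lat - a.lat) / 2), 2)
      + COS(RADIANS(a.lat)) * COS(RADIANS(b.lat))
      * POWER(SIN(RADIANS(b.`long` - a.`long`) / 2), 2)
      )) < $radius
In practice a bounding box on lat/`long` (which can use an index) is usually added in front of this so the formula is not evaluated for all 45,000 rows.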
I currently am doing this to get some data from our table:
SELECT DISTINCT(CategoryID),Distance FROM glinks_DistancesForTowns WHERE LinkID = $linkID ORDER BY Distance LIMIT 20
I'm iterating over that for every link ID we have (50k odd). Then I'm processing them in Perl with:
my @cats;
while (my ($catid, $distance) = $sth->fetchrow) {
    push @cats, $catid;
}
I'm trying to see if there is a better way to do this with a sub-query in MySQL, vs. doing 50k smaller queries (i.e. one per link).
The basic structure of the table is:
glinks_Links
ID
glinks_DistancesForTowns
LinkID
CategoryID
Distance
I'm sure there must be a simple way to do it - but I'm just not seeing it.
As requested, here is a dump of the table structure. It's actually more complex than that, but the other fields just hold values, so I've taken those bits out to give a cleaner overview of the structure:
CREATE TABLE `glinks_DistancesForTowns` (
`LinkID` int(11) DEFAULT NULL,
`CategoryID` int(11) DEFAULT NULL,
`Distance` float DEFAULT NULL,
`isPaid` int(11) DEFAULT NULL,
KEY `LinkID` (`LinkID`),
KEY `CategoryID` (`CategoryID`,`isPaid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
CREATE TABLE `glinks_Links` (
`ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`Title` varchar(100) NOT NULL DEFAULT '',
`URL` varchar(255) NOT NULL DEFAULT 'http://',
PRIMARY KEY (`ID`),
KEY `booking_hotel_id_fk` (`booking_hotel_id_fk`)
) ENGINE=MyISAM AUTO_INCREMENT=617547 DEFAULT CHARSET=latin1
This is the kind of thing I'm hoping for:
SELECT glinks_Links.ID FROM glinks_Links as links, glinks_DistancesForTowns as distance (
SELECT DISTINCT(CategoryID),Distance FROM distance WHERE distance.LinkID = links.ID ORDER BY Distance LIMIT 20
)
But obviously that doesn't work;)
It sounds like you want the top 20 towns by distance for each link, right?
MySQL 8.0 supports window functions, and this would be the way to write the query:
WITH cte AS (
SELECT l.ID, ROW_NUMBER() OVER(PARTITION BY l.ID ORDER BY d.Distance) AS rownum
FROM glinks_Links as l
JOIN glinks_DistancesForTowns AS d ON d.LinkID = l.ID
) SELECT ID FROM cte WHERE rownum <= 20;
Versions older than 8.0 do not support these features of SQL, so you have to get creative with user-defined variables or self-joins. See for example my answer to How to SELECT the newest four items per category?
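If you also need the category and distance for each of those rows (as the original per-link query returned), the same pattern carries them through. A minimal sketch, assuming the columns above; note that it does not apply the DISTINCT(CategoryID) used in the per-link query:
WITH cte AS (
  SELECT l.ID, d.CategoryID, d.Distance,
         ROW_NUMBER() OVER (PARTITION BY l.ID ORDER BY d.Distance) AS rownum
  FROM glinks_Links AS l
  JOIN glinks_DistancesForTowns AS d ON d.LinkID = l.ID
)
SELECT ID, CategoryID, Distance
FROM cte
WHERE rownum <= 20;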
I've been working on a small Perl program that works with a table of articles, displaying them to the user if they have not been already read. It has been working nicely and it has been quite speedy, overall. However, this afternoon, the performance has degraded from fast enough that I wasn't worried about optimizing the query to a glacial 3-4 seconds per query. To select articles, I present this query:
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
WHERE ciid NOT IN (
    SELECT ciid
    FROM `uninet_channelitem_read`
    WHERE uid = '1030'
)
AND ( cid = 117 OR cid = 308 OR cid = 310 )
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0, 100
The list of possible cid's varies and could be quite a bit more. In any case, I noted that about 2-3 seconds of the total time to make the query is devoted to "ORDER BY." If I remove that, it only takes about a half second to give me the query back. If I drop the subquery, the performance goes back to normal... but the subquery didn't seem to be problematic until just this afternoon, after working fine for a week or so.
Any ideas what could be slowing it down so much? What might I do to try to get the performance back up to snuff? The table being queried has 45,000 rows. The subquery's table has fewer than 3,000 rows at present.
Update: Incidentally, if anyone has suggestions on how to do multiple queries or some other technique that would be more efficient to accomplish what I am trying to do, I am all ears. I'm really puzzled how to solve the problem at this point. Can I somehow apply the order by before the join to make it apply to the real table and not the derived table? Would that be more efficient?
Here is the latest version of the query, derived from suggestions from @Gordon, below:
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
LEFT JOIN (
    SELECT ciid, dateRead
    FROM `uninet_channelitem_read`
    WHERE uid = '1030'
) alreadyRead ON channelitem.ciid = alreadyRead.ciid
WHERE alreadyRead.ciid IS NULL
AND `cid` IN ( 6648, 329, 323, 6654, 6647 )
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0, 100
Also, I should mention what my db structure looks like with regards to these two tables -- maybe someone can spot something odd about the structure:
CREATE TABLE IF NOT EXISTS `channelitem` (
`newsversion` int(11) NOT NULL DEFAULT '0',
`cid` int(11) NOT NULL DEFAULT '0',
`ciid` int(11) NOT NULL AUTO_INCREMENT,
`description` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`url` varchar(222) DEFAULT NULL,
`creationdate` datetime DEFAULT NULL,
`urgent` varchar(10) DEFAULT NULL,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`lastchanged` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`author` varchar(255) NOT NULL,
PRIMARY KEY (`ciid`),
KEY `newsversion` (`newsversion`),
KEY `cid` (`cid`),
KEY `creationdate` (`creationdate`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1638554365 ;
CREATE TABLE IF NOT EXISTS `uninet_channelitem_read` (
`ciid` int(11) NOT NULL,
`uid` int(11) NOT NULL,
`dateRead` datetime NOT NULL,
PRIMARY KEY (`ciid`,`uid`),
KEY `ciid` (`ciid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
It never hurts to try the left outer join version of such a query:
SELECT ci.ciid, ci.cid, ci.name, ci.description, ci.url, ci.creationdate, ci.author
FROM `channelitem` ci left outer join
(SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
) cr
on ci.ciid = cr.ciid
where cr.ciid is null and
ci.cid in (117, 308, 310)
ORDER BY ci.`creationdate` DESC
LIMIT 0 , 100
This query will be faster with an index on uninet_channelitem_read(ciid) and probably on channelitem(cid, ciid, creationdate).
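The read table already has KEY `ciid` in the schema above; a sketch of adding the composite index on channelitem (the index name is illustrative):
ALTER TABLE channelitem ADD INDEX idx_cid_ciid_creationdate (cid, ciid, creationdate);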
The problem could be that you need to create an index on the channelitem table for the column creationdate. Indexes help a database to run queries faster. Here is a link about MySQL Indexing
As a simplified example, imagine that I'm selling widgets. I sell them nationwide (in both the U.S. and Canada) but there are some that can only be sold in certain areas (one or more U.S. states or Canadian provinces).
I'd like a good way to store this information, coupled with a fast way to query for the widgets that are available to a given user. "U.S., 50 states and D.C." is the most common value, so I'd rather not insert 51 rows.
MySQL doesn't support bitmap indexes, so that's ruled out.
Here are some combinations:
U.S. 50 states and D.C.
U.S. 50 states, D.C., Canada, but not Quebec.
U.S. 48 contiguous states and D.C.
U.S., D.C., but not Colorado
U.S., D.C., and territories (Puerto Rico, etc).
My user will have given me one value for their state/province and country.
Can you suggest a schema that provides good storage and fast matching?
Thanks!
You should build predefined sets of values and store a reference to the set on each item.
Given a value, you retrieve the matching sets and then the matching items.
CREATE TABLE `valuesets` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(50) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `valueset_items` (
`valueset_id` int(11) unsigned NOT NULL,
`value` varchar(20) NOT NULL DEFAULT '',
PRIMARY KEY (`valueset_id`,`value`),
CONSTRAINT `fk_valueset_items_valueset` FOREIGN KEY (`valueset_id`) REFERENCES `valuesets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `items` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(50) NOT NULL DEFAULT '',
`valueset_id` int(11) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `fk_items_valueset` (`valueset_id`),
CONSTRAINT `fk_items_valueset` FOREIGN KEY (`valueset_id`) REFERENCES `valuesets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
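As an illustration (the id values and names are made up), a nationwide set is stored once and shared by every item that uses it, so each new widget adds only a single row to items:
INSERT INTO valuesets (id, name) VALUES (1, 'US 50 states and DC');
INSERT INTO valueset_items (valueset_id, value)
    VALUES (1, 'AL'), (1, 'AK'), (1, 'AZ'); -- ...and so on, one row per state in the set
INSERT INTO items (name, valueset_id) VALUES ('Widget A', 1);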
To select all items that match a given value:
SELECT *
FROM items
WHERE
valueset_id IN ( SELECT valueset_id
FROM valueset_items
WHERE `value` = 'A' )
SQL Fiddle DEMO
This is a job for the MySQL SET type, assuming that you can keep your value list down to 64 items (or use multiple sets, split on other conditions).
I thought I would expand on my answer, because I think some people just don't understand the power of the set. Example table:
CREATE TABLE `Test` (
`setid` int(10) unsigned NOT NULL AUTO_INCREMENT,
`setname` varchar(64) NOT NULL,
`setstate` set('AK','AL','AR','AZ','CA','CO','CT','DC','DE','FL','GA','HI','IA','ID','IL','IN','KS','KY','LA','MA','MD','ME','MI','MN','MO','MS','MT','NC','ND','NE','NH','NJ','NM','NV','NY','OH','OK','OR','PA','RI','SC','SD','TN','TX','UT','VA','VT','WA','WI','WV','WY') NOT NULL,
PRIMARY KEY (`setid`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
insert into `Test` values('1','test','AZ,CA,NJ,NM,NY,VA,VT');
Note that we use a single set field for states. More complex uses will likely require multiple sets, but the slightly more horizontal qword per record may be cheaper than adding a large number of extra join operations against a lookup table that could easily reach a huge number of records on its own.
Below are 3 (functionally) equivalent pulls. Note that the bitmask is very much the fastest way to pull this data:
SELECT * FROM Test WHERE setstate & 8;
For test #1, we use 8 (binary 1000) as the bitmask, because AZ is item #4 in our list and its bit value is 2^(4-1) = 8. This is, by far, the fastest method... and there are few ways to store this data which will give you faster result potential.
SELECT * FROM Test WHERE setstate LIKE '%AZ%';
This method will be somewhat slow because the leading wildcard forces a fuzzy match that cannot use an index range scan.
SELECT * FROM Test WHERE FIND_IN_SET('AZ',setstate);
This method will be faster than the fuzzy match, but its nature will pretty much require the use of a temporary table in most real-world uses.
I am working on a way to record click times for each user on my website.
I currently have 600,000+ records, and I am trying to think of a way to go about this.
CREATE TABLE IF NOT EXISTS `clicktime` (
`id` int(5) NOT NULL AUTO_INCREMENT,
`page` int(11) DEFAULT NULL,
`user` varchar(20) DEFAULT NULL,
`time` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=686277 ;
I will have to do ten of these searches per page. My blog shows a snippet of ten pages at once.
SELECT time
FROM clicktime
WHERE `page` = '112'
AND `user` = 'admin'
ORDER BY `id` ASC LIMIT 1
The part that seems to be getting me is the WHERE `page` = '112'.
How can I make this work faster, it is taking up to 3 seconds to pull each call?
Though there are multiple things that could be better here (the time being a bigint, for instance), the thing that will help you in the short term is simply to add an index on your user field.
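A sketch of what that could look like (the index names are illustrative); a composite index covering both columns in the query's WHERE clause would serve it even better:
ALTER TABLE clicktime ADD INDEX idx_user (`user`);
-- or, covering both predicates of the query above:
ALTER TABLE clicktime ADD INDEX idx_page_user (`page`, `user`);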
So I've got a table with all users and their values, and I want to order them by how much "money" they have. The problem is that their money is held in two separate fields: users.money and users.bank.
So this is my table structure:
CREATE TABLE IF NOT EXISTS `users` (
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(54) COLLATE utf8_swedish_ci NOT NULL,
`money` bigint(54) NOT NULL DEFAULT '10000',
`bank` bigint(54) NOT NULL DEFAULT '10000',
PRIMARY KEY (`id`),
KEY `users_all_money` (`money`,`bank`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci AUTO_INCREMENT=100 ;
And this is the query:
SELECT id, (money+bank) AS total FROM users FORCE INDEX (users_all_money) ORDER BY total DESC
Which works fine, but when I run EXPLAIN it shows "Using filesort", and I'm wondering if there is any way to optimize it?
Because you want to sort by a derived value (one that must be calculated for each row), MySQL can't use the index to help with the ordering.
The only solution that I can see would be to create an additional total_money (or similar) column and, as you update money or bank, update that value too. You could do this in your application code, or it would be possible to do it in MySQL with triggers if you wanted.
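A minimal sketch of the trigger approach, assuming the users table above (the column, index, and trigger names are illustrative):
ALTER TABLE users
    ADD COLUMN total_money bigint(54) NOT NULL DEFAULT '20000',
    ADD INDEX users_total_money (total_money);
-- backfill existing rows
UPDATE users SET total_money = money + bank;
-- keep the column in sync on every write
CREATE TRIGGER users_total_ins BEFORE INSERT ON users
    FOR EACH ROW SET NEW.total_money = NEW.money + NEW.bank;
CREATE TRIGGER users_total_upd BEFORE UPDATE ON users
    FOR EACH ROW SET NEW.total_money = NEW.money + NEW.bank;
The ORDER BY can then read the indexed column directly, avoiding the filesort:
SELECT id, total_money AS total FROM users ORDER BY total_money DESC;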