Working with millions of MySQL rows

I am working on a way to record click times for each user on my website.
I currently have 600,000+ records and am still trying to work out the best way to go about this.
CREATE TABLE IF NOT EXISTS `clicktime` (
`id` int(5) NOT NULL AUTO_INCREMENT,
`page` int(11) DEFAULT NULL,
`user` varchar(20) DEFAULT NULL,
`time` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=686277 ;
I will have to do ten of these searches per page. My blog shows a snippet of ten pages at once.
SELECT time
FROM clicktime
WHERE `page` = '112'
AND `user` = 'admin'
ORDER BY `id` ASC LIMIT 1
The part that seems to be hurting me is the WHERE `page` = '112' condition.
How can I make this faster? Each call takes up to 3 seconds.

There are several things that could be better here (the time being a bigint, for instance), but the thing that will help you in the short term is simply to add an index on your user field.
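As a minimal sketch of that suggestion: the answer only mentions the user field, so the composite index on (user, page) and the index name idx_user_page below are assumptions chosen to match both conditions in the posted WHERE clause. Running EXPLAIN afterwards should show whether MySQL actually picks the index.

```sql
-- Assumption: idx_user_page is a hypothetical name; the answer itself only
-- suggests indexing `user`. Adding `page` covers both WHERE conditions.
ALTER TABLE clicktime ADD INDEX idx_user_page (`user`, `page`);

-- Check the plan: the `key` column should list idx_user_page if it is used.
EXPLAIN
SELECT `time`
FROM clicktime
WHERE `page` = 112
  AND `user` = 'admin'
ORDER BY `id` ASC
LIMIT 1;
```

Comparing `page` against the integer 112 instead of the string '112' keeps the comparison in the column's own type.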

Related

Mysql Very Slow Performance Using Order by Clause

I have a table with millions of entries. Below is the table structure.
CREATE TABLE `useractivity` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`userid` bigint(20) NOT NULL,
`likes` bigint(20) DEFAULT NULL,
`views` bigint(20) DEFAULT NULL,
`shares` bigint(20) DEFAULT NULL,
`totalcount` bigint(20) DEFAULT NULL,
`status` bigint(20) DEFAULT NULL,
`createdat` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `userid` (`userid`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And below is the query that is performing slowly.
SELECT userid,
(sum(likes)+SUM(views)+SUM(shares)+SUM(totalcount)+SUM(`status`)) as total
from useractivity
GROUP BY userid
ORDER BY total DESC
limit 0, 20;
When I run the above query without ORDER BY, it returns the result set quickly. With ORDER BY, the query becomes slow, even though I use LIMIT for pagination.
What can I do to speed up this query?

You can't speed up the query as it is: MySQL needs to visit every single row and calculate the sums before sorting and finally returning the first rows. That is bound to take time. You can probably cheat, though.
The most obvious approach would be to create a summary table with userid and total. Update it when the base table changes or recompute it regularly, whatever makes sense. In that table you can index total, which makes the query trivial.
Another option may be to find the top users. Most sites have users that are more active than the others. Keep the 1000 top users in a separate table, then use the same select but only for the top users (i.e. join with that table). Only the useractivity rows for the top users need to be visited, which should be fast. If 1000 users are not enough perhaps 10000 works.
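The summary-table idea could be sketched roughly as follows. The table name useractivity_totals and the index name idx_total are hypothetical, and the periodic REPLACE is just one of the refresh strategies the answer leaves open. Note that COALESCE is an addition here: in the original query, a user whose values in any one column are all NULL gets a NULL total.

```sql
-- Assumption: useractivity_totals and idx_total are hypothetical names.
CREATE TABLE useractivity_totals (
  userid BIGINT NOT NULL,
  total  BIGINT NOT NULL DEFAULT 0,
  PRIMARY KEY (userid),
  KEY idx_total (total)
) ENGINE=InnoDB;

-- Recompute the totals (run on a schedule, or after bulk changes).
REPLACE INTO useractivity_totals (userid, total)
SELECT userid,
       COALESCE(SUM(likes), 0) + COALESCE(SUM(views), 0)
     + COALESCE(SUM(shares), 0) + COALESCE(SUM(totalcount), 0)
     + COALESCE(SUM(`status`), 0)
FROM useractivity
GROUP BY userid;

-- The paginated query can then read the first rows in index order.
SELECT userid, total
FROM useractivity_totals
ORDER BY total DESC
LIMIT 0, 20;
```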

How to store/query users meta-data by zip code

I have a complicated issue, but rather than go into the specifics I have simplified it to the following.
Let's say we are trying to build a system where users can apply for priority levels on various services on a per-zip-code basis. This system would have four tables like so...
CREATE TABLE `zip_code` (
`zip` varchar(7) NOT NULL DEFAULT '',
`lat` float NOT NULL DEFAULT '0',
`long` float NOT NULL DEFAULT '0',
PRIMARY KEY (`zip`,`lat`,`long`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `user` (
`user_id` int(10) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `service` (
`service_id` int(10) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`service_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `service_priority` (
`user_id` int(10) NOT NULL,
`service_id` int(10) NOT NULL,
`zip` varchar(7) NOT NULL,
`priority` tinyint(1) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Now let's also say that we have 45000 zip codes, a few hundred services, and a few thousand users, and that no user can have the same priority level as another user for the same service in the same zip code.
I need a query that, given a particular zip code, radius, service, and user_id, will return the highest available priority level for all other zip codes within that radius for that service.
I would also like to hear any suggestions for restructuring this data.
The problem I see here is that as the user base grows, the service_priority table is going to get huge: in theory 45000 rows bigger for every user, although in practice probably only 10000 rows bigger.
What can I do to mitigate these problems?

Switch to InnoDB.
The zip_code table should probably have PRIMARY KEY(zip), unless you really want multiple rows for a given zip.
"no user can have the same priority level as another user for the same service in the same zip code": add this index
service_priority : UNIQUE(service_id, user_id, zip)
It allows only one row per user, service, and zip. Enforcing the quoted rule literally would also need UNIQUE(service_id, zip, priority).
Then your query may look something like
SELECT sp.*
FROM ( SELECT b.zip
FROM ( SELECT lat, `long` FROM zip_code WHERE zip = '$zip' ) AS a
JOIN zip_code AS b
WHERE ... < $radius
) AS z
JOIN service_priority AS sp
WHERE sp.zip = z.zip
AND sp.user_id = $user_id
AND sp.service_id = $service_id
ORDER BY sp.priority DESC
LIMIT 1
Notes:
The index, above, is also tailored for this query.
The innermost query gets the one lat/lng for the center point.
The middle query focuses on finding the nearby zips. Many existing questions discuss how to do that distance search.
The outer query then filters results based on user and service.
Finally, the highest priority row is picked.

Optimize MySQL count query with JOIN

I have a query that takes about 20 seconds, and I would like to understand whether there is a way to optimize it.
Table 1:
CREATE TABLE IF NOT EXISTS `sessions` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=9845765 ;
And table 2:
CREATE TABLE IF NOT EXISTS `access` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`session_id` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `session_id` (`session_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=9467799 ;
Now, what I am trying to do is count all the access rows linked to all of one user's sessions, so my query is:
SELECT COUNT(*)
FROM access
INNER JOIN sessions ON access.session_id=sessions.id
WHERE sessions.user_id='6';
It takes almost 20 seconds, and for user_id 6 there are about 3 million sessions stored.
Is there anything I can do to optimize that query?

Change this line in the sessions table:
KEY `user_id` (`user_id`)
To this:
KEY `user_id` (`user_id`, `id`)
This allows the query to be completed from the index, without going back to the raw table. As it is, you need to do an index scan on the sessions table for your user_id, and for each item go back to the table to find the id for the join to the access table. By including the id in the index, you can skip going back to the table.
Sadly, this will make your inserts into that table slower, and that may be a big deal, given that just one user has 3 million sessions. SQL Server and Oracle would address this by letting you include the id column in the index without actually indexing on it, which saves a little work at insert time, and by letting you specify a lower fill factor for the index, which reduces the need to rebuild or reorder the index on insert. MySQL doesn't support these.
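One way to apply the index change described above to an existing table is sketched below, assuming the index is still named user_id as in the posted schema. EXPLAIN can then be used to check whether the sessions side of the join is resolved from the index alone (typically shown as "Using index" in the Extra column).

```sql
-- Replace the single-column index with one that also carries id.
ALTER TABLE sessions
  DROP INDEX user_id,
  ADD INDEX user_id (user_id, id);

-- Inspect the plan for the count query from the question.
EXPLAIN
SELECT COUNT(*)
FROM access
INNER JOIN sessions ON access.session_id = sessions.id
WHERE sessions.user_id = 6;
```

On a table with millions of rows the ALTER TABLE itself can take a while, so it is probably best tried on a copy first.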

Ordering in MySQL Bogs Down

I've been working on a small Perl program that works with a table of articles, displaying them to the user if they have not been already read. It has been working nicely and it has been quite speedy, overall. However, this afternoon, the performance has degraded from fast enough that I wasn't worried about optimizing the query to a glacial 3-4 seconds per query. To select articles, I present this query:
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
WHERE ciid NOT
IN (
SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)
AND (
cid =117
OR cid =308
OR cid =310
)
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
The list of possible cid's varies and could be quite a bit more. In any case, I noted that about 2-3 seconds of the total time to make the query is devoted to "ORDER BY." If I remove that, it only takes about a half second to give me the query back. If I drop the subquery, the performance goes back to normal... but the subquery didn't seem to be problematic until just this afternoon, after working fine for a week or so.
Any ideas what could be slowing it down so much? What might I do to try to get the performance back up to snuff? The table being queried has 45,000 rows. The subquery's table has fewer than 3,000 rows at present.
Update: Incidentally, if anyone has suggestions on how to do multiple queries or some other technique that would be more efficient to accomplish what I am trying to do, I am all ears. I'm really puzzled how to solve the problem at this point. Can I somehow apply the order by before the join to make it apply to the real table and not the derived table? Would that be more efficient?
Here is the latest version of the query, derived from suggestions from @Gordon, below:
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
LEFT JOIN (
SELECT ciid, dateRead
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)alreadyRead ON channelitem.ciid = alreadyRead.ciid
WHERE (
alreadyRead.ciid IS NULL
)
AND `cid`
IN ( 6648, 329, 323, 6654, 6647 )
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
Also, I should mention what my db structure looks like with regards to these two tables -- maybe someone can spot something odd about the structure:
CREATE TABLE IF NOT EXISTS `channelitem` (
`newsversion` int(11) NOT NULL DEFAULT '0',
`cid` int(11) NOT NULL DEFAULT '0',
`ciid` int(11) NOT NULL AUTO_INCREMENT,
`description` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`url` varchar(222) DEFAULT NULL,
`creationdate` datetime DEFAULT NULL,
`urgent` varchar(10) DEFAULT NULL,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`lastchanged` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`author` varchar(255) NOT NULL,
PRIMARY KEY (`ciid`),
KEY `newsversion` (`newsversion`),
KEY `cid` (`cid`),
KEY `creationdate` (`creationdate`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1638554365 ;
CREATE TABLE IF NOT EXISTS `uninet_channelitem_read` (
`ciid` int(11) NOT NULL,
`uid` int(11) NOT NULL,
`dateRead` datetime NOT NULL,
PRIMARY KEY (`ciid`,`uid`),
KEY `ciid` (`ciid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

It never hurts to try the left outer join version of such a query:
SELECT ci.ciid, ci.cid, ci.name, ci.description, ci.url, ci.creationdate, ci.author
FROM `channelitem` ci left outer join
(SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
) cr
on ci.ciid = cr.ciid
where cr.ciid is null and
ci.cid in (117, 308, 310)
ORDER BY ci.`creationdate` DESC
LIMIT 0 , 100
This query will be faster with an index on uninet_channelitem_read(ciid) and probably on channelitem(cid, ciid, creationdate).
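A rough sketch of those indexes follows. The index names are hypothetical. The posted schema already has KEY `ciid` on uninet_channelitem_read, so nothing new is needed for that part; the extra index led by uid is an assumption that goes beyond the answer, added because the subquery filters on uid.

```sql
-- Composite index on channelitem, as suggested in the answer
-- (idx_cid_ciid_creationdate is a hypothetical name).
ALTER TABLE channelitem
  ADD INDEX idx_cid_ciid_creationdate (cid, ciid, creationdate);

-- Assumption, not from the answer: an index led by uid lets the
-- "WHERE uid = 1030" subquery find its rows without scanning the table.
ALTER TABLE uninet_channelitem_read
  ADD INDEX idx_uid_ciid (uid, ciid);
```

Running EXPLAIN on the query before and after should show whether either index is actually chosen.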

The problem could be that you need to create an index on the channelitem table for the column creationdate. Indexes help a database run queries faster; the MySQL documentation on indexing covers the details.

Order by two fields - Indexing

So I've got a table with all users and their values, and I want to order them by how much "money" they have. The problem is that they have money in two separate fields: users.money and users.bank.
So this is my table structure:
CREATE TABLE IF NOT EXISTS `users` (
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(54) COLLATE utf8_swedish_ci NOT NULL,
`money` bigint(54) NOT NULL DEFAULT '10000',
`bank` bigint(54) NOT NULL DEFAULT '10000',
PRIMARY KEY (`id`),
KEY `users_all_money` (`money`,`bank`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci AUTO_INCREMENT=100 ;
And this is the query:
SELECT id, (money+bank) AS total FROM users FORCE INDEX (users_all_money) ORDER BY total DESC
Which works fine, but when I run EXPLAIN it shows "Using filesort", and I'm wondering if there is any way to optimize it?

Because you want to sort by a derived value (one that must be calculated for each row), MySQL can't use the index to help with the ordering.
The only solution that I can see would be to create an additional total_money or similar column and update that value whenever you update money or bank. You could do this in your application code, or in MySQL with triggers if you prefer.
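The trigger variant might look roughly like this. The column name total_money comes from the answer; the index and trigger names are hypothetical. Whether the optimizer then avoids the filesort depends on the MySQL version and the query, so it is worth confirming with EXPLAIN.

```sql
-- Add the stored total and an index on it
-- (users_total_money is a hypothetical index name).
ALTER TABLE users
  ADD COLUMN total_money BIGINT NOT NULL DEFAULT 0,
  ADD INDEX users_total_money (total_money);

-- Backfill existing rows once.
UPDATE users SET total_money = money + bank;

-- Keep the column in sync on every insert and update.
CREATE TRIGGER users_total_before_insert
BEFORE INSERT ON users
FOR EACH ROW
SET NEW.total_money = NEW.money + NEW.bank;

CREATE TRIGGER users_total_before_update
BEFORE UPDATE ON users
FOR EACH ROW
SET NEW.total_money = NEW.money + NEW.bank;

-- Sort on the stored column instead of the computed expression.
SELECT id, total_money AS total
FROM users
ORDER BY total_money DESC;
```

Because each trigger body is a single statement, no DELIMITER change is needed when running this from the mysql client.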
