I have to update the view count of the current post. The posts table holds more than 2 million rows, and the loading time of the page is slow.
Tables:
idpost | iduser | views | title
-------+--------+-------+--------------
1      | 5675   | 45645 | some title
2      | 345    | 457   | some title 2
6      | 45     | 98    | some title 3
and many more... up to 2 million rows
iduser has an index, and idpost is the primary key.
If I separate the data into a new table post_views and use a LEFT JOIN to get the views value, it will be fast at first while the new table is still small, but over time it too will grow past 2 million rows and become slow again. How do you deal with huge tables?
Split the table
You should split the table to separate different concerns and avoid repeating the title data. This is a better design. I suggest the following schema:
posts(idpost, title)
post_views(idpost, iduser, views)
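A minimal sketch of that layout as DDL (the column types here are assumptions, not from the question):

CREATE TABLE posts (
    idpost INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    title  VARCHAR(255) NOT NULL
);

CREATE TABLE post_views (
    idpost INT UNSIGNED NOT NULL,
    iduser INT UNSIGNED NOT NULL,
    views  INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (idpost, iduser)  -- one row per user per post; keeps each update single-row
);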
Updating views count
You only ever update one row at a time: someone views your page, and you update the related row. A single-row update, located via the key and index, has no search overhead, so I don't see how this would become a bottleneck.
Getting total views
You probably run a query like this one:
SELECT SUM(views) FROM post_views WHERE idpost = 100
Yes, this can add overhead. One solution is to create a new table total_post_views and update the corresponding value in it after each update on post_views. That way you get rid of the LEFT JOIN and read the total view count directly.
But updating it on every change also adds overhead. To improve performance, you can give up updating total_post_views after each update on post_views. If you choose this route, you can perform the update:
periodically, say every 30 seconds, or
after a certain number of updates to post_views, say every 30 updates.
This way you will get approximate results, of course. If that is tolerable, then I suggest you go this way.
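As a sketch of this approach (total_post_views comes from above; the column names and the REPLACE-based refresh are my assumptions):

CREATE TABLE total_post_views (
    idpost      INT UNSIGNED PRIMARY KEY,
    total_views INT UNSIGNED NOT NULL DEFAULT 0
);

-- run periodically (e.g. every 30 seconds) instead of after every single view
REPLACE INTO total_post_views (idpost, total_views)
SELECT idpost, SUM(views)
FROM post_views
GROUP BY idpost;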
Related
I have a table which has 3 columns:
ID, name, and class
Class can hold one of two values: "A" or "B"
If I want to search for a student by ID, my SQL query would be:
SELECT *
FROM users
WHERE ID='4'
This gives me correct results.
Suppose I know that for a specific value, say 4, the probability of finding this user among class A rows is much higher than among class B rows.
Is there a way to optimize this query OR
optimize the schema OR
should I partition the tables into 2 for class A and class B users?
(Or is it that MySQL already knows about these optimizations)
You asked whether knowing the probability of finding a row in one class or the other can affect query performance.
Don't even start thinking about this until you have many crore (many tens of millions) of rows to deal with. Seriously. Instead, spend your irreplaceable time learning about table indexing and getting your application finished.
Looking up rows based on id values is very fast indeed if your indexes are correct. Partitioning doesn't help performance much if at all for simple lookups. And, maintaining partitioned tables is a huge and ongoing pain in the neck.
A query of the form WHERE id = constant AND class = constant can be made very near optimal with a compound index on (id, class).
Good material for learning about SQL performance is here: http://use-the-index-luke.com/sql/table-of-contents
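For example, assuming the users table from the question, the compound index would be created like this:

-- near-optimal for queries of the form WHERE id = ? AND class = ?
ALTER TABLE users ADD INDEX idx_id_class (id, class);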
You should have a table for class:
id | name
---+-------
1 | A
2 | B
Then, instead of class names, store class ids in users:
id | name | classId
---+------+-----------
1 | u1 | 1
2 | u2 | 2
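A sketch of that schema as DDL (the column types and the foreign key constraint are my assumptions):

CREATE TABLE class (
    id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name CHAR(1) NOT NULL            -- 'A' or 'B'
);

CREATE TABLE users (
    id      INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name    VARCHAR(100) NOT NULL,
    classId INT UNSIGNED NOT NULL,
    FOREIGN KEY (classId) REFERENCES class (id)
);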
Is there a way to auto-increment the id field of my database based on the values of two other columns in the inserted row?
I'd like to set up my database so that when multiple rows are inserted at the same time, they keep their tracknumber ordering. The id field should auto-increment based first on the automatically generated timestamp field, and then on the tracknumber within that timestamp.
Here's an example of how the database might look:
id | tracknumber | timestamp
---+-------------+---------------------
 1 | 1           | 2014-03-31 11:35:17
 2 | 2           | 2014-03-31 11:35:17
 3 | 3           | 2014-03-31 11:35:17
 4 | 1           | 2014-04-01 09:10:14
 5 | 2           | 2014-04-01 09:10:14
I've been reading up on triggers, but I'm not sure if that's appropriate here. I feel as though I'm missing an obvious function.
This is a bit long for a comment.
There is no automatic way to do this. You can do it with triggers, if you like. Note the plural: you will need triggers for insert, update, and delete if you want the numbering to remain accurate as the data changes.
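For the insert case alone, a trigger might look roughly like this (a sketch only; `tracks` is a hypothetical table name, and you would still need update and delete triggers to keep the numbering accurate):

CREATE TRIGGER tracks_before_insert
BEFORE INSERT ON tracks
FOR EACH ROW
SET NEW.tracknumber = (
    SELECT COALESCE(MAX(t.tracknumber), 0) + 1   -- next number within the same timestamp group
    FROM tracks t
    WHERE t.`timestamp` = NEW.`timestamp`
);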
You can do this on the query side, if the goal is to enumerate the values. Here is one method using a correlated subquery:
select t.*,
       (select count(*)
        from `table` t2
        where t2.timestamp = t.timestamp and t2.id <= t.id
       ) as tracknumber
from `table` t;
The performance of this might even be reasonable with an index on table(timestamp, id).
If the data is being created once, you can also populate the values using an update query.
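That one-time backfill could be written like this (a sketch; `table` is still a placeholder name, and the counting logic mirrors the subquery above):

UPDATE `table` t
JOIN (
    SELECT t1.id, COUNT(*) AS rn
    FROM `table` t1
    JOIN `table` t2
      ON t2.timestamp = t1.timestamp AND t2.id <= t1.id
    GROUP BY t1.id
) x ON x.id = t.id
SET t.tracknumber = x.rn;   -- derived table is materialized, so updating the same table is allowed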
If you are inserting the rows in one transaction and/or script, then sort the data yourself on the server side by these two fields (assuming you create the timestamp on the server side too, which seems logical) and insert the rows one after another. I don't think it is necessary to overthink this and look for a complicated approach inside the database. The database will still insert rows one after another, not all at once, so it has no way of knowing that it needs to do any sorting beforehand. It is you who has to do it.
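In other words (a sketch with the hypothetical `tracks` table again), a single ordered multi-row insert is enough for the AUTO_INCREMENT ids to come out in the right sequence:

-- rows listed in (timestamp, tracknumber) order, so the generated ids follow it
INSERT INTO tracks (tracknumber, `timestamp`) VALUES
    (1, '2014-03-31 11:35:17'),
    (2, '2014-03-31 11:35:17'),
    (3, '2014-03-31 11:35:17'),
    (1, '2014-04-01 09:10:14'),
    (2, '2014-04-01 09:10:14');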
I have a large database with two tables: stat and total.
An example of the relation is the following:
STAT:
| ID | total event |
+----+-------------+
|  7 |           2 |
|  8 |           1 |
TOTAL:
| ID | Event       |
+----+-------------+
|  7 | "hello"     |
|  7 | "everybody" |
|  8 | "hi"        |
This is a very simplified version; also consider that the STAT table could have 500K records, and for each STAT row there can be about 200 TOTAL rows.
Currently, if I run a simple SELECT query on the TOTAL table, the system is terribly slow.
Could anyone give me some advice on the design of the TOTAL table? Is it possible to tell MySQL that the id column is already sorted, so that there is no reason to scan all the rows until it reaches, for example, id=7?
Add INDEX(ID) to both of your tables, if you have not already.
SELECT COUNT(*) FROM TOTAL WHERE ID=7 -- if ID is indexed, this will be fast
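For example:

ALTER TABLE STAT  ADD INDEX (ID);
ALTER TABLE TOTAL ADD INDEX (ID);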
You can add an index, and furthermore you can partition your table.
As per ypercube's comment, tables are not stored in a sorted state, so you cannot "tell" this to the database. However, you can add an index on the tables to make them faster to search.
One important thing to check: it looks like TOTAL.ID is intended as a foreign key. If so, the table TOTAL should have its own primary key called ID; rename the existing column of that name to STAT_ID instead, so it is obvious what it is a foreign key for, then add an index on STAT_ID.
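A sketch of that change (the INT column type is an assumption):

ALTER TABLE TOTAL CHANGE ID STAT_ID INT NOT NULL;                      -- rename the foreign key column
ALTER TABLE TOTAL ADD INDEX (STAT_ID);
ALTER TABLE TOTAL ADD COLUMN ID INT AUTO_INCREMENT PRIMARY KEY FIRST;  -- new surrogate primary key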
Lastly, as a point of style, I recommend that you make your table and column names case-insensitive and write them in lower case. It makes SQL easier to read when keywords are in upper case and database objects are in lower case.
I have a query that works more or less well as far as the results are concerned, but it takes about 45 seconds to be processed. That's definitely too long for presenting the data in a GUI.
So I need a much faster/more efficient query (something around a few milliseconds would be nice).
My data table has ~2,619,395 entries and is still growing.
Schema:
num | station | fetchDate | exportValue | error
1 | PS1 | 2010-10-01 07:05:17 | 300 | 0
2 | PS2 | 2010-10-01 07:05:19 | 297 | 0
923 | PS1 | 2011-11-13 14:45:47 | 82771 | 0
Explanation:
the exportValue is always incrementing
the exportValue represents the actual absolute value
in my case there are 10 stations
every ~15 minutes 10 new entries are written to the table
error is just an indicator for a proper working station
Working query:
select YEAR(fetchDate), station, MAX(exportValue) - MIN(exportValue)
from registros
where exportValue > 0 and error = 0
group by station, YEAR(fetchDate)
order by YEAR(fetchDate), station
Output:
Year | station | Max-Min
2008 | PS1 | 24012
2008 | PS2 | 23709
2009 | PS1 | 28102
2009 | PS2 | 25098
My thoughts on it:
writing several queries with BETWEEN clauses, like 'between 2008-01-01 and 2008-01-02' to fetch the MIN(exportValue) and 'between 2008-12-30 and 2008-12-31' to grab the MAX(exportValue). Problem: that is a lot of queries, and a specified time range may contain no data (it's not guaranteed that there will be data)
limiting the result set to my 10 stations only, using order by MIN(fetchDate). Problem: this also takes a long time to process
Additional Info:
I'm using the query in a Java application, so it would be possible to do some post-processing on the result set if necessary. (JPA 2.0)
Any help/approaches/ideas are very appreciated. Thanks in advance.
Adding suitable indexes will help.
2 compound indexes will speed things up significantly:
ALTER TABLE tbl_name ADD INDEX (error, exportValue);
ALTER TABLE tbl_name ADD INDEX (station, fetchDate);
With these two indexes in place, this query should be extremely fast.
Suggestions:
do you have a PK set on this table? station, fetchDate?
add indexes; you should experiment with indexes as rich.okelly suggested in his answer
depending on the results of those experiments, try breaking your query into multiple statements inside one stored procedure; this way you will not lose time on network traffic between multiple queries sent from the client to MySQL
you mentioned that you tried separate queries and hit the problem of a month with no data; this is a regular case in business applications, and you should handle it in a "master query" (stored procedure or application code)
I guess fetchDate is the current date and time at the moment of record insertion; consider keeping previous months' data in a sort of summary table with the fields year, month, station, max(exportValue), min(exportValue), as sketched after this list; this means inserting summary records into the summary table at the end of each month; deleting, keeping, or moving detail records to a separate table is your choice
Since your table is growing rapidly (new rows every 15 minutes), you should take the last suggestion into account. There is probably no need to keep the detailed history in one place; archiving data is a process that should be done as part of regular maintenance.
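A sketch of such a summary table and its month-end refresh (the table and column names are assumptions, and the date range shown is just an example month):

CREATE TABLE registros_summary (
    yr             SMALLINT    NOT NULL,
    mon            TINYINT     NOT NULL,
    station        VARCHAR(10) NOT NULL,
    minExportValue INT         NOT NULL,
    maxExportValue INT         NOT NULL,
    PRIMARY KEY (yr, mon, station)
);

-- run at the end of each month for the month just closed
INSERT INTO registros_summary (yr, mon, station, minExportValue, maxExportValue)
SELECT YEAR(fetchDate), MONTH(fetchDate), station,
       MIN(exportValue), MAX(exportValue)
FROM registros
WHERE exportValue > 0 AND error = 0
  AND fetchDate >= '2010-10-01' AND fetchDate < '2010-11-01'
GROUP BY YEAR(fetchDate), MONTH(fetchDate), station;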
A colleague asked me to explain how indexes (indices?) boost performance; I tried to do so, but got confused myself.
I used the model below for explanation (an error/diagnostics logging database). It consists of three tables:
List of business systems, table "System" containing their names
List of different types of traces, table "TraceTypes", defining what kinds of error messages can be logged
Actual trace messages, having foreign keys from System and TraceTypes tables
I used MySQL for the demo; however, I don't recall the table types I used. I think it was InnoDB.
System                           TraceTypes
------------------               -----------------------------------------
| ID | Name      |               | ID | Code    | Description            |
------------------               -----------------------------------------
| 1  | billing   |               | 1  | Info    | Informational message  |
| 2  | hr        |               | 2  | Warning | Warning only           |
------------------               | 3  | Error   | Failure                |
    |                            -----------------------------------------
    |                               |
    |        -----------------------
Traces       |
--------------------------------------------------
| ID | System_ID | TraceTypes_ID | Message       |
--------------------------------------------------
| 1  | 1         | 1             | Job starting  |
| 2  | 1         | 3             | System.nullr..|
--------------------------------------------------
First, I added some records to all of the tables and demonstrated that the query below executes in 0.005 seconds:
select count(*) from Traces
inner join System on Traces.System_ID = System.ID
inner join TraceTypes on Traces.TraceTypes_ID = TraceTypes.ID
where
System.Name='billing' and TraceTypes.Code = 'Info'
Then I generated more data (no indexes yet):
"System" contained about 100 entries
"TraceTypes" contained about 50 entries
"Traces" contained ~10 million records.
Now the previous query took 8-10 seconds.
I created indexes on Traces.System_ID column and Traces.TraceTypes_ID column. Now this query executed in milliseconds:
select count(*) from Traces where System_id=1 and TraceTypes_ID=1;
This was also fast:
select count(*) from Traces
inner join System on Traces.System_ID = System.ID
where System.Name='billing' and TraceTypes_ID=1;
but the previous query, which joined all three tables, still took 8-10 seconds to complete.
Only when I created a compound index (both the System_ID and TraceTypes_ID columns included in one index) did the speed drop to milliseconds.
The basic statement I was taught earlier is "all the columns you use for joining must be indexed".
However, in my scenario I had indexes on both System_ID and TraceTypes_ID, yet MySQL didn't use them. The question is: why? My bet is that the item count ratio of 100:10,000,000:50 makes the single-column indexes too large to be useful. But is that true?
First, the correct, and easiest, way to analyze a slow SQL statement is to run EXPLAIN. Find out how the optimizer chose its plan and ponder why, and how to improve it. I'd suggest studying the EXPLAIN results with only the 2 separate indexes in place to see how MySQL executes your statement.
I'm not very familiar with MySQL, but it seems MySQL 4 had a restriction of using only one index per table involved in a query. There have been improvements since MySQL 5 (index merge), but I'm not sure whether they apply to your case. Again, EXPLAIN should tell you the truth.
Even where using 2 indexes per table is allowed (MySQL 5), using 2 separate indexes is generally slower than a compound index: the index merge step they require is more work than the single pass over a compound index.
The article "Multi Column indexes vs Index Merge", which uses MySQL 5.4.2, might be helpful.
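To see this in practice on the tables from the question, compare the EXPLAIN output before and after adding the compound index (the index name below is arbitrary):

-- the compound index that made the three-table join fast
ALTER TABLE Traces ADD INDEX idx_sys_type (System_ID, TraceTypes_ID);

-- EXPLAIN shows whether the optimizer actually picks it
EXPLAIN
SELECT COUNT(*) FROM Traces
INNER JOIN System     ON Traces.System_ID     = System.ID
INNER JOIN TraceTypes ON Traces.TraceTypes_ID = TraceTypes.ID
WHERE System.Name = 'billing' AND TraceTypes.Code = 'Info';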
It's not the size of the indexes so much as the selectivity that determines whether the optimizer will use them.
My guess would be that it uses one index, then performs a traditional lookup to move to the other index, and then filters rows out. Please check the execution plan; in short, you might be looping through two indexes in a nested loop. As I understand it, you should build a composite index on the columns used for filtering or joining, and then use an INCLUDE clause for the columns in the SELECT list. I have never worked with MySQL, so this understanding is based on SQL Server 2005.