I have a table with approximately 1 crore records.
Required: I need to run a query that fetches records from this table with a login_date in the last 6 months (which yields 5 lacs records), plus some other conditions. The query is taking approximately 60 seconds.
Consideration: if I keep the records with a login_date in the last 6 months in a separate table, then the query takes just 1 to 2 seconds.
Solution?
Should I create a separate table maintained by a trigger?
Or is there a better solution, like views or something similar?
Are you using an index on this table? Creating a btree index on login_date should give you about the same performance as having a second table without the schema complexity.
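For example, a minimal sketch of that index and the 6-month filter, assuming the table is named user_logins (the question doesn't give the real table name):

-- BTREE is the default index type in MySQL
CREATE INDEX idx_login_date ON user_logins (login_date);

-- the 6-month filter can now be satisfied by a range scan on the index
SELECT *
FROM user_logins
WHERE login_date >= CURDATE() - INTERVAL 6 MONTH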
Also, crore and lac aren't very common English words. Try "ten million" and "five hundred thousand", and more people should understand what you mean.
For the following two table structures, assuming the data volume is really high:
cars table
Id | brand name | make year | purchase year | owner name
Is there any query performance benefit in structuring it the following way instead, and joining the two tables?
cars table
Id | brand_id | make year | purchase year | owner name
brands table
Id | name
Also, if all 4 columns appear in my WHERE clause, does it make sense to index any of them?
I would at least have INDEX(owner_name) since that is very selective. Having INDEX(owner_name, model_year) won't help enough to matter for this type of data. There are other cases where I would recommend a 4-column composite index.
"data volume is really high". If you are saying there are 100K rows, then it does not matter much. If you are saying a billion rows, then we need to get into a lot more details.
"data volume is really high". 10 queries/second -- Yawn. 1000/second -- more details, please.
2 tables vs 1.
Data integrity - someone could mess up the data either way
Speed -- a 1-byte TINYINT UNSIGNED (range 0..255) is smaller than an average of about 7 bytes for VARCHAR(55) for brand. But it is hardly enough smaller to matter on space or speed. (And if you goof and make brand_id a BIGINT, which is 8 bytes; well, oops!)
Indexing all columns is different than having no indexes. But "indexing all" is ambiguous:
INDEX(user), INDEX(brand), INDEX(year), ... is likely to make it efficient to search or sort by any of those columns.
INDEX(user, brand, year), ... makes it especially efficient to search by all those columns (with =), or certain ORDER BYs.
No index implies scanning the entire table for any SELECT.
Another interpretation of what you said (plus a little reading between the lines): Might you be searching by any combination of columns? Perhaps non-= things like year >= 2016? Or make IN ('Toyota', 'Nissan')?
Study http://mysql.rjweb.org/doc.php/index_cookbook_mysql
An argument for 1 table
If you need to do
WHERE brand = 'Toyota'
AND year = 2017
Then INDEX(brand, year) (in either order) is possible and beneficial.
But... If those two columns are in different tables (as with your 2-table example), then you cannot have such an index, and performance will suffer.
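As a minimal sketch of that, assuming the single denormalized cars table and that the columns are literally named brand and year:

-- either column order works here, since both conditions are equality tests
ALTER TABLE cars ADD INDEX idx_brand_year (brand, year);

SELECT *
FROM cars
WHERE brand = 'Toyota'
  AND year = 2017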
Transport table
id name
1 T1
2 T2
Pallets table
id name
1 P1
2 P2
Transport Pallet Capacity table
id transport_id pallet_id capacity
1 1 1 10
2 1 2 null
3 2 1 20
4 2 2 24
How to generate table like this:
id transport_id pallet_id_1_capacity pallet_id_2_capacity
1 1 10 null
2 2 20 24
Problem: pallets and transports can be added, so neither quantity is known in advance.
For example, manager adds another pallet type and 'pallet_id_3_capacity' column should be generated (and can show null if no capacity data is yet available).
Another manager can fill 'transport pallet capacity' table later when notified.
Is there a way to build SQL in MySQL that will take care of the above, specifically a dynamic number of pallets?
The SQL select-list must be fixed at the time you write the query. You can't make SQL that auto-expands its columns based on the data it finds.
But your request is common, it's called a pivot-table or a crosstab table.
The only solution is to do this in multiple steps:
Query to discover the distinct pallet ids.
Use application code to build a dynamic SQL query with as many columns as distinct pallet id values found in the first query.
Run the resulting dynamic SQL query.
This is true for all SQL databases, not just MySQL.
See MySQL pivot row into dynamic number of columns for a highly-voted solution for producing a pivot-table query in MySQL.
I am not voting your question as a duplicate of that question, because your query also involves transport_id, which will make the query solution a bit different. But reading about other pivot-table solutions should get you started.
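As a sketch of those steps done entirely inside MySQL, following the GROUP_CONCAT plus prepared-statement pattern from that linked answer (the table name transport_pallet_capacity is assumed from the example above):

SET @sql = NULL;

-- steps 1 and 2: build one MAX(CASE ...) column per distinct pallet_id
SELECT GROUP_CONCAT(DISTINCT
         CONCAT('MAX(CASE WHEN pallet_id = ', pallet_id,
                ' THEN capacity END) AS pallet_id_', pallet_id, '_capacity'))
INTO @sql
FROM transport_pallet_capacity;

SET @sql = CONCAT('SELECT transport_id, ', @sql,
                  ' FROM transport_pallet_capacity GROUP BY transport_id');

-- step 3: run the dynamic query
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

A newly added pallet type automatically gets its own pallet_id_N_capacity column the next time the statement is rebuilt, and shows NULL for transports that have no capacity recorded yet.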
I have to create a table that assigns a user id and a product id to some data (it models two one-to-many relationships). I will run a lot of queries like
select * from table where userid = x;
The first thing I am interested in is how big the table can get before the query becomes noticeably slow (let's say it takes more than 1 second).
Also, how can this be optimised?
I know that this might depend on the implementation. I will use mysql for this specific project, but I am interested in more general answers as well.
It all depends on the horsepower of your machine. To make that query more efficient, create an index on userid.
how big the table can get before the query becomes noticeably slow (let's say it takes more than 1 second)
There are too many factors to deterministically measure run time. CPU speed, memory, I/O speed, etc. are just some of the external factors.
how can this be optimized?
That's more straightforward. If there is an index on userid then the query will likely do an index seek, which is about as fast as you can get as far as finding the records. If userid is the clustered index then it will be faster still, because it won't have to use the position from the index to find the record in the data pages - the data is physically organized as part of the index.
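In MySQL's InnoDB engine the clustered index is the primary key, so one way to get that effect is to make userid the leading column of the primary key. A rough sketch, with assumed table and column names since the question doesn't give them:

CREATE TABLE user_products (
    userid    INT UNSIGNED NOT NULL,
    productid INT UNSIGNED NOT NULL,
    -- ... other data columns ...
    PRIMARY KEY (userid, productid)  -- InnoDB clusters rows by the primary key,
                                     -- so all rows for one userid are stored together
) ENGINE=InnoDB;

SELECT * FROM user_products WHERE userid = 42;

If the primary key has to be something else, a plain secondary index such as CREATE INDEX idx_userid ON user_products (userid) still gives the index seek described above.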
let's say it takes more than 1 second
With an index on userid, MySQL will manage to find the correct rows in (worst case) O(log n). How long that is in seconds depends on the performance of your machine.
It is impossible to give you an exact number without knowing how long one operation takes.
As an example: assume you have a table with 4 records. Finding one requires 2 operations in the worst case. Every time you double your data, one more operation is required.
for example:
# records | # operations to find entry in worst case
        2 | 1
        4 | 2
        8 | 3
       16 | 4
      ... | ...
     4096 | 12
      ... | ...
     ~1 B | 30
     ~2 B | 31
So, even with a huge number of records, the time remains almost constant. For 1 billion records, you would need to perform ~30 operations.
And it continues like that: 2 billion records, 31 operations.
So, let's say your query executes in 0.001 seconds for 4096 entries (12 ops).
It would then take around (0.001 / 12 * 30 =) 0.0025 seconds for 1 billion records.
Big sidenote: this only considers the runtime complexity of the binary search, but it shows how the cost would scale.
In a nutshell: your database will be unimpressed by a single query on an indexed value. However, if you run a large number of those queries at the same time, the total time of course increases.
I have 2 tables: posts<id, user_id, text, votes_counter, created> and votes<id, post_id, user_id, vote>. Here the vote column can be either 1 (upvote) or -1 (downvote). Now if I need to fetch the total votes (upvotes - downvotes) on a post, I can do it in 2 ways.
Use count(*) to count the number of upvotes and downvotes on that post from the votes table and then do the maths.
Set up a counter column votes_counter and increment or decrement it every time a user upvotes or downvotes. Then simply read that votes_counter.
My question is which one is better, and under what conditions. By conditions, I mean factors like scalability, peak time, et cetera.
From what I know, with method 1, count(*) on a table with millions of rows could be a heavy operation. But if I avoid that by using a counter, then during peak time the votes_counter column might get deadlocked, with too many users trying to update the counter!
Is there a third way better than both and as simple to implement?
The two approaches represent a common tradeoff between complexity of implementation and speed.
The first approach is very simple to implement, because it does not require you to do any additional coding.
The second approach is potentially a lot faster, especially when you need to count a small percentage of items in a large table.
The first approach can be sped up by well-designed indexes. Rather than searching through the whole table, your RDBMS could retrieve a few records from the index and do the counts using them.
The second approach can become very complex very quickly:
You need to consider what happens to the counts when a user gets deleted
You should consider what happens when the table of votes is manipulated by tools outside your program. For example, merging records from two databases may prove a lot more complex when the current counts are stored along with the individual ones.
I would start with the first approach, and see how it performs. Then I would try optimizing it with indexing. Finally, I would consider going with the second approach, possibly writing triggers to update counts automatically.
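As a rough sketch of both routes, using the column names from the question (the index name, trigger name and example post id are made up):

-- Approach 1: an index that makes the "count and do the maths" query cheap.
-- Since vote is +1 or -1, SUM(vote) gives upvotes minus downvotes directly.
ALTER TABLE votes ADD INDEX idx_post_vote (post_id, vote);

SELECT COALESCE(SUM(vote), 0) AS score
FROM votes
WHERE post_id = 123;

-- Approach 2: a trigger that keeps posts.votes_counter up to date on insert.
CREATE TRIGGER votes_after_insert
AFTER INSERT ON votes
FOR EACH ROW
UPDATE posts SET votes_counter = votes_counter + NEW.vote WHERE id = NEW.post_id;

A matching AFTER DELETE (and possibly AFTER UPDATE) trigger would be needed if votes can be removed or changed, which is part of the extra complexity mentioned above.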
As this sounds a lot like StackExchange, I'll refer you to this answer on the meta about the database schema used on the site. The votes table looks like this:
Votes table:
Id
PostId
VoteTypeId, one of the following values:
1 - AcceptedByOriginator
2 - UpMod
3 - DownMod
4 - Offensive
5 - Favorite (if VoteTypeId = 5, UserId will be populated)
6 - Close
7 - Reopen
8 - BountyStart (if VoteTypeId = 8, UserId will be populated)
9 - BountyClose
10 - Deletion
11 - Undeletion
12 - Spam
15 - ModeratorReview
16 - ApproveEditSuggestion
UserId (only present if VoteTypeId is 5 or 8)
CreationDate
BountyAmount (only present if VoteTypeId is 8 or 9)
And so based on that it sounds like the way it would be run is:
SELECT VoteTypeId FROM Votes WHERE PostId = @PostId AND (VoteTypeId = 2 OR VoteTypeId = 3) -- @PostId: the post being scored
And then based on the value, do the maths:
int score = 0;
for each vote in voteQueryResults {
    if (vote == 2) score++;
    if (vote == 3) score--;
}
Even with millions of results, this is probably going to be a very fast operation as it's so simple.
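If you would rather not pull every row back into application code, the same maths can be pushed into the query itself; a sketch, reusing the hypothetical @PostId placeholder from above:

-- +1 for each upvote (type 2), -1 for each downvote (type 3)
SELECT COALESCE(SUM(CASE VoteTypeId WHEN 2 THEN 1 WHEN 3 THEN -1 ELSE 0 END), 0) AS Score
FROM Votes
WHERE PostId = @PostId
  AND VoteTypeId IN (2, 3)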
I actually have a table with 30 columns. In one day this table can get around 3000 new records!
The column data looks like:
IMG Name Phone etc..
http://www.site.com/images/image.jpg John Smith 123456789 etc..
http://www.site.com/images/image.jpg Smith John 987654321 etc..
I'm looking for a way to optimize the size of the table, but also the response time of the SQL queries. I was thinking of doing something like:
Column1
http://www.site.com/images/image.jpg|John Smith|123456789|etc..
And then via PHP I would store each value in an array.
Would it be faster ?
Edit
So to take an example of the structure, let's say I have two tables:
package
package_content
Here is the structure of the table package :
id | user_id | package_name | date
Here is the structure of the table package_content :
id | package_id | content_name | content_description | content_price | content_color | etc.. > 30columns
The thing is, for each package I can get up to 16 rows of content. For example:
id | user_id | package_name | date
260 11 Package 260 2013-7-30 10:05:00
id | package_id | content_name | content_description | content_price | content_color | etc.. > 30columns
1 260 Content 1 Content 1 desc 58 white etc..
2 260 Content 2 Content 2 desc 75 black etc..
3 260 Content 3 Content 3 desc 32 blue etc..
etc...
Then with PHP I do something like this:
select * from package
while not EOF {
    show package name, date, etc.
    select * from package_content where package_content.package_id = package.id
    while not EOF {
        show package_content name, desc, price, color, etc.
    }
}
Would it be faster? Definitely not. If you needed to search by Name or Phone or etc... you'd have to pull those values out of Column1 every time. You'd never be able to optimize those queries, ever.
If you want to make the table smaller it's best to look at splitting some columns off into another table. If you'd like to pursue that option, post the entire structure. But note that the number of columns doesn't affect speed that much. I mean it can, but it's way down on the list of things that will slow you down.
Finally, 3,000 rows per day is about 1 million rows per year. If the database is tolerably well designed, MySQL can handle this easily.
Addendum: partial table structures plus sample query and pseudocode added to question.
The pseudocode shows the package table being queried all at once, then matching package_content rows being queried one at a time. This is a very slow way to go about things; better to use a JOIN:
SELECT
package.id,
user_id,
package_name,
date,
package_content.*
FROM package
INNER JOIN package_content on package.id = package_content.package_id
WHERE whatever
ORDER BY whatever
That will speed things up right away.
If you're displaying on a web page, be sure to limit results with a WHERE clause - nobody will want to see 1,000 or 3,000 or 1,000,000 packages on a single web page :)
Finally, as I mentioned before, the number of columns isn't a huge worry for query optimization, but...
Having a really wide result row means more data has to go across the wire from MySQL to PHP, and
It isn't likely you'll be able to display 30+ columns of information on a web page without it looking terrible, especially if you're reading lots of rows.
With that in mind, you'll be better off picking specific package_content columns in your query instead of picking them all with a SELECT *.
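For instance, a sketch of a narrower version of the JOIN above; the chosen columns, the user_id value and the LIMIT are only examples:

-- select only the columns the page actually displays
SELECT
    package.id,
    package_name,
    date,
    package_content.content_name,
    package_content.content_price
FROM package
INNER JOIN package_content ON package.id = package_content.package_id
WHERE package.user_id = 11
ORDER BY package.date DESC
LIMIT 50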
Don't combine any columns; that is of no use and might even be slower in the end.
You should use indexes on the columns you query on. I have a website with about 30 columns and currently around 600,000 rows. If you put EXPLAIN in front of a query, you can see whether it uses any indexes. If you have a JOIN on two columns and a WHERE on the same table, you should make a combined index on those three columns, ordered JOIN columns first, then the WHERE column. If you join the same table more than once, treat each join as needing its own index.
For example:
SELECT p.name, p.id, c.name, c2.name
FROM product p
JOIN category c ON p.cat_id=c.id
JOIN category c2 ON c.parent_id=c2.id AND c.name='Niels'
WHERE p.filterX='blaat'
You should have a combined index on category: (parent_id, name), plus the index on id (probably the auto-increment primary key).
And an index on product: (cat_id, filterX).
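A sketch of how those indexes could be created (the index names are made up), and how to check with EXPLAIN that the query above actually uses them:

ALTER TABLE category ADD INDEX idx_parent_name (parent_id, name);
ALTER TABLE product  ADD INDEX idx_cat_filterx (cat_id, filterX);

-- the key column of the EXPLAIN output shows which index each table uses
EXPLAIN
SELECT p.name, p.id, c.name, c2.name
FROM product p
JOIN category c ON p.cat_id=c.id
JOIN category c2 ON c.parent_id=c2.id AND c.name='Niels'
WHERE p.filterX='blaat'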
With this easy solution you can optimize queries from NOT DOABLE to 0.10 seconds, or even faster.
If you use MySQL 5.6 you should switch over to InnoDB, because MySQL is better there at optimizing JOINs and subqueries. MySQL will also try to keep them in memory, which makes it a lot faster as well. Please keep in mind that backing up InnoDB tables might need some extra attention.
You might also think about using MEMORY tables for super fast querying (you still need indexes).
You can also optimize by sizing your integers properly (an INT is 4 bytes; the 11 in INT(11) is only a display width, not a size), and by not always using VARCHAR(255).