Limit maximum number of records/rows in Table [duplicate] - mysql

Is it possible to set the number of rows that a table can accommodate in MySQL?
I don't want to use any Java code. I want to do this using pure MySQL scripts.

I wouldn't recommend trying to limit the number of rows in a SQL table unless you have a very good reason to do so. It seems you would be better off using a query like:
SELECT entityID, entityName FROM TableName LIMIT 1000
rather than physically limiting the rows of the table.
However, if you really want to limit it to 1000 rows:
DELETE FROM TableName
WHERE entityID NOT IN (
    SELECT entityID FROM (
        SELECT entityID FROM TableName ORDER BY entityID LIMIT 1000
    ) AS keep_rows
);
(The extra derived table works around MySQL's restriction on selecting from the same table you are deleting from.)

MySQL supports a MAX_ROWS option when creating (and altering) a table. http://dev.mysql.com/doc/refman/5.0/en/create-table.html
Edit: sadly, it turns out this is only a hint for optimization:
"The maximum number of rows you plan to store in the table. This is not a hard limit, but rather a hint to the storage engine that the table must be able to store at least this many rows."
Your question implied that scripts are OK; would it be unreasonable to make something as simple as a cron job that regularly deletes rows above a given ID? It's not nearly as elegant as having MySQL throw an error when something tries to add one row too many, but it would do the job - and your application could also check whether its ID is too high and warn the user/relevant party.
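If you do want MySQL itself to reject inserts once a cap is reached, a BEFORE INSERT trigger that raises an error with SIGNAL (MySQL 5.5 and later) can enforce it. This is only a sketch under the assumptions above - the TableName table and 1000-row cap come from this answer, and the trigger name is made up; note that the COUNT(*) runs on every insert, which gets expensive on large tables:
DELIMITER //
CREATE TRIGGER enforce_row_cap BEFORE INSERT ON TableName
FOR EACH ROW
BEGIN
  -- reject the insert when the table already holds the maximum number of rows
  IF (SELECT COUNT(*) FROM TableName) >= 1000 THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'TableName row limit reached';
  END IF;
END//
DELIMITER ;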

Related

Faster counts with mysql by sampling table

I'm looking for a way to get a count of records meeting a condition, but my problem is that the table is billions of records long and a basic count(*) is not possible because it times out.
I thought that maybe it would be possible to sample the table by doing something like selecting a quarter of the records. I believe that older records are more likely to match, so I'd need a method which accounts for this (perhaps random sorting).
Is it possible or reasonable to query a certain percentage of rows in MySQL? And is this the smartest way to go about solving this problem?
The query I currently have, which doesn't work, is pretty simple:
SELECT count(*) FROM table_name WHERE deleted_at IS NOT NULL
SHOW TABLE STATUS will 'instantly' give an approximate Row count. (There is an equivalent SELECT ... FROM information_schema.tables.) However, this may be significantly far off.
A count(*) driven by a secondary index (rather than the full PRIMARY KEY data) will be faster because the index is smaller. But this still may not be fast enough.
There is no way to "sample". Or at least no way that is reliably better than SHOW TABLE STATUS. EXPLAIN SELECT ... with some simple query will do an estimate; again, not necessarily any better.
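A hedged sketch of that estimate, using the query from the question: in recent MySQL versions the EXPLAIN output's rows column is the optimizer's estimate of rows to be examined and filtered is its guess at the fraction matching the WHERE clause, so both are rough estimates, not counts.
EXPLAIN SELECT COUNT(*) FROM table_name WHERE deleted_at IS NOT NULL;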
Please describe what kind of data you have; there may be some other tricks we can use.
See also Random. There may be a technique that will help you "sample". Be aware that all techniques are subject to various factors of how the data was generated and whether there has been "churn" on the table.
Can you periodically run the full COUNT(*) and save it somewhere? And then maintain the count after that?
I assume you don't have this case. (Else the solution is trivial.)
AUTO_INCREMENT id
Never DELETEd or REPLACEd or INSERT IGNOREd or ROLLBACKd any rows
Add an index on the deleted_at column to improve execution time, and try counting id over the rows where deleted_at is set (see the sketch below).
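A minimal sketch of that suggestion, assuming the table and column names from the question (the index name is made up). With the index, the filtered count becomes an index range scan instead of a full table scan - though it still has to walk every matching index entry, so it only helps if the matching set is a manageable fraction of the table:
ALTER TABLE table_name ADD INDEX idx_deleted_at (deleted_at);
-- InnoDB secondary indexes implicitly include the primary key, so this count
-- can be answered from the index alone
SELECT COUNT(id) FROM table_name WHERE deleted_at IS NOT NULL;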

MySQL the fastest way to count thousands of rows

I have a MySQL database which consists of 13 tables. One table, transactions, will store a lot of data in the future (nearly one million records). This table uses the InnoDB storage engine. Business rules require knowing the total number of records in this table, so my question is: what is the fastest way to count all of these records?
First
Of course I can use something like this:
SELECT COUNT(*) FROM transaction
but obviously this is not the best solution.
Second
I can create an additional table where I store an incrementing counter,
and add a trigger which executes when a row is added to the transaction table:
CREATE TRIGGER update_counter AFTER INSERT ON transaction
FOR EACH ROW
UPDATE counter SET count_var = count_var + 1;
But what happens if 10 entries are added at the same time, for example?
And the last solution is to use information_schema. Something like this:
SELECT TABLE_ROWS
FROM information_schema.tables
WHERE table_schema = DATABASE() AND table_name = 'transaction'
So what is the most appropriate way to resolve this situation?
A "business rule" that requires the exact value of a number around a million? Send that to Dilbert; the pointy-hair boss will love it.
Remember when search engines would show you the exact number of hits, yet they would return the value so fast that it was suspect? Then they got a little more honest and said "hits 1-20 out of more than 120,000"? Now they don't even bother.
You should ask a serious question -- why do you need the exact number? Will an approximate number do? Will the number as of last night suffice?
With those answers, we can help design a "good enough" computation that is also "fast enough".
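If the business rule really does require an exact, always-current number, the usual pattern is the counter table the question sketches, maintained by triggers. Concurrent inserts are not a problem, because each UPDATE ... SET cnt = cnt + 1 is applied atomically under InnoDB row locking; the trade-off is that every insert now serializes on that single counter row. A minimal sketch, assuming the transaction table from the question - the counter table and trigger names are made up:
CREATE TABLE transaction_counter (cnt BIGINT NOT NULL);
-- seed the counter once with the current exact count
INSERT INTO transaction_counter (cnt) SELECT COUNT(*) FROM transaction;
CREATE TRIGGER transaction_count_ins AFTER INSERT ON transaction
FOR EACH ROW UPDATE transaction_counter SET cnt = cnt + 1;
CREATE TRIGGER transaction_count_del AFTER DELETE ON transaction
FOR EACH ROW UPDATE transaction_counter SET cnt = cnt - 1;
-- the "count" is then a single-row read
SELECT cnt FROM transaction_counter;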

Whether an SQL query (SELECT) continues or stops reading data from the table when it finds the value

Greetings,
My question: does an SQL query (SELECT) continue or stop reading data (records) from the table when it finds the value I was looking for?
Reference: "In order to return data for this query, mysql must start at the beginning of the disk data file, read in enough of the record to know where the category field data starts (because long_text is variable length), read this value, see if it satisfies the where condition (and so decide whether to add to the return record set), then figure out where the next record set is, then repeat."
Link for reference: http://www.verynoisy.com/sql-indexing-dummies/#how_the_database_finds_records_normally
In general you don't know and you don't care, but you have to adapt when queries take too long to execute. When you do something like
select a,b,c from mytable where a=3 and b=5
then the database engine has a couple of options to optimize. When all these options fail, then it will do a "full table scan" - which means, it will have to examine the entire table to see which rows are eligible. When you have indices on e.g. column a then the database engine can optimize the search because it can pre-select rows where a has value 3. So, in general, make sure that you have indices for the columns that are most searched. (Perversely, some database engines get confused when you have too many indices and will fall back to a full table scan because they've lost their way...)
As to whether or not the scanning stops: in general, the database engine has to examine all data in the table (hopefully aided by indices) and won't stop after having found just one hit. If you want just the first hit, use a limit 1 clause to make sure that your result set has only one outcome. But then again, if you have an order by clause, the database engine cannot stop after the first hit; there might be later rows that should get priority given the sorting.
Summarizing, how the db engine does its scan depends on how smart it is, what indices are available etc.. If your select queries take too long then consider re-organizing your indices, writing your select statements differently, or rebuilding the table.
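A hedged sketch of checking this for the example above (the index name is made up): EXPLAIN reports type: ALL when MySQL plans a full table scan, and type: ref once the index on column a can be used to pre-select rows.
CREATE INDEX idx_a ON mytable (a);
EXPLAIN SELECT a, b, c FROM mytable WHERE a = 3 AND b = 5;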
The RDBMS reading data from disk is something you cannot know, you should not care and you must not rely on.
The issue is too broad to get a precise answer. The engine reads data from storage in blocks, and a block can contain records that are not needed by the query at hand. If all the columns needed by the query are available in an index, the RDBMS won't even read the data file; it will only use the index. The data it needs may already be cached in memory (because it was read during the execution of a previous query). The underlying OS and the storage media also keep their own caches.
On a busy system, all these factors can lead to very different storage access patterns while running the same query several times a couple of minutes apart.
Yes, it scans the entire file, unless you add something like
select * from user where id=100 limit 1
Without an index this can still end up reading every row, if id 100 happens to be the last record.
If id is a primary key it is automatically indexed, and the lookup is optimized.
I'm sorry... I meant the table.
I will revise the question and explain it in the following image;
I understand that in CASE 1 all columns must be read on each iteration.
My question is: is it the same in CASE 2, or are the columns that are not selected in the query excluded from reading on each iteration?
Also, are both queries the same from a performance perspective?
To clarify:
CASE 1: the first SELECT prints all columns.
CASE 2: the second SELECT prints only the columns first_name and last_name.
In CASE 2, does the MySQL server (SQL query) read only the columns first_name and last_name, or does it read the entire row to get that data (first_name, last_name)?
What interests me is how the server reads a table row in CASE 1 and CASE 2.

Fast mysql query to randomly select N usernames

In my JSP application I have a search box that lets users search for user names in the database. I send an AJAX call on each keystroke and fetch 5 random names starting with the entered string.
I am using the query below:
select userid,name,pic from tbl_mst_users where name like 'queryStr%' order by rand() limit 5
But this is very slow, as I have more than 2000 records in my table.
Is there any better approach which takes less time and lets me achieve the same? I need random values.
How slow is "very slow", in seconds?
The reason why your query could be slow is most likely that you didn't place an index on name. 2000 rows should be a piece of cake for MySQL to handle.
The other possible reason is that you have many columns in the SELECT clause. I assume in this case the MySQL engine first copies all this data to a temp table before sorting this large result set.
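A hedged sketch of adding that index (the index name is made up); with it in place, the LIKE 'queryStr%' prefix pattern can be resolved with an index range scan instead of a full scan:
ALTER TABLE tbl_mst_users ADD INDEX idx_name (name);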
I advise the following, so that you work only with indexes, for as long as possible:
SELECT userid, name, pic
FROM tbl_mst_users
JOIN (
-- here, MySQL works on indexes only
SELECT userid
FROM tbl_mst_users
WHERE name LIKE 'queryStr%'
ORDER BY RAND() LIMIT 5
) AS sub USING(userid); -- join other columns only after picking the rows in the sub-query.
This method is a bit better, but still does not scale well. However, it should be sufficient for small tables (2000 rows is, indeed, small).
The link provided by #user1461434 is quite interesting. It describes a solution with almost constant performance. Only drawback is that it returns only one random row at a time.
1. Does the table have an index on name? If not, add one.
2. MediaWiki uses an interesting trick (for Wikipedia's Special:Random feature): the table with the articles has an extra column with a random number (generated when the article is created). To get a random article, generate a random number and get the article with the next larger or smaller (don't recall which) value in the random number column. With an index, this can be very fast. (And MediaWiki is written in PHP and developed for MySQL.) See the sketch after this list.
This approach can cause a problem if the resulting numbers are badly distributed; IIRC, this has been fixed in MediaWiki, so if you decide to do it this way you should take a look at the code to see how it's currently done (probably they periodically regenerate the random number column).
3. http://jan.kneschke.de/projects/mysql/order-by-rand/
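A hedged sketch of the MediaWiki-style trick from item 2, with made-up column and index names. It picks one random row per call (run it repeatedly if you need 5), and it selects from the whole table; combining it with the name LIKE filter needs more care. If the random point lands past the largest stored value, no row comes back and you simply retry:
ALTER TABLE tbl_mst_users
  ADD COLUMN rand_val DOUBLE NOT NULL DEFAULT 0,
  ADD INDEX idx_rand (rand_val);
-- assign each existing row a random value; new rows should get one on insert
UPDATE tbl_mst_users SET rand_val = RAND();
-- fix the random point in a variable so it is not re-evaluated per row
SET @r = RAND();
SELECT userid, name, pic
FROM tbl_mst_users
WHERE rand_val >= @r
ORDER BY rand_val
LIMIT 1;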

Can we limit the number of rows in a table in MySQL?
