Can we limit the number of rows in a table in MySQL? - mysql

Is it possible to set the number of rows that a table can accommodate in MySQL ?
I don't want to use any java code. I want to do this using pure mysql scripts.

I wouldn't recommend trying to limit the number of rows in a SQL table, unless you had a very good reason to do so. It seems you would be better off using a query like:
select top 1000 entityID, entityName from TableName
rather than physically limiting the rows of the table.
However, if you really want to limit it to 1000 rows:
delete from TableName where entityID not in (select top 1000 entityID from TableName)

Mysql supports a MAX_ROWS parameter when creating (and maybe altering?) a table. http://dev.mysql.com/doc/refman/5.0/en/create-table.html
Edit: Sadly it turns out this is only a hint for optimization
"The maximum number of rows you plan to store in the table. This is not a hard limit, but rather a hint to the storage engine that the table must be able to store at least this many rows."
.. Your question implied that scripts are ok; is it ridiculous to make one as simple as a cron job regularly dropping table rows above a given ID ? It's not nearly as elegant as it would've been to have mysql throw errors when something tries to add a row too many, but it would do the job - and you may be able to have your application also then check if it's ID is too high, and throw a warning to the user/relevant party.

Related

Memory engine table giving inconsistent results

Using MariaDB 10.6.7-MariaDB-2ubuntu1.1-log - Ubuntu 22.04
I have a relatively complex application that is making use of temporary memory (engine) tables. The tables are generated from select insert statements and then the data is manipulated. After which a series of selects are performed with one or more rejoins to the same table.
The tables have 8 indexes created with the table create statement some are hash and some are btree.
Doing exactly the same process using exactly the same data I am getting slightly different results (in terms of data return and the number of rows).
It's taken me a while to get to the root of this, I have figured out that the temp memory tables are identical in terms of data and the database calls are the same.
I have let the tables be created as permanent memory tables so I can see them in myphpadmin, rename them, and let them be created again. Then run the same query against each table, and I get different results. The table checksums are identical, row count is identical. but the same query gives different results when tables as memory engine. Convert them both to INNODB and both give the same result... is the memory engine broken?
Has anyone ever seen this before and got any ideas about what might be going on?
Thanks in advance.
I've found the answer, in case anyone is interested or has this same issue.
In my query, I have an order by and limit 1
example.
SELECT * FROM `tmp_lookup` T2
WHERE
T2.`from_location` = 'location1'
T2.`to_location` = 'location2'
T2.`depart_time` > '1970-01-01 22:14:00'
ORDER BY T2.`depart_time` ASC
LIMIT 1;
In the above example, when ordering by depart_time and limiting to 1, if there is more than one row with the same smallest depart_time it will return a random 1st row from the ones with the same value, depending on the table.
If you build a memory table and do the query it will consistently pick the same first row, delete the table, remake the table, rerun the query and it will consistently pick another first row.
I have no firm idea why, but I guess as it's in memory and not on disk the order of the rows in the table is 'random' utter guess.

Faster counts with mysql by sampling table

I'm looking for a way I can get a count for records meeting a condition but my problem is the table is billions of records long and a basic count(*) is not possible as it times out.
I thought that maybe it would be possible to sample the table by doing something like selecting 1/4th of the records. I believe that older records will be more likely to match so I'd need a method which accounts for this (perhaps random sorting).
Is it possible or reasonable to query a certain percent of rows in mysql? And is this the smartest way to go about solving this problem?
The query I currently have which doesn't work is pretty simple:
SELECT count(*) FROM table_name WHERE deleted_at IS NOT NULL
SHOW TABLE STATUS will 'instantly' give an approximate Row count. (There is an equivalent SELECT ... FROM information_schema.tables.) However, this may be significantly far off.
A count(*) on an index on any column in the PRIMARY KEY will be faster because it will be smaller. But this still may not be fast enough.
There is no way to "sample". Or at least no way that is reliably better than SHOW TABLE STATUS. EXPLAIN SELECT ... with some simple query will do an estimate; again, not necessarily any better.
Please describe what kind of data you have; there may be some other tricks we can use.
See also Random . There may be a technique that will help you "sample". Be aware that all techniques are subject to various factors of how the data was generated and whether there has been "churn" on the table.
Can you periodically run the full COUNT(*) and save it somewhere? And then maintain the count after that?
I assume you don't have this case. (Else the solution is trivial.)
AUTO_INCREMENT id
Never DELETEd or REPLACEd or INSERT IGNOREd or ROLLBACKd any rows
ADD an index key with deleted_at column, to improve time execution
and try to count id if id is set.

MySQL the fastest way to count thousands of rows

I have MySQL database which consist of 13 tables. One table transactions will store in future a lot of data (nearly one million records). This table use InnoDB storage Engine. So business rules require to know amount of all records in this table. So, my question is what is the faster way to count all of this records?
First
Of course I can use something like that:
SELECT COUNT(*) FROM transaction
but obviously this is not a best solution.
Second
I can create additional table where I can store incrementable variable
and add trigger which start executing when row was added into transaction table.
CREATE TRIGGER update_counter AFTER INSERT ON transaction
ON counter
BEGIN
count_var = count_var + 1;
END;
but what happens if 10 entries are added at the same time, for example?
And the last solution is to use information_schema. Something like that
SELECT TABLE_ROWS
FROM information_schema.tables
WHERE table_name = "transaction"
So what is the most appropriate way to resolve this situation?
A "business rule" that requires the exact value of a number around a million? Send that to Dilbert; the pointy-hair boss will love it.
Remember when search engines would show you the exact number of hits, yet they would return the value so fast that it was suspect? Then they got a little more honest and said "hits 1-20 out of more than 120,000"? Now they don't even bother.
You should as a serious question -- Why do you need the exact number? Will an approximate number do? Will the number as of last night suffice?
With those answers, we can help design a "good enough" computation that is also "fast enough".

Whether or not SQL query (SELECT) continues or stops reading data from table when find the value

Greeting,
My question; Whether or no sql query (SELECT) continues or stops reading data (records) from table when find the value that I was looking for?
referance: "In order to return data for this query, mysql must start at the beginning of the disk data file, read in enough of the record to know where the category field data starts (because long_text is variable length), read this value, see if it satisfies the where condition (and so decide whether to add to the return record set), then figure out where the next record set is, then repeat."
link for referance: http://www.verynoisy.com/sql-indexing-dummies/#how_the_database_finds_records_normally
In general you don't know and you don't care, but you have to adapt when queries take too long to execute. When you do something like
select a,b,c from mytable where a=3 and b=5
then the database engine has a couple of options to optimize. When all these options fail, then it will do a "full table scan" - which means, it will have to examine the entire table to see which rows are eligible. When you have indices on e.g. column a then the database engine can optimize the search because it can pre-select rows where a has value 3. So, in general, make sure that you have indices for the columns that are most searched. (Perversely, some database engines get confused when you have too many indices and will fall back to a full table scan because they've lost their way...)
As to whether or not the scanning stops: In general, the database engine has to examine all data in the table (hopefully aided by indices) and won't stop after having found just one hit. If you want just the first hit, use a limit 1 clause to make sure that your result set has only one outcome. But then again, if you have a sort by clause, the database engine cannot stop after the first hit, there might be next ones that should get priority given the sorting.
Summarizing, how the db engine does its scan depends on how smart it is, what indices are available etc.. If your select queries take too long then consider re-organizing your indices, writing your select statements differently, or rebuilding the table.
The RDBMS reading data from disk is something you cannot know, you should not care and you must not rely on.
The issue is too broad to get a precise answer. The engine reads data from storage in blocks, a block can contain records that are not needed by the query at hand. If all the columns needed by the query is available in an index, the RDBMS won't even read the data file, it will only use the index. The data it needs could already be cached in memory (because it was read during the execution of a previous query). The underlying OS and the storage media also keep their own caches.
On a busy system, all these factors could lead to very different storage access patterns while running the same query several times on a couple of minutes apart.
Yes it scans the entire file. Unless you put something like
select * from user where id=100 limit 1
This of course will still search entire rows if id 100 is the last record.
If id is a primary key it will automatically be indexed and searching would be optimized
I'm sorry... I thought the table.
I will change question and I will explain it in the following image;
I understand that in CASE 1 all columns must be read with each iteration.
My question is: If it's the same in the CASE 2 or columns that are not selected in the query are excluded from reading in each iteration.
Also, are the both queries are the some in performance perspective?
Clarify:
CASE: 1 In first CASE select print all data
CASE: 2 In second CASE select print columns first_name and last_name
Whether in CASE 2 mysql server (SQL query) reads only columns first_name, last_name or read the entire table to get that data(rows)=(first_name, last_name)?
An interest of me how the server reads table row in CASE 1 and CASE 2?

Limit maximum number of records/rows in Table [duplicate]

Is it possible to set the number of rows that a table can accommodate in MySQL ?
I don't want to use any java code. I want to do this using pure mysql scripts.
I wouldn't recommend trying to limit the number of rows in a SQL table, unless you had a very good reason to do so. It seems you would be better off using a query like:
select top 1000 entityID, entityName from TableName
rather than physically limiting the rows of the table.
However, if you really want to limit it to 1000 rows:
delete from TableName where entityID not in (select top 1000 entityID from TableName)
Mysql supports a MAX_ROWS parameter when creating (and maybe altering?) a table. http://dev.mysql.com/doc/refman/5.0/en/create-table.html
Edit: Sadly it turns out this is only a hint for optimization
"The maximum number of rows you plan to store in the table. This is not a hard limit, but rather a hint to the storage engine that the table must be able to store at least this many rows."
.. Your question implied that scripts are ok; is it ridiculous to make one as simple as a cron job regularly dropping table rows above a given ID ? It's not nearly as elegant as it would've been to have mysql throw errors when something tries to add a row too many, but it would do the job - and you may be able to have your application also then check if it's ID is too high, and throw a warning to the user/relevant party.