I am helping to manage a data resource sitting on a SQL server (1.3M+ records, 125 columns). The data model is fixed in its current state, though a handful of columns were added in the past month.
I have a DB app that copies a subset of records from the primary table into a local table so users can efficiently review/edit them, then writes the updated records back to the server. It has been working well since 2013. A typical subset is 3K to 10K records.
SELECT dbo_TblMatchedTb.* INTO TblMatchedTb
FROM dbo_TblMatchedTb
WHERE (((dbo_TblMatchedTb.INVID)=11339));
This week, while creating the local "copy", I saw an error for the first time:
"Record Too Large - err 3047"
Inspecting the dataset produced by the statement above (via a CSV export), I found 7 records MUCH longer than average: 2087 chars wide vs. the ~1500-char average (including the CSV commas).
Via a bit of manual iteration, I was able to copy over the records when the max record width was shortened to < 1907 chars.
Question:
Is there an efficient method/query to measure the current total record width in the local table described above (3K to 10K records, 125 columns)? If I can ID records approaching some limit, I can TRIM several candidate data values (e.g., from 255 to 100 chars).
I can't touch the schema, but I can conditionally shorten some of the less-than-critical data values.
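For reference, the kind of check I have in mind would sum Len() across a few of the widest text columns per record and sort descending; something like the sketch below, where WideText1/2/3 are just placeholders for some of my 255-char fields (Nz() works here because the query runs inside Access):
SELECT INVID, Len(Nz([WideText1],"")) + Len(Nz([WideText2],"")) + Len(Nz([WideText3],"")) AS ApproxWidth
FROM TblMatchedTb
ORDER BY ApproxWidth DESC;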
Any ideas?
Related
I have a table called users with a couple dozen columns such as height, weight, city, state, country, age, gender etc...
The only keys/indices on the table are for the columns id and email.
I have a search feature on my website that filters users based on these various columns. The query could contain anywhere from zero to a few dozen different where clauses (such as where `age` > 40).
The search is set to LIMIT 50 and ORDER BY `id`.
There are about 100k rows in the database right now.
If I perform a search with zero filters or loose filters, MySQL basically just returns the first 50 rows and doesn't have to read many more rows than that. It usually takes less than 1 second to complete this type of query.
If I create a search with a lot of complex filters (for instance, 5+ WHERE clauses), MySQL ends up reading through the entire table of 100k rows trying to accumulate 50 valid rows, and the resulting query takes about 30 seconds.
How can I more efficiently query to improve the response time?
I am open to using caching (I already use Redis for other caching purposes, but I don't know where to start with properly caching a MySQL table).
I am open to adding indices, although there are a lot of different combinations of where clauses that can be built. Also, several of the columns are JSON where I am searching for rows that contain certain elements. To my knowledge I don't think an index is a viable solution for that type of query.
I am using MySQL version 8.0.15.
In general you need to create indexes on the columns mentioned in the criteria of your WHERE clauses. You can also index JSON columns by creating an index on a generated column: https://dev.mysql.com/doc/refman/8.0/en/create-table-secondary-indexes.html.
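For illustration only (gender and age are columns you mentioned; prefs and $.language are made-up stand-ins for one of your JSON columns and the attribute you filter on), the two kinds of index look like this in 8.0:
-- ordinary composite index for filters that are commonly combined
CREATE INDEX idx_users_gender_age ON users (gender, age);

-- hypothetical JSON column `prefs` with a scalar attribute at $.language
ALTER TABLE users
  ADD COLUMN prefs_language VARCHAR(32)
    GENERATED ALWAYS AS (prefs->>'$.language') STORED,
  ADD INDEX idx_users_prefs_language (prefs_language);
Note that multi-valued indexes for "array contains" style JSON searches only arrived in MySQL 8.0.17, so on 8.0.15 extracting a scalar into a generated column like this is the practical route.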
Per the responses in the comments from ysth and Paul, the problem was just server capacity. After upgrading to an 8GB RAM server, query times dropped to under 1s.
What are the limitations in terms of performance of MySQL when it comes to the amount of rows in a table? I currently have a running project that runs cronjobs every hour. Those gather data and write them into the database.
In order to boost the performance, I'm thinking about saving the data of those cronjobs in a table. (Not just the result, but all the things). The data itself will be something similar to this;
imgId1 (INT, FKEY -> images.id) | imgId2 (INT, FKEY -> images.id) | myData (INT)
So, the actual data per row is quite small. The problem is, that the amount of rows in this table will grow exponentially. With every imgId I add, I need the myData for every other image. That means, with 3000 images, I will have 3000^2 = 9 million rows (not counting the diagonals because I'm too lazy to do it now).
I'm concerned about what MySQL can handle with such preconditions. Every hour will add roughly 100-300 new entries to the origin table, meaning 10,000 to 90,000 new entries in the cross table.
Several questions arise:
Are there limitations to the number of rows in a table?
When (if) will MySQL's performance drop significantly?
What actions can I take to make this cross table as fast as possible for reads (writing doesn't have to be fast)?
EDIT
I just finished my polynomial interpolation and it turns out the growth will not be as drastic as I originally thought. Since the relation 1-2 holds the same data as 2-1, I only need "half" a table, bringing the growth down to (x^2-x)/2; with 3000 images that is (3000^2 - 3000)/2 = 4,498,500 rows instead of 9 million.
Still, it will add up to a lot of rows.
9 million rows is not a huge table. Given the structure you provided, as long as it's indexed properly performance of select / update / insert queries won't be an issue. DDL may be a bit slow.
Since all the rows are already described by a Cartesian join, you don't need to populate the entire table.
If the order of the image pairs is not significant, you can save some space by storing each pair in a canonical order (smaller imgId first), or by using a two/three-table schema where the imgIds are treated as equivalent.
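For example (table and column names here are just illustrative), storing each pair with the smaller id first halves the row count and gives a natural primary key:
CREATE TABLE image_pairs (
  img_lo INT NOT NULL,            -- the smaller of the two image ids
  img_hi INT NOT NULL,            -- the larger of the two image ids
  myData INT,
  PRIMARY KEY (img_lo, img_hi),
  FOREIGN KEY (img_lo) REFERENCES images (id),
  FOREIGN KEY (img_hi) REFERENCES images (id)
);

-- write and read the pair in canonical order, whichever id arrives first
INSERT INTO image_pairs (img_lo, img_hi, myData)
VALUES (LEAST(3, 7), GREATEST(3, 7), 42);

SELECT myData FROM image_pairs
WHERE img_lo = LEAST(7, 3) AND img_hi = GREATEST(7, 3);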
I have a two-column table consisting of a number and that number's cube. Right now, I have about 13 million numbers inserted, and that's growing very, very quickly.
Is there a faster way to output a simple table like this than using a command like SELECT * FROM table?
My second question pertains to selecting a range of numbers. As stated above, I have a large database growing extremely fast to hold numbers and their cubes. If you're wondering, I'm trying to find the three numbers whose cubes sum to 33. So, I'm doing this by using a server/client program to send a range of numbers to a client so it can do the equations on that range of numbers.
So, for example, let's say the first client chimes in. I give him a range of 0 to 100. He then goes off to compute the numbers and reports back to tell the server whether he found the triplet. If he didn't, the loop just continues.
When the client does the calculations for the numbers by itself, it goes extremely slowly. So, I have decided to use a database to store the cubed numbers so the client does not have to do the calculations. The problem is, I don't know how to access only a range of numbers. For example, if the client had the range 0-100, it would need to access the cubes of all numbers from 0-100.
What is the select command that will return a range of numbers?
The engine I am using for the table is MyISAM.
If your table "mytable" has two columns
number   cube
0        0
1        1
2        8
3        27
the query will be (assuming the range runs from 100 to 200):
select number, cube from mytable where number between 100 and 200 order by number;
If you want this query to be as fast as possible, make sure of the following:
- number is an index, so you don't need to do a table scan to find the start of your range.
- the index you create is clustered. Clustered indexes are way faster for scans like this, as the leaf in the index is the record (in comparison, the leaf in a non-clustered index is a pointer to the record, which may be in a completely different part of the disk). As well, the clustered index forces a sorted structure on the data, so you may be able to read all 100 records from a single block.
Of course, adding an index will make writing to the table slightly slower. As well, I am assuming you are writing to the table in order (i.e. 0,1,2,3,4 etc. not 10,5,100,3 etc.). Writes to tables with clustered indexes are very slow if you write to the table in a random order (as the DB has to keep moving records to fit the new ones in).
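In MySQL terms, a minimal sketch of that advice looks like this (MyISAM secondary indexes are never clustered; in InnoDB the primary key is the clustered index):
-- make number the primary key so a range scan can seek straight to the start of the range
ALTER TABLE mytable ADD PRIMARY KEY (number);

-- optionally convert to InnoDB to get a genuinely clustered primary key
ALTER TABLE mytable ENGINE = InnoDB;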
Does anyone have any tips that could help speed up the process of breaking down a table and inserting a large number of records into a new table?
I'm currently using Access and VBA to convert a table that contains records with a large string (700 + characters) into a new table where each character has its own record (row). I'm doing this by looping through the string 1 character at a time and inserting into the new table using simple DAO in VBA.
Currently I'm working with a small subset of data - 300 records each with a 700 character string. This process takes about 3 hours to run so isn't going to scale up to the full dataset of 50,000 records!
table 1 structure
id - string
001 - abcdefg
becomes
table 2 structure
id - string
001 - a
001 - b
001 - c
. .
. .
. .
I'm open to any suggestions that could improve things.
Cheers
Phil
Consider this example using Northwind. Create a table called Sequence with a single INTEGER column (Long Integer in Access) and populate it with values 1 to 20 (i.e. a 20-row table). Then use this ACE/Jet SQL to parse out each letter of the employees' last names:
SELECT E1.EmployeeID, E1.LastName, S1.Seq, MID(E1.LastName, S1.Seq, 1) AS Letter
FROM Employees AS E1, Sequence AS S1
WHERE S1.Seq BETWEEN 1 AND LEN(E1.LastName);
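Applied to your own tables, once the Sequence table covers your longest string (values 1 to 800, say), the whole conversion becomes a single set-based append query instead of a per-character VBA loop. Table1/Table2 and the column names below are guesses at your actual names:
INSERT INTO Table2 (id, [string])
SELECT T1.id, MID(T1.[string], S1.Seq, 1)
FROM Table1 AS T1, Sequence AS S1
WHERE S1.Seq BETWEEN 1 AND LEN(T1.[string]);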
When doing bulk inserts, you can often get a substantial performance boost by dropping the table's indexes, doing the bulk insert, and then restoring the indexes. In one case, when inserting a couple million records into a MySQL table, I've seen this bring the run time down from 17 hours to about 20 minutes.
I can't advise specifically regarding Access (I haven't used it since Access 2, 15 or so years ago), but the general technique is applicable to pretty much any database engine.
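In MySQL, for example, the shape of it is roughly this (the index and column names are invented for illustration):
ALTER TABLE target_table DROP INDEX idx_payload;      -- drop secondary indexes first

-- ... run the bulk INSERTs here ...

ALTER TABLE target_table ADD INDEX idx_payload (payload_col);   -- rebuild once loading is done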
We have a routine that transposes data. Not sure if the code is optimized, but it runs significantly faster after the file has been compacted.
Doing a lot of deleting and rebuilding of tables bloats an .mdb file significantly.
I have a database efficiency question.
Here is some info about my table:
- table of about 500-1000 records
- records are added and deleted every day
- usually about the same amount being added and deleted every day (the number of active records stays the same)
Now, my question is: when I delete records, should I (A) delete the record and move it to a new table?
Or should I (B) just have an "active" column and set it to 0 when the record is no longer active?
The reason I am hesitant to use (B) is that my site is based on the user being able to filter/sort this table of 500-1000 records on the fly (using AJAX), so I need it to be as fast as possible (I'm guessing a table with more records would be slower to filter), and I am using MySQL InnoDB.
Any input would be great, Thanks
Andrew
~1000 records is a very small number.
If a record can be deleted and re-added later, maybe it makes sense to have an "active" indicator.
Realistically, this isn't a question about DB efficiency but about network latency and the amount of data you're sending over the wire. As far as MySQL goes, 1000 rows or 100k rows are going to be lightning-fast, so that's not a problem.
However, if you've got a substantial amount of data in those rows, and you're transmitting it all to the client through AJAX for filtering, the network latency is your bottleneck. If you're transmitting a handful of bytes (say 20) per row and your table stays around 1000 records in length, not a huge problem.
On the other hand, if your table grows (with inactive records) to, say, 20k rows, now you're transmitting 400k instead of 20k. Your users will notice. If the records are larger, the problem will be more severe as the table grows.
You should really do the filtering on the server side. Let MySQL spend 2ms filtering your table before you spend a full second or two sending it through Ajax.
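Concretely, that means a query of roughly this shape (table and column names invented for the example), so only the rows you actually display ever leave the database:
SELECT id, name, city
FROM members
WHERE active = 1              -- skip inactive rows if you go with the flag approach
  AND city = 'Boston'         -- whatever filter the user picked
ORDER BY name;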
It depends on what you are filtering/sorting on and how the table is indexed.
A third, and not uncommon, option is a hybrid approach: inactivate records (B) (optionally with a timestamp) and periodically archive them to a separate table (A), either en masse or based on the age of the timestamp.
Realistically, if your table is on the order of 1,000 rows, it's probably not worth fussing over too much (assuming the scalability of other factors is known).
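A minimal sketch of that hybrid, with invented table and column names:
-- (B) soft delete: flag the row and record when it was deactivated
UPDATE members SET active = 0, deactivated_at = NOW() WHERE id = 123;

-- (A) periodic archive job: move old inactive rows to a separate, identically structured table
INSERT INTO members_archive
SELECT * FROM members
WHERE active = 0 AND deactivated_at < NOW() - INTERVAL 30 DAY;

DELETE FROM members
WHERE active = 0 AND deactivated_at < NOW() - INTERVAL 30 DAY;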
If you need to keep the records for some future purpose, I would set an Inactive bit.
As long as you have a primary key on the table, performance should be excellent when SELECTing the records.
Also, if you do the filtering/sorting on the client-side then the records would only have to be retrieved once.