MySQL - How to maintain acceptable response time while querying with many filters (Should I use Redis?)

I have a table called users with a couple dozen columns such as height, weight, city, state, country, age, gender etc...
The only keys/indices on the table are for the columns id and email.
I have a search feature on my website that filters users based on these various columns. The query could contain anywhere from zero to a few dozen different where clauses (such as where `age` > 40).
The search is set to LIMIT 50 and ORDER BY `id`.
There are about 100k rows in the database right now.
If I perform a search with zero filters or loose filters, MySQL basically just returns the first 50 rows and doesn't have to read many more rows than that. This type of query usually takes less than 1 second to complete.
If I create a search with a lot of complex filters (for instance, 5+ where clauses), MySQL ends up reading through the entire table of 100k rows trying to accumulate 50 valid rows, and the resulting query takes about 30 seconds.
How can I more efficiently query to improve the response time?
I am open to using caching (I already use Redis for other caching purposes, but I don't know where to start with properly caching a MySQL table).
I am open to adding indices, although there are a lot of different combinations of where clauses that can be built. Also, several of the columns are JSON, where I am searching for rows that contain certain elements. To my knowledge, an index is not a viable solution for that type of query.
I am using MySQL version 8.0.15.

In general you need to create indexes for the columns that are mentioned in the criteria of your WHERE clauses. You can also create indexes for JSON columns by using a generated column index: https://dev.mysql.com/doc/refman/8.0/en/create-table-secondary-indexes.html.
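A minimal sketch of both ideas, assuming a hypothetical JSON column named attrs: the regular columns get an ordinary composite index, and one JSON attribute is pulled into a generated column that can be indexed.
-- composite index on two commonly combined filter columns from the question
ALTER TABLE users ADD INDEX idx_gender_age (gender, age);
-- hypothetical JSON column `attrs`: extract one scalar attribute into a
-- generated column and index it so equality filters on it can use the index
ALTER TABLE users
  ADD COLUMN attrs_language VARCHAR(32)
    GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(attrs, '$.language'))) STORED,
  ADD INDEX idx_attrs_language (attrs_language);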

Per the responses in the comments from ysth and Paul, the problem was just server capacity. After upgrading to an 8GB RAM server, query times dropped to under 1s.

Related

Is selecting fewer columns speeding up my query?

I have seen several questions comparing select * to selecting all columns explicitly, but what about selecting fewer columns vs. more?
In other words, is:
SELECT id,firstname,lastname,lastlogin,email,phone
More than negligibly faster than:
SELECT id,firstname,lastlogin
I realize there will be small differences from more data being transferred through the system and to the application, but that is a total data/load difference, not a cost of the query itself (larger data in the cells would have the same effect anyway, I believe). I'm only trying to optimize my query, as I will have to load ALL the data at some point anyway...
When my admin user logs in, I'm going to load the entire user database into a cache, but I can either query only critical data upfront to shave some execution time, or just get everything - if it works out roughly the same. I know more rows equals longer query execution - but what about more selected values in my query?
Under most circumstances, the only difference is going to be slightly larger data for these fields and the additional time to fetch them.
There are two things to consider:
If the additional fields are very big, then this could be a big difference in performance.
If there is an index that covers the columns you actually want, then the query can be answered from the index alone without touching the table rows. This can speed up the query in the database (see the sketch below).
In general, though, the advice is to return the columns you want to the application. If there is complex processing, you should consider doing that in the database rather than the application.
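A sketch of the covering-index case, with the table name assumed: with this index in place, the narrower SELECT can be satisfied entirely from the index (EXPLAIN shows "Using index"), and in InnoDB the primary key id is implicitly included in every secondary index.
-- hypothetical table name; covers the three columns the narrower query selects
ALTER TABLE users ADD INDEX idx_firstname_lastlogin (firstname, lastlogin);
EXPLAIN SELECT id, firstname, lastlogin FROM users;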

Storing 20 mln records in one table or in two separate MySQL tables of 10 mln each?

In my project, there are 20 mln users of two types: 10 mln of the first type and 10 mln of the second type. These users have access rights to other tables and use them. I am using a MySQL database. My question is: will it affect database performance if I put both types of users in one table with 20 mln rows? Will it be slower, or do 20 mln records not affect performance for the DBMS?
If there is an index on type, then the number of records won't matter much, though your hardware configuration is a different matter altogether.
One more point to consider is whether you query both types in one statement or not. If not, go for different tables; if yes, it is better to have them in one table to save a join.
Also consider your schema as a whole (which is not provided here).
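A minimal single-table sketch of the indexing point, with all names assumed:
-- hypothetical single users table; the index on `type` keeps per-type lookups
-- fast regardless of the total row count
CREATE TABLE users (
    id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    type TINYINT UNSIGNED NOT NULL,   -- 1 = first user type, 2 = second user type
    name VARCHAR(100) NOT NULL,
    KEY idx_users_type (type)
) ENGINE=InnoDB;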
20 million rows is well within the capability of MySQL, but you need to be careful when forming your SQL queries, as inefficient queries can lead to slow performance.
If you are using Laravel's Eloquent then that is mostly taken care of.
Also, you might want to read up on MySQL tuning.

MySQL indexing - optional search criteria

"How many indexes should I use?" This question has been asked generally multiple times, I know. But I'm asking for an answer specific to my table structure and querying purposes.
I have a table with about 60 columns. I'm writing an SDK which has a function to fetch data based on optional search criteria. There are 10 columns for which the user can optionally pass in values (so the user might want all entries for a certain username and clientTimestamp, or all entries for a certain userID, etc). So potentially, we could be looking up data based on up to 10 columns.
This table will run INSERTS almost as often as SELECTS, and the table will usually have somewhere around 200-300K rows. Each row contains a significant amount of data (probably close to 0.5 MB).
Would it be a good or bad idea to have 10 indexes on this table?
A simple guide that may help you make a decision:
1. Index columns that have high selectivity.
2. Try normalizing your table (you mentioned username and userID columns; if it's not a user table, there's no need to store the name here).
3. Unless your system is completely generic, some parameters will be used more often than others. First of all, make sure you have indexes that support fast result retrieval for those parameters.
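A hedged sketch of point 3, assuming (hypothetically) that userID and clientTimestamp are the two most commonly used criteria:
-- one composite index (table/column names assumed) serves both
-- "WHERE userID = ?" and "WHERE userID = ? AND clientTimestamp > ?" lookups,
-- because the leftmost prefix of the index is usable on its own
ALTER TABLE events ADD INDEX idx_user_ts (userID, clientTimestamp);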

single table vs multiple table for millions of records

Here's the scenario: the old database has this kind of design
dbo.Table1998
dbo.Table1999
dbo.Table2000
dbo.table2001
...
dbo.table2011
and I merged all the data from 1998 to 2011 into this table: dbo.TableAllYears
now they're both indexed by "application number" and have the same number of columns (56 columns actually..)
now when I tried
select * from Table1998
and
select * from TableAllYears where Year=1998
the first query returns 139,669 rows in 13 seconds
while the second query returns the same number of rows in 30 seconds
so, am I just missing something, or are multiple tables better than a single table?
You should partition the table by year, this is almost equivalent to having different tables for each year. This way when you query by year it will query against a single partition and the performance will be better.
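A sketch of RANGE partitioning by year (column names are assumptions based on the question); a query with WHERE Year = 1998 then prunes to a single partition:
-- the partitioning column must be part of every unique key, so Year is
-- included in the primary key; the remaining ~54 columns are omitted here
CREATE TABLE TableAllYears (
    ApplicationNumber INT NOT NULL,
    Year SMALLINT NOT NULL,
    PRIMARY KEY (ApplicationNumber, Year)
)
PARTITION BY RANGE (Year) (
    PARTITION p1998 VALUES LESS THAN (1999),
    PARTITION p1999 VALUES LESS THAN (2000),
    -- in practice define one partition per year; p2000..p2010 omitted for brevity
    PARTITION p2011 VALUES LESS THAN (2012),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);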
Try adding an index on each of the columns that you're searching on (in the WHERE clause). That should speed up querying dramatically.
So in this case, add a new index for the field Year.
I believe that you should use a single table. Inevitably, you'll need to query data across multiple years, and separating it into multiple tables is a problem. It's quite possible to optimize your query and your table structure such that you can have many millions of rows in a table and still have excellent performance. Be sure your year column is indexed, and included in your queries. If you really hit data size limitations, you can use partitioning functionality in MySQL 5 that allows it to store the table data in multiple files, as if it were multiple tables, while making it appear to be one table.
Regardless of that, 140k rows is nothing, and it's likely premature optimization to split it into multiple tables, and even a major performance detriment if you need to query data across multiple years.
If you're looking for data from 1998, then having only 1998 data in one table is the way to go. This is because the database doesn't have to "search" for the records, but knows that all of the records in this table are from 1998. Try adding the "WHERE Year=1998" clause to the query against Table1998 and you should get a slightly better comparison.
Personally, I would keep the data in multiple tables, especially if it is a particularly large data set and you don't have to do queries on the old data frequently. Even if you do, you might want to look at creating a view with all of the table data and running the reports on that instead of having to query several tables.

What are some optimization techniques for MySQL table with 300+ million records?

I am looking at storing some JMX data from JVMs on many servers for about 90 days. This data would be statistics like heap size and thread count. This will mean that one of the tables will have around 388 million records.
From this data I am building some graphs so you can compare the stats retrieved from the MBeans. This means I will be grabbing some data at an interval using timestamps.
So the real question is: is there any way to optimize the table or query so you can perform these queries in a reasonable amount of time?
Thanks,
Josh
There are several things you can do:
Build your indexes to match the queries you are running. Run EXPLAIN to see the types of queries that are run and make sure that they all use an index where possible (a sketch follows this list).
Partition your table. Partitioning is a technique for splitting a large table into several smaller ones by a specific (aggregate) key. MySQL supports this internally from version 5.1.
If necessary, build summary tables that cache the costlier parts of your queries. Then run your queries against the summary tables. Similarly, temporary in-memory tables can be used to store a simplified view of your table as a pre-processing stage.
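A hedged sketch of the EXPLAIN-then-index loop, with table and column names assumed for the JMX data:
-- check how a typical graphing query is executed
EXPLAIN SELECT sampled_at, metric_value
FROM jmx_samples
WHERE server_id = 42
  AND sampled_at BETWEEN '2011-01-01 00:00:00' AND '2011-01-02 00:00:00';
-- if EXPLAIN reports a full table scan, an index matching the WHERE clause
-- usually fixes it
ALTER TABLE jmx_samples ADD INDEX idx_server_time (server_id, sampled_at);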
3 suggestions:
index
index
index
p.s. for timestamps you may run into performance issues -- depending on how MySQL handles DATETIME and TIMESTAMP internally, it may be better to store timestamps as integers. (# secs since 1970 or whatever)
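If you do store them as integers, MySQL's built-in conversion functions handle the boundary; a tiny sketch:
-- store seconds since 1970 in an INT UNSIGNED column and convert at the edges
SELECT UNIX_TIMESTAMP('2011-03-13 07:00:00');   -- DATETIME string -> integer seconds
SELECT FROM_UNIXTIME(1300000000);               -- integer seconds -> DATETIME value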
Well, for a start, I would suggest you use "offline" processing to produce 'graph ready' data (for most of the common cases) rather than trying to query the raw data on demand.
If you are using MySQL 5.1 you can use the new features, but be warned that they contain a lot of bugs.
First, you should use indexes.
If this is not enough, you can try to split the tables by using partitioning.
If this also won't work, you can also try load balancing.
A few suggestions.
You're probably going to run aggregate queries on this stuff, so after (or while) you load the data into your tables, you should pre-aggregate the data: for instance, pre-compute totals by hour, by user, or by week (whatever fits, you get the idea), and store that in cache tables that you use for your reporting graphs. If you can shrink your dataset by an order of magnitude, then good for you!
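A sketch of such a cache table, with all names hypothetical: hourly averages pre-computed from the raw samples so the graphs read far fewer rows.
-- summary table keyed by server, metric and hour
CREATE TABLE jmx_samples_hourly (
    server_id   INT NOT NULL,
    metric_name VARCHAR(64) NOT NULL,
    hour_start  DATETIME NOT NULL,
    avg_value   DOUBLE NOT NULL,
    max_value   DOUBLE NOT NULL,
    PRIMARY KEY (server_id, metric_name, hour_start)
);
-- periodically roll up the latest raw rows; REPLACE overwrites an hour if re-run
REPLACE INTO jmx_samples_hourly
SELECT server_id,
       metric_name,
       DATE_FORMAT(sampled_at, '%Y-%m-%d %H:00:00'),
       AVG(metric_value),
       MAX(metric_value)
FROM jmx_samples
WHERE sampled_at >= NOW() - INTERVAL 1 HOUR
GROUP BY server_id, metric_name, DATE_FORMAT(sampled_at, '%Y-%m-%d %H:00:00');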
This means I will be grabbing some data at an interval using timestamps.
So this means you only use data from the last X days ?
Deleting old data from tables can be horribly slow if you got a few tens of millions of rows to delete, partitioning is great for that (just drop that old partition). It also groups all records from the same time period close together on disk so it's a lot more cache-efficient.
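For example (table and partition names assumed), dropping an old partition is effectively instantaneous compared to a huge DELETE:
ALTER TABLE jmx_samples DROP PARTITION p2010_12;   -- hypothetical monthly partition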
Now if you use MySQL, I strongly suggest using MyISAM tables. You don't get crash-proofness or transactions and locking is dumb, but the size of the table is much smaller than InnoDB, which means it can fit in RAM, which means much quicker access.
Since big aggregates can involve lots of rather sequential disk IO, a fast IO system like RAID10 (or SSD) is a plus.
Is there any way to optimize the table or query so you can perform these queries in a reasonable amount of time?
That depends on the table and the queries ; can't give any advice without knowing more.
If you need complicated reporting queries with big aggregates and joins, remember that MySQL does not support any fancy JOIN algorithms, hash aggregates, or much else that is really useful; basically the only thing it can do is a nested-loop index scan, which is good on a cached table and absolutely atrocious in other cases where random access is involved.
I suggest you test with Postgres. For big aggregates the smarter optimizer does work well.
Example:
CREATE TABLE t (id INTEGER PRIMARY KEY AUTO_INCREMENT, category INT NOT NULL, counter INT NOT NULL) ENGINE=MyISAM;
INSERT INTO t (category, counter) SELECT n%10, n&255 FROM serie;
(serie contains 16M lines with n = 1 .. 16000000)
Operation                                                                   MySQL    Postgres
INSERT (16M rows)                                                           58 s     100 s
CREATE INDEX on (category, id) (useless)                                    75 s     51 s
SELECT category, sum(counter) FROM t GROUP BY category;                     9.3 s    5 s
SELECT category, sum(counter) FROM t WHERE id>15000000 GROUP BY category;   1.7 s    0.5 s
On a simple query like this pg is about 2-3x faster (the difference would be much larger if complex joins were involved).
EXPLAIN Your SELECT Queries
The EXPLAIN keyword shows how MySQL executes your query, so you can see which indexes are used and roughly how many rows are examined.
LIMIT 1 When Getting a Unique Row
SELECT * FROM user WHERE state = 'Alabama' -- wrong
SELECT 1 FROM user WHERE state = 'Alabama' LIMIT 1
Index the Search Fields
Indexes are not just for the primary keys or the unique keys. If there are any columns in your table that you will search by, you should almost always index them.
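Continuing the state example above (same assumed user table), the search field gets its own index:
ALTER TABLE user ADD INDEX idx_state (state);   -- lets the WHERE state = ... lookups use an index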
Index and Use Same Column Types for Joins
If your application contains many JOIN queries, you need to make sure that the columns you join by are indexed on both tables. This affects how MySQL internally optimizes the join operation.
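A hedged illustration with assumed users and companies tables; both join columns are indexed and declared with the same type and collation, since a mismatch (for example CHAR vs. INT, or differing collations) can prevent MySQL from using the indexes:
-- users.state and companies.state are both, say, CHAR(2) with the same
-- collation, and both carry an index
SELECT companies.company_name
FROM users
JOIN companies ON users.state = companies.state
WHERE users.id = 100;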
Do Not ORDER BY RAND()
If you really need random rows out of your results, there are much better ways of doing it. Granted, it takes additional code, but you will prevent a bottleneck that gets exponentially worse as your data grows. The problem is that MySQL has to evaluate RAND() (which takes processing power) for every single row in the table before sorting them and giving you just 1 row (see the sketch below).
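One common alternative, sketched with the same assumed user table: have the application pick a random offset first, then fetch a single row with LIMIT, so no per-row RAND() call or full sort is needed.
-- step 1: get the row count; the application picks offset = FLOOR(RAND() * count)
SELECT COUNT(*) FROM user;
-- step 2: fetch one row at that offset (73412 is a made-up offset value)
SELECT * FROM user LIMIT 73412, 1;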
Use ENUM over VARCHAR
ENUM type columns are very fast and compact. Internally they are stored like TINYINT, yet they can contain and display string values.
Use NOT NULL If You Can
Unless you have a very specific reason to use a NULL value, you should always set your columns as NOT NULL.
"NULL columns require additional space in the row to record whether their values are NULL. For MyISAM tables, each NULL column takes one bit extra, rounded up to the nearest byte."
Store IP Addresses as UNSIGNED INT
In your queries you can use INET_ATON() to convert an IP to an integer, and INET_NTOA() for the reverse. There are also similar functions in PHP called ip2long() and long2ip().
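A minimal sketch (the ip_address column name is an assumption):
-- an IPv4 address fits in a 4-byte INT UNSIGNED
ALTER TABLE user ADD COLUMN ip_address INT UNSIGNED;
UPDATE user SET ip_address = INET_ATON('192.168.0.1') WHERE id = 1;
SELECT INET_NTOA(ip_address) FROM user WHERE id = 1;   -- returns '192.168.0.1'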