Best structure for MySQL table to hold statistical data - mysql

I have a need for a solution that would allow me to track every single click (and the link clicked, and the date) in a web application (PHP5 / MySQL5.7). The simplest solution is obviously a simple table :
CREATE TABLE stats_data (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
log_date DATETIME NOT NULL DEFAULT NOW(),
link VARCHAR(512) NOT NULL
)
I'm not sure how this scales performance-wise, as the expected number of clicks per day could well go above 10000.
Is this a reliable solution, say, after 5 months of data stored ?
What optimizations could make this solution perform better?
If not, what would be a better solution approach for this ?

Mostly it depends on your use-case. What queries do you want to run over this dataset?
I would definitely recommend looking at a NoSQL store (a key-value store like Redis or a document database like MongoDB), but as I said, it depends on how you will use your data.
If you want to stick with MySQL, I have some advice on how to make that solution more reliable.
Don't write every click to the database as it happens; store it in a cache (memcached, for example) and save to MySQL once every hour.
Create a separate table for each month so you don't search one large table, and back up that table each month.
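A minimal sketch of the "buffer, then write in batches" idea, under loud assumptions: the buffer here is a plain in-process list rather than memcached, SQLite (via Python's sqlite3) stands in for MySQL so the example runs anywhere, and the ClickBuffer class is a hypothetical name, not part of any library. The table mirrors stats_data from the question.

```python
import sqlite3
from datetime import datetime, timezone


class ClickBuffer:
    """Collects clicks in memory and writes them to the table in one batch."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = []  # list of (log_date, link) tuples

    def record(self, link, when=None):
        when = when or datetime.now(timezone.utc)
        self.pending.append((when.strftime("%Y-%m-%d %H:%M:%S"), link))

    def flush(self):
        """Write all buffered clicks in a single transaction; return the count."""
        rows, self.pending = self.pending, []
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT INTO stats_data (log_date, link) VALUES (?, ?)", rows
            )
        return len(rows)


# SQLite stand-in for the MySQL table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stats_data ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " log_date TEXT NOT NULL,"
    " link TEXT NOT NULL)"
)

buf = ClickBuffer(conn)
buf.record("/home")
buf.record("/pricing")
written = buf.flush()  # in production this would run on a timer, e.g. hourly
stored = conn.execute("SELECT COUNT(*) FROM stats_data").fetchone()[0]
```

The trade-off is that anything still in the buffer is lost if the process dies before the next flush, so the flush interval is really a decision about how much data you can afford to lose.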

I guess you could put the links in a separate table and have your table reference it with a foreign key. That should make it faster to, for example, check the number of clicks on a specific link.
Depending on how accurate you need the data to be, you could also aggregate it into another table in a nightly job of some sort (a scheduled stored procedure should work).
That way you have a table where you can see how many times a link was clicked in a specific interval: a day, an hour or whatever suits your needs. I've used this approach at work, where we store statistics on web-service calls in an application under very heavy load, and it has been working fine with no performance issues whatsoever.
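The two suggestions above (a separate links table plus a nightly aggregate) could look roughly like this. It is only a sketch: SQLite via Python's sqlite3 is used as a stand-in for MySQL so it can be run as-is, and the table names links, clicks and clicks_daily as well as the rollup_day function are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE links (id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE);
CREATE TABLE clicks (
    id INTEGER PRIMARY KEY,
    link_id INTEGER NOT NULL REFERENCES links(id),
    log_date TEXT NOT NULL
);
-- One row per link per day, filled by the nightly job.
CREATE TABLE clicks_daily (
    link_id INTEGER NOT NULL REFERENCES links(id),
    day TEXT NOT NULL,
    clicks INTEGER NOT NULL,
    PRIMARY KEY (link_id, day)
);
""")
conn.executemany("INSERT INTO links (id, url) VALUES (?, ?)",
                 [(1, "/home"), (2, "/pricing")])
conn.executemany("INSERT INTO clicks (link_id, log_date) VALUES (?, ?)", [
    (1, "2016-08-25 09:00:00"),
    (1, "2016-08-25 17:30:00"),
    (2, "2016-08-25 12:00:00"),
    (1, "2016-08-26 08:15:00"),
])


def rollup_day(conn, day):
    """Recompute the per-link click counts for one calendar day (YYYY-MM-DD)."""
    with conn:  # delete + insert happen in one transaction
        conn.execute("DELETE FROM clicks_daily WHERE day = ?", (day,))
        conn.execute(
            "INSERT INTO clicks_daily (link_id, day, clicks) "
            "SELECT link_id, ?, COUNT(*) FROM clicks "
            "WHERE log_date >= ? AND log_date < date(?, '+1 day') "
            "GROUP BY link_id",
            (day, day, day),
        )


rollup_day(conn, "2016-08-25")
daily = conn.execute(
    "SELECT link_id, day, clicks FROM clicks_daily ORDER BY link_id"
).fetchall()
```

Deleting and re-inserting the day makes the job safe to re-run; reports then read the small clicks_daily table instead of the raw click log.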

There are a couple of things you can do to ensure performance:
Index the log_date column so queries run faster when searching for results by date range (https://dev.mysql.com/doc/refman/5.5/en/column-indexes.html)
Create partitions by log_date column (https://dev.mysql.com/doc/refman/5.6/en/partitioning-types.html)
By partitioning data by date columns you can "separate" data by hour / day / week / month / year... whatever you want...
Example:
CREATE TABLE members (
firstname VARCHAR(25) NOT NULL,
lastname VARCHAR(25) NOT NULL,
username VARCHAR(16) NOT NULL,
email VARCHAR(35),
joined DATE NOT NULL
)
PARTITION BY RANGE( YEAR(joined) ) (
PARTITION p0 VALUES LESS THAN (1960),
PARTITION p1 VALUES LESS THAN (1970),
PARTITION p2 VALUES LESS THAN (1980),
PARTITION p3 VALUES LESS THAN (1990),
PARTITION p4 VALUES LESS THAN MAXVALUE
)
Therefore, if you partition the data by week, a search for logs with a date equal to '2016-08-25' will only look at the partition holding dates between '2016-08-22' and '2016-08-28'.
I hope this can help you.

Related

Database table with million of rows

For example, I have some GPS devices that send info to my database every second,
so each device creates one row per second in a MySQL table with these 8 columns:
id=12341 date=22.02.2018 time=22:40
latitude=22.236558789 longitude=78.9654582 deviceID=24 name=device-name someinfo=asdadadasd
So in 1 minute it creates 60 rows, in 24 hours it creates 86400 rows,
and in 1 month (31 days) 2678400 rows.
So 1 device creates about 2.6 million rows per month in my table (records are deleted every month).
With more devices, it will be 2.6 million * number of devices.
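The arithmetic, spelled out as a quick check (one row per second per device works out to 86,400 rows a day; the rows_for helper and the 100-device figure are only illustrations, not numbers from the question):

```python
SECONDS_PER_DAY = 24 * 60 * 60

rows_per_day = SECONDS_PER_DAY       # one row per second for one device
rows_per_month = rows_per_day * 31   # a 31-day month


def rows_for(devices, days=31):
    """Rows written by `devices` devices over `days` days at one row per second."""
    return devices * days * SECONDS_PER_DAY


print(rows_per_day, rows_per_month, rows_for(100))
```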
So my questions are:
Question 1: If I run a search like this from PHP (just for the current day and for 1 device)
SELECT * FROM TABLE WHERE date='22.02.2018' AND deviceID= '24'
the maximum possible result will be 86400 rows.
Will that overload my server too much?
Question 2: If I limit it to 5 hours (18000 rows), will that be a problem for the database, and will it load the server as much as the first example or less?
SELECT * FROM TABLE WHERE date='22.02.2018' AND deviceID= '24' LIMIT 18000
Question 3: If I show just 1 result from the db, will it overload the server?
SELECT * FROM TABLE WHERE date='22.02.2018' AND deviceID= '24' LIMIT 1
Does it mean that, with millions of rows in the table, fetching 1000 rows loads the server the same as fetching just 1 result?
Millions of rows is not a problem, this is what SQL databases are designed to handle, if you have a well designed schema and good indexes.
Use proper types
Instead of storing your dates and times as separate strings, store them either as a single datetime or as separate date and time types. See indexing below for more about which one to use. This is more compact, allows indexing and faster sorting, and makes the date and time functions available without having to do conversions.
Similarly, be sure to use the appropriate numeric type for latitude, and longitude. You'll probably want to use numeric to ensure precision.
Since you're going to be storing billions of rows, be sure to use a bigint for your primary key. A regular int can only go up to about 2 billion.
Move repeated data into another table.
Instead of storing information about the device in every row, store that in a separate table. Then only store the device's ID in your log. This will cut down on your storage size, and eliminate mistakes due to data duplication. Be sure to declare the device ID as a foreign key, this will provide referential integrity and an index.
Add indexes
Indexes are what allow a database to search through millions or billions of rows very, very efficiently. Be sure there are indexes on the columns you use frequently, such as your timestamp.
A lack of indexes on date and deviceID is likely why your queries are so slow. Without an index, MySQL has to look at every row in the table, which is known as a full table scan.
You can discover whether your queries are using indexes with explain.
datetime or time + date?
Normally it's best to store your date and time in a single column, conventionally called created_at. Then you can use date to get just the date part like so.
select *
from gps_logs
where date(created_at) = '2018-07-14'
There's a problem. The problem is how indexes work... or don't. Because of the function call, where date(created_at) = '2018-07-14' will not use an index. MySQL will run date(created_at) on every single row. This means a performance killing full table scan.
You can work around this by working with just the datetime column. This will use an index and be efficient.
select *
from gps_logs
where '2018-07-14 00:00:00' <= created_at and created_at < '2018-07-15 00:00:00'
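One way to check this yourself is to compare the query plans. The sketch below uses SQLite through Python's sqlite3 module only because it runs anywhere; it is not MySQL, the index name idx_created_at is made up for the example, and the exact plan text differs between engines and versions. The scan-versus-index-search distinction is the same idea, though, and in MySQL you would run EXPLAIN on both queries instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gps_logs ("
    " id INTEGER PRIMARY KEY,"
    " created_at TEXT NOT NULL,"
    " latitude REAL,"
    " longitude REAL)"
)
conn.execute("CREATE INDEX idx_created_at ON gps_logs (created_at)")


def plan(sql):
    """Return the human-readable query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan text


# Column wrapped in a function: the index on created_at cannot be used.
wrapped = plan(
    "SELECT * FROM gps_logs WHERE date(created_at) = '2018-07-14'"
)

# Bare column compared against a half-open range: the index can be used.
ranged = plan(
    "SELECT * FROM gps_logs "
    "WHERE created_at >= '2018-07-14 00:00:00' "
    "AND created_at < '2018-07-15 00:00:00'"
)

print(wrapped)
print(ranged)
```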
Or you can split your single datetime column into date and time columns, but this introduces new problems. Querying ranges which cross a day boundary becomes difficult. Like maybe you want a day in a different time zone. It's easy with a single column.
select *
from gps_logs
where '2018-07-12 10:00:00' <= created_at and created_at < '2018-07-13 10:00:00'
But it's more involved with a separate date and time.
select *
from gps_logs
where (created_date = '2018-07-12' and created_time >= '10:00:00')
or (created_date = '2018-07-13' and created_time < '10:00:00');
Or you can switch to a database with expression indexes, like PostgreSQL. An expression index lets you index the result of a function or expression, such as the date part of a timestamp. And PostgreSQL does a lot of things better than MySQL. This is what I recommend.
Do as much work in SQL as possible.
For example, if you want to know how many log entries there are per device per day, rather than pulling all the rows out and calculating them yourself, you'd use group by to group them by device and day.
select gps_device_id, count(id) as num_entries, created_at::date as day
from gps_logs
group by gps_device_id, day;
 gps_device_id | num_entries |    day
---------------+-------------+------------
             1 |       29310 | 2018-07-12
             2 |       23923 | 2018-07-11
             2 |       23988 | 2018-07-12
With this much data, you will want to rely heavily on group by and the associated aggregate functions like sum, count, max, min and so on.
Avoid select *
If you must retrieve 86400 rows, simply fetching all that data from the database can be costly. You can speed this up significantly by only fetching the columns you need. This means using select only, the, specific, columns, you, need rather than select *.
Putting it all together.
In PostgreSQL
Your schema in PostgreSQL should look something like this.
create table gps_devices (
id serial primary key,
name text not null
-- any other columns about the devices
);
create table gps_logs (
id bigserial primary key,
gps_device_id int references gps_devices(id),
created_at timestamp not null default current_timestamp,
latitude numeric(12,9) not null,
longitude numeric(12,9) not null
);
create index timestamp_and_device on gps_logs(created_at, gps_device_id);
create index date_and_device on gps_logs((created_at::date), gps_device_id);
A query can generally only use one index per table. Since you'll be searching on the timestamp and device ID together a lot, timestamp_and_device indexes both the timestamp and the device ID.
date_and_device is the same thing, but it's an expression index on just the date part of the timestamp. This will make where created_at::date = '2018-07-12' and gps_device_id = 42 very efficient.
In MySQL
create table gps_devices (
id int primary key auto_increment,
name text not null
-- any other columns about the devices
);
create table gps_logs (
id bigint primary key auto_increment,
gps_device_id int,
-- MySQL parses but ignores an inline REFERENCES on a column, so declare the key explicitly
foreign key (gps_device_id) references gps_devices(id),
created_at timestamp not null default current_timestamp,
latitude numeric(12,9) not null,
longitude numeric(12,9) not null
);
create index timestamp_and_device on gps_logs(created_at, gps_device_id);
Very similar, but no expression index on the date part. So you'll either need to always use a bare created_at in your where clauses, or switch to separate date and time types.
I just read your question. For me the answer is:
Just create a separate table for latitude and longitude, make your ID a foreign key, and save the values there.
Without knowing the exact queries you want to run I can just guess the best structure. Having said that, you should aim for the optimal types that use the minimum number of bytes per row. This should make your queries faster.
For example, you could use the structure below:
create table device (
id int primary key not null,
name varchar(20),
someinfo varchar(100)
);
create table location (
device_id int not null,
recorded_at timestamp not null,
latitude double not null, -- instead of varchar; maybe float?
longitude double not null, -- instead of varchar; maybe float?
foreign key (device_id) references device (id)
);
create index ix_loc_dev on location (device_id, recorded_at);
If you include the exact queries (naming the columns) we can create better indexes for them.
Since your query selectivity is probably bad, your queries may run full table scans. For that case I took it a step further and used the smallest possible data types for the columns, so scans will be faster:
create table location (
device_id tinyint not null, -- device.id must then be tinyint too, or MySQL rejects the foreign key
recorded_at timestamp not null,
latitude float not null,
longitude float not null,
foreign key (device_id) references device (id)
);
Can't really think of anything smaller than this.
The best I can recommend is to use a time-series database for storing and accessing time-series data. You can host any kind of time-series database engine locally and just put a bit more resources into developing its access methods, or use a specialized database for telematics data like this.

Better Way of Storing Old Data for Faster Access

The application we are developing writes around 4-5 million rows of data every day, and we need to keep this data for the past 90 days.
The table user_data has the following structure (simplified):
id INT PRIMARY AUTOINCREMENT
dt TIMESTAMP CURRENT_TIMESTAMP
user_id varchar(20)
data varchar(20)
About the application:
Data that is older than 7 days will not be written / updated.
Data is mostly accessed based on user_id (i.e. all queries will have WHERE user_id = XXX)
There are around 13000 users at the moment.
User can still access older data. But, in accessing the older data, we can restrict that he/she can only get the whole day data only and not a time range. (e.g. If a user attempts to get the data for 2016-10-01, he/she will get the data for the whole day and will not be able to get the data for 2016-10-01 13:00 - 2016-10-01 14:00).
At the moment, we are using MySQL InnoDB to store the latest data (i.e. 7 days and newer) and it is working fine and fits in the innodb_buffer_pool.
As for the older data, we created smaller tables in the form of user_data_YYYYMMDD. After a while, we figured that these tables cannot fit into the innodb_buffer_pool and it started to slow down.
We think that, on top of separating by date, sharding by user_id would be better (i.e. using smaller data sets based on user and date, such as user_data_[YYYYMMDD]_[USER_ID]). This would keep each table much smaller (only around 10K rows at most).
After researching around, we have found that there are a few options out there:
Using mysql tables to store per user per date (i.e. user_data_[YYYYMMDD]_[USER_ID]).
Using mongodb collection for each user_data_[YYYYMMDD]_[USER_ID]
Write the old data (json encoded) into [USER_ID]/[YYYYMMDD].txt
The biggest con I see in this is that we will have a huge number of tables/collections/files when we do this (i.e. 13000 x 90 = 1,170,000). I wonder if we are approaching this the right way in terms of future scalability, or if there are other standardized solutions for this.
Scaling a database is a problem unique to each application. Most of the time someone else's approach cannot be reused, as almost every application writes its data in its own way. So you have to figure out how you are going to manage your data.
Having said that, if your data continues to grow, the best solution is sharding, where you distribute the data across different servers. As long as you are bound to a single server, as you are when creating different tables, you will be hit by resource limits such as memory, storage and processing power. Those cannot be increased without limit.
How to distribute the data is something you have to figure out based on your business use cases. As you mentioned, if you are not getting many requests for old data, the best way is to distribute the data based on date, like a DB for 2016 data, a DB for 2015 and so on. Later you may purge or shut down the servers that hold the oldest data.
This is a big table, but not unmanageable.
If user_id + dt is UNIQUE, make it the PRIMARY KEY, and get rid of id, thereby saving space. (More in a minute...)
Normalize user_id to a SMALLINT UNSIGNED (2 bytes) or, to be safer MEDIUMINT UNSIGNED (3 bytes). This will save a significant amount of space.
Saving space is important for speed (I/O) for big tables.
PARTITION BY RANGE(TO_DAYS(dt))
with 92 partitions: the 90 you need, plus one waiting to be DROPped and one being filled.
ENGINE=InnoDB
to get the PRIMARY KEY clustered.
PRIMARY KEY(user_id, dt)
If this is "unique", then it allows efficient access for any time range for a single user. Note: you can remove the "just a day" restriction. However, you must formulate the query without hiding dt in a function. I recommend:
WHERE user_id = ?
AND dt >= ?
AND dt < ? + INTERVAL 1 DAY
Furthermore,
PRIMARY KEY(user_id, dt, id),
INDEX(id)
Would also be efficient even if (user_id, dt) is not unique. The addition of id to the PK is to make it unique; the addition of INDEX(id) is to keep AUTO_INCREMENT happy. (No, UNIQUE(id) is not required.)
INT --> BIGINT UNSIGNED ??
INT (which is SIGNED) will top out at about 2 billion. At 4-5 million rows a day, that will happen in less than two years. Is that OK? If not, you may need BIGINT (8 bytes vs 4).
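A back-of-the-envelope check of how long an INT id lasts at the insert rates given in the question (4-5 million rows per day). The days_until_exhausted helper is just for illustration, and it assumes ids are handed out one per row with no gaps.

```python
INT_SIGNED_MAX = 2**31 - 1     # 2,147,483,647
INT_UNSIGNED_MAX = 2**32 - 1   # 4,294,967,295


def days_until_exhausted(max_id, rows_per_day):
    """Whole days of inserts before an AUTO_INCREMENT id would pass max_id."""
    return max_id // rows_per_day


signed_fast = days_until_exhausted(INT_SIGNED_MAX, 5_000_000)
signed_slow = days_until_exhausted(INT_SIGNED_MAX, 4_000_000)
unsigned_fast = days_until_exhausted(INT_UNSIGNED_MAX, 5_000_000)

print(signed_fast, signed_slow, unsigned_fast)
```

Going UNSIGNED only doubles the runway, which is why BIGINT is the usual answer at this volume.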
This partitioning design does not care about your 7-day rule. You may choose to keep the rule and enforce it in your app.
BY HASH
will not work as well.
SUBPARTITION
is generally useless.
Are there other queries? If so they must be taken into consideration at the same time.
Sharding by user_id would be useful if the traffic were too much for a single server. MySQL, itself, does not (yet) have a sharding solution.
Try TokuDB engine at https://www.percona.com/software/mysql-database/percona-tokudb
Archive data are great for TokuDB. You will need about six times less disk space to store AND memory to PROCESS your dataset compared to InnoDB or about 2-3 times less than archived myisam.
1 million+ tables sounds like a bad idea. Having sharding via dynamic table naming by the app code at runtime has also not been a favorable pattern for me. My first go-to for this type of problem would be partitioning. You probably don't want 400M+ rows in a single unpartitioned table. In MySQL 5.7 you can even subpartition (but that gets more complex). I would first range partition on your date field, with one partition per day. Index on the user_id. If you are on 5.7 and want to dabble with subpartitioning, I would suggest range partition by date, then hash subpartition by user_id. As a starting point, try 16 to 32 hash buckets. Still index the user_id field.
EDIT: Here's something to play with:
CREATE TABLE user_data (
id INT AUTO_INCREMENT
, dt TIMESTAMP DEFAULT CURRENT_TIMESTAMP
, user_id VARCHAR(20)
, data varchar(20)
, PRIMARY KEY (id, user_id, dt)
, KEY (user_id, dt)
) PARTITION BY RANGE (UNIX_TIMESTAMP(dt))
SUBPARTITION BY KEY (user_id)
SUBPARTITIONS 16 (
PARTITION p1 VALUES LESS THAN (UNIX_TIMESTAMP('2016-10-25')),
PARTITION p2 VALUES LESS THAN (UNIX_TIMESTAMP('2016-10-26')),
PARTITION p3 VALUES LESS THAN (UNIX_TIMESTAMP('2016-10-27')),
PARTITION p4 VALUES LESS THAN (UNIX_TIMESTAMP('2016-10-28')),
PARTITION pMax VALUES LESS THAN MAXVALUE
);
-- View the metadata if you're interested
SELECT * FROM information_schema.partitions WHERE table_name='user_data';

MySQL table setup for stock information

I am collecting about 3-6 million lines of stock data per day and storing it in a MySQL database.
All of the data comes from Interactive Brokers. Every piece of information comes with these five fields: Symbol, Date, Time, Value and Type (type being information on what kind of data I am receiving, such as price, volume etc.)
Here is my create table statement. idticks is just my unique key but I almost never am able to use it in queries.
CREATE TABLE `ticks` (
`idticks` int(11) NOT NULL AUTO_INCREMENT,
`symbol` varchar(30) NOT NULL,
`date` int(11) NOT NULL,
`time` int(11) NOT NULL,
`value` double NOT NULL,
`type` double NOT NULL,
KEY `idticks` (`idticks`),
KEY `symbol` (`symbol`),
KEY `date` (`date`),
KEY `idx_ticks_symbol_date` (`symbol`,`date`),
KEY `idx_ticks_type` (`type`),
KEY `idx_ticks_date_type` (`date`,`type`),
KEY `idx_ticks_date_symbol_type` (`date`,`symbol`,`type`),
KEY `idx_ticks_symbol_date_time_type` (`symbol`,`date`,`time`,`type`)
) ENGINE=InnoDB AUTO_INCREMENT=13533258 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY KEY (`date`)
PARTITIONS 1 */;
As you can see, I have no idea what I am doing because I just keep on creating indexes to make my queries go faster.
Right now the data is being stored on a rather slow computer for testing purposes so I understand that my queries are not nearly as fast as they could be (I have a 6 core, 64gig of ram, SSD machine arriving tomorrow which should help significantly)
That being said, I am running queries like this one
select time, value from ticks where symbol = "AAPL" AND date = 20150522 and type = 8 order by time asc
The query above, if I do not limit it, returns 12928 records for one of my test days and takes 10.2 seconds if I do it from cleared cache.
I am doing lots of graphing and eventually would like to be able to just query the data as I need to graph it. Right now I haven't noticed a lot of difference in speed between getting part of a day's worth of data vs just getting the entire day's. It would be cool to have those queries respond fast enough that there is barely any delay when I move to the next day/screen/whatever.
Another query I am using for usability of a program I am writing to interact with the data include
String query = "select distinct `date` from ticks where symbol = '" + symbol + "' order by `date` desc";
But most of my need is the ability to pull a certain type of data from a certain day for a certain symbol like my first query.
I've googled all over the place and I think I understand that creating tons of indexes makes the database bigger and slows down the input speed (I get about 300 pieces of information per second on a busy day). Should I just index each column individually?
I am willing to throw more harddrives at things if it means responsive interface.
Basically, my questions relate to the creation/altering of my table. Based on the above query, can you think of anything I could do to make that faster? Or an indexing system that would help me out? Is InnoDB even the right engine? I tried googling this vs MyISAM and after a couple of hours of this, I still wasn't sure.
Thanks :)
Combine date and time into a DATETIME field
Assuming Price and Volume always come in together, put them together (2 columns) and get rid of type.
Get rid of the AUTO_INCREMENT; change to PRIMARY KEY(symbol, datetime)
Get rid of any indexes that are the left part of some other index.
Once you are using DATETIME, use date ranges to find everything in a single date (if you need such). Do not use DATE(datetime) = '...', performance will be terrible.
Symbol can probably be ascii, not utf8.
Use InnoDB, the clustering of the Primary Key can be beneficial.
Do you expect to collect (and use) more data than will fit in innodb_buffer_pool_size? If so, we need to discuss your SELECTs and look into PARTITIONing.
Make those changes, then come back for more advice/abuse.
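To make the "left part of some other index" rule concrete, here is a small sketch that applies it to the index list from the question's CREATE TABLE. The redundant_indexes function is a made-up helper, not a MySQL feature; it only compares column lists, so treat its output as candidates to review rather than a drop list.

```python
# Index name -> column list, copied from the ticks table in the question.
INDEXES = {
    "idticks": ("idticks",),
    "symbol": ("symbol",),
    "date": ("date",),
    "idx_ticks_symbol_date": ("symbol", "date"),
    "idx_ticks_type": ("type",),
    "idx_ticks_date_type": ("date", "type"),
    "idx_ticks_date_symbol_type": ("date", "symbol", "type"),
    "idx_ticks_symbol_date_time_type": ("symbol", "date", "time", "type"),
}


def redundant_indexes(indexes):
    """Names of indexes whose columns are a leading prefix of a longer index."""
    redundant = []
    for name, cols in indexes.items():
        for other, other_cols in indexes.items():
            is_longer = len(other_cols) > len(cols)
            if other != name and is_longer and other_cols[:len(cols)] == cols:
                redundant.append(name)
                break
    return redundant


candidates = redundant_indexes(INDEXES)
print(candidates)
```

Note that the index on idticks is not flagged, and it has to stay for as long as the AUTO_INCREMENT column does.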
You're creating a historical database, so MyISAM would work as well as InnoDB. InnoDB is a transactional storage engine, and is better suited for relational databases with multiple tables that must remain synchronized.
Your Stock table looks like this.
Stock
-----
Stock ID (idticks)
Symbol
Date
Time
Value
Type
It would be better if you combine the date and time into a time stamp column, and unpack the types like this.
Stock
-----
Stock ID
Symbol
Time Stamp
Volume
Open
Close
Bid
Ask
...
This makes it easier for the database to return rows for a query on a particular type, like the close value.
As far as indexes, you can create as many indexes as you want. You're adding (inserting) information, so the increased time to add information is offset by the decreased time to query the information.
I'd have a primary index on Stock ID, and a unique index on Symbol and Time Stamp descending. You could also have indexes on the values you query most often, like Close.

How to handle large amounts of data in MySQL database?

Background
I have spent a couple of days trying to figure out how I should handle large amounts of data in MySQL. I have selected some programs and techniques for the new server for the software. I am probably going to use Ubuntu 14.04 LTS running nginx and Percona Server, with TokuDB for the 3 tables I have planned and InnoDB for the rest of the tables.
But the major problem is still unresolved: how do I handle the huge amount of data in the database?
Data
My estimate for the data I may receive is 500 million rows a year. I will be receiving measurement data from sensors every 4 minutes.
Requirements
Insertion speed is not very critical, but I want to be able to select a few hundred measurements in 1-2 seconds. The amount of required resources is also a key factor.
Current plan
Now I have thought of splitting the sensor data in 3 tables.
EDIT:
On every table:
id = PK, AI
sensor_id will be indexed
CREATE TABLE measurements_minute(
id bigint(20),
value float,
sensor_id mediumint(8),
created timestamp
) ENGINE=TokuDB;
CREATE TABLE measurements_hour(
id bigint(20),
value float,
sensor_id mediumint(8),
created timestamp
) ENGINE=TokuDB;
CREATE TABLE measurements_day(
id bigint(20),
value float,
sensor_id mediumint(8),
created timestamp
) ENGINE=TokuDB;
So I would be storing this 4 minute data for one month. After the data is 1 month old it would be deleted from minute table. Then average value would be calculated from the minute values and inserted into the measurements_hour table. Then again when the data is 1 year old all the hour data would be deleted and daily averages would be stored in measurements_day table.
Questions
Is this considered a good way of doing this? Is there something else to take into consideration? How about table partitioning, should I do that? How should I execute the splitting of the data into different tables? Triggers and procedures?
EDIT: My ideas
Any idea if MonetDB or Infobright would be any good for this?
I have a few suggestions, and further questions.
You have not defined a primary key on your tables, so MySQL will create one automatically. Assuming that you meant for "id" to be your primary key, you need to change the line in all your table create statements to be something like "id bigint(20) NOT NULL AUTO_INCREMENT PRIMARY KEY,".
You haven't defined any indexes on the tables, how do you plan on querying? Without indexes, all queries will be full table scans and likely very slow.
Lastly, for this use-case, I'd partition the tables to make the removal of old data quick and easy.
I had to solve that type of problem before, with nearly a million rows per hour.
Some tips:
Use the MyISAM engine. You don't need to update or manage transactions with those tables. You are going to insert, select the values, and eventually delete them.
Be careful with the indexes. In my case, insertion was critical and sometimes the MySQL queue was full of pending inserts. An insert takes more time the more indexes your table has. Which indexes you need depends on your calculated values and when you are going to compute them.
Shard your buffer tables. I only triggered the calculation when a table was ready. While I was calculating values from the buffer_a table, the insertions were going to buffer_b. In my case, I calculated the values every day, so I switched the destination table every day. In fact, I dumped all the data and exported it to another database to compute the averages and run other processes without disturbing the inserts.
I hope you find this helpful.

What is MYSQL Partitioning?

I have read the documentation (http://dev.mysql.com/doc/refman/5.1/en/partitioning.html), but I would like, in your own words, what it is and why it is used.
Is it mainly used for multiple servers so it doesn't drag down one server?
So, part of the data will be on server1, and part of the data will be on server2. And server 3 will "point" to server1 or server2...is that how it works?
Why does MYSQL documentation focus on partitioning within the same server...if the purpose is to spread it across servers?
The idea behind partitioning isn't to use multiple servers but to use multiple tables instead of one table. You can divide a table into many tables so that you can have old data in one sub table and new data in another table. Then the database can optimize queries where you ask for new data knowing that they are in the second table. What's more, you define how the data is partitioned.
Simple example from the MySQL Documentation:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT,
store_id INT
)
PARTITION BY RANGE ( YEAR(separated) ) (
PARTITION p0 VALUES LESS THAN (1991),
PARTITION p1 VALUES LESS THAN (1996),
PARTITION p2 VALUES LESS THAN (2001),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
This makes it possible to speed up, for example:
Dropping old data with a simple:
ALTER TABLE employees DROP PARTITION p0;
The database can also speed up a query like this:
SELECT COUNT(*)
FROM employees
WHERE separated BETWEEN '2000-01-01' AND '2000-12-31'
GROUP BY store_id;
Knowing that all data is stored only on the p2 partition.
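To see why only p2 is involved, the range lookup can be modelled in a few lines. This is just an illustration of the VALUES LESS THAN rule for the employees example above, not how MySQL implements pruning; the function names are made up.

```python
from bisect import bisect_right
from datetime import date

# Upper bounds from the example: p0 < 1991, p1 < 1996, p2 < 2001, p3 = MAXVALUE.
BOUNDS = [1991, 1996, 2001]
NAMES = ["p0", "p1", "p2", "p3"]


def partition_for(d):
    """Partition that stores a row whose `separated` date is d."""
    # A year equal to a bound is NOT "less than" it, so it goes to the next one.
    return NAMES[bisect_right(BOUNDS, d.year)]


def partitions_for_range(start, end):
    """Partitions that can hold rows with start <= separated <= end."""
    first = bisect_right(BOUNDS, start.year)
    last = bisect_right(BOUNDS, end.year)
    return NAMES[first:last + 1]


touched = partitions_for_range(date(2000, 1, 1), date(2000, 12, 31))
print(touched)
```

A wider range, say mid-1995 to 2003, would touch p1, p2 and p3, so the benefit depends on how well your typical queries line up with the partition boundaries.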
A partitioned table is a single logical table that’s composed of multiple physical subtables. The partitioning code is really just a wrapper around a set of Handler objects that represent the underlying partitions, and it forwards requests to the storage engine through the Handler objects. Partitioning is a kind of black box that hides the underlying partitions from you at the SQL layer, although you can see them quite easily by looking at the filesystem, where you’ll see the component tables with a hash-delimited naming convention.
For example, here’s a simple way to place each year’s worth of sales into a separate partition:
CREATE TABLE sales (
order_date DATETIME NOT NULL,
-- Other columns omitted
) ENGINE=InnoDB PARTITION BY RANGE(YEAR(order_date)) (
PARTITION p_2010 VALUES LESS THAN (2010),
PARTITION p_2011 VALUES LESS THAN (2011),
PARTITION p_2012 VALUES LESS THAN (2012),
PARTITION p_catchall VALUES LESS THAN MAXVALUE );
It is not really about using different server instances (although that is sometimes a possibility), it is more about dividing your tables in different physical partitions.
It's dividing your tables and indexes into smaller pieces, and even subdividing those into even smaller pieces.
Think of it as having several million different magazines of different topics and different years (say 2000-2019) all in one big warehouse (one big table). Partitioning would mean that you would put them organized in different rooms inside that big warehouse. They still belong together inside the one warehouse, but now you group them on a logical level, depending on your database partitioning strategy.
Indexing is actually like keeping a table of which magazine is where in your warehouse, or in your rooms inside your warehouse. As you can see, there is a big difference between database partitioning and indexing, and they can be very well used together.
You can read more about it on my website on this article about Database Partitioning