Split table performance in MySQL

Hi everyone. Here is a problem with my MySQL server.
I have a table with about 40,000,000 rows and 10 columns.
Its size is about 4GB, and the engine is InnoDB.
It is a master database, and it executes only one kind of SQL statement, like this:
insert into mytable ... on duplicate key update ...
About 99% of the statements executed take the UPDATE path.
Now the server is becoming slower and slower.
I heard that splitting a table may enhance performance, so I tried it on my personal computer. I split the table into 10 tables, and it failed; I also tried 100, and that failed too. The speed became slower instead. So I wonder why splitting the table didn't enhance performance?
Thanks in advance.
More details:
CREATE TABLE my_table (
id BIGINT AUTO_INCREMENT,
user_id BIGINT,
identifier VARCHAR(64),
account_id VARCHAR(64),
top_speed INT UNSIGNED NOT NULL,
total_chars INT UNSIGNED NOT NULL,
total_time INT UNSIGNED NOT NULL,
keystrokes INT UNSIGNED NOT NULL,
avg_speed INT UNSIGNED NOT NULL,
country_code VARCHAR(16),
update_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY(id), UNIQUE KEY(user_id)
);
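For concreteness, the IODKU presumably looks something like this (a hypothetical reconstruction from the schema above; the real statement was not shown, and the per-column update rules are guesses):

INSERT INTO my_table
    (user_id, identifier, account_id, top_speed, total_chars,
     total_time, keystrokes, avg_speed, country_code)
VALUES
    (12345, 'some-identifier', 'some-account', 520, 100000, 3600, 110000, 95, 'US')
ON DUPLICATE KEY UPDATE
    top_speed   = VALUES(top_speed),    -- or perhaps GREATEST(top_speed, VALUES(top_speed))
    total_chars = VALUES(total_chars),
    total_time  = VALUES(total_time),
    keystrokes  = VALUES(keystrokes),
    avg_speed   = VALUES(avg_speed);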
PS:
I also tried different computers with a Solid State Drive and a Hard Disk Drive, but that didn't help either.

Splitting up a table is unlikely to help at all. Ditto for PARTITIONing.
Let's count the disk hits. I will skip counting non-leaf nodes in BTrees, since they tend to be cached; I will count leaf nodes in the data and indexes, since they tend not to be cached.
IODKU does:
Read the index block containing the entry for each UNIQUE key. In your case, that is probably user_id. (Please provide a sample SQL statement.) 1 read.
If the user_id entry is found in the index, read the record from the data as indexed by the PK(id) and do the UPDATE, and leave this second block in the buffer_pool for eventual rewrite to disk. 1 read now, 1 write later.
If the record is not found, do INSERT. The index block that needs the new row was already read, so it is ready to have a new entry inserted. Meanwhile, the "last" block in the table (due to id being AUTO_INCREMENT) is probably already cached. Add the new row to it. 0 reads now, 1 write later (UNIQUE). (Rewriting the "last" block is amortized over, say, 100 rows, so I am ignoring it.)
Eventually do the write(s).
Total, assuming essentially all take the UPDATE path: 2 reads and 1 write. Assuming the user_id follows no simple pattern, I will assume that all 3 I/Os are "random".
Let's consider a variation... What if you got rid of id? Do you need id anywhere else? Since you have a UNIQUE key, it could be the PK. That is, replace your two indexes with just PRIMARY KEY(user_id). Now the counts are:
1 read
If UPDATE, 0 read, 1 write
If INSERT, 0 read, 0 write
Total: 1 read, 1 write. 2/3 as many as before. Better, but still not great.
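A hypothetical sketch of that change (verify the UNIQUE key's actual name with SHOW CREATE TABLE first; the ALTER rebuilds the table and locks it for a long time):

ALTER TABLE my_table
    DROP COLUMN id,            -- also removes the old PRIMARY KEY
    DROP INDEX user_id,        -- default name of the UNIQUE KEY(user_id)
    ADD PRIMARY KEY (user_id);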
Caching
How much RAM do you have?
What is the value of innodb_buffer_pool_size?
SHOW TABLE STATUS -- What are Data_length and Index_length?
I suspect that the buffer_pool is not big enough, and possibly could be raised. If you have more than 4GB of RAM, make it about 70% of RAM.
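A quick way to check (and, since the variable is dynamic in MySQL 5.7.5+, to resize without a restart; the 16GB figure below is just an example assumption):

SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gb;
SHOW TABLE STATUS LIKE 'my_table';   -- Data_length and Index_length are in bytes
-- e.g. with 16GB of RAM, ~70% would be about 11GB:
SET GLOBAL innodb_buffer_pool_size = 11 * 1024 * 1024 * 1024;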
Others
SSDs should have helped significantly, since you appear to be I/O-bound. Can you tell whether you are I/O-bound or CPU-bound?
How many rows are you updating at once? How long does it take? Is it batched, or one at a time? There may be a significant improvement possible here; see the batched sketch after this list.
Do you really need BIGINT (8 bytes)? INT UNSIGNED is only 4 bytes.
Is a transaction involved?
Is the Master having a problem? The Slave? Both? I don't want to fix the Master in such a way that it messes up the Slave.
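Regarding batching: if the application currently sends one IODKU per row, folding, say, 100 rows into one statement saves most of the per-statement overhead. A hypothetical sketch (same guessed update rules as above):

INSERT INTO my_table
    (user_id, identifier, account_id, top_speed, total_chars,
     total_time, keystrokes, avg_speed, country_code)
VALUES
    (101, 'id-101', 'acct-101', 500, 90000, 3000, 99000, 88, 'US'),
    (102, 'id-102', 'acct-102', 480, 85000, 2900, 94000, 86, 'DE'),
    (103, 'id-103', 'acct-103', 510, 91000, 3100, 99500, 90, 'FR')
    -- ... up to ~100 rows per statement
ON DUPLICATE KEY UPDATE
    top_speed   = VALUES(top_speed),
    total_chars = VALUES(total_chars),
    total_time  = VALUES(total_time),
    keystrokes  = VALUES(keystrokes),
    avg_speed   = VALUES(avg_speed);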

Try splitting your database across several MySQL instances using a proxy such as mysql-proxy or HAProxy, instead of a single MySQL instance. You may get much better performance.

Related

Very slow INSERTs into a large MySQL table without an AUTOINCREMENT primary key

I've recently noticed a significant increase in variance of time needed to complete simple INSERT statements. While these statements on average take around 11ms, they can sometimes take 10-30 seconds, and I even noticed them taking over 5 minutes to execute.
MySQL version is 8.0.24, running on Windows Server 2016. The server's resources are never overloaded as far as I can tell. There is an ample amount of CPU overhead for the server to use, and 32GB of RAM is allocated to it.
This is the table I'm working with:
CREATE TABLE `saved_segment` (
`recording_id` bigint unsigned NOT NULL,
`index` bigint unsigned NOT NULL,
`start_filetime` bigint unsigned NOT NULL,
`end_filetime` bigint unsigned NOT NULL,
`offset_and_size` bigint unsigned NOT NULL DEFAULT '18446744073709551615',
`storage_id` tinyint unsigned NOT NULL,
PRIMARY KEY (`recording_id`,`index`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
The table has no other indices or foreign keys, nor is it used as a reference for a foreign key in any other table. The entire table size is approximately 20GB with around 281M rows, which doesn't strike me as too large.
The table is used almost entirely read-only, with up to 1000 reads per second. All of these reads happen in simple SELECT queries, not in complex transactions, and they utilize the primary key index efficiently. There are very few, if any, concurrent writes to this table. This has been done intentionally in order to try to figure out if it would help with slow inserts, but it didn't. Before that there were up to 10 concurrent inserts going on at all times. UPDATE or DELETE statements are never executed on this table.
The queries that I'm having trouble with are all structured like this. They never appear in a transaction. While the inserts are definitely not append-only according to the clustered primary key, the queries almost always insert 1 to 20 adjacent rows into the table:
INSERT IGNORE INTO saved_segment
(recording_id, `index`, start_filetime, end_filetime, offset_and_size, storage_id) VALUES
(19173, 631609, 133121662986640000, 133121663016640000, 20562291758298876, 10),
(19173, 631610, 133121663016640000, 133121663046640000, 20574308942546216, 10),
(19173, 631611, 133121663046640000, 133121663076640000, 20585348350688128, 10),
(19173, 631612, 133121663076640000, 133121663106640000, 20596854568114720, 10),
(19173, 631613, 133121663106640000, 133121663136640000, 20609723363860884, 10),
(19173, 631614, 133121663136640000, 133121663166640000, 20622106425668780, 10),
(19173, 631615, 133121663166640000, 133121663196640000, 20634653501528448, 10),
(19173, 631616, 133121663196640000, 133121663226640000, 20646967172721148, 10),
(19173, 631617, 133121663226640000, 133121663256640000, 20657773176227488, 10),
(19173, 631618, 133121663256640000, 133121663286640000, 20668825200822108, 10)
This is the output for an EXPLAIN statement of the above query:
id: 1
select_type: INSERT
table: saved_segment
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
filtered: NULL
Extra: NULL
These problems are relatively recent and weren't apparent while the table was around twice as small.
I tried reducing the number of concurrent inserts into the table, from around 10 to 1. I also deleted foreign keys on some columns (recording_id) in order to speed up inserts further. ANALYZE TABLE and schema profiling didn't yield any actionable information.
One solution I had in mind was to remove the clustered primary key, add an AUTOINCREMENT primary key and a regular index on the (recording_id, index) columns. In my mind this would help by making inserts 'append only'. I'm open to any and all suggestions, thanks in advance!
EDIT:
I'm going to address some points and questions raised in the comments and answers:
autocommit is set to ON
the value of innodb_buffer_pool_size is 21474836480 (20GB), and the value of innodb_buffer_pool_chunk_size is 134217728 (128MB)
One comment raised a concern about contention between the read locks used by reads and the exclusive locks used by writes. The table in question is used somewhat like a cache, and I don't need reads to always reflect the most up-to-date state of the table, if that would mean increased performance. The table should, however, remain durable even in cases of server crashes and hardware failures. Is this possible to achieve with a more relaxed transaction isolation level?
The schema could definitely be optimized; recording_id could be a 4-byte integer, end_filetime could instead be an elapsed value, and start_filetime could also probably be smaller. I'm afraid that these changes would just push the issue back for a while, until the table grows enough to compensate for the space savings.
INSERTs into the table are always sequential
SELECTs executed on the table look like this:
SELECT TRUE
FROM saved_segment
WHERE recording_id = ? AND `index` = ?
SELECT index, start_filetime, end_filetime, offset_and_size, storage_id
FROM saved_segment
WHERE recording_id = ? AND
start_filetime >= ? AND
start_filetime <= ?
ORDER BY `index` ASC
The second type of query could definitely be improved with an index, but I'm afraid this would further degrade INSERT performance.
Another thing that I forgot to mention is the existence of a very similar table to this one. It is queried and inserted into in exactly the same manner, but might further contribute to IO starvation.
EDIT2:
Results of SHOW TABLE STATUS for the table saved_segment, and for a very similar table saved_screenshot (this one has an additional INDEX on a bigint unsigned NOT NULL column).
Name: saved_screenshot
Engine: InnoDB
Version: 10
Row_format: Dynamic
Rows: 483430208
Avg_row_length: 61
Data_length: 29780606976
Max_data_length: 0
Index_length: 21380464640
Data_free: 6291456
Auto_increment: NULL
Create_time: 2021-10-21 01:03:21
Update_time: 2022-11-07 16:51:45
Check_time: NULL
Collation: utf8mb4_0900_ai_ci
Checksum: NULL

Name: saved_segment
Engine: InnoDB
Version: 10
Row_format: Dynamic
Rows: 281861164
Avg_row_length: 73
Data_length: 20802699264
Max_data_length: 0
Index_length: 0
Data_free: 4194304
Auto_increment: NULL
Create_time: 2022-11-02 09:03:05
Update_time: 2022-11-07 16:51:22
Check_time: NULL
Collation: utf8mb4_0900_ai_ci
Checksum: NULL
I'll go out on a limb with this Answer.
Assuming that
The value of innodb_buffer_pool_size is somewhat less than 20GB, and
Those 1K Selects/second randomly reach into various parts of the table, then
The system recently became I/O-bound, because the 'next' block needed for the next Select is more and more often not cached in the buffer_pool.
The simple solution is to get more RAM and raise the setting of that tunable. But the table will eventually outgrow whatever next limit you purchase.
Instead, here are some partial solutions.
If the numbers don't get too big, the first two columns could be INT UNSIGNED (4 bytes instead of 8) or maybe even MEDIUMINT UNSIGNED (3 bytes); see the sketch after this list. Caution: the ALTER TABLE would lock the table for a long time.
Those start and end times look like timestamps with fractional seconds that are always ".000". DATETIME and TIMESTAMP take 5 bytes (versus 8).
Your sample shows a tiny elapsed time. If (end - start) is usually very small, then storing elapsed instead of endtime would further shrink the data. (But it would make using the endtime messier.)
The sample data you presented looks "consecutive". That is about as efficient as an autoincrement. Is that the norm? If not, the INSERTs could be part of the I/O thrashing.
Your suggestion of adding an AI, plus a secondary index, roughly doubles the effort for Inserts, so I do not recommend it.
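For instance, the integer shrinkage alone might look like this (a sketch only: verify that all existing values fit, expect a long table rebuild, and note that converting the filetime columns to DATETIME would need a value conversion, not just a MODIFY):

ALTER TABLE saved_segment
    MODIFY recording_id INT UNSIGNED NOT NULL,   -- 4 bytes instead of 8
    MODIFY `index`      INT UNSIGNED NOT NULL;   -- ditto; MEDIUMINT if it fits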
More
just push the issue back for a while until the table grows in size
Yes, that will be the case.
Both of your queries are optimally helped by this as an INDEX or, even better, as the start of the PRIMARY KEY:
(recording_id, index)
Re:
SELECT TRUE
FROM saved_segment
WHERE recording_id = ? AND `index` = ?
If that is used to control some other SQL, consider adding this to that other SQL:
... EXISTS ( SELECT 1
FROM saved_segment
WHERE recording_id = ? AND `index` = ? ) ...
That query (in either form) needs what you already have:
PRIMARY KEY(recording_id, index)
Your other query needs
INDEX(recording_id, start_filetime)
So, add that INDEX, or...
Even better... This combination would be better for both SELECTs:
PRIMARY KEY(recording_id, start_filetime, index).
INDEX(recording_id, index)
With that combo,
The single-row existence check would be performed "Using index" because it is "covering".
And the other query would find all the relevant rows clustered together on the PK.
(The PK has those 3 columns because it needs to be Unique. And they are in that order to benefit your second query. And it is the PK, not just an INDEX so it does not need to bounce between the index's BTree and the data's BTree.)
The "clustering" may help your performance by cutting down on the number of disk blocks needed for such queries. This leads to less "thrashing" in the buffer_pool, hence less need to increase RAM.
My index suggestions are mostly orthogonal to my datatype suggestions.

Updating single table frequently vs using another table and CRON to import changes into main table in MySQL?

I have a login-log table which is an EXTREMELY busy and large InnoDB table. New rows are inserted all the time, the table is queried by other parts of the system, and it is by far the busiest table in the DB. In this table there is logid, which is the PRIMARY KEY and is generated as a random hash by the software (not an auto-increment ID). I also want to store some data like the number of items viewed.
create table loginlogs
(
logid bigint unsigned primary key,
some_data varchar(255),
viewed_items bigint unsigned
)
viewed_items is a value that will get updated for multiple rows very often (assume thousands of updates / second). The dilemma I am facing now is:
Should I
UPDATE loginlogs SET viewed_items = XXXX WHERE logid = YYYYY
or should I create
create table loginlogs_viewed_items
(
logid bigint unsigned primary key,
viewed_items bigint unsigned,
exported tinyint unsigned default 0
)
and then execute with CRON
UPDATE loginlogs_viewed_items t
INNER JOIN loginlogs l ON l.logid = t.logid
SET
t.exported = 1,
l.viewed_items = t.viewed_items
WHERE
t.exported = 0;
e.g. every hour?
Note that either way, the viewed_items counter will be updated MANY TIMES for one logid; it can be as much as 100 / hour / logid, and there are tons of rows. So whichever table I choose for this, either the main one or the separate one, it will be getting updated quite frequently.
I want to avoid unnecessary locking of loginlogs table and at the same time I do not want to degrade performance by duplicating data in another table.
Hmm, I wonder why you'd want to change log entries and not just add new ones...
But anyway, as you said, either way the updates have to happen, whether individually or in bulk.
If you have less busy time windows, updating in bulk might then have an advantage. Otherwise, the bulk update may have a more significant impact when running, in contrast to individual updates, which might "interleave" more with the other operations and make the impact less noticeable.
If the column you need to update is not needed all the time, you could think about keeping a separate table just for this column. That way, queries that need only the other columns may be less affected by the updates.
"Tons of rows" -- To some people, that is "millions". To others, even "billions" is not really big. Please provide some numbers; the answer can be different. Meanwhile, here are some general principles.
I will assume the table is ENGINE=InnoDB.
UPDATEing one row at a time is 10 times as costly as updating 100 rows at a time.
UPDATEing more than 1000 rows in a single statement is problematic. It will lock each row, potentially leading to delays in other statements and maybe even deadlocks. (A chunked sketch follows this answer.)
Having a 'random' PRIMARY KEY (as opposed to AUTO_INCREMENT or something roughly chronologically ordered) is very costly when the table is bigger than the buffer_pool. How much RAM do you have?
"the table is queried by other parts of the system" -- by the random PK? One row at a time? How frequently?
Please elaborate on how exported works. For example, does it get reset to 0 by something else?
Is there a single client doing all the work? Or are there multiple servers throwing data and queries at the table? (Different techniques are needed.)
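To illustrate the 100-to-1000-row advice with the CRON export from the question: MySQL's multi-table UPDATE does not accept LIMIT, so one hypothetical chunking pattern stages a batch of ids first (repeat until no rows remain):

CREATE TEMPORARY TABLE batch
    SELECT logid
    FROM loginlogs_viewed_items
    WHERE exported = 0
    LIMIT 1000;                     -- chunk size: somewhere in the 100-1000 range

UPDATE loginlogs l
JOIN batch b                  ON b.logid = l.logid
JOIN loginlogs_viewed_items t ON t.logid = b.logid
SET l.viewed_items = t.viewed_items,
    t.exported     = 1;

DROP TEMPORARY TABLE batch;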

mysql select * by index is very slow

I have a table like this:
create table test (
id int primary key auto_increment,
idcard varchar(30),
name varchar(30),
custom_value varchar(50),
index i1(idcard)
)
I inserted 30,000,000 rows into the table, and then I executed:
select * from test where idcard='?'
The statement took 12 seconds to return.
When I used iostat to monitor the disk, the read speed was about 6 MB/s while the utilization was 94%.
Is there any way to optimize it?
12 seconds may be realistic.
Assumptions about the question:
A total of 30M rows, but only 3000 rows in the resultset.
Not enough room to cache things in RAM or you are running from a cold start.
InnoDB or MyISAM (the analysis is the same; the details are radically different).
Any CHARACTER SET and COLLATION for idcard.
INDEX(idcard) exists and is used in the query.
HDD disk drive, not SSD.
Here's a breakdown of the processing:
Go to the index, find the first entry with ?, scan forward until hitting an entry that is not ? (about 3K rows later).
For each of those 3K items, reach into the table to find all the columns (cf. SELECT *).
Deliver them.
Step 1: Fast.
Step 2: This is (based on the assumption of not being cached) costly. It may involve about 3K disk hits. For an HDD, that would be about 30 seconds. So, 12 seconds could imply some of the stuff was cached or happened to be near each other.
Step 3: This is a network cost, which I am not considering.
Run the query a second time. It may take only 1 second this time -- because all 3K blocks are cached in RAM! And iostat will show zero activity!
Is there any way to optimize it?
Well...
You already have the best index.
What are you going to do with 3000 rows all at once? Is this a one-time task?
When using InnoDB, innodb_buffer_pool_size should be about 70% of available RAM, but not so big that it leads to swapping. What is its setting, and how much RAM do you have and what else is running on the machine?
Could you do more of the task while you are fetching the 3K rows?
Switching to SSDs would help, but I don't like hardware bandaids; they are not reusable.
How big is the table (in GB) -- perhaps 3GB data plus index? (SHOW TABLE STATUS.) If you can't make the buffer_pool big enough for it, and you have a variety of queries that compete for different parts of this (and other) tables, then more RAM may be beneficial.
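One way to get those numbers (the schema name below is a placeholder):

SELECT table_name,
       ROUND(data_length  / POW(1024,3), 2) AS data_gb,
       ROUND(index_length / POW(1024,3), 2) AS index_gb
FROM information_schema.tables
WHERE table_schema = 'your_db'      -- replace with your schema name
  AND table_name   = 'test';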
This seems more like an I/O limitation than something that could be solved by adding indices. What will improve the speed is changing the collation of the idcard column to latin1_bin. This uses only 1 byte per character. It also uses binary comparison, which is faster than case-insensitive comparison.
Only do this if you have no special characters in the idcard column, because the character set of latin1 is quite limited.
ALTER TABLE `test` CHANGE COLUMN `idcard` `idcard` VARCHAR(30) COLLATE 'latin1_bin' AFTER `id`;
Furthermore, ROW_FORMAT=FIXED also improves the speed. ROW_FORMAT=FIXED is not available with the InnoDB engine, but it is with MyISAM. The resulting table I now have is shown below. It's 5 times quicker (80% less time) with select statements than the initial table.
Note that I also changed the collation for 'name' and 'custom_value' to latin1_bin. This does make quite a difference in speed in my test setup, and I'm still figuring out why.
CREATE TABLE `test` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`idcard` VARCHAR(30) COLLATE 'latin1_bin',
`name` VARCHAR(30) COLLATE 'latin1_bin',
`custom_value` VARCHAR(50) COLLATE 'latin1_bin',
PRIMARY KEY (`id`),
INDEX `i1` (`idcard`)
)
ENGINE=MyISAM
ROW_FORMAT=FIXED ;
You may try adding the three other columns in the select clause to the index:
CREATE INDEX idx ON test (idcard, id, name, custom_value);
The three columns other than idcard are being added to allow the index to cover everything being selected. The problem with your current index is that it is only on idcard. This means that once MySQL has traversed down to each leaf node in the index, it has to do another seek back to the clustered index to look up the values of all the columns mentioned in the select *. As a result, MySQL may choose to ignore the index completely. The suggestion above avoids this additional seek.

Search for 1 row of data in a big table (800,000,000 rows), MariaDB InnoDB

I have a table storing phone numbers, with 800M rows.
Field            Type                    Null  Key  Extra
region_code_id   smallint(4) unsigned    YES
local_number     mediumint(7) unsigned   YES
region_id        smallint(4) unsigned    YES
operator_id      smallint(4) unsigned    YES
id               int(10) unsigned        NO    PRI  auto_increment
I need to find numbers.id where region_code_id = 119 and local_number = 1234567:
select * from numbers where numbers.region_code_id = 119 and numbers.local_number = 1234567;
This query takes over 600 seconds to execute.
How can I improve it?
UPDATE
Thanks for the answers; I understand that I need an index on these columns. I will try this as soon as I get a server with more SSD space; right now I have only 1GB of SSD free. How can I find out how much space the index will occupy?
Consider adding an INDEX on the columns you use in the WHERE clause.
Start with:
ALTER TABLE `numbers`
ADD INDEX `region_code_id_local_number`
(`region_code_id`, `local_number`);
Note: it can take some time for the index to build.
Before and after the change, run the explain plan to compare:
EXPLAIN EXTENDED select * from numbers where numbers.region_code_id = 119 and numbers.local_number = 1234567;
References:
How MySQL uses indexes
For this query:
select *
from numbers
where numbers.region_code_id = 119 and
numbers.local_number = 1234567;
You want an index on numbers(region_code_id, local_number) or numbers(local_number, region_code_id). The order of the columns doesn't matter because the conditions are equality for both columns.
create index idx_numbers_region_local on numbers(region_code_id, local_number);
I agree that INDEX(region_code_id, local_number) (in either order) is mandatory for this problem, but I am sticking my nose in to carry it a step further. Isn't that pair "unique"? Or do you have duplicate numbers in the table? If it is unique, then get rid of id and make that pair PRIMARY KEY(region_code_id, local_number). The table will possibly be smaller after the change.
Back to your question of "how big". How big is the table now? Perhaps 40GB? A secondary index (as originally proposed) would probably add about 20GB. And you would need 20-60GB of free disk space to perform the ALTER. This depends on whether adding the index can be done "inplace" in that version.
Changing the PK (as I suggest) would result in a little less than 40GB for the table. It will take 40GB of free space to perform the ALTER.
In general (and pessimistically), plan on an ALTER needing both the original table and the new table sitting on disk at the same time. That includes full copies of the data and index(es).
(A side question: Are you sure local_number is limited to 7 digits everywhere?)
Another approach to the question... For calculating the size of a table or index in InnoDB, add up the datatype sizes (3 bytes for MEDIUMINT, some average for VARCHAR, etc.). Then multiply by the number of rows. Then multiply by 4; this will give you the approximate disk space needed. (Usually 2-3 is sufficient for the last multiplier.)
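Applied to this table, the back-of-envelope arithmetic comes out consistent with the "perhaps 40GB" guess above:

-- 2 + 3 + 2 + 2 + 4 = 13 bytes of datatypes per row
SELECT ROUND(13 * 800000000     / POW(1024,3), 1) AS raw_gb,        -- ~9.7
       ROUND(13 * 800000000 * 4 / POW(1024,3), 1) AS estimated_gb;  -- ~38.7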
When changing the PK, do it in one step:
ALTER TABLE foo
DROP PRIMARY KEY,
ADD PRIMARY KEY(region_code_id, local_number);
Changing the PK cannot be done "inplace".
Edit (mostly for other readers)
#berap points out that id is needed for other purposes. Hence, dropping id and switching the PK is not an option.
However, this is sometimes an option (perhaps not in this case):
ALTER TABLE foo
DROP PRIMARY KEY,
ADD PRIMARY KEY(region_code_id, local_number),
ADD INDEX(id);
Notes:
The AUTO_INCREMENT on id will continue to work even with just INDEX(id).
The SELECT in question will be more efficient because it is the PK.
SELECT .. WHERE id = ... will be less efficient because id is a secondary key.
The table will be the same size either way; the secondary key would also be the same size either way -- because every secondary key contains the PK columns, too. (This note is InnoDB-specific.)

InnoDB vs MyISAM on a frequently sorted MySQL 5.5 table

I have a table (currently InnoDB) with roughly 100k records. These records have an order column so they can make up an ordered queue. Actually, these records belong to about 40 departments, each of which has its own queue and, in turn, its own records in this table.
The problem is that we're constantly getting "lock wait time" errors, because various departments are sorting their queues (and records) simultaneously.
I know that MyISAM is a table-level-lock engine and InnoDB is row-level. The thing is, I'm not sure which one is faster for this kind of operation.
The other thing is that this table is joined with other InnoDB tables in various queries, and I don't know what can happen if I switch the table to MyISAM.
Here's the table structure:
CREATE TABLE `ssti` (
`demand_nber` MEDIUMINT(8) UNSIGNED NOT NULL,
`year` CHAR(4) NULL DEFAULT NULL,
`department` CHAR(4) NULL DEFAULT NULL COMMENT '4 caracteres',
-- [other columns ]
`priority` INT(10) UNSIGNED NOT NULL DEFAULT '9999999',
PRIMARY KEY (`demand_nber`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB;
And here's the piece of java code that updates the priorities:
// Batch the priority updates so they go to the server in one round trip.
PreparedStatement psUpdatePriority = con.prepareStatement(
        "UPDATE `ssti` SET `priority` = ? WHERE demand_nber = ?");
for (int i = 0; i < demands.length(); ++i) {
    JSONObject d = demands.getJSONObject(i);
    psUpdatePriority.setInt(1, d.getInt("newPriority"));
    psUpdatePriority.setInt(2, d.getInt("demandNumber"));
    psUpdatePriority.addBatch();
}
int[] totalUpdated = psUpdatePriority.executeBatch();
When investigating performance problems, be sure to enable the slow query log so you have a record of the specific queries causing problems.
What it looks like here is that you're including a column in your WHERE clause that's not indexed. That's extremely painful on large data sets, as it requires a "table scan": reading every record and evaluating them sequentially.
Once indexed, your queries should be significantly faster.
If you're really up against the wall, you may want to break out each department into their own table. This is very difficult to undo so I'd only pursue this as a last resort.
SELECT statements normally do not block each other. Sorting is done separately for each query, in a temporary work area. If you are seeing waits for locks, then look at the updates that are blocking the selects.
With row-level locking, an UPDATE will block only the (small number of) rows it needs, allowing other statements to access the other rows. With table-level locking, an UPDATE will block the whole table, and no other statements will access the table until the UPDATE is finished. So MyISAM would make your problem worse in any case.
--
It seems that you are using this table for many purposes. Therefore, you need to consider all of them and their importance, when tuning performance of this table.
Case 1: Department queries its own data and needs it sorted
When the result of some data manipulation is reused many times, the general rule is to save it. That allows reading the result straight away, rather than computing it every time.
To allow queries to read sorted data you need to create an index.
However, an index just on the sorting column priority will not help. Since each department can see only its own data, every query also contains the department number. Hence your index should contain two key columns: KEY (department, priority). (A sketch follows.)
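For example (the index name is arbitrary; building the index will take some time on a busy table):

ALTER TABLE `ssti`
    ADD INDEX `dept_priority` (`department`, `priority`);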
Case 2: Table is joined to several other tables
To speed up queries with JOINs, you'll need indexes whose keys match the columns used in the joins.
Case 3: Inserting new, possibly transactional, data
A single table is limited in how much load it can handle for both inserts of new data and reporting queries. Usually, transactional and reporting uses are considered alternatives to each other. It is good practice to use reporting tables that summarise data from the transactional tables. Also, joins to dimensions are easier when the data is aggregated (there are fewer rows).