Partitioning in MySQL: insert into partition

I come from an Apache Hive background.
In Hive, you would write the following to insert into the partition for date 20220601:
insert into table db.tablename partition(date=20220601)
In MySQL, I can't get such an insert statement to work. I have been Googling, and it seems it just sorts itself out?
So if I did
insert into db.tablename
select * from db.othertable
Would it automatically partition the ingested data?
I feel like I am missing something here!

If the table is partitioned, the values you insert determine which partition the row goes into. Partitioning a table requires you define the mapping, so it's always deterministic which partition a row goes into.
Therefore you don't need to tell INSERT which partition to insert the row into. It's determined automatically by the values you insert in the row.
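As a rough illustration of that mapping, here is a minimal sketch. The table events, the column dt, and the partition names are hypothetical; the boundaries are chosen only to mirror the date 20220601 from the question.

```sql
-- Hypothetical table: dt holds dates as integers such as 20220601.
CREATE TABLE events (
    dt      INT NOT NULL,
    payload VARCHAR(100)
)
PARTITION BY RANGE (dt) (
    PARTITION p_before   VALUES LESS THAN (20220601),
    PARTITION p_20220601 VALUES LESS THAN (20220602),
    PARTITION p_later    VALUES LESS THAN MAXVALUE
);

-- No PARTITION clause: the value of dt decides where each row is stored.
INSERT INTO events (dt, payload) VALUES
    (20220531, 'stored in p_before'),
    (20220601, 'stored in p_20220601');

-- Read back one partition to check the routing.
SELECT * FROM events PARTITION (p_20220601);
```

MySQL 5.6 and later also accept an explicit partition list, as in INSERT INTO events PARTITION (p_20220601) ..., but a row that does not belong to the named partition is rejected rather than rerouted.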
Partitioning in MySQL is not required for a table. By default, a table is not partitioned. This is normal and sufficient in almost all cases.
Perhaps partitioning in Apache Hive is necessary and does something different from the feature called partitioning in MySQL? I don't know Apache Hive, so I can't answer that.
I suggest you read the MySQL manual chapter about partitioning if you want to learn more about it: https://dev.mysql.com/doc/refman/8.0/en/partitioning.html

Related

MySql - How to insert and on duplicate key update without explicitly specifying all non key columns

I have a table which was created as a select * from a view (and then added a PK).
I want to periodically update the table with all the data from the view.
I thought the best option would be: INSERT INTO table_a SELECT * FROM view_a ON DUPLICATE KEY UPDATE non_key_col_1 = VALUES(non_key_col_1), non_key_col_2 = VALUES(non_key_col_2), ...;
Since there are quite a lot of columns, and they might change in the future (in which case I can re-create the table, but I would rather not have to edit the periodic insert), I was wondering whether there is a way to avoid specifying all the columns explicitly.
There is no such syntax in MySQL, unfortunately. You'll have to update all the columns one by one.
You could go with a trigger on the insert operation: if the primary key exists, update the row, otherwise insert it. But that will definitely hurt performance with large data.
One thing I can think of is to get the column names from INFORMATION_SCHEMA.COLUMNS and use those to compose your query dynamically in your app.
SELECT * FROM information_schema.columns WHERE table_name = 'view_a';
Now you have the columns no matter if the view changes.
Do the same for the table and you have the column differences.
Use those differences to run ALTER TABLE statements, or drop the table and recreate it altogether.
Of course, this is probably even more laborious than dropping and recreating the table manually.
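The INFORMATION_SCHEMA idea above can also be sketched entirely in SQL with a prepared statement. This is only a sketch under assumptions: table_a and view_a are in the current schema, their columns line up in the same order, and table_a has a primary key. The variable names @assignments and @sql are made up for the example.

```sql
-- Allow a long column list; the limit chosen here is arbitrary.
SET SESSION group_concat_max_len = 100000;

-- Build "col = VALUES(col), ..." for every non-primary-key column of table_a.
SELECT GROUP_CONCAT(
           CONCAT('`', column_name, '` = VALUES(`', column_name, '`)')
           ORDER BY ordinal_position SEPARATOR ', ')
  INTO @assignments
  FROM information_schema.columns
 WHERE table_schema = DATABASE()
   AND table_name   = 'table_a'
   AND column_key  <> 'PRI';

-- If table_a has no non-key columns, @assignments is NULL and PREPARE fails.
SET @sql = CONCAT('INSERT INTO table_a SELECT * FROM view_a ',
                  'ON DUPLICATE KEY UPDATE ', @assignments);

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

The periodic job then needs no edits when columns are added, as long as the table and the view are kept in step. Note that VALUES() inside ON DUPLICATE KEY UPDATE is deprecated in recent MySQL 8.0 releases, although it still works.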

How to solve a real time dwh delete process?

I am trying to create a near-real-time DWH. My first attempt is to load a table into my application from my DWH every 15 minutes.
I would like to avoid the problems that a near-real-time DWH can face. One of them is querying an empty table that supplies the values for a multiselect HTML tag.
To solve this I came up with the following solution, but I do not know whether there is a standard way to solve this kind of problem.
I create a table like this to save the possible values of the multiselect:
CREATE TABLE providers (
provider_id INT PRIMARY KEY,
provider_name VARCHAR(20) NOT NULL,
delete_flag INT NOT NULL
)
Before the insert I update the table like this:
UPDATE providers SET delete_flag=1
I insert rows with an ETL process like this:
INSERT INTO providers (provider_name, delete_flag) VALUES ('Provider1',0)
From my app I query the table like this:
SELECT DISTINCT provider_name FROM providers
Meanwhile the app keeps working: it selects all providers without duplicates and never hits an error from an empty table. (The source can delete, add, or update a provider, so I always have to stay in sync with it.) Just after the insert statement I can then run this statement:
DELETE FROM providers WHERE delete_flag=1
I think this is a good solution for small tables, or for big tables with few changes, but what happens when a table is big? Is there a standard way to solve this kind of problem?
We cannot put usability at risk while we are updating data.
There are two approaches to publishing a bulk change of a dimension without taking a maintenance window that would interrupt the queries.
The first one is simple and relies on a transaction, but performs badly for large data.
DELETE the replaced dimension records
INSERT the new or changed dimension records
COMMIT;
Note that you need no logical DELETE flag as the changes are visible only after the COMMIT - so the table is never empty.
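A minimal sketch of this transactional variant, reusing the providers table from the question. It assumes a transactional storage engine such as InnoDB, and the inserted values are placeholders.

```sql
START TRANSACTION;

-- Remove the rows being replaced (here: all of them).
-- Use DELETE rather than TRUNCATE TABLE, which causes an implicit commit.
DELETE FROM providers;

-- Load the fresh state of the dimension.
INSERT INTO providers (provider_id, provider_name, delete_flag) VALUES
    (1, 'Provider1', 0),
    (2, 'Provider2', 0);

-- Only now do other sessions see the new rows.
COMMIT;
```

Until the COMMIT, ordinary SELECTs from other sessions keep reading the previous rows.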
As mentioned, this approach is not suitable if you have a large dimension with a lot of changes. In that case you may use the EXCHANGE PARTITION feature, available as of MySQL 5.6.
You define a temporary staging table (created with a plain CREATE TABLE) with the same structure as your dimension table, partitioned with only one partition containing all the data.
CREATE TABLE dim_tmp (
id INT NOT NULL,
col1 VARCHAR(30),
col2 VARCHAR(30)
)
PARTITION BY RANGE (id) (
PARTITION pp VALUES LESS THAN (MAXVALUE)
);
Populate the table with the complete new dimension definition and switch this temporary table with your dimension table.
ALTER TABLE dim_tmp EXCHANGE PARTITION pp WITH TABLE dim;
After this statement the data from the temporary table will be stored (published) in your dimension table (new definition) and the old state of the dimension will be stored in the temporary table.
Please check the MySQL documentation on exchanging partitions for the constraints of this feature.
Disclaimer: I use this feature in Oracle DB and I have no experience with it in MySQL.

MySQL 5.6 on insert creating holes/gaps/jumps on index

I'm testing MySQL 5.6 and noticed some gaps in my table's index.
Two simple ways of bulk inserting the same data into an indexed table produce two different indexes.
They are not weird structures, just:
a normal insert using VALUES()
an insert using SELECT
Also, I'm not using any special insert condition, only a simple insert and an auto index.
The first operates as expected, but the second generates gaps in the table index on each bulk insert.
Here is my script, to demonstrate this behavior:
http://sqlfiddle.com/#!9/b138d/1
I'll be glad if someone can explain it or tell me if I'm doing something wrong.
Have a lovely celebration day..

INSERT INTO statement in MySQL

I'm trying to work with YEAR function on one column in the DB and then add the results to a different table in the DWH.
What am I doing wrong?
INSERT INTO example_dwh1.dim_time (date_year)
SELECT YEAR(time_taken)
FROM exampledb.photos;
When removing the INSERT INTO line, I get the results I want, but I'm not able to insert them into the dwh table.
Thanks for your help!
The following select works, but I don't see the data in the table after the insert:
INSERT INTO example_dwh1.dim_time (date_year)
SELECT YEAR(time_taken)
FROM exampledb.photos;
This is rather broad. Assuming you have no errors in the insert, one of these might apply:
You are incorrectly querying dim_time, so the data is there but your check is wrong.
You are inserting into dim_time in one database but querying it in another.
Assuming you have errors but are missing them, here are some possibilities:
The database does not exist.
The table does not exist.
The column is misnamed.
Other columns are declared NOT NULL.
Triggers defined on the table are preventing the insert.
Unique constraints/indexes on the table are preventing the insert.
Your question does not provide enough information to be more specific. However, it seems highly suspicious to be inserting a bunch of years -- which might include many duplicates -- into a dimension table.
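A few checks along the lines of the list above, as a sketch only. The schema and table names are the ones from the question; nothing else about the setup is known.

```sql
INSERT INTO example_dwh1.dim_time (date_year)
SELECT YEAR(time_taken)
FROM exampledb.photos;

-- Run right after the INSERT: lists the warnings and errors it raised.
SHOW WARNINGS;

-- Query the target with its schema name spelled out, so the check
-- cannot hit a table of the same name in another database.
SELECT COUNT(*) FROM example_dwh1.dim_time;

-- Shows NOT NULL columns and unique constraints/indexes on the table.
SHOW CREATE TABLE example_dwh1.dim_time;

-- Lists triggers defined on dim_time, if any.
SHOW TRIGGERS FROM example_dwh1 LIKE 'dim_time';
```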

MySQL and implementing something close to sequences?

I am in the process of moving from Oracle to MySQL and would like some advice on whether the way I am implementing something similar to sequences in MySQL is a good one.
My current plan is to have a separate table in MySQL for each Oracle sequence, with a single column representing last_number, and to increment this column whenever I insert a new row. Another way would be a single table with one row per sequence, incrementing the relevant row whenever I do an insert.
A simpler way would be to just do a select max()+1 on the relevant column when inserting data.
I'm thinking of switching to the select max()+1 option as it seems simpler to implement, but I would like some advice on which of these options is best, and on any pitfalls of select max()+1 that I am not currently aware of.
Also, the reason I am not using auto_increment and the function last_insert_id() is that I want to follow the ANSI standard.
Thanks.
First of all: The max()+1 version is NOT guaranteed to give you a sequence, if you use transactions in a high isolation level.
The way we typically use sequences (if we can't avoid them) is to create a table with an AUTO_INCREMENT value, INSERT INTO it, SELECT last_insert_id(), DELETE FROM table WHERE field<$LASTINSERTID. This is of course done in a stored procedure.
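A minimal sketch of such a stored procedure, under assumptions: the table order_seq and the procedure next_order_seq are hypothetical names, and DELIMITER is a command of the mysql command-line client, so other clients may need a different way to send the procedure body.

```sql
-- One single-column table per emulated sequence.
CREATE TABLE order_seq (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
);

DELIMITER //
CREATE PROCEDURE next_order_seq(OUT next_val BIGINT UNSIGNED)
BEGIN
    -- Inserting NULL into the AUTO_INCREMENT column generates the next value.
    INSERT INTO order_seq (id) VALUES (NULL);
    -- LAST_INSERT_ID() is tracked per connection, so concurrent callers
    -- each get the value generated by their own INSERT.
    SET next_val = LAST_INSERT_ID();
    -- Drop older rows so the table stays small.
    DELETE FROM order_seq WHERE id < next_val;
END //
DELIMITER ;

-- Usage: fetch the next value into a session variable.
CALL next_order_seq(@n);
SELECT @n;
```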
There is a read consistency problem, in that two sessions both running ...
insert into ... select max(..)+1 from ...
... at the same time both see the same value of max(...), hence they both try to insert the same new value.
You have the same problem with your table of maxima method, and you have to use a locking mechanism to avoid multiple sessions reading the same value. This leads to a concurrency problem where inserts to the table are serialised.
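To make the locking and the resulting serialisation concrete, here is a sketch of the table-of-maxima method. The table sequences and its columns are hypothetical, and InnoDB row locking is assumed.

```sql
CREATE TABLE sequences (
    seq_name    VARCHAR(64) NOT NULL PRIMARY KEY,
    last_number BIGINT      NOT NULL
) ENGINE=InnoDB;

INSERT INTO sequences (seq_name, last_number) VALUES ('orders', 0);

START TRANSACTION;

-- The UPDATE takes an exclusive lock on the row. A second session running
-- the same statement waits here until this transaction ends.
UPDATE sequences SET last_number = last_number + 1 WHERE seq_name = 'orders';

-- Read the value this transaction just reserved.
SELECT last_number FROM sequences WHERE seq_name = 'orders';

-- ... insert the row that uses the value ...

COMMIT;
```

Because the row lock is held until COMMIT, every insert that needs a value from the same sequence queues behind the previous one.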