MySQL 5.6 on insert creating holes/gaps/jumps on index - mysql

I'm testing MySQL 5.6 and noticed gaps in my table's index.
When I use two simple ways to bulk insert the same data into an indexed table, they produce two different sets of index values.
There is nothing unusual about the statements, just:
a normal insert using VALUES
an insert using SELECT
I'm not using any special insert conditions either, only a simple insert and an auto-increment index.
The first operates as expected, but the second generates gaps in the table index on each bulk insert.
Here is my script to demonstrate this behavior:
http://sqlfiddle.com/#!9/b138d/1
I'll be glad if someone can explain it or tell me if I'm doing something wrong.
Have a lovely celebration day..

Related

partitioning in MySQL : insert into partition

I come from an Apache Hive background.
In that language, you would say the below to insert into date 20220601:
insert into table db.tablename partition(date=20220601)
In MySQL, I can't get such an insert statement to work. I have been Googling, and it seems it just sorts itself out?
So if I did
insert into db.tablename
select * from db.othertable
Would it automatically partition the ingested data?
I feel like I am missing something here!
If the table is partitioned, the values you insert determine which partition the row goes into. Partitioning a table requires you define the mapping, so it's always deterministic which partition a row goes into.
Therefore you don't need to tell INSERT which partition to insert the row into. It's determined automatically by the values you insert in the row.
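As a rough illustration of that deterministic mapping, here is a minimal Python sketch of how a RANGE-style scheme can pick a partition from the inserted value alone. The partition names and bounds are made up for the example, and the code only mimics the idea; it is not how MySQL implements it internally.

```python
from bisect import bisect_right

# Hypothetical RANGE partitioning on an integer date column:
# each partition holds the values strictly below its upper bound.
PARTITIONS = [
    ("p202205", 20220601),  # values < 20220601
    ("p202206", 20220701),  # 20220601 <= values < 20220701
    ("p202207", 20220801),  # 20220701 <= values < 20220801
]


def partition_for(value: int) -> str:
    """Return the name of the partition a value falls into."""
    bounds = [upper for _, upper in PARTITIONS]
    index = bisect_right(bounds, value)  # first bound strictly above value
    if index == len(PARTITIONS):
        # No partition covers the value, so the row cannot be placed.
        raise ValueError(f"no partition for value {value}")
    return PARTITIONS[index][0]


print(partition_for(20220601))  # prints "p202206"
```

The caller never names a partition; the value alone decides it, which is why a plain INSERT is enough.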
Partitioning in MySQL is not required for a table. By default, a table is not partitioned. This is normal and sufficient in almost all cases.
Perhaps partitioning in Apache Hive is necessary and does something different from the feature called partitioning in MySQL? I don't know Apache Hive, so I can't answer that.
I suggest you read the MySQL manual chapter about partitioning if you want to learn more about it: https://dev.mysql.com/doc/refman/8.0/en/partitioning.html

INSERT INTO statement in MySQL

I'm trying to work with YEAR function on one column in the DB and then add the results to a different table in the DWH.
What am I doing wrong?
INSERT INTO example_dwh1.dim_time (date_year)
SELECT YEAR(time_taken)
FROM exampledb.photos;
When removing the INSERT INTO line, I get the results I want, but I'm not able to insert them into the dwh table.
Thanks for your help!
The following select works, but I don't see the data in the table after the insert:
INSERT INTO example_dwh1.dim_time (date_year)
SELECT YEAR(time_taken)
FROM exampledb.photos;
This is rather broad. Assuming you have no errors in the insert, you might have:
You are incorrectly querying dim_time, so the data is there but your check is wrong.
You are inserting into dim_time in one database but querying it in another.
Assuming you have errors but are missing them, here are some possibilities:
The database does not exist.
The table does not exist.
The column is misnamed.
Other columns are declared NOT NULL.
Triggers defined on the table are preventing the insert.
Unique constraints/indexes on the table are preventing the insert.
Your question does not provide enough information to be more specific. However, it seems highly suspicious to be inserting a bunch of years -- which might include many duplicates -- into a dimension table.

Are there any disadvantages of a unique column in MySQL

I'd like to ask a question regarding unique columns in MySQL.
I would like to ask the experts which is the better way to approach this problem, and what the advantages or disadvantages are, if any.
Set a varchar column as unique
Do a SQL INSERT IGNORE
If affected rows > 0 proceed with running the code
versus
Leave a varchar column as not-unique
Do a search query to look for identical value
If no rows are returned by the query, do a SQL INSERT
Proceed with running the code
Neither of the 2 approaches is good.
You don't do INSERT IGNORE nor do you search. The searching part is also unreliable, because it fails at concurrency and compromises the integrity. Imagine this scenario: you and I try to insert the same info into the database. We connect at the same time. Code in question determines that there's no such record in the database, for both of us. We both insert the same data. Now your column isn't unique, therefore we'll end up with 2 records that are the same - your integrity now fails.
What you do is set the column to unique, insert and catch the exception in the language of your choice.
MySQL will fail in case of duplicate record, and any proper db driver for MySQL will interpret this as an exception.
Since you haven't mentioned what the language is, it's difficult to move forward with examples.
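For instance, in Python the "insert and catch the exception" pattern might look roughly like the sketch below. It uses the standard-library sqlite3 module purely as a stand-in so the example runs anywhere; the users table, the email column, and the insert_if_new helper are hypothetical, and a MySQL driver would raise its own exception class for a duplicate key rather than sqlite3.IntegrityError.

```python
import sqlite3


def insert_if_new(conn: sqlite3.Connection, email: str) -> bool:
    """Try the insert; return False if the unique column rejects a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        # The database, not the application, detected the duplicate.
        return False


# Hypothetical table with a unique varchar column (SQLite as a stand-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email VARCHAR(255) NOT NULL UNIQUE)")

first = insert_if_new(conn, "a@example.com")
second = insert_if_new(conn, "a@example.com")
print(first, second)  # prints "True False"
```

There is no separate lookup query, so there is no window between "check" and "insert" for a concurrent session to slip into.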
Defining a column as a unique index has a few advantages. First of all, when you define it as a "unique index", MySQL can optimize the index for unique values (same as a primary key): because MySQL doesn't have to check whether there are more rows with the same value, it can use an optimized algorithm for lookups.
You are also assured that there will never be a duplicate entry in your database, instead of handling this in multiple places in your code.
When you don't define it as UNIQUE, you first need to check whether a record exists in your table and then insert, which requires 2 queries (and even a full table lock) instead of 1, which decreases your performance and is more error prone.
http://dev.mysql.com/doc/refman/5.0/en/constraint-primary-key.html
I'm leaving aside the fact that you would use INSERT IGNORE, which ignores the error when the entry already exists in the database (you could still use it for high-performance operations, perhaps in some special case). A normal INSERT gives you feedback if an entry already exists.
Adding a constraint like UNIQUE is better for query performance and data reliability, but there is a trade-off when it comes to writes, so it's up to you which you prefer. In your case, since you are effectively doing an insert-if-not-exists query anyway, I'd guess it's better to just use the constraint.

How to handle millions of separate insert queries

I have a situation in which I have to insert over 10 million separate records into one table. Normally a batch insert split into chunks does the work for me. The problem, however, is that this file of over 3 GB contains over 10 million separate INSERT statements. Since every query takes 0.01 to 0.1 seconds, it will take over 2 days to insert everything.
I'm sure there must be a way to optimize this, either by lowering the insert time drastically or by importing the data in a different way.
I'm now just using the cli
source /home/blabla/file.sql
Note: A third party is providing me this file. I'm
Small update
I removed any indexes
Drop the indexes, then re-index when you are done!
Maybe you can parse the file data and combine several INSERT queries to one query like this:
INSERT INTO tablename (field1, field2...) VALUES (val1, val2, ..), (val3, val4, ..), ...
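A minimal Python sketch of that idea follows. It assumes every statement sits on its own line and has the simple shape INSERT INTO ... VALUES (...); which may not hold for the real file; the combine_inserts name and the batch_size parameter are made up for the example. Lines it cannot parse are passed through unchanged rather than dropped.

```python
import re

# Assumed statement shape, one per line:
#   INSERT INTO t (a, b) VALUES (1, 'x');
INSERT_RE = re.compile(
    r"^\s*(INSERT INTO .+? VALUES)\s*(\(.*\))\s*;\s*$", re.IGNORECASE
)


def combine_inserts(lines, batch_size=1000):
    """Yield multi-row INSERT statements built from single-row ones."""
    prefix, rows = None, []
    for line in lines:
        match = INSERT_RE.match(line)
        head, values = match.groups() if match else (None, None)
        # Flush when the batch is full, the target changes, or the line is not an INSERT.
        if rows and (head != prefix or len(rows) >= batch_size):
            yield f"{prefix} {', '.join(rows)};"
            rows = []
        if match:
            prefix = head
            rows.append(values)
        elif line.strip():
            yield line.strip()  # pass other statements through untouched
    if rows:
        yield f"{prefix} {', '.join(rows)};"


sample = [
    "INSERT INTO t (a, b) VALUES (1, 'x');",
    "INSERT INTO t (a, b) VALUES (2, 'y');",
    "INSERT INTO t (a, b) VALUES (3, 'z');",
]
combined = list(combine_inserts(sample, batch_size=2))
for statement in combined:
    print(statement)
```

For a multi-gigabyte file you would feed it an open file object and write the results to a new file instead of building a list. The batch size also has to keep each statement under the server's max_allowed_packet setting.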
There are some ways to improve the speed of your INSERT statements:
Try to insert many rows at once if this is an option.
An alternative is to create a copy of your desired table without indexes, insert the data there, then add the indexes and rename the table.
Maybe use LOAD DATA INFILE, if this is an option.
The MySQL manual has something to say about that, too.

MySQL and implementing something close to sequences?

I am in the process of moving from Oracle to MySQL and would like some advice on whether the way I am implementing something similar to sequences in MySQL is a good one.
Essentially, I currently plan to implement it by having a separate table in MySQL for each Oracle sequence, with a single column that holds the last_number and is incremented whenever I insert a new row. Another way would be to create a single table with one row per sequence and increment the relevant row whenever I do an insert.
A simpler way would be to just do a select max()+1 on the relevant column when inserting data.
I'm basically thinking of switching to the select max()+1 option as it seems simpler to implement, but I would like some advice on which of these options would be best, and on any pitfalls of select max()+1 that I'm not currently aware of.
Also, the reason I am not using auto_increment and the function last_insert_id() is that I want to follow the ANSI standard.
Thanks.
First of all: The max()+1 version is NOT guaranteed to give you a sequence, if you use transactions in a high isolation level.
The way we typically use sequences (if we can't avoid them) is to create a table with an AUTO_INCREMENT value, INSERT INTO it, SELECT last_insert_id(), then DELETE FROM table WHERE field<$LASTINSERTID. This is of course done in a stored procedure.
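The insert, read id, delete older rows cycle can be sketched as follows. This is only an approximation of the approach: it runs against SQLite through Python's standard-library sqlite3 module rather than as a MySQL stored procedure, so AUTOINCREMENT and cursor.lastrowid stand in for AUTO_INCREMENT and last_insert_id(), and the seq table and next_value helper are hypothetical names.

```python
import sqlite3


def next_value(conn: sqlite3.Connection) -> int:
    """Insert a row to obtain a fresh id, then delete the older rows."""
    with conn:  # one transaction per generated value
        cursor = conn.execute("INSERT INTO seq DEFAULT VALUES")
        value = cursor.lastrowid  # stand-in for last_insert_id()
        conn.execute("DELETE FROM seq WHERE id < ?", (value,))
    return value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq (id INTEGER PRIMARY KEY AUTOINCREMENT)")

values = [next_value(conn) for _ in range(3)]
print(values)  # prints "[1, 2, 3]"
```

The table never holds more than the most recent row, and the database's own counter hands out the numbers, so callers do not compute the next value themselves.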
There is a read consistency problem, in that two sessions both running ...
insert into ... select max(..)+1 from ...
... at the same time both see the same value of max(...), hence they both try to insert the same new value.
You have the same problem with your table-of-maxima method, and you have to use a locking mechanism to avoid multiple sessions reading the same value. This leads to a concurrency problem where inserts to the table are serialised.
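The read consistency problem can be reproduced deterministically by interleaving the steps by hand. The sketch below is a simulation under stated assumptions: a single SQLite connection (Python's standard-library sqlite3) plays both "sessions", and the items table and read_next_id helper are hypothetical. Real concurrent MySQL sessions would hit the same ordering by timing rather than by construction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO items (id, name) VALUES (1, 'existing')")


def read_next_id() -> int:
    """The 'select max()+1' step on its own."""
    row = conn.execute("SELECT COALESCE(MAX(id), 0) + 1 FROM items").fetchone()
    return row[0]


# Both sessions read before either one writes.
id_a = read_next_id()
id_b = read_next_id()

conn.execute("INSERT INTO items (id, name) VALUES (?, 'from A')", (id_a,))
try:
    conn.execute("INSERT INTO items (id, name) VALUES (?, 'from B')", (id_b,))
    collided = False
except sqlite3.IntegrityError:
    # Session B computed the same id as session A.
    collided = True

print(id_a, id_b, collided)  # prints "2 2 True"
```

With a primary key the second insert fails; without one, both rows would be stored with the same id.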