Kafka Connect MySQL

I am using Kafka Connect to upsert records into MySQL. However, the default upsert behavior is INSERT .. ON DUPLICATE KEY UPDATE .., which overwrites the column values. I am wondering if there is a way to change this, i.e., on duplicate keys, I would like to add the columns together instead.

Looking at the docs, I would say the answer is no: plain INSERT or UPSERT is your choice.
You could address this in the database itself, or with a Kafka Streams app that performs the logical processing (adding the fields together) before the data reaches the sink. A sketch of the database-side SQL follows.
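For reference, the raw SQL the asker is after would look something like the following hand-written upsert; the counters table and its columns are invented for illustration, and this is not a form the connector will generate:
-- Hypothetical table: counters (id INT PRIMARY KEY, total INT).
-- On a duplicate key, add the incoming value to the stored value
-- instead of overwriting it.
INSERT INTO counters (id, total)
VALUES (42, 5)
ON DUPLICATE KEY UPDATE total = total + VALUES(total);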

Related

MySQL: filtering data from an INSERT query

I'll start by saying I'm new to MySQL, at least at the level of this question. :)
I have a data logger with a high data output and I'm interested in saving the data to a database.
I've been wondering if it's possible to filter the INSERT query in the database itself, so that it saves the data only if certain values appear in the query.
As @Akina mentioned, you can use a CHECK constraint and INSERT IGNORE. However, it is better not to try to insert problematic data in the first place, since failed inserts slow down the insert operation.
You need to filter the data before the insert. You may want to consider writing a custom log shipper, or, if you have the option, you can use Logstash. A sketch of the constraint approach is below.
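A minimal sketch of the constraint approach, assuming a hypothetical readings table (MySQL 8.0.16 or later is needed for enforced CHECK constraints):
CREATE TABLE readings (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  sensor_id INT NOT NULL,
  value DECIMAL(10,2) NOT NULL,
  CONSTRAINT chk_value CHECK (value BETWEEN 0 AND 1000)
);

-- With IGNORE, a row that violates the CHECK constraint is skipped
-- with a warning instead of aborting the statement.
INSERT IGNORE INTO readings (sensor_id, value) VALUES (1, 5000.00);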

How does Hibernate get the AutoIncrement Value on Identity Insert

I am working on a high-scale application on the order of 35,000 QPS, using Hibernate and MySQL.
A large table has an auto-increment primary key, and the generation strategy defined in Hibernate is IDENTITY. show_sql is enabled as well.
Whenever an insert happens I see only one query being fired in the DB, which is the
INSERT statement itself.
A few questions follow:
1) I was wondering: how does Hibernate get the auto-increment value after the insert?
2) If the answer is "SELECT LAST_INSERT_ID()", why does it not show up in VividCortex or in the show_sql logs?
3) How does "SELECT LAST_INSERT_ID()" account for multiple auto-increments in different tables?
4) If MySQL returns a value on insert, why aren't MySQL clients built so that we can see what is being returned?
Thanks in advance for all the help.
You should call SELECT LAST_INSERT_ID().
Practically, you can't do the same thing as the MySQL JDBC driver using another MySQL client. You'd have to write your own client that reads and writes the MySQL protocol.
The MySQL JDBC driver gets the last insert id by parsing packets of the MySQL protocol. The last insert id is returned in the OK packet with which the server acknowledges the INSERT.
This is why SELECT LAST_INSERT_ID() doesn't show up in query metrics: the driver is not issuing that SQL statement, it's picking the integer out of the server's response at the protocol level.
You asked how it's done internally. A relevant line of code is https://github.com/mysql/mysql-connector-j/blob/release/8.0/src/main/protocol-impl/java/com/mysql/cj/protocol/a/result/OkPacket.java#L55
Basically, it parses an integer from a known position in the OK packet as it reads the server's response.
I'm not going to go into any more detail about parsing the protocol. I don't have experience coding a MySQL protocol client, and it's not something I wish to do.
I think it would not be a good use of your time to implement your own MySQL client.
It probably uses the standard JDBC mechanism to get generated values (Statement.getGeneratedKeys()).
It's not calling SELECT LAST_INSERT_ID().
You execute it immediately after inserting into one table, and you thus get the values that were generated by that insert. But that's not what the driver is actually using, so it's irrelevant.
Not sure what you mean by that: the MySQL JDBC driver allows exactly that, using the standard JDBC API.
(Too long for a comment.)
SELECT LAST_INSERT_ID() uses the value already available in the connection. (This may explain its absence from any log.)
Each table has its own auto_inc value.
(I don't know any details about Hibernate.)
35K qps is possible, but it won't be easy.
Please give us more details on the queries -- SELECTs? writes? 35K INSERTs?
Are you batching the inserts in any way? You will need to.
What do you then use the auto_inc value in?
Do you use BEGIN..COMMIT? What value of autocommit?
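To make LAST_INSERT_ID()'s per-connection behavior concrete, here is a minimal session with two invented tables:
CREATE TABLE t1 (id INT AUTO_INCREMENT PRIMARY KEY, x INT);
CREATE TABLE t2 (id INT AUTO_INCREMENT PRIMARY KEY, y INT);
INSERT INTO t1 (x) VALUES (10);
SELECT LAST_INSERT_ID();  -- the id generated by the t1 insert
INSERT INTO t2 (y) VALUES (20);
SELECT LAST_INSERT_ID();  -- now the id from the t2 insert: the value is
                          -- per connection and always reflects the most
                          -- recent insert, regardless of table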

How to avoid duplicates while updating a MySQL database?

I receive a MySQL dump file (.sql) daily from an external server, which I don't have any control over. I created a local database to store all the data in the .sql file, and I hope to set up a script to update my local database automatically every day. The dump file I receive daily contains old data that is already in the local database. How can I avoid duplicating that old data and insert only new data into the local MySQL server? Thank you very much!
You can use a third-party database comparison tool, such as those from Red Gate, with two databases: one current (your "master") and one loaded from the new dump. You can then run the comparison tool between the two versions and apply only the changes between them to your master.
Use unique constraints on the fields that you want to be unique.
Also, as Danny Beckett mentioned, to avoid errors in the output (which I would prefer to redirect into a file for later analysis, to check that I haven't missed anything in the process), you can use the INSERT IGNORE construct instead of INSERT.
You can use a unique constraint combined with the IGNORE keyword.
As a second option, you can first insert the data into a temporary table and then insert only the difference, as sketched below.
With the second option you can also add a restriction so that you don't have to search for duplicates across all the records already stored in the database.
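A sketch of that temporary-table approach; the table names are hypothetical, and it assumes the target table has a primary key id:
-- Stage the daily data in a table with the same structure.
CREATE TEMPORARY TABLE staging LIKE target;
-- ... populate staging from the daily dump ...

-- Insert only the rows whose key is not already present.
INSERT INTO target
SELECT s.*
FROM staging s
LEFT JOIN target t ON t.id = s.id
WHERE t.id IS NULL;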
You need to create a primary key in your table. It should be a unique combination of column values. Using the INSERT query with IGNORE will then avoid adding duplicates to this table.
See http://dev.mysql.com/doc/refman/5.5/en/insert.html
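A minimal illustration of this answer, with an invented imports table:
CREATE TABLE imports (id INT PRIMARY KEY, payload VARCHAR(100));
INSERT INTO imports VALUES (1, 'first');         -- inserted
INSERT IGNORE INTO imports VALUES (1, 'again');  -- duplicate key: skipped
                                                 -- with a warning, no error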
If this is a plain vanilla mysqldump file, then normally it includes DROP TABLE IF EXISTS ... statements and CREATE TABLE statements, so the tables are recreated when the data is imported. Duplicate data should therefore not be a problem, unless I'm missing something.

How to insert data into MySQL directly (not using SQL queries)

I have a MySQL database that I use only for logging. It consists of several simple, look-alike MyISAM tables. There is always one local client (i.e. located on the same machine) that only writes data to the DB, and several remote clients that only read data.
What I need is to insert bulks of data from the local client as fast as possible.
I have already tried many approaches to make this faster, such as reducing the number of INSERTs by increasing the length of the VALUES list, or using LOAD DATA .. INFILE, and some others.
Now it seems to me that I've come up against the limitation of parsing values from strings into their target data types (it doesn't matter whether this is done when parsing queries or a text file).
So the question is:
does MySQL provide some means of passing data directly for local clients (i.e. not using SQL)? Maybe there is some API that allows inserting data by simply passing a pointer.
Once again: I don't want to optimize the SQL code or invoke the same queries in a script, as hd1 advised. What I want is to pass a buffer of data directly to the database engine. This means I don't want to invoke SQL at all. Is it possible?
Use MySQL's LOAD DATA statement:
Write the data to a file in CSV format, then execute this statement:
LOAD DATA INFILE 'somefile.csv' INTO TABLE mytable
For more info, see the documentation
Other than LOAD DATA INFILE, I'm not sure there is any other way to get data into MySQL without using SQL. If you want to avoid parsing the data multiple times, you should use a client library that supports parameter binding; the query can then be parsed and prepared once and executed multiple times with different data. A plain-SQL sketch of that idea follows.
However, I highly doubt that parsing the query is your bottleneck. Is this a dedicated database server? What kind of hard disks are being used? Are they fast? Does your RAID controller have battery-backed RAM? If so, you can optimize the disk writes. And why aren't you using InnoDB instead of MyISAM?
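To illustrate the parse-once idea, server-side prepared statements can even be driven from plain SQL; client libraries do the equivalent over the binary protocol. The mytable name is reused from the LOAD DATA example above, and the columns are invented:
PREPARE ins FROM 'INSERT INTO mytable (col1, col2) VALUES (?, ?)';
SET @a = 1, @b = 'A';
EXECUTE ins USING @a, @b;  -- parsed and planned once...
SET @a = 2, @b = 'B';
EXECUTE ins USING @a, @b;  -- ...executed many times with new data
DEALLOCATE PREPARE ins;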
With MySQL you can insert multiple tuples with one insert statement. I don't have an example, because I did this several years ago and don't have the source anymore.
Consider, as mentioned, using one INSERT with multiple values:
INSERT INTO table_name (col1, col2) VALUES (1, 'A'), (2, 'B'), (3, 'C'), ( ... )
This means you only connect to your database once, with one bigger query, instead of several smaller ones. It's easier to carry the entire couch through the door once than to run back and forth with all the disassembled pieces of the couch, opening the door every time. :)
Apart from that, you can also run LOCK TABLES table_name WRITE before the INSERT and UNLOCK TABLES afterwards, as shown below. That will ensure that nothing else is inserted in the meantime.
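The locking pattern in full, reusing the example table above:
LOCK TABLES table_name WRITE;
INSERT INTO table_name (col1, col2) VALUES (1, 'A'), (2, 'B'), (3, 'C');
UNLOCK TABLES;  -- release the lock as soon as the batch is in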
INSERT INTO foo (foocol1, foocol2) VALUES ('foocol1val1', 'foocol2val1'), ('foocol1val2', 'foocol2val2') and so on should sort you out. More information and sample code can be found here. If you have further problems, do leave a comment.
UPDATE
If you don't want to write the SQL yourself each time, then try this shell script to do as many inserts as you want. Put it in a file, say insertToDb.sh, and get on with your day/evening:
#!/bin/sh
mysql --user=me --password=foo dbname -h foo.example.com -e "insert into tablename (col1, col2) values ($1, $2);"
Invoke as sh insertToDb.sh col1value col2value. If I've still misunderstood your question, leave another comment.
After doing some investigation, I found no way of passing data directly to the MySQL database engine (without it being parsed).
My aim was to speed up communication between the local client and the DB server as much as possible. The idea was that if the client is local, it could use some API functions to pass data to the DB engine, thus avoiding SQL (and the parsing of the values in it). The closest solution was the one proposed by bobwienholt (using a prepared statement and binding parameters), but LOAD DATA .. INFILE turned out to be a bit faster in my case.
The best way to insert data in MS SQL without INSERT INTO or UPDATE queries is to use the MS SQL interface directly. Right-click on the table name and select "Edit Top 200 Rows". You will then be able to add data to the database directly by typing into each cell. To search, or to use SELECT or other SQL commands, right-click on any of the 200 rows you have selected, go to Pane, then select SQL, and you can add the SQL command. Check it out. :D
Without using an INSERT statement, you can use "SQLite Studio" for inserting data into MySQL. It's free and open source, so you can download it and check it out.

Querying MySQL and MSSQL databases at the same time

I'm getting data from an MSSQL DB ("A") and inserting it into a MySQL DB ("B"), using the creation date in the MSSQL DB. I'm doing it with simple logic, but there has got to be a faster and more efficient way of doing this. Below is the sequence of steps involved:
Create one connection for MSSQL DB and one connection for MySQL DB.
Grab all of data from A that meet the date range criterion provided.
Check to see which of the data obtained are not present in B.
Insert these new data into B.
As you can imagine, step 3 is basically a loop, which can easily max out the time limit on the server, and I feel like there must be a way of doing this much faster, ideally as part of the first query. Can anyone point me in the right direction? Can you make "one" connection to both of the DBs and do something like below?
SELECT * FROM A.some_table_in_A.some_column WHERE
"it doesn't exist in" B.some_table_in_B.some_column
A linked server might suit this
A linked server allows for access to distributed, heterogeneous queries against OLE DB data sources. After a linked server is created, distributed queries can be run against this server, and queries can join tables from more than one data source. If the linked server is defined as an instance of SQL Server, remote stored procedures can be executed.
Check out this HOWTO as well
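A hedged sketch of that route; all names are placeholders, and it assumes the MySQL ODBC driver and a system DSN pointing at B are set up on the MSSQL machine:
-- Register the MySQL server as a linked server (run on the MSSQL side).
EXEC sp_addlinkedserver
    @server = 'MYSQL_B',
    @srvproduct = 'MySQL',
    @provider = 'MSDASQL',
    @datasrc = 'mysql_dsn_for_B';

-- Move the missing rows in one distributed statement.
INSERT INTO OPENQUERY(MYSQL_B, 'SELECT id, col FROM some_table_in_B')
SELECT a.id, a.col
FROM some_table_in_A a
WHERE a.id NOT IN
      (SELECT id FROM OPENQUERY(MYSQL_B, 'SELECT id FROM some_table_in_B'));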
If I understand your question right, you're just trying to move rows from the MSSQL DB into the MySQL DB, and I'm assuming there is some sort of filter criterion you're using for the migration. If that is correct, you might try a stored procedure in MSSQL that queries the MySQL database with a distributed query. You can then use that stored procedure to do the loops or checks on the database side, and the front-end server only needs to make one connection.
If the MySQL table has a primary key defined, you can at least skip step 3 ("Check to see which of the data obtained are not present in B"). Use INSERT IGNORE INTO ... and it will attempt to insert all the records, silently skipping those for which a record with the primary key already exists.