java and mysql load data infile misunderstanding - mysql

Thanks for viewing this. I need a little bit of help with a project that I am working on with MySQL.
For part of the project I need to load a few things into a MySQL database which I have up and running.
The info that I need, for each column in the table Documentation, is stored in text files on my hard drive.
For example, one column in the Documentation table is "ports", so I have a ports.txt file on my computer with a bunch of port numbers, and so on.
I tried to run this MySQL statement through phpMyAdmin:
LOAD DATA INFILE 'C:\\ports.txt' INTO TABLE `Documentation` (`ports`)
It ran successfully, so I went to do the other LOAD DATA I needed, which was
LOAD DATA INFILE 'C:\\vlan.txt' INTO TABLE `Documentation` (`vlans`)
This also completed successfully, but it added all of the vlan rows AFTER the last entry in the ports column.
Why did this happen? Is there anything I can do to fix this? Thanks

Why did this happen?
LOAD DATA inserts new rows into the specified table; it doesn't update existing rows.
Is there anything I can do to fix this?
It's important to understand that MySQL doesn't guarantee that tables will be kept in any particular order. So, after your first LOAD, the order in which the data were inserted may be lost and forgotten; therefore, one would typically relate such data prior to importing it (e.g. as columns of the same record within a single CSV file).
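For example, if you merged the two text files locally into a single comma-separated file (one port,vlan pair per line), a single LOAD DATA could fill both columns at once. A minimal sketch, assuming a hypothetical file named documentation.csv:
LOAD DATA INFILE 'C:\\documentation.csv'
INTO TABLE `Documentation`
FIELDS TERMINATED BY ','
(`ports`, `vlans`);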
You could LOAD your data into temporary tables that each have an AUTO_INCREMENT column and hope that such auto-incremented identifiers remain aligned between the two tables (MySQL makes absolutely no guarantee of this, but in your case you should find that each record is numbered sequentially from 1); once there, you could perform a query along the following lines:
INSERT INTO Documentation SELECT port, vlan FROM t_Ports JOIN t_Vlan USING (id);
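A minimal sketch of that approach (the staging table and column definitions here are assumptions; Documentation's column names are taken from your question):
CREATE TEMPORARY TABLE t_Ports (id INT AUTO_INCREMENT PRIMARY KEY, port VARCHAR(32));
CREATE TEMPORARY TABLE t_Vlan (id INT AUTO_INCREMENT PRIMARY KEY, vlan VARCHAR(32));
LOAD DATA INFILE 'C:\\ports.txt' INTO TABLE t_Ports (port);
LOAD DATA INFILE 'C:\\vlan.txt' INTO TABLE t_Vlan (vlan);
INSERT INTO Documentation (ports, vlans) SELECT port, vlan FROM t_Ports JOIN t_Vlan USING (id);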

Related

SSIS package design, where 3rd party data is replacing existing data

I have created many SSIS packages in the past, though the need for this one is a bit different than the others which I have written.
Here's the quick description of the business need:
We have a small database on our end sourced from a 3rd party vendor, and this needs to be overwritten nightly.
The source of this data is a bunch of flat files (CSV) from the 3rd party vendor.
Current setup: we truncate the tables of this database, and we then insert the new data from the files, all via SSIS.
Problem: There are times when the files fail to arrive, and what happens is that we truncate the old data even though we don't have the fresh data set. This leaves us with an empty database, when we would prefer to have yesterday's data rather than no data at all.
Desired Solution: I would like some sort of mechanism to see if the new data truly exists (these files) prior to truncating our current data.
What I have tried: I tried to capture the data from the files and add it to an ADO recordset, only proceeding if that part was successful. This doesn't seem to work for me, as I have all the data-capture activities in one data flow and I don't see a way to reuse that data. It would also seem wasteful of resources to do that and let the in-memory tables just sit there.
What have you done in a similar situation?
If the files are not present, update some flags like IsFile1Found to false and pass these flags to a stored procedure which truncates on a conditional basis.
If a file might be empty, you can use PowerShell through an Execute Process Task to extract the first two rows; if there are two rows (header + data row), it means the data file is not empty. Then you can truncate the table and import the data.
Another approach could be:
You can load the data into staging tables, insert the data from those staging tables into the destination tables using a SQL stored procedure, and truncate the staging tables after the data has been moved to all the destination tables. That way, before truncating a destination table, you can check whether the staging tables are empty or not.
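A minimal sketch of that conditional truncate inside the stored procedure (SQL Server syntax, since SSIS is in play; the staging and destination table names here are hypothetical):
IF EXISTS (SELECT 1 FROM dbo.StagingVendorData)
BEGIN
    TRUNCATE TABLE dbo.VendorData;
    INSERT INTO dbo.VendorData (Col1, Col2)
    SELECT Col1, Col2 FROM dbo.StagingVendorData;
    TRUNCATE TABLE dbo.StagingVendorData;
END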
I looked around and found that some others were struggling with the same issue, though none of them had a very elegant solution, nor do I.
What I ended up doing was to create a flat file connection to each file of interest and have a task count records and save to a variable. If a file isn't there, the package fails and you can stop execution at that point. There are some of these files whose actual count is interesting to me, though for the most part, I don't care. If you don't care what the counts are, you can keep recycling the same variable; this will reduce the creation of variables on your end (I needed 31). In order to preserve resources (read: reduce package execution time), I excluded all but one of the columns in each data source; it made a tremendous difference.

Fastest-Cleanest way to update database (mysql large tables)

I have a website fed by large MySQL tables (>50k rows in some tables). Let's call one of them "MotherTable". Every night I update the site with a new CSV file (produced locally) that has to replace the data in "MotherTable".
The way I do this currently (I am not an expert, as you can see) is:
- First, I TRUNCATE the MotherTable table.
- Second, I import the CSV file into the empty table, with columns separated by "/" and skipping 1 line (roughly as in the sketch below).
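Roughly, the nightly update looks like this (a sketch only; the file path is a placeholder):
TRUNCATE TABLE MotherTable;
LOAD DATA INFILE '/path/to/nightly.csv'
INTO TABLE MotherTable
FIELDS TERMINATED BY '/'
IGNORE 1 LINES;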
As the CSV file is not very small, there are some seconds (or even a minute) during which MotherTable is empty, so web users who run SELECTs on this table find nothing.
Obviously, I don't like that. Is there any procedure to update MotherTable in such a way that users notice nothing? If not, what would be the quickest way to update the table with the new CSV file?
Thank you!

mysql master-slave partitioned table doesn't exists

I set up MySQL master-slave replication by creating a data snapshot using raw data files. After the setup, queries on the partitioned tables return "table xxx doesn't exist", but the other tables work fine.
When I switch to using mysqldump instead, everything works.
Can anyone help me fix this problem?
If the partitioned table did not work but the other tables did, and the mysqldump worked fine, my best guess would be that your partitioned data is not stored in the same place as the rest of your data. Thus, when you used the tar, zip, or rsync method to copy your data directory, you left out the data that made up the partitioned table. You would need to locate where the partitioned data is stored and move it over along with the rest of the data directory.
Based on your comment below, however, you have what is called the famous Schrödinger table. Named after Schrödinger's cat paradox, this is where MySQL thinks that the table exists, because it shows up when you run SHOW TABLES, but does not allow you to query it; it exists and yet does not exist.
Usually this is the result of not copying over the metadata (the ibdata1 file and the ib_logfiles) correctly. One thing you can do to test this is, if possible, remove the partitioning from the tables and try your rsync again. If you are still getting this error, it has nothing to do with the fact that the table is partitioned, and that test would lead me to believe that you did not copy all the data over correctly.
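A quick way to see which partitions MySQL believes exist, and to run that test (a sketch; replace your_db and your_table with the real names):
SELECT TABLE_NAME, PARTITION_NAME FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = 'your_db' AND PARTITION_NAME IS NOT NULL;
ALTER TABLE your_table REMOVE PARTITIONING;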

MySQL "source" command overwrites table

I have a MySQL Server which has one database called "Backup".
It only has one table with the name "storage".
In the Backup db the storage table contains about 5 million rows.
Now I wanted to append new rows to the table by using the "source" command in the SQL command line.
What happened is that source loaded all the new rows into the table, but it overwrote the existing entries (it seems that it first deleted all the data).
What I have to say is that the SQL file that I want to load comes from another server, where the table has the same name and structure as "storage".
What I want is to append the new entries that are in the SQL file to the ones in my database. I do not want to overwrite them.
The structure of the two tables is exactly the same. As the name says, I use the Backup database for backup purposes, so that from time to time I can back up my data.
Does anyone have an idea how to solve this?
Look in the .sql file you're reading with the SOURCE command, and remove the DROP TABLE and CREATE TABLE statements that appear there. They are the cause of your table being overwritten; what's actually happening is that the table is being replaced.
You could also look into using SELECT ... INTO OUTFILE and LOAD DATA INFILE as a faster and less potentially destructive way to get data from one server to the other in a file.
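A minimal sketch of that approach (the file path is a placeholder, and the file must be on a path the MySQL server can read and write):
SELECT * FROM storage INTO OUTFILE '/tmp/storage_export.tsv';
LOAD DATA INFILE '/tmp/storage_export.tsv' INTO TABLE storage;
Because no DROP TABLE or CREATE TABLE statements are involved, the LOAD DATA step appends to the existing rows instead of replacing them.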

How to avoid duplicates while updating a MySQL database?

I'm receiving a MySQL dump file (.sql) daily from an external server, which I don't have any control over. I created a local database to store all the data in the .sql file. I hope I can set up a script to automatically update my local database daily. The .sql file I'm receiving daily contains old data that is already in the local database. How can I avoid duplicates of that old data and insert only new data into my local MySQL server? Thank you very much!
You can use a third-party database compare tool, such as those from Red Gate, against two databases: your current one (your "master") and one loaded from the new dump. You can then run the compare tool between the two versions and apply only the changes between them, updating your master.
Use unique constraints on the fields that you want to be unique.
Also, as Danny Beckett mentioned, to avoid errors in the output (which I would prefer to redirect into a file for later analysis, to check that I haven't missed anything in the process), you can use the INSERT IGNORE construct instead of INSERT.
You can use a constraint together with the IGNORE keyword.
As a second option, you can first insert the data into a temp table and then insert only the difference (see the sketch below).
With the second option you can also add some restriction so that you don't have to search for duplicates across all the records already stored in the database.
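A minimal sketch of that second option (the table and column names here are hypothetical):
INSERT INTO main_data (id, payload)
SELECT t.id, t.payload
FROM staging_data AS t
LEFT JOIN main_data AS m ON m.id = t.id
WHERE m.id IS NULL;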
You need to create a primary key in your table. It should be a unique combination of column values. Using the INSERT query with IGNORE will then avoid adding duplicates to this table.
see http://dev.mysql.com/doc/refman/5.5/en/insert.html
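For example (a sketch with hypothetical table and column names):
ALTER TABLE local_copy ADD PRIMARY KEY (record_id);
INSERT IGNORE INTO local_copy (record_id, payload) VALUES (42, 'example row');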
If this is a plain vanilla mysqldump file, then normally it includes DROP TABLE IF EXISTS ... statements and CREATE TABLE statements, so the tables are recreated when the data is imported. So duplicate data should not be a problem, unless I'm missing something.