Problem: I have an Aurora RDS database with a table in which the data for a certain column was deleted. I have a snapshot of the DB from a few days ago that I want to use to repopulate that column with the values from the snapshot. The issue is that certain rows have been deleted from the live DB in the meantime, and I don't want to reintroduce them.
I want to restore the snapshot, connect to it and then SELECT INTO OUTFILE S3 the table that interests me. Then I will LOAD DATA FROM S3 into the live DB, selecting only the column that interests me. But I haven't found information about what happens if the number of rows differs, namely if the snapshot has rows that were deleted from the live DB in the meantime.
Does the import command take the ID column into consideration when doing the import? Should I also import the ID column? I don't want to recreate the deleted rows; I only want to populate the existing rows with the values of that column from the snapshot.
If the destination table is missing the column, ALTER TABLE it to add the column; it will be empty of data for now.
LOAD DATA your export into a different table than the ultimate destination table.
Then do an UPDATE with a JOIN between the destination table and the imported table. In this update, copy the values for the column you're trying to restore.
Because it is an inner join, only rows that exist in both tables are matched, so rows that were deleted from the live table will not be recreated.
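A minimal sketch of those steps, assuming the live table is orders with primary key id, the column being restored is customer_note, and the snapshot export is loaded into a staging table orders_snapshot (all of these names, and the S3 URI, are placeholders):

-- Only needed if the column itself is gone from the live table
ALTER TABLE orders ADD COLUMN customer_note TEXT;

-- Stage the snapshot export; keep the primary key so rows can be matched
CREATE TABLE orders_snapshot (id BIGINT PRIMARY KEY, customer_note TEXT);
LOAD DATA FROM S3 's3://my-bucket/orders-export'
    INTO TABLE orders_snapshot
    FIELDS TERMINATED BY ',' (id, customer_note);

-- Copy the column back; the inner join silently skips snapshot rows
-- whose ids no longer exist in the live table
UPDATE orders o
JOIN orders_snapshot s ON s.id = o.id
SET o.customer_note = s.customer_note;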
I have a Hive database with 50 tables;
I want to check whether any of the tables are empty.
The database name is employee.
I don't want to do this manually, i.e. run a SELECT * query on each table individually.
Can anyone explain how?
Hive does not keep track of the number of records present in a table. Only during query execution are the files belonging to a particular table read and processed, so there is no way to know the number of records in each table without querying each table individually.
Alternatively, you can run a disk usage command on the database directory in HDFS:
hdfs dfs -du -s -h <hive.warehouse.dir>/employee.db/*
The table folders showing a size of 0 are obviously empty.
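For illustration, with hypothetical table names, the output might look roughly like this (size, disk space consumed, path):
0        0        /user/hive/warehouse/employee.db/departments
1.2 M    3.6 M    /user/hive/warehouse/employee.db/salaries
Here departments is empty while salaries holds data.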
This is possible because Hive stores the table files in the HDFS LOCATION given at the time of table creation, or under the path set by the hive.metastore.warehouse.dir property in hive-site.xml; the default is /user/hive/warehouse.
If the tables are managed tables, all the tables' data for the database employee will be stored under <hive.warehouse.dir>/employee.db/.
I have the following issue: on localhost (my computer) I have a table in a database which I use to update the data for a month. Once the data is correct, I need to update the table in the database which resides on the server.
I use Navicat to do the work, and it only transfers data by deleting the actual table on the server and sending all the data from my localhost.
The problem is that the table now has almost 300,000 records stored and transferring the data takes too long, leaving the table empty for some time.
Is there any way to update only the data without deleting the whole table?
Export the local table under a different name, as a mysqldump or just a CSV; 300k rows is not a big deal, and use a different table name for now.
Then upload that table2 to the server DB and use a query to update table1 using the table2 data.
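A minimal sketch, assuming both tables share a primary key id and the column being refreshed is called amount (table and column names are placeholders):

-- load the exported CSV into the staging table on the server
LOAD DATA LOCAL INFILE '/path/to/table1_export.csv'
    INTO TABLE table2
    FIELDS TERMINATED BY ',' (id, amount);

-- refresh the live table in place, without ever emptying it
UPDATE table1 t1
JOIN table2 t2 ON t2.id = t1.id
SET t1.amount = t2.amount;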
Thanks for viewing this. I need a little bit of help with this project that I am working on with MySQL.
For part of the project I need to load a few things into a MySQL database which I have up and running.
The info that I need, for each column in the table Documentation, is stored in text files on my hard drive.
For example, one column in the Documentation table is "ports", so I have a ports.txt file on my computer with a bunch of port numbers and so on.
I tried to run this MySQL statement through phpMyAdmin:
LOAD DATA INFILE 'C:\\ports.txt' INTO TABLE `Documentation` (`ports`)
It ran successfully, so I went on to the other load I needed, which was
LOAD DATA INFILE 'C:\\vlan.txt' INTO TABLE `Documentation` (`vlans`)
This also completed successfully, but it added all the rows to the vlans column AFTER the last entry in the ports column.
Why did this happen? Is there anything I can do to fix this? Thanks
Why did this happen?
LOAD DATA inserts new rows into the specified table; it doesn't update existing rows.
Is there anything I can do to fix this?
It's important to understand that MySQL doesn't guarantee that tables will be kept in any particular order. So, after your first LOAD, the order in which the data were inserted may be lost & forgotten - therefore, one would typically relate such data prior to importing it (e.g. as columns of the same record within a single CSV file).
You could LOAD your data into temporary tables that each have an AUTO_INCREMENT column and hope that such auto-incremented identifiers remain aligned between the two tables (MySQL makes absolutely no guarantee of this, but in your case you should find that each record is numbered sequentially from 1); once there, you could perform a query along the following lines:
INSERT INTO Documentation (ports, vlans) SELECT port, vlan FROM t_Ports JOIN t_Vlan USING (id);
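Fleshed out, that approach might look like the following; t_Ports, t_Vlan and their port/vlan columns are illustrative names, and it assumes Documentation has been emptied of the earlier, misaligned loads:

CREATE TABLE t_Ports (id INT AUTO_INCREMENT PRIMARY KEY, port VARCHAR(64));
CREATE TABLE t_Vlan  (id INT AUTO_INCREMENT PRIMARY KEY, vlan VARCHAR(64));

-- each file loads in order, so matching lines should get matching auto-increment ids
LOAD DATA INFILE 'C:\\ports.txt' INTO TABLE t_Ports (port);
LOAD DATA INFILE 'C:\\vlan.txt'  INTO TABLE t_Vlan  (vlan);

-- recombine the two columns into single rows
INSERT INTO Documentation (ports, vlans)
SELECT port, vlan FROM t_Ports JOIN t_Vlan USING (id);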
I have one Excel sheet whose information has to be saved in 4 tables. I have to create unique IDs for each table. I also have to insert some data into table 1, and the unique ID created there will then be used for inserting data into the second table (referential integrity). Moreover, one table will always get records inserted, but for the other 3 tables, if some data already exists then it has to be updated and not inserted. I'm new to SSIS, so please guide me on how to proceed further in SSIS.
loads of requirements :)
First, here is an example of a package that loads an Excel sheet into a SQL database.
You can easily follow it to build your package.
Differences:
You say you need to insert the same data into 4 tables, so between your Excel source and your destination you will add a Multicast component, and then instead of 1 destination you will have 4. The Multicast will create 4 copies of your data, so you can insert into your 4 tables.
The IDs may be a problem: since the 4 destinations execute separately, you can't use the ID inserted into the first table to update the second. I suggest you do it with T-SQL in an "Execute SQL Task" after everything is imported.
If that is not possible, you will need 4 separate data flows, where in each one you do the inserts by reading from your Excel file and joining with the result of the previous insert using a Lookup transformation.
Import it into a temp table on SQL Server. Then you will be able to write a query which loads from the temp table into the multiple tables, for example:
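A rough T-SQL sketch of that idea, assuming the Excel data lands in dbo.Staging and the parent/dependent tables are dbo.Customer and dbo.CustomerDetail (all object and column names here are invented for illustration):

-- 1. Insert the always-new rows into the parent table and capture the generated IDs
DECLARE @NewIds TABLE (CustomerName nvarchar(100), CustomerId int);

INSERT INTO dbo.Customer (CustomerName)
OUTPUT inserted.CustomerName, inserted.CustomerId INTO @NewIds
SELECT DISTINCT CustomerName FROM dbo.Staging;

-- 2. Upsert into a dependent table, reusing the IDs generated above
MERGE dbo.CustomerDetail AS tgt
USING (SELECT n.CustomerId, s.DetailValue
       FROM dbo.Staging AS s
       JOIN @NewIds AS n ON n.CustomerName = s.CustomerName) AS src
ON tgt.CustomerId = src.CustomerId
WHEN MATCHED THEN
    UPDATE SET tgt.DetailValue = src.DetailValue
WHEN NOT MATCHED THEN
    INSERT (CustomerId, DetailValue) VALUES (src.CustomerId, src.DetailValue);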
Hope this solves your problem as per your requirement.
It was easy using phpMyAdmin to pull a list of WordPress admin comments that were missing an IP address, but now that I've got that data in hand I'm looking for a quick way to insert it into the table. My guess is that this will probably involve uploading a .sql file.
» WordPress ERD
I've currently got my fresh data in an Excel sheet, with columns for comment_ID and comment_author_IP. There are several hundred records, with a variety of IP addresses.
UPDATE:
The winning query:
UPDATE wp_comments x, temp xx
SET x.comment_author_IP = xx.IP
WHERE x.comment_ID = xx.ID;
If you are looking for a manual and easy process:
Use phpMyAdmin to upload your Excel sheet into a new table (it can import Excel files too);
add indexes on the key columns used for the join in the newly created table;
then join the new table with the actual table and update the relevant fields.
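For instance, if the Excel data was imported into a table named temp with columns ID and IP, as in the winning query above, the index step might look like this:

ALTER TABLE temp ADD INDEX idx_id (ID);

The update itself is then the same join shown in the winning query above.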