How to find empty tables in hive database - mysql

I have a Hive database with 50 tables;
I want to check if any of the tables are empty.
The database name is employee.
I don't want to do this manually, i.e. run a SELECT * query on each table individually.
Can anyone explain how?

Hive does not keep track of the number of records in a table. The files belonging to a particular table are only read and processed during query execution, so there is no way to know the number of records in each table without querying each table individually.
Alternatively, you can run a disk usage command on the database directory in HDFS:
hdfs dfs -du -s -h <hive.warehouse.dir>/employee.db/*
The table folders reported as 0 B are obviously empty.
This is possible because Hive stores a table's files in HDFS, either at the LOCATION given at table creation time or under the path set by the hive.metastore.warehouse.dir property in hive-site.xml (the default is /user/hive/warehouse).
If the tables are managed tables, the data for all tables in the employee database will be stored under <hive.warehouse.dir>/employee.db/.
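For example, to list only the empty table directories, you could pipe the disk-usage output through awk (a minimal sketch; it assumes the first column of the -du output is the size in bytes, so -h is dropped here):
hdfs dfs -du -s <hive.warehouse.dir>/employee.db/* | awk '$1 == 0 {print $NF}'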

Related

Restore a mysqldump created using --no-create-info, skipping tables that are not present in the current database

I have a mysqldump backup that was created using the --no-create-info option. I want to restore it to a new database that does not have certain tables (approximately 50 tables were removed from the target database as they were no longer needed).
So I am getting Table 'table_name' doesn't exist errors, for the obvious reason.
What is the MySQL way of restoring to a database that does not have all the tables present in the backup file?
I could use --insert-ignore to avoid this failure, but I suspect it would also hide some genuine errors such as data type mismatches.
You cannot insert rows into a table that doesn't exist, obviously.
To restore the data in your dump file, you need to create those tables first. You could go back to your source MySQL instance and dump those table definitions with mysqldump --no-data.
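For example (a sketch; the host names, credentials, and table list are placeholders):
mysqldump --no-data -h source-host -u user -p source_db missing_table_1 missing_table_2 > missing_tables.sql
mysql -u user -p target_db < missing_tables.sql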
If you don't care about the data for the missing tables, and you only want to restore data for the tables that do exist, then you could filter the INSERT statements out of the dump before importing it.
You could use grep -v, for example, to eliminate those lines.
Or you could use sed to delete the lines between "-- Dumping data for table `tablename`" and whatever the next table's header is.
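The grep approach could look like this (a sketch; old_audit_log is a hypothetical name for one of the dropped tables, and it assumes mysqldump's default output where the data for each table is written as INSERT INTO `table` statements, each on its own line):
grep -v 'INSERT INTO `old_audit_log`' backup.sql > filtered.sql
mysql -u user -p target_db < filtered.sql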
If you don't want to filter the dump, but you don't care about restoring data for the tables that no longer exist, you could create dummy tables with the right columns but define them with the BLACKHOLE storage engine, so the INSERTs won't actually result in saving any data.
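A sketch of that BLACKHOLE variant, assuming the engine is enabled on your server; the table name and columns here are hypothetical, so copy the real definition (e.g. from a --no-data dump of the source) and change only the engine:
CREATE TABLE old_audit_log (
  id INT UNSIGNED NOT NULL,
  created_at DATETIME,
  message TEXT
) ENGINE=BLACKHOLE;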
One more option: Import the dump file with mysql --force so it continues even if it gets errors on some of the INSERTs.

How to import certain data from a file on AWS Aurora

Problem: I have an Aurora RDS database that has a table where the data for a certain column was deleted. I have a snapshot of the DB from a few days ago that I want to use to populate the said column with the values from the snapshot. The issue is that certain rows have been deleted from the live DB in the meantime and I don't want to include them again.
I want to mount the snapshot, connect to it, and then SELECT INTO OUTFILE S3 the table that interests me. Then I will LOAD DATA FROM S3 into the live DB, selecting only the column that interests me. But I haven't found information about what happens if the number of rows differs, namely if the snapshot has rows that were deleted from the live DB in the meantime.
Does the import command take the ID column into consideration when doing the import? Should I also import the ID column? I don't want to recreate the rows in question, I only want to populate the existing rows with the values from the column I want from the snapshot.
ALTER TABLE the destination table to add the column you are missing. It will be empty of data for now.
LOAD DATA your export into a different table than the ultimate destination table.
Then do an UPDATE with a JOIN between the destination table and the imported table. In this update, copy the values for the column you're trying to restore.
By using an inner join, it will only match rows that exist in both tables.
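Put together, the steps might look like this (a sketch with hypothetical names: orders is the live table, orders_restore the staging table, and notes the column to repopulate; it also assumes the cluster has an IAM role attached that permits LOAD DATA FROM S3):
-- 1. Add the missing column to the live table (empty for now)
ALTER TABLE orders ADD COLUMN notes TEXT;

-- 2. Load the snapshot export into a staging table
CREATE TABLE orders_restore (id INT PRIMARY KEY, notes TEXT);
LOAD DATA FROM S3 's3://my-bucket/orders_snapshot.csv'
  INTO TABLE orders_restore
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
  (id, notes);

-- 3. Copy the values across; the inner join skips rows that were deleted from the live table
UPDATE orders AS live
  JOIN orders_restore AS snap ON snap.id = live.id
  SET live.notes = snap.notes;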

Importing a MySQL database into Neo4j

I have a MySQL database on a remote server that I am trying to migrate to a Neo4j database. For this I dumped the individual tables to CSV files and am now planning to use the LOAD CSV functionality to create graphs from the tables.
How does loading each table preserve the relationship between tables?
In other words, how can I generate a graph for the entire database and not just a single table?
Load each table as a CSV
Create indexes on your relationship field (Neo4j only does single property indexes)
Use MATCH() to locate related records between the tables
Use MERGE (a)-[:RELATIONSHIP]->(b) to create the relationship between the tables.
If you run this all at once, it will create one huge transaction that won't run to completion and will most likely crash with a heap error. Getting around that requires loading the CSVs first, then creating the relationships in batches of 10K-100K per transaction.
One way to accomplish that goal is:
MATCH (a:LabelA)
MATCH (b:LabelB {id: a.id}) WHERE NOT (a)-[:RELATIONSHIP]->(b)
WITH a, b LIMIT 50000
MERGE (a)-[:RELATIONSHIP]->(b)
What this does is find :LabelB records that don't have a relationship with the :LabelA records and then creates that relationship for the first 50,000 records it finds. Running this repeatedly will eventually create all the relationships you want.
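For the initial load and the indexes (the first two steps above), a sketch along these lines could work; the file name, labels, and column names are assumptions, and CREATE INDEX ON is the older syntax (newer Neo4j versions use CREATE INDEX FOR (n:LabelA) ON (n.id)):
LOAD CSV WITH HEADERS FROM 'file:///label_a.csv' AS row
CREATE (:LabelA {id: toInteger(row.id), name: row.name});

CREATE INDEX ON :LabelA(id);
CREATE INDEX ON :LabelB(id);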

Update a MySQL table from another table in a different location

I have the following issue: on localhost (my computer) I have a table in a database which I use to update the data for a month. Once the data is correct, I need to update the table in the database that resides on the server.
I use Navicat to do the work, but it only transfers the data by deleting what is on the server and re-sending everything from my localhost.
The problem is that the table now has almost 300,000 records stored, and it takes too long to transfer the data, leaving the table empty for some time.
Is there any way to update only the data without deleting the whole table?
Export the local table under a different name, either as a mysqldump file or just a CSV; 300k rows is not a big deal.
Then upload that second table to the server database and use a query to update table 1 from table 2's data.
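For example, once the local copy has been uploaded as a second table, an upsert like this could do it (a sketch; sales, sales_staging, and the column names are hypothetical, and it assumes id is the primary key):
INSERT INTO sales (id, amount, status)
  SELECT id, amount, status FROM sales_staging
  ON DUPLICATE KEY UPDATE amount = VALUES(amount), status = VALUES(status);
Existing rows get their values refreshed, new rows are inserted, and nothing is deleted.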

MySQL "source" command overwrites table

I have a MySQL Server which has one database called "Backup".
It only has one table with the name "storage".
In the Backup db, the storage table contains about 5 million rows.
Now I wanted to append new rows to the table by using the "source" command in the SQL command line.
What happened is that source loaded all the new rows into the table, but it overwrote the existing entries (it seems it first deleted all the data).
I should add that the SQL file I want to import comes from another server, where this table has the same name and structure as "storage".
What I want is to append the new entries from the SQL file to the ones already in my database; I do not want to overwrite them.
The structure of the two tables is exactly the same. As the name says, I use the Backup database for backup purposes, so that from time to time I can back up my data.
Has anyone an idea how to solve this?
Look in the .sql file you're reading with the SOURCE command, and remove the DROP TABLE and CREATE TABLE statements that appear there. They are the cause of your table being overwritten; what's actually happening is that the table is being replaced.
You could also look into using SELECT ... INTO OUTFILE and LOAD DATA INFILE as a faster and less potentially destructive way to get data from one server to the other in a file.
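A sketch of that second route (the file path is a placeholder, and it assumes the server's secure_file_priv setting allows writing to it):
-- On the source server:
SELECT * FROM storage INTO OUTFILE '/tmp/storage.tsv';

-- Copy the file to the backup server, then:
LOAD DATA INFILE '/tmp/storage.tsv' INTO TABLE storage;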