Remove duplicated files moved from database to HDFS with Apache NiFi - duplicates

I have a problem with Apache NiFi. I want to move data from a database to HDFS. I have one table, year, with one column, but when I move it I find a lot of files in HDFS that all contain the same year table.
What do I have to do to remove the duplicated files? I have used the UpdateAttribute processor, but I don't know how to use it to fix the problem.
The attached screenshot shows the duplicated files with the same content in the HDFS directory.

Related

SSIS - Exporting data with commas to a csv file

I am trying to export a list of fields from a database to a CSV file.
It keeps putting all the data into one column and doesn't separate it. When checking the preview it seems to be okay, but on export it's not working. I am currently trying the following settings (screenshots: SSIS settings; Excel file output issue). Any help would be appreciated.
Actually it seems to work; Excel is just too dumb to recognize it.
Mark the whole table, then go to Data -> Text to Columns and configure the wizard (Delimited, with semicolon as the separator).
Now you have the rows and cells separated.

Hive - external tables and csv data

I need some help understanding how Hive references data. The situation is the following: I have a CSV file, data.csv, imported into Hadoop. I have found many snippets that use an external table to create a schema on top of the CSV file. My question is: how does Hive know that the schema of the external table is connected to data.csv? In the examples I cannot find any reference to the CSV file.
Where is sample_1.csv referenced in this Hive example, i.e. how does Hive know that the table's data comes from sample_1.csv?
While creating an external table we have to give the list of columns and the HDFS location. Hive stores only the column metadata (column name, data type, ...) and the HDFS location.
When we execute a query on the external table, Hive fetches that metadata and then reads whatever files are available at the HDFS location.
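A minimal sketch of such a definition, assuming data.csv has been placed in an HDFS directory such as /user/hive/external/sample_data and holds two comma-separated columns (the column names, types, and path are assumptions, not taken from the question):

```sql
-- Hypothetical external table: any file dropped into the LOCATION
-- directory (e.g. data.csv) is read with this schema at query time.
CREATE EXTERNAL TABLE sample_data (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/external/sample_data';
```

Note that the file itself is never named in the DDL; the only link between the table and the data is the directory given in LOCATION.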
Now we've got the answer. The manual recommends storing one file per directory. When we then build an external table on top of it, the data is identified by the schema.
In my test case I imported three CSV files against one schema; two files matched the schema, and the third file had one extra column. If I run a query, the data of all three files is shown, but the additional column from the third file is missing.
Everything is fine now - thank you!

Google BigQuery wildcard datasets

I have 45 CSV files stored in Google Cloud Storage in the same folder. When wildcarding these into a dataset table, I am finding that some rows are missing once I connect the data to Tableau. If I just select one of the files, all of its data appears. All the files are called "PMPRO_PIVOT_ASDKE", where the last five characters change for each file. I have tried wildcarding with "PMPRO_PIVOT*"; it takes data from each file, but some of the data is missing from each file.
Any ideas would be great, as I've been trying to solve this all day.
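For reference, a wildcarded CSV source of the kind described could be declared roughly like this; the dataset, bucket, columns, and skipped header row are assumptions made for illustration, not details taken from the question:

```sql
-- Hypothetical BigQuery external table reading every matching CSV file.
CREATE EXTERNAL TABLE my_dataset.pmpro_pivot (
  region STRING,
  amount FLOAT64
)
OPTIONS (
  format = 'CSV',
  uris = ['gs://my-bucket/PMPRO_PIVOT*'],
  skip_leading_rows = 1
);
```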

Erasing records from text file after importing it to MySQL database

I know how to import a text file into a MySQL database by using the command
LOAD DATA LOCAL INFILE '/home/admin/Desktop/data.txt' INTO TABLE data
The above command writes the records of the file "data.txt" into the MySQL table data. My question is that I want to erase the records from the .txt file once they have been stored in the database.
For example: if there are 10 records and, at the current point in time, 4 of them have been written into the database table, I require that those 4 records are erased from data.txt at the same time. (In a way, the text file acts as a "queue".) How can I accomplish this? Can Java code be written for it, or should a scripting language be used?
Automating this is not too difficult, but it is also not trivial. You'll need something (a program, a script, ...) that can
read the records from the original file,
check if they were inserted and, if they were not, copy them to another file (a sketch of this check follows below),
rename or delete the original file, and rename the new file to replace the original one.
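The "check if they were inserted" step can at least be sketched on the SQL side. The table name data comes from the question; the column name record_text and the literal value are assumptions made purely for illustration:

```sql
-- Hypothetical existence check for one record read from data.txt.
-- `record_text` is an assumed column name; use the table's real column(s).
SELECT EXISTS(
  SELECT 1
  FROM data
  WHERE record_text = 'one line read from data.txt'
) AS already_loaded;
```

A record for which this returns 1 can be skipped when the remaining lines are copied to the new file.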
There might be better ways of achieving what you want to do, but that's not something I can comment on without knowing your goal.

MySQL table is not displaying

I have one database named manaskavya. In this database I had created 10 tables with XAMPP server. For some reason I installed WAMP server, and there it showed only 9 tables; it was missing the table named 'manas_likes'. Then I installed XAMPP again, but the missing table is still not displayed. When I create a new table with the same name it says the table exists, and when I try to drop it, repair it, or truncate it, it says the table does not exist.
I don't know why this is happening; if you know, please help me out.
Thank you
Once I faced the same problem. I tried changing the table name in the CREATE TABLE query, executing it, and then renaming the table; then dropping the table and recreating it.
After checking all these details I found something. Whether I am right or wrong I don't know, but it worked for me:
1. When we create a new table in a database, MySQL creates one file with the .frm extension. You can check this file in C:\xampp\mysql\data\database_name\anytable.frm; for WAMP it is C:\wamp\bin\mysql\mysql5.0.2\data\database_name\anytable.frm.
2. After inserting data into this particular table, two more files are created, with the extensions .MYD and .MYI.
3. When we create a backup of that particular database: if you make the backup with phpMyAdmin, this type of problem will never happen. If you make the backup directly from the folder, i.e. C:\xampp\mysql\data\database_name, it sometimes misses the .MYD and .MYI files.
4. After importing that backup into the database again, there will be only the .frm file. Due to the absence of the .MYD and .MYI files the table becomes invisible, and it also won't allow you to create a new table with the same name, because the .frm file is already present in your database directory.
5. So in such a case, go directly to the folder and delete that specific .frm file (please be sure to delete the correct file).
6. After that you will be able to create the table with the same name.
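As an illustration of step 6: once the orphaned .frm file has been removed, creating a table with the old name should work again. The columns below are placeholders, since the question does not describe the real structure of manas_likes:

```sql
-- Hypothetical recreation of the missing table; replace the columns
-- with the real schema. MyISAM matches the .frm/.MYD/.MYI files above.
CREATE TABLE manas_likes (
  id      INT AUTO_INCREMENT PRIMARY KEY,
  user_id INT NOT NULL,
  post_id INT NOT NULL
) ENGINE=MyISAM;
```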