I have two types of CSV files. The first file's content is the following:
1 13733776062
2 13535581615
3 13987993374
4 13866603331
The second file's content is the following:
13535581615|1
13733776062|0
13866603331|0
13987993374|1
The format of each line in the first file is "id number"; the format of each line in the second file is "number flag". The files share one field: number.
Each file has 10 million lines.
Now I want to combine the two files by the number field into a new file where each line contains 3 fields: id, number, flag. I am using Java to do this.
Can someone tell me the best method for this work that takes the least time?
This task is more appropriate for SQLite than for Java. You can do it as follows:
$ sqlite3 database.db
sqlite> CREATE TABLE table1 (id int, number int);
sqlite> .separator " "
sqlite> .import t1.csv table1
sqlite> CREATE TABLE table2 (number int, flag int);
sqlite> .separator "|"
sqlite> .import t2.csv table2
sqlite> CREATE TABLE mytable AS
SELECT t1.id, t1.number, t2.flag
FROM table1 t1, table2 t2
WHERE t1.number=t2.number;
sqlite> SELECT * FROM mytable;
1|13733776062|0
2|13535581615|1
3|13987993374|1
4|13866603331|0
I would expect it to work very fast even for 10 million lines.
And of course, you can use SQLite JDBC to create and access the new database from Java.
To make access faster, it is a good idea to create appropriate indexes.
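For example, an index on the join column of each input table lets SQLite find matching rows without a full scan (the index names are illustrative); creating them before the join query speeds up the join itself as well:
sqlite> CREATE INDEX idx_table1_number ON table1(number);
sqlite> CREATE INDEX idx_table2_number ON table2(number);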
I have a table with two columns, ID and WORD. I've used the following query to insert several files into this table:
LOAD DATA LOCAL INFILE 'c:/xad' IGNORE INTO TABLE words LINES TERMINATED BY '\n' (@col1) SET word = @col1;
Now I'd like to find specific values and insert them into another table. I know based on this question that I can do the following
insert into tab2 (id_customers, value)
values ((select id from tab1 where customers='john'), 'alfa');
But I'd like to do this based on the files. For example:
Loop through each line of file xad and pass its value to a query like the following:
insert into othertable (word_id)
values ((select id from firsttable where word='VALUE FROM CURRENT LINE OF FILE'));
I can write a Java app to do this line by line, but I figured it would be faster to make MySQL do the work if possible. Is there a way to make MySQL loop over each line, find the ID, and insert it into othertable?
Plan A: A TRIGGER could be used to conditionally copy the id to another table when encountered in whatever loading process is used (LOAD DATA / INSERT .. SELECT .. / etc).
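A minimal sketch of Plan A. The trigger fires for each row that LOAD DATA inserts into words; the trigger name and the filter condition are illustrative assumptions:
DELIMITER //
CREATE TRIGGER words_after_insert
AFTER INSERT ON words
FOR EACH ROW
BEGIN
    -- copy the id only for the rows of interest (the condition is an assumption)
    IF NEW.word IN ('alfa', 'beta') THEN
        INSERT INTO othertable (word_id) VALUES (NEW.id);
    END IF;
END//
DELIMITER ;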
Plan B: Simply load the table, then copy over the ids that you desire.
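A minimal sketch of Plan B, assuming the file is first loaded into a hypothetical staging table and that words plays the role of firsttable from the question:
CREATE TABLE staging (word VARCHAR(255));
LOAD DATA LOCAL INFILE 'c:/xad' INTO TABLE staging
LINES TERMINATED BY '\n' (@col1) SET word = @col1;
-- one set-based statement replaces the per-line loop
INSERT INTO othertable (word_id)
SELECT w.id
FROM words w
JOIN staging s ON s.word = w.word;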
Notes:
The syntax for this
insert into tab2 (id_customers, value)
values ((select id from tab1 where customers='john'), 'alfa');
is more like
insert into tab2 (id_customers, value)
SELECT id, 'alfa'
FROM tab1
WHERE customers = 'john'
t1 - columns: title, story
t2 - columns: title, story
Some rows are duplicated between the two tables, i.e. the title and story values are the same.
I need to delete those rows from t2 and move the rest of the rows from t2 to t1.
Any help?
First delete the duplicate records from t2:
DELETE
FROM t2
WHERE EXISTS (SELECT 1 FROM t1 WHERE t2.title = t1.title AND t2.story = t1.story);
Then insert the unique records from t2 into t1:
INSERT INTO t1 (title, story)
SELECT title, story
FROM t2;
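Since the remaining rows are meant to be moved rather than copied, you would presumably empty t2 afterwards:
TRUNCATE TABLE t2;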
-- Create temporary table
CREATE TABLE temp_t LIKE t1;
-- Add constraint
ALTER TABLE temp_t ADD UNIQUE(title, story);
-- Copy data
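-- INSERT IGNORE silently skips any row that violates UNIQUE(title, story), de-duplicating as we copy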
INSERT IGNORE INTO temp_t SELECT * FROM t1;
INSERT IGNORE INTO temp_t SELECT * FROM t2;
-- copy back and drop temp (if you don't want these constraints on the t1 table)
TRUNCATE TABLE t1;
INSERT INTO t1 SELECT * FROM temp_t;
DROP TABLE temp_t;
-- rename and drop (if you want these constraints on the t1 table)
RENAME TABLE t1 TO old_t1, temp_t TO t1;
DROP TABLE old_t1;
I see the question is tagged mysql and my answer uses the Linux command line, vi and a programming language of your choice, but here is the technique to do what you want:
step 1 - sort both tables, t1 and t2, into t1sorted and t2sorted;
step 2 - remove duplicates: once the data is sorted (this step does not work on unsorted data; whether the sort is ascending or descending is irrelevant), simply iterate once through each table, comparing the current line with the previous line and skipping the current line if it is identical to the previous one. You can write a simple program to do this in your favorite language, such as C; on Linux, sort -u performs steps 1 and 2 in a single command. Output to t1sorted_dupesremoved and t2sorted_dupesremoved;
step 3 - now you can find which lines from t2sorted_dupesremoved are not present in t1sorted_dupesremoved, so that you can add them to t1sorted_dupesremoved, like this (assuming the Linux command line; grep \< keeps only the lines in t2 that are not in t1, and > redirects them into the output file, overwriting it if it already exists):
diff t2sorted_dupesremoved t1sorted_dupesremoved | grep \< > t2lines_not_in_t1
step 4 - now use your favorite editor, such as vi, to edit the output file t2lines_not_in_t1 and remove the < and the extra space that diff inserts at the beginning of each line; then you can concatenate the file to the end of t1:
step 5 -
cat t2lines_not_in_t1 >> t1sorted_dupesremoved
note the double >> which appends t2 to the end of t1 - very important: a single > would overwrite t1 instead!
step 6 - now you can sort t1sorted_dupesremoved and you're done! You have exactly one instance of each unique line from t1 and t2, in sorted order, in one file.
I am new to SQL. I have a bunch of tables: test1, test2, test3, ..., test100. Those tables are created on a daily basis and have the same columns/format. I want to create a table that holds one month of past data; in other words, I have to combine test1-test30 into one single table. I can only think of the code below, but I am looking for something more efficient and easier, like using loops or while conditions, or some other way:
create table final as
select * from test1;
insert into final select * from test2;
insert into final select * from test3;
.
.
insert into final select * from test30;
Can you suggest a simpler way to do this instead of using 30 insert statements?
Here are two examples:
Copy all columns from one table to another table:
INSERT INTO table2
SELECT * FROM table1
WHERE condition;
Copy only some columns from one table into another table:
INSERT INTO table2 (column1, column2, column3, ...)
SELECT column1, column2, column3, ...
FROM table1
WHERE condition;
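To avoid typing 30 such statements by hand, the statement can be built dynamically from information_schema and run as a prepared statement. A minimal sketch, assuming the daily tables all match the pattern testN and live in the current database (in practice you would tighten the WHERE clause to pick exactly the 30 tables you need):
SET SESSION group_concat_max_len = 100000;
-- builds 'INSERT INTO final SELECT * FROM test1 UNION ALL SELECT * FROM test2 ...'
SELECT CONCAT('INSERT INTO final ',
              GROUP_CONCAT(CONCAT('SELECT * FROM `', table_name, '`')
                           SEPARATOR ' UNION ALL '))
INTO @sql
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name REGEXP '^test[0-9]+$';
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;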
Another solution is to dump the data from your database (since it has the same tables) into a .csv or .txt file and then import that into your final table:
mysql -u [username] -p [database_name] < /path/to/file.[extension]
I have a table like this:
mytable(`id` int, `text` varchar(255))
I also have a CSV-like file:
1 hello word
2 this is a test
The fields are separated by a space (or something else).
So can I use LOAD DATA INFILE to load the file into the table? How can I do this?
I would probably go for two tables... one is the final table, the other has two fields, just like your CSV.
First solution: LOAD DATA INFILE into the two-field table, followed by
insert into table2 select concat(field1, field2) from table1
The other solution is to automate solution 1 using a trigger. I'm not sure you can trigger something at the end of the load, so it would be a trigger for each line added...
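For what it's worth, LOAD DATA INFILE can also split each line itself by reading the whole line into a user variable. A minimal sketch, assuming the first space separates the id from the text and the file contains no tab characters (the default field terminator), so each full line lands in @line:
LOAD DATA INFILE '/path/to/file.txt'
INTO TABLE mytable
LINES TERMINATED BY '\n'
(@line)
SET `id`   = SUBSTRING_INDEX(@line, ' ', 1),           -- everything before the first space
    `text` = SUBSTRING(@line, LOCATE(' ', @line) + 1); -- everything after it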
I have a MySQL database using InnoDB and foreign keys...
I need to import 100 MiB of data from a huge CSV file and split it into two tables, where the records have to look as follows:
Table1
id|data|data2
Table2
id|table1_id|data3
Where Table2.table1_id is a foreign key referencing Table1.id.
The MySQL sequence for one instance would look like this:
Load the file into a temporary table
After that, do an insert from the temporary table into the needed one
Get the last insert ID
Do the last insert group using this reference id...
That is utterly slow...
How do I do this using LOAD DATA INFILE...? Any real ideas with a high-speed result?
You could temporarily add column data3 to Table1 (I also add a done column to distinguish records which originate from the CSV from those that already exist or originate from elsewhere):
ALTER TABLE Table1
ADD COLUMN data3 TEXT,
ADD COLUMN done BOOLEAN DEFAULT TRUE;
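-- existing rows keep the default done = TRUE; rows loaded from the CSV below are marked FALSE so they can be found again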
LOAD DATA
INFILE '/path/to/csv'
INTO TABLE Table1 (data, data2, data3)
SET done = FALSE;
INSERT
INTO Table2 (table1_id, data3)
SELECT id, data3 FROM Table1 WHERE NOT done;
ALTER TABLE Table1
DROP COLUMN data3,
DROP COLUMN done;
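If the load is still slow, a common InnoDB speed-up is to run the whole sequence in one transaction and temporarily disable foreign-key checks while bulk-loading. A hedged sketch, not specific to this schema:
SET foreign_key_checks = 0;  -- skip FK validation during the bulk load
SET autocommit = 0;          -- one big transaction instead of one per statement
-- ... the LOAD DATA and INSERT ... SELECT from above ...
COMMIT;
SET autocommit = 1;
SET foreign_key_checks = 1;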