MySQL load null data

The file I want to load is organized like this:
student@ubuntuM0151:~/PEC3$ head -5 Attributes.txt
GTEX-1117F-0003-SM-58Q7G Blood Whole Blood
GTEX-1117F-0003-SM-5DWSB Blood Whole Blood
GTEX-1117F-0003-SM-6WBT7 Blood Whole Blood
GTEX-1117F-0011-R10a-SM-AHZ7F Brain Brain - Frontal Cortex (BA9)
GTEX-1117F-0011-R10b-SM-CYKQ8 Brain Brain - Frontal Cortex (BA9)
I created a table to load it into:
mysql> CREATE TABLE attributes
-> (sampID VARCHAR(200) NOT NULL,
-> muestra VARCHAR(200) NOT NULL,
-> cantidad FLOAT,
-> PRIMARY KEY(muestra),
-> FOREIGN KEY(sampID) REFERENCES agesex(SUBJID));
Query OK, 0 rows affected (0.11 sec)
mysql> LOAD DATA LOCAL INFILE 'Attributes.txt' INTO TABLE attributes
-> FIELDS TERMINATED BY "\t" LINES TERMINATED BY "\n"
-> ;
Query OK, 0 rows affected, 65535 warnings (2.43 sec)
Records: 22951 Deleted: 0 Skipped: 22951 Warnings: 68853
I can't understand what is going on: every one of the 22951 records is skipped.
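A quick check of the file against the table definition can narrow this down. The sketch below assumes Attributes.txt is tab-separated with the three fields shown (sample ID, tissue, tissue detail); check_rows is a hypothetical helper. Because muestra is the PRIMARY KEY, every second-field value must be unique, yet the first lines already repeat Blood; and cantidad is FLOAT, while the third field holds text such as Whole Blood. With LOCAL, rows that hit a duplicate key are skipped with a warning instead of aborting the load. Running SHOW WARNINGS in MySQL right after the load lists the server's own reasons, possibly including foreign-key failures against agesex.

```python
import csv
from collections import Counter


def check_rows(lines):
    """Compare tab-separated lines with the attributes table definition.

    Assumes three fields per line: sampID, muestra, cantidad.
    Returns (muestra values occurring more than once,
             cantidad values that do not parse as numbers).
    """
    rows = [row for row in csv.reader(lines, delimiter="\t") if row]
    # PRIMARY KEY(muestra): every second-field value must be unique
    counts = Counter(row[1] for row in rows if len(row) > 1)
    duplicate_keys = {key: n for key, n in counts.items() if n > 1}
    # cantidad FLOAT: the third field has to parse as a number
    non_numeric = []
    for row in rows:
        value = row[2] if len(row) > 2 else ""
        try:
            float(value)
        except ValueError:
            non_numeric.append(value)
    return duplicate_keys, non_numeric


# Usage on the real file (not run here):
#   with open("Attributes.txt", newline="") as f:
#       print(check_rows(f))
```

If the first result is non-empty, muestra cannot serve as the primary key on its own; sampID looks like the per-row unique value.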

Related

Timestamp field only on insert in MariaDB, combined with 'LOAD DATA LOCAL INFILE' data load

I want a timestamp field in a MySQL table that is set only on inserts, not on updates. The table is created like this:
CREATE TABLE `test_insert_timestamp` (
`key` integer NOT NULL,
`value` integer NOT NULL,
`insert_timestamp` timestamp DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`key`)
);
The data is loaded with this statement (I have to use LOAD DATA LOCAL INFILE):
LOAD DATA LOCAL INFILE
"inserts_test_timestamp1.txt"
REPLACE
INTO TABLE
`test_insert_timestamp`
FIELDS TERMINATED BY ';'
Note: I need to use the REPLACE option, no matter why.
The content of inserts_test_timestamp1.txt is:
1;2
3;4
I have another file, inserts_test_timestamp2.txt, containing:
3;4
5;6
What I want is:
if I load inserts_test_timestamp1.txt, the insert_timestamp field is set (that works with the code above);
if I then load inserts_test_timestamp2.txt, record (3;4) keeps the insert_timestamp it already has, while record (5;6) gets a new insert_timestamp.
But it doesn't work: both records get the same new timestamp, instead of (3;4) keeping the old one.
I'm working on a MariaDB 5.5.52 database on CentOS 7.3. I think the MariaDB version matters, but I can't change it.
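REPLACE deletes the existing row and inserts a new one, so DEFAULT CURRENT_TIMESTAMP is applied again to every loaded row. You can divide the process into two steps: load the file into a staging table, then REPLACE into the real table, carrying over the stored insert_timestamp with a LEFT JOIN: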
MariaDB [_]> DROP TABLE IF EXISTS
-> `temp_test_insert_timestamp`,
-> `test_insert_timestamp`;
Query OK, 0 rows affected (0.00 sec)
MariaDB [_]> CREATE TABLE IF NOT EXISTS `test_insert_timestamp` (
-> `key` integer NOT NULL,
-> `value` integer NOT NULL,
-> `insert_timestamp` timestamp DEFAULT CURRENT_TIMESTAMP,
-> PRIMARY KEY (`key`)
-> );
Query OK, 0 rows affected (0.00 sec)
MariaDB [_]> CREATE TABLE IF NOT EXISTS `temp_test_insert_timestamp` (
-> `key` integer NOT NULL,
-> `value` integer NOT NULL,
-> `insert_timestamp` timestamp DEFAULT CURRENT_TIMESTAMP,
-> PRIMARY KEY (`key`)
-> );
Query OK, 0 rows affected (0.00 sec)
MariaDB [_]> LOAD DATA LOCAL INFILE '/path/to/file/inserts_test_timestamp1.txt'
-> INTO TABLE `test_insert_timestamp`
-> FIELDS TERMINATED BY ';'
-> (`key`, `value`);
Query OK, 2 rows affected (0.00 sec)
Records: 2 Deleted: 0 Skipped: 0 Warnings: 0
MariaDB [_]> SELECT
-> `key`,
-> `value`,
-> `insert_timestamp`
-> FROM
-> `test_insert_timestamp`;
+-----+-------+---------------------+
| key | value | insert_timestamp |
+-----+-------+---------------------+
| 1 | 2 | 2018-03-20 00:49:38 |
| 3 | 4 | 2018-03-20 00:49:38 |
+-----+-------+---------------------+
2 rows in set (0.00 sec)
MariaDB [_]> DO SLEEP(5);
Query OK, 0 rows affected (5.00 sec)
MariaDB [_]> LOAD DATA LOCAL INFILE '/path/to/file/inserts_test_timestamp2.txt'
-> INTO TABLE `temp_test_insert_timestamp`
-> FIELDS TERMINATED BY ';'
-> (`key`, `value`);
Query OK, 2 rows affected (0.00 sec)
Records: 2 Deleted: 0 Skipped: 0 Warnings: 0
MariaDB [_]> SELECT
-> `key`,
-> `value`,
-> `insert_timestamp`
-> FROM
-> `temp_test_insert_timestamp`;
+-----+-------+---------------------+
| key | value | insert_timestamp |
+-----+-------+---------------------+
| 3 | 4 | 2018-03-20 00:49:43 |
| 5 | 6 | 2018-03-20 00:49:43 |
+-----+-------+---------------------+
2 rows in set (0.00 sec)
MariaDB [_]> REPLACE INTO `test_insert_timestamp`
-> SELECT
-> `ttit`.`key`,
-> `ttit`.`value`,
-> `tit`.`insert_timestamp`
-> FROM
-> `temp_test_insert_timestamp` `ttit`
-> LEFT JOIN `test_insert_timestamp` `tit`
-> ON `ttit`.`key` = `tit`.`key`;
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0
MariaDB [_]> SELECT
-> `key`,
-> `value`,
-> `insert_timestamp`
-> FROM
-> `test_insert_timestamp`;
+-----+-------+---------------------+
| key | value | insert_timestamp |
+-----+-------+---------------------+
| 1 | 2 | 2018-03-20 00:49:38 |
| 3 | 4 | 2018-03-20 00:49:38 |
| 5 | 6 | 2018-03-20 00:49:43 |
+-----+-------+---------------------+
3 rows in set (0.00 sec)
MariaDB [_]> TRUNCATE TABLE `temp_test_insert_timestamp`;
Query OK, 0 rows affected (0.00 sec)
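The REPLACE ... SELECT step works because the LEFT JOIN carries over the stored insert_timestamp for keys that already exist, while a new key gets NULL there; since a TIMESTAMP column in MariaDB 5.5 is NOT NULL by default, assigning NULL stores the current time instead. The Python sketch below models only that merge logic with a dict (the helper merge_batch and the string timestamps are illustrative assumptions, not part of MariaDB) and reproduces the final result shown above.

```python
def merge_batch(table, batch, now):
    """Model REPLACE INTO t SELECT ... FROM staging LEFT JOIN t ON key.

    table: dict mapping key -> (value, insert_timestamp)
    batch: (key, value) pairs read from the new file
    now:   timestamp used when the LEFT JOIN finds no existing row
    """
    merged = dict(table)
    for key, value in batch:
        existing = table.get(key)
        # matched row: keep the old insert_timestamp;
        # no match: the NULL from the LEFT JOIN becomes the current time
        insert_ts = existing[1] if existing is not None else now
        merged[key] = (value, insert_ts)
    return merged


first = merge_batch({}, [(1, 2), (3, 4)], "2018-03-20 00:49:38")
second = merge_batch(first, [(3, 4), (5, 6)], "2018-03-20 00:49:43")
for key in sorted(second):
    print(key, *second[key])
```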
I implemented the solution from this post: MySQL LOAD DATA INFILE with ON DUPLICATE KEY UPDATE
This solution gives me not only insert_timestamp but also an update_timestamp field:
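# --- Create a temporary staging table with the same structure ---
# (assumes test_insert_timestamp also has an update_timestamp column and an index named STAMP_INDEX)
CREATE TEMPORARY TABLE temporary_table LIKE test_insert_timestamp;
# --- Drop the staging table's indexes to speed up the load ---
DROP INDEX `PRIMARY` ON temporary_table;
DROP INDEX `STAMP_INDEX` ON temporary_table;
# --- Load the data into the temporary table ---
# IGNORE 1 LINES skips a header line; remove it if the file has none
LOAD DATA LOCAL INFILE "./inserts_test_timestamp1.txt"
INTO TABLE temporary_table
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"'
IGNORE 1 LINES
(`key`, `value`)
SET
insert_timestamp = CURRENT_TIMESTAMP(),
update_timestamp = NULL
;
# --- Copy from the temporary table into the real table ---
# New keys are inserted; existing keys keep insert_timestamp and get value and update_timestamp refreshed
INSERT INTO test_insert_timestamp
SELECT * FROM temporary_table
ON DUPLICATE KEY UPDATE
`value` = VALUES(`value`),
update_timestamp = CURRENT_TIMESTAMP();
# --- Drop the temporary table ---
DROP TEMPORARY TABLE temporary_table;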
Thanks for the help!
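The INSERT ... ON DUPLICATE KEY UPDATE variant behaves differently from REPLACE: a new key is inserted with the timestamps set during the load, and an existing key keeps its insert_timestamp and only has the columns named in the UPDATE clause changed. The Python sketch below models that with a dict; the helper upsert_batch and its update_value flag are illustrative assumptions, where update_value says whether the UPDATE clause also sets value. With only update_timestamp in the clause, an existing row keeps its old value.

```python
def upsert_batch(table, batch, now, update_value=False):
    """Model INSERT INTO t SELECT * FROM staging ON DUPLICATE KEY UPDATE ...

    table maps key -> {"value", "insert_timestamp", "update_timestamp"}.
    update_value: True if the UPDATE clause also sets value.
    """
    result = {key: dict(row) for key, row in table.items()}
    for key, value in batch:
        if key in result:
            # duplicate key: insert_timestamp is left alone
            result[key]["update_timestamp"] = now
            if update_value:
                result[key]["value"] = value
        else:
            # new key: the row arrives as loaded into the staging table
            result[key] = {"value": value, "insert_timestamp": now, "update_timestamp": None}
    return result


t1 = upsert_batch({}, [(1, 2), (3, 4)], "t1")
t2 = upsert_batch(t1, [(3, 4), (5, 6)], "t2")
for key in sorted(t2):
    print(key, t2[key])
```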

Loading a CSV into MySQL, selecting columns

I am trying to learn how to use MySQL efficiently. Now I want to load a CSV file containing an author's bibliography into a MySQL database. This is the code I have for creating the table and trying to load the file:
USE stephenkingbooks;
DROP TABLE IF EXISTS stephenkingbooks;
CREATE TABLE stephenkingbooks
(
`id` int unsigned NOT NULL auto_increment,
`original_title` varchar(255) NOT NULL,
`spanish_title` varchar(255) NOT NULL,
`year` decimal(4) NOT NULL,
`pages` decimal(10) NOT NULL,
`in_collection` enum('Y','N') NOT NULL DEFAULT 'N',
`read` enum('Y','N') NOT NULL DEFAULT 'N',
PRIMARY KEY (id)
);
LOAD DATA LOCAL INFILE '../files/unprocessed_sking.csv'
INTO TABLE stephenkingbooks (column1, column2, column4, column3)
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS;
The CSV file is formatted like this:
Carrie,Carrie,Terror,199,19745,"En 1976, el director de cine Brian de Palma hizo la primera película basada en la novela.7 3"
My idea is to load only the first two columns, corresponding to original_title and the Spanish title (in the same order in MySQL and in the CSV); after those, column 3 in the CSV would be the pages and column 4 the year.
In addition, for the year column I only want to take the first 4 digits of the field, because some of them carry a reference mark that is not part of the year. For example, Carrie was released in 1974, but the CSV has an extra 5 after the year that I do not want to keep.
My problem is that I cannot get what I want without errors in my terminal. Any suggestions?
13.2.6 LOAD DATA INFILE Syntax
...
You must also specify a column list if the order of the fields in the
input file differs from the order of the columns in the table.
...
Try:
mysql> LOAD DATA INFILE '../files/unprocessed_sking.csv'
-> INTO TABLE `stephenkingbooks`
-> FIELDS TERMINATED BY ','
-> ENCLOSED BY '"'
-> LINES TERMINATED BY '\r\n'
-> (`original_title`, `spanish_title`, @`genre`, @`pages`, @`year`)
-> SET `year` = LEFT(@`year`, 4), `pages` = @`pages`;
Query OK, 1 row affected (0.00 sec)
Records: 1 Deleted: 0 Skipped: 0 Warnings: 0
mysql> SELECT
-> `id`,
-> `original_title`,
-> `spanish_title`,
-> `year`,
-> `pages`,
-> `in_collection`,
-> `read`
-> FROM `stephenkingbooks`;
+----+----------------+---------------+------+-------+---------------+------+
| id | original_title | spanish_title | year | pages | in_collection | read |
+----+----------------+---------------+------+-------+---------------+------+
| 1 | Carrie | Carrie | 1974 | 199 | N | N |
+----+----------------+---------------+------+-------+---------------+------+
1 row in set (0.00 sec)

MySQL query to separate alphabetic values from numeric ones

I have a column named send which contains alphabetical and numerical values (not alphanumeric ones). I want to separate out the alphabetical values and get their counts.
Can you suggest a query? I tried '%[0-9]%' but could not separate them.
You should really try to include some example data or a little bit of code showing what you're trying to do.
But, whatever, hopefully you can apply my example to your problem. First, I'll create a table with a column named "send", as you mentioned:
mysql> CREATE TABLE test (send VARCHAR(255));
Query OK, 0 rows affected (0.11 sec)
We'll insert a mix of numeric and non-numeric values into it:
mysql> INSERT INTO test (send) VALUES (1), (-2), (3), ('a'), ('b'), ('ZZ'),
-> ('test'), (42), ('1.2'), (0), (0123), ('123');
Query OK, 12 rows affected (0.01 sec)
Records: 12 Duplicates: 0 Warnings: 0
Then we can build a query which uses REGEXP to determine if a value is numeric or not:
mysql> SELECT
-> COUNT(*) AS `Count`,
-> IF(1 = send REGEXP '^(-|\\+)?([0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+|[0-9]+)$',
-> 'NUMERIC',
-> 'NON-NUMERIC') AS IsNumeric
-> FROM
-> test
-> GROUP BY
-> IsNumeric
-> ;
+-------+-------------+
| Count | IsNumeric |
+-------+-------------+
| 4 | NON-NUMERIC |
| 8 | NUMERIC |
+-------+-------------+
2 rows in set (0.00 sec)
Apply this to whatever it is that you're trying to do. The part which says send REGEXP '^(-|\\+)?([0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+|[0-9]+)$' returns 1 if the value of send is a number.
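For comparison, the same classification can be sketched in Python with the identical pattern, which is handy for testing the regular expression outside the database. The helper names classify and count_by_kind are illustrative; SAMPLE mirrors the values inserted above as they end up stored in the VARCHAR column.

```python
import re
from collections import Counter

# Same pattern as the MySQL query; fullmatch() plays the role of ^...$
NUMERIC = re.compile(r"(-|\+)?([0-9]+\.[0-9]*|[0-9]*\.[0-9]+|[0-9]+)")


def classify(value):
    # In MySQL, NULL REGEXP ... yields NULL, so IF() falls through to 'NON-NUMERIC'
    if value is None:
        return "NON-NUMERIC"
    return "NUMERIC" if NUMERIC.fullmatch(value) else "NON-NUMERIC"


def count_by_kind(values):
    return Counter(classify(v) for v in values)


# Values stored by the INSERT above (the literal 0123 is a number, stored as '123')
SAMPLE = ["1", "-2", "3", "a", "b", "ZZ", "test", "42", "1.2", "0", "123", "123"]
print(count_by_kind(SAMPLE))
```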

Why isn't all the data getting loaded into my MySQL table?

So I have a file of Twitter data that looks like this:
Robert_Aderholt^&^&^2013-06-12 18:32:02^&^&^RT @financialcmte: In 2012, the Obama Admin published 1,172 new regulations totaling 79,000 pages. 57 were expected to have costs of at...
Robert_Aderholt^&^&^2013-06-12 13:42:09^&^&^The Administration's idea of a 'recovery' is 4 million fewer private sector jobs than the average post WWII recovery http://t.co/gSVW0Q8MYK
Robert_Aderholt^&^&^2013-06-11 13:51:17^&^&^As manufacturing jobs continue to decrease, its time to open new markets #4Jobs http://t.co/X2Mswr1i43
(The ^&^&^ words are separators, and I chose that separator because it's unlikely to occur in any of the tweets.)
This file is 90663 lines long (I checked by typing "wc -l tweets_parsed-6-12.csv").
However, when I load them into the table, I only get a table with 40456 entries:
mysql> source ../code/tweets2tables.sql;
Query OK, 0 rows affected (0.03 sec)
Query OK, 0 rows affected (0.08 sec)
Query OK, 40456 rows affected, 2962 warnings (0.81 sec)
Records: 40456 Deleted: 0 Skipped: 0 Warnings: 2962
mysql> SELECT COUNT(*) FROM tweets;
+----------+
| COUNT(*) |
+----------+
| 40456 |
+----------+
1 row in set (0.02 sec)
Why is that? I deleted all lines that didn't contain ^&^&^ so I didn't think there was any funny business going on with the data, but I could be wrong.
My loading code is
DROP TABLE IF EXISTS tweets;
CREATE TABLE tweets (
twitter_id VARCHAR(20),
post_date DATETIME,
body VARCHAR(140)
);
LOAD DATA
LOCAL INFILE 'tweets_parsed-6-12.csv'
INTO TABLE tweets
FIELDS TERMINATED BY '^&^&^'
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(twitter_id, post_date, body);
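The lines that weren't loaded probably contained the " character. If you specify that your fields are enclosed by ", a field that starts with " is read up to a closing quote that is followed by a field or line terminator, even across line breaks, so several lines can be merged into one record; that is consistent with fewer records and Skipped: 0. Quotes inside such fields should be escaped by doubling them ("").
Since ^&^&^ already delimits the fields, removing the ENCLOSED BY '"' clause is the simpler fix. Adding OPTIONALLY before ENCLOSED BY would not help here: OPTIONALLY has no effect when reading input.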
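Both suspicions can be checked in the file itself. The Python sketch below assumes each line holds user, date and body separated by ^&^&^; audit_lines is a hypothetical helper. It lists lines where a field begins with a double quote, which is where ENCLOSED BY '"' kicks in, and lines whose body is longer than the 140 characters VARCHAR(140) holds. Over-long bodies are only one possible source of the 2962 warnings; the output above does not confirm it.

```python
SEP = "^&^&^"


def audit_lines(lines, max_body=140):
    """Return (lines with a field starting with '"', lines with an over-long body)."""
    quoted, too_long = [], []
    for lineno, line in enumerate(lines, start=1):
        # maxsplit=2 keeps any separator-like text inside the body intact
        fields = line.rstrip("\r\n").split(SEP, 2)
        if any(field.startswith('"') for field in fields):
            quoted.append(lineno)
        if len(fields) == 3 and len(fields[2]) > max_body:
            too_long.append(lineno)
    return quoted, too_long


# Usage on the real file (not run here):
#   with open("tweets_parsed-6-12.csv", encoding="utf-8") as f:
#       quoted, too_long = audit_lines(f)
```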

How to speed up loading an XML file into MySQL

I've got a 2 GB XML file that I want to load into a single table in MySQL.
The file has about 140,000 records/rows, but the default behavior of LOAD XML in MySQL seems to depart from linear time.
Cutting the data into smaller pieces, I get the following performance (I dropped the table between loads):
All loads reported: Deleted: 0 Skipped: 0 Warnings: 0
5000 row(s) affected Records: 5000 4.852 sec
10000 row(s) affected Records: 10000 20.670 sec
15000 row(s) affected Records: 15000 80.294 sec
20000 row(s) affected Records: 20000 202.474 sec
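One way to quantify "departs from linear time" is a local exponent k in t ~ n^k between consecutive measurements, k = ln(t2/t1) / ln(n2/n1). A linear load gives k close to 1; for the timings above every step comes out above 2, i.e. roughly quadratic growth or worse. A small Python sketch of that calculation (growth_exponents is an illustrative helper; the numbers are the ones reported above):

```python
import math

# (rows loaded, seconds) as reported above
TIMINGS = [(5000, 4.852), (10000, 20.670), (15000, 80.294), (20000, 202.474)]


def growth_exponents(timings):
    """Local exponent k in t ~ n**k between consecutive measurements.

    k close to 1 means linear time; k close to 2 means quadratic.
    """
    return [
        math.log(t2 / t1) / math.log(n2 / n1)
        for (n1, t1), (n2, t2) in zip(timings, timings[1:])
    ]


for (n1, _), (n2, _), k in zip(TIMINGS, TIMINGS[1:], growth_exponents(TIMINGS)):
    print(f"{n1} -> {n2} rows: k = {k:.2f}")
```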
The XML is well formed. I've tried:
SET FOREIGN_KEY_CHECKS=0;
SET UNIQUE_CHECKS=0;
What can I do to load it in a reasonable time that doesn't involve cutting it into a dozen pieces?
Try removing the indexes before the load, then rebuilding them afterward.