Is there a way to create a copy of a database with sample rows using mysqldump command?
mysqldump -u <username> -h <host> -p <database name> [<table name> ...]
I have a fairly large DB and need to create a copy so that a developer can work with the App. Instead of dumping the entire DB, is there a way to randomly sample rows and create a copy of the db?
mysqldump does support row-level filtering via its --where option, which selects the rows to be dumped. Here is what the reference manual says about it:

--where='where_condition', -w 'where_condition'

Dump only rows selected by the given WHERE condition. Quotes around the condition are mandatory if it contains spaces or other characters that are special to your command interpreter.

But it might not be that user-friendly. Even with a subquery in the WHERE clause, we still face restrictions that might not be possible to overcome. For instance, let's use the actor table from the sakila database. It is legitimate to execute this in the mysql CLI:
select * from actor
where actor_id in (select *
from (select actor_id from actor order by rand() limit 5) t
);
+----------+-------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update |
+----------+-------------+-----------+---------------------+
| 19 | BOB | FAWCETT | 2006-02-15 04:34:33 |
| 91 | CHRISTOPHER | BERRY | 2006-02-15 04:34:33 |
| 11 | ZERO | CAGE | 2006-02-15 04:34:33 |
| 120 | PENELOPE | MONROE | 2006-02-15 04:34:33 |
| 109 | SYLVESTER | DERN | 2006-02-15 04:34:33 |
+----------+-------------+-----------+---------------------+
However, it's erroneous to use the same WHERE clause when using mysqldump.
mysqldump -uroot -p sakila actor --where="actor_id in (select * from (select actor_id from actor order by rand() limit 5) t)" > /tmp/actor_bck.sql
-- error message:
mysqldump: Couldn't execute 'SELECT /*!40001 SQL_NO_CACHE */ * FROM `actor` WHERE actor_id in (select * from (select actor_id from actor order by rand() limit 5) t )': Table 'actor' was not locked with LOCK TABLES (1100)
Besides, as Shadow stated, retaining referential integrity is an issue when using mysqldump. We don't want broken relationships between tables and an unusable dataset. With that in mind, please do not use mysqldump for random row-level sampling.
Under the circumstances, the best I can come up with is to use a stored procedure to do the row-level backup to a new database, with contents like:
create database sakila_bck;
create table sakila_bck.actor select * from sakila.actor order by rand() limit 10;
create table sakila_bck.film_actor select * from sakila.film_actor where actor_id in (select actor_id from sakila_bck.actor);
-- Note: `create table xx select * from yy` does not create keys on the backup table. By the way, if you want to retrieve a random number of rows, you can use a PREPARED statement to supply the LIMIT clause with a randomly generated number beforehand.
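A minimal sketch of that prepared-statement idea (the table name `actor_sample` is made up for illustration):

```sql
-- Sketch: feed a randomly generated row count into LIMIT via a prepared statement.
SET @n := FLOOR(1 + RAND() * 10);  -- random sample size between 1 and 10
PREPARE samp FROM
  'CREATE TABLE sakila_bck.actor_sample SELECT * FROM sakila.actor ORDER BY RAND() LIMIT ?';
EXECUTE samp USING @n;
DEALLOCATE PREPARE samp;
```

MySQL accepts a ? placeholder for LIMIT inside prepared statements, which is exactly what makes this trick work.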
The whole process is definitely not trivial, as we have to keep the table relationships in mind. But once the job is done, we can safely use mysqldump to dump sakila_bck at the database level.
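Alternatively, the LOCK TABLES error shown earlier can be sidestepped by computing the random ID list in a separate query first, so that mysqldump's --where receives only a literal IN list with no subquery. A hedged sketch (credentials and paths are placeholders):

```shell
# Step 1: sample 5 random actor_ids into a comma-separated list.
ids=$(mysql -N -uroot -p'secret' sakila \
        -e "SELECT actor_id FROM actor ORDER BY RAND() LIMIT 5" | paste -sd, -)
# Step 2: dump only those rows; the WHERE clause now contains no subquery.
mysqldump -uroot -p'secret' sakila actor \
        --where="actor_id IN ($ids)" > /tmp/actor_bck.sql
```

The -N flag suppresses the column-name header so that only the raw ids reach paste. Referential integrity still has to be handled per table, as above.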
Related
I was comparing the speeds of several databases, working with some randomly generated strings and curiously noticed the sorting differed. I am working with thousands of rows but have isolated my data to just these 2 strings for simplicity:
Pazyn Qhhbltw Vxsnwgt
Pazynkfc Tttzqjss Zzpxuarhy
Mongo and MySQL both sort them in the order displayed above, but Postgres switches them around. It seems that the space character is considered to be before "k" by both Mongo and MySQL but after it by Postgres.
How can I get Postgres to fall in line and be consistent with MySQL and Mongo?
I am using Postgres version 10.10 and MySQL 8.0.18.
Both columns are varchar(32) with no specific collation specified so I presume they are using default.
I've tried both with and without an index and I've tried several types of collation on the index but still get the same result.
I'm not sure how to debug this.
Use byte order:
ORDER BY texta::bytea;
CREATE TABLE temp (
    id INTEGER,
    texta varchar(50)
);
INSERT INTO temp VALUES (1,'Pazyn Qhhbltw Vxsnwgt'),
(2,'Pazynkfc Tttzqjss Zzpxuarhy')
2 rows affected
SELECT * FROM temp ORDER BY texta;
id | texta
-: | :--------------------------
2 | Pazynkfc Tttzqjss Zzpxuarhy
1 | Pazyn Qhhbltw Vxsnwgt
SELECT * FROM temp ORDER BY texta::bytea;
id | texta
-: | :--------------------------
1 | Pazyn Qhhbltw Vxsnwgt
2 | Pazynkfc Tttzqjss Zzpxuarhy
SELECT * FROM temp;
id | texta
-: | :--------------------------
1 | Pazyn Qhhbltw Vxsnwgt
2 | Pazynkfc Tttzqjss Zzpxuarhy
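For completeness, the same byte-wise ordering can also be had without the cast by specifying the "C" collation, either per query or on the column itself. This is a suggestion on top of the answer above, and note it disables language-aware sorting entirely:

```sql
-- Per-query: sort with the byte-order "C" collation instead of the database default.
SELECT * FROM temp ORDER BY texta COLLATE "C";

-- Or bake it into the column definition (hypothetical migration):
ALTER TABLE temp ALTER COLUMN texta TYPE varchar(50) COLLATE "C";
```

With COLLATE "C" the space character (0x20) sorts before every letter, matching what MySQL and Mongo show here.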
I have two different servers, server1 and server2; db1 lives on server1 and db2 on server2.
I am trying to join two tables across them in MySQL like this:
Select a.field1,b.field2
FROM [server1, 3306].[db1].table1 a
Inner Join [server2, 3312].[db2].table2 b
ON a.field1=b.field2
But I am getting an error. Is this possible in MySQL?
Yes, it is possible in MySQL.
There are similar questions asked previously too. You have to use the FEDERATED engine to do this. The idea goes like this:
You have to have a federated table based on the table at the remote location. The structure of the two tables has to be exactly the same.
CREATE TABLE federated_table (
id INT(20) NOT NULL AUTO_INCREMENT,
name VARCHAR(32) NOT NULL DEFAULT '',
other INT(20) NOT NULL DEFAULT '0',
PRIMARY KEY (id),
INDEX name (name),
INDEX other_key (other)
)
ENGINE=FEDERATED
DEFAULT CHARSET=latin1
CONNECTION='mysql://fed_user@remote_host:9306/federated/test_table';
Replication would be an alternate and suitable solution:
server1 - db1 -> replicate to server2 (now db1 and db2 are on the same server, server2, and the join is easy).
NOTE: If server2 is capable enough of taking the load of db1 in terms of storage/processing etc., then we can do the replication. As @brilliand mentioned, FEDERATED involves a lot of manual work and is slow.
It's kind of a hack, and it's not a join, but I use bash functions to make it feel like I'm doing cross-server queries:
The explicit version:
tb2lst(){
echo -n "("
tail -n +2 - | paste -sd, | tr -d "\n"
echo ")"
}
id_list=$(mysql -h'db_a.hostname' -ume -p'ass' -e "SELECT id FROM foo;" | tb2lst)
mysql -h'db_b.hostname' -ume -p'ass' -e "SELECT * FROM bar WHERE foo_id IN $id_list"
+--------+-----+
| foo_id | val |
+--------+-----+
|      1 |   3 |
|      2 |   4 |
+--------+-----+
I wrote some wrapper functions which I keep in my bashrc, so from my perspective it's just one command:
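Those wrappers are nothing fancy; under the same assumptions as the explicit version above (hostnames and credentials are placeholders), they might look like:

```shell
# Hypothetical ~/.bashrc wrapper functions, one per server.
db_a(){ mysql -h'db_a.hostname' -ume -p'ass' -e "$1"; }
db_b(){ mysql -h'db_b.hostname' -ume -p'ass' -e "$1"; }
```

Each one just fixes the host and credentials, leaving the query as the single argument.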
db_b "SELECT * FROM bar WHERE foo_id IN $(db_a "SELECT id FROM foo;" | tb2lst);"
+--------+-----+
| foo_id | val |
+--------+-----+
|      1 |   3 |
|      2 |   4 |
+--------+-----+
At least for my use case, this stitches the two queries together quickly enough that the output is equivalent to the join, and then I can pipe the output into whatever tool needs it.
Keep in mind that the id list from one query ends up as query text in the other query. If you "join" too much data this way, your OS might limit the length of the query (https://serverfault.com/a/163390), so be aware that this is a poor solution for very large datasets. I have found that doing the same thing with a MySQL library like pymysql works around this limitation.
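To sketch why the pymysql variant dodges that length limit: the ids travel as bound parameters rather than as interpolated shell text. The helper name `in_clause` and the table names are made up for illustration:

```python
def in_clause(ids):
    """Build a "%s,%s,..." placeholder list, one per id, for a parameter-bound
    IN (...) filter. With a client library such as pymysql you would then run
        cur.execute("SELECT * FROM bar WHERE foo_id IN (" + in_clause(ids) + ")", ids)
    so the id list is sent as bound parameters, not pasted into a command line,
    avoiding the OS argument-length limit mentioned above."""
    if not ids:
        raise ValueError("refusing to build an empty IN () clause")
    return ",".join(["%s"] * len(ids))

print(in_clause([1, 2, 3]))  # -> %s,%s,%s
```

Binding the values also means no manual quoting or escaping of the id list.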
I have noticed a particular performance issue that I am unsure on how to deal with.
I am in the process of migrating a web application from one server to another with very similar specifications. The new server typically outperforms the old server to be clear.
The old server is running MySQL 5.6.35
The new server is running MySQL 5.7.17
Both the new and old server have virtually identical MySQL configurations.
Both the new and old server are running the exact same database perfectly duplicated.
The web application in question is Magento 1.9.3.2.
In Magento, the following function
Mage_Catalog_Model_Category::getChildrenCategories()
is intended to list all the immediate children categories given a certain category.
In my case, this function bubbles down eventually to this query:
SELECT `main_table`.`entity_id`
, main_table.`name`
, main_table.`path`
, `main_table`.`is_active`
, `main_table`.`is_anchor`
, `url_rewrite`.`request_path`
FROM `catalog_category_flat_store_1` AS `main_table`
LEFT JOIN `core_url_rewrite` AS `url_rewrite`
ON url_rewrite.category_id=main_table.entity_id
AND url_rewrite.is_system=1
AND url_rewrite.store_id = 1
AND url_rewrite.id_path LIKE 'category/%'
WHERE (main_table.include_in_menu = '1')
AND (main_table.is_active = '1')
AND (main_table.path LIKE '1/494/%')
AND (`level` <= 2)
ORDER BY `main_table`.`position` ASC;
While the structure of this query is the same for any Magento installation, the exact values will obviously vary slightly from installation to installation, as will the category the function is looking at.
My catalog_category_flat_store_1 table has 214 rows.
My url_rewrite table has 1,734,316 rows.
This query, when executed on its own directly into MySQL performs very differently between MySQL versions.
I am using SQLyog to profile this query.
In MySQL 5.6, the above query performs in 0.04 seconds. The profile for this query looks like this: https://codepen.io/Petce/full/JNKEpy/
In MySQL 5.7, the above query performs in 1.952 seconds. The profile for this query looks like this: https://codepen.io/Petce/full/gWMgKZ/
As you can see, the same query on almost the exact same setup is virtually 2 seconds slower, and I am unsure as to why.
For some reason, MySQL 5.7 does not want to use the table index to help produce the result set.
Anyone out there with more experience/knowledge can explain what is going on here and how to go about fixing it?
I believe the issue has something to do with the way the MySQL 5.7 optimizer works. For some reason, it appears to think that a full table scan is the way to go. I can drastically improve the query performance by setting max_seeks_for_key very low (like 100) or by dropping range_optimizer_max_mem_size really low, forcing it to throw a warning.
Doing either of these increases the query speed by almost 10x, down to 0.2 sec; however, this is still magnitudes slower than MySQL 5.6, which executes in 0.04 seconds, and I don't think either of these is a good idea, as I'm not sure whether there would be other implications.
It is also very difficult to modify the query, as it is generated by the Magento framework and changing it would require customisation of the Magento codebase, which I'd like to avoid. I'm also not even sure if this is the only query that is affected.
I have included the minor versions for my MySQL installations. I am now attempting to update MySQL 5.7.17 to 5.7.18 (the latest build) to see if there is any update to the performance.
After upgrading to MySQL 5.7.18 I saw no improvement. In order to bring the system back to a stable high performing state, we decided to downgrade back to MySQL 5.6.30. After doing the downgrade we saw an instant improvement.
The above query executed in MySQL 5.6.30 on the NEW server executed in 0.036 seconds.
Wow! This is the first time I have seen something useful from Profiling. Dynamically creating an index is a new Optimization feature from Oracle. But it looks like that was not the best plan for this case.
First, I will recommend that you file a bug at http://bugs.mysql.com -- they don't like to have regressions, especially this egregious. If possible, provide EXPLAIN FORMAT=JSON SELECT... and "Optimizer trace". (I do not accept tweaking obscure tunables as an acceptable answer, but thanks for discovering them.)
Back to helping you...
If you don't need LEFT, don't use it. It returns NULLs when there are no matching rows in the 'right' table; will that happen in your case?
Please provide SHOW CREATE TABLE. Meanwhile, I will guess that you don't have INDEX(include_in_menu, is_active, path). The first two can be in either order; path needs to be last.
And INDEX(category_id, is_system, store_id, id_path) with id_path last.
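Spelled out, those two suggested composite indexes would look something like this (the index names are made up):

```sql
ALTER TABLE catalog_category_flat_store_1
  ADD INDEX idx_menu_active_path (include_in_menu, is_active, path);
ALTER TABLE core_url_rewrite
  ADD INDEX idx_cat_sys_store_idpath (category_id, is_system, store_id, id_path);
```

In each index the equality-tested columns come first and the range-tested column (path, id_path) comes last, so the range can still use the index.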
Your query seems to have a pattern that works well for turning into a subquery:
(Note: this even preserves the semantics of LEFT.)
SELECT `main_table`.`entity_id` , main_table.`name` , main_table.`path` ,
`main_table`.`is_active` , `main_table`.`is_anchor` ,
( SELECT `request_path`
FROM url_rewrite
WHERE url_rewrite.category_id=main_table.entity_id
AND url_rewrite.is_system = 1
AND url_rewrite.store_id = 1
AND url_rewrite.id_path LIKE 'category/%'
) as request_path
FROM `catalog_category_flat_store_1` AS `main_table`
WHERE (main_table.include_in_menu = '1')
AND (main_table.is_active = '1')
AND (main_table.path like '1/494/%')
AND (`level` <= 2)
ORDER BY `main_table`.`position` ASC
LIMIT 0, 1000
(The suggested indexes apply here, too.)
This is not an answer, only a comment for @Nigel Ren.
Here you can see that LIKE also uses an index:
mysql> SELECT *
-> FROM testdb
-> WHERE
-> vals LIKE 'text%';
+----+---------------------------------------+
| id | vals |
+----+---------------------------------------+
| 3 | text for line number 3 |
| 1 | textline 1 we rqwe rq wer qwer q wer |
| 2 | textline 2 asdf asd fas f asf wer 3 |
+----+---------------------------------------+
3 rows in set (0,00 sec)
mysql> EXPLAIN
-> SELECT *
-> FROM testdb
-> WHERE
-> vals LIKE 'text%';
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | testdb | NULL | range | vals | vals | 515 | NULL | 3 | 100.00 | Using where; Using index |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0,01 sec)
mysql>
A sample with LEFT():
mysql> SELECT *
-> FROM testdb
-> WHERE
-> LEFT(vals,4) = 'text';
+----+---------------------------------------+
| id | vals |
+----+---------------------------------------+
| 3 | text for line number 3 |
| 1 | textline 1 we rqwe rq wer qwer q wer |
| 2 | textline 2 asdf asd fas f asf wer 3 |
+----+---------------------------------------+
3 rows in set (0,01 sec)
mysql> EXPLAIN
-> SELECT *
-> FROM testdb
-> WHERE
-> LEFT(vals,4) = 'text';
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | testdb | NULL | index | NULL | vals | 515 | NULL | 5 | 100.00 | Using where; Using index |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0,01 sec)
mysql>