Update table with count in MySQL Workbench - large data - improvement needed

I am trying to generate some data as follows (everything is done in MySQL Workbench 6.0.8.11354 build 833):
What I have:
users table (~9,000,000 entries):
SUBSCRIPTION  text (3 options)
NUMBER        number
STATUS        text (3 options)
CODE          number
TEST1         yes/no
TEST2         yes/no
TEST3         yes/no
MANUFACTURER  number (6 options)
TYPE          text (50 options)
PROFILE       text (30 options)
What I need:
stats1 table (data that I want and I need to create):
PROFILE,TYPE,SUBSCRIPTION,STATUS,MANUFACTURER,TEST1YES,TEST1NO,TEST2YES,TEST2NO,TEST3YES,TEST3NO
profile1,type1,subscription1,status1,man1,count,count,count,count,count,count
profile1,type2,subscription2,status2,man2,count,count,count,count,count,count
Each (PROFILE, TYPE, SUBSCRIPTION, STATUS, MANUFACTURER) combination is unique.
What I did so far:
Created the stats1 table
Executed the following query to populate the table (I ended up with ~500 distinct entries):
insert into stats1 (PROFILE, TYPE, SUBSCRIPTION, STATUS, MANUFACTURER)
select DISTINCT users.PROFILE, users.TYPE, users.SUBSCRIPTION, users.STATUS, users.MANUFACTURER
from users;
Executed the following statement to count the values for TEST1YES, for each of the ~500 entries:
update stats1
SET TEST1YES = (select count(*) from users
                where users.TEST1 = 'yes'
                  and users.PROFILE = stats1.PROFILE
                  and users.TYPE = stats1.TYPE
                  and users.SUBSCRIPTION = stats1.SUBSCRIPTION
                  and users.STATUS = stats1.STATUS
                  and users.MANUFACTURER = stats1.MANUFACTURER);
I receive the following error in Workbench:
Error Code: 2013. Lost connection to MySQL server during query 600.573 sec
This is a known Workbench issue and, even so, the server continues to execute the query.
However, the query ran for more than 70 minutes in the background (as I saw in Client Connections / Management), and I need to run five more queries like this, one for each of the remaining columns.
Is there a better / faster / more efficient way to perform the counts for those 6 columns in the stats1 table?
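A possibly faster approach is to compute all six counts in a single pass with conditional aggregation, since MySQL treats boolean expressions as 0 or 1, so SUM() over a comparison counts the matching rows. A minimal sketch, assuming the column names above and an empty stats1 table:

insert into stats1 (PROFILE, TYPE, SUBSCRIPTION, STATUS, MANUFACTURER,
                    TEST1YES, TEST1NO, TEST2YES, TEST2NO, TEST3YES, TEST3NO)
select PROFILE, TYPE, SUBSCRIPTION, STATUS, MANUFACTURER,
       sum(TEST1 = 'yes'), sum(TEST1 = 'no'),   -- each comparison is 0 or 1
       sum(TEST2 = 'yes'), sum(TEST2 = 'no'),
       sum(TEST3 = 'yes'), sum(TEST3 = 'no')
from users
group by PROFILE, TYPE, SUBSCRIPTION, STATUS, MANUFACTURER;

This replaces the separate INSERT plus six correlated-subquery UPDATE passes with one scan of users.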

Related

How to view the list of SQL queries executed in RDS MYSQL

I would like to view the following columns in the query output in RDS MySQL. Could you help me write a SQL query against the right system tables to view all the SQL queries (from one specific database or from all databases) executed by one specific database user or by all users?
The columns that I am trying to fetch are:
database user name
database name
table name
sql query id
sql query text
query start time
query end time
For example, I executed select count(*) and then tried to see the list of commands that I had executed, so I queried the system table INFORMATION_SCHEMA.PROCESSLIST, but I couldn't find them there. Please guide/correct me.
I connect to the instance using the "testuser" credentials and execute the following:
create database testdb;
use testdb;
create table testdb.table01 as select * from testdb01.Persons;
select count(*) from testdb.table01;
SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST WHERE USER='testuser';
228336 testuser 10.xx.xxx.xxx:50881 Sleep 14
228337 testuser 10.xx.xxx.xxx:50882 testdb Query 0 executing SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST WHERE USER='testuser' LIMIT 0, 1000
Thank you,
Kind regards,
sk
INFORMATION_SCHEMA.PROCESSLIST is a monitor of the currently active queries on your DB, but if they are fast to execute you will most likely never see them in that table.
You could try the MySQL integrated logging instead, by logging all queries to the mysql.general_log table and filtering on the user_host column.
Refer to this question: Log all queries in mysql
Keep in mind that this table can grow very quickly if not cleared frequently, and it can also cause performance issues due to resource consumption.
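A minimal sketch of that setup (log_output, general_log, and the mysql.general_log columns are standard MySQL; note that on RDS these variables are set through the DB parameter group rather than with SET GLOBAL):

SET GLOBAL log_output = 'TABLE';   -- write the general log to mysql.general_log
SET GLOBAL general_log = 'ON';

-- ... run the workload you want to trace ...

SELECT event_time, user_host, command_type, argument
FROM mysql.general_log
WHERE user_host LIKE 'testuser%'
ORDER BY event_time;

SET GLOBAL general_log = 'OFF';    -- stop logging before the table grows large
TRUNCATE TABLE mysql.general_log;  -- clear it once you are done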

What is the most efficient way to select multiple rows by a set of 100 thousands IDs in sql

I know that you can use
SELECT *
FROM table
WHERE id IN (ids)
In my case, I have about 100,000 ids.
I'm wondering if MySQL has a limit for the IN clause. If anyone knows a more efficient way to do this, that would be great!
Thanks!
Just this week I had to kill -9 a MySQL 5.7 server where one of the developers had run a query like you describe, with a list of hundreds of thousands of ids in an IN() predicate. It caused the thread running the query to hang, and it wouldn't even respond to a KILL command. I had to shut down the MySQL Server instance forcibly.
(Fortunately it was just a test server.)
So I recommend not doing that. Try one of the following options instead:
Split your list of 100,000 ids into batches of at most 1,000, and run the query on each batch. Then use application code to merge the results.
Create a temporary table with an integer primary key.
CREATE TEMPORARY TABLE mylistofids (id INT PRIMARY KEY);
INSERT the 100,000 ids into it. Then run a JOIN query for example:
SELECT t.* FROM mytable AS t JOIN mylistofids USING (id)
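To populate the temporary table, multi-row INSERTs sent in batches keep each statement small; a sketch with placeholder values:

INSERT INTO mylistofids (id) VALUES (101), (102), (103);  -- repeat in batches of ~1,000 values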
Bill Karwin's suggestions are good.
The number of values in the IN clause is only limited by max_allowed_packet in my.ini.
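For reference, a sketch for checking and raising that limit (persist the change in my.ini; only new connections pick up a GLOBAL change):

SHOW VARIABLES LIKE 'max_allowed_packet';
SET GLOBAL max_allowed_packet = 67108864;  -- 64 MB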
MariaDB creates a temporary table when the IN clause exceeds 1000 values.
Another problem with such a number of IDs is the transfer of the data from the PHP script (for example) to the MySQL server: it would be a very long query text. You can also create a stored procedure with that select and just call it from your script, as sketched below. It will be more efficient in terms of passing data from your script to MySQL.
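A minimal sketch of such a procedure, combining it with the temporary-table approach above (the procedure and table names are illustrative):

DELIMITER //
CREATE PROCEDURE select_by_id_list()
BEGIN
  -- assumes the temporary table mylistofids was populated in this session
  SELECT t.* FROM mytable AS t JOIN mylistofids USING (id);
END //
DELIMITER ;

CALL select_by_id_list();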

MySQL - query from slave and insert on master

I used to run this command to insert some rows in a counter table:
insert into `monthly_aggregated_table`
select year(r.created_at), month(r.created_at), count(r.id) from
raw_items r
group by 1,2;
This query is very heavy and takes some time to run (millions of rows), and the raw_items table is MyISAM, so it was causing table locking and writes to it had to wait for the insert to finish.
Now I created a slave server to do the SELECT.
What I would like to do is to execute the SELECT in the slave, but get the results and insert into the master database. Is it possible? How? What is the most efficient way to do this? (The insert used to have 1.3 million rows)
I am running MariaDB 10.0.17
You will have to split the action into two parts, with a programming language like Java or PHP in between.
First run the SELECT against the slave, load the result set into your application, and then INSERT the data into the master.
Another optimization you could make to speed up the SELECT is to add a new column "ym_created_at" to the table, containing a concatenation of year(created_at) and month(created_at). Place an index on that column and then run the updated statement:
insert into `monthly_aggregated_table`
select ym_created_at, count(r.id) from
raw_items r
group by 1;
This is easier and might be a lot quicker, since no functions are applied to the column you group by. A sketch of the column and index setup follows.
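For completeness, one way the new column and index might be created (the CHAR(7) type and the DATE_FORMAT backfill are assumptions; on a large MyISAM table the backfill UPDATE will itself lock writes, so schedule it accordingly):

ALTER TABLE raw_items ADD COLUMN ym_created_at CHAR(7);            -- e.g. '2015-03'
UPDATE raw_items SET ym_created_at = DATE_FORMAT(created_at, '%Y-%m');
ALTER TABLE raw_items ADD INDEX idx_ym_created_at (ym_created_at);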

SSIS Data Flow performance slow vs select into

So I have a script:
select *
into my_archive_05302013
from archive_A
where my_Date = '05/18/2013'
and:
insert into archive_B (ID,my_date,field_n )
select ID, my_Date,field_n from my_archive_05302013
where n in field_n goes up to about 100; in other words, there are more than 100 columns in the table that I am loading.
Both statements run pretty fast; the query inserts about 200,000 records. There is a non-clustered index on my_date in table archive_A.
Now, when I create a data flow using SSIS 2008, it takes HOURS to complete.
I have the following in my OLE DB source:
SELECT * FROM Archive_A
WHERE My_Date = (SELECT MAX(My_Date) from Archive_A)
and for OLE DB Destination:
Data access mode of: "Table or view - fast load"
Name of the table: archive_B
"Table lock" and "Check constraints" are checked
Anyone know what the problem could be?
Thanks in advance
The problem is that, because you are using a data source and a data destination, you are pulling all of the data out of the database only to put it all back in again, whereas your INSERT statement keeps everything contained within the database. Use an Execute SQL Task with your INSERT statement instead.
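A sketch of the statement that Execute SQL Task could run, combining the data flow's source query with the existing INSERT (column list abbreviated as in the question):

INSERT INTO archive_B (ID, my_date, field_n)
SELECT ID, My_Date, field_n
FROM Archive_A
WHERE My_Date = (SELECT MAX(My_Date) FROM Archive_A);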

Google Cloud SQL: Unable to execute statement

My Google Cloud SQL table currently has 1,126,571 rows, with a minimum of 30 thousand added every day. When I execute the query:
select count(distinct sno) as tot from visits
the SQL prompt generates the following error:
Error 0: Unable to execute statement
Is a Cloud SQL query subject to a 60-second execution limit? How can I overcome the problem when the table becomes large?
Break the table into two tables: one to receive new visits (transactions), one for reporting. Index the reporting table. Transfer and clear the data on a regular basis, as sketched below.
The transaction table will remain relatively small and thus fast to count. The reporting table will be fast to count because of the index.
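A sketch of that rotation, with hypothetical table names visits_new and visits_report (run it in a transaction, or during a quiet window, so no rows are lost between the two statements):

START TRANSACTION;
INSERT INTO visits_report SELECT * FROM visits_new;
DELETE FROM visits_new;
COMMIT;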
Add an INDEX on your column sno and it will improve performance:
ALTER TABLE visits ADD INDEX (sno)
Try to split your select query into many parts: for example, the first select is limited to 50,000 rows, the second starts from 50,000 and is limited to 50,000, and so on.
You can do that with this scenario:
1- Get the record count.
2- Make a loop that ends at the record count.
3- In each iteration, run a select for 50,000 records and append the results to a datatable (depending on your programming language).
4- In the next iteration, start selecting from where the previous one ended; for example, the second query selects the next 50,000 records, and so on.
You can specify the starting index of your select with a SQL statement like this (ordering by a stable column keeps the pages consistent):
SELECT * FROM mytable ORDER BY somefield LIMIT 50000 OFFSET 0;
Then you will get all the data that you want.
NOTE: test how many records can be loaded in 60 seconds at most; a larger chunk size means fewer loops and therefore better performance.