How to avoid temporary table on group by with join? - mysql

I have two tables, for example Department and Members.
Department table description:
CREATE TABLE `Department` (
`code` int(10) DEFAULT NULL,
`name` char(100) DEFAULT NULL,
KEY `code_index` (`code`),
KEY `name_index` (`name`)
)
Department table values:
+------+-------------+
| code | name |
+------+-------------+
| 1 | Production |
| 2 | Development |
| 3 | Management |
+------+-------------+
Members table description:
CREATE TABLE `Members` (
`department_code` int(10) DEFAULT NULL,
`name` char(100) DEFAULT NULL,
KEY `department_code_index` (`department_code`),
KEY `name_index` (`name`)
)
Members table values:
+-----------------+----------------+
| department_code | name |
+-----------------+----------------+
| 1 | Ross Geller |
| 1 | Monica Geller |
| 1 | Phoebe Buffay |
| 1 | Rachel Green |
| 1 | Chandler Bing |
| 1 | Joey Tribianni |
| 2 | Janice |
| 2 | Gunther |
| 2 | Cathy |
| 2 | Emily |
| 2 | Fun Bobby |
| 2 | Heckles |
| 3 | Paolo |
| 3 | Mike Hannigan |
| 3 | Carol |
| 3 | Susan |
| 3 | Richard |
| 3 | Tag |
+-----------------+----------------+
I want to get the department code and name for a given set of users. Since I only want the department names, I used the query below.
mysql> select Department.code, Department.name, Members.department_code from Department left join Members on (Department.code=Members.department_code) where Members.name in ('Rachel Green', 'Gunther', 'Paolo') group by Department.code;
+------+-------------+-----------------+
| code | name | department_code |
+------+-------------+-----------------+
| 1 | Production | 1 |
| 2 | Development | 2 |
| 3 | Management | 3 |
+------+-------------+-----------------+
This works fine, and EXPLAIN gives me the execution plan below.
+----+-------------+------------+------------+------+----------------------------------+-----------------------+---------+----------------------+------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+----------------------------------+-----------------------+---------+----------------------+------+----------+---------------------------------+
| 1 | SIMPLE | Department | NULL | ALL | code_index | NULL | NULL | NULL | 3 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | Members | NULL | ref | department_code_index,name_index | department_code_index | 5 | test.Department.code | 1 | 16.67 | Using where |
+----+-------------+------------+------------+------+----------------------------------+-----------------------+---------+----------------------+------+----------+---------------------------------+
But the GROUP BY uses a temporary table, which may degrade performance if the Members table contains a lot of rows. I guess better indexing would help here, but I can't work out what it should be. Any help will be appreciated.
Thanks in advance!

You can avoid the group by over all the data using a subquery:
select d.code, d.name
from Department d
where exists (select 1
from Members m
where d.code = m.department_code and
m.name in ('Rachel Green', 'Gunther', 'Paolo')
);
With a composite index on Members(department_code, name), this should be much faster.
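The composite index mentioned above could be created along these lines. This is only a sketch: the index name department_code_name_index is made up for illustration, and whether the optimizer actually picks the index depends on your data and MySQL version, so it is worth re-running EXPLAIN afterwards.

```sql
-- Hypothetical index name; the column order follows the suggestion above.
CREATE INDEX department_code_name_index
    ON Members (department_code, name);

-- Re-check the plan of the EXISTS query once the index exists.
EXPLAIN
SELECT d.code, d.name
FROM Department d
WHERE EXISTS (SELECT 1
              FROM Members m
              WHERE m.department_code = d.code
                AND m.name IN ('Rachel Green', 'Gunther', 'Paolo'));
```

With both columns in one index, the subquery can in principle be answered from the index alone, without reading the Members rows.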

Related

MySql JOIN performance go down with group by

I have these tables:
Table "f" (26,000 records):
+------------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------------+------+-----+---------+-------+
| idFascicolo | int(11) | NO | PRI | | |
| oggetto | varchar | NO |index| | |
+------------------+------------------+------+-----+---------+-------+
Table "r" (22,000 records):
+------------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------------+------+-----+---------+-------+
| idRichiedente | int(11) | NO | PRI | | |
| name | varchar | NO |index| | |
+------------------+------------------+------+-----+---------+-------+
Table "fr" (32,000 records):
+------------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | | |
| idFascicolo | int(11) | NO |index| | FK |
| idRichiedente | int(11) | NO |index| | FK |
+------------------+------------------+------+-----+---------+-------+
This is my select:
SELECT
f.idFascicolo,
f.oggetto,
r.richiedente
FROM fr
JOIN f ON (f.idFascicolo=fr.idFascicolo)
JOIN r ON (r.idRichiedente=fr.idRichiedente)
WHERE r.name LIKE '%string%'
In the result, I would like to see only one row per f.idFascicolo (at the moment I can get both "Rossi Mario" and "Rossi Marco" for the same f.idFascicolo), so my new select is:
SELECT
f.idFascicolo,
f.oggetto,
r.richiedente
FROM fr
JOIN f ON (f.idFascicolo=fr.idFascicolo)
JOIN r ON (r.idRichiedente=fr.idRichiedente)
WHERE r.name LIKE '%string%'
GROUP BY f.idFascicolo
Here are the timings reported by phpMyAdmin:
0.0057 seconds: .. WHERE r.name LIKE '%string%'
0.0527 seconds: .. WHERE r.name LIKE '%string%' GROUP BY f.idFascicolo
0.0036 seconds: .. WHERE r.name LIKE 'string%' GROUP BY f.idFascicolo
I don't understand whether the slowness comes from the GROUP BY or from LIKE '%string%'. (I need '%string%'; I can't find an equivalent solution with a fulltext index and MATCH .. AGAINST.)
This is the explain:
+------+-------------+-------+------+-------------------------+---------------+---------+----------------------+-----------+---------------------------------------------+
| id | select type | table | type | possible keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+-------------------------+---------------+---------+----------------------+-----------+---------------------------------------------+
| 1 | simple | r | ALL | PRIMARY | NULL | NULL | NULL | 20925 |Using where; Using temporary; Using filesort |
+------+-------------+-------+------+-------------------------+---------------+---------+----------------------+-----------+---------------------------------------------+
| 1 | simple | fr | ref |idFascicolo,idRichiedente| idRichiedente | 4 | db.r.idRichiedente | 1 | |
+------+-------------+-------+------+-------------------------+---------------+---------+----------------------+-----------+---------------------------------------------+
| 1 | simple | f |eq_ref|PRIMARY | PRIMARY | 4 | db.fr.idFascicolo | 1 | |
+------+-------------+-------+------+-------------------------+---------------+---------+----------------------+-----------+---------------------------------------------+
You have two potential performance issues. The first is the GROUP BY. It requires sorting the data, so it has to read all the data and do a lot of work.
The second is the LIKE. There is a fundamental difference between:
WHERE r.name LIKE '%string%'
and
WHERE r.name LIKE 'string%'
The second can use an index on r(name), because the LIKE pattern does not start with a wildcard.
I am not sure what your actual question is. I don't recommend using GROUP BY the way you are using it, because you have unaggregated columns in the SELECT.
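One way to keep a single row per f.idFascicolo without leaving unaggregated columns in the SELECT is to aggregate the names explicitly. The following is a sketch under assumptions: it uses r.name (the column shown in the table description) in place of r.richiedente, and the alias richiedenti is made up for illustration.

```sql
SELECT
    f.idFascicolo,
    f.oggetto,
    -- Collapse all matching names for one idFascicolo into a single string.
    GROUP_CONCAT(r.name ORDER BY r.name SEPARATOR ', ') AS richiedenti
FROM fr
JOIN f ON (f.idFascicolo = fr.idFascicolo)
JOIN r ON (r.idRichiedente = fr.idRichiedente)
WHERE r.name LIKE '%string%'
GROUP BY f.idFascicolo, f.oggetto;
```

This does not remove the cost of the GROUP BY or of the leading-wildcard LIKE; it only makes the result well defined.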

MySQL: How to optimize this query

I wish to reduce the time it takes to query data in a view.
My tables have the following structure:
Table Rings contains individual rings. Each ring has a unique combination of ID_RingType and Number, but also an ID, which is used as a foreign key elsewhere.
-- RINGS
CREATE TABLE `Rings` (
ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
ID_RingType CHAR(2) NOT NULL,
Number MEDIUMINT UNSIGNED NOT NULL,
ID_RingStatus TINYINT DEFAULT 1,
ID_User INT(11),
DateLastChange TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
FOREIGN KEY (ID_RingType) REFERENCES RingType(Code),
FOREIGN KEY (ID_RingStatus) REFERENCES RingStatus(ID),
FOREIGN KEY (ID_User) REFERENCES `848-cso`.`Users`(UID)
);
-- create index on the triple ID_User, ID_RingType, Number
CREATE INDEX idx_rings ON `Rings` (ID_User, ID_RingType, Number);
CREATE INDEX idx_rings_overview ON `Rings` (ID_RingType, Number, ID_RingStatus);
CREATE INDEX idx_rings_numbers ON `Rings` (ID_RingStatus, ID_User, ID_RingType, Number);
RingStatus contains only 4 values and their meanings:
-- RING STATUS
CREATE TABLE `RingStatus` (
ID TINYINT NOT NULL PRIMARY KEY,
Name VARCHAR(20) UNIQUE COLLATE utf8_czech_ci,
NameEng VARCHAR(20)
);
RingType is identified by a two-letter Code:
-- RING TYPE
CREATE TABLE `RingType` (
Code CHAR(2) NOT NULL PRIMARY KEY,
Material VARCHAR(30) COLLATE utf8_czech_ci,
Radius DOUBLE UNSIGNED,
MaxVal MEDIUMINT UNSIGNED NOT NULL
);
Moreover, I use the following function:
/*
Function returns a tinyint(1) specifying whether the ring was assigned
*/
CREATE FUNCTION fn_isRingAssigned (idRingStatus TINYINT)
RETURNS TINYINT(1) DETERMINISTIC
RETURN IF(idRingStatus = 1,1,2);
The query I am trying to optimize is stored in the following view:
/*
View finds contiguous ranges of rings grouped by type, radius and status
*/
ALTER VIEW vw_rings_overview AS SELECT
a.ID_RingType,
rt.Radius,
fn_isRingAssigned(a.ID_RingStatus) AS status,
rs.Name,
a.Number AS min,
MIN(b.Number) AS max
FROM
RingStatus AS rs, Rings AS a
JOIN RingType AS rt ON a.ID_RingType = rt.Code
JOIN Rings AS b
ON a.ID_RingType = b.ID_RingType
AND fn_isRingAssigned(a.ID_RingStatus) = fn_isRingAssigned(b.ID_RingStatus)
AND a.Number <= b.Number
WHERE NOT EXISTS
( SELECT 1
FROM Rings AS c
WHERE c.ID_RingType = a.ID_RingType
AND fn_isRingAssigned(c.ID_RingStatus) = fn_isRingAssigned(a.ID_RingStatus)
AND c.Number = a.Number - 1
)
AND NOT EXISTS
( SELECT 1
FROM Rings AS d
WHERE d.ID_RingType = b.ID_RingType
AND fn_isRingAssigned(d.ID_RingStatus) = fn_isRingAssigned(b.ID_RingStatus)
AND d.Number = b.Number + 1
)
AND fn_isRingAssigned(a.ID_RingStatus) = rs.ID
GROUP BY
a.ID_RingType,
fn_isRingAssigned(a.ID_RingStatus),
a.Number
ORDER BY
a.ID_RingType,
a.Number;
The data in the Rings table looks as follows:
+----+-------------+--------+---------------+---------+---------------------+
| ID | ID_RingType | Number | ID_RingStatus | ID_User | DateLastChange |
+----+-------------+--------+---------------+---------+---------------------+
| 1 | A | 1 | 4 | 2 | 2015-12-02 19:02:50 |
| 2 | A | 2 | 4 | 2 | 2015-12-02 19:02:56 |
| 3 | A | 3 | 4 | 2 | 2015-12-02 19:22:29 |
| 4 | A | 4 | 4 | 2 | 2015-12-21 20:32:24 |
| 5 | A | 5 | 4 | 2 | 2015-12-21 20:52:08 |
| 6 | A | 6 | 4 | 2 | 2015-12-21 20:52:22 |
| 7 | A | 7 | 1 | 2 | 2015-12-02 19:00:23 |
| 8 | A | 8 | 1 | 2 | 2015-12-02 19:00:23 |
| 9 | A | 9 | 1 | 2 | 2015-12-02 19:00:23 |
| 10 | A | 10 | 1 | 2 | 2015-12-02 19:00:23 |
+----+-------------+--------+---------------+---------+---------------------+
And results of the query look like this:
mysql> select * from vw_rings_overview;
+-------------+--------+--------+----------------+-----+-------+
| ID_RingType | Radius | status | Name | min | max |
+-------------+--------+--------+----------------+-----+-------+
| A | 20 | 2 | Assigned | 1 | 6 |
| A | 20 | 1 | Not assigned | 7 | 10 |
+-------------+--------+--------+----------------+-----+-------+
The view finds contiguous ranges of rings that have the same ring type, status and radius.
The Rings table currently contains fewer than 30,000 rows, and querying the view takes approx. 2 seconds. It is expected to grow to a few million rows, so I wish to optimize the design of the tables, indexes and view.
Here is the result of EXPLAIN:
mysql> explain select * from vw_rings_overview;
+----+--------------------+------------+--------+--------------------+--------------------+---------+-----------------------------+-------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+--------+--------------------+--------------------+---------+-----------------------------+-------+-----------------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 19 | |
| 2 | DERIVED | a | index | idx_rings_overview | idx_rings_overview | 7 | NULL | 25173 | Using where; Using index; Using temporary; Using filesort |
| 2 | DERIVED | rt | eq_ref | PRIMARY | PRIMARY | 2 | 848-avi2.a.ID_RingType | 1 | |
| 2 | DERIVED | rs | eq_ref | PRIMARY | PRIMARY | 1 | func | 1 | Using where |
| 2 | DERIVED | b | ref | idx_rings_overview | idx_rings_overview | 2 | 848-avi2.rt.Code | 1573 | Using where; Using index |
| 4 | DEPENDENT SUBQUERY | d | ref | idx_rings_overview | idx_rings_overview | 5 | 848-avi2.b.ID_RingType,func | 1 | Using where; Using index |
| 3 | DEPENDENT SUBQUERY | c | ref | idx_rings_overview | idx_rings_overview | 5 | 848-avi2.a.ID_RingType,func | 1 | Using where; Using index |
+----+--------------------+------------+--------+--------------------+--------------------+---------+-----------------------------+-------+-----------------------------------------------------------+
Here are some sample data: http://sqlfiddle.com/#!9/b8b489/1
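The contiguous-range search described above is an instance of the "gaps and islands" problem. If window functions are available (MySQL 8.0 or later, which is an assumption; the EXPLAIN output above looks like an older server), a single-pass sketch could look like the following. The aliases island, first_number and last_number are made up for illustration, and the IF(...) expression inlines the logic of fn_isRingAssigned.

```sql
SELECT
    i.ID_RingType,
    rt.Radius,
    i.status,
    MIN(i.Number) AS first_number,
    MAX(i.Number) AS last_number
FROM (
    SELECT
        ID_RingType,
        Number,
        IF(ID_RingStatus = 1, 1, 2) AS status,
        -- Within one (type, status) partition, consecutive Numbers share
        -- the same difference Number - row number, which labels the island.
        -- Both sides are cast to SIGNED to avoid unsigned underflow.
        CAST(Number AS SIGNED) - CAST(ROW_NUMBER() OVER (
            PARTITION BY ID_RingType, IF(ID_RingStatus = 1, 1, 2)
            ORDER BY Number
        ) AS SIGNED) AS island
    FROM Rings
) AS i
JOIN RingType AS rt ON rt.Code = i.ID_RingType
GROUP BY i.ID_RingType, rt.Radius, i.status, i.island
ORDER BY i.ID_RingType, first_number;
```

This reads Rings once instead of self-joining it and probing it with two NOT EXISTS subqueries. It leaves out the join to RingStatus for the status name, which could be added in the outer query the same way as RingType.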

Wondering how I can speed up a MySQL call

I'm looking for an answer as to how I can speed up my query on a table of 500,000 records.
I'm setting BROKERAGE_STOCKS_COVERED to a COUNT of the number of times the same brokerage ESTIMID shows up within a date range for each record, excluding the record being examined. The only other condition is that the ANALYST is not blank.
I make a number of similar calls on the table, and they all come back in 10, maybe 15 seconds. The only difference between this call and the others is that this one returns a COUNT of up to 1000 for BROKERAGE_STOCKS_COVERED, whereas my other queries produce a COUNT of maybe 3 or 4. This one takes almost a whole hour:
UPDATE `working` SET `BROKERAGE_STOCKS_COVERED` =
(SELECT COUNT(`ID`)
FROM ( SELECT `ID`, `ESTIMID`, `ANNDATS_CONVERTED`,
`ANALYST`, `REVDATS_CONVERTED`
FROM `working`
) AS BB
WHERE
BB.`ANNDATS_CONVERTED` <= `working`.`ANNDATS_CONVERTED`
AND
BB.`REVDATS_CONVERTED` > `working`.`ANNDATS_CONVERTED`
AND
BB.`ID` != `working`.`ID`
AND
BB.`ESTIMID` = `working`.`ESTIMID`
AND
BB.`ANALYST` != ''
)
WHERE `working`.`ANALYST` != '';
-- On 500,000 rows: "457656 rows affected. (Query took 2782.4304 seconds.)" (46 min)
| ID | ANALYST | ESTIMID | ANNDATS_CONVERTED | REVDATS_CONVERTED | BROKERAGE_STOCKS_COVERED | NO_TOP_RATING |
--------------------------------------------------------------------------------------------------------------------
| 1 | DAVE | Brokerage000 | 1998-07-01 | 1998-07-04 | | 3 |
| 2 | DAVE | Brokerage000 | 1998-06-28 | 1998-07-10 | | 4 |
| 3 | DAVE | Brokerage000 | 1998-07-02 | 1998-07-08 | | 2 |
| 4 | DAVE | Brokerage000 | 1998-07-04 | 1998-12-04 | | 3 |
| 5 | SAM | Brokerage000 | 1998-06-14 | 1998-06-30 | | 4 |
| 6 | SAM | Brokerage000 | 1998-06-28 | 1999-08-08 | | 4 |
| 7 | | Brokerage000 | 1998-06-28 | 1999-08-08 | | 5 |
| 8 | DAVE | Brokerage111 | 1998-06-28 | 1999-08-08 | | 3 |
'EXPLAIN' results:
id| select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
----------------------------------------------------------------------------------------------------------------------------------------
1 | PRIMARY | working | index | ANALYST | PRIMARY | 4 | NULL | 467847 | Using where
2 | DEPENDENT SUBQUERY | <derived3> | ref | <auto_key0> | <auto_key0> | 92 | working.ESTIMID | 46785 | Using where
3 | DERIVED | working | ALL | NULL | NULL | NULL | NULL | 467847 | NULL
EXPLAIN
SELECT COUNT(`ID`) FROM (SELECT `ID`, `IRECCD`, `ANALYST`, `ESTIMID`, `ANNDATS_CONVERTED`, `REVDATS_CONVERTED` FROM `working`) AS BB
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
--------------------------------------------------------------------------------------------------
1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 462762 | NULL
2 | DERIVED | working | ALL | NULL | NULL | NULL | NULL | 462762 | NULL
EXPLAIN
SELECT COUNT(`ID`) FROM (SELECT `ID`, `IRECCD`, `ANALYST`, `ESTIMID`, `ANNDATS_CONVERTED`, `REVDATS_CONVERTED` FROM `working`) AS BB
WHERE
BB.`ANNDATS_CONVERTED` <= `ANNDATS_CONVERTED`
AND
BB.`REVDATS_CONVERTED` > `ANNDATS_CONVERTED`
AND
BB.`ID` != `ID`
AND
BB.`ESTIMID` = `ESTIMID`
AND
BB.`ANALYST` != ''
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
----------------------------------------------------------------------------------------------------
1 | PRIMARY |NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE
2 | DERIVED | working | ALL | NULL | NULL | NULL | NULL | 462762 | NULL
I think the "Impossible WHERE" appears only because this part of the query was separated from the UPDATE for the purpose of displaying the EXPLAIN.
I am using InnoDB on a Windows 8 PHP/MySQL install.
My columns are indexed. I have memory maxed out on my Windows/MySQL setup, and it all works great.
- Just wondering if this is a normal wait time for such a query?
- And is there a way to speed this particular query up?
Generally, when attempting to optimize a slow-running query, one would ask the database system to explain its strategy for resolving the query. In this case you can use the SQL EXPLAIN command on the sub-select independently, and on the WHERE clause, to find the exact cause of the slowdown. This may indicate whether your WHERE clause should sit outside the sub-select, or whether the problem lies elsewhere.
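As a sketch of how the counting logic could be explained with its correlation intact, the UPDATE's subquery can be written as a SELECT that joins `working` to itself. The aliases w and covered are made up for illustration. In the standalone EXPLAIN shown in the question, the unqualified `ID` most likely resolves to BB.`ID` itself, so BB.`ID` != `ID` can never be true, which would explain the "Impossible WHERE".

```sql
EXPLAIN
SELECT w.`ID`, COUNT(BB.`ID`) AS covered
FROM `working` AS w
-- Read the base table directly instead of wrapping it in a derived table,
-- so that existing indexes on `working` remain usable for the join.
LEFT JOIN `working` AS BB
       ON BB.`ESTIMID` = w.`ESTIMID`
      AND BB.`ANNDATS_CONVERTED` <= w.`ANNDATS_CONVERTED`
      AND BB.`REVDATS_CONVERTED` > w.`ANNDATS_CONVERTED`
      AND BB.`ID` != w.`ID`
      AND BB.`ANALYST` != ''
WHERE w.`ANALYST` != ''
GROUP BY w.`ID`;
```

If the plan for BB still shows a full scan, that suggests no index starting with `ESTIMID` is being used for the lookup.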

mysql how to find the total number of child rows with respect to a parent

I have a table with a parent-child relationship like this:
Employee_ID | Employee_Manager_ID | Employee_Name
--------------------------------------------------------
1 | 1 | AAAA
2 | 1 | BBBB
3 | 2 | CCCC
4 | 3 | DDDD
5 | 3 | EEEEE
Is it possible to get the count of all the employees who come under a particular employee (not only direct children, but also all children of children) using a single query?
E.g. if the input = 1,
the output should be 4;
if the input = 2, the output should be 3.
Thanks in advance.
Suppose your table is:
mysql> SELECT * FROM Employee;
+-----+------+-------------+------+
| SSN | Name | Designation | MSSN |
+-----+------+-------------+------+
| 1 | A | OWNER | 1 |
| 10 | G | WORKER | 5 |
| 11 | D | WORKER | 5 |
| 12 | E | WORKER | 5 |
| 2 | B | BOSS | 1 |
| 3 | F | BOSS | 1 |
| 4 | C | BOSS | 2 |
| 5 | H | BOSS | 2 |
| 6 | L | WORKER | 2 |
| 7 | I | BOSS | 2 |
| 8 | K | WORKER | 3 |
| 9 | J | WORKER | 7 |
+-----+------+-------------+------+
12 rows in set (0.00 sec)
Query is:
SELECT SUPERVISOR.name AS SuperVisor,
GROUP_CONCAT(SUPERVISEE.name ORDER BY SUPERVISEE.name ) AS SuperVisee,
COUNT(*)
FROM Employee AS SUPERVISOR
INNER JOIN Employee SUPERVISEE ON SUPERVISOR.SSN = SUPERVISEE.MSSN
GROUP BY SuperVisor;
The query will produce a result like:
+------------+------------+----------+
| SuperVisor | SuperVisee | COUNT(*) |
+------------+------------+----------+
| A | A,B,F | 3 |
| B | C,H,I,L | 4 |
| F | K | 1 |
| H | D,E,G | 3 |
| I | J | 1 |
+------------+------------+----------+
5 rows in set (0.00 sec)
[Answer]:
This covers one level (immediate supervisees). To find all supervisees at all possible levels you have to use a WHILE loop (use stored procedures).
"Although it is possible to retrieve employees at each level and then take their UNION, we cannot, in general, specify a query such as "retrieve the supervisees of a employee at all levels" without utilizing a looping mechanism."
REFERENCE: in this slide deck, read slide number 23.
The book is "Fundamentals of Database Systems", Fourth Edition; in the chapter "The Relational Algebra and Relational Calculus" there is a topic "Recursive Closure Operations".
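On MySQL 8.0 and later, a recursive common table expression can play the role of that looping mechanism inside a single statement. The sketch below assumes MySQL 8.0+ and the Employee table from this answer; the CTE name supervisees and the alias total_supervisees are made up for illustration.

```sql
WITH RECURSIVE supervisees AS (
    -- Anchor: direct reports of the chosen supervisor (SSN '2' here).
    SELECT SSN
    FROM Employee
    WHERE MSSN = '2'
      AND SSN <> MSSN              -- skip a root row that manages itself
    UNION ALL
    -- Recursive step: reports of everyone found so far.
    SELECT e.SSN
    FROM Employee AS e
    JOIN supervisees AS s ON e.MSSN = s.SSN
    WHERE e.SSN <> e.MSSN
)
SELECT COUNT(*) AS total_supervisees
FROM supervisees;
```

With the sample rows above, SSN '2' (B) has four direct supervisees (C, H, L, I), plus D, E, G under H and J under I, so the count is 8. The SSN <> MSSN conditions matter because the root row A lists itself as its own manager, which would otherwise make the recursion loop.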
Adding the query for table creation; it may be helpful to you:
mysql> CREATE TABLE IF NOT EXISTS `Employee` (
-> `SSN` varchar(64) NOT NULL,
-> `Name` varchar(64) DEFAULT NULL,
-> `Designation` varchar(128) NOT NULL,
-> `MSSN` varchar(64) NOT NULL,
-> PRIMARY KEY (`SSN`),
-> CONSTRAINT `FK_Manager_Employee` FOREIGN KEY (`MSSN`) REFERENCES Employee(SSN)
-> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.17 sec)
You can check the table like this:
mysql> DESCRIBE Employee;
+-------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| SSN | varchar(64) | NO | PRI | NULL | |
| Name | varchar(64) | YES | | NULL | |
| Designation | varchar(128) | NO | | NULL | |
| MSSN | varchar(64) | NO | MUL | NULL | |
+-------------+--------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
You may try this:
SELECT
t_one.Employee_ID,
t_one.Employee_Name,
COUNT(*) AS children
FROM
table_name AS t_one
INNER JOIN table_name AS t_two ON
t_two.Employee_Manager_ID=t_one.Employee_ID
GROUP BY
t_one.Employee_ID

Join by part of string

I have the following tables:
**visitors**
+---------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+---------+----------------+
| visitors_id | int(11) | NO | PRI | NULL | auto_increment |
| visitors_path | varchar(255) | NO | | | |
+---------------------+--------------+------+-----+---------+----------------+
**fedora_info**
+----------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| pid | varchar(255) | NO | PRI | | |
| owner_uid | int(11) | YES | | NULL | |
+----------------+--------------+------+-----+---------+-------+
First I look for visitors_path values that are related to specific pages with:
SELECT visitors_id, visitors_path
FROM visitors
WHERE visitors_path REGEXP '[[:<:]]fedora/repository/.*:[0-9]+$';
The above query returns the expected result.
Now, the .*:[0-9]+ part in the above query refers to pid in the second table. I want to know the count of results of the above query grouped by owner_uid in the second table.
How can I JOIN these tables?
EDIT
sample data:
visitors
+-------------+---------------------------------+
| visitors_id | visitors_path |
+-------------+---------------------------------+
| 4574 | fedora/repository/islandora:123 |
| 4575 | fedora/repository/islandora:123 |
| 4580 | fedora/repository/islandora:321 |
| 4681 | fedora/repository/islandora:321 |
| 4682 | fedora/repository/islandora:321 |
| 4704 | fedora/repository/islandora:321 |
| 4706 | fedora/repository/islandora:456 |
| 4741 | fedora/repository/islandora:456 |
| 4743 | fedora/repository/islandora:789 |
| 4769 | fedora/repository/islandora:789 |
+-------------+---------------------------------+
fedora_info
+-----------------+-----------+
| pid | owner_uid |
+-----------------+-----------+
| islandora:123 | 1 |
| islandora:321 | 2 |
| islandora:456 | 3 |
| islandora:789 | 4 |
+-----------------+-----------+
Expected result:
+-----------------+-----------+
| count | owner_uid |
+-----------------+-----------+
| 2 | 1 |
| 4 | 2 |
| 3 | 3 |
| 2 | 4 |
| 0 | 5 |
+-----------------+-----------+
I suggest you normalize your database. When inserting rows into visitors, extract the pid in the front-end language and put it in a separate column (e.g. fi_pid). Then you can join on it easily.
The following query might work for you, but it will be a little CPU intensive.
SELECT
COUNT(a.visitors_id) as `count`,
f.owner_uid
FROM (SELECT visitors_id,
visitors_path,
SUBSTRING(visitors_path, ( LENGTH(visitors_path) -
LOCATE('/', REVERSE(visitors_path)) )
+ 2) AS
pid
FROM visitors
WHERE visitors_path REGEXP '[[:<:]]fedora/repository/.*:[0-9]+$') AS `a`
JOIN fedora_info AS f
ON ( a.pid = f.pid )
GROUP BY f.owner_uid
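The normalization suggested at the start of this answer could be sketched as follows. The column name fi_pid comes from the suggestion; the index name idx_visitors_fi_pid is made up for illustration, and the sketch assumes visitors uses the same character set and collation as fedora_info.pid. The backfill reuses the regular expression from the question; on MySQL 8.0.4 and later the [[:<:]] word-boundary marker is no longer accepted and would need to be written as \\b.

```sql
-- Add the extracted pid as its own indexed column.
ALTER TABLE visitors
    ADD COLUMN fi_pid VARCHAR(255) NULL,
    ADD INDEX idx_visitors_fi_pid (fi_pid);

-- One-off backfill: keep everything after the last '/' of matching paths.
UPDATE visitors
SET fi_pid = SUBSTRING_INDEX(visitors_path, '/', -1)
WHERE visitors_path REGEXP '[[:<:]]fedora/repository/.*:[0-9]+$';

-- The count per owner then becomes a plain indexed join.
SELECT COUNT(v.visitors_id) AS `count`, f.owner_uid
FROM fedora_info AS f
LEFT JOIN visitors AS v ON v.fi_pid = f.pid
GROUP BY f.owner_uid;
```

The LEFT JOIN keeps owners whose pages have no visits, with a count of 0. New rows would need fi_pid filled in at insert time, as described above.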
The following query returns the expected result, but it is very slow (query took 9.6700 sec):
SELECT COUNT(t2.pid), t1.owner_uid
FROM fedora_info t1
JOIN (SELECT TRIM(LEADING 'fedora/repository/' FROM visitors_path) as pid
FROM visitors
WHERE visitors_path REGEXP '[[:<:]]fedora/repository/.*:[0-9]+$') t2 ON t1.pid = t2.pid
GROUP BY t1.owner_uid