Identifying faulty data in MySQL table - mysql

Apologies for asking a question that may have answers in some form or another on here, but I was unable to make any of those solutions work for me.
I have the following query:
SELECT `user_id`, `application_id`, `unallocated_date`, `check_in_date`, `check_out_date`
FROM `student_room`
WHERE `user_id` = 17225
ORDER BY `application_id` DESC
It produces the following result:
user_id | application_id | unallocated_date | check_in_date | check_out_date
--------+----------------+---------------------+---------------------+---------------
17225 | 30782 | 2018-02-04 14:32:29 | NULL | NULL
17225 | 30782 | 2018-02-04 14:32:49 | NULL | NULL
17225 | 30782 | 2018-02-04 14:32:51 | NULL | NULL
17225 | 30782 | NULL | NULL | NULL
17225 | 30782 | NULL | 2018-02-04 14:41:54 | NULL
The fourth row in the result is a fault in my data; it should look similar to the first three rows. Rows like the first three are created when a student is allocated a new room and the previous one needs to be unallocated. In this case, the unallocation of row 4 did not actually happen, due to either a historical bug in the system I am working on or user error, but most likely the former.
How can I identify ALL such rows? My attempts with GROUP BY and HAVING did not work, as I checked where all three date fields were NULL, but it did not pick up this particular user - so I was doing something wrong. My original query was:
SELECT COUNT(user_id) AS `count`, user_id FROM `student_room`
WHERE `unallocated_date` IS NULL
AND `check_in_date` IS NULL
AND `check_out_date` IS NULL
GROUP BY `user_id`
HAVING COUNT(user_id) > 1
ORDER BY `user_id` ASC
I tried various INNER JOIN attempts too, but I did not get any of them right...
The rows that I am interested in will have at least one entry with all three dates NULL, but also one where there is a check_in_date that is NOT NULL, as per this example. If I only had the first four rows, then the data could be correct, but the fifth row's presence makes the fourth row a faulty record - it should've been given an "unallocated_date" value at the time of the allocation of the room in the fifth row, which for some reason did not happen.

Together with a friend of mine, we made the following query that works. I have now learned that you can use "EXISTS" in MySQL. I saw it used when dropping or creating tables, but never like this. It ended up that this query solves the problem:
SELECT cte.user_id, COUNT(*)
FROM (
    SELECT sro.user_id
    FROM student_room AS sro
    WHERE sro.unallocated_date IS NULL
      AND sro.check_in_date IS NULL
      AND sro.check_out_date IS NULL
      AND EXISTS (
          SELECT *
          FROM student_room AS sri
          WHERE sri.user_id = sro.user_id
            AND sri.student_room_id > sro.student_room_id
      )
    ORDER BY user_id DESC
) AS cte
GROUP BY cte.user_id
ORDER BY COUNT(*) DESC
This query is the result of more than an hour of tinkering with records that were erroneous, so apologies if it does not match the question's requirements 100%, but it does solve the problem for me.
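For comparison, here is a minimal sketch of the stricter condition described in the question: an all-NULL row that is followed, for the same user, by a later row with a check_in_date. It runs the SQL through Python's built-in sqlite3 module as a stand-in for MySQL. It assumes student_room_id is an auto-increment key that reflects insertion order (the question never shows that column), the first five rows are reconstructed from the result above, and user 99999 is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student_room (
        student_room_id  INTEGER PRIMARY KEY,  -- assumed insertion-order key
        user_id          INTEGER NOT NULL,
        application_id   INTEGER NOT NULL,
        unallocated_date TEXT,
        check_in_date    TEXT,
        check_out_date   TEXT
    )
""")
rows = [
    (1, 17225, 30782, "2018-02-04 14:32:29", None, None),
    (2, 17225, 30782, "2018-02-04 14:32:49", None, None),
    (3, 17225, 30782, "2018-02-04 14:32:51", None, None),
    (4, 17225, 30782, None, None, None),                   # the faulty row
    (5, 17225, 30782, None, "2018-02-04 14:41:54", None),
    (6, 99999, 40000, None, None, None),                   # made-up user: allocated, not yet checked in
]
conn.executemany("INSERT INTO student_room VALUES (?, ?, ?, ?, ?, ?)", rows)

# An all-NULL row is only suspicious if the same user has a LATER row
# that carries a check_in_date.
FAULTY_ROWS_SQL = """
    SELECT sro.student_room_id, sro.user_id
    FROM student_room AS sro
    WHERE sro.unallocated_date IS NULL
      AND sro.check_in_date IS NULL
      AND sro.check_out_date IS NULL
      AND EXISTS (
            SELECT 1
            FROM student_room AS sri
            WHERE sri.user_id = sro.user_id
              AND sri.student_room_id > sro.student_room_id
              AND sri.check_in_date IS NOT NULL
      )
    ORDER BY sro.user_id, sro.student_room_id
"""
faulty = conn.execute(FAULTY_ROWS_SQL).fetchall()
print(faulty)
```

The extra check_in_date IS NOT NULL condition inside EXISTS is what keeps a user with a single pending allocation out of the result.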

Related

MySQL - get a column showing COUNT() from an associated table on each row

I have a database in MySQL (5.5.60-MariaDB).
I'm doing a SELECT query to get rows from a table called revision_filters followed by various INNER JOINs to get associated data. The query looks as follows and executes correctly:
SELECT RevisionFilters.id AS `RevisionFilters__id`,
RevisionFilters.date AS `RevisionFilters__date`,
RevisionFilters.comment AS `RevisionFilters__comment`,
filters.label AS `Filters__label`,
filters.anchor AS `Filters__anchor`,
groups.label AS `Groups__label`
FROM revision_filters RevisionFilters
INNER JOIN dev_hub_subdb.filters Filters
ON filters.id = ( RevisionFilters.filter_id )
INNER JOIN dev_hub_subdb.groups Groups
ON groups.id = ( filters.group_id )
INNER JOIN dev_hub_subdb.regulations Regulations
ON regulations.id = ( groups.regulation_id )
There is a table called revision_filters_substances. The structure of the table is as follows. In this instance revision_filter_id is a foreign key that relates to revision_filters.id.
mysql> describe revision_filters_substances;
+--------------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+-----------------------+------+-----+---------+----------------+
| id | mediumint(8) unsigned | NO | PRI | NULL | auto_increment |
| revision_filter_id | mediumint(8) unsigned | NO | MUL | NULL | |
| substance_id | mediumint(8) unsigned | NO | MUL | NULL | |
+--------------------+-----------------------+------+-----+---------+----------------+
What I want to do is adapt my SELECT query so that on each row returned I can get a COUNT of the number of rows in revision_filters_substances that correspond to the rows in the SELECT query for revision_filters.
In some instances, it's possible that there are no rows in revision_filters_substances corresponding to a particular revision_filters.id and in this case I need the count to return 0.
I've read https://dba.stackexchange.com/questions/133384/counting-rows-from-a-subquery
But I can't see how to adapt this to my query.
It says on the linked article
The subquery should immediately follow the FROM keyword.
So I've tried doing this immediately following FROM revision_filters RevisionFilters in the query I have already:
, (SELECT COUNT(id) FROM revision_filters_substances WHERE id = revision_filters.id) AS count_substances
But this errors:
Unknown column 'revision_filters.id' in 'where clause'
Please can someone advise if this is possible? I don't see how to specify 0 if there are no corresponding rows either, so also need advice on how to achieve that.
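You have aliased the revision_filters table to RevisionFilters, so use RevisionFilters.id in the WHERE clause of the correlated subquery. The subquery belongs in the SELECT list rather than after FROM, and it should compare the foreign key revision_filter_id (not revision_filters_substances.id) with RevisionFilters.id.
As for "no rows": a COUNT() subquery without GROUP BY already returns 0 rather than NULL when nothing matches, so COALESCE(..) around it is optional. If you keep it to make the intent explicit, the subquery needs its own pair of parentheses.
SELECT .... ,
COALESCE((SELECT COUNT(id)
FROM revision_filters_substances
WHERE revision_filter_id = RevisionFilters.id), 0) AS count_substances
.... /* your rest of the query here (FROM, WHERE clauses etc) */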
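A small sketch of the correlated-count idea, run through Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names come from the question; the rows and the comment column values are made up, and the joins to filters, groups and regulations are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE revision_filters (id INTEGER PRIMARY KEY, comment TEXT);
    CREATE TABLE revision_filters_substances (
        id                 INTEGER PRIMARY KEY,
        revision_filter_id INTEGER NOT NULL,
        substance_id       INTEGER NOT NULL
    );
    -- made-up rows: filter 2 deliberately has no substances
    INSERT INTO revision_filters VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO revision_filters_substances (revision_filter_id, substance_id)
    VALUES (1, 10), (1, 11), (3, 12);
""")

# The subquery sits in the SELECT list and is correlated through the
# alias of the outer table (RevisionFilters), not the table name.
counts = conn.execute("""
    SELECT RevisionFilters.id,
           (SELECT COUNT(*)
            FROM revision_filters_substances AS rfs
            WHERE rfs.revision_filter_id = RevisionFilters.id) AS count_substances
    FROM revision_filters AS RevisionFilters
    ORDER BY RevisionFilters.id
""").fetchall()
print(counts)
```

In this sketch the filter without substances comes back with a count of 0.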

MYSQL - Query to extract all columns from the top N distinct elements

I have designed an event where players register multiple fish, and I want a query to extract the top 3 heaviest fish from different people. In case of a tie, it should be decided by a third parameter: who registered it first. I've tested several approaches I found here on Stack Overflow, but none of them worked the way I needed.
My schema is the following:
id | playerid | playername | itemid | weight | date | received | isCurrent
Where:
id = PK, AUTO_INCREMENT - it's basically an index
playerid = the unique code of the person who registered the fish
playername = name of the person who registered the fish
itemid = the code of the fish
weight = the weight of the fish
date = pre-defined as CURRENT_TIMESTAMP, the exact time the fish was registered
received = pre-defined as 0, it really doesn't matter for this analysis
isCurrent = pre-defined as 1; every time this event runs it updates this field to 0, meaning the records don't belong to the current version of the event.
Here you can see the data I'm testing with
My problem is: how do I avoid counting the same playerid more than once in this ranking?
Query 1:
SELECT `playerid`, `playername`, `itemid`, `weight`
FROM `event_fishing`
WHERE `isCurrent` = 1 AND `weight` IN (
SELECT * FROM
(SELECT MAX(`weight`) as `fishWeight`
FROM `event_fishing`
WHERE `isCurrent` = 1
GROUP BY `playerid`
LIMIT 3) as t)
ORDER BY `weight` DESC, `date` ASC
LIMIT 3
Query 2:
SELECT * FROM `event_fishing`
INNER JOIN
(SELECT playerid, MAX(`weight`) as `fishWeight`
FROM `event_fishing`
WHERE `isCurrent` = 1
GROUP BY `playerid`
LIMIT 3) as t
ON t.playerid = `event_fishing`.playerid AND t.fishWeight = `event_fishing`.weight
WHERE `isCurrent` = 1
ORDER BY weight DESC, date ASC
LIMIT 3
Keep in mind that I must return at least the fields playerid, playername, itemid and weight; that the version of the event must be the current one (isCurrent = 1); and that there must be one playerid per line, with the heaviest weight he registered for this version of the event and the date it was registered.
Expected output for the data I've sent:
id |playerid|playername|itemid|weight| date |received| isCurrent
7 | 3734 |Mago Xxx | 7963 | 1850 | 2018-07-26 00:17:41 | 0 | 1
14 | 228 |Night Wolf| 7963 | 1750 | 2018-07-26 19:45:49 | 0 | 1
8 | 3646 |Test Spell| 7159 | 1690 | 2018-07-26 01:16:51 | 0 | 1
Output I'm getting (with both queries):
playerid|playername|itemid|weight
3734 |Mago Xxx | 7963 | 1850
228 |Night Wolf| 7963 | 1750
228 |Night Wolf| 7963 | 1750
Thank you for the attention.
EDIT: I've followed How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL? since my query is very similar to the accepted answer. In the comments I found something that at first glance seemed to solve my problem, but I've found a case where the accepted answer fails. Check http://sqlfiddle.com/#!9/72aeef/1
If you take a look at the data you'll notice that id 14 was the first input of 1750 and therefore should be in second place, but MAX(id) returns the last input of the same playerid and therefore gives a wrong result.
Although the problems seem alike, mine is more complex, and therefore the queries that were suggested don't work.
EDIT 2:
I've managed to solve my problem with the following query:
http://sqlfiddle.com/#!9/d711c7/6
But I'll leave this question open because of two things:
1- I don't know if there's a case where this query might fail
2- Although the first query is limited a lot, I still think this can be optimized further, so I'll leave it open to anyone who might know a better way to solve the issue.
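One possible way to express "best fish per player, ties broken by earliest registration" without MAX(id) is a NOT EXISTS anti-join: keep a row only if the same player has no better row. The sketch below runs it through Python's built-in sqlite3 module as a stand-in for MySQL. Rows 7, 14 and 8 are taken from the expected output above; the other rows are made up to exercise the tie, and this is not the query behind the sqlfiddle link.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE event_fishing (
        id         INTEGER PRIMARY KEY,
        playerid   INTEGER,
        playername TEXT,
        itemid     INTEGER,
        weight     INTEGER,
        date       TEXT,
        received   INTEGER DEFAULT 0,
        isCurrent  INTEGER DEFAULT 1
    )
""")
sample = [
    (7, 3734, "Mago Xxx", 7963, 1850, "2018-07-26 00:17:41"),
    (8, 3646, "Test Spell", 7159, 1690, "2018-07-26 01:16:51"),
    (9, 3646, "Test Spell", 7159, 1500, "2018-07-26 02:00:00"),   # made up: lighter fish
    (14, 228, "Night Wolf", 7963, 1750, "2018-07-26 19:45:49"),
    (15, 228, "Night Wolf", 7963, 1750, "2018-07-26 20:00:00"),   # made up: same weight, later
    (16, 500, "Small Fry", 7159, 900, "2018-07-26 21:00:00"),     # made up: fourth player
]
conn.executemany(
    "INSERT INTO event_fishing (id, playerid, playername, itemid, weight, date) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    sample,
)

# A row survives only if the same player has no heavier fish, no equally
# heavy fish registered earlier, and no exact duplicate with a lower id.
TOP3_SQL = """
    SELECT f.id, f.playerid, f.playername, f.itemid, f.weight, f.date
    FROM event_fishing AS f
    WHERE f.isCurrent = 1
      AND NOT EXISTS (
            SELECT 1
            FROM event_fishing AS better
            WHERE better.isCurrent = 1
              AND better.playerid = f.playerid
              AND (better.weight > f.weight
                   OR (better.weight = f.weight AND better.date < f.date)
                   OR (better.weight = f.weight AND better.date = f.date
                       AND better.id < f.id))
      )
    ORDER BY f.weight DESC, f.date ASC
    LIMIT 3
"""
top3 = conn.execute(TOP3_SQL).fetchall()
print([row[0] for row in top3])
```

Because each player contributes exactly one row before the LIMIT is applied, the same playerid cannot appear twice in the ranking.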

Mysql 3 tables, multiple counts grouped by, but gets stuck

I'm running into some trouble with a query. I'm trying to retrieve some data from a big database where 3 tables are involved.
These tables contain data about adds: in a backend website, the administrator can manage which local adds he wants displayed, their position, etc. They are organized in 3 tables. One of them contains all the data relevant to the adds themselves (name, date of availability, date of expiration, etc.). The other 2 tables contain some extra info, but only about views or clicks.
So I have only 15 adds, which have multiple clicks and multiple views.
The click and view tables each register a new row for every click or view. So, when a click or view is registered, a new row is added where addid_click (or addid_views) identifies the registered event, and addid is the addid from adds_table. So for instance, add (1) will have 2 views and 2 clicks while add (2) will have 1 view and 1 click.
My idea is to get, for each add, how many clicks and views it had in total.
I have 3 tables like these:
adds_table             adds_clicks_table        adds_views_table
+-------+-----------+  +-------------+-------+  +-------------+-------+
| addid | name      |  | addid_click | addid |  | addid_views | addid |
+-------+-----------+  +-------------+-------+  +-------------+-------+
| 1     | add_name1 |  | 1           | 1     |  | 1           | 1     |
| 2     | add_name2 |  | 2           | 2     |  | 2           | 1     |
| 3     | add_name3 |  | 3           | 1     |  | 3           | 2     |
+-------+-----------+  +-------------+-------+  +-------------+-------+
CREATE TABLE `bwm_adds` (
`addid` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
...
PRIMARY KEY (`addid`)
) ENGINE=InnoDB AUTO_INCREMENT=16 DEFAULT CHARSET=utf8
CREATE TABLE `bwm_adds_clicks` (
`add_clickid` int(19) NOT NULL AUTO_INCREMENT,
`addid` int(11) NOT NULL,
...
PRIMARY KEY (`add_clickid`)
) ENGINE=InnoDB AUTO_INCREMENT=3374 DEFAULT CHARSET=utf8
CREATE TABLE `bwm_adds_views` (
`add_viewsid` int(19) NOT NULL AUTO_INCREMENT,
`addid` int(11) NOT NULL,
...
PRIMARY KEY (`add_viewsid`)
) ENGINE=InnoDB AUTO_INCREMENT=2078738 DEFAULT CHARSET=utf8
The result would be a single table where I retrieve, for each add (addid), how many clicks and how many views it had.
I need a query that returns something like this:
+-------+---------+-----------+
| addid | clicks | views |
+-------+---------+-----------+
| 1 | 123123 | 235457568 |
+-------+---------+-----------+
| 2 | 5124123 | 435345234 |
+-------+---------+-----------+
| 3 | 123541 | 453563623 |
+-------+---------+-----------+
I tried to execute a query but it gets stuck, loading indefinitely... I'm pretty sure that my query is the problem, because if I remove one of the counts it displays some data very fast.
SELECT a.addid, COUNT(ac.addid_clicks) as 'clicks', COUNT(av.addid_views) as 'views'
FROM `adds_table` a
LEFT JOIN `adds_clicks_table` ac ON a.addid = ac.addid_click
LEFT JOIN `adds_views_table` av ON ac.addid_click = av.addid_views
GROUP BY a.addid
MySQL keeps loading all the time; any idea what I'm missing?
By the way, I found this post, which deals with almost the same problem I have. You can see my query is very similar to the first answer, but I get the Loading message all the time. No errors, just Loading.
Edit: I misplaced the numbers and got confused. Now the tables are fixed and I added some explanation about them.
Edit2: I updated the post with the SHOW CREATE TABLE definitions.
Edit3: Is there any way to optimise this query? It seems to retrieve the result I want, but the MySQL database cancels the query because it takes more than 30 seconds to execute.
SELECT a.addid,
(SELECT COUNT(addid) FROM add_clicks where addid = a.addid) as clicks,
(SELECT COUNT(addid) FROM add_views where addid = a.addid) as views
FROM adds a ORDER BY a.addid;
If those are really your tables (one column, plus an auto_inc), then there is no meaningful information justifying having 3 tables instead of 1:
CREATE TABLE `bwm_adds` (
`addid` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
clicks INT UNSIGNED NOT NULL,
views INT UNSIGNED NOT NULL,
PRIMARY KEY (`addid`)
) ENGINE=InnoDB AUTO_INCREMENT=16 DEFAULT CHARSET=utf8
and then UPDATE ... SET views = views + 1 (etc) rather than inserting into the other tables.
If you have an old version,
SELECT a.addid,
       ( SELECT COUNT(*)
           FROM `adds_clicks_table`
          WHERE addid = a.addid
       ) AS 'clicks',
       ( SELECT COUNT(*)
           FROM `adds_views_table`
          WHERE addid = a.addid
       ) AS 'views'
FROM adds_table AS a
For 5.6 and later, this might be faster:
SELECT a.addid, c.clicks, v.views
FROM `adds_table` a
LEFT JOIN ( SELECT addid, COUNT(*) AS clicks FROM `adds_clicks_table` GROUP BY addid ) AS c USING(addid)
LEFT JOIN ( SELECT addid, COUNT(*) AS views FROM `adds_views_table` GROUP BY addid ) AS v USING(addid)
If you get NULLs but prefer 0s, then wrap the value in IFNULL(..., 0).
If you need to discuss further, please provide SHOW CREATE TABLE and EXPLAIN SELECT ...
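To illustrate why joining both child tables directly is a problem, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL, with the three sample tables from the question. Joining clicks and views to the same add multiplies the rows (clicks x views per add), which inflates both counts and, on millions of view rows, is a plausible reason for the query never finishing. Aggregating each child table first avoids that. The index names are made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE adds_table (addid INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE adds_clicks_table (addid_click INTEGER PRIMARY KEY, addid INTEGER NOT NULL);
    CREATE TABLE adds_views_table (addid_views INTEGER PRIMARY KEY, addid INTEGER NOT NULL);
    -- hypothetical index names; the point is an index on the join column
    CREATE INDEX idx_clicks_addid ON adds_clicks_table (addid);
    CREATE INDEX idx_views_addid ON adds_views_table (addid);

    INSERT INTO adds_table VALUES (1, 'add_name1'), (2, 'add_name2'), (3, 'add_name3');
    INSERT INTO adds_clicks_table VALUES (1, 1), (2, 2), (3, 1);
    INSERT INTO adds_views_table VALUES (1, 1), (2, 1), (3, 2);
""")

# Naive version: both child tables joined to the add at once.
naive = conn.execute("""
    SELECT a.addid, COUNT(ac.addid_click), COUNT(av.addid_views)
    FROM adds_table AS a
    LEFT JOIN adds_clicks_table AS ac ON ac.addid = a.addid
    LEFT JOIN adds_views_table AS av ON av.addid = a.addid
    GROUP BY a.addid
    ORDER BY a.addid
""").fetchall()

# Pre-aggregated version: each child table is reduced to one row per add
# before it is joined, so nothing gets multiplied.
totals = conn.execute("""
    SELECT a.addid,
           COALESCE(c.clicks, 0) AS clicks,
           COALESCE(v.views, 0) AS views
    FROM adds_table AS a
    LEFT JOIN (SELECT addid, COUNT(*) AS clicks
               FROM adds_clicks_table GROUP BY addid) AS c ON c.addid = a.addid
    LEFT JOIN (SELECT addid, COUNT(*) AS views
               FROM adds_views_table GROUP BY addid) AS v ON v.addid = a.addid
    ORDER BY a.addid
""").fetchall()

print("naive: ", naive)
print("totals:", totals)
```

With the sample data, add 1 has 2 clicks and 2 views, but the naive join reports 4 and 4 because every click row is paired with every view row.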
I ended up with a solution to my problem. The table I was trying to query was too big because of the badly engineered database: in adds_views_table, a new row is added for each view. It ended up with almost 3 million rows, in a table that weighs almost 35% of the entire database (326MB).
When phpMyAdmin tried to execute the query, it loaded forever and never showed a result because of a timeout limit applied to MySQL. Raising this limit would help, but it wasn't a viable way to retrieve the data and display it on a website (the page or data wouldn't load until the query had finished).
The problem was fixed by creating an index on addid in adds_table. Also, for some reason the query is faster when subqueries are used. The query ended up like this:
SELECT a.addid,
(SELECT COUNT(addid) FROM adds_clicks_table WHERE addid = a.addid) AS 'clicks',(SELECT COUNT(addid) FROM adds_views_table WHERE addid = a.addid) AS 'views'
FROM adds_table a
ORDER BY a.addid;
Thanks to @Rick James, who posted a similar query that I ended up modifying to get the data I needed.
forgive my horrible english

How to use MySQL REGEXP in the WHERE of a JOIN statement

I have two tables A and B
Table A has columns: ID and POST
Table B has columns: ID, POST_ID and UPPERS
I want to select all records where a.POST matches the regex
'\\[cd(i|b)?(=[a-z0-9]+)?\\].+\\[/cd(i|b)?\\]'
and JOIN table B on a.ID = b.POST_ID where b.UPPERS matches the regex
'(\\|[0-9]+\\![0-9]{4}[-]+[0-9]{2}[-]+[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},){1,}'
I came up with the following statement, but it is not returning any rows even when the columns contain content matching the regex
SELECT a.*,b.*
FROM a JOIN
b
ON b.POST_ID=a.ID
WHERE a.POST RLIKE '\\[cd(i|b)?(=[a-z0-9]+)?\\].+\\[/cd(i|b)?\\]' AND
b.UPPERS REGEXP '(\\|[0-9]+\\![0-9]{4}[-]+[0-9]{2}[-]+[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},){1,}'
Summary:
I want to select records where a user has sent content that matches this regex
'\\[cd(i|b)?(=[a-z0-9]+)?\\].+\\[/cd(i|b)?\\]'
and then check if that very post has received at least two ups(or likes) using the regex
'(\\|[0-9]+\\![0-9]{4}[-]+[0-9]{2}[-]+[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},){2,}'
which can be broken down as simply:
a prefix pipe: |
a user id: [0-9]+
an exclamation mark: !
a datetime: [0-9]{4}[-]+[0-9]{2}[-]+[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}
and a suffix: ,
NOTE: {2,} simply to check how many times the match occurs
Please can someone point me in the right direction as to what I am doing wrong.
Sample table data:
Table A
ID | POST
23 match found [cd=plain]6h+#gtyr[/cd]
24 match found [cd]65#%gte2!iu[/cd]
25 match found [cdi]*tre&y^g82u[/cdi]
26 no match found *tre&y^g82u
27 no match found rtyure99
28 match found [cdb]aha87ulchr[/cdb]
Table B
ID | POST_ID | UPPERS
4 24 |98!2018-02-10 22:43:03,
|35!2018-02-08 20:42:09,
|3!2018-02-05 02:05:07,
5 26 |2!2018-02-10 22:43:03,
|30!2018-02-08 20:42:09,
6 25 |21!2018-02-10 22:43:03,
7 27 |23!2018-02-10 22:43:03,
|11!2018-02-08 20:42:09,
NOTE: POST_ID in table B is a foreign key referencing ID of table A
If you don't mind, I'm actually going to answer the question that lies beneath your actual question. I'm sure we could work through why the regular expression is not working as you expect, but it begs the question: why use regular expressions for such a simple task?
It happens a lot that people first just use a database to stash stuff that is the same format that appears in the code. But if you take a little time to break down your data in a meaningful way, you can unlock a lot of power from humble MySQL.
Think about the question you want this query to answer:
Which posts that match certain criteria have been upped?
As you already realized, that suggests two tables - one to store information about the posts, and another to store information about who upped them. To make your queries fast and easy, think about which attributes of the information are going to show up in your where clause.
You want posts that are enclosed by certain markup. To make your search more efficient, put the markup tag in its own column:
CREATE TABLE `posts` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`tag` enum('cd','cdi','cdb') DEFAULT NULL,
`tag_value` varchar(11) DEFAULT NULL,
`content` text NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
for the data you list above, the table might look something like:
+-----+------+-----------+-------------+
| id | tag | tag_value | content |
+-----+------+-----------+-------------+
| 23 | cd | plain | 6h+#gtyr |
| 24 | cd | NULL | 65#%gte2!iu |
| 25 | cdi | NULL | *tre&y^g82u |
| 26 | NULL | NULL | *tre&y^g82u |
| 27 | NULL | NULL | rtyure99 |
| 28 | cdb | NULL | aha87ulchr |
+-----+------+-----------+-------------+
It takes a little more work to put your data IN (this is where your regex powers are better applied, as you create the INSERT), but now you can do all sorts of things with it quite easily. I used an ENUM for the tag column, because that is extra-fast to search on. If you have a large number of tags or don't know what they will all be, you can just use a VARCHAR instead.
So how do you track UPPERS? That part gets very easy. All you need is a table with a row for each time someone ups something:
CREATE TABLE `uppers` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`post_id` int(11) DEFAULT NULL,
`time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Currently when someone ups something, you have to go find the relevant record, append new data to it, then save it back. Now you can just slap a record into the table. The time will be set automatically; all you need to insert is the user_id and post_id. Some of your data might look like:
+----+---------+---------+---------------------+
| id | user_id | post_id | time |
+----+---------+---------+---------------------+
| 2 | 98 | 24 | 2018-02-10 15:23:03 |
| 3 | 35 | 24 | 2018-02-10 15:23:23 |
| 4 | 27 | 24 | 2018-02-10 15:23:43 |
| 5 | 2 | 26 | 2018-02-10 15:24:16 |
| 6 | 30 | 26 | 2018-02-10 15:24:28 |
+----+---------+---------+---------------------+
Now you can harness the power of the MySQL engine to capture all the information you need:
All posts with the desired tags:
SELECT * FROM posts where tag IN ('cd', 'cdi', 'cdb')
All posts with the desired tags and at least one up:
SELECT posts.*, uppers.user_id, uppers.time
FROM posts
INNER JOIN uppers ON posts.id = uppers.post_id
WHERE tag IN ('cd', 'cdi', 'cdb')
That will return a row for each post-upper combination. The INNER JOIN means it will not return any posts that don't have a match in the uppers table. This may be what you are looking for, but if you want to group the ups together by post ID, you can ask MySQL to group them for you:
SELECT posts.*, COUNT(uppers.user_id)
FROM posts
INNER JOIN uppers ON posts.id = uppers.post_id
WHERE tag IN ('cd', 'cdi', 'cdb')
GROUP BY posts.id
If you want to rule out duplicate ups by the same user, you can easily only count unique user id's for each post:
SELECT posts.*, COUNT(DISTINCT uppers.user_id)
FROM posts
INNER JOIN uppers ON posts.id = uppers.post_id
WHERE tag IN ('cd', 'cdi', 'cdb')
GROUP BY posts.id
There are many functions like COUNT() you can use to work with the data that gets grouped together. You could MAX(uppers.time) to get the time of the most recent up for that post, or you can use functions like GROUP_CONCAT() to put the values together in a long string.
The bottom line is that by breaking down your data into its fundamental pieces, you allow MySQL (or any other relational database) to work much more efficiently, and life gets much, much easier.
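The original question asked for posts with at least two ups, which in this normalized layout becomes a HAVING clause on the grouped count. A minimal sketch, using Python's built-in sqlite3 module as a stand-in for MySQL (so the ENUM column is plain TEXT here), with the posts and uppers tables from this answer and the ups taken from the question's sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id        INTEGER PRIMARY KEY,
        tag       TEXT,            -- ENUM('cd','cdi','cdb') in the MySQL version
        tag_value TEXT,
        content   TEXT NOT NULL
    );
    CREATE TABLE uppers (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER,
        post_id INTEGER,
        time    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    INSERT INTO posts VALUES
        (23, 'cd',  'plain', '6h+#gtyr'),
        (24, 'cd',  NULL,    '65#%gte2!iu'),
        (25, 'cdi', NULL,    '*tre&y^g82u'),
        (26, NULL,  NULL,    '*tre&y^g82u'),
        (27, NULL,  NULL,    'rtyure99'),
        (28, 'cdb', NULL,    'aha87ulchr');
    INSERT INTO uppers (user_id, post_id) VALUES
        (98, 24), (35, 24), (3, 24),
        (2, 26), (30, 26),
        (21, 25),
        (23, 27), (11, 27);
""")

# Tagged posts that were upped by at least two different users.
popular = conn.execute("""
    SELECT posts.id, posts.tag, COUNT(DISTINCT uppers.user_id) AS ups
    FROM posts
    INNER JOIN uppers ON uppers.post_id = posts.id
    WHERE posts.tag IN ('cd', 'cdi', 'cdb')
    GROUP BY posts.id, posts.tag
    HAVING COUNT(DISTINCT uppers.user_id) >= 2
    ORDER BY posts.id
""").fetchall()
print(popular)
```

Post 25 is tagged but has only one up, and posts 26 and 27 have two ups but no tag, so only post 24 qualifies.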

How to structure mySQL query to find related information sorted in a particular order

I have a number of tables in my database and I need help in structuring queries that are quick and efficient. With the different queries that I have written so far, either the results have been inconsistent or too big (returning more information than I need, so that I have to use PHP later to constrain the results). Here is the background.
Our database handles leads for the senior housing industry. Each lead has many notes (a sales history) that not only give a history of the sale but inform the user of the next follow up date (actionDate). There are also many different statuses (i.e., active, top 10, move in, etc.) each lead can be assigned to (though not at the same time). The status of a lead is a history of the progression of a lead through the sales process. We can see what status the lead was and when.
In the base "lead" table each lead has a Primary Key called "inquiryID" that auto increments. This key is referenced in most other tables to relate them to the "lead" table. Here is the structure of the "lead" table.
TABLE: lead (~500 rows)
+-------------------+------------+-------+--------+
| Field | Type | Key | Extra |
+-------------------+------------+-------+--------+
| inquiryID | int(11) | PK | AI |
| communityID | int(3) | | |
| initialDate | date | | |
| inquirySource | tinytext | | |
| inquiryType | tinytext | | |
+-------------------+------------+-------+--------+
Another table is titled "leadNote". This table handles the sales journal for each lead. Basically a salesperson would enter in the date the note was written (date) who is writing the note (salesCounselor), the note itself (note), who is to follow up with the lead (actionCounselor), and what date they will follow up (actionDate).
TABLE: leadNote (~15000 rows)
+-------------------+------------+-------+--------+
| Field | Type | Key | Extra |
+-------------------+------------+-------+--------+
| inquiryNoteID | int(11) | PK | AI |
| inquiryID | int(11) | FK | |
| date | date | | |
| salesCounselor | tinytext | | |
| note | text | | |
| actionCounselor | int(5) | | |
+-------------------+------------+-------+--------+
The final table I will reference is titled "leadStatusHistory". This table handles the history of the status of this lead. A lead can have many different statuses, but not at the same time. We want to be able to track what a lead's status is and when. A lead would have a status (leadStatus), a date the status was assigned to them (statusDate), and who assigned the status to them (author) among other gathered data.
TABLE: leadStatusHistory (~1200 rows)
+-------------------+-------------+-------+--------+
| Field | Type | Key | Extra |
+-------------------+-------------+-------+--------+
| historyID | int(11) | PK | AI |
| inquiryID | int(11) | FK | |
| leadStatus | tinytext | | |
| date | datetime | | |
| communityID | int(3) | | |
| timestamp | timestamp | | |
+-------------------+-------------+-------+--------+
My goal is to be able to run a query that returns the inquiryID, actionCounselor, actionDate, and current leadStatus. As I said earlier the many different queries that I have tried have brought mixed results. There are two types of ways that I want to gather this list. 1) find all leads that have a next contact date that is less than or equal to today (this is the list of leads scheduled to follow up with today). 2) find all leads that match a certain leadStatus currently (i.e., to look up all leads that are currently with a status of "move in").
This is how I would ORDER the tables to get the information that I am looking for.
1) Find inquiryID, actionCounselor (value in the actionCounselor column on the most recently created "leadNote" row or "date" that is the greatest), actionDate (value in the actionDate column on the most recently created "leadNote" row or "date" that is the greatest), and leadStatus (value in the leadStatus column on the most recently created "leadStatusHistory" row or "timestamp" that is the greatest) WHERE the actionDate (value in the actionDate column on the most recently created "leadNote" row or "date" that is the greatest) is less than or equal to today.
2) Find inquiryID, actionCounselor (value in the actionCounselor column on the most recently created "leadNote" row or "date" that is the greatest), actionDate (value in the actionDate column on the most recently created "leadNote" row or "date" that is the greatest), and leadStatus (value in the leadStatus column on the most recently created "leadStatusHistory" row or "timestamp" that is the greatest) WHERE leadStatus (value in the leadStatus column on the most recently created "leadStatusHistory" row or "timestamp" that is the greatest) is equal to "move in".
Here are some examples of current and past queries with my comments as to what is wrong with them.
query #1:
SELECT
tt.inquiryID,
tt.actionDate,
tt.date,
tt.actionCounselor,
(SELECT
leadstatushistory.leadstatus
FROM
leadstatushistory
WHERE
leadstatushistory.inquiryID = tt.inquiryID AND leadstatushistory.historyID = (SELECT
MAX(leadstatushistory.historyID) as historyID
FROM
leadstatushistory
WHERE
inquiryID = tt.inquiryID)) AS leadStatus
FROM
leadnote tt
INNER JOIN
(SELECT
inquiryID,
MAX(inquiryNoteID) as inquiryNoteID,
MAX(leadnote.actionDate) AS actionDate
FROM
leadnote
GROUP BY inquiryID) groupedtt ON tt.inquiryID = groupedtt.inquiryID AND tt.inquiryNoteID = groupedtt.inquiryNoteID
WHERE
tt.actionDate <= '2012-08-27' AND tt.actionDate != '0000-00-00' AND (SELECT
leadstatushistory.leadstatus
FROM
leadstatushistory
WHERE
leadstatushistory.inquiryID = tt.inquiryID AND leadstatushistory.historyID =
(SELECT
MAX(leadstatushistory.historyID) as historyID
FROM
leadstatushistory
WHERE
inquiryID = tt.inquiryID)) != 'Resident' AND tt.communityID = 4
GROUP BY tt.inquiryID
COMMENTS: Gave me the columns I needed, but I have had complaints that now and then the "actionDate" column does not reflect the date of the most recently created leadNote row, and sometimes the leadStatus was wrong. For example, the max(historyID) for the leadStatusHistory table is not necessarily the most recent status that we want to find. Sometimes our employees will go back and fill in missing leadStatus for leads in the past. This creates a new leadStatusHistory row with a new auto increment historyID. In this case the most recent (or greatest) historyID does not have the greatest "leadStatusHistory.date", because the date the user entered was a past date (filling in past information so our historical records are accurate). We have the exact same problem with entering past notes into the leadNote table. The new auto increment inquiryNoteID does not necessarily match the row with the greatest "tt.date".
query #2:
SELECT
maxDate.inquiryID, maxDate.date, maxDate.actionDate, maxDate.actionCounselor
FROM
(SELECT
*
FROM
leadnote
ORDER BY date DESC , type ASC, inquiryNoteID DESC) as maxDate
LEFT JOIN
staff ON maxDate.actionCounselor = staff.staffID
WHERE
maxDate.communityID = 4
GROUP BY inquiryID
COMMENTS: Gave me the columns I needed, but it also finds the information for all leads. This wastes valuable time and makes the response slower. I then have to use PHP to constrain the data to show only those leads that have an actionDate of <= today and a date that isn't "0000-00-00" or I constrain the data to show only those leads with a leadStatus of "Move In". Again, this does give me the results that I am looking for, but it is slow. Also, if I add into the query "WHERE date<=[today] AND date != '0000-00-00'" in the subquery it changes the results so they are not accurate and then I still have to use PHP to constrain the results to show only those that are of the status that I am looking for.
By looking at the above information does anyone have any ideas of how to better structure my query so that I can quickly find the exact information that I am looking for. Or is there a way to change the structure or relationship of the tables to get the results I am looking for. Please, any help is appreciated.
My goal is to be able to run a query that returns the inquiryID, actionCounselor, actionDate, and current leadStatus.
You are seeking to find the groupwise maxima from your leadNote and leadStatusHistory tables: namely the records with the maximum dates within each group of inquiryID.
You can achieve this with a query along the following lines:
SELECT inquiryID, actionCounselor, actionDate, leadStatus
FROM (
    leadNote NATURAL JOIN (
        SELECT inquiryID, MAX(actionDate) AS actionDate
        FROM leadNote
        GROUP BY inquiryID
    ) AS latestNote
) JOIN (
    leadStatusHistory NATURAL JOIN (
        SELECT inquiryID, MAX(statusDate) AS statusDate
        FROM leadStatusHistory
        GROUP BY inquiryID
    ) AS latestStatus
) USING (inquiryID)
For the best performance, you should ensure that leadNote has a composite index on (inquiryID, actionDate) and that leadStatusHistory has a composite index on (inquiryID, statusDate, leadStatus):
ALTER TABLE leadNote ADD INDEX (inquiryID, actionDate);
ALTER TABLE leadStatusHistory ADD INDEX (inquiryID, statusDate, leadStatus);
There are two types of ways that I want to gather this list. 1) find all leads that have a next contact date that is less than or equal to today (this is the list of leads scheduled to follow up with today). 2) find all leads that match a certain leadStatus currently (i.e., to look up all leads that are currently with a status of "move in".
For 1), add WHERE actionDate <= CURRENT_DATE to the query above.
For 2), add WHERE leadStatus = 'move in' instead.
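As a rough, runnable illustration of the groupwise-maximum idea with both filters, here is a sketch using Python's built-in sqlite3 module as a stand-in for MySQL. It makes several assumptions that are not confirmed by the question: "most recent" is decided by the date column of each table (as described in the comments on query #1), ties are broken by the higher auto-increment id, leadNote really has an actionDate column, and all rows are invented. It uses a NOT EXISTS anti-join instead of the NATURAL JOIN form, and a fixed date is passed in place of CURRENT_DATE so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leadNote (
        inquiryNoteID INTEGER PRIMARY KEY, inquiryID INTEGER,
        date TEXT, actionCounselor INTEGER, actionDate TEXT
    );
    CREATE TABLE leadStatusHistory (
        historyID INTEGER PRIMARY KEY, inquiryID INTEGER,
        leadStatus TEXT, date TEXT
    );
    -- Invented data. Note 3 and status 3 are back-filled history: they have
    -- the highest ids for lead 100 but the oldest dates.
    INSERT INTO leadNote VALUES
        (1, 100, '2012-08-01', 7, '2012-08-10'),
        (2, 100, '2012-08-20', 7, '2012-08-27'),
        (3, 100, '2012-07-15', 9, '2012-07-20'),
        (4, 200, '2012-08-22', 8, '2012-09-15');
    INSERT INTO leadStatusHistory VALUES
        (1, 100, 'active',  '2012-08-01 09:00:00'),
        (2, 100, 'top 10',  '2012-08-20 09:00:00'),
        (3, 100, 'inquiry', '2012-07-01 09:00:00'),
        (4, 200, 'move in', '2012-08-22 09:00:00');
""")

# Keep a note (or status) row only if no newer row exists for the same lead.
LATEST_SQL = """
    SELECT n.inquiryID, n.actionCounselor, n.actionDate, s.leadStatus
    FROM leadNote AS n
    JOIN leadStatusHistory AS s ON s.inquiryID = n.inquiryID
    WHERE NOT EXISTS (
              SELECT 1 FROM leadNote AS n2
              WHERE n2.inquiryID = n.inquiryID
                AND (n2.date > n.date
                     OR (n2.date = n.date AND n2.inquiryNoteID > n.inquiryNoteID)))
      AND NOT EXISTS (
              SELECT 1 FROM leadStatusHistory AS s2
              WHERE s2.inquiryID = s.inquiryID
                AND (s2.date > s.date
                     OR (s2.date = s.date AND s2.historyID > s.historyID)))
"""

# 1) leads due for follow-up on or before a given day
due = conn.execute(
    LATEST_SQL + " AND n.actionDate <= ? ORDER BY n.inquiryID", ("2012-08-27",)
).fetchall()

# 2) leads whose current status is 'move in'
move_in = conn.execute(
    LATEST_SQL + " AND s.leadStatus = ? ORDER BY n.inquiryID", ("move in",)
).fetchall()

print(due)
print(move_in)
```

Because the filters are applied to the rows that survive the anti-join, back-filled notes and statuses with high ids but old dates do not leak into the result.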