MySQL - Combining two select statements into one result with LIMIT efficiently

For a dating application, I have a few tables that I need to query for a single output, with a LIMIT 10 applied to both queries combined. Querying them separately is not an issue, but a fixed split of the limit won't work because the row counts vary (e.g. not LIMIT 5 and LIMIT 5: one query may return 0 rows while the other returns 10, depending on the scenario).
members table
member_id | member_name
------------------------
1 Herb
2 Karen
3 Megan
dating_requests
request_id | member1 | member2 | request_time
----------------------------------------------------
1 1 2 2012-12-21 12:51:45
dating_alerts
alert_id | alerter_id | alertee_id | type | alert_time
-------------------------------------------------------
5 3 2 platonic 2012-12-21 10:25:32
dating_alerts_status
status_id | alert_id | alertee_id | viewed | viewed_time
-----------------------------------------------------------
4 5 2 0 0000-00-00 00:00:00
Imagine you are Karen and just logged in, you should see these 2 items:
1. Herb requested a date with you.
2. Megan wants a platonic relationship with you.
In one query with a LIMIT of 10. Instead here are two queries that need to be combined:
1. Herb requested a date with you.
-> query = "SELECT dr.request_id, dr.member1, dr.member2, m.member_name
FROM dating_requests dr
JOIN members m ON dr.member1=m.member_id
WHERE dr.member2=:loggedin_id
ORDER BY dr.request_time LIMIT 5";
2. Megan wants a platonic relationship with you.
-> query = "SELECT da.alert_id, da.alerter_id, da.alertee_id, da.type,
da.alert_time, m.member_name
FROM dating_alerts da
JOIN dating_alerts_status das ON da.alert_id=das.alert_id
AND da.alertee_id=das.alertee_id
JOIN members m ON da.alerter_id=m.member_id
WHERE da.alertee_id=:loggedin_id AND da.type='platonic'
AND das.viewed='0' AND das.viewed_time<da.alert_time
ORDER BY da.alert_time LIMIT 5";
Again, sometimes both tables may be empty, or 1 table may be empty, or both full (where LIMIT 10 kicks in) and ordered by time. Any ideas on how to get a query to perform this task efficiently? Thoughts, advice, chimes, optimizations are welcome.

You can combine multiple queries with UNION, but only if the queries have the same number of columns. Ideally the columns are the same, not only in data type, but also in their semantic meaning; however, MySQL doesn't care about the semantics and will handle differing datatypes by casting up to something more generic - so if necessary you could overload the columns to have different meanings from each table, then determine what meaning is appropriate in your higher level code (although I don't recommend doing it this way).
When the number of columns differs, or when you want to achieve a better/less overloaded alignment of data from two queries, you can insert dummy literal columns into your SELECT statements. For example:
SELECT t.cola, t.colb, NULL, t.colc, NULL FROM t;
You could even have some columns reserved for the first table and others for the second table, such that they are NULL elsewhere (but remember that the column names come from the first query, so you may wish to ensure they're all named there):
SELECT a, b, c, d, NULL AS e, NULL AS f, NULL AS g FROM t1
UNION ALL -- specify ALL because default is DISTINCT, which is wasted here
SELECT NULL, NULL, NULL, NULL, a, b, c FROM t2;
You could try aligning your two queries in this fashion, then combining them with a UNION operator; by applying LIMIT to the UNION, you're close to achieving your goal:
(SELECT ...)
UNION
(SELECT ...)
LIMIT 10;
The only issue that remains is that, as presented above, 10 or more records from the first table will "push out" any records from the second. However, we can utilise an ORDER BY in the outer query to solve this.
Putting it all together:
(
SELECT
dr.request_time AS event_time, m.member_name, -- shared columns
dr.request_id, dr.member1, dr.member2, -- request-only columns
NULL AS alert_id, NULL AS alerter_id, -- alert-only columns
NULL AS alertee_id, NULL AS type
FROM dating_requests dr JOIN members m ON dr.member1=m.member_id
WHERE dr.member2=:loggedin_id
ORDER BY event_time LIMIT 10 -- save ourselves performing excessive UNION
) UNION ALL (
SELECT
da.alert_time AS event_time, m.member_name, -- shared columns
NULL, NULL, NULL, -- request-only columns
da.alert_id, da.alerter_id, da.alertee_id, da.type -- alert-only columns
FROM
dating_alerts da
JOIN dating_alerts_status das USING (alert_id, alertee_id)
JOIN members m ON da.alerter_id=m.member_id
WHERE
da.alertee_id=:loggedin_id
AND da.type='platonic'
AND das.viewed='0'
AND das.viewed_time<da.alert_time
ORDER BY event_time LIMIT 10 -- save ourselves performing excessive UNION
)
ORDER BY event_time
LIMIT 10;
Of course, now it's up to you to determine what type of row you're dealing with as you read each record in the resultset (suggest you test request_id and/or alert_id for NULL values; alternatively one could add an additional column to the results that explicitly states from which table each record originated, but it should be equivalent provided those id columns are NOT NULL).
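As a quick check of the padded-UNION idea, here is a minimal Python sketch that runs a cut-down version of the combined query against an in-memory SQLite database standing in for MySQL. The explicit source column, the omission of dating_alerts_status, and the SELECT * FROM (...) wrappers (SQLite does not accept parenthesised members of a UNION) are assumptions made for this sketch, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (member_id INTEGER PRIMARY KEY, member_name TEXT);
CREATE TABLE dating_requests (request_id INTEGER PRIMARY KEY,
                              member1 INT, member2 INT, request_time TEXT);
CREATE TABLE dating_alerts (alert_id INTEGER PRIMARY KEY, alerter_id INT,
                            alertee_id INT, type TEXT, alert_time TEXT);
INSERT INTO members VALUES (1, 'Herb'), (2, 'Karen'), (3, 'Megan');
INSERT INTO dating_requests VALUES (1, 1, 2, '2012-12-21 12:51:45');
INSERT INTO dating_alerts VALUES (5, 3, 2, 'platonic', '2012-12-21 10:25:32');
""")

# Each branch is limited on its own, then the union is sorted and limited again.
# The literal 'request' / 'alert' column tells the caller which table a row came from.
sql = """
SELECT * FROM (
    SELECT 'request' AS source, dr.request_time AS event_time, m.member_name
    FROM dating_requests dr JOIN members m ON dr.member1 = m.member_id
    WHERE dr.member2 = :me
    ORDER BY event_time LIMIT 10
)
UNION ALL
SELECT * FROM (
    SELECT 'alert', da.alert_time, m.member_name
    FROM dating_alerts da JOIN members m ON da.alerter_id = m.member_id
    WHERE da.alertee_id = :me AND da.type = 'platonic'
    ORDER BY da.alert_time LIMIT 10
)
ORDER BY event_time
LIMIT 10
"""
rows = conn.execute(sql, {"me": 2}).fetchall()  # 2 is Karen
for row in rows:
    print(row)
```

With the sample data from the question, Karen gets Megan's alert first and Herb's request second, because the outer ORDER BY sorts both sources by time before the final LIMIT is applied.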

Related

SQL Validate a column with the same column

I have the following situation. I have a table with all the article information, and I would like to compare a column with itself, because I have multiple types of article: single products and master products. The only way I can tell them apart is by SKU. For example:
ID | SKU
1 | 11111
2 | 11112
3 | 11113
4 | 11113-5
5 | 11113-8
6 | 11114
7 | 11115
8 | 11115-1-W
9 | 11115-2
10 | 11116
I only want to list and/or count the SKUs that are fully unique. In the example, the SKUs that are unique and have no variants are ID = 1, 2, 6 and 10. I want a query where, if 11113 appears again in the column (as 11113-5, say), it is not counted, so the total would be 4 unique SKUs and not 6. Please let me know if this is possible.
Assuming the length of master SKUs are 5 characters, try this:
select a.*
from mytable a
-- the '-' keeps a master SKU from matching its own row
left join mytable b on b.sku like concat(a.sku, '-%')
where length(a.sku) = 5
and b.sku is null
This query joins master SKUs to child ones, but filters out successful joins - leaving only solitary master SKUs.
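To try the anti-join on the sample data, here is a small Python sketch using an in-memory SQLite database (an assumption; SQLite spells string concatenation as || rather than concat()). The LIKE pattern requires a hyphen after the master SKU, so a row can never match itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, sku TEXT)")
skus = ["11111", "11112", "11113", "11113-5", "11113-8",
        "11114", "11115", "11115-1-W", "11115-2", "11116"]
conn.executemany("INSERT INTO mytable (sku) VALUES (?)", [(s,) for s in skus])

# Anti-join: keep 5-character master SKUs for which no "<sku>-..." variant exists.
solitary_ids = [row[0] for row in conn.execute("""
    SELECT a.id
    FROM mytable a
    LEFT JOIN mytable b ON b.sku LIKE a.sku || '-%'
    WHERE length(a.sku) = 5
      AND b.sku IS NULL
    ORDER BY a.id
""")]
print(solitary_ids)
```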
You can do this by grouping and counting the unique rows.
First, we will need to take your table and add a new column, MasterSKU. This will be the first five characters of the SKU column. Once we have the MasterSKU, we can then GROUP BY it. This will bundle together all of the rows having the same MasterSKU. Once we are grouping we get access to aggregate functions like COUNT(). We will use that function to count the number of rows for each MasterSKU. Then, we will filter out any rows that have a COUNT() over 1. That will leave you with only the unique rows remaining.
Take that unique list and LEFT JOIN it back into your original table to grab the IDs.
SELECT B.ID, A.MasterSKU
FROM (
SELECT
SUBSTRING(SKU,1,5) AS MasterSKU,
COUNT(*) AS MasterSKUCount
FROM MyTable
GROUP BY SUBSTRING(SKU,1,5)
HAVING COUNT(*) = 1
) AS A
LEFT JOIN (
SELECT
ID,
SUBSTRING(SKU,1,5) AS MasterSKU
FROM MyTable
) AS B
ON A.MasterSKU = B.MasterSKU
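The grouping approach can be checked the same way. This is a minimal sketch against in-memory SQLite (an assumption; SQLite's substr() plays the role of SUBSTRING() here), using the sample SKUs from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER PRIMARY KEY, SKU TEXT)")
skus = ["11111", "11112", "11113", "11113-5", "11113-8",
        "11114", "11115", "11115-1-W", "11115-2", "11116"]
conn.executemany("INSERT INTO MyTable (SKU) VALUES (?)", [(s,) for s in skus])

# Group by the 5-character prefix, keep prefixes that occur exactly once,
# then join back to the table to pick up the ID of each surviving row.
unique_rows = conn.execute("""
    SELECT B.ID, A.MasterSKU
    FROM (
        SELECT substr(SKU, 1, 5) AS MasterSKU
        FROM MyTable
        GROUP BY substr(SKU, 1, 5)
        HAVING COUNT(*) = 1
    ) AS A
    LEFT JOIN MyTable AS B ON substr(B.SKU, 1, 5) = A.MasterSKU
    ORDER BY B.ID
""").fetchall()
print(unique_rows)
```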
Now, one thing I noticed from your example: the original SKU column really looks like three columns in one. We have multiple values joined with hyphens.
11115-1-W
There may be a reason for it, but most likely this violates first normal form and will make the database hard to query. It's part of the reason why such a complicated query is needed. If the SKU column really represents multiple things then we may want to consider breaking it out into MasterSKU, Version, and Color or whatever each hyphen represents.

Retrieve rows that have a first entry in 2014 in MySQL

I want to retrieve all rows from a table that have their first entry on or after 01/01/2014 but no later than 31/12/2014
Example of the table:
OID FK_OID Treatment Trt_DATE
1 100 19304 2011-05-24
2 100 19304 2011-08-01
3 100 19306 2014-03-05
4 200 19305 2012-02-02
5 300 19308 2014-01-20
6 400 19308 2014-06-06
For example, I would like to pull all entries that STARTED treatment in 2014. So above I would like to extract FK_OIDs 300 and 400 because their first entry is in 2014, but I would like to omit FK_OID 100 because they have 2 entries prior to 2014.
How do i go about this? I can extract all entries within a date range etc but that brings back all entries for that date and doesn't omit anyone who has an entry prior to the start of the date range. It just returns their first entry in 2014.
For the ones who need to see that i have tried something. See below.
I am not an experienced coder and this is the best i can get because i don't have the knowledge.
SELECT
mod,
(select NHSNum from person p
WHERE
p.oid = t.fk_oid) as 'NHS'
FROM
timeline t
Where trt_date BETWEEN '2014-01-01' AND '2014-12-31'
ORDER BY trt_date ASC
This returns every treatment for 2014 regardless of whether it is the first ever one for that person. I want to omit anyone from this list who has had treatment before 01/01/2014 as well as only return the first treatment per person. For example, this code returns all treatments for all people in 2014. I only want their first one and only if it is their first one ever.
Thanks.
create table aThing
( oid int auto_increment primary key,
fk_oid int not null,
treatment int not null,
trt_date date not null
);
insert aThing (fk_oid,treatment,trt_date) values
(100, 19304, '2011-05-24'),
(100, 19304, '2011-08-01'),
(100, 19306, '2014-03-05'),
(200, 19305, '2012-02-02'),
(300, 19308, '2014-01-20'),
(400, 19308, '2014-06-06');
select fk_oid,dt
from
( select fk_oid,min(trt_date) as dt
from aThing
group by fk_oid
) xDerived
where year(dt)=2014;
+--------+------------+
| fk_oid | dt |
+--------+------------+
| 300 | 2014-01-20 |
| 400 | 2014-06-06 |
+--------+------------+
The inner, nested part becomes a derived table and is given the name xDerived. It is not a physical table, just a result set, but giving it a name lets the outer query refer to it.
The derived table is a very simple GROUP BY with an aggregate function: for every fk_oid it brings back exactly one row, holding the minimum trt_date.
So if aThing has 10 million rows but only 17 distinct values of fk_oid, it returns only 17 rows, each holding the earliest trt_date for its fk_oid.
The outer wrapper then selects those two columns and applies the year check. The wrapper is needed because an alias of an aggregate column, such as dt, is not available in the WHERE clause of the query that defines it. Once the aggregate is wrapped in a derived table, dt is an ordinary column of that table, and the outer WHERE clause can filter on it.
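For comparison, the sketch below runs the derived-table form next to a HAVING form, which filters on the aggregate directly without a wrapper. It uses in-memory SQLite as a stand-in for MySQL (an assumption), so the year check is written as a date range instead of year(); the HAVING variant is an alternative added here, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE aThing (oid INTEGER PRIMARY KEY, fk_oid INT,
                                     treatment INT, trt_date TEXT)""")
conn.executemany(
    "INSERT INTO aThing (fk_oid, treatment, trt_date) VALUES (?, ?, ?)",
    [(100, 19304, "2011-05-24"), (100, 19304, "2011-08-01"),
     (100, 19306, "2014-03-05"), (200, 19305, "2012-02-02"),
     (300, 19308, "2014-01-20"), (400, 19308, "2014-06-06")])

# Derived table: the aggregate gets an alias inside, the filter runs outside.
derived = conn.execute("""
    SELECT fk_oid, dt
    FROM (SELECT fk_oid, MIN(trt_date) AS dt FROM aThing GROUP BY fk_oid) AS xDerived
    WHERE dt BETWEEN '2014-01-01' AND '2014-12-31'
    ORDER BY fk_oid
""").fetchall()

# HAVING: filters groups after aggregation, so no wrapper is needed.
having = conn.execute("""
    SELECT fk_oid, MIN(trt_date) AS dt
    FROM aThing
    GROUP BY fk_oid
    HAVING MIN(trt_date) BETWEEN '2014-01-01' AND '2014-12-31'
    ORDER BY fk_oid
""").fetchall()
print(derived, having)
```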
There's probably a more elegant solution, but this seems to satisfy the requirement...
SELECT x.*
FROM my_table x
JOIN
( SELECT fk_oid
, MIN(trt_date) min_date
FROM my_table
GROUP
BY fk_oid
HAVING min_date BETWEEN '2014-01-01' AND '2014-12-31'
) a
ON a.fk_oid = x.fk_oid
AND a.min_date = x.trt_date;
Having a few years of experience with this, I decided to revisit it. The solution I now use regularly is:
SELECT t1.column1, t1.column2
FROM MyTable AS t1
LEFT OUTER JOIN MyTable AS t2
ON t1.fkoid = t2.fkoid
AND (t1.date > t2.date
OR (t1.date = t2.date AND t1.oid > t2.oid))
WHERE t2.fkoid IS NULL
AND t1.date >= '2014-01-01' AND t1.date <= '2014-12-31'
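Here is a minimal sketch of that self-join pattern on the sample data, run against in-memory SQLite (an assumption). The table and column names (aThing, fk_oid, trt_date) are borrowed from the earlier answer rather than the generic names above, and the date range is limited to 2014 to match the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE aThing (oid INTEGER PRIMARY KEY, fk_oid INT,
                                     treatment INT, trt_date TEXT)""")
conn.executemany(
    "INSERT INTO aThing (fk_oid, treatment, trt_date) VALUES (?, ?, ?)",
    [(100, 19304, "2011-05-24"), (100, 19304, "2011-08-01"),
     (100, 19306, "2014-03-05"), (200, 19305, "2012-02-02"),
     (300, 19308, "2014-01-20"), (400, 19308, "2014-06-06")])

# t2 matches any earlier row for the same person (ties broken by oid).
# A row with no such match is that person's first treatment ever.
first_2014 = conn.execute("""
    SELECT t1.oid, t1.fk_oid, t1.treatment, t1.trt_date
    FROM aThing AS t1
    LEFT JOIN aThing AS t2
      ON t1.fk_oid = t2.fk_oid
     AND (t1.trt_date > t2.trt_date
          OR (t1.trt_date = t2.trt_date AND t1.oid > t2.oid))
    WHERE t2.fk_oid IS NULL
      AND t1.trt_date BETWEEN '2014-01-01' AND '2014-12-31'
    ORDER BY t1.fk_oid
""").fetchall()
print(first_2014)
```

Unlike the grouped query, this returns the complete first row per person, so the treatment code comes back along with the date.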

MySql order by specific ID values

Is it possible to sort in MySQL by "order by" using a predefined set of column values (ID) like order by (ID=1,5,4,3) so I would get records 1, 5, 4, 3 in that order out?
UPDATE: Why I need this...
I want my records to change sort randomly every 5 minutes. I have a cron task to update the table to put different, random sort order in it.
There is just one problem! PAGINATION.
I will have visitors who come to my page, and I will give them the first 20 results. They will wait 6 minutes, go to page 2 and have the wrong results as the sort order has already changed.
So I thought that if I put all the IDs into a session on page 2, we get the correct records even if the sorting had already changed.
Is there any other better way to do this?
You can use ORDER BY and FIELD function.
See http://lists.mysql.com/mysql/209784
SELECT * FROM table ORDER BY FIELD(ID,1,5,4,3)
It uses the FIELD() function, which "Returns the index (position) of str in the str1, str2, str3, ... list. Returns 0 if str is not found" according to the documentation. So you are actually sorting the result set by the return value of this function, which is the index of the field value in the given set.
You should be able to use CASE for this:
ORDER BY CASE id
WHEN 1 THEN 1
WHEN 5 THEN 2
WHEN 4 THEN 3
WHEN 3 THEN 4
ELSE 5
END
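For the pagination use case in the question, the CASE expression can be generated from the ID list kept in the session. The sketch below is one way to do that in Python against in-memory SQLite (an assumption; SQLite has no FIELD(), which is why CASE is used). The helper name order_by_ids and the items table are hypothetical.

```python
import sqlite3

def order_by_ids(conn, wanted):
    """Return all ids, with those in `wanted` first and in that exact order.

    Assumes `wanted` is non-empty; a CASE needs at least one WHEN branch.
    """
    # Positions are generated by us; the ids themselves are bound as parameters.
    whens = " ".join(f"WHEN ? THEN {pos}" for pos in range(len(wanted)))
    sql = (f"SELECT id FROM items "
           f"ORDER BY CASE id {whens} ELSE {len(wanted)} END, id")
    return [row[0] for row in conn.execute(sql, wanted)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 7)])

ordered = order_by_ids(conn, [1, 5, 4, 3])
print(ordered)
```

The trailing ", id" gives the unlisted rows a stable order; without it, rows that fall into the ELSE branch may come back in any order.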
On the official documentation for mysql about ORDER BY, someone has posted that you can use FIELD for this matter, like this:
SELECT * FROM table ORDER BY FIELD(id,1,5,4,3)
This is untested code that in theory should work.
SELECT * FROM table ORDER BY id='8' DESC, id='5' DESC, id='4' DESC, id='3' DESC
If I had 10 records, for example, the IDs 8, 5, 4 and 3 would appear first and the other records would follow.
Normal order
1
2
3
4
5
6
7
8
9
10
With this way
8
5
4
3
1
2
6
7
9
10
There's another way to solve this. Add a separate table, something like this:
CREATE TABLE `new_order` (
`my_order` BIGINT(20) UNSIGNED NOT NULL,
`my_number` BIGINT(20) NOT NULL,
PRIMARY KEY (`my_order`),
UNIQUE KEY `my_number` (`my_number`)
) ENGINE=INNODB;
This table will now be used to define your own order mechanism.
Add your values in there:
my_order | my_number
---------+----------
1 | 1
2 | 5
3 | 4
4 | 3
...and then modify your SQL statement while joining this new table.
SELECT *
FROM your_table AS T1
INNER JOIN new_order AS T2 on T1.id = T2.my_number
WHERE ....whatever...
ORDER BY T2.my_order;
This solution is slightly more complex than other solutions, but using this you don't have to change your SELECT statement whenever your order criteria change - just change the data in the order table.
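A short sketch of the ordering-table idea, using in-memory SQLite in place of MySQL (an assumption; the your_table contents are made up for illustration). The same SELECT is run twice, and only the data in new_order changes in between.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE new_order (my_order INTEGER PRIMARY KEY,
                        my_number INTEGER NOT NULL UNIQUE);
INSERT INTO your_table VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e');
INSERT INTO new_order VALUES (1, 1), (2, 5), (3, 4), (4, 3);
""")

query = """
    SELECT T1.id
    FROM your_table AS T1
    INNER JOIN new_order AS T2 ON T1.id = T2.my_number
    ORDER BY T2.my_order
"""
before = [row[0] for row in conn.execute(query)]

# Re-order by changing the data only; the SELECT statement stays the same.
conn.execute("DELETE FROM new_order")
conn.executemany("INSERT INTO new_order VALUES (?, ?)",
                 [(1, 3), (2, 4), (3, 5), (4, 1)])
after = [row[0] for row in conn.execute(query)]
print(before, after)
```

Because this is an INNER JOIN, rows whose id is missing from new_order (id 2 here) drop out of the result entirely.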
If you need to order a single id first in the result, use the id.
select id,name
from products
order by case when id=5 then -1 else id end
If you need to start with a sequence of multiple ids, specify a collection, similar to what you would use with an IN statement.
select id,name
from products
order by case when id in (30,20,10) then -1 else id end,id
If you want a single id to appear last in the result, sort on a CASE flag first. (E.g. you want the "Other" option last and the rest of the city list in alphabetical order.)
select id,city
from city
order by case
when id = 2 then 1 else 0
end, city ASC
For example, with five cities, this shows them in alphabetical order with the "Other" option last in the dropdown.
In my table "Other" has id 2, which is why the query above uses "when id = 2".
record in DB table:
Bangalore - id:1
Other - id:2
Mumbai - id:3
Pune - id:4
Ambala - id:5
my output:
Ambala
Bangalore
Mumbai
Pune
Other
SELECT * FROM table ORDER BY FIELD(columnname,1,2) ASC -- or DESC

GROUP BY does not remove duplicates

I have a watchlist system that I've coded, in the overview of the users' watchlist, they would see a list of records, however the list shows duplicates when in the database it only shows the exact, correct number.
I've tried GROUP BY watch.watch_id, GROUP BY rec.record_id, none of any types of group I've tried seems to remove duplicates. I'm not sure what I'm doing wrong.
SELECT watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
FROM
(
watchlist watch
LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id
LEFT OUTER JOIN members usr ON rec.user_id = usr.user_id
)
WHERE watch.user_id = 1
GROUP BY watch.watch_id
LIMIT 0, 25
The watchlist table looks like this:
+----------+---------+-----------+------------+
| watch_id | user_id | record_id | watch_date |
+----------+---------+-----------+------------+
| 13 | 1 | 22 | 1314038274 |
| 14 | 1 | 25 | 1314038995 |
+----------+---------+-----------+------------+
GROUP BY does not "remove duplicates". GROUP BY allows for aggregation. If all you want is to combine duplicated rows, use SELECT DISTINCT.
If you need to combine rows that are duplicate in some columns, use GROUP BY, but you need to specify what to do with the other columns. You can either omit them (by not listing them in the SELECT clause) or aggregate them (using functions like SUM, MIN, and AVG). For example:
SELECT watch.watch_id, COUNT(rec.street_number), MAX(watch.watch_date)
... GROUP by watch.watch_id
EDIT
The OP asked for some clarification.
Consider the "view" -- all the data put together by the FROMs and JOINs and the WHEREs -- call that V. There are two things you might want to do.
First, you might have completely duplicate rows that you wish to combine:
a b c
- - -
1 2 3
1 2 3
3 4 5
Then simply use DISTINCT
SELECT DISTINCT * FROM V;
a b c
- - -
1 2 3
3 4 5
Or, you might have partially duplicate rows that you wish to combine:
a b c
- - -
1 2 3
1 2 6
3 4 5
Those first two rows are "the same" in some sense, but clearly different in another sense (in particular, they would not be combined by SELECT DISTINCT). You have to decide how to combine them. You could discard column c as unimportant:
SELECT DISTINCT a,b FROM V;
a b
- -
1 2
3 4
Or you could perform some kind of aggregation on them. You could add them up:
SELECT a,b, SUM(c) "tot" FROM V GROUP BY a,b;
a b tot
- - ---
1 2 9
3 4 5
You could pick the smallest value:
SELECT a,b, MIN(c) "first" FROM V GROUP BY a,b;
a b first
- - -----
1 2 3
3 4 5
Or you could take the mean (AVG), the standard deviation (STD), and any of a bunch of other functions that take a bunch of values for c and combine them into one.
What isn't really an option is just doing nothing. If you just list the ungrouped columns, the DBMS will either throw an error (Oracle does that -- the right choice, imo) or pick one value more or less at random (MySQL). But as Dr. Peart said, "When you choose not to decide, you still have made a choice."
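The options above can be run side by side. This is a minimal Python sketch against in-memory SQLite (an assumption), with the partially duplicated rows stored in a table named V; ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE V (a INT, b INT, c INT)")
conn.executemany("INSERT INTO V VALUES (?, ?, ?)",
                 [(1, 2, 3), (1, 2, 6), (3, 4, 5)])

def run(sql):
    return conn.execute(sql).fetchall()

# DISTINCT over all columns does not merge rows that differ in c.
distinct_all = run("SELECT DISTINCT * FROM V ORDER BY a, b, c")
# Dropping c makes the first two rows identical, so they collapse.
distinct_ab = run("SELECT DISTINCT a, b FROM V ORDER BY a, b")
# GROUP BY keeps one row per (a, b) and says what to do with c.
summed = run("SELECT a, b, SUM(c) FROM V GROUP BY a, b ORDER BY a, b")
smallest = run("SELECT a, b, MIN(c) FROM V GROUP BY a, b ORDER BY a, b")

print(distinct_all, distinct_ab, summed, smallest, sep="\n")
```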
While SELECT DISTINCT may indeed work in your case, it's important to note why what you have is not working.
You're selecting fields that are outside of the GROUP BY. Although MySQL allows this, the exact values it returns for the non-GROUP BY fields are undefined.
If you wanted to do this with a GROUP BY try something more like the following:
SELECT watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
FROM
(
watchlist watch
LEFT OUTER JOIN est8_records rec ON rec.record_id = watch.record_id
LEFT OUTER JOIN est8_members usr ON rec.user_id = usr.user_id
)
WHERE watch.watch_id IN (
SELECT w.watch_id FROM watchlist w WHERE w.user_id = 1
GROUP BY w.watch_id)
LIMIT 0, 25
I would never recommend using SELECT DISTINCT; it can be really slow on big datasets.
Try using things like EXISTS.
You are grouping by watch.watch_id and you have two results, which have different watch IDs, so naturally they would not be grouped.
Also, from the results displayed they have different records. That looks like a perfectly valid, expected result. If you are trying to select only distinct values, then you don't want to GROUP; you want to select distinct values.
SELECT DISTINCT ...
If you say your watchlist table is unique, then one (or both) of the other tables either (a) has duplicates, or (b) is not unique by the key you are using.
To suppress duplicates in your results, either use DISTINCT as @Laykes says, or try
GROUP BY watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
It sort of sounds like you expect all 3 tables to be unique by their keys, though. If that is the case, you are simply masking some other problem with your SQL by trying to retrieve distinct values.

MySql - getting a row from latest non-null values in each column

I have a table and I would like to get a row containing all the latest non-null attributes for each column (without combining separate queries for each column, which doesn't seem elegant to me).
Example:
A B C Time
1 a 7 0
NULL NULL 3 1
3 NULL 4 2
NULL NULL 6 3
Result I seek:
A B C
3 a 6
As I said, I know how to select what I want for each column separately, but I was wondering if there's a better way to do it. No need to tax the poor database if it isn't needed.
Probably a better way than this, but it's Monday and I'm not quite conscious yet:
select @a:=null, @b:=null, @c:=null;
select A,B,C from (
select @a:=coalesce(A,@a) as A, @b:=coalesce(B,@b) as B, @c:=coalesce(C,@c) as C, time
from yourtable
order by time asc
) as y order by time desc limit 1;
Basically, iterate over each row in the table and build up the "latest" value as you go, then reverse the result set and select only the row with the highest time value.
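The same carry-forward idea can also be done on the client side. The sketch below is an alternative in Python that reads the rows once in time order from an in-memory SQLite database (an assumption; SQLite has no user variables, so the running values live in Python instead of in the query).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (A INT, B TEXT, C INT, time INT)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?)",
                 [(1, "a", 7, 0), (None, None, 3, 1),
                  (3, None, 4, 2), (None, None, 6, 3)])

# Walk the rows oldest-first; a non-NULL value replaces the remembered one,
# a NULL leaves it untouched (the same role COALESCE plays in the query above).
latest = [None, None, None]
for row in conn.execute("SELECT A, B, C FROM yourtable ORDER BY time ASC"):
    latest = [new if new is not None else old for new, old in zip(row, latest)]

print(latest)
```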