Select distinct column value from date range - mysql

I created a sqlfiddle that outlines what I'm trying to do:
http://sqlfiddle.com/#!9/41f3c/2
Basically, I need to search for unique posts in a table that contains meta information. The meta in this instance is a series of dates that represent exclusions (think of a booking system for a hotel).
I pass in a start and end date. I want to find the post_ids that do not have any excluded date falling within that range. I'm close, but can't quite figure it out.
SELECT DISTINCT post_id
FROM wp_facetwp_index
WHERE facet_name = 'date_range'
AND facet_value NOT BETWEEN '$start_date' AND '$end_date'
This works if all of a post's excluded dates fall inside the range, but if a post also has excluded dates outside the range, its post_id is still returned.
Thanks for looking.

Do not forget: in SQL, the filters (WHERE clause, etc.) are applied on a RECORD basis. Each record is evaluated independently from the others.
So, since
(1, 511, 'date_range', 'cf/excluded_dates', '2015-07-31', '2015-07-31')
validates your condition, 511 is returned.
Since post_id is not unique, you need to proceed with an exclusion on SETS, as opposed to the exclusion on RECORDS which you're doing right now.
Here is the solution (adjusted fiddle here: http://sqlfiddle.com/#!9/41f3c/7)
SELECT DISTINCT i1.`post_id`
FROM `wp_facetwp_index` i1
WHERE i1.`facet_name` = 'date_range'
AND NOT EXISTS (
    SELECT 1
    FROM `wp_facetwp_index` i2
    WHERE i2.`facet_value` BETWEEN '$start_date' AND '$end_date'
      AND i2.`facet_name` = 'date_range'
      AND i2.`post_id` = i1.`post_id`
)
The subquery right after EXISTS ( defines a subset of rows, correlated to the outer query through i2.post_id = i1.post_id. NOT EXISTS then rejects every post_id for which that subset is non-empty.
This is a negative intersection.
Excluding individual records does not work when the tuple you need to identify is not unique.
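For what it's worth, the same set-level exclusion can also be written as an anti-join (a LEFT JOIN that keeps only the unmatched rows); a sketch equivalent to the NOT EXISTS query above:
SELECT DISTINCT i1.`post_id`
FROM `wp_facetwp_index` i1
LEFT JOIN `wp_facetwp_index` i2
       ON i2.`post_id` = i1.`post_id`
      AND i2.`facet_name` = 'date_range'
      AND i2.`facet_value` BETWEEN '$start_date' AND '$end_date'
WHERE i1.`facet_name` = 'date_range'
  AND i2.`post_id` IS NULL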


MYSQL count unique rows even if count is zero, for single table

I have a table just like in this SQLfiddle - http://www.sqlfiddle.com/#!9/3b6f8a/1
I'm trying to count unique occurrences of visits, but I also need the dates where the count is zero. In the sample result you can see that for the date 2019-07-24 the 0 result for .bid is missing, and similarly for the 25th.
I have searched through Stack Overflow and found only solutions for multiple tables using LEFT JOIN. I have no idea how to apply that to a single table.
Below is the working query I have, which doesn't show zero values.
SELECT COUNT(DISTINCT `hash`) wizyt,
       DATE(timestamp) AS timestamp,
       CASE
           WHEN url LIKE '%.bid%' THEN 'bid'
           WHEN url LIKE '%.online%' THEN 'online'
       END AS landing
FROM `tracking_actions`
WHERE `action` = 'w90|landing-visit'
GROUP BY DATE(timestamp), landing
ORDER BY timestamp
Kindly please help :)
The thing is - in SQL you cannot normally create rows out of nothing. If you have a set of rows then you can reduce it in various ways (like group by or where or having), or you can multiply it with another set of rows (with various kinds of join). But you cannot create rows out of thin air, which is what you're trying to do. If there's a date with no activity, then you'll not have any rows for that and no way to get a row with that date.
So there are conceptually two ways to try to get around this. The more universal one is to create a calendar table with all the dates. Or maybe three tables years, months and days which you join together to get actual dates (though that gets a bit tricky). And then, when you have a row for each date, you can join your actual data to that, and you'll get your zeroes.
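For instance, here is a minimal sketch of that calendar-table approach against the question's data; the calendar table and its contents are assumed, and the landing split is omitted for brevity:
CREATE TABLE calendar (my_date DATE PRIMARY KEY);
-- ...populate calendar with every date in the reporting range...

SELECT c.my_date, COUNT(DISTINCT t.`hash`) AS wizyt
FROM calendar c
LEFT JOIN `tracking_actions` t
       ON DATE(t.timestamp) = c.my_date
      AND t.`action` = 'w90|landing-visit'
GROUP BY c.my_date;
-- dates with no matching rows come back with wizyt = 0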
Alternatively, you need to look in DB-specific extensions to SQL.
MariaDB has recursive common table expressions which can do this (see this related question for inspiration: How do I make a row generator in MySQL?)
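For instance, a minimal sketch of such a row generator, assuming MariaDB 10.2+ (MySQL 8.0 accepts the same syntax); the date bounds are placeholders:
WITH RECURSIVE calendar AS (
    SELECT DATE('2019-07-24') AS my_date
    UNION ALL
    SELECT my_date + INTERVAL 1 DAY
    FROM calendar
    WHERE my_date < '2019-07-31'
)
SELECT my_date FROM calendar;
-- one row per date, usable in place of a permanent calendar table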
It also has a "sequence storage engine" which creates pseudo-tables with numerical sequences. Those can be converted to dates.
For MySQL itself I don't know what could be used. Dig through the documentation; perhaps you'll find something that generates rows that you can adapt.
Last but not least, there's also the option to NOT use SQL for this. After getting the incomplete data set, process it in your programming language of choice (which you're probably doing anyway) and fill the gaps.
You should first use a CROSS JOIN to build the keys (date and landing) for the rows you need, and then use a LEFT JOIN to obtain the related count:
select a.my_date, a.landing, ifnull(t.wizyt, 0) wizyt
from (
    SELECT DISTINCT DATE(timestamp) my_date, l.landing
    FROM `tracking_actions`
    CROSS JOIN (
        SELECT DISTINCT CASE
                   WHEN url LIKE '%.bid%' THEN 'bid'
                   WHEN url LIKE '%.online%' THEN 'online'
               END AS landing
        FROM `tracking_actions`
        WHERE `action` = 'w90|landing-visit'
    ) l
    WHERE `action` = 'w90|landing-visit'
) a
LEFT JOIN (
    SELECT COUNT(DISTINCT `hash`) wizyt,
           DATE(timestamp) AS timestamp,
           CASE
               WHEN url LIKE '%.bid%' THEN 'bid'
               WHEN url LIKE '%.online%' THEN 'online'
           END AS landing
    FROM `tracking_actions`
    WHERE `action` = 'w90|landing-visit'
    GROUP BY DATE(timestamp), landing
) t ON t.landing = a.landing AND a.my_date = t.timestamp
ORDER BY a.my_date

Alteryx to select top N records where N=a value on that group

I'm in a fix with Alteryx. I'm trying to select the top N rows where N=a cell value for that partition. The business question is:
"We need to know, out of our orders (TicketIDs), those that have
least 1 combination of Type of discount item AND drink AND side."
The SQL query would join this table onto itself and partition to get the TopNtoIncludeInItems for that row, however, I just can't seem to find a way to do this in Alteryx. I've tried the community, but the question has gone ananswered.
In other words, select thusly:
<pseudocode>
for each (TicketID)
    for each (Type)
        select top(TopNtoIncludeInItems for this.TicketID)
    next
next
</pseudocode>
or indeed select just the green records
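For reference, the SQL I have in mind would be something like the sketch below, using window functions; the table name orders and the ordering column Item are placeholders, since only TicketID, Type and TopNtoIncludeInItems appear in my data:
SELECT TicketID, Type, Item
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY TicketID, Type
                              ORDER BY Item) AS rn,
           COUNT(*) OVER (PARTITION BY TicketID, Type) AS cnt
    FROM orders t
) x
WHERE rn <= TopNtoIncludeInItems   -- take at most TopN rows per group
  AND cnt >= TopNtoIncludeInItems; -- keep only groups that actually reach TopN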
Here's my solution:
MultiRow Formula: create a new field ComboCount (or whatever) as Int32, 0 or empty for rows that don't exist, Group By TicketID and Type, with the Expression [Row-1:ComboCount]+1 ... this counts up each group; we'll want the first TopN of each group, ensuring the group actually has that many, and not going beyond TopN.
Filter on [ComboCount] <= [TopN] ... which excludes unnecessary rows beyond TopN
Summarize: group by TicketID and Type, doing Max(ComboCount) ... if this value is less than TopN for any group, the group should be excluded.
Join the summary back to the earlier pre-summary data on TicketID and Type
Filter on [Max_ComboCount] = [TopN] ... this excludes the groups where any ItemType falls short of TopN
And that's it. Pictorially, this is what my workflow looks like, along with data results based on data similar to that in your screenshot:

Mysql, EXIST / NOT EXIST, asterisk as column name

An example from a book about MySql:
SELECT vendor_id, vendor_name, vendor_state
FROM vendors
WHERE NOT EXISTS
(SELECT *
FROM invoices
WHERE vendor_id = vendors.vendor_id)
"In this example, the correlated subquery selects all invoices that have the same vendor_id value as the current vendor in the outer query. Because the subquery doesn't actually return a result set, it doesn't matter what columns are included in the SELECT clause. As a result it's customary to just code an asterisk."
The invoices table has like 10 separate columns which look like this: http://prntscr.com/h3106k
I don't fully understand the asterisk part. Since there are 10 separate columns in this table, isn't it possible that some columns will be empty (or not empty), and couldn't we check for that? Is there no use in checking individual columns, and does it only make sense to check the table as a whole (so that nothing but the asterisk is needed here)?
In this example, a vendor is returned only when no invoice row satisfies the condition (WHERE vendor_id = vendors.vendor_id). EXISTS only checks whether a row exists; it never looks at the row's column values, so it does not matter which columns the subquery selects.
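Since the column list is never evaluated, SELECT 1 works just as well as the asterisk; a sketch equivalent to the book's example:
SELECT vendor_id, vendor_name, vendor_state
FROM vendors
WHERE NOT EXISTS
    (SELECT 1
     FROM invoices
     WHERE vendor_id = vendors.vendor_id)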
An alternative would be the following clause, maybe it is easier to understand:
SELECT vendor_id, vendor_name, vendor_state
FROM vendors
WHERE (
    SELECT COUNT(vendor_id)
    FROM invoices
    WHERE vendor_id = vendors.vendor_id
) = 0

Not selecting duplicates in join / where query

I've been trying to learn MySQL, and I'm having some trouble creating a join query to not select duplicates.
Basically, here's where I'm at :
SELECT atable.phonenumber, btable.date
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
However, in my database, there is the possibility of duplicate values in the atable.phonenumber column.
For example (added asterisks for clarity)
phonenumber    | date
---------------|-----------
*555-681-2105  | 2015-08-12
555-425-5161   | 2015-08-15
331-484-7784   | 2015-08-17
*555-681-2105  | 2015-08-25
.. and so on.
I tried using SELECT DISTINCT but that doesn't work. I also was looking through other solutions which recommended GROUP BY, but that threw an error, most likely because of my WHERE clause and condition. Not really sure how I can easily accomplish this.
DISTINCT applies to the whole row being returned, essentially saying "I want only unique rows": any column's value may participate in making the row unique.
You are getting duplicated phone numbers because you're only looking at that column in isolation. The database is looking at the phone number and the date together; the rows you posted have different dates, and that is what makes the rows distinct.
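Here's a minimal self-contained illustration, using the values from the example above: DISTINCT keeps both rows because the (phonenumber, date) combinations differ, even though the phone numbers match.
SELECT DISTINCT phonenumber, `date`
FROM (
    SELECT '555-681-2105' AS phonenumber, '2015-08-12' AS `date`
    UNION ALL
    SELECT '555-681-2105', '2015-08-25'
) x;
-- returns two rows, not one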
I suggest you do as the commenter recommended and decide what you want to do with the dates. If you want the latest date for a phone number, do this:
SELECT atable.phonenumber, max(btable.date)
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber
When you write a query that uses grouping, you will get one row per unique combination of values of the columns in the GROUP BY list: in this case, one row per unique phone number. But because you want other values as well (i.e. the date), you MUST use what's called an aggregate function to specify what to do with the various values that aren't part of the unique set. Sometimes it will be MAX or MIN, sometimes it will be SUM, COUNT, AVG and so on.
If you're familiar with hash tables or dictionaries from elsewhere in programming, this is what a GROUP BY is: it maps a set of values (a key) to the list of rows that have those key values, and the aggregate function is then applied to the values in the list associated with each key.
The simple rule when using group by (and one that MySQL will do implicitly for you) is to write queries thus:
SELECT
List,
of,
columns,
you,
want,
in,
unique,
combination,
FN(List),
FN(of),
FN(columns),
FN(you),
FN(want),
FN(aggregating)
FROM table
GROUP BY
List,
of,
columns,
you,
want,
in,
unique,
combination
i.e. you can copy-paste from your select list to your group by list. Note that if you use one or more aggregate functions like MAX in your select list but forget or omit the GROUP BY clause, MySQL (with ONLY_FULL_GROUP_BY disabled) will still accept the query rather than raise an error, but it returns arbitrary values for the non-aggregated columns instead of grouping for you, so it's safest to always write the GROUP BY explicitly.
Whether GROUP BY is hence largely redundant is often debated, but there do exist other things you can do with a GROUP BY, such as ROLLUP, CUBE and GROUPING SETS. Also, you can group on a column that is used in a deterministic function without having to group on the result of the deterministic function. Whether there is any point to doing so is a debate for another time :)
You should add GROUP BY, and an aggregate to the date field, something like this:
SELECT atable.phonenumber, MAX(btable.date)
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber
This will return the maximum date, that is, the latest date...

How to preserve order of temp table rows when inner joined with another table?

Does an SQL Server "join" preserve any kind of row order consistently (i.e. that of the left table or that of the right table)?
Pseudocode:
create table #p (personid bigint);

foreach (id in personid_list)
    insert into #p (personid) values (id)

select users.id from users inner join #p on users.personid = #p.personid
Suppose I have a list of IDs that correspond to person entries. Each of those IDs may correspond to zero or more user accounts (since each person can have multiple accounts).
To quickly select columns from the users table, I populate a temp table with person ids, then inner join it with the users table.
I'm looking for an efficient way to ensure that the order of the results in the join matches the order of the ids as they were inserted into the temp table, so that the user list that's returned is in the same order as the person list as it was entered.
I've considered the following alternatives:
using "#p inner join users", in case the left table's order is preserved
using "#p left join users where id is not null", in case a left join preserves order and the inner join doesn't
using "create table (rownum int, personid bigint)", inserting an incrementing row number as the temp table is populated, so the results can be ordered by rownum in the join
using an SQL Server equivalent of the "order by order of [tablename]" clause available in DB2
I'm currently using option 3, and it works... but I hate the idea of using an order by clause for something that's already ordered. I just don't know if the temp table preserves the order in which the rows were inserted or how the join operates and what order the results come out in.
EDIT:
Assuming I go with option 3, so there IS a field to order on: is there any form of the join that will help SQL Server do the least amount of work in maintaining the order? I mean, is it smart enough, for example, to look at which table's fields appear in the ORDER BY clause and work off that table first while doing the join, so that the result set's order roughly or completely coincides with that table's order, in case it's already in the desired order?
SQL sets are never ordered unless you explicitly order them with an order by clause.
Do this:
create table #p (personid bigint);
insert into #p (personid) values (id);

select id
from users
ORDER BY <something, e.g. users.name>;

select personid
from #p
ORDER BY <something, e.g. personid>;
Note that while you can insert in order, that doesn't mean a subsequent select will come back in that order: again, SQL sets are never ordered unless you explicitly order them with an ORDER BY clause.
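If you do go with option 3 from the question, an IDENTITY column saves you from tracking the counter yourself. A minimal sketch, assuming SQL Server and an illustrative users(id, personid) schema:
CREATE TABLE #p (rownum INT IDENTITY(1,1), personid BIGINT);

-- one insert per id, in list order; IDENTITY records the insertion order
INSERT INTO #p (personid) VALUES (42);
INSERT INTO #p (personid) VALUES (7);

SELECT users.id
FROM users
INNER JOIN #p ON users.personid = #p.personid
ORDER BY #p.rownum;  -- the only reliable way to get the insertion order back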
You write:
To quickly select columns from the users table, I populate a temp table with person ids, then inner join it with the users table.
Note that in most cases, it'll be faster to just select directly from users, using an in list:
select * from users where users.id in (1, 2, 3, 6, 9, ... );
You're probably prematurely "optimizing" something that doesn't need optimizing. RDBMSes are (usually) written to be efficient, and will probably do little extra work sorting something that's already sorted by chance. Concentrate on functionality until you have a demonstrated need to optimize. (I say this as someone who has been spending the last several months almost solely optimizing SQL on very large (~ half billion row OLTP) datasets, because most of the time, that's true.)