How to select all users for which given parameters are always true - mysql

I have a table containing users and locations where they were seen:
user_id | latitude | longitude | date_seen
-------------------------------------------
1035 | NULL | NULL | April 25 2010
1035 | 127 | 35 | April 28 2010
1038 | 127 | 35 | April 30 2010
1037 | NULL | NULL | May 1 2010
1038 | 126 | 34 | May 21 2010
1037 | NULL | NULL | May 24 2010
The dates are regular timestamps in the database; I just simplified them here.
I need to get a list of the users for whom latitude and longitude are always null. So in the above example, that would be user 1037--user 1035 has one row with lat/lon information, and 1038 has two rows with lat/lon information, whereas for user 1037, in both rows the information is null.
What query can I use to achieve this result?

select distinct user_id
from table_name t
where not exists(
select 1 from table_name t1
where t.user_id = t1.user_id and
t1.latitude is not null and
t1.longitude is not null
)
You can read this query as: give me all users that have no row in the table where both latitude and longitude are set (not null). In my opinion NOT EXISTS is preferable in such a case because, even if a table scan is used (not the optimal way to find a row), the subquery stops as soon as it finds a matching row; there is no need to count all rows.
Read more about this topic: Exists Vs. Count(*) - The battle never ends...
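To see the NOT EXISTS query run against the sample rows from the question, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, and the table name user_loc and the ISO date strings are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the table name is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_loc (user_id INTEGER, latitude REAL, longitude REAL, date_seen TEXT)"
)
conn.executemany(
    "INSERT INTO user_loc VALUES (?, ?, ?, ?)",
    [
        (1035, None, None, "2010-04-25"),
        (1035, 127, 35, "2010-04-28"),
        (1038, 127, 35, "2010-04-30"),
        (1037, None, None, "2010-05-01"),
        (1038, 126, 34, "2010-05-21"),
        (1037, None, None, "2010-05-24"),
    ],
)

# Keep a user only if no row exists for that user with both coordinates set.
never_located = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT t.user_id
        FROM user_loc t
        WHERE NOT EXISTS (
            SELECT 1
            FROM user_loc t1
            WHERE t1.user_id = t.user_id
              AND t1.latitude IS NOT NULL
              AND t1.longitude IS NOT NULL
        )
        """
    )
]
print(never_located)  # [1037]
```

Note that a row with only one of the two coordinates set would not exclude a user under this condition; changing the inner AND to OR would, if that case matters for your data.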

Try this, it should work.
SELECT user_id, count(latitude), count(longitude)
FROM user_loc
GROUP BY user_id HAVING count(latitude)=0 AND count(longitude)=0;
tested in MySQL.
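The HAVING approach relies on COUNT(column) skipping NULL values, while COUNT(*) counts every row. A small sketch of that behavior on the question's sample rows, again using an in-memory SQLite database as a stand-in for MySQL (the date column is left out because it plays no role here):

```python
import sqlite3

# In-memory SQLite stands in for MySQL; user_loc is the table name used in the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_loc (user_id INTEGER, latitude REAL, longitude REAL)")
conn.executemany(
    "INSERT INTO user_loc VALUES (?, ?, ?)",
    [
        (1035, None, None),
        (1035, 127, 35),
        (1038, 127, 35),
        (1037, None, None),
        (1038, 126, 34),
        (1037, None, None),
    ],
)

# COUNT(*) counts rows; COUNT(latitude) counts only rows where latitude is not NULL.
counts = conn.execute(
    """
    SELECT user_id, COUNT(*), COUNT(latitude), COUNT(longitude)
    FROM user_loc
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()
print(counts)  # [(1035, 2, 1, 1), (1037, 2, 0, 0), (1038, 2, 2, 2)]

# Users whose coordinates are NULL in every row have both counts equal to zero.
always_null = [
    row[0]
    for row in conn.execute(
        """
        SELECT user_id
        FROM user_loc
        GROUP BY user_id
        HAVING COUNT(latitude) = 0 AND COUNT(longitude) = 0
        """
    )
]
print(always_null)  # [1037]
```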

Try:
SELECT * FROM user WHERE latitude IS NULL AND longitude IS NULL;
-- Edit --
2nd try (untested, but constructed it from a query I have used before):
SELECT user_id,
CASE WHEN MIN(latitude) IS NULL AND MAX(latitude) IS NULL THEN 1 ELSE 0 END AS noLatLong
FROM user
GROUP BY user_id
HAVING noLatLong = 1;

This works:
SELECT DISTINCT user_id
FROM table
WHERE latitude IS NULL
AND longitude IS NULL
AND NOT user_id IN
(SELECT DISTINCT user_id
FROM table
WHERE NOT latitude IS NULL
AND NOT longitude IS NULL)
result:
1037
(syntax validated with SQLite here)
BUT: Even though it does not use COUNT, my statement has to scan all table rows, so Michał Powaga's statement is more efficient.
rationale:
get list of user_ids with lat/lon records to compare against (you want to EXCLUDE these from final result) - optimization: use EXISTS here...
get list of user_ids without lat/lon records (that you're interested in)
reduce by all IDs, that exist in the first list - optimization: use EXISTS here...
make user_ids DISTINCT, because the example shows multiple entries per user_id (but you want just the unique IDs)

Related

MYSQL - Query to extract all columns from the top N distinct elements

I have designed an event where you register multiple fish, and I want a query to extract the top 3 heaviest fish from different people. In case of a tie, it should be decided by a third parameter: who registered first. I've tested several approaches I found here on Stack Overflow, but none of them worked the way I needed.
My schema is the following:
id | playerid | playername | itemid | weight | date | received | isCurrent
Where:
id = PK, AUTO_INCREMENT - it's basically an index
playerid = the unique code of the person who registered the fish
playername = name of the person who registered the fish
itemid = the code of the fish
weight = the weight of the fish
date = pre-defined as CURRENT_TIMESTAMP, the exact time the fish was registered
received = pre-defined as 0, it really doesn't matter for this analysis
isCurrent = pre-defined as 1; every time this event runs it updates this field to 0, meaning those records don't belong to the current version of the event.
Here you can see the data I'm testing with
my problem is: How to avoid counting the same playerid for this rank more than once?
Query 1:
SELECT `playerid`, `playername`, `itemid`, `weight`
FROM `event_fishing`
WHERE `isCurrent` = 1 AND `weight` IN (
SELECT * FROM
(SELECT MAX(`weight`) as `fishWeight`
FROM `event_fishing`
WHERE `isCurrent` = 1
GROUP BY `playerid`
LIMIT 3) as t)
ORDER BY `weight` DESC, `date` ASC
LIMIT 3
Query 2:
SELECT * FROM `event_fishing`
INNER JOIN
(SELECT playerid, MAX(`weight`) as `fishWeight`
FROM `event_fishing`
WHERE `isCurrent` = 1
GROUP BY `playerid`
LIMIT 3) as t
ON t.playerid = `event_fishing`.playerid AND t.fishWeight = `event_fishing`.weight
WHERE `isCurrent` = 1
ORDER BY weight DESC, date ASC
LIMIT 3
Keep in mind that I must return at least the fields playerid, playername, itemid and weight; that the version of the event must be the current one (isCurrent = 1); and that there must be one playerid per line, with the heaviest weight he registered for this version of the event and the date it was registered.
Expected output for the data I've sent:
id |playerid|playername|itemid|weight| date |received| isCurrent
7 | 3734 |Mago Xxx | 7963 | 1850 | 2018-07-26 00:17:41 | 0 | 1
14 | 228 |Night Wolf| 7963 | 1750 | 2018-07-26 19:45:49 | 0 | 1
8 | 3646 |Test Spell| 7159 | 1690 | 2018-07-26 01:16:51 | 0 | 1
Output I'm getting (with both queries):
playerid|playername|itemid|weight
3734 |Mago Xxx | 7963 | 1850
228 |Night Wolf| 7963 | 1750
228 |Night Wolf| 7963 | 1750
Thank you for the attention.
EDIT: I've followed How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL? since my query is very similar to the accepted answer. In the comments I found something that at first glance seemed to solve my problem, but I've found a case where the accepted answer fails. Check http://sqlfiddle.com/#!9/72aeef/1
If you take a look at the data, you'll notice that id 14 was the first entry with weight 1750 and should therefore be in second place, but MAX(id) returns the last entry for the same playerid and therefore gives a wrong result.
Although the problems seem alike, mine is more complex, and therefore the queries that were suggested don't work.
EDIT 2:
I've managed to solve my problem with the following query:
http://sqlfiddle.com/#!9/d711c7/6
But I'll leave this question open because of two things:
1- I don't know if there's a case where this query might fail
2- Although the first query is already limited a lot, I still think this can be optimized further, so I'll leave it open to anyone who might know a better way to solve the issue.
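One way to express "heaviest fish per player, earliest registration wins ties, then top 3" is a NOT EXISTS filter that keeps a row only if the same player has no better row. The sketch below is not the query from the linked fiddle; it runs on an in-memory SQLite database as a stand-in for MySQL. Rows 7, 8 and 14 come from the expected output in the question, and the other rows are invented purely to exercise the tie and the isCurrent filter.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; schema follows the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE event_fishing (
        id INTEGER PRIMARY KEY, playerid INTEGER, playername TEXT, itemid INTEGER,
        weight INTEGER, date TEXT, received INTEGER, isCurrent INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO event_fishing VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (7, 3734, "Mago Xxx", 7963, 1850, "2018-07-26 00:17:41", 0, 1),
        (8, 3646, "Test Spell", 7159, 1690, "2018-07-26 01:16:51", 0, 1),
        (14, 228, "Night Wolf", 7963, 1750, "2018-07-26 19:45:49", 0, 1),
        # Hypothetical rows below: a later tie, a lighter fish, an old event run.
        (15, 228, "Night Wolf", 7963, 1750, "2018-07-26 20:00:00", 0, 1),
        (16, 3734, "Mago Xxx", 7159, 1200, "2018-07-26 21:00:00", 0, 1),
        (17, 500, "Old Run", 7963, 2000, "2018-07-01 10:00:00", 0, 0),
    ],
)

# A row survives only if the same player has no heavier current fish, and no
# equally heavy current fish registered earlier (id breaks exact date ties).
top3 = conn.execute(
    """
    SELECT e.id, e.playerid, e.playername, e.itemid, e.weight, e.date
    FROM event_fishing e
    WHERE e.isCurrent = 1
      AND NOT EXISTS (
          SELECT 1
          FROM event_fishing o
          WHERE o.isCurrent = 1
            AND o.playerid = e.playerid
            AND (o.weight > e.weight
                 OR (o.weight = e.weight AND o.date < e.date)
                 OR (o.weight = e.weight AND o.date = e.date AND o.id < e.id))
      )
    ORDER BY e.weight DESC, e.date ASC
    LIMIT 3
    """
).fetchall()
top3_ids = [row[0] for row in top3]
print(top3_ids)  # [7, 14, 8]
```

Because each player contributes exactly one row before the LIMIT is applied, the same playerid cannot appear twice in the ranking.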

update rate for unique productId by each userID

I'm trying to implement this in my own SQL. I have two tables in MySQL. Each vote updates a row in FirstTable, where the values of rate and countView change; I update both with the same command:
UPDATE FirstTable SET `countView`= `countView`+1,
`rate`=('$MyRate' + (`countView`-1)*`rate`)/`countView`
WHERE `productId`='$productId'
FirstTable:
productId | countView | rate | other column |
------------+-----------+------+-------------------+---
21 | 12 | 4 | anything |
------------+-----------+------+-------------------+---
22 | 18 | 3 | anything |
------------+-----------+------+-------------------+---
But in this way, a user can vote every time he wants to. So I tried to create a table with two columns productId and userID. Like below:
SecondTable:
productId | userID |
------------+---------------|
21 | 100001 |
------------+---------------|
22 | 100002 |
------------+---------------|
21 | 100001 |
------------+---------------|
21 | 100003 |
------------+---------------|
Now, as in the example given in SecondTable, a user has given two votes to the same productId. I don't want both of these votes to be recorded.
Problems with this method:
The counter is incremented on every vote.
I cannot properly link SecondTable and FirstTable to manage the update of FirstTable.
Of course, this question may not be completely new, but I searched a lot for the right answer. One of the questions on this site suggested the following method for managing the update of a table:
UPDATE `FirstTable` SET `countView`= `countView`+1,
`rate`=('$MyRate' + (`countView`-1)*`rate`)/`countView`
WHERE `productId`='$productId' IN ( SELECT DISTINCT productId, userID
FROM SecondTable)
But the next problem is that even when I use this command, I encounter the following error:
1241 - Operand should contain 1 column(s)
So thank you so much if you can guide me. And I'm sure my question is not duplicate... thank you again.
This fixes your specific syntax problem:
UPDATE FirstTable
SET countView = countView + 1,
rate = ($MyRate + (countView - 1) * rate) / countView
WHERE productId = $productId AND
productId IN (SELECT t2.productId FROM SecondTable t2);
But if two different users vote on the same product, FirstTable will be updated only once. It is unclear if that is intentional behavior or not.
Note that SELECT DISTINCT is not needed in the subquery.
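If the goal is one counted vote per user and product, one possible pattern is to put a unique key on (productId, userID) and update FirstTable only when inserting the vote row actually succeeds. The sketch below is an assumption about the desired behavior, not something the question confirms. It runs on in-memory SQLite, where the statement is INSERT OR IGNORE (MySQL's counterpart is INSERT IGNORE), and record_vote is a hypothetical helper name.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE FirstTable (
        productId INTEGER PRIMARY KEY,
        countView INTEGER NOT NULL,
        rate REAL NOT NULL
    );
    CREATE TABLE SecondTable (
        productId INTEGER NOT NULL,
        userID INTEGER NOT NULL,
        PRIMARY KEY (productId, userID)  -- one vote per user and product
    );
    INSERT INTO FirstTable VALUES (21, 12, 4.0), (22, 18, 3.0);
    """
)


def record_vote(conn, product_id, user_id, my_rate):
    """Count the vote once per (product, user); return True if it was counted."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO SecondTable (productId, userID) VALUES (?, ?)",
            (product_id, user_id),
        )
        if cur.rowcount == 0:
            return False  # this user already voted on this product
        # rate is assigned first and written in terms of the old countView, so
        # the running average does not depend on assignment evaluation order.
        conn.execute(
            """
            UPDATE FirstTable
            SET rate = (? + countView * rate) / (countView + 1),
                countView = countView + 1
            WHERE productId = ?
            """,
            (my_rate, product_id),
        )
    return True


first = record_vote(conn, 21, 100001, 5)   # counted
second = record_vote(conn, 21, 100001, 1)  # ignored, same user and product
count_view, rate = conn.execute(
    "SELECT countView, rate FROM FirstTable WHERE productId = 21"
).fetchone()
print(first, second, count_view, round(rate, 4))  # True False 13 4.0769
```

Passing the rating and ids as bound parameters also avoids interpolating '$MyRate' and '$productId' directly into the SQL string.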
The error is being generated because you can't return 2 fields in an "in" statement. You'll want to use group by:
Try:
IN ( SELECT DISTINCT productId FROM rating group by product, UserID)
Here's documentation to look over for mysql group by if you want: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html

MySQL select with all where and one or more where not

Table structure and data (I know data in IP/domain fields might not make much sense, but this is for illustration purposes):
rec_id | account_id | product_id | ip | domain | some_data
----------------------------------------------------------------------------
1 | 1 | 1 | 192.168.1.1 | 127.0.0.1/test | abc
2 | 1 | 1 | 192.168.1.1 | 127.0.0.1/other | xyz
3 | 1 | 1 | 192.168.1.2 | 127.0.0.1/test | ooo
Table has unique index ip_domain combined from ip and domain fields (so records with identical values in both fields can't exist).
In each case I know values for account_id, product_id, ip, domain fields, and I need to get other rows that have the SAME account_id, product_id values and one (or both) of ip, domain values are DIFFERENT.
Example: I know that account_id=1, product_id=1, ip=192.168.1.1, domain=127.0.0.1/test (so it matches rec_id 1), I need to select records with IDs 2 and 3 (because record 2 has different domain and record 3 has different ip).
So, I used query:
SELECT * FROM table WHERE
account_id='1' AND product_id='1' AND ip!='192.168.1.1' AND domain!='127.0.0.1/test'
Of course, it returned 0 rows. Looked at mysql multiple where and where not in and wrote:
SELECT * FROM table WHERE
account_id='1' AND product_id='1' AND installation_ip NOT IN ('192.168.1.1') AND installation_domain NOT IN ('127.0.0.1/test')
My guess is that this query is identical (just formatted different way), so 0 rows again. Found some more examples too, but none worked in my case.
The syntax is correct, but you're using the wrong logical operation
SELECT *
FROM table
WHERE account_id='1' AND product_id='1' AND
(ip != '192.168.1.1' OR domain != '127.0.0.1/test')
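The reason AND returns nothing is De Morgan's law: "not (same ip AND same domain)" is "different ip OR different domain". A quick check on the three sample rows, using in-memory SQLite as a stand-in for MySQL and a hypothetical table name records:

```python
import sqlite3

# In-memory SQLite stands in for MySQL; "records" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE records (
        rec_id INTEGER PRIMARY KEY, account_id INTEGER, product_id INTEGER,
        ip TEXT, domain TEXT, some_data TEXT,
        UNIQUE (ip, domain)
    )
    """
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, "192.168.1.1", "127.0.0.1/test", "abc"),
        (2, 1, 1, "192.168.1.1", "127.0.0.1/other", "xyz"),
        (3, 1, 1, "192.168.1.2", "127.0.0.1/test", "ooo"),
    ],
)
params = (1, 1, "192.168.1.1", "127.0.0.1/test")

# AND demands that BOTH ip and domain differ, which no sample row satisfies.
and_rows = [
    r[0]
    for r in conn.execute(
        "SELECT rec_id FROM records WHERE account_id = ? AND product_id = ? "
        "AND ip != ? AND domain != ? ORDER BY rec_id",
        params,
    )
]

# OR accepts rows where at least one of the two differs.
or_rows = [
    r[0]
    for r in conn.execute(
        "SELECT rec_id FROM records WHERE account_id = ? AND product_id = ? "
        "AND (ip != ? OR domain != ?) ORDER BY rec_id",
        params,
    )
]
print(and_rows, or_rows)  # [] [2, 3]
```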
Select * from table
Where ROWID <> myRowid
And account_id = '1'
And product_id = '1';
myRowid is the unique id given by your DBMS to each record. In this case you need to retrieve it with your select statement and then pass it back when using this select. This will return all the rows with account_id = 1 and product_id = 1 except the one you have selected.
If your inputs are not defined, or if you want a list, then you may want to look at the GROUP BY clause. You may also look at GROUP_CONCAT.
The query would be something like:
SELECT ACCOUNT_ID, PRODUCT_ID, GROUP_CONCAT(DISTINCT CONCAT(IP, '|', DOMAIN) SEPARATOR ','), COUNT(1)
FROM TABLE
GROUP BY ACCOUNT_ID, PRODUCT_ID
P.S.: I don't have MySQL installed, hence the query syntax is not verified

Query database in weekly interval

I have a database with a created_at column containing the datetime in Y-m-d H:i:s format.
The latest datetime entry is 2011-09-28 00:10:02.
I need the query to be relative to the latest datetime entry.
1. The first value in the query should be the latest datetime entry.
2. The second value in the query should be the entry closest to 7 days from the first value.
3. The third value should be the entry closest to 7 days from the second value.
4. REPEAT #3.
What I mean by "closest to 7 days from":
The following are dates, the interval I desire is a week, in seconds a week is 604800 seconds.
7 days from the first value is equal to 1316578202 (1317183002-604800)
the value closest to 1316578202 (7 days) is... 1316571974
unix timestamp | Y-m-d H:i:s
1317183002 | 2011-09-28 00:10:02 -> appear in query (first value)
1317101233 | 2011-09-27 01:27:13
1317009182 | 2011-09-25 23:53:02
1316916554 | 2011-09-24 22:09:14
1316836656 | 2011-09-23 23:57:36
1316745220 | 2011-09-22 22:33:40
1316659915 | 2011-09-21 22:51:55
1316571974 | 2011-09-20 22:26:14 -> closest to 7 days from 1317183002 (first value)
1316499187 | 2011-09-20 02:13:07
1316064243 | 2011-09-15 01:24:03
1315967707 | 2011-09-13 22:35:07 -> closest to 7 days from 1316571974 (second value)
1315881414 | 2011-09-12 22:36:54
1315794048 | 2011-09-11 22:20:48
1315715786 | 2011-09-11 00:36:26
1315622142 | 2011-09-09 22:35:42
I would really appreciate any help, I have not been able to do this via mysql and no online resources seem to deal with relative date manipulation such as this. I would like the query to be modular enough to be able to change the interval weekly, monthly, or yearly. Thanks in advance!
Answer #1 Reply:
SELECT
UNIX_TIMESTAMP(created_at)
AS unix_timestamp,
(
SELECT MIN(UNIX_TIMESTAMP(created_at))
FROM my_table
WHERE created_at >=
(
SELECT max(created_at) - 7
FROM my_table
)
)
AS `random_1`,
(
SELECT MIN(UNIX_TIMESTAMP(created_at))
FROM my_table
WHERE created_at >=
(
SELECT MAX(created_at) - 14
FROM my_table
)
)
AS `random_2`
FROM my_table
WHERE created_at =
(
SELECT MAX(created_at)
FROM my_table
)
Returns:
unix_timestamp | random_1 | random_2
1317183002 | 1317183002 | 1317183002
Answer #2 Reply:
RESULT SET:
This is the result set for a yearly interval:
id | created_at | period_index | period_timestamp
267 | 2010-09-27 22:57:05 | 0 | 1317183002
1 | 2009-12-10 15:08:00 | 1 | 1285554786
I desire this result:
id | created_at | period_index | period_timestamp
626 | 2011-09-28 00:10:02 | 0 | 0
267 | 2010-09-27 22:57:05 | 1 | 1317183002
I hope this makes more sense.
It's not exactly what you asked for, but the following example is pretty close....
Example 1:
select
floor(timestampdiff(SECOND, tbl.time, most_recent.time)/604800) as period_index,
unix_timestamp(max(tbl.time)) as period_timestamp
from
tbl
, (select max(time) as time from tbl) most_recent
group by period_index
gives results:
+--------------+------------------+
| period_index | period_timestamp |
+--------------+------------------+
| 0 | 1317183002 |
| 1 | 1316571974 |
| 2 | 1315967707 |
+--------------+------------------+
This breaks the dataset into groups based on "periods", where (in this example) each period is 7-days (604800 seconds) long. The period_timestamp that is returned for each period is the 'latest' (most recent) timestamp that falls within that period.
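The bucketing can be checked against the timestamps listed in the question. The sketch below uses in-memory SQLite as a stand-in for MySQL and stores the values as plain Unix timestamps in a hypothetical column ts, so integer division takes the place of floor(timestampdiff(...)/604800).

```python
import sqlite3

WEEK = 604800  # seconds in 7 days

# Unix timestamps copied from the question.
TIMESTAMPS = [
    1317183002, 1317101233, 1317009182, 1316916554, 1316836656,
    1316745220, 1316659915, 1316571974, 1316499187, 1316064243,
    1315967707, 1315881414, 1315794048, 1315715786, 1315622142,
]

# In-memory SQLite stands in for MySQL; "ts" is a hypothetical integer column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ts INTEGER NOT NULL)")
conn.executemany("INSERT INTO tbl VALUES (?)", [(t,) for t in TIMESTAMPS])

# Age relative to the newest row, integer-divided by the period length, gives
# the bucket index; MAX(ts) is the most recent entry inside each bucket.
periods = conn.execute(
    """
    SELECT (m.latest - t.ts) / ? AS period_index,
           MAX(t.ts) AS period_timestamp
    FROM tbl t
    CROSS JOIN (SELECT MAX(ts) AS latest FROM tbl) m
    GROUP BY period_index
    ORDER BY period_index
    """,
    (WEEK,),
).fetchall()
print(periods)  # [(0, 1317183002), (1, 1316571974), (2, 1315967707)]
```

Changing WEEK to another number of seconds switches the interval, which is how the yearly variant later in this answer differs.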
The period boundaries are all computed based on the most recent timestamp in the database, rather than computing each period's start and end time individually based on the timestamp of the period before it. The difference is subtle - your question requests the latter (iterative approach), but I'm hoping that the former (approach I've described here) will suffice for your needs, since SQL doesn't lend itself well to implementing iterative algorithms.
If you really do need to determine each period based on the timestamp in the previous period, then your best bet is going to be an iterative approach -- either using a programming language of your choice (like php), or by building a stored procedure that uses a cursor.
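For the iterative variant, a minimal sketch in Python (standing in for the "programming language of your choice") could look like the following. The function name pick_by_interval and the stopping rule (stop once the next target falls before the oldest entry) are assumptions; the question does not say when the walk should end.

```python
WEEK = 604800  # seconds in 7 days

# Unix timestamps copied from the question.
TIMESTAMPS = [
    1317183002, 1317101233, 1317009182, 1316916554, 1316836656,
    1316745220, 1316659915, 1316571974, 1316499187, 1316064243,
    1315967707, 1315881414, 1315794048, 1315715786, 1315622142,
]


def pick_by_interval(timestamps, interval=WEEK):
    """Start at the newest entry, then repeatedly pick the older entry closest
    to (previous pick - interval). Stops when the target is older than every
    entry (an assumed stopping rule)."""
    ordered = sorted(set(timestamps))
    if not ordered:
        return []
    picked = [ordered[-1]]
    while True:
        target = picked[-1] - interval
        if target < ordered[0]:
            break
        # Only entries strictly older than the previous pick are candidates,
        # which guarantees the loop makes progress.
        older = [t for t in ordered if t < picked[-1]]
        picked.append(min(older, key=lambda t: abs(t - target)))
    return picked


selected = pick_by_interval(TIMESTAMPS)
print(selected)  # [1317183002, 1316571974, 1315967707]
```

Each step depends on the previous pick rather than on the newest timestamp, so the results can drift apart from the fixed-bucket query on other data sets, even though they coincide here.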
Edit #1
Here's the table structure for the above example.
CREATE TABLE `tbl` (
`id` int(10) unsigned NOT NULL auto_increment PRIMARY KEY,
`time` datetime NOT NULL
)
Edit #2
Ok, first: I've improved the original example query (see revised "Example 1" above). It still works the same way, and gives the same results, but it's cleaner, more efficient, and easier to understand.
Now... the query above is a group-by query, meaning it shows aggregate results for the "period" groups as I described above, not row-by-row results like a "normal" query. With a group-by query, you're limited to using aggregate columns only. Aggregate columns are those columns that are named in the group by clause, or that are computed by an aggregate function like MAX(time). It is not possible to extract meaningful values for non-aggregate columns (like id) from within the projection of a group-by query.
Unfortunately, MySQL doesn't generate an error when you try to do this (unless the ONLY_FULL_GROUP_BY SQL mode is enabled, which is the default from MySQL 5.7.5 onward). Instead, it picks an arbitrary value from within the grouped rows, and shows that value for the non-aggregate column in the grouped result. This is what's causing the odd behavior the OP reported when trying to use the code from Example 1.
Fortunately, this problem is fairly easy to solve. Just wrap another query around the group query, to select the row-by-row information you're interested in...
Example 2:
SELECT
entries.id,
entries.time,
periods.idx as period_index,
unix_timestamp(periods.time) as period_timestamp
FROM
tbl entries
JOIN
(select
floor(timestampdiff( SECOND, tbl.time, most_recent.time)/31536000) as idx,
max(tbl.time) as time
from
tbl
, (select max(time) as time from tbl) most_recent
group by idx
) periods
ON entries.time = periods.time
Result:
+-----+---------------------+--------------+------------------+
| id | time | period_index | period_timestamp |
+-----+---------------------+--------------+------------------+
| 598 | 2011-09-28 04:10:02 | 0 | 1317183002 |
| 996 | 2010-09-27 22:57:05 | 1 | 1285628225 |
+-----+---------------------+--------------+------------------+
Notes:
Example 2 uses a period length of 31536000 seconds (365 days), while Example 1 (above) uses a period of 604800 seconds (7 days). Other than that, the inner query in Example 2 is the same as the primary query shown in Example 1.
If a matching period_time belongs to more than one entry (i.e. two or more entries have the exact same time, and that time matches one of the selected period_time values), then the above query (Example 2) will include multiple rows for the given period timestamp (one for each match). Whatever code consumes this result set should be prepared to handle such an edge case.
It's also worth noting that these queries will perform much, much better if you define an index on your datetime column. For my example schema, that would look like this:
ALTER TABLE tbl ADD INDEX idx_time ( time )
If you're willing to go for the closest that is after the week is out then this'll work. You can extend it to work out the closest but it'll look so disgusting it's probably not worth it.
select unix_timestamp
, ( select min(unix_tstamp)
from my_table
where sql_tstamp >= ( select max(sql_tstamp) - 7
from my_table )
)
, ( select min(unix_tstamp)
from my_table
where sql_tstamp >= ( select max(sql_tstamp) - 14
from my_table )
)
from my_table
where sql_tstamp = ( select max(sql_tstamp)
from my_table )

Find adjacent rows without stored procedure

Considering the following table:
someId INTEGER #PK
ageStart TINYINT(3)
ageEnd TINYINT(3)
dateStart INTEGER
dateEnd INTEGER
Where dateStart and dateEnd are dates represented as days since 1800-12-28...
And considering some sample data:
someId | ageStart | ageEnd | dateStart | dateEnd
------------------------------------------------
203 | 16 | 25 | 76533 | 76539 \
506 | 16 | 25 | 76540 | 76546 adjacent rows
384 | 16 | 25 | 76547 | 76553 /
342 | 16 | 25 | 76563 | 76569 \
545 | 16 | 25 | 76570 | 76576 adjacent rows
764 | 16 | 25 | 76577 | 76583 /
(There would be arbitrary rows mixed in, of course; I just want to illustrate 2 relevant rowsets)
Is it possible to find adjacent rows for a given age category (ageStart to ageEnd) without a stored procedure? The criteria for adjacency is: dateStart is 1 day after dateEnd of the previous found row.
For instance, given the above sample data, if I were to query it with the following parameters:
ageStart = 16
ageEnd = 25
dateStart = 76533
I would like it to return rows 1, 2 and 3 of the sample data, since their dates are adjacent (dateStart is the day after the previous row's dateEnd).
ageStart = 16
ageEnd = 25
dateStart = 76563
...would give me rows 4, 5 and 6 of the sample data
This is probably not efficient if there is a lot of data in your table, but try this:
SELECT b.*
FROM
(SELECT @continue:=2) init,
(
SELECT *
FROM ageTable
WHERE ageStart=16 AND
ageEnd=25 AND
dateStart=76533
) a
INNER JOIN (
SELECT *
FROM ageTable
ORDER BY dateStart
) b ON (
b.ageStart=a.ageStart AND
b.ageEnd=a.ageEnd AND
b.dateStart>=a.dateStart
)
LEFT JOIN ageTable c ON (
c.dateStart=b.dateEnd+1 AND
c.ageStart=b.ageStart AND
c.ageEnd=b.ageEnd
)
WHERE
CASE
WHEN @continue=2 THEN
CASE
WHEN c.someId IS NULL THEN
@continue:=1
ELSE
@continue
END
WHEN @continue=1 THEN
@continue:=0
ELSE
@continue
END
You can consider your data to be in a parent-child relationship: a record is a child of a (parent) record if the child's dateStart equals the parent's dateEnd + 1. For hierarchical data (with parent-child relationships), the nested sets model allows you to query the data without stored procedures. You can find a brief description of the nested sets model here:
http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
The idea is to number your records in a clever way so that you can use simple queries instead of recursive stored procedures.
While it is very easy to query hierarchical data stored in this way, some care is required when adding new records. Adding new records in a nested sets model requires updates of existing records. This may or may not be acceptable in your use case.
Well, you can generate a result set ordered in a specific way and use LIMIT to get only the first record from it.
For example, get the next record by dateEnd in the list:
SELECT *
FROM `table`
WHERE `dateEnd` > '76546'
ORDER BY `dateEnd`
LIMIT 1
You will get:
384 | 16 | 25 | 76547 | 76553
For a previous row:
SELECT *
FROM `table`
WHERE `dateEnd` < '76546'
ORDER BY `dateEnd` DESC
LIMIT 1
You will get:
203 | 16 | 25 | 76533 | 76539
I doubt that it can be done with just one query...
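On engines that support recursive common table expressions (MySQL 8.0 and later, and SQLite, which is used here as a stand-in), the chain of adjacent rows can be followed in a single query without a stored procedure. The sketch below is one possible formulation, assuming the table is called ageTable as in the answer above; adjacent_chain is a hypothetical helper name.

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8.0+; both support WITH RECURSIVE.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ageTable (
        someId INTEGER PRIMARY KEY, ageStart INTEGER, ageEnd INTEGER,
        dateStart INTEGER, dateEnd INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO ageTable VALUES (?, ?, ?, ?, ?)",
    [
        (203, 16, 25, 76533, 76539),
        (506, 16, 25, 76540, 76546),
        (384, 16, 25, 76547, 76553),
        (342, 16, 25, 76563, 76569),
        (545, 16, 25, 76570, 76576),
        (764, 16, 25, 76577, 76583),
    ],
)


def adjacent_chain(conn, age_start, age_end, date_start):
    """Return someId values of the starting row and every row that follows it
    with dateStart equal to the previous row's dateEnd + 1."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain AS (
            -- anchor: the row matching the query parameters
            SELECT someId, ageStart, ageEnd, dateStart, dateEnd
            FROM ageTable
            WHERE ageStart = ? AND ageEnd = ? AND dateStart = ?
            UNION ALL
            -- step: the row that starts the day after the previous one ends
            SELECT n.someId, n.ageStart, n.ageEnd, n.dateStart, n.dateEnd
            FROM ageTable n
            JOIN chain c
              ON n.dateStart = c.dateEnd + 1
             AND n.ageStart = c.ageStart
             AND n.ageEnd = c.ageEnd
        )
        SELECT someId FROM chain ORDER BY dateStart
        """,
        (age_start, age_end, date_start),
    )
    return [r[0] for r in rows]


first_chain = adjacent_chain(conn, 16, 25, 76533)
second_chain = adjacent_chain(conn, 16, 25, 76563)
print(first_chain, second_chain)  # [203, 506, 384] [342, 545, 764]
```

The recursion stops on its own at the first gap, because no row has a dateStart equal to the last dateEnd + 1.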