How to perform a complex range query in MySQL - mysql

I have a table (Table A) with a field of integers (Field B). For each row of Table A, I would like to construct a range of +/- 100 surrounding the integer value of Field B then find all values from Field B that are within these ranges. The query needs to be performed for all values in Field B. The query needs to return each row that is within each row range. Here is an example of what I am trying to do:
Table A
_______
A 1000
B 3000
C 5000
D 1090
Using the above Table A, the query would first find the ranges (+/- 100) for all integers in Field B.
900 - 1100
2900 - 3100
4900 - 5100
990 - 1190
The query would then iterate through these ranges and return rows from Table A that fall within the generated ranges. Using the above example, the query would return:
A 1000
A 1000
B 3000
C 5000
D 1090
D 1090
A and D are returned twice because each falls within the other's range as well as its own. How can I construct a query that will return each row that falls within the range of each row? Thanks in advance for the help.

SELECT t2.*
FROM tableA AS t1
INNER JOIN tableA AS t2 ON t2.fieldB >= (t1.fieldB - 100) AND t2.fieldB <= (t1.fieldB + 100)
Shouldn't A also be shown twice, since it's also in D's range? (That's the case with the above query; if that's incorrect, please elaborate why ^^)
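To see what the self-join above returns for the question's four rows, here is a minimal sketch. It runs the same join through Python's built-in sqlite3 module as a stand-in for MySQL, and the column name `letter` is an assumption, since the question never names the first column.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; the join is the one from the answer above.
conn = sqlite3.connect(":memory:")
# 'letter' is an assumed name for the unnamed first column in the question.
conn.execute("CREATE TABLE tableA (letter TEXT, fieldB INTEGER)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?)",
    [("A", 1000), ("B", 3000), ("C", 5000), ("D", 1090)],
)

# t1 supplies the +/- 100 range, t2 supplies the rows that fall inside it.
rows = conn.execute(
    """
    SELECT t2.letter, t2.fieldB
    FROM tableA AS t1
    INNER JOIN tableA AS t2
        ON t2.fieldB >= t1.fieldB - 100
       AND t2.fieldB <= t1.fieldB + 100
    ORDER BY t2.letter
    """
).fetchall()

for row in rows:
    print(row)
```

A and D each come back twice (once for their own range and once for the other's), while B and C come back once.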

Start with your inner-most pre-qualifier of every Table A record, then re-join to Table A again. I've added the qualifying group ranges, low and high, to show the qualifier basis you were looking for. In addition to D showing up twice, A should show up twice too, as it falls within D's range as well.
select
    a2.ShowLetter,
    a2.FieldB,
    GrpRanges.RangeLow,
    GrpRanges.RangeHi
from
    ( select distinct
          a1.FieldB - 100 as RangeLow,
          a1.FieldB + 100 as RangeHi
      from
          TableA a1 ) GrpRanges
    JOIN TableA a2
        on a2.FieldB between GrpRanges.RangeLow and GrpRanges.RangeHi
order by
    a2.ShowLetter

Related

mysql: how do I ignore all rows that contain (value column A), if one of these rows has a specific value in column B?

I am looking for a way to filter out not only the duplicate rows, but also the "initial" row. The goal is to have a clean list of all positions. The list is used by sales / accounting to see open positions; that's why the initial "Invoice" position has to be removed as well if a "Cancellation" exists for that invoice.
I've tried solutions with group by, subqueries and EXISTS, but can't get the expected result. Ideally, I get this to work as an additional filter inside the where clause.
Default
ID  Nr      Type          Amount
1   NR-100  Invoice       100
2   NR-101  Invoice       200
3   NR-102  Invoice       300
4   NR-100  Cancellation  100
5   NR-102  Cancellation  300
6   NR-103  Invoice       150
Expected results
ID  Nr      Type          Amount
2   NR-101  Invoice       200
6   NR-103  Invoice       150
An EXISTS test would seem to be the way to go, so I wonder what problem you had with it.
select *
from t
where type = 'invoice' and
not exists (select 1 from t t1 where t1.nr = t.nr and t1.type = 'cancellation')
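As a quick check of the NOT EXISTS filter against the question's data, here is a small sketch using Python's sqlite3 module in place of MySQL. The table name `t` comes from the answer; the string literals use the exact capitalisation from the data because SQLite compares text case-sensitively by default, whereas MySQL's default collations usually do not.

```python
import sqlite3

# In-memory SQLite stands in for MySQL for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, nr TEXT, type TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "NR-100", "Invoice", 100),
        (2, "NR-101", "Invoice", 200),
        (3, "NR-102", "Invoice", 300),
        (4, "NR-100", "Cancellation", 100),
        (5, "NR-102", "Cancellation", 300),
        (6, "NR-103", "Invoice", 150),
    ],
)

# Keep an invoice only if no cancellation row shares its Nr.
open_positions = conn.execute(
    """
    SELECT id, nr, type, amount
    FROM t
    WHERE type = 'Invoice'
      AND NOT EXISTS (
          SELECT 1 FROM t AS t1
          WHERE t1.nr = t.nr AND t1.type = 'Cancellation'
      )
    ORDER BY id
    """
).fetchall()

for row in open_positions:
    print(row)
```

Only rows 2 and 6 remain, which matches the expected results table.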

Tree like data collation in SQL (Mysql)

I have two tables in my database
Table A with columns user_id, free_data, used_data
Table B with columns donor_id, receptor_id, share_data
Basically, a user (let's call him x) has some data in his account, which is represented by his entry in Table A. The data is stored in the free_data column. He can donate data to any other user (let's call him y), which will show up as an entry in Table B. The same amount of data gets deducted from user x's free_data column.
When the entry in Table B gets created, an entry in Table A for user y is also created with a free_data value equal to share_data. Now user y can give away data to user z, and the process continues.
Each user keeps using their data, and the used_data entry in Table A keeps adding up to indicate how much data each user has used.
This is like a tree structure where there is an entry with all the data (root node) who eventually gives data to others, who in turn give data to other nodes.
Now I would like to write an sql query such that, given a node x (id of entry in Table A), I should be able to sum up total data x has given & who all are beneficiaries at multiple level, all of their used_data need to be collated & showed against x.
Basically, I want to collate
Overall data x has donated.
How much of the donated data from x has been used up.
While the implementation is more graph-like, I am more interested to know if we assume it to be a tree below node x & can come up with a single sql query to be able to get the data I need.
Example
Table A
user_id, free_data, used_data
1 50 10
2 30 20
3 20 20
Table B
donor_id, receptor_id, share_data
1 2 30
1 3 20
Total data donated by 1 - 30 + 20 = 50
Total donated data used - 20 + 20 = 40
This is just one level, where 1 donated to 2 & 3. 2 could in turn donate to 4, and all that data needs to be collated in a bubbled-up fashion for calculating the overall donated data usage.
Yes, it's possible using a nested set model. There's a book by Joe Celko that describes it, but if you want to get straight into it, there's an article that talks about it. Both pieces of collated data that you need can be retrieved by a single select statement like this:
SELECT * FROM TableB WHERE `left` > some_value1 AND `right` < some_value2
In the article's example, to get all the child nodes of "Portable Electronics" the query would be:
SELECT * FROM Electronics WHERE `left` > 10 and `right` < 19
The article describes how the left and right columns should be initialised.
If I understand the problem correctly, the following should give you the desired results:
SELECT B.donor_id AS donor_id, SUM(A.used_data) AS total_used_data FROM A
INNER JOIN B ON A.user_id = B.receptor_id GROUP BY B.donor_id;
Hope this will solve your problem now.
Try the query below (note that you will have to pass the user id in two places):
SELECT SUM(share_data) AS total_donated, SUM(used_data) AS total_used FROM tableA
LEFT JOIN tableB
ON tableA.user_id = tableB.donor_id
WHERE user_id IN (select receptor_id as id
from (select * from tableB
order by donor_id, receptor_id) u_sorted,
(select @pv := '1') initialisation
where find_in_set(donor_id, @pv) > 0
and @pv := concat(@pv, ',', receptor_id)) OR user_id = 1;
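Another way to walk the donation tree, not used in any of the answers above, is a recursive common table expression, which MySQL supports from version 8.0. The sketch below runs one through Python's sqlite3 module as a stand-in for MySQL. User 4 and the donation from 2 to 4 are hypothetical rows added to the question's sample so that a second level exists; the `:root` placeholder and the `donation_summary` helper are likewise just for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8.0+, which also supports WITH RECURSIVE.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tableA (user_id INTEGER PRIMARY KEY, free_data INTEGER, used_data INTEGER);
    CREATE TABLE tableB (donor_id INTEGER, receptor_id INTEGER, share_data INTEGER);
    -- Rows 1-3 are the question's sample; user 4 is a hypothetical second-level beneficiary.
    INSERT INTO tableA VALUES (1, 50, 10), (2, 30, 20), (3, 20, 20), (4, 10, 5);
    INSERT INTO tableB VALUES (1, 2, 30), (1, 3, 20), (2, 4, 10);
    """
)


def donation_summary(conn, root_id):
    """Return (data donated directly by root_id, data used by all its beneficiaries)."""
    return conn.execute(
        """
        WITH RECURSIVE beneficiaries(user_id) AS (
            -- direct receivers of the root user
            SELECT receptor_id FROM tableB WHERE donor_id = :root
            UNION  -- UNION (not UNION ALL) drops repeats if the data is graph-like
            -- receivers of anyone already found
            SELECT b.receptor_id
            FROM tableB AS b
            JOIN beneficiaries AS d ON b.donor_id = d.user_id
        )
        SELECT
            (SELECT COALESCE(SUM(share_data), 0)
               FROM tableB WHERE donor_id = :root) AS total_donated,
            (SELECT COALESCE(SUM(a.used_data), 0)
               FROM tableA AS a
               JOIN beneficiaries AS d ON a.user_id = d.user_id) AS total_used
        """,
        {"root": root_id},
    ).fetchone()


print(donation_summary(conn, 1))
```

With the hypothetical fourth user included, user 1 has donated 30 + 20 = 50 directly, and the beneficiaries at all levels have used 20 + 20 + 5 = 45.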

Coalesce pulling zero as value

this is driving me crazy!
I have three tables. The first table has a list of all records along with other data (region, dates, etc). The 2nd and 3rd tables have all the hours/cost data, but the 2nd table contains only historical records, and the 3rd table contains newer records. I want my coalesce to try to find a value in the newer records first, but if no record is found, to look in the historic table. For some reason, even though i KNOW there is a value in the historic table, the result of my coalesce is coming in as 0.
Table1
ID Region
1 US
2 US
3 Europe
4 US
5 Europe
6 US
Table2
ID Hours
1 10
2 15
3 20
Table3
ID Hours
4 3
5 7
6 4
So, my statement is written like this:
SELECT
t1.ID,
COALESCE(t3.hours, t2.hours) AS HOURS
FROM table1 t1
LEFT JOIN table2 t2
ON t1.ID=t2.ID
LEFT JOIN table3 t3
ON t1.ID=t3.ID
Now, for some reason, if the value is found in t3 (the newer records) it pulls in the correct value, but if it does not find a value and has to pull in a value from t2, it is pulling in a 0 instead of the actual value. Result looks like this:
ID HOURS
1 0
2 0
3 0
4 3
5 7
6 4
My guess is that it has something to do with the column type in table 2 (I have all column settings as VARCHAR(55)), but I can't find any rules around data types in the coalesce function about having to use only a certain column type with coalesce.
Appreciate any guidance!
edited to add results for Spencer's inquiry:
ID t2.hours + 0 t2.hours hex(t2.hours) length(t2.hours)
413190 240 240 F0 3
Incorrect joins:
FROM table1 t1
LEFT JOIN table2 t2 ON t1.ID=t2.ID
                             ^^
LEFT JOIN table3 t3 ON t1.ID=t2.ID
                             ^^
You're joining table 3 using values from table 2.
It looks like the evaluation of t2.hours in the COALESCE function is being done in numeric context, and the values stored in the hours column are evaluating to a numeric value of 0.
One quick way to check what numeric that evaluates to is to add a zero, as in the first expression in this query:
SELECT t2.hours + 0
, t2.hours
, HEX(t2.hours)
, LENGTH(t2.hours)
FROM table2 t2
I'm curious what that query shows for one of the rows that's returning a 0 from the COALESCE expression, whether the numeric evaluation is returning 0, and whether there's any leading wonky characters in the column value.
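For reference, here is a small sketch of what the COALESCE query returns when each LEFT JOIN matches on its own table's ID, using the sample rows from the question. It runs in Python's sqlite3 module rather than MySQL, and it stores Hours as INTEGER instead of the asker's VARCHAR(55), so it shows the expected result rather than reproducing the zeros.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; Hours is INTEGER here (an assumption,
# the asker's real columns are VARCHAR(55)).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (ID INTEGER PRIMARY KEY, Region TEXT);
    CREATE TABLE table2 (ID INTEGER, Hours INTEGER);  -- historical records
    CREATE TABLE table3 (ID INTEGER, Hours INTEGER);  -- newer records
    INSERT INTO table1 VALUES (1, 'US'), (2, 'US'), (3, 'Europe'),
                              (4, 'US'), (5, 'Europe'), (6, 'US');
    INSERT INTO table2 VALUES (1, 10), (2, 15), (3, 20);
    INSERT INTO table3 VALUES (4, 3), (5, 7), (6, 4);
    """
)

# Prefer the newer value (t3) and fall back to the historical one (t2).
hours = conn.execute(
    """
    SELECT t1.ID, COALESCE(t3.Hours, t2.Hours) AS HOURS
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.ID = t2.ID
    LEFT JOIN table3 t3 ON t1.ID = t3.ID
    ORDER BY t1.ID
    """
).fetchall()

for row in hours:
    print(row)
```

IDs 1 to 3 pick up 10, 15 and 20 from the historical table, and IDs 4 to 6 pick up 3, 7 and 4 from the newer one.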

Mysql query to count from A where not in B

I got a table A and a table B (and a Table C which is not really relevant). The relation is 1:n.
Table A
- id
- c_foreign_key
Table B
- id
- A_id
- datetime
Table A has about 400'000 entries, table B about 20 million.
I have a time range, let's say from 2014/01/01 to 2014/12/31.
What I want for each month in this range is:
Count all entries from table A, grouped by c_foreign_key, where table A has no entries in table B for (month - 1.year to month).
The Result should look like this:
date c_foreign_key count(*)
--------------------------------
14/01 1 2000
14/01 2 3000
...
14/02 1 4000
14/02 2 6000
...
I already tried a left join and "not in select" for each month, but the performance wasn't really good.
You should debug your SQL queries with EXPLAIN (more info at MySQL EXPLAIN Syntax). You should also place indexes on your datetime fields for better performance. EXPLAIN is usually used to see which indexes MySQL uses in your query.
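As a rough illustration of the shape such a query could take for one month, here is a sketch with a NOT EXISTS anti-join and a composite index on (A_id, datetime). It runs in Python's sqlite3 module as a stand-in for MySQL. The rows, the index name `idx_b_a_datetime` and the `count_inactive` helper are all made up for the example, and it assumes the window for a month is the twelve months before that month starts; whether this performs well on 20 million rows would still need checking with EXPLAIN on the real data.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE A (id INTEGER PRIMARY KEY, c_foreign_key INTEGER);
    CREATE TABLE B (id INTEGER PRIMARY KEY, A_id INTEGER, datetime TEXT);
    -- Composite index so the anti-join can look up (A_id, time window) directly.
    CREATE INDEX idx_b_a_datetime ON B (A_id, datetime);
    INSERT INTO A VALUES (1, 1), (2, 1), (3, 2), (4, 2);
    INSERT INTO B (A_id, datetime) VALUES
        (1, '2013-06-15 10:00:00'),
        (3, '2012-11-01 09:00:00'),
        (4, '2014-01-20 12:00:00');
    """
)


def count_inactive(conn, window_start, window_end):
    """Count A rows per c_foreign_key with no B row in [window_start, window_end)."""
    return conn.execute(
        """
        SELECT ta.c_foreign_key, COUNT(*)
        FROM A AS ta
        WHERE NOT EXISTS (
            SELECT 1 FROM B AS tb
            WHERE tb.A_id = ta.id
              AND tb.datetime >= ?
              AND tb.datetime < ?
        )
        GROUP BY ta.c_foreign_key
        ORDER BY ta.c_foreign_key
        """,
        (window_start, window_end),
    ).fetchall()


# Assumed window for January 2014: the twelve months before 2014-01-01.
january = count_inactive(conn, "2013-01-01 00:00:00", "2014-01-01 00:00:00")
print(january)
```

Running the helper once per month with a shifted window would give the per-month rows from the question's expected output.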

Obtain running frequency distribution from previous N rows of MySQL database

I have a MySQL database where one column contains status codes. The column is of type int and the values will only ever be 100,200,300,400. It looks like below; other columns removed for clarity.
id | status
----------------
1 300
2 100
3 100
4 200
5 300
6 300
7 100
8 400
9 200
10 300
11 100
12 400
13 400
14 400
15 300
16 300
The id field is auto-generated and will always be sequential. I want to have a third column displaying a comma-separated string of the frequency distribution of the status codes of the previous 10 rows. It should look like this.
id | status | freq
-----------------------------------
1 300
2 100
3 100
4 200
5 200
6 300
7 100
8 400
9 300
10 300
11 100 300,100,200,400 -- from rows 1-10
12 400 100,300,200,400 -- from rows 2-11
13 400 100,300,200,400 -- from rows 3-12
14 400 300,400,100,200 -- from rows 4-13
15 300 400,300,100,200 -- from rows 5-14
16 300 300,400,100 -- from rows 6-15
I want the most frequent code listed first. And where two status codes have the same frequency it doesn't matter to me which is listed first but I did list the smaller code before the larger in the example. Lastly, where a code doesn't appear at all in the previous ten rows, it shouldn't be listed in the freq column either.
And to be very clear the row number that the frequency string appears on does NOT take into account the status code of that row; it's only the previous rows.
So what have I done? I'm pretty green with SQL. I'm a programmer and I find this SQL language a tad odd to get used to. I managed the following self-join select statement.
select *, avg(b.status) freq
from sample a
join sample b
on (b.id < a.id) and (b.id > a.id - 11)
where a.id > 10
group by a.id;
Using the aggregate function avg, I can at least demonstrate the concept. The derived table b provides the correct rows to the avg function but I just can't figure out the multi-step process of counting and grouping rows from b to get a frequency distribution and then collapse the frequency rows into a single string value.
Also I've tried using standard stored functions and procedures in place of the built-in aggregate functions, but it seems the b derived table is out of scope or something. I can't seem to access it. And from what I understand writing a custom aggregate function is not possible for me as it seems to require developing in C, something I'm not trained for.
Here's sql to load up the sample.
create table sample (
id int NOT NULL AUTO_INCREMENT,
PRIMARY KEY(id),
status int
);
insert into sample(status) values(300),(100),(100),(200),(200),(300)
,(100),(400),(300),(300),(100),(400),(400),(400),(300),(300),(300)
,(100),(400),(100),(100),(200),(500),(300),(100),(400),(200),(100)
,(500),(300);
The sample has 30 rows of data to work with. I know it's a long question, but I just wanted to be as detailed as I could be. I've worked on this for a few days now and would really like to get it done.
Thanks for your help.
The only way I know of to do what you're asking is to use a BEFORE INSERT trigger. It has to be BEFORE INSERT because you want to update a value in the row being inserted, which can only be done in a BEFORE trigger. Unfortunately, that also means it won't have been assigned an ID yet, so hopefully it's safe to assume that at the time a new record is inserted, the last 10 records in the table are the ones you're interested in. Your trigger will need to get the values of the last 10 ID's and use the GROUP_CONCAT function to join them into a single string, ordered by the COUNT. I've been using SQL Server mostly and I don't have access to a MySQL server at the moment to test this, but hopefully my syntax will be close enough to at least get you moving in the right direction:
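DELIMITER //
CREATE TRIGGER sample_trigger BEFORE INSERT ON sample
FOR EACH ROW
BEGIN
    DECLARE _freq VARCHAR(50);
    -- Count each status among the 10 most recent rows, then join the codes, most frequent first.
    SELECT GROUP_CONCAT(tbl.status ORDER BY tbl.Occurrences DESC, tbl.status) INTO _freq
    FROM (SELECT last10.status, COUNT(*) AS Occurrences
          FROM (SELECT status FROM sample ORDER BY id DESC LIMIT 10) AS last10
          GROUP BY last10.status) AS tbl;
    -- Assumes the sample table has a freq column to hold the string.
    SET NEW.freq = _freq;
END//
DELIMITER ;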
SELECT id, GROUP_CONCAT(status ORDER BY freq desc) FROM
(SELECT a.id as id, b.status, COUNT(*) as freq
FROM
sample a
JOIN
sample b ON (b.id < a.id) AND (b.id > a.id - 11)
WHERE
a.id > 10
GROUP BY a.id, b.status) AS sub
GROUP BY id;
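To check the self-join approach against the expected freq column in the question, here is a sketch that runs the inner counting query through Python's sqlite3 module as a stand-in for MySQL. Because older SQLite versions do not accept ORDER BY inside GROUP_CONCAT, the final string is assembled in Python instead; the tie-break on the smaller status code is an addition that mirrors the question's example. Only the first 16 rows of the sample data are loaded.

```python
import sqlite3
from itertools import groupby

# In-memory SQLite stands in for MySQL; first 16 statuses from the question's insert.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (id INTEGER PRIMARY KEY AUTOINCREMENT, status INTEGER)"
)
statuses = [300, 100, 100, 200, 200, 300, 100, 400,
            300, 300, 100, 400, 400, 400, 300, 300]
conn.executemany("INSERT INTO sample (status) VALUES (?)", [(s,) for s in statuses])

# For each row a (id > 10), count each status among the 10 rows before it.
counts = conn.execute(
    """
    SELECT a.id, b.status, COUNT(*) AS freq
    FROM sample a
    JOIN sample b ON b.id < a.id AND b.id > a.id - 11
    WHERE a.id > 10
    GROUP BY a.id, b.status
    ORDER BY a.id, freq DESC, b.status
    """
).fetchall()

# Rows arrive sorted per id, most frequent first, so joining them gives the freq string.
freq_by_id = {
    row_id: ",".join(str(status) for _, status, _ in group)
    for row_id, group in groupby(counts, key=lambda r: r[0])
}

for row_id, freq in freq_by_id.items():
    print(row_id, freq)
```

The strings for ids 11 to 16 come out the same as the freq column in the question.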