Help diagnose bizarre MySQL query behavior - mysql

I have a very specific query that is acting up and I could use any help at all with debugging it.
There are 4 tables involved in this query.
Transaction_Type
    Transaction_ID (primary)
    Transaction_amount
    Transaction_Type
Transaction
    Transaction_ID (primary)
    Timestamp
Purchase
    Transaction_ID
    Item_ID
Item
    Item_ID
    Client_ID
Let's say there is a transaction in which someone pays $20 in cash and $0 in credit. It inserts two rows into the Transaction_Type table.
//row 1
Transaction_ID: 1
Transaction_amount: 20.00
Transaction_type: cash
//row 2
Transaction_ID: 1
Transaction_amount: 0.00
Transaction_type: credit
Here is the specific query:
SELECT
tt.Transaction_Amount, tt.Transaction_ID
FROM
ItemTracker_dbo.Transaction_Type tt
JOIN
ItemTracker_dbo.Transaction t
ON
tt.Transaction_ID = t.Transaction_ID
JOIN
ItemTracker_dbo.Purchase p
ON
p.Transaction_ID = tt.Transaction_ID
JOIN
ItemTracker_dbo.Item i
ON
i.Item_ID = p.Item_ID
WHERE
t.TimeStamp >= "2010-01-06 00:00:00" AND t.TimeStamp <= "2010-01-06 23:59:59"
AND
tt.Transaction_Format IN ('cash', 'credit')
AND
i.Client_ID = 3
When I execute this query, it returns 4 rows for a specific transaction (it should be 2).
When I remove ALL WHERE clauses and insert WHERE tt.Transaction_ID = problematicID, it only returns two.
EDIT: It still repeats after changing the date range.
The kicker:
When I change the initial date range, it only returns two rows for that specific transaction_id.
Is it the way I use JOIN? That's all I can think of...
EDIT: This is the problem.
In purchase, two separate purchase_IDs can have the same transaction_ID (purchase_ID breaks down specific item sales).
There are duplicate Transaction_ID rows in purchase.

We would need to see all the data in all the tables to know exactly where the problem is. However, since the joins are the problem, the cause is that one of your tables has two rows where you think it has only one.

There's a problem with your schema. You have rows with the same transaction_id, which is the primary key. I would think they couldn't be marked primary in that database. With two rows with the same id, that could cause unexpected extra rows to come back from the join(s).
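To illustrate the row multiplication described above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The lowercase table names, the purchase_id column and the sample rows are assumptions made up for the demonstration; only the duplicate Transaction_ID in purchase comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transaction_type (transaction_id INT, transaction_amount REAL, transaction_type TEXT);
CREATE TABLE purchase (purchase_id INTEGER PRIMARY KEY, transaction_id INT, item_id INT);
-- one transaction paid as $20 cash + $0 credit
INSERT INTO transaction_type VALUES (1, 20.00, 'cash'), (1, 0.00, 'credit');
-- the same transaction covers two purchased items (hypothetical item ids)
INSERT INTO purchase (transaction_id, item_id) VALUES (1, 10), (1, 11);
""")

# Joining to purchase repeats each payment row once per purchase row: 2 x 2 = 4.
fan_out = conn.execute("""
    SELECT tt.transaction_amount, tt.transaction_id
    FROM transaction_type tt
    JOIN purchase p ON p.transaction_id = tt.transaction_id
""").fetchall()

# Using purchase only as a filter (EXISTS) keeps one row per payment.
filtered = conn.execute("""
    SELECT tt.transaction_amount, tt.transaction_id
    FROM transaction_type tt
    WHERE EXISTS (SELECT 1 FROM purchase p
                  WHERE p.transaction_id = tt.transaction_id)
""").fetchall()

print(len(fan_out), len(filtered))  # 4 2
```

If purchase (and item) are only needed to restrict the result to one client, moving them into an EXISTS subquery, or using SELECT DISTINCT, is one way to avoid the extra rows.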

Related

How to add a tag based on a column value

I'm trying to join two tables and select certain columns to display in the output including a 'flag' if a certain transaction amount is greater than or equal to 100. The flag would return a 1 if it is, else null.
I thought I could achieve this using a CASE in my SELECT but it only returns one record every time since it returns the first record that meets this condition. How do I just create this 'FLAG' column during my join easily?
SELECT payment_id, amount, type,
CASE
WHEN amount >= 100 THEN 1
ELSE NULL
END AS flag
FROM trans JOIN customers ON (user_id = cust_id)
JOIN bank ON (trans.bank = bank.id)
WHERE (error is false)
I expect an output such as:
payment_id amount type flag
1 81 3 NULL
2 104 2 1
3 150 2 1
4 234 1 1
However, I'm only getting the first record such as:
payment_id amount type flag
2 104 2 1
I tried your table structure locally and it works perfectly.
One thing I need from you: which table contains the error column?
If I comment out the WHERE condition, it works fine.
If you're getting fewer rows than you expect, it's due to either:
1. Join condition
You're doing INNER joins to the customers and bank tables. If you have 4 source rows in your trans table, but only one row that matches in your customers table (condition user_id = cust_id), then you will only have one row returned.
The same goes for the subsequent join to your bank table. If you somehow have a transaction that references a bank which is not defined in the bank table, then you won't see a record for this row.
2. WHERE clause
Obviously you won't see any rows that don't meet the conditions specified here.
It's probably #1. Check whether the rows with payment_id IN (1,3,4) have corresponding user_id values in the customers table and corresponding bank id values in the bank table.
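One way to run that check is to switch the joins to LEFT JOIN temporarily and look at which side comes back NULL. The sketch below uses Python's sqlite3 module as a stand-in for MySQL; the user_id and bank values are invented so that only payment 2 has a matching customer and bank, which is an assumption about the asker's data, not something stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trans (payment_id INT, amount INT, type INT, user_id INT, bank INT);
CREATE TABLE customers (cust_id INT);
CREATE TABLE bank (id INT);
INSERT INTO trans VALUES (1, 81, 3, 7, 1), (2, 104, 2, 8, 1),
                         (3, 150, 2, 9, 2), (4, 234, 1, 10, 1);
INSERT INTO customers VALUES (8);   -- only user 8 exists (hypothetical)
INSERT INTO bank VALUES (1);        -- only bank 1 exists (hypothetical)
""")

# INNER JOINs silently drop every trans row without a match on both sides.
inner_rows = conn.execute("""
    SELECT payment_id
    FROM trans
    JOIN customers ON user_id = cust_id
    JOIN bank ON trans.bank = bank.id
""").fetchall()

# LEFT JOINs keep all trans rows; the NULL checks show which lookup failed.
diagnostic = conn.execute("""
    SELECT t.payment_id,
           c.cust_id IS NULL AS missing_customer,
           b.id IS NULL AS missing_bank,
           CASE WHEN t.amount >= 100 THEN 1 END AS flag
    FROM trans t
    LEFT JOIN customers c ON t.user_id = c.cust_id
    LEFT JOIN bank b ON t.bank = b.id
    ORDER BY t.payment_id
""").fetchall()

print(inner_rows)   # [(2,)]
for row in diagnostic:
    print(row)
```

The CASE expression itself is computed per row and never reduces the row count, so the flag column appears on all four rows once the joins stop filtering.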

SQL Validate a column with the same column

I have the following situation. I have a table with all the article information, and I would like to compare a column with itself, because I have multiple types of article: single products and master products. The only way I can tell them apart is by SKU. For example:
ID | SKU
1 | 11111
2 | 11112
3 | 11113
4 | 11113-5
5 | 11113-8
6 | 11114
7 | 11115
8 | 11115-1-W
9 | 11115-2
10 | 11116
I only want to list and/or count the SKUs that are fully unique. Following the example, the SKUs that are unique and have no variants are ID = 1, 2, 6 and 10. I want to create a query where, if 11113 appears again in the column, it is not counted. So in total I would get 4 unique SKUs and not 6. Please let me know if this is possible.
Assuming master SKUs are 5 characters long, try this:
select a.*
from mytable a
left join mytable b on b.sku like concat(a.sku, '-%')
where length(a.sku) = 5
and b.sku is null
This query joins master SKUs to child ones, but filters out successful joins, leaving only solitary master SKUs. The '-' in the pattern matters: with a bare '%', every master SKU would also match itself and no rows would be returned.
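A quick way to try the anti-join idea against the sample data is sketched below with Python's sqlite3 module standing in for MySQL. SQLite's || operator is used in place of CONCAT, and the pattern appends '-%' so that a master SKU cannot match its own row; both are choices made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, sku TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, sku) VALUES (?, ?)",
    [(1, "11111"), (2, "11112"), (3, "11113"), (4, "11113-5"),
     (5, "11113-8"), (6, "11114"), (7, "11115"), (8, "11115-1-W"),
     (9, "11115-2"), (10, "11116")],
)

# LEFT JOIN each master SKU to its variants, then keep masters with no variant.
solitary = [row[0] for row in conn.execute("""
    SELECT a.id
    FROM mytable a
    LEFT JOIN mytable b ON b.sku LIKE (a.sku || '-%')
    WHERE length(a.sku) = 5
      AND b.sku IS NULL
    ORDER BY a.id
""")]

print(solitary)  # [1, 2, 6, 10]
```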
You can do this by grouping and counting the unique rows.
First, we will need to take your table and add a new column, MasterSKU. This will be the first five characters of the SKU column. Once we have the MasterSKU, we can then GROUP BY it. This will bundle together all of the rows having the same MasterSKU. Once we are grouping we get access to aggregate functions like COUNT(). We will use that function to count the number of rows for each MasterSKU. Then, we will filter out any rows that have a COUNT() over 1. That will leave you with only the unique rows remaining.
Take that unique list and LEFT JOIN it back into your original table to grab the IDs.
SELECT B.ID, A.MasterSKU
FROM (
SELECT
SUBSTRING(SKU,1,5) AS MasterSKU,
COUNT(*) AS MasterSKUCount
FROM MyTable
GROUP BY SUBSTRING(SKU,1,5)
HAVING COUNT(*) = 1
) AS A
LEFT JOIN (
SELECT
ID,
SUBSTRING(SKU,1,5) AS MasterSKU
FROM MyTable
) AS B
ON A.MasterSKU = B.MasterSKU
Now, one thing I noticed from your example: the original SKU column really looks like three columns in one. We have multiple values joined with hyphens.
11115-1-W
There may be a reason for it, but most likely this violates first normal form and will make the database hard to query. It's part of the reason why such a complicated query is needed. If the SKU column really represents multiple things then we may want to consider breaking it out into MasterSKU, Version, and Color or whatever each hyphen represents.
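For comparison, the grouping approach can be sketched in a shorter form. Because every group that survives HAVING COUNT(*) = 1 contains exactly one row, MIN(id) recovers that row's ID without joining back to the table. This is a variation on the query above, not the answer's exact text, and it uses Python's sqlite3 module (with substr) as a stand-in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, sku TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, sku) VALUES (?, ?)",
    [(1, "11111"), (2, "11112"), (3, "11113"), (4, "11113-5"),
     (5, "11113-8"), (6, "11114"), (7, "11115"), (8, "11115-1-W"),
     (9, "11115-2"), (10, "11116")],
)

# Group by the first five characters and keep groups with a single member.
unique_masters = conn.execute("""
    SELECT MIN(id) AS id, substr(sku, 1, 5) AS master_sku
    FROM mytable
    GROUP BY substr(sku, 1, 5)
    HAVING COUNT(*) = 1
    ORDER BY id
""").fetchall()

print(unique_masters)
# [(1, '11111'), (2, '11112'), (6, '11114'), (10, '11116')]
```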

Retrieve rows that have a first entry in 2014 in MySQL

I want to retrieve all rows from a table that have their first entry on or after 01/01/2014 but no later than 31/12/2014
Example of the table:
OID FK_OID Treatment Trt_DATE
1 100 19304 2011-05-24
2 100 19304 2011-08-01
3 100 19306 2014-03-05
4 200 19305 2012-02-02
5 300 19308 2014-01-20
6 400 19308 2014-06-06
For example, I would like to pull all entries that have STARTED treatment in 2014. So above I would like to extract FK_OIDs 300 and 400 because their first entry is in 2014, but I would like to omit FK_OID 100 because they have 2 entries prior to 2014.
How do I go about this? I can extract all entries within a date range etc., but that brings back all entries for that date range and doesn't omit anyone who has an entry prior to the start of the date range. It just returns their first entry in 2014.
For those who need to see that I have tried something, see below.
I am not an experienced coder and this is the best I can get because I don't have the knowledge.
SELECT
mod,
(select NHSNum from person p
WHERE
p.oid = t.fk_oid) as 'NHS'
FROM
timeline t
Where trt_date BETWEEN '2014-01-01' AND '2014-12-31'
ORDER BY trt_date ASC
This returns every treatment for 2014 regardless of whether it is the first ever one for that person. I want to omit anyone from this list who has had treatment before 01/01/2014 as well as only return the first treatment per person. For example, this code returns all treatments for all people in 2014. I only want their first one and only if it is their first one ever.
Thanks.
create table aThing
( oid int auto_increment primary key,
fk_oid int not null,
treatment int not null,
trt_date date not null
);
insert aThing (fk_oid,treatment,trt_date) values
(100, 19304, '2011-05-24'),
(100, 19304, '2011-08-01'),
(100, 19306, '2014-03-05'),
(200, 19305, '2012-02-02'),
(300, 19308, '2014-01-20'),
(400, 19308, '2014-06-06');
select fk_oid,dt
from
( select fk_oid,min(trt_date) as dt
from aThing
group by fk_oid
) xDerived
where year(dt)=2014;
+--------+------------+
| fk_oid | dt |
+--------+------------+
| 300 | 2014-01-20 |
| 400 | 2014-06-06 |
+--------+------------+
The inner, nested part becomes a derived table and is given the name xDerived. This means that even though it is just a result set, making it a derived table lets it be referred to by name. So it is not a physical table, but a derived, or virtual, one.
So that derived table is a very simple group by with an aggregate function. It says, for every fk_oid, bring back one row and only 1 row, with its minimum value for trt_date.
So if you have 10 million rows in that table called aThing, but only 17 distinct values for fk_oid, it will return only 17 rows. Each row being the minimum of trt_date for its fk_oid.
So now that that is achieved, the outer wrapper says: just show me those two columns (but with a year check). The reason I had to do it that way takes some explaining, so bear with me.
The short version: I had to get the MIN into an alias, and I only had access to that alias once it was resolved in a derived table and then accessed from an outer wrapper.
An alias of an aggregate column, like dt here, is not available in the WHERE clause of the same query. But by wrapping the query in a derived table, I can refer to the alias in the outer WHERE clause.
So I can't access it directly in the WHERE clause of its own query, but when I wrap it in an envelope (a derived table), I can access it on the outside.
I may try to explain it better later, but I would have to show alternative attempts to access the results, and the syntax errors that would result.
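As a side note on the alias discussion, the same filter can also be written without a derived table by putting the aggregate in a HAVING clause, which is evaluated after grouping. The sketch below compares the two forms on the sample data, using Python's sqlite3 module as a stand-in for MySQL and a date-range comparison in place of YEAR(), which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE aThing (
    oid INTEGER PRIMARY KEY,
    fk_oid INT NOT NULL,
    treatment INT NOT NULL,
    trt_date TEXT NOT NULL
);
INSERT INTO aThing (fk_oid, treatment, trt_date) VALUES
    (100, 19304, '2011-05-24'), (100, 19304, '2011-08-01'),
    (100, 19306, '2014-03-05'), (200, 19305, '2012-02-02'),
    (300, 19308, '2014-01-20'), (400, 19308, '2014-06-06');
""")

# Form 1: aggregate in a derived table, filter on the alias outside it.
derived = conn.execute("""
    SELECT fk_oid, dt
    FROM (SELECT fk_oid, MIN(trt_date) AS dt
          FROM aThing
          GROUP BY fk_oid) xDerived
    WHERE dt BETWEEN '2014-01-01' AND '2014-12-31'
    ORDER BY fk_oid
""").fetchall()

# Form 2: filter on the aggregate itself with HAVING.
having = conn.execute("""
    SELECT fk_oid, MIN(trt_date) AS dt
    FROM aThing
    GROUP BY fk_oid
    HAVING MIN(trt_date) BETWEEN '2014-01-01' AND '2014-12-31'
    ORDER BY fk_oid
""").fetchall()

print(derived)  # [(300, '2014-01-20'), (400, '2014-06-06')]
print(derived == having)  # True
```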
There's probably a more elegant solution, but this seems to satisfy the requirement...
SELECT x.*
FROM my_table x
JOIN
( SELECT fk_oid
, MIN(trt_date) min_date
FROM my_table
GROUP
BY fk_oid
HAVING min_date >= '2014-01-01'
) a
ON a.fk_oid = x.fk_oid
LEFT
JOIN my_table b
ON b.fk_oid = a.fk_oid
AND b.trt_date > '2014-12-31'
WHERE b.oid IS NULL;
Having a few years of experience with this now, I decided to revisit it. The solution I now use regularly is:
SELECT t1.column1, t1.column2
FROM MyTable AS t1
LEFT OUTER JOIN MyTable AS t2
ON t1.fkoid = t2.fkoid
AND (t1.date > t2.date
OR (t1.date = t2.date AND t1.oid > t2.oId))
WHERE t2.fkoid IS NULL and t1.date >= '2014-01-01'
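The self-join pattern above can be tried on the question's sample rows as follows. This is a sketch using Python's sqlite3 module as a stand-in for MySQL, with the question's column names (fk_oid, trt_date) substituted for the generic ones. The upper bound on the date is an addition in this sketch, so that only first treatments within 2014 are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE timeline (
    oid INTEGER PRIMARY KEY,
    fk_oid INT NOT NULL,
    treatment INT NOT NULL,
    trt_date TEXT NOT NULL
);
INSERT INTO timeline (fk_oid, treatment, trt_date) VALUES
    (100, 19304, '2011-05-24'), (100, 19304, '2011-08-01'),
    (100, 19306, '2014-03-05'), (200, 19305, '2012-02-02'),
    (300, 19308, '2014-01-20'), (400, 19308, '2014-06-06');
""")

# t2 matches any earlier row for the same person (ties broken by oid).
# A row with no such t2 is that person's first treatment.
first_in_2014 = conn.execute("""
    SELECT t1.oid, t1.fk_oid, t1.trt_date
    FROM timeline AS t1
    LEFT OUTER JOIN timeline AS t2
      ON t1.fk_oid = t2.fk_oid
     AND (t1.trt_date > t2.trt_date
          OR (t1.trt_date = t2.trt_date AND t1.oid > t2.oid))
    WHERE t2.fk_oid IS NULL
      AND t1.trt_date BETWEEN '2014-01-01' AND '2014-12-31'
    ORDER BY t1.oid
""").fetchall()

print(first_in_2014)  # [(5, 300, '2014-01-20'), (6, 400, '2014-06-06')]
```

Unlike the MIN-based versions, this form returns the whole first row, so other columns such as treatment can be selected directly.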

How do I find multiple logs in the same time interval?

I have a table with two columns of importance, customer ID# and timestamp. Whenever a customer orders something, five rows are created with the customer ID # and the timestamp of when it went through.
If there are more than five rows, it means our system hasn't processed the order correctly and there could be a problem. I was asked to look through the log to find the customer IDs of any people who received more than 5, as well as how many times they received an incorrect amount and the number they received each time (when it was not 5).
I want it to show me, whenever the same customer ID (in column "ID") has more than 5 rows with the same timestamp (column "stamp") it will tell me 1. the person's customer ID 2. how many times this irregularity has happened to that customer ID, and 3. how many rows were in each irregularity (was it 6 or 7... or more? etc.) (if #2 was 3 times, I would like #3 to be an array like { 7, 8, 6 })
I don't know if this is possible... but any help at all will be appreciated. Thanks!
This should get you most of the way there:
SELECT `CustomerID`, `Timestamp`, COUNT(1)
FROM
OrderItems
GROUP BY
`CustomerID`, `Timestamp`
HAVING
COUNT(1) > 5
This will get you the IDs and Timestamps with more than 5 rows. I am making the assumption that the timestamps for all 5 (or more rows) are identical.
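To get the rest of the way (how many irregularities per customer, and the size of each), one option is to aggregate a second time over that result. The sketch below is an illustration using Python's sqlite3 module as a stand-in for MySQL; the table name order_items, the columns id and stamp, and the sample rows are assumptions. GROUP_CONCAT exists in both MySQL and SQLite, but the order of the concatenated values is not guaranteed unless you request one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (id INT, stamp TEXT)")

# Hypothetical log: customer 1 has one normal order (5 rows) and two bad
# ones (7 and 6 rows); customer 2 is normal; customer 3 has one bad order.
rows = (
    [(1, "2020-01-01 10:00:00")] * 5
    + [(1, "2020-01-02 10:00:00")] * 7
    + [(1, "2020-01-03 10:00:00")] * 6
    + [(2, "2020-01-01 11:00:00")] * 5
    + [(3, "2020-01-01 12:00:00")] * 8
)
conn.executemany("INSERT INTO order_items VALUES (?, ?)", rows)

# Inner query: one row per (customer, timestamp) with more than 5 rows.
# Outer query: per customer, count those events and list their sizes.
report = conn.execute("""
    SELECT id,
           COUNT(*) AS irregularities,
           GROUP_CONCAT(n) AS sizes
    FROM (SELECT id, stamp, COUNT(*) AS n
          FROM order_items
          GROUP BY id, stamp
          HAVING COUNT(*) > 5) AS bad
    GROUP BY id
    ORDER BY id
""").fetchall()

for customer_id, irregularities, sizes in report:
    print(customer_id, irregularities, sorted(sizes.split(",")))
```

Changing the HAVING condition to COUNT(*) <> 5 would also catch orders with fewer than five rows, if those count as irregular too.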
SELECT A.ID, A.TIMESTAMP
FROM "TABLE" A
WHERE
(SELECT COUNT(B.ID)
FROM "TABLE" B
WHERE B.ID = A.ID
AND B.TIMESTAMP = A.TIMESTAMP) > 5

mysql update with a self referencing query

I have a table of surveys which contains (amongst others) the following columns
survey_id - unique id
user_id - the id of the person the survey relates to
created - datetime
ip_address - of the submission
ip_count - the number of duplicates
Due to a large record set, it's impractical to run this query on the fly, so I am trying to create an update statement which will periodically store a "cached" result in ip_count.
The purpose of ip_count is to show the number of duplicate ip_address survey submissions that have been received for the same user_id within a 12 month period (+/- 6 months of the created date).
Using the following dataset, this is the expected result.
survey_id  user_id  created    ip_address   ip_count  # counted duplicates survey_id
1          1        01-Jan-12  123.132.123  1         # 2
2          1        01-Apr-12  123.132.123  2         # 1, 3
3          2        01-Jul-12  123.132.123  0         #
4          1        01-Aug-12  123.132.123  3         # 2, 6
6          1        01-Dec-12  123.132.123  1         # 4
This is the closest solution I have come up with so far, but this query fails to take the date restriction into account and I am struggling to come up with an alternative method.
UPDATE surveys
JOIN(
SELECT ip_address, created, user_id, COUNT(*) AS total
FROM surveys
WHERE surveys.state IN (1, 3) # survey is marked as completed and confirmed
GROUP BY ip_address, user_id
) AS ipCount
ON (
ipCount.ip_address = surveys.ip_address
AND ipCount.user_id = surveys.user_id
AND ipCount.created BETWEEN (surveys.created - INTERVAL 6 MONTH) AND (surveys.created + INTERVAL 6 MONTH)
)
SET surveys.ip_count = ipCount.total - 1 # minus 1 as this query will match on its own id.
WHERE surveys.ip_address IS NOT NULL # ignore surveys where we have no ip_address
Thank you for your help in advance :)
A few (very) minor tweaks to what is shown above. Thank you again!
UPDATE surveys AS s
INNER JOIN (
SELECT x, count(*) c
FROM (
SELECT s1.id AS x, s2.id AS y
FROM surveys AS s1, surveys AS s2
WHERE s1.state IN (1, 3) # completed and verified
AND s1.id != s2.id # dont self join
AND s1.ip_address != "" AND s1.ip_address IS NOT NULL # not interested in blank entries
AND s1.ip_address = s2.ip_address
AND (s2.created BETWEEN (s1.created - INTERVAL 6 MONTH) AND (s1.created + INTERVAL 6 MONTH))
AND s1.user_id = s2.user_id # where completed for the same user
) AS ipCount
GROUP BY x
) n on s.id = n.x
SET s.ip_count = n.c
I don't have your table with me, so it's hard for me to write SQL that definitely works, but I can take a shot at this and hopefully help you.
First I would need to take the cartesian product of surveys against itself and filter out the rows I don't want
select s1.survey_id x, s2.survey_id y
from surveys s1, surveys s2
where s1.survey_id != s2.survey_id
and s1.user_id = s2.user_id
and s1.ip_address = s2.ip_address
and s2.created between (s1.created - interval 6 month) and (s1.created + interval 6 month)
The output of this should contain every pair of surveys that match (according to your rules) TWICE (once for each id in the 1st position and once for it to be in the 2nd position)
Then we can do a GROUP BY on the output of this to get a table that basically gives me the correct ip_count for each survey_id
(select x, count(*) c
from (
select s1.survey_id x, s2.survey_id y
from surveys s1, surveys s2
where s1.survey_id != s2.survey_id
and s1.user_id = s2.user_id
and s1.ip_address = s2.ip_address
and s2.created between (s1.created - interval 6 month) and (s1.created + interval 6 month)
) pairs
group by x)
So now we have a table mapping each survey_id to its correct ip_count. To update the original table, we need to join that against this and copy the values over
So that should look something like
UPDATE surveys s inner join (ABOVE QUERY) n on s.survey_id = n.x SET s.ip_count = n.c
There is some pseudo code in there, but I think the general idea should work
I have never had to update a table based on the output of another query myself before.. Tried to guess the right syntax for doing this from this question - How do I UPDATE from a SELECT in SQL Server?
Also, if I needed to do something like this for my own work, I wouldn't attempt to do it in a single query. It would be a pain to maintain and might have memory/performance issues. It would be best to have a script traverse the table row by row, updating a single row in a transaction before moving on to the next row. Much slower, but simpler to understand and possibly lighter on your database.
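A rough sketch of that row-by-row approach is shown below, using Python's sqlite3 module as a stand-in for MySQL. The dates from the question are rewritten in ISO format, SQLite's date(..., '-6 months') modifier replaces MySQL's INTERVAL arithmetic, and the state column is left out; all of these are simplifications for the sketch, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE surveys (
    survey_id INTEGER PRIMARY KEY,
    user_id INT,
    created TEXT,
    ip_address TEXT,
    ip_count INT DEFAULT 0
);
INSERT INTO surveys (survey_id, user_id, created, ip_address) VALUES
    (1, 1, '2012-01-01', '123.132.123'),
    (2, 1, '2012-04-01', '123.132.123'),
    (3, 2, '2012-07-01', '123.132.123'),
    (4, 1, '2012-08-01', '123.132.123'),
    (6, 1, '2012-12-01', '123.132.123');
""")

# Count the OTHER surveys for the same user and IP within +/- 6 months.
count_sql = """
    SELECT COUNT(*)
    FROM surveys
    WHERE survey_id != :survey_id
      AND user_id = :user_id
      AND ip_address = :ip_address
      AND created BETWEEN date(:created, '-6 months')
                      AND date(:created, '+6 months')
"""

todo = conn.execute("""
    SELECT survey_id, user_id, created, ip_address
    FROM surveys
    WHERE ip_address IS NOT NULL AND ip_address != ''
""").fetchall()

for survey_id, user_id, created, ip_address in todo:
    with conn:  # commits after each row, i.e. one small transaction per survey
        (n,) = conn.execute(count_sql, {
            "survey_id": survey_id, "user_id": user_id,
            "created": created, "ip_address": ip_address,
        }).fetchone()
        conn.execute("UPDATE surveys SET ip_count = ? WHERE survey_id = ?",
                     (n, survey_id))

counts = dict(conn.execute("SELECT survey_id, ip_count FROM surveys"))
print(counts)  # {1: 1, 2: 2, 3: 0, 4: 2, 6: 1}
```

Note that under these rules survey 4 comes out as 2, which matches the two duplicates listed for it in the question (2 and 6) rather than the 3 shown in the ip_count column.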