Something wrong with Inner Join on Teradata - mysql

I had to find the number of consumers on a network. The client suggested using table "abcd" and making sure the condition manufacturer='big_company' is met, so I ran the query below on Teradata.
select count(*) from (select tel_num, manufacturer from abcd
where manufacturer='big_company'
and tel_num is not null) pqr
This query ran properly and returned a total of 600 million records.
The client's next question was: out of the consumers on the network, how many are choosing a particular service? I was asked to use table "wxyz" and ensure the condition postpaid=1 is met. To achieve this I created an inner join between abcd and wxyz on tel_num. Below is the query I used:
select cast(count(*) as bigint) from (select a.tel_num, b.postpaid from
abcd as a inner join wxyz as b on a.tel_num=b.tel_num
where a.manufacturer='big_company'
and b.postpaid=1) xyz
The above query returns a count of 5 billion records.
This seems very strange: since I used an inner join, I expected the number of records in the second query to be less than 600 million. I can't figure out where I'm going wrong.

As #useless'MJ already put it in a comment, you are probably getting multiple rows per tel_num from table wxyz, and an inner join returns one output row for every matching pair. You could avoid JOIN and DISTINCT altogether by using EXISTS, as in:
select cast (count (*)as bigint)
from abcd a
where exists (select 1 from wxyz b
where a.tel_num=b.tel_num and b.postpaid=1)
and a.manufacturer='big_company'
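To make the row multiplication concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Teradata. The table and column names come from the question, but the sample rows are invented purely for illustration. It compares the inner-join count with the EXISTS count and lists the tel_num values that are duplicated in wxyz.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE abcd (tel_num TEXT, manufacturer TEXT);
    CREATE TABLE wxyz (tel_num TEXT, postpaid INTEGER);
    -- Invented sample data: tel_num 111 appears three times in wxyz.
    INSERT INTO abcd VALUES ('111', 'big_company'), ('222', 'big_company'), ('333', 'other');
    INSERT INTO wxyz VALUES ('111', 1), ('111', 1), ('111', 1), ('222', 0), ('333', 1);
""")

# Inner join: one output row per matching (abcd row, wxyz row) pair.
join_count = conn.execute("""
    SELECT COUNT(*)
    FROM abcd a
    INNER JOIN wxyz b ON a.tel_num = b.tel_num
    WHERE a.manufacturer = 'big_company' AND b.postpaid = 1
""").fetchone()[0]

# EXISTS: each abcd row is counted at most once, however many matches it has.
exists_count = conn.execute("""
    SELECT COUNT(*)
    FROM abcd a
    WHERE a.manufacturer = 'big_company'
      AND EXISTS (SELECT 1 FROM wxyz b
                  WHERE b.tel_num = a.tel_num AND b.postpaid = 1)
""").fetchone()[0]

# Diagnostic: which tel_num values occur more than once in wxyz?
duplicates = conn.execute("""
    SELECT tel_num, COUNT(*)
    FROM wxyz
    WHERE postpaid = 1
    GROUP BY tel_num
    HAVING COUNT(*) > 1
""").fetchall()

print(join_count, exists_count, duplicates)
```

If abcd itself can hold the same tel_num more than once, the EXISTS version still counts every such row, so counting distinct tel_num values may be closer to what the client wants.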

Related

SQL Query - Distinct on One Column for Distinct Value of Other (with INNER JOIN)

I appreciate that questions similar to this one have been asked on here before but I have thus far been unable to implement the answers provided into my code both because of wanting to distinguish duplicates in one column only whilst the other stays the same and the INNER JOIN in my code. The INNER JOIN is problematic because most of the provided answers use the PARTITION function and, being a novice with SQL, I do not know how to integrate this with it. Advice just on using INNER JOIN with PARTITION would be useful.
Whilst I could do this post-export in Python (where I will be using the desired output), this code currently outputs ~2 million rows, making it time-consuming to work with and check. Here is the code:
SELECT client_ip_address, language_enum_code
FROM vw_user_session_log AS usl
INNER JOIN vw_user_topic_ownership AS uto
ON usl.user_id = uto.user_id
Using SELECT DISTINCT instead of SELECT gets me closer to the desired output but rather than leaving one duplicate row behind it removes all of them. Advice on using this function whilst preserving one of the duplicate rows would be preferred. I am on a read-only connection to the database so the DELETE FROM approach seen here would only be viable if I could make a temporary query-able table from the query output which I don't think is possible and seems clumsy.
Raw data sample:
user_id   client_ip_address   language_enum_code   (other stuff...)
4         194:4:62:18         107
2         101:9:23:34         14
3         180:4:87:99         15
3         194:4:62:18         15
4         166:1:19:27         107
2         166:1:19:27         14
Desired result:
user_id   client_ip_address   language_enum_code   (other stuff...)
4         194:4:62:18         107
2         101:9:23:34         14
3         180:4:87:99         15
As you can see, any id-enum combination should be filtered to occur only once. The reason this is not any ip-enum combination is that multiple users can connect through the same IP address.
If you don't care which IP address you keep for each user_id / enum combination, then something like this should do (user_id is qualified with a table alias because it exists in both views):
SELECT usl.user_id, MIN(client_ip_address), language_enum_code
FROM vw_user_session_log AS usl
INNER JOIN vw_user_topic_ownership AS uto
ON usl.user_id = uto.user_id
WHERE client_ip_address IS NOT NULL
GROUP BY usl.user_id, language_enum_code
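Here is a small sketch of that GROUP BY approach run against the sample rows from the question, using Python's sqlite3 module. Which view holds which column is an assumption: I put client_ip_address in the session log and language_enum_code in the topic ownership view, and model both as plain tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed column placement; the real views may differ.
    CREATE TABLE vw_user_session_log (user_id INTEGER, client_ip_address TEXT);
    CREATE TABLE vw_user_topic_ownership (user_id INTEGER, language_enum_code INTEGER);
    INSERT INTO vw_user_session_log VALUES
        (4, '194:4:62:18'), (2, '101:9:23:34'), (3, '180:4:87:99'),
        (3, '194:4:62:18'), (4, '166:1:19:27'), (2, '166:1:19:27');
    INSERT INTO vw_user_topic_ownership VALUES (4, 107), (2, 14), (3, 15);
""")

# One row per (user_id, language_enum_code); MIN() picks the IP that sorts first.
rows = conn.execute("""
    SELECT usl.user_id, MIN(usl.client_ip_address), uto.language_enum_code
    FROM vw_user_session_log AS usl
    INNER JOIN vw_user_topic_ownership AS uto
        ON usl.user_id = uto.user_id
    WHERE usl.client_ip_address IS NOT NULL
    GROUP BY usl.user_id, uto.language_enum_code
    ORDER BY usl.user_id
""").fetchall()

for row in rows:
    print(row)
```

Note that for user 4 this keeps 166:1:19:27 rather than the 194:4:62:18 shown in the desired result, because MIN() picks the value that sorts first as text. That is fine only if any one IP per user will do.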
Do you simply want aggregation?
SELECT client_ip_address, GROUP_CONCAT(DISTINCT language_enum_code)
FROM vw_user_session_log usl INNER JOIN
vw_user_topic_ownership uto
ON usl.user_id = uto.user_id
GROUP BY client_ip_address;
This will return one row per client_ip_address, with the language codes for that address in a comma-delimited list.
You can also use MIN() or MAX() to get an arbitrary value for language_enum_code for each client_ip_address.
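For comparison, here is a sketch of the GROUP_CONCAT variant on the same sample rows, again with sqlite3 and the same assumed column placement (SQLite also provides GROUP_CONCAT, although its ordering inside the list is not guaranteed). It groups by IP address rather than by user, so a shared IP collects the codes of every user behind it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed column placement; the real views may differ.
    CREATE TABLE vw_user_session_log (user_id INTEGER, client_ip_address TEXT);
    CREATE TABLE vw_user_topic_ownership (user_id INTEGER, language_enum_code INTEGER);
    INSERT INTO vw_user_session_log VALUES
        (4, '194:4:62:18'), (2, '101:9:23:34'), (3, '180:4:87:99'),
        (3, '194:4:62:18'), (4, '166:1:19:27'), (2, '166:1:19:27');
    INSERT INTO vw_user_topic_ownership VALUES (4, 107), (2, 14), (3, 15);
""")

rows = conn.execute("""
    SELECT usl.client_ip_address, GROUP_CONCAT(DISTINCT uto.language_enum_code)
    FROM vw_user_session_log usl
    INNER JOIN vw_user_topic_ownership uto
        ON usl.user_id = uto.user_id
    GROUP BY usl.client_ip_address
""").fetchall()

# Split each comma-delimited list and sort it, since list order is unspecified.
codes_by_ip = {ip: sorted(int(c) for c in codes.split(",")) for ip, codes in rows}
print(codes_by_ip)
```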

Creating joins based on range of number value

Could you advise me on the situation below?
I have 2 tables.
Table 1 looks like this:
Meanwhile, this is table 2:
I would like to join table 2 to table 1 to lookup the grade for each job based on the upper and lower limit column.
Drawing on some of the lovely answers here, I managed to come up with a statement that looks something like this:
FROM table2 LEFT JOIN table1 ON (table2.[score] >= table1.[lower limit]) AND (table2.[score] <= table1.[upper limit])
The statement above manages to join the tables according to a range. However, for some unknown reason, some rows from the left table go missing and I could not determine why (e.g. 2000 rows in table 2, but only 1800 in the query).
I am sure the join is the cause, because if I change it to a left join on equality, all 2000 rows appear in the query.
Can someone advise me on this?
Regards,
Guang Yong
Perhaps it would be much cleaner to create a table with values from 1-100 and assign each of them one of your categories, essentially mirroring your table 1.
Then you can join Table 2 to it on equality:
SELECT Table1.Grade, Table2.Score
FROM Table2 LEFT JOIN Table1 ON Table2.Score = Table1.Score
This would definitely cover all integers between 0 and 100.
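A rough sketch of that lookup-table idea, using Python's sqlite3 module instead of Access. The table name grade_lookup, the job column and the sample scores are hypothetical, and the range boundaries (0-59, 60-69, 70-100) are simply borrowed from the IIf thresholds used elsewhere in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grade_lookup (score INTEGER PRIMARY KEY, grade TEXT)")
conn.execute("CREATE TABLE table2 (job TEXT, score INTEGER)")

# Hypothetical ranges; expand each one into one lookup row per integer score.
ranges = [(0, 59, "bad"), (60, 69, "so so"), (70, 100, "Good")]
lookup_rows = [(score, grade)
               for low, high, grade in ranges
               for score in range(low, high + 1)]
conn.executemany("INSERT INTO grade_lookup VALUES (?, ?)", lookup_rows)

# Invented sample jobs, including one without a score.
conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                 [("A", 45), ("B", 65), ("C", 70), ("D", None)])

# Plain equality LEFT JOIN: every table2 row is kept, matched or not.
result = conn.execute("""
    SELECT t2.job, t2.score, g.grade
    FROM table2 t2
    LEFT JOIN grade_lookup g ON t2.score = g.score
    ORDER BY t2.job
""").fetchall()

for row in result:
    print(row)
```

This only covers whole-number scores. A value such as 59.5, or a missing score, finds no lookup row and comes back with an empty grade, which is also worth checking for when rows seem to disappear from a range join.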
If you are manually inputting the scores, you could also use a data macro as simple as this:
go to Table Tools >> Table >> Before Change
Then use the Set Field Action, and set
Name = Table2.Grade
Value = IIf([Score]>=70,"Good",IIf([Score]<=59,"bad","so so"))
With this, every time you type in a score, it will automatically populate the grade column.
Another option is to create a query as follows, which will evaluate each line and assign the proper grade:
SELECT Table2.Score,
IIf([Score]>=70,"Good",IIf([Score]<=59,"bad","so so")) AS Grade
FROM Table2;
Good luck!

How can I execute this query faster?

This is my query:
create table vi_all as
select
d.primaryid, d.age, d.gndr_cod, d.wt, d.wt_cod, d.reporter_country,
dr.primaryiddrug, dr.role_cod, dr.drug_name,
r.primaryidreac, r.pt,
o.primaryidoutc, o.outc_cod,
i.primaryidindi, i.indi_pt
FROM demo d,
drug dr,
reac r,
outc o,
indi i;
Each table contains at least 80K records and more than 20 fields, so it is getting really tough to execute SELECT statements across multiple tables. I only want 3 or 4 fields from each table, so I thought of this, but the above query has been running for more than 5 hours and still has not returned any result.
My crystal ball says you need something like this:
create table vi_all as
select
d.primaryid, d.age, d.gndr_cod, d.wt, d.wt_cod, d.reporter_country,
dr.primaryiddrug, dr.role_cod, dr.drug_name,
r.primaryidreac, r.pt,
o.primaryidoutc, o.outc_cod,
i.primaryidindi, i.indi_pt
FROM demo d
LEFT JOIN drug dr ON d.drug=dr.id
LEFT JOIN reac r ON d.reac=r.id
LEFT JOIN outc o ON d.outc=o.id
LEFT JOIN indi i ON d.indi=i.id;
As far as I can tell, your query selects all rows from all tables without associating them in any way, so every row of each table is combined with every row of every other table (a cross join) and the newly created table will be full of duplicated data. Also, if you have some good foreign keys to associate those tables, the performance will be considerably better.
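To show the scale of the problem, here is a tiny sketch with Python's sqlite3 module. The three stand-in tables and their rows are invented, and joining on primaryid = primaryiddrug = primaryidreac is only my assumption about how the real tables relate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented stand-ins for three of the five tables.
    CREATE TABLE demo (primaryid INTEGER, age INTEGER);
    CREATE TABLE drug (primaryiddrug INTEGER, drug_name TEXT);
    CREATE TABLE reac (primaryidreac INTEGER, pt TEXT);
    INSERT INTO demo VALUES (1, 30), (2, 40), (3, 50);
    INSERT INTO drug VALUES (1, 'a'), (1, 'b'), (2, 'c'), (3, 'd');
    INSERT INTO reac VALUES (1, 'x'), (2, 'y'), (2, 'z');
""")

# Comma-separated FROM list with no join condition: a Cartesian product.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM demo d, drug dr, reac r"
).fetchone()[0]

# Same tables with join conditions (the key columns are an assumption).
joined_count = conn.execute("""
    SELECT COUNT(*)
    FROM demo d
    LEFT JOIN drug dr ON d.primaryid = dr.primaryiddrug
    LEFT JOIN reac r ON d.primaryid = r.primaryidreac
""").fetchone()[0]

# Rough size of the original query: five tables of about 80K rows each.
estimated_rows = 80_000 ** 5

print(cross_count, joined_count, estimated_rows)
```

With 3, 4 and 3 rows the unjoined query already produces 36 rows against 5 for the joined one. At 80K rows per table the product is on the order of 10^24 rows, which is why the statement never finishes.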

read a list of values from another table using subquery and check where in condition

A small question; maybe it is silly, but I have no idea how to solve this problem.
select * from customers where id in(select assigned from users where username='test');
In the above query,
select assigned from users where username='test'
returns 1,2
but the WHERE ... IN condition does not work the way I expect, which is like this:
select * from customers where id in(1,2);
That is not the exact query being run; I am only guessing that it works this way, and since it does not, the problem occurs.
I am getting only one row, the one corresponding to 1.
Please help me figure this out.
please check the sqlfiddle below:
http://sqlfiddle.com/#!2/95c28/2
thanks
SELECT DISTINCT c.*
FROM customers c
JOIN users u ON FIND_IN_SET(c.id, u.assigned) > 0
WHERE u.username = 'test'
Storing comma-separated values in a single column is a bad idea in relational databases; it makes everything more complicated. You should use a relation table instead, so you can write a normal equality join. The above query cannot use an index, so it will be very inefficient if the tables are large.
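A minimal sketch of the relation-table layout, using Python's sqlite3 module. The user_customers table, its columns and the sample names are all hypothetical; the point is only that each assignment becomes its own row, so an ordinary equality join works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    -- Hypothetical relation table: one row per (user, customer) assignment,
    -- replacing the comma-separated users.assigned column.
    CREATE TABLE user_customers (
        user_id INTEGER,
        customer_id INTEGER,
        PRIMARY KEY (user_id, customer_id)
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO users VALUES (10, 'test'), (11, 'other');
    INSERT INTO user_customers VALUES (10, 1), (10, 2), (11, 3);
""")

# Plain equality joins; both sides of each join can be indexed.
assigned = conn.execute("""
    SELECT c.id, c.name
    FROM customers c
    JOIN user_customers uc ON uc.customer_id = c.id
    JOIN users u ON u.id = uc.user_id
    WHERE u.username = ?
    ORDER BY c.id
""", ("test",)).fetchall()

print(assigned)
```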
If select assigned from users where username='test' returns 1,2, this means your customers table contains only id=1.

MySQL Left Outer Join, Exclude Items in Second Table Belonging to User

I have two tables in my MySQL database, one is a library of all of the books in the database, and the other is containing individual rows corresponding to which books are in a user's library.
For example:
Library Table
`id` `title`...
===== ===========
1 Moby Dick
2 Harry Potter
Collection Table
`id` `user` `book`
===== ====== =======
1 1 2
2 2 2
3 1 1
What I want to do is run a query that will show all the books that are not in a user's collection. I can run this query to show all the books not in any user's collection:
SELECT *
FROM `library`
LEFT OUTER JOIN `collection` ON `library`.`id` = `collection`.`book`
WHERE `collection`.`book` IS NULL
This works just fine as far as I can tell. Running this in PHPMyAdmin will result in all of the books that aren't in the collection table.
However, how do I restrict that to a certain user? For example, with the above dummy data, I want book 1 to result if user 2 runs the query, and no books if user 1 runs the query.
Just adding an AND user=[id] doesn't work, and with my extremely limited knowledge of JOIN statements I'm not getting anywhere really.
Also, the ID of the results being returned (of query shown, which doesn't do what I want but does function) is 0-- how do I make sure the ID returned is that of library.id?
You'll have to narrow down your LEFT JOIN selection to only the books that a particular user has; whatever is NULL in the joined table then corresponds to the rows (books) that the user does not have in his/her collection:
SELECT
a.id,
a.title
FROM
library a
LEFT JOIN
(
SELECT book
FROM collection
WHERE user = <userid>
) b ON a.id = b.book
WHERE
b.book IS NULL
An alternative is:
SELECT
a.id,
a.title
FROM
library a
WHERE
a.id NOT IN
(
SELECT book
FROM collection
WHERE user = <userid>
)
However, the first solution can be more efficient, at least on older MySQL versions, which execute the NOT IN subquery once for each row rather than just once for the whole query. Intuitively, you would expect MySQL to execute the subquery once and use the result as a list, but those versions do not distinguish between correlated and non-correlated subqueries here.
As stated here:
"The problem is that, for a statement that uses an IN subquery, the
optimizer rewrites it as a correlated subquery."
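Here is a small sketch of the per-user anti-join on the dummy data from the question, run with Python's sqlite3 module instead of MySQL. The helper names are hypothetical. It puts the user filter in the ON clause, which is equivalent to joining against the filtered subquery, and contrasts it with the filter placed in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE library (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE collection (id INTEGER PRIMARY KEY, user INTEGER, book INTEGER);
    INSERT INTO library VALUES (1, 'Moby Dick'), (2, 'Harry Potter');
    INSERT INTO collection VALUES (1, 1, 2), (2, 2, 2), (3, 1, 1);
""")

def missing_books(user_id):
    """Books not in the given user's collection (filter inside the join)."""
    return conn.execute("""
        SELECT l.id, l.title
        FROM library l
        LEFT JOIN collection c ON c.book = l.id AND c.user = ?
        WHERE c.book IS NULL
        ORDER BY l.id
    """, (user_id,)).fetchall()

def missing_books_where_filter(user_id):
    """Same query with the user filter in WHERE: it can never match,
    because c.user is NULL on exactly the rows that c.book IS NULL keeps."""
    return conn.execute("""
        SELECT l.id, l.title
        FROM library l
        LEFT JOIN collection c ON c.book = l.id
        WHERE c.book IS NULL AND c.user = ?
        ORDER BY l.id
    """, (user_id,)).fetchall()

print(missing_books(2))               # user 2 is missing book 1
print(missing_books(1))               # user 1 has every book
print(missing_books_where_filter(2))  # the WHERE variant finds nothing
```

Selecting l.id explicitly instead of * should also take care of the id question: with SELECT * both tables contribute an id column, and the client may well show the (empty) collection one instead of library.id.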
How about this? It's just off the top of my head - I don't have access to a database to test on right now. (sorry)
SELECT
*
FROM
library lib
WHERE
lib.id NOT IN (
SELECT
book
FROM
collection coll
WHERE
coll.user =[id]
)
;