Different row count while creating a table or view in Impala - mysql

Different row count when trying to create a table and view in Impala
I am trying to run a query in Impala that has a left outer join with another table. The query is as below:
SELECT COUNT (*)
FROM (
SELECT A.*,
B.ORDERED_DATE,
B.PROMISE_DATE,
B.REQUEST_DATE,
B.SCHEDULE_SHIP_DATE,
A.SCHEDULED_START_DATE,
A.SCHEDULED_COMPLETION_DATE,
A.DATE_RELEASED,
A.DATE_COMPLETED,
B.ORDERED_DATE_DT,
B.PROMISE_DATE_DT,
B.REQUEST_DATE_DT,
B.ORDERED_QUANTITY,
a.DEMAND_SOURCE_LINE_NUMBER,
B.FLOW_STATUS_CODE,
A.ORDER_NUMBER
FROM TABLE A
LEFT OUTER JOIN TABLE B
ON (A.DEMAND_SOURCE_LINE_ID) = (B.LINE_ID)
) AAAAA
DEMAND_SOURCE_LINE_ID can be null here.
The row count differs depending on whether I use count(*) or count(1). The inner select also gives a different row count than the outer one. And if I create a view from this query, its record count differs from that of a table created from the same query.
Can someone help me?
I expect 3585 records. I am getting only 299 with count(*) and 662 with count(1); DEMAND_SOURCE_LINE_ID is not null for 662 records.

As you mentioned, DEMAND_SOURCE_LINE_ID can be null, and you are using it in the ON condition. A NULL never compares equal to anything, including another NULL, so those rows cannot match any row in B; that changes the output and affects the counts as well.
Try the COALESCE function in the ON condition, e.g. coalesce(A.DEMAND_SOURCE_LINE_ID, -1) = coalesce(B.LINE_ID, -1), provided -1 never occurs as a real id.
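To see what the COALESCE suggestion changes, here is a minimal sketch using Python's built-in sqlite3 module rather than Impala. The tables a and b and their rows are made up for illustration; NULL comparison follows standard SQL here, but the behaviour should still be verified on Impala itself.

```python
import sqlite3

# Toy stand-ins for TABLE A and TABLE B (hypothetical data, SQLite not Impala).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, demand_source_line_id INTEGER);
    CREATE TABLE b (line_id INTEGER, ordered_quantity INTEGER);
    INSERT INTO a VALUES (1, 10), (2, NULL), (3, 30);
    INSERT INTO b VALUES (10, 5), (NULL, 7);
""")

# Plain equality: the NULL key in a never matches the NULL key in b.
plain = conn.execute("""
    SELECT a.id, b.ordered_quantity
    FROM a LEFT OUTER JOIN b
      ON a.demand_source_line_id = b.line_id
    ORDER BY a.id
""").fetchall()

# COALESCE maps NULL to -1 on both sides, so NULL keys now match each other.
coalesced = conn.execute("""
    SELECT a.id, b.ordered_quantity
    FROM a LEFT OUTER JOIN b
      ON COALESCE(a.demand_source_line_id, -1) = COALESCE(b.line_id, -1)
    ORDER BY a.id
""").fetchall()

print(plain)      # row 2 has no match, so its b column is None
print(coalesced)  # row 2 now picks up the b row whose line_id is NULL
```

In this toy data both queries return three rows, because a left outer join keeps every row of the left table either way; COALESCE only changes which right-hand rows are attached. If several rows in B have a NULL LINE_ID, the COALESCE version can multiply rows instead.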

Related

WHERE clause with INNER JOIN and Sub Query

I have a working query, but I want to understand why it works. My database has several tables. I need to take the id from a table called product and use it to retrieve data from other tables, where the product id is a foreign key. The query below works fine (I arrived at it by experimenting):
SELECT ponsfdp.*, product.pName, product.pImage, product.productSizes FROM product
INNER JOIN priceOnSizesForDigitalPrinting AS ponsfdp ON ponsfdp.pId_fk =
(SELECT pId FROM product WHERE pName LIKE "%booklet%")
WHERE pName LIKE "%booklet%";
But when I tried this query,
SELECT ponsfdp.*, product.pName, product.pImage, product.productSizes FROM product
INNER JOIN priceOnSizesForDigitalPrinting AS ponsfdp ON ponsfdp.pId_fk =
(SELECT pId FROM product WHERE pName LIKE "%booklet%");
It returns all the data, including rows with null fields. Can someone explain how this works? I expected both queries to return the same data: in the second query the subquery returns only one id, and in the first query the WHERE clause selects the same id by name. Why does the first query return only specific rows while the second returns all rows, including ones with null columns? I would like an explanation of both queries.
Your first query also produces all the rows that your second query returns. But when you add the final filter
WHERE pName LIKE "%booklet%"
it keeps only the rows where pName contains 'booklet'. You can think of the output of your second query as a single table, with your logic working as below:
SELECT * FROM (
SELECT ponsfdp.*, product.pName, product.pImage, product.productSizes
FROM product
INNER JOIN priceOnSizesForDigitalPrinting AS ponsfdp
ON ponsfdp.pId_fk = (SELECT pId FROM product WHERE pName LIKE "%booklet%")
)A
WHERE pName LIKE "%booklet%"
Hope this at least gives you some insight into your query.
I don't see any need for a subquery here. You should be using the where condition to select rows from your FROM table, then use the ON clause of your join to find the right record(s) in your joined table for each row of the FROM table:
SELECT ponsfdp.*, product.pName, product.pImage, product.productSizes
FROM product
INNER JOIN priceOnSizesForDigitalPrinting AS ponsfdp
ON ponsfdp.pId_fk = pId
WHERE pName LIKE "%booklet%";
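The difference between the three queries can be reproduced on toy data. The following is only a sketch with Python's sqlite3 module, not the asker's MySQL schema: the prices table stands in for priceOnSizesForDigitalPrinting, and all rows are invented.

```python
import sqlite3

# Hypothetical miniature of the product / price tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (pId INTEGER PRIMARY KEY, pName TEXT);
    CREATE TABLE prices (pId_fk INTEGER, price REAL);
    INSERT INTO product VALUES (1, 'booklet'), (2, 'flyer'), (3, 'poster');
    INSERT INTO prices VALUES (1, 9.5), (1, 12.0), (2, 3.0);
""")

# The ON clause compares prices to a constant id and never mentions the outer
# product row, so every product is paired with the booklet price rows.
constant_on = conn.execute("""
    SELECT product.pName, prices.price
    FROM product
    INNER JOIN prices
      ON prices.pId_fk = (SELECT pId FROM product WHERE pName LIKE '%booklet%')
    ORDER BY product.pId, prices.price
""").fetchall()

# Adding the WHERE clause afterwards throws away the unrelated products.
filtered = conn.execute("""
    SELECT product.pName, prices.price
    FROM product
    INNER JOIN prices
      ON prices.pId_fk = (SELECT pId FROM product WHERE pName LIKE '%booklet%')
    WHERE product.pName LIKE '%booklet%'
    ORDER BY prices.price
""").fetchall()

# Joining on the key itself gives the same rows without the subquery.
keyed = conn.execute("""
    SELECT product.pName, prices.price
    FROM product
    INNER JOIN prices ON prices.pId_fk = product.pId
    WHERE product.pName LIKE '%booklet%'
    ORDER BY prices.price
""").fetchall()

print(len(constant_on), filtered, keyed)
```

With three products and two booklet price rows, the unfiltered version yields six rows, while the filtered and keyed versions both yield the two booklet rows.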

MySQL SELECT DISTINCT how to join some corresponding columns to query result

I have a table with
id,column1, column2, column3,column4,column5
MySQL SELECT DISTINCT returns exactly what I need, and this query must remain exactly as is:
SELECT DISTINCT column1,column3 from tablename
But as an output I have only 2 columns as requested in DISTINCT query above
column1,column3
Question - how to join to this result other corresponding columns from the same table ?
For example
column1,column3,id,column5
I spent a couple of hours on Google; the nearest idea I found is something like
SELECT DISTINCT column1,column3
from tablename AS tbl1
INNER JOIN (SELECT id,column5 from tablename) AS tbl2
But such a query still returns ONLY
column1,column3
My expected result is
column1,column3 (exactly as returned by SELECT DISTINCT) plus two other columns, id and column5, so that I can save the result to a new table with 4 columns.
But I get only two columns, as described above.
EXAMPLE added
http://sqlfiddle.com/#!9/1e4472/1
Please note that the query result does not show ID 4, because this ID duplicates an earlier row in BOTH fields, company and city. This is exactly what I need.
So, how can I show not only COMPANY and CITY in the results, but also, for example, ID and MANAGER?
In case of duplicates, if you want to display the first (original) record among those with the same company and city (i.e. show id=1 rather than id=4 here), then the following query should work:
SELECT min(id) as id,manager,company,city
FROM DistinctResult
group by company,city;
Hope it helps!
Simply:
SELECT DISTINCT Column1,Column3,id,column5
FROM tablename
Or alternatively, if there is more than one value for these columns:
SELECT Column1,Column3,MAX(id),MAX(column5)
FROM tablename
GROUP BY Column1,Column3
But it really depends on the requirement, which you did not state very clearly.
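One portable way to keep the whole first row per (company, city) is to compute MIN(id) per group and join back to the table. This avoids selecting a column such as manager that is neither grouped nor aggregated, which MySQL rejects when ONLY_FULL_GROUP_BY is enabled. The sketch below uses Python's sqlite3 module; the table name DistinctResult comes from the answer above, while the rows are invented because the fiddle's contents are not shown here.

```python
import sqlite3

# Hypothetical rows: id 4 repeats the (company, city) pair of id 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DistinctResult (
        id INTEGER PRIMARY KEY, manager TEXT, company TEXT, city TEXT);
    INSERT INTO DistinctResult VALUES
        (1, 'Ann', 'Acme', 'Paris'),
        (2, 'Bob', 'Acme', 'Rome'),
        (3, 'Cid', 'Bolt', 'Paris'),
        (4, 'Dee', 'Acme', 'Paris');
""")

# Find the smallest id in each (company, city) group, then join back to the
# table to fetch every column of exactly that row.
rows = conn.execute("""
    SELECT t.id, t.manager, t.company, t.city
    FROM DistinctResult AS t
    INNER JOIN (
        SELECT MIN(id) AS first_id
        FROM DistinctResult
        GROUP BY company, city
    ) AS firsts ON firsts.first_id = t.id
    ORDER BY t.id
""").fetchall()

for row in rows:
    print(row)
```

The result has one row per distinct (company, city) pair, and id 4 is dropped in favour of id 1, which matches the behaviour the question asks for.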

Why aren't these two result sets adding up?

I have a "master" table (cited_papers) of 144,194 rows, and a "sample" table (publication) that contains a sample of 7,977 of these rows. I am trying to get the rows (by their unique id field) that are in the master table but not the sample table:
SELECT DISTINCT c.*
FROM alex_WOS.cited_papers as c LEFT JOIN alex_WOS.publication USING (id)
WHERE alex_WOS.publication.id IS NULL
This works, but the result count I get is 141,019. Why aren't these counts adding up? (141,019 + 7977 != 144,194) I did a SELECT DISTINCT to count the rows in both the master and sample tables, so I am certain there are no duplicates in either of those tables.
The distinct may be throwing things off. Run the queries below to verify your numbers.
Verify the number of "master" rows:
SELECT count(*) FROM alex_WOS.cited_papers
Verify the number of "sample" rows:
SELECT count(*) FROM alex_WOS.publication
Verify the number of "master" rows not in "sample" table:
SELECT count(*) FROM alex_WOS.cited_papers c LEFT JOIN alex_WOS.publication p USING(id) WHERE p.id IS NULL
These numbers should add up...
As JNevill suggested, my sample table was not a subset of master after all. I'm a dummy...
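The effect is easy to reproduce. The sketch below uses Python's sqlite3 module with invented ids: the sample table contains two ids that do not exist in master, so the naive sum no longer adds up. It uses ON instead of USING purely to keep the toy query unambiguous.

```python
import sqlite3

# Toy tables: publication is NOT a subset of cited_papers (ids 98, 99 are strays).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cited_papers (id INTEGER PRIMARY KEY);
    CREATE TABLE publication (id INTEGER PRIMARY KEY);
    INSERT INTO cited_papers VALUES (1), (2), (3), (4), (5);
    INSERT INTO publication VALUES (4), (5), (98), (99);
""")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


master = scalar("SELECT COUNT(*) FROM cited_papers")
sample = scalar("SELECT COUNT(*) FROM publication")

# Anti-join: master rows with no partner in the sample.
only_in_master = scalar("""
    SELECT COUNT(*)
    FROM cited_papers c LEFT JOIN publication p ON c.id = p.id
    WHERE p.id IS NULL
""")

# The reverse anti-join reveals sample rows that are missing from master.
strays = scalar("""
    SELECT COUNT(*)
    FROM publication p LEFT JOIN cited_papers c ON p.id = c.id
    WHERE c.id IS NULL
""")

print(master, sample, only_in_master, strays)
```

The counts only reconcile once the strays are subtracted: master = only_in_master + (sample - strays). Applied to the numbers in the question, and assuming ids are unique in both tables, 144,194 - 141,019 = 3,175 sample rows match master, which would leave 7,977 - 3,175 = 4,802 sample ids that are not in master.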

Use count of unrelated table in SQL query

I have this query:
select skill.name, IFNULL(Round(((SUM(ROUND((student_skills.value/skill.value)*100,0)))/82),0),0) as successRate from skill left JOIN student_skills on skill.id = student_skills.skill_id group by skill.name
This query returns exactly what I want, but I need to replace the constant 82 (used here just as an example) with the number of rows in the table user (something like COUNT(user.name)).
The problem is that user is not related to the skill or student_skills table in any way.
How should I alter my query so that it uses the current count of users?
Thanks
Use a scalar subquery in place of the constant:
select skill.name,
IFNULL(Round(((SUM(ROUND((student_skills.value/skill.value)*100,0)))/(select COUNT(*) from user)),0),0) as successRate
from skill
left JOIN student_skills on skill.id = student_skills.skill_id
group by skill.name
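A quick way to check the subquery version is to run it on a few rows. This sketch uses Python's sqlite3 module instead of MySQL, with invented data; the table and column names follow the question, and the value columns are assumed to be numeric.

```python
import sqlite3

# Hypothetical data: 4 users, 2 skills, and two student results for 'sql' only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (name TEXT);
    CREATE TABLE skill (id INTEGER PRIMARY KEY, name TEXT, value REAL);
    CREATE TABLE student_skills (skill_id INTEGER, value REAL);
    INSERT INTO user VALUES ('ann'), ('bob'), ('cid'), ('dee');
    INSERT INTO skill VALUES (1, 'sql', 10), (2, 'python', 20);
    INSERT INTO student_skills VALUES (1, 5), (1, 5);
""")

# The scalar subquery (SELECT COUNT(*) FROM user) replaces the constant 82.
# It is independent of the joined tables, so no relation to them is needed.
rows = conn.execute("""
    SELECT skill.name,
           IFNULL(ROUND(SUM(ROUND(student_skills.value / skill.value * 100, 0))
                        / (SELECT COUNT(*) FROM user), 0), 0) AS successRate
    FROM skill
    LEFT JOIN student_skills ON skill.id = student_skills.skill_id
    GROUP BY skill.name
    ORDER BY skill.name
""").fetchall()

print(rows)
```

Here 'sql' has two results of 50% each, so the sum of 100 divided by the 4 users gives 25, while 'python' has no results and falls back to 0 through IFNULL.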

How to run a query for each row from a table?

I have a query that returns a list of Advertisers; each Advertiser has several Brands attached to it.
So I'm running this:
select distinct Name from tblAdvertiser
inner join tblAvertiserBrand on AdvertiserID=ID
This returns a list of Advertisers. I'm trying to write a query that would automatically loop over each row of the list that the query above returns. Any ideas?
SQL 'loops' by default: a query is applied to every row of a set, so you can filter another table against that list directly.
SELECT *
FROM TABLE WHERE advertiser_name IN
(SELECT DISTINCT Name
FROM tblAdvertiser
INNER JOIN tblAvertiserBrand ON AdvertiserID=ID)
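As a concrete illustration of the set-based approach, here is a sketch with Python's sqlite3 module. The campaigns table and its advertiser_name column are hypothetical stand-ins for the unnamed TABLE in the answer, and all rows are invented; the two tbl tables keep the names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblAdvertiser (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE tblAvertiserBrand (AdvertiserID INTEGER, Brand TEXT);
    -- Hypothetical table to be filtered by the advertiser list.
    CREATE TABLE campaigns (advertiser_name TEXT, title TEXT);
    INSERT INTO tblAdvertiser VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Core');
    INSERT INTO tblAvertiserBrand VALUES (1, 'A1'), (1, 'A2'), (2, 'B1');
    INSERT INTO campaigns VALUES
        ('Acme', 'spring'), ('Bolt', 'summer'),
        ('Core', 'fall'), ('Acme', 'winter');
""")

# No explicit loop: the IN subquery is evaluated as a set, and every campaign
# row is checked against it. 'Core' has no brands, so it is not in the list.
rows = conn.execute("""
    SELECT advertiser_name, title
    FROM campaigns
    WHERE advertiser_name IN (
        SELECT DISTINCT Name
        FROM tblAdvertiser
        INNER JOIN tblAvertiserBrand ON AdvertiserID = ID)
    ORDER BY title
""").fetchall()

print(rows)
```

Only campaigns whose advertiser has at least one brand survive the filter, which is the per-row behaviour the question was trying to obtain with a loop.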