How to avoid case sensitivity in GROUP BY using Apache Drill

Drill Environment:-
OS:- Windows 10,
Version:- 1.9,
Mode:- embedded mode,
I have a column named 'State' in the db, which has data like (e.g. Florida,
florida, texas, etc.).
My problem is that when I use SUM(Price) and GROUP BY in a query,
Florida and florida show up as two separate rows.
So how do I avoid case sensitivity so that both Florida and florida act as one
row when running count(State)?
EXAMPLE:->
This is my input table in db:-
State Price
Alaska 75
Texas 80
Alaska 90
Florida 100
florida 70
Sql Server Query:- select State, Sum(Price) from testTable group by State.
Sql Server Output:-
State Price
Alaska 165
Texas 80
Florida 170
Drill Query:- select T1.State, Sum(T1.Price) from . T1 group by T1.State.
Drill Output:-
State Price
Alaska 165
Texas 80
Florida 100
florida 70
I want the same output as shown in the SQL Server output. Please help.

Drill provides a lower() string function. You can use a subquery to first convert all your states to lowercase and then group by the converted value.
select lstate, sum(lprice) from (select lower(T1.State) as lstate, T1.Price as lprice from . T1) group by lstate
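As a rough illustration of the same idea, the sketch below uses Python's built-in sqlite3 module instead of Drill (an assumption made only so the example is self-contained; SQLite's GROUP BY is also case-sensitive by default). The table name testTable and the rows are taken from the question. Note that because the grouping key is lowercased, the combined row is labelled florida rather than Florida.

```python
import sqlite3

# In-memory SQLite database standing in for the Drill data source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testTable (State TEXT, Price INTEGER)")
conn.executemany(
    "INSERT INTO testTable VALUES (?, ?)",
    [("Alaska", 75), ("Texas", 80), ("Alaska", 90), ("Florida", 100), ("florida", 70)],
)

# Normalize the case in a subquery, then group on the normalized value.
query = """
    SELECT lstate, SUM(lprice) AS total
    FROM (SELECT LOWER(State) AS lstate, Price AS lprice FROM testTable)
    GROUP BY lstate
    ORDER BY lstate
"""
totals = conn.execute(query).fetchall()
print(totals)
```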

Related

SQL: how to select where one column does not match another column for ALL records within a given group

I have a table named sales in a MySQL database that looks like this:
company manufactured shipped
Mercedes Germany United States
Mercedes Germany Germany
Mercedes Germany United States
Toyota Japan Canada
Toyota Japan England
Audi Germany United States
Audi Germany France
Audi Germany Canada
Tesla United States Mexico
Tesla United States Canada
Tesla United States United States
Here is a Fiddle: http://www.sqlfiddle.com/#!17/145ff/3
I would like to return the list of companies that ship ALL of their products internationally (that is, where the value in the manufactured column differs from the value in the shipped column for ALL records of a particular company).
Using the example above, the desired result set would be:
company
Toyota
Audi
Here is my (hackish) attempt:
WITH temp_table AS (
SELECT
s.company AS company
, SUM(CASE
WHEN s.manufactured != s.shipped THEN 1
ELSE 0
END
) AS count_international
, COUNT(s.company) AS total_within_company
FROM
sales s
GROUP BY
s.company
)
SELECT
company
FROM
temp_table
WHERE count_international = total_within_company
Essentially, I count the instances where the columns do not match. Then I check whether the sum of those mismatched instances matches the number of records within a given group.
This approach works, but it's far from an elegant solution!
Can anyone offer advice as to a more idiomatic way to implement this query?
Thanks!
We can GROUP BY company and use a HAVING clause to require that every country in shipped differs from the country in manufactured:
SELECT company
FROM sales
GROUP BY company
HAVING COUNT(CASE WHEN manufactured = shipped THEN 1 END) = 0;
The fiddle linked in the question is a Postgres DB, but MySQL is tagged as the DBMS.
In a MySQL DB, the above query can be simplified to:
SELECT company
FROM sales
GROUP BY company
HAVING SUM(manufactured = shipped) = 0;
In a Postgres DB, this is not possible, because Postgres does not allow SUM over a boolean expression.
You have to think in sets: you want to display every company without a match, so find the companies that have a match and display the rest.
SELECT DISTINCT company
FROM sales
WHERE company NOT IN (
SELECT company
FROM sales
WHERE manufactured = shipped
)
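Both the HAVING approach and the NOT IN approach above can be checked against the sample data from the question. The following is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL or Postgres (an assumption made for portability); results are sorted in Python because neither query guarantees an order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (company TEXT, manufactured TEXT, shipped TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Mercedes", "Germany", "United States"),
        ("Mercedes", "Germany", "Germany"),
        ("Mercedes", "Germany", "United States"),
        ("Toyota", "Japan", "Canada"),
        ("Toyota", "Japan", "England"),
        ("Audi", "Germany", "United States"),
        ("Audi", "Germany", "France"),
        ("Audi", "Germany", "Canada"),
        ("Tesla", "United States", "Mexico"),
        ("Tesla", "United States", "Canada"),
        ("Tesla", "United States", "United States"),
    ],
)

# Keep only groups that contain no domestic shipment at all.
having_query = """
    SELECT company FROM sales
    GROUP BY company
    HAVING COUNT(CASE WHEN manufactured = shipped THEN 1 END) = 0
"""

# Set-based variant: exclude every company that has at least one domestic shipment.
not_in_query = """
    SELECT DISTINCT company FROM sales
    WHERE company NOT IN (SELECT company FROM sales WHERE manufactured = shipped)
"""

having_result = sorted(row[0] for row in conn.execute(having_query))
not_in_result = sorted(row[0] for row in conn.execute(not_in_query))
print(having_result, not_in_result)
```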

Trouble with Group By and Having in SQL

I am trying to learn GROUP BY and HAVING but I can't seem to understand what happened here. I used the W3Schools SQL Tryit Editor.
The table I created is:
name age country
------------------------
Sara 17 America
David 21 America
Jared 27 America
Jane 54 Canada
Rob 32 Canada
Matthew 62 Canada
The Query I used:
select
sum(age), country
from
NewTable
group by
country
having
age>25;
I expected the query to categorize the information by country and use age>25 filter to create the results but here is the output:
sum(age) country
--------------------
65 America
148 Canada
What happened?! The result is sum of American and Canadian people in all ages.
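The piece you're missing is specific to the having keyword. The having clause in your query is applied to the dataset after the grouping occurs.
It sounds like you are expecting the records with an age of 25 or less to be excluded from your query before grouping occurs. But the way it works is that the having clause keeps or discards whole groups after they have been summed. Because age on its own is not an aggregate, a lenient engine such as the one behind this editor evaluates the condition against the age of a single row taken from each group; both groups happened to pass, so you see their full totals. Stricter databases reject such a query outright.
If you want to exclude individual records before totaling the sum of the age, you could do something like this (using a where clause, which is applied prior to grouping):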
select sum(age), country from NewTable where age > 25 group by country;
A where clause puts a condition on which rows participate in the results.
A having clause is like a where, but puts a condition on which grouped (or aggregated) values participate in the results.
Try either this:
select sum(age), country
from NewTable
where age > 25 -- where puts condition on raw rows
group by country
or this:
select sum(age), country
from NewTable
group by country
having sum(age) > 25 -- having puts a condition on groups
depending on what you're trying to do.
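To make the difference concrete, here is a small sketch that runs both forms against the question's data using Python's sqlite3 module (chosen as an assumption so the example is self-contained). The third query, with a threshold of 100, is a hypothetical addition that shows HAVING actually dropping a group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NewTable (name TEXT, age INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO NewTable VALUES (?, ?, ?)",
    [
        ("Sara", 17, "America"), ("David", 21, "America"), ("Jared", 27, "America"),
        ("Jane", 54, "Canada"), ("Rob", 32, "Canada"), ("Matthew", 62, "Canada"),
    ],
)

# WHERE filters individual rows before they are grouped and summed.
where_rows = conn.execute(
    "SELECT country, SUM(age) FROM NewTable WHERE age > 25 "
    "GROUP BY country ORDER BY country"
).fetchall()

# HAVING filters whole groups after the sums have been computed.
having_rows = conn.execute(
    "SELECT country, SUM(age) FROM NewTable "
    "GROUP BY country HAVING SUM(age) > 25 ORDER BY country"
).fetchall()

# Hypothetical higher threshold: only groups whose total exceeds 100 survive.
having_100_rows = conn.execute(
    "SELECT country, SUM(age) FROM NewTable "
    "GROUP BY country HAVING SUM(age) > 100 ORDER BY country"
).fetchall()

print(where_rows, having_rows, having_100_rows)
```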

To get one record for multiple duplicates

I have a view in which there are multiple duplicates. I want to get only one row for each set of duplicates. There is no condition for choosing which one.
View :
State|ZIP|Sales
AZ 231 asdas
AZ 231 qwerq
FL 12 wqeq
FL 12 lak
KY 1 zxc
KY 1 qwe
expected output:
State|ZIP|Sales
AZ 231 asdas
FL 12 wqeq
KY 1 zxc
There are more than 5000 records. Using DISTINCT still gives the same dataset, because these are the fields I am extracting from a bigger view and, as you can see, the records are not exactly distinct.
Thank You.
You can use aggregation for this, since any one of the values is acceptable to you:
select state, zip, max(sales)
from your_table
group by state, zip
One way is to use GROUP BY with an aggregate function, e.g.:
select state, ZIP, min(Sales)
from my_table
group by state, ZIP
or, to return the maximum length of the value instead of the value itself:
select state, ZIP, max(length(Sales))
from my_table
group by state, ZIP
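The aggregation approach can be tried on the sample rows with the sketch below. It uses Python's sqlite3 module, and the table name sales_view is a hypothetical stand-in for the view in the question. With min(), the value kept for each group is the alphabetically smallest one, which is not necessarily the first row listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# sales_view is a hypothetical name for the view described in the question.
conn.execute("CREATE TABLE sales_view (State TEXT, ZIP INTEGER, Sales TEXT)")
conn.executemany(
    "INSERT INTO sales_view VALUES (?, ?, ?)",
    [
        ("AZ", 231, "asdas"), ("AZ", 231, "qwerq"),
        ("FL", 12, "wqeq"), ("FL", 12, "lak"),
        ("KY", 1, "zxc"), ("KY", 1, "qwe"),
    ],
)

# One row per (State, ZIP); min() picks a single Sales value per group.
deduped = conn.execute(
    "SELECT State, ZIP, MIN(Sales) FROM sales_view "
    "GROUP BY State, ZIP ORDER BY State"
).fetchall()
print(deduped)
```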

Get row position within a series of records in mysql

I have the following rows within a table.
MatchIDAuto CompetitionIDAuto TeamHome TeamAway MatchDate
4770 65 New Zealand South Africa 2017-02-19
4771 65 New Zealand South Africa 2017-02-21
4772 65 New Zealand South Africa 2017-02-25
4773 65 New Zealand South Africa 2017-03-01
4774 65 New Zealand South Africa 2017-03-04
What I need is that when I run the following:
select * from Match2 where MatchIDAuto=4772
I also know that it is the 3rd match in the series. How could I dynamically calculate that with the query?
Without an ORDER BY clause there is no guarantee that the row with id 4772 is the third row in the series. That is because, without an ORDER BY clause, the database MAY return rows in a different order depending on the execution plan.
So the first thing to do is to add an ORDER BY clause to your query. Judging by the data it could be MatchIDAuto or MatchDate; I will leave that choice to you. The query in MySQL to get the position of a row would be:
select *
from (
select m.*,
@ord:=@ord+1 roworder
from Match2 m,
(select @ord:=0) t
ORDER BY MatchDate
) sub
where MatchIDAuto=4772;
This will return every column plus roworder, which will have the value 3. In this query I chose the MatchDate field since, to me, it makes the most sense as the ordering column for your data.
This technique creates a variable and increments it for every row. You just wrap it in a subquery and filter on it.
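For comparison, the sketch below computes the same position with a different technique: a correlated COUNT(*) subquery rather than a MySQL user variable. It runs on Python's sqlite3 module (an assumption made for a self-contained example) and assumes that match dates are unique within a competition; team names are omitted from the sample table for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Match2 (MatchIDAuto INTEGER, CompetitionIDAuto INTEGER, MatchDate TEXT)"
)
conn.executemany(
    "INSERT INTO Match2 VALUES (?, ?, ?)",
    [
        (4770, 65, "2017-02-19"),
        (4771, 65, "2017-02-21"),
        (4772, 65, "2017-02-25"),
        (4773, 65, "2017-03-01"),
        (4774, 65, "2017-03-04"),
    ],
)

# Position = number of matches in the same competition dated on or before this one.
# ISO-formatted dates stored as text compare correctly as strings.
position_query = """
    SELECT m.MatchIDAuto,
           (SELECT COUNT(*) FROM Match2 p
             WHERE p.CompetitionIDAuto = m.CompetitionIDAuto
               AND p.MatchDate <= m.MatchDate) AS roworder
    FROM Match2 m
    WHERE m.MatchIDAuto = ?
"""
match_row = conn.execute(position_query, (4772,)).fetchone()
print(match_row)
```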

How to get results from Mysql database using WHERE if there is more than 1 criterion for identification?

id points year country
-----------------------------------
1 45 1998 Mexico
2 45 2000 Germany
3 47 2010 Russia
4 45 1970 China
5 49 2010 Austria
I wonder how I can select rows that match either of two values in the country column, for example only records where the country is Germany or Mexico. When I filter on a single country it is easy:
SELECT * FROM List WHERE Country='Mexico';
the result is:
id points year country
-----------------------------------
1 45 1998 Mexico
But when I try to filter on two countries, the problems start. I tried:
SELECT * FROM List WHERE country='Mexico' AND Country='Germany';
SELECT * FROM List WHERE country='Mexico' AND 'Germany';
SELECT * FROM List WHERE country='Mexico','Germany';
SELECT * FROM List WHERE country='Mexico'AND WHERE country='Germany';
but none of them gives the desired result:
id points year country
-----------------------------------
1 45 1998 Mexico
2 45 2000 Germany
I understand that I may have made a logical error, because there is no single record where the country is Mexico and Germany at the same time, and SQL may be reading my condition exactly that way. But how do I correctly write in SQL: give me the records where the country is Mexico or Germany? Thanks.
You are looking for the IN operator:
SELECT * FROM List WHERE Country in ('Mexico','Germany');
Just use OR.
So instead of
SELECT * FROM List WHERE country='Mexico' AND Country='Germany';
it would be
SELECT * FROM List WHERE country='Mexico' OR country='Germany';
IN is also a good operator to use, especially if you've got multiple values that you want to check against, but that's been covered in the other answers.
You need to use OR or IN. You have been using AND, which asks MySQL to find a row where country is both Mexico and Germany, and no such row exists.
SELECT * FROM List WHERE Country in ('Mexico','Germany');
Try this:
SELECT * FROM List WHERE country='Mexico' OR Country='Germany';
SQL is using logic. Natural language is not.
When you say that you want the results for a list of countries, you need to specify that explicitly. This request corresponds to a logical OR: the country can be one or the other, and either is a match.
SELECT * FROM List WHERE Country = 'Mexico' OR Country = 'Germany'
To prevent further mistakes like these, I recommend that you look up logical operators in the docs (they are very good). Either the MySQL or the PostgreSQL docs should be fine.
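The behaviour of AND, OR and IN discussed in this thread can be confirmed with a short sketch. It uses Python's sqlite3 module in place of MySQL (an assumption made so the example runs anywhere) and the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE List (id INTEGER, points INTEGER, year INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO List VALUES (?, ?, ?, ?)",
    [
        (1, 45, 1998, "Mexico"),
        (2, 45, 2000, "Germany"),
        (3, 47, 2010, "Russia"),
        (4, 45, 1970, "China"),
        (5, 49, 2010, "Austria"),
    ],
)

def ids(sql):
    """Run a query and return the matching ids in ascending order."""
    return sorted(row[0] for row in conn.execute(sql))

# AND requires one row to satisfy both conditions at once, which never happens here.
and_ids = ids("SELECT id FROM List WHERE country = 'Mexico' AND country = 'Germany'")

# OR and IN both accept a row that matches either value.
or_ids = ids("SELECT id FROM List WHERE country = 'Mexico' OR country = 'Germany'")
in_ids = ids("SELECT id FROM List WHERE country IN ('Mexico', 'Germany')")

print(and_ids, or_ids, in_ids)
```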