Divide column values by total rows in Impala - mysql

Since Impala does not allow the SET operation, or subqueries in a select statement, I'm having a hard time figuring out how to divide column values by the total number of rows returned. My ultimate goal is to calculate minor allele frequency at each chr:start position.
My data is structured as follows:
| chr | start | stop | ref | allele1seq | allele2seq | sample_id |
| 6 | 66720709 | 66720710 | A | A | T | 101-46-3 |
| 7 | 66720809 | 66720810 | GG | GA | GG | 101-46-3 |
I'd like to do something similar to the query below:
WITH vars as
(SELECT cgi.chr, cgi.start, concat(cgi.chr, ':', CAST(cgi.start AS STRING)) as pos, cgi.ref, cgi.allele1seq, cgi.allele2seq,
CASE
WHEN (cgi.allele1seq = cgi.ref AND cgi.allele2seq <> cgi.ref) THEN '1'
WHEN (cgi.allele1seq <> cgi.ref AND cgi.allele2seq = cgi.ref) THEN '1'
WHEN (cgi.allele1seq = cgi.ref AND cgi.allele2seq = cgi.ref) THEN '2'
ELSE '0' END as ma_count
FROM comgen_variants as cgi)
SELECT vars.*, (CAST(vars.ma_count as INT)/
((SELECT COUNT(DISTINCT cgi.sample_id) from comgen_variants as cgi) * 2)) as maf
FROM vars
My desired output would look like:
| chr | start | ref | allele1seq | allele2seq | ma_count | maf |
| 6 | 66720709 | A | A | T | 1 | .05 |
| 7 | 66720809 | GG | GG | GG | 0 | 0 |
In addition to figuring out a way to divide by row count, I also need to group the results by chr and pos, and then count how many times each alternate allele (where allele1seq and allele2seq are not equal to ref) occurs instead of simply counting per row as I have above; but I haven't gotten that far due to the counting issue.
Thanks in advance for your help.

It looks like you could just calculate the total number of distinct sample_ids*2 in advance, and then use that for the subsequent query, since that value doesn't change per row. If the value did depend on the row, you might want to take a look at the analytic/window functions available to Impala.
But, since it doesn't look like you need to, you could do something like the following:
WITH total AS
(SELECT COUNT(DISTINCT sample_id) * 2 AS total FROM comgen_variants)
SELECT cgi.*,
(CASE
WHEN (cgi.allele1seq = cgi.ref AND cgi.allele2seq <> cgi.ref) THEN 1
WHEN (cgi.allele1seq <> cgi.ref AND cgi.allele2seq = cgi.ref) THEN 1
WHEN (cgi.allele1seq = cgi.ref AND cgi.allele2seq = cgi.ref) THEN 2
ELSE 0 END) / total.total AS maf
FROM comgen_variants AS cgi, total;
I'm not sure that this is what the minor allele frequency is, though; it seems like you'd want to choose the second most common allele frequency for each locus?
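As a rough illustration of the "compute the total once, then join it to every row" idea, here is a minimal sketch that runs the same shape of query through Python's built-in sqlite3 module. SQLite is only a stand-in for Impala, and the three rows and the sample ids s1 and s2 are invented. SQLite truncates integer division, so the sketch casts the numerator to REAL; check how your own engine treats integer `/`.

```python
import sqlite3

# In-memory SQLite stands in for Impala; the table is a trimmed-down
# comgen_variants with invented rows and sample ids.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comgen_variants (
    chr TEXT, start INTEGER, ref TEXT,
    allele1seq TEXT, allele2seq TEXT, sample_id TEXT
);
INSERT INTO comgen_variants VALUES
    ('6', 66720709, 'A',  'A',  'T',  's1'),
    ('6', 66720709, 'A',  'A',  'A',  's2'),
    ('7', 66720809, 'GG', 'GA', 'GA', 's1');
""")

QUERY = """
WITH total AS (
    SELECT COUNT(DISTINCT sample_id) * 2 AS total FROM comgen_variants
)
SELECT cgi.chr, cgi.start, cgi.sample_id,
       -- CAST forces floating-point division in SQLite
       CAST(CASE
           WHEN cgi.allele1seq = cgi.ref AND cgi.allele2seq <> cgi.ref THEN 1
           WHEN cgi.allele1seq <> cgi.ref AND cgi.allele2seq = cgi.ref THEN 1
           WHEN cgi.allele1seq = cgi.ref AND cgi.allele2seq = cgi.ref THEN 2
           ELSE 0
       END AS REAL) / total.total AS maf
FROM comgen_variants AS cgi CROSS JOIN total
ORDER BY cgi.chr, cgi.sample_id
"""

# The one-row CTE is cross-joined, so every variant row sees the same total.
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With two distinct samples the divisor is 4, so the three invented rows get 0.25, 0.5 and 0.0 respectively.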

Related

How do I add another COUNT statement with a different condition to an SQL query

I have 2 tables.
1st table: duels
| duelId | user1Id | user2Id | gameId | winnerId |
2nd table: usergameprogress
| usergameprogressId | userId | gameId | gameStar |
Given a userId, I would like to get the duel count, gameStar, and win count for each gameId.
Example return:
| duelCount | duelWinCount | gameStar | gameId |
I have managed to get duelCount, gameStar and gameId for a given userId, but I couldn't add duelWinCount to my query result. How do I do that?
My query:
SELECT
COUNT(d1.duelId) AS duelCount,
usergameprogress.gameId, usergameprogress.gameStar
FROM
duels d1
JOIN
usergameprogress ON (usergameprogress.gameId = d1.gameId)
WHERE
d1.user1Id = "gkfurcwsi033qzxg0u2bmj1ekebsklej"
OR d1.user2Id = "gkfurcwsi033qzxg0u2bmj1ekebsklej"
GROUP BY
usergameprogress.gameId
EDIT:
Solved thanks to a comment: use SUM instead of COUNT.
SELECT
SUM(CASE WHEN d1.user1Id = 'gkfurcwsi033qzxg0u2bmj1ekebsklej' OR d1.user2Id = 'gkfurcwsi033qzxg0u2bmj1ekebsklej' THEN 1 ELSE 0 END) AS totalDuelCount,
SUM(CASE WHEN winnerId = 'gkfurcwsi033qzxg0u2bmj1ekebsklej' THEN 1 ELSE 0 END) AS duelWinCount,
usergameprogress.gameId, usergameprogress.gameStar
FROM
duels d1
JOIN
usergameprogress ON (usergameprogress.gameId = d1.gameId)
GROUP BY
usergameprogress.gameId
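To see the SUM(CASE ...) pattern on concrete data, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The user ids (u1, u2, u3), the rows, and the extra filter on usergameprogress.userId are all assumptions made for the example; the filter is there because joining on gameId alone would also pick up other users' progress rows.

```python
import sqlite3

USER = "u1"  # hypothetical user id

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE duels (duelId INTEGER, user1Id TEXT, user2Id TEXT,
                    gameId INTEGER, winnerId TEXT);
CREATE TABLE usergameprogress (usergameprogressId INTEGER, userId TEXT,
                               gameId INTEGER, gameStar INTEGER);
INSERT INTO duels VALUES
    (1, 'u1', 'u2', 10, 'u1'),
    (2, 'u3', 'u1', 10, 'u3'),
    (3, 'u1', 'u3', 20, 'u1'),
    (4, 'u2', 'u3', 20, 'u2');
INSERT INTO usergameprogress VALUES
    (1, 'u1', 10, 3),
    (2, 'u1', 20, 5),
    (3, 'u2', 10, 1);
""")

QUERY = """
SELECT SUM(CASE WHEN d.user1Id = :uid OR d.user2Id = :uid
                THEN 1 ELSE 0 END) AS duelCount,
       SUM(CASE WHEN d.winnerId = :uid THEN 1 ELSE 0 END) AS duelWinCount,
       p.gameStar,
       p.gameId
FROM duels AS d
JOIN usergameprogress AS p ON p.gameId = d.gameId
WHERE p.userId = :uid   -- assumption: keep only the progress rows of this user
GROUP BY p.gameId, p.gameStar
ORDER BY p.gameId
"""

# Each SUM adds 1 only for rows that satisfy its own condition, so two
# different counts can come out of a single pass over the joined rows.
rows = conn.execute(QUERY, {"uid": USER}).fetchall()
for row in rows:
    print(row)
```

For the invented data, game 10 yields two duels and one win for u1, and game 20 yields one duel and one win, because duel 4 does not involve u1.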

First Unique Sql row

I have a MySQL table of user orders with columns such as:
user_id | timestamp | is_order_Parent | Status |
1 | 10-02-2020 | N | C |
2 | 11-02-2010 | Y | D |
3 | 11-02-2020 | N | C |
1 | 12-02-2010 | N | C |
1 | 15-02-2020 | N | C |
2 | 15-02-2010 | N | C |
I want to count the number of new customers per day. A new customer is one who places a non-parent order with status C, and a user who has been counted on one day must not be counted again on any other day.
The ideal result table would be:
Timestamp: Day | Distinct values of User ID
10-02-2020 | 1
11-02-2010 | 1
12-02-2010 | 0 <--- already counted user_id = 1 above, so no need to count it here
15-02-2010 | 1
table name is cscart_orders
If you are running MySQL 8.0, you can do this with window functions and aggregation:
select timestamp, sum(timestamp = timestamp0) new_users
from (
select
t.*,
min(case when is_order_parent = 'N' and status = 'C' then timestamp end) over(partition by user_id) timestamp0
from mytable t
) t
group by timestamp
The window min() computes the timestamp when each user became a "new user". Then, the outer query aggregates by date, and counts how many new users were found on that date.
A nice thing about this approach is that it does not require enumerating the dates separately.
You can use two levels of aggregation:
select first_timestamp, count(*)
from (select t.user_id, min(timestamp) as first_timestamp
from t
where is_order_parent = 'N' and status = 'C'
group by t.user_id
) t
group by first_timestamp;
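The two-level aggregation can be tried out with a short sketch like the one below, which uses Python's sqlite3 module in place of MySQL. The rows follow the question's table, but the dates are rewritten as ISO strings in a single year; that is an assumption made so that MIN() on a text column picks the chronologically first day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cscart_orders (user_id INTEGER, timestamp TEXT,
                            is_order_parent TEXT, status TEXT);
INSERT INTO cscart_orders VALUES
    (1, '2020-02-10', 'N', 'C'),
    (2, '2020-02-11', 'Y', 'D'),
    (3, '2020-02-11', 'N', 'C'),
    (1, '2020-02-12', 'N', 'C'),
    (1, '2020-02-15', 'N', 'C'),
    (2, '2020-02-15', 'N', 'C');
""")

QUERY = """
SELECT first_timestamp, COUNT(*) AS new_users
FROM (
    -- inner level: the first qualifying order day of each user
    SELECT user_id, MIN(timestamp) AS first_timestamp
    FROM cscart_orders
    WHERE is_order_parent = 'N' AND status = 'C'
    GROUP BY user_id
) AS firsts
-- outer level: how many users had their first day on each date
GROUP BY first_timestamp
ORDER BY first_timestamp
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

One difference from the window-function version: days on which nobody became a new user (2020-02-12 here) do not appear at all, rather than appearing with a count of 0.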

How to show all query results even when some are empty

I am counting rows from my database, but the result only shows rows that have data; rows with no data are left out. How do I display all rows, empty or not?
The result of my query looks like this:
pendidikan| Male | Female | Total
----------+------+--------+------
SD | 3 | 4 | 7
SMP | 2 | 1 | 3
SMA | 1 | 3 | 4
S1 | 10 | 1 | 11
But I want the result to look like this:
pendidikan| Male | Female | Total
----------+------+--------+------
SD | 3 | 4 | 7
SMP | 2 | 1 | 3
SMA | 1 | 3 | 4
S1 | 10 | 1 | 11
S2 | 0 | 0 | 0
S3 | 0 | 0 | 0
I want to show the empty rows from my database as well. This is my query:
SELECT a.NamaStatusPendidikan, COUNT(c.IDPencaker) as total,
count(case when c.JenisKelamin='0' then 1 end) as laki,
count(case when c.JenisKelamin='1' then 1 end) as cewe
FROM msstatuspendidikan as a JOIN mspencaker as c ON
a.IDStatusPendidikan = c.IDStatusPendidikan JOIN
mspengalaman as d ON c.IDPencaker = d.IDPencaker
WHERE d.StatusPekerjaan = '0' AND c.RegisterDate
BETWEEN '2019-01-01' AND '2019-03-01' GROUP BY a.IDStatusPendidikan
Try running this query:
SELECT sp.NamaStatusPendidikan,
COUNT(p.IDPencaker) as total, -- counts matched rows only, so empty groups give 0
COALESCE(SUM( p.JenisKelamin = 0 ), 0) as laki,
COALESCE(SUM( p.JenisKelamin = 1 ), 0) as cewe
FROM msstatuspendidikan sp LEFT JOIN
mspencaker p
ON sp.IDStatusPendidikan = p.IDStatusPendidikan AND
p.RegisterDate BETWEEN '2019-01-01' AND '2019-03-01' LEFT JOIN
mspengalaman g
ON g.IDPencaker = p.IDPencaker AND
g.StatusPekerjaan = 0
GROUP BY sp.IDStatusPendidikan;
Notes:
The JOINs have been replaced with LEFT JOINs.
Filtering conditions on all but the first table have been moved to the ON clauses.
This replaces the meaningless table aliases with table abbreviations, so the query is easier to read.
Things that look like numbers probably are numbers, so I removed the single quotes.
This simplifies the counts, using the fact that MySQL treats booleans as numbers in a numeric context.
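The effect of the LEFT JOIN with the filter kept in the ON clause can be checked with a small sketch, again using Python's sqlite3 module as a stand-in for MySQL. It is deliberately simplified: only two of the three tables are modelled, the rows are invented, and the counts use COUNT(p.IDPencaker) plus COALESCE so that groups without matches report 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE msstatuspendidikan (IDStatusPendidikan INTEGER,
                                 NamaStatusPendidikan TEXT);
CREATE TABLE mspencaker (IDPencaker INTEGER, IDStatusPendidikan INTEGER,
                         JenisKelamin INTEGER, RegisterDate TEXT);
INSERT INTO msstatuspendidikan VALUES (1, 'SD'), (2, 'SMP'), (3, 'S2');
INSERT INTO mspencaker VALUES
    (1, 1, 0, '2019-01-15'),
    (2, 1, 1, '2019-02-01'),
    (3, 2, 0, '2019-02-10'),
    (4, 3, 1, '2018-06-01');   -- outside the date range
""")

QUERY = """
SELECT sp.NamaStatusPendidikan,
       COUNT(p.IDPencaker) AS total,
       COALESCE(SUM(p.JenisKelamin = 0), 0) AS laki,
       COALESCE(SUM(p.JenisKelamin = 1), 0) AS cewe
FROM msstatuspendidikan AS sp
LEFT JOIN mspencaker AS p
       ON sp.IDStatusPendidikan = p.IDStatusPendidikan
      AND p.RegisterDate BETWEEN '2019-01-01' AND '2019-03-01'
GROUP BY sp.IDStatusPendidikan, sp.NamaStatusPendidikan
ORDER BY sp.IDStatusPendidikan
"""

# Because the date filter sits in ON rather than WHERE, 'S2' survives the
# join with NULLs on the right-hand side and shows up with zero counts.
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Moving the date condition into a WHERE clause would discard the NULL-extended S2 row and bring back the original problem.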

MYSQL - Selecting multiple rows in a single row

I have a MySQL table named permit_bills with the columns bill_no, alcohol_typ, origin, 2000ml, 1000ml and bill_date. The table is shown below:
+---------+-------------+--------+--------+--------+------------+
| bill_no | alcohol_typ | origin | 2000ml | 1000ml | bill_date  |
+---------+-------------+--------+--------+--------+------------+
| 2001    | s           | f      | 2      | 1      | 01-02-2017 |
| 2001    | m           | w      | 3      | 4      | 01-02-2017 |
+---------+-------------+--------+--------+--------+------------+
I want to combine all rows from the table above into a single row per bill_no and bill_date, displaying the 2000ml and 1000ml columns according to their alcohol_typ and origin.
My output table must be like this:
+---------+------------+------------+------------+------------+------------+
| bill_no | s_f_2000ml | s_f_1000ml | m_w_2000ml | m_w_1000ml | bill_date  |
+---------+------------+------------+------------+------------+------------+
| 2001    | 2          | 1          | 3          | 4          | 01-02-2017 |
+---------+------------+------------+------------+------------+------------+
Try this (pivot) query -
SELECT
bill_no,
MAX(IF(alcohol_typ = 's' AND origin = 'f', `2000ml`, NULL)) AS s_f_2000ml,
MAX(IF(alcohol_typ = 's' AND origin = 'f', `1000ml`, NULL)) AS s_f_1000ml,
MAX(IF(alcohol_typ = 'm' AND origin = 'w', `2000ml`, NULL)) AS m_w_2000ml,
MAX(IF(alcohol_typ = 'm' AND origin = 'w', `1000ml`, NULL)) AS m_w_1000ml,
bill_date
FROM permit_bills
GROUP BY bill_no, bill_date
Are you sure your output table needs to look like that?
You may be able to use the GROUP_CONCAT function instead which is sometimes an amazingly useful tool. You will need to split or explode the values in your application, but it might be all you need.
https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat
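A rough sketch of the GROUP_CONCAT route, including the "split in your application" step, might look like this. It uses Python's sqlite3 module, so the SQL is in SQLite's dialect (`||` for concatenation and `GROUP_CONCAT(expr, ',')`); in MySQL the equivalent would use CONCAT and the SEPARATOR keyword. The packed format `typ_origin:2000ml:1000ml` and the helper `unpack` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE permit_bills (bill_no INTEGER, alcohol_typ TEXT, origin TEXT,
                           "2000ml" INTEGER, "1000ml" INTEGER, bill_date TEXT);
INSERT INTO permit_bills VALUES
    (2001, 's', 'f', 2, 1, '01-02-2017'),
    (2001, 'm', 'w', 3, 4, '01-02-2017');
""")

QUERY = """
SELECT bill_no, bill_date,
       GROUP_CONCAT(alcohol_typ || '_' || origin || ':' ||
                    "2000ml" || ':' || "1000ml", ',') AS packed
FROM permit_bills
GROUP BY bill_no, bill_date
"""

def unpack(packed):
    """Turn 's_f:2:1,m_w:3:4' into per-column keys such as 's_f_2000ml'."""
    out = {}
    for item in packed.split(","):
        key, big, small = item.split(":")
        out[key + "_2000ml"] = int(big)
        out[key + "_1000ml"] = int(small)
    return out

# One dict per bill, with the packed string expanded on the application side.
bills = [
    {"bill_no": bill_no, "bill_date": bill_date, **unpack(packed)}
    for bill_no, bill_date, packed in conn.execute(QUERY)
]
print(bills)
```

Unlike the pivot query, this does not need one hard-coded column per alcohol_typ and origin combination, at the cost of parsing the string outside the database.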

Condition row total as a column in another table using MySQL

Firstly, I apologize for the terrible wording, but I'm not sure how to describe what I'm doing...
I have a table of computer types (id, type, name), called com_types
id | type | name
1 | 1 | Dell
2 | 4 | HP
In a second table, I have each individual computer, with a column 'type_id' to denote what type of computer it is, called com_assets
id | type_id | is_assigned
1 | 4 | 0
2 | 1 | 1
I'd like to create a view that shows each computer type, and how many we have on hand and in use, and a total, so the outcome would be
id | type | name | on_hand | in_use | total |
1 | 1 | Dell | 0 | 1 | 1 |
2 | 4 | HP | 1 | 0 | 1 |
As you can see, the on_hand, in_use, and total columns are dependent on the type_id and is_assigned column in the second table.
So far I have tried this...
CREATE VIEW test AS
SELECT id, type, name,
( SELECT COUNT(*) FROM com_assets WHERE type_id = id AND is_assigned = '0' ) as on_hand,
( SELECT COUNT(*) FROM com_assets WHERE type_id = id AND is_assigned = '1' ) as in_use,
SUM( on_hand + in_use ) AS total
FROM com_types
But all this returns is one column with all correct values, except the total equals ALL of the computers in the other table. Will I need a trigger to do this instead?
on_hand is the count of assigned = 0, and in_use is the count of assigned = 1. You can count them together, without the correlated subqueries, like this:
SELECT
com_types.id,
com_types.type,
com_types.name,
COUNT(CASE WHEN com_assets.is_assigned = 0 THEN 1 END) AS on_hand,
COUNT(CASE WHEN com_assets.is_assigned = 1 THEN 1 END) AS in_use,
COUNT(*) AS total
FROM com_types
JOIN com_assets ON com_types.type = com_assets.type_id -- type_id matches com_types.type in the sample data
GROUP BY
com_types.id,
com_types.type,
com_types.name
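The conditional counts can be checked against the question's sample rows with a sketch like the following, run through Python's sqlite3 module as a stand-in for MySQL. Two things here are assumptions rather than part of the answer: the join key (type_id is matched to com_types.type, which is what the sample data and expected output suggest) and the use of LEFT JOIN with COUNT(a.id), so that a type with no assets still appears with zeros. The Lenovo row is invented to show that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE com_types (id INTEGER, type INTEGER, name TEXT);
CREATE TABLE com_assets (id INTEGER, type_id INTEGER, is_assigned INTEGER);
INSERT INTO com_types VALUES (1, 1, 'Dell'), (2, 4, 'HP'), (3, 7, 'Lenovo');
INSERT INTO com_assets VALUES (1, 4, 0), (2, 1, 1);
""")

QUERY = """
SELECT t.id, t.type, t.name,
       COUNT(CASE WHEN a.is_assigned = 0 THEN 1 END) AS on_hand,
       COUNT(CASE WHEN a.is_assigned = 1 THEN 1 END) AS in_use,
       COUNT(a.id) AS total
FROM com_types AS t
LEFT JOIN com_assets AS a ON a.type_id = t.type
GROUP BY t.id, t.type, t.name
ORDER BY t.id
"""

# COUNT ignores NULLs, so a CASE without ELSE counts only the rows that
# match its condition, and COUNT(a.id) skips types without any asset.
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The Dell and HP rows reproduce the output requested in the question. Wrapping the SELECT in CREATE VIEW works the same way, so no trigger is needed.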