I have a report that includes an item hierarchy that goes across the report in columns. I need to remove the duplicates that are occurring in the brand row as shown below:
Brand 1 | Brand 2 |
P1 | P2 | P3 | P4 | P1 | P2 | P3 | P4 |
i1 | i2 | i3 | i4 | i5 | i6 | i7 | i8 |
Instead what I'm getting is this:
B1 | B1 | B1 | B1 | B2 | B2 | B2 | B2 |
P1 | P2 | P3 | P4 | P1 | P2 | P3 | P4 |
i1 | i2 | i3 | i4 | i5 | i6 | i7 | i8 |
This is easy enough to do with row groups... But it seems a bit harder to do with column groups, which I am using in this case.
How can I get my Brand column group to appear like my top example?
As I suggested in my comment, I achieve a similar type of output in one of my reports using a column group.
For example, to achieve the following:
I return the related values in the SQL result set and use column grouping like this:
Does that make sense?
Assuming the data is correct, there is one important step required. While adding a parent column group, you must select "Add group header" as shown:
You will probably still need to delete a column (the column only, don't delete your group!) after doing the above, but it will achieve what you want!
I'm using MySQL 5.7 and I'm trying to do a join with one of my source tables to a reference table in order to get the appropriate corresponding values. However, I'd like the join to be conditional so it can match according to the length of the value found in the source column.
Source Table
|---------------------|------------------|
| Company_Name | NAICS_Code |
|---------------------|------------------|
| Chem Inc | 325 |
|---------------------|------------------|
| Joe's Farming | 1112 |
|---------------------|------------------|
Reference Table
|--------------------|---------------------------|--------------------|---------------------------|
| NAICS_Code_3_Digit | NAICS_Code_3D_Description | NAICS_Code_4_Digit | NAICS_Code_4D_Description |
|--------------------|---------------------------|--------------------|---------------------------|
| 325                | Chemicals                 | 3252               | Resin and Rubber          |
|--------------------|---------------------------|--------------------|---------------------------|
| 111                | Crop Production           | 1112               | Fruit and Nuts            |
|--------------------|---------------------------|--------------------|---------------------------|
Final Table
|---------------|------------|---------------------------|---------------------------|
| Company_Name  | NAICS_Code | NAICS_Code_3D_Description | NAICS_Code_4D_Description |
|---------------|------------|---------------------------|---------------------------|
| Chem Inc      | 325        | Chemicals                 | NULL                      |
|---------------|------------|---------------------------|---------------------------|
| Joe's Farming | 1112       | Crop Production           | Fruit and Nuts            |
|---------------|------------|---------------------------|---------------------------|
While I'm able to write a query that works, it takes an extremely long time and I'm curious whether there is a better way. Here's what I have so far:
SELECT src.Company_Name,
src.NAICS_Code,
CASE
WHEN LENGTH(src.NAICS_Code) < 3 THEN NULL
ELSE ref.NAICS_Code_3D_Description
END AS NAICS_Code_3D_Description,
CASE
WHEN LENGTH(src.NAICS_Code) < 4 THEN NULL
ELSE ref.NAICS_Code_4D_Description
END AS NAICS_Code_4D_Description
FROM source_table AS src
LEFT JOIN reference_table AS ref ON CASE
WHEN LENGTH(src.NAICS_Code) = 4
AND src.NAICS_Code = ref.NAICS_Code_4_Digit THEN 1
WHEN LENGTH(src.NAICS_Code) = 3
AND src.NAICS_Code = ref.NAICS_Code_3_Digit THEN 1
ELSE 0
END = 1;
It might be more efficient to left join twice:
this avoids the need for the complicated logic in the on clause of the join
conditions are exclusive so it will not generate duplicates in the resultset
then you can use coalesce() in the select clause
So:
select
s.company_name,
s.naics_code,
coalesce(r1.naics_code_3d_description, r2.naics_code_3d_description) naics_code_3d_description,
r2.naics_code_4d_description
from source_table s
left join reference_table r1 on r1.naics_code_3_digit = s.naics_code
left join reference_table r2 on r2.naics_code_4_digit = s.naics_code
If you want to evict source rows that did not match in the reference table, you can add a where clause, like:
where r1.naics_code_3_digit is not null or r2.naics_code_4_digit is not null
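To try the double left join locally, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption; the asker is on MySQL 5.7). The table contents are copied from the question, and storing the codes as TEXT is also an assumption.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_table (company_name TEXT, naics_code TEXT);
CREATE TABLE reference_table (
    naics_code_3_digit TEXT, naics_code_3d_description TEXT,
    naics_code_4_digit TEXT, naics_code_4d_description TEXT
);
INSERT INTO source_table VALUES ('Chem Inc', '325'), ('Joe''s Farming', '1112');
INSERT INTO reference_table VALUES
    ('325', 'Chemicals', '3252', 'Resin and Rubber'),
    ('111', 'Crop Production', '1112', 'Fruit and Nuts');
""")

# r1 matches on the 3-digit code, r2 on the 4-digit code; at most one of them
# can match a given source row because the code lengths differ.
rows = conn.execute("""
SELECT s.company_name,
       s.naics_code,
       COALESCE(r1.naics_code_3d_description, r2.naics_code_3d_description),
       r2.naics_code_4d_description
FROM source_table s
LEFT JOIN reference_table r1 ON r1.naics_code_3_digit = s.naics_code
LEFT JOIN reference_table r2 ON r2.naics_code_4_digit = s.naics_code
ORDER BY s.company_name
""").fetchall()

for row in rows:
    print(row)
```

One caveat: if the real reference table repeats a 3-digit code across several rows (one per 4-digit child), the r1 join returns one row per match. The sample data does not show whether that is the case.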
I have three tables that are all linked through an "id" column. When all three joined, it looks somewhat like this:
+----+-------+--------+-------------------+---------+
| id | Color | T1Data | distinct_value | T3_data |
+----+-------+--------+-------------------+---------+
| 1 | green | ab | A | 10 |
| 1 | green | ab | A | 20 |
| 1 | green | ab | B | 100 |
| 1 | green | ab | B | 200 |
| 2 | blue | xz | A | 30 |
| 2 | blue | xz | A | 40 |
| 2 | blue | xz | B | 300 |
| 2 | blue | xz | B | 400 |
+----+-------+--------+-------------------+---------+
Currently I'm just SELECTING the columns, averaging (AVG) the T3 data, and GROUP BY every column preceding the data:
SELECT T1.id, T1.Color, T1Data, T2.distinct_value, avg(T3_data)
from T1
left join T2 on T1.id = T2.id
left join T3 on (T3.id = T2.id and T3.distinct_value = T2.distinct_value)
group by T1.id, T1.Color, T1.T1Data, T2.distinct_value
order by T1.id;
This is the result:
+----+-------+--------+-------------------+---------+
| id | Color | T1Data | T2_distinct value | avg |
+----+-------+--------+-------------------+---------+
| 1 | green | ab | A | 15 |
| 1 | green | ab | B | 150 |
| 2 | blue | xz | A | 35 |
| 2 | blue | xz | B | 350 |
+----+-------+--------+-------------------+---------+
Which is the desired outcome. I was wondering if there is a more efficient way to call the aggregation data? The Color and T1Data columns are strings and will all be identical for the same id. However, I'd like to not have as many group by statements as I have columns preceding the averaged T3_data. Is there a way to group by just id and distinct_value that will produce the same output?
Yes, it is possible.
If you have MySQL 5.7.5 or newer, you may omit the Color and T1Data fields from the GROUP BY list, since they are functionally dependent on the T1.id field. See the MySQL manual on GROUP BY handling:
SQL99 and later permits such nonaggregates per optional feature T301 if they are functionally dependent on GROUP BY columns: If such a relationship exists between name and custid, the query is legal. This would be the case, for example, were custid a primary key of customers.
MySQL 5.7.5 and up implements detection of functional dependence.
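If MySQL cannot detect the dependence (for example, because id is not declared as a primary or unique key), you can still use MySQL's special aggregation function any_value(). Note that any_value() itself was added in MySQL 5.7.5, so it is not available on earlier versions: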
This function is useful for GROUP BY queries when the ONLY_FULL_GROUP_BY SQL mode is enabled, for cases when MySQL rejects a query that you know is valid for reasons that MySQL cannot determine. The function return value and type are the same as the return value and type of its argument, but the function result is not checked for the ONLY_FULL_GROUP_BY SQL mode.
A third option is to disable the ONLY_FULL_GROUP_BY SQL mode, but I would not recommend that.
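For comparison, here is a minimal sketch of grouping by only id and distinct_value. It runs on SQLite through Python's sqlite3 module (an assumption, standing in for MySQL) and wraps the dependent columns in MAX() as a portable substitute for any_value(). That is a swapped-in technique, not what the manual excerpt describes. How the joined view splits into T1, T2 and T3 is reconstructed from the question and is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (id INTEGER PRIMARY KEY, Color TEXT, T1Data TEXT);
CREATE TABLE T2 (id INTEGER, distinct_value TEXT);
CREATE TABLE T3 (id INTEGER, distinct_value TEXT, T3_data REAL);
INSERT INTO T1 VALUES (1, 'green', 'ab'), (2, 'blue', 'xz');
INSERT INTO T2 VALUES (1, 'A'), (1, 'B'), (2, 'A'), (2, 'B');
INSERT INTO T3 VALUES
    (1, 'A', 10), (1, 'A', 20), (1, 'B', 100), (1, 'B', 200),
    (2, 'A', 30), (2, 'A', 40), (2, 'B', 300), (2, 'B', 400);
""")

# Color and T1Data are identical within one id, so MAX() simply returns
# that single value; only id and distinct_value appear in GROUP BY.
grouped = conn.execute("""
SELECT T1.id, MAX(T1.Color), MAX(T1.T1Data), T2.distinct_value, AVG(T3.T3_data)
FROM T1
LEFT JOIN T2 ON T1.id = T2.id
LEFT JOIN T3 ON T3.id = T2.id AND T3.distinct_value = T2.distinct_value
GROUP BY T1.id, T2.distinct_value
ORDER BY T1.id, T2.distinct_value
""").fetchall()

for row in grouped:
    print(row)
```

On MySQL 5.7.5 or newer with id declared as the primary key of T1, the MAX() wrappers should not be needed.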
I am new to mysql and tried to research it but couldn't find any solution. I have a table like this:
| SW_Pair1 | SW_Pair2 | Pair1_VLAN1| Pair1_VLAN2| Pair2_VLAN1| Pair2_VLAN2| Inter | Mgmt| OSPF| Env | Domain|
|-----------------|-----------------|------------|------------|------------|------------|-------|-----|-----|-----|-------|
| Switch1.abc.com | Switch2.abc.com | VLAN-111 | VLAN-333 | Unknown | Unknown | 47 | 24 | 0.1 | Dev | abc |
| Switch2.abc.com | Switch1.abc.com | VLAN-222 | VLAN-444 | Unknown | Unknown | 47 | 24 | 0.1 | Dev | abc |
| Switch3.abc.com | Switch4.abc.com | VLAN-121 | VLAN-123 | Unknown | Unknown | 47 | 24 | 0.1 | Dev | abc |
| Switch4.abc.com | Switch3.abc.com | VLAN-515 | VLAN-717 | Unknown | Unknown | 47 | 24 | 0.1 | Dev | abc |
| Switch5.abc.com | Switch6.abc.com | VLAN-919 | VLAN-101 | Unknown | Unknown | 47 | 24 | 0.1 | Dev | abc |
| Switch6.abc.com | Switch5.abc.com | VLAN-105 | VLAN-108 | Unknown | Unknown | 47 | 24 | 0.1 | Dev | abc |
| Switch7.abc.com | Switch8.abc.com | VLAN-110 | VLAN-115 | Unknown | Unknown | 47 | 24 | 0.1 | Dev | abc |
| Switch8.abc.com | Switch7.abc.com | VLAN-199 | VLAN-200 | Unknown | Unknown | 47 | 24 | 0.1 | Dev | abc |
Let's take the first 2 rows as an example.
SW_Pair1 in row 1 == SW_Pair2 in row 2
SW_Pair1 in row 2 == SW_Pair2 in row 1
I put them in adjacent rows here, but they can be anywhere in the database. Now I would like to merge these 2 rows so that the data in Pair1_VLAN1 and Pair1_VLAN2 of row 2 goes into Pair2_VLAN1 and Pair2_VLAN2 of row 1, and then row 2 disappears. So, here is how the table should look after the merge:
| SW_Pair1 | SW_Pair2 | Pair1_VLAN1| Pair1_VLAN2| Pair2_VLAN1| Pair2_VLAN2| Inter | Mgmt| OSPF| Env | Domain|
|-----------------|-----------------|------------|------------|------------|------------|-------|-----|-----|-----|-------|
| Switch1.abc.com | Switch2.abc.com | VLAN-111 | VLAN-333 | VLAN-222 | VLAN-444 | 47 | 24 | 0.1 | Dev | abc |
| Switch3.abc.com | Switch4.abc.com | VLAN-121 | VLAN-123 | VLAN-515 | VLAN-717 | 47 | 24 | 0.1 | Dev | abc |
and so on ..
I am using python 2.7 to push data to sql.
Edit:
I tried the query below to add additional checks on DELETE, but it failed:
UPDATE yourTable AS a
DELETE FROM yourTable AS b ON a.SW_Pair1 = b.SW_Pair2 AND a.SW_Pair2 = b.SW_Pair1
WHERE Pair2_VLAN1 IS Unknown;
Or better, can it SET the values of Pair1_VLAN1 and Pair1_VLAN2 in the second switch's row after its data has been moved to switch 1? Maybe overwrite the VLAN with something like "MERGED". I can then safely remove anything that has "MERGED" in Pair1_VLAN1 and Pair1_VLAN2, since it will only say that when its data was successfully moved to another row.
EDIT2:
nvm .. figured it out. See below:
UPDATE yourTable AS a
JOIN yourTable AS b ON a.SW_Pair1 = b.SW_Pair2 AND a.SW_Pair2 = b.SW_Pair1
SET a.Pair2_VLAN1 = b.Pair1_VLAN1,
a.Pair2_VLAN2 = b.Pair1_VLAN2,
b.Pair1_VLAN1 = "MERGED",
b.Pair1_VLAN2 = "MERGED"
WHERE a.SW_Pair1 < a.SW_Pair2;
First update the first row in each pair with the data from the matching row:
UPDATE yourTable AS a
JOIN yourTable AS b ON a.SW_Pair1 = b.SW_Pair2 AND a.SW_Pair2 = b.SW_Pair1
SET a.Pair2_VLAN1 = b.Pair1_VLAN1,
a.Pair2_VLAN2 = b.Pair1_VLAN2
WHERE a.SW_Pair1 < a.SW_Pair2;
The WHERE clause ensures that only one row in each pair (the one with the lower name in SW_Pair1) is updated.
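Then delete the rows that weren't updated. They will still have NULL in the columns that were updated by the first query. (If the placeholder is the string 'Unknown', as in the sample data, compare with = 'Unknown' instead of IS NULL.)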
DELETE FROM yourTable
WHERE Pair2_VLAN1 IS NULL;
This assumes that there are matching rows for everything. If you need something safer, you'll need to do a join that checks that there's a matching row with the opposite names.
DELETE a FROM yourTable AS a
JOIN yourTable AS b ON a.SW_Pair1 = b.SW_Pair2 AND a.SW_Pair2 = b.SW_Pair1
WHERE a.Pair2_VLAN1 IS NULL
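The update-then-delete sequence can be tried locally with the sketch below. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption), so the multi-table UPDATE ... JOIN is rewritten with correlated subqueries. MySQL generally rejects an UPDATE that selects from its own target table in a subquery, so treat this as a way to check the logic rather than a drop-in replacement. The placeholder is the string 'Unknown' as in the sample data, and the SwitchX/SwitchY row is a hypothetical unmatched row added to show that the safer DELETE keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE yourTable (
    SW_Pair1 TEXT, SW_Pair2 TEXT,
    Pair1_VLAN1 TEXT, Pair1_VLAN2 TEXT,
    Pair2_VLAN1 TEXT, Pair2_VLAN2 TEXT)""")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, 'Unknown', 'Unknown')",
    [
        ("Switch1.abc.com", "Switch2.abc.com", "VLAN-111", "VLAN-333"),
        ("Switch2.abc.com", "Switch1.abc.com", "VLAN-222", "VLAN-444"),
        ("Switch3.abc.com", "Switch4.abc.com", "VLAN-121", "VLAN-123"),
        ("Switch4.abc.com", "Switch3.abc.com", "VLAN-515", "VLAN-717"),
        # Hypothetical row with no reversed partner.
        ("SwitchX.abc.com", "SwitchY.abc.com", "VLAN-900", "VLAN-901"),
    ],
)

# Condition that finds the row with the two switch names swapped.
partner = """FROM yourTable AS b
             WHERE b.SW_Pair1 = yourTable.SW_Pair2
               AND b.SW_Pair2 = yourTable.SW_Pair1"""

# Step 1: copy the partner's VLANs into the row with the lower SW_Pair1.
conn.execute(f"""
UPDATE yourTable
SET Pair2_VLAN1 = (SELECT b.Pair1_VLAN1 {partner}),
    Pair2_VLAN2 = (SELECT b.Pair1_VLAN2 {partner})
WHERE SW_Pair1 < SW_Pair2 AND EXISTS (SELECT 1 {partner})
""")

# Step 2: delete rows that were not updated but do have a partner.
conn.execute(f"""
DELETE FROM yourTable
WHERE Pair2_VLAN1 = 'Unknown' AND EXISTS (SELECT 1 {partner})
""")

merged = conn.execute("SELECT * FROM yourTable ORDER BY SW_Pair1").fetchall()
for row in merged:
    print(row)
```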
I'm sure there is a cleaner way to do this, but this is a hacky way I came up with.
select
a.sw_pair1,
a.sw_pair2,
a.pair1_vlan1,
a.pair1_vlan2,
b.pair1_vlan1 as pair2_vlan1,
b.pair1_vlan2 as pair2_vlan2
from TABLENAME a
join TABLENAME b on a.sw_pair1 = b.sw_pair2
where cast(substring_index(substring_index(a.sw_pair1, '.abc.com', 1), 'Switch', -1) as unsigned) % 2 > 0
I'm using the modulo (% 2) to make sure we get the odd numbers in the first column only, therefore having the even numbers in column 2. I'd be curious to see if someone else can come up with a cleaner solution for that than I did. If so, that would help me with some of the things I do from time to time.
This method worked for me and seems simpler than the current answers. This gave your desired output from the sample data.
SELECT
a.SW_Pair1,
a.SW_Pair2,
a.Pair1_VLAN1,
a.Pair1_VLAN2,
b.Pair1_VLAN1 as Pair2_VLAN1,
b.Pair1_VLAN2 as Pair2_VLAN2
FROM test as a , test as b
WHERE a.SW_Pair1 = b.SW_Pair2 AND a.SW_Pair2>b.SW_Pair2;
If you want to store the merged data in the table, then @Barmar's solution will work perfectly.
But if you just want to display the data then following query will get the job done:
select least(t1.SW_Pair1,t1.SW_Pair2),greatest(t1.SW_Pair1,t1.SW_Pair2),
t2.Pair1_VLAN1,t2.Pair1_VLAN2,
t1.Pair1_VLAN1 as Pair2_VLAN1,t1.Pair1_VLAN2 as Pair2_VLAN2
from yourTable as t1
inner join yourTable as t2
on t2.SW_Pair1 = t1.SW_Pair2 and t2.SW_Pair2=t1.SW_Pair1
group by least(t1.SW_Pair1,t1.SW_Pair2),greatest(t1.SW_Pair1,t1.SW_Pair2);
Hope it helps!
I have 3 tables in MySQL:
1) page (id, title)
2) visitor (id, name)
3) page_visit (page_id, visitor_id, timestamp_of_visit)
Visitors can visit pages multiple times, across several days. Hence, while we will have one row for a page, and one row for a visitor, we can have several page_visit rows, each with a timestamp of the visit.
I'm trying to find the number of unique visitors, by week. I know how to get the 'by week count' query for non-uniques (i.e. 'how many visitors did I see each week'). I'm not sure how to pick the unique visitors by week, though, with the visitor showing up on the list ONLY the first time they are ever seen.
----------- ----------- ----------------------------
| page | | visitor | | page_visit |
----------- ----------- ----------------------------
|id |title| |id |name | |pid|vid|timestamp of visit|
----------- ----------- ----------------------------
| 1 | p1 | | 1 | v1 | | 1 | 1 | 02-18-2016:08:30 |
| 2 | p2 | | 2 | v2 | | 1 | 1 | 02-18-2016:10:00 |
| 3 | p3 | | 3 | v3 | | 1 | 3 | 02-20-2016:23:45 |
| 4 | p4 | | 4 | v4 | | 2 | 3 | 02-22-2016:07:30 |
| 5 | p5 | | 5 | v5 | | 3 | 1 | 02-23-2016:08:30 |
| 6 | p6 | | 6 | v6 | | 3 | 6 | 02-24-2016:09:30 |
What the result set should show:
------------------------
| results |
------------------------
| Week of | Net new |
------------------------
| 02-15-2016 | 2 |
| 02-22-2016 | 1 |
As mentioned, I can figure out how to show ALL visitors by week. I'm not sure how to get the unique visitors.
I tried doing a min(timestamp of visit), but, based on where I tried it, it returned the lowest timestamp across all rows (understandably...).
Any help would be much appreciated!
This is a tricky question when you first encounter it. It requires two levels of aggregation. The first gets the first visit for each visitor, the second summarizes by time. The following does the summary by day:
select date(minvd), count(*) as numvisitors
from (select vid, min(visitdate) as minvd
from page_visit pv
group by vid
) v
group by date(minvd)
order by date(minvd);
Translating to weeks is always a bit tricky -- do they begin on Mondays? End on Saturdays? On Fridays? (I've seen all of these.) However, the above is additive, so you can just add all the values for a given week to get your value.
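For the weekly roll-up, here is a minimal sketch of the two-level aggregation, assuming weeks start on Monday (which matches the "Week of 02-15-2016" in the question). It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL. The week expression is SQLite-specific; on MySQL something like DATE_SUB(DATE(minvd), INTERVAL WEEKDAY(minvd) DAY) should give the same Monday. The visitdate column name follows the answer's query, and storing the timestamps as ISO strings is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_visit (pid INTEGER, vid INTEGER, visitdate TEXT)")
conn.executemany("INSERT INTO page_visit VALUES (?, ?, ?)", [
    (1, 1, "2016-02-18 08:30:00"),
    (1, 1, "2016-02-18 10:00:00"),
    (1, 3, "2016-02-20 23:45:00"),
    (2, 3, "2016-02-22 07:30:00"),
    (3, 1, "2016-02-23 08:30:00"),
    (3, 6, "2016-02-24 09:30:00"),
])

# Inner query: first visit per visitor.
# Outer query: bucket those first visits by the Monday of their week.
# 'weekday 0' moves forward to Sunday (or stays put), '-6 days' steps back to Monday.
weekly = conn.execute("""
SELECT date(minvd, 'weekday 0', '-6 days') AS week_of,
       COUNT(*) AS net_new
FROM (SELECT vid, MIN(visitdate) AS minvd
      FROM page_visit
      GROUP BY vid) v
GROUP BY week_of
ORDER BY week_of
""").fetchall()

for row in weekly:
    print(row)
```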
In case you want to do this without a subquery:
SELECT
<week>,
COUNT(DISTINCT PV.vid)
FROM
Page_Visit PV
LEFT OUTER JOIN Page_Visit PV2 ON
PV2.vid = PV.vid AND
PV2.visit_date < PV.visit_date
WHERE
PV2.vid IS NULL
GROUP BY
<week>
As Gordon mentions, how you determine the week can be tricky. Just add in that calculation where you see <week>. Personally, I like to use a Calendar table for that kind of functionality, but it's up to you. You can run any expressions directly against PV.visit_date to determine it.
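Filling in the <week> placeholder, the anti-join version can be sketched as follows. As an assumption it again uses SQLite through Python's sqlite3 module in place of MySQL, a Monday week start, and a visit_date column holding ISO timestamp strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Page_Visit (pid INTEGER, vid INTEGER, visit_date TEXT)")
conn.executemany("INSERT INTO Page_Visit VALUES (?, ?, ?)", [
    (1, 1, "2016-02-18 08:30:00"),
    (1, 1, "2016-02-18 10:00:00"),
    (1, 3, "2016-02-20 23:45:00"),
    (2, 3, "2016-02-22 07:30:00"),
    (3, 1, "2016-02-23 08:30:00"),
    (3, 6, "2016-02-24 09:30:00"),
])

# A visit row survives the WHERE clause only if no earlier visit by the
# same visitor exists, i.e. it is that visitor's first visit.
first_visits = conn.execute("""
SELECT date(PV.visit_date, 'weekday 0', '-6 days') AS week_of,
       COUNT(DISTINCT PV.vid) AS net_new
FROM Page_Visit PV
LEFT OUTER JOIN Page_Visit PV2
       ON PV2.vid = PV.vid AND PV2.visit_date < PV.visit_date
WHERE PV2.vid IS NULL
GROUP BY week_of
ORDER BY week_of
""").fetchall()

for row in first_visits:
    print(row)
```

COUNT(DISTINCT ...) matters when a visitor has two rows with the same earliest timestamp: both rows pass the anti-join, but the visitor is still counted once.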
I have a data table that I use to do some calculations. The resulting data set after calculations looks like:
+------------+-----------+------+----------+
| id_process | id_region | type | result |
+------------+-----------+------+----------+
| 1 | 4 | 1 | 65.2174 |
| 1 | 5 | 1 | 78.7419 |
| 1 | 6 | 1 | 95.2308 |
| 1 | 4 | 1 | 25.0000 |
| 1 | 7 | 1 | 100.0000 |
+------------+-----------+------+----------+
On the other hand, I have another table that contains a set of ranges used to classify the calculation results. The ranges table looks like:
+----------+-------+-----+--------+
| id_level | start | end | status |
+----------+-------+-----+--------+
| 1        | 0     | 75  | Danger |
| 2        | 76    | 90  | Alert  |
| 3        | 91    | 100 | Good   |
+----------+-------+-----+--------+
I need a query that adds the corresponding 'status' column to each value when doing the calculations. Currently, I can do that by adding the following field to the calculation query:
select
...,
...,
[math formula] as result,
(select status
from ranges r
where result between r.start and r.end) status
from ...
where ...
It works OK. But when I have a lot of rows (more than 200K), the calculation query becomes slow.
My question is: is there some way to find that 'status' value without that subquery?
Has anyone worked on something similar before?
Thanks
Yes, you are looking for a subquery and join:
select s.*, r.status
from (select s.*
from <your query here>
) s left outer join
ranges r
on s.result between r.start and r.end
Explicit joins often optimize better than nested select. In this case, though, the ranges table seems pretty small, so this may not be the performance issue.
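Here is a minimal sketch of the range join, using Python's sqlite3 module as a stand-in for MySQL (an assumption). The calc table name and the 75.5 row are hypothetical; the other values come from the question. The extra row shows that a result falling between two ranges (75 < x < 76) matches nothing and gets a NULL status from the left join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calc (id_process INTEGER, id_region INTEGER, type INTEGER, result REAL);
CREATE TABLE ranges (id_level INTEGER, "start" INTEGER, "end" INTEGER, status TEXT);
INSERT INTO calc VALUES
    (1, 4, 1, 65.2174), (1, 5, 1, 78.7419), (1, 6, 1, 95.2308),
    (1, 4, 1, 25.0),    (1, 7, 1, 100.0),
    (1, 8, 1, 75.5);    -- hypothetical value that falls in the gap 75..76
INSERT INTO ranges VALUES (1, 0, 75, 'Danger'), (2, 76, 90, 'Alert'), (3, 91, 100, 'Good');
""")

# The calculation query would go where "calc" is; each result row is matched
# to the range that contains it, or to no range at all.
classified = conn.execute("""
SELECT s.result, r.status
FROM calc s
LEFT OUTER JOIN ranges r
       ON s.result BETWEEN r."start" AND r."end"
ORDER BY s.result
""").fetchall()

for row in classified:
    print(row)
```

If fractional results are expected, half-open ranges (for example result >= start AND result < next start) would avoid such gaps, but that changes the ranges table and is only a suggestion.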