Hello, I have two tables with the same structure and now I want to merge them.
Here is structure:
Terms:
steamid - that goes without saying
regcas - keep only the smaller value
VIP - sum
FunVIP - ignore when duplicate
Days - sum
KilledCT - sum
WinPP - sum
LastT - sum
cas - sum
lastnick - ignore when duplicate
lastlog - ignore when duplicate
ct_cas - sum
simon_cas - sum
Example when duplicate:
row from main table
steamid | regcas | VIP | FunVIP | Days | KilledCT | WinPP | LastT | lastnick | lastlog | ct_cas | simon_cas
------------------------------------------------------------------------------------------------------------------------------
76561198040874389 | 1546639030 | 1 | 0 | 125 | 1000 | 20 | 50 | Bomber | 1546639037 | 64 | 50
row from second table
steamid | regcas | VIP | FunVIP | Days | KilledCT | WinPP | LastT | lastnick | lastlog | ct_cas | simon_cas
------------------------------------------------------------------------------------------------------------------------------
76561198040874389 | 1553888234 | 1 | 5 | 100 | 1555 | 40 | 20 | Lucker | 1549387793 | 10 | 1
Result
steamid | regcas | VIP | FunVIP | Days | KilledCT | WinPP | LastT | lastnick | lastlog | ct_cas | simon_cas
------------------------------------------------------------------------------------------------------------------------------
76561198040874389 | 1546639030 | 2 | 0 | 225 | 2555 | 60 | 70 | Bomber | 1546639037 | 74 | 51
I absolutely don't know how to compose a complex SQL statement and I need help.
You seem to want union all and group by. I have no idea what "ignore with duplicate" is supposed to mean, but min() seems close enough. So:
select steamid, min(regcas) as regcas, sum(vip) as vip,
min(FunVIP) as FunVIP,
sum(Days) as days, sum(KilledCT) as KilledCT, sum(WinPP) as WinPP,
sum(LastT) as LastT, sum(cas) as cas,
min(lastnick) as lastnick,
min(lastlog) as lastlog,
sum(ct_cas) as ct_cas, sum(simon_cas) as simon_cas
from ((select t1.* from table1 t1) union all
(select t2.* from table2 t2)
) t
group by steamid;
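To check the union all / group by approach against the example rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. The table names table1 and table2 come from the answer above; the column types are my assumption, and the cas column is left out because the example rows in the question do not include it. Note that min() reproduces the main table's FunVIP, lastnick and lastlog here only because those values happen to be the smaller ones in this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column types are assumed; the question only shows column names.
cols = ("steamid TEXT, regcas INT, VIP INT, FunVIP INT, Days INT, KilledCT INT, "
        "WinPP INT, LastT INT, lastnick TEXT, lastlog INT, ct_cas INT, simon_cas INT")
conn.execute(f"CREATE TABLE table1 ({cols})")
conn.execute(f"CREATE TABLE table2 ({cols})")

# The two example rows from the question.
conn.execute("INSERT INTO table1 VALUES "
             "('76561198040874389',1546639030,1,0,125,1000,20,50,'Bomber',1546639037,64,50)")
conn.execute("INSERT INTO table2 VALUES "
             "('76561198040874389',1553888234,1,5,100,1555,40,20,'Lucker',1549387793,10,1)")

# Stack both tables, then collapse duplicates per steamid.
merged = conn.execute("""
    SELECT steamid, MIN(regcas), SUM(VIP), MIN(FunVIP), SUM(Days), SUM(KilledCT),
           SUM(WinPP), SUM(LastT), MIN(lastnick), MIN(lastlog),
           SUM(ct_cas), SUM(simon_cas)
    FROM (SELECT * FROM table1 UNION ALL SELECT * FROM table2) AS t
    GROUP BY steamid
""").fetchall()

print(merged)
```

If "ignore when duplicate" really means "always keep the main table's value", min() is not a safe substitute in general, and the query would need to prefer table1's row explicitly.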
To merge two tables, you can use a join.
select * from table1 natural join table2;
OR
select * from table1, table2 where table1.column = table2.column;
Related
I have two tables in a database that I would like to combine in a specific way.
Here are the tables:
table: watchhistory
customerid | titleid | rating | date
------------+-----------+--------+------------
1488844 | tt0389605 | 3 | 2005-09-06
1181550 | tt0389605 | 3 | 2004-02-01
1227322 | tt0389605 | 4 | 2004-02-06
786312 | tt0389605 | 3 | 2004-11-16
525356 | tt0389605 | 2 | 2004-07-11
1009622 | tt0389605 | 1 | 2005-01-19
table: media
mediaid | directorid | title | genre | runtime | releasedate
-----------+------------+----------------+----------------------+---------+-------------
tt0090557 | nm0851724 | Round Midnight | [Drama, Music] | 133 | 1986
tt0312296 | nm0146385 | 1 Giant Leap | [Documentary, Music] | 155 | 2002
tt0078721 | nm0001175 | 10 | [Comedy, Romance] | 122 | 1979
tt2170245 | nm3593080 | 10 | [Thriller] | 76 | 2012
tt5282238 | nm6207118 | 10 | [Thriller] | 90 | 2015
tt0312297 | nm0302572 | 10 Attitudes | [Comedy, Drama] | 87 | 2001
I would like to make a table with the following columns:
title (from media) | Views#
I created this query to get the top 10 titleids, meaning the top 10 titles from watchhistory that appear in watchhistory the most times:
SELECT titleid, count(*) as Views FROM watchhistory GROUP BY titleid ORDER BY Views DESC limit 10;
titleid | views
------------+-------
tt7631348 | 1307
tt14627576 | 1065
tt8372506 | 1063
tt5793632 | 1056
tt1403008 | 1053
tt7825602 | 1051
tt6840954 | 1046
tt12780424 | 1042
tt7266106 | 1036
tt6539274 | 1035
The goal is to essentially replace this titleid column (from watchhistory) with the title (from media). I tried using joins between the watchhistory.titleid and media.mediaid with no luck.
What SQL query do I need to get this desired table?
Thanks in advance.
You need to INNER JOIN to your media table on mediaid:
SELECT m.title, count(wh.*) as Views
FROM watchhistory wh
INNER JOIN media m on m.mediaid = wh.titleid
GROUP BY m.mediaid
ORDER BY Views DESC LIMIT 10;
To see what the select and join are doing, you can simplify it:
SELECT m.*, wh.*
FROM watchhistory wh
INNER JOIN media m on m.mediaid = wh.titleid
The result will be a joined 'table' that has the two tables combined on the mediaid/titleid.
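Here is a small runnable sketch of the join plus count, using Python's sqlite3 module. The media rows reuse ids and titles from the question; the watchhistory rows are invented sample data, and both tables are cut down to the columns the query needs. Grouping by m.mediaid (with m.title added so the query is also valid on engines that require every selected column in the GROUP BY) keeps different media that share a title, such as the several entries named "10", as separate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE media (mediaid TEXT PRIMARY KEY, title TEXT);
CREATE TABLE watchhistory (customerid INT, titleid TEXT);

INSERT INTO media VALUES
    ('tt0078721', '10'),
    ('tt2170245', '10'),
    ('tt0312296', '1 Giant Leap');

-- Invented viewing data, only for illustration.
INSERT INTO watchhistory VALUES
    (1, 'tt0078721'), (2, 'tt0078721'), (3, 'tt0078721'),
    (1, 'tt2170245'),
    (2, 'tt0312296'), (3, 'tt0312296');
""")

# Count views per media row and show the title instead of the id.
top = conn.execute("""
    SELECT m.title, COUNT(*) AS views
    FROM watchhistory wh
    INNER JOIN media m ON m.mediaid = wh.titleid
    GROUP BY m.mediaid, m.title
    ORDER BY views DESC, m.mediaid
    LIMIT 10
""").fetchall()

print(top)
```

Grouping by title alone would instead merge the two films called "10" into one row, which may or may not be what you want.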
Context:
I'm attempting to take a series of market transactions, and determine the amount of money actually moving per item type. This is pretty much my first attempt at MySQL, so the query is ugly, but the following nearly works:
SELECT types.typename,
averages.type,
averages.price,
movement.sold,
( averages.price * movement.sold ) AS value
FROM (SELECT type,
Round(Avg(price)) AS price
FROM orders
GROUP BY type) AS averages
INNER JOIN (SELECT type,
( startingvolume - currentvolume ) AS sold
FROM (SELECT type,
Sum(volume) AS currentVolume,
Sum(volumeentered) startingVolume
FROM orders
GROUP BY type) AS movement
WHERE ( startingvolume - currentvolume ) > 10000
ORDER BY sold) AS movement
ON averages.type = movement.type
INNER JOIN invtypes AS types
ON types.typeid = averages.type
ORDER BY value DESC
LIMIT 10 ;
+------------------------------------+-------+---------+------------+------------------+
| typeName | type | price | sold | value |
+------------------------------------+-------+---------+------------+------------------+
| Dirt | 34 | 1904767 | 2670581874 | 5086836224393358 |
| Light Wood | 2629 | 42999 | 2756595 | 118530828405 |
| Dark Wood | 24509 | 47344 | 1107771 | 52446310224 |
| Stone | 21922 | 18386 | 1505884 | 27687183224 |
| Grass | 238 | 5643 | 4554470 | 25700874210 |
| Paper | 3814 | 25635 | 861006 | 22071888810 |
| Iron | 3699 | 320270 | 58833 | 18842444910 |
| Ink | 16275 | 8552 | 2200545 | 18819060840 |
| Loam | 2679 | 5759 | 2608771 | 15023912189 |
| Copper | 672 | 904612 | 14989 | 13559229268 |
+------------------------------------+-------+---------+------------+------------------+
The problem with the data above is that the raw market data is unavoidably corrupted by outliers, as you can see below:
select type, price from orders where type = 34 order by price desc limit 10;
+------+-----------+
| type | price |
+------+-----------+
| 34 | 200000000 |
| 34 | 15.99 |
| 34 | 12.06 |
| 34 | 10 |
| 34 | 7.67 |
| 34 | 7.5 |
| 34 | 7.3 |
| 34 | 7.17 |
| 34 | 7.1 |
| 34 | 7.06 |
+------+-----------+
Core problem:
99% of the market data is clean, but the outliers destroy the average, and MySQL doesn't seem to have a median function. I've found several examples of how to find the median of an entire column, but I need the median per item.
How would I determine a per-item median instead of a per-item mean, or efficiently clean the data of these outliers prior to running the primary query?
Note:
I've tried omitting results via std, but prices of items range from $17 to $10B, while deviation remains relatively low, regardless of price range.
I won't touch your original query because it is very complex, but one option would be to use a subquery to remove any statistical outliers. For example, if you wanted to remove any row from the orders table whose price is more than, say, two standard deviations away from the mean for its type, you could use:
SELECT t1.type,
t1.price
FROM orders t1
INNER JOIN
(
SELECT type,
AVG(price) AS avg_price,
STD(price) AS std_price
FROM orders
GROUP BY type
) t2
ON t1.type = t2.type
WHERE ABS(t1.price - t2.avg_price) <= 2 * t2.std_price -- any value more than 2 standard deviations
-- away from the mean is discarded
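Since the question asks specifically for a per-item median, here is a hedged sketch of one way to get it with window functions, run through Python's sqlite3 module so it can be tried locally. It assumes an engine with window function support (SQLite 3.25+ here; MySQL 8.0+ also has ROW_NUMBER() and COUNT() OVER). The type 34 prices are taken from the question; the type 238 prices are invented to show the even-count case. The expression (cnt + 1) / 2 relies on integer division, which is what SQLite does for integers; in MySQL you would write it with DIV instead of /.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (type INT, price REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    # Type 34: real prices from the question, including the outlier.
    (34, 200000000), (34, 15.99), (34, 12.06), (34, 10), (34, 7.67),
    # Type 238: invented prices, even row count.
    (238, 5.0), (238, 6.0), (238, 7.0), (238, 9.0),
])

# Rank prices within each type, then average the middle row (odd count)
# or the two middle rows (even count).
medians = conn.execute("""
    SELECT type, AVG(price) AS median_price
    FROM (
        SELECT type, price,
               ROW_NUMBER() OVER (PARTITION BY type ORDER BY price) AS rn,
               COUNT(*)     OVER (PARTITION BY type)                AS cnt
        FROM orders
    ) AS ranked
    WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
    GROUP BY type
    ORDER BY type
""").fetchall()

print(medians)
```

The 200000000 outlier has no effect on the median for type 34, so this subquery could stand in for the averages subquery in the original query without any separate cleaning step.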
Let me elaborate. I have a table like this (updated to include more examples)
| id | date | cust | label | paid | due |
+----+-----------+------+-------------------------+------+-------+
| 1 |2016-02-02 | 1 | SALE: Acme Golf Balls | 0 | 1000 |
| 20 |2016-03-01 | 1 | PAYMENT: transaction #1 | 700 | 0 |
| 29 |2016-03-02 | 1 | PAYMENT: transaction #1 | 300 | 0 |
| 30 |2016-03-02 | 3 | SALE: Acme Large Anvil | 500 | 700 |
| 32 |2016-03-02 | 3 | PAYMENT: transaction #30| 100 | 0 |
| 33 |2016-03-03 | 2 | SALE: Acme Rockets | 0 | 2000 |
Now I need to output a table that displays sales that haven't been paid in full and the remaining amount. How do I do that? There's not much info out there on how to relate rows from the same table.
EDIT: Here's the output table I'm thinking of making
Table: debts_n_loans
| cust | label | amount |
==========================================
| 3 | SALE: Acme Large Anvil | 100 |
| 2 | SALE: Acme Rockets | 2000 |
If cust is the key that ties them together, then you can just use aggregation and a having clause:
select cust, sum(paid), sum(due)
from t
group by cust
having sum(paid) <> sum(due);
If you want the individual rows, you can use a join, IN, or EXISTS to retrieve them.
EDIT:
If you need to do this using the transaction at the end of the string:
select t.id, t.due, coalesce(sum(tpay.paid), 0) as paid
from t left join
t tpay
on tpay.label like 'PAYMENT:%' and
tpay.label like '%#' || t.id
where t.label like 'SALE:%'
group by t.id, t.due
having t.due <> coalesce(sum(tpay.paid), 0);
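To see the self-join idea produce the debts_n_loans table from the question, here is a sketch using Python's sqlite3 module and the six example rows. Two things here are my assumptions rather than anything stated in the thread: the table is called t as in the answer, and the paid value on a SALE row is treated as a down payment that reduces what is owed (that is the only reading I found that gives 100 for the anvil). The payment filter sits in the ON clause so that sales with no payments at all survive the left join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INT, date TEXT, cust INT, label TEXT, paid INT, due INT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", [
    (1,  "2016-02-02", 1, "SALE: Acme Golf Balls",    0,   1000),
    (20, "2016-03-01", 1, "PAYMENT: transaction #1",  700, 0),
    (29, "2016-03-02", 1, "PAYMENT: transaction #1",  300, 0),
    (30, "2016-03-02", 3, "SALE: Acme Large Anvil",   500, 700),
    (32, "2016-03-02", 3, "PAYMENT: transaction #30", 100, 0),
    (33, "2016-03-03", 2, "SALE: Acme Rockets",       0,   2000),
])

# s = sale rows, p = payment rows whose label ends in '#<sale id>'.
debts = conn.execute("""
    SELECT s.cust, s.label,
           s.due - s.paid - COALESCE(SUM(p.paid), 0) AS amount
    FROM t AS s
    LEFT JOIN t AS p
           ON p.label LIKE 'PAYMENT:%'
          AND p.label LIKE '%#' || s.id
    WHERE s.label LIKE 'SALE:%'
    GROUP BY s.id, s.cust, s.label, s.due, s.paid
    HAVING s.due - s.paid - COALESCE(SUM(p.paid), 0) > 0
    ORDER BY s.id
""").fetchall()

print(debts)
```

Matching on the end of a label string is fragile; if you control the schema, a dedicated column on payment rows holding the sale id would make this join simpler and indexable.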
So you only need the rows with a due greater than 0
SELECT * FROM <table> WHERE due > 0;
Try this:
SELECT
cust,
SUM(due) - SUM(paid) AS remaining
FROM t1
GROUP BY cust
HAVING SUM(due) > SUM(paid);
Query 1:
SELECT num_requerimiento, asunto
FROM masivos_texto INNER JOIN envios_masivos
ON id_masivos=id_envio;
Result 1:
+---------------------+---------------------+
| num_requerimiento   | asunto              |
+---------------------+---------------------+
| 1800                | inscripcion         |
+---------------------+---------------------+
| 1801                | seguimiento         |
+---------------------+---------------------+
Query 2:
SELECT id_envio, estatus, count(estatus)
FROM acuses_recibo
WHERE id_envio IN (SELECT id_masivos FROM cati_atencion.masivos_texto WHERE fecha >= '2014-01-01' AND fecha <= '2015-06-16')
GROUP BY id_envio, estatus;
Result 2:
+---------------------+---------------------+----------------------+
| id_envio            | estatus             | count(estatus)       |
+---------------------+---------------------+----------------------+
| 84                  | 0                   | 4031                 |
+---------------------+---------------------+----------------------+
| 84                  | 1                   | 632                  |
+---------------------+---------------------+----------------------+
| 85                  | 0                   | 35635                |
+---------------------+---------------------+----------------------+
| 85                  | 1                   | 3711                 |
+---------------------+---------------------+----------------------+
Desired Result:
+---------------------+-----------------+------------+------------+-------------------+
| num_requerimiento   | asunto          | id_envio   | estatus    | count(estatus)    |
+---------------------+-----------------+------------+------------+-------------------+
| 1800                | inscripcion     | 84         | 0          | 4031              |
+---------------------+-----------------+------------+------------+-------------------+
| 1800                | inscripcion     | 84         | 1          | 632               |
+---------------------+-----------------+------------+------------+-------------------+
| 1801                | seguimiento     | 85         | 0          | 635               |
+---------------------+-----------------+------------+------------+-------------------+
| 1801                | seguimiento     | 85         | 1          | 711               |
+---------------------+-----------------+------------+------------+-------------------+
In the desired result, the id_envio/id_masivos corresponding to num_requerimiento 1800 is 84,
and the id_envio/id_masivos corresponding to num_requerimiento 1801 is 85.
estatus in the 2nd table can take up to three values. Thanks in advance for your assistance.
UNION doesn't work: it gives me the 1st table followed by the 2nd, and only if both selects have the same number of columns.
To do this with SQL, you will need a table relating your masivos_texto and acuses_recibo tables. I suggest you create a table. You could call it req_id or anything suitable. This is often called a JOIN table. It will have this content
num_requerimiento id_envio
1800 84
1801 85
Then you'll be able to join your first and second queries together appropriately.
It's not possible to write your query for you without knowing the rows of your tables.
Solved! I needed to give an alias to each SELECT level, like this:
SELECT result1.num_requerimiento, result1.asunto, result1.id_masivos, result2.estatus, result2.conteo
FROM
(SELECT C.num_requerimiento, B.asunto, B.id_masivos
FROM masivos_texto B INNER JOIN envios_masivos C
ON B.id_masivos=C.id_envio) as result1
INNER JOIN
(SELECT A.id_envio, A.estatus, count(estatus) as conteo
from acuses_recibo A
WHERE A.id_envio IN (SELECT B.id_masivos FROM masivos_texto B where B.fecha >= '2014-01-01' AND B.fecha <= '2015-06-16')
GROUP BY A.id_envio, A.estatus) as result2
ON result1.id_masivos=result2.id_envio;
and that generates the 3rd table needed. Hope it helps someone in the future.
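For anyone who wants to try the aliased derived-table join on a small scale, here is a sketch using Python's sqlite3 module. The three table definitions are guesses reduced to the columns the queries mention, and the acuses_recibo rows are invented, so the counts are tiny compared with the real data. The date filter is applied inside the first derived table instead of through an IN subquery; with the inner join this should select the same id_masivos values, but that equivalence is my assumption about the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Guessed minimal schemas; the real tables surely have more columns.
CREATE TABLE masivos_texto  (id_masivos INT, asunto TEXT, fecha TEXT);
CREATE TABLE envios_masivos (id_envio INT, num_requerimiento INT);
CREATE TABLE acuses_recibo  (id_envio INT, estatus INT);

INSERT INTO masivos_texto  VALUES (84, 'inscripcion', '2014-05-01'),
                                  (85, 'seguimiento', '2015-01-10');
INSERT INTO envios_masivos VALUES (84, 1800), (85, 1801);
-- Invented receipts: 84 has two 0s and one 1, 85 has one 0 and three 1s.
INSERT INTO acuses_recibo  VALUES (84, 0), (84, 0), (84, 1),
                                  (85, 0), (85, 1), (85, 1), (85, 1);
""")

# Each derived table gets its own alias (r1, r2) so the outer query can join them.
rows = conn.execute("""
    SELECT r1.num_requerimiento, r1.asunto, r1.id_masivos, r2.estatus, r2.conteo
    FROM (SELECT c.num_requerimiento, b.asunto, b.id_masivos
          FROM masivos_texto b
          INNER JOIN envios_masivos c ON b.id_masivos = c.id_envio
          WHERE b.fecha >= '2014-01-01' AND b.fecha <= '2015-06-16') AS r1
    INNER JOIN (SELECT a.id_envio, a.estatus, COUNT(*) AS conteo
                FROM acuses_recibo a
                GROUP BY a.id_envio, a.estatus) AS r2
            ON r1.id_masivos = r2.id_envio
    ORDER BY r1.num_requerimiento, r2.estatus
""").fetchall()

for row in rows:
    print(row)
```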
I tried to narrow down the problem as much as possible, it is still quite something. This is the query that doesn't work the way I want it:
SELECT *, MAX(tbl_stopover.dist)
FROM tbl_stopover
INNER JOIN
(SELECT edges1.id id1, edges2.id id2, COUNT(edges1.id) numConn
FROM tbl_edges edges1
INNER JOIN tbl_edges edges2
ON edges1.nodeB = edges2.nodeA
GROUP BY edges1.id HAVING numConn = 1) AS tbl_conn
ON tbl_stopover.id_edge = tbl_conn.id1
GROUP BY id_edge
Here is what I get:
|id | edge | dist | id1 | id2 | numConn | MAX(tbl_stopover.dist) |
------------------------------------------------------------------
|2 | 23 | 2 | 23 | 35 | 1 | 9 |
|4 | 24 | 5 | 24 | 46 | 1 | 9 |
------------------------------------------------------------------
and this is what I would want:
|id | edge | dist | id1 | id2 | numConn | MAX(tbl_stopover.dist) |
------------------------------------------------------------------
|3 | 23 | 9 | 23 | 35 | 1 | 9 |
|5 | 24 | 9 | 24 | 46 | 1 | 9 |
------------------------------------------------------------------
But let me elaborate a bit...
I have a graph, let's say as such:
node1
|
node2
/ \
node3 node4
| |
node5 node6
Therefore I have a table I call tbl_edges like this:
| id | nodeA | nodeB |
------------------------
| 12 | 1 | 2 |
| 23 | 2 | 3 |
| 24 | 2 | 4 |
| 35 | 3 | 5 |
| 46 | 4 | 6 |
------------------------
Now each edge has "stop_overs" at a certain distance (to nodeA). Therefore I have a table tbl_stopover like this:
| id | edge | dist |
------------------------
| 1 | 12 | 5 |
| 2 | 23 | 2 |
| 3 | 23 | 9 |
| 4 | 24 | 5 |
| 5 | 24 | 9 |
| 6 | 35 | 5 |
| 7 | 46 | 5 |
------------------------
Why this query?
Let's assume I want to calculate the distance between the stop_overs. Within one edge that is no problem. Across edges it gets more difficult. But if I have two edges that are connected and there is no other connection, I can also calculate the distance. Here is an example assuming all edges have a length of 10:
edge23 has a stop_over(id=3) at dist=9, edge35 has a stop_over(id=6) at dist=5. Therefore the distance between these two stop_overs is:
dist = (length - dist_id3) + dist_id6 = (10 - 9) + 5 = 6
I am not sure if I made myself clear. If this is not understandable, feel free to ask questions and I will do my best to make it clearer.
MySQL allows you to do something silly - display fields in an aggregate query that are not a part of the GROUP BY or an aggregate function like MAX. When you do this, you get random (as you said) results for the remaining fields.
In your query you are doing this twice - once in your inner query (id2 is not part of a GROUP BY or aggregate) and once in the outer.
Prepare for random results!
To fix it, try something like this:
SELECT tbl_stopover.id,
tbl_stopover.dist,
tbl_conn.id1,
tbl_conn.id2,
tbl_conn.numConn,
MAX(tbl_stopover.dist)
FROM tbl_stopover
INNER JOIN
(SELECT edges1.id id1, edges2.id id2, COUNT(edges1.id) numConn
FROM tbl_edges edges1
INNER JOIN tbl_edges edges2
ON edges1.nodeB = edges2.nodeA
GROUP BY edges1.id, edges2.id
HAVING numConn = 1) AS tbl_conn
ON tbl_stopover.id_edge = tbl_conn.id1
GROUP BY tbl_stopover.id,
tbl_stopover.dist,
tbl_conn.id1,
tbl_conn.id2,
tbl_conn.numConn
The major changes are the explicit field list (note that I removed the id_edge since you are joining on id1 and already have that field), and addition of additional fields to both the inner and outer GROUP BY clauses.
If this gives you more rows than you want then you may need to explain more about your desired result set. Something like this is the only way to ensure you get appropriate groupings.
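Another way to get the whole stop_over row that holds the maximum dist per edge, without relying on non-grouped columns, is to join back to a per-edge MAX(dist) subquery. The sketch below runs that idea through Python's sqlite3 module on the sample data from the question. It is only one possible approach, not necessarily what the asker ended up needing; the stopover column is named id_edge as in the queries (the sample table in the question labels it "edge"), and MIN(e2.id) is used for id2 purely so that every selected column is aggregated or grouped, which is harmless because HAVING keeps only edges with exactly one follower.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_edges    (id INT, nodeA INT, nodeB INT);
CREATE TABLE tbl_stopover (id INT, id_edge INT, dist INT);

INSERT INTO tbl_edges VALUES (12,1,2), (23,2,3), (24,2,4), (35,3,5), (46,4,6);
INSERT INTO tbl_stopover VALUES
    (1,12,5), (2,23,2), (3,23,9), (4,24,5), (5,24,9), (6,35,5), (7,46,5);
""")

rows = conn.execute("""
    SELECT s.id, s.id_edge, s.dist, c.id1, c.id2, c.numConn
    FROM tbl_stopover s
    -- c: edges that continue into exactly one other edge
    INNER JOIN (SELECT e1.id AS id1, MIN(e2.id) AS id2, COUNT(*) AS numConn
                FROM tbl_edges e1
                INNER JOIN tbl_edges e2 ON e1.nodeB = e2.nodeA
                GROUP BY e1.id
                HAVING COUNT(*) = 1) AS c
            ON s.id_edge = c.id1
    -- m: the largest dist on each edge; joining on it picks the matching row
    INNER JOIN (SELECT id_edge, MAX(dist) AS max_dist
                FROM tbl_stopover
                GROUP BY id_edge) AS m
            ON m.id_edge = s.id_edge AND m.max_dist = s.dist
    ORDER BY s.id
""").fetchall()

for row in rows:
    print(row)
```

On this data the query returns stop_overs 3 and 5, which are the rows the question lists as the desired result. If two stop_overs on one edge tie for the maximum dist, both would be returned.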
Okay. This seems to be the answer to my question. I will do some further "investigation" though, because I'm not sure if this is reliable. If anybody has any thoughts on this, please leave a comment.
SELECT tbl.id, tbl.dist, tbl.id1, tbl.id2, MAX(dist) maxDist
FROM
(
SELECT tbl_stopover.id,
tbl_stopover.dist,
tbl_conn.id1,
tbl_conn.id2,
tbl_conn.numConn
FROM tbl_stopover
INNER JOIN
(SELECT edges1.id id1, edges2.id id2, COUNT(edges1.id) numConn
FROM tbl_edges edges1
INNER JOIN tbl_edges edges2
ON edges1.nodeB = edges2.nodeA
GROUP BY edges1.id
HAVING numConn = 1) AS tbl_conn
ON tbl_stopover.id_edge = tbl_conn.id1
GROUP BY tbl_stopover.dist, tbl_conn.id1
ORDER BY dist DESC) AS tbl
GROUP BY tbl.id1, tbl.id2
Thanks to JNK (my colleague at work) without whom I wouldn't have gotten this far.