Hello everyone, I have a quick question. I am running MySQL Workbench, and after joining two tables I get 10,000 rows as the result. Considering that the first dataset has 6,000 rows and the second has 450, I'm clearly doing something wrong, but I can't figure out what it is or why it is happening.
I am selecting some columns from the first dataset and matching them against the second one on the sv3 and sv4 columns.
Can you tell me what I am doing wrong?
The code:
select media.Timestamp, media.Campaign, media.Media, media.sv3, media.sv4
from media
inner join media_1
on media.sv3=media_1.sv3 and media.sv4=media_1.sv4
A JOIN query yielding more rows than its source tables is not necessarily a sign that something is outright wrong, but it can indicate that something is amiss (queries that need to behave that way exist, but are relatively rare).
The source of your issue is likely that you are joining on a value that is non-unique in both tables. As a simple example: if table X has two records with field A = 5, table Y has three records with field A = 5, and they are JOINed on field A, those records will produce six results.
This may mean there is a problem with your source data, or you may just need to query it in a different manner. I notice you are only selecting fields from media and none from media_1; this query may yield the results you are expecting:
SELECT media.Timestamp, media.Campaign, media.Media, media.sv3, media.sv4
FROM media
WHERE (sv3, sv4) IN (SELECT sv3, sv4 FROM media_1)
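To make the multiplication effect concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table contents and the id column are made up for illustration; only the table names and the sv3/sv4 columns come from the question. It assumes an SQLite version with row-value support (3.15 or later).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE media   (id INTEGER, sv3 TEXT, sv4 TEXT);
    CREATE TABLE media_1 (id INTEGER, sv3 TEXT, sv4 TEXT);
    -- two media rows and three media_1 rows share the pair ('a', 'x')
    INSERT INTO media   VALUES (1, 'a', 'x'), (2, 'a', 'x'), (3, 'b', 'y');
    INSERT INTO media_1 VALUES (1, 'a', 'x'), (2, 'a', 'x'), (3, 'a', 'x');
""")

# INNER JOIN: every matching pair of rows produces one result row (2 * 3).
join_rows = conn.execute("""
    SELECT media.id
    FROM media
    INNER JOIN media_1
            ON media.sv3 = media_1.sv3 AND media.sv4 = media_1.sv4
""").fetchall()

# IN (subquery): each media row appears at most once.
in_rows = conn.execute("""
    SELECT media.id
    FROM media
    WHERE (sv3, sv4) IN (SELECT sv3, sv4 FROM media_1)
""").fetchall()

# Diagnostic: which key pairs occur more than once in media_1?
duplicate_keys = conn.execute("""
    SELECT sv3, sv4, COUNT(*)
    FROM media_1
    GROUP BY sv3, sv4
    HAVING COUNT(*) > 1
""").fetchall()

print(len(join_rows), len(in_rows), duplicate_keys)
```

Running the same GROUP BY ... HAVING check against both real tables should show which (sv3, sv4) pairs are responsible for the extra rows.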
I am trying to optimize the query below, which takes 80 minutes to execute. :/
I have a very large table, prodaja, with 21 million rows, and a table actual_stock with 960k rows.
SELECT
p.NazivMat,
sum(p.Kolicina) AS ProdajaKol,
sum(p.Iznos) AS ProdajaIznos,
s.Kolicina AS TrenutnaZaliha,
s.Iznos AS TrenZalIznos
FROM
prodaja p
LEFT JOIN actual_stock s ON s.BrojSklad = p.BrojSklad
AND s.SifraMat = p.SifraMat
WHERE
p.Dobavljac = 1664
AND p.DatumOtprem BETWEEN '2020-12-10'
AND '2020-12-11'
I have set indexes on the fields BrojSklad and SifraMat, but it does not change much at all, since the dates and the range vary, and the query can run forever if a 10-day range is selected.
Is there any other way to get the same result with a different query, or with two of them, such as "prefetching" into a temp table and then running another query against it?
A table with 20 million rows is a pain in the butt. :/
UPDATE: 30. Dec
Thanks for all the responses below. For the sake of simplicity, I had shortened the query; the long version is below. I did add GROUP BY at the end of it, so that's sorted.
EXPLAIN SELECT
cm_prodaja.NazivGrupe,
cm_prodaja.Grupa,
cm_prodaja.DatumOtprem,
cm_prodaja.SifraMat,
cm_prodaja.BarCode,
cm_prodaja.SifArtOdDob,
cm_prodaja.NazivMat,
sum(cm_prodaja.Kolicina) AS Kolicina,
sum(cm_prodaja.Iznos) AS Iznos,
IFNULL (zaliha_artikala_radnje.Kolicina, 0) AS TrenutnaZaliha,
IFNULL (zaliha_artikala_radnje.Iznos, 0) AS TrenZalIznos
FROM
cm_prodaja
LEFT JOIN zaliha_artikala_radnje ON zaliha_artikala_radnje.BrojSklad = cm_prodaja.BrojSklad
AND zaliha_artikala_radnje.SifraMat = cm_prodaja.SifraMat
WHERE
cm_prodaja.Dobavljac = 1664
AND cm_prodaja.DatumOtprem BETWEEN '2020-08-10'
AND '2020-08-11'
GROUP BY cm_prodaja.BrojSklad, cm_prodaja.NazivRadnje, cm_prodaja.SifraMat,
cm_prodaja.BarCode, cm_prodaja.SifArtOdDob, cm_prodaja.NazivMat,
cm_prodaja.Kolicina, cm_prodaja.Iznos, cm_prodaja.Dobavljac,
cm_prodaja.NazivDobavljaca, cm_prodaja.Proizvodjac,
cm_prodaja.NazivProizvodjaca, cm_prodaja.Grupa, cm_prodaja.NazivGrupe
I made it a bit faster by adding the missing index on zaliha_artikala_radnje.BrojSklad and zaliha_artikala_radnje.SifraMat.
Another thing I did was enable partitioning: I set the table to be split by year (months) into 4 sections per year, and that helped a lot.
I've added an image with the EXPLAIN result.
Add these composite indexes, with the columns in the order given:
p: (Dobavljac, DatumOtprem)
s: (SifraMat, BrojSklad, Iznos, Kolicina)
If you need further assistance, please provide SHOW CREATE TABLE and fix the syntax error: ... LEFT The JOIN ...
What is the datatype of DatumOtprem? I am worried about the endpoints of the BETWEEN.
Another problem... The query has SUM(), but no GROUP BY; what is the intent?
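The concern about the BETWEEN endpoints can be sketched as follows, assuming DatumOtprem is a DATETIME (the question does not say). With a DATETIME column, MySQL treats '2020-12-11' as '2020-12-11 00:00:00', so BETWEEN stops at midnight and misses the rest of that day. The sketch uses Python's sqlite3 module with timestamps stored as text and the bounds written out in full, which reproduces that behaviour. The data and the index name are made up.

```python
import sqlite3

# In-memory SQLite stand-in; DatumOtprem is ASSUMED to hold date-and-time values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prodaja (Dobavljac INTEGER, DatumOtprem TEXT, Kolicina INTEGER);
    INSERT INTO prodaja VALUES
        (1664, '2020-12-10 09:30:00', 1),
        (1664, '2020-12-11 00:00:00', 2),
        (1664, '2020-12-11 15:45:00', 4);
    -- composite index in the suggested column order (index name is hypothetical)
    CREATE INDEX idx_dobavljac_datum ON prodaja (Dobavljac, DatumOtprem);
""")

# BETWEEN with the upper bound at midnight: the 15:45 row is left out.
between_total = conn.execute("""
    SELECT SUM(Kolicina) FROM prodaja
    WHERE Dobavljac = 1664
      AND DatumOtprem BETWEEN '2020-12-10 00:00:00' AND '2020-12-11 00:00:00'
""").fetchone()[0]

# Half-open range: everything from the start of the 10th up to,
# but not including, the start of the 12th.
half_open_total = conn.execute("""
    SELECT SUM(Kolicina) FROM prodaja
    WHERE Dobavljac = 1664
      AND DatumOtprem >= '2020-12-10 00:00:00'
      AND DatumOtprem <  '2020-12-12 00:00:00'
""").fetchone()[0]

print(between_total, half_open_total)
```

If DatumOtprem turns out to be a plain DATE, BETWEEN already covers both days in full and the two forms should agree.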
Thanks for the suggestions. I ended up using the PHP way of JOINing data described here: https://www.koolreport.com/docs/processes/join/
My original query took 3-40 minutes to return results for 1-7 days selected in the filter.
Now it takes 2-5 seconds, where the final result has 1k - 8k rows.
I tried different methods, and anything with a JOIN inside the query dropped performance drastically. As said, the KoolReport JOIN function solved my problem. I created two queries; both fetch their sets of data ordered by SifraMat and filtered on the same Dobavljac value.
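The idea of running two separate queries and joining their results in application code can be sketched like this. This is not KoolReport's API; it is a hypothetical Python helper, `left_join_in_app`, with made-up rows, and it assumes the stock query returns at most one row per (BrojSklad, SifraMat) pair.

```python
def left_join_in_app(sales_rows, stock_rows, key_fields=("BrojSklad", "SifraMat")):
    """Left-join two already-fetched result sets on key_fields (hypothetical helper)."""
    # Index the stock rows by join key; assumes one stock row per key.
    stock_by_key = {tuple(row[f] for f in key_fields): row for row in stock_rows}
    joined = []
    for sale in sales_rows:
        # A missing key yields an empty dict, so the stock columns default to 0.
        stock = stock_by_key.get(tuple(sale[f] for f in key_fields), {})
        joined.append({
            **sale,
            "TrenutnaZaliha": stock.get("Kolicina", 0),
            "TrenZalIznos": stock.get("Iznos", 0),
        })
    return joined


# Made-up rows standing in for the results of the two separate queries.
sales = [
    {"BrojSklad": 1, "SifraMat": 100, "ProdajaKol": 5, "ProdajaIznos": 50.0},
    {"BrojSklad": 1, "SifraMat": 200, "ProdajaKol": 2, "ProdajaIznos": 30.0},
]
stock = [{"BrojSklad": 1, "SifraMat": 100, "Kolicina": 12, "Iznos": 120.0}]

result = left_join_in_app(sales, stock)
print(result)
```

Building a dictionary on one side keeps the matching step linear in the number of rows, which is plausibly why joining a few thousand pre-filtered rows this way feels fast.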
I am using MySQL through R. I am working with two tables within the same database and I noticed something strange that I can't explain. To be more specific, when I try to make a connection between the tables using a foreign key the result is not what it should be.
One table is called Genotype_microsatellites, the second table is called Records_morpho. They are connected through the foreign key sample_id.
If I only select records with certain characteristics from the Genotype_microsatellites table using the following command...
Gen_msat <- dbGetQuery(mydb, 'SELECT *
FROM Genotype_microsatellites
WHERE CIDK113a >= 0')
...the query returns 546 observations for 52 variables, exactly what I would expect. Now, I want to do a query that adds a little more info to my results, specifically by including data from the Records_morpho table. I, therefore, use the following code:
Gen_msat <- dbGetQuery(mydb, 'SELECT Genotype_microsatellites.*,
Records_morpho.net_mass_g,
Records_morpho.svl_mm
FROM Genotype_microsatellites
INNER JOIN Records_morpho ON Genotype_microsatellites.sample_id = Records_morpho.sample_id
WHERE CIDK113a >= 0')
The problem is that now the output has 890 observations and 54 variables! Some sample_id values (i.e., the rows or individuals in the data frame) are showing up multiple times, which shouldn't be the case. I have tried to fix this using SELECT DISTINCT, but the problem wouldn't go away.
Any help would be much appreciated.
Sounds like it is working as intended; that is how joins work. With A JOIN B ON A.x = B.y you get every row from A combined with every row from B that has a y matching the A row's x. If there are three rows in B that match one row in A, you will get three result rows for those. The A row's data will be repeated for each B row match.
To go a little further: if neither x nor y is unique, and you have two x with the same value and three y with that value, they will produce six result rows.
As you mentioned, DISTINCT does not make this problem go away, because DISTINCT operates across the whole result row. It will only merge result rows if the values in all selected fields are the same on those result rows. Similarly, if you have a query on a single table that has duplicate rows, DISTINCT will merge those rows despite them being separate rows, as they do not have distinct sets of values.
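A small sketch of why DISTINCT does not help here, using Python's sqlite3 module in place of MySQL. The table and column names come from the question, but the rows are invented, and the assumption is that some sample_id appears more than once in Records_morpho.

```python
import sqlite3

# In-memory SQLite stand-in with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Genotype_microsatellites (sample_id INTEGER, CIDK113a INTEGER);
    CREATE TABLE Records_morpho (sample_id INTEGER, net_mass_g REAL, svl_mm REAL);
    INSERT INTO Genotype_microsatellites VALUES (1, 120), (2, 130);
    -- sample 1 was recorded twice, with different measurements each time
    INSERT INTO Records_morpho VALUES (1, 10.5, 80.0), (1, 11.0, 81.0), (2, 9.0, 75.0);
""")

# DISTINCT cannot merge the two rows for sample 1: their measurements differ.
distinct_rows = conn.execute("""
    SELECT DISTINCT g.sample_id, g.CIDK113a, m.net_mass_g, m.svl_mm
    FROM Genotype_microsatellites AS g
    INNER JOIN Records_morpho AS m ON g.sample_id = m.sample_id
    WHERE g.CIDK113a >= 0
""").fetchall()

# Diagnostic: which sample_id values occur more than once in Records_morpho?
repeated_ids = conn.execute("""
    SELECT sample_id, COUNT(*)
    FROM Records_morpho
    GROUP BY sample_id
    HAVING COUNT(*) > 1
""").fetchall()

print(len(distinct_rows), repeated_ids)
```

The GROUP BY ... HAVING query is a quick way to confirm whether Records_morpho really holds several records per sample before deciding which one to keep.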
I have a view where I combine some normalized tables. Based on a "master" table, I join connected tables (e.g. JOIN child ON master.child_fk = child.pk). This is pretty straightforward. Now, I'd like to extend this query to perform a join on ALL child rows in some special cases, for example if master.child_fk equals -1.
I managed to get a working query by creating a view where I duplicate all rows and set the pk to -1 in the duplicates, but this is incredibly slow (I have quite a lot of data). The same result could be produced by iterating over all the child.pks and performing a separate join for each, but I can't imagine that being faster.
What would be the best way to go about this using MySQL? Please ask questions if something is not clear.
Edit: I can add that my attempt seems to have been slow because of poor index utilization. See the attached EXPLAIN output here: https://i.imgur.com/8zfT0HM.png
Replace your join condition with JOIN child ON CASE WHEN master.child_fk != -1 THEN master.child_fk = child.pk ELSE 1 END
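As a rough check of how that conditional join behaves, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. The table layouts and rows are made up; only the master.child_fk and child.pk names come from the question. It also compares the CASE form with a plain OR condition, which should give the same rows as long as child_fk is never NULL and no child has pk = -1.

```python
import sqlite3

# In-memory SQLite stand-in with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE child  (pk INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE master (id INTEGER PRIMARY KEY, child_fk INTEGER);
    INSERT INTO child  VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- master 10 points at one child; master 20 uses -1, meaning "all children"
    INSERT INTO master VALUES (10, 2), (20, -1);
""")

case_rows = conn.execute("""
    SELECT master.id, child.pk
    FROM master
    JOIN child ON CASE WHEN master.child_fk != -1
                       THEN master.child_fk = child.pk
                       ELSE 1 END
    ORDER BY master.id, child.pk
""").fetchall()

# Alternative spelling of the same condition with OR.
or_rows = conn.execute("""
    SELECT master.id, child.pk
    FROM master
    JOIN child ON master.child_fk = child.pk OR master.child_fk = -1
    ORDER BY master.id, child.pk
""").fetchall()

print(case_rows)
```

One difference to watch for: with a NULL child_fk, the CASE form falls through to ELSE 1 and matches every child, while the OR form matches none.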
Given relations R(a,b) and S(c,d), I execute the following query:
select a,b from R,S;
When S is empty, the result is always empty, even though R(a,b) is non-empty. I am not getting how S is affecting the query even though there should be no interaction with S.
It's because, regardless of the items you're selecting from the query, you're still doing a join between the two tables.
If S is empty, the result of the join is zero rows because that's what the join gives you. That is indeed what you're seeing.
If S had 10,000 rows you would get that many copies of each row in R.
The only way you'll see the correct number of rows from R (assuming no where clause affecting the join), is if S had exactly one row in it.
If you're not using any columns in S for the query, you really shouldn't be listing it as a source table. The correct query would be:
select a, b from R
"I am not getting how S is affecting the query even though there should be no interaction with S"
Because the Cartesian product of A and the empty set is an empty set. Reference: http://en.wikipedia.org/wiki/Empty_set
Also, see the question "Why is the Cartesian product of a set A and empty set an empty set?"
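The effect of the size of S on the row count can be checked with a short sketch using Python's sqlite3 module. The rows are made up, and `count_rows` is a helper introduced only for this example.

```python
import sqlite3

# In-memory SQLite database; R has two rows, S starts out empty.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (a INTEGER, b INTEGER);
    CREATE TABLE S (c INTEGER, d INTEGER);
    INSERT INTO R VALUES (1, 2), (3, 4);
""")


def count_rows(sql):
    """Run a query and return how many rows it produced (helper for this sketch)."""
    return len(conn.execute(sql).fetchall())


# S is empty: 2 rows times 0 rows gives no result rows at all.
with_empty_s = count_rows("SELECT a, b FROM R, S")

# S has one row: each row of R appears exactly once.
conn.execute("INSERT INTO S VALUES (9, 9)")
with_one_row_s = count_rows("SELECT a, b FROM R, S")

# S has two rows: each row of R appears twice.
conn.execute("INSERT INTO S VALUES (8, 8)")
with_two_rows_s = count_rows("SELECT a, b FROM R, S")

# Leaving S out of the FROM clause makes the result independent of S.
only_r = count_rows("SELECT a, b FROM R")

print(with_empty_s, with_one_row_s, with_two_rows_s, only_r)
```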
I came across the following SQL statement and I was wondering if it was valid:
SELECT COUNT(*)
FROM
registration_waitinglist,
registration_registrationprofile
WHERE
registration_registrationprofile.activation_key = "ALREADY_ACTIVATED"
What do the two tables separated by a comma mean?
When you SELECT data from multiple tables, you obtain the Cartesian product of all the tuples from these tables.
This means you get each row from the first table paired with all the rows from the second table. Most of the time, it is not what you want. If you really want it, then it's clearer to use the CROSS JOIN notation:
SELECT * FROM A CROSS JOIN B;
In this context, it means that you are going to be joining every row from registration_waitinglist to every row in registration_registrationprofile
It's called a Cartesian join.
That query is 'syntactically' correct, meaning it will run. What the query will return is the entire product of every row in registration_waitinglist x registration_registrationprofile.
For example, if there were 2 rows in waitinglist and 3 rows in profile, then 6 rows will be returned.
As a practical matter, this is almost always a logical error and not intended. With rare exceptions, there should be either join criteria or criteria in the WHERE clause that relate the two tables.
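A minimal sketch of what that COUNT(*) ends up counting, using Python's sqlite3 module in place of MySQL. The table names follow the question, the columns and rows are made up, and the string literal uses single quotes because SQLite normally reads double quotes as identifiers.

```python
import sqlite3

# In-memory SQLite stand-in: 2 waiting-list rows and 3 profiles (2 already activated).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registration_waitinglist (id INTEGER);
    CREATE TABLE registration_registrationprofile (id INTEGER, activation_key TEXT);
    INSERT INTO registration_waitinglist VALUES (1), (2);
    INSERT INTO registration_registrationprofile VALUES
        (1, 'ALREADY_ACTIVATED'), (2, 'ALREADY_ACTIVATED'), (3, 'abc123');
""")

# Comma-separated tables: an implicit cross join, filtered by the WHERE clause.
comma_count = conn.execute("""
    SELECT COUNT(*)
    FROM registration_waitinglist, registration_registrationprofile
    WHERE registration_registrationprofile.activation_key = 'ALREADY_ACTIVATED'
""").fetchone()[0]

# The same query spelled with explicit CROSS JOIN.
cross_count = conn.execute("""
    SELECT COUNT(*)
    FROM registration_waitinglist
    CROSS JOIN registration_registrationprofile
    WHERE registration_registrationprofile.activation_key = 'ALREADY_ACTIVATED'
""").fetchone()[0]

# Counting activated profiles only, without the unrelated table.
profile_count = conn.execute("""
    SELECT COUNT(*)
    FROM registration_registrationprofile
    WHERE activation_key = 'ALREADY_ACTIVATED'
""").fetchone()[0]

print(comma_count, cross_count, profile_count)
```

The first two counts are the number of activated profiles multiplied by the number of waiting-list rows, which is rarely the figure someone reading the query would expect.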