I have a possibly silly question, but I'm looking for a basic approach or strategy for the following problem.
I have 1 master file and 3 source files; let's call them master, src1, src2, and src3. The master file is SUPPOSED to have the same records as the 3 source files combined; however, the master file has more records than the sum of all 3 sources. My goal is to validate that all records in src1-3 are inside the master file AND also to extract the records from the master that aren't in any of the 3 sources. Additionally, each of the 4 files has different (but similar) headers.
I have been able to find the distinct records from src1 (and the subsequent sources) and map them to the matching records in the master file by using the following:
WITH tmp1 AS (
SELECT s1.*
FROM src1 AS s1
LEFT JOIN master AS mstr
ON (
s1.name = mstr.fname
AND s1.quant = mstr.qty
AND s1.item = mstr.obj
AND s1.price = mstr.prc
AND s1.age = mstr.time_since_dob
)
) SELECT DISTINCT primaryKey FROM tmp1;
Using this, I can get a count of distinct matches between the two files that are present in src1, and if that matches the count from select distinct PK from src1 then I'm in decent shape. I know that with the criteria above I could easily get many collisions, since several records could have the same name, quantity, item, price, etc. But suffice it to say, using the above criteria I can get unique matches, since there are no matching IDs between the two tables or anything like that. Additionally, the join criteria for each source are slightly different, so I had to do the above 3 separate times and validate each source independently.
Having done the above along with some other analysis, I have been able to validate that each distinct record from src1-3 has at least 1 distinct match in the master file. I'm having an issue, however, with the second half of this challenge, where I have to select the records from the master file that did NOT have a corresponding match.
How can I select those records from the master file that were not matched? Can I do a simple
select * from master not in newView1, where newView1 is the combination of the 3 selects for the 3 sources? Again, I'm using different columns for each join condition, so putting 3 sources under the same header might be difficult (but worth pursuing?). Another thing worth mentioning is that each source file is ~1 GB and the master file is ~3 GB, so time complexity is worth considering.
Thanks for all and any help.
First, use UNION ALL to collect, for each of src1-3, the rows that match the master table together with the rows that exist only in that source.
Next, join the master table against tmp1 to also get the rows that exist only in the master table.
Refer to the following query:
with tmp1(tbl,name,quant,item,price,age,fname,qty,obj,prc,time_since_dob) as (
select 'src1',s1.*,m.* from src1 s1 left join master1 m on
s1.name=m.fname and
s1.quant=m.qty and
s1.item=m.obj and
s1.price=m.prc and
s1.age=m.time_since_dob
union all
select 'src2',s2.*,m.* from src2 s2 left join master1 m on
s2.name=m.fname and
s2.quant=m.qty and
s2.item=m.obj and
s2.price=m.prc and
s2.age=m.time_since_dob
union all
select 'src3',s3.*,m.* from src3 s3 left join master1 m on
s3.name=m.fname and
s3.quant=m.qty and
s3.item=m.obj and
s3.price=m.prc and
s3.age=m.time_since_dob
)
select 'master',m.fname,m.qty,m.obj,m.prc,m.time_since_dob from master1 m left join tmp1 t on
m.fname=t.name and
m.qty=t.quant and
m.obj=t.item and
m.prc=t.price and
m.time_since_dob=t.age
where t.name is null
union all
select t.tbl,t.name,t.quant,t.item,t.price,t.age from tmp1 t
where t.fname is null
db fiddle
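For the second half of the question (master rows with no match in any source), an anti-join with NOT EXISTS is another option. The following is only a minimal sketch run through Python's sqlite3 module so it can be executed as-is; the sample rows are made up, and the table is called master here as in the question. Each NOT EXISTS subquery can carry its own join condition, which may help when the three sources are matched on different columns.

```python
import sqlite3

# In-memory database with made-up sample rows (assumption: column names as in the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master (fname TEXT, qty INTEGER, obj TEXT, prc REAL, time_since_dob INTEGER);
    CREATE TABLE src1 (name TEXT, quant INTEGER, item TEXT, price REAL, age INTEGER);
    CREATE TABLE src2 (name TEXT, quant INTEGER, item TEXT, price REAL, age INTEGER);
    CREATE TABLE src3 (name TEXT, quant INTEGER, item TEXT, price REAL, age INTEGER);
    INSERT INTO master VALUES ('ann', 1, 'pen', 2.5, 30), ('bob', 2, 'cup', 4.0, 41),
                              ('cat', 3, 'hat', 9.0, 25), ('dan', 4, 'bag', 7.5, 52);
    INSERT INTO src1 VALUES ('ann', 1, 'pen', 2.5, 30);
    INSERT INTO src2 VALUES ('bob', 2, 'cup', 4.0, 41);
    INSERT INTO src3 VALUES ('cat', 3, 'hat', 9.0, 25);
""")

# Keep only the master rows for which no source has a matching row.
unmatched = conn.execute("""
    SELECT m.*
    FROM master AS m
    WHERE NOT EXISTS (SELECT 1 FROM src1 AS s
                      WHERE s.name = m.fname AND s.quant = m.qty AND s.item = m.obj
                        AND s.price = m.prc AND s.age = m.time_since_dob)
      AND NOT EXISTS (SELECT 1 FROM src2 AS s
                      WHERE s.name = m.fname AND s.quant = m.qty AND s.item = m.obj
                        AND s.price = m.prc AND s.age = m.time_since_dob)
      AND NOT EXISTS (SELECT 1 FROM src3 AS s
                      WHERE s.name = m.fname AND s.quant = m.qty AND s.item = m.obj
                        AND s.price = m.prc AND s.age = m.time_since_dob)
""").fetchall()

print(unmatched)
```

With files of this size, an index on the join columns of each source table may help, since every master row is probed against each source.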
I have 2 tables. One (domains) has domain ids, and domain names (dom_id, dom_url).
the other contains actual data, 2 of which columns require a TO and FROM domain names. So I have 2 columns rev_dom_from and rev_dom_for, both of which store the domain name id, from the domains table.
Simple.
Now I need to actually display both domain names on the webpage. I know how to display one or the other, via the LEFT JOIN domains ON reviews.rev_dom_for = domains.dom_url query, and then you echo out the dom_url, which would echo out the domain name in the rev_dom_for column.
But how would I make it echo out the 2nd domain name, the one in the rev_dom_from column?
You'd use another join, something along these lines:
SELECT toD.dom_url AS ToURL,
fromD.dom_url AS FromUrl,
rvw.*
FROM reviews AS rvw
LEFT JOIN domain AS toD
ON toD.Dom_ID = rvw.rev_dom_for
LEFT JOIN domain AS fromD
ON fromD.Dom_ID = rvw.rev_dom_from
EDIT:
All you're doing is joining in the same table multiple times. Look at the query in the post: it selects the values from the Reviews table (aliased as rvw), and that table gives you 2 references to the Domain table (a FOR and a FROM).
At this point it's a simple matter to left join the Domain table to the Reviews table. Once (aliased as toD) for the FOR, and a second time (aliased as fromD) for the FROM.
Then in the SELECT list, you will select the DOM_URL fields from both LEFT JOINS of the DOMAIN table, referencing them by the table alias for each joined in reference to the Domains table, and alias them as the ToURL and FromUrl.
For more info about aliasing in SQL, read here.
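To see the double join in action, here is a small sketch run through Python's sqlite3 module. The table and column names follow the question (domains, reviews), while the sample rows and the rev_id column are assumptions made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE domains (dom_id INTEGER PRIMARY KEY, dom_url TEXT);
    CREATE TABLE reviews (rev_id INTEGER PRIMARY KEY, rev_dom_from INTEGER, rev_dom_for INTEGER);
    INSERT INTO domains VALUES (1, 'a.example'), (2, 'b.example');
    -- Review 11 has no "for" domain, to show what LEFT JOIN does with a missing reference.
    INSERT INTO reviews VALUES (10, 1, 2), (11, 2, NULL);
""")

# The domains table is joined twice, once per foreign-key column, under two aliases.
rows = conn.execute("""
    SELECT rvw.rev_id,
           fromD.dom_url AS FromUrl,
           toD.dom_url   AS ToUrl
    FROM reviews AS rvw
    LEFT JOIN domains AS toD   ON toD.dom_id   = rvw.rev_dom_for
    LEFT JOIN domains AS fromD ON fromD.dom_id = rvw.rev_dom_from
    ORDER BY rvw.rev_id
""").fetchall()

for rev_id, from_url, to_url in rows:
    print(rev_id, from_url, to_url)
```

Because both joins are LEFT JOINs, review 11 is still returned, with NULL (None in Python) for the missing domain.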
Given the following tables..
Domain Table
dom_id | dom_url
Review Table
rev_id | rev_dom_from | rev_dom_for
Try this sql... (It's pretty much the same thing that Stephen Wrighton wrote above)
The trick is that you are basically selecting from the domain table twice in the same query and joining the results.
Select d1.dom_url as from_url, d2.dom_url as for_url from
review r, domain d1, domain d2
where d1.dom_id = r.rev_dom_from
and d2.dom_id = r.rev_dom_for
If you are still stuck, please be more specific about exactly what it is that you don't understand.
Read this and try, this will help you:
Table1
column11,column12,column13,column14
Table2
column21,column22,column23,column24
SELECT table1.column11,table1.column12,table2asnew1.column21,table2asnew2.column21
FROM table1 INNER JOIN table2 AS table2asnew1 ON table1.column11=table2asnew1.column21 INNER JOIN table2 AS table2asnew2 ON table1.column12=table2asnew2.column22
table2asnew1 is an instance of table 2 which is matched by table1.column11=table2asnew1.column21
and
table2asnew2 is another instance of table 2 which is matched by table1.column12=table2asnew2.column22
I am running a MySQL Server on Ubuntu, patched up to date...
In MySQL, I have 2 tables in a database. I am trying to get a stock query change working and it kind of is, but it's not :(
What I have is a table (table A) that holds the last time I have checked stock levels, and another table (table B) that holds current stock levels. Each table has identical column names and types.
What I want to do is report on the changes from table B. The reason is that there are about 1/2 million items in this table - and I cannot just update each item using the table as a source as I am limited to 100 changes at a time. So, ideally, I want to get the changes - store them in a temporary table, and use that table to update our system with just those changes...
The following below brings back the changes but shows both Table A and Table B.
I have tried using a Left Join to only report back on Table B but I'm not a mysql (or any SQL) guy, and googling all this... Can anyone help please. TIA. Stuart
SELECT StockItemName,StockLevel
FROM (
SELECT StockItemName,StockLevel FROM stock
UNION ALL
SELECT StockItemName,StockLevel FROM stock_copy
) tbl
GROUP BY StockItemName,StockLevel
HAVING count(*) = 1
ORDER BY StockItemName;
The query below returns the records whose stock level differs between the two tables. It uses an inner join, so rows without a differing counterpart are left out.
SELECT s.StockItemName, s.StockLevel, sc.StockLevel
FROM stock s
INNER JOIN stock_copy sc ON sc.Id = s.Id AND sc.StockLevel <> s.StockLevel
ORDER BY s.StockItemName
OK, I solved it. As there wasn't a unique ID on each table that could be matched, and rather than make one, I used 3 columns to create the unique ID and left joined on that.
SELECT sc.StockItem, sc.StockItemName, sc.Warehouse, sc.stocklevel
FROM stock s
LEFT JOIN stock_copy sc ON (sc.StockItem = s.StockItem AND sc.StockItemName = s.StockItemName AND sc.Warehouse = s.Warehouse AND sc.StockLevel <> s.StockLevel)
having sc.StockLevel is not Null;
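A variation on the same composite-key idea, sketched with Python's sqlite3 module so it can be run directly. It assumes that stock_copy holds the current levels and stock holds the levels from the last check (the question does not say which is which), and the sample rows are made up. Driving the join from the current table also picks up items that are new since the last check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumption: stock = levels at the last check, stock_copy = current levels.
    CREATE TABLE stock      (StockItem INTEGER, StockItemName TEXT, Warehouse TEXT, StockLevel INTEGER);
    CREATE TABLE stock_copy (StockItem INTEGER, StockItemName TEXT, Warehouse TEXT, StockLevel INTEGER);
    INSERT INTO stock      VALUES (1, 'bolt', 'W1', 10), (2, 'nut', 'W1', 5);
    INSERT INTO stock_copy VALUES (1, 'bolt', 'W1', 10), (2, 'nut', 'W1', 3), (3, 'washer', 'W2', 7);
""")

# Current rows that either have no previous row (new item) or a different level.
changes = conn.execute("""
    SELECT cur.StockItem, cur.StockItemName, cur.Warehouse, cur.StockLevel
    FROM stock_copy AS cur
    LEFT JOIN stock AS prev
           ON prev.StockItem     = cur.StockItem
          AND prev.StockItemName = cur.StockItemName
          AND prev.Warehouse     = cur.Warehouse
    WHERE prev.StockItem IS NULL
       OR prev.StockLevel <> cur.StockLevel
    ORDER BY cur.StockItem
""").fetchall()

print(changes)
```

The result could then be written to a temporary table and processed in batches of 100, as described in the question.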
I am trying to build a table from a join between 3 tables in MS SQL Server 2008. There is a fact table, a date table, and a course table, and I need to join them to make a base table. The date table has a column named AcademicYearLookup, with values like 2000/1 and 2001/2. In the base table this column should be split into three columns: CensusYear, StartYear, and ApplicationYear. Therefore, I need to join the date table multiple times. I wrote an inner join query, which already has four inner join statements, but I am getting some extra years and losing some others. I believe my query must be wrong somewhere.
The attached file shows the design view created in MS Access; it should help to see the tables and understand what I need to create.
[Design View in Ms Access][1]
SELECT
A.[EventCount],
B.[AcademicYearLookup] AS [CensusYear],
C.[AcademicYearLookup] AS [StartYear],
D.[AcademicYearLookup] AS [ApplicationYear],
B.[CurrentWeekComparisonFlag],
B.[AcademicWeekOfYear],
case
when A.[ApplicationCensusSK] = 1 then 'Same Year'
when A.[ApplicationCensusSK] = 2 then 'Next Year'
when A.[ApplicationCensusSK] = 5 then 'Last Year'
ELSE 'Other'
END as [CensusYearDescription],
B.[CurrentAcademicYear],
A.[StudentCodeBK],
A.[ApplicationSequenceNoBK],
A.[CourseSK],
A.[CourseGroupSK],
A.[CourseMoaSK],
A.[CboSK],
A.[CourseTaughtAbroadSK],
A.[ApplicationStatusSK],
A.[ApplicationFeeStatusSK],
A.[DecisionResponseSK],
A.[NationalityCountrySK],
A.[DomicileCountrySK],
A.[TargetRegionSK],
A.[InternationalSponsorSK] INTO dbo.[BaseTable3yrs]
FROM Student.FactApplicationSnapshot A
INNER JOIN Conformed.DimDate AS B ON A.[CensusDateSK] = B.[DateSK]
INNER JOIN Conformed.DimDate AS C ON A.[AcademicYearStartDateSK] = C.[DateSK]
INNER JOIN Conformed.DimDate AS D ON A.[ApplicationDateSK] = D.[DateSK]
INNER JOIN Student.DimCourse ON A.CourseSK = Student.DimCourse.CourseSK
WHERE (((B.CurrentAcademicYear) In (0,-1))
AND ((A.ApplicationCensusSK) In (1,2,5))
AND ((Student.DimCourse.DepartmentShortName)= 'TEACH ED'));
/* Query to check whether the result is correct. I check it by academic week of year, and I found that I am losing some data and getting some extra data, which suggests the join may be wrong. */
select * from [BaseTable3yrs]
where [StudentCodeBK]= '26002423'
AND [ApplicationSequenceNoBK] = '0101'
order by [AcademicWeekOfYear]
When joining the same table several times like this, it's easy to get duplicate records. You could try gathering the Conformed data separately into a table variable and then joining to it. This would also make your query more readable.
You might also try a SELECT DISTINCT on your main query.
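Before reaching for DISTINCT, it may be worth checking where the extra and missing rows come from. The sketch below uses Python's sqlite3 module with hypothetical, cut-down stand-ins for the fact and date tables (fact, dim_date, and the sample rows are all made up). It shows two generic effects of inner joins: a key that is duplicated in the date table multiplies rows, and a fact row whose key has no match disappears. Whether either applies to the real data is an assumption to verify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins: DateSK 2 is duplicated, fact 101 has no application date.
    CREATE TABLE fact (id INTEGER, CensusDateSK INTEGER, ApplicationDateSK INTEGER);
    CREATE TABLE dim_date (DateSK INTEGER, AcademicYearLookup TEXT);
    INSERT INTO dim_date VALUES (1, '2000/1'), (2, '2001/2'), (2, '2001/2');
    INSERT INTO fact VALUES (100, 1, 2), (101, 1, NULL), (102, 2, 1);
""")

# The multi-join itself: each fact id should appear once, but does not.
joined_ids = [row[0] for row in conn.execute("""
    SELECT f.id
    FROM fact AS f
    INNER JOIN dim_date AS b ON f.CensusDateSK = b.DateSK
    INNER JOIN dim_date AS d ON f.ApplicationDateSK = d.DateSK
    ORDER BY f.id
""")]

# Diagnostic 1: keys that occur more than once in the date table (source of extra rows).
duplicate_keys = conn.execute("""
    SELECT DateSK, COUNT(*) FROM dim_date GROUP BY DateSK HAVING COUNT(*) > 1
""").fetchall()

# Diagnostic 2: fact rows with no matching application date (dropped by INNER JOIN).
dropped = conn.execute("""
    SELECT f.id
    FROM fact AS f
    LEFT JOIN dim_date AS d ON f.ApplicationDateSK = d.DateSK
    WHERE d.DateSK IS NULL
""").fetchall()

print(joined_ids, duplicate_keys, dropped)
```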
I'm searching for a solution to a problem within MySQL which doesn't sound too complicated, I thought.
Basically I want to use two tables.
The first does contain an electronic component list like
ID Description Value AdditonalInfo
1 Resistor 1.0R R0402
2 Capacitor 100nF C0805
3 Capacitor 10nF C0603
...
I want to store information about the sourcing within a second table.
ID Component Manufacturer Partnumber Timestamp
1 2 TDK XXXYYYZZZ 5
2 2 Kemet AAABBBCCC 10
3 1 Multicomp 111222333 3
...
As you can see, it should be possible to add more than one manufacturer for each component.
Now, I want to generate a single table (a view) which should contain
all component information AND if present, the latest entry of the manufacturer.
For the given example that would be
ID Description Value AdditonalInfo Manufacturer Partnumber
1 Resistor 1.0R R0402 Multicomp 111222333
2 Capacitor 100nF C0805 Kemet AAABBBCCC
3 Capacitor 10nF C0603 (NULL)
Would this be possible within a single query? Or at least with some kind of query which
generates the final table? I could not find out, if the JOIN command would do that.
I would appreciate any help or hints to find a solution for this.
Thanks!
The following query should give you what you are after.
It takes all the components, and then for each component shows the matching entries in the sub-query against the sourcing table which groups the components by the latest entry.
The sub-query is joined based on the component and max(timestamp) to another copy of the sourcing table to get the remaining information required.
SELECT a.ID, a.Description, a.Value, a.AdditonalInfo,
c.Manufacturer, c.Partnumber
FROM componentTable a
LEFT JOIN ( SELECT component, max(timestamp) AS maxTime
FROM sourcingTable
GROUP BY component
) b
ON a.id = b.component
LEFT JOIN sourcingTable c
ON b.component = c.component
AND b.maxTime = c.timestamp
Both joins are LEFT JOINs so that components without any sourcing entry still appear, with NULL in the manufacturer columns. Give this a try and let me know if it doesn't work.
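To check the approach against the sample data from the question, here is a sketch run through Python's sqlite3 module. The table names componentTable and sourcingTable come from the answer, and the column types are assumptions. Note that both joins here are LEFT JOINs, so that component 3, which has no sourcing rows, stays in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE componentTable (ID INTEGER, Description TEXT, Value TEXT, AdditonalInfo TEXT);
    CREATE TABLE sourcingTable (ID INTEGER, Component INTEGER, Manufacturer TEXT,
                                Partnumber TEXT, Timestamp INTEGER);
    INSERT INTO componentTable VALUES (1, 'Resistor', '1.0R', 'R0402'),
                                      (2, 'Capacitor', '100nF', 'C0805'),
                                      (3, 'Capacitor', '10nF', 'C0603');
    INSERT INTO sourcingTable VALUES (1, 2, 'TDK', 'XXXYYYZZZ', 5),
                                     (2, 2, 'Kemet', 'AAABBBCCC', 10),
                                     (3, 1, 'Multicomp', '111222333', 3);
""")

# b finds the latest timestamp per component; c fetches the full row for that timestamp.
rows = conn.execute("""
    SELECT a.ID, a.Description, a.Value, a.AdditonalInfo, c.Manufacturer, c.Partnumber
    FROM componentTable AS a
    LEFT JOIN (SELECT Component, MAX(Timestamp) AS maxTime
               FROM sourcingTable
               GROUP BY Component) AS b
           ON a.ID = b.Component
    LEFT JOIN sourcingTable AS c
           ON c.Component = b.Component
          AND c.Timestamp = b.maxTime
    ORDER BY a.ID
""").fetchall()

for row in rows:
    print(row)
```

If two sourcing rows for the same component share the latest timestamp, this returns both of them, so a tie-breaker would be needed in that case.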
Have you tried this:
SELECT * FROM table1 LEFT JOIN table2 on table1.id = table2.id
You can get the data using a query with a JOIN, if you have an index on the Component column:
SELECT
*
FROM components
INNER JOIN store USING(id)
This query will be enough, and you don't need to store redundant data.
I have three tables, the first one is called "File" :
JobId FilenameId FileId
5 2 1
7 3 2
And the second one is called "Filename"
Filename FilenameId
File1 2
File2 3
And the third one is called "Client" :
ClientId JobId
1 5
2 7
Now I want to get the ClientId of File1, how can I do it? I'm new to SQL.
Thanks.
Edit : this is what I tried but it's not working
Select c.ClientId
From `File` f, Filename fn, Client c
Where f.FilenameId = fn.FilenameId and f.JobId = c.JobId and fn.Filename = "File1";
First, I hate the negative banter that sometimes goes on, but yes, you need to get yourself more educated in SQL during your learning. Look here at real-life scenarios and how people offer different solutions to the same.
Now to YOUR question. First, get rid of the old-style SQL where you put all the join criteria in your WHERE clause, and get to know the proper relationships between the tables. Second, your WHERE clause should hold your specific criteria, such as wanting File1. From that, get to the other tables. My personal standard for SQL coding is to start with the criteria I want and the table they come from, and to ensure indexes are available for optimizing the query. THEN join to the other tables to get the other elements needed to complete the row of data. (Make good use of table "aliases", and stick with them.)
First, your main criteria. Simple enough.
select
fn.FileNameID,
fn.FileName
from
FileName fn
where
fn.FileName = 'File1'
From there, do your joins to get the next pieces of information from file to client relationships
select
fn.FileNameID,
fn.FileName,
c.clientID
from
FileName fn
JOIN File f
on fn.FileNameID = f.FileNameID
JOIN Client c
on f.JobID = c.JobID
where
fn.FileName = 'File1'
Notice the hierarchical indentation from file name to the file, then from file to the client... you can visually see how the tables are related. Then, just grab your other columns as you need and add to your field list with proper aliases.
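Here is the same join run end to end against the three sample tables from the question, as a sketch using Python's sqlite3 module. The column types are assumptions, and the file name is passed as a query parameter rather than written into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE File (JobId INTEGER, FilenameId INTEGER, FileId INTEGER);
    CREATE TABLE Filename (Filename TEXT, FilenameId INTEGER);
    CREATE TABLE Client (ClientId INTEGER, JobId INTEGER);
    INSERT INTO File VALUES (5, 2, 1), (7, 3, 2);
    INSERT INTO Filename VALUES ('File1', 2), ('File2', 3);
    INSERT INTO Client VALUES (1, 5), (2, 7);
""")

# Start from the file name, walk to File via FilenameId, then to Client via JobId.
client_ids = conn.execute("""
    SELECT c.ClientId
    FROM Filename AS fn
    JOIN File AS f
      ON fn.FilenameId = f.FilenameId
    JOIN Client AS c
      ON f.JobId = c.JobId
    WHERE fn.Filename = ?
""", ("File1",)).fetchall()

print(client_ids)
```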
Try this:
select ClientId from Client where JobId in (select JobId from File where FilenameId in (select FilenameId from Filename where Filename="File1"));