How to query a graph stored in a database with SQL to find paths matching given criteria for samples - MySQL

If I have a database with two tables:
+---------+----+----+----+----+----+-------+
| samples | r1 | r2 | r3 | r4 | r5 | class |
+---------+----+----+----+----+----+-------+
| s1 | 1 | 0 | 1 | 0 | 1 | a |
| s2 | 0 | 1 | 1 | 1 | 0 | b |
| s3 | 1 | 0 | 1 | 0 | 1 | a |
| s4 | 0 | 1 | 0 | 1 | 0 | a |
+---------+----+----+----+----+----+-------+
+-------+------+----+
| edges | from | to |
+-------+------+----+
| e1 | r1 | r2 |
| e2 | r2 | r3 |
| e3 | r3 | r4 |
| e4 | r4 | r5 |
+-------+------+----+
How do I write an SQL query to find the samples that have a path of r's with the value 1 of at least length two, and other similar queries (e.g. a path of value 1 of exactly length 2, or a path matching the pattern 1,0,1)?
The graph is more complicated in reality and has loops, so it needs to account for this.
Query: Which samples have a chain of at least three r's that are 1?
Result: s2
Query: Which samples have a chain of 1,0,1?
Result: s1, s3, s4
In Prolog I could write something like:
sample_with_chain_at_least_3(S) :-
    edge(A,B), edge(B,C), r(S,A,1), r(S,B,1), r(S,C,1).
where facts such as r(s1,r1,1) would be stored for each sample and node.
But it has been a while since I have touched SQL, and I can't think how to write this simple query in SQL.
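A minimal sketch of how the Prolog rule above might translate into SQL, assuming the wide samples table is first reshaped into a long table r(sample, node, val) that mirrors the r(S, A, 1) facts. The names r, src and dst are assumptions (from and to are reserved words in SQL), and Python's built-in sqlite3 is used only to keep the example runnable; the query itself is plain joins that should carry over to MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE r (sample TEXT, node TEXT, val INTEGER);   -- long form of the samples table
CREATE TABLE edges (id TEXT, src TEXT, dst TEXT);       -- src/dst stand in for from/to
INSERT INTO edges VALUES
  ('e1','r1','r2'), ('e2','r2','r3'), ('e3','r3','r4'), ('e4','r4','r5');
""")

# The four sample rows from the question, one value per node r1..r5.
wide = {
    "s1": [1, 0, 1, 0, 1],
    "s2": [0, 1, 1, 1, 0],
    "s3": [1, 0, 1, 0, 1],
    "s4": [0, 1, 0, 1, 0],
}
for sample, vals in wide.items():
    conn.executemany(
        "INSERT INTO r VALUES (?, ?, ?)",
        [(sample, f"r{i}", v) for i, v in enumerate(vals, 1)],
    )

# Two consecutive edges A->B->C, with the sample's values on A, B, C
# matching a three-element pattern (same shape as the Prolog rule).
CHAIN_SQL = """
SELECT DISTINCT a.sample
FROM edges e1
JOIN edges e2 ON e2.src = e1.dst
JOIN r a ON a.node = e1.src
JOIN r b ON b.node = e1.dst AND b.sample = a.sample
JOIN r c ON c.node = e2.dst AND c.sample = a.sample
WHERE a.val = ? AND b.val = ? AND c.val = ?
ORDER BY a.sample
"""

def samples_with_pattern(pattern):
    """Return the samples that contain the 3-node value pattern along some path."""
    return [row[0] for row in conn.execute(CHAIN_SQL, pattern)]

ones = samples_with_pattern((1, 1, 1))          # chain of at least three 1s
one_zero_one = samples_with_pattern((1, 0, 1))  # chain matching 1,0,1
print(ones, one_zero_one)
```

Because the number of joins is fixed, loops in the graph cannot cause endless traversal here. Patterns of unbounded length would need a different approach, such as a recursive query, which this sketch does not cover.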

Related

I understand that the PIVOT command can transform a dataset. Is this the correct way to do it?

I have a dataset that looks like this:
+----+-------------+
| ID | StoreVisit |
+----+-------------+
| 1 | Home Depot |
| 2 | Lowes |
| 3 | Home Depot |
| 2 | ACE |
| 2 | Lowes |
| 1 | Home Depot |
| 4 | ACE |
| 5 | ACE |
| 4 | Lowes |
+----+-------------+
I'm new(ish) to SQL and I know I can select all and then either use Excel (pivot table / functions / paste special) or R (tidyr) to transpose.. however, if I have a lot of data, this is not efficient. Is the query below correct? If so, how can I define all values of StoreVisit if there are thousands of types of stores without typing each one in the query?
select * from Stores
pivot (COUNT(StoreVisit) for StoreVisit in ([ACE],[Lowes],[Home Depot])) as StoreCounts
+----+-------+-----------+-----+
| ID | Lowes | HomeDepot | ACE |
+----+-------+-----------+-----+
| 1 | 0 | 2 | 0 |
| 2 | 2 | 0 | 1 |
| 3 | 0 | 1 | 0 |
| 4 | 1 | 0 | 1 |
| 5 | 0 | 0 | 1 |
+----+-------+-----------+-----+
Please excuse the formatting of this post! Many apologies.
Use conditional aggregation:
select id,
       sum(storevisit = 'Lowes') as lowes,
       sum(storevisit = 'Home Depot') as HomeDepot,
       sum(storevisit = 'ACE') as ace
from Stores
group by id;
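For the second part of the question (thousands of store names that nobody wants to type), one option is to generate the conditional-aggregation columns from the distinct values in the table. The sketch below does this with Python's built-in sqlite3 purely so that it runs anywhere; the approach, not the driver, is the point. In MySQL a similar statement could be assembled and run as a prepared statement, which is an assumption not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Stores (ID INTEGER, StoreVisit TEXT)")
conn.executemany("INSERT INTO Stores VALUES (?, ?)", [
    (1, "Home Depot"), (2, "Lowes"), (3, "Home Depot"),
    (2, "ACE"), (2, "Lowes"), (1, "Home Depot"),
    (4, "ACE"), (5, "ACE"), (4, "Lowes"),
])

# Step 1: discover the pivot columns instead of typing them.
stores = [row[0] for row in conn.execute(
    "SELECT DISTINCT StoreVisit FROM Stores ORDER BY StoreVisit")]

# Step 2: one SUM(CASE ...) per store. The store name is bound as a
# parameter; only the quoted column alias is interpolated into the SQL text.
columns = ", ".join(
    'SUM(CASE WHEN StoreVisit = ? THEN 1 ELSE 0 END) AS "{}"'.format(
        name.replace('"', '""'))
    for name in stores
)
sql = f"SELECT ID, {columns} FROM Stores GROUP BY ID ORDER BY ID"

pivot_rows = conn.execute(sql, stores).fetchall()
for row in pivot_rows:
    print(row)  # (ID, ACE, Home Depot, Lowes)
```

The column order follows the sorted store names, so each row reads as ID, ACE, Home Depot, Lowes.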

How to determine what's changed between database records

Presume first, that the following table exists in a MySQL Database
|----|-----|-----|----|----|-----------|--------------|----|
| id | rid | ver | n1 | n2 | s1 | s2 | b1 |
|----|-----|-----|----|----|-----------|--------------|----|
| 1 | 1 | 1 | 0 | 1 | Hello | World | 0 |
| 2 | 1 | 2 | 1 | 1 | Hello | World | 0 |
| 3 | 1 | 3 | 0 | 0 | Goodbye | Cruel World | 0 |
| 4 | 2 | 1 | 0 | 0 | Hello | Doctor | 1 |
| 5 | 2 | 2 | 0 | 0 | Hello | Nurse | 1 |
| 6 | 3 | 1 | 0 | 0 | Dippity | Doo-Dah | 1 |
|----|-----|-----|----|----|-----------|--------------|----|
Question
How do I write a query to determine, for any given rid, what changed between the most recent version and the version immediately preceding it (if any), such that it produces something like this:
|-----|-----------------|-----------------|-----------------|
| rid | numbers_changed | strings_changed | boolean_changed |
|-----|-----------------|-----------------|-----------------|
| 1 | TRUE | TRUE | FALSE |
| 2 | FALSE | TRUE | FALSE |
| 3 | n/a | n/a | n/a |
|-----|-----------------|-----------------|-----------------|
I think that I should be able to do this by doing a cross-join between the table and itself but I can't resolve how to perform this join to get the desired output.
I need to generate this "report" for a table with tens of columns and 1-10 versions of hundreds of records (resulting in thousands of rows). Note that the design of the database is not my own, and altering its structure is (at this time) not an acceptable approach.
The actual format of the output isn't important - and if it simplifies the query getting a "full breakdown" of what changed for each "change set" would also be acceptable, for example
|-----|-----|-----|----|----|----|----|----|
| rid | old | new | n1 | n2 | s1 | s2 | b1 |
|-----|-----|-----|----|----|----|----|----|
| 1 | 1 | 2 | Y | N | N | N | N |
| 1 | 2 | 3 | Y | Y | Y | Y | N |
| 2 | 4 | 5 | N | N | N | Y | N |
|-----|-----|-----|----|----|----|----|----|
Note that it is also OK in this case to omit rid records which only have a single version, as for the purposes of this report I only care about records that have changed, and getting a separate list of records that haven't changed is an easy query.
You can join every row with the following one with
select *
from history h1
join history h2
  on  h2.rid = h1.rid
  and h2.id = (
    select min(h.id)
    from history h
    where h.rid = h1.rid
      and h.id > h1.id
  );
Then you just need to compare every column from the two rows, like h1.n1 <> h2.n1 as n1.
The full query would be:
select h1.rid, h1.id as old, h2.id as new
  , h1.n1 <> h2.n1 as n1
  , h1.n2 <> h2.n2 as n2
  , h1.s1 <> h2.s1 as s1
  , h1.s2 <> h2.s2 as s2
  , h1.b1 <> h2.b1 as b1
from history h1
join history h2
  on  h2.rid = h1.rid
  and h2.id = (
    select min(h.id)
    from history h
    where h.rid = h1.rid
      and h.id > h1.id
  );
Result:
| rid | old | new | n1 | n2 | s1 | s2 | b1 |
|-----|-----|-----|----|----|----|----|----|
| 1 | 1 | 2 | 1 | 0 | 0 | 0 | 0 |
| 1 | 2 | 3 | 1 | 1 | 1 | 1 | 0 |
| 2 | 4 | 5 | 0 | 0 | 0 | 1 | 0 |
Demo: http://sqlfiddle.com/#!9/2e5d12/5
If the columns can contain NULLs, you might need something like NOT h1.n1 <=> h2.n1 as n1. <=> is a NULL-safe equality check.
If the version within a rid group is guaranteed to be consecutive, you can simplify the JOIN to
from history h1
join history h2
on h2.rid = h1.rid
and h2.ver = h1.ver + 1
Demo: http://sqlfiddle.com/#!9/2e5d12/7
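The same join can also produce the per-category summary from the question (latest version against the one before it). The following is a sketch under two assumptions: versions within a rid are consecutive, and it runs on Python's built-in sqlite3, where IS NOT plays the role of MySQL's NOT ... <=> ... as the NULL-safe inequality.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE history (
    id INTEGER PRIMARY KEY, rid INTEGER, ver INTEGER,
    n1 INTEGER, n2 INTEGER, s1 TEXT, s2 TEXT, b1 INTEGER)
""")
conn.executemany("INSERT INTO history VALUES (?,?,?,?,?,?,?,?)", [
    (1, 1, 1, 0, 1, "Hello",   "World",       0),
    (2, 1, 2, 1, 1, "Hello",   "World",       0),
    (3, 1, 3, 0, 0, "Goodbye", "Cruel World", 0),
    (4, 2, 1, 0, 0, "Hello",   "Doctor",      1),
    (5, 2, 2, 0, 0, "Hello",   "Nurse",       1),
    (6, 3, 1, 0, 0, "Dippity", "Doo-Dah",     1),
])

# h2 is the most recent version of each rid, h1 the version just before it.
# Records with a single version have no h1 and drop out of the inner join.
SUMMARY_SQL = """
SELECT h2.rid,
       (h1.n1 IS NOT h2.n1) OR (h1.n2 IS NOT h2.n2) AS numbers_changed,
       (h1.s1 IS NOT h2.s1) OR (h1.s2 IS NOT h2.s2) AS strings_changed,
       (h1.b1 IS NOT h2.b1)                         AS boolean_changed
FROM history h2
JOIN history h1
  ON  h1.rid = h2.rid
  AND h1.ver = h2.ver - 1
WHERE h2.ver = (SELECT MAX(h.ver) FROM history h WHERE h.rid = h2.rid)
ORDER BY h2.rid
"""

summary = conn.execute(SUMMARY_SQL).fetchall()
for row in summary:
    print(row)  # (rid, numbers_changed, strings_changed, boolean_changed)
```

With tens of columns, the OR chains get long but stay mechanical: one comparison per column, grouped by whichever category the column belongs to.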

Creating a view in MySQL with columns created from row data in a table?

I've got a MySQL database containing information about various electrical and mechanical components. It has three tables.
Tables:
componentSource - contains information about where the information in the database was sourced from.
component - contains part number information, description, etc. Multiple entries will refer to a single entry in the componentSource table as its source (Each source file describes multiple components).
componentParams - contains parametric information about the components. Multiple parameter entries will refer to a single entry in the component table (each component has multiple parameters).
See simplified example tables...
Database Tables and Relationships:
+-------------------------------+
| Table: componentSource |
+-------------------------------+
| compSrcID* | sourceFile |
+-------------------------------+
| 1 | comp1.txt |
| 2 | comp2.txt |
| 3 | comp3.txt |
+-------------------------------+
^
|
+---------------------------------------------------+
( many to one reference) |
^
^
+---------------------------------------------------------------+
| Table: component |
+---------------------------------------------------------------+
| compID* | partNum | mfrPartNum | mfr | compSrcID |
+---------------------------------------------------------------+
| 1 | 1234 | ABCD | BrandA | 1 |
| 2 | 2345 | BCDE | BrandB | 1 |
| 3 | 3456 | CDEF | BrandC | 3 |
| 4 | 4567 | DEFG | BrandD | 2 |
+---------------------------------------------------------------+
^
|
+---------------+ (many to one reference)
|
^
^
+-------------------------------------------------------+
| Table: componentParams |
+-------------------------------------------------------+
| compParamID* | compID | paramName | paramValue |
+-------------------------------------------------------+
| 1 | 1 | ParamA | 50 |
| 2 | 1 | ParamB | 123 |
| 3 | 1 | ParamC | 10% |
| 4 | 1 | ParamD | 0.5 |
| 5 | 1 | ParamE | Active |
| 6 | 2 | ParamA | 25 |
| 7 | 2 | ParamB | 10K |
| 8 | 2 | ParamC | 5% |
| 9 | 2 | ParamD | 0.25 |
| 10 | 2 | ParamE | Proto |
| 11 | 3 | ParamA | 53.6 |
| 12 | 3 | ParamE | Active |
| 13 | 4 | ParamY | 123-56 |
| 14 | 4 | ParamZ | True |
+-------------------------------------------------------+
I would like to create a view of the database that merges information from the three tables. I would like to have a row for each line in the component table that merges the relevant lines from the componentSource table, and all of the relevant parameters out of the componentParams table.
See example view...
Database View:
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| View: componentView |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| compID* | partNum | mfrPartNum | mfr | SourceFile | ParamA | ParamB | ParamC | ParamD | ParamE | ParamY | ParamZ |
| 1 | 1234 | ABCD | BrandA | comp1.txt | 50 | 123 | 10% | 0.5 | Active | | |
| 2 | 2345 | BCDE | BrandB | comp1.txt | 25 | 10K | 5% | 0.25 | Proto | | |
| 3 | 3456 | CDEF | BrandC | comp3.txt | 53.6 | | | | Active | | |
| 4 | 4567 | DEFG | BrandD | comp2.txt | | | | | | 123-56 | True |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
Since I want a line in the view for each component in the component table, I think merging the info from the componentSource table is fairly straightforward with a join, but the tricky part is creating columns in the view that correspond to the values in the componentParams.paramName column. It seems like this requires some recursion to read all parameters associated with a component. Also note that not all components have the same parameters in the parameter table, so the values for the parameters not used by a component would be null.
An alternative to creating a view, if that can't be done, would be to build another database table.
My SQL skills are super rusty, and were probably not up to this task when they were fresh.
Is it possible to create a view that creates columns that are based on row data (paramName) in a table? Could you show an example?
If not, can a table be built that does the same? Again, could you show an example?
Many thanks.
Conditional aggregation can do the pivoting for you
SELECT cp.compID,
ct.partNum,
ct.mfrPartNum,
ct.mfr,
cs.SourceFile,
MAX(CASE WHEN cp.paramName = 'ParamA' THEN cp.ParamValue END) as ParamA,
MAX(CASE WHEN cp.paramName = 'ParamB' THEN cp.ParamValue END) as ParamB,
MAX(CASE WHEN cp.paramName = 'ParamC' THEN cp.ParamValue END) as ParamC,
MAX(CASE WHEN cp.paramName = 'ParamD' THEN cp.ParamValue END) as ParamD,
MAX(CASE WHEN cp.paramName = 'ParamE' THEN cp.ParamValue END) as ParamE,
MAX(CASE WHEN cp.paramName = 'ParamY' THEN cp.ParamValue END) as ParamY,
MAX(CASE WHEN cp.paramName = 'ParamZ' THEN cp.ParamValue END) as ParamZ
FROM componentParams cp
JOIN component ct ON cp.compId = ct.compId
JOIN componentSource cs ON cs.compSrcID = ct.compSrcID
GROUP BY cp.compID,
ct.partNum,
ct.mfrPartNum,
ct.mfr,
cs.SourceFile
It is also possible to use subqueries for this; however, I expect conditional aggregation to do the job better.
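Since the question asks for a view specifically, here is a sketch of the same conditional aggregation wrapped in CREATE VIEW. It is run through Python's built-in sqlite3 only to keep it self-contained, uses a subset of the parameter rows and columns for brevity, and starts from component with a LEFT JOIN so that a component without any parameters would still get a row (a design choice, not something the answer above does).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE componentSource (compSrcID INTEGER PRIMARY KEY, sourceFile TEXT);
CREATE TABLE component (compID INTEGER PRIMARY KEY, partNum TEXT,
                        mfrPartNum TEXT, mfr TEXT, compSrcID INTEGER);
CREATE TABLE componentParams (compParamID INTEGER PRIMARY KEY, compID INTEGER,
                              paramName TEXT, paramValue TEXT);

INSERT INTO componentSource VALUES (1,'comp1.txt'), (2,'comp2.txt'), (3,'comp3.txt');
INSERT INTO component VALUES
  (1,'1234','ABCD','BrandA',1), (2,'2345','BCDE','BrandB',1),
  (3,'3456','CDEF','BrandC',3), (4,'4567','DEFG','BrandD',2);
-- Subset of the question's parameter rows, enough to show the pivot.
INSERT INTO componentParams (compID, paramName, paramValue) VALUES
  (1,'ParamA','50'),   (1,'ParamE','Active'),
  (2,'ParamA','25'),   (2,'ParamE','Proto'),
  (3,'ParamA','53.6'), (3,'ParamE','Active'),
  (4,'ParamY','123-56'), (4,'ParamZ','True');

-- One MAX(CASE ...) per parameter name; missing parameters come out as NULL.
CREATE VIEW componentView AS
SELECT ct.compID, ct.partNum, ct.mfrPartNum, ct.mfr, cs.sourceFile,
       MAX(CASE WHEN cp.paramName = 'ParamA' THEN cp.paramValue END) AS ParamA,
       MAX(CASE WHEN cp.paramName = 'ParamE' THEN cp.paramValue END) AS ParamE,
       MAX(CASE WHEN cp.paramName = 'ParamY' THEN cp.paramValue END) AS ParamY,
       MAX(CASE WHEN cp.paramName = 'ParamZ' THEN cp.paramValue END) AS ParamZ
FROM component ct
JOIN componentSource cs ON cs.compSrcID = ct.compSrcID
LEFT JOIN componentParams cp ON cp.compID = ct.compID
GROUP BY ct.compID, ct.partNum, ct.mfrPartNum, ct.mfr, cs.sourceFile;
""")

view_rows = conn.execute("""
SELECT compID, sourceFile, ParamA, ParamE, ParamY, ParamZ
FROM componentView ORDER BY compID
""").fetchall()
for row in view_rows:
    print(row)
```

The view still needs one column expression per known parameter name; a view cannot grow columns on its own when a new paramName appears.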

How to get values from another table by their IDs?

Documents table (req_1 to req_7 are IDs of requirements from another table)
| document_id | document_name | document_amount | req_1 | req_2 | req_3 | req_4 | req_5 | req_6 | req_7 |
| 1 | Diploma | 40 | 1 | 3 | 4 | 4 | 6 | 7 | 8 |
Requirements table
| requirement_id | requirement_name |
| 1 | 1 x 1 picture |
| 2 | 2 x 2 picture |
| 3 | Registration form|
| 4 | Clearance |
| 5 | Medical cert |
| 6 | xray result |
| 7 | excuse letter |
| 8 | affidavit |
| 9 | comsoc clearance |
The expected result is similar to the documents table, but with the requirement names displayed in place of their IDs.
I want to know the correct syntax for this query.
So far my query is:
SELECT * FROM document_tbl
WHERE requirement1,requirement2,requirement3,requirement4,requirement5,requirement6,requirement7
IN (
SELECT requirement_name FROM requirements_tbl WHERE requirement_id=requirement1,requirement2,requirement3,requirement4,requirement5,requirement6,requirement7 )";
But I screwed up somehow. Thanks in advance for the help. I would appreciate it.
Intermediate table (document_requirements)
| document_id | requirement_id |
| 1 | 1 |
| 1 | 3 |
| 1 | 4 |
| 1 | 4 |
| 1 | 6 |
| 1 | 7 |
| 1 | 8 |
Query
SELECT d.document_id,dr.requirement_id,r.requirement_name
FROM documents AS d
JOIN document_requirements AS dr ON (dr.document_id=d.document_id)
JOIN requirements AS r ON (r.requirement_id=dr.requirement_id)
WHERE d.document_id = 1;
Something like this:
SELECT d.document_id,
d.document_name,
d.document_amount,
r1.requirement_name AS req_1,
r2.requirement_name AS req_2,
r3.requirement_name AS req_3,
r4.requirement_name AS req_4,
r5.requirement_name AS req_5,
r6.requirement_name AS req_6,
r7.requirement_name AS req_7
FROM documents AS d,
requirements AS r1,
requirements AS r2,
requirements AS r3,
requirements AS r4,
requirements AS r5,
requirements AS r6,
requirements AS r7
WHERE d.req_1 = r1.requirement_id
AND d.req_2 = r2.requirement_id
AND d.req_3 = r3.requirement_id
AND d.req_4 = r4.requirement_id
AND d.req_5 = r5.requirement_id
AND d.req_6 = r6.requirement_id
AND d.req_7 = r7.requirement_id
and so on.
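A runnable sketch of the one-alias-per-column join, with two deliberate differences that are assumptions rather than part of the answer above: it uses explicit LEFT JOINs, so a document with an empty requirement slot still appears, and it generates the seven repetitive joins in a loop. The second document row ("Transcript") is hypothetical and exists only to show the empty-slot case. Python's built-in sqlite3 is used to keep the example self-contained.

```python
import sqlite3

N_REQS = 7  # number of req_N columns in the documents table

conn = sqlite3.connect(":memory:")
req_cols = ", ".join(f"req_{i} INTEGER" for i in range(1, N_REQS + 1))
conn.execute(f"""
CREATE TABLE documents (
    document_id INTEGER PRIMARY KEY, document_name TEXT,
    document_amount INTEGER, {req_cols})
""")
conn.execute("CREATE TABLE requirements "
             "(requirement_id INTEGER PRIMARY KEY, requirement_name TEXT)")

conn.executemany("INSERT INTO requirements VALUES (?, ?)", [
    (1, "1 x 1 picture"), (2, "2 x 2 picture"), (3, "Registration form"),
    (4, "Clearance"), (5, "Medical cert"), (6, "xray result"),
    (7, "excuse letter"), (8, "affidavit"), (9, "comsoc clearance"),
])
placeholders = ", ".join("?" * (3 + N_REQS))
conn.executemany(f"INSERT INTO documents VALUES ({placeholders})", [
    (1, "Diploma", 40, 1, 3, 4, 4, 6, 7, 8),
    # Hypothetical row with unused slots, to show what LEFT JOIN preserves.
    (2, "Transcript", 25, 2, 5, None, None, None, None, None),
])

# One alias of requirements per req_N column.
select_cols = ", ".join(
    f"r{i}.requirement_name AS req_{i}" for i in range(1, N_REQS + 1))
joins = "\n".join(
    f"LEFT JOIN requirements AS r{i} ON r{i}.requirement_id = d.req_{i}"
    for i in range(1, N_REQS + 1))
sql = f"""
SELECT d.document_id, d.document_name, {select_cols}
FROM documents AS d
{joins}
ORDER BY d.document_id
"""

named_rows = conn.execute(sql).fetchall()
for row in named_rows:
    print(row)
```

The junction-table design from the first answer avoids this repetition altogether, at the cost of changing the schema.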

How to use user-defined variables in an UPDATE statement?

I was trying to answer another SO question and was suddenly faced with the following problem. Points should be assigned to the 3 highest scoring (mrk) groups (grp) of each class (sec). The highest scoring groups get 5 points, the second ranking ones 3 points and the groups in 3rd rank only 1 point. For all others pts should be set to null.
| ID | SEC | GRP | MRK | PTS |
|----|-----|-----|-----|--------|
| 1 | cl2 | ge | 32 | (null) |
| 2 | cl1 | gb | 22 | (null) |
| 3 | cl1 | gd | 22 | (null) |
| 4 | cl1 | ge | 18 | (null) |
| 5 | cl2 | ga | 26 | (null) |
| 6 | cl1 | ga | 55 | (null) |
| 7 | cl2 | gb | 66 | (null) |
| 8 | cl2 | gc | 15 | (null) |
| 9 | cl1 | gc | 12 | (null) |
| 10 | cl2 | gf | 5 | (null) |
| 11 | cl2 | ge | 66 | (null) |
I chose to work with user-defined variables as they provide maximum flexibility regarding the allocation scheme and soon came up with the following solution:
SELECT id,sec,grp,mrk,
  CASE WHEN @s=sec THEN                            -- still within the same class ...
    CASE WHEN @m=mrk THEN @i ELSE                  -- issue the same points for
                                                   -- identical scorers, otherwise ...
      CASE WHEN IF(@m:=mrk,@i,@i)>2 THEN @i:=@i-2  -- store mrk in @m and
                                                   -- while @i>2 return points: 3 or 1 ...
      ELSE @i:=null                                -- no points for the rest
      END
    END
  ELSE NULLIF(@i:=5,(@s:=sec)=(@m:=mrk))           -- new class: store sec in @s and mrk in @m
                                                   -- and return points: 5
  END pts
FROM tbl ORDER BY sec,mrk desc
Explanation of NULLIF(@i:=5,(@s:=sec)=(@m:=mrk)):
The expressions @s:=sec and @m:=mrk are both evaluated and then their values are compared by =. The result can either be 0 (false) or 1 (true) but it will definitely be unequal to 5, the other argument of the NULLIF function, therefore in the end only the first argument (5) will be returned. I chose the construct to make the two variable assignments happen without returning anything.
OK, maybe not the most straightforward solution ;-), but I did pay attention to assign each variable only once for each record that is being processed, since "the order of evaluation for expressions involving user variables is undefined" (MySQL manual). The select indeed gives me the desired result:
| ID | SEC | GRP | MRK | PTS |
|----|-----|-----|-----|--------|
| 6 | cl1 | ga | 55 | 5 |
| 2 | cl1 | gb | 22 | 3 |
| 3 | cl1 | gd | 22 | 3 |
| 4 | cl1 | ge | 18 | 1 |
| 9 | cl1 | gc | 12 | (null) |
| 7 | cl2 | gb | 66 | 5 |
| 11 | cl2 | ge | 66 | 5 |
| 1 | cl2 | ge | 32 | 3 |
| 5 | cl2 | ga | 26 | 1 |
| 8 | cl2 | gc | 15 | (null) |
| 10 | cl2 | gf | 5 | (null) |
Now, my question is:
How do I write an UPDATE statement along the same lines that will store the above calculated results in column pts?
My attempts so far have all failed:
UPDATE tbl SET pts=
  CASE WHEN @s=sec THEN
    CASE WHEN @m=mrk THEN @i ELSE
      CASE WHEN IF(@m:=mrk,@i,@i)>2 THEN @i:=@i-2
      ELSE @i:=null
      END
    END
  ELSE NULLIF(@i:=5,(@s:=sec)=(@m:=mrk))
  END
ORDER BY sec,mrk desc
result:
| ID | SEC | GRP | MRK | PTS |
|----|-----|-----|-----|-----|
| 6 | cl1 | ga | 55 | 5 |
| 2 | cl1 | gb | 22 | 5 |
| 3 | cl1 | gd | 22 | 5 |
| 4 | cl1 | ge | 18 | 5 |
| 9 | cl1 | gc | 12 | 5 |
| 7 | cl2 | gb | 66 | 5 |
| 11 | cl2 | ge | 66 | 5 |
| 1 | cl2 | ge | 32 | 5 |
| 5 | cl2 | ga | 26 | 5 |
| 8 | cl2 | gc | 15 | 5 |
| 10 | cl2 | gf | 5 | 5 |
Why does the update statement only get a single value (5) for pts?!?
You can find all the data and SQL statements in my SQLfiddle.
I have tried to debug this case.
I've added 6 new columns to the tbl table: b_s, b_m, b_i and a_s, a_m, a_i
b_* - means "before", a_* - means "after",
and I've modified the query to:
UPDATE tbl SET
  b_s = @s,
  b_m = @m,
  b_i = @i,
  pts=
    CASE WHEN @s=sec THEN
      CASE WHEN @m=mrk THEN @i ELSE
        CASE WHEN IF(@m:=mrk,@i,@i)>2 THEN @i:=@i-2
        ELSE @i:=null
        END
      END
    ELSE NULLIF(@i:=5,(@s:=sec)=(@m:=mrk))
    END,
  a_s = @s,
  a_m = @m,
  a_i = @i
ORDER BY sec,mrk desc
My intent was to log the values of the variables before and after the expression is evaluated.
It's strange - I don't know why, but it seems that when you assign values to all variables before the execution of the update then the update works as expected.
Compare these two demos:
1 - wrong: http://sqlfiddle.com/#!2/2db3e4/1
2 - fine: http://sqlfiddle.com/#!2/37ff5/1
The only difference is this code fragment before the update:
set @i='alamakota';
set @m='alamakota';
set @s='alamakota';
Some kind of "magic string" :)
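For comparison, the same points can be assigned without user variables at all, which sidesteps the evaluation-order question. This is a different technique from the one discussed above, not a fix for it: each row's dense rank is taken as the number of distinct higher marks in its class, and ranks 1 to 3 map to 5, 3 and 1 points. The sketch runs on Python's built-in sqlite3. MySQL rejects a subquery that reads the UPDATE target directly (error 1093), so there the ranking would have to come from a derived table or a join; that variant is an assumption and is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, sec TEXT, grp TEXT, "
             "mrk INTEGER, pts INTEGER)")
conn.executemany("INSERT INTO tbl (id, sec, grp, mrk) VALUES (?, ?, ?, ?)", [
    (1, "cl2", "ge", 32), (2, "cl1", "gb", 22), (3, "cl1", "gd", 22),
    (4, "cl1", "ge", 18), (5, "cl2", "ga", 26), (6, "cl1", "ga", 55),
    (7, "cl2", "gb", 66), (8, "cl2", "gc", 15), (9, "cl1", "gc", 12),
    (10, "cl2", "gf", 5), (11, "cl2", "ge", 66),
])

# Number of distinct higher marks in the same class: 0 -> 5 points,
# 1 -> 3 points, 2 -> 1 point, anything else -> NULL (no ELSE branch).
conn.execute("""
UPDATE tbl SET pts = CASE (
        SELECT COUNT(DISTINCT t2.mrk)
        FROM tbl AS t2
        WHERE t2.sec = tbl.sec AND t2.mrk > tbl.mrk)
    WHEN 0 THEN 5
    WHEN 1 THEN 3
    WHEN 2 THEN 1
END
""")

points = dict(conn.execute("SELECT id, pts FROM tbl"))
print(points)
```

Ties share a rank here, so both groups with 22 marks in cl1 get 3 points and the group with 18 still gets 1, matching the result of the SELECT in the question.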