Stuck with complex MySQL query syntax (joining a table with itself?)

I have some language data in a MySQL table containing about 3.8 million rows (with indexes on virtually all fields):
+---------+-----------+----------+--------+----------------+----------+--------+---------+---------+
| theWord | lcTheWord | spelling | thePOS | theUSAS | register | period | variety | theDate |
+---------+-----------+----------+--------+----------------+----------+--------+---------+---------+
| to | to | l | TO | Z5 | p | 1 | b | 1608 |
| direct | direct | l | VVI | M6 | p | 1 | b | 1608 |
| others | others | l | NN2 | A6.1-/Z8 | p | 1 | b | 1608 |
| . | . | o | . | PUNC | p | 1 | b | 1608 |
| Both | both | u | DB2 | N5 | p | 1 | b | 1608 |
| his | his | l | APPGE | Z8m | p | 1 | b | 1608 |
| eyes | eyes | l | NN2 | B1 | p | 1 | b | 1608 |
| are | are | l | VBR | A3+ | p | 1 | b | 1608 |
| never | never | l | RR | T1/Z6 | p | 1 | b | 1608 |
| at | at | l | RR21 | N3.8+[i281.2.1 | p | 1 | b | 1608 |
So the same word can (and often will) be contained in the table multiple times, some with "l" for lowercase and some with "u" for uppercase.
I would now like to compare the capitalisation of individual words across time periods (e.g. 1 vs. 8), varieties ("b" = British English, "a" = American English) etc. by creating output that is ranked by the proportion of upper- to lowercase spellings. At some stage I will also want to restrict the data to certain part-of-speech tags (thePOS) or semantic tags (theUSAS).
Unfortunately, my knowledge of SQL is very limited, and although I've tried quite a few things (e.g. joining the table with itself and working things out from there), I have so far failed miserably.
Just to give you an example of the kind of things I have been trying:
SELECT l.theWord, count(l.theWord) as freq_low, count(u.theWord) as freq_up
FROM table_name l
INNER JOIN table_name u ON l.lcTheWord = u.lcTheWord
group by l.lcTheWord;
This is clearly the wrong approach, as it doesn't seem to use the necessary indexes (and takes too long for me to even see what it does...)
I realise this is a far less specific question than the guidelines suggest. Apologies! However, I'm wondering whether some kind soul could give me some pointers so that I can go on from there...?
Many thanks in advance!
Sebastian

I do not think that you need a self join here - a GROUP BY should be sufficient. You can count words with 'u's and 'l's in the spelling column like this:
SELECT
lcTheWord
, SUM(CASE spelling WHEN 'u' THEN 1 ELSE 0 END) AS UpperCount
, SUM(CASE spelling WHEN 'l' THEN 1 ELSE 0 END) AS LowerCount
FROM table_name
GROUP BY lcTheWord
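To get from counts to the ranking the question asks for, the same conditional aggregation can be extended with a share column, a WHERE filter and an ORDER BY. The following is a minimal sketch under assumptions, not the answerer's query. It runs the SQL through Python's built-in sqlite3 module against a tiny made-up `words` table so it is self-contained. The table name, its rows and the helper `upper_share()` are hypothetical. It ranks by the share of uppercase spellings, u / (u + l), rather than u / l, so that a word that never appears in lowercase does not cause a division by zero.

```python
import sqlite3

# Hypothetical miniature of the question's table (only the columns used here).
ROWS = [
    ("Both", "both", "u", "DB2", 1, "b"),
    ("both", "both", "l", "DB2", 1, "b"),
    ("His", "his", "u", "APPGE", 1, "b"),
    ("his", "his", "l", "APPGE", 1, "b"),
    ("his", "his", "l", "APPGE", 1, "b"),
    ("his", "his", "l", "APPGE", 1, "b"),
    ("eyes", "eyes", "l", "NN2", 1, "b"),
    ("Eyes", "eyes", "u", "NN2", 8, "b"),
    (".", ".", "o", ".", 1, "b"),          # punctuation: excluded by the WHERE clause
    ("both", "both", "l", "DB2", 1, "a"),  # American English: filtered out by variety
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE words (theWord TEXT, lcTheWord TEXT, spelling TEXT,"
    " thePOS TEXT, period INTEGER, variety TEXT)"
)
conn.executemany("INSERT INTO words VALUES (?, ?, ?, ?, ?, ?)", ROWS)


def upper_share(conn, variety, periods, pos=None):
    """Per word and period: upper/lower counts and the share of uppercase spellings."""
    placeholders = ", ".join("?" for _ in periods)
    sql = f"""
        SELECT lcTheWord,
               period,
               SUM(CASE spelling WHEN 'u' THEN 1 ELSE 0 END) AS UpperCount,
               SUM(CASE spelling WHEN 'l' THEN 1 ELSE 0 END) AS LowerCount,
               -- 1.0 * avoids integer division in SQLite (MySQL / never truncates)
               1.0 * SUM(CASE spelling WHEN 'u' THEN 1 ELSE 0 END) / COUNT(*) AS UpperShare
        FROM words
        WHERE spelling IN ('u', 'l')
          AND variety = ?
          AND period IN ({placeholders})
    """
    params = [variety, *periods]
    if pos is not None:  # optional part-of-speech restriction
        sql += " AND thePOS = ?"
        params.append(pos)
    sql += " GROUP BY lcTheWord, period ORDER BY UpperShare DESC, lcTheWord, period"
    return conn.execute(sql, params).fetchall()


for row in upper_share(conn, "b", [1, 8]):
    print(row)
```

Grouping by period (or variety) keeps each time slice separate, so the same word can be compared across slices. The same `WHERE` pattern extends to theUSAS.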

Related

How to determine what's changed between database records

Presume, first, that the following table exists in a MySQL database:
|----|-----|-----|----|----|-----------|--------------|----|
| id | rid | ver | n1 | n2 | s1 | s2 | b1 |
|----|-----|-----|----|----|-----------|--------------|----|
| 1 | 1 | 1 | 0 | 1 | Hello | World | 0 |
| 2 | 1 | 2 | 1 | 1 | Hello | World | 0 |
| 3 | 1 | 3 | 0 | 0 | Goodbye | Cruel World | 0 |
| 4 | 2 | 1 | 0 | 0 | Hello | Doctor | 1 |
| 5 | 2 | 2 | 0 | 0 | Hello | Nurse | 1 |
| 6 | 3 | 1 | 0 | 0 | Dippity | Doo-Dah | 1 |
|----|-----|-----|----|----|-----------|--------------|----|
Question
How do I write a query that determines, for any given rid, what changed between the most recent version and the version immediately preceding it (if any), producing something like this:
|-----|-----------------|-----------------|-----------------|
| rid | numbers_changed | strings_changed | boolean_changed |
|-----|-----------------|-----------------|-----------------|
| 1 | TRUE | TRUE | FALSE |
| 2 | FALSE | TRUE | FALSE |
| 3 | n/a | n/a | n/a |
|-----|-----------------|-----------------|-----------------|
I think I should be able to do this with a cross join between the table and itself, but I can't work out how to perform this join to get the desired output.
I need to generate this "report" for a table with tens of columns and 1-10 versions of hundreds of records (resulting in thousands of rows). Note that the design of the database is not my own, and altering its structure is not an acceptable approach at this time.
The actual format of the output isn't important, and if it simplifies the query, a "full breakdown" of what changed in each "change set" would also be acceptable, for example:
|-----|-----|-----|----|----|----|----|----|
| rid | old | new | n1 | n2 | s1 | s2 | b1 |
|-----|-----|-----|----|----|----|----|----|
| 1 | 1 | 2 | Y | N | N | N | N |
| 1 | 2 | 3 | Y | Y | Y | Y | N |
| 2 | 4 | 5 | N | N | N | Y | N |
|-----|-----|-----|----|----|----|----|----|
Note that in this case it is also OK to omit rids that have only a single version: for the purposes of this report I only care about records that have changed, and getting a separate list of records that haven't changed is an easy query.
You can join every row with the following one (the row with the next higher id for the same rid) like this:
select *
from history h1
join history h2
on h2.rid = h1.rid
and h2.id = (
select min(h.id)
from history h
where h.rid = h1.rid
and h.id > h1.id
);
Then you just need to compare each column of the two rows, e.g. h1.n1 <> h2.n1 as n1, which MySQL evaluates to 1 (changed) or 0 (unchanged).
The full query would be:
select h1.rid, h1.id as old, h2.id as new
, h1.n1 <> h2.n1 as n1
, h1.n2 <> h2.n2 as n2
, h1.s1 <> h2.s1 as s1
, h1.s2 <> h2.s2 as s2
, h1.b1 <> h2.b1 as b1
from history h1
join history h2
on h2.rid = h1.rid
and h2.id = (
select min(h.id)
from history h
where h.rid = h1.rid
and h.id > h1.id
);
Result:
| rid | old | new | n1 | n2 | s1 | s2 | b1 |
|-----|-----|-----|----|----|----|----|----|
| 1 | 1 | 2 | 1 | 0 | 0 | 0 | 0 |
| 1 | 2 | 3 | 1 | 1 | 1 | 1 | 0 |
| 2 | 4 | 5 | 0 | 0 | 0 | 1 | 0 |
Demo: http://sqlfiddle.com/#!9/2e5d12/5
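To check the full query without the fiddle, the sketch below loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. It then runs the answer's query unchanged except for an added ORDER BY, which gives a stable row order. SQLite, like MySQL, evaluates <> to 1 or 0, so the output should match the Result table above. SQLite is used only to keep the example self-contained.

```python
import sqlite3

# Sample data from the question: (id, rid, ver, n1, n2, s1, s2, b1)
HISTORY = [
    (1, 1, 1, 0, 1, "Hello", "World", 0),
    (2, 1, 2, 1, 1, "Hello", "World", 0),
    (3, 1, 3, 0, 0, "Goodbye", "Cruel World", 0),
    (4, 2, 1, 0, 0, "Hello", "Doctor", 1),
    (5, 2, 2, 0, 0, "Hello", "Nurse", 1),
    (6, 3, 1, 0, 0, "Dippity", "Doo-Dah", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history (id INTEGER PRIMARY KEY, rid INTEGER, ver INTEGER,"
    " n1 INTEGER, n2 INTEGER, s1 TEXT, s2 TEXT, b1 INTEGER)"
)
conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?, ?, ?, ?, ?)", HISTORY)

# The answer's query: each row is paired with the next higher id of the same rid.
CHANGES_SQL = """
SELECT h1.rid, h1.id AS old, h2.id AS new,
       h1.n1 <> h2.n1 AS n1,
       h1.n2 <> h2.n2 AS n2,
       h1.s1 <> h2.s1 AS s1,
       h1.s2 <> h2.s2 AS s2,
       h1.b1 <> h2.b1 AS b1
FROM history h1
JOIN history h2
  ON h2.rid = h1.rid
 AND h2.id = (SELECT MIN(h.id) FROM history h
              WHERE h.rid = h1.rid AND h.id > h1.id)
ORDER BY h1.id
"""

for row in conn.execute(CHANGES_SQL):
    print(row)
```

rid 3 has only one version, so the inner join drops it, as the question allows.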
If the columns can contain NULLs, you might need something like NOT (h1.n1 <=> h2.n1) as n1 instead: <=> is MySQL's NULL-safe equality operator, whereas a plain <> returns NULL when either side is NULL.
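The NULL case is easy to miss, so here is a small sketch of it. It again uses Python's sqlite3 for a self-contained run, which means the null-safe operator is spelled differently: SQLite writes MySQL's NOT (a <=> b) as a IS NOT b. The `pairs` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (old_n1 INTEGER, new_n1 INTEGER)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [(None, 5), (None, None), (3, 3), (3, 4)],
)

rows = conn.execute("""
    SELECT old_n1, new_n1,
           old_n1 <> new_n1     AS plain_changed,     -- NULL whenever either side is NULL
           old_n1 IS NOT new_n1 AS null_safe_changed  -- SQLite spelling of NOT (a <=> b)
    FROM pairs
    ORDER BY rowid
""").fetchall()

for row in rows:
    print(row)
```

The first row is the case that matters: a value going from NULL to 5 is reported as NULL (not a change) by <>, but as 1 by the null-safe form.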
If the version within a rid group is guaranteed to be consecutive, you can simplify the JOIN to
from history h1
join history h2
on h2.rid = h1.rid
and h2.ver = h1.ver + 1
Demo: http://sqlfiddle.com/#!9/2e5d12/7
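The first output format in the question has one row per rid and compares only the latest version with the one before it. It can be sketched on top of the consecutive-ver join, as an addition rather than part of the answer. The sketch treats the latest row as `cur` and LEFT JOINs the preceding version as `prev`, so that single-version rids still appear. It then ORs the per-column comparisons into numbers_changed, strings_changed and boolean_changed, and a NULL in those columns plays the role of the question's n/a. Like the simplified join, it assumes ver is consecutive within a rid. It runs in SQLite via Python's sqlite3, and the join and subquery are standard SQL.

```python
import sqlite3

# Sample data from the question: (id, rid, ver, n1, n2, s1, s2, b1)
HISTORY = [
    (1, 1, 1, 0, 1, "Hello", "World", 0),
    (2, 1, 2, 1, 1, "Hello", "World", 0),
    (3, 1, 3, 0, 0, "Goodbye", "Cruel World", 0),
    (4, 2, 1, 0, 0, "Hello", "Doctor", 1),
    (5, 2, 2, 0, 0, "Hello", "Nurse", 1),
    (6, 3, 1, 0, 0, "Dippity", "Doo-Dah", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history (id INTEGER PRIMARY KEY, rid INTEGER, ver INTEGER,"
    " n1 INTEGER, n2 INTEGER, s1 TEXT, s2 TEXT, b1 INTEGER)"
)
conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?, ?, ?, ?, ?)", HISTORY)

# cur = latest version of each rid; prev = the version just before it (if any).
SUMMARY_SQL = """
SELECT cur.rid,
       (prev.n1 <> cur.n1) OR (prev.n2 <> cur.n2) AS numbers_changed,
       (prev.s1 <> cur.s1) OR (prev.s2 <> cur.s2) AS strings_changed,
       prev.b1 <> cur.b1                          AS boolean_changed
FROM history cur
LEFT JOIN history prev
  ON prev.rid = cur.rid
 AND prev.ver = cur.ver - 1
WHERE cur.ver = (SELECT MAX(h.ver) FROM history h WHERE h.rid = cur.rid)
ORDER BY cur.rid
"""

for row in conn.execute(SUMMARY_SQL):
    print(row)  # None columns mean "n/a": there is no earlier version
```

With tens of columns, only the OR lists grow. If the data itself can contain NULLs, the null-safe comparison from the previous note applies here too.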

Left joins: I need an explanation of some code

I am watching a tutorial, and there is some code in it whose purpose I don't understand:
$sql = 'SELECT p.*,
a.screen_name AS author_name,
c.name AS category_name
FROM
posts p
LEFT JOIN
admin_users a ON p.author_id = a.id
LEFT JOIN
categories c ON p.category_id = c.id
WHERE
p.id = ?';
I have read about left joins but didn't understand them. Can somebody please explain the code I shared?
Thanks in advance!
Imagine you have two tables. One that stores the information about the programmers on your website, and the other table that keeps track of their online purchases.
PROGRAMMERS Table
+----+----------+-----+------------+---------+
| ID | NAME     | AGE | ADDRESS    | SALARY  |
+----+----------+-----+------------+---------+
| 1  | Desire   | 32  | 123 fake s | 3000.00 |
| 2  | Jamin    | 25  | 234 fake s | 2500.00 |
| 3  | Jon      | 23  | 567 fake s | 2000.00 |
| 4  | Bob      | 30  | 789 fake s | 1500.00 |
| 5  | OtherGuy | 31  | 890 fake s | 1000.00 |
| 6  | DudeMan  | 32  | 901 fake s |  500.00 |
+----+----------+-----+------------+---------+
PURCHASES Table
+----------+---------+----------+-------+
| ORDER_ID | PROG_ID | DATE     | PRICE |
+----------+---------+----------+-------+
| 1        | 1       | 1-1-2017 | 100   |
| 2        | 2       | 1-2-2017 | 200   |
| 3        | 6       | 1-3-2017 | 300   |
+----------+---------+----------+-------+
Suppose you want to consolidate this information into a single result that contains only certain columns.
For example, for shipping purposes it would be nice to have a table with the ID, NAME, PRICE and DATE columns.
Currently, none of our tables contains all of those columns.
If we LEFT JOIN these tables, every programmer is kept, and the PRICE and DATE columns are filled with NULL values wherever there is no purchase to join.
SELECT ID, NAME, PRICE, DATE
FROM PROGRAMMERS
LEFT JOIN PURCHASES
ON PROGRAMMERS.ID = PURCHASES.PROG_ID;
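Notice that every row of the starting (left) table, PROGRAMMERS, appears in the result, while the columns from the right table, PURCHASES, are filled in only where a matching PROG_ID exists. Programmers without a purchase are kept even though that information is missing.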
RESULTING TABLE
+----+----------+-------+----------+
| ID | NAME     | PRICE | DATE     |
+----+----------+-------+----------+
| 1  | Desire   | 100   | 1-1-2017 |
| 2  | Jamin    | 200   | 1-2-2017 |
| 3  | Jon      | NULL  | NULL     |
| 4  | Bob      | NULL  | NULL     |
| 5  | OtherGuy | NULL  | NULL     |
| 6  | DudeMan  | 300   | 1-3-2017 |
+----+----------+-------+----------+
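To see what the LEFT in LEFT JOIN contributes, compare it with an INNER JOIN on the same data. The sketch below rebuilds the two example tables in SQLite via Python's sqlite3, keeping only the columns the query uses, and adds ORDER BY ID for a stable order. The INNER JOIN variant and the helper `programmer_purchases()` are additions for comparison only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PROGRAMMERS (ID INTEGER PRIMARY KEY, NAME TEXT)")
conn.execute(
    "CREATE TABLE PURCHASES (ORDER_ID INTEGER PRIMARY KEY, PROG_ID INTEGER,"
    " DATE TEXT, PRICE INTEGER)"
)
conn.executemany("INSERT INTO PROGRAMMERS VALUES (?, ?)", [
    (1, "Desire"), (2, "Jamin"), (3, "Jon"),
    (4, "Bob"), (5, "OtherGuy"), (6, "DudeMan"),
])
conn.executemany("INSERT INTO PURCHASES VALUES (?, ?, ?, ?)", [
    (1, 1, "1-1-2017", 100), (2, 2, "1-2-2017", 200), (3, 6, "1-3-2017", 300),
])


def programmer_purchases(join_kind):
    """Run the answer's query with either a LEFT or an INNER join."""
    if join_kind not in ("LEFT", "INNER"):  # only these two fixed keywords are interpolated
        raise ValueError(join_kind)
    sql = f"""
        SELECT ID, NAME, PRICE, DATE
        FROM PROGRAMMERS
        {join_kind} JOIN PURCHASES
          ON PROGRAMMERS.ID = PURCHASES.PROG_ID
        ORDER BY ID
    """
    return conn.execute(sql).fetchall()


left_rows = programmer_purchases("LEFT")    # all 6 programmers, NULL where no purchase
inner_rows = programmer_purchases("INNER")  # only the 3 programmers with a purchase
print(left_rows)
print(inner_rows)
```

An INNER JOIN keeps only rows with a partner on both sides. A LEFT JOIN keeps every row of the left table and pads the missing right-hand columns with NULL.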
For a visual representation of the difference between SQL JOINs, check out https://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
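Applied to the tutorial's query from the question, this works as follows. The query fetches one post by id, with the ? placeholder filled in by the prepared statement. The two LEFT JOINs then look up the post's author name and category name. Because they are LEFT joins, the post is still returned when it has no category or when its author row is missing, and those columns simply come back NULL. The sketch below checks this in SQLite via Python's sqlite3, whose ? placeholders work like the ones in the tutorial. The table layouts beyond the columns the query names (e.g. the `title` column) and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE admin_users (id INTEGER PRIMARY KEY, screen_name TEXT);
    CREATE TABLE categories  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT,
                        author_id INTEGER, category_id INTEGER);
    INSERT INTO admin_users VALUES (1, 'alice');
    INSERT INTO categories  VALUES (1, 'News');
    INSERT INTO posts VALUES (1, 'Launch', 1, 1);    -- author and category exist
    INSERT INTO posts VALUES (2, 'Draft', 1, NULL);  -- no category
    INSERT INTO posts VALUES (3, 'Orphan', 99, 1);   -- author 99 does not exist
""")

# The tutorial's query, unchanged.
POST_SQL = """
    SELECT p.*,
           a.screen_name AS author_name,
           c.name AS category_name
    FROM posts p
    LEFT JOIN admin_users a ON p.author_id = a.id
    LEFT JOIN categories c ON p.category_id = c.id
    WHERE p.id = ?
"""
# Same query with plain (inner) joins, for comparison.
INNER_SQL = POST_SQL.replace("LEFT JOIN", "JOIN")

for post_id in (1, 2, 3):
    print(conn.execute(POST_SQL, (post_id,)).fetchone())
print(conn.execute(INNER_SQL, (2,)).fetchone())  # None: post 2 vanishes without a category
```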

Compare different rows and bring out result

I have a table which requires me to pair certain rows together using a unique value that both the rows share.
For instance, in the table below:
+--------+----------+-----------+-----------+----------------+-------------+
| id | type | member | code | description | matching |
+--------+----------+-----------+-----------+----------------+-------------+
| 1000 |transfer | 552123 | SC120314 | From Gold | |
| 1001 |transfer | 552123 | SC120314 | To Platinum | |
| 1002 |transfer | 833612 | SC120314 | From silver | |
| 1003 |transfer | 833612 | SC120314 | To basic | |
| 1004 |transfer | 457114 | SC150314 | From Platinum | |
| 1005 |transfer | 457114 | SC150314 | To silver | |
| 1006 |transfer | 933276 | SC180314 | From Gold | |
| 1007 |transfer | 933276 | SC180314 | From To basic | |
+--------+----------+-----------+-----------+----------------+-------------+
Basically, what I need the query/routine to do is find the rows whose values in the 'member' column match, and then check whether the values in the 'code' column of those same rows also match.
If both columns for both rows match, then assign a value to the 'matching' column for both rows. This value should be the same for both rows and unique to only them.
The unique code can be absolutely anything, so long as it's exclusive to matching rows. Is there any query / routine capable of carrying this out?
I'm not sure I understand the question correctly, but if you'd like to pick out and update the rows whose code and member columns match, and set matching to a unique value for each group of related rows, I believe this would work:
UPDATE <table> A
INNER JOIN (SELECT * FROM <table>) B
    ON B.member = A.member AND B.code = A.code AND A.id <> B.id
SET A.matching = (A.id + B.id);
The matching value will be set to the sum of the id columns for both rows. Notice that updating the matching field this way will not work if there are more than two rows that can match.
Running the above query against your example table would yield:
+------+----------+--------+----------+---------------+----------+
| id | type | member | code | description | matching |
+------+----------+--------+----------+---------------+----------+
| 1000 | transfer | 552123 | SC120314 | From Gold | 2001 |
| 1001 | transfer | 552123 | SC120314 | To Platinum | 2001 |
| 1002 | transfer | 833612 | SC120314 | From silver   | 2005     |
| 1003 | transfer | 833612 | SC120314 | To basic | 2005 |
| 1004 | transfer | 457114 | SC150314 | From Platinum | 2009 |
| 1005 | transfer | 457114 | SC150314 | To silver | 2009 |
| 1006 | transfer | 933276 | SC180314 | From Gold | 2013 |
| 1007 | transfer | 933276 | SC180314 | From To basic | 2013 |
+------+----------+--------+----------+---------------+----------+
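If more than two rows can share the same member and code, one alternative (not the answer's method) is to give every row of a group the group's smallest id. The sketch below computes that key with a GROUP BY ... HAVING COUNT(*) > 1 derived table joined back to the table, so rows without a partner get no key. It runs in SQLite via Python's sqlite3. Rows 1008 and 1009 are hypothetical, added to show a three-row group and an unmatched row. In MySQL, the same derived table should also work in a multi-table UPDATE tst t JOIN (...) g ON ... SET t.matching = g.group_id, which is the pattern the answer above uses.

```python
import sqlite3

ROWS = [
    (1000, "transfer", 552123, "SC120314", "From Gold"),
    (1001, "transfer", 552123, "SC120314", "To Platinum"),
    (1002, "transfer", 833612, "SC120314", "From silver"),
    (1003, "transfer", 833612, "SC120314", "To basic"),
    (1004, "transfer", 457114, "SC150314", "From Platinum"),
    (1005, "transfer", 457114, "SC150314", "To silver"),
    (1006, "transfer", 933276, "SC180314", "From Gold"),
    (1007, "transfer", 933276, "SC180314", "From To basic"),
    (1008, "transfer", 552123, "SC120314", "To Silver"),   # hypothetical third row in a group
    (1009, "transfer", 111111, "SC190314", "From basic"),  # hypothetical row with no partner
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tst (id INTEGER PRIMARY KEY, type TEXT, member INTEGER,"
    " code TEXT, description TEXT, matching INTEGER)"
)
conn.executemany(
    "INSERT INTO tst (id, type, member, code, description) VALUES (?, ?, ?, ?, ?)",
    ROWS,
)

# One key per (member, code) group with at least two rows: the group's smallest id.
MATCH_SQL = """
SELECT t.id, g.group_id AS matching
FROM tst t
JOIN (SELECT member, code, MIN(id) AS group_id
      FROM tst
      GROUP BY member, code
      HAVING COUNT(*) > 1) g
  ON g.member = t.member AND g.code = t.code
ORDER BY t.id
"""

for row in conn.execute(MATCH_SQL):
    print(row)
```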
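I can give you a simple query that does what you need.
tst is the name of the table.
SELECT t.*, COUNT(t2.id) AS matching
FROM tst t
LEFT JOIN tst t2 ON t2.member = t.member AND t2.code = t.code
GROUP BY t.id
Note that this sets matching to the number of rows sharing the same member and code (2 for each pair in the example), rather than to a value unique to each pair.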

MySQL - Use Header Name as Part of Query Filter

I'm relatively new to MySQL and have come across a problem to which I cannot seem to find a solution. I have searched but could not find an answer. I'm open to the possibility that I'm not asking the question correctly. Here goes:
I'm trying to use the name of a given column, and the values within that column, from one table to pull values from another table. The first table contains three columns with coded responses. The second table contains the definition of each code for each item. The same numeric code has different meanings depending on the item. For example:
table1 (this table cannot change):
--------------------------------------------------------------
|result_id | f_initial | l_name | item_A | item_B | item_C |
--------------------------------------------------------------
| 1 | j | doe | 1 | 3 | 2 |
| 2 | k | smith | 3 | 1 | 2 |
| 3 | l | williams | 2 | 2 | 1 |
--------------------------------------------------------------
table2 (this table can be modified, split, or whatever needs to be done):
-------------------------------------------
|item_id | item_name | score | definition |
-------------------------------------------
| 1 | item_A | 1 | agree |
| 2 | item_A | 2 | neutral |
| 3 | item_A | 3 | disagree |
| 4 | item_B | 1 | likely |
| 5 | item_B | 2 | not likely |
| 6 | item_B | 3 | no reply |
| 7 | item_C | 1 | yes |
| 8 | item_C | 2 | no |
-------------------------------------------
My goal is for the query to output the following:
--------------------------------------------------------------------
|result_id | f_initial | l_name | item_A | item_B | item_C |
--------------------------------------------------------------------
| 1 | j | doe | agree | no reply | no |
| 2 | k | smith | disagree | likely | no |
| 3 | l | williams | neutral | not likely | yes |
--------------------------------------------------------------------
Any assistance or guidance is greatly appreciated. Thank you in advance.
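You need to join table2 once per item column, matching its score against item_A, item_B and item_C respectively, and restrict each copy of table2 to the right item_name: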
select t1.result_id, t1.f_initial, t1.l_name,
t2a.definition as item_a,
t2b.definition as item_b,
t2c.definition as item_c
from table1 t1
join table2 t2a on t2a.score = t1.item_a
join table2 t2b on t2b.score = t1.item_b
join table2 t2c on t2c.score = t1.item_c
where t2a.item_name = 'item_A'
and t2b.item_name = 'item_B'
and t2c.item_name = 'item_C'
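The answer's query can be checked against the sample data. The sketch below does so in SQLite via Python's sqlite3 and also shows a variant, an addition rather than the answer's query, that uses LEFT JOIN with the item_name condition moved into each ON clause. With inner joins, a respondent whose code has no definition disappears from the result. The hypothetical row 4 below shows this: its item_C = 3 is not defined in table2. The LEFT JOIN variant keeps that row and returns NULL for the undefined item. The item_name test has to go in ON rather than WHERE, otherwise the NULL rows are filtered out again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (result_id INTEGER PRIMARY KEY, f_initial TEXT, l_name TEXT,
                         item_A INTEGER, item_B INTEGER, item_C INTEGER);
    CREATE TABLE table2 (item_id INTEGER PRIMARY KEY, item_name TEXT,
                         score INTEGER, definition TEXT);
    INSERT INTO table1 VALUES (1, 'j', 'doe', 1, 3, 2), (2, 'k', 'smith', 3, 1, 2),
                              (3, 'l', 'williams', 2, 2, 1),
                              (4, 'm', 'jones', 1, 1, 3);  -- hypothetical: item_C = 3 is undefined
    INSERT INTO table2 VALUES
        (1, 'item_A', 1, 'agree'), (2, 'item_A', 2, 'neutral'), (3, 'item_A', 3, 'disagree'),
        (4, 'item_B', 1, 'likely'), (5, 'item_B', 2, 'not likely'), (6, 'item_B', 3, 'no reply'),
        (7, 'item_C', 1, 'yes'), (8, 'item_C', 2, 'no');
""")

# The answer's query (inner joins), plus ORDER BY for a stable order.
ANSWER_SQL = """
    SELECT t1.result_id, t1.f_initial, t1.l_name,
           t2a.definition AS item_a, t2b.definition AS item_b, t2c.definition AS item_c
    FROM table1 t1
    JOIN table2 t2a ON t2a.score = t1.item_a
    JOIN table2 t2b ON t2b.score = t1.item_b
    JOIN table2 t2c ON t2c.score = t1.item_c
    WHERE t2a.item_name = 'item_A' AND t2b.item_name = 'item_B' AND t2c.item_name = 'item_C'
    ORDER BY t1.result_id
"""

# Variant: LEFT JOIN, with the item_name filter inside each ON clause.
LEFT_SQL = """
    SELECT t1.result_id, t1.f_initial, t1.l_name,
           t2a.definition AS item_A, t2b.definition AS item_B, t2c.definition AS item_C
    FROM table1 t1
    LEFT JOIN table2 t2a ON t2a.item_name = 'item_A' AND t2a.score = t1.item_A
    LEFT JOIN table2 t2b ON t2b.item_name = 'item_B' AND t2b.score = t1.item_B
    LEFT JOIN table2 t2c ON t2c.item_name = 'item_C' AND t2c.score = t1.item_C
    ORDER BY t1.result_id
"""

print(conn.execute(ANSWER_SQL).fetchall())  # row 4 is missing
print(conn.execute(LEFT_SQL).fetchall())    # row 4 kept, item_C is None
```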

Nested MySql Select statement with "where in" clause

I'll try to detail this the best I can. I have a nested select statement with a where in clause, but the nested part of the select should be interpreted as a literal string (I believe this is the right terminology). However the default behavior of mysql leads to a result I do not want.
For example:
select class
from cs_item
where code="007"
+-------+
| class |
+-------+
| 1,3 |
+-------+
And this is the result if I explicitly type in (1,3) as part of the query:
select alpha,description
from cs_quality
where class in (1,3);
+-------+-------------+
| alpha | description |
+-------+-------------+
| STD | STD |
| XS | XS |
| 5 | Sch 5 |
| 10 | Sch 10 |
| 20 | Sch 20 |
| 40 | Sch 40 |
| 60 | Sch 60 |
| 80 | Sch 80 |
| 100 | Sch 100 |
| 120 | Sch 120 |
| 140 | Sch 140 |
| 160 | Sch 160 |
| XXS | XXS |
| 15L | 150# |
| 30L | 300# |
| 40L | 400# |
| 60L | 600# |
| 90L | 900# |
| 150L | 1500# |
| 200L | 2000# |
| 250L | 2500# |
| 300L | 3000# |
| 400L | 4000# |
| 600L | 6000# |
| 900L | 9000# |
+-------+-------------+
But when I nest the query to get the same result, I get:
select alpha,description
from cs_quality
where class in (select class from cs_item where code = "007")
+-------+-------------+
| alpha | description |
+-------+-------------+
| STD | STD |
| XS | XS |
| 5 | Sch 5 |
| 10 | Sch 10 |
| 20 | Sch 20 |
| 40 | Sch 40 |
| 60 | Sch 60 |
| 80 | Sch 80 |
| 100 | Sch 100 |
| 120 | Sch 120 |
| 140 | Sch 140 |
| 160 | Sch 160 |
| XXS | XXS |
+-------+-------------+
This is just the class in (1) part; the ",3" component is ignored. Is there a way for the result of the nested select to be interpreted as literal text?
Thanks all, much appreciated. I had a bit of trouble wording this question but will edit as needed.
Normalize, normalize, normalize your tables; in this case, table cs_item. You should NOT store multiple (comma-separated) values in one field.
Until you do that, you can use:
select alpha, description
from cs_quality
where FIND_IN_SET( class , (select class from cs_item where code = '007'))
or
select q.alpha, q.description
from cs_quality AS q
join cs_item AS i
on FIND_IN_SET( q.class , i.class )
where i.code = '007'
But using special functions such as FIND_IN_SET() instead of equality in JOIN conditions leads to very slow queries, because MySQL cannot use an index for such a condition. Storing comma-separated lists also leads to a ton of other problems. See here:
Is storing a comma separated list in a database column really that bad?
Short answer is: Yeah, it's that bad.
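Here is a minimal sketch of the normalized design. It assumes a hypothetical junction table `cs_item_class` with one row per (code, class) pair in place of the comma-separated class column. Once each class is its own row, both the original IN (subquery) form and a plain equality join return the class 1 and class 3 rows. The sketch runs in SQLite via Python's sqlite3. The table name, the class 2 row and the sample rows are illustrative only, and the class of each cs_quality row is inferred from the question's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cs_quality (alpha TEXT PRIMARY KEY, description TEXT, class INTEGER);
    -- One row per (code, class) pair replaces the comma-separated list '1,3'.
    CREATE TABLE cs_item_class (code TEXT, class INTEGER, PRIMARY KEY (code, class));
    INSERT INTO cs_quality VALUES
        ('STD', 'STD', 1), ('XS', 'XS', 1),
        ('15L', '150#', 3), ('30L', '300#', 3),
        ('X2', 'other', 2);                      -- hypothetical class 2 row
    INSERT INTO cs_item_class VALUES ('007', 1), ('007', 3), ('008', 2);
""")

# The question's nested select now works, because the subquery returns one row per class.
IN_SQL = """
    SELECT alpha, description FROM cs_quality
    WHERE class IN (SELECT class FROM cs_item_class WHERE code = ?)
    ORDER BY alpha
"""
# Equivalent plain equality join, which can use the junction table's primary key.
JOIN_SQL = """
    SELECT q.alpha, q.description
    FROM cs_quality AS q
    JOIN cs_item_class AS i ON i.class = q.class
    WHERE i.code = ?
    ORDER BY q.alpha
"""

print(conn.execute(IN_SQL, ("007",)).fetchall())
print(conn.execute(JOIN_SQL, ("007",)).fetchall())
```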
Your query needs to return multiple rows like this:
+-------+
| class |
+-------+
| 1     |
| 3     |
+-------+
Or else it is as if you are doing:
select alpha,description
from cs_quality
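where class in ("1,3");
Which you do not want. MySQL compares the integer class column with the string "1,3" as numbers, and converting "1,3" to a number stops at the comma and yields 1, so only the rows with class 1 are returned.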
Better to use a join instead of a nested query.