MySQL complicated query | extracting a phrase from a table of words

I'm using the MySQL connector for Python on a project where I'm analyzing books.
I would gladly accept any help with my issue (explained below).
The relevant DB structures:
each Word, in each book, has its own word_id(primary key) and text.
each Word_instance has word_id, word_serial, offset in line, sentence number and so on...
the entity Word_instance's word_serial is its offset from the beginning of the book.
each Phrase has its own id and text.
each Phrase_word has phrase_id and word_id(from above).
Right now, I'm trying to figure out how to build a query that will locate a phrase from the user in the database.
Words are a part of a phrase if they have consecutive word_serial and are in the same sentence.
so far I've managed to build the following mess of a query:
select book_id
, word_txt
, word_serial
, sentence_serial
, ROW_NUMBER() Over (partition by sentence_serial, book_id) as encounter_num
from word
join word_instance
on word.word_id = word_instance.word_id
join word_in_phrase
on word.word_id = word_in_phrase.word_id
where phrase_id = %s
order
by book_id
, sentence_serial
, word_serial
In the following table image is the result set of said query.
let's say the user has entered the phrase: "I believe in cause".
In that case I would need to extract word_serial = 562, as it is the beginning of said phrase.
Can I accomplish such a task without extracting row by row and assessing whether the current row is part of the phrase and in the correct order?
In fact, there are far too many rows to examine outside of SQL for that to be a possibility.
I will appreciate your help immensely, as I've been stuck on this issue for far too long...
As requested, I'm uploading images of relevant DB entities:
Word_in_phrase entity
Word_instance entity
word entity

This probably isn't the most efficient way of writing this, but I think it works in principle and you could tinker with it as you wanted. Note that I assumed phrases can't cross sentence boundaries (wi2.sentence_serial = wi1.sentence_serial) and I've assumed a column word_in_phrase.order_id exists that starts at 0 and increases by 1 for each word. I'm also assuming word_id increases by 1 each row. (You could make those assumptions true by using CTEs where that is true instead of the real tables).
with phrase as (
SELECT *
FROM word_in_phrase
WHERE phrase_id = %s
)
select book_id
, word_txt
, word_serial
, sentence_serial
from word
join word_instance wi1
on word.word_id = wi1.word_id
where (SELECT COUNT(*) FROM phrase) = (SELECT COUNT(*) FROM word_instance wi2 INNER JOIN phrase on wi2.word_id = phrase.word_id WHERE wi2.book_id = wi1.book_id and wi2.sentence_serial = wi1.sentence_serial and wi2.word_id = wi1.word_id + phrase.order_id)
order
by book_id
, sentence_serial
, word_serial
Alternatively, you might prefer something like
with phrase as (
SELECT *
FROM word_in_phrase
WHERE phrase_id = %s
)
select wi1.book_id
, word_txt
, wi1.word_serial
, wi1.sentence_serial
from word
join word_instance wi1
on word.word_id = wi1.word_id
inner join word_instance wi2
on wi2.book_id = wi1.book_id and wi2.sentence_serial = wi1.sentence_serial
INNER JOIN phrase
on wi2.word_id = phrase.word_id
WHERE wi2.word_id = wi1.word_id + phrase.order_id
GROUP BY
wi1.book_id
, word_txt
, wi1.word_serial
, wi1.sentence_serial
HAVING COUNT(*) = (SELECT COUNT(*) FROM phrase)
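For anyone who wants to try the counting idea end to end, here is a minimal sketch. It is not the query above verbatim: it matches on word_serial (the offset the question defines) instead of assuming word_id increases by 1 per row, it uses SQLite through Python's sqlite3 module as a stand-in for MySQL, and the phrase temp table, the find_phrase_starts helper and the sample rows are all made up for illustration.

```python
import sqlite3

def find_phrase_starts(conn, phrase_words):
    """Return (book_id, sentence_serial, word_serial) for every phrase start."""
    # Hypothetical helper table: one row per phrase word with its 0-based position.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS phrase (order_id INTEGER, word_txt TEXT)")
    conn.execute("DELETE FROM phrase")
    conn.executemany("INSERT INTO phrase VALUES (?, ?)", list(enumerate(phrase_words)))
    # wi1 is a candidate start; wi2 must sit order_id words later in the same
    # book and sentence and carry the phrase word expected at that position.
    return conn.execute("""
        SELECT wi1.book_id, wi1.sentence_serial, wi1.word_serial
        FROM word_instance wi1
        JOIN word_instance wi2
          ON wi2.book_id = wi1.book_id
         AND wi2.sentence_serial = wi1.sentence_serial
        JOIN word w ON w.word_id = wi2.word_id
        JOIN phrase p
          ON p.word_txt = w.word_txt
         AND wi2.word_serial = wi1.word_serial + p.order_id
        GROUP BY wi1.book_id, wi1.sentence_serial, wi1.word_serial
        HAVING COUNT(*) = (SELECT COUNT(*) FROM phrase)
        ORDER BY wi1.book_id, wi1.word_serial
    """).fetchall()

# Made-up sample data shaped like the tables described in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word (word_id INTEGER PRIMARY KEY, word_txt TEXT)")
conn.execute("""CREATE TABLE word_instance
                (book_id INTEGER, word_id INTEGER,
                 word_serial INTEGER, sentence_serial INTEGER)""")
words = {"I": 1, "believe": 2, "in": 3, "cause": 4}
conn.executemany("INSERT INTO word VALUES (?, ?)", [(i, w) for w, i in words.items()])
text = [  # (word_serial, sentence_serial, word)
    (560, 40, "in"), (561, 40, "cause"),
    (562, 40, "I"), (563, 40, "believe"), (564, 40, "in"), (565, 40, "cause"),
    (566, 41, "I"), (567, 41, "believe"), (568, 42, "in"), (569, 42, "cause"),
]
conn.executemany(
    "INSERT INTO word_instance VALUES (1, ?, ?, ?)",
    [(words[w], serial, sentence) for serial, sentence, w in text],
)

starts = find_phrase_starts(conn, ["I", "believe", "in", "cause"])
print(starts)  # [(1, 40, 562)]
```

The run starting at 566 is not reported because it crosses a sentence boundary. Matching by position also copes with a phrase that repeats a word, which a plain join on word_id alone would not.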

Related

Please how do I get these lines of codes to run error free

I keep getting this error, "The text, ntext, and image data types cannot be compared or sorted, except when using IS NULL or LIKE operator", while running these lines of code on SQL Server.
--I desire to obtain the names, texts and Last_authors considering their most recent entries.
Select PR1.Name, PR1.TText, PR1.Author as Last_Author
From PageRevision as PR1, PageRevision as PR2
Group by Pr1.Name, PR1.TText
Having PR1.DDate = max(PR2.DDate);
Well, it seems the easiest way is to find the latest DDate per Name in a subquery and join back to it. That also keeps TText out of the GROUP BY and the join condition; grouping or comparing a text column is what raises the error.
Select PR1.Name, PR1.TText, PR1.Author as Last_Author
From PageRevision as PR1
inner join (
-- latest revision date for each page name
Select Name, max(DDate) as DDate
From PageRevision
group by Name
) PR2
on PR1.Name = PR2.Name
and PR1.DDate = PR2.DDate
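A small runnable sketch of the "latest row per group" pattern, using SQLite via Python's sqlite3 as a stand-in for SQL Server. The table layout follows the question; the sample rows are invented, and SQLite has no text/ntext restriction, so this only illustrates the shape of the query (group by Name alone, then join back on Name and DDate).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PageRevision (Name TEXT, TText TEXT, Author TEXT, DDate TEXT)")
conn.executemany(
    "INSERT INTO PageRevision VALUES (?, ?, ?, ?)",
    [
        ("Home", "v1", "ann", "2020-01-01"),   # made-up rows
        ("Home", "v2", "bob", "2020-02-01"),
        ("About", "a1", "cy", "2020-01-15"),
    ],
)

# The subquery only touches Name and DDate, so the large text column
# never appears in GROUP BY or in a comparison.
latest = conn.execute("""
    SELECT pr1.Name, pr1.TText, pr1.Author AS Last_Author
    FROM PageRevision AS pr1
    JOIN (SELECT Name, MAX(DDate) AS DDate
          FROM PageRevision
          GROUP BY Name) AS pr2
      ON pr1.Name = pr2.Name AND pr1.DDate = pr2.DDate
    ORDER BY pr1.Name
""").fetchall()
print(latest)  # [('About', 'a1', 'cy'), ('Home', 'v2', 'bob')]
```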

Print results of multiple SQL queries as one result

So I have 3 SQL query results (example code below). I want the results to be displayed either as different columns or different tables. Is this even possible? If yes, please help as to how. The results are unrelated to each other
SELECT RouterName, RouterType, Loopback100, Loopback200, ResiliencyGroup,
DeploymentStatus
FROM Routers
WHERE RouterName = 'PE23-SNG-AP'
SELECT ARouter, AInterface, BRouter, BInterface
FROM netplan.LinksPACSLcl
WHERE ARouter = 'PE23-SNG-AP' OR Brouter = 'PE23-SNG-AP'
This is, I believe, a job for JOIN. With respect if you don't know about JOIN you should study it; it's a core feature of SQL. It combines rows from multiple tables into single output rows.
Try something like this
SELECT i.RouterName, i.RouterType, i.Loopback100, i.Loopback200,
i.ResiliencyGroup, i.DeploymentStatus,
j.ARouter, j.AInterface, j.BRouter, j.BInterface
FROM Routers i
LEFT JOIN netplan.LinksPACSLcl j ON (i.RouterName = j.ARouter
OR i.RouterName = j.BRouter)
WHERE i.RouterName = 'PE23-SNG-AP'
This generates a result set with the items from both your first and second tables, assigning the alias names i and j to those tables. The LEFT JOIN operation allows information from the first table to be shown even when nothing matches it in the second table.
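You should use UNION. This is a working sample. The important part is the column names: every SELECT must return the same number of columns, and the result takes its column names from the first SELECT, so look carefully at how col1 is assigned with AS.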
select UserName as col1 from Users
union
select FeatureName as col1 from Features
union
select TopicName as col1 from Topics
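If the result sets have different shapes, as in the question, one way to make UNION workable is to pad the shorter SELECT with NULLs and add a label column saying where each row came from. The sketch below assumes that approach and runs on SQLite through Python's sqlite3; the trimmed-down tables, the Links table name, the col1..col4 aliases and the sample values are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Routers (RouterName TEXT, RouterType TEXT);
    CREATE TABLE Links (ARouter TEXT, AInterface TEXT, BRouter TEXT, BInterface TEXT);
    -- made-up sample rows
    INSERT INTO Routers VALUES ('PE23-SNG-AP', 'PE');
    INSERT INTO Links VALUES ('PE23-SNG-AP', 'ge-0/0/1', 'PE10-HKG-AP', 'ge-0/0/2');
""")

router = "PE23-SNG-AP"
# Both SELECTs return five columns; the first one supplies the column names.
rows = conn.execute("""
    SELECT 'router' AS source, RouterName AS col1, RouterType AS col2,
           NULL AS col3, NULL AS col4
    FROM Routers
    WHERE RouterName = ?
    UNION ALL
    SELECT 'link', ARouter, AInterface, BRouter, BInterface
    FROM Links
    WHERE ARouter = ? OR BRouter = ?
    ORDER BY source
""", (router, router, router)).fetchall()

for row in rows:
    print(row)
```

UNION ALL is used here so that rows are not de-duplicated; plain UNION would also work but removes identical rows.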
If you can accept a json result ;)
SELECT JSON_ARRAY_APPEND('[]', '$',
JSON_EXTRACT((SELECT concat('[',
group_concat(
JSON_OBJECT('RouterName' , `RouterName`,
'RouterType' , `RouterType`,
'Loopback100' , `Loopback100`,
'Loopback200' , `Loopback200`,
'ResiliencyGroup' , `ResiliencyGroup`,
'DeploymentStatus', `DeploymentStatus`)
SEPARATOR ','),
']')
FROM Routers
WHERE RouterName = 'PE23-SNG-AP'), '$[*]'), '$',
JSON_EXTRACT((SELECT concat('[',
group_concat(
JSON_OBJECT('ARouter' , `ARouter`,
'AInterface', `AInterface`,
'BRouter' , `BRouter`,
'BInterface', `BInterface`)
SEPARATOR ','),
']')
FROM netplan.LinksPACSLcl
WHERE ARouter = 'PE23-SNG-AP'
OR Brouter = 'PE23-SNG-AP'), '$[*]'));
If you use sql server, you can use SELECT ... FOR XML RAW...
https://learn.microsoft.com/en-us/sql/relational-databases/xml/example-specifying-a-root-element-for-the-xml-generated-by-for-xml

Wordpress Search Serialized Meta Data with Custom Query

I'm trying to do a search on serialized post meta values in a wordpress database. I know the structure of the serial string so I can search for the preceding value, get the index and then get the number of characters I want before that index value. I cannot effectively use regexp in this particular query because I would like to sort based on the results. This is what I have so far, but I am failing on the syntax and I think it has to do with trying to use the AS keyword and my grouping the AND statements.
SELECT SQL_CALC_FOUND_ROWS _posts.ID FROM _posts
INNER JOIN _postmeta ON (_posts.ID = _postmeta.post_id)
WHERE 1=1
AND _posts.post_type = 'dog'
AND (_posts.post_status = 'publish')
AND ( (_postmeta.meta_key = '_meta_general'
AND CAST(_postmeta.meta_value AS CHAR)) AS dmet
AND POSITION(';s:6:\"weight' IN dmet) AS ddex
AND MID(dmet ,ddex,10)) AS dres
GROUP BY dres ORDER BY dres ASC LIMIT 0, 10
Well, I'm still having issues with the structure of this thing. The previous code did not work, @fenway, after closer inspection. Here is what I have now. The problem with @fenway's answer is that the MID and POSITION values were being called in the select part of the statement that was selecting "FROM" posts. They are located in postmeta. So I attempted to rearrange the string filtering after the INNER JOIN which is joining the postmeta table to the posts table by id. This is not working. I understand that this question is simply due to a lack of my knowledge in SQL, but I'm trying to learn here.
None of these are working as I want. There are syntactical errors. The purpose of the code is to group the returned query by a value that is inside of a serial(json) string. The method is to search for the following value (in this case it would be - ";s:6:"weight -). When I have the index of this string I want to return the preceding 10 characters (a date xx/xx/xxxx). I want to label this string (AS dres) and have the result sorted by dres. Wordpress gathers the posts from the posts table, then gathers the post meta from the postmeta table. The post meta table is where the json is stored. It is really a simple algorithm, it's just the syntax that is screwing with me.
SELECT SQL_CALC_FOUND_ROWS {$wpdb->posts}.ID
FROM {$wpdb->posts} INNER JOIN {$wpdb->postmeta}
MID(CAST({$wpdb->postmeta}.meta_value AS CHAR),
POSITION(';s:6:\"weight' IN CAST({$wpdb->postmeta}.meta_value AS CHAR) ),10 ) AS dres
ON ({$wpdb->posts}.ID = {$wpdb->postmeta}.post_id)
WHERE 1=1
AND {$wpdb->posts}.post_type = 'dog'
AND ({$wpdb->posts}.post_status = 'publish')
AND {$wpdb->postmeta}.meta_key = '_meta_general'
AND POSITION(';s:6:"weight' IN CAST({$wpdb->postmeta}.meta_value AS CHAR)) > 0
GROUP BY {$wpdb->posts}.ID ORDER BY dres ASC LIMIT 0, 10
You can't use column aliases in your WHERE clause -- what's more, in some cases, those expressions will always evaluate to TRUE, so I don't see why they are there at all.
Perhaps you mean:
SELECT SQL_CALC_FOUND_ROWS
_posts.ID
,MID(
CAST(_postmeta.meta_value AS CHAR),
POSITION(';s:6:\"weight' IN CAST(_postmeta.meta_value AS CHAR) ),
10
) AS dres
FROM _posts
INNER JOIN _postmeta ON (_posts.ID = _postmeta.post_id)
WHERE 1=1
AND _posts.post_type = 'dog' AND _posts.post_status = 'publish'
AND _postmeta.meta_key = '_meta_general'
AND POSITION(';s:6:\"weight' IN CAST(_postmeta.meta_value AS CHAR)) > 0
GROUP BY dres ORDER BY _postmeta.meta_value ASC LIMIT 0, 10
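One thing to watch: MID(value, pos, 10) returns 10 characters starting at the marker, while the question asks for the 10 characters before it, which means starting 10 positions earlier. Below is a minimal sketch of that variant, run on SQLite through Python's sqlite3 (substr and instr stand in for MySQL's MID and POSITION). The table names, the sample serialized values, and the choice to include the closing quote in the marker are assumptions made for illustration, since the real meta structure isn't shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (ID INTEGER, post_type TEXT, post_status TEXT);
    CREATE TABLE postmeta (post_id INTEGER, meta_key TEXT, meta_value TEXT);
    INSERT INTO posts VALUES (1, 'dog', 'publish'), (2, 'dog', 'publish'),
                             (3, 'dog', 'publish');
""")
conn.executemany(
    "INSERT INTO postmeta VALUES (?, '_meta_general', ?)",
    [
        # made-up PHP-serialized values
        (1, 'a:2:{s:4:"date";s:10:"03/15/2012";s:6:"weight";i:12;}'),
        (2, 'a:2:{s:4:"date";s:10:"01/20/2013";s:6:"weight";i:30;}'),
        (3, 'a:0:{}'),  # no weight entry, so it is filtered out
    ],
)

# Assumed marker: the quote closing the date string plus the next key.
marker = '";s:6:"weight'
rows = conn.execute("""
    SELECT p.ID,
           substr(m.meta_value, instr(m.meta_value, :marker) - 10, 10) AS dres
    FROM posts p
    JOIN postmeta m ON m.post_id = p.ID
    WHERE p.post_type = 'dog'
      AND p.post_status = 'publish'
      AND m.meta_key = '_meta_general'
      AND instr(m.meta_value, :marker) > 0
    ORDER BY dres ASC
    LIMIT 10
""", {"marker": marker}).fetchall()
print(rows)  # [(2, '01/20/2013'), (1, '03/15/2012')]
```

Note that dres is sorted as a string here, so mm/dd/yyyy dates will not come out in chronological order across years.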

SQL query with three different tables with distinct

I'm having some trouble with a SQL query across 3 tables with different attributes. Here are the tables and the attributes that I'd like to query in each of them:
news_stories - time, headline
per_minute_quotes - security_id, timestamp, last_price
securities - name, id_bb, id
What I'd like to do is retrieve a security name, id from the securities table, find headlines that correspond to that security (with a timestamp) from the *news_stories* table and find the last_price for that security at the same time as the article from the per_minute_quotes table.
Does this make sense? Please see what I've managed to do so far below...
SELECT DISTINCT
`news_stories`.`time`
, `securities`.`name`
, `adjusted_daily_quotes`.`security_id`
, `news_stories`.`headline`
, `securities`.`id_bb`
, `securities`.`id`
FROM
`schema`.`adjusted_daily_quotes`
, `schema`.`securities`
, `schema`.`news_stories`
WHERE ( (`adjusted_daily_quotes`.`security_id`) = '498'
AND (`securities`.`id`) = '498'
AND (`securities`.`id_bb`) LIKE '267%'
AND (`news_stories`.`headline`) LIKE '%:267')
LIMIT 0,50;
This will basically do the first part of my query, ie. it isn't connected with the last_price. Here is my attempt at doing that:
SELECT DISTINCT
`news_stories`.`time`
, `securities`.`name`
, `per_minute_quotes`.`security_id`
, `news_stories`.`headline`
, `securities`.`id_bb`
, `securities`.`id`
, `per_minute_quotes`.`timestamp`
, `per_minute_quotes`.`last_price`
FROM
`schema`.`per_minute_quotes`
, `schema`.`securities`
, `schema`.`news_stories`
WHERE ( (`per_minute_quotes`.`security_id`) = '498'
AND (`securities`.`id`) = '498'
AND (`securities`.`id_bb`) LIKE '267%'
AND (`news_stories`.`headline`) LIKE '%:267 HK'
AND (`per_minute_quotes`.`timestamp`) <= (`news_stories`.`time`))
LIMIT 0,5;
However, this query returns 5 of the same headline for some reason, all with the same time. I would really appreciate help with forming this query. Does that have something to do with the DISTINCT operator? I've tried using GROUP BY but with no luck.
Thanks in advance!
This is probably by far the easiest way to do it / explain it, although there are other ways.
SELECT
s.name
, s.id
, ns.headline
, pmq.last_price
FROM
securities s
JOIN
news_stories ns
ON ns.headline LIKE '%:267 HK%'
JOIN
per_minute_quotes pmq
ON pmq.security_id = s.id
AND pmq.timestamp = (
SELECT
MAX(q.timestamp)
FROM
per_minute_quotes q
WHERE
q.security_id = s.id
AND q.timestamp <= ns.time
)
WHERE
s.id = '498'
LIMIT 0,5;
The easiest way to do this is with joins, which you are already doing, just in the older comma-separated form. The other important thing you need is the aggregation (MAX) in the join condition. It is a correlated sub-query that finds the pmq row with the MAX timestamp that is less than or equal to the time your news story was published. You were pretty close, it just needed a bit of refactoring.
*I may have mistakes in here as I typed it in Notepad and copy and pasted... and it's 4 AM and I should be in bed.
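To check the "latest quote at or before the story time" idea on something small, here is a rough sketch using SQLite through Python's sqlite3 rather than MySQL. The column names follow the question; the security name, headlines, timestamps and prices are invented, and timestamps are stored as ISO strings so that string comparison matches time order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE securities (id INTEGER, name TEXT);
    CREATE TABLE news_stories (time TEXT, headline TEXT);
    CREATE TABLE per_minute_quotes (security_id INTEGER, timestamp TEXT, last_price REAL);
    -- made-up sample rows
    INSERT INTO securities VALUES (498, 'Example Co');
    INSERT INTO news_stories VALUES
        ('2012-01-02 10:05:00', 'Profit up :267 HK'),
        ('2012-01-02 10:05:30', 'Unrelated :5 HK'),
        ('2012-01-02 10:06:30', 'Guidance raised :267 HK');
    INSERT INTO per_minute_quotes VALUES
        (498, '2012-01-02 10:00:00', 10.0),
        (498, '2012-01-02 10:04:00', 10.5),
        (498, '2012-01-02 10:06:00', 11.0),
        (499, '2012-01-02 10:05:00', 99.0);
""")

# For each matching story, keep only the quote whose timestamp is the
# latest one not after the story time (correlated MAX sub-query).
rows = conn.execute("""
    SELECT s.name, ns.headline, pmq.last_price
    FROM securities s
    JOIN news_stories ns ON ns.headline LIKE '%:267 HK%'
    JOIN per_minute_quotes pmq
      ON pmq.security_id = s.id
     AND pmq.timestamp = (SELECT MAX(q.timestamp)
                          FROM per_minute_quotes q
                          WHERE q.security_id = s.id
                            AND q.timestamp <= ns.time)
    WHERE s.id = 498
    ORDER BY ns.time
""").fetchall()
print(rows)
# [('Example Co', 'Profit up :267 HK', 10.5),
#  ('Example Co', 'Guidance raised :267 HK', 11.0)]
```

Each story now appears once with a single price, instead of once per earlier quote, which is what produced the repeated headlines in the original attempt.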

Tricky SQL including outer join and case

I use data from http://geonames.org. The table structure is as follows:
GN_Name 1 - 0:N GN_AlternateName
They are linked on:
(PK)GN_Name.GeoNameId == (FK)GN_AlternateName.GeoNameId
GN_Name is the main table containing all place names.
GN_AlternateName contains names in other languages if any.
EX:
GN_Name.Name - Stockholm
GN_AlternateName.AlternateName - Estocolmo (if IsoLanguage=="es")
Rules:
I want to use GN_AlternateName.AlternateName if it exists for the specified language and if it starts with the search string.
If not, I want to use GN_Name.Name if it starts with the search string.
I want GeoNameId to be unique.
Basically I could outer join in first record only, but that seemed to decrease performance.
I've got the following SQL (basically modified SQL from a LINQ query). The problem is that it only finds 'Estocolmo' if search string starts with "stock". "estoc" yields nothing.
select
distinct(n.GeoNameId) as Id,
an.IsoLanguage,
CASE WHEN (an.AlternateName like N'estoc%')
THEN an.AlternateName
ELSE n.Name
END AS [The name we are going to use]
from GN_Name as n
LEFT OUTER JOIN GN_AlternateName as an
ON n.GeoNameId = an.GeoNameId
AND 'es' = an.IsoLanguage
WHERE n.Name like N'estoc%'
UPDATE
Thanks Rahul and Lee D.
I now have the following:
select
distinct(n.GeoNameId) as Id,
an.IsoLanguage,
CASE WHEN (an.AlternateName like N'estoc%')
THEN an.AlternateName
ELSE n.Name
END AS [The final name]
from GN_Name as n
LEFT OUTER JOIN GN_AlternateName as an
ON n.GeoNameId = an.GeoNameId
AND 'es' = an.IsoLanguage
WHERE (n.Name LIKE N'estoc%' OR an.AlternateName LIKE N'estoc%')
This performs LIKE twice on an.AlternateName. Is there any way I could get rid of one LIKE clause?
UPDATE 2
Andriy M made a nice alternative query using COALESCE. I changed it a little bit and ended up with the following:
SELECT Id, LocalisedName
FROM (
SELECT
n.GeoNameId AS Id,
an.IsoLanguage,
COALESCE(an.AlternateName, n.Name) AS LocalisedName
FROM GN_Name AS n
LEFT JOIN GN_AlternateName AS an ON n.GeoNameId = an.GeoNameId
AND IsoLanguage = 'es'
) x
WHERE LocalisedName LIKE 'estoc%'
This query does exactly what I am looking for. Thanks!
Here's a probable solution to the problem, which uses a slightly different approach:
SELECT Id, LocalisedName
FROM (
SELECT
n.GeoNameId AS Id,
an.IsoLanguage,
COALESCE(an.AlternateName, n.Name) AS LocalisedName
FROM GN_Name AS n
LEFT JOIN GN_AlternateName AS an ON n.GeoNameId = an.GeoNameId
AND IsoLanguage = 'es'
) x
WHERE LocalisedName LIKE 'estoc%'
(Changed it based on your update.)
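A quick way to see how the COALESCE version behaves is to run it on a few rows. The sketch below uses SQLite through Python's sqlite3 as a stand-in for SQL Server (SQLite's LIKE is case-insensitive for ASCII by default); the find_places helper and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE GN_Name (GeoNameId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE GN_AlternateName (GeoNameId INTEGER, IsoLanguage TEXT, AlternateName TEXT);
    -- made-up sample rows
    INSERT INTO GN_Name VALUES (1, 'Stockholm'), (2, 'Estoril');
    INSERT INTO GN_AlternateName VALUES (1, 'es', 'Estocolmo'), (1, 'fi', 'Tukholma');
""")

def find_places(conn, prefix, lang):
    """Localised name if one exists for lang, otherwise the main name."""
    return conn.execute("""
        SELECT Id, LocalisedName
        FROM (
            SELECT n.GeoNameId AS Id,
                   COALESCE(an.AlternateName, n.Name) AS LocalisedName
            FROM GN_Name AS n
            LEFT JOIN GN_AlternateName AS an
              ON n.GeoNameId = an.GeoNameId AND an.IsoLanguage = ?
        ) x
        WHERE LocalisedName LIKE ?
        ORDER BY Id
    """, (lang, prefix + "%")).fetchall()

print(find_places(conn, "estoc", "es"))  # [(1, 'Estocolmo')]
print(find_places(conn, "esto", "es"))   # [(1, 'Estocolmo'), (2, 'Estoril')]
print(find_places(conn, "stock", "es"))  # []
```

The last call shows one consequence of this form: once a row has an alternate name in the requested language, the main name is no longer searched, so "stock" finds nothing for 'es'. If a place can have several alternate names in the same language, it can also appear more than once, so GeoNameId is not guaranteed unique without extra handling.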
If I understand correctly, in your example the value 'Estocolmo' is in the GN_AlternateName.AlternateName column, so would be filtered out by the where clause which only looks at GN_Name.Name. What if you change the last line of SQL to:
WHERE n.Name LIKE N'estoc%' OR an.AlternateName LIKE N'estoc%'
I'm assuming 'estoc%' is your search string.
I guess you need to modify the WHERE clause to check the GN_AlternateName table as well:
WHERE n.Name like N'estoc%' OR an.AlternateName like N'estoc%'