SQL query with three different tables with distinct - mysql

I'm having some trouble with a SQL query across 3 tables with different attributes. Here are the tables and the attributes that I'd like to query in each of them:
news_stories - time, headline
per_minute_quotes - security_id, timestamp, last_price
securities - name, id_bb, id
What I'd like to do is retrieve a security name, id from the securities table, find headlines that correspond to that security (with a timestamp) from the *news_stories* table and find the last_price for that security at the same time as the article from the per_minute_quotes table.
Does this make sense? Please see what I've managed to do so far below...
SELECT DISTINCT
`news_stories`.`time`
, `securities`.`name`
, `adjusted_daily_quotes`.`security_id`
, `news_stories`.`headline`
, `securities`.`id_bb`
, `securities`.`id`
FROM
`schema`.`adjusted_daily_quotes`
, `schema`.`securities`
, `schema`.`news_stories`
WHERE ( (`adjusted_daily_quotes`.`security_id`) = '498'
AND (`securities`.`id`) = '498'
AND (`securities`.`id_bb`) LIKE '267%'
AND (`news_stories`.`headline`) LIKE '%:267')
LIMIT 0,50;
This basically does the first part of my query, i.e. it isn't connected with the last_price yet. Here is my attempt at doing that:
SELECT DISTINCT
`news_stories`.`time`
, `securities`.`name`
, `per_minute_quotes`.`security_id`
, `news_stories`.`headline`
, `securities`.`id_bb`
, `securities`.`id`
, `per_minute_quotes`.`timestamp`
, `per_minute_quotes`.`last_price`
FROM
`schema`.`per_minute_quotes`
, `schema`.`securities`
, `schema`.`news_stories`
WHERE ( (`per_minute_quotes`.`security_id`) = '498'
AND (`securities`.`id`) = '498'
AND (`securities`.`id_bb`) LIKE '267%'
AND (`news_stories`.`headline`) LIKE '%:267 HK'
AND (`per_minute_quotes`.`timestamp`) <= (`news_stories`.`time`))
LIMIT 0,5;
However, this query returns 5 of the same headline for some reason, all with the same time. I would really appreciate help with forming this query. Does that have something to do with the DISTINCT operator? I've tried using GROUP BY but with no luck.
Thanks in advance!

This is probably by far the easiest way to do it / explain it, although there are other ways.
SELECT
s.name
, s.id
, ns.headline
, pmq.last_price
FROM
securities s
JOIN
news_stories ns
ON ns.headline LIKE '%:267 HK%'
JOIN
per_minute_quotes pmq
ON pmq.security_id = s.id
AND pmq.timestamp = (
SELECT
MAX(q.timestamp)
FROM
per_minute_quotes q
WHERE
q.security_id = s.id
AND q.timestamp <= ns.time
)
WHERE
s.id = '498'
LIMIT 0,5;
The easiest way to do this is with explicit joins; you were already joining, just with the older comma syntax. The other important thing you need is the aggregation (MAX). The sub-query finds the latest per_minute_quotes timestamp that is less than or equal to the time your news story was published, and the join keeps only the quote row with that timestamp. The sub-query is correlated (it refers to s.id and ns.time), which is why it sits in the ON clause: a plain derived table cannot refer to columns of the other tables in the FROM clause. Your version returned the same headline five times because every quote with a timestamp at or before the story's time satisfies the condition, so each headline is paired with many quote rows. You were pretty close, it just needed a bit of refactoring.
*I may have mistakes in here as I typed it in Notepad and copy and pasted... and it's 4 AM and I should be in bed.
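To see the "latest quote at or before the story" idea in isolation, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows, the 'Example Co' name and the column types are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE securities (id INTEGER PRIMARY KEY, name TEXT, id_bb TEXT);
CREATE TABLE news_stories (time TEXT, headline TEXT);
CREATE TABLE per_minute_quotes (security_id INTEGER, timestamp TEXT, last_price REAL);

-- Made-up sample rows, for illustration only.
INSERT INTO securities VALUES (498, 'Example Co', '267 HK');
INSERT INTO news_stories VALUES
    ('2012-01-02 09:31:30', 'Profit warning :267 HK'),
    ('2012-01-02 09:33:10', 'Trading update :267 HK');
INSERT INTO per_minute_quotes VALUES
    (498, '2012-01-02 09:30:00', 10.0),
    (498, '2012-01-02 09:31:00', 10.5),
    (498, '2012-01-02 09:32:00', 11.0),
    (498, '2012-01-02 09:33:00', 11.5);
""")

# For each story, keep only the quote whose timestamp is the latest one
# that is not after the story's time (correlated MAX sub-query).
rows = conn.execute("""
SELECT ns.time, ns.headline, pmq.timestamp, pmq.last_price
FROM securities s
JOIN news_stories ns ON ns.headline LIKE '%:267 HK%'
JOIN per_minute_quotes pmq
  ON pmq.security_id = s.id
 AND pmq.timestamp = (SELECT MAX(q.timestamp)
                      FROM per_minute_quotes q
                      WHERE q.security_id = s.id
                        AND q.timestamp <= ns.time)
WHERE s.id = 498
ORDER BY ns.time
""").fetchall()

for row in rows:
    print(row)
```

Each story comes back exactly once, paired with the quote from the minute it was published in, instead of once per earlier quote.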

Related

SQL query optimization taking lot of time in execution

We have two tables: properties and property meta. When we get data from the "properties" table alone, the query takes less than one second to execute, but when we use a join to get data from both tables with the query below, it takes more than 5 seconds, although we have only 12,000 records in the tables. I think there is an issue in the SQL query; any help or suggestion will be appreciated.
SELECT
u.id,
u.property_title,
u.description,
u.city,
u.area,
u.address,
u.slug,
u.latitude,
u.longitude,
u.sync_time,
u.add_date,
u.is_featured,
u.pre_construction,
u.move_in_date,
u.property_status,
u.sale_price,
u.mls_number,
u.bedrooms,
u.bathrooms,
u.kitchens,
u.sub_area,
u.property_type,
u.main_image,
u.area_size as land_area,
pm7.meta_value as company_name,
pm8.meta_value as virtual_tour,
u.year_built,
u.garages
FROM
tbl_properties u
LEFT JOIN tbl_property_meta pm7
ON u.id = pm7.property_id
LEFT JOIN tbl_property_meta pm8
ON u.id = pm8.property_id
WHERE
u.status = 1
AND (pm7.meta_key = 'company_name')
AND (pm8.meta_key = 'virtual_tour')
AND (
(
( u.city = 'Delta'
OR u.post_code LIKE '%Delta%'
OR u.sub_area LIKE '%Delta%'
OR u.state LIKE '%Delta%')
AND country = 'Canada'
)
OR (
( u.city = 'Metro Vancouver Regional District'
OR u.post_code LIKE '%Metro Vancouver Regional District%'
OR u.sub_area LIKE '%Metro Vancouver Regional District%'
OR u.state LIKE '%Metro Vancouver Regional District%' )
AND country = 'Canada'
)
)
AND
u.pre_construction ='0'
GROUP BY
u.id
ORDER BY
u.is_featured DESC,
u.add_date DESC
Try adding this compound index:
ALTER TABLE tbl_property_meta ADD INDEX id_key (property_id, meta_key);
If it doesn't help make things faster, try this one.
ALTER TABLE tbl_property_meta ADD INDEX key_id (meta_key, property_id);
And, you should know that column LIKE '%somevalue' (with a leading %) is a notorious performance antipattern, resistant to optimization via indexes. (There's a way to create indexes for that shape of filter in PostgreSQL, but not in MariaDB / MySQL.)
Add another column with the meta stuff; throw city, post_code, sub_area, and state and probably some other things into it. Then build a FULLTEXT index on that column. Then use MATCH(..) AGAINST("Delta Metro Vancouver Regional District") in the WHERE clause instead of the LEFT JOINs (which are actually INNER JOINs, because the WHERE clause filters on pm7.meta_key and pm8.meta_key) and the really messy part of the WHERE clause.
Also, the GROUP BY is probably unnecessary, thereby eliminating extra sort on the intermediate set of rows.
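The remark that the LEFT JOINs are actually INNER JOINs can be checked with a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL and two made-up properties, one of which has no meta rows; the table and column names follow the question, the data is hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_properties (id INTEGER PRIMARY KEY, property_title TEXT);
CREATE TABLE tbl_property_meta (property_id INTEGER, meta_key TEXT, meta_value TEXT);

-- Made-up rows: property 2 has no meta data at all.
INSERT INTO tbl_properties VALUES (1, 'With meta'), (2, 'Without meta');
INSERT INTO tbl_property_meta VALUES (1, 'company_name', 'Acme Realty');
""")

# Filter on the joined table in WHERE: rows with no match have NULL
# meta_key, fail the comparison and are discarded.
filter_in_where = conn.execute("""
SELECT u.id, pm7.meta_value
FROM tbl_properties u
LEFT JOIN tbl_property_meta pm7 ON u.id = pm7.property_id
WHERE pm7.meta_key = 'company_name'
ORDER BY u.id
""").fetchall()

# Same filter moved into the ON clause: unmatched properties survive.
filter_in_on = conn.execute("""
SELECT u.id, pm7.meta_value
FROM tbl_properties u
LEFT JOIN tbl_property_meta pm7
  ON u.id = pm7.property_id AND pm7.meta_key = 'company_name'
ORDER BY u.id
""").fetchall()

print(filter_in_where)  # property 2 is dropped
print(filter_in_on)     # property 2 is kept with a NULL meta_value
```

If properties without a company_name or virtual_tour should still be listed, the meta_key conditions belong in the ON clauses; if not, writing INNER JOIN makes the intent explicit.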

MySQL complicated query | extracting a phrase from table of words

I'm working through MySQL connector in python on a project where I'm analyzing books.
I would gladly accept any help with my issue (explained below).
The relevant DB structures:
each Word, in each book, has its own word_id(primary key) and text.
each Word_instance has word_id, word_serial, offset in line, sentence number and so on...
the entity Word_instance's word_serial is its offset from the beginning of the book.
each Phrase has its own id and text.
each Phrase_word has phrase_id and word_id(from above).
Right now, I'm trying to figure out how to build a query that will locate a phrase from the user in the database.
Words are a part of a phrase if they have consecutive word_serial and are in the same sentence.
so far I've managed to build the following mess of a query:
select book_id
, word_txt
, word_serial
, sentence_serial
, ROW_NUMBER() Over (partition by sentence_serial, book_id) as encounter_num
from word
join word_instance
on word.word_id = word_instance.word_id
join word_in_phrase
on word.word_id = word_in_phrase.word_id
where phrase_id = %s
order
by book_id
, sentence_serial
, word_serial
In the following table image is the result set of said query.
Let's say the user has entered the phrase: "I believe in cause".
In that case I would need to extract word_serial = 562, as it is the beginning of said phrase.
Can I accomplish such a task without extracting row by row and assessing whether the current row is part of the phrase and in the correct order?
In fact, there are way too many rows to examine outside of SQL to consider that a possibility.
I will appreciate your help immensely, as I've been stuck on this issue for far too long...
As requested, I'm uploading images of relevant DB entities:
Word_in_phrase entity
Word_instance entity
word entity
This probably isn't the most efficient way of writing this, but I think it works in principle and you could tinker with it as you wanted. Note that I assumed phrases can't cross sentence boundaries (wi2.sentence_serial = wi1.sentence_serial) and I've assumed a column word_in_phrase.order_id exists that starts at 0 and increases by 1 for each word. I'm also assuming word_serial increases by 1 for each word in a book. (You could make those assumptions true by using CTEs where that is true instead of the real tables).
with phrase as (
SELECT *
FROM word_in_phrase
WHERE phrase_id = %s
)
select wi1.book_id
, word_txt
, wi1.word_serial
, wi1.sentence_serial
from word
join word_instance wi1
on word.word_id = wi1.word_id
where (SELECT COUNT(*) FROM phrase) = (SELECT COUNT(*) FROM word_instance wi2 INNER JOIN phrase on wi2.word_id = phrase.word_id WHERE wi2.book_id = wi1.book_id and wi2.sentence_serial = wi1.sentence_serial and wi2.word_serial = wi1.word_serial + phrase.order_id)
order
by wi1.book_id
, wi1.sentence_serial
, wi1.word_serial
Alternatively, you might prefer something like
with phrase as (
SELECT *
FROM word_in_phrase
WHERE phrase_id = %s
)
select wi1.book_id
, word_txt
, wi1.word_serial
, wi1.sentence_serial
from word
join word_instance wi1
on word.word_id = wi1.word_id
inner join word_instance wi2
on wi2.book_id = wi1.book_id and wi2.sentence_serial = wi1.sentence_serial
INNER JOIN phrase
on wi2.word_id = phrase.word_id
WHERE wi2.word_serial = wi1.word_serial + phrase.order_id
GROUP BY
wi1.book_id
, word_txt
, wi1.word_serial
, wi1.sentence_serial
HAVING COUNT(*) = (SELECT COUNT(*) FROM phrase)
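A small runnable sketch of the GROUP BY / HAVING variant, using Python's sqlite3 module in place of MySQL. The order_id column, the phrase id 7, the word ids and the sample sentences are all assumptions made for illustration; word_serial 562 is placed at the start of the phrase to mirror the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word (word_id INTEGER PRIMARY KEY, word_txt TEXT);
CREATE TABLE word_instance (word_id INTEGER, book_id INTEGER,
                            sentence_serial INTEGER, word_serial INTEGER);
-- order_id is a hypothetical column: position of the word in the phrase.
CREATE TABLE word_in_phrase (phrase_id INTEGER, word_id INTEGER, order_id INTEGER);

INSERT INTO word VALUES (1,'I'), (2,'believe'), (3,'in'), (4,'cause'), (5,'you');

-- Sentence 1: "you believe"; 2: "I believe in cause"; 3: "in cause I believe"
INSERT INTO word_instance VALUES
    (5,1,1,560), (2,1,1,561),
    (1,1,2,562), (2,1,2,563), (3,1,2,564), (4,1,2,565),
    (3,1,3,566), (4,1,3,567), (1,1,3,568), (2,1,3,569);

-- Phrase 7 = "I believe in cause"
INSERT INTO word_in_phrase VALUES (7,1,0), (7,2,1), (7,3,2), (7,4,3);
""")

# A start position qualifies when every phrase word is found in the same
# sentence at start + order_id, i.e. the match count equals the phrase length.
matches = conn.execute("""
WITH phrase AS (
    SELECT word_id, order_id FROM word_in_phrase WHERE phrase_id = ?
)
SELECT wi1.book_id, wi1.sentence_serial, wi1.word_serial
FROM word_instance wi1
JOIN word_instance wi2
  ON wi2.book_id = wi1.book_id
 AND wi2.sentence_serial = wi1.sentence_serial
JOIN phrase
  ON wi2.word_id = phrase.word_id
 AND wi2.word_serial = wi1.word_serial + phrase.order_id
GROUP BY wi1.book_id, wi1.sentence_serial, wi1.word_serial
HAVING COUNT(*) = (SELECT COUNT(*) FROM phrase)
""", (7,)).fetchall()

print(matches)
```

Only the start of sentence 2 is reported; sentence 3 contains the same four words but in the wrong order, so no start position there reaches the full count.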

Inner join in combination with sum and group by

I am having some strange issue with the SQL statement below. The result groups by user IDs and some of them turn out right but for one of them (user ID = 1) the "initial_average" is multiplied by 3. I really have no idea why.. Is there something wrong with the structure of the statement? If it is not clear, the aim is to sum the field "initial_avg" in the "tasks" table and have it broken out by user. Some help with this is much appreciated. I am using MySQL.
SELECT sum(initial_avg) AS initial_average
, sum(initial_std) AS initial_standard_dev
, tasks.user
, hourly_rate
FROM tasks
INNER JOIN user_project
ON tasks.user=user_project.user
AND tasks.project=59
AND tasks.user=1
GROUP BY tasks.user
I just solved it by adding another "and" clause (AND user_project.project=59 )
Optimize your query (Try it):
SELECT SUM(initial_avg) AS initial_average
, SUM(initial_std) AS initial_standard_dev
, tasks.user
, hourly_rate
FROM tasks
INNER JOIN user_project
ON tasks.user = user_project.user
AND tasks.project = user_project.project
WHERE tasks.project = 59
AND tasks.user = 1
GROUP BY tasks.user, hourly_rate
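The factor of 3 is what you would expect if user 1 is attached to three projects in user_project: joining on user alone pairs every task row with all three of them. A minimal sketch of that effect, using Python's sqlite3 module in place of MySQL and made-up rows (the three projects and the rates are assumptions, not data from the question):

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (user INTEGER, project INTEGER, initial_avg REAL, initial_std REAL);
CREATE TABLE user_project (user INTEGER, project INTEGER, hourly_rate REAL);

-- Made-up rows: user 1 has two tasks on project 59 and belongs to 3 projects.
INSERT INTO tasks VALUES (1, 59, 10.0, 1.0), (1, 59, 5.0, 0.5);
INSERT INTO user_project VALUES (1, 59, 40.0), (1, 60, 45.0), (1, 61, 50.0);
""")

# Join on user only: each task row matches all three user_project rows.
sum_user_only = conn.execute("""
SELECT SUM(t.initial_avg)
FROM tasks t
JOIN user_project up ON t.user = up.user
WHERE t.project = 59 AND t.user = 1
""").fetchone()[0]

# Join on user and project: each task row matches exactly one row.
sum_user_and_project = conn.execute("""
SELECT SUM(t.initial_avg)
FROM tasks t
JOIN user_project up ON t.user = up.user AND t.project = up.project
WHERE t.project = 59 AND t.user = 1
""").fetchone()[0]

print(sum_user_only, sum_user_and_project)
```

The first sum is three times the second, which is why adding the project condition to the join fixes the result.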

MySQL multiple joins and count distinct

I need to write a query that joins several tables, and I need a distinct value from one table based on the max count().
These are my tables names and columns:
bands:
db|name|
releases_artists:
release_db|band_db
releases_styles
release_db|style
Relations between tables are (needed for JOINs):
releases_artists.band_db = bands.db
releases_styles.release_db = releases_artists.release_db
And now the query that I need to write:
SELECT b.name, most_common_style
LEFT JOIN releases_artists ra ON ra.band_db = b.db
and here I need to find the most common style from all band releases
JOIN(
SELECT DISTINCT style WHERE releases_styles.release_db = ra.release_db ORDER BY COUNT() DESC LIMIT 1
)
FROM bands b
WHERE b.name LIKE 'something'
This is just a non working example of what I want to accomplish. It would be great if someone could help me build this query.
Thanks in advance.
EDIT 1
Each artist from table bands can have multiple records from releases_artists table based on band_db and each release can have multiple styles from releases_styles based on release_db
So if I search for b.name LIKE '%ray%' it returns something similar to:
`bands`:
o7te|Ray Wilson
9i84|Ray Parkey Jr.
`releases_artists` for Ray Wilson:
tv5c|o7te (for example album `Change`)
78wz|o7te (`The Next Best Thing`)
nz7c|o7te (`Propaganda Man`)
`releases_styles`
tv5c|Pop
tv5c|Rock
tv5c|Alternative Pop/Rock
----
78wz|Rock
78wz|Pop
78wz|Classic Rock
I need the style name that occurs most often across all of an artist's releases, to use as that artist's main style.
Ok, this is a bit of a hack. But the only alternatives I could think of involve heaps of nested subqueries. So here goes:
SELECT name
, SUBSTRING_INDEX(GROUP_CONCAT(style ORDER BY release_count DESC SEPARATOR '|'), '|', 1) AS most_common_style
FROM (
SELECT b.db
, b.name
, rs.style
, COUNT(*) AS release_count
FROM bands b
JOIN releases_artists ra ON ra.band_db = b.db
JOIN releases_styles rs ON rs.release_db = ra.release_db
GROUP BY b.db, rs.style
) s
GROUP BY db;
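For comparison, here is a sketch of one of the subquery-based alternatives: a correlated scalar subquery that counts styles per band and keeps the top one. This is a different technique from the GROUP_CONCAT hack above, not a restatement of it. It runs on Python's sqlite3 module as a stand-in for MySQL, reuses the sample rows from the question, and adds one hypothetical row (nz7c|Rock) so that there is a clear winner.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bands (db TEXT PRIMARY KEY, name TEXT);
CREATE TABLE releases_artists (release_db TEXT, band_db TEXT);
CREATE TABLE releases_styles (release_db TEXT, style TEXT);

INSERT INTO bands VALUES ('o7te', 'Ray Wilson'), ('9i84', 'Ray Parkey Jr.');
INSERT INTO releases_artists VALUES ('tv5c','o7te'), ('78wz','o7te'), ('nz7c','o7te');
INSERT INTO releases_styles VALUES
    ('tv5c','Pop'), ('tv5c','Rock'), ('tv5c','Alternative Pop/Rock'),
    ('78wz','Rock'), ('78wz','Pop'), ('78wz','Classic Rock'),
    ('nz7c','Rock');  -- hypothetical row, not in the question's sample
""")

# For each band, count its releases per style and keep the most frequent
# one; ties are broken alphabetically so the result is deterministic.
rows = conn.execute("""
SELECT b.name,
       (SELECT rs.style
        FROM releases_artists ra
        JOIN releases_styles rs ON rs.release_db = ra.release_db
        WHERE ra.band_db = b.db
        GROUP BY rs.style
        ORDER BY COUNT(*) DESC, rs.style
        LIMIT 1) AS most_common_style
FROM bands b
WHERE b.name LIKE '%ray%'
ORDER BY b.name
""").fetchall()

print(rows)
```

Bands with no releases still appear, with a NULL style, which is the behaviour the LEFT JOIN in the question was aiming for.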

How to ORDER a LEFT JOIN for 2 tables

I've (on reflection ridiculously) stored our 'punters' in 2 tables depending on whether they registered or paid through the express checkout form.
My SQL looks like this:
SELECT
DISTINCT(sale_id), sale_punter_type, sale_comment, sale_refund, sale_timestamp,
punter_surname, punter_firstname,
punter_checkout_surname, punter_checkout_firstname,
punter_compo_surname, punter_compo_firstname,
sale_random, sale_scanned, sale_id
FROM
sale
LEFT JOIN
punter ON punter_id = sale_punter_no
LEFT JOIN
punter_checkout ON punter_checkout_id = sale_punter_no
LEFT JOIN
punter_compo ON punter_compo_id = sale_punter_no
WHERE
sale_event_no = :id
ORDER BY
punter_surname, punter_firstname,
punter_checkout_surname, punter_checkout_firstname
It returns results BUT lists the registered users alphabetically first, then the checkout punters alphabetically.
My question: is there a way to get all of the users (registered or checkout) together, sorted alphabetically in 1 sorted list instead of 2 joined sorted lists?
I thought maybe I could use something like punter_checkout_surname AS punter_surname but that didn't work.
Any thoughts? I know now that I shouldn't have used 2 separate tables but I'm stuck with it now.
Thank you.
I think you just want to use coalesce().
ORDER BY COALESCE(punter_surname, punter_checkout_surname),
COALESCE(punter_firstname, punter_checkout_firstname)
Other comments:
I doubt that DISTINCT is necessary. Why would this generate multiple rows for a single sale_id?
When a query has multiple tables, qualify all the column names (that is, include table aliases so you and others know which table each column comes from).
Your data has three sets of names. That seems overkill.
You might want to put the COALESCE() in the SELECT so you don't have quite so many names generated by the query.
Here's my answer:
ORDER BY
COALESCE( UCASE( punter_surname) , UCASE( punter_checkout_surname ), UCASE( punter_compo_surname ) ) ,
COALESCE( UCASE( punter_firstname ) , UCASE( punter_checkout_firstname ), UCASE( punter_compo_firstname) )
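A quick way to convince yourself that COALESCE() merges the two name sources into one sorted list is the sketch below. It uses Python's sqlite3 module as a stand-in for MySQL, only two of the punter tables, and made-up names and ids. SQLite has no UCASE(), so UPPER() is used instead (in MySQL, UCASE() is a synonym for UPPER()).

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, sale_punter_no INTEGER);
CREATE TABLE punter (punter_id INTEGER, punter_surname TEXT, punter_firstname TEXT);
CREATE TABLE punter_checkout (punter_checkout_id INTEGER,
                              punter_checkout_surname TEXT,
                              punter_checkout_firstname TEXT);

-- Made-up rows; the two id ranges deliberately do not overlap.
INSERT INTO punter VALUES (1, 'Young', 'Amy'), (2, 'baker', 'Bob');
INSERT INTO punter_checkout VALUES (101, 'Adams', 'Cal'), (102, 'Mills', 'Dee');
INSERT INTO sale VALUES (1, 1), (2, 101), (3, 2), (4, 102);
""")

# COALESCE picks whichever table supplied the name, so the ORDER BY sees a
# single surname and a single first name per sale.
rows = conn.execute("""
SELECT COALESCE(p.punter_surname, pc.punter_checkout_surname) AS surname,
       COALESCE(p.punter_firstname, pc.punter_checkout_firstname) AS firstname
FROM sale s
LEFT JOIN punter p ON p.punter_id = s.sale_punter_no
LEFT JOIN punter_checkout pc ON pc.punter_checkout_id = s.sale_punter_no
ORDER BY UPPER(COALESCE(p.punter_surname, pc.punter_checkout_surname)),
         UPPER(COALESCE(p.punter_firstname, pc.punter_checkout_firstname))
""").fetchall()

print(rows)
```

Registered and checkout punters are interleaved alphabetically, and putting the COALESCE() in the SELECT list also gives one pair of name columns instead of several.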