Rails 4 .where.not - mysql

pry(main)> Loan.joins(:statistics).where(state: <some states>).where.not(statistics: {state: <some other states>}).order(created_at: "desc").last.statistics.map(&:state)
2015-09-21 20:53:54,423|65310|DEBUG|development| - Loan Load (0.9ms) SELECT `loans`.* FROM `loans` INNER JOIN `statistics` ON `statistics`.`loan_id` = `loans`.`id` WHERE `loans`.`state` IN ('started', 'pending_declined') AND (`statistics`.`state` NOT IN ('prequalified', 'conditionally_approved', '4506t_results_uploaded', 'customer_forms_uploaded', 'ready_for_etran', 'etran_verified', 'forms_to_be_verified', 'forms_verified', 'credit_memo_entered', 'loandoc_generated', 'loandoc_completed', 'loandoc_customer_received_need_signatures', 'signatures_checked_and_uploaded', 'boarded')) ORDER BY `loans`.`created_at` ASC LIMIT 1
2015-09-21 20:53:54,426|65310|DEBUG|development| - Statistic Load (0.3ms) SELECT DISTINCT `statistics`.* FROM `statistics` WHERE `statistics`.`loan_id` = 97
=> ["started", "prequalified", "conditionally_approved", "customer_forms_uploaded", "ready_for_etran", "pending_declined"]
So, maybe I'm not understanding what's going on here... I'm asking SQL to find me some Loans where their Statistics do not contain certain values. In this example, I'm saying to leave out any loans with a Statistic of prequalified, but, as you can see from the print out, the Loan#statistics does have prequalified, along with several other states I'd like to leave out.
Can anyone shed some light on this? I've been fighting with it for hours, and my head is spinning at this point.

With that ActiveRecord query, you've:
First, found a set of loans
next, ordered by created_at
then, used last to limit the set to 1 result; because last reverses the ordering, this finds the oldest of the set
So, you have an instance of Loan.
Since you called #statistics on that instance, I can infer that Loan has_many :statistics, so you've loaded every statistic whose foreign key matches the Loan you found, regardless of the earlier where.not condition. Now you have a set of statistics.
For that set of statistics, you've mapped each one to its state attribute.
Since you've already joined the statistics, try removing .last.statistics from your query. Use map on the result set to get each state. Also, consider using #includes or #select.

It's because you use last.statistics. That loads all statistics of the resulting loan object with a new query, which ignores the condition you created before.
Look at your last result query:
Statistic Load (0.3ms) SELECT DISTINCT `statistics`.* FROM `statistics` WHERE `statistics`.`loan_id` = 97
remove your last.statistics
Loan.joins(:statistics).where(state: <some states>).where.not(statistics: {state: <some other states>}).order(created_at: "desc").map(&:state)
or
if you want to add a condition that narrows down the loans before calling map(&:state)
Loan.joins(:statistics).where(state: <some states>).where.not(statistics: {state: <some other states>}).where("loans.id IN (97)").order(created_at: "desc")

Your query returns the product of Loan and Statistic rows, so it still returns Loan records that have at least one Statistic whose state is not in the list you specified.
If you only want Loans that have no Statistic in those states at all, you probably want your SQL to be something along these lines:
SELECT loans.*
FROM loans
LEFT OUTER JOIN (
SELECT statistics.loan_id, COUNT(*) AS count
FROM statistics
WHERE statistics.state IN ('prequalified', 'conditionally_approved')
GROUP BY statistics.loan_id
) statistics
ON statistics.loan_id = loans.id
WHERE loans.state IN ('started', 'pending_declined')
AND statistics.count IS NULL;
My SQLfu is not what I'd be proud of so this might not be the most optimised query ever but it should get the result you expect.
You could convert that to the ActiveRecord query interface, but unfortunately subqueries and LEFT JOIN are not really supported, at least not in the way we need here, so it will be something like this:
join_query = <<SQL
LEFT OUTER JOIN (
SELECT statistics.loan_id, COUNT(*) AS count
FROM statistics
WHERE statistics.state IN (<<state>>)
GROUP BY statistics.loan_id
) statistics ON loans.id = statistics.loan_id
SQL
Loan
.joins(join_query)
.where(statistics: { count: nil })
.where(state: <<somestate>>)
.order(created_at: :desc)
The <<SQL ... SQL is Heredoc by the way if you're not familiar with it.
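To make the difference between the row-level NOT IN filter and the "no statistic in those states at all" filter concrete, here is a minimal sketch run against an in-memory SQLite database instead of MySQL. The two sample loans and their statistics are made up for illustration, and the second query uses NOT EXISTS rather than the LEFT JOIN with a count shown above; it is a different way of expressing the same intent, not the answerer's exact query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE loans (id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE statistics (id INTEGER PRIMARY KEY, loan_id INTEGER, state TEXT);
INSERT INTO loans (id, state) VALUES (1, 'started'), (2, 'started');
-- Hypothetical data: loan 1 has an excluded state among its statistics, loan 2 does not.
INSERT INTO statistics (loan_id, state) VALUES
  (1, 'started'), (1, 'prequalified'),
  (2, 'started');
""")

# Row-level filter (what joins + where.not generates): the 'prequalified'
# row is dropped, but loan 1 still matches through its 'started' row.
row_filter = conn.execute("""
    SELECT DISTINCT loans.id
    FROM loans
    INNER JOIN statistics ON statistics.loan_id = loans.id
    WHERE loans.state IN ('started', 'pending_declined')
      AND statistics.state NOT IN ('prequalified', 'conditionally_approved')
    ORDER BY loans.id
""").fetchall()

# Loan-level filter: keep a loan only if none of its statistics is in an excluded state.
loan_filter = conn.execute("""
    SELECT loans.id
    FROM loans
    WHERE loans.state IN ('started', 'pending_declined')
      AND NOT EXISTS (
        SELECT 1 FROM statistics
        WHERE statistics.loan_id = loans.id
          AND statistics.state IN ('prequalified', 'conditionally_approved')
      )
    ORDER BY loans.id
""").fetchall()

print(row_filter)   # [(1,), (2,)]
print(loan_filter)  # [(2,)]
```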

Related

MySQL Query gets too complex for me

I'm trying to write a MYSQL Query that updates a cell in table1 with information gathered from 2 other tables;
The gathering of data from the other 2 tables goes without much issues (it is slow, but that's because one of the 2 tables has 4601537 records in it.. (because all the rows for one report are split in a separate record, meaning that 1 report has more than 200 records)).
The Query that I use to Join the two tables together is:
# First Table, containing Report_ID's: RE
# Table that has to be updated: REGI
# Join Table: JT
SELECT JT.report_id as ReportID, REGI.Serienummer as SerialNo FROM Blancco_Registration.TrialTable as REGI
JOIN (SELECT RE.Value_string, RE.report_id
FROM Blancco_new.mc_report_Entry as RE
WHERE RE.path_id=92) AS JT ON JT.Value_string = REGI.Serienummer
WHERE REGI.HardwareType="PC" AND REGI.BlanccoReport=0 LIMIT 100
This returns 100 records (I limit it because the database is in use during work hours and I don't want to steal all resources).
However, I want to use these results in a Query that updates the REGI table (which it uses to select the 100 records in the first place).
However, I get the error that I cannot select from the table itself while updating it (logically). So I tried selecting the SELECT statement above into a temp table and then updating from it; however, then I get too many results (logically! I only need 1 result and get 100), and I'm getting stuck in my own thoughts. I ultimately need to fill the ReportID into each record of REGI.
I know it should be possible, but I'm no expert in MySQL.. is there anybody that can point me into the right direction?
Ps. fixing the table containing 400k records is not an option, it's a program from an external developer and I can only read that database.
The errors I'm talking about are as follows:
Error Code: 1093. You can't specify target table 'TrialTable' for update in FROM clause
When I use:
UPDATE TrialTable SET TrialTable.BlanccoReport =
(SELECT JT.report_id as ReportID, REGI.Serienummer as SerialNo FROM Blancco_Registration.TrialTable as REGI
JOIN (SELECT RE.Value_string, RE.report_id
FROM Blancco_new.mc_report_Entry as RE
WHERE RE.path_id=92) AS JT ON JT.Value_string = REGI.Serienummer
WHERE REGI.HardwareType="PC" AND REGI.BlanccoReport=0 LIMIT 100)
WHERE TrialTable.HardwareType="PC" AND TrialTable.BlanccoReport=0)
Then I tried:
UPDATE TrialTable SET TrialTable.BlanccoReport = (SELECT ReportID FROM (<<and the rest of the SQL>>> ) as x WHERE X.SerialNo = TrialTable.Serienummer)
but that gave me the following error:
Error Code: 1242. Subquery returns more than 1 row
Running the query above with a LIMIT 1 gives every row the same result.
Firstly, your query seems to be functionally identical to the following:
SELECT RE.report_id ReportID
, REGI.Serienummer SerialNo
FROM Blancco_Registration.TrialTable REGI
JOIN Blancco_new.mc_report_Entry RE
ON RE.Value_string = REGI.Serienummer
WHERE REGI.HardwareType = "PC"
AND REGI.BlanccoReport=0
AND RE.path_id=92
LIMIT 100
So, why not use that?
EDIT:
I still don't get it. I can't see what part of the problem the following fails to solve...
UPDATE TrialTable REGI
JOIN Blancco_new.mc_report_Entry RE
ON RE.Value_string = REGI.Serienummer
SET REGI.BlanccoReport = RE.report_id
WHERE REGI.HardwareType = "PC"
AND REGI.BlanccoReport=0
AND RE.path_id=92;
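Error 1242 appears when a serial number matches more than one report entry, so a scalar subquery has no single value to return. As a hedged sketch of one way around that, the following uses a correlated subquery with an aggregate, run on in-memory SQLite rather than MySQL (the multi-table UPDATE ... JOIN syntax above is MySQL-specific). The sample rows are invented, and picking MAX(report_id) when several reports exist is an assumption; the thread does not say which report should win.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TrialTable (Serienummer TEXT, HardwareType TEXT, BlanccoReport INTEGER);
CREATE TABLE mc_report_Entry (report_id INTEGER, path_id INTEGER, Value_string TEXT);
INSERT INTO TrialTable VALUES ('SN1', 'PC', 0), ('SN2', 'PC', 0), ('SN3', 'Laptop', 0);
INSERT INTO mc_report_Entry VALUES
  (10, 92, 'SN1'), (11, 92, 'SN1'),  -- two reports for the same serial
  (20, 92, 'SN2'),
  (30, 17, 'SN3');                   -- different path_id, never considered
""")

# The aggregate guarantees the subquery yields exactly one value per row.
# The EXISTS guard leaves rows without any matching report untouched.
conn.execute("""
    UPDATE TrialTable
    SET BlanccoReport = (
        SELECT MAX(RE.report_id)
        FROM mc_report_Entry AS RE
        WHERE RE.path_id = 92
          AND RE.Value_string = TrialTable.Serienummer
    )
    WHERE HardwareType = 'PC'
      AND BlanccoReport = 0
      AND EXISTS (
        SELECT 1 FROM mc_report_Entry AS RE
        WHERE RE.path_id = 92
          AND RE.Value_string = TrialTable.Serienummer
      )
""")

rows = conn.execute(
    "SELECT Serienummer, BlanccoReport FROM TrialTable ORDER BY Serienummer"
).fetchall()
print(rows)  # [('SN1', 11), ('SN2', 20), ('SN3', 0)]
```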
(This is not an answer, but maybe a pointer towards a few points that need further attention)
Your JT sub query looks suspicious to me:
(SELECT RE.Value_string, RE.report_id
FROM Blancco_new.mc_report_Entry as RE
WHERE RE.path_id=92
GROUP BY RE.report_id)
You use group by but don't actually use any aggregate functions. The column RE.Value_string should strictly be something like MAX(RE.Value_string) instead.

Joining and filtering one-to-many relationship

I need some help about optimal structuring of SQL query. I have model like this:
I'm trying some kind of join between tables NON_NATURAL_PERSON and NNP_NAME. Because I have many names in table NNP_NAME for one person I can't do one-to-one SELECT * from NON_NATURAL_PERSON inner join NNP_NAME etc. That way I'll get extra rows for every name one person has.
Data in tables:
How to extend this query to get rows marked red on picture shown below? My wannabe query criteria is: Always join name of typeA only if exists. If not, join name of typeB. If neither exists join name of typeC.
SELECT nnp.ID, name.NAME, name.TYPE
FROM NON_NATURAL_PERSON nnp
INNER JOIN NNP_NAME name ON (name.NON_NATURAL_PERSON = nnp.ID)
If type is spelled exactly as it's written (typeA, typeB, typeC) then you can use MIN() function:
SELECT NON_NATURAL_PERSON, MIN(type) AS min_type
FROM NNP_NAME
GROUP BY NON_NATURAL_PERSON
if you also want the name you can use this query:
SELECT
n1.NON_NATURAL_PERSON AS ID,
n1.Name,
n1.Type
FROM
NNP_NAME n1 LEFT JOIN NNP_NAME n2
ON n1.NON_NATURAL_PERSON = n2.NON_NATURAL_PERSON
AND n1.Type > n2.type
WHERE
n2.type IS NULL
Please see this fiddle. If Types are not literally sorted, change this line:
AND n1.Type > n2.type
with this:
AND FIELD(n1.Type, 'TypeA', 'TypeB', 'TypeC') >
FIELD(n2.type, 'TypeA', 'TypeB', 'TypeC')
MySQL FIELD(str, str1, str2, ...) function returns the index (position) of str in the str1, str2, ... list, and 0 if str is not found. You want to get the "first" record, ordered by type, for every NON_NATURAL_PERSON. There are multiple ways to get this info, I chose a self join:
ON n1.NON_NATURAL_PERSON = n2.NON_NATURAL_PERSON
AND n1.Type > n2.type -- or the FIELD function
with the WHERE condition:
WHERE n2.type IS NULL
this will return all rows where the join didn't succeed. The join won't succeed when there is no n2.type that is less than n1.type, so only the first record for each NON_NATURAL_PERSON is returned.
Edit
If you want a platform independent solution, avoiding the creation of new tables, you could use CASE WHEN, just change
AND n1.Type > n2.Type
with
AND
CASE
WHEN n1.Type='TypeA' THEN 1
WHEN n1.Type='TypeB' THEN 2
WHEN n1.Type='TypeC' THEN 3
END
>
CASE
WHEN n2.Type='TypeA' THEN 1
WHEN n2.Type='TypeB' THEN 2
WHEN n2.Type='TypeC' THEN 3
END
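Here is a small runnable sketch of the self-join with the CASE-based ordering, executed on in-memory SQLite (which has no FIELD function, so the CASE form is the one that applies). The sample names are invented for illustration, and the column names follow the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE NNP_NAME (NON_NATURAL_PERSON INTEGER, NAME TEXT, TYPE TEXT);
-- Hypothetical sample rows.
INSERT INTO NNP_NAME VALUES
  (1, 'Acme Ltd', 'TypeA'), (1, 'Acme', 'TypeB'),
  (2, 'Globex', 'TypeB'), (2, 'GBX', 'TypeC'),
  (3, 'Initech', 'TypeC');
""")

def type_rank(alias):
    """Build the CASE expression that turns a type into its priority."""
    return (
        f"CASE {alias}.TYPE WHEN 'TypeA' THEN 1 "
        f"WHEN 'TypeB' THEN 2 WHEN 'TypeC' THEN 3 END"
    )

# n2 matches only rows of the same person with a better (lower) priority;
# rows with no such match are the preferred name for that person.
query = f"""
    SELECT n1.NON_NATURAL_PERSON AS ID, n1.NAME, n1.TYPE
    FROM NNP_NAME n1
    LEFT JOIN NNP_NAME n2
      ON n1.NON_NATURAL_PERSON = n2.NON_NATURAL_PERSON
     AND {type_rank('n1')} > {type_rank('n2')}
    WHERE n2.TYPE IS NULL
    ORDER BY n1.NON_NATURAL_PERSON
"""
preferred = conn.execute(query).fetchall()
print(preferred)
# [(1, 'Acme Ltd', 'TypeA'), (2, 'Globex', 'TypeB'), (3, 'Initech', 'TypeC')]
```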
There is a piece of information missing. You say:
Always join name of typeA only if exists. If not, join name of typeB. If neither exists join name of typeC.
But you do not indicate why you prefer typeA over typeB. This information is not included in your data.
In the answer of @fthiella, either lexicographical order is assumed, or an arbitrary order is given using FIELD. This is also the reason why two joins with the table nnp_name are necessary.
You can solve this problem by adding a table name_type (id, name, order) and changing the type column to contain the id. This will allow you to add the missing information in a clean way.
With an additional join with this new table, you will be able to get the preferred nnp_name for each row.
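A rough sketch of the lookup-table idea, again on in-memory SQLite: the priority lives in a name_type table and the query picks, per person, the name whose type has the lowest priority value. The column is called sort_order here rather than order because ORDER is a reserved word; that name, the type_id column, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE name_type (id INTEGER PRIMARY KEY, name TEXT, sort_order INTEGER);
CREATE TABLE nnp_name (non_natural_person INTEGER, name TEXT, type_id INTEGER);
INSERT INTO name_type VALUES (1, 'typeA', 1), (2, 'typeB', 2), (3, 'typeC', 3);
-- Hypothetical sample rows.
INSERT INTO nnp_name VALUES
  (1, 'Acme Ltd', 1), (1, 'Acme', 2),
  (2, 'GBX', 3), (2, 'Globex', 2);
""")

# For each person, keep the name whose type has the smallest sort_order.
query = """
    SELECT n.non_natural_person, n.name, t.name
    FROM nnp_name n
    JOIN name_type t ON t.id = n.type_id
    WHERE t.sort_order = (
        SELECT MIN(t2.sort_order)
        FROM nnp_name n2
        JOIN name_type t2 ON t2.id = n2.type_id
        WHERE n2.non_natural_person = n.non_natural_person
    )
    ORDER BY n.non_natural_person
"""
preferred_names = conn.execute(query).fetchall()
print(preferred_names)  # [(1, 'Acme Ltd', 'typeA'), (2, 'Globex', 'typeB')]
```

Changing the preference then only means updating sort_order in name_type; the query itself stays the same.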

Using a COUNT value in an expression getting..does not include specified expression as part of an aggregate function

I am trying to display a warning if a bike station gets to over 90% full or less than 10% full. When I run this query I get "you are trying to execute a query that does not include the IIf statement... as part of an aggregate function".
Bike_locations table - Bicycle_id and Locations_ID
Locations table - Locations_ID, No_of_Spaces, Location_Address
SELECT Locations.Location_Address, Count(Bike_Locations.Bicycle_ID) AS CountOfBicycle_ID,
IIf(((([CountOfBicycle_ID]/[LOCATIONS]![No_Of_Spaces])*100)>90),"This Station is nearly full. Need to move some bicycles out of here",
IIf(((([CountOfBicycle_ID]/[LOCATIONS]![No_Of_Spaces])*100)<10),"This station is nearly empty. Need to move some bicycles here","")) AS Warnings
FROM Locations INNER JOIN Bike_Locations ON Locations.[LOCATIONS_ID] = Bike_Locations.[LOCATIONS_ID]
GROUP BY Locations.Location_Address;
Anyone got a scooby
When you use a GROUP BY, you should have the exact same fields in both your SELECT and GROUP BY statements, except for the aggregate function that should only be specified in the SELECT
The aggregate function in your case is the Count(Bike_Locations.Bicycle_ID)
The fields you aggregate on are:
in the SELECT : Location_Address and Warnings
in the GROUP BY : Location_Address only
The error message is telling you that you don't have the same in both statements.
2 solutions:
Remove the Warnings from the SELECT statement
Add the Warnings to the GROUP BY statement
Note that in MS Access SQL, you can't (unfortunately) use in the GROUP BY, the Aliases specified in the SELECT. So you have to copy over the whole field, which would be the long iif in your case
Edit: better solution proposal:
I would radically change your approach, as you'll go nowhere with all those nested IIf calls
Create the following Query and Name it (for instance) Stations_Occupation
SELECT L.Locations_ID AS ID,
L.Location_Address AS Addr,
L.No_of_Spaces AS TotSpace,
BL.cnt AS OccSpace,
ROUND((BL.cnt/L.No_of_Spaces*100),0) AS OccPourc
FROM Locations L
LEFT JOIN
(
SELECT Locations_ID, COUNT(*) AS cnt
FROM Bike_Locations
GROUP BY LOCATIONS_ID
) AS BL ON L.Locations_ID = BL.Locations_ID
This query will probably be very helpful in many parts of your application, and not only here, as it calculates the occupation % of each station
Some examples:
Get all stations with >90% occupation:
SELECT Addr
FROM Stations_Occupation
WHERE OccPourc > 90
Get all stations with <10% occupation:
SELECT Addr
FROM Stations_Occupation
WHERE OccPourc < 10
Get Occupation level of a specific station:
SELECT OccPourc
FROM Stations_Occupation
WHERE ID=specific_station_ID
Get number of bikes and max on a specific station:
SELECT OccSpace & "/" & TotSpace
FROM Stations_Occupation
WHERE ID=specific_station_ID
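For reference, here is a sketch of the same idea (count per station in a derived table, then compute the warning outside the aggregate) run on in-memory SQLite. It uses CASE instead of Access's IIf and COALESCE to treat stations with no bikes as 0; both are substitutions for the sake of a runnable example, and the station data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Locations (Locations_ID INTEGER PRIMARY KEY, No_of_Spaces INTEGER, Location_Address TEXT);
CREATE TABLE Bike_Locations (Bicycle_ID INTEGER, Locations_ID INTEGER);
INSERT INTO Locations VALUES (1, 10, 'Main St'), (2, 10, 'Park Ave'), (3, 10, 'Dock Rd');
""")
# Hypothetical data: 10 bikes at station 1, 5 at station 2, none at station 3.
conn.executemany(
    "INSERT INTO Bike_Locations VALUES (?, ?)",
    [(i, 1) for i in range(1, 11)] + [(i, 2) for i in range(11, 16)],
)

# The count is computed in the derived table, so the warning expression
# works on a plain column and no GROUP BY is needed in the outer query.
warnings = conn.execute("""
    SELECT L.Location_Address,
           COALESCE(BL.cnt, 0) AS OccSpace,
           CASE
             WHEN COALESCE(BL.cnt, 0) * 100.0 / L.No_of_Spaces > 90
               THEN 'This station is nearly full.'
             WHEN COALESCE(BL.cnt, 0) * 100.0 / L.No_of_Spaces < 10
               THEN 'This station is nearly empty.'
             ELSE ''
           END AS Warnings
    FROM Locations L
    LEFT JOIN (
        SELECT Locations_ID, COUNT(*) AS cnt
        FROM Bike_Locations
        GROUP BY Locations_ID
    ) AS BL ON L.Locations_ID = BL.Locations_ID
    ORDER BY L.Locations_ID
""").fetchall()

for row in warnings:
    print(row)
```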

Refine Query Results from MySQL Database

I have the following query:
SELECT routes.route_date, time_slots.name, time_slots.openings, time_slots.appointments
FROM routes
INNER JOIN time_slots ON routes.route_id = time_slots.route_id
WHERE route_date
BETWEEN 20140109
AND 20140115
AND time_slots.openings > time_slots.appointments
ORDER BY route_date, name
This works just fine and will produce the following results:
What I want to do is only return one name per date. So the 9th, name = 1, would only have 1 result, rather than 2, as it currently does.
UPDATE: See the SQLFIDDLE for different type of solutions here: http://sqlfiddle.com/#!2/9ac65b/6
Will it solve your request if you use...
SELECT DISTINCT routes.route_date...your query... ?
It depends if you know that your rows always will have the same values, for same date/name.
Otherwise use group by...
(which I think suits your request best)
SELECT routes.route_date, time_slots.name, sum(time_slots.openings), sum(time_slots.appointments)
FROM routes
INNER JOIN time_slots ON routes.route_id = time_slots.route_id
WHERE route_date
BETWEEN 20140109
AND 20140115
AND time_slots.openings > time_slots.appointments
group by routes.route_date, time_slots.name
ORDER BY route_date, name
(i did a sum for the openings and appointments, you could do min, max, count, etc. Pick the one that fits your requirements best!)
You need to figure out which "name" you want when there are several for the same date.
Then you can group by date and select the right "name" by using an aggregate function like COUNT, MAX, etc.
I can't help you more if you don't explain your rule for picking one.
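A minimal sketch of the GROUP BY variant on in-memory SQLite, with made-up rows shaped like the question's tables. SUM is used as the aggregation rule here only because the earlier answer did so; which rule is right depends on what "one name per date" should mean.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE routes (route_id INTEGER PRIMARY KEY, route_date INTEGER);
CREATE TABLE time_slots (route_id INTEGER, name TEXT, openings INTEGER, appointments INTEGER);
-- Hypothetical data: two routes on the 9th both offer slot '1'.
INSERT INTO routes VALUES (1, 20140109), (2, 20140109), (3, 20140110);
INSERT INTO time_slots VALUES
  (1, '1', 5, 2), (2, '1', 4, 1),
  (3, '1', 3, 3), (3, '2', 6, 0);
""")

# One row per (date, name); slots that are already full are filtered out first.
grouped = conn.execute("""
    SELECT routes.route_date, time_slots.name,
           SUM(time_slots.openings), SUM(time_slots.appointments)
    FROM routes
    INNER JOIN time_slots ON routes.route_id = time_slots.route_id
    WHERE routes.route_date BETWEEN 20140109 AND 20140115
      AND time_slots.openings > time_slots.appointments
    GROUP BY routes.route_date, time_slots.name
    ORDER BY routes.route_date, time_slots.name
""").fetchall()

print(grouped)  # [(20140109, '1', 9, 3), (20140110, '2', 6, 0)]
```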

indexes in mysql SELECT AS or using Views

I'm in over my head with a big mysql query (mysql 5.0), and i'm hoping somebody here can help.
Earlier I asked how to get distinct values from a joined query
mysql count only for distinct values in joined query
The response I got worked (using a subquery with join as)
select *
from media m
inner join
( select uid
from users_tbl
limit 0,30) map
on map.uid = m.uid
inner join users_tbl u
on u.uid = m.uid
unfortunately, my query has grown more unruly, and though I have it running, joining into a derived table is taking too long because there are no indexes available to the derived query.
my query now looks like this
SELECT mdate.bid, mdate.fid, mdate.date, mdate.time, mdate.title, mdate.name,
mdate.address, mdate.rank, mdate.city, mdate.state, mdate.lat, mdate.`long`,
ext.link,
ext.source, ext.pre, meta, mdate.img
FROM ext
RIGHT OUTER JOIN (
SELECT media.bid,
media.date, media.time, media.title, users.name, users.img, users.rank, media.address,
media.city, media.state, media.lat, media.`long`,
GROUP_CONCAT(tags.tagname SEPARATOR ' | ') AS meta
FROM media
JOIN users ON media.bid = users.bid
LEFT JOIN tags ON users.bid=tags.bid
WHERE `long` BETWEEN -122.52224684058 AND -121.79760915942
AND lat BETWEEN 37.07500915942 AND 37.79964684058
AND date = '2009-02-23'
GROUP BY media.bid, media.date
ORDER BY media.date, users.rank DESC
LIMIT 0, 30
) mdate ON (mdate.bid = ext.bid AND mdate.date = ext.date)
phew!
So, as you can see, if I understand my problem correctly, I have two derived tables without indexes (and I don't deny that I may have screwed up the JOIN statements somehow, but I kept messing with different types, and this ended up giving me the result I wanted).
What's the best way to create a query similar to this which will allow me to take advantage of the indexes?
Dare I say, I actually have one more table to add into the mix at a later date.
Currently, my query is taking .8 seconds to complete, but I'm sure if I could take advantage of the indexes, this could be significantly faster.
First, check for indices on ext(bid, date), users(bid) and tags(bid), you should really have them.
It seems, though, that it's LONG and LAT that cause you most problems. You should try storing LONG and LAT as a single POINT column (coordinate), create a SPATIAL INDEX on that column, and query like this:
WHERE MBRContains(@MySquare, coordinate)
If you can't change your schema for some reason, you can try creating additional indices that include date as a first field:
CREATE INDEX ix_date_long ON media (date, `long`)
CREATE INDEX ix_date_lat ON media (date, lat)
These indices will be more efficient for your query, as you use an exact search on date combined with a ranged search on the axes.
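As a rough illustration of why putting date first helps, the sketch below creates the two composite indexes on an in-memory SQLite table and asks the planner how it would run the filter. SQLite's planner is not MySQL's, so this only shows the general principle (equality on the leading column, range on the second); on MySQL you would check with EXPLAIN instead. The table is a cut-down, hypothetical version of media.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE media (bid INTEGER, date TEXT, lat REAL, "long" REAL);
CREATE INDEX ix_date_long ON media (date, "long");
CREATE INDEX ix_date_lat ON media (date, lat);
""")

# Ask the planner which access path it picks for the date + bounding-box filter.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT bid FROM media
    WHERE date = '2009-02-23'
      AND "long" BETWEEN -122.52224684058 AND -121.79760915942
      AND lat BETWEEN 37.07500915942 AND 37.79964684058
""").fetchall()

# The last column of each plan row is the human-readable description.
details = [row[-1] for row in plan]
for line in details:
    print(line)
```

Only one of the two indexes can drive a single table scan, so the remaining range condition is still checked row by row; that is the limitation the spatial index suggestion is meant to address.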
Starting fresh:
Question - why are you grouping by both media.bid and media.date? Can a bid have records for more than one date?
Here's a simpler version to try:
SELECT
media.bid,
media.fid, -- assumes fid is a column of media
media.date,
media.time,
media.title,
users.name,
media.address,
users.rank,
media.city,
media.state,
media.lat,
media.`long`,
ext.link,
ext.source,
ext.pre,
users.img,
( SELECT GROUP_CONCAT(tags.tagname SEPARATOR ' | ')
FROM tags
WHERE ext.bid = tags.bid
) AS meta
FROM
ext
LEFT JOIN
media ON ext.bid = media.bid AND ext.date = media.date
JOIN
users ON ext.bid = users.bid
WHERE
`long` BETWEEN -122.52224684058 AND -121.79760915942
AND lat BETWEEN 37.07500915942 AND 37.79964684058
AND ext.date = '2009-02-23'
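AND users.userid IN
(
-- MySQL rejects LIMIT directly inside an IN subquery, so wrap it in a derived table
SELECT userid FROM (
SELECT userid FROM users ORDER BY rank DESC LIMIT 30
) AS top_users
)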
ORDER BY
media.date,
users.rank DESC
LIMIT 0, 30
You might want to compare your performance against using a temp table for each selection, and joining those tables together.
create table #whatever
create table #whatever2
insert into #whatever select...
insert into #whatever2 select...
select from #whatever join #whatever2
....
drop table #whatever
drop table #whatever2
If your system has enough memory to hold full tables this might work out much faster. It depends on how big your database is.
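The outline above uses #name temp tables, which is SQL Server notation; in MySQL the equivalent is CREATE TEMPORARY TABLE. As a hedged sketch of the same flow, the following runs on in-memory SQLite (CREATE TEMP TABLE ... AS SELECT) with heavily simplified, hypothetical versions of media and users. It only demonstrates the mechanics and says nothing about whether it would be faster on the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE media (bid INTEGER, date TEXT, title TEXT);
CREATE TABLE users (bid INTEGER, name TEXT, "rank" INTEGER);
-- Hypothetical sample rows.
INSERT INTO media VALUES (1, '2009-02-23', 'a'), (2, '2009-02-23', 'b'), (3, '2009-02-24', 'c');
INSERT INTO users VALUES (1, 'ann', 5), (2, 'bob', 9), (3, 'cy', 7);

-- Materialise each selection once.
CREATE TEMP TABLE day_media AS
  SELECT bid, title FROM media WHERE date = '2009-02-23';
CREATE TEMP TABLE top_users AS
  SELECT bid, name, "rank" FROM users ORDER BY "rank" DESC LIMIT 2;
""")

# Join the two small temp tables instead of nesting derived tables.
joined = conn.execute("""
    SELECT d.bid, d.title, u.name, u."rank"
    FROM day_media d
    JOIN top_users u ON u.bid = d.bid
    ORDER BY u."rank" DESC
""").fetchall()
print(joined)  # [(2, 'b', 'bob', 9)]

conn.executescript("DROP TABLE day_media; DROP TABLE top_users;")
```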