mysql group by where another field is the same - mysql

I have a database full of train movement data. When a train enters a station we may get an arrival message, and when it leaves the station to head to the next destination we may get a departure message.
So when a train calls at a station we normally get 2 messages: 1 for the arrival and 1 for the departure. However, sometimes there are mistakes in this data, so we can get another movement message that corrects the departure/arrival data. A movement message that corrects a previous one has a correction_ind of 1; otherwise it has a correction_ind of 0.
This means that for a given station we can have a total of 4 messages (departure, arrival, fixed departure, fixed arrival).
I'm trying to get 0 or 1 departure messages and 0 or 1 arrival messages for each station along a route for a specific train, selecting the movement message in the following order:
Pick the fixed message (if it exists) otherwise
Pick the first movement message (if it exists) otherwise
don't pick anything
My query looks like this:
SELECT
tm.variation_status,
tm.planned_timestamp,
tm.platform,
tm.actual_timestamp,
tm.event_type,
tm.timetable_variation,
sched.tps_description
FROM
train_activation ta,
train_movement tm
LEFT JOIN
cif_tiploc sched ON sched.stanox = tm.loc_stanox
WHERE
train_uid = 'C40200'
AND date(creation_timestamp) = '2014-08-20'
AND tm.train_id = ta.train_id
ORDER BY tm.correction_ind ASC
The problem I have with this query is that for a given station we can get 0-2 departure messages and 0-2 arrival messages. If I add GROUP BY tm.event_type (the field that tells us whether this is a departure or arrival message), we only get 2 messages in total, as it groups all the departures together and all the arrivals together!
How can I rewrite this query so we only select the best arrival/departure message for each station along the route?
A station can be identified by tm.loc_stanox or sched.tps_description.
A message tells us whether it is a departure or an arrival via tm.event_type.
A message tells us whether it is a correction to a previous message via tm.correction_ind, which is 1 if it is a correction and 0 if it is not.
Any help on the issue would be amazing.

How about trying an order by case/when condition, but for each respective station.
order by
tm.loc_stanox,
case when tm.event_type like 'fixed%' then 1
when tm.event_type like 'arrive%' then 2
when tm.event_type like 'depart%' then 3 end
I wish I had more sample data to see the real scenarios you are describing to better assist.
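One way to express the selection rule from the question (prefer the correction, otherwise take the original message) is to compute the highest correction_ind per station and event type and join back to it. The following is only a minimal sketch under stated assumptions: it uses Python's sqlite3 module as a stand-in for MySQL, a cut-down train_movement table, and made-up sample rows and event_type values ('ARRIVAL', 'DEPARTURE'), none of which come from the real data.

```python
import sqlite3

# Hypothetical sample rows: (loc_stanox, event_type, correction_ind, actual_timestamp)
SAMPLE = [
    ("87001", "ARRIVAL", 0, "10:00"),
    ("87001", "ARRIVAL", 1, "10:02"),    # correction of the arrival above
    ("87001", "DEPARTURE", 0, "10:05"),
    ("87002", "ARRIVAL", 0, "10:20"),    # no departure message received
]

def best_messages(rows):
    """Return one row per (loc_stanox, event_type), preferring corrections."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE train_movement ("
        "loc_stanox TEXT, event_type TEXT, "
        "correction_ind INTEGER, actual_timestamp TEXT)"
    )
    conn.executemany("INSERT INTO train_movement VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT tm.loc_stanox, tm.event_type, tm.correction_ind, tm.actual_timestamp
        FROM train_movement tm
        JOIN (
            -- highest correction_ind seen for each station / event type
            SELECT loc_stanox, event_type, MAX(correction_ind) AS best_ind
            FROM train_movement
            GROUP BY loc_stanox, event_type
        ) pick
          ON pick.loc_stanox = tm.loc_stanox
         AND pick.event_type = tm.event_type
         AND pick.best_ind = tm.correction_ind
        ORDER BY tm.loc_stanox, tm.event_type
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

If a station can receive more than one correction for the same event type, this sketch returns all of them, so an extra tie-breaker (for example the latest timestamp) would be needed.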

Related

Get number of in time deliveries according to specific customer hour in SQL

I have 2 SQL tables:
The first is called LIVRAISON and stores all deliveries. In particular it contains the customer's number (NumCptClient), the date and time of the delivery (DateLiv, heureLiv) and the name of the town where the customer was delivered (nomVille).
The second is called IMPERATIF and models a special service for clients who want to be delivered before a given hour. That special service is defined by a customer's number, a town name, the latest hour by which the customer must be delivered (heureImp) and the start/end dates of that premium option (dateDebImp, dateFinImp). A delivery falls under the special service when it matches an IMPERATIF row on the NumCptClient AND nomVille combination and its date is within the date range of that row (a customer who has special-service orders can also be delivered in a town that is not covered, and vice versa; such deliveries mustn't be taken into account here).
I need to compute, with a single fast SQL query, the ratio between the number of deliveries that satisfied this special service (heureLiv <= heureImp) and the total number of deliveries, for each customer AND town pair covered by this premium option.
I've put together this query, which gives me all the needed info except the rate:
SELECT NumCptCli, nomVille, heureImp, COUNT(*) AS TotalLivs
FROM LIVRAISON
NATURAL JOIN IMPERATIF
GROUP BY NumCptCli, nomVille
So the question is fundamentally: how do I change that query to add a column with the corresponding rate (the livsAlheure column below), without displaying special-service pairs that don't have any delivery registered yet?
NumCptCli | nomVille | heureImp | TotalLivs | livsAlheure
----------|----------|----------|-----------|------------
12345678  | PARIS    | 07:30    | 311       | 80.56
87654321  | BREST    | 15:30    | 314       | 95.2
...
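The rate described above can be sketched with a conditional sum divided by the row count. This is a hedged illustration only: it runs on SQLite through Python's sqlite3 module rather than on the asker's database, it assumes the join column is named NumCptClient in both tables, it replaces NATURAL JOIN with an explicit join that also checks the date range, and it stores dates and times as ISO-style text. The sample rows are invented.

```python
import sqlite3

def on_time_rates(imperatifs, livraisons):
    """Return (customer, town, heureImp, total deliveries, % on time) per covered pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE IMPERATIF (NumCptClient INTEGER, nomVille TEXT, "
        "heureImp TEXT, dateDebImp TEXT, dateFinImp TEXT)"
    )
    conn.execute(
        "CREATE TABLE LIVRAISON (NumCptClient INTEGER, nomVille TEXT, "
        "DateLiv TEXT, heureLiv TEXT)"
    )
    conn.executemany("INSERT INTO IMPERATIF VALUES (?, ?, ?, ?, ?)", imperatifs)
    conn.executemany("INSERT INTO LIVRAISON VALUES (?, ?, ?, ?)", livraisons)
    query = """
        SELECT l.NumCptClient, l.nomVille, i.heureImp,
               COUNT(*) AS TotalLivs,
               ROUND(100.0 * SUM(CASE WHEN l.heureLiv <= i.heureImp THEN 1 ELSE 0 END)
                     / COUNT(*), 2) AS livsAlheure
        FROM LIVRAISON l
        -- inner join: pairs without any matching delivery produce no row
        JOIN IMPERATIF i
          ON i.NumCptClient = l.NumCptClient
         AND i.nomVille = l.nomVille
         AND l.DateLiv BETWEEN i.dateDebImp AND i.dateFinImp
        GROUP BY l.NumCptClient, l.nomVille, i.heureImp
        ORDER BY l.NumCptClient, l.nomVille
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Hypothetical data: BREST has the option but no deliveries yet.
IMPERATIFS = [
    (12345678, "PARIS", "07:30", "2014-01-01", "2014-12-31"),
    (87654321, "BREST", "15:30", "2014-01-01", "2014-12-31"),
]
LIVRAISONS = [
    (12345678, "PARIS", "2014-03-01", "07:00"),
    (12345678, "PARIS", "2014-03-02", "07:30"),
    (12345678, "PARIS", "2014-03-03", "08:15"),  # late
    (12345678, "PARIS", "2014-03-04", "06:45"),
    (12345678, "LYON", "2014-03-05", "07:00"),   # town not covered
    (12345678, "PARIS", "2015-01-10", "09:00"),  # outside the date range
]
```

Comparing 'HH:MM' strings works here only because the times are zero-padded; with real TIME columns the comparison would be done on the native type.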

FIFO in MySQL db

I'm using a MySQL (not MSSQL) database and ran into a problem I seem to be unable to solve. I would very much appreciate your help finding a solution.
I'd like to create a view using two tables, as described below:
1. product_in
product_code
received_time
received_amount
2. product_out
product_code
delivery_time
delivered_amount
The “view” should provide the following:
product_code
received_time
received_amount
of_which_delivered
My problem is that the deliveries in product_out have to be allocated to the earliest incoming data first (FIFO: first in, first out), but since the amount delivered can be either more or less than the received amount, I do not know how to calculate “of_which_delivered”.
So far, I have managed to put the incoming data in order and sum up the outgoing (delivered) goods using SUM.
SELECT
k.sn,
k.product AS product_code,
k.received_time,
k.received_amount,
b.delivered_amount
FROM
( SELECT
received_time,
received_amount,
@rend2 := If( @rend1 = product_code, @rend2 + 1, 1) as sn,
@rend1 := product_code AS product
FROM
product_in,
( SELECT @rend1 := 0, @rend2 := 0 ) AS tt
ORDER BY
product_code,
received_time ) AS k
LEFT JOIN
( SELECT
product_code AS prdct,
SUM(delivered_amount) AS delivered_amount
FROM
product_out
GROUP BY
product_code ) AS b
ON b.prdct = k.product
I have not succeeded in creating the loop that would check whether the delivered amount is more or less than the amount received on a given day and, if more, carry the difference over to the amount received on the next day.
To be more precise, here is an example of what was received:
Product Date Qty
nr.1 Sep 2 500
nr.1 Sep 3 300
nr.1 Sep 4 200
900 pcs were delivered out on the 5th.
In this case we should see the following in the view:
Product Action Date Qty
nr.1 received Sep 2 500 (delivered all 500)
nr.1 received Sep 3 300 (delivered all 300)
nr.1 received Sep 4 200 (only 100 delivered)
I would be very grateful to anyone who could help me find a solution!
Sorry, no query adjustment for you, but having worked with accounting systems in the past, your database structure appears to lack good support for the in/out FIFO method (or even LIFO). What the underlying accounting system had was an inventory table for all receipts (abbreviated):
Table: ItemTrans
ItemTransID (auto-increment ID for any transaction)
ItemID (item id of the inventory item)
Status (status, such as In, Out or Inventory Adjustment)
Date (date of the activity)
Qty (quantity)
QtyUsed (running total column as inventory was used up)
As items were sold or adjusted, the Sales Details table would show which ItemTrans records the quantities were taken from. So, if you had the inventory on hand from your sample, the sales order detail line would show the 900. The inventory used table would show which specific ItemTransID the quantities were allocated from and how many from each block, thus recording how the product was actually released at the time of sale.
This way, you don't have to recreate what inventory was what at a given point in time. Just query the sales order details and which items they were pulled from to get the data and how many from each block of receipt it went against.
This simplifies the process of generating the COGS (cost of goods sold) as would be reported to the General Ledger from a Sub-journal such as Accounts Receivable (Sales Orders) activity.
Do you have the ability to introduce such adjustments to your database structures?
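For the tables as they stand in the question, the FIFO split can also be sketched without a loop: each receipt gets whatever is left of the product's total deliveries after all earlier receipts have been filled, clamped between 0 and the received amount. This is a rough sketch under assumptions, not a drop-in view: it runs on SQLite through Python's sqlite3 module, where the two-argument MIN/MAX play the role of MySQL's LEAST/GREATEST; it assumes received_time is unique per product; and the year in the sample dates is invented.

```python
import sqlite3

def fifo_view(receipts, deliveries):
    """Return (product_code, received_time, received_amount, of_which_delivered)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_in "
        "(product_code TEXT, received_time TEXT, received_amount INTEGER)"
    )
    conn.execute(
        "CREATE TABLE product_out "
        "(product_code TEXT, delivery_time TEXT, delivered_amount INTEGER)"
    )
    conn.executemany("INSERT INTO product_in VALUES (?, ?, ?)", receipts)
    conn.executemany("INSERT INTO product_out VALUES (?, ?, ?)", deliveries)
    query = """
        SELECT i.product_code, i.received_time, i.received_amount,
               MAX(0, MIN(i.received_amount,
                          COALESCE(o.total_out, 0)
                          - COALESCE((SELECT SUM(p.received_amount)
                                      FROM product_in p
                                      WHERE p.product_code = i.product_code
                                        AND p.received_time < i.received_time), 0)
               )) AS of_which_delivered
        FROM product_in i
        LEFT JOIN (
            -- total delivered per product, regardless of delivery date
            SELECT product_code, SUM(delivered_amount) AS total_out
            FROM product_out
            GROUP BY product_code
        ) o ON o.product_code = i.product_code
        ORDER BY i.product_code, i.received_time
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# The example from the question, with a hypothetical year.
RECEIPTS = [
    ("nr.1", "2014-09-02", 500),
    ("nr.1", "2014-09-03", 300),
    ("nr.1", "2014-09-04", 200),
]
DELIVERIES = [("nr.1", "2014-09-05", 900)]
```

With the question's example this yields 500, 300 and 100 delivered for the three receipts. The correlated subquery is quadratic in the number of receipts per product, so the structural change suggested above may still be the better long-term option.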

Mysql SUM CASE with unique IDs only

Easiest explained through an example.
A father has children who win races.
How many of a father's offspring have won a race, and how many races in total have a father's offspring won? (winners and wins)
I can easily figure out the total number of wins, but sometimes a child wins more than one race, so to figure out winners I need to count each child only once if it has won, not every time it has won.
In the extract from a query below I cannot use DISTINCT, so this doesn't work:
SUM(CASE WHEN r.finish = '1' AND DISTINCT h.runnersid THEN 1 ELSE 0 END ) AS winners,
This also won't work
SUM(SELECT DISTINCT r.runnersid FROM runs r WHERE r.finish='1') AS winners
This works when I need to find the total amount of wins.
SUM(CASE WHEN r.finish = '1' THEN 1 ELSE 0 END ) AS wins,
Here is a sqlfiddle http://sqlfiddle.com/#!2/e9a81/1
Let's take this step by step.
You have two pieces of information you are looking for: who has won a race, and how many races they have won.
Taking the first one, you can select a distinct runnersid where they have a first place finish:
SELECT DISTINCT runnersid
FROM runs
WHERE finish = 1;
For the second one, you can select every runnersid where they have a first place finish, count the number of rows returned, and group by runnersid to get the total wins for each:
SELECT runnersid, COUNT(*) AS numWins
FROM runs
WHERE finish = 1
GROUP BY runnersid;
The second one actually has everything you want. You don't need to do anything with that first query, but I used it to help demonstrate the thought process I take when trying to accomplish a task like this.
Here is the SQL Fiddle example.
EDIT
As you've seen, you don't really need the SUM here. Because finish represents a place in the race, you don't want to SUM that value, but you want to COUNT the number of wins.
EDIT2
An additional edit based on OP's requirements. The above does not match what OP needs, but I left it in as a reference for any future readers. What OP really needs, as I understand it now, is the number of children each father has that have won a race. I will again explain my thought process step by step.
First I wrote a simple query that pulls all of the winning father-son pairs. I was able to use GROUP BY to get the distinct winning pairs:
SELECT father, name
FROM runs
WHERE finish = 1
GROUP BY father, name;
Once I had done that, I used it as a subquery with the COUNT(*) function to get the number of winners for each father (this means I have to group by father):
SELECT father, COUNT(*) AS numWinningChildren
FROM(SELECT father, name
FROM runs
WHERE finish = 1
GROUP BY father, name) t
GROUP BY father;
If you just need the fathers with winning children, you are done. If you want to see all fathers, I would write one query to select all fathers, join it with our result set above, and replace any values where numWinningChildren is null, with 0.
I'll leave that part to you to challenge yourself a bit. Also because SQL Fiddle is down at the moment and I can't test what I was thinking, but I was able to test those above with success.
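As a possible alternative to the subquery, both figures can come out of a single GROUP BY: COUNT(DISTINCT ...) over a CASE that yields the runner id only for wins gives the winners, and the conditional SUM from the question gives the wins. The sketch below is an assumption-laden illustration: it runs on SQLite via Python's sqlite3 module, the runs table layout (runnersid, name, father, finish) is guessed from the column names used in this thread, and the rows are made up.

```python
import sqlite3

def winners_and_wins(rows):
    """Return (father, distinct winning children, total wins) per father."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runs "
        "(runnersid INTEGER, name TEXT, father TEXT, finish INTEGER)"
    )
    conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT father,
               -- CASE yields NULL for non-wins, and COUNT ignores NULLs
               COUNT(DISTINCT CASE WHEN finish = 1 THEN runnersid END) AS winners,
               SUM(CASE WHEN finish = 1 THEN 1 ELSE 0 END) AS wins
        FROM runs
        GROUP BY father
        ORDER BY father
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Hypothetical rows: (runnersid, name, father, finish)
RUNS = [
    (1, "ann", "jack", 1),
    (1, "ann", "jack", 1),   # same child wins a second race
    (2, "ben", "jack", 1),
    (3, "cat", "jack", 4),
    (4, "dan", "bob", 2),    # bob has no winning children
]
```

Because the grouping is over all runs rather than only the winning ones, fathers whose children never won still appear, with 0 in both columns, provided they have at least one row in runs.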
I think you want the father name along with the count of the wins by his sons.
select father, count(distinct(id)) wins
from runs where father = 'jack' and finish = 1
group by father
sqlfiddle
I am not sure if this is what you are looking for
Select user_id, sum(case when finish='1' then 1 else 0 end) as total
From table
Group by user_id

Relational Database Logic

I'm fairly new to php / mysql programming and I'm having a hard time figuring out the logic for a relational database that I'm trying to build. Here's the problem:
I have different leaders who will be in charge of a store anytime between 9am and 9pm.
A customer who has visited the store can rate their experience on a scale of 1 to 5.
I'm building a site that will allow me to store the shifts that a leader worked as seen below.
When I hit submit, the site would take the data leaderName:"George", shiftTimeArray: 11am, 1pm, 6pm (from the example in the picture) and the shiftDate and send them to an SQL database.
Later, I want to be able to get the average score for a person by sending a query to mysql, retrieving all of the scores that that leader received and averaging them together. I know the code to build the forms and to perform the search. However, I'm having a hard time coming up with the logic for the tables that will relate the data. Currently, I have a mysql table called responses that contains the following fields,
leader_id
shift_date // contains the date that the leader worked
shift_time // contains the time that the leader worked
visit_date // contains the date that the survey/score was given
visit_time // contains the time that the survey/score was given
score // contains the actual score of the survey (1-5)
I enter the shifts that the leader works at the beginning of the week and then enter the survey scores in as they come in during the week.
So Here's the Question: What mysql tables and fields should I create to relate this data so that I can query a leader's name and get the average score from all of their surveys?
You want tables like:
Leader (leader_id, name, etc)
Shift (leader_id, shift_date, shift_time)
SurveyResult (visit_date, visit_time, score)
Note: omitted the surrogate primary keys for Shift and SurveyResult that I would probably include.
To query, you join shifts and surveys, group on leader and take the average, then join that back to Leader for the name.
The query might be something like this (but I haven't actually built it in MySQL to verify the syntax):
SELECT name
,AverageScore
FROM Leader a
INNER JOIN (
SELECT leader_id
, AVG(score) AverageScore
FROM Shift
INNER JOIN
SurveyResult ON shift_date = visit_date
AND shift_time = visit_time -- what this really needs to be depends on how you are recording time
GROUP BY leader_id
) b ON a.leader_id = b.leader_id
I would do the following structure:
leaders
id
name
leaders_timetable (can be multiple per leader)
id,
leader_id
shift_datetime (I assume it stores date and hour here; minutes and seconds are always 0)
survey_scores
id,
visit_datetime
score
SELECT l.id, l.name, AVG(s.score) FROM leaders l
INNER JOIN leaders_timetable lt ON lt.leader_id = l.id
INNER JOIN survey_scores s ON lt.shift_datetime = DATE_FORMAT(s.visit_datetime, '%Y-%m-%d %H:00:00')
GROUP BY l.id
DATE_FORMAT here truncates the minutes and seconds of visit_datetime so that it can be matched against shift_datetime. This is a MySQL function, so if you use something else you'll need a different function.
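The three-table layout above can be tried out end to end with a small sketch. It is illustrative only: it uses SQLite through Python's sqlite3 module, so strftime stands in for MySQL's DATE_FORMAT, the sample shifts and scores are invented, and shift_datetime is assumed to hold whole hours as 'YYYY-MM-DD HH:00:00' text.

```python
import sqlite3

def average_scores(leaders, shifts, surveys):
    """Return (leader id, name, average score) for leaders with matched surveys."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE leaders (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE leaders_timetable "
        "(id INTEGER PRIMARY KEY, leader_id INTEGER, shift_datetime TEXT)"
    )
    conn.execute(
        "CREATE TABLE survey_scores "
        "(id INTEGER PRIMARY KEY, visit_datetime TEXT, score INTEGER)"
    )
    conn.executemany("INSERT INTO leaders VALUES (?, ?)", leaders)
    conn.executemany(
        "INSERT INTO leaders_timetable (leader_id, shift_datetime) VALUES (?, ?)",
        shifts,
    )
    conn.executemany(
        "INSERT INTO survey_scores (visit_datetime, score) VALUES (?, ?)", surveys
    )
    query = """
        SELECT l.id, l.name, AVG(s.score) AS average_score
        FROM leaders l
        INNER JOIN leaders_timetable lt ON lt.leader_id = l.id
        -- truncate the visit time to the hour so it lines up with the shift
        INNER JOIN survey_scores s
                ON lt.shift_datetime = strftime('%Y-%m-%d %H:00:00', s.visit_datetime)
        GROUP BY l.id, l.name
        ORDER BY l.id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Hypothetical data: George works the 11am and 1pm hours on one day.
LEADERS = [(1, "George")]
SHIFTS = [(1, "2014-08-20 11:00:00"), (1, "2014-08-20 13:00:00")]
SURVEYS = [
    ("2014-08-20 11:20:00", 5),
    ("2014-08-20 13:45:00", 3),
    ("2014-08-20 15:10:00", 1),  # nobody on shift at 3pm, so it is ignored
]
```

Storing only the shifts and the surveys, and matching them at query time, avoids having to know the leader when a survey score is entered.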
Say you have a 'leader' who has 5 survey rows with scores 1, 2, 3, 4 and 5.
If you select all surveys from this leader, sum the survey scores and divide the sum by 5 (the total number of surveys this leader has), you will have the average, in this case 3.
(1 + 2 + 3 + 4 + 5) / 5 = 3
You wouldn't need to create any more tables or fields, you have what you need.

Very complex Group By / Unique / Limit by SQL-command

I actually don't even know how to call this :P, but...
I have one table, let's call it "uploads"
id owner date
-----------------------------
0 foo 20100101120000
1 bar 20100101120300
2 foo 20100101120400
3 bar 20100101120600
.. .. ..
6 foo 20100101120800
Now, when I do something like:
SELECT id, owner, date FROM uploads ORDER BY date DESC
This would result in:
id owner date
-----------------------------
6 foo 20100101120800
.. .. ..
3 bar 20100101120600
2 foo 20100101120400
1 bar 20100101120300
0 foo 20100101120000
Question: Nice, but I want to go even further. When you build a timeline from this (and I did :P), you are 'spammed' by messages saying foo and bar uploaded something. I'd like to group them and return the first result within a time limit of '500' on the date field.
What kind of SQL-command do I need that would result in:
id owner date
-----------------------------
6 foo 20100101120800
3 bar 20100101120600
0 foo 20100101120000
Then, after that, I can perform a call for each record to get the associated records in a timeframe of 5 minutes (this is an example for id=6):
SELECT id FROM uploads WHERE date>=20100101120800-500 ORDER BY date DESC
Does anyone know how I should do the first step? (so limiting/grouping the results)
(btw. I know that when I want to use this, I should convert every date (YmdHis=60) to Unix-time (=100), but I don't need the 5 minutes to be exactly 5 minutes, they may be a minute less sometimes...)
I'm not quite clear on the result you are trying to get, even with your examples. Perhaps something with rounding and group by.
SELECT max(id) max_id,owner, (ROUND(date/500)*500) date_interval, max(date) date
FROM uploads GROUP BY date_interval,owner
You may want to use FLOOR or CEILING instead of ROUND, depending on what you want.
Standard SQL doesn't deal with intervals very well.
You are going to need to do a self-join of the table to compare dates of different tuples.
That way, you can easily find all pairs of tuples of which the dates are no more than 500 apart.
However, you really want to cluster the dates in sets no more than 500 apart - and that can't be expressed in SQL at all, as far as I know.
What you can do is something quite similar: split the total time interval into fixed 500-unit ranges, and then cluster all tuples in the table based on the interval they're in. For that, you first need a table or query result with the start times of the intervals; this can be created using a SQL query on your table and a function that either "rounds off" a timestamp to the starting time in its interval, or computes its interval sequence number. Then as a second step you can join the table with that result to group its timestamps according to their corresponding start time. I can't give the SQL because it's DBMS-dependent, and I certainly can't tell you if this is the best way of accomplishing what you want in your situation.
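The fixed-range idea can be sketched as follows, with the caveat already made above that it is not the same as true clustering. The sketch runs on SQLite via Python's sqlite3 module, where dividing two integers truncates; in MySQL the bucket expression would need DIV or FLOOR instead. It uses only the rows that are spelled out in the question (the elided ones are left out) and keeps the latest upload per owner within each 500-unit bucket.

```python
import sqlite3

def latest_per_bucket(rows, width=500):
    """Return the latest (id, owner, date) per owner within each fixed-width bucket."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE uploads (id INTEGER, owner TEXT, date INTEGER)")
    conn.executemany("INSERT INTO uploads VALUES (?, ?, ?)", rows)
    query = """
        SELECT u.id, u.owner, u.date
        FROM uploads u
        JOIN (
            -- integer division assigns each timestamp to a fixed bucket
            SELECT owner, date / :width AS bucket, MAX(date) AS latest
            FROM uploads
            GROUP BY owner, date / :width
        ) g ON g.owner = u.owner AND g.latest = u.date
        ORDER BY u.date DESC
    """
    result = conn.execute(query, {"width": width}).fetchall()
    conn.close()
    return result

# Rows given explicitly in the question.
UPLOADS = [
    (0, "foo", 20100101120000),
    (1, "bar", 20100101120300),
    (2, "foo", 20100101120400),
    (3, "bar", 20100101120600),
    (6, "foo", 20100101120800),
]
```

On these rows the buckets split at 12:05:00, so the sketch returns ids 6, 3, 2 and 1 rather than the 6, 3, 0 asked for: foo's uploads at 12:00 and 12:04 share a bucket, and bar's 12:03 and 12:06 uploads fall on either side of the boundary. That difference is exactly the limitation of fixed ranges compared with clustering.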
Use an inline view? e.g. something like
SELECT u1.*
FROM uploads u1,
(SELECT date
FROM uploads u2
WHERE u2.owner='foo') datum_points
WHERE u1.date BETWEEN datum_points.date
AND DATE_ADD(datum_points.date, INTERVAL 5 MINUTE)
should return all the posts made within 5 minutes of 'foo' making a post.