MySQL Query eliminate duplicates but only adjacent to each other - mysql

I have the following query:
SELECT Flights.flightno,
Flights.timestamp,
Flights.route
FROM Flights
WHERE Flights.adshex = '400662'
ORDER BY Flights.timestamp DESC
Which returns the following screenshot.
However, I cannot use a simple GROUP BY because, for example, BCS6515 appears again much later in the list, and I only want to "condense" rows that are the same and next to each other in this list.
An example of the desired output (note that BCS6515 appears twice in this list, as its rows were not adjacent in the first query):
This is why a GROUP BY flightno will not work.

I don't think there's a good way to do so in SQL without a column to help you. At best, I'm thinking it would require a subquery that would be ugly and inefficient. You have two options that would probably end up with better performance.
One would be to code the logic yourself to prune the results. (Added:) This can be done with a procedure clause of a select statement, if you want to handle it on the database server side.
Another would be to either use other information in the table or add new information to the table for this purpose. Do you currently have something in your table that is a different value for each instance of a number of BCS6515 rows?
If not, and if I'm making correct assumptions about the data in your table, there will be only one flight with a given number per day, though the flight number is reused for the flight with the same start/end points and times on other days (e.g. the 10 a.m. from NRT to DTW has the same flight number every day). If the timestamps for one flight always fell on the same day, you could use DATE(timestamp) in the GROUP BY (DAY(timestamp) returns only the day of the month, so it would merge flights from different months). However, that doesn't allow for overnight flights. Thus, you'll probably need something such as a departure date to group by, to identify all the rows as belonging to the same physical flight.
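For what it's worth, on MySQL 8.0 or later the "condense adjacent rows" requirement can be sketched without an extra column by comparing each row with the one above it. This is only a sketch under assumptions: it relies on window functions (LAG), which older MySQL versions lack, and it keeps the newest row of each run of identical flight numbers. Table and column names are taken from the question.

```sql
-- Sketch only: assumes MySQL 8.0+ (window functions are not available before 8.0).
SELECT t.flightno,
       t.`timestamp`,
       t.route
FROM (
    SELECT f.flightno,
           f.`timestamp`,
           f.route,
           -- flight number of the row directly above in the DESC-ordered list
           LAG(f.flightno) OVER (ORDER BY f.`timestamp` DESC) AS prev_flightno
    FROM Flights AS f
    WHERE f.adshex = '400662'
) AS t
WHERE t.prev_flightno IS NULL          -- very first row of the list
   OR t.prev_flightno <> t.flightno    -- flight number changed from the row above
ORDER BY t.`timestamp` DESC;
```

Because only the change points are kept, BCS6515 still shows up once for each separate run, which a plain GROUP BY flightno cannot do. Rows with a NULL flightno would need extra handling, since `<>` never matches NULL.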

GROUP BY does not work because the timestamp value is different for the two BCS6515 records.
It will work only if you leave timestamp out of the select list:
SELECT Flights.flightno,
Flights.route
FROM Flights
WHERE Flights.adshex = '400662'
GROUP BY (Flights.flightno)

Related

Getting the highest number from a mysql query

My issue is that I need to create an order so I can move items up and down in this web application. However, I cannot derive the order (ord column) values from an auto-incrementing index, because several companies in the same table use this column.
My table structure is this:
Right now I am thinking that the easiest way would be to do a MAX, grab the highest number, and use that as a kind of index, so that you never end up with the same number twice within a specific company's listing when you add a new entry for that company.
SELECT MAX(ord) FROM phonebook WHERE `id_company` = "51";
Would this be a wise route to go? Or should I create a new database for each client, create an index there, and use that to order entries?
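As a rough illustration of the MAX idea, the lookup and the insert can be folded into one statement. This is a sketch, not a tested schema: the column list and the inserted values are hypothetical, and only id_company, ord, phonebook_name and phonebook_number are names that appear in this thread.

```sql
-- Sketch only: the inserted name and number are placeholder values.
INSERT INTO phonebook (id_company, phonebook_name, phonebook_number, ord)
SELECT 51,
       'New entry',
       '555-0100',
       COALESCE(MAX(ord), 0) + 1   -- 1 for a company's first entry
FROM phonebook
WHERE id_company = 51;
```

Two sessions inserting for the same company at the same moment could still read the same MAX, so this does not by itself guarantee unique ord values per company.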
I suggest you aim for less than complete perfection in your assignment of ord values. You can get away with this as follows:
Don't make ord unique (it isn't).
Rely on the ordering of phonebook_name to get a good order of names. MySQL has these wonderful case-insensitive collations for precisely this purpose.
I suppose you're trying to make some of the entries for a company come first, and others come last. Set the ord column to 50 for everybody, then give the entries you want first lower numbers, and the ones you want last higher numbers.
When you display data for a particular company, do it like this ...
SELECT whatever, whatever
FROM phonebook
WHERE id_company = 11
ORDER BY ord, phonebook_name, phonebook_number, id_phonebook
This ORDER BY clause will do what you want, and it will be stable if there are duplicates. You can then, in your user interface, move an entry up with a query like this.
UPDATE phonebook SET ord=ord-1 WHERE id_phonebook = :recordnumber

A Complicated MySQL Query

I want to perform a very complicated Query on a MySQL Table. Currently this MySQL Table stores user info like IP, Country, event_id and many other statistics like date_start date_end for specific events.
A specific event_id starts with date_start and when the user ends it a time() value is being written to the date_end column.
I want a query to somehow find all the suspicious users (returning their ids). Below are the rules that define a suspicious user.
There are rows in the database showing that the user_id has connected from multiple countries, i.e. the country column has different values for that user.
There are many rows in the database for a specific event_id, and the SUM of (date_end - date_start) for that event is, for example, 50% higher than the SUM of (date_end - date_start) for the other events. In simple words, the query should report the user_ids that have spent too much time on some events whereas they didn't spend much time on all the others. The percentage should be configurable.
I know it sounds crazy; I tried to do it and failed badly. I did it using PHP, but it's slow and I'm sure it can be done with queries.
I hope you understand me.
Thank you
This problem is too big. Figure out how to find the users who have come from multiple countries. Then figure out how to get statistics on event durations. Then figure out how to identify outliers. Then, finally, try to merge all three solutions.
In general, use SQL to filter the data down to a manageable size, then PHP to do any further processing.
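The first two steps might look something like the sketch below. Everything not named in the question is an assumption: the table name user_events is hypothetical, date_start and date_end are assumed to hold Unix timestamps, and "too much time" is read as "more than a configurable multiple of that user's average per-event total", which is only one possible reading of the rule.

```sql
-- Step 1: users seen from more than one country.
SELECT user_id
FROM user_events
GROUP BY user_id
HAVING COUNT(DISTINCT country) > 1;

-- Step 2: per-user, per-event totals that exceed a configurable multiple
-- of that user's average per-event total (1.5 = "50% more").
SET @threshold = 1.5;

SELECT per_event.user_id,
       per_event.event_id,
       per_event.total_seconds
FROM (
    SELECT user_id, event_id, SUM(date_end - date_start) AS total_seconds
    FROM user_events
    WHERE date_end IS NOT NULL          -- skip events that have not ended yet
    GROUP BY user_id, event_id
) AS per_event
JOIN (
    SELECT user_id, AVG(total_seconds) AS avg_seconds
    FROM (
        SELECT user_id, event_id, SUM(date_end - date_start) AS total_seconds
        FROM user_events
        WHERE date_end IS NOT NULL
        GROUP BY user_id, event_id
    ) AS totals
    GROUP BY user_id
) AS per_user ON per_user.user_id = per_event.user_id
WHERE per_event.total_seconds > @threshold * per_user.avg_seconds;
```

The two result sets can then be merged in SQL or in PHP, in line with the advice above.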

MySQL Query: Return all rows with a certain value in one column when value in another column matches specific criteria

This may be a little difficult to answer given that I'm still learning to write queries and I'm not able to view the database at the moment, but I'll give it a shot.
The database I'm trying to acquire information from contains a large table (TransactionLineItems) that essentially functions as a store transaction log. This table currently contains about 5 million rows and several columns describing products which are included in each transaction (TLI_ReceiptAlias, TLI_ScanCode, TLI_Quantity and TLI_UnitPrice). This table has a foreign key which is paired with a primary key in another table (Transactions), and this table contains transaction numbers (TRN_ReceiptNumber). When I join these two tables, the query returns one row for every item we've ever sold, and each row has a receipt number. 16 rows might have the same receipt number, meaning that all of these items were sold in a single transaction. Below that might be 12 more rows, each sharing another receipt number. All transactions are broken down into multiple rows like this.
I'm attempting to build a query which returns all rows sharing a single receipt number where at least one row with that receipt number meets certain criteria in another column. For example, three separate types of gift cards all have values in the TLI_ScanCode column that begin with "740000." I want the query to return rows with values beginning with these six digits in the TLI_ScanCode column, but I would also like to return all rows which share a receipt number with any of the rows which meet the given scan code criteria. Essentially, I need the query to return all rows for every receipt number which is also paired in at least one row with a gift card-related scan code.
I attempted to use a subquery to return a column of all receipt numbers paired with gift card scan codes, using "WHERE A.TRN_ReceiptAlias IN (subquery..." to return only those rows with a receipt number which matched one of the receipt numbers returned by the subquery. This appeared to run without issue for five minutes before the server ground to a halt for another twenty while it processed the query. The query appeared to complete successfully, but given that I was working with IT to restore normal store operations during this time I failed to obtain the results of the query (apart from the associated shame and embarrassment).
I'd like to know if there is a way to write a query to obtain this information without causing the server to hang. I'm assuming that either: a) it wasn't very smart to use a subquery in this manner on such a large table, or b) I don't know enough about SQL to obtain the information I need. I'm assuming the answer is both A and B, but I'd very much like to learn how to do this the right way. Any help would be greatly appreciated. Thanks!
SELECT *
FROM a as a1
JOIN b
ON b.id = a1.id
JOIN a as a2
ON a2.id = b.id
WHERE b.some_criteria = 'something';
Include an index on (b.id,b.some_criteria)
You aren't the first person, nor will you be the last to bring down your system with an inefficient query.
The most important lesson is that "Decision Support" and "Analytics" really don't co-exist with a transaction system. You really want to pull the data into a datamart or datawarehouse or some other database that isn't your transaction database, so that you don't take the business offline.
In terms of understanding why your initial query was so inefficient, you want to familiarize yourself with the EXPLAIN syntax (EXPLAIN EXTENDED on older MySQL versions), which returns query plan information that should help you debug your query and make it perform acceptably. If you update your question with the actual EXPLAIN output for it, that would help in determining what the issue is.
Just from the outline you provided, it does sound like a self join would make sense rather than the subquery.
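Applied to the tables described in the question, a self join could be sketched roughly as follows. The key column names TLI_TRN_ID and TRN_ID are hypothetical, since the question does not name the foreign key or the primary key, and TLI_ScanCode is assumed to be a string column.

```sql
-- Sketch only: TLI_TRN_ID and TRN_ID are assumed key column names.
SELECT trn.TRN_ReceiptNumber,
       item.TLI_ReceiptAlias,
       item.TLI_ScanCode,
       item.TLI_Quantity,
       item.TLI_UnitPrice
FROM (
    -- one row per transaction that contains at least one gift card item
    SELECT DISTINCT gc.TLI_TRN_ID
    FROM TransactionLineItems AS gc
    WHERE gc.TLI_ScanCode LIKE '740000%'
) AS gift
JOIN TransactionLineItems AS item
    ON item.TLI_TRN_ID = gift.TLI_TRN_ID   -- every line item of those transactions
JOIN Transactions AS trn
    ON trn.TRN_ID = item.TLI_TRN_ID
ORDER BY trn.TRN_ReceiptNumber;
```

An index on TLI_ScanCode helps the prefix match, and one on the foreign key column helps the join back to the line items. Running it under EXPLAIN first, and preferably against a copy of the data, avoids repeating the outage.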

MySQL query speed issues when counting from second table

So I'm having serious speed problems using a left join to count ticket comments from another table. I've tried using a sub-select in the count field and had precisely the same performance.
With the count, the query takes about 1 second on maybe 30 tickets, and 5 seconds for 19000 tickets (I have both a production and a development server so times are skewed a tad). I'm trying to optimize this as four variations of the query need to be run each time per page refresh.
Without the count, I see execution time fall from 1 second to 0.03 seconds, so certainly this count is being run across all tickets and not just the ones which are selected.
Here's a trimmed down version of the query in question:
SELECT tickets.ticket_id,
ticket_severity,
ticket_short_description,
ticket_full_description,
count(*) as commentCount
FROM tickets LEFT JOIN tickets_comment ON tickets.ticket_id = tickets_comment.ticket_id
WHERE ticket_status='Open'
and ticket_owner_id='133783475'
GROUP BY
everything,
under,
the,
sun
Now, not all tickets have comments, so I can't just do a right or standard join. When doing that the speed is fairly good (1/10th the current), but any tickets without comments aren't included.
I see three fixes for this, and would like any and all advice you have.
Create a new column comment_count and use a trigger/update query on new comment
Work with the UI and grab comments on the fly (not really wanted)
Hope stackoverflow folks have a more elegant solution :þ
Ideas?
A co-worker has come to the rescue. The query was just using join improperly.
What must be done here is create a second table with a query like:
select ticket_id, count(*) as commentCount from tickets_comment where (clause matches other) group by ticket_id
which will create a table with counts for each ticket id. Then join that table with the ticket table where the ticket ids match. It's not as wicked fast as creating a new column, but it's at least ten times as fast as it was, so I'm pleased as punch.
Last step is converting nulls (on tickets where there were no comments) into zeros
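Put together, the approach described here might look like the sketch below. Column names come from the trimmed query in the question; it is assumed that the ticket columns all live in the tickets table.

```sql
SELECT t.ticket_id,
       t.ticket_severity,
       t.ticket_short_description,
       t.ticket_full_description,
       COALESCE(c.commentCount, 0) AS commentCount   -- NULL (no comments) becomes 0
FROM tickets AS t
LEFT JOIN (
    -- one row per ticket that has at least one comment
    SELECT ticket_id, COUNT(*) AS commentCount
    FROM tickets_comment
    GROUP BY ticket_id
) AS c ON c.ticket_id = t.ticket_id
WHERE t.ticket_status = 'Open'
  AND t.ticket_owner_id = '133783475';
```

Since the counting happens inside the derived table, the outer query no longer needs a GROUP BY over every selected column.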
Creating a new comment_count column (the first option in the question) is by far the fastest solution, and you'll see it done in Rails all the time because it really is that fast.
count(*) is really only used when you aren't selecting any other attributes. Try count(ticket_id) and see if that helps. I can't run explain so I can't test it myself but if your analysis is correct it should help.
Try running EXPLAIN on the query to make sure the correct indexes are being used. If no indexes are being used, create one.

MS-Access design pattern for last value for a grouping

It's common to have a table where, for example, the fields are account, value, and time. What's the best design pattern for retrieving the last value for each account? Unfortunately, the LAST keyword in a grouping gives you the last physical record in the database, not the last record by any sorting, which means IMHO it should never be used. The two clumsy approaches I use are either a subquery approach or a secondary query to determine the last record, and then joining to the table to find the value. Isn't there a more elegant approach?
could you not do:
select account,last(value),max(time)
from table
group by account
I tested this (granted for a very small, almost trivial record set) and it produced proper results.
Edit:
That also doesn't work after some more testing. I did a fair bit of Access programming in a past life and feel like there is a way to do what you're asking in one query, but I'm drawing a blank at the moment. Sorry.
After literally years of searching I finally found the answer at the link below #3. The sub-queries above will work, but are very slow -- debilitatingly slow for my purposes.
The more popular answer is a tri-level query: 1st level finds the max, 2nd level gets the field values based on the 1st query. The result is then joined in as a table to the main query. Fast but complicated and time-consuming to code/maintain.
This link works, still runs pretty fast and is a lot less work to code/maintain. Thanks to the authors of this site.
http://access.mvps.org/access/queries/qry0020.htm
The subquery option sounds best to me, something like the following pseudo-SQL. It may be possible or necessary to optimize it via a join; that will depend on the capabilities of the SQL engine.
select *
from table
where account+time in (select account+max(time)
from table
group by account)
This is a good trick for returning the last record in a table:
SELECT TOP 1 * FROM TableName ORDER BY Time DESC
Check out this site for more info.
@Tom
It might be easier for me in general to do the "In" query that you've suggested. Generally I do something like
select T1.account, T1.value
from T as T1
where T1.time = (select max(T2.time) from T as T2 where T1.account = T2.account)
@shs
Yes, that select last(value) SHOULD work, but it doesn't... My understanding, although I can't produce an authoritative source, is that last(value) gives the last physical record in the Access file, which means it could be the first one timewise but the last one physically. So I don't think you should use last(value) for anything other than a really bad random row.
I'm trying to find the latest date in a group using the Access 2003 query builder, and ran into the same problem trying to use LAST for a date field. But it looks like using MAX finds the latest date.
Perhaps the following SQL is clumsy, but it seems to work correctly in Access.
SELECT
a.account,
a.time,
a.value
FROM
tablename AS a INNER JOIN [
SELECT
account,
Max(time) AS MaxOftime
FROM
tablename
GROUP BY
account
]. AS b
ON
(a.time = b.MaxOftime)
AND (a.account = b.account)
;