Connecting two tables and having count, distinct, and average of time difference

Connecting two tables and having count, distinct, and average of time difference - mysql

First, I apologize if my question is not correctly organized.
I am trying to run an SQL Query in Java in order to return all the records of time difference. So to explain more:
I have two tables. Table A has the following structure:
Table `A` (
`interaction_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`job_id` int(11) NOT NULL,
`task_id` varchar(250) NOT NULL,
`task_time` datetime DEFAULT NULL,
`task_assessment` float DEFAULT NULL,
)
Table `B` (
`task_id` varchar(250) NOT NULL,
`task_type` varchar(250) DEFAULT NULL,
`task_weight` float DEFAULT NULL,
`task_due` datetime DEFAULT NULL,
`Job_id` int(11) NOT NULL
)
what I need is to get the count(distinct) from table A -and I do that using the interaction_id
and then get their times -using the task_time for each user and i use "WHERE user_id='" + userId (a java parameter).
After that I want to link Table A with Table B using Job_id
so that I can get the difference date (in hour, so i used SELECT TIMEDIFF(Hour, A(task_time), B(task_due)).
Finally, i need to get Average of the time difference.
I believe its a bit complicated when describing. But, I would appreciate your advanced help!
Thank you very much

This query should gather the results that you are expecting:
select count(*) as countLines,
avg(time_to_sec(timediff(A.task_time, B.task_due)) / 3600)
from A
inner join B on A.job_id = B.job_id
where A.user_id = #userId

Related

Mysql count number of result with join

I have 2 tables.
CREATE TABLE $media_table (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`options` longtext DEFAULT NULL,
`order_id` int(11) unsigned DEFAULT NULL,
`player_id` int(11) unsigned NOT NULL,
PRIMARY KEY (`id`))
CREATE TABLE $category_table (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`category` varchar(300) DEFAULT NULL,
`media_id` int(11) unsigned DEFAULT NULL,
PRIMARY KEY (`id`))
I get id, options, category for rows matching category 'foo','bar'. I also use limit to get only x number of results.
SELECT mt.id, mt.options, ct.category
FROM $media_table as mt
LEFT JOIN $category_table as ct
ON mt.id = ct.media_id
WHERE mt.player_id = %d AND ct.category IN ('foo','bar')
GROUP BY ct.media_id
ORDER BY mt.order_id
LIMIT $limit
This works as intended. But I dont know how to get total number of results?
I tried this but the count is not correct.
SELECT COUNT(mt.id), ct.category
FROM $media_table as mt
LEFT JOIN $category_table as ct
ON mt.id = ct.media_id
WHERE mt.player_id = %d AND ct.category IN ('foo','bar')
GROUP BY ct.media_id
Where I select all results without the limit (in my previous query) the count is correct.
If I had only one table with primary key id I would do this to get count:
SELECT COUNT(id) FROM table
I dont know how to apply the same to my query.
Edit: I found my answer here select count(*) from select

Question 1: Are you looking at the raw results of the query using a tool like phpMyAdmin or MySQL WorkBency or what?
Question 2: Will the ultimate query results be delivered to the client via a web browser or what?
Answer 1: "The SUM() function returns the total sum of a numeric column."
SELECT SUM(column_name) FROM table_name WHERE condition;
Answer Possibility 2: If the results will be delivered in a web browser you should be able to use PHP or some other server side language like MS Active Server Pages to add up he "COUNT" field of each result.
Answer Possibility 3: Alternative 1: Export the results to a CVS file and import into a spreadsheet.
Maybe some of these suggestions will get the wheels turning and help you find the solution you are looking for.

Improve query speed suggestions

For self education I am developing an invoicing system for an electricity company. I have multiple time series tables, with different intervals. One table represents consumption, two others represent prices. A third price table should be still incorporated. Now I am running calculation queries, but the queries are slow. I would like to improve the query speed, especially since this is only the beginning calculations and the queries will only become more complicated. Also please note that this is my first database i created and exercises I have done. A simplified explanation is preferred. Thanks for any help provided.
I have indexed: DATE, PERIOD_FROM, PERIOD_UNTIL in each table. This speed up the process from 60 seconds to 5 seconds.
The structure of the tables is the following:
CREATE TABLE `apxprice` (
`APX_id` int(11) NOT NULL AUTO_INCREMENT,
`DATE` date DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`PRICE` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`APX_id`)
) ENGINE=MyISAM AUTO_INCREMENT=28728 DEFAULT CHARSET=latin1
CREATE TABLE `imbalanceprice` (
`imbalanceprice_id` int(11) NOT NULL AUTO_INCREMENT,
`DATE` date DEFAULT NULL,
`PTU` tinyint(3) DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`UPWARD_INCIDENT_RESERVE` tinyint(1) DEFAULT NULL,
`DOWNWARD_INCIDENT_RESERVE` tinyint(1) DEFAULT NULL,
`UPWARD_DISPATCH` decimal(10,2) DEFAULT NULL,
`DOWNWARD_DISPATCH` decimal(10,2) DEFAULT NULL,
`INCENTIVE_COMPONENT` decimal(10,2) DEFAULT NULL,
`TAKE_FROM_SYSTEM` decimal(10,2) DEFAULT NULL,
`FEED_INTO_SYSTEM` decimal(10,2) DEFAULT NULL,
`REGULATION_STATE` tinyint(1) DEFAULT NULL,
`HOUR` int(2) DEFAULT NULL,
PRIMARY KEY (`imbalanceprice_id`),
KEY `DATE` (`DATE`,`PERIOD_FROM`,`PERIOD_UNTIL`)
) ENGINE=MyISAM AUTO_INCREMENT=117427 DEFAULT CHARSET=latin
CREATE TABLE `powerload` (
`powerload_id` int(11) NOT NULL AUTO_INCREMENT,
`EAN` varchar(18) DEFAULT NULL,
`DATE` date DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`POWERLOAD` int(11) DEFAULT NULL,
PRIMARY KEY (`powerload_id`)
) ENGINE=MyISAM AUTO_INCREMENT=61039 DEFAULT CHARSET=latin
Now when running this query:
SELECT i.DATE, i.PERIOD_FROM, i.TAKE_FROM_SYSTEM, i.FEED_INTO_SYSTEM,
a.PRICE, p.POWERLOAD, sum(a.PRICE * p.POWERLOAD)
FROM imbalanceprice i, apxprice a, powerload p
WHERE i.DATE = a.DATE
and i.DATE = p.DATE
AND i.PERIOD_FROM >= a.PERIOD_FROM
and i.PERIOD_FROM = p.PERIOD_FROM
AND i.PERIOD_FROM < a.PERIOD_UNTIL
AND i.DATE >= '2018-01-01'
AND i.DATE <= '2018-01-31'
group by i.DATE
I have run the query with explain and get the following result: Select_type, all simple partitions all null possible keys a,p = null i = DATE Key a,p = null i = DATE key_len a,p = null i = 8 ref a,p = null i = timeseries.a.DATE,timeseries.p.PERIOD_FROM rows a = 28727 p = 61038 i = 1 filtered a = 100 p = 10 i = 100 a extra: using where using temporary using filesort b extra: using where using join buffer (block nested loop) c extra: null
Preferably I run a more complicated query for a whole year and group by month for example with all price tables incorporated. However, this would be too slow. I have indexed: DATE, PERIOD_FROM, PERIOD_UNTIL in each table. The calculation result may not be changed, in this case quarter hourly consumption of two meters multiplied by hourly prices.

"Categorically speaking," the first thing you should look at is indexes.
Your clauses such as WHERE i.DATE = a.DATE ... are categorically known as INNER JOINs, and the SQL engine needs to have the ability to locate the matching rows "instantly." (That is to say, without looking through the entire table!)
FYI: Just like any index in real-life – here I would be talking about "library card catalogs" if we still had such a thing – indexes will assist both "equal to" and "less/greater than" queries. The index takes the computer directly to a particular point in the data, whether that's a "hit" or a "near miss."
Finally, the EXPLAIN verb is very useful: put that word in front of your query, and the SQL engine should "explain to you" exactly how it intends to carry out your query. (The SQL engine looks at the structure of the database to make that decision.) Although the EXPLAIN output is ... (heh) ... "not exactly standardized," it will help you to see if the computer thinks that it needs to do something very time-wasting in order to deliver your answer.

How to optimized mysql query having large dataset

I have two tables with the following schema,
CREATE TABLE `open_log` (
`delivery_id` varchar(30) DEFAULT NULL,
`email_id` varchar(50) DEFAULT NULL,
`email_activity` varchar(30) DEFAULT NULL,
`click_url` text,
`email_code` varchar(30) DEFAULT NULL,
`on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `sent_log` (
`email_id` varchar(50) DEFAULT NULL,
`delivery_id` varchar(50) DEFAULT NULL,
`email_code` varchar(50) DEFAULT NULL,
`delivery_status` varchar(50) DEFAULT NULL,
`tries` int(11) DEFAULT NULL,
`creation_ts` varchar(50) DEFAULT NULL,
`creation_dt` varchar(50) DEFAULT NULL,
`on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The email_id and delivery_id columns in both tables make up a unique key.
The open_log table have 2.5 million records where as sent_log table has 0.25 million records.
I want to filter out the records from open log table based on the unique key (email_id and delivery_id).
I'm writing the following query.
SELECT * FROM open_log
WHERE CONCAT(email_id,'^',delivery_id)
IN (
SELECT DISTINCT CONCAT(email_id,'^',delivery_id) FROM sent_log
)
The problem is the query is taking too much time to execute. I've waited for an hour for the query completion but didn't succeed.
Kindly, suggest what I can do to make it fast since, I have the big data size in the tables.
Thanks,
Faisal Nasir

First, rewrite your query using exists:
SELECT *
FROM open_log ol
WHERE EXISTS (SELECT 1
FROM send_log sl
WHERE sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id
);
Then, add an index so this query will run faster:
create index idx_sendlog_emailid_deliveryid on send_log(email_id, delivery_id);
Your query is slow for a variety of reasons:
The use of string concatenation makes it impossible for MySQL to use an index.
The select distinct in the subquery is unnecessary.
Exists can be faster than in.

If this request is often on, you can greatly increase it by create bigint id column, enven if it not unique.
For example you can put trigger and create column like this
alter table sent_log for_get bigint;
After that create trigger/ update it to put hash into that bigint
for_get=CONV(substr(md5(concat(email_id, delivery_id)),1,10),16,10)
If you have such column in both table and index on it, query will be like
SELECT *
FROM open_log ol
left join send_log sl on sl.for_get=ol.for_get
WHERE sl.email_id is not null and sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id;
That query will be fast.

Ordering in MySQL Bogs Down

I've been working on a small Perl program that works with a table of articles, displaying them to the user if they have not been already read. It has been working nicely and it has been quite speedy, overall. However, this afternoon, the performance has degraded from fast enough that I wasn't worried about optimizing the query to a glacial 3-4 seconds per query. To select articles, I present this query:
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
WHERE ciid NOT
IN (
SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)
AND (
cid =117
OR cid =308
OR cid =310
)
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
The list of possible cid's varies and could be quite a bit more. In any case, I noted that about 2-3 seconds of the total time to make the query is devoted to "ORDER BY." If I remove that, it only takes about a half second to give me the query back. If I drop the subquery, the performance goes back to normal... but the subquery didn't seem to be problematic until just this afternoon, after working fine for a week or so.
Any ideas what could be slowing it down so much? What might I do to try to get the performance back up to snuff? The table being queried has 45,000 rows. The subquery's table has fewer than 3,000 rows at present.
Update: Incidentally, if anyone has suggestions on how to do multiple queries or some other technique that would be more efficient to accomplish what I am trying to do, I am all ears. I'm really puzzled how to solve the problem at this point. Can I somehow apply the order by before the join to make it apply to the real table and not the derived table? Would that be more efficient?
Here is the latest version of the query, derived from suggestions from #Gordon, below
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
LEFT JOIN (
SELECT ciid, dateRead
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)alreadyRead ON channelitem.ciid = alreadyRead.ciid
WHERE (
alreadyRead.ciid IS NULL
)
AND `cid`
IN ( 6648, 329, 323, 6654, 6647 )
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
Also, I should mention what my db structure looks like with regards to these two tables -- maybe someone can spot something odd about the structure:
CREATE TABLE IF NOT EXISTS `channelitem` (
`newsversion` int(11) NOT NULL DEFAULT '0',
`cid` int(11) NOT NULL DEFAULT '0',
`ciid` int(11) NOT NULL AUTO_INCREMENT,
`description` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`url` varchar(222) DEFAULT NULL,
`creationdate` datetime DEFAULT NULL,
`urgent` varchar(10) DEFAULT NULL,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`lastchanged` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`author` varchar(255) NOT NULL,
PRIMARY KEY (`ciid`),
KEY `newsversion` (`newsversion`),
KEY `cid` (`cid`),
KEY `creationdate` (`creationdate`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1638554365 ;
CREATE TABLE IF NOT EXISTS `uninet_channelitem_read` (
`ciid` int(11) NOT NULL,
`uid` int(11) NOT NULL,
`dateRead` datetime NOT NULL,
PRIMARY KEY (`ciid`,`uid`),
KEY `ciid` (`ciid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

It never hurts to try the left outer join version of such a query:
SELECT ci.ciid, ci.cid, ci.name, ci.description, ci.url, ci.creationdate, ci.author
FROM `channelitem` ci left outer join
(SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
) cr
on ci.ciid = cr.ciid
where cr.ciid is null and
ci.cid in (117, 308, 310)
ORDER BY ci.`creationdate` DESC
LIMIT 0 , 100
This query will be faster with an index on uninet_channelitem_read(ciid) and probably on channelitem(cid, ciid, createddate).

The problem could be that you need to create an index on the channelitem table for the column creationdate. Indexes help a database to run queries faster. Here is a link about MySQL Indexing

Is this a MySQL bug or is my query wrong?

Yesterday I run into some sql weirdness. I had a query that melted the server so, trying to improve it, I made this query:
SELECT idEvent, MAX( fechaHora ) , codAgente, evento FROM eventos_centralita GROUP BY codAgente
And it seems to work for this schema:
CREATE TABLE IF NOT EXISTS `eventos_centralita` (
`idEvent` int(11) NOT NULL AUTO_INCREMENT,
`fechaHora` datetime NOT NULL,
`codAgente` varchar(8) DEFAULT NULL,
`extension` varchar(20) DEFAULT NULL,
`evento` varchar(45) DEFAULT NULL,
PRIMARY KEY (`idEvent`),
KEY `codAgente` (`codAgente`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=105847 ;
I mean, that the hour it's indeed the MAX one for the agent. However, the id of the event and the event itself is wrong...
So, is this a bug or is this expected?

You are mixing an aggregate function and a "normal" column select. This "feature" only works in MySQL and returns a random id.
Normally you should group by a specific column and the use aggregate functions to select all other columns not in that group. Example:
SELECT e1.codAgente, e1.idEvent, e1.fechaHora, e1.evento
FROM eventos_centralita e1
inner join
(
select codAgente, MAX(fechaHora) as fechaHora
from eventos_centralita
group by codAgente
) e2
on e1.codAgente = e2.codAgente and e1.fechaHora = e2.fechaHora

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Connecting two tables and having count, distinct, and average of time difference - mysql

This query should gather the results that you are expecting: select count(*) as countLines, avg(time_to_sec(timediff(A.task_time, B.task_due)) / 3600) from A inner join B on A.job_id = B.job_id where A.user_id = #userId

Related

Mysql count number of result with join

Improve query speed suggestions

How to optimized mysql query having large dataset

Ordering in MySQL Bogs Down

Is this a MySQL bug or is my query wrong?

Categories

Resources