SELECT-Statement for displaying a temporary state (two tables) - mysql

I am sorry for the title, but I could not find the correct wording.
Situation: I have a schema with two tables:
check_result:
node_name
requirement_number
status
reason
source
acceptance:
node_name
requirement_number
status
reason
valid_from
valid_until
acceptor (mail / name...)
So the idea is to show the status of a check consisting of several entries in check_result, where existing valid acceptances "overlay" the result / status from check_result. To begin with, not all of the information is necessary. The output should contain:
requirement_number | status | reason | source/acceptor
How could I achieve this?
Edit: Requested outputs
sec_ora_acceptance | CREATE TABLE `sec_ora_acceptance` (
`node_name` varchar(20) NOT NULL,
`instance_oracle_sid` varchar(20) NOT NULL,
`req_no` int(11) NOT NULL,
`status` enum('OK','NOK','OPEN','NA') NOT NULL,
`reason` text NOT NULL,
`acceptor` varchar(45) NOT NULL,
`acceptor_mail` varchar(50) DEFAULT NULL,
`date` date NOT NULL,
`valid_until` date DEFAULT '9999-12-31',
PRIMARY KEY (`node_name`,`instance_oracle_sid`,`req_no`)
)
sec_ora_result | CREATE TABLE `sec_ora_result` (
`check_id` int(11) NOT NULL,
`req_no` int(11) NOT NULL,
`status` enum('OK','NOK','OPEN','NA') NOT NULL COMMENT 'OK, NOK, OPEN, N(ot)A(pplicable)',
`reason` text,
PRIMARY KEY (`check_id`,`req_no`)
)
Edit#2: Requested Information (example and results)
I adjusted the columns in the sec_ora_result, to make it easier (no other tables needed for comparison - just the two tables)
table sec_ora_result:
check_id|req_no|status|reason|node_name|instance_oracle_sid|source
1|1|OPEN|Could not be tested automatically|abc|ora1|automatic_security_test
2|4|OK|Software Version is OK|abc|ora1|automatic_security_test
3|5|NOK|There is a Problem|abc|ora1|automatic_security_test
table sec_ora_acceptance:
node_name|instance_oracle_sid|req_no|status|reason|acceptor|acceptor_mail|date|valid_until
abc|ora1|1|OK|Manual proof|Markus|markus@email.com|2014-02-20|9999-12-31
The result should now consist of the following
req_no|status|reason|source
1|OK|Manual proof|Markus
4|OK|Software Version is OK|automatic_security_test
5|NOK|There is a Problem|automatic_security_test
Regards
Markus

EDIT:
As far as I understand, you need something like this:
SELECT SR.req_no,
COALESCE(SA.status, SR.status) AS Status,
COALESCE(SA.reason, SR.reason) AS Reason,
COALESCE(SA.acceptor, SR.source) AS Source
FROM sec_ora_result SR
LEFT JOIN sec_ora_acceptance SA ON SA.req_no = SR.req_no
Note that COALESCE works here because status, reason and acceptor are NOT NULL in sec_ora_acceptance, so a NULL in any of them means the left join found no acceptance row and the SR field is used instead. (In MySQL, ISNULL() takes a single argument and only tests for NULL; the two-argument fallback form is IFNULL() or COALESCE().) If any of those fields could be NULL, you would need a regular CASE WHEN SA.req_no IS NULL THEN SR.field ELSE SA.field END.
Also, take a look at the key used for the left join; I am not sure whether you are matching only on req_no or also need something else, such as node_name and instance_oracle_sid.
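To illustrate the point about the join key, here is a minimal sketch that matches on the full primary key of sec_ora_acceptance and only lets acceptances overlay the result while they are valid today. It assumes that sec_ora_result carries the node_name, instance_oracle_sid and source columns shown in the sample data of the question; the validity check on `date` and valid_until is my reading of "existing valid acceptances", not something the asker confirmed.

```sql
-- Sketch only: assumes sec_ora_result has node_name, instance_oracle_sid
-- and source columns, as in the sample data of the question.
SELECT SR.req_no,
       COALESCE(SA.status,   SR.status) AS status,
       COALESCE(SA.reason,   SR.reason) AS reason,
       COALESCE(SA.acceptor, SR.source) AS source
FROM sec_ora_result AS SR
LEFT JOIN sec_ora_acceptance AS SA
       ON  SA.node_name           = SR.node_name
       AND SA.instance_oracle_sid = SR.instance_oracle_sid
       AND SA.req_no              = SR.req_no
       -- only acceptances that are valid today overlay the check result;
       -- valid_until is nullable, so treat NULL as "no end date"
       AND CURDATE() BETWEEN SA.`date` AND COALESCE(SA.valid_until, '9999-12-31')
ORDER BY SR.req_no;
```

With the sample rows, the acceptance for requirement 1 should replace the OPEN result, while requirements 4 and 5 keep their check results. Because the validity test sits in the ON clause rather than in WHERE, an expired acceptance simply falls back to the check result instead of removing the row.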

Related

Pulling a random value out of a table is returning a null value

I have a stored procedure that I've used to 'de-identify' client information when I want to use it in a test environment. I am replacing actual names and addresses with random values. I have database tables in a database called dict (for dictionary) for female names, male names, last names, and addresses.
Each of these has a field called f_row_id that is a sequential number from 1 to x, one for each record in the table.
We recently upgraded to MySQL 8 and the stored procedure quit working. I ended up with NULL for every field where I tried filling in a random value out of the other table. In trying to find what will now work, I'm unable to get the following query to work as I expect:
SELECT
f_enroll_id,
(SELECT f_name FROM dict.dummy_female_first_name fn WHERE fn.f_row_id = (FLOOR(RAND() * 850) + 1) LIMIT 1)
FROM
t_enroll
My data table (that I eventually want to have contain random names) is called t_enroll. There is an ID field in that (f_enroll_id) I want to get a list of each ID and a random first name for each record in that table.
There are 850 records in the table of random first names (dummy_female_first_name) (in my stored procedure this is a session variable that I compute at the start of the procedure).
When I first tried running this I got an error that my sub-query returned more than one value. I don't understand why it would do that since (FLOOR(RAND() * 850) + 1) should return a single integer. So I added the LIMIT 1. But when I run this, about half of the returned rows have NULL for the first name.
I have verified that all the rows in my first name table have a row ID, that the row ID is unique, and that there are no gaps in the numbers.
What do you think is causing this?
Thanks in advance!
Here is the schema for the table that I'm updating:
CREATE TABLE `t_enroll` (
`f_enroll_id` int(15) NOT NULL AUTO_INCREMENT,
`f_status` int(2) DEFAULT NULL,
`f_date_enrolled` date NOT NULL DEFAULT '0000-00-00',
`f_first_name` varchar(20) DEFAULT NULL,
`f_mi` char(1) DEFAULT NULL,
`f_last_name` varchar(20) NOT NULL DEFAULT '',
`f_maiden_name` varchar(20) DEFAULT NULL,
`f_dob` date NOT NULL DEFAULT '0000-00-00',
`f_date_fee_received` date NOT NULL DEFAULT '0000-00-00',
`f_gender` int(11) NOT NULL DEFAULT '2',
`f_address_1` varchar(40) DEFAULT NULL,
`f_address_2` varchar(20) DEFAULT NULL,
`f_quadrant` char(2) DEFAULT NULL,
`f_city` varchar(25) DEFAULT NULL,
`f_state` char(2) NOT NULL DEFAULT '',
`f_county` varchar(3) NOT NULL,
`f_zip_code` varchar(10) DEFAULT NULL,
PRIMARY KEY (`f_enroll_id`),
KEY `f_date_enrolled` (`f_date_enrolled`),
KEY `f_last_name` (`f_last_name`),
KEY `f_first_name` (`f_first_name`),
KEY `f_dob` (`f_dob`),
KEY `f_gender` (`f_gender`)
) ENGINE=InnoDB AUTO_INCREMENT=532 DEFAULT CHARSET=latin1 COMMENT='InnoDB free: 15360 kB';
Here is the schema for the dictionary table where I pull names from:
CREATE TABLE `dummy_female_first_name` (
`f_row_id` int(11) NOT NULL,
`f_name` varchar(25) NOT NULL,
PRIMARY KEY (`f_row_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
As I mentioned in my comment, I have found an alternate approach using the ORDER BY RAND() LIMIT 1 variation. But I am still curious as to what caused my original method to fail. This is something that changed in the more recent MySQL version, because it used to work.
Thanks again.
It is a much more expensive approach, but you can use:
SELECT f_enroll_id,
(SELECT f_name FROM dict.dummy_female_first_name fn ORDER BY rand() LIMIT 1)
FROM t_enroll;
You can make this more efficient using:
SELECT f_enroll_id,
(SELECT f_name
FROM dict.dummy_female_first_name fn
WHERE rand() < 0.01
ORDER BY rand() LIMIT 1
)
FROM t_enroll;
The WHERE clause means that only about 8 of the 850 rows (1%) pass the filter, so the sort is much faster. Very occasionally no row passes the filter, in which case the subquery returns NULL.
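As for why the original query returns NULL: as far as I can tell from the MySQL documentation, RAND() in a WHERE clause is evaluated again for every row that is checked, so each row of the name table is compared against a different random number. For a given t_enroll row, zero, one or several names can therefore match, which would explain both the "more than one value" error and the NULLs. The sketch below first makes that visible and then shows one possible way to draw the number once per row. The aliases e and pick and the use of the NO_MERGE optimizer hint are my additions, not part of the original procedure.

```sql
-- Run this a few times: the count varies (0, 1, 2, ...) because RAND() is
-- re-evaluated for each row of the name table.
SELECT COUNT(*) AS matching_rows
FROM dict.dummy_female_first_name fn
WHERE fn.f_row_id = FLOOR(RAND() * 850) + 1;

-- Possible workaround: draw one number per t_enroll row in a derived table
-- and join on it. NO_MERGE asks the optimizer to materialize the derived
-- table, so the drawn value is computed once per row and then reused.
SELECT /*+ NO_MERGE(e) */
       e.f_enroll_id,
       fn.f_name
FROM (SELECT f_enroll_id,
             FLOOR(RAND() * 850) + 1 AS pick
      FROM t_enroll) AS e
JOIN dict.dummy_female_first_name AS fn
  ON fn.f_row_id = e.pick;
```

The join then becomes a primary key lookup per enrollment row, which should be cheaper than sorting the name table for every row. I have not verified why the older MySQL version behaved differently.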

Speed up mysql SQL query but with a huge dataset

I have a table that has over 2.5 million rows and I would like to run the following SQL statement:
select count(*)
from workflow
where action_name= 'Workflow'
and release_date >= '2019-12-01 13:24:22'
and release_date <= '2019-12-31 13:24:22'
AND project_name= 'Web'
group
by page_id
, headline
, release_full_name
, release_date
The problem is that it takes over 2.7 seconds to return 0 rows, as expected. Is there a way to speed it up? I have 6 more SQL statements that are similar, so together they will take at least 2.7 seconds * 6 ≈ 16 seconds.
Here is my table schema
CREATE TABLE workflow (
id int(11) NOT NULL AUTO_INCREMENT,
action_name varchar(100) NOT NULL,
project_name varchar(30) NOT NULL,
page_id int(11) NOT NULL,
headline varchar(200) NOT NULL,
create_full_name varchar(200) NOT NULL,
create_date datetime NOT NULL,
change_full_name varchar(200) NOT NULL,
change_date datetime NOT NULL,
release_full_name varchar(200) NOT NULL,
release_date datetime NOT NULL,
reject_full_name varchar(200) NOT NULL,
reject_date datetime NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=2948271 DEFAULT CHARSET=latin1
What I'm looking for with this query is the count of pages that were released last month and that have project_name = 'Web' and action_name = 'Workflow'.
This is a bit too long for a comment.
Using GROUP BY with COUNT(*) this way doesn't make much sense: it returns one count per group rather than a single total. Usually you want to count the actual rows, not the rows after aggregation. I'm not sure whether the grouping is your actual requirement; I mention it because the GROUP BY contributes to the slowness of the query.
Use a composite index on (project_name, release_date), as project_name seems to be the most selective column.
For further analysis, please share the EXPLAIN plan.
Assuming that you do need counts per group (the columns you listed), it is better to include the grouping columns in the SELECT list, like:
select page_id, headline, release_full_name, release_date, count(*)
from ...
Adding an index on (page_id, headline) may also help the grouping.
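The index advice above can be sketched as follows. This is only an illustration: the index name is hypothetical, and I have added action_name to the index because the query also filters on it with an equality. The second statement shows what "count of the pages" could look like without GROUP BY, assuming a page counts once even if it has several workflow rows in the period.

```sql
-- Hypothetical index name; equality columns first, the range column last.
CREATE INDEX idx_workflow_project_action_release
    ON workflow (project_name, action_name, release_date);

-- Number of distinct pages released in the period (one row, one number).
SELECT COUNT(DISTINCT page_id) AS released_pages
FROM workflow
WHERE project_name = 'Web'
  AND action_name  = 'Workflow'
  AND release_date >= '2019-12-01 13:24:22'
  AND release_date <= '2019-12-31 13:24:22';
```

Running EXPLAIN on the SELECT before and after creating the index should show whether the optimizer switches from a full table scan to a range scan on the new index.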

Improve query speed suggestions

For self-education I am developing an invoicing system for an electricity company. I have multiple time series tables with different intervals. One table represents consumption, two others represent prices. A third price table still has to be incorporated. I am now running calculation queries, but they are slow. I would like to improve the query speed, especially since these are only the first calculations and the queries will only become more complicated. Please note that this is the first database I have created and these are the first exercises I have done, so a simplified explanation is preferred. Thanks for any help provided.
I have indexed DATE, PERIOD_FROM and PERIOD_UNTIL in each table. This sped up the process from 60 seconds to 5 seconds.
The structure of the tables is the following:
CREATE TABLE `apxprice` (
`APX_id` int(11) NOT NULL AUTO_INCREMENT,
`DATE` date DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`PRICE` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`APX_id`)
) ENGINE=MyISAM AUTO_INCREMENT=28728 DEFAULT CHARSET=latin1
CREATE TABLE `imbalanceprice` (
`imbalanceprice_id` int(11) NOT NULL AUTO_INCREMENT,
`DATE` date DEFAULT NULL,
`PTU` tinyint(3) DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`UPWARD_INCIDENT_RESERVE` tinyint(1) DEFAULT NULL,
`DOWNWARD_INCIDENT_RESERVE` tinyint(1) DEFAULT NULL,
`UPWARD_DISPATCH` decimal(10,2) DEFAULT NULL,
`DOWNWARD_DISPATCH` decimal(10,2) DEFAULT NULL,
`INCENTIVE_COMPONENT` decimal(10,2) DEFAULT NULL,
`TAKE_FROM_SYSTEM` decimal(10,2) DEFAULT NULL,
`FEED_INTO_SYSTEM` decimal(10,2) DEFAULT NULL,
`REGULATION_STATE` tinyint(1) DEFAULT NULL,
`HOUR` int(2) DEFAULT NULL,
PRIMARY KEY (`imbalanceprice_id`),
KEY `DATE` (`DATE`,`PERIOD_FROM`,`PERIOD_UNTIL`)
) ENGINE=MyISAM AUTO_INCREMENT=117427 DEFAULT CHARSET=latin1
CREATE TABLE `powerload` (
`powerload_id` int(11) NOT NULL AUTO_INCREMENT,
`EAN` varchar(18) DEFAULT NULL,
`DATE` date DEFAULT NULL,
`PERIOD_FROM` time DEFAULT NULL,
`PERIOD_UNTIL` time DEFAULT NULL,
`POWERLOAD` int(11) DEFAULT NULL,
PRIMARY KEY (`powerload_id`)
) ENGINE=MyISAM AUTO_INCREMENT=61039 DEFAULT CHARSET=latin1
Now when running this query:
SELECT i.DATE, i.PERIOD_FROM, i.TAKE_FROM_SYSTEM, i.FEED_INTO_SYSTEM,
a.PRICE, p.POWERLOAD, sum(a.PRICE * p.POWERLOAD)
FROM imbalanceprice i, apxprice a, powerload p
WHERE i.DATE = a.DATE
and i.DATE = p.DATE
AND i.PERIOD_FROM >= a.PERIOD_FROM
and i.PERIOD_FROM = p.PERIOD_FROM
AND i.PERIOD_FROM < a.PERIOD_UNTIL
AND i.DATE >= '2018-01-01'
AND i.DATE <= '2018-01-31'
group by i.DATE
I have run the query with EXPLAIN and get the following result (select_type is SIMPLE and partitions is NULL for all three tables):
table|possible_keys|key|key_len|ref|rows|filtered|Extra
a|NULL|NULL|NULL|NULL|28727|100|Using where; Using temporary; Using filesort
p|NULL|NULL|NULL|NULL|61038|10|Using where; Using join buffer (Block Nested Loop)
i|DATE|DATE|8|timeseries.a.DATE,timeseries.p.PERIOD_FROM|1|100|NULL
Ideally I would run a more complicated query for a whole year, grouped by month for example, with all price tables incorporated. However, this would be too slow. I have indexed DATE, PERIOD_FROM and PERIOD_UNTIL in each table. The calculation result must not change: in this case, the quarter-hourly consumption of two meters multiplied by hourly prices.
"Categorically speaking," the first thing you should look at is indexes.
Your clauses such as WHERE i.DATE = a.DATE ... are categorically known as INNER JOINs, and the SQL engine needs to have the ability to locate the matching rows "instantly." (That is to say, without looking through the entire table!)
FYI: Just like any index in real-life – here I would be talking about "library card catalogs" if we still had such a thing – indexes will assist both "equal to" and "less/greater than" queries. The index takes the computer directly to a particular point in the data, whether that's a "hit" or a "near miss."
Finally, the EXPLAIN verb is very useful: put that word in front of your query, and the SQL engine should "explain to you" exactly how it intends to carry out your query. (The SQL engine looks at the structure of the database to make that decision.) Although the EXPLAIN output is ... (heh) ... "not exactly standardized," it will help you to see if the computer thinks that it needs to do something very time-wasting in order to deliver your answer.
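To make the index advice concrete: the EXPLAIN output in the question lists no usable key for apxprice (a) or powerload (p), so those two tables are scanned in full. A minimal sketch of what could be tried is below. The index names are hypothetical, and the query is rewritten with explicit JOINs and reduced to the grouped date and the sum, which is my simplification; the join conditions themselves are unchanged from the question.

```sql
-- Hypothetical index names. Each index starts with the equality column
-- (DATE) and continues with the period column(s) used in the join.
ALTER TABLE apxprice  ADD INDEX idx_apx_date_period  (`DATE`, PERIOD_FROM, PERIOD_UNTIL);
ALTER TABLE powerload ADD INDEX idx_load_date_period (`DATE`, PERIOD_FROM);

-- Same join conditions as in the question, written as explicit joins.
EXPLAIN
SELECT i.`DATE`,
       SUM(a.PRICE * p.POWERLOAD) AS total_cost
FROM imbalanceprice AS i
JOIN apxprice AS a
  ON  a.`DATE` = i.`DATE`
  AND i.PERIOD_FROM >= a.PERIOD_FROM
  AND i.PERIOD_FROM <  a.PERIOD_UNTIL
JOIN powerload AS p
  ON  p.`DATE` = i.`DATE`
  AND p.PERIOD_FROM = i.PERIOD_FROM
WHERE i.`DATE` BETWEEN '2018-01-01' AND '2018-01-31'
GROUP BY i.`DATE`;
```

If the key column in the new EXPLAIN output is no longer NULL for a and p, the engine is using the indexes instead of reading every row. Remove the EXPLAIN keyword to run the query itself.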

Connecting two tables and having count, distinct, and average of time difference

First, I apologize if my question is not correctly organized.
I am trying to run an SQL Query in Java in order to return all the records of time difference. So to explain more:
I have two tables. Table A has the following structure:
Table `A` (
`interaction_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`job_id` int(11) NOT NULL,
`task_id` varchar(250) NOT NULL,
`task_time` datetime DEFAULT NULL,
`task_assessment` float DEFAULT NULL,
)
Table `B` (
`task_id` varchar(250) NOT NULL,
`task_type` varchar(250) DEFAULT NULL,
`task_weight` float DEFAULT NULL,
`task_due` datetime DEFAULT NULL,
`Job_id` int(11) NOT NULL
)
What I need is to get the count(distinct) from table A (I do that using interaction_id)
and then get the times, using task_time for each user, with "WHERE user_id='" + userId (a Java parameter).
After that I want to link Table A with Table B using Job_id
so that I can get the date difference (in hours, so I used SELECT TIMEDIFF(Hour, A(task_time), B(task_due))).
Finally, I need to get the average of the time difference.
I believe it is a bit complicated to describe, but I would appreciate your help in advance!
Thank you very much
This query should gather the results that you are expecting:
select count(*) as countLines,
avg(time_to_sec(timediff(A.task_time, B.task_due)) / 3600)
from A
inner join B on A.job_id = B.job_id
where A.user_id = ? -- bind userId as a PreparedStatement parameter
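A variant that follows the question more literally is sketched below, under a few assumptions: it counts distinct interaction_id values as the asker described, it uses TIMESTAMPDIFF (which takes a unit as its first argument, unlike TIMEDIFF), and the "?" is meant as a JDBC PreparedStatement placeholder for the user id. Note that TIMESTAMPDIFF returns the second date minus the first, so the sign is the opposite of the timediff(A.task_time, B.task_due) expression above.

```sql
-- Sketch: "?" is a JDBC PreparedStatement placeholder for the user id.
SELECT COUNT(DISTINCT A.interaction_id) AS interactions,
       -- task_due minus task_time, in hours (positive = done before due)
       AVG(TIMESTAMPDIFF(MINUTE, A.task_time, B.task_due) / 60) AS avg_hours_to_due
FROM A
INNER JOIN B ON B.Job_id = A.job_id
WHERE A.user_id = ?;
```

One reason to prefer TIMESTAMPDIFF here is that TIMEDIFF returns a TIME value, whose range is limited to roughly 35 days, so larger gaps would be clipped. If a job can have several tasks in table B, joining on job_id alone pairs each row of A with every task of that job; adding A.task_id = B.task_id to the join condition may be closer to what is wanted.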

MySQL booking site: query/db optimization

I have very bad performance in most of my queries. I've read a lot on Stack Overflow but still have some questions; maybe someone could help or give me some hints?
Basically, I am working on a booking website that has, among others, the following tables:
objects
+----+---------+--------+---------+------------+-------------+----------+----------+-------------+------------+-------+-------------+------+-----------+----------+-----+-----+
| id | user_id | status | type_id | privacy_id | location_id | address1 | address2 | object_name | short_name | price | currency_id | size | no_people | min_stay | lat | lng |
+----+---------+--------+---------+------------+-------------+----------+----------+-------------+------------+-------+-------------+------+-----------+----------+-----+-----+
OR in MySQL:
CREATE TABLE IF NOT EXISTS `objects` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'object_id',
`user_id` int(11) unsigned DEFAULT NULL,
`status` tinyint(2) unsigned NOT NULL,
`type_id` tinyint(3) unsigned DEFAULT NULL COMMENT 'type of object, from object_type id',
`privacy_id` tinyint(11) unsigned NOT NULL COMMENT 'id from privacy',
`location_id` int(11) unsigned DEFAULT NULL,
`address1` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`address2` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`object_name` varchar(35) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'given name by user',
`short_name` varchar(12) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'short name, selected by user',
`price` int(6) unsigned DEFAULT NULL,
`currency_id` tinyint(3) unsigned DEFAULT NULL,
`size` int(4) unsigned DEFAULT NULL COMMENT 'size rounded and in m2',
`no_people` tinyint(3) unsigned DEFAULT NULL COMMENT 'number of people',
`min_stay` tinyint(2) unsigned DEFAULT NULL COMMENT '0=no min stay;else # nights',
`lat` varchar(32) COLLATE utf8_unicode_ci DEFAULT NULL,
`lng` varchar(32) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1451046 ;
reservations
+----+------------+-----------+-----------+---------+--------+
| id | by_user_id | object_id | from_date | to_date | status |
+----+------------+-----------+-----------+---------+--------+
OR in MySQL:
CREATE TABLE IF NOT EXISTS `reservations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`by_user_id` int(11) NOT NULL COMMENT 'user_id of guest',
`object_id` int(11) NOT NULL COMMENT 'id of object',
`from_date` date NOT NULL COMMENT 'start date of reservation',
`to_date` date NOT NULL COMMENT 'end date of reservation',
`status` int(1) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=890729 ;
There are a few questions:
1 - I have not set any additional key (except primary) - where should I set and which key should I set?
2 - I have read about MyISAM vs InnoDB, the conclusion for me was that MyISAM is faster when it comes to read-only, whereas InnoDB is designed for tables that get UPDATED or INSERTs more frequently. So, currently objects uses MyISAM and reservations InnoDB. Is this a good idea to mix? Is there a better choice?
3 - I need to query those objects that are available in a certain period (between from_date and end_date). I have read (among others) this post on stackoverflow: MySQL select rows where date not between date
However, when I use the suggested solution the query times out before returning any results (so it is really slow):
SELECT DISTINCT o.id FROM objects o LEFT JOIN reservations r ON(r.object_id=o.id) WHERE
COALESCE('2012-04-05' NOT BETWEEN r.from_date AND r.to_date, TRUE)
AND COALESCE('2012-04-08' NOT BETWEEN r.from_date AND r.to_date, TRUE)
AND o.location_id=201
LIMIT 20
What am I doing wrong? What is the best solution for doing such a query? How do other sites do it? Is my database structure not the best for this or is it only the query?
I would have some more questions, but I would be really grateful for getting any help on this! Thank you very much in advance for any hint or suggestion!
It appears you are looking for any "objects" that do NOT have a reservation conflict based on the from/to dates provided. Using COALESCE() to always include those that are never found in reservations is an OK choice. However, since this is a left join, I would try left joining on the rows where there IS a conflicting date, and then keeping only the objects for which nothing was FOUND. Something like:
SELECT DISTINCT
o.id
FROM
objects o
LEFT JOIN reservations r
ON o.id = r.object_id
AND ( r.from_date between '2012-04-05' and '2012-04-08'
OR r.to_date between '2012-04-05' and '2012-04-08' )
WHERE
o.location_id = 201
AND r.object_id IS NULL
LIMIT 20
I would ensure an index on the reservations table by (object_id, from_date ) and another (object_id, to_date). By explicitly using the from_date between range, (and to date also), you are specifically looking FOR a reservation occupying this time period. If they ARE found, then don't allow, hence the WHERE clause looking for "r.object_id IS NULL" (ie: nothing is found in conflict within the date range you've provided)
Expanding on my previous answer, and with two distinct indexes on (object_id, from_date) and (object_id, to_date), you MIGHT get better performance by joining to reservations once per index and expecting NULL in BOTH reservation sets...
SELECT DISTINCT
o.id
FROM
objects o
LEFT JOIN reservations r
ON o.id = r.object_id
AND r.from_date between '2012-04-05' and '2012-04-08'
LEFT JOIN reservations r2
ON o.id = r2.object_id
AND r2.to_date between '2012-04-05' and '2012-04-08'
WHERE
o.location_id = 201
AND r.object_id IS NULL
AND r2.object_id IS NULL
LIMIT 20
I wouldn't mix InnoDB and MyISAM tables; I would define all the tables as InnoDB (for foreign key support). Generally, all the columns with the _id suffix should be foreign keys referring to the appropriate table (object_id => objects etc.).
You don't have to define an index on a foreign key column, as it is created automatically (since MySQL 4.1.2), but you can define additional indexes on the reservations.from_date and reservations.to_date columns for faster comparison.
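As a rough sketch of what that could look like for the two tables in the question (the constraint and index names are hypothetical): a foreign key column must have the same integer type and signedness as the column it references, and objects.id is unsigned while reservations.object_id is not, so the column is changed first.

```sql
-- Convert the MyISAM table so that both tables use InnoDB.
ALTER TABLE objects ENGINE = InnoDB;

-- Align the column type with objects.id, add the foreign key (InnoDB
-- creates the supporting index on object_id if none exists), and add
-- the two date indexes mentioned above.
ALTER TABLE reservations
    MODIFY object_id int unsigned NOT NULL COMMENT 'id of object',
    ADD CONSTRAINT fk_reservations_object
        FOREIGN KEY (object_id) REFERENCES objects (id),
    ADD INDEX idx_reservations_from (from_date),
    ADD INDEX idx_reservations_to (to_date);
```

Adding the constraint fails if reservations contains object_id values that do not exist in objects, so orphaned rows would need to be cleaned up first. On tables of this size both statements can take a while and are best tried on a copy.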
I know this is a year old, but if you've tried the solution above, the logic isn't complete. It misses reservations that start before the query start AND end after the query end. Also, BETWEEN is inclusive, so it doesn't cope with reservations where one ends on the same day the next one starts.
This worked better for me:
SELECT venues.id
FROM venues LEFT JOIN reservations r
ON venues.id = r.venue_id && (r.date_end >':start' and r.date_start <':end')
WHERE r.venue_id IS NULL
ORDER BY venues.id
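Applied to the tables from the question, the same overlap test can be sketched as below. Two ranges overlap when each one starts before the other ends, which also covers a reservation that spans the whole requested period. This is an illustration only: it uses NOT EXISTS instead of the LEFT JOIN ... IS NULL form (either should work), the index names are hypothetical, and it ignores reservations.status because the question does not say which status values block an object.

```sql
-- Hypothetical index names supporting the filter and the overlap lookup.
ALTER TABLE objects      ADD INDEX idx_objects_location (location_id);
ALTER TABLE reservations ADD INDEX idx_reservations_object_dates (object_id, from_date, to_date);

-- Objects at location 201 with no reservation overlapping
-- 2012-04-05 .. 2012-04-08.
SELECT o.id
FROM objects AS o
WHERE o.location_id = 201
  AND NOT EXISTS (
        SELECT 1
        FROM reservations AS r
        WHERE r.object_id = o.id
          AND r.from_date < '2012-04-08'   -- starts before the request ends
          AND r.to_date   > '2012-04-05'   -- ends after the request starts
      )
LIMIT 20;
```

The strict comparisons mean that a reservation ending on 2012-04-05 does not block a stay starting that day. Because NOT EXISTS never multiplies rows, DISTINCT is no longer needed.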