Basic MySQL query is pretty slow. Am I indexing my tables properly?

I'm currently in the process of optimizing a very large MySQL database, around which I'm building a web-based query interface.
The database will have two tables. The first table is already optimized (I believe), and contains information regarding 950 meteorological data observation stations across the US:
Description for: stations (950 records)
+-----------+------------+--------+-------+---------+----------------+
|Field |Type | NULL |KEY | Default | Extra |
+-----------+------------+--------+-------+---------+----------------+
|id |INT |NO |PRI |NULL |auto_increment |
|stationID |char(4) |NO |PRI |NULL | |
|name |varchar(16) |YES | |NULL | |
|state |char(2) |YES |MUL |NULL | |
|lat |float(6,2) |YES | |NULL | |
|lon |float(6,2) |YES | |NULL | |
|elev |INT |YES | |NULL | |
+-----------+------------+--------+-------+---------+----------------+
The other table contains observations collected at these stations from 2014 through 2017 (constructed, not optimized):
Description for: metar_records (359786049 records)
+-----------+------------+--------+-------+---------+----------------+
|Field |Type | NULL |KEY | Default | Extra |
+-----------+------------+--------+-------+---------+----------------+
|auto_id |INT |NO |PRI |NULL |auto_increment |
|stationID |char(4) |NO |MUL |0 | |
|zdatetime |datetime |NO | |NULL | |
|ldatetime |datetime |NO | |NULL | |
|temp |tinyint(4) |YES | |NULL | |
|dew |tinyint(4) |YES | |NULL | |
|wspd |tinyint(3) |YES | |NULL | #unsigned |
|wdir |tinyint(3) |YES | |NULL | #unsigned |
|wgust |tinyint(3) |YES | |NULL | #unsigned |
|VRB |char(3) |YES | |NULL | |
+-----------+------------+--------+-------+---------+----------------+
where stationID is the field upon which the two tables are related. The metar_records has a unique index on ('stationID', 'zdatetime'). A list of metar_records table indexes:
+-------------+--------+---------+------------+-----------+-----------+----------+
|Table |Non_UNQ |Key_name |Seq_in_index|Column_name|Cardinality|Index_type|
+-------------+--------+---------+------------+-----------+-----------+----------+
|metar_records|0 |PRIMARY |1 |auto_id |358374698 |BTREE |
|metar_records|0 |sz_date |1 |stationID |820079 |BTREE |
|metar_records|0 |sz_date |2 |zdatetime |358374698 |BTREE |
|metar_records|1 |stationID|1 |stationID |598288 |BTREE |
+-------------+--------+---------+------------+-----------+-----------+----------+
Here's where I'm really confused: I also have a test table (called metar_test), which is identical to metar_records aside from having no auto_increment field, and has no indexes whatsoever. Execution of SELECT COUNT(*) FROM metar_test; lasts 0.02 seconds at most, whereas SELECT COUNT(*) FROM metar_records; takes roughly 1 minute and 18 seconds to complete.
I understand that having a table this large will result in some long query times, but metar_records is only 3.36 times larger than metar_test -- why is there such a large discrepancy between the SELECT COUNT(*) ... queries for the two tables? I'm not particularly well versed in data storage, but this difference seems unexpectedly large to me.
How can I improve my indexing to optimize the large table size? Is it possible to reduce the query duration from here?

You can try:
select count(stationID)
from metar_records
This may let the query optimizer use the index on stationID and therefore read less data than count(*), which reads through the full rows. Since stationID is NOT NULL, both queries return the same number.

I would restructure your tables like this.
Stations:
id as auto-increment
stationID char(4), unique
rest...
Metar_records:
id as auto-increment
stationID referencing stations.id
rest...
This way your keys are much shorter and numeric, which should improve performance.

You probably have the "Query cache" turned on. That makes it so that running exactly the same query a second time is very fast. To properly time a query, do this twice and take the second timing:
SELECT SQL_NO_CACHE ...
COUNT(*) is the usual pattern for counting rows. COUNT(col) is slower because it needs to check each col for being NOT NULL.
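To make the difference between COUNT(*) and COUNT(col) concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the snippet runs anywhere; the NULL rule for COUNT is the same in both), and the table name obs and its rows are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; COUNT treats NULLs the same way in both.
conn = sqlite3.connect(":memory:")
# "obs" is a hypothetical, tiny version of the observations table.
conn.execute("CREATE TABLE obs (stationID TEXT NOT NULL, wgust INTEGER)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?)",
    [("KBOS", 12), ("KBOS", None), ("KJFK", None), ("KJFK", 20)],
)

# COUNT(*) counts rows; COUNT(col) counts only rows where col IS NOT NULL.
count_all, count_station, count_gust = conn.execute(
    "SELECT COUNT(*), COUNT(stationID), COUNT(wgust) FROM obs"
).fetchone()

print(count_all, count_station, count_gust)
```

On a NOT NULL column such as stationID the two counts agree, so COUNT(col) only adds the per-row NULL check; on a nullable column such as wgust it answers a different question.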
You have 3 INDEXes on your big table; you need only one:
PRIMARY KEY(stationID, zdatetime)
And, by clustering that way, several likely queries will run faster.
Please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE.
You should use ENGINE=InnoDB, not ENGINE=MyISAM (see SHOW CREATE TABLE).
SELECT COUNT(*) ... is not a very common query; you should not put much stock in how fast it runs.
PARTITIONing is not likely to help performance. Let's see more of your queries -- to double check my claim. MySQL has no parallel processing, even for PARTITIONed tables.
Also toss the id AUTO_INCREMENT from the Stations table; instead, have PRIMARY KEY(stationID).
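As a rough sketch of the single-key layout suggested above, the snippet below builds a small table keyed on (stationID, zdatetime) and runs the kind of per-station date-range query that such a key serves well. It uses Python's sqlite3 as a stand-in for MySQL, which is an assumption: SQLite's WITHOUT ROWID is only an approximate analogue of InnoDB clustering on the primary key, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID makes SQLite store rows in primary-key order, the closest
# SQLite analogue to InnoDB clustering on PRIMARY KEY(stationID, zdatetime).
conn.execute(
    """
    CREATE TABLE metar_records (
        stationID TEXT NOT NULL,
        zdatetime TEXT NOT NULL,
        temp      INTEGER,
        PRIMARY KEY (stationID, zdatetime)
    ) WITHOUT ROWID
    """
)
# Invented sample observations.
conn.executemany(
    "INSERT INTO metar_records VALUES (?, ?, ?)",
    [
        ("KBOS", "2014-01-01 00:00:00", -3),
        ("KBOS", "2014-01-01 01:00:00", -4),
        ("KBOS", "2015-06-01 12:00:00", 21),
        ("KJFK", "2014-01-01 00:00:00", -1),
    ],
)

query = """
    SELECT zdatetime, temp
    FROM metar_records
    WHERE stationID = ?
      AND zdatetime >= ? AND zdatetime < ?
    ORDER BY zdatetime
"""
params = ("KBOS", "2014-01-01 00:00:00", "2014-01-02 00:00:00")
rows = conn.execute(query, params).fetchall()

# Show the plan SQLite chose; the exact wording varies between versions.
for step in conn.execute("EXPLAIN QUERY PLAN " + query, params):
    print(step)
print(rows)
```

In MySQL the equivalent check would be EXPLAIN on the same SELECT after declaring PRIMARY KEY(stationID, zdatetime) on an InnoDB table.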

Related

How to do order by desc when the time is same in mysql

I have a log table which has a column time (DATETIME). I want to sort this table by time.
It works perfectly when the time values are distinct.
But when several rows have the same time, those rows are not sorted.
The values in this table are inserted by an automated process.
|user_id| type | title | time |
--------------------------------------------------------
|150 | add_note | Note added | 2018-06-13 08:30:10 |
|150 | send_email | Email sent | 2018-06-13 08:30:10 |
|150 | add_tag | Tag added | 2018-06-13 08:30:10 |
|150 | add_note | Note added | 2018-06-13 08:30:10 |
This is the query I'm using to sort data
SELECT * FROM log ORDER BY time DESC
I want to get the rows in reverse order when the time is the same.
Please help.
Add a sequence column (for example an AUTO_INCREMENT id) to your table, and add that column to the ORDER BY as a tie-breaker.
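A minimal sketch of that tie-breaker, assuming a hypothetical sequence column named id (it is not in the original table). It runs on Python's sqlite3 as a stand-in for MySQL; in MySQL the column would be declared along the lines of id INT AUTO_INCREMENT PRIMARY KEY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `id` is a hypothetical sequence column added for tie-breaking.
conn.execute(
    """
    CREATE TABLE log (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id INTEGER,
        type    TEXT,
        title   TEXT,
        time    TEXT
    )
    """
)
# The four rows from the question, inserted in their original order.
conn.executemany(
    "INSERT INTO log (user_id, type, title, time) VALUES (?, ?, ?, ?)",
    [
        (150, "add_note", "Note added", "2018-06-13 08:30:10"),
        (150, "send_email", "Email sent", "2018-06-13 08:30:10"),
        (150, "add_tag", "Tag added", "2018-06-13 08:30:10"),
        (150, "add_note", "Note added", "2018-06-13 08:30:10"),
    ],
)

# Sort by time first; rows with equal time fall back to insertion order, reversed.
rows = conn.execute(
    "SELECT id, type FROM log ORDER BY time DESC, id DESC"
).fetchall()
print(rows)
```

Without the second sort key the database is free to return rows with equal time values in any order.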

Rewriting a select query

I have a rather simple (I think) question at hand. The example tables and the result I need are provided below (in reality those tables contain many more columns and much more data; I just left what is relevant). There is also the query which returns exactly what I need. However, I don't like the rather crude way in which it works (I don't like subqueries in general). The question is, how can I rewrite the query so that it automatically reacts to more columns appearing in TABLE2 in the future? Right now, if a "z" column were added to TABLE2, I would need to modify each query in the code and add one more subquery. I just want the select to read the entire content of TABLE2 and translate the id numbers to the corresponding strings from TABLE1.
TABLE1
-----------------
id |x |
-----------------
567 |AAA |
345 |BBB |
341 |CCC |
827 |DDD |
632 |EEE |
503 |FFF |
945 |GGG |
234 |HHH |
764 |III |
123 |JJJ |
-----------------
TABLE2
-------------------------
id |x |y |
-------------------------
1 |123 |341 |
2 |567 |632 |
3 |345 |945 |
4 |764 |503 |
5 |234 |827 |
-------------------------
THE RESULT I NEED
-----------------
A |B |
-----------------
JJJ |CCC |
AAA |EEE |
BBB |GGG |
III |FFF |
HHH |DDD |
-----------------
The query I have
SELECT
(SELECT `x` FROM `TABLE1` WHERE `TABLE2`.`x` LIKE `TABLE1`.`id` LIMIT 1) as A,
(SELECT `x` FROM `TABLE1` WHERE `TABLE2`.`y` LIKE `TABLE1`.`id` LIMIT 1) as B
FROM `TABLE2` ORDER BY `id` DESC;
You might want to restructure your data model:
Instead of:
-------------------------
id |x |y |
-------------------------
1 |123 |341 |
2 |567 |632 |
3 |345 |945 |
4 |764 |503 |
5 |234 |827 |
-------------------------
You would have:
----------------------
col_id |col |
----------------------
1 |x |
2 |y |
----------------------
---------------------------
id |col_id |col_val |
---------------------------
1 |1 |123 |
1 |2 |341 |
2 |1 |567 |
2 |2 |632 |
etc
---------------------------
Probably not worth the hassle (you would effectively need to pivot when you're accessing multiple columns at a time) but it would allow you to do the query that you want across all current and future columns.
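Here is a small sketch of how the key/value layout could be queried. It runs on Python's sqlite3 as a stand-in for MySQL, and the table names cols and table2_values are hypothetical (the answer above does not name them). Only the first two TABLE2 rows are loaded to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TABLE1 (id INTEGER PRIMARY KEY, x TEXT);
    -- cols and table2_values are hypothetical names for the two new tables.
    CREATE TABLE cols (col_id INTEGER PRIMARY KEY, col TEXT);
    CREATE TABLE table2_values (id INTEGER, col_id INTEGER, col_val INTEGER);

    INSERT INTO TABLE1 VALUES (123, 'JJJ'), (341, 'CCC'), (567, 'AAA'), (632, 'EEE');
    INSERT INTO cols VALUES (1, 'x'), (2, 'y');
    INSERT INTO table2_values VALUES (1, 1, 123), (1, 2, 341), (2, 1, 567), (2, 2, 632);
    """
)

# One join translates every value, whatever "column" it belongs to.
rows = conn.execute(
    """
    SELECT v.id, c.col, t.x
    FROM table2_values v
    JOIN cols c   ON c.col_id = v.col_id
    JOIN TABLE1 t ON t.id = v.col_val
    ORDER BY v.id, c.col_id
    """
).fetchall()
print(rows)
```

Adding a "z" column then means inserting a row into cols plus the matching value rows; the SELECT itself stays unchanged, but the result comes back one value per row rather than one column per value.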
You can't do that with a plain select.
What you can do is create a view with the translated values. You still have to modify the view when the original table changes, but your queries don't have to change.
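A minimal sketch of such a view, using the sample data from the question. It runs on Python's sqlite3 as a stand-in for MySQL, and the view name TABLE2_named is made up for illustration. The joins replace the two subqueries, and rows are ordered by id ascending to match the result shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TABLE1 (id INTEGER PRIMARY KEY, x TEXT);
    CREATE TABLE TABLE2 (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER);

    INSERT INTO TABLE1 VALUES
        (567, 'AAA'), (345, 'BBB'), (341, 'CCC'), (827, 'DDD'), (632, 'EEE'),
        (503, 'FFF'), (945, 'GGG'), (234, 'HHH'), (764, 'III'), (123, 'JJJ');
    INSERT INTO TABLE2 VALUES
        (1, 123, 341), (2, 567, 632), (3, 345, 945), (4, 764, 503), (5, 234, 827);

    -- TABLE2_named is a hypothetical view name; one LEFT JOIN per translated column.
    CREATE VIEW TABLE2_named AS
    SELECT t2.id, tx.x AS A, ty.x AS B
    FROM TABLE2 t2
    LEFT JOIN TABLE1 tx ON tx.id = t2.x
    LEFT JOIN TABLE1 ty ON ty.id = t2.y;
    """
)

# Application queries only ever touch the view.
rows = conn.execute("SELECT A, B FROM TABLE2_named ORDER BY id").fetchall()
print(rows)
```

When a "z" column is added, only the view definition gains one more join; application code that selects from the view is untouched unless it needs the new column.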
You can also use dynamic SQL statements, but only if you are sure that any new columns in TABLE2 (apart from id) will have the same type as x and y.
Let me know if you are not sure how to write it.
All the best.

MySQL how to rank objects by similarity of multiple property rows

Hello all and a Happy New Year
SITUATION:
I have some tables in MySQL db:
Scores:
(Unique ID, unique (objectID, metricID))
| ID | ObjectID | MetricID | Score |
|--------+----------+----------+----------|
|0 | 1 | 7 | 0 |
|1 | 5 | 3 | 13 |
|2 | 7 | 2 | 78 |
|3 | 7 | 3 | 22 |
|.....
|--------+----------+----------+----------|
Objects:
(unique ID, unique ObjectName)
| ID | ObjectName |
|--------+------------|
|0 | Ook |
|1 | Oop |
|2 | Oww |
|3 | Oat |
|.....
|--------+------------|
Metrics:
(unique ID, unique MetricName)
| ID | MetricName |
|--------+------------|
|0 | Moo |
|1 | Mar |
|2 | Mee |
|3 | Meep |
|.....
|--------+------------|
For a given object ID:
There will be a number of scores between '0' and 'one per metric'
REQUIREMENT:
For a given ObjectID, I want to return a sorted list based on the following criteria:
Returned rows ranked in order of similarity to the provided object
Returned rows not to include provided object
(this is the hard bit I think) Order of similarity is determined by an object's "score distance" from the provided object based on the numeric offset/difference of its score from the provided object's score for any metric for which there is an entry for both the provided and the currently-examined objects
Contains objectID, Object name, score difference (or something similar)
PROBLEM STATEMENT:
I don't know the correct SQL syntax to use for this, and my experiments so far have failed. I would like to do as much of this work in the DB as possible and have little or none of this work done in nasty for-loops in the code or similar.
ADDITIONAL NON-FUNCTIONALS
At present there are only 200 rows in the Scores table. My calculations show that ultimately there may be up to around 2,000,000 rows, but probably no more.
The Objects table will only ever have up to around 5000 rows
The Metrics table will only ever have up to around 400 rows
Here's an approach to order objects based on their similarity to object 1:
select other.ObjectID
, avg(abs(target.Score - other.Score)) as Delta
from Scores target
join Scores other
on other.MetricID = target.MetricID
and other.ObjectID <> target.ObjectID
where target.ObjectID = 1
group by
other.ObjectID
order by
Delta
Similarity is defined as the average difference in common metrics. Objects that do not share at least one metric with object 1 are not listed. If this answer makes wrong assumptions, feel free to clarify your question :)
Live example at SQL Fiddle.
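Since the original fiddle link is not available here, the following is a small runnable sketch of the same query, extended with a join to Objects so that the object name is returned as the question asks. It uses Python's sqlite3 as a stand-in for MySQL, and the score values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Objects (ID INTEGER PRIMARY KEY, ObjectName TEXT UNIQUE);
    CREATE TABLE Scores (
        ID INTEGER PRIMARY KEY, ObjectID INTEGER, MetricID INTEGER, Score INTEGER,
        UNIQUE (ObjectID, MetricID)
    );
    INSERT INTO Objects VALUES (0, 'Ook'), (1, 'Oop'), (2, 'Oww'), (3, 'Oat');
    -- Invented scores: object 0 shares no metric with object 1.
    INSERT INTO Scores (ObjectID, MetricID, Score) VALUES
        (1, 2, 10), (1, 3, 20),
        (2, 3, 13),
        (3, 2, 78), (3, 3, 22),
        (0, 7, 5);
    """
)

target_id = 1
ranked = conn.execute(
    """
    SELECT other.ObjectID, o.ObjectName,
           AVG(ABS(target.Score - other.Score)) AS Delta
    FROM Scores target
    JOIN Scores other
      ON other.MetricID = target.MetricID
     AND other.ObjectID <> target.ObjectID
    JOIN Objects o ON o.ID = other.ObjectID
    WHERE target.ObjectID = ?
    GROUP BY other.ObjectID, o.ObjectName
    ORDER BY Delta
    """,
    (target_id,),
).fetchall()
print(ranked)
```

Object 2 shares one metric with object 1 (difference 7) and object 3 shares two (differences 68 and 2, average 35), so object 2 ranks first; object 0 has no metric in common and is left out.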

SQL: Compare if column values in two tables are equal

I'm working on mysql and have two tables with the same schema:
preTrial
|id|accusedId|articleid|
------------------------
|1 | 1 | 1 |
|2 | 1 | 2 |
|3 | 1 | 3 |
|4 | 2 | 1 |
|5 | 2 | 2 |
trial
|id|accusedId|articleid|
------------------------
|1 | 1 | 1 |
|2 | 1 | 2 |
|3 | 2 | 1 |
|4 | 2 | 2 |
I want to get those accusedIds for which all the articleIds in the first and the second table are equal.
The above example should only return accusedId 2, because for accusedId 1 there is no articleId 3 in the second table.
I hope you understand what I mean. I'm currently writing my thesis in law, and the time when I was into SQL is long gone. Of course I already did some research and tried several joins, but I was not able to find a solution. Hopefully you can help me.
Try something like this:
select a.accusedId, count(a.articleId) as cnt_a, count(b.articleId) as cnt_b
from preTrial a left join trial b on a.accusedId = b.accusedId and a.articleId = b.articleId
group by a.accusedId
having cnt_a = cnt_b
I haven't even run that, so it might be a little off, but give it a go. The LEFT JOIN leaves b.articleId NULL for any row in preTrial that has no match in trial, and COUNT skips NULLs, so the HAVING clause filters your grouped results to those where the article counts are equal. Note that this only checks one direction: an article that exists in trial but not in preTrial would go unnoticed.
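A LEFT JOIN from preTrial can only detect articles that are missing from trial. If the two article sets must match exactly in both directions, one possible sketch is a pair of NOT EXISTS checks. This is a different technique from the join above, shown on Python's sqlite3 as a stand-in for MySQL with the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE preTrial (id INTEGER PRIMARY KEY, accusedId INTEGER, articleId INTEGER);
    CREATE TABLE trial    (id INTEGER PRIMARY KEY, accusedId INTEGER, articleId INTEGER);
    INSERT INTO preTrial VALUES (1,1,1), (2,1,2), (3,1,3), (4,2,1), (5,2,2);
    INSERT INTO trial    VALUES (1,1,1), (2,1,2), (3,2,1), (4,2,2);
    """
)

matching = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT p.accusedId
        FROM preTrial p
        -- no preTrial article of this accused is missing from trial
        WHERE NOT EXISTS (
            SELECT 1 FROM preTrial p2
            WHERE p2.accusedId = p.accusedId
              AND NOT EXISTS (
                  SELECT 1 FROM trial t
                  WHERE t.accusedId = p2.accusedId AND t.articleId = p2.articleId))
        -- and no trial article of this accused is missing from preTrial
          AND NOT EXISTS (
            SELECT 1 FROM trial t2
            WHERE t2.accusedId = p.accusedId
              AND NOT EXISTS (
                  SELECT 1 FROM preTrial p3
                  WHERE p3.accusedId = t2.accusedId AND p3.articleId = t2.articleId))
        ORDER BY p.accusedId
        """
    )
]
print(matching)
```

With the sample data only accusedId 2 survives, because accusedId 1 has articleId 3 in preTrial but not in trial.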

SSIS MDX Query Problem

Hello all!
I have a little problem with my MDX query.
I am trying to query the damage repair types from my cube. First, let me explain my dimension and the fact table:
Dimension: Damage Repair Type
RepairTypeKey | Name | RepairTypeAlternateKey | RepairSubTypeAlternateKey | SubName
0 | Unknown | 0 | NULL | NULL
1 | Repair | 1 | 1 | 1 Boil
2 | Replacement | 2 | NULL | NULL
3 | Repair | 1 | 2 | 2 Boils
4 | Repair | 1 | 3 | 3 Boils
In my fact table "CLaimCosts" I have one RepairTypeKey for every claim. I filled the tables and designed a cube. The dimension has a hierarchy with RepairType and SubRepairType. I processed the cube and it works fine:
Damage Repair Type
  Hierarchy
    Members
      All
        Replacement
        Repair
          1 Boil
          2 Boils
          3 Boils
        Unknown
Now I create an MDX query:
select
{
[Measures].[Claim Count],
[Measures].[Claim Cost Position Count],
[Measures].[Claim Cost Original],
[Measures].[Claim Cost Original Average],
[Measures].[Claim Cost Possible Savings],
[Measures].[Claim Cost Possible Savings Average],
[Measures].[Claim Cost Possible Savings Percentage]
} on 0,
NON EMPTY{
NonEmpty([Damage Repair Type].[Hierarchy].Allmembers, ([Measures].[Claim Count]))
} on 1
from
Cube
where
(
({StrToMember(#DateFrom) : StrToMember(#DateTo)})
,([Claim Document Type].[Document Type].&[4])
)
Now I run the query and it works, but too many rows are shown:
Damage Repair Type | Damage Repair Sub Type | Claim Count | ....
NULL | NULL | 200000
Replacement | NULL | 150000
Repair | NULL | 45000
Repair | 1 Boil | 10000
Repair | 2 Boil | 15000
Repair | 3 Boil | 19000
Unknown | NULL | 1000
My problem is the first row (the grand total) and the third row (the Repair subtotal). I don't need these rows, but I don't know how to filter them out. I don't need the sums because the child rows already carry the right counts.
How can I filter them out? Please help me. What I have tried so far doesn't work.
Sorry for my bad English, and thank you!
Alex
NonEmpty([Damage Repair Type].[Hierarchy].Allmembers, ([Measures].[Claim Count]))
You can use:
NonEmpty([Damage Repair Type].[Hierarchy].Levels(2).Members, [Measures].[Claim Count])
This way we exclude the All members. Also, when you use the level members (e.g. [dim].[hier].[lvl].Members) instead of the hierarchy members (e.g. [dim].[hier].members) you don't get the aggregate members - e.g. the All member which is commonly present in all hierarchies other than non-aggregatable attribute hierarchies.