I have a database with 1 million records. It works fine, with around 1.2 s response times for simple queries using JOIN, GROUP BY, ORDER BY, etc., and there are no problems with that. I'm working on simplifying my queries using table aliases, but when I execute a simple query with two or more table aliases, the request never ends and MariaDB stops responding; I have to restart the service manually.
What is going wrong?
Here is its structure:
CREATE TABLE `values` (
`id` mediumint(11) UNSIGNED NOT NULL,
`date` int(11) NOT NULL DEFAULT '0',
`indexVar` int(11) NOT NULL,
`value` float NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Data:
example
Working query:
SELECT
v.date,
v.value
FROM
`values` AS v
WHERE
v.date > 1548460800 AND v.indexVar = 6 OR v.indexVar = 2
expected result
Infinite loading query:
SELECT DISTINCT
v.date,
v1.value,
v2.value
FROM
`values` AS v,
`values` AS v1,
`values` AS v2
WHERE
v.date > 1548460800 AND v1.indexVar = 6 AND v2.indexVar = 2
expected result
You aren't including any join conditions in your query.
If the values table has 1 million rows, then including it twice gives you a result set with 1 million * 1 million = 1 trillion rows. You are applying some conditions, but you're still going to wind up with a huge number of results. (And you're including the values table three times!)
Let's say you have a table with a million rows, where each row is just an integer from 1 to 1 million. If you do select value from `values` where value > 900000 then you'll get 100,000 rows. But if you say select value from `values` v, `values` v2 where v.value > 900000 then for each of the 100,000 rows matched by the v.value > 900000 condition you'll get all million rows from v2. Even if you apply the same filter to v2 (i.e., v2.value > 900000), the query will still return 100,000 v2 rows for each of the 100,000 matching v rows -- 10 billion rows in all.
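That multiplication is easy to reproduce at small scale. A sketch in Python with SQLite, using a 1,000-row stand-in table (table and column names invented):

```python
import sqlite3

# Toy demonstration of the row explosion: 1,000 rows stand in
# for the million-row `values` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (v INTEGER)")
conn.executemany("INSERT INTO vals VALUES (?)", [(i,) for i in range(1, 1001)])

# Filter alone: 100 rows survive v > 900.
single = conn.execute("SELECT COUNT(*) FROM vals WHERE v > 900").fetchone()[0]

# Unconstrained self-join: each surviving row pairs with every row of
# the second copy, so the count multiplies.
pair = conn.execute(
    "SELECT COUNT(*) FROM vals a, vals b WHERE a.v > 900"
).fetchone()[0]

print(single)  # 100
print(pair)    # 100 * 1000 = 100000
```

At full scale (a million rows, three copies of the table) the same arithmetic produces the hang described in the question.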
If date is the primary key of the table, then you must make sure that all the date values in each result row are the same:
select v.date, v1.value, v2.value
from `values` v, `values` v1, `values` v2
where v.date = v1.date and v.date = v2.date
  and v1.indexVar = 6 and v2.indexVar = 2
or better yet:
select v.date, v1.value, v2.value
from `values` v
inner join `values` v1 on (v1.date = v.date)
inner join `values` v2 on (v2.date = v.date)
where v1.indexVar = 6 and v2.indexVar = 2
If the primary key is id then just do the same with id. (You said you wanted to align rows based on the date, so I'm not sure which column is most significant.)
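A scaled-down sketch of the corrected self-join, in Python with SQLite (the table shape mirrors the question; the data is invented):

```python
import sqlite3

# Two series (indexVar 6 and 2) sharing the same dates, aligned
# by joining the table to itself on date.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "values" (date INTEGER, indexVar INTEGER, value REAL)')
rows = [(d, 6, d * 1.0) for d in (100, 200, 300)] + \
       [(d, 2, d * 2.0) for d in (100, 200, 300)]
conn.executemany('INSERT INTO "values" VALUES (?, ?, ?)', rows)

aligned = conn.execute("""
    SELECT v1.date, v1.value, v2.value
    FROM "values" v1
    JOIN "values" v2 ON v2.date = v1.date
    WHERE v1.indexVar = 6 AND v2.indexVar = 2
    ORDER BY v1.date
""").fetchall()
print(aligned)  # [(100, 100.0, 200.0), (200, 200.0, 400.0), (300, 300.0, 600.0)]
```

Each result row now pairs exactly one v1 row with one v2 row per date, instead of every row with every other.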
You could try conditional aggregation: wrap a CASE expression in an aggregate function and GROUP BY date, so the filtered rows collapse onto one row per date.
SELECT date
     , MAX(CASE WHEN indexVar = 6 THEN value END) AS v1_value
     , MAX(CASE WHEN indexVar = 2 THEN value END) AS v2_value
FROM `values`
WHERE date > 1548460800 AND (indexVar = 6 OR indexVar = 2)
GROUP BY date
You should also add a proper composite index:
CREATE INDEX idx1 ON `values` (indexVar, date)
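A quick way to see the pivot's behavior is a scaled-down run in Python with SQLite (sample data invented):

```python
import sqlite3

# MAX(CASE WHEN ...) pivots two indexVar series onto one row per date.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "values" (date INTEGER, indexVar INTEGER, value REAL)')
conn.executemany('INSERT INTO "values" VALUES (?, ?, ?)',
                 [(100, 6, 1.5), (100, 2, 2.5), (200, 6, 3.5), (200, 2, 4.5)])

pivot = conn.execute("""
    SELECT date,
           MAX(CASE WHEN indexVar = 6 THEN value END) AS v1_value,
           MAX(CASE WHEN indexVar = 2 THEN value END) AS v2_value
    FROM "values"
    WHERE indexVar IN (6, 2)
    GROUP BY date
    ORDER BY date
""").fetchall()
print(pivot)  # [(100, 1.5, 2.5), (200, 3.5, 4.5)]
```

Because the table is scanned only once, this avoids the self-join entirely.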
What I'd like to do:
First: sort the rows according to a rule.
Second: add an auto-increment column, so that each row gets correct sequential numbering.
Issues so far:
My code below creates the auto-increment column first and only then starts sorting, which leads to non-sequential numbering in the auto-increment column.
Code:
SET @i := 0;
SELECT
  @i := @i + 1 AS autoincr_id,
  billings.id AS bill_id,
  daily_reports.id AS report_id,
  billings.billingDate AS billing_date
FROM lineitems
INNER JOIN billings
  ON billings.order_id = lineitems.orderID
INNER JOIN daily_reports
  ON billings.`billingDate` BETWEEN DATE_ADD(daily_reports.`referenceDate`, INTERVAL 7 HOUR) AND DATE_ADD(daily_reports.`referenceDate`, INTERVAL 31 HOUR)
ORDER BY billings.id, autoincr_id
LIMIT 200
see pic that illustrates my issue
I guess using a derived table to enforce the sorting, and then applying the variable, will fix your problem.
You can also use a CROSS JOIN to initialize the user variable @i without needing two statements.
Query
SELECT
  @i := @i + 1 AS autoincr_id,
  bill_id,
  report_id,
  billing_date
FROM (
  SELECT
    billings.id AS bill_id,
    daily_reports.id AS report_id,
    billings.billingDate AS billing_date
  FROM lineitems
  INNER JOIN billings
    ON billings.order_id = lineitems.orderID
  INNER JOIN daily_reports
    ON billings.`billingDate` BETWEEN DATE_ADD(daily_reports.`referenceDate`, INTERVAL 7 HOUR) AND DATE_ADD(daily_reports.`referenceDate`, INTERVAL 31 HOUR)
  ORDER BY billings.id
  LIMIT 200
) AS ordered
CROSS JOIN ( SELECT @i := 0 ) AS init_user_param
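For a sense of what the derived table buys you: SQLite has no user variables, but ROW_NUMBER() over the sorted derived table expresses the same sort-first, number-second idea (schema and data invented for illustration):

```python
import sqlite3

# Sort inside the derived table, then number the already-sorted rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE billings (id INTEGER, billingDate TEXT)")
conn.executemany("INSERT INTO billings VALUES (?, ?)",
                 [(3, '2019-01-03'), (1, '2019-01-01'), (2, '2019-01-02')])

numbered = conn.execute("""
    SELECT ROW_NUMBER() OVER (ORDER BY id) AS autoincr_id, id
    FROM (SELECT id FROM billings ORDER BY id LIMIT 200) AS ordered
""").fetchall()
print(numbered)  # [(1, 1), (2, 2), (3, 3)]
```

The numbering is sequential because it is applied after the ordering, not while the rows are still being read in storage order.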
To create a new table with the rows numbered in the desired order (seq_num):
CREATE TABLE new (
  seq_num INT UNSIGNED AUTO_INCREMENT,
  PRIMARY KEY (seq_num)
) ENGINE=InnoDB
SELECT ...
ORDER BY ...;
Caveat: AUTO_INCREMENT will not produce consecutive ids if you are in a multi-master replication setup, because auto_increment_increment won't be 1.
Suppose I have a MySQL table that was created like this:
CREATE TABLE `my_table` (
  `my_id` char(32) NOT NULL,
  `my_boolean_field` tinyint(1) DEFAULT NULL,
  PRIMARY KEY (`my_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
This table is very large. It has approximately 10 million rows in it. my_boolean_field's only possible values are 1, 0 and null.
Now separately, I have a list of 500 IDs. Let's call this my_list. They are in a text file separated by newlines. The name of that file is myfile.txt. There are too many entries in this file to include in a single WHERE clause of an SQL statement.
I want to answer the following questions:
How many rows in my_table have an ID that is in my_list and my_boolean_field is null?
How many rows in my_table have an ID that is in my_list and my_boolean_field = 1?
How many rows in my_table have an ID that is in my_list and my_boolean_field = 0?
How many rows in my_table have an ID that is not in my_list and my_boolean_field is null?
How many rows in my_table have an ID that is not in my_list and my_boolean_field = 1?
How many rows in my_table have an ID that is not in my_list and my_boolean_field = 0?
What is the most performant way to get the above 6 counts?
Use left join and conditional aggregation to get the desired counts.
select
sum(l.id is not null and t.my_boolean_field is null)
,sum(l.id is not null and t.my_boolean_field =1)
,sum(l.id is not null and t.my_boolean_field =0)
,sum(l.id is null and t.my_boolean_field is null)
,sum(l.id is null and t.my_boolean_field =1)
,sum(l.id is null and t.my_boolean_field =0)
from my_table t
left join my_list l on l.id=t.my_id
The conditions in sum are treated as booleans and return 1 for true and 0 for false.
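A scaled-down check of this pattern in Python with SQLite (five invented rows, two of them in the list):

```python
import sqlite3

# LEFT JOIN + conditional aggregation: one pass over my_table,
# six counters split by list membership and boolean value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_id TEXT PRIMARY KEY, my_boolean_field INTEGER)")
conn.execute("CREATE TABLE my_list (id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 [("a", 1), ("b", 0), ("c", None), ("d", 1), ("e", None)])
conn.executemany("INSERT INTO my_list VALUES (?)", [("a",), ("c",)])

counts = conn.execute("""
    SELECT
      SUM(l.id IS NOT NULL AND t.my_boolean_field IS NULL),
      SUM(l.id IS NOT NULL AND t.my_boolean_field = 1),
      SUM(l.id IS NOT NULL AND t.my_boolean_field = 0),
      SUM(l.id IS NULL AND t.my_boolean_field IS NULL),
      SUM(l.id IS NULL AND t.my_boolean_field = 1),
      SUM(l.id IS NULL AND t.my_boolean_field = 0)
    FROM my_table t
    LEFT JOIN my_list l ON l.id = t.my_id
""").fetchone()
print(counts)  # (1, 1, 0, 1, 1, 1)
```

The LEFT JOIN keeps every my_table row, and `l.id IS NULL` distinguishes the not-in-list rows.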
CREATE TABLE list (
`my_id` char(32) NOT NULL
);
LOAD DATA INFILE ... -- get the 500 rows into `list`.
SELECT
    SUM(my_boolean_field IS NULL),
    SUM(my_boolean_field = 0),
    SUM(my_boolean_field = 1)
FROM my_table AS t
JOIN list AS l ON t.my_id = l.my_id;
It will do the 500 lookups, tallying the three "in the list" counters as it goes. The three "not in the list" counts then fall out by subtraction: run the same three SUMs over my_table alone (no join) and subtract the in-list figures.
SELECT *
FROM `eBayorders`
WHERE (`OrderIDAmazon` IS NULL
OR `OrderIDAmazon` = "null")
AND `Flag` = "True"
AND `TYPE` = "GROUP"
AND (`Carrier` IS NULL
OR `Carrier` = "null")
AND LEFT(`SKU`, 1) = "B"
AND datediff(now(), `TIME`) < 4
AND (`TrackingInfo` IS NULL
OR `TrackingInfo` = "null")
AND `STATUS` = "PROCESSING"
GROUP BY `Name`,
`SKU`
ORDER BY `TIME` ASC LIMIT 7
I am trying to make sure that no name and no SKU shows up more than once in the results. I group by name and then SKU, but I ran into a case where a result had the same name with different SKUs, which I don't want to happen. How can I fix this query so that the names and SKUs in the result set are always distinct?
For example say I have an Order:
Name: Ben Z, SKU : B000334, oldest
Name: Ben Z, SKU : B000333, second oldest
Name: Will, SKU: B000334, third oldest
Name: John, SKU: B000036, fourth oldest
The query should return only:
Name: Ben Z, SKU : B000334, oldest
Name: John, SKU: B000036, fourth oldest
This is because all of the Names should only have one entry in the set along with SKU.
There are two problems here.
The first is the ANSI standard says that if you have a GROUP BY clause, the only things you can put in the SELECT clause are items listed in GROUP BY or items that use an aggregate function (SUM, COUNT, MAX, etc). The query in your question selects all the columns in the table, even those not in the GROUP BY. If you have multiple records that match a group, the table doesn't know which record to use for those extra columns.
MySql is dumb about this. A sane database server would throw an error and refuse to run that query. Sql Server, Oracle and Postgresql will all do that. MySql will make a guess about which data you want. It's not usually a good idea to let your DB server make guesses about data.
But that doesn't explain the duplicates... just why the bad query runs at all. The reason you have duplicates is that you group on both Name and SKU. So, for example, for Ben Z's record you want to see just the oldest SKU. But when you group on both Name and SKU, you get a separate group for { Ben Z, B000334 } and { Ben Z, B000333 }... that's two rows for Ben Z, but it's what the query asked for, since SKU is also part of what determines a group.
If you only want to see one record per person, you need to group by just the person fields. This may mean building that part of the query first, to determine the base record set you need, and then JOINing to this original query as part of your full solution.
SELECT T1.*
FROM eBayorders T1
JOIN
( SELECT `Name`,
`SKU`,
max(`TIME`) AS MAX_TIME
FROM eBayorders
WHERE (`OrderIDAmazon` IS NULL OR `OrderIDAmazon` = "null") AND `Flag` = "True" AND `TYPE` = "GROUP" AND (`Carrier` IS NULL OR `Carrier` = "null") AND LEFT(`SKU`, 1) = "B" AND datediff(now(), `TIME`) < 4 AND (`TrackingInfo` IS NULL OR `TrackingInfo` = "null") AND `STATUS` = "PROCESSING"
GROUP BY `Name`,
`SKU`) AS dedupe ON T1.`Name` = dedupe.`Name`
AND T1.`SKU` = dedupe.`SKU`
AND T1.`Time` = dedupe.`MAX_TIME`
ORDER BY `TIME` ASC LIMIT 7
Your database platform should have complained because your original query had items in the select list which were not present in the group by (generally not allowed). The above should resolve it.
An even better option would be the following, if your database supports window functions (MySQL doesn't before version 8.0):
SELECT *
FROM
( SELECT *,
row_number() over (partition BY `Name`, `SKU`
ORDER BY `TIME` ASC) AS dedupe_rank
FROM eBayorders
WHERE (`OrderIDAmazon` IS NULL OR `OrderIDAmazon` = "null") AND `Flag` = "True" AND `TYPE` = "GROUP" AND (`Carrier` IS NULL OR `Carrier` = "null") AND LEFT(`SKU`, 1) = "B" AND datediff(now(), `TIME`) < 4 AND (`TrackingInfo` IS NULL OR `TrackingInfo` = "null") AND `STATUS` = "PROCESSING" ) T
WHERE dedupe_rank = 1
ORDER BY T.`TIME` ASC LIMIT 7
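The window-function approach can be tried in SQLite (3.25+). This sketch partitions by Name alone, which matches the asker's stated goal of one row per person; the data is taken from the example in the question:

```python
import sqlite3

# row_number() per Name, ordered by TIME: rn = 1 is each
# person's oldest order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eBayorders (Name TEXT, SKU TEXT, TIME INTEGER)")
conn.executemany("INSERT INTO eBayorders VALUES (?, ?, ?)",
                 [("Ben Z", "B000334", 1), ("Ben Z", "B000333", 2),
                  ("Will", "B000334", 3), ("John", "B000036", 4)])

oldest = conn.execute("""
    SELECT Name, SKU FROM (
        SELECT Name, SKU,
               ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TIME) AS rn
        FROM eBayorders
    ) WHERE rn = 1
    ORDER BY Name
""").fetchall()
print(oldest)  # [('Ben Z', 'B000334'), ('John', 'B000036'), ('Will', 'B000334')]
```

Note this still allows the same SKU under two different names (Will also gets B000334 here); making SKUs unique across the whole result set as well, as the asker wants, needs a second dedup pass.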
You are trying to obtain a result set which doesn't have repeats in either the SKU nor the Name column.
You might have to add a subquery to your query, to accomplish that. The inner query would group by Name, and the Outer query would group by SKU, such that you won't have repeats in either column.
Try this:
SELECT *
FROM
(SELECT *
FROM eBayorders
WHERE (`OrderIDAmazon` IS NULL
OR `OrderIDAmazon` = "null")
AND `Flag` = "True"
AND `TYPE` = "GROUP"
AND (`Carrier` IS NULL
OR `Carrier` = "null")
AND LEFT(`SKU`, 1) = "B"
AND datediff(now(), `TIME`) < 4
AND (`TrackingInfo` IS NULL
OR `TrackingInfo` = "null")
AND `STATUS` = "PROCESSING"
GROUP BY Name) AS t
GROUP BY `SKU`
ORDER BY `TIME` ASC LIMIT 7
With this approach you just filter out rows that do not contain the largest/latest value for TIME.
SELECT SKU, Name
FROM eBayOrders o
WHERE NOT EXISTS (SELECT 0 FROM eBayOrders WHERE Name = o.name and Time > o.Time)
GROUP BY SKU, Name
Note: If two records have exactly the same Name and Time values, you may still end up getting duplicates, because the logic you have specified does not provide any way to break up a tie.
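A small runnable check of the NOT EXISTS anti-join, in Python with SQLite (data shaped like the question's example):

```python
import sqlite3

# Keep each person's row with no later row: the latest per Name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eBayOrders (Name TEXT, SKU TEXT, Time INTEGER)")
conn.executemany("INSERT INTO eBayOrders VALUES (?, ?, ?)",
                 [("Ben Z", "B000334", 1), ("Ben Z", "B000333", 2),
                  ("John", "B000036", 3)])

latest = conn.execute("""
    SELECT SKU, Name
    FROM eBayOrders o
    WHERE NOT EXISTS (SELECT 0 FROM eBayOrders
                      WHERE Name = o.Name AND Time > o.Time)
    ORDER BY Name
""").fetchall()
print(latest)  # [('B000333', 'Ben Z'), ('B000036', 'John')]
```

Flipping the inner comparison to `Time < o.Time` would keep the oldest row per name instead.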
I have the following query:
select date(updated_at) as data, COUNT(id) as numar
from `coupons`
where `user_id` = 5 and `won_by` != 0 and `updated_at` >= '2016-04-01'
group by DAY(updated_at), month(updated_at), year(updated_at)
and the result is this:
2016-04-01- 229
2016-04-03- 30
2016-04-04- 6
2016-04-07- 1
2016-04-08- 1
2016-04-10- 1
What can I do to receive something like this:
2016-04-01- 229
2016-04-02- 0
2016-04-03- 30
2016-04-04- 6
2016-04-05- 0
2016-04-06- 0
2016-04-07- 1
2016-04-08- 1
2016-04-10- 1
The best way that I've found to do this is to simply create (and maintain) a secondary table with a single column, containing all of the dates that you care about. Something like:
CREATE TABLE date_join (
date date not null primary key
);
Then insert records for each date in whatever way is convenient (by hand, if it's a one-off, as part of your daily process, via stored procedure, etc).
At that point, it's simply a left join of date_join and your initial query, with COALESCE (or a CASE expression) to translate NULLs to 0s:
SELECT dj.date, COALESCE(q.numar, 0) AS numar
FROM date_join dj
LEFT JOIN (select date(updated_at) as date, COUNT(id) as numar
from `coupons`
where `user_id` = 5 and `won_by` != 0 and `updated_at` >= '2016-04-01'
group by DATE(updated_at)
) q
ON dj.date = q.date
ORDER BY dj.date;
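If maintaining a physical date_join table is inconvenient, a recursive CTE can generate the calendar on the fly. A sketch in Python with SQLite (data invented, column names shortened):

```python
import sqlite3

# Generate the date range with a recursive CTE, then left join the
# aggregated data so missing days come back as 0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coupons (updated_at TEXT, n INTEGER)")
conn.executemany("INSERT INTO coupons VALUES (?, ?)",
                 [("2016-04-01", 229), ("2016-04-03", 30)])

filled = conn.execute("""
    WITH RECURSIVE date_join(d) AS (
        SELECT '2016-04-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM date_join
        WHERE d < '2016-04-04'
    )
    SELECT dj.d, COALESCE(q.numar, 0)
    FROM date_join dj
    LEFT JOIN (SELECT updated_at AS d, SUM(n) AS numar
               FROM coupons GROUP BY updated_at) q
      ON dj.d = q.d
    ORDER BY dj.d
""").fetchall()
print(filled)  # gaps on 04-02 and 04-04 filled with 0
```

MySQL gained recursive CTEs in 8.0; on older versions the maintained calendar table remains the practical approach.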
I have a query that contains several conditions to extract data from a table of 5 million rows. A composite index has been built that partially covers some of these conditions, but it cannot also cover the sorting:
SELECT columns FROM Table WHERE conditions='conditions' ORDER BY id DESC LIMIT N;
The id itself is an auto-increment column. The above query can be very slow (4-5s) as filesort is being used. By removing the ORDER BY clause, I am able to speed up the query by up to 4 times. However the data extracted will be mostly old data.
Since post-processing can be carried out to sort the extracted data, I am more interested in extracting data from roughly the latest N rows from the resultset. My question is, is there a way to do something like this:
SELECT columns FROM Table WHERE conditions='conditions' LIMIT -N;
Since I do not really need a sort and I know that there is very high likelihood that the bottom N rows contain newer data.
Here you go. Keep in mind that there should be no problem in using ORDER BY with any indexed columns, including id.
SET @seq := 0;
SELECT `id`
FROM (
    SELECT @seq := @seq + 1 AS `seq`, `id`
    FROM `Table`
    WHERE `condition` = 'whatever'
    ORDER BY `id`
) t1
WHERE t1.seq
BETWEEN (
    (
        SELECT COUNT(*)
        FROM `Table`
        WHERE `condition` = 'whatever'
    ) - 49
)
AND (
    SELECT COUNT(*)
    FROM `Table`
    WHERE `condition` = 'whatever'
);
You can replace the "-49" with an expression like: -1 * ($quantity_desired -1);
Also check out this answer as it might help you:
https://stackoverflow.com/a/725439/631764
And here's another one:
https://stackoverflow.com/a/1441164/631764
Grab the last "few" rows using a between:
SELECT columns
FROM Table
WHERE conditions = 'conditions'
AND id between (select max(id) from `Table`) - 50 AND (select max(id) from `Table`)
ORDER BY id
DESC LIMIT N;
This example gets the last 50 rows, but the id index will be used efficiently. The other conditions and ordering will then be only over 50 rows. Should work a treat.
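A scaled-down check of the id-range trick in Python with SQLite (1,000 invented rows, with a flag column standing in for the other conditions):

```python
import sqlite3

# Restrict the scan to the newest id range first, then apply the
# remaining filters to only that slice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, flag INTEGER)")
conn.executemany("INSERT INTO t (id, flag) VALUES (?, ?)",
                 [(i, i % 2) for i in range(1, 1001)])

recent = conn.execute("""
    SELECT id FROM t
    WHERE flag = 1
      AND id BETWEEN (SELECT MAX(id) FROM t) - 50 AND (SELECT MAX(id) FROM t)
    ORDER BY id DESC LIMIT 5
""").fetchall()
print(recent)  # [(999,), (997,), (995,), (993,), (991,)]
```

One caveat: if fewer than N rows in the chosen id window satisfy the other conditions, the query returns fewer than N rows, so the window size may need to be wider than N.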