I have a problem deciding on an index. I have a table in MySQL, say X, with at least 70 million records. I need to query a few fields with a filter like Year = X and Quarter = X and ( user = X or manager = X or ..)
I have an index on year and quarter, but it is not used. If it were used, less than 10% of the data would be read.
I also have an index on year, quarter and all the user fields. Even then the index is not used.
What am I doing wrong?
I am developing a web application and I want to plot a 1-year chart with daily data points.
The x-axis is time (date) and the y-axis is of numeric type.
MySQL version: 8.0 (or higher)
The database must store data points for multiple customers.
For each customer I want to show the last 365 data points (1-year data).
Each data point is a tuple: (date, int). For example: (2022/11/10, 35)
The chart displays data for one single customer at a time.
Every day a new data point is calculated and added to the customer dataset.
Up to 5 years of data points must be kept for every customer.
The number of customers is 1000.
Assuming customer is a foreign key (FK) to the Customers table, I have considered two options for the dataset.
Option A
Primary Key | Customer(FK) | Date   | Value
1           | Customer 1   | Date 1 | Val1
2           | Customer 1   | Date 2 | Val2
...         | ...          | ...    | ...
N           | Customer 1   | Date N | ValN
N+1         | Customer 2   | Date 1 | ValN+1
...         | ...          | ...    | ...
2N          | Customer 2   | Date N | Val2N
Option B
Use a JSON type for the dataset
Primary Key | Customer(FK) | Dataset
1           | Customer 1   | Dataset 1
2           | Customer 2   | Dataset 2
Where each dataset looks like:
((2022/01/01, 35), (2022/01/02, 17), ...., (2022/12/31, 42))
Comments:
My interest is to plot the chart as fast as possible and since data insert/update operations only happen once a day (for every customer), my question is:
Which option is better for data retrieval?
Right now I have around 50 customers and a 2-year data history, but I don't know how the database will perform when I increase both the number of customers and the number of years.
Additionally, I am using a JavaScript plotting library in the frontend so I was wondering whether the JSON data type approach could fit better for this purpose.
CREATE TABLE datapoints (
c_id SMALLINT UNSIGNED NOT NULL,
date DATE NOT NULL,
datapoint SMALLINT UNSIGNED NOT NULL, -- or MEDIUMINT / INT [UNSIGNED] / FLOAT, see below
PRIMARY KEY(c_id, date)
) ENGINE=InnoDB;
Pick the smallest datatype that is appropriate for your values. For example, SMALLINT UNSIGNED takes only 2 bytes and allows non-negative values up to 64K. FLOAT is 4 bytes and has a big range and far more significant digits (about 7) than you can reasonably graph.
The main queries. First various ways to do the daily INSERT:
INSERT INTO datapoints (c_id, date, datapoint)
VALUES(?,?,?);
or
INSERT INTO datapoints (c_id, date, datapoint)
VALUES
(?,?,?),
(?,?,?), ...
(?,?,?); -- 1000 rows batched
or
LOAD DATA ...
Fetching for the graph:
SELECT date, datapoint
FROM datapoints
WHERE c_id = ...
AND date >= CURDATE() - INTERVAL 1 YEAR -- or whatever
ORDER BY date;
1.8M rows (probably under 1GB) is not very big. Still, I recommend the PRIMARY KEY be in that order and not involve an AUTO_INCREMENT. The INSERT(s) will poke into the table at 1000 places once a day. The SELECT (for graphing) will find all the data clustered together -- very fast.
If you will be keeping the data past the year, we can discuss things further. Meanwhile, to purge after 5 years, this will be slow, but it is only once a day:
DELETE FROM datapoints
WHERE date < CURDATE() - INTERVAL 5 YEAR;
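To make the clustering argument concrete, here is a minimal sketch in Python. It uses SQLite (through the standard sqlite3 module) as a stand-in for MySQL, so treat it as an illustration under assumptions rather than the real setup: WITHOUT ROWID is only SQLite's closest analogue to an InnoDB clustered primary key, dates are stored as ISO strings, the one-year cutoff is computed as 365 days in Python instead of CURDATE() - INTERVAL 1 YEAR, and build_demo_db and last_year are hypothetical helper names.

```python
import sqlite3
from datetime import date, timedelta


def build_demo_db(today, customers=3, days=800):
    """Create an in-memory table shaped like Option A and fill it with fake data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE datapoints (
            c_id      INTEGER NOT NULL,
            date      TEXT    NOT NULL,   -- ISO yyyy-mm-dd, sorts chronologically
            datapoint INTEGER NOT NULL,
            PRIMARY KEY (c_id, date)      -- same column order as in the answer
        ) WITHOUT ROWID
        """
    )
    rows = [
        (c, (today - timedelta(days=d)).isoformat(), (c * 7 + d) % 100)
        for c in range(1, customers + 1)
        for d in range(days)
    ]
    # Batched insert, analogous to the multi-row INSERT above.
    conn.executemany(
        "INSERT INTO datapoints (c_id, date, datapoint) VALUES (?, ?, ?)", rows
    )
    return conn


def last_year(conn, c_id, today):
    """Fetch one customer's points from the last 365 days, oldest first."""
    cutoff = (today - timedelta(days=365)).isoformat()
    return conn.execute(
        "SELECT date, datapoint FROM datapoints "
        "WHERE c_id = ? AND date >= ? ORDER BY date",
        (c_id, cutoff),
    ).fetchall()


today = date(2022, 11, 10)  # fixed so the demo is reproducible
conn = build_demo_db(today)
series = last_year(conn, 2, today)
print(len(series), series[0], series[-1])
```

The list of (date, value) tuples returned by last_year can be serialized to JSON on the server for the JavaScript plotting library, so the table does not need a JSON column for that purpose.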
I have around 500 Excel sheets in .csv format with data captured for my experiment, with the following columns.
Now I need to calculate the following parameters from this data. I have done this in Excel, but repeating it for every sheet is tedious, so I want to write an SQL query in phpMyAdmin to save time.
Last character typed - need to capture the last character from the column 'CharSq'
Slope (in column J) =(B3-B2)/(A3-A2)
Intercept (in column K) =B2-(A2*(J3))
Angle (in degrees) =MOD(DEGREES(ATAN2((A3-A2),(B3-B2))), 360)
Index of Difficulty =LOG(((E1/7.1)+1),2)
Speed Value length (if speed value length >3, then mark as 1 or else 0) =IF(LEN(D3) >= 3, "1","0")
Wrong Sequence (if I3=I2, then mark search time, else actual time) =IF(I3=I2,"Search Time","Actual Time")
Mark character as (1,2,3) =IF(I2="A",1, IF(I2="B",2, IF(I2="C",3, 0)))
I have started with this SQL query
SELECT id, type, charSq, substr(charSq,-1,1) AS TypedChar, xCoordinate, yCoordinate, angle, distance, timestamp, speed FROM table 1 WHERE 1
Need help for the rest of the parameters. Thanks.
Note - I am going to run this in phpMyAdmin SQL
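Before porting the Excel formulas to SQL it can help to check them in isolation. The following Python sketch does that for slope, intercept, angle and index of difficulty. It assumes column A holds the x coordinate, column B the y coordinate and column E the distance; the function name derived is hypothetical. Two details matter when translating: Excel's ATAN2 takes (x, y) while Python's math.atan2 and MySQL's ATAN2 take (y, x), and Excel's MOD (like Python's %) returns a non-negative result for a positive divisor, whereas MySQL's MOD keeps the sign of the dividend, so a negative angle has to be corrected by adding 360.

```python
import math


def derived(prev, cur, distance):
    """Compute slope, intercept, angle (0-360 degrees) and index of difficulty.

    prev and cur are (x, y) tuples for two consecutive rows;
    distance is the value from the distance column.
    """
    (x0, y0), (x1, y1) = prev, cur
    dx, dy = x1 - x0, y1 - y0

    # Slope is undefined for a vertical move; Excel would show #DIV/0!.
    slope = dy / dx if dx != 0 else None
    intercept = y0 - x0 * slope if slope is not None else None

    # atan2(dy, dx) here corresponds to Excel's ATAN2(dx, dy).
    angle = math.degrees(math.atan2(dy, dx)) % 360

    # LOG(x, 2) in Excel is the base-2 logarithm.
    index_of_difficulty = math.log2(distance / 7.1 + 1)
    return slope, intercept, angle, index_of_difficulty


print(derived((0, 0), (1, 1), 7.1))
print(derived((0, 0), (1, -1), 7.1))
```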
CREATE TABLE test.Table10
SELECT mm.myid, mm.id, mm.type1 AS GESTURE, mm.CHARSQ, mm.TYPE2 AS TYPEDCHAR, mm.MYCHAR,
       mm.XCOR, mm.YCOR, mm.SLOPE, l4 - (l2 * SLOPE) AS Intercept,
       IF(ANGLE1 < 0, ANGLE1 + 360, ANGLE1) AS ANGLE0,  -- MySQL MOD can be negative
       mm.DISTANCE, mm.DW, mm.INDDIFF, mm.TIME1, mm.SPEED, mm.SPDFILT, mm.TIMETYPE
FROM (
    SELECT c11.*,
           (YCOR - l4) / (XCOR - l2) AS SLOPE,
           MOD(DEGREES(ATAN2(YCOR - l4, XCOR - l2)), 360) AS ANGLE1,
           (YCOR - l4) / (XCOR - l2) AS ATT,
           LOG2(DW + 1) AS INDDIFF,
           IF(TYPE2 = LAG(TYPE2) OVER (PARTITION BY MYID ORDER BY ID),
              "Search Time", "Actual Time") AS TIMETYPE,
           CASE WHEN TYPE2 = "A" THEN "1"
                WHEN TYPE2 = "B" THEN 2
                WHEN TYPE2 = "C" THEN 3
                ELSE 0 END AS MYCHAR
    FROM (
        SELECT b.*,
               LEAD(XCOR) OVER (PARTITION BY charsq) l1,
               LAG(XCOR) OVER (PARTITION BY MYID ORDER BY ID) l2,  -- previous x
               LEAD(YCOR) OVER (PARTITION BY MYID) l3,
               LAG(YCOR) OVER (PARTITION BY MYID ORDER BY ID) l4,  -- previous y
               distance / 7.1 AS DW,
               IF(LENGTH(speed) >= 3, "1", "0") AS SPDFILT,
               RIGHT(charSq, 1) AS TYPE2
        FROM test.table2 b
    ) c11
) mm
Some background first. We have a MySQL database with a "live currency" table. We use an API to pull the latest currency values for different currencies, every 5 seconds. The table currently has over 8 million rows.
Structure of the table is as follows:
id (INT 11 PK)
currency (VARCHAR 8)
value (DECIMAL
timestamp (TIMESTAMP)
Now we are trying to use this table to plot the data on a graph. We are going to have various different graphs, e.g: Live, Hourly, Daily, Weekly, Monthly.
I'm having a bit of trouble with the query. Using the Weekly graph as an example, I want to output data from the last 7 days, in 15 minute intervals. So here is how I have attempted it:
SELECT *
FROM currency_data
WHERE ((currency = 'GBP')) AND (timestamp > '2017-09-20 12:29:09')
GROUP BY UNIX_TIMESTAMP(timestamp) DIV (15 * 60)
ORDER BY id DESC
This outputs the data I want, but the query is extremely slow. I have a feeling the GROUP BY clause is the cause.
Also BTW I have switched off the sql mode 'ONLY_FULL_GROUP_BY' as it was forcing me to group by id as well, which was returning incorrect results.
Does anyone know of a better way of doing this query which will reduce the time taken to run the query?
You may want to create summary tables for each of the graphs you want to do.
If your data really is coming every 5 seconds, you can attempt something like:
SELECT *
FROM currency_data cd
WHERE currency = 'GBP' AND
timestamp > '2017-09-20 12:29:09' AND
UNIX_TIMESTAMP(timestamp) MOD (15 * 60) BETWEEN 0 AND 4
ORDER BY id DESC;
For both this query and your original query, you want an index on currency_data(currency, timestamp, id).
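To see why the MOD filter keeps one row per 15-minute window, here is a small Python sketch of the same arithmetic. It assumes samples arrive exactly every 5 seconds, starting at a made-up epoch value; sample_every_15_min is a hypothetical name. Each 15-minute window starts with a 5-second slot (remainders 0 to 4), and a strict 5-second feed puts exactly one sample in that slot.

```python
def sample_every_15_min(timestamps, window=15 * 60, slot=5):
    """Keep timestamps whose offset inside the window falls in the first slot.

    Mirrors: UNIX_TIMESTAMP(timestamp) MOD (15 * 60) BETWEEN 0 AND 4
    """
    return [t for t in timestamps if t % window < slot]


start = 1505910549                               # arbitrary epoch seconds (assumption)
feed = [start + 5 * k for k in range(720)]       # one hour of 5-second samples
picked = sample_every_15_min(feed)

print(len(feed), "rows in,", len(picked), "rows out")
print(sorted({t // (15 * 60) for t in picked}))  # one distinct bucket per kept row
```

If the feed jitters or skips a beat, a window can end up with zero or two rows, which is one reason a pre-aggregated summary table is the more robust choice.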
For the purposes of my question, I have a database in a MySQL server with info on many taxi rides (it is comprised of two tables, history_trips and trip_info).
In history_trips, each row's useful data is comprised of a unique alphanumeric ID, ride_id, the name of the rider, rider, and the time the ride ended, finishTime as a Y-m-d string.
In trip_info, each row's useful data similarly contains ride_id and rider, but also contains an integer, value (calculated in the back end from other data).
What I need to do is create a query that can find the average of all the maximum 'values' from all riders in a given time period. The riders included in this average are only considered if they completed less than X (let's say 3) rides within the aforementioned time period.
So far, I have a query that creates a grouped table containing the name of the rider, the finishTime of their highest 'value' ride, the value of said ride, and the number of rides, num_rides, they have taken in that time period. The AVG(b.value) column, however, gives me the same values as b.value, which is unexpected. I would like to find some way to return the average of the b.value column.
SELECT a.rider, a.finishTime, b.value, AVG(b.value), COUNT(a.rider) as num_rides
FROM history_trips as a, trip_info as b
WHERE a.finishTime > 'arbitrary_start_date_str' and a.ride_id = b.ride_id
and b.value = (SELECT MAX(value)
from trip_info where rider = b.rider and ride_id = b.ride_id)
GROUP BY a.rider
HAVING COUNT(a.rider) < 3
I am a novice in SQL but have read on some other forums that when using the AVG function on a value you must also GROUP BY that value. I was wondering if there is a way around that or if I am thinking of this problem incorrectly. Thanks in advance for any advice / solutions you might have!
The following worked for me:
SELECT AVG(ridergroups.maxvalues) avgmaxvalues FROM
(SELECT MAX(trip_info.value) maxvalues FROM trip_info
INNER JOIN history_trips
ON trip_info.ride_id = history_trips.ride_id
WHERE history_trips.finishTime > '2010-06-20'
GROUP BY trip_info.rider
HAVING COUNT(trip_info.rider) < 3) ridergroups;
The subquery groups the maximum values by rider after filtering by date and rider count. The containing query calculates the average of the maximum values.
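As a quick check of that logic, the sketch below runs the same query against a tiny made-up dataset. It uses Python's sqlite3 module as a stand-in for MySQL, assumes one trip_info row per ride, and the rider names and values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE history_trips (ride_id TEXT PRIMARY KEY, rider TEXT, finishTime TEXT);
    CREATE TABLE trip_info     (ride_id TEXT PRIMARY KEY, rider TEXT, value INTEGER);
    """
)

# (ride_id, rider, finishTime, value)
rides = [
    ("r1", "alice", "2010-07-01", 10),
    ("r2", "alice", "2010-07-02", 30),   # alice: 2 rides, max 30 -> included
    ("r3", "bob",   "2010-07-03", 50),   # bob: 1 ride, max 50 -> included
    ("r4", "carol", "2010-07-01", 90),
    ("r5", "carol", "2010-07-02", 95),
    ("r6", "carol", "2010-07-03", 99),   # carol: 3 rides -> excluded by HAVING
    ("r7", "dave",  "2010-01-01", 70),   # dave: before the start date -> excluded
]
conn.executemany(
    "INSERT INTO history_trips VALUES (?, ?, ?)", [(r, n, t) for r, n, t, _ in rides]
)
conn.executemany(
    "INSERT INTO trip_info VALUES (?, ?, ?)", [(r, n, v) for r, n, _, v in rides]
)

(avg_max,) = conn.execute(
    """
    SELECT AVG(ridergroups.maxvalues) avgmaxvalues FROM
      (SELECT MAX(trip_info.value) maxvalues FROM trip_info
       INNER JOIN history_trips
         ON trip_info.ride_id = history_trips.ride_id
       WHERE history_trips.finishTime > '2010-06-20'
       GROUP BY trip_info.rider
       HAVING COUNT(trip_info.rider) < 3) ridergroups
    """
).fetchone()

print(avg_max)  # average of alice's 30 and bob's 50
```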
I need to build the backend for a chart, which needs to have a fixed amount of data points, let's assume 10 for this example. I need to get all entries in a table, have them split into 10 chunks (by their respective date column) and show how many entries there were between each date interval.
I have managed to do kind of the opposite (I can get the entries for a fixed interval, and variable number of data points), but now I need a fixed number of data points and variable date interval.
What I was thinking (which didn't work) is to get the difference between the min and max date from the table, divide it by 10 (number of data points) and have each row's date column divided by that result and also grouped by it. I either screwed up the query somewhere or my logic is faulty, because it didn't work.
Something along these lines:
SELECT (UNIX_TIMESTAMP(created_at) DIV (SELECT (MAX(UNIX_TIMESTAMP(created_at)) - MIN(UNIX_TIMESTAMP(created_at))) / 10 FROM user)) x FROM user GROUP BY x;
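One possible reading of why this does not give 10 groups: the query divides the absolute timestamp by the bucket width, so the bucket boundaries are not aligned to the earliest row and the range can straddle 11 buckets. A minimal Python sketch of the intended logic, measuring each timestamp from the minimum and clamping the maximum into the last bucket, is shown below. The function name bucket_counts and the sample data are hypothetical; the same arithmetic can be carried back into SQL.

```python
def bucket_counts(timestamps, n_buckets=10):
    """Split [min, max] into n_buckets equal intervals and count rows in each."""
    t_min, t_max = min(timestamps), max(timestamps)
    # Fall back to a width of 1 when all timestamps are identical.
    width = (t_max - t_min) / n_buckets or 1
    counts = [0] * n_buckets
    for t in timestamps:
        # Offset from the minimum, not the absolute timestamp.
        idx = int((t - t_min) / width)
        # The maximum lands exactly on the upper edge; keep it in the last bucket.
        counts[min(idx, n_buckets - 1)] += 1
    return counts


created_at = list(range(0, 100))  # stand-in for UNIX_TIMESTAMP(created_at) values
print(bucket_counts(created_at))
```

Buckets with no rows still appear with a count of 0 here, which a plain GROUP BY would not return, so a chart that needs exactly 10 points may have to fill the gaps in application code.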