I have a report table that looks similar to this
reports
inspection_type | inspection_number
berries | 111
citrus | 222
grapes | 333
inspection_type in my case is the name of the other table I would like
to SELECT * from where the inspection_number equals report_key on
that associated table.
{fruit}
row | report_key | etc....
value | 111 | value
value | 222 | value
The issue is I do not know how to query inspection_type to get the table name
to query the value. Does that make any sense?
I tried this here, but even I know that it's glaringly wrong:
SELECT inspection_type, inspection_number
FROM reports rpt
ON rpt.inspection_number = report_key
(SELECT * FROM inspection_type WHERE status < '2')
WHERE rpt.status < '2'
ORDER BY rpt.inspection_number DESC
Could a SQL guru tell me the best way to do this?
Since it is not possible to have a variable for a table name directly in TSQL, you will have to dynamically construct the TSQL.
Variable table names in Stored Procedures
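To make "dynamically construct" concrete, here is a minimal sketch of the idea done from application code rather than inside a stored procedure. It uses Python's built-in sqlite3 module purely as a runnable stand-in for your real database. The whitelist `ALLOWED_TABLES`, the helper `fetch_inspection` and the `note` column are my own assumptions, not something from your schema.

```python
import sqlite3

# Hypothetical whitelist: only these names may ever be spliced into SQL text.
ALLOWED_TABLES = {"berries", "citrus", "grapes"}

def fetch_inspection(conn, inspection_number):
    """Read the table name from reports.inspection_type, then query that table."""
    row = conn.execute(
        "SELECT inspection_type FROM reports WHERE inspection_number = ?",
        (inspection_number,),
    ).fetchone()
    if row is None:
        return []
    table = row[0]
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {table!r}")
    # Identifiers cannot be bound as parameters, so the checked name is
    # interpolated; the value itself still goes through a placeholder.
    sql = f'SELECT * FROM "{table}" WHERE report_key = ?'
    return conn.execute(sql, (inspection_number,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reports (inspection_type TEXT, inspection_number INTEGER);
    INSERT INTO reports VALUES ('berries', 111), ('citrus', 222), ('grapes', 333);
    CREATE TABLE berries (report_key INTEGER, note TEXT);  -- 'note' is made up
    INSERT INTO berries VALUES (111, 'ripe');
""")
result = fetch_inspection(conn, 111)
print(result)  # [(111, 'ripe')]
```

The whitelist check is the important part: because the table name ends up in the SQL text itself, it should never come straight from untrusted data.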
You can't really do what you are aiming for in SQL alone. You'll need to either build the query in another language or (and this is the preferred solution) restructure the database, e.g. (sorry for the meta-code):
// Replaces your existing `reports` table
inspections (
inspection_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
inspection_type_id INT UNSIGNED NOT NULL (links to inspection_types.inspection_type_id)
.... other columns ....
)
// New table to normalise the inspection types
inspection_types (
inspection_type_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
type_name VARCHAR NOT NULL
.... other columns ....
)
// Normalised table to replace each {fruit} table
inspection_data (
inspection_data_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
inspection_id INT UNSIGNED NOT NULL (links to inspections.inspection_id)
.... other columns ....
)
Then your query would be simply
SELECT *
FROM inspections
INNER JOIN inspection_types
ON inspection_types.inspection_type_id = inspections.inspection_type_id
INNER JOIN inspection_data
ON inspection_data.inspection_id = inspections.inspection_id
The brief overview above is quite vague because your existing table data hasn't really been specified, but the general principle is sound. It wouldn't take much to migrate the data out of your existing structure, and once that's done you'll have far cleaner queries and can get at the data you're after much more easily.
I have a MySQL table that looks like:
CREATE TABLE `messages` (
`id` int NOT NULL AUTO_INCREMENT,
`from` varchar(12) NOT NULL,
`to` varchar(12) NOT NULL,
`message` text,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=66 DEFAULT CHARSET=latin1;
So each time a message is sent or received, it is stored as:
# id from, to, message, timestamp
'65', '+1231303****', '+1833935****', 'Showtimes', '2022-01-26 09:26:10'
'64', '+1833935****', '+1231303****', 'Showtimes are: 12:30 someresponse', '2022-01-26 09:26:10'
I want to create an index of these conversation threads, and need a query that selects a conversation based on it being addressed either from or to a specific number, returns the number of rows that match either, and at the same time returns the last message that was sent. So basically I want it to return:
recipient (the other phone number, not the one I'm using to look up ),count(messages),lastmessage
Individually, I can query all of this separately, since most of my experience here revolves around using PHP to untangle the data I'm going after. What I'm curious about is a single query that lets MySQL handle this, rather than submitting multiple queries to the database server. I figure this is a good time to learn it, since several projects I've coded have run out of memory while processing so many queries across so many loops.
Apologies in advance if this has been answered somewhere else already. I searched extensively for an answer, but the few results I found used a completely different table structure from mine, and the MySQL query I was able to fumble together didn't work. I stand by my work as a PHP programmer, but my MySQL needs some work. Hence I'm here!
If a conversation thread can be defined by a unique combination of from and to, then you can build a compound key in which the lower of the two numbers comes first, so that every message in the thread shares the same key. Note, however, that selecting on from OR to means several conversation threads may be selected. For example:
DROP TABLE IF EXISTS T;
CREATE TABLE T(ID INT AUTO_INCREMENT PRIMARY KEY, FROMNO INT, TONO INT);
INSERT INTO T(FROMNO,TONO) VALUES
(1,2),(2,1),
(1,3),(4,1),(1,2);
WITH CTE AS
(SELECT * ,
CASE WHEN FROMNO < TONO THEN CONCAT(FROMNO,TONO)
ELSE CONCAT(TONO,FROMNO)
END AS CVAL
FROM T
WHERE FROMNO = 1 OR TONO = 1
),
CTE1 AS
(SELECT *,
DENSE_RANK() OVER (ORDER BY CVAL) DR
FROM CTE
),
CTE2 AS
(SELECT CVAL,COUNT(*) conversations,MAX(ID) MAXID
FROM CTE1
GROUP BY CVAL
)
SELECT CTE2.CVAL,CTE2.conversations,CTE2.MAXID,T.ID
FROM CTE2
JOIN T ON T.ID = CTE2.MAXID;
Yields
+------+---------------+-------+----+
| CVAL | conversations | MAXID | ID |
+------+---------------+-------+----+
| 13 | 1 | 3 | 3 |
| 14 | 1 | 4 | 4 |
| 12 | 3 | 5 | 5 |
+------+---------------+-------+----+
3 rows in set (0.002 sec)
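To connect this back to the output the question asked for (other party, message count, last message), here is a rough sketch of the same grouping idea. It runs against SQLite through Python's sqlite3 module only so that it is self-contained; the CTE syntax should carry over to MySQL 8.0+, where you would quote `from` and `to` with backticks. The placeholder numbers 'A' to 'D', the sample messages, and the assumption that the highest id is the latest message are mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        "from" TEXT NOT NULL,
        "to" TEXT NOT NULL,
        message TEXT
    );
    INSERT INTO messages ("from", "to", message) VALUES
        ('A', 'B', 'Showtimes'),
        ('B', 'A', 'Showtimes are: 12:30'),
        ('A', 'C', 'Hello'),
        ('D', 'A', 'Ping'),
        ('A', 'B', 'Thanks');
""")

SQL = """
WITH mine AS (
    -- every message I sent or received, labelled with the other party
    SELECT id,
           CASE WHEN "from" = :me THEN "to" ELSE "from" END AS recipient
    FROM messages
    WHERE "from" = :me OR "to" = :me
),
threads AS (
    -- one row per other party; assumes the highest id is the newest message
    SELECT recipient, COUNT(*) AS message_count, MAX(id) AS last_id
    FROM mine
    GROUP BY recipient
)
SELECT t.recipient, t.message_count, m.message AS last_message
FROM threads AS t
JOIN messages AS m ON m.id = t.last_id
ORDER BY t.recipient
"""
rows = conn.execute(SQL, {"me": "A"}).fetchall()
for row in rows:
    print(row)
# ('B', 3, 'Thanks')
# ('C', 1, 'Hello')
# ('D', 1, 'Ping')
```

Because the lookup number is fixed, grouping on "the other party" is enough here and no concatenated key is needed.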
If I compare
explain select * from Foo where find_in_set(id,'2,3');
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | User | ALL | NULL | NULL | NULL | NULL | 4 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
with this one
explain select * from Foo where id in (2,3);
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | User | range | PRIMARY | PRIMARY | 8 | NULL | 2 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
It is apparent that FIND_IN_SET does not exploit the primary key.
I want to put a query such as the above into a stored procedure, with the comma-separated string as an argument.
Is there any way to make the query behave like the second version, in which the index is used, but without knowing the content of the id set at the time the query is written?
In reference to your comment:
@MarcB the database is normalized, the CSV string comes from the UI.
"Get me data for the following people: 101,202,303"
This answer has a narrow focus on just those numbers separated by commas. Because, as it turns out, you were not even talking about FIND_IN_SET after all.
Yes, you can achieve what you want. You create a prepared statement that accepts a string as a parameter like in this Recent Answer of mine. In that answer, look at the second block that shows the CREATE PROCEDURE and its 2nd parameter which accepts a string like (1,2,3). I will get back to this point in a moment.
Not that you need to see it @spraff, but others might. The mission is to get type != ALL, and to have possible_keys and key in the EXPLAIN output not show NULL, as in your second block. For general reading on the topic, see the article Understanding EXPLAIN’s Output and the MySQL Manual Page entitled EXPLAIN Extra Information.
Now, back to the (1,2,3) reference above. We know from your comment, and your second Explain output in your question that it hits the following desired conditions:
type = range (and in particular not ALL) . See the docs above on this.
key is not null
These are precisely the conditions you have in your second Explain output, and the output that can be seen with the following query:
explain
select * from ratings where id in (2331425, 430364, 4557546, 2696638, 4510549, 362832, 2382514, 1424071, 4672814, 291859, 1540849, 2128670, 1320803, 218006, 1827619, 3784075, 4037520, 4135373, ... use your imagination ..., ..., 4369522, 3312835);
where I have 999 values in that IN clause list. That is a sample from this answer of mine, in Appendix D, that generates such a random string of CSV, surrounded by open and close parentheses.
And note the following Explain output for that 999 element in clause below:
Objective achieved. You achieve this with a stored proc similar to the one I mentioned before in this link using a PREPARED STATEMENT (and those things use concat() followed by an EXECUTE).
The index is used, and a table scan (which is bad) is avoided. Further reading: The range Join Type, any reference you can find on MySQL's Cost-Based Optimizer (CBO), and this answer from vladr which, though dated, is worth reading with an eye on the ANALYZE TABLE part, in particular after significant data changes. Note that ANALYZE can take a significant amount of time to run on ultra-huge datasets, sometimes many hours.
Sql Injection Attacks:
Strings passed to stored procedures are an attack vector for SQL injection. Precautions must be in place when using user-supplied data. If your routine is applied against your own ids generated by your system, then you are safe. Note, however, that second-order SQL injection attacks occur when data was put in place by routines that did not sanitize it in a prior insert or update: the attack is planted earlier via data and triggered later (a sort of time bomb).
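As one possible precaution on the application side, the CSV string coming from the UI can be checked before it ever reaches the database, and the IN list can be built from placeholders instead of raw text. The sketch below is only an illustration of that idea: `parse_id_list` and `select_by_ids` are hypothetical helper names, and SQLite (through Python's sqlite3 module) stands in for MySQL so the example runs on its own.

```python
import sqlite3

def parse_id_list(csv_ids):
    """Turn '101,202,303' into [101, 202, 303]; reject anything else."""
    ids = []
    for part in csv_ids.split(","):
        part = part.strip()
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"not a plain integer id: {part!r}")
        ids.append(int(part))
    return ids

def select_by_ids(conn, csv_ids):
    """Run an IN (...) query with one bound placeholder per validated id."""
    ids = parse_id_list(csv_ids)
    placeholders = ",".join("?" for _ in ids)
    sql = f"SELECT id, thing FROM ratings WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, ids).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (id INTEGER PRIMARY KEY, thing INTEGER NOT NULL)")
conn.executemany("INSERT INTO ratings VALUES (?, ?)", [(i, i) for i in range(1, 6)])

rows = select_by_ids(conn, "2,3")
print(rows)  # [(2, 2), (3, 3)]
```

A string such as `1); DROP TABLE ratings; --` fails the digit check and raises `ValueError` before any SQL is built.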
So this answer is Finished for the most part.
The below is a view of the same table with a minor modification to it to show what a dreaded Tablescan would look like in the prior query (but against a non-indexed column called thing).
Take a look at our current table definition:
CREATE TABLE `ratings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`thing` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5046214 DEFAULT CHARSET=utf8;
select min(id), max(id),count(*) as theCount from ratings;
+---------+---------+----------+
| min(id) | max(id) | theCount |
+---------+---------+----------+
| 1 | 5046213 | 4718592 |
+---------+---------+----------+
Note that the column thing was a nullable int column before.
update ratings set thing=id where id<1000000;
update ratings set thing=id where id>=1000000 and id<2000000;
update ratings set thing=id where id>=2000000 and id<3000000;
update ratings set thing=id where id>=3000000 and id<4000000;
update ratings set thing=id where id>=4000000 and id<5100000;
select count(*) from ratings where thing!=id;
-- 0 rows
ALTER TABLE ratings MODIFY COLUMN thing int not null;
-- current table definition (after above ALTER):
CREATE TABLE `ratings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`thing` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5046214 DEFAULT CHARSET=utf8;
And then the Explain that is a Tablescan (against column thing):
You can use the following technique to make use of the primary key index.
Prerequisites:
You know the maximum number of items in the comma-separated string, and it is not large.
Description:
We convert the comma-separated string into a temporary table.
We inner join to that temporary table.
select @ids:='1,2,3,5,11,4', @maxCnt:=15;
SELECT *
FROM foo
INNER JOIN (
SELECT * FROM (SELECT @n:=@n+1 AS n FROM foo INNER JOIN (SELECT @n:=0) AS _a) AS _a WHERE _a.n <= @maxCnt
) AS k ON k.n <= LENGTH(@ids) - LENGTH(replace(@ids, ',','')) + 1
AND id = SUBSTRING_INDEX(SUBSTRING_INDEX(@ids, ',', k.n), ',', -1)
This is a trick to extract the nth value in a comma-separated list:
SUBSTRING_INDEX(SUBSTRING_INDEX(@ids, ',', k.n), ',', -1)
Note: @ids can be anything, including a column from another table or from the same table.
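If the nested SUBSTRING_INDEX call looks opaque, the following small Python emulation may help. It is only an illustration: `substring_index` is my own rough re-implementation of the MySQL function's behaviour and `nth_item` is a hypothetical helper name.

```python
def substring_index(s, delim, count):
    """Rough emulation of MySQL's SUBSTRING_INDEX, for illustration only."""
    parts = s.split(delim)
    if count > 0:
        # everything to the left of the count-th delimiter
        return delim.join(parts[:count])
    if count < 0:
        # everything to the right of the count-th delimiter from the end
        return delim.join(parts[count:])
    return ""

def nth_item(csv, n):
    """Inner call keeps the first n items, outer call keeps the last of those."""
    return substring_index(substring_index(csv, ",", n), ",", -1)

ids = "1,2,3,5,11,4"
# Same item count as LENGTH(ids) - LENGTH(REPLACE(ids, ',', '')) + 1
item_count = len(ids) - len(ids.replace(",", "")) + 1
items = [nth_item(ids, n) for n in range(1, item_count + 1)]

print(item_count)        # 6
print(items)             # ['1', '2', '3', '5', '11', '4']
print(nth_item(ids, 7))  # '4' again, which is why the join caps k.n at the item count
```

The last line shows why the `k.n <= LENGTH(...) - LENGTH(...) + 1` condition matters: past the end of the list the expression just repeats the final item.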
I have a table structure in my HTML.
I need to populate the data as rows and columns, with some cells having the value "1".
There are a fixed 10 rows and 10 columns, and there are multiple tables.
So I created a database table like the following:
ga_id (pk) | A1 | A2 | A3 ......
-------------+-------+------+-------------
125 | 1 | 0 | 0 ..........
-------------+-------+------+--------------
126 | 0 | 1 | 1 ...
I have the following questions:
Is my approach correct for achieving this?
I need to check whether a column or a row is fully occupied with the value "1".
For example:
For block D4, I need to check that D1, D2, D3, ..., D10 all have the same value, i.e. 1,
and that A4, B4, C4, D4, ..., J4 all have the same value.
I hope my question is clear.
By way of example, a normalised environment might look something like this:
CREATE TABLE my_table
(id INT NOT NULL
,x INT NOT NULL
,y CHAR(1) NOT NULL
,val INT NOT NULL
,PRIMARY KEY(id,x,y)
);
INSERT INTO my_table VALUES
(101,2,'B',1),
(101,2,'I',1),
(101,4,'D',1),
(101,5,'I',1),
(101,7,'D',1),
(101,7,'H',1),
(101,8,'G',1);
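With that layout, the "is this row or column fully occupied" check becomes a simple count. Below is a hedged sketch, run against SQLite via Python's sqlite3 module so it is self-contained. I am assuming that x is the row number (1 to 10), y is the column letter (A to J), and that a line is full when it holds 10 cells with val = 1. The helper `line_is_full` and the sample rows are made up for the illustration.

```python
import sqlite3

GRID_SIZE = 10  # from the question: fixed 10 rows and 10 columns

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE my_table (
        id INT NOT NULL, x INT NOT NULL, y CHAR(1) NOT NULL, val INT NOT NULL,
        PRIMARY KEY (id, x, y)
    )
""")
# Hypothetical data: column D is fully occupied for grid 101, row 4 is not.
conn.executemany(
    "INSERT INTO my_table VALUES (101, ?, 'D', 1)",
    [(x,) for x in range(1, GRID_SIZE + 1)],
)
conn.execute("INSERT INTO my_table VALUES (101, 4, 'A', 1)")

def line_is_full(conn, grid_id, axis, value):
    """axis is 'x' (a row of the grid) or 'y' (a column of the grid)."""
    if axis not in ("x", "y"):
        raise ValueError("axis must be 'x' or 'y'")
    sql = f"SELECT COUNT(*) FROM my_table WHERE id = ? AND {axis} = ? AND val = 1"
    (count,) = conn.execute(sql, (grid_id, value)).fetchone()
    # The primary key (id, x, y) rules out duplicate cells, so a count of
    # GRID_SIZE means every cell on that line is set.
    return count == GRID_SIZE

column_d_full = line_is_full(conn, 101, "y", "D")
row_4_full = line_is_full(conn, 101, "x", 4)
print(column_d_full, row_4_full)  # True False
```

So for block D4 you would run the check twice, once for column D and once for row 4.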
As a comment, the "correct approach" is whatever approach solves your issue. While normalization and the third normal form are concepts that are battle tested and definitely worth mastering, if the current structure solves your particular issue, go with it.
A possible, normalized DB structure would be:
Table columns: column_id, name
Table rows: row_id, name
Table tables: table_id, name
Table table_rows_columns: table_id, row_id, column_id, value
I've stumbled on a previously asked and answered question here:
How to use comparison operator for numeric string in MySQL?
I absolutely agree with the answer being the best mentioned. But it left me with a question myself while I was trying to create my own answer. I was trying to select the first number and convert it to an integer. Next I wanted to compare that integer with a number (3 in case of the question).
This is the query I've created:
SELECT experience,
CONVERT(SUBSTRING_INDEX(experience,'-',1), UNSIGNED INTEGER) AS num
FROM employee
WHERE @num >= 3;
For the sake of simplicity, assume the data inside experience is: 4-8
The query doesn't return any errors. But it doesn't return the data either. I know it's possible to compare the data inside a column with a user-defined variable. But is it possible to compare data (the integer in this case) with the variable like I'm trying to do?
This is purely out of curiosity and to learn something.
Yes, a derived table will do. The inner select block below is a derived table. And every derived table needs a name. In my case, xDerived.
The strategy is to let the derived table cleanse the use of the column name. Coming out of the derived chunk is a clean column named num which the outer select is free to use.
Schema
create table employee
( id int auto_increment primary key,
experience varchar(20) not null
);
-- truncate table employee;
insert employee(experience) values
('4-5'),('7-1'),('4-1'),('6-5'),('8-6'),('5-9'),('10-4');
Query
select id,experience,num
from
( SELECT id,experience,
CONVERT(SUBSTRING_INDEX(experience,'-',1),UNSIGNED INTEGER) AS num
FROM employee
) xDerived
where num>=7;
Results
+----+------------+------+
| id | experience | num |
+----+------------+------+
| 2 | 7-1 | 7 |
| 5 | 8-6 | 8 |
| 7 | 10-4 | 10 |
+----+------------+------+
Note, your @num concept was faulty, but hopefully I interpreted what you meant to do above.
Also, I went with 7 rather than 3 because with 3 all of the sample data would have been returned, and I wanted to show you that the filter works.
The AS num clause names the result of CONVERT as the column alias num; it does not create a user variable named @num.
You could repeat the convert
SELECT experience,CONVERT(SUBSTRING_INDEX(experience,'-',1),UNSIGNED INTEGER)
FROM employee
WHERE CONVERT(SUBSTRING_INDEX(experience,'-',1),UNSIGNED INTEGER) >= 3;
Or use a partial (derived) table (only one convert)
SELECT experience,num
FROM (select experience,
CONVERT(SUBSTRING_INDEX(experience,'-',1),UNSIGNED INTEGER) as num
FROM employee) as partialtable WHERE num>=3;
Much simpler. (Or at least much shorter.) This will work for the data as described, namely "number, -, other stuff".
SELECT experience,
0+experience AS 'FirstPart'
FROM employee
WHERE 0+experience >= 3
Why? 0+string is parsed as "convert the string to a number, then add it to 0". Converting a string will extract the digits up to the first non-digit, then convert that as numeric.
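To make the "digits up to the first non-digit" rule concrete, here is a small Python imitation of it. This is a simplified sketch: `leading_int` is a hypothetical helper that only handles plain leading digits, which is all the "number, -, other stuff" data needs; it makes no attempt to cover signs, decimals or exponents.

```python
import re

def leading_int(text):
    """Return the integer formed by the leading digits of text, or 0 if none."""
    match = re.match(r"\d+", text)
    return int(match.group(0)) if match else 0

# Sample data from the derived-table answer above.
experience = ["4-5", "7-1", "4-1", "6-5", "8-6", "5-9", "10-4"]

first_parts = [leading_int(e) for e in experience]
matches = [e for e in experience if leading_int(e) >= 7]

print(first_parts)  # [4, 7, 4, 6, 8, 5, 10]
print(matches)      # ['7-1', '8-6', '10-4']
```

With a threshold of 7 this picks the same three rows as the derived-table query shown earlier.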
My objective
I am trying to retrieve multiple random rows that contain only unique userid but for the type column to be random - type can only be 0 or 1. The table in question will contain less than 1,000 rows at any given time.
My table
CREATE TABLE tbl_message_queue (
userid bigint(20) NOT NULL,
messageid varchar(20) NOT NULL,
`type` int(1) NOT NULL,
PRIMARY KEY (userid,messageid,`type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Sample data
userid | messageid | type
---------------------------------------------------
4353453 | 518423942 | 0
4353453 | 518423942 | 1
2342934 | 748475435 | 0
2342934 | 748475435 | 1
7657529 | 821516543 | 0
7657529 | 821516543 | 1
0823546 | 932843285 | 0
0823546 | 932843285 | 1
What to rule out
Using ORDER BY RAND() isn't feasible: at least 18,000 of these queries are executed by applications at any given moment, and they cause high load. Using SELECT DISTINCT or GROUP BY is (obviously) more efficient, with an acceptable load, and will always pick a unique userid, but type will always equal 0.
The common method is to create an id column but I'm looking for an alternative way only. The group primary key cannot change as it is required and deeply integrated into our application, however the structure of each column can be altered.
Thanks.
My understanding of your question is that for each userid you have two entries, but want to extract only one, at random.
To achieve this, you ought to generate a random value of either 0 or 1 for each unique userid, and then JOIN this list with the starting list:
SELECT a.* FROM tbl_message_queue AS a
JOIN ( SELECT userid, FLOOR(2*RAND()) AS type
FROM tbl_message_queue GROUP BY userid ) AS b
ON ( a.userid = b.userid AND a.type = b.type );
But if an ORDER BY RAND() does not work for you, maybe we should compromise.
In the above sequence, any two userids will be uncorrelated -- i.e., the fact that user A gets type 0 tells you nothing about what user B will turn up with.
Depending on the use case, a less random (but "apparently random") sequence could be obtained with two queries:
SELECT @X := FLOOR(2*RAND()), @Y := POW(2,FLOOR(2+14*RAND()))-1;
SELECT * FROM tbl_message_queue WHERE (((userid % @Y) & 1) XOR type XOR @X);
This way, you get what seems to be a random extraction. What really happens is that the userids are correlated, and only a couple of dozen different extractions are possible. But since it uses only simple operators and no JOINs, this query is very fast.
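If you want to convince yourself that the XOR filter keeps exactly one row per userid, the logic is easy to simulate outside the database. The sketch below mirrors the WHERE clause in plain Python. `pick_rows` is a hypothetical helper, the rows are the sample data from the question, and I treat userid as a plain integer (so the leading zero of 0823546 is dropped).

```python
import random

def pick_rows(rows, x, y):
    """Keep rows where ((userid % y) & 1) XOR type XOR x is 1."""
    return [
        (userid, messageid, type_)
        for userid, messageid, type_ in rows
        if ((userid % y) & 1) ^ type_ ^ x
    ]

# Sample data from the question: every userid appears with type 0 and type 1.
rows = [
    (userid, messageid, type_)
    for userid, messageid in [
        (4353453, "518423942"),
        (2342934, "748475435"),
        (7657529, "821516543"),
        (823546, "932843285"),
    ]
    for type_ in (0, 1)
]

x = random.randrange(2)               # plays the role of FLOOR(2*RAND())
y = 2 ** random.randrange(2, 16) - 1  # plays the role of POW(2,FLOOR(2+14*RAND()))-1
picked = pick_rows(rows, x, y)
for row in picked:
    print(row)
```

For a given userid the first term is fixed, so exactly one of type 0 or type 1 satisfies the condition. With 2 possible values for X and 14 for Y there are at most 28 distinct extractions.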