I have a MySQL database that stores news articles with the publication date (just day information), the source, and the category. Based on these I want to generate a table that holds the article counts with respect to these 3 parameters.
Since for some combinations of these 3 parameters there might be no article, a simple GROUP BY won't do. I therefore first generate a table news_article_counts with all possible combinations of the 3 parameters and a default article_count of 0 -- like this:
SELECT * FROM news_article_counts;
+--------------+------------+----------+---------------+
| published_at | source     | category | article_count |
+--------------+------------+----------+---------------+
| 2016-08-05   | 1826089206 |        0 |             0 |
| 2016-08-05   | 1826089206 |        1 |             0 |
| 2016-08-05   | 1826089206 |        2 |             0 |
| 2016-08-05   | 1826089206 |        3 |             0 |
| 2016-08-05   | 1826089206 |        4 |             0 |
| ...          | ...        | ...      | ...           |
+--------------+------------+----------+---------------+
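(For reference, one hedged way to build such an all-combinations table, assuming the articles live in a table called news_articles:)
INSERT INTO news_article_counts (published_at, source, category, article_count)
SELECT d.published_at, s.source, c.category, 0
FROM (SELECT DISTINCT published_at FROM news_articles) d
CROSS JOIN (SELECT DISTINCT source FROM news_articles) s
CROSS JOIN (SELECT DISTINCT category FROM news_articles) c;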
For testing, I now created a temporary table tmp as the GROUP BY result from the original news article table:
SELECT * FROM tmp LIMIT 6;
+--------------+------------+----------+-----+
| published_at | source     | category | cnt |
+--------------+------------+----------+-----+
| 2016-08-05   | 1826089206 |        3 |   1 |
| 2003-09-19   | 1826089206 |        4 |   1 |
| 2005-08-08   | 1826089206 |        3 |   1 |
| 2008-07-22   | 1826089206 |        4 |   1 |
| 2008-11-26   | 1826089206 |        8 |   1 |
| ...          | ...        | ...      | ... |
+--------------+------------+----------+-----+
Given these two tables, the following query works as expected:
SELECT * FROM news_article_counts c, tmp t
WHERE c.published_at = t.published_at AND c.source = t.source AND c.category = t.category;
But now I need to update the article_count of table news_article_counts with the values in table tmp where the 3 parameters match up. For this I'm using the following query (I've tried different ways but with the same results):
UPDATE
news_article_counts c
INNER JOIN
tmp t
ON
c.published_at = t.published_at AND
c.source = t.source AND
c.category = t.category
SET
c.article_count = t.cnt;
Executing this query yields this error:
ERROR 1062 (23000): Duplicate entry '2018-04-07 14:46:17-1826089206-1' for key 'uniqueIndex'
uniqueIndex is a joint index over published_at, source, category of table news_article_counts. But this shouldn't be a problem since I do not -- as far as I can tell -- update any of those 3 values, only article_count.
What confuses me most is that the error mentions the timestamp at which I executed the query (here: 2018-04-07 14:46:17). I have absolutely no idea where this comes into play. In fact, some rows in news_article_counts now have 2018-04-07 14:46:17 as the value for published_at. While this explains the error, I cannot see why published_at gets overwritten with the current timestamp. There is no ON UPDATE CURRENT_TIMESTAMP on this column; see:
CREATE TABLE IF NOT EXISTS `test`.`news_article_counts` (
`published_at` TIMESTAMP NOT NULL,
`source` INT UNSIGNED NOT NULL,
`category` INT UNSIGNED NOT NULL,
`article_count` INT UNSIGNED NOT NULL DEFAULT 0,
UNIQUE INDEX `uniqueIndex` (`published_at` ASC, `source` ASC, `category` ASC))
ENGINE = MyISAM
DEFAULT CHARACTER SET = utf8mb4;
What am I missing here?
UPDATE 1: I actually checked the table definition of news_article_counts in the database. And there's indeed the following:
mysql> SHOW COLUMNS FROM news_article_counts;
+---------------+------------------+------+-----+-------------------+-----------------------------+
| Field         | Type             | Null | Key | Default           | Extra                       |
+---------------+------------------+------+-----+-------------------+-----------------------------+
| published_at  | timestamp        | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| source        | int(10) unsigned | NO   |     | NULL              |                             |
| category      | int(10) unsigned | NO   |     | NULL              |                             |
| article_count | int(10) unsigned | NO   |     | 0                 |                             |
+---------------+------------------+------+-----+-------------------+-----------------------------+
But why is on update CURRENT_TIMESTAMP set? I double- and triple-checked my CREATE TABLE statement. I removed the joint index, I added an artificial primary key (auto_increment). Nothing helped. I've even tried to explicitly remove these attributes from published_at with:
ALTER TABLE `news_article_counts` CHANGE `published_at` `published_at` TIMESTAMP NOT NULL;
Nothing seems to work for me.
It looks like you have the explicit_defaults_for_timestamp system variable disabled. One of the effects of this is:
The first TIMESTAMP column in a table, if not explicitly declared with the NULL attribute or an explicit DEFAULT or ON UPDATE attribute, is automatically declared with the DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP attributes.
You could try enabling this system variable, but that could potentially impact other applications. I think it only takes effect when you're actually creating a table, so it shouldn't affect any existing tables.
If you don't want to make a system-level change like this, you could add an explicit DEFAULT attribute to the published_at column of this table; then it won't automatically add ON UPDATE.
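For example, a minimal sketch of that per-column fix (the placeholder default value is arbitrary; any explicit DEFAULT suppresses the automatic ON UPDATE):
ALTER TABLE `news_article_counts`
  MODIFY `published_at` TIMESTAMP NOT NULL DEFAULT '2000-01-01 00:00:00';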
Description
I have a MySQL table like the following one:
CREATE TABLE `ticket` (
`ticket_id` int(11) NOT NULL AUTO_INCREMENT,
`ticket_number` varchar(30) DEFAULT NULL,
`pick1` varchar(2) DEFAULT NULL,
`pick2` varchar(2) DEFAULT NULL,
`pick3` varchar(2) DEFAULT NULL,
`pick4` varchar(2) DEFAULT NULL,
`pick5` varchar(2) DEFAULT NULL,
`pick6` varchar(2) DEFAULT NULL,
PRIMARY KEY (`ticket_id`)
) ENGINE=InnoDB AUTO_INCREMENT=19675 DEFAULT CHARSET=latin1;
Let's also assume we have the following values already stored in the DB:
+-----------+-------------------+-------+-------+-------+-------+-------+-------+
| ticket_id | ticket_number     | pick1 | pick2 | pick3 | pick4 | pick5 | pick6 |
+-----------+-------------------+-------+-------+-------+-------+-------+-------+
|       655 | 08-09-21-24-46-52 |     8 |     9 |    21 |    24 |    46 |    52 |
|       658 | 08-23-24-40-42-45 |     8 |    23 |    24 |    40 |    42 |    45 |
|       660 | 07-18-19-20-22-31 |     7 |    18 |    19 |    20 |    22 |    31 |
|       ... | ...               |   ... |   ... |   ... |   ... |   ... |   ... |
|     19674 | 06-18-33-43-49-50 |     6 |    18 |    33 |    43 |    49 |    50 |
+-----------+-------------------+-------+-------+-------+-------+-------+-------+
Now, my goal is to compare each ticket with every other one in the table (except itself), in terms of their respective values in the ticket_number field (6 elements per set, split by -). For instance, if I compare ticket_id = 655 with ticket_id = 658 in terms of the elements in their respective ticket_number fields, I find that the elements 08 and 24 appear in both sets. If we compare ticket_id = 660 with ticket_id = 19674, there is only one coincidence: 18.
What I am actually using to carry out these comparisons is the following query:
select A.ticket_id, A.ticket_number, P.ticket_id, P.ticket_number, count(P.ticket_number) as cnt
from ticket A
inner join ticket P on A.ticket_id != P.ticket_id
where
((A.ticket_number like concat("%", lpad(P.pick1,2,0), "%"))
+ (A.ticket_number like concat("%", lpad(P.pick2,2,0), "%"))
+ (A.ticket_number like concat("%", lpad(P.pick3,2,0), "%"))
+ (A.ticket_number like concat("%", lpad(P.pick4,2,0), "%"))
+ (A.ticket_number like concat("%", lpad(P.pick5,2,0), "%"))
+ (A.ticket_number like concat("%", lpad(P.pick6,2,0), "%")) > 3) group by A.ticket_id
having cnt > 5;
That is, first I create an INNER JOIN combining all rows with different ticket_ids, then I compare each P.pickX (X=[1..6]) with A.ticket_number in the result of the INNER JOIN, and I count the number of matches between the two sets.
Finally, after executing, I obtain something like this:
+-------------+-------------------+-------------+-------------------+-----+
| A.ticket_id | A.ticket_number   | P.ticket_id | P.ticket_number   | cnt |
+-------------+-------------------+-------------+-------------------+-----+
|        8489 | 14-21-28-32-48-49 |        2528 | 14-21-33-45-48-49 |   6 |
|        8553 | 02-14-17-38-47-53 |        2364 | 02-30-38-44-47-53 |   6 |
|        8615 | 05-12-29-33-36-43 |        4654 | 12-21-29-33-36-37 |   6 |
|        8686 | 09-13-29-34-44-48 |        6038 | 09-13-17-29-33-44 |   6 |
|        8693 | 01-10-14-17-42-50 |        5330 | 01-10-37-42-48-50 |   6 |
|         ... | ...               |         ... | ...               | ... |
|       19195 | 05-13-29-41-46-51 |        5106 | 07-13-14-29-41-51 |   6 |
+-------------+-------------------+-------------+-------------------+-----+
Problem
The problem is that I execute this for a table of 10476 rows, resulting in more than 100 million ticket_number vs. pickX comparisons and taking around 172 seconds in total. This is too slow.
Goal
My goal is to make this execution as fast as possible so as to be completed in less than a second, since this must work in real-time.
Is that possible?
If you want to keep the current structure, then change pick1..6 to the tinyint type instead of varchar.
A signed TINYINT stores values between -128 and 127 (0 to 255 unsigned). Then your query won't need that CONCAT with %, which is the cause of the slow run.
Then, these two queries will give you the same result
select * FROM ticket where pick1 = '8';
select * FROM ticket where pick1 = '08';
This is the SQL structure:
CREATE TABLE `ticket` (
`ticket_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`ticket_number` varchar(30) DEFAULT NULL,
`pick1` tinyint(1) unsigned zerofill DEFAULT NULL,
`pick2` tinyint(1) unsigned zerofill DEFAULT NULL,
`pick3` tinyint(1) unsigned zerofill DEFAULT NULL,
`pick4` tinyint(1) unsigned zerofill DEFAULT NULL,
`pick5` tinyint(1) unsigned zerofill DEFAULT NULL,
`pick6` tinyint(1) unsigned zerofill DEFAULT NULL,
PRIMARY KEY (`ticket_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1;
I think you can even remove the zerofill.
If this doesn't work, change the table design.
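To make that concrete, here is a hedged sketch of the comparison rewritten against numeric picks, with no LIKE/CONCAT (same tables and the same > 3 threshold as in the question):
select A.ticket_id, A.ticket_number, P.ticket_id, P.ticket_number
from ticket A
inner join ticket P on A.ticket_id != P.ticket_id
where (P.pick1 in (A.pick1, A.pick2, A.pick3, A.pick4, A.pick5, A.pick6))
    + (P.pick2 in (A.pick1, A.pick2, A.pick3, A.pick4, A.pick5, A.pick6))
    + (P.pick3 in (A.pick1, A.pick2, A.pick3, A.pick4, A.pick5, A.pick6))
    + (P.pick4 in (A.pick1, A.pick2, A.pick3, A.pick4, A.pick5, A.pick6))
    + (P.pick5 in (A.pick1, A.pick2, A.pick3, A.pick4, A.pick5, A.pick6))
    + (P.pick6 in (A.pick1, A.pick2, A.pick3, A.pick4, A.pick5, A.pick6)) > 3;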
How big can the numbers be? Looks like 50. If the answer is 63 or less, then change the format to this:
All 6 numbers are stored in a single SET ('0','1','2',...,'50') and use suitable operations to set the nth bit.
Then, comparing two sets becomes BIT_COUNT(x & y) to find out how many match. A simple comparison will test for equality.
If your goal is to see if a particular lottery guess is already in the table, then index that column so that a lookup will be fast. I don't mean minutes or even seconds, but rather a few milliseconds. Even for a billion rows.
The bit arithmetic can be done in SQL or in your client language. For example, to build the SET for (11, 33, 7), the code would be
INSERT INTO t SET picks = '11,33,7' -- order does not matter
Also this would work:
... picks = (1 << 11) |
(1 << 33) |
(1 << 7)
A quick example:
CREATE TABLE `setx` (
`picks` set('1','2','3','4','5','6','7','8','9','10') NOT NULL
) ENGINE=InnoDB;
INSERT INTO setx (picks) VALUES ('2,10,6');
INSERT INTO setx (picks) VALUES ('1,3,5,7,9'), ('2,4,6,8,10'), ('9,8,7,6,5,4,3,2,1,10');
SELECT picks, HEX(picks+0) FROM setx;
+----------------------+--------------+
| picks                | HEX(picks+0) |
+----------------------+--------------+
| 2,6,10               | 222          |
| 1,3,5,7,9            | 155          |
| 2,4,6,8,10           | 2AA          |
| 1,2,3,4,5,6,7,8,9,10 | 3FF          |
+----------------------+--------------+
4 rows in set (0.00 sec)
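To illustrate the matching step described above, here is a hypothetical self-join that counts shared picks between rows of the setx table from this example (picks+0 converts the SET to its numeric bitmap):
SELECT a.picks, b.picks, BIT_COUNT((a.picks+0) & (b.picks+0)) AS matches
FROM setx a
JOIN setx b ON a.picks <> b.picks;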
I am looking for an update statement that will group terms by language in the following table
CREATE TABLE _tempTerms(
ID int(8) unsigned NOT NULL AUTO_INCREMENT,
TTC_ART_ID mediumint(8) unsigned,
TTC_TYP_ID mediumint(8) unsigned,
Name varchar(200),
Value varchar(200),
ID_Lang tinyint(3) unsigned,
Sequence smallint unsigned,
Group_ID int(8) unsigned DEFAULT 0,
PRIMARY KEY(TTC_ART_ID, TTC_TYP_ID, Name, Value),
UNIQUE KEY(ID)
);
All data except Group_ID is inserted into the table. I need to update the table so that I auto-generate new Group_IDs, and all records with the same combination of TTC_ART_ID, TTC_TYP_ID and Sequence get the same Group_ID. I guess I need a variable to store the current value for Group_ID; so far I have experimented with
SET @group_id := 1;
UPDATE _tempTerms
SET Group_ID = (@group_id := @group_id + 1);
which just gives a new group_id to every record. I believe I need a SELECT statement somewhere to check if there is a group_id already given, but I am confused about how to go about it.
Thank you
Schema:
create database xGrpId; -- create a test db
use xGrpId; -- use it
CREATE TABLE _tempTerms(
ID int(8) unsigned NOT NULL AUTO_INCREMENT,
TTC_ART_ID mediumint(8) unsigned,
TTC_TYP_ID mediumint(8) unsigned,
Name varchar(200),
Value varchar(200),
ID_Lang tinyint(3) unsigned,
Sequence smallint unsigned,
Group_ID int(8) unsigned DEFAULT 0,
PRIMARY KEY(TTC_ART_ID, TTC_TYP_ID, Name, Value),
UNIQUE KEY(ID)
);
-- truncate table _tempTerms;
insert _tempTerms(TTC_ART_ID,TTC_TYP_ID,Name,Value,ID_Lang,Sequence) values
(1,2,'n','v1',66,4),
(1,1,'n','v2',66,4),
(1,1,'n','v3',66,3),
(1,1,'n','v4',66,4),
(1,1,'n','v5',66,4),
(1,1,'n','v6',66,3),
(2,1,'n','v7',66,4),
(1,2,'n','v8',66,4);
View them:
select * from _tempTerms order by id;
select distinct TTC_ART_ID,TTC_TYP_ID,Sequence from _tempTerms;
-- 4 rows
-- update _tempTerms set Group_ID=0; -- clear before testing
The query:
update _tempTerms t
join
( select TTC_ART_ID,TTC_TYP_ID,Sequence,@rn:=@rn+1 as rownum
from
( select distinct TTC_ART_ID,TTC_TYP_ID,Sequence
from _tempTerms
-- put your `order by` here if needed
order by TTC_ART_ID,TTC_TYP_ID,Sequence
) d1
cross join (select @rn:=0) as xParams
) d2
on d2.TTC_ART_ID=t.TTC_ART_ID and d2.TTC_TYP_ID=t.TTC_TYP_ID and d2.Sequence=t.Sequence
set t.Group_ID=d2.rownum;
Results:
select * from _tempTerms order by TTC_ART_ID,TTC_TYP_ID,Sequence;
+----+------------+------------+------+-------+---------+----------+----------+
| ID | TTC_ART_ID | TTC_TYP_ID | Name | Value | ID_Lang | Sequence | Group_ID |
+----+------------+------------+------+-------+---------+----------+----------+
|  3 |          1 |          1 | n    | v3    |      66 |        3 |        1 |
|  6 |          1 |          1 | n    | v6    |      66 |        3 |        1 |
|  2 |          1 |          1 | n    | v2    |      66 |        4 |        2 |
|  4 |          1 |          1 | n    | v4    |      66 |        4 |        2 |
|  5 |          1 |          1 | n    | v5    |      66 |        4 |        2 |
|  1 |          1 |          2 | n    | v1    |      66 |        4 |        3 |
|  8 |          1 |          2 | n    | v8    |      66 |        4 |        3 |
|  7 |          2 |          1 | n    | v7    |      66 |        4 |        4 |
+----+------------+------------+------+-------+---------+----------+----------+
Cleanup:
drop database xGrpId;
d1, d2, and xParams are derived tables. Every derived table needs a name. The purpose of xParams and the cross join is merely to bring in a variable to initialize the row number; this is because MySQL (prior to 8.0) lacks the CTE functionality found in other RDBMSs. So don't overthink the cross join. It is like saying LET i=0.
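On MySQL 8.0 or later, the same grouping can be done with a window function instead of a user variable; a hedged sketch against the same table:
update _tempTerms t
join
( select ID,
         dense_rank() over (order by TTC_ART_ID,TTC_TYP_ID,Sequence) as grp
  from _tempTerms
) d2
on d2.ID=t.ID
set t.Group_ID=d2.grp;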
I am having an issue finding a fast way of joining tables that look like this:
mysql> explain geo_ip;
+--------------+------------------+------+-----+---------+-------+
| Field        | Type             | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+-------+
| ip_start     | varchar(32)      | NO   |     | ""      |       |
| ip_end       | varchar(32)      | NO   |     | ""      |       |
| ip_num_start | int(64) unsigned | NO   | PRI | 0       |       |
| ip_num_end   | int(64) unsigned | NO   |     | 0       |       |
| country_code | varchar(3)       | NO   |     | ""      |       |
| country_name | varchar(64)      | NO   |     | ""      |       |
| ip_poly      | geometry         | NO   | MUL | NULL    |       |
+--------------+------------------+------+-----+---------+-------+
mysql> explain entity_ip;
+------------+---------------------+------+-----+---------+-------+
| Field      | Type                | Null | Key | Default | Extra |
+------------+---------------------+------+-----+---------+-------+
| entity_id  | int(64) unsigned    | NO   | PRI | NULL    |       |
| ip_1       | tinyint(3) unsigned | NO   |     | NULL    |       |
| ip_2       | tinyint(3) unsigned | NO   |     | NULL    |       |
| ip_3       | tinyint(3) unsigned | NO   |     | NULL    |       |
| ip_4       | tinyint(3) unsigned | NO   |     | NULL    |       |
| ip_num     | int(64) unsigned    | NO   |     | 0       |       |
| ip_poly    | geometry            | NO   | MUL | NULL    |       |
+------------+---------------------+------+-----+---------+-------+
Please note that I am not interested in looking up rows in geo_ip by only ONE IP address at a time; I need an entity_ip LEFT JOIN geo_ip (or a similar/analogous way).
This is what I have for now (using polygons as advised on http://jcole.us/blog/archives/2007/11/24/on-efficiently-geo-referencing-ips-with-maxmind-geoip-and-mysql-gis/):
mysql> EXPLAIN SELECT li.*, gi.country_code FROM entity_ip AS li
-> LEFT JOIN geo_ip AS gi ON
-> MBRCONTAINS(gi.`ip_poly`, li.`ip_poly`);
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
|  1 | SIMPLE      | li    | ALL  | NULL          | NULL | NULL    | NULL |   2470 |       |
|  1 | SIMPLE      | gi    | ALL  | ip_poly_index | NULL | NULL    | NULL | 155183 |       |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
mysql> SELECT li.*, gi.country_code FROM entity AS li LEFT JOIN geo_ip AS gi ON MBRCONTAINS(gi.`ip_poly`, li.`ip_poly`) limit 0, 20;
20 rows in set (2.22 sec)
No polygons
mysql> explain SELECT li.*, gi.country_code FROM entity_ip AS li LEFT JOIN geo_ip AS gi ON li.`ip_num` >= gi.`ip_num_start` AND li.`ip_num` <= gi.`ip_num_end` LIMIT 0,20;
+----+-------------+-------+------+---------------------------+------+---------+------+--------+-------+
| id | select_type | table | type | possible_keys             | key  | key_len | ref  | rows   | Extra |
+----+-------------+-------+------+---------------------------+------+---------+------+--------+-------+
|  1 | SIMPLE      | li    | ALL  | NULL                      | NULL | NULL    | NULL |   2470 |       |
|  1 | SIMPLE      | gi    | ALL  | PRIMARY,geo_ip,geo_ip_end | NULL | NULL    | NULL | 155183 |       |
+----+-------------+-------+------+---------------------------+------+---------+------+--------+-------+
mysql> SELECT li.*, gi.country_code FROM entity_ip AS li LEFT JOIN geo_ip AS gi ON li.ip_num BETWEEN gi.ip_num_start AND gi.ip_num_end limit 0, 20;
20 rows in set (2.00 sec)
(With a higher number of rows in the search there is no difference.)
Currently I cannot get any better performance from these queries, and 0.1 seconds per IP is way too slow for me.
Is there any way to make it faster?
This approach has some scalability issues (should you choose to move to, say, city-specific geoip data), but for the given size of data, it will provide considerable optimization.
The problem you are facing is effectively that MySQL does not optimize range-based queries very well. Ideally you want to do an exact ("=") look-up on an index rather than "greater than", so we'll need to build an index like that from the data you have available. This way MySQL will have much fewer rows to evaluate while looking for a match.
To do this, I suggest that you create a look-up table that indexes the geolocation table based on the first octet (= 1 for 1.2.3.4) of the IP addresses. The idea is that for each look-up, you can ignore all geolocation IPs which do not begin with the same octet as the IP you are looking for.
CREATE TABLE `ip_geolocation_lookup` (
`first_octet` int(10) unsigned NOT NULL DEFAULT '0',
`ip_numeric_start` int(10) unsigned NOT NULL DEFAULT '0',
`ip_numeric_end` int(10) unsigned NOT NULL DEFAULT '0',
KEY `first_octet` (`first_octet`,`ip_numeric_start`,`ip_numeric_end`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Next, we need to take the data available in your geolocation table and produce data that covers all (first) octets the geolocation row covers: If you have an entry with ip_start = '5.3.0.0' and ip_end = '8.16.0.0', the lookup table will need rows for octets 5, 6, 7, and 8. So...
ip_geolocation
|ip_start       |ip_end       |ip_numeric_start|ip_numeric_end|
|72.255.119.248 |74.3.127.255 |1224701944      |1241743359    |
Should convert to:
ip_geolocation_lookup
|first_octet|ip_numeric_start|ip_numeric_end|
|72         |1224701944      |1241743359    |
|73         |1224701944      |1241743359    |
|74         |1224701944      |1241743359    |
Since someone here requested a native MySQL solution, here's a stored procedure that will generate that data for you (wrapped in a DELIMITER change so the mysql client accepts the semicolons inside the body):
DROP PROCEDURE IF EXISTS recalculate_ip_geolocation_lookup;
DELIMITER ;;
CREATE PROCEDURE recalculate_ip_geolocation_lookup()
BEGIN
    DECLARE i INT DEFAULT 0;
    DELETE FROM ip_geolocation_lookup;
    WHILE i < 256 DO
        INSERT INTO ip_geolocation_lookup (first_octet, ip_numeric_start, ip_numeric_end)
            SELECT i, ip_numeric_start, ip_numeric_end FROM ip_geolocation WHERE
                ( ip_numeric_start & 0xFF000000 ) >> 24 <= i AND
                ( ip_numeric_end & 0xFF000000 ) >> 24 >= i;
        SET i = i + 1;
    END WHILE;
END;;
DELIMITER ;
And then you will need to populate the table by calling that stored procedure:
CALL recalculate_ip_geolocation_lookup();
At this point you may delete the procedure you just created -- it is no longer needed, unless you want to recalculate the look-up table.
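For example:
DROP PROCEDURE recalculate_ip_geolocation_lookup;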
After the look-up table is in place, all you have to do is integrate it into your queries and make sure you're querying by the first octet. Your query to the look-up table will satisfy two conditions:
Find all rows which match the first octet of your IP address
Of that subset: Find the row whose range matches your IP address
Because step two is carried out on a subset of data, it is considerably faster than running the range tests against the entire data set. This is the key to this optimization strategy.
There are various ways for figuring out what the first octet of an IP address is; I used ( r.ip_numeric & 0xFF000000 ) >> 24 since my source IPs are in numeric form:
SELECT
r.*,
g.country_code
FROM
ip_geolocation g,
ip_geolocation_lookup l,
ip_random r
WHERE
l.first_octet = ( r.ip_numeric & 0xFF000000 ) >> 24 AND
l.ip_numeric_start <= r.ip_numeric AND
l.ip_numeric_end >= r.ip_numeric AND
g.ip_numeric_start = l.ip_numeric_start;
Now, admittedly, I got a little lazy in the end: you could easily get rid of the ip_geolocation table altogether if you made the ip_geolocation_lookup table also contain the country data. I'm guessing dropping one table from this query would make it a bit faster.
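For instance, a hypothetical merged variant (the lookup table carries country_code directly, so the extra join disappears; table and column names are assumptions):
CREATE TABLE `ip_geolocation_lookup2` (
  `first_octet` int(10) unsigned NOT NULL DEFAULT '0',
  `ip_numeric_start` int(10) unsigned NOT NULL DEFAULT '0',
  `ip_numeric_end` int(10) unsigned NOT NULL DEFAULT '0',
  `country_code` varchar(3) NOT NULL DEFAULT '',
  KEY `first_octet` (`first_octet`,`ip_numeric_start`,`ip_numeric_end`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

SELECT r.*, l.country_code
FROM ip_random r, ip_geolocation_lookup2 l
WHERE l.first_octet = ( r.ip_numeric & 0xFF000000 ) >> 24
  AND l.ip_numeric_start <= r.ip_numeric
  AND l.ip_numeric_end >= r.ip_numeric;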
And, finally, here are the two other tables I used in this response for reference, since they differ from your tables. I'm certain they are compatible, though.
# This table contains the original geolocation data
CREATE TABLE `ip_geolocation` (
`ip_start` varchar(16) NOT NULL DEFAULT '',
`ip_end` varchar(16) NOT NULL DEFAULT '',
`ip_numeric_start` int(10) unsigned NOT NULL DEFAULT '0',
`ip_numeric_end` int(10) unsigned NOT NULL DEFAULT '0',
`country_code` varchar(3) NOT NULL DEFAULT '',
`country_name` varchar(64) NOT NULL DEFAULT '',
PRIMARY KEY (`ip_numeric_start`),
KEY `country_code` (`country_code`),
KEY `ip_start` (`ip_start`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
# This table simply holds random IP data that can be used for testing
CREATE TABLE `ip_random` (
`ip` varchar(16) NOT NULL DEFAULT '',
`ip_numeric` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`ip`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Just wanted to give back to the community:
Here's an even better and optimized way building on Aleksi's solution:
DROP PROCEDURE IF EXISTS recalculate_ip_geolocation_lookup;
DELIMITER ;;
CREATE PROCEDURE recalculate_ip_geolocation_lookup()
BEGIN
DECLARE i INT DEFAULT 0;
DROP TABLE `ip_geolocation_lookup`;
CREATE TABLE `ip_geolocation_lookup` (
`first_octet` smallint(5) unsigned NOT NULL DEFAULT '0',
`startIpNum` int(10) unsigned NOT NULL DEFAULT '0',
`endIpNum` int(10) unsigned NOT NULL DEFAULT '0',
`locId` int(11) NOT NULL,
PRIMARY KEY (`first_octet`,`startIpNum`,`endIpNum`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT IGNORE INTO ip_geolocation_lookup
SELECT startIpNum DIV 1048576 as first_octet, startIpNum, endIpNum, locId
FROM ip_geolocation;
INSERT IGNORE INTO ip_geolocation_lookup
SELECT endIpNum DIV 1048576 as first_octet, startIpNum, endIpNum, locId
FROM ip_geolocation;
WHILE i < 1048576 DO
INSERT IGNORE INTO ip_geolocation_lookup
SELECT i, startIpNum, endIpNum, locId
FROM ip_geolocation_lookup
WHERE first_octet = i-1
AND endIpNum DIV 1048576 > i;
SET i = i + 1;
END WHILE;
END;;
DELIMITER ;
CALL recalculate_ip_geolocation_lookup();
It builds much faster than his solution and drills down more easily, because we're not just bucketing on the first 8 bits but on the first 12 (DIV 1048576 keeps the top 12 bits of a 32-bit address). Join performance: 100000 rows in 158 ms. You might have to rename the table and field names to match your version.
Query by using:
SELECT ip, kl.*
FROM random_ips ki
JOIN `ip_geolocation_lookup` kb ON (ki.`ip` DIV 1048576 = kb.`first_octet` AND ki.`ip` >= kb.`startIpNum` AND ki.`ip` <= kb.`endIpNum`)
JOIN ip_maxmind_locations kl ON kb.`locId` = kl.`locId`;
I can't comment yet, but user1281376's answer is wrong and doesn't work. The reason you only use the first octet is that you aren't going to match all IP ranges otherwise: there are plenty of ranges that span multiple second octets, which user1281376's changed query isn't going to match. And yes, this actually happens if you use the Maxmind GeoIP data.
With Aleksi's suggestion you can do a simple comparison on the first octet, thus reducing the matching set.
I found an easy way. I noticed that the first IP of every group satisfies ip % 256 = 0,
so we can add an ip_index table:
CREATE TABLE `t_map_geo_range` (
`_ip` int(10) unsigned NOT NULL,
`_ipStart` int(10) unsigned NOT NULL,
PRIMARY KEY (`_ip`)
) ENGINE=MyISAM
How to fill the index table:
FOR_EACH(row of ip_geo)
{
    FOR(ip FROM ipGroupStart DIV 256 TO ipGroupEnd DIV 256)
    {
        INSERT INTO t_map_geo_range(_ip, _ipStart) VALUES (ip, ipGroupStart);
    }
}
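A hypothetical set-based way to do the same fill in pure SQL, assuming a helper table seq(n) holding the integers 0 through 16777215 (2^24 - 1, every possible value of ip DIV 256) and the geo_ip columns from the question:
INSERT INTO t_map_geo_range (_ip, _ipStart)
SELECT s.n, g.ip_num_start
FROM geo_ip AS g
JOIN seq AS s ON s.n BETWEEN g.ip_num_start DIV 256 AND g.ip_num_end DIV 256;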
How to use:
SELECT * FROM YOUR_TABLE AS A
LEFT JOIN t_map_geo_range AS B ON B._ip = A._ip DIV 256
LEFT JOIN ip_geo AS C ON C.ipStart = B._ipStart;
More than 1000 times faster.
I have a table to which I added a column called phone; the table also has an id set as a primary key that auto-increments. How can I insert a random value into the phone column that won't be duplicated? The following UPDATE statement did insert random values, but not all of them were unique. Also, I'm not sure I cast the phone field correctly, and I ran into issues when trying to set it as an int(11) with the ALTER TABLE command (mainly, it ran correctly, but when adding a row with a new phone number, the inserted value was translated into a different number).
UPDATE Ballot SET phone = FLOOR(50000000 * RAND()) + 1;
Table specs:
+------------+--------------+------+-----+---------+----------------+
| Field      | Type         | Null | Key | Default | Extra          |
+------------+--------------+------+-----+---------+----------------+
| id         | int(11)      | NO   | PRI | NULL    | auto_increment |
| phone      | varchar(11)  | NO   |     | NULL    |                |
| age        | tinyint(3)   | NO   |     | NULL    |                |
| test       | tinyint(4)   | NO   |     | 0       |                |
| note       | varchar(100) | YES  |     | NULL    |                |
+------------+--------------+------+-----+---------+----------------+
-- tbl_name: Table
-- column_name: Column
-- chars_str: String containing acceptable characters
-- n: Length of the random string
-- dummy_tbl: Not a parameter, leave as is!
UPDATE tbl_name SET column_name = (
SELECT GROUP_CONCAT(SUBSTRING(chars_str , 1+ FLOOR(RAND()*LENGTH(chars_str)) ,1) SEPARATOR '')
FROM (SELECT 1 /* UNION SELECT 2 ... UNION SELECT n */) AS dummy_tbl
);
-- Example
UPDATE tickets SET code = (
SELECT GROUP_CONCAT(SUBSTRING('123abcABC-_$#' , 1+ FLOOR(RAND()*LENGTH('123abcABC-_$#')) ,1) SEPARATOR '')
FROM (SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) AS dummy_tbl
);
Try this
UPDATE Ballot SET phone = FLOOR(50000000 * RAND()) * id;
I'd tackle this by generating a (temporary) table containing the numbers in the range you need, then looping through each record in the table you wish to supply with random numbers: pick a random element from the temp table, update the record with it, and remove it from the temp table. Not beautiful, nor fast, but easy to develop and easy to test.
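On MySQL 8.0+, a hedged set-based sketch of the same goal: pair a random ordering of the rows with sequential numbers, which guarantees uniqueness (Ballot table from the question; the 50000000 offset is arbitrary):
UPDATE Ballot b
JOIN ( SELECT id, ROW_NUMBER() OVER (ORDER BY RAND()) AS rn
       FROM Ballot ) x ON x.id = b.id
SET b.phone = 50000000 + x.rn;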