I have this table:
mysql> desc Customers;
+------------+------------------+------+-----+---------+-------+
| Field      | Type             | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| CustomerID | int(10) unsigned | NO   | PRI | NULL    |       |
| Name       | char(50)         | NO   |     | NULL    |       |
| Address    | char(100)        | NO   |     | NULL    |       |
| City       | char(30)         | NO   |     | NULL    |       |
+------------+------------------+------+-----+---------+-------+
Now, if I want to insert sample data:
mysql> insert into Customers values(null, 'Julia Smith', '25 Oak Street', 'Airport West');
ERROR 1048 (23000): Column 'CustomerID' cannot be null
I know I cannot make the ID null, but it should be MySQL's job to assign the numbers and increment them. So I try simply not specifying the id:
mysql> insert into Customers (Name, Address, City) values('Julia Smith', '25 Oak Street', 'Airport West');
Field 'CustomerID' doesn't have a default value
Now I am stuck in a trap. I cannot make the id null (which is supposed to tell MySQL "increment my ID"), and I cannot omit it, because there is no default value. So how do I make MySQL handle the ids for me on new insertions?
Primary key means that every CustomerID has to be unique, and you also defined it as NOT NULL, so an INSERT of NULL is not permitted.
Instead of
| CustomerID | int(10) unsigned | NO | PRI | NULL |
make it
CustomerID BIGINT AUTO_INCREMENT PRIMARY KEY
and you can enter your data without a problem:
ALTER TABLE Customers MODIFY CustomerID BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;
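Once the column is auto-increment, both of the original insert styles should work; a minimal sketch, assuming the table is otherwise unchanged:
-- MySQL assigns the id when the column is omitted...
INSERT INTO Customers (Name, Address, City)
VALUES ('Julia Smith', '25 Oak Street', 'Airport West');
-- ...and also when NULL is passed for an AUTO_INCREMENT column
INSERT INTO Customers VALUES (NULL, 'Julia Smith', '25 Oak Street', 'Airport West');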
@Milan,
Delete the CustomerID field from the table, and add it again with the following details:
Field: CustomerID,
Type: BIGINT(10),
Default: None,
Auto_increment: tick in the checkbox
Click the SAVE button to save this new field in the table. Hopefully it will now work when inserting a new record. Thanks.
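For reference, a rough plain-SQL equivalent of those steps might be the following (a sketch; BIGINT UNSIGNED is an assumption here, and dropping the column discards the existing CustomerID values):
-- drop the old column, then re-add it as an auto-increment primary key
ALTER TABLE Customers DROP COLUMN CustomerID;
ALTER TABLE Customers
  ADD COLUMN CustomerID BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;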
I'm working with a MariaDB (MySQL) table which mainly contains information about cities around the whole world: their latitude/longitude and the country code (2 characters) of the country each city is in. The table is quite big, over 2.5 million rows.
show columns from Cities;
+---------+--------------+------+-----+---------+----------------+
| Field   | Type         | Null | Key | Default | Extra          |
+---------+--------------+------+-----+---------+----------------+
| id      | int(11)      | NO   | PRI | NULL    | auto_increment |
| city    | varchar(255) | YES  |     | NULL    |                |
| lat     | float        | NO   |     | NULL    |                |
| lon     | float        | NO   |     | NULL    |                |
| country | varchar(255) | YES  |     | NULL    |                |
+---------+--------------+------+-----+---------+----------------+
I want to implement a city searcher, so I have to optimize the SELECTs, not the INSERTs or UPDATEs (it will always be the same information).
I thought that I should:
create an index (by city? by city and country?)
create partitions (by country?)
Should I do both tasks? If so, how could I do them? Could anyone give me some advice? I'm a little bit lost.
PS. I tried this to create an index by city and country (I don't know if I'm doing it right...):
CREATE INDEX idx_cities ON Cities(city (30), country (2));
Do not use "prefix indexing". Simply use INDEX(city, country). This will work very well for either of these:
WHERE city = 'London' -- 26 results, half in the US
WHERE city = 'London' AND country = 'CA' -- one result
Do not use Partitions. The table is too small, and there is no performance benefit.
Since there are only 2.5M rows, use id MEDIUMINT UNSIGNED to save 2.5MB.
What other queries will you have? If you need to "find the 10 nearest cities to a given lat/lng", then see this.
Your table, including index(es), might be only 300MB.
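If it helps, a sketch of those two suggestions (the index name idx_city_country is arbitrary):
-- plain composite index, no prefix lengths
ALTER TABLE Cities ADD INDEX idx_city_country (city, country);
-- optional: 2.5M rows fit comfortably in MEDIUMINT UNSIGNED (max ~16.7M)
ALTER TABLE Cities MODIFY id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT;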
Perhaps I've been staring at the screen too long but I have the following [legacy] table I'm messing with:
describe t3_test;
+--------------------+------------------+------+-----+---------+----------------+
| Field              | Type             | Null | Key | Default | Extra          |
+--------------------+------------------+------+-----+---------+----------------+
| provnum            | varchar(24)      | YES  | MUL | NULL    |                |
| trgt_mo            | datetime         | YES  |     | NULL    |                |
| mcare              | varchar(2)       | YES  |     | NULL    |                |
| bed2prsn_asst      | varchar(2)       | YES  |     | NULL    |                |
| trnsfr2prsn_asst   | varchar(2)       | YES  |     | NULL    |                |
| tlt2prsn_asst      | varchar(2)       | YES  |     | NULL    |                |
| hygn2prsn_asst     | varchar(2)       | YES  |     | NULL    |                |
| bath2psrn_asst     | varchar(2)       | YES  |     | NULL    |                |
| ampmcare2prsn_asst | varchar(2)       | YES  |     | NULL    |                |
| any2prsn_asst      | varchar(2)       | YES  |     | NULL    |                |
| n                  | float            | YES  |     | NULL    |                |
| pct                | float            | YES  |     | NULL    |                |
| trgt_qtr           | varchar(12)      | YES  |     | NULL    |                |
| recno              | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| enddate            | date             | YES  |     | NULL    |                |
+--------------------+------------------+------+-----+---------+----------------+
15 rows in set (0.00 sec)
I have data that looks like this:
"555223","2008-10-01 00:00:00",NULL,"1",NULL,NULL,NULL,NULL,NULL,NULL,"40","93.0233","2008Q4","5767343","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,"1",NULL,NULL,NULL,NULL,NULL,NULL,"40","93.0233","2008Q4","4075309","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,"0",NULL,NULL,NULL,NULL,NULL,NULL,"3","6.97674","2008Q4","4075308","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,"0",NULL,NULL,NULL,NULL,NULL,NULL,"3","6.97674","2008Q4","5767342","2008-12-31"
"555223","2008-10-01 00:00:00","N",NULL,"1",NULL,NULL,NULL,NULL,NULL,"36","83.7209","2008Q4","4075327","2008-12-31"
"555223","2008-10-01 00:00:00","N","1",NULL,NULL,NULL,NULL,NULL,NULL,"36","83.7209","2008Q4","4075323","2008-12-31"
"555223","2008-10-01 00:00:00","Y","1",NULL,NULL,NULL,NULL,NULL,NULL,"4","9.30233","2008Q4","4075325","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,NULL,"0",NULL,NULL,NULL,NULL,NULL,"3","6.97674","2008Q4","4075310","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,NULL,"1",NULL,NULL,NULL,NULL,NULL,"40","93.0233","2008Q4","4075311","2008-12-31"
The first two lines of the table clearly appear to be dupes (minus the A.I. index "recno"). I've tried a half dozen dupe-removal routines and they are not automatically removed.
At this point I am not sure what exactly is wrong? Is it possible there's an invisible character somewhere? Is it possible a letter is in a different character encoding? When I dump the data to CSV as is listed, it doesn't look any different.
Do you have a delete routine that would work on this file structure and remove anything that is a dupe (ignoring the recno field)? I have been staring at this for two days and for some reason it escapes me. (BTW, I am aware of the column name anomaly for bath2psrn_asst; that's not it.)
This (original) table has over 13 million records in it and is over 3GB in size, so I'm looking for the most efficient way to kill dupes. Any ideas?
Here's an example of one of the dupe-killing techniques I used that did not work:
DELETE a FROM t3_test as a, t3_test as b WHERE
(a.provnum=b.provnum)
AND (a.trgt_mo=b.trgt_mo OR a.trgt_mo IS NULL AND b.trgt_mo IS NULL)
AND (a.mcare=b.mcare OR a.mcare IS NULL AND b.mcare IS NULL)
AND (a.bed2prsn_asst=b.bed2prsn_asst OR a.bed2prsn_asst IS NULL AND b.bed2prsn_asst IS NULL)
AND (a.trnsfr2prsn_asst=b.trnsfr2prsn_asst OR a.trnsfr2prsn_asst IS NULL AND b.trnsfr2prsn_asst IS NULL)
AND (a.tlt2prsn_asst=b.tlt2prsn_asst OR a.tlt2prsn_asst IS NULL AND b.tlt2prsn_asst IS NULL)
AND (a.hygn2prsn_asst=b.hygn2prsn_asst OR a.hygn2prsn_asst IS NULL AND b.hygn2prsn_asst IS NULL)
AND (a.bath2psrn_asst=b.bath2psrn_asst OR a.bath2psrn_asst IS NULL AND b.bath2psrn_asst IS NULL)
AND (a.ampmcare2prsn_asst=b.ampmcare2prsn_asst OR a.ampmcare2prsn_asst IS NULL AND b.ampmcare2prsn_asst IS NULL)
AND (a.any2prsn_asst=b.any2prsn_asst OR a.any2prsn_asst IS NULL AND b.any2prsn_asst IS NULL)
AND (a.n=b.n OR a.n IS NULL AND b.n IS NULL)
AND (a.pct=b.pct OR a.pct IS NULL AND b.pct IS NULL)
AND (a.trgt_qtr=b.trgt_qtr OR a.trgt_qtr IS NULL AND b.trgt_qtr IS NULL)
AND (a.enddate=b.enddate OR a.enddate IS NULL AND b.enddate IS NULL)
AND (a.recno>b.recno);
For such a large table, delete can be quite inefficient -- all the logging needed for the deletes is very cumbersome.
I might recommend that you try the truncate/insert approach:
create table temp_t3_test as
select provnum, trgt_mo, . . .,
       min(recno) as recno,
       enddate
from t3_test
group by provnum, trgt_mo, . . ., enddate;
truncate table t3_test;
insert into t3_test(provnum, trgt_mo, . . . , recno, enddate)
    select *
    from temp_t3_test;
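Spelled out against the full column list from the DESCRIBE output above, the same approach might look like this (a sketch; take a backup before truncating the table):
create table temp_t3_test as
select provnum, trgt_mo, mcare, bed2prsn_asst, trnsfr2prsn_asst,
       tlt2prsn_asst, hygn2prsn_asst, bath2psrn_asst, ampmcare2prsn_asst,
       any2prsn_asst, n, pct, trgt_qtr,
       min(recno) as recno,
       enddate
from t3_test
group by provnum, trgt_mo, mcare, bed2prsn_asst, trnsfr2prsn_asst,
         tlt2prsn_asst, hygn2prsn_asst, bath2psrn_asst, ampmcare2prsn_asst,
         any2prsn_asst, n, pct, trgt_qtr, enddate;
truncate table t3_test;
insert into t3_test (provnum, trgt_mo, mcare, bed2prsn_asst, trnsfr2prsn_asst,
                     tlt2prsn_asst, hygn2prsn_asst, bath2psrn_asst, ampmcare2prsn_asst,
                     any2prsn_asst, n, pct, trgt_qtr, recno, enddate)
select provnum, trgt_mo, mcare, bed2prsn_asst, trnsfr2prsn_asst,
       tlt2prsn_asst, hygn2prsn_asst, bath2psrn_asst, ampmcare2prsn_asst,
       any2prsn_asst, n, pct, trgt_qtr, recno, enddate
from temp_t3_test;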
Try:
CREATE TABLE t3_new AS
(
SELECT provnum,
trgt_mo,
mcare,
bed2prsn_asst,
trnsfr2prsn_asst,
tlt2prsn_asst,
hygn2prsn_asst,
bath2psrn_asst,
ampmcare2prsn_asst,
any2prsn_asst,
n,
pct,
trgt_qtr,
Min(recno) AS recno,
enddate
FROM t3_test
GROUP BY provnum,
trgt_mo,
mcare,
bed2prsn_asst,
trnsfr2prsn_asst,
tlt2prsn_asst,
hygn2prsn_asst,
bath2psrn_asst,
ampmcare2prsn_asst,
any2prsn_asst,
n,
pct,
trgt_qtr,
enddate
)
When you use min(recno), you don't actually select just one row; you select the minimum of all the recno values and use the same value for all the rows. To remove fewer rows, you can use DISTINCT, or GROUP BY as I have used here. I would say that you can leave recno out of the new table and use a new auto-increment column in the table that you create again, to avoid gaps in the ids.
This is to be used with the method suggested by Gordon Linoff.
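A sketch of that last suggestion, assuming the recno column was left out of the rebuilt table entirely:
-- add a fresh, gap-free auto-increment id to the rebuilt table
ALTER TABLE t3_new
  ADD COLUMN recno INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;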
In this scenario, the problem was not with the SQL statement. It was a problem with the DATA, but it was not visible.
The two fields of type "float" held hidden decimal values that were slightly different from each other. Converting those fields to a DECIMAL(a,b) type made the dupes show up, and they were then properly deleted by conventional means.
Special thanks to Gordon Linoff for suggesting looking into this.
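For example, the conversion might look something like this (the precision and scale here are assumptions; pick values wide enough for the existing data):
-- round away the hidden float noise by switching n and pct to fixed-point
ALTER TABLE t3_test
  MODIFY n   DECIMAL(10,4),
  MODIFY pct DECIMAL(10,4);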
We have an analytics product. We give each of our customers a JavaScript snippet that they put on their web sites. When a user visits a customer's site, the JavaScript code hits our server so that we can store the page visit on behalf of that customer. Each customer has a unique domain name, which means a customer is identified by the domain name.
Database server : MySql 5.6
Table rows : 400 million
Following is our table schema.
+---------------+------------------+------+-----+---------+----------------+
| Field         | Type             | Null | Key | Default | Extra          |
+---------------+------------------+------+-----+---------+----------------+
| id            | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| domain        | varchar(50)      | NO   | MUL | NULL    |                |
| guid          | binary(16)       | YES  |     | NULL    |                |
| sid           | binary(16)       | YES  |     | NULL    |                |
| url           | varchar(2500)    | YES  |     | NULL    |                |
| ip            | varbinary(16)    | YES  |     | NULL    |                |
| is_new        | tinyint(1)       | YES  |     | NULL    |                |
| ref           | varchar(2500)    | YES  |     | NULL    |                |
| user_agent    | varchar(255)     | YES  |     | NULL    |                |
| stats_time    | datetime         | YES  |     | NULL    |                |
| country       | char(2)          | YES  |     | NULL    |                |
| region        | char(3)          | YES  |     | NULL    |                |
| city          | varchar(80)      | YES  |     | NULL    |                |
| city_lat_long | varchar(50)      | YES  |     | NULL    |                |
| email         | varchar(100)     | YES  |     | NULL    |                |
+---------------+------------------+------+-----+---------+----------------+
In the above table, guid represents a visitor of our customer's site and sid represents a visitor session on that site. That means for every sid there should be an associated guid.
We need queries like the following:
Query 1: Find unique and total visitors
SELECT count(DISTINCT guid) AS count,count(guid) AS total FROM page_views WHERE domain = 'abc' AND stats_time BETWEEN '2015-10-05 00:00:00' AND '2015-10-04 23:59:59'
composite index planning : domain,stats_time,sid
Query 2: Find unique and total sessions
SELECT count(DISTINCT sid) AS count,count(sid) AS total FROM page_views WHERE domain = 'abc' AND stats_time BETWEEN '2015-10-05 00:00:00' AND '2015-10-04 23:59:59'
composite index planning : domain,stats_time,guid
Query 3: Find visitors and sessions by country, by region, by city
composite index planning : domain,country
composite index planning : domain,region
Each combination requires a new composite index. That means a huge index file which we can't keep in memory, so the performance of the queries is low.
Is there any way to optimize these index combinations, to reduce the index size and improve performance?
Just for grins, run this to see what type of spread you have...
select
country, region, city,
DATE_FORMAT(colName, '%Y-%m-%d') DATEONLY, count(*)
from
yourTable
group by
country, region, city,
DATE_FORMAT(colName, '%Y-%m-%d')
order by
count(*) desc
and then see how many rows it returns. Also, what sort of range does the COUNT column cover? Instead of just an index, does it make sense to create a separate aggregation table on the key elements you are trying to mine?
If so, I would recommend looking at a similar post here on Stack Overflow. It shows a SAMPLE of how, but I would first look at the counts before suggesting anything further. If you have it broken down on a daily basis, what MIGHT this be reduced to?
Additionally, you might want to create pre-aggregate tables ONCE to get started, then have a nightly procedure that builds any new records based on a day just completed. This way it is never running through all 400M records.
If your pre-aggregate tables store data based on just the date (y, m, d only), queries rolled up per day would shorten the querying requirements. The COUNT(*) is just an example basis; you could add count(distinct whateverColumn) as needed. Then you could query SUM(aggregateColumn) based on domain, date range, etc. If your 400M records get reduced down to 7M records, I would also have at minimum an index on (domain, dateOnlyField, and maybe country) to optimize your domain / date-range queries. Once you get something narrowed down at whatever level makes sense, you can always drill into the raw data for the granular level.
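A sketch of what such a pre-aggregate table and the nightly rollup might look like (the names page_views_daily, stats_date, visitors, sessions and page_views are illustrative, not from the original schema):
-- one row per domain / day / location; counts are additive so they can be summed later
CREATE TABLE page_views_daily (
  domain     varchar(50)  NOT NULL,
  stats_date date         NOT NULL,
  country    char(2)      DEFAULT NULL,
  region     char(3)      DEFAULT NULL,
  city       varchar(80)  DEFAULT NULL,
  visitors   int unsigned NOT NULL,   -- count(distinct guid) for that day
  sessions   int unsigned NOT NULL,   -- count(distinct sid) for that day
  page_views int unsigned NOT NULL,   -- count(*) for that day
  KEY (domain, stats_date, country)
);
-- nightly job: roll up only the day that just completed
INSERT INTO page_views_daily
SELECT domain, DATE(stats_time), country, region, city,
       COUNT(DISTINCT guid), COUNT(DISTINCT sid), COUNT(*)
FROM page_views
WHERE stats_time >= CURDATE() - INTERVAL 1 DAY
  AND stats_time <  CURDATE()
GROUP BY domain, DATE(stats_time), country, region, city;
One caveat: per-day distinct counts cannot simply be summed to get unique visitors over a longer range, so keep the raw table and drill into it for those cases, as suggested above.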
So, here's basically the problem:
For starters, I am not asking anyone to do my homework, just to give me a nudge in the right direction.
I have 2 tables containing names and contact data, for practice.
Let's call these tables people and contact.
Create Table for people:
CREATE TABLE `people` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`fname` tinytext,
`mname` tinytext,
`lname` tinytext,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Create Table for contact:
CREATE TABLE `contact` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`person_id` int(10) unsigned NOT NULL DEFAULT '0',
`tel_home` tinytext,
`tel_work` tinytext,
`tel_mob` tinytext,
`email` text,
PRIMARY KEY (`id`,`person_id`),
KEY `fk_contact` (`person_id`),
CONSTRAINT `fk_contact` FOREIGN KEY (`person_id`) REFERENCES `people` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
When getting the contact information for each person, the query I use is as follows:
SELECT p.id, CONCAT_WS(' ',p.fname,p.mname,p.lname) name, c.tel_home, c.tel_work, c.tel_mob, c.email
FROM people p JOIN contact c ON c.person_id = p.id;
This creates a response like:
+----+----------+---------------------+----------+---------+---------------------+
| id | name     | tel_home            | tel_work | tel_mob | email               |
+----+----------+---------------------+----------+---------+---------------------+
|  1 | Jane Doe | 1500 (xxx-xxx 1500) | NULL     | NULL    | janedoe@example.com |
|  2 | John Doe | 1502 (xxx-xxx 1502) | NULL     | NULL    | NULL                |
|  2 | John Doe | NULL                | NULL     | NULL    | johndoe@example.com |
+----+----------+---------------------+----------+---------+---------------------+
The problem with this view is that rows 1 and 2 (counting from 0) could have been grouped into a single row.
Even though this "non-pretty" result is due to corrupt data, it is likely that this will occur in a multi-node database environment.
The targeted result would be something like
+----+----------+---------------------+----------+---------+---------------------+
| id | name     | tel_home            | tel_work | tel_mob | email               |
+----+----------+---------------------+----------+---------+---------------------+
|  1 | Jane Doe | 1500 (xxx-xxx 1500) | NULL     | NULL    | janedoe@example.com |
|  2 | John Doe | 1502 (xxx-xxx 1502) | NULL     | NULL    | johndoe@example.com |
+----+----------+---------------------+----------+---------+---------------------+
where the rows with the same id and name are grouped while still showing the effective data.
Side notes:
innodb_version: 5.5.32
version: 5.5.32-0ubuntu-.12.04.1-log
version_compile_os: debian_linux-gnu
You could use GROUP_CONCAT(), which "returns a string result with the concatenated non-NULL values from a group":
SELECT p.id,
       GROUP_CONCAT(DISTINCT CONCAT_WS(' ',p.fname,p.mname,p.lname)) name,
       GROUP_CONCAT(c.tel_home) tel_home,
       GROUP_CONCAT(c.tel_work) tel_work,
       GROUP_CONCAT(c.tel_mob ) tel_mob,
       GROUP_CONCAT(c.email   ) email
FROM   people p
JOIN   contact c ON c.person_id = p.id
GROUP  BY p.id