Can I create a mapping from integer values in a column to the text values they represent in SQL? - mysql

I have a table full of traffic accident data with column headers such as 'Vehicle_Manoeuvre', which contains integers; for example, 13 means the manoeuvre that caused the accident was 'overtaking moving vehicle'.
I know the mappings from integers to text, as I have a (quite large) Excel file with this data.
For example, I want to know what percentage of accidents involved this type of manoeuvre, but I don't want to have to open the Excel file and look up the integer-to-text mappings every time I write a query.
I could manually replace the integers in all the columns (write a query with all the possible mappings of each column, add them as new columns, then delete the original columns), but this would take a long time.
Is it possible to create some type of variable (like an array with the integers in the first column and the mapped text in the second) that SQL could use to understand how the text relates to the integers, allowing me to write the query below:
SELECT COUNT(Vehicle_Manoeuvre) FROM traffictable WHERE Vehicle_Manoeuvre='overtaking moving vehicle';
rather than:
SELECT COUNT(Vehicle_Manoeuvre) FROM traffictable WHERE Vehicle_Manoeuvre=13;
even though the data in the table is still in integer form?

You would do this with a Manoeuvres reference table:
create table Manoeuvres (
    ManoeuvreId int primary key,
    Name varchar(255) unique
);
insert into Manoeuvres(ManoeuvreId, Name)
values (13, 'Overtaking');
You might even have such a table already, if you know that 13 has a special meaning.
Then use a join:
SELECT COUNT(*)
FROM traffictable tt
JOIN Manoeuvres m ON tt.Vehicle_Manoeuvre = m.ManoeuvreId
WHERE m.Name = 'Overtaking';
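From there, the percentage the question asks about falls out of the same join. A minimal sketch, reusing the table and column names above:
SELECT 100.0 * SUM(m.Name = 'Overtaking') / COUNT(*) AS pct_overtaking
FROM traffictable tt
JOIN Manoeuvres m ON tt.Vehicle_Manoeuvre = m.ManoeuvreId;
-- SUM(condition) counts matching rows, since the comparison yields 1 or 0 in MySQL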

Related

MySQL performance issue on ~3 million rows containing MEDIUMTEXT?

I had a table with 3 columns and 3600K rows, using MySQL as a key-value store.
The first column, id, was VARCHAR(8) and set as the primary key. The 2nd and 3rd columns were MEDIUMTEXT. When calling SELECT * FROM table WHERE id=00000, MySQL took anywhere from 54 seconds to 3 minutes.
For testing, I created a table with columns VARCHAR(8)-VARCHAR(5)-VARCHAR(5), with data randomly generated from numpy.random.randint. The SELECT takes 3 sec without a primary key. With the same random data in VARCHAR(8)-MEDIUMTEXT-MEDIUMTEXT, the SELECT took 15 sec without a primary key. (Note: in the second test, the 2nd and 3rd columns actually contained very short text like '65535', but were created as MEDIUMTEXT.)
My question is: how can I achieve similar performance on my real data? (or, is it impossible?)
If you use
SELECT * FROM `table` WHERE id=00000
instead of
SELECT * FROM `table` WHERE id='00000'
you are looking for all strings that are equal to the integer 0, so MySQL will have to check all rows, because '0', '0000' and even ' 0' will all be cast to the integer 0. So your primary key on id will not help, and you will end up with a slow full table scan. Even if you don't store values that way, MySQL doesn't know that.
The best option is, as all comments and answers pointed out, to change the datatype to int:
alter table `table` modify id int;
This will only work if your ids, cast to integers, are unique (so you don't have e.g. '0' and '00' in your table).
If you have any foreign keys that reference id, you have to drop them first and, before recreating them, change the datatype of the referencing columns too.
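A minimal sketch of that dance (the child table, its column, and the constraint name here are hypothetical):
-- hypothetical child table `child` with column table_id referencing `table`.id
ALTER TABLE child DROP FOREIGN KEY fk_child_table;
ALTER TABLE child MODIFY table_id int;
ALTER TABLE `table` MODIFY id int;
ALTER TABLE child ADD CONSTRAINT fk_child_table
    FOREIGN KEY (table_id) REFERENCES `table` (id);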
If you store your values in a known format (e.g. no leading zeros, or padded with 0s up to a length of 8), the second best option is to use that exact format in your query, and include the quotes so the value is not cast to an integer. If you e.g. always pad to 8 digits, use
SELECT * FROM `table` WHERE id='00000000';
If you never add any leading zeros, still add the quotes:
SELECT * FROM `table` WHERE id='0';
With both options, MySQL can use your primary key and you will get your result in milliseconds.
If your id column contains only numbers, define it as int; an int key will give you better performance (comparisons and index lookups are faster).
Make the key column in your table an integer and retry. Check performance first by running a test within your DB (Workbench or the plain command line). You should get a better result.
Then, and only if needed (I doubt it, though), modify your Python to convert from integer to string (and/or vice versa) when referencing the key column.

update sql table current row

Complete noob alert! I need to store a largish set of data fields (480) for each of many devices I am measuring. Each field is a DECIMAL(8,5). First, is this an unreasonably large table? I have no experience really, so if it is unmanageable, I might start thinking about an alternative storage method.
Right now, I am creating a new row using INSERT, then trying to put the 480 data values into the new row using UPDATE (in a loop). Currently each UPDATE overwrites the entire column. How do I specify that only the last row should be modified? For example, with a table ("magnitude") having columns "id", "field1", "field2", ...:
UPDATE magnitude SET field1 = 3.14; this modifies the entire "field1" column.
I was trying to do something like:
UPDATE magnitude SET field1 = 3.14 WHERE id = MAX(id)
Obviously I am a complete noob. Just trying to get this one thing working and move on... Did look around a lot but can't find a solution. Any help appreciated.
Instead of inserting a row and then updating it with values, you should insert an entire row, with populated values, at once, using the insert command.
I.e.
insert into tTable (column1, column2, ..., columnN) values (datum1, datum2, ..., datumN)
Your table's definition should have the ID column set as AUTO_INCREMENT (an identity column), which means it will be filled in automatically when you insert, i.e. you don't need to specify it.
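A minimal sketch of what that looks like in MySQL, using the table from the question (only three of the 480 fields shown):
CREATE TABLE magnitude (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    field1 DECIMAL(8,5),
    field2 DECIMAL(8,5),
    field3 DECIMAL(8,5)
);
-- one INSERT populates the whole row; id fills itself in
INSERT INTO magnitude (field1, field2, field3) VALUES (3.14159, 2.71828, 1.41421);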
Re: appropriateness of the schema, I think 480 is a large number of columns. However, this is a straightforward enough example that you could try it and determine empirically if your system is able to give you the performance you need.
If I were doing this myself, I would go for a different solution that has many rows instead of many columns:
Create a table tDevice (ID int, Name nvarchar)
Create a table tData (ID int, Device_ID int, Value decimal(8,5))
-- With a foreign key on Device_ID back to tDevice.ID
Then, to populate:
Insert all your devices in tDevice
Insert one row into tData for every Device / Data combination
-- i.e. 480 x n rows, n being the number of devices
Then, you can query the data you want like so:
select * from tData join tDevice on tDevice.ID = tData.Device_ID
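To make the population step concrete, a small sketch (the values are made up):
INSERT INTO tDevice (ID, Name) VALUES (1, 'device-A');
-- one row per device/field pair
INSERT INTO tData (ID, Device_ID, Value) VALUES (1, 1, 3.14159), (2, 1, 2.71828);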

LPAD with leading zero

I have a table with invoice numbers. Guidelines say that the numbers should have 6 or more digits. First of all I tried:
UPDATE t1 SET NUMER=CONCAT('00000',NUMER) WHERE LENGTH(NUMER)=1;
UPDATE t1 SET NUMER=CONCAT('0000',NUMER) WHERE LENGTH(NUMER)=2;
UPDATE t1 SET NUMER=CONCAT('000',NUMER) WHERE LENGTH(NUMER)=3;
UPDATE t1 SET NUMER=CONCAT('00',NUMER) WHERE LENGTH(NUMER)=4;
UPDATE t1 SET NUMER=CONCAT('0',NUMER) WHERE LENGTH(NUMER)=5;
but that isn't efficient, or even pretty. I tried the LPAD function, but then a problem came up, because the statement:
UPDATE t1 SET NUMER=LPAD(NUMER,6,'0') WHERE CHAR_LENGTH(NUMER)<=6 ;
returns ZERO rows affected. I also googled; people say that putting the zero in quotes will solve the problem, but it didn't. Any help? It's a daily import.
EDIT:
Column NUMER is INT(19) and already contains data like:
NUMER
----------
1203
12303
123403
1234503
...
(it is currently filled with values of different lengths, from 3 to 7 digits)
I think you should consider that the guidelines you read apply to how an invoice should be displayed, and not how it should be stored in the database.
When a number is stored as an INT, it's a pure number. If you add zeros in front and store it again, it is still the same number.
You could select the NUMER field as follows, or create a view for that table:
SELECT LPAD(NUMER,6,'0') AS NUMER
FROM ...
Or, rather than changing the data when you select it from the database, consider padding the number with zeros when you display it, and only when you display it.
I think your requirement for historical data to stay the same is a moot point. Even for historical data, an invoice numbered 001203 is the same as an invoice numbered 1203.
However, if you absolutely must do it the way you describe, then converting to a VARCHAR field may work. Converted historical data can be stored as-is, and any new entries could be padded with zeros to the required length. But I do not recommend that.
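A minimal sketch of that conversion (keeping the column width from the question):
ALTER TABLE t1 MODIFY NUMER VARCHAR(19);
-- now LPAD sticks, since the column stores strings
UPDATE t1 SET NUMER = LPAD(NUMER, 6, '0') WHERE CHAR_LENGTH(NUMER) < 6;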
UPDATE t1 SET NUMER=LPAD(NUMER,6,'0') WHERE CHAR_LENGTH(NUMER)<=6; will not do what you expect, since the NUMER field is an int. It will create the string '001234' from the int 1234 and then cast it back to 1234 when storing it - that is why there is no change.
Change NUMER to type int(6) zerofill and MySQL will pad it for you each time you read it.
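For example, a one-line change (assuming the values fit in a regular INT):
ALTER TABLE t1 MODIFY NUMER INT(6) ZEROFILL;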
If you really want zeros stored in the database, you have to change the type to CHAR/VARCHAR, then your LPAD update statement will work.
The field in the table is an int column so it just stores a number. There's no way to pad out the data in the table. 1 == 001 == 000000000001. This is the same number.
You should do the padding at the application level (the system that pulls the data out of the table). What happens when the order number goes above 999999? You would then have to update all the data in the table to add an extra 0. This kind of thing should not be done at the database level.
You could also select the data out with an LPAD:
SELECT LPAD(NUMER,6,'0'), [other_columns] FROM t1;
Alternatively, as CBroe mentioned, you could change the datatype to INT(6) ZEROFILL so that it displays correctly, but this will have to be modified if the number goes above 999999, as mentioned above.

MYSQL find INT in string

I have a MEDIUMTEXT column that contains values coming from a GROUP_CONCAT, in the form INT,INT,INT. We can call it Concatenated_IDs.
The string can contain one int or more.
I need to break it down into the original values somehow, to be able to do something such as
SELECT
table_country.name
FROM
table_country
WHERE
table_country.country_id IN (
SELECT
Concatenated_IDs
FROM
table_targeted_countries
WHERE
table_targeted_countries.email LIKE "%gmail.com")
and get the names of the countries targeted by users who registered with a gmail address.
I have considered exploding the MEDIUMTEXT into INTs, creating one row for each int, sort of like a reverse GROUP_CONCAT, but I am guessing it would take a large procedure.
Edit: reformulated question title
You should probably normalize that table, so those concatenated IDs are stored in a separate table, one id per record. But in the meantime, you can use MySQL's FIND_IN_SET() function:
SELECT ...
WHERE FIND_IN_SET(table_country.country_id, Concatenated_IDs) > 0
Relevant docs: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_find-in-set
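Wired into the query from the question, that looks something like this (a sketch, reusing the table and column names above):
SELECT table_country.name
FROM table_country
JOIN table_targeted_countries
    ON FIND_IN_SET(table_country.country_id,
                   table_targeted_countries.Concatenated_IDs) > 0
WHERE table_targeted_countries.email LIKE '%gmail.com';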

Best solution for saving boolean values and saving cpu and memory on searches

What is the best way to store boolean values in a database if you want better query performance and minimal memory cost in your SELECT statements?
For example:
I have a table with 36 fields, 30 of which hold boolean values (zero or one), and I need to search for records in which certain boolean fields are true.
SELECT * FROM `myTable`
WHERE
`field_5th` = 1
AND `field_12th` = 1
AND `field_20` = 1
AND `field_8` = 1
Is there any solution?
If you want to store boolean values or flags there are basically three options:
Individual columns
This is reflected in your example above. The advantage is that you will be able to put indexes on the flags you intend to use most often for lookups. The disadvantage is that this will take up more space (since the minimum column size that can be allocated is 1 byte.)
However, if your column names are really going to be field_20, field_21, etc., then this is absolutely NOT the way to go. Numbered columns are a sign you should use either of the other two methods.
Bitmasks
As was suggested above you can store multiple values in a single integer column. A BIGINT column would give you up to 64 possible flags.
Values would be set with a bitwise OR, something like:
UPDATE table SET flags = flags | b'100';
UPDATE table SET flags = flags | b'10000';
Then the field would look something like: 10100
That would represent having two flag values set. To query for any particular flag value set, you would do
SELECT flags FROM table WHERE flags & b'100';
The advantage of this is that your flags are very compact space-wise. The disadvantage is that you can't place indexes on the field which would help improve the performance of searching for specific flags.
One-to-many relationship
This is where you create another table, and each row there would have the id of the row it's linked to, and the flag:
CREATE TABLE main (
    main_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
);
CREATE TABLE flag (
    main_id INT UNSIGNED NOT NULL,
    name VARCHAR(16),
    INDEX (main_id, name) -- supports the indexed lookups mentioned below
);
Then you would insert multiple rows into the flag table.
The advantage is that you can use indexes for lookups, and you can have any number of flags per row without changing your schema. This works best for sparse values, where most rows do not have a value set. If every row needs all flags defined, then this isn't very efficient.
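A quick sketch of how that gets used (the flag names here are illustrative):
INSERT INTO flag (main_id, name) VALUES (1, 'field_5'), (1, 'field_8');
-- find the main rows that have a given flag set
SELECT m.main_id
FROM main m
JOIN flag f ON f.main_id = m.main_id
WHERE f.name = 'field_5';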
For a performance comparison you can read a blog post I wrote on the topic:
Set Performance Compare
Also, when you ask which is "best", that's a very subjective question. Best at what? It all really depends on what your data looks like, what your requirements are, and how you want to query it.
Keep in mind that if you want to do a query like:
SELECT * FROM table WHERE some_flag=true
Indexes will only help you if few rows have that value set. If most of the rows in the table have some_flag=true, then MySQL will ignore the index and do a full table scan instead.
How many rows of data are you querying over? You can store the boolean values in a single integer column and use bit operations to test for them. It's not indexable, but the storage is very compact. With TINYINT fields and indexes, MySQL would pick one index to use and scan from there.