I have a spreadsheet that I have imported into MySQL. It has many rows of entries, including gambling odds, e.g. 2/1 or 7/2. Unfortunately, the gambling odds are read as varchar by MySQL, which makes it impossible to do calculations on them. It was suggested that I create a look-up table, where the gambling odds can be converted to their decimal values. This makes sense. So the big question is: how do I go about this? Should I create a separate table that lists gambling odds and equates these to their decimal equivalents? If so, how would I make queries such as: find all the rows that have odds of 2/1 in table 1, and multiply this by £1? Any suggestions, please?
I think a lookup table is too hard to maintain, since there are an infinite number of possible odds combinations.
Instead, I would strongly suggest that you create a view over your base table, that has the various fields that contain the odds:
-- Note: MySQL versions before 5.7.7 do not allow a subquery in the FROM clause of a view.
create view v_table as
select t.*,
       OddsTop*1.0/OddsBottom as OddsNumeric,
       OddsBottom*1.0/(OddsTop + OddsBottom) as OddsPvalue
from (select t.*,
             -- digits before the slash
             cast(left(t.odds, locate('/', t.odds)-1) as signed) as OddsTop,
             -- digits after the slash
             cast(substring(t.odds, locate('/', t.odds)+1, 100) as signed) as OddsBottom
      from t
     ) t
You can easily calculate various types of information related to the odds. Here, I've shown how to get the top number, bottom number, odds as a floating point number, and the p-value equivalent.
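To tie this back to the example in the question (rows at 2/1, multiplied by a £1 stake), the same parsing can also be done directly in a query. This is only a sketch: it assumes the base table is called t with a varchar column odds, as in the view above, it splits the string with SUBSTRING_INDEX instead of LEFT/LOCATE, and @stake is a made-up variable name.

```sql
-- Hypothetical stake of 1 pound; change as needed.
SET @stake = 1.00;

SELECT t.*,
       -- fractional odds as a number, e.g. '7/2' becomes 3.5
       CAST(SUBSTRING_INDEX(t.odds, '/', 1) AS DECIMAL(10, 4))
         / CAST(SUBSTRING_INDEX(t.odds, '/', -1) AS DECIMAL(10, 4)) AS OddsNumeric,
       -- profit on the stake if the bet wins
       @stake * CAST(SUBSTRING_INDEX(t.odds, '/', 1) AS DECIMAL(10, 4))
         / CAST(SUBSTRING_INDEX(t.odds, '/', -1) AS DECIMAL(10, 4)) AS Profit
FROM t
WHERE t.odds = '2/1';
```

Filtering on the original string keeps the WHERE clause simple; filtering on the computed number (for example, everything above 3.0) would need the expression repeated or wrapped in a view.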
As far as I know, there is no data type for this in MySQL. I would suggest not creating a separate table as you propose. If precision is not of utmost importance, you can just store the odds as a decimal with a specific width and query with the decimal value. You can always convert it back to its fractional representation in the application layer. If simplicity matters, you can store them as varchar. See here for a related question.
This is actually quite an interesting question. My two-bits suggests that "7/2" isn't actually being used as a number, so could be interpreted as being a "code", in the same way that country codes are used instead of the whole country name. I might be inclined to go with the lookup table, using the Odds as the key, and the multiplication factors as columns in each row for use in math. You could also control how much precision you'd like to use, as well as have queries for high-odds and low-odds very easily.
Not necessarily saying I'm right here, just find this an interesting one.
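For what it's worth, a lookup-table version might look roughly like the sketch below. The names odds_lookup, decimal_value and bets are invented for illustration (the question does not name its table), and only the two odds mentioned in the question are loaded.

```sql
-- Hypothetical lookup table: the odds string is the key ("code"),
-- the numeric multiplier is stored alongside it.
CREATE TABLE odds_lookup (
    odds          VARCHAR(10)    NOT NULL PRIMARY KEY,
    decimal_value DECIMAL(10, 4) NOT NULL
);

INSERT INTO odds_lookup (odds, decimal_value) VALUES
    ('2/1', 2.0000),
    ('7/2', 3.5000);

-- Find all rows at 2/1 in the (assumed) bets table and work out
-- the profit on a stake of 1 pound.
SELECT b.*,
       1.00 * l.decimal_value AS profit_on_one_pound
FROM bets AS b
INNER JOIN odds_lookup AS l ON l.odds = b.odds
WHERE b.odds = '2/1';
```

Any odds string missing from odds_lookup silently drops out of an INNER JOIN, so a LEFT JOIN with a check for NULL decimal_value is a simple way to spot odds that still need adding.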
Related
I need help with a mysql query. I have a table like this
I have "type" 1 to 15. I would like help with a query to automatically change the "type"
I wouldn't do it as an update, because you're saying that points determine type, and you want points AND type in the table, which means they must always be kept in sync. Of the two, points is more fine grained - points can be used to determine type but not the other way round - so we can devise a strategy to determine the type from the current points, and we can let points increase and the type will change automatically upon querying:
Make another table with the type and the lower and upper bounds for points, then join it in to find the type:
CREATE TABLE TypeRanges (
playerType INT,
fromPoints INT,
toPoints INT
);
INSERT INTO TypeRanges VALUES(1, 0, 1599);
...
SELECT * FROM
username p
INNER JOIN TypeRanges t ON p.points BETWEEN t.fromPoints AND t.toPoints;
Remember that BETWEEN is inclusive at both ends, so for < 1600 points you want the end to be 1599; for 1600 to 14000 you probably want 1600 and 13999, etc.
If you want, you can make a view out of this query and then use that view anywhere you want to know the points and the type together. See the comments for a bit more on what a view is and what it is used for.
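As a rough sketch of that view (the name v_username_type is made up, and it assumes the username and TypeRanges tables from the query above):

```sql
-- Wrap the join in a view so callers get points and type together.
CREATE VIEW v_username_type AS
SELECT p.*,
       t.playerType
FROM username AS p
INNER JOIN TypeRanges AS t
        ON p.points BETWEEN t.fromPoints AND t.toPoints;

-- Usage: the type is derived from the current points on every read.
SELECT * FROM v_username_type WHERE playerType = 2;
```

If the ranges in TypeRanges overlap, a player appears once per matching range, so it is worth keeping the bounds contiguous and non-overlapping.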
Footnote on dynamism/performance considerations:
Every time you run this query it will calculate the type from the points. Calculating the type when you run the query, rather than updating the type when the points change, means you can easily redefine the bounds or add to them just by altering the points range table. Because we are calculating every time, it's highly responsive to data updates, but it will be slightly slower than having the type stored and simply retrieved. In most cases the benefits of recalculating outweigh this, but if you're going to be querying it thousands of times per second and updating it once a year (as an extreme example), it may make sense to store the type instead. In most typical use cases I would go the route of calculating the type from the points and only look to optimize it if it proves to be a problem when scaled to large numbers of users and lots of activity. It would be a premature optimization to assume that the lookup will make things unusably slow and to store the type for that reason - databases are engineered for rapid data joining and retrieval. If you did determine that storing it would be better, you can make the sync transparent by using a trigger to set the type upon each update.
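If you do go the stored route, the trigger could look something like the sketch below. It assumes the username table has both a points and a `type` column, as in the question, and reuses the TypeRanges table; the trigger names are invented.

```sql
-- Set the stored type from the points whenever a row is inserted...
CREATE TRIGGER username_type_before_insert
BEFORE INSERT ON username
FOR EACH ROW
SET NEW.`type` = (SELECT t.playerType
                  FROM TypeRanges AS t
                  WHERE NEW.points BETWEEN t.fromPoints AND t.toPoints
                  LIMIT 1);

-- ...and whenever a row is updated.
CREATE TRIGGER username_type_before_update
BEFORE UPDATE ON username
FOR EACH ROW
SET NEW.`type` = (SELECT t.playerType
                  FROM TypeRanges AS t
                  WHERE NEW.points BETWEEN t.fromPoints AND t.toPoints
                  LIMIT 1);
```

Note that changing the bounds in TypeRanges does not touch existing rows with this approach; they only pick up the new type the next time they are updated.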
Side note: try to avoid using reserved words/keywords like TYPE as column names. While they can be quoted, it usually does more good to find a more descriptive label for the column that doesn't need to be quoted in queries or treated specially in front-end languages.
If you are happy storing your logic in the table itself, then you could use a calculated column.
CREATE TABLE members (
points INT,
type DOUBLE AS
(CASE
WHEN points < 1600 THEN 1
WHEN points < 14000 THEN 2
-- TODO: implement other cases
ELSE 3
END))
So when the points column is updated, there is no need to "update" the type column - the type is automatically calculated from the points whenever the table is read.
https://www.db-fiddle.com/f/fqEL2jtNio1Srt5xLkyLy2/0
Edit:
As Caius Jard mentions in their answer, the performance of selects would start to degrade at scale, but how you optimize depends on how volatile the points are, and how frequent your reads are.
I'm trying to understand why this is happening, but I couldn't find anything on the internet.
I have a table of meds (called Medicamento) which has 23600 elements in it.
When I try to take an element using the IdMed column it only takes the values with less than 6 digits. Example 1:
SELECT * FROM `Medicamento` WHERE IdMed=100
Example 2:
SELECT * FROM `Medicamento` WHERE IdMed=200703
At this point I thought that the med with that Id had not been created, so I ran this last query, which left me not knowing where the mistake is:
SELECT * FROM `Medicamento` WHERE IdMed>200702
Result:
As you can see, the first element is the one with the Id 200703. What I cannot understand is why it finds elements with Ids such as 12700 or 100 but doesn't find elements with 6-digit Ids. I thought it could be a matter of formats, but I didn't find anything helpful.
Data of the table was taken from 2 different .xlsx files, that's why I thought about formats.
PS: Sorry for my bad English. I hope the problem is understood.
EDIT:
Table data types
In a nutshell, your value is losing precision because you're using an inexact data type. float is for floating-point numbers and ideally shouldn't be used as a primary key. Your best bet is to change this to an integer data type instead. By the looks of the comments, this may not be viable, in which case you're probably best off creating another column and using THAT as the primary key instead. What's likely happening is that, for example, 200703 is potentially being stored in the database as 200703.000001 or 200702.99999, and you're searching for a value that's not an exact match for how the database is storing it.
As a suggestion, you may want to change your current float column to a double column instead to retain a little more precision beyond the decimal point.
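A few statements that might help confirm and work around this are sketched below. They assume the Medicamento table and its float IdMed column from the question, and that the Ids are meant to be positive whole numbers; IdMedInt is a made-up column name.

```sql
-- 1. Adding 0 promotes the FLOAT to a DOUBLE in the result, which
--    should show more digits than the default FLOAT display.
SELECT IdMed, IdMed + 0 AS stored_value
FROM Medicamento
WHERE IdMed > 200702
ORDER BY IdMed
LIMIT 5;

-- 2. Match on a small range instead of testing for exact equality.
SELECT *
FROM Medicamento
WHERE IdMed > 200702.5 AND IdMed < 200703.5;

-- 3. Add an integer copy of the Id (hypothetical column) that can be
--    compared exactly and later promoted to the key.
ALTER TABLE Medicamento ADD COLUMN IdMedInt INT UNSIGNED NULL;
UPDATE Medicamento SET IdMedInt = ROUND(IdMed);
```

Before making IdMedInt a key, it is worth checking that rounding has not produced duplicates, for example with a GROUP BY IdMedInt HAVING COUNT(*) > 1 query.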
I need to store a range of numeric or datetime values in MySQL, preferably in a single column. Unfortunately, as there are no real array or set data-types in MySQL, likewise it seems that there is no range data-type, so I'm a bit at an impasse here, hoping to come up with something smart.
Common use cases for a range would be e.g. storing the start and end times of an event, or the minimum and maximum prices of a given product.
In my case, I simply need to store the year(s) a book was written. In some cases, there is ambiguity and the year I have on record may be e.g. 810-820. Of course one way to go would be to have separate year_min and year_max columns, and then have identical data stored in both columns in case there is no variance.
Yet only a fraction of the entries I have would actually need to have such a range stored, and I'd much love to just query a simple BETWEEN 750 AND 850 for example -- and avoid both having a WHERE hit on two columns, as well as the redundant duplication of data in 98% of the cases.
What's your recommended approach? Any best practice tips? I know how to tune up decent two-column queries. I'm hoping there's another way to go about this... (And no, I'm not likely to switch to PostgreSQL just to have the benefit of their range types.)
I would recommend going with a two column solution, despite that it is not that sexy or clever. Suppose you implement this with one column. Then your database becomes non relational, because a given record and column now points to multiple values (year_min and year_max). So even though your schema might appear tidy, you might lose that benefit in the form of more difficult queries.
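A minimal sketch of the two-column layout, assuming a hypothetical books table (the column names year_min and year_max come from the question; everything else is invented):

```sql
-- Every book gets both bounds; for a known year they are equal.
CREATE TABLE books (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title    VARCHAR(255) NOT NULL,
    year_min SMALLINT     NOT NULL,
    year_max SMALLINT     NOT NULL,
    INDEX idx_years (year_min, year_max)
);

INSERT INTO books (title, year_min, year_max) VALUES
    ('Exact year known', 812, 812),
    ('Uncertain decade', 810, 820);

-- Books whose possible range overlaps 750..850.
SELECT *
FROM books
WHERE year_min <= 850
  AND year_max >= 750;

-- Books that were certainly written within 750..850.
SELECT *
FROM books
WHERE year_min >= 750
  AND year_max <= 850;
```

The two queries answer different questions (possibly versus certainly in the period), which is a distinction a single BETWEEN on one column cannot express.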
I'm working with tons of phone numbers, and many are international.
I've changed my phone numbers table structure to have 5 columns:
`phonenumbers`.`phoneID`
`phonenumbers`.`countrycode`
`phonenumbers`.`areacode`
`phonenumbers`.`phonenumber`
`phonenumbers`.`ext`
At the moment the phoneID is the only column that's an INT, since it's the primary key.
Should I change the other columns to integers? I've heard indexes work best with numeric values, and I'm only storing numbers in each of the columns (no dashes, parentheses, spaces, etc.).
I'm still learning how MySQL works with indexes, so I'm curious how others work with searching for numbers. In this case, I'm sure I'll be searching for numbers that start with a certain known areacode and part of a known phonenumber, or an entire phonenumber.
The part that gets me with indexing and table columns like phone numbers is that I don't always know how long a phonenumber will be, since countries have different lengths for areacodes and phonenumbers.
In summary, INT vs VARCHAR indexing with numbers.
Phone numbers are not integers, so don't store them as one, it'll just cause you trouble. The obvious cases are when you have to handle phone numbers too big to fit in an int, or phone numbers starting with a 0.
Moreover, as you want to do prefix matches (phonenumber like '800%'), mysql will be able to use indexes if you're using varchar columns.
You have to figure out how you're querying this data. If you're frequently doing queries like where countrycode='1' and areacode='123' and phonenumber like '2%', you'd want a compound index on (countrycode,areacode,phonenumber), and if you're also often doing queries on only the phonenumber, you'd want an additional index on just the phonenumber column. This is something you have to work out depending on the amount of data you have and the queries you run; work with EXPLAIN to learn how your indexes are used and where they are needed.
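As an illustration, the two indexes and an EXPLAIN check could look like this. It assumes the phonenumbers table from the question with the three columns stored as varchar; the index names are invented.

```sql
-- Compound index for country + area + number-prefix searches,
-- plus a separate index for searches on the number alone.
ALTER TABLE phonenumbers
    ADD INDEX idx_country_area_number (countrycode, areacode, phonenumber),
    ADD INDEX idx_number (phonenumber);

-- Check which index the optimizer picks for a typical query.
EXPLAIN
SELECT phoneID
FROM phonenumbers
WHERE countrycode = '1'
  AND areacode = '123'
  AND phonenumber LIKE '2%';

-- A leading wildcard cannot use the index for the pattern,
-- so compare this plan with the one above.
EXPLAIN
SELECT phoneID
FROM phonenumbers
WHERE phonenumber LIKE '%456';
```

The key and rows columns of the EXPLAIN output are the ones to watch: they show which index was chosen and roughly how many rows MySQL expects to examine.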
Use varchar for representing phone numbers NOT integers. Otherwise you will find your design decision will come back to bite you.
Also: "I've heard indexes work best with numeric values" - well, that's not strictly accurate: yes the index will take up less space, and more rows will fit per page etc, but an index on a varchar column works perfectly well.
Worry about index size and performance when (1) you have a huge amount of data and (2) when you have measured a performance problem.
In my opinion you have a lot of attributes that you don't need. For phone numbers I usually use an auto-increment key for the id and store the phone number as a varchar. This makes validation easier when done in a programming language. That's my opinion...
Use a BIGINT UNSIGNED, simply because this forces you to normalize your data. Force your user to store the phonenumber at root level, that is, at country level. You could store the country prefix in a separate column to ease the usage.
Everybody types phone-numbers in different ways and this makes it almost impossible to search the data.
E.g. %020123456% will not match 02 0123456. Are you going to search all combinations or just parse it?
This I know from experience: we had to manually fix about 1,000 phonenumbers which we could not script out when installing an auto-dialer.
I've been inserting some numbers as INT UNSIGNED in a MySQL database. I perform searches on this column using "SELECT * FROM tablename WHERE A LIKE 'B'". I'm coming across some number formats that are either too long for an unsigned integer or have dashes in them, like 123-456-789.
What are some good options for modifying the table here? I see two options (are there others?):
1. Make another column (VARCHAR(50)) to store numbers with dashes. When a search query detects numbers with dashes, look in this new column.
2. Recreate the table using a VARCHAR(50) instead of unsigned integer for this column in question.
I'm not sure which way is the better in terms of (a) database structure and (b) search speed. I'd love some inputs on this. Thank you.
Update: I guess I should have included more info.
These are order numbers. The numbers without dashes are for one store (A), and the one with dashes are for Amazon (B; 13 or 14 digits I think with two dashes). A's order numbers should be sortable. I'm not sure if B has to be since the numbers don't mean anything to me really (just a unique number).
If I remove the dashes and put them all together as big int, will there be any decrease in performance in the search queries?
The most important question is how you would like to use the data. What do you need? If you make it a varchar and then want to sort it as a number, you will not be able to, since it will be treated as a string.
You can always consider BIGINT; however, the question is: do you need the dashes, or can you just ignore them at the application level? If you need them, it means you need varchar. In that case it might make sense to have two columns if you want to be able to, for example, sort them as numbers or perform any calculations. Otherwise one column probably makes more sense.
You should really provide more context about the problem.
MySQL has PROCEDURE ANALYSE, which helps you identify suitable column types from your existing data sets. Here's an example.
Given you are running query WHERE A LIKE 'B' mainly. You can also try full text search if "A" varies a lot.
I think option 2 makes the most sense. Just add a new column as varchar(50), put everything in the int column into that varchar, and drop the int. Having 2 separate columns to maintain just isn't a good idea.
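Sketched out, that migration might look like the following. The table name orders and the column names order_number and order_number_text are hypothetical, since the question does not give them.

```sql
-- 1. Add the new varchar column next to the existing integer one.
ALTER TABLE orders ADD COLUMN order_number_text VARCHAR(50) NULL;

-- 2. Copy the existing integer values across as strings.
UPDATE orders SET order_number_text = CAST(order_number AS CHAR);

-- 3. Drop the integer column and index the new one for searching.
ALTER TABLE orders
    DROP COLUMN order_number,
    ADD INDEX idx_order_number_text (order_number_text);

-- Exact-match search works for both stores' formats.
SELECT * FROM orders WHERE order_number_text = '123-456-789';

-- Store A's dash-free numbers can still be sorted numerically.
SELECT *
FROM orders
WHERE order_number_text NOT LIKE '%-%'
ORDER BY CAST(order_number_text AS UNSIGNED);
```

Taking a backup before step 3 is sensible, since dropping the integer column cannot be undone.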