First of all, I want this to be done purely with a MySQL query.
I have a series of invoice numbers:
invoice_number
INV001
INV002
INV003
INV004
INV005
001
002
003
006
007
009
010
INVOICE333
INVOICE334
INVOICE335
INVOICE337
INVOICE338
INVOICE339
001INV
002INV
005INV
009INV
I want to output something like this
from_invoice_no to_invoice_no total_invoices
INV001 INV005 5
001 010 7
INVOICE333 INVOICE339 6
001INV 009INV 4
The invoice number pattern is not fixed; it can change in the future.
Please help me to achieve this.
Thanks in advance.
I will first show the general idea of how to solve this problem and provide some code which will be ugly, but easily understandable. Then I'll explain what the issues are and how to remedy them.
STEP 1: Deriving the grouping criterion
For the first step, I assume you have the right (privilege) to create an additional column in your table. Let us name it invoice_text. Now, the general idea is to remove all digits from the invoice number so that only the "text pattern" remains. Then we can group by the text pattern.
Assuming that you have already created the column mentioned above, you could do the following:
UPDATE Invoices SET invoice_text = REPLACE(invoice_number, '0', '');
UPDATE Invoices SET invoice_text = REPLACE(invoice_text, '1', '');
UPDATE Invoices SET invoice_text = REPLACE(invoice_text, '2', '');
...
UPDATE Invoices SET invoice_text = REPLACE(invoice_text, '9', '');
After having done that, you will have the pure text pattern without digits in invoice_text and can use that for grouping:
SELECT COUNT(invoice_number) AS total_invoices FROM Invoices
GROUP BY invoice_text
This is nice, but it is not yet what you wanted. It does not show the first and last invoice number for each group.
STEP 2: Deriving the first and last invoice for each group
For this step, create one more column in your table. Let us name it invoice_digits. As the name implies, it is meant to take only the pure invoice number without the "pattern text".
Assuming you have that column, you could do the following:
UPDATE Invoices SET invoice_digits = REPLACE(invoice_number, 'A', '');
UPDATE Invoices SET invoice_digits = REPLACE(invoice_digits, 'B', '');
UPDATE Invoices SET invoice_digits = REPLACE(invoice_digits, 'C', '');
...
UPDATE Invoices SET invoice_digits = REPLACE(invoice_digits, 'Z', '');
Now, you can use that column to get the minimum and maximum invoice number (without "pattern text"):
SELECT
MIN(invoice_digits) AS from_invoice_no,
MAX(invoice_digits) AS to_invoice_no,
COUNT(invoice_number) AS total_invoices
FROM Invoices
GROUP BY invoice_text
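To make the logic of the two steps easier to follow, here is a minimal sketch of the same idea outside the database, in Python. It is not a MySQL solution, and the function name summarize_invoices is hypothetical. One deliberate deviation: digit runs are replaced by a placeholder instead of being removed, because removing them outright would turn both INV001 and 001INV into the same text INV and merge those two groups. It also assumes that comparing the digits as integers is what you want.

```python
import re
from collections import defaultdict


def summarize_invoices(invoice_numbers):
    """Group invoice numbers by text pattern; return (first, last, count) per group."""
    groups = defaultdict(list)
    for number in invoice_numbers:
        # Step 1: the grouping criterion. Every run of digits becomes "#",
        # so "INV001" -> "INV#" and "001INV" -> "#INV".
        pattern = re.sub(r"\d+", "#", number)
        # Step 2: the pure digits, used to order invoices inside a group.
        digits = re.sub(r"\D", "", number)
        groups[pattern].append((int(digits or 0), number))

    summary = []
    for members in groups.values():
        members.sort()  # orders by the numeric part first
        summary.append((members[0][1], members[-1][1], len(members)))
    return summary


sample = [
    "INV001", "INV002", "INV003", "INV004", "INV005",
    "001", "002", "003", "006", "007", "009", "010",
    "INVOICE333", "INVOICE334", "INVOICE335",
    "INVOICE337", "INVOICE338", "INVOICE339",
    "001INV", "002INV", "005INV", "009INV",
]

for first, last, total in summarize_invoices(sample):
    print(first, last, total)
```

With the sample data from the question this prints one line per pattern, with the full first and last invoice number and the count.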
Problems and how to solve them
1) According to your question, you want to get the minimum and maximum full invoice number text. The solution above will show only the minimum and maximum invoice number text without the text parts, i.e. only the digits.
We could remedy this by doing a further JOIN, but since I can very well imagine that you won't insist on this :-), and since it won't make the general idea more clear, I am leaving this to you. If you are interested, let us know.
2) It might be difficult to decide what a digit (i.e. what the actual invoice number) is. For example, if you have invoice numbers like INV001, INV002, this will be no problem, but what if you have INV001/001, INV001/002, INV002/003 and so on? In this example, my code would yield 001001, 001002, 002003 as actual invoice numbers and use that to decide what the minimum and maximum numbers are.
This might not be what you want to do in that case. The only way around this is that you thoroughly think about what you should consider a digit and what not, and to adapt my code accordingly.
3) My code currently uses string comparisons to get the minimum and maximum invoice numbers. This may yield other results than comparing the values as numbers. If you are wondering what that means: Compare '19' to '9' as string, and compare 19 to 9 as number.
If this is a problem, then use MySQL's CAST to convert the text to a number before feeding it to MAX or MIN. But please be aware that this has its own caveats:
If you have very long invoice numbers with so many digits that they don't fit into MySQL's numeric data types, this method will fail. It will also fail if you have decided that a character like / counts as a digit (due to the issues described in 2)), since MySQL can't convert such a string into a number.
Instead of converting to numbers, you can also pad the values in invoice_digits with leading zeroes, for example using MySQL's LPAD function. This will avoid the problems described above and sort the numbers as expected, even if they include non-digits like /, but you will have to know the maximum length of the digit string in advance.
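The difference between the three comparison strategies in 3) can be seen in a few lines of Python. This only illustrates the principle; the sample values are made up, and zfill stands in for what LPAD would do in MySQL.

```python
values = ["9", "19", "100"]

# Compared as strings: character by character, so "9" beats "19" and "100".
max_as_string = max(values)

# Compared as numbers: the values are converted before comparing.
max_as_number = max(values, key=int)

# Zero-padded to a common width: string comparison now agrees with
# numeric comparison, but the width has to be known first.
width = max(len(v) for v in values)
max_as_padded = max(v.zfill(width) for v in values)

print(max_as_string, max_as_number, max_as_padded)
```

The padded variant keeps working for strings that cannot be converted to a number at all, which is the reason for suggesting it.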
4) The code is ugly! Do you really have to remove all possible characters from A to Z one by one by doing UPDATE statements to get the digit string?
Actually, it is even worse. I just have assumed that you only have the "text characters" A to Z in your invoices. But there could be any character Unicode defines: Russian or Chinese ones, special characters, in other words: thousands of different characters.
Unfortunately, AFAIK, MySQL versions before 8.0 do not provide a regex replace function (MySQL 8.0 added REGEXP_REPLACE). On older versions, I don't see any chance to get this problem solved unless you extend MySQL with an appropriate UDF (user defined function). There are some cool guys out there who have recognized the problem and have added such functions to MySQL. Since recommending libraries seems to be discouraged on SO, just google for "mysql regex replace".
When having extended MySQL that way, you can replace the ugly bunch of UPDATE statements which remove the digits / the text from the invoice number by a single one (using a REGEX, you can replace all digits or all non-digits at once).
For the sake of completeness, you could avoid the many UPDATE statements by doing UPDATE ... SET ... = REPLACE(REPLACE(REPLACE(...))) and thus applying all updates with one statement. But this is even uglier and more error-prone, so if you are serious about your problem, you'll really want a regex replace.
5) The solution will only work if you have the privilege to create new columns in the table.
This is true for the solution as-is. But I have chosen to go that way solely because it makes the general idea clear and understandable. Instead of adding columns to your original table, you could also create a new table where you store the pure text / digits (this table might be a temporary one).
Furthermore, since MySQL supports grouping by computed values, you don't need additional columns / tables at all. You should decide by yourself what is the best way to go.
forgive me... I'm a noob in this field.
I have googled this already, but I'm not entirely sure of the info I found as it's not exactly matching my use case.
In my db I have an accounts table with addresses. the addresses are broken up into street, city, state, country, and postalcode. everything was imported from a csv file which somehow stripped the leading 0 on every number... some postal codes have leading zeros (New Jersey for example) so the application that pulls from the db is throwing some errors when opening these accounts.
The column is already varchar so find and replace should be pretty simple. I need to add the zero only to strings with 4 characters in them.
In theory, I COULD just run this:
UPDATE account SET columnName=postalcodes(nums,5,0);
the problem is a handful of accounts have the 4 digit postal extension as well. doing this would (as I understand it) wreck those fields... or am I wrong?
so basically, how do I do an update in this situation, to add a zero only if the field is 4 characters long?
Use lpad():
update account set postal_code = lpad(postal_code, 5, '0')
This left-pads the string with 0s if it is less than 5 characters long (note that if the string is longer than 5 characters, it is truncated to the target length).
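To see why the truncation matters for codes with a 4-digit extension, here is a rough Python emulation of the pad-or-truncate behaviour described above, next to the "only touch 4-character values" rule. Both function names are hypothetical, the emulation assumes a non-empty pad string, and the postal codes are made-up examples.

```python
def lpad(text, length, pad):
    """Rough emulation of LPAD: pad on the left, or cut down to `length`."""
    if len(text) >= length:
        return text[:length]  # longer input loses everything past `length`
    missing = length - len(text)
    return (pad * missing)[:missing] + text


def pad_four_char_code(code):
    """Add one leading zero only when the code has exactly four characters."""
    return "0" + code if len(code) == 4 else code


print(lpad("7030", 5, "0"))             # padded to five characters
print(lpad("07030-1234", 5, "0"))       # the extension is cut off
print(pad_four_char_code("7030-1234"))  # longer codes are left untouched
```

Restricting the update to rows of length 4 has the same effect as pad_four_char_code: every other row is left alone.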
not sure how to specifically upvote comments, but spencer7593 and GMB helped.
I used this:
SELECT id, shipping_address_postal_code FROM account WHERE CHAR_LENGTH(shipping_address_postal_code) = 4;
UPDATE account SET shipping_address_postal_code = lpad(shipping_address_postal_code, 5, '0') WHERE CHAR_LENGTH(shipping_address_postal_code) = 4;
SELECT id, billing_address_postal_code FROM account WHERE CHAR_LENGTH(billing_address_postal_code) = 4;
UPDATE account SET billing_address_postal_code = lpad(billing_address_postal_code, 5, '0') WHERE CHAR_LENGTH(billing_address_postal_code) = 4;
basically I did a select as recommended by spencer7593 to check what data would be modified. then once I was satisfied it was safe to do so, I used where to single out the fields with exactly 4 characters (also suggested by spencer7593) and lpad as suggested by GMB (because I misunderstood the original query I was copying).
thanks guys!
I have created a table in MySQL with two columns, "landPhone" and "mobilePhone", to store phone numbers (in the format 123-456-8000 for land and 098-765-6601 for mobile). Both columns are declared as VARCHAR(30). The data has been inserted into the table, but when I query it, the phone numbers are truncated. For the two examples above, it shows only the first 3 digits (123) for landPhone, and only the first 2 digits after the leading '0' is removed (98) for mobilePhone.
Why is this happening?
Phone numbers are not actually numbers; they are strings that happen to contain digits (and, in your case, dashes). If you try to interpret one as a number, two things typically happen:
Leading zeros are forgotten.
Everything from the first non-digit to the end of the string is stripped off.
That sounds exactly like the result you're describing. Even if you end up stuffing the result into a string field, it's too late -- the data has already been corrupted.
Make sure you're not treating phone numbers as integers at any point in the process.
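The two effects listed above can be reproduced with a small Python sketch that reads a string the way a lenient numeric conversion typically would. The function name leading_integer is hypothetical, and this is only an emulation of the behaviour described, not MySQL's actual conversion code.

```python
import re


def leading_integer(text):
    """Read the integer at the start of a string and ignore the rest."""
    match = re.match(r"\s*([+-]?\d+)", text)
    # Stops at the first non-digit; int() then drops any leading zeros.
    return int(match.group(1)) if match else 0


print(leading_integer("123-456-8000"))
print(leading_integer("098-765-6601"))
```

Once a value has been stored this way, the dashes and the remaining digits are gone and cannot be recovered by changing the column type afterwards.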
You must use
insert into sample values('123-456-8000', '098-765-6601' )
instead of
insert into sample values(123-456-8000, 098-765-6601 )
see this SQLFiddle.
Thanks all for your solutions. As cHao suspected, it was me who made the mistake. When I first created the table, I declared the datatype of the phone columns as INT; later I corrected them to VARCHAR().
When I dropped the table and inserted the same data to the new table, it is working fine.
That sounds exactly like the result you're describing. Even if you end up stuffing the result into a string field, it's too late -- the data has already been corrupted. ..cHao
A question, to understand this better: why doesn't MySQL override the previous datatype with the new one?
I want to select only the area code from a list of column entries populated by phone numbers. This is what I have:
SELECT LEFT(phone, 3) AS areacode, COUNT(phone) AS count
FROM registration
GROUP BY areacode;
The problem is, the entries aren't consistent. So some phone numbers start as +123-456-7899, and others with (123)-456-7899, and others with no symbol at the beginning.
So my question is: is there a way that I can ensure the SELECT LEFT starts at the first integer?
Thanks!
There are some things that SQL is just not meant for. This is one. I would select the phone number into a string, and do some pattern matching in your programming language of choice to find the area code.
-OR-
Change your table such that area code is a different column.
Two options (neither of which is SQL):
Select all phone numbers and use a programming language of your choice to programmatically strip out the unnecessary characters.
Clean the input to strip out all unnecessary characters prior to inserting them into the database.
SQL alone is not the best way to do this; SQL plus a programming language is.
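As a minimal sketch of the "strip the unnecessary characters in a programming language" approach, the following Python removes everything that is not a digit and then takes the first three digits. It assumes, as in the examples from the question, that the area code is always the first three digits and that no separate country code precedes it; the function names are hypothetical.

```python
import re
from collections import Counter


def area_code(phone):
    """Return the first three digits of a phone number, or None if too short."""
    digits = re.sub(r"\D", "", phone)  # drop "+", "(", ")", "-" and spaces
    return digits[:3] if len(digits) >= 3 else None


def count_area_codes(phones):
    """Count how often each area code occurs, skipping unusable entries."""
    codes = (area_code(phone) for phone in phones)
    return Counter(code for code in codes if code is not None)


phones = ["+123-456-7899", "(123)-456-7899", "123-456-7899", "(555) 010-2000"]
print(count_area_codes(phones))
```

The same cleaning step could run before the insert instead, so that the stored values are already consistent and LEFT(phone, 3) works as intended.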
There actually is a way to do this in SQL that was intentionally designed for this exact purpose.
SELECT SUBSTRING(office_phone_number, 1, 3) FROM contact;
Of course, this depends on how the number is stored in the table. If parentheses are present, your starting position would be off.
Here is more information:
MySQL substring function
I have a table with a decimal column with a length = 9 and decimals = 2.
If I put a value of 21.59 (for example) it works ok.
If I put 52.00 it writes only 52. I need to keep 52.00 instead.
Master question: Can the database store the value this way, instead of using format/cast in the select to retrieve the value?
As noted below, this makes sense:
"You shouldn't worry about display formatting issues at the database level but at the ... display level"
Use the FORMAT function:
select format(mycolumn, 2) from mytable;
This also has the effect of adding a thousands separator to the number, so you would get output like 123,456.70. There are workarounds if this doesn't work for you.
Given that MySQL doesn't have the world's best facilities for formatting numbers, display issues like this are usually handled in client code.
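As an illustration of handling this in client code, here is a small Python sketch that formats a decimal value with exactly two decimal places, with or without a thousands separator. The function name display_amount and its parameter are hypothetical; the stored value itself is not changed.

```python
from decimal import Decimal


def display_amount(value, thousands_separator=False):
    """Format a decimal value with two decimal places for display."""
    spec = ",.2f" if thousands_separator else ".2f"
    return format(Decimal(value), spec)


print(display_amount("52"))
print(display_amount("21.59"))
print(display_amount("123456.7", thousands_separator=True))
```

Passing the value as a string (or as a Decimal) avoids introducing binary floating-point rounding before formatting.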
In a MySQL database, I have a table which contains itemID, itemName and some other fields.
Sample records (respectively itemID and itemName):
vaX652bp_X987_foobar, FooBarItem
X34_bar, BarItem
tooX56, TOOX_What
I want to write a query which gives me an output like:
652, FooBarItem
34, BarItem
56, TOOX_What
In other words, I want to extract the number from the itemID column. But the condition is that the extracted number should be the number that occurs after the first occurrence of the character "X" in the itemID column.
I am currently trying out locate() and substring() but could not (yet) achieve what I want..
EDIT:
Unrelated to the question: can anyone see all the answers (currently two) to this question? I see only the first answer by "soulmerge". Any ideas why? And the million dollar question: did I just find a bug?!
That's a horrible thing to do in mysql, since it does not support extraction of regex matches. I would rather recommend pulling the data into your language of choice and processing it there. If you really must do this in mysql, using unreadable combinations of LOCATE and SUBSTRING with multiple CASEs is the only thing I can think of.
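If you do pull the data into another language, the extraction becomes short. Here is a minimal Python sketch, assuming the wanted number is the run of digits directly after the first "X"; the function name is hypothetical.

```python
import re


def number_after_first_x(item_id):
    """Return the integer that directly follows the first "X", or None."""
    position = item_id.find("X")
    if position == -1:
        return None  # no "X" in the string at all
    match = re.match(r"\d+", item_id[position + 1:])
    return int(match.group()) if match else None


rows = [
    ("vaX652bp_X987_foobar", "FooBarItem"),
    ("X34_bar", "BarItem"),
    ("tooX56", "TOOX_What"),
]
for item_id, item_name in rows:
    print(number_after_first_x(item_id), item_name)
```

Returning None when the first "X" is not followed by a digit keeps such rows visible instead of silently reporting 0.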
Why don't you try adding a third column where you store the number alone at the moment the record is inserted (separating the number out in PHP or so)? This way you use a little more space to save a lot of processing.
Table:
vaX652bp_X987_foobar, 652, FooBarItem
X34_bar, 34, BarItem
tooX56, 56, TOOX_What
This isn't so unreadable:
SELECT 0+SUBSTRING(itemID, LOCATE("X", itemID)+1), itemName FROM tableName