I have an SQL table of users in which some accounts are tagged with a country code (a two-letter uppercase word), while the other substrings in the tags (all comma-separated) are either lowercase or longer than two letters.
For example, in the user table:
id  User_tags
1   alu,US,ATD
2   GB,xx
3   ol,tuds,FR
Users 1, 2 and 3 are tagged with the countries US, GB and FR respectively, and I need to extract those codes from the user_tags column. I understand that regex functions are needed, but I can't get them to work in an SQL query.
Create a country-code reference table and join to it as below. I don't have SQL Server open, so I can't double-check the syntax, but it should work.
Select *
From yourtable y left join refcountry r
On charindex(',' + r.code + ',', ',' + y.string + ',') > 0
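Note that this might be slow on a large dataset, because the CHARINDEX condition cannot use an index.
If you are not on SQL Server, use your RDBMS's equivalent of CHARINDEX (e.g. LOCATE in MySQL or POSITION in PostgreSQL).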
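A minimal sketch of the same join, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for SQL Server. The users/refcountry tables and the extra 'AUS,abc' row are hypothetical test data, added to show why both sides are wrapped in commas. SQLite's instr(haystack, needle) plays the role of CHARINDEX(needle, haystack):

```python
import sqlite3

# Hypothetical stand-in data: users.user_tags holds the comma-separated tags,
# refcountry holds the valid two-letter codes (only three here for brevity).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, user_tags TEXT);
    CREATE TABLE refcountry (code TEXT PRIMARY KEY);
    INSERT INTO users VALUES
        (1, 'alu,US,ATD'), (2, 'GB,xx'), (3, 'ol,tuds,FR'), (4, 'AUS,abc');
    INSERT INTO refcountry VALUES ('US'), ('GB'), ('FR');
""")

# Padding both the tag list and the code with commas means only whole tags
# match, so 'US' is not found inside 'AUS'. instr() compares case-sensitively,
# so a lowercase 'us' tag would not match either.
rows = conn.execute("""
    SELECT u.id, r.code
    FROM users u
    LEFT JOIN refcountry r
      ON instr(',' || u.user_tags || ',', ',' || r.code || ',') > 0
    ORDER BY u.id
""").fetchall()

for user_id, code in rows:
    print(user_id, code)
```

User 4 comes back with a NULL code because of the LEFT JOIN; switch to an inner join if you only want users that have a country tag.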
I have 5 users with a column 'shop_access', which holds a list of shop IDs (e.g. 1,2,3,4).
I am trying to get all users from the DB that have a given shop ID (e.g. 2) in their shop_access.
Current Query:
SELECT * FROM users WHERE '2' IN (shop_access)
But it only returns users whose shop_access starts with the number 2.
For example:
User 1 manages shops 1,2,3
User 2 manages shops 2,4,5
User 3 manages shops 1,3,4
User 4 manages shops 2,3
The only ones returned when running the IN clause are User 2 and User 4.
User 1 is ignored because its list does not start with 2, even though it does contain 2, so it should be returned.
I'm not currently in a position to change how this is set up (e.g. convert it to JSON and handle it in PHP first), so a solution that works without changing the column data (shop_access) would be ideal.
A portable solution is to use LIKE:
where concat(',', shop_access, ',') like '%,2,%'
Or if the value to search for is given as a parameter:
where concat(',', shop_access, ',') like concat('%,', ?, ',%')
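A small sketch of the parameterised version, assuming SQLite via Python's sqlite3 module (SQLite concatenates with || rather than CONCAT()). The users table mirrors the question, plus a hypothetical user 5 with shops 12 and 22 to confirm that searching for 2 does not match them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, shop_access TEXT);
    INSERT INTO users VALUES
        (1, '1,2,3'), (2, '2,4,5'), (3, '1,3,4'), (4, '2,3'), (5, '12,22');
""")


def users_with_shop(shop_id):
    # Wrapping the list in commas lets one pattern cover the first, middle
    # and last positions, while '12' and '22' no longer match '2'.
    sql = """
        SELECT id FROM users
        WHERE ',' || shop_access || ',' LIKE '%,' || ? || ',%'
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (str(shop_id),))]


print(users_with_shop(2))
```

This assumes the stored lists contain no spaces after the commas and that the searched value contains no LIKE wildcards (% or _), which holds for plain integer IDs.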
Depending on your database, there may be neater options available. In MySQL:
where find_in_set('2', shop_access)
That said, I would highly recommend fixing your data model. Storing CSV data in a database defeats the purpose of a relational database in many ways. You should have a separate table to store the user/shop relations, with each tuple on a separate row. Recommended reading: Is storing a delimited list in a database column really that bad?
Alternatively, you could use REGEXP:
SELECT *
FROM users
WHERE shop_access REGEXP '[[:<:]]2[[:>:]]';
-- [[:<:]] and [[:>:]] are word boundaries (MySQL before 8.0.4; from 8.0.4 on, use '\\b2\\b' instead)
SELECT * FROM users WHERE (shop_access = '2') OR (shop_access LIKE '2,%' OR shop_access LIKE '%,2,%' OR shop_access LIKE '%,2')
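A quick way to check that the word-boundary REGEXP and the four-way LIKE agree, assuming SQLite via Python's sqlite3 module. SQLite has no built-in REGEXP, so a user function backed by Python's re module is registered, and Python's \b stands in for MySQL's [[:<:]]/[[:>:]]. The rows for users 5 and 6 are hypothetical edge cases:

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite rewrites "X REGEXP Y" as regexp(Y, X), so the pattern comes first.
    return value is not None and re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, shop_access TEXT);
    INSERT INTO users VALUES
        (1, '1,2,3'), (2, '2,4,5'), (3, '1,3,4'), (4, '2,3'),
        (5, '12,22'), (6, '2');
""")

# \b2\b matches a standalone 2 but not the 2s inside 12 or 22,
# because digits count as word characters.
regexp_ids = [r[0] for r in conn.execute(
    r"SELECT id FROM users WHERE shop_access REGEXP '\b2\b' ORDER BY id")]

# The four LIKE cases: whole value, first, middle and last position.
like_ids = [r[0] for r in conn.execute("""
    SELECT id FROM users
    WHERE shop_access = '2'
       OR shop_access LIKE '2,%'
       OR shop_access LIKE '%,2,%'
       OR shop_access LIKE '%,2'
    ORDER BY id
""")]

print(regexp_ids, like_ids)
```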
I'm trying to pull a list of IDs from a table Company where the first 6 characters of the ID are the same. Our application creates a company ID by taking the first 3 characters of the company name and the first 3 characters of the city. Because of that, over time we have accumulated company IDs with the same first 6 characters, followed by a sequential number...
I was thinking of using something with LIKE:
Select companyID, companyName from Company Where
substring(companyID,1,6)+'%' like substring(companyID,1,6)+'%'
Basically, I'm trying to get all company IDs whose first 6 characters match. The result set should show just the top company ID (the first one created) and the company name. I'm not expecting a ton of results, so I can then use the returned IDs to find the IDs below them.
I think it could also be done with HAVING, counting the IDs that share the same first 6 characters, i.e. HAVING COUNT(*) > 1?
I'm not really sure what the syntax would be...
SELECT distinct c1.CompanyID, c1.CompanyName, c2.CompanyID, c2.CompanyName
FROM dbo.Company c1
JOIN dbo.Company c2
ON SUBSTRING(c1.CompanyName,1,6) = SUBSTRING(c2.CompanyName,1,6)
AND c1.CompanyID < c2.CompanyID
order by c1.CompanyName, c2.CompanyName
SELECT c1.CompanyID, c1.CompanyName, c2.CompanyID, c2.CompanyName
FROM dbo.Company c1
INNER JOIN dbo.Company c2
ON SUBSTRING(c1.CompanyName,1,6) + '%' LIKE SUBSTRING(c2.CompanyName,1,6) + '%'
AND c1.CompanyID <> c2.CompanyID
If this is something that you envision doing frequently, I'd add a computed column to the table that has a definition of substring(CompanyName, 1, 6). You can then index it and make this efficient. As it is, it will have to scan all the entries and calculate the substring on the fly. With the computed column, you amortize the substring calculation up front and at least have a chance at an efficient query.
After trying Blam's script, I made a few slight changes and got better results. His script was returning more results than there are rows in the table and was pretty slow; I think that's because of the company_name column. I got rid of it and wrote it like this:
select distinct c1.cmp_id, count(substring(c2.cmp_id,1,6)) as TotalCount
from company c1
join company c2 on substring(c1.cmp_id,1,6)=substring(c2.cmp_id,1,6)
group by c1.cmp_id
order by c1.cmp_id asc
This still returns every record in the table, but at least I can see the total count when the first 6 characters appear more than once. It also ran in only 1 second, which is a plus. Thanks again for your input, guys, always appreciated!
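For the HAVING idea from the question, here is a sketch in SQLite (via Python's sqlite3 module) that groups on the 6-character prefix, keeps only prefixes used more than once, and joins back to fetch the name of the first ID. It assumes "first created" means the smallest ID in each group, which holds only if the sequential suffix sorts correctly as text (single digits or zero-padded). The company rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (companyID TEXT PRIMARY KEY, companyName TEXT);
    INSERT INTO Company VALUES
        ('ACMBOS1', 'Acme Boston'),
        ('ACMBOS2', 'Acme Boston Annex'),
        ('ACMBOS3', 'Acme Boston South'),
        ('GLOCHI1', 'Globex Chicago'),
        ('INIDAL1', 'Initech Dallas'),
        ('INIDAL2', 'Initech Dallas East');
""")

rows = conn.execute("""
    WITH dup AS (
        -- one row per shared prefix, with the lowest ID and the group size
        SELECT substr(companyID, 1, 6) AS prefix,
               MIN(companyID)          AS first_id,
               COUNT(*)                AS total
        FROM Company
        GROUP BY substr(companyID, 1, 6)
        HAVING COUNT(*) > 1
    )
    SELECT d.first_id, c.companyName, d.total
    FROM dup d
    JOIN Company c ON c.companyID = d.first_id
    ORDER BY d.first_id
""").fetchall()

for first_id, name, total in rows:
    print(first_id, name, total)
```

Unlike a self-join, this returns one row per duplicated prefix rather than one row per matching pair, so the result stays small.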
I have two tables: one called tweets and one called references. tweets consists of the columns tweet_id and classified, among others. references consists of the columns tweet_id and class_id.
The tweet_id column in references contains only a fraction of the tweet_ids in tweets.
What I would like to do is combine these tables so that the resulting table shows the columns r.tweet_id, t.classified and r.class_id.
I've come up with this query, but for some reason it returns zero rows. In reality, however, there are about 900 values in r.tweet_id, all of which exist in t.tweet_id.
SELECT 't.tweet_id', 't.classified', 'r.tweet_id', 'r.class_id'
FROM `tweets` t, `references` r
WHERE 'r.tweet_id' = 't.tweet_id'
Could somebody tell me what I am doing wrong and how I should change my script in order to get the desired outcome?
MySQL uses backticks (`) to quote schema object names (columns, tables, databases) and apostrophes (') or double quotes (") to quote strings. Your condition therefore compares the string 'r.tweet_id' with the string 't.tweet_id', which is always false. Do this instead:
SELECT t.tweet_id, t.classified, r.tweet_id, r.class_id
FROM tweets AS t
INNER JOIN `references` AS r ON r.tweet_id = t.tweet_id
Note that you only have to escape references, because it is a reserved word in MySQL; the other backticks can be omitted.
Also, if you want to display rows like 1, 2, NULL, NULL as well (tweets that weren't classified), use LEFT JOIN instead of INNER JOIN. If you allow multiple classifications per tweet, some GROUP BY (Aggregate) Functions may come in handy.
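The difference is easy to reproduce in SQLite through Python's sqlite3 module, using three hypothetical tweets of which only two are referenced. SQLite quotes identifiers with double quotes (the SQL-standard form) rather than MySQL's backticks:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweets (tweet_id INTEGER PRIMARY KEY, classified INTEGER);
    CREATE TABLE "references" (tweet_id INTEGER, class_id INTEGER);
    INSERT INTO tweets VALUES (1, 1), (2, 0), (3, 1);
    INSERT INTO "references" VALUES (1, 10), (3, 30);
""")

# Apostrophes make these string literals; the two strings are never equal,
# so every row is filtered out.
broken = conn.execute("""
    SELECT 't.tweet_id', 't.classified', 'r.tweet_id', 'r.class_id'
    FROM tweets t, "references" r
    WHERE 'r.tweet_id' = 't.tweet_id'
""").fetchall()

# Unquoted column references: only tweets present in both tables.
inner = conn.execute("""
    SELECT r.tweet_id, t.classified, r.class_id
    FROM tweets AS t
    INNER JOIN "references" AS r ON r.tweet_id = t.tweet_id
    ORDER BY t.tweet_id
""").fetchall()

# LEFT JOIN keeps unreferenced tweets, with NULLs for the missing columns.
left = conn.execute("""
    SELECT t.tweet_id, t.classified, r.tweet_id, r.class_id
    FROM tweets AS t
    LEFT JOIN "references" AS r ON r.tweet_id = t.tweet_id
    ORDER BY t.tweet_id
""").fetchall()

print(broken, inner, left, sep="\n")
```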
BTW: PostgreSQL uses " for schema object names and ' for strings.
I have inherited a database in which a person table has a field called authorised_areas. The front end lets the user choose multiple entries from a pick list (populated with values from the description field of the area table) and then sets authorised_areas to a comma-delimited list. I am migrating this to a MySQL database, and while I'm at it I would like to improve database integrity by removing the authorised_areas field from the person table and creating a many-to-many table, person_area, that just holds pairs of person and area keys. There are several hundred person records, so I would like to do this efficiently with a few MySQL statements rather than individual INSERT or UPDATE statements.
Just to clarify, the current structure is something like:
person
id  name  authorised_areas
1   Joe   room12, room153, 2nd floor office
2   Anna  room12, room17
area
id  description
1   room12
2   room17
3   room153
4   2nd floor office
...but what I would like is:
person
id  name
1   Joe
2   Anna
area
id  description
1   room12
2   room17
3   room153
4   2nd floor office
person_area
person_id  area_id
1          1
1          3
1          4
2          1
2          2
There is no reference to the area id in the person table (and some text values in the lists are not exactly the same as the description in the area table), so this would need to be done by text or pattern matching. Would I be better off just writing some PHP code to split the strings, find the matches and insert the appropriate values into the many-to-many table?
I'd be surprised if I were the first person to have to do this, but a Google search didn't turn up anything useful (perhaps I didn't use the right search terms?). If anyone could suggest a way to do this efficiently, I would very much appreciate it.
While this is possible, as a one-off job it would probably be quicker to knock up a PHP script (or one in your favorite scripting language) that does it with multiple inserts.
If you must do it in a single statement, create a table of integers (0 to 9, cross-joined against itself to get as large a range as you need) and join it against your original table, using string functions to find the Nth comma and, from that, each of the values in each row.
It is possible, and I have done it, but mainly to show that having a delimited field is not a good idea. A script with multiple inserts would likely be FAR quicker to write.
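A rough sketch of the script approach, written in Python rather than PHP, with the sqlite3 module standing in for the MySQL connection. The normalise() helper (trim, collapse inner spaces, lower-case) is an assumption about how the near-miss descriptions differ; anything it cannot match is collected for manual review instead of being silently dropped. The 'Room17' and 'lobby' entries are made-up test data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, authorised_areas TEXT);
    CREATE TABLE area (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE person_area (
        person_id INTEGER REFERENCES person(id),
        area_id   INTEGER REFERENCES area(id),
        PRIMARY KEY (person_id, area_id)
    );
    INSERT INTO person VALUES
        (1, 'Joe', 'room12, room153, 2nd floor office'),
        (2, 'Anna', 'room12, Room17, lobby');
    INSERT INTO area VALUES
        (1, 'room12'), (2, 'room17'), (3, 'room153'), (4, '2nd floor office');
""")


def normalise(text):
    # Hypothetical matching rule: trim, collapse inner whitespace, lower-case.
    return " ".join(text.split()).lower()


# Look-up table from normalised description to area id.
area_ids = {normalise(desc): area_id
            for area_id, desc in conn.execute("SELECT id, description FROM area")}

pairs, unmatched = [], []
for person_id, areas in conn.execute("SELECT id, authorised_areas FROM person"):
    for raw in (areas or "").split(","):
        if not raw.strip():
            continue
        area_id = area_ids.get(normalise(raw))
        if area_id is None:
            unmatched.append((person_id, raw.strip()))
        else:
            pairs.append((person_id, area_id))

# One executemany call instead of hand-written INSERTs. INSERT OR IGNORE is
# SQLite syntax for skipping duplicate pairs; MySQL spells it INSERT IGNORE.
conn.executemany(
    "INSERT OR IGNORE INTO person_area (person_id, area_id) VALUES (?, ?)", pairs)
conn.commit()

result = conn.execute(
    "SELECT person_id, area_id FROM person_area ORDER BY person_id, area_id").fetchall()
print(result)
print("unmatched:", unmatched)
```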
You could base an insert on a SELECT like this one (although it also returns a blank row for each person alongside the relevant ones, and will only cope with up to 999 authorised areas per person):
-- assumes a table integers(i) holding the digits 0 to 9
SELECT z.id, z.name, x.an_authorised_area
FROM person z
LEFT OUTER JOIN (
-- TRIM removes the space that follows each comma in the stored lists
SELECT DISTINCT a.id, TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(a.authorised_areas, ',', b.ournumbers), ',', -1)) AS an_authorised_area
FROM person a, (
SELECT hundreds.i * 100 + tens.i * 10 + units.i AS ournumbers
FROM integers AS hundreds
CROSS JOIN integers AS tens
CROSS JOIN integers AS units
) b
) x ON z.id = x.id
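For the single-statement route, here is a sketch in SQLite (via Python's sqlite3 module). SQLite has no SUBSTRING_INDEX, so a recursive CTE does the splitting instead of the integers table; that is a swapped-in technique, not the query above. The split values feed straight into INSERT ... SELECT, joined on the area description. The TRIM and lower() matching are assumptions, and unmatched items such as the hypothetical 'lobby' are simply skipped:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, authorised_areas TEXT);
    CREATE TABLE area (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE person_area (
        person_id INTEGER, area_id INTEGER, PRIMARY KEY (person_id, area_id));
    INSERT INTO person VALUES
        (1, 'Joe', 'room12, room153, 2nd floor office'),
        (2, 'Anna', 'room12, Room17, lobby');
    INSERT INTO area VALUES
        (1, 'room12'), (2, 'room17'), (3, 'room153'), (4, '2nd floor office');
""")

conn.execute("""
    WITH RECURSIVE split(person_id, item, rest) AS (
        -- seed row: an empty item plus the whole list with a trailing comma
        SELECT id, '', authorised_areas || ',' FROM person
        UNION ALL
        -- peel off the text before the next comma, trimmed
        SELECT person_id,
               trim(substr(rest, 1, instr(rest, ',') - 1)),
               substr(rest, instr(rest, ',') + 1)
        FROM split
        WHERE rest <> ''
    )
    INSERT INTO person_area (person_id, area_id)
    SELECT DISTINCT s.person_id, a.id
    FROM split s
    JOIN area a ON lower(a.description) = lower(s.item)
    WHERE s.item <> ''
""")
conn.commit()

result = conn.execute(
    "SELECT person_id, area_id FROM person_area ORDER BY person_id, area_id").fetchall()
print(result)
```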
I have a database of substrings generated from a list of words. I'm performing a comparison to retrieve all words that share substrings with some input word.
'word_substrings' table format and example (for the word 'aback'):
id (primary key)  word_id (foreign key)  word_substring (char(3))
30  4  "  a"
31  4  " ab"
32  4  "aba"
33  4  "bac"
34  4  "ack"
35  4  "ck "
36  4  "k  "
Here word_id is the key of the word in a separate table of words.
I've tried an equivalence:
select distinct t1.word_id
from word_substrings t1, word_substrings t2
where t1.word_substring = t2.word_substring
and t2.word_id = [some word_id]
As well as a table join:
select distinct t1.word_id
from word_substrings as t1
join word_substrings as t2
on t1.word_substring = t2.word_substring
where t2.word_id = [some word_id]
However, both queries take about 10 seconds to return results.
Given that the table of words and the table of word_substrings are both liable to change, but the data will be retrieved very regularly, I tried creating a view to improve query times. However, I saw no noticeable change in response times.
My list of words is currently 40k rows and my list of substrings is approximately 400k rows.
Does anyone have any ideas on how to either optimize the query, or to reformat the database to improve return times?
I've considered generating a table with a column for every possible substring and registering each word in the appropriate columns, but I don't quite know how that would work.
I thank you for all your help! If there is any information that I neglected to include, I will be happy to retrieve that data for you.
NOTE: If it is pertinent information, this is for a Django web application.
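You need indexes on word_id and word_substring (and, if you can, declare both columns NOT NULL).
A composite index on (word_id, word_substring) serves queries that filter on word_id alone as well as those that filter on both word_id and word_substring. The self-join, however, also looks rows up by word_substring alone, and an index can only be used starting from its leftmost column, so add a second index that starts with word_substring, e.g. (word_substring, word_id).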
Cheers.
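A small sketch of the two-index setup, assuming SQLite via Python's sqlite3 module rather than the Django project's actual database. The trigrams() helper pads each word with two spaces on each side, which is an assumption inferred from the 'aback' example, and the three words are made-up test data. EXPLAIN QUERY PLAN shows which indexes the planner picks; its exact wording varies between SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE word_substrings (
        id INTEGER PRIMARY KEY,
        word_id INTEGER NOT NULL,
        word_substring CHAR(3) NOT NULL
    );
    -- Serves the t2 side of the self-join (lookup by word_id).
    CREATE INDEX idx_word_sub ON word_substrings (word_id, word_substring);
    -- Serves the t1 side of the self-join (lookup by word_substring).
    CREATE INDEX idx_sub_word ON word_substrings (word_substring, word_id);
""")


def trigrams(word):
    # Hypothetical helper: pad with two spaces each side, take every 3-char window.
    padded = "  " + word + "  "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


words = {1: "aback", 2: "back", 3: "zzz"}
conn.executemany(
    "INSERT INTO word_substrings (word_id, word_substring) VALUES (?, ?)",
    [(wid, tri) for wid, w in words.items() for tri in trigrams(w)])

query = """
    SELECT DISTINCT t1.word_id
    FROM word_substrings AS t1
    JOIN word_substrings AS t2 ON t1.word_substring = t2.word_substring
    WHERE t2.word_id = ?
    ORDER BY t1.word_id
"""
similar = [r[0] for r in conn.execute(query, (1,))]
# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + query, (1,))]

print(similar)
for line in plan:
    print(line)
```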