Selecting values where the first 6 characters match - sql-server-2008

I'm trying to pull a list of IDs from a table Company where the first 6 characters of the ID are the same. The way our application creates a company ID is it takes the first 3 characters of the company name and the first 3 characters of the city. Because of that, over time we end up with company IDs that share the same first 6 characters, followed by a sequential number...
I was thinking of something using LIKE:
Select companyID, companyName from Company Where
substring(companyID,1,6)+'%' like substring(companyID,1,6)+'%'
Basically I'm trying to get all company IDs where the first 6 characters match. The result set should show just the top company ID (the first one created) and the company name. I'm not expecting a ton of results, so I can then use the IDs returned to find the IDs below them.
I'm thinking it could maybe also be done using HAVING, grouping on the first 6 characters and keeping only the groups where COUNT(*) > 1?
Not really sure what the syntax would be...
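For illustration, a rough sketch of the GROUP BY / HAVING idea might look like this (assuming the lowest companyID within a prefix is the first one created; joining the prefix back to Company would then give the matching company name):
SELECT SUBSTRING(companyID, 1, 6) AS idPrefix,
       MIN(companyID)             AS firstCompanyID,        -- the first ID created for this prefix
       COUNT(*)                   AS companiesSharingPrefix
FROM dbo.Company
GROUP BY SUBSTRING(companyID, 1, 6)
HAVING COUNT(*) > 1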

SELECT distinct c1.CompanyID, c1.CompanyName, c2.CompanyID, c2.CompanyName
FROM dbo.Company c1
JOIN dbo.Company c2
ON SUBSTRING(c1.CompanyName,1,6) = SUBSTRING(c2.CompanyName,1,6)
AND c1.CompanyID < c2.CompanyID
order by c1.CompanyName, c2.CompanyName

SELECT c1.CompanyID, c1.CompanyName, c2.CompanyID, c2.CompanyName
FROM dbo.Company c1
INNER JOIN dbo.Company c2
ON SUBSTRING(c1.CompanyName,1,6) + '%' LIKE SUBSTRING(c2.CompanyName,1,6) + '%'
AND c1.CompanyID <> c2.CompanyID

If this is something that you envision doing frequently, I'd add a computed column to the table that has a definition of substring(CompanyName, 1, 6). You can then index it and make this efficient. As it is, it will have to scan all the entries and calculate the substring on the fly. With the computed column, you amortize the substring calculation up front and at least have a chance at an efficient query.
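A minimal sketch of that suggestion (the column and index names here are invented; the same pattern works on CompanyID if that is the prefix you actually query on):
ALTER TABLE dbo.Company ADD CompanyNamePrefix AS SUBSTRING(CompanyName, 1, 6) PERSISTED;
CREATE INDEX IX_Company_CompanyNamePrefix ON dbo.Company (CompanyNamePrefix);
-- queries can then group or join on the indexed prefix instead of computing SUBSTRING on the fly,
-- e.g. SELECT CompanyNamePrefix, COUNT(*) FROM dbo.Company GROUP BY CompanyNamePrefix HAVING COUNT(*) > 1;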

After trying to use Blam's script, I made a few slight changes and got better results. His script was returning more results than there are rows in the table, and it was pretty slow; I think that's because of the company_name column. I got rid of it and wrote it like this:
select distinct c1.cmp_id, count(substring(c2.cmp_id,1,6)) as TotalCount
from company c1
join company c2 on substring(c1.cmp_id,1,6)=substring(c2.cmp_id,1,6)
group by c1.cmp_id
order by c1.cmp_id asc
This still returns all the table records, but at least I can see the total count when the first 6 characters appear more than once. Also, it ran in only one second, so that's a plus. Thanks again for your input, guys; always appreciated!

Related

MYSQL Group By Returning Duplicate Values

I am seeing a weird problem with MYSQL GROUP BY.
I have a query...
SELECT schools.schoolregion,
Count(schools.schoolregion) AS regioncount,
(
SELECT Count(jobs_jobsubject)
FROM `jobs`
WHERE `jobs_createdDate` BETWEEN '$startofyear'
AND '$endofyear') AS regionjobstotal
FROM `jobs`
LEFT JOIN `schools`
ON `jobs_schoolID`=`SID`
WHERE `jobs_createdDate` BETWEEN '$startofyear'
AND '$endofyear'
GROUP BY `schoolRegion`
...in which I am attempting to total the number of job postings listed per region and group by region. I have two tables, one with a list of schools and another with job information that has a column value that joins back to the school. I need the region total, and the overall total of jobs within a time period (hence the sub query).
When I run this query, I get everything that I expect - except that I am getting a duplicate region listing in the returned results of the GROUP BY function.
For example, here is the table that I am getting; I'm not sure why there is a duplicate for the Middle East.
schoolRegion regioncount regionjobstotal
Africa 1 38
Asia 6 38
Middle East 20 38
Middle East 11 38
I thought maybe there was an extra character or something, but I could not find/see anything different about the values within the tables - which for that column is being stored as type "text". Is there anything I can check for? Is it something to do with the query?
Any help would be fantastic and much appreciated!!
My guess is that the data is not ordered by schoolRegion. I would add an ORDER BY schoolRegion ASC to your query to ensure that they are organized thusly. :)
OMG, do I feel like a noob!!
When I adjusted the query to list the schools, there was only one school that was not included in the GROUP BY. Initially, when I looked at this hours ago, inline editing in phpMyAdmin didn't show that there was a carriage return AFTER the text, so I wrote off the idea that it was the text of the stored value. But when I checked the box to edit the row individually rather than inline and went to that column value, lo and behold: a carriage return!!! Sometimes it's the little things like that which kill and humble me.
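For anyone hitting the same thing, a cleanup roughly like this would strip stray carriage returns and line feeds from that column (MySQL with its default backslash escapes; worth checking the affected rows with a SELECT first):
UPDATE schools
SET schoolRegion = REPLACE(REPLACE(schoolRegion, '\r', ''), '\n', '')
WHERE schoolRegion LIKE '%\r%' OR schoolRegion LIKE '%\n%';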
First, I do not think you can supply a child SELECT statement as a column in your parent SELECT statement: "(SELECT COUNT(jobs_jobSubject)...".
Also, since the WHERE clause for your child and parent SELECT is the same, why not use a single SELECT statement and get both counts?
SELECT schools.schoolRegion,
COUNT(schools.schoolRegion) AS regioncount,
COUNT(jobs_jobSubject) AS regionjobstotal
FROM `jobs` jb
INNER JOIN `schools` sc ON jb.jobs_schoolID=sc.SID
WHERE `jobs_createdDate`
BETWEEN '$startofyear' AND '$endofyear'
GROUP BY `schoolRegion`

How to convert list field into many-to-many table in MySQL

I have inherited a database in which a person table has a field called authorised_areas. The front end allows the user to choose multiple entries from a pick list (populated with values from the description field of the area table) and then sets the value of authorised_areas to a comma-delimited list. I am migrating this to a MySQL database and while I'm at it, I would like to improve the database integrity by removing the authorised_areas field from the person table and create a many-to-many table person_area which would just hold pairs of person-area keys. There are several hundred person records, so I would like to find a way to do this efficiently using a few MySQL statements, rather than individual insert or update statements.
Just to clarify, the current structure is something like:
person
id name authorised_areas
1 Joe room12, room153, 2nd floor office
2 Anna room12, room17
area
id description
1 room12
2 room17
3 room153
4 2nd floor office
...but what I would like is:
person
id name
1 Joe
2 Anna
area
id description
1 room12
2 room17
3 room153
4 2nd floor office
person_area
person_id area_id
1 1
1 3
1 4
2 1
2 2
There is no reference to the area id in the person table (and some text values in the lists are not exactly the same as the description in the area table), so this would need to be done by text or pattern matching. Would I be better off just writing some PHP code to split the strings, find the matches and insert the appropriate values into the many-to-many table?
I'd be surprised if I were the first person to have to do this, but a Google search didn't turn up anything useful (perhaps I didn't use the appropriate search terms?). If anyone could offer suggestions for a way to do this efficiently, I would very much appreciate it.
While it is possible to do this I would suggest that as a one off job it would probably be quicker to knock up a php (or your favorite scripting language) script to do it with multiple inserts.
If you must do it in a single statement then have a table of integers (0 to 9, cross join against itself to get as big a range as you need) and join this against your original table, using string functions to get the Xth comma and from that each of the values for each row.
Possible, and I have done it but mainly to show that having a delimited field is not a good idea. It would likely be FAR quicker to knock up a script with multiple inserts.
You could base an insert on something like this SELECT (although this also comes up with a blank line for each person as well as the relevant ones, and will only cope with up to 1000 authorised areas per person)
SELECT z.id, z.name, x.an_authorised_area
FROM person z
LEFT OUTER JOIN (
SELECT DISTINCT a.id, SUBSTRING_INDEX( SUBSTRING_INDEX( authorised_areas, ",", b.ournumbers ) , ",", -1 ) AS an_authorised_area
FROM person a, (
SELECT hundreds.i *100 + tens.i *10 + units.i AS ournumbers
FROM integers AS hundreds
CROSS JOIN integers AS tens
CROSS JOIN integers AS units
)b
)x ON z.id = x.id
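Building on that, actually populating person_area might look roughly like this (assuming the integers table of digits 0-9 already exists, that person_area has already been created, and that the split values match area.description exactly after trimming):
INSERT INTO person_area (person_id, area_id)
SELECT DISTINCT p.id, a.id
FROM person p
CROSS JOIN (
    -- numbers 0-999, same trick as above
    SELECT hundreds.i * 100 + tens.i * 10 + units.i AS n
    FROM integers AS hundreds
    CROSS JOIN integers AS tens
    CROSS JOIN integers AS units
) nums
JOIN area a
  ON a.description = TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(p.authorised_areas, ',', nums.n), ',', -1))
WHERE nums.n >= 1;
The matching caveat from the question still applies: where the list text differs from area.description, a LIKE comparison or manual correction is still needed.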

Finding and dealing with duplicate users

In a large user database with the following format and sample data, we are trying to identify duplicated people:
id first_name last_name email
---------------------------------------------------
1 chris baker
2 chris baker chris#gmail.com
3 chris baker chris#hotmail.com
4 chris baker crayzyguy#crazy.com
5 carl castle castle#npr.org
6 mike rotch fakeuser#sample.com
I am using the following query:
SELECT
GROUP_CONCAT(id) AS "ids",
CONCAT(UPPER(first_name), UPPER(last_name)) AS "name",
COUNT(*) AS "duplicate_count"
FROM
users
GROUP BY
name
HAVING
duplicate_count > 1
This works great; I get a list of duplicates with the id numbers of the involved rows.
We would re-assign any associated data tied to a duplicate to the actual person (set user_id = 2 where user_id = 3), then we delete the duplicating user row.
The trouble comes after we make this report the first time, as we clean up the list after manually verifying that they are indeed duplicates -- some ARE NOT duplicates. There are 2 Chris Bakers that are legitimate users.
We don't want to keep seeing Chris Baker in subsequent duplicate reports until the end of time, so I am looking for a way to flag that user id 1 and user id 4 are NOT duplicates of each other for future reports, but they could be duplicated by new users added later.
What I tried
I added a is_not_duplicate field to the user table, but then if a new duplicate "Chris Baker" gets added to the database, it will cause this situation to not show on the duplicate report; the is_not_duplicate improperly excludes one of the accounts. My HAVING statement would not meet the > 1 threshold until there are -two- duplicates of Chris Baker, plus the "real" one marked is_not_duplicate.
Question Summed Up
How can I build exceptions into the above query without looping results or multiple queries?
Sub-queries are fine, but the size of the dataset makes every query count and I'd like the solution to be as performant as possible.
Try to add the is_not_duplicate boolean field and modify your code as follows:
SELECT
GROUP_CONCAT(id) AS "ids",
CONCAT(UPPER(first_name), UPPER(last_name)) AS "name",
COUNT(*) AS "duplicate_count",
SUM(is_not_duplicate) AS "real_count"
FROM
users
GROUP BY
name
HAVING
duplicate_count > 1
AND
duplicate_count - real_count > 0
Newly added duplicates will have is_not_duplicate=0 so the real_count for that name will be less than duplicate_count and the row will be shown
My brain is too fried to come up with the actual query for this at the moment, but I might be able to give you a nudge in a path that should work :)
What if you added another column (or maybe a table of valid duplicated users instead; both accomplish the same thing) and ran a subquery that counts up all of the valid duplicates, so that you could compare against the count in your current query? You would exclude any users that have matching counts, and would pull in any with counts that are higher. Hopefully that makes sense; I will create a use case:
Chris Baker with id 1 and 4 are marked as valid_duplicates
There are 4 Chris Baker's in the system
You get a count of valid Chris Baker's
You get a count of all Chris Baker's
valid_count <> total_count, so return Chris Baker
*You could probably even modify the query so that it does not list the duplicate IDs at all (even if you get a duplicate marking of only one ID), rather than having to re-check which ones are valid. That would be a little more complicated; without it, you at least ignore Chris Baker until another one enters the system.
I have written up the basic query; excluding specific IDs is something I will try to roll in tonight. But this at least solves your initial need. If you do not need the more complicated query, do let me know so that I do not waste my time on it :)
SELECT
GROUP_CONCAT(id) AS "ids",
CONCAT(UPPER(first_name), UPPER(last_name)) AS "name",
COUNT(*) AS "duplicate_count"
FROM
users
WHERE NOT EXISTS
(
SELECT 1
FROM
(
SELECT
CONCAT(UPPER(first_name), UPPER(last_name)) AS "name",
COUNT(*) AS "valid_duplicate_count"
FROM
users
WHERE
is_valid_duplicate = 1 -- true
GROUP BY
name
HAVING
valid_duplicate_count > 1
) AS duplicate_users
WHERE
duplicate_users.name = users.name
AND valid_duplicate_count = duplicate_count
)
GROUP BY
name
HAVING
duplicate_count > 1
Below is a query that should do the same as above, but the final list will only print the IDs that are not in the valid list. This actually ended up being a lot simpler than I thought. It is mostly the same as above; the only reason I kept the version above is to preserve both options, and in case I messed this one up, since it gets complicated with so many nested queries. If CTEs or even temp tables are available to you, it might make the query more expressive to break it up into temp tables (a rough temp-table sketch follows the query below). Hopefully this helps and is what you are looking for.
SELECT GROUP_CONCAT(id) AS "ids",
CONCAT(UPPER(first_name), UPPER(last_name)) AS "name",
COUNT(*) AS "final_duplicate_count"
-- This count could actually be 1 due to the nature of the query
FROM
users
-- get the list of duplicated user names
WHERE EXISTS
(
SELECT
CONCAT(UPPER(first_name), UPPER(last_name)) AS "name",
COUNT(*) AS "total_duplicate_count"
FROM
users AS total_dup_users
-- ignore valid_users whose count still matches
WHERE NOT EXISTS
(
SELECT 1
FROM
(
SELECT
CONCAT(UPPER(first_name), UPPER(last_name)) AS "name",
COUNT(*) AS "valid_duplicate_count"
FROM
users AS valid_users
WHERE
is_valid_duplicate = 1 -- true
GROUP BY
name
HAVING
valid_duplicate_count > 1
) AS duplicate_users
WHERE
-- join inner table to outer table
duplicate_users.name = total_dup_users.name
-- valid count check
AND valid_duplicate_count = total_duplicate_count
)
-- join inner table to outer table
AND total_dup_users.Name = users.Name
GROUP BY
name
HAVING
total_duplicate_count > 1
)
-- ignore users that are valid when doing the actual counts
AND NOT EXISTS
(
SELECT 1
FROM users AS valid
WHERE
-- join inner table to outer table
users.name =
CONCAT(UPPER(valid.first_name), UPPER(valid.last_name))
-- only valid users
AND valid.is_valid_duplicate = 1 -- true
)
GROUP BY
name
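As mentioned above, a temp-table version of the same idea is a bit easier to follow; a rough sketch (MySQL syntax, same assumed is_valid_duplicate flag) could be:
CREATE TEMPORARY TABLE valid_duplicate_names AS
SELECT CONCAT(UPPER(first_name), UPPER(last_name)) AS name,
       COUNT(*) AS valid_duplicate_count
FROM users
WHERE is_valid_duplicate = 1
GROUP BY name;

SELECT GROUP_CONCAT(u.id) AS ids,
       CONCAT(UPPER(u.first_name), UPPER(u.last_name)) AS name,
       COUNT(*) AS duplicate_count
FROM users u
LEFT JOIN valid_duplicate_names v
  ON v.name = CONCAT(UPPER(u.first_name), UPPER(u.last_name))
GROUP BY CONCAT(UPPER(u.first_name), UPPER(u.last_name))
-- show a name only when there are more rows than have already been confirmed as valid
HAVING duplicate_count > 1
   AND duplicate_count > COALESCE(MAX(v.valid_duplicate_count), 0);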
Since this is basically a many-to-many relationship I would add a new table not_duplicate with fields user1 and user2.
I would probably add two rows for each not_duplicate relationship such that I have one row for 2 -> 3 and a symmetric row for 3 -> 2 to ease querying, but that may introduce data inconsistencies so make sure you delete both rows at the same time (or have only one row and make the correct query in your script).
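A minimal sketch of that table (names purely for illustration):
CREATE TABLE not_duplicate (
    user1 INT NOT NULL,
    user2 INT NOT NULL,
    PRIMARY KEY (user1, user2)
);

-- store both directions so the exclusion check stays a simple equality join,
-- and always insert/delete the pair together
INSERT INTO not_duplicate (user1, user2) VALUES (1, 4), (4, 1);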
Well, it seems to me that the is_not_duplicate column is not complex enough to hold the information you want to store: from what I understand, you want to manually tell your detection that two distinct users are not duplicates of each other. So either you create a column like is_not_duplicate_of=other-user-id, or, if you want to keep open the possibility that one user can be manually defined as not a duplicate of more than one user, you need a separate table with two user-id columns.
The query telling you the non-overridden duplicates probably has to be a bit more complex than the one you suggested; I cannot think of one that works with GROUP BY and HAVING logic. The only thing that comes to my mind is something like
SELECT u1.* FROM users u1
INNER JOIN users u2
ON u1.id <> u2.id
AND u2.name = u1.name
WHERE NOT EXISTS (
SELECT *
FROM users_non_dups un
WHERE (un.id1 = u1.id AND un.id2 = u2.id)
OR (un.id1 = u2.id AND un.id2 = u1.id)
)
If you were to correct all duplicates each time you run the report, then a very simple solution might be to modify the query:
SELECT
GROUP_CONCAT(id) AS "ids",
MAX(id) AS "max_id",
CONCAT(UPPER(first_name), UPPER(last_name)) AS "name",
COUNT(*) AS "duplicate_count"
FROM
users
GROUP BY
name
HAVING
duplicate_count > 1
AND
max_id > MAX_ID_LAST_TIME_DUPLICATE_REPORT_WAS_GENERATED;
I would go ahead and make the "confirmed_unique" column, defaulted to "False."
Then, in order to avoid the problems you mentioned, I would select all elements that may look like duplicates and have a "False" entry for "confirmed_unique."
I am not sure if this will work, but could you consider the reverse logic of adding an *is_duplicate_of* column? That way you can mark duplicates by entering in this column the ID of the first record, which will be greater than zero. The records that you wish to retain will have a value of 0 in this field. You can set the default (unchecked records) to -1 to keep track of the validation status of each record.
Afterwards you can keep executing a SQL query that compares new records only with the correct records having is_duplicate_of = 0.
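Sketched out, that might look like this (the -1 default marks records that have not been validated yet):
ALTER TABLE users ADD COLUMN is_duplicate_of INT NOT NULL DEFAULT -1;

-- record 2 is confirmed as the one to keep ...
UPDATE users SET is_duplicate_of = 0 WHERE id = 2;
-- ... and record 3 is marked as a duplicate of it
UPDATE users SET is_duplicate_of = 2 WHERE id = 3;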
If you are OK with making a slight change to the format of the report, you could do a self-join like this:
SELECT
CONCAT(u1.id,",", u2.id) AS "ids",
CONCAT(UPPER(u1.first_name), UPPER(u1.last_name)) AS "name"
FROM
users u1, users u2
WHERE
u1.id < u2.id AND
UPPER(u1.first_name) = UPPER(u2.first_name) AND
UPPER(u1.last_name) = UPPER(u2.last_name) AND
CONCAT(u1.id,",", u2.id) NOT IN (SELECT ids from not_dupe)
which reports duplicates as follows:
ids | name
----|--------
1,2 | CHRISBAKER
1,3 | CHRISBAKER
...
And the not_dupe table would have rows like below:
ids
------
1,2
3,4
...
I think it would make sense to create a lookup table storing the IDs of the ones that are not duplicates. Confirmed non-duplicates are thus removed, and the query only has to add a small lookup for duplicates actually found in the lookup table.
For instance, in this example we would have:
id 1 | id 2
2 4
if crayzyguy#crazy.com and chris#gmail.com are different people.
If I were you, I would add some geolocalization tables/fields to my database schema.
The probability that two end users have the same name AND live in the same place is very, very low (except in very big towns), but you can also split geolocalization into small areas; it's a question of granularity.
Good luck.
I would suggest you create a couple of things:
A Boolean column to flag confirmed users
A String column to save ids
A trigger that will check whether the first name and last name are already there, set the flag, and save in the string column the IDs of all records of which this one is a possible duplicate.
Then build a report that looks for rows where the flag is true and decodes the string field to match the possible duplicates.
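Very roughly, and only as a sketch of that idea (MySQL trigger syntax; the possible_duplicate and possible_duplicate_ids column names are invented for this example):
DELIMITER $$
CREATE TRIGGER users_flag_possible_duplicates
BEFORE INSERT ON users
FOR EACH ROW
BEGIN
    DECLARE matching_ids TEXT;
    -- collect the ids of existing users with the same (case-insensitive) name
    SELECT GROUP_CONCAT(id) INTO matching_ids
    FROM users
    WHERE UPPER(first_name) = UPPER(NEW.first_name)
      AND UPPER(last_name) = UPPER(NEW.last_name);
    IF matching_ids IS NOT NULL THEN
        SET NEW.possible_duplicate = 1;       -- the Boolean flag column
        SET NEW.possible_duplicate_ids = matching_ids;  -- the String column of ids
    END IF;
END$$
DELIMITER ;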
I gave Justin Pihony +1 as the 1st to suggest comparing the duplicate count with the not duplicate count, and Hrant Khachatrian +1 for being the 1st to show an efficient way of doing that.
Here is a slightly different method, plus some renaming to make everything a bit more self explanatory, plus some extra columns in the query to make it obvious which records need to be compared as potential duplicates.
I would call the new column "CONFIRMED_UNIQUE" instead of "IS_NOT_DUPLICATE". Like Hrant I would make it Boolean (tinyint(1) with 0=FALSE and 1=TRUE).
The "potential_duplicate_count" is the maximum number of records that would have to be deleted.
select
group_concat(case when not confirmed_unique then id end) as potential_duplicate_ids,
group_concat(case when confirmed_unique then id end) as confirmed_unique_ids,
concat(upper(first_name), upper(last_name)) as name,
sum( case when not confirmed_unique then 1 end ) - (not max(confirmed_unique)) as potential_duplicate_count
from
users
group by
name
having
potential_duplicate_count > 0
I see someone else has been voted down for the suggestion of merging, but nothing in your problem statement says the data needs to be fixed in place. The OP followed up with their own solution, which happens to be a pure SQL one, but that doesn't imply that every solution needs to be limited to that.
The issue as I understand it is that contacts have multiple, similar, but not necessarily identical records in your database, which has cost and reputational implications, so you're looking to deduplicate these records.
I would write a batch job that searches for potential duplicates (this can be as complicated or as simple as you like), closes the two records that it finds are dupes, and creates a new record.
To enable that you'd need four new columns:
Status, which would be either Open, Merged, Split
RelatedId, which would hold the value of who the record was merged with
ChainId, the new record Id
DateStatusChanged, obvious enough
Open would be the default status
Merged would be when the record is merged (effectively closed and replaced)
Split would be if the merge was reversed
So, as an example, go through all of the records that, for example, have the same name. Merge them in pairs. So if you have three Chris Bakers, records 1, 2 and 3, merge 1 and 2 to make record 4 and then 3 and 4 to make record 5. Your table would end up something like:
ID NAME STATUS RELATEDID CHAINID DATESTATUSCHANGED [other rows omitted]
1 Chris Baker MERGED 2 4 27-AUG-2012
2 Chris Baker MERGED 1 4 27-AUG-2012
3 Chris Baker MERGED 4 5 28-AUG-2012
4 Chris Baker MERGED 3 5 28-AUG-2012
5 Chris Baker OPEN
This way you have a full record of what has happened to your data and can reverse any changes by unmerging. If, for example, contacts 1 and 2 weren't the same, you reverse the merge of 3 and 4, then reverse the merge of 1 and 2, and you'd end up with this:
ID NAME STATUS RELATEDID CHAINID DATESTATUSCHANGED
1 Chris Baker SPLIT 2 4 29-AUG-2012
2 Chris Baker SPLIT 1 4 29-AUG-2012
3 Chris Baker SPLIT 4 5 29-AUG-2012
4 Chris Baker CLOSED 3 5 29-AUG-2012
5 Chris Baker CLOSED 29-AUG-2012
You could then manually merge, as you'd probably not want your job to automatically remerge split records.
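A minimal sketch of the extra columns described above (MySQL syntax; names follow the example tables):
ALTER TABLE users
    ADD COLUMN status VARCHAR(10) NOT NULL DEFAULT 'OPEN',  -- OPEN / MERGED / SPLIT (or CLOSED)
    ADD COLUMN related_id INT NULL,                          -- the record this one was merged with
    ADD COLUMN chain_id INT NULL,                            -- the id of the replacement record
    ADD COLUMN date_status_changed DATE NULL;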
Is there a good reason for not merging duplicate accounts into a single account?
From the comments, it seems like the information is being used mostly for contact information, so merging should be relatively painless and low risk. Once you merge users they will no longer appear in your duplicate report. Furthermore, your users table will actually shrink, which could help with performance.
Add an is_not_duplicate column of datatype bit to your table and, after setting the is_not_duplicate values, use the query below:
SELECT GROUP_CONCAT(id) AS "ids",
CONCAT(UPPER(first_name), UPPER(last_name)) AS "name"
FROM users
GROUP BY name
HAVING COUNT(*) > SUM(CAST(is_not_duplicate AS SIGNED))
The query above compares the total number of rows for each name with the number of rows already confirmed as not duplicates.
Why don't you make the email column a unique identifier in this case? After you cleanse your records once, you simply do not allow duplicates from then onwards.
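For example, after the cleanup something like this would enforce it (MySQL will reject the change while duplicate non-NULL emails still exist, and rows with a NULL email are still allowed more than once):
ALTER TABLE users ADD UNIQUE KEY uq_users_email (email);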

Query MySQL for rows that share a value, and returning them as columns?

This is for a homework assignment. I haven't copy-pasted the question below; I made a simpler version of it that focuses on the specific area where I'm stuck.
Let's say I have a table of two values: a person's name, and the place he had lunch yesterday. Assume everyone has lunch in pairs. How can I query the database to return all the pairs of people that had lunch together yesterday? Each pair must be only listed once.
I'm actually not even sure what the professor means by return them as pairs. I've sent him an email, but no reply yet. It seems like he wants me to write a query that returns a table with column 1 as person 1 and column 2 as person 2.
Any suggestions on how to go about this? Does it seem right to assume he wants them as separate columns?
So far, I basically have:
SELECT name, restaurant FROM lunches GROUP BY restaurant, name
which essentially just reorganizes the table so that the people who had lunch together are one after the other.
We have to assume there can be only one pair eating lunch in a given restaurant.
You can get a list of pairs either using self-join:
SELECT l1.name, l2.name FROM lunches l1
JOIN lunches l2
ON l1.restaurant = l2.restaurant AND l1.name < l2.name
or using GROUP BY:
SELECT GROUP_CONCAT(name) FROM lunches
GROUP BY restaurant
The first query will return pairs in two different columns, while the second in one column, using comma as separator (default for GROUP_CONCAT, you can change it to whatever you wish).
Also note that for the first query names in pairs will come in alphabetical order as we use < instead of <> to avoid listing each pair twice.

MySQL join count from one table with ids from another

I have two tables that make up a full text index of article content for search purposes. One of the tables is just a primary key associated with a word, whereas the other records the article it occurred in and its location in the document. A single word can conceivably appear many times in the same document with different locations, so the same word id can occur several times in the word_locations table.
Here are the structures:
words:
id bigint
word tinytext
word_location:
id bigint(20)
wordid bigint(20)
location int(11)
article_id int(11)
What I need to write is a query that will find the count of occurrences of each word for any one article. I need to preserve a zero value for wordids that don't appear at all, so I assume this needs to be a left join. However, whenever I try to add a WHERE clause to limit by article, any wordids that don't appear at all are not included in the result set.
I have tried:
select words.wordid, COUNT(word_location.wordid) as appears from words left join word_location on word.id = word_location.wordid where article_id = %s GROUP BY wordid
But this query does not return zeros for words that don't appear at all.
How can I modify this left join?
Thanks in advance!
EDIT:
Here is an example data set and the result sets for the different queries.
Example article content:
Bob's Restaurant is one of the finest restaurants in greater
County where you can enjoy the finest Turkish Cuisine.
So the vocabulary table, after being adjusted by the application to exclude stop words, will have rows for Bob, Restaurant, finest, greater, county, enjoy, Turkish, and cuisine. (I'm using this actual article since it's the first in the set, so the ids actually appear starting from integer 1.)
The query provided by Mark Bannister produces this result set:
wordid - word - occurrences:
128 clifton 0
1 bob's 2
2 restaurant 2
3 one 1
4 finest 3
5 restaurants 2
6 greater 1
9 county 1
12 enjoy 3
13 turkish 6
14 cuisine 1
The result set is correct per se, but id 128 doesn't appear in the document at all and is the only entry in the result set with an occurrence count of 0. The goal is to have the entire vocabulary returned with the number of occurrences in the document (roughly 2,500 different words).
My original problematic query from before the edit above actually returned the same result set, but without ANY zero-occurrence rows at all.
You need to include your article selection in your join condition:
select words.wordid, COUNT(word_location.wordid) as appears
from words
left join word_location on words.id = word_location.wordid and article_id = ?
GROUP BY wordid
Including the restriction on article_id in the WHERE clause effectively turns your left join back into an inner join.
I would use a subselect instead of a join.
SELECT words.id, (SELECT count(*) FROM word_location WHERE word_location.wordid = words.id) as appears
Bit of a guess this one, but I think COUNT() is just disregarding your nulls, not COUNTing them and arriving at 0. (NULL + NULL != 0)
Look at the IFNULL() function, you might be able to do something like:
COUNT(IFNULL(word_location.wordid, 0))
(Disclaimer - I am more used to Oracle's NVL(, ) function hence this is a little speculative!)