Joined query missing records - mysql

I have a database that contains a people table and a names table. Each person has at least one record in the names table: one of them is set as the 'person_default_name_id' for that person, and the others are variations of that name in different languages. The user who looks up the table has a preferred language (e.g. English, Spanish, Russian) and a preferred script derived from that language (e.g. "Latin" if the preferred language is English or Spanish, "Cyrillic" if it is Russian). I want to display a list of names with exactly one name per person, chosen as the best fit for the user's language and script.
The code below is what I'm trying:
SELECT
people.person_id,
names.name
FROM
`people`
LEFT JOIN
`names` ON names.person_id=people.person_id
LEFT JOIN
`languages` ON names.language_id = languages.language_id
LEFT JOIN
`language_scripts` ON languages.language_id = language_scripts.language_id
WHERE
(
/* 1st preference - display the default name for the person IF the default name's language writing system matches the user's writing system */
(people.person_default_name_id=names.name_id AND language_scripts.script_id = :user_script_id)
OR
/* 2nd preference - display the alternative name in the user's chosen language if an alternative name exists in that language */
names.language_id = :user_language_id
OR
/* 3rd preference - display the alternative name in the user's chosen writing system if an alternative name exists in that writing system */
language_scripts.script_id = :user_script_id
)
GROUP BY
people.person_id
ORDER BY
names.name ASC
Example data is below:
Table: people
person_id | person_default_name_id
------------------------------------
1 | 2
Table: names
name_id | name | person_id | language_id
--------------------------------------------
1 | George | 1 | 1
2 | Jorge | 1 | 2
3 | Джордж | 1 | 3
Table: languages
language_id | language
------------------------
1 | English
2 | Spanish
3 | Russian
Table: language_scripts
language_script_id | language_id | script_id
----------------------------------------------
1 | 1 | 1
2 | 2 | 1
3 | 3 | 2
Table: scripts
script_id | script
----------------------
1 | Latin
2 | Cyrillic
I'm finding that some of the expected records are not coming through. I'm guessing that there are improvements I could make to my query, but my skills are not quite advanced enough to know the best path. Can anyone see what I'm doing wrong?

I would suggest moving your WHERE clause conditions into the SELECT list and returning a "score" for each record. Removing them from the WHERE clause entirely may also show you why records are missing: those records will come back with a score of 0.
CASE
WHEN condition THEN 5
WHEN condition THEN 4
/* etc... */
ELSE 0
END AS score
Once you have your results scored, you can order by your score descending and take the first one per person. Or add additional outer queries to only return the rows having the max score per person.
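As a minimal sketch of this scoring idea, the snippet below runs it against the sample data from the question. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The score values 3/2/1, the table aliases, and the best_names helper are illustrative assumptions rather than part of the answer.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (person_id INTEGER PRIMARY KEY, person_default_name_id INTEGER);
CREATE TABLE names (name_id INTEGER PRIMARY KEY, name TEXT, person_id INTEGER, language_id INTEGER);
CREATE TABLE language_scripts (language_script_id INTEGER PRIMARY KEY, language_id INTEGER, script_id INTEGER);
INSERT INTO people VALUES (1, 2);
INSERT INTO names VALUES (1, 'George', 1, 1), (2, 'Jorge', 1, 2), (3, 'Джордж', 1, 3);
INSERT INTO language_scripts VALUES (1, 1, 1), (2, 2, 1), (3, 3, 2);
""")

# The three preferences from the question become score values instead of WHERE filters.
SCORED = """
SELECT p.person_id, n.name,
       CASE
         WHEN p.person_default_name_id = n.name_id
              AND ls.script_id = :user_script_id THEN 3
         WHEN n.language_id = :user_language_id THEN 2
         WHEN ls.script_id = :user_script_id THEN 1
         ELSE 0
       END AS score
FROM people p
LEFT JOIN names n ON n.person_id = p.person_id
LEFT JOIN language_scripts ls ON ls.language_id = n.language_id
ORDER BY p.person_id, score DESC, n.name_id
"""

def best_names(user_language_id, user_script_id):
    """Return {person_id: name}, keeping the highest-scoring name per person."""
    params = {"user_language_id": user_language_id, "user_script_id": user_script_id}
    best = {}
    for person_id, name, _score in conn.execute(SCORED, params):
        # Rows arrive sorted by score descending, so the first row per person wins.
        best.setdefault(person_id, name)
    return best
```

Because nothing is filtered out, a person whose names all score 0 still appears in the result, which makes it easier to see which records the original WHERE clause was dropping.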
Apologies for answering from my phone.

Related

mySql count all latest versions

I had a question up yesterday, but unfortunately I didn't explain myself well enough; it was one of those end-of-the-day things.
Anyway I have a table called documents...
+----+--------------------------------------+-----------+---------+---------+
| id | document_guid | title | version | payload |
+----+--------------------------------------+-----------+---------+---------+
| 1 | 0D2753BE-583B-42CE-B0DA-1FD0171D95C0 | animation | 1 | {} |
| 2 | 0D2753BE-583B-42CE-B0DA-1FD0171D95C0 | animation | 2 | {} |
| 3 | 1C2A1131-0261-4D58-81AA-EFAB5285B282 | formation | 1 | {} |
| 4 | 1E17403F-C590-4CE4-9E79-E1B7C98F97F1 | session | 1 | {} |
| 4 | 1E17403F-C590-4CE4-9E79-E1B7C98F97F1 | session | 2 | {} |
+----+--------------------------------------+-----------+---------+---------+
As you can see, we can have multiple versions of the same document (referenced by document_guid). What I need is a count of all documents in the table excluding obsolete versions, i.e. if the document 1E17403F-C590-4CE4-9E79-E1B7C98F97F1 has two versions, as in the example above, it should only count as one document in the overall total.
I really hope this makes more sense than my last question.
The main problem I have is that I also need a similar query that returns all the latest versions rather than just the count.
A useful query would probably look like:
-- select the maximum version (and other information, per group)
-- can also add a 'count(1) as version_count' if required
select max(version) as latest_version, title, document_guid
from documents
-- from each group, as divided up by the same guid *see note 1
group by document_guid, title
This query returns the latest version number; there is always "one latest version" per document.
Note 1: The title, which may be a break in normalization, needs to be part of the group for it to be included in the result columns; if not needed, it can be removed.
If the title is a required field that can change across versions then this needs to be written differently - first find the "latest version" and then join it back with the appropriate rows. An example:
select t.latest_version, d.title, d.document_guid
from documents d
join (
select max(version) as latest_version, document_guid
from documents
group by document_guid
) t
on t.document_guid = d.document_guid and t.latest_version = d.version
And of course this assumes a key of (document_guid, version).
To count distinct document_guids:
select count(distinct document_guid) from documents
To return the latest version of each document, you can either use a GROUP BY (as in user2864740's answer) or a NOT EXISTS:
select * from documents d1
where not exists (select 1 from documents d2
where d2.document_guid = d1.document_guid
and d2.version > d1.version)
I.e. return a row if no other row with the same document_guid has a higher version number.
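The three queries above can be checked side by side with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The GUIDs are shortened to their first eight characters for readability, and the variable names are illustrative assumptions.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; GUIDs are abbreviated from the question's table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (document_guid TEXT, title TEXT, version INTEGER, payload TEXT);
INSERT INTO documents VALUES
  ('0D2753BE', 'animation', 1, '{}'),
  ('0D2753BE', 'animation', 2, '{}'),
  ('1C2A1131', 'formation', 1, '{}'),
  ('1E17403F', 'session',   1, '{}'),
  ('1E17403F', 'session',   2, '{}');
""")

# Count each document once, regardless of how many versions it has.
document_count = conn.execute(
    "SELECT COUNT(DISTINCT document_guid) FROM documents"
).fetchone()[0]

# Latest version per document: find MAX(version) per guid, then join back.
latest_by_join = conn.execute("""
SELECT d.document_guid, d.title, d.version
FROM documents d
JOIN (SELECT document_guid, MAX(version) AS latest_version
      FROM documents
      GROUP BY document_guid) t
  ON t.document_guid = d.document_guid AND t.latest_version = d.version
ORDER BY d.document_guid
""").fetchall()

# Same result with NOT EXISTS: keep a row only if no higher version exists.
latest_by_not_exists = conn.execute("""
SELECT d1.document_guid, d1.title, d1.version
FROM documents d1
WHERE NOT EXISTS (SELECT 1 FROM documents d2
                  WHERE d2.document_guid = d1.document_guid
                    AND d2.version > d1.version)
ORDER BY d1.document_guid
""").fetchall()
```

Both forms rely on (document_guid, version) being unique; with duplicate versions for one guid, each would return more than one "latest" row for that document.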

export phpList subscribers via sql in mysql database

For some reason, I am unable to export a table of subscribers from my phpList (ver. 3.0.6) admin pages. I've searched on the web, and several others have had this problem but no workarounds have been posted. As a workaround, I would like to query the mySQL database directly to retrieve a similar table of subscribers. But I need help with the SQL command. Note that I don't want to export or backup the mySQL database, I want to query it in the same way that the "export subscribers" button is supposed to do in the phpList admin pages.
In brief, I have two tables to query. The first table, user, contains an ID and email for every subscriber. For example:
id | email
1 | e1@gmail.com
2 | e2@gmail.com
The second table, user_attribute contains a userid, attributeid, and value. Note in the example below that userid 1 has values for all three possible attributes, while userid's 2 and 3 are either missing one or more of the three attributeid's, or have blank values for some.
userid | attributeid | value
1 | 1 | 1
1 | 2 | 4
1 | 3 | 6
2 | 1 | 3
2 | 3 |
3 | 1 | 4
I would like to execute a SQL statement that would produce a row of output for each id/email that would look like this (using id 3 as an example):
id | email | attribute1 | attribute2 | attribute3
3 | e3@gmail.com | 4 | "" | ""
Can someone suggest SQL query language that could accomplish this task?
A related query I would like to run is to find all id/email that do not have a value for attribute3. In the example above, this would be id's 2 and 3. Note that id 3 does not even have a blank value for attributeid3, it is simply missing.
Any help would be appreciated.
John
I know this is a very old post, but I just had to do the same thing. Here's the query I used. Note that you'll need to modify the query based on the custom attributes you have set up. I had name, city and state, as shown in the AS clauses below; you'll need to map those to your own attribute ids. Also, the state attribute has a table of state names that I joined to. I excluded blacklisted (unsubscribed) users, users with more than 2 bounces, and unconfirmed users.
SELECT
users.email,
(SELECT value
FROM `phplist_user_user_attribute` attrs
WHERE
attrs.userid = users.id and
attributeid=1
) AS name,
(SELECT value
FROM `phplist_user_user_attribute` attrs
WHERE
attrs.userid = users.id and
attributeid=3
) AS city,
(SELECT st.name
FROM `phplist_user_user_attribute` attrs
LEFT JOIN `phplist_listattr_state` st
ON attrs.value = st.id
WHERE
attrs.userid = users.id and
attributeid=4
) AS state
FROM
`phplist_user_user` users
WHERE
users.blacklisted=0 and
users.bouncecount<3 and
users.confirmed=1
;
I hope someone finds this helpful.
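For the simplified two-table layout in the question, a minimal sketch using conditional aggregation (a different technique from the correlated subqueries above) could look like this. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The table names user and user_attribute follow the question's simplified example rather than the real phpList table names, and the variable names are assumptions.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; rows follow the question's example tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE user_attribute (userid INTEGER, attributeid INTEGER, value TEXT);
INSERT INTO user VALUES (1, 'e1@gmail.com'), (2, 'e2@gmail.com'), (3, 'e3@gmail.com');
INSERT INTO user_attribute VALUES
  (1, 1, '1'), (1, 2, '4'), (1, 3, '6'),
  (2, 1, '3'), (2, 3, ''),
  (3, 1, '4');
""")

# One output row per user; missing attributes become empty strings.
pivot = conn.execute("""
SELECT u.id, u.email,
       COALESCE(MAX(CASE WHEN a.attributeid = 1 THEN a.value END), '') AS attribute1,
       COALESCE(MAX(CASE WHEN a.attributeid = 2 THEN a.value END), '') AS attribute2,
       COALESCE(MAX(CASE WHEN a.attributeid = 3 THEN a.value END), '') AS attribute3
FROM user u
LEFT JOIN user_attribute a ON a.userid = u.id
GROUP BY u.id, u.email
ORDER BY u.id
""").fetchall()

# Users whose attribute 3 is either absent or blank.
missing_attribute3 = [row[0] for row in conn.execute("""
SELECT u.id
FROM user u
LEFT JOIN user_attribute a ON a.userid = u.id AND a.attributeid = 3
WHERE a.value IS NULL OR a.value = ''
ORDER BY u.id
""")]
```

Putting the attributeid = 3 test in the ON clause rather than the WHERE clause is what keeps users with no attribute 3 row at all in the result.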

How do I use mysql to match against multiple possibilities from a second table?

I'm not entirely sure how to ask this question, so I'll lead by providing an example table and an example output and then follow up with a more thorough explanation of what I'm attempting to accomplish.
Imagine that I have two tables. In the first is a list of companies. Some of these companies have duplicate entries due to being imported and continuously updated from different sources. For example, the company table may look something like this:
| rawName | strippedName |
| Kohl's | kohls |
| kohls.com | kohls |
| kohls Corporation | kohls |
So in this situation, we have information that has come in from three different sources. To allow my program to understand that all of these sources refer to the same store, I created the stripped name column (which I also use for creating URLs and whatnot).
In the second table, we have information about deals, coupons, shipping offers, etc. However, since these come in from their various sources, they end up with the three different rawNames that we identified above. For example, the second table might look something like this:
| merchantName | dealInformation |
| kohls.com | 10% off everything... |
| kohl's | Free shipping on... |
| kohls corporation | 1 Day Flash Sale! |
| kohls.com | Buy one get one... |
So here we have four entries that are all from the same company. However, when a user on the site visits the listing for Kohls, I want it to display all the entries from each source.
Here is what I currently have, but it doesn't seem to be doing the trick. This seems to only work if I set the LIMIT in that sub-query to 1 so that it only brings back one of the rawNames. I need it to match against all of the rawNames.
SELECT * FROM table2
WHERE merchantName = (SELECT rawName FROM table1 WHERE strippedName = '".$strippedName."')
The quickest fix is to replace merchantName = with merchantName IN:
SELECT * FROM table2
WHERE merchantName IN (SELECT rawName FROM table1 WHERE strippedName = '".$strippedName."')
The = operator needs to have exactly one value on each side - the IN keyword matches a value against multiple values.
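A minimal sketch of the IN version against the question's sample rows follows. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The COLLATE NOCASE declarations are an assumption that mimics MySQL's usual case-insensitive string comparison, the "Acme" rows are hypothetical, and the deals_for helper is illustrative. The stripped name is passed as a bound parameter instead of being concatenated into the SQL string.

```python
import sqlite3

# In-memory SQLite stands in for MySQL. NOCASE approximates MySQL's default
# case-insensitive collation, so "Kohl's" matches "kohl's".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (rawName TEXT COLLATE NOCASE, strippedName TEXT);
CREATE TABLE table2 (merchantName TEXT COLLATE NOCASE, dealInformation TEXT);
""")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [
    ("Kohl's", "kohls"),
    ("kohls.com", "kohls"),
    ("kohls Corporation", "kohls"),
    ("Acme Inc", "acme"),  # hypothetical row so the filter has something to exclude
])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [
    ("kohls.com", "10% off everything..."),
    ("kohl's", "Free shipping on..."),
    ("kohls corporation", "1 Day Flash Sale!"),
    ("kohls.com", "Buy one get one..."),
    ("acme inc", "Hypothetical deal"),  # hypothetical row
])

def deals_for(stripped_name):
    """Return every deal whose merchantName matches any rawName of the company."""
    rows = conn.execute(
        "SELECT dealInformation FROM table2 "
        "WHERE merchantName IN (SELECT rawName FROM table1 WHERE strippedName = ?) "
        "ORDER BY rowid",
        (stripped_name,),  # bound parameter, not string concatenation
    )
    return [deal for (deal,) in rows]
```

Calling deals_for("kohls") returns all four Kohl's deals, since the subquery yields three rawNames and IN accepts a match on any of them.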

Summary of MySQL detail records matching by IP address ranges - mySQL Jedi Knight required

So, I have to draw upon all the powers of the greatest mySQL minds that SO has to offer. I have to summarize detail records based on the IP address in each record. Here's the scenario:
In short, we have consortiums that want to know: "Which schools within my consortium watched which videos how many times"? In SQL terms, it amounts to COUNTing the detail records, grouped by which IP range it might fall into.
We have several university Consortiums - each with a handful of different schools that are members.
Each school within a consortium uses various IP ranges to access the videos that we serve to these schools.
The IP Ranges are specified with wild cards, so each school specifies something like '100.200.35.x, 100.201.x.x, 100.202.39.50, etc.', with the average number of ranges per school being 10 or 15.
The raw text log files to summarize are already in a database (one row for each log entry), and each row has the actual IP address that accessed the video file.
There are 100's of millions of detail records, so I fully expect this to be a long slow process that runs for a considerable period.
PHP scripts exist that can "explode" the wildcards into the individual IPs that are represented, but I fear this will be the final answer and could take weeks to run.
(For simplicity sake, I'm only going to refer to the video filename that was accessed and COUNT the log entries for it, but in fact all the details such as start/stop/duration,etc. are there and will ultimately be part of this solution.)
With Consortium records something like this: (All table designs except log details open to suggestion):
| id|consortium |
| 10|Ivy League |
| 20|California |
And School/IP records something like this:
| id|school |consortium_id|
| 101|Harvard |10 |
| 102|Yale |10 |
| 103|UCLA |20 |
| 104|Berkeley |20 |
| id|school_id|ip_range |
| 1| 101 |100.200.x.x |
| 2| 101 |100.201.65.x |
| 3| 101 |100.202.39.50 |
| 4| 101 |100.202.39.51 |
| 5| 101 |100.200.x.x |
| 6| 101 |100.201.65.x |
| 7| 101 |100.202.39.50 |
And detail records something like this:
|session |ip_address |filename |
|560554790925|100.202.390.500|history101.mp4 |
|406417611526|43.22.90.5 |newsreel.mp4 |
|650423700223|100.202.39.50 |history101.mp4 |
|650423700223|100.202.50.12 |science101.mp4 |
|513057324209|100.202.39.56 |history101.mp4 |
I like to think I'm pretty handy with mySQL, but this one is stretching it, and am hoping that there's a spectacular function or set of steps that someone might offer.
With your existing data structure, you could do string matching as follows (but it's not very efficient):
SELECT schools.school, detail.filename, COUNT(*)
FROM schools
JOIN ipranges ON schools.id = ipranges.school_id
JOIN detail ON detail.ip_address LIKE REPLACE(ipranges.ip_range, 'x', '%')
WHERE schools.consortium_id = ?
GROUP BY schools.school, detail.filename
A better way would be to store your IP ranges as network address and prefix length:
ALTER TABLE ipranges
ADD COLUMN network INT UNSIGNED,
ADD COLUMN prefix TINYINT;
UPDATE ipranges SET
network = INET_ATON(REPLACE(ip_range, 'x', '0')),
prefix = 32 - 8*(CHAR_LENGTH(ip_range) - CHAR_LENGTH(REPLACE(ip_range,'x','')));
ALTER TABLE ipranges
DROP COLUMN ip_range;
ALTER TABLE detail
ADD COLUMN ip_address_new INT UNSIGNED;
UPDATE detail SET
ip_address_new = INET_ATON(ip_address);
ALTER TABLE detail
DROP COLUMN ip_address,
CHANGE ip_address_new ip_address INT UNSIGNED;
Then it would merely be a case of performing some bit comparisons:
SELECT schools.school, detail.filename, COUNT(*)
FROM schools
JOIN ipranges ON schools.id = ipranges.school_id
JOIN detail ON detail.ip_address & ~((1 << 32 - ipranges.prefix) - 1)
= ipranges.network
WHERE schools.consortium_id = ?
GROUP BY schools.school, detail.filename
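To make the bit comparison concrete, here is a small Python sketch of the same arithmetic. The helper names range_to_network and in_range are hypothetical, and the sketch assumes wildcards only ever appear as trailing octets, as in the question's examples.

```python
import ipaddress

def range_to_network(ip_range):
    """Convert a wildcard range such as '100.201.65.x' to (network_int, prefix_length).

    Mirrors the UPDATE above: each 'x' octet becomes 0 and removes 8 bits
    from the prefix. Assumes wildcards are trailing octets only.
    """
    octets = ip_range.split(".")
    wildcards = octets.count("x")
    dotted = ".".join("0" if octet == "x" else octet for octet in octets)
    return int(ipaddress.IPv4Address(dotted)), 32 - 8 * wildcards

def in_range(ip_address, network, prefix):
    """Same test as the SQL join: ip & ~((1 << (32 - prefix)) - 1) = network."""
    # Python integers are unbounded, so clamp the inverted mask to 32 bits.
    mask = ~((1 << (32 - prefix)) - 1) & 0xFFFFFFFF
    return (int(ipaddress.IPv4Address(ip_address)) & mask) == network

# Example: an exact address has a /32 prefix, so only that one address matches.
network, prefix = range_to_network("100.202.39.50")
matches_exact = in_range("100.202.39.50", network, prefix)
matches_neighbour = in_range("100.202.39.56", network, prefix)
```

The mask keeps the leading prefix bits of the address and zeroes the rest, so an address belongs to a range exactly when its masked value equals the stored network number.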
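Another option is a prefix match on the part of each range before the first x. Note that a range with no x (an exact address) produces the pattern '%', which matches every row, so exact addresses need separate handling:
SELECT D.filename, S.school, COUNT(*)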
FROM detail_records AS D
INNER JOIN ip_map AS I ON D.ip_address LIKE CONCAT(SUBSTRING(I.ip_range, 1, LOCATE('x', I.ip_range)-1), '%')
INNER JOIN school AS S ON S.id = I.school_id
INNER JOIN consortium AS C ON C.id = S.consortium_id
WHERE S.consortium_id = <consortium identifier>
GROUP BY D.filename, S.school

SQL statement to return elements from a column only if no elements from a different column match

Sorry for the confusing question, I will try to clarify.
I have an SQL database (that I did not create) that I would like to write a query for. I know very little about SQL, so it is hard for me to even know what to search for to see if this question has already been asked; sorry if it has. It should be an easy solution for those in the know.
The query I need is for a search I would like to perform on an existing data management system. I want to return all the documents that a given user has NOT signed off on, as indicated by rows in a signoffs_table. The data is stored roughly as follows (this is a simplification of the actual schema and hides several LEFT JOINs and columns):
signoffs_table:
| id | user_id | document_id | signers_list |
The naive solution I had was to do something like the following:
SELECT document_id from signoffs_table WHERE (user_id <> $BobsID) AND signers_list LIKE "%Bob%";
This works if ONLY Bob signs the document. The problem is that if Bob and Mary have signed the document then the table looks like this:
signoffs_table:
-----------------------------------------------
| id | user_id | document_id | signers_list |
-----------------------------------------------
| 1 | 10 | 100 | "Bob,Mary,Jim" |
| 2 | 20 | 100 | "Bob,Mary,Jim" |
-----------------------------------------------
(Assume Bob's ID = 10 and Mary's ID = 20.)
When I run the query, I get back document_id 100 (from row #2), because that row has a different user_id but still lists Bob as a signer, even though Bob has in fact signed the document (row #1).
Is what I am trying to do possible with the given database structure? I can provide more details if needed. I am not sure how much details are needed.
I guess this query is what you mean:
SELECT document_id FROM signoffs_table AS t1
WHERE signers_list LIKE "%Bob%"
AND NOT EXISTS (
SELECT 1 FROM signoffs_table AS t2
WHERE (t2.user_id = $BobsID) AND t2.document_id = t1.document_id )
I believe your design is incorrect. You have a many-to-many relationship between documents and signers. You should have a junction table, something like:
ID DocumentID SignerID
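A minimal sketch of how such a junction table could be queried, using Python's built-in sqlite3 module as a stand-in for the real database. The table name DocumentSigners, the trimmed-down signoffs_table without signers_list, Jim's ID of 30, document 200, and the unsigned_documents helper are all hypothetical and for illustration only.

```python
import sqlite3

# In-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Junction table: one row per (document, required signer) pair.
CREATE TABLE DocumentSigners (ID INTEGER PRIMARY KEY, DocumentID INTEGER, SignerID INTEGER);
-- Actual sign-offs, without the comma-separated signers_list column.
CREATE TABLE signoffs_table (id INTEGER PRIMARY KEY, user_id INTEGER, document_id INTEGER);

-- Document 100 needs Bob (10), Mary (20) and Jim (30); document 200 needs Bob and Mary.
INSERT INTO DocumentSigners VALUES
  (1, 100, 10), (2, 100, 20), (3, 100, 30),
  (4, 200, 10), (5, 200, 20);
-- Bob and Mary signed document 100; only Mary signed document 200.
INSERT INTO signoffs_table VALUES (1, 10, 100), (2, 20, 100), (3, 20, 200);
""")

def unsigned_documents(user_id):
    """Return the documents this user is required to sign but has not signed."""
    rows = conn.execute("""
        SELECT ds.DocumentID
        FROM DocumentSigners ds
        WHERE ds.SignerID = ?
          AND NOT EXISTS (SELECT 1 FROM signoffs_table s
                          WHERE s.user_id = ds.SignerID
                            AND s.document_id = ds.DocumentID)
        ORDER BY ds.DocumentID
    """, (user_id,))
    return [document_id for (document_id,) in rows]
```

With the required signers stored one per row, the query matches on IDs instead of searching for a name inside a comma-separated string, so a name like "Bob" can no longer match "Bobby" by accident.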