MySQL - Remove duplicates and preserve first instance

I have created a pretty messy table that I need to clean up. This is not an easy task.
Situation:
The messy table is called tbl_users.
It contains information about users. Each row has a unique ID.
The column that best identifies a user is phone_number.
For various reasons, the same user (same phone number) appears more than once in this table.
A column called created_date shows when each row was created.
What I need:
I need to find the FIRST created instance of each user, and then I need to find all the IDs of the second and possibly third instance of that user, if such exist.
I need the IDs because other tables contain data where I need to change the user_id to that of the first instance.
How do I proceed with this challenge?

select t.*, (t.created_date = tf.minDate) as IsFirst
from tbl_users t
inner join (
select phone_number, min(created_date) as minDate
from tbl_users
group by phone_number
) tf on t.phone_number = tf.phone_number
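
The query above flags which row is the first one. To get the mapping the question asks for (each duplicate ID paired with the ID of the first instance), the same derived table can be joined back once more. The following is only a sketch run against SQLite from Python rather than MySQL; the sample rows and the aliases dup and keep are made up for illustration, and it assumes created_date is unique per phone number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_users (id INTEGER PRIMARY KEY, phone_number TEXT, created_date TEXT);
INSERT INTO tbl_users VALUES
  (1, '555-0001', '2020-01-01'),
  (2, '555-0002', '2020-01-05'),
  (3, '555-0001', '2020-02-01'),
  (4, '555-0001', '2020-03-01');
""")

# Pair every later row (dup) with the earliest row (keep) sharing its phone number.
# Assumption: created_date is unique within a phone number; ties would need
# an extra tie-breaker such as the smallest id.
duplicate_map = conn.execute("""
    SELECT dup.id AS duplicate_id, keep.id AS first_id
    FROM tbl_users dup
    JOIN (
        SELECT phone_number, MIN(created_date) AS min_date
        FROM tbl_users
        GROUP BY phone_number
    ) tf ON tf.phone_number = dup.phone_number
    JOIN tbl_users keep
      ON keep.phone_number = tf.phone_number
     AND keep.created_date = tf.min_date
    WHERE dup.created_date > tf.min_date
    ORDER BY dup.id
""").fetchall()

print(duplicate_map)
```

Each (duplicate_id, first_id) pair could then drive an UPDATE of user_id in the other tables before the duplicate rows are deleted.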

Related

In vTiger 6.5: Which table stores the products that belong to a quote?

I need to know which table acts as an intermediary to achieve the many-to-many relationship between these entities.
I know that the table that stores the products is vtiger_products and that the one that keeps the quotes is vtiger_quotes but I do not know which table relates both, so my query is incomplete.
So...
SELECT * FROM vtiger_quotes
INNER JOIN vtiger_products INNER JOIN table_relates_both
ON vtiger_quotes.quoteid = table_relates_both.quoteid
AND vtiger_products.productid = table_relates_both.productid
WHERE vtiger_quotes.potentialid = ?
What's the real name of table_relates_both?

vtiger_inventoryproductrel is the intermediary table between vtiger_quotes and vtiger_products.
In vtiger_inventoryproductrel, the id column acts as a foreign key to Quotes, Opportunity, Invoice, etc.
If you want to fetch the Quotes related to a particular Opportunity, execute the query below:
SELECT {your required field goes here} FROM vtiger_inventoryproductrel INNER JOIN vtiger_quotes
ON vtiger_quotes.quoteid = vtiger_inventoryproductrel.id
WHERE vtiger_quotes.potentialid = $potential_id
Also note that:
vtiger_crmentity - This is core table in which an entry is added for
all entity type records. This stores meta information like record id,
record owner id, last modified by user id, created time, modified time
and description.

The table name is vtiger_inventoryproductrel.

Get latest access log comparing records between two access log tables

There are two tables that log access to projects. I want to show the admin a list of projects/worksheets that he/she has not accessed since a user last accessed that project/worksheet with an access_code of "e" (edited).
There may be a case where the admin has NEVER accessed that project/worksheet (i.e. there is no matching project/worksheet in the admin table); this is actually where I get stuck.
Obviously, what I'm after is a single query that "does it all"... maybe I'm dreaming.
admin_access
project_id, wksheet_id, id, access_date, access_code
user_access
project_id, wksheet_id, id, access_date, access_code
(where id is the user/admin id for that access event and access_date is a timestamp)
result table
project_id, wksheet_id
What I need is a list of records (project_id, wksheet_id) where the access_date in the table user_access is the greatest, i.e. the admin has not accessed that project/worksheet since the user last edited that worksheet. (Note: the only other access_code is "v" for view.)
It is not relevant whether the admin had viewed or edited that worksheet previously, or which user last accessed that record in the user_access table (i.e. the id in the user_access table is not relevant).
I've gotten close, but the killer seems to be the case where the admin does not have an access record in the admin_access table for that project (max(date) returns NULL and then the comparison fails).
Looking for fresh angles on this..

I finally got something that works when the max(a.access_date) is NULL for the comparison
Select * from
(SELECT u.project_id, u.wksheet_id, max(u.access_date) as user_date, max(a.access_date) as admin_date FROM user_access u
left outer join admin_access a on (a.project_id = u.project_id and a.wksheet_id = u.wksheet_id)
where u.access_code = 'e'
group by u.project_id, u.wksheet_id) as maxtable
where maxtable.user_date > maxtable.admin_date or
maxtable.admin_date is NULL
The key was to take the query that produces a good list of possible matching project/worksheets and use that in a select where the comparison accounts for the possible null from the max(a.access_date) from the missing matching admin_access record.
It would be interesting to hear other ways to do this...
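
To see the NULL handling in action, here is a small sketch that runs the same query against SQLite from Python. The sample rows are invented for illustration: one worksheet edited after the admin's last visit, one the admin looked at later, one the admin never opened, and one that was only viewed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_access  (project_id INT, wksheet_id INT, id INT, access_date TEXT, access_code TEXT);
CREATE TABLE admin_access (project_id INT, wksheet_id INT, id INT, access_date TEXT, access_code TEXT);
INSERT INTO user_access VALUES
  (1, 1, 10, '2021-01-10', 'e'),
  (1, 2, 10, '2021-01-02', 'e'),
  (2, 1, 11, '2021-01-03', 'e'),
  (2, 2, 11, '2021-01-04', 'v');
INSERT INTO admin_access VALUES
  (1, 1, 99, '2021-01-05', 'v'),
  (1, 2, 99, '2021-01-06', 'v');
""")

# The LEFT JOIN keeps edited worksheets that have no admin row at all;
# for those, MAX(a.access_date) is NULL, so the outer WHERE tests for it explicitly.
stale = conn.execute("""
    SELECT project_id, wksheet_id
    FROM (
        SELECT u.project_id, u.wksheet_id,
               MAX(u.access_date) AS user_date,
               MAX(a.access_date) AS admin_date
        FROM user_access u
        LEFT OUTER JOIN admin_access a
          ON a.project_id = u.project_id AND a.wksheet_id = u.wksheet_id
        WHERE u.access_code = 'e'
        GROUP BY u.project_id, u.wksheet_id
    ) AS maxtable
    WHERE user_date > admin_date OR admin_date IS NULL
    ORDER BY project_id, wksheet_id
""").fetchall()

print(stale)
```

Worksheet (1, 1) is listed because the edit is newer than the admin's visit, and (2, 1) because the admin has no row for it at all.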

How do I select a record from one table in a mySQL database, based on the existence of data in a second?

Please forgive my ignorance here. SQL is decidedly one of the biggest "gaps" in my education that I'm working on correcting, come October. Here's the scenario:
I have two tables in a DB that I need to access certain data from. One is users, and the other is conversation_log. The basic structure is outlined below:
users:
id (INT)
name (TXT)
conversation_log
userid (INT) // same value as id in users - actually the only field in this table I want to check
input (TXT)
response (TXT)
(note that I'm only listing the structure for the fields that are {or could be} relevant to the current challenge)
What I want to do is return a list of names from the users table that have at least one record in the conversation_log table. Currently, I'm doing this with two separate SQL statements, with the one that checks for records in conversation_log being called hundreds, if not thousands of times, once for each userid, just to see if records exist for that id.
Currently, the two SQL statements are as follows:
select id from users where 1; (gets the list of userid values for the next query)
select id from conversation_log where userid = $userId limit 1; (checks for existing records)
Right now I have 4,000+ users listed in the users table. I'm sure that you can imagine just how long this method takes. I know there's an easier, more efficient way to do this, but being self-taught, this is something that I have yet to learn. Any help would be greatly appreciated.

You have to do what is called a 'Join'. This, um, joins the rows of two tables together based on values they have in common.
See if this makes sense to you:
SELECT DISTINCT users.name
FROM users JOIN conversation_log ON users.id = conversation_log.userid
Now JOIN by itself is an "inner join", which means that it will only return rows that both tables have in common. In other words, if a specific conversation_log.userid doesn't exist, it won't return any part of the row, user or conversation log, for that userid.
Also, +1 for having a clearly worded question : )
EDIT: I added a "DISTINCT", which filters out duplicates. If a user appeared in more than one conversation_log row and you didn't have DISTINCT, you would get the user's name more than once. This is because JOIN returns every combination of rows from the two tables that matches your JOIN ON criteria, so a user with three log rows produces three result rows.

Something like this:
SELECT *
FROM users
WHERE EXISTS (
SELECT *
FROM conversation_log
WHERE users.id = conversation_log.userid
)
In plain English: select every row from users, such that there is at least one row from conversation_log with the matching userid.
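
As a quick check that the EXISTS form and a JOIN with DISTINCT return the same users, here is a sketch run against SQLite from Python. The three users and their log rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE conversation_log (userid INT, input TEXT, response TEXT);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO conversation_log VALUES
  (1, 'hi', 'hello'),
  (1, 'bye', 'goodbye'),
  (3, 'hi', 'hello');
""")

# JOIN form: DISTINCT collapses the two rows that alice's two log entries produce.
join_names = [row[0] for row in conn.execute("""
    SELECT DISTINCT users.name
    FROM users JOIN conversation_log ON users.id = conversation_log.userid
    ORDER BY users.name
""")]

# EXISTS form: each user is examined once, so no DISTINCT is needed.
exists_names = [row[0] for row in conn.execute("""
    SELECT name
    FROM users
    WHERE EXISTS (
        SELECT 1 FROM conversation_log WHERE users.id = conversation_log.userid
    )
    ORDER BY name
""")]

print(join_names, exists_names)
```

Either way it is a single round trip to the database instead of one query per user.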

What you need to read is JOIN syntax.
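SELECT count(conversation_log.userid), users.name
FROM users left join conversation_log on users.id = conversation_log.userid
Group by users.name
-- count the joined column rather than *, so users with no log rows count as 0
You could add at the end if you wanted
HAVING count(conversation_log.userid) > 0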

How to check if a given data exists in multiple tables (all of which has the same column)?

I have 3 tables, each consisting of a column called username. On the registration part, I need to check that the requested username is new and unique.
I need that single SQL that will tell me if that user exists in any of these tables, before I proceed. I tried:
SELECT tbl1.username, tbl2.username, tbl3.username
FROM tbl1,tbl2,tbl3
WHERE tbl1.username = {$username}
OR tbl2.username = {$username}
OR tbl3.username ={$username}
Is that the way to go?

select 1
from (
select username as username from tbl1
union all
select username from tbl2
union all
select username from tbl3
) a
where username = 'someuser'

In the event you honestly just want to know if a user exists:
The quickest approach is an existence query:
select
NOT EXISTS (select username from a where username = {$username}) AND
NOT EXISTS (select username from b where username = {$username}) AND
NOT EXISTS (select username from c where username = {$username});
If your username column is marked as Unique in each table, this should be the most efficient query you will be able to make to perform this operation, and this will outperform a normalized username table in terms of memory usage and, well, virtually any other query that cares about username and another column, as there are no excessive joins. If you've ever been called on to speed up an organization's database, I can assure you that over-normalization is a nightmare. In regards to the advice you've received on normalization in this thread, be wary. It's great for limiting space, or limiting the number of places you have to update data, but you have to weigh that against the maintenance and speed overhead. Take the advice given to you on this page with a grain of salt.
Get used to running a query analyzer on your queries, if for no other reason than to get in the habit of learning the ramifications of choices when writing queries -- at least until you get your sea legs.
In the event you want to insert a user later:
If you are doing this for the purpose of eventually adding the user to the database, here is a better approach, and it's worth it to learn it. Attempt to insert the value immediately. Check afterwards to see if it was successful. This way there is no room for some other database call to insert a record in between the time you've checked and the time you inserted into the database. For instance, in MySQL you might do this:
INSERT INTO {$table} (`username`, ... )
SELECT {$username} as `username`, ... FROM DUAL
WHERE
NOT EXISTS (select username from a where username = {$username}) AND
NOT EXISTS (select username from b where username = {$username}) AND
NOT EXISTS (select username from c where username = {$username});
All database APIs I've seen, as well as all SQL implementations, will provide you a way to discover how many rows were inserted. If it's 1, then the username didn't exist and the insertion was successful. In this case, I don't know your dialect, and so I've chosen MySQL, which provides a DUAL table specifically for returning results that aren't bound to a table, but honestly, there are many ways to skin this cat, whether you put it in a transaction or a stored procedure, or strictly limit the process and procedure that can access these tables.
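
Here is a rough sketch of the "insert first, check the row count afterwards" idea, using Python's sqlite3 module instead of MySQL. SQLite allows a SELECT without FROM, so DUAL is not needed there. The helper name try_register and the sample data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (username TEXT);
CREATE TABLE b (username TEXT);
CREATE TABLE c (username TEXT);
INSERT INTO b VALUES ('taken');
""")

def try_register(conn, username):
    """Insert username into table a only if none of the three tables has it.

    Returns True when exactly one row was inserted (hypothetical helper).
    """
    cur = conn.execute("""
        INSERT INTO a (username)
        SELECT :u
        WHERE NOT EXISTS (SELECT 1 FROM a WHERE username = :u)
          AND NOT EXISTS (SELECT 1 FROM b WHERE username = :u)
          AND NOT EXISTS (SELECT 1 FROM c WHERE username = :u)
    """, {"u": username})
    # rowcount reports how many rows the INSERT actually added.
    return cur.rowcount == 1

print(try_register(conn, "newuser"))  # first attempt inserts the row
print(try_register(conn, "newuser"))  # second attempt finds it in table a
print(try_register(conn, "taken"))    # already present in table b
```

Binding the username as a parameter, rather than pasting {$username} into the SQL text, also avoids quoting problems and SQL injection.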
Update -- How to handle users who don't complete the sign up process
As @RedFilter points out, if registration is done in multiple steps (reserving a username, filling out details, perhaps answering an email confirmation), then you will want to at least add a column to flag this user (with a timestamp, not a boolean) so that you can periodically remove users after some time period. I recommend instead creating a ToBePurged table and adding new users to that, along with a timestamp. When the confirmation comes through, you remove the user from this table. Periodically you check this table for all entries older than some delta from the current time and simply delete them from whichever table they were originally added to. My philosophy behind this is to define the responsibility of the table more clearly and to keep the number of records you are working with very lean. We certainly don't want to over-engineer our solutions, but if you get into the habit of good architectural practices, these designs will flow out as naturally as their less efficient counterparts.

No. Two processes could run your test at the same time and both would report no user and then both could insert the same user.
It sounds like you need a single table to hold ALL the users with a unique index to prevent duplicates. This master table could link to 'sub-tables' using a user ID, not user name.
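
A minimal sketch of that single-table idea, again with Python's sqlite3 module: the unique constraint makes the database itself reject the second insert, so there is no gap between checking and inserting. The table layout and the helper name register are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE)"
)

def register(conn, username):
    """Return the new user id, or None if the username is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO users (username) VALUES (?)", (username,)
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected the duplicate.
        return None

print(register(conn, "alice"))
print(register(conn, "alice"))
```

The other tables would then reference users.id instead of repeating the username.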

Given the collation stuff, you could do this instead, if you don't want to deal with the collation mismatch:
select sum(usercount) as usercount
from (
select count(*) as usercount from tbl1 where username = 'someuser'
union all
select count(*) as usercount from tbl2 where username = 'someuser'
union all
select count(*) as usercount from tbl3 where username = 'someuser'
) as usercounts
If you get 0, there isn't a user with that username; if you get something higher, there is.
Note: Depending on how you do the insert, you could in theory get more than one user with the same username due to race conditions (see other comments about normalisation and unique keys).

1- You need to normalize your tables
See: http://databases.about.com/od/specificproducts/a/normalization.htm
2- Don't use implicit SQL '89 joins.
Kick the habit and use explicit joins
SELECT a.field1, b.field2, c.field3
FROM a
INNER JOIN b ON (a.id = b.a_id) -- JOIN criteria go here
INNER JOIN c ON (b.id = c.b_id) -- and here, nice and explicit.
WHERE ... -- filter criteria go here.

With your current set up RedFilter's answer should work fine. I thought it would be worth noting that you shouldn't have redundant or dispersed data in your database to begin with though.
You should have one and only one place to store any specific data - so in your case, instead of having a username in 3 different tables, you should have one table with username and a primary key identifier for those usernames. Your other 3 tables should then foreign-key reference the username table. You'll be able to construct much simpler and more efficient queries with this layout. You're opening a can of worms by replicating data in various locations.

SQL Server Reporting Services, how to best apply filters

I have a bunch of records (orders) that I want to make available to users to make reports from.
The users come from different departments, and I would like to make it, so each department can only see their own stuff.
I can't figure out how to do this the right way.
What I have now is:
- A model where I have placed a Filter on the Order table.
The filter can use GetUserID() to get the user's name, but I can't figure out how to get from that to the "UserDepartment" table that maps users to specific departments.
Of course, I would prefer a solution where I don't have to create new access groups or edit the model for each department that someone might dream up.
Any clues?
(Using SQL server 2008)
EDIT: This link http://blogs.msdn.com/bobmeyers/articles/Implementing_Data_Security_in_a_Report_Model.aspx shows the basics of what I'm trying to do, but the author seems to assume that each record has a UserName field that can be matched.
In my case I want all users of department X to be able to access the line.

We had a similar problem to this and ended up writing a function in SQL.
The function did the following:
Received the username parameter from SSRS.
Performed a lookup on the permissions table and retrieved the records (department IDs in your case).
Returned the department IDs.
Then our sql statement looked like this:
SELECT *
FROM ImportantData
WHERE DepartmentId IN (SELECT DepartmentId FROM fn_GetUserDepartmentAllocations(@UserName))
This did force us to modify all of the sql queries but it allowed us to do it with minimal complex logic.
The other thing that this allows for is if you have one user who transcends department boundaries: for example a manager of 2 departments.
CREATE FUNCTION [dbo].[fn_GetUserDepartmentAllocations]
(
@UserName NVARCHAR(100)
)
RETURNS
@TempPermissions TABLE
(
DepartmentId Int
)
AS
BEGIN
-- Collect every department this username is allowed to see
INSERT INTO @TempPermissions
SELECT DepartmentId
FROM DepartmentPermissions
WHERE DepartmentAllowedUsername = @UserName
RETURN
END
The main benefit of doing it this way is that you can change the entire permissions structure in one place; you don't have to go through each and every report.
For example, you could have a manager who belongs to 2 departments but is not allowed to view them except on Thursdays (a silly example, I know, but hopefully you get the point).
Hope this helps
Pete

This assumes that Users have Orders.
So, filter by users who exist in the same dept as the filter user. Don't filter orders directly.
I've guessed at schema and column names; I hope you get the idea...
SELECT
MY STuff
FROM
[Order] O
JOIN
UserDept UD ON O.UserCode = UD.UserCode
WHERE
EXISTS (SELECT *
FROM
UserDept UD2
WHERE
UD2.UserCode = @MYUserCode
AND
UD2.DeptID = UD.DeptID)
--or
SELECT
MY STuff
FROM
[Order] O
JOIN
UserDept D ON O.UserCode = D.UserCode
JOIN
UserDept U ON D.DeptID = U.DeptID
WHERE
U.UserCode = @MYUserCode
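
To make the second (join-based) variant concrete, here is a small sketch run against SQLite from Python. The table is called Orders here to avoid the reserved word, and the column names, sample rows and the helper visible_orders are guesses for illustration, just like the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders   (OrderNo INT, UserCode TEXT);
CREATE TABLE UserDept (UserCode TEXT, DeptID INT);
INSERT INTO UserDept VALUES ('u1', 1), ('u2', 1), ('u3', 2);
INSERT INTO Orders VALUES (100, 'u1'), (101, 'u2'), (102, 'u3');
""")

def visible_orders(conn, user_code):
    """Orders placed by anyone who shares a department with user_code."""
    rows = conn.execute("""
        SELECT DISTINCT O.OrderNo
        FROM Orders O
        JOIN UserDept D ON O.UserCode = D.UserCode   -- department of the order's owner
        JOIN UserDept U ON D.DeptID = U.DeptID       -- everyone in that department
        WHERE U.UserCode = ?
        ORDER BY O.OrderNo
    """, (user_code,)).fetchall()
    return [r[0] for r in rows]

print(visible_orders(conn, "u1"))
print(visible_orders(conn, "u3"))
```

The DISTINCT is there in case a user belongs to several departments, which would otherwise repeat an order once per shared department.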

What you're trying to achieve is difficult using the GetUserID() method. To use that your source query would have to return a lot of redundant data, imagine something like the following:
/*
Table: User
Fields: UserID, LoginName, FullName
Table: Department
Fields: DepartmentID, Name
Table: UserDepartments
Fields: UserID, DepartmentID
Table: Order
Fields: OrderNumber, DepartmentID
*/
SELECT O.OrderNumber, O.DepartmentID, U.LoginName
FROM [Order] O
JOIN Department D ON D.DepartmentID = O.DepartmentID
JOIN UserDepartments UD ON UD.DepartmentID = D.DepartmentID
JOIN [User] U ON U.UserID = UD.UserID
This will give you more rows than you want, basically a copy of the order for each user in the department that owns the order.
Now you can apply your filter as described in the link you provided. This will filter it down to just one copy of the order rows for the current user if they're in the right department.
If this is a performance issue, there are other alternatives, the easiest being to use a local report (.RDLC) in ASP.NET, WinForms or WPF and pass the user details to the data call so the filtering can be done in the SQL.
