I am new to SQL, and I have a problem in my practice program.
Assume I have the table below:
----------------------------------------
job_Name | member1 | member2 | member3 | ...
-----------------------------------------
ABC | 10 | 20 | null | ...
-----------------------------------------
bcd | null | 20 | null | ...
-----------------------------------------
asd | null | 20 | 10 | ...
-----------------------------------------
qwe | 10 | 20 | 10 | ...
The number of members is unknown. Each value is the price paid to that member; for example, in job ABC, member1 and member2 finished the job together.
What I want to get is below:
--------------------------------------
job_Name | members |
--------------------------------------
ABC | member1,member2 |
--------------------------------------
bcd | member2 |
--------------------------------------
asd | member2,member3 |
--------------------------------------
qwe | member1,member2,member3 |
This is easy to do in Excel with Visual Basic, but I have no idea how to do it in MySQL.
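One way to approach this, sketched here with Python's built-in sqlite3 module standing in for MySQL: turn each member column into rows with UNION ALL, keep only the non-null ones, and glue the names back together with GROUP_CONCAT. The table name `jobs` is an assumption, and the column list is read with SQLite's `PRAGMA table_info`; on MySQL the equivalent lookup would go through `information_schema.columns`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "jobs" is a hypothetical table name; the question does not give one.
conn.execute("CREATE TABLE jobs (job_Name TEXT, member1 INT, member2 INT, member3 INT)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?)",
    [("ABC", 10, 20, None), ("bcd", None, 20, None),
     ("asd", None, 20, 10), ("qwe", 10, 20, 10)],
)

# The number of member columns is unknown, so read them from the schema.
# (SQLite-specific; row[1] is the column name.)
member_columns = [row[1] for row in conn.execute("PRAGMA table_info(jobs)")
                  if row[1] != "job_Name"]

# One SELECT per member column, keeping only the jobs that member was paid for.
unpivot = " UNION ALL ".join(
    f"SELECT job_Name, '{col}' AS member FROM jobs WHERE {col} IS NOT NULL"
    for col in member_columns
)
query = (f"SELECT job_Name, GROUP_CONCAT(member) AS members "
         f"FROM ({unpivot}) AS paid GROUP BY job_Name")

# The order inside GROUP_CONCAT is not guaranteed here, so sort in Python.
members_by_job = {job: ",".join(sorted(members.split(",")))
                  for job, members in conn.execute(query)}
print(members_by_job)
```

The generated UNION ALL / GROUP_CONCAT statement uses only syntax that MySQL also accepts, so the same SQL string should carry over once the column names are known.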
Let's say I have a user table like this:
+----+-----------+----------------------+------+
| ID | Name | Email | Age |
+----+-----------+----------------------+------+
| 1 | John | john.doe1#mail.com | 24 |
| 2 | Josh | josh99#mail.com | 29 |
| 3 | Joseph | joseph410#mail.com | 21 |
| 4 | George | gge.48#mail.com | 28 |
| 5 | Joseph | jh.city89#mail.com | 24 |
| 6 | Kim | kimsd#mail.com | 32 |
| 7 | Bob | bob.s#mail.com | 38 |
| 8 | Joseph | psa.jos#mail.com | 34 |
| 9 | Joseph | joseph.la#mail.com | 28 |
| 10 | Jonathan | jonhan#mail.com | 22 |
+----+-----------+----------------------+------+
In reality, the table holds more data, and some names are duplicated in more than two records. The point is that I want to get only the first and the second row of the duplicated rows that contain the name "Joseph". How can I achieve this? My code so far:
User::withTrashed()->groupBy('name')->havingRaw('count("name") >= 1')->get();
With that code, the result retrieved is:
+----+-----------+----------------------+------+
| ID | Name | Email | Age |
+----+-----------+----------------------+------+
| 1 | John | john.doe1#mail.com | 24 |
| 2 | Josh | josh99#mail.com | 29 |
| 3 | Joseph | joseph410#mail.com | 21 |
| 4 | George | gge.48#mail.com | 28 |
| 6 | Kim | kimsd#mail.com | 32 |
| 7 | Bob | bob.s#mail.com | 38 |
| 10 | Jonathan | jonhan#mail.com | 22 |
+----+-----------+----------------------+------+
I then used this code to try to get the second duplicated row:
User::withTrashed()->groupBy('name')->havingRaw('count("name") >= 2')->get();
The result is still the same as above:
+----+-----------+----------------------+------+
| ID | Name | Email | Age |
+----+-----------+----------------------+------+
| 1 | John | john.doe1#mail.com | 24 |
| 2 | Josh | josh99#mail.com | 29 |
| 3 | Joseph | joseph410#mail.com | 21 |
| 4 | George | gge.48#mail.com | 28 |
| 6 | Kim | kimsd#mail.com | 32 |
| 7 | Bob | bob.s#mail.com | 38 |
| 10 | Jonathan | jonhan#mail.com | 22 |
+----+-----------+----------------------+------+
I want the result to contain the record with id "5" and name "Joseph", like this:
+----+-----------+----------------------+------+
| ID | Name | Email | Age |
+----+-----------+----------------------+------+
| 1 | John | john.doe1#mail.com | 24 |
| 2 | Josh | josh99#mail.com | 29 |
| 4 | George | gge.48#mail.com | 28 |
| 5 | Joseph | jh.city89#mail.com | 24 |
| 6 | Kim | kimsd#mail.com | 32 |
| 7 | Bob | bob.s#mail.com | 38 |
| 10 | Jonathan | jonhan#mail.com | 22 |
+----+-----------+----------------------+------+
But it seems only the first duplicated row is retrieved, and I can't get the second one. Can anybody give me a suggestion?
Let's start from your query
User::withTrashed()->groupBy('name')->havingRaw('count("name") >= 1')->get();
This will show all groups of rows whose count is 1 or more. That is every group, so the result is effectively the same as a DISTINCT on name.
If you want to get only duplicate records, you should get the groups whose count is LARGER than 1.
The other thing to notice here is that the value of a non-aggregated column is not well defined. For example, suppose you select name, count(name), email (email is neither in the group by clause nor aggregated) and 4 rows have the same name. You'll see:
+--------+-------------+-------+
| Name | Count(Name) | Email |
+--------+-------------+-------+
| Joseph | 4 | X |
+--------+-------------+-------+
What do you expect instead of X? Which one of the 4 emails? In SQL Server it is forbidden to select a non-aggregated column that is not in the group by clause, and some other databases (for example MySQL with ONLY_FULL_GROUP_BY disabled) will just give you an arbitrary one of the 4.
See this answer for more details, it is explained very well: "Do all columns in a SELECT list have to appear in a GROUP BY clause"
So, we'll use having count(name) > 1 and select only the grouped column name together with its count:
DB::table('users')->selectRaw('name, count(name)')->groupBy('name')->havingRaw('count(name) > 1')->get();
This should give you (didn't test it) this:
+--------+-------------+
| name | Count(name) |
+--------+-------------+
| Joseph | 4 |
+--------+-------------+
This will give you all names that have 2 or more instances. You can set the number of occurrences in the having clause; for example, having count(name) = 3 will give you all names that occur exactly 3 times.
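To see the difference between the two having conditions, here is a small sketch that runs the plain SQL against an in-memory SQLite database (a stand-in for the real MySQL table; only the id and name columns from the question are reproduced, and the helper `grouped_names` is made up for the illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
names = ["John", "Josh", "Joseph", "George", "Joseph",
         "Kim", "Bob", "Joseph", "Joseph", "Jonathan"]
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 list(enumerate(names, start=1)))

def grouped_names(having):
    """Return (name, count) per group that satisfies the HAVING condition."""
    sql = (f"SELECT name, COUNT(name) FROM users "
           f"GROUP BY name HAVING {having} ORDER BY name")
    return conn.execute(sql).fetchall()

at_least_one = grouped_names("COUNT(name) >= 1")   # every name qualifies
more_than_one = grouped_names("COUNT(name) > 1")   # only duplicated names

print(len(at_least_one))   # number of distinct names
print(more_than_one)
```

With `>= 1` every name forms a qualifying group, which is why the original query behaved like a DISTINCT; with `> 1` only Joseph remains.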
So how do you get the second duplicate? I have a question about that:
Which one is the first (original) duplicate? Is it the one with the oldest created_at, the oldest updated_at, or some other condition? Because of that, you should make another query with an order by clause that returns the duplicates in the order most convenient for you. For example:
select * from `users` where `name` in (select `name` from users group by `name` having count(`name`) > 1) order by `id` asc
which will give:
+----+-----------+----------------------+------+
| ID | Name | Email | Age |
+----+-----------+----------------------+------+
| 3 | Joseph | joseph410#mail.com | 21 |
| 5 | Joseph | jh.city89#mail.com | 24 |
| 8 | Joseph | psa.jos#mail.com | 34 |
| 9 | Joseph | joseph.la#mail.com | 28 |
+----+-----------+----------------------+------+
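If "second" is taken to mean the second-lowest id within a name (an assumption; created_at would work the same way), the second duplicate can be picked by ranking each row among the rows that share its name. A minimal sketch, again using an in-memory SQLite table with just id and name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
names = ["John", "Josh", "Joseph", "George", "Joseph",
         "Kim", "Bob", "Joseph", "Joseph", "Jonathan"]
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 list(enumerate(names, start=1)))

# For each row, count the rows with the same name and an id that is not larger.
# That count is the row's position within its name group; keep position 2.
second_rows = conn.execute("""
    SELECT u.id, u.name
    FROM users AS u
    WHERE (SELECT COUNT(*)
           FROM users AS earlier
           WHERE earlier.name = u.name AND earlier.id <= u.id) = 2
    ORDER BY u.id
""").fetchall()

print(second_rows)
```

Names that occur only once never reach position 2, so they drop out on their own. On MySQL 8.0 or later the same ranking could also be written with ROW_NUMBER() OVER (PARTITION BY name ORDER BY id).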
I have two tables (Users and session). The users table has the combination of users (user1 and user2 along with their ids). From the session table, I want to group by sessionID and count the number of times the pairs in users table occur in the grouped by objects from session table.
For example, let's take the first pair in the users table (xxx, yyy). This pair exists in sessionID 1000 as well as in 2000, so I want #sessions in the users table for that pair to have 2 as the value. Similarly for all the other pairs.
A rough idea I have is to groupby sessionID and see if the pair from users table exists in that grouped by object, if yes, count that group as 1 and sum over all the groups that contain this pair. And of the 2 times that the pair occurred, how many times did user1 click (user1clicks) and how many times did user2 click (user2clicks).
I am not able to translate this into a query. Really struggling to solve this... Any help is much appreciated!
Users table:
| user1 | id1 | user2 | id2 |
-----------------------------
| xxx | 1 | yyy | 2 |
| xxx | 1 | zzz | 3 |
| yyy | 2 | zzz | 3 |
Session table:
| user | id | sessionID | clicked |
--------------------------------------
| xxx | 1 | 1000 | yes |
| yyy | 2 | 1000 | no |
| xxx | 1 | 2000 | no |
| yyy | 2 | 2000 | no |
| xxx | 1 | 3000 | no |
| zzz | 3 | 3000 | yes |
Output:
| user1 | id1 | user2 | id2 | #sessions | user1clicks | user2clicks|
--------------------------------------------------------------------
| xxx | 1 | yyy | 2 | 2 | 1 | 0 |
| xxx | 1 | zzz | 3 | 1 | 1 | 0 |
| yyy | 2 | zzz | 3 | 0 | 0 | 0 |
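The rough idea above can be sketched as a self-join: pair up every two session rows that share a sessionID, then left-join those pairs onto the users table so that pairs with no common session still show up with zeros. This is only a sketch under assumptions: it runs on SQLite through Python's sqlite3 module, it matches on the user name rather than the id, and the id column of the session table is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user1 TEXT, id1 INT, user2 TEXT, id2 INT);
    CREATE TABLE session (user TEXT, sessionID INT, clicked TEXT);
    INSERT INTO users VALUES
        ('xxx', 1, 'yyy', 2), ('xxx', 1, 'zzz', 3), ('yyy', 2, 'zzz', 3);
    INSERT INTO session VALUES
        ('xxx', 1000, 'yes'), ('yyy', 1000, 'no'),
        ('xxx', 2000, 'no'),  ('yyy', 2000, 'no'),
        ('xxx', 3000, 'no'),  ('zzz', 3000, 'yes');
""")

pair_stats = conn.execute("""
    SELECT u.user1, u.user2,
           COUNT(p.sessionID) AS sessions,
           SUM(CASE WHEN p.clicked1 = 'yes' THEN 1 ELSE 0 END) AS user1clicks,
           SUM(CASE WHEN p.clicked2 = 'yes' THEN 1 ELSE 0 END) AS user2clicks
    FROM users AS u
    LEFT JOIN (
        -- every combination of two rows that belong to the same session
        SELECT a.user AS name1, b.user AS name2, a.sessionID,
               a.clicked AS clicked1, b.clicked AS clicked2
        FROM session AS a
        JOIN session AS b ON b.sessionID = a.sessionID
    ) AS p ON p.name1 = u.user1 AND p.name2 = u.user2
    GROUP BY u.user1, u.user2
    ORDER BY u.user1, u.user2
""").fetchall()

for row in pair_stats:
    print(row)
```

With the session rows exactly as listed, the xxx/zzz pair comes out with one session and the click on zzz's side, because the two rows for session 3000 mark zzz, not xxx, as having clicked. If a user could appear more than once in the same session, COUNT(DISTINCT p.sessionID) would be the safer count.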
Not (yet) an answer; too long for a comment:
You need 3 tables;
users
user_id | user
-----------------------------
1 | xxx
2 | yyy
3 | zzz
combinations
user_id1 | user_id2
-----------------------------
1 | 2
1 | 3
2 | 3
sessions
session_id | user_id | sessionID | clicked |
--------------------------------------
101 | 1 | 1000 | yes |
102 | 2 | 1000 | no |
103 | 1 | 2000 | no |
104 | 2 | 2000 | no |
105 | 1 | 3000 | no |
106 | 3 | 3000 | yes |
A query returns data in the following format
-----------------------------------------
| CustID | ProdID | Date | Amount |
-----------------------------------------
| 001 | 444 | 12/09/2017 | £234 |
| 001 | 524 | .... | £234 |
| 004 | 444 | .... | £321 |
| 008 | 523 | ..... | £299 |
| 008 | 444 | ... | £299 |
.......
As shown above, there are often multiple records for CustID (one per ProdID). The problem is that the amount in the last column is the total for both 444 and 524 (in the case of Customer 001). Additionally, the ProdId can contain duplicates as the products are the same for many customers.
In other words, in Excel speak, I'm trying to remove duplicates from column CustID.
I'd like it to be:
-------------------
| CustID | Amount |
-------------------
| 001 | £234 |
| 004 | £321 |
| 008 | £299 |
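Since the amount is already the customer's total and is repeated on each of that customer's rows, dropping ProdID and Date and asking for distinct rows may be enough. A minimal sketch, assuming the query result is available as a table (here a hypothetical `purchases` table in an in-memory SQLite database, with the mostly elided Date column left out):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "purchases" stands in for the result of the original query.
conn.execute("CREATE TABLE purchases (CustID TEXT, ProdID TEXT, Amount TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [("001", "444", "£234"), ("001", "524", "£234"), ("004", "444", "£321"),
     ("008", "523", "£299"), ("008", "444", "£299")],
)

# One row per distinct (CustID, Amount) combination.
per_customer = conn.execute(
    "SELECT DISTINCT CustID, Amount FROM purchases ORDER BY CustID"
).fetchall()

print(per_customer)
```

This relies on every row of a customer carrying the same amount; if that ever differs, DISTINCT would return several rows for that customer, and a GROUP BY CustID with an explicit aggregate would be needed instead.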
I need to design a database to store user values : for each user, there is a specific set of columns.
For instance, Jon wants to store values in a table with 2 columns : name, age.
And Paul wants to store values in a 3 columns table : fruit, color, weight.
At this point, I have 2 options.
Option 1 - Store data as text values
I would have a first table 'profiles' with the users' preferences :
+----+---------+--------+-------------+
| id | user_id | label | type |
+----+---------+--------+-------------+
| 1 | 1 | name | VARCHAR(50) |
| 2 | 1 | age | INT |
| 3 | 2 | fruit | VARCHAR(50) |
| 4 | 2 | color | VARCHAR(50) |
| 5 | 2 | weight | DOUBLE |
+----+---------+--------+-------------+
And then store the data as text in another table:
+----+------------+--------+
| id | id_profile | value |
+----+------------+--------+
| 1 | 1 | Aron |
| 2 | 2 | 17 |
| 3 | 1 | Vince |
| 4 | 2 | 27 |
| 5 | 1 | Elena |
| 6 | 2 | 78 |
| 7 | 3 | Banana |
| 8 | 4 | Yellow |
| 9 | 5 | 124.8 |
+----+------------+--------+
After that, I would programmatically create and populate a clean table.
Option 2 - One column per type
In this option, I would have a first table 'profiles2' like this:
+----+---------+--------+------+
| id | user_id | label | type |
+----+---------+--------+------+
| 1 | 1 | name | 3 |
| 2 | 1 | age | 1 |
| 3 | 2 | fruit | 3 |
| 4 | 2 | color | 3 |
| 5 | 2 | weight | 2 |
+----+---------+--------+------+
where type refers to one of a fixed set of types: 1=INT, 2=DOUBLE, 3=VARCHAR(50)
And a data table like this:
+----+-------------+-----------+--------------+---------------+
| id | id_profile2 | int_value | double_value | varchar_value |
+----+-------------+-----------+--------------+---------------+
| 1 | 1 | NULL | NULL | Aron |
| 2 | 2 | 17 | NULL | NULL |
| 3 | 1 | NULL | NULL | Vince |
| 4 | 2 | 27 | NULL | NULL |
| 5 | 1 | NULL | NULL | Elena |
| 6 | 2 | 78 | NULL | NULL |
| 7 | 3 | NULL | NULL | Banana |
| 8 | 4 | NULL | NULL | Yellow |
| 9 | 5 | NULL | 124.8 | NULL |
+----+-------------+-----------+--------------+---------------+
Here I have cleaner tables, but I still need some programmatic trick to get everything in order.
The questions
Has anyone ever faced this situation?
What do you think of my 2 options?
Is there a better, less tricky solution?
Thanks a lot!
Edit
Hi again,
My model had a bug: it was impossible to retrieve a "line" of information, i.e. the information in the "values" table is not sortable.
After some wandering around the EAV model, it turned out not to be suitable, because it is not designed to store data sets, only specific pieces of information.
Then I ended up with this model:
First table 'labels':
+----+------------+----------+---------+
| id | profile_id | datatype | name |
+----+------------+----------+---------+
| 1 | 1 | 1 | Nom |
| 2 | 1 | 1 | Age |
| 3 | 2 | 2 | Fruit |
| 4 | 2 | 2 | Couleur |
| 5 | 2 | 2 | Poids |
+----+------------+----------+---------+
Then a very simple 'nodes' table, just to keep track of the lines of info:
+----+------------+
| id | profile_id |
+----+------------+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
+----+------------+
and a set of tables corresponding to different datatypes :
+----+---------+----------+--------+
| id | node_id | label_id | value |
+----+---------+----------+--------+
| 1 | 1 | 1 | John |
| 2 | 2 | 1 | Doe |
| 3 | 3 | 3 | Orange |
| 4 | 3 | 4 | Orange |
| 5 | 4 | 3 | Banane |
| 6 | 4 | 4 | Jaune |
+----+---------+----------+--------+
With this model, queries are OK. Data input is a bit tricky, but I will manage with clean code.
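For illustration, here is how one "line" per node could be read back from this model: group the value rows by node and pick each label out with a conditional aggregate. This is a sketch on SQLite through Python's sqlite3 module; the value table name `varchar_values` is an assumption, since the post does not name the per-datatype tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, profile_id INT);
    -- hypothetical name for the table holding text values
    CREATE TABLE varchar_values (
        id INTEGER PRIMARY KEY, node_id INT, label_id INT, value TEXT);
    INSERT INTO nodes VALUES (1, 1), (2, 1), (3, 2), (4, 2);
    INSERT INTO varchar_values VALUES
        (1, 1, 1, 'John'),   (2, 2, 1, 'Doe'),
        (3, 3, 3, 'Orange'), (4, 3, 4, 'Orange'),
        (5, 4, 3, 'Banane'), (6, 4, 4, 'Jaune');
""")

# One output row per node of profile 2; label 3 is Fruit and label 4 is Couleur.
fruit_lines = conn.execute("""
    SELECT n.id,
           MAX(CASE WHEN v.label_id = 3 THEN v.value END) AS fruit,
           MAX(CASE WHEN v.label_id = 4 THEN v.value END) AS couleur
    FROM nodes AS n
    LEFT JOIN varchar_values AS v ON v.node_id = n.id
    WHERE n.profile_id = 2
    GROUP BY n.id
    ORDER BY n.id
""").fetchall()

print(fruit_lines)
```

The node id is what ties the values of one line together, which is exactly what the first two options were missing. Labels stored in other datatype tables would need one more join per table.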
Cheers
Take a look at EAV data models.
Option 3: make two different tables.
One table is obviously for people. The other is obviously for fruit. They should be in different tables.
Why not just have a user table with name and ID, then a userValues table that has key-value pairs? That way John can have key "fruit" and value "mango", and another key "tires" and value "goodyear". Bob can have key "coin" and value "penny", and key "age" and value "42". Anyone can have any value they like, and you have maximum flexibility. Speed won't be great, and you'll have to cast strings to typed values, but it's always a tradeoff.
Cheers,
Daniel
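A minimal sketch of that key-value layout, with the cast mentioned above. The table and column names (`user_values`, `attr_key`, `attr_value`) are made up for the example, and it runs on SQLite through Python's sqlite3 module; MySQL would spell the cast as CAST(... AS SIGNED).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    -- every value is stored as text, whatever it represents
    CREATE TABLE user_values (user_id INT, attr_key TEXT, attr_value TEXT);
    INSERT INTO users VALUES (1, 'John'), (2, 'Bob');
    INSERT INTO user_values VALUES
        (1, 'fruit', 'mango'), (1, 'tires', 'goodyear'),
        (2, 'coin', 'penny'),  (2, 'age', '42');
""")

# Reading a number back means casting the stored string.
bob_age = conn.execute("""
    SELECT CAST(v.attr_value AS INTEGER)
    FROM user_values AS v
    JOIN users AS u ON u.id = v.user_id
    WHERE u.name = 'Bob' AND v.attr_key = 'age'
""").fetchone()[0]

print(bob_age + 1)  # usable as a number after the cast
```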
I have a table like this:
idstable
+--------+-----+-----+-----+-----+-----+
| userid | id1 | id2 | id3 | id4 | id5 |
+--------+-----+-----+-----+-----+-----+
| 1 | 6 | 2 | 52 | 32 | 16 |
+--------+-----+-----+-----+-----+-----+
| 2 | 12 | 5 | 18 | 21 | 5 |
+--------+-----+-----+-----+-----+-----+
| ... | ... | ... | ... | ... | ... |
I want to take the id numbers stored in these columns and use them to get rows from another table. The table I wish to get the rows from is:
namestable
+----+----------+
| id | name |
+----+----------+
| 6 | Bruce |
+----+----------+
| 2 | Mary |
+----+----------+
| 52 | Dick |
+----+----------+
| 32 | Bob |
+----+----------+
| 16 | Jack |
+----+----------+
| .. | ... |
The result set I'm trying to get is something like:
+------+----+-------+
| User | id | name |
+------+----+-------+
| 1 | 6 | Bruce |
+------+----+-------+
| 1 | 2 | Mary |
+------+----+-------+
| 1 | 52 | Dick |
+------+----+-------+
| 1 | 32 | Bob |
+------+----+-------+
| 1 | 16 | Jack |
+------+----+-------+
How can I query the idstable with a userid to retrieve the ids associated with it, and then retrieve the matching rows from the namestable? I've read this is called transposing, but I've yet to find a concise example.
Any ideas?
Use a join:
SELECT it.userid AS User, nt.id AS id, nt.name AS name
FROM idstable it
JOIN namestable nt ON nt.id IN (it.id1, it.id2, it.id3, it.id4, it.id5)
WHERE it.userid = 1
Ideally, your idstable wouldn't have 5 id columns per user, but 5 rows per userid.
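To show what the row-per-id layout buys you, here is a sketch with the same data stored as one row per (userid, id). The table name `user_ids` and the `position` column are assumptions added for the example, and SQLite (via Python's sqlite3) stands in for MySQL. The join condition shrinks to a plain equality:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical normalized replacement for idstable
    CREATE TABLE user_ids (userid INT, position INT, id INT);
    CREATE TABLE namestable (id INT, name TEXT);
    INSERT INTO user_ids VALUES
        (1, 1, 6), (1, 2, 2), (1, 3, 52), (1, 4, 32), (1, 5, 16);
    INSERT INTO namestable VALUES
        (6, 'Bruce'), (2, 'Mary'), (52, 'Dick'), (32, 'Bob'), (16, 'Jack');
""")

names_for_user = conn.execute("""
    SELECT ui.userid, nt.id, nt.name
    FROM user_ids AS ui
    JOIN namestable AS nt ON nt.id = ui.id
    WHERE ui.userid = ?
    ORDER BY ui.position
""", (1,)).fetchall()

for row in names_for_user:
    print(row)
```

Adding a sixth id for a user then becomes an extra row instead of a schema change.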