Selecting the most popular keyword in a mysql database - mysql

I have a column called keywords where users enter up to 4 keywords separated by a coma, ie:
----------------------------------
userId | kewords |
----------------------------------
01 | php,css,html,mysql |
02 | wordpress,css,drupal,xx |
03 | mysql,html,wordpress,css|
----------------------------------
I'm trying to figure out a query to select all the keywords from everyone, explode them by the coma and then count how many there are of each.
I know I can do this quite easily with PHP but I though there might be a way for mysql to do it...
Any ideas?

Try to normalize the data, ie store 4 rows instead of one for each user.
It also possible to split a string into a temporary table but I'm not sure that will help you much. Originally I found this source on mysql forge but that has been shut down so here is a similar code
http://www.pnplogic.com/blog/articles/MySQL_Convert_Delimited_String_To_Temp_Table_Result_Set.php

Related

How to extract relational data from a flat table using SQL?

I have a single flat table containing a list of people which records their participation in different groups and their activities over time. The table contains following columns:
- name (first/last)
- e-mail
- secondary e-mail
- group
- event date
+ some other data in a series of columns, relevant to a specific event (meeting, workshop).
I want to extract distinct people from that into a separate table, so that further down the road it could be used for their profiles giving them a list of what they attended and relevant info. In other words, I would like to have a list of people (profiles) and then link that to a list of groups they are in and then a list of events per group they participated in.
Obviously, same people appear a number of times:
| Full name | email | secondary email | group | date |
| John Smith | jsmith#someplace.com | | AcOP | 2010-02-12 |
| John Smith | jsmith#gmail.com | jsmith#somplace.com | AcOP | 2010-03-14 |
| John Smith | jsmith#gmail.com | | CbDP | 2010-03-18 |
| John Smith | jsmith#someplace.com | | BDz | 2010-04-02 |
Of course, I would like to roll it into one record for John Smith with both e-mails in the resulting People table. I can't rule out that there might be more records for same person with other e-mails than those two - I can live with that. To make it more complex ideally I would like to derive a list of groups, creating a Groups table (possibly with further details on the groups) and then a list of meetings/activities for each group. By linking that I would then have clean relational model.
Now, the question: is there a way to perform such a transformation of data in SQL? Or do I need to write a procedure (program) that would traverse the database and do it?
The database is in MySQL, though I can also use MS Access (it was given to me in that format).
There is no tool that does this automatically. You will have to write a couple queries (unless you want to write a DTS package or something proprietary). Here's a typical approach:
Write two select statements for the two tables you wish to create-- one for users and one for groups. You may need to use DISTINCT or GROUP BY to ensure you only get one row when the source table contains duplicates.
Run the two select statements and inspect them for problems. For example, it's possible some users show up with two different email addresses, or some users have the same name and were combined incorrectly. These will need to be cleaned up in order to proceed. There is great way to do this-- it's more or less a manual process requiring expert knowledge of the data.
Write CREATE TABLE scripts based on the two SELECT statements so that you can store the results somewhere.
Use INSERT FROM or SELECT INTO to populate the tables from your two SELECT statements.

Find existence of a record in MySQL table with input data as a list

I have a list of ids in text format as a comma separated value like so
("12345", "12346", "12347", etc, etc)
I would like to find their existence or non existence from a table say devices table which has a column called device ids (not primary key)
Ideally i would like to get a list which says if each item exists or not.
So far I have tried to get the query of those that exist and I have to manually find the non existing ones.
Is there a for loop I have to run on stored procedures or something like that. Please help.
Table structure
<pre>
| id | device_id | device_name |
+------+-----------------+---------------+
| 71 | 352701060409650 | 57X |
| 13 | 352701060409700 | 582 |
</pre>
You need to create a query with left join to the same table with 'IFNULL' condition. There already has been a post for this topic. Please check this out here.

SQL: Text type with a lot of commonly used values

I have a table that basically looks like the following:
Timestamp | Service | Observation
----------+---------+------------
... | vm-1 | 15
... | vm-1 | 20
... | vm-1 | 20
... | vm-1 | 20
... | vm-1 | 20
... | vm-1 | 20
... | bvm-2 | 184
... | bvm-2 | 104
... | bvm-2 | 4
... | bvm-2 | 14
... | bvm-2 | 657
... | bvm-2 | 6
... | bvm-2 | 6
The Service column will not have a lot of different values. I don't know at table creation time what all possible values are going to be so I can't use an enum, but the number of distinct values are going to grow very slowly at (less than ~10 new distinct values per month or less), whereas I'll have thousands of new observations per day.
Right now I'm just thinking of using a VARCHAR or mysql's TEXT type for the Service column, but given the specifics of the situation those kind of seem wasteful.
Are databases usually smart about this sort of thing? Or is there some way I can hint to the database that this behavior is something that it can reliably exploit?
I'm using MySQL 5.7. I'd prefer something standards compliant or portable, but I'm also open to MySQL specific workarounds.
EDIT:
In other words, what I want is for the column to be treated like an enum, but have the database figure out dynamically based on the data that shows up in the table what the different enum values are.
Every time you need to use an enum you should consider creating another table and reference to it. It's basic normalization. So create one table for the ServiceType with a name and an id field the name can be VARCHAR and the id should be INT. The actual table then just uses the id instead of the service name.
You can write a simple stored procedure to do the inserting and looking up of duplicate names as well as a view to access the results so outside of the DB you barely know how it is internally handled.
Your stored procedure needs to:
Check if the service exists and insert it if not. INSERT IGNORE ... is probably your friend here.
Get the ID of the service with SELECT id INTO #serv_id FROM ServiceType WHERE name = [service_name];
Insert into the table with the service ID instead of the service.
Don't over optimize. MySQL does not store TINYINT more efficiently than INT so just use the latter and it won't fail until you have billions of services.
I think , you have to create a new table for store the services and and then this table primary key (service_id) can be replaced in place of service text. But main table service column should be int type for storing the service id . So please change the service column type to int(4) .
hope it will be helpfull

check if value is present in one of the database rows

Im looking for a way to check if a value is present in one of the rows of the page column.
For example if should check if the value '45' is present?
Id | page |
---------------
1 | 23 |
---------------
2 | |
---------------
3 | 33,45,55 |
---------------
4 | 45 |
---------------
The find_in_set function is just what you're looking for:
SELECT *
FROM mytable
WHERE FIND_IN_SET('45', page) > 0
You should not store values in lists. This is especially true in this case:
Values should be stored in the proper data type. You are storing numbers as characters.
Foreign key relationships should be properly defined.
SQL doesn't have very good string processing functions.
Resulting queries cannot make use of indexes.
SQL has a great data type for lists, called a table. In this case, you want a junction table.
Sometimes, you are stuck with other people's really bad design decisions. In that case, you can use find_in_set() as suggested by Mureinik.

On a stats-system, should I save little bits of information about single visit on many tables or just one table?

I've been wondering this for a while already. The title stands for my question. What do you prefer?
I made a pic to make my question clearer.
Why am I even thinking of this? Isn't one table the most obvious option? Well, kind of. It's the simpliest way, but let's think more practical. When there is a ton of data in one table and user wants to only see statistics about browsers the visitors use, this may not be as successful. Taking browser-data out of one table is naturally better.
Multiple tables has disadvantages too. Writing data takes more time and resources. With one table there's only one mysql-query needed.
Anyway, I figured out a solution, which I think makes sense. Data is written to some kind of temporary table. All of those lines will be exported to multiple tables later (scheduled script). This way the system doesn't take loading-time from the users page, but the data remains fast to browse.
Let's bring some discussion here. I'm hoping to raise some opinions.
Which one is better? Let's find out!
The date, browser and OS are all related on a one-to-one basis... Without more information to require distinguishing records further, I'd be creating a single table rather than two.
Database design is based on creating tables that reflect entities, and I don't see two distinct entities in the example provided. Consider using views to serve data without duplicating the data in the database; a centralized copy of the data makes managing the data much easier...
What you're really thinking of is whether to denormalize the table or use the first normal form. When you're using 1NF you have a table that looks like this:
Table statistic
id | date | browser_id | os_id
---------------------------------------------
1 | 127003727 | 1 | 1
2 | 127391662 | 2 | 2
3 | 127912683 | 3 | 2
And then to explain what browser and os the client used, you need other tables:
Table browser
id | name | company | version
-----------------------------------------------
1 | Firefox | Mozilla | 3.6.8
2 | Safari | Apple | 4.0
3 | Firefox | Mozilla | 3.5.1
Table os
id | name | company | version
-----------------------------------------------
1 | Ubuntu | Canonical | 10.04
2 | Windows | Microsoft | 7
3 | Windows | Microsoft | 3.11
As OMG Ponies already pointed out, this isn't a good example to be creating several entities, so one can safely go with one table and then think about how he/she is going to deal with having to, say, find all the entries with a matching browser name.