Complex conditions on Eloquent Joins


I'm looking to use Eloquent to output some data from our database, but the database can vary.
Below is an illustration of my database tables
Webservice Tags:
----------------------------------------------------------
| id | webservice_tag | webservice_name | blog_id |
==========================================================
| 1 | TEST, TESSST, TES | Test Service | 1 |
----------------------------------------------------------
| 2 | OPTION, OPT, EXAMPLE | Example Service | 1 |
----------------------------------------------------------
| 3 | ANOTHER, ANO, THER | Another Service | 1 |
----------------------------------------------------------
Blog Post:
----------------------------------------------------------
| id | title | blog_id | tag |
==========================================================
| 1 | Blog Title 1 | 1 | THER |
----------------------------------------------------------
| 2 | Blog Title 2 | 1 | TES |
----------------------------------------------------------
| 3 | Blog Title 3 | 1 | ANOTHER |
----------------------------------------------------------
So here, we have two tables. Blog Posts and Web-service Tags.
Our blog posts are populated by a number of web-services from a number of different providers. Test Service, Example Service and Another Service for example. However, these web-services are very inconsistent; they'll send over a combination of tags, and no two posts are guaranteed to be the same.
So, we've created a table called Webservice Tags which is designed to log each of these occurrences. This way we can identify that (in the examples) Blog Title 3 was sent by Another Service, Blog Title 2 was sent by Test Service etc.
I'm developing reports to show how many posts we get from each of our web services. So for each blog post, I need to identify the web service and get the web service name that's associated with it. We have multiple blogs, each with their own web services (some may share a tag), so this report needs to be isolated to each individual blog.
Here's the query in Eloquent:
$query = DB::table('blog_posts')
->join('webservice_tags', function($join) use ($blog) {
$join->on('blog_posts.tag', '=', 'webservice_tags.webservice_tag')
->where('webservice_tags.blog_id', '=', $blog->id);
})
->addSelect('webservice_tags.webservice_name AS name')
->addSelect(DB::raw("COUNT(blog_posts.id) AS count"))
->where('blog_posts.blog_id', '=', $blog->id)
->groupBy('webservice_tags.webservice_name')
->get();
This query was fine whilst the Webservice sent a consistent tag. Now however there are different tags per web service and this report needs to count them all equally.
Here's my amendment, but it's not working as expected:
->join('webservice_tags', function($join) use ($blog) {
$join->on('blog_posts.tag', 'LIKE', DB::raw('CONCAT("%", webservice_tags.webservice_tag, "%")'))
->where('webservice_tags.blog_id', '=', $blog->id);
})
I'm not getting any matches in the query through this method.
The query aims to join each blog post to the webservice_tags row whose comma-separated webservice_tag list contains blog_posts.tag.
Is there a way to do this more effectively?
Explanation on the blog_id relationship
This system manages multiple blogs, as well as the posts within those blogs. A webservice may be applicable to one blog, but not the other, which is why the blog_id is associated with webservices as well as the individual posts. There are things like commission percentages that are associated with a webservice, and Test Webservice may offer 10% for Blog A, and 12% for Blog B, so they are essentially separate.

Quickest/crude solution:
The reason you do not see any result is because of this:
$join->on('blog_posts.tag', 'LIKE', DB::raw('CONCAT("%", webservice_tags.webservice_tag, "%")'))
which translates to:
INNER JOIN `webservice_tags`
ON `blog_posts`.`tag` LIKE CONCAT("%", webservice_tags.webservice_tag, "%")
This tries to match blog_posts.tag against a pattern like %ANOTHER, ANO, THER%, so the post's tag would have to contain the entire comma-separated list.
So let's say blog_posts.tag is "ANOTHER" and webservice_tags.webservice_tag is "ANOTHER, ANO, THER". Rather than matching "ANOTHER", the database is trying to match values like these:
ANOTHER != FOOANOTHER, ANO, THER
ANOTHER != ANOTHER, ANO, THERBAR
ANOTHER != FOOANOTHER, ANO, THERBAR
Which is the reason why you are not getting any results. You will need to swap the columns around, like this:
$join->on('webservice_tags.webservice_tag', 'LIKE', DB::raw('CONCAT("%", blog_posts.tag, "%")'))
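To see the effect of swapping the columns, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) instead of MySQL, so `||` stands in for CONCAT, and it adds one hypothetical post tagged NOT that is not in the question's data. The helper name count_by_service is made up for illustration. It also shows a caveat of the swapped LIKE: it matches substrings, so NOT is counted for Another Service because it occurs inside ANOTHER. Wrapping both sides in the ", " delimiter is one way to match whole tags only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE webservice_tags (id INTEGER, webservice_tag TEXT, webservice_name TEXT, blog_id INTEGER);
CREATE TABLE blog_posts (id INTEGER, title TEXT, blog_id INTEGER, tag TEXT);
INSERT INTO webservice_tags VALUES
  (1, 'TEST, TESSST, TES', 'Test Service', 1),
  (2, 'OPTION, OPT, EXAMPLE', 'Example Service', 1),
  (3, 'ANOTHER, ANO, THER', 'Another Service', 1);
INSERT INTO blog_posts VALUES
  (1, 'Blog Title 1', 1, 'THER'),
  (2, 'Blog Title 2', 1, 'TES'),
  (3, 'Blog Title 3', 1, 'ANOTHER'),
  (4, 'Hypothetical post', 1, 'NOT');  -- not in the question; added to show a false positive
""")

def count_by_service(on_clause):
    """Count posts per web service for blog 1 using the given join condition."""
    sql = f"""
        SELECT w.webservice_name, COUNT(p.id)
        FROM blog_posts p
        JOIN webservice_tags w ON {on_clause} AND w.blog_id = 1
        WHERE p.blog_id = 1
        GROUP BY w.webservice_name
        ORDER BY w.webservice_name
    """
    return conn.execute(sql).fetchall()

# Original direction: the post tag would have to contain the whole list.
original = count_by_service("p.tag LIKE '%' || w.webservice_tag || '%'")
# Swapped direction: the list has to contain the post tag (substring match).
swapped = count_by_service("w.webservice_tag LIKE '%' || p.tag || '%'")
# Delimiter-wrapped variant: the list has to contain the tag as a whole item.
exact = count_by_service("', ' || w.webservice_tag || ', ' LIKE '%, ' || p.tag || ', %'")

print(original, swapped, exact)
```

The delimiter-wrapped variant relies on the list always being separated by a comma followed by one space, as in the sample rows.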
Longer solution:
I'm just going to throw my idea here. These are some points that come into my mind when thinking about this:
Tag <-> Web Service mapping is stored as a string when it is actually a set of data. That makes it hard to search and hard to perform queries on it. I like to break them down into two or more tables. This is called Database normalization. Don't ask me which level it is, I never understood the technical description for them. :D
Since you mentioned that a web service can have many tags, but some tags can also be shared by different web services, we will need a pivot table to handle this many-to-many relationship.
When aggregating, I like to start from top down. So I write queries starting from webservices, then work my way down to blog posts. This is the opposite of your example.
I'm ignoring blog_id columns in webservices table for now since I don't quite understand yet what they do.
So I would first normalize the tables into webservices, webservice_tags, tags, and blog_posts, like this:
webservices:
-------------------------
| id | webservice_name |
=========================
| 1 | Test Service |
-------------------------
| 2 | Example Service |
-------------------------
| 3 | Another Service |
-------------------------
webservice_tags:
------------------------------------------
| id | tag_name | webservice_id |
==========================================
| 1 | TEST | 1 |
------------------------------------------
| 2 | TESSST | 1 |
------------------------------------------
| 3 | TES | 1 |
------------------------------------------
| 4 | OPTION | 2 |
------------------------------------------
| 5 | OPT | 2 |
------------------------------------------
| 6 | EXAMPLE | 2 |
------------------------------------------
| 7 | ANOTHER | 3 |
------------------------------------------
| 8 | ANO | 3 |
------------------------------------------
| 9 | THER | 3 |
------------------------------------------
| 10 | FOO | NULL |
------------------------------------------
| 11 | BAR | NULL |
------------------------------------------
Note that I'm using tag_name in the pivot table above for the sake of readability. I guess it would be better to use tag_id instead.
tags:
--------------------------
| id | tag_name |
==========================
| 1 | TEST |
--------------------------
| 2 | TESSST |
--------------------------
| 3 | TES |
--------------------------
| 4 | OPTION |
--------------------------
| 5 | OPT |
--------------------------
| 6 | EXAMPLE |
--------------------------
| 7 | ANOTHER |
--------------------------
| 8 | ANO |
--------------------------
| 9 | THER |
--------------------------
| 10 | FOO |
--------------------------
| 11 | BAR |
--------------------------
blog_posts:
-----------------------------------------------------------
| id | title | blog_id | tag_name |
===========================================================
| 1 | Blog Title 1 | 1 | THER |
-----------------------------------------------------------
| 2 | Blog Title 2 | 1 | TES |
-----------------------------------------------------------
| 3 | Blog Title 3 | 1 | ANOTHER |
-----------------------------------------------------------
Now to get a report of how many blog posts are created by each web service, we can do this with a join query.
In this case a query builder query should perform better, since we only want the aggregates, not the actual database models:
$report = DB::table('webservices')
->leftJoin('webservice_tags', 'webservice_tags.webservice_id', '=', 'webservices.id')
->leftJoin('tags', 'tags.tag_name', '=', 'webservice_tags.tag_name')
// Filter by blog inside the join so web services with no posts still show up with a count of 0
->leftJoin('blog_posts', function ($join) use ($blog) {
$join->on('blog_posts.tag_name', '=', 'tags.tag_name')
->where('blog_posts.blog_id', '=', $blog->id);
})
// COUNT(blog_posts.id) ignores the NULLs produced by the left joins
->select(['webservices.webservice_name', DB::raw('COUNT(`blog_posts`.`id`) as `num_posts`')])
->groupBy('webservices.id')
->get();
You would now get a report of how many blog posts were created by each web service, for all web services.
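The report can be tried out against the sample rows with a small sketch. It assumes SQLite (through Python's sqlite3 module) rather than MySQL, skips the separate tags table because webservice_tags already carries tag_name in this layout, and puts the blog filter in the join condition so that services without posts still appear with a count of 0. The function name posts_per_webservice is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE webservices (id INTEGER PRIMARY KEY, webservice_name TEXT);
CREATE TABLE webservice_tags (id INTEGER PRIMARY KEY, tag_name TEXT, webservice_id INTEGER);
CREATE TABLE blog_posts (id INTEGER PRIMARY KEY, title TEXT, blog_id INTEGER, tag_name TEXT);
INSERT INTO webservices VALUES
  (1, 'Test Service'), (2, 'Example Service'), (3, 'Another Service');
INSERT INTO webservice_tags (tag_name, webservice_id) VALUES
  ('TEST', 1), ('TESSST', 1), ('TES', 1),
  ('OPTION', 2), ('OPT', 2), ('EXAMPLE', 2),
  ('ANOTHER', 3), ('ANO', 3), ('THER', 3),
  ('FOO', NULL), ('BAR', NULL);
INSERT INTO blog_posts VALUES
  (1, 'Blog Title 1', 1, 'THER'),
  (2, 'Blog Title 2', 1, 'TES'),
  (3, 'Blog Title 3', 1, 'ANOTHER');
""")

def posts_per_webservice(blog_id):
    """Return (webservice_name, num_posts) for every web service, including zeros."""
    return conn.execute("""
        SELECT w.webservice_name, COUNT(p.id) AS num_posts
        FROM webservices w
        LEFT JOIN webservice_tags wt ON wt.webservice_id = w.id
        LEFT JOIN blog_posts p ON p.tag_name = wt.tag_name AND p.blog_id = ?
        GROUP BY w.id, w.webservice_name
        ORDER BY w.id
    """, (blog_id,)).fetchall()

report = posts_per_webservice(1)
print(report)
```

Because COUNT(p.id) only counts non-NULL ids, a service whose tags match no posts for the given blog is reported with 0 instead of disappearing from the result.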
One note: since you mentioned that web services may share tags, and you are linking web services to blog posts via tags, you cannot determine the origin web service with 100% accuracy, because you won't know for sure which web service set that tag for that post.
Extra: You might notice that I also added FOO and BAR to the Webservice Tags. This helps when, as you said, the web services are "very inconsistent" and new tags get added to your database. You could then query them and easily generate a report of which tags haven't been mapped yet.
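A report of unmapped tags could look like the sketch below. It assumes SQLite via Python's sqlite3 module and the webservice_tags layout from the tables above, where an unmapped tag simply has a NULL webservice_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE webservice_tags (id INTEGER PRIMARY KEY, tag_name TEXT, webservice_id INTEGER);
INSERT INTO webservice_tags (tag_name, webservice_id) VALUES
  ('TEST', 1), ('OPTION', 2), ('ANOTHER', 3), ('FOO', NULL), ('BAR', NULL);
""")

# Tags that have been logged but not yet assigned to any web service
unmapped = [row[0] for row in conn.execute(
    "SELECT tag_name FROM webservice_tags WHERE webservice_id IS NULL ORDER BY tag_name"
)]
print(unmapped)
```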

Related

Group results in Crystal Reports where linked column has multiple related results

I have two tables that are linked. The first table is a list of prescribed medications ('medications' table) and the other is a list of actions that relate to the medication when it has been prescribed ('PMP' table).
For each prescribed medication, there can be multiple actions such as authorise, comment, stop etc.
What I am trying to do is list all prescribed medications along with every action for each drug.
I can do this using the code below.
select medications.oid, medications.drug, PMP.action_dte, PMP.actions
from medications
left join PMP on medications.oid = PMP.fk2_oid
This works fine and I get everything I need, but the medications that have multiple actions are returned once for each action.
My issue comes when I want to put it into a Crystal Report.
I have a subreport called Drugs, and I want to list all of the authorised drugs on the left and then all the actions for each drug on the right. But what I get is:
OID | MEDICATION | ACTION DTE | ACTION
| | |
1 | Paracetamol 200mg | 01.01.17 | Authorised
| | |
| | |
1 | Paracetamol 200mg | 03.01.17 | Comment
| | |
| | |
1 | Paracetamol 200mg | 10.01.17 | Stop
| | |
| | |
2 |Ibuprofen 100mg | 05.01.17 | Authorised
| | |
| | |
2 |Ibuprofen 100mg | 06.02.17 | Comment
Whereas I would like:
OID | MEDICATION | ACTION DTE | ACTION
| | |
1 | Paracetamol 200mg | 01.01.17 | Authorised
| | 03.01.17 | Comment
| | 10.01.17 | Stop
| | |
2 | Ibuprofen 100mg | 05.01.17 | Authorised
| | 06.02.17 | Comment
| | |
I have played around with grouping by oid and fk2_oid, as well as trying to link two subreports on those fields, but I am getting nowhere.
Is anyone able to suggest a formula or, preferably, a modification to the code that will allow the Crystal Report to display the drug on the left ONCE, along with every action related to that drug on the right?
Thanks in advance!
EDIT ----------
I forgot to mention that there is a date field in the equation. Which is PMP.action_dte. Edited the original query and description.
Apologies for any confusion caused.
EDIT 2 ---------
Apparently my original post was misleading.....modified to hopefully clarify. I would like the drug grouped by OID with every action allocated to that drug listed along with the action date, as shown above.

Bring in your results.
Under Report > Group Expert, group by Drug.
Delete Drug from the Details section.
In the Section Expert, select Group Header 1 and check Underlay Following Sections.
Remove the bold formatting from the Group #1 Medication field.
You can download the Crystal Report here

MySQL - At what point should more than one table be used?

Edit for future viewers: Aside from the accepted answer which helped me I found some really good info here .
I've got a database with a single table for displaying inventory on a website (RVs). It stores the typical info: year, make, model, etc. I originally made it with 6 extra columns for storing "special features", but I don't like having such a hard limit on what options can be listed. Since I've never messed with more than a single table my gut instinct was to just add 24 or so more columns to cover everything, but something in my head told me that there might be a better way. So when do I decide N columns is too many? The data in these columns will commonly not be unique.
(Sorry for crappy diagram)
Current table design:
-----------------------------------------------------------------------
| id | year | make | model | price | ft_1 | ft_2 | ft_3 | ft_4 | ft_5 |
-----------------------------------------------------------------------
| | | | | | | | | | |
-----------------------------------------------------------------------
Possible better design:
table #1
------------------------------------
| id | year | make | model | price |
------------------------------------
| | | | | |
------------------------------------
table #2
---------------------------------------------
| unique_id(?) | feature | unit_ref |
---------------------------------------------
| 0 | "Diesel Pusher" | 2,6,14 |
---------------------------------------------
I feel like a bonus of the second table might be that I could more easily populate a dropdown containing all the previously entered features to speed up adding new units to inventory.
Is this the right way to go about it, or should I just add more columns and be content?
Thanks.

Believe it or not, your best option would likely be to add a third table.
Since each record in your rvs table can be linked to multiple rows in the features table, and each feature can correspond to multiple rvs, you have a many-to-many relationship which is inherently difficult to maintain in a relational dbms. By adding a third "intersection" table you convert it to a one-to-many-to-one relationship which can be enforced declaratively by the dbms.
Your table structure would then become something like
rvs
------------------------------------
| id | year | make | model | price |
------------------------------------
| | | | | |
------------------------------------
features
--------------------------
| id | feature |
--------------------------
| 1192 | "Diesel Pusher" |
--------------------------
rv_features
----------------------
| rv_id | feature_id |
----------------------
| | |
----------------------
How do you make use of this? Suppose you want to record the fact that the 2016 Travelmore CampMaster has a 25kW diesel generator. You would first add a record to rvs like
--------------------------------------------------
| id | year | make | model | price |
--------------------------------------------------
| 0231 | 2016 | Travelmore | CampMaster | 750000 |
| 2101 | 2016 | Travelmore | Domestant | 650000 |
--------------------------------------------------
(Note the value in the id column is entirely arbitrary; its sole purpose is to serve as the primary key which uniquely identifies the record. It can encode meaningful information, but it must be something that will not change throughout the life of the record it identifies.)
You then add (or already have) the generator in the features table:
--------------------------------
| id | feature |
--------------------------------
| 1192 | Diesel Pusher 450hp |
| 3209 | diesel generator 25kW |
--------------------------------
Finally, you associate the rv to the feature with a record in rv_features:
----------------------
| rv_id | feature_id |
----------------------
| 0231 | 3209 |
| 0231 | 1192 |
| 2101 | 3209 |
----------------------
(I've added a few other records to each table for context.)
Now, to retrieve the features of the 2016 CampMaster, you use the following SQL query:
SELECT r.year, r.make, r.model, f.feature
FROM rvs r, features f, rv_features rf
WHERE r.id = rf.rv_id
AND rf.feature_id = f.id
AND r.id = '0231';
to get
----------------------------------------------------------
| year | make | model | feature |
----------------------------------------------------------
| 2016 | Travelmore | CampMaster | diesel generator 25kW |
| 2016 | Travelmore | CampMaster | Diesel Pusher 450hp |
----------------------------------------------------------
To see the rvs with a 25kW generator, change the query to
SELECT r.year, r.make, r.model, f.feature
FROM rvs r, features f, rv_features rf
WHERE r.id = rf.rv_id
AND rf.feature_id = f.id
AND f.id = '3209';
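The three-table layout can be exercised end to end with a short sketch. It assumes SQLite via Python's sqlite3 module, stores the ids as text so that 0231 keeps its leading zero, and uses explicit JOIN syntax in place of the comma-separated FROM list. The helper names features_of and rvs_with are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rvs (id TEXT PRIMARY KEY, year INTEGER, make TEXT, model TEXT, price INTEGER);
CREATE TABLE features (id TEXT PRIMARY KEY, feature TEXT);
-- Intersection table: one row per (rv, feature) pair
CREATE TABLE rv_features (
  rv_id TEXT REFERENCES rvs(id),
  feature_id TEXT REFERENCES features(id),
  PRIMARY KEY (rv_id, feature_id)
);
INSERT INTO rvs VALUES
  ('0231', 2016, 'Travelmore', 'CampMaster', 750000),
  ('2101', 2016, 'Travelmore', 'Domestant', 650000);
INSERT INTO features VALUES
  ('1192', 'Diesel Pusher 450hp'),
  ('3209', 'diesel generator 25kW');
INSERT INTO rv_features VALUES
  ('0231', '3209'), ('0231', '1192'), ('2101', '3209');
""")

def features_of(rv_id):
    """List the feature names attached to one RV."""
    rows = conn.execute("""
        SELECT f.feature
        FROM rv_features rf
        JOIN features f ON f.id = rf.feature_id
        WHERE rf.rv_id = ?
        ORDER BY f.feature
    """, (rv_id,))
    return [row[0] for row in rows]

def rvs_with(feature_id):
    """List the models of all RVs that have one feature."""
    rows = conn.execute("""
        SELECT r.model
        FROM rvs r
        JOIN rv_features rf ON rf.rv_id = r.id
        WHERE rf.feature_id = ?
        ORDER BY r.model
    """, (feature_id,))
    return [row[0] for row in rows]

print(features_of("0231"))
print(rvs_with("3209"))
```

The composite primary key on rv_features keeps the same feature from being attached to the same RV twice, and a dropdown of known features is then just a SELECT on the features table.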
Sherantha's link to A Quick-Start Tutorial on Relational Database Design actually looks like a good intro to table design and normalization; you might find it useful.

There is a thing called "third normal form". Roughly, it says that every column other than the unique id should depend only on that id, so values that repeat across rows get moved into their own tables. Taken to the extreme, this means you make a table for year, a table for make, a table for models etc., and a table where you combine all these ids into one connected dataset.
But this is not always practical. I think the best way is something in between: separate tables for entries that repeat very often, but there doesn't need to be an extra table for price with unique ids; that would be overkill, I think.

Based on your scenario, if you believe the number of feature columns will stay the same, there is no need for a second table. If there is any possibility that the features will grow in the future, you should break your table up into two (RVS and Features), then create a third table that links RVS and Features, since this looks like a many-to-many relationship. So I suggest you use three tables.

I think it is better for you to be more familiar with relational database design. This is a short but great article I have found earlier.

How to save language skill levels correctly in a database

I think I'm facing a problem that many of you have faced before. I have a registration form where a user can pick any language on the planet and then pick their skill level for that language from a select box.
So, for example:
Language1: German
Skill: Fluent
Language2: English
Skill: Basic
I'm thinking what's the best way to store these values in a MySQL database.
I thought of two ways.
First way: creating a column for each language and assigning a skill value to it.
--------------------------------------------------
| UserID | language_en | language_ge |
--------------------------------------------------
| 22 | 1 | 4 |
--------------------------------------------------
| 23 | 3 | 4 |
--------------------------------------------------
So the language is always the column's name and the number represents the skill level (1. Basic, 2. Average ... )
I believe this is a nice way to work with these things and it is also pretty fast. The problem starts when there are 50 languages or more. It doesn't sound like a good idea to make 50 columns where the script always has to check them all to see whether a user has any skill in that language.
Second way: inserting an array in one of the table's column. The table will look like this:
----------------------------------
| UserID | languages |
----------------------------------
| 22 | "ge"=>"4", "en"=>"1" |
----------------------------------
This way the user with ID 22 has skill level 4 for German and skill level 1 for English. This is fine because we don't need to check 50 additional columns (or even more), but it's not the right way in my eyes anyway.
We would have to parse a lot of results to find a user who, for example, has level 1 for German and level 2 for Spanish, regardless of their English skill level. That will take the server longer, and with bigger data we are in trouble.
I bet many of you have experienced this kind of issue. Please, can someone advise me how to sort this out?
Thanks a lot.

I'd advise you to have a separate table with all the languages:
Table: Language
+------------+-------------------+--------------+
| LanguageID | LanguageNameShort | LanguageName |
+------------+-------------------+--------------+
| 1 | en | English |
| 2 | de | German |
+------------+-------------------+--------------+
And another table to link the users to the languages:
Table: LanguageLink
+--------+------------+--------------+
| UserID | LanguageID | SkillLevelID |
+--------+------------+--------------+
| 22 | 1 | 1 |
| 22 | 2 | 4 |
| 23 | 1 | 3 |
| 23 | 2 | 4 |
+--------+------------+--------------+
This is the normalised way to represent this kind of relation in a DB. All data is easily searchable and you don't have to change the DB schema if you add a language.
To render a user's languages you could use a query like this. It will give you one row per language the user speaks:
SELECT
LanguageLink.UserID,
LanguageLink.SkillLevelID,
Language.LanguageNameShort
FROM
LanguageLink,
Language
WHERE
LanguageLink.UserID = 22
AND LanguageLink.LanguageID = Language.LanguageID
If you want to go further, you could create another table for the skill level:
Table: Skill
+--------------+-----------+
| SkillLevelID | SkillName |
+--------------+-----------+
| 1 | bad |
| 2 | mediocre |
| 3 | good |
| 4 | perfect |
+--------------+-----------+
What I've done here is called Database normalization. I'd recommend reading about it, it may help you design further databases.
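The search the question worries about (users with given levels in several languages, ignoring the rest) becomes a single query on this layout. The sketch below assumes SQLite via Python's sqlite3 module, leaves out the Skill table for brevity, and adds a composite primary key on LanguageLink so each user has at most one row per language. The function name users_with is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Language (LanguageID INTEGER PRIMARY KEY, LanguageNameShort TEXT, LanguageName TEXT);
CREATE TABLE LanguageLink (
  UserID INTEGER,
  LanguageID INTEGER REFERENCES Language(LanguageID),
  SkillLevelID INTEGER,
  PRIMARY KEY (UserID, LanguageID)
);
INSERT INTO Language VALUES (1, 'en', 'English'), (2, 'de', 'German');
INSERT INTO LanguageLink VALUES (22, 1, 1), (22, 2, 4), (23, 1, 3), (23, 2, 4);
""")

def users_with(requirements):
    """Return ids of users matching every (language code -> skill level) pair.

    requirements must be a non-empty dict such as {"de": 4, "en": 3}.
    """
    clauses = " OR ".join(
        "(l.LanguageNameShort = ? AND ll.SkillLevelID = ?)" for _ in requirements
    )
    params = [value for pair in requirements.items() for value in pair]
    sql = f"""
        SELECT ll.UserID
        FROM LanguageLink ll
        JOIN Language l ON l.LanguageID = ll.LanguageID
        WHERE {clauses}
        GROUP BY ll.UserID
        HAVING COUNT(*) = ?      -- the user must satisfy every requested pair
        ORDER BY ll.UserID
    """
    return [row[0] for row in conn.execute(sql, params + [len(requirements)])]

print(users_with({"de": 4}))
print(users_with({"de": 4, "en": 3}))
```

Languages that are not named in the requirements are never looked at, which is the behaviour that was awkward to get with 50 columns or a serialized array.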

How to split CSVs from one column to rows in a new table in MSSQL 2008 R2

Imagine the following (very bad) table design in MSSQL2008R2:
Table "Posts":
| Id (PK, int) | DatasourceId (PK, int) | QuotedPostIds (nvarchar(255)) | [...]
| 1 | 1 | | [...]
| 2 | 1 | 1 | [...]
| 2 | 2 | 1 | [...]
[...]
| 102322 | 2 | 123;45345;4356;76757 | [...]
So, the column QuotedPostIds contains a semicolon-separated list of self-referencing PostIds (Kids, don't do that at home!). Since this design is ugly as hell, I'd like to extract the values from the QuotedPostIds column into a new n:m relationship table like this:
Desired new table "QuotedPosts":
| QuotingPostId (int) | QuotedPostId (int) | DatasourceId (int) |
| 2 | 1 | 1 |
| 2 | 1 | 2 |
[...]
| 102322 | 123 | 2 |
| 102322 | 45345 | 2 |
| 102322 | 4356 | 2 |
| 102322 | 76757 | 2 |
The primary key for this table could either be a combination of QuotingPostId, QuotedPostId and DatasourceID or an additional artificial key generated by the database.
It is worth noticing that the current Posts table contains about 6,300,000 rows but only about 285,000 of those have a value set in the QuotedPostIds column. Therefore, it might be a good idea to pre-filter those rows. In any case, I'd like to perform the normalization using internal MSSQL functionality only, if possible.
I have already read other posts on this topic, which mostly dealt with split functions, but I could find out neither how exactly to create the new table while also copying the appropriate value from the DatasourceId column, nor how to filter the rows to touch accordingly.
Thank you!
Edit: I thought it through and finally solved the problem using an external C# program instead of internal MSSQL functionality. Since it seems that it could have been done using Mikael Eriksson's suggestion, I will mark his post as an answer.

From the comments, you say you have a string split function that you don't know how to use with a table.
The answer is to use cross apply, something like this:
select P.Id,
S.Value
from Posts as P
cross apply dbo.Split(';', P.QuotedPostIds) as S
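For comparison, the same row expansion done outside the database (the route the asker ended up taking with an external program) can be sketched in a few lines. This is only an illustration of the transformation, not the T-SQL solution; the function name explode_quoted_posts and the in-memory sample rows are assumptions, and rows with an empty QuotedPostIds value are skipped up front, as the question suggests.

```python
def explode_quoted_posts(posts):
    """Yield (QuotingPostId, QuotedPostId, DatasourceId) for each quoted id.

    posts is an iterable of (Id, DatasourceId, QuotedPostIds) tuples, where
    QuotedPostIds is a semicolon-separated string, empty, or None.
    """
    for post_id, datasource_id, quoted in posts:
        if not quoted:
            continue  # pre-filter: most rows have nothing to split
        for part in quoted.split(";"):
            part = part.strip()
            if part:
                yield (post_id, int(part), datasource_id)

# Sample rows taken from the Posts table in the question
posts = [
    (1, 1, ""),
    (2, 1, "1"),
    (2, 2, "1"),
    (102322, 2, "123;45345;4356;76757"),
]
quoted_posts = list(explode_quoted_posts(posts))
for row in quoted_posts:
    print(row)
```

Each yielded tuple corresponds to one row of the desired QuotedPosts table, with the DatasourceId carried over from the quoting post.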

How can I save semantic information in a MySQL table?

I wish to save some semantic information about data in a table. How can I save this information in MySQL, such that I can access data and also search for the articles using the semantic data.
For example, I have a article about Apple and Microsoft. The semantic data will be like
Person : Steve Jobs
Person : Steve Ballmer
Company : Apple
Company : Microsoft
I want to save the information without losing the info that Steve Jobs and Steve Ballmer are persons and Apple and Microsoft are companies. I also want to search for articles about Steve Jobs / Apple.
Person and Company are not the only possible types, hence adding new fields is not viable. Since the type of the data is to be saved, I cannot use FullText field type directly.
Update - These are two options that I am considering.
Save the data in a full text column as serialized php array.
Create another table with 3 columns
--------------------------------
| id | subject | object |
--------------------------------
| 1 | Person | Steve Ballmer |
| 1 | Person | Steve Jobs |
| 1 | Company | Microsoft |
| 1 | Company | Apple |
| 2 | Person | Obama |
| 2 | Country | US |
--------------------------------

You're working on a hard and interesting problem! You may get some interesting ideas from looking at the Dublin Core Metadata Initiative.
http://dublincore.org/metadata-basics/
To make it simple, think of your metadata items as all fitting in one table.
e.g.
Ballmer employed-by Microsoft
Ballmer is-a Person
Microsoft is-a Organization
Microsoft run-by Ballmer
SoftImage acquired-by Microsoft
SoftImage is-a Organization
Joel Spolsky is-a Person
Joel Spolsky formerly-employed-by Microsoft
Spolsky, Joel dreamed-up StackOverflow
StackOverflow is-a Website
Socrates is-a Person
Socrates died-on (some date)
The trick here is that some, but not all, of your first and third column values need to be BOTH arbitrary text AND serve as indexes into the first and third columns. Then, if you're trying to figure out what your database has on Spolsky, you can full-text search your first and third columns for his name. You'll get out a bunch of triplets. The values you find will tell you a lot. If you want to know more, you can search again.
To pull this off you'll probably need to have five columns, as follows:
Full text subject (whatever your user puts in)
Canonical subject (what your user puts in, massaged into a standard form)
Relation (is-a etc)
Full text object
Canonical object
The point of the canonical forms of your subject and object is to allow queries like this to work even if your user puts in "Joel Spolsky" in one place and "Spolsky, Joel" in another while meaning the same person.
SELECT *
FROM relationships a
JOIN relationships b ON (a.canonical_object = b.canonical_subject)
WHERE MATCH (a.subject, a.object) AGAINST ('Spolsky')
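A small sketch of the five-column idea follows. It assumes SQLite via Python's sqlite3 module and swaps MySQL's MATCH ... AGAINST full-text search for a plain LIKE, since full-text support in SQLite depends on how it was built. The lower-cased canonical values and the function name two_hop are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE relationships (
  subject TEXT, canonical_subject TEXT,
  relation TEXT,
  object TEXT, canonical_object TEXT
);
INSERT INTO relationships VALUES
  ('Joel Spolsky',  'joel spolsky',  'is-a',                 'Person',        'person'),
  ('Joel Spolsky',  'joel spolsky',  'formerly-employed-by', 'Microsoft',     'microsoft'),
  ('Spolsky, Joel', 'joel spolsky',  'dreamed-up',           'StackOverflow', 'stackoverflow'),
  ('StackOverflow', 'stackoverflow', 'is-a',                 'Website',       'website'),
  ('Microsoft',     'microsoft',     'is-a',                 'Organization',  'organization'),
  ('Microsoft',     'microsoft',     'run-by',               'Ballmer',       'ballmer');
""")

def two_hop(term):
    """Find triples mentioning term, joined to what is known about their object."""
    pattern = f"%{term}%"
    return conn.execute("""
        SELECT a.subject, a.relation, a.object, b.relation, b.object
        FROM relationships a
        JOIN relationships b ON a.canonical_object = b.canonical_subject
        WHERE a.subject LIKE ? OR a.object LIKE ?
        ORDER BY a.relation, b.relation
    """, (pattern, pattern)).fetchall()

results = two_hop("Spolsky")
for row in results:
    print(row)
```

Both spellings of the name are found by the text search, and the join on the canonical columns then pulls in what is stored about Microsoft and StackOverflow.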

You might want to normalize your data table by making 2 tables.
----------------
| id | subject |
----------------
| 1 | Person |
| 2 | Company |
| 3 | Country |
----------------
-----------------------------------
| id | subject-id | object |
-----------------------------------
| 1 | 1 | Steve Ballmer |
| 2 | 1 | Steve Jobs |
| 3 | 2 | Microsoft |
| 4 | 2 | Apple |
| 5 | 1 | Obama |
| 6 | 3 | US |
-----------------------------------
This allows you to more easily see all the different subject types you have defined.
