Count substrings in SQL (in Digital Metaphors' ReportBuilder) - mysql

I'm trying to create a report in ReportBuilder (Digital Metaphors, not Microsoft) and I'm having trouble getting the SQL to do what I want.
I have one table with a field building:
| building |
+------------+
| WhiteHouse |
| TajMahal |
and another table with a field locations:
| id | locations |
+----+-----------------------------------------------------------------+
| 1 | WhiteHouse:RoseGarden,WhiteHouse:MapRoom,TajMahal:MainSanctuary |
| 2 | TajMahal:NorthGarden,WhiteHouse:GreenRoom |
I would like to create a table showing how many times each building is used in locations, like so:
| building | count |
+------------+-------+
| WhiteHouse | 3 |
| TajMahal | 2 |
The characters : and , are never used in building or room names. Even a quick-and-dirty solution that assumes that building names never appear in room names would be good enough for me.
Of course this would be easy to do in just about any sane programming language (total over something like /\bWhiteHouse:/); the trick will be getting RB to do it. Suggestions for workarounds are welcome.

It is possible to split the locations string into pieces, using the "," and ":" characters as separators, in SQL Server with the help of a custom SQL split function (dbo.split below is user-defined, not built in):
select
p2.val,
count(p2.val)
from locations l
cross apply dbo.split(l.locations,',') p1
cross apply dbo.split(p1.val,':') p2
inner join building b
on b.building = p2.val
group by p2.val
I'm not sure whether MySQL has a similar function; if it does, you can use the query above as a template.

You can try this. It is probably not the fastest solution, but it is certainly an easier one:
SELECT t1.building,
( SELECT SUM( ROUND( (LENGTH(t2.locations)
- LENGTH(REPLACE(t2.locations, concat(t1.building, ':'), ''))
) / (LENGTH(t1.building) + 1)
)
)
FROM table2 AS t2
) as count
FROM table1 as t1
SQL Fiddle Demo
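The arithmetic behind the LENGTH/REPLACE trick can be checked with a small sketch. It assumes Python's bundled SQLite as a stand-in for MySQL (so `||` replaces CONCAT) and reuses the sample rows from the question; the table names table1 and table2 follow the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (building TEXT);
    CREATE TABLE table2 (id INTEGER, locations TEXT);
    INSERT INTO table1 VALUES ('WhiteHouse'), ('TajMahal');
    INSERT INTO table2 VALUES
        (1, 'WhiteHouse:RoseGarden,WhiteHouse:MapRoom,TajMahal:MainSanctuary'),
        (2, 'TajMahal:NorthGarden,WhiteHouse:GreenRoom');
""")

# Removing every "building:" shortens the string by
# occurrences * (len(building) + 1), so dividing recovers the count.
query = """
    SELECT t1.building,
           (SELECT SUM((LENGTH(t2.locations)
                        - LENGTH(REPLACE(t2.locations, t1.building || ':', '')))
                       / (LENGTH(t1.building) + 1))
            FROM table2 AS t2) AS cnt
    FROM table1 AS t1
"""
counts = dict(conn.execute(query).fetchall())
print(counts)  # WhiteHouse -> 3, TajMahal -> 2
```

As the question allows, this is the quick-and-dirty variant: a building whose name is the tail of another name (a hypothetical "OldWhiteHouse:") would also be counted as "WhiteHouse:".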

Related

Implementing SUMIF() function from Excel to SQL

Lately, I have been learning how to use SQL in order to process data. Normally, I would use Python for that purpose, but SQL is required for the classes and I still very much struggle with using it comfortably in more complicated scenarios.
What I want to achieve is the same result as in the following screenshot in Excel:
Behaviour in Excel, that I want to implement in SQL
The formula I used in Excel:
=SUMIF(B$2:B2;B2;C$2:C2)
Sample of the table:
> select * from orders limit 5;
+------------+---------------+---------+
| ID | clientID | tonnage |
+------------+---------------+---------+
| 2005-01-01 | 872-13-44-365 | 10 |
| 2005-01-04 | 369-43-03-176 | 2 |
| 2005-01-05 | 408-24-90-350 | 2 |
| 2005-01-10 | 944-16-93-033 | 5 |
| 2005-01-11 | 645-32-78-780 | 14 |
+------------+---------------+---------+
The implementation is supposed to return similar results as following group by query:
select
orders.clientID as ID,
sum(orders.tonnage) as Tonnage
from orders
group by orders.clientID;
That is, it should return how much each client has purchased, but at the same time I want it to return each step of the addition as a separate record.
For instance:
Client A bought 350 in the first order and then 231 in the second one. In that case the query would return something like this:
client A - 350 - 350 // first order
client A - 231 - 581 // second order
Example, how it would look like in Excel
I have already tried to use something like:
select
orders.clientID as ID,
sum(case when orders.clientID = <ID> then orders.tonnage end)
from orders;
But I got stuck quickly, since I would need to somehow change this <ID> dynamically and store its value in some kind of temporary variable, and I can't really figure out how to implement such a thing in SQL.
You can use a window function to compute a running sum (window functions are available in MySQL 8.0 and later).
In your case, use it like this:
select id, clientID, sum(tonnage) over (partition by clientID order by id) tonnageRunning
from orders
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=13a8c2d46b5ac22c5c120ac937bd6e7a
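A minimal sketch of the same running sum, assuming Python's bundled SQLite (3.25 or later, which also supports window functions) in place of MySQL 8.0. The rows are made up for illustration, reusing the 350 and 231 figures from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT, clientID TEXT, tonnage INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2005-01-01", "A", 350),  # hypothetical sample rows
        ("2005-01-04", "B", 10),
        ("2005-01-05", "A", 231),
        ("2005-01-10", "B", 5),
    ],
)

# PARTITION BY restarts the sum for each client; ORDER BY id makes it
# cumulative, like SUMIF over a range that grows one row at a time.
running = conn.execute("""
    SELECT id, clientID, tonnage,
           SUM(tonnage) OVER (PARTITION BY clientID ORDER BY id) AS tonnageRunning
    FROM orders
    ORDER BY clientID, id
""").fetchall()

for row in running:
    print(row)
# ('2005-01-01', 'A', 350, 350)
# ('2005-01-05', 'A', 231, 581)
# ('2005-01-04', 'B', 10, 10)
# ('2005-01-10', 'B', 5, 15)
```

Note that with the default window frame, rows that tie on the ORDER BY column receive the same running total, so the ordering column should be unique within each client.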

SELECT and COUNT from two different tables in MySQL

I'm using phpMyAdmin and I've two SQL tables:
___SalesTaxes
|--------|----------|------------|
| STX_Id | STX_Name | STX_Amount |
|--------|----------|------------|
| 1 | Tax 1 | 5.00 |
| 2 | Tax 2 | 13.50 |
|--------|----------|------------|
___Inventory
|--------|----------|----------|---------------------|
| INV_Id | INV_Name | INV_Rate | INV_ApplicableTaxes |
|--------|----------|----------|---------------------|
| 10 | Bike | 9.00 | 1 |
| 11 | Movie | 3.00 | 1,2 |
|--------|----------|----------|---------------------|
INV_ApplicableTaxes lists the applicable taxes.
Each item in the ___Inventory table is linked to the ___SalesTaxes table, so that I know which taxes are applicable to the item.
How can I list items in ___Inventory and sum applicable taxes to have something like this:
Bike - Applicable sum of taxes is 5.00%
Movie - Applicable sum of taxes is 18.50%
What I already tried is:
SELECT
a.INV_ApplicableTaxes,
(
SELECT Count(b.STX_Amount)
FROM ___SalesTaxes
Where b.STX_Id = a.INV_ApplicableTaxes
) as b_count
FROM ___Inventory
Thanks.
You have a very poor data structure. You should not be storing 1,2 in a single column. Why not?
Integers should be stored as integers, not strings.
Foreign key relationships should be properly declared.
SQL has relatively poor string handling functions.
The SQL optimizer cannot optimize string operations very well, particularly between tables.
The proper way to store this would be a separate itemTaxes table, with one row per combination of item and applicable tax.
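As a rough sketch of that normalized layout, assuming Python's bundled SQLite as a stand-in for MySQL: the itemTaxes table and its column names are hypothetical, chosen to mirror the question's naming, and the data is the question's sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ___SalesTaxes (STX_Id INTEGER PRIMARY KEY, STX_Name TEXT, STX_Amount REAL);
    CREATE TABLE ___Inventory (INV_Id INTEGER PRIMARY KEY, INV_Name TEXT, INV_Rate REAL);
    -- Hypothetical junction table: one row per (item, tax) pair.
    CREATE TABLE itemTaxes (
        INV_Id INTEGER REFERENCES ___Inventory(INV_Id),
        STX_Id INTEGER REFERENCES ___SalesTaxes(STX_Id),
        PRIMARY KEY (INV_Id, STX_Id)
    );
    INSERT INTO ___SalesTaxes VALUES (1, 'Tax 1', 5.00), (2, 'Tax 2', 13.50);
    INSERT INTO ___Inventory VALUES (10, 'Bike', 9.00), (11, 'Movie', 3.00);
    INSERT INTO itemTaxes VALUES (10, 1), (11, 1), (11, 2);
""")

# With integer keys on both sides, the sum is a plain join plus GROUP BY.
totals = conn.execute("""
    SELECT i.INV_Name, SUM(st.STX_Amount) AS sum_taxes
    FROM ___Inventory i
    JOIN itemTaxes ix ON ix.INV_Id = i.INV_Id
    JOIN ___SalesTaxes st ON st.STX_Id = ix.STX_Id
    GROUP BY i.INV_Id, i.INV_Name
    ORDER BY i.INV_Id
""").fetchall()
print(totals)  # [('Bike', 5.0), ('Movie', 18.5)]
```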
That said, sometimes we are stuck with other people's really bad design decisions and can't do anything about it (until performance is so bad that we have to).
You can do what you want using like:
SELECT i.INV_ApplicableTaxes,
(SELECT SUM(st.STX_Amount)
FROM ___SalesTaxes st
WHERE ',' || i.INV_ApplicableTaxes || ',' LIKE '%,' || st.STX_Id || ',%'
) as sum_taxes
FROM ___Inventory i;
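Note: This uses ANSI standard syntax for string concatenation. Some databases have their own syntax; in MySQL, || means logical OR unless the PIPES_AS_CONCAT SQL mode is enabled, so use CONCAT() there.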
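The delimiter-wrapping idea is to surround the stored list with commas and search it for ',id,', so that id 1 cannot match inside 12. A small sketch, assuming Python's bundled SQLite (which supports `||` concatenation) rather than MySQL; tax 12 and the Helmet row are hypothetical additions to the question's data, included only to exercise that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ___SalesTaxes (STX_Id INTEGER, STX_Name TEXT, STX_Amount REAL);
    CREATE TABLE ___Inventory (INV_Id INTEGER, INV_Name TEXT, INV_Rate REAL,
                               INV_ApplicableTaxes TEXT);
    INSERT INTO ___SalesTaxes VALUES
        (1, 'Tax 1', 5.00), (2, 'Tax 2', 13.50),
        (12, 'Tax 12', 2.00);               -- hypothetical extra tax
    INSERT INTO ___Inventory VALUES
        (10, 'Bike', 9.00, '1'), (11, 'Movie', 3.00, '1,2'),
        (12, 'Helmet', 20.00, '12');        -- hypothetical extra item
""")

# The wrapped list goes on the left of LIKE, the wrapped id in the pattern.
sums = conn.execute("""
    SELECT i.INV_Name,
           (SELECT SUM(st.STX_Amount)
            FROM ___SalesTaxes st
            WHERE ',' || i.INV_ApplicableTaxes || ','
                  LIKE '%,' || st.STX_Id || ',%') AS sum_taxes
    FROM ___Inventory i
    ORDER BY i.INV_Id
""").fetchall()
print(sums)  # [('Bike', 5.0), ('Movie', 18.5), ('Helmet', 2.0)]
```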
EDIT:
In MySQL, you can express this using find_in_set():
SELECT i.INV_ApplicableTaxes,
(SELECT SUM(st.STX_Amount)
FROM ___SalesTaxes st
WHERE FIND_IN_SET(st.STX_Id, i.INV_ApplicableTaxes) > 0
) as sum_taxes
FROM ___Inventory i;
Looking forward, this is quite a questionable design for a database, and sooner or later, the lack of 1NF normalization may very well cause you problems.
However, if changing the database schema is not an option, you could use the find_in_set function to help perform this join:
SELECT i.*,
(SELECT SUM(s.stx_amount)
FROM ___SalesTaxes s
WHERE FIND_IN_SET(s.stx_id, i.inv_applicableTaxes) > 0
) AS total_tax
FROM ___Inventory i

How to SELECT row B only if row A doesn't exist on GROUP BY

I'm facing the following situation and have not found a good solution to the problem. I am optimizing an API, so I am looking for the fastest possible solution.
The following description is not exactly what I am doing, but I think it represents the problem well.
Let's say I have a table of products:
+----+----------+
| id | name |
+----+----------+
| 1 | product1 |
| 2 | product2 |
+----+----------+
And I have a table of attachments to each product, separate by language:
+----+----------+------------+-----------------------+
| id | language | product_id | attachment_url |
+----+----------+------------+-----------------------+
| 1 | bb | 1 | image1_bb.jpg |
| 1 | en | 1 | image1_en.jpg |
| 1 | pt | 1 | image1_pt.jpg |
| 2 | bb | 1 | image2_bb.jpg |
| 2 | pt | 1 | image2_pt.jpg |
+----+----------+------------+-----------------------+
What I intend to do is get the correct attachment according to the language selected in the request. As you can see above, I can have several attachments for each product. We use Babel (bb) as a generic language, so whenever I don't have an attachment in the right language, I should get the Babel version. It is also important to note that the primary key of the attachments table is a composite of id + language.
So, supposing I try to get all the data in pt, my first option to create a SQL query was:
SELECT p.id, p.name,
GROUP_CONCAT( '{',a.id,',',a.attachment_url, '}' ) as attachments_list
FROM products p
LEFT JOIN attachments a
ON (a.product_id=p.id AND (a.language='pt' OR a.language='bb'))
The problem is that, with this query I always get the bb data and I only want to get it when there is no attachment on the right language.
I already tried to do a subquery changing attachments for:
(SELECT * FROM attachments GROUP BY id ORDER BY id ASC, language DESC)
but it doubles the time of the request.
I also tried using DISTINCT inside the GROUP_CONCAT, but it only works if the whole result of each row is equal, so it does not work for me.
Does anyone know of any other solution that I can apply directly in the query?
EDIT:
Combining the answers of @Vulcronos and @Barmar made the final solution at least 2x faster than the one I first suggested.
Just to add some context, for anybody else who is looking for it. I am using Phalcon. Because of it, I had a lot of trouble putting the pieces together, as Phalcon PHQL does not support subqueries, nor a lot of the other stuff I had to use.
For my scenario, where I had to deliver approximately 1.2MB of JSON content, with more than 2100 objects, using custom queries made the total request time up to 3x faster than Phalcon's native relation management methods (hasMany(), hasManyToMany(), etc.) and 10x faster than my original solution (which made heavy use of the find() method).
Try doing two joins instead of one:
SELECT p.id, p.name,
GROUP_CONCAT( '{',COALESCE(a.id, b.id),',',COALESCE(a.attachment_url, b.attachment_url), '}' ) as attachments_list
FROM products p
LEFT JOIN attachments a
ON (a.product_id=p.id AND a.language='pt')
LEFT JOIN attachments b
ON (b.product_id=p.id AND b.language='bb')
GROUP BY p.id, p.name
and then using COALESCE to return b instead of a if a doesn't exist. You can also do it with a subselect if the above doesn't work.
OR conditions tend to make queries slow, because it's hard to optimize them with indexes. Try joining separately using the two different languages.
SELECT p.id, p.name,
IFNULL(apt.attachment_url, abb.attachment_url) AS attachment_url
FROM products AS p
JOIN attachments AS abb ON abb.product_id = p.id
LEFT JOIN attachments AS apt ON apt.product_id = p.id AND apt.language = 'pt'
WHERE abb.language = 'bb'
This assumes that all products have a bb attachment, while pt is optional.
I left out the join of Product, because it's not relevant for this problem. It's only needed to include the product name in the resultset.
SELECT a.product_id, a.id, a.attachment_url FROM attachments a
WHERE a.language = ?
OR (a.language = 'bb'
AND NOT EXISTS
(SELECT * FROM attachments
WHERE language = ?
AND id = a.id
AND product_id = a.product_id));
Notes: problems like this usually have many possible solutions. This is not necessarily the most efficient one.
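To see the NOT EXISTS fallback in action, here is a sketch that runs the same query against the sample attachments from the question. It assumes Python's bundled SQLite instead of MySQL, and the helper name attachments_for is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE attachments (
        id INTEGER, language TEXT, product_id INTEGER, attachment_url TEXT,
        PRIMARY KEY (id, language)
    )
""")
conn.executemany("INSERT INTO attachments VALUES (?, ?, ?, ?)", [
    (1, "bb", 1, "image1_bb.jpg"),
    (1, "en", 1, "image1_en.jpg"),
    (1, "pt", 1, "image1_pt.jpg"),
    (2, "bb", 1, "image2_bb.jpg"),
    (2, "pt", 1, "image2_pt.jpg"),
])

def attachments_for(language):
    """Return the requested language, falling back to 'bb' per attachment."""
    return conn.execute("""
        SELECT a.product_id, a.id, a.attachment_url
        FROM attachments a
        WHERE a.language = ?
           OR (a.language = 'bb'
               AND NOT EXISTS (SELECT 1 FROM attachments
                               WHERE language = ?
                                 AND id = a.id
                                 AND product_id = a.product_id))
        ORDER BY a.product_id, a.id
    """, (language, language)).fetchall()

print(attachments_for("pt"))  # [(1, 1, 'image1_pt.jpg'), (1, 2, 'image2_pt.jpg')]
print(attachments_for("en"))  # [(1, 1, 'image1_en.jpg'), (1, 2, 'image2_bb.jpg')]
```

Attachment 2 has no en row, so only that attachment falls back to its bb version.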

mysql count votes optimization

So I'm making a file hub, nothing huge or fancy, just to store some files that may be shared with others for download. It just occurred to me that the way I originally intended to count the upvotes and downvotes could be heavy on the server. The query to get the files is something along the lines of:
select*from files;
From this I would receive an array of my files that I could loop over to get the specifics of each file. With voting included, that same foreach loop would run a further query to count the votes for each file (with the file id in the WHERE clause), like so:
select*from votes where upvoted=true and file.id=?
I was thinking of using PDOStatement::rowCount() to get my answer. Now every bone in my body says this is bad, very bad: if I'm getting 10,000 files, I have just run 10,000 extra queries, one per file, and I haven't looked at the downvotes yet, which I was thinking could be handled in a similar fashion. Any optimization advice? Here is a small representation of the structure of a few tables. The upvoted and downvoted columns are of type BOOL (or TINYINT, if you will).
table: file            table: user       table: votes
+----+---------------+ +----+----------+ +---------+---------+---------+-----------+
| id | storedname    | | id | username | | file_id | user_id | upvoted | downvoted |
+----+---------------+ +----+----------+ +---------+---------+---------+-----------+
| 1  | 45tfvb.txt    | | 1  | matthew  | | 1       | 2       | 1       | 0         |
| 2  | jj7fnfddf.pdf | | 2  | mark     | | 2       | 1       | 1       | 1         |
| .. | ..            | | .. | ..       | | ..      | ..      | ..      | ..        |
There are two ways to do this. The better (faster) way is to write separate queries and combine the results into one variable in your programming language (PHP, Python, etc.).
SELECT
d.id as doc_id,
COUNT(v.document_id) as num_upvotes
FROM votes v
JOIN document d on d.id = v.document_id
WHERE v.upvoted IS TRUE
GROUP BY doc_id
That will return the upvote count for each document. You can do the same for your downvotes.
Then, after your select from document, use a for loop to match the votes to each document by ID and build the result into a dictionary or list.
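The merge step might look like the following sketch. It assumes Python with its bundled SQLite in place of PHP and MySQL, uses the asker's file and votes tables rather than the document table, and the vote rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file (id INTEGER PRIMARY KEY, storedname TEXT);
    CREATE TABLE votes (file_id INTEGER, user_id INTEGER,
                        upvoted INTEGER, downvoted INTEGER);
    INSERT INTO file VALUES (1, '45tfvb.txt'), (2, 'jj7fnfddf.pdf'), (3, 'novotes.txt');
    INSERT INTO votes VALUES (1, 1, 1, 0), (1, 2, 1, 0), (2, 1, 0, 1);
""")

# Three queries in total, no matter how many files there are.
files = conn.execute("SELECT id, storedname FROM file ORDER BY id").fetchall()
upvotes = dict(conn.execute(
    "SELECT file_id, COUNT(*) FROM votes WHERE upvoted = 1 GROUP BY file_id"
).fetchall())
downvotes = dict(conn.execute(
    "SELECT file_id, COUNT(*) FROM votes WHERE downvoted = 1 GROUP BY file_id"
).fetchall())

# Merge by id in application code; files without votes default to 0.
result = {
    file_id: {
        "storedname": name,
        "upvotes": upvotes.get(file_id, 0),
        "downvotes": downvotes.get(file_id, 0),
    }
    for file_id, name in files
}
print(result[1])  # {'storedname': '45tfvb.txt', 'upvotes': 2, 'downvotes': 0}
```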
The second way, which can take a lot longer at runtime if you have a lot of records in the table (it's less efficient, but easier to write), is to add subquery selects to your select statement, like this:
SELECT
logical_name ,
document.id ,
file_type ,
physical_name ,
uploader_notes ,
views ,
downloads ,
user.name ,
category.name AS category_name,
(Select count(1) from votes where upvoted=true and document_id=document.id )as upvoted,
(select count(1) from votes where upvoted=false and document_id=document.id) as downvoted
FROM document
INNER JOIN category ON document.category_id = category.id
INNER JOIN user ON document.uploader_id = user.id
ORDER BY category.id
Two pieces of advice:
Avoid SELECT *, especially if you're only going to count. Replace it with something like this:
SELECT COUNT(1) AS total FROM votes WHERE upvoted=true AND file_id=?
You may also want to create a TRIGGER that keeps a counter in the file table up to date.
I hope this is helpful.
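The trigger idea could be sketched roughly as follows. This assumes Python's bundled SQLite, whose trigger syntax differs from MySQL's (MySQL uses FOR EACH ROW and has no BEGIN ... END requirement for a single statement); the upvotes and downvotes counter columns and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical counter columns added to the file table.
    CREATE TABLE file (
        id INTEGER PRIMARY KEY, storedname TEXT,
        upvotes INTEGER NOT NULL DEFAULT 0,
        downvotes INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE votes (file_id INTEGER, user_id INTEGER,
                        upvoted INTEGER, downvoted INTEGER);

    -- SQLite syntax: bump the counters whenever a vote row is inserted.
    CREATE TRIGGER votes_after_insert AFTER INSERT ON votes
    BEGIN
        UPDATE file
        SET upvotes = upvotes + NEW.upvoted,
            downvotes = downvotes + NEW.downvoted
        WHERE id = NEW.file_id;
    END;

    INSERT INTO file (id, storedname) VALUES (1, '45tfvb.txt'), (2, 'jj7fnfddf.pdf');
    INSERT INTO votes VALUES (1, 2, 1, 0), (1, 3, 1, 0), (2, 1, 0, 1);
""")

# Listing files with their counts no longer touches the votes table.
counters = conn.execute(
    "SELECT id, upvotes, downvotes FROM file ORDER BY id"
).fetchall()
print(counters)  # [(1, 2, 0), (2, 0, 1)]
```

A complete version would also need triggers for UPDATE and DELETE on votes, otherwise the counters drift when a vote is changed or withdrawn.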

How do I use mysql to match against multiple possibilities from a second table?

I'm not entirely sure how to ask this question, so I'll lead by providing an example table and an example output and then follow up with a more thorough explanation of what I'm attempting to accomplish.
Imagine that I have two tables. In the first is a list of companies. Some of these companies have duplicate entries due to being imported and continuously updated from different sources. For example, the company table may look something like this:
| rawName | strippedName |
| Kohl's | kohls |
| kohls.com | kohls |
| kohls Corporation | kohls |
So in this situation, we have information that has come in from three different sources. To let my program understand that these sources all refer to the same store, I created the strippedName column (which I also use for creating URLs and whatnot).
In the second table, we have information about deals, coupons, shipping offers, etc. However, since these come in from various sources, they end up with the three different rawNames that we identified above. For example, the second table might look something like this:
| merchantName | dealInformation |
| kohls.com | 10% off everything... |
| kohl's | Free shipping on... |
| kohls corporation | 1 Day Flash Sale! |
| kohls.com | Buy one get one... |
So here we have four entries that are all from the same company. However, when a user on the site visits the listing for Kohls, I want it to display all the entries from each source.
Here is what I currently have, but it doesn't seem to be doing the trick. This seems to only work if I set the LIMIT in that sub-query to 1 so that it only brings back one of the rawNames. I need it to match against all of the rawNames.
SELECT * FROM table2
WHERE merchantName = (SELECT rawName FROM table1 WHERE strippedName = '".$strippedName."')
The quickest fix is to replace your merchantName = with merchantName IN:
SELECT * FROM table2
WHERE merchantName IN (SELECT rawName FROM table1 WHERE strippedName = '".$strippedName."')
The = operator needs to have exactly one value on each side - the IN keyword matches a value against multiple values.
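A small sketch of the IN form, assuming Python's bundled SQLite in place of PHP and MySQL. Two things here are assumptions rather than part of the answer: the columns are declared COLLATE NOCASE to imitate MySQL's default case-insensitive comparison, and the stripped name is passed as a bound parameter instead of being interpolated into the SQL string. The target.com row and the helper name deals_for are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (rawName TEXT COLLATE NOCASE, strippedName TEXT);
    CREATE TABLE table2 (merchantName TEXT COLLATE NOCASE, dealInformation TEXT);
""")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [
    ("Kohl's", "kohls"),
    ("kohls.com", "kohls"),
    ("kohls Corporation", "kohls"),
])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [
    ("kohls.com", "10% off everything..."),
    ("kohl's", "Free shipping on..."),
    ("kohls corporation", "1 Day Flash Sale!"),
    ("kohls.com", "Buy one get one..."),
    ("target.com", "Unrelated deal"),  # hypothetical row that must not match
])

def deals_for(stripped_name):
    """Return every deal whose merchantName matches any rawName of the store."""
    return conn.execute("""
        SELECT merchantName, dealInformation
        FROM table2
        WHERE merchantName IN (SELECT rawName FROM table1 WHERE strippedName = ?)
    """, (stripped_name,)).fetchall()

deals = deals_for("kohls")
print(len(deals))  # 4
```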