Table lists
id | user_id | name
1 | 3 | ListA
2 | 3 | ListB
Table celebrities
id | user_id | list_id | celebrity_code
1 | 3 | 1 | AA000297
2 | 3 | 1 | AA000068
3 | 3 | 2 | AA000214
4 | 3 | 2 | AA000348
I am looking for a JSON object like this:
[
{id:1, name:'ListA', celebrities:[{celebrity_code:AA000297},{celebrity_code:AA000068}]},
{id:2, name:'ListB', celebrities:[{celebrity_code:AA000214},{celebrity_code:AA000348}]}
]
Moved this to an answer since the details were getting long, and I thought the additional references would be useful to future readers.
Since you are using MySQL, check out GROUP_CONCAT. To get your object, you will want to GROUP_CONCAT on a CONCATenated string. If you could live with a schema more like {id:2, name:'ListB', celebrity_codes:['AA000214','AA000348']}, you'd have a simpler query. If you make an SQLFiddle of your basic schema (basically your CREATE TABLE statements plus the inserts of the above sample data), someone might even write it for you. :-)
To be clear, while GROUP_CONCAT can do this, if you are trying to generate more than a fairly simple schema, the code gets pretty messy, and it starts making more and more sense to move the work into your application layer, both from a code-maintenance standpoint and for performance and scalability.
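For illustration, a rough sketch of the simpler celebrity_codes variant (the table names lists and celebrities are assumed from the sample data above):

SELECT l.id,
       l.name,
       CONCAT('[', GROUP_CONCAT(CONCAT('''', c.celebrity_code, '''')), ']') AS celebrity_codes
FROM lists l
JOIN celebrities c ON c.list_id = l.id
GROUP BY l.id, l.name;
-- yields one row per list, e.g.: 1 | ListA | ['AA000297','AA000068']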
Also note that SQLite supports GROUP_CONCAT. For other databases:
Postgres users should look at string_agg.
SQL Server users should check out this project on CodePlex.
Oracle users can use MODEL, as illustrated here.
This question targets modelling limited availability in Doctrine 2. I'm sure this has already been discussed here, as it seems quite basic, but I could not find any best practices. It may be that limit/restrict/max/... are bad search terms, as they all mean something else in the db world :-).
Simplified example
Assume a typical online-shop application that allows multiple users to buy items of some kind (at the same time). Some of these items may have a limited availability (first come, first served). So two users may be in a concurrent situation when trying to check out/confirm an order. The faster one must win the race; the other order should not even be processed (inserted into the database).
Entities/tables may look like this:
items
+----+-----+---------------+---------+
| id | ... | max_available | version |
+----+-----+---------------+---------+
|  7 | ... |             4 |       2 |
|  8 | ... |             1 |       0 |
+----+-----+---------------+---------+
orders
+----+---------+----------+
| id | item_id | quantity |
+----+---------+----------+
|  1 |       7 |        2 |
|  2 |       7 |        1 |
+----+---------+----------+
In this case: another order for item 8 with a quantity of 1 would be valid. Another order for item 7 with a quantity of 2 must be prevented, as this would be one more than is available (3 are already ordered, so 2 more would exceed the maximum of 4).
Best practice?
The application uses the Doctrine 2 ORM; the db will be MySQL. The system may be coupled to the db type, but if there is a reasonable db-agnostic way, that's even better, of course.
What's the best way to model this?
Transactions and locking on the db level (the db needs to support this)? Optimistic locking on the ORM level (integer version field)? Or should there (additionally) be triggers installed that ensure data integrity on the database level?
Sidenote: Should constraints be optional by design, or can they be part of the business logic? In other words: is it bad practice to test against constraints and let the test fail under normal conditions - e.g. by having a (concurrency-safe) trigger on updates/inserts that cancels the request if an item isn't available anymore? (This would only work for certain db types, and with InnoDB as the engine in the case of MySQL...)
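For illustration, the db-level locking option might look roughly like this in SQL (a sketch only, assuming InnoDB; this is approximately the statement Doctrine issues for LockMode::PESSIMISTIC_WRITE):

START TRANSACTION;
-- serializes concurrent checkouts of the same item
SELECT max_available FROM items WHERE id = 7 FOR UPDATE;
-- the application then compares max_available against
--   SELECT COALESCE(SUM(quantity), 0) FROM orders WHERE item_id = 7
-- and only inserts if the requested quantity still fits
INSERT INTO orders (item_id, quantity) VALUES (7, 2);
COMMIT;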
I'll try to explain my situation: I'm trying to create a search engine for products on my website, so when the user needs to find a product I need to show similar ones, here's an example.
User searches:
assassins creed OR assassinscreed OR aSsAssIn's CreeD, assuming there are no misspelled letters/numbers (those 3 queries should produce the same result)
Expected results:
Assassin's Creed AND Assassin's Creed: Unity AND Assassin's Creed: Special Edition
What have I tried so far
I have created a MySQL field for the search engine which contains a parsed name of the product (Assassin's Creed: Unity -> assassinscreedunity)
I parse the search query
I search using MySQL's INSTR()
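Roughly, the lookup currently looks like this (a sketch; products and parsed_name are placeholder names):

SELECT *
FROM products
WHERE INSTR(parsed_name, 'assassinscreed') > 0;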
My problem
I'm fine with using this, but I've heard it can be slow when the number of rows increases. I've created a full-text index on my table, but I don't think it would help, so I need another solution.
Thanks for any answer, and ask me anything before downvoting.
First of all, you should keep track of performance issues in your queries more precisely than 'heard it can be slow' and 'don't think it would help'. One starting point may be the Slow Query Log.
If you have a table which contains the same parsed name in more than one row, consider normalizing your database. In this specific case, store each unique parsed name in one table, and only the id of the corresponding parsed name in the table you described in your question. This way, you only need to check the smaller table of unique names and can then quickly find all matching entries in the main table by id.
Example:
Consider the following table with your structure
id | product_name | rating
-----------------------------------
1 | assassinscreedunity | 5
2 | assassinscreedunity | 2
3 | monkeyisland | 3
4 | monkeyisland | 5
5 | assassinscreedunity | 4
6 | monkeyisland | 4
you would have to scan all six entries to find relevant rows.
In contrast, consider two tables like this:
id | p_id | rating
--------------------
1 | 1 | 5
2 | 1 | 2
3 | 2 | 3
4 | 2 | 5
5 | 1 | 4
6 | 2 | 4
id | name
--------------------------
1 | assassinscreedunity
2 | monkeyisland
In this case, you only have to scan two entries (compared to six) and can then efficiently look up relevant rows using the integer id.
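A sketch of the two-table lookup (ratings and product_names are illustrative names for the two tables above):

SELECT r.id, r.rating
FROM product_names n
JOIN ratings r ON r.p_id = n.id
WHERE n.name = 'assassinscreedunity';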
To further enhance the performance, you could extend the concept of a parsed name and use hashes. For example, you could calculate the SHA1 hash of your parsed name, which is a 160-bit value; entries in your database can be found for this value very efficiently. To match substrings, you can add them to the second table as well. Since the hash only needs to be computed once, you can still let the database do the matching on a short, fixed-size value. Another thing worth looking at is fuzzy hashing.
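Illustrative only (same placeholder table name as above; an index on the hash column is what makes the final lookup fast):

ALTER TABLE product_names ADD COLUMN name_sha1 BINARY(20);
UPDATE product_names SET name_sha1 = UNHEX(SHA1(name));
SELECT id FROM product_names WHERE name_sha1 = UNHEX(SHA1('assassinscreedunity'));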
In addition, you should read up on the Rabin–Karp algorithm or string searching in general.
I have put a lot of effort into my database design, but I think I am now realizing I made a major mistake.
Background: (Skip to 'Problem' if you don't need background.)
The DB supports a custom CMS layer for a website template. Users of the
template are limited to turning pages on and off, but not creating
their own 'new' pages. Further, many elements are non editable.
Therefore, if a page has a piece of text I want them to be able to edit,
I would have 'manually' assigned a static ID to it:
<h2><%= CMS.getDataItemByID(123456) %></h2>
Note: The scripting language is not relevant to this question, but the design forces
each table to have unique column names. Hence the convention of 'TableNameSingular_id'
for the primary key etc.
The scripting language would do a lookup on these tables to find the string.
mysql> SELECT * FROM CMSData WHERE CMSData_data_id = 123456;
+------------+-----------------+-----------------------------+
| CMSData_id | CMSData_data_id | CMSData_CMSDataType_type_id |
+------------+-----------------+-----------------------------+
| 1 | 123456 | 1 |
+------------+-----------------+-----------------------------+
mysql> SELECT * FROM CMSDataTypes WHERE CMSDataType_type_id = 1;
+----------------+---------------------+-----------------------+------------------------+
| CMSDataType_id | CMSDataType_type_id | CMSDataType_type_name | CMSDataType_table_name |
+----------------+---------------------+-----------------------+------------------------+
| 1 | 1 | String | CMSStrings |
+----------------+---------------------+-----------------------+------------------------+
mysql> SELECT * FROM CMSStrings WHERE CMSString_CMSData_data_id=123456;
+--------------+---------------------------+-----------------------------------+
| CMSString_id | CMSString_CMSData_data_id | CMSString_string                  |
+--------------+---------------------------+-----------------------------------+
|            1 |                    123456 | The answer to the universe is 42. |
+--------------+---------------------------+-----------------------------------+
The rendered text would then be:
<h2>The answer to the universe is 42.</h2>
This works great for 'static' elements, such as the example above. I used the exact same method for other data types such as file specifications, email addresses, dates, etc.
However, it fails for when I want to allow the User to dynamically generate content.
For example, there is an 'Events' page and they will be dynamically created by the
User by clicking 'Add Event' or 'Delete Event'.
An Event table will use keys to reference other tables with the following data items:
Data Item: Table:
--------------------------------------------------
Date CMSDates
Title CMSStrings (As show above)
Description CMSTexts (MySQL TEXT data type.)
--------------------------------------------------
Problem:
That means each time an Event is created, I need to create the following rows in the CMSData table:
+------------+-----------------+-----------------------------+
| CMSData_id | CMSData_data_id | CMSData_CMSDataType_type_id |
+------------+-----------------+-----------------------------+
| x | y | 6 | (Event)
| x+1 | y+1 | 5 | (Date)
| x+2 | y+2 | 1 | (Title)
| x+3 | y+3 | 3 | (Description)
+------------+-----------------+-----------------------------+
But there is the problem: in MySQL, a table can have only one AUTO_INCREMENT column. If I query for the highest value of CMSData_data_id and just add 1 to it, there is a chance of a race condition, where someone else grabs the id first.
How is this issue typically resolved - or avoided in the first place?
Thanks,
Eric
The id should be meaningless, except to be unique. Your design should work whether or not the block of 4 ids is contiguous.
Redesign your implementation to add the parts separately, not as a block of 4. Doing so should simplify things overall, and improve your scalability.
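A minimal sketch of that approach (assuming the AUTO_INCREMENT key CMSData_id is used as the public id, in place of the separate CMSData_data_id):

INSERT INTO CMSData (CMSData_CMSDataType_type_id) VALUES (6); -- Event
SET @event_id = LAST_INSERT_ID();
INSERT INTO CMSData (CMSData_CMSDataType_type_id) VALUES (5); -- Date
SET @date_id = LAST_INSERT_ID();
-- LAST_INSERT_ID() is per-connection, so concurrent requests cannot race,
-- and it does not matter that the four ids may not be contiguous.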
What about locking the table before writing into it? This way, when you are inserting a row into the CMSData table, you can safely get the last id.
Another suggestion would be to not use an incremented id at all, but a uniquely generated one, like a GUID.
See LOCK TABLES in the MySQL manual.
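A rough sketch of the table-locking idea (MySQL syntax; the MAX()+1 step is only safe here because the table is write-locked for the duration):

LOCK TABLES CMSData WRITE;
SET @next = (SELECT COALESCE(MAX(CMSData_data_id), 0) + 1 FROM CMSData);
INSERT INTO CMSData (CMSData_data_id, CMSData_CMSDataType_type_id)
VALUES (@next, 6);
UNLOCK TABLES;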
I am having trouble developing queries on the fly for our clients, and sometimes find myself asking: "Would it be better to start with a subset of the data I know I'm looking for, then just import it into a program like Excel and process the data accordingly using similar functions, such as Pivot Tables?"
One instance in particular I am struggling with is the following example:
I have an online member enrollment system. For simplicity's sake, let's assume the data captured is: member ID, sign-up date, referral code, and state.
A sample member table may look like the following:
MemberID | Date | Ref | USState
=====================================
1 | 2011-01-01 | abc | AL
2 | 2011-01-02 | bcd | AR
3 | 2011-01-03 | cde | CA
4 | 2011-02-01 | abc | TX
and so on....
Ultimately, the types of queries I want to build and run with this data set can extend to:
"Show me a list of all referral codes and the number of sign ups they had by each month in a single result set".
For example:
Ref | 2011-01 | 2011-02 | 2011-03 | 2011-04
==============================================
abc | 1 | 1 | 0 | 0
bcd | 1 | 0 | 0 | 0
cde | 1 | 0 | 0 | 0
I honestly have no idea how to build this type of query in MySQL (I imagine that if it can be done, it would require a LOT of code: joins, subqueries, and unions).
Similarly, another sample query may be how many members signed up in each state by month:
USState | 2011-01 | 2011-02 | 2011-03 | 2011-04
==============================================
AL | 1 | 0 | 0 | 0
AR | 1 | 0 | 0 | 0
CA | 1 | 0 | 0 | 0
TX | 0 | 1 | 0 | 0
I suppose my question is twofold:
1) Is it in fact best to build these out with the necessary data from within a MySQL GUI such as Navicat, or to import the entire subset of data into Excel and work from there?
2) If I were to use the MySQL route, what is the proper way to build the subsets of data in the examples mentioned above? (Note that the queries could become far more complex, such as "Show how many sign-ups came in for each particular month, by each state, and grouped by each agent as well" (each agent has 50 possible rows).)
Thank you so much for your assistance ahead of time.
I am a proponent of doing this kind of querying on the server side, at least to get just the data you need.
You should create a time-periods table. It can get as complex as you desire, going down to days even.
id | year | month | monthstart | monthend
------------------------------------------
1  | 2011 | 1     | 2011-01-01 | 2011-01-31
...
This gives you almost limitless ability to group and query data in all sorts of interesting ways.
Getting the data for the original referral counts by month query you mentioned would be quite simple...
select a.Ref, b.year, b.month, count(*) as referralcount
from myTable a
join months b on a.Date between b.monthstart and b.monthend
group by a.Ref, b.year, b.month
order by a.Ref, b.year, b.month
The result set would be in rows like ref = abc, year = 2011, month = 1, referralcount = 1, as opposed to a column for every month. I am assuming that, since importing a larger set of data and manipulating it in Excel was an option, changing the layout of this data wouldn't be difficult.
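If you do want one column per month (a cross-tab), conditional aggregation is the usual MySQL idiom. A sketch against the sample data (members is an assumed table name, and each month column has to be written out or generated by your application):

SELECT Ref,
       SUM(Date BETWEEN '2011-01-01' AND '2011-01-31') AS `2011-01`,
       SUM(Date BETWEEN '2011-02-01' AND '2011-02-28') AS `2011-02`
FROM members
GROUP BY Ref
ORDER BY Ref;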
Check out this previous answer that goes into a little more detail about the concept with different examples: SQL query for Figuring counts by month
I work on an Excel-based application that deals with multi-dimensional time-series data, and have recently been implementing predefined pivot-table spreadsheets, so I know exactly what you're thinking. I'm a big proponent of giving users tools rather than writing individual reports or a whole query language for them to use. You can create pivot tables on the fly that connect to the database, and it's not that hard; Andrew Whitechapel has a great example here. But you will also need to launch that in Excel or set up a basic Excel VSTO program, which is fairly easy to do in Visual Studio 2010. (microsoft.com/vsto)
Another thing: don't feel like you have to create ridiculously complex queries. Every join you add will slow down any relational database. I discovered years ago that doing multi-step queries into temp tables is in most cases clearer, faster, and easier to write and support.
I've been wondering this for a while already. The title stands for my question. What do you prefer?
I made a pic to make my question clearer.
Why am I even thinking about this? Isn't one table the obvious option? Well, kind of. It's the simplest way, but let's think more practically. When there is a ton of data in one table and a user only wants to see statistics about the browsers visitors use, a single table may not perform well; reading browser data from its own table is naturally faster.
Multiple tables have disadvantages too: writing data takes more time and resources, whereas with one table only a single MySQL query is needed.
Anyway, I figured out a solution which I think makes sense: data is written to some kind of temporary table, and all of those rows are exported into the multiple tables later by a scheduled script. This way the system doesn't add loading time to the user's page, but the data remains fast to browse.
Let's bring some discussion here. I'm hoping to raise some opinions.
Which one is better? Let's find out!
The date, browser and OS are all related on a one-to-one basis... Without more information to require distinguishing records further, I'd be creating a single table rather than two.
Database design is based on creating tables that reflect entities, and I don't see two distinct entities in the example provided. Consider using views to serve data without duplicating the data in the database; a centralized copy of the data makes managing the data much easier...
What you're really thinking about is whether to denormalize the table or keep it normalized. In the normalized form, you have a table that looks like this:
Table statistic
id | date | browser_id | os_id
---------------------------------------------
1 | 127003727 | 1 | 1
2 | 127391662 | 2 | 2
3 | 127912683 | 3 | 2
And then to explain what browser and os the client used, you need other tables:
Table browser
id | name | company | version
-----------------------------------------------
1 | Firefox | Mozilla | 3.6.8
2 | Safari | Apple | 4.0
3 | Firefox | Mozilla | 3.5.1
Table os
id | name | company | version
-----------------------------------------------
1 | Ubuntu | Canonical | 10.04
2 | Windows | Microsoft | 7
3 | Windows | Microsoft | 3.11
As OMG Ponies already pointed out, this isn't a good example for creating several entities, so one can safely go with one table and then think about how to deal with having to, say, find all the entries with a matching browser name.
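For completeness, a sketch of querying the normalized layout above, e.g. counting entries per browser name (tables taken from the example):

SELECT b.name, COUNT(*) AS entries
FROM statistic s
JOIN browser b ON b.id = s.browser_id
GROUP BY b.name;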