SQL Server 2008 - Too much denormalization and over-indexing: What use is there for the Matrix?

I have a budding developer who is very enthusiastic about something he calls “the matrix”.
I am looking for peer insight.
In a nutshell, this is what we have:
- 1 highly denormalized table with about 120 columns
- Data points range from account, customer, household, relationship, product, employee, etc.
- One index per column: about 120 non-clustered indexes
- Indexes on this table account for about 90% of all the space used by indexes in the database today
- About 1.5 million rows today, with a lot of NULLs
- The table is loaded by a stored procedure whose core is dynamic SQL
- All field names are generic and do not describe the data
- A data-dictionary-style table is used with the dynamic SQL to load any data point into any field
- Field mapping is not static: today column dim_0001 is customer name, but tomorrow it may be something else
- No primary key
- No foreign keys
- No real constraints (for example, all fields are nullable)
The argument for the table:
- Makes writing queries simpler because it eliminates the need to write some joins
The intended use:
- An end-user layer that would be a core component of a Universe built in Business Objects
- Post-ETL process development
My recommendation will either kill the process where it is today (early development in a test environment) or move it to the next step in test.
Based on the research I have done, my education, and my experience, I do not support it, and I want the tables dropped as soon as the one or two processes that depend on them have been migrated to another solution.
The script is below for your reference (I limited it to one index example).
Any insight you can offer (even just a one-word opinion) is valuable.
-- The Matrix
CREATE TABLE [z005497].[tblMatrix](
[as_of_dt] [datetime] NOT NULL,
[dim_0001] [varchar](100) NULL,
[dim_0002] [varchar](103) NULL,
[dim_0003] [varchar](100) NULL,
[dim_0004] [varchar](100) NULL,
[dim_0005] [varchar](100) NULL,
[dim_0006] [varchar](100) NULL,
[dim_0007] [varchar](100) NULL,
[dim_0008] [varchar](100) NULL,
[dim_0009] [varchar](100) NULL,
[dim_0010] [varchar](100) NULL,
[dim_0011] [varchar](100) NULL,
[dim_0012] [varchar](100) NULL,
[dim_0013] [varchar](100) NULL,
[dim_0014] [varchar](100) NULL,
[dim_0015] [varchar](100) NULL,
[dim_0016] [varchar](100) NULL,
[dim_0017] [varchar](103) NULL,
[dim_0018] [varchar](103) NULL,
[dim_0019] [varchar](103) NULL,
[dim_0020] [varchar](103) NULL,
[dim_0021] [varchar](103) NULL,
[dim_0022] [varchar](103) NULL,
[dim_0023] [varchar](103) NULL,
[dim_0024] [varchar](103) NULL,
[dim_0025] [varchar](103) NULL,
[dim_0026] [varchar](11) NULL,
[dim_0027] [varchar](11) NULL,
[dim_0028] [varchar](11) NULL,
[dim_0029] [varchar](11) NULL,
[dim_0030] [varchar](11) NULL,
[dim_0031] [varchar](11) NULL,
[dim_0032] [varchar](11) NULL,
[dim_0033] [varchar](11) NULL,
[dim_0034] [varchar](11) NULL,
[dim_0035] [varchar](11) NULL,
[dim_0036] [varchar](11) NULL,
[dim_0037] [varchar](11) NULL,
[dim_0038] [varchar](11) NULL,
[dim_0039] [varchar](11) NULL,
[dim_0040] [varchar](11) NULL,
[dim_0041] [varchar](11) NULL,
[dim_0042] [varchar](11) NULL,
[dim_0043] [varchar](11) NULL,
[dim_0044] [varchar](11) NULL,
[dim_0045] [varchar](11) NULL,
[dim_0046] [varchar](11) NULL,
[dim_0047] [varchar](11) NULL,
[dim_0048] [varchar](11) NULL,
[dim_0049] [varchar](11) NULL,
[dim_0050] [varchar](11) NULL,
[dim_0051] [varchar](11) NULL,
[dim_0052] [varchar](11) NULL,
[dim_0053] [varchar](11) NULL,
[dim_0054] [varchar](5) NULL,
[dim_0055] [varchar](5) NULL,
[dim_0056] [varchar](5) NULL,
[dim_0057] [varchar](5) NULL,
[dim_0058] [varchar](5) NULL,
[dim_0059] [varchar](5) NULL,
[dim_0060] [varchar](5) NULL,
[dim_0061] [varchar](5) NULL,
[dim_0062] [varchar](5) NULL,
[dim_0063] [varchar](5) NULL,
[dim_0064] [varchar](5) NULL,
[dim_0065] [varchar](5) NULL,
[dim_0066] [varchar](5) NULL,
[dim_0067] [varchar](5) NULL,
[dim_0068] [varchar](5) NULL,
[dim_0069] [varchar](5) NULL,
[dim_0070] [varchar](5) NULL,
[dim_0071] [varchar](5) NULL,
[dim_0072] [varchar](5) NULL,
[dim_0073] [varchar](5) NULL,
[dim_0074] [varchar](5) NULL,
[dim_0075] [varchar](5) NULL,
[dim_0076] [varchar](5) NULL,
[dim_0077] [varchar](5) NULL,
[dim_0078] [varchar](5) NULL,
[dim_0079] [varchar](5) NULL,
[dim_0080] [varchar](5) NULL,
[dim_0081] [varchar](5) NULL,
[dim_0082] [varchar](5) NULL,
[dim_0083] [varchar](5) NULL,
[dim_0084] [int] NULL,
[dim_0085] [int] NULL,
[dim_0086] [int] NULL,
[dim_0087] [int] NULL,
[dim_0088] [int] NULL,
[dim_0089] [int] NULL,
[dim_0090] [int] NULL,
[dim_0091] [int] NULL,
[dim_0092] [int] NULL,
[dim_0093] [int] NULL,
[dim_0094] [varchar](12) NULL,
[dim_0095] [varchar](12) NULL,
[dim_0096] [varchar](12) NULL,
[dim_0097] [varchar](120) NULL,
[dim_0098] [varchar](120) NULL,
[dim_0099] [varchar](120) NULL,
[dim_0100] [numeric](20, 0) NULL,
[dim_0101] [varchar](20) NULL,
[dim_0102] [varchar](20) NULL,
[dim_0103] [varchar](20) NULL,
[dim_0104] [varchar](20) NULL,
[dim_0105] [varchar](20) NULL,
[dim_0106] [varchar](20) NULL,
[dim_0107] [varchar](20) NULL,
[dim_0108] [varchar](20) NULL,
[dim_0109] [varchar](20) NULL,
[dim_0110] [varchar](20) NULL,
[dim_0111] [varchar](20) NULL,
[dim_0112] [varchar](20) NULL,
[dim_0113] [varchar](20) NULL,
[dim_0114] [varchar](20) NULL,
[dim_0115] [varchar](20) NULL,
[dim_0116] [varchar](20) NULL,
[dim_0117] [varchar](20) NULL,
[dim_0118] [varchar](20) NULL,
[dim_0119] [varchar](20) NULL,
[dim_0120] [varchar](20) NULL,
[lastLoad] [datetime] NULL
) ON [PRIMARY]
-- Index example
CREATE NONCLUSTERED INDEX [idx_dim_0001 (not unique)] ON [z005497].[tblMatrix]
(
[dim_0001] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
-- The configuration table from which developers would find out what is in the Matrix
CREATE TABLE [z005497].[tblMatrixCfg](
[dimId] [int] IDENTITY(100000,1) NOT NULL,
[colName] [varchar](25) NOT NULL,
[dataType] [varchar](25) NOT NULL,
[dimName] [varchar](25) NOT NULL,
[dimDesc] [varchar](500) NOT NULL,
[dimpath] [varchar](5000) NOT NULL,
[loadDate] [datetime] NOT NULL,
[modUser] [varchar](100) NOT NULL,
[modDate] [datetime] NOT NULL,
CONSTRAINT [PK_tblMatrixCfg_1] PRIMARY KEY CLUSTERED
(
[dimId] ASC,
[colName] ASC,
[dimName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
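To make the cost of the generic columns concrete, here is a minimal sketch of what every consumer of the Matrix has to do: look up in the config table which dim_NNNN column holds a data point today, then assemble the query as a string, because a column name cannot be passed as a bind parameter. It uses Python with SQLite as a stand-in for SQL Server, keeps only two dim columns, and the dimName values and sample row are hypothetical.

```python
import sqlite3


def matrix_query(conn: sqlite3.Connection, dim_name: str) -> str:
    """Build a SELECT for one logical data point by first asking the
    config table which generic dim_NNNN column holds it today."""
    row = conn.execute(
        "SELECT colName FROM tblMatrixCfg WHERE dimName = ?", (dim_name,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no column is currently mapped to {dim_name!r}")
    col = row[0]
    # Column names cannot be bind parameters, so the statement has to be
    # assembled as a string (dynamic SQL); validate both pieces first.
    if not (col.startswith("dim_") and col[4:].isdigit()):
        raise ValueError(f"unexpected column name {col!r}")
    if not dim_name.isidentifier():
        raise ValueError(f"unsafe alias {dim_name!r}")
    return f'SELECT as_of_dt, {col} AS "{dim_name}" FROM tblMatrix'


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    # Cut-down stand-ins for tblMatrix and tblMatrixCfg (two dims only).
    conn.execute(
        "CREATE TABLE tblMatrix (as_of_dt TEXT NOT NULL, dim_0001 TEXT, dim_0002 TEXT)"
    )
    conn.execute("CREATE TABLE tblMatrixCfg (colName TEXT NOT NULL, dimName TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO tblMatrixCfg VALUES (?, ?)",
        [("dim_0001", "customer_name"), ("dim_0002", "product")],
    )
    conn.execute("INSERT INTO tblMatrix VALUES ('2011-06-30', 'Alice Smith', 'Gold Card')")
    return conn.execute(matrix_query(conn, "customer_name")).fetchall()


if __name__ == "__main__":
    print(demo())
```

Compare this with a conventional table, where the same request is a plain SELECT customer_name with no lookup step and no string-built SQL.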

Kill it if you can.
Also, that developer needs a lot more experience. And he/she should get it at another company.
It's basically violating so many things I don't know where to start.
Even if you end up fighting a highly normalized model which is following someone's best practices slavishly, it won't compare to the disaster which this design is going to create.

Just to give one example of what Cade meant by "I don't know where to start":
"today column dim_0001 is customer name, but tomorrow maybe something else"
This typically also means that in the user acceptance system dim_0001 can be customer name (and the system may seem to work and get accepted), and then you move to production, dim_0001 becomes the name of the president's wife or some such, and hours of meetings have to be spent figuring out (a) where the problem is and (b) how to fix it in as little time as possible.
((b) usually amounts to patching the code with something like "if col_name = dim_0001, don't treat it as what the matrix says it is; treat it as what is hard-coded here instead".)
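The failure described above is silent, which is what makes it expensive. A minimal sketch, assuming a hypothetical loader that writes each data point into whichever column the current mapping assigns and a report query written against dim_0001 during UAT (Python with SQLite standing in for SQL Server; names and data are illustrative):

```python
import sqlite3

# Written and signed off during UAT, when dim_0001 held the customer name.
REPORT_SQL = "SELECT dim_0001 FROM tblMatrix"


def load(mapping: dict, record: dict) -> sqlite3.Connection:
    """Hypothetical loader: put each data point in whichever generic
    column the current mapping assigns to it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblMatrix (dim_0001 TEXT, dim_0002 TEXT)")
    cols = ", ".join(mapping[key] for key in record)
    marks = ", ".join("?" for _ in record)
    conn.execute(f"INSERT INTO tblMatrix ({cols}) VALUES ({marks})", list(record.values()))
    return conn


record = {"customer_name": "Alice Smith", "product": "Gold Card"}

uat = load({"customer_name": "dim_0001", "product": "dim_0002"}, record)
prod = load({"customer_name": "dim_0002", "product": "dim_0001"}, record)

uat_value = uat.execute(REPORT_SQL).fetchone()[0]
prod_value = prod.execute(REPORT_SQL).fetchone()[0]  # no error raised, wrong data

if __name__ == "__main__":
    print(uat_value, "|", prod_value)
```

Nothing fails: the production report simply returns a product where a customer name is expected, and no constraint or type check can catch it because every column is a nullable varchar with a generic name.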

"What use is there for the Matrix?"
Well, I certainly don't get it.
I have never seen anything like this before, and I don't understand how it is meant to be used, how the indexes are meant to speed anything up, or how it is possible to query this table without at least using self joins.
Call me inexperienced if you like, but this is a first for me. I would think that if this were the way to do things, the DB vendors would not put so much effort into letting us developers define tables whose columns have different data types, with relationships between them.

This is the result of trying to stuff an object-oriented paradigm into a relational system. Document databases allow for this sort of programming:
Documents inside a document-oriented database are similar, in some ways, to records or rows, in relational databases, but they are less rigid. They are not required to adhere to a standard schema nor will they have all the same sections, slots, parts, keys, or the like. For example here's a document:
FirstName="Bob", Address="5 Oak St.", Hobby="sailing".
Another document could be:
FirstName="Jonathan", Address="15 Wanamassa Point Road", Children=[{Name:"Michael",Age:10}, {Name:"Jennifer", Age:8}, {Name:"Samantha", Age:5}, {Name:"Elena", Age:2}].
Both documents have some similar information and some different. Unlike a relational database where each record would have the same set of fields and unused fields might be kept empty, there are no empty 'fields' in either document (record) in this case. This system allows new information to be added and it doesn't require explicitly stating if other pieces of information are left out.
Trying to use this paradigm in a relational database is a "square peg, round hole" problem. A document database might be excellent for a highly transactional system, but analysis would be better served by loading the transactional data into various fact tables in a data warehouse.
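The mismatch is easy to see by forcing the two example documents from the quote into fixed tables. This is a minimal Python sketch under an assumed mapping (every scalar field becomes a column of a hypothetical person table; the nested Children list goes to a separate child table); it is one illustrative flattening, not the only possible relational design.

```python
DOCS = [
    {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"},
    {
        "FirstName": "Jonathan",
        "Address": "15 Wanamassa Point Road",
        "Children": [
            {"Name": "Michael", "Age": 10},
            {"Name": "Jennifer", "Age": 8},
            {"Name": "Samantha", "Age": 5},
            {"Name": "Elena", "Age": 2},
        ],
    },
]


def to_relational(docs: list) -> tuple:
    """Force schema-less documents into fixed tables: one 'person' table
    with every scalar field any document has, plus a 'child' table for
    the nested Children list (the only list-valued field here)."""
    scalar_cols = sorted(
        {k for d in docs for k, v in d.items() if not isinstance(v, list)}
    )
    people, children = [], []
    for person_id, doc in enumerate(docs, start=1):
        # Every row gets every column; a field the document lacks becomes None (NULL).
        people.append({"id": person_id, **{c: doc.get(c) for c in scalar_cols}})
        for child in doc.get("Children", []):
            children.append({"person_id": person_id, **child})
    return scalar_cols, people, children


if __name__ == "__main__":
    cols, people, children = to_relational(DOCS)
    nulls = sum(v is None for p in people for v in p.values())
    print(cols, nulls, len(children))
```

The documents themselves have no empty fields, but the fixed table needs a NULL Hobby for Jonathan and a second table for the children: the "square peg" in miniature. The Matrix takes the opposite route and keeps the fixed columns while making their meaning as loose as a document's.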

Related

Converting a script for an older SQL version to work with 5.6

Apologies if my fundamental understanding of the issue is incorrect. I am not very experienced with SQL and am still learning.
I am attempting to generate a table for a data set and was given this script:
CREATE TABLE [dbo].[lobbying](
[uniqid] [varchar](36) NOT NULL, [registrant_raw] [varchar](110) NULL, [registrant] [varchar](50) NULL, [isfirm] [char](1) NULL,
[client_raw] [varchar](110) NULL, [client] [varchar](50) NULL,
[ultorg] [varchar](50) NULL,
[amount] [float] NULL,
[catcode] [char](5) NULL,
[source] [char] (5) NULL,
[self] [char](1) NULL,
[IncludeNSFS] [char](1) NULL,
[use] [char](1) NULL,
[ind] [char](1) NULL,
[year] [char](4) NULL,
[type] [char](4) NULL,
[typelong] [varchar](50) NULL, [affiliate] [char](1) NULL,
) ON [PRIMARY]
As you can probably tell, it doesn't work. The script was updated in 2015, which is why I presume the issue is the version. I tried using SQL Fiddle to figure out what was causing the issue and found that taking the brackets out helped (which makes sense, as the tutorial I was following did not use any brackets for its tables). However, even with that, I still receive the error
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'use char(1) NULL,
ind char(1) NULL,
year char(4) NULL,
type char(4) NULL,
typelo' at line 10
Does anybody know what the issue is here? Any help would be greatly appreciated. I've poured about 4 hours into this project so far and have not been able to get past this roadblock.
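It is enough to replace the bracketed names with backticks and remove the brackets around the type names. The backticks are what fix the error you are seeing: USE is a reserved word in MySQL, so a column named use has to be quoted, which is why the parser stops at 'use char(1) NULL'. The SQL Server-specific ON [PRIMARY] clause and the trailing comma after the last column have to go as well: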
CREATE TABLE lobbying (
`uniqid` varchar(36) NOT NULL,
`registrant_raw` varchar(110) NULL,
`registrant` varchar(50) NULL,
`isfirm` char(1) NULL,
`client_raw` varchar(110) NULL,
`client` varchar(50) NULL,
`ultorg` varchar(50) NULL,
`amount` float NULL,
`catcode` char(5) NULL,
`source` char(5) NULL,
`self` char(1) NULL,
`IncludeNSFS` char(1) NULL,
`use` char(1) NULL,
`ind` char(1) NULL,
`year` char(4) NULL,
`type` char(4) NULL,
`typelong` varchar(50) NULL,
`affiliate` char(1) NULL
);
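If you have more of these scripts to convert, the manual edit described above can be roughly automated. This is a minimal Python sketch using regular expressions; it assumes the simple layout of this particular script (a bracketed column name followed by a bracketed type, a [dbo] schema prefix, no dots or brackets inside names) and is not a general T-SQL to MySQL translator. The function name tsql_to_mysql is hypothetical.

```python
import re


def tsql_to_mysql(ddl: str) -> str:
    """Rough T-SQL -> MySQL conversion for a simple CREATE TABLE script."""
    # 1. Drop the SQL Server filegroup clause.
    ddl = re.sub(r"\s*ON\s+\[PRIMARY\]", "", ddl, flags=re.IGNORECASE)
    # 2. Drop the schema prefix; MySQL would read it as a database name.
    ddl = re.sub(r"\[dbo\]\.", "", ddl, flags=re.IGNORECASE)
    # 3. A bracketed token right after "(" or "," is a column name: quote it
    #    with backticks (required for reserved words such as USE).
    ddl = re.sub(r"([(,]\s*)\[([^\]]+)\]", r"\1`\2`", ddl)
    # 4. Remaining bracketed tokens are table or type names: unwrap them,
    #    also removing a space before a length such as "[char] (5)".
    ddl = re.sub(r"\[([^\]]+)\]\s*(?=\()", r"\1", ddl)
    ddl = re.sub(r"\[([^\]]+)\]", r"\1", ddl)
    # 5. MySQL rejects a trailing comma after the last column.
    ddl = re.sub(r",\s*\)", "\n)", ddl)
    return ddl


SAMPLE = """CREATE TABLE [dbo].[lobbying](
[uniqid] [varchar](36) NOT NULL, [source] [char] (5) NULL,
[amount] [float] NULL,
[use] [char](1) NULL,
[affiliate] [char](1) NULL,
) ON [PRIMARY]"""

if __name__ == "__main__":
    print(tsql_to_mysql(SAMPLE))
```

Always read the output before running it: the regular expressions only cover the constructs listed in the comments.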

By using query, how do I create a table in a database instead of a schema in SSMS

I know that CREATE TABLE [example_schema].[example table] creates the table in the schema, but I want to create the table in a specific database instead, and I don't know the syntax.
CREATE TABLE Royal_Poly_DB.staff_relation (
"staff_no" CHAR(4) NOT NULL,
"staff_name" VARCHAR(100) NOT NULL,
"supervisor" CHAR(4) NULL,
"dob" DATE NOT NULL,
"grade" CHAR(5) NOT NULL,
"marital_status" CHAR(1) NOT NULL,
"pay" DECIMAL(7,2) NULL,
"allowance" DECIMAL(7,2) NULL,
"hourly_rate" DECIMAL(7,2) NULL,
"gender" CHAR(1) NOT NULL,
"citizenship" VARCHAR(10) NOT NULL,
"join_yr" INT NOT NULL,
"dept_cd" VARCHAR(5) NOT NULL,
"type_of_employment" CHAR(2) NOT NULL,
"highest_qln" VARCHAR(10) NOT NULL,
"designation" VARCHAR(20) NOT NULL,
PRIMARY KEY (staff_no))
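Right-click the database in Object Explorer and select "New Query", then add your code there; the new query window runs in the context of that database.
In T-SQL itself, a two-part name such as Royal_Poly_DB.staff_relation is read as schema.table, not database.table. Either run USE Royal_Poly_DB; before the CREATE TABLE, or use a three-part name such as Royal_Poly_DB.dbo.staff_relation.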
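The reason Royal_Poly_DB.staff_relation lands in a schema is that SQL Server fills a multipart name (server.database.schema.object) from the right. This hypothetical Python helper, resolve_name (illustrative only, not a SQL Server API), mimics that rule; it assumes the caller's default schema is dbo and that no part contains a dot.

```python
def resolve_name(name: str, default_schema: str = "dbo") -> dict:
    """Right-align the dot-separated parts of a SQL Server object name
    into server.database.schema.object slots."""
    # Naive split: assumes no "." inside a bracketed or quoted part.
    parts = [p.strip().strip('[]"') for p in name.split(".")]
    if not 1 <= len(parts) <= 4 or not parts[-1]:
        raise ValueError(f"not a valid object name: {name!r}")
    # Parts are filled from the right: object, then schema, database, server.
    padded = [""] * (4 - len(parts)) + parts
    server, database, schema, obj = (p or None for p in padded)
    return {
        "server": server,
        "database": database,  # None: the current database (USE / query window)
        "schema": schema or default_schema,  # None: the caller's default schema
        "object": obj,
    }


if __name__ == "__main__":
    for n in ("Royal_Poly_DB.staff_relation", "Royal_Poly_DB.dbo.staff_relation"):
        print(n, "->", resolve_name(n))
```

With two parts, Royal_Poly_DB ends up in the schema slot; only the three-part form (or an empty middle part, as in Royal_Poly_DB..staff_relation) puts it in the database slot.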

Looking for a general name or term for this particular database table design

I've run into a particular SQL table design and I'm trying to see if there's a common name people use to describe it. The table has a definition like this:
CREATE TABLE [ATTRIBUTES] (
[Id] [int] NOT NULL,
[FIELD_1] [varchar](30) NULL,
[VALUE_1] [varchar](30) NULL,
[FIELD_2] [varchar](30) NULL,
[VALUE_2] [varchar](30) NULL,
[FIELD_3] [varchar](30) NULL,
[VALUE_3] [varchar](30) NULL,
[FIELD_4] [varchar](30) NULL,
[VALUE_4] [varchar](30) NULL,
[FIELD_5] [varchar](30) NULL,
[VALUE_5] [varchar](30) NULL,
[FIELD_6] [varchar](30) NULL,
[VALUE_6] [varchar](30) NULL,
[FIELD_7] [varchar](30) NULL,
[VALUE_7] [varchar](30) NULL,
[FIELD_8] [varchar](30) NULL,
[VALUE_8] [varchar](30) NULL,
[FIELD_9] [varchar](30) NULL,
[VALUE_9] [varchar](30) NULL,
[FIELD_10] [varchar](30) NULL,
[VALUE_10] [varchar](30) NULL
)
The idea is to allow a variable number of attributes to be defined (up to a max of 10 in this case), without using a more normalized design.
I'm not looking for the pros/cons of this approach, but does anyone know of a common name or term used to describe this type of database table design?
This is typically referred to as the Attribute Columns pattern when people aren't busy calling it a pile of crap.
I would call it a (particularly nasty) form of a key-value store.
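The key-value reading is easiest to see by unpivoting the pairs: each FIELD_n/VALUE_n slot becomes one (Id, field, value) row. A minimal sketch in Python with SQLite as a stand-in; it assumes only 3 pairs instead of 10, and the attribute names and values are hypothetical. In SQL Server itself, CROSS APPLY over a VALUES list is a common way to write the same unpivot.

```python
import sqlite3

N_PAIRS = 3  # the question's table has 10 pairs; 3 keeps the sketch short


def unpivot_sql(n_pairs: int) -> str:
    """One SELECT per FIELD_n/VALUE_n pair, stacked with UNION ALL;
    empty slots (FIELD_n IS NULL) are skipped."""
    selects = [
        f"SELECT Id, FIELD_{i} AS field, VALUE_{i} AS value "
        f"FROM ATTRIBUTES WHERE FIELD_{i} IS NOT NULL"
        for i in range(1, n_pairs + 1)
    ]
    return " UNION ALL ".join(selects) + " ORDER BY 1, 2"  # by Id, then field


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f"FIELD_{i} TEXT, VALUE_{i} TEXT" for i in range(1, N_PAIRS + 1))
    conn.execute(f"CREATE TABLE ATTRIBUTES (Id INTEGER NOT NULL, {cols})")
    conn.execute(
        "INSERT INTO ATTRIBUTES (Id, FIELD_1, VALUE_1, FIELD_2, VALUE_2) "
        "VALUES (1, 'color', 'red', 'size', 'L')"
    )
    conn.execute("INSERT INTO ATTRIBUTES (Id, FIELD_1, VALUE_1) VALUES (2, 'weight', '3kg')")
    return conn.execute(unpivot_sql(N_PAIRS)).fetchall()


if __name__ == "__main__":
    print(demo())
```

The result has exactly the shape of a normalized attribute table, which is why the horizontal layout is usually described as a key-value store turned on its side.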

Database design for city details from multiple web services

I am working on a travel application, so we have to deal with different web services like GTA, Galileo, Kuoni, etc. to get hotel details.
Each web service has its own list of city codes and city names.
I want to design a table to store the city details from the different web services, and after some research I came up with these two approaches:
1st approach
CREATE TABLE [dbo].[City](
[CityID] [int] NOT NULL,
[CountryCode] [varchar](5) NOT NULL,
[AppCityCode] [varchar](10) NOT NULL,
[AppCityName] [varchar](200) NOT NULL,
[GTACityCode] [varchar](10) NULL,
[GTACityName] [varchar](200) NULL,
[GWSCityCode] [varchar](10) NULL,
[GWSCityName] [varchar](200) NULL,
[KuoniCityCode] [varchar](10) NULL,
....
....
....
....
....
....
)
In this approach, whenever a new web service is added, two columns (city code and city name) for that web service have to be added, which in turn means changes to the stored procedures and the frontend application code.
There will be no duplication when loading the cities into the textbox.
2nd Approach
The WSSupplier table stores web service details such as GTA, Galileo, etc.
CREATE TABLE [dbo].[WSSupplier](
[SupplierID] [smallint] NOT NULL,
[SupplierName] [varchar](100) NOT NULL
)
CREATE TABLE [dbo].[City](
[CityID] [int] IDENTITY(1,1) NOT NULL,
[AppCityCode] [varchar](20) NULL,
[AppCityName] [varchar](150) NULL,
[CountryCode] [varchar](10) NULL,
[WSSupplierID] [smallint] NULL,
[WSCityCode] [varchar](20) NULL,
[WSCityName] [varchar](150) NULL
)
In the 2nd approach, cities are added row by row with the corresponding web service supplier ID.
If a new web service comes along, I don't have to modify the table structure or the frontend application.
When loading cities, I have to use DISTINCT to load unique cities into the textbox or dropdown in the frontend.
In both approaches, AppCityCode and AppCityName populate the city textbox or dropdown in the application. When the user selects an AppCityName, the application gets the corresponding web service city code and sends it in the request to the web service to search for hotels in that city.
I want to know which is the best approach, or whether there is another good approach.
A third approach would be to create an intersection table between your city table and your supplier table that lists the supplier's version of the city code.
Your city table would just have your own system's city identifier. The city would appear only once. Each time you add a supplier you insert new records into the intersection table with the city codes for the cities that supplier cares about. The translation of a supplier city code to your internal city code is a simple lookup in the intersection table.
Consider something like this:
CREATE TABLE [dbo].[WSSupplier](
[SupplierID] [smallint] NOT NULL PRIMARY KEY,
[SupplierName] [varchar](100) NOT NULL
)
CREATE TABLE [dbo].[City](
[CityID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[CityCode] [varchar](20) NULL,
[CityName] [varchar](150) NULL,
[CountryCode] [varchar](10) NULL
)
-- Intersection table: the supplier's own code and name for each city
CREATE TABLE [dbo].[SupplierCityCode](
[CityID] [int] NOT NULL,
[WSSupplierID] [smallint] NULL,
[WSCityCode] [varchar](20) NULL,
[WSCityName] [varchar](150) NULL,
CONSTRAINT [fk_city] FOREIGN KEY ([CityID]) REFERENCES [dbo].[City] ([CityID]),
CONSTRAINT [fk_supplier] FOREIGN KEY ([WSSupplierID]) REFERENCES [dbo].[WSSupplier] ([SupplierID])
)
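Both lookups described above are single-table queries against the intersection table, and adding a supplier is just inserts. A minimal sketch in Python with SQLite standing in for SQL Server; the city and supplier codes are hypothetical, and the composite primary key (one code per city per supplier) is an added assumption, not part of the answer's design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE WSSupplier (
    SupplierID   INTEGER PRIMARY KEY,
    SupplierName TEXT NOT NULL
);
CREATE TABLE City (
    CityID      INTEGER PRIMARY KEY,
    CityCode    TEXT,
    CityName    TEXT,
    CountryCode TEXT
);
-- Intersection table: one row per city a supplier knows about.
CREATE TABLE SupplierCityCode (
    CityID       INTEGER NOT NULL REFERENCES City (CityID),
    WSSupplierID INTEGER NOT NULL REFERENCES WSSupplier (SupplierID),
    WSCityCode   TEXT NOT NULL,
    WSCityName   TEXT,
    PRIMARY KEY (CityID, WSSupplierID)
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO City VALUES (1, 'LON', 'London', 'GB')")
    conn.executemany("INSERT INTO WSSupplier VALUES (?, ?)", [(1, "GTA"), (2, "Kuoni")])
    conn.executemany(
        "INSERT INTO SupplierCityCode VALUES (?, ?, ?, ?)",
        [(1, 1, "LON01", "London"), (1, 2, "LHR", "London")],
    )
    return conn


def supplier_city_code(conn, city_id, supplier_id):
    """City chosen in the dropdown -> code to send to that supplier."""
    row = conn.execute(
        "SELECT WSCityCode FROM SupplierCityCode WHERE CityID = ? AND WSSupplierID = ?",
        (city_id, supplier_id),
    ).fetchone()
    return row[0] if row else None


def internal_city_id(conn, supplier_id, ws_city_code):
    """Supplier's city code (e.g. from a response) -> our CityID."""
    row = conn.execute(
        "SELECT CityID FROM SupplierCityCode WHERE WSSupplierID = ? AND WSCityCode = ?",
        (supplier_id, ws_city_code),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_db()
    print(supplier_city_code(conn, 1, 2), internal_city_id(conn, 1, "LON01"))
```

The dropdown reads from City alone, so each city appears once and no DISTINCT is needed.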
Your question is about application and database design. From the application design point of view, try to abstract away from the database design and think of the database as storage for your business objects. From the database design point of view, your question is about database normalization; the Wikipedia article on it is a good gateway to the wider world of database design. As for me:
CREATE TABLE [dbo].[Supplier](
[SupplierID] [smallint] NOT NULL PRIMARY KEY,
[SupplierName] [varchar](100) NOT NULL
)
CREATE TABLE [dbo].[AppCity](
[CityID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[CityCode] [varchar](20) NULL,
[CityName] [varchar](150) NULL,
[CountryCode] [varchar](10) NULL
)
-- CityID links each supplier row to AppCity, so it must not be an IDENTITY here
CREATE TABLE [dbo].[SupplierCity](
[CityID] [int] NOT NULL REFERENCES [dbo].[AppCity] ([CityID]),
[SupplierID] [smallint] NOT NULL REFERENCES [dbo].[Supplier] ([SupplierID]),
[CityCode] [varchar](20) NULL,
[CityName] [varchar](150) NULL
)

SSIS migration: split one record to many tables

I'm refactoring my db's User object from a schema that combines BillingAddress with Shipping Address:
[BillingFirstName] [nvarchar](50) NOT NULL,
[BillinglastName] [nvarchar](50) NOT NULL,
[BillingAddress] [nvarchar](100) NOT NULL,
[BillingCity] [nvarchar](100) NOT NULL,
[BillingZip] [varchar](16) NOT NULL,
[BillingState] [nvarchar](2) NOT NULL,
[shippingFirstName] [nvarchar](50) NULL,
[shippingLastName] [nvarchar](50) NULL,
[shippingAddress] [nvarchar](100) NULL,
[shippingCity] [nvarchar](100) NULL,
[shippingState] [nvarchar](2) NULL,
[shippingZip] [nvarchar](20) NULL,
[shippingPhone] [nvarchar](30) NULL,
I've refactored it into one table for User and a separate table for addresses, linked by a foreign key Users.ID => Addresses.idUser:
CREATE TABLE [dbo].[Addresses](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Type] [nchar](10) NOT NULL, -- designates Billing or Shipping
[Formatted] [nchar](600) NOT NULL,
[Street] [nchar](100) NOT NULL,
[City] [nchar](100) NOT NULL,
[POBox] [nchar](50) NULL,
[Region] [nchar](50) NULL,
[PostalCode] [nchar](50) NULL,
[Country] [nchar](50) NULL,
[ExtendedAddress] [nchar](100) NULL,
[idUser] [int] NULL,
How do I tell SSIS to import a record into the simplified User object and then create two address records, one with the shipping info and the other with the billing info?
I want to preserve the existing ID key.
thx
Multicast your source data.
Add a Derived Column component to each output stream. In your mind, designate one as the "Billing Address" stream and one as the "Shipping Address" stream.
Add a new column named "Type", hard-coded to "Billing" and "Shipping", respectively.
Add a destination component to each stream, both pointing to your Addresses table.
Map the appropriate columns in each stream (i.e. BillingCity to City in the "billing address" stream, shippingZip to PostalCode in the "shipping address" stream, etc.)
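The data flow above can be sketched in plain Python to check the column mapping before building the package. This is an illustration of the Multicast + Derived Column logic, not SSIS code; the STREAMS mapping, the way Formatted is composed, and skipping an empty shipping address are assumptions (Street and City are NOT NULL in Addresses while the shipping columns are nullable, so in SSIS you would likely add a Conditional Split for that case).

```python
# Addresses column -> legacy User column, per address type (the Type value).
STREAMS = {
    "Billing": {"Street": "BillingAddress", "City": "BillingCity",
                "Region": "BillingState", "PostalCode": "BillingZip"},
    "Shipping": {"Street": "shippingAddress", "City": "shippingCity",
                 "Region": "shippingState", "PostalCode": "shippingZip"},
}


def address_rows(user: dict) -> list:
    """Mimic Multicast + Derived Column: one legacy user row in, one
    Addresses row per address type out, each carrying the user's ID."""
    rows = []
    for addr_type, mapping in STREAMS.items():
        row = {col: user.get(src) for col, src in mapping.items()}
        # Assumption: skip an address whose NOT NULL targets would be empty.
        if row["Street"] is None or row["City"] is None:
            continue
        row["Type"] = addr_type      # the hard-coded derived column
        row["idUser"] = user["ID"]   # keep the existing user key
        row["Formatted"] = ", ".join(
            v for v in (row["Street"], row["City"], row["Region"], row["PostalCode"]) if v
        )
        rows.append(row)
    return rows


if __name__ == "__main__":
    user = {"ID": 42, "BillingAddress": "1 Main St", "BillingCity": "Springfield",
            "BillingState": "IL", "BillingZip": "62701"}
    print(address_rows(user))
```

Because idUser is copied from the existing Users.ID rather than generated, the original keys survive the split.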