Today I have a table containing:
Table a
--------
name
description
street1
street2
zipcode
city
fk_countryID
I'm having a discussion about the best way to normalize this for the quickest searches, e.g. finding all rows filtered by city or zipcode. The suggested new structure is this:
Table A
--------
name
description
fk_streetID
streetNumber
zipcode
fk_countryID
Table Street
--------
id
street1
street2
fk_cityID
Table City
----------
id
name
Table Country
-------------
id
name
The discussion is about having only one field for the street name instead of two.
My argument is that having two fields is considered normal for supporting international addresses.
The opposing argument is that it will come at the cost of search performance and possible duplication.
I'm wondering what is the best way to go here.
UPDATE
I'm aiming at having 15,000 brands associated with 50,000 stores, where 1,000 users will do multiple searches each day via web and iPhone. In addition, I will have third parties fetching data from the DB for their sites.
The site is not launched yet, so we have no idea of the workload, and we'll only have around 1,000 brands associated with around 4,000 stores when we start.
My standard advice (from years of data warehouse/BI experience) here is:
always store the lowest level of broken-out detail, i.e. the multiple-fields option.
In addition to that, depending on your needs, you can add indexes or even a compound field that is the other two fields concatenated - though make sure to maintain it with a trigger and not manually, or you will have data synchronization and quality problems.
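For illustration, a minimal sketch of such a trigger-maintained compound field, assuming the original single-table layout above (the street_full column and the trigger names are hypothetical):

-- Hypothetical compound column kept in sync by triggers rather than by hand
ALTER TABLE a ADD COLUMN street_full VARCHAR(255);

CREATE TRIGGER a_street_full_bi BEFORE INSERT ON a
FOR EACH ROW
SET NEW.street_full = CONCAT_WS(' ', NEW.street1, NEW.street2);

CREATE TRIGGER a_street_full_bu BEFORE UPDATE ON a
FOR EACH ROW
SET NEW.street_full = CONCAT_WS(' ', NEW.street1, NEW.street2);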
Part of the correct answer for you will always depend on your actual use. Can you ever anticipate needing the address in a standard (2-line) format for mailing... or for exchange with other entities? Or is this really a pure 'read-only' database that is just set up for inquiries and not used for more standard address needs such as mailings?
At the end of the day, if you have issues with query performance, you can add additional structures such as compound fields, indexes, and even other tables with the same data in a different form. There are also options for caching at the server level if performance is slow. If you are building a complex or traffic-intensive site, chances are you will end up with a product to help anyway; for example, in the Ruby programming world people use Thinking Sphinx. If query performance is still an issue and your data is growing, you may ultimately need to consider non-SQL solutions like MongoDB.
One final principle that I also adhere to: think about people updating data, if that will occur in this system. When people input data initially and then subsequently go to edit that information, they expect the information to be "the same", so any manipulation done internally that actually changes the form or content of the user's input will become a major headache when they try to do a simple edit. I have seen insanely complicated algorithms for encoding and decoding data in this fashion, and they frequently have issues.
I think the topmost example is the way to go, maybe with a third free-form field:
name
description
street1
street2
street3
zipcode
city
fk_countryID
The only things you can normalize half-way sanely for international addresses are the zip code (which needs to be a free-form field, though) and the city. Street addresses vary way too much.
Note that heavy normalisation means more joins, so it won't yield faster searches in every case.
As others have mentioned, address normalization (or "standardization") is most effective when the data is together in one table but the individual pieces are in separate columns (like your first example). I work in the address verification field (at SmartyStreets), and you'll find that standardizing addresses is a really complex task. There's more documentation on this task here: https://www.smartystreets.com/Features/Standardization/
With the volume of requests you'll be handling, I highly suggest you make sure the addresses are correct before you deploy. Process your list of addresses and remove duplicates, standardize formats, etc. A CASS-Certified vendor (such as SmartyStreets, though there are others) will provide such a service.
Related
What's the most efficient method or tool to randomize a list of database table columns to obscure sensitive information?
I have a Django application used by several clients, and I need to onboard some development contractors to do work on the application. When they work on bugs (e.g. page /admin/model/123 has an error), ideally they'd need a snapshot of the client database in order to reproduce and fix the bug. However, because they're off-site contractors, I'd like to mitigate risk in the event they expose the client database (unintentionally or otherwise). I don't want to have to explain to a client why all their data's been published online because a foreign contractor left his laptop in an unlocked car.
To do this, I'd like to find or write a tool to "randomize" sensitive fields in the database, like usernames, email addresses, account numbers, company names, phone numbers, etc so that the structure of the data is maintained, but all personally identifiable information is removed.
Presumably, this is a task that many other people have had to do, but I'm not sure what the technical term is, so I'm not finding much through Google. Are there any existing tools to do this with a Django application running a MySQL or PostgreSQL backend?
Anonymize and sanitize are good words for this chore.
It's relatively easy to do. Use queries like
UPDATE person
SET name  = CONCAT('Person', person_id),
    email = CONCAT('Person', person_id, '@example.com')
and so forth, to stomp actual names and emails and all that. It's helpful to preserve the uniqueness of entries, and the autoincrementing IDs of various tables can help you do that.
(Adding this as an answer, as I am not allowed to comment yet.)
As Cerin said, O. Jones's approach to anonymizing/sanitizing works for simple fields, but not for more complicated ones like addresses, phone numbers or account numbers that need to match a specific format. However, the method can be modified to allow this too.
Let's take a phone number with the format aaa-bbbb-ccc as an example and use the auto-incrementing person_id as the source of unique numbers. For the ccc part of the phone number, use MOD(person_id,1000). This will give the remainder of person_id divided by 1000. For bbbb, take MOD((person_id-MOD(person_id,1000))/1000,10000). It looks complicated, but what this is doing is taking person_id, removing the last three digits (which were used for ccc), then dividing by 1000. The last four digits of the resulting number are used as bbbb. I think you'll be able to figure out how to calculate aaa.
The three parts of the phone number can then be concatenated to give the complete phone number: CONCAT(aaa,"-",bbbb,"-",ccc)
(You might have to explicitly convert the numbers to string, I'm not sure)
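Putting the pieces together, a minimal MySQL sketch of that idea (assuming a person table with person_id and phone columns; CONCAT and LPAD convert the numbers to strings implicitly, and LPAD keeps the fixed aaa-bbbb-ccc width):

UPDATE person
SET phone = CONCAT(
        LPAD(MOD(FLOOR(person_id / 10000000), 1000), 3, '0'), '-',   -- aaa
        LPAD(MOD(FLOOR(person_id / 1000), 10000), 4, '0'), '-',      -- bbbb
        LPAD(MOD(person_id, 1000), 3, '0'));                         -- ccc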
I have created a MySQL table which has the crime count, crime description, crime category, and address of the crime. I have created some reports over this table. The user wants a search-by-address filter in the report, so we are going to use a WHERE clause on the table with a condition on the street.
The problem is that the street address is quite a large string, and searching/filtering the table on the address when the table is already quite big will take a lot of time. I tried some hashing like md5(streetaddress), but that did not help either. The query becomes very slow with this kind of WHERE clause.
Example:
SELECT * FROM crimedata WHERE streetaddress = '41 BENNETT RD Watertown Massachusetts United States';
Will indexing the streetaddress help in this case or should I use some kind of hashing to make this kind of string search faster in the table?
Shah
Adding an index on streetaddress will help a bit, but only to a limited extent.
You may want to consider changing your storage engine to something that supports fulltext search.
An example is Mroonga
NOTE: I am not associated with Mroonga. I just had a chance to use the library before and found that it does provide improvement in text search.
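For comparison, a hedged sketch of both options in plain MySQL (the FULLTEXT variant assumes MySQL 5.6+ with InnoDB; Mroonga replaces the storage engine instead and its setup is not shown here):

-- Plain B-tree index: helps exact matches and prefix LIKE searches only
ALTER TABLE crimedata ADD INDEX idx_streetaddress (streetaddress);

-- Built-in full-text index as an alternative to swapping engines
ALTER TABLE crimedata ADD FULLTEXT INDEX ft_streetaddress (streetaddress);

SELECT * FROM crimedata
WHERE MATCH(streetaddress) AGAINST('Bennett Watertown' IN NATURAL LANGUAGE MODE);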
You could try properly normalizing your data, where addresses are stored in one table and referenced by ID in another.
Your query would then look something like:
SELECT ... FROM crimedata WHERE address_id=?
Where that ? is a placeholder for the ID of the address you fetch from the other table.
As always, anything that shows up repeatedly in a WHERE clause as a condition is a strong candidate for being indexed.
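A hedged sketch of that normalized layout (the table and column names are assumptions):

-- Addresses stored once, referenced by an integer key
CREATE TABLE addresses (
    address_id    INT AUTO_INCREMENT PRIMARY KEY,
    streetaddress VARCHAR(255) NOT NULL,
    INDEX idx_streetaddress (streetaddress)
);

-- crimedata gains an address_id column in place of the long string:
-- ALTER TABLE crimedata ADD COLUMN address_id INT,
--     ADD FOREIGN KEY (address_id) REFERENCES addresses (address_id);

-- Filtering then resolves the string once and joins on the integer key
SELECT c.*
FROM crimedata c
JOIN addresses a ON a.address_id = c.address_id
WHERE a.streetaddress = '41 BENNETT RD Watertown Massachusetts United States';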
I would take a step back and see if you are attacking the problem in a way that is going to scale.
I would look at using geospatial information to do your queries on, and then use the street address as an output/display parameter.
If you use a GIS object to store things like a point, then you'll be able to do radius searches and bounding-box queries in the future.
Your code would change so that when someone enters a street address, it is converted (geocoded) to lat/long or a POINT. Searches will then go much quicker, since you won't be doing full text searches.
It will also give you the ability to call a mapping API to show the address or place location on public mapping services.
http://mysqlserverteam.com/mysql-5-7-and-gis-an-example/
[Yes, of course, scaling something like this out to a global scale would take it out of the realm of databases and into the big-data world.]
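A hedged sketch of the GIS approach in MySQL 5.7+ (the column name and coordinates are made up; on an already-populated table you would add the column as NULL, backfill the geocoded points, then make it NOT NULL before adding the spatial index):

ALTER TABLE crimedata
    ADD COLUMN location POINT NOT NULL,
    ADD SPATIAL INDEX sp_location (location);

-- Bounding-box query around a neighbourhood
SELECT *
FROM crimedata
WHERE MBRContains(
        ST_GeomFromText('POLYGON((-71.20 42.35, -71.15 42.35, -71.15 42.39, -71.20 42.39, -71.20 42.35))'),
        location);

-- Approximate radius query: everything within ~500 metres of a point
SELECT *
FROM crimedata
WHERE ST_Distance_Sphere(location, ST_GeomFromText('POINT(-71.183 42.371)')) <= 500;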
I have a database that has a table email_eml that stores 3 attributes: name_eml, host_eml and domain_eml. These store the email name, the name of the website, and the domain name (like .com, .net, etc.). It doesn't store the @ or a . in any of the variables.
This allows me some flexibility (for example, checking the average name length (before the @ symbol) will be faster). I can collect some statistics on the email name, and I can also create usernames from the name_eml attribute.
However, it is also a burden to handle when people are submitting their email, or when I have to compare a whole email address.
Storing the email in a single column would make me store the additional @ and . symbols and separate the name with a script when I want to collect statistics.
I wonder if it's better to store the email in a single column instead of the 3 columns.
Is one of the ways more proper, or more normalized?
I would like the answer to include the pros and cons of both approaches to storing the email addresses (even if storing the emails in 3 columns doesn't have many pros).
It doesn't store the @ or a . in any of the variables.
Well, it should; cat.call@somedomain.com is a legal email address.
I wonder if it's better to store the email in a single column instead of the 3 columns. Is one of the ways more proper, or more normalized?
This doesn't have anything to do with normalization. It has to do with complex data types.
The relational model allows arbitrarily complex data types. A commonly used complex data type is a timestamp, which typically includes year, month, day, hour, minute, second, and microsecond.
Given a timestamp, sometimes you might need to know only the date, and sometimes you might need to know only the year or only the hour. The relational model imposes a specific burden on the dbms when dealing with complex data types. For a complex data type, the dbms is required either to return it in its entirety, or to provide functions that return its various parts. The point is that, if a user wants only the hour out of a timestamp, the user doesn't write code to get it.
SQL dbms have good support for timestamps; every dbms that I'm familiar with provides functions that return various parts of timestamps. None of them have native support for email addresses.
On a SQL platform, you have at least two alternatives to keep your database close to the relational model. You can write functions that can be incorporated into the database server (if your dbms and your programming skill allows that), or you can split up the data type into pieces so each can be addressed in its entirety like any other value.
While there are probably some data types that make sense to split like that (street addresses might be one of them), I don't really see any compelling reason to split an email address.
This allows me some flexibility (for example, checking the average name length (before the @ symbol) will be faster). I can collect some statistics on the email name, and I can also create usernames from the name_eml attribute.
While that's true, right now I can't imagine anything at all interesting about the average length of a username. I don't find any of your reasons compelling, but you know more about your application than I do.
If you really need to do a lot of operations on the pieces, it makes more sense to keep the pieces separate. More "normal" client code should access the email addresses through a view that concatenates the pieces. (Concatenation is a lot easier than parsing an email address at run time.)
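For example, a view along these lines (using the table and column names from the question) keeps the pieces separate in storage while letting ordinary client code read a single email column:

CREATE VIEW email_addresses AS
SELECT name_eml,
       host_eml,
       domain_eml,
       CONCAT(name_eml, '@', host_eml, '.', domain_eml) AS email
FROM email_eml;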
It's extremely rare to store email addresses in three columns. If you want to do something like search on the part of the email before the @ symbol, you could just use a LIKE query...
SELECT email FROM people WHERE email LIKE 'john.smith@%';
I'd be interested to hear of any real-life examples that aren't possible to do with an SQL query.
In terms of normalization, once you break apart common aspects (such as host and especially top-level domain), they should be modeled as foreign relationships. So you end up with three tables:
emailNames
emailHosts
emailTLDs
emailNames then has three columns:
emailName
hostID
tldID
Note that I used "TLD", as this is likely the only part with significant overlap in the host name, and you can expect the "." character in hostnames before the start of the TLD.
I am making a very simple database (MySQL) with essentially two types of data, always with a 1-to-1 relationship:
Events
Sponsor
Time (Optional)
Location (City, State)
Venue (Optional)
Details URL
Sponsors
Name
URL
Cities will be duplicated often, but is there really much value in having a cities table for such a simple database schema?
The database is populated by screen-scraping a website. On this site the city field is populated via selecting from a dropdown, so there will not be mistypes, etc and it would be easy to match the records up with a city table. I'm just not sure there would be much of a point even if the users of my database will be searching by city frequently.
Normalize the database now.
It's a lot easier to optimize queries on normalized data than it is to normalize a pile of data.
You say it's simple now - these things have a tendency to grow. Design it right and you'll get the experience of proper design and some future proofing.
I think you are looking at things the wrong way - you should always normalize unless you have a good reason not to.
Trusting your application to maintain data integrity is a needless risk. You say the data is made uniform because it is selected from a dropdown. What if someone hacks on the form and modifies the data, or if your code inadvertently allows a querystring param with the same name?
Where will the city data come from that populates your dropdown box for the user? Wouldn't you want a table for that?
It looks like you are treating Location as one attribute including city and state. Suppose you want to sort or analyse events by state alone rather than city and state? That could be hard to do if you don't have an attribute for state. Logically I would expect state to belong in a city table - although that may depend on exactly how you want to identify cities.
Direct answer: Just because a problem is relatively simple is no reason to not do things to keep it simple. It's a lot easier to walk on my feet than on my hands. I don't recall ever saying, "Oh, I only have to go half a mile, that's a short distance so I might as well walk on my hands."
Longer answer: If you don't keep any information about a city other than its name, and you don't have a pre-set list of cities (e.g. to build a drop-down), then your schema is already normalized. What would be in a City table other than the city name? (I presume State cannot be dependent on City because you could have two cities with the same name in different states, e.g. Dayton OH and Dayton TN.) The relevant rule of normalization is "no non-key dependencies", that is, you cannot have data that depends on data that is not a key. If you had, say, the latitude and longitude of each city, then this data would be repeated in every record that referenced the same city. In that case you would certainly want to break out a separate city table to hold the latitude and longitude. You could, of course, create a "city code" that is an integer or abbreviation that links to a city table. But if there's no other data about a city, I don't see how this gains anything.
Technically, I would assume that City depends on Venue. If the venue is "Rockefeller Center", that implies that the city must be New York. But if the venue is optional, this creates problems. One possibility is to have a Venue table that lists venue name, city, and state, and for cases where you don't specify a venue, have an "unspecified" venue for each city. This would be more textbook correct, but in practice, if in most cases you do not specify a venue, it would gain little. If most of the time you DO specify a venue, it would probably be a good idea.
Oh, and, is there really a 1:1 relation between event and sponsor? I can believe that an event cannot have more than one sponsor. (In real life, there are plenty of events with multiple sponsors, but maybe for your purposes you only care about a "primary sponsor" or some such.) But does a sponsor never hold more than one event? That seems unlikely.
Why not go ahead and normalize? You write as if there are significant costs of normalizing that outweigh the benefits. It's easier to set it up in a normal form before you populate it than to try and normalize it later.
Also, I wonder about your 1-to-1 relationship. Naively, I would imagine that an event might have multiple sponsors, or that a sponsor might be involved in more than one event. But I don't know your business logic...
ETA:
I don't know why I didn't notice this before, but if you are really averse to normalizing your database, and you know that you will always have a 1-to-1 relationship between the events and sponsors, then why would you have the sponsors in a separate table?
It sounds like you may be a little confused about what normalization is and why you would do it.
The answer hinges, IMO, on whether you want to prevent errors during data-entry. If you do, you will need a VENUES table:
VENUES
City
State
VenueName
as well as a CITIES and STATES table. (Note: I've seen situations where the same city occurs multiple times in the same state, usually smaller towns, so CITY/STATE do not comprise a unique dyad. Normally there's a zipcode to disambiguate.)
To prevent situations where the data-entry operator enters a venue for NY NY which is actually in SF CA, you'd need to validate the venue entry to see if such a venue exists in the city/state supplied on the record.
Then you'd need to make CITY/STATE mandatory, and have to write code to rollback the transaction and handle the error.
If you are not concerned about enforcing this sort of accuracy, then you don't really need to have CITY and STATES tables either.
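A hedged sketch of that layout with referential integrity doing the validation (all names are assumptions; the zipcode column is there to disambiguate duplicate city names within a state, as noted above):

CREATE TABLE states (
    state_id INT AUTO_INCREMENT PRIMARY KEY,
    name     VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE cities (
    city_id  INT AUTO_INCREMENT PRIMARY KEY,
    state_id INT NOT NULL,
    name     VARCHAR(100) NOT NULL,
    zipcode  VARCHAR(10),
    FOREIGN KEY (state_id) REFERENCES states (state_id)
);

CREATE TABLE venues (
    venue_id INT AUTO_INCREMENT PRIMARY KEY,
    city_id  INT NOT NULL,
    name     VARCHAR(150) NOT NULL,
    FOREIGN KEY (city_id) REFERENCES cities (city_id)
);

-- Events reference venues(venue_id); the FK chain means a venue can never
-- point at a city/state combination that doesn't exist.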
If you are interested in learning about normalization, you should learn what happens when you don't normalize. For each normal form (beyond 1NF) there is an update anomaly that will occur as a consequence of harmful redundancy.
Often it's possible to program around the update anomalies, and sometimes that's more practical than always normalizing to the ultimate degree.
Sometimes, it's possible for a database to get into an inconsistent state due to failure to normalize, and failure to program the application to compensate.
In your example, the best I can come up with is a sort of lame hypothetical. What if the name of a city got misspelled in one row, but spelled correctly in all the others? What if you summarized by city and sponsor? Your output would reflect the error, and divide one group into two groups. Maybe it would be better if the city were only spelled out once in the database, for better or for worse. At least the grouping for the summary would be correct, even if the name were misspelled.
Is this worth normalizing for? Hey, it's your project, not mine. You decide.
Background
I'm a first year CS student and I work part time for my dad's small business. I don't have any experience in real world application development. I have written scripts in Python, some coursework in C, but nothing like this.
My dad has a small training business and currently all classes are scheduled, recorded and followed up via an external web application. There is an export/"reports" feature but it is very generic and we need specific reports. We don't have access to the actual database to run the queries. I've been asked to set up a custom reporting system.
My idea is to create the generic CSV exports and import (probably with Python) them into a MySQL database hosted in the office every night, from where I can run the specific queries that are needed. I don't have experience in databases but understand the very basics. I've read a little about database creation and normal forms.
We may start having international clients soon, so I want the database to not explode if/when that happens. We also currently have a couple big corporations as clients, with different divisions (e.g. ACME parent company, ACME healthcare division, ACME bodycare division)
The schema I have come up with is the following:
From the client perspective:
Clients is the main table
Clients are linked to the department they work for
Departments can be scattered around a country: HR in London, Marketing in Swansea, etc.
Departments are linked to the division of a company
Divisions are linked to the parent company
From the classes perspective:
Sessions is the main table
A teacher is linked to each session
A statusid is given to each session. E.g. 0 - Completed, 1 - Cancelled
Sessions are grouped into "packs" of an arbitrary size
Each pack is assigned to a client
I "designed" (more like scribbled) the schema on a piece of paper, trying to keep it normalised to the 3rd form. I then plugged it into MySQL Workbench and it made it all pretty for me: (Click here for full-sized graphic)
(source: maian.org)
Example queries I'll be running
Which clients with credit still left are inactive (those without a class scheduled in the future)
What is the attendance rate per client/department/division (measured by the status id in each session)
How many classes has a teacher had in a month
Flag clients who have low attendance rate
Custom reports for HR departments with attendance rates of people in their division
Question(s)
Is this overengineered or am I headed the right way?
Will the need to join multiple tables for most queries result in a big performance hit?
I have added a 'lastsession' column to clients, as it is probably going to be a common query. Is this a good idea or should I keep the database strictly normalised?
Thanks for your time
Some more answers to your questions:
1) You're pretty much on target for someone who is approaching a problem like this for the first time. I think the pointers from others on this question thus far pretty much cover it. Good job!
2 & 3) The performance hit you will take will largely be dependent on having and optimizing the right indexes for your particular queries / procedures and more importantly the volume of records. Unless you are talking about well over a million records in your main tables you seem to be on track to having a sufficiently mainstream design that performance will not be an issue on reasonable hardware.
That said, and this relates to your question 3, with the start you have you probably shouldn't really be overly worried about performance or hyper-sensitivity to normalization orthodoxy here. This is a reporting server you are building, not a transaction based application backend, which would have a much different profile with respect to the importance of performance or normalization. A database backing a live signup and scheduling application has to be mindful of queries that take seconds to return data. Not only does a report server function have more tolerance for complex and lengthy queries, but the strategies to improve performance are much different.
For example, in a transaction based application environment your performance improvement options might include refactoring your stored procedures and table structures to the nth degree, or developing a caching strategy for small amounts of commonly requested data. In a reporting environment you can certainly do this but you can have an even greater impact on performance by introducing a snapshot mechanism where a scheduled process runs and stores pre-configured reports and your users access the snapshot data with no stress on your db tier on a per request basis.
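As a rough illustration of that snapshot idea (every name here is hypothetical, the pack-to-client link follows the schema description above, status_id 0 = Completed comes from the question, and it assumes the MySQL event scheduler is enabled):

-- Pre-aggregated attendance figures, refreshed nightly
CREATE TABLE attendance_snapshot (
    snapshot_date      DATE NOT NULL,
    client_id          INT  NOT NULL,
    sessions_total     INT  NOT NULL,
    sessions_completed INT  NOT NULL,
    PRIMARY KEY (snapshot_date, client_id)
);

CREATE EVENT ev_attendance_snapshot
ON SCHEDULE EVERY 1 DAY
DO
    INSERT INTO attendance_snapshot
    SELECT CURRENT_DATE, p.client_id, COUNT(*), SUM(s.status_id = 0)
    FROM sessions s
    JOIN packs p ON p.pack_id = s.pack_id
    GROUP BY p.client_id;

Reports then read attendance_snapshot instead of recomputing the aggregation on every request.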
All of this is a long-winded rant to illustrate that what design principles and tricks you employ may differ given the role of the db you're creating. I hope that's helpful.
You've got the right idea. You can however clean it up, and remove some of the mapping (has*) tables.
What you can do is in the Departments table, add CityId and DivisionId.
Besides that, I think everything is fine...
The only changes I would make are:
1- Change your VARCHAR to NVARCHAR; if you might be going international, you may want Unicode.
2- Change your int IDs to GUIDs (uniqueidentifier) if possible (this might just be my personal preference). Assuming you eventually get to the point where you have multiple environments (dev/test/staging/prod), you may want to migrate data from one to the other. Having GUID IDs makes this significantly easier.
3- Three layers for your Company -> Division -> Department structure may not be enough. Now, this might be over-engineering, but you could generalize that hierarchy such that you can support n levels of depth (see the sketch after this list). This will make some of your queries more complex, so that may not be worth the trade-off. Further, it could be that any client that has more layers may be easily "stuffable" into this model.
4- You also have a Status in the Client Table that is a VARCHAR and has no link to the Statuses table. I'd expect a little more clarity there as to what the Client Status represents.
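For point 3, a hedged sketch of what an n-level hierarchy could look like (names are hypothetical): each organisational unit points at its parent, so Company, Division and Department simply become rows at different depths of one table.

CREATE TABLE org_units (
    org_unit_id INT AUTO_INCREMENT PRIMARY KEY,
    parent_id   INT NULL,
    name        VARCHAR(100) NOT NULL,
    FOREIGN KEY (parent_id) REFERENCES org_units (org_unit_id)
);

-- Clients would then reference org_units(org_unit_id) directly.

Querying the hierarchy does become more involved (repeated self-joins, or recursive CTEs on MySQL 8+), which is the trade-off mentioned above.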
No. It looks like you're designing at a good level of detail.
I think that Countries and Companies are really the same entity in your design, as are Cities and Divisions. I'd get rid of the Countries and Cities tables (and Cities_Has_Departments) and, if necessary, add a boolean flag IsPublicSector to the Companies table (or a CompanyType column if there are more choices than simply Private Sector / Public Sector).
Also, I think there's an error in your usage of the Departments table. It looks like the Departments table serves as a reference to the various kinds of departments that each customer division can have. If so, it should be called DepartmentTypes. But your clients (who are, I assume, attendees) do not belong to a department TYPE, they belong to an actual department instance in a company. As it stands now, you will know that a given client belongs to an HR department somewhere, but not which one!
In other words, Clients should be linked to the table that you call Divisions_Has_Departments (but that I would call simply Departments). If this is so, then you must collapse Cities into Divisions as discussed above if you want to use standard referential integrity in the database.
By the way, it's worth noting that if you're generating CSVs already and want to load them into a MySQL database, LOAD DATA LOCAL INFILE is your best friend: http://dev.mysql.com/doc/refman/5.1/en/load-data.html . mysqlimport is also worth looking into; it is a command-line tool that's basically a nice wrapper around LOAD DATA INFILE.
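A hedged example of what the nightly import could look like (the file path, staging table and column list are made up, and the terminators need to match the actual export format):

LOAD DATA LOCAL INFILE '/path/to/sessions_export.csv'
INTO TABLE sessions_staging
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(session_id, client_id, teacher_id, status_id, session_date);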
Most things have already been said, but I feel that I can add one thing: it is quite common for younger developers to worry about performance a little bit too much up-front, and your question about joining tables seems to go into that direction. This is a software development anti-pattern called 'Premature Optimization'. Try to banish that reflex from your mind :)
One more thing: Do you believe you really need the 'cities' and 'countries' tables? Wouldn't having a 'city' and 'country' column in the departments table suffice for your use cases? E.g. does your application need to list departments by city and cities by country?
Following comments based on role as a Business Intelligence/Reporting specialist and strategy/planning manager:
I agree with Larry's direction above. IMHO, it's not so much over-engineered; some things just look a little out of place. To keep it simple, I would tag the client directly with a Company ID, Department Description, Division Description, Department Type ID, and Division Type ID. Use Department Type ID and Division Type ID as references to lookup tables and as internal reporting/analysis fields for long-term consistency.
The Packs table contains a "Credit" column; shouldn't that actually be tied to the Client base table, so that if they have many packs you can see how much credit is left for future classes? The application can take care of the calculation and store it centrally in the Client table.
Company info could use many more fields, including the obvious address/phone/etc. information. I'd also be prepared to add in D&B "DUNs" columns (Site/Branch/Ultimate) long term, Dun and Bradstreet (D&B) has a huge catalog of companies and you'll find later down the road their information is very helpful for reporting/analysis. This will take care of the multiple division issue you mention, and allow you to roll up their hierarchy for sub/division/branches/etc. of large corps.
You don't mention how many records you'll be working with, which could imply setting yourself up for a large development initiative that could have been done quicker and with far fewer headaches with prepackaged "reporting" software. If you're not dealing with a large database (< 65,000 rows), make sure MS Access, OpenOffice (Base) or related report/app-dev solutions couldn't do the trick. I use Oracle's free APEX software quite a bit myself; it comes with their free database Oracle XE, just download it from their site.
FYI - Reporting insight: for large databases, you typically have two database instances: a) a transaction database for recording each detailed record, and b) a reporting database (data mart/data warehouse) housed on a separate machine. For more information, search Google for both Star Schema and Snowflake Schema.
Regards.
I want to address only the concern that joining multiple tables will cause a performance hit. Do not be afraid to normalize because you will have to do joins. Joins are normal and expected in relational databases, and they are designed to handle them well. You will need to set up PK/FK relationships (for data integrity; this is important to consider in designing), but in many databases FKs are not automatically indexed. Since they will be used in the joins, you will definitely want to start by indexing the FKs. PKs generally get an index on creation, as they have to be unique. It is true that data warehouse design reduces the number of joins, but usually one doesn't get to the point of data warehousing until one has millions of records that need to be accessed in one report. Even then, almost all data warehouses start with a transactional database to collect the data in real time, and then data is moved to the warehouse on a schedule (nightly or monthly or whatever the business need is). So this is a good start even if you need to design a data warehouse later to improve report performance.
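A hedged example of indexing the FKs mentioned above (the table and column names are assumptions based on the schema description):

ALTER TABLE sessions ADD INDEX idx_sessions_teacher (teacher_id);
ALTER TABLE sessions ADD INDEX idx_sessions_pack    (pack_id);
ALTER TABLE packs    ADD INDEX idx_packs_client     (client_id);

-- Note: InnoDB creates an index for a declared foreign key automatically if
-- one doesn't already exist, but it's worth verifying rather than assuming.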
I must say your design is impressive for a first year CS student.
It isn't over-engineered; this is how I would approach the problem. Joining is fine, and there won't be much of a performance hit (it's completely necessary unless you de-normalise the database, which isn't recommended!). For statuses, see if you can use an ENUM datatype instead to optimise that table away.
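A hedged sketch of the ENUM suggestion (assuming a sessions table with a status column; the value list comes from the question):

ALTER TABLE sessions
    MODIFY status ENUM('Completed', 'Cancelled') NOT NULL DEFAULT 'Completed';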
I've worked in the training / school domain and I thought I'd point out that there's generally a M:1 relationship between what you call "sessions" (instances of a given course) and the course itself. In other words, your catalog offers the course ("Spanish 101" or whatever), but you might have two different instances of it during a single semester (Tu-Th taught by Smith, Wed-Fri taught by Jones).
Other than that, it looks like a good start. I bet you'll find that the client domain (graphs leading to "clients") is more complex than you've modeled, but don't go overboard with that until you've got some real data to guide you.
A few things came to mind:
The tables seemed geared to reporting, but not really running the business. I would think when a client signs up, there's essentially an order being placed for the client attending a list of sessions, and that order might be for multiple employees in one company. It would seem an "order" table would really be at the center of your system and driving your data capture and eventual reporting. (Compare the paper documents you've been using to run the business with your database design to see if there's a logical match.)
Companies often don't have divisions. Employees sometimes change divisions/departments, maybe even mid-session. Companies sometimes add/delete/rename divisions/departments. Make sure the possibly real-time changing contents of your tables don't make subsequent reporting/grouping difficult. With so much contact data split over so many tables, you might have to enforce very strict data-entry validation to keep your reports meaningful and inclusive. E.g., when a new client is added, make sure his company/division/department/city match the same values as his coworkers'.
The "packs" concept isn't clear at all.
Since you indicate it's a small business, it would be surprising if performance would be an issue, considering the speed and capacity of current machines.