How to Create an Effective Data Model Based on Effective SQL
This article reflects on the insights gained from Effective SQL: 61 Specific Ways to Write Better SQL and offers my personal notes along with additional references. Below are links to my reflections on earlier chapters, which you are welcome to explore.
- How to Design an Index? — Reflection on Effective SQL: Effective SQL has provided numerous strategies for efficient database usage.
- What if We Can’t Alter the Design? — Reflection on Effective SQL Part 3
- Subqueries Explained — Reflection on Effective SQL Part 6: A subquery is a query nested inside another query and enclosed in parentheses; we often use subqueries to derive tables, lists, or other data structures.
- How to Retrieve and Analyze Metadata? — Reflection on Effective SQL Part 7: Sometimes we need data about other data, that is, metadata.
The following paragraphs summarize essential points:
- Ensure every table includes a primary key.
- Remove redundant storage.
- Eliminate repeating groups.
- Maintain one property per column.
- Recognize why storing computed data can be disadvantageous.
- Establish foreign keys to uphold referential integrity.
- Utilize denormalization for data warehouses.
While this article summarizes key concepts, I highly recommend reading the book for a more comprehensive understanding.
Item 1: Confirm All Tables Have a Primary Key
#### 1. The Importance of Primary Keys
Without a primary key, tables may contain duplicate rows, leading to inaccurate results and slower searches.
#### 2. Determining a Primary Key
A primary key should be unique and non-null, ideally a column whose values never change.
#### 3. Referential Integrity (RI)
This ensures that any foreign key in a child table corresponds to an entry in the parent table.
#### 4. Drawbacks of Text-Based Primary Keys
Using a text value such as CustomerID as a primary key can lead to complications if modifications are needed in the future.
#### 5. Challenges of Compound Primary Keys
They can complicate operations and slow down processing.
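As a concrete illustration of points 1 and 2, here is a minimal sketch using SQLite through Python's `sqlite3` module; the table and column names are my own, not from the book. A numeric surrogate key makes the database reject duplicate rows automatically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,  -- numeric surrogate key
        CustName   TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO Customers VALUES (1, 'Alice')")

# A second row with the same key violates the primary key constraint.
try:
    conn.execute("INSERT INTO Customers VALUES (1, 'Bob')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```

The point is that uniqueness is enforced declaratively by the engine, rather than checked by application code.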
Item 2: Remove Redundant Storage
Redundant storage occurs when a table encompasses multiple subjects, leading to various issues.
#### 1. Insertion Anomalies
Inserting data becomes cumbersome when, for example, you must supply a customer's profile just to add a salesperson's details.
#### 2. Update Anomalies
Updating a single fact requires changing every duplicated copy, risking inconsistent data.
#### 3. Deletion Anomalies
Deleting a salesperson might inadvertently remove customer profiles stored in the same rows.
#### 4. Redesigning the Table Using the Second Normal Form
Following the Second Normal Form (2NF) helps minimize data redundancy.
#### 5. Joining Normalized Tables
After normalization, you can reconstruct the original combined view by connecting the tables with JOIN clauses.
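The redesign in points 4 and 5 can be sketched as follows, again with SQLite through Python's `sqlite3` (the schema is invented for illustration): each table covers one subject, and a JOIN reproduces the combined view without storing any fact twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One subject per table: salespeople and their orders.
    CREATE TABLE Salespeople (
        SalespersonID INTEGER PRIMARY KEY,
        SalesName     TEXT NOT NULL
    );
    CREATE TABLE Orders (
        OrderID       INTEGER PRIMARY KEY,
        SalespersonID INTEGER REFERENCES Salespeople(SalespersonID),
        Amount        REAL
    );
    INSERT INTO Salespeople VALUES (1, 'Amy');
    INSERT INTO Orders VALUES (101, 1, 50.0), (102, 1, 75.0);
""")

# The denormalized view is rebuilt on demand with a join.
rows = conn.execute("""
    SELECT o.OrderID, s.SalesName, o.Amount
    FROM Orders AS o
    JOIN Salespeople AS s USING (SalespersonID)
    ORDER BY o.OrderID
""").fetchall()
print(rows)  # [(101, 'Amy', 50.0), (102, 'Amy', 75.0)]
```

Note that the salesperson's name is stored exactly once, so an update touches a single row.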
Item 3: Eliminate Repeating Groups
Repeating groups involve multiple columns with similar attributes within a table.
#### 1. Drawbacks of Repeating Groups
They can lead to duplicate values and complicate column modifications.
#### 2. Utilizing Union Queries
If modifying the table is not an option, consider using union queries.
#### 3. Redesigning the Table
Separate the original table into independent tables, ensuring each has its own primary key and foreign keys as needed.
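Here is a sketch of the workaround from point 2, using SQLite through Python's `sqlite3`; the `Drawing1`–`Drawing3` columns are a made-up example of a repeating group. When the table cannot be redesigned, a UNION query flattens the repeated columns into one logical column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Repeating group: three columns holding the same kind of attribute.
    CREATE TABLE Assignments (
        ID INTEGER PRIMARY KEY,
        Drawing1 TEXT, Drawing2 TEXT, Drawing3 TEXT
    );
    INSERT INTO Assignments VALUES (1, 'LO542B', 'LO426C', NULL);
""")

# UNION normalizes the repeating group on the fly, one row per value.
rows = conn.execute("""
    SELECT ID, Drawing1 AS Drawing FROM Assignments WHERE Drawing1 IS NOT NULL
    UNION
    SELECT ID, Drawing2 FROM Assignments WHERE Drawing2 IS NOT NULL
    UNION
    SELECT ID, Drawing3 FROM Assignments WHERE Drawing3 IS NOT NULL
    ORDER BY ID, Drawing
""").fetchall()
print(rows)  # [(1, 'LO426C'), (1, 'LO542B')]
```

The proper fix (point 3) would be a child table with one `Drawing` column, but the UNION gets you a normalized result set without a schema change.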
Item 4: Store Only One Property Per Column
Storing multiple properties in a single column violates the First Normal Form (1NF).
#### 1. Issues with Storing Multiple Properties
Searching for specific values becomes inefficient and cumbersome.
#### 2. Redesigning for 1NF Compliance
Break values down into distinct columns and use concatenation to recombine them when needed.
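A minimal sketch of the 1NF redesign, using SQLite through Python's `sqlite3` (the `Authors` schema is illustrative): one property per column makes searches simple equality tests, and concatenation rebuilds the combined form only when it is displayed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One property per column, rather than a single 'John Smith, Boston' field.
    CREATE TABLE Authors (
        AuthorID  INTEGER PRIMARY KEY,
        FirstName TEXT, LastName TEXT, City TEXT
    );
    INSERT INTO Authors VALUES (1, 'John', 'Smith', 'Boston');
""")

# Searching a single-property column is a plain equality test, no LIKE needed.
hits = conn.execute(
    "SELECT AuthorID FROM Authors WHERE LastName = ?", ("Smith",)
).fetchall()

# The combined form is recreated with concatenation at query time.
full = conn.execute(
    "SELECT FirstName || ' ' || LastName FROM Authors"
).fetchone()[0]
print(hits, full)  # [(1,)] John Smith
```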
Item 5: Understand the Drawbacks of Storing Calculated Data
Storing computed columns necessitates recalculating them whenever underlying data changes, which can be problematic.
#### 1. DML Triggers
These are akin to constraints and can fulfill functional needs that constraints alone cannot.
- AFTER/FOR trigger: executes after an insert, update, or delete, ensuring additional actions occur as necessary.
- INSTEAD OF trigger: replaces the standard action, providing a safeguard when certain conditions aren't met.
#### 2. AS Expression
Computed columns defined with an AS expression cannot be used in DEFAULT or FOREIGN KEY constraints.
#### 3. Deterministic vs. Nondeterministic
- Deterministic: consistent results for identical inputs.
- Nondeterministic: varying results for the same inputs.
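To make the trigger discussion concrete, here is a sketch of an AFTER trigger in SQLite (run through Python's `sqlite3`; the order schema is invented for illustration). It keeps a stored, computed total in sync with the detail rows, which is exactly the maintenance burden Item 5 warns about:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        OrderTotal REAL NOT NULL DEFAULT 0   -- stored, computed value
    );
    CREATE TABLE OrderItems (
        OrderID INTEGER REFERENCES Orders(OrderID),
        Price   REAL NOT NULL
    );
    -- AFTER trigger: recompute the stored total whenever an item is added.
    CREATE TRIGGER trg_items_ai AFTER INSERT ON OrderItems
    BEGIN
        UPDATE Orders
        SET OrderTotal = (SELECT SUM(Price) FROM OrderItems
                          WHERE OrderID = NEW.OrderID)
        WHERE OrderID = NEW.OrderID;
    END;
    INSERT INTO Orders (OrderID) VALUES (1);
    INSERT INTO OrderItems VALUES (1, 10.0), (1, 2.5);
""")

total = conn.execute(
    "SELECT OrderTotal FROM Orders WHERE OrderID = 1"
).fetchone()[0]
print(total)  # 12.5
```

Every write path (update, delete) would need its own trigger too; computing the total in a view or query avoids that bookkeeping entirely.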
Item 6: Define Foreign Keys for Referential Integrity
#### 1. Role of Declarative Referential Integrity (DRI)
DRI declares the relationships between tables so the database itself maintains data integrity.
#### 2. Cascading Actions
Cascading actions dictate how changes in parent data affect child data.
- NO ACTION: the parent update or delete is rejected when matching child rows exist.
- CASCADE: Child data is modified with parent updates.
- SET NULL: Child data becomes NULL if the parent is altered.
- SET DEFAULT: Child data reverts to default values on parent changes.
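The foreign key and CASCADE behavior can be sketched in SQLite through Python's `sqlite3` (table names are illustrative; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` is set on the connection):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this per connection
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,
        CustName   TEXT
    );
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL
            REFERENCES Customers(CustomerID) ON DELETE CASCADE
    );
    INSERT INTO Customers VALUES (1, 'Alice');
    INSERT INTO Orders VALUES (100, 1);
""")

# DRI rejects an orphan child row referencing a missing parent.
try:
    conn.execute("INSERT INTO Orders VALUES (101, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the parent cascades to its child rows.
conn.execute("DELETE FROM Customers WHERE CustomerID = 1")
remaining = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(orphan_rejected, remaining)  # True 0
```

Swapping `ON DELETE CASCADE` for `SET NULL` or `SET DEFAULT` yields the other behaviors listed above.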
Item 9: Utilize Denormalization for Data Warehousing
#### 1. Properties of Data Warehouses
These serve as central information repositories, designed for heavy reads and few writes, which is why their schemas are typically denormalized.
#### 2. Properties of Databases
Operational databases function as transaction systems focused on heavy writes, requiring normalization for optimal performance.
#### 3. Distinctions Between Data Warehouses and Databases
#### 4. Denormalization Techniques
- Method 1: Copy child-table data into the parent table so foreign key joins are unnecessary.
- Method 2: Store derived values to minimize computation at query time.
- Method 3: Use repeating groups to decrease the number of rows.
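The denormalization methods above can be sketched in SQLite through Python's `sqlite3` (the schema and the `SalesFact` name are my own invention): a read-optimized table copies the customer data in (Method 1) and pre-computes the total (Method 2), trading redundancy for fast reads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized source tables, as an operational database would hold them.
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustName TEXT);
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER REFERENCES Customers(CustomerID),
        Amount     REAL
    );
    INSERT INTO Customers VALUES (1, 'Alice');
    INSERT INTO Orders VALUES (100, 1, 10.0), (101, 1, 5.0);

    -- Denormalized, read-optimized table: customer data copied in
    -- (Method 1) and the total pre-computed (Method 2).
    CREATE TABLE SalesFact AS
    SELECT o.CustomerID, c.CustName,
           SUM(o.Amount) AS TotalAmount
    FROM Orders AS o JOIN Customers AS c USING (CustomerID)
    GROUP BY o.CustomerID, c.CustName;
""")

# Reports now read one flat table, with no joins or aggregation at query time.
row = conn.execute("SELECT CustName, TotalAmount FROM SalesFact").fetchone()
print(row)  # ('Alice', 15.0)
```

In a real warehouse this table would be rebuilt or refreshed on a schedule, which is acceptable precisely because writes are rare.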
Summary
Thank you for your attention. I’m Sean, a software engineer. This article represents my notes, and I welcome any corrections or feedback. If you found this helpful, feel free to show support.
You can also follow my page on Facebook.