Schema Design
資料庫 Schema 設計指南,涵蓋 Schema 類型、正規化流程與 MongoDB 設計規則。
Schema Types
Relational Model
關聯式模型是最常見的資料庫模型,使用表格(Table)來組織資料,透過主鍵(Primary Key)和外鍵(Foreign Key)建立表格間的關係。
Star Schema
適用範例:
- Retail Sales Analysis(零售銷售分析)
- Financial Reporting(財務報表)
- Marketing Campaign Analysis(行銷活動分析)
- Inventory Management(庫存管理)
Star Schema 的特點:
- 中心為 Fact Table(事實表),包含度量值(如銷售額、數量)
- 周圍為 Dimension Tables(維度表),提供描述性資訊(如時間、產品、地區)
- 結構簡單,查詢效能佳
Snowflake Schema
適用範例:
- Customer Relationship Management (CRM)
- Healthcare Data Analysis(醫療數據分析)
- E-commerce Platform(電商平台)
- Supply Chain Management(供應鏈管理)
Snowflake Schema 的特點:
- 是 Star Schema 的正規化版本
- Dimension Tables 進一步拆分為多層結構
- 減少資料冗餘,但查詢較複雜
RDB Design Steps
Step 1: Write a Sentence
用一句話描述需要儲存的資料:
- Store the customer information and the orders that they made
- Store the car and the showroom that they are located in
- Store the students and the courses they are enrolled in at a specific date
Step 2: Pick Objects From Sentence
從句子中找出主要物件:
- customer information
- orders
- car
- showroom
- students
- courses
- date
Step 3: Determine Relationship
- Does a customer have many orders, or does an order have many customers?
- Does a car have many showrooms, or does a showroom have many cars?
- Does a student have many courses, or does a course have many students?
Step 4: Create a Diagram
Step 5: Joining Table
當遇到 Many-to-Many 關係時,需要建立 Joining Table(關聯表)。
例如:Student 和 Class 的關係
錯誤做法:
正確做法:
ER Diagram:
Normalization
正規化是將資料組織成表格的過程,以減少資料冗餘並確保資料完整性。
1NF (First Normal Form)
第一正規化:除去重複值
- Does the combination of all columns make a unique row every single time?
- What field can be used to uniquely identify the row?
2NF (Second Normal Form)
第二正規化:除去部分相依 / 去除只與部分主鍵相關欄位
- Fulfil the requirements of first normal form
- Each non-key attribute must be functionally dependent on the primary key
判斷欄位是否依賴主鍵:
| 欄位 | 依賴 PK? | 說明 |
|---|---|---|
| student name | Yes | 不同學生 ID 代表不同學生姓名 |
| fees paid | Yes | 每個費用值是針對單一學生 |
| date of birth | Yes | 出生日期特定於該學生 |
| address | Yes | 地址特定於該學生 |
| subject 1~4 | No | 一門學科可以有多位學生註冊 |
| teacher name | No | 老師名字不依賴此學生 |
| teacher address | No | 老師地址不依賴此學生 |
| course name | No | 課程名稱不取決於學生 |
不依賴主鍵的欄位應移到新表格:
Subject (subject ID, subject name)Teacher (teacher ID, teacher name, address)Course (course ID, course name)Student (student ID, student name, fees paid, date of birth, address)
Foreign Keys in Tables
問自己:
- Does a course have many students, or does a student have many courses?
- Does a teacher have many courses, or does a course have many teachers?
- Does a subject have many students, or does a student have many subjects?
2NF 結果:
3NF (Third Normal Form)
第三正規化:去除間接相依 / 去除除主鍵外有相依關係的欄位
Transitive Functional Dependency(遞移函數相依):
- Column A determines column B
- Column B determines column C
- Therefore, column A determines C
以地址為例:
| address |
|---|
| 3 Main Street, North Boston 56125 |
| 16 Leeds Road, South Boston 56128 |
ZIP code 決定 city、state 和 suburb。
Student 決定 address ZIP code,進而決定 suburb。
拆分結果:
Address Code (address code ID, ZIP code, suburb, city, state)- Student Table →
(student ID, course ID, student name, fees paid, date of birth, street address, address code ID) - Teacher Table →
(teacher ID, teacher name, street address, address code ID)
3NF ER Diagram:
4NF & BCNF
- 4NF:處理多值相依(Multi-valued Dependency)
- BCNF (Boyce-Codd Normal Form):複合主鍵需皆為外來鍵
MongoDB Schema Design
Design Rules
One-to-One
{
"_id": "ObjectId('AAA')",
"name": "Joe Karlsson",
"company": "MongoDB",
"twitter": "@JoeKarlsson1",
"website": "joekarlsson.com"
}One-to-Few (Embedding)
當很少單獨讀取嵌入的物件時,使用嵌入模式。關聯數量通常在 1000 以內。
{
"_id": "ObjectId('AAA')",
"name": "Joe Karlsson",
"addresses": [
{ "street": "123 Sesame St", "city": "Anytown", "cc": "USA" },
{ "street": "123 Avenue Q", "city": "New York", "cc": "USA" }
]
}One-to-Many (Child-Referencing)
例如:電商網站的產品頁面,一個產品由多個零件組成。
Products Collection:
{
"name": "left-handed smoke shifter",
"manufacturer": "Acme Corp",
"parts": ["ObjectID('AAAA')", "ObjectID('BBBB')", "ObjectID('CCCC')"]
}Parts Collection:
{
"_id": "ObjectID('AAAA')",
"partno": "123-aff-456",
"name": "#4 grommet",
"qty": "94",
"cost": "0.94",
"price": "3.99"
}One-to-Squillions (Parent-Referencing)
適合資料量會隨時間成長的場景,例如系統日誌、交易日誌。
Host Collection:
{
"_id": "ObjectID('AAAB')",
"name": "goofy.example.com",
"ipaddr": "127.66.66.66"
}Log Collection:
{
"time": "ISODate('2014-03-28T09:42:41.382Z')",
"message": "cpu is on fire!",
"host": "ObjectID('AAAB')"
}Many-to-Many
例如:To-Do App,一個使用者可以有多個任務,一個任務可以分配給多個使用者。
Users Collection:
{
"_id": "ObjectID('AAF1')",
"name": "Kate Monster",
"tasks": ["ObjectID('ADF9')", "ObjectID('AE02')", "ObjectID('AE73')"]
}Tasks Collection:
{
"_id": "ObjectID('ADF9')",
"description": "Write blog post about MongoDB schema design",
"due_date": "ISODate('2014-04-01')",
"owners": ["ObjectID('AAF1')", "ObjectID('BB3G')"]
}