Schema Design

資料庫 Schema 設計指南,涵蓋 Schema 類型、正規化流程與 MongoDB 設計規則。

Schema Types

Relational Model

關聯式模型是最常見的資料庫模型,使用表格(Table)來組織資料,透過主鍵(Primary Key)和外鍵(Foreign Key)建立表格間的關係。

Star Schema

Star Schema 適合用於簡單性、效能和查詢便利性為首要考量的場景。

適用範例:

  • Retail Sales Analysis(零售銷售分析)
  • Financial Reporting(財務報表)
  • Marketing Campaign Analysis(行銷活動分析)
  • Inventory Management(庫存管理)

Star Schema 的特點:

  • 中心為 Fact Table(事實表),包含度量值(如銷售額、數量)
  • 周圍為 Dimension Tables(維度表),提供描述性資訊(如時間、產品、地區)
  • 結構簡單,查詢效能佳

Snowflake Schema

適用範例:

  • Customer Relationship Management (CRM)
  • Healthcare Data Analysis(醫療數據分析)
  • E-commerce Platform(電商平台)
  • Supply Chain Management(供應鏈管理)

Snowflake Schema 的特點:

  • 是 Star Schema 的正規化版本
  • Dimension Tables 進一步拆分為多層結構
  • 減少資料冗餘,但查詢較複雜

RDB Design Steps

Step 1: Write a Sentence

用一句話描述需要儲存的資料:

  • Store the customer information and the orders that they made
  • Store the car and the showroom that they are located in
  • Store the students and the courses they are enrolled in at a specific date

Step 2: Pick Objects From Sentence

從句子中找出主要物件:

  1. customer information
  2. orders
  3. car
  4. showroom
  5. students
  6. courses
  7. date

Step 3: Determine Relationship

問自己: Does object-1 have many object-2s, or does object-2 have many object-1s?
  • Does a customer have many orders, or does an order have many customers?
  • Does a car have many showrooms, or does a showroom have many cars?
  • Does a student have many courses, or does a course have many students?

Step 4: Create a Diagram

Schema Diagram

Step 5: Joining Table

當遇到 Many-to-Many 關係時,需要建立 Joining Table(關聯表)。

例如:Student 和 Class 的關係

錯誤做法:

Student Class Wrong

正確做法:

Student Class Correct

ER Diagram:

Student Class ER

Normalization

正規化是將資料組織成表格的過程,以減少資料冗餘並確保資料完整性。

1NF (First Normal Form)

第一正規化:除去重複值

  1. Does the combination of all columns make a unique row every single time?
  2. What field can be used to uniquely identify the row?
1NF Example

2NF (Second Normal Form)

第二正規化:除去部分相依 / 去除只與部分主鍵相關欄位

  1. Fulfil the requirements of first normal form
  2. Each non-key attribute must be functionally dependent on the primary key

判斷欄位是否依賴主鍵:

欄位依賴 PK?說明
student nameYes不同學生 ID 代表不同學生姓名
fees paidYes每個費用值是針對單一學生
date of birthYes出生日期特定於該學生
addressYes地址特定於該學生
subject 1~4No一門學科可以有多位學生註冊
teacher nameNo老師名字不依賴此學生
teacher addressNo老師地址不依賴此學生
course nameNo課程名稱不取決於學生

不依賴主鍵的欄位應移到新表格:

  • Subject (subject ID, subject name)
  • Teacher (teacher ID, teacher name, address)
  • Course (course ID, course name)
  • Student (student ID, student name, fees paid, date of birth, address)

Foreign Keys in Tables

問自己:

  • Does a course have many students, or does a student have many courses?
  • Does a teacher have many courses, or does a course have many teachers?
  • Does a subject have many students, or does a student have many subjects?

2NF 結果:

2NF Result

3NF (Third Normal Form)

第三正規化:去除間接相依 / 去除除主鍵外有相依關係的欄位

Every attribute that is not the primary key must depend on the primary key and the primary key only.

Transitive Functional Dependency(遞移函數相依):

  • Column A determines column B
  • Column B determines column C
  • Therefore, column A determines C

以地址為例:

address
3 Main Street, North Boston 56125
16 Leeds Road, South Boston 56128

ZIP code 決定 citystatesuburbStudent 決定 address ZIP code,進而決定 suburb

拆分結果:

  • Address Code (address code ID, ZIP code, suburb, city, state)
  • Student Table → (student ID, course ID, student name, fees paid, date of birth, street address, address code ID)
  • Teacher Table → (teacher ID, teacher name, street address, address code ID)
對於大多數資料庫正規化,達到第三正規化就足夠了。

3NF ER Diagram:

3NF Result

4NF & BCNF

  • 4NF:處理多值相依(Multi-valued Dependency)
  • BCNF (Boyce-Codd Normal Form):複合主鍵需皆為外來鍵

MongoDB Schema Design

Design Rules

Rule 1: Favor embedding unless there is a compelling reason not to.
Rule 2: Avoid JOINs if they can be avoided.
Rule 3: Arrays should never grow without bound.
Rule 4: An object should not be embedded if it needs to be accessed individually.
Rule 5: How you model your data depends entirely on your application’s data access patterns.

One-to-One

{
    "_id": "ObjectId('AAA')",
    "name": "Joe Karlsson",
    "company": "MongoDB",
    "twitter": "@JoeKarlsson1",
    "website": "joekarlsson.com"
}

One-to-Few (Embedding)

當很少單獨讀取嵌入的物件時,使用嵌入模式。關聯數量通常在 1000 以內。

{
    "_id": "ObjectId('AAA')",
    "name": "Joe Karlsson",
    "addresses": [
        { "street": "123 Sesame St", "city": "Anytown", "cc": "USA" },
        { "street": "123 Avenue Q",  "city": "New York", "cc": "USA" }
    ]
}

One-to-Many (Child-Referencing)

例如:電商網站的產品頁面,一個產品由多個零件組成。

Products Collection:

{
    "name": "left-handed smoke shifter",
    "manufacturer": "Acme Corp",
    "parts": ["ObjectID('AAAA')", "ObjectID('BBBB')", "ObjectID('CCCC')"]
}

Parts Collection:

{
    "_id": "ObjectID('AAAA')",
    "partno": "123-aff-456",
    "name": "#4 grommet",
    "qty": "94",
    "cost": "0.94",
    "price": "3.99"
}

One-to-Squillions (Parent-Referencing)

適合資料量會隨時間成長的場景,例如系統日誌、交易日誌。

Host Collection:

{
    "_id": "ObjectID('AAAB')",
    "name": "goofy.example.com",
    "ipaddr": "127.66.66.66"
}

Log Collection:

{
    "time": "ISODate('2014-03-28T09:42:41.382Z')",
    "message": "cpu is on fire!",
    "host": "ObjectID('AAAB')"
}

Many-to-Many

例如:To-Do App,一個使用者可以有多個任務,一個任務可以分配給多個使用者。

Users Collection:

{
    "_id": "ObjectID('AAF1')",
    "name": "Kate Monster",
    "tasks": ["ObjectID('ADF9')", "ObjectID('AE02')", "ObjectID('AE73')"]
}

Tasks Collection:

{
    "_id": "ObjectID('ADF9')",
    "description": "Write blog post about MongoDB schema design",
    "due_date": "ISODate('2014-04-01')",
    "owners": ["ObjectID('AAF1')", "ObjectID('BB3G')"]
}