Skip to content

feat: 게시물 태그 정규화 (ARRAY → post_tags 테이블)#128

Closed
SnowSuno wants to merge 2 commits into
devfrom
claude/normalize-post-tags-GuN8d
Closed

feat: 게시물 태그 정규화 (ARRAY → post_tags 테이블)#128
SnowSuno wants to merge 2 commits into
devfrom
claude/normalize-post-tags-GuN8d

Conversation

@SnowSuno
Copy link
Copy Markdown
Member

  • Post 모델: ARRAY(String) 타입에서 Tag 엔티티 relationship으로 변경
  • Tag 모델: back_populates 활성화
  • PostRepository: 태그 검색 쿼리를 JOIN 기반으로 변경, find_or_create_tags 메서드 추가
  • PostService: 태그 생성/수정 시 Tag 엔티티 사용
  • 데이터 마이그레이션: 기존 ARRAY 데이터를 post_tags 테이블로 이행

- Post 모델: ARRAY(String) 타입에서 Tag 엔티티 relationship으로 변경
- Tag 모델: back_populates 활성화
- PostRepository: 태그 검색 쿼리를 JOIN 기반으로 변경, find_or_create_tags 메서드 추가
- PostService: 태그 생성/수정 시 Tag 엔티티 사용
- 데이터 마이그레이션: 기존 ARRAY 데이터를 post_tags 테이블로 이행
@SnowSuno
Copy link
Copy Markdown
Member Author

클로드로 끄적여봤는데 검토예정

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR normalizes post tags by migrating from an ARRAY(String) column to a proper many-to-many relationship using a Tag entity and post_tags join table. This enables better query performance, data integrity, and future extensibility.

  • Converts Post.tags from ARRAY(String) to a relationship with Tag entities via post_tags table
  • Updates all tag-related queries to use JOIN operations instead of array functions
  • Includes data migration to move existing ARRAY data to normalized tables

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
migrations/versions/2025-12-26_normalize_post_tags.py Data migration script to transfer ARRAY tags to tag/post_tags tables, with rollback support
app/tags/models.py Activates back_populates="tags" relationship to Post model
app/posts/models.py Changes tags field from ARRAY column to many-to-many relationship with Tag entity
app/posts/repository.py Updates tag search to use JOIN queries, adds find_or_create_tags and update_post_tags methods
app/posts/service.py Modifies create_post and update_post to use Tag entities instead of string arrays
app/posts/schemas.py Updates serialization to extract tag names from Tag entities

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread app/posts/models.py
tags: Mapped[list[str]] = mapped_column(
ARRAY(String(100)),
nullable=False,
tags: Mapped[List["Tag"]] = relationship(
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent type hint style: The Post model uses List["Tag"] (capital L from typing module) for the tags relationship, while other collections in the same file use list["PostImage"] (lowercase). For consistency and to follow modern Python standards (Python 3.9+), consider using lowercase list["Tag"] instead.

Suggested change
tags: Mapped[List["Tag"]] = relationship(
tags: Mapped[list["Tag"]] = relationship(

Copilot uses AI. Check for mistakes.
Comment thread app/posts/repository.py
Comment on lines +160 to +175
tags: list[Tag] = []
for name in tag_names:
# 기존 태그 찾기
existing_tag = await self.session.scalar(
select(Tag).where(Tag.name == name)
)
if existing_tag:
tags.append(existing_tag)
else:
# 새 태그 생성
new_tag = Tag(name=name, section=None)
self.session.add(new_tag)
await self.session.flush()
tags.append(new_tag)
return tags

Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The find_or_create_tags method performs N+1 queries where each tag name triggers a separate database query. For better performance, consider fetching all existing tags in a single query first, then only create the missing ones. This would reduce the number of database round trips from O(n) to O(1) + O(m) where n is the number of tag names and m is the number of new tags to create.

Suggested change
tags: list[Tag] = []
for name in tag_names:
# 기존 태그 찾기
existing_tag = await self.session.scalar(
select(Tag).where(Tag.name == name)
)
if existing_tag:
tags.append(existing_tag)
else:
# 새 태그 생성
new_tag = Tag(name=name, section=None)
self.session.add(new_tag)
await self.session.flush()
tags.append(new_tag)
return tags
# 중복 제거용 집합 (실제 반환 시에는 원래 순서 및 중복을 유지)
unique_names = set(tag_names)
# 한 번의 쿼리로 기존 태그 모두 조회
existing_tags_result = await self.session.scalars(
select(Tag).where(Tag.name.in_(unique_names))
)
existing_tags = list(existing_tags_result)
# 이름 -> Tag 매핑
tags_by_name: dict[str, Tag] = {tag.name: tag for tag in existing_tags}
# 아직 존재하지 않는 태그 이름들
missing_names = [name for name in unique_names if name not in tags_by_name]
new_tags: list[Tag] = []
for name in missing_names:
new_tag = Tag(name=name, section=None)
new_tags.append(new_tag)
if new_tags:
self.session.add_all(new_tags)
await self.session.flush()
for tag in new_tags:
tags_by_name[tag.name] = tag
# 입력된 tag_names 순서를 유지하고, 중복도 그대로 유지
return [tags_by_name[name] for name in tag_names]

Copilot uses AI. Check for mistakes.
Comment on lines +36 to +49
existing_tag = conn.execute(
sa.text("SELECT id FROM tag WHERE name = :name"),
{"name": tag_name},
).fetchone()

if existing_tag:
tag_id = existing_tag[0]
else:
# 새 태그 생성
result = conn.execute(
sa.text("INSERT INTO tag (name, section) VALUES (:name, NULL) RETURNING id"),
{"name": tag_name},
)
tag_id = result.fetchone()[0]
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The migration script performs multiple database queries in a loop (check if tag exists, then insert if not). This could lead to race conditions if multiple posts have the same tag name, potentially resulting in duplicate tag entries if two posts process the same tag name concurrently. Consider using a unique constraint on tag.name or using INSERT ON CONFLICT DO NOTHING with RETURNING to handle this atomically.

Copilot uses AI. Check for mistakes.
Comment thread app/posts/repository.py Outdated
Comment on lines +163 to +173
existing_tag = await self.session.scalar(
select(Tag).where(Tag.name == name)
)
if existing_tag:
tags.append(existing_tag)
else:
# 새 태그 생성
new_tag = Tag(name=name, section=None)
self.session.add(new_tag)
await self.session.flush()
tags.append(new_tag)
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The find_or_create_tags method has a similar race condition issue as the migration. If multiple concurrent requests try to create the same tag simultaneously, duplicate tags could be created. Consider using INSERT ON CONFLICT (name) DO NOTHING or implementing proper locking mechanism. This would require a unique constraint on tag.name to work properly.

Copilot uses AI. Check for mistakes.
- Tag 모델: id/section 제거, name을 PK로 설정
- post_tags 테이블: tag_id → tag_name으로 변경
- Repository: 태그 검색 시 JOIN 불필요 (post_tags에서 직접 검색)
- 마이그레이션: 기존 구조를 새 구조로 변환

이점:
- 게시물 조회 시 Tag 테이블 JOIN 불필요
- 태그 검색, 인기 태그 등 모든 쿼리 단순화
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread app/posts/repository.py
Comment on lines +158 to +169
tags: list[Tag] = []
for name in tag_names:
# 기존 태그 찾기 (name이 PK이므로 get으로 조회 가능)
existing_tag = await self.session.get(Tag, name)
if existing_tag:
tags.append(existing_tag)
else:
# 새 태그 생성
new_tag = Tag(name=name)
self.session.add(new_tag)
await self.session.flush()
tags.append(new_tag)
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The find_or_create_tags method calls session.get() and potentially session.flush() in a loop for each tag. This results in N database round-trips where N is the number of tags. For posts with many tags, this could be a performance bottleneck. Consider using a single query to fetch all existing tags at once, then creating only the missing ones in bulk.

Suggested change
tags: list[Tag] = []
for name in tag_names:
# 기존 태그 찾기 (name이 PK이므로 get으로 조회 가능)
existing_tag = await self.session.get(Tag, name)
if existing_tag:
tags.append(existing_tag)
else:
# 새 태그 생성
new_tag = Tag(name=name)
self.session.add(new_tag)
await self.session.flush()
tags.append(new_tag)
# 중복 이름을 제거하여 기존 태그를 한 번에 조회
unique_names = set(tag_names)
existing_tags_result = await self.session.scalars(
select(Tag).where(Tag.name.in_(unique_names))
)
existing_tags = existing_tags_result.all()
existing_by_name: dict[str, Tag] = {tag.name: tag for tag in existing_tags}
tags: list[Tag] = []
new_tags: list[Tag] = []
# 입력된 tag_names 순서를 그대로 유지
for name in tag_names:
tag = existing_by_name.get(name)
if tag is not None:
tags.append(tag)
else:
new_tag = Tag(name=name)
new_tags.append(new_tag)
tags.append(new_tag)
existing_by_name[name] = new_tag
if new_tags:
self.session.add_all(new_tags)
await self.session.flush()

Copilot uses AI. Check for mistakes.
Comment thread app/posts/repository.py
return updated_post

async def find_or_create_tags(self, *, tag_names: list[str]) -> list[Tag]:
"""태그 이름 리스트를 받아서 Tag 엔티티 리스트를 반환. 없으면 생성."""
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The find_or_create_tags method lacks a docstring explanation of its behavior, particularly regarding the transaction semantics and potential race conditions. Consider adding documentation about when flush() is called and how callers should handle potential integrity errors in concurrent scenarios.

Suggested change
"""태그 이름 리스트를 받아서 Tag 엔티티 리스트를 반환. 없으면 생성."""
"""
주어진 태그 이름 리스트에 대해 Tag 엔티티 리스트를 반환한다.
- 이미 존재하는 태그는 `name`(PK 또는 UNIQUE 컬럼) 기준으로 조회해서 그대로 반환한다.
- 존재하지 않는 태그는 새로 생성해서 세션에 추가한 , ** 태그마다 즉시 `flush()` 호출**한다.
트랜잭션/동시성 관련:
- 메서드는 세션 수준에서만 `flush()` 수행하며, 트랜잭션의 `commit()`/`rollback()` 호출하지 않는다.
따라서 실제 커밋 시점과 트랜잭션 경계는 호출자가 관리해야 한다.
- 태그 이름이 유니크(또는 PK) 제약을 가지는 경우, 여러 트랜잭션이 동시에 같은 이름의 태그를
생성하려고 하면 `flush()` 시점에 무결성 제약(: IntegrityError) 발생할 있다.
호출자는 다음과 같이 사용하는 것을 권장한다.
- 상위 서비스/핸들러에서 트랜잭션을 시작한 메서드를 호출한다.
- 동시 태그 생성 가능성이 있는 경우, IntegrityError 등을 캐치하여 재시도하거나,
실패 해당 태그를 다시 조회하는 패턴을 적용한다.
"""

Copilot uses AI. Check for mistakes.
Comment thread app/posts/models.py
Comment on lines +56 to 61
tags: Mapped[List["Tag"]] = relationship(
"Tag",
default_factory=list,
secondary=post_tag_table,
back_populates="posts",
)
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Post model's tags relationship doesn't have init=False, which means tags must be provided during Post initialization. However, in the create_post method, the Post is initialized without tags (line 138-143), and tags are assigned afterward (line 144). This inconsistency could cause issues. Consider adding init=False to the tags relationship definition in the Post model to match the pattern used for other relationships.

Copilot uses AI. Check for mistakes.
Comment thread app/posts/repository.py
Comment on lines +153 to +170
async def find_or_create_tags(self, *, tag_names: list[str]) -> list[Tag]:
"""태그 이름 리스트를 받아서 Tag 엔티티 리스트를 반환. 없으면 생성."""
if not tag_names:
return []

tags: list[Tag] = []
for name in tag_names:
# 기존 태그 찾기 (name이 PK이므로 get으로 조회 가능)
existing_tag = await self.session.get(Tag, name)
if existing_tag:
tags.append(existing_tag)
else:
# 새 태그 생성
new_tag = Tag(name=name)
self.session.add(new_tag)
await self.session.flush()
tags.append(new_tag)
return tags
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The find_or_create_tags method doesn't validate tag names before creating Tag entities. While the schema has validation (max_length=30, pattern matching), tags could potentially be created with invalid names if this method is called from other contexts. Consider adding validation or at least trimming whitespace to ensure data consistency at the repository level.

Copilot uses AI. Check for mistakes.
# 4. tag 테이블 구조 복원
op.drop_constraint("tag_pkey", "tag", type_="primary")
op.add_column("tag", sa.Column("id", sa.Integer, autoincrement=True))
op.add_column("tag", sa.Column("section", sa.String(20), nullable=True))
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the downgrade migration, after restoring the tag.id column as autoincrement, the existing tag names won't automatically get sequential IDs assigned. The tag table will have rows with NULL id values, which will violate the primary key constraint when you try to create it at line 119. Consider using a sequence or explicit ID assignment before creating the primary key constraint.

Suggested change
op.add_column("tag", sa.Column("section", sa.String(20), nullable=True))
op.add_column("tag", sa.Column("section", sa.String(20), nullable=True))
# 기존 태그 레코드들에 대해 id 값 할당 (name 기준 순번)
conn.execute(
sa.text(
"""
WITH numbered AS (
SELECT name, ROW_NUMBER() OVER (ORDER BY name) AS new_id
FROM tag
)
UPDATE tag AS t
SET id = numbered.new_id
FROM numbered
WHERE t.name = numbered.name
"""
)
)

Copilot uses AI. Check for mistakes.
op.drop_constraint("tag_pkey", "tag", type_="primary")
op.drop_column("tag", "id")
op.drop_column("tag", "section")
op.alter_column("tag", "name", type_=sa.String(30), nullable=False)
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Tag model defines name with String(30) max length, but the migration creates the tag table column with String(30) in the upgrade, and when downgrading it restores the original column without specifying the length constraint. The original Tag model had String(20), so this creates an inconsistency. The downgrade should restore String(20) to match the original schema, or the comment should document that this is an intentional breaking change.

Copilot uses AI. Check for mistakes.
# 1. post.tags ARRAY 컬럼 복원
op.add_column(
"post",
sa.Column("tags", sa.ARRAY(sa.String(length=100)), nullable=False, server_default="{}"),
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downgrade function sets nullable=False and server_default='{}' for the tags column, but this could fail if there are posts without any tags in the post_tags table. When COALESCE returns '{}', the server_default may not be necessary, but the nullable=False constraint should be reconsidered - consider making it nullable=True to handle edge cases where posts might not have any tags.

Suggested change
sa.Column("tags", sa.ARRAY(sa.String(length=100)), nullable=False, server_default="{}"),
sa.Column("tags", sa.ARRAY(sa.String(length=100)), nullable=True),

Copilot uses AI. Check for mistakes.
Comment thread app/posts/repository.py
Comment on lines +153 to +170
async def find_or_create_tags(self, *, tag_names: list[str]) -> list[Tag]:
"""태그 이름 리스트를 받아서 Tag 엔티티 리스트를 반환. 없으면 생성."""
if not tag_names:
return []

tags: list[Tag] = []
for name in tag_names:
# 기존 태그 찾기 (name이 PK이므로 get으로 조회 가능)
existing_tag = await self.session.get(Tag, name)
if existing_tag:
tags.append(existing_tag)
else:
# 새 태그 생성
new_tag = Tag(name=name)
self.session.add(new_tag)
await self.session.flush()
tags.append(new_tag)
return tags
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the find_or_create_tags method, there's a potential race condition when multiple concurrent requests try to create the same tag. Between checking if a tag exists (line 161) and creating it (line 166), another request might create the same tag, causing a primary key violation. Consider using a try-except block with IntegrityError or use an INSERT ... ON CONFLICT approach to handle concurrent tag creation safely.

Copilot uses AI. Check for mistakes.
Comment thread app/posts/service.py
Comment on lines +208 to 217
# 태그 엔티티 찾거나 생성
tags = await self.post_repository.find_or_create_tags(tag_names=post.tags)
await self.post_repository.update_post_tags(post=saved_post, tags=tags)

updated_post = await self.post_repository.update_post(
post_id=post_id,
title=post.title,
description=post.description,
category=post.category,
tags=post.tags,
)
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The update_post_tags method is called before the update_post method, but update_post returns a new instance with joinedload options. This means the updated_post returned at line 212 may not reflect the tag changes made at line 210, potentially causing stale data to be returned to the client. Consider moving the tag update after the post update, or ensure the tags relationship is properly refreshed.

Copilot uses AI. Check for mistakes.
op.create_index("ix_tag_id", "tag", ["id"])

# 5. 기존 태그에 ID 할당 및 post_tags 복원
conn.execute(sa.text("DELETE FROM post_tags"))
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downgrade migration deletes all data from post_tags at line 123, but at this point the array data has already been restored to the post.tags column (lines 98-106). This means the relationship data in post_tags is permanently lost during downgrade. If the migration is rolled back and then upgraded again, the post_tags relationships won't be recreated. Consider preserving this data or documenting that downgrade is destructive.

Copilot uses AI. Check for mistakes.
@kjy2844
Copy link
Copy Markdown
Contributor

kjy2844 commented Dec 27, 2025

거절합니다

@SnowSuno SnowSuno closed this Jan 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants