feat: 게시물 태그 정규화 (ARRAY → post_tags 테이블)#128
Conversation
SnowSuno
commented
Dec 26, 2025
- Post 모델: ARRAY(String) 타입에서 Tag 엔티티 relationship으로 변경
- Tag 모델: back_populates 활성화
- PostRepository: 태그 검색 쿼리를 JOIN 기반으로 변경, find_or_create_tags 메서드 추가
- PostService: 태그 생성/수정 시 Tag 엔티티 사용
- 데이터 마이그레이션: 기존 ARRAY 데이터를 post_tags 테이블로 이행
- Post 모델: ARRAY(String) 타입에서 Tag 엔티티 relationship으로 변경 - Tag 모델: back_populates 활성화 - PostRepository: 태그 검색 쿼리를 JOIN 기반으로 변경, find_or_create_tags 메서드 추가 - PostService: 태그 생성/수정 시 Tag 엔티티 사용 - 데이터 마이그레이션: 기존 ARRAY 데이터를 post_tags 테이블로 이행
|
클로드로 끄적여봤는데 검토예정 |
There was a problem hiding this comment.
Pull request overview
This PR normalizes post tags by migrating from an ARRAY(String) column to a proper many-to-many relationship using a Tag entity and post_tags join table. This enables better query performance, data integrity, and future extensibility.
- Converts Post.tags from
ARRAY(String)to a relationship with Tag entities via post_tags table - Updates all tag-related queries to use JOIN operations instead of array functions
- Includes data migration to move existing ARRAY data to normalized tables
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| migrations/versions/2025-12-26_normalize_post_tags.py | Data migration script to transfer ARRAY tags to tag/post_tags tables, with rollback support |
| app/tags/models.py | Activates back_populates="tags" relationship to Post model |
| app/posts/models.py | Changes tags field from ARRAY column to many-to-many relationship with Tag entity |
| app/posts/repository.py | Updates tag search to use JOIN queries, adds find_or_create_tags and update_post_tags methods |
| app/posts/service.py | Modifies create_post and update_post to use Tag entities instead of string arrays |
| app/posts/schemas.py | Updates serialization to extract tag names from Tag entities |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| tags: Mapped[list[str]] = mapped_column( | ||
| ARRAY(String(100)), | ||
| nullable=False, | ||
| tags: Mapped[List["Tag"]] = relationship( |
There was a problem hiding this comment.
Inconsistent type hint style: The Post model uses List["Tag"] (capital L from typing module) for the tags relationship, while other collections in the same file use list["PostImage"] (lowercase). For consistency and to follow modern Python standards (Python 3.9+), consider using lowercase list["Tag"] instead.
| tags: Mapped[List["Tag"]] = relationship( | |
| tags: Mapped[list["Tag"]] = relationship( |
| tags: list[Tag] = [] | ||
| for name in tag_names: | ||
| # 기존 태그 찾기 | ||
| existing_tag = await self.session.scalar( | ||
| select(Tag).where(Tag.name == name) | ||
| ) | ||
| if existing_tag: | ||
| tags.append(existing_tag) | ||
| else: | ||
| # 새 태그 생성 | ||
| new_tag = Tag(name=name, section=None) | ||
| self.session.add(new_tag) | ||
| await self.session.flush() | ||
| tags.append(new_tag) | ||
| return tags | ||
|
|
There was a problem hiding this comment.
The find_or_create_tags method performs N+1 queries where each tag name triggers a separate database query. For better performance, consider fetching all existing tags in a single query first, then only create the missing ones. This would reduce the number of database round trips from O(n) to O(1) + O(m) where n is the number of tag names and m is the number of new tags to create.
| tags: list[Tag] = [] | |
| for name in tag_names: | |
| # 기존 태그 찾기 | |
| existing_tag = await self.session.scalar( | |
| select(Tag).where(Tag.name == name) | |
| ) | |
| if existing_tag: | |
| tags.append(existing_tag) | |
| else: | |
| # 새 태그 생성 | |
| new_tag = Tag(name=name, section=None) | |
| self.session.add(new_tag) | |
| await self.session.flush() | |
| tags.append(new_tag) | |
| return tags | |
| # 중복 제거용 집합 (실제 반환 시에는 원래 순서 및 중복을 유지) | |
| unique_names = set(tag_names) | |
| # 한 번의 쿼리로 기존 태그 모두 조회 | |
| existing_tags_result = await self.session.scalars( | |
| select(Tag).where(Tag.name.in_(unique_names)) | |
| ) | |
| existing_tags = list(existing_tags_result) | |
| # 이름 -> Tag 매핑 | |
| tags_by_name: dict[str, Tag] = {tag.name: tag for tag in existing_tags} | |
| # 아직 존재하지 않는 태그 이름들 | |
| missing_names = [name for name in unique_names if name not in tags_by_name] | |
| new_tags: list[Tag] = [] | |
| for name in missing_names: | |
| new_tag = Tag(name=name, section=None) | |
| new_tags.append(new_tag) | |
| if new_tags: | |
| self.session.add_all(new_tags) | |
| await self.session.flush() | |
| for tag in new_tags: | |
| tags_by_name[tag.name] = tag | |
| # 입력된 tag_names 순서를 유지하고, 중복도 그대로 유지 | |
| return [tags_by_name[name] for name in tag_names] |
| existing_tag = conn.execute( | ||
| sa.text("SELECT id FROM tag WHERE name = :name"), | ||
| {"name": tag_name}, | ||
| ).fetchone() | ||
|
|
||
| if existing_tag: | ||
| tag_id = existing_tag[0] | ||
| else: | ||
| # 새 태그 생성 | ||
| result = conn.execute( | ||
| sa.text("INSERT INTO tag (name, section) VALUES (:name, NULL) RETURNING id"), | ||
| {"name": tag_name}, | ||
| ) | ||
| tag_id = result.fetchone()[0] |
There was a problem hiding this comment.
The migration script performs multiple database queries in a loop (check if tag exists, then insert if not). This could lead to race conditions if multiple posts have the same tag name, potentially resulting in duplicate tag entries if two posts process the same tag name concurrently. Consider using a unique constraint on tag.name or using INSERT ON CONFLICT DO NOTHING with RETURNING to handle this atomically.
| existing_tag = await self.session.scalar( | ||
| select(Tag).where(Tag.name == name) | ||
| ) | ||
| if existing_tag: | ||
| tags.append(existing_tag) | ||
| else: | ||
| # 새 태그 생성 | ||
| new_tag = Tag(name=name, section=None) | ||
| self.session.add(new_tag) | ||
| await self.session.flush() | ||
| tags.append(new_tag) |
There was a problem hiding this comment.
The find_or_create_tags method has a similar race condition issue as the migration. If multiple concurrent requests try to create the same tag simultaneously, duplicate tags could be created. Consider using INSERT ON CONFLICT (name) DO NOTHING or implementing proper locking mechanism. This would require a unique constraint on tag.name to work properly.
- Tag 모델: id/section 제거, name을 PK로 설정 - post_tags 테이블: tag_id → tag_name으로 변경 - Repository: 태그 검색 시 JOIN 불필요 (post_tags에서 직접 검색) - 마이그레이션: 기존 구조를 새 구조로 변환 이점: - 게시물 조회 시 Tag 테이블 JOIN 불필요 - 태그 검색, 인기 태그 등 모든 쿼리 단순화
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| tags: list[Tag] = [] | ||
| for name in tag_names: | ||
| # 기존 태그 찾기 (name이 PK이므로 get으로 조회 가능) | ||
| existing_tag = await self.session.get(Tag, name) | ||
| if existing_tag: | ||
| tags.append(existing_tag) | ||
| else: | ||
| # 새 태그 생성 | ||
| new_tag = Tag(name=name) | ||
| self.session.add(new_tag) | ||
| await self.session.flush() | ||
| tags.append(new_tag) |
There was a problem hiding this comment.
The find_or_create_tags method calls session.get() and potentially session.flush() in a loop for each tag. This results in N database round-trips where N is the number of tags. For posts with many tags, this could be a performance bottleneck. Consider using a single query to fetch all existing tags at once, then creating only the missing ones in bulk.
| tags: list[Tag] = [] | |
| for name in tag_names: | |
| # 기존 태그 찾기 (name이 PK이므로 get으로 조회 가능) | |
| existing_tag = await self.session.get(Tag, name) | |
| if existing_tag: | |
| tags.append(existing_tag) | |
| else: | |
| # 새 태그 생성 | |
| new_tag = Tag(name=name) | |
| self.session.add(new_tag) | |
| await self.session.flush() | |
| tags.append(new_tag) | |
| # 중복 이름을 제거하여 기존 태그를 한 번에 조회 | |
| unique_names = set(tag_names) | |
| existing_tags_result = await self.session.scalars( | |
| select(Tag).where(Tag.name.in_(unique_names)) | |
| ) | |
| existing_tags = existing_tags_result.all() | |
| existing_by_name: dict[str, Tag] = {tag.name: tag for tag in existing_tags} | |
| tags: list[Tag] = [] | |
| new_tags: list[Tag] = [] | |
| # 입력된 tag_names 순서를 그대로 유지 | |
| for name in tag_names: | |
| tag = existing_by_name.get(name) | |
| if tag is not None: | |
| tags.append(tag) | |
| else: | |
| new_tag = Tag(name=name) | |
| new_tags.append(new_tag) | |
| tags.append(new_tag) | |
| existing_by_name[name] = new_tag | |
| if new_tags: | |
| self.session.add_all(new_tags) | |
| await self.session.flush() |
| return updated_post | ||
|
|
||
| async def find_or_create_tags(self, *, tag_names: list[str]) -> list[Tag]: | ||
| """태그 이름 리스트를 받아서 Tag 엔티티 리스트를 반환. 없으면 생성.""" |
There was a problem hiding this comment.
The find_or_create_tags method lacks a docstring explanation of its behavior, particularly regarding the transaction semantics and potential race conditions. Consider adding documentation about when flush() is called and how callers should handle potential integrity errors in concurrent scenarios.
| """태그 이름 리스트를 받아서 Tag 엔티티 리스트를 반환. 없으면 생성.""" | |
| """ | |
| 주어진 태그 이름 리스트에 대해 Tag 엔티티 리스트를 반환한다. | |
| - 이미 존재하는 태그는 `name`(PK 또는 UNIQUE 컬럼) 기준으로 조회해서 그대로 반환한다. | |
| - 존재하지 않는 태그는 새로 생성해서 세션에 추가한 뒤, **각 태그마다 즉시 `flush()`를 호출**한다. | |
| 트랜잭션/동시성 관련: | |
| - 이 메서드는 세션 수준에서만 `flush()`를 수행하며, 트랜잭션의 `commit()`/`rollback()`은 호출하지 않는다. | |
| 따라서 실제 커밋 시점과 트랜잭션 경계는 호출자가 관리해야 한다. | |
| - 태그 이름이 유니크(또는 PK) 제약을 가지는 경우, 여러 트랜잭션이 동시에 같은 이름의 태그를 | |
| 생성하려고 하면 `flush()` 시점에 무결성 제약(예: IntegrityError)이 발생할 수 있다. | |
| 호출자는 다음과 같이 사용하는 것을 권장한다. | |
| - 상위 서비스/핸들러에서 트랜잭션을 시작한 뒤 이 메서드를 호출한다. | |
| - 동시 태그 생성 가능성이 있는 경우, IntegrityError 등을 캐치하여 재시도하거나, | |
| 실패 시 해당 태그를 다시 조회하는 패턴을 적용한다. | |
| """ |
| tags: Mapped[List["Tag"]] = relationship( | ||
| "Tag", | ||
| default_factory=list, | ||
| secondary=post_tag_table, | ||
| back_populates="posts", | ||
| ) |
There was a problem hiding this comment.
The Post model's tags relationship doesn't have init=False, which means tags must be provided during Post initialization. However, in the create_post method, the Post is initialized without tags (line 138-143), and tags are assigned afterward (line 144). This inconsistency could cause issues. Consider adding init=False to the tags relationship definition in the Post model to match the pattern used for other relationships.
| async def find_or_create_tags(self, *, tag_names: list[str]) -> list[Tag]: | ||
| """태그 이름 리스트를 받아서 Tag 엔티티 리스트를 반환. 없으면 생성.""" | ||
| if not tag_names: | ||
| return [] | ||
|
|
||
| tags: list[Tag] = [] | ||
| for name in tag_names: | ||
| # 기존 태그 찾기 (name이 PK이므로 get으로 조회 가능) | ||
| existing_tag = await self.session.get(Tag, name) | ||
| if existing_tag: | ||
| tags.append(existing_tag) | ||
| else: | ||
| # 새 태그 생성 | ||
| new_tag = Tag(name=name) | ||
| self.session.add(new_tag) | ||
| await self.session.flush() | ||
| tags.append(new_tag) | ||
| return tags |
There was a problem hiding this comment.
The find_or_create_tags method doesn't validate tag names before creating Tag entities. While the schema has validation (max_length=30, pattern matching), tags could potentially be created with invalid names if this method is called from other contexts. Consider adding validation or at least trimming whitespace to ensure data consistency at the repository level.
| # 4. tag 테이블 구조 복원 | ||
| op.drop_constraint("tag_pkey", "tag", type_="primary") | ||
| op.add_column("tag", sa.Column("id", sa.Integer, autoincrement=True)) | ||
| op.add_column("tag", sa.Column("section", sa.String(20), nullable=True)) |
There was a problem hiding this comment.
In the downgrade migration, after restoring the tag.id column as autoincrement, the existing tag names won't automatically get sequential IDs assigned. The tag table will have rows with NULL id values, which will violate the primary key constraint when you try to create it at line 119. Consider using a sequence or explicit ID assignment before creating the primary key constraint.
| op.add_column("tag", sa.Column("section", sa.String(20), nullable=True)) | |
| op.add_column("tag", sa.Column("section", sa.String(20), nullable=True)) | |
| # 기존 태그 레코드들에 대해 id 값 할당 (name 기준 순번) | |
| conn.execute( | |
| sa.text( | |
| """ | |
| WITH numbered AS ( | |
| SELECT name, ROW_NUMBER() OVER (ORDER BY name) AS new_id | |
| FROM tag | |
| ) | |
| UPDATE tag AS t | |
| SET id = numbered.new_id | |
| FROM numbered | |
| WHERE t.name = numbered.name | |
| """ | |
| ) | |
| ) |
| op.drop_constraint("tag_pkey", "tag", type_="primary") | ||
| op.drop_column("tag", "id") | ||
| op.drop_column("tag", "section") | ||
| op.alter_column("tag", "name", type_=sa.String(30), nullable=False) |
There was a problem hiding this comment.
The Tag model defines name with String(30) max length, but the migration creates the tag table column with String(30) in the upgrade, and when downgrading it restores the original column without specifying the length constraint. The original Tag model had String(20), so this creates an inconsistency. The downgrade should restore String(20) to match the original schema, or the comment should document that this is an intentional breaking change.
| # 1. post.tags ARRAY 컬럼 복원 | ||
| op.add_column( | ||
| "post", | ||
| sa.Column("tags", sa.ARRAY(sa.String(length=100)), nullable=False, server_default="{}"), |
There was a problem hiding this comment.
The downgrade function sets nullable=False and server_default='{}' for the tags column, but this could fail if there are posts without any tags in the post_tags table. When COALESCE returns '{}', the server_default may not be necessary, but the nullable=False constraint should be reconsidered - consider making it nullable=True to handle edge cases where posts might not have any tags.
| sa.Column("tags", sa.ARRAY(sa.String(length=100)), nullable=False, server_default="{}"), | |
| sa.Column("tags", sa.ARRAY(sa.String(length=100)), nullable=True), |
| async def find_or_create_tags(self, *, tag_names: list[str]) -> list[Tag]: | ||
| """태그 이름 리스트를 받아서 Tag 엔티티 리스트를 반환. 없으면 생성.""" | ||
| if not tag_names: | ||
| return [] | ||
|
|
||
| tags: list[Tag] = [] | ||
| for name in tag_names: | ||
| # 기존 태그 찾기 (name이 PK이므로 get으로 조회 가능) | ||
| existing_tag = await self.session.get(Tag, name) | ||
| if existing_tag: | ||
| tags.append(existing_tag) | ||
| else: | ||
| # 새 태그 생성 | ||
| new_tag = Tag(name=name) | ||
| self.session.add(new_tag) | ||
| await self.session.flush() | ||
| tags.append(new_tag) | ||
| return tags |
There was a problem hiding this comment.
In the find_or_create_tags method, there's a potential race condition when multiple concurrent requests try to create the same tag. Between checking if a tag exists (line 161) and creating it (line 166), another request might create the same tag, causing a primary key violation. Consider using a try-except block with IntegrityError or use an INSERT ... ON CONFLICT approach to handle concurrent tag creation safely.
| # 태그 엔티티 찾거나 생성 | ||
| tags = await self.post_repository.find_or_create_tags(tag_names=post.tags) | ||
| await self.post_repository.update_post_tags(post=saved_post, tags=tags) | ||
|
|
||
| updated_post = await self.post_repository.update_post( | ||
| post_id=post_id, | ||
| title=post.title, | ||
| description=post.description, | ||
| category=post.category, | ||
| tags=post.tags, | ||
| ) |
There was a problem hiding this comment.
The update_post_tags method is called before the update_post method, but update_post returns a new instance with joinedload options. This means the updated_post returned at line 212 may not reflect the tag changes made at line 210, potentially causing stale data to be returned to the client. Consider moving the tag update after the post update, or ensure the tags relationship is properly refreshed.
| op.create_index("ix_tag_id", "tag", ["id"]) | ||
|
|
||
| # 5. 기존 태그에 ID 할당 및 post_tags 복원 | ||
| conn.execute(sa.text("DELETE FROM post_tags")) |
There was a problem hiding this comment.
The downgrade migration deletes all data from post_tags at line 123, but at this point the array data has already been restored to the post.tags column (lines 98-106). This means the relationship data in post_tags is permanently lost during downgrade. If the migration is rolled back and then upgraded again, the post_tags relationships won't be recreated. Consider preserving this data or documenting that downgrade is destructive.
|
거절합니다 |