feat: 게시물 태그 정규화 (ARRAY → post_tags 테이블) by SnowSuno · Pull Request #128 · ShowbilityProject/showbility-server

SnowSuno · 2025-12-26T01:43:55Z

Post 모델: ARRAY(String) 타입에서 Tag 엔티티 relationship으로 변경
Tag 모델: back_populates 활성화
PostRepository: 태그 검색 쿼리를 JOIN 기반으로 변경, find_or_create_tags 메서드 추가
PostService: 태그 생성/수정 시 Tag 엔티티 사용
데이터 마이그레이션: 기존 ARRAY 데이터를 post_tags 테이블로 이행

- Post 모델: ARRAY(String) 타입에서 Tag 엔티티 relationship으로 변경 - Tag 모델: back_populates 활성화 - PostRepository: 태그 검색 쿼리를 JOIN 기반으로 변경, find_or_create_tags 메서드 추가 - PostService: 태그 생성/수정 시 Tag 엔티티 사용 - 데이터 마이그레이션: 기존 ARRAY 데이터를 post_tags 테이블로 이행

SnowSuno · 2025-12-26T01:44:58Z

클로드로 끄적여봤는데 검토예정

Copilot

Pull request overview

This PR normalizes post tags by migrating from an ARRAY(String) column to a proper many-to-many relationship using a Tag entity and post_tags join table. This enables better query performance, data integrity, and future extensibility.

Converts Post.tags from ARRAY(String) to a relationship with Tag entities via post_tags table
Updates all tag-related queries to use JOIN operations instead of array functions
Includes data migration to move existing ARRAY data to normalized tables

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
migrations/versions/2025-12-26_normalize_post_tags.py	Data migration script to transfer ARRAY tags to tag/post_tags tables, with rollback support
app/tags/models.py	Activates back_populates="tags" relationship to Post model
app/posts/models.py	Changes tags field from ARRAY column to many-to-many relationship with Tag entity
app/posts/repository.py	Updates tag search to use JOIN queries, adds find_or_create_tags and update_post_tags methods
app/posts/service.py	Modifies create_post and update_post to use Tag entities instead of string arrays
app/posts/schemas.py	Updates serialization to extract tag names from Tag entities

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-26T02:08:51Z

-    tags: Mapped[list[str]] = mapped_column(
-        ARRAY(String(100)),
-        nullable=False,
+    tags: Mapped[List["Tag"]] = relationship(


Inconsistent type hint style: The Post model uses List["Tag"] (capital L from typing module) for the tags relationship, while other collections in the same file use list["PostImage"] (lowercase). For consistency and to follow modern Python standards (Python 3.9+), consider using lowercase list["Tag"] instead.

Suggested change

tags: Mapped[List["Tag"]] = relationship(

tags: Mapped[list["Tag"]] = relationship(

Copilot · 2025-12-26T02:08:51Z

+        tags: list[Tag] = []
+        for name in tag_names:
+            # 기존 태그 찾기
+            existing_tag = await self.session.scalar(
+                select(Tag).where(Tag.name == name)
+            )
+            if existing_tag:
+                tags.append(existing_tag)
+            else:
+                # 새 태그 생성
+                new_tag = Tag(name=name, section=None)
+                self.session.add(new_tag)
+                await self.session.flush()
+                tags.append(new_tag)
+        return tags
+


The find_or_create_tags method performs N+1 queries where each tag name triggers a separate database query. For better performance, consider fetching all existing tags in a single query first, then only create the missing ones. This would reduce the number of database round trips from O(n) to O(1) + O(m) where n is the number of tag names and m is the number of new tags to create.

Suggested change

tags: list[Tag] = []

for name in tag_names:

# 기존 태그 찾기

existing_tag = await self.session.scalar(

select(Tag).where(Tag.name == name)

)

if existing_tag:

tags.append(existing_tag)

else:

# 새 태그 생성

new_tag = Tag(name=name, section=None)

self.session.add(new_tag)

await self.session.flush()

tags.append(new_tag)

return tags

# 중복 제거용 집합 (실제 반환 시에는 원래 순서 및 중복을 유지)

unique_names = set(tag_names)

# 한 번의 쿼리로 기존 태그 모두 조회

existing_tags_result = await self.session.scalars(

select(Tag).where(Tag.name.in_(unique_names))

)

existing_tags = list(existing_tags_result)

# 이름 -> Tag 매핑

tags_by_name: dict[str, Tag] = {tag.name: tag for tag in existing_tags}

# 아직 존재하지 않는 태그 이름들

missing_names = [name for name in unique_names if name not in tags_by_name]

new_tags: list[Tag] = []

for name in missing_names:

new_tag = Tag(name=name, section=None)

new_tags.append(new_tag)

if new_tags:

self.session.add_all(new_tags)

await self.session.flush()

for tag in new_tags:

tags_by_name[tag.name] = tag

# 입력된 tag_names 순서를 유지하고, 중복도 그대로 유지

return [tags_by_name[name] for name in tag_names]

Copilot · 2025-12-26T02:08:51Z

+            existing_tag = conn.execute(
+                sa.text("SELECT id FROM tag WHERE name = :name"),
+                {"name": tag_name},
+            ).fetchone()
+
+            if existing_tag:
+                tag_id = existing_tag[0]
+            else:
+                # 새 태그 생성
+                result = conn.execute(
+                    sa.text("INSERT INTO tag (name, section) VALUES (:name, NULL) RETURNING id"),
+                    {"name": tag_name},
+                )
+                tag_id = result.fetchone()[0]


The migration script performs multiple database queries in a loop (check if tag exists, then insert if not). This could lead to race conditions if multiple posts have the same tag name, potentially resulting in duplicate tag entries if two posts process the same tag name concurrently. Consider using a unique constraint on tag.name or using INSERT ON CONFLICT DO NOTHING with RETURNING to handle this atomically.

Copilot · 2025-12-26T02:08:52Z

+            existing_tag = await self.session.scalar(
+                select(Tag).where(Tag.name == name)
+            )
+            if existing_tag:
+                tags.append(existing_tag)
+            else:
+                # 새 태그 생성
+                new_tag = Tag(name=name, section=None)
+                self.session.add(new_tag)
+                await self.session.flush()
+                tags.append(new_tag)


The find_or_create_tags method has a similar race condition issue as the migration. If multiple concurrent requests try to create the same tag simultaneously, duplicate tags could be created. Consider using INSERT ON CONFLICT (name) DO NOTHING or implementing proper locking mechanism. This would require a unique constraint on tag.name to work properly.

- Tag 모델: id/section 제거, name을 PK로 설정 - post_tags 테이블: tag_id → tag_name으로 변경 - Repository: 태그 검색 시 JOIN 불필요 (post_tags에서 직접 검색) - 마이그레이션: 기존 구조를 새 구조로 변환 이점: - 게시물 조회 시 Tag 테이블 JOIN 불필요 - 태그 검색, 인기 태그 등 모든 쿼리 단순화

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-26T03:02:59Z

+        tags: list[Tag] = []
+        for name in tag_names:
+            # 기존 태그 찾기 (name이 PK이므로 get으로 조회 가능)
+            existing_tag = await self.session.get(Tag, name)
+            if existing_tag:
+                tags.append(existing_tag)
+            else:
+                # 새 태그 생성
+                new_tag = Tag(name=name)
+                self.session.add(new_tag)
+                await self.session.flush()
+                tags.append(new_tag)


The find_or_create_tags method calls session.get() and potentially session.flush() in a loop for each tag. This results in N database round-trips where N is the number of tags. For posts with many tags, this could be a performance bottleneck. Consider using a single query to fetch all existing tags at once, then creating only the missing ones in bulk.

Suggested change

tags: list[Tag] = []

for name in tag_names:

# 기존 태그 찾기 (name이 PK이므로 get으로 조회 가능)

existing_tag = await self.session.get(Tag, name)

if existing_tag:

tags.append(existing_tag)

else:

# 새 태그 생성

new_tag = Tag(name=name)

self.session.add(new_tag)

await self.session.flush()

tags.append(new_tag)

# 중복 이름을 제거하여 기존 태그를 한 번에 조회

unique_names = set(tag_names)

existing_tags_result = await self.session.scalars(

select(Tag).where(Tag.name.in_(unique_names))

)

existing_tags = existing_tags_result.all()

existing_by_name: dict[str, Tag] = {tag.name: tag for tag in existing_tags}

tags: list[Tag] = []

new_tags: list[Tag] = []

# 입력된 tag_names 순서를 그대로 유지

for name in tag_names:

tag = existing_by_name.get(name)

if tag is not None:

tags.append(tag)

else:

new_tag = Tag(name=name)

new_tags.append(new_tag)

tags.append(new_tag)

existing_by_name[name] = new_tag

if new_tags:

self.session.add_all(new_tags)

await self.session.flush()

Copilot · 2025-12-26T03:02:59Z

        return updated_post

+    async def find_or_create_tags(self, *, tag_names: list[str]) -> list[Tag]:
+        """태그 이름 리스트를 받아서 Tag 엔티티 리스트를 반환. 없으면 생성."""


The find_or_create_tags method lacks a docstring explanation of its behavior, particularly regarding the transaction semantics and potential race conditions. Consider adding documentation about when flush() is called and how callers should handle potential integrity errors in concurrent scenarios.

Suggested change

"""태그 이름 리스트를 받아서 Tag 엔티티 리스트를 반환. 없으면 생성."""

"""

주어진 태그 이름 리스트에 대해 Tag 엔티티 리스트를 반환한다.

- 이미 존재하는 태그는 `name`(PK 또는 UNIQUE 컬럼) 기준으로 조회해서 그대로 반환한다.

- 존재하지 않는 태그는 새로 생성해서 세션에 추가한 뒤, **각 태그마다 즉시 `flush()`를 호출**한다.

트랜잭션/동시성 관련:

- 이 메서드는 세션 수준에서만 `flush()`를 수행하며, 트랜잭션의 `commit()`/`rollback()`은 호출하지 않는다.

따라서 실제 커밋 시점과 트랜잭션 경계는 호출자가 관리해야 한다.

- 태그 이름이 유니크(또는 PK) 제약을 가지는 경우, 여러 트랜잭션이 동시에 같은 이름의 태그를

생성하려고 하면 `flush()` 시점에 무결성 제약(예: IntegrityError)이 발생할 수 있다.

호출자는 다음과 같이 사용하는 것을 권장한다.

- 상위 서비스/핸들러에서 트랜잭션을 시작한 뒤 이 메서드를 호출한다.

- 동시 태그 생성 가능성이 있는 경우, IntegrityError 등을 캐치하여 재시도하거나,

실패 시 해당 태그를 다시 조회하는 패턴을 적용한다.

"""

Copilot · 2025-12-26T03:03:00Z

+    tags: Mapped[List["Tag"]] = relationship(
+        "Tag",
        default_factory=list,
+        secondary=post_tag_table,
+        back_populates="posts",
    )


The Post model's tags relationship doesn't have init=False, which means tags must be provided during Post initialization. However, in the create_post method, the Post is initialized without tags (line 138-143), and tags are assigned afterward (line 144). This inconsistency could cause issues. Consider adding init=False to the tags relationship definition in the Post model to match the pattern used for other relationships.

Copilot · 2025-12-26T03:03:00Z

+    async def find_or_create_tags(self, *, tag_names: list[str]) -> list[Tag]:
+        """태그 이름 리스트를 받아서 Tag 엔티티 리스트를 반환. 없으면 생성."""
+        if not tag_names:
+            return []
+
+        tags: list[Tag] = []
+        for name in tag_names:
+            # 기존 태그 찾기 (name이 PK이므로 get으로 조회 가능)
+            existing_tag = await self.session.get(Tag, name)
+            if existing_tag:
+                tags.append(existing_tag)
+            else:
+                # 새 태그 생성
+                new_tag = Tag(name=name)
+                self.session.add(new_tag)
+                await self.session.flush()
+                tags.append(new_tag)
+        return tags


The find_or_create_tags method doesn't validate tag names before creating Tag entities. While the schema has validation (max_length=30, pattern matching), tags could potentially be created with invalid names if this method is called from other contexts. Consider adding validation or at least trimming whitespace to ensure data consistency at the repository level.

Copilot · 2025-12-26T03:03:00Z

+    # 4. tag 테이블 구조 복원
+    op.drop_constraint("tag_pkey", "tag", type_="primary")
+    op.add_column("tag", sa.Column("id", sa.Integer, autoincrement=True))
+    op.add_column("tag", sa.Column("section", sa.String(20), nullable=True))


In the downgrade migration, after restoring the tag.id column as autoincrement, the existing tag names won't automatically get sequential IDs assigned. The tag table will have rows with NULL id values, which will violate the primary key constraint when you try to create it at line 119. Consider using a sequence or explicit ID assignment before creating the primary key constraint.

Suggested change

op.add_column("tag", sa.Column("section", sa.String(20), nullable=True))

op.add_column("tag", sa.Column("section", sa.String(20), nullable=True))

# 기존 태그 레코드들에 대해 id 값 할당 (name 기준 순번)

conn.execute(

sa.text(

"""

WITH numbered AS (

SELECT name, ROW_NUMBER() OVER (ORDER BY name) AS new_id

FROM tag

)

UPDATE tag AS t

SET id = numbered.new_id

FROM numbered

WHERE t.name = numbered.name

"""

)

)

Copilot · 2025-12-26T03:03:00Z

+    op.drop_constraint("tag_pkey", "tag", type_="primary")
+    op.drop_column("tag", "id")
+    op.drop_column("tag", "section")
+    op.alter_column("tag", "name", type_=sa.String(30), nullable=False)


The Tag model defines name with String(30) max length, but the migration creates the tag table column with String(30) in the upgrade, and when downgrading it restores the original column without specifying the length constraint. The original Tag model had String(20), so this creates an inconsistency. The downgrade should restore String(20) to match the original schema, or the comment should document that this is an intentional breaking change.

Copilot · 2025-12-26T03:03:01Z

+    # 1. post.tags ARRAY 컬럼 복원
+    op.add_column(
+        "post",
+        sa.Column("tags", sa.ARRAY(sa.String(length=100)), nullable=False, server_default="{}"),


The downgrade function sets nullable=False and server_default='{}' for the tags column, but this could fail if there are posts without any tags in the post_tags table. When COALESCE returns '{}', the server_default may not be necessary, but the nullable=False constraint should be reconsidered - consider making it nullable=True to handle edge cases where posts might not have any tags.

Suggested change

sa.Column("tags", sa.ARRAY(sa.String(length=100)), nullable=False, server_default="{}"),

sa.Column("tags", sa.ARRAY(sa.String(length=100)), nullable=True),

Copilot · 2025-12-26T03:03:01Z

+    async def find_or_create_tags(self, *, tag_names: list[str]) -> list[Tag]:
+        """태그 이름 리스트를 받아서 Tag 엔티티 리스트를 반환. 없으면 생성."""
+        if not tag_names:
+            return []
+
+        tags: list[Tag] = []
+        for name in tag_names:
+            # 기존 태그 찾기 (name이 PK이므로 get으로 조회 가능)
+            existing_tag = await self.session.get(Tag, name)
+            if existing_tag:
+                tags.append(existing_tag)
+            else:
+                # 새 태그 생성
+                new_tag = Tag(name=name)
+                self.session.add(new_tag)
+                await self.session.flush()
+                tags.append(new_tag)
+        return tags


In the find_or_create_tags method, there's a potential race condition when multiple concurrent requests try to create the same tag. Between checking if a tag exists (line 161) and creating it (line 166), another request might create the same tag, causing a primary key violation. Consider using a try-except block with IntegrityError or use an INSERT ... ON CONFLICT approach to handle concurrent tag creation safely.

Copilot · 2025-12-26T03:03:01Z

+        # 태그 엔티티 찾거나 생성
+        tags = await self.post_repository.find_or_create_tags(tag_names=post.tags)
+        await self.post_repository.update_post_tags(post=saved_post, tags=tags)
+
        updated_post = await self.post_repository.update_post(
            post_id=post_id,
            title=post.title,
            description=post.description,
            category=post.category,
-            tags=post.tags,
        )


The update_post_tags method is called before the update_post method, but update_post returns a new instance with joinedload options. This means the updated_post returned at line 212 may not reflect the tag changes made at line 210, potentially causing stale data to be returned to the client. Consider moving the tag update after the post update, or ensure the tags relationship is properly refreshed.

Copilot · 2025-12-26T03:03:01Z

+    op.create_index("ix_tag_id", "tag", ["id"])
+
+    # 5. 기존 태그에 ID 할당 및 post_tags 복원
+    conn.execute(sa.text("DELETE FROM post_tags"))


The downgrade migration deletes all data from post_tags at line 123, but at this point the array data has already been restored to the post.tags column (lines 98-106). This means the relationship data in post_tags is permanently lost during downgrade. If the migration is rolled back and then upgraded again, the post_tags relationships won't be recreated. Consider preserving this data or documenting that downgrade is destructive.

kjy2844 · 2025-12-27T08:34:31Z

거절합니다

SnowSuno requested a review from Copilot December 26, 2025 02:05

Copilot started reviewing on behalf of SnowSuno December 26, 2025 02:06 View session

Copilot AI reviewed Dec 26, 2025

View reviewed changes

SnowSuno requested a review from Copilot December 26, 2025 02:58

Copilot started reviewing on behalf of SnowSuno December 26, 2025 02:58 View session

Copilot AI reviewed Dec 26, 2025

View reviewed changes

SnowSuno closed this Jan 4, 2026

	tags: Mapped[List["Tag"]] = relationship(
	tags: Mapped[list["Tag"]] = relationship(

-        tags: list[Tag] = []
-        for name in tag_names:
-            # 기존 태그 찾기
-            existing_tag = await self.session.scalar(
-                select(Tag).where(Tag.name == name)
-            )
-            if existing_tag:
-                tags.append(existing_tag)
-            else:
-                # 새 태그 생성
-                new_tag = Tag(name=name, section=None)
-                self.session.add(new_tag)
-                await self.session.flush()
-                tags.append(new_tag)
-        return tags
+        # 중복 제거용 집합 (실제 반환 시에는 원래 순서 및 중복을 유지)
+        unique_names = set(tag_names)
+        # 한 번의 쿼리로 기존 태그 모두 조회
+        existing_tags_result = await self.session.scalars(
+            select(Tag).where(Tag.name.in_(unique_names))
+        )
+        existing_tags = list(existing_tags_result)
+        # 이름 -> Tag 매핑
+        tags_by_name: dict[str, Tag] = {tag.name: tag for tag in existing_tags}
+        # 아직 존재하지 않는 태그 이름들
+        missing_names = [name for name in unique_names if name not in tags_by_name]
+        new_tags: list[Tag] = []
+        for name in missing_names:
+            new_tag = Tag(name=name, section=None)
+            new_tags.append(new_tag)
+        if new_tags:
+            self.session.add_all(new_tags)
+            await self.session.flush()
+            for tag in new_tags:
+                tags_by_name[tag.name] = tag
+        # 입력된 tag_names 순서를 유지하고, 중복도 그대로 유지
+        return [tags_by_name[name] for name in tag_names]

-        """태그 이름 리스트를 받아서 Tag 엔티티 리스트를 반환. 없으면 생성."""
+        """
+        주어진 태그 이름 리스트에 대해 Tag 엔티티 리스트를 반환한다.
+        - 이미 존재하는 태그는 `name`(PK 또는 UNIQUE 컬럼) 기준으로 조회해서 그대로 반환한다.
+        - 존재하지 않는 태그는 새로 생성해서 세션에 추가한 뒤, **각 태그마다 즉시 `flush()`를 호출**한다.
+        트랜잭션/동시성 관련:
+        - 이 메서드는 세션 수준에서만 `flush()`를 수행하며, 트랜잭션의 `commit()`/`rollback()`은 호출하지 않는다.
+          따라서 실제 커밋 시점과 트랜잭션 경계는 호출자가 관리해야 한다.
+        - 태그 이름이 유니크(또는 PK) 제약을 가지는 경우, 여러 트랜잭션이 동시에 같은 이름의 태그를
+          생성하려고 하면 `flush()` 시점에 무결성 제약(예: IntegrityError)이 발생할 수 있다.
+        호출자는 다음과 같이 사용하는 것을 권장한다.
+        - 상위 서비스/핸들러에서 트랜잭션을 시작한 뒤 이 메서드를 호출한다.
+        - 동시 태그 생성 가능성이 있는 경우, IntegrityError 등을 캐치하여 재시도하거나,
+          실패 시 해당 태그를 다시 조회하는 패턴을 적용한다.
+        """

-    op.add_column("tag", sa.Column("section", sa.String(20), nullable=True))
+    op.add_column("tag", sa.Column("section", sa.String(20), nullable=True))
+    # 기존 태그 레코드들에 대해 id 값 할당 (name 기준 순번)
+    conn.execute(
+        sa.text(
+            """
+            WITH numbered AS (
+                SELECT name, ROW_NUMBER() OVER (ORDER BY name) AS new_id
+                FROM tag
+            )
+            UPDATE tag AS t
+            SET id = numbered.new_id
+            FROM numbered
+            WHERE t.name = numbered.name
+            """
+        )
+    )

	sa.Column("tags", sa.ARRAY(sa.String(length=100)), nullable=False, server_default="{}"),
	sa.Column("tags", sa.ARRAY(sa.String(length=100)), nullable=True),

Conversation

SnowSuno commented Dec 26, 2025

Uh oh!

SnowSuno commented Dec 26, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

Uh oh!

kjy2844 commented Dec 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants