AI guardrails: Understanding PII detection

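This article covers why personally identifiable information (PII) matters, detection methods such as regular expressions, Presidio, and transformers, and how to evaluate them with Weave to build accurate, adaptable data protection. This article is an AI translation. It may contain mistranslations, so please let us know in the comments.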
Brett Young
Created on September 12|Last edited on September 12
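In today's closely connected world, personally identifiable information (PII) sits at the core of countless digital services and platforms. As data moves freely across borders, guardrails that protect it matter more than ever. Built on legal requirements, ethical considerations, and best-practice frameworks, these guardrails keep sensitive data such as names, email addresses, phone numbers, financial details, and health records protected.
Below, we cover the basics of PII and the guardrails used to protect it. If you want to see these guardrails in action, check out the accompanying Colab.
If you are curious about the rest, including the code needed to work with PII guardrails, read on.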
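Table of contents
What is PII?
Why PII guardrails matter
The importance of PII
PII detection guardrails tutorial
Regex-based PII detection guardrail
Presidio-based PII detection guardrail
Transformer-based PII detection
Evaluating PII guardrails with Weave
Performance analysis
Conclusion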
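What is PII?
PII is any data that can identify an individual, from a name or email address to financial details. Legal, ethical, and technical guardrails keep this sensitive information protected across industries such as banking, healthcare, and education.
PII restrictions are a set of legal, ethical, and organizational guidelines designed to protect the privacy and security of individuals whose data is collected, stored, and processed. These restrictions vary by jurisdiction and industry, but they generally aim to limit the use of PII to legitimate purposes and to prevent unauthorized access, misuse, and exposure.
Under laws such as the European Union's General Data Protection Regulation (GDPR), the processing of PII is strictly regulated. The GDPR requires that data be collected and processed only for specified, explicit, and legitimate purposes. Where processing relies on consent, that consent must be informed and obtained before the data is collected, and the regulation places strict limits on sharing or transferring data to third parties, particularly across borders. Organizations must also apply measures such as pseudonymization and encryption to protect sensitive data, and must give individuals the right to access, correct, and request deletion of their data.
In the United States, PII restrictions depend on the context and the specific type of information. For example, the Health Insurance Portability and Accountability Act (HIPAA) restricts how healthcare providers handle health-related PII and requires that Protected Health Information (PHI) be kept safe from unauthorized disclosure. Similarly, the Children's Online Privacy Protection Act (COPPA) imposes strict rules on collecting and using data from children under 13, emphasizing parental consent and limiting how long data may be retained.
Many sectors also follow their own standards and laws to enforce PII restrictions. For example, the financial sector follows the Payment Card Industry Data Security Standard (PCI DSS) to protect payment and account information, and educational institutions in the United States comply with the Family Educational Rights and Privacy Act (FERPA) to protect student records. These frameworks generally require organizations to implement strong access controls, audit their data practices regularly, and train employees on data privacy.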
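Why PII guardrails matter
By requiring data minimization, user consent, and strict security measures, PII guardrails reduce the risk of breaches and unauthorized disclosure. They also give individuals recourse to access, correct, and delete their data, which helps sustain trust in digital services.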
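The importance of PII
The importance of PII lies in its inherent link to privacy and security. Handled properly, PII lets individuals use personalized services and helps organizations operate more efficiently. Misused or mishandled, it can lead to serious consequences such as privacy violations, financial loss, and emotional distress. Cybercriminals often target PII to commit identity theft and fraud, exploiting system vulnerabilities or breaches to gain unauthorized access to sensitive data.
Legal frameworks around the world, such as the GDPR in Europe and the California Consumer Privacy Act (CCPA) in the United States, are meant to ensure that organizations collect, store, and process PII responsibly. In healthcare, HIPAA sets strict standards for protecting health-related PII, known as Protected Health Information (PHI). HIPAA requires healthcare providers, insurers, and their business associates to implement safeguards for medical records and other health information, ensuring the confidentiality, integrity, and availability of this highly sensitive data. Failing to comply with HIPAA can bring significant penalties and erode patient trust.
As society leans more heavily on digital platforms, protecting PII is not only a matter of regulatory compliance but also an ethical responsibility. By recognizing both the value and the vulnerability of PII, organizations can take proactive steps to protect this sensitive information in an increasingly data-centric environment.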
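PII detection guardrails tutorial
Detecting and protecting PII has become a key step in staying compliant, protecting user privacy, and keeping data practices responsible. This tutorial shows how to set up several PII detection guardrails, from simple regex-based detection to more advanced AI-based methods, and walks through automatically flagging and handling PII in text.
To get started, install the safeguards library with the following command: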
git clone https://github.com/soumik12345/safeguards.git && cd safeguards && pip install -e .
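Regex-based PII detection guardrail
The RegexEntityRecognitionGuardrail identifies PII in text using predefined patterns. This straightforward guardrail is especially effective for structured data such as phone numbers and email addresses, and it is highly interpretable and easy to set up. Because its patterns are fixed, however, it may not handle edge cases or varied text formats well.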
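To make the idea of pattern-based detection concrete, here is a minimal sketch that uses only Python's standard `re` module. The patterns and the `detect_pii` and `anonymize` helpers are assumptions made up for this illustration; they are not the patterns or API that RegexEntityRecognitionGuardrail actually uses.

```python
import re
from typing import Dict, List

# Illustrative patterns only (assumed for this sketch); real-world formats vary widely.
PATTERNS: Dict[str, re.Pattern] = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "TELEPHONENUM": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"),
    "SOCIALNUM": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_pii(text: str) -> Dict[str, List[str]]:
    """Return every match for each pattern, keyed by entity type."""
    found: Dict[str, List[str]] = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found


def anonymize(text: str) -> str:
    """Replace each detected span with a placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Contact me at john.doe@example.com or call me at (123) 456-7890."
detected = detect_pii(sample)
redacted = anonymize(sample)
```

The same sketch also shows the limitation described above: a phone number written as 123.456.7890 does not fit the fixed phone pattern, so `detect_pii("Call 123.456.7890")` returns an empty dictionary.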
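Using Weave to log and visualize the detected entities gives you a deeper view of how the guardrail performs. Normally you would add the @weave.op decorator above every function you want to trace, but because the safeguards library provides a native integration, you only need to import and initialize Weave and the results are tracked automatically: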
import weave
from safeguards.guardrails.entity_recognition import RegexEntityRecognitionGuardrail

# Initialize Weave so guardrail calls are traced automatically
weave.init("guardrails-pii")

# Define hardcoded sample data
test_cases = [
    {
        "input_text": "Contact me at john.doe@example.com or call me at (123) 456-7890.",
        "expected_entities": {
            "EMAIL": ["john.doe@example.com"],
            "TELEPHONENUM": ["(123) 456-7890"],
        },
    },
    {
        "input_text": "My SSN is 123-45-6789, and my credit card is 4111-1111-1111-1111.",
        "expected_entities": {
            "SOCIALNUM": ["123-45-6789"],
            "CREDITCARDNUMBER": ["4111-1111-1111-1111"],
        },
    },
]

# Initialize the regex-based guardrail
regex_guardrail = RegexEntityRecognitionGuardrail(should_anonymize=True)

# Process each test case
for i, case in enumerate(test_cases, 1):
    try:
        # Use the `guard` method for PII detection
        result = regex_guardrail.guard(case["input_text"])
        print(f"Test Case {i}")
        print(f"Input: {case['input_text']}")
        print(f"Expected Entities: {case['expected_entities']}")
        print(f"Detected Entities: {result}\n")
    except AttributeError as e:
        print(f"Error processing Test Case {i}: {e}")
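The RegexEntityRecognitionGuardrail was initialized and applied to text samples containing common PII such as email addresses and phone numbers. The guard method returned the entities detected by its regex patterns, and these were logged to Weave. That made it easy to see where the guardrail behaved as expected and where it missed entities. It works well for simple cases, but its reliance on fixed patterns limits how well it adapts to complex text.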
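Presidio-based PII detection guardrail
The PresidioEntityRecognitionGuardrail is built on Microsoft's Presidio framework and combines regex rules with context-aware detection. This guardrail is more flexible than plain regex and can recognize entities even when their format varies slightly. The code looks like this: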
import weave
from safeguards.guardrails.entity_recognition import PresidioEntityRecognitionGuardrail

# Initialize Weave so guardrail calls are traced automatically
weave.init("guardrails-pii")

# Define hardcoded sample data
test_cases = [
    {
        "input_text": "Jane's email is jane.doe@gmail.com, and her phone is +1-800-555-1234.",
        "expected_entities": {
            "EMAIL_ADDRESS": ["jane.doe@gmail.com"],
            "PHONE_NUMBER": ["+1-800-555-1234"],
        },
    },
    {
        "input_text": "My passport number is A12345678, and I live in New York.",
        "expected_entities": {
            "US_PASSPORT": ["A12345678"],
            "LOCATION": ["New York"],
        },
    },
]

# Initialize the Presidio-based guardrail
presidio_guardrail = PresidioEntityRecognitionGuardrail(should_anonymize=True)

# Process each test case
for i, case in enumerate(test_cases, 1):
    try:
        # Use the `guard` method for PII detection
        result = presidio_guardrail.guard(case["input_text"])
        print(f"Test Case {i}")
        print(f"Input: {case['input_text']}")
        print(f"Expected Entities: {case['expected_entities']}")
        print(f"Detected Entities: {result}\n")
    except AttributeError as e:
        print(f"Error processing Test Case {i}: {e}")
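The PresidioEntityRecognitionGuardrail was applied to the text samples and identified PII such as email addresses and passport numbers more flexibly than regex. The guard method handled variations in entity format and returned the set of detected entities, which was logged to Weave.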
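Transformer-based PII detection
The TransformersEntityRecognitionGuardrail uses machine learning models, specifically transformers, to identify PII in text. Its ability to understand context and adapt to unstructured data makes it especially strong in complex or subtle cases. This approach needs no predefined rules and instead depends on the performance of a pretrained language model. Logging its output to Weave allows comparison with the other methods and detailed performance analysis: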
import weave
from safeguards.guardrails.entity_recognition import TransformersEntityRecognitionGuardrail

# Initialize Weave so guardrail calls are traced automatically
weave.init("guardrails-pii")

# Define hardcoded sample data
test_cases = [
    {
        "input_text": "My name is Brett Johnson, and my phone number is +1 987-654-3210.",
        "expected_entities": {
            # Expected values must match what appears in input_text
            "PERSON": ["Brett Johnson"],
            "TELEPHONENUM": ["987-654-3210"],
        },
    },
    {
        "input_text": "The acc. # is 1234532289, and the credit card is 5555-5555-5555-5555.",
        "expected_entities": {
            "ACCOUNTNUM": ["1234532289"],
            "CREDITCARDNUMBER": ["5555-5555-5555-5555"],
        },
    },
]

# Initialize the transformer-based guardrail
transformer_guardrail = TransformersEntityRecognitionGuardrail(should_anonymize=True)

# Process each test case
for i, case in enumerate(test_cases, 1):
    try:
        # Use the `guard` method for PII detection
        result = transformer_guardrail.guard(case["input_text"])
        print(f"Test Case {i}")
        print(f"Input: {case['input_text']}")
        print(f"Expected Entities: {case['expected_entities']}")
        print(f"Detected Entities: {result}\n")
    except AttributeError as e:
        print(f"Error processing Test Case {i}: {e}")
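The TransformersEntityRecognitionGuardrail detected PII in the processed text samples with high accuracy, even in difficult contexts. For example, it correctly identified entities embedded in longer sentences or with ambiguous structure. The detected entities were logged to Weave, which made it easier to compare performance against the other guardrails. After running the script, the logged traces can be inspected in Weave.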
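Evaluating PII guardrails with Weave
After detecting PII with each guardrail, it is important to evaluate performance systematically. With Weave's evaluation framework, you can log and compare detected entities alongside metrics such as precision, recall, and F1 score to get a clear picture of effectiveness. This gives actionable insight into how each guardrail performs across different data types and helps identify the best method for a given use case.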
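As a quick refresher on these metrics, the sketch below computes precision, recall, and F1 for a single entity type from two sets of strings. The `prf1` helper and the example values are hypothetical and exist only for illustration; the sketch assumes exact string matching between detected and expected values.

```python
from typing import Dict, Set


def prf1(detected: Set[str], expected: Set[str]) -> Dict[str, float]:
    """Compute precision, recall, and F1 for one entity type using exact matches."""
    tp = len(detected & expected)  # found and expected
    fp = len(detected - expected)  # found but not expected
    fn = len(expected - detected)  # expected but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: two of three expected emails found, plus one spurious match.
scores = prf1(
    detected={"a@x.com", "b@x.com", "not-an-email"},
    expected={"a@x.com", "b@x.com", "c@x.com"},
)

# Exact matching is strict: a partially matching span counts as a miss and a false alarm.
strict = prf1(detected={"+1 987-654-3210"}, expected={"987-654-3210"})
```

Under exact matching, a detector that returns "+1 987-654-3210" when the expected value is "987-654-3210" scores zero on that entity, so span boundaries can affect the numbers as much as detection quality does.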
We will evaluate three PII detection guardrails: RegexEntityRecognitionGuardrail, PresidioEntityRecognitionGuardrail, and TransformersEntityRecognitionGuardrail. The goal is to measure how well each identifies PII entities across a dataset, using precision, recall, and F1 score. The code below runs the evaluation.
import asyncio
import json
import random
from pathlib import Path
from typing import Dict, List, Optional

import weave
from datasets import load_dataset
from weave import Evaluation
from weave.scorers import Scorer

from safeguards.guardrails.entity_recognition import (
    PresidioEntityRecognitionGuardrail,
    RegexEntityRecognitionGuardrail,
    TransformersEntityRecognitionGuardrail,
)

# Map Presidio entity types onto the transformer model's label names
PRESIDIO_TO_TRANSFORMER_MAPPING = {
    "EMAIL_ADDRESS": "EMAIL",
    "PHONE_NUMBER": "TELEPHONENUM",
    "US_SSN": "SOCIALNUM",
    "CREDIT_CARD": "CREDITCARDNUMBER",
    "IP_ADDRESS": "IDCARDNUM",
    "DATE_TIME": "DATEOFBIRTH",
    "US_PASSPORT": "IDCARDNUM",
    "US_DRIVER_LICENSE": "DRIVERLICENSENUM",
    "US_BANK_NUMBER": "ACCOUNTNUM",
    "LOCATION": "CITY",
    "URL": "USERNAME",  # URLs often contain usernames
    "IN_PAN": "TAXNUM",  # Indian Permanent Account Number
    "UK_NHS": "IDCARDNUM",
    "SG_NRIC_FIN": "IDCARDNUM",
    "AU_ABN": "TAXNUM",  # Australian Business Number
    "AU_ACN": "TAXNUM",  # Australian Company Number
    "AU_TFN": "TAXNUM",  # Australian Tax File Number
    "AU_MEDICARE": "IDCARDNUM",
    "IN_AADHAAR": "IDCARDNUM",  # Indian national ID
    "IN_VOTER": "IDCARDNUM",
    "IN_PASSPORT": "IDCARDNUM",
    "CRYPTO": "ACCOUNTNUM",  # Cryptocurrency addresses
    "IBAN_CODE": "ACCOUNTNUM",
    "MEDICAL_LICENSE": "IDCARDNUM",
    "IN_VEHICLE_REGISTRATION": "IDCARDNUM",
}
﻿
﻿
class EntityRecognitionScorer(Scorer):
    """Scorer for evaluating entity recognition performance"""
﻿
    @weave.op()
    async def score(
        self, model_output: Optional[dict], input_text: str, expected_entities: Dict
    ) -> Dict:
        """Score entity recognition results"""
        if not model_output:
            return {"f1": 0.0}
﻿
        # Convert Pydantic model to dict if necessary
        if hasattr(model_output, "model_dump"):
            model_output = model_output.model_dump()
        elif hasattr(model_output, "dict"):
            model_output = model_output.dict()
﻿
        detected = model_output.get("detected_entities", {})
﻿
        # Map Presidio entities if needed
        if model_output.get("model_type") == "presidio":
            mapped_detected = {}
            for entity_type, values in detected.items():
                mapped_type = PRESIDIO_TO_TRANSFORMER_MAPPING.get(entity_type)
                if mapped_type:
                    if mapped_type not in mapped_detected:
                        mapped_detected[mapped_type] = []
                    mapped_detected[mapped_type].extend(values)
            detected = mapped_detected
﻿
        # Track entity-level metrics
        all_entity_types = set(list(detected.keys()) + list(expected_entities.keys()))
        entity_metrics = {}
﻿
        for entity_type in all_entity_types:
            detected_set = set(detected.get(entity_type, []))
            expected_set = set(expected_entities.get(entity_type, []))
﻿
            # Calculate metrics
            true_positives = len(detected_set & expected_set)
            false_positives = len(detected_set - expected_set)
            false_negatives = len(expected_set - detected_set)
﻿
            if entity_type not in entity_metrics:
                entity_metrics[entity_type] = {
                    "total_true_positives": 0,
                    "total_false_positives": 0,
                    "total_false_negatives": 0,
                }
﻿
            entity_metrics[entity_type]["total_true_positives"] += true_positives
            entity_metrics[entity_type]["total_false_positives"] += false_positives
            entity_metrics[entity_type]["total_false_negatives"] += false_negatives
﻿
            # Calculate per-entity metrics
            precision = (
                true_positives / (true_positives + false_positives)
                if (true_positives + false_positives) > 0
                else 0
            )
            recall = (
                true_positives / (true_positives + false_negatives)
                if (true_positives + false_negatives) > 0
                else 0
            )
            f1 = (
                2 * (precision * recall) / (precision + recall)
                if (precision + recall) > 0
                else 0
            )
﻿
            entity_metrics[entity_type].update(
                {"precision": precision, "recall": recall, "f1": f1}
            )
﻿
        # Calculate overall metrics
        total_tp = sum(
            metrics["total_true_positives"] for metrics in entity_metrics.values()
        )
        total_fp = sum(
            metrics["total_false_positives"] for metrics in entity_metrics.values()
        )
        total_fn = sum(
            metrics["total_false_negatives"] for metrics in entity_metrics.values()
        )
﻿
        overall_precision = (
            total_tp / (total_tp + total_fp) if (total_tp + total_fp) > 0 else 0
        )
        overall_recall = (
            total_tp / (total_tp + total_fn) if (total_tp + total_fn) > 0 else 0
        )
        overall_f1 = (
            2
            * (overall_precision * overall_recall)
            / (overall_precision + overall_recall)
            if (overall_precision + overall_recall) > 0
            else 0
        )
﻿
        entity_metrics["overall"] = {
            "precision": overall_precision,
            "recall": overall_recall,
            "f1": overall_f1,
            "total_true_positives": total_tp,
            "total_false_positives": total_fp,
            "total_false_negatives": total_fn,
        }
﻿
        return entity_metrics["overall"]
﻿
﻿
def load_ai4privacy_dataset(
    num_samples: int = 100, split: str = "validation"
) -> List[Dict]:
    """
    Load and prepare samples from the ai4privacy dataset.
﻿
    Args:
        num_samples: Number of samples to evaluate
        split: Dataset split to use ("train" or "validation")
﻿
    Returns:
        List of prepared test cases
    """
    # Load the dataset
    dataset = load_dataset("ai4privacy/pii-masking-400k")
﻿
    # Get the specified split
    data_split = dataset[split]
﻿
    # Randomly sample entries if num_samples is less than total
    if num_samples < len(data_split):
        indices = random.sample(range(len(data_split)), num_samples)
        samples = [data_split[i] for i in indices]
    else:
        samples = data_split
﻿
    # Convert to test case format
    test_cases = []
    for sample in samples:
        # Extract entities from privacy_mask
        entities: Dict[str, List[str]] = {}
        for entity in sample["privacy_mask"]:
            label = entity["label"]
            value = entity["value"]
            if label not in entities:
                entities[label] = []
            entities[label].append(value)
﻿
        test_case = {
            "description": f"AI4Privacy Sample (ID: {sample['uid']})",
            "input_text": sample["source_text"],
            "expected_entities": entities,
            "masked_text": sample["masked_text"],
            "language": sample["language"],
            "locale": sample["locale"],
        }
        test_cases.append(test_case)
﻿
    return test_cases
﻿
﻿
def save_results(
    weave_results: Dict, model_name: str, output_dir: str = "evaluation_results"
):
    """Save evaluation results to files"""
    output_dir = Path(output_dir)
    output_dir.mkdir(exist_ok=True)
﻿
    # Extract and process results
    scorer_results = weave_results.get("EntityRecognitionScorer", [])
    if not scorer_results or all(r is None for r in scorer_results):
        print(f"No valid results to save for {model_name}")
        return
﻿
    # Calculate summary metrics
    total_samples = len(scorer_results)
    passed = sum(1 for r in scorer_results if r is not None and not isinstance(r, str))
﻿
    # Aggregate entity-level metrics
    entity_metrics = {}
    for result in scorer_results:
        try:
            if isinstance(result, str) or not result:
                continue
﻿
            for entity_type, metrics in result.items():
                if entity_type not in entity_metrics:
                    entity_metrics[entity_type] = {
                        "precision": [],
                        "recall": [],
                        "f1": [],
                    }
                entity_metrics[entity_type]["precision"].append(metrics["precision"])
                entity_metrics[entity_type]["recall"].append(metrics["recall"])
                entity_metrics[entity_type]["f1"].append(metrics["f1"])
        except (AttributeError, TypeError, KeyError):
            continue
﻿
    # Calculate averages
    summary_metrics = {
        "total": total_samples,
        "passed": passed,
        "failed": total_samples - passed,
        "success_rate": (passed / total_samples) if total_samples > 0 else 0,
        "entity_metrics": {
            entity_type: {
                "precision": (
                    sum(metrics["precision"]) / len(metrics["precision"])
                    if metrics["precision"]
                    else 0
                ),
                "recall": (
                    sum(metrics["recall"]) / len(metrics["recall"])
                    if metrics["recall"]
                    else 0
                ),
                "f1": sum(metrics["f1"]) / len(metrics["f1"]) if metrics["f1"] else 0,
            }
            for entity_type, metrics in entity_metrics.items()
        },
    }
﻿
    # Save files
    with open(output_dir / f"{model_name}_metrics.json", "w") as f:
        json.dump(summary_metrics, f, indent=2)
﻿
    # Save detailed results, filtering out string results
    detailed_results = [
        r for r in scorer_results if not isinstance(r, str) and r is not None
    ]
    with open(output_dir / f"{model_name}_detailed_results.json", "w") as f:
        json.dump(detailed_results, f, indent=2)
﻿
﻿
def print_metrics_summary(weave_results: Dict):
    """Print a summary of the evaluation metrics"""
    print("\nEvaluation Summary")
    print("=" * 80)
﻿
    # Extract results from Weave's evaluation format
    scorer_results = weave_results.get("EntityRecognitionScorer", {})
    if not scorer_results:
        print("No valid results available")
        return
﻿
    # Calculate overall metrics
    total_samples = int(weave_results.get("model_latency", {}).get("count", 0))
    passed = total_samples  # Since we have results, all samples passed
    failed = 0
﻿
    print(f"Total Samples: {total_samples}")
    print(f"Passed: {passed}")
    print(f"Failed: {failed}")
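    # Guard against division by zero when no samples were recorded
    success_rate = (passed / total_samples) * 100 if total_samples else 0.0
    print(f"Success Rate: {success_rate:.2f}%")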
﻿
    # Print overall metrics
    if "overall" in scorer_results:
        overall = scorer_results["overall"]
        print("\nOverall Metrics:")
        print("-" * 80)
        print(f"{'Metric':<20} {'Value':>10}")
        print("-" * 80)
        print(f"{'Precision':<20} {overall['precision']['mean']:>10.2f}")
        print(f"{'Recall':<20} {overall['recall']['mean']:>10.2f}")
        print(f"{'F1':<20} {overall['f1']['mean']:>10.2f}")
﻿
    # Print entity-level metrics
    print("\nEntity-Level Metrics:")
    print("-" * 80)
    print(f"{'Entity Type':<20} {'Precision':>10} {'Recall':>10} {'F1':>10}")
    print("-" * 80)
﻿
    for entity_type, metrics in scorer_results.items():
        if entity_type == "overall":
            continue
﻿
        precision = metrics.get("precision", {}).get("mean", 0)
        recall = metrics.get("recall", {}).get("mean", 0)
        f1 = metrics.get("f1", {}).get("mean", 0)
﻿
        print(f"{entity_type:<20} {precision:>10.2f} {recall:>10.2f} {f1:>10.2f}")
﻿
﻿
def preprocess_model_input(example: Dict) -> Dict:
    """Preprocess dataset example to match model input format."""
    return {
        "prompt": example["input_text"],
        "model_type": example.get(
            "model_type", "unknown"
        ),  # Add model type for Presidio mapping
    }
﻿
﻿
def main():
    """Main evaluation function"""
    weave.init("guardrails-genie-pii-evaluation")
﻿
    # Load test cases
    test_cases = load_ai4privacy_dataset(num_samples=100)
﻿
    # Guardrails to evaluate, keyed by a short model name
    models = {
        "regex": RegexEntityRecognitionGuardrail(should_anonymize=True),
        "presidio": PresidioEntityRecognitionGuardrail(should_anonymize=True),
        "transformers": TransformersEntityRecognitionGuardrail(should_anonymize=True)
    }
﻿
    scorer = EntityRecognitionScorer()
﻿
    # Evaluate each model
    for model_name, guardrail in models.items():
        print(f"\nEvaluating {model_name} model...")
        # Add model type to test cases
        model_test_cases = [{**case, "model_type": model_name} for case in test_cases]
﻿
        evaluation = Evaluation(
            dataset=model_test_cases,
            scorers=[scorer],
            preprocess_model_input=preprocess_model_input,
        )
﻿
        asyncio.run(evaluation.evaluate(guardrail))
﻿
﻿
if __name__ == "__main__":
    main()
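To keep the evaluation consistent, the script first defines a mapping dictionary that normalizes entity types between Presidio and the transformer-based model. A custom EntityRecognitionScorer class then handles entity-level comparison and computes the evaluation metrics. The scorer counts true positives, false positives, and false negatives for each entity type.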
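The normalization step can be sketched in isolation as follows. This is a simplified illustration, assuming a small subset of the entries in PRESIDIO_TO_TRANSFORMER_MAPPING; the `LABEL_MAP` name, the `normalize_labels` helper, and the sample values are made up for this example.

```python
from typing import Dict, List

# A small, illustrative subset of a Presidio-to-transformer label mapping.
LABEL_MAP = {
    "EMAIL_ADDRESS": "EMAIL",
    "PHONE_NUMBER": "TELEPHONENUM",
    "US_PASSPORT": "IDCARDNUM",
    "UK_NHS": "IDCARDNUM",
}


def normalize_labels(detected: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Rename entity types via LABEL_MAP, merging values and dropping unmapped types."""
    normalized: Dict[str, List[str]] = {}
    for label, values in detected.items():
        target = LABEL_MAP.get(label)
        if target is None:
            continue  # unmapped types are dropped, as in the scorer above
        normalized.setdefault(target, []).extend(values)
    return normalized


raw = {
    "EMAIL_ADDRESS": ["jane.doe@gmail.com"],
    "US_PASSPORT": ["A12345678"],
    "UK_NHS": ["943 476 5919"],
    "PERSON": ["Jane"],
}
normalized = normalize_labels(raw)
```

One consequence worth keeping in mind: any type missing from the mapping, such as PERSON in this sketch, is silently dropped and so cannot be credited during scoring.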
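Next, the load_ai4privacy_dataset function builds the dataset, extracting and structuring test cases for evaluation. Each guardrail is applied to the dataset and its detected entities are compared with the expected ones. Results are logged and stored through Weave's evaluation framework, which supports detailed analysis and visualization of guardrail performance.
The Weave logs show each guardrail's performance clearly, including overall detection metrics and detailed metrics per entity type. This fine-grained logging makes it easy to compare results across methods and highlights how each guardrail behaves on specific PII categories. Such an evaluation helps you choose the most suitable guardrail for the specific needs of a use case, whether that is simplicity, adaptability, or robustness on complex data.
Some of the performance metrics are summarized in the tables below.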
Model performance summary
Model         Success Rate  Overall Precision  Overall Recall  Overall F1
Regex         0.0%          0.03               0.50            0.06
Presidio      12.0%         0.09               0.17            0.12
Transformers  77.0%         0.81               0.83            0.82
Entity-level F1 scores (detailed)
Entity Type       Regex  Presidio  Transformers
EMAIL             0.93   1.00      1.00
SURNAME           0.05   0.00      0.86
TELEPHONENUM      0.00   0.13      0.82
GIVENNAME         0.08   0.00      0.90
CITY              0.06   0.00      0.92
DRIVERLICENSENUM  0.00   0.11      0.91
STREET            0.00   0.00      0.89
TAXNUM            0.00   0.03      1.00
USERNAME          0.00   0.00      0.75
PASSWORD          0.00   0.00      0.75
ZIPCODE           0.42   0.00      0.53
ACCOUNTNUM        0.24   0.20      0.77
DATEOFBIRTH       0.00   0.00      1.00
IDCARDNUM         0.00   0.13      0.67
CREDITCARDNUMBER  0.40   0.33      0.80
BUILDINGNUM       0.04   0.00      0.55
SOCIALNUM         0.22   0.22      0.50
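Performance analysis
The performance summary shows that effectiveness varies widely by method. The regex-based approach recorded a 0.0% success rate and an F1 score of 0.06. Its precision of 0.03 against a recall of 0.50 suggests that most of what it flagged did not match the expected entities.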
Presidio did somewhat better, with a 12.0% success rate and an F1 score of 0.12: it identifies some entities but struggles with accuracy and consistency. Transformers stood out with a 77.0% success rate and an F1 score of 0.82, balancing precision and recall at the entity level. On this task, advanced machine learning models such as transformers clearly outperformed traditional regex and rule-based methods.
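As a sanity check on the summary table, the short sketch below recomputes F1 as the harmonic mean of the reported overall precision and recall. The `f1_score` helper is written for this illustration only; the input numbers are copied from the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported in the summary table.
reported = {
    "Regex": (0.03, 0.50),
    "Presidio": (0.09, 0.17),
    "Transformers": (0.81, 0.83),
}
recomputed = {name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()}
```

The recomputed values agree with the reported F1 column (0.06, 0.12, and 0.82). The regex row also shows how the harmonic mean behaves: a recall of 0.50 cannot compensate for a precision of 0.03.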
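A layered strategy that stacks several methods can be effective. Start with regex for a fast first-pass filter, then run deeper analysis on the flagged content to improve accuracy; combining methods this way helps deliver robust performance in critical applications. It also pays to customize the approach for a specific domain, which can include adding industry-specific patterns, training models on domain-relevant data formats, and fine-tuning confidence thresholds.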
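One way to picture the layered strategy is a cheap screen that decides whether the expensive detector runs at all. The sketch below is only an illustration under stated assumptions: `QUICK_SCREEN`, `layered_detect`, and `toy_deep_detector` are hypothetical names, and the toy detector stands in for a model-based guardrail rather than calling one.

```python
import re
from typing import Callable, Dict, List

# Cheap first-pass screen: a run of four or more digits or an @ sign marks text for review.
# This heuristic is an assumption for the sketch, not a recommended production filter.
QUICK_SCREEN = re.compile(r"\d{4,}|@")

Detector = Callable[[str], Dict[str, List[str]]]


def layered_detect(text: str, deep_detector: Detector) -> Dict[str, List[str]]:
    """Run the expensive detector only on text the quick screen flags."""
    if not QUICK_SCREEN.search(text):
        return {}
    return deep_detector(text)


def toy_deep_detector(text: str) -> Dict[str, List[str]]:
    """Stand-in for a model-based detector; here it only extracts emails."""
    emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)
    return {"EMAIL": emails} if emails else {}


calls: List[str] = []


def counting_detector(text: str) -> Dict[str, List[str]]:
    """Wrap the toy detector to record how often the deep pass actually runs."""
    calls.append(text)
    return toy_deep_detector(text)


results = [
    layered_detect("The weather is nice today.", counting_detector),
    layered_detect("Reach me at jane.doe@gmail.com", counting_detector),
]
```

The trade-off is recall: anything the first pass does not flag never reaches the deeper detector. A screen this narrow would skip names and street addresses entirely, so in practice the first layer probably needs to be permissive.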
Finally, regular monitoring can be a key driver of improvement. It lets you track false positives and false negatives, update patterns to address missed cases, and retrain models to keep them current.
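A monitoring loop of this kind could start as simply as tallying errors per entity type over a review log. The sketch below assumes a hypothetical log of (detected, expected) pairs; the `tally_errors` helper and the sample data are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Entities = Dict[str, List[str]]


def tally_errors(pairs: Iterable[Tuple[Entities, Entities]]) -> Tuple[Counter, Counter]:
    """Count false positives and false negatives per entity type across samples."""
    false_pos: Counter = Counter()
    false_neg: Counter = Counter()
    for detected, expected in pairs:
        for label in set(detected) | set(expected):
            found = set(detected.get(label, []))
            wanted = set(expected.get(label, []))
            false_pos[label] += len(found - wanted)
            false_neg[label] += len(wanted - found)
    return false_pos, false_neg


# Hypothetical review log of (detected, expected) pairs.
log = [
    ({"EMAIL": ["a@x.com"]}, {"EMAIL": ["a@x.com"], "ZIPCODE": ["10001"]}),
    ({"ZIPCODE": ["2024"]}, {"ZIPCODE": ["94105"]}),
]
fp, fn = tally_errors(log)
worst_missed = fn.most_common(1)
```

Sorting the false-negative counter points to the entity types where new patterns or retraining are likely to pay off first.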
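Conclusion
In a data-driven world, robust guardrails for detecting and protecting PII are essential. Whether you choose a fast, interpretable method such as regex or the finer contextual understanding of a transformer-based model, your guardrail strategy should prioritize continuous monitoring, regular updates, and alignment with regulatory standards.
Tools such as Weave make each guardrail's performance visible, so you can refine your overall data protection strategy. By understanding the strengths and limits of each approach and updating them regularly, you can maintain a high level of privacy and security and keep sensitive PII well protected.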