This repository was archived by the owner on Apr 18, 2025. It is now read-only.

Description
Hi. I am trying to confirm that all values in a pandas column are of type string. Doing this with IsDtypeValidation raises the error TypeError: Cannot interpret 'StringDtype' as a data type. I opened a question on StackOverflow, and based on the comments I suspect that this might actually be an error in the IsDtypeValidation class.
Is this an error, or am I misusing the class/package?
import numpy as np
import pandas as pd
from pandas_schema.validation import IsDtypeValidation
series = pd.Series(["a", "b", "c"])
# Works as expected:
# Returns a validation warning as the series is of dtype 'object' and not 'string'.
print(f"dtype = {series.dtypes}") # Returns: dtype = object
idv = IsDtypeValidation(dtype=np.dtype(str))  # np.str was only an alias for the builtin str and is gone from recent NumPy
validation_warnings = idv.get_errors(series=series)
print(validation_warnings[0]) # Returns: The column has a dtype of object which is not a subclass of the required type <U0
# But we know that the series only contains string-values. Thus convert_dtypes() below.
# Does not work as expected:
# Returns an error and traceback with 'TypeError: Cannot interpret 'StringDtype' as a data type'.
# Expected output should be no error or validation warning.
series = series.convert_dtypes()
print(f"dtype = {series.dtypes}") # Returns: dtype = string
idv = IsDtypeValidation(dtype=np.dtype(str))
validation_warnings = idv.get_errors(series=series) # Error occurs in this line: 'TypeError: Cannot interpret 'StringDtype' as a data type'
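In the meantime, a possible workaround is to check for strings without handing the pandas extension dtype to NumPy. The sketch below is only an illustration: `is_string_column` is a hypothetical helper, not part of pandas_schema, and it assumes that an object column counts as a string column when every non-missing value is a `str`.

```python
import pandas as pd


def is_string_column(series: pd.Series) -> bool:
    """Hypothetical helper (not part of pandas_schema).

    Returns True if the series uses the pandas 'string' extension dtype,
    or if it has dtype 'object' and every non-missing value is a str.
    """
    # The pandas extension dtype is checked directly, never passed to NumPy.
    if isinstance(series.dtype, pd.StringDtype):
        return True
    # An object column can hold anything, so inspect the values themselves.
    if series.dtype == object:
        return all(isinstance(value, str) for value in series.dropna())
    return False


series = pd.Series(["a", "b", "c"])
print(is_string_column(series))                   # True
print(is_string_column(series.convert_dtypes()))  # True
print(is_string_column(pd.Series([1, 2, 3])))     # False
```

This sidesteps the TypeError rather than explaining it, so the question about IsDtypeValidation itself still stands.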
Besides that, awesome work! Really handy package.