Keren Jin edited this page May 10, 2013 · 2 revisions

How do I validate a record with default values filled in?

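// The validation classes are in com.linkedin.data.schema.validation; Foo is a generated record class.
Foo foo = new Foo();
ValidationOptions options = new ValidationOptions(RequiredMode.FIXUP_ABSENT_WITH_DEFAULT);
ValidationResult result = ValidateDataAgainstSchema.validate(foo.data(), foo.schema(), options);
assert result.isValid();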

This will fail if the underlying data is read-only and default values cannot be set for absent fields.

This also works for partially filled-in records. Default values are added only to fields that are absent; fields that already have a value are left unchanged.
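The "only absent fields" rule and the read-only failure can be pictured with plain Java maps. This is an analogy built on java.util only, not the Pegasus validator: the class FixupSketch, the method fixupAbsent, and the field names are hypothetical, and the real validator may report a read-only failure differently than by throwing.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Analogy only: mimics "fix up absent fields with defaults" on a plain Map.
class FixupSketch {
    // Copy each default into data only when the key is absent; present values are kept.
    static void fixupAbsent(Map<String, Object> data, Map<String, Object> defaults) {
        for (Map.Entry<String, Object> entry : defaults.entrySet()) {
            data.putIfAbsent(entry.getKey(), entry.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("name", "unknown");
        defaults.put("count", 0);

        // Partially filled record: "name" is present, "count" is absent.
        Map<String, Object> data = new HashMap<>();
        data.put("name", "alice");
        fixupAbsent(data, defaults);
        System.out.println(data.get("name") + " " + data.get("count"));

        // Read-only data: the default cannot be written, so the fix-up fails.
        Map<String, Object> readOnly = Collections.unmodifiableMap(new HashMap<String, Object>());
        try {
            fixupAbsent(readOnly, defaults);
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only data cannot be fixed up");
        }
    }
}
```

In the sketch, the value already present for "name" is kept and only "count" receives its default, which is the same shape of behavior the FIXUP_ABSENT_WITH_DEFAULT mode is described as having above.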

How do I convert a Pegasus data schema to an Avro data schema?

This requires the data-avro module, or data-avro-*.jar.

At runtime, use

Foo foo = new Foo();
DataSchema pegasusSchema = foo.schema();
org.apache.avro.Schema avroSchema = SchemaTranslator.dataToAvroSchema(pegasusSchema);
String avroSchemaInJson = avroSchema.toString();

From the command line, to create text files containing Avro schemas from pdsc files (version 0.17.1 or higher):

# syntax:
# java [-Dgenerator.resolver.path=<path>] com.linkedin.data.avro.generator.AvroSchemaGenerator <targetDir> [<pdscFile|schemaFullName>]...
java -Dgenerator.resolver.path=src/main/pegasus com.linkedin.data.avro.generator.AvroSchemaGenerator ../build/main/codegen/avro src/main/pegasus/com/linkedin/foo/*.pdsc
# or
java -Dgenerator.resolver.path=src/main/pegasus com.linkedin.data.avro.generator.AvroSchemaGenerator ../build/main/codegen/avro com.linkedin.foo.Foo

The classpath must be set up to include data-avro.jar and its dependencies.

How do I use a data type embedded inside a pdsc file?

You may experience errors like the following

Type cannot be resolved: 1,1: "a.b.D" cannot be resolved.

when a.b.D’s definition is embedded in another type, for example, a.b.C.

Embedded data types do not have their own pdsc file. As a result, such a data type can only be referenced within the containing pdsc file; it cannot be referenced externally. To change this behavior, move the definition of the embedded type into a separate pdsc file.

Internally, this behavior arises because the schema parser locates data types by filename. The pdsc file for a.b.C is expected at pegasus/a/b/C.pdsc. Because Rest.li does not prefix the containing data type’s name to the embedded type’s name, when a.b.D is embedded in C.pdsc, the schema parser looks for pegasus/a/b/D.pdsc and cannot find it.
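The name-to-file lookup described above can be sketched in a few lines of plain Java. This is a simplified illustration, not the Rest.li resolver itself: the class PdscPathSketch and its methods are hypothetical, and the set of files stands in for the real resolver path on disk.

```java
import java.util.Collections;
import java.util.Set;

// Simplified illustration of filename-based type lookup; not the Rest.li implementation.
class PdscPathSketch {
    // Map a fully qualified type name to the pdsc file a filename-based lookup would expect.
    static String expectedFile(String root, String fullName) {
        return root + "/" + fullName.replace('.', '/') + ".pdsc";
    }

    // A type resolves only if its expected file exists among the known files.
    static boolean canResolve(Set<String> filesOnDisk, String root, String fullName) {
        return filesOnDisk.contains(expectedFile(root, fullName));
    }

    public static void main(String[] args) {
        // Only C.pdsc exists; a.b.D is defined inside it and has no file of its own.
        Set<String> filesOnDisk = Collections.singleton("pegasus/a/b/C.pdsc");
        System.out.println(expectedFile("pegasus", "a.b.D"));
        System.out.println(canResolve(filesOnDisk, "pegasus", "a.b.C"));
        System.out.println(canResolve(filesOnDisk, "pegasus", "a.b.D"));
    }
}
```

Under these assumptions, a.b.C resolves because pegasus/a/b/C.pdsc exists, while a.b.D does not, since nothing in the lookup ties the name a.b.D back to C.pdsc.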
