From b2bce47a53ee192b0fd8c4c96ce9921a68197ad9 Mon Sep 17 00:00:00 2001 From: Joel Grus Date: Thu, 14 Dec 2023 11:26:01 -0600 Subject: [PATCH] mapping as a concept --- concepts/mapping.mdx | 79 +++++++++++++++++++ .../flatfile-mapping-python.mdx | 9 +++ mint.json | 5 ++ 3 files changed, 93 insertions(+) create mode 100644 concepts/mapping.mdx create mode 100644 developer-tools/mapping-libraries/flatfile-mapping-python.mdx diff --git a/concepts/mapping.mdx b/concepts/mapping.mdx new file mode 100644 index 00000000..440fc7d3 --- /dev/null +++ b/concepts/mapping.mdx @@ -0,0 +1,79 @@ +--- +title: "Mapping" +description: "Mapping" +icon: "map" +--- + +In Flatfile, _mapping_ describes the process of getting data from an uploaded file +into a destination worksheet in your application. + +## The Ghost of Mapping Past + +Historically mapping in Flatfile has consisted of two pieces: + +1. A _field mapping_ which describes which columns in the source data + corresponds to which fields in the destination worksheet. For example, maybe the "fname" + column in the source data should be mapped to the "First Name" field in the destination worksheet. +2. For every destination field of type [enum](/blueprint/field-types#enum), + an _enum mapping_ describing which values in the source column + correspond to which allowable enum values in the destination column. For example, you could have an enum + column of country codes, and you might need the source value "United States" to map to the enum value "US". + +Flatfile has developed machine-learning models trained on millions of historical mappings that can suggest +both field and enum mappings based on the column names and values in the source data. You can of course +override these suggestions. The Flatfile platform remembers your previously chosen mappings and will +suggest them when it sees similar schemas / data in the future. + +## The Ghost of Mapping Present + +Many users of Flatfile need more control over the mapping process than +just assigning source fields to destination fields. For example, you might need to +concatenate multiple source fields into a single destination field, or you might need to +choose the first non-empty value from a set of source fields. + +To this end, we've recently introduced a wider variety of +[mapping rules](https://github.com/FlatFilers/flatfile-mapping/blob/main/README.md#mapping-rules) +that specify these more complex mapping-related transformations on your data, +and the notion of a _mapping program_ that selectively applies these rules to your dataset. + +As of right now the only rule type supported by the Flatfile application is the `assign` rule, +as described above. + +However, we have introduced a [Python SDK](developer-tools/mapping-libraries/flatfile-mapping-python) +that allows you to create your own mapping rules and mapping programs and apply them to your +own datasets outside of Flatfile (e.g. in an ETL pipeline). +The SDK is currently in extreme beta, but we would love to hear +your feedback on it. It supports the full complement of mapping rules, as described on its website. + +If you have a Flatfile secret key, the SDK also exposes an "automapping" feature that uses +the previously-described machine learning models to suggest mapping programs for a given +source and destination schema. As with the Flatfile app, this feature currently only provides `assign` rules +(and also `ignore` rules that don't do anything but just communicate that a source field was not mapped). + +## The Ghost of Mapping Future + +In the near future we will be expanding mapping in several ways: + +### Richer Mapping Rules in the Flatfile application + +We are working on adding support for the full set of mapping rules +in the Flatfile application. This will allow a user mapping an incoming file +to create complex mapping programs, save them, and reuse them. + +### Richer AI Assist for Mapping Rules + +Currently our machine learning algorithms only suggest "assign" rules. +In the future they will also be able to suggest more complex mapping rules; +for instance, if your incoming data had a "first name" and "last name" field, +and your destination sheet only a "full name" field, the AI might suggest a +"concatenate" rule that combines the two source fields into the destination field. + +In addition, we are working on a natural-language interface for creating mapping rules; +for instance, you could type "concatenate first name and last name into full name" +and the AI would produce the appropriate rule. + +### JavaScript SDK + +We are working on a JavaScript SDK that has similar functionality to the Python SDK, +the most notable difference being that it can't interact with Pandas dataframes. +It will be available soon. diff --git a/developer-tools/mapping-libraries/flatfile-mapping-python.mdx b/developer-tools/mapping-libraries/flatfile-mapping-python.mdx new file mode 100644 index 00000000..e3217c8a --- /dev/null +++ b/developer-tools/mapping-libraries/flatfile-mapping-python.mdx @@ -0,0 +1,9 @@ +--- +title: "flatfile-mapping (Python)" +description: "Python library for running mapping programs over your own data" +icon: "github" +--- + +See documentation on GitHub: + +https://github.com/flatfilers/flatfile-mapping diff --git a/mint.json b/mint.json index 8caa9242..4e0a2cb1 100644 --- a/mint.json +++ b/mint.json @@ -113,6 +113,7 @@ "concepts/workbooks", "concepts/actions", "concepts/jobs", + "concepts/mapping", "concepts/events", "concepts/spaces", "concepts/listeners" @@ -238,6 +239,10 @@ "developer-tools/core-libraries/wrappers/vanilla" ] }, + { + "group": "Mapping libraries", + "pages": ["developer-tools/mapping-libraries/flatfile-mapping-python"] + }, { "group": "Plugins", "pages": ["plugins/overview"]