-
Notifications
You must be signed in to change notification settings - Fork 0
transform bodyXMLto external content-tree in Go #110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
transform bodyXMLto external content-tree in Go #110
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very important bit of the journey 👏
|
||
// Transform converts an external XHTML-formatted document into a content tree. | ||
// It returns an error if the input contains unsupported HTML elements | ||
// or does not comply with the content tree schema. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought (non-blocking): This is just food for thought. The JSON schemas have stricter rules than what we represent with the Go types. So I am thinking that we may want to still validate what is produced by this Transform
. But I think it's better to do so where this Transform
is used, in our case in the Content Tree API.
"http://www.ft.com/ontology/content/CustomCodeComponent": "/content/{{id}}", | ||
"http://www.ft.com/ontology/content/MediaResource": "/content/{{id}}", | ||
"http://www.ft.com/ontology/content/Video": "/content/{{id}}", | ||
"http://www.ft.com/ontology/company/PublicCompany": "/organisations/{{id}}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: I am pretty sure that some old content pieces will have:
"http://www.ft.com/ontology/company/Organisation", let's add this one as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mimicked the transformation we perform in content-public-read. I don't see this URL their in the configs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we map http://www.ft.com/ontology/company/Organisation
to /organisations/{{id}}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my previous comment is not very useful. I saw that contentTypeTemplates
is used in generateUrl
and used in mapping t.Tag == "concept"
. However if we need to count for the <concept>/<ft-concept>
tags, then we need to support a full mapping of all concepts that could go in the tag like Organisation, Person, etc
However we decided that we are going to remove the tag as part of @vlasakievft efforts to clean the historical content.
@lokendersinghft I suggest to keep the list as it is. At some point the transformer shouldn't need to work with <concept>/<ft-concept>
.
cb15bbd
to
33fa4d2
Compare
out := &contenttree.Root{ | ||
Type: contenttree.RootType, | ||
Body: m, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question, thought: Do we want to return this root node from the transformer? Isn't it the idea to give old (and new content published without tree) a tree body representation using this transformer. The current schema we use for validating payloads expects the following format:
{ "type": "body", "version": 1, "children": [ { ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided to keep it this way so that we can easily introduce stuff to it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (non-blocking): Consider whether you can separate the convertToContentTree
into smaller methods. For example one for the case *etree.Element
and one for the *etree.CharData
. In this way you would avoid the nested switch statements.
var contentType = struct { | ||
ImageSet string | ||
Video string | ||
Content string | ||
Article string | ||
CustomCodeComponent string | ||
ClipSet string | ||
}{ | ||
ImageSet: "http://www.ft.com/ontology/content/ImageSet", | ||
Video: "http://www.ft.com/ontology/content/Video", | ||
Content: "http://www.ft.com/ontology/content/Content", | ||
Article: "http://www.ft.com/ontology/content/Article", | ||
CustomCodeComponent: "http://www.ft.com/ontology/content/CustomCodeComponent", | ||
ClipSet: "http://www.ft.com/ontology/content/ClipSet", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion(non-blocking): Consider this:
type contentType = string
const (
ImageSet contentType = "http://www.ft.com/ontology/content/ImageSet"
Video = "http://www.ft.com/ontology/content/Video"
Content = "http://www.ft.com/ontology/content/Content"
Article = "http://www.ft.com/ontology/content/Article"
CustomCodeComponent = "http://www.ft.com/ontology/content/CustomCodeComponent"
ClipSet = "http://www.ft.com/ontology/content/ClipSet"
)
|
||
## Overview | ||
The Transformer converts external XHTML-formatted document into content tree. | ||
It supports format stored in the **internalComponent** collection as well as the one returned by the **Internal Content API**. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: internalcomponents
collection.
nitpick: Maybe we can expand the description slightly to mention what is the difference between the two representations supported by the transformer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added tad bit of more information on this.
33fa4d2
to
807a56c
Compare
807a56c
to
66c99a0
Compare
This PR introduces a Go-based bodyXML → content tree transformer. The new transformer produces output in 1:1 parity with the current Node.js implementation, with the following differences:
JIRA --> https://financialtimes.atlassian.net/browse/UPPSF-6453