Conversion Schema
The Conversion Schema defines which relational entities are converted to which graph elements (nodes and relationships).
As seen in the Introduction, the Converter expects resources as inputs. A resource is a wrapper around a relational entity.
Each resource has a type associated with it that corresponds to the type of entity it wraps.
The type must be defined for every Resource instance and accessible at Resource.type.
In the schema file, we specify for each type what the Converter should do when it encounters this type.
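As a rough illustration of this contract, the following sketch shows a minimal resource-like wrapper in plain Python. It does not use the library's own Resource class; RowResource, entity_type and row are hypothetical names, and only the type attribute mirrors what the text above requires.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class RowResource:
    """Hypothetical stand-in for a resource: wraps one relational row."""

    entity_type: str     # name of the entity this row belongs to
    row: Dict[str, Any]  # column name -> value

    @property
    def type(self) -> str:
        # The schema entry to apply is selected by this value.
        return self.entity_type

    def __getitem__(self, key: str) -> Any:
        # Attribute access, comparable to Flower.sepal_length in the schema.
        return self.row[key]


flower = RowResource("Flower", {"sepal_length": 3.1, "species": "setosa"})
print(flower.type, flower["sepal_length"])
```

The point of the sketch is only that every wrapped entity carries a type string, which is what the schema is keyed on.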
The schema allows for one-to-one, one-to-many and many-to-one (with the help of merging) conversions of relational entities to graph elements (nodes and relationships).
Note that the conversion schema compiler that the Converter uses to parse the provided schema performs only limited semantic checking. Make sure that you write correct conversion schema syntax.
Otherwise, errors or unexpected behaviour may occur at runtime.
We define the conversion schema in a schema file. The file follows a modified YAML syntax. We will now look at a simple example with flowers, flower species, and people who like flowers.
ENTITY("Flower"):
    NODE("Flower") flower:
        - sepal_length = Flower.sepal_length
        - sepal_width = Flower.sepal_width
        - petal_length = Flower.petal_length
        - petal_width = Flower.petal_width
    NODE("Species", "BioEntity") species:
        + Name = Flower.species
    RELATIONSHIP(flower, "is", species):

ENTITY("Person"):
    NODE("Person") person:
        + ID = Person.ID
        - FirstName = Person.FirstName
        - LastName = Person.LastName
    RELATIONSHIP(person, "likes", MATCH("Species", Name=Person.FavoriteFlower)):
        - Since = "4ever"
Entity
We define the entity type with ENTITY(type): followed by the definitions of the graph elements that this type should be converted to.
A schema file can contain multiple entity type definitions. Under an entity type definition (in the YAML sense, i.e. indented below it), we can refer to attributes of the type with type.attribute_name.
So if the table for our “Flower” entity looks as follows:
Table “Flower”
sepal_length | sepal_width | petal_length | petal_width | species
---|---|---|---|---
3.1 | 2.3 | 1.2 | 1.6 | setosa
2.4 | 1.3 | 1.3 | 0.6 | setosa
… | … | … | … | …
we can access the sepal_length attribute with Flower.sepal_length.
An entity can be converted into multiple graph elements: NODEs and RELATIONSHIPs.
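To make the one-to-many conversion concrete, here is a hand-written Python sketch of what the ENTITY("Flower") block of the example schema describes for a single row. It is not how the Converter is implemented; convert_flower and the dictionary layout of nodes and relationships are assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

Node = Dict[str, Any]
Relationship = Tuple[Node, str, Node]


def convert_flower(row: Dict[str, Any]) -> Tuple[List[Node], List[Relationship]]:
    """Illustrative equivalent of the ENTITY("Flower") schema block."""
    flower: Node = {
        "labels": ["Flower"],
        "attributes": {
            "sepal_length": row["sepal_length"],
            "sepal_width": row["sepal_width"],
            "petal_length": row["petal_length"],
            "petal_width": row["petal_width"],
        },
    }
    species: Node = {
        "labels": ["Species", "BioEntity"],
        "attributes": {"Name": row["species"]},
    }
    # One row yields two nodes and one relationship between them.
    return [flower, species], [(flower, "is", species)]


nodes, relationships = convert_flower(
    {
        "sepal_length": 3.1,
        "sepal_width": 2.3,
        "petal_length": 1.2,
        "petal_width": 1.6,
        "species": "setosa",
    }
)
print(len(nodes), len(relationships))
```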
Node
A node is defined with NODE(label1, label2, …); between the parentheses, we define the labels that the node should have.
We can also derive a label from an attribute of the entity: NODE(type.attribute_name) (e.g. NODE("Flower", Flower.species) adds the species name as a label to the node).
After a node definition, we can specify its internal identifier. The identifier is optional and is used to refer to this node when creating relationships, as we will see later.
The identifier is only valid within one entity and must therefore be unique per entity. The full syntax for defining a node is:
NODE(label1, label2, ...) identifier:
Attributes
We can define the attributes of a node or a relationship beneath it as follows (indented, following the YAML format):
- attribute_name = type.entity_attribute_name
Going back to our example, if the node with identifier flower should have an attribute named sepal_length that contains the value of the attribute sepal_length of the entity “Flower”, we write
- sepal_length = Flower.sepal_length
The attribute name of the node or relationship does not have to match the attribute name of the entity. We could also write
- sl = Flower.sepal_length
to get a node with the attribute sl.
We can also set static attribute values (strings, ints, floats or bools (True/False)) with
- a_static_string_attribute = "some string"
- a_static_bool = True
- a_static_int = 1
- a_static_float = 1.123
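The two kinds of attribute values (read from the entity, or static) can be sketched in Python as follows. This is only a model of the behaviour described above; FromEntity and build_attributes are hypothetical helpers, not part of the library.

```python
from typing import Any, Dict


class FromEntity:
    """Hypothetical marker for a value read from the entity, e.g. Flower.sepal_length."""

    def __init__(self, column: str) -> None:
        self.column = column


def build_attributes(spec: Dict[str, Any], row: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve entity lookups against the row; pass static values through unchanged."""
    return {
        name: row[value.column] if isinstance(value, FromEntity) else value
        for name, value in spec.items()
    }


spec = {
    "sl": FromEntity("sepal_length"),            # - sl = Flower.sepal_length
    "a_static_string_attribute": "some string",  # static string
    "a_static_bool": True,                       # static bool
    "a_static_int": 1,                           # static int
}
attributes = build_attributes(spec, {"sepal_length": 3.1})
print(attributes)
```

Note that the node attribute is called sl while the entity column is sepal_length, which is the renaming case mentioned above.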
Relationship
A relationship is declared with RELATIONSHIP(source node(s), relationship type, destination node(s)). The relationship type is a simple string that represents the relationship's name.
This creates relationships on the Cartesian product of the source node(s) and the destination node(s), i.e. from every source to every destination.
We have two options for setting the source and destination nodes:
- Use a node identifier (note that the node must be defined above the relationship declaration under the same entity). This lets us refer to a single node. E.g. in our example, we have defined the two nodes with identifiers flower and species. We can now define a relationship between those two with RELATIONSHIP(flower, "is", species).
- Use the MATCH keyword. With a matcher, we can query for arbitrary nodes in the graph. This is useful when the node we want to refer to comes from a different instance of the same entity, from another entity, or already exists in the graph. A matcher can return a single node or multiple nodes.
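The Cartesian-product behaviour matters mostly when a matcher returns several nodes. A small Python sketch, using plain strings as stand-ins for nodes (the names are made-up sample data, and cartesian_relationships is a hypothetical helper):

```python
from itertools import product
from typing import List, Tuple


def cartesian_relationships(
    sources: List[str], rel_type: str, destinations: List[str]
) -> List[Tuple[str, str, str]]:
    """Create one relationship from every source to every destination."""
    return [(src, rel_type, dst) for src, dst in product(sources, destinations)]


rels = cartesian_relationships(
    ["alice", "bob"], "likes", ["setosa", "versicolor", "virginica"]
)
print(len(rels))  # 6 relationships: 2 sources x 3 destinations
```

With a single source and a single destination, as in RELATIONSHIP(flower, "is", species), the product contains exactly one relationship.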
Match
The match syntax works as follows: MATCH(label1, label2, ..., attribute1=value1, attribute2=value2, ...). We first specify the labels that the matched node(s) must have.
We can define arbitrarily many labels, and the labels themselves can be extracted from the entity as we did with the nodes (e.g. Flower.species).
We then define the conditions that the matched nodes must meet by specifying which attributes have which values. We write myattribute=myvalue, meaning that any matched node must have the value myvalue for its attribute myattribute.
Again, the value can be extracted from the entity (e.g. Name=Person.FavoriteFlower). We can specify an arbitrary number of conditions.
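The matching rule described above (all listed labels present, all attribute conditions satisfied) can be modelled over an in-memory list of nodes. This is a sketch of the semantics only; the match function and the dictionary-based nodes are assumptions, and the library queries the actual graph instead.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]


def match(nodes: List[Node], *labels: str, **conditions: Any) -> List[Node]:
    """Return nodes that carry every given label and satisfy every condition."""
    return [
        node
        for node in nodes
        if set(labels) <= set(node["labels"])
        and all(
            key in node["attributes"] and node["attributes"][key] == value
            for key, value in conditions.items()
        )
    ]


graph = [
    {"labels": ["Species", "BioEntity"], "attributes": {"Name": "setosa"}},
    {"labels": ["Species", "BioEntity"], "attributes": {"Name": "versicolor"}},
    {"labels": ["Person"], "attributes": {"ID": 1}},
]

# Comparable to MATCH("Species", Name=Person.FavoriteFlower) for a person
# whose FavoriteFlower is "setosa".
hits = match(graph, "Species", Name="setosa")
print(len(hits))
```

Calling match(graph, "Species") without conditions returns both species nodes, which is the case where a matcher yields multiple nodes.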
Merging nodes
If we expect that a node is created multiple times and we want to ensure that it is only created once, we can specify a primary attribute. This would be the case for the node with identifier species in our example. If multiple rows of the “Flower” table contain the same entry for the “species” column, the converter will create a node for this species for each row. So for our example table, we would end up with at least two “setosa” nodes. However, what we want is only one node for each species present.
For this purpose, we can specify a primary attribute and a primary label to merge a node with the graph.
The primary label is always the first one mentioned, so the node definition takes the form NODE(primary_label, label2, ...). The primary attribute is set by replacing the - in the attribute definition with a +:
+ attribute_name = type.entity_attribute_name
If the Converter detects that a primary attribute is set, it will only create a new node if no node with the same primary label and primary attribute exists in the graph.
If the node already exists, it is updated, i.e. new attributes are added, and existing attributes are updated according to the specified values.
We can also use this to create nodes with information from different entity types. For example, let's assume we have an entity “Person” and an entity “Employee”, both of which contain a property that is unique per person:
Table “Person”
personId | name
---|---
… | …
… | …
Table “Employee”
personId | employer
---|---
… | …
… | …
To create a node that contains attributes of both the entity “Person” and the entity “Employee”, we can use the syntax explained above to merge the nodes:
ENTITY("Person"):
    NODE("Person"):
        + id = Person.personId
        - name = Person.name

ENTITY("Employee"):
    NODE("Person"):
        + id = Employee.personId
        - employer = Employee.employer
If you now supply both entities to the converter for every person, the resulting nodes will have all the attributes id, name and employer. Note that if you don't supply both entities for a person, the node will only contain the information from the single entity that was supplied.
Merging relationships
You can explicitly merge relationships by specifying a primary attribute. The syntax is the same as for nodes:
RELATIONSHIP(from, "type", to):
    + primary_attribute = Entity.ID
    - other_attribute = Entity.other
If you don’t specify a primary attribute and two entities result in the same from and to nodes, two parallel relationships will be created. If you don’t want to set a primary attribute but still want to merge the relationships, use the MERGE_RELATIONSHIPS wrapper.
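The difference between parallel and merged relationships can be sketched as follows. The sketch assumes that two relationships are merged when they share source, type, target and primary attribute value; that rule, the add_relationship helper and the string stand-ins for nodes are assumptions for illustration, not the library's documented behaviour.

```python
from typing import Any, Dict, List, Optional

Relationship = Dict[str, Any]


def add_relationship(
    relationships: List[Relationship],
    source: str,
    rel_type: str,
    target: str,
    attributes: Dict[str, Any],
    primary: Optional[str] = None,
) -> Relationship:
    """Append a relationship, or merge into an existing one if `primary` is given."""
    if primary is not None:
        for rel in relationships:
            same_ends = (rel["source"], rel["type"], rel["target"]) == (source, rel_type, target)
            same_key = primary in rel["attributes"] and rel["attributes"][primary] == attributes[primary]
            if same_ends and same_key:
                rel["attributes"].update(attributes)  # merged: no new relationship
                return rel
    rel = {"source": source, "type": rel_type, "target": target, "attributes": dict(attributes)}
    relationships.append(rel)
    return rel


# Without a primary attribute, two parallel relationships are created.
parallel: List[Relationship] = []
add_relationship(parallel, "person_1", "likes", "setosa", {"Since": "4ever"})
add_relationship(parallel, "person_1", "likes", "setosa", {"Since": "4ever"})

# With a primary attribute, the second call updates the first relationship.
merged: List[Relationship] = []
add_relationship(merged, "person_1", "likes", "setosa", {"ID": 1, "Since": "4ever"}, primary="ID")
add_relationship(merged, "person_1", "likes", "setosa", {"ID": 1, "Since": "2020"}, primary="ID")

print(len(parallel), len(merged))  # 2 1
```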
Wrappers
If you have registered wrappers (see here), you can refer to them in the conversion schema.
You simply use the syntax NameOfWrapper(wrappedcontent), similar to how you call a function. Find examples below.
Assuming you have defined the attribute wrappers ATTRWRAPPER1 and ATTRWRAPPER2, as well as the subgraph wrappers SGWRAPPER1 and SGWRAPPER2:
- name = ATTRWRAPPER2(ATTRWRAPPER1(Person.personId))
RELATIONSHIP(person, ATTRWRAPPER2("likes"), MATCH(ATTRWRAPPER1("Species"), Name=ATTRWRAPPER1(Person.FavoriteFlower))):
SGWRAPPER1(RELATIONSHIP(person, "likes", MATCH(ATTRWRAPPER1("Species"), Name=ATTRWRAPPER1(Person.FavoriteFlower)))):
SGWRAPPER2(SGWRAPPER1(NODE("Flower"))):
Note that the library performs only limited semantic checking of your schema. If you apply an attribute wrapper to a node or a relationship, the outcome is undefined and may result in unexpected behaviour or exceptions at runtime.