Core

The main API and classes provided by the library.

Converter

class data2neo.Converter[source]

Bases: object

The converter handles the whole conversion pipeline.

__init__(schema: str, iterator: ResourceIterator, neo4j_uri: str, neo4j_auth: Auth, num_workers: Optional[int] = None, serialize: bool = False, batch_size: int = 5000) None[source]

Initialises a converter. Note that this is a singleton and only the most recent instantiation is valid.

Parameters
  • schema – The schema to convert.

  • iterator – The resource iterator.

  • neo4j_uri – The uri of the neo4j database.

  • neo4j_auth – The authentication for the neo4j database.

  • num_workers – The number of parallel workers. Please make sure that your usage supports parallelism. To use serial processing set this to 1. (default: cpu_count-2)

  • serialize – If true, the converter processes all resources serially and does not use any buffering. This is useful if you need resources to be processed and committed to the graph in the same order as the iterator returns them. Note that serialize cannot be set to true together with num_workers > 1. (default: False)

  • batch_size – The batch size for the parallel processing. (default: 5000)
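The interaction between num_workers and serialize described above can be sketched as a small validation helper. This is only an illustration of the documented constraints, not the library's implementation: the function name resolve_num_workers is hypothetical, and both the fallback to one worker when serialize is true and the clamping of the default to at least 1 are assumptions.

```python
import os
from typing import Optional


def resolve_num_workers(num_workers: Optional[int], serialize: bool) -> int:
    """Hypothetical helper (not part of data2neo) that mirrors the documented rules."""
    if num_workers is None:
        if serialize:
            # Assumption: serial processing implies a single worker.
            return 1
        # Documented default is cpu_count - 2; clamping to 1 is an assumption.
        return max(1, (os.cpu_count() or 1) - 2)
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if serialize and num_workers > 1:
        # Documented constraint: serialize and num_workers > 1 are mutually exclusive.
        raise ValueError("serialize=True cannot be combined with num_workers > 1")
    return num_workers
```

Calling resolve_num_workers(1, True) returns 1, whereas resolve_num_workers(4, True) raises a ValueError in this sketch.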

property iterator: ResourceIterator

Gets the resource iterator.

__call__(progress_bar: tqdm.tqdm = None, skip_nodes=False, skip_relationships=False) None[source]

Runs the conversion and commits the produced nodes and relationships to the graph.

Parameters
  • progress_bar – An optional tqdm like instance for a progress bar.

  • skip_nodes – If true, the creation of nodes is skipped. ATTENTION: this might lead to problems if you use identifiers. (default: False)

  • skip_relationships – If true, the creation of relationships is skipped. (default: False)

Resource

class data2neo.Resource[source]

Bases: object

Abstract Resource class. Contains everything a factory needs to produce its output. It must be subclassed, and the subclass must implement the abstract members.

__init__() None[source]

Inits a Resource

property supplies: Dict

Returns access to supplies from past factories. Is used to pass data between factories. This should not be customised.

abstract property type: str

Returns the type of the resource. Used to select the correct factory.

abstract __repr__() str[source]

Gets a string representation of the resource. Only used for logging.

Should follow the format: NameOfResource ‘TypeOfResource’ (DetailsAboutResource)

Example implementation: f"{super().__repr__()} ({self.somedetail})"

abstract __getitem__(key: str) str[source]

Gets the value with key ‘key’.

abstract __setitem__(key: str, value: str) None[source]

Sets the value with key ‘key’.

clear_supplies() None[source]

Clears the supplies
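As a rough illustration of the interface described above, the following sketch implements a dictionary-backed resource. To keep it runnable without the library, it defines its own stand-in Resource base class that mirrors the documented members; that stand-in, the DictResource name and the _data attribute are all assumptions made for this example. In real code one would subclass data2neo.Resource instead.

```python
from abc import ABC, abstractmethod
from typing import Dict


class Resource(ABC):
    """Stand-in for data2neo.Resource, mirroring only the documented interface."""

    def __init__(self) -> None:
        self._supplies: Dict = {}

    @property
    def supplies(self) -> Dict:
        return self._supplies

    def clear_supplies(self) -> None:
        self._supplies = {}

    @property
    @abstractmethod
    def type(self) -> str: ...

    @abstractmethod
    def __getitem__(self, key: str) -> str: ...

    @abstractmethod
    def __setitem__(self, key: str, value: str) -> None: ...

    @abstractmethod
    def __repr__(self) -> str:
        # Assumed base format: NameOfResource 'TypeOfResource'
        return f"{type(self).__name__} '{self.type}'"


class DictResource(Resource):
    """Hypothetical resource that wraps a plain dictionary."""

    def __init__(self, type_name: str, data: Dict[str, str]) -> None:
        super().__init__()
        self._type = type_name
        self._data = dict(data)

    @property
    def type(self) -> str:
        return self._type

    def __getitem__(self, key: str) -> str:
        return self._data[key]

    def __setitem__(self, key: str, value: str) -> None:
        self._data[key] = value

    def __repr__(self) -> str:
        # Follows the recommended pattern: base representation plus details.
        return f"{super().__repr__()} ({self._data})"


person = DictResource("Person", {"name": "Ada"})
person["city"] = "London"
```

The type property is what a converter would use to pick the matching factory, while item access exposes the underlying data to that factory.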

ResourceIterator

class data2neo.ResourceIterator[source]

Bases: ABC

Allows the Converter to iterate over resources. The same range can be iterated over twice.

abstract __init__() None[source]

__next__() Resource[source]

Gets the next resource that will be converted. Raises StopIteration once the range has been fully traversed.

abstract __len__() int[source]

Returns the total number of resources in the iterator.

abstract __iter__() Iterable[source]

Returns the iterator itself in its initial state (must return the first resource).
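The requirement that the same range can be traversed twice can be met by resetting the position in __iter__. The sketch below shows this for an in-memory list. The class name ListResourceIterator is hypothetical, and the data2neo.ResourceIterator base class is deliberately omitted so that the example runs on its own; plain strings stand in for Resource objects.

```python
from typing import Any, List


class ListResourceIterator:
    """Hypothetical iterator over an in-memory list of resources.

    In real use this would subclass data2neo.ResourceIterator.
    """

    def __init__(self, resources: List[Any]) -> None:
        self._resources = list(resources)
        self._position = 0

    def __iter__(self) -> "ListResourceIterator":
        # Reset, so that every new pass starts at the first resource.
        self._position = 0
        return self

    def __next__(self) -> Any:
        if self._position >= len(self._resources):
            raise StopIteration
        resource = self._resources[self._position]
        self._position += 1
        return resource

    def __len__(self) -> int:
        return len(self._resources)


iterator = ListResourceIterator(["a", "b", "c"])
first_pass = list(iterator)
second_pass = list(iterator)  # yields the same items because __iter__ resets
```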

IteratorIterator

class data2neo.IteratorIterator[source]

Bases: ResourceIterator

Allows iterating over a list of iterators.

__init__(iterators: List[ResourceIterator]) None[source]

Initialises an IteratorIterator.

Parameters

iterators – List of ResourceIterators
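One way to picture iterating over a list of iterators is to exhaust each one in turn and report the combined length. The following is a minimal sketch of that idea and makes no claim about how IteratorIterator is actually implemented. The class name ChainedIterator is hypothetical, and plain lists stand in for ResourceIterator instances.

```python
from typing import Any, List


class ChainedIterator:
    """Hypothetical sketch: walks through several sized iterables in order."""

    def __init__(self, iterators: List[Any]) -> None:
        self._iterators = iterators
        self._index = 0
        self._current = None

    def __iter__(self) -> "ChainedIterator":
        # Restart from the first underlying iterator.
        self._index = 0
        self._current = iter(self._iterators[0]) if self._iterators else None
        return self

    def __next__(self) -> Any:
        while self._current is not None:
            try:
                return next(self._current)
            except StopIteration:
                # Current iterator is exhausted: move on to the next one, if any.
                self._index += 1
                if self._index < len(self._iterators):
                    self._current = iter(self._iterators[self._index])
                else:
                    self._current = None
        raise StopIteration

    def __len__(self) -> int:
        # Total number of items across all underlying iterators.
        return sum(len(it) for it in self._iterators)


chained = ChainedIterator([["a", "b"], ["c"]])
items = list(chained)
```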

Attribute

class data2neo.Attribute[source]

Bases: object

Represents an attribute of a Node or a Relation.

key

String signifying the key of the attribute

value

Can be any value that is allowed in the graph

__init__(key: str, value: Union[str, int, float, bool, datetime]) None[source]

Inits an attribute with a key and a value

Parameters
  • key – String signifying the key of the attribute

  • value – Can be any value that is allowed in the graph (String, Int, Float, Bool)

property key

String signifying the key of the attribute

property value

Any value that is allowed in the graph (String, Int, Float, Bool)

SubgraphFactoryWrapper

class data2neo.SubgraphFactoryWrapper[source]

Bases: FactoryWrapper, SubgraphFactory

Factory wrapper for any SubgraphFactory. Allows inserting pre- and postprocessor functions that are called on the resource before, and on the subgraph after, the .construct function of the wrapped factory. This factory behaves like a normal SubgraphFactory and can be wrapped again.

factory

The wrapped SubgraphFactory

__init__(factory: SubgraphFactory, preprocessor: Optional[Callable[[Resource], Resource]] = None, postprocessor: Optional[Callable[[Subgraph], Subgraph]] = None, identifier: Optional[str] = None) None[source]

Inits the SubgraphFactoryWrapper with a factory and a pre- and postprocessor.

Parameters
  • factory – The SubgraphFactory that should be wrapped

  • preprocessor – A callable that takes a Resource, processes it and returns a resource (which is then passed to the factory). (default: None)

  • postprocessor – A callable that takes the Subgraph, processes it and returns another Subgraph. (default: None)

  • identifier – A string identifying this Factory instance, must be unique. Can be None if factory doesn’t need to save unique supplies

construct(resource: Resource) Subgraph[source]

Runs the preprocessor on the resource, uses the factory to construct a Subgraph from the resource and runs the postprocessor on this Subgraph.

If resource is None, this method returns an empty Subgraph.

Parameters

resource – A Resource containing any information needed for the construction
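The order of operations in construct can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the class and function names are hypothetical, a list of strings stands in for a Subgraph, a dictionary stands in for a Resource, and the None check is assumed to apply to the preprocessor's result, so that a preprocessor can filter out resources.

```python
from typing import Callable, List, Optional


class NameNodeFactory:
    """Hypothetical factory; returns a list as a stand-in for a Subgraph."""

    def construct(self, resource: dict) -> List[str]:
        return [resource["name"]]


class SubgraphWrapperSketch:
    """Hypothetical wrapper showing the preprocess, construct, postprocess order."""

    def __init__(
        self,
        factory,
        preprocessor: Optional[Callable[[dict], Optional[dict]]] = None,
        postprocessor: Optional[Callable[[List[str]], List[str]]] = None,
    ) -> None:
        self._factory = factory
        self._preprocessor = preprocessor
        self._postprocessor = postprocessor

    def construct(self, resource: Optional[dict]) -> List[str]:
        if self._preprocessor is not None:
            resource = self._preprocessor(resource)
        if resource is None:
            # Assumption: checked after preprocessing; empty stand-in subgraph.
            return []
        subgraph = self._factory.construct(resource)
        if self._postprocessor is not None:
            subgraph = self._postprocessor(subgraph)
        return subgraph


def skip_unnamed(resource: dict) -> Optional[dict]:
    # Drop resources without a usable name by returning None.
    return resource if resource.get("name") else None


def add_marker(subgraph: List[str]) -> List[str]:
    return subgraph + ["processed"]


wrapped = SubgraphWrapperSketch(NameNodeFactory(), skip_unnamed, add_marker)
kept = wrapped.construct({"name": "Ada"})
dropped = wrapped.construct({"name": ""})
```

Here kept contains the constructed entry followed by the marker added by the postprocessor, while dropped is empty because the preprocessor returned None.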

property factory: Factory

The wrapped factory

property id: str

A string that uniquely identifies this factory instantiation

AttributeFactoryWrapper

class data2neo.AttributeFactoryWrapper[source]

Bases: FactoryWrapper, AttributeFactory

Factory wrapper for any AttributeFactory. Allows inserting pre- and postprocessor functions that are called on the resource before, and on the attribute after, the .construct function of the wrapped factory. This factory behaves like a normal AttributeFactory and can be wrapped again.

identifier

A string identifying this Factory instance, must be unique

factory

The wrapped AttributeFactory

attribute_key

The key that any produced Attribute has.

static_attribute_value

If this is set, then any attribute produced by this factory will have this string as its value

entity_attribute

A key of an attribute of the expected resource entity.

__init__(factory: AttributeFactory, preprocessor: Optional[Callable[[Resource], Resource]] = None, postprocessor: Optional[Callable[[Attribute], Attribute]] = None, identifier: Optional[str] = None) None[source]

Inits the AttributeFactoryWrapper with a factory and a pre- and postprocessor.

Parameters
  • factory – The AttributeFactory that should be wrapped

  • preprocessor – A callable that takes a Resource, processes it and returns a resource (which is then passed to the factory). (default: None)

  • postprocessor – A callable that takes the Attribute, processes it and returns another Attribute. (default: None)

  • identifier – A string identifying this Factory instance, must be unique. Can be None if factory doesn’t need to save unique supplies

construct(resource: Resource) Attribute[source]

Runs the preprocessor on the resource, uses the factory to construct an Attribute from the resource and runs the postprocessor on this Attribute.

If resource is None, this method returns None.

Parameters

resource – A Resource containing any information needed for the construction
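Because a wrapper behaves like a normal AttributeFactory, wrappers can be nested so that each layer adds one processing step. The sketch below illustrates this with stand-ins and is not the library's implementation: all class names are hypothetical, a (key, value) tuple stands in for an Attribute, and a dictionary stands in for a Resource.

```python
from typing import Callable, Optional, Tuple

AttributeSketch = Tuple[str, str]  # (key, value) stand-in for an Attribute


class FieldAttributeFactory:
    """Hypothetical factory that reads one field of the resource."""

    def __init__(self, attribute_key: str, entity_attribute: str) -> None:
        self.attribute_key = attribute_key
        self.entity_attribute = entity_attribute

    def construct(self, resource: dict) -> AttributeSketch:
        return (self.attribute_key, resource[self.entity_attribute])


class AttributeWrapperSketch:
    """Hypothetical wrapper; exposes construct() so it can be wrapped again."""

    def __init__(
        self,
        factory,
        preprocessor: Optional[Callable[[dict], Optional[dict]]] = None,
        postprocessor: Optional[Callable[[AttributeSketch], AttributeSketch]] = None,
    ) -> None:
        self._factory = factory
        self._preprocessor = preprocessor
        self._postprocessor = postprocessor

    def construct(self, resource: Optional[dict]) -> Optional[AttributeSketch]:
        if self._preprocessor is not None:
            resource = self._preprocessor(resource)
        if resource is None:
            return None  # documented behaviour for a None resource
        attribute = self._factory.construct(resource)
        if self._postprocessor is not None:
            attribute = self._postprocessor(attribute)
        return attribute


# Inner wrapper strips whitespace, outer wrapper upper-cases the value.
inner = AttributeWrapperSketch(
    FieldAttributeFactory("name", "raw_name"),
    postprocessor=lambda attr: (attr[0], attr[1].strip()),
)
outer = AttributeWrapperSketch(
    inner,
    postprocessor=lambda attr: (attr[0], attr[1].upper()),
)
result = outer.construct({"raw_name": "  ada "})
```

The inner postprocessor runs first, so the outer one receives the already stripped value.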

property attribute_key: str

The key that any produced Attribute has.

property entity_attribute: str

A key of an attribute of the expected resource entity.

property factory: Factory

The wrapped factory

property id: str

A string that uniquely identifies this factory instantiation

property static_attribute_value: str

Static value for any produced attribute