Customising Resource and ResourceIterator

The Resource and ResourceIterator classes are abstract and need to be implemented for your specific relational input data type. The library comes with implementations for a few existing relational modules. If your data type is not supported, you can create implementations for your own data; see below for an explanation or check out one of the already implemented modules.

If you think your implementations could be helpful for others as well, read through the chapter Information for Developers and create a pull request.

Resource

The Resource is simply a wrapper around a relational entity. To create your own resource, create a new class that inherits from Resource. The parent constructor must be called with no arguments. Additionally, you need to implement the following methods: type, __getitem__, __setitem__, __repr__. See the docstrings below for more details.

from data2neo import Resource

class MyResource(Resource):
    def __init__(self, ...your constructor parameters...):
        super().__init__()
        # your constructor code
        ...

    @property
    def type(self) -> str:
        """
        Returns the type of the resource as a string. The @property decorator makes this a property of the resource. The type is then accessible via resource.type without parentheses.
        """
        ...

    def __getitem__(self, key: str) -> str:
        """
        Returns the value at key 'key' of the underlying relational entity.
        Makes an item retrievable via resource["mykey"].
        """
        ...

    def __setitem__(self, key: str, value: str) -> None:
        """
        Sets/Changes the value of the underlying relational entity at key 'key'.
        It must be able to change existing keys and also add new keys.
        Makes an item settable via resource["mykey"] = "somevalue".
        """
        ...

    def __repr__(self) -> str:
        """
        Gets a string representation of the resource. Only used for logging.

        Should follow the format:
        NameOfResource 'TypeOfResource' (DetailsAboutResource)

        super().__repr__() returns "NameOfResource 'TypeOfResource'"

        Example-Implementation:
        f"{super().__repr__()} ({self.somedetail})"
        """
        return f"{super().__repr__()} ({... some detail about your resource ...})"
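
As a concrete illustration of the skeleton above, here is a minimal sketch of a resource that wraps a plain Python dictionary. DictResource, its constructor parameters and the detail shown in __repr__ are hypothetical choices for this example and not part of data2neo. The fallback class is only a stand-in for data2neo.Resource so that the sketch also runs without the library installed; it mimics the documented behaviour of super().__repr__().

```python
try:
    from data2neo import Resource
except ImportError:
    # Stand-in for illustration only (not the library's code): mimics the
    # documented behaviour that super().__repr__() returns
    # "NameOfResource 'TypeOfResource'".
    class Resource:
        def __init__(self) -> None:
            pass

        def __repr__(self) -> str:
            return f"{type(self).__name__} '{self.type}'"


class DictResource(Resource):
    """Hypothetical resource that wraps one dictionary (e.g. one record)."""

    def __init__(self, data: dict, type_name: str) -> None:
        super().__init__()
        self._data = dict(data)  # copy so the caller's dict is not modified
        self._type = type_name

    @property
    def type(self) -> str:
        return self._type

    def __getitem__(self, key: str) -> str:
        return self._data[key]

    def __setitem__(self, key: str, value: str) -> None:
        # Works for existing keys and for new keys alike.
        self._data[key] = value

    def __repr__(self) -> str:
        return f"{super().__repr__()} (keys={sorted(self._data)})"


resource = DictResource({"name": "Alice"}, "Person")
resource["age"] = "42"      # add a new key
print(resource.type)        # Person
print(resource["name"])     # Alice
```

Copying the dictionary in the constructor is a design choice of this sketch: changes made through resource["key"] = value then stay local to the resource.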

ResourceIterator

If you create your own Resource, you also need to create a ResourceIterator for this Resource that allows the Converter to iterate over a set of resources. The iterator is based on Python iterators and needs to be written such that it can be traversed multiple times (the Converter will traverse it once for all nodes and then again for all relations).

Your iterator class must inherit from ResourceIterator, and its constructor must call the parent constructor with no arguments. Additionally, you need to implement the following methods: __iter__, __len__ and optionally __next__. See the docstrings below for more details. Note that your iterator can also traverse resources of different types.

from data2neo import Resource, ResourceIterator

class MyIterator(ResourceIterator):
    def __init__(self, ...your constructor parameters...) -> None:
        super().__init__()
        ...

    def __next__(self) -> Resource:
        """
        Returns the next resource in the iterator based on the current state.
        """
        ...

    def __iter__(self) -> ResourceIterator:
        """
        Resets the iterator state and returns the iterator itself. You can also use the __iter__ function to directly return an iterator and not
        use __next__.
        """
        ...

    def __len__(self) -> int:
        """
        Returns the total number of resources in the iterator. This is only required if you use a progress bar, where it is used to compute the percentage of completed work.
        """
        ...
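
To make the requirement of repeated traversal more tangible, the following sketch iterates over a list of dictionaries and resets its position in __iter__. RowResource and ListIterator are hypothetical names for this example and not part of data2neo. The fallback classes are stand-ins so that the sketch also runs without the library installed.

```python
try:
    from data2neo import Resource, ResourceIterator
except ImportError:
    # Stand-ins for illustration only, so the sketch runs without data2neo.
    class Resource:
        def __init__(self) -> None:
            pass

        def __repr__(self) -> str:
            return f"{type(self).__name__} '{self.type}'"

    class ResourceIterator:
        def __init__(self) -> None:
            pass


class RowResource(Resource):
    """Hypothetical resource wrapping one dictionary."""

    def __init__(self, row: dict, type_name: str) -> None:
        super().__init__()
        self._row = row
        self._type = type_name

    @property
    def type(self) -> str:
        return self._type

    def __getitem__(self, key: str) -> str:
        return self._row[key]

    def __setitem__(self, key: str, value: str) -> None:
        self._row[key] = value

    def __repr__(self) -> str:
        return f"{super().__repr__()} ({self._row})"


class ListIterator(ResourceIterator):
    """Hypothetical iterator over a list of dictionaries of one type."""

    def __init__(self, rows: list, type_name: str) -> None:
        super().__init__()
        self._resources = [RowResource(row, type_name) for row in rows]
        self._position = 0

    def __iter__(self) -> "ListIterator":
        self._position = 0  # reset so every traversal starts at the beginning
        return self

    def __next__(self) -> Resource:
        if self._position >= len(self._resources):
            raise StopIteration
        resource = self._resources[self._position]
        self._position += 1
        return resource

    def __len__(self) -> int:
        return len(self._resources)


iterator = ListIterator([{"name": "Alice"}, {"name": "Bob"}], "Person")
first_pass = [resource["name"] for resource in iterator]
second_pass = [resource["name"] for resource in iterator]
print(first_pass == second_pass)  # True
```

Without the reset in __iter__, the second traversal would start at the end of the list and yield nothing, which is exactly the situation the two passes over the iterator need to avoid.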

A note on multiprocessing: If you intend to parallelise your conversion with multiple workers (see chapter Converter), be aware that next(iterator) is not parallelised. If you want to leverage multiple threads for loading remote data, you must implement this in the Resource class (in __getitem__).
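
One way to follow this advice is to let the iterator create lightweight resources and defer the expensive loading to the first __getitem__ call. The sketch below shows the idea under stated assumptions: LazyResource, fetch_record, FAKE_REMOTE and fetch_log are hypothetical names, the "remote" source is simulated with an in-memory dictionary, and the fallback class is a stand-in so that the sketch runs without data2neo installed.

```python
try:
    from data2neo import Resource
except ImportError:
    # Stand-in for illustration only, so the sketch runs without data2neo.
    class Resource:
        def __init__(self) -> None:
            pass

        def __repr__(self) -> str:
            return f"{type(self).__name__} '{self.type}'"


# Simulated remote source and a log of which identifiers were fetched.
FAKE_REMOTE = {"1": {"name": "Alice"}, "2": {"name": "Bob"}}
fetch_log = []


def fetch_record(identifier: str) -> dict:
    """Hypothetical loader; a real one might query a web API or a database."""
    fetch_log.append(identifier)
    return dict(FAKE_REMOTE[identifier])


class LazyResource(Resource):
    """Hypothetical resource that loads its data on first access."""

    def __init__(self, identifier: str, type_name: str) -> None:
        super().__init__()
        self._identifier = identifier
        self._type = type_name
        self._data = None  # nothing is loaded when the resource is created

    def _load(self) -> dict:
        # Fetch once, then reuse the cached data for later accesses.
        if self._data is None:
            self._data = fetch_record(self._identifier)
        return self._data

    @property
    def type(self) -> str:
        return self._type

    def __getitem__(self, key: str) -> str:
        return self._load()[key]

    def __setitem__(self, key: str, value: str) -> None:
        self._load()[key] = value

    def __repr__(self) -> str:
        return f"{super().__repr__()} (id={self._identifier})"


resource = LazyResource("1", "Person")
print(fetch_log)         # []
print(resource["name"])  # Alice
print(resource["name"])  # Alice
print(fetch_log)         # ['1']
```

In this arrangement next(iterator) only has to hand out identifiers, so the slow part of the work happens wherever the resource is actually read.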

IteratorIterator

The IteratorIterator allows you to iterate over multiple “sub”-iterators. There are no restrictions on the “sub”-iterators, as long as they are of type ResourceIterator. Since an IteratorIterator is also of type ResourceIterator, it can be used recursively.

from data2neo import IteratorIterator
iterator1 = ... # An iterator
iterator2 = ... # Another iterator
itit = IteratorIterator([iterator1, iterator2])

Existing relational modules

Pandas

With the PandasDataFrameIterator you can wrap a pandas dataframe. If you pass a pandas dataframe to the PandasDataFrameIterator, it will automatically create a PandasSeriesResource for each row (series) and iterate over them. Since a dataframe has no type associated with it, you also need to provide a type name.

import pandas
from data2neo.relational_modules.pandas import PandasDataFrameIterator
iterator = PandasDataFrameIterator(pandas.DataFrame(...), "MyType")

SQLite

With the SQLiteIterator you can iterate over an SQLite database. You need to provide a connection to the database. You can also provide a list of tables to iterate over. If you do not provide a list of tables, the iterator will iterate over all tables in the database. data2neo requires primary keys, so if your tables do not have primary keys, you need to provide a dictionary that maps each such table to the column to use as its primary key.

By default, the iterator will mix all tables together. If you want to iterate over the tables one after another, you can set the mix_tables parameter to False.

The Python implementation of sqlite will often throw warnings if a new process is spawned. You can disable these warnings by setting the check_same_thread parameter to False when creating the connection. data2neo does not share the connection between processes; only the master process requests data from the database.

from data2neo.relational_modules.sqlite import SQLiteIterator
import sqlite3

connection = sqlite3.connect("mydatabase.db", check_same_thread=False)
iterator = SQLiteIterator(connection, filter=["table1", "table2"], primary_keys={"table3": "id"})
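
To find out which tables need an entry in the primary_keys dictionary, you can inspect the database schema beforehand. The following sketch uses only the standard sqlite3 module and is independent of data2neo; tables_without_primary_key is a hypothetical helper name, and the in-memory database with table1 and table3 is made up for the example.

```python
import sqlite3


def tables_without_primary_key(connection: sqlite3.Connection) -> list:
    """Return the names of all user tables that declare no primary key."""
    cursor = connection.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )
    missing = []
    for (table,) in cursor.fetchall():
        # PRAGMA table_info yields one row per column; the last field (pk)
        # is 0 for columns that are not part of the primary key.
        quoted = table.replace('"', '""')
        columns = connection.execute(f'PRAGMA table_info("{quoted}")').fetchall()
        if not any(column[5] for column in columns):
            missing.append(table)
    return missing


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT)")
connection.execute("CREATE TABLE table3 (id INTEGER, name TEXT)")
print(tables_without_primary_key(connection))  # ['table3']
```

Each table reported this way needs a column that uniquely identifies its rows, which you then pass in the form shown above, for example primary_keys={"table3": "id"}.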