4.0.0-SNAPSHOT

Introduction

This document discusses Apache TinkerPop™ implementation details that are most useful to developers who implement TinkerPop interfaces and the Gremlin language. This document may also be helpful to Gremlin users who simply want a deeper understanding of how TinkerPop works and what the behavioral semantics of Gremlin are. The Provider Section outlines the various integration and extension points that TinkerPop has while the Gremlin Semantics Section documents the Gremlin language itself.

Providers who rely on the TinkerPop execution engine generally receive the behaviors described in the Gremlin Semantics section for free, but those who develop their own engine or extend certain features should refer to that section for the details required for a consistent Gremlin experience.

Provider Documentation

TinkerPop exposes a set of interfaces, protocols, and tests that make it possible for third parties to build libraries and systems that plug into the TinkerPop stack. TinkerPop refers to those third parties as "providers" and this documentation is designed to help providers understand what is involved in developing code on these lower levels of the TinkerPop API.

This document attempts to address the needs of the different providers that have been identified:

  • Graph System Provider

    • Graph Database Provider

    • Graph Processor Provider

  • Graph Driver Provider

  • Graph Language Provider

  • Graph Plugin Provider

Graph System Provider Requirements

At the core of TinkerPop 3.x is a Java API. The implementation of this core API and its validation via the gremlin-test suite is all that is required of a graph system provider wishing to provide a TinkerPop-enabled graph engine. Once a graph system has a valid implementation, all the applications provided by TinkerPop (e.g. Gremlin Console, Gremlin Server, etc.) and 3rd-party developers (e.g. Gremlin-Scala, Gremlin-JS, etc.) will integrate properly. Finally, please feel free to use the TinkerPop-enabled logo to promote your TinkerPop implementation.

Graph Structure API

The graph structure API of TinkerPop provides the interfaces necessary to create a TinkerPop-enabled system and exposes the basic components of a property graph, including Graph, Vertex, Edge, VertexProperty and Property. The structure API can be used directly as follows:

Graph graph = TinkerGraph.open(); // (1)
Vertex marko = graph.addVertex(T.label, "person", T.id, 1, "name", "marko", "age", 29); // (2)
Vertex vadas = graph.addVertex(T.label, "person", T.id, 2, "name", "vadas", "age", 27);
Vertex lop = graph.addVertex(T.label, "software", T.id, 3, "name", "lop", "lang", "java");
Vertex josh = graph.addVertex(T.label, "person", T.id, 4, "name", "josh", "age", 32);
Vertex ripple = graph.addVertex(T.label, "software", T.id, 5, "name", "ripple", "lang", "java");
Vertex peter = graph.addVertex(T.label, "person", T.id, 6, "name", "peter", "age", 35);
marko.addEdge("knows", vadas, T.id, 7, "weight", 0.5f); // (3)
marko.addEdge("knows", josh, T.id, 8, "weight", 1.0f);
marko.addEdge("created", lop, T.id, 9, "weight", 0.4f);
josh.addEdge("created", ripple, T.id, 10, "weight", 1.0f);
josh.addEdge("created", lop, T.id, 11, "weight", 0.4f);
peter.addEdge("created", lop, T.id, 12, "weight", 0.2f);
  1. Create a new in-memory TinkerGraph and assign it to the variable graph.

  2. Create a vertex along with a set of key/value pairs with T.label being the vertex label and T.id being the vertex id.

  3. Create an edge along with a set of key/value pairs with the edge label being specified as the first argument.

In the above code all the vertices are created first and then their respective edges. There are two "accessor tokens": T.id and T.label. When either of these, along with a set of other key/value pairs, is provided to Graph.addVertex(Object…) or Vertex.addEdge(String,Vertex,Object…), the respective element is created along with the provided key/value pair properties appended to it.
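
As a rough illustration of how an implementation might interpret such a key/value array, the sketch below separates the id and label tokens from ordinary properties and rejects arrays of odd length. It is written against the JDK only: the `Token` enum and `ParsedElement` class are hypothetical stand-ins for `T` and a provider's element, and the default label of "vertex" is an assumption of this sketch. TinkerPop's own ElementHelper is the place to look for the real helpers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for TinkerPop's T tokens (only the two discussed above).
enum Token { ID, LABEL }

// Hypothetical holder for the outcome of parsing addVertex(Object...) style arguments.
final class ParsedElement {
    Object id;                 // null means "let the graph assign an id"
    String label = "vertex";   // assumed default label for this sketch
    final Map<String, Object> properties = new LinkedHashMap<>();

    static ParsedElement parse(final Object... keyValues) {
        if (keyValues.length % 2 != 0)
            throw new IllegalArgumentException("key/value array must have an even length");
        final ParsedElement parsed = new ParsedElement();
        for (int i = 0; i < keyValues.length; i += 2) {
            final Object key = keyValues[i];
            final Object value = keyValues[i + 1];
            if (key == Token.ID) parsed.id = value;
            else if (key == Token.LABEL) parsed.label = (String) value;
            else if (key instanceof String) parsed.properties.put((String) key, value);
            else throw new IllegalArgumentException("keys must be a String or a Token: " + key);
        }
        return parsed;
    }
}
```

Calling `ParsedElement.parse(Token.LABEL, "person", Token.ID, 1, "name", "marko", "age", 29)` yields the label "person", the id 1, and two ordinary properties.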

Below is a sequence of basic graph mutation operations represented in Java:

// create a new graph
Graph graph = TinkerGraph.open();
// add a software vertex with a name property
Vertex gremlin = graph.addVertex(T.label, "software",
                             "name", "gremlin");
// only one vertex should exist
assert(IteratorUtils.count(graph.vertices()) == 1);
// no edges should exist as none have been created
assert(IteratorUtils.count(graph.edges()) == 0);
// add a new property
gremlin.property("created", 2009);
// add a new software vertex to the graph
Vertex blueprints = graph.addVertex(T.label, "software",
                                "name", "blueprints");
// connect gremlin to blueprints via a dependsOn-edge
gremlin.addEdge("dependsOn", blueprints);
// now there are two vertices and one edge
assert(IteratorUtils.count(graph.vertices()) == 2);
assert(IteratorUtils.count(graph.edges()) == 1);
// add a property to blueprints
blueprints.property("created", 2010);
// remove that property
blueprints.property("created").remove();
// connect gremlin to blueprints via encapsulates
gremlin.addEdge("encapsulates", blueprints);
assert(IteratorUtils.count(graph.vertices()) == 2);
assert(IteratorUtils.count(graph.edges()) == 2);
// removing a vertex removes all its incident edges as well
blueprints.remove();
gremlin.remove();
// the graph is now empty
assert(IteratorUtils.count(graph.vertices()) == 0);
assert(IteratorUtils.count(graph.edges()) == 0);
// tada!

The above code samples are just examples of how the structure API can be used to access a graph. Those APIs are then used internally by the process API (i.e. Gremlin) to access any graph that implements those structure API interfaces to execute queries. Typically, the structure API methods are not used directly by end-users.

Implementing Gremlin-Core

The classes that a graph system provider should focus on implementing are itemized below. It is a good idea to study the TinkerGraph (in-memory OLTP and OLAP in tinkergraph-gremlin), Neo4jGraph (OLTP w/ transactions in neo4j-gremlin) and/or HadoopGraph (OLAP in hadoop-gremlin) implementations for ideas and patterns.

  1. Online Transactional Processing Graph Systems (OLTP)

    1. Structure API: Graph, Element, Vertex, Edge, Property and Transaction (if transactions are supported).

    2. Process API: TraversalStrategy instances for optimizing Gremlin traversals to the provider’s graph system (e.g. TinkerGraphStepStrategy).

  2. Online Analytical Processing Graph Systems (OLAP)

    1. Everything required of OLTP is required of OLAP (but not vice versa).

    2. GraphComputer API: GraphComputer, Messenger, Memory.

Please consider the following implementation notes:

  • Use StringHelper to ensure that the toString() representations of classes are consistent with other implementations.

  • Ensure that your implementation’s Features (Graph, Vertex, etc.) are correct so that test cases handle particulars accordingly.

  • Use the numerous static method helper classes such as ElementHelper, GraphComputerHelper, VertexProgramHelper, etc.

  • There are a number of default methods on the provided interfaces that are semantically correct. However, if they are not efficient for the implementation, override them.

  • Implement the structure/ package interfaces first and then, if desired, the interfaces in the process/ package.

  • ComputerGraph is a wrapper system that ensures proper semantics during a GraphComputer computation.

  • The javadoc is often a good resource in understanding expectations from both the user’s perspective as well as the graph provider’s perspective. Also consider examining the javadoc of TinkerGraph which is often well annotated and the interfaces and classes of the test suite itself.

OLTP Implementations

The most important interfaces to implement are in the structure/ package. These include interfaces like Graph, Vertex, Edge, Property, Transaction, etc. The StructureStandardSuite will ensure that the semantics of the methods implemented are correct. Moreover, there are numerous Exceptions classes with static exceptions that should be thrown by the graph system so that all the exceptions and their messages are consistent amongst all TinkerPop implementations.

The following bullets provide some tips to consider when implementing the structure interfaces:

  • Graph

    • Be sure the Graph implementation is named as XXXGraph (e.g. TinkerGraph, Neo4jGraph, HadoopGraph, etc.).

    • This implementation needs to be GraphFactory compatible which means that the implementation should have a static Graph open(Configuration) method where the Configuration is an Apache Commons class of that name. Alternatively, the Graph implementation can have the GraphFactoryClass annotation which specifies a class with that static Graph open(Configuration) method.

  • VertexProperty

    • This interface is both a Property and an Element as VertexProperty is a first-class graph element in that it can have its own properties (i.e. meta-properties). Even if the implementation does not intend to support meta-properties, the VertexProperty needs to be implemented as an Element. VertexProperty should return an empty iterable for properties if meta-properties are not supported.
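
The GraphFactory compatibility requirement in the Graph bullet above amounts to a reflective lookup of a static open method. The following is a minimal JDK-only sketch of that idea, not TinkerPop's actual GraphFactory: `ExampleGraph` and `ExampleGraphFactory` are hypothetical names, a plain `Map` stands in for the Apache Commons Configuration, and the use of the `gremlin.graph` key to name the graph class is an assumption of this sketch.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;

// Hypothetical graph class; a Map stands in for Apache Commons Configuration.
final class ExampleGraph {
    final Map<String, Object> configuration;

    private ExampleGraph(final Map<String, Object> configuration) {
        this.configuration = configuration;
    }

    // The static factory method that a reflective factory looks for.
    public static ExampleGraph open(final Map<String, Object> configuration) {
        return new ExampleGraph(configuration);
    }
}

// Hypothetical, simplified factory: resolves the class named in the configuration
// and invokes its static open(Map) method reflectively.
final class ExampleGraphFactory {
    static Object open(final Map<String, Object> configuration) {
        final Object className = configuration.get("gremlin.graph"); // assumed key
        if (!(className instanceof String))
            throw new IllegalArgumentException("configuration must name a graph class");
        try {
            final Method open = Class.forName((String) className).getMethod("open", Map.class);
            if (!Modifier.isStatic(open.getModifiers()))
                throw new IllegalStateException("open(Map) must be static");
            return open.invoke(null, configuration);
        } catch (final ReflectiveOperationException e) {
            throw new IllegalStateException("could not open graph " + className, e);
        }
    }
}
```

Because the factory only knows the class name, a missing or non-static open method surfaces at runtime rather than at compile time, which is why the naming and signature conventions matter.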

OLAP Implementations

Implementing the OLAP interfaces may be a bit more complicated. Note that before OLAP interfaces are implemented, it is necessary for the OLTP interfaces to be implemented, at a minimum, as specified in OLTP Implementations. A summary of each required interface implementation is presented below:

  1. GraphComputer: A fluent builder for specifying an isolation level, a VertexProgram, and any number of MapReduce jobs to be submitted.

  2. Memory: A global blackboard for ANDing, ORing, INCRing, and SETing values for specified keys.

  3. Messenger: The system that collects and distributes messages being propagated by vertices executing the VertexProgram application.

  4. MapReduce.MapEmitter: The system that collects key/value pairs being emitted by the MapReduce application’s map-phase.

  5. MapReduce.ReduceEmitter: The system that collects key/value pairs being emitted by the MapReduce application’s combine- and reduce-phases.

Note
The VertexProgram and MapReduce interfaces in the process/computer/ package are not required by the graph system. Instead, these are interfaces to be implemented by application developers writing VertexPrograms and MapReduce jobs.
Important
TinkerPop provides two OLAP implementations: TinkerGraphComputer (TinkerGraph), and SparkGraphComputer (Hadoop). Given the complexity of the OLAP system, it is good to study and copy many of the patterns used in these reference implementations.
Implementing GraphComputer

The most complex method in GraphComputer is the submit()-method. The method must do the following:

  1. Ensure the GraphComputer has not already been executed.

  2. Ensure that there is at least a VertexProgram or one MapReduce job.

  3. If there is a VertexProgram, validate that it can execute on the GraphComputer given the respectively defined features.

  4. Create the Memory to be used for the computation.

  5. Execute the VertexProgram.setup() method once and only once.

  6. Execute the VertexProgram.execute() method for each vertex.

  7. Execute the VertexProgram.terminate() method once and, if it returns false, repeat VertexProgram.execute().

  8. When VertexProgram.terminate() returns true, move to MapReduce job execution.

  9. MapReduce jobs are not required to be executed in any specified order.

  10. For each Vertex, execute MapReduce.map(). Then (if defined) execute MapReduce.combine() and MapReduce.reduce().

  11. Update Memory with runtime information.

  12. Construct a new ComputerResult containing the compute Graph and Memory.
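
The ordering of these steps can be sketched with a small JDK-only model. This is not TinkerGraphComputer: `SketchProgram` and `SketchComputer` are hypothetical, vertices are plain strings, a `Map` stands in for Memory, and feature validation (step 3) and the MapReduce stages (steps 9 and 10) are omitted. The loop runs execute() over every vertex and stops once terminate() returns true, as in step 8.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, heavily simplified stand-in for VertexProgram.
interface SketchProgram {
    void setup(Map<String, Object> memory);
    void execute(String vertex, Map<String, Object> memory);
    boolean terminate(Map<String, Object> memory);
}

// Hypothetical computer that follows the ordering of the steps listed above.
final class SketchComputer {
    private boolean executed = false;

    Map<String, Object> submit(final List<String> vertices, final SketchProgram program) {
        if (executed)                                     // step 1: run at most once
            throw new IllegalStateException("computer has already been executed");
        if (program == null)                              // step 2: something must be submitted
            throw new IllegalStateException("nothing to execute");
        executed = true;
        final Map<String, Object> memory = new ConcurrentHashMap<>(); // step 4
        program.setup(memory);                            // step 5: once and only once
        int iteration = 0;
        do {
            for (final String vertex : vertices)          // step 6: every vertex, every round
                program.execute(vertex, memory);
            iteration++;
        } while (!program.terminate(memory));             // steps 7-8: stop when true
        memory.put("iteration", iteration);               // step 11: runtime information
        return memory;                                    // stands in for ComputerResult
    }
}
```

A real implementation would execute vertices in parallel and would apply the memory visibility rules of the following section between rounds; this sketch shares one map for brevity.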

Implementing Memory

The Memory object is initially defined by VertexProgram.setup(). The memory data is available in the first round of the VertexProgram.execute() method. Each Vertex, when executing the VertexProgram, can update the Memory in its round. However, the update is not seen by the other vertices until the next round. At the end of the first round, all the updates are aggregated and the new memory data is available on the second round. This process repeats until the VertexProgram terminates.
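
One way to picture this visibility rule is a memory with two maps: one that vertices read from and one that collects the writes of the current round. The class below is a hypothetical JDK-only sketch restricted to summed long values; it is not the Memory interface itself, and the method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical round-based memory: writes are staged and only become readable
// after completeRound(), mirroring the visibility rule described above.
final class RoundMemory {
    private final Map<String, Long> visible = new HashMap<>(); // readable this round
    private final Map<String, Long> staged = new HashMap<>();  // written this round

    // Intended for the setup phase: values are readable in the first round.
    void set(final String key, final long value) {
        visible.put(key, value);
    }

    long get(final String key) {
        return visible.getOrDefault(key, 0L);
    }

    // Called by vertices during a round; increments are summed but not yet visible.
    synchronized void add(final String key, final long delta) {
        staged.merge(key, delta, Long::sum);
    }

    // Called once between rounds: fold the staged increments into the visible state.
    synchronized void completeRound() {
        staged.forEach((key, delta) -> visible.merge(key, delta, Long::sum));
        staged.clear();
    }
}
```

A vertex that calls add() and then get() within the same round still sees the old value; the aggregate only appears after the computer calls completeRound().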

Implementing Messenger

The Messenger object is similar to the Memory object in that a vertex can read and write to the Messenger. However, the data it reads are the messages sent to the vertex in the previous round and the data it writes are the messages that will be readable by the receiving vertices in the subsequent round.
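
The read-previous, write-next behavior can be modeled with two buffers that are swapped between rounds. The `MessageBoard` class below is a hypothetical JDK-only sketch keyed by vertex id; it does not implement the Messenger interface or its message scopes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical double-buffered message board keyed by vertex id.
final class MessageBoard<M> {
    private Map<Object, List<M>> receive = new HashMap<>(); // sent last round, readable now
    private Map<Object, List<M>> send = new HashMap<>();    // sent this round, readable next round

    // Messages addressed to this vertex in the previous round.
    synchronized List<M> receiveMessages(final Object vertexId) {
        return receive.getOrDefault(vertexId, Collections.emptyList());
    }

    // Queue a message for delivery in the next round.
    synchronized void sendMessage(final Object targetVertexId, final M message) {
        send.computeIfAbsent(targetVertexId, id -> new ArrayList<>()).add(message);
    }

    // Called once between rounds: what was sent becomes what can be received.
    synchronized void completeRound() {
        receive = send;
        send = new HashMap<>();
    }
}
```

Messages that are not read during the round in which they are visible are discarded at the next swap, which keeps each round's input limited to the previous round's output.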

Implementing MapReduce Emitters

The MapReduce framework in TinkerPop is similar to the model popularized by Hadoop. The primary difference is that all Mappers process the vertices of the graph, not an arbitrary key/value pair. However, the vertices' edges cannot be accessed, only their properties. This greatly reduces the amount of data that needs to be pushed through the MapReduce engine, as any edge information required can be computed in the VertexProgram.execute() method. Moreover, at this stage, vertices cannot be mutated; only their token and property data can be read. A Gremlin OLAP system needs to provide implementations for two particular classes: MapReduce.MapEmitter and MapReduce.ReduceEmitter. TinkerGraph’s implementation is provided below, which demonstrates the simplicity of the algorithm (especially when the data is all within the same JVM).

public class TinkerMapEmitter<K, V> implements MapReduce.MapEmitter<K, V> {

    public Map<K, Queue<V>> reduceMap;
    public Queue<KeyValue<K, V>> mapQueue;
    private final boolean doReduce;

    public TinkerMapEmitter(final boolean doReduce) { // (1)
        this.doReduce = doReduce;
        if (this.doReduce)
            this.reduceMap = new ConcurrentHashMap<>();
        else
            this.mapQueue = new ConcurrentLinkedQueue<>();
    }

    @Override
    public void emit(K key, V value) {
        if (this.doReduce)
            this.reduceMap.computeIfAbsent(key, k -> new ConcurrentLinkedQueue<>()).add(value); // (2)
        else
            this.mapQueue.add(new KeyValue<>(key, value)); // (3)
    }

    protected void complete(final MapReduce<K, V, ?, ?, ?> mapReduce) {
        if (!this.doReduce && mapReduce.getMapKeySort().isPresent()) { // (4)
            final Comparator<K> comparator = mapReduce.getMapKeySort().get();
            final List<KeyValue<K, V>> list = new ArrayList<>(this.mapQueue);
            Collections.sort(list, Comparator.comparing(KeyValue::getKey, comparator));
            this.mapQueue.clear();
            this.mapQueue.addAll(list);
        } else if (mapReduce.getMapKeySort().isPresent()) {
            final Comparator<K> comparator = mapReduce.getMapKeySort().get();
            final List<Map.Entry<K, Queue<V>>> list = new ArrayList<>();
            list.addAll(this.reduceMap.entrySet());
            Collections.sort(list, Comparator.comparing(Map.Entry::getKey, comparator));
            this.reduceMap = new LinkedHashMap<>();
            list.forEach(entry -> this.reduceMap.put(entry.getKey(), entry.getValue()));
        }
    }
}
  1. If the MapReduce job has a reduce, then use one data structure (reduceMap), else use another (mapQueue). The difference is that a reduction requires a grouping by key and therefore the Map<K,Queue<V>> definition. If no reduction/grouping is required, then a simple Queue<KeyValue<K,V>> can be leveraged.

  2. If reduce is to follow, then add the value to the queue kept in the Map for that key, creating the queue with computeIfAbsent() the first time the key is seen.

  3. If no reduce is to follow, then simply append a KeyValue to the queue.

  4. When the map phase is complete, any map-result sorting required can be executed at this point.

public class TinkerReduceEmitter<OK, OV> implements MapReduce.ReduceEmitter<OK, OV> {

    protected Queue<KeyValue<OK, OV>> reduceQueue = new ConcurrentLinkedQueue<>();

    @Override
    public void emit(final OK key, final OV value) {
        this.reduceQueue.add(new KeyValue<>(key, value));
    }

    protected void complete(final MapReduce<?, ?, OK, OV, ?> mapReduce) {
        if (mapReduce.getReduceKeySort().isPresent()) {
            final Comparator<OK> comparator = mapReduce.getReduceKeySort().get();
            final List<KeyValue<OK, OV>> list = new ArrayList<>(this.reduceQueue);
            Collections.sort(list, Comparator.comparing(KeyValue::getKey, comparator));
            this.reduceQueue.clear();
            this.reduceQueue.addAll(list);
        }
    }
}

The method MapReduce.reduce() is defined as:

public void reduce(final OK key, final Iterator<OV> values, final ReduceEmitter<OK, OV> emitter) { ... }

In other words, for the TinkerGraph implementation, iterate through the entrySet of the reduceMap and call the reduce() method on each entry. The reduce() method can emit key/value pairs which are simply aggregated into a Queue<KeyValue<OK,OV>> in an analogous fashion to TinkerMapEmitter when no reduce is to follow. These two emitters are tied together in TinkerGraphComputer.submit().

...
for (final MapReduce mapReduce : mapReducers) {
    if (mapReduce.doStage(MapReduce.Stage.MAP)) {
        final TinkerMapEmitter<?, ?> mapEmitter = new TinkerMapEmitter<>(mapReduce.doStage(MapReduce.Stage.REDUCE));
        final SynchronizedIterator<Vertex> vertices = new SynchronizedIterator<>(this.graph.vertices());
        workers.setMapReduce(mapReduce);
        workers.mapReduceWorkerStart(MapReduce.Stage.MAP);
        workers.executeMapReduce(workerMapReduce -> {
            while (true) {
                final Vertex vertex = vertices.next();
                if (null == vertex) return;
                workerMapReduce.map(ComputerGraph.mapReduce(vertex), mapEmitter);
            }
        });
        workers.mapReduceWorkerEnd(MapReduce.Stage.MAP);

        // sort results if a map output sort is defined
        mapEmitter.complete(mapReduce);

        // no need to run combiners as this is single machine
        if (mapReduce.doStage(MapReduce.Stage.REDUCE)) {
            final TinkerReduceEmitter<?, ?> reduceEmitter = new TinkerReduceEmitter<>();
            final SynchronizedIterator<Map.Entry<?, Queue<?>>> keyValues = new SynchronizedIterator((Iterator) mapEmitter.reduceMap.entrySet().iterator());
            workers.mapReduceWorkerStart(MapReduce.Stage.REDUCE);
            workers.executeMapReduce(workerMapReduce -> {
                while (true) {
                    final Map.Entry<?, Queue<?>> entry = keyValues.next();
                    if (null == entry) return;
                    workerMapReduce.reduce(entry.getKey(), entry.getValue().iterator(), reduceEmitter);
                }
            });
            workers.mapReduceWorkerEnd(MapReduce.Stage.REDUCE);
            reduceEmitter.complete(mapReduce); // sort results if a reduce output sort is defined
            mapReduce.addResultToMemory(this.memory, reduceEmitter.reduceQueue.iterator()); // (1)
        } else {
            mapReduce.addResultToMemory(this.memory, mapEmitter.mapQueue.iterator()); // (2)
        }
    }
}
...
  1. Note that the final results of the reducer are provided to the Memory as specified by the application developer’s MapReduce.addResultToMemory() implementation.

  2. If there is no reduce stage, the map-stage results are inserted into Memory as specified by the application developer’s MapReduce.addResultToMemory() implementation.
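
For readers who want to experiment with the grouping and reducing flow without a TinkerGraph on the classpath, the following JDK-only sketch reproduces its shape. Everything in it is hypothetical: vertices are modeled as label strings, the map step emits a (label, 1) pair per vertex, a parallel stream stands in for the worker pool, and a TreeMap plays the role of a reduce key sort.

```java
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical single-JVM pipeline: map, group by key, then reduce each group.
final class LabelCount {
    static Map<String, Long> run(final List<String> vertexLabels) {
        // map phase: emitted values are grouped by key, like reduceMap above
        final Map<String, Queue<Long>> reduceMap = new ConcurrentHashMap<>();
        vertexLabels.parallelStream().forEach(label ->
                reduceMap.computeIfAbsent(label, k -> new ConcurrentLinkedQueue<>()).add(1L));

        // reduce phase: one reduction per key; the TreeMap keeps results key-sorted
        final Map<String, Long> result = new TreeMap<>();
        for (final Map.Entry<String, Queue<Long>> entry : reduceMap.entrySet()) {
            long sum = 0L;
            for (final long value : entry.getValue()) sum += value;
            result.put(entry.getKey(), sum);
        }
        return result;
    }
}
```

Run against the labels of the six vertices created earlier in this document (four "person", two "software"), the result maps "person" to 4 and "software" to 2.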

Hadoop-Gremlin Usage

Hadoop-Gremlin is centered around InputFormats and OutputFormats. If a 3rd-party graph system provider wishes to leverage Hadoop-Gremlin (and its respective GraphComputer engines), then they need to provide, at minimum, a Hadoop2 InputFormat<NullWritable,VertexWritable> for their graph system. If the provider wishes to persist computed results back to their graph system (and not just to HDFS via a FileOutputFormat), then a graph system specific OutputFormat<NullWritable,VertexWritable> must be developed as well.

Conceptually, HadoopGraph is a wrapper around a Configuration object. There is no "data" in the HadoopGraph as the InputFormat specifies where and how to get the graph data at OLAP (and OLTP) runtime. Thus, HadoopGraph is a small object with little overhead. Graph system providers should realize HadoopGraph as the gateway to the OLAP features offered by Hadoop-Gremlin. For example, a graph system specific Graph.compute(Class<? extends GraphComputer> graphComputerClass)-method may look as follows:

public <C extends GraphComputer> C compute(final Class<C> graphComputerClass) throws IllegalArgumentException {
  try {
    if (AbstractHadoopGraphComputer.class.isAssignableFrom(graphComputerClass))
      return graphComputerClass.getConstructor(HadoopGraph.class).newInstance(this);
    else
      throw Graph.Exceptions.graphDoesNotSupportProvidedGraphComputer(graphComputerClass);
  } catch (final Exception e) {
    throw new IllegalArgumentException(e.getMessage(),e);
  }
}

Note that the configurations for Hadoop are assumed to be in the Graph.configuration() object. If this is not the case, then the Configuration provided to HadoopGraph.open() should be dynamically created within the compute()-method. It is in the provided configuration that HadoopGraph gets the various properties which determine how to read and write data to and from Hadoop. For instance, gremlin.hadoop.graphReader and gremlin.hadoop.graphWriter.

GraphFilterAware Interface

Graph filters are used by OLAP processors to pull only a subgraph of the full graph from the graph data source. For instance, the example below constructs a GraphFilter that will only pull the "knows"-graph amongst people into the GraphComputer for processing.

graph.compute().vertices(hasLabel("person")).edges(bothE("knows"))

If the provider has a custom InputRDD, they can implement GraphFilterAware and that graph filter will be provided to their InputRDD at load time. For providers that use an InputFormat, the graph filter can be accessed from the configuration as follows:

if (configuration.containsKey(Constants.GREMLIN_HADOOP_GRAPH_FILTER))
  this.graphFilter = VertexProgramHelper.deserialize(configuration, Constants.GREMLIN_HADOOP_GRAPH_FILTER);
PersistResultGraphAware Interface

A graph system provider’s OutputFormat should implement the PersistResultGraphAware interface which determines which persistence options are available to the user. For the standard file-based OutputFormats provided by Hadoop-Gremlin (e.g. GryoOutputFormat, GraphSONOutputFormat, and ScriptInputOutputFormat) ResultGraph.ORIGINAL is not supported as the original graph data files are not random access and are, in essence, immutable. Thus, these file-based OutputFormats only support ResultGraph.NEW which creates a copy of the data specified by the Persist enum.

IO Implementations

If a Graph requires custom serializers for IO to work properly, implement the Graph.io method. A typical example of where a Graph would require such custom serializers is if its identifier system uses non-primitive values, such as OrientDB’s Rid class. From basic serialization of a single Vertex all the way up the stack to Gremlin Server, knowing how to handle these complex identifiers is an important requirement.

The first step to implementing custom serializers is to implement the IoRegistry interface and register the custom classes and serializers to it. Each Io implementation has different requirements for what it expects from the IoRegistry:

  • GraphML - No custom serializers expected/allowed.

  • GraphSON - Register a Jackson SimpleModule. The SimpleModule encapsulates specific classes to be serialized, so it does not need to be registered to a specific class in the IoRegistry (use null).

  • Gryo - Expects registration of one of three objects:

    • Register just the custom class with a null Kryo Serializer implementation - this class will use default "field-level" Kryo serialization.

    • Register the custom class with a specific Kryo Serializer implementation.

    • Register the custom class with a Function<Kryo, Serializer> for those cases where the Kryo Serializer requires the Kryo instance to get constructed.

This implementation should provide a zero-arg constructor as the stack may require instantiation via reflection. Consider extending AbstractIoRegistry for convenience as follows:

public class MyGraphIoRegistry extends AbstractIoRegistry {
    public MyGraphIoRegistry() {
        register(GraphSONIo.class, null, new MyGraphSimpleModule());
        register(GryoIo.class, MyGraphIdClass.class, new MyGraphIdSerializer());
    }
}

In the Graph.io method, provide the IoRegistry object to the supplied Builder and call the create method to return that Io instance as follows:

public <I extends Io> I io(final Io.Builder<I> builder) {
    return (I) builder.graph(this).registry(myGraphIoRegistry).create();
}

In this way, Graph implementations can pre-configure custom serializers for IO interactions and users will not need to know about those details. Following this pattern will ensure proper execution of the test suite as well as simplified usage for end-users.

Important
Proper implementation of IO is critical to successful Graph operations in Gremlin Server. The Test Suite does have "serialization" tests that provide some assurance that an implementation is working properly, but those tests cannot make assertions against any specifics of a custom serializer. It is the responsibility of the implementer to test the specifics of their custom serializers.
Tip
Consider separating serializer code into its own module, if possible, so that clients that use the Graph implementation remotely don’t need a full dependency on the entire Graph - just the IO components and related classes being serialized.

There is an important implication to consider when adding a custom serializer. Presumably, the custom serializer was written for the JVM to be deployed with a Graph instance. For example, a graph may expose a geographical type like a Point or something similar. Users who expect to deserialize back to a Point would need to have the library containing Point and the “PointSerializer” class available to them. In cases where that deployment approach is not desirable, it is possible to coerce a class like Point to a type that is already in the list of types supported in TinkerPop. For example, Point could be coerced one-way to a Map with keys "x" and "y". Of course, on the client side, users would have to construct a Map for a Point, which isn’t quite as user-friendly.
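
A one-way coercion of this kind can be very small. The sketch below is purely illustrative: `Point` and `PointCoercion` are hypothetical classes, and where a provider would actually apply the conversion (for example inside a serializer) is left open.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical provider-specific type that clients may not have on their classpath.
final class Point {
    final double x;
    final double y;

    Point(final double x, final double y) {
        this.x = x;
        this.y = y;
    }
}

// Hypothetical one-way coercion to a type every client can already deserialize.
final class PointCoercion {
    static Map<String, Object> toMap(final Point point) {
        final Map<String, Object> map = new LinkedHashMap<>();
        map.put("x", point.x);
        map.put("y", point.y);
        return map;
    }
}
```

The trade-off is visible in the types: the Map travels anywhere, but nothing in it says that it once was a Point, so clients have to know the convention.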

If doing a type coercion is not desired, then it is important to remember that writing a Point class and related serializer in Java is not sufficient for full support of Gremlin, as users of non-JVM Gremlin Language Variants (GLV) will not be able to consume them. Getting full support would mean writing similar classes for each GLV. While developing those classes is not hard, it also means more code to support.

Supporting Gremlin-Python IO

The serialization system of Gremlin-Python provides ways to add new types by creating serializers and deserializers in Python and registering them with the RemoteConnection.

class MyType(object):
  GRAPHSON_PREFIX = "providerx"
  GRAPHSON_BASE_TYPE = "MyType"
  GRAPHSON_TYPE = GraphSONUtil.formatType(GRAPHSON_PREFIX, GRAPHSON_BASE_TYPE)

  def __init__(self, x, y):
    self.x = x
    self.y = y

  @classmethod
  def objectify(cls, value, reader):
    return cls(value['x'], value['y'])

  @classmethod
  def dictify(cls, value, writer):
    return GraphSONUtil.typedValue(cls.GRAPHSON_BASE_TYPE,
                                  {'x': value.x, 'y': value.y},
                                  cls.GRAPHSON_PREFIX)

graphson_reader = GraphSONReader({MyType.GRAPHSON_TYPE: MyType})
graphson_writer = GraphSONWriter({MyType: MyType})

connection = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g',
                                     graphson_reader=graphson_reader,
                                     graphson_writer=graphson_writer)
Supporting Gremlin.Net IO

The serialization system of Gremlin.Net provides ways to add new types by creating serializers and deserializers in any .NET language and registering them with the GremlinClient.

internal class MyType
{
    public static string GraphsonPrefix = "providerx";
    public static string GraphsonBaseType = "MyType";
    public static string GraphsonType = GraphSONUtil.FormatTypeName(GraphsonPrefix, GraphsonBaseType);

    public MyType(int x, int y)
    {
        X = x;
        Y = y;
    }

    public int X { get; }
    public int Y { get; }
}

internal class MyClassWriter : IGraphSONSerializer
{
    public Dictionary<string, dynamic> Dictify(dynamic objectData, GraphSONWriter writer)
    {
        MyType myType = objectData;
        var valueDict = new Dictionary<string, object>
        {
            {"x", myType.X},
            {"y", myType.Y}
        };
        return GraphSONUtil.ToTypedValue(nameof(MyType), valueDict, MyType.GraphsonPrefix);
    }
}

internal class MyTypeReader : IGraphSONDeserializer
{
    public dynamic Objectify(JsonElement graphsonObject, GraphSONReader reader)
    {
        var x = reader.ToObject(graphsonObject.GetProperty("x"));
        var y = reader.ToObject(graphsonObject.GetProperty("y"));
        return new MyType(x, y);
    }
}

var graphsonReader = new GraphSON3Reader(
    new Dictionary<string, IGraphSONDeserializer> {{MyType.GraphsonType, new MyTypeReader()}});
var graphsonWriter = new GraphSON3Writer(
    new Dictionary<Type, IGraphSONSerializer> {{typeof(MyType), new MyClassWriter()}});

var gremlinClient = new GremlinClient(new GremlinServer("localhost", 8182), new GraphSON3MessageSerializer(graphsonReader, graphsonWriter));

RemoteConnection Implementations

A RemoteConnection is an interface that is important for usage on traversal sources configured using the with() option. A Traversal that is generated from that source will apply a RemoteStrategy which will inject a RemoteStep to its end. That step will then send the Bytecode of the Traversal over the RemoteConnection to get the results that it will iterate.

There is one method to implement on RemoteConnection:

public <E> CompletableFuture<RemoteTraversal<?, E>> submitAsync(final Bytecode bytecode) throws RemoteConnectionException;

Note that it returns a RemoteTraversal. This interface should also be implemented and in most cases implementers can simply extend the AbstractRemoteTraversal.

TinkerPop provides the DriverRemoteConnection as a useful example implementation. DriverRemoteConnection serializes the Traversal as Gremlin bytecode and then submits it for remote processing on Gremlin Server. Gremlin Server rebinds the Traversal to a configured Graph instance and then iterates the results back as it would normally do.

Implementing RemoteConnection is not something routinely done for those implementing gremlin-core. It is only something required if there is a need to exploit remote traversal submission. If a graph provider has a "graph server" similar to Gremlin Server that can accept bytecode-based requests on its own protocol, then that would be one example of a reason to implement this interface.

Bulk Import Export

When it comes to doing "bulk" operations, the diverse nature of the available graph databases and their specific capabilities prevents TinkerPop from generalizing that capability well. TinkerPop thus maintains two positions on the concept of import and export:

  1. TinkerPop refers users to the bulk import/export facilities of specific graph providers as they tend to be more efficient and easier to use than the options TinkerPop has tried to generalize in the past.

  2. TinkerPop encourages graph providers to expose those capabilities via g.io() and the IoStep by way of a TraversalStrategy.

That said, graph providers that don’t have a special bulk loading feature can either rely on the default OLTP (single-threaded) GraphReader and GraphWriter options that are embedded in IoStep or get a basic bulk loader from TinkerPop using the CloneVertexProgram. Simply provide an InputFormat and OutputFormat that can be referenced by a HadoopGraph instance as discussed in the Reference Documentation.

Validating with Gremlin-Test

<dependency>
  <groupId>org.apache.tinkerpop</groupId>
  <artifactId>gremlin-test</artifactId>
  <version>4.0.0-SNAPSHOT</version>
</dependency>

Providers currently have two approaches to consider when validating their TinkerPop implementations. The first approach comes from the original, wholly JVM-oriented test suite, which was developed in the early days of TinkerPop 3.x design and development. The second approach, available as of 3.6.0, is Gherkin-based and originates from the language-agnostic Gremlin Language Variant test suite.

The first approach is more complete and more opinionated as to how an implementation should behave. In many ways it helps in getting an implementation semantically correct from the ground up: getting the Graph Structure API implemented well, by getting the Structure Suite to pass, almost inevitably ensures that most of the Gremlin language oriented tests in the Process Suite pass early on. On the other hand, the rigor of this test suite can also make it harder to implement, especially if your graph already exists and behaves in a certain fashion.

The second approach only validates Gremlin semantics which is ultimately what users concern themselves with as that is the method by which they will interact with a provider’s Graph. This test suite is less concerned with how a TinkerPop implementation does what it does, so long as it succeeds at processing Gremlin traversals. There is significant overlap between this test suite and the aforementioned Process Suite.

At this time, it would be wise for providers to implement both approaches as the goal for TinkerPop is to move away from the rigors of the JVM Structure and Process Suites in favor of Gherkin. Over time, the Structure and Process Suites will be deprecated and removed.

JVM Test Suite

The operational semantics of any OLTP or OLAP implementation are validated by gremlin-test. To implement these tests, provide test case implementations as shown below, where XXX below denotes the name of the graph implementation (e.g. TinkerGraph, Neo4jGraph, HadoopGraph, etc.).

// Structure API tests
@RunWith(StructureStandardSuite.class)
@GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class)
public class XXXStructureStandardTest {}

// Process API tests
@RunWith(ProcessComputerSuite.class)
@GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class)
public class XXXProcessComputerTest {}

@RunWith(ProcessStandardSuite.class)
@GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class)
public class XXXProcessStandardTest {}
Important
It is as important to look at "ignored" tests as it is to look at ones that fail. The gremlin-test suite utilizes the Feature implementation exposed by the Graph to determine which tests to execute. If a test utilizes features that are not supported by the graph, the suite will ignore that test. While that may be fine, implementers should validate that the ignored tests are appropriately bypassed and that there are no mistakes in their feature definitions. Moreover, implementers should consider filling gaps in their own test suites, especially when IO-related tests are being ignored.
Tip
If it is expensive to construct a new Graph instance, consider implementing GraphProvider.getStaticFeatures() which can help by caching a static feature set for instances produced by that GraphProvider and allow the test suite to avoid that construction cost if the test is ignored.

The only test class that requires any code investment is the GraphProvider implementation class. This class is used by the test suite to construct Graph configurations and instances, and it provides information about the implementation itself. In most cases, it is best to simply extend AbstractGraphProvider as it provides many default implementations of the GraphProvider interface.

Finally, specify the test suites that will be supported by the Graph implementation using the @Graph.OptIn annotation. See the TinkerGraph implementation below as an example:

@Graph.OptIn(Graph.OptIn.SUITE_STRUCTURE_STANDARD)
@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_STANDARD)
@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_COMPUTER)
public class TinkerGraph implements Graph {

Only include annotations for the suites the implementation will support. Note that implementing the suite, but not specifying the appropriate annotation will prevent the suite from running (an obvious error message will appear in this case when running the mis-configured suite).

There are times when there may be a specific test in the suite that the implementation cannot support (despite the features it implements) or should not otherwise be executed. It is possible for implementers to "opt-out" of a test by using the @Graph.OptOut annotation. This annotation can be applied to either a Graph instance or a GraphProvider instance (the latter would typically be used for "opting out" for a particular Graph configuration that was under test). The following is an example of this annotation usage as taken from HadoopGraph:

@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_STANDARD)
@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_COMPUTER)
@Graph.OptOut(
        test = "org.apache.tinkerpop.gremlin.process.graph.step.map.MatchTest$Traversals",
        method = "g_V_matchXa_hasXname_GarciaX__a_inXwrittenByX_b__a_inXsungByX_bX",
        reason = "Hadoop-Gremlin is OLAP-oriented and for OLTP operations, linear-scan joins are required. This particular tests takes many minutes to execute.")
@Graph.OptOut(
        test = "org.apache.tinkerpop.gremlin.process.graph.step.map.MatchTest$Traversals",
        method = "g_V_matchXa_inXsungByX_b__a_inXsungByX_c__b_outXwrittenByX_d__c_outXwrittenByX_e__d_hasXname_George_HarisonX__e_hasXname_Bob_MarleyXX",
        reason = "Hadoop-Gremlin is OLAP-oriented and for OLTP operations, linear-scan joins are required. This particular tests takes many minutes to execute.")
@Graph.OptOut(
        test = "org.apache.tinkerpop.gremlin.process.computer.GraphComputerTest",
        method = "shouldNotAllowBadMemoryKeys",
        reason = "Hadoop does a hard kill on failure and stops threads which stops test cases. Exception handling semantics are correct though.")
@Graph.OptOut(
        test = "org.apache.tinkerpop.gremlin.process.computer.GraphComputerTest",
        method = "shouldRequireRegisteringMemoryKeys",
        reason = "Hadoop does a hard kill on failure and stops threads which stops test cases. Exception handling semantics are correct though.")
public class HadoopGraph implements Graph {

The above examples show how to ignore individual tests. It is also possible to:

  • Ignore an entire test case (i.e. all the methods within the test) by setting the method to "*".

  • Ignore a "base" test class such that tests that extend from that class will all be ignored.

  • Ignore a GraphComputer test based on the type of GraphComputer being used. Specify the "computer" attribute on the OptOut (which is an array specification) which should have a value of the GraphComputer implementation class that should ignore that test. This attribute should be left empty for "standard" execution and by default all GraphComputer implementations will be included in the OptOut so if there are multiple implementations, explicitly specify the ones that should be excluded.

Also note that some of the tests in the Gremlin Test Suite are parameterized tests and require an additional level of specificity to be properly ignored. To ignore these types of tests, examine the name template of the parameterized tests. It is defined by a Java annotation that looks like this:

@Parameterized.Parameters(name = "expect({0})")

The annotation above shows that the name of each parameterized test will be prefixed with "expect" and have parentheses wrapped around the value of the first parameter (at index 0) supplied to each test. This information can only be garnered by studying the test setup itself. Once the pattern is determined and the specific unique name of the parameterized test is identified, add it to the "specific" property on the OptOut annotation in addition to the other arguments.
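As a rough illustration of how such a name template expands, the sketch below fills the "expect({0})" pattern using java.text.MessageFormat, which is the placeholder style the JUnit 4 Parameterized runner follows. The class name and the parameter value "someTraversal" are hypothetical and used only for illustration; the actual names must still be read from the test in question.

```java
import java.text.MessageFormat;

class ParameterizedNameSketch {
    // Name template copied from the @Parameterized.Parameters annotation above.
    static final String NAME_TEMPLATE = "expect({0})";

    // Expands the template for one set of test parameters.
    static String specificName(Object... parameters) {
        return MessageFormat.format(NAME_TEMPLATE, parameters);
    }

    public static void main(String[] args) {
        // "someTraversal" is a hypothetical first parameter value.
        System.out.println(specificName("someTraversal"));
    }
}
```

The resulting string is the kind of value that would be supplied to the OptOut annotation to single out one parameterized case.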

These annotations give users a level of transparency into test suite compliance (via the describeGraph() utility function). They also give implementers a lot of flexibility in how they wish to support TinkerPop. For example, maybe there is a single test case that prevents an implementer from claiming support of a Feature. The implementer could choose to either not support the Feature or to support it but "opt-out" of the test with a "reason" as to why so that users understand the limitation.

Important
Before using OptOut, be sure that the reason for using it is sound and treat it as a last resort. It is possible that a test from the suite doesn’t properly represent the expectations of a feature, is too broad or narrow for the semantics it is trying to enforce, or simply contains a bug. Please consider raising such concerns on the developer mailing list before assuming OptOut is the only answer.
Important
There are no tests that specifically validate complete compliance with Gremlin Server. Generally speaking, a Graph that passes the full Test Suite should be compliant with Gremlin Server. The one area where problems can occur is in serialization. Always ensure that IO is properly implemented and that custom serializers are tested fully, and ultimately integration test the Graph with an actual Gremlin Server instance.
Warning
Configuring tests to run in parallel might result in errors that are difficult to debug as there is some shared state in test execution around graph configuration. It is therefore recommended that parallelism be turned off for the test suite (the Maven Surefire Plugin is configured this way by default). It may also be important to include the setting <reuseForks>false</reuseForks> in the Surefire configuration if tests are failing in an inexplicable way.
Warning
For graph implementations that require a schema, take note that TinkerPop tests were originally developed without too much concern for these types of graphs. While most tests utilize the standard toy graphs there are instances where tests will utilize their own independent schema that stands alone from all other tests. It may be necessary to create schemas specific to certain tests in those situations.
Tip
When running the gremlin-test suite against your implementation, you may need to set build.dir as a system property, depending on your project layout. Some tests require this to find a writable directory for creating temporary files. The value is typically set to the project build directory. For example, with the Maven Surefire Plugin, this is done via the argLine configuration with -Dbuild.dir=${project.build.directory}.
Checking Resource Leaks

The TinkerPop query engine retrieves data by interfacing with the provider using iterators. These iterators (depending on the provider) may hold resources in the underlying storage layer, so it is critical to close them after the query is finished.

TinkerPop provides you with the ability to test for such resource leaks by checking for leaks when you run the Gremlin-Test suites against your implementation. To enable this leak detection, providers should increment the StoreIteratorCounter whenever a resource is opened and decrement it when it is closed. A reference implementation is provided with TinkerGraph as TinkerGraphIterator.java.

Assertions for leak detection are enabled by default when running the test suite. They can be temporarily disabled by way of a system property: simply set -DtestIteratorLeaks=false.
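The increment-on-open and decrement-on-close bookkeeping described above can be sketched with plain JDK types. In the sketch below, OPEN_ITERATORS is a hypothetical stand-in for TinkerPop's StoreIteratorCounter, and CountingIterator is an illustrative wrapper rather than the actual TinkerGraphIterator; a real provider would report to the counter supplied by TinkerPop instead.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class LeakCountingSketch {
    // Hypothetical stand-in for TinkerPop's StoreIteratorCounter.
    static final AtomicLong OPEN_ITERATORS = new AtomicLong();

    // Iterator wrapper that counts itself as open until close() is called.
    static final class CountingIterator<T> implements Iterator<T>, AutoCloseable {
        private final Iterator<T> delegate;
        private boolean closed;

        CountingIterator(Iterator<T> delegate) {
            this.delegate = delegate;
            OPEN_ITERATORS.incrementAndGet(); // a storage resource was opened
        }

        @Override
        public boolean hasNext() {
            return !closed && delegate.hasNext();
        }

        @Override
        public T next() {
            return delegate.next();
        }

        @Override
        public void close() {
            if (!closed) { // guard against decrementing twice
                closed = true;
                OPEN_ITERATORS.decrementAndGet();
            }
        }
    }

    public static void main(String[] args) {
        try (CountingIterator<String> it = new CountingIterator<>(List.of("a", "b").iterator())) {
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
        // A non-zero value here would indicate an iterator that was never closed.
        System.out.println("open iterators: " + OPEN_ITERATORS.get());
    }
}
```

Making close() idempotent keeps the count accurate even when both the query engine and the provider close the same iterator.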

Gherkin Test Suite

The Gherkin Test Suite is a language agnostic set of tests that verify Gremlin semantics. It provides a unified set of tests that validate many TinkerPop components internally. The tests themselves can be found in gremlin-tests/features (here) with their syntax described in the TinkerPop Developer Documentation.

TinkerPop provides some infrastructure, for JVM based graphs, to help make it easier for providers to implement these tests against their implementations. This infrastructure is built on cucumber-java which is a dependency of gremlin-test. There are two main components to implementing the tests:

  1. An org.apache.tinkerpop.gremlin.features.World implementation (World is provided by gremlin-test).

  2. A JUnit test class that will act as the runner for the tests with the appropriate annotations.

Tip
It may be helpful to get familiar with Cucumber before proceeding with an implementation.

The World implementation provides context to the tests and allows providers to intercept test events that might be important to proper execution specific to their implementations. The most important part of implementing World is properly implementing the GraphTraversalSource getGraphTraversalSource(GraphData) method, which supplies the GraphTraversalSource that each test executes against.

The JUnit test class is really just the test runner. It is a simple class which must include some Cucumber annotations. The following is just an example as taken from TinkerGraph:

@RunWith(Cucumber.class)
@CucumberOptions(
        tags = "not @RemoteOnly",
        glue = { "org.apache.tinkerpop.gremlin.features" },
        features = { "classpath:/org/apache/tinkerpop/gremlin/test/features" },
        plugin = {"progress", "junit:target/cucumber.xml"},
        objectFactory = GuiceFactory.class)

The @CucumberOptions that are used are mostly implementation specific, so it will be up to the provider to make some choices as to what is right for their environment. For TinkerGraph, it needed to ignore Gherkin tests with the @RemoteOnly tag (the full list of possible tags can be found here), as will most providers. The "glue" will be the same for all test implementers as it refers to a package containing TinkerPop’s test infrastructure in gremlin-test (unless of course, a provider needs to develop their own infrastructure for some reason). The "features" is the path to the actual Gherkin test files that should be made available locally. The files can be referenced on the classpath assuming gremlin-test is a dependency. The "plugin" defines a JUnit style output, which happens to be understood by Maven.

The "objectFactory" is the last component. Cucumber relies on dependency injection to get a World implementation into the test infrastructure. Providers may choose from multiple available implementations, but TinkerPop chose to use Guice. To follow this approach include the following module:

<dependency>
    <groupId>com.google.inject</groupId>
    <artifactId>guice</artifactId>
    <version>4.2.3</version>
    <scope>test</scope>
</dependency>

Following the Neo4jGraph implementation, there are two classes to construct:

public class ServiceModule extends AbstractModule {
    @Override
    protected void configure() {
        bind(World.class).to(Neo4jGraphWorld.class);
    }
}

public class WorldInjectorSource implements InjectorSource {
    @Override
    public Injector getInjector() {
        return Guice.createInjector(Stage.PRODUCTION, CucumberModules.createScenarioModule(), new ServiceModule());
    }
}

The key here is that the Neo4jGraphWorld implementation gets bound to World in the ServiceModule and there is a WorldInjectorSource that specifies the ServiceModule to Cucumber. As a final step, the provider’s test resources need a cucumber.properties file with an entry that specifies the InjectorSource so that Guice can find it. Here is the entry from the Neo4jGraph implementation, where the WorldInjectorSource is an inner class of Neo4jGraphFeatureTest itself.

guice.injector-source=org.apache.tinkerpop.gremlin.neo4j.Neo4jGraphFeatureTest$WorldInjectorSource

In the event that a single World configuration is insufficient, it may be necessary to develop a custom ObjectFactory. An easy way to do this is to create a class that extends from the AbstractGuiceFactory in gremlin-test and provide that class to the @CucumberOptions. This approach does rely on the ServiceLoader, which means it will be important to include an io.cucumber.core.backend.ObjectFactory file in META-INF/services with an entry that registers the custom implementation. Please see the TinkerGraph test code for further information on this approach.

If implementing the Gherkin tests, providers can choose to opt-in to the slimmed down version of the normal JVM process test suite to help alleviate test duplication between the two frameworks:

@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_LIMITED_STANDARD)
@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_LIMITED_COMPUTER)

Accessibility via GremlinPlugin

The applications distributed with TinkerPop are not distributed with any graph system implementations besides TinkerGraph. If your implementation is stored in a Maven repository (e.g. Maven Central Repository), then it is best to provide a GremlinPlugin implementation so the respective jars can be downloaded as and when required by the user. Neo4j’s GremlinPlugin, Neo4jGremlinPlugin (neo4j-gremlin/src/main/java/org/apache/tinkerpop/gremlin/neo4j/jsr223/Neo4jGremlinPlugin.java), serves as a reference.

With such a plugin implementation, users can download the respective binaries for Gremlin Console, Gremlin Server, etc.

gremlin> g = Neo4jGraph.open('/tmp/neo4j')
No such property: Neo4jGraph for class: groovysh_evaluate
Display stack trace? [yN]
gremlin> :install org.apache.tinkerpop neo4j-gremlin 4.0.0-SNAPSHOT
==>loaded: [org.apache.tinkerpop, neo4j-gremlin, …]
gremlin> :plugin use tinkerpop.neo4j
==>tinkerpop.neo4j activated
gremlin> g = Neo4jGraph.open('/tmp/neo4j')
==>neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]]

In-Depth Implementations

The graph system implementation details presented thus far are the minimum requirements necessary to yield a valid TinkerPop implementation. However, there are other areas that a graph system provider can tweak to provide an implementation more optimized for their underlying graph engine. Typical areas of focus include:

  • Traversal Strategies: A TraversalStrategy can be used to alter a traversal prior to its execution. A typical example is converting a pattern of g.V().has('name','marko') into a global index lookup for all vertices with name "marko". In this way, an O(|V|) lookup becomes an O(log(|V|)) lookup. Please review TinkerGraphStepStrategy for ideas.

  • Step Implementations: Every step is ultimately referenced by the GraphTraversal interface. It is possible to extend GraphTraversal to use a graph system specific step implementation. Note that while it is sometimes possible to develop custom step implementations by extending from a TinkerPop step (typically, AddVertexStep and other Mutating steps), it’s important to consider that doing so introduces some greater risk for code breaks on upgrades as opposed to other areas of the code base. As steps are more internal features of TinkerPop, they might be subject to breaking API and behavioral changes that would be less likely to be accepted by more public facing interfaces.
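To make the benefit of the index rewrite from the Traversal Strategies item concrete, the following JDK-only sketch contrasts a full scan with a sorted-map lookup. It does not use the TraversalStrategy API at all; the class IndexLookupSketch, its vertex stand-in V, and the TreeMap-based index are assumptions chosen purely to show the two access paths that a strategy would choose between.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class IndexLookupSketch {
    // Minimal stand-in for a vertex: an id and a "name" property.
    static final class V {
        final int id;
        final String name;

        V(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    private final List<V> vertices = new ArrayList<>();
    private final TreeMap<String, List<V>> nameIndex = new TreeMap<>();

    // Stores the vertex and keeps the name index up to date.
    void add(V v) {
        vertices.add(v);
        nameIndex.computeIfAbsent(v.name, k -> new ArrayList<>()).add(v);
    }

    // Unoptimized plan: visit every vertex and filter, O(|V|).
    List<V> scan(String name) {
        List<V> out = new ArrayList<>();
        for (V v : vertices) {
            if (v.name.equals(name)) {
                out.add(v);
            }
        }
        return out;
    }

    // Optimized plan: a sorted-map lookup, O(log(|V|)) to locate the key.
    List<V> lookup(String name) {
        return nameIndex.getOrDefault(name, List.of());
    }
}
```

Both methods return the same vertices; a provider strategy would detect the has('name', ...) pattern and route it to the equivalent of lookup.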

Graph Driver Provider Requirements


One of the roles for Gremlin Server is to provide a bridge from TinkerPop to non-JVM languages (e.g. Go, Python, etc.). Developers can build language bindings (or drivers) that provide a way to submit Gremlin scripts to Gremlin Server and get back results. Given the extensible nature of Gremlin Server, it is difficult to provide an authoritative guide to developing a driver. It is however possible to describe the core communication protocol using the standard out-of-the-box configuration which should provide enough information to develop a driver for a specific language.

Gremlin Server is distributed with a configuration that utilizes HTTP with a custom API. Under this configuration, Gremlin Server accepts requests containing a Gremlin script, evaluates that script and then streams back the results in HTTP chunks.

Let’s use the incoming request to process the Gremlin script of g.V() as an example. Gremlin Server evaluates that script, getting an Iterator of vertices as a result, and steps through each Vertex within it. The vertices are batched together into an HTTP chunk. Each response is serialized given the requested serializer type (GraphBinary is recommended) and written back to the requesting client immediately. Gremlin Server does not wait for the entire result to be iterated before sending back a response. It will send the responses as they are realized.

This approach allows for the processing of large result sets without having to serialize the entire result into memory for the response. It places a bit of a burden on the developer of the driver, however, because it becomes necessary to provide a way to reconstruct the entire result on the client side from all of the individual responses that Gremlin Server returns for a single request. Again, this description of Gremlin Server’s "flow" is related to the out-of-the-box configuration. It is quite possible to construct other flows that might be more amenable to a particular language or style of processing.

Note
TinkerPop provides a test server which may be useful for testing drivers. Details can be found here

It is recommended but not required that a driver include a User-Agent header as part of any HTTP request to Gremlin Server. Gremlin Server uses the user agent in building usage metrics as well as debugging. The standard format for connection user agents is:

"[Application Name] [GLV Name].[Version] [Language Runtime Version] [OS].[Version] [CPU Architecture]"

For example:

"MyTestApplication Gremlin-Java.3.5.4 11.0.16.1 Mac_OS_X.12.6.1 aarch64"

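A driver could assemble a user agent in the format above along the following lines. This is only a sketch: the class and method names are hypothetical, and replacing spaces in the OS name with underscores is an assumption inferred from the "Mac_OS_X" example rather than a documented rule.

```java
class UserAgentSketch {
    // Joins the five space-separated fields of the documented user agent format.
    static String format(String app, String glv, String glvVersion, String runtimeVersion,
                         String os, String osVersion, String arch) {
        return String.join(" ",
                app,
                glv + "." + glvVersion,
                runtimeVersion,
                // Assumption: spaces in the OS name become underscores, as in "Mac_OS_X".
                os.replace(' ', '_') + "." + osVersion,
                arch);
    }

    // Fills the runtime, OS, and CPU fields from standard JVM system properties.
    static String forThisJvm(String app, String glv, String glvVersion) {
        return format(app, glv, glvVersion,
                System.getProperty("java.version"),
                System.getProperty("os.name"),
                System.getProperty("os.version"),
                System.getProperty("os.arch"));
    }

    public static void main(String[] args) {
        System.out.println(forThisJvm("MyTestApplication", "Gremlin-Java", "3.5.4"));
    }
}
```

Drivers for other languages would substitute their own runtime and platform lookups for the JVM system properties used here.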
The following section provides an in-depth description of the TinkerPop HTTP API. The HTTP API is used for communicating the requests and responses that were described earlier.

HTTP API

This section describes the TinkerPop HTTP API which should be implemented by both graph system providers and graph driver providers. There is only one endpoint that currently needs to be supported which is POST /gremlin. This endpoint is a Gremlin evaluator which takes in a Gremlin script request and responds with the serialized results. The formats below use a bit of pseudo-JSON to help represent request and response bodies. The actual format of the request and response bodies will be determined by the serializers defined via the "Accept" and "Content-Type" headers. As a result, a generic type definition in this document like "number" could translate to a "long" for a serializer that supports such types, like GraphBinary.

HTTP Request

To formulate a request to Gremlin Server, a RequestMessage needs to be constructed. The RequestMessage is a generalized representation of a request. This message can be serialized in any fashion that is supported by Gremlin Server, which by default is GraphBinary. An HTTP request that contains a RequestMessage has the following form:

POST /gremlin HTTP/1.1
Accept: <mimetype>
Content-Type: <mimetype>
Gremlin-Hints: <hints>

{
  "gremlin": string,
  "timeoutMs": number,
  "bindings": object,
  "g": string,
  "language" : string,
  "materializeProperties": string,
  "bulkResults": boolean
}

An actual, complete request might look like the following:

POST /gremlin HTTP/1.1
content-length: 61
host: 127.0.0.1
content-type: application/vnd.gremlin-v4.0+json
accept-encoding: deflate
accept: application/vnd.graphbinary-v4.0
user-agent: NotAvailable Gremlin-Java.4.0.0 11.0.25 Windows_11.10.0 amd64

{
    "gremlin": "g.V()",
    "language": "gremlin-lang"
}

Expected Request HTTP Headers

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| Accept | Serializer MIME types supported for the response. Must be a mimetype (see Serializers). | No | application/vnd.gremlin-v4.0+json;types=false |
| Accept-Encoding | The requested compression algorithm of the response. Valid values: deflate. | No | N/A |
| Authorization | Header used with Basic authorization. | No | N/A |
| Content-Length | The size of the payload. | Yes | N/A |
| Content-Type | The MIME type of the serialized body. | No | None |
| Gremlin-Hints | A semi-colon separated list of key/value pair metadata that could be helpful to the server in processing a particular request in some way. Must be a hints value (see table below). | No | N/A |
| User-Agent | The user agent. Follow the format specified by user agent format. | No | user agent format |

Request Header Value Options

| Name | Options |
| --- | --- |
| mimetype | A MIME type listed in Serializers. |
| hints | mutations: yes, no, unknown - Indicates if the Gremlin contains steps that can mutate the graph. |

The body of the request should be a RequestMessage which is a Map. The RequestMessage should be serialized using the serializer specified by the Content-Type header. The following are the key value pairs allowed in a RequestMessage:

Request Message Format

| Key | Description | Value | Required |
| --- | --- | --- | --- |
| gremlin | The Gremlin query to execute. | String containing script | Yes |
| timeoutMs | The maximum time a query is allowed to execute in milliseconds. | Number between 0 and 2^31-1 | No |
| bindings | A map used during query execution. Its usage depends on "language". For "gremlin-groovy", these are the variable bindings. For "gremlin-lang", these are the parameter bindings. | Object (Map) | No |
| g | The name of the graph traversal source to which the query applies. Default: "g" | String containing traversal source name | No |
| language | The name of the ScriptEngine to use to parse the gremlin query. Default: "gremlin-lang" | String containing ScriptEngine name | No |
| materializeProperties | Whether to include all properties for results. One of "tokens" or "all". | String | No |
| bulkResults | Whether the results should be bulked by the server (only applies to GraphBinary). | Boolean | No |
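For JVM-based drivers, the request described above could be assembled with the JDK's java.net.http API roughly as follows. This is a sketch under stated assumptions: the class name is hypothetical, the JSON body is concatenated by hand only to keep the example dependency-free, and untyped GraphSON is requested via the Accept header simply so that the response is plain JSON. The request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class GremlinRequestSketch {
    // Builds (but does not send) a POST /gremlin request for a gremlin-lang script.
    static HttpRequest build(String host, int port, String gremlin) {
        // Hand-rolled JSON for brevity; real code should use a JSON library for escaping.
        String escaped = gremlin.replace("\\", "\\\\").replace("\"", "\\\"");
        String body = "{\"gremlin\":\"" + escaped + "\",\"language\":\"gremlin-lang\"}";
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + host + ":" + port + "/gremlin"))
                .version(HttpClient.Version.HTTP_1_1)
                .header("Content-Type", "application/vnd.gremlin-v4.0+json")
                .header("Accept", "application/vnd.gremlin-v4.0+json;types=false")
                // Content-Length is computed by the HTTP client from the body publisher.
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("localhost", 8182, "g.V()");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The request can then be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) against a running server.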

HTTP Response

When Gremlin Server receives that request, it will decode it given the "mime type" and execute it using the ScriptEngine specified by the language field. In this case, it will evaluate the script g.V() and stream back the results in HTTP chunks. When the chunks are combined, they will form a single ResponseMessage. The HTTP response containing the ResponseMessage has the following form:

HTTP/1.1 200
Content-Type: <mimetype>
Transfer-Encoding: chunked
Gremlin-RequestId: <uuid>

{
  "result": list,
  "status": object
}
Note
While this response message is expected for all serialized responses, there may be some errors that are not serialized. In that case, the Content-Type of the response should be application/json and the JSON should contain a message key.
Response Message Format

result: A map that contains the result data.

| Name | Description | Value | Required |
| --- | --- | --- | --- |
| data | A list of result objects. | Array | Yes |

status: A map that contains the status of the result.

| Name | Description | Value | Required |
| --- | --- | --- | --- |
| code | The actual status code of the result. | Number | Yes |
| exception | A class of exception if an error occurred. | String | No |
| message | The error message if an error occurred. | String | No |

Expected Response HTTP Headers

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| Content-Type | The MIME type of the serialized body which is based on the request’s Accept header. May also be "application/json". | Yes | N/A |
| Gremlin-RequestId | The server generated UUID that is used as a request ID. | Yes | N/A |
| Transfer-Encoding | The server should attempt to chunk all responses. | No | "chunked" |

Response Header Value Options

| Name | Options |
| --- | --- |
| mimetype | A MIME type listed in Serializers. |
| uuid | A randomly generated UUID string. |

Response Status Codes

The following table details the HTTP status codes that Gremlin Server will send:

| Code | Name | Description |
| --- | --- | --- |
| 200 | SUCCESS | The server successfully processed a request to completion - there are no messages remaining in this stream. |
| 204 | NO CONTENT | The server processed the request but there is no result to return (e.g. an Iterator with no elements) - there are no messages remaining in this stream. |
| 206 | PARTIAL CONTENT | The server successfully returned some content, but there is more in the stream to arrive - wait for a SUCCESS to signify the end of the stream. |
| 400 | BAD REQUEST | There was a problem with the HTTP request. |
| 401 | UNAUTHORIZED | The request attempted to access resources that the requesting user did not have access to. |
| 403 | FORBIDDEN | The server could authenticate the request, but will not fulfill it. |
| 404 | NOT FOUND | The server was unable to find the requested resource. |
| 405 | METHOD NOT ALLOWED | The request used an unsupported method. The server only supports POST. |
| 413 | REQUEST ENTITY TOO LARGE | The request was too large or the query could not be compiled due to size limitations. |
| 500 | INTERNAL SERVER ERROR | A general server error occurred that prevented the request from being processed. |
| 505 | HTTP VERSION NOT SUPPORTED | A server error indicating that an unsupported version of HTTP is being used. Only HTTP/1.1 is supported. |
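A driver might encode the table above as a small lookup so that response handling can branch on the code. The sketch below is one possible arrangement; the class and method names are hypothetical, and treating every code of 400 or above as an error and only 206 as non-terminal is a reading of the descriptions above, not a prescribed API.

```java
import java.util.Map;

class StatusCodeSketch {
    // Names copied from the status code table above.
    static final Map<Integer, String> NAMES = Map.ofEntries(
            Map.entry(200, "SUCCESS"),
            Map.entry(204, "NO CONTENT"),
            Map.entry(206, "PARTIAL CONTENT"),
            Map.entry(400, "BAD REQUEST"),
            Map.entry(401, "UNAUTHORIZED"),
            Map.entry(403, "FORBIDDEN"),
            Map.entry(404, "NOT FOUND"),
            Map.entry(405, "METHOD NOT ALLOWED"),
            Map.entry(413, "REQUEST ENTITY TOO LARGE"),
            Map.entry(500, "INTERNAL SERVER ERROR"),
            Map.entry(505, "HTTP VERSION NOT SUPPORTED"));

    // True when no further messages are expected in the stream for this request.
    static boolean isTerminal(int code) {
        return code != 206;
    }

    // True for the client and server error codes listed in the table.
    static boolean isError(int code) {
        return code >= 400;
    }

    // Human-readable form for logs and exception messages.
    static String describe(int code) {
        return code + " " + NAMES.getOrDefault(code, "UNKNOWN");
    }

    public static void main(String[] args) {
        System.out.println(describe(206) + ", terminal=" + isTerminal(206));
    }
}
```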

Trailing Headers

Error responses will have trailing headers in addition to the status object in the response body. This information is duplicated and should be the same, so graph driver providers should use whichever is easier for them. The trailers, however, will only contain the Status and Exception without the Message.

HTTP Examples

For examples of actual requests and responses, take a look at the IO documentation for GraphSON requests and GraphSON responses.

HTTP Request Interceptor

A graph driver may support HTTP request intercepting which provides a means for the user of your graph driver to update the headers and body of the HTTP request before it is sent to the server. This enables use cases where a graph system provider’s server implementation has additional capabilities that aren’t included in the base Gremlin Server. Although every graph system provider is expected to support the protocol defined by the TinkerPop HTTP API, this doesn’t preclude them from including additional functionality. Be aware that if you choose to not provide this functionality, then your graph driver may not have access to some graph provider’s features, or, possibly, it may not be able to connect at all.

Authentication and Authorization

By default, Gremlin Server only supports basic HTTP authentication. This is handled by the HttpBasicAuthenticationHandler which is the only AbstractAuthenticationHandler provided with the Gremlin Server. Other common HTTP authentication schemes that are sent via an HTTP header can be supported by implementing a custom AbstractAuthenticationHandler. Because the communication protocol is HTTP/1.1, authentication should be header-based and should not include negotiation.

When basic authentication is enabled, an incoming request is intercepted before it is evaluated by the ScriptEngine. The request is examined for an Authorization header. If one doesn’t exist, then a "401 Unauthorized" error response is returned.
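On the driver side, the Authorization header for HTTP basic authentication is the word "Basic" followed by the Base64 encoding of "username:password". A minimal sketch using only the JDK follows; the class and method names are hypothetical, and the credentials are the well-known illustrative pair from the HTTP basic authentication RFC rather than anything specific to Gremlin Server.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuthSketch {
    // Produces the value for the Authorization header under HTTP basic authentication.
    static String authorizationValue(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Illustrative credentials only.
        System.out.println(authorizationValue("Aladdin", "open sesame"));
    }
}
```

Because the credentials are only encoded and not encrypted, this header should be sent over TLS.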

In addition to authenticating users at the start of a connection, Gremlin Server allows providers to authorize users on a per-request basis. If a Java class is configured that implements the Authorizer interface, Gremlin Server passes each request to this Authorizer. The Authorizer can deny authorization for the request by throwing an exception, in which case Gremlin Server returns UNAUTHORIZED (status code 401) to the client. The Authorizer authorizes the request by returning the original request or the request with some additional constraints. Gremlin Server proceeds with the returned request and in turn returns the result of the request to the client. More details on implementing authorization can be found in the reference documentation for Gremlin Server security.

Note
While Gremlin Server supports this authorization feature it is not a feature that TinkerPop requires of graph providers as part of the agreement between client and server.

Serializers

In order to serialize and deserialize the requests and responses, your graph driver will need to implement GraphBinary. The Gremlin Server is capable of returning both GraphBinary and GraphSON; however, GraphBinary is a more compact format, which can improve performance because fewer bytes need to be sent over the wire. For this reason, drivers only need to support GraphBinary. GraphSON can be used by applications that only support JSON serialization.

The following table lists the serializers supported by the Gremlin Server and their MIME types. These MIME types should be used in the Content-Type and Accept HTTP headers.

Name | Description | MIME type
--- | --- | ---
Untyped GraphSON 4.0 | A JSON-based graph format | application/vnd.gremlin-v4.0+json;types=false
Typed GraphSON 4.0 | A JSON-based graph format with embedded type information used for serialization | application/vnd.gremlin-v4.0+json;types=true
GraphBinary 4.0 | A binary graph format | application/vnd.graphbinary-v4.0
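
As an illustration of how these MIME types are applied, the sketch below builds (without sending) an HTTP POST using the JDK's java.net.http client, available from Java 11. The class and method names are hypothetical, and the request body is left to the caller: a real driver would pass GraphBinary-serialized request bytes.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class SerializerHeaders {
    static final String GRAPHBINARY_V4 = "application/vnd.graphbinary-v4.0";
    static final String GRAPHSON_V4_UNTYPED = "application/vnd.gremlin-v4.0+json;types=false";

    /** Builds, but does not send, a POST whose body and expected response are GraphBinary 4.0. */
    static HttpRequest graphBinaryRequest(URI endpoint, byte[] serializedRequest) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", GRAPHBINARY_V4) // format of the request body
                .header("Accept", GRAPHBINARY_V4)       // format requested for the response
                .POST(HttpRequest.BodyPublishers.ofByteArray(serializedRequest))
                .build();
    }
}
```

An application limited to JSON would use one of the GraphSON MIME types in the same two headers.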

IO Tests

The IO test suite is a collection of files that contain the expected outcome of serialization of certain types. These tests can be used to determine if a particular serializer has been correctly implemented. In general, a driver should be able to "round trip" each of these types. That is, it should be able to both read from and write to those exact same bytes. Not all programming languages provide library types that will match the specification of the corresponding type defined by the serializer. In this case, it is not possible to completely round trip that type and you may skip that test. The GraphBinary test files can be found here. The Java implementation can be used as a reference on how these files can be used and its model shows the Java representation of those files.

Gremlin Plugins

gremlin plugin

Plugins provide a way to expand the features of a GremlinScriptEngine, which stands at the core of both Gremlin Console and Gremlin Server. Providers may wish to create plugins for a variety of reasons, but some common examples include:

  • Initialize the GremlinScriptEngine application with important classes so that the user doesn’t need to type their own imports.

  • Place specific objects in the bindings of the GremlinScriptEngine for the convenience of the user.

  • Bootstrap the GremlinScriptEngine with custom functions so that they are ready for usage at startup.

The first step to developing a plugin is to implement the GremlinPlugin interface:


package org.apache.tinkerpop.gremlin.jsr223;

import java.util.Optional;

/**
 * A plugin interface that is used by the {@link GremlinScriptEngineManager} to configure special {@link Customizer}
 * instances that will alter the features of any {@link GremlinScriptEngine} created by the manager itself.
 *
 * @author Stephen Mallette (http://stephen.genoprime.com)
 */
public interface GremlinPlugin {
    /**
     * The name of the module.  This name should be unique (use a namespaced approach) as naming clashes will
     * prevent proper module operations. Modules developed by TinkerPop will be prefixed with "tinkerpop."
     * For example, TinkerPop's implementation of Spark would be named "tinkerpop.spark".  If Facebook were
     * to do their own implementation the implementation might be called "facebook.spark".
     */
    public String getName();

    /**
     * Some modules may require a restart of the plugin host for the classloader to pick up the features.  This is
     * typically true of modules that rely on {@code Class.forName()} to dynamically instantiate classes from the
     * root classloader (e.g. JDBC drivers that instantiate via {@code DriverManager}).
     */
    public default boolean requireRestart() {
        return false;
    }

    /**
     * Gets the list of all {@link Customizer} implementations to assign to a new {@link GremlinScriptEngine}. This is
     * the same as doing {@code getCustomizers(null)}.
     */
    public default Optional<Customizer[]> getCustomizers(){
        return getCustomizers(null);
    }

    /**
     * Gets the list of {@link Customizer} implementations to assign to a new {@link GremlinScriptEngine}. The
     * implementation should filter the returned {@code Customizers} according to the supplied name of the
     * Gremlin-enabled {@code ScriptEngine}. By providing a filter, {@code GremlinModule} developers can have the
     * ability to target specific {@code ScriptEngines}.
     *
     * @param scriptEngineName The name of the {@code ScriptEngine} or null to get all the available {@code Customizers}
     */
    public Optional<Customizer[]> getCustomizers(final String scriptEngineName);
}

The simplest plugin, and the one most commonly implemented, will likely be one that just provides a list of classes for import. This type of plugin is the easiest way for implementers of the TinkerPop Structure and Process APIs to make their implementations available to users. The TinkerGraph implementation has just such a plugin:


package org.apache.tinkerpop.gremlin.tinkergraph.jsr223;

import org.apache.tinkerpop.gremlin.jsr223.AbstractGremlinPlugin;
import org.apache.tinkerpop.gremlin.jsr223.DefaultImportCustomizer;
import org.apache.tinkerpop.gremlin.jsr223.ImportCustomizer;
import org.apache.tinkerpop.gremlin.tinkergraph.process.computer.TinkerGraphComputer;
import org.apache.tinkerpop.gremlin.tinkergraph.process.computer.TinkerGraphComputerView;
import org.apache.tinkerpop.gremlin.tinkergraph.process.computer.TinkerMapEmitter;
import org.apache.tinkerpop.gremlin.tinkergraph.process.computer.TinkerMemory;
import org.apache.tinkerpop.gremlin.tinkergraph.process.computer.TinkerMessenger;
import org.apache.tinkerpop.gremlin.tinkergraph.process.computer.TinkerReduceEmitter;
import org.apache.tinkerpop.gremlin.tinkergraph.process.computer.TinkerWorkerPool;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerEdge;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerElement;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraphVariables;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerHelper;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV1;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV2;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV3;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV4;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerProperty;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertexProperty;

/**
 * @author Stephen Mallette (http://stephen.genoprime.com)
 */
public final class TinkerGraphGremlinPlugin extends AbstractGremlinPlugin {
    private static final String NAME = "tinkerpop.tinkergraph";

    private static final ImportCustomizer imports = DefaultImportCustomizer.build()
            .addClassImports(TinkerEdge.class,
                    TinkerElement.class,
                    TinkerFactory.class,
                    TinkerGraph.class,
                    TinkerGraphVariables.class,
                    TinkerHelper.class,
                    TinkerIoRegistryV1.class,
                    TinkerIoRegistryV2.class,
                    TinkerIoRegistryV3.class,
                    TinkerIoRegistryV4.class,
                    TinkerProperty.class,
                    TinkerVertex.class,
                    TinkerVertexProperty.class,
                    TinkerGraphComputer.class,
                    TinkerGraphComputerView.class,
                    TinkerMapEmitter.class,
                    TinkerMemory.class,
                    TinkerMessenger.class,
                    TinkerReduceEmitter.class,
                    TinkerWorkerPool.class).create();

    private static final TinkerGraphGremlinPlugin instance = new TinkerGraphGremlinPlugin();

    public TinkerGraphGremlinPlugin() {
        super(NAME, imports);
    }

    public static TinkerGraphGremlinPlugin instance() {
        return instance;
    }
}

This plugin extends from the abstract base class of AbstractGremlinPlugin which provides some default implementations of the GremlinPlugin methods. It simply allows those who extend from it to be able to just supply the name of the module and a list of Customizer instances to apply to the GremlinScriptEngine. In this case, the TinkerGraph plugin just needs an ImportCustomizer which describes the list of classes to import when the plugin is activated and applied to the GremlinScriptEngine.

The ImportCustomizer is just one of several provided Customizer implementations that can be used in conjunction with plugin development:

Individual GremlinScriptEngine instances may have their own Customizer instances that can be used only with that engine - e.g. gremlin-groovy has some that are specific to controlling the Groovy compiler configuration. Developing a new Customizer implementation is not really possible without changes to TinkerPop, as the framework is not designed to respond to external ones. The base Customizer implementations listed above should cover most needs.

A GremlinPlugin must support one of two instantiation models so that it can be instantiated from configuration files for use in various situations - e.g. Gremlin Server. The first option is to use a static initializer given a method with the following signature:

public static GremlinPlugin instance()

The limitation with this approach is that it does not provide a way to supply any configuration to the plugin so it tends to only be useful for fairly simplistic plugins. The more advanced approach is to provide a "builder" given a method with the following signature:

public static Builder build()

It doesn’t really matter what kind of class is returned from build so long as it follows a "Builder" pattern, where methods on that object return an instance of itself, so that builder methods can be chained together prior to calling a final create method as follows:

public GremlinPlugin create()

Please see the ImportGremlinPlugin for an example of what implementing a Builder might look like in this context.
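
The shape of the builder approach can be sketched as follows. This is not TinkerPop code: PluginStandIn is a stand-in for the GremlinPlugin interface so that the example compiles on its own, and GreetingPlugin, its greeting option, and the "example.greeting" name are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.apache.tinkerpop.gremlin.jsr223.GremlinPlugin, reduced to one method.
// A real plugin would implement the TinkerPop interface instead.
interface PluginStandIn {
    String getName();
}

final class GreetingPlugin implements PluginStandIn {
    private final List<String> greetings;

    private GreetingPlugin(Builder builder) {
        this.greetings = List.copyOf(builder.greetings);
    }

    @Override
    public String getName() {
        return "example.greeting"; // namespaced as namespace.plugin-name
    }

    List<String> greetings() {
        return greetings;
    }

    /** Static entry point with the build() signature described above. */
    public static Builder build() {
        return new Builder();
    }

    public static final class Builder {
        private final List<String> greetings = new ArrayList<>();

        private Builder() {}

        /** Configuration methods return the builder itself so that calls can be chained. */
        public Builder greeting(String greeting) {
            greetings.add(greeting);
            return this;
        }

        /** Final call in the chain: produces the configured plugin. */
        public GreetingPlugin create() {
            return new GreetingPlugin(this);
        }
    }
}
```

A caller would then chain the configuration, for example GreetingPlugin.build().greeting("hello").create().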

Note that the plugin provides a unique name for the plugin which follows a namespaced pattern as namespace.plugin-name (e.g. "tinkerpop.hadoop" - "tinkerpop" is the reserved namespace for TinkerPop maintained plugins).

For plugins that will work with Gremlin Console, there is one other step to follow to ensure that the GremlinPlugin will work there. The console loads GremlinPlugin instances via ServiceLoader and therefore needs a resource file added to the jar file where the plugin exists. Add a file called org.apache.tinkerpop.gremlin.jsr223.GremlinPlugin to META-INF/services. In the case of the TinkerGraph plugin above, that file will have this line in it:

org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin

Once the plugin is packaged, there are two ways to test it out:

  1. Copy the jar and its dependencies to the Gremlin Console path and start it. It is preferable that the plugin is copied to the /ext/plugin_name directory.

  2. Start Gremlin Console and try the :install command: :install com.company my-plugin 1.0.0.

In either case, once one of these two approaches is taken, the jars and their dependencies are available to the Console. The next step is to "activate" the plugin by doing :plugin use my-plugin, where "my-plugin" refers to the name of the plugin to activate.

Note
When :install is used logging dependencies related to SLF4J are filtered out so as not to introduce multiple logger bindings (which generates warning messages to the logs).

Plugins can also tie into the :remote and :submit commands. Recall that a :remote represents a different context within which Gremlin is executed, when issued with :submit. It is encouraged to use this integration point when possible, as opposed to registering new commands that can otherwise follow the :remote and :submit pattern. To expose this integration point as part of a plugin, implement the RemoteAcceptor interface.

Tip
Be good to the users of plugins and prevent dependency conflicts. Maintaining a conflict free plugin is most easily done by using the Maven Enforcer Plugin.
Tip
Consider binding the plugin’s minor version to the TinkerPop minor version so that it’s easy for users to figure out plugin compatibility. Otherwise, clearly document a compatibility matrix for the plugin somewhere that users can find it.

The RemoteAcceptor can be bound to a GremlinPlugin by adding a ConsoleCustomizer implementation to the list of Customizer instances that are returned from the GremlinPlugin. The ConsoleCustomizer will only be executed when in use with the Gremlin Console plugin host. Simply instantiate and return a RemoteAcceptor in the ConsoleCustomizer.getRemoteAcceptor(GremlinShellEnvironment) method. Generally speaking, each call to getRemoteAcceptor(GremlinShellEnvironment) should produce a new instance of a RemoteAcceptor.

Gremlin Semantics

tinkerpop meeting room

This section provides details on Gremlin language operational semantics. Describing these semantics and reinforcing them with tests in the Gremlin test suite makes it easier for providers to implement the language and for the TinkerPop Community to have better consistency in their user experiences. While the general Gremlin test suite offers an integrated approach to testing Gremlin queries, the @StepClassSemantics oriented tests found here are especially focused on the definitions found in this section. References to the location of specific tests can be found throughout the sub-sections below.

Types

The TinkerPop query execution runtime is aligned with Java primitives and handles the following primitive types:

  • Boolean

  • Integer

    • Byte (int8)

    • Short (int16)

    • Integer (int32)

    • Long (int64)

    • BigInteger

  • Decimal

    • Float (32-bit) (including +/-Infinity and NaN)

    • Double (64-bit) (including +/-Infinity and NaN)

    • BigDecimal

  • String

  • UUID

  • Date

  • nulltype

    • Has only one value in its type space - the "undefined" value null

Note
TinkerPop has a bit of a JVM-centric view of types as it was developed within that ecosystem.

Graph providers may not support all of these types depending on their architecture and implementation. TinkerPop must therefore provide a way for graph providers to override its default behavior. When some types are not supported, graph providers need to map the unsupported types onto supported types internally. This mapping can be done in either an information-preserving or a non-preserving manner. Graph providers must declare through Graph.Features which mapping they support as well as which types they support.

  • Which primitive types are supported

    • Boolean, Integer, Float, String, UUID and Date

    • TinkerPop by default supports all of them

  • Which integer types are supported

    • TinkerPop by default supports int8 (Byte), int16 (Short), int32 (Integer), int64 (Long) and BigInteger in Java

  • Which float types are supported

    • TinkerPop by default supports all as float, double, and BigDecimal in Java

In addition to these, there are composite types as follows:

  • Graph Element

    • Vertex

    • Edge

    • VertexProperty

  • Property (edge properties and meta properties)

    • (Key, Value) pair

    • Key is String, Value is any of the primitive types defined above

  • Path

  • List

  • Map

  • Map.Entry

  • Set

Numeric Type Promotion

TinkerPop performs type promotion (a.k.a. type casting) for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. In general, numbers are compared using semantic equivalence, without regard for their specific type, e.g. 1 == 1.0.

Numeric types are promoted as follows:

  • First determine whether to use floating point or not. If any numbers in the comparison are floating point then we convert all of them to floating point.

  • Next determine the maximum bit size of the numerics being compared.

  • If any floating point numbers are present:

    • If the maximum bit size is 32 (up to Integer/Float), we compare as Float

    • If the maximum bit size is 64 (up to Long/Double), we compare as Double

    • Otherwise we compare as BigDecimal

  • If no floating point numbers are present:

    • If the maximum bit size is 8 we compare as Byte

    • If the maximum bit size is 16 we compare as Short

    • If the maximum bit size is 32 we compare as Integer

    • If the maximum bit size is 64 we compare as Long

    • Otherwise we compare as BigInteger

BigDecimal and BigInteger may not be supported depending on the language and storage; therefore, the behavior of type casting for these two types can vary by graph provider.
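
The promotion rules above can be sketched as a function that picks the type in which two numbers are compared. This is an illustrative reading of the rules, not the reference implementation. The class and method names are hypothetical, and any Number subclass other than the eight listed is treated as arbitrary precision, which is an assumption of the sketch.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class NumericPromotion {
    private static boolean isFloatingPoint(Number n) {
        return n instanceof Float || n instanceof Double || n instanceof BigDecimal;
    }

    /** Bit size used for promotion; Integer.MAX_VALUE stands for arbitrary precision. */
    private static int bits(Number n) {
        if (n instanceof Byte) return 8;
        if (n instanceof Short) return 16;
        if (n instanceof Integer || n instanceof Float) return 32;
        if (n instanceof Long || n instanceof Double) return 64;
        return Integer.MAX_VALUE; // BigInteger, BigDecimal (and anything unrecognized)
    }

    /** Returns the type in which a and b would be compared under the rules above. */
    static Class<? extends Number> promotedType(Number a, Number b) {
        boolean floating = isFloatingPoint(a) || isFloatingPoint(b);
        int maxBits = Math.max(bits(a), bits(b));
        if (floating) {
            if (maxBits <= 32) return Float.class;
            if (maxBits <= 64) return Double.class;
            return BigDecimal.class;
        }
        if (maxBits <= 8) return Byte.class;
        if (maxBits <= 16) return Short.class;
        if (maxBits <= 32) return Integer.class;
        if (maxBits <= 64) return Long.class;
        return BigInteger.class;
    }
}
```

For example, an Integer compared with a Double is compared as Double, while a Byte compared with a Short is compared as Short.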

Comparability, Equality, Orderability, and Equivalence

This section of the document attempts to more clearly define the semantics for different types of value comparison within the Gremlin language and reference implementation. There are four concepts related to value comparison:

Equality

Equality semantics are used by the equality operators (P.eq/neq) and the containment operators derived from them (P.within/without). They are also used for implicit P.eq comparisons. For example, in g.V().has("age", 25), equality semantics are used to look up vertices by age when considering the value.

Comparability

Comparability semantics is used by the compare operators (P.lt/lte/gt/gte) and operators derived from them (P.inside/outside/between) and defines the semantics of how to compare two values.

Orderability

Orderability semantics defines how two values are compared in the context of an order() operation. These semantics have important differences from Comparability.

Equivalence

Equivalence semantics are slightly different from Equality and are used for operations such as dedup() and group(). Key differences include handling of numeric types and NaN.

Both Equality and Equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns nulltype or throws an exception). Similarly, Orderability can be also understood as complete - any two values can be compared without error for ordering purposes. Comparability semantics are not complete with respect to binary boolean semantics, and as such, Gremlin introduces a ternary boolean semantics for Comparability that includes a third boolean state - ERROR, with its own well-defined semantics.

Ternary Boolean Logics

When evaluating boolean value expressions, we sometimes encounter situations that cannot be proved as either TRUE or FALSE. Common ERROR cases are Comparability against NaN, cross-type Comparability (e.g. String vs Numeric), or other invalid arguments to other boolean value expressions.

Rather than throwing an exception and halting the traversal, we extend normal binary boolean logics and introduce a third boolean option - ERROR. How this ERROR result is handled is Graph provider dependent. For the reference implementation, ERROR is an internal representation only and will not be propagated back to the client as an exception - it will eventually hit a binary reduction operation and be reduced to FALSE (thus quietly filtering out the solution that produced the ERROR). Before that happens though, it will be treated as its own boolean value with its own semantics that can be used in other boolean value expressions, such as Connective predicates (P.and/or) and negation (P.not).

Ternary Boolean Semantics for AND

A | B | AND | Intuition
--- | --- | --- | ---
TRUE | TRUE | TRUE |
TRUE | FALSE | FALSE |
TRUE | ERROR | ERROR | TRUE && X == X
FALSE | TRUE | FALSE |
FALSE | FALSE | FALSE |
FALSE | ERROR | FALSE | FALSE && X == FALSE
ERROR | TRUE | ERROR | X && TRUE == X
ERROR | FALSE | FALSE | X && FALSE == FALSE
ERROR | ERROR | ERROR | X && X == X

Ternary Boolean Semantics for OR

A | B | OR | Intuition
--- | --- | --- | ---
TRUE | TRUE | TRUE |
TRUE | FALSE | TRUE |
TRUE | ERROR | TRUE | TRUE || X == TRUE
FALSE | TRUE | TRUE |
FALSE | FALSE | FALSE |
FALSE | ERROR | ERROR | FALSE || X == X
ERROR | TRUE | TRUE | X || TRUE == TRUE
ERROR | FALSE | ERROR | X || FALSE == X
ERROR | ERROR | ERROR | X || X == X

Ternary Boolean Semantics for NOT

The NOT predicate inverts TRUE and FALSE, respectively, but maintains ERROR values. The key idea is that, for an ERROR value, we can neither prove nor disprove the expression, and hence stick with ERROR.

Argument | Result
--- | ---
TRUE | FALSE
FALSE | TRUE
ERROR | ERROR
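
The three truth tables can be captured in a small enum. This is a sketch of the semantics described in this section rather than the reference implementation's internal representation. The Ternary type and its method names are hypothetical.

```java
enum Ternary {
    TRUE, FALSE, ERROR;

    /** AND: FALSE dominates, then ERROR, otherwise TRUE. */
    Ternary and(Ternary other) {
        if (this == FALSE || other == FALSE) return FALSE;
        if (this == ERROR || other == ERROR) return ERROR;
        return TRUE;
    }

    /** OR: TRUE dominates, then ERROR, otherwise FALSE. */
    Ternary or(Ternary other) {
        if (this == TRUE || other == TRUE) return TRUE;
        if (this == ERROR || other == ERROR) return ERROR;
        return FALSE;
    }

    /** NOT: inverts TRUE and FALSE, and leaves ERROR as ERROR. */
    Ternary not() {
        switch (this) {
            case TRUE:
                return FALSE;
            case FALSE:
                return TRUE;
            default:
                return ERROR;
        }
    }

    /** Binary reduction: only TRUE passes, so ERROR is reduced to false. */
    boolean reduce() {
        return this == TRUE;
    }
}
```

Because NOT preserves ERROR, a negated ERROR still reduces to false, so a filter drops the solution whether or not the failing predicate is negated.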

Equality and Comparability

Equality and Comparability can be understood to be semantically aligned with one another. As mentioned above, Equality is used for P.eq/neq (and derived predicates) and Comparability is used for P.lt/lte/gt/gte (and derived predicates). If we define Comparability using a compare() function over A and B as follows:

If (A, B) are Comparable per Gremlin semantics, then:
  For A < B,  Comparability.compare(A, B) < 0
  For A > B,  Comparability.compare(A, B) > 0
  For A == B, Comparability.compare(A, B) == 0
If (A, B) not Comparable, then:
              Comparability.compare(A, B) => ERROR

Then we can define Equality using an equals() function over A and B that acts as a strict binary reduction of Comparability.compare(A, B) == 0:

For any (A, B):
  Comparability.compare(A, B) == 0     implies Equality.equals(A, B) == TRUE
  Comparability.compare(A, B) <> 0     implies Equality.equals(A, B) == FALSE
  Comparability.compare(A, B) => ERROR implies Equality.equals(A, B) == FALSE

The following table illustrates how Equality and Comparability operate under various classes of comparison:

Class | Arguments | Comparability | Equality
--- | --- | --- | ---
Comparisons Involving NaN | (NaN, X) where X = any value, including NaN | ERROR. Comparing NaN to anything (including itself) cannot be evaluated. | FALSE
Comparisons Involving null | (null, null) | compare() == 0 | TRUE
Comparisons Involving null | (null, X) | ERROR. Since nulltype is its own type, this falls under the umbrella of cross-type comparisons. | FALSE
Comparisons within the same type family (i.e. String vs. String, Number vs. Number, etc.) | (X, Y) where X and Y of same type | Result of compare() depends on type semantics, defined below. | TRUE iff compare() == 0
Comparisons across types (i.e. String vs. Number) | (X, Y) where X and Y of different type | ERROR | FALSE
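
The relationship between the two concepts can be sketched in code, with an empty OptionalInt standing in for ERROR. The sketch is deliberately simplified. It models only null, Number, and String. It compares numbers as doubles instead of applying the full promotion rules. All names in it are hypothetical.

```java
import java.util.OptionalInt;

final class EqualityFromComparability {
    /** Simplified Comparability: -1, 0 or 1 when comparable, empty when the result is ERROR. */
    static OptionalInt compare(Object a, Object b) {
        if (a == null && b == null) return OptionalInt.of(0);
        if (a == null || b == null) return OptionalInt.empty(); // nulltype vs. another type
        if (a instanceof Number && b instanceof Number) {
            double x = ((Number) a).doubleValue();
            double y = ((Number) b).doubleValue();
            if (Double.isNaN(x) || Double.isNaN(y)) return OptionalInt.empty(); // NaN is never comparable
            return OptionalInt.of(x < y ? -1 : (x > y ? 1 : 0)); // treats -0.0 and 0.0 as equal
        }
        if (a instanceof String && b instanceof String) {
            return OptionalInt.of(Integer.signum(((String) a).compareTo((String) b)));
        }
        return OptionalInt.empty(); // cross-type, or a type this sketch does not model
    }

    /** Equality as a strict binary reduction of compare(a, b) == 0; ERROR becomes false. */
    static boolean areEqual(Object a, Object b) {
        OptionalInt result = compare(a, b);
        return result.isPresent() && result.getAsInt() == 0;
    }
}
```

Under this sketch areEqual(1, 1.0) is true, while areEqual(Double.NaN, Double.NaN) and areEqual("1", 1) are both false because their comparisons produce ERROR.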

Equality and Comparability Semantics by Type

For Equality and Comparability evaluation of values within the same type family, we define the semantics per type family as follows.

Number

Numbers are compared using type promotion, described above. As such, 1 == 1.0.

Edge cases:

  • -0.0 == 0.0 == +0.0

  • +INF == +INF, -INF == -INF, -INF != +INF

    • Float.±Infinity and Double.±Infinity adhere to the same type promotion rules.

  • As described above NaN is not Equal and not Comparable to any Number (including itself).

nulltype

As described in the table above, null == null, but is not Equal and not Comparable to any non-null value.

Boolean

For Booleans, TRUE == TRUE, FALSE == FALSE, TRUE != FALSE, and FALSE < TRUE.

String

We assume the common lexicographical order over Unicode strings. A and B are compared lexicographically, and A == B if A and B are lexicographically equal.

UUID

UUID is evaluated based on its String representation. However, UUID("b46d37e9-755c-477e-9ab6-44aabea51d50") and the String "b46d37e9-755c-477e-9ab6-44aabea51d50" are not Equal and not Comparable.

Date

Dates are evaluated based on the numerical comparison of Unix Epoch time.

Graph Element (Vertex / Edge / VertexProperty)

If they are the same type of Element, these are compared by the value of their T.id according to the semantics for the particular primitive type used for ids (implementation-specific). Elements of different types are not Equal and not Comparable.

Property

Properties are compared first by key (String semantics), then by value, according to the semantics for the particular primitive type of the value. Properties with values in different type families are not Equal and not Comparable.

List

Lists are compared pairwise, element-by-element, in their natural list order. For each element, if the pairs are Equal, we simply move on to the next element pair until we encounter a pair whose Comparability.compare() value is non-zero (-1, 1, or ERROR), and we return that value. Lists can be evaluated for Equality and Comparability even if they contain multiple types of elements, so long as their elements are pairwise comparable per Equality/Comparability semantics. During this element by element comparison, if iteration A exhausts its elements before iteration B then A < B, and vice-versa.

Empty lists are equal to other empty lists and less than non-empty lists.

A | B | compare(A,B) | P | Reason
--- | --- | --- | --- | ---
[] | [] | 0 | P.eq | empty lists are equal
[] | [1] | -1 | P.lt | empty < non-empty
[1] | [] | 1 | P.gt | non-empty > empty
[1,2,3] | [1,2,3] | 0 | P.eq | pairwise equal
[1,2,3] | [1,2,4] | -1 | P.lt | pairwise equal until last element: 3 < 4
[1,2,3] | [1,2,3,4] | -1 | P.lt | A exhausts first
[1,2,3,4] | [1,2,3] | 1 | P.gt | B exhausts first
[1,2] | [1.0,2.0] | 0 | P.eq | type promotion
[1,"a"] | [1,"b"] | -1 | P.lt | pairwise Comparable and "a" < "b"
[1] | ["a"] | ERROR | P.neq | cross-type comparison
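
The pairwise procedure described above can be sketched as follows, again using an empty OptionalInt for ERROR. Element comparison is reduced to numbers (compared as doubles) and strings to keep the example short. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.OptionalInt;

final class ListComparability {
    /** Simplified element comparison: numbers as doubles, strings lexicographically, otherwise ERROR. */
    static OptionalInt compareElements(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            double x = ((Number) a).doubleValue();
            double y = ((Number) b).doubleValue();
            if (Double.isNaN(x) || Double.isNaN(y)) return OptionalInt.empty();
            return OptionalInt.of(x < y ? -1 : (x > y ? 1 : 0));
        }
        if (a instanceof String && b instanceof String) {
            return OptionalInt.of(Integer.signum(((String) a).compareTo((String) b)));
        }
        return OptionalInt.empty(); // cross-type comparison
    }

    /** Walks both lists in order and returns the first non-zero (or ERROR) element result. */
    static OptionalInt compareLists(List<?> a, List<?> b) {
        int shared = Math.min(a.size(), b.size());
        for (int i = 0; i < shared; i++) {
            OptionalInt result = compareElements(a.get(i), b.get(i));
            if (!result.isPresent() || result.getAsInt() != 0) return result;
        }
        // All shared positions are equal: the list that is exhausted first is the smaller one.
        return OptionalInt.of(Integer.compare(a.size(), b.size()));
    }
}
```

Note that an ERROR is only produced if the walk actually reaches an incomparable pair. A difference found at an earlier position decides the result first.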

Path

Equality and Comparability semantics for Paths are similar to those for Lists, described above (though Paths and Lists are still of different types and thus not Equal and not Comparable).

Set

Sets are compared pairwise, element-by-element, in the same way as Lists, but they are compared in sorted order using Orderability semantics to sort (described further below). We use Orderability semantics for ordering so that Sets containing multiple element types can be properly sorted before being compared.

For example:

A | B | compare(A,B) | P | Reason
--- | --- | --- | --- | ---
{1, 2} | {2, 1} | 0 | P.eq | sort before compare
{1, "foo"} | {"foo", 1} | 0 | P.eq | we use Orderability semantics to sort across types

Sets do introduce a bit of semantic stickiness, in that on the one hand they do respect type promotion semantics for Equality and Comparability:

{1, 2} == {1.0, 2.0}

But on the other hand they also allow two elements that would be equal (and thus duplicates) according to type promotion:

{1, 1.0, 2} is a valid set and != {1, 2}

We allow some "wiggle-room" in the implementation for providers to decide how to handle this logical inconsistency. The reference implementation allows for semantically equivalent numerics to appear in a set (e.g {1, 1.0}), while at the same time evaluating the same semantically equivalent numerics as equal during pairwise comparison across sets (e.g. {1,2} == {1.0,2.0}).

Map

Map semantics can be thought of as similar to Set semantics for the entry set that comprises the Map. So again, we compare pairwise, entry-by-entry, in the same way as Lists, and again, we first sort the entries using Orderability semantics. Map entries are compared first by key, then by value using the Equality and Comparability semantics that apply to the specific type of key and value.

Map semantics have the same logical inconsistency as Set semantics, because of type promotion. Again, we leave room for providers to decide how to handle this in their implementation. The reference implementation allows for semantically equivalent keys to appear in a map (e.g. 1 and 1.0 can both be keys in the same map), but when comparing maps we treat pairwise entries with semantically equivalent keys as the same.

Orderability

Equality and Comparability were described in depth in the sections above, and their semantics map to the P predicates. Comparability in particular is limited to comparison of values within the same type family. Comparability is complete within a given type (except for NaN, which results in ERROR for any comparison), but returns ERROR for comparisons across types (e.g., an integer cannot be compared to a string).

Orderability semantics are very similar to Comparability for the most part, except that Orderability will never result in ERROR for comparison of any two values - even if two values are incomparable according to Comparability semantics we will still be able to determine their respective order. This allows for a total order across all Gremlin values. In the reference implementation, any step using Order.asc or Order.desc (e.g. OrderGlobalStep, OrderLocalStep) will follow these semantics.

To achieve this globally complete order, we need to address any cases in Comparability that produce a comparison ERROR, we must define a global order across type families, and we must provide semantics for ordering "unknown" values (for cases of in-process JVM implementations, like the TinkerGraph).

We define the type space, and the global order across the type space as follows:

1.  nulltype
2.  Boolean
3.  Number
4.  Date
5.  String
6.  Vertex
7.  Edge
8.  VertexProperty
9.  Property
10. Path
11. Set
12. List
13. Map
14. Unknown

Values in different type spaces will be ordered according to their priority (e.g. all Numbers < all Strings).

Within a given type space, Orderability determines whether two values are ordered at the same position or whether one value is positioned before or after the other. When the position is identical, which value comes first (in other words, whether a stable sort is performed) depends on the graph provider's implementation.

To allow for this total ordering, we must also address the cases in Comparability that produce a comparison ERROR:

ERROR Scenario | Comparability | Orderability
--- | --- | ---
Comparison against NaN | NaN not comparable to anything, including itself. | NaN appears after +Infinity in the numeric type space.
Comparison across types | Cannot compare values of different types. This includes the nulltype. | Subject to a total type ordering where every value of type A appears before or after every value of Type B per the priority list above.
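
A total order of this kind can be sketched as a Comparator that first compares type priority and only then compares values. The sketch models a handful of type families (nulltype, Boolean, Number, Date, String) and treats everything else as Unknown, ordered by class name and then by toString(), which is a simplification. Numbers are compared as doubles, and all names are hypothetical.

```java
import java.util.Comparator;
import java.util.Date;

final class OrderabilitySketch {
    /** Priority of the type families modelled here, following the numbered list above. */
    static int priority(Object o) {
        if (o == null) return 1;
        if (o instanceof Boolean) return 2;
        if (o instanceof Number) return 3;
        if (o instanceof Date) return 4;
        if (o instanceof String) return 5;
        return 14; // everything else is treated as Unknown in this sketch
    }

    /** NaN is ordered as equal to NaN and after every other number, including +Infinity. */
    static int compareNumbers(Number a, Number b) {
        double x = a.doubleValue();
        double y = b.doubleValue();
        if (Double.isNaN(x) || Double.isNaN(y)) {
            return Boolean.compare(Double.isNaN(x), Double.isNaN(y));
        }
        return x < y ? -1 : (x > y ? 1 : 0);
    }

    /** Never fails: values of different type families are ordered by priority alone. */
    static final Comparator<Object> ORDER = (a, b) -> {
        int pa = priority(a);
        int pb = priority(b);
        if (pa != pb) return Integer.compare(pa, pb);
        switch (pa) {
            case 1:
                return 0; // null vs. null
            case 2:
                return Boolean.compare((Boolean) a, (Boolean) b);
            case 3:
                return compareNumbers((Number) a, (Number) b);
            case 4:
                return ((Date) a).compareTo((Date) b);
            case 5:
                return ((String) a).compareTo((String) b);
            default:
                int byClass = a.getClass().getName().compareTo(b.getClass().getName());
                return byClass != 0 ? byClass : a.toString().compareTo(b.toString());
        }
    };
}
```

Sorting a mixed list with this comparator places null first, then booleans, then numbers (with NaN last among them), then strings, without raising an error for any pair.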

Key differences from Comparability

One key difference to note is that we use Orderability semantics to compare values within containers (List, Set, Path, Map, Property) rather than using Comparability semantics (i.e. Orderability all the way down).

Numeric Ordering

Same as Comparability, except NaN is equivalent to NaN and is greater than all other Numbers, including +Infinity. Additionally, because of type promotion (1 == 1.0), numbers of the same value but of different numeric types will not have a stable sort order (1 can appear either before or after 1.0).

Property

Same as Comparability, except Orderability semantics are used for the property value.

Iterables (Path, List, Set, Map)

Same as Comparability, except Orderability semantics apply for the pairwise element-by-element comparisons.

Unknown Types

For Orderability semantics, we allow for the possibility of "unknown" types. If the "unknown" arguments are of the same type, we use java.lang.Object#equals() and java.lang.Comparable (if implemented) to determine their natural order. If the unknown arguments are of different types or do not define a natural order, we order first by Class, then by Object.toString().

Equivalence

Equivalence defines how TinkerPop deals with two values to be grouped or de-duplicated. Specifically it is necessary for the dedup() and group() steps in Gremlin.

For example:

// deduplication needs equivalence over two property values
gremlin> g.V().dedup().by("name")
// grouping by equivalence over two property values
gremlin> g.V().group().by("age")

Like Equality, Equivalence checks always return true or false, never nulltype or error, nor do they produce exceptions. For the most part Equivalence and Equality are the same, with the following key differences:

  • Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent.

  • NaN Equivalence is the reverse of Equality: NaN is equivalent to NaN and not Equivalent to any other Number.
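The two differences above can be sketched in plain Java. This is a minimal illustration, not TinkerPop code: the class name is hypothetical, the treatment of null is an assumption, and containers and graph elements are not modeled.

```java
// Hypothetical sketch of the two Equivalence rules above; not TinkerPop code.
final class EquivalenceSketch {

    static boolean equivalent(Object a, Object b) {
        // Assumption: null is equivalent only to null.
        if (a == null || b == null) {
            return a == b;
        }
        // No type promotion: values of different types are never equivalent.
        if (a.getClass() != b.getClass()) {
            return false;
        }
        // NaN is equivalent to NaN, which is the reverse of Equality.
        if (a instanceof Double) {
            double x = (Double) a;
            double y = (Double) b;
            return (Double.isNaN(x) && Double.isNaN(y)) || x == y;
        }
        if (a instanceof Float) {
            float x = (Float) a;
            float y = (Float) b;
            return (Float.isNaN(x) && Float.isNaN(y)) || x == y;
        }
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(equivalent(2, 2.0));                 // false
        System.out.println(equivalent(Double.NaN, Double.NaN)); // true
        System.out.println(equivalent(Double.NaN, 1.0));        // false
    }
}
```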

Further Reference

Mapping for P

The following table maps the notions proposed above to the various P operators:

Predicate | Concept
P.eq | Equality
P.neq | Equality
P.within | Equality
P.without | Equality
P.lt | Comparability
P.gt | Comparability
P.lte | Equality, Comparability
P.gte | Equality, Comparability
P.inside | Comparability
P.outside | Comparability
P.between | Equality, Comparability

Steps

While TinkerPop has a full test suite for validating functionality of Gremlin, tests alone aren’t always exhaustive or fully demonstrative of Gremlin step semantics. It is also hard to simply read the tests to understand exactly how a step is meant to behave. This section discusses the semantics for individual steps to help users and providers understand implementation expectations.

all()

Description: Filters array data from the Traversal Stream, passing a traverser through only if all of the array’s items match the supplied predicate.

Syntax: all(P predicate)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | List/array/Iterable/Iterator | List/array/Iterable/Iterator

Arguments:

  • predicate - The predicate to use to test each value in the array data.

Modulation:

None

Considerations:

Each value will be tested using the supplied predicate. Empty lists always pass through and null/non-list traversers will be filtered out of the Traversal Stream.

Exceptions

  • A GremlinTypeErrorException will be thrown if one occurs and no other value evaluates to false.
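The filtering rules for all() can be sketched as a plain Java predicate. This is a simplified illustration, not the reference implementation: the class name is hypothetical, only java.util.List is handled (arrays, other Iterables and Iterators are omitted), and GremlinTypeErrorException handling is not modeled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of all() filtering; not TinkerPop code.
final class AllStepSketch {

    // Returns true when the traverser would pass through, false when filtered.
    static boolean passes(Object traverserValue, Predicate<Object> predicate) {
        // Null and non-list traversers are filtered out.
        if (!(traverserValue instanceof List)) {
            return false;
        }
        for (Object item : (List<?>) traverserValue) {
            if (!predicate.test(item)) {
                return false;
            }
        }
        // Reached for lists where every item matched, including empty lists.
        return true;
    }

    public static void main(String[] args) {
        Predicate<Object> positive = x -> x instanceof Integer && (Integer) x > 0;
        System.out.println(passes(Arrays.asList(1, 2, 3), positive));  // true
        System.out.println(passes(Collections.emptyList(), positive)); // true
        System.out.println(passes(null, positive));                    // false
    }
}
```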

any()

Description: Filters array data from the Traversal Stream, passing a traverser through only if at least one of the array’s items matches the supplied predicate.

Syntax: any(P predicate)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | List/array/Iterable/Iterator | List/array/Iterable/Iterator

Arguments:

  • predicate - The predicate to use to test each value in the array data.

Modulation:

None

Considerations:

Each value will be tested using the supplied predicate. Empty lists, null traversers, and non-list traversers will be filtered out of the Traversal Stream.

Exceptions

  • A GremlinTypeErrorException will be thrown if one occurs and no other value evaluates to true.
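The filtering rules for any() can be sketched in plain Java. Note that an empty list is filtered out here, since no item can match. This is a simplified illustration with a hypothetical class name; only java.util.List is handled and GremlinTypeErrorException handling is not modeled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of any() filtering; not TinkerPop code.
final class AnyStepSketch {

    // Returns true when the traverser would pass through, false when filtered.
    static boolean passes(Object traverserValue, Predicate<Object> predicate) {
        // Null and non-list traversers are filtered out.
        if (!(traverserValue instanceof List)) {
            return false;
        }
        for (Object item : (List<?>) traverserValue) {
            if (predicate.test(item)) {
                return true;
            }
        }
        // No item matched; this is also the result for an empty list.
        return false;
    }

    public static void main(String[] args) {
        Predicate<Object> positive = x -> x instanceof Integer && (Integer) x > 0;
        System.out.println(passes(Arrays.asList(-1, 2), positive));    // true
        System.out.println(passes(Collections.emptyList(), positive)); // false
    }
}
```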

asDate()

Description: Parses the value of the incoming traverser as a date. ISO-8601 strings and Unix time numbers are supported.

Syntax: asDate()

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | any | any

Arguments:

None

Incoming date remains unchanged.

Exceptions

  • If the incoming traverser is a non-String/Number/Date value then an IllegalArgumentException will be thrown.

asString()

Description: Returns the value of the incoming traverser as a string, or if Scope.local is specified, returns each element inside the incoming list traverser as a string.

Syntax: asString() | asString(Scope scope)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | any | String/List

Arguments:

  • scope - Determines the type of traverser it operates on. Both scopes will operate on the level of individual traversers. The global scope will operate on individual traverser, casting all (except null) to string. The local scope will behave like global for everything except lists, where it will cast individual non-null elements inside the list into string and return a list of string instead.

Null values from the incoming traverser are not processed and remain as null when returned.

Exceptions

None

call()

Description: Provides support for provider-specific service calls.

Syntax: call() | call(String service, Map params) | call(String service, Traversal childTraversal) | call(String service, Map params, Traversal childTraversal)

Start Step | Mid Step | Modulated | Domain | Range
Y | Y | with() | any | any

Arguments:

  • service - The name of the service call.

  • params - A collection of static parameters relevant to the particular service call. Keys and values can be any type currently supported by the Gremlin type system.

  • childTraversal - A traversal used to dynamically build at query time a collection of parameters relevant to the service call.

Modulation:

  • with(key, value) - Sets an additional static parameter relevant to the service call. Key and value can be any type currently supported by the Gremlin type system.

  • with(key, Traversal) - Sets an additional dynamic parameter relevant to the service call. Key can be any type currently supported by the Gremlin type system.

How static and dynamic parameters are merged is a detail left to the provider implementation. The reference implementation (CallStep) uses effectively a "left to right" merge of the parameters - it starts with the static parameter Map argument, then merges in the parameters from the dynamic Traversal argument, then merges in each with modulation one by one in the order they appear.
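The "left to right" merge described above can be pictured with plain Java maps, where later sources overwrite earlier ones on key collisions. This is only a sketch of the ordering and not the CallStep code; the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a left-to-right parameter merge; not the CallStep code.
final class CallParamMergeSketch {

    // Later maps win on key collisions: static params, then dynamic params,
    // then each with() modulation in the order it appears.
    static Map<Object, Object> merge(Map<?, ?> staticParams,
                                     Map<?, ?> dynamicParams,
                                     Map<?, ?>... withModulations) {
        Map<Object, Object> merged = new LinkedHashMap<>(staticParams);
        merged.putAll(dynamicParams);
        for (Map<?, ?> modulation : withModulations) {
            merged.putAll(modulation);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<Object, Object> merged = merge(
                Collections.singletonMap("limit", 10),
                Collections.singletonMap("limit", 20),
                Collections.singletonMap("verbose", true));
        System.out.println(merged); // {limit=20, verbose=true}
    }
}
```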

Service calls in the reference implementation can be specified as Start (start of traversal), Streaming (mid-traversal flat map step), and Barrier (mid-traversal barrier step). Furthermore, the Barrier type can be all-at-once or with a maximum chunk size. A single service can support more than one of these modes, and if it does, must provide semantics for how to configure the mode at query time via parameters.

Providers using the reference implementation to support service calls will need to provide a ServiceFactory for each named service that can create Service instances for execution during traversal. The ServiceFactory is a singleton that is registered with the ServiceRegistry located on the provider Graph. The Service instance is local to each traversal, although providers can choose to re-use instances across traversals provided there is no state.

Considerations:

Providers using the reference implementation can return Traverser output or raw value output - the CallStep will handle either case appropriately. In the case of a Streaming service, where there is exactly one input to each call, the reference implementation can preserve Path information by splitting the input Traverser when receiving raw output from the call. In the case of Barrier however, it is the responsibility of the Service to preserve Path information by producing its own Traversers as output, since the CallStep cannot match input and output across a barrier. The ability to split input Traversers and generate output is provided by the reference implementation’s ServiceCallContext object, which is supplied to the Service during execution.

There are three execution methods in the reference implementation service call API:

  • execute(ServiceCallContext, Map) - execute a service call to start a traversal

  • execute(ServiceCallContext, Traverser, Map) - execute a service call mid-traversal streaming (one input)

  • execute(ServiceCallContext, TraverserSet, Map) - execute a service call mid-traversal barrier

The Map is the merged collection of all static and dynamic parameters. In the case of Barrier execution, notice that there is one Map for many inputs. Since the call() API supports dynamic parameters, this implies that all input must reduce to the same set of parameters for Barrier execution. In the reference implementation, if more than one parameter set is detected, this will cause an exception and the traversal will halt. Providers that implement their own version of a call operation may decide on other strategies to handle this case - for example it may be sensible to group traversers by Map in the case where multiple parameter sets are detected.

The no-arg version of the call() API is meant to be a directory service and should only be used to start a traversal. The reference implementation provides a default version, which will produce a list of service names or a service description if run with verbose=true. Providers using their own implementation of the call operation must provide their own directory listing service with the service name "--list".

Exceptions

  • If a named service does not support the execution mode implied by the traversal, for example, using a Streaming or Barrier step as a traversal source, this will result in an UnsupportedOperationException.

  • As mentioned above, dynamic property parameters (Traversals) that reduce to more than one property set for a chunk of input are not supported in the reference implementation and will result in an UnsupportedOperationException.

  • Use of the reference implementation’s built-in directory service - call() or call("--list") - mid-traversal will result in an UnsupportedOperationException.

combine()

Description: Appends one list to the other and returns the result to the Traversal Stream.

Syntax: combine(Object values)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | array/Iterable | List

Arguments:

  • values - A list of items (as an Iterable or an array) or a traversal that will produce a list of items.

Modulation:

None

Considerations:

A list is returned after the combine operation is applied so duplicates are allowed. merge() can be used instead if duplicates aren’t wanted. This step only applies to list types which means that non-iterable types (including null) will cause exceptions to be thrown.

Exceptions

  • If the incoming traverser isn’t a list (array or Iterable) then an IllegalArgumentException will be thrown.

  • If the argument doesn’t resolve to a list (array or Iterable) then an IllegalArgumentException will be thrown.
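A minimal Java sketch of the combine behavior follows: the argument is appended to the incoming list and duplicates are kept. The class name is hypothetical, and only java.util.List inputs are modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of combine(); not TinkerPop code.
final class CombineSketch {

    // Appends values to the incoming list; duplicates are preserved.
    static List<Object> combine(List<?> incoming, List<?> values) {
        List<Object> result = new ArrayList<>(incoming);
        result.addAll(values);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(combine(Arrays.asList(1, 2), Arrays.asList(2, 3))); // [1, 2, 2, 3]
    }
}
```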

concat()

Description: Concatenates the incoming String traverser with the input String arguments, and returns the joined String.

Syntax: concat() | concat(String…​ concatStrings) | concat(Traversal concatTraveral, Traversal…​ otherConcatTraverals)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | String | String

Arguments:

  • concatStrings - Varargs of String. If one or more String values are provided, they will be concatenated together with the incoming traverser. If no argument is provided, the String value from the incoming traverser is returned.

  • concatTraveral - A Traversal whose value must resolve to a String. The first result returned from the traversal will be concatenated with the incoming traverser.

  • otherConcatTraverals - Varargs of Traversal. Each Traversal value must resolve to a String. The first result returned from each traversal will be concatenated with the incoming traverser and the previous traversal arguments.

Any null String values will be skipped when concatenated with non-null String values. If two null values are concatenated, the null value will be propagated and returned.

Exceptions

  • If the incoming traverser is a non-String value then an IllegalArgumentException will be thrown.
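The null handling for the String varargs form can be sketched as below. This is an illustration only, with a hypothetical class name: nulls are skipped when at least one non-null value is present, and null is returned when every value is null.

```java
// Hypothetical sketch of concat() null handling; not TinkerPop code.
final class ConcatSketch {

    static String concat(String incoming, String... concatStrings) {
        StringBuilder joined = new StringBuilder();
        boolean sawNonNull = false;
        if (incoming != null) {
            joined.append(incoming);
            sawNonNull = true;
        }
        for (String s : concatStrings) {
            // Null values are skipped rather than rendered as "null".
            if (s != null) {
                joined.append(s);
                sawNonNull = true;
            }
        }
        // If every value was null, null is propagated.
        return sawNonNull ? joined.toString() : null;
    }

    public static void main(String[] args) {
        System.out.println(concat("a", null, "b"));     // ab
        System.out.println(concat(null, (String) null)); // null
    }
}
```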

dateAdd()

Description: Increases the value of the input Date.

Syntax: dateAdd(DT dateToken, Integer value)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | Date | Date

Arguments:

  • dateToken - Date token enum. Supported values are second, minute, hour, and day.

  • value - The number of units, specified by the DT Token, to add to the incoming values. May be negative for subtraction.

Exceptions

  • If the incoming traverser is a non-Date value then an IllegalArgumentException will be thrown.
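The arithmetic can be sketched with java.util.Date. This is a simplified illustration, not the reference implementation: the Unit enum is a hypothetical stand-in for the DT token, and a day is assumed to be a fixed 24 hours with no calendar or time zone handling.

```java
import java.util.Date;

// Hypothetical sketch of dateAdd(); not TinkerPop code.
final class DateAddSketch {

    // Stand-in for the DT token. Assumption: a day is exactly 24 hours.
    enum Unit {
        SECOND(1_000L), MINUTE(60_000L), HOUR(3_600_000L), DAY(86_400_000L);

        final long millis;

        Unit(long millis) {
            this.millis = millis;
        }
    }

    // A negative value subtracts the given number of units.
    static Date dateAdd(Date incoming, Unit unit, int value) {
        return new Date(incoming.getTime() + unit.millis * value);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);
        System.out.println(dateAdd(epoch, Unit.HOUR, 2).getTime());  // 7200000
        System.out.println(dateAdd(epoch, Unit.DAY, -1).getTime());  // -86400000
    }
}
```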

dateDiff()

Description: Returns the difference between two Dates in epoch time.

Syntax: dateDiff(Date value) | dateDiff(Traversal dateTraversal)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | Date | Date

Arguments:

  • value - Date for subtraction.

  • dateTraversal - A Traversal whose value must resolve to a Date. The first result returned from the traversal will be subtracted from the incoming traverser.

If the argument resolves to null then the incoming date will not be changed.

Exceptions

  • If the incoming traverser is a non-Date value then an IllegalArgumentException will be thrown.

dedup()

Description: Removes repeatedly seen results from the Traversal Stream.

Syntax: dedup() | dedup(String…​ labels) | dedup(Scope scope, String…​ labels)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | by() | any | any

Arguments:

  • scope - Determines the scope in which dedup is applied. The global scope will drop duplicate values across the global stream of traversers. The local scope operates at the individual traverser level, and will remove duplicate values from within a collection.

  • labels - If dedup() is provided a list of labels, then it will ensure that the de-duplication is not with respect to the current traverser object, but to the path history of the traverser.

For example:

g.V().as('a').out('created').as('b').in('created').as('c').
  dedup('a','b').select('a','b','c')

will filter out any such a and b pairs which have previously been seen.

Modulation:

  • by() - Performs dedup according to the property specified in the by modulation. For example: g.V().dedup().by("name") will filter out vertices with duplicate names.

Considerations:

  • There is no guarantee that ordering of results will be preserved across a dedup step.

  • There is no guarantee which element is selected as the survivor when filtering duplicates.

For example, given a graph with the following three vertices:

name | age
Alex | 38
Bob | 45
Chloe | 38

and the traversal of:

g.V().order().by("name", Order.asc).
  dedup().by("age").values("name")

can return any of:

["Alex", "Bob"], ["Bob", "Alex"], ["Bob", "Chloe"], or ["Chloe", "Bob"]
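One possible dedup strategy, sketched in plain Java, keeps the first element seen for each key and preserves encounter order. Under that assumption the example above yields ["Alex", "Bob"], but this is only one of the permitted outcomes, since neither ordering nor the surviving element is guaranteed. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of one dedup().by() strategy; not TinkerPop code.
final class DedupSketch {

    // Keeps the first element seen for each key. This is one valid choice of
    // survivor; implementations are free to choose differently.
    static <T> List<T> dedupBy(List<T> stream, Function<? super T, ?> by) {
        Set<Object> seen = new HashSet<>();
        List<T> survivors = new ArrayList<>();
        for (T item : stream) {
            if (seen.add(by.apply(item))) {
                survivors.add(item);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        // Each entry is {name, age}, already ordered by name.
        List<String[]> people = Arrays.asList(
                new String[] {"Alex", "38"},
                new String[] {"Bob", "45"},
                new String[] {"Chloe", "38"});
        for (String[] survivor : dedupBy(people, person -> person[1])) {
            System.out.println(survivor[0]); // prints Alex, then Bob
        }
    }
}
```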

difference()

Description: Adds the difference of two lists to the Traversal Stream.

Syntax: difference(Object values)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | array/Iterable | Set

Arguments:

  • values - A list of items (as an Iterable or an array) or a Traversal that will produce a list of items.

Modulation:

None

Considerations:

Set difference (A-B) is an ordered operation. The incoming traverser is treated as A and the provided argument is treated as B. A set is returned after the difference operation is applied so there won’t be duplicates. This step only applies to list types which means that non-iterable types (including null) will cause exceptions to be thrown.

Exceptions

  • If the incoming traverser isn’t a list (array or Iterable) then an IllegalArgumentException will be thrown.

  • If the argument doesn’t resolve to a list (array or Iterable) then an IllegalArgumentException will be thrown.
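The ordered nature of A-B can be sketched with Java collections. This is an illustration only: the class name is hypothetical, only java.util.List inputs are modeled, and Java's equals() stands in for Gremlin's comparison semantics.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of difference(); not TinkerPop code.
final class DifferenceSketch {

    // A is the incoming traverser, B is the argument; the result is A - B.
    static Set<Object> difference(List<?> incoming, List<?> values) {
        Set<Object> result = new LinkedHashSet<>(incoming);
        result.removeAll(values);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(difference(Arrays.asList(1, 2, 2, 3), Arrays.asList(2))); // [1, 3]
        System.out.println(difference(Arrays.asList(2), Arrays.asList(1, 2, 3)));    // []
    }
}
```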

disjunct()

Description: Adds the disjunct set to the Traversal Stream.

Syntax: disjunct(Object values)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | array/Iterable | Set

Arguments:

  • values - A list of items (as an Iterable or an array) or a Traversal that will produce a list of items.

Modulation:

None

Considerations:

A set is returned after the disjunct operation is applied so there won’t be duplicates. This step only applies to list types which means that non-iterable types (including null) will cause exceptions to be thrown.

Exceptions

  • If the incoming traverser isn’t a list (array or Iterable) then an IllegalArgumentException will be thrown.

  • If the argument doesn’t resolve to a list (array or Iterable) then an IllegalArgumentException will be thrown.
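Reading "disjunct set" as the symmetric difference (items that appear in exactly one of the two lists), the operation can be sketched as below. That reading, the class name, and the use of Java's equals() are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of disjunct() as a symmetric difference; not TinkerPop code.
final class DisjunctSketch {

    static Set<Object> disjunct(List<?> incoming, List<?> values) {
        Set<Object> result = new LinkedHashSet<>(incoming);
        // Deduplicate the argument first so repeated items are handled once.
        for (Object item : new LinkedHashSet<Object>(values)) {
            // Present in both lists: drop it. Present only in values: add it.
            if (!result.remove(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(disjunct(Arrays.asList(1, 2), Arrays.asList(2, 3))); // [1, 3]
    }
}
```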

element()

Description: Traverse from Property to its Element.

Syntax: element()

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | Property | Element

Arguments:

None

Modulation:

None

format()

Description: A mid-traversal step that formats results as string values.

Syntax: format(String formatString)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | by() | any | String

Arguments:

  • formatString - The template for the output string. Variables are written in %{variable_name} notation. Positional arguments are written with the %{_} token, which can be used multiple times. Variable values are resolved in order, taking the first one found: Element properties, then Scope values. If no value is found for some variable, the result is filtered out.

Exceptions

None

Modulation:

  • by() - Used to inject positional argument. For example: g.V().format("%{name} has %{_} connections").by(bothE().count()).
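The substitution rules can be sketched with a regular expression in plain Java. This is a simplified illustration and not the reference implementation: the class name is hypothetical, Element properties and Scope values are collapsed into a single variables map, and an empty Optional stands for a filtered result.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of format() substitution; not TinkerPop code.
final class FormatSketch {

    private static final Pattern TOKEN = Pattern.compile("%\\{([^}]+)\\}");

    // Returns Optional.empty() when a variable or positional value is missing,
    // which corresponds to the result being filtered out.
    static Optional<String> format(String template, Map<String, ?> variables, List<?> positional) {
        Matcher matcher = TOKEN.matcher(template);
        StringBuilder out = new StringBuilder();
        int last = 0;
        int nextPositional = 0;
        while (matcher.find()) {
            out.append(template, last, matcher.start());
            String name = matcher.group(1);
            if (name.equals("_")) {
                // Each %{_} consumes the next positional value in order.
                if (nextPositional >= positional.size()) {
                    return Optional.empty();
                }
                out.append(positional.get(nextPositional++));
            } else {
                if (!variables.containsKey(name)) {
                    return Optional.empty();
                }
                out.append(variables.get(name));
            }
            last = matcher.end();
        }
        out.append(template, last, template.length());
        return Optional.of(out.toString());
    }

    public static void main(String[] args) {
        Optional<String> result = format("%{name} has %{_} connections",
                Collections.singletonMap("name", "marko"), Arrays.asList(3));
        System.out.println(result.orElse("(filtered)")); // marko has 3 connections
    }
}
```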

length()

Description: Returns the length of the incoming string or list, or if Scope.local is specified, returns the length of each string element inside the incoming list traverser.

Syntax: length() | length(Scope scope)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | String/array/Iterable | Integer/List

Arguments:

  • scope - Determines the type of traverser it operates on. Both scopes will operate on the level of individual traversers. The global scope will operate on individual string traverser. The local scope will operate on list traverser with string elements inside.

Null values from the incoming traverser are not processed and remain as null when returned.

Exceptions

  • For Scope.global or parameterless function calls, if the incoming traverser is a non-String value then an IllegalArgumentException will be thrown.

  • For Scope.local, if the incoming traverser is not a string or a list of strings then an IllegalArgumentException will be thrown.

intersect()

Description: Adds the intersection to the Traversal Stream.

Syntax: intersect(Object values)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | array/Iterable | Set

Arguments:

  • values - A list of items (as an Iterable or an array) or a Traversal that will produce a list of items.

Modulation:

None

Considerations:

A set is returned after the intersect operation is applied so there won’t be duplicates. This step only applies to list types which means that non-iterable types (including null) will cause exceptions to be thrown.

Exceptions

  • If the incoming traverser isn’t a list (array or Iterable) then an IllegalArgumentException will be thrown.

  • If the argument doesn’t resolve to a list (array or Iterable) then an IllegalArgumentException will be thrown.

conjoin()

Description: Joins every element in a list together into a String.

Syntax: conjoin(String delimiter)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | array/Iterable | String

Arguments:

  • delimiter - A delimiter to use to join the elements together. Can’t be null.

Modulation:

None

Considerations:

Every element in the list (except null) is converted to a String. Null values are ignored. The delimiter is inserted between neighboring elements to form the final result. This step only applies to list types which means that non-iterable types (including null) will cause exceptions to be thrown.

Exceptions

  • If the incoming traverser isn’t a list (array or Iterable) then an IllegalArgumentException will be thrown.

  • If the argument doesn’t resolve to a list (array or Iterable) then an IllegalArgumentException will be thrown.
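The joining rules in the Considerations above can be sketched in plain Java: nulls are skipped, everything else is converted to a String, and the delimiter goes between neighbors. The class name is hypothetical, only java.util.List is modeled, and the exception type for a null delimiter is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch of conjoin(); not TinkerPop code.
final class ConjoinSketch {

    static String conjoin(List<?> incoming, String delimiter) {
        // Assumption: a null delimiter is rejected with IllegalArgumentException.
        if (delimiter == null) {
            throw new IllegalArgumentException("delimiter can't be null");
        }
        StringJoiner joiner = new StringJoiner(delimiter);
        for (Object item : incoming) {
            // Null elements are ignored rather than rendered as "null".
            if (item != null) {
                joiner.add(item.toString());
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(conjoin(Arrays.asList(1, null, "b"), "-")); // 1-b
    }
}
```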

lTrim()

Description: Returns a string with leading whitespace removed.

Syntax: lTrim() | lTrim(Scope scope)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | String/array/Iterable | String/List

Arguments:

  • scope - Determines the type of traverser it operates on. Both scopes will operate on the level of individual traversers. The global scope will operate on individual string traverser. The local scope will operate on list traverser with string elements inside.

Null values from the incoming traverser are not processed and remain as null when returned.

Exceptions

  • For Scope.global or parameterless function calls, if the incoming traverser is a non-String value then an IllegalArgumentException will be thrown.

  • For Scope.local, if the incoming traverser is not a string or a list of strings then an IllegalArgumentException will be thrown.

merge()

Description: Adds the union of two sets (or two maps) to the Traversal Stream.

Syntax: merge(Object values)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | array/Iterable/Map | Set/Map

Arguments:

  • values - A list of items (as an Iterable or an array), a Map, or a Traversal that will produce a list of items.

Modulation:

None

Considerations:

For iterable types, a set is returned after the merge operation is applied so there won’t be duplicates. For maps, if both maps contain the same key then the value yielded from the argument will be the value put into the merged map. This step only applies to list types or maps which means that other non-iterable types (including null) will cause exceptions to be thrown.

Exceptions

  • If the incoming traverser isn’t a list (array or Iterable) or map then an IllegalArgumentException will be thrown.

  • If the argument doesn’t resolve to a list (array or Iterable) or map then an IllegalArgumentException will be thrown.
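Both forms of merge can be sketched with Java collections: lists become a set union, and for maps the argument's value wins on a shared key. The class and method names are hypothetical, and Java's equals() stands in for Gremlin's comparison semantics.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of merge(); not TinkerPop code.
final class MergeSketch {

    // Union of two lists, returned as a set so duplicates are dropped.
    static Set<Object> mergeLists(List<?> incoming, List<?> values) {
        Set<Object> result = new LinkedHashSet<>(incoming);
        result.addAll(values);
        return result;
    }

    // For a key present in both maps, the value from the argument is kept.
    static Map<Object, Object> mergeMaps(Map<?, ?> incoming, Map<?, ?> values) {
        Map<Object, Object> result = new LinkedHashMap<>(incoming);
        result.putAll(values);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mergeLists(Arrays.asList(1, 2), Arrays.asList(2, 3))); // [1, 2, 3]
        System.out.println(mergeMaps(
                Collections.singletonMap("k", "incoming"),
                Collections.singletonMap("k", "argument"))); // {k=argument}
    }
}
```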

mergeE()

Description: Provides upsert-like functionality for edges.

Syntax: mergeE() | mergeE(Map searchCreate) | mergeE(Traversal searchCreate)

Start Step | Mid Step | Modulated | Domain | Range
Y | Y | option() | Map/Vertex | Edge

Arguments:

  • searchCreate - A Map used to match an Edge and if not found will be the default set of data to create the new one.

  • onCreate - A Map used to specify additional existence criteria and/or properties not already specified in searchCreate.

  • onMatch - A Map used to update the Edge that is found using the searchCreate criteria.

  • outV - A Vertex that will be late-bound into the searchCreate and onCreate Maps for the Direction.OUT key, or else another Map used to search for that Vertex.

  • inV - A Vertex that will be late-bound into the searchCreate and onCreate Maps for the Direction.IN key, or else another Map used to search for that Vertex.

The searchCreate and onCreate Map instances must consist of any combination of:

  • T - id, label

  • Direction - IN or to, OUT or from

  • Arbitrary String keys (which are assumed to be edge properties).

The onMatch Map instance only allows for String keys, as the id and label of an Edge are immutable, as are the incident vertices. Values for these valid keys that are null will be treated according to the semantics of the addE() step.

The Map that is used as the argument for searchCreate may be assigned from the incoming Traverser for the no-arg mergeE(). If mergeE(Map) is used, then it will override the incoming Traverser. If mergeE(Traversal) is used, the Traversal argument must resolve to a Map and it would also override the incoming Traverser. The onCreate and onMatch arguments are assigned via modulation as described below.

If onMatch is triggered the Traverser becomes the matched Edge, but the traversal still must return a Map instance to be applied. Null is considered semantically equivalent to an empty Map.

Event | Empty Map (or Null)
Search | Matches all edges
Create | New edge with defaults
Update | No update to matched edge

If T.id is used for searchCreate or onCreate, it may be ignored for edge creation if the Graph does not support user supplied identifiers. onCreate inherits from searchCreate - values for T.id, T.label, and Direction.OUT/IN do not need to be specified twice. Additionally, onCreate cannot override values in searchCreate (i.e. if (exists(x)) return(x) else create(y) is not supported).

Modulation:

  • option(Merge, Map) - Sets the onCreate or onMatch arguments directly.

  • option(Merge, Traversal) - Sets the onCreate or onMatch arguments dynamically where the Traversal must resolve to a Map.

  • option(Merge.outV/inV) can also accept a Traversal that resolves to a Vertex, allowing mergeE to be combined with mergeV via a select operation.

Exceptions

  • Map arguments are validated for their keys resulting in exception if they do not meet requirements defined above.

  • Use of T.label should always have a value that is a String.

  • If T.id, T.label, and/or Direction.IN/OUT are specified in searchCreate, they cannot be overridden in onCreate.

  • For late binding of the from and to vertices, Direction.OUT must be set to Merge.outV and Direction.IN must be set to Merge.inV. Other combinations are not allowed and will result in exception.

Considerations:

  • mergeE() (i.e. the zero-arg overload) can only be used mid-traversal. It is not a start step.

  • As is common to Gremlin, it is expected that Traversal arguments may utilize sideEffect() steps.

mergeV()

Description: Provides upsert-like functionality for vertices.

Syntax: mergeV() | mergeV(Map searchCreate) | mergeV(Traversal searchCreate)

Start Step | Mid Step | Modulated | Domain | Range
Y | Y | option() | Map | Vertex

Arguments:

  • searchCreate - A Map used to match a Vertex and if not found will be the default set of data to create the new one.

  • onCreate - A Map used to specify additional existence criteria and/or properties not already specified in searchCreate.

  • onMatch - A Map used to update the Vertex that is found using the searchCreate criteria.

The searchCreate and onCreate Map instances must consist of any combination of T.id, T.label, or arbitrary String keys (which are assumed to be vertex properties). The onMatch Map instance only allows for String keys as the id and label of a Vertex are immutable. Null values for these valid keys are not allowed.

The Map that is used as the argument for searchCreate may be assigned from the incoming Traverser for the no-arg mergeV(). If mergeV(Map) is used, then it will override the incoming Traverser. If mergeV(Traversal) is used, the Traversal argument must resolve to a Map and it would also override the incoming Traverser. The onCreate and onMatch arguments are assigned via modulation as described below.

If onMatch is triggered the Traverser becomes the matched Vertex, but the traversal still must return a Map instance to be applied. Null is considered semantically equivalent to an empty Map.

Event | Empty Map (or Null)
Search | Matches all vertices
Create | New vertex with defaults
Update | No update to matched vertex

If T.id is used for searchCreate or onCreate, it may be ignored for vertex creation if the Graph does not support user supplied identifiers. onCreate inherits from searchCreate - values for T.id, T.label do not need to be specified twice. Additionally, onCreate cannot override values in searchCreate (i.e. if (exists(x)) return(x) else create(y) is not supported).
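The inheritance rule can be pictured with plain Java maps. This is a sketch under assumptions, not the reference implementation: String keys such as "id" and "label" stand in for T.id and T.label, the class name is hypothetical, the exception type is an assumption, and repeating an identical value in onCreate is tolerated here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of how onCreate inherits from searchCreate; not TinkerPop code.
final class MergeCreateSketch {

    // Builds the data used to create a new vertex when no match is found.
    static Map<String, Object> effectiveCreate(Map<String, ?> searchCreate, Map<String, ?> onCreate) {
        Map<String, Object> result = new LinkedHashMap<>(searchCreate);
        for (Map.Entry<String, ?> entry : onCreate.entrySet()) {
            String key = entry.getKey();
            // onCreate may add keys but may not change a value set by searchCreate.
            if (searchCreate.containsKey(key)
                    && !Objects.equals(searchCreate.get(key), entry.getValue())) {
                throw new IllegalArgumentException("onCreate cannot override searchCreate key: " + key);
            }
            result.put(key, entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCreate(
                Collections.singletonMap("name", "marko"),
                Collections.singletonMap("age", 29))); // {name=marko, age=29}
    }
}
```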

Modulation:

  • option(Merge, Map) - Sets the onCreate or onMatch arguments directly.

  • option(Merge, Traversal) - Sets the onCreate or onMatch arguments dynamically where the Traversal must resolve to a Map.

Exceptions

  • Map arguments are validated for their keys resulting in exception if they do not meet requirements defined above.

  • Use of T.label should always have a value that is a String.

  • If T.id and/or T.label are specified in searchCreate, they cannot be overridden in onCreate.

Considerations:

  • mergeV() (i.e. the zero-arg overload) can only be used mid-traversal. It is not a start step.

  • As is common to Gremlin, it is expected that Traversal arguments may utilize sideEffect() steps.

none()

Description: Filters array data from the Traversal Stream, passing a traverser through only if none of the array’s items match the supplied predicate.

Syntax: none(P predicate)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | List/array/Iterable/Iterator | List/array/Iterable/Iterator

Arguments:

  • predicate - The predicate to use to test each value in the array data.

Modulation:

None

Considerations:

Each value will be tested using the supplied predicate. Empty lists always pass through and null/non-list traversers will be filtered out of the Traversal Stream.

Exceptions

  • A GremlinTypeErrorException will be thrown if one occurs and no other value evaluates to true.

product()

Description: Adds the cartesian product to the Traversal Stream.

Syntax: product(Object values)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | array/Iterable | List(List)

Arguments:

  • values - A list of items (as an Iterable or an array) or a Traversal that will produce a list of items.

Modulation:

None

Considerations:

A list of lists is returned after the product operation is applied with the inner list being a result pair. This step only applies to list types which means that non-iterable types (including null) will cause exceptions to be thrown.

Exceptions

  • If the incoming traverser isn’t a list (array or Iterable) then an IllegalArgumentException will be thrown.

  • If the argument doesn’t resolve to a list (array or Iterable) then an IllegalArgumentException will be thrown.
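The shape of the result (a list of pairs, each pair itself a list) can be sketched in plain Java. The class name is hypothetical, and the iteration order, with the incoming list varying slowest, is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of product(); not TinkerPop code.
final class ProductSketch {

    // Pairs every incoming item with every argument item.
    // Assumption: the incoming list is the outer loop.
    static List<List<Object>> product(List<?> incoming, List<?> values) {
        List<List<Object>> result = new ArrayList<>();
        for (Object left : incoming) {
            for (Object right : values) {
                result.add(Arrays.asList(left, right));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(product(Arrays.asList(1, 2), Arrays.asList("a", "b")));
        // [[1, a], [1, b], [2, a], [2, b]]
    }
}
```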

replace()

Description: Returns a string with the specified characters in the original string replaced with the new characters.

Syntax: replace(String oldChar, String newChar) | replace(Scope scope, String oldChar, String newChar)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | String/array/Iterable | String/List

Arguments:

  • oldChar - The string character(s) in the original string to be replaced. Nullable; a null input is a no-op and the original string is returned.

  • newChar - The string character(s) to replace with. Nullable; a null input is a no-op and the original string is returned.

  • scope - Determines the type of traverser it operates on. Both scopes will operate on the level of individual traversers. The global scope will operate on individual string traverser. The local scope will operate on list traverser with string elements inside.

Null values from the incoming traverser are not processed and remain as null when returned.

Exceptions

  • For Scope.global or parameterless function calls, if the incoming traverser is a non-String value then an IllegalArgumentException will be thrown.

  • For Scope.local, if the incoming traverser is not a string or a list of strings then an IllegalArgumentException will be thrown.

reverse()

Description: Returns the reverse of the incoming traverser.

Syntax: reverse()

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | Object | Object

Arguments:

None

The behavior of reverse depends on the type of the incoming traverser. If the traverser is a string, then the string is reversed. If the traverser is iterable (Iterable, Iterator, or an array) then a list containing the items in reverse order is returned. All other types (including null) are not processed and are returned unmodified.

Exceptions

  • If the incoming traverser is a non-String value then an IllegalArgumentException will be thrown.
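The type-dependent behavior described in the paragraph above can be sketched in plain Java. This is a simplified illustration with a hypothetical class name; only String and java.util.List are handled, with arrays, Iterators and other Iterables left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of reverse() dispatch by type; not TinkerPop code.
final class ReverseSketch {

    static Object reverse(Object value) {
        if (value instanceof String) {
            return new StringBuilder((String) value).reverse().toString();
        }
        if (value instanceof List) {
            // Copy first so the incoming list is left unchanged.
            List<Object> copy = new ArrayList<>((List<?>) value);
            Collections.reverse(copy);
            return copy;
        }
        // Everything else, including null, is returned unmodified.
        return value;
    }

    public static void main(String[] args) {
        System.out.println(reverse("abc"));                  // cba
        System.out.println(reverse(Arrays.asList(1, 2, 3))); // [3, 2, 1]
        System.out.println(reverse(42));                     // 42
    }
}
```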

rTrim()

Description: Returns a string with trailing whitespace removed.

Syntax: rTrim() | rTrim(Scope scope)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | String/array/Iterable | String/List

Arguments:

  • scope - Determines the type of traverser it operates on. Both scopes will operate on the level of individual traversers. The global scope will operate on individual string traverser. The local scope will operate on list traverser with string elements inside.

Null values from the incoming traverser are not processed and remain as null when returned.

Exceptions

  • For Scope.global or parameterless function calls, if the incoming traverser is a non-String value then an IllegalArgumentException will be thrown.

  • For Scope.local, if the incoming traverser is not a string or a list of strings then an IllegalArgumentException will be thrown.

split()

Description: Returns a list of strings created by splitting the incoming string traverser around the matches of the given separator.

Syntax: split(String separator) | split(Scope scope, String separator)

Start Step | Mid Step | Modulated | Domain | Range
N | Y | N | String/array/Iterable | List

Arguments:

  • separator - The string character(s) used as delimiter to split the input string. Nullable, a null separator will split on whitespaces. An empty string separator will split on each character.

  • scope - Determines the type of traverser it operates on. Both scopes will operate on the level of individual traversers. The global scope will operate on individual string traverser. The local scope will operate on list traverser with string elements inside.

Null values from the incoming traverser are not processed and remain as null when returned.

Exceptions

  • For Scope.global or calls that omit the scope argument, if the incoming traverser is a non-String value then an IllegalArgumentException will be thrown.

  • For Scope.local, if the incoming traverser is not a string or a list of strings then an IllegalArgumentException will be thrown.
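The three separator cases described above (null, empty, and ordinary) can be modeled with the plain-Java sketch below. It is an illustration only: SplitSketch is a hypothetical name, the separator is treated as a literal string rather than a regular expression, and the handling of runs of whitespace and of empty tokens is an assumption, since the text above does not specify it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the TinkerPop API.
final class SplitSketch {

    static List<String> split(final String s, final String separator) {
        // Null input is not processed and stays null.
        if (s == null) return null;
        if (separator == null) {
            // Null separator: split on whitespace.
            // Assumption: a run of whitespace counts as a single delimiter.
            final String trimmed = s.trim();
            return trimmed.isEmpty() ? new ArrayList<>() : Arrays.asList(trimmed.split("\\s+"));
        }
        if (separator.isEmpty()) {
            // Empty separator: one list entry per character.
            final List<String> chars = new ArrayList<>();
            for (int i = 0; i < s.length(); i++) chars.add(String.valueOf(s.charAt(i)));
            return chars;
        }
        // Ordinary separator, quoted so that characters like "." are matched literally.
        return Arrays.asList(s.split(Pattern.quote(separator), -1));
    }
}
```

For example, SplitSketch.split("abc", "") returns [a, b, c], and SplitSketch.split("a  b\tc", null) returns [a, b, c].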

substring()

Description: Returns a substring of the incoming string traverser with a 0-based start index (inclusive) and end index (exclusive).

Syntax: substring(int startIndex, int endIndex) | substring(Scope scope, int startIndex, int endIndex)

Start Step   Mid Step   Modulated   Domain                  Range
N            Y          N           String/array/Iterable   String/List

Arguments:

  • startIndex - The start index, 0-based. If the start index is negative, it is counted from the end of the string, or treated as 0 if its magnitude exceeds the string length.

  • endIndex - The end index, 0-based. Optional; if it is not specified then all remaining characters will be returned. An end index ≤ start index returns the empty string.

  • scope - Determines the type of traverser it operates on. Both scopes operate at the level of individual traversers. The global scope operates on an individual string traverser. The local scope operates on a list traverser whose elements are strings.

Null values from the incoming traverser are not processed and remain as null when returned.

Exceptions

  • For Scope.global or calls that omit the scope argument, if the incoming traverser is a non-String value then an IllegalArgumentException will be thrown.

  • For Scope.local, if the incoming traverser is not a string or a list of strings then an IllegalArgumentException will be thrown.
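The index rules above can be worked through with a small plain-Java sketch. It is not TinkerPop's implementation: SubstringSketch and normalize are hypothetical names, the optional endIndex is modeled as a nullable Integer, and two behaviors are assumptions that the text above does not state, namely that a negative endIndex is also counted from the end of the string and that indices beyond the string length are clamped to it.

```java
// Hypothetical helper for illustration only; not part of the TinkerPop API.
final class SubstringSketch {

    // Maps a possibly negative index into the range [0, length].
    static int normalize(int index, final int length) {
        if (index < 0) index += length;              // count from the end of the string
        return Math.max(0, Math.min(index, length)); // assumption: clamp out-of-range values
    }

    static String substring(final String s, final int startIndex, final Integer endIndex) {
        // Null input is not processed and stays null.
        if (s == null) return null;
        final int start = normalize(startIndex, s.length());
        // No end index: return all remaining characters.
        if (endIndex == null) return s.substring(start);
        final int end = normalize(endIndex, s.length());
        // End index <= start index yields the empty string.
        return end <= start ? "" : s.substring(start, end);
    }
}
```

With this sketch, substring("hello", 1, 3) returns "el", substring("hello", -3, null) returns "llo", and substring("hello", 3, 1) returns the empty string.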

toLower()

Description: Returns the lowercase representation of the incoming string traverser or, if Scope.local is specified, the lowercase representation of each string element inside the incoming list traverser.

Syntax: toLower() | toLower(Scope scope)

Start Step   Mid Step   Modulated   Domain   Range
N            Y          N           String   String/List

Arguments:

  • scope - Determines the type of traverser it operates on. Both scopes operate at the level of individual traversers. The global scope operates on an individual string traverser. The local scope operates on a list traverser whose elements are strings.

Null values from the incoming traverser are not processed and remain as null when returned.

Exceptions

  • For Scope.global or parameterless function calls, if the incoming traverser is a non-String value then an IllegalArgumentException will be thrown.

  • For Scope.local, if the incoming traverser is not a string or a list of strings then an IllegalArgumentException will be thrown.

toUpper()

Description: Returns the uppercase representation of the incoming string traverser or, if Scope.local is specified, the uppercase representation of each string element inside the incoming list traverser.

Syntax: toUpper() | toUpper(Scope scope)

Start Step   Mid Step   Modulated   Domain                  Range
N            Y          N           String/array/Iterable   String/List

Arguments:

  • scope - Determines the type of traverser it operates on. Both scopes operate at the level of individual traversers. The global scope operates on an individual string traverser. The local scope operates on a list traverser whose elements are strings.

Null values from the incoming traverser are not processed and remain as null when returned.

Exceptions

  • For Scope.global or parameterless function calls, if the incoming traverser is a non-String value then an IllegalArgumentException will be thrown.

  • For Scope.local, if the incoming traverser is not a string or a list of strings then an IllegalArgumentException will be thrown.

trim()

Description: Returns a string with leading and trailing whitespace removed.

Syntax: trim() | trim(Scope scope)

Start Step   Mid Step   Modulated   Domain                  Range
N            Y          N           String/array/Iterable   String/List

Arguments:

  • scope - Determines the type of traverser it operates on. Both scopes operate at the level of individual traversers. The global scope operates on an individual string traverser. The local scope operates on a list traverser whose elements are strings.

Null values from the incoming traverser are not processed and remain as null when returned.

Exceptions

  • For Scope.global or parameterless function calls, if the incoming traverser is a non-String value then an IllegalArgumentException will be thrown.

  • For Scope.local, if the incoming traverser is not a string or a list of strings then an IllegalArgumentException will be thrown.

Policies

tinkerpop conference

Provider Listing Policy

TinkerPop has two web site sections that help the community find TinkerPop-enabled providers, libraries and tools. There is the Providers page and the Community page. The Providers page lists graph systems that are TinkerPop-enabled. The Community page lists libraries and tools that are designed to work with TinkerPop providers and include things like drivers, object-graph mappers, visualization applications and other similar tools.

To be listed in either page a project should meet the following requirements:

  • The project must be either a TinkerPop-enabled graph system, a Gremlin language variant/compiler, a Gremlin language driver, or a TinkerPop-enabled middleware tool.

  • The project must have a public URL that can be referenced by Apache TinkerPop.

  • The project must have at least one release.

  • The project must be actively developed/maintained to a current or previous "y" version of Apache TinkerPop (3.y.z).

  • The project must have some documentation and that documentation must make explicit its usage of Apache TinkerPop and its version compatibility requirements.

Note that the Apache Software Foundation’s linking policy supersedes the policies stipulated by Apache TinkerPop. If your project meets the requirements, please email Apache TinkerPop’s developer mailing list requesting that your project be added to a listing.

Graphic Usage Policy

Apache TinkerPop has a plethora of graphics that the community can use. There are four categories of graphics. These categories and their respective policies are presented below. If you are unsure of the category of a particular graphic, please ask on our developer mailing list before using it. Finally, note that the Apache Software Foundation’s trademark policies supersede those stipulated by Apache TinkerPop.

Character Graphics

A character graphic can be used without permission as long as it is being used in an Apache TinkerPop related context and it is acknowledged that the graphic is a trademark of the Apache Software Foundation/Apache TinkerPop.

gremlin and friends

Character Dress-Up Graphics

A character graphic can be manipulated ("dressed up") and used without permission as long as it’s being used in an Apache TinkerPop related context and it is acknowledged that the graphic is a trademark of the Apache Software Foundation/Apache TinkerPop.

gremlin gremstefani

gremlin gremicide

gremlin gremalicious

Explanatory Diagrams

Explanatory diagrams can be used without permission as long as they are being used in an Apache TinkerPop related context, they are being used for technical explanatory purposes, and it is acknowledged that they are trademarks of the Apache Software Foundation/Apache TinkerPop.

olap traversal

cyclicpath step

flat map lambda

Character Scene Graphics

Character scene graphics require permission before being used. Please ask for permission on the Apache TinkerPop developer mailing list.

tinkerpop reading

gremlintron

tinkerpop3 splash