Management APIs
Admin APIs for managing Services, Labels, and Indices.
Create a Service
A Service is the top-level abstraction in S2Graph and can be thought of as a database in MySQL.
POST /admin/createService
Service Fields
In order to create a Service, the following fields should be specified in the request.
Field Name | Definition | Data Type | Example | Note |
---|---|---|---|---|
serviceName | User defined namespace | String | talk_friendship | Required |
cluster | Zookeeper quorum address | String | abc.com:2181,abd.com:2181 | Optional |
hTableName | HBase table name | String | test | Optional |
hTableTTL | Time to live setting for data | Integer | 86000 | Optional |
preSplitSize | Factor for the table pre-split size | Integer | 1 | Optional |
Basic Service Operations
You can create a service using the following API:
curl -XPOST localhost:9000/admin/createService -H 'Content-Type: Application/json' -d '
{
"serviceName": "s2graph",
"cluster": "address for zookeeper",
"hTableName": "hbase table name",
"hTableTTL": 86000,
"preSplitSize": 2
}'
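For clients that prefer code over curl, the same request can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name `build_create_service_request` and the `BASE_URL` constant are illustrative assumptions, while the field names come from the Service Fields table above. The request is built but not sent, so the sketch runs without a server.

```python
import json
import urllib.request

# Assumption: the server address matches the curl examples in this document.
BASE_URL = "http://localhost:9000"
# Optional field names, taken from the Service Fields table.
OPTIONAL_FIELDS = ("cluster", "hTableName", "hTableTTL", "preSplitSize")


def build_create_service_request(service_name, **optional):
    """Return an unsent POST request for /admin/createService."""
    if not service_name:
        raise ValueError("serviceName is required")
    unknown = set(optional) - set(OPTIONAL_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {"serviceName": service_name, **optional}
    return urllib.request.Request(
        BASE_URL + "/admin/createService",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_service_request("s2graph", hTableTTL=86000, preSplitSize=2)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it against a running server: urllib.request.urlopen(request)
```

Rejecting unknown field names on the client side is a design choice of this sketch; it catches typos before the request leaves the machine.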
Create a Label
A Label represents a relation between two serviceColumns. Labels are to S2Graph what tables are to RDBMS since they contain the schema information, i.e. descriptive information of the data being managed or indices used for efficient retrieval.
In most scenarios, defining an edge schema (in other words, label) requires a little more care compared to a vertex schema (which is pretty straightforward).
First, think about the kind of queries you will be using, then model user actions or relations into edges and design a label accordingly.
POST /admin/createLabel
Label Fields
A Label creation request includes the following information.
Field Name | Definition | Data Type | Example | Note |
---|---|---|---|---|
label | Name of the relation | String | talk_friendship | Required |
srcServiceName | Source column’s service | String | kakaotalk | Required |
srcColumnName | Source column’s name | String | user_id | Required |
srcColumnType | Source column’s data type | Long/Integer/String | string | Required |
tgtServiceName | Target column’s service | String | kakaotalk/kakaomusic | Optional |
tgtColumnName | Target column’s name | String | item_id | Required |
tgtColumnType | Target column’s data type | Long/Integer/String | string | Required |
isDirected | Whether the label is directed or undirected | True/False | true/false | Optional. default is true |
serviceName | Which service the label belongs to | String | kakaotalk | Optional. tgtServiceName is used by default |
hTableName | A dedicated HBase table to your Label | String | s2graph-batch | Optional. Service hTableName is used by default |
hTableTTL | Data time to live setting | Integer | 86000 | Optional. Service hTableTTL is used by default |
consistencyLevel | If set to ‘strong’, only one edge is allowed between a pair of source/target vertices. If set to ‘weak’, multiple edges are supported | String | strong/weak | Optional. default is ‘weak’ |
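The defaults listed in the table can be easier to follow when written out. The sketch below is a client-side model of how an incomplete createLabel request resolves, based only on the Note column above; the function `with_label_defaults` and the `service` dictionary are hypothetical names for illustration, and the real resolution happens on the server.

```python
# Fields marked "Required" in the Label Fields table.
REQUIRED = ("label", "srcServiceName", "srcColumnName", "srcColumnType",
            "tgtColumnName", "tgtColumnType")


def with_label_defaults(request, service):
    """Return a copy of a createLabel request with documented defaults filled in.

    `service` holds the settings of the service the label belongs to
    (an assumption of this sketch; the server looks these up itself).
    """
    missing = [field for field in REQUIRED if field not in request]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    resolved = dict(request)
    resolved.setdefault("isDirected", True)
    resolved.setdefault("consistencyLevel", "weak")
    # serviceName falls back to tgtServiceName when that is given.
    if "tgtServiceName" in resolved:
        resolved.setdefault("serviceName", resolved["tgtServiceName"])
    # hTableName and hTableTTL fall back to the service's own settings.
    for field in ("hTableName", "hTableTTL"):
        if field in service:
            resolved.setdefault(field, service[field])
    return resolved


label = {
    "label": "user_article_liked",
    "srcServiceName": "s2graph", "srcColumnName": "user_id", "srcColumnType": "long",
    "tgtServiceName": "s2graph_news", "tgtColumnName": "article_id", "tgtColumnType": "string",
}
service = {"serviceName": "s2graph_news", "hTableName": "test", "hTableTTL": 86000}
resolved = with_label_defaults(label, service)
print(resolved["serviceName"], resolved["consistencyLevel"], resolved["hTableName"])
```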
Props & Indices
A couple of key elements of a Label are its Properties (props) and indices. Supplementary information of a Vertex or Edge can be stored as props. A single property can be defined in a simple key-value JSON as follows:
{
"name": "name of property",
"dataType": "data type of property value",
"defaultValue": "default value in string"
}
In a scenario where user-video playback history is stored in a Label, a typical example for props would look like this:
[
{"name": "play_count", "defaultValue": 0, "dataType": "integer"},
{"name": "is_hidden","defaultValue": false,"dataType": "boolean"},
{"name": "category","defaultValue": "jazz","dataType": "string"},
{"name": "score","defaultValue": 0,"dataType": "float"}
]
Props can have data types of numeric (byte/short/integer/float/double), boolean or string.
In order to achieve efficient data retrieval, a Label can be indexed using the “indices” option.
The default value for indices is _timestamp, a hidden label property.
All labels have _timestamp in their props under the hood.
The first index in the indices array is the primary index (think of PRIMARY INDEX idx_xxx(p1, p2) in MySQL).
S2Graph automatically stores edges according to the primary index.
Trailing indices are used for additional orderings on edges (think of ALTER TABLE ADD INDEX idx_xxx(p2, p1) in MySQL).
props define metadata that does not affect the order of edges. Please avoid using S2Graph-reserved property names:
- _timestamp is reserved for the system-wide timestamp. This can be interpreted as last_modified_at.
- _from is reserved for the label’s start vertex.
- _to is reserved for the label’s target vertex.
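The rules for props described in this section can be checked before a request is sent. The following is a rough sketch of such a check; `check_props` is a hypothetical helper, not part of S2Graph, and the list of data types is taken from the prose above plus `long`, which the sample requests in this document also use.

```python
# Reserved names listed in this section.
RESERVED_NAMES = {"_timestamp", "_from", "_to"}
# Data types named in the prose; "long" is added because the sample
# requests in this document use it (an assumption that it is accepted).
ALLOWED_TYPES = {"byte", "short", "integer", "long", "float", "double",
                 "boolean", "string"}


def check_props(props):
    """Return a list of human-readable problems found in prop definitions."""
    problems = []
    for prop in props:
        name = prop.get("name")
        if not name:
            problems.append("prop without a name")
        elif name in RESERVED_NAMES:
            problems.append(f"{name}: reserved by S2Graph")
        data_type = prop.get("dataType")
        if data_type not in ALLOWED_TYPES:
            problems.append(f"{name}: unsupported dataType {data_type!r}")
    return problems


video_props = [
    {"name": "play_count", "defaultValue": 0, "dataType": "integer"},
    {"name": "is_hidden", "defaultValue": False, "dataType": "boolean"},
    {"name": "category", "defaultValue": "jazz", "dataType": "string"},
    {"name": "score", "defaultValue": 0, "dataType": "float"},
]
print(check_props(video_props))
print(check_props([{"name": "_to", "defaultValue": "", "dataType": "string"}]))
```

The check only reports problems and leaves the decision to the caller, since the text above asks users to avoid reserved names rather than stating that the server rejects them.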
Basic Label Operations
Here is a sample request that creates a label user_article_liked between column user_id of service s2graph and column article_id of service s2graph_news.
Note that the default indexed property _timestamp will be used since the indices field is empty.
curl -XPOST localhost:9000/admin/createLabel -H 'Content-Type: Application/json' -d '
{
"label": "user_article_liked",
"srcServiceName": "s2graph",
"srcColumnName": "user_id",
"srcColumnType": "long",
"tgtServiceName": "s2graph_news",
"tgtColumnName": "article_id",
"tgtColumnType": "string",
"indices": [],
"props": [],
"serviceName": "s2graph_news"
}'
The created label user_article_liked will manage edges in a timestamp-descending order (which seems to be the common requirement for most services).
Here is another example that creates a label friends, which represents the friend relation between users in service s2graph.
This time, edges are managed by both affinity_score and _timestamp.
Friends with higher affinity_scores come first, and if affinity_score is a tie, recently added friends come first.
curl -XPOST localhost:9000/admin/createLabel -H 'Content-Type: Application/json' -d '
{
"label": "friends",
"srcServiceName": "s2graph",
"srcColumnName": "user_id",
"srcColumnType": "long",
"tgtServiceName": "s2graph",
"tgtColumnName": "user_id",
"tgtColumnType": "long",
"indices": [
{"name": "idx_affinity_timestamp", "propNames": ["affinity_score", "_timestamp"]}
],
"props": [
{"name": "affinity_score", "dataType": "float", "defaultValue": 0.0},
{"name": "_timestamp", "dataType": "long", "defaultValue": 0},
{"name": "is_hidden", "dataType": "boolean", "defaultValue": false},
{"name": "is_blocked", "dataType": "boolean", "defaultValue": true},
{"name": "error_code", "dataType": "integer", "defaultValue": 500}
],
"serviceName": "s2graph",
"consistencyLevel": "strong"
}'
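The ordering produced by the idx_affinity_timestamp index can be pictured as a descending sort on the tuple of indexed props. The snippet below is a small in-memory illustration with made-up edge data; it models only the resulting order, not how S2Graph stores edges in HBase.

```python
# Made-up edges for illustration; "to" is the target user id.
edges = [
    {"to": 10, "affinity_score": 0.9, "_timestamp": 100},
    {"to": 11, "affinity_score": 0.5, "_timestamp": 300},
    {"to": 12, "affinity_score": 0.9, "_timestamp": 200},
    {"to": 13, "affinity_score": 0.1, "_timestamp": 400},
]
# propNames of the index, in the order given in the request above.
index_props = ["affinity_score", "_timestamp"]


def index_key(edge):
    """Build the sort key from the indexed props, first prop most significant."""
    return tuple(edge[name] for name in index_props)


# Higher affinity_score first; ties broken by the more recent _timestamp.
ordered = sorted(edges, key=index_key, reverse=True)
print([edge["to"] for edge in ordered])  # [12, 10, 11, 13]
```

Users 12 and 10 share the top affinity_score, so the more recently added user 12 comes first.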
S2Graph supports multiple indices on a label which means you can add separate ordering options for edges.
curl -XPOST localhost:9000/admin/addIndex -H 'Content-Type: Application/json' -d '
{
"label": "friends",
"indices": [
{"name": "idx_3rd", "propNames": ["is_blocked", "_timestamp"]}
]
}'
In order to get general information on a label, make a GET request to /admin/getLabel/{label name}
curl -XGET localhost:9000/admin/getLabel/friends
Delete a label with a PUT request to /admin/deleteLabel/{label name}
curl -XPUT localhost:9000/admin/deleteLabel/friends
Label updates are not supported (except when you are adding an index). Instead, you can delete the label and re-create it.
Adding Extra Properties to Labels
To add a new property, use /admin/addProp/{label name}
curl -XPOST localhost:9000/admin/addProp/friend -H 'Content-Type: Application/json' -d '
{
"name": "is_blocked",
"defaultValue": false,
"dataType": "boolean"
}'
Consistency Level
Simply put, the consistency level of your label determines how edges are stored at the storage level. First, note that S2Graph identifies a unique edge by combining its from, label, and to values as a key.
Now, let’s consider inserting the following two edges that have the same key (1, graph_test, 101) and different timestamps (1418950524721 and 1418950524723).
1418950524721 insert e 1 101 graph_test {"weight": 10} = (1, graph_test, 101)
1418950524723 insert e 1 101 graph_test {"weight": 20} = (1, graph_test, 101)
Each consistency level handles this case differently.
- strong
- The strong option makes sure that there is only one edge record stored in the HBase table for edge key (1, graph_test, 101). With strong consistency level, the later insertion will overwrite the previous one.
- weak
- The weak option will allow two different edges stored in the table with different timestamps and weight values.
For a better understanding, let’s simplify the notation for an edge that connects two vertices u - v at time t as u -> (t, v), and assume that we are inserting these four edges into two different labels with each consistency configuration (both indexed by timestamp only).
u1 -> (t1, v1)
u1 -> (t2, v2)
u1 -> (t3, v2)
u1 -> (t4, v1)
With a strong consistencyLevel, your Label contents will be:
u1 -> (t4, v1)
u1 -> (t3, v2)
Note that edges with the same vertices and an earlier timestamp (u1 -> (t1, v1) and u1 -> (t2, v2)) were overwritten and no longer exist. On the other hand, with consistencyLevel weak, all four edges are kept:
u1 -> (t1, v1)
u1 -> (t2, v2)
u1 -> (t3, v2)
u1 -> (t4, v1)
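The two outcomes above can be reproduced with a few lines of code. This is a simplified in-memory model, assuming edges arrive in timestamp order and ignoring properties; the function `store` is a hypothetical name and does not correspond to an S2Graph API.

```python
def store(edges, consistency_level):
    """Return the edges a label would hold after the given inserts.

    edges: list of (source, timestamp, target) tuples within a single label.
    """
    if consistency_level == "weak":
        # Every insert is kept as its own edge.
        return list(edges)
    if consistency_level == "strong":
        # One edge per (source, target) pair; the latest timestamp wins.
        latest = {}
        for src, ts, tgt in edges:
            key = (src, tgt)
            if key not in latest or ts > latest[key][1]:
                latest[key] = (src, ts, tgt)
        return list(latest.values())
    raise ValueError(f"unknown consistencyLevel: {consistency_level!r}")


# t1..t4 are represented by the integers 1..4.
inserts = [("u1", 1, "v1"), ("u1", 2, "v2"), ("u1", 3, "v2"), ("u1", 4, "v1")]
print(store(inserts, "strong"))
print(store(inserts, "weak"))
```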
It is recommended to set consistencyLevel to weak unless you are expecting concurrent updates on the same edge.
In real-world systems, it is not guaranteed that operation requests arrive at S2Graph in the order of their timestamps. Depending on the environment (network conditions, clients making asynchronous calls, use of a message queue, and so on), requests that were made earlier can arrive later. Consistency level also determines how S2Graph handles these cases. A strong consistencyLevel promises a final result consistent with the timestamps. For example, consider a set of operation requests on edge (1, graph_test, 101) that were made in the following order:
1418950524721 insert e 1 101 graph_test {"is_blocked": false}
1418950524722 delete e 1 101 graph_test
1418950524723 insert e 1 101 graph_test {"is_hidden": false, "weight": 10}
1418950524724 update e 1 101 graph_test {"time": 1, "weight": -10}
1418950524726 update e 1 101 graph_test {"is_blocked": true}
and actually arrived in a shuffled order due to complications:
1418950524726 update e 1 101 graph_test {"is_blocked": true}
1418950524723 insert e 1 101 graph_test {"is_hidden": false, "weight": 10}
1418950524722 delete e 1 101 graph_test
1418950524721 insert e 1 101 graph_test {"is_blocked": false}
1418950524724 update e 1 101 graph_test {"time": 1, "weight": -10}
Strong consistency still makes sure that you get the same eventual state on (1, graph_test, 101). Here is pseudocode of what S2Graph does to provide a strong consistency level.
complexity = O(one read) + O(one delete) + O(2 put)
fetchedEdge = fetch edge with (1, graph_test, 101) from lookup table
if fetchedEdge does not exist:
    create new edge same as current insert operation
    update lookup table as current insert operation
else:
    valid = compare fetchedEdge vs current insert operation
    if valid:
        delete fetchedEdge
        create new edge after comparing fetchedEdge and current insert
        update lookup table
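To make the "same eventual state" claim concrete, here is a toy model that replays the five operations from the example in every possible arrival order. It is not S2Graph's implementation: the class `EdgeState` and its semantics (the newest write per property wins, an insert discards older properties, a delete discards everything at or before its timestamp) are assumptions chosen so that the result depends only on timestamps.

```python
from itertools import permutations


class EdgeState:
    """Toy per-edge state that converges regardless of arrival order."""

    def __init__(self):
        self.props = {}      # name -> (timestamp, value), newest write per prop
        self.insert_ts = -1  # newest insert seen so far
        self.delete_ts = -1  # newest delete seen so far
        self.write_ts = -1   # newest insert or update seen so far

    def apply(self, ts, op, props=None):
        if op == "delete":
            self.delete_ts = max(self.delete_ts, ts)
            return
        if op == "insert":
            self.insert_ts = max(self.insert_ts, ts)
        self.write_ts = max(self.write_ts, ts)
        for name, value in (props or {}).items():
            if name not in self.props or ts > self.props[name][0]:
                self.props[name] = (ts, value)

    def snapshot(self):
        """Return visible props, or None if the edge is currently deleted."""
        if self.write_ts <= self.delete_ts:
            return None
        return {name: value for name, (ts, value) in self.props.items()
                if ts > self.delete_ts and ts >= self.insert_ts}


ops = [
    (1418950524721, "insert", {"is_blocked": False}),
    (1418950524722, "delete", None),
    (1418950524723, "insert", {"is_hidden": False, "weight": 10}),
    (1418950524724, "update", {"time": 1, "weight": -10}),
    (1418950524726, "update", {"is_blocked": True}),
]


def replay(sequence):
    state = EdgeState()
    for ts, op, props in sequence:
        state.apply(ts, op, props)
    return state.snapshot()


in_order = replay(ops)
# The shuffled arrival order from the example above.
shuffled = replay([ops[4], ops[2], ops[1], ops[0], ops[3]])
print(in_order == shuffled)
print(all(replay(order) == in_order for order in permutations(ops)))
```

Every piece of state in this model is only ever advanced with a maximum over timestamps, which is why the arrival order cannot change the outcome.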
Limitations: Since S2Graph makes asynchronous writes to HBase via Asynchbase, consistency is not guaranteed on the same edge within its flushInterval (1 second).
Adding Extra Indices (Optional)
POST /admin/addIndex
A label can have multiple properties set as indexes. When edges are queried, their ordering is determined by the indexes, which in turn decides which edges are included in the top-K results.
Edge retrieval queries in S2Graph return the top-K edges by default. Clients must issue another query to fetch the next K edges, i.e., top-K ~ 2 x top-K.
Edges are sorted according to the indices in order to limit the number of edges fetched by a query. If no ordering property is given, S2Graph uses the timestamp as the index, thus returning the most recent data.
It would be extremely difficult to fetch millions of edges, sort them at request time, and return the top-K in a reasonable amount of time. S2Graph uses vertex-centric indexes to avoid this. With a vertex-centric index, having millions of edges is fine as long as the size K of the top-K result is reasonable (under 1K). Note that indexes must be created prior to inserting any data on the label (as is the case with a conventional RDBMS).
New indexes can be dynamically added, but will not be applied to pre-existing data (support for this is planned for future versions). Currently, a label can have up to eight indices.
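The idea behind a vertex-centric index can be sketched in memory: edges are kept sorted per source vertex at write time, so a top-K read is a slice rather than a sort. The class below is an illustrative model only; `VertexCentricIndex` and its `offset` parameter are hypothetical names, and S2Graph itself keeps this ordering in HBase rather than in Python lists.

```python
import bisect


class VertexCentricIndex:
    """Keeps each source vertex's edges sorted by timestamp, newest first."""

    def __init__(self):
        # source -> list of (-timestamp, target), kept in ascending order,
        # which is newest-first because the timestamp is negated.
        self._edges = {}

    def insert(self, src, tgt, timestamp):
        # The sorting cost is paid once, at write time.
        bisect.insort(self._edges.setdefault(src, []), (-timestamp, tgt))

    def top_k(self, src, k, offset=0):
        # A read only touches the K entries it returns.
        page = self._edges.get(src, [])[offset:offset + k]
        return [(tgt, -neg_ts) for neg_ts, tgt in page]


index = VertexCentricIndex()
for ts in (10, 50, 30, 20, 40):
    index.insert("u1", f"v{ts}", ts)

first_page = index.top_k("u1", 2)             # the top-K edges
second_page = index.top_k("u1", 2, offset=2)  # the next K edges, a second query
print(first_page)   # [('v50', 50), ('v40', 40)]
print(second_page)  # [('v30', 30), ('v20', 20)]
```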
The following is an example of adding index play_count to a label graph_test.
# add prop first
curl -XPOST localhost:9000/admin/addProp/graph_test -H 'Content-Type: Application/json' -d '
{ "name": "play_count", "defaultValue": 0, "dataType": "integer" }'
# then add index
curl -XPOST localhost:9000/admin/addIndex -H 'Content-Type: Application/json' -d '
{
"label": "graph_test",
"indices": [
{"name": "idx_play_count", "propNames": ["play_count"]}
]
}'
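Because the prop has to be added before the index, a mismatch between the two names is an easy mistake to make. The sketch below checks an addIndex request on the client side; `check_add_index` is a hypothetical helper, and it assumes that every name in propNames must already exist among the label's props (with _timestamp always available) and that existing indices count toward the limit of eight.

```python
# "Currently, a label can have up to eight indices."
MAX_INDICES = 8


def check_add_index(label_props, existing_indices, new_indices):
    """Return a list of problems with an addIndex request.

    label_props: names of props already defined on the label.
    existing_indices / new_indices: lists of {"name": ..., "propNames": [...]}.
    """
    known = set(label_props) | {"_timestamp"}
    problems = []
    if len(existing_indices) + len(new_indices) > MAX_INDICES:
        problems.append(f"label would exceed {MAX_INDICES} indices")
    for index in new_indices:
        for prop in index["propNames"]:
            if prop not in known:
                problems.append(f"{index['name']}: unknown prop {prop!r}")
    return problems


good = check_add_index(
    ["play_count"], [], [{"name": "idx_play_count", "propNames": ["play_count"]}])
# A hyphen instead of an underscore no longer matches the prop added earlier.
typo = check_add_index(
    ["play_count"], [], [{"name": "idx_play_count", "propNames": ["play-count"]}])
print(good)
print(typo)
```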
Create a ServiceColumn (Optional)
POST /admin/createServiceColumn
If your use case requires props assigned to vertices instead of edges, what you need is a Service Column.
Remark: If it is only the vertex id that you need and not additional props, there’s no need to create a Service Column explicitly. At label creation, by default, S2Graph creates column space with empty properties according to the label schema.
Service Column Fields
Field Name | Definition | Data Type | Example | Note |
---|---|---|---|---|
serviceName | Which service the Service Column belongs to | String | kakaotalk | Required |
columnName | Service Column’s name | String | talk_user_id | Required |
props | Optional properties of Service Column | JSON (array of dictionaries) | Please refer to the examples | Optional |
Basic Service Column Operations
Here are some sample requests for Service Column creation as well as vertex insertion and selection.
curl -XPOST localhost:9000/admin/createServiceColumn -H 'Content-Type: Application/json' -d '
{
"serviceName": "s2graph",
"columnName": "user_id",
"columnType": "long",
"props": [
{"name": "is_active", "dataType": "boolean", "defaultValue": true},
{"name": "phone_number", "dataType": "string", "defaultValue": "-"},
{"name": "nickname", "dataType": "string", "defaultValue": ".."},
{"name": "activity_score", "dataType": "float", "defaultValue": 0.0},
{"name": "age", "dataType": "integer", "defaultValue": 0}
]
}'
General information on a vertex schema can be retrieved with /admin/getServiceColumn/{service name}/{column name}
curl -XGET localhost:9000/admin/getServiceColumn/s2graph/user_id
This returns all properties of the Service Column with serviceName s2graph and columnName user_id.
Properties can be added to a Service Column with /admin/addServiceColumnProps/{service name}/{column name}
curl -XPOST localhost:9000/admin/addServiceColumnProps/s2graph/user_id -H 'Content-Type: Application/json' -d '
[
{"name": "home_address", "defaultValue": "korea", "dataType": "string"}
]'
Vertices can be inserted into a Service Column using /mutate/vertex/insert/{service name}/{column name}
curl -XPOST localhost:9000/mutate/vertex/insert/s2graph/user_id -H 'Content-Type: Application/json' -d '
[
{"id":1,"props":{"is_active":true}, "timestamp":1417616431},
{"id":2,"props":{},"timestamp":1417616431}
]'
Finally, query your vertex via /graphs/getVertices
curl -XPOST localhost:9000/graphs/getVertices -H 'Content-Type: Application/json' -d '
[
{"serviceName": "s2graph", "columnName": "user_id", "ids": [1, 2, 3]}
]'