Worker Configuration
Worker Heap Memory Size
The environment variable TAJO_WORKER_HEAPSIZE in conf/tajo-env.sh sets the heap memory size, in MB, that a Tajo worker may use.
To adjust the heap size, set TAJO_WORKER_HEAPSIZE in conf/tajo-env.sh to an appropriate value, for example:
TAJO_WORKER_HEAPSIZE=8000
The default is 1000 MB (about 1 GB).
Temporary Data Directory
TajoWorker stores temporary data on the local file system because it uses out-of-core algorithms. You can specify one or more directories for this temporary data as a comma-separated list:
tajo-site.xml
<property>
<name>tajo.worker.tmpdir.locations</name>
<value>/disk1/tmpdir,/disk2/tmpdir,/disk3/tmpdir</value>
</property>
Maximum number of parallel running tasks for each worker
In Tajo, the number of tasks a worker can run in parallel is determined by its available resources and by the workload of the running queries. To configure it, see the Worker Resources section.
Worker Resources
Each worker can execute multiple tasks simultaneously. In Tajo, users can specify the number of CPU cores, the total memory size, and the number of disks for each worker. These available resources determine how many tasks are executed simultaneously.
To specify the resource capacity of each worker, add the following properties to tajo-site.xml:
property name | description | value type | default value |
---|---|---|---|
tajo.worker.resource.cpu-cores | the number of cpu cores | integer | 1 |
tajo.worker.resource.memory-mb | memory size (MB) | integer | 1024 |
tajo.worker.resource.disks | the number of disks | integer | 1 |
Note
Currently, QueryMaster requests 512 MB of memory and 0.5 disk per task for backward compatibility.
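As a rough illustration of the note above, the fragment below sizes a worker so that memory and disks allow the same number of tasks. It assumes, which the text does not state explicitly, that concurrency is bounded by dividing each resource by the per-task request; the values 4096 and 4 are arbitrary.

```xml
<!-- Illustrative sizing only; assumes each task is charged 512 MB and 0.5 disk. -->
<property>
  <!-- 4096 MB / 512 MB per task = 8 tasks by memory -->
  <name>tajo.worker.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <!-- 4 disks / 0.5 disk per task = 8 tasks by disk -->
  <name>tajo.worker.resource.disks</name>
  <value>4</value>
</property>
```

Under that assumption, raising only one of the two values would leave the other as the limiting resource.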
Note
If tajo.worker.resource.dfs-dir-aware is set to true in tajo-site.xml, the worker detects the number of HDFS DataNode data directories on its node and uses that number as its disk count. In this case, tajo.worker.resource.disks is ignored.
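A minimal sketch of enabling this behavior in tajo-site.xml, using only the property name given in the note above:

```xml
<property>
  <name>tajo.worker.resource.dfs-dir-aware</name>
  <value>true</value>
</property>
```

With this set, any tajo.worker.resource.disks value in the same file has no effect, per the note.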
Example
Assume that you want to allocate 5120 MB of memory, 4 disks, and 24 cores to each worker. The example configuration is as follows:
tajo-site.xml
<property>
<name>tajo.worker.resource.cpu-cores</name>
<value>24</value>
</property>
<property>
<name>tajo.worker.resource.memory-mb</name>
<value>5120</value>
</property>
<property>
<name>tajo.worker.resource.disks</name>
<value>4</value>
</property>
Dedicated Mode
Tajo provides a dedicated mode that allows each worker in a Tajo cluster to use all available system resources, including CPU cores, memory, and disks. To enable this mode, add the following property to tajo-site.xml:
<property>
<name>tajo.worker.resource.dedicated</name>
<value>true</value>
</property>
In addition, you can limit the memory capacity used by a Tajo worker in dedicated mode as follows:
property name | description | value type | default value |
---|---|---|---|
tajo.worker.resource.dedicated-memory-ratio | the fraction of total system memory the worker may use | float | 0.8 |
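
For instance, the following sketch enables dedicated mode and lowers the memory ratio. The value 0.5 is an arbitrary illustration, not a recommendation; assuming the ratio applies to total physical memory, a worker on a 64 GB node would then use about 32 GB.

```xml
<property>
  <name>tajo.worker.resource.dedicated</name>
  <value>true</value>
</property>
<property>
  <!-- Illustrative value; the default is 0.8 -->
  <name>tajo.worker.resource.dedicated-memory-ratio</name>
  <value>0.5</value>
</property>
```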