Making things run faster

Table of Contents - Index
Previous: Code Snippets - Having fun with Geometry Services
Next: Debugging Khoros code

[bar]

Introduction

This section is about making programs run faster by changing the code. Unfortunately, the changes suggested here make the code either less flexible or more complicated, and they help only if the bottlenecks in processing your data are caused by I/O operations. Worse, there is no guarantee that the program will really run faster - it is up to you to try and see whether the changes are worth making.

Why bother, then? Again, the answer is up to you. If you think the suggestions here could improve your programs and you are willing to try, this page gives some hints. These hints can make your programs either more flexible but more complicated (treating data differently according to its data type) or more restricted (treating the data as you expect it to be). The distinction will be easier to understand with the examples.

Speed problems are often related to memory - if you don't have enough memory, the kernel (?) will try to allocate memory in the swap region, which means lots of disk reading and writing (at least on a FreeBSD system). Some of the hints here are about how to avoid using too much memory while at the same time minimizing calls to kpds_get_data and kpds_put_data.

Before seeing what can be done with respect to the code, let's look at a related topic.

[bar2]

Making things faster between programs

If you're creating cantata workspaces with lots of files being passed from glyph to glyph, and it is taking too much time, you could consider using other data transports instead of the default files between glyphs. The Khoros data transport mechanism supports normal Unix files, pipes, virtual memory, shared memory, sockets and streams (depending on your Unix machine and Khoros configuration).

Here are some results I had converting a 288 kbyte RGB image to CMYK and back to RGB using the Color Toolbox. Times were measured with the csh built-in time command, from which I guess that the real time is the one which counts (includes I/O):

Data transport mechanism        RGB to CMYK   CMYK to RGB
File in the /tmp directory      7.07          4.37
Shared memory                   5.92          4.17
Virtual memory (mmap)           5.25          4.54
(values are the average of the time for 10 runs of each program)

I was unable to put the other transport mechanisms to work - probably because I don't know enough of Unix to make proper selections. Also, I had errors running the programs with shared and virtual memory for larger images. The bottom line is, it could make things faster when you have a workspace that reads and writes several small files, but again your mileage may vary, and the only way to see if it works is by trying.

For more information on the data transport mechanisms, please see the kman page for the kfopen and kopen functions.

[bar]

Processing the data in other data types

Consider the following code:


The code starts HERE


kobject in_obj = NULL;    /* an input object */
kobject out_obj = NULL;   /* an output object */
double *inplane = NULL;   /* data for an input plane */
double *outplane = NULL;  /* data for an output plane */
int w,h,d,t,e,type;       /* data attributes */
in_obj = kpds_open_input_object("myobj.kdf");     /* open the input object */
out_obj = kpds_open_output_object("newobj.kdf");  /* open the output object */
kpds_get_attribute(in_obj,KPDS_VALUE_SIZE,&w,&h,&d,&t,&e);  /* get the object dimensions */
kpds_get_attribute(in_obj,KPDS_VALUE_DATA_TYPE,&type);      /* get the object stored data type */
if (type == KCOMPLEX || type == KDCOMPLEX)              /* won't process if data type is complex */
  {
  kprintf("Error, cannot process complex data.\n");
  kexit(KEXIT_FAILURE);
  }
kpds_set_attribute(in_obj,KPDS_VALUE_DATA_TYPE,KDOUBLE);  /* will force internal processing as double */
kpds_copy_object(in_obj,out_obj);                         /* quick way to initialize the output object */
inplane = (double *)kmalloc(w*h*sizeof(double));
outplane = (double *)kmalloc(w*h*sizeof(double));
/* do some processing in the planes here for all d*t*e planes -
   you must use kpds_get_data and kpds_put_data with the 
   KPDS_VALUE_PLANE attribute */
kpds_close_object(in_obj);       /* close input object */
kpds_set_attribute(out_obj,KPDS_VALUE_DATA_TYPE,type);  /* reset data type to be the same as input */
kpds_close_object(out_obj);      /* close output object */

The code ends HERE

In the code above, data is processed as double regardless of its stored type (if it is complex, the program gives an error message and does not process it). If we pass it a kobject with dimensions 1024x1024x1x1x3 (say, a medium RGB image), then even when processing the data by planes of 1024x1024, the memory allocated for the input and output planes adds up to 16 megabytes (two planes of 1024x1024 values of 8 bytes each) - that could be a heavy load for some machines, especially if other memory-consuming processes are running at the same time.
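The arithmetic behind that figure can be checked with a small stand-alone sketch. It does not use Khoros at all; the function names are made up for illustration, and the element size of 8 bytes for double is an assumption that holds on most machines but is not guaranteed by C.

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed for `nbuffers` plane buffers,
   each holding w*h elements of `elem_size` bytes. */
size_t plane_buffers_bytes(size_t w, size_t h, size_t elem_size, size_t nbuffers)
{
    return w * h * elem_size * nbuffers;
}

/* Hypothetical helper: the same amount in whole megabytes
   (1 megabyte taken as 1024*1024 bytes). */
size_t bytes_to_megabytes(size_t bytes)
{
    return bytes / ((size_t)1024 * 1024);
}
```

Calling bytes_to_megabytes(plane_buffers_bytes(1024, 1024, 8, 2)) gives 16 for the two double planes above; with a 4-byte element type the same call gives 8.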

Note that the call to kpds_set_attribute(out_obj,KPDS_VALUE_DATA_TYPE,type) can save disk space (and the time required to write the file) if the input data is of a shorter type (long or int), because the generated file will be smaller than the one we would get if we left the KPDS_VALUE_DATA_TYPE of the output object set to KDOUBLE (the output object got that setting when we set it on the input object and copied it to the output object). On the other hand, one must consider what processing is to be done on the data: resetting the type does not make much sense if, for example, we want the componentwise square root of the input, because the input could be integer but the output should be floating point. In other words, whether to reset the data type of the output object depends on the application.

Now consider the modified code:


The code starts HERE


kobject in_obj = NULL;    /* an input object */
kobject out_obj = NULL;   /* an output object */
int *inplane = NULL;   /* data for an input plane */
int *outplane = NULL;  /* data for an output plane */
int w,h,d,t,e,type;       /* data attributes */
in_obj = kpds_open_input_object("myobj.kdf");     /* open the input object */
out_obj = kpds_open_output_object("newobj.kdf");  /* open the output object */
kpds_get_attribute(in_obj,KPDS_VALUE_SIZE,&w,&h,&d,&t,&e);  /* get the object dimensions */
kpds_get_attribute(in_obj,KPDS_VALUE_DATA_TYPE,&type);      /* get the object stored data type */
if (type == KCOMPLEX || type == KDCOMPLEX)              /* won't process if data type is complex */
  {
  kprintf("Error, cannot process complex data.\n");
  kexit(KEXIT_FAILURE);
  }
if (type == KDOUBLE || type == KFLOAT ||                /* will process but warns the user */
    type == KLONG || type == KULONG ||
    type == KUINT)  /* unsigned int range is actually different from signed int range */
  {
  kprintf("The data type of the input object requires more precision than the signed integer type.\n");
  kprintf("The object will be processed but some rounding errors and/or precision loss may occur.\n");
  }
kpds_set_attribute(in_obj,KPDS_VALUE_DATA_TYPE,KINT);  /* will force internal processing as integer */
kpds_copy_object(in_obj,out_obj);                      /* quick way to initialize the output object */
inplane = (int *)kmalloc(w*h*sizeof(int));
outplane = (int *)kmalloc(w*h*sizeof(int));
/* do some processing in the planes here for all d*t*e planes -
   you must use kpds_get_data and kpds_put_data with the 
   KPDS_VALUE_PLANE attribute */
kpds_close_object(in_obj);       /* close input object */
kpds_set_attribute(out_obj,KPDS_VALUE_DATA_TYPE,type);  /* reset data type to be the same as input */
kpds_close_object(out_obj);      /* close output object */

The code ends HERE

The memory required for the planes of the same kobject now adds up to 8 megabytes (assuming 4-byte integers) - half of the memory used when the data was processed as double. Using integer data for processing seems an advantage, but if the data type of the kobject was double or float, the data will lose precision when cast to integer.
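To get a feeling for what is lost, here is a small stand-alone sketch in plain C. It uses an ordinary C cast, which truncates toward zero; how Khoros itself converts values when the data type attribute is changed may differ, so this is only an illustration. The function names are made up, and the values are assumed to fit in the range of int.

```c
#include <stddef.h>

/* Convert n doubles to int with a plain C cast (truncates toward zero).
   Assumes every value fits in the range of int. */
void cast_plane_to_int(const double *in, int *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = (int)in[i];
}

/* Count how many values would not survive a double -> int -> double
   round trip unchanged. */
size_t count_lossy(const double *in, size_t n)
{
    size_t i, lossy = 0;
    for (i = 0; i < n; i++)
        if ((double)(int)in[i] != in[i])
            lossy++;
    return lossy;
}
```

A program could run something like count_lossy on the first plane to decide whether the warning to the user is really needed.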

There are two options here. One is to declare that the program works only with integer data and warn the user about floating point data types, as we did in the code above; this is simpler and quicker to program but limits the program's ability to work with general data. The other is to write a more complex program that deals with all (or at least some) data types, casting to the closest supported data type, so that it uses less memory if the data type is integer and switches to more memory (and precision) if the data type is floating point. The following code illustrates this approach, where four main data types are considered: unsigned char (byte), integer, long and double:


The code starts HERE


kobject in_obj = NULL;    /* an input object */
kobject out_obj = NULL;   /* an output object */
unsigned char *UCinplane = NULL,*UCoutplane = NULL;    /* data for planes of unsigned char */
int *Iinplane = NULL,*Ioutplane = NULL;      /* data for planes of int size */
long *Linplane = NULL,*Loutplane = NULL;      /* data for planes of long size */
double *Dinplane = NULL,*Doutplane = NULL;      /* data for planes of double size */
klist *objlist = NULL; /* list of stuff that will be free'd at the end */
int w,h,d,t,e,type;       /* data attributes */
int processtype;          /* internal type for processing */
int cw,ch,cd,ct,ce;       /* counters for the dimensions */
in_obj = kpds_open_input_object("myobj.kdf");     /* open the input object */
out_obj = kpds_open_output_object("newobj.kdf");  /* open the output object */
kpds_get_attribute(in_obj,KPDS_VALUE_SIZE,&w,&h,&d,&t,&e);  /* get the object dimensions */
kpds_get_attribute(in_obj,KPDS_VALUE_DATA_TYPE,&type);      /* get the object stored data type */
if (type == KCOMPLEX || type == KDCOMPLEX)              /* won't process if data type is complex */
  {
  kprintf("Error, cannot process complex data.\n");
  kexit(KEXIT_FAILURE);
  }
if (type == KDOUBLE || type == KFLOAT || type == KULONG)  /* data types KDOUBLE, KFLOAT and KULONG */
  processtype = KDOUBLE;	                         /* will be processed as KDOUBLE */
if (type == KLONG || type == KUINT)                      /* data types KLONG, KUINT */
  processtype = KLONG;	                                 /* will be processed as KLONG */
if (type == KINT || type == KUSHORT || type == KSHORT)   /* data types KINT, KUSHORT, KSHORT */
  processtype = KINT;	                                 /* will be processed as KINT */
if (type == KUBYTE || type == KBYTE || type == KBIT)     /* data types KUBYTE, KBYTE, KBIT */
  processtype = KUBYTE;	                                 /* will be processed as KUBYTE */
kpds_set_attribute(in_obj,KPDS_VALUE_DATA_TYPE,processtype); /* will force internal processing as processtype */
kpds_copy_object(in_obj,out_obj);                            /* quick way to initialize the output object */
switch(processtype)
  {
  case KUBYTE:
    {
    UCinplane = (unsigned char *)kmalloc(w*h*sizeof(unsigned char));   
    UCoutplane = (unsigned char *)kmalloc(w*h*sizeof(unsigned char));   
    break;	
    }
  case KINT:
    {
    Iinplane = (int *)kmalloc(w*h*sizeof(int));   
    Ioutplane = (int *)kmalloc(w*h*sizeof(int));   
    break;	
    }
  case KLONG:
    {
    Linplane = (long *)kmalloc(w*h*sizeof(long));   
    Loutplane = (long *)kmalloc(w*h*sizeof(long));   
    break;	
    }
  case KDOUBLE:
    {
    Dinplane = (double *)kmalloc(w*h*sizeof(double));   
    Doutplane = (double *)kmalloc(w*h*sizeof(double));   
    break;	
    }
  }
/* do the processing accordingly to the data type */
switch(processtype) /* get the plane of data accordingly to its type */
  {
  case KUBYTE: 
    {
    /* do some processing in the planes here for all d*t*e planes -
       you must use kpds_get_data and kpds_put_data with the 
       KPDS_VALUE_PLANE attribute, getting the data in the variable
       UCinplane and putting data with the variable UCoutplane */   
    break;
    }
  case KINT:
    {
    /* do some processing in the planes here for all d*t*e planes -
       you must use kpds_get_data and kpds_put_data with the 
       KPDS_VALUE_PLANE attribute, getting the data in the variable
       Iinplane and putting data with the variable Ioutplane */   
    break;
    }
  case KLONG:
    {
    /* do some processing in the planes here for all d*t*e planes -
       you must use kpds_get_data and kpds_put_data with the 
       KPDS_VALUE_PLANE attribute, getting the data in the variable
       Linplane and putting data with the variable Loutplane */   
    break;
    }
  case KDOUBLE:
    {
    /* do some processing in the planes here for all d*t*e planes -
       you must use kpds_get_data and kpds_put_data with the 
       KPDS_VALUE_PLANE attribute, getting the data in the variable
       Dinplane and putting data with the variable Doutplane */   
    break;
    }
  }
kpds_close_object(in_obj);       /* close input object */
kpds_set_attribute(out_obj,KPDS_VALUE_DATA_TYPE,type);  /* reset data type to be the same as input */
kpds_close_object(out_obj);      /* close output object */
(void)klist_free(objlist,(kfunc_void)lkcall_free);      /* free list of stuff */

The code ends HERE

This approach attempts to save memory to some extent: if the data type is int, the program will not generalize and process it as double; if the data type is unsigned char (byte), it will be processed as unsigned char and not as int. The code recognizes four general data types, and the input data type is cast to the closest one without loss of precision. That could save some memory, since buffers are allocated only for one particular type and not for all possible types.
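The type selection can also be written as a single function, sketched below in stand-alone C. The enum is a hypothetical stand-in for the Khoros constants (KBIT, KBYTE, KUBYTE and so on), so that the sketch compiles without the Khoros headers; the grouping simply mirrors the chain of if statements in the code above.

```c
/* Hypothetical stand-ins for the Khoros data type constants. */
enum data_type {
    T_BIT, T_BYTE, T_UBYTE, T_SHORT, T_USHORT, T_INT, T_UINT,
    T_LONG, T_ULONG, T_FLOAT, T_DOUBLE, T_COMPLEX, T_DCOMPLEX,
    T_UNSUPPORTED
};

/* Map the stored data type to one of the four processing types,
   using the same grouping as the tutorial code. */
enum data_type choose_process_type(enum data_type stored)
{
    switch (stored) {
    case T_BIT: case T_BYTE: case T_UBYTE:
        return T_UBYTE;
    case T_SHORT: case T_USHORT: case T_INT:
        return T_INT;
    case T_UINT: case T_LONG:
        return T_LONG;
    case T_ULONG: case T_FLOAT: case T_DOUBLE:
        return T_DOUBLE;
    default:
        /* complex types and anything unknown are not processed */
        return T_UNSUPPORTED;
    }
}
```

One advantage of the switch is that an unexpected type always produces a defined result, while a chain of independent if statements leaves the processing type unset in that case. Note also that the signed byte type is grouped with unsigned char, as in the code above; negative values cannot be represented in an unsigned char, so a real program might prefer to send that type to the int branch.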

The drawback is that the code is much more complicated. The second switch statement hides the real processing of the program, which could consist of three for loops (for the depth, time and elements dimensions), getting the data from the input object with kpds_get_data, processing it, and putting it on the output object with kpds_put_data - all repeated four times (once for each supported data type), with minor differences in the data manipulation.
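One possible way to limit the repetition, which is not something the code above does, is to let the C preprocessor generate one function per data type from a single template. The sketch below is stand-alone C without any Khoros calls; the macro name, the generated function names and the "add an offset" operation are all placeholders for whatever the real processing would be.

```c
#include <stddef.h>

/* Generates process_plane_<suffix>(), which copies a plane from `in`
   to `out` while adding `offset` to every value (a placeholder for
   the real processing). */
#define DEFINE_PROCESS_PLANE(suffix, type)                      \
    void process_plane_##suffix(const type *in, type *out,      \
                                size_t n, type offset)          \
    {                                                           \
        size_t i;                                               \
        for (i = 0; i < n; i++)                                 \
            out[i] = (type)(in[i] + offset);                    \
    }

/* One function per supported processing type. */
DEFINE_PROCESS_PLANE(ubyte, unsigned char)
DEFINE_PROCESS_PLANE(int, int)
DEFINE_PROCESS_PLANE(long, long)
DEFINE_PROCESS_PLANE(double, double)
```

Each case of the second switch would then only need to get a plane, call the matching function (for example process_plane_int for the integer case) and put the plane back. The price is that macro-generated code is harder to read and to debug.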

The idea could be extended to all data types supported by Khoros, so that if the input object data type is float, it will be processed as float, and so on. That will further increase the complexity of the code, but will make it able to process any type of input data without loss and without using more memory than necessary.

[bar]

Accessing more data at a time

This section just presents some results for different methods of accessing data. The corresponding code is on the page Code Snippets - More examples for PDM and PDS. The data file has dimensions 128x128x1x3x6 and integer data type. The following table shows the times and memory requirements for simple access to this data with different access methods (times are in seconds, depend on your machine/OS/load/etc. and should be used only to compare the methods):

Access by           Total calls to   Time, integer   Time, double   Memory, integer              Memory, double
                    kpds_get_data    data type       data type      data type                    data type
WxHxDxTxE Points    294912           125.17          147.21         4 bytes (a single integer)   8 bytes (a single double)
WxHxDxT Vectors     49152            22.76           26.66          24 bytes                     48 bytes
DxTxE Planes        18               0.56            0.79           65536 bytes                  131072 bytes
TxE Volumes         18               0.58            0.77           65536 bytes                  131072 bytes
All data at once    1                0.61            0.97           1179648 bytes                2359296 bytes
Note that since the depth dimension in this object was 1, there was no noticeable difference between allocating planes and volumes.
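The call counts and buffer sizes in the table follow directly from the object dimensions. The stand-alone sketch below (plain C, no Khoros calls, all names made up for illustration) reproduces them for a 128x128x1x3x6 object, assuming 4-byte integers and 8-byte doubles and that no dimension is zero.

```c
#include <stddef.h>

/* Object dimensions: width, height, depth, time, elements. */
struct dims { size_t w, h, d, t, e; };

enum access_unit { BY_POINT, BY_VECTOR, BY_PLANE, BY_VOLUME, BY_ALL };

/* Number of values transferred by one call for the given access unit. */
size_t unit_elements(struct dims s, enum access_unit u)
{
    switch (u) {
    case BY_POINT:  return 1;
    case BY_VECTOR: return s.e;               /* all elements of one point */
    case BY_PLANE:  return s.w * s.h;
    case BY_VOLUME: return s.w * s.h * s.d;
    default:        return s.w * s.h * s.d * s.t * s.e;
    }
}

/* Number of get calls needed to read the whole object. */
size_t calls_needed(struct dims s, enum access_unit u)
{
    size_t total = s.w * s.h * s.d * s.t * s.e;
    return total / unit_elements(s, u);
}

/* Size in bytes of the buffer needed for one call. */
size_t buffer_bytes(struct dims s, enum access_unit u, size_t elem_size)
{
    return unit_elements(s, u) * elem_size;
}
```

For example, calls_needed with BY_VECTOR gives 49152 and buffer_bytes with BY_PLANE and an element size of 4 gives 65536, matching the table. The number of calls shrinks as fast as the buffer grows, which is why planes already get most of the speed benefit at a small fraction of the memory.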

The table above may suggest that getting all the data at once would be faster. This will be true for machines with a lot of available RAM or for small objects. If the object is too large or there is not enough available RAM, the OS will probably try to allocate memory in the swap area, which is on the hard disk and could slow down the program (I am not sure about this, but it seems to be consistent with some speed problems I had in FreeBSD).

[bar]

Compiling the program with optimization switches

Another way to possibly increase the processing speed of your programs is to compile them with optimization switches. Optimized programs should be smaller and execute faster (according to the gcc man page). Please check the man page for your compiler to see whether it supports optimization and which switches it uses.

You could probably also turn off the debugging information when you compile the program; that will make the final binary smaller and possibly increase speed. For gcc 2.7.2.1, the switch -g controls whether debugging information is included; please refer to the man page for your compiler to see how it deals with debugging information.

Now, the bad news. If you decide to change the parameters you will have to change the files $BOOTSTRAP/config/Site.YOURMACHINE and $BOOTSTRAP/config/YOURMACHINE.cf where YOURMACHINE is the name of your computer architecture (in my case, freebsd).

Please check the file $BOOTSTRAP/config/Site.YOURMACHINE: it has an option called #define UseOptimization (with lots of useful hints) that controls whether optimization is the default or not, and the hints in the file show how to override the defaults to create optimized or debuggable programs. The problem is that your program will use Khoros functions, which were compiled according to the #define UseOptimization option in effect when Khoros was built. This means that even if you compile your programs with optimization, the Khoros libraries they use may not have been optimized. If you really want to use optimized code, your best bet would be to recompile the Khoros system.

The file $BOOTSTRAP/config/YOURMACHINE.cf also contains some #defines for gcc that set the level of optimization, but you should know which options are there and what they are for before changing them. Strong warning: messing with these parameters could cause compilation problems, so be sure of what you're doing before trying them!

Personally, I'd rather be able to debug programs than make them run a little faster. Again, the decision is up to you.

[bar]

These pages copyright © Rafael Santos (e-mail valid until March 1998). Please let me know if this tutorial is useful for you, I need to justify the time I used to develop it. Comments, requests and bug reports are also welcome, but please see the section Before you e-mail me...

[bar2]

Khoros copyright © Khoral Research, Inc. (KRI) - run klicense for more information.
Khoral Research Inc. is not responsible for or is supervising these pages.

[bar2]

The latest version of this document can be found at Ejima Lab Khoros Pages at Kyushu Institute of Technology, Japan (until March 1998).
Mirrors for this tutorial can be found at Universidade do Vale do Paraíba, Brasil and PUC/RS, Brasil.

[bar2]

Generated with StructHTML, 14:19 August 30, 1997