

  Section     Title
 
     0        Introduction
     1        Required abilities
     2        Remote program execution
     3        Data exchanges
     4        Synchronization
     5        Decreasing network access
     6        Supported system calls
     7        Software installation 
     8        Example programs



0.0) Introduction
 To write a distributed program using DIPC, you develop your software as 
independent processes that exchange data through the System V IPC 
mechanisms: shared memory segments, semaphores, and messages. 



1.0) Required abilities
 Programming for a distributed system requires three abilities (see the 
introduction to parallel systems in the file named intro):

1.1) Remote program execution

1.2) Data exchanges

1.3) Synchronization



2.0) Remote program execution
 A distributed program consists of several processes. Some of them may run
on the same machine, but at least one has to run on a remote computer. One of
the processes is usually designated as the main one. It initializes the data
structures used in the program, so that the other processes can assume a
certain state is present in the system when they start executing. 

2.1) For IPC to make sense, more than one process should be using its 
mechanisms. In ordinary IPC, these processes may be created with the fork() 
system call, possibly followed by an exec() call. In DIPC, you cannot use 
fork() for remote process creation, because it creates a local process.
You could start processes remotely 'by hand': after the main program has
prepared the DIPC structures (shared memory segments, message queues, or
semaphore sets), it waits on an agreed-upon set of semaphores. Other users
can then start programs at the remote machines' consoles, and those programs
use the agreed-upon semaphores to inform the main process that they are
ready. The user has to take care not to run the remote programs too soon, as
the shared structures may not be ready yet. The example program in the
examples/message directory works this way.

 You could also use programs like 'rsh' to invoke programs remotely. Here
the main process prepares everything and then forks a process to execute a
shell script. This script uses the rsh command to execute programs remotely.
The example program in the examples/image directory uses this method. Using
rsh in this way has an advantage: after you have developed your program, you
can simply edit this script to change the computers on which you want to run
your programs, without any need to change and recompile your program.
Similarly, users can choose the remote machines without needing your
program's sources. This could be important for commercial programs.

 One note about rsh: if it is executed too frequently in a short time, 
inetd might shut down the service. You may have to edit the file 
/etc/inetd.conf to influence this behaviour.



3.0) Data exchanges
 Most programs receive some input data, process it in appropriate ways, and
then output the results. Easy exchange of data is very important in a
distributed system. In DIPC you can use messages or shared memory segments
to do this. Shared memory is an asynchronous mechanism: the program can
access it whenever it wants. Though there may be a long delay between this
access and the time the data are actually received, the program knows
nothing about it. Messages are a very cheap way to transfer data: you
transfer only the data you want, and you can send it to any process that
needs it. This is a one-to-one style of communication. Messages can also be
used as a synchronous way of exchanging data: the program waits in the
msgrcv() system call until another process calls msgsnd(). The programmer
should decide which one of these two methods to use. 

  Each page of a distributed shared memory can have several readers in 
different computers at the same time, but only one computer with writer 
processes in it. If DIPC is configured in the segment transfer mode, this
applies to the whole segment. This means that several readers can have the
shared memory (pages) in their local memory and access it freely, but a
write by one of them will invalidate the data in the other machines. To
continue reading, the memory contents have to be sent to them again. In the
segment transfer mode, the entire contents of a shared memory segment are
transferred over the network. If processes use different parts of the
shared memory, you could consider setting DIPC's transfer mode to pages, so
that programs which assign processes on different computers to read or
write different pages of the shared memory work well. 

One can use the segment transfer mode to increase the efficiency of data 
transfers: read all the information you need from a shared memory segment, do 
any needed processing using local memory, and then place the results in a 
possibly different shared memory. Consider a shared memory as a door to other 
machines. You are not using the network while you are not using a shared memory 
segment.

 A possible scenario for using shared memory in segment transfer mode is
this: a main process starts up and places the input data in a shared memory.
Other remote processes are then started and can read this shared memory
segment. There can be several readers at the same time, so this can be done
in parallel. Also, as the entire contents of the segment are sent to each
machine, there will be little unnecessary overhead in the transfer (this is
of benefit if the remote processes need all the data in the shared memory
segment). After processing the input data, the remote tasks can return the
results in the same or another shared memory segment, or they could use
messages to do so. As there can be only one writer to a shared memory, it is
advisable for each remote machine to use a different segment to return the
results. This way things can be done in parallel. It should be noted that
DIPC can be used with no regard to the above points, but performance may
suffer.

 DIPC's page transfer mode could be used to allow processes to read and
write different parts of the same shared memory in parallel.

In most cases there is no need in a distributed program to know where a
process is executing, so in DIPC there is no way for the programmer to
explicitly refer to a specific computer in the network. This makes the
program address independent. The programmer can use other means to
distinguish between different processes. For example, he could use the
'mtype' field of the msgbuf structure to carry an identifier representing
a process, or this information could be placed in some agreed-upon place in
a shared memory. This provides the programmer with a logical addressing
scheme.
 
 
3.1) Programming in a heterogeneous environment
 In a heterogeneous environment, different machine architectures are
present. Each of them may have a different way to represent data. For
example, the way they represent floating point or even integer numbers may
not be the same. Both message and shared memory segment sizes are expressed
in bytes, and these sizes are the same in all machines. 
 
 In DIPC, interpreting the meaning of data and doing any conversions is up
to the application. But there should be a way to find out if a conversion
between different machine formats is necessary. In messages, the programmer
can use, for example, the first byte to indicate the type of the originating
machine. The receiving process can then decide if any conversion is
necessary. In the case of shared memories, things are not that simple. A
process in the middle of reading/writing a shared memory may be stopped, the
contents of the shared memory sent to another machine of a different type,
changed with data generated there, and later transferred back to the
original machine, all without the program knowing it. So there should be a
way to inform the program of these events.

 Two interrupts do just this. One is DIPC_SIG_WRITER, and the other is 
DIPC_SIG_READER. If DIPC is configured to send them, then the first one is 
sent when the process first becomes a writer (maybe it was a reader, or maybe 
it did not have the segment at all). In the segment transfer mode, the 
process knows it has exclusive control over the segment. The process can for 
example write the identity of the current architecture in the first bytes of 
the segment and then place any data it produces in the shared memory. The 
second interrupt is sent when a process becomes the reader of a segment (it did 
not have the contents previously). Here the process can for example check the 
first bytes of the segment to see if any data conversions need to be done. Note 
that no DIPC_SIG_READER interrupt is generated if this machine was a previous 
writer and has become a reader now, meaning that its access rights were 
lowered.

 Things are not as convenient when DIPC is using the page transfer mode.
This is because there is no way for a signal to convey any parameters to its
recipient, so it is not possible for the processes to tell which page they
can now read or write. The programmer could help determine this by
pre-assigning clearly specified sections of the shared memory to different
processes. 

 In any case, unless the programmer has specifically disabled these two
interrupts (SIG_IGN), or the system administrator has configured DIPC not to
send shared memory interrupts, (s)he should be prepared to handle them at
any time. This includes checking system call return values and restarting them 
if necessary.



4.0) Synchronization
 Remote programs should know when to do actions involving other machines.
For example, they should know when the data they need are available. It is
strongly recommended that you use semaphores for this purpose. Other
methods, like frequently testing and setting a variable in shared memory,
may result in very poor performance, as they could require frequent
transfers of the whole shared memory segment over the network. This happens
when DIPC is configured in the segment transfer mode. Of course the results
won't be much better in the page transfer mode either.



5.0) Decreasing network access
 There are some reasons for trying to decrease the number of network
operations in your application. One is obviously performance, as network
operations are very expensive compared to local operations. The other reason
has to do with some practical limitations in the TCP/IP code of the kernel:
it is not possible to make too many network connections in a short time.
After a while the kernel will issue a "Resource temporarily unavailable"
error, and your application will fail. You will encounter this sooner if
you have fast machines and a fast network. 

 Here are some of the things you should have in mind when programming with 
DIPC:

*) The owner of an IPC structure does all the operations on it locally. So
it is better to make the machine that uses an IPC structure the most its
owner. Do this by having a process on that machine be the first one to
create the structure with an xxxget() call.

*) Spin locks on a shared memory are generally not a good idea if there is a
tight interleaving of testing and setting the lock. This will result in very
frequent transfer of the shared memory over the network. However, if the
spin lock is mostly tested, and occasionally set, then you can consider
using it. You can also consider substituting a spin lock with a semop().

*) Performing system calls such as a non-blocking semop() (with IPC_NOWAIT)
in a tight loop like "for(;;) { semop(...); ... }" will result in a great
number of network operations if not executed on the owner computer. It is
recommended that you avoid such techniques, but if you have to have them in
your application, then consider adding some delay between the invocations of
the system call. For example, sleep(1) puts the application to sleep for one
second. For split-second delays, you can use the select() system call with
code like this:

 #include <sys/time.h>
 #include <sys/types.h>
 #include <unistd.h>

 struct timeval tv;

 for (;;)
 {
   /* first check the semaphore, then sleep for a while */
   check_the_semaphore_without_blocking_and_break_if_needed(...);
   /* note: select() may modify tv, so initialize it inside the loop */
   tv.tv_sec = 0;
   /* wait for 0.5 seconds = 2 semaphore checks per second */
   tv.tv_usec = 500000;
   /* no descriptors to watch; select() only provides the delay */
   select(0, NULL, NULL, NULL, &tv);
 }



6.0) Supported system calls
 Here are the supported forms of using IPC system calls in DIPC:
 
6.1) Shared Memory:
* shmget(): Same as ordinary IPC.
* shmctl(): Recognized commands are: IPC_STAT, IPC_SET and IPC_RMID. Other
            commands are executed locally. 
* shmat(): Same as ordinary IPC.
* shmdt(): Same as ordinary IPC.

6.2) Message:
* msgget(): Same as ordinary IPC.
* msgctl(): Recognized commands are: IPC_SET, IPC_STAT and IPC_RMID. Other
            commands are executed locally.
* msgsnd(): Same as ordinary IPC.
* msgrcv(): Same as ordinary IPC.

6.3) Semaphore:
* semget(): Same as ordinary IPC.
* semctl(): Recognized commands are: IPC_STAT, IPC_SET, IPC_RMID, GETVAL,
            SETVAL, GETPID, GETNCNT, GETZCNT, GETALL and SETALL. Other 
            commands are executed locally.
* semop(): Same as ordinary IPC.

 Note: If dipcd is not running, then all the DIPC system calls described
above are executed locally, as if they were normal IPC calls.

 Note: These system calls may return error values not found in IPC calls.
These indicate for example a failed network operation or a time-out.



7.0) Software installation 
 To install your software, you may use systems like NFS to free yourself of 
the burden of taking a disk to different machines and copying the executable 
programs there. Other than that, 'rdist' seems a very good candidate for 
program distribution.



8.0) Example programs
 You may look at the example programs as an illustration of the above 
points. Take a look at the Readme file in the examples directory. The 
program in the examples/hello directory may be easier to install and run 
than the others, so you may want to try it first.

