Erlang Programming with external programs

1. Introduction

When I was learning Erlang, its speed always bothered me. It is certainly not as fast as C/C++, but it has unique advantages for distributed programming. Luckily, Erlang offers several ways to interface with programs written in other languages:

By running the program outside the Erlang virtual machine in an external operating system process.
By running an OS command from within Erlang and capturing the result.
By running the foreign-language code inside the Erlang virtual machine. This is more efficient than using an external process, but it is unsafe: if the foreign code crashes, it brings down the whole Erlang system. Another disadvantage is that it works only for languages like C that produce native object code.

Here, we will only consider interfacing Erlang using ports and OS commands.

2. How Erlang communicates with external programs

Erlang communicates with external programs through ports. This is not the kind of port that is paired with an address in socket programming. A port provides a byte-oriented interface to an external program: Erlang communicates with it by sending and receiving lists of bytes, not Erlang terms.

A port behaves like an Erlang process, which makes it quite different from a socket:

We can send messages to it.
We can register it, just like a process.
We can link to it.
We can send messages to it from a remote node in a distributed Erlang system.
If the external program crashes, an exit signal is sent to the connected process.
If the connected process dies, the port is closed and the external program normally terminates, because it sees end of file on its input.

A port works like a pipe: when we send a message to a port, the message is passed on to the external program connected to it, and the program's replies come back as messages. The port acts as a middle-man between the Erlang process and the external program.

3. Erlang with C

3.1. Define how we want to use the C code

Suppose we have defined the following functions in a C program.

```
int sum(int x, int y) {
    return x + y;
}

int twice(int x) {
    return 2 * x;
}
```

The goal is to be able to call them from Erlang like this:

```
X1 = example1:sum(11, 22),
Y1 = example1:twice(10),
```

We need to hide the details of the interface to the C program. To do so, we turn function calls such as sum(11, 22) and twice(10) into sequences of bytes that can be sent to the port.

When we send a sequence of bytes to the port, the port adds a length count to the byte sequence and sends the result to the external program.
When the external program replies, the port receives the reply and sends the result to the process connected to the port.

3.2. Define the protocol

All packets start with a 2-byte header that specifies the length of the data immediately following it.
We encode sum(N, M) as the byte sequence [1, N, M].
We encode twice(N) as the byte sequence [2, N].
All arguments and return values are assumed to be a single byte long.

Both the C program and the Erlang program must follow this protocol.
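To make the framing concrete, here is a small C sketch that builds the bytes the C program should see on its standard input for one call: the 2-byte length header added by the port, followed by the encoded call. encode_call is a hypothetical helper for illustration only; in the real setup the Erlang side sends just [1, N, M] or [2, N] and the port prepends the header itself.

```c
#include <stddef.h>

typedef unsigned char byte;

/* Hypothetical helper (not part of the example files): write the 2-byte
   big-endian length header followed by the payload into out, i.e. the
   frame the C program reads from stdin. out must hold payload_len + 2
   bytes, and payload_len must fit in 16 bits.
   Returns the total number of bytes written. */
size_t encode_call(const byte *payload, size_t payload_len, byte *out) {
    out[0] = (byte)((payload_len >> 8) & 0xff);  /* high byte of the length */
    out[1] = (byte)(payload_len & 0xff);         /* low byte of the length */
    for (size_t i = 0; i < payload_len; i++) {
        out[2 + i] = payload[i];                 /* copy the encoded call */
    }
    return payload_len + 2;
}
```

For sum(11, 22) the payload is [1, 11, 22], so the frame is [0, 3, 1, 11, 22]; for twice(10) it is [0, 2, 2, 10].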

3.3. The C program

The C program consists of three files:

example1.c
Contains the functions we want to call.
example1_driver.c
Manages the byte stream protocol and calls the routines in example1.c.
erl_comm.c
Has routines for reading and writing memory buffers.
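The heart of example1_driver.c is a dispatch on the first byte of each request. The following is a sketch of that logic under the protocol above, not the original file; handle_request is a hypothetical name, and the loop that reads packets and writes replies is left out.

```c
typedef unsigned char byte;

/* The functions from example1.c. */
int sum(int x, int y) { return x + y; }
int twice(int x) { return 2 * x; }

/* Hypothetical helper: decode one request payload (without the length
   header), call the matching C function, and store the result in reply.
   Returns the reply length (always 1 here), or -1 for an unknown code. */
int handle_request(const byte *req, byte *reply) {
    int result;
    switch (req[0]) {
    case 1:                      /* [1, N, M] -> sum(N, M) */
        result = sum(req[1], req[2]);
        break;
    case 2:                      /* [2, N] -> twice(N) */
        result = twice(req[1]);
        break;
    default:
        return -1;               /* not part of our protocol */
    }
    /* Results are one byte by protocol, so only the low 8 bits survive. */
    reply[0] = (byte)result;
    return 1;
}
```

In the driver, this would sit inside a loop that reads a packet, calls handle_request, and sends the one-byte reply with write_cmd. Note the one-byte limit: sum(200, 100) = 300 does not fit, so the reply carries 300 mod 256 = 44.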

3.4. Notes on the C code

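Left shift operator
Shifts the bits left by the specified number of positions; the vacated positions on the right are filled with 0. For an unsigned value, shifting left by n multiplies it by 2^n, as long as the result does not overflow.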

212 = 11010100 (In binary)
212<<1 = 110101000 (In binary) [Left shift by one bit]
212<<0 = 11010100 (Shift by 0)
212<<4 = 110101000000 (In binary) =3392(In decimal)

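Right shift operator
Shifts the bits right by the specified number of positions; the bits shifted out on the right are discarded. For an unsigned value, shifting right by n divides it by 2^n, rounding down.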

212 = 11010100 (In binary)
212>>2 = 00110101 (In binary) [Right shift by two bits]
212>>7 = 00000001 (In binary)
212>>8 = 00000000 
212>>0 = 11010100 (No Shift)
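The shift tables above can be checked with a small C helper. to_binary is a hypothetical name; it writes the lowest width bits of a value as '0'/'1' characters, most significant bit first, so each row can be reproduced.

```c
#include <string.h>

/* Write the lowest `width` bits of v into out, most significant bit first.
   out must have room for width + 1 characters. */
void to_binary(unsigned int v, int width, char *out) {
    memset(out, '0', (size_t)width);     /* start with all zeros */
    for (int i = 0; i < width; i++) {
        if ((v >> i) & 1u) {             /* is bit i set? */
            out[width - 1 - i] = '1';    /* bit 0 goes in the last slot */
        }
    }
    out[width] = '\0';
}
```

For example, to_binary(212u << 4, 12, s) gives "110101000000", and to_binary(212u >> 2, 8, s) gives "00110101".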

Hexadecimal is a natural fit for bytes: one hex digit represents 4 bits, so a byte is always written as exactly 2 hex digits (for example, 212 = 11010100 = 0xD4).
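As a small illustration of the nibble-to-digit correspondence, here is a sketch that turns one byte into its two hex digits using the shift and mask operations from above. byte_to_hex is a hypothetical helper, not part of the example code.

```c
typedef unsigned char byte;

/* Write byte b as exactly two uppercase hex digits plus a terminating NUL. */
void byte_to_hex(byte b, char out[3]) {
    static const char digits[] = "0123456789ABCDEF";
    out[0] = digits[b >> 4];     /* high nibble: bits 7..4 */
    out[1] = digits[b & 0x0F];   /* low nibble: bits 3..0 */
    out[2] = '\0';
}
```

With b = 212, b >> 4 is 13 ('D') and b & 0x0F is 4 ('4'), giving "D4".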
Because the port adds a 2-byte header giving the length of the data that follows, the C program must first read that length from the header. This is done by combining the two header bytes into a single 16-bit integer.
```
len = (buf[0] << 8) | buf[1];
```
- First, buf[0] is shifted left by 8 bits (in C it is promoted to int before the shift). The result is a 16-bit value whose high 8 bits hold buf[0] and whose low 8 bits are 0.
- Then a bitwise OR merges buf[1] into the low 8 bits. The result is the 16-bit value [buf[0], buf[1]], which is the length of the data that follows.
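Wrapped in a function (decode_len is a hypothetical name), the header decoding can be tested directly. As a worked example, the header bytes 0x01 0x2C give (1 << 8) | 44 = 256 | 44 = 300. This sketch also shows why byte should be unsigned: on platforms where plain char is signed, a header byte such as 0xff would become -1 after promotion to int, and the OR would produce a negative length.

```c
typedef unsigned char byte;

/* Combine the 2-byte big-endian header into one integer. Because byte is
   unsigned, each element is in 0..255 and the result is in 0..65535. */
int decode_len(const byte *buf) {
    return (buf[0] << 8) | buf[1];
}
```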
ssize_t read(int fd, void *buf, size_t count)
- Reads up to count bytes from file descriptor fd into the buffer starting at buf, and returns the number of bytes actually read, 0 at end of file, or -1 on error.
- In our example, read_exact uses it to read exactly len bytes from file descriptor 0, which is stdin.
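```
#include <unistd.h>   /* read() */

typedef unsigned char byte;

/* Read exactly len bytes from stdin (file descriptor 0) into buf.
   Returns len on success, 0 on end of file, or -1 on error. */
int read_exact(byte *buf, int len) {
    int i, got = 0;
    do {
        if ((i = read(0, buf+got, len-got)) <= 0) {
            return i;   /* 0: end of file, -1: error */
        }
        got += i;       /* read may return fewer bytes than requested */
    } while (got < len);
    return len;
}
```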
  - write_exact is very similar: it writes exactly len bytes to file descriptor 1, which is stdout.
  - read_exact asks for len bytes and records in got how many bytes it has actually read so far. A single read may return fewer bytes than requested, so while got is less than len, it keeps reading.
  - It stops early if read returns 0 or a negative value:
    - 0 means end of file. For a pipe, such as the one connecting the port to the C program, this happens when the other end has been closed.
    - -1 means an error occurred, and errno tells which one. On a non-blocking descriptor this includes the case where no data is available yet (EAGAIN).
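To try this logic without an Erlang port, it helps to make the file descriptor a parameter. The sketch below uses hypothetical names, read_exact_fd and read_cmd_fd; read_cmd_fd is an illustrative counterpart to write_cmd that reads the 2-byte header first and then the payload, as described above. It is a sketch under those assumptions, not the original erl_comm.c.

```c
#include <unistd.h>

typedef unsigned char byte;

/* Same loop as read_exact, but reading from an arbitrary descriptor fd. */
int read_exact_fd(int fd, byte *buf, int len) {
    int got = 0;
    do {
        ssize_t n = read(fd, buf + got, (size_t)(len - got));
        if (n <= 0) {
            return (int)n;           /* 0: end of file, -1: error */
        }
        got += (int)n;
    } while (got < len);
    return len;
}

/* Read one packet: the 2-byte length header, then exactly that many payload
   bytes into buf (which must hold up to 65535 bytes). Returns the payload
   length, or a value <= 0 if the stream ends or an error occurs. */
int read_cmd_fd(int fd, byte *buf) {
    int len;
    if (read_exact_fd(fd, buf, 2) != 2) {
        return -1;
    }
    len = (buf[0] << 8) | buf[1];    /* decode the header */
    return read_exact_fd(fd, buf, len);
}
```

Writing the frame [0, 3, 1, 11, 22] into a pipe and calling read_cmd_fd on its read end yields the payload [1, 11, 22]; once the write end is closed and drained, the next call sees end of file and returns -1.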
Conversely, when writing to the stream we have to prepend the 2-byte length header to the data. This is done by write_cmd:
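```
typedef unsigned char byte;

int write_exact(byte *buf, int len);  /* in erl_comm.c, mirrors read_exact */

/* Send len bytes from buf, preceded by a 2-byte big-endian length header.
   The driver calls it as write_cmd(buff, 1) to send a 1-byte result. */
int write_cmd(byte *buf, int len) {
    byte li;

    li = (len >> 8) & 0xff;   /* high byte of the length */
    write_exact(&li, 1);

    li = len & 0xff;          /* low byte of the length */
    write_exact(&li, 1);

    return write_exact(buf, len);   /* then the payload itself */
}
```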
- write_cmd prefixes buff, which holds the computed result, with the 2-byte length header.
- The key step is splitting the integer len into two bytes and writing them one after the other, high byte first. This assumes that len fits in 2 bytes, i.e. len <= 65535.
  - 0xff (binary 11111111) is used as a mask that keeps only the lowest 8 bits.
  - (len >> 8) & 0xff gets the high byte of the header.
  - len & 0xff gets the low byte of the header.
- In a real-world program, we must take care that Erlang and C represent numbers with different precision and signedness. Erlang integers have arbitrary precision, while C types such as int have a fixed size, and this example squeezes every value into a single unsigned byte. Getting this right can be difficult.
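The assumption that len fits in two bytes can also be checked explicitly. The following is a hypothetical variant (write_cmd_fd and write_exact_fd are illustrative names, not from the example files) that writes to any descriptor, rejects lengths above 65535, and retries after partial writes.

```c
#include <unistd.h>

typedef unsigned char byte;

/* Write exactly len bytes to fd, retrying after partial writes. */
static int write_exact_fd(int fd, const byte *buf, int len) {
    int sent = 0;
    while (sent < len) {
        ssize_t n = write(fd, buf + sent, (size_t)(len - sent));
        if (n <= 0) {
            return -1;
        }
        sent += (int)n;
    }
    return len;
}

/* Like write_cmd, but for any descriptor and with an explicit check that
   len fits in the 2-byte header. Returns len, or -1 on error. */
int write_cmd_fd(int fd, const byte *buf, int len) {
    byte header[2];
    if (len < 0 || len > 0xffff) {
        return -1;                            /* too long for 2 bytes */
    }
    header[0] = (byte)((len >> 8) & 0xff);    /* high byte first */
    header[1] = (byte)(len & 0xff);           /* then the low byte */
    if (write_exact_fd(fd, header, 2) != 2) {
        return -1;
    }
    return write_exact_fd(fd, buf, len);
}
```

Sending a one-byte result of 33 this way puts [0, 1, 33] on the stream, while a length such as 70000 is rejected before anything is written.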

Zhao Wei