Erlang Programming with external programs
1. Introduction
When I was learning Erlang, its speed always bothered me. It is certainly not as fast as C/C++, but it has unique advantages for distributed programming. Luckily, Erlang offers several ways to interface with code written in other languages:
- By running the program outside the Erlang virtual machine in an external operating system process.
- By running an OS command from within Erlang and capturing the result.
- By running the foreign-language code inside the Erlang virtual machine. This is more efficient than using an external process, but it is unsafe: if the foreign code crashes, it brings down the Erlang system with it. Another disadvantage is that it can be used only with languages like C that produce native object code.
Here, we will only consider interfacing Erlang using ports and OS commands.
2. How Erlang communicates with external programs
Erlang communicates with external programs through ports. This is not the kind of port that is paired with an address in socket programming. A port provides a byte-oriented interface to an external program: Erlang communicates with the program by sending and receiving lists of bytes (not Erlang terms).
Unlike a socket, a port behaves much like an Erlang process:
- You can send messages to it.
- You can register it, just like a process.
- You can link to it.
- You can send messages to it from a remote distributed Erlang node.
- If the external program crashes, then an exit signal will be sent to the connected process.
- If the connected process dies, then the external program will be killed!
It works like a pipe: when we send a message to a port, the message is forwarded to the external program connected to that port. The port acts as a middleman between the Erlang process and the external program.
3. Erlang with C
3.1. Define how we want to use the C code
Suppose we have defined the following functions in a C program.
    int sum(int x, int y) { return x + y; }
    int twice(int x) { return 2 * x; }
The goal is to call them from Erlang as
    X1 = example1:sum(11, 22),
    Y1 = example1:twice(10),
We need to hide the details of the interface to the C program. To do so, we turn function calls such as sum(11, 22) and twice(10) into sequences of bytes that we can send to the port.
- When we send a sequence of bytes to the port, the port adds a length count to the front of the sequence and sends the result to the external program.
- When the external program replies, the port receives the reply and sends the result to the port's connected process.
3.2. Define the protocol
- All packets start with a 2-byte header that specifies the length of the data that immediately follows.
- We encode sum(N, M) as the byte sequence [1, N, M].
- We encode twice(N) as the byte sequence [2, N].
- All arguments and return values are assumed to be a single byte long.
Both the C program and the Erlang program must follow this protocol.
3.3. The C program
The C side consists of 3 files:
- example1.c: Contains the functions we want to call.
- example1_driver.c: Manages the byte stream protocol and calls the routines in example1.c.
- erl_comm.c: Has routines for reading and writing memory buffers.
3.4. Comments for C Code
Left shift operator
Shifts the binary digits left by the specified number of positions; the positions vacated on the right are filled with 0.
    212    = 11010100     (in binary)
    212<<1 = 110101000    (in binary) [left shift by one bit]
    212<<0 = 11010100     (shift by 0)
    212<<4 = 110101000000 (in binary) = 3392 (in decimal)
Right shift operator
Shifts the binary digits right by the specified number of positions; the bits shifted off the right end are discarded.
    212    = 11010100 (in binary)
    212>>2 = 00110101 (in binary) [right shift by two bits]
    212>>7 = 00000001 (in binary)
    212>>8 = 00000000
    212>>0 = 11010100 (no shift)
- Hex notation is a natural fit for bytes: one hex digit represents 4 bits, so a byte is conveniently written as 2 hex digits.
Because the port adds a 2-byte header giving the length of the data that follows, we first need to extract that length from the header. This is done by combining the two bytes into a single 16-bit integer.
len = (buf[0] << 8) | buf[1];
- First, buf[0] is shifted left by 8 bits. This produces a 16-bit (2-byte) value whose upper 8 bits hold buf[0] and whose lower 8 bits are 0.
- Then, that value is ORed with buf[1], which fills in the lower 8 bits. The result is the 16-bit value [buf[0], buf[1]], which is the length of the bytes that follow.
ssize_t read(int fd, void *buf, size_t count)
- Reads up to count bytes from the file descriptor fd into the buffer starting at buf.
In our example, read_exact reads exactly len bytes from file descriptor 0, which is stdin.
    int read_exact(byte *buf, int len)
    {
        int i, got = 0;
        do {
            /* read may return fewer bytes than requested */
            if ((i = read(0, buf+got, len-got)) <= 0) {
                return i;   /* 0: end of file, negative: error */
            }
            got += i;
        } while (got < len);
        return len;
    }
- It tries to read len bytes and keeps track in got of how many bytes it has actually read. If got is less than len, it keeps reading.
- It stops early if read returns 0 or a negative value:
  - A return value of 0 means end of file: the read position is at or beyond the end of the input.
  - A return value of -1 means the read failed, either because an error occurred or, on a non-blocking descriptor, because no data was available.
write_exact is very similar: it writes exactly len bytes to file descriptor 1, which is stdout.
Conversely, when writing to the stream we have to prepend a 2-byte header. This is done by write_cmd. A call looks like
    write_cmd(buff, 1);
and the function is defined as
    int write_cmd(byte *buf, int len)
    {
        byte li;
        li = (len >> 8) & 0xff;   /* high byte of the length */
        write_exact(&li, 1);
        li = len & 0xff;          /* low byte of the length */
        write_exact(&li, 1);
        return write_exact(buf, len);
    }
- write_cmd prefixes buff, which contains the computed result, with the 2-byte header.
- The key part is converting the integer len into a 2-byte sequence and writing one byte after the other. We assume that len fits in 2 bytes.
- 0xff is used as a 1-byte mask. (len >> 8) & 0xff extracts the high byte of the header, and len & 0xff extracts the low byte.
- In a real-world scenario, we must also account for the fact that Erlang and C integers differ in precision and signedness. This can be difficult.