* pidq - A Process Pool Library for Erlang

*Note:* this is all work very much in progress.  If you are
 interested, drop me a note.  Right now, it is really just a readme
 and no working code.

** Use pidq to manage pools of processes (pids).

- Protect the pids from being used concurrently.  The main pidq
  interface is =pidq:take_pid/0= and =pidq:return_pid/2=.  The pidq
  server will keep track of which pids are *in use* and which are
  *free*.

- Maintain the size of the pid pool.  Specify a maximum number of pids
  in the pool.  Trigger pid creation when the free count drops below a
  minimum level or when a pid is marked as failing.

- Organize pids by type and randomly load-balance pids by type.  This
  is useful when the pids represent client processes connected to a
  particular node in a cluster (think database read slaves).  Separate
  pools are maintained for each type and a request for a pid will
  randomly select a type.

** Motivation

The need for the pidq kit arose while writing an Erlang-based
application that uses [[https://wiki.basho.com/display/RIAK/][Riak]] for data storage.  When using the Erlang
protocol buffer client for Riak, one should avoid accessing a given
client concurrently.  This is because each client is associated with a
unique client ID that corresponds to an element in an object's vector
clock.  Concurrent action from the same client ID defeats the vector
clock.  For some further explaination, see [1] and [2].

I wanted to avoid spinning up a new client for each request in the
application.  Riak's protocol buffer client is a =gen_server= process
that initiates a connection to a Riak node and my intuition is that
one doesn't want to pay for the startup time for every request you
send to an app.  This suggested a pool of clients with some management
to avoid concurrent use of a given client.  On top of that, it seemed
convenient to add the ability to load balance between clients
connected to different nodes in the Riak cluster.  The load-balancing
is a secondary feature; even if you end up setting up [[http://haproxy.1wt.eu/][HAProxy]] for that
aspect, you might still want the client pooling.

[1] http://lists.basho.com/pipermail/riak-users_lists.basho.com/2010-September/001900.html
[2] http://lists.basho.com/pipermail/riak-users_lists.basho.com/2010-September/001904.html

** Usage and API

*** Startup configuration

The idea is that you would wire up pidq to be a supervised process in
your application.  When you start pidq, you specify a module and
function to use for creating new pids.  You also specify the
properties for each pool that you want pidq to manage, including the
arguments to pass to the pid starter function.

An example configuration looks like this:

#+BEGIN_SRC erlang
  Pool1 = [{name, "node1"},
           {max_pids, 10},
           {min_free, 2},
           {init_size, 5}
           {pid_starter_args, Args1}],
  
  Pool2 = [{name, "node2"},
           {max_pids, 100},
           {min_free, 2},
           {init_size, 50}
           {pid_starter_args, Args2}],
  
  Config = [{pid_starter, {M, F}},
            {pid_stopper, {M, F}},
            {pools, [Pool1, Pool2]}]

  % either call this directly, or wire this
  % call into your application's supervisor  
  pidq:start(Config)

#+END_SRC

Each pool has a unique name, a maximum number of pids, an initial
number of pids, and a minimum free pids count.  When pidq starts, it
will create pids to match the =init_size= value.  If there are =min_free=
pids or fewer, pidq will add a pid as long as that doesn't bring the
total used + free count over =max_pids=.

Specifying a =pid_stopper= function is optional.  If not specified,
=exit(pid, kill)= will be used to shutdown pids in the case of error,
pidq shutdown, or pool removal.  The function specified will be passed
a pid as returned by the =pid_starter= function.

*** Getting and returning pids

Once started, the main interaction you will have with pidq is through
two functions, =take_pid/0= and =return_pid/2=.

Call =pidq:take_pid()= to obtain a pid from the pool.  When you are done
with it, return it to the pool using =pidq:return_pid(Pid, ok)=.  If
you encountered an error using the pid, you can pass =fail= as the
second argument.  In this case, pidq will permently remove that pid
from the pool and start a new pid to replace it.

*** Other things you can do

You can get the status for the system via =pidq:status()=.  This will
return some informational details about the pools being managed.

You can also add or remove new pools while pidq is running using
=pidq:add_pool/1= and =pidq:remove_pool/1=.  Each pid 

** Details

pidq is implemented as a =gen_server=.  Server state consists of:

- A dict of pools keyed by pool name.
- A dict mapping in-use pids to their pool name and the pid of the
  consumer that is using the pid.
- A dict mapping consumer process pids to the pid they are using.
- A module and function to use for starting new pids.

Each pool keeps track of its parameters, such as max pids to allow,
initial pids to start, number of pids in use, and a list of free pids.

Since our motivating use-case is Riak's pb client, we opt to reuse a
given client as much as possible to avoid unnecessary vector clock
growth; pids are taken from the head of the free list and returned
to the head of the free list.

pidq is a system process and traps exits.  Before giving out a pid, it
links to the requesting consumer process.  This way, if the consumer
process crashes, pidq can recover the pid.  When the pid is returned,
the link to the consumer process will be severed.  Since the state of
the pid is unknown in the case of a crashing consumer, we will destroy
the pid and add a fresh one to the pool.

The pid starter MFA should use spawn_link so that pidq will be linked
to the pids (is it confusing that we've taken the term "pid" and
turned it into a noun of this system?).  This way, when pids crash,
pidq will be notified and can refill the pool with new pids.

Also note that an alternative to a consumer explicitly returning a pid
is for the consumer to exit normally.  pidq will receive the normal
exit and can reclaim the pid.  In fact, we might want to implement pid
return as "fake death" by sending pidq exit(PidqPid, normal).

*** Pool management

It is an error to add a pool with a name that already exists.

Pool removal has two forms:

- *graceful* pids in the free list are killed (using exit(pid, kill)
  unless a =pid_stopper= is specified in the pool parameters.  No pids
  will be handed out from this pool's free list.  As pids are
  returned, they are shut down.  When the pool is empty, it is
  removed.

- *immediate* all pids in free and in-use lists are shut down; the
  pool is removed.

#+BEGIN_SRC erlang
  -spec(take_pid() -> pid()).
  
  -spec(return_pid(pid(), ok | fail) -> ignore).
  
  -spec(status() -> [term()]).
  
  -type(pid_type_opt() ::
        {name, string()} |
        {max_pids, int()} |
        {min_free, int()} |
        {init_size, int()} |
        {pid_starter_args, [term()]}).
  
  -type(pid_type_spec() :: [pid_type_opt()]).
  -spec(add_type(pid_type_spec()) -> ok | {error, Why}).
  -spec(remove_type(string()) -> ok | {error, Why}).
#+END_SRC


** Notes
*** Move pid_starter and pid_stopper to pool config
This way, you can have pools backed not only by different config, but
by entirely different services.  Could be useful for testing a new
client implementation.
*** Rename something other than "pid"
*** Consider ets for state storage rather than dict

#+BEGIN_SRC erlang
pman:start().
A1 = {riakc_pb_socket, start_link, ["127.0.0.1", 8081]}.
{ok, S1} = pidq_pool_sup:start_link(A1).
supervisor:start_child(S1, []).

{ok, S2} = pidq_sup:start_link([]).
supervisor:start_child(pidq_pool_sup, [A1]).

application:load(pidq).
C = application:get_all_env(pidq).
pidq:start(C).
#+END_SRC

*** supervision strategy

**** pidq_sup
top-level supervisor watches pidq gen_server and the pidq_pool_sup
supervisor.
**** pidq_pool_sup
A simple_one_for_one supervisor that is used to create/watch
pidq_pooled_worker_sup supervisor.  You use this to create a new pool
and specify the M,F,A of the pooled worker at start.
**** pidq_pooled_worker_sup
Another simple_one_for_one that is used to create actual workers.