Progress on rename to pooler and update README

Seth Falcon, 14 years ago · commit a97efba4c5
1 changed file with 143 additions and 110 deletions

README.org

@@ -1,149 +1,174 @@
-* pidq - A Process Pool Library for Erlang
-
-*Note:* this is all work very much in progress.  If you are
- interested, drop me a note.  Right now, it is really just a readme
- and no working code.
-
-** Use pidq to manage pools of processes (pids).
-
-- Protect the pids from being used concurrently.  The main pidq
-  interface is =pidq:take_pid/0= and =pidq:return_pid/2=.  The pidq
-  server will keep track of which pids are *in use* and which are
-  *free*.
-
-- Maintain the size of the pid pool.  Specify a maximum number of pids
-  in the pool.  Trigger pid creation when the free count drops below a
-  minimum level or when a pid is marked as failing.
-
-- Organize pids by type and randomly load-balance pids by type.  This
-  is useful when the pids represent client processes connected to a
-  particular node in a cluster (think database read slaves).  Separate
-  pools are maintained for each type and a request for a pid will
-  randomly select a type.
+* pooler - An OTP Process Pool Application
+
+The pooler application allows you to manage pools of OTP behaviors
+such as gen_servers, gen_fsms, or supervisors, and to provide
+consumers with exclusive access to pool members using
+=pooler:take_member=.
+
+** What pooler does
+
+- Protects the members of a pool from being used concurrently.  The
+  main pooler interface is =pooler:take_member/0= and
+  =pooler:return_member/2=.  The pooler server will keep track of
+  which members are *in use* and which are *free*.  There is no need
+  to call =pooler:return_member= if the consumer is a short-lived
+  process; in this case, pooler will detect the consumer's normal exit
+  and reclaim the member (see the sketch below this list).
+
+- Maintains the size of the pool.  You specify an initial and a
+  maximum number of members in the pool.  Pooler will trigger member
+  creation when the free count drops to zero (as long as the in use
+  count is less than the maximum).  New pool members are added to
+  replace members that crash.  If a consumer crashes, the member it was
+  using will be destroyed and replaced.
+
+- Manages multiple pools.  A common configuration is to have each pool
+  contain client processes connected to a particular node in a cluster
+  (think database read slaves).  By default, pooler will randomly
+  select a pool to fetch a member from.
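+
+For example, here is a minimal sketch of the short-lived consumer
+pattern (it assumes the pooler application is already running with at
+least one configured pool; =do_work/1= is a hypothetical stand-in for
+your own code):
+
+#+BEGIN_SRC erlang
+%% A short-lived consumer: take a member, use it, and exit normally
+%% without an explicit return_member; pooler detects the exit and
+%% reclaims the member.
+spawn(fun() ->
+              P = pooler:take_member(),
+              do_work(P)
+      end).
+#+END_SRC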
 
 
 ** Motivation
 
 
-The need for the pidq kit arose while writing an Erlang-based
-application that uses [[https://wiki.basho.com/display/RIAK/][Riak]] for data storage.  When using the Erlang
-protocol buffer client for Riak, one should avoid accessing a given
-client concurrently.  This is because each client is associated with a
-unique client ID that corresponds to an element in an object's vector
-clock.  Concurrent action from the same client ID defeats the vector
-clock.  For some further explaination, see [1] and [2].
-
-I wanted to avoid spinning up a new client for each request in the
-application.  Riak's protocol buffer client is a =gen_server= process
-that initiates a connection to a Riak node and my intuition is that
-one doesn't want to pay for the startup time for every request you
-send to an app.  This suggested a pool of clients with some management
-to avoid concurrent use of a given client.  On top of that, it seemed
-convenient to add the ability to load balance between clients
-connected to different nodes in the Riak cluster.  The load-balancing
-is a secondary feature; even if you end up setting up [[http://haproxy.1wt.eu/][HAProxy]] for that
-aspect, you might still want the client pooling.
+The need for pooler arose while writing an Erlang-based application
+that uses [[https://wiki.basho.com/display/RIAK/][Riak]] for data storage.  Riak's protocol buffer client is a
+=gen_server= process that initiates a connection to a Riak node.  A
+pool is needed to avoid spinning up a new client for each request in
+the application.  Reusing clients also has the benefit of keeping the
+vector clocks smaller since each client ID corresponds to an entry in
+the vector clock.
+
+When using the Erlang protocol buffer client for Riak, one should
+avoid accessing a given client concurrently.  This is because each
+client is associated with a unique client ID that corresponds to an
+element in an object's vector clock.  Concurrent action from the same
+client ID defeats the vector clock.  For some further explanation,
+see [1] and [2].  Note that concurrent access to Riak's pb client is
+actually OK as long as you avoid updating the same key at the same
+time.  So the pool needs checkout/checkin semantics that give
+consumers exclusive access to a client.
+
+On top of that, in order to evenly load a Riak cluster and be able to
+continue in the face of Riak node failures, consumers should spread
+their requests across clients connected to each node.  The client pool
+provides an easy way to load balance.
 
 
 [1] http://lists.basho.com/pipermail/riak-users_lists.basho.com/2010-September/001900.html
 [2] http://lists.basho.com/pipermail/riak-users_lists.basho.com/2010-September/001904.html
 
 
 ** Usage and API
 
 
-*** Startup configuration
+*** Pool Configuration
 
 
-The idea is that you would wire up pidq to be a supervised process in
-your application.  When you start pidq, you specify a module and
-function to use for creating new pids.  You also specify the
-properties for each pool that you want pidq to manage, including the
-arguments to pass to the pid starter function.
-
-An example configuration looks like this:
+Pool configuration is specified in the pooler application's
+environment.  This can be provided in a config file using =-config= or
+set at startup using =application:set_env(pooler, pools,
+Pools)=. Here's an example config file that creates three pools of
+Riak pb clients, each talking to a different node in a local cluster:
 
 
 #+BEGIN_SRC erlang
-  Pool1 = [{name, "node1"},
-           {max_pids, 10},
-           {min_free, 2},
-           {init_size, 5}
-           {pid_starter_args, Args1}],
-  
-  Pool2 = [{name, "node2"},
-           {max_pids, 100},
-           {min_free, 2},
-           {init_size, 50}
-           {pid_starter_args, Args2}],
-  
-  Config = [{pid_starter, {M, F}},
-            {pid_stopper, {M, F}},
-            {pools, [Pool1, Pool2]}]
-
-  % either call this directly, or wire this
-  % call into your application's supervisor  
-  pidq:start(Config)
-
+% pooler.config
+% Start Erlang as: erl -config pooler
+% -*- mode: erlang -*-
+% pooler app config
+[
+ {pooler, [
+         {pools, [
+                  [{name, "rc8081"},
+                   {max_count, 5},
+                   {init_count, 2},
+                   {start_mfa,
+                    {riakc_pb_socket, start_link, ["localhost", 8081]}}],
+
+                  [{name, "rc8082"},
+                   {max_count, 5},
+                   {init_count, 2},
+                   {start_mfa,
+                    {riakc_pb_socket, start_link, ["localhost", 8082]}}],
+
+                  [{name, "rc8083"},
+                   {max_count, 5},
+                   {init_count, 2},
+                   {start_mfa,
+                    {riakc_pb_socket, start_link, ["localhost", 8083]}}]
+                 ]}
+        ]}
+].
 #+END_SRC
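+
+The same pools can also be supplied programmatically with
+=application:set_env/3= before the application starts.  A minimal
+sketch, showing just one of the pools above:
+
+#+BEGIN_SRC erlang
+%% Programmatic equivalent of the config file above (one pool shown).
+%% set_env must run before application:start(pooler).
+Pools = [[{name, "rc8081"},
+          {max_count, 5},
+          {init_count, 2},
+          {start_mfa,
+           {riakc_pb_socket, start_link, ["localhost", 8081]}}]],
+application:set_env(pooler, pools, Pools),
+application:start(pooler).
+#+END_SRC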
 
 
-Each pool has a unique name, a maximum number of pids, an initial
-number of pids, and a minimum free pids count.  When pidq starts, it
-will create pids to match the =init_size= value.  If there are =min_free=
-pids or fewer, pidq will add a pid as long as that doesn't bring the
-total used + free count over =max_pids=.
+Each pool has a unique name, an initial and maximum number of members,
+and an ={M, F, A}= describing how to start members of the pool.  When
+pooler starts, it will create members in each pool according to
+=init_count=.
 
 
-Specifying a =pid_stopper= function is optional.  If not specified,
-=exit(pid, kill)= will be used to shutdown pids in the case of error,
-pidq shutdown, or pool removal.  The function specified will be passed
-a pid as returned by the =pid_starter= function.
+*** Using pooler
 
 
-*** Getting and returning pids
+Here's an example session:
 
 
-Once started, the main interaction you will have with pidq is through
-two functions, =take_pid/0= and =return_pid/2=.
+#+BEGIN_SRC erlang
+application:start(pooler).
+P = pooler:take_member(),
+% use P
+pooler:return_member(P, ok).
+#+END_SRC
 
 
-Call =pidq:take_pid()= to obtain a pid from the pool.  When you are done
-with it, return it to the pool using =pidq:return_pid(Pid, ok)=.  If
-you encountered an error using the pid, you can pass =fail= as the
-second argument.  In this case, pidq will permently remove that pid
-from the pool and start a new pid to replace it.
+Once started, the main interaction you will have with pooler is through
+two functions, =take_member/0= and =return_member/2=.
+
+Call =pooler:take_member()= to obtain a member from a randomly
+selected pool.  When you are done with it, return it to the pool using
+=pooler:return_member(Pid, ok)=.  If you encountered an error using
+the member, you can pass =fail= as the second argument.  In this case,
+pooler will permanently remove that member from the pool and start a
+new member to replace it.  If your process is short lived, you can
+omit the call to =return_member=.  In this case, pooler will detect
+the normal exit of the consumer and reclaim the member.
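+
+For example (a sketch; =do_something/1= is a hypothetical operation
+that uses the member):
+
+#+BEGIN_SRC erlang
+P = pooler:take_member(),
+case do_something(P) of
+    ok ->
+        pooler:return_member(P, ok);
+    {error, _Reason} ->
+        %% pooler destroys this member and starts a replacement
+        pooler:return_member(P, fail)
+end.
+#+END_SRC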
 
 
 *** Other things you can do
 
 
-You can get the status for the system via =pidq:status()=.  This will
+You can get the status for the system via =pooler:status()=.  This will
 return some informational details about the pools being managed.
 
 
-You can also add or remove new pools while pidq is running using
-=pidq:add_pool/1= and =pidq:remove_pool/1=.  Each pid 
+You can also add and remove pools while pooler is running using
+=pooler:add_pool/1= and =pooler:remove_pool/1= (not yet implemented).
 
 
 ** Details
 
 
-pidq is implemented as a =gen_server=.  Server state consists of:
+pooler is implemented as a =gen_server=.  Server state consists of:
 
 
 - A dict of pools keyed by pool name.
-- A dict mapping in-use pids to their pool name and the pid of the
-  consumer that is using the pid.
-- A dict mapping consumer process pids to the pid they are using.
-- A module and function to use for starting new pids.
+- A dict of pool supervisors keyed by pool name.
+- A dict mapping in-use members to their pool name and the pid of the
+  consumer that is using the member.
+- A dict mapping consumer process pids to the member they are using.
 
 
-Each pool keeps track of its parameters, such as max pids to allow,
-initial pids to start, number of pids in use, and a list of free pids.
+Each pool keeps track of its parameters, such as max members to allow,
+initial members to start, number of members in use, and a list of free
+members.
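+
+Sketched as Erlang records, this state might look like the following
+(the record and field names are assumptions for illustration, not
+pooler's actual internals):
+
+#+BEGIN_SRC erlang
+%% Illustrative records for the server state described above.
+-record(pool, {name,
+               max_count,
+               init_count,
+               start_mfa,
+               in_use_count = 0,
+               free_pids = []}).
+
+-record(state, {pools = dict:new(),           % pool name -> #pool{}
+                pool_sups = dict:new(),       % pool name -> supervisor pid
+                in_use_pids = dict:new(),     % member pid -> {pool name, consumer}
+                consumer_to_pid = dict:new()  % consumer pid -> member pid
+               }).
+#+END_SRC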
 
 
 Since our motivating use-case is Riak's pb client, we opt to reuse a
 given client as much as possible to avoid unnecessary vector clock
-growth; pids are taken from the head of the free list and returned
+growth; members are taken from the head of the free list and returned
 to the head of the free list.
 
 
-pidq is a system process and traps exits.  Before giving out a pid, it
-links to the requesting consumer process.  This way, if the consumer
-process crashes, pidq can recover the pid.  When the pid is returned,
-the link to the consumer process will be severed.  Since the state of
-the pid is unknown in the case of a crashing consumer, we will destroy
-the pid and add a fresh one to the pool.
+pooler is a system process and traps exits.  Before giving out a
+member, it links to the requesting consumer process.  This way, if the
+consumer process crashes, pooler can recover the member.  When the
+member is returned, the link to the consumer process will be severed.
+Since the state of the member is unknown in the case of a crashing
+consumer, we will destroy the member and add a fresh one to the pool.
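+
+A sketch of how this could look inside the gen_server (an assumption
+about the shape of the implementation; =new_state/1= and
+=handle_exit/2= are hypothetical helpers):
+
+#+BEGIN_SRC erlang
+init(Config) ->
+    process_flag(trap_exit, true),
+    {ok, new_state(Config)}.
+
+%% An 'EXIT' may come from a consumer (destroy and replace the member
+%% it held) or from a member (remove it and refill the pool).
+handle_info({'EXIT', Pid, _Reason}, State) ->
+    {noreply, handle_exit(Pid, State)};
+handle_info(_Msg, State) ->
+    {noreply, State}.
+#+END_SRC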
+
+The member starter MFA should use start_link so that pooler will be
+linked to the members.  This way, when members crash, pooler will be
+notified and can refill the pool with new members.
 
 
-The pid starter MFA should use spawn_link so that pidq will be linked
-to the pids (is it confusing that we've taken the term "pid" and
-turned it into a noun of this system?).  This way, when pids crash,
-pidq will be notified and can refill the pool with new pids.
+*** Supervision
+
+The top-level pooler supervisor, pooler_sup, supervises the pooler
+gen_server and the pooler_pool_sup supervisor.  pooler_pool_sup
+supervises individual pool supervisors (pooler_pooled_worker_sup).
+Each pooler_pooled_worker_sup supervises the members of a pool.
+
+[[./pidq_appmon.jpg]]
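+
+As a sketch, the =init/1= of a per-pool member supervisor might look
+like this (an assumption based on the description above, not pooler's
+actual module; ={M, F, A}= is the pool's =start_mfa=):
+
+#+BEGIN_SRC erlang
+init({M, F, A}) ->
+    %% simple_one_for_one: children are pool members started via the
+    %% pool's start_mfa
+    Worker = {member, {M, F, A}, temporary, brutal_kill, worker, [M]},
+    {ok, {{simple_one_for_one, 1, 1}, [Worker]}}.
+#+END_SRC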
 
 
-Also note that an alternative to a consumer explicitly returning a pid
-is for the consumer to exit normally.  pidq will receive the normal
-exit and can reclaim the pid.  In fact, we might want to implement pid
-return as "fake death" by sending pidq exit(PidqPid, normal).
 
 
 *** Pool management
 
 
@@ -213,3 +238,11 @@ pidq_pooled_worker_sup supervisor.  You use this to create a new pool
 and specify the M,F,A of the pooled worker at start.
 **** pidq_pooled_worker_sup
 Another simple_one_for_one that is used to create actual workers.
+
+* Rename plan
+genpool, gs_pool, pooler
+pid => worker, member, gs, gspid
+pooler:take_member/0
+pooler:return_member/2
+
+#+OPTIONS: ^:{}