Chapter 9 translation in progress. The parser sections are specialized, so they are skipped for now.

author rezoo <rezoolab@gmail.com>

Wed, 2 Dec 2009 09:31:06 +0000 (18:31 +0900)

committer rezoo <rezoolab@gmail.com>

Wed, 2 Dec 2009 09:31:06 +0000 (18:31 +0900)
diff --git a/language.rst b/language.rst

index 6cce5ca..7a3bf3c 100644 (file)
--- a/language.rst
+++ b/language.rst
@@ -227,7 +227,7 @@ omakeのスコープはインデントのレベルで定義されます。イン
      else
         <else-clause>
  
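-First, ``<test>`` is evaluated, and if it is a *true* value (see ":ref:`label9.2`" for details on Boolean values), the ``<true-clause>`` section is evaluated. Otherwise, the remaining sections are evaluated. An ``if`` statement may also have multiple ``elseif`` sections. The ``elseif`` and ``else`` sections are optional. However, because each section introduces a new scope, each one must be indented.
+First, ``<test>`` is evaluated, and if it is a *true* value (see ":ref:`label9.2`" for details on Boolean values), the code in ``<true-clause>`` is evaluated. Otherwise, the remaining sections are evaluated. An ``if`` statement may also have multiple ``elseif`` clauses. The ``elseif`` and ``else`` clauses are optional. However, because each clause introduces a new scope, each one must be indented.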
  
  In an ``if`` statement, the Boolean value is *false* if the evaluated string is empty or is one of ``false``, ``no``, ``nil``, ``undefined``, or ``0``. Every other value is *true*.
  
@@ -272,9 +272,9 @@ omakeのスコープはインデントのレベルで定義されます。イン
      default
         <default-clause>
  
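-There may be any number of ``case`` sections. The ``default`` section is optional, but if it is used it should be the last section.
+There may be any number of ``case`` clauses. The ``default`` clause is optional, but if it is used it should be the last clause.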
  
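-In a ``switch`` statement, the ``<string>`` string is compared with the following ``<patternN>`` patterns, and the matching section is evaluated. ::
+For ``switch``, the string is compared "literally" with each ``<patterni>``. ::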
  
      switch $(HOST)
      case mymachine
diff --git a/system.rst b/system.rst

index 10bd719..f806a0d 100644 (file)
--- a/system.rst
+++ b/system.rst
@@ -1257,7 +1257,7 @@ Unixシステム上ではバイナリモードは意味を持たず、テキス
     $(flush channel...)
        channel : OutChannel
  
-The ``flush`` function is used only when a file has been opened in order to write to it. It writes out all data that has not yet been written to the file.
+The ``flush`` function is used only when a file is open for writing. It writes out all data that has not yet been written to the file.
  
  .. index::
     single: channel-name()
@@ -1840,7 +1840,7 @@ Unixシステム上ではバイナリモードは意味を持たず、テキス
  
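  For each line, the record is first split into fields, and the fields are bound to the variables ``$1, $2, ...``. The variable ``$0`` is defined to be the entire line, and ``$*`` is an array of all the field values. The ``$(NF)`` variable is defined to be the number of fields.
  
-Next, the ``case`` statements are executed. If ``string_i`` matches the token ``$i``, then ``body_i`` is evaluated. If the body of the ``case`` ends with ``export``, the current state is passed on to the next loop iteration. Otherwise, the value is discarded.
+Next, the ``case`` statements are executed. If ``string_i`` matches the token ``$i``, then ``body_i`` is evaluated. If the body of the ``case`` ends with ``export``, the current state is passed on to the next clause. Otherwise, the value is discarded.
  
  For example, the following ``scan`` function acts as a simple command processor. ::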
  
@@ -1917,7 +1917,7 @@ Unixシステム上ではバイナリモードは意味を持たず、テキス
  
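  For each line, the record is first split into fields using the field separator ``FS``, and the fields are bound to the variables ``$1, $2, ...``. The variable ``$0`` is defined to be the entire line, and ``$*`` is an array of all the field values. The variable ``$(NF)`` is defined to be the number of fields.
  
-Next, each ``case`` is evaluated in order. For each ``case``, if the regular expression ``pattern_i`` matches the record ``$0``, then ``body_i`` is evaluated. If ``body_i`` ends with ``export``, the current state is passed on to the next loop iteration. Otherwise, the value is discarded. If the regular expression contains ``\(r\)`` expressions, the fields ``$1, $2, ...`` are replaced with the text matched by those expressions.
+Next, each ``case`` is evaluated in order. For each ``case``, if the regular expression ``pattern_i`` matches the record ``$0``, then ``body_i`` is evaluated. If ``body_i`` ends with ``export``, the current state is passed on to the next clause. Otherwise, the value is discarded. If the regular expression contains ``\(r\)`` expressions, the fields ``$1, $2, ...`` are replaced with the text matched by those expressions.
  
  For example, the following code prints text only when it lies between the two delimiters ``\begin{<name>}`` and ``\end{<name>}`` and ``<name>`` is contained in the array passed as the argument to the ``filter`` function. ::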
  
@@ -1958,6 +1958,38 @@ Unixシステム上ではバイナリモードは意味を持たず、テキス
  
  10.11.6 fsubst 
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+::
+
+   fsubst(files)
+   case pattern1 [options]
+      body1
+   case pattern2 [options]
+      body2
+   ...
+   default
+      bodyd
+
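+The ``fsubst`` function provides a ``sed(1)``-like substitution facility. As with ``awk``, if ``fsubst`` is called with no arguments, the input is taken from ``stdin``. If arguments are given, each argument is either an ``InChannel`` or the name of a file to use as input.
+
+The ``RS`` variable holds a regular expression that specifies the record separator. Its default value is ``\r|\n|\r\n``.
+
+The ``fsubst`` function reads one record at a time.
+
+For each record, the ``case`` statements are evaluated in order. Each ``case`` defines how text matching its ``pattern`` is replaced with the string produced by its body.
+
+Currently, omake supports only the ``g`` option. If it is specified, the clause performs a global substitution, replacing every instance of ``pattern``. Otherwise, the substitution is applied only once.
+
+Output can be redirected by redefining the ``stdout`` variable.
+
+For example, the following program replaces every occurrence of an expression ``word.`` with its capitalized form. ::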
+
+    section
+       stdout = $(fopen Subst.out, w)
+       fsubst(Subst.in)
+       case $"\<\([[:alnum:]]+\)\." g
+          value $(capitalize $1).
+       close(stdout)
+
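As a rough illustration of the ``g`` option described above, here is a minimal Python sketch (not omake code). The helper names ``substitute_records`` and ``capitalize_before_period`` are hypothetical, and the pattern ``\b([A-Za-z0-9]+)\.`` is only an approximation of the omake pattern in the example.

```python
import re

def substitute_records(text, pattern, replace, global_=False):
    """Apply one substitution clause to each record of text.

    Records are separated by \\r\\n, \\r, or \\n, as in the default RS.
    With global_=False only the first match in each record is replaced.
    """
    # The capture group makes re.split keep the separators in the result.
    parts = re.split(r"(\r\n|\r|\n)", text)
    count = 0 if global_ else 1  # re.sub treats count=0 as "replace all"
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 0:  # even indices are records, odd indices are separators
            part = re.sub(pattern, replace, part, count=count)
        out.append(part)
    return "".join(out)

def capitalize_before_period(match):
    # Upper-case the first letter of the captured word and restore the period.
    word = match.group(1)
    return word[:1].upper() + word[1:] + "."

sample = "first one. second one.\nthird."
pattern = r"\b([A-Za-z0-9]+)\."
all_hits = substitute_records(sample, pattern, capitalize_before_period, global_=True)
first_hit = substitute_records(sample, pattern, capitalize_before_period)
print(all_hits)
print(first_hit)
```

With ``global_=True`` every word followed by a period is rewritten. Without it, only the first such word in each record changes, which mirrors the "applied only once" behaviour described for clauses that omit ``g``.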
  
  .. index::
     single: lex()
@@ -1965,6 +1997,37 @@ Unixシステム上ではバイナリモードは意味を持たず、テキス
  
  10.11.7 lex 
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+::
+
+   lex(files)
+   case pattern1
+      body1
+   case pattern2
+      body2
+   ...
+   default
+      bodyd
+
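+The ``lex`` function provides a simple lexical-style scanner. The input is a sequence of files or channels. Each ``case`` specifies a regular expression. Each time the function reads input, it selects the regular expression that matches the *longest prefix* of the input and evaluates its body.
+
+If two ``case`` clauses match a prefix of the same length, the *later* ``case`` is selected. The ``default`` clause matches the regular expression ``.``, so you probably want to place it first in the pattern list.
+
+If the body of a ``case`` ends with ``export``, the current state is passed on to the next loop iteration.
+
+For example, the following program collects all the alphanumeric words from the input files. ::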
+
+    collect-words($(files)) =
+       words[] =
+       lex($(files))
+       default
+          # empty
+       case $"[[:alnum:]]+" g
+          words[] += $0
+          export
+
+If a ``default`` clause exists, it matches only a single arbitrary character. If the input matches none of the regular expressions, the function raises an error.
+
+The ``break`` function can be used to stop the loop.
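The selection rule described for ``lex`` (longest prefix wins, ties go to the later clause) can be sketched in Python as follows. This is an analogy under stated assumptions, not omake's implementation: ``select_clause`` and ``tokenize`` are hypothetical names, and Python's ``re`` engine stands in for omake's regular expressions.

```python
import re

def select_clause(clauses, text, pos):
    """Return (name, lexeme) for the clause matching the longest prefix at pos.

    When two clauses match prefixes of the same length, the later clause wins.
    """
    best = None
    for name, pattern in clauses:
        m = re.compile(pattern).match(text, pos)
        if m is None or m.end() == pos:
            continue  # no match, or an empty match that would not advance
        length = m.end() - pos
        # ">=" lets a later clause replace an earlier one of equal length.
        if best is None or length >= best[2]:
            best = (name, m.group(0), length)
    if best is None:
        raise ValueError(f"no clause matches at offset {pos}")
    return best[0], best[1]

def tokenize(clauses, text):
    """Repeatedly select a clause until the input is consumed."""
    pos, tokens = 0, []
    while pos < len(text):
        name, lexeme = select_clause(clauses, text, pos)
        tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens

clauses = [
    ("default", r"(?s)."),        # any single character, listed first
    ("word", r"[A-Za-z0-9]+"),    # alphanumeric words
]
tokens = tokenize(clauses, "ab c")
words = [lexeme for name, lexeme in tokens if name == "word"]
print(tokens)
print(words)
```

Because ``default`` is listed first, the single character ``c`` is claimed by the later ``word`` clause even though both match one character. This is why the text recommends putting ``default`` first.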
  
  .. index::
     single: lex-search()
@@ -1972,6 +2035,33 @@ Unixシステム上ではバイナリモードは意味を持たず、テキス
  
  10.11.8 lex-search 
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+::
+
+   lex-search(files)
+   case pattern1
+      body1
+   case pattern2
+      body2
+   ...
+   default
+      bodyd
+
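+The ``lex-search`` function is like ``lex``, but it skips input that matches none of the regular expressions. If a ``default`` clause is present, it matches all of the skipped text.
+
+For example, the following program collects all the alphanumeric words from the input files and skips any other text. ::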
+
+    collect-words($(files)) =
+       words[] =
+       lex-search($(files))
+       default
+          eprintln(Skipped $0)
+       case $"[[:alnum:]]+" g
+          words[] += $0
+          export
+
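+If a ``default`` clause exists, it matches only a single arbitrary character. If the input matches none of the regular expressions, the function raises an error. (Translator's note: this passage may simply have been copied and pasted.)
+
+The ``break`` function can be used to stop the loop.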
  
  .. index::
     single: Lexer
@@ -1979,21 +2069,116 @@ Unixシステム上ではバイナリモードは意味を持たず、テキス
  
  10.11.9 Lexer 
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
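+The ``Lexer`` object makes lexical analysis easy to perform, and is similar to the ``lex(1)`` and ``flex(1)`` programs.
+
+In omake, lexical analyzers can be constructed dynamically by extending the ``Lexer`` class. A lexer definition consists of a set of directives, specified with method calls, and a set of clauses, specified as rules.
+
+For example, consider the following lexer definition, which performs lexical analysis for a simple desktop calculator. ::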
+
+   lexer1. =
+      extends $(Lexer)
+
+      other: .
+         eprintln(Illegal character: $* )
+         lex()
+
+      white: $"[[:space:]]+"
+         lex()
+
+      op: $"[-+*/()]"
+         switch $*
+         case +
+            Token.unit($(loc), plus)
+         case -
+            Token.unit($(loc), minus)
+         case *
+            Token.unit($(loc), mul)
+         case /
+            Token.unit($(loc), div)
+         case $"("
+            Token.unit($(loc), lparen)
+         case $")"
+            Token.unit($(loc), rparen)
+
+      number: $"[[:digit:]]+"
+         Token.pair($(loc), exp, $(int $* ))
+
+      eof: $"\'"
+         Token.unit($(loc), eof)
+
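+This program defines the object ``lexer1``, which extends the ``Lexer`` object that provides the lexing environment.
+
+The remaining definitions form a set of clauses. A method name is given before the colon (:), and a regular expression after it. In this example each clause also has a body. The body is optional. If it is omitted, the method with the given name that already exists in the lexer definition is used.
+
+.. warning::
+  The clause that matches the *longest* prefix of the input is selected. If two clauses match a prefix of the same length, the *later* clause is selected. This differs from most standard lexers, but it makes more sense when grammars are extended.
+  
+The first clause matches any input that is not matched by the other clauses. In this case, an error message is printed for the unknown character. Note that this clause is selected only if no other clause matches.
+
+The second clause is responsible for ignoring white space. If white space is found, it is ignored, and the lexer is called recursively.
+
+The third clause is responsible for the arithmetic operators. It uses the ``Token`` object, which defines three properties: ``loc``, which represents the source location, ``name``, and ``value``.
+
+Within the body of each method, the lexer defines the ``loc`` variable, which represents the location of the current lexeme, so we use that value to create the tokens.
+
+The ``Token.unit($(loc), name)`` method constructs a new ``Token`` object with the given name and a default value.
+
+The ``number`` clause matches nonnegative integer constants. ``Token.pair($(loc), name, value)`` constructs a token with the given name and value.
+
+``Lexer`` objects operate on ``InChannel`` objects. The ``lexer1.lex-channel(channel)`` method reads the next token from the given channel.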
  
  .. _label10.11.10:
  
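  10.11.10 Lexer matching 
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+During lexical analysis, the clause with the longest match is selected. That is, the clause that matches the longest sequence of input characters is chosen for evaluation. If no clause matches, the lexer raises a ``RuntimeException``. If more than one clause matches the same amount of input, the first one is used for evaluation.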
  
  .. _label10.11.11:
  
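  10.11.11 Extending lexer definitions
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Now let's extend the previous lexer example so that it ignores comments. Here we define a comment as any text that begins with ``(*`` and ends with ``*)``. Comments may be nested.
+
+One simple way to do this is to define a separate lexer that just skips comments. ::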
+
+   lex-comment. =
+      extends $(Lexer)
+
+      level = 0
+
+      other: .
+         lex()
+
+      term: $"[*][)]"
+         if $(not $(eq $(level), 0))
+            level = $(sub $(level), 1)
+            lex()
+
+      next: $"[(][*]"
+         level = $(add $(level), 1)
+         lex()
+
+      eof: $"\'"
+         eprintln(Unterminated comment)
+
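+This lexer has a ``level`` property that keeps track of the nesting level. When ``(*`` is encountered, the level is increased by 1. When ``*)`` is encountered and the level is not 0, the level is decreased by 1 and lexing continues.
+
+Next, let's modify the previous lexer so that it skips comments. This can be done simply by extending the ``lexer1`` object that we created earlier. ::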
+
+   lexer1. +=
+      comment: $"[(][*]"
+         lex-comment.lex-channel($(channel))
+         lex()
+
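+The body of the ``comment`` clause calls the ``lex-comment`` lexer when a comment is encountered, and continues lexing once that lexer returns.
+
+(Translator's note: with this code, I suspect that once a comment is encountered the comment lexer simply keeps going without returning to lexer1, although I may just be mistaken.)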
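The nesting counter used by ``lex-comment`` can be sketched in Python as follows. This is one reading of the omake code above, not a verified port: in that reading the ``term`` clause only calls ``lex()`` again while ``level`` is nonzero, so the outermost ``*)`` ends the comment lexer and control goes back to the caller. The function name ``skip_nested_comment`` is hypothetical.

```python
def skip_nested_comment(text, pos):
    """Skip a nested (* ... *) comment.

    pos is the offset just after the opening '(*'. Returns the offset
    just after the matching '*)', or raises if the comment never ends.
    """
    level = 0  # number of inner comments currently open
    while pos < len(text):
        if text.startswith("(*", pos):
            level += 1          # entering an inner comment
            pos += 2
        elif text.startswith("*)", pos):
            pos += 2
            if level == 0:
                return pos      # outermost comment closed: hand control back
            level -= 1          # an inner comment closed: keep skipping
        else:
            pos += 1            # any other character is ignored
    raise ValueError("Unterminated comment")

source = "1 + (* outer (* inner *) still outer *) 2"
start = source.index("(*") + 2
end = skip_nested_comment(source, start)
remainder = source[end:]
print(repr(remainder))
```

Under this reading the outer scanner resumes right after the final ``*)``, which corresponds to ``lexer1`` continuing with ``lex()`` once ``lex-comment.lex-channel`` returns.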
  
  .. _label10.11.12:
  
-10.11.12 Threading the lexer 
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+10.11.12 Threading the lexer object 
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Clause bodies may also end with an export directive. In this case the lexer object itself is used as the returned token. If used with the Parser object below, the lexer should define the loc, name and value fields in each export clause. Each time the Parser calls the lexer, it calls it with the lexer returned from the previous lex invocation.
  
  .. index::
     single: Parser
@@ -2001,21 +2186,117 @@ Unixシステム上ではバイナリモードは意味を持たず、テキス
  
  10.11.13 Parser
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The ``Parser`` object provides a facility for syntactic analysis based on context-free grammars.
+
+``Parser`` objects are specified as a sequence of directives, specified with method calls; and productions, specified as rules.
+
+For example, let's finish building the desktop calculator started in the ``Lexer`` example. ::
+
+   parser1. =
+      extends $(Parser)
+
+      #
+      # Use the main lexer
+      #
+      lexer = $(lexer1)
+
+      #
+      # Precedences, in ascending order
+      #
+      left(plus minus)
+      left(mul div)
+      right(uminus)
+
+      #
+      # A program
+      #
+      start(prog)
+
+      prog: exp eof
+         return $1
+
+      #
+      # Simple arithmetic expressions
+      #
+      exp: minus exp :prec: uminus
+         neg($2)
+
+      exp: exp plus exp
+         add($1, $3)
+
+      exp: exp minus exp
+         sub($1, $3)
+
+      exp: exp mul exp
+         mul($1, $3)
+
+      exp: exp div exp
+         div($1, $3)
+
+      exp: lparen exp rparen
+         return $2
+
+Parsers are defined as extensions of the ``Parser`` class. A Parser object must have a ``lexer`` field. The ``lexer`` is not required to be a ``Lexer`` object, but it must provide a ``lexer.lex()`` method that returns a token object with ``name`` and ``value`` fields. For this example, we use the ``lexer1`` object that we defined previously.
+
+The next step is to define precedences for the terminal symbols. The precedences are defined with the ``left``, ``right``, and ``nonassoc`` methods in order of increasing precedence.
+
+The grammar must have at least one start symbol, declared with the ``start`` method.
+
+Next, the productions in the grammar are listed as rules. The name of the production is listed before the colon, and a sequence of variables is listed to the right of the colon. The body is a semantic action to be evaluated when the production is recognized as part of the input.
+
+In this example, these are the productions for the arithmetic expressions recognized by the desktop calculator. The semantic action performs the calculation. The variables ``$1, $2, ...`` correspond to the values associated with each of the variables on the right-hand-side of the production.
  
  .. _label10.11.14:
  
  10.11.14 Calling the parser
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The parser is called with the ``$(parser1.parse-channel start, channel)`` or ``$(parser1.parse-file start, file)`` functions. The start argument is the start symbol, and the channel or file is the input to the parser.
  
  .. _label10.11.15:
  
  10.11.15 Parsing control 
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The parser generator generates a pushdown automaton based on LALR(1) tables. As usual, if the grammar is ambiguous, this may generate shift/reduce or reduce/reduce conflicts. These conflicts are printed to standard output when the automaton is generated.
+
+By default, the automaton is not constructed until the parser is first used.
+
+The ``build(debug)`` method forces the construction of the automaton. While not required, it is wise to finish each complete parser with a call to the ``build(debug)`` method. If the ``debug`` variable is set, this also prints the parser table together with any conflicts.
+
+The ``loc`` variable is defined within action bodies, and represents the input range for all tokens on the right-hand-side of the production.
  
  .. _label10.11.16:
  
  10.11.16 Extending parsers
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Parsers may also be extended by inheritance. For example, let's extend the grammar so that it also recognizes the ``<<`` and ``>>`` shift operations.
+
+First, we extend the lexer so that it recognizes these tokens. This time, we choose to leave ``lexer1`` intact, instead of using the += operator. ::
+
+   lexer2. =
+      extends $(lexer1)
+
+      lsl: $"<<"
+         Token.unit($(loc), lsl)
+
+      asr: $">>"
+         Token.unit($(loc), asr)
+
+Next, we extend the parser to handle these new operators. We intend that the bitwise operators have lower precedence than the other arithmetic operators. The two-argument form of the ``left`` method accomplishes this. ::
+
+   parser2. =
+      extends $(parser1)
+
+      left(plus, lsl lsr asr)
+
+      lexer = $(lexer2)
+
+      exp: exp lsl exp
+         lsl($1, $3)
+
+      exp: exp asr exp
+         asr($1, $3)
+
+In this case, we use the new lexer ``lexer2``, and we add productions for the new shift operations. 
  
  .. index::
     single: Passwd
@@ -2023,6 +2304,17 @@ Unixシステム上ではバイナリモードは意味を持たず、テキス
  
  10.11.17 Passwd
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The ``Passwd`` object represents an entry in the system's user database. It has the following properties.
+
+* ``pw_name`` : the login name
+* ``pw_passwd`` : the encrypted password
+* ``pw_uid`` : the user's user ID
+* ``pw_gid`` : the user's group ID
+* ``pw_gecos`` : the user name or comment field
+* ``pw_dir`` : the user's home directory
+* ``pw_shell`` : the user's default shell
+
+Note that not all properties are meaningful on every operating system.
  
  .. index::
     single: getpwnam()
@@ -2031,6 +2323,15 @@ Unixシステム上ではバイナリモードは意味を持たず、テキス
  
  10.11.18 getpwnam, getpwuid
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+::
+
+    $(getpwnam name...) : Passwd
+       name : String
+    $(getpwuid uid...) : Passwd
+       uid : Int
+    raises RuntimeException
+
+The ``getpwnam`` function looks up an entry by the user's login name. The ``getpwuid`` function looks up an entry by the user's numerical ID (uid). If no entry is found, an exception is raised.
  
  .. index::
     single: getpwents()
@@ -2038,6 +2339,11 @@ Unixシステム上ではバイナリモードは意味を持たず、テキス
  
  10.11.19 getpwents
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+::
+
+    $(getpwents) : Array
+
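+The ``getpwents`` function returns an array of ``Passwd`` objects representing all the users in the system user database. Note that, depending on the operating system and the state of the user database, the returned array may be incomplete or even empty.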
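For readers more familiar with Python, its standard ``pwd`` module (Unix only) exposes the same passwd fields and can serve as a point of comparison. This is an analogy, not omake code. The wrapper ``lookup_user`` is a hypothetical helper, and Python signals a missing entry with ``KeyError`` rather than omake's ``RuntimeException``.

```python
import os
import pwd

def lookup_user(key):
    """Look up a passwd entry by login name (str) or numeric uid (int).

    Returns None when no entry exists instead of raising.
    """
    try:
        if isinstance(key, int):
            return pwd.getpwuid(key)   # analogous to $(getpwuid uid)
        return pwd.getpwnam(key)       # analogous to $(getpwnam name)
    except KeyError:
        return None

# Analogous to $(getpwents). The list may be incomplete or empty,
# depending on how the system's user database is configured.
entries = pwd.getpwall()

current = lookup_user(os.getuid())
if current is not None:
    print(current.pw_name, current.pw_uid, current.pw_dir, current.pw_shell)
print(len(entries), "entries")
```

The field names ``pw_name``, ``pw_uid``, ``pw_dir`` and ``pw_shell`` match the properties listed for the ``Passwd`` object.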
  
  .. index::
     single: Group