                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         RE2C                {                      A                   More            ersatileV              Scanner              Generator
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    eterP         bulisBum                                                                                                                                                                                                         Donald           D.            anwCo
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Computer        Science      tDepartmen     and              Computer       Systems        Group
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yersitUniv    of          aterlooW
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          April          15,              1994
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Abstract
                                                                                                                                                                                                                                                                                                                                                                                                                                     It              is              usually           claimed           that            lexical            analysis          routines          are             still             coded            yb              hand,             despite           the             widespread
                                                                                                                                                                                                                                                                                                                                                                        ailabiliytva     of            scanner         generators,    for           eciency        reasons.               While           eciency        is             a             consideration,    there          exist
                                                                                                                                                                                                                                                                                                                                                                          freely          ailableva              scanner           generators        hsuc               as                 GLA               [7]               that              can                generate           scanners            that              are                faster            than
                                                                                                                                                                                                                                                                                                                                                                          most              hand-coded          ones.                       er,evwHo          most              generated         scanners          are               tailored           for             a                particular         t,vironmenen
                                                                                                                                                                                                                                                                                                                                                                          and             retargetting     these          scanners         to             other          ts,vironmenen   if              possible,         is              usually           complex          enough          to            emak
                                                                                                                                                                                                                                                                                                                                                                          a               hand-coded        scanner         more            appealing.                 In              this            paper           ew             describe          RE2C,          a              scanner          generator       that           not
                                                                                                                                                                                                                                                                                                                                                                          only          generates  scanners     hwhic       are        faster      (and       usually       smaller)     than        those       produced     yb         yan         other       scanner
                                                                                                                                                                                                                                                                                                                                                                          generator     wnkno        to            the          authors,     including        GLA,       but           also          adapt        easily          to          yan         t.vironmenen
                                                                                                                                                                                                                                                                                                                                                                          Categories         and               Subject             Descriptors:               D.3.2            [Programming        Languages]:              Language           Classications       {
                                                                                                                                                                                                                                                                                                                                                                    d cializeespnatioapplic;languagesD.3.4    [Programming  Languages]:     Processors
                                                                                                                                                                                                                                                                                                                                                                          General      erms:T          Algorithms,   Languages,    erformanceP
                                                                                                                                                                                                                                                                                                                                                                          Additional      Key        ordsW         and          Phrases:         Lexical        analysis,       scanner       generator
                                                                                                                                                                                                                                                                   1                                                               troInduction
                                                                                                                                                                                                                                                                   Lexical       analysis      routines       are            still         often         coded           yb            hand          despite         the            widespread yailabilitvaof           scanner         gener-
                                                                                                                                                                                                                                                                   ators.           or F          example,    while         most         Unix          systems     evha          a             scanner         generator      installed    ypically(t   LEX          ][15          or            
ex
                                                                                                                                                                                                                                                                   [16]),     few          Unix        applicationsuse         a          hanicallymecgenerated   scanner.           One          commonly cited        reason       for        not          using
                                                                                                                                                                                                                                                                   LEX-generated   scanners        is              performance:         they           can            be              10             times       erwslo        than        talenequiv    hand-coded       scanners
                                                                                                                                                                                                                                                                   [13].           As            a             result,       there          has           been            considerable hresearc        toin        vingimpro  the           performance    of          hanicallymecgen-
                                                                                                                                                                                                                                                                   erated          scanners        ,[16           7,            9].                  GLA           ],[7           one           hsuc            scanner         generator,      can            produce          scanners         that           are             faster         than
                                                                                                                                                                                                                                                                   most            hand-coded       scanners.                  er,evwHo         the               use              of              hand-coded        scanners          is               still        t.alenprev                 One              pyossibilit     is
                                                                                                                                                                                                                                                                   that          this          is           due             to           the           ydicult     of           adapting     the           generated       scanners       to            specic         applications.
                                                                                                                                                                                                                                                                                                                                 Most            scanner        generators      are            tailored      to            a              particular  t.vironmenen       In            fact,         the             trend          in           trecen         earsy          has
                                                                                                                                                                                                                                                                   been             to             tegratein      scanner          generators      with           compiler      toolkits.          orF            example,      GLA            is             part           of             the            Eli            compiler
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     1
                                                                                                                                                                                                                                                                   construction       system           ],[8              and               Rex              [9]             is                part               of              the               GMD           oolbxoT           for              Compiler        Construction                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        .                             Scanners
                                                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                                                                  ermissionPto        ycop      without    fee        all        or         part       of         this       material is         tedgran videdpro  that      the        copies    are        not        made      or          distributed
                                                                                                                                                                                                                                                                   for         direct   commercialtage,anadvthe   CMA         tyrighcopnotice     and        the        title      of          the        publicationand      its        date       appear,    and        notice     is
                                                                                                                                                                                                                                                                  engiv      that      yingcop   is         yb         permissionof          the        Associationfor        Computing.hineryMaco T       ycop       otherwise,or          to         republish,requires
                                                                                                                                                                                                                                                                   a             fee          and/or      specic     permission.       tyrighCop   1994         yb           the          Association for           Computing,hineryMacInc.               o  T          appear       in           LOPLAS
                                                                                                                                                                                                                                                                   2(1{4).
                                                                                                                                                                                                                                                                                                                 1
                                                                                                                                                                                                                                                                                                                                   Also       wnkno     as         CoktailcxoolboTler-(Compiler-CompiKarlsruhe).
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       1
                                                                                                                                                                                                                                                                   generated           yb                these               tools               assume            the                 existence           of                a                  library           of                support            modules            for                error              handling,          input
                                                                                                                                                                                                                                                                   buering,     bsymol       table        t,managemenand           similar     functions.        While         these          support         modules       simplify    the           task
                                                                                                                                                                                                                                                                   of       tingimplemena   compiler or       terpreter,inthey     emak     adaptationto      other     purposes     more     dicult.        Adaptation
                                                                                                                                                                                                                                                                   to            other        tsvironmenen is            also         made         more         dicult      because          often         assumptions  are            made        about          the            input         and
                                                                                                                                                                                                                                                                   restrictions    are             placed         on            enstok         in             order          to           ehievac        better            performance.          RE2C          goes             to             the            other           extreme:
                                                                                                                                                                                                                                                                   it           tratesconcen    solely       on            generating    code            for         hingmatc    regular       expressions.
                                                                                                                                                                                                                                                                                                                                 RE2C          is           successful      at            its          task:              not          only         does            it           produce         scanners       hwhic         are           faster        than          those         created        yb
                                                                                                                                                                                                                                                                   other           scanner         generators      but,         ,  surprisinglythey           are             usually       smaller      as             ell.w             urther,F       RE2C           does             not           emak
                                                                                                                                                                                                                                                                  yan              assumptions    about            the             input            or              place           yan             restrictions      on             ens.tok                o  T            a               large           degree,           the               performance
                                                                                                                                                                                                                                                                   and           y
exibilit  of            RE2C-generated scanners        is             due           to             a           elvno        method         for           determining  when           to            rell         a             buer
                                                                                                                                                                                                                                                                  hwhic       oidsva       the            complicationstroinduced  yb            the          tinelsen       method       ].[1
                                                                                                                                                                                                                                                                                                                                 The        wingfollosections  of        this       paper       describe      RE2C     scanner     specications,discuss  who       these       specications
                                                                                                                                                                                                                                                                   are                ertedvcon           toin                scanners,              and                egiv               performance          results            edhievac            yb                  our               ntatioimplemen   (including          a
                                                                                                                                                                                                                                                                   comparison   with         GLA).
                                                                                                                                                                                                                                                                   2                                                                 Scanner                  Specications
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      2
                                                                                                                                                                                                                                                                   An            RE2C          source         le           consists       of          ]C[14        or            C++[4]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               code        edvterleain   with         tscommen    of           the            form        /*!re2c             :      :      :
                                                                                                                                                                                                                                                                   */            tainingcon    scanner         specications.          These           specications    are            replaced        with          generated        code            that           is           edokvin
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      3
                                                                                                                                                                                                                                                                   simply     yb            \falling  to"in        the           tscommen    as            illustrated   in           Figure        1             and          in            Appendix       A                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            .
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 #define             YYCURSOR                                  p
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 unsigned            char                 *scan         uint(unsigned      char                 *p)f
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 /*!re2c
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         [0-9]+                                                                                                                                                                                            freturn              p;g
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       377]000-\[\                                                                            freturn             NULL;g
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 */
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 g
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     Figure        1:                A             simple       scanner.
                                                                                                                                                                                                                                                                                                                                 A               scanner         specication   estak          the            form         of            a              list          of            rules,        heac           rule           consisting     of            a             regular        expression     ][10
                                                                                                                                                                                                                                                                   and             an              action         expressed          in             executable       code.                    Figure          2               illustrates     a               trivial       RE2C            scanner           specication    that
                                                                                                                                                                                                                                                                   will        be            used         as            an         example     throughout  this         paper.                              hEac          call       to            the          code          generated     from       a           specication   will
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            "print"                                                                                                                                                                                                  f                     return              PRINT;                                           /*                   rule                 5                     */                   g
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            +[a-z]                                                                                                                                                                                                                          f                     return              ID;                                                                                                                /*                   rule                 4                     */                   g
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            +[0-9]                                                                                                                                                                                                                          f                     return              DEC;                                                                                         /*                   rule                 3                     */                   g
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            "0x"               + [0-9a-f]                                        f                     return              HEX;                                                                                         /*                   rule                 2                     */                   g
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            377][\000-\                                                                                                      f                      return              ERR;                                                                                         /*                   rule                1                      */                  g
                                                                                                                                                                                                                                                                                                                                                                                                                         Figure         2:                  Sample       specication.          []b-a        hesmatc       yan          haracterc      eenbwet         a              and            b,        .  elyinclusiv
                                                                                                                                                                                                                                                                                                                                                                                                                         The             last           rule,          for            example,     will         hmatc        yan          teigh          bit           haracter.c             Rules          are            listed          in            order           of
                                                                                                                                                                                                                                                                                                                                                                                                                         precedence.
                                                                                                                                                                                                                                                                   rst         determine    the         longest      possible     prex        of          the          remaining input      that        hesmatc    one         of          the          regular     expressions
                                                                                                                                                                                                                                                                   and           will         then          execute         the           action        in           the            rst           applicable   rule.
                                                                                                                                                                                                                                                                                                                 2
                                                                                                                                                                                                                                                                                                                                   RetargettingRE2C    to         a          tdieren language is       ard.tforwstraigh
                                                                                                                                                                                                                                                                                                                 3
                                                                                                                                                                                                                                                                                                                                   RE2C-generatedscannersrequireno        additionalsupport code.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       2
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              [a-z]
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           11                                                                                                                                                                                                                                       6
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            1                                                                                                                                                                                                                                4
                                                                                                                                                                                                                                                                                                                                                                                                                                                       [\000-\377]\                                                                                                                                                                                                                                                                                                                                     [a-z]\p                                                                                                                                                                                                                                                               [a-z]\r                                                                                                                                                                                                                           [a-z]\i                                                                                                                                                                                                                          [a-z]\n                                                                                                                                                                                                                          [a-z]\t                                                                                                                                                                                                                          [a-z]
                                                                                                                                                                                                                                                                                                                                                                                                                                                       [0-9a-z]
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   0                                                                                                                                         p                                                                                      14                                                                                                                   r                                                                                                           24                                                                                                                   i                                                                                                           34                                                                                                                   n                                                                                                           44                                                                                                                   t                                                                                                           55
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  0                                                                                                                                                     [1-9]                                                                                                                                                                                                                                                 [0-9]
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   73                                                                              [0-9]                                                                                                                                            83
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  x                                                                                                                                                                                                                                                                                                                                                                                                           [0-9a-f]
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   9                                                                                                                                                                                                                        10
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   [0-9                                                                                                                                                                      2
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   a-f]
                                                                                                                                                                                                                                                                                                                                                                                                                         Figure               3:                              A              A    DF              for                 the                   sample             specication           in                  Figure              2.                                     State                0                   is                   the                 start                state.
                                                                                                                                                                                                                                                                                                                                                                                                                         Accepting         states            are               labeled          with            the             bumern          of              the               rule             that             they            h.matc                 or F            example,
                                                                                                                                                                                                                                                                                                                                                                                                                         state             10             accepts           rule             2.                    ransitionsT     diering        only           yb              label           are             tedrepresen        with            the              same
                                                                                                                                                                                                                                                                                                                                                                                                                         arc.            or F        example,   state         0          has           transitions  to          state         6          on            all        of           the         wingfolloharacters:c        a,           :      :    , :
                                                                                                                                                                                                                                                                                                                                                                                                                         o,            q,           :      :      :           z.
                                                                                                                                                                                                                                                                                                                                 RE2C          is           tdieren        from         most         other          scanner         generators     in            that           the            user          ustm         videpro       the            input         buering
                                                                                                                                                                                                                                                                  hanismmec       for              the                scanner;            the               generated          code                simply         assumes           that              the               user               has               dened             three              pters:oin
                                                                                                                                                                                                                                                                   YYCURSOR,YYLIMIT  and       YYMARKER,and      a          routine  YYFILL(.)n      Before      executing  the        generated   code,      YYCURSOR
                                                                                                                                                                                                                                                                   and         YYLIMIT ustm      be          set         to          ptoin     to         the         rst        and        one         past       the         last      haracterc    in         the        buer,    .  respelyectiv   After
                                                                                                                                                                                                                                                                   a               entok           is              recognized,       and              before           yan             action          is               executed,          YYCURSOR       is              set               to              ptoin           to               just             past             the             en.tok
                                                                                                                                                                                                                                                                   YYFILL        will          be              called          as             the             buer           needs           lling;       at              least           n              additional   input         haractersc       should         be              vided.pro
                                                                                                                                                                                                                                                                   When            YYFILL       is             called,       YYCURSOR     will          ptoin         to             the            next           haracterc       to             be              scanned         and            YYMARKER,   if            set,            will
                                                                                                                                                                                                                                                                   ptoin          to            a             possible      kingktracbac   ptoin         in            the            buer.               YYFILL       ustm         update           YYLIMIT,    and            possibly       YYCURSOR
                                                                                                                                                                                                                                                                   and          YYMARKER    before        returning.       ypicallyT   YYCURSOR,  YYLIMIT,   YYMARKER, and          YYFILL()n  will        be            dened        as
                                                                                                                                                                                                                                                                   macros.
                                                                                                                                                                                                                                                                   2.1                                                     Things             That               RE2C              Doesn't           videPro
                                                                                                                                                                                                                                                                   RE2C       doesn't     videpro   yman     things   ailablevain        more    tionalenvconscanner    generators  including default    rules,
                                                                                                                                                                                                                                                                   end-of-input    ens,pseudo-tok     and             buer             tmanagemen    routines.                    All              of              these             ustm            be                 supplied         yb               the               user.
                                                                                                                                                                                                                                                                   Rather    than     being     a        handicap, this     wsallo  RE2C-generatedscannersto       be         tailored to       almost yan    t.vironmenen
                                                                                                                                                                                                                                                                or F         example,    the           scanner        dened        in           Figure        1            compiles   toin         32          ytesb         of          i486         code           (using    atcomW     C             9.5);       the
                                                                                                                                                                                                                                                                   same        size         as          an       talenequiv hand-coded    routine.         Most         other        scanner       generators   cannot      produce       scanners      that
                                                                                                                                                                                                                                                                   are               compeetitiv     with            hand-coded         analyzers       in              this             case.                    urther,F          it               is               not           erlyvo           dicult         to             tenimplem      a
                                                                                                                                                                                                                                                                   more       traditionalscanner     using       RE2C.  orF        example,   Appendix    A          tainscon    the        support       code         for        the          C          scanner
                                                                                                                                                                                                                                                                  edbhmarkenc   in        ableT        1.               Note          that          this         code          wsallo       for          arbitrarily long        tiguouscon  enstok        and         videspro      line
                                                                                                                                                                                                                                                                   and           column    bumern         information.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       3
                                                                                                                                                                                                                                                                   3                                                                 Generating             Directly                 Executable              Scanners
                                                                                                                                                                                                                                                                   As            demonstrated yb           GLA         ][7          generating    directly      executable    code           instead       of           tables        can          result         in         hucm        faster
                                                                                                                                                                                                                                                                   scanners.        er,evwHo     to         ehievac       this         speed,         GLA-generated  scanners     emak        some        assumptions  about         the           input
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    4
                                                                                                                                                                                                                                                                   and             place           certain         restrictions    on             enstok                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               .                      In              this            section        ew              will         wsho          who            to              generate        directly        executable
                                                                                                                                                                                                                                                                   scanners             hwhic              not                only             oidva            hsuc                restrictions,        but                 are                 also              faster              and                usually            smaller.                         The               happroac
                                                                                                                                                                                                                                                                  tedpresen         here             has             the             added           benet           that           enev            faster           scanners         can             be               easily         be                created,         at             the              expense           of
                                                                                                                                                                                                                                                                   increased       code          size,          yb           using         a           hniquetec      akin         to            loop          unrolling.
                                                                                                                                                                                                                                                                   3.1                                                     Constructing       a           A      DF
                                                                                                                                                                                                                                                                   The            rst           step           in            generating     a             directly      executable     scanner         is            to            construct       a        A    DF        that          recognizes      the           regular
                                                                                                                                                                                                                                                                   expressions   in         the           specication.     Figure      3          tspresen      a      A    DF      that        recognizes   the           regular    expressions   in          Figure      2.
                                                                                                                                                                                                                                                                   One               possible         algorithm       for            constructing      hsuc              a          A    DF           can              be               found            in              [1].                     enGiv          hsuc              a           A,   DF         the               task             of
                                                                                                                                                                                                                                                                                                                 4
                                                                                                                                                                                                                                                                                                                                   These      assumptionsand     restrictionsare     discussedin          more     detail    in         Sections 3.3.1      and       5.1.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       4
                                                                                                                                                                                                                                                                   scanning      the            input         can           be            expressed       as            ws:follo
                                                                                                                                                                                                                                                                                                                                                                                                                                          Starting        from           the                start             state,          evmo            from           state             to               state             along           transitions       labeled           with            con-
                                                                                                                                                                                                                                                                                                                                                                          esecutiv       haractersc       from         the            input.               When           no            further         transitions    can            be              made,      kktracbac      to             the
                                                                                                                                                                                                                                                                                                                                                                          last            accepting       state,          ysa             q.                       The             path            to             q                spells            the             next           entok           and             the             rule             associated       with            q
                                                                                                                                                                                                                                                                                                                                                                          determines    the            code            to            be              executed.
                                                                                                                                                                                                                                                                   As          a        result,    the         problem  of        generating scanners   tiallyessenreduces     to         the        problem  of        generating an        executable
                                                                                                                                                                                                                                                                  tationrepresen  for          a        A.   DF
                                                                                                                                                                                                                                                                   3.2                                                     Generating        Code
                                                                                                                                                                                                                                                                   If               ew                assume            that              the                input             is               tirelyen         tainedcon         in                a                single            buer              then               generating        code                for               the          A    DF            is
                                                                                                                                                                                                                                                                  elyrelativ ard,tforwstraighas       is           illustrated yb          the           code           templates   in           Figure       4.                                 Note          that         the          only         dierence
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                gueoloPr
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            int                  yyaccept;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            goto                 M;start
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            fin:                 YYCURSOR            =                     YYMARKER;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            switch(yyaccept)f
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   :      :      :
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      A:n                                                                                          case                 n:                                         action;)n(
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   :      :      :
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            g
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        deo c          for             states
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    deCo           for       eptingcac      state                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               deCo           for        eptingcnon-ac   state
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Lq:                                                ++YYCURSOR;                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Lq:                                               ++YYCURSOR;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yyaccept            =                    ruleq();
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    YYMARKER            =                    YYCURSOR;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Mq:                                               R)fswitch(*YYCURSO                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Mq:                                               switch(*YYCURSOR)f
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            :      :     :                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       :      :      :
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               case                 c:                                         goto                 Lgoto(q; c;)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    case                 c:                                         goto                 Lq(goto;c;)
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            :      :     :                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       :      :      :
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               default:                                  goto                 fin;                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  default:                                  goto                 fin;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    g                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    g
                                                                                                                                                                                                                                                                                                                                                                                                                         Figure               4:                            Directly            executable           scanner.                              The                 code                 generated            for                 a                   scanner              consists             of                  a
                                                                                                                                                                                                                                                                                                                                                                                                                         prologue  edwfollo  yb          code           for        heac         state.            start      is           the          start        state.            action)n(    denotes       the          code
                                                                                                                                                                                                                                                                                                                                                                                                                         associated    with        rule         n,          gotoq(; c)          denotes       the          state       hedreac       from       state        q             along      the           transition
                                                                                                                                                                                                                                                                                                                                                                                                                         labeled            with            c               and              ruleq()           denotes            the              rule              associated         with            state             q.                           yyaccept        is               used              to
                                                                                                                                                                                                                                                                                                                                                                                                                       evsa         kingktracbac information.   The            M-labels      will        be              used           in           section        3.4.2.
                                                                                                                                                                                                                                                                  eenbwet           the              templates       for             accepting       and             non-accepting   states            is              that            the              accepting        states         evha            additional
                                                                                                                                                                                                                                                                   code          to          evsa       kingktracbacinformation.   Figure       5           wssho        code           that        tmigh     be             generated     for          state        1            in          Figure       3.
                                                                                                                                                                                                                                                                   3.3                                                     Buering
                                                                                                                                                                                                                                                                   Complicationsarise          when           the            input          is             not          tainedcon      in             a             single        buer:               additional   code             is            needed           for           lling
                                                                                                                                                                                                                                                                   the            buer          as         .  necessary
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       5
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   L1:                                               ++YYCURSOR;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yyaccept            =                     4;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      YYMARKER            =                     YYCURSOR;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   M1:                                               switch(*YYCURSOR)f
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  case                 'a':                                       goto                 L6;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             :      :      :
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  case                 'q':                                       goto                 L6;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  case                 'r':                                       goto                 L2;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  case                 's':                                       goto                 L6;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             :      :      :
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  case                 'z':                                       goto                 L6;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  default:                                  goto                 fin;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      g
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Figure        5:                Code           for           state         1.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     L6:                                               ++YYCURSOR;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         if(YYLIMIT         ==                    YYCURSOR)           YYFILL(1);
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         yyaccept            =                    4;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         YYMARKER            =                    YYCURSOR;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     M6:                                              R)fswitch(*YYCURSO
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                :      :      :
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         g
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Figure        6:                Code           for           state         6.
                                                                                                                                                                                                                                                                   3.3.1                                           The            tinelSen    Method
                                                                                                                                                                                                                                                                   Most                scanner             generators          use                 the                tinelsen           method             ][1               to                 determine          when                the                buer                needs               lling.                         In                the
                                                                                                                                                                                                                                                                   simplest        case,           a               bsymol          that            does              not             appear            in            alidv          input           is             hosenc           as               the             tinelsen        haracter.c                An               extra
                                                                                                                                                                                                                                                                   state       is         added      to         the    A    DF    and        transitionsto        this       state       on         the       tinelsen   bsymol    are        added      to         the         original states.
                                                                                                                                                                                                                                                                   When            the        A    DF        esarriv        in             this            new            state            it             is             time          to             rell          the             buer.                 After           the             buer            is             relled,       scanning
                                                                                                                                                                                                                                                                  ustm            be               restarted         in              the             previous        state.                 ,  Unfortunatelythis            is              not             possible         with            the             happroac        outlined        in
                                                                                                                                                                                                                                                                   Figure             4:                       the                necessary           information    is                simply          not             ailable.va                  Code               could             be                 added              to               heac              state              to              evsa
                                                                                                                                                                                                                                                                   the             necessary       information but            this          ouldw        result          in          erwslo         and           larger         scanners.             GLA          essolv         this          problem      yb
                                                                                                                                                                                                                                                                   ensuring        that            the           tinelsen        only           gets           inserted        eenbwet         ens:tok              if            this            is             the             case,           the             scanner         can           ysaalw        be
                                                                                                                                                                                                                                                                   restarted         from          the             start           state.                 o T            ensure           that            the             tinelsen        only           gets            inserted        eenbwet         ens,tok         GLA             wsallo
                                                                                                                                                                                                                                                                   newline         (ASCII          LF)             haractersc      to              appear           only           at              the             end             of             a              entok           and           wsdisallo     the             buering        of             partial
                                                                                                                                                                                                                                                                                                                                                   5
                                                                                                                                                                                                                                                                   lines                                                                                               .
                                                                                                                                                                                                                                                                   3.3.2                                           Buering
                                                                                                                                                                                                                                                                   RE2C-generated  scanners         khecc            if             the              buer           needs            lling         simply        yb              comparing    YYCURSOR      and             YYLIMIT.      A
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  6
                                                                                                                                                                                                                                                                   method      inspired  yb         the        hanismmecused        to         guard      against   kstac   wer
ovo    in        ][17                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          is         used         to         reduce      the        tamoun
                                                                                                                                                                                                                                                                   of          king.hecc
                                                                                                                                                                                                                                                                                                                                 ksChec          are              only            inserted         in              certain         eyk             states.                    These           kshecc            simply        ensure            that            there             is              enough           input
                                                                                                                                                                                                                                                                   in              the              buer             for              the               scan             to               proceed           tilun           the               next            eyk              state.                   or F             example,        in              the          A    DF          of              Figure           3                it               is
                                                                                                                                                                                                                                                                  tsucien       to         khecc          that          there          are            at           least         6            haractersc     in            the           buer          when          it            starts,       and           that          there          is            at           least
                                                                                                                                                                                                                                                                   one           haracterc      in           the            buer          when          the       A    DF       is            in           states          6,           8,           or            10.               No           other        kshecc         are            required.          The         kshecc
                                                                                                                                                                                                                                                                   inserted        in          eyk           states         are           of            the           form
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       6
                                                                                                                                                                                                                                                                                                                                                                          if((YYLIMIT         -                    YYCURSOR)           <                      n)                   YYFILL();n
                                                                                                                                                                                                                                                                   where           n               is              the           ummaxim   bumern          of            haractersc       that            can             be               consumed        before           another        eyk             state           is             hed.reac
                                                                                                                                                                                                                                                                or F          example,    Figure        6            wssho         the           code            generated      for           state          6            in            Figure        3.
                                                                                                                                                                                                                                                                                                                                 A               set            of           eyk            states          can            be              determined    yb          eringvdisco    the            strongly-connectedcomptsonen  (SCCs)          of            the
                                                                                                                                                                                                                                                              A.   DF     An           SCC         is         a          lmaxima  subset       of          states      hsuc         that        there        exists       a           path        from     yan         state       in          the          subset       to         yan
                                                                                                                                                                                                                                                                   other.                  The             set             of            eyk             states          consists         of             all           of             the             states           in             non-trivial   SCCs,          together         with           the             start           state.
                                                                                                                                                                                                                                                                   Note         that        for        heac        SCC         S ,           ew          actually    only     evha        to          include     a          subset        of         states        of         S             hsuc         that        when        the          subset
                                                                                                                                                                                                                                                                   is        ed,vremo   S              becomes      acyclic.         Indeed,     ][17       describes      a           simple    heuristic    for       hocosing    hsuc        a           subset.        er,evwHo
                                                                                                                                                                                                                                                                   since         in           practice      most       of           the          (non-trivial)SCCs       teredencoun    will       consist       of          a            single       state        the          tcurren      ersionv
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           7
                                                                                                                                                                                                                                                                   of           RE2C         simply     includes      all        states         in           non-trivialSCCs                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  .                 An           algorithm  engiv       in          ][3         asw          used          to           compute      the
                                                                                                                                                                                                                                                                   SCCs.
                                                                                                                                                                                                                                                                   3.4                                                     Optimizations
                                                                                                                                                                                                                                                                  enEv             good             optimizing    C               compilers      can              be                coaxed         toin            generating    hucm           smaller         and            tlysligh       faster           code             if
                                                                                                                                                                                                                                                                   some         transformationsare         rst           applied      to            the           generated       code.
                                                                                                                                                                                                                                                                   3.4.1                                          gEliminatinkingktracBac
                                                                                                                                                                                                                                                                   Consider           state             1                 in                the           A    DF           in                Figure             3.                             Note               that              since             all              of                the                transitions       from            state              1                hreac              only
                                                                                                                                                                                                                                                                   accepting         states,          kingktracbac      information   does               not              need              to               be               edvsa             if              the               code               for              the               default         case               is
                                                                                                                                                                                                                                                                  hangedc             to                go                directly           to                 the                code                associated          with              state               1.                              The                result              of                this               optimization    is               wnsho              in
                                                                                                                                                                                                                                                                   Figure          7.                                              More          , generally    this            optimization can             be               applied        to              all            accepting       states          hwhic         evha            transitions
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   L1:                                               ++YYCURSOR;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   M1:                                               switch(*YYCURSOR)f
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  case                 'a':                                       goto                 L6;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             :      :      :
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  case                 'q':                                       goto                 L6;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  case                 'r':                                       goto                 L2;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  case                 's':                                       goto                 L6;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             :      :      :
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  case                 'z':                                       goto                 L6;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  default:                                  goto                 A4;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      g
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Figure        7:                Code           for           state         1             with        kingktracbac eliminated.
                                                                                                                                                                                                                                                                   only          to           accepting      states.
                                                                                                                                                                                                                                                                   3.4.2                                           Optimizing  switches
                                                                                                                                                                                                                                                                   Most                 C                   compilers            will              generate               either              a                    jump               table               or                   a                    set                  of                  if                 tsstatemen           for                  a                   switch             tstatemen
                                                                                                                                                                                                                                                                   depending          on              the               distribution    of              the               case            labels.                   In              yman           compilers       the              decision         as               to              hwhic            method           to
                                                                                                                                                                                                                                                                   use                is                biased         ardswto           generating        jump            tables            since             in                most            cases              this              results            in              faster            albeit            larger            code.
                                                                                                                                                                                                                                                                er,evwHo      experience      with         directly      executable     scanners       has         wn,sho       that         replacing    yman       of           these         jump        tables
                                                                                                                                                                                                                                                                                                                 5
                                                                                                                                                                                                                                                                                                                                   If         the        input    tainscon  no         newlines, a          GLA        scanner   will       attempt   to         buer     the      tireen    input      stream.
                                                                                                                                                                                                                                                                                                                 6
                                                                                                                                                                                                                                                                                                                                   The         problem  of         detectingkstac wer
ovo   in          LR         parsers   is          probably best        left     to         arehardw hanismsmec].[12
                                                                                                                                                                                                                                                                                                                 7
                                                                                                                                                                                                                                                                                                                                   It         should    be         noted     that      nding   the        minimal  set        of         states    to        evremo   from      an         SCC         in         order     to         render    it         acyclic  is       talenequiv
                                                                                                                                                                                                                                                                   to          the      CKFEEDBA  TEXVER        SET        problem  hwhic      is         NP-complete[6].
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       7
                                                                                                                                                                                                                                                                   with        if         tsstatemen  results      in          scanners     hwhic      are        hucm       smaller,  and      ,  surprisinglyin      some       cases       tlysligh   faster
                                                                                                                                                                                                                                                                                                                                                                                          8
                                                                                                                                                                                                                                                                   as         ellw                                                                                                                          .                As           a          result,     the          ycapabilitof        replacing   a          switch   tstatemen   with       if         tsstatemen asw         added      to          RE2C.
                                                                                                                                                                                                                                                                                                                                 RE2C          bases            its            decision       on              whether         to              generate        a             switch       tstatemen       or              to             replace         it             with           ifs            solely         on
                                                                                                                                                                                                                                                                                                                                                                                                                                                                              9
                                                                                                                                                                                                                                                                   the              ydensit                                                                                                                                                                                                                       of               the              switch         t.statemen                 It                is              surprising        that            hsuc              a               simple          heuristic        orksw           ell.w                    orF             more
                                                                                                                                                                                                                                                                   esoteric        applications    in            hwhic           the             input          alphabet         is             not             a              simple     altervin       RE2C           has            the           tageanadv     in             that
                                                                                                                                                                                                                                                                   there         is         no          visionpro  for         don't      care        triesen      in          a           switch    t:statemen       if         no          case      hesmatc     none        of         the         tsstatemen
                                                                                                                                                                                                                                                                   in             the            switch      ustm         be              executed.          er,evwHo       for           the            examples      in         ableT        1             this           is            not            so:                RE2C          simply       does            a
                                                                                                                                                                                                                                                                   better             job            of             generating     code              for            switch       tsstatemen      than            the             compiler.           ],[18         ],[11          and            ][2            also           address          the
                                                                                                                                                                                                                                                                   problem      of           generating    good           code            for           switch      ts.statemen
                                                                                                                                                                                                                                                                   Replacing         switches          with              ifs                                      When             replacing       a               switch          tstatemen       with             if             ts,statemen       it               is              useful            to
                                                                                                                                                                                                                                                                   sort              the              cases          yb              label            and             then             group            them            according        to              rule            toin            subranges,        as               illustrated     in              Figure           8.
                                                                                                                                                                                                                                                                   RE2C         replaces      a            switch      with        either        a            linear      or           binary      h,searc       depending     on           the         bumern       of          subranges     in           the
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               switch(*YYCURSOR)f
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           case                 000':'\                                          :      :      :                   case                 '/':                                                                                                               goto                 L11;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            case                 '0':                                                                                                                                                                                                                                                                                                                                                                                                                                           goto                 L7;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            case                 '1':                                             :      :      :                   case                 '9':                                                                                                               goto                 L8;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            case                 ':':                                             :      :      :                   case                 '`':                                                                                                               goto                 L11;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            case                 'a':                                             :      :      :                   case                 'o':                                                                                                               goto                 L6;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            case                 'p':                                                                                                                                                                                                                                                                                                                                                                                                                                           goto                 L1;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            case                 'r':                                             :      :      :                   case                 'z':                                                                                                               goto                 L6;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             case                 '':f                                             :      :      :                   case                 '\377':                                          goto                 L11;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               g
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Figure        8:               switch       for          state           0.
                                                                                                                                                                                                                                                                   switch.         If            there           are           only         a             few           subranges      a            linear       hsearc         is            generated;    otherwise,     a            binary       hsearc         is            used.
                                                                                                                                                                                                                                                                                                                                 Figure        9          and         Figure       10         wsho          linear     and           binary     hes,searc  ,  respelyectivthat        could       be            used          to         replace                                             the
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           if(*YYCURSOR       <=                   '/')                 goto                 L11;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           if(*YYCURSOR       <=                   '0')                 goto                 L7;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           if(*YYCURSOR       <=                   '9')                 goto                 L8;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           if(*YYCURSOR       <=                   '`')                 goto                 L11;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           if(*YYCURSOR       ==                   'p')                 goto                 L1;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           if(*YYCURSOR       <=                   'z')                 goto                 L6;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           goto                 L11;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         Figure        9:                Linear        lookup        code           sequence        for           state          0.
                                                                                                                                                                                                                                                                                                                 8
                                                                                                                                                                                                                                                                                                                                   See     ableT      1          for        examples.
                                                                                                                                                                                                                                                                                                                 9
                                                                                                                                                                                                                                                                                                                                   The       bumern    of         distinct  subrangesdivided yb         the        total   bumern     of         cases.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       8
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               if(*YYCURSOR       <=                    '`')f
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      if(*YYCURSOR       <=                    '/')                 goto                 L11;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      if(*YYCURSOR       <=                    '0')                 goto                 L7;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      if(*YYCURSOR       <=                    '9')                 goto                 L8;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      goto                 L11;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               g                      else                f
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      if(*YYCURSOR       ==                    'p')                 goto                 L1;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      if(*YYCURSOR       <=                    'z')                 goto                 L6;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      goto                 L11;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               g
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Figure       10:               Binary        lookup        code           sequence        for           state         0.
                                                                                                                                                                                                                                                                   switch      in         Figure        8.              Note          in          particular  the           comparison for          the          \"p         in          Figure       9.               This         optimizationeliminates
                                                                                                                                                                                                                                                                   a             comparison  heac           time        it            is            applied.          Also          note           that          no            comparisons  are            required       at            the           top           and           bottom        of
                                                                                                                                                                                                                                                                   the            range.
                                                                                                                                                                                                                                                                  ngSimplifyi       switches                                As                 a                 general           rule,               better              treplacemen       code                can                be                 generated           for                a                 switch          if                it
                                                                                                                                                                                                                                                                  tainscon         erfew             subranges.                     One             yaw              of               reducing          the             bumern            of               subranges          in               a                switch,         at               the                expense             of
                                                                                                                                                                                                                                                                   some           speed,             is             to              locate           a            aseb            switch        hwhic          is             eryv            similar      and             then            replace          the             code              for            all           cases            hwhic
                                                                                                                                                                                                                                                                   appear         ticallyiden in            the           base           switch      with          a            goto         to            (the           code           generated       for)         the            base          switch.          RE2C         uses
                                                                                                                                                                                                                                                                   this              optimization   to               good           tageanadv        when              generating        code               in               the               transitions       of               states             used              for             hingmatc
                                                                                                                                                                                                                                                                 ords.eywk       or F        example,   note          that         the          switches    for          states        1            through      4           dier        from        the          switch      of          state         6           only
                                                                                                                                                                                                                                                                   on            \",r         \",i         \",n        and           \",t      .  respelyectiv    Figure        11          wssho         the            code            generated      for          these           states.                              Another      yaw
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            L1:                                              ++YYCURSOR;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            M1:                                              if(*YYCURSOR       !=                    'r')                 goto                 M6;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            L2:                                              ++YYCURSOR;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            M2:                                              if(*YYCURSOR       !=                    'i')                 goto                 M6;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            L3:                                              ++YYCURSOR;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            M3:                                              if(*YYCURSOR       !=                    'n')                 goto                 M6;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            L4:                                              ++YYCURSOR;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            M4:                                              if(*YYCURSOR       !=                    't')                 goto                 M6;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               goto                 L5;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Figure        11:              Code           for           states         1{4          after         all          optimizations.
                                                                                                                                                                                                                                                                   of           tingimplementhis         optimizationis            to            construct       a            tunnel        automaton    ][9           from        the       A,   DF       and           then           generate
                                                                                                                                                                                                                                                                   code            from        the            tunnel        automaton.
                                                                                                                                                                                                                                                                   Common          SubexpressionnEliminatio                          yMan          compilers     will         miss         the            fact          that           *YYCURSOR   in            Figures        9
                                                                                                                                                                                                                                                                   and          10          should       be            loaded     toin        a            register.          Most         can         be            coaxed        to         do            so          yb          rst          assigning   *YYCURSORto           a           local
                                                                                                                                                                                                                                                                 ariable.v
                                                                                                                                                                                                                                                                   4                                                                 Exptalerimen             Results
                                                                                                                                                                                                                                                                ableT           1                compares      owt             RE2C-generated   C                scanners          with            the               (hand-coded)     lcc              scanner          ][5             and              comparable
                                                                                                                                                                                                                                                                   GLA-            and             
ex-generated     scanners        on              a             yarietv         of             platforms.                                      It              reports            the             times          in              seconds           required        yb
                                                                                                                                                                                                                                                                   the          ariousv      scanners       to           scan          about          170,000     lines        of           C             source.            The           5,607,820 yteb          source        le           used          tiallyessen
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       9
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               time                                                                                                                                                                                                                                                                                                                                                                                                                                                   space
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    program                                                                           user                                                                               sys                                                                         total                                                                             text                                                                           data                                                                        bss                                                                        total
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        R4000           /            c2.3.3gc        -O
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
ex           -Cem                                                                   10.36                                             0.87                                                                   11.23                                                                  5200                                                                   4192                                                                                        48                                                                      9440
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
ex             -Cf                                                                                        5.44                                              0.72                                                                                        6.16                                                                   4688                                               64384                                                                                      48                                                 69120
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       lcc                                                                                          3.19                                              0.67                                                                                        3.86                                                                   7328                                                                   1216                                               8256                                              16800
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   gla                                                                                        2.89                                              0.63                                                                                        3.52                                              11552                                                                  3056                                                                    144                                               14752
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             re2c                                                                                          2.54                                              0.68                                                                                        3.22                                              13264                                                                                       512                                                                                                              0                                                  13776
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 re2c            -s                                                                                           2.38                                              0.67                                                                                       3.05                                              11056                                                                  4528                                                                                                             0                                                  15584
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             R4000           /            c2.11.2c       -O              -Olimit      5000
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
ex           -Cem                                                                                        9.97                                              0.89                                                                   10.86                                                                  4704                                                                   4240                                                                                        32                                                                      8976
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
ex             -Cf                                                                                        6.19                                              0.72                                                                                        6.91                                                                   4256                                               64432                                                                                      32                                                 68720
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       lcc                                                                                          2.74                                              0.72                                                                                        3.46                                                                   9664                                                                                        864                                                8256                                              18784
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   gla                                                                                        2.46                                              0.69                                                                                        3.15                                              19232                                                                  2992                                                                    128                                               22352
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             re2c                                                                                          2.97                                              0.63                                                                                        3.60                                              15088                                                                                       528                                                                                                              0                                                  15616
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 re2c            -s                                                                                           2.94                                              0.61                                                                                       3.55                                              16080                                              11808                                                                                                           0                                                  27888
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          CARSP          /            c2.3.3gc       -O
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
ex           -Cem                                                                   16.03                                             2.78                                                                   18.81                                                                  8992                                                                                                             24                                                                                          48                                                                      9064
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
ex             -Cf                                                                                        7.84                                              2.69                                                                   10.53                                                                  6560                                               62232                                                                                      48                                                 68840
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       lcc                                                                                          4.46                                              2.01                                                                                        6.47                                                                   7800                                                                                        384                                                8256                                              16440
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   gla                                                                                        4.08                                              1.56                                                                                        5.64                                              10864                                                                  2168                                                                    136                                               13168
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             re2c                                                                                          3.67                                              1.76                                                                                        5.43                                              13552                                                                                                                                 0                                                                                                                0                                                  13552
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 re2c            -s                                                                                           3.48                                              1.70                                                                                       5.18                                              15464                                                                                                                                 0                                                                                                                0                                                  15464
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            i486           /             c2.4.5gc       -O
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
ex           -Cem                                                                   21.86                                             1.26                                                                   23.12                                                                  8536                                                                                                             20                                                                                          24                                                                      8580
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
ex             -Cf                                                                                        9.12                                              1.18                                                                   10.30                                                                  6200                                               62228                                                                                      24                                                 68452
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       lcc                                                                                          5.45                                              1.22                                                                                        6.67                                                                   5924                                                                                        384                                                8240                                              14548
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   gla                                                                                        5.11                                              1.18                                                                                        6.29                                              15496                                                                  2144                                                                    108                                               17748
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             re2c                                                                                          4.73                                              1.13                                                                                        5.86                                                                   9800                                                                                                                                  0                                                                                                                0                                                                       9800
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 re2c            -s                                                                                           4.85                                              1.17                                                                                       6.02                                              12968                                                                                                                                 0                                                                                                                0                                                  12968
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   68020           /            c1.40gc         -O
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
ex           -Cem                                              117.37                                            5.89                                               123.26                                                                7700                                                                                                             20                                                                                          22                                                                      7742
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
ex             -Cf                                                                   50.93                                             5.27                                                                   56.20                                                                  5388                                               62228                                                                                      22                                                 67638
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       lcc                                                                      33.28                                             6.28                                                                  39.56                                                                  4956                                                                                        384                                                8236                                              13576
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   gla                                                                   33.80                                             4.20                                                                   38.00                                             13904                                                                  2144                                                                    106                                               16154
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             re2c                                                                     28.92                                             2.91                                                                   31.83                                                                  8556                                                                                                                                  0                                                                                                                0                                                                       8556
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 re2c            -s                                                                      30.72                                             3.19                                                                  33.91                                                                  9856                                                                                                                                  0                                                                                                                 0                                                                       9856
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           ableT        1:                Comparison  of           generated       C            scanners.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             10
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            10
                                                                                                                                                                                                                                                                   consists       of          10           copies        of          the           source         to           James       Clark's      SGML        parser,       sgmls                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       .                 The           times       reported       are         eragesva
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                11
                                                                                                                                                                                                                                                                   for             10               trials;           the               sizes             reported            include         erythingev        but               C                library         code                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             .                           
ex              videspro         a              bumern           of               table
                                                                                                                                                                                                                                                                   compression    options        including     -Cem          for            tables         optimized     for            space,          and            -Cf           for            tables         optimized     for           speed.
                                                                                                                                                                                                                                                                   By           default,   RE2C        will       use         a           heuristic    to          decide       if         a           switch     should      be          replaced     with        ifs:             the          -s         option      forces
                                                                                                                                                                                                                                                                   RE2C          to          ysaalw      generate        switches.
                                                                                                                                                                                                                                                                                                                               o T                 emak                 comparisons         more                 meaningful,           all                 ticseman              processing              code                   asw                 edvremo               from                the                    GLA-
                                                                                                                                                                                                                                                                   generated           and                lcc               scanners,            and               code                to                videpro           line              and               column         bumern             information   asw                added              to                the
                                                                                                                                                                                                                                                                   RE2C          specication.      The           remaining   dierences      of           note         eenbwet         the           scanners        include:
                                                                                                                                                                                                                                                                                                                                                    The             
ex-generated  scanners        do           not          videpro       line         or            column    bumern         information.
                                                                                                                                                                                                                                                                                                                                                    The             GLA-generated scanner         assumes      7-bit         input.
                                                                                                                                                                                                                                                                                                                                 As          a           general     rule,        the         RE2C-generatedscanners    erew         the         fastest,    edwfollo  yb          the         GLA-generatedscanner
                                                                                                                                                                                                                                                                   and             then            the             lcc             scanner.                The            
ex-generated    scanners         erew           tlysignicaner.wslo                Only           the             space-optimized
                                                                                                                                                                                                                                                                   
ex               scanner           asw              smaller         than            the               default          RE2C             scanner,           and              only            yb               a               wnarro           margin.                   There              are              some
                                                                                                                                                                                                                                                                  hitectures,arc      notably          the              IBM               370,            on              hwhic            table           endriv           scanners           will            probably         produce            better              results:
                                                                                                                                                                                                                                                                   IBM            370          compilers   ypicallyt   generate       poor            code            for           large        routines.
                                                                                                                                                                                                                                                                                                                                 The                 ariousv              scanners                 and                 input                  les                  used                   for                  the                    tests                  are                ailableva           for                 ymousanon           ftp                   from                csg.-
                                                                                                                                                                                                                                                                 aterlowo.cauin      /pub/p.r.Zeter/re2c/sampler.ta
exisailablevaforymousanonftpfrom   ftp.uu.netas      ages/-/pack
                                                                                                                                                                                                                                                                  r.Z,gnu/
ex-2.3.7.taGLA      is          ailableva    for           ymousanon     ftp            from         rado.eduftp.cs.coloas          part           of             the             Eli         agekpac
                                                                                                                                                                                                                                                                  r.Z,/pub/cs/distribs/eli/Eli3.4.2.taandthe           lcc               tfron            end               is              ailableva       for             ymousanon       ftp              from            rinceton.edup
                                                                                                                                                                                                                                                                   as             r.Z./pub/lcc/lccfe-1.9.taAn    alpha         ersionv        of             RE2C           will          soon             be               made       ailableva    for           ymousanon     ftp            from
                                                                                                                                                                                                                                                                 aterlowo.cacsg.uas            /pub/p.r.Zeter/re2c/re2c-0.5.ta
                                                                                                                                                                                                                                                                   5                                                                 Related          ork   W
                                                                                                                                                                                                                                                                   The              eyk               to               the               performance      and             y
exibilit      of              an               RE2C-generated    scanner            is               the              happroac         used              to               deter-
                                                                                                                                                                                                                                                                   mine           when             the             buer            needs            lling.            ,  terestinglyInthe              lcc              scanner         ][5             uses             a               similar      happroac        (with           certain
                                                                                                                                                                                                                                                                   concessions     to           eepk          the            boeepingokk   manageable.)
                                                                                                                                                                                                                                                                   5.1                                                     Comparison        With              GLA
                                                                                                                                                                                                                                                                   It                 is                natural         to                compare          RE2C              to                GLA              [7]              as                it                also             generates           directly           executable         scanners.                       RE2C              and
                                                                                                                                                                                                                                                                   GLA          evha          yman         dierences       simply       because          they            are            targeted        for          tdieren       yptes            of            users:                GLA           is            tendedin
                                                                                                                                                                                                                                                                   for            people           who            simply       wish           to           eragelev        their          eorts           with          existing       tools           and            libraries;     RE2C          is            tendedin        for
                                                                                                                                                                                                                                                                   people        that       evha        more        specialized   needs         and         are          willing   to          videpro     their       wno         support       routines.      or F        example,
                                                                                                                                                                                                                                                                   GLA         videspro      a            good          buering     hanism,mecRE2C         users         ustm        supply       their        wn.o             These         dierences,  er,evwho
                                                                                                                                                                                                                                                                   are            not           unique       to            GLA           and        evha          been            addressed       for          the            most         part          in           previous      sections.
                                                                                                                                                                                                                                                                                                                                 Of               more           terestin          is               the               dierences          in              the               code               that             RE2C             and              GLA              generate.                    Scanners          generated         yb
                                                                                                                                                                                                                                                                   RE2C          and           GLA          dier         primarily  in          owt           aspects:           who           they          determine     when           the           buer          needs          lling,     and          who
                                                                                                                                                                                                                                                                   they           generate       code            for          switches.
                                                                                                                                                                                                                                                                                                                                 GLA       uses        the        ASCII       NUL      haracterc   as         the       tinelsen   to        determine  when       the        buer      needs       lling.      o  T    evimpro
                                                                                                                                                                                                                                                                   the          speed          and        reduce        the          size          of         the          generated    scanners      GLA         buers        only       complete    lines       and        restricts     enstok
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  12
                                                                                                                                                                                                                                                                   to         those      that      do       not       taincon  newline   (ASCII      LF)      haractersc                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               .               If        a        entok     with      an       bemedded    newline  haracterc
                                                                                                                                                                                                                                                                  h(suc         as           a          t)commen   is          required      it         ustm       be             recognized    with        an         auxiliary   scanner      written       in         C.          This         code          has
                                                                                                                                                                                                                                                                   to            perform       the            buering-relatedboeepingokk that          is            done          automaticallyyb         GLA-generated  code.
                                                                                                                                                                                                                                                                                                 10
                                                                                                                                                                                                                                                                                                                                 ailableAvfor        ymousanonftp       from      ftp.uu.net  as    .r.Zta1.1.roml/sgmls-/pub/text-pcessing/sg
                                                                                                                                                                                                                                                                                                 11
                                                                                                                                                                                                                                                                                                                                   The         GLA-generatedscannersizes    also     do          not        include  the       size       of         an         error       reportingmodule    err.o.
                                                                                                                                                                                                                                                                                                 12
                                                                                                                                                                                                                                                                                                                                   This        is         discussedin         more      detail    in         Section   3.3.1.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             11
                                                                                                                                                                                                                                                                                                                                 The                 hanismmec         RE2C               uses                to                   rell               the                 buer                eliminates        these                 restrictions         and               ety                 wsallo            RE2C                to
                                                                                                                                                                                                                                                                   generate       faster        and           smaller     scanners.         RE2C          also        wsallo       both          auxiliary   and          primary     scanners       to            be             specied
                                                                                                                                                                                                                                                                   using         regular       expressions.     or F         example,    Appendix       A            tainscon      an            auxiliary  scanner         for          ts.commen
                                                                                                                                                                                                                                                                                                                                 eLik            RE2C,           GLA             usually         replaces          switches        with             ifs.                      eUnlik          RE2C,           GLA             does               not              use              a                case-based
                                                                                                                                                                                                                                                                   heuristic       to             decide         hwhic          switches       to             replace:             rather,         it           ysaalw        generates        a              switch       for            the             start           state           and
                                                                                                                                                                                                                                                                   uses            ifs          for           the           rest.               GLA          replaces       switches      with         code            sequences       of            the           form:
                                                                                                                                                                                                                                                                                                                                                                                                                                                       if(*YYCURSOR        in                    S                                                                                                                                                                                                                                                                                                                                                       )                   goto                 L                                                                                                                                                      ;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     1                                                                                                                                                                                                1
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              .
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              .
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              .
                                                                                                                                                                                                                                                                                                                                                                                                                                                       if(*YYCURSOR        in                    S                                                                                                                                                                                                                                                                                                                                                          )                    goto                 L                                                                                                                                                          ;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     n                                                                                                                                                                                                    n
                                                                                                                                                                                                                                                                   Bit        ectorsv      are         used         for        all      bmemership tests       olvingvinsets         with       more       than       one        t.elemen         As          an          optimization,
                                                                                                                                                                                                                                                                   if              a              state            has             a               transition      to              itself         the              test            as              to              whether          to              remain         in             the              same           state            or              not             is              performed
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               13
                                                                                                                                                                                                                                                                   rst.                 or F            example,      Figure          12            wssho            the             GLA-generated    code            for             state           8               in              Figure          2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          .                                                                        Note            the             use              of
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               static               unsigned            char                 yytable[]           =                     f
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               0x00,                0x00,               0x00,                0x00,                /*                                                               0.                                                               1.                                                                2.                                                              3.                    */
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               :      :      :
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               0x00,                0x00,               0x00,                0x00,                /*                                         ,                                          -                                           .                                          /                     */
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               0x01,                0x01,               0x01,                0x01,                /*                                         0                                          1                                           2                                          3                     */
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               0x01,                0x01,               0x01,                0x01,                /*                                         4                                          5                                           6                                          7                     */
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               0x01,                0x01,               0x00,                0x00,                /*                                         8                                          9                                           :                                          ;                     */
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               0x00,                0x00,               0x00,                0x00,                /*                                         <                                          =                                           >                                          ?                     */
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               :      :      :
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               0x00,                0x00,               0x00,                0x00                 g;                   /*                                         |                                           g                                                                                                           127.                 */
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      .
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      .
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      .
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               L8:                ]++)+0URSORif(yytable[(*YYC&             1<<0)               goto                 L8;--YYCURSOR;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               goto                 A3;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Figure        12:              GLA           code            for          state          8            in            Figure        2.
                                                                                                                                                                                                                                                                   128            telemen       triesen          for            the              bit            ectorsv          to             reduce            the             scanner           size:                 A               GLA-generated    scanner          will           crash           or
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                14
                                                                                                                                                                                                                                                                   otherwise     ebveha         unpredictably if           a             non-ASCII    haracterc       appears        in            the           source                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              .
                                                                                                                                                                                                                                                                                                                                 In                some              sense                the                results             of                Section            4                 are                 a                 bit               misleading:               the                GLA               specication         that             asw                used                to
                                                                                                                                                                                                                                                                   obtain         the            gures          in         ableT         1              is             not           a             ypicalt       GLA           specication.          Usually       scanners       tedenimplem  using          GLA
                                                                                                                                                                                                                                                                   will            handle          ordseywk         as               tiersiden        as                GLA              has              been               optimized       for              this            ].[7                      ableT           2               tspresen             a               more            fair
                                                                                                                                                                                                                                                                   comparison:     the           ordeywk      hingmatc    rules         erew         edvremo      from        both            the           GLA           and           RE2C         specications.                          The
                                                                                                                                                                                                                                                                   RE2C-generated    scanners         erew             still          faster           and              smaller      except            on              the              MIPS             R4000,         where            the              cc-compiled
                                                                                                                                                                                                                                                                   GLA           scanner       asw          tlysligh     faster.
                                                                                                                                                                                                                                                                                                                                 Note           er,evwho         that            the              RE2C             specication      can              be               tiallysubstan   sped               up              yb              using            a              hniquetec         akin            to
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         15
                                                                                                                                                                                                                                                                   loop          unrolling.      Replacing     the            original  ordeywk     hingmatc    rule           in           the            RE2C         specication
                                                                                                                                                                                                                                                                                                                                                                                                                                                       L                     I*                                                                                                                                                                                                                                                                                                                                                                                                                                                f                      RET(ID);            g
                                                                                                                                                                                                                                                                   with          the           wingfollo  rules
                                                                                                                                                                                                                                                                                                 13
                                                                                                                                                                                                                                                                                                                                ,  ActuallyGLA       ouldw      generate a          while    t.statemen   Most       compilerswill        generate the        same      object      code      for        both.
                                                                                                                                                                                                                                                                                                 14
                                                                                                                                                                                                                                                                                                                                   No        kshecc    are        made      to         ensure    that     only       7-bit    haracterscappear    in         the        input.
                                                                                                                                                                                                                                                                                                 15
                                                                                                                                                                                                                                                                                                                                   L        =          [a-zA-Z   ]          and        I        =          [a-zA-Z . 0-9]
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             12
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   time                                                                                                                                                                                                                                                                                                                                                                                                space
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        program                                                      user                                                                    sys                                                               total                                                        text                                                       data                                                    bss                                                             total
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              R4000           /            c2.3.3gc        -O
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           gla                                                                    2.63                                               0.58                                                                   3.21                                              5040                                              2496                                               144                                                                    7680
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       re2c                                                                     2.50                                              0.65                                                                   3.15                                              6448                                                                   512                                                                                         0                                                                       6960
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          re2c            -s                                                                      2.49                                              0.67                                                                   3.16                                              4976                                              4224                                                                                        0                                                                       9200
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          re2c            -s            y                                                                       2.08                                              0.59                                                                   2.67                                              5792                                              4224                                                                                        0                                                   10016
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  R4000           /            c2.11.2c       -O              -Olimit      5000
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           gla                                                                    2.43                                               0.64                                                                   3.07                                              6512                                              2416                                               128                                                                    9056
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       re2c                                                                     2.93                                              0.67                                                                   3.60                                              8048                                                                   528                                                                                         0                                                                       8576
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          re2c            -s                                                                      3.04                                              0.64                                                                   3.68                                              9952                                              2208                                                                                        0                                                   12160
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               CARSP          /            c2.3.3gc       -O
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           gla                                                                    4.08                                               1.65                                                                   5.73                                              5472                                              1656                                               136                                                                    7264
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       re2c                                                                     3.77                                              1.67                                                                   5.44                                              7008                                                                                                             0                                                                                           0                                                                       7008
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          re2c            -s                                                                      3.66                                              2.37                                                                   6.03                                              9112                                                                                                             0                                                                                           0                                                                       9112
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 i486           /            c2.4.5gc        -O
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           gla                                                                    5.04                                               1.15                                                                   6.19                                              5368                                              1632                                               108                                                                    7108
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       re2c                                                                     4.75                                              1.17                                                                   5.92                                              5448                                                                                                             0                                                                                           0                                                                       5448
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          re2c            -s                                                                      5.06                                              1.13                                                                   6.19                                              8248                                                                                                             0                                                                                           0                                                                       8248
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        68020           /            c1.40gc         -O
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           gla                                                32.69                                             3.37                                              36.06                                             4772                                              1632                                               106                                                                    6510
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       re2c                                                29.86                                             3.74                                              33.60                                             4468                                                                                                             0                                                                                           0                                                                        4468
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          re2c            -s                                                  28.77                                             3.55                                             32.32                                             5616                                                                                                              0                                                                                          0                                                                        5616
                                                                                                                                                                                                                                                                                                                                                                                                                      ableT            2:                      Scanner           performance      with            ordseywk        treated           as              tiers.iden                   y               uses              an               \unrolled"
                                                                                                                                                                                                                                                                                                                                                                                                                         specication.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             13
                                                                                                                                                                                                                                                                                                                                                                                                                                                       L                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   f                     RET(ID);            g
                                                                                                                                                                                                                                                                                                                                                                                                                                                       L                     I                                                                                                                                                                                                                                                                                                                                                                                                                                                                        f                     RET(ID);            g
                                                                                                                                                                                                                                                                                                                                                                                                                                                       L                     I                     I                                                                                                                                                                                                                                                                                                                                                                                                                            f                     RET(ID);            g
                                                                                                                                                                                                                                                                                                                                                                                                                                                       L                     I                     I                    I                                                                                                                                                                                                                                                                                                                                                                                 f                     RET(ID);            g
                                                                                                                                                                                                                                                                                                                                                                                                                                                       L                     I                     I                    I                     I                                                                                                                                                                                                                                                                                                                                     f                     RET(ID);            g
                                                                                                                                                                                                                                                                                                                                                                                                                                                       L                     I                     I                    I                     I                    I                                                                                                                                                                                                                                                                                          f                     RET(ID);            g
                                                                                                                                                                                                                                                                                                                                                                                                                                                       L                     I                     I                    I                     I                    I                     I                                                                                                                                                                                                                                              f                     RET(ID);            g
                                                                                                                                                                                                                                                                                                                                                                                                                                                       L                     I                     I                    I                     I                    I                     I                    I                                                                                                                                                                                                   f                     RET(ID);            g
                                                                                                                                                                                                                                                                                                                                                                                                                                                       L                     I*                                                                                                                                                                                                                                                                                                                                                                                                                                                f                      RET(ID);            g
                                                                                                                                                                                                                                                                   reduces            the            bumern           of             end-of-buer    kshecc            and             results           in              a              tsignican      speed           temenvimproervo             the              GLA-
                                                                                                                                                                                                                                                                   generated       scanner.
                                                                                                                                                                                                                                                                   6                                                                 Summary                  and             urther F           ork   W
                                                                                                                                                                                                                                                                   This            paper            has             described         RE2C,          a               tool            for            creating        lexical        analyzers.              eUnlik          other          hsuc             tools,          RE2C           con-
                                                                                                                                                                                                                                                                  tratescen        solely        on            generating   tecien        code            for          hingmatc     regular       expressions.         Not           only          does            this           singleness
                                                                                                                                                                                                                                                                   of              purpose           emak          RE2C            more            suitable        for             a               wider        yarietv         of              applications,  it             wsallo         it              to              generate          scanners
                                                                                                                                                                                                                                                                  hwhic        happroac      hand-crafted   scanners       in            terms         of           size           and          speed.               Compared     to            scanners        generated     yb            
ex,
                                                                                                                                                                                                                                                                   and           GLA,         RE2C-generated scanners       are            faster        and           in          yman        cases           smaller     as           ell.w
                                                                                                                                                                                                                                                                                                                                 While         RE2C-generated scanners        perform       ell,w         there           is            still         room          for         t.emenvimpro      Near           term        e-vimpro
                                                                                                                                                                                                                                                                  tsmen         include       using        GLA's         bit         ectorsv        to            simplify   some         switches      and          adding       a             state          unrolling   operator.
                                                                                                                                                                                                                                                                                                                                 In            the           longer        term,        inline       actions       will        be             added          to           RE2C.      orF          example,    a            specication    elik
                                                                                                                                                                                                                                                                                                                                                                                                                                                       D                     fc                   =                     $g                   (D                    fc                   =                     10*c                +                     $)*g
                                                                                                                                                                                                                                                                  tmigh          be                used            to              obtain          the            aluev           of              a               previously      scanned          teger.in              ,  ypicallyT  these             sorts             of              specications
                                                                                                                                                                                                                                                                  ouldw         be             used           as            an           action        in           some         other          specication.
                                                                                                                                                                                                                                                                   7                                                           tswledgmenknoAc
                                                                                                                                                                                                                                                                   The            authors       thank         the           referees        for           their        yman      aluablev    tscommen    and           suggestions.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             14
                                                                                                                                                                                                                                                                   A                                                                   C                     Scanner
                                                                                                                                                                                                                                                                   #define         BSIZE                                                                                                                                                                                           8192
                                                                                                                                                                                                                                                                   #define         RET(i)                                                                                                                                                                        {s->cur        =                cursor;        return          i;}
                                                                                                                                                                                                                                                                   #define         YYCTYPE                                                                                                                                                     uchar
                                                                                                                                                                                                                                                                   #define         YYCURSOR                                                                                                                                  cursor
                                                                                                                                                                                                                                                                   #define         YYLIMIT                                                                                                                                                     s->lim
                                                                                                                                                                                                                                                                   #define         YYMARKER                                                                                                                                  s->ptr
                                                                                                                                                                                                                                                                   #define         YYFILL(n)                                                                                                               {cursor        =                fill(s,        cursor);}
                                                                                                                                                                                                                                                                   typedef         struct         Scanner        {
                                                                                                                                                                                                                                                                                                                                          int                                                                                                                                                                                                                                                                                                        fd;
                                                                                                                                                                                                                                                                                                                                          uint                                                                                                                                                                                                                                                                                     line;
                                                                                                                                                                                                                                                                                                                                          uchar                                                                                                                                                                                                                                                                  *bot,           *tok,           *ptr,          *cur,           *pos,           *lim,           *top,           *eof;
                                                                                                                                                                                                                                                                   }                 Scanner;
                                                                                                                                                                                                                                                                   uchar          er*fill(Scann *s,              uchar           *cursor){
                                                                                                                                                                                                                                                                                                                                         ){if(!s->eof
                                                                                                                                                                                                                                                                                                                                                                                                                 uint           cnt             =                  s->tok         -               s->bot;
                                                                                                                                                                                                                                                                                                                                                                                                                 if(cnt){     /*               move             partial        token           to              bottom          */
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      ot,memcpy(s->bs->tok,        s->lim         -                 s->tok);       s->tok         =                s->bot;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       s->ptr          -=              cnt;            cursor          -=              cnt;            s->pos          -=               cnt;            s->lim          -=               cnt;
                                                                                                                                                                                                                                                                                                                                                                                                                 }
                                                                                                                                                                                                                                                                                                                                                                                                                pif((s->to     -                s->lim)        <                BSIZE){         /*              buffer          needs           to              be               expanded      */
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       uchar           *buf            =                 (uchar*)    m>limalloc(((s--                s->bot)        +            );ar)uchof(BSIZE)*size
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       memcpy(buf,   s->tok,        s->lim          -                s->tok);       s->tok         =                buf;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       s->ptr          =               r&buf[s->pt    -                s->bot];       cursor         =                or&buf[curs   -                 s->bot];
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       s->pos          =               s&buf[s->po    -                s->bot];       s->lim         =                im&buf[s->l   -                 s->bot];
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       s->top          =              ];IZE&s->lim[BS
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      );free(s->bot s->bot          =                buf;
                                                                                                                                                                                                                                                                                                                                                                                                                 }
                                                                                                                                                                                                                                                                                                                                                                                                                 if((cnt        =               ,read(s->fd    (char*)        s->lim,        BSIZE))        !=               BSIZE){        /*               EOF              */
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       s->eof          =               t];&s->lim[cn+*(s->eof)+    =                '\n';
                                                                                                                                                                                                                                                                                                                                                                                                                 }
                                                                                                                                                                                                                                                                                                                                                                                                                 s->lim         +=              cnt;
                                                                                                                                                                                                                                                                                                                                          }
                                                                                                                                                                                                                                                                                                                                          return          cursor;
                                                                                                                                                                                                                                                                   }
                                                                                                                                                                                                                                                                   int             erscan(Scann  *s){
                                                                                                                                                                                                                                                                                                                                                                                                                 uchar          *cursor         =               s->cur;
                                                                                                                                                                                                                                                                   std:                                                                  s->tok         =                cursor;
                                                                                                                                                                                                                                                                   /*!re2c
                                                                                                                                                                                                                                                                                                                                                                                                                 "/*"                                                                                                                                                        {                goto            comment;       }
                                                                                                                                                                                                                                                                   :     :     :        e mor         rules       :     :     :
                                                                                                                                                                                                                                                                                                                                                                                                                 [               \t\v\f]+                                          {                goto            std;             }
                                                                                                                                                                                                                                                                                                                                                                                                                 "\n"                                                                                                                                                        {                if(cursor      ==               s->eof)        RET(EOI);     s->pos          =                cursor;        s->line++;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        goto            std;             }
                                                                                                                                                                                                                                                                                                                                                                                                                7][\000-\37                     {             dctexpeprintf("unecharacter: '%c'\n",      *s->tok);
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        goto            std;             }
                                                                                                                                                                                                                                                                   */
                                                                                                                                                                                                                                                                   comment:
                                                                                                                                                                                                                                                                   /*!re2c
                                                                                                                                                                                                                                                                                                                                                                                                                 "*/"                                                                                                                                                        {                goto            std;             }
                                                                                                                                                                                                                                                                                                                                                                                                                 "\n"                                                                                                                                                        {                if(cursor      ==               s->eof)        RET(EOI);     s->tok          =                s->pos         =                 cursor;        s->line++;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        goto            comment;        }
                                                                                                                                                                                                                                                                                                                                                                                                                7][\000-\37                     {               goto            comment;        }
                                                                                                                                                                                                                                                                   */
                                                                                                                                                                                                                                                                   }
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             15
                                                                                                                                                                                                                                                                   References
                                                                                                                                                                                                                                                                                        [1]                 Aho,            A.               V.,             Sethi,           R.,               and              Ullman,           J.              D.                   Compilers:         principles,  chniques,te     and           ols.to              Addison-
                                                                                                                                                                                                                                                                                                                                                   , esleyW     1988.           tedReprin     with         corrections.
                                                                                                                                                                                                                                                                                        [2]                 Bernstein,      R.             L.                 Producing     good          code            for          the           case          t.statemene actice{PrSoftwarand    e eriencExp    15,
                                                                                                                                                                                                                                                                                                                                                        10           (October         1985),      1021{1024.
                                                                                                                                                                                                                                                                                        [3]                 DeRemer,          F.,              and             Pennello,          T.                 tEcien         computation  of            LALR(1)        look-ahead     sets.               CMA       ans-rT
                                                                                                                                                                                                                                                                                                                                                        actions         on       amminggroPr  anguagesL       and            Systems        4,            4             (October        1982),       615{649.
                                                                                                                                                                                                                                                                                        [4]                 Ellis,          M.,             and           up,oustrStr      B.                 The         d nnotateA      C++     e enceferR     Manual.     ,  esleyAddison-W1990.
                                                                                                                                                                                                                                                                                        [5]                 Fraser,         C.             W.,             and             Hanson,          D.              R.                A             retargetable compiler    for          ANSI          C.               SIGPLAN      esNotic       26,           10
                                                                                                                                                                                                                                                                                                                                                        (October        1991),       29{43.
                                                                                                                                                                                                                                                                                        [6]                 Garey,              M.               R.,               and                Johnson,            D.                S.                       Computers       and            actability:Intr      A                Guide            to              the            oryThe          of               NP-
                                                                                                                                                                                                                                                                                                                                                        Completeness.     W.            H.        reemanF      and       ,  yCompan 1991.
                                                                                                                                                                                                                                                                                        [7]              y, Gra        R.          W.          
 -GLA     -         A         generator for       lexical  analyzers that      programmerscan    use.        USENIX  e encConfer
                                                                                                                                                                                                                                                                                                                                              dingseeco Pr     (June          1988),      147{160.
                                                                                                                                                                                                                                                                                        [8]              y, Gra          R.             W.,            Heuring,       V.            P.,            Levi,           S.             P.,           ane,Slo          A.            M.,            and       aite,W         W.             M.              Eli:             A            complete,
                                                                                                                                                                                                                                                                                                                                                        
exible       compiler    construction   system.         ationsCommunic of              the           CMA           35,           2          ebruary(F     1992),       121{131.
                                                                                                                                                                                                                                                                                        [9]                osch,Gr           J.                 tEcien        generation     of            lexical      analysers.     e actice{PrSoftwarand      e eriencExp     19,            11            (1989),
                                                                                                                                                                                                                                                                                                                                                        1089{1103.
                                                                                                                                                                                                                                                                   [10]                Harrison,       M.              A.            ductionoIntr    to          ormalF      anguageL      ory.The     ,  esleyAddison-W1978.
                                                                                                                                                                                                                                                                   [11]                Hennessy,      J.           L.,            and           Mendelsohn,     N.             Compilationof      the         Pascal      case       t.statemeneactice{PrSoftwar
                                                                                                                                                                                                                                                                                                                                                        and        e eriencExp     12,           9            b(Septemer      1982),      879{882.
                                                                                                                                                                                                                                                                   [12]                Horspool,       R.              N.,            and            Whitney,        M.              enEv          faster        LR          parsing.  e actice{PrSoftwarand   e eriencExp    20,
                                                                                                                                                                                                                                                                                                                                                        6            (1990),       515{535.
                                                                                                                                                                                                                                                                   [13]               cobson,Ja        V.              uningT        UNIX          Lex            or            it's         NOT            true           what          they          ysa           about          Lex.               In            USENIX      e encConfer
                                                                                                                                                                                                                                                                                                                                              dingseeco Pr  ashington,(WDC,         terWin        1987),       pp.          163{164.        Abstract     .  only
                                                                                                                                                                                                                                                                   [14]                Kernighan,           B.                 W.,                 and                 Ritchie,           D.                  M.                          The               C         amminggroPr    anguage,L          2nd                Ed.                        tice-Hall,Pren
                                                                                                                                                                                                                                                                                                                                                        Inc.,        1988.
                                                                                                                                                                                                                                                                   [15]                Lesk,                    M.                     E.                                    LEX                 {                   a                   lexical             analyzer            generator.                            Computing         Science           hnicalecT           Report                39,                  Bell
                                                                                                                                                                                                                                                                                                                                                     elephoneT     Laboratories, yMurra        Hill,       NJ,          1975.
                                                                                                                                                                                                                                                                   [16]            axson,P           V.                 
ex          {             man         pages,        1988.            In           r.Z.
ex-2.3.7.taailableAvfor       ymousanon   ftp           from        ftp.uu.net     in
                                                                                                                                                                                                                                                                                                                                                        ages/gnu./pack
                                                                                                                                                                                                                                                                   [17]                Pennello,               T.                   J.                            eryV               fast               LR                parsing.                         In       dingseeco Pr       of                the                 CMA                SIGPLAN'86        osiumSymp           on
                                                                                                                                                                                                                                                                                                                                                        Compiler      Construction     (July        1986),       CM.A
                                                                                                                                                                                                                                                                   [18]                Sale,               A.                       The            ntatioimplemenof             case            tsstatemen       in             Pascal.            e actice{PrSoftwarand         e eriencExp      11,             9
                                                                                                                                                                                                                                                                                                                                                        b(Septemer    1981),       929{942.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             16
