GHC’s renamer

GHC Contributors’ Workshop

Sam Derbyshire, Well-Typed

June 7th, 2023

Plan

  • First part: survey of the renamer and how information about identifiers flows through the compiler pipeline.
  • Second part: getting our hands dirty, fixing a bug in the renamer (interaction of COMPLETE sets of pattern synonyms and do notation).

What does the renamer do?

  1. Gives all names unique identifiers.
    • Resolve namespacing (type constructor vs variable, module qualification, record fields).
    • Handle shadowing.
  2. Usage & dependency analysis.
    • Redundant/unused imports, unused declarations, …
    • Dependency analysis: determines order of typechecking.

Names

There are many ways that GHC can refer to an identifier depending on how much it knows about it.

  • OccName
  • Name
  • RdrName
  • GlobalRdrElt
  • Var, TyVar, Id

OccName: occurrences with a NameSpace

data OccName = OccName
    { occNameSpace :: NameSpace
    , occNameFS    :: FastString }

data NameSpace
  = VarName
  | FldName { fldParent :: FastString }
  | DataName
  | TvName
  | TcClsName

newtype OccEnv a = MkOccEnv (FastStringEnv (UniqFM NameSpace a))

Name: unique identifiers

data Name = Name
  { n_sort :: NameSort
  , n_occ  :: OccName
  , n_uniq :: Unique
  , n_loc  :: SrcSpan
  }

data NameSort
  = External Module
  | WiredIn Module TyThing BuiltInSyntax
  | Internal
  | System

RdrName: umbrella datatype

data RdrName
  = Unqual OccName
  | Qual ModuleName OccName
  | Orig Module OccName
  | Exact Name

Var and Id: variables

data Var -- slightly abridged
  = TyVar { varName    :: Name
          , varType    :: Kind }
  | TcTyVar { varName       :: Name
            , varType       :: Kind
            , tc_tv_details :: TcTyVarDetails }
  | Id { varName    :: Name
       , varType    :: Type
       , varMult    :: Mult
       , idScope    :: IdScope
       , id_details :: IdDetails
       , id_info    :: IdInfo
       }
data IdScope = GlobalId | LocalId ExportFlag
data ExportFlag = NotExported | Exported
data IdDetails = VanillaId | RecSelId {} | PrimOpId {} | CoVarId {} | ...

type TyVar = Var
type Id = Var

Information about Names: TyThing and GREInfo

data TyThing
  = AnId     Id
  | AConLike ConLike
  | ATyCon   TyCon
  | ACoAxiom (CoAxiom Branched)
data GREInfo
  = Vanilla
  | IAmTyCon    (TyConFlavour Name)
  | IAmConLike  ConInfo
  | IAmRecField RecFieldInfo

GlobalRdrElt and the GlobalRdrEnv

The renamer primarily deals with GlobalRdrElt, which consists of a Name, information about how it’s in scope in the renamer, and additional information that the renamer might need to know.

data GlobalRdrElt
  = GRE { gre_name :: Name
        , gre_par  :: Parent
        , gre_lcl  :: Bool
        , gre_imp  :: Bag ImportSpec
        , gre_info :: GREInfo }

type GlobalRdrEnv = OccEnv [GlobalRdrElt]

The TcRn monad

Renaming and typechecking happens in a shared monad, TcM (also called TcRn or RnM).

type TcRnIf a b = IOEnv (Env a b)
type TcRn = TcRnIf TcGblEnv TcLclEnv
type TcM = TcRn
type RnM = TcRn

This is ReaderT over IO, with access to:

  • HscEnv – per-module options (e.g. flags passed to GHC) and environment
    e.g. UnitEnv, currently loaded modules;
  • TcGblEnv – generated during typechecking and passed on
    e.g. TypeEnv, InstEnv, GlobalRdrEnv;
  • TcLclEnv – changes as we move inside expressions
    e.g. SrcSpan, TcLevel, LocalRdrEnv.

Example: renaming a record field update

Start by looking at rnExpr (RecordUpd {..}):

rnExpr (RecordUpd { rupd_expr = L l expr, rupd_flds = rbinds })
  = setSrcSpanA l $
    case rbinds of
      RegularRecUpdFields { recUpdFields = flds } ->
        do  { (e, fv_e) <- rnExpr expr
            ; (parents, flds, fv_flds) <- rnHsRecUpdFields flds
            ; ... }
      -- ...

Looking up fields

After handling duplicates, rnHsRecUpdFields starts by calling lookupRecUpdFields to look up each field individually.

lookupRecUpdFields :: NE.NonEmpty (LHsRecUpdField GhcPs GhcPs)
                   -> RnM (NE.NonEmpty (HsRecUpdParent GhcRn))
lookupRecUpdFields flds
  = do { gre_env <- getGlobalRdrEnv
       ; fld1_gres NE.:| other_flds_gres <-
           mapM (lookupFieldGREs gre_env . getFieldUpdLbl) flds
       ; -- ...
       }

We retrieve GlobalRdrElts for each record field.
These have GREInfos which specify which constructors have that field.
We can use that to disambiguate.

Disambiguating using constructors

data S = MkS1 { x, y :: Float } | MkS2 { x, y, z :: Float }
data T = MkT1 { x :: Word } | MkT2 { y :: Int }

foo r = r { x = 3, y = 4 }

Lookup: [(x, [MkS1, MkS2, MkT1]), (y, [MkS1, MkS2, MkT2])].

Intersect all the possible data constructors: [MkS1, MkS2].

Take parents (removing duplicates): [S].

There is a single parent: the record update is unambiguous!

Flow through the compiler & trees that grow

type family IdP p
type LIdP p = XRec p (IdP p)
type family XRec p a
type instance IdP (GhcPass p) = IdGhcP p
type family IdGhcP pass where
  IdGhcP 'Parsed      = RdrName
  IdGhcP 'Renamed     = Name
  IdGhcP 'Typechecked = Id
type instance XRec (GhcPass p) a = GenLocated (Anno a) a

data GenLocated l e = L l e

type instance Anno RdrName = SrcSpanAnnN -- SrcSpan, but with
type instance Anno Name    = SrcSpanAnnN -- some extra stuff
type instance Anno Id      = SrcSpanAnnN -- for exact-print

Lexing

We start off by lexing. See GHC.Parser.Lexer.x:

@varid  = $small $idchar*          -- variable identifiers
@conid  = $large $idchar*          -- constructor identifiers
@varsym = ($symbol # \:) $symbol*  -- variable (operator) symbol
@consym = \: $symbol*              -- constructor (operator) symbol

$small = [a-z \_]
$large = [A-Z]
$symbol = [\!\#\$\%\&\*\+\.\/\<\=\>\?\@\\\^\|\-\~\:]
$idchar = [$small $large $digit \']

Parsing

Then parsing, in GHC.Parser.y.

  • Parse occurrences into RdrName.
    • Use mkQual/mkUnqual depending on qualification.
    • Resolve NameSpace using context (e.g. whether we are parsing a term or a type).
  • Fix up after the fact when necessary.

Example: parsing a type constructor.

tyconsym :: { RdrName } -- (actually LocatedN RdrName)
  : CONSYM { mkUnqual tcClsName (getCONSYM $1) }
  | VARSYM { mkUnqual tcClsName (getVARSYM $1) }
  | ':'    { consDataCon_RDR }
  | '-'    { mkUnqual tcClsName (fsLit "-") }
  | '.'    { mkUnqual tcClsName (fsLit ".") }

Quick example of fixing up

data D a b c d = Q a => b :+: c :*: d

Parse this as a type first. Then, after properly associating using the fixities, we find that :+: should be namespaced as a data constructor, and the others remain type constructors.

We also need to resolve ambiguities between expression and patterns. See Note [Ambiguous syntactic categories] in GHC.Parser.PostProcess.

The renamer & typechecker pipeline

The main entry point is GHC.Tc.Module.tcRnModuleTcRnM.

  1. Rename the import list in tcRnImports.
  2. Rename and typecheck local declarations and the export list in tcRnSrcDecls.
    1. Rename local declarations in rnTopSrcDecls.
    2. Typecheck these declarations in tcTopSrcDecls.
    3. Rename the exports in rnExportList. This allows us to assemble a final TcGblEnv which contains everything provided by the module.
  3. Assemble the final typechecked module, to be passed onto the next stage of the compiler pipeline. Everything is extracted from the TcGblEnv.

Renaming imports

Entry point: GHC.Rename.Names.rnImportDecl.

  1. Load the module we’re importing; this will load its interface file from disk, or directly load the information if it’s available in memory.
  2. Figure out what we are importing, in filterImports.
    • For a blanket import, import everything.
    • For an explicit import list, accumulate everything that is mentioned.

      import M( A(x, ..), B(..) )

    • import M hiding ( .. )

      Works the same, except now we filter out instead.

  3. Add all the imported identifiers to the GlobalRdrEnv. We will look up in this environment when renaming the body of the module.

Renaming local definitions

The main entry point to renaming local declarations is GHC.Rename.Module.rnSrcDecls. Control flow:

  1. Generate new Names for all binders.
    1. “Non-value” binders getLocalNonValBinders.
      1. Type synonyms, type families, data declarations, data families, class declarations.
      2. Data family instances and class instances, including associated types and methods.
      3. Foreign imports.
    2. Pattern synonyms extendPatSynEnv.
    3. Generate new Names and rename the LHS of top-level value bindings (rnTopBindsLHS).

  2. Rename declaration bodies.
    1. Type declarations rnTyClDecls.
      1. Type synonyms, type families, data declarations, data families, class declarations.
      2. Standalone kind signatures.
      3. Type family instances, data family instances, class instances.
      4. Role annotations.
    2. Top-level value bindings rnValBindsRHS.
    3. Everything else: RULES, foreign import/exports, default declarations…

Dependency analysis of type declarations

rnTyClDecls - renames all types/classes defined in the module - uses this information to compute dependency groups (strongly-connected components).

rnTyClDecls :: [TyClGroup GhcPs] -> RnM ([TyClGroup GhcRn], FreeVars)
rnTyClDecls tycl_ds =
  do { tycls_w_fvs  <- mapM rnTyClDecl (tyClGroupTyClDecls tycl_ds)
     ; kisigs_w_fvs <- rnStandaloneKindSigs (tyClGroupKindSigs tycl_ds)
     ; instds_w_fvs <- mapM rnSrcInstDecl (tyClGroupInstDecls tycl_ds)
     ; let tycl_sccs = depAnalTyClDecls kisig_fv_env tycls_w_fvs
     ; .. }

See Note [Dependency analysis of type, class, and instance decls] in GHC.Rename.Module.


This approach is known to have shortcomings, as it doesn’t properly account for type instances when kind-checking.

See wiki page: Type-&-Class-Dependency-Analysis.

Dependency analysis of value declarations

data HsBindLR idL idR
  = FunBind {..} -- used for function/variable bindings
  | PatBind {..} -- (pattern is never a simple variable)
  | VarBind {..} -- (introduced by the typechecker)
  | PatSynBind {..}
  | XHsBindsLR !(XXHsBindsLR idL idR) -- (introduced by the typechecker)

rnValBindsRHS - renames all value bindings - performs dependency analysis

rnValBindsRHS ctxt (ValBinds _ mbinds sigs)
  = do { (sigs', sig_fvs) <- renameSigs ctxt sigs
       ; binds_w_dus <- mapBagM (rnLBind (mkScopedTvFn sigs')) mbinds
       ; let !(anal_binds, anal_dus) = depAnalBinds binds_w_dus
       ; ... }

Topics not covered

  • Template Haskell
  • Backpack

Slides available online: sheaf.github.io/ghc-renamer.