Perl Internals
Contents:
Reading the Source
Architecture
Perl Value Types
Stacks and Messaging Protocol
Meaty Extensions
Easy Embedding API
Peek into the Future
Resources
It cannot be seen, cannot be felt,
Cannot be heard, cannot be smelt.
It lies behind stars and under hills,
And empty holes it fills.[1]
- J.R.R. Tolkien, The Hobbit
[1] Answer: dark.
This chapter is a modest attempt to shed light on most of the critical data structures and functions of the Perl interpreter. Getting a handle on such (admittedly dry) detail will give you the confidence to write powerful extensions, and will inform your judgment about how (and how much) Perl should be used in a given application. The mark of a good Perl developer is the ability to answer questions that don't figure in the Frequently Asked Questions list, such as these:
- Why are objects so much preferable to closures?
- Why is "my" faster than "local"?
- The easy-embedding API presented in the last chapter isn't convenient enough. How can I roll my own?
- What do xsubpp and SWIG really produce?
- Why not join the Java revolution by making the Perl interpreter output Java byte-code?
And so on. All you require is fluency in C, an enquiring mind, and a comfortable chair.
If you are into instant gratification and can't wait to churn out a cool extension, you may opt for the low-fat thread running through this chapter; read the following sections: "Perl Value Types," "Stacks and Messaging Protocol," and "Meaty Extensions." You can definitely skip all sections entitled "Inside..." on a first reading without loss of continuity.
Reading the Source
There was this developer, the story goes, who was deeply mystified by a piece of code. It had no comments at all, and he couldn't for the life of him figure out how it did what it did. For years, he cursed the author of that code, but it continued to fascinate and trouble him. One day, it came to him in a flash. He understood it all. In fact, it was so obvious that he also understood why it didn't need any comments!
While the Perl source may be the final repository of all answers, it is a fairly reluctant informant. A lack of comments, generous use of macros, and some breath-taking optimizations make understanding the code a rather forbidding task, even for the true die-hard. If you are one of those who just want to hack it and achieve all kinds of greatness, this chapter should get you adequately primed. In addition, here are some ways of understanding the system better:
- The -D option
  Perl can be optionally compiled with the -DDEBUGGING option, which enables the -D command-line switch. This switch takes several flags, all of which are documented in the perlrun document. Like a CAT scan, these flags provide unobtrusive snapshots of important structures at run time. For example, invoking Perl as perl -Dts tells it to display a trace of opcode execution (the t flag) and to dump the argument stack before each opcode is executed (the s flag).
- Devel tools
  Three modules available under the Devel hierarchy on CPAN provide script-level access to some important data structures. These are Devel::Peek (to dump internal information associated with a variable), Devel::Symdump (to dump the symbol table), and Devel::RegExp (to examine a regular expression). We will use the Devel::Peek module often in this chapter.
- Debugger (gdb, dbx, Microsoft Developer Studio)
  Examining Perl under a debugger gives a firsthand view of the entire process. At run time, the process goes through three major phases: initialization, parsing, and execution; these can be examined quite independently. I suggest that you understand Perl value types and the stack protocol first, then attempt to understand the execution phase by setting a breakpoint at run.c:runops,[2] and proceed from there. The parser and code generator are the most complex parts of the tool; I recommend that you attempt to understand them only after you are comfortable with the rest of the system. Incidentally, tools such as cxref do not help much because most interesting accesses are hidden by macros, casts, and pointer indirections, so single-stepping with a source-level debugger is often the sole option.
[2] I did mean true die-hard. 
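Because the -D switch only works when the interpreter itself was compiled with -DDEBUGGING, it can be worth checking your build before trying perl -Dts. The sketch below is a heuristic, not an official test: it assumes that a debugging build records -DDEBUGGING in the C compiler flags exposed by the standard Config module, which is the usual case but not guaranteed. The function name built_with_debugging is made up for this illustration.

```perl
use strict;
use warnings;
use Config;    # standard module describing how this perl was built

# Hypothetical helper: returns 1 if the compiler flags recorded at build
# time mention -DDEBUGGING, and 0 otherwise. This is only a heuristic.
sub built_with_debugging {
    my $ccflags = $Config{ccflags} // '';
    return $ccflags =~ /-DDEBUGGING\b/ ? 1 : 0;
}

if (built_with_debugging()) {
    # $^X is the path of the perl binary running this script.
    print "Try: $^X -Dts -e 'print 1+2'\n";
}
else {
    print "This perl does not appear to be a DEBUGGING build; ",
          "-D tracing is unavailable.\n";
}
```

If the check fails, the remaining tools in the list above (the Devel modules and a source-level debugger) still apply.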
This chapter makes frequent references to source files, and while you may find it handy to have them in front of you, it is by no means necessary to do so.
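As a first taste of the script-level inspection mentioned above, here is a minimal sketch that uses Devel::Peek. It assumes the module is installed (it has shipped with the core Perl distribution since 5.6), and the variable names are purely illustrative. The exact layout of the output differs between Perl releases, so none is reproduced here.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

# Illustrative variables; the names are arbitrary.
my $count = 42;         # an integer value
my $name  = "camel";    # a string value
my $ref   = \$count;    # a reference to $count

# Dump() writes a description of each value's internal structure to STDERR.
Dump($count);
Dump($name);
Dump($ref);
```

Each dump includes, among other fields, the value's reference count (REFCNT) and its flags (FLAGS).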