How does Falcon’s compile-time regular expression compilation compare with the library routines of other languages?
Compile-time regular expressions were actually scheduled to be in by now (April 2009), but the 0.9 release took a bit longer than planned, and this pushed compile-time Regex development back somewhat. The topic is interesting anyway, because it lets me show three binding-level mechanisms that may be useful to users of the scripting engine.
The plan is as follows. Falcon's modular system provides C++ with a mechanism called a “service”. A service is a virtual class that publishes to C++ what the module would publish to the scripts loading it. Since 0.8.12, the Falcon compiler has included a meta-compiler that fires up a complete virtual machine on request. Once we accept the idea of meta-compilation, the compiler may also use the environment settings to load the Regex module and call its methods through the service interface; that is exactly like calling the C functions directly, with just one layer of virtual-call indirection (which is totally irrelevant in the context of compiling a regular expression).
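To make the service idea concrete, here is a minimal sketch of how a compiler could reach a module's functionality through a virtual interface. Every name in it (`Service`, `RegexService`, `ServiceRegistry`, `checkRegexLiteral`) is hypothetical and not Falcon's actual API, and `std::regex` stands in for PCRE. It also shows the case where the Regex module is not provided, in which the compiler simply reports an error for an r"..." literal.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <regex>
#include <string>

// Hypothetical base class: a "service" is a virtual interface that a
// module publishes to C++ code (not Falcon's real class hierarchy).
class Service {
public:
    explicit Service(std::string name) : name_(std::move(name)) {}
    virtual ~Service() = default;
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// Hypothetical interface the Regex module would publish.
class RegexService : public Service {
public:
    RegexService() : Service("Regex") {}
    // Returns true if the pattern compiles; otherwise fills err.
    virtual bool validate(const std::string& pattern, std::string& err) const = 0;
};

// Module-side implementation; std::regex stands in for PCRE here.
class StdRegexService : public RegexService {
public:
    bool validate(const std::string& pattern, std::string& err) const override {
        try {
            std::regex re(pattern);
            return true;
        } catch (const std::regex_error& e) {
            err = e.what();
            return false;
        }
    }
};

// Hypothetical registry of services published by modules loaded in the
// compiler's meta-compilation VM.
class ServiceRegistry {
public:
    void publish(std::unique_ptr<Service> s) {
        const std::string key = s->name();
        services_[key] = std::move(s);
    }
    Service* find(const std::string& name) const {
        auto it = services_.find(name);
        return it == services_.end() ? nullptr : it->second.get();
    }
private:
    std::map<std::string, std::unique_ptr<Service>> services_;
};

// Compile-time check of an r"..." literal: a lookup plus one virtual call.
bool checkRegexLiteral(const ServiceRegistry& reg, const std::string& pattern,
                       std::string& err) {
    auto* rx = dynamic_cast<RegexService*>(reg.find("Regex"));
    if (!rx) {
        err = "Regex module not available";
        return false;
    }
    return rx->validate(pattern, err);
}
```

The only runtime cost over a direct C call is the virtual dispatch inside `validate`, which is negligible next to compiling the pattern itself.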
Since 0.9, items themselves are in charge of resolving operators through a function vector called item_co (item common operations). We may either introduce a new item type for strings generated as compile-time regular expressions and give it an item_co table partially derived from that of the ordinary string item type, or simply create a string with a special marker (we already have string subtypes) and branch on it in the equality and relational operators. On modern systems a branch may cost more than a simple call in terms of CPU and memory time, so I would probably go for adding an item type (which would also be useful at script level to detect these special strings and handle them differently).
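The following sketch illustrates the "new item type with a partially derived operation table" option. It is not Falcon's real item_co layout: `Item`, `ItemCommonOps`, `kOps` and the two item types are hypothetical, and the rule that `==` between a regex item and a plain string means "matches" is an assumption made purely for illustration. The point is that the VM dispatches with one indexed indirect call instead of branching on a string-subtype marker.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical item types; a real engine has many more.
enum ItemType { IT_STRING = 0, IT_REGEX = 1, IT_COUNT };

struct Item {
    ItemType type;
    std::string text;  // literal text, or pattern source for IT_REGEX
};

using BinaryOp = bool (*)(const Item&, const Item&);

// Hypothetical per-type table of common operations.
struct ItemCommonOps {
    BinaryOp equal;
    BinaryOp less;
};

// Plain string semantics.
bool stringEqual(const Item& a, const Item& b) { return a.text == b.text; }
bool stringLess(const Item& a, const Item& b) { return a.text < b.text; }

// Assumed regex semantics: equality against a string means "matches".
// A real implementation would store the pre-compiled pattern instead of
// rebuilding it on every call.
bool regexEqual(const Item& a, const Item& b) {
    if (b.type == IT_REGEX) return a.text == b.text;  // compare patterns
    return std::regex_search(b.text, std::regex(a.text));
}

// One table per item type; the regex table reuses the string "less".
const ItemCommonOps kOps[IT_COUNT] = {
    /* IT_STRING */ {stringEqual, stringLess},
    /* IT_REGEX  */ {regexEqual, stringLess},
};

// Dispatch on the left operand's type: no subtype branch in the operator.
bool opEqual(const Item& a, const Item& b) { return kOps[a.type].equal(a, b); }
bool opLess(const Item& a, const Item& b) { return kOps[a.type].less(a, b); }
```

Because the regex table is built from the string table, only the operations whose meaning really changes need new code.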
The fact that we want to use the Regex module at compile time is another interesting point for embedders. If we included regular expressions in the main engine, we would make it bigger still, and we would take away the embedders' ability to disable this feature.
One of the reasons I wanted Falcon was to allow foreign, less-trusted scripts to be compiled remotely and sent in pre-compiled form to a server for remote execution. The executing server may want to disable some features for security reasons (for example, it may forbid file I/O), and only on some unprivileged VMs, while the administrative scripts run at full power. That was impossible to do with other scripting engines without deep rewrites. Falcon's modular system allows the application to inspect and modify modules before injecting them into the VMs that request them. So a server or application with script areas of different privilege levels can pre-load and reconfigure the modules it wants scripts to be able to use, preventing the loading of any other module, while letting privileged scripts run unhindered.
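A minimal sketch of the privilege scheme described above, assuming a much simplified model in which a module is just a name plus the symbols it exports. `Module`, `RestrictedLoader` and `withoutSymbols` are hypothetical names, not Falcon classes. The application pre-loads modules once, hands an unprivileged VM a reconfigured copy with file I/O symbols stripped, and the VM's loader refuses anything that was not injected.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>

// Hypothetical stand-in for a pre-loaded module.
struct Module {
    std::string name;
    std::set<std::string> exports;
};

using ModulePtr = std::shared_ptr<const Module>;

// Hypothetical per-VM loader: it only hands out modules that the
// application injected beforehand; any other request is refused.
class RestrictedLoader {
public:
    void inject(ModulePtr m) {
        const std::string key = m->name;
        allowed_[key] = std::move(m);
    }
    ModulePtr load(const std::string& name) const {
        auto it = allowed_.find(name);
        if (it == allowed_.end()) return nullptr;
        return it->second;
    }
private:
    std::map<std::string, ModulePtr> allowed_;
};

// Builds a reconfigured copy of a module with some symbols removed.
ModulePtr withoutSymbols(const Module& m, const std::set<std::string>& banned) {
    auto copy = std::make_shared<Module>(m);
    for (const auto& sym : banned) copy->exports.erase(sym);
    return copy;
}
```

Privileged VMs can share the very same pre-loaded instance, while unprivileged ones receive only the trimmed copy.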
Regexes are heavy, and not every embedding application may want its scripts to use them. For example, an MMORPG application may decide that AI bots have no use for regular expressions and not provide the Regex module. In that case, the compiler would simply raise an error if it found an r"..." string in a source, and the VM would raise an error if it had to deal with a pre-compiled Regex in a module. At the same time, since the Regex module is mandatory in any complete command-line Falcon installation, command-line scripts can use Regexes at the sole extra cost of dynamically loading the Regex module, which is irrelevant in a complete Falcon application and would be cached under repeated usage patterns, as with the Web server modules.
Do you plan to develop your own REGEX library to drive your regular expressions?
No, we're happy with PCRE, which in our opinion is the best library around; even if it's relatively large, keeping it in a separate module loaded on demand seems the way to go. We keep as up to date as possible with its development, binding natively to the system PCRE where one is available (as in many Linux distributions) and shipping it directly in the module code where it is not.
Is the embeddable aspect of Falcon versatile?
I talked at length about that in the Regex example above, but beyond the reconfigurability and the sharing of pre-loaded modules across the application's VMs, there is more. The VM itself has many virtual methods that the target application can override, and it is light enough to allow a one-VM-per-script model. Heavy scripts can have their own VM in the target application, each happily running in its own thread; yet VMs can also be recycled by unlinking the scripts that have just run and linking new ones, keeping the existing modules so that they are already available to the scripts that need them.
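The recycling pattern can be sketched as follows, covering only the unlink/relink part and not threading. `RecyclableVm` and its methods are hypothetical illustrations, not the Falcon VM interface: scripts are dropped after each run, while library modules stay linked, so the second script that requires the same module does not trigger a new load.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: a VM keeps library modules linked across runs
// and drops only the scripts it has just executed.
class RecyclableVm {
public:
    void linkScript(const std::string& name) { scripts_.push_back(name); }
    void unlinkScripts() { scripts_.clear(); }

    // Resolves a library module, loading it only if not already linked.
    void require(const std::string& name) {
        if (libraries_.find(name) == libraries_.end()) {
            libraries_[name] = ++loads_;  // simulate an expensive load
        }
    }

    int loadCount() const { return loads_; }
    std::size_t scriptCount() const { return scripts_.size(); }

private:
    std::map<std::string, int> libraries_;  // module name -> load sequence
    std::vector<std::string> scripts_;
    int loads_ = 0;
};
```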
The VM can also interact with the embedding application through periodic callbacks and sleep requests. For example, a flag can be set so that every sleep request for which the VM cannot swap in a coroutine that is ready to run is passed to the calling application, which can decide to use the idle time as it sees fit. This allows spectacular quasi-parallel effects in the FXChat binding, for instance, where the sleep() function lets XChat proceed. This may seem a secondary aspect, but other engines are actually very closed in this respect: once you start a script or issue a callback, all the application can do is hope that it ends soon. With Falcon you can interrupt the target VM with simple requests that will be honoured as soon as possible, inspect it, and then resume it from the point where it was suspended.
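Here is a simplified sketch of the two cooperation points just described: forwarding idle sleep time to the host, and an interrupt request checked at safe points so the VM can stop and later resume where it left off. `HostAwareVm`, `setIdleHandler`, `sleepRequest` and `requestInterrupt` are hypothetical names; the real engine's hooks and flags will differ.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical sketch of host/VM cooperation around idle time.
class HostAwareVm {
public:
    using IdleHandler = std::function<void(double seconds)>;

    void setIdleHandler(IdleHandler h) { onIdle_ = std::move(h); }

    // May be called from another thread; honoured at the next safe point.
    void requestInterrupt() { interrupt_.store(true); }

    // Called when a script sleeps and no other coroutine is ready:
    // the host decides how to use the idle time.
    void sleepRequest(double seconds) {
        if (onIdle_) onIdle_(seconds);
    }

    // Executes up to 'steps' instructions, checking the interrupt flag
    // before each one; returns how many actually ran. The program
    // counter is kept, so a later call resumes from the same point.
    int run(int steps) {
        int done = 0;
        for (; done < steps; ++done) {
            if (interrupt_.exchange(false)) break;
            ++pc_;  // stand-in for executing one instruction
        }
        return done;
    }

    int pc() const { return pc_; }

private:
    IdleHandler onIdle_;
    std::atomic<bool> interrupt_{false};
    int pc_ = 0;
};
```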
Since 0.9 we have also introduced a customizable object model. Falcon instances need not be full-blown Falcon objects; the application may provide its own mapping from its data to the items travelling through the Falcon VM. Compare this with having to create a dictionary for each new instance and map each property to a function retrieving data from the host program or from the bound library.
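As an illustration of the difference, here is a sketch in which the VM asks an object for its properties instead of owning a per-instance dictionary. `Value`, `ObjectModel`, `Player` and `PlayerObject` are hypothetical; they show the general adapter idea rather than Falcon's actual object-model classes. Property reads and writes go straight to the host's own struct, so no data is copied into the script side.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical value type travelling through the VM.
using Value = std::variant<long, std::string>;

// Hypothetical object-model interface: the VM asks the object for its
// properties instead of keeping a dictionary per instance.
class ObjectModel {
public:
    virtual ~ObjectModel() = default;
    virtual Value get(const std::string& prop) const = 0;
    virtual void set(const std::string& prop, const Value& v) = 0;
};

// Host-side data the application already owns.
struct Player {
    long score;
    std::string nick;
};

// Adapter: maps properties directly onto the host struct (by reference).
class PlayerObject : public ObjectModel {
public:
    explicit PlayerObject(Player& p) : p_(p) {}

    Value get(const std::string& prop) const override {
        if (prop == "score") return p_.score;
        if (prop == "nick") return p_.nick;
        throw std::out_of_range("no property " + prop);
    }

    void set(const std::string& prop, const Value& v) override {
        if (prop == "score") p_.score = std::get<long>(v);
        else if (prop == "nick") p_.nick = std::get<std::string>(v);
        else throw std::out_of_range("no property " + prop);
    }

private:
    Player& p_;
};
```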
Other classes you can override are the module loader, which may provide Falcon modules from other types of sources or from internal storage in embedding applications, and, since 0.9, the URI providers. Modules and embedding applications can register their own URI providers, so that opening a module in the app:// space turns into a request for a module from an internally provided resource, and opening a stream on app:// from a script makes it possible to exchange binary data via streams with other parts of the application.
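A minimal sketch of scheme-based dispatch, assuming a provider is just a callback that turns the path part of a URI into an input stream. `UriProvider` and `UriRegistry` are hypothetical names, not Falcon's URI provider API; the sketch only shows how registering a handler for `app` lets `app://...` requests be served from resources inside the application.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical provider: turns the path part of a URI into a stream.
using UriProvider =
    std::function<std::unique_ptr<std::istream>(const std::string& path)>;

class UriRegistry {
public:
    void registerProvider(const std::string& scheme, UriProvider p) {
        providers_[scheme] = std::move(p);
    }

    // Splits "scheme://path" and dispatches to the registered provider.
    std::unique_ptr<std::istream> open(const std::string& uri) const {
        auto sep = uri.find("://");
        if (sep == std::string::npos) {
            throw std::invalid_argument("no scheme: " + uri);
        }
        auto it = providers_.find(uri.substr(0, sep));
        if (it == providers_.end()) {
            throw std::runtime_error("unknown scheme: " + uri);
        }
        return it->second(uri.substr(sep + 3));
    }

private:
    std::map<std::string, UriProvider> providers_;
};
```

In this sketch the application might back `app://` with an in-memory table of resources; a real provider could equally return a pipe to another subsystem.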
Frankly, we did our best to make our engine the most versatile around. They say Lua is very versatile, since you can reprogram it as you wish. But then, that is true of any open source project.