mod_rewrite & .htaccess
To achieve the centralised (it’s not a typo, I’m an Aussie) parser outlined in the framework architecture, we need to direct all relevant requests to index.php and block direct requests to any PHP page. Apache’s mod_rewrite is a perfect tool for this. I will assume that you have some knowledge of regular expressions (there are great cheat sheets available for both mod_rewrite and regular expressions).
Here we will be dealing with the path of the URL (the part that tells the server what page is being requested). This is generally separated into a directory structure by forward slashes (/). The URL conventions are:
- Allowed characters are alpha (a-z), numeric (0-9), hyphen (-) and underscore (_)
- A trailing slash (/) will be treated as if the directory index has been requested
- The directory index is called main
- At least one page or directory is required
- URLs have no extension
The first rule matches a URL made up of one mandatory page or directory followed by zero or more further path segments, with no trailing slash:
RewriteRule ^([a-z0-9_-]+)((\/[a-z0-9_-]+)*)$ index.php?page=$1$2 [NC,L,QSA]
The first set of parentheses matches the mandatory page or directory. The second set uses almost the same pattern, except that every remaining part must be prefixed with a slash (/). The two captures are then concatenated into a single string and passed to index.php for handling. Note that the rule is not case-sensitive (NC), it is the last rule applied when it matches (L) and any existing query string is appended (QSA).
A couple of examples:
- my-path is rewritten as index.php?page=my-path
- directory/structure/to/my-path is matched as $1 = directory while $2 = /structure/to/my-path and is rewritten as index.php?page=directory/structure/to/my-path
This second rule is almost identical to the first. The only difference is the inclusion of a trailing slash which, as described in the conventions, is a request for the directory index called main.
RewriteRule ^([a-z0-9_-]+)((\/[a-z0-9_-]+)*)\/$ index.php?page=$1$2/main [NC,L,QSA]
An example:
- directory/ is equivalent to directory/main and is rewritten as index.php?page=directory/main
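As a rough sketch, the two rules could sit together in a single .htaccess along these lines. It assumes mod_rewrite is enabled and that the file lives in the same directory as index.php. The RewriteCond and [F] pair that refuses direct requests for PHP files is an assumption about one way to do the blocking mentioned at the start, not something taken from the two rules described here.

```apache
# Sketch only: assumes mod_rewrite is enabled and that this file
# sits in the same directory as index.php.
RewriteEngine On

# Assumption (not one of the two rules from the article): refuse any request
# whose original path, as sent by the client, ends in .php.
# THE_REQUEST holds the raw request line, so internal rewrites to index.php
# are not affected.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^?\s]*\.php[?\s] [NC]
RewriteRule ^ - [F]

# No trailing slash: pass the whole path through,
# e.g. directory/structure/to/my-path
RewriteRule ^([a-z0-9_-]+)((\/[a-z0-9_-]+)*)$ index.php?page=$1$2 [NC,L,QSA]

# Trailing slash: request the directory index,
# e.g. directory/ becomes directory/main
RewriteRule ^([a-z0-9_-]+)((\/[a-z0-9_-]+)*)\/$ index.php?page=$1$2/main [NC,L,QSA]
```

Because index.php contains a dot, which is not one of the allowed characters, neither rewrite rule matches the rewritten target, so the rules should not loop back on themselves.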
I am by no means an expert in mod_rewrite or regular expressions, and I am quite confident that these two rules can be made considerably more efficient. If you have any suggestions, please post them as comments.
Current text version of .htaccess