For a wiki, the LinkPattern is the RegularExpression that the script searches for to determine which text becomes a link.
See LinkPatternSuggestions for suggested changes or additions to this wiki's LinkPatterns.
CategoryWikiTechnology CategoryIndexingScheme
For those who know Perl, this wiki's LinkPattern(s) are:
# Current for UseModWiki/MeatballWiki 0.8.8
$UseSubpage = 0; # 1 = use subpages, 0 = do not use subpages
$SimpleLinks = 1; # 1 = only letters, 0 = allow _ and numbers
$NonEnglish = 1; # 1 = extra link chars, 0 = only A-Za-z chars
sub InitLinkPatterns {
  my ($UpperLetter, $LowerLetter, $AnyLetter, $LpA, $LpB, $QDelim);
  $UpperLetter = "[A-Z";
  $LowerLetter = "[a-z";
  $AnyLetter = "[A-Za-z";
  if ($NonEnglish) {
    $UpperLetter .= "\xc0-\xde";
    $LowerLetter .= "\xdf-\xff";
    $AnyLetter .= "\xc0-\xff";
  }
  if (!$SimpleLinks) {
    $AnyLetter .= "_0-9";
  }
  $UpperLetter .= "]"; $LowerLetter .= "]"; $AnyLetter .= "]";
  # Main link pattern: lowercase between uppercase, then anything
  $LpA = $UpperLetter . "+" . $LowerLetter . "+" . $UpperLetter
         . $AnyLetter . "*";
  # Optional subpage link pattern: uppercase, lowercase, then anything
  $LpB = $UpperLetter . "+" . $LowerLetter . "+" . $AnyLetter . "*";
  if ($UseSubpage) {
    # Loose pattern: If subpage is used, subpage may be simple name
    $LinkPattern = "((($LpA)?\\/$LpB)|$LpA)";
    # Strict pattern: both sides must be the main LinkPattern
    # $LinkPattern = "((($LpA)?\\/)?$LpA)";
  } else {
    $LinkPattern = "($LpA)";
  }
  $QDelim = '("")?'; # Optional quote delimiter (not in output)
  $LinkPattern .= $QDelim;
  # Url-style links are delimited by one of:
  # 1. Whitespace (kept in output)
  # 2. Left angle-bracket (<) (kept in output)
  # 3. A single double-quote (") (kept in output)
  # 4. A double double-quote ("") (removed from output)
  # Inter-site convention: sites must start with uppercase letter
  # (Uppercase letter avoids confusion with URLs)
  $InterSitePattern = $UpperLetter . $AnyLetter . "+";
  $InterLinkPattern = "(($InterSitePattern:[^\\s\"<]+)$QDelim)";
  $UrlProtocols = "(http|ftp|afs|news|nntp|mid|cid|mailto|wais|"
                  . "prospero|telnet|gopher)";
  $UrlPattern = "((($UrlProtocols):[^\\s\"<]+)$QDelim)";
  $ImageExtensions = "(gif|jpg|png|bmp|jpeg)";
  $RFCPattern = "RFC\\s?(\\d+)";
  $ISBNPattern = "ISBN:?([0-9- xX]{10,})";
}
For the other 98% who don't read Perl regular expressions, the Meatball pattern is:
One or more uppercase letters, then one or more lowercase letters, then one uppercase letter, then "any letters" (either upper or lowercase). (For non-Meatball wikis using UseModWiki, "any letters" can include underscores and numbers.)
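To make the plain-English rule concrete, here is a minimal sketch in Perl. It assumes ASCII letters only (as if $NonEnglish were 0, which is not Meatball's actual setting), and is_link_name is a hypothetical helper for illustration, not a function in UseModWiki.

```perl
use strict;
use warnings;

# Character classes as InitLinkPatterns would build them with
# $NonEnglish = 0 and $SimpleLinks = 1 (assumption: ASCII letters only).
my $upper = '[A-Z]';
my $lower = '[a-z]';
my $any   = '[A-Za-z]';

# Uppercase run, lowercase run, one uppercase letter, then any letters.
my $link_pattern = "$upper+$lower+$upper$any*";

# Hypothetical helper: true if the whole string is a valid link name.
sub is_link_name {
    my ($name) = @_;
    return $name =~ /\A(?:$link_pattern)\z/ ? 1 : 0;
}

for my $name (qw(LinkPattern WikiWiki ABcD Wiki linkPattern HTMLPage)) {
    printf "%-12s %s\n", $name, is_link_name($name) ? 'link' : 'plain text';
}
```

Under these assumptions the first three names are links. "Wiki" lacks a second uppercase letter, "linkPattern" starts in lowercase, and "HTMLPage" has no uppercase letter after its only lowercase run.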
UseModWiki's LinkPattern is purposefully looser than WikiWiki's, mostly to accommodate names with middle initials, and a few cases like titles with "A" in the middle. Many users have been confused by the Wiki rules--watch Wiki:RecentVisitors for frequent examples.
WikiWiki's LinkPattern is: \b([A-Z][a-z]+){2,}\b
This is considered to be the CamelCase link pattern and is the "standard" format. (More like the reference format.) Most wiki sites with automatic linking will create links on this pattern.
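The difference between the two rules is easiest to see side by side. The sketch below is only an illustration: the UseModWiki pattern is simplified to ASCII letters, both patterns are anchored to the whole name, and accepted_by is a hypothetical helper.

```perl
use strict;
use warnings;

# UseModWiki/Meatball rule (assumption: ASCII-only simplification).
my $usemod   = qr/\A[A-Z]+[a-z]+[A-Z][A-Za-z]*\z/;
# WikiWiki rule as quoted above, anchored to the whole name.
my $wikiwiki = qr/\A\b([A-Z][a-z]+){2,}\b\z/;

# Hypothetical helper: list which of the two rules accept a name.
sub accepted_by {
    my ($name) = @_;
    my @rules;
    push @rules, 'UseModWiki' if $name =~ $usemod;
    push @rules, 'WikiWiki'   if $name =~ $wikiwiki;
    return @rules;
}

for my $name (qw(CamelCase JohnQPublic ThisIsATest Wiki)) {
    my @rules = accepted_by($name);
    printf "%-12s %s\n", $name, @rules ? join(', ', @rules) : 'neither';
}
```

"CamelCase" passes both rules. "JohnQPublic" (a middle initial) and "ThisIsATest" (an "A" in the middle) pass only the looser UseModWiki rule, because WikiWiki requires at least one lowercase letter after every capital.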
LinkPattern usually refers to just the pattern for normal page links. There are other patterns which match URLs and images.
(These patterns are now listed in the code above.)
I was talking to OriFolger?, who has set up a wiki in Hebrew at http://www.rashreshet.org (BrokenLink dec 2003). Since Hebrew has no minuscule/majuscule distinction, these "bumpy" link patterns aren't applicable. He decided on the "standard" free-form link pattern à la "[link pattern]". On the other hand, YasushiIwata? has a Japanese MoinMoin at http://www.sh.rim.or.jp/~yasusii (BrokenLink dec 2003). There, as far as I can tell, the page names are the standard MoinMoin patterns, essentially CamelCase in the Latin alphabet. Consequently, they use English page names with Japanese content. This further reinforces the need to tailor the LinkPattern to the site's particular needs, not enforce a "standard" pattern. -- SunirShah
Thank you Cliff for the elegant external link pattern in brackets, e.g. [homepage of usemod]. Now MeatballWiki can rather easily refer to arbitrary WikiPages on external wikis, using their native names. Could this be combined with InterWiki prefixes, looking like [Tcl:Tcl community projects] in browse mode and [Tcl:Tcl community projects] in edit mode? -- FridemarPache
Some notes
This page is obsolete and bogus. I'll be rewriting it over time, as well as trying to develop better WikiSyntaxSemantics? for links.
- There are two essential types of link patterns. There are ExplicitLink?s, like FreeLinks, where the authors explicitly write the link into the text; and there are ImplicitLink?s, where the normal text is augmented with links by the software, such as RFC 1234, where the software conveniently creates a link to the RFC text. URLs are another example.
- The PatternLanguage-based LinkPattern that we have here, known as CamelCase, is possibly an in-between case if your brain isn't already written in Smalltalk, but it worked so well for the PortlandPatternRepository simply because it was very natural for those authors to name concepts (i.e. patterns) as single tokens written in CamelCase.
- The GaGaParser takes implicit links to a new level.
- InterWiki links are explicit links. TwinPages are implicit links.
- I secretly believe that a well designed implicit link system is much better than an explicit link system.
- There are two essential types of FreeLink syntax. There are the [[external form]] and the internal_form. That is, the external form is delineated by beginning and end markers whereas the internal form connects each word together with some syntax. They have different pluses and minuses. [[The external form makes it easier to make entire sentences or paragraphs links]], whereas the internal form makes it harder because it is harder to type. Making it hard to link entire swaths of text will probably make page titles better. The internal form also has the potential of looking like a link if you use underscores.
- Some people have done \(very_strange_things) with free links like combine the two types whilst adding extraneous (and therefore useless) syntax for the hell of it. Don't do this.
- The external form of the link syntax also makes link verification very complex because it demands two versions of the LinkPattern. One for the syntax with the begin/end markers and one for the page nym matching the WikiNameCanonicalization format.
- The internal form loses a character from the domain.
- _link_pattern_ is also an internal form, even with begin and end markers, because the essential linking is done by connecting words together. This pattern allows single word _links_.
- The LinkPattern in the WikiSyntax may not have anything to do with how it is displayed. Some wikis use CamelCase, but display Camel Case. This has the disadvantage of displaying McDonalds? as Mc Donalds, as well as breaking WYSIWYG.
- It's usually a good idea to make the page titles match the LinkPattern that links to them, although this is not necessary. Doing so will reinforce the LinkPattern, help the users remember the PageDatabase, and generally reduce user confusion. Then again, spacing out CamelCase in the title is almost always the right thing to do, and at least helps Google index your page properly. For FreeLinks, however, it's a very bad idea as meaningless variations such as punctuation or capitalization will create distinct pages. Instead, use WikiNameCanonicalization. The problem of representing 8-track as the same page as "eight track" cannot be resolved (easily) this way, however. You could rely on people doing the right thing, or you could restrict their choices so that they must do the right thing.
- The NoSuchPageSyntax modifies representation to indicate the lack of an existing page.
- Magic link syntax can act as macros. Consider how non-linear ISBN 1-234-5678-X works. It would be worse if the ISBN links validated the ISBN format.
- Links may include aggregate structure. InterWiki links first link against an InterWiki moniker to choose a particular wiki and then against some text that (theoretically) links against some page on the foreign wiki. SubPages link first against the main page and then against the subpage. In neither case is it possible to link against a particular component by itself.
- Very simply, links are three part affairs. Link syntax translates to link representation and _target nym (page name). Each part can be as complex as a TuringMachine? will allow.
- The LinkPattern has to be used not only during syntax translation but also during page verification. You have to match all requests to load page X against the LinkPattern to check if X is in fact a valid page name; this includes checking if the page exists for the NoSuchPageSyntax. However, since the _target nym format may not be exactly the same as the link syntax format, say in the case of the external form of the FreeLink syntax, a separate verification link pattern may be required. This can make the implementation much more complex as link pattern information is coded in more than one place.
- At the very least, it's probably a good idea to canonicalize to a single, invariant nym format. If you have multiple link patterns, accept them on the URL, but redirect to a canonicalized version of the nyms. For instance, suppose MeatballWiki moved to the link_pattern format, so this page would become http://meatballwiki.org/wiki/link_pattern. For backward compatibility, though, we would also accept the old LinkPattern, but canonicalize it. So, when someone goes to http://meatballwiki.org/wiki/LinkPattern, LinkPattern is canonicalized to link_pattern, and the script then redirects to http://meatballwiki.org/wiki/link_pattern.
- Then again, suppose we made the underscore (_) a linkable character but a non-character. So Link_Pattern would link the same as LinkPattern. If you go to http://meatballwiki.org/wiki/Link_Pattern, the script may then be intelligent enough to space the nym as Link Pattern. On the other hand, if you go to http://meatballwiki.org/wiki/LinkPattern, the script would emit the nym as LinkPattern. This would allow Marty_McFly to render properly, whilst still remaining MartyMcFly? from the PageDatabase's point of view. This doesn't break WYSIWYG, though it does add the mystifying part about WikiNameCanonicalization to the mix.
- There is no, none, notta programmatic way to unify plurals with singulars. Don't try. You'll fail. Don't make algorithms that make mistakes when humans can do much better.
- Many people find it necessary to delimit the influence of a LinkPattern. Here's a tip: the pipe (|) is not a legal URL character and not very likely to be part of any meaningful page name. It also has the advantage of looking like a dividing line. So LinkPattern|s will become LinkPatterns. UseMod currently uses double double quotes (""), which makes no sense. The pipe has a big disadvantage, though, in that it looks like an l or a 1 on some screens to some eyes. Another option is the forward slash, e.g. LinkPattern/s, which is closer to contemporary typographic style (e.g. s/he), although it is incompatible with SubPages as they stand today, and the forward slash is a URL character, making it hopeless for delimiting InterWiki links.
- Dropping the trailing punctuation of URLs and InterWiki links is necessary, even if this last character is a valid URL character, because links may appear at the ends of sentences, like http://www.example.com. It would be bad to link the trailing period in the previous sentence.
- One thing might be nice though -- allowing a final close paren if the link contains an open paren. A fair number of WikiPedia pages are titled with parenthetical disambiguators, such as WikiPedia:John_Williams_(composer), which don't make it through the current link pattern here.
- Speaking about disambiguators, is there any way I can configure UseMod to accept Shiva(Skansen) as a link, not provided with brackets? [[Dan Koehl]]
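The two notes above on trailing punctuation and closing parentheses can be sketched together. This is an illustration under stated assumptions, not UseModWiki's actual code: split_trailing_punct is a hypothetical helper, the set of characters treated as trailing punctuation is a guess, and the example.com URLs are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: split a matched URL into the part to link and
# the trailing punctuation that should stay plain text.
sub split_trailing_punct {
    my ($url) = @_;
    my $tail = '';
    while ($url =~ /([.,;:!?)])\z/) {
        my $char = $1;
        if ($char eq ')') {
            # Keep a final ")" when it closes a "(" inside the link.
            my $opens  = () = $url =~ /\(/g;
            my $closes = () = $url =~ /\)/g;
            last if $opens >= $closes;
        }
        chop $url;
        $tail = $char . $tail;
    }
    return ($url, $tail);
}

for my $raw ('http://www.example.com.',
             'http://www.example.com/wiki/John_Williams_(composer)',
             'http://www.example.com/page).') {
    my ($link, $rest) = split_trailing_punct($raw);
    print "link: $link   plain text: '$rest'\n";
}
```

A final close paren survives only when the link already contains a matching open paren, so a parenthetical disambiguator stays in the link while a paren that merely closes the surrounding sentence is dropped.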
---
New to MeatBall, not sure where to post this. :-/
FreeLinks cover most of my needs, except for losing some of the convenience of CamelCase-ing links. I would like to get feedback on the social and technical ramifications of a linking rule for camel cased links that, after validating a potential wiki link, matched it against that word regardless of case. Thus PythOn? would match PyThon? but not Python. The latter case can be covered by a bracket syntax. GaGaParser is too liberal for my needs. - ZWiki:DeanGoodmanson
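A rough sketch of the rule proposed above, for discussion only: first validate the word against the CamelCase link pattern, then look it up ignoring case. The ASCII-only pattern, the %page_for index, and resolve_link are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Assumption: ASCII-only simplification of the UseModWiki link pattern.
my $link_re = qr/\A[A-Z]+[a-z]+[A-Z][A-Za-z]*\z/;

# Hypothetical page index keyed by lowercased page name.
my %page_for = map { (lc($_) => $_) } qw(PyThon LinkPattern);

# Validate the word as a link first, then look it up ignoring case.
# Returns the stored page name, or undef if the word is not a valid
# link or no page matches.
sub resolve_link {
    my ($word) = @_;
    return unless $word =~ $link_re;
    return $page_for{ lc $word };
}

for my $word (qw(PythOn Python LinkpatterN)) {
    my $page = resolve_link($word);
    printf "%-12s %s\n", $word, defined $page ? "links to $page" : 'no link';
}
```

Here PythOn resolves to the PyThon page, while Python is rejected before the lookup because it is not a valid link, which is the behaviour described above.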
So, I'd love to hear about experiments with other patterns for ImplicitLink?s. Sure, CamelCase is all well and good, but what about some other ideas?
- regex methods
- capitalization methods
- special characters
- underscore_separated (LizzyWiki)
- hyphen-separated-terms
- ^startingwithspecialcharacter (CoForum)
- other
- any word longer than 4 letters
- dictionary methods
- words that dictionary-match on a verb or a noun
- words that don't dictionary-match at all
- words that dictionary-match a different language from the standard text
- word usage
- unique words in the Wiki
- low-frequency words in the Wiki
--EvanProdromou
ProWiki supports
- Automatic word linking (and definition of what a word is). In German nouns are capitalized, so this lends itself very well. In English one could use e.g. words containing 8 letters or more. Begging links on words are not implemented, but words link when corresponding pages exist.
- Automatic underline linking. Any word containing at least one underline becomes a link. The underline is interpreted as a space (like WikiPedia). Underline linking has been pioneered by LizzyWiki (AlainDesilets?) as far as I know (collaborative story writing needs it obviously).
-- HelmutLeitner
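As a rough illustration of the underline linking described above (not ProWiki's actual implementation), the sketch below marks every word containing at least one underscore as a link and shows the underscores as spaces. The bracketed output is only a stand-in for whatever link markup the wiki would emit, and link_underscored is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every word containing at least one
# underscore in [brackets], displaying underscores as spaces.
sub link_underscored {
    my ($text) = @_;
    $text =~ s{\b([A-Za-z0-9]+(?:_[A-Za-z0-9]+)+)\b}{
        (my $label = $1) =~ tr/_/ /;
        "[$label]";
    }ge;
    return $text;
}

print link_underscored("See link_pattern and Marty_McFly here."), "\n";
```

Words without an underscore are left alone, so a single-word link would still need some other syntax.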