Definitionary and the Language Independent Document

"Definitionary" is a term I invented for an inverse dictionary. (1) The entries are based on definitions, not the terms. There is a single, canonical, authoritative definition for each quantized shade of meaning. Additionally, the Definitionary is multilingual in that all languages share a global set of definitions with a one to many relationship from each definition to equivalent terms in various languages. Due to the cross-language aspect of the Definitionary, it is possible to create a simple cross-language electronic document based on definitions (including unique concepts). Normal sentences are composed of words. The definition-based Language Independent Document (LID) has sentences composed of numeric definition identifiers. Simple computer software renders the numbers into textual terms. Technically, each definition is assigned a unique numeric key. Software encodes and decodes the strings of definitions. Encoding software allows the user to choose one or more definitions for each word entered. When decoded, the document is rendered into the most popular word for each definition. The key is that decoded documents are rendered in the reader's language of choice, regardless of what language was used when composing the document.

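The encode and decode cycle described above can be sketched in a few lines of Python. This is only an illustration, not the prototype's code: the `Definition` class, the `DEFINITIONARY` table, the numeric keys, and the convention that the first term in each list is the most popular one are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Definition:
    """One canonical meaning, shared by every language."""
    key: int    # unique numeric key
    gloss: str  # the authoritative definition text
    # language code -> equivalent terms, most popular first (assumed ordering)
    terms: dict = field(default_factory=dict)


# Hypothetical keys and glosses, invented for illustration.
DEFINITIONARY = {
    101: Definition(101, "a domesticated canine",
                    {"en": ["dog", "hound"], "es": ["perro"]}),
    102: Definition(102, "takes in food",
                    {"en": ["eats"], "es": ["come"]}),
    103: Definition(103, "flesh of an animal used as food",
                    {"en": ["meat"], "es": ["carne"]}),
}


def decode(lid, language):
    """Render a sequence of definition keys as words in the reader's language."""
    return " ".join(DEFINITIONARY[key].terms[language][0] for key in lid)


document = [101, 102, 103]      # the LID itself is just a list of keys
print(decode(document, "en"))   # dog eats meat
print(decode(document, "es"))   # perro come carne
```

The document stores only the keys, so the same list renders in whichever language the reader selects.
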
In the early prototype, definition order is identical to word order, leaving grammar to be determined by the human reader and writer. However, by attaching a fixed part of speech to each definition during encoding and decoding, grammar can be transposed trivially by software. That isn't to say that the software "understands" anything; it merely retains the human's choices for part of speech. The Definitionary is not a computer translator, nor does it analyze human language grammar or syntax. In the trivial case, pluralization is retained as well, and tense may or may not be retained. That may be left up to the document author and user interface.

The LID has great value in business, social networks, and electronic publishing. For large, modern countries, the LID allows day-to-day business transactions between Europe, Asia, and the Americas without resorting to English. For businesses in small countries struggling to join the global economy, where English is not common and businesses are too small to employ translators, the LID could be the key enabling technology. The Definitionary is very much an inclusive technology.

A simple example is "Tom likes bugs". It is encoded with the numeric definitions for "Tom (proper name)", "enjoys and appreciates", and "Volkswagen Beetles". This renders into Spanish as "Tom le gustan los vochos." The key in this example is that the English writer selected "VW Beetle" as the meaning of "bug", and not the more common "crawling invertebrate". As a side note, Google Translate returns "Tom le gusta un insecto", which is clearly not the author's intent. Definitionary technology unambiguously captures the author's intent.
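
The same example can be walked through in code. The sketch below is an assumption-laden illustration, not the actual encoder: the keys 2001 to 2004, the `DEFINITIONS` table, and the `author_choice` function (standing in for the user interface where the writer picks a meaning) are all hypothetical.

```python
# Hypothetical keys; each entry is (definition text, {language: preferred term}).
DEFINITIONS = {
    2001: ("Tom (proper name)", {"en": "Tom", "es": "Tom"}),
    2002: ("enjoys and appreciates", {"en": "likes", "es": "le gustan"}),
    2003: ("Volkswagen Beetles", {"en": "bugs", "es": "los vochos"}),
    2004: ("crawling invertebrates", {"en": "bugs", "es": "los insectos"}),
}


def candidates(word, language):
    """Return the keys of every definition whose term in `language` is `word`."""
    return [key for key, (_, terms) in DEFINITIONS.items()
            if terms.get(language) == word]


def encode(words, language, choose):
    """Build a LID, asking `choose` to pick among the candidate senses of each word."""
    return [choose(word, candidates(word, language)) for word in words]


def decode(lid, language):
    """Render each definition key as its preferred term in `language`."""
    return " ".join(DEFINITIONS[key][1][language] for key in lid)


def author_choice(word, keys):
    # Stand-in for the user interface: the author means the car, not the insect.
    return 2003 if word == "bugs" else keys[0]


lid = encode(["Tom", "likes", "bugs"], "en", author_choice)
print(candidates("bugs", "en"))  # [2003, 2004]
print(lid)                       # [2001, 2002, 2003]
print(decode(lid, "es"))         # Tom le gustan los vochos
```

The ambiguity of "bugs" is resolved once, by the writer, at encoding time; the decoder never has to guess.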

Web pages created in the LID could easily be rendered into the local language by the browser (or a plugin, or even a server-based tool). This enables all LID web pages to be universally accessible to a global audience. The people and cultures that have been somewhat marginalized by the (largely) English language World Wide Web will have a global audience for their pages.

Since the Definitionary uses concepts, unique cultural aspects are preserved. People won't be forced to express their ideas in English or with Western conceptual definitions. When there isn't a specific word for a concept in the reader's language, a phrase is substituted. Blended meanings and double entendres are supported via a mechanism allowing multiple definitions for a word-position in the LID. Thus the LID supports subtle meaning, and it also supports humor, albeit the cultural relevance burden is on the author and reader.
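
Both mechanisms, phrase substitution and multiple definitions per word-position, can be sketched as follows. The keys, the `TERMS` table, and the choice to join alternative terms with a slash are assumptions for illustration; the actual notation is not specified here.

```python
# Hypothetical entries: key -> {language: term, or a substitute phrase}.
TERMS = {
    3001: {"en": "bank", "es": "banco"},    # financial institution
    3002: {"en": "bank", "es": "orilla"},   # edge of a river
    # Spanish has a single word; English falls back to a phrase.
    3003: {"es": "sobremesa",
           "en": "lingering conversation at the table after a meal"},
}


def render_slot(slot, language):
    """Render one word-position, which may hold one key or several."""
    keys = slot if isinstance(slot, (list, tuple)) else [slot]
    rendered = []
    for key in keys:
        term = TERMS[key][language]
        if term not in rendered:   # a pun that survives translation prints once
            rendered.append(term)
    return "/".join(rendered)


def decode(lid, language):
    """Render every word-position of the LID in the reader's language."""
    return " ".join(render_slot(slot, language) for slot in lid)


pun = [(3001, 3002)]       # one position carrying two meanings
print(decode(pun, "en"))   # bank
print(decode(pun, "es"))   # banco/orilla

print(decode([3003], "es"))  # sobremesa
print(decode([3003], "en"))  # lingering conversation at the table after a meal
```

In English the double meaning collapses into the single word "bank", while the Spanish reader sees both terms and can recognize that a play on words was intended.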

The prototype leaves grammar open to humans to understand. People have a talent for this, and by avoiding part of speech encoding, the user interface is greatly simplified. Anyone with a shred of experience across cultures has encountered SVO ordering differences, and most seem to deal with them. I already have working prototypes of the Definitionary and the LID. The big hurdle is to create a small number of syntaxes for various writers to choose from. Ideally, there would be a syntax not too different from the speaker's normal written grammar. The syntax would allow the writer to explicitly specify linguistic linkages such as subject-verb order. I estimate that it would take about 30 minutes to learn the syntax. Short sentences will be preferred, leading to haiku-like or poetic documents.

Of course, every language has unique concepts. This is good. First, it highlights cultural differences, which the Definitionary both preserves and illuminates. Second, it is easy for the Definitionary creators to invent a phrase for each language that conveys the meaning of the unique definition from another culture. If the reader needs the exact concept, then they must read the full definition, just like a normal dictionary. The LID makes this simple. In the current implementation, I use a "mouse-over" to pop up the definition.
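
One plausible way to produce such a pop-up in a web page is the standard HTML `title` attribute, which browsers show as a tooltip on mouse-over. The sketch below assumes that approach; the `ENTRIES` table, its keys, and the `to_html` function are hypothetical and are not taken from the current implementation.

```python
from html import escape

# Hypothetical entries: key -> full definition plus one term per language.
ENTRIES = {
    4001: {"definition": "Tom (proper name)",
           "terms": {"en": "Tom", "es": "Tom"}},
    4002: {"definition": "Volkswagen Beetle automobiles",
           "terms": {"en": "bugs", "es": "vochos"}},
}


def to_html(lid, language):
    """Render each definition key as a span whose tooltip is the full definition."""
    spans = []
    for key in lid:
        entry = ENTRIES[key]
        spans.append('<span title="{}">{}</span>'.format(
            escape(entry["definition"], quote=True),
            escape(entry["terms"][language]),
        ))
    return " ".join(spans)


print(to_html([4002], "es"))
# <span title="Volkswagen Beetle automobiles">vochos</span>
```

Escaping both the definition and the term keeps the markup valid even when a definition contains quotes or angle brackets.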

You can read a LID in any language present in the Definitionary, regardless of which language the document was originally composed in. This has boundless possibilities to transform humanity. Not only can people communicate without becoming multilingual, but the Definitionary preserves all the unique concepts of each language.

Creating the Definitionary is going to be time-consuming, although a working Definitionary may require definitions for a bare 1000 words. Additionally, the project is helped by the many open source dictionaries. Naturally, technical terms are required, but once again, the number of terms is reasonable, and terms can be added over time.

Grammar of the LID is an interesting problem which I plan to avoid for the time being. That said, research in the area of universal concepts in language suggests that it is possible to create simplified grammatical encodings that work for everyone. One approach for the LID might rely on a formal syntactical convention used in composing a LID. Rule-based decoding for each language renders the LID into sensible output. Guides could specify what kinds of compromises have been made for each language, and I'm guessing that someone can read through a guide to their language in a few minutes. For many languages the problem will be small due to their overall similarities.
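
As a toy illustration of rule-based decoding, consider a noun phrase whose definitions are tagged with a part of speech and a per-language ordering rule. Everything here is an assumption: the tags `ADJ` and `NOUN`, the `NOUN_PHRASE_ORDER` table, and the keys are invented, and a real rule set would need far more than one rule per language.

```python
# Hypothetical terms: key -> {language: term}.
TERMS = {
    5001: {"en": "red", "es": "rojo"},
    5002: {"en": "car", "es": "coche"},
}

# One deliberately tiny ordering rule per language:
# English puts the adjective first, Spanish usually puts it after the noun.
NOUN_PHRASE_ORDER = {
    "en": ["ADJ", "NOUN"],
    "es": ["NOUN", "ADJ"],
}


def decode_noun_phrase(phrase, language):
    """`phrase` maps a part-of-speech tag to a definition key."""
    order = NOUN_PHRASE_ORDER[language]
    return " ".join(TERMS[phrase[tag]][language] for tag in order if tag in phrase)


phrase = {"ADJ": 5001, "NOUN": 5002}     # tags chosen by the writer
print(decode_noun_phrase(phrase, "en"))  # red car
print(decode_noun_phrase(phrase, "es"))  # coche rojo
```

The software applies the ordering rule mechanically; the part of speech itself is still supplied by the human writer.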

I'm currently adding American English and Spanish to the Definitionary. I plan to encode one of my web sites into LID web pages. At that point, the site becomes available in both English and Spanish. Due to the limited topic, a single site requires fewer than 750 definitions.

Although I plan to open source the Definitionary, it makes sense to get the project onto a firm business footing. Conceptually, the business model is based on revenue generated from training, consulting (primarily, document repository indexing), cultural guides, etc. The revenue stream keeps the project going, and is used to fund development of additional languages. It may be possible to get government and/or business funding for some popular languages. However, I expect that the Definitionary project will have to find internal resources for smaller languages.

If you have an interest in the Definitionary and the Language Independent Document, please contact me.

Footnote 1

The inverse dictionary concept and the name Definitionary may have been previously invented, although I'm unclear whether the idea was extended to multilingual use and the LID.


Contact Tom
