Machine translation: Difference between revisions

Latest revision as of 08:23, 28 August 2013

Machine translation with Apertium

Machine translation is the translation of content from one language into another with the help of a computer. Simply replacing the words of one language with words of the target language does not produce a translation, because every language has its own grammar.

Machine translation can be divided into two main approaches.

Rule Based
Corpus Based

Rule Based

Translation driven by grammatical rules is called rule-based translation. "Rule-based machine translation is like taking a set of dictionaries and a descriptive grammar, and trying to translate from one language you don’t know into another."
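
As a rough illustration of the idea, the sketch below pairs a tiny bilingual word list with a single reordering rule. The word list, the tag names, and the functions `word_for_word` and `translate` are all assumptions made up for this example; they are not Apertium code, and a real rule-based system uses far richer dictionaries and grammars.

```python
# Toy rule-based translation: a dictionary plus one grammar rule.
# Everything here is a hypothetical illustration, not Apertium code.

# Hypothetical English -> Malayalam word list with part-of-speech tags.
LEXICON = {
    "i": ("ഞാൻ", "prn"),
    "saw": ("കണ്ടു", "vblex"),
    "a": ("ഒരു", "det"),
    "tree": ("മരം", "n"),
}


def word_for_word(sentence):
    """Replace each English word with its Malayalam entry, keeping English order."""
    return [LEXICON[word][0] for word in sentence.lower().split()]


def translate(sentence):
    """Apply one rule on top of the word list: move the verb to the end."""
    tagged = [LEXICON[word] for word in sentence.lower().split()]
    non_verbs = [word for word, tag in tagged if tag != "vblex"]
    verbs = [word for word, tag in tagged if tag == "vblex"]
    return non_verbs + verbs


print(" ".join(word_for_word("I saw a tree")))  # verb stays in English position
print(" ".join(translate("I saw a tree")))      # verb moved to the end
```

The first call shows why plain word substitution falls short: the verb stays in the English position. The second applies the rule and produces verb-final order.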

Advantages

Predictable output
Predictable errors
Incremental improvements
Errors are easy to track down
No need for a large body of existing translations

Disadvantages

Lack of fluency
Lack of idiomaticness
“Mechanical” output
Development may take more time

Corpus Based

Translation that relies on existing translations and matches words across them is called corpus-based translation. "Corpus-based machine translation is like taking two documents in two languages you don’t know which are translations of each other and trying to match up words. Then you use these words to build sentences which you put into Google to see if they sound likely."

Advantages

Fluent output
Idiomatic output
No need for linguistic resources:
1. dictionaries
2. grammars
3. linguists

Disadvantages

Unpredictable
Incremental improvements are hard
Development can be time consuming

Apertium is free software built on rule-based translation. Wikimedia's new translation project also uses Apertium. Its simplicity and extensibility are what make Apertium appealing. Apertium was created in 2004 by a consortium under Spain's industry ministry.

Installation

Installing the newest version from SVN

Step 1: install the prerequisites. Open a terminal and enter the command below.

sudo apt-get install subversion build-essential pkg-config gawk libxml2 libxml2-dev libxml2-utils xsltproc flex automake libtool libpcre3-dev

Type your password and press Enter.

Step 2: Download required packages.

svn checkout https://svn.code.sf.net/p/apertium/svn/trunk/lttoolbox
svn checkout https://svn.code.sf.net/p/apertium/svn/trunk/apertium
svn checkout https://svn.code.sf.net/p/apertium/svn/trunk/apertium-lex-tools

Step 3: compile and install.

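# Build lttoolbox first; the other two packages depend on it.
cd lttoolbox
./autogen.sh
make
sudo make install
sudo ldconfig

# The three checkouts are siblings, so step back up before entering the next one.
cd ../apertium
./autogen.sh
make
sudo make install
sudo ldconfig

cd ../apertium-lex-tools
./autogen.sh
make
sudo make install
sudo ldconfig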

Structure

Deformatter: this step separates the text to be translated from its formatting. For example, if the text to be translated is an HTML file, its tags are set aside.
Morphological analyzer: this step splits the text to be translated into surface forms and finds one or more lexical forms for each of them.
Part of speech tagger: for words that have more than one lexical form, this step picks the most suitable one.
Lexical transfer
Lexical Selection
Structural Transfer
Morphological generator
Post generator
Reformatter: puts back the formatting information that was set aside in the first step.
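
The first and last stages can be pictured with a small sketch. The code below is only an illustration under stated assumptions: `deformat`, `reformat`, and `translate_chunk` are hypothetical names, the regular expression handles only simple HTML tags, and upper-casing stands in for all the translation stages in between. It is not how Apertium's own deformatter is implemented.

```python
import re

# Matches a simple HTML tag such as <p> or </b> (an assumption for this sketch).
TAG = re.compile(r"<[^>]+>")


def deformat(html):
    """Split HTML into plain-text chunks and the tags that sat between them."""
    tags = TAG.findall(html)
    chunks = TAG.split(html)  # always one more chunk than there are tags
    return chunks, tags


def translate_chunk(text):
    """Placeholder for the middle stages of the pipeline (hypothetical)."""
    return text.upper()


def reformat(chunks, tags):
    """Interleave the processed chunks with the original tags again."""
    out = []
    for index, chunk in enumerate(chunks):
        out.append(chunk)
        if index < len(tags):
            out.append(tags[index])
    return "".join(out)


chunks, tags = deformat("<p>hello <b>world</b></p>")
print(reformat([translate_chunk(c) for c in chunks], tags))
```

Keeping the tags in a separate list means the translation stages only ever see plain text, and the markup comes back in its original positions.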

Malayalam translation

Now let us see how to do Malayalam translation with Apertium. (It is difficult to handle agglutination with the following method.)

Morphological analyzer

For this we need three dictionaries:

monolingual dictionary of Malayalam
monolingual dictionary of English
bilingual English-Malayalam dictionary

First, let us see how to create the Malayalam monolingual dictionary.


<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
      <alphabet>അആഇഈഉഊഋഌഎഏഐഒഓഔകഖഗഘങചഛജഝഞടഠഡഢണതഥദധനഩപഫബഭമയരറലളഴവശഷസഹഽാിീുൂൃൄെേൈൊോൌ്ൗൠൡൢൣൺൻര്‍ൾൿംഃ</alphabet>
      <sdefs>
      </sdefs>
      <pardefs>

      </pardefs>
      <section id="main" type="standard">

      </section>
</dictionary>

The tags used here and what they are for:

<dictionary></dictionary> : the whole content of our dictionary goes inside this tag
<alphabet></alphabet> : the letters of the language go inside this tag
<sdefs></sdefs> : holds the symbol definitions
<pardefs></pardefs> : holds the paradigm definitions
<section></section> : holds the word entries
<sdef></sdef> : defines a single symbol
<pardef></pardef> : defines a single paradigm
- <e> : entry
- <p> : pair
- <l> : left
- <r> : right

note: the <l></l> tag holds the ending that appears on the surface form of the word, and the <r></r> tag holds the lemma ending followed by the symbols for that form.

Let us add the word മരം (tree) to this dictionary.

<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
<alphabet>അആഇഈഉഊഋഌഎഏഐഒഓഔകഖഗഘങചഛജഝഞടഠഡഢണതഥദധനഩപഫബഭമയരറലളഴവശഷസഹഽാിീുൂൃൄെേൈൊോൌ്ൗൠൡൢൣൺൻര്‍ൾൿംഃ</alphabet>
	<sdefs>
		<sdef n="n" c="Noun . നാമം"/>
		<sdef n="nom"     c="Nominative"/>
		<sdef n="acc"     c="Accusative"/>
		<sdef n="ins"     c="Instrumental"/>
		<sdef n="soc"     c="Sociative"/>
		<sdef n="dat"     c="Dative"/>
		<sdef n="voc"     c="Vocative"/>
		<sdef n="gen"     c="Genitive"/>
		<sdef n="loc"     c="Locative"/>
		<sdef n="sg"      c="Singular . ഏകവചനം"/>
    	<sdef n="pl"      c="Plural . ബഹു വചനം"/>
    	<sdef n="sp"      c="Singular / Plural . ഏകവചനം/ബഹുവചനം"/>
 	</sdefs>
  	<pardefs>
		<pardef n="മര/ം__n">
			  <e>       <p><l>ം</l><r>ം<s n="n"/><s n="sg"/><s n="nom"/></r></p></e>
			  <e>       <p><l>ത്തെ</l><r>ം<s n="n"/><s n="sg"/><s n="acc"/></r></p></e>
			  <e>       <p><l>ത്തിന്റെ</l><r>ം<s n="n"/><s n="sg"/><s n="gen"/></r></p></e>
			  <e>       <p><l>ത്തിന്</l><r>ം<s n="n"/><s n="sg"/><s n="dat"/></r></p></e>
			  <e>       <p><l>ത്തില്‍</l><r>ം<s n="n"/><s n="sg"/><s n="loc"/></r></p></e>
			  <e>       <p><l>ത്താല്‍</l><r>ം<s n="n"/><s n="sg"/><s n="ins"/></r></p></e>
			  <e>       <p><l>മേ</l><r>ം<s n="n"/><s n="sg"/><s n="voc"/></r></p></e>
			  <e>       <p><l>ത്തോടു</l><r>ം<s n="n"/><s n="sg"/><s n="soc"/></r></p></e>
			  <e>       <p><l>ങ്ങള്‍</l><r>ം<s n="n"/><s n="pl"/><s n="nom"/></r></p></e>
			  <e>       <p><l>ങ്ങളെ</l><r>ം<s n="n"/><s n="pl"/><s n="acc"/></r></p></e>
			  <e>       <p><l>ങ്ങളുടെ</l><r>ം<s n="n"/><s n="pl"/><s n="gen"/></r></p></e>
			  <e>       <p><l>ങ്ങള്‍ക്ക്</l><r>ം<s n="n"/><s n="pl"/><s n="dat"/></r></p></e>
			  <e>       <p><l>ങ്ങളില്‍</l><r>ം<s n="n"/><s n="pl"/><s n="loc"/></r></p></e>
			  <e>       <p><l>ങ്ങളാല്‍</l><r>ം<s n="n"/><s n="pl"/><s n="ins"/></r></p></e>
			  <e>       <p><l>ങ്ങളേ</l><r>ം<s n="n"/><s n="pl"/><s n="voc"/></r></p></e>
			  <e>       <p><l>ങ്ങളോടു</l><r>ം<s n="n"/><s n="pl"/><s n="soc"/></r></p></e>
		</pardef>
 	</pardefs>
 	<section id="main" type="standard">
    	<e lm="മരം"><i>മര</i><par n="മര/ം__n"/></e>
 	</section>
</dictionary>

Save this as apertium-mal-eng.sh.dix. Now we can compile it as an analyser (left to right):

lt-comp lr apertium-mal-eng.sh.dix mal-eng.automorf.bin

To compile it as a generator (right to left):

lt-comp rl apertium-mal-eng.sh.dix eng-mal.autogen.bin
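
The two directions can be pictured as two lookup tables built from the same entry pairs. The sketch below is a simplified illustration in Python, not how lt-comp works internally (it builds compact binary transducers). The function names `right_side` and `expand` and the shortened three-entry dictionary `DIX` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the dictionary above (three endings only).
DIX = """<dictionary>
  <pardefs>
    <pardef n="മര/ം__n">
      <e><p><l>ം</l><r>ം<s n="n"/><s n="sg"/><s n="nom"/></r></p></e>
      <e><p><l>ത്തെ</l><r>ം<s n="n"/><s n="sg"/><s n="acc"/></r></p></e>
      <e><p><l>ങ്ങളെ</l><r>ം<s n="n"/><s n="pl"/><s n="acc"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="മരം"><i>മര</i><par n="മര/ം__n"/></e>
  </section>
</dictionary>"""


def right_side(r):
    """Render an <r> element as its text followed by <symbol> tags."""
    symbols = "".join("<%s>" % s.get("n") for s in r.findall("s"))
    return (r.text or "") + symbols


def expand(dix_xml):
    """Map every surface form to its lexical form (the 'lr' reading)."""
    root = ET.fromstring(dix_xml)
    paradigms = {}
    for pardef in root.iter("pardef"):
        paradigms[pardef.get("n")] = [
            (p.find("l").text or "", right_side(p.find("r")))
            for p in pardef.iter("p")
        ]
    table = {}
    for entry in root.find("section").findall("e"):
        stem = entry.find("i").text
        for left, right in paradigms[entry.find("par").get("n")]:
            table[stem + left] = stem + right
    return table


analyser = expand(DIX)  # surface form -> lexical form
generator = {lexical: surface for surface, lexical in analyser.items()}  # 'rl' reading

for surface, lexical in analyser.items():
    print(surface, "->", lexical)
```

Each entry in the main section contributes its stem from `<i>`, and the referenced paradigm supplies the endings, so one `<e>` line expands into every inflected form listed in the paradigm.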

To test the analyser, use the lt-proc tool:

lt-proc mal-eng.automorf.bin

Then type മരം and press Enter; you will get output like this:

മരം
^മരം/മരം<n><sg><nom>$
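
In this output, the text between ^ and $ is one unit: the surface form comes first, and each analysis after a / is a lemma followed by its symbols in angle brackets. The following sketch reads such a line into a Python structure. The function name `parse_stream` is an assumption for this example, and the parser is deliberately simplified: it ignores escaped characters and any text outside the ^...$ units.

```python
import re

UNIT = re.compile(r"\^(.*?)\$")   # one ^...$ unit
SYMBOL = re.compile(r"<([^>]+)>")  # one <symbol>


def parse_stream(line):
    """Return [(surface, [(lemma, [symbols...]), ...]), ...] for an analyser output line."""
    result = []
    for unit in UNIT.findall(line):
        surface, *analyses = unit.split("/")
        parsed = []
        for analysis in analyses:
            lemma = analysis.split("<", 1)[0]
            parsed.append((lemma, SYMBOL.findall(analysis)))
        result.append((surface, parsed))
    return result


print(parse_stream("^മരം/മരം<n><sg><nom>$"))
```

A surface form with several possible analyses simply yields several (lemma, symbols) pairs in the inner list, which is the ambiguity the part of speech tagger later resolves.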
