WaKan Project Website Forum Index WaKan Project Website
Forums about WaKan and Japanese & Chinese language
 

Adding to dictionaries?
Broshek
Guest





Posted: Fri Feb 03, 2006 8:07 am    Post subject: Adding to dictionaries?

Hi, I have a job doing technical translation from Japanese to English, and while Wakan has a lot of the words that I look up, there are many that aren't in Wakan, and I end up searching through several dictionaries before I find the word I'm looking for.

Is there any way to make a new dictionary for the Wakan system or add entries to an existing one? That way, once I find the word I'm looking for, I could put it in Wakan and next time it will be there. This could also be useful to others in my position who can't find the word(s) they are looking for.
Tom Hodgers
Co-Admin


Joined: 26 Jan 2004
Posts: 253
Location: Valencia, Venezuela via Liverpool and Manchester, England

Posted: Fri Feb 03, 2006 8:05 pm

Hi Broshek,

New words may be added to a vocabulary file (call it a Personal Dictionary if you like) and saved to one of the vocabulary classes (Lesson, Group, Temporary or Wordlist; see the Wakan help file, Terms&reference, Vocabulary categories).

When you do a dictionary search or use the pop-up over a word that is not in any of the loaded dictionaries but is in the vocabulary file, the word will be shown with its translation and the name of the vocabulary file (in green text).

Vocabulary files must be stored in the Wakan folder.

Hope this helps,

Tom
_________________
Just another
和漢 WAKAN
若人 WHACKO DOing his thing
tony
Co-Admin


Joined: 27 Nov 2003
Posts: 750

Posted: Fri Feb 03, 2006 10:13 pm

Broshek,

It is also possible to create a new dictionary or to edit an existing dictionary, but not entirely within WaKan. You create or edit an EDICT-formatted file, and then import it using the "Import from EDICT" command in the Dictionary Manager, which is accessed from the command "Dictionary manager..." on the Tools menu.

If you want to do this, and need more details on EDICT format or importing a dictionary, let me know, and I will either provide you with links to other threads in these forums where I have discussed this, or copy over the text of those explanations into this thread.

--Tony
Broshek
Guest





Posted: Sun Feb 05, 2006 11:13 am

Tony,

That would be very helpful! Thank you!
tony
Co-Admin


Joined: 27 Nov 2003
Posts: 750

Posted: Sun Feb 05, 2006 3:21 pm

Broshek,

An EDICT-formatted file is just a text file with an entry on each line with the following format:

KanjiSpelling [PhoneticSpellingInHiragana] /Definition1/Definition2/

The definition strings (you can have only one, or more than two, with forward slash delimiters as indicated) must contain only ASCII characters. Various encodings can be used for the file, but I think the simplest way is to create a Unicode file; most word processors (even NotePad and WordPad) can save files in Unicode format. So a typical entry might look like:

攻究 [こうきゅう] /study/investigation/research/
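
As a rough illustration of the line format described above, here is a minimal Python sketch that splits one entry into its parts. The names EdictEntry and parse_edict_line are made up for this example, and the pattern only covers the simple "Kanji [reading] /definitions/" shape shown here; it is not a full EDICT parser and has nothing to do with WaKan's own import code.

```python
import re
from typing import NamedTuple


class EdictEntry(NamedTuple):
    """One dictionary line, split into its three parts (illustrative type)."""
    kanji: str
    reading: str
    definitions: list


# KanjiSpelling [PhoneticSpellingInHiragana] /Definition1/Definition2/
_EDICT_LINE = re.compile(r"^(\S+) \[([^\]]+)\] /(.+)/$")


def parse_edict_line(line):
    """Parse a single EDICT-style line or raise ValueError if it does not fit."""
    match = _EDICT_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an EDICT-style line: {line!r}")
    kanji, reading, definitions = match.groups()
    # Definitions are separated by forward slashes.
    return EdictEntry(kanji, reading, definitions.split("/"))


entry = parse_edict_line("攻究 [こうきゅう] /study/investigation/research/")
```

Running a check like this over each line before importing is one way to catch a missing bracket or trailing slash early.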

I recommend making a new dictionary to contain the entries you want to add. You will need to update the original text file you make for this, and import it again each time you update it, so that will go faster if it consists only of your new entries, rather than containing all of EDICT as well.

[Warning: there is a bug in the import routine which causes it to miss the last definitions in the EDICT-formatted file. The workaround for this suggested by Filip is to put 20 or so lines starting with # at the end of the file; these are ignored for the purposes of making dictionary entries, but stop the routine from missing the definitions near the end. ]
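
The padding workaround can be automated. The sketch below is only an assumption about how one might do it: pad_edict_text is a hypothetical helper, the default of 20 filler lines comes from the warning above, and the Windows-style line endings are a guess at what the import routine expects.

```python
def pad_edict_text(entry_lines, pad_count=20, pad_line="#"):
    """Return the file text with pad_count filler lines appended at the end.

    The filler lines start with "#", so they do not become dictionary
    entries; they are only there so the last real entries are not missed.
    """
    lines = [line.rstrip("\r\n") for line in entry_lines]
    lines.extend([pad_line] * pad_count)
    # Assumption: CRLF line endings, as a Windows text editor would write.
    return "\r\n".join(lines) + "\r\n"
```

If entries near the end still go missing after import, raising pad_count is the obvious thing to try.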

Steps in importing an EDICT-formatted dictionary file:

On the Tools menu, select "Dictionary Manager..."

Click the "Import from EDICT" button near the bottom of the window ("Import from EDICT-formatted file" would be clearer).

In the "Dictionary import" window:
(1) For "File name without extension," give the name you want the file containing the imported dictionary to have. WaKan has its own dictionary format, with extension .dic, so this will be appended to the name you give.
(2) For "Dictionary name," give the name for the dictionary which you want to appear in the dictionary manager list, used for assigning it to the three dictionary groups, choosing whether to have it used by the popup tool, and so on.
(3) Click the add button, and navigate to and select the file you want to import using the Open File common dialog which appears. The pathname of this will appear in the list box "Included EDICT format files."
(4) Click the "Japanese" radio button.
(5) Select the remaining options you prefer, and click "Build."

Back in the Dictionary Manager window, you will probably need to click the "Refresh and rescan" button for the newly imported file to appear on the "Dictionaries" list. Select it, check that the build date given for it is the date on which you do the import, and select the options you want to apply to the dictionary.

Let me know if you run into any problems.

--Tony
Broshek
Guest





Posted: Tue Feb 07, 2006 6:06 am

Okay, so the original file that you make with all the entries... should that be saved as a .txt file or a .doc file, or what?
tony
Co-Admin


Joined: 27 Nov 2003
Posts: 750

Posted: Wed Feb 08, 2006 1:27 am

It doesn't matter what extension the file has, or even whether it has an extension. But it has to be a plain text file, preferably in Unicode format.

If you make it with WordPad, do a "Save As..." command from the file menu, and when the "Save As" dialog comes up, be sure to select "Unicode Text Document" from the dropdown combo box at the bottom of the dialog. (If this is not one of the choices, you need to update your WordPad.)

If you make it with NotePad, the "Save As" dialog has a dropdown combo box at the bottom labeled "Encoding". Unicode or UTF-8 will both work; I don't know about the third Unicode option ("Unicode big endian").

Microsoft Word can save a plain text file in Unicode, but it takes several steps, and if you save it as an ordinary plain text file instead, it may mangle your kanji and kana, depending on your system settings. It's all too complicated to bother with; I would recommend using NotePad instead.

In StarOffice (and probably OpenOffice as well; I don't have it installed at the moment), save it as type "Text Encoded (.txt)". It will prompt you for the Encoding you want; you can choose Unicode or any Japanese encoding, as long as you remember to select the same encoding when you import the file into WaKan.
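
If you would rather generate the file with a script than with an editor, the following Python sketch writes the same kind of file that NotePad's "Unicode" option produces, namely UTF-16 little-endian with a byte-order mark. The function names are invented for this example, and whether WaKan accepts the result exactly as it accepts a NotePad file is an assumption I have not verified.

```python
import codecs
from pathlib import Path


def save_as_utf16_le(path, text):
    """Write text as UTF-16 little-endian, preceded by a byte-order mark."""
    data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    # write_bytes does no newline translation, so line endings stay as given.
    Path(path).write_bytes(data)


def load_utf16_le(path):
    """Read the file back; the "utf-16" codec consumes the byte-order mark."""
    return Path(path).read_bytes().decode("utf-16")
```

Writing the byte-order mark explicitly avoids depending on the platform's default byte order.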

Hope you are using one of the above.

--Tony

Drex



Joined: 11 Feb 2008
Posts: 26

Posted: Mon Feb 11, 2008 3:38 pm

Hello guys!
Wakan is a very useful and friendly tool, and creating new dictionaries definitely works. However, I wonder how to make it display Cyrillic translations. It works with the user vocabulary, but with a dictionary there's no way. It doesn't matter if I save the source file in Unicode. Cyrillic text can be shown in the Written position but not in the translation field; only empty space is displayed instead of the characters.
tony
Co-Admin


Joined: 27 Nov 2003
Posts: 750

Posted: Mon Feb 11, 2008 6:41 pm

Drex,

I'm sorry, but this is a built-in limitation of the program. The internal data structures and/or control used for displaying definitions can only handle single-byte characters (the first 256 character values in any font, of which the first 128 are ASCII, usually used for Roman letters and a handful of punctuation marks). There have been many requests over the years to replace this control, but as far as I know, there are not yet any plans to do so. I agree that it would be very nice to be able to write definitions using Unicode characters.
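
Given this limitation, it can help to scan a source file for definitions that will not display before importing it. The Python sketch below is a hypothetical check, not part of WaKan: it flags any definition that contains characters outside ASCII, following the ASCII-only rule for definition strings given earlier in this thread.

```python
def non_ascii_definitions(edict_line):
    """Return the definitions on one EDICT-style line that are not pure ASCII."""
    # Everything after the first "/" is the slash-delimited definition list.
    _, _, tail = edict_line.partition("/")
    return [d for d in tail.split("/") if d.strip() and not d.isascii()]
```

For example, a line with definitions "japanology" and "японистика" would return only the Cyrillic one, while a line whose definitions are all in plain Roman letters returns an empty list.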

--Tony

Note to Filip: Any possibility of this in the near future? You said at one time that you were going to look at updated controls in the development environment you used for WaKan.
Drex



Joined: 11 Feb 2008
Posts: 26

Posted: Tue Feb 12, 2008 10:54 am

tony wrote:
There have been many requests over the years to replace this control, but as far as I know, there are not yet any plans to do so. I agree that it would be very nice to be able to write definitions using Unicode characters.

You're absolutely right. That'd be a great feature. The Wakan dictionary is now rich and more useful than any other I know of. I like it very much. But it's sometimes annoying to reach Japanese through English when my native language is Russian. I vote with both hands for developing that feature!
denkbert



Joined: 15 Jun 2009
Posts: 6

Posted: Mon Jun 15, 2009 12:05 pm

Hello tony (I seriously hope someone still keeps an eye on these boards every now and then ...),

I tried out what you explained there, and it has mostly worked, but the conversion picks up exactly one line.

I copied your example
攻究 [こうきゅう] /study/investigation/research/
directly into the text file and added some other entries myself, e.g.
妖怪 [ようかい] /monster/
日本学 [にほんがく] /japanology/

The program converts the file and I can also see parts of all entries in the ("scrambled") dic source code. But when I select this new converted dictionary in the dictionary manager, it says that there is just exactly 1 entry. This entry is always your 攻究 example, it doesn't show anything else.

I do all the editing in Notepad++, my txt files are being saved as unicode files and I select the unicode conversion during the conversion process. Everything seems to be working fine so far, except for the problem that it keeps showing just this one entry.

The code of my test dictionary:

Code:
攻究 [こうきゅう] /study/investigation/research/
妖怪研究 [ようかいけんきゅう] /youkai Forschung/
日本学 [にほんがく] /Japanologie/


There are about six # symbols above these entries and 200 or so in the lines after them.

After the conversion, the dic file looks like this:
Code:
PKGv2.1> õ  NONAME22                 PKGBOF
Labyrinth Package File

Title:           NONAME22 Dictionary
Filename:        noname22.dic
Format:          Pure Package File
Company:         LABYRINTH
Copyright:       
Version:         1.0
Comments:        File is used by WaKan - Japanese & Chinese Learning Tool

Builder: PKG Builder (C) LABYRINTH 1999-2001
Package file technology (C) LABYRINTH 1999-2001

If distributing this file, it must be distributed as it is
without any modifications. Note that distributing this file may not
be allowed by the creator of this file and will be lawfully punished.
Ask the creator of this file whether you can distribute it.

LABYRINTH is not responsible for any damage caused by misuse of this
file and is not responsible for the contents.

< end of header >
[binary data omitted]
5996    5B66   602A   653B   65E5   672C   7814   7A76
[binary data omitted]
dede    fors   inve   japa   rese   stud   youk
[binary data omitted]
$TEXTTABLE
$PREBUFFER
$PRECOUNTED
$RAWINDEX
$FIELDS
iIndex
sEnglish
xPhonetic
xKanji
sSort
sMarkers
iFrequency
$ORDERS
Phonetic_Ind
Kanji_Ind
<Phonetic_Ind
<Kanji_Ind
$SEEKS
Index
Sort
Kanji
<Sort
<Kanji
[binary data omitted]
study, investigation, research, [16-bit kana/kanji data] koukyuu
youkai Forschung [16-bit kana/kanji data] [youkaikenkyuu]
Japanologie [16-bit kana/kanji data] [nihongaku]
dede, [16-bit kana/kanji data] dede
[binary data omitted]
DICT
4
39979
N/A
NONAME22
j

0
10596

PKGEOF


I'd be really, really glad, if someone, anyone, could help; I also think that I might not be the only one with a problem like this.

Greetings and thanks in advance.
tony
Co-Admin


Joined: 27 Nov 2003
Posts: 750

Posted: Mon Jun 15, 2009 2:17 pm

denkbert-san, hajimemashite.

I think you are running into a bug which I warned about. The relevant paragraph from my earlier post, describing the bug and a workaround for it, is repeated below. Unfortunately, there has not been a new version of WaKan posted for a very long time, so I believe this bug is still present.

Good luck, and let me know how it goes.

Douzo yoroshiku.
Tony

[Warning: there is a bug in the import routine which causes it to miss the last definitions in the EDICT-formatted file. The workaround for this suggested by Filip is to put 20 or so lines starting with # at the end of the file; these are ignored for the purposes of making dictionary entries, but stop the routine from missing the definitions near the end. ]
Tom Hodgers
Co-Admin


Joined: 26 Jan 2004
Posts: 253
Location: Valencia, Venezuela via Liverpool and Manchester, England

Posted: Mon Jun 15, 2009 2:42 pm

denkbert wrote:
...... I do all the editing in Notepad++, my txt files are being saved as unicode files and I select the unicode conversion during the conversion process. Everything seems to be working fine so far, except for the problem that it keeps showing just this one entry.


Hi denkbert,

Are you saying that you are trying to convert files already saved as Unicode to Unicode with the converter? This may be your problem.

Something similar happened to me when compiling the Spanish dictionary.

I eventually saved directly to Unicode from JWPce (without using the converter) and everything worked OK.

Try changing the 4 hash signs (#) at the beginning and those at the end to ????

Tony: I have noticed that you need about 200 ???? at the end to make sure no lines of data are missed out.


Cheers,

Tom
denkbert



Joined: 15 Jun 2009
Posts: 6

Posted: Mon Jun 15, 2009 6:31 pm

Thanks guys, thank you very much. The hint about the four question marks was the missing piece. I experimented for about two hours and was able to work out the following requirements:


  1. The file has to start with four question marks (????).
  2. The entries have to be formatted in EDICT format, i.e. KANJI<space>[READING]<space>/MEANING/, as in this example: 攻究 [こうきゅう] /study/investigation/research/
  3. I tested about a dozen dictionaries, each with a different number of question-mark lines at the end, and found that you have to paste around 320 lines (!) of four question marks each (????) at the end of the file. This will undoubtedly throw off the word count shown in the dictionary manager, but that is hardly important.
  4. While the text files have to be in Unicode, not every Unicode encoding works. When I wrote my txt file in Notepad++ and chose UTF-8 as the format, Wakan ran the conversion but garbled the result. It works if you use "UCS-2 Little Endian" in Notepad++. If you use Notepad, the native Windows text editor, you can simply save your files as Unicode text files and the conversion will work (provided you have followed the points above).
  5. When prompted to choose a text format during import, I chose "unicode 16-bit".
  6. 残念ながら (unfortunately), I was not able to implement vocabulary frequency information, but I think actually building a dictionary was far more important.


This should be all the info needed to build a wakan dictionary on your own.
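
For anyone who wants to check a source file against the list above before importing, here is a small Python sketch. It is only an illustration: check_wakan_source is a made-up name, the 320-line threshold is simply the figure reported above, and the checks assume the file has a byte-order mark and that the opening ???? sits on a line of its own.

```python
import codecs


def check_wakan_source(data, min_padding=320):
    """Return a list of problems found in the raw bytes of a source file."""
    problems = []
    if not data.startswith(codecs.BOM_UTF16_LE):
        # Without the expected encoding the remaining checks are meaningless.
        return ["file does not start with a UTF-16 little-endian byte-order mark"]
    lines = data[2:].decode("utf-16-le", errors="replace").splitlines()
    if not lines or lines[0] != "????":
        problems.append("first line is not ????")
    # Count the unbroken run of ???? lines at the end of the file.
    trailing = 0
    for line in reversed(lines):
        if line != "????":
            break
        trailing += 1
    if trailing < min_padding:
        problems.append(
            f"only {trailing} trailing ???? lines, expected about {min_padding}"
        )
    return problems
```

An empty list means none of these particular problems were detected; it does not guarantee that the import itself will succeed.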