WaKan Project Website
Forums about WaKan and Japanese & Chinese language
 

zkanji v -1.0 sub delta
z-one



Joined: 17 Sep 2006
Posts: 99

Posted: Sat Feb 23, 2008 3:15 pm

tony wrote:
Seems to work on my machine.


It's good to know! I'm a bit ashamed of this stupid bug but I'll send an e-mail.
Drex



Joined: 11 Feb 2008
Posts: 26

Posted: Mon Feb 25, 2008 9:32 am

Hurray! You've shot it this time! :lol:
z-one



Joined: 17 Sep 2006
Posts: 99

Posted: Wed Feb 27, 2008 7:24 pm

After a lot of work I have uploaded a new version. You can now print vocabulary lists with furigana above the kanji, generate word lists from kanji that have a specific reading in the included words, and, for a given kanji, list only the words in which it has a specific reading.

Get the full version with the database, because there was a bug in it too.

Contact me if you want to include furigana data alongside JMDict in your program (that is, which part of the kana reading is used for which kanji). That data is not included with JMDict and it wasn't easy to add. I'm not sure that everyone will be able to use it in their own code though...
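
To make "which part of the kana reading is used for which kanji" concrete, here is a minimal sketch of one way such data could be stored. The segment layout and the names FURIGANA, render_ruby and full_reading are assumptions made for illustration only; this is not zkanji's actual format.

```python
# Hypothetical layout: each word is split into (written, reading) segments.
# Kana-only segments carry an empty reading because they need no furigana.
FURIGANA = {
    "食べ物": [("食", "た"), ("べ", ""), ("物", "もの")],
}

def render_ruby(word):
    """Return the word with each kanji segment followed by its reading in brackets."""
    parts = []
    for written, reading in FURIGANA[word]:
        parts.append(f"{written}[{reading}]" if reading else written)
    return "".join(parts)

def full_reading(word):
    """Rebuild the complete kana reading from the segments."""
    return "".join(reading or written for written, reading in FURIGANA[word])

print(render_ruby("食べ物"))
print(full_reading("食べ物"))
```

Storing the reading per segment, rather than once per word, is what lets a program place furigana above the right kanji when printing.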
z-one



Joined: 17 Sep 2006
Posts: 99

Posted: Sat Mar 01, 2008 1:06 pm

I have started working on adding the example sentences database to the program. It will be in a separate file, as with WaKan. The file will have the same version number as zkanji, because the indexes for the data will have to be changed whenever I update the zkanji database. This means you won't be able to use an old example database with a new dictionary.

The work has only just begun, so I may have announced this too early. You will have to wait. :cry:
tony
Co-Admin


Joined: 27 Nov 2003
Posts: 750

Posted: Sat Mar 01, 2008 3:15 pm

z-one,

Oh, too bad, I was ready to download this as soon as I saw the phrase "example sentences database"!

Will example sentences be directly accessible in the dictionary popup window? I know the window would have to be larger to accommodate a sentence and navigation controls, but it would be extremely handy there.

--Tony
z-one



Joined: 17 Sep 2006
Posts: 99

Posted: Sat Mar 01, 2008 4:15 pm

I have no idea what it will look like. :D

But I have plans! ;)
z-one



Joined: 17 Sep 2006
Posts: 99

Posted: Tue Mar 04, 2008 10:56 pm

It is alive! The example sentence database is now available at the zkanji download page.

It is still an experimental version and I'm not really satisfied with the outcome. It was a pain to fit the controls into such a small space, and many features that I wanted to add are missing. There is no way to select and copy the text yet either... The possibility of bugs is high too. :oops: At least this is what I can tell without much testing. It doesn't mean the sentence handling really has bugs.
tony
Co-Admin


Joined: 27 Nov 2003
Posts: 750

Posted: Wed Mar 05, 2008 1:27 am    Post subject: Different matching criteria when searching for examples

I haven't tested it enough to know whether or not there are bugs-- but I noticed something immediately which may be of interest to others who are trying out zkanji.

If you look for examples containing a verb-- say なる-- the examples search finds sentence examples in which the verb appears with different inflections-- e.g. なった-- and also examples in which the verb appears spelled both with kana only and with kanji (e.g. 成る). I believe that WaKan's search does not match inflections, and uses only the spelling from the "Written" column of the selected dictionary entry. So zkanji's search finds considerably more examples of usage.

I think this is extremely helpful, although it would also be nice at some point to have the option of specifying stricter matching criteria.
z-one



Joined: 17 Sep 2006
Posts: 99

Posted: Wed Mar 05, 2008 1:15 pm

I don't know how WaKan uses the original example data file, but the file contains two lines for each sentence. The second line lists the Japanese words of the sentence in their dictionary form, together with the form in which they appear in the sentence (including inflections). So the hard work was done for me.

Take a look at the example sentences for the word 足. In many sentences this kanji is not even present, and another one is marked red which also means "leg". This is because the example file specified that 足 is in the sentence in the form of 脚.
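
A rough sketch of how such a lookup could work once the second line of each sentence has been parsed. The tuple layout, the sample data and the names PARSED, build_index and find_examples are assumptions for illustration; zkanji's real index format is not described in this thread.

```python
from collections import defaultdict

# Hypothetical pre-parsed data: sentence id -> (dictionary form, form in sentence).
PARSED = {
    1: [("足", "脚"), ("長い", "長い")],
    2: [("足", "足"), ("痛い", "痛かった")],
    3: [("成る", "なった")],
}

def build_index(parsed):
    """Map each dictionary form to the (sentence id, surface form) pairs where it occurs."""
    index = defaultdict(list)
    for sentence_id, words in parsed.items():
        for dictionary_form, surface in words:
            index[dictionary_form].append((sentence_id, surface))
    return index

def find_examples(index, word, strict=False):
    """Return matches for a word; with strict=True keep only verbatim occurrences."""
    hits = index.get(word, [])
    if strict:
        hits = [hit for hit in hits if hit[1] == word]
    return hits

index = build_index(PARSED)
print(find_examples(index, "足"))               # includes the sentence written with 脚
print(find_examples(index, "足", strict=True))  # only the sentence written with 足
```

Because the index is keyed on the dictionary form, inflected and alternatively written occurrences are found for free; the optional strict flag shows one way a tighter match could be layered on top.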
z-one



Joined: 17 Sep 2006
Posts: 99

Posted: Thu Mar 06, 2008 2:17 am

http://zkanji.extra.hu/download/zkanji0113.zip This is an "unofficial" build of zkanji v0.113. I made this to experiment a bit with the possibilities of the example sentence data I have converted from the Tanaka Corpus.

I can't really explain what is new in this version; you have to see for yourself. Please try it if you have some time. Overwrite the previous executable with the one in the zip file as usual and run it. You can see the change when you move your cursor over a recognized word in an example sentence. (It only works with the example sentence database, of course.)

As I said, this is just an experiment, and I don't know how I could make this part of zkanji really useful. For example, I could add furigana over the words (only in the plain Japanese sentence view, as there is not much space...). I would also like to make user sentences possible, but I don't yet know how. How about example groups?

The possibilities are endless, but I only add what I find useful. Maybe this is not even such an important part of a language study program.
tony
Co-Admin


Joined: 27 Nov 2003
Posts: 750

Posted: Thu Mar 06, 2008 3:21 am

There are some parsing problems in the current implementation. Also, the popup is not helpful and is a little annoying when all it does is repeat the word verbatim, and then repeat it again verbatim in square brackets.

Example: The second example sentence for "naru" is:

ちまたではインターネットなるものがはやっています。
People are talking about this "Internet" phenomenon.

The main parsing error made here is with 「はやっています」. Instead of recognizing this as an inflection of the verb はやる (流行る), it appears to be misinterpreting は as a particle and the remainder as an inflection of the verb やる, which doesn't make any sense.

Note, also, that で gets the popup で[で] although the following は gets no popup, and インターネット gets the popup インターネット[インターネット].

With some fine tuning, this could be a helpful feature; but the parsing problems may be difficult to solve. I'll see if this happens with other verbs or adjectives starting with a syllable which could also be a particle.
z-one



Joined: 17 Sep 2006
Posts: 99

Posted: Thu Mar 06, 2008 3:39 am

I don't like to point at others, but many sentences in the example database are just plain bad. Of course it's hard to make something with 150,000 sentences perfect. Unfortunately it's almost impossible to find the right parts of a sentence with an algorithm.

This is the line you mentioned from the original corpus:
A: ちまたではインターネットなるものがはやっています。 People are talking about this "Internet" phenomenon.#ID=34994
B: 巷{ちまた} で は インターネット なる 物(もの){もの} が は 遣る{やっています}
The B line lists all the words that are present in the Japanese sentence. As you can see the last verb is 遣る{やっています} and it lists は as a separate particle.
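
For anyone who wants to check this themselves, here is a minimal sketch that splits the B line above into words. It only handles the token shapes visible in this example, word, word{form} and word(reading){form}, and the names TOKEN and parse_b_line are made up for illustration. The real corpus may use further markers that this pattern does not cover.

```python
import re

# headword, optional (reading), optional {form as it appears in the sentence}
TOKEN = re.compile(
    r"(?P<head>[^(\[{]+)"
    r"(?:\((?P<reading>[^)]*)\))?"
    r"(?:\{(?P<surface>[^}]*)\})?"
)

def parse_b_line(line):
    """Return (dictionary form, reading or None, form in sentence) for each token."""
    words = []
    for token in line.split():
        match = TOKEN.fullmatch(token)
        if match is None:
            continue  # skip anything this simplified pattern does not cover
        head = match["head"]
        # Without a {form} part, the word appears in the sentence as written.
        words.append((head, match["reading"], match["surface"] or head))
    return words

B_LINE = "巷{ちまた} で は インターネット なる 物(もの){もの} が は 遣る{やっています}"
for word in parse_b_line(B_LINE):
    print(word)
```

The last two tuples show the problem: は comes out as its own word and the verb is recorded as 遣る with the form やっています, so a program that trusts the B line has no way to know the sentence really contains はやっています.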

There is work I can do on the data. For example, I'll exclude all particles. I've already excluded every word that appears in more than 65536 sentences, but I'll have to remove the remaining particles too. The popup won't appear for items that have no kanji, provided there are no other choices to show. The point of the popup is not obvious at the moment. It is meant to be a list of possible variations, but right now it shows more than what is possible. Search for "anata" and you'll see it lists another word with the same kanji but a different reading. Unfortunately this is another undocumented "feature" of the Tanaka Corpus. It should clearly specify which word is the one appearing in the sentence, and it doesn't. Zkanji could be clever and notice when the reading is different from what is in the sentence, but I was only following the specifications...
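
The two clean-up ideas in this post, dropping very frequent words and checking the reading against the sentence, could be sketched like this. It is only an illustration under assumed data structures; frequent_words, filter_by_reading and the sample entries are hypothetical and not taken from zkanji.

```python
from collections import Counter

def frequent_words(parsed, limit):
    """Return dictionary forms that occur in more than `limit` sentences."""
    counts = Counter()
    for words in parsed.values():
        # Count each word once per sentence, however often it repeats.
        counts.update({dictionary_form for dictionary_form, _surface in words})
    return {word for word, count in counts.items() if count > limit}

def filter_by_reading(surface, candidates):
    """Keep candidates whose reading equals the form in the sentence; else keep all."""
    matching = [entry for entry in candidates if entry[1] == surface]
    return matching or candidates

# Illustrative data only: sentence id -> (dictionary form, form in sentence).
parsed = {1: [("は", "は"), ("足", "脚")], 2: [("は", "は")], 3: [("は", "は")]}
print(frequent_words(parsed, limit=2))

# Two entries that share a written form but differ in reading.
candidates = [("貴方", "あなた"), ("貴方", "きほう")]
print(filter_by_reading("あなた", candidates))
```

The reading check only helps when the sentence spells the word in kana and without inflection; in every other case the function falls back to showing all candidates, which is the current behaviour.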
tony
Co-Admin


Joined: 27 Nov 2003
Posts: 750

Posted: Thu Mar 06, 2008 4:00 am

Hmm-- I knew some of the translations were bad, but I've never looked at the parsing, because I've only used the database via WaKan or KanjiLab, never directly. It doesn't surprise me that there are bad translations, since the translator probably was not a native English speaker. But I'm surprised that there are errors in parsing the Japanese. I wonder if someone DID write an algorithm to do this, and never debugged it completely? If I can see immediately that that's a bad parsing, I wouldn't expect it to happen if someone who knew Japanese did it manually.
z-one



Joined: 17 Sep 2006
Posts: 99

Posted: Fri Mar 07, 2008 9:29 pm

Small update: http://zkanji.extra.hu/download.html

I hope the popup is more useful now. Text selection has been added to the example line too. There was a fatal bug in the database, so you have to download the full version once again...
z-one



Joined: 17 Sep 2006
Posts: 99

Posted: Tue Mar 18, 2008 8:16 pm

New update! Please see changes.txt on the website for the complete list.
JLPT data is now included in the database (so you have to get the full pack again...), but there is no need to update the example sentences.
Another change is that you can minimize all windows but bring back the kanji filters via the popup menu. The kanji filter will then stay on top of other windows all the time and you can still use the popup dictionary. This is experimental, because many changes had to be made and I'm not sure they are all stable.
All times are GMT