z-one
Joined: 17 Sep 2006 Posts: 99
Posted: Sat Feb 23, 2008 3:15 pm Post subject:
tony wrote: "Seems to work on my machine."
It's good to know! I'm a bit ashamed of this stupid bug, but I'll send an e-mail.
Drex
Joined: 11 Feb 2008 Posts: 26
Posted: Mon Feb 25, 2008 9:32 am Post subject:
Hooray! You've nailed it this time!
z-one
Joined: 17 Sep 2006 Posts: 99
Posted: Wed Feb 27, 2008 7:24 pm Post subject:
After a lot of work, I have uploaded a new version. You can now print vocabulary lists with furigana above the kanji, generate word lists from kanji so that only words in which the kanji has a specific reading are included, and list only those words of a kanji in which it has a specific reading.
Please get the full version with the database, because the database had a bug too.
Contact me if you want to use the furigana data alongside JMDict in your program (that is, which part of the kana reading belongs to which kanji). That data is not included in JMDict, and it wasn't easy to add. I'm not sure everyone will be able to use it in their own code, though...
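JMDict stores only the complete kana reading of each written form, which is why this per-kanji split has to be added separately. As a purely illustrative sketch (the class and function names are hypothetical and are not zkanji's data format), the furigana data can be modeled as a list of segments, each pairing written text with the kana that belongs to it. 流行る (はやる) shows why segments are needed instead of one reading per kanji: はや belongs to both kanji together.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FuriganaPart:
    """One segment of a written word and the kana reading that belongs to it."""
    text: str     # written segment: one or more kanji, or plain kana
    reading: str  # kana for this segment; equal to `text` for plain kana


def to_ruby(parts: list[FuriganaPart]) -> str:
    """Render segments as text with the reading in brackets after each kanji segment."""
    out = []
    for part in parts:
        if part.text == part.reading:
            out.append(part.text)  # kana segment: no furigana needed
        else:
            out.append(f"{part.text}[{part.reading}]")
    return "".join(out)


def full_reading(parts: list[FuriganaPart]) -> str:
    """Join the segment readings back into the plain reading JMDict stores."""
    return "".join(part.reading for part in parts)


# Hypothetical entry: 流行る read はやる, where はや spans both kanji.
hayaru = [FuriganaPart("流行", "はや"), FuriganaPart("る", "る")]
print(to_ruby(hayaru))       # 流行[はや]る
print(full_reading(hayaru))  # はやる
```

Keeping the segments rather than just a per-character mapping means the plain reading can always be rebuilt, so the extra data stays compatible with code that only understands JMDict's own fields.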
z-one
Joined: 17 Sep 2006 Posts: 99
Posted: Sat Mar 01, 2008 1:06 pm Post subject:
I have started adding the example sentences database to the program. It will be in a separate file, as in Wakan. The file will carry the same version number as zkanji, because the indexes into the dictionary data will have to change whenever I update the zkanji database. This means you won't be able to use an old example database with a new dictionary.
The work has just begun, so I may have announced this too early. You'll have to wait.
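Because the example entries refer to dictionary words by index, a loader has to refuse an example file whose version differs from the dictionary's. A minimal sketch, assuming a hypothetical header that stores the version string (zkanji's real file layout is not described in this thread):

```python
from dataclasses import dataclass


class VersionMismatch(Exception):
    """Raised when an example file was built for a different dictionary version."""


@dataclass(frozen=True)
class ExampleFileHeader:
    version: str  # hypothetical field: dictionary version the word indexes refer to


def check_compatible(header: ExampleFileHeader, dictionary_version: str) -> None:
    # The example data points at dictionary words by index, so those indexes are
    # only meaningful for the exact dictionary version they were built against.
    if header.version != dictionary_version:
        raise VersionMismatch(
            f"example file v{header.version} does not match dictionary v{dictionary_version}"
        )


try:
    check_compatible(ExampleFileHeader("0.112"), "0.113")
except VersionMismatch as err:
    print(err)  # example file v0.112 does not match dictionary v0.113
```

An exact-match check is deliberately strict: any dictionary update may shift the indexes, so there is no safe "newer is compatible" rule.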
tony Co-Admin
Joined: 27 Nov 2003 Posts: 750
Posted: Sat Mar 01, 2008 3:15 pm Post subject:
z-one,
Oh, too bad, I was ready to download this as soon as I saw the phrase "example sentences database"!
Will example sentences be directly accessible in the dictionary popup window? I know the window would have to be larger to accommodate a sentence and navigation controls, but it would be extremely handy there.
--Tony
z-one
Joined: 17 Sep 2006 Posts: 99
Posted: Sat Mar 01, 2008 4:15 pm Post subject:
I have no idea what it will look like.
But I have plans!
z-one
Joined: 17 Sep 2006 Posts: 99
Posted: Tue Mar 04, 2008 10:56 pm Post subject:
It is alive! The example sentence database is now available on the zkanji download page.
It is still an experimental version, and I'm not really satisfied with the result. It was a pain to fit the controls into such a small space, and many of the features I wanted to add are missing. There is no way to select and copy the text yet either... Bugs are also likely, at least as far as I can tell without much testing, though that doesn't mean the sentence handling actually has any.
tony Co-Admin
Joined: 27 Nov 2003 Posts: 750
Posted: Wed Mar 05, 2008 1:27 am Post subject: Different matching criteria when searching for examples
I haven't tested it enough to know whether or not there are bugs-- but I noticed something immediately which may be of interest to others who are trying out zkanji.
If you look for examples containing a verb-- say なる-- the examples search finds sentence examples in which the verb appears with different inflections-- e.g. なった-- and also examples in which the verb appears spelled both with kana only and with kanji (e.g. 成る). I believe that WaKan's search does not match inflections, and uses only the spelling from the "Written" column of the selected dictionary entry. So zkanji's search finds considerably more examples of usage.
I think this is extremely helpful, although it would also be nice at some point to have the option of specifying stricter matching criteria.
z-one
Joined: 17 Sep 2006 Posts: 99
Posted: Wed Mar 05, 2008 1:15 pm Post subject:
I don't know how Wakan uses the original example data file, but the file contains two lines for each sentence. The second line lists the Japanese words of the sentence in their dictionary form, together with the form in which they appear in the sentence (including inflections). So the hard work was already done for me.
Take a look at the example sentences for the word 足. In many of them this kanji is not even present, and a different kanji that also means "leg" is marked in red instead. This is because the example file specifies that 足 appears in the sentence in the form 脚.
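Since each B line pairs a word's dictionary form with the form used in the sentence, one plausible approach (a sketch, not zkanji's actual code) is to index both: the dictionary form gives the broad matching tony described, and the surface form could back the stricter option he asked for. The sentence IDs and word pairs below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical pre-parsed B-line data: sentence ID -> list of
# (dictionary form, form as it appears in the sentence) pairs.
SENTENCES = {
    1: [("成る", "なった")],
    2: [("成る", "成る")],
    3: [("足", "脚")],
}


def build_indexes(sentences):
    """Build a loose index (by dictionary form) and a strict one (by surface form)."""
    loose = defaultdict(set)   # dictionary form -> sentence IDs
    strict = defaultdict(set)  # exact surface form -> sentence IDs
    for sid, pairs in sentences.items():
        for base, surface in pairs:
            loose[base].add(sid)
            strict[surface].add(sid)
    return loose, strict


loose, strict = build_indexes(SENTENCES)
print(sorted(loose["成る"]))   # [1, 2]: any inflection or spelling
print(sorted(strict["成る"]))  # [2]: only the exact written form
print(sorted(loose["足"]))     # [3]: found although 足 itself is absent
```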
z-one
Joined: 17 Sep 2006 Posts: 99
Posted: Thu Mar 06, 2008 2:17 am Post subject:
http://zkanji.extra.hu/download/zkanji0113.zip This is an "unofficial" build of zkanji v0.113. I made this to experiment a bit with the possibilities of the example sentence data I have converted from the Tanaka Corpus.
I can't really explain what is new in this version; you have to see for yourself. Please try it if you have some time. Overwrite the previous executable with the one in the zip file as usual and run it. You can see the change when you move the cursor over a recognized word in an example sentence. (Of course, it only works with the example sentence database.)
As I said, this is just an experiment, and I don't know yet how I could make this part of zkanji really useful. For example, I could add furigana over the words (only in the plain Japanese sentence view, as there is not much space...). I would also like to make user sentences possible, but I don't know how yet. How about example groups?
The possibilities are endless, but I only add what I find useful. Maybe this isn't even such an important part of a language study program.
tony Co-Admin
Joined: 27 Nov 2003 Posts: 750
Posted: Thu Mar 06, 2008 3:21 am Post subject:
There are some parsing problems in the current implementation. Also, the popup is not helpful and is a little annoying when all it does is repeat the word verbatim, and then repeat it again verbatim in square brackets.
Example: The second example sentence for "naru" is:
ちまたではインターネットなるものがはやっています。
People are talking about this "Internet" phenomenon.
The main parsing error made here is with 「はやっています」. Instead of recognizing this as an inflection of the verb はやる (流行る), it appears to be misinterpreting は as a particle and the remainder as an inflection of the verb やる, which doesn't make any sense.
Note, also, that で gets the popup で[で] although the following は gets no popup, and インターネット gets the popup インターネット[インターネット].
With some fine tuning, this could be a helpful feature, but the parsing problems may be difficult to solve. I'll see whether this happens with other verbs or adjectives that start with a syllable that could also be a particle.
z-one
Joined: 17 Sep 2006 Posts: 99
Posted: Thu Mar 06, 2008 3:39 am Post subject:
I don't like to point fingers, but many sentences in the example database are just plain bad. Of course, it's hard to make a collection of 150,000 sentences perfect. Unfortunately, it's almost impossible to find the right parts of a sentence with an algorithm.
This is the line you mentioned from the original corpus:
A: ちまたではインターネットなるものがはやっています。 People are talking about this "Internet" phenomenon.#ID=34994
B: 巷{ちまた} で は インターネット なる 物(もの){もの} が は 遣る{やっています}
The B line lists all the words that are present in the Japanese sentence. As you can see, the last verb is 遣る{やっています}, and は is listed as a separate particle.
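The B line quoted above uses a simple token format: a dictionary form, optionally followed by a reading in parentheses and by the form used in the sentence in braces. The minimal parser below handles that shape; it also accepts an optional bracketed sense number and a trailing `~`, which is an assumption based on the Tanaka Corpus conventions rather than anything shown in this post. Running it on the line above makes the problem visible: the data itself lists は and 遣る{やっています} as separate words, so a program that follows the B line reproduces the mis-parse.

```python
import re

# One B-line token: headword(reading)[sense]{surface}~ where every part after
# the headword is optional (sense and ~ are assumed from corpus conventions).
TOKEN = re.compile(
    r"^(?P<head>[^(\[{~]+)"
    r"(?:\((?P<reading>[^)]*)\))?"
    r"(?:\[(?P<sense>\d+)\])?"
    r"(?:\{(?P<surface>[^}]*)\})?"
    r"(?P<checked>~)?$"
)


def parse_b_line(line: str):
    """Return (dictionary form, reading or None, form in sentence) for each token."""
    if line.startswith("B:"):
        line = line[2:]
    words = []
    for token in line.split():
        m = TOKEN.match(token)
        if m is None:
            raise ValueError(f"unrecognized token: {token!r}")
        head = m["head"]
        # Without a {...} part, the headword appears in the sentence as written.
        surface = m["surface"] if m["surface"] is not None else head
        words.append((head, m["reading"], surface))
    return words


line = "B: 巷{ちまた} で は インターネット なる 物(もの){もの} が は 遣る{やっています}"
for word in parse_b_line(line)[-3:]:
    print(word)
# ('が', None, 'が')
# ('は', None, 'は')
# ('遣る', None, 'やっています')
```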
There is some work I can do on the data myself. For example, I'll exclude all particles. I've already excluded everything that appeared in more than 65536 sentences, but I'll have to get rid of all particles. The popup won't appear for items that have no kanji, but only if there are no other choices. The purpose of the popup isn't obvious at the moment: it is meant to be a list of possible variations. Right now it shows more than what is possible. Search for "anata" and you'll see that it lists another word with the same kanji but a different reading. Unfortunately, this is another undocumented "feature" of the Tanaka Corpus. It should clearly specify which word is the one appearing in the sentence, and it doesn't. Zkanji could be clever and notice when the reading differs from what is in the sentence, but I was only following the specification...
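The first exclusion step mentioned above, dropping everything that occurs in more than 65536 sentences, is a plain frequency cutoff. A sketch of that idea, with hypothetical names and data (removing all particles, as planned, would need a part-of-speech list instead, which this does not cover):

```python
from collections import Counter


def frequent_words(sentences, threshold):
    """Return dictionary forms that occur in more than `threshold` sentences."""
    counts = Counter()
    for pairs in sentences.values():
        # Count each word once per sentence, however often it repeats there.
        counts.update({base for base, _surface in pairs})
    return {word for word, n in counts.items() if n > threshold}


def strip_words(sentences, excluded):
    """Drop the excluded dictionary forms from every sentence's word list."""
    return {
        sid: [(base, surface) for base, surface in pairs if base not in excluded]
        for sid, pairs in sentences.items()
    }


# Tiny made-up data set; zkanji's real cutoff was 65536 sentences.
sentences = {
    1: [("は", "は"), ("成る", "なった")],
    2: [("は", "は"), ("は", "は"), ("足", "脚")],
    3: [("が", "が")],
}
excluded = frequent_words(sentences, threshold=1)
print(excluded)                          # {'は'}
print(strip_words(sentences, excluded))  # は removed from sentences 1 and 2
```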
tony Co-Admin
Joined: 27 Nov 2003 Posts: 750
Posted: Thu Mar 06, 2008 4:00 am Post subject:
Hmm-- I knew some of the translations were bad, but I've never looked at the parsing, because I've only used the database via WaKan or KanjiLab, never directly. It doesn't surprise me that there are bad translations, since the translator probably was not a native English speaker. But I'm surprised that there are errors in parsing the Japanese. I wonder if someone DID write an algorithm to do this and never debugged it completely? If I can see immediately that it's a bad parse, I wouldn't expect the mistake from someone who knew Japanese and did it manually.
z-one
Joined: 17 Sep 2006 Posts: 99
Posted: Fri Mar 07, 2008 9:29 pm Post subject:
Small update: http://zkanji.extra.hu/download.html
I hope the popup is more useful now. Text selection has been added to the example line too. There was a fatal bug in the database, so you have to download the full version once again...
z-one
Joined: 17 Sep 2006 Posts: 99
Posted: Tue Mar 18, 2008 8:16 pm Post subject:
New update! Please see changes.txt on the website for the complete list.
JLPT levels are now included in the database (so you have to get the full pack again...), but there is no need to update the example sentences.
Another change is that you can minimize all windows but bring back the kanji filter via the popup menu. The kanji filter will then stay on top of other windows all the time, and you can still use the popup dictionary. This is experimental, because many changes had to be made and I'm not sure they are all stable.