Thursday, March 5, 2009

Small Summary

Training data:

Source  TXT     HTML
RMRB    406M    7.56G
XWLB    4.82M   257M
XW30    5.77M   352M


u8-ansi: converts UTF-8 to ANSI. Failure.
Since 20080612 the pages have been published in UTF-8 format, and my program does not work with them. I will rewrite u8-ansi, but first I have decided to train my language model.
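
For reference, here is a minimal sketch of what the core of a rewritten u8-ansi might look like. It is only an illustration under assumptions: the post does not say which language the tool is written in (Python is used here), and "ANSI" is assumed to mean the GBK code page, which may not match the real target. The function name `utf8_to_ansi` and its parameters are hypothetical.

```python
def utf8_to_ansi(data: bytes, ansi_codec: str = "gbk") -> bytes:
    """Re-encode UTF-8 bytes into a legacy "ANSI" code page.

    Assumption: "ANSI" means GBK here; pass another codec name if the
    target code page is different.
    """
    # "utf-8-sig" decodes plain UTF-8 and also drops a leading BOM if present.
    text = data.decode("utf-8-sig")
    # Characters missing from the target code page become "?" instead of
    # raising UnicodeEncodeError, so one odd symbol does not stop a batch run.
    return text.encode(ansi_codec, errors="replace")


if __name__ == "__main__":
    sample = "新闻联播 20080612".encode("utf-8")
    converted = utf8_to_ansi(sample)
    print(converted.decode("gbk"))
```

Replacing unmappable characters is a lossy choice; switching `errors` to `"strict"` would surface such pages instead of silently altering them.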

-Katrina
