JUTI: Jurnal Ilmiah Teknologi Informasi
Vol 12, No 1, January 2014
TERM WEIGHTING BASED ON INDEX OF GENRE FOR WEB PAGE GENRE CLASSIFICATION
Article Info
Published date: 01 Jan 2014

ABSTRACT
Automatically identifying the genre of web pages has become an important area of web page classification, as it can improve the quality of web search results and reduce search time. Terms used in classification are generally indexed with document-based TF-IDF weighting. However, this method ignores genre, even though web page documents can be categorized by genre. When genres are available, a term that appears often within one genre should be more significant in document indexing than a term that appears frequently across many genres, despite the latter's high TF-IDF value. We propose a new weighting method for web page document indexing called inverse genre frequency (IGF). This method is based on genre, a manual semantic categorization taken from previous research. Experimental results show that term weighting based on the index of genre (TF-IGF) performed better than term weighting based on the index of documents (TF-IDF). The highest accuracy, precision, recall, and F-measure were 78%, 80.2%, 78%, and 77.4%, respectively, when genre-specific keywords were excluded, and 78.9%, 78.7%, 78.9%, and 78.1%, respectively, when they were included.
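
The abstract names the IGF weighting but does not state its formula. As a rough illustration only, the sketch below assumes IGF is defined by analogy with IDF, as log(G / gf(t)), where G is the number of genres and gf(t) is the number of genres in which term t occurs. The function names, the toy corpus, and the formula itself are assumptions made for illustration, not the authors' definition.

```python
import math
from collections import Counter


def inverse_genre_frequency(docs_by_genre):
    """Return an assumed IGF score per term: log(G / gf(t)).

    docs_by_genre maps a genre label to a list of tokenized documents.
    This form mirrors IDF and is an assumption, not taken from the paper.
    """
    n_genres = len(docs_by_genre)
    genre_freq = Counter()
    for docs in docs_by_genre.values():
        # Collect each term once per genre, however often it occurs there.
        terms_in_genre = set()
        for tokens in docs:
            terms_in_genre.update(tokens)
        genre_freq.update(terms_in_genre)
    return {term: math.log(n_genres / gf) for term, gf in genre_freq.items()}


def tf_igf(tokens, igf):
    """Weight each term of one document by raw term frequency times IGF."""
    tf = Counter(tokens)
    return {term: count * igf.get(term, 0.0) for term, count in tf.items()}


# Hypothetical toy corpus: three genres, tokens already extracted.
corpus = {
    "shop": [["buy", "price", "home"], ["price", "cart"]],
    "blog": [["post", "comment", "home"]],
    "faq": [["question", "answer", "home"]],
}

igf = inverse_genre_frequency(corpus)
weights = tf_igf(["price", "price", "home"], igf)
```

Under this assumed definition, "home" occurs in every genre and receives a weight of zero, while "price" occurs in only one genre and keeps a positive weight, which matches the intuition stated in the abstract that genre-specific terms should count for more.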
Copyright © 2014