Abstract
In this research, we present the results of a study conducted to assess the applicability of document clustering techniques to an Urdu-language corpus. The study, the first of its kind, uses Latent Dirichlet Allocation, a fully probabilistic Bayesian method, to cluster the corpus using features collected from the documents. The results are compared with those obtained from a simple classification technique. Analysis of the results shows that both supervised and unsupervised techniques for grouping documents perform reasonably well on this corpus. The results further indicate that, in some cases, Urdu document clustering outperforms document classification, with an accuracy above 90%.