A Literature Review on Blog Classification

Husni Husni

Abstract


Blog classification is a new research topic. Traditional web classification techniques cannot be applied directly to blogs, because blog content is updated frequently and the topics within a single blog site vary. The components that make up a blog, such as titles, post content and comments, tags (labels), authors, hyperlinks, permalinks, outlinks, and dates and times, are objects that need to be included in the classification process. This paper reviews the blog classification approaches that have appeared since 2009. In the early days of blogs, binary classification was used to distinguish blogs from ordinary web pages. We focus on how to categorize a blog into a predefined list of topics, genres, and opinions (mood and sentiment). For topic and genre classification, kNN, Naive Bayes, CFC, SVM, and other machine learning approaches are widely used. The use of topic ontologies and tags can improve classification accuracy. For opinion detection, lexicon-based approaches such as ANEW tend to be used more often. The opinion of a blog site can also be predicted from the opinions surrounding the inlinks that point to that site. This line of study needs to be broadened and deepened, for example through further use of tags, links, and social network analysis.

Keywords: blog classification, sentiment analysis, blog mining
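
As a rough illustration of the lexicon-based opinion detection mentioned in the abstract, the sketch below scores a blog post by averaging the valence of the words it shares with a lexicon. The lexicon entries, the function names, and the threshold are all hypothetical and invented for this example; the values are not taken from ANEW, and this is not the method of any specific paper reviewed here.

```python
import re

# Hypothetical toy lexicon: word -> valence in [-1, 1].
# These values are invented for illustration and are NOT taken from ANEW.
TOY_LEXICON = {
    "good": 0.8,
    "great": 0.9,
    "happy": 0.9,
    "bad": -0.7,
    "sad": -0.8,
    "terrible": -0.9,
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Return the mean valence of the tokens found in the lexicon.

    Returns 0.0 when no token of the text appears in the lexicon.
    """
    hits = [lexicon[token] for token in tokenize(text) if token in lexicon]
    if not hits:
        return 0.0
    return sum(hits) / len(hits)


def classify_opinion(text, threshold=0.1):
    """Map the lexicon score to a coarse label (threshold is an assumption)."""
    score = lexicon_score(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

A real system would replace the toy lexicon with a published resource and would usually handle negation and word sense, which this sketch ignores.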


DOI: https://doi.org/10.21107/simantec.v8i2.7223


Copyright (c) 2020 Husni Husni
