LABR: A large scale Arabic book reviews dataset

Mohamed Aly, Amir Atiya

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

118 Scopus citations

Abstract

We introduce LABR, the largest sentiment analysis dataset to date for the Arabic language. It consists of over 63,000 book reviews, each rated on a scale of 1 to 5 stars. We investigate the properties of the dataset and present its statistics. We explore using the dataset for two tasks: sentiment polarity classification and rating classification. We provide standard splits of the dataset into training and testing sets for both polarity and rating classification, in both balanced and unbalanced settings. We run baseline experiments on the dataset to establish a benchmark.
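To make the two tasks and the balanced/unbalanced settings concrete, here is a minimal Python sketch. It is not the authors' procedure. The abstract does not say how star ratings map to polarity, how balancing is done, or what split ratio is used, so the thresholds (4-5 stars positive, 1-2 stars negative, 3 stars dropped), the downsampling strategy, the 80/20 ratio, and all function names are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def rating_to_polarity(rating):
    """Map a 1-5 star rating to a polarity label (assumed thresholds)."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return None  # assumption: 3-star reviews are treated as neutral and dropped


def balance(examples, seed=0):
    """Downsample every class to the size of the smallest class (assumed strategy)."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    smallest = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, smallest))
    rng.shuffle(balanced)
    return balanced


def split_train_test(examples, test_fraction=0.2, seed=0):
    """Shuffle and split into training and testing sets (assumed 80/20 ratio)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


# Toy (review_text, stars) pairs standing in for real reviews.
reviews = [("r1", 5), ("r2", 4), ("r3", 3), ("r4", 1),
           ("r5", 5), ("r6", 2), ("r7", 4)]

# Rating classification keeps the 1-5 stars as labels.
rating_examples = list(reviews)

# Polarity classification uses the mapped labels and drops unmapped reviews.
polarity_examples = [(text, rating_to_polarity(stars))
                     for text, stars in reviews
                     if rating_to_polarity(stars) is not None]

unbalanced_train, unbalanced_test = split_train_test(polarity_examples)
balanced_train, balanced_test = split_train_test(balance(polarity_examples))
```

In the unbalanced setting the class proportions stay as they are in the data. In the balanced setting every class is first reduced to the same number of reviews, so a classifier cannot score well just by predicting the majority class.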

Original language: English (US)
Title of host publication: Short Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 494-498
Number of pages: 5
ISBN (Print): 9781937284510
State: Published - Jan 1 2013
Event: 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013 - Sofia, Bulgaria
Duration: Aug 4 2013 to Aug 9 2013

Publication series

Name: ACL 2013 - 51st Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Volume: 2

Other

Other: 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013
Country: Bulgaria
City: Sofia
Period: 08/4/13 to 08/9/13

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
