ArabicNLP Dataset
| Category | URL |
|---|---|
| Papers | https://arxiv.org/ftp/arxiv/papers/1702/1702.07835.pdf |
| Sentiment Analysis | http://archive.ics.uci.edu/ml/datasets/Twitter+Data+set+for+Arabic+Sentiment+Analysis |
| Classification | http://archive.ics.uci.edu/ml/datasets/Opinion+Corpus+for+Lebanese+Arabic+Reviews+%28OCLAR%29 |
| Wikipedia | https://archive.org/details/arwiki-20190201 |
| Multiple | https://lionbridge.ai/datasets/best-arabic-datasets-for-machine-learning/ |
| | https://github.com/ibnmalik/golden-corpus-arabic/tree/develop/core |
| | https://github.com/linuxscout/tashkeela2 |
| Diacritization | https://github.com/AliOsm/arabic-text-diacritization/tree/master/dataset |
| | http://tanzil.net/download |
| | https://www.kaggle.com/datasets/linuxscout/tashkeela?resource=download |
| Crawls | https://traces1.inria.fr/oscar/ |
| | https://github.com/alisafaya/Arabic-BERT |
| | https://github.com/mohataher/arabic_big_corpus |
| | https://github.com/aosaimy/riyadh-corpus-collection |
| | https://github.com/anastaw/Arabic-Wikipedia-Corpus/blob/master/Wikipedia-Corpus-30-08-10.sql.gz |
| | https://github.com/antcorpus/RSSCrawlerArabicCorpus |
| | BBC Crawl: https://github.com/motazsaad/bbc-crawler |
| News | https://github.com/parallelfold/SaudiNewsNet |
| | https://github.com/motazsaad/Arabic-News |
| ArabicWeb16 | Labeled Dataset: https://sites.google.com/view/arabicweb16/download/labelled-datasets?authuser=0 |
| | Sample big: https://sites.google.com/view/arabicweb16/getting-started?authuser=0 https://drive.google.com/drive/folders/0B6P2zR7VKiV4SWdITFlXcmxObWM |
| Raw | https://github.com/Islamicate-DH/arabicCorpus |
| NER | https://github.com/EmnamoR/Arabic-named-entity-recognition |
| | https://github.com/RamziSalah/Classical-Arabic-Named-Entity-Recognition-Corpus |
| | https://www.cs.cmu.edu/~ark/ArabicNER/ |
| | https://github.com/juand-r/entity-recognition-datasets |
| | https://github.com/oudalab/Arabic-NER |
| | https://github.com/EmnamoR/Arabic-named-entity-recognition/tree/master/ |
| Keyphrase | https://github.com/ailab-uniud/akec |
| Sentiment Analysis | https://github.com/nora-twairesh/AraSenti |
| | https://github.com/almoslmi/masc |
| | https://github.com/marwanalomari/Sentiment-Classifier-Logistic-Regression-for-Arabic-Services-Reviews-in-Lebanon |
| | https://github.com/komari6/Arabic-twitter-corpus-AJGT |
| | https://tahatobaili.github.io/project-rbz/ |
| Speech data | http://www.cs.stir.ac.uk/~lss/arabic/ |
| | https://github.com/Anwarvic/Arabic-Speech-Recognition |
| Tashkeel | https://github.com/Anwarvic/Tashkeela-Model |
| Speech to text | https://github.com/motazsaad/jsc-news-broadcast |
| Misspellings | https://github.com/linuxscout/aghlat |
| Arabic/English Translation | https://github.com/meedan/news-memory |
| Poetry | https://github.com/d7eame/Matn |
| Word Embeddings | http://mazajak.inf.ed.ac.uk:8000/ |
| Stories | https://github.com/motazsaad/Arabic-Stories-Corpus |
| Dialects | https://github.com/motazsaad/corpus2json/tree/master/corpora/nizar_arabic_dialects |
| Opinion Mining | https://github.com/AhmedObaidi/omcca |
| POS + rel | https://github.com/salsama/Arabic-Information-Extraction-Corpus |
| | https://github.com/qcri/dialectal_arabic_pos_tagger |
| | https://github.com/seloufian/Arabic-PoS-Tagger |
| Annotated per nationality | https://github.com/Data-Science-for-Linguists-2020/Arabic-Learner-Corpus-Considerations |
| OCR | Digits: https://www.kaggle.com/mloey1/ahdd1 |
| | Letters: https://www.kaggle.com/mloey1/ahcd1 |
| | https://cactus.orange-labs.fr/ALIF/download.html |
| | http://www.ccse.kfupm.edu.sa/~husni/ArabicOCR/PATS-A02.htm |
| | http://kafd.ideas2serve.net/KAFDDownloadOptions.php |
| | https://github.com/ainawind27/arabicocr-data |
| | http://kitab-project.org/ |
| | https://medium.com/@openiti/openiti-aocp-9802865a6586 |
| | https://www.rdi-sotoor.com/#/login |
| | https://www.primaresearch.org/RASM2019/ |
| | https://blogs.bl.uk/digital-scholarship/2018/02/8th-century-arabic-scientists-meet-todays-computer-scientists.html |
| | https://blogs.bl.uk/digital-scholarship/2018/03/arabic-handwrittten-ocr.html |
| | Raw images: https://fromthepage.com/bldigital/arabic-scientific-manuscripts |
| Arabic conversation for chatbots | https://www.kaggle.com/ahmedkaramdev/arabic-conversational-dataset |
| WordNet | http://compling.hss.ntu.edu.sg/omw/ |
| | Resources: http://globalwordnet.org/resources/arabic-wordnet/arabic-resources/ |
| | http://compling.hss.ntu.edu.sg/omw/wns/arb/LICENSE |
| Other | https://www.al-fanarmedia.org/2018/11/an-online-arabic-dictionary-makes-its-debut/#.W__YlAjqMNw.twitter |
| | Treebank: https://sourceforge.net/projects/arabicsubcats/files/ |