Cognitionis
The little I know

NLP Resources


Software applications related to NLP are in the NLP Tools section. The Resources section covers ontologies/databases and corpora.

Ontologies/DB/Semi-structured

Something to do here in this mess…

Sowa has a good summary

http://www.jfsowa.com/

http://www.jfsowa.com/ontology/

http://www.jfsowa.com/ontology/ontoshar.htm

He also has a good book worth buying (Borja)

Cyc

SUMO

WordNet

Try text2onto software

WordNet: Lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.
(Fellbaum book)
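The synset idea described above can be sketched with a tiny hand-made lexicon. This is only a toy model under stated assumptions: the `Synset` class, the `LEXICON` list and the three entries are invented for illustration and are not WordNet's actual data or API.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonymous lemmas expressing one concept (toy model)."""
    name: str
    lemmas: list
    gloss: str
    # Conceptual-semantic "is-a" links to more general synsets.
    hypernyms: list = field(default_factory=list)


# Hand-made entries for illustration only; not copied from WordNet.
animal = Synset("animal.n", ["animal", "creature"], "a living organism that can move")
canine = Synset("canine.n", ["canine", "canid"], "a dog-like mammal", [animal])
dog = Synset("dog.n", ["dog", "domestic dog"], "a domesticated canine", [canine])

LEXICON = [animal, canine, dog]


def synsets_for(word):
    """Return every synset whose lemma list contains the word."""
    return [s for s in LEXICON if word in s.lemmas]


def hypernym_chain(synset):
    """Follow the first hypernym link upward until no link is left."""
    chain = [synset.name]
    while synset.hypernyms:
        synset = synset.hypernyms[0]
        chain.append(synset.name)
    return chain
```

Looking up a word returns the concepts it can express, and following the hypernym links walks from a specific concept to more general ones, which is the kind of navigation the interlinked synsets make possible.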

Wikipedia: An open source Web Encyclopedia (TODO: LINK TO cognitionis.com/it/wikipedia/)
(How to use) (mirror download)

Corpora

Available corpora

TreeBank Corpus

Brown

Multext?? download…

Available corpora for Spanish

Annotated:

  • AnCora (AnCora Basic Kit) (Official website)
  • Europarl (European Parliament parallel corpus) (http://www.statmt.org/europarl/)
  • TimeML?
  • TERN-2004 (English & Spanish)
  • CREA and CORDE (RAE)
  • UAM-treebank
  • LexESP CLiC-TALP
  • Cast3LB

Unannotated:

  • ECI/MCI Corpus (European Corpus) (Many languages) www.elsnet.org/resourecs/eciCorpus.html
  • Elaleph (literary texts) www.elaleph.com
  • BEC (religion/Christianity)
  • TimeBank Basic Kit (The same as PropBank in LDC same as Penn TreeBank same as WSJ and Brown corpus)
  • AnCora Basic Kit
  • Wikipedia Basic Kit
  • http://infomotions.com/alex/ (electronic documents of English classic literature)
  • The Holy Bible (zip, link)
  • Fairy tales and kids' stories
    • Little Red Riding Hood
    • Lily and the lion
  • http://www.inf.ed.ac.uk/resources/corpora/ (Edinburgh, many corpora)
  • Brown corpus, Penn Treebank (http://www.cis.upenn.edu/~treebank/)… do specialized pages for that.

Question sets

  • Common name based answer questions (zip, link)
  • TREC (link): Question sets with answers and human judgements
  • CLEF (link): Question sets with answers and human judgements (source Wikipedia… interesting)
  • OpenTrivia.com (link)

PoS (http://en.wikipedia.org/wiki/Part-of-speech_tagging) tags EAGLES, PAROLE …

A treebank or parsed corpus is a text corpus in which each sentence has been parsed, i.e. annotated with syntactic structure. Syntactic structure is commonly represented as a tree structure, hence the name Treebank. The term Parsed Corpus is often used interchangeably with Treebank, with the emphasis on the primacy of sentences rather than trees.

Treebanks are often created on top of a corpus that has already been annotated with part-of-speech tags. In turn, treebanks are sometimes enhanced with semantic or other linguistic information.
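As a rough illustration of what "annotated with syntactic structure" looks like in practice, the sketch below reads a sentence written in the bracketed style commonly used for Penn-style treebanks and recovers the part-of-speech layer from the leaves. The function names `parse_bracketed` and `tagged_words` and the example sentence are made up for this sketch; real treebank files carry extra details (traces, function tags) that it does not handle.

```python
import re


def parse_bracketed(text):
    """Parse a bracketed tree string into nested (label, children) tuples.

    A pre-terminal node looks like (tag, [word]); the word is a plain string.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])  # a word under a PoS tag
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read_node()


def tagged_words(node):
    """Collect (word, tag) pairs from the leaves, left to right."""
    label, children = node
    pairs = []
    for child in children:
        if isinstance(child, str):
            pairs.append((child, label))
        else:
            pairs.extend(tagged_words(child))
    return pairs


# Hand-written example sentence, not taken from any corpus.
tree = parse_bracketed("(S (NP (DT The) (NN cat)) (VP (VBD sat)))")
print(tagged_words(tree))  # [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD')]
```

The nested tuples are the tree; the flat list of pairs is the part-of-speech annotation the tree was built on top of.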

Penn Treebank II tags: http://bulba.sdsu.edu/jeanette/thesis/PennTags.html

Detects numeric quantities, even ones spelled out as words.

Alphabetical list of part-of-speech tags used in the Penn Treebank, Charniak Project:

Number. Tag Description
1. CC Coordinating conjunction
2. CD Cardinal number
3. DT Determiner
4. EX Existential there
5. FW Foreign word
6. IN Preposition or subordinating conjunction
7. JJ Adjective
8. JJR Adjective, comparative
9. JJS Adjective, superlative
10. LS List item marker
11. MD Modal
12. NN Noun, singular or mass
13. NNS Noun, plural
14. NNP Proper noun, singular
15. NNPS Proper noun, plural
16. PDT Predeterminer
17. POS Possessive ending
18. PRP Personal pronoun
19. PRP$ Possessive pronoun
20. RB Adverb
21. RBR Adverb, comparative
22. RBS Adverb, superlative
23. RP Particle
24. SYM Symbol
25. TO to
26. UH Interjection
27. VB Verb, base form
28. VBD Verb, past tense
29. VBG Verb, gerund or present participle
30. VBN Verb, past participle
31. VBP Verb, non-3rd person singular present
32. VBZ Verb, 3rd person singular present
33. WDT Wh-determiner
34. WP Wh-pronoun
35. WP$ Possessive wh-pronoun
36. WRB Wh-adverb
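The list above can be turned into a small lookup table for reading tagger output. The sketch below assumes the tagged text uses the `word/TAG` convention (a common format, but an assumption here, since no tagger output is shown on this page); `PENN_TAGS` and `describe` are illustrative names.

```python
# Tag descriptions copied from the list above.
PENN_TAGS = {
    "CC": "Coordinating conjunction", "CD": "Cardinal number",
    "DT": "Determiner", "EX": "Existential there", "FW": "Foreign word",
    "IN": "Preposition or subordinating conjunction",
    "JJ": "Adjective", "JJR": "Adjective, comparative",
    "JJS": "Adjective, superlative", "LS": "List item marker",
    "MD": "Modal", "NN": "Noun, singular or mass", "NNS": "Noun, plural",
    "NNP": "Proper noun, singular", "NNPS": "Proper noun, plural",
    "PDT": "Predeterminer", "POS": "Possessive ending",
    "PRP": "Personal pronoun", "PRP$": "Possessive pronoun",
    "RB": "Adverb", "RBR": "Adverb, comparative",
    "RBS": "Adverb, superlative", "RP": "Particle", "SYM": "Symbol",
    "TO": "to", "UH": "Interjection", "VB": "Verb, base form",
    "VBD": "Verb, past tense", "VBG": "Verb, gerund or present participle",
    "VBN": "Verb, past participle",
    "VBP": "Verb, non-3rd person singular present",
    "VBZ": "Verb, 3rd person singular present",
    "WDT": "Wh-determiner", "WP": "Wh-pronoun",
    "WP$": "Possessive wh-pronoun", "WRB": "Wh-adverb",
}


def describe(tagged_text):
    """Split 'word/TAG' tokens and attach each tag's description."""
    result = []
    for token in tagged_text.split():
        # Split on the last slash so words containing '/' stay intact.
        word, _, tag = token.rpartition("/")
        result.append((word, tag, PENN_TAGS.get(tag, "unknown tag")))
    return result


for word, tag, meaning in describe("The/DT cat/NN sat/VBD"):
    print(f"{word}\t{tag}\t{meaning}")
```

Tags that are not in the table (punctuation tags, for instance) come back as "unknown tag" rather than raising an error.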

PAROLE TAGSET (FREELING)

ABBREVIATION ABBREVIATED WORD
ADJ Adjective
ADP Adposition
ADV Adverb
ART Article
CON Conjunction
DET Determiner
INT Interjection
NOU Noun
NUM Numeral
PRN Pronoun
RES Residual
UNIQUE Unique Membership Class
VRB Verb

GEO INFO

In order to conduct geo-retrieval well, you may need resources such as gazetteers or ontologies. Here is a brief list of resources that we know about. Please contact Mark Sanderson if you have other resources you want added to this list.