NLP Resources
Software applications related to NLP are listed in the NLP Tools section. This Resources section covers ontologies/databases and corpora.
Ontologies/DB/Semi-structured
TODO: tidy up this section.
John F. Sowa has a good summary of ontologies:
http://www.jfsowa.com/
http://www.jfsowa.com/ontology/
http://www.jfsowa.com/ontology/ontoshar.htm
He has also written a good book worth buying (Borja)
Cyc
SUMO
WordNet
Try the Text2Onto software
WordNet: Lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.
(Fellbaum book)
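To make the synset idea concrete, here is a minimal toy model of synsets linked by a hypernym ("is-a") relation. It is a sketch, not WordNet's actual storage format: the `Synset` class, the `hypernym_path` and `synonyms` helpers, the `dog.n.01`-style ids and the three entries with their glosses are all invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of cognitive synonyms expressing one concept (toy model)."""
    name: str                      # hypothetical id in the style "dog.n.01"
    pos: str                       # "n", "v", "a" or "r"
    lemmas: list[str]              # the synonymous words in this set
    gloss: str                     # short definition (invented here)
    hypernyms: list[Synset] = field(default_factory=list)  # "is-a" links


def hypernym_path(synset: Synset) -> list[Synset]:
    """Follow the first hypernym link until a root concept is reached."""
    path = [synset]
    while path[-1].hypernyms:
        path.append(path[-1].hypernyms[0])
    return path


def synonyms(word: str, synsets: list[Synset]) -> set[str]:
    """Lemmas that share at least one synset with `word` (excluding it)."""
    result: set[str] = set()
    for s in synsets:
        if word in s.lemmas:
            result.update(lemma for lemma in s.lemmas if lemma != word)
    return result


# Toy data invented for illustration; these glosses are not WordNet's.
entity = Synset("entity.n.01", "n", ["entity"], "anything that exists")
animal = Synset("animal.n.01", "n", ["animal", "beast"], "a living creature", [entity])
dog = Synset("dog.n.01", "n", ["dog", "domestic_dog"], "a domesticated canine", [animal])
ALL_SYNSETS = [entity, animal, dog]

print([s.name for s in hypernym_path(dog)])  # ['dog.n.01', 'animal.n.01', 'entity.n.01']
print(synonyms("dog", ALL_SYNSETS))          # {'domestic_dog'}
```

For the real database, NLTK's `nltk.corpus.wordnet` module exposes comparable operations (`wn.synsets`, `Synset.hypernyms`, `Synset.lemmas`).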
Wikipedia: a free, collaboratively written web encyclopedia (TODO: LINK TO cognitionis.com/it/wikipedia/)
(How to use) (mirror download)
Corpora
Available corpora
TreeBank Corpus
Brown
Multext?? (download…)
…
Available corpora for Spanish
Annotated:
- AnCora (AnCora Basic Kit) (Official web)
- Europarl (European Parliament parallel corpus) (http://www.statmt.org/europarl/)
- TimeML?
- TERN-2004 (English & Spanish)
- CREA and CORDE (RAE)
- UAM-treebank
- LexESP CLiC-TALP
- Cast3LB
Unannotated:
- ECI/MCI corpus (European Corpus Initiative; many languages) www.elsnet.org/resourecs/eciCorpus.html
- Elaleph (literary texts) www.elaleph.com
- BEC (religion/Christianity)
- TimeBank Basic Kit (available from the LDC, like PropBank, the Penn Treebank, and the WSJ and Brown corpora)
- AnCora Basic Kit
- Wikipedia Basic Kit
- http://infomotions.com/alex/ (electronic texts of classic English literature)
- The Holy Bible (zip, link)
- Fairy tales and children's stories
- Little Red Riding Hood
- Lily and the lion
- http://www.inf.ed.ac.uk/resources/corpora/ (University of Edinburgh, many corpora)
- Brown corpus, Penn Treebank (http://www.cis.upenn.edu/~treebank/)… TODO: create dedicated pages for these.
Question sets
- Common name based answer questions (zip, link)
- TREC (link): question sets with answers and human judgements
- CLEF (link): question sets with answers and human judgements (source: Wikipedia; interesting)
- OpenTrivia.com (link)
PoS tags (http://en.wikipedia.org/wiki/Part-of-speech_tagging): EAGLES, PAROLE …
A treebank or parsed corpus is a text corpus in which each sentence has been parsed, i.e. annotated with syntactic structure. Syntactic structure is commonly represented as a tree, hence the name treebank. The term parsed corpus is often used interchangeably with treebank, with the emphasis on the primacy of sentences rather than trees.
Treebanks are often created on top of a corpus that has already been annotated with part-of-speech tags. In turn, treebanks are sometimes enhanced with semantic or other linguistic information.
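Treebank trees are usually stored as bracketed strings, with each part-of-speech tag sitting directly above its word. The following is a minimal stdlib-only sketch of reading such a string into nested `(label, children)` tuples and recovering the POS-tagged words; `parse_tree`, `tagged_words` and the example sentences are hypothetical, and the parser assumes well-formed input (it tolerates the unlabeled outer brackets often found in Penn Treebank files).

```python
import re

# Tokens are "(", ")" or any run of characters without spaces or brackets.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def parse_tree(text):
    """Parse a bracketed tree into (label, children); leaves are strings."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = ""                      # unlabeled node, e.g. "( (S ...) )"
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                        # consume the closing ")"
        return (label, children)

    tree = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def tagged_words(tree):
    """A node with a single string child is a preterminal: (word, tag)."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(tagged_words(child))
    return pairs


tree = parse_tree("(S (NP (DT The) (NN dog)) (VP (VBD barked)))")
print(tagged_words(tree))  # [('The', 'DT'), ('dog', 'NN'), ('barked', 'VBD')]
```

This shows why a treebank also serves as a POS-tagged corpus: the tags are the preterminal labels. NLTK provides a full-featured equivalent in `nltk.Tree.fromstring`.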
Penn Treebank II tags: http://bulba.sdsu.edu/jeanette/thesis/PennTags.html
Numeric quantities are detected even when spelled out as words (e.g. "three" is tagged CD).
Alphabetical list of part-of-speech tags used in the Penn Treebank (Charniak project):
| Number | Tag | Description |
| 1. | CC | Coordinating conjunction |
| 2. | CD | Cardinal number |
| 3. | DT | Determiner |
| 4. | EX | Existential there |
| 5. | FW | Foreign word |
| 6. | IN | Preposition or subordinating conjunction |
| 7. | JJ | Adjective |
| 8. | JJR | Adjective, comparative |
| 9. | JJS | Adjective, superlative |
| 10. | LS | List item marker |
| 11. | MD | Modal |
| 12. | NN | Noun, singular or mass |
| 13. | NNS | Noun, plural |
| 14. | NNP | Proper noun, singular |
| 15. | NNPS | Proper noun, plural |
| 16. | PDT | Predeterminer |
| 17. | POS | Possessive ending |
| 18. | PRP | Personal pronoun |
| 19. | PRP$ | Possessive pronoun |
| 20. | RB | Adverb |
| 21. | RBR | Adverb, comparative |
| 22. | RBS | Adverb, superlative |
| 23. | RP | Particle |
| 24. | SYM | Symbol |
| 25. | TO | to |
| 26. | UH | Interjection |
| 27. | VB | Verb, base form |
| 28. | VBD | Verb, past tense |
| 29. | VBG | Verb, gerund or present participle |
| 30. | VBN | Verb, past participle |
| 31. | VBP | Verb, non-3rd person singular present |
| 32. | VBZ | Verb, 3rd person singular present |
| 33. | WDT | Wh-determiner |
| 34. | WP | Wh-pronoun |
| 35. | WP$ | Possessive wh-pronoun |
| 36. | WRB | Wh-adverb |
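Tagged corpora such as Brown or the Penn Treebank are often distributed as `word/TAG` tokens. Below is a minimal sketch, assuming that token format, of splitting such text and grouping the fine-grained tags above into coarse classes by prefix. The `parse_tagged`, `coarse` and `coarse_counts` helpers and the prefix grouping are illustrative choices, not part of the Penn Treebank; punctuation tags, which the table does not list, are counted as "unlisted".

```python
from collections import Counter

# The 36 tags from the table above (punctuation tags are not included).
PENN_TAGS = {
    "CC", "CD", "DT", "EX", "FW", "IN", "JJ", "JJR", "JJS", "LS", "MD",
    "NN", "NNS", "NNP", "NNPS", "PDT", "POS", "PRP", "PRP$", "RB", "RBR",
    "RBS", "RP", "SYM", "TO", "UH", "VB", "VBD", "VBG", "VBN", "VBP",
    "VBZ", "WDT", "WP", "WP$", "WRB",
}

# Hypothetical coarse grouping by tag prefix (NN* nouns, VB* verbs, ...).
COARSE_PREFIXES = (("NN", "noun"), ("VB", "verb"), ("JJ", "adjective"), ("RB", "adverb"))


def parse_tagged(text):
    """Split 'word/TAG' tokens; rpartition keeps slashes inside the word."""
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("/")
        if not sep or not word:
            raise ValueError(f"not a word/TAG token: {token!r}")
        pairs.append((word, tag))
    return pairs


def coarse(tag):
    """Map a Penn tag to a coarse class, or 'other' if no prefix matches."""
    for prefix, name in COARSE_PREFIXES:
        if tag.startswith(prefix):
            return name
    return "other"


def coarse_counts(text):
    """Count coarse classes; tags missing from the table count as 'unlisted'."""
    counts = Counter()
    for _, tag in parse_tagged(text):
        counts[coarse(tag) if tag in PENN_TAGS else "unlisted"] += 1
    return counts


print(parse_tagged("3/4/CD cups/NNS"))  # [('3/4', 'CD'), ('cups', 'NNS')]
print(coarse_counts("Three/CD dogs/NNS barked/VBD loudly/RB ./."))
```

Note that the spelled-out "Three" carries the CD tag, and that prefix matching keeps PRP, PRP$ and WRB out of the noun and adverb groups.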
PAROLE TAGSET (FREELING)
| Abbreviation | Full name |
| ADJ | Adjective |
| ADP | Adposition |
| ADV | Adverb |
| ART | Article |
| CON | Conjunction |
| DET | Determiner |
| INT | Interjection |
| NOU | Noun |
| NUM | Numeral |
| PRN | Pronoun |
| RES | Residual |
| UNIQUE | Unique Membership Class |
| VRB | Verb |
GEO INFO
In order to conduct geo-retrieval well, you may need resources such as gazetteers or ontologies. Here is a brief list of resources that we know about. Please contact Mark Sanderson if you have other resources you want added to this list.
- GeoNames geocoding web service: http://www.geonames.org/
- Geographical ontology for Portugal: http://xldb.fc.ul.pt/geonetpt/
- Alexandria Digital Library Gazetteer Server: http://middleware.alexandria.ucsb.edu/client/gaz/adl/index.jsp
- GEOnet World Place Names Server: http://earth-info.nga.mil/gns/html/index.html
- WorldGazetteer: http://www.world-gazetteer.com
- TGN (Getty Thesaurus): http://www.getty.edu/research/conducting_research/vocabularies/tgn/index.html
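A gazetteer is essentially a lookup table from place names to locations. The sketch below assumes a tiny in-memory gazetteer and spots toponyms in text by longest match, so that "New York" wins over a shorter entry. The `GAZETTEER` entries (with approximate coordinates) and the `find_toponyms` helper are hypothetical; a real system would load names from one of the resources above and would also have to resolve ambiguous names, which this sketch ignores.

```python
# Hypothetical toy gazetteer: name -> (country, approx. latitude, approx. longitude).
GAZETTEER = {
    "madrid": ("Spain", 40.4168, -3.7038),
    "lisbon": ("Portugal", 38.7223, -9.1393),
    "new york": ("United States", 40.7128, -74.0060),
}
# Longest entry, in words, bounds the match window.
MAX_LEN = max(len(name.split()) for name in GAZETTEER)


def find_toponyms(text):
    """Return (name, entry) pairs found in text, preferring longest matches."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest window first, then shrink it one word at a time.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in GAZETTEER:
                found.append((candidate, GAZETTEER[candidate]))
                i += n
                break
        else:
            i += 1  # no place name starts at this token
    return found


for name, (country, lat, lon) in find_toponyms("Flights from New York to Madrid, and Lisbon."):
    print(name, country, lat, lon)
```

Longest-match lookup is a simple, common baseline for toponym spotting; it treats every gazetteer hit as a place, so names that are also ordinary words would need extra filtering.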