Named Entities (abbreviated as NEs) are units that can be denoted by
proper names, such as the names of individuals, institutions,
businesses, countries, cities and brands. NEs also include the names
of specialized techniques, domains and software systems, and over
time newer and more complex units have been added to the class.
Named Entity Recognition (abbreviated as NER) systems, which identify
NEs and assign them to specific classes, have gained importance with
the growing need for fast and efficient Portuguese translation.
REPENTINO is a readily available repository of Portuguese named
entities whose data is useful for designing NER systems that help
translators deliver efficient Portuguese machine translation.
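One common way to use a named-entity lexicon in a NER system is gazetteer (dictionary) lookup. The sketch below is a minimal illustration, not REPENTINO's own tooling: it assumes a hypothetical in-memory export that maps surface forms to (main category, subcategory) pairs. The `LEXICON` entries, their category assignments and the `tag` function are all invented for the example and do not reflect REPENTINO's actual file format or contents. It performs greedy longest-match over whitespace-separated tokens.

```python
from typing import Dict, List, Optional, Tuple

Label = Tuple[str, str]        # (main category, subcategory)
Span = Tuple[int, int, Label]  # (start token, end token exclusive, label)

# Hypothetical entries for illustration only; NOT taken from REPENTINO.
LEXICON: Dict[str, Label] = {
    "Portugal": ("Location", "Country"),
    "Lisboa": ("Location", "Administrative Division/Region/Town"),
    "Rio Tejo": ("Location", "Hydrographic"),
    "Luís de Camões": ("Beings", "Human"),
}


def tag(tokens: List[str], lexicon: Dict[str, Label]) -> List[Span]:
    """Greedy longest-match gazetteer lookup over a list of tokens."""
    # Index (possibly multiword) entries by their token tuples for exact matching.
    entries = {tuple(form.split()): label for form, label in lexicon.items()}
    max_len = max((len(key) for key in entries), default=0)
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        match: Optional[Span] = None
        # Try the longest candidate first, so a multiword entry wins over
        # any shorter entry starting at the same token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = entries.get(tuple(tokens[i:i + n]))
            if label is not None:
                match = (i, i + n, label)
                break
        if match is not None:
            spans.append(match)
            i = match[1]  # continue after the matched entity
        else:
            i += 1
    return spans


if __name__ == "__main__":
    for start, end, (main, sub) in tag("O Rio Tejo atravessa Lisboa".split(), LEXICON):
        print(start, end, main, sub)
```

Exact, case-sensitive matching is a deliberate simplification here; a practical system would likely also need normalization and some form of disambiguation, since the same surface form can belong to more than one category.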
REPENTINO organizes its NEs in a hierarchy of categories suited to
the different kinds of NE instances involved. The hierarchy consists
of 11 main categories and 97 subcategories. For example, the main
category ‘Location’ has 16 subcategories, whereas the main category
‘Paperwork’ has 8. The 11 main categories are: Location,
Organizations, Beings, Event, Products, Art and Media, Paperwork,
Substance, Abstraction, Nature and Miscellaneous.
Of these main categories, ‘Beings’ can be considered the most
significant for designing NER systems for Portuguese machine
translation, mainly because it accounts for two thirds of the
instances stored in REPENTINO. The ‘Beings’ category covers real,
imaginary and legendary beings, and it is further divided into six
subcategories: Ethnic, Human, Human-Collective, Mythological,
Non-Human and Other.
The main category ‘Location’ includes NEs identified chiefly by their
geographical position in the universe. It ranks second in
significance among REPENTINO's categories and is divided into 16
subcategories: Address, Administrative Division/Region/Town, Country,
Civil/Administration/Military, Commercial/Industrial/Financial,
Hydrographic, Heritage/Monuments, Infrastructure/Facility, Loose
Address, Mythological/Fictional, Religious, Space, Socio-Cultural,
Real-Estate, Terrestrial and Other.
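The category hierarchy described above can be represented as a simple two-level mapping. This sketch fills in only the subcategory lists that the text spells out in full (‘Beings’ and ‘Location’) and leaves the remaining branches empty rather than guessing them. The `Main::Sub` label format and the helper names `make_label` and `coarse_label` are hypothetical; the `::` separator is chosen because several subcategory names already contain a slash.

```python
from typing import Dict, List

# The 11 main categories named in the text.
MAIN_CATEGORIES: List[str] = [
    "Location", "Organizations", "Beings", "Event", "Products",
    "Art and Media", "Paperwork", "Substance", "Abstraction",
    "Nature", "Miscellaneous",
]

# Subcategories per main category; only the fully listed branches are filled in.
SUBCATEGORIES: Dict[str, List[str]] = {main: [] for main in MAIN_CATEGORIES}
SUBCATEGORIES["Beings"] = [
    "Ethnic", "Human", "Human-Collective", "Mythological", "Non-Human", "Other",
]
SUBCATEGORIES["Location"] = [
    "Address", "Administrative Division/Region/Town", "Country",
    "Civil/Administration/Military", "Commercial/Industrial/Financial",
    "Hydrographic", "Heritage/Monuments", "Infrastructure/Facility",
    "Loose Address", "Mythological/Fictional", "Religious", "Space",
    "Socio-Cultural", "Real-Estate", "Terrestrial", "Other",
]

SEP = "::"  # hypothetical separator; "/" is avoided because names contain it


def make_label(main: str, sub: str) -> str:
    """Build a fine-grained label, rejecting pairs absent from the hierarchy."""
    if main not in SUBCATEGORIES:
        raise ValueError(f"unknown main category: {main!r}")
    if sub not in SUBCATEGORIES[main]:
        raise ValueError(f"{sub!r} is not a known subcategory of {main!r}")
    return f"{main}{SEP}{sub}"


def coarse_label(label: str) -> str:
    """Back off a fine-grained label to its main category."""
    return label.split(SEP, 1)[0]


if __name__ == "__main__":
    label = make_label("Location", "Hydrographic")
    print(label, "->", coarse_label(label))
```

Backing off to the main category is one way a tagger could be evaluated at the coarse level while keeping the finer labels. Note that `make_label` rejects every subcategory of a branch left empty here, so those lists would have to be filled in from REPENTINO itself before use.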
The other significant categories of REPENTINO are ‘Organizations’ and
‘Event’. The ‘Organizations’ category covers NEs denoting groups of
people with a defined structure that function together as a single
body to achieve a goal while complying with specific regulations. Its
subcategories include Company, Socio-Cultural, Interest Groups,
Religious and Sports. The ‘Event’ category, on the other hand, covers
NEs that occupy a specific time period with a fixed start and end. A
few of its subcategories are Scientific, Sports, Socio-Cultural and
Political.
REPENTINO is a valuable resource for developing tools for Portuguese
machine translation. It has already been applied in Named Entity
Recognition systems, and this use supports its value for further
applications in the Portuguese translation field.