extractor features

libextractor is a library for extracting metadata from files of any type. It was designed to rely on other libraries for the format-specific work of extracting metadata, and to be easy to extend: support for additional file types is added simply by linking in external extractors. libextractor is part of the GNU project.
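The plugin design described above can be pictured as a registry that maps MIME types to extractor functions, each returning (keyword type, value) pairs. The sketch below is a pure-Python illustration of that idea only. The names `register`, `extract_all` and `first_line_title` are hypothetical and are not part of libextractor's C API or its Python binding.

```python
from typing import Callable, Dict, List, Tuple

# A keyword is a (keyword type, value) pair, e.g. ("title", "Quarterly report")
Keyword = Tuple[str, str]
Plugin = Callable[[bytes], List[Keyword]]

# Hypothetical registry: MIME type -> extractor plugins able to handle it
_PLUGINS: Dict[str, List[Plugin]] = {}


def register(mimetype: str, plugin: Plugin) -> None:
    """Attach a plugin to a MIME type (hypothetical, not libextractor's API)."""
    _PLUGINS.setdefault(mimetype, []).append(plugin)


def extract_all(data: bytes, mimetype: str) -> List[Keyword]:
    """Run every plugin registered for the MIME type and merge their keywords."""
    keywords: List[Keyword] = [("mimetype", mimetype)]
    for plugin in _PLUGINS.get(mimetype, []):
        keywords.extend(plugin(data))
    return keywords


def first_line_title(data: bytes) -> List[Keyword]:
    """Toy plain-text plugin: report the first non-empty line as the title."""
    for line in data.decode("utf-8", errors="replace").splitlines():
        if line.strip():
            return [("title", line.strip())]
    return []


register("text/plain", first_line_title)

if __name__ == "__main__":
    print(extract_all(b"Quarterly report\nbody text", "text/plain"))
    # [('mimetype', 'text/plain'), ('title', 'Quarterly report')]
```

Supporting a new file type then only means registering another function, which mirrors how libextractor gains new formats by linking additional extractors.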

Advantages:

  • Extracts metadata from any type of file
  • Written in C, with bindings for other languages (in our case, Python)

The Python binding can list the keyword (metadata) types the library recognises:

from extractor import Extractor
extractor = Extractor()
extractor.keywordTypes()

('unknown', 'filename', 'mimetype', 'title', 'author', 'artist', 'description', 'comment', 'date', 'publisher', 'language', 'album', 'genre', 'location', 'version', 'organization', 'copyright', 'subject', 'keywords', 'contributor', 'resource-type', 'format', 'resource-identifier', 'source', 'relation', 'coverage', 'software', 'disclaimer', 'warning', 'translated', 'creation date', 'modification date', 'creator', 'producer', 'page count', 'page orientation', 'paper size', 'used fonts', 'page order', 'created for', 'magnification', 'release', 'group', 'size', 'summary', 'packager', 'vendor', 'license', 'distribution', 'build-host', 'os', 'dependency', 'MD4', 'MD5', 'SHA-0', 'SHA-1', 'RipeMD160', 'resolution', 'category', 'book title', 'priority', 'conflicts', 'replaces', 'provides', 'conductor', 'interpreter', 'owner', 'lyrics', 'media type', 'contact', 'binary thumbnail data', 'publication date', 'camera make', 'camera model', 'exposure', 'aperture', 'exposure bias', 'flash', 'flash bias', 'focal length', 'focal length (35mm equivalent)', 'iso speed', 'exposure mode', 'metering mode', 'macro mode', 'image quality', 'white balance', 'orientation')
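These names can be used to validate and organise extraction results. The sketch below assumes results arrive as (keyword type, value) pairs; that is an assumption about the output format, not a documented signature of the binding. `KNOWN_TYPES` is a hand-copied subset of the tuple above and `group_by_type` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hand-copied subset of the keywordTypes() tuple above
KNOWN_TYPES = frozenset(
    ("unknown", "filename", "mimetype", "title", "author", "date",
     "creation date", "camera make", "camera model", "size")
)


def group_by_type(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect values per keyword type; types outside KNOWN_TYPES go to 'unknown'."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for kind, value in pairs:
        grouped[kind if kind in KNOWN_TYPES else "unknown"].append(value)
    return dict(grouped)


if __name__ == "__main__":
    sample = [("mimetype", "image/jpeg"), ("camera make", "Canon"),
              ("camera make", "Canon Inc."), ("exif-foo", "bar")]
    print(group_by_type(sample))
    # {'mimetype': ['image/jpeg'], 'camera make': ['Canon', 'Canon Inc.'], 'unknown': ['bar']}
```

Falling back to 'unknown' reuses the first entry of the keyword type tuple rather than inventing a new category.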

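The ontology predicates in the mapping table are written with namespace objects defined in a local module, namespaces.py, which is imported with:

from namespaces import *

namespaces.py: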

"""Common namespaces"""
 
from rdflib import Namespace
 
SIOC = Namespace(u'http://rdfs.org/sioc/ns#')
RDF = Namespace(u'http://www.w3.org/1999/02/22-rdf-syntax-ns#')
RDFS = Namespace(u'http://www.w3.org/2000/01/rdf-schema#')
DC = Namespace(u'http://purl.org/dc/elements/1.1/')
DCTERMS = Namespace(u'http://purl.org/dc/terms/')
FOAF = Namespace(u'http://xmlns.com/foaf/0.1/')
GEO = Namespace(u'http://www.w3.org/2003/01/geo/wgs84_pos#')
MVCB = Namespace(u'http://webns.net/mvcb/')
ICAL = Namespace(u'http://www.w3.org/2002/12/cal/icaltzd#')
XSD = Namespace(u'http://www.w3.org/2001/XMLSchema#')
PFO = Namespace(u'http://www.kaskaras.net/pfo/0.1/') # Pathetic FileSystem Ontology
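Each Namespace object expands an item lookup into a full URI, so DC['creator'] in the mapping table stands for http://purl.org/dc/elements/1.1/creator. The stand-in class below only mimics that behaviour with plain strings so that the example runs without rdflib installed; rdflib's own Namespace returns a URIRef rather than a plain str.

```python
class Namespace(str):
    """Minimal stand-in for rdflib.Namespace: ns['term'] yields base URI + 'term'."""

    def __getitem__(self, key):
        # Plain string concatenation; rdflib's own class returns a URIRef instead.
        return str(self) + key


DC = Namespace("http://purl.org/dc/elements/1.1/")
DCTERMS = Namespace("http://purl.org/dc/terms/")

if __name__ == "__main__":
    print(DC["creator"])          # http://purl.org/dc/elements/1.1/creator
    print(DCTERMS["hasVersion"])  # http://purl.org/dc/terms/hasVersion
```

The 1.1 element set behind DC defines only the 15 core elements; refinements such as hasVersion live under http://purl.org/dc/terms/, so DCTERMS['hasVersion'] is probably what the "format version" row intends.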
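Mapping of extractor keyword names to ontology predicates (blank cells: no predicate assigned):

| name | ontology | predicate | qualified predicate |
| album | | | |
| aperture | | | |
| artist | DC | DC['creator'] | |
| author | DC | DC['creator'] | |
| binary thumbnail data | | | |
| book title | DC | DC['title'] | |
| build-host | | | |
| camera make | | | |
| camera model | | | |
| category | | | |
| chapter | | | |
| character count | | | |
| character set | | | |
| comment | | | |
| company | | | |
| conductor | | | |
| conflicts | | | |
| contact | | | |
| content type | | | |
| contributor | DC | DC['creator'] | |
| copyright | DC | DC['rights'] | |
| coverage | DC | DC['coverage'] | |
| created by software | DC | DC['creator'] | |
| created for | | | |
| creation date | DC | DC['date'] | |
| creator | DC | DC['creator'] | |
| date | DC | DC['date'] | |
| dependency | DC | DC['relation'] | |
| description | DC | DC['description'] | |
| director | | | |
| disclaimer | | | |
| distribution | | | |
| duration | | | |
| editing cycles | | | |
| encoded by | | | |
| exposure | | | |
| exposure bias | | | |
| exposure mode | | | |
| filename | DC | DC['title'] | |
| filesize | | | |
| flash | | | |
| flash bias | | | |
| focal length | | | |
| focal length (35mm equivalent) | | | |
| format | DC | DC['format'] | |
| format version | DC | DC['relation'] | DC['hasVersion'] |
| full name | | | |
| generator | | | |
| genre | | | |
| group | | | |
| hardware dependency | | | |
| image quality | | | |
| information | | | |
| interpreter | | | |
| iso speed | | | |
| keywords | | | |
| language | | | |
| last printed | | | |
| last saved by | | | |
| license | | DC['rights'] | |
| line count | | | |
| link | | | |
| location | | | |
| lower case conversion | | | |
| lyrics | | | |
| macro mode | | | |
| magnification | | | |
| manager | | | |
| MD4 | | | |
| MD5 | | | |
| media type | | | |
| metering mode | | | |
| mimetype | | | |
| modification date | | | |
| modified by software | | | |
| mood | | | |
| music CD identifier | | | |
| musician credits list | | | |
| operating system | | | |
| organization | | | |
| orientation | | | |
| owner | | | |
| packager | | | |
| page count | | | |
| page order | | | |
| page orientation | | | |
| paper size | | | |
| paragraph count | | | |
| play counter | | | |
| popularity meter | | | |
| priority | | | |
| producer | | | |
| product version | | | |
| provides | | | |
| publication date | | | |
| publisher | | | |
| relation | | | |
| release | | | |
| replaces | | | |
| resolution | | | |
| resource-identifier | | | |
| resource-type | | | |
| revision history | | | |
| RipeMD160 | | | |
| ripper | | | |
| scale | | | |
| security | | | |
| SHA-0 | | | |
| SHA-1 | | | |
| size | | | |
| software | | | |
| song count | | | |
| source | | | |
| split | | | |
| starting song | | | |
| subject | | | |
| summary | | | |
| television system | | | |
| template | | | |
| thumbnails | | | |
| time | | | |
| title | | | |
| total editing time | | | |
| translated | | | |
| unknown | | | |
| used fonts | | | |
| vendor | | | |
| version | | | |
| warning | | | |
| white balance | | | |
| word count | | | |
| year | | | |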
 
python-extractore-features.txt · Last modified: 05/11/2007 07:29 by kaskaras