extractor features
libextractor is a library for extracting metadata from files of any type. It was designed to rely on other libraries for the actual metadata extraction, and to be easy to extend simply by linking in external extractors for additional file types. libextractor is part of the GNU project.
Advantages:
- Metadata extraction from files of any type
- Written in C, with bindings for other languages (Python in our case)
from extractor import Extractor
extractor = Extractor()
extractor.keywordTypes()
('unknown', 'filename', 'mimetype', 'title', 'author', 'artist', 'description', 'comment', 'date', 'publisher', 'language', 'album', 'genre', 'location', 'version', 'organization', 'copyright', 'subject', 'keywords', 'contributor', 'resource-type', 'format', 'resource-identifier', 'source', 'relation', 'coverage', 'software', 'disclaimer', 'warning', 'translated', 'creation date', 'modification date', 'creator', 'producer', 'page count', 'page orientation', 'paper size', 'used fonts', 'page order', 'created for', 'magnification', 'release', 'group', 'size', 'summary', 'packager', 'vendor', 'license', 'distribution', 'build-host', 'os', 'dependency', 'MD4', 'MD5', 'SHA-0', 'SHA-1', 'RipeMD160', 'resolution', 'category', 'book title', 'priority', 'conflicts', 'replaces', 'provides', 'conductor', 'interpreter', 'owner', 'lyrics', 'media type', 'contact', 'binary thumbnail data', 'publication date', 'camera make', 'camera model', 'exposure', 'aperture', 'exposure bias', 'flash', 'flash bias', 'focal length', 'focal length (35mm equivalent)', 'iso speed', 'exposure mode', 'metering mode', 'macro mode', 'image quality', 'white balance', 'orientation')
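The keyword types listed above are the labels attached to each extracted value. As a minimal sketch, assuming an extraction run yields `(keyword type, value)` pairs, the pairs can be grouped by type before any further processing. The `group_keywords` helper and the sample data are hypothetical and only serve as illustration; they are not part of the binding.

```python
from collections import defaultdict


def group_keywords(pairs):
    """Group (keyword_type, value) pairs into {keyword_type: [values]}."""
    grouped = defaultdict(list)
    for keyword_type, value in pairs:
        grouped[keyword_type].append(value)
    return dict(grouped)


# Made-up sample standing in for the result of an extraction run
# (assumption: the binding returns pairs shaped like these).
sample = [
    ('mimetype', 'audio/mpeg'),
    ('title', 'Example Song'),
    ('artist', 'Example Artist'),
    ('artist', 'Another Artist'),
]

metadata = group_keywords(sample)
```

Keeping a list per keyword type preserves repeated entries, such as several artists for the same file.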
from namespaces import *
namespaces.py:
"""Common namespaces"""
from rdflib import Namespace

SIOC = Namespace(u'http://rdfs.org/sioc/ns#')
RDF = Namespace(u'http://www.w3.org/1999/02/22-rdf-syntax-ns#')
RDFS = Namespace(u'http://www.w3.org/2000/01/rdf-schema#')
DC = Namespace(u'http://purl.org/dc/elements/1.1/')
DCTERMS = Namespace(u'http://purl.org/dc/terms/')
FOAF = Namespace(u'http://xmlns.com/foaf/0.1/')
GEO = Namespace(u'http://www.w3.org/2003/01/geo/wgs84_pos#')
MVCB = Namespace(u'http://webns.net/mvcb/')
ICAL = Namespace(u'http://www.w3.org/2002/12/cal/icaltzd#')
XSD = Namespace(u'http://www.w3.org/2001/XMLSchema#')
PFO = Namespace(u'http://www.kaskaras.net/pfo/0.1/')  # Pathetic FileSystem Ontology
| name | ontology | predicate | qualified predicate |
|---|---|---|---|
| album | |||
| aperture | |||
| artist | DC | DC['creator'] | |
| author | DC | DC['creator'] | |
| binary thumbnail data | |||
| book title | DC | DC['title'] | |
| build-host | |||
| camera make | |||
| camera model | |||
| category | |||
| chapter | |||
| character count | |||
| character set | |||
| comment | |||
| company | |||
| conductor | |||
| conflicts | |||
| contact | |||
| content type | |||
| contributor | DC | DC['creator'] | |
| copyright | DC | DC['rights'] | |
| coverage | DC | DC['coverage'] | |
| created by software | DC | DC['creator'] | |
| created for | |||
| creation date | DC | DC['date'] | |
| creator | DC | DC['creator'] | |
| date | DC | DC['date'] | |
| dependency | DC | DC['relation'] | |
| description | DC | DC['description'] | |
| director | |||
| disclaimer | |||
| distribution | |||
| duration | |||
| editing cycles | |||
| encoded by | |||
| exposure | |||
| exposure bias | |||
| exposure mode | |||
| filename | DC | DC['title'] | |
| filesize | |||
| flash | |||
| flash bias | |||
| focal length | |||
| focal length (35mm equivalent) | |||
| format | DC | DC['format'] | |
| format version | DC | DC['relation'] | DCTERMS['hasVersion'] |
| full name | |||
| generator | |||
| genre | |||
| group | |||
| hardware dependency | |||
| image quality | |||
| information | |||
| interpreter | |||
| iso speed | |||
| keywords | |||
| language | |||
| last printed | |||
| last saved by | |||
| license | DC | DC['rights'] | |
| line count | |||
| link | |||
| location | |||
| lower case conversion | |||
| lyrics | |||
| macro mode | |||
| magnification | |||
| manager | |||
| MD4 | |||
| MD5 | |||
| media type | |||
| metering mode | |||
| mimetype | |||
| modification date | |||
| modified by software | |||
| mood | |||
| music CD identifier | |||
| musician credits list | |||
| operating system | |||
| organization | |||
| orientation | |||
| owner | |||
| packager | |||
| page count | |||
| page order | |||
| page orientation | |||
| paper size | |||
| paragraph count | |||
| play counter | |||
| popularity meter | |||
| priority | |||
| producer | |||
| product version | |||
| provides | |||
| publication date | |||
| publisher | |||
| relation | |||
| release | |||
| replaces | |||
| resolution | |||
| resource-identifier | |||
| resource-type | |||
| revision history | |||
| RipeMD160 | |||
| ripper | |||
| scale | |||
| security | |||
| SHA-0 | |||
| SHA-1 | |||
| size | |||
| software | |||
| song count | |||
| source | |||
| split | |||
| starting song | |||
| subject | |||
| summary | |||
| television system | |||
| template | |||
| thumbnails | |||
| time | |||
| title | |||
| total editing time | |||
| translated | |||
| unknown | |||
| used fonts | |||
| vendor | |||
| version | |||
| warning | |||
| white balance | |||
| word count | |||
| year |
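The table above can be read as a lookup from keyword type to Dublin Core predicate. The following is a minimal sketch of that lookup, using plain URI strings instead of rdflib so that it runs on its own; with the `namespaces.py` module, `DC_NS + 'creator'` would be written as `DC['creator']`. The names `KEYWORD_TO_DC` and `to_triples` and the sample subject URI are hypothetical, and keyword types without a predicate in the table are simply skipped.

```python
DC_NS = 'http://purl.org/dc/elements/1.1/'

# Keyword types from the table that already have a DC predicate assigned.
KEYWORD_TO_DC = {
    'artist': 'creator',
    'author': 'creator',
    'book title': 'title',
    'contributor': 'creator',
    'copyright': 'rights',
    'coverage': 'coverage',
    'created by software': 'creator',
    'creation date': 'date',
    'creator': 'creator',
    'date': 'date',
    'dependency': 'relation',
    'description': 'description',
    'filename': 'title',
    'format': 'format',
    'format version': 'relation',
    'license': 'rights',
}


def to_triples(subject, pairs):
    """Turn (keyword_type, value) pairs into (subject, predicate, value) triples."""
    triples = []
    for keyword_type, value in pairs:
        local_name = KEYWORD_TO_DC.get(keyword_type)
        if local_name is None:
            continue  # no predicate assigned in the table yet
        triples.append((subject, DC_NS + local_name, value))
    return triples


# Made-up input: 'album' has no mapping in the table, so it is dropped.
triples = to_triples(
    'file:///tmp/example.mp3',
    [('artist', 'Example Artist'), ('album', 'Example Album')],
)
```

Skipping unmapped types keeps the output limited to what the table defines; the empty rows can be added to the dictionary as predicates are chosen for them.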
Discussion