September 17, 2004
ShaRef
Caught this in D-Lib, an online Info Science journal. The project sounds promising but still appears to store the citation metadata separately from the papers themselves.
ShaRef is a project aiming to bridge the gap between centralized library catalogs and isolated bibliographies managed by individual users. Typically, bibliographic information is required at many different levels of an academic institution, such as individual researchers, research groups, interdisciplinary collaboration among groups, and administration (for example, preparing annual reports of the department or university). In many cases, it requires a lot of effort to manage the information flow as bibliographic information is moved, copied, and edited in many different locations. ShaRef will provide a tool and a service to support this scenario, by striking a balance between individually stored information and sharing capabilities based on groups of users.
August 24, 2004
Executable for updating PDF metadata
It’s not XMP but it will update the document information dictionaries … I got this in the mail today. It goes to show the value of spreading your requirements to the nine winds …
From: ssteward@AccessPDF.com
Subject: Themp: add/change PDF metadata
Date: August 24, 2004 7:21:45 PM EDT
To: james@freelancepropaganda.com
Hello-
Just following up on your old comp.text.pdf thread RE changing PDF metadata in C. The latest version of pdftk allows you to add/change/remove metadata in the PDF’s Info dictionary (but not the XMP stream). Pdftk is a free (GPL) stand-alone executable that runs on many platforms: http://www.accesspdf.com/pdftk/
HTH-
Sid
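For the record, here is a minimal sketch of how the Info dictionary update Sid describes might be driven from a script. It assumes pdftk is installed and on the PATH, and that the data file uses the InfoKey/InfoValue pairs that pdftk's dump_data operation prints (check this against your pdftk version; values here are assumed to be plain ASCII). The function names are my own, not part of pdftk.

```python
import subprocess
from pathlib import Path


def render_info_file(fields: dict) -> str:
    """Render Info dictionary entries as InfoKey/InfoValue pairs.

    Assumption: this mirrors the layout printed by `pdftk in.pdf dump_data`.
    """
    lines = []
    for key, value in fields.items():
        lines.append(f"InfoKey: {key}")
        lines.append(f"InfoValue: {value}")
    return "\n".join(lines) + "\n"


def update_info_command(src: str, info_file: str, dst: str) -> list:
    """Build the pdftk command line; pdftk writes a new file, not in place."""
    return ["pdftk", src, "update_info", info_file, "output", dst]


def update_pdf_info(src: str, dst: str, fields: dict) -> None:
    """Write the data file next to the output PDF, then run pdftk."""
    info_path = Path(dst).with_suffix(".info.txt")
    info_path.write_text(render_info_file(fields), encoding="utf-8")
    subprocess.run(update_info_command(src, str(info_path), dst), check=True)
```

Calling update_pdf_info("paper.pdf", "paper-tagged.pdf", {"Title": "...", "Author": "..."}) would leave the original untouched and write a tagged copy. As the email says, this only touches the Info dictionary, not the XMP stream.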
July 14, 2004
iPapers (goody)
Here’s an app from Japan which should help those that use PubMed to manage their downloaded pdfs: iPapers
iPapers is an application that manages many PDF files of articles, whose information can be obtained by PubMed. If you have articles as PDF files which have the filenames as PMID (e.g. 10867176.pdf), you can import the articles into iPapers by drag and drop operation. PMID is the unique ID provided by PubMed. iPapers searches and imports the information, e.g. name of authors, title, journal name, volume, number, pages and abstract, of the article from PubMed DB.
Not sure yet if it can export that information in a useful citation format … but it does give meaning to the bad file names that most PDF publishers use … still database specific but a nice interaction.
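A rough sketch of the filename-to-PubMed step, as I understand it from the description. This is not how iPapers itself is implemented; it only assumes the PMID-as-filename convention quoted above and uses NCBI's public E-utilities esummary endpoint to build a lookup URL. The function names are hypothetical, and nothing here actually makes a network request.

```python
import re
from pathlib import Path
from typing import Optional
from urllib.parse import urlencode

# NCBI E-utilities summary endpoint for PubMed records.
EUTILS_ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"
DIGITS_ONLY = re.compile(r"\d+")


def pmid_from_filename(path: str) -> Optional[str]:
    """Return the PMID if the file is named like 10867176.pdf, else None."""
    p = Path(path)
    if p.suffix.lower() == ".pdf" and DIGITS_ONLY.fullmatch(p.stem):
        return p.stem
    return None


def esummary_url(pmid: str) -> str:
    """Build the URL that returns the record summary for one PMID."""
    return EUTILS_ESUMMARY + "?" + urlencode({"db": "pubmed", "id": pmid})
```

A file with a publisher-style name such as "sdarticle.pdf" returns None, which is exactly the case where the metadata still has to be found by hand.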
(Thanks Mike)
June 30, 2004
More workflows
I like flow charts ;) Alf over at Hublog has been a-drawing his ‘ideal’ document-with-citations preparation workflow …
I like it except for the confusion over the exact meaning of document publisher. In the comments Alf says that he means this as the bibtex or endnote run, where the citation keys are converted to proper citations and bibliographies … makes sense to me.
Collaborative metadata lookup would fit into his model, I think, as a feature of the datastore (RefDB) or editing client (eg Bibdesk) …
June 28, 2004
Spotlight--Just another reason to put metadata in the file
Apple is going to introduce a metadata search engine right into OS X. The engine scans the files and uses the specific metadata format of the file to build its index.
This shows again why having the metadata in the file is the right way to do this stuff. I can’t, however, get the image of placing the catalogue file with the book on the shelf out of my mind.
Think of Spotlight as a team of super-speedy librarians that are constantly running through the stack looking at books and copying down the card information and seeing when any new book comes into the library and grabbing the info on the card.
Hmmmm. The possibilities are interesting here for collaborative metadata … a p2p application leveraging these metadata directories to update metadata for digital objects and to discover related information. In fact I think it would be fairly easy to build a recommender system straight off the Spotlight database … Interesting. Interesting.
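To make the librarian picture concrete, here is a toy sketch of the idea: a per-format importer pulls metadata out of each file, and an index maps (field, word) pairs back to files. This is not Apple's design or API. The importer registry, the "Key: value" header format for .txt files, and all names are assumptions made up for illustration.

```python
import os


def read_text_header(content: str) -> dict:
    """Hypothetical importer: parse leading 'Key: value' lines up to a blank line."""
    meta = {}
    for line in content.splitlines():
        if not line.strip():
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip().lower()] = value.strip()
    return meta


# One importer per file format; formats without an importer are skipped.
IMPORTERS = {".txt": read_text_header}


def build_index(files: dict) -> dict:
    """Map (field, word) -> set of filenames, given {filename: content}."""
    index = {}
    for name, content in files.items():
        ext = os.path.splitext(name)[1].lower()
        importer = IMPORTERS.get(ext)
        if importer is None:
            continue
        for field, value in importer(content).items():
            for word in value.lower().split():
                index.setdefault((field, word), set()).add(name)
    return index
```

Two files that share many (field, word) entries are candidates for "related" items, which is about as simple as a recommender built off such an index could get.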
[Update: … I see that darcus has been similarly inspired by Spotlight …]
June 22, 2004
XMP embedding via pdflatex
Ah, one step closer. Creative Commons brings the news that XMP data can now be incorporated when making pdfs through the pdflatex system (which I use).
Maarten Sneep has created a pdflatex macro for embedding XMP in PDF files generated from LaTeX source, the de facto standard for scientific documents. As Maarten’s documentation points out, one may obtain XMP suitable for embedding via choose license process. We have a tech challenge calling for Creative Commons and XMP support in open source applications. Follow Maarten’s lead!
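For a sense of what actually gets embedded, here is a minimal sketch that builds an XMP packet carrying only a Dublin Core title and creator list. It is not Maarten's macro and it leaves out the Creative Commons license fields entirely. The function names are hypothetical, while the namespaces and the xpacket wrapper follow the XMP packet layout as I understand it.

```python
from xml.sax.saxutils import escape

XMP_TEMPLATE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">{title}</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq>{creators}</rdf:Seq></dc:creator>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def build_xmp(title: str, creators: list) -> str:
    """Return the x:xmpmeta element with XML-escaped title and creators."""
    items = "".join(f"<rdf:li>{escape(name)}</rdf:li>" for name in creators)
    return XMP_TEMPLATE.format(title=escape(title), creators=items)


def wrap_packet(xmp: str) -> str:
    """Wrap the metadata in the xpacket processing instructions."""
    header = '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    trailer = '<?xpacket end="w"?>'
    return header + "\n" + xmp + "\n" + trailer
```

The point of the wrapper is that a tool can find the packet by scanning the file for the xpacket markers without understanding the rest of the PDF.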
June 14, 2004
Presented Paper
I presented the first version of this paper on Friday 11 June at the Colleges, Code and Copyright conference. I think the presentation went well and the reaction from the audience was good.
Our panel was quite diverse but included two speakers who were focused on open access publishing. The speaker before me, Gary, included the little gem of hoping towards a time when students have 11,000 papers on their computers, rather than (or in addition to) 11,000 music files. A better segue could not have been asked for.
Clifford Lynch, from the Coalition for Networked Information and SIMS at Berkeley, was there and gave a handy keynote on the state of play for DRM and trusted computing and its potential impacts.
Read the full paper: Why can’t I manage PDFs like I manage MP3s. It has quite a bit of emphasis on the collaborative metadata systems that drive the ease of music management, rather than just the embedding of metadata in the file (which remains important!).
I was asked if this was my dissertation. Not at the moment because I’m focusing on the social science of Free and Open Software development but I’m hoping to keep this one bubbling along as a free software project (once there is actually some code!). Maybe it’ll become a participant observer component of the dissertation ;)
April 23, 2004
Workflow pictures
I drew pretty pictures of academic and MP3 workflow and got so excited I posted them to the main blog.
April 08, 2004
Workflow comparison
Process Maps of Academic vs MP3 workflow.
“Rip, Mix, Burn”
Music
1. Get song
- rip from CD, lookup metadata from CDDB, drop into manager
- download from file-sharing network, check metadata, use MusicBrainz
- Record Internet Radio, meta broadcast, drop into manager
- From download services, integrated with manager
- emailed/ftp’d from friends, metadata travels with file, can use MusicBrainz.
2. Mix
- creating playlists in manager
- sorting by artist, genre, album, ranking, bpm, rating, play counts.
- create Mash-up (overplay, speed shift etc)
3. Burn
- play, move to mobile device (metadata essential)
- burn a CD (metadata provides order etc.)
- share, metadata facilitates discovery (but mostly filename)
Academic research
1. Get paper as pdf
- download pdf paper from online journal (publisher’s site, aggregator’s system (eg JSTOR) or pre-print services like citeseer).
- search from within Endnote etc.
- get paper by email from colleague (or author)
- scan photocopy and save to PDF
2. Get metadata
- some download services have metadata in a range of formats available on the download page (eg Citeseer, emerald …)
- Some (not all!) articles have a full citation on the title page of the article
- find citation in source (either database or older paper that references it)
- colleague emailing may send citation in text format.
3. Storing Paper
- idiosyncratic. some have folders, some print and file, most store in ‘project folder’ which binds the paper tightly to the particular mix
- add citation to citation manager eg Endnote, Procite or Bibdesk
- link paper to citation manager (usually paper is not imported into program but rather a file path is recorded …)
- AutoFile is a possibility (making a consolidated personal library under the control of the file manager)
4. Mix
- Preparing a paper for publication
- reading to prepare an annotated bibliography
- reading to stay ‘on top of’ the literature
Print paper out vs read on screen.
Using citation
- manually (or grad-student laboury) “fumbling”. Different citation format requirements are painful.
- Citation manager allows ‘drop’ into document either through a macro or code which then is scanned and turned into appropriate citation.
Usually no record of personal use, “Which papers do I reference most often” … difficulties in finding papers on our own hard drives. Especially when filed by project (but that is the most useful while working on a specific piece of work).
5. Burn
- print and distribute (lose electronic citation)
- email to colleagues, upload to collaborative repository (ftp site, intranet site file gallery) (lose ‘code’ citation …)
- maybe send annotated bibliography (citation but not paper) or bibtex/endnote file.
metadata and paper travel separately meaning that labour to join them is required again!
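The citation-manager step in the Mix stage above, where a code dropped into the document is scanned and turned into a citation, can be sketched roughly as follows. The {{key}} marker syntax, the (Author, Year) output style, and the library layout are all assumptions for illustration, not how Endnote or BibTeX actually work. The sketch also keeps a count of how often each key is cited, since that record of personal use is what the managers usually lack.

```python
import re
from collections import Counter

# Hypothetical in-text marker: {{citation-key}}
CITE_KEY = re.compile(r"\{\{([^{}]+)\}\}")


def format_citations(text: str, library: dict):
    """Replace {{key}} markers with (Author, Year).

    Returns the rewritten text, a bibliography in order of first use,
    and a Counter of how often each key was cited.
    Raises KeyError if a key is not in the library.
    """
    usage = Counter()

    def replace(match):
        key = match.group(1).strip()
        entry = library[key]
        usage[key] += 1
        return f"({entry['author']}, {entry['year']})"

    body = CITE_KEY.sub(replace, text)
    bibliography = [
        f"{library[k]['author']} ({library[k]['year']}). {library[k]['title']}."
        for k in usage
    ]
    return body, bibliography, usage
```

With a library entry like {"doe2004": {"author": "Doe", "year": 2004, "title": "An Example Paper"}} (a made-up record), the text "As shown {{doe2004}}." becomes "As shown (Doe, 2004)." and the bibliography gains one line for it.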