NRV Project presents PubMedPDF Tools

[v1.09, 2008-09-18; v1.07b-win, 2004-11-12] --- Go to DOWNLOAD section


PubMedPDF Tools: Manage Your PDF Reprints Automatically in a Self-Organizing Virtual Filing Cabinet

PubMedPDF Tools are a set of Perl and shell scripts for managing PDF reprints of papers that are indexed by PubMed (www.pubmed.gov). They are useful on most platforms, including Unix, MacOS X, and Windows.

If most of the journals you read are indexed by PubMed, you will never again lose a PDF reprint in the vast sea of files on your hard disk. With help from PubMed, the scripts automatically file PDF reprints into the right folders under the right filenames. Folders are created automatically for each author/year and journal/year (and for key words and title words too, if you are willing to tweak the scripts). Within these folders, papers are placed automatically with intuitive file names based on the author names, title, etc. The resulting well-organized reprint collection can be accessed via standard filesystem browsers such as Explorer (Windows) or Finder (MacOS X), and via web browsers (if web sharing is enabled).
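As an illustration of the filing scheme described above, the following Python sketch derives the author/year and journal/year locations for one paper. It is not the project's Perl code: the record fields, the "Authors" and "Journals" folder names, and the exact filename pattern are assumptions made for the example.

```python
import re
from pathlib import PurePosixPath


def safe(text):
    """Collapse characters that are awkward in file names into underscores."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", text).strip("_")


def filing_paths(record):
    """Return the paths one paper would be filed under:
    one per author/year folder plus one journal/year folder.
    The layout and filename pattern are hypothetical."""
    year = str(record["year"])
    name = "{}_{}_{}.pdf".format(
        safe(record["authors"][0]), year, safe(record["title"])[:60]
    )
    paths = [
        PurePosixPath("Authors", safe(author), year, name)
        for author in record["authors"]
    ]
    paths.append(PurePosixPath("Journals", safe(record["journal"]), year, name))
    return [str(p) for p in paths]


# Made-up metadata standing in for what a PubMed lookup would return.
example = {
    "authors": ["Smith J", "Tanaka K"],
    "year": 2003,
    "journal": "J Neurosci",
    "title": "An example title: part 1",
}
for path in filing_paths(example):
    print(path)
```

A paper with two authors yields three locations here: one under each author and one under the journal.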

This is not yet another reference/citation manager, nor do you run any special search-and-find software when you need a particular paper. It is more like a traditional filing cabinet created in the standard filesystem of the OS, except that reprints thrown into a large "In" folder magically show up in well-organized per-author and per-journal folders. Users do not have to be aware that automated scripts run periodically behind the scenes. The filing scheme itself is probably something you are already trying to follow by organizing reprints by hand, with varying degrees of success depending on how disciplined you are. Here instead, the system does the tedious job for you, and does it more consistently than any human can ever hope to achieve manually. It works for nearly all papers in life sciences and biomedicine since the 1950s, and works well for scanned old reprints as well as new PDF files downloaded from journal sites.


The screen shot above is a view of the "In" folder named PMID, as accessed via Windows Explorer. All that users need to do is to save reprint files using the PubMed ID (PMID) as the filename, and copy the files into this folder by drag-and-drop. (Yes, you do have to look up the paper in PubMed once when giving a reprint its filename, unless you download PDF reprints using iPapers.)
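A minimal Python sketch of how files in the "In" folder could be recognized as PMID-named reprints follows. It assumes the naming pattern is the PMID followed by ".pdf" (for example 12345678.pdf) and that a PMID has at most nine digits; neither detail is confirmed by this page, and the function names are made up for the example.

```python
import re

# Assumed convention: "<PMID>.pdf", where the PMID is a positive
# integer without leading zeros (up to nine digits in this sketch).
PMID_PDF = re.compile(r"^([1-9][0-9]{0,8})\.pdf$", re.IGNORECASE)


def pmid_from_filename(filename):
    """Return the PMID as a string, or None if the name does not fit."""
    match = PMID_PDF.match(filename)
    return match.group(1) if match else None


def split_inbox(filenames):
    """Separate files ready for filing from files to leave alone."""
    ready, skipped = {}, []
    for filename in filenames:
        pmid = pmid_from_filename(filename)
        if pmid is None:
            skipped.append(filename)
        else:
            ready[pmid] = filename
    return ready, skipped


ready, skipped = split_inbox(["12345678.pdf", "smith2003.pdf", "notes.txt"])
print(ready)
print(skipped)
```

Files whose names are not plain PMIDs are left untouched, so a stray document dropped into the folder cannot be misfiled.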

The rest of the handling and organizing will take place automatically. ALL of the folder/file organization shown below is created automatically by the scripts.

The scripts automatically file these papers via symbolic links (or shortcuts in Windows) without touching the original files. The PDF files may then be accessed via more meaningful names using Explorer on Windows, or using Finder or equivalent on a Mac (e.g., RBrowser), as shown below in the author/year directory tree. Multi-author papers are listed under ALL authors equally, with no special treatment for the first or the last author. So, if you know the name of just one author of a given paper, you can find it. This does not increase disk usage much at all because files are not duplicated, thanks to symbolic links and shortcuts. The scripts are run automatically by the Unix "cron" system (tell us how this is done on Windows), which periodically processes newly added reprint files.
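The linking step described above can be sketched in Python as follows. This is an illustration rather than the scripts' actual code: the "Authors" folder name, the function name, and the link filename are assumptions. It uses os.symlink, which on Windows may require extra privileges (the scripts use shortcuts there instead).

```python
import os
import tempfile


def link_under_authors(original, archive_root, authors, year, link_name):
    """Create one symbolic link per author in <root>/Authors/<author>/<year>/.
    The original file is never moved, copied, or modified."""
    links = []
    for author in authors:
        folder = os.path.join(archive_root, "Authors", author, str(year))
        os.makedirs(folder, exist_ok=True)
        link = os.path.join(folder, link_name)
        # Skip links that already exist so repeated runs are harmless.
        if not os.path.lexists(link):
            os.symlink(os.path.abspath(original), link)
        links.append(link)
    return links


def demo():
    """Run the sketch inside a throwaway directory."""
    with tempfile.TemporaryDirectory() as root:
        inbox = os.path.join(root, "PMID")
        os.makedirs(inbox)
        original = os.path.join(inbox, "12345678.pdf")
        with open(original, "wb") as handle:
            handle.write(b"%PDF-")  # placeholder content, not a real PDF
        for link in link_under_authors(
            original, root, ["Smith_J", "Tanaka_K"], 2003, "Smith_J_2003_Example.pdf"
        ):
            print(os.path.relpath(link, root), "->", os.path.relpath(os.readlink(link), root))


demo()
```

Because every author folder holds only a link, a paper with many authors still occupies the disk space of a single PDF.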



The items in the selected folder above appear as shown below in the list view, revealing highly descriptive filenames for the symbolic links (note also that each symbolic link occupies only about 4KB regardless of the size of the PDF file it points to; the actual size depends on the OS and disk/filesystem parameters):


And in the journal/year directory tree, they appear as:


You can also browse and view files using a web browser on any computer as shown below (Keep your archive site secure from unauthorized access). Links to the reprint files are given very descriptive names automatically, based on the author names, title, volume number, and pages, etc.:


The getNewFromXxxx sync script compares the lists of files on the two machines and downloads only those reprints that are missing on your laptop. To sync with machines for which you have login accounts, it would be easier to use the "rsync" command instead.
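The comparison step of such a sync can be sketched in Python as below. This shows only the idea of comparing two file lists; it is not the getNewFromXxxx script itself, and how that script obtains the lists or transfers the files is not covered here. The function name is made up for the example.

```python
def missing_on_laptop(server_files, laptop_files):
    """Return the names present on the server but absent on the laptop,
    sorted so the download order is predictable."""
    already_have = set(laptop_files)
    return sorted(name for name in set(server_files) if name not in already_have)


server = ["11111111.pdf", "22222222.pdf", "33333333.pdf"]
laptop = ["22222222.pdf"]
print(missing_on_laptop(server, laptop))
```

Only the names returned by the comparison would need to be downloaded, so a sync after adding a few reprints transfers just those few files.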


Setting Up

Please read NOTES.txt below for details. The pubmedpdf*.zip file contains everything you need. The old Windows version is in the "Windows" folder. Please also note that there is absolutely no support for this software (it is all source code, so it is easy to add features). In particular, we cannot respond to any Perl or Unix questions. However, comments and improvements are welcome, and they should be sent to: pubmedpdf@fbs.osaka-u.ac.jp. Finally, if you run the scripts and find them useful, please do let us know. Our funding source would be glad to hear about it.



Notes on what's what --- PubMedPDF Tools, iPapers, and PubMedPDF/XOOPS

These pieces of software share the idea of using the PubMed ID as the key for organizing your reprint collection automatically, but are built on entirely different codebases. PubMedPDF/XOOPS has been written from scratch in the server-side scripting language PHP to add more sophisticated features such as key word searches and a web-based interface. To support these features, it relies on the MySQL database and the XOOPS content management system. Despite these highly sophisticated features, installation and use of PubMedPDF/XOOPS is surprisingly simple and fast (5 to 10 minutes) on Windows and MacOS X (even for novices) thanks to prepackaged all-in-one distributions such as XAMPP. So, if you don't know what is suitable for you, I would recommend that you try PubMedPDF/XOOPS first. This software does take up CPU/memory resources when it is running, and running it on a laptop all the time may be a bit problematic. PubMedPDF/XOOPS has been written and is maintained by a different group led by Dr. Hidetoshi Ikeno at University of Hyogo (Japanese page).

PubMedPDF Tools, which is downloadable below, is much simpler but less capable. It does not offer any search. It is a tool that looks up PubMed and creates static hierarchical folder systems containing symbolic links (shortcuts) with meaningful names that point to the original PDF files. Once it has run, no programs will be running on your computer (so no CPU/memory is consumed). You are supposed to use whatever a given OS provides for browsing the file system to find the PDF file you need. You can install both PubMedPDF Tools and PubMedPDF/XOOPS on the same machine without conflicts. The key parts of PubMedPDF Tools were written by Dr. T. Aoyama, before he went on to write the popular iPapers for MacOS X.

iPapers, which is downloadable from the original distribution site, is NOT a wrapper around "PubMedPDF Tools" as far as I am aware. Although the basic idea is similar in that both use the PubMedID-based file naming convention and rely on PubMed lookup, iPapers is written from scratch using the Cocoa API and does not use the Perl scripts.

Historical Notes: The PubMedID-based PDF file naming convention originated from the MyPDF feature (implemented in April 2003) of Visiome Platform. To make stand-alone use of a PDF collection consisting of numbered PDF files possible, Dr. T. Aoyama wrote PubMedPDF Tools (first release v1.0, 2003-08-20), and then iPapers (first release v0.1, 2004-05-14). The PubMedPDF/XOOPS module was then written by a group led by Prof. H. Ikeno at Univ. of Hyogo, to fill a need for platform independence and to offer features similar to iPapers but via a web-based database solution.

Windows Version (2004-11-12: Shortcut creation bug FIXED. This version is frozen. No more updates will be done.) --- Key scripts in PubMedPDF Tools are now also available for Windows, thanks to Philipp Sasse. These are all command line tools and not GUI applications.
>> JAPANESE Installation Guide for Windows is available here.

MacOS X Application --- iPapers, a new GUI-based MacOS X application, is now available from the primary author (Dr. T. Aoyama) of the PubMedPDF Tools scripts. iPapers offers more conveniences, and is complementary to the server-based system implemented by the scripts here.


Acknowledgements: The contents of this page mirror the official distribution from Visiome Platform (http://platform.visiome.org/; search for "pubmedpdf" there), which provides key word search capabilities (limited to the field of Vision Science) with links to reprints in your own collection (not ones at journal sites) in the form outlined above. Visiome Platform is now open for public access. PubMedPDF Tools and Visiome Platform are products of the Neuroinformatics Research in Vision (NRV) Project, which was supported by a special coordination fund for promoting science from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of the Japanese government during fiscal years 1999-2003.



Download



Last updated: Tue Aug 18 10:44:16 JST 2009