This is probably one of those scripts that will evolve over time, but I’m posting it now in case someone can get some use out of it. My problem was this:
I had many, many figures in my working directory, but I didn’t use all of them in the LaTeX document. I wanted a way to send the source files (*.tex, *.cls, *.bst, *.bib, etc.), plus only the image files that were actually used in the document, to someone else so they could edit and compile on their own. I didn’t want to set up version control (SVN, etc.); I just wanted a tar file.
After some poking around I couldn’t find anything already made that would do this (Kile has an Archive menu item, but this doesn’t include figures). It was easy enough to get a Python script going.
This script parses an input file, looks at the various documents and figures that are included, and archives them in a tar.gz file which can then be sent to someone. Note that as it stands, it only looks two levels deep for \include tags. If I use this more I’ll have to make it recursive (it’s not obvious to me how to do that; I haven’t used recursion much before).
Consider this script a rough draft. It worked perfectly for me, but your mileage may vary.
"""
This script gathers the necessary images and files (from
an arbitrarily large number of unneeded figures) and
puts it all in a tarball for distribution.
Usage: latexpackager.py main.tex dissertation.tar.gz
"""
import sys
import re
import os
import tarfile
def find_references(f):
    r'''Returns a list of LaTeX files that f refers to,
    by parsing \include, \bibliography, \bibliographystyle,
    \input, etc.
    If nothing was found, returns an empty list.'''
    with open(f) as handle:
        s = handle.read()
    # Drop comments (an unescaped % to end of line) so that
    # commented-out commands are ignored.
    s = re.sub(r'(?<!\\)%.*', '', s)
    # Each command maps to the extension of the file it names.
    # [^}]* stops at the first closing brace; the optional
    # \[...\] group lets \documentclass[12pt]{...} match too.
    commands = [('include', '.tex'),            # the .tex files
                ('input', '.tex'),              # any inputs
                ('bibliography', '.bib'),       # the .bib files
                ('bibliographystyle', '.bst'),  # the styles
                ('documentclass', '.cls')]      # the document class
    found = []
    for command, ext in commands:
        pattern = r'\\%s(?:\[[^\]]*\])?\{([^}]*)\}' % command
        for i in re.finditer(pattern, s):
            # \bibliography{a,b} may name several files.
            for name in i.group(1).split(','):
                name = name.strip()
                if not name:
                    continue
                if not name.endswith(ext):
                    name += ext
                found.append(name)
    # Here is everything that was referenced in f:
    return found
def find_figures(f):
    '''Returns a list of figures found in the file. Only
    looks in .tex files. If not a .tex file or no figures found,
    returns an empty list.'''
    # Short circuit if not a .tex file.
    if not f.endswith('.tex'):
        return []
    # The [options] part is optional; [^}]* stops at the first
    # closing brace.
    includegraphics = r'\\includegraphics(?:\[[^\]]*\])?\{([^}]*)\}'
    figures = []
    with open(f) as handle:
        s = handle.read()
    # Drop comments so commented-out figures are ignored.
    s = re.sub(r'(?<!\\)%.*', '', s)
    for match in re.finditer(includegraphics, s):
        basename = match.group(1).strip()
        if os.path.splitext(basename)[1]:
            # that is, it has an extension already.
            # This is for things like .png images.
            figures.append(basename)
        else:
            # No extension given: the figure could be either
            # format, so list both candidates.
            figures.append(basename + '.pdf')
            figures.append(basename + '.eps')
    return figures
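main = sys.argv[1]
# Resolve the tarball path first, since we may change directory below.
tarfn = os.path.abspath(sys.argv[2])
projectdir, main = os.path.split(main)
if projectdir:
    # Work from the project directory so that the relative paths
    # used in the LaTeX sources resolve.
    os.chdir(projectdir)
# Start with the main .tex file itself, then what it references.
keepers = [main]
first_level = find_references(main)
keepers.extend(first_level)
# For each of those that main.tex referenced, look for more.
# These are files referenced two levels deep.
for f in first_level:
    if f.endswith('.tex') and os.path.exists(f):
        keepers.extend(find_references(f))
# Now look for graphics.
figures = []
for f in keepers:
    if os.path.exists(f):
        figures.extend(find_figures(f))
tarball = tarfile.open(tarfn, 'w:gz')
added = set()
for path in keepers + figures:
    # Skip duplicates and files that are not in the project
    # directory (a standard document class, or the .eps twin
    # of a .pdf figure).
    if path in added or not os.path.exists(path):
        continue
    added.add(path)
    print(path)
    tarball.add(path)
tarball.close()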
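
As noted at the top, the script only follows \include tags two levels deep. A recursive version might look something like the sketch below. It is separate from the script above: `collect_tex`, `strip_comments` and `INCLUDE_RE` are hypothetical names of my own choosing, and it assumes every path is relative to the current directory. Files that do not exist are skipped, and the `seen` list keeps two files that include each other from looping forever.

```python
import os
import re

# Hypothetical names for illustration; not part of the script above.
INCLUDE_RE = re.compile(r"\\(?:include|input)\{([^}]+)\}")


def strip_comments(text):
    """Drop everything from an unescaped % to the end of each line."""
    return re.sub(r"(?<!\\)%.*", "", text)


def collect_tex(path, seen=None):
    """Return path plus every .tex file it pulls in, at any depth."""
    if seen is None:
        seen = []
    # Stop on files already visited (cycles) and on missing files.
    if path in seen or not os.path.exists(path):
        return seen
    seen.append(path)
    with open(path) as handle:
        text = strip_comments(handle.read())
    for match in INCLUDE_RE.finditer(text):
        child = match.group(1).strip()
        if not child.endswith(".tex"):
            child += ".tex"
        # Recurse: the child may include further files of its own.
        collect_tex(child, seen)
    return seen
```

Calling `collect_tex('main.tex')` from the project directory would give the list of .tex files to scan for figures and bibliography entries, in the order they were first encountered.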