Removing duplicate files

I had a number of folders on my MacBook, each of them a full backup of a mobile phone taken on a different date. To clean up, I decided to keep in each folder only the files that are not present in the next folder.

To do this, I started by creating a diff output:

diff -q -s folder1/ folder2 > ~/Desktop/diffOutput.txt

This output lists the unique files, but also the identical files.

So to get a list of the identical file names, I used grep and cut, like the following.

grep identical ~/Desktop/diffOutput.txt | grep -o -e "folder1/.* and" | cut -f2 -d'/' | cut -f1 -d' ' > ~/Desktop/diffParsed.txt

The last step is to remove the identical files. The list contains bare file names, so the command has to be run from inside folder1.

cat ~/Desktop/diffParsed.txt | xargs rm
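
The cut -d' ' step above cuts a name short at its first space, and rm acts on the list without any check. Below is a minimal sketch of a more careful variant. It assumes diff prints the English "Files X and Y are identical" wording with unquoted names, and it uses throwaway demo folders (the work directory and the file names are made up for illustration). It only prints what it would remove.

```shell
#!/bin/sh
# Throwaway demo data; folder1 and folder2 stand in for two backups.
work=$(mktemp -d)
mkdir "$work/folder1" "$work/folder2"
echo "same" > "$work/folder1/a.jpg"
echo "same" > "$work/folder2/a.jpg"
echo "only here" > "$work/folder1/unique.jpg"
echo "old" > "$work/folder1/c.jpg"
echo "new" > "$work/folder2/c.jpg"
cd "$work" || exit 1

# LC_ALL=C keeps diff's messages in English so the pattern below matches.
# sed keeps only the part of the name that follows "folder1/".
identical=$(LC_ALL=C diff -q -s folder1 folder2 |
    sed -n 's|^Files folder1/\(.*\) and folder2/.* are identical$|\1|p')

# Dry run: print what would be removed. Swap echo for rm once checked.
printf '%s\n' "$identical" | while IFS= read -r name; do
    if [ -n "$name" ]; then
        echo "would remove: folder1/$name"
    fi
done
```

Because the path is prefixed with folder1/ again, the removal no longer depends on which directory the command is run from.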

Preparing video for editing

I had the pleasure of working with some video editing recently. The background is that my sister had some 8mm cine film from our youth converted to a DVD. As this DVD contains clips from many different 8mm cine films mixed together, I would like to split them up and be able to combine them in a new way.

I got the DVD as a dmg image from a Mac, so the first thing was to see what kind of dmg image it was.

xxx@stas ~/Delta $ file film-compressed.dmg
film-compressed.dmg: VAX COFF executable - version 8343

This type is a compressed image, so the first thing I need is to decompress it. For this a tool named dmg2img is available in the Gentoo portage tree.

dmg2img -v film-compressed.dmg film-uncompressed.iso

Then I can mount the image and copy the contents out. Now I have the raw data for my videos.

sudo mount -o loop -t auto film-uncompressed.iso isoMount

The vob files had an error, though: their reported length was wrong. Opening a file in vlc or another tool indicated a length of 9 seconds even though the clip was 3 minutes long. I fixed this as part of a conversion using ffmpeg. So a quick script later, my computer was working on the conversion.

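# FILES is expected to hold the paths of the vob files.
for f in $FILES; do
    # Keep the 4th path component without extension; the .mp4 suffix is an assumption.
    OUTPUT_FILE="$(echo "$f" | cut -f 4 -d "/" | cut -f 1 -d ".").mp4"
    echo "Outputfile $OUTPUT_FILE"
    # -sameq and libfaac only exist in older ffmpeg builds.
    ffmpeg -i "$f" -sameq -vcodec libx264 -acodec libfaac "$OUTPUT_FILE"
done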
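
The cut -f 4 -d "/" step only works when every path has exactly the same depth. Here is a small sketch of an alternative using POSIX parameter expansion. The helper name output_name, the example paths and the .mp4 extension are assumptions for illustration, not part of the original script.

```shell
#!/bin/sh
# output_name is a hypothetical helper: /any/depth/clip.VOB -> clip.mp4
output_name() {
    base=${1##*/}          # drop everything up to the last "/"
    echo "${base%.*}.mp4"  # drop the last extension, add .mp4 (assumed container)
}

# Made-up example paths of different depth.
output_name "isoMount/VIDEO_TS/VTS_01_1.VOB"   # prints VTS_01_1.mp4
output_name "a/b/c/d/e/clip.vob"               # prints clip.mp4
```

Inside the loop, this could replace the two cut calls, for example as OUTPUT_FILE=$(output_name "$f").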

Now I have prepared my data and am ready to do the editing. For this I use kdenlive.

Being the “Timothy McGee” of the family

As a fan of NCIS, I have often seen how Timothy McGee and Abby Sciuto do their forensic work on computers, where they need to find some information to solve a case. Of course they are in a crisis, and have some magical graphical interfaces telling them all sorts of information extremely fast. In the real world things are a bit different.

I decided to write this post while recovering some mails for my uncle. He had an old computer, which had started shutting down spontaneously because of a thermal event. So he bought a new computer and asked me for help with getting the old mails. At that point I thought it was an easy job: take out the hard drive, put it in my disk cradle, copy the Outlook data file to the new computer and import it. When I dismantled the computer, I noticed the hard drive was a bit older than first expected. It had an IDE interface instead of a SATA interface, so my disk cradle idea did not work. No problem, I would just put it in a computer with an IDE interface and access the files from there. Tough luck: when I booted into Windows 7, check disk failed and asked if I wanted to format the hard drive. Not the question I had hoped for.

The task just got bigger than first imagined. So the first thing would be to prevent any more damage to the data and file system on the disk. So I put the hard drive in my trusty Gentoo Linux machine and did a raw copy of the partition using dd.

sudo dd if=/dev/sdi1 of=/home/frosteyes/charlie/Bo/diskImage.img
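
Before working on the image, it can be worth checking that the copy really matches the source, and making the image read-only. This is a minimal sketch of that idea on a throwaway file; the names fake_partition.bin and work are made up. On a real disk the source would be the partition device, and reading it needs root.

```shell
#!/bin/sh
# Demo on a throwaway file instead of a real partition such as /dev/sdi1.
work=$(mktemp -d)
src="$work/fake_partition.bin"
img="$work/diskImage.img"
printf 'pretend this is partition data\n' > "$src"

# Same kind of raw copy as above, with an explicit block size.
dd if="$src" of="$img" bs=4096 2>/dev/null

# cmp exits with 0 only if the two are byte-for-byte identical.
if cmp -s "$src" "$img"; then
    verified=yes
else
    verified=no
fi
echo "image verified: $verified"

# Drop write permission so the image is not changed by mistake.
chmod a-w "$img"
```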

Now that I have the disk image, I can work on getting the mails without risking destroying anything on the physical disk. I am using testdisk and photorec. They can be installed with emerge testdisk on Gentoo Linux.

Running testdisk on the image showed that the MFT and MFT mirror were bad, and it failed to repair them. This matched what chkdsk had reported earlier. So there was no usable master file table (MFT).

The next task was to run photorec on the disk image as seen below. photorec is a recovery program which, among other file types, can detect pst files without needing a file system on the disk. It resulted in a huge number of folders with files, including a number of pst files.

[Screenshot: photorec running on the disk image]

And then I found the pst files I was looking for.

frosteyes@stas ~/charlie/Bo $ find ./ -iname '*.pst' | xargs ls -l
-rw-r--r-- 1 frosteyes users 81282048  5 maj 17:16 ./recup_dir.133/f9541960.pst
-rw-r--r-- 1 frosteyes users   271360  5 maj 17:19 ./recup_dir.171/f13814104.pst
-rw-r--r-- 1 frosteyes users 24396800  5 maj 18:46 ./recup_dir.374/f47280112.pst
-rw-r--r-- 1 frosteyes users  1033216  5 maj 18:49 ./recup_dir.414/f54353688.pst
-rw-r--r-- 1 frosteyes users   271360  5 maj 18:50 ./recup_dir.416/f55373800.pst
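
With hundreds of recup_dir folders, it can help to gather the candidates in one place. This is a minimal sketch on made-up demo folders; the found directory and the file names are assumptions for illustration.

```shell
#!/bin/sh
# Throwaway stand-in for the photorec output folders.
work=$(mktemp -d)
mkdir "$work/recup_dir.1" "$work/recup_dir.2" "$work/found"
echo "mail" > "$work/recup_dir.1/f0001.pst"
echo "mail" > "$work/recup_dir.2/f0002.PST"
echo "not mail" > "$work/recup_dir.2/f0003.jpg"
cd "$work" || exit 1

# Quoting '*.pst' stops the shell from expanding it before find sees it.
# -prune skips the target folder, -exec copies each match on its own.
find . -path ./found -prune -o -type f -iname '*.pst' -exec cp {} found/ \;

count=$(ls found | wc -l | tr -d ' ')
echo "collected $count pst files"
```

photorec numbers its files by position on the disk, so name clashes are unlikely, but cp would silently overwrite a file with the same name.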

Before handing over the files to my uncle, I tested them using lspst from libpst. It can be installed with emerge libpst on my Gentoo system. It showed that the pst files contained the needed emails.

So all in all it ended up being more forensic work than expected, but quite fun and I felt a bit like McGee from NCIS.

Removing a simple PDF password

At work I receive my payslip as a pdf file with a password. This is very nice, as it of course prevents anybody from snooping into my salary. Today, though, I needed to send a copy of my payslip to my bank adviser, so I had to create a version with the password protection removed. Luckily it was pretty simple, as I could just use ghostscript on my Mac. XXX is the password, and the parameters outputFileName.pdf and inputFileName.pdf are self-explanatory.

gs -q -dSAFER -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
 -sPDFPassword=XXX -sOutputFile=outputFileName.pdf \
 -c .setpdfwrite -f inputFileName.pdf

Doing presentations on my Mac

I am in the middle of creating some presentations about Git, a version control system I have successfully introduced in my department at Phase One. I could of course use Microsoft PowerPoint or Apple Keynote.
There is, though, a tool set which I am more used to, and which, even though it is harder to learn, is superior in many ways: LaTeX, Beamer, Emacs and AUCTeX.

So why do I consider this a superior solution?

  • The files are text files, and easy to version control, including being able to merge changes.
  • The design is guided towards PDF output, meaning you will not create distracting animations etc.
  • Being designed for PDF output also means it is easy to print handouts etc.
  • The tools are truly cross-platform, working on Windows, Mac OS X, Linux and *BSD. So your data is not locked to a vendor.

The bad thing is the learning curve and the know-how needed for using the tools. PowerPoint and Keynote are much easier to get started with.

As I know the tools from my many years of working with Linux, the new part was installing them on my Mac. That was pretty easy, though, as I am already using MacPorts.

  • LaTeX. The easiest way to install LaTeX using MacPorts is the texlive package. It can be installed in a basic, medium or full variant. I just selected the medium variant, as it contains the packages I need.
sudo port install texlive +medium
  • Beamer is a LaTeX package, containing what is needed for creating presentations in LaTeX. Again it is installed by a MacPorts command.
sudo port install tex-beamerposter
  • Emacs is an old family of editors, started in 1975 by Richard Stallman. The version I chose to use on my Mac is called emacs-app in MacPorts, and contains a Cocoa edition of GNU Emacs. Cocoa means the editor uses the Mac's native graphical toolkit.
sudo port install emacs-app
  • AUCTeX is a package originally from Aalborg University, providing a major Emacs mode for editing TeX files. It is installed by the following command.
sudo port install auctex +emacs_app
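
With the ports above installed, a quick way to check the LaTeX side is to compile a tiny Beamer file. This is a minimal sketch: the file name talk.tex and its contents are made up for illustration, and the compile step is skipped if pdflatex is not on the PATH.

```shell
#!/bin/sh
# Write a minimal Beamer presentation into a throwaway directory.
work=$(mktemp -d)
cd "$work" || exit 1
cat > talk.tex <<'EOF'
\documentclass{beamer}
\title{Git in our department}
\author{Your name}
\begin{document}
\frame{\titlepage}
\begin{frame}{Why Git?}
  \begin{itemize}
    \item Plain text files are easy to version control
    \item Changes can be merged
  \end{itemize}
\end{frame}
\end{document}
EOF

# Compile only if pdflatex is available; the sketch does not fail otherwise.
if command -v pdflatex >/dev/null 2>&1; then
    if pdflatex -interaction=nonstopmode talk.tex </dev/null >/dev/null 2>&1; then
        echo "built talk.pdf"
    else
        echo "pdflatex failed, check talk.log"
    fi
else
    echo "pdflatex not found, talk.tex written only"
fi
```

The whole presentation is the plain text between the two EOF markers, which is what makes it easy to keep in Git.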

When everything is installed you will need to bind it together in the .emacs file. This is the configuration file for Emacs, and there are many examples of how to create one around on the internet. Personally I see it as a continued work in progress that can never be finished. Later I may give some tips as comments to this post.