Mon, Nov 30, 2020
I recently received a rather suspicious E-Book in epub format. Like PDFs, these can also contain malicious code. I wanted to open it, but I wasn’t sure if I could trust it.
In the past, I would have deleted it straight away. But this time I wanted to actually know if there is malicious code inside this file. So, how do I find that out?
First, I figured I needed an understanding of what an .epub is and what it contains.
I found this W3C specification, which I of course read in full, because I absolutely want to implement an .epub reader with every detail.
However, I noticed two main points:
- .epub files can contain SVG, XML, HTML and CSS.
- The specification suggests that several files are packaged into the .epub.
So I tried the old ‘rename it to .zip and open it’ trick and voilà, it worked:
$ tree
.
├── META-INF
│   └── container.xml
├── content.opf
├── cover.jpeg
├── images
│   ├── 00002.jpeg
│   ├── ...
│   └── 00025.jpeg
├── mimetype
├── page_styles.css
├── stylesheet.css
├── text
│   ├── part0000.html
│   ├── part0001.html
│   ├── ...
│   └── part0031.html
├── titlepage.xhtml
└── toc.ncx
Lots of HTML, images and CSS. When you open the .opf files, they turn out to be XML.
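The rename step is not strictly needed, since the archive can be listed and unpacked directly. A minimal sketch, assuming the Info-ZIP `unzip` tool is installed; the function name `epub_extract` and the file name `book.epub` are made up for illustration:

```shell
# Hypothetical helper: list and unpack an EPUB without renaming it to .zip.
# Assumes Info-ZIP `unzip` is available on the PATH.
epub_extract() {
    epub="$1"                       # path to the .epub file
    dest="${2:-${epub%.epub}}"      # target dir, defaults to the path minus ".epub"
    unzip -l "$epub"                # list the archive contents first
    unzip -q -o "$epub" -d "$dest"  # then unpack quietly, overwriting old copies
}

# Usage (book.epub is a placeholder):
#   epub_extract book.epub
```

Unpacking into a separate directory keeps the extracted files in one place, so they are easy to inspect and to delete afterwards.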
Until then, I only knew that .epub files could contain malicious code, but not how that code would get executed.
So we have to look for things like XML external entity (XXE) injection. Such an injection can look like this:
<!DOCTYPE foo [ <!ENTITY entity_name SYSTEM "file:///etc/passwd"> ]>
I am going to stay ignorant of how exactly these attacks work. My goal is to recognize the patterns of malicious code.
For those interested, look at the Web Security Academy from PortSwigger.
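As a crude first pass, the pattern above can be searched for with `grep`. This is only a sketch: the function name `epub_suspects` is made up, and the pattern list (entity declarations, a DOCTYPE with an internal subset, script tags, `javascript:` URLs) is an assumption, not a complete signature set. It does not replace reading the files.

```shell
# Hypothetical helper: print the files under a directory that contain
# markup worth a closer look. The pattern list is an assumption and
# far from complete.
epub_suspects() {
    dir="${1:-.}"
    # -r recurse, -i ignore case, -l print file names only, -E extended regex
    grep -rilE '<!ENTITY|<!DOCTYPE[^>]*\[|<script|javascript:' "$dir"
}

# Usage, from inside the unpacked EPUB directory:
#   epub_suspects .
```

A plain `<!DOCTYPE html PUBLIC ...>` line is not flagged, because it has no internal subset in square brackets.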
I tested a few .epub files and, depending on the file, there can be 30 or so HTML files in there.
For this next part, I could write some fancy regex-based automated malware scanner for .epub files. Or I could use the best neural network in the world, the one every human carries on their shoulders: my brain.
The thing is, I don’t want to manually open each of those files and close each file. I want to look at them, but fast and easy.
Luckily there is a text editor which many are obsessed with closing: VIM.
This is easily one of the best and simplest solutions. With that said, let’s add something to my .vimrc configuration file.
if $VIMENV == 'prev'
  noremap <Space> :n<CR>
  noremap <Backspace> :N<CR>
  set noswapfile
endif
This tells Vim: if VIMENV is set to prev, map Space to :n (go to the next file in the argument list), map Backspace to :N (go to the previous file) and turn off swap files.
Great, how do I use that configuration?
$ VIMENV=prev vim file.txt
To make that even easier 😉, I add an alias to my shell configuration file:
alias vimprev="VIMENV=prev vim"
After this, I simply cd to my epub dir and execute:
vimprev $(find . -type f)
This loads every file below the current directory into Vim's argument list (as buffers, not tabs). I can step through them with Space and Backspace.
This took me about 5 minutes, because our brains are normally extremely fast at pattern recognition and anomaly detection. If something looks fishy, I take a second look.
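The find call above also hands the JPEGs to Vim. A possible refinement is to list only the markup and stylesheet files. A minimal sketch: the function name `epub_text_files` and the extension list are assumptions, based on the file types seen in the tree output. The extensionless mimetype file is skipped by this filter.

```shell
# Hypothetical helper: print only the markup and stylesheet files of an
# unpacked EPUB. The extension list is an assumption; extend it as needed.
epub_text_files() {
    find "${1:-.}" -type f \( -name '*.html' -o -name '*.xhtml' \
        -o -name '*.xml' -o -name '*.opf' -o -name '*.ncx' \
        -o -name '*.css' -o -name '*.svg' \)
}

# Usage with the vimprev alias (breaks on file names containing spaces):
#   vimprev $(epub_text_files .)
```

SVG files stay in the list on purpose, since they are XML and can carry scripts.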
You are probably wondering how to exit vim:
Interested in trying it yourself? Or do you want to read some copyright-free books (at least in the US)? Look at Project Gutenberg for royalty-free E-Books.
If you have any questions, let me know on Twitter (my DMs are open).