TZWorks LLC
System Programming and Consulting
www.tzworks.net


TZWorks®
Mozilla Cache Parser - mcp

(Version 0.23)



Information about our End User's License Agreements (EULAs)
for software on TZWorks, LLC Website www.tzworks.com

User Agreement

TZWorks LLC software and related documentation ("Software") is governed by separate licenses issued from TZWorks LLC. The User Agreement, Disclaimer, and/or Software may change from time to time. By continuing to use the Software after those changes become effective, you agree to be bound by all such changes. Permission to use the Software is granted provided that (1) use of such Software is in accordance with the license issued to you and (2) the Software is not resold, transferred or distributed to any other person or entity. Refer to the specific EULA issued to you for your specific terms and conditions. There are 3 types of licenses available: (i) for educational purposes, (ii) for demonstration and testing purposes and (iii) business and/or commercial purposes. Contact TZWorks LLC (info@tzworks.com) for more information regarding licensing and/or to obtain a license. To redistribute the Software, prior approval in writing is required from TZWorks LLC. The terms in your specific EULA do not give the user any rights in intellectual property or technology, but only a limited right to use the Software in accordance with the license issued to you. TZWorks LLC retains all rights to ownership of this Software.

Export Regulation

The Software is subject to U.S. export control laws, including the U.S. Export Administration Act and its associated regulations. The Export Control Classification Number (ECCN) for the Software is 5D002, subparagraph C.1. The user shall not, directly or indirectly, export, re-export or release the Software to, or make the Software accessible from, any jurisdiction or country to which export, re-export or release is prohibited by law, rule or regulation. The user shall comply with all applicable U.S. federal laws, regulations and rules, and complete all required undertakings (including obtaining any necessary export license or other governmental approval), prior to exporting, re-exporting, releasing, or otherwise making the Software available outside the U.S.

Disclaimer

The user agrees that this Software made available by TZWorks LLC is experimental in nature and use of the Software is at user's sole risk. The Software could include technical inaccuracies or errors. Changes are periodically added to the information herein, and TZWorks LLC may make improvements and/or changes to Software and related documentation at any time. TZWorks LLC makes no representations about the accuracy or usability of the Software for any purpose.

ALL SOFTWARE ARE PROVIDED "AS IS" AND "WHERE IS" WITHOUT WARRANTY OF ANY KIND INCLUDING ALL IMPLIED WARRANTIES AND CONDITIONS OF MERCHANTABILITY, FITNESS FOR ANY PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL TZWORKS LLC BE LIABLE FOR ANY KIND OF DAMAGE RESULTING FROM ANY CAUSE OR REASON, ARISING OUT OF IT IN CONNECTION WITH THE USE OR PERFORMANCE OF INFORMATION AVAILABLE FROM THIS SOFTWARE, INCLUDING BUT NOT LIMITED TO ANY DAMAGES FROM ANY INACCURACIES, ERRORS, OR VIRUSES, FROM OR DURING THE USE OF THE SOFTWARE.

Removal

The Software are the original works of TZWorks LLC. However, to be in compliance with the Digital Millennium Copyright Act of 1998 ("DMCA") we agree to investigate and disable any material for infringement of copyright. Contact TZWorks LLC at email address: info@tzworks.com, regarding any DMCA concerns.


About the mcp Tool

The Mozilla Cache Parser (mcp) targets the Mozilla Firefox cache and extracts useful information for the examiner. This tool is not unique, in that there are other Mozilla cache parsers available; a few good ones are even free. This tool was primarily created based on a need to provide more insight into the association of the cache metadata (e.g. timestamps, URL, HTTP request/response, etc.) and cache content data (e.g. data for the webpage that is displayed), especially when applied to the earlier versions of the Mozilla formats. In addition, and from a tool developer's standpoint, the mcp codebase can be used as a framework for future work to evaluate Mozilla cache artifact data that may be corrupted or fragmented.

As background, the Mozilla cache (like any other browser cache) is a repository for web data a user has viewed or downloaded. In general, the purpose of the cache is to store data locally, to allow the browser quick access for later requests to that same website. The cache includes: website pages, files, and images that were viewed by a user. In addition to the raw data that was received from a web server, the Mozilla cache also contains useful metadata associated with each item. From the point of view of the forensic examiner the data is interesting, since it contains items such as: the URL of the webpage, number of times the page was fetched from the cache, filename/type/size, last modified time, last fetched time, server time, etc. Having a tool available that can take advantage of this artifact data is necessary to have insights into the user's activity.

Like any other application that has lasted for some time, the Mozilla architecture (including the cache structure) has evolved. The older version of the Mozilla cache architecture consisted of 3 categories of files:

  1. The _CACHE_MAP_ file, which associates the metadata and raw data locations.
  2. The _CACHE_001_, _CACHE_002_ and _CACHE_003_ block files. These 3 files contain predefined chunks (ranging from small chunks to larger chunks) that are used to store the metadata for the cache as well as some raw data from webpages.
  3. The data files (and metadata) that are too large to fit within one of the three block files listed in (2) above.

If all the files are available, mcp will look at the data in all of them to generate the results. Using the data in the _CACHE_MAP_ file, mcp will annotate the location of the raw cache data in the results. In the absence of the _CACHE_MAP_ file, these associations will not be present.

The newer version of the Mozilla cache uses a separate file per webpage to store both the raw data and the metadata associated with the webpage.

The mcp tool handles these format differences without any special parameters. It was designed to sense which parsing engine to use for each of the cache formats.
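To make the file categories above concrete, here is a minimal Python sketch that sorts a file into one of them by name alone. It is not mcp's detection logic, which is not documented here; the function name and the returned labels are hypothetical, and only the file names come from the description above.

```python
import re
from pathlib import PurePosixPath

# File names are taken from the description above; the labels are hypothetical.
_BLOCK_FILE = re.compile(r"^_CACHE_00[1-3]_$")


def guess_file_category(path: str) -> str:
    """Classify a cache file by its name only (illustrative, not mcp's logic)."""
    # Normalize Windows separators so the same code handles both path styles.
    name = PurePosixPath(path.replace("\\", "/")).name
    if name == "_CACHE_MAP_":
        return "map"      # category 1: the mapping file
    if _BLOCK_FILE.match(name):
        return "block"    # category 2: one of the three block files
    return "other"        # category 3 data file, or a newer per-webpage file


print(guess_file_category(r".\cache_test1\_CACHE_001_"))
```

A name-only check cannot tell an older separate data file from a newer per-webpage file, which is why both fall under "other" in this sketch.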

Mozilla artifacts are located in the user's directory. This varies depending on the operating system used. Below is a table that breaks out the location by operating system.

OS            Mozilla Cache Location
Win XP        %userprofile%\Local Settings\Application Data\Mozilla\Firefox\Profiles\<random text>.default\Cache
Post Win XP   %userprofile%\AppData\Local\Mozilla\Firefox\Profiles\<random text>.default\Cache
OSX           /Users/[user acct]/Library/Caches/Firefox/Profiles/<random text>.default/Cache
Linux         /home/[user acct]/.mozilla/firefox/<random text>.default/Cache
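When scripting a collection, the table above can be turned into search patterns. The Python sketch below is one way to do that, assuming the random profile text can be matched with a "*" wildcard; the dictionary keys and the function name are hypothetical.

```python
# Patterns transcribed from the table above; "<random text>" becomes "*".
# The dictionary keys and the function name are hypothetical.
CACHE_PATTERNS = {
    "winxp": r"%userprofile%\Local Settings\Application Data\Mozilla\Firefox\Profiles\*.default\Cache",
    "windows": r"%userprofile%\AppData\Local\Mozilla\Firefox\Profiles\*.default\Cache",
    "osx": "/Users/{user}/Library/Caches/Firefox/Profiles/*.default/Cache",
    "linux": "/home/{user}/.mozilla/firefox/*.default/Cache",
}


def cache_pattern(os_name: str, user: str = "", userprofile: str = "") -> str:
    """Fill in the account-specific part of the pattern for one operating system."""
    pattern = CACHE_PATTERNS[os_name]
    return pattern.replace("%userprofile%", userprofile).replace("{user}", user)


print(cache_pattern("linux", user="alice"))
```

The resulting string can be handed to `glob.glob()` on a live system or on a mounted image to find candidate cache folders.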

How to use this Tool

The menu below shows the options available. The formatting options are similar to the rest of the TZWorks tools. The output can be rendered in either: delimited text (CSV and Log2Timeline) or SQLite. The SQLite option was added primarily to allow one to parse the cache records while archiving the results along with any companion content data.

    Usage

      mcp -file <cache file(s)> [-mapfile <_CACHE_MAP_ file>] [options]
      mcp -enumdir <folder> -num_subdirs <# levels> [options]
      dir <folder> /b /s /a | mcp -pipe

     Format Options
      -csv                            = output is CSV format
      -csvl2t                         = log2timeline output
      -dateformat mm/dd/yyyy          = "yyyy-mm-dd" is the default
      -timeformat hh:mm:ss            = "hh:mm:ss.xxx" is the default
      -no_whitespace                  = remove whitespace between delimiter
      -csv_separator "|"              = change delimiter to a pipe char
      -base10                         = use base10 numbers

     File Output Options (some format opts above do not apply to SQLite)
      -sqlite <output db>             = create/put results in SQLite db
      -out <output file>              = put results in text delimited file

     Folder Traversing Options
      -pipe                           = pipe files to parse
      -enumdir <dir> -num_subdirs <#> = pull from files from folder
      -split_sessions                 = split sessions into separate files
 

To process cache files, one can either target a folder or individual cache files. The tool will automatically determine which version of the format the cache files are in and adjust the parsing engine accordingly. In fact, when parsing many subdirectories of artifacts where each subdirectory is a different account or machine, the tool will dynamically adjust for the version of the cache being parsed at that time and keep the mapping of the cache metadata to the cache content data sorted.

If processing a directory of cache files (either by using the -pipe command or the -enumdir command), the tool will look for the Mozilla directory structure starting with the "Cache" or "Cache2" folder to indicate when to start parsing. Alternatively, if targeting the older cache format, where you only have a few files, one can use the option -file <cache file1 | cache file 2 | …> as well.

Targeting Specific Cache files

As mentioned above, if one wants to target a specific cache file or a couple of cache files, one can use the -file option. This was included in the options since it was needed during the debugging of the mcp tool. With this option, one can analyze one file at a time to help debug what is going on within the parsing engine. In the example below, we are targeting a cache file that uses the older version of the cache format (e.g. _CACHE_001_). The results would be rendered in the test1.csv file.

    > mcp64 -file .\cache_test1\_CACHE_001_ -out test1.csv

One should note that even though there is no _CACHE_MAP_ file being used in the above example, the mcp tool will still try to parse out where the 'content data' is located. For the older Mozilla cache formats (such as in this example), the association between the 'metadata' and 'content data' is recorded in the _CACHE_MAP_ file, so without this mapping file being present, the mcp tool needs to use its internal scanning/heuristics to try to locate the 'content data'.

If one adds the _CACHE_MAP_ file into the mix during the parse option, the command will look like this.

    > mcp64 -file .\cache_test1\_CACHE_001_ -mapfile .\cache_test1\_CACHE_MAP_ -out test2.csv

Since the mapfile was added during the parsing operation, the mcp tool will make use of the _CACHE_MAP_ file to associate all the metadata records to the content data.
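For repeatable runs, the two invocations above can be assembled from a script. The following Python sketch only builds the argument list; the helper name is hypothetical, and the flags are the ones shown in the usage menu.

```python
def build_mcp_command(cache_file, out_csv, map_file=None, exe="mcp64"):
    """Assemble an mcp command line; -mapfile is added only when a map is supplied."""
    cmd = [exe, "-file", cache_file]
    if map_file:
        cmd += ["-mapfile", map_file]
    cmd += ["-out", out_csv]
    return cmd


cmd = build_mcp_command(
    r".\cache_test1\_CACHE_001_",
    "test2.csv",
    map_file=r".\cache_test1\_CACHE_MAP_",
)
print(" ".join(cmd))
# To execute on a machine where mcp64 and its license file are present:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than one string, avoids quoting problems when a path contains spaces.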

The above discussion focused on the older Mozilla cache format (version 1.x format). The later cache formats overcome this mapping issue by storing the cache's metadata within the same file as the content data. This eases the parsing logic since only one file needs to be analyzed for both the metadata and content data.

Processing Cache Files in one or more Subdirectories

To process many Mozilla cache files in one pass, one can make use of mcp's piping option (-pipe) or the folder enumeration option (-enumdir). Either of these options allows one to target multiple subdirectories during the parsing operation. Below is a simple way to target the cache files in a Mozilla account.

    > dir c:\Users /b /s /a | mcp64 -pipe -out test3.csv

        or

    > mcp64 -enumdir c:\Users -num_subdirs 15 -out test3.csv

For any of the above options to work with this tool, the Mozilla folder structure must be preserved after the random session text string and before the 'Cache' or 'Cache2' folder. Why? It was a design choice to allow the tool to easily tell the 'type' of file it is examining; the naming convention lets the tool determine which version of the cache format is in use.
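On systems without the Windows dir command, a script can produce an equivalent listing. The Python sketch below walks a folder tree and keeps only files that sit at or below a 'Cache' or 'Cache2' folder, one full path per line as dir /b /s would emit. The exact-case folder match and the function name are assumptions, and the pre-filtering is a convenience rather than something mcp requires.

```python
import os

# Folder names come from the text above; matching them exactly is an assumption.
CACHE_FOLDERS = {"Cache", "Cache2"}


def list_cache_files(root):
    """Yield full paths of files located at or below a 'Cache' or 'Cache2' folder."""
    for dirpath, _dirnames, filenames in os.walk(root):
        parts = set(os.path.normpath(dirpath).split(os.sep))
        if parts & CACHE_FOLDERS:
            for name in sorted(filenames):
                yield os.path.join(dirpath, name)


# Example of feeding the listing to mcp over stdin (requires mcp64 to be installed):
#   import subprocess
#   listing = "\n".join(list_cache_files(r"c:\Users"))
#   subprocess.run(["mcp64", "-pipe", "-out", "test3.csv"], input=listing, text=True)
```

Because the full path is passed through unchanged, the profile folder and the 'Cache' or 'Cache2' folder remain visible to the tool.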

Processing Multiple Mozilla Accounts and/or Instances

Since mcp makes use of the Mozilla directory structure, one can pass in a number of accounts for the tool to process in one session. Internally, the mcp tool will detect the change in Mozilla instance and/or account and flush the current instance/account prior to processing the next instance/account. In this way, the tool conserves memory on the host machine. This is useful when parsing many Mozilla cache collections at one time.

By default, the tool writes all of the results into one output file. The context of the metadata is preserved in the output, since there is a delimited field that includes the source cache path/filename.
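As an illustration of how the source path field keeps combined results separable, the Python sketch below counts records per source file. It assumes the CSV begins with a header row that names the "file" field listed in the CSV field definitions; the two-column sample data is invented, and real output carries many more columns.

```python
import csv
import io
from collections import Counter


def records_per_source(csv_text, delimiter=","):
    """Count parsed records per source cache file using the 'file' column."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return Counter(row["file"].strip() for row in reader)


# Invented sample rows; only the column names come from this document.
sample_rows = [
    "url,file",
    r"http://example.com/a,C:\case\alice\Cache\_CACHE_001_",
    r"http://example.com/b,C:\case\alice\Cache\_CACHE_001_",
    r"http://example.org/c,C:\case\bob\Cache\_CACHE_002_",
]
counts = records_per_source("\n".join(sample_rows))
for source, total in counts.items():
    print(total, source)
```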

Archiving the Content Data

With the default option, the tool sends the parsed output to delimited text. This is fine when one only wants the results associated with the metadata and pointers to the content data. To archive the content data as well, mcp has an option to create a SQLite database and output the results into it.

To invoke this option, use the -sqlite <db_name> in your command. All parsed results will include both the record metadata and its associated content data. To view the results, one will need to be familiar with the SQL syntax to query the database, or alternatively, will need a separate SQLite viewer to look at the data. A good SQLite viewer is the "DB Browser for SQLite" and a reference is located at the end of this document.

The database schema created by mcp consists of 3 tables: (a) moz_cache_metadata_entries, (b) moz_cache_ctxdata_entries, and (c) ref. Only the first two tables have the records from the parsed metadata and content data, respectively. The last table is used internally by mcp for bookkeeping only. The schema for these tables is discussed in the user's guide.
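Since the column layout is not reproduced here, a safe first step is to ask the database itself. The Python sketch below reports the row count and column names for the two record tables. The table names are from the text above; the stand-in database and its columns are invented so that the example runs without mcp.

```python
import sqlite3

RECORD_TABLES = ("moz_cache_metadata_entries", "moz_cache_ctxdata_entries")


def summarize(conn):
    """Return {table: (row_count, [column names])} for the record tables present."""
    present = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")}
    summary = {}
    for table in RECORD_TABLES:
        if table in present:
            count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
            summary[table] = (count, columns)
    return summary


# Stand-in database; these columns are made up and are NOT the mcp schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE moz_cache_metadata_entries (id INTEGER, url TEXT)")
conn.execute("CREATE TABLE moz_cache_ctxdata_entries (id INTEGER, data BLOB)")
conn.execute("INSERT INTO moz_cache_metadata_entries VALUES (1, 'http://example.com/')")
print(summarize(conn))
```

Against a real database, replace the in-memory connection with `sqlite3.connect("<db_name>")`, using the name passed to -sqlite.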

Splitting the Mozilla Sessions into Separate Files

One can take the discussion in the previous sections and modify the output so that the data is broken out into separate files per Mozilla session. This applies to both the CSV and the SQLite output variants. The syntax is the same as before; one just appends the sub-option -split_sessions to the command. This tells the mcp tool to append a session number, along with the random string used in the Mozilla folder name, to whatever was specified as the output file. Below is an example using this syntax.

    > dir c:\Users /b /s /a | mcp64 -pipe -out test4.csv -split_sessions

When the processing is done, one will have a number of files (one per Mozilla session). The output name specified (in this case "test4") will be part of each filename, along with an incremented number and the folder name used by Mozilla for that session.


Limitations

Areas to be noted:

  - The earliest version of the Mozilla cache this tool has been tested on is v3.0.1. It is not known whether versions prior to v3.0.1 will work.
  - The folder enumeration option relies on the Mozilla directory structure as well as the naming convention used by Mozilla. Therefore, if either of these is changed by Mozilla or by a user, the parsing engine will produce unpredictable results or no results at all.
  - There are a couple of parsing engines within this tool; which engine is used is a function of the Mozilla naming convention used for the cache file.

List of options

Option           Description
-csv             Outputs the data fields delimited by commas. Since filenames can have commas, to ensure the fields are uniquely separated, any commas in the filenames get converted to spaces.
-csvl2t          Outputs the data fields in accordance with the log2timeline format.
-sqlite          Outputs the data into a SQLite database. The syntax is: -sqlite <db name to create or use>.
-pipe            Used to pipe files into the tool via STDIN (standard input). Each file passed in is parsed in sequence.
-enumdir         Experimental. Used to process files within a folder and/or subfolders. Each file is parsed in sequence. The syntax is -enumdir <"folder"> -num_subdirs <#>.
-filter          Filters data passed in via STDIN with the -pipe option. The syntax is -filter <"*.ext | *partialname* | ...">. The wildcard character '*' is restricted to either before the name or after the name.
-no_whitespace   Only applies to the -csv and -csvl2t options. Removes any whitespace between the field value and the CSV separator.
-csv_separator   Only applies to the -csv and -csvl2t options. Changes the CSV separator from the default comma to something else. Syntax is -csv_separator "|" to change the CSV separator to the pipe character. To use the tab as a separator, one can use -csv_separator "tab" OR -csv_separator "\t".
-dateformat      Outputs the date using the specified format. Default behavior is -dateformat "yyyy-mm-dd". Using this option allows one to adjust the format to mm/dd/yy, dd/mm/yy, etc. The restriction with this option is that a forward slash (/) or dash (-) symbol needs to separate month, day and year, and the month is in digit (1-12) form versus abbreviated name form.
-timeformat      Outputs the time using the specified format. Default behavior is -timeformat "hh:mm:ss.xxxxxxxxx". One can adjust the format to microseconds, via "hh:mm:ss.xxxxxx", or milliseconds, via "hh:mm:ss.xxx", or no fractional seconds, via "hh:mm:ss". The restrictions with this option are that a colon (:) symbol needs to separate hours, minutes and seconds, a period (.) symbol needs to separate the seconds and fractional seconds, and the repeating symbol 'x' is used to represent the number of fractional-second digits.
-quiet           Shows no progress during the parsing operation.
-split_sessions  Splits the Mozilla sessions into separate files.
-utf8_bom        All output is in Unicode UTF-8 format. If desired, one can prefix a UTF-8 byte order mark to the output using this option.
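When post-processing results, timestamps written with the default -dateformat and any of the -timeformat variants listed above can be read back as follows. This Python sketch assumes the date and time text have already been separated; the function name is hypothetical. Python's datetime stores at most microseconds, so longer fractions are truncated.

```python
from datetime import datetime


def parse_mcp_timestamp(date_text, time_text):
    """Parse 'yyyy-mm-dd' and 'hh:mm:ss[.fraction]' into a naive datetime."""
    whole, _, fraction = time_text.partition(".")
    # Pad or truncate the fraction to six digits (microsecond resolution).
    micros = (fraction + "000000")[:6]
    return datetime.strptime(f"{date_text} {whole}.{micros}", "%Y-%m-%d %H:%M:%S.%f")


print(parse_mcp_timestamp("2020-07-04", "13:05:09.123456789"))
print(parse_mcp_timestamp("2020-07-04", "13:05:09"))
```

Digits beyond the sixth are dropped rather than rounded, so comparisons at sub-microsecond resolution should use the original text.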

CSV field definitions

Field                      Definition
type                       Cache version number
url_hash                   SHA1 hash of the URL contained in the metadata. This is a computed value by mcp. This hash should be equivalent to the filename for those cache versions that show a SHA1 hash for the name.
url_etag                   The HTTP etag that was present in the HTTP response
request_type_reply_status  HTTP request type (e.g. GET, POST), and reply status (e.g. HTTP/1.1 200 OK)
serv_name                  Server name recorded in the HTTP response
serv_timezone              Server time zone
serv_date                  Server timestamp included in the HTTP response
serv_modify                Server modify timestamp included in the HTTP response
serv_expire                Server expire timestamp included in the HTTP response
browser_fetch_utc          Browser - last time the cache was fetched
browser_modify_utc         Browser modify timestamp associated with the cache
browser_expire_utc         Browser expire timestamp associated with the cache
content_create_utc         Actual content data file create timestamp. This is only present if the content file is a separate file. For Linux and OSX, this is the status change timestamp.
content_modify_utc         Actual content data file modify timestamp. This is only present if the content file is a separate file.
fetch_count                Number of times the cache was fetched
url                        URL of the webpage visited
url_params                 Any URL parameters used. This is formatted as JSON.
content_type               The content data type (e.g. GIF, JPEG, text, etc.) extracted from the HTTP response
content_filename           Last part of the URL prior to the URL parameters extracted from the HTTP response
content_encoding           The encoding used on the content data (e.g. gzip, br, etc.) extracted from the HTTP response
content_size               Size of the content data extracted from the HTTP response
content_location_info      The file and offset (if not zero) within the file where the content data is located. This is formatted as JSON.
extra_fields               The key/value pairs extracted from the HTTP response. This is formatted as JSON.
file                       The original path/file containing the metadata
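Three of the fields above are described as JSON, so they can be decoded after the delimited text is read. The Python sketch below assumes the output was written with -csv_separator "|" and has a header row. The sample row and the JSON key names in it are invented, and how mcp treats commas or quotes inside these fields is not covered here.

```python
import csv
import io
import json

JSON_FIELDS = ("url_params", "content_location_info", "extra_fields")


def decode_row(row):
    """Return a copy of the row with any JSON-formatted fields decoded."""
    decoded = dict(row)
    for field in JSON_FIELDS:
        text = (row.get(field) or "").strip()
        if text:
            try:
                decoded[field] = json.loads(text)
            except json.JSONDecodeError:
                pass  # leave the raw text in place if it is not valid JSON
    return decoded


# Invented sample; only the column names come from the field list above.
sample = (
    "url|url_params|extra_fields|file\n"
    'http://example.com/search|{"q": "mcp", "page": "2"}|{"server": "nginx"}|/cases/alice/Cache2/entries/ABC\n'
)
reader = csv.DictReader(io.StringIO(sample), delimiter="|", quoting=csv.QUOTE_NONE)
rows = [decode_row(row) for row in reader]
print(rows[0]["url_params"])
```

Using `csv.QUOTE_NONE` keeps the double quotes inside the JSON text intact while the line is split on the pipe character.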

Authentication and License File

This tool has authentication built into the binary. The primary authentication mechanism is the digital X509 code signing certificate embedded into the binary (Windows and macOS).

The other mechanism is the runtime authentication, which applies to all versions of the tools (Windows, Linux and macOS). The runtime authentication ensures that the tool has a valid license. The license needs to be in the same directory as the tool for it to authenticate. Furthermore, any modification to the license, either to its name or contents, will invalidate the license.

Limited versus Demo versus Full in the tool's output banner

The tools from TZWorks will output header information about the tool's version and whether it is running in limited, demo or full mode. This is directly related to what version of a license the tool authenticates with. The limited and demo keywords indicate that some functionality of the tool is not available, and the full keyword indicates that all the functionality is available. The lacking functionality in the limited or demo versions may mean one or all of the following: (a) certain options may not be available, (b) certain data may not be outputted in the parsed results, and (c) the license has a finite lifetime before expiring.


Version history


References

  1. Mozilla-central [https://hg.mozilla.org/mozilla-central/annotate/80eff2b52d14/netwerk/cache2/CacheFileMetadata.h#l54]
  2. Mozilla-central [https://dxr.mozilla.org/mozilla-central/source/netwerk/cache2/CacheIndex.h]
  3. firefox-cache-forensics -FfFormat.wiki [https://code.google.com/archive/p/firefox-cache-forensics/wikis/FfFormat.wiki]
  4. Joachim Metz, Firefox cache file format [https://github.com/libyal/dtformats/blob/master/documentation/Firefox%20cache%20file%20format.asciidoc]
  5. Endpoint Protection / Web Browser Forensics part 2 [https://community.broadcom.com/symantecenterprise/communities/community-home/librarydocuments/viewdocument?DocumentKey=30e9590f-e848-4857-8bd1-adf70638af36&CommunityKey=1ecf5f55-9545-44d6-b0f4-4e4a7f5f5e68&tab=librarydocuments]
  6. SQLite library statically linked into tool [Amalgamation of many separate C source files from SQLite version 3.32.3]
  7. SQLite documentation [http://www.sqlite.org]
  8. DB Browser for SQLite [http://sqlitebrowser.org/]