TZWorks LLC
System Programming and Consulting
www.tzworks.com


TZWorks®
Chromium Cache Parser - ccp

(Version 0.14)



Information about our End User's License Agreements (EULAs)
for software on TZWorks, LLC Website www.tzworks.com

User Agreement

TZWorks LLC software and related documentation ("Software") is governed by separate licenses issued from TZWorks LLC. The User Agreement, Disclaimer, and/or Software may change from time to time. By continuing to use the Software after those changes become effective, you agree to be bound by all such changes. Permission to use the Software is granted provided that (1) use of such Software is in accordance with the license issued to you and (2) the Software is not resold, transferred or distributed to any other person or entity. Refer to the specific EULA issued to you for the specific terms and conditions. There are 3 types of licenses available: (i) for educational purposes, (ii) for demonstration and testing purposes and (iii) for business and/or commercial purposes. Contact TZWorks LLC (info@tzworks.com) for more information regarding licensing and/or to obtain a license. To redistribute the Software, prior approval in writing is required from TZWorks LLC. The terms in your specific EULA do not give the user any rights in intellectual property or technology, but only a limited right to use the Software in accordance with the license issued to you. TZWorks LLC retains all rights to ownership of this Software.

Export Regulation

The Software is subject to U.S. export control laws, including the U.S. Export Administration Act and its associated regulations. The Export Control Classification Number (ECCN) for the Software is 5D002, subparagraph C.1. The user shall not, directly or indirectly, export, re-export or release the Software to, or make the Software accessible from, any jurisdiction or country to which export, re-export or release is prohibited by law, rule or regulation. The user shall comply with all applicable U.S. federal laws, regulations and rules, and complete all required undertakings (including obtaining any necessary export license or other governmental approval), prior to exporting, re-exporting, releasing, or otherwise making the Software available outside the U.S.

Disclaimer

The user agrees that this Software made available by TZWorks LLC is experimental in nature and use of the Software is at user's sole risk. The Software could include technical inaccuracies or errors. Changes are periodically added to the information herein, and TZWorks LLC may make improvements and/or changes to Software and related documentation at any time. TZWorks LLC makes no representations about the accuracy or usability of the Software for any purpose.

ALL SOFTWARE ARE PROVIDED "AS IS" AND "WHERE IS" WITHOUT WARRANTY OF ANY KIND INCLUDING ALL IMPLIED WARRANTIES AND CONDITIONS OF MERCHANTABILITY, FITNESS FOR ANY PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL TZWORKS LLC BE LIABLE FOR ANY KIND OF DAMAGE RESULTING FROM ANY CAUSE OR REASON, ARISING OUT OF IT IN CONNECTION WITH THE USE OR PERFORMANCE OF INFORMATION AVAILABLE FROM THIS SOFTWARE, INCLUDING BUT NOT LIMITED TO ANY DAMAGES FROM ANY INACCURACIES, ERRORS, OR VIRUSES, FROM OR DURING THE USE OF THE SOFTWARE.

Removal

The Software are the original works of TZWorks LLC. However, to be in compliance with the Digital Millennium Copyright Act of 1998 ("DMCA") we agree to investigate and disable any material for infringement of copyright. Contact TZWorks LLC at email address: info@tzworks.com, regarding any DMCA concerns.


About the ccp Tool

The Chromium Cache Parser (ccp) targets the various caches associated with Chromium-based browsers, or browsers that use the cache component of Chromium. The tool parses the caches of the following browsers (at least their later versions): Google Chrome, Microsoft Edge, Opera, Brave, Vivaldi, and others.

Browser cache files contain useful information for the examiner. The ccp tool is not unique, in that there are other Chromium-based cache parsers available; a few are even free. This tool was created primarily to provide more insight into the association between the cache metadata (e.g. timestamps, URL, HTTP request/response, etc.) and the cache content data (e.g. data for the webpage that is displayed), especially when applied to the various cache types that Chromium can use. In addition, and from a tool developer's standpoint, the ccp codebase can be used as a framework for future prototyping work to evaluate Chromium cache artifact data that may be corrupted or fragmented.

As background, the Chromium cache is a repository for web data a user has viewed or downloaded. In general, the purpose of the cache is to store data locally, and thus allow the browser quick access for later requests to a previously viewed website. The cache includes website pages, files, scripts, images and other items that were viewed by a user, as well as data that the browser needed to use. In addition to the raw data that was received from a web server, the cache also contains useful metadata associated with each item. From the point of view of the forensic examiner, the cache provides insight into the user's Internet usage, since it contains items such as: the URL of the webpage, the number of times the page was fetched from the cache, filename/type/size, last modified time, last fetched time, server time, etc. A tool that can take advantage of this artifact data is therefore needed to gain insight into the user's activity.

The Chromium cache can consist of various cache types; each cache type can be determined by where it is located in the Chromium directory structure. These various types of cache include: normal Cache data, CacheStorage type data, Code Cache, and ScriptCache, to name a few. A listing of all the cache types is shown below and is taken from the Chromium project's documentation. The ccp tool has only been tested on a few of these types, primarily due to the limited test samples available.

Cache Type                        Meaning
DISK_CACHE                        Disk is used as the backing storage
MEMORY_CACHE                      Data is stored only in memory
REMOVED_MEDIA_CACHE               No longer in use
APP_CACHE                         Special case of DISK_CACHE. cache storage, service worker script cache
SHADER_CACHE                      Backing store for the GL shader cache
PNACL_CACHE                       Backing store for the Portable native client translation cache
GENERATED_BYTE_CODE_CACHE         Backing store for renderer generated data like bytecode for JavaScript
GENERATED_NATIVE_CODE_CACHE       Backing store for renderer generated data native code for WebAssembly
GENERATED_WEBUI_BYTE_CODE_CACHE   Backing store for renderer generated data bytecode for JavaScript from WebUI pages

As a side note, notwithstanding the cache types listed above, when the cache is found somewhere other than the traditional cache location (Cache_Data), this tool identifies the cache by the location where it was found. For example, a cache found in the GPUCache folder is labelled 'GPU cache'. The table below shows how the ccp tool maps the location of a cache to the cache type label it uses.

Cache type label in ccp   Location
disk cache                ../Cache/Cache_Data/..
storage cache             ../Service Worker/CacheStorage/..
code cache                ../Code Cache/..
code cache                ../Service Worker/ScriptCache/..
GPU cache                 ../GPUCache/..

Details about the Chromium cache types as well as the locations of the cache artifacts are discussed in the User's Guide or in the references at the end of this document.
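
The location-to-label table above can be read as a simple lookup on the folder names in a path. The following is a minimal sketch of that idea, not the tool's actual implementation: the names LOCATION_RULES and label_for_path are hypothetical, and the case-insensitive matching is an assumption made for illustration.

```python
from pathlib import PureWindowsPath

# Hypothetical lookup table built from the location table in this document.
# Each rule is a run of consecutive folder names and the label ccp reports.
LOCATION_RULES = [
    (("Service Worker", "CacheStorage"), "storage cache"),
    (("Service Worker", "ScriptCache"), "code cache"),
    (("Code Cache",), "code cache"),
    (("GPUCache",), "GPU cache"),
    (("Cache", "Cache_Data"), "disk cache"),
]


def label_for_path(path):
    """Return the cache label for a path, or None when no rule matches."""
    # PureWindowsPath splits on both '\' and '/', so either style is accepted.
    parts = [p.lower() for p in PureWindowsPath(path).parts]
    for folders, label in LOCATION_RULES:
        want = [f.lower() for f in folders]
        n = len(want)
        # Look for the folder run anywhere inside the path (assumed behavior).
        if any(parts[i:i + n] == want for i in range(len(parts) - n + 1)):
            return label
    return None
```

For instance, label_for_path("../Service Worker/ScriptCache/x") returns 'code cache', matching the fourth row of the table.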


How to use this Tool

    Usage

      ccp -index <index cache file>  [options]
      ccp -simple <simple cache file>  [options]
      ccp -enumdir <folder> -num_subdirs <# levels> [options]
      dir <folder> /b /s /a | ccp -pipe

     Format Options
      -csv                            = output is CSV format
      -csvl2t                         = log2timeline output
      -dateformat mm/dd/yyyy          = "yyyy-mm-dd" is the default
      -timeformat hh:mm:ss            = "hh:mm:ss.xxx" is the default
      -no_whitespace                  = remove whitespace between delimiter
      -csv_separator "|"              = change delimiter to a pipe char
      -base10                         = use base10 numbers

     File Output Options (some format opts above do not apply to SQLite)
      -sqlite <output db>             = *** create/put results in SQLite db
      -out <output file>              = put results in text delimited file

     Folder Traversing Options
      -pipe                           = pipe files to parse
      -enumdir <dir> -num_subdirs <#> = pull files from folder

To process cache files, one can target either a set of block-type cache files or individual simple-type cache files. The tool automatically determines which version of the format the cache files are in and adjusts the parsing engine accordingly. When parsing many subdirectories of artifacts, where each subdirectory belongs to a different account or machine, the tool adjusts dynamically to the version of the cache being parsed at that time and keeps the cache metadata matched to the corresponding cache content data.

Targeting Specific Cache files

To target a specific set of block cache files, use the -index option. This option was originally included because it was needed while debugging the ccp tool: analyzing one type of file at a time makes it easier to see what is going on within the parsing engine. The example below targets the index file of a set of block cache files. The results are written to the test1.csv file.

    > ccp64 -index .\cache_test1\index -out test1.csv

Processing Cache Files in one or more Subdirectories

To process many Chromium cache files in one pass, one can make use of ccp's piping option (-pipe) or the folder enumeration option (-enumdir). Either of these options allows one to target multiple subdirectories during the parsing operation. Below is a simple way to target the cache files of a Chromium-based browser.

    > dir c:\Users /b /s /a | ccp64 -pipe -out test3.csv

        or

    > ccp64 -enumdir c:\Users -num_subdirs 15 -out test3.csv
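
To preview which files a depth-limited enumeration would reach before running the tool, the traversal can be approximated in a few lines. This is a sketch under stated assumptions: the helper name list_files is hypothetical, and it assumes that the number given to -num_subdirs counts folder levels below the starting folder, which this document does not define precisely.

```python
import os


def list_files(root, max_levels):
    """Return full paths of files at most max_levels folder levels below root."""
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    found = []
    for folder, subdirs, files in os.walk(root):
        depth = folder.rstrip(os.sep).count(os.sep) - base_depth
        if depth >= max_levels:
            subdirs[:] = []  # stop os.walk from descending any further
        found.extend(os.path.join(folder, name) for name in sorted(files))
    return found
```

The returned list, written one path per line, has the same shape as the bare listing that the dir command above feeds to the -pipe option.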

Archiving the Content Data

By default, the tool sends the parsed output to delimited text. This is fine when one only wants the results associated with the metadata, such as URLs visited, timestamps of the visit, etc. To archive the content data as well, run the tool with the -sqlite <db_name> option, which tells ccp to create one database table for the metadata and another for the content data.

To view the results, one will need to be familiar with the SQL syntax to query the database, or alternatively, will need a separate SQLite viewer to look at the data. A good SQLite viewer is the "DB Browser for SQLite" and a reference is located at the end of this document.

The database schema created by ccp consists of 4 tables: (a) cache_metadata_entries, (b) cache_ctxdata_entries, (c) metadata and (d) ref. Only the first two tables have the records from the parsed metadata and content data, respectively. The metadata table is used to record the session parameters used when running the parser. The last table (ref) is used internally by ccp for bookkeeping only.
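
As a starting point for querying the database without a separate viewer, the two record tables can be inspected with Python's built-in sqlite3 module. This is a minimal sketch: the table names come from the schema description above, while the function name summarize is hypothetical. The column names are read from the database itself, since they are not listed in this document.

```python
import sqlite3

# Table names as given in the schema description in this document.
RECORD_TABLES = ("cache_metadata_entries", "cache_ctxdata_entries")


def summarize(db_path):
    """Return {table: (column names, row count)} for the two record tables."""
    summary = {}
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    con = sqlite3.connect(db_path)
    try:
        for table in RECORD_TABLES:
            # PRAGMA table_info yields one row per column; index 1 is the name.
            cols = [row[1] for row in con.execute(f"PRAGMA table_info({table})")]
            if not cols:
                continue  # table is not present in this database
            (count,) = con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
            summary[table] = (cols, count)
    finally:
        con.close()
    return summary
```

Once the column names are known, ordinary SELECT statements can be written against either table.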


Limitations

Areas to be noted
The tool is still a prototype, since this is the first version released. It still needs to be tested against various types of files, corrupted files, etc. to ensure that it behaves consistently.
The folder enumeration option relies on the Chromium directory structure as well as the naming convention used by Chromium. Therefore, if either of these is changed by Chromium or by a user, the parsing engine will produce unpredictable results or no results at all.
There are a couple of parsing engines within this tool; which engine is used depends on the Chromium naming convention used for the cache file.

List of options

Option           Description
-csv             Outputs the data fields delimited by commas. Since filenames can have commas, to ensure the fields are uniquely separated, any commas in the filenames get converted to spaces.
-csvl2t          Outputs the data fields in accordance with the log2timeline format.
-sqlite          Outputs the data into a SQLite database. The syntax is: -sqlite <db name to create or use>.
-pipe            Used to pipe files into the tool via STDIN (standard input). Each file passed in is parsed in sequence.
-enumdir         Experimental. Used to process files within a folder and/or subfolders. Each file is parsed in sequence. The syntax is -enumdir <"folder"> -num_subdirs <#>.
-filter          Filters the data passed in via STDIN with the -pipe option. The syntax is -filter <"*.ext | *partialname* | ...">. The wildcard character '*' is restricted to either before the name or after the name.
-no_whitespace   Only applies to the -csv and -csvl2t options. Removes any whitespace between the field value and the CSV separator.
-csv_separator   Only applies to the -csv and -csvl2t options. Changes the CSV separator from the default comma to something else. The syntax is -csv_separator "|" to change the CSV separator to the pipe character. To use a tab as the separator, use -csv_separator "tab" or -csv_separator "\t".
-dateformat      Outputs the date using the specified format. Default behavior is -dateformat "yyyy-mm-dd". Using this option allows one to adjust the format to mm/dd/yy, dd/mm/yy, etc. The restriction with this option is that a forward slash (/) or dash (-) symbol needs to separate the month, day and year, and the month is in digit (1-12) form rather than abbreviated name form.
-timeformat      Outputs the time using the specified format. Default behavior is -timeformat "hh:mm:ss.xxxxxxxxx". One can adjust the format to microseconds, via "hh:mm:ss.xxxxxx", or milliseconds, via "hh:mm:ss.xxx", or no fractional seconds, via "hh:mm:ss". The restrictions with this option are that a colon (:) symbol needs to separate hours, minutes and seconds, a period (.) symbol needs to separate the seconds and fractional seconds, and the repeating symbol 'x' is used to represent the number of fractional-second digits.
-quiet           Shows no progress during the parsing operation.
-utf8_bom        All output is in Unicode UTF-8 format. Use this option to prefix a UTF-8 byte order mark to the CSV output.
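
The pattern rules for -dateformat and -timeformat can be illustrated with a small renderer. This is only a sketch of the rules as described in the table above. It does not reflect how ccp implements them, and the function names render_date and render_time are hypothetical.

```python
from datetime import datetime


def render_time(ts, pattern):
    """Render the time of day; the number of 'x' symbols sets the fractional digits."""
    base, _, frac = pattern.partition(".")
    if base != "hh:mm:ss" or frac.strip("x"):
        raise ValueError("unsupported pattern: " + pattern)
    text = ts.strftime("%H:%M:%S")
    if frac:
        # datetime only carries microseconds, so pad to 9 digits, then truncate.
        digits = f"{ts.microsecond:06d}000"
        text += "." + digits[:len(frac)]
    return text


def render_date(ts, pattern):
    """Render the date; parts are separated by '/' or '-' and months are digits."""
    sep = "/" if "/" in pattern else "-"
    fields = {"yyyy": "%Y", "yy": "%y", "mm": "%m", "dd": "%d"}
    # An unknown part raises KeyError.
    return ts.strftime(sep.join(fields[part] for part in pattern.split(sep)))
```

With a timestamp of 2021-03-04 05:06:07.123456, the pattern "hh:mm:ss.xxx" yields 05:06:07.123 and the pattern "mm/dd/yy" yields 03/04/21.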

CSV field definitions

Field                       Definition
type                        Cache version number
url_hash                    SHA1 hash of the URL contained in the metadata. This is a computed value by ccp. This hash should be equivalent to the filename for those cache versions that show a SHA1 hash for the name.
url_etag                    The HTTP etag that was present in the HTTP response
request_type_reply_status   HTTP request type (e.g. GET, POST), and reply status (e.g. HTTP/1.1 200 OK)
serv_name                   Server name recorded in the HTTP Response
serv_timezone               Server time zone
serv_date                   Server timestamp included in the HTTP Response
serv_modify                 Server modify timestamp included in the HTTP Response
serv_expire                 Server expire timestamp included in the HTTP Response
browser_fetch_utc           Browser - last time the cache was fetched
browser_modify_utc          Browser modify timestamp associated with the cache
browser_expire_utc          Browser expire timestamp associated with the cache
content_create_utc          Actual content data file create timestamp. This is only present if the content file is a separate file. For Linux and macOS, this is the status change timestamp
content_modify_utc          Actual content data file modify timestamp. This is only present if the content file is a separate file.
fetch_count                 Number of times the cache was fetched
url                         URL of the webpage visited
url_params                  Any URL parameters used. This is formatted as JSON.
content_type                The content data type (e.g. GIF, JPEG, text, etc.) extracted from the HTTP response
content_filename            Last part of the URL prior to the URL parameters extracted from the HTTP response
content_encoding            The encoding used on the content data (e.g. gzip, br, etc.) extracted from the HTTP response
content_size                Size of the content data extracted from the HTTP response
content_location_info       The file and offset (if not zero) within the file where the content data is located. This is formatted as JSON.
extra_fields                The key/value pairs extracted from the HTTP response. This is formatted as JSON.
file                        The original path/file containing the metadata
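
Three of the fields above (url_params, content_location_info and extra_fields) are formatted as JSON, so a script that post-processes the delimited output may want to decode them. The sketch below rests on assumptions that this document does not confirm: that the output was produced with -csv_separator "|" so that commas inside the JSON do not collide with the delimiter, that the first row is a header carrying the field names listed above, and that values are not quoted. The function name read_rows is hypothetical.

```python
import csv
import json

# Fields described as "formatted as JSON" in the field definitions.
JSON_FIELDS = ("url_params", "content_location_info", "extra_fields")


def read_rows(stream, separator="|"):
    """Yield one dict per row, decoding the JSON-formatted fields when possible."""
    # QUOTE_NONE leaves the double quotes inside the JSON text untouched.
    reader = csv.DictReader(stream, delimiter=separator, quoting=csv.QUOTE_NONE)
    if reader.fieldnames:
        # Tolerate padding around header names (assumed, not documented).
        reader.fieldnames = [name.strip() for name in reader.fieldnames]
    for row in reader:
        for name in JSON_FIELDS:
            raw = (row.get(name) or "").strip()
            if raw:
                try:
                    row[name] = json.loads(raw)
                except json.JSONDecodeError:
                    pass  # keep the raw text if it is not valid JSON
        yield row
```

A file written with -out can be passed in by opening it with open(path, newline="", encoding="utf-8").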

Authentication and License File

This tool has authentication built into the binary. The primary authentication mechanism is the digital X509 code signing certificate embedded into the binary (Windows and macOS).

The other mechanism is the runtime authentication, which applies to all versions of the tools (Windows, Linux and macOS). The runtime authentication ensures that the tool has a valid license. The license needs to be in the same directory as the tool for it to authenticate. Furthermore, any modification to the license, either to its name or contents, will invalidate the license.

Limited versus Demo versus Full in the tool's output banner

The tools from TZWorks will output header information about the tool's version and whether it is running in limited, demo or full mode. This is directly related to which type of license the tool authenticates with. The limited and demo keywords indicate that some functionality of the tool is not available, and the full keyword indicates that all the functionality is available. The lacking functionality in the limited or demo versions may mean one or all of the following: (a) certain options may not be available, (b) certain data may not be output in the parsed results, and (c) the license has a finite lifetime before expiring.


Version history


References

  1. Chromium Design documents [https://www.chromium.org/developers/design-documents]
  2. Chromium disk cache [https://www.chromium.org/developers/design-documents/network-stack/disk-cache]
  3. Chromium simple cache [https://www.chromium.org/developers/design-documents/network-stack/disk-cache/very-simple-backend]
  4. SQLite library statically linked into tool [Amalgamation of many separate C source files from SQLite version 3.32.3]
  5. SQLite documentation [http://www.sqlite.org]
  6. DB Browser for SQLite [http://sqlitebrowser.org/]