TZWorks LLC
System Analysis and Programming
www.tzworks.com


TZWorks®
Chromium SQLite Parser - csp

(Version 0.27)



Information about our End User's License Agreements (EULAs)
for software on TZWorks, LLC Website www.tzworks.com

User Agreement

TZWorks LLC software and related documentation ("Software") is governed by separate licenses issued from TZWorks LLC. The User Agreement, Disclaimer, and/or Software may change from time to time. By continuing to use the Software after those changes become effective, you agree to be bound by all such changes. Permission to use the Software is granted provided that (1) use of such Software is in accordance with the license issued to you and (2) the Software is not resold, transferred or distributed to any other person or entity. Refer to the specific EULA issued to you for the terms and conditions. There are 3 types of licenses available: (i) for educational purposes, (ii) for demonstration and testing purposes and (iii) business and/or commercial purposes. Contact TZWorks LLC (info@tzworks.com) for more information regarding licensing and/or to obtain a license. To redistribute the Software, prior approval in writing is required from TZWorks LLC. The terms in your specific EULA do not give the user any rights in intellectual property or technology, but only a limited right to use the Software in accordance with the license issued to you. TZWorks LLC retains all rights to ownership of this Software.

Export Regulation

The Software is subject to U.S. export control laws, including the U.S. Export Administration Act and its associated regulations. The Export Control Classification Number (ECCN) for the Software is 5D002, subparagraph C.1. The user shall not, directly or indirectly, export, re-export or release the Software to, or make the Software accessible from, any jurisdiction or country to which export, re-export or release is prohibited by law, rule or regulation. The user shall comply with all applicable U.S. federal laws, regulations and rules, and complete all required undertakings (including obtaining any necessary export license or other governmental approval), prior to exporting, re-exporting, releasing, or otherwise making the Software available outside the U.S.

Disclaimer

The user agrees that this Software made available by TZWorks LLC is experimental in nature and use of the Software is at user's sole risk. The Software could include technical inaccuracies or errors. Changes are periodically added to the information herein, and TZWorks LLC may make improvements and/or changes to Software and related documentation at any time. TZWorks LLC makes no representations about the accuracy or usability of the Software for any purpose.

ALL SOFTWARE ARE PROVIDED "AS IS" AND "WHERE IS" WITHOUT WARRANTY OF ANY KIND INCLUDING ALL IMPLIED WARRANTIES AND CONDITIONS OF MERCHANTABILITY, FITNESS FOR ANY PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL TZWORKS LLC BE LIABLE FOR ANY KIND OF DAMAGE RESULTING FROM ANY CAUSE OR REASON, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF INFORMATION AVAILABLE FROM THIS SOFTWARE, INCLUDING BUT NOT LIMITED TO ANY DAMAGES FROM ANY INACCURACIES, ERRORS, OR VIRUSES, FROM OR DURING THE USE OF THE SOFTWARE.

Removal

The Software is the original work of TZWorks LLC. However, to be in compliance with the Digital Millennium Copyright Act of 1998 ("DMCA") we agree to investigate and disable any material for infringement of copyright. Contact TZWorks LLC at email address: info@tzworks.com, regarding any DMCA concerns.


About the csp Tool (top)

Google's Chrome Browser (called Chrome in this document) has many artifacts available that the forensics examiner can use in identifying a user's Internet activity. This includes Chrome's various databases, local storage, JSON formatted text files, and cache.

As a disclaimer, this tool does not target all of Chrome's artifacts; it only targets certain SQLite databases, and specific tables within those databases, that are used by the Browser and have been deemed useful by the forensics community. Specifically, this tool only targets these seven databases: (a) History, (b) Cookies, (c) Web Data, (d) Top Sites, (e) Shortcuts, (f) Login Data and (g) Favicons.

One should also be aware that the database schemas (which identify the fields, and their respective types, that comprise a record) vary across certain Chrome versions. This is something that needed to be accounted for when designing the csp tool. The approach used was to have the tool dynamically detect and adjust to varying schemas as they are encountered during the parsing operation. This makes for a more complex algorithm, and it is therefore done on a best-effort basis.
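As an illustration of what dynamic schema detection involves, the sketch below uses Python's built-in sqlite3 module to read a table's declared columns and types; a parser can then branch on which fields are actually present. This is a simplified, hypothetical sketch, not the csp tool's actual implementation.

```python
import sqlite3

def table_schema(db_path, table):
    """Return {column_name: declared_type} for a table, or None if the
    table is absent from the database."""
    con = sqlite3.connect(db_path)
    try:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = con.execute("PRAGMA table_info(%s)" % table).fetchall()
    finally:
        con.close()
    if not cols:
        return None
    return {row[1]: row[2] for row in cols}
```

A parser built on this idea can, for example, check whether a given Chrome version's urls table carries a particular column before deciding how to interpret each record.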

In addition to the auto-schema detection, the csp tool offers three ways to parse a target database. (1) The first makes use of a standard SQL (Structured Query Language) parse. This is internal to the tool, so no SQL knowledge is required of the user. For this option, the SQLite library was statically linked into the tool, so no outside third-party libraries are required to run it. (2) The second approach tells the tool to traverse the internal SQLite structures and parse out the records as they are encountered. This option does not use the standard SQLite library, but uses TZWorks' internally designed libraries. The benefit of 'rolling your own' library is multi-faceted: not only does it allow the tool to pull records from a corrupted database, but one can record the exact offset where the data was found, should one desire to validate it later with a hex editor. (3) The third, and final, approach uses a signature-based parse. This turns out to be a unique way of parsing records, and the beauty of it is that it makes no use of the internal SQLite table structures. Consequently, this latter option allows one to parse records from a blob of data, whether from memory or from a fragment of a database. All three approaches were left in the tool for the analyst to use, since each option has benefits (and disadvantages) depending on the situation.
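For readers who want to reproduce the first (SQL Select) approach outside the tool, the sketch below issues a simple SELECT against a Chrome History database using Python's built-in sqlite3 module. The table and column names (urls, last_visit_time, etc.) reflect the commonly documented Chrome History schema and may differ across Chrome versions.

```python
import sqlite3

def dump_history_urls(db_path):
    """Issue a simple SELECT against a Chrome 'History' database.
    The file is opened read-only so no journal/WAL files are written
    to the evidence copy."""
    con = sqlite3.connect("file:%s?mode=ro" % db_path, uri=True)
    try:
        return con.execute(
            "SELECT url, title, visit_count, last_visit_time FROM urls"
        ).fetchall()
    finally:
        con.close()
```

Note this approach inherits the same limitation as any SQL-based parse: it requires an intact database and sees only live (allocated) records.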


How to use this Tool (top)

Chrome artifacts are located in the user's directory. This varies depending on the operating system used. Below is a table that breaks out the location by OS.

OS Path
Win XP C:\Documents and Settings\<username>\Local Settings\Application Data\Google\Chrome\User Data\Default
Post Win XP C:\Users\<username>\AppData\Local\Google\Chrome\User Data\Default
OSX /Users/{user}/Library/Application Support/Google/Chrome/Default
Linux /home/{user}/.config/google-chrome/Default
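To locate the default profile programmatically, one could mirror the table above in a few lines of Python. This is a convenience sketch only: the Win XP layout is omitted, and the LOCALAPPDATA environment variable is assumed to be set on modern Windows.

```python
import os
import sys

def chrome_profile_dir():
    """Return the default Chrome profile directory for the current OS,
    mirroring the path table above (the Win XP layout is omitted)."""
    home = os.path.expanduser("~")
    if sys.platform.startswith("win"):
        # LOCALAPPDATA is assumed to be set on post-XP Windows
        return os.path.join(os.environ.get("LOCALAPPDATA", ""),
                            "Google", "Chrome", "User Data", "Default")
    if sys.platform == "darwin":
        return os.path.join(home, "Library", "Application Support",
                            "Google", "Chrome", "Default")
    return os.path.join(home, ".config", "google-chrome", "Default")
```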

Running this tool just requires one to use the -db option and pass in the path/file of the Chrome SQLite database to parse. The usage listing below shows all the options available. Details of each option can be found in the "List of options" section below.

    Usage

      csp64 -db <Chromium db> [options]
      dir <location chrome db> /b /s | csp64 -pipe [options]
      csp64 -enumdir <location chrome dbs> -num_subdirs <#> [options]

     Basic options
      -csv                            = output in CSV format
      -csvl2t                         = log2timeline output
      -bodyfile                       = sleuthkit output

     Additional options
      -username <name>                = for -csvl2t output
      -hostname <name>                = for -csvl2t output
      -csv_separator "|"              = use a pipe char for csv separator
      -dateformat mm/dd/yyyy          = "yyyy-mm-dd" is the default
      -timeformat hh:mm:ss            = "hh:mm:ss" is the default
      -no_whitespace                  = remove whitespace around csv delimiter
      -pipe                           = pipe db's into tool for processing
      -quiet                          = no progress shown

     Experimental options
      -carve [-incl_slack]            = bypass SQLite lib during parse
      -parse_chunk                    = requires at least 1 db page

     Testing options
      -no_table_merge                 = output without merging tables
      -verify [-add_comments]         = generate stats on parsing [temporary]

Below is an example of running the tool in its simplest form. Without explicitly setting any options, the tool will default to the SQL Select-type parser. The parsed output will dump to the screen, unless one redirects the output to a file.

    c:\> csp64 -db c:\dump\History -csv > out.csv

To process multiple databases one would use the -pipe option while enumerating a folder and subfolder of databases, like so:

    c:\> dir c:\dump\chrome_dbs /b /s /a | csp64 -pipe -carve -csvl2t > out.csv

In the above example, we also added the -carve option, to bypass the normal SQLite libraries and use the TZWorks internal libs to parse the database structure and extract records.


Integrated Parsing Algorithms (top)

The csp tool offers three possible parsing algorithms to choose from. They are as follows:
  1. Default option. Uses the internal SQLite library that is statically linked into the tool to perform a SQL-Select statement on the database under analysis. This option is sensitive to corrupt databases and will not parse out records from unused or slack space.
  2. Carve option (-carve). This uses a TZWorks-based set of algorithms to traverse the SQLite data structures and parse the records in the database. It relies on the database's schema and internal tree-based structures to find the data. This option appears to work fine even if the database cannot be opened via the standard SQLite library. When corruption is present, this option tries to skip bad records and move on to the next one. With the -incl_slack option, it also looks at unused space (or free pages) for any records that may be present.
  3. Signature-based option (-parse_chunk). This makes no use of the SQLite schema or tree-based structures in the database to locate records. Instead, it looks for certain signatures to locate records and parse them. Empirical testing has shown this approach works on a good database, a corrupted database, or a partial blob of a database. While this option can pull valid records, it truncates the data when a record spans multiple SQLite pages. Any records that are truncated are annotated in the output with a flag identifying them as such.
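To give a feel for what a signature-based parse keys on: in the published SQLite file format, every record begins with a header of variable-length integers (varints) that encode the serial type of each column, and decoding those varints is the core primitive of any carving or signature approach. Below is a sketch of a varint decoder per the SQLite file-format documentation; it illustrates the concept and is not the csp tool's code.

```python
def read_varint(buf, offset):
    """Decode a SQLite file-format varint starting at buf[offset].
    Returns (value, bytes_consumed). Varints are big-endian, 7 bits
    per byte with the high bit as a continuation flag; the 9th byte,
    if reached, contributes all 8 of its bits."""
    value = 0
    for i in range(8):
        byte = buf[offset + i]
        value = (value << 7) | (byte & 0x7F)
        if not (byte & 0x80):
            return value, i + 1
    # 9th byte: all 8 bits are significant
    value = (value << 8) | buf[offset + 8]
    return value, 9
```

A signature scan walks candidate record headers with this decoder, checking that the resulting serial types form a plausible column sequence before accepting the bytes as a record.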

Algorithms and their Pros/Cons

The benefit of the default option is its usefulness for verification and validation purposes. Since the format of the output records generated by the three parsing engines is the same, one can easily verify whether the carve option and/or signature-based option works, simply by comparing the results to that of the default SQL-Select option.

In most cases, the carve option (-carve) is a better choice over the default option, simply because it returns the same, if not more, results. The carve parsing option also incorporates logic to sense unused space and, when the -incl_slack option is used, switches to a signature-based scan for those areas.

Surprisingly, the signature-based option (-parse_chunk) competes very well with the other two options. Keep in mind, this option relies strictly on the accuracy of unique signatures for its success. Whereas the other two options can dynamically adjust their parsing engine based on the schema identified in the database, the signature-based option cannot. While the signature-based option sometimes extracts more records than the other options, one is cautioned that more records do not necessarily mean accurate data. If, for example, one passes in a file that contains the contents of a disk volume, with the intent of extracting all the Chrome artifacts from that image, then the results could be disappointing in that one may get many false positives on certain table records. The csp tool does a good job, statistically, at pulling out entries from tables that have many fields versus those tables that only have a few fields. Therefore, certain table entries will have fewer false positives than others.

The other issue to consider with the signature-based option is that merging data from one table into another (based on some relationship between the tables) may or may not make sense. For example, if a timestamp from one table is merged with data from another table, and the data is not in sync (from a chronological point of view), then the resulting merged record will mislead the investigator as to when an event occurred. The other pitfall with signature-based scans, mentioned earlier, is that this approach will truncate the data if a record overflows into multiple database pages; the signature-based scan will only report on data found in the initial page.

To handle the data accuracy issue, refer to the section on "Merging of Data between Tables" in the user's guide. Finally, on a positive note, the signature-based parse is the only choice when analyzing partial chunks of database fragments, whether from memory or disk images.
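When manually sanity-checking whether merged timestamps are chronologically in sync, it helps to know that Chrome's History and Cookies databases store times as microseconds since 1601-01-01 UTC (the WebKit epoch). A small Python helper for that conversion:

```python
from datetime import datetime, timedelta, timezone

WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def webkit_to_utc(us):
    """Convert a WebKit/Chrome timestamp (microseconds since
    1601-01-01 UTC) to a Python datetime. A zero/absent value is
    treated as 'no timestamp' and returns None."""
    if not us:
        return None
    return WEBKIT_EPOCH + timedelta(microseconds=us)
```

Converting suspect timestamps from two merged tables this way makes it easy to spot records whose times are chronologically out of step.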


Type Designations used in the Output (top)

In the output there is a column that identifies records by a type. These types are listed below along with where the data comes from.

Record Type   Table(s) where the data resides                    Database where the table(s) reside
Autofill      autofill                                           Web Data
Cookie        cookies                                            Cookies
Download      downloads, download_url_chains                     History
Favicon       favicons, icon_mapping, favicon_bitmaps            Favicons
Login         logins                                             Login Data
Shortcut      omni_box_shortcuts                                 Shortcuts
Thumbnail     thumbnails                                         Top Sites
Url           urls, visits, keyword_search_terms, visit_source   History
Bounces       bounces                                            DIPS
nel_policies  nel_policies                                       Reporting and NEL

In addition to the record types shown above, there are some cases where the type is supplemented with some extra words.


Verification and Validation (top)

All tools need to do some form of verification to ensure their results are accurate. Part of that testing is to validate the tool's functionality across different artifact versions. If the tool developer can automate this testing, then it allows the developer to test the tool across many datasets quickly. This in turn quickly identifies inconsistencies and problems so a wide range of bugs can be diagnosed and fixed.

Normally, the developer tries to do as much of this testing as possible before sending a tool out to clients. In the case of Chrome, however, since it has a history of changing the schemas across versions such that they are not backwards compatible, we decided to temporarily add an option for clients to run this type of verification on their own, if they so choose. To this end, the csp tool incorporates the -verify option for this purpose.

The -verify option internally invokes all three parsing engines in sequence to parse the same database so it can compare the results of all three. Simplistically, if all the results match, then the confidence is very high that the tool is working as designed. If the results do not match, it will be because a version of Chrome is being analyzed where the tool may work with one of the parsing engines, but not the others. The parsing engine most likely to have problems will be the signature-based one, since it is more sensitive to schema changes. In contrast, the default SQL-Select type parsing engine should be the most robust against schema changes, because it keys off of specific field names, which typically are more consistent across versions. Either way, the purpose of the -verify option is an internal test to alert the user if any issues are found.
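Conceptually, this cross-engine check can be pictured as a set comparison of normalized output records, with the SQL Select results as the baseline. The sketch below illustrates that idea only; the engine names and record tuples are hypothetical, and this is not the tool's internal logic.

```python
def compare_engines(results):
    """Compare per-engine record sets. 'results' maps an engine name
    (e.g. 'sql_select', 'carve', 'sig') to an iterable of normalized
    record tuples. Returns (passed, report), where a run passes only
    if no engine is missing records relative to the baseline."""
    sets = {name: set(recs) for name, recs in results.items()}
    baseline = sets.get("sql_select", set())
    report = {}
    for name, recs in sets.items():
        report[name] = {
            "count": len(recs),
            "missing_vs_baseline": len(baseline - recs),
            "extra_vs_baseline": len(recs - baseline),
        }
    passed = all(r["missing_vs_baseline"] == 0 for r in report.values())
    return passed, report
```

Extra records from the carve or signature engines are expected (slack/free-page hits), which is why only missing records count against the verdict in this sketch.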

The nice thing about the way this option was implemented is that not only does it check the internal parsers against one another, but it also outputs critical diagnostic data that can be used by TZWorks to help improve the tool. To ensure no personal information is outputted, the -verify option sanitizes the results so they contain no, or unintelligible, private/confidential information from the raw artifact. The output only contains metadata from the SQLite internal structures. This causes the data generated to be cryptic and only useful for machine learning/statistics. An additional sub-option (-add_comments) was added to annotate the results with some additional commentary; this provides some extra information for the user on whether a test passed or failed and why.

If one wants to test an entire collection of SQLite artifacts from Chrome, one can run this option with the -pipe switch and point it at the folder/subfolders of Chrome database artifacts. The tool will process all the databases/records it finds and incorporate the results into a final report. Below is an example of doing this.

    e:\> dir e:\testcase\sqlite\google\tests /b /s | csp64 -pipe -verify -add_comments > results.txt

As mentioned earlier, the data produced is mostly cryptic since it contains statistical information about the database and records being parsed. This statistical information, if sent back to TZWorks, will help us improve our parsing engines for future releases.


List of options (top)

Option Description
-db Specifies which database file to act on. The format is: -db <SQLite db to parse>.
-csv Outputs the data fields delimited by commas. Since filenames can have commas, to ensure the fields are uniquely separated, any commas in the filenames get converted to spaces.
-csvl2t Outputs the data fields in accordance with the log2timeline format.
-bodyfile Outputs the data fields in accordance with the 'body-file' version 3 specified in the SleuthKit. The date/timestamp outputted to the body-file is in terms of UTC. So if using the body-file in conjunction with the mactime.pl utility, one needs to set the environment variable TZ=UTC.
-username Option is used to populate the output records with a specified username. The syntax is -username <name to use>.
-hostname Option is used to populate the output records with a specified hostname. The syntax is -hostname <name to use>.
-pipe Used to pipe files into the tool via STDIN (standard input). Each file passed in is parsed in sequence.
-enumdir Experimental. Used to process files within a folder and/or subfolders. Each file is parsed in sequence. The syntax is -enumdir <"folder"> -num_subdirs <#>.
-filter Filters data passed in via stdin via the -pipe option. The syntax is -filter <"*.ext | *partialname* | ...">. The wildcard character '*' is restricted to either before the name or after the name.
-no_whitespace Used in conjunction with -csv option to remove any whitespace between the field value and the CSV separator.
-csv_separator Used in conjunction with the -csv option to change the CSV separator from the default comma to something else. Syntax is -csv_separator "|" to change the CSV separator to the pipe character. To use the tab as a separator, one can use the -csv_separator "tab" OR -csv_separator "\t" options.
-dateformat Output the date using the specified format. Default behavior is -dateformat "yyyy-mm-dd". Using this option allows one to adjust the format to mm/dd/yy, dd/mm/yy, etc. The restriction with this option is the forward slash (/) or dash (-) symbol needs to separate month, day and year and the month is in digit (1-12) form versus abbreviated name form.
-timeformat Output the time using the specified format. Default behavior is -timeformat "hh:mm:ss.xxxxxxxxx". One can adjust the format to microseconds, via "hh:mm:ss.xxxxxx", or milliseconds, via "hh:mm:ss.xxx", or no fractional seconds, via "hh:mm:ss". The restrictions with this option are that a colon (:) symbol needs to separate hours, minutes and seconds; a period (.) symbol needs to separate the seconds and fractional seconds; and the repeating symbol 'x' is used to represent the number of fractional seconds.
-quiet Show no progress during the parsing operation
-carve Experimental option. Bypass the SQLite embedded library and parse using TZWorks internal algorithms. This is for the situation where the database to be parsed is corrupted and the SQLite library has trouble parsing it.
-incl_slack Experimental option to look at unused space to see if any records are present. Not required with the -parse_chunk option. Use this in conjunction with the -carve option to look for deleted records.
-parse_chunk Experimental option. Given a portion (chunk) of the database this option will examine the data to see if any records exist and parse out the contents. This is a signature-based parse so it can parse out records from chunks of memory or slack space (in the form of a file).
-no_table_merge This option is for pulling records from an image. It is also used for testing and debugging purposes. If you want to see all the tables that were parsed without merging any relationships, use this option.
-verify This option is for testing and debugging purposes only. This option runs all 3 parsing engines in the tool (SQL Select parse, Carve parse and Signature-based parse) and reports whether the parsers work at least up to the level of the SQL Select parse. Metadata is generated that can be used to help develop more robust parsing algorithms.
-utf8_bom All output is in Unicode UTF-8 format. If desired, one can prefix a UTF-8 byte order mark to the CSV output using this option.

Authentication and License File (top)

This tool has authentication built into the binary. The primary authentication mechanism is the digital X509 code signing certificate embedded into the binary (Windows and macOS).

The other mechanism is the runtime authentication, which applies to all the versions of the tools (Windows, Linux and macOS). The runtime authentication ensures that the tool has a valid license. The license needs to be in the same directory of the tool for it to authenticate. Furthermore, any modification to the license, either to its name or contents, will invalidate the license.

Limited versus Demo versus Full in the tool's output banner

The tools from TZWorks will output header information about the tool's version and whether it is running in limited, demo or full mode. This is directly related to what version of a license the tool authenticates with. The limited and demo keywords indicate some functionality of the tool is not available, and the full keyword indicates all the functionality is available. The lacking functionality in the limited or demo versions may mean one or all of the following: (a) certain options may not be available, (b) certain data may not be outputted in the parsed results, and (c) the license has a finite lifetime before expiring.


Version history (top)


References (top)

  1. https://cs.chromium.org/chromium/src/docs
  2. https://chromium.googlesource.com/chromium/chromium/+/df261d32079bc4e1160c36200657eed26fad5961/content/public/common/page_transition_types_list.h
  3. https://developers.google.com/analytics/devguides/collection/analyticsjs/cookie-usage?csw=1
  4. Evolution of Chrome Databases, by Ryan Benson, https://dfir.blog/chrome-evolution
  5. Hindsight, by Ryan Benson, https://www.obsidianforensics.com/hindsight
  6. SQLite library statically linked into tool [Amalgamation of many separate C source files from SQLite version 3.32.3].
  7. SQLite documentation [http://www.sqlite.org]