TZWorks LLC
System Analysis and Programming
www.tzworks.com


TZWorks®
$MFT and $LogFile Analysis - mala

(Version 0.24)



Information about our End User's License Agreements (EULAs)
for software on TZWorks, LLC Website www.tzworks.com

User Agreement

TZWorks LLC software and related documentation ("Software") is governed by separate licenses issued from TZWorks LLC. The User Agreement, Disclaimer, and/or Software may change from time to time. By continuing to use the Software after those changes become effective, you agree to be bound by all such changes. Permission to use the Software is granted provided that (1) use of such Software is in accordance with the license issued to you and (2) the Software is not resold, transferred or distributed to any other person or entity. Refer to the specific EULA issued to you for the terms and conditions. There are 3 types of licenses available: (i) for educational purposes, (ii) for demonstration and testing purposes and (iii) business and/or commercial purposes. Contact TZWorks LLC (info@tzworks.com) for more information regarding licensing and/or to obtain a license. To redistribute the Software, prior approval in writing is required from TZWorks LLC. The terms in your specific EULA do not give the user any rights in intellectual property or technology, but only a limited right to use the Software in accordance with the license issued to you. TZWorks LLC retains all rights to ownership of this Software.

Export Regulation

The Software is subject to U.S. export control laws, including the U.S. Export Administration Act and its associated regulations. The Export Control Classification Number (ECCN) for the Software is 5D002, subparagraph C.1. The user shall not, directly or indirectly, export, re-export or release the Software to, or make the Software accessible from, any jurisdiction or country to which export, re-export or release is prohibited by law, rule or regulation. The user shall comply with all applicable U.S. federal laws, regulations and rules, and complete all required undertakings (including obtaining any necessary export license or other governmental approval), prior to exporting, re-exporting, releasing, or otherwise making the Software available outside the U.S.

Disclaimer

The user agrees that this Software made available by TZWorks LLC is experimental in nature and use of the Software is at user's sole risk. The Software could include technical inaccuracies or errors. Changes are periodically added to the information herein, and TZWorks LLC may make improvements and/or changes to Software and related documentation at any time. TZWorks LLC makes no representations about the accuracy or usability of the Software for any purpose.

ALL SOFTWARE ARE PROVIDED "AS IS" AND "WHERE IS" WITHOUT WARRANTY OF ANY KIND INCLUDING ALL IMPLIED WARRANTIES AND CONDITIONS OF MERCHANTABILITY, FITNESS FOR ANY PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL TZWORKS LLC BE LIABLE FOR ANY KIND OF DAMAGE RESULTING FROM ANY CAUSE OR REASON, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF INFORMATION AVAILABLE FROM THIS SOFTWARE, INCLUDING BUT NOT LIMITED TO ANY DAMAGES FROM ANY INACCURACIES, ERRORS, OR VIRUSES, FROM OR DURING THE USE OF THE SOFTWARE.

Removal

The Software is the original work of TZWorks LLC. However, to be in compliance with the Digital Millennium Copyright Act of 1998 ("DMCA"), we agree to investigate and disable any material found to infringe copyright. Contact TZWorks LLC at email address: info@tzworks.com regarding any DMCA concerns.


Introduction (top)

The Windows NTFS file system has a transactional architecture that is used to ensure that the operating system can recover from a crash into a known good state. Aside from the NTFS file system kernel driver itself failing, Windows does a good job of maintaining data consistency after critical failures that cause the system to shut down unexpectedly.

To achieve this level of reliability, Windows NTFS employs a journaling technique that records a sequence of file changes in the $LogFile. After the sequence of operations is completed, the operating system commits the changes and the transaction is done. In this way, if the system should crash prior to a transaction being committed to disk, the system can read the sequence of changes from the $LogFile and then perform (if necessary) any 'undo' operations to get the system into a known good, stable state.
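The redo/undo recovery described above can be sketched as a toy model. This is a simplified illustration of write-ahead journaling under assumed record structures, not NTFS's actual $LogFile on-disk format:

```python
# Illustrative model of write-ahead journaling with redo/undo recovery.
# This is a simplified sketch, not NTFS's actual $LogFile record format.
from dataclasses import dataclass

@dataclass
class LogRecord:
    lsn: int       # log sequence number
    txn_id: int    # transaction this operation belongs to
    redo: dict     # state to apply when rolling forward
    undo: dict     # state to apply when rolling back

def recover(log, committed):
    """Repeat history (redo all), then undo uncommitted transactions."""
    state = {}
    for rec in sorted(log, key=lambda r: r.lsn):
        state.update(rec.redo)              # forward pass: redo everything
    for rec in sorted(log, key=lambda r: r.lsn, reverse=True):
        if rec.txn_id not in committed:
            state.update(rec.undo)          # backward pass: undo losers
    return state
```

A transaction whose commit never made it to disk has its undo data applied in reverse LSN order, leaving the volume in a known good state.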

From a forensic standpoint, analyzing the $LogFile can yield a chronological list of historical transactions that were done. The $LogFile is a fixed size, so once it is filled, additional data wraps around and the old data is overwritten with new transactions. Depending on the frequency of file changes made on a system, the number of historical transactions will vary. The size of the $LogFile is typically 64 MB for a volume; however, it can be resized based on need. Using the standard default size and normal usage, one should expect a few hours of activity recorded in a $LogFile. This time estimate is highly subjective and will vary depending on the frequency of the file system changes.
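The "few hours" figure above can be reasoned about with a back-of-the-envelope calculation. The average record size and record rate below are purely hypothetical assumptions; actual values vary widely between systems:

```python
# Back-of-the-envelope estimate of how much history a default-sized
# $LogFile can hold before it wraps. The average record size and the
# record rate are hypothetical assumptions, not measured values.
logfile_size = 64 * 1024 * 1024        # default 64 MB $LogFile
avg_record_size = 1024                 # assumed average bytes per record
records_per_hour = 20_000              # assumed workload

record_capacity = logfile_size // avg_record_size
hours_of_history = record_capacity / records_per_hour
print(f"~{record_capacity} records, roughly {hours_of_history:.1f} hours")
```

Under these assumed numbers the journal holds roughly a few hours of activity, consistent with the estimate above.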

To determine the size of the $LogFile on a volume, type the following:

    > chkdsk /L

To adjust the size of the $LogFile on a volume, use the following command. The example resizes the $LogFile on the c: volume to 128 MB; the value is specified in KB:

    > chkdsk c: /L:131072

When looking at what NTFS calls a transaction and how it translates to the records in the $LogFile, one sees that there are multiple operations that occur in sequence, and when combined together, are labeled a transaction. mala looks at all the records/operations and pieces together which records are chained together to form a single transaction.
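The chaining described above can be sketched as follows. This is a simplified model of the grouping a tool like mala performs; the record layout (a plain mapping of LSN to previous LSN) is illustrative, not the real on-disk structure:

```python
# Sketch of chaining individual $LogFile operation records into
# transactions by following a previous-LSN link. The field layout here
# is illustrative, not the actual $LogFile record structure.
def group_transactions(records):
    """records: dict mapping lsn -> prev_lsn (0 marks a chain start).
    Returns one list of LSNs per transaction, oldest operation first."""
    referenced = {prev for prev in records.values() if prev}
    tails = [lsn for lsn in records if lsn not in referenced]
    chains = []
    for tail in sorted(tails):
        chain = []
        lsn = tail
        while lsn:                      # walk backwards through the chain
            chain.append(lsn)
            lsn = records.get(lsn, 0)
        chains.append(chain[::-1])      # reverse to oldest-first order
    return chains
```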

Unlike other artifacts used in forensics, the $LogFile does not have a timestamp embedded in its normal record data structures. This makes it difficult for the forensic analyst to correlate when a transaction occurred. However, one can infer time by looking at the records in a transaction that contain a timestamp within their payload data. One can also infer time by looking at the inode records whose stored logical sequence numbers match those of the $LogFile records.

For example, if any of the records in a sequence contains a payload of $UsnJrnl change log data, then one can parse that payload and pull the timestamp embedded in the change log record. This is possible because the $UsnJrnl change log data has a timestamp as part of its metadata. If this metadata is contained within a $LogFile record, then it can be extracted and parsed, giving an inferred time for the $LogFile transaction. Further, if one parses the $MFT file records in parallel with the $LogFile records, and there happens to be an entry in the $MFT file that matches the $LogFile record in question, one can pull the latest timestamp from the $MFT attributes. In this way, one can estimate the time the log entry was made.
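The $MFT side of this correlation is possible because each $MFT FILE record header stores the $LogFile sequence number of the last log record that touched it. A minimal sketch of pulling that value, using the documented FILE record layout ('FILE' magic at offset 0, 64-bit LSN at offset 8):

```python
import struct

# Minimal sketch: read the $LogFile sequence number (LSN) stored in an
# NTFS FILE record header (an $MFT entry). Matching this value against
# $LogFile record LSNs is one way to correlate the two artifacts.
def mft_record_lsn(record: bytes) -> int:
    if record[:4] != b"FILE":
        raise ValueError("not a FILE record")
    (lsn,) = struct.unpack_from("<Q", record, 8)  # 64-bit LSN at offset 8
    return lsn
```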

To aid the analyst in this inference of time, mala tries to do this on a best-effort basis and reports an 'extrapolated timestamp' for each record. Internally, mala keeps track of the last time that was reported (either via a $UsnJrnl entry or a $MFT entry) and reports that time for the next log sequence number (LSN) record. This extrapolated time is just a guess or estimate, even though the precision is shown at 100-nanosecond resolution. Sometimes this estimate is very accurate, such as when a change log entry was just recorded, and sometimes it is not accurate at all. The latter case happens when a large number of log records have passed without a timestamp being observed during the parse operation. Once a timestamp is found, the accuracy is again good until the next timestamp is found. As an indicator to the analyst, the more closely aligned timestamps are shown in the 'ref' field of the report.
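The carry-forward idea can be sketched in a few lines. The record shape here is illustrative, not mala's internal structure:

```python
# Sketch of the 'extrapolated timestamp' idea: carry the last explicitly
# observed timestamp forward and stamp it on records that lack one.
# The (lsn, timestamp) tuple shape is illustrative only.
def extrapolate(records):
    """records: iterable of (lsn, timestamp_or_None) pairs.
    Yields (lsn, best_guess_timestamp, is_explicit)."""
    last_ts = None
    for lsn, ts in records:
        if ts is not None:
            last_ts = ts
            yield lsn, last_ts, True    # explicit: closely aligned ('ref')
        else:
            yield lsn, last_ts, False   # estimate carried forward
```

Records emitted before any explicit timestamp has been seen have no estimate at all, which mirrors the low-accuracy case described above.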


How to use this Tool (top)

The current options for the mala tool are shown below:
    Usage
      mala -log <logfile> [-mftfile <mft> [-showall_inodes]] [other options]
       -mftfile <file>        = [optional] use specified MFT for analysis
       -showall_inodes        = [optional] used with -mftfile - show all inodes

     Basic options
      -csv                    = output in CSV format

     Additional options
      -base10                 = use base10 for file size instead of hex
      -dateformat mm/dd/yyyy  = "yyyy-mm-dd" is the default
      -timeformat hh:mm:ss    = "hh:mm:ss" is the default
      -csv_separator "|"      = use a pipe char for csv separator
      -no_whitespace          = remove whitespace around csv delimiter
      -quiet                  = dont show any progress
 

The required syntax is to pass in a $LogFile via the -log <file> option as shown below. The syntax below shows an -out <file> parameter, but one can redirect the output to any file as well. As a side note, if you are using Windows PowerShell instead of a command prompt, we recommend using single quotes around any path/filename that contains a '$' character.

mala64 -log 'c:\test\$LogFile' -csv -out results.csv

To gain more context information, one can also pass in the companion $MFT file via the -mftfile <file> option. The tool will merge the parsed $MFT data into the $LogFile data to generate more complete records.

mala64 -log 'c:\test\$LogFile' -mftfile 'c:\test\$MFT' -csv -out results.csv

When multiple artifact files are used in the analysis (e.g., using both the $LogFile and $MFT files), mala will spawn multiple threads to handle each artifact in parallel so as to parse the data quickly. The resulting output from each thread is then combined into one output file.
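The parallel-parse-then-merge pattern can be sketched as follows. The parser callables are hypothetical stand-ins, not mala's real internals:

```python
# Sketch of parsing multiple artifact files in parallel and merging the
# results, similar in spirit to the threading described above. The
# parser functions passed in are hypothetical stand-ins.
from concurrent.futures import ThreadPoolExecutor

def parse_artifacts(parsers):
    """parsers: dict mapping artifact name -> zero-argument parse function.
    Runs each parser in its own thread and returns {name: parsed result}."""
    with ThreadPoolExecutor(max_workers=len(parsers)) as pool:
        futures = {name: pool.submit(fn) for name, fn in parsers.items()}
        return {name: fut.result() for name, fut in futures.items()}
```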

Parsing with only the $LogFile for analysis

While it is preferable to use both the $LogFile and $MFT artifact files to maximize the usefulness of the report generated by mala, some use-cases only contain corrupted or partial $MFT files. If the corruption of the $MFT file is sufficient to prevent mala from parsing it, then running mala with just the $LogFile still provides useful, albeit degraded, results. To explain this in more detail, one needs to analyze the differences in the reporting when running mala with just the $LogFile versus running it in conjunction with the $MFT file. The main issues when the $MFT file is not used in the analysis are: (a) the path for the target file being created, changed or deleted is not present; and (b) any logical sequence number matches between the $LogFile records and $MFT records are not available, which results in less context data in the output. Other artifact data is lost as well, but the items listed above are the main ones.

Aside from these issues, parsing the $LogFile by itself is still usable for those cases where the $MFT file is not present. Why? The $LogFile has the $UsnJrnl:$J change log entries embedded in its records. More specifically, since the change log journal is itself a file, the $LogFile has an entry for each change recorded in the $UsnJrnl:$J as part of a transaction. Since $UsnJrnl:$J entries are preserved in the $LogFile records, one can extract and parse these records, which contain meaningful data, even though the path of the file is still missing.
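Because the embedded change log entries follow the USN_RECORD_V2 layout that Microsoft documents, pulling the timestamp out of one is straightforward. A sketch, using the documented offsets (WORD major version at offset 4, 64-bit FILETIME at offset 32):

```python
import struct
from datetime import datetime, timedelta, timezone

# Sketch of extracting the timestamp from a $UsnJrnl:$J change-log entry.
# Offsets follow the USN_RECORD_V2 layout documented by Microsoft; the
# TimeStamp field is a FILETIME (100-ns intervals since 1601-01-01 UTC).
def usn_v2_timestamp(record: bytes) -> datetime:
    (major,) = struct.unpack_from("<H", record, 4)
    if major != 2:
        raise ValueError("not a version 2 USN record")
    (filetime,) = struct.unpack_from("<Q", record, 32)
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=filetime // 10)
```

The 100-ns FILETIME units are what give the 'extrapolated timestamp' field its nominal resolution.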

Reporting

The output generated by mala is in a delimited format, where the delimiter can be either a comma (CSV format), pipe, or tab character. To limit the number of fields and provide uniformity across different operations, the last field is a quasi-json format that allows the tool to use a condensed notation and be extensible so as to allow for an unlimited combination of data types. In this way, dissimilar data can be concisely put into a format that is easily digested by a spreadsheet program (like Excel) or a database. Most of the data put into this 'catch-all' column is the payload data associated with the operation, as well as any supporting information provided by the $MFT file (if available).

Below are the delimited fields that are included in the reporting:

Field                   Meaning

extrapolated_timestamp  Internally, mala keeps track of the last time that was reported (either via a $UsnJrnl entry or a $MFT entry) and reports that time for the next log sequence number (LSN) record. This extrapolated time is just a guess or estimate.
ref                     Indicator that the time was updated based on explicit timestamp data in the payload or a referenced $MFT record.
change_reason           Relates to the $UsnJrnl entry embedded in one of the operations associated with that transaction; if the $UsnJrnl entry is not available, it is derived from the type of operation.
lsn                     Log sequence number.
type                    Specifies whether it is the start of the transaction or one of the operations in the transaction.
op_pattern              Operation code (or sequence of codes if the initial start of the transaction).
redo_op                 The translated operation name for the redo operation (the undo operation is not shown).
target_lcn              Logical cluster number of the target that is affected.
inode                   MFT record entry. Either (a) explicitly listed in the record or (b) computed based on the offset, cluster size, and MFT record size.
inode_seqnum            MFT record entry sequence number. From data that explicitly listed the sequence number.
parent_inode            Parent MFT record entry. From data that explicitly listed this.
parent_seqnum           Parent MFT record entry sequence number. From data that explicitly listed this.
path                    Relies on the $MFT file to build the absolute path.
comment                 General purpose field that displays the parsed payload data and/or other data from support files.
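The inode computation mentioned for the inode field can be sketched as simple arithmetic: the record's byte position within the $MFT divided by the MFT record size. The cluster and record sizes below are typical NTFS defaults and are assumptions:

```python
# Sketch of computing an inode when it is not listed explicitly in a log
# record: byte position within the $MFT divided by the MFT record size.
# The default cluster size (4 KB) and MFT record size (1 KB) are typical
# NTFS values and are assumptions here.
def compute_inode(mft_vcn, offset_in_cluster,
                  cluster_size=4096, mft_record_size=1024):
    """mft_vcn: cluster number within the $MFT's data run;
    offset_in_cluster: byte offset of the record inside that cluster."""
    return (mft_vcn * cluster_size + offset_in_cluster) // mft_record_size
```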

For the quasi-json formatted data, there are some keywords used. The main ones are listed below. The purpose of using keywords is to try to group like-data segments from various sources so as to allow one to have more insight as to where they came from. For example, any data that comes from a $MFT supporting file, will be preceded by the 'mftfile' keyword. Likewise, if the data came from a $UsnJrnl:$J data, it would be preceded by the 'usnjrnl' keyword. Sometimes the payload data is truncated, which can be derived when the size of the payload disagrees with the actual number of bytes left before the start of the next log record. For these cases, the 'data_truncated' keyword is used and the data is parsed to the extent possible given that it was truncated.
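A downstream consumer of the report could split this catch-all column back into its keyword-tagged segments. The exact notation mala emits is not reproduced here; this sketch assumes a simple 'keyword: data; keyword: data' layout for illustration:

```python
# Illustrative parser for a condensed, keyword-tagged catch-all column.
# The 'keyword: data; keyword: data' segment layout is an assumption,
# not mala's documented output syntax.
def split_comment(comment):
    out = {}
    for segment in comment.split(";"):
        key, sep, value = segment.partition(":")
        if sep:                          # keep only keyword-tagged segments
            out[key.strip()] = value.strip()
    return out
```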

Keyword            Meaning

open_record        Relates to the payload data associated with the OpenNonresidentAttribute operation.
file_record        Relates to the payload data associated with the InitializeFileRecordSegment operation.
<mft attribute>    Keyword name is derived from one of the MFT attributes. Partial data within the payload that can affect any of the MFT attributes, including: $filename, $stdinfo, $indx_direntry, $data, etc.
usnjrnl            Contains a $UsnJrnl:$J record embedded in the payload data.
hex_bytes          Contains an unparsed series of bytes in the payload data.
cluster_run_data   Contains a cluster run embedded in the payload data.
bitmap_set         Relates to the payload data associated with the SetBitsInNonresidentBitMap operation.
set_size           Relates to the payload data associated with the SetNewAttributeSizes operation.
mftfile            Comes directly from inode data in a separate $MFT file. If listed for an operation, it directly relates to that operation's log entry.
metadata           Metadata associated with the parsing of the operation record.
data_truncated     Indicates the size recorded for the payload does not reflect the number of bytes actually present. Some payload data exists, but it is truncated, and is parsed to the extent possible.

Pulling Artifacts off a Live System

The raw artifact files used by mala (e.g., $LogFile and $MFT) are locked down when trying to access them on a running system. One solution is to use other tools to copy the appropriate artifact files. On a Windows machine, one can use the TZWorks dup (Disk Utility and Packer) tool. It allows one to copy a file, or an entire directory, even if some of the files are locked by the operating system. To use dup to collect the system files used by mala, one could use the following command:

    dup -copygroup -pull_sysfiles -out <results folder>

The above command also pulls other system files not needed by mala, but all the files used by mala will be extracted.


List of options (top)

Option Description
-log Specifies which $LogFile to act on. The format is: -log <$LogFile to parse>.
-mftfile Use the specified $MFT file for $LogFile analysis. The syntax is: -mftfile <$MFT to parse>. There is a sub-option [-showall_inodes] to display all the inodes in the output.
-csv Outputs the data fields delimited by commas. Since filenames can have commas, to ensure the fields are uniquely separated, any commas in the filenames get converted to spaces.
-base10 Ensures all size/address output is displayed in base-10 format versus hexadecimal (base-16) format. Default is hexadecimal format.
-no_whitespace Used in conjunction with -csv option to remove any whitespace between the field value and the CSV separator.
-csv_separator Used in conjunction with the -csv option to change the CSV separator from the default comma to something else. Syntax is -csv_separator "|" to change the CSV separator to the pipe character. To use the tab as a separator, one can use the -csv_separator "tab" OR -csv_separator "\t" options.
-dateformat Output the date using the specified format. Default behavior is -dateformat "yyyy-mm-dd". Using this option allows one to adjust the format to mm/dd/yy, dd/mm/yy, etc. The restriction with this option is the forward slash (/) or dash (-) symbol needs to separate month, day and year and the month is in digit (1-12) form versus abbreviated name form.
-quiet Show no progress during the parsing operation
-utf8_bom All output is in Unicode UTF-8 format. If desired, one can prefix a UTF-8 byte order mark to the CSV output using this option.

Authentication and License File (top)

This tool has authentication built into the binary. The primary authentication mechanism is the digital X509 code signing certificate embedded into the binary (Windows and macOS).

The other mechanism is the runtime authentication, which applies to all versions of the tools (Windows, Linux and macOS). The runtime authentication ensures that the tool has a valid license. The license needs to be in the same directory as the tool for it to authenticate. Furthermore, any modification to the license, either to its name or contents, will invalidate the license.

Limited versus Demo versus Full in the tool's output banner

The tools from TZWorks will output header information about the tool's version and whether it is running in limited, demo or full mode. This is directly related to what version of a license the tool authenticates with. The limited and demo keywords indicate some functionality of the tool is not available, and the full keyword indicates all the functionality is available. The lacking functionality in the limited or demo versions may mean one or all of the following: (a) certain options may not be available, (b) certain data may not be outputted in the parsed results, and (c) the license has a finite lifetime before expiring.


Version history (top)

