Sat Jan 18, 2020 9:31 pm

NAME Get-DTWFileEncoding

SYNOPSIS

Returns the encoding type of the file

SYNTAX

Get-DTWFileEncoding [-Path] <String> [[-ByteCountToCheck] <Int32>] [[-PercentageMatchUnicode] <Decimal>] [<CommonParameters>]

DESCRIPTION

Returns the encoding type of the file. It first attempts to determine the
encoding by detecting the byte order mark (BOM) using Lee Holmes' algorithm
(http://poshcode.org/2153). However, if the file does not have a BOM,
it attempts to determine the encoding by analyzing the file content
(does it 'appear' to be Unicode, does it have characters outside the ASCII
range, etc.). If it can't tell based on the content analyzed, it assumes
the file is ASCII. Note: it does not correctly detect UTF-32 BE or LE
if no BOM is present.
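The BOM step described above can be sketched roughly as follows. This is a Python illustration of the general technique, not the cmdlet's actual PowerShell code or Lee Holmes' script; the names `BOMS` and `detect_bom` are made up for this example.

```python
import codecs

# Known BOM signatures. The 4-byte UTF-32 marks are listed before the
# 2-byte UTF-16 marks because the UTF-32 LE BOM (FF FE 00 00) begins
# with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

When `detect_bom` returns `None`, a detector along these lines would fall back to content analysis, which is where the guessing (and the possible misidentification) begins.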

If your file doesn't have a BOM and 'doesn't appear to be Unicode' (based on
my algorithm*) but contains non-ASCII characters *after* index ByteCountToCheck,
the file will be incorrectly identified as ASCII. So put a BOM in there, would ya!
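To see why that happens, here is a small Python sketch of a windowed ASCII check. It assumes the check only looks at the first ByteCountToCheck bytes, as the text above says; the function name `looks_ascii` is hypothetical and the real cmdlet's logic may differ in detail.

```python
def looks_ascii(data: bytes, byte_count_to_check: int = 10000) -> bool:
    """True if no byte >= 0x80 appears in the first byte_count_to_check bytes."""
    return all(b < 0x80 for b in data[:byte_count_to_check])


# 10000 plain ASCII bytes followed by one UTF-8 encoded non-ASCII character.
sample = b"a" * 10000 + "é".encode("utf-8")

inside_window = looks_ascii(sample)             # non-ASCII bytes are never examined
whole_file = looks_ascii(sample, len(sample))   # window covers the entire content
```

With the default window the sample looks like pure ASCII; only when the window is widened to cover the whole content are the non-ASCII bytes seen.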

For more information and sample encoding files see:

http://danspowershellstuff.blogspot.com ... order.html

And please give me any tips you have about improving the detection algorithm.

*For a full description of the algorithm used to analyze non-BOM files,
see "Determine if Unicode/UTF8 with no BOM algorithm description".

PARAMETERS

-Path <String>

Path to file

Required? true

Position? 1

Default value

Accept pipeline input? true (ByValue, ByPropertyName)

Accept wildcard characters? false

-ByteCountToCheck <Int32>

Number of bytes to check. By default, the first 10000 bytes are checked.
Depending on the size of your file, this might be the entire content of your file.

Required? false

Position? 2

Default value 10000

Accept pipeline input? false

Accept wildcard characters? false

-PercentageMatchUnicode <Decimal>

If the percentage of null (0-value) bytes found is greater than or equal to
PercentageMatchUnicode, the file is identified as Unicode. Default value: 0.5 (50%).

Required? false

Position? 3

Default value 0.5

Accept pipeline input? false

Accept wildcard characters? false
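The null-byte threshold can be illustrated with a short Python sketch. It assumes the ratio is simply "null bytes divided by bytes examined" within the ByteCountToCheck window; the help text does not say exactly how the cmdlet counts, so treat `null_byte_ratio` and `appears_unicode` as hypothetical helpers, not the real implementation.

```python
def null_byte_ratio(data: bytes, byte_count_to_check: int = 10000) -> float:
    """Fraction of 0x00 bytes among the first byte_count_to_check bytes."""
    window = data[:byte_count_to_check]
    if not window:
        return 0.0
    return window.count(0) / len(window)


def appears_unicode(data: bytes,
                    byte_count_to_check: int = 10000,
                    percentage_match_unicode: float = 0.5) -> bool:
    """Apply the 'greater than or equal to' threshold described above."""
    ratio = null_byte_ratio(data, byte_count_to_check)
    return ratio >= percentage_match_unicode


utf16_text = "hello".encode("utf-16-le")  # every second byte is 0x00
plain_text = b"hello"                     # no null bytes at all
```

ASCII-range text stored as UTF-16 without a BOM has a null in every second byte, so its ratio lands right at the 0.5 default; that is why the comparison is "greater than or equal to" rather than strictly greater.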

<CommonParameters>

This cmdlet supports the common parameters: Verbose, Debug,
ErrorAction, ErrorVariable, WarningAction, WarningVariable,
OutBuffer, PipelineVariable, and OutVariable. For more information, see
about_CommonParameters (https://go.microsoft.com/fwlink/?LinkID=113216).

INPUTS

OUTPUTS

-------------------------- EXAMPLE 1 --------------------------

PS C:\>Get-DTWFileEncoding -Path .\SomeFile.ps1 1000

Attempts to determine the encoding using only the first 1000 bytes.

BodyName          : unicodeFFFE
EncodingName      : Unicode (Big-Endian)
HeaderName        : unicodeFFFE
WebName           : unicodeFFFE
WindowsCodePage   : 1200
IsBrowserDisplay  : False
IsBrowserSave     : False
IsMailNewsDisplay : False
IsMailNewsSave    : False
IsSingleByte      : False
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : True
CodePage          : 1201

RELATED LINKS
