Geo*Data:FAQ

Revision as of 01:28, 17 November 2023 by Admin


Instructions

The following unsupported and untested scripts can be run in Windows PowerShell to split the new US.txt file found in GeoDAT_202310 into individual state files.

This may be useful for GeoData users who still want the legacy format, with a separate file per state, instead of the updated all-in-one file (US.txt).

Download GeoData Update (GeoDAT_YYYYMM)

Download the GeoDAT_202310 folder and extract it to your local desktop.

Download and Extract GeoData_SplitState_Test.zip

Download and extract GeoData_SplitState_Test.zip, then save Split_All_States_20231114.ps1 and Split_By_State_Test.ps1 to the GeoDAT_202310 folder.

For example: C:\Users\Roxanne\Desktop\GeoDAT_202310\

[Image: GeoData FAQ SplitFile.png]


Open Windows PowerShell as Administrator

Open "Windows PowerShell" as Administrator.

Change Directory to the GeoDAT_YYYYMM Folder

Change the working directory to the GeoDAT_202310 folder on your desktop and press Enter.

For example:

PS C:\WINDOWS\system32> cd "C:\Users\Roxanne\Desktop\GeoDAT_202310\"

PS C:\Users\Roxanne\Desktop\GeoDAT_202310>
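
Before editing the scripts, it can help to confirm that the folder holds the files they expect. The following is a minimal sketch, not part of the downloaded scripts. It assumes US.txt and US.idx sit in a TXT subfolder, as the example paths in the scripts below suggest, and the function name Test-GeoDataFolder is hypothetical.

```powershell
# Hypothetical helper: checks that US.txt and US.idx exist under <folder>\TXT.
# Assumption: the TXT subfolder layout matches the example paths in the scripts.
function Test-GeoDataFolder {
    param([string]$GeoDataFolder = (Get-Location).Path)

    $txtFolder = Join-Path -Path $GeoDataFolder -ChildPath 'TXT'
    $allPresent = $true

    foreach ($name in 'US.txt', 'US.idx') {
        $path = Join-Path -Path $txtFolder -ChildPath $name
        if (-not (Test-Path -Path $path -PathType Leaf)) {
            Write-Warning "Missing: $path"
            $allPresent = $false
        }
    }

    return $allPresent
}
```

Running Test-GeoDataFolder with no arguments checks the current folder. It returns True only when both files are found and prints a warning for each one that is missing.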

Determine if you want to Split the US.txt file

Decide whether to split the US.txt file into all state files at once or to extract a single state file.

All States at One Time

To split the entire US.txt file into individual state files at one time, perform the following steps:

Open Split_All_States_Test.ps1 in Notepad and update the following file paths before running the script in PowerShell:

$txtFilePath

$idxFilePath

$outputFolderPath

Once the file paths are updated, save and close the file.

Type .\Split_All_States_Test.ps1 in Windows PowerShell and press Enter to split all states at once.

NOTE: This writes one file per state, as in the legacy format, and may take a while. Press CTRL+C to stop the script.
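
The three path variables listed above all point into the same TXT folder, so they can be derived from one base folder instead of being edited separately. The following is a minimal sketch under two assumptions: the script is saved in the GeoDAT_YYYYMM folder, and US.txt and US.idx are in its TXT subfolder. The $baseFolder variable is introduced here for illustration and does not appear in the downloaded scripts.

```powershell
# Assumption: this script sits in the GeoDAT_YYYYMM folder, next to the TXT subfolder.
# $PSScriptRoot is the folder of the running script; it is empty at an interactive
# prompt, so fall back to the current location in that case.
$baseFolder = if ($PSScriptRoot) { $PSScriptRoot } else { (Get-Location).Path }

# Build the three paths the split scripts expect from the single base folder
$outputFolderPath = Join-Path -Path $baseFolder -ChildPath 'TXT'
$txtFilePath      = Join-Path -Path $outputFolderPath -ChildPath 'US.txt'
$idxFilePath      = Join-Path -Path $outputFolderPath -ChildPath 'US.idx'
```

With this in place of the three hard-coded path lines, the script could be copied to a new GeoDAT_YYYYMM folder without further edits.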

File Contents: Split_All_States_Test.ps1

# Split GeoData - "US.txt" file by All States at once using the "US.idx" file.

# PLEASE NOTE:
# This script has NOT been tested and is NOT supported.
# This script may take some time. If using Windows PowerShell, press Ctrl+C to stop the script.

# The following paths must be updated before running:
	# $txtFilePath
	# $idxFilePath
	# $outputFolderPath 

# Last Updated: 2023-11-14 

# Define the paths to the US.txt file and the US.idx file
$txtFilePath = "C:\Users\Roxanne\Desktop\GeoDAT_202310\TXT\US.txt"
$idxFilePath = "C:\Users\Roxanne\Desktop\GeoDAT_202310\TXT\US.idx"
$outputFolderPath = "C:\Users\Roxanne\Desktop\GeoDAT_202310\TXT"
 
# Read the index file
$indexLines = Get-Content $idxFilePath
 
# Create the output folder if it doesn't exist
if (!(Test-Path -Path $outputFolderPath -PathType Container)) {
    New-Item -Path $outputFolderPath -ItemType Directory
}
 
# Initialize variable to keep track of the cumulative record count
$recordOffset = 0
 
foreach ($indexLine in $indexLines) {
    # Parse the index line to get the state abbreviation and count
    $stateAbbreviation, $stateFIPS, $count = $indexLine -split ','
 
    # Convert the count to an integer
    $count = [int]$count
 
    if ($stateAbbreviation -and $count -gt 0) {
        # Create a new output file with the state abbreviation
        $outputFileName = "${stateAbbreviation}.txt"
        $outputFilePath = Join-Path -Path $outputFolderPath -ChildPath $outputFileName
 
        # Read this state's lines from US.txt and write them to the output file.
        # Set-Content overwrites an existing file, so re-running the script does not duplicate records.
        Get-Content $txtFilePath | Select-Object -Skip $recordOffset -First $count | Set-Content -Path $outputFilePath
 
        # Update the cumulative record count for the next state
        $recordOffset += $count
 
        Write-Host "Splitting complete for $stateAbbreviation."
    }
}
 
Write-Host "Splitting complete for all states."
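
The script above re-reads US.txt from the beginning for every state, which is the likely reason it can take a while. The following single-pass alternative is only a sketch and is not part of the downloaded scripts. It assumes the same index layout the script parses (state abbreviation, FIPS code, record count, separated by commas), that records in US.txt appear in the same order as the index lines, that the data is ASCII or UTF-8, and that absolute paths are passed in. The function name Split-GeoDataSinglePass is hypothetical.

```powershell
# Hypothetical single-pass splitter: opens US.txt once and reads forward.
# Assumptions: index lines are "abbreviation,FIPS,count"; US.txt records follow
# index order; text is ASCII/UTF-8; all paths are absolute.
function Split-GeoDataSinglePass {
    param(
        [string]$TxtFilePath,
        [string]$IdxFilePath,
        [string]$OutputFolderPath
    )

    if (-not (Test-Path -Path $OutputFolderPath -PathType Container)) {
        New-Item -Path $OutputFolderPath -ItemType Directory | Out-Null
    }

    $reader = New-Object System.IO.StreamReader($TxtFilePath)
    try {
        foreach ($indexLine in Get-Content -Path $IdxFilePath) {
            $stateAbbreviation, $stateFIPS, $count = $indexLine -split ','
            $count = [int]$count
            if (-not $stateAbbreviation -or $count -le 0) { continue }

            $outputFilePath = Join-Path -Path $OutputFolderPath -ChildPath "$stateAbbreviation.txt"

            # $false = overwrite rather than append
            $writer = New-Object System.IO.StreamWriter($outputFilePath, $false)
            try {
                for ($i = 0; $i -lt $count; $i++) {
                    $line = $reader.ReadLine()
                    if ($null -eq $line) { break }   # US.txt ended earlier than the index implies
                    $writer.WriteLine($line)
                }
            }
            finally {
                $writer.Dispose()
            }
        }
    }
    finally {
        $reader.Dispose()
    }
}
```

Because the reader never rewinds, each line of US.txt is read exactly once. The trade-off is that the .NET stream classes do not use the same default text encoding as Get-Content and Set-Content in Windows PowerShell, so output should be spot-checked if the data contains non-ASCII characters.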

One Desired State File

To split a single state file out of US.txt, perform the following steps:

Open Split_By_State_Test.ps1 in Notepad and update the following variables before running the script in PowerShell:

$txtFilePath

$idxFilePath

$outputFolderPath

$desiredState

Once the values are updated, save and close the file.

Type .\Split_By_State_Test.ps1 in Windows PowerShell and press Enter to split out the desired state.

NOTE: Depending on its size, the new state file may take a moment to appear in the TXT folder.

File Contents: Split_By_State_Test.ps1

# Split GeoData - "US.txt" file by Individual State using the "US.idx" file.

# PLEASE NOTE:
# This script has NOT been tested and is NOT supported.

# The following paths must be updated before running:
	# $txtFilePath
	# $idxFilePath
	# $outputFolderPath 
	# $desiredState 

# Last Updated: 2023-11-14


# Define the paths to the US.txt file and the US.idx file
$txtFilePath = "C:\Users\Roxanne\Desktop\GeoDAT_202310\TXT\US.txt"
$idxFilePath = "C:\Users\Roxanne\Desktop\GeoDAT_202310\TXT\US.idx"
$outputFolderPath = "C:\Users\Roxanne\Desktop\GeoDAT_202310\TXT"

# Read the index file
$indexLines = Get-Content $idxFilePath

# Create the output folder if it doesn't exist
if (!(Test-Path -Path $outputFolderPath -PathType Container)) {
    New-Item -Path $outputFolderPath -ItemType Directory
}

# Initialize variable to keep track of the cumulative record count
$recordOffset = 0

# Define the state you want to parse (e.g., "AL" for Alabama)
$desiredState = "AL"

# Loop through the index lines
foreach ($indexLine in $indexLines) {
    # Parse the index line to get the state abbreviation and count
    $stateAbbreviation, $stateFIPS, $count = $indexLine -split ','

    # Convert the count to an integer
    $count = [int]$count

    # Check if the current state matches the desired state
    if ($stateAbbreviation -eq $desiredState -and $count -gt 0) {
        # Create a new output file with the state abbreviation
        $outputFileName = "${stateAbbreviation}.txt"
        $outputFilePath = Join-Path -Path $outputFolderPath -ChildPath $outputFileName

        # Read and write lines from the US.txt file to the current output file
        Get-Content $txtFilePath | Select-Object -Skip $recordOffset -First $count | Set-Content -Path $outputFilePath

        Write-Host "Splitting complete for $stateAbbreviation."
        break  # Exit the loop after processing the desired state
    }

    # Update the record offset for the next state
    $recordOffset += $count
}

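# The loop only breaks on a match, so any other end state means no file was written
if ($stateAbbreviation -ne $desiredState -or $count -le 0) {
    Write-Warning "No records for $desiredState were found in $idxFilePath. No file was written."
}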
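
Since both scripts are untested, it may be worth checking their output against the index. The following sketch compares the number of lines in each state file with the record count listed in US.idx. It assumes the same comma-separated index layout the scripts parse, and the function name Test-StateFileCounts is hypothetical.

```powershell
# Hypothetical check: compares each state file's line count with the count in US.idx.
# Assumption: index lines are "abbreviation,FIPS,count", as parsed by the split scripts.
function Test-StateFileCounts {
    param(
        [string]$IdxFilePath,
        [string]$OutputFolderPath
    )

    foreach ($indexLine in Get-Content -Path $IdxFilePath) {
        $stateAbbreviation, $stateFIPS, $count = $indexLine -split ','
        $count = [int]$count
        if (-not $stateAbbreviation -or $count -le 0) { continue }

        $stateFile = Join-Path -Path $OutputFolderPath -ChildPath "$stateAbbreviation.txt"
        if (-not (Test-Path -Path $stateFile -PathType Leaf)) { continue }   # state not split yet

        $actual = (Get-Content -Path $stateFile | Measure-Object).Count

        # Emit one result row per state file found
        [pscustomobject]@{
            State    = $stateAbbreviation
            Expected = $count
            Actual   = $actual
            Match    = ($actual -eq $count)
        }
    }
}
```

For example, Test-StateFileCounts -IdxFilePath $idxFilePath -OutputFolderPath $outputFolderPath | Where-Object { -not $_.Match } lists only the states whose files do not have the expected number of lines. States that have not been split yet are skipped rather than reported.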