Dyota's blog

SQL database on local machine for development

This article describes how to recreate a remote database on your local machine.

My particulars: my remote database is Azure Synapse, my local is SQL Server Express LocalDB, and my source systems are usually Microsoft Dynamics (Finance & Operations, Business Central, or CRM).

I have a SQL database at work. There are a lot of views written into it, and the view definitions are quite long and complex.

The views are ultimately pulled into Power BI. This is where errors are picked up: in Power BI, by the end user.

When errors are caused by the way the views are written, tinkering with them directly on the remote is cumbersome.

This is the motivation behind wanting a local database: a copy of the schemas, tables, and views that I could experiment on freely, without touching the remote.

I learned a lot from doing this exercise. It was way more work than I had anticipated, with some gotchas to overcome along the way.

Prerequisites

- SQL Server Express LocalDB installed locally
- The SqlServer PowerShell module (for Invoke-Sqlcmd)
- Read access to the remote database

Overview

There are two sides to the operations here: the remote side, and the local side. It makes more sense to deal with the remote side first, because data and metadata have to be downloaded before any of it can be pushed into the local database.

On the remote side:

- export the schema names
- export the table and column metadata
- export the data from the base tables

On the local side:

- create the database
- create the schemas
- create the tables
- import the data
- create the views

In between these two sides there will be a staging ground, where data and metadata will be downloaded to the local machine as text files first, before importing them into the local database.

This is done so that data and metadata can be inspected by eye, in case anything is amiss. I am lucky enough that the volumes of data I am working with are not prohibitive. It also means that the remote database won't need to be queried multiple times for the same data.

SQL Server Express

I chose this software instead of any other one because I'm working in a Microsoft environment and I need the SQL dialect to be the same across both the remote and the local.

Project folder structure

I propose the following folder structure.

|_ LocalData/
|_ Source/
    |_ #Metadata
        |_ schemas.json
        |_ tables.json
    |_ <each schema has its own folder>
        |_ <each table/view has its own .sql file>
|_ SourceChanged/
|_ config.json
|_ configLocal.json

Config files

The config files will contain the necessary data to log into the remote and local environments. The scripts will read from these files to know how to connect to the databases.

Both config.json and configLocal.json will largely be the same, with the latter having fewer properties.

config.json

{
    "tenantId": "",
    "sqlEndpoint": "", 
    "db": ""
}

configLocal.json

{
    "sqlEndpoint": "", 
    "db": ""
}
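For completeness, here is a minimal sketch of how the later scripts might build their connection objects from these config files. The property names ($source.server, $source.db, $source.accessToken) match what the download functions below expect; using the Azure CLI to obtain the token is my assumption, not something the original setup prescribes.

```powershell
function getConnections {
    $remote = Get-Content '.\config.json' -Raw | ConvertFrom-Json
    $local  = Get-Content '.\configLocal.json' -Raw | ConvertFrom-Json

    # token for Azure AD authentication against the Synapse SQL endpoint
    $token = az account get-access-token --resource https://database.windows.net/ --query accessToken -o tsv

    $source = @{
        server      = $remote.sqlEndpoint
        db          = $remote.db
        accessToken = $token
    }
    $dest = @{
        server = $local.sqlEndpoint   # e.g. (localdb)\MSSQLLocalDB
        db     = $local.db
    }
    return $source, $dest
}
```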

Remote side

Prepare metadata

Schemas

Get the names of the schemas and store them in .\Source\#Metadata\schemas.json.

We are not interested in system schemas. The condition specified below will filter those out.

We also only need the output to be a single array of strings, nothing more.

SELECT 
    name 
FROM sys.schemas 
WHERE 
    schema_id > 4 
    AND name NOT LIKE 'db_%'
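One way to run this query and save the result as a flat array of strings (the shape the later import step expects). The function name downloadSchemas and the $source object are my naming assumptions.

```powershell
function downloadSchemas ($source) {
    $q = "SELECT name FROM sys.schemas WHERE schema_id > 4 AND name NOT LIKE 'db_%'"

    $schemas = Invoke-Sqlcmd -ServerInstance $source.server -Database $source.db -AccessToken $source.accessToken -Query $q |
        % name

    # force an array so a single schema still serializes as [ "x" ]
    ConvertTo-Json -InputObject @($schemas) -Compress |
        Out-File '.\Source\#Metadata\schemas.json'
}
```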

Tables

Later, when creating the tables, the columns that each table has and their data types will need to be specified explicitly. This SQL query fetches the necessary information:

SELECT 
    TABLE_SCHEMA
    , TABLE_NAME
    , COLUMN_NAME
    , DATA_TYPE
    , CHARACTER_MAXIMUM_LENGTH
    , NUMERIC_PRECISION
    , NUMERIC_SCALE
FROM INFORMATION_SCHEMA.COLUMNS
ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION

The PowerShell function below wraps this query, adds IS_NULLABLE and an optional exclusion list, and saves the result to tables.json.
function downloadTablesColumns ($source, $exclude) {
    # $exclude specifies the columns that we don't want
    # i.e. system columns that are not necessary to port over
    
    Write-Host "Starting download of tables and columns from database $($source.db) at $($source.server)"

    $fields = @(
        'TABLE_SCHEMA'
        'TABLE_NAME'
        'COLUMN_NAME'
        'DATA_TYPE'
        'CHARACTER_MAXIMUM_LENGTH'
        'NUMERIC_PRECISION'
        'NUMERIC_SCALE'
        'IS_NULLABLE'
    )

    $q = @"
        SELECT $($fields -join ',')
        FROM INFORMATION_SCHEMA.COLUMNS
        $(
            if($null -ne $exclude) {
                "WHERE COLUMN_NAME NOT IN ( $( ($exclude | % { "`'$_`'" }) -join ", " ) )"
            }
        )
        ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION
"@

    Write-Host "Sending query..."
    $tables = Invoke-Sqlcmd -ServerInstance $source.server -Database $source.db -AccessToken $source.accessToken -query $q

    Write-Host "Writing to file..."
    $outFile = '.\Source\#Metadata\tables.json'

    
    $tables |
        # need to explicitly specify the fields to select for, otherwise a lot of system metadata will come through
        select $fields |
        ConvertTo-Json -Compress |
        Out-File $outFile

    Write-Host "Saved tables in $outFile"
}
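For reference, each element of tables.json ends up looking something like this (the schema, table, and column names here are made up for illustration):

```json
{
    "TABLE_SCHEMA": "staging",
    "TABLE_NAME": "Customers",
    "COLUMN_NAME": "CustomerName",
    "DATA_TYPE": "nvarchar",
    "CHARACTER_MAXIMUM_LENGTH": 100,
    "NUMERIC_PRECISION": null,
    "NUMERIC_SCALE": null,
    "IS_NULLABLE": "YES"
}
```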

Data

We only need to download the data for the base tables, not for any of the derived views. Here, we assume that all of the base tables are collected under one schema, and hence, we need to fetch the table names from just that one schema.

First, get the names of the tables from that schema. All of this is done in PowerShell, assuming that tables.json has already been downloaded.

function getBaseTables ($baseSchemaName) {    
    
    # this function assumes that tables.json has already been written in
    $sourceFile = '.\Source\#Metadata\tables.json'

    Write-Host "Reading source file at $sourceFile"
    $raw = cat $sourceFile -Raw

    Write-Host "Collecting table names from schema $baseSchemaName"

    $tables = $raw |
        ConvertFrom-Json |
        ? {
            $_.TABLE_SCHEMA -eq $baseSchemaName
        } | 
        % TABLE_NAME |
        select -Unique |
        sort 

    $count = $tables.count

    Write-Host "Collected a list of $count table names"

    return $tables
}
function downloadDataToLocal ($source, $baseSchemaName) {
    $tables = getBaseTables $baseSchemaName
    
    $localPath = '.\LocalData'   # data files are staged here

    $tables |
        % {
            # getBaseTables returns plain table names; the schema is the
            # base schema we were given
            $schema = $baseSchemaName
            $table = $_
            $q = "SELECT * FROM [$schema].[$table]"
    
            Write-Host "Running $q..."
            
            $data = Invoke-Sqlcmd -ServerInstance $source.server -Database $source.db -AccessToken $source.accessToken -query $q
    
            $count = $data.Count
            Write-Host "Downloaded $count rows"

            $dest = "$localPath\$schema"

            if (!(Test-Path $dest)) { New-Item -ItemType Directory $dest | Out-Null }
    
            $jsonFile = "$dest\$table.json"
    
            Write-Host "Saving to $jsonFile"
    
            # 'exclude' is a small helper (not shown here) that returns the
            # list of system columns to leave out
            $exclude = exclude
    
            $data | 
                # this excludes system fields that are not wanted
                select -ExcludeProperty $exclude |
                ConvertTo-Json -Depth 10 -Compress | 
                Out-File $jsonFile
    
            Write-Host "Saved`n"
    
        }
}
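Putting the remote side together, a hypothetical driver might look like this. The base-schema name and the excluded column are placeholders; $source is the connection object used throughout.

```powershell
# the exclusion list and the 'staging' schema name are illustrative only
downloadTablesColumns $source @('SysRowVersion')
downloadDataToLocal $source 'staging'
```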

Local side

Instance

Here, I am using the default instance of LocalDB. I'm not creating a new one.

Create database

There are two things that I want to specify when creating a database: its name, and where on disk the data (.mdf) and log (.ldf) files live.

The PowerShell script below does just this.

using namespace System.Data.SqlClient

function New-LocalDatabase ([string] $db, [string] $Path) {
    # Ensure the directory exists
    if (-not (Test-Path $Path)) {
        New-Item -Path $Path -ItemType Directory -Force | Out-Null
        Write-Host "Created directory: $Path" -ForegroundColor Green
    }
    
    # Build file paths
    $mdfPath = Join-Path $Path "$db.mdf"
    $ldfPath = Join-Path $Path "$db.ldf"
    
    # Create the database with explicit file locations
    $createDbSql = @"
CREATE DATABASE [$db]
ON PRIMARY (
    NAME = N'$db',
    FILENAME = N'$mdfPath',
    SIZE = 100MB,
    FILEGROWTH = 50MB
)
LOG ON (
    NAME = N'${db}_log',
    FILENAME = N'$ldfPath',
    SIZE = 25MB,
    FILEGROWTH = 25MB
);
"@
    
    try {
        Write-Host "Creating database in LocalDB..." -ForegroundColor Green
        
        Invoke-Sqlcmd -ServerInstance "(localdb)\MSSQLLocalDB" -Database "master" -Query $createDbSql -QueryTimeout 120
        
        Write-Host "Database '$db' created successfully!" -ForegroundColor Green
        Write-Host "  Data file: $mdfPath" -ForegroundColor Cyan
        Write-Host "  Log file:  $ldfPath" -ForegroundColor Cyan
    }
    catch {
        Write-Error "Failed to create database: $_"
    }
}
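A hypothetical invocation (the database name and path are placeholders):

```powershell
# creates C:\LocalDb\DevCopy\DevCopy.mdf and DevCopy.ldf
New-LocalDatabase -db 'DevCopy' -Path 'C:\LocalDb\DevCopy'
```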

To rename the database, run the following command:

ALTER DATABASE <original> SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE <original> MODIFY NAME = <changed>;
ALTER DATABASE <changed> SET MULTI_USER WITH ROLLBACK IMMEDIATE;

There is more to it than this. Renaming databases is more trouble than it's worth - I recommend not doing it.

Create schemas

First, create the schemas to put the tables into.

SELECT name 
FROM sys.schemas 
WHERE schema_id > 4 AND name NOT LIKE 'db_%'

The PowerShell function below creates each missing schema locally, then verifies the result.

function importSchemas ($dest) {
    # get schemas

    $sourceSchemas = cat '.\Source\#Metadata\schemas.json' | 
        ConvertFrom-Json

    Write-Host "Creating these schemas: $($sourceSchemas -join ", ")"

    $q = "SELECT name FROM sys.schemas WHERE schema_id > 4 AND name NOT LIKE 'db_%'"

    $res = Invoke-Sqlcmd -ServerInstance $dest.server -Database $dest.db  -Query $q
    $destSchemas = $res.name

    Write-Host "The local db already has these schemas created: $($destSchemas -join ", ")"

    # go through one by one and create schema
    $sourceSchemas | 
        % {
            if (!($_ -in $destSchemas)) {
                $q = "CREATE SCHEMA $_;"
                Write-Host "Running $q in destination"
                Invoke-Sqlcmd -ServerInstance $dest.server -Database $dest.db -Query $q
            }
        }
    
    # check that all schemas have been made
    $res = Invoke-Sqlcmd -ServerInstance $dest.server -Database $dest.db -Query $q
    $destSchemas = $res.name

    $allSchemasImported = $null -eq (Compare-Object $sourceSchemas $destSchemas)

    if ($allSchemasImported) {
        Write-Host "All schemas have been imported" -ForegroundColor Green
    } else {
        Write-Warning "Some schemas are still missing from the local database"
    }
}

Create tables

Then, create the tables.

Only create and import the base tables. If there are a lot of views, create them from their view definitions rather than importing their data from the remote. We make the assumption that all of the base tables belong to one single schema.

function createTables ($source, $dest, $baseSchemaName) {

    # only pick out the base schema, this is where the raw tables live
    $tables = getBaseTables $baseSchemaName

    $sourceFile = '.\Source\#Metadata\tables.json'

    Write-Host "Reading source file at $sourceFile"
    $raw = cat $sourceFile -Raw

    $tablesAndColumns = $raw | 
        ConvertFrom-Json
    
    $tables |
        % {
            # getBaseTables returns plain table names; the schema is the
            # base schema we were given
            $schema = $baseSchemaName
            $table = $_
            
            Write-Host "Processing [$schema].[$table]"

            $columns = $tablesAndColumns |
                ? {
                    $_.TABLE_SCHEMA -eq $schema -and
                    $_.TABLE_NAME -eq $table
                }
            
            # drop the table first to clear it out (IF EXISTS so the first run doesn't fail)
            Write-Host "Dropping the previous table..."
            $dropTable = "DROP TABLE IF EXISTS [$schema].[$table];"
            
            # Build CREATE TABLE statement
            Write-Host "Composing CREATE command..."
            $createTable = "CREATE TABLE [$schema].[$table] (`n"
    
            $columnDefs = foreach ($col in $columns) {
                # $dataType = $col.DATA_TYPE

                $dataType = switch ($col.DATA_TYPE) {
                    # String types need length
                    { $_ -in 'nvarchar', 'varchar', 'nchar', 'char' } {
                        if ($col.CHARACTER_MAXIMUM_LENGTH -eq -1) {
                            "$($col.DATA_TYPE)(MAX)"
                        } else {
                            "$($col.DATA_TYPE)($($col.CHARACTER_MAXIMUM_LENGTH))"
                        }
                    }
                    
                    # Decimal/Numeric need precision and scale
                    { $_ -in 'decimal', 'numeric' } {
                        "$($col.DATA_TYPE)($($col.NUMERIC_PRECISION),$($col.NUMERIC_SCALE))"
                    }
                    
                    # Binary types might need length
                    { $_ -in 'varbinary', 'binary' } {
                        if ($col.CHARACTER_MAXIMUM_LENGTH -eq -1) {
                            "$($col.DATA_TYPE)(MAX)"
                        } else {
                            "$($col.DATA_TYPE)($($col.CHARACTER_MAXIMUM_LENGTH))"
                        }
                    }
                    
                    # Everything else (int, bigint, datetime, bit, etc.) - use as-is
                    default {
                        $col.DATA_TYPE
                    }
                }
                
                $nullable = if ($col.IS_NULLABLE -eq 'NO') { 'NOT NULL' } else { 'NULL' }
                
                "    [$($col.COLUMN_NAME)] $dataType $nullable"
            }
    
            $createTable += $columnDefs -join ",`n"
            $createTable += "`n)"

            Write-Host "CREATE command composed. "
            Write-Host $createTable
    
            Write-Host "Writing into local database..."
            Invoke-Sqlcmd -ServerInstance $dest.server -Database $dest.db -query $dropTable
            Invoke-Sqlcmd -ServerInstance $dest.server -Database $dest.db -query $createTable
    
        }
        
        Write-Host "All tables completed"
        Invoke-Sqlcmd -ServerInstance $dest.server -Database $dest.db -query "SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES ORDER BY TABLE_SCHEMA, TABLE_NAME"
}

Import all data

This is a big one.

function loadIntoLocalDB ($jsonPath, $db, $schema, $table) {

    $exclude = exclude
    
    # Get the schema from LocalDB
    # this is required to get the data types for each of the columns to make sure that they are all the same when the data gets imported
    $columnsQuery = @"
    SELECT 
        COLUMN_NAME,
        DATA_TYPE,
        CHARACTER_MAXIMUM_LENGTH,
        NUMERIC_PRECISION,
        NUMERIC_SCALE
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE 
        TABLE_SCHEMA = '$schema' 
        AND TABLE_NAME = '$table'
    ORDER BY ORDINAL_POSITION
"@
    
    $columns = Invoke-Sqlcmd -ServerInstance "(localdb)\MSSQLLocalDB" -Database "$db" -Query $columnsQuery |
        ? {
            # do not want system columns, only data columns
            !($_.COLUMN_NAME -in $exclude)
        }
    
    $dataTable = New-Object System.Data.DataTable
    
    foreach ($col in $columns) {
        $colName = $col.COLUMN_NAME
        
        # Map SQL types to .NET types
        $netType = switch ($col.DATA_TYPE) {
            'int'           { [int] }
            'bigint'        { [long] }
            'smallint'      { [short] }
            'tinyint'       { [byte] }
            'bit'           { [bool] }
            'decimal'       { [decimal] }
            'numeric'       { [decimal] }
            'money'         { [decimal] }
            'float'         { [double] }
            'real'          { [float] }
            'datetime'      { [datetime] }
            'datetime2'     { [datetime] }
            'date'          { [datetime] }
            'time'          { [timespan] }
            'uniqueidentifier' { [guid] }
            default         { [string] }
        }
        
        $column = New-Object System.Data.DataColumn($colName, $netType)
        $column.AllowDBNull = $true
        [void]$dataTable.Columns.Add($column)
    }
    
    # Read JSON
    $jsonData = Get-Content -Path $jsonPath -Raw | ConvertFrom-Json
    
    foreach ($jsonRow in $jsonData) {
        $dataRow = $dataTable.NewRow()
        
        foreach ($col in $columns) {
            $colName = $col.COLUMN_NAME
            $value = $jsonRow.$colName
            
            # Handle nulls (including DBNull values that survive serialization)
            if ($null -eq $value -or $value -is [System.DBNull]) { 
                $dataRow[$colName] = [DBNull]::Value
            }
            else {
                # Convert to proper type based on SQL column type
                $dataRow[$colName] = switch ($col.DATA_TYPE) {
                    { $_ -in 'datetime', 'datetime2', 'date' } { [datetime]$value }
                    'bit'               { [bool]$value }
                    'uniqueidentifier'  { [guid]$value }
                    'int'               { [int]$value }
                    'bigint'            { [long]$value }
                    'smallint'          { [short]$value }
                    'tinyint'           { [byte]$value }
                    'decimal'           { [decimal]$value }
                    'numeric'           { [decimal]$value }
                    'money'             { [decimal]$value }
                    'float'             { [double]$value }
                    'real'              { [float]$value }
                    default             { $value }  # Strings stay as-is
                }
            }
        }
        
        [void]$dataTable.Rows.Add($dataRow)
    }
    
    # Bulk insert
    $connection = New-Object SqlConnection("Server=(localdb)\MSSQLLocalDB;Database=$db;Integrated Security=true;")
    $connection.Open()
    
    $bulkCopy = New-Object SqlBulkCopy($connection)
    $bulkCopy.DestinationTableName = "[$schema].[$table]"
    # Wrap the WriteToServer in try-catch
    try {
        $bulkCopy.WriteToServer($dataTable)
    }
    catch {
        Write-Host "Bulk copy failed: $($_.Exception.Message)" -ForegroundColor Red
        
        # Try inserting row by row to find the problematic row/column
        Write-Host "Testing rows individually..." -ForegroundColor Yellow
        
        $connection2 = New-Object SqlConnection("Server=(localdb)\MSSQLLocalDB;Database=$db;Integrated Security=true;")
        $connection2.Open()
        
        for ($i = 0; $i -lt $dataTable.Rows.Count; $i++) {
            try {
                $singleRowTable = $dataTable.Clone()  # Same schema
                $singleRowTable.ImportRow($dataTable.Rows[$i])
                
                $bulkCopy2 = New-Object SqlBulkCopy($connection2)
                $bulkCopy2.DestinationTableName = "[$schema].[$table]"
                $bulkCopy2.WriteToServer($singleRowTable)
            }
            catch {
                Write-Host "Row $i failed: $($_.Exception.Message)" -ForegroundColor Red
                
                # Show the problematic row data
                Write-Host "Row $i data:" -ForegroundColor Cyan
                foreach ($col in $dataTable.Columns) {
                    $value = $dataTable.Rows[$i][$col.ColumnName]
                    $type = $value.GetType().Name
                    Write-Host "  $($col.ColumnName) = '$value' (Type: $type, Expected: $($col.DataType.Name))"
                }
                break
            }
        }
        $connection2.Close()
    }
    
    $connection.Close()
}
$localPath = '.\LocalData'

# load all staged data into LocalDB; files are laid out as
# $localPath\<schema>\<table>.json by downloadDataToLocal
dir $localPath -Directory |
    % {
        $schema = $_.Name
        dir $_.FullName -Filter '*.json' |
            % {
                $jsonPath = $_.FullName
                $table = $_.BaseName
                Write-Host "Start loading in [$schema].[$table]..."
                try {
                    loadIntoLocalDB $jsonPath $dest.db $schema $table
                    Write-Host "Load complete`n"
                } catch {
                    Write-Host "Load failed: $($_.Exception.Message)"
                }
            }
    }

Create views

Views need to be created in dependency order: a view cannot be created until the tables and views it references already exist.
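The article doesn't prescribe a method for this; one simple approach I'd sketch is a retry loop: attempt every view, requeue the failures, and repeat until a pass makes no progress. This assumes the view definitions were saved as one CREATE VIEW statement per .sql file under .\Source\<schema>\, per the folder structure above.

```powershell
function createViews ($dest) {
    # every .sql file under .\Source\<schema>\ is assumed to hold one CREATE VIEW statement
    $pending = dir '.\Source' -Recurse -Filter '*.sql'

    while ($pending.Count -gt 0) {
        $failed = @()

        foreach ($file in $pending) {
            try {
                Invoke-Sqlcmd -ServerInstance $dest.server -Database $dest.db -InputFile $file.FullName -ErrorAction Stop
                Write-Host "Created view from $($file.Name)"
            } catch {
                # probably references a view that doesn't exist yet; retry next pass
                $failed += $file
            }
        }

        if ($failed.Count -eq $pending.Count) {
            Write-Warning "No progress made; remaining: $($failed.Name -join ', ')"
            break
        }
        $pending = $failed
    }
}
```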

#database #debugging #development #powershell #sql #testing