Using F# to read/write files and databases

April 9, 2025    FSharp Development Programming AI-Coding


I had a task to read from a file, look up the values in the database, and find discrepancies. Since I had some extra time, I decided to apply my F# learning to it. The script won't be long-lived, so I didn't ask for permission first. I used Copilot along the way to help with syntax and build issues that I wasn't as familiar with. It let me move from task to async computation expressions quickly and learn how to work with files and databases.
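
As a quick illustration of that move (a sketch, not code from the script), here is the same asynchronous file read written with both computation expressions. task {} wraps .NET Tasks directly, while async {} builds a cold workflow that only runs when started, which is what composes with Async.Parallel later on.

open System.IO
open System.Threading.Tasks

// task {} produces a hot Task<string> that starts running immediately
let readWithTask (path: string) : Task<string> =
    task { return! File.ReadAllTextAsync path }

// async {} produces a cold Async<string>; it runs only when started,
// e.g. via Async.RunSynchronously or Async.Parallel
let readWithAsync (path: string) : Async<string> =
    async { return! File.ReadAllTextAsync path |> Async.AwaitTask }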

The file has 80,000 rows and the script makes multiple database reads, so Copilot helped me optimize that, but it still takes 30 minutes to run the full thing.

Database Querying

There are a lot of options listed for database querying in F#. I chose Dapper.FSharp since I was familiar with Dapper in C# and wanted to add queries quickly. I did end up using the select { } expression, but then moved to plain Dapper for a direct query once my join involved more than the supported three tables.
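
For reference, a minimal Dapper.FSharp join looks roughly like this. The table types here are hypothetical, not the real schema, and this is a sketch of the documented syntax rather than code from my script; once the query outgrew what the builder supports, I dropped to a raw SQL string with plain Dapper, as shown further down.

open Dapper.FSharp.MSSQL
open Microsoft.Data.SqlClient

// Hypothetical two-table join showing the select { } builder syntax
type Person = { Id: int; Name: string }
type Address = { PersonId: int; City: string }

let personTable = table<Person>
let addressTable = table<Address>

let peopleWithAddresses (conn: SqlConnection) =
    select {
        for p in personTable do
        innerJoin a in addressTable on (p.Id = a.PersonId)
        selectAll
    }
    |> conn.SelectAsync<Person, Address>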

I contributed some documentation updates after being unsure about views and types. They were accepted!

Code

I started with a .fsx script in F# Interactive and later moved to a Program.fs with a .fsproj when I needed to pass in parameters and have someone else run it.
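
For context, here's roughly how the arguments differ between the two (a sketch with a hypothetical script name, not the actual code). In the .fsx version the arguments come from F# Interactive:

// Run with: dotnet fsi censusCheck.fsx "./data.txt"
// fsi.CommandLineArgs.[0] is the script path; the rest are your arguments
let dataFilePath = fsi.CommandLineArgs.[1]

In the Program.fs version they arrive through main's argv, as shown in the Main section below.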

I’ll share some snippets of the code, but leave out other parts.

Reading from a file

open System
type RecordFromFile = {
  First_ID: string
  ID_With_Dash: string
  ID: string
  FirstName: string
  LastName: string
  EffectiveDate: int
}

let idWithDash id =
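    // e.g. "123456789" becomes "1234567-89": insert a dash before the last two characters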
    if not (String.IsNullOrWhiteSpace id) then
        id.Insert(id.Length - 2, "-")
    else ""

let processRecordsFromFile dataFilePath skipToLineInFile =
    // Instead of loading all rows into memory, process rows as they are read from the file or database.
    seq {
        use reader = System.IO.File.OpenText(dataFilePath)
        let mutable lineNumber = 0
        while not reader.EndOfStream do
            let line = reader.ReadLine()
            lineNumber <- lineNumber + 1
            if lineNumber >= skipToLineInFile && not (String.IsNullOrWhiteSpace(line)) then
                let fields = line.Split('|')
                yield {
                    First_ID = fields.[0]
                    ID = fields.[2]
                    ID_With_Dash = idWithDash fields.[2]
                    FirstName = fields.[4]
                    LastName = fields.[5]
                    EffectiveDate = int fields.[8]
                }
    }
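
Because this returns a lazy seq, nothing is read until the sequence is enumerated. A quick smoke test (with a hypothetical path) touches only the first few lines of the 80,000-row file:

processRecordsFromFile "./data.txt" 0
|> Seq.truncate 5
|> Seq.iter (fun r -> printfn $"{r.ID} {r.LastName} {r.EffectiveDate}")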

Database Querying


open Dapper.FSharp.MSSQL
open Dapper
open Microsoft.Data.SqlClient

// Anonymized MEMBER_CENSUS type
type MEMBER_CENSUS = {
  AlternateId: string
  FirstName: string
  LastName: string
  CoverageEffectiveDate: string
}

// Anonymized MemberJoined type
type MemberJoined = {
  MemberId: string
  MemberStatus: string
  SSN: string
  LastName: string
  AccountKey: decimal
  TerminationDate: DateTime
  PlanStartDate: DateTime
  PlanEndDate: DateTime
}

let createNewConnection connectionString =
    new SqlConnection(connectionString)

let queryDatabaseByIds (conn: SqlConnection) (ids: string list, ids_with_dash: string list) =
    let memberCensusTable = table<MEMBER_CENSUS>
    async {
        // using FSharp.Dapper
        let! censusMembers =
            select {
                for record in memberCensusTable do
                    where (isIn record.AlternateId ids)
                    selectAll
            }
            |> conn.SelectAsync<MEMBER_CENSUS>
            |> Async.AwaitTask
        let censusMembersList = censusMembers |> Seq.toList
        printfn $"{censusMembersList.Length} Census Members"
        
        // the ids in the tables were different so I couldn't do a more efficient direct join
        // I removed some of the joins from my query, but this goes into MemberJoined easily
        // using Dapper here
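        // Dapper expands a list parameter used with IN (e.g. @idsWithDash) into the individual values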
        let! members =
            let query = """
                SELECT 
                    m.MemberId,
                    m.MemberStatus,
                    m.SSN,
                    m.LastName, 
                    m.AccountKey,
                    mps.TerminationDate, 
                    mps.PlanStartDate, 
                    mps.PlanEndDate
                FROM dbo.Members m
                INNER JOIN dbo.MemberPlans mps ON m.AccountKey = mps.AccountKey
                WHERE m.MemberId IN @idsWithDash
            """
            let parameters = 
                let parameters = new DynamicParameters()
                parameters.Add("@idsWithDash", ids_with_dash)
                parameters
            printfn $"Ids IN { ids_with_dash }"
            conn.QueryAsync<MemberJoined>(query, parameters)
            |> Async.AwaitTask
        let membersList = members |> Seq.toList
        printfn $"{membersList.Length} Members"
        return censusMembersList, membersList
    }

let recordsFromDatabase firstConn recordsFromFile chunkSize maxDegreeOfParallelism = async {
    let processChunk (index: int) (chunk: RecordFromFile seq) =
        printfn $"Processing chunk {index}"
        let conn = ensureConnectionOpenWithRetry firstConn 2
        let ids = chunk |> Seq.map (fun record -> record.ID) |> Seq.toList
        let ids_with_dash = chunk |> Seq.map (fun record -> record.ID_With_Dash) |> Seq.toList

        async {
            let! censusMembers, members = queryDatabaseByIds conn (ids, ids_with_dash)
            printfn $"Chunk {index} Loaded"
            return censusMembers, members
        }

    let! chunkResults =
        recordsFromFile
        |> Seq.chunkBySize chunkSize
        |> Seq.mapi processChunk
        // |> Async.Parallel
        |> fun chunks -> Async.Parallel(chunks, maxDegreeOfParallelism=maxDegreeOfParallelism)

    return chunkResults |> Array.toList
}
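
ensureConnectionOpenWithRetry is one of the pieces I left out. A minimal sketch of what such a helper might look like (an assumption on my part, not the exact version from the script):

// Hypothetical sketch: reopen a dropped connection, retrying a few times before giving up
let rec ensureConnectionOpenWithRetry (conn: SqlConnection) (retries: int) =
    if conn.State = System.Data.ConnectionState.Open then
        conn
    else
        try
            conn.Open()
            conn
        with _ when retries > 0 ->
            System.Threading.Thread.Sleep 1000
            ensureConnectionOpenWithRetry conn (retries - 1)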

Creating a CSV file

FSharp.Data for the win!

This won’t compile with the code above, because I left out my toCombinedMemberInfo function that puts members into this structure, but you get the idea.

open FSharp.Data
type MemberCsv = CsvProvider<Sample = "Status,Alt_ID,M_ID,FirstName,LastName,EffectiveDate,MEMBER_TERMINATION_DATE,MEMBER_PLAN_START,MEMBER_PLAN_END">
let writeMembersToCsv (filePath: string) members =
    let rows =
        members
        |> List.map (fun m ->
            MemberCsv.Row(
                m.Member.STATUS,
                m.CensusMember.ALT_ID,
                m.Member.ID,
                m.CensusMember.MEMBER_FIRST_NAME,
                m.CensusMember.MEMBER_LAST_NAME,
                m.CensusMember.EFFECTIVE_DATE_OF_COVERAGE,
                string m.Member.TERMINATION_DATE,
                string m.Member.PLAN_START,
                string m.Member.PLAN_END
            )
        )
    let csv = new MemberCsv(rows)
    csv.Save filePath

Main


// Main function
// dotnet run "Server=test;Database=MyDatabase;Integrated Security=SSPI;MultipleActiveResultSets=True;Connect Timeout=160;Max Pool Size=100;" "./data.txt" "2000"
[<EntryPoint>]
let main argv =
    if argv.Length < 3 then
      printfn "Usage: censusCheck <connectionString> <dataFilePath> <chunkSize>"
      Environment.Exit(1)

    let connectionString = argv.[0]
    let dataFilePath = argv.[1]
    let chunkSize = int argv.[2]
    let maxDegreeOfParallelism = 2
        
    // for testing, start from this line in the Census txt file
    let skipToLineInCensusTxtFile = 0
    // let connectionString = "Server=test;Database=MyDatabase;Integrated Security=SSPI;MultipleActiveResultSets=True;Max Pool Size=100;Connect Timeout=160;"
    // let dataFilePath = @"./data.txt"
    // let chunkSize = 500
    let terminatedOutputCsvPath = "./terminated_members.csv"
    let missingCensusOutputCsvPath = "./missing_census_members.csv"
    let terminationCsvPath = "./termination.csv"

    printfn $"Starting at {DateTime.Now}"
    let recordsFromFile = processRecordsFromFile dataFilePath skipToLineInCensusTxtFile
    let membersFromDatabase =
        recordsFromFile
        |> Seq.chunkBySize chunkSize
        |> Seq.mapi (fun index chunk ->
            printfn $"Processing chunk {index}. The set is {chunk.Length} items"
            async {
                let conn = createNewConnection connectionString
                conn.Open()
                try
                    let! result = recordsFromDatabase conn chunk chunkSize maxDegreeOfParallelism
                    return result
                finally
                    conn.Close()
                    conn.Dispose()
            }
        )
        |> fun chunks -> Async.Parallel(chunks, maxDegreeOfParallelism = maxDegreeOfParallelism)
        |> Async.RunSynchronously
        |> List.concat    

    let combinedMembers =
        toCombinedMemberInfo recordsFromFile membersFromDatabase

    let terminatedMembers = combinedMembers 
                            |> List.distinctBy(fun m -> m.Member.ID) 
                            |> List.filter (fun m -> m.Member.STATUS = terminatedStatus)
    let missingCensusMembers = combinedMembers 
                            |> List.distinctBy(fun m -> m.Member.ID) 
                            |> List.filter (fun m -> not m.CensusFound)

    writeMembersToCsv terminatedOutputCsvPath terminatedMembers
    writeMembersToCsv missingCensusOutputCsvPath missingCensusMembers

    printfn $"{terminatedMembers.Length} Terminated members written to {terminatedOutputCsvPath}"
    printfn $"{missingCensusMembers.Length} Missing Census members written to {missingCensusOutputCsvPath}"

    
    writeTerminatedMembersToCsv terminationCsvPath terminatedMembers
    printfn $"{terminatedMembers.Length} Terminated members written to {terminationCsvPath}"

    printfn $"Completed at {DateTime.Now}"
    0 // Return an integer exit code

I played around with the timeout, chunk size and max degree of parallelism. It all depends on what the database server can handle.

Thoughts on Using GitHub Copilot

Copilot is quickly becoming an essential tool. Just as Stack Overflow replaced books (and sometimes co-workers) as a reference, Copilot is expanding on that and speeding up the time to a working solution. It was nice to ask about fixing issues right in the editor instead of opening a browser to search for answers.

Being able to ask about optimization improvements, moving from task to async, generating code to review, and converting to C# saved me a lot of time and taught me good approaches.

Overall, I was satisfied with the code it generated and found it very helpful.

Converting to C#

I started with the F# code and then asked Copilot to convert it to C#. It did a good job, producing static functions. When I prompted it to convert the classes to immutable records, it handled that like a champ.

Are we to a point where it doesn’t matter what language you write code in?

Conclusion

It was nice to apply F# in the real world. I'll definitely keep using it, learning more about F#, and coming back to it. I hope we get more opportunities to use it in the future, but until then, F# is great for scripting and experimentation.


