I had a task to read from a file, look up the values in the database, and find discrepancies. Since I had some extra time, I decided to apply my F# learning to it. The script won’t be long-lived, so I didn’t ask for permission first. I used Copilot along the way to help with syntax and build issues that I wasn’t as familiar with. It allowed me to move from task to async computation expressions quickly and to learn how to work with files and databases.
I have a file with 80,000 rows and multiple database reads per batch, so Copilot helped me optimize that, but the full run still takes 30 minutes.
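The main optimizations were streaming the file lazily, batching the IDs so each database round trip looks up many rows at once, and running a few batches in parallel. The batching piece is just `Seq.chunkBySize`; here is a minimal sketch with made-up data:

```fsharp
// Batch a lazy sequence of ids into groups so each database
// round trip can look up many rows at once instead of one.
let ids = seq { for i in 1 .. 10 -> string i }

let chunks =
    ids
    |> Seq.chunkBySize 3 // arrays of at most 3 ids each
    |> Seq.toList

// 10 ids in groups of 3 -> 4 chunks, the last holding 1 id
printfn "%d chunks; the last has %d id(s)" chunks.Length (Seq.last chunks).Length
```

Chunk size ends up being a tuning knob: bigger chunks mean fewer round trips but larger `IN` clauses for the database to handle.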
There are a lot of options listed for database querying. I chose Dapper.FSharp, as I was familiar with Dapper in C# and wanted to add queries quickly. I did end up using the select { } expression, but then moved to plain Dapper with a direct query once my join involved more than the supported 3 tables. I contributed some documentation updates after being unsure about views and types. They were accepted!
I started with a .fsx REPL script and later moved to a Program.fs with a .fsproj when I needed to pass in parameters and have someone else run it.
I’ll share some snippets of the code, but leave out other parts.
open System
type RecordFromFile = {
    First_ID: string
    ID_With_Dash: string
    ID: string
    FirstName: string
    LastName: string
    EffectiveDate: int
}
let idWithDash (id: string) =
    if not (String.IsNullOrWhiteSpace id) then
        id.Insert(id.Length - 2, "-")
    else ""
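A quick REPL check of idWithDash (repeated here so the snippet stands alone), with made-up IDs; `Insert` places the dash before the last two characters:

```fsharp
open System

// Insert a dash before the last two characters of a non-blank id.
let idWithDash (id: string) =
    if not (String.IsNullOrWhiteSpace id) then
        id.Insert(id.Length - 2, "-")
    else ""

printfn "%s" (idWithDash "12345") // "123-45"
printfn "%s" (idWithDash "")      // ""
```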
let processRecordsFromFile dataFilePath skipToLineInFile =
    // Instead of loading all rows into memory, process rows as they are read from the file or database.
    seq {
        use reader = System.IO.File.OpenText(dataFilePath)
        let mutable lineNumber = 0
        while not reader.EndOfStream do
            let line = reader.ReadLine()
            lineNumber <- lineNumber + 1
            if lineNumber >= skipToLineInFile && not (String.IsNullOrWhiteSpace(line)) then
                let fields = line.Split('|')
                yield {
                    First_ID = fields.[0]
                    ID = fields.[2]
                    ID_With_Dash = idWithDash fields.[2]
                    FirstName = fields.[4]
                    LastName = fields.[5]
                    EffectiveDate = int fields.[8]
                }
    }
open Dapper.FSharp.MSSQL
open Dapper
open Microsoft.Data.SqlClient
// Anonymized MEMBER_CENSUS type
type MEMBER_CENSUS = {
    AlternateId: string
    FirstName: string
    LastName: string
    CoverageEffectiveDate: string
}
// Anonymized MemberJoined type
type MemberJoined = {
    MemberId: string
    MemberStatus: string
    SSN: string
    LastName: string
    AccountKey: decimal
    TerminationDate: DateTime
    PlanStartDate: DateTime
    PlanEndDate: DateTime
}
let createNewConnection connectionString =
    new SqlConnection(connectionString)
let queryDatabaseByIds (conn: SqlConnection) (ids: string list, ids_with_dash: string list) =
    let memberCensusTable = table<MEMBER_CENSUS>
    async {
        // using Dapper.FSharp
        let! censusMembers =
            select {
                for record in memberCensusTable do
                where (isIn record.AlternateId ids)
                selectAll
            }
            |> conn.SelectAsync<MEMBER_CENSUS>
            |> Async.AwaitTask
        let censusMembersList = censusMembers |> Seq.toList
        printfn $"{censusMembersList.Length} Census Members"
        // the ids in the tables were different, so I couldn't do a more efficient direct join
        // I removed some of the joins from my query, but this goes into MemberJoined easily
        // using plain Dapper here
        let! members =
            let query = """
                SELECT
                    m.MemberId,
                    m.MemberStatus,
                    m.SSN,
                    m.LastName,
                    m.AccountKey,
                    mps.TerminationDate,
                    mps.PlanStartDate,
                    mps.PlanEndDate
                FROM dbo.Members m
                INNER JOIN dbo.MemberPlans mps ON m.AccountKey = mps.AccountKey
                WHERE m.MemberId IN @idsWithDash
                """
            let parameters =
                let parameters = new DynamicParameters()
                parameters.Add("@idsWithDash", ids_with_dash)
                parameters
            printfn $"Ids IN {ids_with_dash}"
            conn.QueryAsync<MemberJoined>(query, parameters)
            |> Async.AwaitTask
        let membersList = members |> Seq.toList
        printfn $"{membersList.Length} Members"
        return censusMembersList, membersList
    }
let recordsFromDatabase firstConn recordsFromFile chunkSize maxDegreeOfParallelism = async {
    let processChunk (index: int) (chunk: RecordFromFile seq) =
        printfn $"Processing chunk {index}"
        let conn = ensureConnectionOpenWithRetry firstConn 2
        let ids = chunk |> Seq.map (fun record -> record.ID) |> Seq.toList
        let ids_with_dash = chunk |> Seq.map (fun record -> record.ID_With_Dash) |> Seq.toList
        async {
            let! censusMembers, members = queryDatabaseByIds conn (ids, ids_with_dash)
            printfn $"Chunk {index} Loaded"
            return censusMembers, members
        }
    let! chunkResults =
        recordsFromFile
        |> Seq.chunkBySize chunkSize
        |> Seq.mapi processChunk
        // |> Async.Parallel
        |> fun chunks -> Async.Parallel(chunks, maxDegreeOfParallelism = maxDegreeOfParallelism)
    return chunkResults |> Array.toList
}
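One thing I left out of the snippet above is ensureConnectionOpenWithRetry, a small helper for reopening a dropped connection. The real version works on a SqlConnection, but its core shape is an ordinary retry loop, sketched here generically (the names and counts below are mine, not from the original script):

```fsharp
// Generic retry: run the action, and on failure try again,
// up to `attempts` total tries before letting the exception escape.
let rec retry attempts (action: unit -> 'a) =
    try
        action ()
    with _ when attempts > 1 ->
        retry (attempts - 1) action

// Example: an action that fails twice before succeeding,
// standing in for a flaky connection open.
let mutable calls = 0
let flaky () =
    calls <- calls + 1
    if calls < 3 then failwith "transient failure"
    else "open"

let result = retry 3 flaky
printfn "%s after %d calls" result calls
```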
FSharp.Data for the win!
This won’t compile with the code above because I left out my toCombinedMemberInfo function that puts members into this structure, but you get the idea.
open FSharp.Data
type MemberCsv = CsvProvider<Sample = "Status,Alt_ID,M_ID,FirstName,LastName,EffectiveDate,MEMBER_TERMINATION_DATE,MEMBER_PLAN_START,MEMBER_PLAN_END">
let writeMembersToCsv (filePath: string) members =
    let rows =
        members
        |> List.map (fun m ->
            MemberCsv.Row(
                m.Member.STATUS,
                m.CensusMember.ALT_ID,
                m.Member.ID,
                m.CensusMember.MEMBER_FIRST_NAME,
                m.CensusMember.MEMBER_LAST_NAME,
                m.CensusMember.EFFECTIVE_DATE_OF_COVERAGE,
                string m.Member.TERMINATION_DATE,
                string m.Member.PLAN_START,
                string m.Member.PLAN_END
            ))
    let csv = new MemberCsv(rows)
    csv.Save filePath
// Main function
// dotnet run "Server=test;Database=MyDatabase;Integrated Security=SSPI;MultipleActiveResultSets=True;Connect Timeout=160;Max Pool Size=100;" "./data.txt" "2000"
[<EntryPoint>]
let main argv =
    if argv.Length < 3 then
        printfn "Usage: censusCheck <connectionString> <dataFilePath> <chunkSize>"
        Environment.Exit(1)
    let connectionString = argv.[0]
    let dataFilePath = argv.[1]
    let chunkSize = int argv.[2]
    let maxDegreeOfParallelism = 2
    // for testing, start from this line in the Census txt file
    let skipToLineInCensusTxtFile = 0
    // let connectionString = "Server=test;Database=MyDatabase;Integrated Security=SSPI;MultipleActiveResultSets=True;Max Pool Size=100;Connect Timeout=160;"
    // let dataFilePath = @"./data.txt"
    // let chunkSize = 500
    let terminatedOutputCsvPath = "./terminated_members.csv"
    let missingCensusOutputCsvPath = "./missing_census_members.csv"
    let terminationCsvPath = "./termination.csv"
    printfn $"Starting at {DateTime.Now}"
    let recordsFromFile = processRecordsFromFile dataFilePath skipToLineInCensusTxtFile
    let membersFromDatabase =
        recordsFromFile
        |> Seq.chunkBySize chunkSize
        |> Seq.mapi (fun index chunk ->
            printfn $"Processing chunk {index}. The set is {chunk.Length} items"
            async {
                let conn = createNewConnection connectionString
                conn.Open()
                try
                    let! result = recordsFromDatabase conn chunk chunkSize maxDegreeOfParallelism
                    return result
                finally
                    conn.Close()
                    conn.Dispose()
            })
        |> fun chunks -> Async.Parallel(chunks, maxDegreeOfParallelism = maxDegreeOfParallelism)
        |> Async.RunSynchronously
        |> List.concat
    let combinedMembers =
        toCombinedMemberInfo recordsFromFile membersFromDatabase
    let terminatedMembers =
        combinedMembers
        |> List.distinctBy (fun m -> m.Member.ID)
        |> List.filter (fun m -> m.Member.STATUS = terminatedStatus)
    let missingCensusMembers =
        combinedMembers
        |> List.distinctBy (fun m -> m.Member.ID)
        |> List.filter (fun m -> not m.CensusFound)
    writeMembersToCsv terminatedOutputCsvPath terminatedMembers
    writeMembersToCsv missingCensusOutputCsvPath missingCensusMembers
    printfn $"{terminatedMembers.Length} Terminated members written to {terminatedOutputCsvPath}"
    printfn $"{missingCensusMembers.Length} Missing Census members written to {missingCensusOutputCsvPath}"
    printfn $"{terminatedMembers.Length} Terminated members written to {terminationCsvPath}"
    writeTerminatedMembersToCsv terminationCsvPath terminatedMembers
    printfn $"Completed at {DateTime.Now}"
    0 // Return an integer exit code
I played around with the timeout, chunk size and max degree of parallelism. It all depends on what the database server can handle.
Copilot is quickly becoming an essential tool. Just as StackOverflow replaced books and sometimes co-workers as a reference, Copilot is expanding that role and speeding up the time to a working solution. It was nice to ask about fixing issues right in the editor instead of opening a browser to search.
The ability to ask about optimization improvements, moving from task to async, generating code to review and converting to C# saved me a lot of time and taught me good approaches.
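The task-to-async move mattered because async computations are cold and compose directly with Async.Parallel's throttling, while a task starts running as soon as it is created. A small sketch of the async side, with trivial work standing in for the database calls:

```fsharp
// An async { } block is a cold computation: nothing runs until it is
// started, so a list of them can be fanned out with a bounded degree
// of parallelism, exactly as the chunked queries above are.
let lookupAsync (x: int) = async { return x * 2 }

let results =
    [ 1; 2; 3; 4 ]
    |> List.map lookupAsync
    |> fun work -> Async.Parallel(work, maxDegreeOfParallelism = 2)
    |> Async.RunSynchronously

// Async.Parallel preserves input order in its result array
printfn "%A" results // [|2; 4; 6; 8|]
```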
Overall, I was satisfied with the code it generated and found it very helpful.
I started with the F# code and then asked Copilot to convert it to C#. It did a good job, making static functions. I did prompt it to convert from classes to immutable records and it handled it like a champ.
Are we to a point where it doesn’t matter what language you write code in?
It was nice to apply and use F# in the real world. I’ll definitely keep using it, learning more about it, and coming back to it. I hope we get more opportunities to use it in the future, but until then F# is great for scripting and experimentation.