project on Eric M. Jalbert

Get Bob to Help With Budgeting!

Fri, 26 Jun 2020 00:00:00 +0000

Managing personal finance can be difficult. My wife and I have put some effort into making sure we stay on top of these things. Over the years we have gone through many iterations of paper documents and Google Sheets to keep track of our budgets. Lately I’ve wondered if there is a way to make this a more complete and approachable process.

To that end, I’ve created a personal web app for categorizing spending and tracking savings. I call it Budgeting Bob:

Here is the GitHub repo for the project.
Here is a public demo for the project: log in with username = demo and password = demo.

I’ll first talk about my own viewpoint on how personal finance works. Then I’ll give an overview of the work involved in the Budgeting Bob project. Following that will be my self-reflections on the project as a whole.

Personal Finance Management

Whenever money moves from one place to another, I call that a transaction. Personal finance management is just the meaningful analysis of these movements.

Keeping track of your personal finances is important because otherwise you cannot easily answer basic questions like:

“How much money can I spend on food deliveries this month?”
“Will I be able to go on vacation and still afford rent?”
“I just got paid, why do I have so little money in my bank account?”

These are the types of questions that personal finance management should help answer.

The way I think of it, every time money moves from one place to another, a “transaction” has occurred. For example, whenever I get paid, a transaction moves money from my employer’s account into my savings account. Whenever I pay off my monthly student debt, I make a transaction from my savings account to my OSAP debt account. Whenever I frivolously spend money on a video game, a transaction occurs from my VISA account to some external “spendings” account.

That last example illustrates the idea of categorizing spending, so that a transaction for rent can be separated from a transaction for entertainment (video games). Some categories matter more than others, so it’s useful to be able to set a personal budget for each category.

Below, I have a simple diagram that outlines this idea. Every line represents transactions that can move money from one node to another:

To be honest, I’m not an accountant, so take this mental model of finance as just an opinion. With that in mind, let’s get to the actual project.
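The mental model described above, where every transaction moves money from one node to another and spending lands in an external “spendings” node, can be illustrated in a few lines of Python. This is a hypothetical sketch of the idea, not code from Budgeting Bob: the `Transaction` class, its fields, and the account names are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Transaction:
    """Hypothetical model: money moves from a source node to a destination node."""
    source: str
    destination: str
    amount: float
    category: str = "uncategorized"


def balances(transactions: List[Transaction]) -> Dict[str, float]:
    """Net amount that has flowed into (positive) or out of (negative) each node."""
    net: Dict[str, float] = defaultdict(float)
    for t in transactions:
        net[t.source] -= t.amount
        net[t.destination] += t.amount
    return dict(net)


def spending_by_category(transactions: List[Transaction]) -> Dict[str, float]:
    """Total moved into the external "spendings" node, split by category."""
    totals: Dict[str, float] = defaultdict(float)
    for t in transactions:
        if t.destination == "spendings":
            totals[t.category] += t.amount
    return dict(totals)


history = [
    Transaction("employer", "savings", 3000.0, "income"),
    Transaction("savings", "osap_debt", 400.0, "debt"),
    Transaction("visa", "spendings", 80.0, "entertainment"),
    Transaction("savings", "spendings", 1500.0, "rent"),
]
print(balances(history))
print(spending_by_category(history))
```

Separating rent from entertainment is then just a matter of grouping by `category`, which is what per-category budgets are built on.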

Budgeting Bob

There were three main requirements that Budgeting Bob needed to satisfy:

Manage applying categories to transactions.
Present a simple view of monthly budgets and their current status.
Be able to get an overview of our total wealth over time.

These break down into the three feature pages of the application: Transactions, Budgets, and Account Totals.

Feature 1: Transaction Management

The core problem this application solves is managing transactions, which makes modeling them a very important part of it. After many iterations, the simplest approach was to track how much money each account has and which transactions are associated with each account. At the database level this amounts to two main tables: transactions and accounts.
- For transactions, I mostly copied the data available from RBC’s CSV export: transaction_date, account_id, description_1, and description_2 (RBC provides two distinct description values, and I keep them separate to assist with category assignment).
- For accounts, I have the descriptive information needed to make them readable: type and owner (i.e. type="Savings" and owner="Eric").
- I’ve also included some metadata on the accounts table to help with personal bookkeeping: liquidable and source_of_truth track where the account’s data comes from and whether the account holds usable money. This is needed because some of the accounts are for student debt and car payments.
- For accounts, I also store the initial_amount: the amount of money the account had when I recorded its first transaction. This value is used on the Account Totals page.
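As a rough illustration, the two tables could be declared as follows. This sketch uses Python’s built-in `sqlite3` so that it runs anywhere. The columns named in the list above come from the post. The `id`, `amount`, and `category` columns, the column types, and the sign convention for amounts are my assumptions. The real app, hosted on Heroku, may use a different database.

```python
import sqlite3

SCHEMA = """
CREATE TABLE accounts (
    id               INTEGER PRIMARY KEY,
    type             TEXT NOT NULL,      -- e.g. 'Savings'
    owner            TEXT NOT NULL,      -- e.g. 'Eric'
    liquidable       BOOLEAN NOT NULL,   -- does the account hold usable money?
    source_of_truth  TEXT,               -- where this account's data comes from
    initial_amount   NUMERIC NOT NULL    -- balance when the first transaction was recorded
);
CREATE TABLE transactions (
    id                INTEGER PRIMARY KEY,
    transaction_date  DATE NOT NULL,
    account_id        INTEGER NOT NULL REFERENCES accounts(id),
    description_1     TEXT,
    description_2     TEXT,
    amount            NUMERIC NOT NULL,  -- assumed column: negative = money leaving
    category          TEXT               -- assumed column: NULL until categorized
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO accounts (type, owner, liquidable, source_of_truth, initial_amount) "
    "VALUES (?, ?, ?, ?, ?)",
    ("Savings", "Eric", True, "RBC", 1000),
)
conn.execute(
    "INSERT INTO transactions "
    "(transaction_date, account_id, description_1, description_2, amount) "
    "VALUES (?, ?, ?, ?, ?)",
    ("2020-06-01", 1, "utility bill pmt", "enbridge gas #12023", -85.50),
)
conn.commit()
```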

Categories are also applied automatically. Whenever a new transaction is added to the table, I look up the category of the most recent transaction with a matching description_1. For example, once we manually categorize a utility bill with description_1='utility bill pmt' and description_2='enbridge gas #12023', the category is applied automatically the next time that same description_1 appears. The current algorithm has only been tested with RBC’s data export, and I’m not sure whether other banks provide a similar separation of descriptive values.
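The matching rule can be sketched as a plain Python function. The sketch makes several assumptions: transactions are available as dicts with `transaction_date`, `description_1`, and `category` keys, and dates are ISO strings. The function name and data shape are hypothetical. The post describes this happening when rows are added to the table, not in application code like this.

```python
from typing import Dict, List, Optional


def suggest_category(new_txn: Dict, history: List[Dict]) -> Optional[str]:
    """Return the category of the most recent categorized transaction whose
    description_1 matches the new one, or None if there is no match."""
    matches = [
        t for t in history
        if t["description_1"] == new_txn["description_1"] and t.get("category")
    ]
    if not matches:
        return None
    # ISO-8601 date strings sort chronologically, so max() finds the latest one
    latest = max(matches, key=lambda t: t["transaction_date"])
    return latest["category"]


history = [
    {"transaction_date": "2020-04-02", "description_1": "utility bill pmt", "category": "Utilities"},
    {"transaction_date": "2020-05-02", "description_1": "utility bill pmt", "category": "Gas"},
    {"transaction_date": "2020-05-10", "description_1": "misc payment", "category": None},
]
new_txn = {"transaction_date": "2020-06-02", "description_1": "utility bill pmt"}
print(suggest_category(new_txn, history))  # Gas
```

Using the most recent match, rather than the most common one, means a manual re-categorization takes effect immediately for future transactions.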

Feature 2: Budget Statuses

This page is pretty self-explanatory. It’s just the monthly aggregation of transactions for each category (from the Transactions page above). The monthly budget amounts are set at the database level by the database initialization script.
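The monthly aggregation amounts to a group-by over (month, category), followed by subtracting the total from that category’s budget. This is a hypothetical sketch: the data shape, the budget dictionary, and the convention that spending is a positive amount are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def monthly_status(
    transactions: List[Dict], budgets: Dict[str, float]
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Spent and remaining amount per (YYYY-MM, category)."""
    spent: Dict[Tuple[str, str], float] = defaultdict(float)
    for t in transactions:
        month = t["transaction_date"][:7]  # 'YYYY-MM-DD' -> 'YYYY-MM'
        spent[(month, t["category"])] += t["amount"]
    return {
        key: {"spent": total, "remaining": budgets.get(key[1], 0.0) - total}
        for key, total in spent.items()
    }


txns = [
    {"transaction_date": "2020-05-03", "category": "Food", "amount": 120.0},
    {"transaction_date": "2020-05-20", "category": "Food", "amount": 80.0},
    {"transaction_date": "2020-06-01", "category": "Cat", "amount": 30.0},
]
budgets = {"Food": 250.0, "Cat": 40.0}
for key, status in sorted(monthly_status(txns, budgets).items()):
    print(key, status)
```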

For us, this is the main value proposition of Budgeting Bob. My wife and I needed a dead-simple way to check our status, and this page provides it. Building on this, I’ve also added an “Overall Overage” column (which probably has a more official financial term). Overall Overage handles cases like paying for a cat clinic visit, where our cat spending for that one month is 10x the budget. This is “okay” because we only pay for that clinic visit once a year, but it makes planning the budget difficult because it skews everything. Overall Overage is the sum of the Remaining Amount values from all previous months, so we can see whether it stays near zero. If the value is around 0, the budget is good; if it’s way over, it might indicate that we’re consistently over budget and need to adjust.
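As described, Overall Overage is a running sum of a category’s monthly Remaining Amount. A minimal sketch makes a few assumptions: the remaining amounts are already computed per month in chronological order, positive means under budget, and negative means over. The numbers are made up.

```python
from itertools import accumulate
from typing import List


def overall_overage(remaining_by_month: List[float]) -> List[float]:
    """Running total of Remaining Amount, one value per month."""
    return list(accumulate(remaining_by_month))


# A couple of under-budget months, one expensive clinic visit, then recovery
cat_remaining = [30.0, 30.0, -90.0, 30.0]
print(overall_overage(cat_remaining))  # [30.0, 60.0, -30.0, 0.0]
```

A single bad month shows up as a temporary dip, and the total drifting back toward 0 is the signal that the budget is fine on average.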

Feature 3: Account Total Graphs

The Account Totals graph is just a simple way to see all our accounts on one page. This data comes for free, since we’re already tracking all the transactions and we have the initial_amount for each account. The page also helps with double-checking values, since each total should agree with the corresponding bank account.
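Because each account has an initial_amount and a list of transactions, a daily balance is a running sum that starts from initial_amount. The sketch below assumes signed amounts (deposits positive, withdrawals negative) and ISO date strings. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def daily_balances(initial_amount: float, transactions: List[Dict]) -> List[Tuple[str, float]]:
    """(date, end-of-day balance) for each day that has at least one transaction."""
    net_per_day: Dict[str, float] = defaultdict(float)
    for t in transactions:
        net_per_day[t["transaction_date"]] += t["amount"]

    balance = initial_amount
    result = []
    for day in sorted(net_per_day):  # ISO dates sort chronologically
        balance += net_per_day[day]
        result.append((day, balance))
    return result


txns = [
    {"transaction_date": "2020-06-02", "amount": -50.0},
    {"transaction_date": "2020-06-01", "amount": 2000.0},
    {"transaction_date": "2020-06-02", "amount": -25.0},
]
print(daily_balances(1000.0, txns))  # [('2020-06-01', 3000.0), ('2020-06-02', 2925.0)]
```

Summing the latest balance of every account then gives the total, and each account’s last value is what should match the bank.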

The daily graph is a leftover from a scrapped feature. I originally planned a way to forecast how much money we’d have over the next n months, to help with long-term planning. For this I was using linear regression, with another slider that selected the date range used to fit the line. Without getting too much into the details, this feature was taking more time than I wanted, and it wasn’t going to add much value since we could already approximate the trend from the graph. Also, my wife did not care about forecasting, so it wasn’t really worth it.
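For reference, a forecast of this kind works as follows: fit an ordinary least-squares line to the balances inside the slider’s selected range, then extend it n days forward. This is a hypothetical sketch of the scrapped idea using only the standard library, not the code that was written. It uses day indices in place of real dates.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(balances: List[float], start: int, end: int, days_ahead: int) -> float:
    """Fit on balances[start:end] (the slider's range) and extrapolate past day end - 1."""
    xs = list(range(start, end))
    slope, intercept = fit_line(xs, balances[start:end])
    return slope * (end - 1 + days_ahead) + intercept


history_balances = [1000.0, 1010.0, 1020.0, 1030.0, 1040.0]
print(forecast(history_balances, 0, 5, 10))  # 1140.0
```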

As for the visualization, I originally used Chart.js but really wanted a slider for picking dates. While researching how to do this, I found this example. It seemed like the perfect solution to a problem I didn’t know how to solve otherwise, so I switched from Chart.js to amCharts 4 (am4charts).

Other Requirements

There were lots of other minor requirements needed to make this fully functional.

Working with Heroku

I wanted a simple cloud platform to host my application, and I did not want to spend any money. Heroku is awesome and I’ve used it in the past, so I decided to use it for a simple setup.

Demo mode

I wanted a way to showcase this project without publicly revealing my personal bank information. This led to a demo mode with fake data.

Because this was hosted on Heroku, it was very easy to set up a separate workflow for demo deployments: I created a new Heroku app from the same repo and added a heroku-demo remote to my local git repo. This way I can deploy with git push heroku-demo and git push heroku. The only hard part was generating the fake demo data, which I did by altering my real transaction data with some random numbers. The actual script can be viewed here.
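The author’s actual script is the one linked above. The following is a hypothetical sketch of the same general idea: scale each real amount by a random factor, and seed the generator so that every demo deploy gets the same fake data. The ±50% range and the field names are assumptions.

```python
import random
from typing import Dict, List


def fake_transactions(real: List[Dict], seed: int = 42, spread: float = 0.5) -> List[Dict]:
    """Copy transactions, scaling each amount by a random factor in [1 - spread, 1 + spread].

    Only "amount" is changed; every other field is copied as-is.
    """
    rng = random.Random(seed)  # seeded so the demo data is reproducible
    return [
        {**t, "amount": round(t["amount"] * rng.uniform(1 - spread, 1 + spread), 2)}
        for t in real
    ]


real = [
    {"transaction_date": "2020-06-01", "description_1": "utility bill pmt", "amount": -85.50},
    {"transaction_date": "2020-06-03", "description_1": "payroll deposit", "amount": 2000.00},
]
demo = fake_transactions(real)
print(demo)
```

Because the factor is always positive, deposits stay deposits and withdrawals stay withdrawals, so the demo budgets still look plausible.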

Database Initialization

I originally wanted to use Flask to manage the database, but I ended up writing a custom bash script since I had done the original setup manually. This isn’t very clean, but it works in this case. Next time, though, I’ll write out the data models and let Flask and SQLAlchemy manage the database, since that would simplify the whole workflow.

RBC Transaction Automation

I currently use RBC as my bank. RBC has a “Download Transactions” page that lets users get a CSV of all their transactions. Using Selenium, I wrote an experimental script that navigates to that page and automates the data entry from the CSV export. Ideally I’d use the RBC Developer API, but I was never able to successfully register.
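The Selenium navigation is specific to RBC’s site, but the import step that follows is ordinary CSV parsing. The sketch makes two assumptions that should be checked against a real download. First, the export’s header row includes columns named "Transaction Date", "Description 1", "Description 2", and "CAD$". Second, dates use the M/D/YYYY format. The output keys mirror the transactions table described earlier. The sample row is made up.

```python
import csv
import io
from datetime import datetime
from typing import Dict, List


def parse_rbc_csv(text: str, account_id: int) -> List[Dict]:
    """Turn an RBC-style CSV export into rows for the transactions table."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            # assumed M/D/YYYY input, stored as an ISO date string
            "transaction_date": datetime.strptime(
                record["Transaction Date"], "%m/%d/%Y"
            ).date().isoformat(),
            "account_id": account_id,
            "description_1": record["Description 1"].strip(),
            "description_2": record["Description 2"].strip(),
            "amount": float(record["CAD$"]),
        })
    return rows


sample = (
    '"Account Type","Account Number","Transaction Date","Cheque Number",'
    '"Description 1","Description 2","CAD$","USD$"\n'
    '"Chequing","00000-0000000","6/1/2020","","UTILITY BILL PMT",'
    '"ENBRIDGE GAS #12023","-85.50",""\n'
)
print(parse_rbc_csv(sample, account_id=1))
```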

Project Reflections

I want to take a step back to write down some of the self-reflections I had on this project.

What Did I Enjoy

Working with Heroku was very easy for both local development and deployments.
Working with Jinja2 is interesting because it always has the features I need; I just never know what they’re called. Macros were useful for keeping things DRY, but since I didn’t know the term at first, I had to search more than I’d like.
Working with my wife on planning Budgeting Bob was nice.

What Did I Dislike

Login Authentication. I’m not confident enough with writing security systems for web applications. I think it’s something I’ll need to research a bit more moving forward.
The RBC Developer API portal never responded when I requested access, which forced me to use Selenium.
Using Selenium for the RBC automation. It’s a hacky, messy, and unreliable workflow; I think next time I do a brute-force web scraping task I might take a step back and research other solutions first.

What Did I Find Difficult

Front-end development. It feels like I’m actively working against myself whenever I do anything, probably because I never took a holistic approach to the front end. Next time I do serious front-end work, I might try a front-end library like React.
Having Heroku handle some environment variables while local .env files handled others made some things confusing.

What Did I Learn

Getting just the Minimum Viable Product working is extremely useful. My wife and I have been successfully using Budgeting Bob for many months now, even while it was still in development. The only things we needed were the database to store transactions and the budget status page. At first I was manually downloading the CSV and writing SQL queries to UPDATE the categories of those transactions. The workflow was painful, but it helped prioritize the next steps (automating those manual steps). Over time, small changes made Budgeting Bob easier and more useful, but the MVP was functional all along. If I had waited until all the features were complete before using it, I might have ended up with a very different application that wouldn’t solve our actual problems.
Not using SQLAlchemy to manage the data models means you have to write a lot of things yourself. Lesson learned: for web applications, I should avoid raw SQL queries (via db.execute).
I started the project doing local dev work against the production database, so adding the demo workflow was shoehorned in instead of planned. Even for personal projects, I might consider having local, staging, and production environments to help toward the end of a project.
In all honesty, this application is probably overkill for personal finance. If you just want to budget, use a Google Sheet like: https://themeasureofaplan.com/budget-tracking-tool/.

What Would I Do Next For This Project

I’ve noticed that the application is not very mobile friendly. It runs slower, and some of the text overlaps painfully. I would put some effort into profiling the application and finding ways to optimize its speed and UI for mobile.
Create more automation scripts for other accounts.

Thank you for reading this!

How to Automate Job Applications

Thu, 11 Jun 2020 00:00:00 +0000

Creating job applications is a needed effort to progress in a professional career. When I do a large batch of job applications I find that I follow one of two main methodologies:

Be non-optimal and use the same resume for all job applications. This is easy, but ineffective.
Create a specialized job application for each job description that I apply to. This is hard, but tends to yield better results.

If I’m actively searching for a new position I’ll tend to do the latter option, but I’ve started to wonder if there is a better way to handle this. To this end I planned to automate the creation of specialized job applications.

The final product can be viewed on GitHub at Resume Generator.

I’ll first give an overview of my work on the project. Following that will be my self-reflections on the project as a whole.

There are four main components to this project:

Create a master resume that contains all the information about my professional career.
Have a repeatable way to create a rendered resume PDF.
Parse a given job description into a list of requirements.
Intelligently select a subset of the master resume based on the job description’s requirements.

Component 1: Create a Master Resume

TL;DR I created a JSON object with all my data. Look at my master_resume to see the result.

The main idea of this component is to stop manually managing my resume, which is a difficult task. With each job application I have to edit the most recent version of my resume, and over time I lose details and waste time hunting down specific information I’ve written in the past. To fix this, I converted my resume into a format that is easily readable by both machines and humans: JSON. This master resume contains all my professional bullet points so that I don’t lose them over time, and it gives me an easy place to update achievements.

Notes About Master Resume Format

I was originally going to use TinyDB to manage a local database of job descriptions, work highlights, and skills. TinyDB stores its tables as JSON on the local disk. The main issue is that I only need to manage one resume’s worth of data, and it’d be difficult to manage multiple JSON tables (or one complex JSON document with all the data in one big table). This idea might be worth reconsidering if the project were supposed to serve multiple “master resumes”, or if there were some interface to manage foreign keys across the tables, but that is outside the scope of this project.

Instead of TinyDB, I chose to follow the JSON Resume schema because it’s the “easiest” format to maintain by hand. I originally followed the schema exactly, but I had some problems with the way “Skills” were stored, so I diverged into my own variant of the JSON Resume schema. The main differences are in Education and Skills:

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Education:
    """Model academic experiences and highlights"""
    institution: str
    area: str
    studyType: str
    startDate: str
    endDate: str
    # I removed the "gpa" and "courses" fields from JSON Resume
    # gpa: str
    # courses: List[str]
    # I added "thesis" and "publications"
    thesis: Optional[str] = None
    publications: Optional[str] = None


@dataclass
class Skills:
    """Model skill highlights"""
    name: str
    # I removed the "level" field from JSON Resume
    # level: str
    keywords: List[str]

This change meant I couldn’t use any pre-built tools for validating JSON resumes. It turned out not to be a big loss, since I could set up a few simple pytest tests to do basic validation:

import io
import json

import pytest

from resume_generator.general.resume import Resume


def test_resume_file_is_valid():
    json_resume = Resume()
    assert json_resume.validate()  # Custom method to check types and existence of some keys


def test_resume_file_can_fail():
    fake_resume_data = json.dumps({"fake_resume": "blahblah"})
    fake_resume_file = io.StringIO(fake_resume_data)
    with pytest.raises(TypeError):
        Resume(fake_resume_file)
    fake_resume_file.close()


def test_resume_fields_can_be_accessed():
    test = Resume()
    test.basics.name
    test.basics.location
    [job.company for job in test.work]
    [skill.keywords for skill in test.skills]

The other benefit of writing my own validator is that I didn’t need to include dummy values for “required” fields of the original schema (e.g. location.address, and entire sections like interests and volunteer).

Notes About Using `dacite`

As part of this project, I wanted to really get to know @dataclass in Python, so I tried modeling everything with it. I ended up using dacite as a “shortcut” to parse JSON into dataclasses, since that seemed like such an obvious pattern. In retrospect, it might have been better for readability not to use the package, because now I have this “magic” function that just handles the initialization. That being said, it’s pretty slick and easy to use, so I’ll accept the loss of readability for the convenience it brings.
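To make the readability trade-off concrete: dacite’s entry point is `dacite.from_dict(data_class=..., data=...)`. A hand-written equivalent is sketched below. It is much simpler than dacite: it handles nested dataclasses and `List[...]` fields only, and it skips dacite’s type checking, `Optional` handling, and unions. The `MiniSkill` and `MiniResume` classes are hypothetical stand-ins.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List, get_args, get_origin


def from_dict(cls: type, data: dict) -> Any:
    """Build dataclass `cls` from a plain dict, recursing into nested dataclasses.

    Keys that don't match a field are ignored; a missing required field makes
    the dataclass constructor raise TypeError.
    """
    kwargs = {}
    for f in fields(cls):
        if f.name in data:
            kwargs[f.name] = _convert(f.type, data[f.name])
    return cls(**kwargs)


def _convert(field_type: Any, value: Any) -> Any:
    """Convert one JSON value to match a field's type annotation."""
    if is_dataclass(field_type) and isinstance(value, dict):
        return from_dict(field_type, value)
    if get_origin(field_type) is list:
        (item_type,) = get_args(field_type)
        return [_convert(item_type, item) for item in value]
    return value  # str, int, Optional[...], etc. are passed through unchanged


# Hypothetical, trimmed-down models for the example
@dataclass
class MiniSkill:
    name: str
    keywords: List[str]


@dataclass
class MiniResume:
    skills: List[MiniSkill]


resume = from_dict(MiniResume, {"skills": [{"name": "Python", "keywords": ["pytest", "dataclasses"]}]})
print(resume.skills[0].keywords)
```

Even this stripped-down version needs type introspection (`get_origin`, `get_args`), which is the part that makes a library call feel like “magic”.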

Component 2: Repeatable Way to Render Resume

TL;DR I used best-resume-ever with moderate success to generate a PDF resume. I still think my original resume looks better, though.

This phase is the bread and butter of the entire project. It represented the main pain point in my job application process, so I needed a way to generate PDFs.

Which Format To Render: PDF Or `docx`

Because of past problems creating a docx version of my resume, the initial plan was to make docx the primary rendering format. From docx it’s pretty easy to get a nice PDF, whereas PDF formatting tends to get mangled when converting to docx. The problem is that nothing really does this out of the box, and the APIs and libraries that create docx files are pretty hard to work with. After some effort testing different approaches, I decided to focus on an easy setup that creates a PDF version of my resume. best-resume-ever provides a one-button build step, and it’s simple enough for me to understand how to edit the existing templates.

About the External Project Repo

Using the external tool best-resume-ever brings another project repo into my workflow. This isn’t ideal, but with proper installation automation and environment variables to locate the external repo, it’s easy enough to work with. I could probably invest more time into a cleaner solution, but since it works and isn’t a real bottleneck, it doesn’t make sense to worry about it at this point.
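One way to locate the external repo is to read an environment variable and fail with a clear message when it is missing or wrong. This is a hypothetical sketch: the post doesn’t name the variable the project uses, so `BEST_RESUME_EVER_DIR` and the function name are made up.

```python
import os
from pathlib import Path


def best_resume_ever_dir(env_var: str = "BEST_RESUME_EVER_DIR") -> Path:
    """Return the path to the external best-resume-ever checkout, or fail loudly."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"Set {env_var} to the path of your best-resume-ever clone")
    path = Path(value).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"{env_var} points to {path}, which is not a directory")
    return path
```

Failing early with the variable name in the message makes a broken setup obvious, instead of surfacing later as a confusing build error.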

Together with the master resume, I can now generate a resume, just a very large one that renders every single bullet point in my master resume.

Component 3: Job Description Parser

TL;DR Took a simple approach and just grabbed every bullet point (<li> tag) that matches some basic conditions. Turns out this includes company benefits.

Now we are getting into the more unique part of the project: this is what turns a basic “Create-a-resume” project into an “Automatically-customized-resume” project. Remember, the whole point is to use a subset of my master resume to create a specially tailored job application. To do that, I first needed to scrape the job requirements from an online job description. Because job descriptions almost always list their requirements as bullet points, I just needed to grab the <li> tags using a simple HTML parsing tool; I chose Beautiful Soup. This is pretty easy overall, apart from a few simple edge cases:

List of links in the footer for site navigation; just remove points that contain an <a>.
Nonsense <li> elements in the header; just remove points that are too short (fewer than 4 words).
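The two filters can be sketched as follows. The post uses Beautiful Soup, where the core is roughly `soup.find_all("li")`. To keep this sketch dependency-free, it swaps in the standard library’s `html.parser` instead. It assumes every <li> is explicitly closed. The sample HTML is made up.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class ListItemCollector(HTMLParser):
    """Collect the text of each top-level <li>, noting whether it contains an <a>."""

    def __init__(self) -> None:
        super().__init__()
        self.items: List[Tuple[str, bool]] = []
        self._depth = 0  # > 0 while inside an <li>
        self._chunks: List[str] = []
        self._has_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            if self._depth == 0:  # starting a new top-level item
                self._chunks, self._has_link = [], False
            self._depth += 1
        elif tag == "a" and self._depth > 0:
            self._has_link = True

    def handle_endtag(self, tag):
        if tag == "li" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                text = " ".join("".join(self._chunks).split())  # collapse whitespace
                self.items.append((text, self._has_link))

    def handle_data(self, data):
        if self._depth > 0:
            self._chunks.append(data)


def parse_requirements(html: str, min_words: int = 4) -> List[str]:
    """Keep <li> texts that contain no link and have at least `min_words` words."""
    parser = ListItemCollector()
    parser.feed(html)
    parser.close()
    return [
        text for text, has_link in parser.items
        if not has_link and len(text.split()) >= min_words
    ]


SAMPLE_HTML = """
<ul class="nav"><li>Home</li><li>Careers</li></ul>
<ul>
  <li>You have experience writing complex SQL queries.</li>
  <li>Browse all of our <a href="/jobs">open positions</a> here.</li>
</ul>
"""
print(parse_requirements(SAMPLE_HTML))
```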

There is one interesting issue that couldn’t be easily solved with the simple approach of grabbing <li> tags. Almost every job description also includes a section dedicated to the company selling itself to you. As a potential candidate I appreciate this, but those points are clearly not supposed to be requirements for a custom job application. A resume shouldn’t include a skill bullet point that says “I am excellent at ping pong” just because the company mentions it regularly holds ping pong tournaments. Here is a stripped-down example of a job description that does this:

If I naively pull just the <li> points, I’ll include all the “compensation” bullet points. To fix this, I first made a futile attempt to read the parent <div> to see if it said “Compensation” or “What we offer”, but that was overly complex. Instead, I created a list of “company benefit words” that usually appear in points that are not actual job requirements:

company_benefit_word_list = [
    "annual",
    "benefits",
    "catered",
    "coffee",
    "company retreat",
    "great place to work",
    "groceries",
    "healthcare",
    "laptop",
    "ping pong",
    "retirement",
    "salary",
    "vacation",
    "we offer",
    "weeks",
]

This has the risk that I might overzealously remove a legitimate job requirement, but I think I’m safe with most of these words.
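Applying the word list can be as simple as a case-insensitive substring check. This is a sketch: the function name is hypothetical, and substring matching is my assumption about how the list is applied.

```python
from typing import List

COMPANY_BENEFIT_WORDS = [
    "annual", "benefits", "catered", "coffee", "company retreat",
    "great place to work", "groceries", "healthcare", "laptop",
    "ping pong", "retirement", "salary", "vacation", "we offer", "weeks",
]


def remove_company_benefits(points: List[str]) -> List[str]:
    """Drop bullet points that mention any company benefit word (case-insensitive)."""
    return [
        p for p in points
        if not any(word in p.lower() for word in COMPANY_BENEFIT_WORDS)
    ]


points = [
    "You are a skilled written communicator.",
    "Four weeks of paid vacation.",
    "Weekly ping pong tournaments!",
]
print(remove_company_benefits(points))  # ['You are a skilled written communicator.']
```

Substring matching is also where the over-removal risk comes from. A real requirement such as “Experience with annual budget planning” would be dropped because it contains “annual”.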

Job Description Output

Parsing a job description produces a list of strings representing the job’s requirements. For the example HTML page above, it would produce the following list:

[
"You have experience in SQL: You've used written complex SQL queries that join across data from multiple systems, matching them up even when there was not a straightforward way to join the tables. You've designed tables with an eye towards ease of use and high performance. You've documented schemas and created data dictionaries.",
"You are a skilled written communicator. We are a 100% remote team and writing is our primary means of communication.",
"You appreciate our team's values of eagerness to collaborate with teammates from any function of the organization or with any level of data knowledge, iterating over your deliverables, and being curious.",
"You understand that the perfect is the enemy of the good and default to action by shipping MVP code and iterating as needed to get towards better solutions."
]

Component 4: Create “Intelligent” Subset of Master Resume

TL;DR Used the simplest implementation. Made it work using Universal Sentence Encoder and cosine similarity to find the resume highlights that are most relevant to the job description.

This is the “Data Science” part of the project. To get a specialized resume for a job application, I needed a way to select a subset of my master resume. For this, I used Universal Sentence Encoder to convert text into vectors, and then used cosine similarity to score each highlight of my master resume against every job description point. Summing these individual values gives each resume highlight a single score that represents its overall “closeness” to the job description. With a score for each resume highlight, I just need to select the top 2-4 highlights per work experience, and I have a resume tailored to the given job description.
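Once every highlight has a score, the selection step is a per-job sort and cut. This is a hypothetical sketch: it assumes the scores are already computed and uses a fixed top-N (the post says 2-4). The function name, data shape, and scores are all made up.

```python
from typing import Dict, List, Tuple


def select_highlights(
    scored: Dict[str, List[Tuple[str, float]]], top_n: int = 3
) -> Dict[str, List[str]]:
    """Keep the top_n highest-scoring highlights for each work experience.

    `scored` maps a company name to (highlight, score) pairs.
    """
    selected = {}
    for company, pairs in scored.items():
        best = sorted(pairs, key=lambda pair: pair[1], reverse=True)[:top_n]
        selected[company] = [highlight for highlight, _ in best]
    return selected


scored = {
    "Cool Company, Inc.": [
        ("Planned projects with stakeholders", 0.80),
        ("Used AWS to build infrastructure", 0.40),
        ("Mentored two junior developers", 0.60),
    ]
}
print(select_highlights(scored, top_n=2))
```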

Let’s run through a simple example to see this in action.

Simple Example

Let’s assume my master resume has 2 resume highlights:

"work": [
  {
    "company": "Cool Company, Inc.",
    ...
    "highlights": [
      "I used my experience at planning projects and communicating with stakeholders to remove inefficiencies in the day-to-day workflow.",
      "My expertise with AWS helped architecture a scalable infrastructure."
    ]
  }
]

These are both legitimate highlights for a developer, but the first highlight is better for a project manager and the second is better for software architects or start-ups.

Let’s say we have some obvious job description bullets like:

“Experienced with managing large projects.”

“Skilled at communicating with leadership team.”

“Handle everyday planning of tasks and duties.”

The algorithm converts every sentence into a vector representation using the Universal Sentence Encoder. Without getting into too much detail, this essentially turns the sentence “Experienced with managing large projects” into a list of 512 numbers between -1 and 1. This list of numbers (a vector) represents the “meaning” of the sentence in a form that can be compared against other vectors.

Using the vector encoding of each resume highlight, I can calculate its “closeness” to each job description point. Closeness is calculated using cosine similarity, which is the cosine of the angle between the two vectors. A cosine score of 1 is the most similar two vectors can be, and a score of -1 is as opposite as they can be. The final step is to sum the cosine scores to get a single value for each resume highlight.
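In symbols, a highlight h gets the score score(h) = Σ_j cos(v_h, v_j) = Σ_j (v_h · v_j) / (‖v_h‖ ‖v_j‖), where the sum runs over the job-description points j. The sketch below shows only the arithmetic, in pure Python. Toy 3-dimensional vectors stand in for the 512-dimensional Universal Sentence Encoder embeddings, which this code does not compute.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: 1 = same direction, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def highlight_score(highlight_vec: Sequence[float], job_vecs: List[Sequence[float]]) -> float:
    """Sum of cosine similarities between one highlight and every job description point."""
    return sum(cosine_similarity(highlight_vec, v) for v in job_vecs)


job_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(highlight_score([1.0, 1.0, 0.0], job_vecs))  # ~1.414: 45 degrees from each point
print(highlight_score([0.0, 0.0, 1.0], job_vecs))  # 0.0: orthogonal to both points
```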

Let’s go back to our example. Plugging in the resume highlights and job descriptions, we get these actual values:

RESUME_HIGHLIGHT: "I used my experience at planning projects and communicating
 with stakeholders to remove inefficiencies in the day-to-day workflow."
JOB_DESCRIPTIONS:
"Experienced with managing large projects." | cosine score: "0.3377"
"Skilled at communicating with leadership team." | cosine score: "0.1514"
"Handle everyday planning of tasks and duties." | cosine score: "0.2358"
OVERALL_SCORE: "0.7249"
RESUME_HIGHLIGHT: "Used AWS to architecture a scalable infrastructure."
JOB_DESCRIPTIONS:
"Experienced with managing large projects." | cosine score: "0.2326"
"Skilled at communicating with leadership team." | cosine score: "0.0856"
"Handle everyday planning of tasks and duties." | cosine score: "0.1974"
OVERALL_SCORE: "0.5156"

In this case, the first highlight (the more “project manager” one) has the higher overall score, so it is the one the algorithm would select.

Problems With Scoring Algorithm

This methodology has never actually been tested or compared to others. I’ve manually checked that the output differs between job descriptions, but I’ve only tried it on 3 applications. It would make much more sense to compare it to other methods and have a workflow for testing and improving it. But at this point I just wanted to start using this project to apply to jobs, so I settled for the first thing that worked.

An obvious next step would be to test this by scraping LinkedIn or whoishiring.io for a large number of job postings and comparing the generated resumes across them. One way to do this would be to make a resume with obvious project-management highlights and obvious software-developer highlights, then count how many times the correct highlights make it into the subset. However, this is all future work.
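The proposed check could be scored as the fraction of selected highlights that carry the label expected for the job. This is a hypothetical sketch: the “pm”/“dev” labels and the idea of tagging each highlight by hand are assumptions. Over many scraped postings, averaging this value would give a single number to compare scoring methods.

```python
from typing import Dict, List


def selection_accuracy(selected: List[str], labels: Dict[str, str], expected: str) -> float:
    """Fraction of selected highlights whose label matches the job's expected label."""
    if not selected:
        return 0.0
    return sum(labels[h] == expected for h in selected) / len(selected)


labels = {"planned projects": "pm", "ran standups": "pm", "built AWS infra": "dev"}
print(selection_accuracy(["planned projects", "built AWS infra"], labels, expected="pm"))  # 0.5
```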

What Automatic Resume Generation Looks Like:

Because many components are needed to generate the full resume, I’ve moved all of the steps into a Makefile to simplify the workflow. Below is an example of a “job application” using this project (~30 seconds).

Project Reflections

I want to take a step back to write down some of the self-reflections I had on this project.

What Did I Enjoy

Working with Makefiles. I used them a lot back in grad school and haven’t used them for anything real in a while, but the ability to encapsulate a workflow in a single command is very useful whenever multiple technologies interact. I’m not sure I’ll use one for every project, but it was easy enough to work with.

What Did I Dislike

When writing the snapshot test cases, I noticed that some tests failed even though the output looked like an exact match. It turned out Vim was adding an end-of-line character to the end of each file whenever I opened it to visually inspect the cases. This caused previously passing test cases to fail for reasons that were hard to diagnose.

The actual fix to this was to add set nofixendofline to my .vimrc file.
The clue that helped me identify the problem came when I manually compared the files using diff and saw: “\ No newline at end of file”

Adding the pre-trained TensorFlow Hub model made the normal workflow and test cases run much slower: from less than 1 second to about 30 seconds, even after optimizations. This wasn’t a real blocker, but I noticed that the slow test cases really sapped my motivation, and it’s something I might think about more carefully in future projects.

What Did I Find Difficult

PDF generation. Just all of it. Before landing on best-resume-ever, I experimented with many other solutions, and they were either annoyingly difficult to work with or inconsistent in their output. This part of the project also highlighted my weakest technical ability, front-end development, which is a skill set I might need to start taking more seriously moving forward.

What Did I Learn

You don’t have to make everything perfect. I was starting to lose motivation by the end of the project. With the number of things that were not “quite right” (PDF generation, test cases, complex code structure), I didn’t see the project’s completion coming anytime soon. The main lesson was that I could defer a lot of work to future optimizations; I just needed something that “worked”. This let me simplify the ML algorithm (no analysis of the highlight-scoring accuracy), simplify the test cases (only the end-to-end test was needed), and not worry about how the PDF was generated. These are things I can improve later if they need improving.

What Would I Do Next For This Project

There are a lot of things that I think need to be worked on for this project, but at a high-level I think it’d be important to:

Update the scoring algorithm for resume highlights. Not only the ML model (i.e. maybe Universal Sentence Encoder isn’t the best option), but also the way I aggregate the scores.
- To expand on that second point, the current approach of summing all the scores for each resume highlight is not ideal. It favors highlights that do moderately well on many bullet points over ones that nail a single bullet point (e.g. a highlight that is a 90% match to only 1 bullet might not make the cut against one that is a 20% match to 5 different bullets).
Add functionality to make a cover letter (using the same formatting as best-resume-ever). Not many companies look for cover letters, but I think they still add a nice level of polish to a job application.
Add a way to update my LinkedIn with the same content as master_resume.json, so that I only have to manage it in one place.
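The aggregation concern is easy to see with the 90% vs. 20% numbers. The sketch below compares the current sum aggregation with max aggregation, one possible alternative. The project does not implement the max variant, and the per-bullet scores are made up to match that example.

```python
from typing import Callable, Dict, List


def rank(highlights: Dict[str, List[float]], aggregate: Callable[[List[float]], float]) -> List[str]:
    """Order highlights by an aggregate of their per-bullet cosine scores, best first."""
    return sorted(highlights, key=lambda h: aggregate(highlights[h]), reverse=True)


scores = {
    "specialist": [0.9, 0.0, 0.0, 0.0, 0.0],  # strong match to one bullet
    "generalist": [0.2, 0.2, 0.2, 0.2, 0.2],  # weak match to five bullets
}
print(rank(scores, sum))  # ['generalist', 'specialist'], since 1.0 > 0.9
print(rank(scores, max))  # ['specialist', 'generalist']
```

Max has the opposite bias, because it ignores breadth entirely. A blend of the two, or a sum over only the top few matches, would sit between them.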

Thank you for reading this!
