<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>project on Eric M. Jalbert</title><link>https://www.ericmjalbert.com/tags/project/</link><description>Recent content in project on Eric M. Jalbert</description><generator>Hugo -- gohugo.io</generator><copyright>Copyright &amp;copy; 2020 - Eric Jalbert</copyright><lastBuildDate>Fri, 26 Jun 2020 00:00:00 +0000</lastBuildDate><atom:link href="https://www.ericmjalbert.com/tags/project/index.xml" rel="self" type="application/rss+xml"/><item><title>Get Bob to Help With Budgeting!</title><link>https://www.ericmjalbert.com/post/2020-06-17_personal_budgetting_project/</link><pubDate>Fri, 26 Jun 2020 00:00:00 +0000</pubDate><guid>https://www.ericmjalbert.com/post/2020-06-17_personal_budgetting_project/</guid><description>&lt;p>Managing personal finance can be difficult.
My wife and I have put some effort into making sure we stay on top of these things.
Over the years we have gone through many iterations of paper documents and Google Sheets to keep track of our budgets.
Lately I&amp;rsquo;ve wondered if there is a way to make this a more complete and approachable process.&lt;/p>
&lt;p>To that end, I&amp;rsquo;ve created a personal web app for categorizing spending and tracking savings. I call it Budgeting Bob:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/ericmjalbert/budgeting-bob">Here is the GitHub repo for the project&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://budgeting-bob-demo.herokuapp.com">Here is a public demo for the project&lt;/a>: login with &lt;code>username = demo&lt;/code> and &lt;code>password = demo&lt;/code>&lt;/li>
&lt;/ul>
&lt;p>I&amp;rsquo;ll first talk about my own viewpoint on how personal finance works.
Then I&amp;rsquo;ll give an &lt;a href="#budgeting-bob">overview&lt;/a> of the work involved in the Budgeting Bob project.
Following that will be my &lt;a href="#project-reflections">self-reflections&lt;/a> on the project as a whole.&lt;/p>
&lt;hr>
&lt;h1 id="personal-finance-management">Personal Finance Management&lt;/h1>
&lt;blockquote>
&lt;p>Whenever money moves from one place to another I call that a transaction. Personal finance management is just the meaningful analysis of these movements.&lt;/p>
&lt;/blockquote>
&lt;p>Keeping track of your personal finances is important because otherwise you can&amp;rsquo;t easily answer basic questions about your money, such as:&lt;/p>
&lt;ul>
&lt;li>&amp;ldquo;How much money can I spend on food deliveries this month?&amp;rdquo;&lt;/li>
&lt;li>&amp;ldquo;Will I be able to go on vacation and still afford rent?&amp;rdquo;&lt;/li>
&lt;li>&amp;ldquo;I just got paid, so why do I have so little money in my bank account?&amp;rdquo;&lt;/li>
&lt;/ul>
&lt;p>These are the kinds of questions that personal finance management should help answer.&lt;/p>
&lt;p>The way I think of it, every time money moves from one place to another, a &amp;ldquo;transaction&amp;rdquo; has just occurred.
For example, whenever I get paid, a transaction moves money from my employer&amp;rsquo;s account into my own savings account.
Whenever I pay off my monthly student debt, I make a transaction from my savings account to my OSAP debt account. Whenever I frivolously spend money on a video game, a transaction occurs from my VISA account to some external &amp;ldquo;spendings&amp;rdquo; account.&lt;/p>
&lt;p>That last example introduces the idea of categorizing spending, so that a transaction for rent can be separated from a transaction for entertainment (video games).
Categories differ in priority, so it&amp;rsquo;s important to be able to set a separate budget for each one.&lt;/p>
&lt;p>Below, I have a simple diagram that outlines this idea. Every line represents transactions that can move money from one node to another:&lt;/p>
&lt;p>&lt;img src="https://www.ericmjalbert.com/2020-06-17_transaction_mental_map.png" alt="Simple Diagram to show how personal wealth moves around">&lt;/p>
&lt;p>To be honest, I&amp;rsquo;m not an accountant, so take this mental model of finance as just an opinion. With that in mind, let&amp;rsquo;s get to the actual project.&lt;/p>
&lt;hr>
&lt;h1 id="budgeting-bob">Budgeting Bob&lt;/h1>
&lt;p>&lt;img src="https://www.ericmjalbert.com/2020-06-17_budgeting_bob_home.png" alt="Home page of Budgeting Bob">&lt;/p>
&lt;p>There were 3 main requirements that Budgeting Bob needed to satisfy:&lt;/p>
&lt;ol>
&lt;li>Manage applying categories to transactions.&lt;/li>
&lt;li>Present a simple view of monthly budgets and their current status.&lt;/li>
&lt;li>Be able to get an overview of our total wealth over time.&lt;/li>
&lt;/ol>
&lt;p>These break down into the three separate feature pages of the application: Transactions, Budgets, and Account Totals.&lt;/p>
&lt;h2 id="feature-1-transaction-management">Feature 1: Transaction Management&lt;/h2>
&lt;p>&lt;img src="https://www.ericmjalbert.com/2020-06-17_budgeting_bob_transactions.png" alt="Transaction pages from demo app">&lt;/p>
&lt;ul>
&lt;li>The core problem that this application solves is managing transactions.
This means that modeling the transactions is a very important part of it.
After many iterations, the simplest idea was to track how much money each account has and which transactions are associated with each account.
At the database level this looks like two main tables: &lt;code>transactions&lt;/code> and &lt;code>accounts&lt;/code>.
&lt;ul>
&lt;li>For &lt;code>transactions&lt;/code>, I mostly copied all the data available from RBC&amp;rsquo;s CSV export: &lt;code>transaction_date&lt;/code>, &lt;code>account_id&lt;/code>, and &lt;code>description_1&lt;/code> and &lt;code>description_2&lt;/code> (RBC provides two distinct values and I keep them separated to assist with category assignments).&lt;/li>
&lt;li>For &lt;code>accounts&lt;/code>, I have the descriptive information needed to make them readable: &lt;code>type&lt;/code> and &lt;code>owner&lt;/code> (e.g. &lt;code>type=&amp;quot;Savings&amp;quot;&lt;/code> and &lt;code>owner=&amp;quot;Eric&amp;quot;&lt;/code>).&lt;/li>
&lt;li>I&amp;rsquo;ve also included some metadata on the &lt;code>accounts&lt;/code> table to help with personal bookkeeping: &lt;code>source_of_truth&lt;/code> tracks where the data for the account comes from, and &lt;code>liquidable&lt;/code> tracks whether the account contains usable money. This is needed because some of the accounts are for student debts and car payments.&lt;/li>
&lt;li>For &lt;code>accounts&lt;/code>, I also have the &lt;code>initial_amount&lt;/code>. This is just the amount of money the account had when I recorded the first transaction. This value is used in the Account Totals page.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
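&lt;p>As a rough illustration, the two tables could be declared as below. This is a minimal sketch under stated assumptions, not the project&amp;rsquo;s actual schema: it uses Python&amp;rsquo;s built-in &lt;code>sqlite3&lt;/code> only so that it runs anywhere, and the &lt;code>id&lt;/code> keys, the &lt;code>amount&lt;/code> and &lt;code>category&lt;/code> columns, and every column type are assumptions. Only the remaining column names come from the list above.&lt;/p>

```python
import sqlite3

# Hypothetical schema sketch. The date, description, and account metadata
# column names follow the post; ids, amount, category, and all types are assumed.
SCHEMA = """
CREATE TABLE accounts (
    id              INTEGER PRIMARY KEY,
    type            TEXT NOT NULL,    -- e.g. 'Savings'
    owner           TEXT NOT NULL,    -- e.g. 'Eric'
    liquidable      INTEGER NOT NULL, -- 1 if the account holds usable money
    source_of_truth TEXT,             -- where the data for this account comes from
    initial_amount  NUMERIC NOT NULL  -- balance when the first transaction was recorded
);

CREATE TABLE transactions (
    id               INTEGER PRIMARY KEY,
    transaction_date TEXT NOT NULL,   -- ISO date string, e.g. '2020-06-01'
    account_id       INTEGER NOT NULL REFERENCES accounts (id),
    description_1    TEXT,
    description_2    TEXT,
    amount           NUMERIC NOT NULL, -- assumed: negative when money leaves the account
    category         TEXT              -- assumed: NULL until categorized
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create both tables on an already-open connection."""
    conn.executescript(SCHEMA)
```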
&lt;p>Categories are also applied automatically.
Whenever a new transaction is added to the table, I look up the most recent transaction with a matching &lt;code>description_1&lt;/code> and reuse its category.
For example, if we manually categorize a Hydro bill with &lt;code>description_1='utility bill pmt'&lt;/code> and &lt;code>description_2='enbridge gas #12023'&lt;/code>, then the next time that same &lt;code>description_1&lt;/code> appears, the category is applied automatically.
The current algorithm has only been tested with RBC&amp;rsquo;s data export; I&amp;rsquo;m not sure whether other banks provide a similar separation of descriptive values.&lt;/p>
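&lt;p>The lookup rule above can be sketched in a few lines of Python. This is a hypothetical illustration, not the project&amp;rsquo;s code: transactions are plain dicts here (the app presumably does this with a SQL query), and the function name &lt;code>suggest_category&lt;/code> and the exact, case-sensitive match on &lt;code>description_1&lt;/code> are assumptions.&lt;/p>

```python
from datetime import date
from typing import List, Optional


def suggest_category(description_1: str, history: List[dict]) -> Optional[str]:
    """Return the category of the most recent categorized transaction whose
    description_1 exactly matches, or None when nothing matches."""
    matches = [
        t for t in history
        if t["description_1"] == description_1 and t.get("category")
    ]
    if not matches:
        return None
    # "Most recent" means the latest transaction_date among the matching rows.
    latest = max(matches, key=lambda t: t["transaction_date"])
    return latest["category"]


history = [
    {"transaction_date": date(2020, 4, 2), "description_1": "utility bill pmt", "category": "Hydro"},
    {"transaction_date": date(2020, 5, 2), "description_1": "utility bill pmt", "category": None},
]
print(suggest_category("utility bill pmt", history))  # Hydro
```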
&lt;h2 id="feature-2-budget-statuses">Feature 2: Budget Statuses&lt;/h2>
&lt;p>&lt;img src="https://www.ericmjalbert.com/2020-06-17_budgeting_bob_budget_report.png" alt="Budget report pages from demo app">&lt;/p>
&lt;p>This page is pretty self-explanatory.
It&amp;rsquo;s just the monthly aggregation of transactions for each category (from the transactions page above).
The monthly budget amounts are set at the database level with the &lt;a href="https://github.com/ericmjalbert/budgeting-bob/blob/master/initialize_db_demo.sh">database initialization script&lt;/a>.&lt;/p>
&lt;p>For us, this is the main value proposition of Budgeting Bob. My wife and I needed a dead-simple way to check on our status, and that&amp;rsquo;s what this page provides.
Building on that, I&amp;rsquo;ve also added an &amp;ldquo;Overall Overage&amp;rdquo; column (which probably has a more official financial term).
The idea of Overall Overage is to handle cases like a cat clinic visit, where spending in the cat category is 10x higher than it should be for that one month.
This is &amp;ldquo;okay&amp;rdquo; because we only pay for the cat clinic visit once a year, but it makes planning the budget difficult because the numbers are so skewed.
Overall Overage is just the sum of the &lt;code>Remaining Amount&lt;/code> values from all previous months, so we can see whether it is approaching zero.
If that value is around 0, the budget is good; if it drifts far from 0, it might indicate that we&amp;rsquo;re consistently over budget and need to adjust.&lt;/p>
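&lt;p>A minimal sketch of the Overall Overage calculation for a single category, assuming &lt;code>Remaining Amount = budget - spent&lt;/code> (so overspending shows up as a negative number) and assuming the running sum includes the current month. The function name and the example numbers are made up for illustration.&lt;/p>

```python
from itertools import accumulate
from typing import List


def overall_overage(monthly_budget: float, monthly_spent: List[float]) -> List[float]:
    """For one category, return the running sum of (budget - spent),
    one value per month, oldest month first."""
    remaining = [monthly_budget - spent for spent in monthly_spent]
    return list(accumulate(remaining))


# A 50/month cat budget with one expensive clinic visit in the third month.
print(overall_overage(50, [30, 35, 140, 20, 25]))  # [20, 35, -55, -25, 0]
```

&lt;p>In this example the expensive month drags the running total negative, and the cheaper months that follow bring it back to zero, which is the &amp;ldquo;budget is good&amp;rdquo; signal described above.&lt;/p>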
&lt;h2 id="feature-3-account-total-graphs">Feature 3: Account Total Graphs&lt;/h2>
&lt;p>&lt;img src="https://www.ericmjalbert.com/2020-06-17_budgeting_bob_account_totals.png" alt="Account totals pages from demo app">&lt;/p>
&lt;p>The account totals graph is just a simple way to see all our accounts on one page.
This data comes for free, since we&amp;rsquo;re already tracking all the transactions and we have the &lt;code>initial_amount&lt;/code> for each account.
This page also helps with double-checking values, since it should agree with each individual bank account.&lt;/p>
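&lt;p>Because every transaction is stored, an account&amp;rsquo;s balance on any day can be rebuilt by replaying its transactions on top of &lt;code>initial_amount&lt;/code>. Here is a minimal sketch of that idea; the function name is hypothetical and it assumes signed amounts (negative when money leaves the account).&lt;/p>

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal
from typing import Dict, Iterable, Tuple


def daily_balances(initial_amount: Decimal,
                   transactions: Iterable[Tuple[date, Decimal]]) -> Dict[date, Decimal]:
    """End-of-day balance for every date that has at least one transaction.

    Assumes amounts are signed (negative = money leaving the account).
    """
    net_per_day: Dict[date, Decimal] = defaultdict(Decimal)  # Decimal() == 0
    for day, amount in transactions:
        net_per_day[day] += amount

    balances: Dict[date, Decimal] = {}
    running = initial_amount
    for day in sorted(net_per_day):  # replay days in chronological order
        running += net_per_day[day]
        balances[day] = running
    return balances
```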
&lt;p>The daily graph is a leftover from a scrapped feature.
I originally planned to have a way to forecast how much money we&amp;rsquo;d have over the next &lt;code>n&lt;/code> months, to help with long-term planning.
To do this I was using linear regression, with another slider to select the data range used for fitting the line.
Without getting too much into the details, this feature was taking more time than I wanted, and it wasn&amp;rsquo;t going to add much value since we could already approximate it from the graph.
Also, my wife did not care about forecasting, so it wasn&amp;rsquo;t really worth it.&lt;/p>
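&lt;p>For reference, the core of such a forecast is an ordinary least-squares line through the balances in the selected range. The sketch below is a generic version of that technique, not the abandoned code; the function names are hypothetical, and the &lt;code>points&lt;/code> argument stands in for whatever range the slider selects. On Python 3.10+, &lt;code>statistics.linear_regression&lt;/code> computes the same slope and intercept.&lt;/p>

```python
from datetime import date
from typing import List, Tuple


def fit_line(points: List[Tuple[date, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of balance against days since the first point.

    Returns (slope_per_day, intercept). Needs at least two distinct dates.
    """
    origin = points[0][0]
    xs = [(day - origin).days for day, _ in points]
    ys = [balance for _, balance in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct dates to fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_balance(points: List[Tuple[date, float]], target: date) -> float:
    """Extrapolate the fitted line to a (future) target date."""
    slope, intercept = fit_line(points)
    return intercept + slope * (target - points[0][0]).days
```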
&lt;p>For the visualization, I originally used &lt;code>Chart.js&lt;/code>, but I really wanted a slider for picking dates.
While researching how to do this I found &lt;a href="http://www.amcharts.com/demos/line-chart-with-scroll-and-zoom/#theme-light">this example&lt;/a>.
I thought it was the &lt;strong>perfect&lt;/strong> solution to a problem I didn&amp;rsquo;t really know how to solve otherwise, so I switched from &lt;code>Chart.js&lt;/code> to &lt;code>am4charts&lt;/code>.&lt;/p>
&lt;h2 id="other-requirements">Other Requirements&lt;/h2>
&lt;p>There were lots of other &lt;em>minor&lt;/em> requirements that I needed for this to be fully functional.&lt;/p>
&lt;h4 id="working-with-heroku">Working with Heroku&lt;/h4>
&lt;p>I wanted to have a simple cloud platform to host my application.
I also did not want to spend any money.
Heroku is awesome and I&amp;rsquo;ve used it in the past, so I decided to use it for a simple setup.&lt;/p>
&lt;h4 id="demo-mode">Demo mode&lt;/h4>
&lt;p>I wanted a way to showcase this project without publicly revealing my personal bank information.
This led to a &lt;a href="http://budgeting-bob-demo.herokuapp.com">demo mode&lt;/a> that had fake data.&lt;/p>
&lt;p>Because this was hosted on Heroku, it was very easy to set up a separate workflow for demo deployments: I just created a new Heroku app from the same repo and added a new &lt;code>heroku-demo&lt;/code> remote to my local git repo.
This way I can deploy with either &lt;code>git push heroku-demo&lt;/code> or &lt;code>git push heroku&lt;/code>.
The only hard part was generating the fake demo data, which I managed by altering my real transaction data with some random numbers. The actual script can be viewed &lt;a href="https://github.com/ericmjalbert/budgeting-bob/blob/master/fill_demo_data.sh">here&lt;/a>.&lt;/p>
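&lt;p>The real script linked above is written in bash. As an illustration of the same idea in Python, here is a hypothetical sketch that scales each amount by a random factor; the 0.5 to 1.5 range and the function name are assumptions. Note that it leaves descriptions untouched, so on its own it would not hide merchant names.&lt;/p>

```python
import random
from decimal import Decimal, ROUND_HALF_UP
from typing import List, Optional


def fake_amounts(transactions: List[dict], seed: Optional[int] = None,
                 low: float = 0.5, high: float = 1.5) -> List[dict]:
    """Return copies of the transactions with each amount scaled by a random
    factor between low and high, rounded to cents. Signs are preserved."""
    rng = random.Random(seed)  # a seeded generator makes the demo data reproducible
    faked = []
    for t in transactions:
        factor = Decimal(str(rng.uniform(low, high)))
        amount = (t["amount"] * factor).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        faked.append({**t, "amount": amount})  # copy; the input dicts are not mutated
    return faked
```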
&lt;h4 id="database-initialization">Database Initialization&lt;/h4>
&lt;p>I originally wanted to use Flask to manage the database, but I ended up writing a custom bash script since I did the original setup manually.
This isn&amp;rsquo;t very clean but it works in this case.
I think next time, though, I&amp;rsquo;ll actually write out the data models and let &lt;code>Flask&lt;/code> and &lt;code>SQLAlchemy&lt;/code> manage the database, since that would simplify the full workflow.&lt;/p>
&lt;h4 id="rbc-transaction-automation">RBC Transaction Automation&lt;/h4>
&lt;p>I currently use RBC as my bank of choice.
To get transactions from RBC they have a &amp;ldquo;Download Transactions&amp;rdquo; page, which allows users to get a CSV of all the transactions.
Using &lt;a href="https://selenium-python.readthedocs.io/">Selenium&lt;/a>, I wrote an experimental script that navigates to that page and automates the data entry from the CSV export.
Ideally I&amp;rsquo;d use the &lt;a href="https://developer.rbc.com/">RBC Developer API&lt;/a>, but I was never able to successfully register.&lt;/p>
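&lt;p>Once the CSV is downloaded, turning it into rows for the &lt;code>transactions&lt;/code> table is mostly a matter of mapping columns. This is a hypothetical sketch: the header names (&lt;code>Transaction Date&lt;/code>, &lt;code>Description 1&lt;/code>, &lt;code>Description 2&lt;/code>, &lt;code>CAD$&lt;/code>, ...) and the month/day/year date format are assumptions about RBC&amp;rsquo;s export that should be checked against a real download, and mapping the account number to an &lt;code>account_id&lt;/code> is left out.&lt;/p>

```python
import csv
import io
from datetime import datetime
from decimal import Decimal
from typing import List

# Assumed RBC export headers; verify against a real download before relying on them.
ACCOUNT_COL = "Account Number"
DATE_COL = "Transaction Date"
DESC1_COL = "Description 1"
DESC2_COL = "Description 2"
CAD_COL = "CAD$"
USD_COL = "USD$"


def parse_rbc_csv(text: str) -> List[dict]:
    """Turn the text of an RBC transaction CSV into dicts shaped like rows of the
    transactions table (account_number still needs mapping to an account_id)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        amount = record.get(CAD_COL) or record.get(USD_COL)
        rows.append({
            # Assumed month/day/year format, e.g. 6/1/2020
            "transaction_date": datetime.strptime(record[DATE_COL], "%m/%d/%Y").date(),
            "account_number": record.get(ACCOUNT_COL),
            "description_1": (record.get(DESC1_COL) or "").strip(),
            "description_2": (record.get(DESC2_COL) or "").strip(),
            "amount": Decimal(amount),
        })
    return rows
```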
&lt;hr>
&lt;h1 id="project-reflections">Project Reflections&lt;/h1>
&lt;p>I want to take a step back to write down some of the self-reflections I had on this project.&lt;/p>
&lt;h2 id="what-did-i-enjoy">What Did I Enjoy&lt;/h2>
&lt;ul>
&lt;li>Working with Heroku was very easy for both local development and deployments.&lt;/li>
&lt;li>Working with &lt;a href="https://jinja.palletsprojects.com/en/2.11.x/">jinja2&lt;/a> is interesting because it always has the features that I need; I just never really know what they&amp;rsquo;re called.
Using macros was useful for keeping things DRY, but I didn&amp;rsquo;t know they were called macros at first, so I had to search more than I&amp;rsquo;d have liked.&lt;/li>
&lt;li>Working with my wife on planning Budgeting Bob was nice.&lt;/li>
&lt;/ul>
&lt;h2 id="what-did-i-dislike">What Did I Dislike&lt;/h2>
&lt;ul>
&lt;li>Login Authentication. I&amp;rsquo;m not confident enough with writing security systems for web applications. I think it&amp;rsquo;s something I&amp;rsquo;ll need to research a bit more moving forward.&lt;/li>
&lt;li>The &lt;a href="https://developer.rbc.com/">RBC Developer API&lt;/a> portal never responded when I requested access, which is what pushed me to use Selenium&amp;hellip;&lt;/li>
&lt;li>Using Selenium for the RBC automation. It&amp;rsquo;s a hacky, messy, and unreliable workflow; I think next time I do a brute-force web scraping task I might take a step back and research other solutions first.&lt;/li>
&lt;/ul>
&lt;h2 id="what-did-i-find-difficult">What Did I Find Difficult&lt;/h2>
&lt;ul>
&lt;li>Front-end development. It feels like I&amp;rsquo;m actively working against myself whenever I do anything, probably because I never took a holistic approach to the front end. Next time I do some serious front-end work, I might try a front-end library like React.&lt;/li>
&lt;li>Having Heroku handle some of the environment variables while others were handled by local &lt;code>.env&lt;/code> files made some things confusing.&lt;/li>
&lt;/ul>
&lt;h2 id="what-did-i-learn">What Did I Learn&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Getting &lt;strong>just&lt;/strong> the Minimum Viable Product working is extremely useful.
My wife and I have been successfully using Budgeting Bob for many months now, even while it was still in development.
The only things we needed were the database to store transactions and the budget status page.
At first I was manually downloading the CSV and writing SQL queries to &lt;code>UPDATE&lt;/code> the categories of those transactions.
The workflow was painful, but it helped prioritize the next steps (automating those manual steps).
Over time, small changes turned Budgeting Bob into something easier and more useful, but the MVP was functional all along.
If I had waited until all the features were completed before using it, I might have ended up with a very different application that wouldn&amp;rsquo;t solve the problems we actually had.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Not using &lt;code>SQLAlchemy&lt;/code> to manage the data models means you have to write a bunch of stuff yourself. The lesson learned is that for web applications I should avoid raw SQL queries (via &lt;code>db.executes&lt;/code>).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>I started the project doing local development against the production database, so the demo workflow was shoehorned in instead of planned.
I think even for personal projects I might consider having local, staging, and production environments to help toward the end of a project.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>In all honesty, this application is probably overkill for personal finance. If you just want to budget, use a Google Sheet like: &lt;a href="https://themeasureofaplan.com/budget-tracking-tool/">https://themeasureofaplan.com/budget-tracking-tool/&lt;/a>.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="what-would-i-do-next-for-this-project">What Would I Do Next For This Project&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>I&amp;rsquo;ve noticed that the application is not very mobile friendly. It runs slower, and some of the text overlaps painfully. I would put some effort into profiling the application and finding ways to optimize its speed and UI for mobile.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Create more automation scripts for other accounts.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>Thank you for reading this!&lt;/p></description></item><item><title>How to Automate Job Applications</title><link>https://www.ericmjalbert.com/post/2020-06-11_resume_generator/</link><pubDate>Thu, 11 Jun 2020 00:00:00 +0000</pubDate><guid>https://www.ericmjalbert.com/post/2020-06-11_resume_generator/</guid><description>&lt;p>Creating job applications is a necessary step in progressing through a professional career. When I do a large batch of job applications, I find that I follow one of two main methodologies:&lt;/p>
&lt;ol>
&lt;li>Be non-optimal and use the same resume for all job applications. This is easy but ineffective.&lt;/li>
&lt;li>Create a specialized job application for each job description that I apply to. This is hard, but tends to yield better results.&lt;/li>
&lt;/ol>
&lt;p>If I&amp;rsquo;m actively searching for a new position, I tend to choose the latter option, but I&amp;rsquo;ve started to wonder if there is a better way to handle this. To that end, I set out to automate the creation of specialized job applications.&lt;/p>
&lt;p>The final product can be viewed on GitHub at &lt;a href="https://github.com/ericmjalbert/resume-generator">Resume Generator&lt;/a>.&lt;/p>
&lt;p>I&amp;rsquo;ll first give an overview of my work on the project. Following that will be my &lt;a href="#project-reflections">self-reflections&lt;/a> on the project as a whole.&lt;/p>
&lt;hr>
&lt;p>There are 4 main components to this project:&lt;/p>
&lt;ol>
&lt;li>Create a master resume that contains &lt;em>all&lt;/em> the information about my professional career.&lt;/li>
&lt;li>Have a repeatable way to create a rendered resume PDF.&lt;/li>
&lt;li>Parse a given job description into a list of requirements.&lt;/li>
&lt;li>Intelligently select a subset of the master resume based on the job description&amp;rsquo;s requirements.&lt;/li>
&lt;/ol>
&lt;h2 id="component-1-create-a-master-resume">Component 1: Create a Master Resume&lt;/h2>
&lt;ul>
&lt;li>TL;DR I created a JSON object with all my data. Look at my &lt;a href="https://github.com/ericmjalbert/resume-generator/blob/master/master_resume/resume.json">master_resume&lt;/a> to see the result.&lt;/li>
&lt;/ul>
&lt;p>The main idea of this component is to stop manually managing my resume, which is a difficult task.
With each job application I currently have to edit the most recent version of my resume, and over time I lose details and waste time hunting down specific information that I&amp;rsquo;ve written in the past.
To fix this, I converted my resume into a format that is easily readable by both machines and humans: JSON.
This master resume contains &lt;em>all&lt;/em> my professional bullet points, so that I don&amp;rsquo;t lose them over time, and it provides an easy place for me to update achievements.&lt;/p>
&lt;h3 id="notes-about-master-resume-format">Notes About Master Resume Format&lt;/h3>
&lt;p>I was originally going to use &lt;a href="https://github.com/msiemens/tinydb">TinyDB&lt;/a> to manage a local database of job descriptions, work highlights, and skills.
TinyDB stores its database as JSON on the local disk.
The main issue with using it is that I only need to manage one resume&amp;rsquo;s worth of data.
It&amp;rsquo;d be difficult to manage multiple JSON tables (or to have one complex JSON with all the data in one big table).
This idea might be worth revisiting if this project were supposed to serve multiple &amp;ldquo;master resumes&amp;rdquo; or if there were some interface to manage foreign keys across the tables, but that is outside the scope of this project.&lt;/p>
&lt;p>Instead of TinyDB, I chose to follow the &lt;a href="https://jsonresume.org/">JSON Resume&lt;/a> schema because it&amp;rsquo;s the &amp;ldquo;easiest&amp;rdquo; format for a human to maintain.
I originally followed the schema exactly; however, I had some problems with the way &amp;ldquo;Skills&amp;rdquo; were stored, so I diverged into my own variant of the JSON Resume schema.
The main differences are in the Education and Skills:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-python" data-lang="python">&lt;span style="color:#f92672">from&lt;/span> dataclasses &lt;span style="color:#f92672">import&lt;/span> dataclass
&lt;span style="color:#f92672">from&lt;/span> typing &lt;span style="color:#f92672">import&lt;/span> List, Optional


&lt;span style="color:#a6e22e">@dataclass&lt;/span>
&lt;span style="color:#66d9ef">class&lt;/span> &lt;span style="color:#a6e22e">Education&lt;/span>:
    &lt;span style="color:#e6db74">&amp;#34;&amp;#34;&amp;#34;Model Academic experiences and highlights&amp;#34;&amp;#34;&amp;#34;&lt;/span>
    institution: str
    area: str
    studyType: str
    startDate: str
    endDate: str
    &lt;span style="color:#75715e"># I removed the &amp;#34;gpa&amp;#34; and &amp;#34;courses&amp;#34; fields from JSON Resume&lt;/span>
    &lt;span style="color:#75715e"># gpa: str&lt;/span>
    &lt;span style="color:#75715e"># courses: List[str]&lt;/span>
    &lt;span style="color:#75715e"># I added &amp;#34;thesis&amp;#34; and &amp;#34;publications&amp;#34; (optional, so they default to None)&lt;/span>
    thesis: Optional[str] &lt;span style="color:#f92672">=&lt;/span> None
    publications: Optional[str] &lt;span style="color:#f92672">=&lt;/span> None


&lt;span style="color:#a6e22e">@dataclass&lt;/span>
&lt;span style="color:#66d9ef">class&lt;/span> &lt;span style="color:#a6e22e">Skills&lt;/span>:
    &lt;span style="color:#e6db74">&amp;#34;&amp;#34;&amp;#34;Model skill highlights&amp;#34;&amp;#34;&amp;#34;&lt;/span>
    name: str
    &lt;span style="color:#75715e"># I removed the &amp;#34;level&amp;#34; field from JSON Resume&lt;/span>
    &lt;span style="color:#75715e"># level: str&lt;/span>
    keywords: List[str]
&lt;/code>&lt;/pre>&lt;/div>&lt;p>This change meant I couldn&amp;rsquo;t use any pre-built tools for &lt;a href="https://github.com/kelvintaywl/jsonresume-validator">validating JSON resumes&lt;/a>.
But it turned out that this wasn&amp;rsquo;t a huge detriment, since I could set up a simple &lt;code>pytest&lt;/code> suite to do some basic validation.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-python" data-lang="python">&lt;span style="color:#f92672">import&lt;/span> io
&lt;span style="color:#f92672">import&lt;/span> json

&lt;span style="color:#f92672">import&lt;/span> pytest

&lt;span style="color:#f92672">from&lt;/span> resume_generator.general.resume &lt;span style="color:#f92672">import&lt;/span> Resume


&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">test_resume_file_is_valid&lt;/span>():
    json_resume &lt;span style="color:#f92672">=&lt;/span> Resume()
    &lt;span style="color:#66d9ef">assert&lt;/span> json_resume&lt;span style="color:#f92672">.&lt;/span>validate()  &lt;span style="color:#75715e"># Custom method to check types and existence on some keys&lt;/span>


&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">test_resume_file_can_fail&lt;/span>():
    fake_resume_data &lt;span style="color:#f92672">=&lt;/span> json&lt;span style="color:#f92672">.&lt;/span>dumps({&lt;span style="color:#e6db74">&amp;#34;fake_resume&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;blahblah&amp;#34;&lt;/span>})
    fake_resume_file &lt;span style="color:#f92672">=&lt;/span> io&lt;span style="color:#f92672">.&lt;/span>StringIO(fake_resume_data)
    &lt;span style="color:#66d9ef">with&lt;/span> pytest&lt;span style="color:#f92672">.&lt;/span>raises(&lt;span style="color:#a6e22e">TypeError&lt;/span>):
        Resume(fake_resume_file)
    fake_resume_file&lt;span style="color:#f92672">.&lt;/span>close()


&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">test_resume_fields_can_be_accessed&lt;/span>():
    test &lt;span style="color:#f92672">=&lt;/span> Resume()
    &lt;span style="color:#75715e"># Each access raises AttributeError if the field is missing&lt;/span>
    test&lt;span style="color:#f92672">.&lt;/span>basics&lt;span style="color:#f92672">.&lt;/span>name
    test&lt;span style="color:#f92672">.&lt;/span>basics&lt;span style="color:#f92672">.&lt;/span>location
    [job&lt;span style="color:#f92672">.&lt;/span>company &lt;span style="color:#66d9ef">for&lt;/span> job &lt;span style="color:#f92672">in&lt;/span> test&lt;span style="color:#f92672">.&lt;/span>work]
    [skill&lt;span style="color:#f92672">.&lt;/span>keywords &lt;span style="color:#66d9ef">for&lt;/span> skill &lt;span style="color:#f92672">in&lt;/span> test&lt;span style="color:#f92672">.&lt;/span>skills]
&lt;/code>&lt;/pre>&lt;/div>&lt;p>The other benefit to writing my own validator is that I didn&amp;rsquo;t need to include dummy values for &amp;ldquo;required&amp;rdquo; fields of the original schema (e.g. &lt;code>location.address&lt;/code> and some entire sections like &lt;code>interests&lt;/code> and &lt;code>volunteer&lt;/code>).&lt;/p>
&lt;h3 id="notes-about-using-dacite">Notes About Using &lt;code>dacite&lt;/code>&lt;/h3>
&lt;p>As part of this project, I wanted to &lt;strong>really&lt;/strong> get to know &lt;code>@dataclass&lt;/code> in python, so I tried modelling everything using it.
I ended up using &lt;a href="https://github.com/konradhalas/dacite">&lt;code>dacite&lt;/code>&lt;/a> as a &amp;ldquo;shortcut&amp;rdquo; to parse a JSON into a &lt;code>dataclass&lt;/code>, because that seemed like such an obvious pattern.
In retrospect, it might have been more readable not to use the package, because now I have this &amp;ldquo;magic&amp;rdquo; function that just handles the initializations.
That being said, it&amp;rsquo;s pretty slick and easy to use, so I&amp;rsquo;ll accept the loss of readability for the ease that it brings.&lt;/p>
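&lt;p>To make that trade-off concrete, here is roughly the kind of code &lt;code>dacite.from_dict&lt;/code> saves you from writing: a hypothetical, standard-library-only converter for nested dataclasses and &lt;code>List[...]&lt;/code> fields. It is not the project&amp;rsquo;s code and it handles far fewer cases than dacite (no type checking, no &lt;code>Union&lt;/code> handling); it only shows what the &amp;ldquo;magic&amp;rdquo; is doing.&lt;/p>

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List, get_args, get_origin, get_type_hints


def from_dict(cls, data: dict):
    """Hypothetical hand-rolled stand-in for dacite.from_dict.

    Recursively builds nested dataclasses and lists of dataclasses. Keys missing
    from `data` fall back to the field default; a missing required field makes
    the dataclass constructor raise TypeError.
    """
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        if f.name in data:
            kwargs[f.name] = _convert(hints[f.name], data[f.name])
    return cls(**kwargs)


def _convert(tp: Any, value: Any) -> Any:
    if is_dataclass(tp):
        return from_dict(tp, value)
    if get_origin(tp) is list:
        (item_type,) = get_args(tp)
        return [_convert(item_type, item) for item in value]
    return value  # plain values (str, Optional[str], ...) pass through unchanged


@dataclass
class Skills:
    name: str
    keywords: List[str]


@dataclass
class MiniResume:  # hypothetical, trimmed-down container for the example
    skills: List[Skills]


resume = from_dict(MiniResume, {"skills": [{"name": "Python", "keywords": ["pandas", "flask"]}]})
print(resume.skills[0].keywords)  # ['pandas', 'flask']
```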
&lt;h2 id="component-2-repeatable-way-to-render-resume">Component 2: Repeatable Way to Render Resume&lt;/h2>
&lt;ul>
&lt;li>TL;DR I used &lt;a href="https://github.com/salomonelli/best-resume-ever">best-resume-ever&lt;/a> with moderate success to generate a PDF resume. I still think my original resume looks better, though&amp;hellip;&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://www.ericmjalbert.com/2020-06-11_resume_compare.png" alt="Old and new resume comparison">&lt;/p>
&lt;p>This phase is the bread and butter of the entire project.
It addressed the main pain point in my job application process, so I needed a way to generate PDFs.&lt;/p>
&lt;h3 id="which-format-to-render-pdf-or-docx">Which Format To Render: PDF Or &lt;code>docx&lt;/code>&lt;/h3>
&lt;p>Because I&amp;rsquo;ve historically had problems creating a &lt;code>docx&lt;/code> version of my resume, the initial plan was to make that the primary rendering format.
From &lt;code>docx&lt;/code> it&amp;rsquo;s pretty easy to get a nice PDF, whereas a PDF&amp;rsquo;s formatting tends to get mangled when converting to &lt;code>docx&lt;/code>.
The problem is that nothing really does this out of the box, and the APIs and libraries that create &lt;code>docx&lt;/code> files are pretty hard to work with.
After some effort testing different ways to do this, I decided to focus on an easy setup that creates a PDF version of my resume.
&lt;code>best-resume-ever&lt;/code> creates the PDF in one step, and it&amp;rsquo;s simple enough for me to understand how to edit the existing templates.&lt;/p>
&lt;h3 id="about-the-external-project-repo">About the External Project Repo&lt;/h3>
&lt;p>Using the external tool &lt;code>best-resume-ever&lt;/code> involves another project repo in my workflow.
This is not really ideal, but with some installation automation and environment variables to locate the external repo, we&amp;rsquo;re able to work with it in an easy way.
I could probably invest more time into a cleaner solution for this, but since it works and is not a real bottleneck it does not make sense to worry about at this point.&lt;/p>
&lt;p>Together with the master resume I am now able to generate a resume; just a very large one that renders every single bullet point in my master resume.&lt;/p>
&lt;h2 id="component-3-job-description-parser">Component 3: Job Description Parser&lt;/h2>
&lt;ul>
&lt;li>TL;DR Took a simple approach and just grabbed every bullet point (&lt;code>&amp;lt;li&amp;gt;&lt;/code> tag) that matches some basic conditions. Turns out this includes company benefits.&lt;/li>
&lt;/ul>
&lt;p>Now we are getting into the more unique part of the project; this is what turns a basic &amp;ldquo;Create-a-resume&amp;rdquo; project into an &amp;ldquo;Automatically-customized-resume&amp;rdquo; project.
Remember, the whole point of this is to use a subset of my master resume to create a specially tailored job application.
To even be able to do that, I needed to be able to scrape the job requirements from an online job description.
Because almost every job description uses bullet points to list its requirements, I just needed to grab the &lt;code>&amp;lt;li&amp;gt;&lt;/code> tags using a simple HTML parser: I chose &lt;a href="https://www.crummy.com/software/BeautifulSoup/">Beautiful Soup&lt;/a>.
This is pretty easy overall, apart from a few simple edge cases:&lt;/p>
&lt;ul>
&lt;li>List of links in the footer for site navigation; just remove points with &lt;code>&amp;lt;a&amp;gt;&lt;/code>.&lt;/li>
&lt;li>Nonsense &lt;code>&amp;lt;li&amp;gt;&lt;/code> elements in the header; just remove points that are too short (&amp;lt; 4 words)&lt;/li>
&lt;/ul>
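&lt;p>The two filtering rules above are simple enough to sketch. The post uses Beautiful Soup; to keep this example dependency-free it swaps in the standard library&amp;rsquo;s &lt;code>html.parser&lt;/code>, so treat it as an illustration of the rules, not the project&amp;rsquo;s parser. Class and function names are hypothetical; nested lists are flattened into their outer bullet, and omitted closing &lt;code>&amp;lt;/li&amp;gt;&lt;/code> tags are not handled.&lt;/p>

```python
from html.parser import HTMLParser
from typing import List, Tuple


class BulletCollector(HTMLParser):
    """Collect the text of each top-level li element and whether it contains a link."""

    def __init__(self) -> None:
        super().__init__()
        self.items: List[Tuple[str, bool]] = []
        self._depth = 0  # how many li elements we are currently inside
        self._chunks: List[str] = []
        self._has_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            if self._depth == 0:
                self._chunks, self._has_link = [], False
            self._depth += 1
        elif tag == "a" and self._depth:
            self._has_link = True

    def handle_endtag(self, tag):
        if tag == "li" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                text = " ".join("".join(self._chunks).split())  # normalize whitespace
                self.items.append((text, self._has_link))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_requirements(html: str, min_words: int = 4) -> List[str]:
    """Keep bullets that contain no link and have at least `min_words` words."""
    parser = BulletCollector()
    parser.feed(html)
    parser.close()
    return [
        text for text, has_link in parser.items
        if not has_link and len(text.split()) >= min_words
    ]
```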
&lt;p>There is one interesting issue that couldn&amp;rsquo;t be easily solved with the simple approach of grabbing &lt;code>&amp;lt;li&amp;gt;&lt;/code> tags.
Almost every job description also includes a section dedicated to the company selling themselves to you.
This is much appreciated as a potential candidate, but these points are very clearly not supposed to be requirements for a custom job application.
A resume shouldn&amp;rsquo;t specifically put a skill bullet point that says &amp;ldquo;I am excellent at ping pong&amp;rdquo; just because the company mentions they regularly have ping pong tournaments.
Here is a stripped-down example of a job description that does this:&lt;/p>
&lt;p>&lt;img src="https://www.ericmjalbert.com/2020-06-11_example_job_scraping.png" alt="Example job scraping problem">&lt;/p>
&lt;p>If I naively pull just the &lt;code>&amp;lt;li&amp;gt;&lt;/code> points I&amp;rsquo;ll be including all the &amp;ldquo;compensation&amp;rdquo; bullet points.
To fix this, I made some futile attempts to read the parent &lt;code>&amp;lt;div&amp;gt;&lt;/code> to see if it said &amp;ldquo;Compensation&amp;rdquo; or &amp;ldquo;What we offer&amp;rdquo;, but that was overly complex.
Instead, I decided to create a list of &amp;ldquo;company benefit words&amp;rdquo; that usually appear in bullet points that are not actual job requirements:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-python" data-lang="python">company_benefit_word_list &lt;span style="color:#f92672">=&lt;/span> [
&lt;span style="color:#e6db74">&amp;#34;annual&amp;#34;&lt;/span>,
&lt;span style="color:#e6db74">&amp;#34;benefits&amp;#34;&lt;/span>,
&lt;span style="color:#e6db74">&amp;#34;catered&amp;#34;&lt;/span>,
&lt;span style="color:#e6db74">&amp;#34;coffee&amp;#34;&lt;/span>,
&lt;span style="color:#e6db74">&amp;#34;company retreat&amp;#34;&lt;/span>,
&lt;span style="color:#e6db74">&amp;#34;great place to work&amp;#34;&lt;/span>,
&lt;span style="color:#e6db74">&amp;#34;groceries&amp;#34;&lt;/span>,
&lt;span style="color:#e6db74">&amp;#34;healthcare&amp;#34;&lt;/span>,
&lt;span style="color:#e6db74">&amp;#34;laptop&amp;#34;&lt;/span>,
&lt;span style="color:#e6db74">&amp;#34;ping pong&amp;#34;&lt;/span>,
&lt;span style="color:#e6db74">&amp;#34;retirement&amp;#34;&lt;/span>,
&lt;span style="color:#e6db74">&amp;#34;salary&amp;#34;&lt;/span>,
&lt;span style="color:#e6db74">&amp;#34;vacation&amp;#34;&lt;/span>,
&lt;span style="color:#e6db74">&amp;#34;we offer&amp;#34;&lt;/span>,
&lt;span style="color:#e6db74">&amp;#34;weeks&amp;#34;&lt;/span>,
]
&lt;/code>&lt;/pre>&lt;/div>&lt;p>This risks over-aggressively removing a legitimate job requirement, but I think I&amp;rsquo;m safe with most of these words.&lt;/p>
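&lt;p>The post doesn&amp;rsquo;t say exactly how the word list is applied. One plausible sketch, assuming a case-insensitive substring match against each scraped bullet (the helper names &lt;code>is_company_benefit&lt;/code> and &lt;code>remove_company_benefits&lt;/code> are hypothetical):&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-python" data-lang="python">company_benefit_word_list = [
    "annual", "benefits", "catered", "coffee", "company retreat",
    "great place to work", "groceries", "healthcare", "laptop", "ping pong",
    "retirement", "salary", "vacation", "we offer", "weeks",
]


def is_company_benefit(bullet, benefit_words=company_benefit_word_list):
    """True if the bullet mentions any benefit word (case-insensitive substring match)."""
    lowered = bullet.lower()
    return any(word in lowered for word in benefit_words)


def remove_company_benefits(bullets, benefit_words=company_benefit_word_list):
    """Keep only the bullets that do not look like company perks."""
    return [bullet for bullet in bullets if not is_company_benefit(bullet, benefit_words)]
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Because this is a substring check, &amp;ldquo;laptop&amp;rdquo; also matches &amp;ldquo;laptops&amp;rdquo;; it also means any real requirement that happens to contain one of these words is dropped, which is the over-removal risk described above.&lt;/p>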
&lt;h3 id="job-description-output">Job Description Output&lt;/h3>
&lt;p>Parsing a job description produces a list of strings, one per requirement bullet.
For the example page above, it produces the following job description list:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-python" data-lang="python">[
&lt;span style="color:#e6db74">&amp;#34;You have experience in SQL: You&amp;#39;ve used written complex SQL queries that join across data from multiple systems, matching them up even when there was not a straightforward way to join the tables. You&amp;#39;ve designed tables with an eye towards ease of use and high performance. You&amp;#39;ve documented schemas and created data dictionaries.&amp;#34;&lt;/span>,
&lt;span style="color:#e6db74">&amp;#34;You are a skilled written communicator. We are a 100% remote team and writing is our primary means of communication.&amp;#34;&lt;/span>,
&lt;span style="color:#e6db74">&amp;#34;You appreciate our team&amp;#39;s values of eagerness to collaborate with teammates from any function of the organization or with any level of data knowledge, iterating over your deliverables, and being curious.&amp;#34;&lt;/span>,
&lt;span style="color:#e6db74">&amp;#34;You understand that the perfect is the enemy of the good and default to action by shipping MVP code and iterating as needed to get towards better solutions.&amp;#34;&lt;/span>
]
&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="component-4-create-intelligent-subset-of-master-resume">Component 4: Create &amp;ldquo;Intelligent&amp;rdquo; Subset of Master Resume&lt;/h2>
&lt;ul>
&lt;li>TL;DR: I went with the simplest implementation that worked, using the &lt;a href="https://tfhub.dev/google/universal-sentence-encoder/4">Universal Sentence Encoder&lt;/a> and &lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html">cosine similarity&lt;/a> to find the resume highlights most relevant to the job description.&lt;/li>
&lt;/ul>
&lt;p>This is the &amp;ldquo;Data Science&amp;rdquo; part of the project.
To get a specialized resume for a job application, I needed a way to select a subset of my master resume.
For this, I used the &lt;a href="https://tfhub.dev/google/universal-sentence-encoder/4">Universal Sentence Encoder&lt;/a> to convert each piece of text into a vector, and then used &lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html">cosine similarity&lt;/a> to score each highlight of my master resume.
Each individual score measures the &amp;ldquo;closeness&amp;rdquo; between a resume highlight and one job description point.
Summing these individual values gives the resume highlight its overall score.
With a score for every resume highlight, I just select the top 2-4 highlights per work experience, and the result is a resume tailored to the given job description.&lt;/p>
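&lt;p>Assuming the overall scores have already been computed, the selection step could look like the sketch below. The &lt;code>work&lt;/code> and &lt;code>highlights&lt;/code> keys follow the master resume format used in this post, while the function name and the default of 3 highlights per job (within the 2-4 range) are assumptions.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-python" data-lang="python">def select_top_highlights(work, scores, k=3):
    """Keep the k highest-scoring highlights for each work-experience entry.

    work:   list of dicts shaped like the master resume's "work" entries
    scores: dict mapping highlight text -> overall relevance score
    """
    tailored = []
    for job in work:
        # Best-scoring highlights first; the original entry is left untouched.
        ranked = sorted(job["highlights"], key=lambda h: scores[h], reverse=True)
        tailored.append({**job, "highlights": ranked[:k]})
    return tailored
&lt;/code>&lt;/pre>&lt;/div>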
&lt;p>Let&amp;rsquo;s run through a simple example to see this in action.&lt;/p>
&lt;h3 id="simple-example">Simple Example&lt;/h3>
&lt;p>Let&amp;rsquo;s assume my master resume has 2 resume highlights:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-python" data-lang="python">&lt;span style="color:#e6db74">&amp;#34;work&amp;#34;&lt;/span>: [
{
&lt;span style="color:#e6db74">&amp;#34;company&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;Cool Company, Inc.&amp;#34;&lt;/span>,
&lt;span style="color:#f92672">...&lt;/span>
&lt;span style="color:#e6db74">&amp;#34;highlights&amp;#34;&lt;/span>: [
&lt;span style="color:#e6db74">&amp;#34;I used my experience at planning projects and communicating with stakeholders to remove inefficiencies in the day-to-day workflow.&amp;#34;&lt;/span>,
&lt;span style="color:#e6db74">&amp;#34;My expertise with AWS helped architecture a scalable infrastructure.&amp;#34;&lt;/span>
]
}
]
&lt;/code>&lt;/pre>&lt;/div>&lt;p>These are both legitimate highlights for a developer, but the first highlight is better for a project manager and the second is better for software architects or start-ups.&lt;/p>
&lt;p>Let&amp;rsquo;s say we have some obvious job description bullets like:&lt;/p>
&lt;blockquote>
&lt;ul>
&lt;li>&amp;ldquo;Experienced with managing large projects.&amp;rdquo;&lt;/li>
&lt;li>&amp;ldquo;Skilled at communicating with leadership team.&amp;rdquo;&lt;/li>
&lt;li>&amp;ldquo;Handle everyday planning of tasks and duties.&amp;rdquo;&lt;/li>
&lt;/ul>
&lt;/blockquote>
&lt;p>The algorithm first converts every sentence into a vector representation using the &lt;a href="https://tfhub.dev/google/universal-sentence-encoder/4">Universal Sentence Encoder&lt;/a>.
Without getting into too much detail, this turns a sentence such as &amp;ldquo;Experienced with managing large projects&amp;rdquo; into a list of 512 numbers between -1 and 1.
This list of numbers (a vector) represents the &amp;ldquo;meaning&amp;rdquo; of the sentence in a form that can be compared against other vectors.&lt;/p>
&lt;p>Using these vector encodings, I can calculate the &amp;ldquo;closeness&amp;rdquo; between each resume highlight and each job description point.
The &amp;ldquo;closeness&amp;rdquo; is calculated using &lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html">cosine similarity&lt;/a>, which is the cosine of the angle between the two vectors.
A cosine score of 1 means the two vectors point in the same direction (as similar as possible), while a score of -1 means they point in opposite directions.
The final step is to sum these cosine scores to get a single value for each resume highlight.&lt;/p>
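&lt;p>The project relies on scikit-learn&amp;rsquo;s &lt;code>cosine_similarity&lt;/code>; to show the arithmetic behind it, here is a plain-Python sketch of the cosine score and the per-highlight sum. The function names are illustrative, and the short vectors in any usage would be toy stand-ins rather than real 512-dimensional encoder outputs.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-python" data-lang="python">import math


def cosine_similarity(u, v):
    """Cosine of the angle between non-zero vectors u and v (1 = same direction, -1 = opposite)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def highlight_score(highlight_vec, job_vecs):
    """Sum the cosine similarity of one highlight against every job description point."""
    return sum(cosine_similarity(highlight_vec, job_vec) for job_vec in job_vecs)
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>With real encoder output stored as 2-D arrays, &lt;code>sklearn.metrics.pairwise.cosine_similarity(highlight_vectors, job_vectors).sum(axis=1)&lt;/code> should produce the same per-highlight totals in a single call.&lt;/p>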
&lt;p>Let&amp;rsquo;s go back to our example. Scoring the resume highlights against the job description points gives these values:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-shell" data-lang="shell">RESUME_HIGHLIGHT: &lt;span style="color:#e6db74">&amp;#34;I used my experience at planning projects and communicating
&lt;/span>&lt;span style="color:#e6db74"> with stakeholders to remove inefficiencies in the day-to-day workflow.&amp;#34;&lt;/span>
JOB_DESCRIPTIONS:
&lt;span style="color:#e6db74">&amp;#34;Experienced with managing large projects.&amp;#34;&lt;/span> | cosine score: &lt;span style="color:#e6db74">&amp;#34;0.3377&amp;#34;&lt;/span>
&lt;span style="color:#e6db74">&amp;#34;Skilled at communicating with leadership team.&amp;#34;&lt;/span> | cosine score: &lt;span style="color:#e6db74">&amp;#34;0.1514&amp;#34;&lt;/span>
&lt;span style="color:#e6db74">&amp;#34;Handle everyday planning of tasks and duties.&amp;#34;&lt;/span> | cosine score: &lt;span style="color:#e6db74">&amp;#34;0.2358&amp;#34;&lt;/span>
OVERALL_SCORE: &lt;span style="color:#e6db74">&amp;#34;0.7249&amp;#34;&lt;/span>
RESUME_HIGHLIGHT: &lt;span style="color:#e6db74">&amp;#34;Used AWS to architecture a scalable infrastructure.&amp;#34;&lt;/span>
JOB_DESCRIPTIONS:
&lt;span style="color:#e6db74">&amp;#34;Experienced with managing large projects.&amp;#34;&lt;/span> | cosine score: &lt;span style="color:#e6db74">&amp;#34;0.2326&amp;#34;&lt;/span>
&lt;span style="color:#e6db74">&amp;#34;Skilled at communicating with leadership team.&amp;#34;&lt;/span> | cosine score: &lt;span style="color:#e6db74">&amp;#34;0.0856&amp;#34;&lt;/span>
&lt;span style="color:#e6db74">&amp;#34;Handle everyday planning of tasks and duties.&amp;#34;&lt;/span> | cosine score: &lt;span style="color:#e6db74">&amp;#34;0.1974&amp;#34;&lt;/span>
OVERALL_SCORE: &lt;span style="color:#e6db74">&amp;#34;0.5156&amp;#34;&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>In this case, the higher overall score (0.7249 vs. 0.5156) suggests that we should select the more &amp;ldquo;Project Manager&amp;rdquo; highlight.&lt;/p>
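&lt;p>As a quick check on the arithmetic, summing the reported cosine scores reproduces both overall scores. The dictionary keys are just shortened labels for the two highlights.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-python" data-lang="python"># Cosine scores reported above, in the same order as the three job description points.
cosine_scores = {
    "project-management highlight": [0.3377, 0.1514, 0.2358],
    "AWS highlight": [0.2326, 0.0856, 0.1974],
}

overall_scores = {name: round(sum(scores), 4) for name, scores in cosine_scores.items()}
best_highlight = max(overall_scores, key=overall_scores.get)
print(overall_scores)  # {'project-management highlight': 0.7249, 'AWS highlight': 0.5156}
print(best_highlight)  # project-management highlight
&lt;/code>&lt;/pre>&lt;/div>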
&lt;h3 id="problems-with-scoring-algorithm">Problems With Scoring Algorithm&lt;/h3>
&lt;p>This methodology has never actually been tested or compared to others.
I&amp;rsquo;ve manually checked that the output differs between job descriptions, but I&amp;rsquo;ve only really tested it on 3 applications.
It would make &lt;strong>much&lt;/strong> more sense to compare this to other methods and actually have some workflow for testing and improving it.
But at this point I just wanted to start using this project to apply to jobs so I settled for the first thing that worked.&lt;/p>
&lt;p>An obvious next step would be to test this by scraping LinkedIn or whoishiring.io for a large number of job postings and comparing the generated resumes across that larger dataset.
This could be done by building a resume with obvious project management highlights and obvious software developer highlights, then counting how many times the correct highlights make it into the subset.
However, this is all future work.&lt;/p>
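&lt;p>Since this evaluation hasn&amp;rsquo;t been built, the following is only a hypothetical sketch of the counting step. It assumes each scraped posting comes with the set of highlights that should win for its job type (for example, the project management ones); the function name and data layout are illustrative.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-python" data-lang="python">def selection_hit_rate(runs):
    """Fraction of selected highlights that belong to the expected set for their posting.

    runs: list of (selected_highlights, expected_highlights) pairs, one per
          scraped posting; expected_highlights is a set of highlight texts.
    """
    hits = 0
    total = 0
    for selected, expected in runs:
        hits += sum(1 for highlight in selected if highlight in expected)
        total += len(selected)
    # Guard against an empty evaluation set.
    return hits / total if total else 0.0
&lt;/code>&lt;/pre>&lt;/div>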
&lt;h2 id="what-automatic-resume-generation-looks-like">What Automatic Resume Generation Looks Like&lt;/h2>
&lt;p>Because generating the full resume involves many components, I&amp;rsquo;ve moved all of the steps into a &lt;a href="https://github.com/ericmjalbert/resume-generator/blob/master/Makefile">Makefile&lt;/a> to simplify the workflow. Below is an example of a &amp;ldquo;job application&amp;rdquo; using this project (~30 seconds):&lt;/p>
&lt;p>&lt;img src="https://github.com/ericmjalbert/resume-generator/blob/master/assets/example_application.gif?raw=true" alt="example run of resume generator">&lt;/p>
&lt;hr>
&lt;h1 id="project-reflections">Project Reflections&lt;/h1>
&lt;p>I want to take a step back to write down some of the self-reflections I had on this project.&lt;/p>
&lt;h2 id="what-did-i-enjoy">What Did I Enjoy&lt;/h2>
&lt;p>Working with a &lt;code>Makefile&lt;/code>. I used them a lot back in grad school but hadn&amp;rsquo;t used one for anything real in a while. The ability to encapsulate a workflow into a single command is very useful whenever multiple technologies interact with each other. I&amp;rsquo;m not sure I&amp;rsquo;ll use one for every project, but it was easy enough to work with.&lt;/p>
&lt;h2 id="what-did-i-dislike">What Did I Dislike&lt;/h2>
&lt;p>When writing the snapshot test cases, I noticed that tests &lt;em>sometimes&lt;/em> failed even though the output looked like an exact match. It turned out Vim was adding an &lt;code>EOL&lt;/code> character to the end of the files when saving them after I opened them to visually inspect the cases. This caused previously passing test cases to fail for reasons that were hard to diagnose.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>The actual fix to this was to add &lt;code>set nofixendofline&lt;/code> to my &lt;code>.vimrc&lt;/code> file.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The clue that helped me identify the problem came when I manually compared the files using &lt;code>diff&lt;/code> and saw: &amp;ldquo;&lt;code>\ No newline at end of file&lt;/code>&amp;rdquo;&lt;/p>
&lt;/li>
&lt;/ul>
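&lt;p>A small comparison helper can make this failure mode obvious instead of hard to diagnose. This is a hypothetical sketch (the name &lt;code>compare_snapshot&lt;/code> is illustrative), not the project&amp;rsquo;s test code:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-python" data-lang="python">def compare_snapshot(expected, actual):
    """Classify how freshly generated output differs from a stored snapshot."""
    if expected == actual:
        return "match"
    if expected.rstrip("\n") == actual.rstrip("\n"):
        # The Vim case: identical content, but one side gained or lost a final newline.
        return "differs only in trailing newline"
    return "mismatch"
&lt;/code>&lt;/pre>&lt;/div>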
&lt;p>Adding the pre-trained model from &lt;code>tensorflow_hub&lt;/code> made the normal workflow and test cases run much slower: from less than 1 second to about 30 seconds, even after optimizations.
This wasn&amp;rsquo;t really a problem, but I noticed that the slower test cases sapped my motivation, and it&amp;rsquo;s something I want to think about more carefully in future projects.&lt;/p>
&lt;h2 id="what-did-i-find-difficult">What Did I Find Difficult&lt;/h2>
&lt;p>PDF generation.
Just all of it.
Before landing on &lt;code>best-resume-ever&lt;/code>, I experimented with many other solutions, and they were either annoyingly difficult to work with or inconsistent in their output.
This part of the project also highlighted my weakest technical skill, front-end development, which I might need to start taking more seriously moving forward.&lt;/p>
&lt;h2 id="what-did-i-learn">What Did I Learn&lt;/h2>
&lt;p>You don&amp;rsquo;t have to make everything perfect.
I was starting to lose motivation by the end of the project.
With so many things not &amp;ldquo;quite right&amp;rdquo; (PDF generation, test cases, complex code structure), I did not see the project&amp;rsquo;s completion coming anytime soon.
The main lesson was that I could put off a lot of work to future optimizations; I just needed something that &amp;ldquo;worked&amp;rdquo;.
This helped me to: simplify the ML algorithm (no analysis on the accuracy of the highlight scoring), simplify the test cases (only needed the end-to-end test), and not worry about how the PDF was generated.
These were things that I could improve in the future if they needed to be improved.&lt;/p>
&lt;h2 id="what-would-i-do-next-for-this-project">What Would I Do Next For This Project&lt;/h2>
&lt;p>There are a lot of things that I think need to be worked on for this project, but at a high-level I think it&amp;rsquo;d be important to:&lt;/p>
&lt;ul>
&lt;li>Update the scoring algorithm for resume highlights. Not only the ML algorithm (i.e., maybe the Universal Sentence Encoder isn&amp;rsquo;t the best option), but I also might want to change the way I aggregate the scores together.
&lt;ul>
&lt;li>To expand on that second point, summing all the scores for each resume highlight might favor highlights that do moderately well on multiple bullet points over ones that are &lt;em>killing it&lt;/em> on a single bullet point (e.g., something that is a 90% match to only 1 bullet might not make the cut compared to others that are a 20% match to 5 different bullets).&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Add functionality to make a cover letter (using the same formatting as in the &lt;code>best-resume-ever&lt;/code>). Not many companies look for cover letters, but I think they still add a nice level of polish to a job application.&lt;/li>
&lt;li>Add a way to update my LinkedIn with the same content as the &lt;code>master_resume.json&lt;/code> so that I only have to manage it in one place.&lt;/li>
&lt;/ul>
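&lt;p>To make the aggregation concern concrete, here is a hypothetical comparison of the current sum against one possible alternative, taking the maximum. The scores are made up to mirror the 90% vs. 20% example, and the function names are illustrative.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-python" data-lang="python">def aggregate_sum(scores):
    """Current approach: add up the cosine scores against every job bullet."""
    return sum(scores)


def aggregate_max(scores):
    """Alternative: keep only the best match against any single job bullet."""
    return max(scores)


def rank_highlights(highlight_scores, aggregate):
    """Order highlight names from best to worst under the given aggregation."""
    return sorted(highlight_scores, key=lambda h: aggregate(highlight_scores[h]), reverse=True)


# Made-up scores against five job bullets.
highlight_scores = {
    "strong single match": [0.9, 0.0, 0.0, 0.0, 0.0],
    "broad weak match": [0.2, 0.2, 0.2, 0.2, 0.2],
}
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Under the sum, the broad weak match edges out the strong single match (about 1.0 vs. 0.9); under the maximum, the order flips.&lt;/p>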
&lt;hr>
&lt;p>Thank you for reading this!&lt;/p></description></item></channel></rss>