This is a log of work I'm doing to get better at data science, machine learning, and any other code-related things. It is a more day-to-day look at work I'm up to: everything from learning new software to tricks with Terminal to designing simple hacks. Basically...
This page is here as an effort to never stop learning. Thanks and hope you enjoy.
The eli5 library is a visual way to see the results of a model. So nice!
SHAP values are also very cool. You can easily see how each feature pushes a row's prediction away from the baseline. It is a good example of focusing less on the complexity of the model and more on critical thinking about what is going on (good for discussing a model with non-technical folks and confirming real-world expectations).
Super helpful to make ML models feel a little less like black boxes. Overall, use eli5 for permutation importance (shuffle a feature, look at the effect on the loss function, and thus gauge its importance). Partial dependence plots help to see the effect of one variable on the prediction, measured relative to the baseline (leftmost) value. SHAP is a great way to think critically about the effects of different features on the outcomes of the model.
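To make the shuffle-and-measure idea concrete, here is a minimal hand-rolled sketch of permutation importance in plain Python. It is not eli5's implementation; the `predict` function, the synthetic data, and the choice of mean squared error as the loss are all assumptions made for illustration.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error, standing in for the model's loss function."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in loss when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, predict(X))
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle one column, leaving every other column untouched
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            increases.append(mse(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical "fitted model": leans heavily on feature 0, lightly on
# feature 1, and ignores feature 2 entirely.
def predict(X):
    return [3.0 * a + 0.5 * b for a, b, _ in X]

data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(200)]
y = predict(X)  # noiseless targets, so the baseline loss is exactly zero

importances = permutation_importance(predict, X, y)
```

Shuffling the ignored feature leaves the loss unchanged, so its importance comes out as zero, while the heavily used feature shows the largest increase in loss.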
I've been doing a lot of design work lately and I can feel the ML muscles atrophying. Went through the Intermediate Machine Learning micro-course on Kaggle. Wonderful refresher on the loveliness of pipelines and how to scale things.
Went through the Codelab on using Google Material in Sketch. It was helpful to see how a Sketch file of Symbols can be used as a Library and thus across many projects (essentially a global variable for designers). I was having trouble getting the theme editor to work effectively, though: the automatic color palette changes didn't propagate to other projects, and it turns out it may not be working in newer Sketch versions. Even so, I will definitely be creating libraries in the future.
Turns out submissions to the Battle Royale Riddler were only open for three days instead of till the following Friday. Sigh. Guess I'll have to wait till they release the data and see if my submission would have made the cut. I wrote something to generate random submissions and test them against each other, keeping the best ones. I took the work from yesterday on what the best submissions were for the past three royales and pitted those against my results to home in on a good choice. Came up with one I'm going with (it was able to beat the top result across all three previous contests by one match, but it lost against other submissions due to a small roster pool). I'm saying my submission is [0, 0, 0, 5, 15, 8, 4, 13, 30, 25]. Fingers crossed.
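The generate-and-keep-the-best loop can be sketched roughly as follows. This is not the original script. It assumes the usual rules for this kind of puzzle (ten castles, castle i worth i points, 100 soldiers per player, more soldiers takes the castle, ties split), and the names `beats`, `random_allocation`, and `search` are made up for illustration.

```python
import random

def beats(a, b):
    """True if allocation a outscores b (assumed rules: castle i is worth i points)."""
    margin = 0
    for value, troops_a, troops_b in zip(range(1, 11), a, b):
        if troops_a > troops_b:
            margin += value
        elif troops_b > troops_a:
            margin -= value
        # A tied castle splits its points, so the margin does not change
    return margin > 0

def random_allocation(rng, soldiers=100, castles=10):
    """Split the soldiers across the castles by picking random cut points."""
    cuts = sorted(rng.randint(0, soldiers) for _ in range(castles - 1))
    bounds = [0] + cuts + [soldiers]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]

def search(n_candidates=200, keep=5, seed=0, reference=()):
    """Generate random allocations, pit them against each other, keep the best."""
    rng = random.Random(seed)
    candidates = [random_allocation(rng) for _ in range(n_candidates)]
    opponents = candidates + list(reference)  # optionally add past entries

    def wins(candidate):
        return sum(beats(candidate, other) for other in opponents)

    return sorted(candidates, key=wins, reverse=True)[:keep]

best = search()
```

Passing past top entries as `reference` mirrors pitting the random results against earlier winners; every returned allocation still sums to 100 soldiers.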
I was interested in what the previous submissions had done and how well they worked. So that meant I had to make the battle royale happen! I pulled all three CSVs from past royales into a Python notebook and was able to make a function that simulates the battle process (I think). Then I could run the results for all previous submissions (ran it on Colab for a GPU boost) and see what works best. I'll probably submit one of them for fun, but no doubt more competitive players will tweak their submissions a little to slip by in the leaderboards. Super cool feeling of pulling some data, writing some code, running it for a while, and getting interesting results.
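A battle simulation along these lines might look like the sketch below. The rules are an assumption, not something stated in this log: ten castles, castle i worth i points, whoever sends more soldiers takes the castle, ties split the points, and the higher total wins the battle. The second and third entries in the list are made-up opponents.

```python
from itertools import combinations

CASTLE_POINTS = range(1, 11)  # assumption: castle i is worth i points

def battle(a, b):
    """Return (points_a, points_b) for one head-to-head battle."""
    points_a = points_b = 0.0
    for value, troops_a, troops_b in zip(CASTLE_POINTS, a, b):
        if troops_a > troops_b:
            points_a += value
        elif troops_b > troops_a:
            points_b += value
        else:
            # Tied castle: each side gets half of its points
            points_a += value / 2
            points_b += value / 2
    return points_a, points_b

def round_robin(submissions):
    """Win count per submission; a drawn battle counts as half a win each."""
    wins = [0.0] * len(submissions)
    for i, j in combinations(range(len(submissions)), 2):
        points_i, points_j = battle(submissions[i], submissions[j])
        if points_i > points_j:
            wins[i] += 1
        elif points_j > points_i:
            wins[j] += 1
        else:
            wins[i] += 0.5
            wins[j] += 0.5
    return wins

submissions = [
    [0, 0, 0, 5, 15, 8, 4, 13, 30, 25],   # the entry quoted in this log
    [10] * 10,                            # hypothetical: spread evenly
    [0, 0, 0, 0, 0, 0, 0, 0, 50, 50],     # hypothetical: top two castles only
]
wins = round_robin(submissions)  # [2.0, 1.0, 0.0]
```

Loading the past CSVs into `submissions` in place of the toy list would rank every historical entry the same way.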
Recently accepted into the beta for GitHub Actions, so thought I'd checkout (get it?) how it works. Did the GitHub Learning Lab course and found it was similar to working with Travis CI, which I used for edav.info/. This move to provide continuous integration (CI) tools and things like them reminds me of the transition the iPhone made with its flashlight. It used to be just a bunch of third-party apps, but now it comes standard with every iPhone. Similarly, there are a bunch of CI-type apps for GitHub like Travis CI and CircleCI, but GitHub Actions seems to be the response that makes it standard. It looks promising. Would be interested in seeing how this advances.
Initially the plan was to copy over my work from the mirror. However, it turned out to be too much extraneous info (the scales and such were too distracting), so it morphed into a more minimalist table of possible states, relegating the explanation text to the bottom. There was also a shift in terminology between the posted solution and my figure. Visually, the letter/number convention for each ball was hard to follow, so I instead used three colors to distinguish the groups and just numbered the balls within each grouping.
Overall, taking the time to understand what was going on made it easier to present the work effectively.
First crack at infinite scrolling for long feed pages on this website. The Stack Overflow page from the 2019/08/13 log worked great. Modified it to calculate height from the main content div, so the Bootstrap side-menu does not interfere with the loading. Also chose to make the pages force reloads to the top of the page to avoid weird behaviors. Applied to this page and my food log (also went and reduced the image size of food photos to help improve loading). Lookin' good!
Uploaded work for past Riddler. Turns out I was waaay off (did it in 21, it's possible with 12), but it was a pretty exercise. Thinking of them as dynamic structures that can morph was an interesting insight. Fantastic demonstration from @xaqwg showing the inability to get 11.
Also, recognizing this log is getting long and may start to take a while to load. That is a little hopeful considering it is text-focused so far, but this setup is prone to a load hang; for example, my food log already takes a while to load due to photos that should probably be compressed somewhat. Found a Stack Overflow post on lazy loading a list and will look into loading in batches rather than all at once. We will see.
Updated my tidytuesday work to publish on my site from the GitHub master branch. So, they are all now available from a README.md serving as the index file. Did a short pass at a dataset on video games (more about testing the publishing setup). Also took a crack at this week's Riddler.
Working on getting more granular data out of my Noun Project activity. The site provides monthly reports, but I wanted to have more info on how my nouns were being received. Currently collecting download counts every day and creating a big time-series dataframe of every noun I have and how many times it has been downloaded. Need to work on incorporating public activity to only show paid downloads. Currently can get noun count, activity of recorded data (volatile and rolling window), and estimate average interactions per day.
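As a rough illustration of turning daily snapshots into a per-day average, here is a plain-Python stand-in for the dataframe. The dates, noun names, and counts are invented, and it assumes each snapshot stores a cumulative download count per noun.

```python
from datetime import date

# Hypothetical daily snapshots: cumulative download count per noun
snapshots = {
    date(2019, 8, 1): {"kettle": 120, "lamp": 40},
    date(2019, 8, 2): {"kettle": 123, "lamp": 40},
    date(2019, 8, 4): {"kettle": 131, "lamp": 43},  # one collection day missed
}

def average_daily_downloads(snapshots):
    """Average downloads per day for each noun between the first and last snapshot."""
    days = sorted(snapshots)
    first, last = days[0], days[-1]
    elapsed = (last - first).days
    if elapsed == 0:
        raise ValueError("need snapshots from at least two different days")
    return {
        # A noun missing from the first snapshot is treated as starting from zero
        noun: (snapshots[last][noun] - snapshots[first].get(noun, 0)) / elapsed
        for noun in snapshots[last]
    }

rates = average_daily_downloads(snapshots)
```

Dividing by elapsed calendar days rather than by the number of snapshots keeps the estimate honest when a collection day is skipped.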
Working on Spark ML course on DataCamp. As with sklearn, pipelines will save your bacon when it comes to creating and evaluating models. I find it similar to learning how to do arithmetic in school and then moving on to more advanced stuff and using a calculator for the arithmetic. At the time, discovering a calculator can do all the stuff learned previously makes it feel like a waste of time to have learned it. But having an understanding of what is going on at the lowest level makes the more advanced work make more sense and helps you understand when things go wrong. Same goes for ML pipelines. Doing it by hand, passing all the items between variables, is informative and helps clarify what is going on. However, it leaves so many places to make mistakes. Pipelines are like the calculators in school, consolidating the variable handling. But care must still be taken to ensure the model is doing the right thing. Calculators will do exactly what you tell them to, even if it is junk, so watch out! Overall, the Spark pipeline setup of stages allows for easy use of fit and transform. Lovely. Finished the course.
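The stages idea can be shown with a toy pipeline in plain Python. This is only a sketch of the fit/transform pattern, not Spark's or sklearn's API; the `Centerer`, `Rescaler`, and `Pipeline` classes here are invented for illustration.

```python
class Centerer:
    """Stage that learns the mean of the training data and subtracts it."""
    def fit(self, data):
        self.mean = sum(data) / len(data)
        return self

    def transform(self, data):
        return [x - self.mean for x in data]

class Rescaler:
    """Stage that learns the largest absolute value and divides by it."""
    def fit(self, data):
        self.scale = max(abs(x) for x in data) or 1.0
        return self

    def transform(self, data):
        return [x / self.scale for x in data]

class Pipeline:
    """Chains stages so each one is fit on the output of the previous one."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, data):
        for stage in self.stages:
            data = stage.fit(data).transform(data)
        return self

    def transform(self, data):
        for stage in self.stages:
            data = stage.transform(data)
        return data

train = [2.0, 4.0, 6.0, 8.0]
pipeline = Pipeline([Centerer(), Rescaler()]).fit(train)
scaled_train = pipeline.transform(train)
scaled_new = pipeline.transform([5.0, 11.0])  # [0.0, 2.0]
```

The pipeline carries the intermediate results between stages, which is the bookkeeping that is easy to get wrong by hand. It also does exactly what it was told: the new value 11.0 lands at 2.0, outside the range seen in training, because the stages reuse the parameters they learned from `train`.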
Combining some work over the weekend. Finished DataCamp course on Building Chatbots in Python. Super cool and really interesting. Would definitely be awesome to work on a user-based interaction system using ML like this. Will be bookmarking RASA to explore more. Got sidetracked while reading Learning Python and made a dict and pkl shirt for MZ Tees. Ended up finishing chapter on tuples and files (284-315).
I think in my heart of hearts the thing I want to work on is going to be programming/design-focused, working on a team and collaborating on a problem much bigger than myself. That sounds great, but I trip myself up before even attempting to look for a job like that.
The biggest anxiety I have about getting a job is my personal feeling that I would be unqualified for a position, so my knee-jerk reaction is not to apply. I know it's not an uncommon anxiety to have, but it still remains difficult for me to overcome that feeling.
This has led to a lot of inactivity on my part that is not healthy or productive. I spend time actively avoiding working on my skills, afraid that if I try and fail (or am not perfect at something from the outset), it will prove that I'm not qualified and validate all those feelings.
And let me stop right here and say that even as I type this, I know that that's stupid. Of course it's going to be hard! Of course you will not be perfect from the outset! WORKING AT IT IS HOW YOU GET BETTER! But this fact that improvement takes effort—some of which will be difficult/frustrating/<another good adjective>—can be hard for me to acknowledge and overcome. But I know for a fact that I've gotten around it in the past.
For example, this website used to be pretty lame. I had to build one for a class I was taking in college, and it was not a great result. And then I made another one for another class. Also not great. But I got better at it. I actively wanted to make something cooler, which led me to slowly make this site from the ground up. Same thing goes for making icons. I was really intimidated by Illustrator (now using Affinity Designer; very happy with my one-time purchase!) and my first few designs were not great. But over time, I was able to make my ideas come to life more easily and now, after many icons made, I can see how far I've come.
Literally. For both of those projects I can actually see that I have gotten better by looking at what I have made in the past. I want to get better at coding/data science/machine learning in the same way. I need a place to see the progress I am making.
So this is that place: a page for me to demonstrate my progress in those things. And yes, while it certainly can help external people see that I am capable of doing work, the main reason for this page's existence—more important than anything else—is to prove to myself that I am making progress regarding my abilities and experience. It's one of those things that may sound obvious to you reading this but trust me: this has been a mini-revelation for me. I have to literally see my progress.
As a result, this page will inevitably be full of less-than-perfect work. It will be full of failures and learning experiences, but it will help me be more active and produce work that relates to my skills. I have to start somewhere because being inactive wishing I had the ability to code really well or work at some big company has demonstrably not gotten me anywhere.
This is a totally self-serving webpage, but it is public for two reasons. One: if it was private, I would have less of an incentive to try to add to it. And Two: I have to believe that I'm not the only one that can benefit from a page like this—even the mindset of it. I love seeing people's amazing work on places like Twitter and LinkedIn, but it has given me the implicit mentality that I have to make something really polished and amazing before it is worth sharing with people. And this has only exacerbated my tendency toward inactivity discussed above. And I know that polished work has behind it a bunch of struggles and difficult challenges, but it can be difficult to acknowledge that when you only see the amazing results, try something for yourself, and brush up against a bunch of difficulties.
So, hopefully you can find something helpful about this page. I know I will.
What's with the brackets?
The brackets spell out "code". The square brackets are the 'c' and 'd', the parens create the 'o', and the open curly brace is the 'e'. You can get it on a T-shirt if you want. Helps remind me to keep working at it, but I understand it might bug people who just see some weird bracket syntax error. Oh well, logos can't please everybody…but they can go overlooked.