Data science is a tough field. It combines in equal parts mathematics and statistics, computer science, and black magic. As of mid-2020, it is also a booming field with numerous applicants swarming every job ad. And, as I mentioned, it is mid-2020, with the raging pandemic dragging everything down just that extra bit.
Building up a list of course completion certificates won’t get you very far either, unless you also have academic bona fides (a Master’s or PhD). MOOC certificates like those from Coursera and edX are nice, but I’ve yet to hear many examples of them counting for much. Kaggle ain’t what it used to be, either. Its free competitions have become graveyards of useless overfit models, and the real competitions are dominated by teams, which are hard to compete with and whose results are of limited use for an individual portfolio anyway.
So, how does one go about building a profile online? My personal view is that, as a famous band once sang, you can go your own way.
Instead of trying to do exactly what others do, or did, work on projects that you are interested in, build up a portfolio of your work, and put it up there for the world to see what you did, and what you can do.
Having said all that, I appreciate that it’s easier said than done. Not many data scientists are also designers or front-end developers, and they are not always keen to pick up that extra skill, nor do they necessarily have the time to.
Luckily, we don’t always have to reinvent the wheel. Unlike the old days, when portfolios were literally… portfolios full of glossy pages, or resumes that would only come across HR’s desk, many amazing portfolios are now available online. These are invaluable resources, so why not make full use of them?
Learning / Inspiration
Outside of using them as references for our own portfolios, these sites are also extremely valuable resources for learning, and for ideas.
Many of these authors’ projects are practical, interesting and original. They are, for my money, also great complementary learning tools. For example, seeing practical applications of an ML tool provides context when learning the theoretical side, as I consider where I might apply this tool in my work or for my clients.
I’ve said enough – let’s take a dive into some of these amazing works, to look at exactly how they are useful.
These are obviously just a few selections from the many, many great portfolios out there. Let me know in the comments about any of your favourites, and whether you agree or disagree with my thoughts!
I first came across David Venturi a few years ago while researching data science courses. He had written a blog post called “I Dropped Out of School to Create My Own Data Science Master’s – Here’s My Curriculum”.
That post is from April 2016 and it has certainly stood the test of time, having racked up over 8000 claps on Medium as of August 2020!
Since then, he’s gone on to do much more. He’s created courses for DataCamp, including one for Scala, a part of a Tableau course and one using the MLB’s (baseball) Statcast data.
He has even created a course titled UP AND DOWN WITH THE KARDASHIANS: a Python project that uses pandas. Who’d have guessed that the word Python would appear next to the word Kardashians, without it being a reference to the latest scandal or a terrible choice of pet?
Yup. The man is talented.
His portfolio site appropriately reflects this wide range of talents, showcasing the breadth of content types and the variety of subject matter in his work.
The headings on Venturi’s site organise its content by type of end client. They range all the way from courses, projects and content created for DataCamp or Udacity, to a set of personal projects including articles for FreeCodeCamp, sports analytics and a sprinkling of web apps.
What struck me after looking at the site for a while, though, was the clarity with which it demonstrated the exact types of outputs he is capable of producing. In other words:
Each section on Venturi’s portfolio fulfils a marketing purpose.
The MOOCs are easy — after all, he is a seasoned course producer.
But the next section includes two very different videos to highlight his production skills and comfort in front of a camera. One is an instructional video and the other is a highly produced… dog video (a clever marketing video).
And his personal projects are laid out to highlight the output medium, which is indicated with bright links. Each output is tagged with a link labelled “Code”, “Demo”, or “Website”. This allows the viewer to instantly see the output of interest in the context of a project.
Even his written works are clearly categorised as one of “Report”, “Article” or “Post”, explicitly acknowledging the intended audience types. Someone reading this is led straight to a relevant sample product, rather than to a mess of “writing samples” sorted by subject matter. (It does make me wonder if he did any scraping or analysis of job postings to arrive at this taxonomy.)
Check out his portfolio here.
Hannah Yan Han
Being a data visualisation geek, I was immediately filled with a combination of joy and envy by this site.
The majority of the projects represented on her front page are (gorgeous, I might add) visualisations. Each project is represented by an image, where a mouseover reveals further details about it — as shown in the animation below.
So, within seconds of visiting the site, the reader gets to see the range of visualisations that she’s produced, and her technical prowess with a diverse set of tools, from R, D3.js and P5.js to Tableau.
Personally, I also really like the clean layout and the simple, consistent interface. It’s a pleasure to navigate through.
Clicking on each project takes the reader to an article about the visualisation.
She also has a dedicated data science portfolio, which she has placed on a separate page.
Clearly, this layout is designed to convey more information about each data science project than the layout of the visualisation page does. By segregating projects by type as she has done, she’s able to achieve visual consistency within each page for the reader. This probably also indicates that, generally, the reader (a prospective client) is interested in only one of visualisation or data science, rather than both.
Check out her portfolio here.
Before moving on to the next example portfolio, sit down, grab a drink, and brace yourself.
Donne Martin claims to be a software engineer at Facebook, but looking at his website and GitHub page, I am quite convinced that he is a time traveller or some sort of wizard who’s able to stretch time. I’ll get back to this point later, but for now, take a look at the animation below, scrolling through his main website.
His approach to the portfolio site is quite different to those we’ve looked at before this. He takes the approach of letting the crowd noise (i.e. GitHub stars) do the talking, and boy — are they loud.
He casually flaunts multiple personal projects with 20k+ stars!
His GitHub page itself is very impressive. Since we are discussing data science portfolios, let’s take a look at his repo of data science notebooks.
Remember how I said that I think Martin might be a wizard? Whenever we go back to burning witches and wizards, this data science notebook repo is going to be my primary evidence submission against Martin.
I just don’t understand when he could possibly have had time to create all of these unless he has the ability to slow down time. Here is just a sampling (a very small sampling, actually) of the notebooks that he has made available in this repo.
It’s a dense list, but grouped by the primary library used, it does a great job as a showcase. Even before opening any of his notebooks or even reading the summaries of these notebooks, this list easily demonstrates his work ethic, breadth of skills and ability to communicate and teach.
You could easily spend days, or weeks, browsing through Martin’s portfolio — and personally I don’t think it would be such a bad idea to do so. Check it out here.
Claudia Ten Hoope
Ten Hoope’s website is clean, neat and easy to read. One key difference I wanted to highlight with this portfolio site is that it explicitly doubles as a hiring and enquiry page, listing her daily rates and similar details.
She is a freelancer, so it makes sense for her to spell out the exact services she offers to her prospective clients. The language she uses here also indicates that these descriptions are aimed at readers who may not be that familiar with data science.
It’s a good reminder for us to think about who the intended audience is for every piece of communication that we put out there, and to tailor the content accordingly.
Check it out — her page is here.
This is another excellent portfolio website, this time by Julia Nikulski. As with the others, she’s got some kickass projects listed here, each one with a hero image accompanied by a short description and key skills.
I won’t write too much more about it, as the main layout seems to be similar to some of the others, and I don’t read German!
One super interesting (and very meta) highlight is a post entitled “How to Build a Data Science Portfolio Website”, which, if you are reading this, you might find relevant!
Thanks for reading. That was just a small selection of the sites I’ve found online. If you have your own favourites, or (constructive) critiques of the article, please let me know in the comments or on Twitter!
ICYMI: Also check out this article comparing Plotly Dash with Streamlit.