<p>Filip Nikšić's personal web page (https://fniksic.github.io/feed.xml, updated 2020-05-14T11:30:32-04:00)</p>
<p>A Twist on a Coding Interview Question. Filip Nikšić, 2020-05-02. https://fniksic.github.io/blog/2020/05/02/twist-on-coding-interview-question</p>
<p>With my postdoctoral appointment at the University of Pennsylvania coming to
an end this summer, I’m in the process of finding a new job. And even though
this wasn’t my first choice when I started the postdoc a year and a half ago,
I ended up applying for software engineering positions at big tech
companies. What this means is that I’m facing a scary obstacle—the
notorious coding interviews.</p>
<p>To increase my chance of success, or at least to comfort myself that I’m
actually doing something to prepare for these interviews, I acquired the
obligatory book, <a href="http://www.crackingthecodinginterview.com/">Cracking the Coding Interview</a>. As I was reading the chapter
about the big O notation, one of the sample problems caught my attention.
The problem is to analyze the time complexity of the following code that
prints out all strings of length <script type="math/tex">k</script> whose characters are lower-case
letters a–z appearing in sorted (non-decreasing) order. (Hopefully I’m not committing a serious
copyright infringement by showing the code here.)</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">printSortedStrings</span><span class="o">(</span><span class="kt">int</span> <span class="n">k</span><span class="o">)</span> <span class="o">{</span>
    <span class="n">printSortedStrings</span><span class="o">(</span><span class="n">k</span><span class="o">,</span> <span class="s">""</span><span class="o">);</span>
<span class="o">}</span>

<span class="kt">void</span> <span class="nf">printSortedStrings</span><span class="o">(</span><span class="kt">int</span> <span class="n">k</span><span class="o">,</span> <span class="nc">String</span> <span class="n">prefix</span><span class="o">)</span> <span class="o">{</span>
    <span class="k">if</span> <span class="o">(</span><span class="n">k</span> <span class="o">==</span> <span class="mi">0</span><span class="o">)</span> <span class="o">{</span>
        <span class="k">if</span> <span class="o">(</span><span class="nc">IsInOrder</span><span class="o">(</span><span class="n">prefix</span><span class="o">))</span> <span class="o">{</span>
            <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">prefix</span><span class="o">);</span>
        <span class="o">}</span>
    <span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
        <span class="k">for</span> <span class="o">(</span><span class="kt">char</span> <span class="n">c</span> <span class="o">=</span> <span class="sc">'a'</span><span class="o">;</span> <span class="n">c</span> <span class="o">&lt;=</span> <span class="sc">'z'</span><span class="o">;</span> <span class="n">c</span><span class="o">++)</span> <span class="o">{</span>
            <span class="n">printSortedStrings</span><span class="o">(</span><span class="n">k</span> <span class="o">-</span> <span class="mi">1</span><span class="o">,</span> <span class="n">prefix</span> <span class="o">+</span> <span class="n">c</span><span class="o">);</span>
        <span class="o">}</span>
    <span class="o">}</span>
<span class="o">}</span>

<span class="kt">boolean</span> <span class="nf">IsInOrder</span><span class="o">(</span><span class="nc">String</span> <span class="n">s</span><span class="o">)</span> <span class="o">{</span>
    <span class="kt">boolean</span> <span class="n">isInOrder</span> <span class="o">=</span> <span class="kc">true</span><span class="o">;</span>
    <span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="o">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">s</span><span class="o">.</span><span class="na">length</span><span class="o">();</span> <span class="n">i</span><span class="o">++)</span> <span class="o">{</span>
        <span class="kt">int</span> <span class="n">prev</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="na">charAt</span><span class="o">(</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="o">);</span>
        <span class="kt">int</span> <span class="n">curr</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="na">charAt</span><span class="o">(</span><span class="n">i</span><span class="o">);</span>
        <span class="k">if</span> <span class="o">(</span><span class="n">prev</span> <span class="o">&gt;</span> <span class="n">curr</span><span class="o">)</span> <span class="o">{</span>
            <span class="n">isInOrder</span> <span class="o">=</span> <span class="kc">false</span><span class="o">;</span>
        <span class="o">}</span>
    <span class="o">}</span>
    <span class="k">return</span> <span class="n">isInOrder</span><span class="o">;</span>
<span class="o">}</span>
</code></pre></div></div>
<p>If we write <script type="math/tex">n</script> for the number of letters in the alphabet, then
the time complexity is clearly <script type="math/tex">O(k n^k)</script>: the code generates all <script type="math/tex">n^k</script>
possible strings, and for each of them calls the function <code class="language-plaintext highlighter-rouge">IsInOrder</code>,
which runs in time <script type="math/tex">O(k)</script>.</p>
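<p>As a sanity check on this count, here is a minimal sketch that enumerates every string over a small alphabet and tallies the loop iterations of the original <code class="language-plaintext highlighter-rouge">IsInOrder</code>. The class name <code class="language-plaintext highlighter-rouge">NaiveStepCount</code>, the method <code class="language-plaintext highlighter-rouge">countSteps</code>, and the use of integer arrays in place of strings are illustrative assumptions, not part of the book’s code.</p>

```java
final class NaiveStepCount {
    // Total loop iterations of the check WITHOUT early exit, summed over all
    // n^k strings. Letters are modeled as integers 0..n-1 (an assumption made
    // for brevity; the original code uses the characters 'a'..'z').
    static long countSteps(int n, int k) {
        return countSteps(n, k, new int[k], 0);
    }

    private static long countSteps(int n, int k, int[] s, int pos) {
        if (pos == k) {
            long steps = 0;
            boolean inOrder = true;
            for (int i = 1; i < k; i++) {
                steps++;                    // one loop iteration
                if (s[i - 1] > s[i]) {
                    inOrder = false;        // remembered, but the scan continues
                }
            }
            return steps;
        }
        long total = 0;
        for (int c = 0; c < n; c++) {
            s[pos] = c;
            total += countSteps(n, k, s, pos + 1);
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 3, k = 4;
        long expected = (k - 1) * (long) Math.pow(n, k);
        System.out.println(countSteps(n, k) + " steps, (k-1)*n^k = " + expected);
    }
}
```

<p>For <script type="math/tex">n=3</script> and <script type="math/tex">k=4</script>, both numbers come out as 243: every one of the 81 strings costs exactly <script type="math/tex">k-1=3</script> loop iterations.</p>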
<p>The bit about this problem that’s interesting to me is
that the function <code class="language-plaintext highlighter-rouge">IsInOrder</code> can be
trivially improved by exiting the for loop (and the function)
as soon as we find that the string is not in order:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">boolean</span> <span class="nf">IsInOrder</span><span class="o">(</span><span class="nc">String</span> <span class="n">s</span><span class="o">)</span> <span class="o">{</span>
    <span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="o">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">s</span><span class="o">.</span><span class="na">length</span><span class="o">();</span> <span class="n">i</span><span class="o">++)</span> <span class="o">{</span>
        <span class="kt">int</span> <span class="n">prev</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="na">charAt</span><span class="o">(</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="o">);</span>
        <span class="kt">int</span> <span class="n">curr</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="na">charAt</span><span class="o">(</span><span class="n">i</span><span class="o">);</span>
        <span class="k">if</span> <span class="o">(</span><span class="n">prev</span> <span class="o">&gt;</span> <span class="n">curr</span><span class="o">)</span> <span class="o">{</span>
            <span class="k">return</span> <span class="kc">false</span><span class="o">;</span>
        <span class="o">}</span>
    <span class="o">}</span>
    <span class="k">return</span> <span class="kc">true</span><span class="o">;</span>
<span class="o">}</span>
</code></pre></div></div>
<p>But what is the effect of this improvement on the overall run time of the
code? Clearly we gain something, but does it improve the asymptotic time
complexity?</p>
<p>To answer this question, we need to analyze the <em>amortized</em> run time
of the improved <code class="language-plaintext highlighter-rouge">IsInOrder</code>.
That is, instead of analyzing a call to <code class="language-plaintext highlighter-rouge">IsInOrder</code> in isolation and deriving
a pessimistic bound of <script type="math/tex">O(k)</script>, let’s consider how much time it takes to
execute all calls to <code class="language-plaintext highlighter-rouge">IsInOrder</code> and note that for some strings the
function will exit much sooner than for others. In particular, the
function will go through the first step of the loop (<code class="language-plaintext highlighter-rouge">i=1</code>) for all strings,
but it will go through the second step (<code class="language-plaintext highlighter-rouge">i=2</code>) only for strings with the
first two positions in order, and it will go through the last step (<code class="language-plaintext highlighter-rouge">i=k-1</code>)
only for strings with the first <script type="math/tex">k-1</script> positions in order.</p>
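<p>To make this concrete, the following sketch counts by brute force how many strings reach each loop step of the early-exit check. The names <code class="language-plaintext highlighter-rouge">EarlyExitSteps</code> and <code class="language-plaintext highlighter-rouge">stringsReachingStep</code> are hypothetical, and letters are again modeled as integers rather than characters.</p>

```java
final class EarlyExitSteps {
    // reached[i] = number of strings (letters 0..n-1, length k) for which the
    // early-exit loop executes step i, for 1 <= i < k. Index 0 is unused.
    static long[] stringsReachingStep(int n, int k) {
        long[] reached = new long[Math.max(k, 1)];
        fill(n, k, new int[k], 0, reached);
        return reached;
    }

    private static void fill(int n, int k, int[] s, int pos, long[] reached) {
        if (pos == k) {
            for (int i = 1; i < k; i++) {
                reached[i]++;               // step i is executed for this string
                if (s[i - 1] > s[i]) {
                    return;                 // early exit: later steps are skipped
                }
            }
            return;
        }
        for (int c = 0; c < n; c++) {
            s[pos] = c;
            fill(n, k, s, pos + 1, reached);
        }
    }

    public static void main(String[] args) {
        long[] reached = stringsReachingStep(3, 4);
        for (int i = 1; i < reached.length; i++) {
            System.out.println("step i=" + i + ": " + reached[i] + " strings");
        }
    }
}
```

<p>With a 3-letter alphabet and strings of length 4, step 1 is reached by all 81 strings, step 2 by 54, and step 3 by only 30.</p>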
<p>How many strings are there with the first <script type="math/tex">j</script> positions in order, for
<script type="math/tex">1\leq j \leq k</script>? Calculating this requires knowing a bit of combinatorics.
Without going into details, the number of such strings is</p>
<script type="math/tex; mode=display">n^{k-j} {n+j-1 \choose j}</script>
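<p>Hence, summing over the loop steps, the total run time of all calls to <code class="language-plaintext highlighter-rouge">IsInOrder</code> can be expressed as</p>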
<script type="math/tex; mode=display">% <![CDATA[
T(n, k) := \sum_{1\leq j<k} n^{k-j} {n+j-1 \choose j} %]]></script>
<p>and by dividing by the number of strings, <script type="math/tex">n^k</script>, we get the amortized run time of <code class="language-plaintext highlighter-rouge">IsInOrder</code>
expressed as</p>
<script type="math/tex; mode=display">% <![CDATA[
S(n, k) := \sum_{1\leq j<k} n^{-j} {n+j-1 \choose j} %]]></script>
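<p>The two sums can be checked against a brute-force count for small parameters. The sketch below is only illustrative: the class <code class="language-plaintext highlighter-rouge">AmortizedSteps</code> and its method names are assumptions, and the exact <code class="language-plaintext highlighter-rouge">long</code> arithmetic only works for small <script type="math/tex">n</script> and <script type="math/tex">k</script>.</p>

```java
final class AmortizedSteps {
    // Binomial coefficient C(m, r) in exact long arithmetic (small inputs only).
    static long binomial(int m, int r) {
        long result = 1;
        for (int i = 1; i <= r; i++) {
            result = result * (m - r + i) / i;   // integral after every step
        }
        return result;
    }

    // T(n, k) = sum over 1 <= j < k of n^(k-j) * C(n+j-1, j).
    static long totalSteps(int n, int k) {
        long total = 0;
        for (int j = 1; j < k; j++) {
            long power = 1;
            for (int e = 0; e < k - j; e++) {
                power *= n;
            }
            total += power * binomial(n + j - 1, j);
        }
        return total;
    }

    // S(n, k) = T(n, k) / n^k, the average number of loop steps per string.
    static double averageSteps(int n, int k) {
        return totalSteps(n, k) / Math.pow(n, k);
    }

    // Brute force: run the early-exit loop on every string and count the steps.
    static long bruteForceSteps(int n, int k) {
        long total = 0;
        int[] s = new int[k];
        long count = (long) Math.pow(n, k);
        for (long code = 0; code < count; code++) {
            long rest = code;
            for (int p = 0; p < k; p++) {        // decode `code` as k base-n digits
                s[p] = (int) (rest % n);
                rest /= n;
            }
            for (int i = 1; i < k; i++) {
                total++;
                if (s[i - 1] > s[i]) {
                    break;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("formula:     " + totalSteps(3, 4));
        System.out.println("brute force: " + bruteForceSteps(3, 4));
        System.out.println("average:     " + averageSteps(3, 4));
    }
}
```

<p>For <script type="math/tex">n=3</script> and <script type="math/tex">k=4</script>, both the formula and the brute-force count give 165 loop steps in total, or about 2.04 steps per string.</p>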
<p>The sum may look scary at first, but there’s a way to deal with it.
We can note experimentally that for a fixed value of <script type="math/tex">n</script>, say <script type="math/tex">n=26</script>,
the sum seems to converge to a constant as we increase <script type="math/tex">k</script>. Check it out
on <a href="https://www.wolframalpha.com/input/?i=N%5BSum%5B26%5E%7B-j%7D+Binomial%5B26%2Bj-1%2C+j%5D%2C+%7Bj%2C+1%2C+100%7D%5D%2C+5%5D">Wolfram Alpha</a>!
As you change the upper bound of the sum from 100 to 1,000 and more, the
value stays fixed at 1.7725. Now go ahead and check the value of <a href="https://www.wolframalpha.com/input/?i=N[Sqrt[Pi]%2C+5]"><script type="math/tex">\sqrt{\pi}</script></a>.
It is 1.7725 as well! Have we experimentally stumbled on a surprising
identity <script type="math/tex">\lim_{k\to\infty} S(26,k)=\sqrt{\pi}</script>? Well… no. This
turns out to be just a coincidence. I’ve deliberately rounded both
numbers to 5 digits. As soon as you increase the rounding to 6 digits,
you’ll see a difference.</p>
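<p>If you prefer not to rely on Wolfram Alpha, the same experiment can be sketched in a few lines of Java. The class <code class="language-plaintext highlighter-rouge">SeriesLimit</code> and the method <code class="language-plaintext highlighter-rouge">partialSum</code> are hypothetical names; the sum is accumulated in double precision, so the digits are approximate.</p>

```java
final class SeriesLimit {
    // Partial sum S(n, k) = sum over 1 <= j < k of n^(-j) * C(n+j-1, j),
    // accumulated in floating point via the ratio of consecutive terms:
    // term(j) = term(j-1) * (n + j - 1) / (j * n), starting from term(0) = 1.
    static double partialSum(int n, int k) {
        double term = 1.0;
        double sum = 0.0;
        for (int j = 1; j < k; j++) {
            term *= (n + j - 1) / ((double) j * n);
            sum += term;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int k : new int[] {10, 100, 1000}) {
            System.out.printf("S(26, %d) = %.10f%n", k, partialSum(26, k));
        }
        System.out.printf("sqrt(pi) = %.10f%n", Math.sqrt(Math.PI));
    }
}
```

<p>Printing ten decimal places makes the coincidence visible: the two values agree when rounded to four decimal places and part ways right after.</p>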
<p>Anyway, although <script type="math/tex">S(26, k)</script> doesn’t converge to <script type="math/tex">\sqrt{\pi}</script>,
it does seem to converge to a constant. To show this for arbitrary <script type="math/tex">n</script>,
since <script type="math/tex">S(n,k)</script> increases as we increase <script type="math/tex">k</script>, it suffices to show that
<script type="math/tex">S(n,k)</script> is upper-bounded by a constant. And to show the latter,
we can somewhat counterintuitively relax the sum by adding infinitely many additional terms.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
S(n,k) &\leq \sum_{j\geq 1} n^{-j} {n+j-1 \choose j} \\
S(n,k) + 1 &\leq \sum_{j\geq 0} n^{-j} {n+j-1 \choose j}
\end{align*} %]]></script>
<p>By the <a href="https://en.wikipedia.org/wiki/Ratio_test">ratio test</a>, the
latter series
converges for <script type="math/tex">n>1</script>. And to find out what it converges to, we can
use the identity</p>
<script type="math/tex; mode=display">{n+j-1 \choose j} = (-1)^j {-n \choose j}</script>
<p>to recognize the series as a <a href="https://en.wikipedia.org/wiki/Binomial_series">binomial series</a>.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\sum_{j\geq 0} n^{-j} {n+j-1 \choose j}
&= \sum_{j\geq 0} (-1/n)^{j} {-n \choose j} \\
&= (1-1/n)^{-n}
\end{align*} %]]></script>
<p>Plugging the result back into the above “relaxation,” we get</p>
<script type="math/tex; mode=display">S(n,k) \leq (1-1/n)^{-n} - 1</script>
<p>The expression <script type="math/tex">(1-1/n)^{-n}</script> is interesting. For <script type="math/tex">n=2</script>,
it evaluates to 4, and it decreases as we increase <script type="math/tex">n</script>
(in fact, it converges to <script type="math/tex">e</script>).
Hence, <script type="math/tex">S(n, k)\leq 3</script> for every <script type="math/tex">n\geq 2</script>, which means that instead of <script type="math/tex">O(k)</script>, the
improved function <code class="language-plaintext highlighter-rouge">IsInOrder</code> runs in <strong>constant</strong> amortized time.
Let’s try to appreciate what this means: no matter how long the strings
are, on average the loop in the improved <code class="language-plaintext highlighter-rouge">IsInOrder</code> takes at most 3 steps.
Even better, on average, the loop takes at most <script type="math/tex">(1-1/n)^{-n} - 1</script>
steps, which is 1.7725 (almost <script type="math/tex">\sqrt{\pi}</script>) for the English alphabet (<script type="math/tex">n=26</script>).
And overall, instead of <script type="math/tex">O(kn^k)</script>, the run time of the improved code is
<script type="math/tex">O(n^k)</script>.</p>
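<p>The bound can also be checked numerically. The sketch below compares the partial sums with the closed form for a few alphabet sizes; <code class="language-plaintext highlighter-rouge">AmortizedBound</code>, <code class="language-plaintext highlighter-rouge">bound</code>, and <code class="language-plaintext highlighter-rouge">partialSum</code> are illustrative names, and since everything is computed in double precision, comparisons should allow a small tolerance.</p>

```java
final class AmortizedBound {
    // Closed-form upper bound (1 - 1/n)^(-n) - 1 on S(n, k), valid for n > 1.
    static double bound(int n) {
        return Math.pow(1.0 - 1.0 / n, -n) - 1.0;
    }

    // Partial sum S(n, k) = sum over 1 <= j < k of n^(-j) * C(n+j-1, j),
    // built up from the ratio of consecutive terms (floating point).
    static double partialSum(int n, int k) {
        double term = 1.0;
        double sum = 0.0;
        for (int j = 1; j < k; j++) {
            term *= (n + j - 1) / ((double) j * n);
            sum += term;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 3, 10, 26, 1000}) {
            System.out.printf("n = %4d  S(n, 200) = %.6f  bound = %.6f%n",
                    n, partialSum(n, 200), bound(n));
        }
    }
}
```

<p>For <script type="math/tex">n=2</script> the bound is exactly 3, and it shrinks toward <script type="math/tex">e-1</script> as the alphabet grows.</p>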
<!--
Note that kramdown doesn't work well with MathJax 3 since it produces old
jsMath script tags that are no longer supported in MathJax 3. The code below is
taken from
https://github.com/gettalong/kramdown/issues/626
It configures a render action that takes care of the script tags. Hopefully
this will eventually be handled in a better way.
-->
<script>
  MathJax = {
    options: {
      renderActions: {
        find: [10, function (doc) {
          for (const node of document.querySelectorAll('script[type^="math/tex"]')) {
            const display = !!node.type.match(/; *mode=display/);
            const math = new doc.options.MathItem(node.textContent, doc.inputJax[0], display);
            const text = document.createTextNode('');
            node.parentNode.replaceChild(text, node);
            math.start = {node: text, delim: '', n: 0};
            math.end = {node: text, delim: '', n: 0};
            doc.math.push(math);
          }
        }, '']
      }
    }
  };
</script>
<script type="text/javascript" id="MathJax-script" async="" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js">
</script>
<p>New Website. Filip Nikšić, 2020-03-03. https://fniksic.github.io/blog/2020/03/03/new-website</p>
<p>After several months of putting it off, I have finally retired my old website
and replaced it with a shiny new one. And it was about time. I created the old
website when I started my PhD at the Max Planck Institute for Software Systems,
which was in September 2012. Keeping in mind how fast things change on the
Internet, a face-lift after more than seven years seemed to be long overdue.</p>
<p>Not that I was unhappy with the old website. I liked it a lot! I was proud of
its minimalist design, gray headings combined with light-blue
hyperlinks—a
combination of colors I find aesthetically very pleasing—and its minimal
content contained within a single page. I thought it looked elegant, and I was
not alone: at least ten people at the MPI-SWS and Penn liked it enough to
use it as a template for their own personal web pages.</p>
<figure>
<img src="/assets/images/old-website.png" alt="Screenshot of the old website" />
<figcaption>Screenshot of the old website</figcaption>
</figure>
<p>Nevertheless, the website had its shortcomings. For one,
it wasn’t mobile-friendly. At the time it was created,
smartphones were not so ubiquitous, and I was completely unaware
of concepts like mobile-first design. But one can certainly live with their
page being rendered poorly on mobile. More importantly, I’ve always wanted
to have a blog, and integrating one with the old website would have required
effort which I was never really inclined to expend.</p>
<p>The new website has a blog (you’re reading the first post! :open_mouth:),
and it looks great on screens big and small. It is statically generated
using <a href="https://jekyllrb.com/">Jekyll</a>, and it is
hosted on <a href="https://pages.github.com/">GitHub Pages</a>. I decided to go with
a statically generated site because I felt it was perfectly suited
for a personal website with a blog: it is simple, secure, and there’s no
overhead of running a server with a database. Out of several static generators,
I chose Jekyll mainly because of its maturity, large user base, well-written
documentation, and shareable themes. As a bonus, Jekyll is natively supported
by GitHub Pages, so publishing a blog post is as simple as writing a
Markdown-formatted file and pushing it to my GitHub repository.</p>
<p>Choosing a theme for the website was not so simple. I went through
hundreds of Jekyll themes available
online in search of a clean, elegant design: a theme that would be well-suited
for an academic who needs to put forward a short summary of their research
and a list of publications. In the end I chose <a href="https://mademistakes.com/work/minimal-mistakes-jekyll-theme/">Minimal Mistakes</a>,
a theme created by <a href="https://mademistakes.com/">Michael Rose</a>. There is also a
theme called <a href="https://github.com/academicpages/academicpages.github.io">Academicpages</a>,
which is based on Minimal Mistakes and geared towards academics, but I felt
it was a bit too extreme in trying to make every aspect (publications, talks,
teaching materials) data-driven and generated from data files.
Maybe I’ll regret this decision once my number of publications reaches 100. :thinking:</p>
<p>One thing that doesn’t play well with statically generated blogs is comments.
By their nature, comments are dynamically generated by the readers as they
interact with the blog post. While there
are solutions for integrating comments into static blogs, all of the solutions I’ve
seen so far have drawbacks. For instance, a very popular solution is <a href="https://disqus.com/">Disqus</a>. The idea behind it is that you outsource
commenting to a third-party service: Disqus provides the comments section
for you to embed into your pages. It is simple, but with the simplicity you also
get their ads, user tracking, and security vulnerabilities, and they get to keep
your readers’ comments on their servers and use them for any purpose they see
fit. On top of that, the comment form that goes on your pages is incredibly
ugly!</p>
<p>For websites hosted on GitHub Pages, there are solutions like <a href="https://utteranc.es/">utterances</a>
and <a href="https://staticman.net/">Staticman</a>. Both use the GitHub
API to store comments on GitHub itself. Staticman turns a submitted comment
into a GitHub pull request that adds a file containing the comment directly
into the repository; the file is then processed by Jekyll and integrated into
the corresponding blog post. With utterances, the comment is instead posted to
a GitHub issue associated with the blog post.
The idea is pretty neat, and it doesn’t suffer from any of Disqus’s drawbacks.
However, it assumes that there is a principal authenticated
with GitHub who can actually submit the comment. With utterances, the
principal is the reader submitting the comment: it is assumed that the
reader is also a GitHub user logged into their account. With Staticman,
the principal is an external service holding a GitHub authentication token.
This service processes the reader’s comment and issues a pull request, so the
reader doesn’t have to be a GitHub user. But the service needs to be hosted
somewhere. Staticman hosts a public instance of the service, but due to increased
popularity, the instance quickly reaches GitHub’s limit on the number of
pull requests it can issue. So it seems that hosting a private Staticman instance is
the only viable option.</p>
<p>Of all the options, I think I like Staticman the most. I certainly don’t want
Disqus’s ads and trackers on my website, and I don’t want to require the rare
soul who stumbles upon my blog to be a GitHub user. On the other hand, it seems
that running a private Staticman instance on a platform like <a href="https://www.heroku.com/">Heroku</a>
shouldn’t be a big deal. The Staticman service is actually a perfect use case
for serverless computing provided by systems like <a href="https://aws.amazon.com/lambda/">AWS Lambda</a>, but so far I haven’t seen a good solution along those
lines. Perhaps I should implement one myself. Either way, I will try to enable
comments as soon as possible.</p>
<p>With the technology in place, what am I going to blog about? Well, I don’t want
to immediately limit the scope of the blog, but I imagine it’s mostly going to
be about math, programming languages, and related topics. For instance, I’m a
huge fan of math problems and puzzles, and over the years I’ve seen quite a few
that are interesting, unusual, or insightful. Also, sometimes in my research I
discover little mathematical bits or solve practical problems that are not
important enough to make it into the papers, but they are still interesting.
This blog will be an outlet for such topics.</p>
<p>Anyway, knowing myself, there’s a high probability that my enthusiasm will
quickly drop and I will lose motivation to maintain the blog. Let’s see how
long I can keep it going. Hopefully this post will not be the last one. :slightly_smiling_face:</p>