{"id":849,"date":"2016-06-13T02:08:40","date_gmt":"2016-06-13T02:08:40","guid":{"rendered":"http:\/\/www.themissingdocs.net\/wordpress\/?p=849"},"modified":"2016-06-13T02:08:40","modified_gmt":"2016-06-13T02:08:40","slug":"distributed-code-jam-2016-r2","status":"publish","type":"post","link":"https:\/\/www.themissingdocs.net\/?p=849","title":{"rendered":"Distributed Code Jam 2016 R2"},"content":{"rendered":"<p>15 to the world finals and at the moment the top 12 solved all the problems.\u00a0To get through required a good time and only failing one of the larges, and not the one worth the most points.<\/p>\n<p>I think I could have pretty easily gotten a top 200 placing, with the 2 easiest larges and the other two smalls. \u00a0None of the problems seem &#8216;too&#8217; difficult, but writing the correct solution to 3 larges in the time available is probably a stretch for me.<\/p>\n<p><strong>Q1) Given some inefficient code, write equivalent but efficient code to solve the problem.<\/strong><\/p>\n<p>A1) Like last time the detail is of course in the code given. \u00a0This time it was a bit more deceptive. \u00a0It appeared to be calculating as a sum the all pair product between a set A and a set B, but subtly the central server doesn&#8217;t do any calculating, so its actually the all pair product where the pairs don&#8217;t sum\u00a0to 0 mod 20.<\/p>\n<p>The trick here is to realise the all pair product is sum of all elements in set A times the sum of all elements in set B. \u00a0Removing the pairs which sum to 0 mod 20 is a bit more tricky, but its the product of the sum of the 0 mod 20 in set A and set B plus the product of the sum of the items which are 1 mod 20 in set A and the items which are 19 mod 20 in set B, and so on.<\/p>\n<p>To simplify the code slightly rather than calculating the total and the removing, you can just calculate the sum of items that are x mod 20 in set A and y mod 20 in set B and sum all pair product where x+y mod 20 != 0<\/p>\n<p>All that remains is to solve the problem in a distributed scenario where there are 100 million elements in each of set A and set B. \u00a0Summing things is easily distributed, so each node creates to sum tables, one for each of set A and set B with an entry for each mod 20. \u00a0Then it processes its chunk of the two inputs calculating the mod and summing it into the correct chunk. \u00a0Once done, send both tables to the central server, which sums all the tables together and then performs the final pair product step.<\/p>\n<p>One thing is that the answer can easily be too big for 64 bit, so the problem actually wants it modulo 1 billion and 7. \u00a0As with many problems with this requirement, its important to remember to apply the modulo after every calculation. \u00a0And even though the modulo and all individual inputs are valid for 32 bit, calculating products will have intermediary values which need 64 bit precision.<\/p>\n<p><strong>Q2) Determine the longest prefix of a sequence of brackets which can be extended in to a balanced sequence. \u00a0Return -1 if the entire sequence is already a balanced sequence.<\/strong><\/p>\n<p>A2) So the question is worded a bit more vaguely than above, but if you&#8217;ve had some exposure to bracket systems you&#8217;ll know that the description is just equivalent to any balanced sequence of brackets. \u00a0At first it might seem this problem will be difficult to do distributed because each segment&#8217;s validity clearly depends on the one before, but one of the details of a balanced sequence of brackets is that any sequence of brackets can be made balanced, so long as the running sum of open (+1) and closed (-1) never goes negative. \u00a0Therefore each segment just depends on the running sum of the previous segments.<\/p>\n<p>So each node calculates its total for the local running sum, sends them to the central server which accumulates those running sums and sends back the correct starting total for each node, to that node. \u00a0Now each node can check whether the true running sum goes negative at any point. \u00a0If it does, it\u00a0sends that position to central server, otherwise it sends -1. \u00a0The central server then returns the first non-negative result. \u00a0Otherwise it checks if the grand total is 0, if so it returns -1, otherwise it returns the length of the full data set. \u00a0(This last conditional I think is probably the easiest thing to miss in this problem&#8230;)<\/p>\n<p>Since the data is only 1 character per value, memory use isn&#8217;t a high risk, and the cost per read of value is low enough that you could probably read each value assigned to the node twice without running out of time if you don&#8217;t want to read in to a local array. \u00a0The usual gotchas around splitting data amongst the nodes apply.<\/p>\n<p><strong>Q3) Given a game where you can move left or right\u00a01 unit\u00a0or stay still in any given turn, and the rest of the game map moves down one unit after your move. \u00a0Determine the maximum number of points you can get given any choice of starting locations. \u00a0Each non-wall containing cell is worth between 0 and 9 points which you earn just by entering it.<\/strong><\/p>\n<p>A3) \u00a0The small input can be easily done on a single node, and the solution is a dynamic program on the maximum number of points you can get in total once you arrive at position x. \u00a0From each row from bottom to top you try each non-wall position, then there are 3 possibilities, if the position in front isn&#8217;t a wall max its current value with the sum of the local current value and the value of that cell. \u00a0Then if the right and right forward diagonal aren&#8217;t walls, similar with the right diagonal cell, but also add in the value of the right cell. \u00a0Similarly with left diagonal. \u00a0This is seeded by the value of each non-wall position in the bottom row, and the final result is the maximum value in\u00a0the row above the top of the input. \u00a0This DP is O(1) per cell, so can be completed very quickly in the small input scenario where the maximum size is 1000 by 1000. \u00a0The large input of 30000 by 30000 would be almost processed in time by a single node, if it wasn&#8217;t for the memory limits and the time to read the raw input.<\/p>\n<p>So the question becomes, how to break the problem up to pass the large input across multiple servers. \u00a0Horizontal stripes are a bad idea, because the mathematical operation is a max of sums and it mixes values from a lot of locations so it doesn&#8217;t distribute well trying to chain it with results that arrive after you&#8217;ve started. \u00a0So the solution is vertical stripes. \u00a0However the edges of each row of a stripe are dependent on one cell to the right and left of the stripe of \u00a0the row below. \u00a0If each server was to communicate its edges with every row, that would be 30 thousand messages, which given the estimated 5ms latency per message is far far far too long to run in time.<\/p>\n<p>In order to send less messages each node needs to do more work. \u00a0Because the game\u00a0is limited to one move left or right per row, if we want to only send a message every 300 rows (which brings in a total of half a second expected latency instead of 150 seconds) we need to start with an extra 600 width on our base section, then 598, 596&#8230; \u00a0 This doubles the number of values which has to be read by each server if they would usually be dealing with a 300 wide stripe (and doubles the number of DP calculations, but the reading of the input dominates).<\/p>\n<p>In order to continue after we&#8217;ve done 300 rows, we need to share the correct values for the full set of 300 neighbours on each side &#8211; which we can get from each of the neighbouring servers if we&#8217;ve set it up right&#8230; \u00a0Once we&#8217;ve got that information we can continue for another 300 rows.<\/p>\n<p>So, does this solution scale well enough? \u00a0Estimated latency waiting for network is 500ms. \u00a0Time to ready the values for the stripe and half of each neighbour is 30000*600*0.05us = 0.9 seconds. \u00a0Actual computation time is much lower. \u00a0So 5s should be plenty of time. \u00a0Memory usage is 18MB minimum, but quadruple that if you use 16bit chars and a resizing array without calling reserve. \u00a0Still fits. \u00a0In fact it fits well enough you might as well make your life simpler and just use a full 3 widths allocation even though you&#8217;ll only use half of each of the outside areas.<\/p>\n<p>Each message sent is 1200 bytes, you send up to 100 messages to 2 servers, this isn&#8217;t even a whole megabyte&#8230; \u00a0And the 100 messages is well below the 1000 limit.<\/p>\n<p>Gotchas:<\/p>\n<ol>\n<li>Using your normal logic for distributing the columns to servers can result in\u00a0server with much less than 300 width. \u00a0If your code mistakenly decides to send in height chunks equal to the width allocated to the server, you can easily become too message spammy&#8230; \u00a0 If you don&#8217;t, you need to start requesting results from more than one server over, and again you become too spammy. You want to explicitly allocate exactly 300 main columns to each server to make this simpler. \u00a0If you don&#8217;t use all the servers, that&#8217;s okay, it will still easily run in time. \u00a0More generically (if you don&#8217;t like writing your code to the specific problem constraints&#8230;) you should divide it in to equal chunks of width depending on the longest of the two dimensions.<\/li>\n<li>One server has a different width, so its neighbouring server gets complicated&#8230; \u00a0Solution is to pretend that the total width is actually exactly a multiple of 300, and fill the extra columns with walls. \u00a0Makes the code for reading values in to the array slightly more difficult, but makes everything else much simpler&#8230;<\/li>\n<\/ol>\n<p><strong>Q4) Determine the cheapest way to get from a to b if there is a petrol station every km and you use one litre per km, and have a maximum fuel tank size T. \u00a0Each petrol station has its own price per litre.<\/strong><\/p>\n<p>A4) So I struggled for a while just to get the logic right for the small input, but eventually I realised its actually quite simple. \u00a0For each km, the cost you paid for that km, is the cheapest of the previous T petrol stations. \u00a0So if you keep a running minimum you just add the minimum value for each node as you go.<\/p>\n<p>As I mentioned for the first problem, the sum operation distributes well. \u00a0A running min on the other hand&#8230;<\/p>\n<p>So, the solution is all about calculating a running min. \u00a0If T is small, this is easy, just do it locally. \u00a0More specifically, if T is less than the width you allocate to each node, the problem might as well be solved locally, each node potentially doubles the number of values it has to read in, but that is still easily managed against the time and memory limits with some care. \u00a0Each node calculates its sum and sends it to the central server which creates the grand total.<\/p>\n<p>However there is no such limit on T, it could be very large&#8230; \u00a0However, if T is larger than the width you allocate to each machine, you don&#8217;t have to do a running min on that machine, its just a simple min of the values seen so far. \u00a0Any value added is never removed. \u00a0However you also need to consider the minimum of some of the values allocated to other nodes, and that you do have to remove values from. \u00a0But before we get into that detail.. Consider if T is larger than double the width. \u00a0In that case there is an entire server&#8217;s worth of values which will always be in the running min for any given server other than the first. \u00a0Here we find the opportunity to calculate something distributed which will help the other nodes.<\/p>\n<p>So the first step is for each node to find its minimum price of its allocated section, and broadcast that to every node to the right of it.<\/p>\n<p>So, now for each node the running min consists of 3 (or 4) values of which we take the minimum.<\/p>\n<ol>\n<li>The minimum seen so far in this chunk that we are calculating the sum for.<\/li>\n<li>The minimum of the previous servers which are always completely covered by the fuel tank size T.<\/li>\n<li>The minimum of the previous server which is currently completely covered (but won&#8217;t be later once we&#8217;ve advanced more).<\/li>\n<li>The minimum of the partial server section needed to fill out T.<\/li>\n<\/ol>\n<p>The first two of these are easy, the third is easy if its present. \u00a0The fourth presents a bit of a challenge. \u00a0I propose that the simplest way is for the local server to read the values allocated to that partial server, in reverse order from the end of that server allocation. \u00a0As it finds a new local minimum, it records it and its position. \u00a0Then as you advance pointer you check if the current far minimum is still valid, if not, pop it off the stack and use the one underneath. \u00a0Worst case this stack contains the entire contents of a servers allocation, at 8 bytes a pop, 4 bytes for the minimum (since they all fit under 1 billion) and 4 bytes for the index.<\/p>\n<p>So, does it scale well enough. \u00a0Well the worst case tank size is when it causes some servers to create the stack for almost one entire server, then again for another server because advancing the width just drops over into the next server, and in order to calculate that minimum you have to process the entire server&#8217;s width. \u00a0The processing time itself is cheap, the memory allocations will fit so long as you reuse your reserved size stack when switching servers at the boundary&#8230; (Should be under 80MB if you do it right.) Network latency is not significant&#8230; \u00a0The big concern is the data reading time. \u00a05 million positions allocated per server, reading time 0.75 seconds just for its own. \u00a0Triple that and its getting close to half of the allowed time limit just reading the data.<\/p>\n<p>So, it might be good enough&#8230; but can we do better? \u00a0The answer is kind of yes. \u00a0At the beginning we calculated a single min per server and sent that around. \u00a0If instead we calculate 100 mins per server, we don&#8217;t make calculating the global minimum for the trivially covered section much slower, but it reduces the size of the worst case stack by a factor of 100. \u00a0On the other hand it increases the worst case number of stacks we have to create from 2 to 101, but because those stacks are all 1\/100th the size, we just reduced the overhead from 2 extra servers worth of reading to 1.01 extra servers worth. \u00a0This saves you almost 0.75 seconds of reading time in the worst case. \u00a0The reduction to the worst case stack size also makes it much much easier to stay under the memory limit. \u00a0It increases the network traffic size by a factor of 100, but its still under 100KB sent per node.<\/p>\n<p>Gotchas:<\/p>\n<ol>\n<li>I said it was easy for the case where T is small &#8211; but for T around the size of 1 width its actually a bit tricky. \u00a0Doing a running min the natural data structure is a multiset (an ordered set which allows repetition), but a multiset\u00a0with 5 million entries can have an an unexpectedly large memory cost, well beyond the 40MB you might expect. \u00a0Without adequate knowledge of the implementation details of the multiset included in your programming language, you definitely run risk of running out of ram&#8230; \u00a0This can be solved by the same extension proposed to reduce the read worst size in the very large T case. \u00a0Using the locally calculated 100 mins, the size of T where you need to do a true running min is reduced to only 50thousand elements, which a multiset will easily fit in to memory for.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>15 to the world finals and at the moment the top 12 solved all the problems.\u00a0To get through required a good time and only failing one of the larges, and not the one worth the most points. I think I could have pretty easily gotten a top 200 placing, with the 2 easiest larges and &hellip; <a href=\"https:\/\/www.themissingdocs.net\/?p=849\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Distributed Code Jam 2016 R2<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-849","post","type-post","status-publish","format-standard","hentry","category-code-competitions"],"_links":{"self":[{"href":"https:\/\/www.themissingdocs.net\/index.php?rest_route=\/wp\/v2\/posts\/849","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.themissingdocs.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.themissingdocs.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.themissingdocs.net\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.themissingdocs.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=849"}],"version-history":[{"count":0,"href":"https:\/\/www.themissingdocs.net\/index.php?rest_route=\/wp\/v2\/posts\/849\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.themissingdocs.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=849"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.themissingdocs.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=849"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.themissingdocs.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=849"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}