Advanced PHP Practice Interview Questions And Answers Part 2
This is a continuation of the practice PHP interview question from part 1. Here is the additional part of the question that we want you to try to answer:
The code should also be able to handle the URL in the src attribute of an iframe.
And, here is the original question for your convenience:
Write a PHP script to report the total download size of any URL. You may not use any 3rd-party code that performs the entire task described below.
No HTML interface is necessary for this exercise; you can write this as a command-line
script that accepts the URL as an argument.
For a single-file resource such as an image or SWF, the script would
simply report on the total size of the document.
For a complex resource such as an HTML document, the script would need
to parse it to find references to embedded, included resources:
javascript files, CSS files, iframes, etc.
The goal of this exercise is to output the following information for a given URL:
– total number of HTTP requests
– total download size for all requests
How to handle an iframe src
The second part of the question states that we will need to be able to handle the URL in the src attribute of an iframe tag. What exactly does that mean? Well, the src attribute of an iframe tag points to another HTML page. When an iframe tag is used in a page it’s like embedding another HTML page within that page. And since the whole point of this exercise is to find the number of HTTP requests being made along with the total download size for all requests, we basically have to follow the iframe src URL ourselves and figure out how many new HTTP requests are created from that URL and what their total download size will be.
For example, we actually use an iframe tag on this page to embed the Facebook likebox, which you can see on the bottom of the left hand sidebar. This is what our iframe tag looks like (you can also see this if you “View Source” for this page) – note that the iframe src actually points to a php page called “likebox.php”:
<iframe src="http://www.facebook.com/plugins/likebox.php?href=http%3A%2F%2Fwww.facebook.com%2Fpages%2FProgrammer-Interview%2F120896424636091&width=238&colorscheme=light&show_faces=false&stream=false&header=true&height=62" scrolling="no" frameborder="0" style="border:none; overflow:hidden; width:230px; height:70px;" allowTransparency="true"></iframe>
You can see the result on this page itself when the iframe is actually rendered – there is a like button, a count of likes, some text, and an image of a nerd. The like button and the nerd will be 2 separate HTTP requests. The iframe src itself counts as an HTTP request as well – because the browser will have to make a request for whatever URL the iframe src points to.
View source does not show you markup generated by iframe
But when we read the HTML of the page, we will only see the iframe tag; we will not see the markup that the iframe loads. This is an important point to understand, and you can confirm it by doing a View Source on this page. Even if we tell cURL to retrieve the page for us, the HTML returned will have the iframe tag in its original form, not in its rendered form. For that reason, if we want to find out the number of HTTP requests that the iframe will generate, we have to take the URL in the src attribute itself and evaluate it just like we did for the original URL.
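To make this concrete, here is a minimal sketch of pulling the iframe src values out of a string of fetched HTML. It uses PHP's built-in DOMDocument rather than the simple_html_dom library used later in this article, and the function name find_iframe_srcs and the example.com markup are made up for illustration:

```php
<?php
// Hypothetical helper (not part of the article's script): returns the src
// of every iframe tag found in a string of HTML.
function find_iframe_srcs(string $html): array
{
    $doc = new DOMDocument();
    // Suppress the warnings that real-world, non-validating markup triggers.
    @$doc->loadHTML($html);

    $srcs = [];
    foreach ($doc->getElementsByTagName('iframe') as $iframe) {
        $src = $iframe->getAttribute('src');
        if ($src !== '') {
            $srcs[] = $src;
        }
    }
    return $srcs;
}

// Made-up markup standing in for what cURL would return for a page.
$page = '<html><body><p>Hello</p>'
      . '<iframe src="http://example.com/widget.php?id=1" scrolling="no"></iframe>'
      . '</body></html>';

print_r(find_iframe_srcs($page));
```

The fetched markup contains only the tag and its src. Nothing that the iframe document itself would load appears in it, so each src has to be requested separately.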
Because the iframe's content has to be fetched separately, we will essentially have to re-use, for the document that the iframe src points to, the same code that we used to find the requests and download size for the top-level (containing) document. Think about that for a second and see if you can come up with a good approach to solve that problem on your own.
It turns out that we can use recursion: we call our existing code again and pass in the URL value from the iframe src attribute. This way we find the number of HTTP requests and the total download size that come from the iframe src URL simply by re-using the code that we have already written.
In order to use recursion here, we should have our code wrapped inside a function. So, with that in mind we create the function below that we call start – note that we deliberately left out the code which is used to find images/css/javascript since we just want to focus on the iframe piece and making a recursive call.
Using recursion to answer PHP Interview question part 2
So, we can just make a recursive call to the start function inside the iframe loop, as shown below:
function start($URL){
    if (!check_if_html($URL)) {
        $totalSize = get_remote_file_size($URL);
        echo "Final Total Download Size: $totalSize Bytes ";
        $totalNumResources += 1; // single resource is an HTTP request
        echo "Final total HTTP requests: $totalNumResources";
        return;
    }
    /* at this point we know we are dealing with an HTML document,
       which also counts as a resource, so increment the
       $totalNumResources variable by 1: */
    $html = file_get_html($URL);
    $totalNumResources += 1;
    foreach ($html->find('iframe') as $element) {
        echo "IFRAME: " . $element->src . "\n";
        start($element->src); // the recursive call
    }
} // closing brace for "start" function
But, wait a second. What will happen to the $totalSize and $totalNumResources variables? Well, with the implementation above they are local to each call of start: the recursive call begins with its own fresh copies, and when it returns, the caller's values are exactly what they were before the call. Whatever the recursive call counted is lost. That is not what we want. We want to count the HTTP requests and the download size that are added by the iframe. Remember, we want a cumulative sum of the HTTP requests and download size, including whatever resources are added to the page by the iframe.
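The effect is easy to reproduce without any HTTP requests. The sketch below is an illustration only: count_without_passing is a made-up function that mimics the counting in start, and unlike the code above it initializes its counter explicitly.

```php
<?php
// Hypothetical function, for illustration only: it mimics the counting in
// start() without making any HTTP requests.
function count_without_passing(int $depth): int
{
    $count = 0;  // a brand-new local variable on every call
    $count += 1; // count "this document"
    if ($depth > 0) {
        // The nested call does its own counting, but its result is
        // discarded, so nothing it counted reaches this call's $count.
        count_without_passing($depth - 1);
    }
    return $count;
}

echo count_without_passing(3), "\n"; // prints 1, not 4
```

Four calls are made in total, yet the outermost call only ever sees its own increment.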
Saving the values of the PHP variables
So, there clearly needs to be a way to save the value of those variables while the recursive call is made – so that the recursive call can just add on to those values. The way to do this is to pass in the values of the $totalSize and $totalNumResources variables into the recursive call to the start function – so the start function will now have to be modified so it can accept two extra parameters.
But just passing the variables $totalSize and $totalNumResources into the recursive call is not enough. We also need to return those variables from the function itself. If those values are incremented during the recursive call, we need to be sure to retain the modified values even after the recursive call is over. So, we will have to use the code below; note the two extra parameters, the list() assignment around the recursive call, and the return statements:
function start($URL, $totalSize, $totalNumResources){
    if (!check_if_html($URL)) {
        // a single resource is one HTTP request; add it to the running totals
        $totalSize += get_remote_file_size($URL);
        $totalNumResources += 1;
        return array($totalSize, $totalNumResources);
    }
    /* at this point we know we are dealing with an HTML document,
       which also counts as a resource, so increment the
       $totalNumResources variable by 1 */
    $html = file_get_html($URL);
    $totalNumResources += 1;
    foreach ($html->find('iframe') as $element) {
        echo "IFRAME: " . $element->src . "\n";
        list($totalSize, $totalNumResources) = start($element->src, $totalSize, $totalNumResources);
    }
    return array($totalSize, $totalNumResources);
} // closing brace for 'start' function
Note that we use PHP's list() construct to unpack the array that start returns back into the $totalSize and $totalNumResources variables once the recursive call is over.
Another interesting thing to point out in the code above is that we do not have any code inside the iframe foreach loop that increments $totalNumResources by 1. This is because during the recursive call the $totalNumResources variable will be incremented by 1 anyway, since the iframe URL is counted as a separate HTML document.
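The pass-in-and-return pattern can be tried out without touching the network. In the sketch below, the $pages array and the tally function are made up for illustration: the array stands in for real URLs, and tally plays the role of start. As an assumption of this sketch, it also adds each document's own size to the total.

```php
<?php
// Made-up data standing in for the network: each URL maps to its size in
// bytes and to the iframe src URLs found in it.
$pages = [
    'http://example.com/'      => ['size' => 1000, 'iframes' => ['http://example.com/frame']],
    'http://example.com/frame' => ['size' => 400,  'iframes' => ['http://example.com/inner']],
    'http://example.com/inner' => ['size' => 50,   'iframes' => []],
];

// Hypothetical stand-in for start(): same idea, no HTTP requests.
function tally(array $pages, string $url, int $totalSize, int $totalNumResources): array
{
    $totalSize += $pages[$url]['size'];
    $totalNumResources += 1; // the document itself is one request

    foreach ($pages[$url]['iframes'] as $src) {
        // No extra increment here: the recursive call counts the iframe document.
        list($totalSize, $totalNumResources) = tally($pages, $src, $totalSize, $totalNumResources);
    }
    return array($totalSize, $totalNumResources);
}

list($size, $requests) = tally($pages, 'http://example.com/', 0, 0);
echo "size: $size, requests: $requests\n"; // size: 1450, requests: 3
```

Each nested document is counted exactly once, by the call that handles it, and the running totals survive because every call hands them back to its caller.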
The final answer to Advanced PHP Interview Question Part 2
Now, here is what the complete PHP code looks like which includes the recursive call to the start function, and is our final answer to part 2 of the PHP interview questions:
<?php
include('simple_html_dom.php');

$URL = $argv[1];
$totalSize = 0;
$totalNumResources = 0;

list($totalSize, $totalNumResources) = start($URL, $totalSize, $totalNumResources);
echo "Final total download size: $totalSize Bytes\n";
echo "Final total HTTP requests: $totalNumResources\n";

function start($URL, $totalSize, $totalNumResources){
    if (!check_if_html($URL)) {
        // a single resource is one HTTP request; add it to the running totals
        $totalSize += get_remote_file_size($URL);
        $totalNumResources += 1;
        return array($totalSize, $totalNumResources);
    }

    $html = file_get_html($URL);
    // the HTML document itself counts as one HTTP request
    $totalNumResources += 1;

    // find all images
    foreach ($html->find('img') as $element) {
        $totalSize += get_remote_file_size($element->src);
        $totalNumResources += 1;
    }

    // find all css: only count the link tags with '.css' in the href
    foreach ($html->find('link') as $element) {
        if (strpos($element->href, '.css') !== false) {
            $totalSize += get_remote_file_size($element->href);
            $totalNumResources += 1;
        }
    }

    // find all javascript files
    foreach ($html->find('script') as $element) {
        if (strpos($element->src, '.js') !== false) {
            $totalSize += get_remote_file_size($element->src);
            $totalNumResources += 1;
        }
    }

    foreach ($html->find('iframe') as $element) {
        /* Don't count the iframe as a request here: its src is an HTML
           document, so the recursive call below counts it. */
        list($totalSize, $totalNumResources) = start($element->src, $totalSize, $totalNumResources);
    }

    return array($totalSize, $totalNumResources);
} // closing brace for the function "start"

function get_remote_file_size($url) {
    $headers = get_headers($url, 1);
    if (isset($headers['Content-Length'])) return $headers['Content-Length'];
    // this one checks for a lower case "l" in Content-length (different from above)
    if (isset($headers['Content-length'])) return $headers['Content-length'];

    // no Content-Length header: download the resource and measure it
    $c = curl_init();
    curl_setopt_array($c, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => array('User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3'),
    ));
    curl_exec($c);
    $size = curl_getinfo($c, CURLINFO_SIZE_DOWNLOAD);
    curl_close($c); // close the handle before returning
    return $size;
}

function check_if_html($url){
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, TRUE);
    curl_setopt($ch, CURLOPT_NOBODY, TRUE); // headers only, no body
    $data = curl_exec($ch);
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);
    if (strpos($contentType, 'text/html') !== false)
        return TRUE; // this is HTML, yes!
    else
        return FALSE;
}
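The two isset checks in get_remote_file_size are there because servers do not all capitalize the Content-Length header name the same way. One possible alternative, sketched here as an assumption rather than as part of the answer above, is to lower-case the header names before looking the value up. The helper name content_length_from and the sample arrays are made up:

```php
<?php
// Hypothetical helper: looks up Content-Length regardless of how the
// server capitalized the header name. Returns null when it is absent.
function content_length_from(array $headers): ?int
{
    $lower = array_change_key_case($headers, CASE_LOWER);
    if (!isset($lower['content-length'])) {
        return null;
    }
    $value = $lower['content-length'];
    // When a header occurs more than once (for example after a redirect),
    // get_headers($url, 1) can return an array of values; this sketch
    // takes the last one.
    if (is_array($value)) {
        $value = end($value);
    }
    return (int) $value;
}

// Made-up arrays shaped like the result of get_headers($url, 1).
var_dump(content_length_from(['Content-Length' => '2048']));    // int(2048)
var_dump(content_length_from(['content-length' => '512']));     // int(512)
var_dump(content_length_from(['Content-Type' => 'text/html'])); // NULL
```

A null result would correspond to the case where the script above falls back to downloading the resource with cURL.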
Feel free to add your comments below if you see anything wrong or have any questions. Note that the code above is meant to run from a command line, and you may see some PHP warnings depending on which site you are trying to analyze with the code.