Print All Links Solution – Intro to Computer Science

So here’s the code that we need to finish. We need a test condition for the while, and in this case we really want to keep going until we’re done. So, we’re going to use while True and then use break to stop the loop. The test condition is True, and the way we know we’re done is when the value returned as the URL is None. That means we got to the else, so to finish, we need to complete the else block by using break.

Now let’s test our code. We’ll call print_all_links with our test string that has test 1, test 2, and test 3 as the 3 links. And when we run this, what we see is it prints out the 3 links, test 1, test 2, and test 3.

For a more interesting example, let’s go back to the web page we looked at earlier. This was the page with the flying Python comic. It has many links in it, and when we did view source we could see what those links look like. If we use Command-F [or Ctrl-F], we can search for the first link on the page. The first one is the link to Archive, and that corresponds to what we see here at the top of the page. Here’s the link to Archive, the link to Blag, the link to the Store, and About and Forums. So there are a lot of links on this page.

If we go back to our print_all_links code, we can try to print them all. First, let’s look at what we actually see when we call get_page, passing in the URL xkcd.com/353, which is the page we were looking at. When we run this, we see a lot of text printed out, and that’s exactly what we saw from view source, but now that’s what we’re getting as the result from get_page. That’s the text of the web page. Not very easy to look at. Instead of printing the whole page out, we’ll print all the links on the page. Now we’re using our print_all_links procedure to print all the links in the xkcd page.
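
The while True / break loop described at the start of this segment can be sketched roughly as follows. The transcript does not show the code itself, so this is a reconstruction under assumptions: the body of get_next_target, the exact test string, and the Python 3 print() call are all guesses for illustration, not the course's verbatim code.

```python
def get_next_target(page):
    """Return (url, end_quote) for the first link in page, or (None, 0).

    Assumed helper: the transcript only says it returns None as the URL
    when no more links are found.
    """
    start_link = page.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote


def print_all_links(page):
    # Loop "forever"; the else branch is the only way out.
    while True:
        url, endpos = get_next_target(page)
        if url:
            print(url)
            # Continue searching in the rest of the page.
            page = page[endpos:]
        else:
            # No URL was found, so we are done.
            break


# Hypothetical test string with three links.
test_page = ('this <a href="test1">link 1</a> is '
             '<a href="test2">link 2</a> a <a href="test3">link 3</a>')
print_all_links(test_page)
# prints test1, test2, test3 on separate lines
```
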
And when we run this, we indeed see all the links on the page. At least we see most of them. There are a few that we missed, and we’ll talk about that in a later unit, but we’re seeing many of the links on the page, including the first ones that we saw with the Archive and the Blag and the Store link, and we’re seeing lots of other links. And you can see that we go all the way to the Buttercup Festival link and the license link. Those are the 2 at the bottom of the page. This is the link to license, which was the last one that we printed out.

So, congratulations are in order. You’ve made it to the end of Unit 2, and I hope you understand the main concepts that we’ve seen so far. In Unit 1 we saw variables. We learned about programs. We saw how to do arithmetic. In Unit 2 we learned about making procedures, and we learned how to use if to make decisions, and we saw that those by themselves are enough to write every possible computer program, as Alan Turing showed in the 1930s. We also learned about while loops, a way to make it more convenient to do things over again.

And now we’ve actually got a really good start on our search engine. We can print all the links on the page. We still don’t have our web crawler. We need to actually collect those links and do something with them. That’s what we’re going to do in Unit 3, and then in Units 4, 5, and 6, we’ll see how to use the corpus that we’ve built to do useful web searches. We’ve come a long way, and I hope we’ll see you back soon for Unit 3.

5 Comments

  • everdimension

    you can also use "while page"
    So the loop will run as long as there are any symbols left in the "page" string. It is almost the same as writing "True" with one exception – if the position of the last quote is the last symbol of "page" string, then the procedure will stop executing one cycle earlier, just after reaching "while page" and before reaching "else: break" :)))

    The quiz accepted this answer.

  • Richard Ouma

    The get_page() function as described is old and no longer works in Python 3. Instead of urllib.urlopen("http://xkcd.com/353").read(), use urllib.request.urlopen("http://xkcd.com/353").read().
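
As a rough illustration of the comment above, a Python 3 version of get_page might look like the sketch below. The course's own get_page is not shown on this page, so the function body, the UTF-8 decoding, and the choice to return an empty string on failure are all assumptions made for the example.

```python
import urllib.error
import urllib.request


def get_page(url):
    """Return the text at url, or an empty string if it cannot be fetched.

    Assumed behavior: returning "" on failure and decoding as UTF-8 are
    choices made for this sketch, not taken from the course.
    """
    try:
        with urllib.request.urlopen(url) as response:
            # read() returns bytes in Python 3, so decode to get a string.
            return response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, ValueError):
        # URLError covers network and HTTP failures; ValueError is raised
        # for strings that are not valid URLs.
        return ""
```

With this in place, get_page("http://xkcd.com/353") should return the page source as a string that can be passed to print_all_links, provided the site is reachable.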
