• Great Job! – Intro to Computer Science

    So congratulations, you’ve made it to the end of unit one. You’ve learned a lot of computer science already: you know what a program is, and you’ve learned about variables, expressions and grammars, and strings in Python. Now it’s time for you to work on homework one on your own. That will check that you understood everything from this class and prepare you to get started on unit two. We’re well on our way towards learning a lot of computer science, as well as building our web crawler and then our search engine.

  • Improving Crawler – Intro to Computer Science

    If we want more confidence, we could also look at the documentation and the challenges. If we don’t know what we’re looking for, well, the documentation won’t tell us. We can see that we have union, which returns a union of the sets. What we don’t see yet is update. Now, given that we found update and guessed what it did, we can see this description. It’s not the clearest description for knowing for sure that it does what we want. We could do some more searches. Let’s see what we get when we search for Python and update. Well, we got some discussion here, we’re looking at the…
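
One quick way to check a guess like this is to try both methods on small values. The sketch below assumes the excerpt is talking about Python's built-in set type; the page names are made up for illustration.

```python
# Two small sets of hypothetical page names.
seen = {"a.html", "b.html"}
found = {"b.html", "c.html"}

# union() returns a new set and leaves both operands unchanged.
combined = seen.union(found)
seen_after_union = set(seen)  # snapshot: still only the original two pages

# update() adds the elements to the set in place and returns None.
result = seen.update(found)
```

After this runs, `combined` and `seen` hold the same three names, while `result` is `None`, which is the usual sign that a method changes its object rather than building a new one.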

  • Grace Hopper

    One of the pioneers in computing was Admiral Grace Hopper. She was famous for walking around with nanosticks, which were pieces of wire the length that light travels in a nanosecond, about 30 cm. Grace Hopper wrote one of the first languages, and the language COBOL, which she is seen holding here next to UNIVAC, was for a long time the most widely used computer language. She was one of the first people to think about writing languages this way, [“Nobody believed that I had a running compiler and nobody would touch it. They told me computers could only do arithmetic.” Grace Hopper] and you have this quote when…

  • Crawling Process – Intro to Computer Science

    So I’m going to describe that process, and I’m going to write it out in a fairly precise way, but not as actual Python code, because it will end up being your job to finish the Python code to do this. But I want to describe it precisely enough that we can ask some questions about it. We’re going to start with some seed page. tocrawl will be the list containing just the seed page, and crawled will be empty. We’re going to keep going as long as there are more pages to crawl. And for each step we’re going to pick one…
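
The process described so far can be sketched on a toy example. This is only an illustration under stated assumptions, not the course's solution: the "web" is a hypothetical dictionary mapping each page name to the links on it, and the choice to pick the last page in the list at each step is an assumption, since the excerpt breaks off before saying which page gets picked.

```python
# Hypothetical in-memory "web": each page maps to the links found on it.
TOY_WEB = {
    "seed": ["a", "b"],
    "a": ["b", "c"],
    "b": ["seed"],
    "c": [],
}


def crawl(seed, web):
    """Return the pages reachable from seed, in the order they were visited."""
    tocrawl = [seed]  # pages still to visit; starts as just the seed
    crawled = []      # pages already visited; starts empty
    while tocrawl:    # keep going as long as there are more pages to crawl
        page = tocrawl.pop()  # assumption: pick the last page in the list
        if page not in crawled:
            for link in web.get(page, []):
                if link not in tocrawl:
                    tocrawl.append(link)
            crawled.append(page)
    return crawled


visited = crawl("seed", TOY_WEB)
```

Because `crawled` is checked before a page is processed, the link from "b" back to "seed" does not send the loop around forever.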

  • Crawl Web Loop – Intro to Computer Science

    The next step is to write the loop that’s going to do the crawling. And we said the process we want to follow is to keep going as long as there are more pages to crawl. We can do that with a while loop, and we can use tocrawl like this in our test condition. If a list is empty, that’s interpreted as false. If the list is not empty, that’s interpreted as true. So this means the same thing as testing whether the length of the list is not zero; writing just while tocrawl is a cleaner way to do it. Inside the loop, well, we want…
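
A small sketch of the test condition on its own may help. The helper name `drain` is made up for this illustration; it just pops items until the list is empty and counts the steps.

```python
def drain(tocrawl):
    """Pop pages until the list is empty; return how many were popped."""
    steps = 0
    while tocrawl:  # an empty list counts as false, so the loop stops there
        tocrawl.pop()
        steps += 1
    return steps


empty_is_true = bool([])            # an empty list in a condition
nonempty_is_true = bool(["page"])   # a non-empty list in a condition
popped = drain(["a", "b", "c"])
```

The loop body runs once per element, so writing `while tocrawl:` behaves like writing `while len(tocrawl) != 0:`.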

  • Finishing the Web Crawler Solution – Intro to Computer Science

    So the answer is we should use the “add_page_to_index” procedure we just defined, and we should pass in the index. We should pass in the page, that’s the URL that identifies the location, and we should pass in the content. And that’s all we need. So we’re done with our web crawler. From a seed, we can find a set of pages. Following that seed, following all the links that we find on the pages that we find starting from that seed, for each page, we’re going to add the content that we find on that page to an index, and we’re going to return that index. And we’ve already written…
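
For readers who have not seen the earlier definition, here is one way such a procedure could look. It is a sketch under assumptions, not the course's code: the name is read as `add_page_to_index`, the index is a plain dictionary from word to list of URLs, the helper `add_to_index` is hypothetical, and the content is split on whitespace only.

```python
def add_to_index(index, keyword, url):
    """Record that keyword appears at url (index is assumed to be a dict)."""
    urls = index.setdefault(keyword, [])
    if url not in urls:  # avoid listing the same page twice for one word
        urls.append(url)


def add_page_to_index(index, url, content):
    """Add every whitespace-separated word of content to the index."""
    for word in content.split():
        add_to_index(index, word, url)


index = {}
add_page_to_index(index, "http://example.com/a", "hello web hello")
add_page_to_index(index, "http://example.com/b", "web crawler")
```

The three arguments match the ones named in the excerpt: the index, the URL that identifies the page, and the page's content.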

  • Finishing the Web Crawler – Intro to Computer Science

    So let’s remember the code we had at the end of unit 2 for crawling the web. So we used 2 variables. We initialized “tocrawl” to the seed, a list containing just the seed, and we’re going to use “tocrawl” to keep track of the pages to crawl. We initialized “crawled” to the empty list, and we’re keeping track of the pages we found using “crawled.” Then we had a loop that would continue as long as there were pages left to crawl. We’d pop the last page off the “tocrawl” list. If it’s not already crawled, then we’ll union into “tocrawl” all the links that we can find on…
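
The two list operations mentioned here, popping the last page and taking a union into “tocrawl”, can be tried in isolation. The `union` below is a guess at what such a helper might do (append only the elements not already present); the excerpt does not show its definition.

```python
def union(target, extra):
    """Append to target each element of extra that it does not already contain."""
    for item in extra:
        if item not in target:
            target.append(item)


tocrawl = ["a", "b"]
union(tocrawl, ["b", "c"])      # "b" is already there, so only "c" is added
after_union = list(tocrawl)     # snapshot before popping
last = tocrawl.pop()            # pop() removes and returns the last element
```

Since `pop()` takes from the end of the list, the most recently added link is the next one visited.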