• Extracting Links – Intro to Computer Science

    So now you know enough about Python to be able to solve the problem that we started with at the beginning of this unit, which is the problem of extracting a link from its page. Before we get to the code, I want to describe a little more carefully what’s going on in a web page. We’ve talked about strings in Python, and all a web page really is, is a long string. When you see a web page in your browser, it doesn’t look like that. So here’s an example web page, one of my favorite XKCD comics. And hopefully, you’re starting to learn enough about Python to appreciate the…
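    Since a web page is just a long string, a link can be pulled out with ordinary string operations. The following is a minimal sketch, not the course's own code: the name get_first_link, the example.com URL, and the assumption that links look like <a href="..."> with double quotes are all illustrative.

```python
def get_first_link(page):
    """Return the URL of the first link in page, or None if there is none.

    Assumes links are written as <a href="..."> with double quotes.
    """
    start_link = page.find('<a href=')
    if start_link == -1:
        return None  # no link tag anywhere in the string
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    # The URL is the text between the two quotes.
    return page[start_quote + 1:end_quote]


# A made-up page; any string containing a link tag would do.
page = '<html><body>Read <a href="http://example.com/comic">this</a>.</body></html>'
print(get_first_link(page))
```

    The sketch relies on find returning -1 when the target string is absent, which is how it tells "no link" apart from a link at position 0.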

  • Seed Pages in Real World – Intro to Computer Science

    Student Rodith asks, “How do we go about finding a good seed page?” This is a really good question. We haven’t solved that question in this course, and we won’t. We’re going to give you a seed page, and have exercises and sets of pages to crawl set up so that the seed page will work well. For a real web crawler, this is a big challenge. And they don’t start with just one seed page; they start with a set of many seed pages that are selected in different ways. Some may be found by hand, by people identifying them as good seed pages, some may be the results of previous…

  • Finishing Crawl Web – Intro to Computer Science

    So now, we’re ready to finish the heart of our crawler. Let me put the last statement in, so you know there’s nothing else missing and you’ll be able to test this. The last thing we want to do is return the result in crawled. When we’ve finished the while loop, we’re ready to return crawled, which is the list of pages we found. What we have left to do is to figure out what we do to crawl each page. This is going to be a pretty tough quiz; I think you’ll need at least two lines of code. If you think about using all the procedures that…

  • Search Engines And The Web – Intro to Computer Science

    [Dave] So welcome to Homework 1. Homework is going to be a little different from quizzes. Unlike the quizzes, where you get instant feedback for each question, in the homework you’ll submit your answers, and you won’t see feedback on them until you submit your answers. After the deadline for the homework, we’ll be posting answers and having discussion about the questions. So for Question 1, we want to see that you understand how web pages are constructed and what a web crawler will do. The goal for Question 1 is for you to find all the target links in the sample web page that we’ve provided. So…

  • Crawl Web – Intro to Computer Science

    Now we’re ready to write the code for crawling the web. So our goal is to define a procedure, we’ll call it crawl_web, that takes as input a seed page URL. That’s the URL that identifies our seed page, and it outputs a list of all the URLs that can be reached by following links starting from the seed page. If you’re really ambitious, you should try to do this yourself without any more help. That’s going to be a pretty tough challenge. So we’re also going to step through one way to do this as a series of quizzes. But you should feel free at any point, when you…
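    One possible shape for crawl_web is sketched below. It is only a sketch under stated assumptions, not necessarily the solution the quizzes build toward: FAKE_WEB and get_links are hypothetical stand-ins that replace real page fetching and link extraction, and the names tocrawl and crawled are chosen for illustration.

```python
# Hypothetical stand-in for the web: each URL maps to the links on its page.
FAKE_WEB = {
    'http://example.com/seed': ['http://example.com/a', 'http://example.com/b'],
    'http://example.com/a': ['http://example.com/b'],
    'http://example.com/b': ['http://example.com/seed'],
}


def get_links(url):
    """Hypothetical helper: return the links found on the page at url."""
    return FAKE_WEB.get(url, [])


def crawl_web(seed):
    """Return a list of every URL reachable by following links from seed."""
    tocrawl = [seed]   # pages we still need to visit
    crawled = []       # pages we have already visited
    while tocrawl:
        url = tocrawl.pop()
        if url not in crawled:
            for link in get_links(url):
                if link not in tocrawl:
                    tocrawl.append(link)
            crawled.append(url)
    return crawled


print(crawl_web('http://example.com/seed'))
```

    Checking whether a URL is already in crawled is what keeps the loop from running forever when pages link back to each other, as the made-up pages here do.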

  • Print All Links Solution – Intro to Computer Science

    So here’s the code that we need to finish. We need a test condition for the while, and in this case we really want to keep on going forever until we’re done. So, we’re going to use while True and then use break to stop the loop. The test condition is True, and the way we know we’re done is when the value returned as the URL is None. That means we got to the else, so to finish, we need to finish the else block by using break. Now let’s test our code. We’ll call print_all_links with our test string that has test 1, test 2, and test…
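    The while True and break pattern described here might look like the sketch below. The helper get_next_target, its return convention of (url, end position), and the test string are assumptions made for this illustration; only print_all_links, while True, break, and the None check come from the excerpt.

```python
def get_next_target(page):
    """Assumed helper: return (url, end_position) for the first link in page,
    or (None, 0) if page contains no link."""
    start_link = page.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    return page[start_quote + 1:end_quote], end_quote


def print_all_links(page):
    while True:                      # loop until we explicitly break
        url, endpos = get_next_target(page)
        if url:
            print(url)
            page = page[endpos:]     # continue searching after this link
        else:
            break                    # url is None: no more links


# Made-up test string with three links.
print_all_links('<a href="one">1</a> <a href="two">2</a> <a href="three">3</a>')
```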

  • Pop Quiz – Intro to Computer Science

    Now it’s time for our pop quiz to make sure everyone understands pop. For this quiz, assume that the name p refers to a list with at least two elements. Your goal is to determine which of the code fragments does not change the final value of p. That means that no matter what value p starts with, as long as it has at least two elements, after executing the code the value of p is the same as what it started with. I’ll show you the choices for the code next. So here are the choices. Check the box for each code fragment where the value…
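    To make the idea concrete, here are two made-up fragments, wrapped in procedures so they can be run. They are illustrations only and are not claimed to be among the quiz's actual choices: one leaves p with its original value, the other does not.

```python
def pop_then_append(p):
    x = p.pop()      # removes and returns the last element of p
    p.append(x)      # puts the same element back at the end


def pop_two_append_same_order(p):
    x = p.pop()      # last element
    y = p.pop()      # second-to-last element
    p.append(x)      # x goes back first...
    p.append(y)      # ...so the last two elements end up swapped


p = [1, 2, 3]
pop_then_append(p)
print(p)   # same value as before

q = [1, 2, 3]
pop_two_append_same_order(q)
print(q)   # last two elements swapped
```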

  • Overview of the Unit – Intro to Computer Science

    The goal of the first three units in this course is to build a Web crawler that will collect data from the Web for our search engine, and to learn about big ideas in computing by doing that. In Unit 1, we’ll get started by extracting the first link on a web page. A Web crawler finds web pages for our search engine by starting from a “seed” page and following links on that page to find other pages. Each of those links leads to some new web page, which itself could have links that lead to other pages. As we follow those links, we’ll find more and more web…

  • Introduction

    Welcome to CS101. I’m Dave Evans, I will be your guide on this journey. This course will introduce you to the fundamental ideas in computing and teach you to read and write your own computer programs. We are going to do that in the context of building a web search engine. I’m guessing everyone here has at least used a search engine before. Like Google, DuckDuckGo or even my personal favorite – DaveDaveFind. You type in what you are looking for, and voila – in literally a blink of an eye, about a tenth of a second, back come the results. This might not be enough to make you wise,…

  • Great Job! – Intro to Computer Science

    So congratulations, you’ve made it to the end of unit one. You’ve learned a lot of computer science already: you know what a program is, you’ve learned about variables, you’ve learned about expressions and grammars, and you’ve learned about strings in Python. So now it’s time for you to work on homework one on your own. That will check that you understood everything from this class and prepare you to get started on unit two. And we’re well on our way towards learning a lot of computer science, as well as building our web crawler and then building our search engine.