CS 150 - Spring 2017 - Class 9

  • list recap
       - we can create new lists using square brackets
          >>> [] # the empty list
          []
          >>> [5, 1, 3, 5]
          [5, 1, 3, 5]
       - we can index into a list once we've created it to access individual values
          >>> mylist = [5, 1, 3, 17]
          >>> mylist[0]
          5
          >>> mylist[-1]
          17
       - we can do all the indexing tricks we did with strings, including slicing
          >>> mylist[1:3]
          [1, 3]

          notice that it returns a list
       - there are some built-in functions that work on lists: len, max, min, sum
          >>> len(mylist)
          4
       - lists are objects and also have methods associated
          - append: add a value on to the end of the list
          - pop: remove a value off of the end of the list and return it
          - insert: inserts a value at a particular index
          - sort: sorts the values in the list in place
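       - a minimal sketch of these four methods in action (the list values are made up for illustration); note that the methods change the list itself, and that sort returns None rather than a new list:

```python
# A small demonstration of append, pop, insert and sort.
nums = [5, 1, 3, 17]

nums.append(8)       # add 8 on to the end    -> [5, 1, 3, 17, 8]
last = nums.pop()    # remove and return the last value (8)
nums.insert(1, 42)   # put 42 at index 1      -> [5, 42, 1, 3, 17]
nums.sort()          # sort the list in place -> [1, 3, 5, 17, 42]

print(nums)   # [1, 3, 5, 17, 42]
print(last)   # 8
```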
  • lists are mutable
       - what does that mean?
          - we can change (or mutate) the values in a list
       - notice that many of the methods that we call on lists change the list itself

       - we can mutate lists with methods, but we can also change particular indices
          >>> my_list = [15, 2, 1, 20, 5]
          >>> my_list
          [15, 2, 1, 20, 5]
          >>> my_list[2] = 100
          >>> my_list
          [15, 2, 100, 20, 5]

  • run the sentence_stats function from word-stats.py code
       - similar idea to our scores functions except now we're doing it over strings instead of numbers
       - the string class has a "split" method that splits up a sentence into a list by splitting on whitespace (spaces, tabs, newlines)
          >>> "this is a sentence".split()
          ['this', 'is', 'a', 'sentence']

       - optionally, we can specify what to split on (though this is much rarer)

          >>> "this is a sentence".split("s")
          ['thi', ' i', ' a ', 'entence']
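
       - a small sketch, assuming a made-up sentence with a doubled space, of how split() with no argument differs from split(" "), followed by the kind of per-word loop a function like sentence_stats could build on (the variable names are illustrative and not taken from word-stats.py):

```python
sentence = "this is  a sentence"   # note the two spaces after "is"

words = sentence.split()       # no argument: split on runs of whitespace
pieces = sentence.split(" ")   # explicit " ": every single space separates

print(words)    # ['this', 'is', 'a', 'sentence']
print(pieces)   # ['this', 'is', '', 'a', 'sentence']

# Once the sentence is a list, we can loop over the words like any list.
lengths = []
for word in words:
    lengths.append(len(word))

print(lengths)  # [4, 2, 1, 8]
```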

  • files
       - what is a file?
          - a chunk of data stored on the hard disk
       - why do we need files?
          - hard drives persist state regardless of whether the power is on or not
          - when a program is running, all the data it is generating/processing is in main memory (e.g. RAM)
             - main memory is faster, but doesn't persist when the power goes off

  • reading files
       - to read a file in Python we first need to open it

          file = open("some_file_name", "r")

          - open is another function that has two parameters
          - the first parameter is a *string* identifying the filename
             - be careful about the path/directory. Python looks for the file relative to the current working directory (usually the same directory as the program, i.e. the .py file) unless you tell it to look elsewhere
          - the second parameter is another string telling Python what you want to do with the file
             - "r" stands for "read", that is, we're going to read some data from the file
          - open returns a "file" object that we can use later on for reading purposes
             - above, I've saved that in a variable called "file", but I could have called it anything else

             >>> open("english.txt", "r")
             <open file 'english.txt', mode 'r' at 0x10120a030>
             >>> type(open("english.txt", "r"))
             <type 'file'>

       - once we have a file open, we can read a line at a time from the file using a for loop:

          for <variable> in <file_variable>:
             # do something

          - for each line in the file, the loop will get run
          - each time the variable will get assigned to the next line in the file
             - the line will be of type string
             - the line will also have a newline character at the end of it, which you'll often want to get rid of (the string's strip() method is often good for this)
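
       - a minimal sketch of this loop, assuming a tiny made-up file called "demo_words.txt" (not one of the course files) that the example writes first so it can run on its own; it keeps both the raw and the stripped version of each line so the trailing newline is visible:

```python
# Setup only: create a three-line file to read back.
# "demo_words.txt" is a hypothetical name chosen for this example.
out = open("demo_words.txt", "w")
out.write("apple\nbanana\ncherry\n")
out.close()

raw_lines = []
clean_lines = []

file = open("demo_words.txt", "r")
for line in file:
    raw_lines.append(line)             # each line still ends with "\n"
    clean_lines.append(line.strip())   # strip() removes surrounding whitespace
file.close()                           # close the file when we're done with it

print(raw_lines)    # ['apple\n', 'banana\n', 'cherry\n']
print(clean_lines)  # ['apple', 'banana', 'cherry']
```
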
  • look at the read_words function in word-stats.py code
       - what does it do?
          - opens a file
          - reads a line at a time
          - appends each entry in the file to a list called words (stripping off the end of line)
          - prints out the statistics of the word file

       - in this same directory I have a file called "english.txt" that has a large list of English words

          >>> file_stats("english.txt")
          Number of words: 47158
          Longest word: antidisestablishmentarianism
          Shortest word: Hz
          Avg. word length: 8.37891768099

          - notice how quickly it can process the file
             - computers are fast!

  • sentence_stats vs. file_stats
       - code sharing is one of the keys of writing good programs
          - we shared print_stats, average_word_length, longest_word and shortest_word
       - we wrote two small functions that put strings into a list, but the rest of the code was shared
       - if you ever find yourself copying and pasting code, make a function instead, and call that function!
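
       - a rough sketch of how this sharing might be organized; the function names come from the notes above, but the bodies and parameters here are assumptions for illustration, and the real word-stats.py may well differ:

```python
# Hypothetical sketch; not the actual contents of word-stats.py.

def longest_word(words):
    """Return the longest string in words (the first one wins ties)."""
    longest = words[0]
    for word in words:
        if len(word) > len(longest):
            longest = word
    return longest

def shortest_word(words):
    """Return the shortest string in words (the first one wins ties)."""
    shortest = words[0]
    for word in words:
        if len(word) < len(shortest):
            shortest = word
    return shortest

def average_word_length(words):
    """Return the average length of the strings in words as a float."""
    total = 0
    for word in words:
        total += len(word)
    return total / float(len(words))

def print_stats(words):
    """Shared by both entry points: print statistics for a list of words."""
    print("Number of words: " + str(len(words)))
    print("Longest word: " + longest_word(words))
    print("Shortest word: " + shortest_word(words))
    print("Avg. word length: " + str(average_word_length(words)))

def read_words(filename):
    """Return a list with one stripped entry per line of the file."""
    words = []
    file = open(filename, "r")
    for line in file:
        words.append(line.strip())
    file.close()
    return words

# The two small functions differ only in how they build the list.
def sentence_stats(sentence):
    print_stats(sentence.split())

def file_stats(filename):
    print_stats(read_words(filename))

sentence_stats("this is a sentence")
# Number of words: 4
# Longest word: sentence
# Shortest word: a
# Avg. word length: 3.75
```

         Only sentence_stats and file_stats know where the words come from; everything after the list is built goes through print_stats.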