The content in this page is written in the form of Jupyter notebooks. You can read the HTML version of the notebooks here. However, you will likely want to open and run the notebooks in a JupyterHub instance, such as CLASSE JupyterHub. See CLASSE Wiki for details.
Programming Essentials - Python Programming and Jupyter Notebooks (PE100)
Welcome to the Programming Essentials - Python Programming and Jupyter Notebooks component of the X-CITE training materials. The intent behind this unit is to show new CHESS users how to write programs in Python for experimental data analysis.
Prerequisites? None.
The training materials before you are designed both for scientists who may not have any programming experience whatsoever and for those who have at least some basic programming capability but in a language other than Python.
Python’s adoption has exploded in the last decade. Much of its success can be attributed to productivity. Many programming languages force the programmer to deal with very small details to do even the simplest things. Python’s attitude is to just take care of all the minutiae so we don’t have to. On top of that, Python’s popularity has resulted in hundreds of thousands of packages of useful code for specific tasks. If there is something you need to write a program for, it’s almost certain that someone else has had the same problem. There’s a good chance at least one of those people neatly wrapped up some of their code and made it available in one of the repositories on the internet. There’s no reason for you to reinvent the proverbial wheel - take advantage of their work (don’t forget to cite it!) and get back to doing actual science that much sooner.
You’re currently looking at PE100-01 Introduction. If you are new to Python and especially if you’re new to programming, you should work through each of the modules in order. More experienced programmers might benefit from skipping directly to topics that interest them. Select one of the following:
- PE100-02 Types, Variables, and Operators - the heart of any programming language.
- PE100-03 Decision Structures - conditional statements (“if” statements) change the program flow.
- PE100-04 Repetition - “while” and “for” loops let us do things over and over.
- PE100-05 Functions - Python comes with a lot of functions, but we can write even more.
- PE100-06 Files - Reading input from and storing your results to disk.
- PE100-07 Exceptions - Dealing with unexpected contingencies.
- PE100-08 Lists - Another kind of variable, and the key to structuring data storage.
- PE100-09 Strings - More details on working with text.
- PE100-10 Dictionaries - Like a simple database, look up information quickly.
PE100-02: Types, Variables, and Operators
Niklaus Wirth was one of the founding giants of Computer Science. He wrote an introductory textbook whose title neatly summed up the act and art of programming: Algorithms + Data Structures = Programs. Data Structures are how information is stored in a computer, and algorithms are the instructions the computer applies to transform that data.
To run the code in a cell, first click in the cell to select it. Then you can either:
1. Go to the “Run” menu and choose “Run Selected Cells”, or
2. Just press Shift + Enter.
Let’s do this now: click just below where it says “print(403.616)”, then go to the “Run” menu and choose “Run Selected Cells”.
When it ran, it printed “403.616” on a line by itself. That was the output from the print.
Click in the next cell (where it says “103.5”) to select that cell. Then hold down the Shift key while you press Enter.
In Python, and in Jupyter notebooks, if the last (or only!) line evaluates to some value then it will be printed out. That’s how “103.5” got printed - a literal number evaluates to that number when it’s run. A “literal number” means you look at it in your code and you literally see a number.
Take a look at a string literal. Run each of the next two cells…
At this point, we can use Python and Jupyter Lab as a scientific calculator. We have some literals of different types (int, real, and string, so far) and we can print them out with the print() function. If we don’t explicitly print anything at the end of a cell, Python will show us the last value that was computed.
Operators
Like any programming language, Python lets you “do math” and lots of other things. Let’s take a look at some of the basic “operators”. In all of the code-containing cells throughout this course, try to predict what will happen first, and then run the code.
2
Besides the “classic” operators, there are some handy extras:
What happened there? The // operator does integer division - it returns the whole number part of the answer, just like when we learned division in elementary school.
The % operator returns the remainder. This is also called “modulo”, and the above would be pronounced “sixteen mod 3”.
256
The ** operator does exponentiation. The arguments can be integers or they can be real numbers. Naturally, operators can be combined into arbitrarily long expressions.
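As a minimal sketch of the three extra operators just described, the lines below use illustrative operands (16 and 3 echo the “sixteen mod 3” wording above; the others are arbitrary choices, not taken from the original cells):

```python
# Integer division keeps only the whole-number part of the quotient.
print(16 // 3)    # 5

# The remainder ("sixteen mod 3") is what is left over after that division.
print(16 % 3)     # 1

# Exponentiation works with integers...
print(2 ** 8)     # 256

# ...and with real numbers.
print(2.5 ** 2)   # 6.25
```

Note that the whole-number part and the remainder fit together: 3 * 5 + 1 gives back 16.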
Notice what happens when we use different operators. They are applied in the “My Dear Aunt Sally” order of precedence (multiplication, division, addition, subtraction).
Order of operations:
- Exponentiation: **
- Multiplication, Division, Remainder: * / // %
- Addition and Subtraction: + -
Within the same level, operators are applied left-to-right. 8-5+2 is evaluated as 3+2 and yields 5. The exception is exponentiation, which is applied right-to-left: 2 ** 3 ** 4 is treated as 2 ** 81 and yields an annoyingly large number.
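The precedence rules above can be checked directly. This is a small sketch with arbitrary example expressions; only 8-5+2 and 2 ** 3 ** 4 come from the text:

```python
# Same level: applied left-to-right, so this is (8 - 5) + 2.
print(8 - 5 + 2)            # 5

# Multiplication binds tighter than addition.
print(2 + 3 * 4)            # 14

# Exponentiation binds tighter than multiplication.
print(2 * 3 ** 2)           # 18

# Exponentiation is applied right-to-left: 2 ** (3 ** 4) is 2 ** 81.
print(2 ** 3 ** 4 == 2 ** 81)    # True

# Grouping the other way gives a much smaller number.
print((2 ** 3) ** 4)        # 4096
```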
Variables
Unless we just use Jupyter as a big, expensive scientific calculator, we need a way to store data. Variables were invented for just that purpose, and virtually every language has them. Think of them as a place to store data of some kind, and that place has a name. They behave in Python just like you’d expect.
42
We just created a variable named answer and gave it the value 42. Variables are long-lived - later we’ll talk about just how long when we start writing our own functions, but until then our variables last as long as Python (or in our case, Jupyter) is running. Take a look - answer is still there.
The value stored in a variable can change. It can even change type:
We can declare many variables, and we can “do things” with them just like we can when we type in numbers or strings.
In the last line, we just put watts because Jupyter automatically prints what the last line evaluates to.
We can use variables to change the order of operations. Let’s see the average price of two people’s meals:
That’s the right answer. If we hadn’t done that, we would have gotten
which is utterly wrong. Beware of the order of operations… it is a frequent source of bugs in scientific programming.
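A minimal sketch of the meal-price average, assuming two made-up prices and hypothetical variable names (alice_meal, bob_meal). It shows how parentheses force the addition to happen before the division:

```python
# Hypothetical prices for two people's meals.
alice_meal = 12.50
bob_meal = 17.50

# Parentheses make the addition happen first, then the division.
average_price = (alice_meal + bob_meal) / 2
print(average_price)    # 15.0

# Without parentheses, only bob_meal is divided by 2.
wrong_average = alice_meal + bob_meal / 2
print(wrong_average)    # 21.25
```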
Variable Naming Rules
For the most part, you can pick whatever name makes sense for a variable, but there are some rules. When choosing a name:
1. No keywords (False won’t work.)
2. No spaces (sample thickness is invalid)
3. The first character must be one of
   - a-z, or A-Z, or _ (the underscore character)
   - As a result, no numbers (3rd_sample_holder is invalid)
4. After the first character, you can then have numbers (sample_holder_3 is perfectly valid)
5. No other symbols are allowed (exploded&destroyed_spectrometers is invalid, and probably suggests it’s time to review lab safety procedures).
Note: Uppercase vs. Lowercase matters! Bevatron is not the same variable as bevatron.
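One way to check a candidate name without triggering a syntax error is to ask Python itself. The sketch below uses the string method isidentifier() and the standard keyword module. Note that isidentifier() follows Python’s full rules, which are somewhat broader than the simplified list above (for instance, it also accepts non-English letters). The values assigned to Bevatron and bevatron are arbitrary.

```python
import keyword

# Names taken from the rules above.
print("sample_holder_3".isidentifier())                    # True
print("3rd_sample_holder".isidentifier())                  # False: starts with a digit
print("sample thickness".isidentifier())                   # False: contains a space
print("exploded&destroyed_spectrometers".isidentifier())   # False: contains &

# Keywords look like valid names, so they need a separate check.
print(keyword.iskeyword("False"))                          # True

# Case matters: these are two different variables (values are arbitrary).
Bevatron = 6.2
bevatron = 1954
print(Bevatron == bevatron)                                # False
```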
Types
We’ve hinted that variables have a “type”, and that the type can change if it needs to. The way it works is that variables keep track of what type they are (integer, real number, or string) and what their “value” is. We can even interrogate a variable and see what type it is:
reading:
<class 'float'>
<class 'str'>
The type of a variable matters. Let’s create a variable with an integer in it and another with a string. Then let’s do some math:
How do we handle situations like that, where second_thing held a string representing a seven, but because it was a string variable it couldn’t be used as an integer? Python provides a few functions to convert values from one type to another. The str() function takes a variable and converts it to a string. The float() and int() functions convert their arguments to floating-point and to integer numbers, respectively.
13
13.0
Being able to convert values from one type to another is often called type coercion. These conversions are extremely important for situations where you need to get input from a user, even more so if you need to do it repetitively.
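Here is a minimal sketch of those conversions. The name second_thing and its value come from the text; first_thing and its value of 6 are assumptions chosen so the sums come out to 13 and 13.0.

```python
first_thing = 6        # an integer (assumed value)
second_thing = "7"     # a string that happens to look like a number

# Convert the string to an integer before adding.
print(first_thing + int(second_thing))      # 13

# Convert it to a floating-point number instead.
print(first_thing + float(second_thing))    # 13.0

# Going the other way: convert the integer to a string, then join the text.
print(str(first_thing) + second_thing)      # 67
```

The last line does not add anything: with two strings, + glues them together end to end.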
Continuation Character
Sometimes the expressions we need to evaluate can be very long. It would be nice if we could split up a long expression and spread it out over a few lines. As a small example, we’ll take a look at 4+2+3. Many programming languages will let us split an expression anywhere we want, such as:
…but that result isn’t right in Python. The last line, +3, was evaluated and printed as the result of running that cell. In Python, it turns out, if we need to continue an expression on the next line we must end the current line with a backslash \ and press enter. It has to be a backslash, by the way, and cannot be the forward slash like we use for division.
9
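A small sketch of the continuation character applied to the 4+2+3 example. The variable name total is just an illustrative choice:

```python
# The backslash must be the very last character on the line.
total = 4 + 2 \
        + 3
print(total)    # 9

# An expression wrapped in parentheses may also span several lines,
# with no backslash needed.
also_total = (4 + 2
              + 3)
print(also_total)    # 9
```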
Time for an exercise! Try to predict what will be printed when you run the next cell. Then, run the next cell and see how you did. If you miss one, make sure you figure out what happened before you go. I know, we’re professionals, I shouldn’t have to say that…
Now write an expression to average three numbers (12, 14, and 66), divide the result by three, and square it. You can use the code cell right below here:
The String Type
At the beginning of this notebook, we casually mentioned “strings” without saying what they are. They’re just “sequences of characters”. And these can be any kind of characters - the English alphabet, the Hungarian alphabet, hiragana… it doesn’t matter.
Some, probably most, languages contain strings inside “double quotes”, ", which is shift+apostrophe on US English keyboards. Other languages (SQL and Pascal are the only two I can think of) use single quotes: '. Python lets you use either one. You do have to be consistent in each string, but it can vary from one string to the next:
Because we can use either type of quotation mark, we can exploit that to let us put quotation marks into strings:
Don't put explosive mixtures in the spectrometer, please.
Of course he was warned... "Do not turn the spectrometer into a bomb, please" but I am sure he ignored that.
That lets us embed whichever kind of quotation mark we need into a string.
But what if we need to embed both kinds of quotes into one string? We’re in luck: we can use the backslash character again to “quote” our quotation mark. In fact, we can quote any character with it if we need to.
We told him "Hexanitrohexaazaisowurtzitane and spectrometers don't mix, buddy", but we're pretty sure he ignored us.
That sentence contains three things, inside the string itself: 1. Double Quotes to surround a direct quotation 2. A single quote, also called an apostrophe depending on how it’s used, to make a contraction, and 3. A totally awesome/terrifying molecule you have to google to believe.
OK, I’ll save you the trouble. Prepare to lose most of a day’s productivity. You’re welcome.
(Derek has written gobs of articles on fun substances. Here are some more. )
There is one last kind of string literal. Sometimes you need a string that is several lines long. The “triple quote” is a way to do it. You use three double quotes in a row (three single quotes work as well):
Triple quotes are also an easier way to embed mixed kinds of quotation marks into strings:
I know people who say "The Avengers" isn’t a good movie, but I don’t agree.
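The quoting options described in this section can be put side by side. This is a sketch with shortened example sentences and hypothetical variable names, not the original cells:

```python
# Double quotes outside let us use an apostrophe inside.
polite = "Don't put explosive mixtures in the spectrometer, please."

# Single quotes outside let us use double quotes inside.
warned = 'He was told "Do not do that" more than once.'

# A backslash "quotes" a quotation mark so both kinds fit in one string.
both = "We told him \"they don't mix\", but he ignored us."

# Triple quotes allow line breaks and mixed quotation marks.
review = """I know people who say "The Avengers" isn't a good movie,
but I don't agree."""

print(polite)
print(warned)
print(both)
print(review)
```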
Coming Up Next
We just looked at enough of Python and Jupyter notebooks to use it as a basic calculator, but so far we can’t do any real, general-purpose programming with it. The “flow of control” so far has been a straight line from top to bottom and we can’t change what we’re doing in response to different inputs. That’s about to change. In the next section we’ll look at the if statement and how to use it.
PE100-03: Decision Structures
In the first lesson, everything we did was sequential programming. Statements are executed one after the other in exactly the order they’re written in. As long as there aren’t any errors, every statement will be executed.
The Simplest “if” Statement
In almost any real Jupyter notebook or standalone program we write, there will have to be places where different code paths are taken depending on what has happened leading up to there. Suppose we’re looking at absorption at one specific wavelength and we know that some of our instruments are a little bit too sensitive to changes in humidity. Maybe the first spectrometer has some insulation that is just a little too porous and reads a bit high, but the second one is even worse. We have calibration constants we can apply, but we have to apply the right constant for each individual instrument.
7.539441569999999
Here we have the first Decision Structure (also called control flow statement) that we’ll look at. Taking the above code apart, we see several important things.
- This is an “if statement”.
- Testing to see if two things are equal is done with two equals signs, not one (==). There’s a historical reason for this, and it’s a good reason, but it always trips up newcomers. You have been warned. You’re welcome.
- The last character on the if line is : (a colon).
- The “body” of the if statement, the part that is run if and only if the tested condition is met, is indented.
In the case of the above if statement, what the code does is check to see if we’re using spectrometer number 1 and if we are then we add 7.7% to the reading and save it in a variable called “useful_result”.
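A minimal sketch of an if statement like the one described. The names reading and useful_result appear in the text; the variable name instrument and the raw reading of 7.00041 are assumptions.

```python
reading = 7.00041    # assumed raw reading
instrument = 1       # assumed name for "which spectrometer are we using"

# Two equals signs test for equality; the line ends with a colon.
if instrument == 1:
    # The indented body runs only when the test is true: add 7.7%.
    useful_result = reading * 1.077

print(useful_result)
```

If instrument held anything other than 1, the body would be skipped and useful_result would never be created, which is one motivation for the else clause.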
Else: the catch-all specialist
If that was all an if statement could do then it would still be really useful. But that’s not all it can do. We need to do something reasonable when we get readings from the second instrument. Such as:
8.3304879
Here we have added an “else clause”. The above code is interpreted as “check to see if we’re using spectrometer number 1 and if we are then we add 7.7% to the reading and save it in a variable called useful_result. Otherwise, set useful_result to whatever is saved in reading plus 19%.”
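The if/else logic just described might look like the following sketch. As before, the variable name instrument and the raw reading value are assumptions.

```python
reading = 7.00041    # assumed raw reading
instrument = 2       # assumed variable name; this time the second spectrometer

if instrument == 1:
    useful_result = reading * 1.077    # add 7.7%
else:
    useful_result = reading * 1.19     # anything else: add 19%

print(useful_result)
```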
So far, so good. But there’s more! Suppose we need to handle several of these not-quite-top-quality spectrometers. How do you suppose we could deal with that? We could resort to putting if-else statements inside if-else statements in sort of a brute force fashion…
6.4403771999999995
The above code looks a little intimidating, but all there is to it is just a series of if statements. The logic of it goes like this: “If the instrument number is 1, then adjust it 7.7% and we’re done. Otherwise, it must be some other instrument number, so run our else clause”. Then in the else clause, it does the same thing, except checking for the second instrument and adjusting by 19%. If there was nothing to do there (because the instrument number was 3) then we run the else clause of that second if statement. This else clause houses an if statement that checks to see if the instrument is number three. This time it is, so the body of the if statement is executed. We set useful_result equal to 92% of reading.
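The brute-force nesting walked through above can be sketched as follows (the variable name instrument and the raw reading are assumptions; the three adjustment factors come from the text):

```python
reading = 7.00041    # assumed raw reading
instrument = 3       # assumed variable name

if instrument == 1:
    useful_result = reading * 1.077          # add 7.7%
else:
    if instrument == 2:
        useful_result = reading * 1.19       # add 19%
    else:
        if instrument == 3:
            useful_result = reading * 0.92   # 92% of the reading

print(useful_result)
```

Each additional instrument pushes the code one level further to the right, which is what makes this style hard to read.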
Elif
This is fine if we only have three instruments, but what do we do if we have 20 of them? We could, in principle, type in 60 lines of code, but that would be tedious, error prone, and would take a while to read and find any mistakes. Of course there’s a better way.
That better way is the “elif” keyword.
Let’s see an example with 5 instruments…
7.210422299999999
The final else clause is the one that runs if no other clause ran. If none of the conditional tests is true, neither the one in the if clause nor any of those in the elif clauses, then the else clause runs. It’s really easy to spot else clauses even from across the room - they’re the ones that don’t have a conditional test.
Note that the if, elif, and else lines must end with a colon. True confession time: I forget the colons about half the time. Python catches it as an error, I fix it, and life goes on.
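A sketch of an if/elif/else chain for five instruments. The factors for instruments 1 through 3 come from the earlier examples; the factors for instruments 4 and 5, the variable name instrument, and the raw reading are made up for illustration.

```python
reading = 7.00041    # assumed raw reading
instrument = 4       # assumed variable name

if instrument == 1:
    useful_result = reading * 1.077
elif instrument == 2:
    useful_result = reading * 1.19
elif instrument == 3:
    useful_result = reading * 0.92
elif instrument == 4:
    useful_result = reading * 1.03    # hypothetical calibration factor
elif instrument == 5:
    useful_result = reading * 0.98    # hypothetical calibration factor
else:
    # No test matched: leave the reading unadjusted.
    useful_result = reading

print(useful_result)
```

Only the first clause whose test is true runs; Python skips the rest of the chain once one of them has matched.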
Slightly More Complicated
You can run more than one line of code in response to the tested conditions, but they have to be indented the same amount:
7.00041 True
There are four interesting things going on here. The first and most important thing to notice is that we’ve got more than one line of code running in response to an “if”, “elif”, or “else” clause. A collection of lines that should be run together as a whole is called a code block. Unlike many languages that mark the start and end of code blocks with special words or characters, Python just does it by using indentation. Everything that is indented the same amount is considered to be in the same code block. We’ll look at this in more detail in a few minutes.
Secondly, we’ve added lines to set a variable named “trustworthy” to a value depending on whether we had to adjust the reading. Evidently, if we have to compensate for old, dry, cracking insulators then we don’t really trust the instrument.
The third interesting thing is the values True and False. These are “Boolean” values, and when we put them into the “trustworthy” variable then it takes on the Boolean type. There are only two values, True and False. The capitalization is important.
The fourth thing to notice is that we’re sending two values into the print statement and it’s printing both of them. In general, we can give the print statement any number of arguments, separated by commas, and it will print all of them separated by one space.
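The four points above can be seen together in a sketch like this one. The names trustworthy, reading, and useful_result appear in the text; the variable name instrument, its value, and the raw reading are assumptions.

```python
reading = 7.00041    # assumed raw reading
instrument = 3       # assumed variable name; not one of the leaky instruments here

if instrument == 1:
    # Both indented lines belong to the same code block.
    useful_result = reading * 1.077
    trustworthy = False
elif instrument == 2:
    useful_result = reading * 1.19
    trustworthy = False
else:
    useful_result = reading
    trustworthy = True

# print() accepts several arguments and separates them with one space.
print(useful_result, trustworthy)    # 7.00041 True
```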
Conditional (aka Relational) Operators
The conditional test in each part of an if statement is an expression that results in a Boolean value. So far, the only conditional operator (or relational operator) we’ve seen is ==. There are others, though. For the sake of completeness, I’ll include == here:
operator | tested condition |
---|---|
== | equals |
!= | not equals |
> | greater than |
>= | greater than or equal |
< | less than |
<= | less than or equal |
“Relational” has at least two meanings in computing. Relational Operators have nothing to do with Relational Databases.
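Each operator in the table produces a Boolean value. A quick sketch, using an arbitrary reading value:

```python
reading = 7.00041    # arbitrary example value

print(reading == 7.00041)    # True
print(reading != 7.00041)    # False
print(reading > 5)           # True
print(reading >= 7.00041)    # True
print(reading < 5)           # False
print(reading <= 7)          # False
```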
Try This
For each of the following code cells, decide what the result is, run the cell, and see how you did:
True
True
False
False
False
Relational operators also work with strings.
equals Alice.
The person is not Bob.
Alice comes before Bob in alphabetical order.
Alice comes before or in the same place as Alice in sorted order
Working left to right, the M, the a, and the r match on
both strings, but when we finally get to the y and the k, y comes
after k in alphabetical order.
A couple words of caution: the comparisons are based on the ASCII codes for each character. The “A” in ASCII stands for “American”, and as you might expect that means it only works for English language text. If you need to handle other languages, even potentially, then there is a better way to do it and we’ll see that in the lesson on strings.
Also, capital letters are always less than lowercase letters, and not in the way you might think. “A” is less than “Z”, as you might expect, but “Z” is also less than “a”. The digits 0-9 come before all of the letters. Punctuation is sprinkled around and the only way to know for sure is to look up “ASCII Chart”.
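The ordering described above can be checked with a few comparisons. The built-in ord() function, used here only for illustration, returns the numeric code of a single character:

```python
# Character by character: M, a, r match, then "y" comes after "k".
print("Mary" > "Mark")     # True

# Every capital letter sorts before every lowercase letter.
print("A" < "Z")           # True
print("Z" < "a")           # True

# Digits sort before letters.
print("9" < "A")           # True

# The underlying character codes explain the results.
print(ord("9"), ord("A"), ord("Z"), ord("a"))    # 57 65 90 97
```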
Code Blocks
Let’s go back to that part about running several lines of code but they have to be indented the same amount. Python always runs “blocks” of code. That block might be as short as one line:
125.6636
or it might be arbitrarily long:
1511396.1762899999
Whether it was the one line example or the eight line one, Python will set out to run all of those lines in one shot, and as long as there aren’t any errors it’ll do it. These are known as code blocks.
The decision structures (again, also called control flow statements) in Python all do basically the same thing: they evaluate an expression and depending on whether it turns out True or False, they execute a code block in some manner. This means that wherever we can have a single line of code running in a decision structure we can have as many lines as we want.
Take a look at the following example. For the four possible combinations of potentially_hazardous and explody, decide what would be printed out. Then try out the combinations and make sure you know why each combination was handled the way it was.
Total Available Kaboom (TAK) to ruin your day is 1511396.1762899999
Did you notice potentially_hazardous and explody? and is a boolean operator. We’ve seen the arithmetic operators already (+, -, *, /, etc.) and now here are the boolean operators. They’re named after Boolean algebra, the algebra of logic, and are used to make larger logical expressions from smaller ones. There are three boolean operators: and, or, and not.
The and operator evaluates to True if both of its arguments are True. The or operator evaluates to True if either or both of its arguments are True. The not operator takes only one argument and reverses it: not turns True into False and False into True.
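A short sketch of the three boolean operators, reusing the variable names from the example above with one arbitrary combination of values:

```python
potentially_hazardous = True
explody = False

# and: True only when both sides are True.
print(potentially_hazardous and explody)    # False

# or: True when either side (or both) is True.
print(potentially_hazardous or explody)     # True

# not: flips a single value.
print(not explody)                          # True

# Boolean operators combine naturally with relational operators.
temperature = 100    # arbitrary example value
print(temperature >= 100 and not explody)   # True
```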
Doctor of Medical Dentistry (DMD)
Coming Up Next: Loops
At this point, we’ve seen the most basic way to alter the flow of control in Python: the if statement. We can write Python code to solve non-trivial problems now, but there are still some things we need in order to use Python as a truly general-purpose language. In the next notebook we’re going to make our code do something over and over.
PE100-04: Repetition
We started off learning Python with just simple lists of statements…
It's 100 celsius
or 212.0 in pagan units.
Then we added the if statement so we could control whether or not certain statements would execute:
Good chance it's boiling.
In both of these cases, the code blocks only run once.
The problem is, sometimes we need things to run repeatedly. We want to look up all of the readings from an experiment or we need to compute the properties of something over dozens of temperatures each at dozens of pressures.
Python gives us two different ways to make our programs repeat things in a loop.
- while loops run a block of code, repeatedly, until some conditional statement evaluates to false.
- for loops run a code block a specific number of times.
Let’s start with the while loop.
While Loops
The syntax of a while loop looks a bit like an if statement. Take a look:
Looking at instrument number 1
and then maybe we'll look at the next one.
Looking at instrument number 2
and then maybe we'll look at the next one.
Done with all that looping.
...and ready to do something else now.
Here’s what the above code does. First, it creates a variable named “instrument” and sets it to 1. Then it goes into the while loop. The first time through, it checks to see if instrument is less than or equal to 2. It is (because we set it to 1 just a moment ago) so the while loop will execute the code block. This block prints out two lines and then it adds 1 to instrument. That means instrument now equals 2.
The second time through the loop, instrument equals 2. That satisfies the conditional statement of the while loop (2 is less than or equal to 2) so the code block runs again. Two more lines are printed out and then instrument is incremented one more time.
The while loop runs for a third time now. This time, 3 is not less than or equal to 2, so the conditional statement is false. This means the while loop is done - it won’t run its code block again, and the flow of control will go on to the next line after the while loop. It will run the two print statements explaining that the looping is over and it can go on to other tasks.
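The walkthrough above corresponds to a loop along these lines. This is a reconstruction from the description and the printed output, so the exact wording of the original cell may differ:

```python
instrument = 1

# The condition is checked before every pass through the code block.
while instrument <= 2:
    print("Looking at instrument number", instrument)
    print("and then maybe we'll look at the next one.")
    # Without this line the condition would never become false.
    instrument = instrument + 1

# These lines are not indented, so they run once, after the loop ends.
print("Done with all that looping.")
print("...and ready to do something else now.")
```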
Let’s look at another example. Let’s print out all the powers of two that are less than 928.
2 to the 0 equals 1
2 to the 1 equals 2
2 to the 2 equals 4
2 to the 3 equals 8
2 to the 4 equals 16
2 to the 5 equals 32
2 to the 6 equals 64
2 to the 7 equals 128
2 to the 8 equals 256
2 to the 9 equals 512
2 to the 10 is too big.
Did you notice I sneaked something in there we haven’t talked about yet? See the “#” character on the line with the while statement? That indicates the rest of the line is a comment. Python will totally ignore it. It’s handy for leaving little notes to yourself, like “why did I choose 928 there when I could have put 944?” This is very, very important when writing full-fledged, standalone programs. If you don’t leave some notes for yourself, you’ll never remember what you were thinking when you go back to that code six months from now. Also, the next person who comes along and has to change something in your code will greatly appreciate the hints.
Leaving comments in the code isn’t as big a deal in Jupyter notebooks… you can write rather substantial notes in a Markdown cell complete with boldface, italics, and whatever other fanciness you desire. On the other hand, it’s also nice to be able to leave your comments in just the right place in the code so it flows effortlessly through your comprehension as you read it. Let experience and personal opinion be your guide here.
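A sketch of the powers-of-two loop with a comment on the while line. It is reconstructed from the printed output above; the variable name exponent and the comment text are assumptions.

```python
exponent = 0

while 2 ** exponent < 928:    # 928 is the limit chosen in the text
    print("2 to the", exponent, "equals", 2 ** exponent)
    exponent = exponent + 1

# The loop stopped because 2 to this power is 928 or more.
print("2 to the", exponent, "is too big.")
```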
Reading information from the outside world
Notice that in both of those cases, we actually did know how many times the loop would run. We know that 2 to the 9th is 512 and so we know the while loop will only run that far. In fact, in every example we’ve had so far we’ve known what the output will be because we always have the same inputs. Computer software wouldn’t be terribly interesting if it could only run specific, known, canned inputs. Fortunately, Python gives us several ways to bring data into our programs.
The simplest way to bring data into a Python program is to edit the program and change the values we assign to variables. This is sort of the reductio ad absurdum method, but honestly it isn’t a bad way to handle very small amounts of input. It’s even easier in Jupyter notebooks since the code is just sitting there looking at us, waiting to be edited. For values that aren’t going to change very often (your name, perhaps, or the chargeback account number for using some instrument), just assigning a value to a variable and editing it every once in a while is a fine way to go.
Another way to get data into a Python program is to read it in from where the user is running the program. For doing this, Python provides a function called “input” which takes an optional argument, specifically a string that is printed as a prompt. Python then waits for the user to type something as a response. When they do, that string is returned to the calling program. Here’s a simple example:
Please enter your name Erik
Hello, Erik
When the above code runs, the prompt “Please enter your name” is displayed right below the code cell and a text entry box is placed beside it. When you enter your name, it greets you.
If we were running this tiny little snippet of code as a regular program, the interaction would be in the terminal emulator window that we ran the program in. Because this is running in Jupyter, though, the interaction is directly in the notebook. The prompt and the entry blank occur just below the running code cell.
What will happen when we run the following?
Enter a number between 4 and 8 5.25
TypeError: can only concatenate str (not "float") to str
Wow! Python couldn’t run that and it “threw an error”. We’ll examine Python’s error handling facilities later, but for now we’ll just assume that means it came to a screeching halt. Looking at the error message, it seems there is some problem with trying to add a real number (a floating point number) to a string.
Type Casting
input() prompts the user and returns the string they entered, but what if we want the user to enter a number? What do we do then? The answer is we’ll use a process known as type casting. The act of type casting is no more than converting information from one type to another.
There are three very useful functions for type casting: int(), float(), and str(). Let’s see them in action…
16.454
34543456
4
What did the above do? First, it converted the string “9.9” (literally, three characters… it’s a string) to a “float” (a floating point number, some languages will call that a real number). The second example takes a string of 8 characters and interprets them as an integer. That value is what gets returned and stored in our variable. Finally, we compute the number 4 by adding 2+2, and then we let the str() function convert that to a single character long string having just the character “4”.
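A minimal sketch of the three casts just described, followed by the fix for the earlier TypeError. The variable names are hypothetical, and user_text stands in for whatever input() would have returned so the example runs without typing anything:

```python
# String to float.
as_float = float("9.9")
print(as_float, type(as_float))

# String of eight digits to integer.
as_int = int("34543456")
print(as_int, type(as_int))

# Number to string: the result is the one-character string "4".
as_text = str(2 + 2)
print(as_text, type(as_text))

# input() always returns a string, so cast it before doing math.
user_text = "5.25"                  # stand-in for a value typed by the user
print(float(user_text) + 1.0)       # 6.25
```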
By now we know enough to be able to ask the user for a number and get something back that we can actually do math with.
There’s an even easier way, though. Just like function composition worked when you took precalculus, the results of a Python function can be used as the argument to another. Hence:
Enter a number between 17 and 34 26
26.0
Sometimes, function calls can be nested really deeply. Personally, when it comes time to debug code like that I find myself printing it out and coloring each level with a different highlighter pen.
Putting it together: while loops to get user input
The great thing about a while statement is that it can loop zero times, one, two, or twelve trillion. Best of all, we don’t have to know how many ahead of time. We could do the following:
print("Computing an average.")

sum = 0.0
counter = 0
data_point = float(input("Enter a number, or enter negative num to stop"))
while data_point >= 0.0:
    sum = sum + data_point
    counter = counter + 1
    data_point = float(input("Enter a number, or enter negative num to stop"))

print("Average value is", sum / counter)
When we run the code above, we’re prompted to keep entering numbers until we finally enter a negative number such as -999. Each time it goes through the loop it keeps track of the running total of the numbers and the count of how many numbers have been entered. Once it’s done, it divides the total by the count and displays that as the average.
Let’s step through what happens when the user enters 1, 2, 3, and -999:
1. The sum and counter variables are initialized to zero.
2. The user is prompted to enter a number, possibly a negative number to indicate no more data, and that input is type cast to a floating point number.
3. The while loop’s condition will be met any time a non-negative number was input (greater than or equal to zero). 1 is a positive number, so run the loop body.
4. This first time through, we’ll add the 1 that was input to our running total, which is now 1.
5. And increment the count, now equal to 1.
6. AND PROMPT THE USER FOR ANOTHER NUMBER!!!
7. Back at the while statement again, we check the condition and, yes, 2 is a positive number, so we run the loop’s code block.
8. Update the sum and count, and then…
9. PROMPT THE USER FOR ANOTHER VALUE!!!
10. Running the while statement again, the user entered 3, and 3 is positive, so the code block will be executed.
11. Update the sum (now 6) and count (now 3).
12. Prompt for another number.
13. Back at the while statement, we check and see that -999 is not a positive number, so we skip the code block and resume by running whatever follows it.
14. Having exited the while loop entirely, print out the average value by dividing sum/counter.
All the boldface and all-capitals lines above are there to emphasize how important it is to make sure your while loop isn’t just checking the same thing over and over. If we didn’t get a new number from the user each time through, the value of data_point would never change. That would result in an infinite loop, causing Python to never be able to complete the code in that cell. If it ever happens to you, and it probably will, the “Interrupt Kernel” command on JupyterLab’s Kernel menu will stop the looping and let you get back to work.
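To trace those steps without typing anything, here is a sketch that swaps input() for a hypothetical stand-in, fake_input(), which hands back canned responses one at a time. It borrows two ideas covered in later lessons (lists and writing functions), and it uses the name total rather than sum so that Python’s built-in sum() function is not hidden.

```python
# Canned responses playing the role of the user: 1, 2, 3, then -999 to stop.
canned_responses = iter(["1", "2", "3", "-999"])

def fake_input(prompt):
    """Hypothetical stand-in for input(): return the next canned response."""
    return next(canned_responses)

total = 0.0
counter = 0
data_point = float(fake_input("Enter a number, or enter negative num to stop"))
while data_point >= 0.0:
    total = total + data_point
    counter = counter + 1
    # Fetch a new value every pass; otherwise the loop would never end.
    data_point = float(fake_input("Enter a number, or enter negative num to stop"))

print("Average value is", total / counter)    # Average value is 2.0
```

If the very first response were negative, counter would still be zero and the final division would fail, so a more careful version would check counter before dividing.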
The while loop is certainly versatile… it can be used any time you need to do something repeatedly. If you know how many times you need to have the code block execute, either when you write the code or when it’s running, then keep a variable that is incremented in the block every time and exit the while loop when the counter hits the right number.
Where while loops really shine is when it’s impossible to know ahead of time how many times the code block should run. In the example above, where we keep accepting numbers until the user signals there aren’t any more, there’s no way to know how many times to execute that loop until we see a negative number. In a case like that, the while loop is the only practical solution.
So if while loops are so great and solve every problem, why do we need anything else? The big reason is expressiveness. Having the conditional test separated from the action that establishes when to stop makes a while loop a little awkward to understand (or debug!), especially when you’re looking at someone else’s code. This is especially true when we need to step through something by unusual increments.
So what are we to do in these cases?
For Loops
The for loop is quite similar to the while loop. The difference is that for loops are controlled by a count whereas while loops are controlled by a condition.
Let’s start with an example.
1
2
3
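The code cell that produced the output above is not reproduced here. Going by the description that follows (a target variable named “the_value”, the two-argument form of range(), and a single print statement), it was presumably close to this sketch:

```python
# Count from 1 up to, but not including, 4.
for the_value in range(1, 4):
    print(the_value)
```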
That is the simplest for loop you’ll see. Let’s look at the pieces.
1. The for statement itself
2. The name of the target variable whose value will be changing as the loop runs (“the_value” in this case)
3. “in” - and if this reminds you of set membership then you’re on to something
4. “range()” - this is an example of an iterable, which means “something that can be stepped through”.
5. The colon… the one I forget 50% of the time.
6. The code block, in this case just a print statement.
Most of the time, fairly close to “always”, the code block will take advantage of the target variable changing each time through. In our example, “the_value” is our target variable. As the loop runs it takes on the values 1 through 3, and the code block has a print statement that uses it.
Before we examine the range() function, let’s take a look at another iterable. We’ll talk about lists in a later lesson, but for now we can just wave our hands around and understand enough for the moment.
The sample weighed 143.6 grams.
The sample weighed 141.9 grams.
The sample weighed 139.4 grams.
The sample weighed 144.23219 grams.
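A loop along these lines would print the four lines above. The list name and the target variable name are assumptions; only the weights and the wording of the message come from the output.

```python
# A list is another iterable: the loop visits each item in order.
sample_weights = [143.6, 141.9, 139.4, 144.23219]   # hypothetical variable name

for weight in sample_weights:
    print("The sample weighed", weight, "grams.")
```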
You can use the target variable as many times as you want to in the code block.
Now let’s take a more detailed look at the range() function. In its most basic form it takes one argument - the stop value.
0
1
2
3
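A one-argument call that matches the output above (the target variable name is an assumption):

```python
# range(4) yields 0, 1, 2, 3: it starts at zero and stops before 4.
for number in range(4):
    print(number)
```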
This single-argument form starts at zero, counts up by one each time, and doesn’t include the stop value. That may differ from counting loops you’ve seen in other languages or tools, where the end value is often included. It’s just one of those things.
We’ve already seen the two-argument form. It takes a starting value and a stopping value, and iterates by one from the start until the last value that is less than the stop.
7
8
9
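The two-argument call behind that output would be something like this (the variable name is an assumption):

```python
# Start at 7, stop before 10.
for number in range(7, 10):
    print(number)
```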
And there’s even a three-argument form. The third argument is the amount to step by.
12
15
18
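One call that produces exactly those three values is sketched here. The stop value of 21 is a guess; 19 or 20 would print the same thing.

```python
# Start at 12, step by 3, stop before 21.
for number in range(12, 21, 3):
    print(number)
```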
The step size doesn’t have to be a positive number…
6
4
2
0
-2
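A sketch that counts down the same way. The stop value of -3 is an assumption; with a negative step the loop keeps going while the value is still greater than the stop.

```python
# Start at 6, step by -2, stop before reaching -3.
for number in range(6, -3, -2):
    print(number)
```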
In case you’re curious, the step size cannot be zero. If you really want an infinite loop, and there are cases where it makes sense, you have to use a while loop instead.
As a general rule, any place where you can use an explicit value (a literal) you can use a variable. The arguments to range() in a for loop are no exception:
where should we start? 13
where should we run right up to and stop just short of it? 15
what should we step by? 2
13
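In the notebook those three values are typed in by the user. To keep this sketch runnable without a keyboard, the same answers are assigned directly; the variable names are assumptions.

```python
# In the notebook these would come from int(input(...)) prompts.
start = 13
stop = 15
step = 2

# 13 is printed; the next value, 15, is not less than the stop, so the loop ends.
for number in range(start, stop, step):
    print(number)
```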
If we need to do something a specific number of times, we need to pay attention to our starting and stopping conditions. I’ve messed this up so many times I know now to be careful. You’ve been warned.
How many numbers would you like to total up? 3
Enter a number 4
Enter a number 5
They add up to 9
Notice something wrong? If you ask it to total 3 numbers, it only prompts for two of them. There are a couple of ways to solve this. The easiest is to just use the one-argument form of range().
How many numbers would you like to total up? 3
Enter a number 4
Enter a number 5
Enter a number 6
They add up to 15
That offers a little insight into why Python has its funny “up to but not including” semantics: zero is a perfectly legitimate number and a very natural starting point.
The only problem with the single-argument method is that the values that the target variable goes through include zero. This may or may not be a problem if that value is used inside the code block. If you really need to count from one instead of zero, you can increment the stopping value:
How many numbers would you like to total up? 3
Enter a number 4
Enter a number 5
Enter a number 6
They add up to 15
And that behaved just like we expected.
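The three variants can be compared side by side by counting how many times each loop body runs. This is only a sketch: the request for 3 numbers is hard-coded instead of read with input(), and the counter names are made up.

```python
how_many = 3   # stands in for int(input("How many numbers would you like to total up? "))

# Off by one: range(1, how_many) yields 1 and 2, so only two passes.
passes_off_by_one = 0
for i in range(1, how_many):
    passes_off_by_one = passes_off_by_one + 1

# One-argument form: yields 0, 1, 2, so three passes.
passes_from_zero = 0
for i in range(how_many):
    passes_from_zero = passes_from_zero + 1

# Counting from one: bump the stop value, yields 1, 2, 3.
passes_from_one = 0
for i in range(1, how_many + 1):
    passes_from_one = passes_from_one + 1

print(passes_off_by_one, passes_from_zero, passes_from_one)
```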
You may have noticed a pattern already. We frequently need to compute a new value for an existing variable. What we’ve done so far has been along the lines of grand_total = grand_total + new_reading. Python gives us a shorthand way to write that. We could instead express that as grand_total += new_reading. There is no space between the plus and equals signs. The only reason this exists is to save you some typing. As you might expect, there are a few more of these Augmented Assignment Operators…
Operator | Example | Equivalent |
---|---|---|
+= | count += 1 | count = count + 1 |
-= | x -= offset | x = x - offset |
*= | product *= val | product = product * val |
/= | y /= 3 | y = y / 3 |
%= | val %= 2 | val = val % 2 |
Out of all of them, += is far and away the most commonly used one.
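A short sketch of two of these operators in use. The readings 4, 5, and 6 are borrowed from the earlier totaling example; everything else is illustrative.

```python
grand_total = 0
for new_reading in [4, 5, 6]:
    grand_total += new_reading   # same as grand_total = grand_total + new_reading
print("They add up to", grand_total)

val = 7
val %= 2   # same as val = val % 2, the remainder after dividing by 2
print(val)
```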
Nested Loops
You know what’s fun to put in a loop’s code block? Another loop! Best of all, it comes in pretty handy when dealing with high-dimensional data. Plenty of algorithms rely on nested loops, too. Take a look at this:
x= 0 y= 0
x= 0 y= 1
x= 0 y= 2
x= 0 y= 3
x= 1 y= 0
x= 1 y= 1
x= 1 y= 2
x= 1 y= 3
x= 2 y= 0
x= 2 y= 1
x= 2 y= 2
x= 2 y= 3
x= 3 y= 0
x= 3 y= 1
x= 3 y= 2
x= 3 y= 3
x= 4 y= 0
x= 4 y= 1
x= 4 y= 2
x= 4 y= 3
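The code cell for this output is not shown. A pair of nested loops like the following would print those twenty lines, assuming the outer loop uses range(5) and the inner one range(4):

```python
for x in range(5):          # outer loop: x goes 0 through 4
    for y in range(4):      # inner loop: y goes 0 through 3 for every x
        print("x=", x, "y=", y)
```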
What’s going on here? Initially, the outer loop, the one that iterates zero through four and assigns its value to x, runs. When it starts running its code block for the x=0 pass, the for loop for the y variable starts. ‘y’ assumes the values 0 through 3, so the first four lines printed out are for x=0, y=0, then x=0, y=1, and so on through x=0, y=3. Once that inner for loop completes, the outer for loop gets to iterate again. Now the inside for loop runs again, only this time we have x=1. That’s why the next four lines are “x=1, y=0” through “x=1, y=3”. Every time the outer loop runs another iteration, the inner loop gets to run all the way from start to finish.
In later lessons, we’ll have a few opportunities to play with nested loops. In fact, we’ll get to do that in the very next lesson: Functions!
PE100-05: Functions
Functions in Python are very, very similar to functions in mathematics. Our functions take one or more input values and transform them into precisely one output. Let’s start with an example.
12566.36
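The cell that produced the value above is not shown here. The sketch below is reconstructed from the description that follows: the function name and the parameter names l and f come from the text, and the hard-coded pi of 3.14159 is mentioned later in the Modules section. The body uses the standard formula for inductive reactance, 2 * pi * f * l, which is an assumption about the original cell. Depending on floating point rounding, the printed value may show a few extra digits.

```python
def inductiveReactance(l, f):
    # l is the inductance in henrys, f is the frequency in hertz
    pi = 3.14159
    reactance = 2 * pi * f * l
    return reactance   # in ohms

# One millihenry at two megahertz
print(inductiveReactance(0.001, 2000000))
```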
The above code defines a function named “inductiveReactance” that accepts two input parameters. The first one, l, is the amount of inductance a coil has in henrys. The second one, f, is the frequency of interest (in hertz). We can call that function with parameters of one millihenry and two megahertz. The function computes the value 12566.36 (the unit is ohms) and returns that to the code that called it. In this case, it was a print statement that called it.
Take a look at the first line. The very first thing is the keyword def (shortened from the word “define”). After the “def” is where you specify the name of the new function you want to create. In this case, it’s “inductiveReactance”. Next is a list of names of parameters, enclosed in parentheses. In our example, the parameters are “l” and “f”. Finally at the end of the line is a colon.
Once that first line is done, the next step is to write the body of the function. Just like an “if”, “while”, or “for” statement, the code block has to be indented consistently. Our example function computes the reactance of the device in question. On the last line of the function, the return statement is how the function sends its computed value back to the code that called the function in the first place. Every function that computes a value should have at least one return statement. I won’t get drawn into this debate: some say a function should have precisely one return statement and utilize whatever logical means necessary to make sure that all code paths eventually lead to it. Others say it’s not a problem at all for a function to have multiple return statements (and hence multiple ways for a function to end) if it makes the logic more understandable. Personally, I try to minimize the number of return statements in my functions but I’m by no means a zealot on this one. If I need seven different places to exit the function and return a value then so be it.
Encapsulation
Functions are useful in programming for the same reason they’re useful in math - ours encapsulate a chunk of code so you don’t have to think about what is in it every time. Imagine how tedious it would be to write a program that needed to compute cosine in a lot of different places in the code. You could, I suppose, type in a Taylor series expansion for cosine in each of the places where we need to compute a cosine. That would be irritating, error prone, and confusing to anyone else who has to read it. Instead, we can write a function exactly once to compute cosine and then call that function from many places in our code. Once we have the function tested and debugged, we don’t have to think about it again. That frees up mental energy for more productive uses.
Functions can be classified into one of two types. Void Functions exist for encapsulation and don’t actually return a value. print() is an example of a void function. Value-Returning Functions, as the name strongly implies, return a value to the calling code. inductiveReactance() is an example of one.
Here’s another example. This time, we’ll define a function that calls another function.
Area of a circle with a radius of 2 is 12.56636
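A sketch of two functions that would produce that line. The function names are assumptions; the text only says that one function computes the area of a circle and calls another that squares a number.

```python
def square(x):
    return x * x

def circle_area(radius):
    # Calls square() rather than repeating the multiplication here.
    return 3.14159 * square(radius)

print("Area of a circle with a radius of 2 is", circle_area(2))
```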
We defined a function to compute the area of a circle. It needed to square a number and so we decided to write a function to do that. Functions can call other functions ad infinitum. In fact, functions can even call themselves! When that happens the function is said to be recursive. Recursive functions are very useful for solving some hard problems but they’re a little beyond an introductory module like this one.
Function (and Variable) Naming
What kinds of names can we use for functions? The same ones we can use for variables! More specifically:
- No keywords (e.g., False is invalid)
- No spaces (e.g., my function is invalid)
- The first character must be:
  - a-z, A-Z, or _ (the underscore character)
  - No numbers (e.g., 1st_function is invalid)
- After the first character, the following are allowed:
  - a-z, A-Z, _, and 0-9
  - No other symbols (e.g., get_room&board is invalid)
As a widely agreed upon best practice, names should be meaningful and be composed of lowercase characters with underscores as separators.
Function Arguments
Input Parameters to functions are called arguments. They are the primary and best way to put information into a function, and definitely the way that causes the fewest problems. Arguments to a function in Python are mostly analogous to what we’re used to in math, but of course Python has some extensions.
A function can have any number of arguments, including zero. “A function of zero arguments” might sound like a mathematician’s idea of “humor”, but it can actually make sense in programming. Sometimes you just need to encapsulate part of your code so you don’t have to worry with it again. For instance:
==============================
==============================
Greetings, User. I'll start
loading the instrument config
files and opening connections
to them. It'll take a minute.
==============================
==============================
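A zero-argument function that prints a banner like the one above might look like this. The function name is hypothetical; the text inside the banner is copied from the output.

```python
def print_banner():
    # No arguments and no return value: this function only prints.
    print("==============================")
    print("==============================")
    print("Greetings, User. I'll start")
    print("loading the instrument config")
    print("files and opening connections")
    print("to them. It'll take a minute.")
    print("==============================")
    print("==============================")

print_banner()
```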
Now the code to print that banner is hidden away inside a function we’ll never have to look at again. Less mental clutter means fewer bugs.
And for the sake of completeness, functions can also take one or more arguments:
The race was 6.213712 miles long and my ankles were hurting the ENTIRE way.
The polynomial evaluates to: 1284.04
When arguments are passed into a function, they become parameter variables and can be referred to inside the function just like any other variable. This is handy because the variables inside a function are called local variables and they have special properties: nothing outside of the function can modify their value, they’re destroyed and re-created every time the function is called, and these local variables supersede any outside variables with the same name.
Take a look for yourself:
Twice the wavelength is 40
Twice the wavelength is 40
Twice the wavelength is 40
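The cell behind that output is not shown. Based on the explanation that follows (a global named “wavelength” set to 20 four lines from the bottom, then three calls), it was presumably similar to this sketch. The function name is an assumption.

```python
def print_doubled(wavelength):
    # This "wavelength" is a parameter variable, local to the function.
    wavelength = wavelength * 2
    print("Twice the wavelength is", wavelength)

wavelength = 20            # global variable in the main program
print_doubled(wavelength)
print_doubled(wavelength)
print_doubled(wavelength)
```

The global is still 20 after all three calls, which is why every line reads 40 rather than 40, 80, 160.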
Does that seem odd to you? What happened is this: four lines from the bottom we created a variable named “wavelength” and set it to 20. We then called the function to print it out doubled. We passed the global variable “wavelength” to our function which took it as its only argument. That argument became a parameter variable that was coincidentally named “wavelength”. That “wavelength” parameter variable has nothing to do with the “wavelength” variable in the main part of the program. Our function doubles that parameter variable and prints it out. At that point, the function completes and the flow of control goes back to the main body.
The next time our function is called an entirely new, fresh set of variables and parameter variables is created. This is important - it means that if we call the function with the same value every time then we always get the same result. Functions are unable to save their “state”. Like a football player on a stretcher, they have no memory of what happened before.
(OK, yes, there are ways for them to save their state. Sometimes it’s unavoidable and you just have to do it, but doing so makes more places for bugs to creep into your programs and makes it harder to understand later. Try to avoid it. We’ll talk about it later.)
Variable Scope
The degree to which your programs can “see” a variable is called scope. There are two levels of scope in most Python programming:
- Global Scope
  - Defined in main Python file
  - Outside of ANY function
  - Try to avoid these!
    - Considered poor design
    - Dangerous to use: any part of the program anywhere can change these
    - Bug Magnet!
- Local Scope
  - Variables defined within a function
  - Only visible and useable from inside their own function!
  - Use these if at all possible.
The danger in global variables comes from two things. The first is the fact that the value can be changed anywhere in your program, either in the main program or inside of a function, and it’s devilishly hard to keep track of where that might be.
The second danger is more subtle. When a function saves a value into a global variable, the function is now said to have side effects. Side effects break the idea of isolation that functions are meant to give us. Imagine a mathematical function, such as tangent, if it had side effects. Calling tan(.0125) would not only result in the tangent of .0125, but it would have some other effect on some unrelated part of math. Imagine if calling tan caused your coordinate system to change every time? That would be insane.
It gets worse, though. What if our tangent function also read from a global variable and changed its behavior based on that? Then each time we called tan(.0125) we might get a different value.
In other words, we basically broke math.
Similarly, when we write programs, if our functions have side effects then we’ve complicated them tremendously. And more complication means more places for bugs to sneak into our code and they’ll also be harder to find.
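To make the danger concrete, here is a small made-up example. All of the names are hypothetical, and the global statement it uses (which lets a function assign to a global variable) has not been introduced in this lesson; it appears only to show what to avoid.

```python
calibration_offset = 0.0   # global variable

def corrected_with_side_effect(raw_value):
    # Reads AND changes a global: same input, different answer each call.
    global calibration_offset
    calibration_offset = calibration_offset + 0.5
    return raw_value + calibration_offset

def corrected(raw_value, offset):
    # Everything it needs arrives as an argument: same input, same answer.
    return raw_value + offset

first = corrected_with_side_effect(10.0)
second = corrected_with_side_effect(10.0)
print(first, second)                               # the two results differ
print(corrected(10.0, 0.5), corrected(10.0, 0.5))  # the two results match
```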
As an aside, there is a style of programming that eliminates global variables and, to an extent, even local variables. It’s called functional programming, and Python has some support for that style. There is usually more than one way to do anything in Python, and experienced Pythonistas will usually try to choose the most Pythonic way. Part of being in Pythonic style means to use (at least partially) a functional style.
Constants
There is an exception to the “no globals” rule: Constants. Just like in math, a constant is given a value once and never changed again. “Never changing” means “no side effects” so everything is OK. It is good practice to define your constants using ALL CAPITAL LETTERS.
1.9878e-19
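The cell that printed that number is not shown. One computation consistent with it is the energy of a photon with a wavelength of one micron, using rounded values for Planck’s constant and the speed of light. That interpretation, the constant values, and all of the names below are assumptions. The exact digits printed may vary slightly with floating point rounding.

```python
# Constants: assigned once, never changed, written in ALL CAPITALS.
PLANCK_CONSTANT = 6.626e-34   # joule-seconds (rounded)
SPEED_OF_LIGHT = 3.0e8        # meters per second (rounded)

def photon_energy(wavelength_m):
    # Reading a constant from inside a function causes no side effects.
    return PLANCK_CONSTANT * SPEED_OF_LIGHT / wavelength_m

print(photon_energy(1.0e-6))
```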
Abstraction
A valuable property of functions is how they isolate the code and variables inside of them from being manipulated elsewhere in your software. A consequence of that is their ability to “hide” detail from us. We’ve already talked about writing a function, debugging it, and never having to look at the code inside of it again. What is every bit as useful, if not more so, is using functions to provide abstraction.
Abstraction is something we’ve used every day even if we haven’t thought about it. Remember learning math? You started off counting things, and yes, that counts as math. If you had four bottle caps in one hand and three in the other, you could toss them all on the table, count them, and know that you have seven in total.
There are two problems with having to count everything. One is that the amount of stuff can get big in a hurry. Try using two hands and a table to count sand grains. The other problem is that if there are any insights to be had, it’s hard to find them when you’re stuck down in the details. Fortunately, we learned arithmetic.
Arithmetic is great. We don’t have to deal with handfuls of stuff anymore. We can just use numbers and operators and get an answer without a bunch of messing around. We can start to see patterns we never would have just tossing bottle caps on the table. If we need to add 12 to something, we can instead add 10 and then add 2 more. This is so handy. Of course, it would be nice if we could just do something to analyze entire families of arithmetic problems.
Algebra lets us analyze entire families of arithmetic problems. We don’t have to fool with specific numbers unless we want to - we can just substitute variables in their place. We’ve hidden some of the complexity, like the petty little details of numbers, and abstracted that complexity away.
Similarly, a lot of problem solving is perfectly amenable to using abstraction. Let’s write a bit of code to run an experiment…
That function is a (admittedly fanciful) representation of running an experiment. It makes sense, anyone can understand it, and if there’s a bug in there then it’s going to be really obvious. The only problem: if we try to run it, it’ll crash because those other functions haven’t been defined yet. Shall we fix that?
Notice how the program is broken up into several functions? The best part is that you don’t have to keep everything in your head. All you have to remember is the part you’re working on. Smaller pieces, fewer bugs.
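The original cells are not reproduced here, so the following is only a sketch of what such a program might look like. Every function name is hypothetical and the “readings” are placeholder numbers, not real data. The top-level function reads like a description of the experiment, and each helper hides one piece of detail.

```python
def prepare_sample():
    print("Mounting the sample.")

def collect_data(number_of_frames):
    readings = []
    for frame in range(number_of_frames):
        readings.append(100 + frame)   # placeholder values, not real data
    return readings

def analyze(readings):
    return sum(readings) / len(readings)

def run_experiment():
    # The high-level steps, with the details tucked away in other functions.
    prepare_sample()
    readings = collect_data(5)
    result = analyze(readings)
    print("Average reading:", result)
    return result

run_experiment()
```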
Modules
One reason Python has become so popular is the sheer amount of code that has been written in it and made available for public use. We’ve seen a few functions already that were built in to Python - int(), float(), and str(), for example - but there are many tens of thousands of modules that are freely available for use in your own software. Just picking five common ones at random:
- math
- random
- os
- PyMySQL
- psycopg
The first two contain functions for general-purpose math and for producing random numbers. The “os” module interfaces Python with the operating system the code is running on. PyMySQL and psycopg provide connectivity to relational databases.
Remember at the beginning of this lesson when we wrote a function to calculate inductive reactance? I put the value of pi in there as 3.14159, but that really isn’t anywhere near enough digits for some problems. Let’s fix that:
3.141592653589793
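The cell itself is not shown, but importing the module and printing its value of pi takes two lines. The revised reactance function below it is a sketch of how the earlier example could use math.pi; the formula is the standard one and the function body is an assumption.

```python
import math

print(math.pi)

def inductiveReactance(l, f):
    # Same function as before, but with the math module's value of pi.
    return 2 * math.pi * f * l

print(inductiveReactance(0.001, 2000000))
```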
There are two things to note here. First, the keyword import is used to tell Python to go find a module with the right name and load it. The name we want it to find is the word right after the import. Second, just looking at the output we can see that there are a lot more digits than when we typed pi in by hand in our inductive reactance example (top of this page). In general, using a module that (a) was written by someone else and (b) is widely used and has been checked by a lot of people is going to avoid a lot of bugs. For instance, I would never code my own Fast Fourier Transform. Instead, I would use the one in the “numpy” module. I know how easy it is to make a mistake and I trust their work a lot more than my own. They have tens or hundreds of thousands of users and scores of developers. I have… a copy of Numerical Recipes that’s old enough to run for President.
Since we used the “math” module already, here’s a very incomplete list of what is in there:
- sin(), cos(), tan(), acos(), asin(), atan()… - “acos” is “arc cosine”, etc.
- log(), log10(), sqrt() - square root
- radians(), degrees() - converts between them
And lots more stuff. How do you know what’s in it? Go to the online documentation: https://docs.python.org/3/library/math.html
Random Numbers
Another module that is heavily used is “random”. It generates random numbers, yes, but it can also do things like take a list of things and shuffle them randomly.
The random integer between 10 and 100 was: 20
The random float between 0 and 1 is: 0.6474502367565016
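A sketch of a cell that prints lines like those. It uses random.randint(), which includes both endpoints, and random.random(), which returns a float from 0 up to but not including 1. The variable names are assumptions, and the numbers will differ on every run.

```python
import random

random_integer = random.randint(10, 100)   # 10 and 100 are both possible
print("The random integer between 10 and 100 was:", random_integer)

random_float = random.random()             # 0.0 <= value < 1.0
print("The random float between 0 and 1 is:", random_float)
```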
There are more functions available in the “random” module, including ones to select a real number from a non-uniform distribution. Take a look at https://docs.python.org/3/library/random.html
Here’s a slightly more complicated example:
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Don't feel bad... proposal number 24 didn't get funded either.
Proposal number 23 was funded!
Don't feel bad... proposal number 24 didn't get funded either.
Let’s try out what we’ve learned so far. Use the next code cell to write a bit of Python that simulates rolling a pair of dice and adds the two values. Print the value out.
Let’s add to that… add a loop so that we keep doing that over and over until we get the same sum twice in a row. Some questions to ask yourself are “What kind of loop do I need?” and “How can I compare what happened between two different loop iterations?”
PE100-05: Files
You can write a good deal of software that runs entirely inside Jupyter notebooks or that runs on the command line and only communicates through the screen and the keyboard. Sometimes, though, you have to work with files. It may not be practical to hardcode all your data into assignment statements in Python, or maybe you have to deal with a number of files and therefore you can’t use pipes.
We’ve already seen two basic ways to do Input and Output (often referred to as “I/O”). We’ve used input() to read from the keyboard and print() to send output to the screen. Those functions work quite well, except you might have to do a lot of typing or deal with your output scrolling off the screen. In neither case is the data durable - it goes away as soon as the program is done or you close the Jupyter notebook.
The input() and print() functions are just the tip of the proverbial iceberg in terms of getting information in and out of running Python code. Some of our other options include:
- GUI controls: text box, menu, dialog box…
- Networks: HTTP, TCP/IP sockets, InfiniBand…
- Databases: Relational (SQL) and NoSQL
- Other: cameras, microphones, speakers, LabVIEW
Files
Practically everyone is more-or-less familiar with the idea of a file, even if fairly few people know how they work. We’re going to ignore a lot of details for the moment and say this: a file is a long-lasting collection of bytes. It has a first byte, a last byte, and every one in between stays in the same order.
This raises the question “What is a byte?” A byte is just a small number from 0 to 255 (inclusive). We can assign meaning to those numbers, and if we’re smart about how we do it then we can represent any information a computer can process as long as we use enough of these bytes.
We like to think of files as being one of two types: binary files and text files. Binary files are pure data. We decide how to write bytes to a file to represent data. Then when we’re ready to read it in again, we read the bytes, process them somehow, and reconstruct the original data. It’s a great technique - it’s fast and efficient.
We won’t be talking about binary files in this notebook or even in this module. Fifteen years ago we wouldn’t have had a choice, we would have had to. These days, it’s unusual to have to deal with binary files, especially in Python, because there is so often a library function already available to do the work for us.
Text files, on the other hand, are probably something you’re already familiar with - they are what you get when you edit a “plain text” file in “notepad” or “textedit”. In a text file, every one of the letter, number, and punctuation mark characters is assigned its own number. For instance, capital “A” is 65. “B” is 66. Not that it should ever matter, but here’s a complete list and then some!
Let’s say you open an editor and type “CAT”. When you save that to a file, there will be a file that is three bytes long and contains the three bytes 67, 65, and 84. Actually there will usually be a fourth byte, 10, which is the character you get when you press “Enter” or “Return”.
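You can check those byte values from Python itself. This sketch uses the string method encode(), which has not been covered in this lesson, to turn the text into bytes; the variable names are made up.

```python
text = "CAT\n"                            # "\n" is the Enter/Return character
byte_values = list(text.encode("ascii"))  # one number per character
print(byte_values)                        # prints [67, 65, 84, 10]
```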
For now, at least for a few minutes, we’re going to pretend the only language on earth is English. We’ll talk about other languages when we talk about networks.
It’s about time for an example, don’t you think?
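The three lines discussed below, gathered into one cell. This assumes a system with a writable /tmp directory, as on Linux.

```python
my_file_object = open("/tmp/first_file.txt", "w")   # create (or empty) the file
my_file_object.write("First Post!")                 # write the text to it
my_file_object.close()                              # tell the OS we are done
```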
Three lines of code was all it took to create a file, write to it, and tidy up after ourselves. What does each of those lines do?
my_file_object = open("/tmp/first_file.txt", "w")
my_file_object is an object variable. Think of an object as a way to store data in a variable along with some functions that only make sense to that data. They hide a lot of complexity from us. A file object is one that keeps track of a filename, how to get to it, and how to use it. It has some functions built in to it to help us do things to the file.
Python gives us the function “open”. It gets a file ready to be used by our code. It takes two arguments. The first is the file’s name, and the second is the mode we want to use the file in. In our example, we specified that the file’s name was “first_file.txt” and that it was in the “/tmp” directory. Then in the second argument we specified “w”, meaning we wanted to write to the file. The “w” mode will cause the file to be created if it didn’t already exist. If it did already exist, on the other hand, all the contents of it will be deleted and we’ll start writing from the beginning just as if the file was created from scratch. We’ll see more modes as we go.
my_file_object.write("First Post!")
This line uses one of those functions that are tucked away inside an object. In this case, we’re calling the file object’s “write” function. It does what we expect - it takes its argument, in this case “First Post!”, and causes it to be written to disk byte by byte.
my_file_object.close()
Finally, we call one more of the file object’s functions: close. When we run this, Python tells the operating system “Hey, we’re done with the file. You can get rid of any of the tedious housekeeping data that operating systems keep behind the scenes!”
Closing files is considered “good programming hygiene”. You’re allowed 1024 file objects to be open and connected to files in one program on the CLASSE cluster of computers. I’ll say from my experience: if you think you need that many, you’re probably doing something the wrong way.
Writing files, then, is fairly easy. What about reading files? I’m glad you asked.
'First Post!'
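The reading cell is not shown; the sketch below follows the description after it. To stay self-contained it first re-creates the file, and it assumes the result was stored in a variable named the_contents, the name used later in this lesson.

```python
# Re-create the file so this cell can run on its own.
my_file_object = open("/tmp/first_file.txt", "w")
my_file_object.write("First Post!")
my_file_object.close()

# Open for reading, pull the whole file into one string, and close.
my_file_object = open("/tmp/first_file.txt", "r")
the_contents = my_file_object.read()
my_file_object.close()

the_contents   # as the last line of a notebook cell, this displays the string
```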
You can probably tell mostly how that worked just by looking at it. We used the open() function again, but this time with a “r” for our mode. This means “read”. Also, this time we used read() instead of write(). The read() function reads in an entire file and saves it in a string variable. Finally, we call close() again to close the file and tidy up after ourselves.
Note that if the file is, say, 500 megabytes long, the string variable is going to be very, very large - roughly half a gigabyte. Python can handle this, but it may not be terribly convenient. If the file is more than 100-200 gigabytes, the CLASSE servers are probably not going to be able to handle it. I say “probably” because there are a lot of factors at play.
Just writing one line to a file is probably not very useful. Let’s try writing two lines:
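A sketch of that cell, using the file name and the two strings mentioned in the explanation that follows:

```python
my_file_object = open("/tmp/first_file.txt", "w")
my_file_object.write("First line written.")
my_file_object.write("This is my second line.")
my_file_object.close()
```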
When we run that, it will open /tmp/first_file.txt for writing and it will delete anything already in it (that’s what the “w” means, remember?). Then it will write “First line written.” and “This is my second line.”.
Let’s read the file again and prove to ourselves that it worked…
'First line written.This is my second line.'
Oh no! The two lines ran together!
And that is one of the first differences we’ll see between write() and print(). By default, print() adds a newline character after whatever it prints out; write() does not. Remember when I said there would usually be a byte at the end of a line, represented by the number 10? This character is called “newline” and it, as the name implies, marks where a new line starts.
In all likelihood, when we do two write() statements like we did, we want to put a newline character in the file to make it into two lines. Fortunately, there are several ways to do that. Here are two of them.
The first way is simple and direct - call write() three times instead of two and put a newline in there “by hand”, as it were:
'First line written.\nThis is my second line.'
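A sketch of the three-write version, followed by reading the file back. The variable name the_contents matches the one used in the explanation that follows; the rest mirrors the earlier cells.

```python
my_file_object = open("/tmp/first_file.txt", "w")
my_file_object.write("First line written.")
my_file_object.write("\n")                        # the newline, written "by hand"
my_file_object.write("This is my second line.")
my_file_object.close()

my_file_object = open("/tmp/first_file.txt", "r")
the_contents = my_file_object.read()
my_file_object.close()

the_contents   # as the last line of a notebook cell, this displays the string
```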
The output looks a little strange. We put in an extra write() function call, but we gave it an odd looking argument - "\n". That is a backslash (usually between the Enter and the backspace keys on a US keyboard) immediately followed by a lowercase “n”. The combination together means “newline character”. This much is fairly straightforward.
Next we read the contents of the file. This is just like before.
Finally, and this is where things take an unexpected turn, we evaluate the_contents and let Jupyter print that out for us. And when Jupyter does that, we see the “\n” there. It seems like Python didn’t convert those two characters to a newline, just sticking them in there as-is, and still left us with one long line. But is that true? Has Python forsaken us?
Run the code in the next cell:
First line written.
This is my second line.
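That cell presumably amounts to print(the_contents). To see both views side by side outside a notebook, this self-contained sketch sets the string directly and uses repr(), which gives the same quoted form Jupyter shows when a cell ends with a bare expression.

```python
the_contents = "First line written.\nThis is my second line."

print(repr(the_contents))   # quoted form, with the newline shown as \n
print(the_contents)         # the newline is actually sent to the screen
```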
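Salvation! print() did the right thing. When Jupyter displays a bare expression it shows the string’s quoted representation, escape sequences and all; print() writes out the actual characters. This is a key difference between just typing a variable or an expression at the end of a cell and letting Jupyter display it versus putting a print() in there and having absolute control over what gets sent to the notebook and on to the screen.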
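The original cells are not reproduced on this page, so here is a minimal sketch of the whole round trip. The name my_file and the use of tempfile.gettempdir() are assumptions made for this sketch (the lesson itself writes to /tmp/first_file.txt), and repr() is used to show what Jupyter would display for a bare expression.

```python
import os
import tempfile

# Assumption: gettempdir() stands in for the /tmp folder used in the lesson,
# so the sketch also runs on systems that have no /tmp.
path = os.path.join(tempfile.gettempdir(), "first_file.txt")

my_file = open(path, "w")
my_file.write("First line written.")
my_file.write("\n")                       # the newline, put in "by hand"
my_file.write("This is my second line.")
my_file.close()

my_file = open(path, "r")
the_contents = my_file.read()
my_file.close()

print(repr(the_contents))   # the quoted form, with \n visible
print(the_contents)         # the actual characters, on two lines
```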
This also illustrates something else important and useful: all of the code cells in this notebook are being run by the same Python “interpreter”. This means if we set a variable to a value in one cell, we will see the same value stored in that variable in other cells. That’s how we were able to print what was stored in the_contents in the cell above even though we had set its value to the file contents two cells above that.
If a file only has a line or two, handling it with string functions is not a big deal. If a file has millions of lines, then it becomes a bit of a hassle. We need a way to read a file one line at a time. Fortunately, there’s readline():
First line written.
This is my second line.
This does almost what we expect: it reads both lines from the file, one at a time, and prints them out. The only snag is that blank space between the lines. What has happened? It turns out readline() reads the entire line, even the newline character at the end. print() then adds a newline of its own, so we get two. We can see this if we evaluate the string instead of just printing it:
'First line written.\n'
There’s that \n again! What about the second line?
'This is my second line.'
When readline() reads a line, it includes the newline character at the end unless it reaches the end of the file and the file didn’t end with a newline.
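A short sketch of that behavior, assuming a two-line file whose last line has no trailing newline. The variable names and the temporary-folder path are made up for illustration; repr() shows the strings the way Jupyter would display them.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "first_file.txt")  # assumed location

# Recreate the two-line file. Only the first line ends with a newline.
my_file = open(path, "w")
my_file.write("First line written.\n")
my_file.write("This is my second line.")
my_file.close()

my_file = open(path, "r")
first_line = my_file.readline()
second_line = my_file.readline()
third_try = my_file.readline()    # nothing left, so this is an empty string
my_file.close()

print(repr(first_line))
print(repr(second_line))
print(repr(third_try))
```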
It’s rare that we would want to read a bunch of lines in a file with the newlines included. That’s just not something we do very often, and practically never in scientific software. We’ll almost always want to trim off the newline character. And for that, we have the rstrip() function. It takes a string, strips off any newlines (and, unless told otherwise, any other whitespace such as spaces and tabs) on the right side of it, and returns that cleaned-up string. rstrip() does that for the right side of the string, lstrip() cleans up the left side (the beginning of the string) and strip() goes crazy and does both ends at the same time.
Let’s try it:
First line written.
This is my second line.
What’s going on here? A couple of things. The first thing to note is that rstrip() and its close companions lstrip() and strip() take one optional argument: a string containing the characters to be stripped. Leave the argument out and they strip whitespace of every kind. Practically always we’ll want to get rid of the trailing newline character.
The other interesting thing is how we called the rstrip() function in the first place. We gave the name of the string variable, a period, and the name of the function we were calling. This is just like how we called the close() function on a file object. And in fact, strings are another kind of object in Python. We’ll see a lot more on this later.
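To compare the three stripping functions side by side, here is a small sketch. The sample string is made up; it has two leading spaces, one trailing space, and a trailing newline so that each call gives a visibly different result.

```python
line = "  First line written. \n"

newline_only = line.rstrip("\n")   # strips only the characters we name
right_side = line.rstrip()         # no argument: all trailing whitespace
left_side = line.lstrip()          # leading whitespace only
both_ends = line.strip()           # both ends at once

print(repr(newline_only))
print(repr(right_side))
print(repr(left_side))
print(repr(both_ends))
```

The original string is never modified; each call hands back a new, cleaned-up string.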
Historical note: One of the earliest programming languages built around objects was named “Smalltalk”. In Smalltalk, the functions that were inside of objects were called “methods”. You’ll still hear people call them that. Later, the “C++” language came along and it called methods “member functions”. When programmers talk about the functions that are contained in objects, we’ll use either term interchangeably, sometimes even switching in the middle of a sentence. We now return to your Python tutorial, already in progress…
We read both lines in the file we created. We were able to call readline() twice and know that we had all of our lines in the file because (1) we created the file ourselves and (2) we therefore knew it had precisely two lines. It wasn’t even too bad having to type those readline() and rstrip() lines twice. But what if we had a lot more lines? We would certainly want to use a loop.
For example, what do we do with a five-line file?
Line 1.
Line 2.
Line 3.
Line 4.
Line 5.
No problem - we just use a for loop and do the readline() inside of it. It repeats the five times we asked for. In this case, after we read each line we cleaned it up a little and printed it.
But what if we can’t know the number of lines ahead of time? One approach is to have whatever program that creates the file write the number of lines that will be in it first. I won’t say this is a common approach in scientific software, but it isn’t exactly rare either.
Line 1.
Line 2.
Line 3.
Line 4.
Line 5.
The overall scheme for this is probably obvious by now. In the first half, when we’re writing the file, we write a “5” on its own line, and then write five more lines. In the second part, we
1. Read the first line.
2. Use rstrip() to get rid of the trailing newline.
3. Use the result of that as the argument to int(), converting that string (“5”) to an actual integer (5).
4. Finally, go through a for loop that many times, just like before.
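The count-first scheme can be sketched as follows. The file name counted_file.txt and the variable names are hypothetical, and the temporary folder is looked up with tempfile.gettempdir() rather than hard-coding /tmp.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "counted_file.txt")  # hypothetical name

# First half: write the number of lines, then that many lines.
out_file = open(path, "w")
out_file.write("5\n")
for i in range(1, 6):
    out_file.write("Line " + str(i) + ".\n")
out_file.close()

# Second half: read the count, convert it, then loop exactly that many times.
in_file = open(path, "r")
line_count = int(in_file.readline().rstrip())
for i in range(line_count):
    line = in_file.readline().rstrip()
    print(line)
in_file.close()
```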
Most of the time we won’t have the luxury of knowing how many lines are in a file, though. We need a way to read all of the lines, line by line, without limit. For that, we can loop through the file and quit when Python returns an empty string with not even a newline character.
5
Line 1.
Line 2.
Line 3.
Line 4.
Line 5.
The while loop behaved just like we expected: start by reading a line, and then every time the line isn’t empty, print it out and read another line. When readline() finally returns a completely empty string, exit the while loop and close the file.
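A minimal sketch of such a while loop, under the assumption that the file is the six-line one from the previous example (a “5” followed by five lines). The counter lines_seen is added here only to make the result easy to check.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "counted_file.txt")  # hypothetical name

out_file = open(path, "w")
out_file.write("5\n")
for i in range(1, 6):
    out_file.write("Line " + str(i) + ".\n")
out_file.close()

in_file = open(path, "r")
lines_seen = 0
line = in_file.readline()
while line != "":                  # an empty string means end of file
    print(line.rstrip())
    lines_seen = lines_seen + 1
    line = in_file.readline()
in_file.close()
```

A blank line in the middle of a file would not stop this loop early: readline() returns “\n” for a blank line, and that is not an empty string.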
Looping through a file all the way to the end is such a common thing to do, Python has a shortcut for doing it. Remember when we talked about a for loop iterating over an ordered set? A file can be thought of as an ordered set of strings. They’re not in alphabetical order, but rather they are ordered by line number. That means we can loop over the file object directly with for:
5
Line 1.
Line 2.
Line 3.
Line 4.
Line 5.
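Iterating over the file object directly can be sketched like this. As before, the file name and the counter are assumptions made for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "counted_file.txt")  # hypothetical name

out_file = open(path, "w")
out_file.write("5\n")
for i in range(1, 6):
    out_file.write("Line " + str(i) + ".\n")
out_file.close()

in_file = open(path, "r")
lines_seen = 0
for line in in_file:               # one line per trip through the loop
    print(line.rstrip())
    lines_seen = lines_seen + 1
in_file.close()
```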
As you can imagine, reading isn’t the only file operation you can do with a loop. You can also write to a file that way. For instance,
0
1
2
3
4
5
6
Finally, we don’t have to erase the contents of a file every time we write to it. It’s perfectly normal to append to an existing file, and for that the “a” mode can be used with open().
0
1
2
3
4
5
6
7
8
9
When you use the append mode, the write() calls will add to the existing file. If the file doesn’t already exist, it will be created and then written to as though we had used the “w” mode.
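Here is a sketch of writing in a loop and then appending. The file name numbers.txt is hypothetical; the split at 7 is chosen so the result runs from 0 through 9.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "numbers.txt")  # hypothetical name

out_file = open(path, "w")          # "w" starts from an empty file
for i in range(7):
    out_file.write(str(i) + "\n")
out_file.close()

out_file = open(path, "a")          # "a" keeps what is there and adds to the end
for i in range(7, 10):
    out_file.write(str(i) + "\n")
out_file.close()

in_file = open(path, "r")
contents = in_file.read()
in_file.close()
print(contents)
```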
So far in this lesson we’ve acted like everything just works perfectly every time. In reality, it’s not that neat. Filenames get typed in wrong, disks get full, and lines that are supposed to be numbers might contain text instead. Any of these problems is enough to bring our Python code to a grinding halt. Our next lesson is all about how to handle these problems and many, many more like them. We’re going to learn about Exceptions!
PE100-06: Exceptions
Most of the time, the code we write does exactly what we expect. Our numbers are added up, files are written and read, and users type their input in neat little boxes. Sometimes, though, something goes wrong. Maybe the disk storage space filled up, or we try to write to a file in a directory we don’t have access to (or maybe the directory doesn’t even exist). When things like this happen, the Python interpreter stops the normal flow of execution.
Take a look at an exception:
ZeroDivisionError: division by zero
When you run the above, Python will notice the error, stop the code from running, and point out that a “ZeroDivisionError” has occurred. Since this kind of thing wasn’t supposed to happen (division by zero is considered a Bad Thing(tm) by most people) we can say the situation we’re faced with is an exception. And indeed, Python’s error handling mechanisms are based on what are called “exceptions”.
When Python saw the “division by zero” error, it stopped running the rest of the code. It created one of these Exceptions, and then it threw it. Nothing in our one-line example tried to do anything about that exception, so Python just let the program crash and it printed the helpful error messages for us.
Most of the time, we want our code to be able to handle exceptions when they arise. We want something that can catch these exceptions when they’re thrown. For that, we need to use Python’s try statement.
Try, try again
try is how we safely wrap up a bit of code so that if something in there fails and an exception is thrown, we have a way to catch it. For example:
Please enter the denominator 0
Looks like someone tried to divide by zero.
Either we were able to do the division or else we successfully handled an exception.
Try running the code above a few times. In the input area, try some different numbers each time. Maybe 4, 0, and -2. Notice that division by non-zero numbers works as expected. Notice also that division by zero now lets us print out an error message instead of crashing. Once we’re done handling the exception, the program resumes with the first line after the try/except structure.
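The notebook cell reads the denominator with input(). So that it can run unattended, the sketch below passes the typed text to a hypothetical helper function, divide_ten_by(), instead. The numerator of 10 is also an assumption.

```python
def divide_ten_by(text):
    """Hypothetical helper: 'text' stands in for what the user typed."""
    try:
        denominator = int(text)
        result = 10 / denominator
        message = "10 divided by " + text + " is " + str(result)
    except ZeroDivisionError:
        # We only get here if the division above failed.
        message = "Looks like someone tried to divide by zero."
    return message

for typed in ["4", "0", "-2"]:
    print(divide_ten_by(typed))
print("Either we were able to do the division or else we successfully handled an exception.")
```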
In fact, there might be several except clauses if there are several kinds of exceptions that might be thrown. For example, let’s figure out how to share a pizza.
How many people: 0
Seriously? There are zero people sharing a pizza?
Whatever happened up there, this is the first line of code after
the try/except structure.
As you try different numbers of people, you can see that division by zero is, of course, handled. You can also enter things that aren’t integers. In response to the prompt, you could enter “Fred”. That can’t be converted to an integer, so the int() function throws an error. The except ValueError clause catches that exception and prints out a message.
Notice that after either exception handler executes its code, the flow of control goes down to the next line after the try/except structure. In this case, that line is one that prints out a message saying it’s the first line of code after the try and all of the excepts.
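A sketch of a try with two except clauses, in the spirit of the pizza example. The function name, the eight-slice pizza, and the wording of the ValueError message are all assumptions; the typed text is passed in as an argument rather than read with input().

```python
def slices_per_person(text, slices=8):
    """Hypothetical helper: 'text' stands in for what the user typed."""
    try:
        people = int(text)            # may throw ValueError
        share = slices / people       # may throw ZeroDivisionError
        message = "Each person gets " + str(share) + " slices."
    except ZeroDivisionError:
        message = "Seriously? There are zero people sharing a pizza?"
    except ValueError:
        message = "That doesn't look like a whole number of people."
    return message

for typed in ["4", "0", "Fred"]:
    print(slices_per_person(typed))
```

Only one handler runs for any given exception: Python checks the except clauses from top to bottom and uses the first one that matches.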
Sometimes it’s hard to predict what exception might be thrown in a section of code. In that case, we can use just except: without any exception type. This serves as a “catch-all” handler.
The catch-all handler has been awoken from its slumber.
I don't know what went wrong, except I can tell you it
wasn't a ValueError or a ZeroDivisionError, because
those would have been caught by more specific handlers
further up the list.
Indeed, if we’re lazy (or in a hurry) then we can get by with just a plain except clause and let the user figure it out later:
There was some sort of problem. I have no idea what.
Using just a plain catch-all exception handler doesn’t give you much to work with, but it is slightly better than nothing. Your code won’t crash outright but you won’t have much information about what went wrong. If only there was a way to examine that exception, to peer in and divine its secret nature…
Yep. Here you go…
Error: [Errno 2] No such file or directory: '/tmp/ThisFileIsUnlikelyToExist'
What we’ve done is catch just about any kind of exception (except Exception) and, with the as keyword, assign it to a variable named “err”. Then we can print out err. We could even convert err to a string and search for the interesting parts (like the filename of our missing file) and do some clever error handling based on what specifically went wrong.
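A minimal sketch of catching an exception and examining it. The path is the one from the error message above; the variables outcome and error_name are made up so the result can be inspected after the handler finishes (Python discards err itself once the except block ends).

```python
missing_path = "/tmp/ThisFileIsUnlikelyToExist"

try:
    my_file = open(missing_path, "r")
    my_file.close()
    outcome = "opened"
    error_name = ""
except Exception as err:
    outcome = "failed"
    error_name = type(err).__name__   # the name of the exception's type
    print("Error:", err)
    print("Kind of exception:", error_name)
```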
Python has a few more tricks when it comes to exception handling, and these can be handy for making your code more readable.
Fancy exception handling
A try/except structure can have an else clause. This clause will only be executed if no exception was thrown.
How many people: 0
Error: division by zero
If the user enters something that can be converted to an integer and is non-zero, then the program continues, finishing up the try block and executing the else block. On the other hand, if an exception of any type is thrown then the “number of pieces” message will never be printed.
There is also a finally clause. This one will run after everything else has happened, no matter what.
How many people: 0
Error: division by zero
In the try clause, a file opening was added. In the finally clause, the file will be closed whether an exception was thrown or not.
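To see which clauses run in which situation, here is a sketch that builds up a short report instead of printing as it goes. The function name and messages are assumptions, and the finally clause adds a line of text rather than closing a file, so this is a simplified stand-in for the cell described above.

```python
def share_pizza(text, slices=8):
    """Hypothetical helper: 'text' stands in for what the user typed."""
    report = ""
    try:
        people = int(text)
        pieces = slices / people
    except Exception as err:
        report = report + "Error: " + str(err) + "\n"
    else:
        # Runs only when the try block finished without an exception.
        report = report + "Number of pieces per person: " + str(pieces) + "\n"
    finally:
        # Runs no matter what happened above.
        report = report + "Done, one way or another."
    return report

print(share_pizza("0"))
print(share_pizza("4"))
```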
How useful are else and finally clauses? It’s true they’re not absolutely necessary. Plenty of programming languages get by without an else clause on their exception handling, for instance. You can always juggle your code around and get by with just try and except. On the other hand, these two clauses can make your code easier to read and understand. Your precise intention can be discerned.
We’ve seen how to write Python code that catches errors without crashing. This technique works in both regular Python programs and in Jupyter Notebooks. Next up, we’ll turn back to ways of storing information. This time we’ll look at lists.
PE100-08: Lists
All of the variables we’ve seen so far store exactly one value. If you set the variable “weight” to 74.5, then 74.5 is the only value there is in “weight”. Nice and simple. If we need to save several values then we can use several variables…
As you can imagine, this turns tedious in a hurry. What if you had a thousand values to deal with? And even if you did all of that typing, doing any kind of non-trivial computation with it would be difficult, too. We need a way to store a bunch of values, but doing it in a way that makes it easy to manipulate the whole thing as a whole or each individual value. For doing that, Python provides us with lists.
Python is one of the few languages that support lists deep down in the language itself. Because of that, they’re easy to work with. Let’s take a look, shall we?
['Alice', 'Bob', 'Candice', 'Dan']
Lists are represented with square brackets [ ] at the beginning and end, and with the values inside the brackets separated by commas.
The values in a list don’t all have to be the same type.
A list can have any number of values, limited only by the amount of memory in the computer that is hosting the Jupyter (or JupyterLab) server. Lists are even allowed to have no values in them.
So far we’ve been creating lists using literal values, but we could use variables just as easily…
[16, 2, 6]
To find out how many elements are in a list, use the len() function:
4
3
There are operators that act on lists. The * operator is used for repetition…
[1, 2, 3, 1, 2, 3]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
…and the + operator combines two lists:
[1, 2, 3, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Lists are iterables, just like the results of the range() function, so they can be iterated over using a for loop:
David
Bill
Richard
flour
lard
baking powder
milk
flour
lard
baking powder
milk
flour
lard
baking powder
milk
It’s fairly common to iterate over a list for things like sums and averages.
16
The above code steps its way over all of the values in the list. Each time it goes to a new value, it adds that value to total. When it gets to the end, all of the values have been added up. If we want an average, we don’t have to count the values ourselves. We can just use the len() function.
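The running total and the average can be sketched as follows. The values in the list are made up for illustration, chosen so that the total comes to 16 and the average to 4.0.

```python
values = [1, 3, 5, 7]        # hypothetical data

total = 0
for value in values:
    total = total + value    # add each element to the running total
print(total)

average = total / len(values)
print(average)
```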
4.0
Sometimes you need to use a particular value in a list and you don’t want to iterate over the whole thing. For this, Python gives us indexing, letting us directly access any element of a list. The first (as it appears on screen, “leftmost”) element is numbered zero and each one after that goes up by one. The highest numbered one is therefore the length of the list minus one.
['flour', 'lard', 'baking powder', 'milk']
Of all the biscuit ingredients, flour is the most important one.
The second most important one is baking powder
Like the majority of programming languages, Python uses square brackets to indicate the index into the list. Unlike the vast majority of languages, Python allows indexes to be negative! A negative number for an index means “count backwards from the end”. my_list[-1] refers to the item at the end of the list. my_list[-3] refers to the third to last item.
milk
lard
We’ve seen how to iterate over lists and also how to access individual list elements by using indexing. Python has a special indexing scheme, though, that lets us deal with small lists made from our original list. This is called List Slicing and can save you a lot of work sometimes. The overall syntax for this looks like list_name[start:end]
An example is definitely called for here:
[4, 6]
Remember that list indexes count from zero, and remember also that ranges in Python include the starting index (here, it’s the 1) and will continue to the last value that is smaller than the one on the right side of the colon.
Both the starting and the ending indexes are optional! If one of the two is missing, it will be interpreted as 0 or the list’s length, respectively.
[2, 4, 6, 8, 10, 12]
[2, 4, 6]
[4, 6, 8, 10, 12]
[2, 4, 6, 8, 10, 12]
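The slices above can be reproduced with a short sketch. The variable name evens is an assumption; the list itself is the one shown in the output.

```python
evens = [2, 4, 6, 8, 10, 12]

middle = evens[1:3]      # starts at index 1, stops before index 3
front = evens[:3]        # missing start means "from the beginning"
back = evens[1:]         # missing end means "through the last element"
everything = evens[:]    # both missing: a copy of the whole list

print(middle)
print(front)
print(back)
print(everything)
```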
And finally, the in operator is used to test list membership.
Enter your favorite number 8
Sorry! Better luck next time!
There’s Method to the Madness
There are two kinds of functions available for working with lists. Built-in functions are the ones that are part of Python itself. Methods, as you’ll recall from the unit on files, are special functions that are situated inside of objects and only usable with that kind of object. Python lists are objects. They’re iterable objects, in fact.
Let’s take a look at a few of the methods available for working with lists. First up is append().
[2, 7, 17, 9]
[2, 7, 17, 9, 106]
Just as the name implies, append() adds an element to the end of a list.
But what if we want to put a new element in a specific place? For that, there is insert().
[2, 7, 17, 9, 106]
[2, 7, 202, 17, 9, 106]
The insert function takes two arguments. The first is the position in the list where the insertion should happen. In the example above, it was at position 2. Remember, list indexes start at zero! The second argument is the element to insert. And when we look at the resulting list, we see that 202 is in position 2 now (which is the third position!) and all the other elements have been shifted to the right.
We’ve been fetching elements from the list by location number, so far. How do we find something by searching for it? The index() method does that.
2
We passed the argument 202 to the index method. It searched the list and returned the index of the first occurrence. That index is 2. Makes sense because we just inserted it there a minute ago!
If we can insert things into a list then surely we can remove them too, right? Indeed we can with the remove() method.
[2, 7, 202, 17, 9, 106]
[2, 202, 17, 9, 106]
Watch out! remove() looks up an item, like index() does, and then removes it. It doesn’t take a position number. In other words:
['David', 'Richard']
You might find yourself needing to sort the items in a list, and for that the sort() method exists:
[2, 202, 17, 9, 106]
[2, 9, 17, 106, 202]
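The list methods covered so far can be run back to back. This sketch retraces the steps shown in the outputs above; the name position is made up, and the names list is an assumed example of remove() taking a value rather than a position.

```python
my_list = [2, 7, 17, 9]

my_list.append(106)              # add to the end
my_list.insert(2, 202)           # put 202 at index 2, shifting the rest right
position = my_list.index(202)    # search for a value, get its index
my_list.remove(7)                # remove by value, not by position
print(my_list)

my_list.sort()                   # sorts the list in place
print(my_list)

names = ['David', 'Bill', 'Richard']
names.remove('Bill')             # looks up 'Bill' and removes it
print(names)
```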
Finally, there are ways to find the smallest and greatest values in a list.
2
202
Earlier we saw the use of len() to find out how many items are in a list. This is a built-in function and works on many types of variables, not just lists. There are two more built-in functions that are useful for working with lists: min() and max().
Bill
Shirley
Lists and Functions
Functions have no problem accepting lists as arguments and they can also return lists as the function’s value. There is a subtle “gotcha” when passing lists as an argument, though.
First, let’s look at a simple example:
15
That worked as expected - there’s no problem passing lists into functions. What about returning lists from functions?
['goldfish', 'catfish', 'goldfish', 'catfish', 'goldfish', 'catfish', 'goldfish', 'catfish', 'goldfish', 'catfish']
Earlier, when we talked about functions in section 5, we said that if a function changes the value of one of its arguments then the effects of that change stay inside the function and aren’t visible to anything when the function exits. That statement was mostly true. If you pass a list as an argument to a function and if that function changes the list then the change made there will be visible outside. Strings, floats, and integers are protected, but lists are more exposed.
[1, 2, 3, 9]
[2, 4, 6, 18]
Changing the value of an argument inside of a function usually isn’t a great idea, but in the case of lists it can be useful.
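The difference between the two cases can be sketched with a pair of small functions. Both function names are hypothetical; the list is the one shown in the output above.

```python
def double_in_place(numbers):
    # Changes the elements of the caller's list.
    for i in range(len(numbers)):
        numbers[i] = numbers[i] * 2

def try_to_double(number):
    # Only changes the function's own local variable.
    number = number * 2

my_list = [1, 2, 3, 9]
print(my_list)
double_in_place(my_list)
print(my_list)          # the change is visible out here

x = 5
try_to_double(x)
print(x)                # still 5
```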
No Funny Glasses Required
The lists we have worked with up to this point have all been one dimensional. Lists get a lot more interesting as the number of dimensions goes up.
Unlike most programming languages, Python does not have a multi-dimensional list or array construction, per se. What Python does have is a list that is versatile enough to contain anything - and that includes containing other lists! A two-dimensional list in Python is just a “list of lists”.
Take a look:
Above, on that very long line, we’ve created a list with square brackets. Inside that list, we’ve put three more lists inside square brackets of their own. So we’ve made a list of lists.
That long line is hard to read, isn’t it? Python won’t let us split just any long line of code across multiple lines… unless we explicitly tell it what we’re doing. Inside brackets like these, Python already knows the list isn’t finished, so a plain line break works; everywhere else (and here too, if we like) we end each line with a backslash and immediately press enter. It looks like this:
Jupyter even goes to the trouble to line up the columns for us.
Anyway, let’s see what we’ve created.
[['George', 'Washington'], ['John', 'Adams'], ['Thomas', 'Jefferson']]
['George', 'Washington']
['Thomas', 'Jefferson']
We can index into the outer array, the one that contains the smaller lists, just like we normally would. We can also index into the inner array two different ways. The long way…
'George'
… or we can take the shortcut:
'George'
The first zero got us to the “George”, “Washington” element, and the second zero indexed into that and gave us ‘George’. Let’s try some other combinations:
'John'
'Adams'
It’s easy to see how we’re indexing into this two-dimensional list. In fact, it works roughly the same way as a 2-D array in most programming languages.
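A sketch of the indexing just described. The variable names presidents and first_president are assumptions; the data is the list shown above.

```python
presidents = [['George', 'Washington'],
              ['John', 'Adams'],
              ['Thomas', 'Jefferson']]

# The long way: pull out the inner list, then index into it.
first_president = presidents[0]
print(first_president[0])

# The shortcut: two indexes in a row.
print(presidents[0][0])
print(presidents[1][0])
print(presidents[1][1])
```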
It’s so similar, in fact, that you’re probably feeling the urge to do some Linear Algebra right now.
Don’t. Not yet.
Python’s multidimensional list support is exactly that: support for lists. It can be pressed into service for arrays (in the linear algebraic sense of the term) but performance is pretty bad. In Programming Elements 101 we’ll see a software library called “numpy”. It is superior for arrays where you want to do some math.
Now let’s look at how to traverse a multi-dimensional array. We’ll create a 2-D list that looks like this:
        Column 0  Column 1  Column 2  Column 3
Row 0.  A         B         C         D
Row 1.  E         F         G         H
Row 2.  I         J         K         L
Row 3.  M         N         O         P
We can get a whole row:
['E', 'F', 'G', 'H']
or we can get a specific cell (the order is row, then column):
J
We can access the table by column, but it’s not as easy. We’ll have to write a loop that steps down a column and reads the values:
D
H
L
P
What if we want to access all of the cells in the array? For that, nested loops work.
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
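The row, cell, column, and every-cell traversals can be sketched together. The names table, column_3, and all_cells are made up for illustration; the results are collected into lists as well as printed so they are easy to inspect.

```python
table = [['A', 'B', 'C', 'D'],
         ['E', 'F', 'G', 'H'],
         ['I', 'J', 'K', 'L'],
         ['M', 'N', 'O', 'P']]

print(table[1])          # a whole row
print(table[2][1])       # one cell: row 2, column 1

# A column takes a loop: visit every row and pick out the same position.
column_3 = []
for row_number in range(len(table)):
    column_3.append(table[row_number][3])
    print(table[row_number][3])

# Every cell: the outer loop walks the rows, the inner loop walks each row.
all_cells = []
for row in table:
    for cell in row:
        all_cells.append(cell)
        print(cell)
```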
“But wait!”, I hear you say. “I need to store higher-dimensionality data!” No problem. Python will allow arbitrarily deep nesting. We can have lists of lists of lists (3 dimensions) or lists of lists of lists of lists for four dimensions. Accessing the cells is just a matter of adding more array indexes to the end of the name.
Adams
Aaron
It’s easy to get confused with deeply nested lists. Three dimensions isn’t bad, four is manageable, but as the structures get deeper and deeper I have to resort to drawing pictures and frequent testing every step of the way.
tl;dr: If you’re a string theorist working in 21 dimensions or whatever, Python lists probably aren’t the way to go. You should use numpy.
Tuples
A “double”, mathematically speaking, is two of something. A “triple” is three of them. If you don’t know how many, or you don’t want to specify, then it’s generically called a “tuple” (pronounced “Too pull”, according to The American Heritage Dictionary and, more importantly, everyone who has ever taught the database class).
Python graciously provides us with tuples. Their syntax is just like a list, only using parentheses instead of square brackets. For instance:
8
Tuples have some restrictions when compared to lists.
* You can’t sort them.
* You can’t insert or delete from them.
* You can’t change the values in them.
Why would we want tuples if they’re so similar to lists, only somewhat disabled? In a word, “speed”. They’re very fast compared to lists. That’s why some Python functions require them. The most likely time you’ll see tuples is when you’re accessing data from a database. The second most common use is when you need to return multiple values from a function.
Since tuples have the speed advantage but lists are more versatile, it’s not unusual to see programmers use the list() and tuple() functions to convert between the two types:
[2, 8, 256]
(2, 4, 6, 8)
Returning multiple values from a function feels like cheating the first time you do it. After all, sin(x) returns exactly one number, right?
What if you wrote a function that returns a complex number, like 1.105+7.3i ? That’s one number (albeit one on the complex plane) but it’s written like two pieces of data being returned.
What if you got really fancy and wrote a function that returned a column vector? That would be like returning a lot of numbers all at once, wouldn’t it?
So returning multiple values at once isn’t that bad, is it? Especially if the values all have related meaning and “belong” together.
2
7
A couple of things to note. First, notice how the function creates a tuple and returns it. The parentheses indicate a tuple is being constructed and the min_val and max_val variables are put into the tuple as the first and second elements.
Second, look at how that tuple is returned to the caller, taken apart, and stored in a pair of variables. You’ll see the syntax first_variable, second_variable, third_variable = func() when a tuple is returned from a function. The first element of the tuple is placed in first_variable and so on.
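Returning a tuple and unpacking it can be sketched like this. The names min_val and max_val come from the discussion above; the function name, the sample data, and the names smallest and largest are assumptions.

```python
def min_and_max(values):
    min_val = min(values)
    max_val = max(values)
    return (min_val, max_val)    # one tuple holding both results

# The tuple is taken apart and stored in two variables, in order.
smallest, largest = min_and_max([4, 2, 7, 5])
print(smallest)
print(largest)
```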
Coming Up Next
We’ve made it to the end of this section. Take a moment, breathe, and relax… this is the longest module in the “Python and Jupyter” series. Next up we have lots of information on strings. We’ve been using strings a lot already without really looking at what they are and what they can do. It’s time to remedy that.
(pssst. Want a hint? Strings are just tuples of letters!)
PE100-09: Strings
We’ve been using strings in each of the previous modules, but we’ve accepted them as a found artifact without really getting into them and seeing how they work. This module will correct that deficiency and make all of us better programmers.
Review time
Let’s take a quick look at what we’ve done so far.
String literals: "Doug McKenzie"
String variables: comedic_genius = "Mel Brooks"
Comparison: if my_name == your_name:
if your_name != "Chuck Woolery":
Concatenation: full_name = first_name+last_name
Repetition: "ABC" * 20
There is a lot more we can do with strings. We can:
* Index into them
* Iterate over them
* Slice them
* Search them
* Call methods that act on them…
In fact, when you look at what you can do with a string and how you do it, you suddenly realize that a string is just (conceptually) a tuple of letters.
We can index into a string:
J
i
i
Strings are iterables:
J
o
h
n
B
e
l
u
s
h
i
Just like a tuple, the elements of a string are immutable. Once a string is created, the characters can’t be changed.
TypeError: 'str' object does not support item assignment
We can slice strings:
ch
ael Jordan
Mi
We can search into a string with the in operator:
found one!
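The string operations above can be sketched in one place. The two names are guesses based on the outputs shown (the exact strings in the original cells are not visible here), and the flag changed exists only to record whether the assignment succeeded.

```python
name = "John Belushi"
print(name[0])                  # first character
print(name[-1])                 # last character
print(name[len(name) - 1])      # last character, the long way

athlete = "Michael Jordan"
print(athlete[2:4])
print(athlete[4:])
print(athlete[:2])

if "Jordan" in athlete:
    print("found one!")

# Strings are immutable, so item assignment throws a TypeError.
try:
    name[0] = "N"
    changed = True
except TypeError as err:
    changed = False
    print("TypeError:", err)
```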
There are a huge variety of string methods. We’ll look at just a few here. Some of them are handy for validating inputs…
How many bolts did you install? op
I was expecting something that looked like an integer
PE100-10: Dictionaries
Dictionaries are, conceptually, a special type of list. A list has an order to it. Elements are placed into a list in a particular order. Removing and inserting items changes the order in a well understood way. Because of this, you can always access a list by looking into it at a specific integer index. None of this is possible with a dictionary. Instead, dictionaries have a better trick: all access is by searching.
Python’s dictionaries store each of their elements as a pair of things, a key and a value. The key is the thing that can be searched for, and the value is the information that will be returned when that element is read.
An example will help:
Ice Cream
Let’s take that apart and see how it works.
We have the creation of the dictionary itself. We create a dictionary literal much like we do a list, only here we use curly brackets instead of square brackets.
We have key:value pairs. The keys we’re using are all strings (people’s names, in fact) but they could just as well be numbers. The values are also strings, but they can be anything we want. These keys and values are separated by colons (:). Between each of the key:value pairs there is a comma.
We see how to look up information in a dictionary. It looks like indexing into a list to extract a particular element, except in this case we don’t give a positional index number but rather we give it a key to look up.
When you think about it, looking for Bob’s favorite food by using favorite_food["Bob"] is a pretty powerful tool. Rather than having to specify where to get something from, we can just specify what to get. This is why you’ll occasionally see dictionaries referred to as “Content Addressable Memory”. It’s nice to just get the data we want without having to step through every element of a list looking for it. It’s also faster: Python’s dictionaries use clever indexing so they can go straight to what you’re looking for.
Dictionaries get their name, by the way, from real-world physical dictionaries. Suppose you have Webster’s 9th New Collegiate Dictionary in front of you. There are a lot of keys in there - each one of those words in alphabetical order is a key. There’s not really any way to look up something by page number alone - there’s no algorithm to tell me what page the definition for brisance is on. There’s no way to look up a word by knowing which word number it is. If brisance is the 8000th word in the dictionary, that knowledge does me no good. On the other hand, there is an algorithm for finding that definition by looking up the key word. I go to the “B”s, look for the “Br” part, and so forth until I find brisance. I can only look up things by key, not by position.
One of the great things about Python is its flexibility with data types. Lists can contain any data type. You can have lists of tuples of lists of strings if you want to. Dictionaries are similarly versatile. You can have dictionaries that contain, say, lists:
['Jane', 'John', 'Alice', 'Bob']
You can even have dictionaries of dictionaries. They’re quite useful, in fact.
99.5
Dictionaries give us the beginnings of a database. It’s not as powerful as a “real” database, but it’s good enough for a lot of things. Of course, a dictionary is like any other variable: it only lasts as long as your program is running. You would have to combine a dictionary with some file access to have any permanent storage.
What happens if we try to look up a key:value pair, and the key isn’t in the dictionary? Let’s see!
KeyError: "Ruth's Chris"
Yeah, we expected that by now, didn’t we? It threw a “KeyError” exception. What should we do about this? We could always wrap the access up in a try/except structure to catch the KeyError, but this is such a common problem that Python gives us a friendlier way to do it: the in operator.
Remember using in to see if something was in a string? This is philosophically similar. Let’s try it:
that's a relief, actually.
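A sketch of checking for a key before looking it up. The dictionary is the favorite_food example from earlier in this module; the missing name “Zelda” and the variable answer are made up for illustration.

```python
favorite_food = {"Alice": "Apple Pie", "Bob": "Ice Cream", "Charlie": "Pizza"}

print(favorite_food["Bob"])      # a key that exists

name = "Zelda"                   # hypothetical key that is not in the dictionary
if name in favorite_food:        # in tests the keys, not the values
    answer = favorite_food[name]
else:
    answer = "no entry for " + name
print(answer)
```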
Adding to a dictionary is even easier than adding to a list. All you do is just act like the key was already there and assign it a value:
{'Alice': 'Apple Pie', 'Bob': 'Ice Cream', 'Charlie': 'Pizza', 'Dan': 'Fish'}
Now we have:
{'Alice': 'Apple Pie', 'Bob': 'Ice Cream', 'Charlie': 'Pizza', 'Dan': 'Fish'}
What happens if we try to overwrite some data?
{'Alice': 'Apple Pie', 'Bob': 'Ice Cream', 'Charlie': 'Soup', 'Dan': 'Fish'}
We can change what is stored in the “value” part of the key:value pair any time we want. We can’t change the key, though. At least, we can’t change it directly. We can always delete the existing key:value pair and replace it with a new one. Let’s say Charlie really wants to be known as Chuck. He has his reasons. So let’s fix the favorite_food dictionary:
What we did to accomplish that was
1. Look up Charlie’s favorite food and save that value.
2. Use the built-in del statement to remove Charlie as a key, along with whatever value was associated with him.
3. Insert a new key:value pair whose key is “Chuck” and whose value is whatever we looked up before.
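Adding, overwriting, and renaming can be sketched in a few lines. The dictionary contents follow the outputs shown above; the variable name saved_food is an assumption.

```python
favorite_food = {"Alice": "Apple Pie", "Bob": "Ice Cream", "Charlie": "Pizza"}

favorite_food["Dan"] = "Fish"        # new key: the pair is added
favorite_food["Charlie"] = "Soup"    # existing key: the value is replaced

# "Renaming" a key: save the value, delete the old pair, insert a new one.
saved_food = favorite_food["Charlie"]
del favorite_food["Charlie"]
favorite_food["Chuck"] = saved_food

print(favorite_food)
print(len(favorite_food))
```

Because Chuck is inserted after Charlie is deleted, he ends up last when the dictionary is traversed.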
Since every other Python data type that holds more than one thing can work with the built-in len() function, it stands to reason that dictionaries can, too. And as you would imagine, len() returns the number of entries in the dictionary.
4
Looping and Iteration
A dictionary is another type of iterable. This means we can write loops that traverse the entire dictionary, start to finish, and do something useful.
Alice
Bob
Dan
Chuck
Notice that a traversal of a dictionary retrieves the keys. If you want to retrieve the values, just use the keys to look up the values.
for key in favorite_food:
    print(favorite_food[key])