您在這裡

Programming with multiple tables

24 二月, 2015 - 12:40

We will now re-do the Twitter spider program using two tables, the primary keys, and the key references as described above. Here is the code for the new version of the program:

import urllibimport twurlimport jsonimport sqlite3

TWITTER_URL = 'https://api.twitter.com/1.1/friends/list.json'

conn = sqlite3.connect('friends.sqlitesqlite3')cur = conn.cursor()

cur.execute('''CREATE TABLE IF NOT EXISTS People    (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)''')cur.execute('''CREATE TABLE IF NOT EXISTS Follows    (from_id INTEGER, to_id INTEGER, UNIQUE(from_id, to_id))''')

while True:    acct = raw_input('Enter a Twitter account, or quit: ')    if ( acct == 'quit' ) : break    if ( len(acct) < 1 ) :        cur.execute('''SELECT id, name FROM People            WHERE retrieved = 0 LIMIT 1''')        try:            (id, acct) = cur.fetchone()        except:            print 'No unretrieved Twitter accounts found'            continue    else:        cur.execute('SELECT id FROM People WHERE name = ? LIMIT 1',            (acct, ) )        try:            id = cur.fetchone()[0]        except:            cur.execute('''INSERT OR IGNORE INTO People (name, retrieved)                VALUES ( ?, 0)''', ( acct, ) )            conn.commit()            if cur.rowcount != 1 :                print 'Error inserting account:',acct                continue            id = cur.lastrowid

url = twurl.augment(TWITTER_URL,    {'screen_name': acct, 'count': '20'} )print 'Retrieving account', acctconnection = urllib.urlopen(url)data = connection.read()headers = connection.info().dictprint 'Remaining', headers['x-rate-limit-remaining']

js = json.loads(data)    # print json.dumps(js, indent=4)    
    cur.execute('UPDATE People SET retrieved=1 WHERE name = ?', (acct, ) )
    countnew = 0    countold = 0    for u in js['users'] :        friend = u['screen_name']        print friend        cur.execute('SELECT id FROM People WHERE name = ? LIMIT 1',            (friend, ) )        try:            friend_id = cur.fetchone()[0]            countold = countold + 1        except:            cur.execute('''INSERT OR IGNORE INTO People (name, retrieved)                VALUES ( ?, 0)''', ( friend, ) )            conn.commit()            if cur.rowcount != 1 :                print 'Error inserting account:',friend                continue            friend_id = cur.lastrowid            countnew = countnew + 1        cur.execute('''INSERT OR IGNORE INTO Follows (from_id, to_id)            VALUES (?, ?)''', (id, friend_id) )    print 'New accounts=',countnew,' revisited=',countold    conn.commit()
cur.close()

This program is starting to get a bit complicated, but it illustrates the patterns that we need to use when we are using integer keys to link tables. The basic patterns are:

  1. Creating tables with primary keys and constraints.
  2. When we have a logical key for a person (i.e. account name) and we need the id value for the person. Depending on whether or not the person is already in the People table, we either need to: (1) look up the person in the People table and retrieve the id value for the person or (2) add the person the the People table and get the id value for the newly added row.
  3. Insert the row that captures the “follows” relationship.

We will cover each of these in turn.