
Retrieving an image over HTTP

24 February, 2015 - 10:47

In the above example, we retrieved a plain text file that contained newlines, and we simply copied the data to the screen as the program ran. We can use a similar program to retrieve an image over HTTP. Instead of copying the data to the screen as the program runs, we accumulate the data in a string, trim off the headers, and then save the image data to a file as follows:

import socket
import time

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com', 80))
mysock.send('GET http://www.py4inf.com/cover.jpg HTTP/1.0\n\n')

count = 0
picture = ""
while True:
    data = mysock.recv(5120)
    if len(data) < 1:
        break
    # time.sleep(0.25)
    count = count + len(data)
    print len(data), count
    picture = picture + data

mysock.close()

# Look for the end of the header (2 CRLF)
pos = picture.find("\r\n\r\n")
print 'Header length', pos
print picture[:pos]

# Skip past the header and save the picture data
picture = picture[pos+4:]
# Open in binary mode ("wb") so the image bytes are written unchanged
fhand = open("stuff.jpg", "wb")
fhand.write(picture)
fhand.close()

When the program runs it produces the following output:

$ python urljpeg.py
2920 2920
1460 4380
1460 5840
1460 7300
...
1460 62780
1460 64240
2920 67160
1460 68620
1681 70301
Header length 240
HTTP/1.1 200 OK
Date: Sat, 02 Nov 2013 02:15:07 GMT
Server: Apache
Last-Modified: Sat, 02 Nov 2013 02:01:26 GMT
ETag: "19c141-111a9-4ea280f8354b8"
Accept-Ranges: bytes
Content-Length: 70057
Connection: close
Content-Type: image/jpeg

You can see that for this URL, the Content-Type header indicates that the body of the document is an image (image/jpeg). Once the program completes, you can view the image data by opening the file stuff.jpg in an image viewer.
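The header handling described above can be sketched in a few lines. This is only an illustration written for Python 3, where socket data arrives as bytes rather than as a string; the helper name split_response and the small made-up response are assumptions for the example and are not part of the original program.

```python
def split_response(raw):
    """Split a raw HTTP response (bytes) into (status_line, headers, body)."""
    # The header ends at the first blank line, that is, two CRLF pairs in a row.
    pos = raw.find(b"\r\n\r\n")
    if pos < 0:
        raise ValueError("no blank line found; the header is incomplete")
    header_text = raw[:pos].decode("iso-8859-1")
    body = raw[pos + 4:]          # skip the 4 bytes of "\r\n\r\n"
    lines = header_text.split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# A small made-up response standing in for the real server reply (hypothetical data).
fake_body = b"\xff\xd8\xff\xe0" + b"\x00" * 12   # 16 bytes
raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Length: 16\r\n"
       b"Connection: close\r\n"
       b"Content-Type: image/jpeg\r\n"
       b"\r\n") + fake_body

status, headers, body = split_response(raw)
print(status)
print(headers["content-type"])
# The body should be exactly as long as Content-Length says.
print(len(body) == int(headers["content-length"]))
```

The same length check works on the real output shown earlier: the header is 240 bytes, the blank-line separator is 4 bytes, and Content-Length is 70057, which adds up to the 70301 bytes received in total.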

As the program runs, you can see that we don’t get 5120 characters each time we call the recv() method. We get as many characters as have been transferred across the network to us by the web server at the moment we call recv(). In this example, we get either 1460 or 2920 characters each time we request up to 5120 characters of data.

Your results may be different depending on your network speed. Also note that on the last call to recv() we get 1681 bytes, which is the end of the stream, and in the next call to recv() we get a zero-length string that tells us that the server has called close() on its end of the socket and there is no more data forthcoming.

We can slow down our successive recv() calls by uncommenting the call to time.sleep(). This way, we wait a quarter of a second after each call so that the server can “get ahead” of us and send more data to us before we call recv() again. With the delay in place, the program executes as follows:

$ python urljpeg.py
1460 1460
5120 6580
5120 11700
...
5120 62900
5120 68020
2281 70301
Header length 240
HTTP/1.1 200 OK
Date: Sat, 02 Nov 2013 02:22:04 GMT
Server: Apache
Last-Modified: Sat, 02 Nov 2013 02:01:26 GMT
ETag: "19c141-111a9-4ea280f8354b8"
Accept-Ranges: bytes
Content-Length: 70057
Connection: close
Content-Type: image/jpeg

Now, other than the first and last calls to recv(), we get 5120 characters each time we ask for new data.
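The effect of the delay can be tried without a web server. The following is a rough sketch for Python 3 that uses socket.socketpair() to create two connected sockets inside one program; the sender thread, the 1460-byte piece size, and the timing values are all assumptions chosen to mimic the output above, and the exact chunk sizes you see will vary from run to run.

```python
import socket
import threading
import time

CHUNK = 1460    # size of each piece the sender writes (assumed for illustration)
PIECES = 20     # number of pieces to send
GAP = 0.01      # pause between pieces, standing in for network pacing


def sender(sock):
    """Play the role of the server: write small pieces, then close."""
    for _ in range(PIECES):
        sock.sendall(b"x" * CHUNK)
        time.sleep(GAP)
    sock.close()


def receive_all(sock, delay):
    """Read until the other side closes; return the size of each recv()."""
    sizes = []
    while True:
        data = sock.recv(5120)
        if len(data) < 1:       # zero-length result: the sender closed its end
            break
        sizes.append(len(data))
        time.sleep(delay)       # a delay lets more data pile up in the buffer
    sock.close()
    return sizes


def run(delay):
    a, b = socket.socketpair()
    worker = threading.Thread(target=sender, args=(a,))
    worker.start()
    sizes = receive_all(b, delay)
    worker.join()
    return sizes


fast = run(0.0)
slow = run(0.05)
print(len(fast), "calls without delay:", fast)
print(len(slow), "calls with delay:", slow)
```

Both runs receive the same total number of bytes. With the delay, each recv() typically returns more data, so fewer calls are needed, although no call ever returns more than the 5120 bytes requested.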

There is a buffer between the server making send() requests and our application making recv() requests. When we run the program with the delay in place, at some point the server might fill up the buffer in the socket and be forced to pause until our program starts to empty the buffer. The pausing of either the sending application or the receiving application is called “flow control”.
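The buffer filling up can be observed directly. The sketch below is for Python 3 and assumes a Unix-like system, where socket.socketpair() gives two connected local sockets; the 4096-byte block size is an arbitrary choice. Both sockets are put in non-blocking mode, so instead of pausing, send() raises BlockingIOError at the point where a normal blocking send() would have to wait.

```python
import socket

sender, receiver = socket.socketpair()
sender.setblocking(False)
receiver.setblocking(False)

block = b"x" * 4096   # arbitrary block size for the illustration

# Keep sending while nobody is reading. Eventually the buffer fills up.
queued = 0
try:
    while True:
        queued += sender.send(block)
except BlockingIOError:
    print("buffer full after", queued, "bytes; a blocking send() would pause here")

# Now empty the buffer from the receiving side.
drained = 0
try:
    while True:
        data = receiver.recv(65536)
        if not data:
            break
        drained += len(data)
except BlockingIOError:
    pass                # nothing left to read for the moment
print("receiver drained", drained, "bytes")

# With the buffer emptied, the sender is able to make progress again.
more = sender.send(block)
print("sender could queue", more, "more bytes")

sender.close()
receiver.close()
```

The number of bytes that fit before the buffer fills depends on the operating system and its socket buffer settings, so the first figure printed will differ between machines.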