Torrent Metafile

From DIDEAS Wiki

Overview and Motivation

BitTorrent peers (clients) need to locate one another before they can exchange data. At present this is accomplished by distributing a metafile (the .torrent file) that contains a list of tracker URLs. Peers connect to the tracker(s) named in the metafile, obtain the peer list, and can then connect to one another to exchange a file.

A second purpose of the metafile is to provide cryptographic hashes of the original file. The client uses these hashes to verify the integrity of data received from third parties.
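As a rough illustration of that check: in the standard metafile layout, the 'info' dictionary holds a 'piece length' integer and a 'pieces' string made of 20-byte SHA-1 digests, one per piece. The sketch below (Python 3) assumes the downloaded payload is already in memory as bytes. The function name verify_pieces is hypothetical and is not part of the BitTorrent source.

```python
import hashlib

def verify_pieces(data, piece_length, pieces):
    """Return the indices of pieces whose SHA-1 digest does not match.

    data         -- the downloaded payload as bytes
    piece_length -- value of info['piece length']
    pieces       -- value of info['pieces'] (20 bytes per piece)
    """
    bad = []
    for index in range(len(pieces) // 20):
        # slice out one piece of the payload and its expected digest
        chunk = data[index * piece_length:(index + 1) * piece_length]
        expected = pieces[index * 20:(index + 1) * 20]
        if hashlib.sha1(chunk).digest() != expected:
            bad.append(index)
    return bad
```

An empty list means every piece matched. A real client checks pieces as they arrive rather than after the whole download.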

At present, unless you run your own tracker or have a friend who does, finding reliable trackers is difficult. Trackers that are secure from being shut down are often overloaded, and connections to them are intermittent.

One solution is to distribute a metafile that lists many trackers, in the hope that at least some of them will remain operational. However, doing this seems to cause the formation of isolated P2P pools, where the group that first connected to Tracker A has no knowledge of those that first connected to Tracker B. If a pool has no seeders, the torrent can appear to be dead, with no hope until one of the trackers goes down!

In the past year several client applications have incorporated a technique called DHT (distributed hash table). DHT allows peers to operate "trackerless" by forming a distributed tracker, so that existing peers can exchange data without a central tracker. However, peers must first discover each other before DHT can be useful.

Many people are involved in finding good long-term solutions to these problems.

A more immediate solution is to regenerate the metafile with a new tracker whenever an old one is lost. Doing this by hand can be time consuming, so my solution is to develop an automated process for modifying torrent metafiles.

Metafile Overview

BitTorrent has its roots in Python, and the metafile (.torrent) is simply an encoded dump of the Python data object (from class metafile) that represents the file. The encoding method is called "bencoding" and is described at Bittorrent.com:

"Metainfo file and tracker responses are both sent in a simple, efficient, and extensible format called bencoding (pronounced 'bee encoding'). Bencoded messages are nested dictionaries and lists (as in Python), which can contain strings and integers. Extensibility is supported by ignoring unexpected dictionary keys, so additional optional ones can be added later."

The BitTorrent source contains a module named "bencode" that provides functions to encode a Python dictionary into a metafile and to decode a metafile back into a Python dictionary.
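To make the format concrete, here is a minimal sketch of a bencoding encoder and decoder in Python 3. It is an independent illustration and not the code of the BitTorrent module. The function names simply mirror that module's, and the helper _decode is hypothetical. One assumption to be aware of: under Python 3 every decoded string comes back as bytes, so dictionary keys look like b'announce' rather than 'announce'.

```python
def bdecode(data):
    """Decode bencoded bytes into Python ints, bytes, lists and dicts."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value

def _decode(data, i):
    """Decode one value starting at offset i; return (value, next offset)."""
    lead = data[i:i + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead == b"l":                      # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            val, i = _decode(data, i)
            result[key] = val
        return result, i + 1
    colon = data.index(b":", i)           # string: <length>:<bytes>
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length

def bencode(value):
    """Encode ints, strings, lists and dicts into bencoded bytes."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")     # text is stored as UTF-8 bytes
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # keys are byte strings and must appear in sorted order
        pairs = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError("cannot bencode %r" % type(value))
```

For example, bencode({"a": -3, "b": [1, "x"]}) gives b"d1:ai-3e1:bli1e1:xee", and bdecode turns that back into the same structure with bytes in place of text strings.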

The following Python fragment reads a metafile, uses bencode to decode it, and then lists each key along with the type of its value:

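# Python 2; needs the bencode module from the BitTorrent source tree
from BitTorrent.bencode import bencode, bdecode

# read the raw metafile and decode it into a dictionary
metainfo_file = open("torrent_metafile.torrent", 'rb')
metainfo = bdecode(metainfo_file.read())
metainfo_file.close()

# print the type of each top-level value next to its key
for key in metainfo:
    val = metainfo[key]
    print type(val), key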

The output is:

<type 'str'>    comment
<type 'str'>    comment.utf-8
<type 'dict'>   azureus_properties
<type 'str'>    encoding
<type 'int'>    creation date
<type 'list'>   announce-list
<type 'dict'>   info
<type 'str'>    created by
<type 'str'>    announce
<type 'list'>   nodes

The meaning of most keys is moderately clear, but some explanation follows.

'announce' is a string that contains the tracker announce URL. Example : 'http://a.xyz.to/announce'

'announce-list' is used when multiple trackers are used. It is a list of lists of tracker URL strings. Example :

[['http://t1.prq.to', 'http://t2.prq.to']]
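Each inner list is usually treated as one tier of trackers, so a metafile can carry several groups of URLs. When checking which trackers a metafile refers to, it can help to flatten the structure. The sketch below is only an illustration: the function name all_trackers is hypothetical, and it assumes the metafile has been decoded into a dictionary with string keys, as in the fragment above.

```python
def all_trackers(metainfo):
    """Return every tracker URL in the metafile, without duplicates.

    The single 'announce' URL, if present, is placed first, followed
    by the URLs of each 'announce-list' tier in their original order.
    """
    urls = []
    for tier in metainfo.get('announce-list', []):
        for url in tier:
            if url not in urls:
                urls.append(url)
    primary = metainfo.get('announce')
    if primary is not None and primary not in urls:
        urls.insert(0, primary)
    return urls
```

Running it before and after an edit is a quick way to confirm that the tracker list changed as intended.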

'info' contains the bulk of the metafile's content and is a dictionary containing the file list, block hashes, and more.

'azureus_properties' is inserted by Azureus to support DHT. The value is {'dht_backup_enable':1}

'nodes' is inserted by BitComet, apparently to support their version of DHT. It seems to be a list of IP addresses and ports, an example being:

[['220.135.205.184', 17178], ['222.164.112.206', 13830], ['80.237.153.37', 49512]]

Editing the metafile

Once the metafile is understood and bdecoded, it is trivial to make changes to the data structure.

The announce URL can be updated with:

 metainfo['announce'] = 'http://a.xyz.to/announce'

Deleting a dictionary key, such as the announce-list, is done with:

if 'announce-list' in metainfo:
    del metainfo['announce-list']

A new announce-list can be added back:

 a_lst = {'announce-list' : [['http://a.xyz.to', 'http://b.xyz.to']]}
 metainfo.update(a_lst)


Finally, when modifications to the metafile are complete, the metafile is encoded and written out.

 enc_metainfo = bencode(metainfo)
 metainfo_file = open('new_torrent_metafile.torrent', 'wb')
 metainfo_file.write(enc_metainfo)
 metainfo_file.close()


It really is that easy!

New Torrent file Strategy

Based on my understanding of the current reality of BitTorrent networks, I'll implement the following:

announce = known good / stable overloaded tracker
announce-list = [['overloaded tracker'], ['Azureus built-in tracker of initial seed']]
enable Azureus DHT
enable BitComet DHT with a node IP address set to the IP of the initial seed
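A minimal sketch of how this strategy might be applied to an already decoded metafile follows. The function name apply_strategy and its parameters are hypothetical, the metafile is assumed to be a dictionary with string keys, and the key names and values are the ones observed in the decoded metafile earlier on this page. The 'info' dictionary is left untouched.

```python
def apply_strategy(metainfo, stable_tracker, seed_tracker, seed_ip, seed_port):
    """Set the tracker and DHT keys on a decoded metafile dictionary.

    stable_tracker -- announce URL of the known good (overloaded) tracker
    seed_tracker   -- announce URL of the initial seed's built-in tracker
    seed_ip, seed_port -- address of the initial seed for the 'nodes' key
    """
    metainfo['announce'] = stable_tracker
    # one tier per tracker: the stable tracker first, then the seed's own
    metainfo['announce-list'] = [[stable_tracker], [seed_tracker]]
    # value observed in Azureus-generated metafiles
    metainfo['azureus_properties'] = {'dht_backup_enable': 1}
    # BitComet-style node list: [ip, port] pairs
    metainfo['nodes'] = [[seed_ip, seed_port]]
    return metainfo
```

A call might look like apply_strategy(metainfo, 'http://a.xyz.to/announce', 'http://192.0.2.10:6969/announce', '192.0.2.10', 6881), where the seed URL, address, and ports are placeholder values.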

Programs

The program btshowmetainfo.py is a nice example of decoding the torrent metafile. It makes use of modules from the BitTorrent source tree (which had some win32api issues for me). You can get BtShowMetaInfo.zip here.

Unzip and run with: python btshowmetainfo.py [your_torrent_file.torrent]