COMP 2017 System Programming Assignment 3
1 Part 1 - ByteTide Package Loader & Merkle Tree
To get started, you will need to be able to parse a .bpkg file and load it. To assist you with writing
your code and complying with the test program, you are advised to complete the pkgchk.c file in
the src directory.
1.1 The Package and its File Format (.bpkg)
In this program, a file is composed of several anomalous data chunks. The chunks are organised in a specific way such that when they are combined, the entire contents of the file can be constructed and presented to the user. A package defines the necessary information and resources required to construct the contents of a file. Packages represent a unique file given by an identifier string ident.
The package file format is a text format that will need to be parsed by your program. The package file
format has the following fields. Please refer to the hash and chunk parts of the glossary. To also
clarify, the package file, can be modelled as a binary tree, the term h, refers to the height of the tree
in this instance.
-
ident, hexadecimal string (1024 characters max), the identifier is used within the network to identify the same packages.
-
filename, string (256 characters max), This is used to help save and locate the file to update when data is sent to it.
-
size, uint32_t, specifies the size in bytes
-
nhashes, uint32_t, specifies the number of hashes that are pre-computed from the original file. There must be only 2ˆ(h-1)-1 hashes which will correspond to the hashes of all non-leaf node
-
hashes, string[2ˆ(h-1) - 1] (64 characters for each string), these correspond to the number hashes in the previous nhashes field.
-
nchunks, uint32_t, specifies the number of chunks. The number of chunks must be a 2ˆ(h-1) value.
-
chunks, struct[2ˆ(h-1)], each chunk have the fields: hash, offset and size.
– hash refers to a string (64 characters), corresponding to the datablock hash value – offset, uint32_t, is the offset within the file
– size, uint32_t, is the size of the chunk in bytesThe format below gives an outline to the structure of a .bpkg file. Refer to the resources folder in the scaffold for a real example.
1.2 Package Loading
The focus of this task is to load the .bpkg file and also store the details into a merkle tree. Please refer to Section 1.3 for information on a merkle tree.
Read and load .bpkg files that comply with the format outlined in Section 1.1
Once the .bpkg has been loaded successfully, it is advisable that your program also knows if the file exists or not and has functionality to construct a file of the size outlined in the file. Refer to pkgchk.c:bpkg_file_check function.
Implement a merkle tree. Use the data from a .bpkg to construct a merkle-tree Refer to pkgchk.c:bpkg_get_all_hashes and pkgchk.c:bpkg_get_all_chunk_hashes_from_hash functions, as you should be able to satisfy these operations after implementing a merkle tree without any IO on the data file.
Computing the merkle tree hashes, ensuring that combined hashes match the parents hashes when computed and finding minimum completed hashes. Refer to pkgchk.c:bpkg_get_completed_chunks and pkgchk.c:bpkg_get_min_completed_hashes functions. You will need to perform vali- dation on the chunks and discover portions of the file.
The above verifies chunks against package files and the data’s integrity.
Qualities of a merkle tree A merkle tree must is typically a perfect or full and complete binary tree but it can also be represented as a just a complete binary tree (Refer to Errata, Variations and Notes).
• Given a depth of d, the total number of nodes in your tree will be 2ˆd - 1
• All levels are full (necessary for a perfect binary tree).
• A merkle tree will have 2ˆ(d-1) nodes at depth d, these will refer to your chunks. • A merkle tree will have 2ˆ(d-1) -1 non-leaf-nodes.
• All leaves have the same depth (no skewing)All nodes in a merkle tree have a hash value. Hashes of a leaf node corresponds to a hash value of a data chunk. This value is derived from computing hash value of the data chunk itself.
All other non-leaf nodes derive their hash value by hashing their children’s hash values together.
1.4 Errata, Variations and Notes
Implementations: It isn’t necessary for a merkle tree to be a full or complete binary tree. You could potentially have a merkle tree with more than 2 children Or not all leaf nodes are on the same level
However, we have made this assumption to help simplify the data structure.
Same-chunks, different positions: As through experimenting, you may have found that if you have chunks that contain the same data, in this case. Your implementation will need to either assume this will not happen or contain necessary data to differentiate it.
– Please refer to REQ packet, specifically offset part to help resolve searches.
– You can have a bit-field key alongside this similar to the diagram in the previous sections.More data than needed: For the most part, the file has been provided with more data than required to help with implementing this data structure but also ensure that other parts aren’t restricted if it is in an incomplete state.
Using hexadecimal hash or byte-hash: The staff implementation uses the hexadecimal hash and while computing with the byte-hash is not-incorrect, it will yield different results to the test cases. Please make sure you comply with this.
2 Part 2 - Configuration, Networking and Program
You are now tasked with writing a program that will facilitate P2P file-transfer. Your program will need to complete the following tasks:
-
Load a basic configuration file, your program will need to maintain the directory path it will store
-
Implement and comply with the protocol to communicate with other peers within the network itself.
-
Implement the commands for your program, these will include connecting. Your program will need to connect, disconnect and retrieve peer information. Handle package loading and remov- ing, retrieval of chunks from other peers.
This part deviates a little from part 1 as it is not completely necessary to build a merkle tree to get started on this part, let alone complete it. However It is necessary to be able to load a .bpkg file, retrieve the ident, filename, size, nchunks and chunks themselves for this part.
The scaffold has provided the following files for the next sections for you to implement.
• src/peer.c, - Write peer management code here
• src/package.c - Write your package management logic here
• src/config.c - Write your configuration logic here
• src/btide.c, Contains the main function, starting point of the program.You are still free to change and alter the contents of the src folder how you see fit, however, your Makefile still needs to build the required targets.
2.2 Network Protocol and Implementation Details
The section below will outline how your program will communicate to other programs on a network. It is advisable that your program also holds data related to peers and packages.
Network Protocol Your program can act as both a server and client to participants among the network. You will need to be able to form a listening socket to accept incoming connections but also have the ability to form new connections.
The network protocol will use ‘TCP/IP’ data packets to form connections. Your program should use the following packet structure below.
2.3 Building a CLI
To finish and to test your program, you will need to build a command line interface. There are a handful of commands that need to be implemented which also imply a certain amount of book keeping.
Commands Your program must be able to utilise the following commands. The commands are provided via standard input. Your program will stay alive until QUIT command is inputted.
All commands will have maximum of 5520 characters which can fit the command, identifier and path if ever required.
Do note, it is intended that the commands are case sensitive. You can assume command arguments are delimited by exactly one space 2 and should have no leading and trailing characters.