2. Data Structure and Query
We used 2 data structure to implement the query function respectively, hash table plus bitmap and bloom filter. We have a configuration file which list out all the files need to be shared in the application, the information about the file we need are file number, file name and file size in bytes. According to that file, we can get two hash tables. One is to map the file number with the file name and the other one is to map the file number with the number of chunks in that file.
Then we need to build a table which contains all the files and chunks we already have in memory and calculate the files and chunks we need accordingly. For this info, we can use either bitmap or bloom filter.
HashTable + BitMap
For hash table plus bitmap method, the idea is to have a bitmap for each file and each bit in the map will represent a chunk of the file. if we have the chunk, we set the bit to 1. And we can map the files with their bitmap using hash table.
So in the query process, when we want to check if we have a particular chunk or not, we can get the bitmap of that file using the file number and then check if the corresponding bits are set to 1.
Bloom Filter
For the bloom filter method, we take advantage of the space efficiency character of this data structure and build a bloom filter with the string value of "file number - chunk number".
The way bloom filter works is like the following:
- We have several hash functions.
- For a certain “filenum-chunknum” string, calculate several hash values using those hash functions.
- Set the bit correspond to the value to 1.
For example, we have chunk No.31 of the file No.2. Then we calculate the hash value of “2-31” using the hash functions and get the result of 1,6,8,13. We set those bits in the bloom filter map to 1.
For query process, if we want to check if we have chunk No.31 of the file No.2 or not, we calculate the hash value of “2-31” again and get the values of 1,6,8,13 again. Then we check if all the bits in these positions are 1. If not, we don't have the chunk. Otherwise, we have.
The most important thing about bloom filter is to choose the number of bits for the map and choose good hash functions. Bloom filter has certain “false positive” possibility. But it can be reduced to a acceptable level with good hash functions and larger map.
Bloom filter is designed for extremely large set of data like web crawler and the advantage in space will not be visible if the scale of the application is small.
Query Process
Query is the process when one device receives a request message from another device. The request message will be introduced later. But the message will tell the device what files and chunks the other one needed. So this device need to check whether it has these files and chunks or not. If it has, then they can start the transmission.
Depending on what data structure we are using, the query is done in different ways. But the return result are the same: an ArrayList of the available chunks that the other side need. The list will contain the "file number-chunk number" string
The app then can get randomly from this list a chunk and parse the file number and chunk number and then read from the file in memory.
If the data structure we are using is bloom filter, we can read from the message about the chunks needed and test if we have the chunk using bloom filter and put the chunk number in the return list if we have. This check can be done in constant time because of the bloom filter data structure we are using.
The same idea with the hash table bitmap method. The device can read the message and get the bitmap of the other side and compared with the map of its own and build the list of available and needed chunks.
3. Client and Server
From Wifi Direct, once the connection is set up between two devices, we can get a IP address. Then we can use this IP address to create server socket as well as client socket and they can then talk over the socket connection. The basic steps for creating server and client are used. One thing need to be noticed is that Android platform requires the internet function to be done in background thread instead of the UI thread. So we used Asyctask and Handler to do this.
The client and server talk to each other using certain protocol. There are 3 types of message, Request, Data and Control. The message are in the format of:
Request messages are type 1. The file number and chunk number field are empty. The data field will contain the files and chunks it need. The format is file numerb + bitmap of that file. Bitmap will save a lot of bytes for the transmission.
Data message will contains the byte stream of one chunk of data in the data field and it need to indicate the file number and chunk number. This is type 2 message.
The Client Server process are similar to each other. The client will send out the request message upon connection set up and then waiting for messages from server. Server get the chunks available using query process and send data message back if there are chunks available. Client get those data message and log the chunks in temp files and update records about chunks it has. If all available chunks are transmitted, they will switch role and server will send request to client. The connection will stay unless the link is broken or all chunks available are transmitted for both sides. The flow charts of this process is as the follow:
Some screen shots showing the communication between client and server:
4. File Operations
File operations contain mainly two parts. One is the way we handle the file transmission and storage and the other one is the file browser.
We set a 100KB chunk size and handle files by chunks. Once a device receive a chunk, it will store the chunk into a temporary file with extension tmp in the tmp folder. The file name will be "file number - chunk number.tmp". Also we need to update the record about the chunks we already have. So that we won't ask for that chunk again and we can share with others for this chunk. If we don't have the entire file, we will keep collecting the chunks. And at the mean time, we can send these chunks to other devices if necessary. The chunks are the basci unit of the application and all operation is done in chunks.
Once we have all the chunks of a file (we have a record about the number of chunks of each file from the initialization process), we can combine these chunks into a single file and delete these tmp files. The combination is pretty straight forward. We just read from these tmp files and store them into one file in turn. Only thing to notice is the actual number of bytes from each file.
The file browser enable us to check the directories easily and play audio and video easily. It use the default activities provide by the Android OS to execute the files according to the file type.
Some screenshot about the file operations:
Device A has a.txt | Device B has b.mp3 |
| |
Device A share a.txt with Device B | Device B share b.mp3 with Device A |
| |
Device A has both files | Device B has both files |
| |
For more info about the design and coding of this project, please see these presentation documents.
For more detailed application in action please see the In Action part.