In conversation with ProtocolBuffers

aditya goel
Mar 13, 2022

Question:- What are the basic semantics of working with Protocol Buffers ?

Answer:- A ‘message’ is the unit of data in Protobuf that gets transferred over the network. It is essentially a type of object that we define. A single ‘.proto’ file can contain any number of ‘message’ definitions.

Question:- What are advantages of working with Protocol-Buffers ?

Answer:- Following are the advantages of Protobuf :-

  • It’s language-agnostic.
  • From a single ‘.proto’ definition, code can be generated for many languages.
  • Data is binary and efficiently serialised.
  • It’s convenient for transporting a lot of data.
  • It allows for easy API evolution, provided a few rules are followed.

Question:- Which companies have used Protocol Buffers ? Why this euphoria ?

Question:- How do Protocol Buffers work at all ?

With Protocol Buffers, every language uses the same serialisation and deserialisation format, so a message written by a program in one language can be read by a program in another.

Question:- Can you explain what the simplest ‘.proto’ file looks like ?

Question:- How do serialisation and deserialisation actually happen while using Protocol Buffers ?

Part #1.) Serialisation: the code generated from the message format (in .proto) converts a populated message object into a byte array.

Part #2.) De-serialisation: the same generated code rebuilds the message object from the byte array, using the message format (in .proto) to interpret the bytes.

Question:- Which are the supported field types with Protocol-Buffers ?

Answer:- Following are the supported field-types with Protocol-Buffers :-

  • Supported scalar NUMBER types are: int32, int64, double, float, sint32, sint64, uint32, uint64, fixed32, fixed64, sfixed32, sfixed64. Please note that fixed32 always uses 4 bytes, whereas int32 and sint32 use variable-length encoding, so small values take less space. Of the two, sint32 is the better choice when values are often negative, as a negative int32 always takes 10 bytes.
  • The scalar BOOLEAN type is represented as ‘bool’ in protobuf.
  • The scalar STRING type is represented as ‘string’ in protobuf. It must always contain UTF-8 encoded text (7-bit ASCII is a subset of UTF-8).
  • The scalar BYTES type is represented as ‘bytes’ in protobuf. It holds an arbitrary sequence of bytes. e.g. → We can use this type to represent an Image.
  • LIST type: It’s represented as ‘repeated’ in protobuf. It represents a list of values of the corresponding type. e.g. → We can use this type to represent the list of phone numbers which a Human holds.
  • Enum type: It’s represented as ‘enum’ in protobuf. It represents a variable whose possible values are all known in advance. e.g. → We can use this type to represent the eye color of the Human.
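To make the difference between the fixed and variable-length number types concrete, here is a minimal sketch in plain Java that does not use the protobuf library. The class name VarintSizeDemo and its helper methods are made up for this illustration; they imitate the base-128 varint and ZigZag rules of the wire format so that we can count bytes for a few values.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: a hand-written imitation of protobuf's varint rules, not the real library.
class VarintSizeDemo {

    /** Encodes a value as a base-128 varint: 7 payload bits per byte, high bit set on all but the last byte. */
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    /** ZigZag maps signed values to unsigned ones so that small negatives stay small (used by sint32). */
    static int zigZag32(int n) {
        return (n << 1) ^ (n >> 31);
    }

    /** int32: negative values are sign-extended to 64 bits before being varint-encoded. */
    static int int32Size(int v) {
        return encodeVarint((long) v).length;
    }

    /** sint32: ZigZag first, then varint-encode the result as an unsigned 32-bit value. */
    static int sint32Size(int v) {
        return encodeVarint(zigZag32(v) & 0xFFFFFFFFL).length;
    }

    /** fixed32: always 4 bytes, whatever the value. */
    static int fixed32Size(int v) {
        return 4;
    }

    public static void main(String[] args) {
        for (int v : new int[] {1, 300, -1, Integer.MAX_VALUE}) {
            System.out.printf("%d: int32=%d sint32=%d fixed32=%d bytes%n",
                    v, int32Size(v), sint32Size(v), fixed32Size(v));
        }
    }
}
```

These sizes cover the value only; the tag of the field adds its own byte(s) on top.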

Question:- What does a sample proto file look like, and how does it basically work ?

  • A typical sample defines a “Greeting” message.
  • It then defines a “GreetingRequest” message and, similarly, a “GreetingResponse” message.
  • At the very bottom, it defines a “GreetingService” that accepts a GreetingRequest and returns a GreetingResponse.

Question:- What’s the fundamental ideology with which Protocol-Buffers works ?

Question:- Demonstrate somewhat more complex MessageType using .proto file ?

syntax = "proto3";

// date.proto defines the my.Date.Date message used for 'birthday' below.
import "date.proto";

/*
Human represents a User of our system.
*/
message Human {
  int32 age = 1;
  string first_name = 2;
  string last_name = 3;
  bytes small_picture = 4;
  bool is_profile_verified = 5;
  float height = 6;
  repeated string phone_numbers = 7;

  enum EyeColor {
    UNKNOWN_COLOR = 0;
    GREEN = 1;
    BROWN = 2;
    BLACK = 3;
  }
  EyeColor eyecolor = 8;

  my.Date.Date birthday = 9;

  // Field numbers start at 1; 0 is not a valid field number.
  message Address {
    string address_line_1 = 1;
    string address_line_2 = 2;
    string zip_code = 3;
    string city = 4;
    string country = 5;
  }
  repeated Address addressOfHuman = 10;
}

In the above ‘message’, we also created a nested message of type Address, which protobuf fully supports.

Question:- In the above “Human” message, we have used “my.Date.Date”. Can you explain something about it ?

Answer:- Please observe the following points :-

  • We create the Date ‘message’ in a separate file called date.proto and then import this Date type into the aforesaid ‘Human’ message.
  • We generally define the package in which our protocol-buffer messages live. After the code gets generated, it shall be placed in the indicated package.

syntax = "proto3";

package my.Date;

message Date {
  // Year of date. Must be from 1 to 9999, or 0 if specifying a date without
  // a year.
  int32 year = 1;

  // Month of year. Must be from 1 to 12.
  int32 month = 2;

  // Day of month. Must be from 1 to 31 and valid for the year and month, or 0
  // if specifying a year/month where the day is not significant.
  int32 day = 3;
}

Please note that protobuf has no way to enforce such range constraints; that is something the application code has to take care of. For example → The value of month can very well go beyond 12, and protobuf will not reject it.
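Since the range checks have to live in our own code, here is a minimal sketch of such a check in plain Java. The class DateValidator and its isValid method are hypothetical helpers written for this illustration (they are not generated by protoc); they simply apply the rules stated in the comments of date.proto to the three int values.

```java
import java.time.YearMonth;

// Hypothetical helper: applies the constraints documented in date.proto, which protobuf itself never checks.
class DateValidator {

    static boolean isValid(int year, int month, int day) {
        // Year must be from 1 to 9999, or 0 when the date carries no year.
        if (year < 0 || year > 9999) {
            return false;
        }
        // Month must be from 1 to 12.
        if (month < 1 || month > 12) {
            return false;
        }
        // Day 0 means "day is not significant".
        if (day == 0) {
            return true;
        }
        if (day < 0) {
            return false;
        }
        // Assumption of this sketch: with no year given, use a leap year so that Feb 29 is accepted.
        int yearForLength = (year == 0) ? 2000 : year;
        return day <= YearMonth.of(yearForLength, month).lengthOfMonth();
    }

    public static void main(String[] args) {
        System.out.println(isValid(2022, 3, 13));  // a regular date
        System.out.println(isValid(2022, 13, 1));  // month out of range
        System.out.println(isValid(2021, 2, 29));  // 2021 is not a leap year
    }
}
```

In a real application, such a check would typically run right after parsing a message and before trusting its values.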

Question:- Explain the advanced data-types supported by Protobuf ?

Answer:- Following are the advanced data-types :-

1.) map → It maps keys to values. The key can be any integral or string scalar type (not float, double or bytes) and the value can be of any type except another map. Map fields cannot be ‘repeated’. Example (the field name custom_values is only illustrative) →

map<string, CustomValue> custom_values = 2;

2.) Timestamp → Below is an example of the usage of Timestamp, which comes ready-made from Google :-

syntax = "proto3";

import "google/protobuf/timestamp.proto";

package example.simple;

message SimpleMessage {
  int32 id = 1;
  google.protobuf.Timestamp created_date = 2;
}

Question:- What’s a TAG in Protocol-Buffers ?

Answer:- Here, pay attention to the number assigned to every field of the ‘message’ object. This number is also called the TAG (or field number).

  • The range of values that a TAG can take is: 1 to 536,870,911 (i.e. 2^29 - 1).
  • The values 19,000 to 19,999 can’t be used, as these are reserved for the Protocol Buffers implementation.
  • V. V. Imp. Point→ Tag-numbers 1 to 15 use ONE byte of space once the message is encoded. So, preferably use these tag-numbers for frequently populated fields.
  • Tag-numbers from 16 to 2047 use TWO bytes of space.
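The one-byte and two-byte figures above follow from how a field's key is written on the wire: the tag number is shifted left by 3 bits, combined with a 3-bit wire type, and the result is stored as a varint. Below is a minimal sketch in plain Java that just counts the bytes; the class name TagSizeDemo and the method keySize are made up for this illustration.

```java
// Illustrative only: counts how many bytes the key (tag number + wire type) of a field occupies.
class TagSizeDemo {

    /** Size in bytes of the varint-encoded key (fieldNumber << 3) | wireType. */
    static int keySize(int fieldNumber, int wireType) {
        long key = ((long) fieldNumber << 3) | wireType;
        int size = 1;
        // Every further 7 bits of payload cost one more byte.
        while ((key >>>= 7) != 0) {
            size++;
        }
        return size;
    }

    public static void main(String[] args) {
        int varintWireType = 0; // wire type 0 is used for int32, bool, enum, etc.
        for (int tag : new int[] {1, 15, 16, 2047, 2048}) {
            System.out.println("tag " + tag + " -> " + keySize(tag, varintWireType) + " byte(s)");
        }
    }
}
```

Only 4 of the 7 payload bits in the first byte are left for the tag number (3 go to the wire type), which is why the one-byte range ends at 15.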

Question:- What is ‘protoc’ ?

Answer:- ‘protoc’ is the Protocol Buffers compiler, which generates the code for us. We specify the .proto files as input and it can generate code for languages such as: C++/Java/C#/Python/Ruby/PHP/Objective-C/etc.

Question:- Demonstrate an example of generating the code from ‘.proto’ files and writing that MessageObject to the File.

Answer:-

Step #1.) Below is a simple.proto file, where we have defined a SimpleMessage :-

syntax = "proto3";

package example.simple;

message SimpleMessage {
  int32 id = 1;
  bool is_simple = 2;
  string name = 3;
  repeated int32 sample_list = 4;
}

Step #2.) Upon compiling this file with protoc (for example, protoc --java_out=<output-directory> simple.proto), it shall auto-generate the Java source-code for us.

Question:- Demonstrate a SimpleMessage object and write it to a binary file. We shall then read the same object back.

Answer:- Please note that an object, once written to the file, can be read back in any language supported by protobuf.

Step #1.) Let’s first generate the message and write it to a file.

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;
// Also import the SimpleMessage class generated by protoc.

public static void main(String[] args) throws IOException {
    System.out.println("Write message to file..");
    SimpleMessage.Builder simpleMessageBuilder = SimpleMessage.newBuilder();
    simpleMessageBuilder
        .setId(4567)
        .setIsSimple(true)
        .setName("Honesty is the best policy.")
        .addAllSampleList(Arrays.asList(1, 2, 3));
    SimpleMessage simpleMessage = simpleMessageBuilder.build();
    // try-with-resources closes the stream even if the write fails.
    try (FileOutputStream fileOutputStream = new FileOutputStream("simpleMessage_bin")) {
        simpleMessage.writeTo(fileOutputStream);
    }
}

Step #2.) Let’s now read back the message from this file :-

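import java.io.FileInputStream;
import java.io.IOException;
// Also import the SimpleMessage class generated by protoc.

public static void main(String[] args) throws IOException {
    System.out.println("Reading message now...!");
    // try-with-resources closes the stream once the message has been parsed.
    try (FileInputStream fileInputStream = new FileInputStream("simpleMessage_bin")) {
        SimpleMessage messageAsReadFromInputStream = SimpleMessage.parseFrom(fileInputStream);
        System.out.println(messageAsReadFromInputStream);
    }
}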

Question:- Let’s now see an example of an ENUM proto and its usage.

Step #1.) Below is the sample proto file we have created. Using protoc compiler, it shall auto-generate the java source code for us.

syntax = "proto3";
package example.enumerations;
message WeekDay {
int32 id = 1;
DayOfTheWeek day_of_the_week = 2;
}
enum DayOfTheWeek {
UNKNOWN = 0;
MONDAY = 1;
TUESDAY = 2;
WEDNESDAY = 3;
THURSDAY = 4;
FRIDAY = 5;
SATURDAY = 6;
SUNDAY = 7;
}

Step #2.) Below is how we shall be using the auto-generated classes :-

WeekDay.Builder weekDayBuilder = WeekDay.newBuilder();
weekDayBuilder
    .setId(1)
    .setDayOfTheWeek(DayOfTheWeek.SATURDAY);

WeekDay weekDayBuiltIs = weekDayBuilder.build();
System.out.println(weekDayBuiltIs);

Following are two additional notes :-

Note #1.) We can also set the Java package-name of the auto-generated classes using the below option inside the proto file :-

option java_package = "com.example.options";

Note #2.) If one proto file contains multiple Messages and we want a separate Java file to be generated for every ‘Message’, we can use the below option inside the proto file :-

option java_multiple_files = true;

Question:- Let’s assume that we now have to upgrade the message that we defined above, as some fields need to be added to it. Is that supported with protocol-buffers ?

Answer:- Yes, this is very much possible.

  • We might have a scenario where one application writes the Message object in the enhanced format while some other application still reads the old format of the ‘Message’.
  • The other scenario is vice-versa: an application writes the old format while another one already reads the enhanced format.

Question:- What are the Rules for updating the Protocol (i.e. the format of the Message inside the .proto file) ?

Answer:- Following are the rules to update the contract :-

  • Numeric tags of existing fields must not be changed. When we add newer fields, older code would just ignore those newer fields.
  • Vice-versa, if newer code reads a Message which was written using older code, it would initialise the missing fields with their default values.
  • Renaming fields is very simple, as long as the tag-numbers of the renamed fields are not changed. Only the Tag-Number is important for Protobuf.
  • Fields in the .proto files can be deleted as long as the tag-numbers belonging to the deleted fields are not re-used by newly added fields. To avoid this happening accidentally, we can either keep the deleted fields and suffix them with _OBSOLETE, or reserve the deleted tag-numbers (and, optionally, the field names) with the ‘reserved’ keyword, e.g. reserved 2, 15; and reserved "old_name"; so that the compiler rejects any attempt to re-use them.

V. Imp. Point to note here is: never ever remove the reserved tags, as it can cause conflicts in case a future developer re-uses a previously reserved tag-number.
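To see why these rules hold together, here is a minimal sketch of an "old" reader in plain Java, written without the protobuf library. The class UnknownFieldSkipDemo and the two fields it knows (tag 1 as a varint id, tag 3 as a string name) are assumptions made for this illustration. The reader matches fields purely by tag-number, keeps defaults for fields that are absent, and skips any tag it does not recognise.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a hand-written reader that knows just tag 1 (varint id) and tag 3 (string name).
class UnknownFieldSkipDemo {
    private final byte[] buf;
    private int pos = 0;

    private UnknownFieldSkipDemo(byte[] buf) {
        this.buf = buf;
    }

    private long readVarint() {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint");
    }

    static Map<String, Object> decodeWithOldSchema(byte[] bytes) {
        UnknownFieldSkipDemo in = new UnknownFieldSkipDemo(bytes);
        Map<String, Object> known = new LinkedHashMap<>();
        known.put("id", 0L);   // default used when tag 1 is absent
        known.put("name", ""); // default used when tag 3 is absent
        while (in.pos < bytes.length) {
            long key = in.readVarint();
            int fieldNumber = (int) (key >>> 3);
            int wireType = (int) (key & 0x7);
            if (wireType == 0) {            // varint
                long value = in.readVarint();
                if (fieldNumber == 1) {
                    known.put("id", value);
                }                           // any other tag is silently ignored
            } else if (wireType == 2) {     // length-delimited (string, bytes, nested message)
                int length = (int) in.readVarint();
                if (fieldNumber == 3) {
                    known.put("name", new String(bytes, in.pos, length, StandardCharsets.UTF_8));
                }
                in.pos += length;           // skip the payload whether it was known or not
            } else if (wireType == 1) {     // fixed 64-bit
                in.pos += 8;
            } else if (wireType == 5) {     // fixed 32-bit
                in.pos += 4;
            } else {
                throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
            }
        }
        return known;
    }

    public static void main(String[] args) {
        // Written by "newer" code: tag 1 = 42, tag 3 = "hi", plus a new tag 5 = 7 that old code does not know.
        byte[] newer = {0x08, 0x2A, 0x1A, 0x02, 'h', 'i', 0x28, 0x07};
        System.out.println(decodeWithOldSchema(newer)); // prints {id=42, name=hi}
    }
}
```

Because nothing in the bytes carries a field name, a renamed field decodes exactly as before, whereas a re-used tag-number would make old data show up under the wrong field.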

Question:- Can we also define Services using Protocol-Buffers ?

Answer:- Yes, using Protocol-buffers, we can also define the services.

Question:- What is a Service in very simple words ?

Answer:- Here is the understanding of Service :-

  • A Service is a set of end-points exposed by a server.
  • Services (i.e. RPC calls) sit on top of Messages.
  • For every RPC call, we need to define a request-message and a response-message.
  • For example, a “SearchService” can expose an RPC that takes a SearchRequest carrying a ‘person_id’ and returns a SearchResponse containing the ‘person_id’ and ‘person_name’.

Question:- Why is the Protocol-buffers format preferred over JSON ?

Answer:- Following are the advantages of using Protocol-Buffers over JSON :-

  • First, Protobuf formats are more efficient in terms of payload-size. All of this data has to travel over the network eventually, so it helps us save a lot of bandwidth.
  • Second, parsing JSON is more CPU-intensive (since it is a human-readable text format) as compared to parsing protobuf (since its binary layout is closer to the way data is held in memory). Thus, using Protobuf means faster and more efficient communication, and hence it is a good choice for mobile devices, which have slower CPUs.

Question:- Demonstrate the comparison of payload-size between JSON & Protobuf :-

Answer:- Say we have the below JSON. Its size is around 55 bytes.

{
"age":29,
"firstname": "Aditya",
"lastname":"Goel"
}

In contrast, say we have the below protobuf format. The same values encode to 16 bytes in protobuf: 2 bytes for age, 8 bytes for first_name and 6 bytes for last_name.

message Person {
int32 age = 1;
string first_name = 2;
string last_name = 3;
}
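The size comparison can be reproduced by hand. Below is a minimal sketch in plain Java that encodes the three Person fields following the wire-format rules (a key byte, then a varint or a length-prefixed UTF-8 string) and compares the result with the compact form of the JSON. The class PayloadSizeDemo and its methods are made up for this illustration; real code would use the classes generated by protoc. For these values the sketch produces 16 bytes; exact figures depend on the values and on how the JSON is formatted.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: a hand-rolled encoder for the Person message, not the generated protobuf code.
class PayloadSizeDemo {

    /** The JSON from the example above, with all whitespace removed. */
    static final String COMPACT_JSON = "{\"age\":29,\"firstname\":\"Aditya\",\"lastname\":\"Goel\"}";

    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** Writes a string field: key with wire type 2, then the byte length, then the UTF-8 bytes. */
    static void writeString(ByteArrayOutputStream out, int fieldNumber, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (fieldNumber << 3) | 2);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    static byte[] encodePerson(int age, String firstName, String lastName) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, (1 << 3) | 0); // key of field 1 (age), wire type 0 = varint
        writeVarint(out, age);
        writeString(out, 2, firstName); // field 2 = first_name
        writeString(out, 3, lastName);  // field 3 = last_name
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int protobufSize = encodePerson(29, "Aditya", "Goel").length;
        int jsonSize = COMPACT_JSON.getBytes(StandardCharsets.UTF_8).length;
        System.out.println("protobuf bytes: " + protobufSize); // prints protobuf bytes: 16
        System.out.println("compact JSON bytes: " + jsonSize);  // prints compact JSON bytes: 49
    }
}
```

Most of the saving comes from the field names: JSON repeats them as text in every message, while protobuf sends only the one-byte key of each field.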

Question:- Let’s now conclude our findings with Protocol Buffer (protobuf3).

Answer:- Here are our learnings in a nutshell regarding Protocol-Buffers :-

  • The API can be defined in a simple and easy manner.
  • The definition is kept separate from the implementation.
  • A lot of boiler-plate code can be auto-generated by the proto-compiler.
  • Since the payload is binary, it’s a lot more efficient to send/receive over the network and to serialize/de-serialize on the CPU.

That’s all in this section. We will see you in the next part of the series.

