Tuesday, June 27, 2017

C++11 and some efficiency improvements


I recently learnt a few new things.

shared_ptr


First is that using std::make_shared is not only good because of its better exception safety (the primary reason I hear people promoting it), it is in fact also more efficient and saves space.

To understand this, first let's explain what a shared_ptr is. It is basically a wrapper around a raw pointer; what the wrapper provides is reference counting. So a shared_ptr is two things: the raw pointer and a reference count associated with that raw pointer. Actually this is a slight simplification: because of weak pointers, the reference count is really two counts, a weak count and a strong count, and both are put in a structure called the control block for the shared_ptr. (There is a bit more in the control block too, including the deleter, hidden via type erasure, but that is more detail than is needed here.) So a shared_ptr holds two pointers: one to the object being managed, and another to the reference count (the control block). The count needs to live behind a pointer so that multiple shared_ptr instances can refer to and update the one shared count associated with the object. Because shared_ptr has a constructor that takes a raw pointer, when constructed that way there is only one choice: dynamically allocate the control block. Hence there are two dynamic allocations with this strategy, one for the object passed in and a second for the control block, and more allocations can mean more memory fragmentation. What std::make_shared does is essentially create a struct of T and the control block together and allocate them with one allocation. That is one less dynamic memory allocation, and it makes the deletion simpler too. And that is why using make_shared to create a shared_ptr is better.

Whilst thinking about this, I realised a few things too. Firstly, unique_ptr obviously doesn't require a reference count and therefore never requires a second dynamic memory allocation. I did always scratch my head as to why C++11 had make_shared but no make_unique (although it was added later), but it kind of makes sense that it doesn't add quite as much; the only thing it adds is the exception safety, which to be honest I don't lose sleep over as much as the efficiency of these things in the common case compared with the exceptional cases. The provision of make_shared is far more valuable than adding make_unique from this point of view. And thinking about it even more, whilst I do know that implementations are in fact doing this make_shared optimisation, another optimisation I'm not so sure about, which could be done, is that when make_shared is not used, the second dynamic memory allocation for the reference count / control block could be deferred until such time as there is a second reference. This would practically make the cost of shared_ptr equal to that of unique_ptr when they are used equivalently. You would obviously choose between them for semantic reasons, but then there would be little need to choose from a performance point of view.

So the main takeaway: always use make_shared to construct your shared_ptrs when you can. There is no extra dynamic memory allocation; the cost is that of allocating a minimal bit of extra housekeeping information together with your object. (This assumes your object is not of some trivial size like < 32 bytes — consider making a copy of the object instead when it is small, and obviously consider your individual situation when evaluating whether shared_ptr is right for you.) For that second dynamic memory allocation when it is needed, I have imagined there could be schemes which do away with it using a more complex strategy. The idea I had was something like a doubly linked list between the active shared_ptr objects sharing an object: as a shared_ptr goes out of scope, it removes itself from the doubly linked list, patching the others to keep it all chained together. The last one removed knows it is last because its prev and next pointers are both null. I'm not sure if anyone has implemented a reference counting scheme like this. It obviously gets more complex when dealing with weak and strong references, it is much more maintenance work, being more complex and more likely to break, and it isn't so nice from a memory access pattern point of view, jumping about too much.

Final

The next thing I learnt recently is the C++11 keyword 'final'. Similar to Java, it is possible to mark implementations as final, either on the class or on a specific virtual member function. The semantics are that it cannot be overridden further down the inheritance hierarchy: a class marked final cannot be inherited from, and a member function marked final cannot be overridden by a subclass. This is great from a semantics point of view — you can protect an implementation from being overridden, marking it as intended to be the last in that chain of inheritance, which lets you write it without designing for unintended uses. But really the thing that interests me about final isn't so much being able to express semantic intent, whilst that is important and cool. What I like is that it allows the compiler to make the code more efficient, and that is actually how I came across it. I theorised that when you have virtual functions and a vtable, you wouldn't need to do a virtual dispatch if you knew, in the context of the pointer you have and the function being called, that it wasn't further derived. From the realisation that a final keyword would allow exactly this, I searched around and found that not only did C++11 add final, but implementations are already doing this optimisation, avoiding the virtual dispatch when final is used and your pointer is of the type where final was declared.

What does it mean from a practical point of view? Well, it may change the way you structure and write code involving virtuals. Firstly you will want to start decorating your classes with final. Next you will want to defer casting to a base pointer as late as possible. And if you are doing multiple virtual calls with a given object, you may find it faster to dynamically cast the object to the derived (final) type it really is and make those calls through that pointer rather than through the base pointer, to speed up special fast-path cases. I don't think you could have sped up a fast path like this as easily without removing the general purpose virtual functions or making more intrusive changes.

String literals

Something that annoys me about C/C++ is that there is no distinction between literal objects and other objects in their types. Take string literals, such as "blah" in code: this is a string of bytes 'b', 'l', 'a', 'h', 0 which is const, lives in the statically initialised section of the program, and exists for the duration of execution. But generally its type decays to "const char *". If I new an array of chars, I get a "char *", which is readily convertible to a "const char *". That array is mutable, and it exists only as long as I say it exists — quite a different beast to a string literal. What I really want is to be able to provide APIs which can distinguish between these and do something different depending on whether they were called with an immutable string literal or a mutable array of chars. Sometimes you want to know it is a string literal so you can skip making a copy as an optimisation. Other times it is for better security, giving tighter control over what can be passed in to a function and from where.

So how can it be done? Using templates. The only downside is that generally this exposes the implementation in the header, although it could be used as just a shim.

For example:

template <size_t N>
void DoSomethingWithLiteral(const char (&a_str)[N])
{
  // a_str is a string literal
}

Basically that is it. A function that looks like that will only accept char arrays of known size, which includes string literals; a plain const char* will not bind to it because N cannot be deduced. Beware when providing other overloads with the same name that overload resolution doesn't send a call made with a string literal to another overload instead of the intended one. Always test, and more template magic can be used to disambiguate the calls if needed; that is doable.

A few more things about string literals.

String literal operators. New in C++11 are user-defined literal operator overloads. This is neat and basically allows for attaching units to scalar values in a natural way. In English, values in certain units are expressed like 50kmh, or 10kg, or 5s or 10days: a scalar value (e.g. an integer) followed by a suffix which abbreviates the units. In traditional C++ the typical way would be to make a type, for example a type called kilograms or kg, e.g.: class kg {}; and give it a constructor taking an int (or float), so you could express it something like kg(50), and could do "kg(50) + kg(10);" etc. And you could provide conversion operators to convert from the type kg to the type g so you could do "kg(1) + g(500)" etc. But this is not so natural. A literal operator essentially lets you define a conversion from int to your type expressed as a suffix, so you can write instead "1_kg + 500_g" (user-defined suffixes must begin with an underscore; suffixes without one are reserved for the standard library). It reminds me a bit of Python, where you can call methods on literals, like "some string".split(). And in C++ it is also not limited to ints — you can do this for string literals too, converting a string literal to a std::string for example with something like "blah"_s.

One more thing about string literals.

I wrote about logging not long ago here: Avoiding heap allocation with strings

Something I realise I didn't mention is that you can forward the arguments using templates better than I demonstrated. The use of va_list isn't needed if you pass via templates. The issue, though, is that you then need to expose everything in the header file, so the demonstrated method is fine if you want to hide the implementation details.

Here is a slightly improved way:

template <size_t N, typename ...Ts>
void Logger(enum LogLevel a_level, const char (&a_fmt)[N], Ts&& ...a_args)
{
  size_t msgSiz = std::snprintf(nullptr, 0, a_fmt, a_args...) + 1;
  char msg[msgSiz];
  ::snprintf(msg, msgSiz, a_fmt, a_args...);
   ...
}

You'll notice it is slightly fewer lines of code than the original. It forwards the arguments, and does so without all the fuss of using va_lists. Another subtle change is the way a_fmt is passed in: instead of being passed as a const char*, it is passed and preserved as a reference to the literal's array type. This has some benefits for security and efficiency. The compiler can check at compile time that the arguments match the format provided, and it forces the a_fmt that is passed to be a compile-time char array, so there is no possibility of a maliciously formed a_fmt string being passed in at runtime.

It may not be appropriate in the above case, but sometimes it is possible to declare a template in the header without providing the body, and define the body of the template in the source file. In the above case there are too many varying parameters to the template to be able to instantiate all the required uses of it in a single cpp file. But there are cases where each expansion of the template for each type can be anticipated, hidden, and provided in a cpp file. This is generally preferable when it can be done, which leads to the next thing I learnt recently, which is a good thing to know.

Headers and forward declarations


I was reading "Effective C++" by Scott Meyers. The books are over 20 years old, but there is still value to be gained from them. Obviously some advice is dated and any older text needs to be read in context, but sometimes there are things you just didn't realise and somehow missed. In fact it seems a lot of people have missed this particular point, which I will elaborate on, as I have rarely seen it practised, if ever.

It involves the forward declaration of types, and then defining functions in terms of them not just by pointer or reference, but also by value.

The example from the book goes something like this:

#pragma once     // Perhaps another blog post about using pragma once ?
#ifndef BIRTHDAY_H
#define BIRTHDAY_H

class Date;  // this is the forward declaration

Date getMyBirthday(); // return the date of my birthday
void setMyBirthday(Date a_date); // set the date of my birthday

#endif // BIRTHDAY_H


I didn't realise this was possible, but I tried it and it is indeed possible. I don't know why, but I thought that the Date used in the function prototypes would at least need to be a reference or pointer when you don't yet know the details of what the Date class is. It is great that this actually works.

It makes sense if you think about it: you only need to know the details of what Date is when you implement those functions or call them, so you can defer the inclusion of date.h until then rather than forcing it into the header.

I think that was my confusion: if I didn't include date.h inside birthday.h, then I could neither implement birthday.cpp nor make use of the functions in birthday.h that depended on Date (say in main.cpp for the sake of simplicity), so I logically assumed the fix was to include date.h inside birthday.h. But that is the fallacy — that isn't the fix you are looking for. The fix you really want is to include date.h in both birthday.cpp and main.cpp. This seems to contravene the DRY principle (don't repeat yourself), but it really is better. As much as possible, you want to avoid including headers in your headers; it only leads to more headers being included, and so on, which leads to longer compile times. This chain of headers including headers means thousands of function prototypes are pulled in, when a given cpp file may only use hundreds or fewer functions external to itself. Being explicit in the cpp files about what is included is preferable to implicitly pulling in much more via headers including other headers.

I think this is something I am going to try to adopt more if I can.

Some gotchas and possible solutions. Where you do need to know the details of an object and its size is when you use it as a member variable. e.g.:

class Date;
class Birthday {
public:
  int dayOfMonth();
  int month();
private:
  Date date;
  ... other members
};

Now this won't work. You can't figure out sizeof(Birthday) without knowing sizeof(Date), and you can't do "new Birthday". This won't compile as is.

If you really don't want to include date.h, or if Date were a more heavyweight object, a workaround is to wrap it in a shared_ptr or unique_ptr — which goes back to one of the earlier points in this blog about the efficiency of those.

Anyway, exposing all the private member variables in a class is not great if you can avoid it, and if you do go the route of wrapping any of the members in unique_ptr or shared_ptr, why not combine all the private members together into a struct and wrap that entire struct in a smart pointer. This is essentially what is called a pimpl. Once you do that, you can hide all of that implementation detail in the cpp file. It looks like this in the header:


#pragma once

#include <memory>

class Birthday {
public:
  int dayOfMonth();
  int month();
private:
  struct Pimpl;
  std::shared_ptr<Pimpl> m_pimpl;
};


In the cpp file you can then do what ever you like:


#include "Birthday.h"

struct Birthday::Pimpl
{
   // put anything you like in here
   // change it as often and as much as you like
};


Make sure the Birthday constructor (not shown) uses make_shared for efficiency!

I might write more in another blog about this pattern of pimpl in a shared_ptr.


Wednesday, June 7, 2017

Array of Structures and Structures of Arrays


It can be time consuming and painful to recode the data structures in a program to improve efficiency by optimising the data layout, grouping together data which will be accessed together.

The goal is to improve cache utilisation. You don't want to iterate an array of structures where you are only accessing some of the members of each item. Say each item in the array is a struct 32 bytes in size, but we only access 4 bytes of each one as we loop: we have loaded up our cache lines with loads of data we aren't even going to touch, resulting in sub-optimal efficiency. An alternative in this scenario is to rearrange the data into a structure holding an array for each member. When accessing one member at a time while iterating, the cache is used more efficiently, as the cache lines are filled with only the data we are accessing. Alternatively, rewriting the code to work on each item as a whole might be an option, but it might not be possible, or might be sub-optimal.

Conversely if the data is arranged as structures of arrays and when iterating multiple members are accessed at the same time, there might be related problems, but it is less likely. You still may want to try both approaches to see what works best for given data structures and their access patterns.

But it's not trivial in C/C++ to rearrange the data like this without having to re-write any code which deals with this data. You can't simply flick a switch and the compiler automagically figures out how to re-arrange the code generation to access the data in the alternative layout. Being able to do that would be pretty cool. Jonathan Blow claims in his Jai programming language to have the ability to do just that. Currently Jai isn't released so all we know is from presentations of this language, but it appears to work by decorating the declaration of the array with something which says how it will be laid out and the language generates the appropriate code without the programmer needing to change the code which is accessing it.

Now I would like to be able to do something similar in C/C++, so I had a bit of a think about it and have come up with this basic idea on how it might be possible to achieve something that comes close. Unfortunately to make it work requires some macros and templates, but nothing too fancy.


Here is the basic idea we want to arrive at:


template <size_t SIZE>
struct SomeStructure
{
  int   m_someMember[SIZE];
  float m_someOtherMember[SIZE];
  // you could put anything in here, each one just needs "[SIZE]" added to the end
};



Then we can create arrays either way like this:



// As an array of structures
SomeStructure<1> AoS_array[1000];

// As a structure of arrays
SomeStructure<1000> SoA_array[1];



When we want to access them we do one of these:



int x = AoS_array[n].m_someMember[0];

int x = SoA_array[0].m_someMember[n];


Notice we need to swap the indexes. But we can do that easily with a macro if we know which type of array it is. One simple way I came up with was making a macro for when declaring an array which also sets a const bool which the accessing macro can use.

So here are the macros:


#define DECLARE_STRUCT_BEGIN(StructName) \
  template <size_t SIZE> \
  struct StructName \
  {

#define MEMBER(t, name) \
    t name[SIZE];

#define DECLARE_STRUCT_END(StructName) \
  };

#define MAKE_AOS(array, S, SIZE) \
     S<1> array[SIZE]; \
     const bool array##_isSOA = false;

#define MAKE_SOA(array, S, SIZE) \
     S<SIZE> array[1]; \
     const bool array##_isSOA = true;

#define GET(array, a, n) \
     ((array##_isSOA) ? array[0].a[n] : array[n].a[0])


Probably this could be tweaked and improved, but it gets the idea across as to how it could be possible. I think it is fairly self explanatory.

Usage example:


DECLARE_STRUCT_BEGIN(S)
  MEMBER(int, a)
  MEMBER(int, b)
  MEMBER(int, c)
DECLARE_STRUCT_END(S)


MAKE_AOS(array1, S, 100)

MAKE_SOA(array3, S, 100)


void test()
{
  MAKE_AOS(array2, S, 100)
  
  for (int i = 0; i < 100; i++)
  {
    GET(array1, a, i) = i;
    GET(array2, a, i) = i;
    GET(array3, a, i) = i;
  }
}


I think I may be able to improve on this with templates instead of macros to make the syntax more appealing, but just having something that works is a start, and it makes it a cinch to switch data structures between AoS and SoA to try out different layouts without a mammoth amount of work.

Also, referring to my other post about serialisation and de-serialisation (http://inspired-code.blogspot.com.au/p/automating-object-serializationdeserial.html), where I presented DECLARE_CLASS and DECLARE_MEMBER macros to automagically generate the code for serialising and de-serialising a structure: it shouldn't be too hard to combine the macros and so combine the two ideas — declaring a structure/class that generates the serialisation and de-serialisation and is also easy to switch between SoA and AoS. That is another direction this could be taken.


Along the lines of using templates to reduce the use of macros, possibly this is one way to declare an SoA or AoS and access either consistently without relying on the preprocessor:



#include <cstddef>     // std::size_t
#include <functional>  // std::function

template <typename T>
struct ArrayAccessor
{
  explicit ArrayAccessor(std::function<T&(std::size_t)> func) : m_getter(func) {}
  inline T& operator[](std::size_t idx) {
    return m_getter(idx);
  }
  std::function<T&(std::size_t)> m_getter;
};


template <template<size_t> class t, size_t SIZE>
struct SOA
{
  template <typename T>
  ArrayAccessor<T> get(T (t<SIZE>::*memberPtr)[SIZE]) {
    // capture by value: the lambda outlives this call
    return ArrayAccessor<T>([this, memberPtr](size_t idx) -> T& { return (m_array[0].*memberPtr)[idx]; });
  }
  t<SIZE> m_array[1];
};


template <template<size_t> class t, size_t SIZE>
struct AOS
{
  template <typename T>
  ArrayAccessor<T> get(T (t<1>::*memberPtr)[1]) {
    return ArrayAccessor<T>([this, memberPtr](size_t idx) -> T& { return (m_array[idx].*memberPtr)[0]; });
  }
  t<1> m_array[SIZE];
};



Then to declare an array as SOA or AOS you would do:

SOA<S,1000> array4;
AOS<S,1000> array5;

Access can be made to be the same:



#include <type_traits>  // std::remove_reference

#define ARRAY_MEMBER(array, member) \
    array.get(&std::remove_reference<decltype(array.m_array[0])>::type::member)


Usage:


    ARRAY_MEMBER(array4, a)[i] = i;
    ARRAY_MEMBER(array5, a)[i] = i;


Ideally the templates could work out the type without needing so much typing or macros, but that is left as an exercise for the reader.

Tuesday, May 30, 2017

Avoiding heap allocation with strings


I was wanting to debug my memory allocations and hit a problem. The logging code that prints out useful information to help debug and trace the program is itself allocating memory. This is particularly pronounced with the string manipulations it does and with creating formatted strings from a variable list of arguments.

It got me thinking. Firstly, when one has a std::string object, particularly one that itself lives on the stack, it still allocates off the heap. It is too bad there is no way to allocate off the stack, as the lifetime of the object is bound to the stack anyway. This is just how it is in C/C++. You can sort of do variable sized allocations off the stack, with alloca or a variable sized array, such as:

   size_t var = i * 10;
   char array[var];

Depending on the compiler, that can work. But this is no good for a general kind of string class, as the lifetime is the scope it is created in, so it won't persist when the function returns. The same applies to alloca.

But I really want to solve this and completely avoid making heap allocations while logging. The main culprit is formatting a string from a format argument and a variable list of arguments.


So the first thing to do is work out the required size for this variable sized array to format in to.


void Logger(enum LogLevel a_level, const char* a_fmt, va_list a_args)
{
  va_list args2;
  va_copy(args2, a_args);
  size_t msgSiz = std::vsnprintf(nullptr, 0, a_fmt, a_args) + 1;
  char msg[msgSiz];
  ::vsnprintf(msg, msgSiz, a_fmt, args2);

  va_end(args2);
   ...
}


Now msg contains the formatted string and without using the heap. The only tricky thing is having to make a copy of the va_list because it gets consumed in the first call to vsnprintf and we need a fresh copy to pass to the 2nd call to it we make.

That wasn't too hard :)


Thursday, March 24, 2016

Reflection in C++

Topics: C++, Serialization, Reflection, Templates, JSON, Visitor pattern

I've posted a new article here titled "Automating Object Serialization/De-serialization in C++":

 http://inspired-code.blogspot.com.au/p/automating-object-serializationdeserial.html

Overview


Some languages support reflection better than others. C++ has its runtime type information system which is reasonably limited. It won't help you much if you are doing serialization and de-serialization, so you end up writing a lot of boiler-plate code which feels like you are repeating yourself a lot.

Serialization though is just one example. In the article I give a bit of a run down on some of the alternatives and present a solution which reduces how much you have to repeat yourself.
In the end, this is an example of all you should end up needing to write to define and declare a struct with members and the code needed for being able to serialize/de-serialize it:

    DECLARE_CLASS(Color)
       DECLARE_MEMBER(int, red)
       DECLARE_MEMBER(int, green)
       DECLARE_MEMBER(int, blue)
    END_DECLARE_CLASS

That is it. No duplicated info anywhere. I show how this isn't locked in to just serializing/de serializing too, and give examples of hashing and other uses. In follow up articles, I can elaborate on some of those other uses and the details of those.

Full article here:

  http://inspired-code.blogspot.com.au/p/automating-object-serializationdeserial.html


Monday, March 21, 2016

Using C++ Template Meta-Programming to expose C functions in Lua


I've written an article on auto-magical language bindings, which you can find here: http://inspired-code.blogspot.com/p/automagical-language-bindings-using-c.html

The specifics of the article shows how to expose C functions to lua.

The summary is, all you need to do to expose a function is add it to the list of exposed functions wrapped with the macro AUTO_BIND_C_FUNTION_TO_LUA such as in the following code snippet:


// Some function prototypes to expose
void someFunc(int arg1, int arg2, const char* arg3);
void anotherFunc(const char* arg1, double arg2, int arg3);
// ... etc ...
// Import table
static const struct luaL_Reg scriptCallableFunctions[] =
{
  AUTO_BIND_C_FUNTION_TO_LUA(someFunc)
  AUTO_BIND_C_FUNTION_TO_LUA(anotherFunc)
  // ... etc ...
};

// Register all the functions we want wrapped
void registerLuaBindings(lua_State* L)
{
  // Register all of our bindings
  for (luaL_Reg r : scriptCallableFunctions) {
    lua_pushcclosure(L, r.func, 0);
    lua_setglobal(L, r.name);
  }
}


That certainly simplifies the work of exposing a function to Lua. I believe the approach could be adapted relatively easily to other languages, such as Python, as well.

Hope you like it.

Sunday, March 20, 2016

Idiomatic way in C++ to test debug/release etc

Idiomatic way in C++ to test debug/release etc.



This is just a quick post on how one can improve their code by removing some #ifdefs and doing things a better way.

It is common to see this style of code:


void SomeFunction(int x)
{
      ...
#ifndef NDEBUG    // Double negative - if ! ! debug - eg: is debug
      ...
      // extra debug checks
      ...
#else
      ...
#endif
      ...
}


Usually NDEBUG is the define that is checked for release/debug, as it is specified in the C standard with regard to assert.h. I don't know of any other defines the language specifies in relation to release/debug modes. So typically release builds have NDEBUG defined, and non-release builds do not.

However, using preprocessor checks is not very idiomatic C++. The preprocessor hides part of the code from the compiler, skipping important compiler checks on the other half, which means that if you don't regularly compile all the permutations of defines, some sides of #if conditions can rot and then require additional maintenance when re-enabled. Better is to have the compiler parse both sides, determine that one side of the branch is never taken, and eliminate it.

So what is a good way to formulate turning a define in to an idiomatic way to test this in C++?

One can do this in a header:


enum BuildType
{
#ifndef NDEBUG
 isRelease = 0,
 isDebug = 1
#else
 isDebug = 0,
 isRelease = 1
#endif
};


And then in code check the condition like this:


void SomeFunction(int x)
{
      ...
      if ( isDebug ) {
         ...
         // extra debug checks
         ...
      } else {
         ...
      }
      ...
}


Using an enum means that we don't need to create any variables to do this.

Obviously you can extrapolate this to testing for things besides debug/release. Also in the place where we declare the enum, if wanting to make it a runtime setting, instead of compile time setting, it becomes quite trivial to change that.
Where this doesn't work so well is when using #ifdef to choose between two options where only one will compile, depending on the compiler or target platform — e.g. if one side is iOS code and the other Win32 code, it may not be worth (or possible) making the compiler accept both sides of the #if/#else. In those cases where you have platform specific code, generally you want to hide it from the generic code, abstract it, and put the platform specific implementations in their own separate files.

Tuesday, September 6, 2011

Hiring Staff

There is a subtle or perhaps not so subtle distinction between "hiring staff / adding people" and "building a team".

I'm sure this doesn't just apply to software development teams but a broad range of areas. I guess you can just get a bunch of brilliant people and throw them together and hope for the best. You might get lucky. What I'm about to say I think many people should get if they have experienced working in a few different teams, so this blog is just for those that aren't quite up to speed with the distinction between "building a team" and "adding some new staff".

If you think about winning sports teams, it's not entirely a sum of the talent of each individual of the team that determines the result (in team oriented sports), but rather the way they work together as a team combined with what talents they bring in combination. In all A-league teams, I'm not saying the players can be B or C grade players and the team can win, the players do need to be top class, but might not all be the best within that class if they work harmoniously with the group as a whole to bring about a win.

The criteria for hiring someone should include their ability to do the job they are hired to do. That much should be obvious. But when selecting between candidates that can fulfill the basic requirements, I believe one should look to the candidate's ability to blend and complement the existing team and to compare that between the candidates. (hopefully with any luck you will have more than one candidate that fulfills the basic requirements of doing the job).

So how to evaluate a candidate's ability to blend? Simply invite them to join in on a team activity. If the team has lunch together, invite them along to lunch. Ask them to sit in on a team meeting. Or if on Friday evenings the team gathers for snacks and a chat before heading home, time the interview to precede that activity and, if the candidate seems potentially worth hiring, ask whether they'd mind staying for it.

The idea is to disarm the candidate so they feel relaxed enough to reveal more of their real selves. Interviews are usually quite artificial situations; the idea is to step out of that and observe how the candidate interacts with others in a more natural, genuine setting. Later you can also ask the existing team what they think of the potential new person.

I can hear some saying or thinking, "but we don't have any activity like that, what do we do? Everyone has lunch separately and goes home early on a Friday evening." To which I would say: that doesn't sound like much of a team environment, and there are bigger problems to resolve before adding more people into the mix. Perhaps adding staff is an attempt to solve poor efficiency with more hands on deck, but adding people often causes initial inefficiencies and diminishing returns as the number of interacting staff increases. New hires drag down the existing team with questions and with the effort of getting them up to speed. Adjusting to a new team dynamic, upset by the change of staff, is another factor. And there is a diminishing return as the amount of communication grows rapidly (quadratically, in the number of pairwise channels) with the number of staff interacting.

To make the point plainly: if there were no diminishing return on adding people, a project that takes 5 people 12 months would take 10 people 6 months, 20 people 3 months, 40 people 1.5 months, 120 people 2 weeks, 240 people 1 week, 1280 people about 1 day, and so on. Unfortunately reality doesn't work that way. Coordinating over 1000 people to come together on a given day, dividing the task up so that each person does the right 0.1% of it without spending all their time communicating with the other 999, will take more than a day; probably much more than a day just figuring all that out. And that assumes a task that can be divided into 1000 independent little pieces, which is rarely if ever the case.

Building a team also involves getting the team to do things together. Perhaps in some work environments a team isn't necessary and the work really is just a set of individual efforts, though I can't think of any concrete example of a company formed from a set of non-cooperating individuals. It's not really a company: if everyone's job is unrelated to everyone else's, then surely they don't need to be members of one company and could each operate individually, e.g. as sole proprietors. But that sounds rather lonely. Positive social interactions are an important part of a job, of working with others, and of getting stuff done.

So what it comes down to is getting stuff done, and getting it done efficiently. When the team is working together like a well-oiled machine, stuff gets done and it's an enjoyable, positive environment. When there is no cohesion in the team, no one gets along and interactions are negative, it greatly impairs getting stuff done. That is the bottom line.

Usually the idea of adding more staff is to get more stuff done. If you agree with that statement, I can finish here for you. Sometimes, though, it's to build an empire for the aggrandizement of someone with ego issues, in which case "building a team" must seem a strange concept when "adding staff" more quickly fulfills the aim of building an empire. If this is you, keep reading. Even empires can crumble, particularly when there is rebellion; you still need relatively happy workers to build one. Most people prefer positive social interactions over negative ones, or over a prolonged lack of human interaction, so most people want to belong to a productive working team rather than be a minion in the service of an ego maniac who runs around pretending to look busy. Even if you are empire building, you need to create a cohesive group, infuse culture, create positive attitudes, and promote a rewarding, productive environment, all to keep the empire from rebellion.