I was writing the first of my TrueNAS API Integration posts when I wrote a statement that hit me:
“It is critically important that any API keys in files have absolute minimum permissions. You are storing a password to your storage system on a server. I’ll say that I feel like this integration is taken for granted in a lot of systems, even at an enterprise level. An increase in the available surface area is unavoidable when automating. A robust set of user roles as above is a fantastic idea. I personally would strongly advise 600 permissions on the API key file, which restricts access to the file’s owner alone.”
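That advice can be enforced in the automation itself. A minimal sketch, assuming a plain key file on a POSIX host; the helper name and path handling are mine, not anything from TrueNAS:

```python
import os
import stat

def read_api_key(path):
    """Refuse to use an API key file unless it is locked down to the owner.

    This is an illustrative helper: it rejects any file whose group or
    world permission bits are set (i.e. anything looser than 0600).
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        # Group or world bits are set: others on the host can read the key.
        raise PermissionError(f"{path} has mode {oct(mode)}; expected 0o600")
    with open(path) as f:
        return f.read().strip()
```

Failing loudly here is deliberate: a script that silently uses a world-readable key hides exactly the problem the quote warns about.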
This may seem like common sense, but it brings up an angle of computers and integration that is often taken for granted.
People in information security would likely say that, for best security, permissions should be the minimum a person needs to do their job effectively. This is a valid perspective to a point, but the question here is the opposite side of the coin: “Is there sufficient value in having automated or integrated storage provisioning to be worth the risk of the increased surface area this feature exposes us to?”
The “third platform” (the first and second being mainframe and open systems/personal computing respectively) relies heavily on integrations. With automated technologies making disposable infrastructure abstractions a thing, these integrations are unavoidable.
For infrastructure on the second platform, this is in many cases optional. Many shops I have worked in didn’t create many VMware datastores after the initial build-out; a datastore with tens of terabytes of capacity typically holds a lot of VMs.
For argument’s sake, let’s say an admin spends an hour a month creating a few datastores, and the integration cuts that down to ten minutes or so. For some shops even that is probably generous, and the real savings are smaller. It also ignores the time taken to install, configure, and integrate such systems.
What’s the cost? Simply that a person who has access to VMware has access to your storage system, with whatever privileges you assign to them. The question then becomes this: is saving a few hours a year worth the risk?
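The arithmetic is worth making concrete. Using the illustrative figures from above (these are the post’s hypotheticals, not measurements), the savings work out to an upper bound of about ten hours a year, before subtracting setup and maintenance time:

```python
# Illustrative figures from the text, not measured data.
manual_minutes_per_month = 60      # admin creating datastores by hand
automated_minutes_per_month = 10   # the same work via the integration

# Best-case annual saving, ignoring install/configure/integrate time.
saved_hours_per_year = (manual_minutes_per_month - automated_minutes_per_month) * 12 / 60
print(saved_hours_per_year)
```

Even this best case is a small number to weigh against a standing path from the hypervisor into the storage system.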
This value equation can change based on factors other than provisioning. As an example, the VMware Virtual Storage Integrator (VSI) plugin is required for Site Recovery Manager to function correctly. On certain storage platforms it may be preferable to take snapshots via the array, which is generally superior to taking them at the VMware level. Another possibility is an unconventional system that is extremely difficult to back up, so snapshots are taken, exported to a secondary location, and the backup is run from there.
Another factor is structural and social. Is the VMware team separate from the storage team? If they are separate, how much autonomy does each team get? Does the business model mandate integrations such as chargeback reporting? What do processes look like, are they actually followed, and how hard are they to change? Some of these items can make an integration a requirement or a non-starter, and they determine its scope. Strong business cases are critical.
Some guidelines that can help track events and contain trouble:
- Work with security and architecture teams to decide if the value of increased access is worth the increased surface area for potential attacks.
- Clearly state, document, and understand what the objectives of an integration are.
- Be careful about where API keys are stored and what permissions they have.
- Implement a means of rotating API keys easily as needed:
  - Store keys in a secrets repository.
  - Script or automate key rotation.
- Use one API user/key for one function or set of functions. This can assist troubleshooting as well as security.
- Use granular controls where possible to limit function and scope to the least required. This will vary from vendor to vendor.
- Good audit paths should be considered an absolute requirement for API integrations with critical systems like storage. Any action taken on a storage array should be logged in a way that is readily tracked.
- Management network isolation should be standard practice.
- The decision to automate from a host should be made carefully. The host should only be used for management work, or work adjacent to management: for example, a vCenter Server with a VSI plug-in, or a database server calling snapshots for the volumes it is backing up.
- The use of secure protocols to interact with APIs should be mandatory. A network being isolated can make people too comfortable with management traffic traversing it.
- Use of array features such as immutable snapshots and “recycle bins” in order to prevent rogue or accidental deletion.
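The rotation guideline above can be sketched generically. Nothing below is a real TrueNAS or vault API; the three callables are hypothetical stand-ins for whatever your platform actually provides:

```python
def rotate_key(create_key, store_key, revoke_key, old_key_id):
    """Generic create-then-revoke key rotation.

    create_key() -> (key_id, secret)   issue a new key on the array
    store_key(secret)                  persist it (vault, 0600 file, ...)
    revoke_key(key_id)                 invalidate the old key, last
    All three callables are stand-ins for a real platform's calls.
    """
    new_id, secret = create_key()
    store_key(secret)       # consumers pick up the new key first,
    revoke_key(old_key_id)  # only then is the old one cut off
    return new_id
```

Creating before revoking avoids a window in which the automation holds no valid key. Pairing this with one key per function (as above) also means a rotation only ever disturbs a single integration.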
These points should be taken holistically. The idea is a layered system with multiple safety measures that prevent a compromise and minimize the damage if one occurs.
To conclude, is there a right answer? In many cases, probably not. Engineering is the art of compromise, and at times pragmatism forces us to make decisions that are “less bad” or otherwise imperfect. In the world we live in, integrations are inevitable and essential to everyday function.