A little while ago, following someone’s good advice, I produced a one-page flyer/data-sheet/brochure outlining how I can help OSS vendors and customers with their platform architecture designs. Check it out.
When offering my services as an independent consultant, it’s good to have a niche: something only I and a few other people can offer. A rare combination of more common skills is a good start, and in this case my proposition is my experience of OSS solutions and the implications for platform design, hardware, software, security and support.
Get your OSS platform wrong and it can cost a fortune and deliver terrible scalability. Get it right, and you can save money on hardware and licenses, and actually support your 250 thousand service transactions per hour.
That proposition is based on the presupposition that designing and managing your OSS platform is not just some generic IT and DBA activity. Do you think your network inventory database is a self-managing black-box? Do you think your OSS platform requirements are no different from billing or CRM? Is your hardware vendor trying to impress you with TPC-C benchmark results?
OSS systems are unique, and the reality is that few platform vendors, or even OSS vendors, have a great deal of experience of OSS-specific platform architecture issues.
Is OSS really special? Need some convincing? Here’s the Top Five OSS-specific platform ‘things’.
Relevance of Synthetic Benchmarks
If your OSS systems are Java-based and use a relational database, you may have some interest in the hardware vendor’s TPC-C and SPECj benchmark results. Both of these benchmarks provide some indication of how an OLTP type of application will perform on the specified hardware. They are, however, generic and synthetic; they are not based on OSS transactions. Compared with a typical OLTP application (an online shop, for example) OSS is distinct in having:
- A relatively small number of users (10s & 100s rather than 1000s & 10,000s)
- Relatively sophisticated user interaction with the system
- A high degree of complex automated processes being initiated on the server
- Complex objects defined by highly relational data
- Oh, I could go on and on….
The upshot of this is that OSS architects cannot use benchmark results as an input into analysing system performance requirements. Benchmarks are a reasonable indicator of performance relative to another vendor who ran the same benchmark. Such a comparison may be useful, but the best approach is to have a discussion with the hardware vendor to evaluate your OSS-specific needs. Which leads nicely on to….
What is a ‘Transaction’?
It is necessary to have a conversation with hardware vendors, and often software vendors, to work out how big a system is needed and how much money will be spent on CPU licenses. When quantifying the load the OSS system is going to put on the platform, it is essential that both parties are talking the same language. Things like ‘number of database transactions’ and ‘web pages served’ might come up. But these are not universal units that measure all computer systems.
An OSS database transaction is quite different from that of billing or an on-line shop: OSS data models are more complex, and a single business process will read and write far more data rows than those of other types of application. Similarly, OSS systems may return quite different web pages or rich web UIs. OSS deals in millions of object instances of many different types, while applying complex rules of interdependence between these objects. HR systems deal in just thousands of objects. Billing and CRM do have large data-sets, but relatively simple data structures. On-line shops, even at the scale of Amazon, have a few thousand product lines and millions of customers but, again, the data objects and relationships are much simpler than the model for something like a customer’s SLA-managed, asynchronous broadband service on an overbooked MPLS circuit.
Sizing for Busy-Hour Transactions
At the very start of the OSS design process it is uncommon for architects to be supplied with the data they need to estimate hardware requirements. I’ve seen many RFPs where the fundamental performance requirement is something like ‘support 2 million service creates and re-grades per week’. At best this is a useless KPI; at worst it is misleading, because some sales engineer is going to commit his company to supporting those 2 million transactions spread nice and evenly over 8 business hours per day, 5 days per week. In reality it is going to be lumpy. How lumpy is dictated by the OSS applications and the telco’s business practices. You need a hardware platform that will support your busiest hour of the week (or month/quarter/year if it’s really lumpy), and you need to look at the telco’s systems and business practices to provide a realistic estimate.
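To put rough numbers on the gap between a weekly total and a busy hour, here is a minimal sketch. The 2 million figure and the 8-hour, 5-day week come from the RFP example above; the lumpy hourly profile (one hour carrying 10% of the week's volume) is purely an assumption for illustration, as are the function names.

```python
# Illustrative only: the hourly profile below is invented, not measured data.
WEEKLY_TRANSACTIONS = 2_000_000      # the RFP figure quoted above
BUSINESS_HOURS_PER_WEEK = 8 * 5      # 8 hours per day, 5 days per week


def flat_hourly_rate(weekly_total: int, hours: int) -> float:
    """Hourly rate if the load were spread perfectly evenly."""
    return weekly_total / hours


def busy_hour_rate(weekly_total: int, hourly_shares: list[float]) -> float:
    """Rate in the busiest hour, given each hour's share of the weekly total."""
    if abs(sum(hourly_shares) - 1.0) > 1e-9:
        raise ValueError("hourly shares must sum to 1")
    return weekly_total * max(hourly_shares)


# Hypothetical week: one hour carries 10% of the weekly volume,
# and the remaining 39 hours share the other 90% evenly.
lumpy_week = [0.10] + [0.90 / 39] * 39

flat = flat_hourly_rate(WEEKLY_TRANSACTIONS, BUSINESS_HOURS_PER_WEEK)
peak = busy_hour_rate(WEEKLY_TRANSACTIONS, lumpy_week)

print(f"flat average: {flat:,.0f} per hour")
print(f"busy hour:    {peak:,.0f} per hour ({peak / flat:.1f}x the average)")
```

With these made-up shares the busy hour is about four times the flat average (roughly 200,000 versus 50,000 per hour). The real profile has to come from the telco's own order history and business calendar.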
Read-Only Versus Read-Write Usage Patterns
Most people assume a user with read-only access to a system is a ‘light’ user, producing less load on the system than a fully-fledged read-write user. That’s a false assumption for many OSS systems. In OSS, ‘write’ user activities, like manually designing a circuit or updating service records, are relatively slow, updating only a small number of objects. ‘Read’ activities, on the other hand, can be intensive, such as running routing reports across multiple network technologies or investigating root-causes of faults. Users running reports can result in database queries across the entire data set, spanning multiple objects. Do not underestimate the needs of your OSS read-only users.
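One way to sanity-check the ‘read-only means light’ assumption is to weight each user activity by the rows it touches rather than by whether it writes. The sketch below does that. Every activity figure in it is a hypothetical placeholder, not a measurement from any real OSS product, and the `Activity` class and `rows_by_kind` helper are names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    name: str
    kind: str                   # "read" or "write"
    rows_touched: int           # rows read or written per execution (assumed)
    runs_per_user_hour: float   # executions per user per hour (assumed)

    def rows_per_user_hour(self) -> float:
        return self.rows_touched * self.runs_per_user_hour


# Hypothetical figures for illustration; measure your own system before sizing.
ACTIVITIES = [
    Activity("design circuit manually", "write", 200, 4),
    Activity("update service record", "write", 20, 10),
    Activity("cross-technology routing report", "read", 500_000, 1),
    Activity("fault root-cause investigation", "read", 50_000, 2),
]


def rows_by_kind(activities):
    """Total rows touched per user-hour, split into read and write activity."""
    totals = {"read": 0.0, "write": 0.0}
    for activity in activities:
        totals[activity.kind] += activity.rows_per_user_hour()
    return totals


totals = rows_by_kind(ACTIVITIES)
print(f"write rows per user-hour: {totals['write']:,.0f}")
print(f"read rows per user-hour:  {totals['read']:,.0f}")
```

Rows touched is only a crude proxy for load, but even a crude proxy makes it obvious when a handful of reporting users outweigh a room full of people doing data entry.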
Shared Resources in the Network Model
In OSS, at the service or network inventory level, there will be resources that are either scarce or shared. DBAs see this all the time at the database and hardware level: database locks, shared memory space, ‘hot-spots’ of data on a particular hard disc, and so on. The source of these problems can be traced back to the application level. Take, for example, service provisioning. On day one your OSS system might be running perfectly, provisioning PSTN voice lines. On day two it grinds to a halt when you extend the model and processes to provision DSL. Why? Because suddenly interface ports in your network become a shared resource. PSTN terminating copper on an exchange has few, if any, shared ports being assigned to the service. DSL will have thousands of users being provisioned on the same data network interface ports on devices, gateways and content servers. The result is that provisioning processes become serialized, waiting on locked resources, and it doesn’t matter how many CPUs or disc arrays the platform has, because the enterprise-class system is now no more scalable than a desktop application.
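The serialization effect can be shown with a very small model. The sketch below assumes each provisioning job holds an exclusive lock on exactly one port for a fixed time, and it computes a best-case (lower bound) elapsed time for a batch of jobs. The job counts, port names, lock hold time and the `best_case_elapsed` function are all assumptions made for illustration; a real system will have more kinds of contention than this.

```python
import math
from collections import Counter


def best_case_elapsed(jobs, workers, hold_seconds):
    """Lower bound on elapsed time for a batch of provisioning jobs.

    jobs is a list of resource names, one per job. Each job holds an
    exclusive lock on its resource for hold_seconds, so jobs on the same
    resource must run one after another however many workers exist.
    """
    if not jobs:
        return 0.0
    busiest_resource = max(Counter(jobs).values())
    waves = max(math.ceil(len(jobs) / workers), busiest_resource)
    return waves * hold_seconds


HOLD_SECONDS = 0.5  # assumed lock hold time per job

# PSTN-like batch: 1,000 orders, each terminating on its own port.
pstn_jobs = [f"copper-port-{i}" for i in range(1000)]
# DSL-like batch: 1,000 orders funnelled through just 4 shared ports.
dsl_jobs = [f"dsl-port-{i % 4}" for i in range(1000)]

for workers in (8, 64):
    pstn = best_case_elapsed(pstn_jobs, workers, HOLD_SECONDS)
    dsl = best_case_elapsed(dsl_jobs, workers, HOLD_SECONDS)
    print(f"{workers:>2} workers: PSTN {pstn:6.1f}s, DSL {dsl:6.1f}s")
```

With these assumed figures, going from 8 to 64 workers cuts the PSTN batch from 62.5 to 8 seconds, while the DSL batch stays at 125 seconds in both cases: the busiest shared port sets the pace, not the hardware.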
There’s no theoretical ‘perfect’ database design. OSS scalability can only be assured by being aware of how a network is structured and how services consume the network, then designing the database and platform accordingly.