Posted by Aidan, 20
We're often asked by entrepreneurs which language their startups should adopt when developing their technology. Attending events around London, or talking with some of the development agencies present at the events, one might hear a startling set of opinions. PHP is insecure? Ruby on Rails doesn't scale? It can be difficult -- without experience building software -- to make an informed choice as to which is most suitable. Technology enthusiasts tend to get tangled up in enthusiasm for their chosen platforms, so we'd like to provide an objective summary.
But wait
Not all startups are alike. Some are more technical, or more mature than others. Whilst this guidance should be of interest to technical startups, those with capable senior staff should be able to get it from them. Businesses with more complex needs or pressures may use multi-platform solutions. In particular, they may use service-oriented architecture, or SOA. This can be used to spread some of the platform risk a business might face.
What to avoid
Fortunately for entrepreneurs, there are only a few technology language choices that are likely to be fundamentally "wrong". ASP (or Visual Basic) does not have a justifiable use, other than by companies needing to service existing ASP technology. Microsoft replaced ASP with the .NET platform more than five years ago, and even at the height of its popularity ASP was wholly inadequate. ColdFusion, one of the few languages to have an "is this language dead?" page, can -- despite the occasional new release from Adobe -- be considered a dead language, with all that this entails (difficulty finding and retaining good developers, vendor lock-in, horrendous cost, poor support, porting nightmares, infrastructure constraints). ColdFusion occupied a niche for rapid web development since the late '90s, and its legacy consists of around ten very vocal agencies and developers, some of whom are moving slowly to .NET.
Languages like Tcl -- which powered Vignette's CMS products in the dotcom heyday -- are now rather unusual in web development. They're not entirely without use or benefit, but like Lisp, Flash or Silverlight, they're often not the best choice. There's nothing inherently wrong with Flash or Silverlight, but most often their use should be considered only for incremental interface enhancements to a system built on an established web framework.
It's worth noting that technically capable startups may use esoteric languages with a great deal of success. 37signals repopularised Ruby this way, and last.fm have been beavering away with Haskell. These languages work well for the companies because they are solidly technical, and are able to kickstart their own frameworks or adapt other projects. Many startups do not have this luxury, and although they've been very successful, 37signals' David Heinemeier Hansson was quite indulgent and not at all risk averse in choosing to single-handedly reinvent a language.
Language doesn't matter
Complex web applications can be built in just about any programming language, and it's not really the language that is so important as the frameworks that are available. Despite what some vendors may say, all languages scale, and it's hard to argue that one language is fundamentally less secure than another. One can, however, make these arguments about different frameworks. Development frameworks are toolkits to guide developers in building particular pieces of software. Critically, they provide implicit (and sometimes explicit) structure to what the developers are developing, and will allow the team to use pre-built blocks of functionality. No company, in any industry, should be building functionality without recourse to significant chunks of other people's frameworks. (The common argument used to be "this isn't rocket science, it's likely someone's already built at least part of what we're doing before". However, nowadays even the rocket scientists at NASA are using external frameworks.)
But frameworks really do matter
Above all, in order to make an informed choice of platform, consideration must be given to the framework. Why would one choose to develop with a particular framework over any other? Consider these eight areas:
There are five or so fairly common languages that startup companies might use to address their needs. These are Java (from Sun), .NET (from Microsoft), and three Open Source options: PHP, Python and Ruby. We’ve tried to convey a flavour of each language as concisely as possible.
Java stands alone from the four others. It -- along with Perl, ASP and a host of proprietary platforms -- powered many of the enterprise sites of the late nineties and around the turn of the century. It's a powerful object oriented language, and is effectively Open Source. Java can be used in both Microsoft and Open Source environments. As it presents a relatively high barrier to entry in terms of technical complexity, Java developers tend to be more capable (and expensive) than many developers who use simpler scripting languages. However, Java is not a particularly productive language to work with, and many elements of functionality can be delivered more rapidly in the other languages. Java developers pioneered some of the best early web frameworks and tools: J2EE, Spring, Eclipse, Hibernate, Ant, Struts, Tomcat. Java has a fair amount of syntax in common with some of the .NET languages, which were designed by Microsoft to counter Java's success. Java is a great language for enterprises and is commonly used by many big dotcoms, banks and airlines, but its relatively slow development pace and higher barrier to entry make it inappropriate for most early-stage businesses.
PHP is a widely-used Open Source web scripting language. Many small to medium size sites use it, alongside a few massive ones such as Facebook. Of the languages considered here, PHP has the lowest barrier to entry, which has some big drawbacks. First of all, PHP powers a number of content management systems (CMS) – such as Joomla! and Drupal – which are often confused for frameworks. It’s possible to get very simple sites up and running very quickly with CMSes, but often businesses can get caught up trying to extend them. Having a site built on a CMS platform may make sense for the first iteration, but beware of falling into trying to upgrade the modified CMS to support a complex site. It won’t, or at least, it won’t work well and be a robust, fast and high-quality solution. (This isn’t to say PHP doesn’t have frameworks – it has many, from Symfony, CakePHP and Zend to a host of others.) Secondly, the ease of use of the language means that many of the programmers and agencies available are, at best, hopeless. PHP programmers are often unfairly stigmatised by colleagues who work with more complex languages, and whilst there are some excellent ones out there, many more are poor. Grand Master Programmer theory states that around 1 in 20 (or 5%) of programmers are super programmers, and in “easy” languages this number is even lower. Hiring a good team of PHP developers is neither likely nor easy for the non-technical. A team of poor developers won’t just take longer to deliver a solution: they’ll build something which is insecure and complex, and subsequent modifications will become increasingly costly with time.
The .NET framework is a set of proprietary languages (most prominently C#, VisualBasic.NET and ASP.NET) developed by Microsoft. Microsoft made a half-hearted attempt to “open” the languages by publishing an ECMA standards document, and there is an Open Source implementation (Mono) available, but for the most part development with .NET requires use of Microsoft operating systems and licenses. This isn’t a discussion of the merits of Closed vs. Open Source, but it’s worth noting that each Microsoft server costs around $1,000 to license, and their database software costs between £2,000 - £16,000 per CPU (most servers have two to four) to license. These are usually insignificant amounts once a business is proven but can seriously eat into seed or first round capital. (Microsoft have a Bizspark programme to provide these products freely for the first three years of a startup’s life.) The .NET languages, and particularly VisualBasic and ASP.NET, have a barrier to entry only slightly higher than that of PHP. That, and the fact that they represent “point and click computing”, has resulted in an industry segment full of sub-par programmers. As with PHP, there’s much to be wary of when hiring Microsoft programmers. .NET powers sites of all sizes (though relatively few massive ones) and is very common. .NET has some CMS frameworks such as dotnetnuke and Community Server which often distract startups. Most startups don't have "legacy applications" to deal with, but .NET can be particularly easy to integrate with older Windows applications and services, and is an easier migration choice for older "Microsoft" code. Reincubate have been called in to rescue twice as many project disasters on .NET as on any other platform.
Ruby on Rails is a combination of framework and language, with Ruby being the language and Rails being the framework. Ruby was considered a dead language to all but some system administrators when it was revived in 2003 by 37signals to develop their award-winning productivity startup Basecamp, and then taken on to power Twitter. Their lead developer built the first parts of the Rails framework before throwing it open to the Open Source community for further work, and in doing so the first true Web 2.0 framework was built. With a mid-level barrier to entry and a fan-base of intelligent, capable, multi-skilled developers, Rails represents an excellent choice for a startup. The project is mature enough and well documented enough to have attracted developers seeking refuge from other, more tiresome languages. Some purists consider it a joy to program in, and anecdotally it appears to be the platform of choice for many of the hot American companies on TechCrunch. Rails is a web framework at heart, and is less suitable for non-web projects. Being so fashionable, there are many developers both on and off-shore (increasing rapidly) but finding available resource can sometimes be tricky.
Finally, Python’s Django framework has catapulted it into serious contention in web development. Python is an Open Source language (with a philosophy behind it) long-beloved by Object Orientation enthusiasts, system administrators and Google, and has had a number of different web frameworks – like Zope – before. However, it’s only really with the appearance of Django over the last few years that it’s been more widely used by start-ups. Describing itself as a framework "for perfectionists with deadlines", Django is comparable only to Rails in its tight focus on delivering functionality very quickly for Web 2.0 web applications. Whilst technically there are some interesting differentiators between Django and Rails, there are only a few key differences worth considering. Django is much newer and has a more immature codebase. This means there’s far less supporting documentation, there are far fewer experienced developers, and the framework is more likely to change as it gets away from version one. Such is its impressiveness, however, that Google released Django as the first supported framework on their App Engine cloud computing platform, and a number of Rails developers are starting to transition over. Central to Django are the principles of DRY (“Don’t repeat yourself”) and Python’s lack of TIMTOWTDI (“there’s more than one way to do it”). Combined with a medium barrier to entry, these help less experienced developers get it right -- the first time around -- in a way which PHP frameworks cannot.
At a glance
Being one of the few expert companies assisting early-stage companies with their technology strategy and platform, we’re particularly well exposed to platform disasters through the rescue projects we’ve embarked on. In fairness, early-stage business failure caused solely by language or platform choice is rare, but it's often a significant contributory factor.
From what we see, one of the biggest technology risks is where companies take on agencies or developers who have built their own, idiosyncratic frameworks, and we see these tripping up companies time and time again. Seed money gets invested on something that shouldn’t be in use, rather than something provided by an off-the-shelf framework which won’t tie the business to a particular agency. We’ve written previously on the other technical challenges start-ups will face when working with agencies or developers.
We value our independence and do not endorse any agencies or developers. However, we will publish a short round up of resources and guideline rates in a follow-up to this article.
A guiding principle
It is impossible to give general advice on which languages are better than others, and this post makes a number of generalisations. An expert in a particular language or framework will usually be more productive than a non-expert in another language. Whatever the choice, it's important that the experts aren't using the platform in an idiosyncratic manner.
It never hurts to focus on simplicity. A business should own as little technology as is suitable to fulfil its strategy. This should translate to every variable one can use to measure technology: number of servers, number of licenses, lines of code owned, technical staff. The less a business owns, the easier it is to change and grow. Other considerations aside, the framework that's most suitable for expressing the business' logic as simply as possible is often the right choice.
Posted by Aidan, 30
It's been a busy August for us, working on a number of new and existing projects, and a series of investor code reviews. We plan to start publicising one project in particular soon, helping people with their day-to-day management of IT. We put together a short survey which readers may be interested to complete if they have five minutes.
As usual, most of the development work we're doing at the moment is in Django, and we've recently switched to using mod_wsgi. In putting together our Apache and wsgi configuration, we found that neither mod_wsgi's nor Django's wsgi installation instructions contained a truly portable configuration include. We came up with the script below. It's far from perfect but it does the job for now:
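:::python
# Assumes this is in the project root, parallel to settings.py, etc.
# We'd typically have our 'docroot' folder one level down for security reasons.
import os
import sys
import django.core.handlers.wsgi
# Resolve absolute paths so the include works from any working directory.
project = os.path.dirname(os.path.abspath(__file__))
workspace = os.path.dirname(project)
# Make both the project package and its parent folder importable.
sys.path.append(project)
sys.path.append(workspace)
# Derive the settings module from the project folder name (no hard-coded '/').
os.environ['DJANGO_SETTINGS_MODULE'] = '%s.settings' % os.path.basename(project)
application = django.core.handlers.wsgi.WSGIHandler()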
Happy hacking!
Posted by Aidan, 22
Last week's Google Developer Day was a huge amount of fun. Particularly interesting was Mano Marks' codelab session on Google App Engine.
I'd played about with App Engine before, but it was only when going off on a tangent and exploring the codelab example code that I started to get a feel for some of the application growth / scalability limitations that the framework poses at the moment. (In the course of the event I managed to find two bugs in App Engine, and then a third bug in Google Chrome when trying to submit the issue.)
Updating domain models through the Django-esque framework can be fraught, lovely though it is. Removing or modifying properties can be done silently, which is fair enough, but records are only updated at write-time. That is to say, if a property is removed, each object must be read and then persisted before the change will be propagated across all data. Changing a property type is even more fraught, as a new temporary property must be added, filled with data from the old, the old removed, and then the data cast and moved back. Google recommend that you fetch no more than several hundred records at a time when running these processes, partly because App Engine may injudiciously kill lengthy requests but, more importantly, because their fetch offset method is inefficient and unreliable with an offset over 1,000. Principally, this is because the fetch method queries the whole dataset when run, and only then returns the limited data specified in the parameters.
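One way to avoid large offsets is to page on the record key, asking for "the next batch after the last key seen" each time. The following is a minimal Python sketch of that idea, assuming an in-memory list as a stand-in for the datastore. `RECORDS`, `fetch_after` and `migrate_all` are hypothetical names for illustration, not App Engine API calls.

```python
from bisect import bisect_right

# Hypothetical stand-in for a datastore: records held in key order.
RECORDS = [(key, {"name": "record-%d" % key}) for key in range(1, 2501)]
KEYS = [key for key, _ in RECORDS]


def fetch_after(last_key, limit):
    """Return up to `limit` records whose key is greater than `last_key`."""
    start = bisect_right(KEYS, last_key)
    return RECORDS[start:start + limit]


def migrate_all(batch_size=200):
    """Visit every record in key-ordered batches, never using an offset."""
    last_key = 0  # assumes all keys are positive integers
    batches = 0
    touched = 0
    while True:
        batch = fetch_after(last_key, batch_size)
        if not batch:
            break
        for _key, record in batch:
            record["migrated"] = True  # stands in for read, change, persist
            touched += 1
        last_key = batch[-1][0]  # resume from the last key we handled
        batches += 1
    return batches, touched
```

Each batch is a small, bounded query, so no single request has to walk past thousands of rows it will then discard.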
Simply put, it's expensive to page through all data without querying specifically for each chunk, as the underlying mechanism to return data doesn't distinguish between domain and relational data. Having built an ORM framework in the past, we have some opinions on the matter. Google's solution (when it arrives, see the PDF on their thoughts on scaling) will differ, not least as they have some very different considerations for App Engine. Here's a snippet from the FAQ for our LPC framework software, which describes our own solution to the problem when we encountered it in 2000:
Differentiation between domain and relational data
The LothianProductionsCommon (LPC) framework provides highly scalable search and retrieval functionality. Key to this is the underlying differentiation between domain object data representing Cacheable objects and domain object relational data representing their relationships.
Need for flexible primary keys
Cacheable objects must have a simple primary unique key (Id), and the framework assumes that this can be represented by a long integer in most cases (domains where objects cannot be keyed by a numeric ID can be supported through use of the MathHelper's BaseDecToBaseN methods -- if implemented at factory level this allows for IDs to use a 37-byte alphanumeric base, though this approach is highly undesirable).
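As an illustration of mapping non-numeric keys onto integers, here is a minimal Python sketch of base-N conversion. It is not the LPC's MathHelper code; the function names and the 36-character alphabet (digits followed by lowercase letters) are assumptions made for the example.

```python
import string

# Assumed alphabet: digits then lowercase letters, giving base 36.
ALPHABET = string.digits + string.ascii_lowercase


def base_n_to_dec(text, alphabet=ALPHABET):
    """Interpret `text` as a number written in the given alphabet."""
    base = len(alphabet)
    value = 0
    for char in text:
        value = value * base + alphabet.index(char)
    return value


def dec_to_base_n(value, alphabet=ALPHABET):
    """Write a non-negative integer using the given alphabet."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while value:
        value, remainder = divmod(value, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))
```

Note that leading "zero" characters are lost on a round trip ("0a" and "a" both map to 10), which is one reason a scheme like this is best kept as a last resort.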
Partial and complete cache-assisted domain object retrieval
Where an object's primary key is known, it can be fetched simply by using the appropriate broker's Get or Gets method (users should note that polymorphism is not exploited by the broker templates as it cannot be modelled in WSDL when exposing brokers to SOAP requests). This will retrieve the objects required by primary key, and can pass through the cache on an object-by-object basis, optimising requests for groups of objects that are only partially cached.
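The partially cached case can be sketched in a few lines of Python. This is an illustrative model only, not the LPC broker API: the `Broker` class, its `loads` log and the `load_from_store` callback are hypothetical names.

```python
class Broker:
    """Fetches objects by primary key, consulting a cache per object."""

    def __init__(self, load_many):
        self._load_many = load_many  # callable: list of ids -> {id: object}
        self._cache = {}
        self.loads = []  # every batch of ids sent to the data provider

    def gets(self, ids):
        # Only the ids that are not yet cached go to the data provider.
        missing = [i for i in dict.fromkeys(ids) if i not in self._cache]
        if missing:
            self.loads.append(missing)
            self._cache.update(self._load_many(missing))
        return [self._cache[i] for i in ids if i in self._cache]

    def get(self, object_id):
        found = self.gets([object_id])
        return found[0] if found else None


# Hypothetical data provider for the example.
STORE = {i: {"id": i, "name": "object-%d" % i} for i in range(1, 6)}


def load_from_store(ids):
    return {i: STORE[i] for i in ids if i in STORE}


broker = Broker(load_from_store)
broker.gets([1, 2])       # nothing cached yet: loads 1 and 2
broker.gets([2, 3, 4])    # 2 is cached: only 3 and 4 are loaded
```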
Get and Gets calls the only way to retrieve objects
This Get or Gets broker call is the only way to retrieve domain objects, and it will reference the underlying application configuration, bringing out the data manipulation language (DML). When using SQL, for instance, the DML may read as:
SELECT id, name, description, modified FROM test WHERE {id}
Scheme-based optimisation of Gets calls
The factories will generate valid DML from this template string. In the case of SQL queries, there are various layers of optimisation determined by the ConnectionScheme being used. Firstly, when using Gets to retrieve multiple domain objects from a database, the scheme will decide whether using OR or IN statements is most effective for the given database type.
SELECT id, name, colour FROM test WHERE id = 1 OR id = 2
SELECT id, name, colour FROM test WHERE id IN (1, 2)
The scheme will also split large Gets calls where required by the underlying data provider. For instance, if a Gets call is querying for 40,000 objects and the data provider only permits 15,000 bound variables per statement, the scheme will transparently split the query into three and merge the results. Additionally, if the data provider has a query length limit -- for example, of 2,000 characters per query -- the scheme can also transparently split and remerge the query.
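A rough Python sketch of this kind of statement generation follows. It is not the LPC ConnectionScheme: `build_gets_statements`, `max_bound` and `prefer_in` are hypothetical names, and for brevity the ids are written into the SQL text (after coercion to integers) rather than passed as bound variables, which real code should prefer.

```python
def build_gets_statements(template, ids, max_bound=15000, prefer_in=True):
    """Expand the {id} placeholder into one statement per chunk of ids."""
    statements = []
    for start in range(0, len(ids), max_bound):
        chunk = [int(i) for i in ids[start:start + max_bound]]
        if prefer_in:
            clause = "id IN (%s)" % ", ".join(str(i) for i in chunk)
        else:
            clause = " OR ".join("id = %d" % i for i in chunk)
        statements.append(template.replace("{id}", clause))
    return statements


TEMPLATE = "SELECT id, name, colour FROM test WHERE {id}"

print(build_gets_statements(TEMPLATE, [1, 2]))
print(build_gets_statements(TEMPLATE, [1, 2], prefer_in=False))
```

The caller would run each returned statement in turn and merge the rows. A query length limit could be handled in the same loop by closing a chunk early once the statement text grows too long.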
Why separate relationship resolution?
Consider the following queries run sequentially on a static dataset:
SELECT id, name, colour FROM table WHERE colour IN ('red', 'blue')
SELECT id, name, colour FROM table WHERE colour IN ('red')
Running and caching results of both could lead to an unnecessary connection and query being made, as well as the unnecessary transfer of data from data provider to application server, and the unnecessary use of cache space for what are duplicate results.
By splitting out the relational information in the queries, the LPC has an opportunity to cache data more effectively with minimal overhead. For instance, the following query will return only the primary key of the data to be retrieved.
SELECT id FROM table WHERE colour IN ('red', 'blue')
= 1, 2, 3, 4, 5
These IDs can then be loaded if not cached, and cached appropriately.
SELECT id, name, colour FROM test WHERE id IN (1, 2, 3, 4, 5)
It is likely the next query will contain duplicate IDs.
SELECT id FROM table WHERE colour IN ('red')
= 1, 2, 3
And in this case it will turn out that we have already retrieved and cached objects #1, #2 and #3. In this particular case we can see that the obviated Gets query does not look that costly -- but what if 99% of Gets would be looking for object #2, and what if each object has many large fields?
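The two-step flow above can be tried end to end with Python's built-in sqlite3 module. This is a sketch under assumed sample data, not LPC code: the rows, the `resolve` and `gets` helpers and the `fetched` log are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, colour TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    (1, "apple", "red"), (2, "cherry", "red"), (3, "brick", "red"),
    (4, "sky", "blue"), (5, "sea", "blue"), (6, "leaf", "green"),
])

cache = {}    # id -> full row
fetched = []  # ids actually loaded from the table


def resolve(colours):
    """Relationship query: return primary keys only."""
    marks = ", ".join("?" for _ in colours)
    query = "SELECT id FROM test WHERE colour IN (%s) ORDER BY id" % marks
    return [row[0] for row in conn.execute(query, colours)]


def gets(ids):
    """Load full rows by key, skipping anything already cached."""
    missing = [i for i in ids if i not in cache]
    if missing:
        marks = ", ".join("?" for _ in missing)
        query = "SELECT id, name, colour FROM test WHERE id IN (%s) ORDER BY id" % marks
        for row in conn.execute(query, missing):
            cache[row[0]] = row
            fetched.append(row[0])
    return [cache[i] for i in ids]


first = gets(resolve(["red", "blue"]))   # loads objects 1 to 5
second = gets(resolve(["red"]))          # served entirely from the cache
```

After both calls, `fetched` holds five ids: the second search cost one small key-only query and no object loads.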
Managing relationships
Having established that domain objects can only be retrieved by a Get or Gets call, and that they can only be retrieved by a unique primary key, the next step is to define a search framework to resolve the relationship between terms as above like "colour IN ('red', 'blue')" and a set of results as seen, like "1, 2, 3, 4, 5". In addition to this, it is clear that as well as caching domain objects, being able to cache relationship resolution queries and their answers would be beneficial.
SearchCriteria objects contain collections of SearchTerms in the LPC, allowing users to build simple queries as above, where "colour IS red OR blue", and more complex queries with multiple sort orders, fuzzy matching and layers of Boolean logic. SearchCriterias are also sensibly used to key the cache of resolved relationship queries.
When handed to the GetRelationship call, the criteria will be dynamically transformed and interpreted to suit the data provider being used. This allows for dynamic changes in data provider type, and portability -- the LPC can be moved from a MySQL back end to an Oracle one despite substantial SQL format differences, or even backed onto a set of XML files with a SQL to XQL transform. GetRelationship will return a Relationship object, which provides a primitive ordered list of IDs and accompanying lists of relevance scores (used when the underlying query engine can provide such information) and modification timestamps, as well as the SearchCriteria used to generate the relationship. The modification timestamps are used by the caching mechanism to assist with object expiry. If we can see in the relationship query that the object has been modified since it was last cached, we know the cache can be invalidated for that particular object.
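To make the caching roles concrete, here is a small Python sketch of criteria used as cache keys and of timestamp-based expiry. The class and field names echo the text, but the shapes are assumptions rather than the LPC's actual types, and relevance scores are left out for brevity.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SearchTerm:
    field: str
    values: Tuple[str, ...]


@dataclass(frozen=True)
class SearchCriteria:
    terms: Tuple[SearchTerm, ...]


@dataclass(frozen=True)
class Relationship:
    criteria: SearchCriteria
    ids: Tuple[int, ...]
    modified: Tuple[int, ...]  # one modification timestamp per id, same order


def stale_ids(relationship, cached_at):
    """Return cached ids whose copy predates the reported modification."""
    return [
        object_id
        for object_id, modified in zip(relationship.ids, relationship.modified)
        if object_id in cached_at and cached_at[object_id] < modified
    ]


# Frozen dataclasses are hashable, so equal criteria find the same cache entry.
criteria = SearchCriteria((SearchTerm("colour", ("red", "blue")),))
relationship = Relationship(criteria, ids=(1, 2, 3), modified=(100, 250, 100))
relationship_cache = {criteria: relationship}

same_query = SearchCriteria((SearchTerm("colour", ("red", "blue")),))
cached_at = {1: 200, 2: 200}  # objects 1 and 2 were cached at time 200
```

Here `stale_ids(relationship_cache[same_query], cached_at)` returns `[2]`: object 2 was modified at 250, after it was cached, so only that entry needs to be invalidated.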
Start to finish, retrieving domain objects from a query