I for one am excited to hear about this move and was wondering if it would ever happen. This will help keep SQL Server alive and growing (I think) and who knows, maybe bits and pieces will even become open source some day (that's optimistic perhaps).
I am an equal opportunity consumer of database technology - something we all should be (e.g. use the right tool for the job when appropriate). I have used MANY different databases, open source and commercial, and there are better ones and worse ones across the spectrum. I think it is naive to criticize this move in any way - it is giving more choice so should be supported.
That said, SQL Server has a low barrier to entry, a very good implementation of the SQL language, an advanced optimizer, strong storage engine, management tools and a lot of nice BI features. Although you don't hear about it much, SQL Server also has Parallel Data Warehouse (PDW) which is their MPP database - ironically which used to be built using Ingres and Java of all things.
In my experience, SQL Server <-> PostgreSQL is the closest commercial to open source comparison (core database engine-wise) I can think of. This makes sense when you consider - both of them started as Ingres! Of course, Postgres and SQL Server have totally different user experiences (database management, tooling, etc.) and many different features.
Would you yourself actually choose SQL Server on Linux for a project?
I'm a big fan of SQL Server but the main thing that hurts it is the cost. Licensing our two servers today would cost a quarter of a million dollars. The OS cost is a rounding error. If I'm paying that much money for SQL Server I don't think I'd choose the Linux version for at least another 5 years from now. It needs to prove itself and there's almost no benefit to offset the risk.
I would never use MS SQL Server unless it is open source. Otherwise you get into vendor lock-in. Microsoft is cunning - they want to tie Linux-based businesses to their (expensive) products for good. But fortunately there is a viable open source alternative - PostgreSQL. http://www.pg-versus-ms.com
How are the clustering options coming along with PostgreSQL? The last time I looked into there weren't any multi-master solutions out of the box, but there were a bunch of separate implementations that mostly seemed to be half-baked.
It's my understanding that the Postgres team has decided to change their historical stance of considering clustering solutions outside the scope of the core project, but I'm unclear exactly how far that has come.
The variety of different 3rd party implementations in various states of maturity make this a bit tedious to sort through.
That said, I always pick PG for my personal projects. I just have a hard time getting sign-off on it from my corporate overlords for anything at work.
It's very simple. With FLOSS you are not tied to one software distributor and you can freely modify and distribute the software by yourself. It is not about support, it is about freedom to run, copy, distribute, study, change and improve the software.
The sad truth is that "anyone can" is frequently still "nobody did, so you're on your own". This well applies to even the most popular pieces of FLOSS.
That's still significantly better than "noone but a single vendor can and you're at their mercy", but just being FLOSS doesn't mean it can't suffer from product lock-in, just that with the hurdle being complexity and not copyright.
Ah, the lock is not something that check the license and tells "oh no you're not allowed". It's a combination of not having something, and complexity to add it. Sometimes, that's because you need something tricky. Sometimes, that's because the project lacks any documentation, is written in a pretty low-level (verbose) language that obscures the high-level picture, and it's hard to figure out how it works. Sometimes, that's because FLOSS != KISS, and there is a lot of FLOSS software that has its own unique proprietary protocols and data formats, and does things in a manner that's completely incompatible and non-interoperable with any other system. I have no clue about MySQL or PostgreSQL internals (actually, I heard a word they're good in this regard), but is the case for various FLOSS projects - I've dug into some and cursed their developers.
If you overcame that and made a patch, but it's not accepted by the upstream - there are maintenance costs. How long would you be able to maintain your own fork of a large software project, keeping up with the upstream? I've tried with a few small ones (say, at a small ISP company, I just needed some custom pppd patches for in-house billing logic), and found it to be not a pleasant experience. I believe it's not just because I'm a lazy ass - I saw tons of forks on GitHub that were slowly rotting away, maintained for some time but eventually forsaken by their authors and far behind their upstreams.
> how's your luck at debugging let alone getting a patch accepted in MS-SQL, for instance?
Duh, none or nearly none, of course.
I'm not saying that proprietary binary blobs are better. There are exceptions, but generally they're significantly worse in this regard.
I'm only saying that just because software is FLOSS doesn't mean one can successfully hack on it and adapt it to their needs, or easily migrate to other FLOSS solutions. Sometimes, they'll be stuck for a while - I think that's also can be considered as a lock-in.
> Nobody made their code complex because they felt like it.
Maybe. But there's a term "over-engineering" too. It's not that someone decides to make things unnecessarily complex, but sometimes they really do.
> Would you yourself actually choose SQL Server on Linux for a project?
It will significantly reduce the licensing implications of using SQL Server for a small project. I know people who use MSSQL in their day job but postgres/other for personal projects and one of the reasons is needing to license Windows if a personal project becomes public and needs hosting. This way a small project can use SQL Server Express on Linux just as cost free as progres (or mysql if you must) on Linux.
The choice of OS will no longer dictates the choice of DB, that decision can be entirely made on the needs of the project and its target users.
Small projects using Express edition may grow into larger ones needed standard/enterprise features, and people exposed to SQL Server through projects using it on Linux are less likely to dismiss ti later because the know postgres/other only. I'm not sure how much we'll see brand new larger projects use SQL Server on Linux any time soon though, but I imagine "get them in at the ground floor and create familiarity as early as possible" is the intention here much like cheap educational licensing for Windows, Office, & full VS and the availability of VS Express.
BizSpark is the way Microsoft encourages side projects/startups to use MS SQL, but yes, this removes a hurdle, and separating the choice of OS from the choice of DB is good.
That said, I use MS SQL both at work and on my personal projects thanks to BizSpark.
There's also Azure, which, to be fair, is still more expensive than an equivalent AWS stack with Postgre or something.
Many of the stability issues I've seen with SQL server have been the fault of the OS or Windows drivers
Conversations I had with some senior SQL server team members in the last two years suggested they were ready for no reboot patching but were limited by the OS and felt it was holding them back. If it wasn't for Windows, SQL server uptime would be much better (was the implication)
I think this will be a great option for existing SQL Server users wanting to move to Linux. Looks like MSFT is slowly giving up on Windows Server to some extent. They might be seeing more money in the Cloud business compared to the Server OS business.
Or rather hope or predict to see more money. There is a lot of wishful thinking involved in the current shift to the cloud, both from providers and their customers.
> I think it is naive to criticize this move in any way - it is giving more choice so should be supported.
It's naive to support it just because of the assumption that more choice must be good.
On the contrary the past few years on Windows have been full of data corruption bugs in SQL 2012 and 2014; and while 2016 looks amazing with new features Microsoft has continued to push ahead instead of just - you know - fixing the fucking bugs we all keep reporting and upvoting on Connect and getting ignored.
More choice is IMHO not a good thing. It's a bad thing. In the same way people who "multi task" are more often than not churning out shit work and running on luck - I'd prefer Microsoft focus on what they know best.
You think that allowing corporations to move their SQL server investment off of Windows and onto Linux as a precursor to moving off of SQL server, rather than forcing them to change DBMS and OS at the same time is bad? That's what choice means.
Those who are moving off of MS SQL are "lost customers" and I don't think MS would be spending extra resources on facilitating their transition. More likely they had customers willing to switch to/stay on MS SQL but needing Linux for some unrelated reasons.
While I do think this is a good thing, I've run very good MySQL servers (well, in so far as MySQL is ever very good) on Windows, so it was already possible to do that kind of migration.
Coming from mostly a Linux/MySQL background, I've had reason to use MSSQL for the last few years. It's really solid, and seems to optimize queries well, which is good, because there's a few things that I still find really annoying, such as no GROUP_CONCAT. There is a more general method to do the same thing, but wow is that method cumbersome in comparison (although undoubtedly more powerful).
There are a few obvious cases which don't work (e.g. scalar UDFs aren't inlined - always use table UDFs), but the performance battle Oracle and Microsoft had circa 2000 really shows. The vast majority of the time you can code your intent and the query optimizer will simply do the right thing.
The main reason I would choose SQL over any of the competition is DataDude/VSSQL. We started using it where I work (~5100 schema objects) and it's revolutionary. T-SQL becomes a first-class language in Visual Studio:
- Full editor integration: intellisense and errors as you type.
- Build system: build (including static analysis) and deploy from Visual Studio. MSBuild-based integration for CI.
- Schema delta: you write/update your schema as though you are writing it for the first time and, as part of the build, the ALTER script will be generated for you. You rarely have to write migration scripts yourself (I've only had to when migrating data across columns/tables).
I have been thinking about solving this for PostgreSQL because I'm now completely unable to work without it - I just wish I had the time. Any other form of SQL development now feels like VBScript development in Notepad.
Links to more information about this? I've found some information about VSDBPro but it all is dated 2008. I like the sounds of this, but it seems dated...
Update: Seems like this has become SQL Server Data Tools (but not the BI tools of the same name?). I'll have to look into this more.
As far as I remember it has only ever been optional in VS2010. It was introduced by default in 2013 (.dbproj) and then reworked in 2015 (.sqlproj). All you need to get this going is: Visual Studio > File > New > Project > Other Languages > SQL Server.
You can also get it to reverse-engineer an existing database into a project[2].
Getting it to work for CI isn't turn-key. When you build a SQL/DB project it results in both a CREATE/ALTER script (depending on whether you do a diff build or a CREATE build) as well as a schema file (.dbschema). You need to track previous versions of the dbschema yourself (we use branching to track this file across releases), so that it can it has the base for diffing during the build.
It's a good idea to add a database reference to [sys] and [master] in all your projects (which is not done by default).
Knowing the right keywords I Googled a bit for you:
This is also my big complaint when it comes to MS SQL -- there has been precious little development on their SQL language. It's almost like they're in a complete feature-freeze when it comes to the SQL language.
MS SQL is missing some very basic SQL functionality.
Microsoft rewrote the code so there was no more Sybase code, but it still followed Transact SQL standards. T-SQL is not as complete as PL/SQL and others. It is just that Microsoft is that 800 pound gorilla so people use their products.
My last job I did migration from Excel and MS-Access to SQL Server using VB 6.0 code and ADO recordsets to copy and sync up data. They key was to use a column named date modified that had a time stamp when it was last modified to figure out which record was the newest and back up the old record to a history table in case it needed to be reverted later on.
Oracle IS the gorilla, but with their recent pricing/licensing moves, more large orgs are exploring moving to SQL Server, especially ones that are already a Microsoft shop. My company is doing just that.
Date handling, simple things like postgres date_trunc('month', now()) and especially looping over records in stored procedures without using cursors (that are prone to crashes).
I work with both on a daily basis and there is a mile of difference. Yes I can accomplish whatever I want in sql server but there are so many small things that could improve.
Some of other pet peevees are:
You can't use a lot of left joins, it slows down the query dramatically. If there is only one to one relation, use subselects instead.
Thou shall not do queries like this: select foo into baz from bar where fooid not in (select fooid from bar)
They are the death of the engine.
SELECT DATEPART(month,GETUTCDATE()) ? Either you are using a very old version or not using SQL Server at all cause I pretty much do everything you are complaining about on a daily basis and on a very large set of data.
Your query only extracts the month integer from a date, it does not floor it to the month. The equivalent for sql server is this ugly thing: select dateadd(month,datediff(month,0,getdate()),0)
We are on SQL server 2014. I have had a query go from 2-3 hours down to 7 minutes by replacing left joins with subselects within the query for instance (and yes all the indices are in place as recommended by sql server query planer)
The one with a insert based on a query on the same table I have never managed to run at all and I have to use a temporary table as an intermediary instead.
I have seen Sql server query planer do stupid shit too. Almost always Statistics where to blame. Are you sure those where up to date? For some reason the "auto update statistics" functionality doesn't seem to work reliably, you have to update them in a maintenance job to fix this.
There's not really a standard anymore for UPSERT to be added to. SQL has fragmented and since the last few iterations of the standard allowed for "additional non standard features", adding an UPSERT is perfectly compliant with the SQL standards.
And it's very useful in a ton of situations which is why so many other SQL dialects have added it. It's not like SQL Server is completely standards compliant so I'd rather have the functionality then have to use MERGE because it's "standard".
Agreed on GROUP_CONCAT. MSSQL has been missing that for a long time and it is a very reasonable and handy function to have.
It sounds like you are talking about the XML conversion method for concatenating groups. There is one other way around it using custom .NET CLR functions, but it is definitely cumbersome as well.
Yes, I'm referring to the XML conversion method. I'm aware you can register your own .NET CLR functions, but truth be told, it's a database for our POS application (which we purchased from a vendor), so I try to mess with the database as little as possible beyond reading the data through our own internal application so we can do useful things with it. I try to stay away from messing with the schema or even providing extra routines (I can't be sure anything would survive upgrades, which are regular and I don't want to deal with that).
MS SQL Server was based on Sybase 4.2 as I recall, which is the Sybase release that predated introduction of SMP support. (Internally known as the "boat anchor" though I doubt that name appeared on the contract to sell the code.) Microsoft basically rewrote it, so I would guess there is little original Sybase code left. You can still see the the heritage clearly in features like T-SQL dialect and the Tabular Data Stream wire protocol, both of which were developed at Sybase.
Sybase is not in anyway related to Ingres that I know of other than the fact that Bob Epstein and Mike Stonebraker worked on the Ingres project at Cal. Sybase in fact was considerably different from Ingres in many ways such as use of SQL instead of QUEL, network access to the SQL Server, and specialized features like stored procedures. The Sybase founders did not come directly from Cal or even Ingres but rather from Britton Lee.
Intellectually, somewhat, perhaps. Sybase was a distinct implementation. When it came on the market, Ingres IIRC was still based on QUEL. Sybase had its new T-SQL language and a novel design that made it feasible to use for OLTP. At that time, Ingres and Oracle were usually devoted to decision support, and OLTP was backed by things like VSAM.
You are actually not far off. Sybase was founded by Bob Epstein, who was Stonebraker's student at Berkeley and helped build the original INGRES system.
I too am excited, as I was faced with the challenge of building a rails app with a SQL server database backend. Sure there is a solid adapter, but it wasn't perfect, and we couldn't effectively run tests on it.
We don't really care that much about the OS -- never did, never will. A windows license is very cheap, and we only need maybe 20 windows licenses to run the whole network.
The much bigger cost are SQL Server licenses, for which we care very much about!
All in all we'll keep on using SQL, but on the most performant platform -- my bet is that it's going to be Windows for a while though.
Regarding .Net Core: the platform is not ready yet (e.g. lack of support for Security IIRC), but we'll adopt it when it is -- again on the fastest platform.
Both in the case of SQL Server and .Net core, we are working publicly on GitHub with Microsoft by testing and providing advice where needed to make sure the next version of our ecosystem is the best possible.
Their web/service tier are still windows, doesn't seem that likely that they'd rewrite them in .netcore fast enough to be off windows completely in the near future.
SQL Server RDS is built on Windows. Albeit a scripted to the 9's adaptation to make it act more Linux like (e.g., RDS can't connect to your AD, so they had to make some sacrifices).
The big difference with RDS is that you don't get the full gambit of clustering capabilities that SQL server supports. This is mainly due to the fact they don't have a shared storage model that SQL cluster failover requires.
EBS is certainly capable of implementing the shared storage that the database requires, and the quorum driver for wolfpack ought to implementable without a physical scsi device backing it.
At least until recently the MS SQL Server clustering was based on mirroring, which is capable but deprecated in future SQL Server releases in favor of AlwaysOn clustering.
The syntax of MSSQL is very behind the competition and still based on Sybase. Even basic string functions aren't available or have weird arguments.
If you aren't locked into their eco-system stay clear away from it. Otherwise you will regret it, as the licensing costs are very expensive and per CPU core (think Oracle). Better switch to MySQL or Postgres or Lucene, etc.
I am an equal opportunity consumer of database technology - something we all should be (e.g. use the right tool for the job when appropriate). I have used MANY different databases, open source and commercial, and there are better ones and worse ones across the spectrum. I think it is naive to criticize this move in any way - it is giving more choice so should be supported.
That said, SQL Server has a low barrier to entry, a very good implementation of the SQL language, an advanced optimizer, strong storage engine, management tools and a lot of nice BI features. Although you don't hear about it much, SQL Server also has Parallel Data Warehouse (PDW) which is their MPP database - ironically which used to be built using Ingres and Java of all things.
In my experience, SQL Server <-> PostgreSQL is the closest commercial to open source comparison (core database engine-wise) I can think of. This makes sense when you consider - both of them started as Ingres! Of course, Postgres and SQL Server have totally different user experiences (database management, tooling, etc.) and many different features.