This wasn't so much about fine-tuning SQL Server itself as about breaking traditional thinking. The five rules he presents at the start are all replaced by the end.
- Everybody's the DBA
- Do what it takes to get what you want
- Tune later, cache & separate now
- NewEgg your way out of problems
- Share for great good
For each point, he argues why the old rule no longer applies and what the new solution is.
I felt a lot of the presentation was about tuning SQL Server without tuning SQL Server: caching, leaving full-text search to Apache Lucene (because it isn't really relational querying), and using SSDs to speed up performance without having to touch any code.
Yep, absolutely, you nailed it. There's a gazillion presentations out there about tuning databases, but that only takes you so far. I wanted to show that you need to take a step back before you go into query tuning details.
I had no idea StackOverflow was using Microsoft SQL Server. I guess it's just a learned response, but I've come to expect any large networked site to be using a non-Microsoft solution.
I haven't had a chance to watch the video, but I hope to later on. What's interesting, even from the link alone, is that MS SQL is a viable solution for a website like StackOverflow.
We have been using SQL Server in our own company for our projects, and I was really starting to get annoyed with it. I found it to be heavy on resources, slow to respond, and let's not forget the cost. I just completed a project that I had been working on for the last four months, and the majority of the work was within SQL Server. One thing I learned was that it's actually quite a powerful beast.
When used correctly, SQL Server is a very capable solution. I'm glad that we decided to stick with it. There is a lot I learned about SQL Server in the last four months that I had no idea it was capable of.
Big enterprises (banks, insurance companies) basically use one of two options most of the time: Oracle or SQL Server.
It only seems otherwise when you read sites like Hacker News, where most of the posters aren't working in big environments. Facebook is a large-scale operation that uses neither, but their data management and caching are so bad that nobody should be holding them up as a best practice.
Google, Craigslist, Twitter, Wikipedia, YouTube and Netflix are just a few examples that don't use Oracle or MS SQL Server either.
It's not about being big; banks and insurance companies simply have different requirements for how they access and store their data than most websites and services do.
Granted (except Google) - but Twitter, Wikipedia and Craigslist are only big in terms of traffic. Most banks push more data through their networks than most developers can conceive.
If you took all of the votes from the 2000 election in the United States (I'll save you the search - it was just over 100 million), it wouldn't even equal one day's ATM transactions in the U.S., and those transactions are a nonevent.
We all know that a lot of data gets pushed through banks' networks, but your ATM example may not be a very good one. The transactions you are talking about do not occur in a single bank.
Twitter is not only big in terms of traffic. 200 million tweets per day is not small, even if a tweet involves less processing than an ATM transaction.
You can also consider Facebook as an example of a website dealing with big data.
This is correct; it's the heavily regulated companies that have a hard time choosing a NoSQL solution. The flexibility of SQL means you can create a report at any time, which is necessary when the government comes knocking, wanting some specific data pronto.
It's not just regulation. It's a need to report at a granular level and slice data as many ways as possible. That's why Essbase is so popular, even though it's so poorly written.
Oracle, Sybase, DB2, SQL Server and Informix. The last company I worked for had all of them.
- Oracle and SQL Server: chosen by us
- Sybase and Informix: chosen implicitly by choosing applications that preferred them[1]
- DB2: chosen by a company that we acquired
Oh and some MySQL and SQLite too.
[1] Yes they ran on Oracle too - but when 98% of a vendor's customers are on Informix or whatever, sometimes it's just less painful to go with the flow, so you have access to that community.
> I found it to be heavy on resources, slow to respond, and let's not forget the cost
What are you comparing it to?
Admittedly I am biased, since I'm a SQL Server DBA by day, but I have tried several other options and think SQL Server tends to stack up quite well.
I have found it generally more user-friendly, easier to work with, and cheaper than Oracle (though Oracle does seem to have an advantage in certain types of partitioning). I rather like MySQL for certain types of projects, but generally find SQL Server easier to maintain for large projects.
I have only dabbled with NoSQL options, but my general opinion is that for certain problem sets they are great. However, when ACID is even remotely desirable they are not an option, and for certain other tasks they are less desirable.
So, I think that which type of database you use depends largely on the project, but SQL Server tends to stand up quite well for a wide array of projects.
I can't watch the video right now, but the idea that "Everybody's the DBA" is risky in general. I'm hugely in favor of developers writing and optimizing their own SQL, being able to create a normalized schema (and knowing when it's worth the tradeoff to denormalize), knowing how to read a query plan, and generally being as competent as a DBA. But... it's good to have one person who has the global view of the database for things like tuning extent sizes, selecting the optimal types of storage for various partitions, doing reviews on the schema, capacity planning, etc.
The right setup (IMO) of having a DBA in an operational role with developers that are highly proficient/self-sufficient is hard to get right and expensive enough that it probably isn't right for an early stage company. And a bad DBA can be a nightmare. So there are tradeoffs on both sides.
It's about moral authority. The only people touching production should be those whose pagers go off at 3am if it all goes hatstand. Everyone else is the peanut gallery.
Are you sure? Amazon is getting flak for that lately (in e.g. Yegge's rant), from having all of its developers on-call. You should be around to fix something you push if it breaks things, but that's not the same as being on-call at all hours for any given coworker's or manager's standard of "going hatstand".
>You should be around to fix something you push if it breaks things
Which has another advantage over Amazon's pager policy: if you are up at 3am, only semiconscious, you are likely to be more useful fixing code that you broke (because you touched it in the last day or two) than some other coworker's mistake.
Agreed - that is a general rule too. E.g. those that do the actual work make the final call on which technology, language, database, whatever to use - not some "architect".
I think that with a modern database like SQL Server, the old-school "high priest" DBA is obsolete.
But... if you don't have someone dedicated to thinking about database issues, you need to treat database changes just like your code. It needs to be in a repository, it needs to be reviewed, and you need a change management regime.
From an anecdotal POV, I've noticed that many folks have a good process (or at least a consensus approach) for managing their code... but the database is often a red-headed stepchild that doesn't get the attention it deserves.
> but the database is often a red-headed stepchild that doesn't get the attention it deserves
On my last three major projects, I have committed to devoting the proper level of attention to the database, with automatically built databases in some environments, scripted schema changes and seed data loading as part of the mainline code base, etc., and I have found the difference to be immense in practical terms. I can move more quickly, more safely, and have a better quality of life as a developer.
One of the best returns on (effort) invested I have ever seen.
I definitely agree. We have been using Fluent Migrator (https://github.com/schambers/fluentmigrator) for our .NET-based projects at work for database migrations and versioning.
Thanks. Fluent Migrator looks pretty cool. I've been rolling my own stuff as-needed. As a result, it has evolved into something pretty useful for the problems I've encountered so far, but does nothing outside of what I've already thought of. I can see using the migrator as a good jumping off point.
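The roll-your-own approach described above (scripted schema changes applied in order, with the applied versions tracked) can be sketched minimally. This is an illustrative Python outline, not FluentMigrator's actual design; the version-keyed scripts and the `execute` callback are assumptions:

```python
# Minimal migration-runner sketch: apply numbered scripts in version order
# and record which versions have run, so any environment's database can be
# rebuilt from the mainline code base. Re-running is a no-op.

def pending_migrations(available, applied):
    """Return (version, script) pairs not yet applied, lowest version first."""
    return [(v, s) for v, s in sorted(available.items()) if v not in applied]

def run_migrations(available, applied, execute):
    """Run each pending script via `execute`, then mark its version applied."""
    for version, script in pending_migrations(available, applied):
        execute(script)
        applied.add(version)

# Example: scripts keyed by version number (T-SQL here is illustrative).
scripts = {
    1: "CREATE TABLE users (id INT PRIMARY KEY, name NVARCHAR(100));",
    2: "ALTER TABLE users ADD email NVARCHAR(255);",
}
applied_versions = set()
executed = []
run_migrations(scripts, applied_versions, executed.append)
# Both scripts ran once, in version order; running again applies nothing new.
```

In a real setup `applied` would live in a version table in the database itself and `execute` would send the script over a connection, but the ordering-and-bookkeeping core is this small.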
Sure, I wasn't suggesting a high priest. Maybe more of a 'faculty advisor' DBA? Just someone keeping an eye on the higher-level problems (having the right Nagios alerts set up, keeping an eye on how long reindexing is taking and scheduling downtime accordingly, disaster recovery planning, putting in the POs for new drives). Even if you treat queries and schema changes as part of the code, it's still a dynamic system. But I completely agree on treating DB changes like code changes; I think DevOps is a good approach.
You'll want to watch the video, because I explain that in more detail. Of course I'm not against change control or security, but if you want to scale a database, everybody involved has to have DBA-level skills. StackExchange's sysadmins and developers are all keenly aware of the database impact of what they do, and they aggressively try to minimize that impact. That's what I talk about on the video when I say "Everybody's the DBA." It's like saying, "Only you can prevent forest fires."
What do you consider DBA level skills? Because you mention in one case people using SELECT * where they only needed a small subset of the fields. I'm all for your message that everyone should be aware of the database, but I don't consider that a DBA level mistake. I've just worked with some astonishingly good DBAs before, and I while I'm not a slouch it would be impossible to catch up with them without years of intensive study.
Anyway, I'm splitting hairs. Your presentation was great, and the people at SO are very impressive overall. Thanks for posting it.
I consider DBA-level skills as the ability to read an execution plan, figure out what indexes are being used, know the difference between a good plan and a bad one, and figure out how to rewrite your query to get the performance & plan you want without just adding indexes willy-nilly. Another good example would be knowing when it's right to normalize tables versus flattening out the data structures - you can't just knee-jerk and normalize everything because it's the best practice.
I wouldn't say that developers need to know about HA/DR, replication, memory tuning, SAN setups, etc, though. Just the how-queries-work and how-data-is-stored angles.
Thanks for the compliments! I had fun doing that one.
I sincerely doubt that the top poster has Twitter-level problems or resources. He may have a Rails site that's crapping out under load and needs resources for optimizing both application performance and database access.
"Just move everything to Java" is a meaningless answer. It doesn't help him at all, not unless he's got a year and change to retrain/rehire his engineers and completely rebuild his application.
That's not really a fair comparison, however. There's a non-negligible difference between just crunching messages and distributing those 6k messages/sec into a directed graph of users. The 6k messages/sec is just the input; the output is far greater.
It comes down to two things: economics and computer science, in that order.
For most companies, diving deep into your data persistence layer is probably not worth it in the beginning. Your devs need to work on features. (MS SQL does pretty well untuned.) This is the economics part.
The computer science part comes in when you are doing enough traffic that 200 hours of dev time toward a 10% performance improvement becomes good economics. Then you dig into the data layer (and every other layer) and start counting milliseconds.
Which is what we did at Stack O by using Dapper and renting Brent Ozar. :)
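The economics argument above is easy to make concrete with back-of-the-envelope numbers. All figures here are made up for illustration (the thread doesn't give real costs):

```python
# Break-even calculation for the "200 hours toward a 10% improvement" case:
# optimization pays off once the infrastructure it saves outweighs the
# engineering time spent. Every number below is an assumption.
dev_hours = 200
dev_rate = 100                    # $/hour, assumed
optimization_cost = dev_hours * dev_rate            # $20,000 of dev time

servers = 40                      # fleet size, assumed
cost_per_server_month = 500       # $/server/month, assumed
improvement = 0.10                # 10% less capacity needed
monthly_savings = servers * cost_per_server_month * improvement  # $2,000/mo

break_even_months = optimization_cost / monthly_savings
print(break_even_months)          # 10.0 months to break even
```

At 40 servers this pays for itself within a year; run the same numbers with a 4-server fleet and the break-even stretches to 100 months, which is the "not worth it in the beginning" half of the argument.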