To Normalize, or Not To Normalize

Database Denormalization
 

Why You Shouldn’t Always Be Normal

Before you read on and jump quickly to scream foul, please make sure you realize that I write this with some humor, to try and get at the point that “In certain situations, de-normalization is good!” I am not suggesting that we throw away normalization!  Are you kidding me?  Might as well go play Russian Roulette!  However, I think this is an interesting trend among large Web 2.0 companies that are heavy on read, and low on writing.  With that said as my preface, read on and joy.

I’m just going through the process of designing a database architecture for a Web 2.0 site we will be launching in a few months, and the first thing the database architect talked to me about at a review was “Don’t worry, this design your about to see is going to rock, we have really normalized the data, its hot!”.  I was happy he said this, but at the same time, it got me thinking about a lot of things.  Synapses started flipping through my brain as they do at night when I can’t sleep (probably due to the Starbucks I was sipping on).

See, I have never seen eye to eye with database designers on their “Normalization” Zen.  When I first started getting into databases and teaching database design and architecture, I always thought to myself “Why all this normalization?”.  I know, I’m treading into dangerous territory here, but I feel it has to be said.  What is dangerous about this topic?  I guess the fact that I think we should NOT normalize data for the most part in a Web 2.0 world.   Inexperienced DBA’s I guess will find this completely incorrect and question my brain capacity, but I would bet experienced DBA’s after thinking about what I’m saying, would agree with me.  I’ll tell you why…

The Argument Pro Normalization

I won’t go into too much detail, because all of you know that a proper database is normalized.  My guess is at least 80% of web sites out there are running on normalized relational databases.  In the current world of relational databases, we have all been taught that you must normalize your data to ensure data integrity by minimizing duplicate data.  By doing this, you are ensuring that no application can write data in one location, and not updating another location, and creating two different records, for the same data!  This is supposed to also simplify software development, by putting all the data integrity logic at the database layer.  You can user stored procedures, views, joins, and have fairly applications that rely on the database to keep the data integrity.

The Argument Against Normalization

How many freaking joins do you have to do to get all the data you want?  When I first started my career, I could write ASP pages amazingly fast, with simple databases in the back end.  As I started learning more and reading on “Normalizing” your data, I started feeling crappy about some of my designs.  I didn’t use any joins, didn’t really have any transactions, and had very few tables.  But, I wrote applications very fast, got results, and never really had any speed issues.  I never really had issues bad data either, but whatever.

I Learned to Normalize

I started writing more “enterprise” applications, and had to do more joins, left, outer joins, had to worry about transactions, and had layers upon layers of data access to go through in order to add a new customer.  I was told that “It’s the databases job to maintain data integrity”.  Ok it all made sense, I did this.  My code became way more complex, my database designs became way more complex, and you couldn’t put bad data into the database.  Hot!  But, it took FOREVER to write the applications to put the data into the database.  Not only that, but now, if I needed to scale the application, I couldn’t simply add more web servers!  I now had to worry about database replication, transactions, and indexes.  I soon realized that doing joins took an INSANE amount of processing power.

Normalization Is Slow To Design

Are you still with me?  Splitting up your data into small chunks and insuring that no data was stored twice, made writing even the simplest web forms super complicated.  Sure I could call a stored proc, but if it wasn’t written, I had to get the DBA to write it, or write it myself.  If I had to get new data in a certain way, I had to create views, which I was told was fast!  Not (Ok, getting better… but still).  For one simple form, or simple display page, you could have an INSANE amount of joins!

Normalized Applications Are Slow On A Huge Scale

Why couldn’t I just have a table, with all the data I needed for that page?  Then for other pages, I could have the same data duplicated!  This would actually take stress away from the JOIN, and just let me get data from either location.  When an update was done, say the user updates his address, it would be updated in two locations.  Is that all that bad?  I don’t think it is.  Just add some constraints, and make sure the application layer follows the constraints.

If Web 2.0 is all about sorting data, indexing our knowledge, sharing our knowledge, speed in development, speed in viewing Web 2.0 applications, and scaling your sites quickly should you’re site become the next YouTube, I challenge you to do this successfully with a normalized database full of joins, stored procs, views, and transactions.  You might do it, but you will need an army of rocket scientists to do it.

Final Thoughts

Do whatever works for your environment.  In an enterprise environment where you are working with financials and maybe have 1,000 people entering data and reading data, by all means, normalize your data.  If you are writing an application in the web world, where one day you might have 100 users, and a month later, 10,000,000 users, you better have a quick way of scaling!  If you have kept your database de-normalized, and just have tables, and your database is basically a storage area, and has no business logic whatsoever, you will be able to quickly scale your web application by adding servers, replicating code across your web farm, and even putting in cool appliances like BIG-IP in place to cache data and load balance your web farm.

Good luck doing this in a normalized database.  You will be able to, you will just be crying the whole way.

So, What Should We Do?

If you are doing more writing than reading, stick to normalized databases.  If you are doing more reading than writing, still normalize… but then denormalize your data!  Look at the web forms and pages you will have, and see what data will always be displayed, and if you have 18 tables needed to display a users profile, denormalize the data!  Imagine what would happen if you have 10,000,000 users reading profiles and running joins on 18 tables.

One of the Reasons for Web 1.0 Failure

I would argue that a big reason Web 1.0 companies failed was because you had an army of people getting money from VC’s, getting expensive DBA’s, armies worth, to write complex normalized databases, then getting web guys to create the application layers for these complex databases.  Why is this the reason for failure?  You had slow performing sites that didn’t scale.  You had tons of bugs, bad performance, and sites and ideas that never even launched because they were “in development” forever!  In Web 2.0 you have things like Ruby on Rails, AJAX, and companies like eBay that rely on denormalized databases to scale.  You have good looking, fast performing, scalable web sites that are built fast, and not on overly normalized databases.

If you have read this far, I should share some sites with you that give me some legs to stand on for my arguments:

Let the comment bashing on denormalization begin!

 

  • Amazed by lack of experience

    What this post shows is that either:

    1- You are under the influence of some toxic drug.

    2- You are anticipating April’s fools day.

    3- You never have been involved in complex applications.

    Normalization is not necessary for your six entity website. Notice that I say six entities, not six tables.

    Now, just try to not normalize with 120 different entities. Let me assure you that the data integrity issues will bite you back much quicker than the speed of development you assume as a benefit of not having to think about how to do joins properly.

    It becomes insane, but it seems that your experiences with unskilled DBAs and developers have made you deaf to the arguments of the experts.

    By any means, please normalize. There ARE however places where not to normalize is acceptable. Those multi gigabyte read-only data warehouses usually have a good excuse for denormalization, because once data is written there it will never change. However, you’re not advocating that.

    For those that follow this advice, you’ll get what you deserve.

  • Evan Bosscher

    Wow, I’m reading this with a mix of admiration and trepidation. It’s true that denormalized databases reduce the load when querying vast amounts of data, and that it should be done when your application is doing more reading then writing (especially on web applications). My concern is that to properly denormalize a database you should actually do the normalization FIRST, and then denormalize (otherwise they wouldn’t call it denormalization)

    My only fear with this article is that it will encourage junior programmers and/or DBAs to start thinking that the database design doesn’t really matter, and that they won’t do the normalization up front (on paper) before attempting to denormalize.

    I would however disagree with the previous comment, as I don’t believe I am currently on any toxic drugs, April Fools day is 221 days away, and I have written enterprise level applications.

    They key to understanding when to denormalize is to look at the purpose of your data, and how the user / application will interact with it (as stated in the article)

  • seanyboy

    The issue you discuss isn’t normalization, it’s how you do your joins.
    If you’re trying to find a Name (for e.g.) from an ID, you can either …

    - Use an SQL Join.
    - Ask a separate object or function within your code for the name associated with that ID.
    - Populate the database with the name as well as the ID.

    You’re right that the third option isn’t normalised, but as long as the calculated data can be rebuilt, there isn’t a problem. It’s no different to using a cache or a database index, both of which are well known techniques that a pedant would argue also break normalisation.

    So, anyway, I’d argue that the problems you describe come from using too many SQL Join statements. Personally, I think that normalised tables should be linked together using code. It’s faster, and ultimately, it doesn’t create those monster SQL statements that can kill servers.

  • http://www.realsoftwaredevelopment.com Miguel Carrasco

    Hi Amazed by lack of experience.

    I actually agree with you, if read the article in detail, you would see this is what I said in my conclusion.

    I did have to “over” stress a few points in order to make my point however.

    What I was trying to get at is that 90% of the DBA’s on the planet will normalize everything, no matter what it is, and not think about the complexity for the simplicity of the application.

    Experienced DBA’s will agree with what you said, and with the posting.

    It’s just as my conclusion said, large read only databases should be un-normalized (but still normalized to some degree), and databases with a lot of writing (normalized). Since I was speaking specifically about Web 2.0, large databases with lots of reads and small writing, I suggest normalization is the way to go.

    Thanks for your comments! I would have been disappointed without it.

  • http://www.i9sys.com André Luis

    I don’t agree and I don’t disagree your post. But I have read that the Flickr architecture lies on doing joins in the application layer (PHP) and the queries sent to the database are only simple selects without complex joints, so they can scale.
    As others have said, in your post, the problem is not the normalization but the joins. So the Flickr’s solution could help.

  • http://www.realsoftwaredevelopment.com Miguel Carrasco

    Hi Andre, you are correct, the Joins are just so brutal. I still like the article, however I think my mistake was trying to over simplify and purposely try and “stir the pot” on bashing normalization, to make a point and make some people think, and create some discussion.

    Thanks for your comments!

  • Troy Cox

    Using plain asp, you likely weren’t in a strictly OOP environment nor extremely concerned with re-usable, maintainable code, as much as you wanted to do so. Writing that kind of code had me mistaken about normalization as well. The difficulty stems from the available database tools as much as anything. Our ISeries programmers have a lot of difficulting accepting normalization because they are used to humanly looking at a record as a whole and programming visually with a large flat file, seeing meaningful keys, and wanting no constraints. To them, object oriented programming is a ridiculous waste. They create extremely brittle code so tied to the data as to be absolutely unusable in any other circumstance, coded like speghetti code in a linear, difficult to follow style that breaks 100 things when one tries to change one thing.
    OOP in C# and well formed data are a marriage made in heaven, both for data integrity, and maintainable, reusable code. Flat data makes creating good objects difficult and throws getting collections right out the window.
    Using materialized data can most always handle normal speed issues.
    Programming an API, as one has to do to create objects against data that doesn’t match the application’s objects, creates fragile code and should only be used against legacy systems and their data, or partial and incomplete data.
    If you can’t do it yourself, use a good O/R mapper. The classes you get and the full power of a relational database structure will change your opinion in this article.
    All that said, Google style programming is done to answer extremely NOT common data issues. Don’t promote craziness until craziness is warranted. Such circumstances for data are so un-common as to be practically only academic.
    Also, let me say, data can be OVER normalized. Persue good 3NF and you will be happy with your results. DBA’s who work in a vacuum from the application developers, quite frankly, cause a great deal of the kinds of problems you faced to bring you to your article–usually poorly designed data objects from the start.
    Finally, coding without matching indexes will always leave your application stuck in the mud.
    Finally, finally, the ONE place you forget to update your duplicated data in an unnormalized database is the one place that will cause you the most grief–everytime:)

  • Pingback: Normalizar o No Normalizar | Deambulando

  • Pingback: Julian Jewel’s Weblog » Blog Archive » Database War Stories

  • Tom

    Good article.

    I was CTO of an internet company that got VC funding and some hot young whipersnapper thought we should re-write all of our code because our database was not as normalized as "school book material". Anyways management got rid of me and followed the young kid's advice and now they are out of money with not much to show for about 1.5 years of $3M investment….

  • http://www.solowatches.com/ solo

    good post,i like it

  • karthick

    can any one help to create a normalized database now
    my query is:
    customer table (cust_id,password,cust_name,street,city,ph_no),alloted(job_id,cust_id,tech_id,date),tech(tech_id,tech_name,password,field,desc,city,ph_no,wage)schedule(sch_id,tech_id,date,status)user(user_id,password,role) while normalising use all columns in each table

  • zara

    Cheap Louis Vuitton provides our customers with 100% satisfaction service and welcomes any excellent advices brought by our customers. You could find the several of new things with top quality which display in your eyes in Louis Vuitton Outlet,

  • zara

    Cheap Louis Vuitton provides our customers with 100% satisfaction service and welcomes any excellent advices brought by our customers. You could find the several of new things with top quality which display in your eyes in Louis Vuitton Outlet,

  • green

    1. Almost all the small commodities factories of china yiwu market suppliers have selling shops in Yiwu market. 2. There are many buyers yiwu wholesale market who are purchasing in Yiwu market in every country.

    Yiwu agent,professional Yiwu Agent,Yiwu buying agent yiwu suppliers,Yiwu source agent,Yiwu jewelry agent,help you import yiwu manufacturer goods from China very easy.

  • http://www.coachbagsonlinesale.com/ coach small bags

    The coach is located in the brand discount of all over the world, in many places the store chain.coach outlet stores online they sell the same coach, handbag, wallet and other with incredible low-cost products.coach handbags coach brand discount store provide direct selling the project, the lady design with very low price to buy the opportunity. They have many different design, style, color, texture and size options for the save a lot of money. coach outletcoach products is one of the famous brand garment industry to seek after the brand discount brand and provide a fascinating accessories, need not spend too much.Coach necklaces most of the time middle class belongs to the rich and famous from the kind of social classes of female people just use the design is the handbag. Usually, the woman in the higher social just extraordinarily brave wallet and delicate coach handbags.coach small bags but now, the best coach products belong to women's many fantasies are available.

  • zara
  • http://www.guccihandbagscheap.org Cheap Gucci Handbag

    We authorize Cheap Gucci Handbag online retailer, gucci handbags let you become the focus of world attention, or company to put forward the proposed Gucci Handbags Sale fashion,. All the products should be here discount gucci handbags is free on the boat, you can find the latest style on the site, enjoy big discount, we each month of Gucci Messenger Bags product promotion, just to save our customers more money, Gucci Joy not including tax much cheaper, almost 60% of the local store, many customers. Gucci Jackie welcome you to choose goods Gucci Boston of love.

  • dashuang

    Cet été, Faye Wong portant la préparation de ce modèles lac bleu le sac louis vuitton descendus dans les rues, couplé avec un bleu simple tee-shirt et la lumière bleue de cloche fond le style de jeans semblait plein de confort. sac louis vuitton classique est son faible luxe, style design n'est pas démagogique mais vaut plut?t noble. Cette année, cependant, photographié haute Faye Wong a aussi des lunettes de Longchamp modèles chauds de cette année. Il semble que Faye Wong est vraiment la marque sac a main vuitton .Grands sacoche louis vuitton tissés en noir plus d'ambiance, sans cuir souple tissé de retour sur le corps. Faye Wong avec un grand style noir longue tranchée manteau couplé avec un chapeau rond look très mignon.

  • dashuang

    La marque prix sac longchamp ,tous les produits du sac longchamp sont la meilleure sélection de matériaux avancés, en se concentrant sur décoratif détail, exquis, avec son excellente qualité et a gagné une bonne réputation. Le sac le pliage longchamp noir pleine de saveur rétro. Calme, vêtu d'un simple effleurement du couplée avec un prix longchamp noir mystérieux Longchamp, un style classique qui a toujours sorte de manquer la saveur. Confortable T-shirt et short accompagné d'une paire de bottes de beau, grande silhouette avec Faye Wong comment tout cela semble bon tempérament.

  • dashuang

    sac longchamp 2011 automne et en hiver, Longchamp – le sac longchamp série des sacs sont l'expression intemporelle. La forme classique est donné des sac longchamp pliage de matières différentes, montrant pas le même sens de la mode moderne: comparer cuir rose jeune, provocateur, plein de vitalité; l'encre bleue de la texture pattern serpent difficiles et atmosphère luxueuse; sac longchamp moins cher avec sombres veau mature est distribuée constante, subtile discrète élégance.

  • usaman01
  • yabin

    Chanel outlet is dedicated to provide you the Chanel online, Chanel online store, Chanel Bags online all over the world with EMS, free shipping. To secure your order we adopt online credit card payment and for each items sold, 14 days money back guarantee is made.

  • http://www.borselouisvuitton-italy.com borse louis vuitton

    infatti, ci propone sempre delle borse louis vuitton decisamente interessanti, che vengono scelte ogni giorno dalle celebrities più in vista, ma anche dalle fashion addicted che non vedono l’ora di sfoggiare uno dei must have del fashion brand. Brand che con questa louis vuitton sito ufficiale ci propone un look assolutamente delizioso. borse vuitton– semplice ma chic al tempo stesso, per essere glamour in ogni occasione. Il noto marchio di tendenza famoso nel mondo propone la sua nuova collezione louis vuitton outlet primavera-estate 2011

  • http://www.nikeshoxnzdame.com/ Billige Nike Sko

    888Tiger5:

    Billige Nike Sko

  • http://www.mbtshoesclearance.biz/ MBT Sandals Sale

    here is the market! we Wholesale Jerseys ???mlb jereys?Wholesale NFL Jerseys.all jerseys are brand new jersey with the tag? embroidered and sewn on! items will dallas cowboys within 24 oakland raiders jersey hours after your payment cleared.buy comes from us! all of our Discount NFL Jerseys are fast delivery within one week. we provide you the best after sales service and if you still have some questions? please just feel freely jersey to contact us. we also accept wholesale ! super bowl 2011 jerseys is hot on sale now! buy now!

  • http://www.guccihandbags-it.com Gucci Borse

    Grazie a $ 1,3 miliardi di sterline (2.055 milioni dollari) di chilometri quadrati della matrice radio (SKA), alcuni dei misteri dell'universo, finirà per trovare la Gucci Borse risposta. Si raccolgono un sacco di informazioni sull'universo, al fine di affrontare questi dati, abbiamo bisogno di una funzione equivalente a 10 milioni di personal computer super computer. Serie km di diametro radio sarà 1900 miglia (3057,75 km) di migliaia di chilometri quadrati di superficie è di 1 antenna farfalla, fornirà un sacco di dati umano sull'universo, un astronomo ha detto che cambierà completamente la Gucci Scarpe nostra visione del universo. Associazione britannica delle Scienze e Discovery Center, Ian – Dr. Griffin, ha dichiarato: "La piazza km programma radiofonico serie offrirà agli astronomi un nuovo strumento, cambiare completamente la nostra comprensione dell'universo a causa delle dimensioni enormi, il telescopio mostrerà il dettaglio incredibile galassia , mistero emozionante attraverso lo studio di Gucci 2011 oggetti come i buchi neri, la teoria di test della relatività di Einstein per contribuire a rendere la storia dell'universo primordiale, gli astronomi saperne di più. "Le istituzioni scientifiche credo anche che se il nostro sistema solare diverso l'esistenza della vita, chilometro quadrato Radio serie troveranno che offrono le migliori opportunità. Per raggiungere questo obiettivo, ma anche una tecnologia gucci uomo innovativa.

  • http://wowth.com wow gold store

    The passion for buying cheap wow gold grows with the increasing wow gold store. Wow game is a part and parcel in our daily lives; a few people rely on the game boring time at home. You are supposed to buy wow gold from us. We have huge quantity wow gold, buy wow gold online.

  • http://wowgoldth.com wow gold sale

    Oh, my god! So cheap wow gold! You can save a lot if you buy best wow gold online on this website. The price is reasonable on this website and you can get economical deal. Any one of wow gold sale in here, I also recommend you buy wow gold on this website. They always quick respond and the whole buying.

  • http://www.hot-hermes.com hermes

    In my opinion, Hermes Birkin is luxury and full of noble quality. Hermes Kelly is elegant and feminine. Hermes Birkin should be taken casually, and Hermes online should be taken gracefully.
    The colors of Hermes on sale are usually brighter than a kelly hermes and easier to go with dresses, so it is quite popular with many stars. Another difference between Hermes Bikrin and kelly hermes is Hermes Birkin with double handles, butHermes store only with one handle.
    Grace Kelly took the biggest Crocodile kelly hermes, half hidden his belly, revealed her shinning lovely charm. This picture gave deep impression to the public, promoting the great sail of kelly hermes for quite a long time. Compared with Hermes Birkin, Hermes Kelly is smarter. Until 1968, Hermes group produced Mini kelly hermes. Because of its smaller size, it reveals quite different elegant feelings with Hermes Birkin. In most formal occasions, it appears. Hermes Kelly is not only a handbag with only one handle, but also a shoulder-bag. But Hermes Birkin can only be a handbag. This is also one of the differences between Hermes Birkin and Hermes Kelly.
    For stitching, hermes Birkin are only with inseam, but kelly hermes hand bags are not only with inseam, but also with outseam. Inseam makes the kelly hermes bags look flexible, and outseam makes the kelly hermes bags look stiff and smooth.

  • jolly32

    Are you finding Website of Chanel Handbags Fors Sale ? Please be careful ,there are many online store are selling the cheap chanel items ,which are not original. But if you can accept a replica ,it will be a good choice..

  • http://www.teknocrat.com/ssd-vs-hdd-analysis-and-comparison.html SSD vs HDD
  • http://www.pandorabraceletaustralia.com pandora bracelets

    pandora bracelets along with expensive jewelry are usually pandora bracelets canada the far more well-known charm jewellery pandora charms cheap within the industry these days. piece of pandora jewelry ones own lure could be the capability to mix and match these pandora bracelets sale coupled with spacers, to possess a quite a lot of visual appeal when it comes to various pandora bracelet parties. Considering the usage around the , you'll probably be adequately ready to develop your natural designs and also seems to pandora charms sale be noticeably. It is possible to for the most part apply it for a whole lot of reasons, together with pandora bracelets price events, Valentines Day, carnivals, wedding anniversaries, for example. A good number of fabulously, your Pandora wristband pandora bracelets will not purely.