Strong opinions, weakly held

Tag: databases

The NoSQL use case

Drizzle developer Brian Aker weighs in on NoSQL:

MapReduce works as a solution when your queries are operating over a lot of data; Google sizes of data. Few companies have Google-sized datasets though. The average sites you see, they’re 10-20 gigs of data. Moving to a MapReduce solution for 20 gigs of data, or even for a terabyte or two of data, makes no sense. Using MapReduce with NoSQL solutions for small sites? This happens because people don’t understand how to pick the right tools.

Even more than that, I’d argue that people move to NoSQL in many cases because they don’t really understand SQL. Nearly every back end programmer working on Web sites uses a relational database on a daily basis. Those who aren’t using them now probably did have to write SQL queries at one time. And yet very few developers I interview have what I’d describe as strong SQL skills. If you’re not going to use the relational database effectively, you may as well choose a simpler tool, but don’t pretend it’s for technical reasons. It’s a people issue.

Twitter is migrating from MySQL to Cassandra

For those of you keeping track of the trend of applications moving from SQL databases to dumber forms of storage, Twitter has decided to move to Cassandra. Seems like a logical decision for them.

Making it easier to monitor slow queries

Ronald Bradford has a good idea for developers who are debugging slow queries: include comments in your SQL so that the queries are easier to identify in MySQL’s process list or in the slow query log. I’m wondering how easy it would be to add a setting to various database libraries (Hibernate, ActiveRecord, any of several PHP frameworks, and so on) that includes the context in which a query is called in a comment when the query is run. For example, it would be great if every Hibernate query included the class and method from which it was called.

I’m going to look into hacking this functionality into Kohana, the PHP framework I’ve spent a lot of time with in 2009.
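To make the idea concrete, here is a minimal sketch in Python rather than Hibernate or Kohana. The helper name tag_query, the example function load_recent_posts, and the posts table are all made up for illustration; no real library's API is implied. It looks up the caller with the standard inspect module and prepends a SQL comment naming it.

```python
import inspect


def tag_query(sql: str) -> str:
    """Prefix sql with a comment naming the module and function that called us.

    Hypothetical helper: a real library would hook this into its own
    query-building layer instead of asking callers to wrap each query.
    """
    caller = inspect.stack()[1]
    module_name = caller.frame.f_globals.get("__name__", "unknown")
    # Remove "*/" so an odd name can never close the comment early.
    label = f"{module_name}.{caller.function}".replace("*/", "")
    return f"/* {label} */ {sql}"


def load_recent_posts() -> str:
    # Hypothetical call site; the table and columns are invented.
    return tag_query(
        "SELECT id, title FROM posts ORDER BY created_at DESC LIMIT 10"
    )
```

Calling load_recent_posts() returns the original query with a leading comment that ends in ".load_recent_posts", which is the text you would then search for in the process list or the slow query log. Walking the stack on every query has a cost, so a real implementation would probably make this a debug-only setting.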

One NoSQL use case

Simon Willison turned up an interesting NoSQL use case in a crowdsourcing application he built for the Guardian. In the first version of the application he used MySQL’s ORDER BY RAND(), which is inefficient because MySQL has to generate a random value for every candidate row and then sort them all just to return one. (For more on its inefficiency, see the comments on this blog post.) The next time around, he outsourced picking out random results to the in-memory database Redis:

The system maintains a redis set of all IDs that needed to be reviewed for an assignment to be complete, and a separate set of IDs of all pages that had been reviewed. It then uses redis set difference (the SDIFFSTORE command) to create a set of unreviewed pages for the current assignment and then SRANDMEMBER to pick one of those pages.
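The logic is easy to sketch with plain Python sets standing in for Redis. This is not Simon's code and it doesn't talk to a Redis server; the function name pick_unreviewed and the sample IDs are made up, and the comments note which Redis command each step corresponds to.

```python
import random
from typing import Optional, Set


def pick_unreviewed(assignment_ids: Set[int],
                    reviewed_ids: Set[int]) -> Optional[int]:
    """Return a random page ID that still needs review, or None if done."""
    # Stand-in for SDIFFSTORE: everything in the assignment set
    # that is not in the reviewed set.
    unreviewed = assignment_ids - reviewed_ids
    if not unreviewed:
        return None
    # Stand-in for SRANDMEMBER: pick one member at random.
    # random.choice needs a sequence, so convert the set to a list first.
    return random.choice(list(unreviewed))


# Hypothetical data: pages 1-5 are in the assignment, 1-3 are already reviewed.
page = pick_unreviewed({1, 2, 3, 4, 5}, {1, 2, 3})
```

Here page will be either 4 or 5. The appeal of doing this in Redis is that both steps are set operations on data already held in memory, so there is no per-request sort over the whole table.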


Oracle and content farming

Oracle and content farming don’t have anything to do with one another (or do they?), but they do seem to be the topics dominating the news in my little corner of the world today.

The last obstacle to Oracle’s takeover of Sun (and MySQL, its subsidiary) is the European Commission, which is investigating the antitrust implications of the merger. Last week, hearings started. Everyone sees this as their last chance to make themselves heard on the topic, and the stakes are high.

MySQL creator Monty Widenius has posted an impassioned plea for people to contact the EC opposing the deal for fear that Oracle will find crippling or killing MySQL to be more lucrative than supporting it robustly. He also says that Oracle has asked its customers to contact the EC and demand that the deal go through, so he’s asking MySQL users to contact the EC on behalf of an independent MySQL. For more, see Paul McCullagh and Jeremy Zawodny. Oracle has also posted its list of guarantees to reassure the MySQL community.

I think that the Oracle-Sun deal will go through and that MySQL will fall into the hands of Oracle, and I’m worried about the future of the product. Ultimately, though, I think that MySQL has gotten too big and pervasive for Oracle to be able to kill it off.

Today everybody’s talking about content farming. Tim Bray talks about search engines losing their grip, and Scott Rosenberg argues against describing SEO-driven content as fast food. Jacob’s comment on my previous post is definitely worth reading as well. Oh, and Chris Dixon makes the point that the subjects that are most heavily gamed also happen to be those that get the least attention on human networks.

Now I’m all caught up.

Links for August 27

  • Simon St Laurent looks at reasons why there’s buzz around HTML again.
  • Mac OS X Automation explains Services in Snow Leopard (my copy arrives tomorrow). Via Daring Fireball.
  • A new poll reveals that people don’t actually even know what the public option is. The public option is a government-managed insurance plan that will compete with plans from private insurers in an exchange, available to individuals and small businesses that do not participate in group insurance. Here’s a longer explanation. In the meantime, the current Republican talking point seems to be that Medicare is a poorly run government program that we should preserve at all costs.
  • The MySQL Performance Blog looks at the Redis database. Redis is one of those schema-less databases people are all talking about these days.
  • Matt Raible takes a look at Java REST frameworks.
  • The UK is looking at plastic alternatives to traditional pub glasses. That wins my “stupidest thing I read today” award. Via Bruce Schneier.

Why is MySQL more popular than PostgreSQL?

Why is MySQL more popular than PostgreSQL? The fact that it is more popular is indisputable; take a look at MySQL’s market share page. My experience with PostgreSQL is very limited, and my strongest impression was that the command line client has a weird interface. Beyond that, I know little. PostgreSQL advocates are pretty convinced of its superiority over MySQL on every level.

PostgreSQL was released in its current form in January 1997. MySQL was initially released in May 1995, but the first version that saw really wide adoption — version 3.23 — came out in January 2001. I’ve always used MySQL but I never made an affirmative decision to choose it over PostgreSQL. Is there a reason why MySQL is more popular other than the power law reasons? What gave it the initial edge in adoption?

Update (5/21): This post is also being discussed at Hacker News. Check out the discussion there as well.

The inevitable MySQL fork

MySQL is near and dear to my heart — I use it for just about every project I work on. And like many people, Oracle’s acquisition of Sun leads me to worry about MySQL’s future. However, I’m not sure that the new MySQL fork from Percona and Monty Program Ab will lead us to the promised land.

What scares me most is that the new database will not support InnoDB. That makes sense, because InnoDB was already an Oracle property even before the Sun acquisition, but moving away from it will be scary for many users. Time to figure out whether PrimeBase XT is ready for prime time, I suppose.

Update (May 20): MariaDB will support InnoDB. See the comments.

MySQL founder on Oracle’s buying Sun

Michael Widenius, the founder of MySQL, has posted his thoughts on the acquisition of Sun by Oracle. I agree with this sentiment:

The biggest threat to MySQL future is not Oracle per se, but that the MySQL talent at Sun will spread like the wind and go to a lot of different companies which will set the MySQL development and support back years.

I would not like to see this happen and I am doing everything I can do to keep this talent pool together (after all, most of them are long time personal friends of mine). I am prepared to hire or find a good home (either at Monty Program Ab or close to it) for all core MySQL personnel.

I think that the best thing that could come out of the merger is the combination of the InnoDB and MySQL teams.
