Why You Should Learn to Use LINQ

FilesOne of the biggest steps forward for C# came in C# 3.0/.NET 3.5 when Microsoft introduced Language Integrated Query (LINQ). The easiest way I’ve come up with to explain LINQ is that it’s almost an exact analog to SQL queries on a table in relational databases. If you use C# you should really learn LINQ.

A table in a relational database is just like a spreadsheet, where every row has exactly the same columns in it. Let’s look at what you can do with this type of table. Here’s an example table where we hold the weights, ages and names of a bunch of folks. Note the example values I’ve used are not in any way related to real people!

Our table has the following columns:

name – eg David
weight in pounds
height in inches
age in years
ismale – 1 = Male or 0 = Female
id – a unique identifier.

The SQL to create this table looks something like this. I used SQLite but it’s mostly ANSI SQL. However my first stab used char gender and SQLite didn’t work for gender= ‘M’ so I switched to ismale with values 1 for true or 0 for false.

create table persons (
name text,
pounds int,
height int,
age int,
ismale int, -- 1 = male, 0 = female
id int)

I used this csv data to populate the table with details of seven people. The first row identifies what each column holds.

Name,Pounds,Height,Age,Gender,Id
David,200,68,54,1,0
Mark,150,72,30,1,1
Dawn,90,66,21,0,2
Susan,89,64,22,0,3
Cat,95,70,18,0,4
Don,125,72,27,1,5
Dale,140,74,43,1,6
David,120,69,24,1,7

Example table

Here are some data queries that you might want from the person table:

1. Return the data for all females between 18 and 28.
2. Return the average height of all males.
3. Return he total weight of all females aged over 18.
4. Return a list of unique names.

In SQL they’d look like this:

1. select * from persons where age between 18 and 28 and ismale = 0
2. select avg(height) from persons where ismale =1
3. select sum(pounds) from persons where ismale =0 and age > 18
4. select distinct name from persons

And in LINQ?

LINQ is all about processing a list. Usually it’s of objects but it can be XML documents or SQL Server databases. But mostly when people talk about LINQ they mean processing a list of objects in memory.

So if we take the table definition and code it as a C# object, we’d end up with something like this, with these six public properties.

class Person
{
public string name { get; set; }
public int pounds { get; set; }
public int weight { get; set; }
public int age { get; set; }
public int ismale { get; set; }
public int id { get; set; }
}

and we’d build up the list of people using the constructor initializer syntax. This is the same data as in the csv file that populated the database table.

var people = new List();
people.Add(new Person() { age = 54, ismale = 1, id = 0, name = "David", pounds = 200, height = 68});
people.Add(new Person() { age = 30, ismale = 1, id = 1, name = "Mark", pounds = 150, height = 72});
people.Add(new Person() { age = 21, ismale = 0, id = 2, name = "Dawn", pounds = 90, height = 66});
people.Add(new Person() { age = 22, ismale = 0, id = 3, name = "Susan", pounds = 89, height = 64});
people.Add(new Person() { age = 18, ismale = 0, id = 4, name = "Cat", pounds = 95, height = 70 });
people.Add(new Person() { age = 27, ismale = 1, id = 5, name = "Don", pounds = 125, height = 72 });
people.Add(new Person() { age = 43, ismale = 1, id = 6, name = "Dale", pounds = 140, height = 74 });
people.Add(new Person() { age = 24, ismale = 1, id = 7, name = "David", pounds = 120, height = 69 });

SQL Queries in LINQ?

So how would those four queries look if created in LINQ? Well, there are two forms of LINQ: the backwards SQL, which is called query syntax or the extension method, which consists of methods chained back to back. For the first query, the backwards SQL looks like this:

var somepeople = from p in people
where (p.age >= 18 && p.age <= 28 && p.ismale == 0) select p;

You can see why I call it backwards SQL! The members of somepeople is displayed with this:

foreach (var p in somepeople) { Console.WriteLine(p); }

Note, WriteLine(p) calls ToString() on each person p. I’ve overridden the ToString() method below so that it displays all the details for that person.

public override string ToString() { return String.Format("{0} {1} {2} {3} {4} {5}",name,pounds,height,age,ismale==1?'M':'F',id); }

Below is the more concise and powerful extension method syntax. This uses a Lambda expression, which is a sort of anonymous function that’s called for each member in the people list and returns true/false for the expression p.age >=. If true it calls Console.WriteLine. It’s the code equivalent of a SQL where clause.

Personally I find the extension member form of LINQ easier to understand and write than the query syntax but for anyone used to SQL and new to C#, the query syntax may be easier.

foreach (var p in people.Where(p => p.age >= 18 && p.age <= 28 && p.ismale == 0)) { Console.WriteLine(p); }

If there’s one page that you should bookmark when learning LINQ, it’s the IEnumerable<T> page. As it lists most of the methods you’ll ever use in LINQ.

For the second query (select avg(height) from persons where ismale =1) you can take an average on a sub list extracted using a Lambda expression to select objects where ismale = 1. So the extension method code looks like this:

var average = people.Where(p=> p.ismale==1).Average(p=>p.height);

With Query syntax, it’s easier to understand in two steps. Step 1 build a list in the query variable, step 2 get the average height.

var average2 = 0.0;
var query = from p in people
where (p.ismale == 1)
select p.height;
average2 = query.Average();

I use var so the compiler infers the type of the expression, which is a lot easier to read than IEnumerable<Person>.

In example three, Sum() is used like Average(), so here’s just the extension methods version:

var sum = people.Where(p=> p.ismale==0 && p.age >= 18).Sum(p=>p.pounds);

Finally, the fourth example is a little bit more complex because we have to extract the names and then show the distinct ones.

var names = people.Select(p => p.name).Distinct();
foreach (var name in names)
{
Console.WriteLine(name);
}

Remove the Distinct() and you’ll see both Davids.

Here we use select to extract a single field. Just as with SQL, you can’t do select distinct *, you need to identify the column in question. Remember the SQL from earlier?

select distinct name from persons

LINQ example output

Conclusion

Just as a database table has rows and columns and you create SQL queries that return a subset of rows and a subset of columns, or aggregate values, LINQ offers the same capabilities on lists of objects in memory. With LINQ, each object corresponds to a row and the object’s fields are the columns.

It’s really not hard to believe that the LINQ designers just copied the relational table and used SQL as their model. It seems to me that learning LINQ is a gentle introduction to the world of functional programming.

Comments

  1. BY Frank says:

    Why used LINQ instead of a good crafted stored procedure?
    What is the performance price using LINQ vs database native code?

    • BY Pal says:

      If you use LINQ, there is performance degrade. With linq, you can not process the table which has millions of records.

      LINQ can replace the dataset.

Post a Comment

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>