I have essentially a table of numbers — a time series of measurements. Each row in the table has 5 values for the 5 different categories, and a sum row for the total of all categories.
If I take the average of each column and sum the averages together, should it equal the average of the rows’ sums (ignoring rounding error, of course)?
(I’ve got a case where the two values keep coming out different by about 30% and I’m wondering just how crazy I am.)
Update: See below — I was (slightly) crazy and had an error in my code.
The average of the entries in a column is the sum of the entries in that column, divided by the number of entries. The number of entries is the number of rows. So the sum of the averages is the sum of all the entries in the table, divided by the number of rows.
The average of the row sums is the sum of all entries in the table divided by the number of rows, so you should get the same number either way.
Sigh!! Found my problem — it was a stupid “dupe error” in my code. I was looking for an error in the “average of sums” logic, but it was in the “sum of averages” logic — referencing the wrong variable.
Well, anyway, we’ve demonstrated about 5 ways from Sunday that the sum of averages really IS equal to the average of sums, in case that’s important to anyone in the future.
Generally no is correct, it is only the same in specific cases.
Sum(x) / Sum(y) not equal to Sum (x/y) / n
where n is the total entries x is row entries and y is column entries.
only true if all of the y’s are equal
eg: (1/2 + 3/5)/2 = 11/20
(1+3)/(2+5) = 4/7
Where as if y is equal (1/7 + 4/7)/2 = 5/14
(1+4)/(7+7) = 5/14
PS Sorry about posting on dead thread just want it to be right for anyone else looking.
Actually Steve could be correct. I’ll give you a simple example and then explain why intelligent people can come up with different answers because in a way, they’re both “right.”
First row: 5; 6;
Second row: 1; 2;
Third row: 3; 4;
If you do either the sum of the averages or average of the sums as Daniel asked, then you’ll get 7 as the answer. If however, you remove the 1 leaving a hole in your table, then your average of the sums drops to 6 2/3 and your sum of the averages increases to 8.
If your table of data has blanks or missing data points, then the two are almost never the same. If the table of data is equally/evenly distributed without any missing points or holes in the table, then they should always be the same. Anybody can test this out with MS Excel and the =RAND() function. Generate a table with any number of rows/columns and fill in the rows and columns with random numbers or let it generate random numbers for you. Then use =AVERAGE() to average the columns and =SUM() to add up the averages. Then reverse the process and use =SUM() to add the rows and =AVERAGE() to average the sums. If the table is complete, then the two numbers will be precisely the same. If however, your data for any reason is missing entries, then it can vary by a great percentage. Just start deleting data points in the middle of the table and watch the two results greatly fluctuate.
Also of notes is if you flip the rows and columns then you get completely different results, so make sure you’re consistent. If you average the rows in the above example and sum the averages, or sum the columns and average the sums, then you get 10.5 with a complete table and 11 and 10, respectively with the 1 missing.
mixed math is correct. take 3 columns; 10 10s, 5 1s and 2,3,5,6,6,7,9,10 (8 values from rand), don’t average blanks.
avg of avgs is 5.67; avg of all values is 6.65.
@Mixed math; it is ok to answer an old thread. This stuff, truth or truthy, lives forever on the internet
If I take the average of each column and sum the averages together,
should it equal the average of the rows’ sums (ignoring rounding
error, of course)?
There seems to be some confusion on the correct answer, and as this comes up on the first page of Google I thought it worth throwing in my 2 cents in an attempt to clarify.
As a number of people have already said the “sum of averages is equal to the average of sums” (Google that if you want more info). The Linear Combination of Random Variables Theory clearly shows this as well. A bit of math also shows that irrelevant to which way you calculate the final average your doing the same thing. Which is to add all the scores up and divide by how many there are.
The confusion is coming in because the multiplication of averages does not always equal the average of the multiplications, and that was one of the examples someone (other than the initial poster) used. Once you start multiplying any of the inputs by anything other than a constant you expect different answers. Or to put it another way we don’t expect the sum of products to equal the product of sums.
Someone also mentioned that having blanks ‘breaks’ this rule. I’d agree with this too.