COUNT(*) or MAX(id) - which is faster?


I have a web server with my own messaging system implemented.
I am at the phase where I need to create an API that checks whether the user has new message(s).
My DB table is simple:

ID - Auto Increment, Primary Key (Bigint)

Sender - Varchar (32) // Foreign Key to UserID hash from Users DB Table

Recipient - Varchar (32) // Foreign Key to UserID hash from Users DB Table

Message - Varchar (256) //UTF8 BIN

I am considering making an API that will determine whether there are new messages for a given user. I am thinking of using one of these methods:

A) Select count(*) of messages where sender or recipient is me.

(if this number > previous number, I have a new message)

B) Select max(ID) of messages where sender or recipient is me.

(if max(ID) > previous number, I have a new message)

My question is: can I somehow calculate which method will consume fewer server resources? Or is there some article about it? Maybe there is another method I have not mentioned?
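
For illustration, the two candidate methods could be sketched as below. This is a minimal sketch and not the asker's actual code: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, and the table name `messages` and the helper names `count_for` and `max_id_for` are assumptions made for the example.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the two queries are plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " sender TEXT NOT NULL,"
    " recipient TEXT NOT NULL,"
    " message TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO messages (sender, recipient, message) VALUES (?, ?, ?)",
    [("carol", "dave", "hey"), ("alice", "bob", "hi"), ("bob", "alice", "hello")],
)


def count_for(conn, user):
    """Method A: number of messages the user has sent or received."""
    row = conn.execute(
        "SELECT COUNT(*) FROM messages WHERE sender = ? OR recipient = ?",
        (user, user),
    ).fetchone()
    return row[0]


def max_id_for(conn, user):
    """Method B: highest message id the user has sent or received (None if none)."""
    row = conn.execute(
        "SELECT MAX(id) FROM messages WHERE sender = ? OR recipient = ?",
        (user, user),
    ).fetchone()
    return row[0]


print(count_for(conn, "alice"), max_id_for(conn, "alice"))
```

In both cases the client keeps the previously returned number and compares it with the new one on the next poll.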

edited 6 hours ago

Peter Cordes

134k18203342

asked 10 hours ago

FeHora

636

3

I think you would be better off adding a timestamp column and checking against that value to see if there are newer records.

– Dharman
10 hours ago

Whether you query a timestamp or the ID, use MAX() on that column, and make sure it's indexed with (user_id, timestamp).

– The Impaler
10 hours ago

@Dharman I was thinking of that. But it costs extra DB space, and I am not sure it would be faster than either of my methods. I am storing the simple number (of current messages) in the usernames table.

– FeHora
10 hours ago

1

Calculate? No idea. But you can measure it. Fire off a few thousand of each query and watch machine metrics (cpu%, mem%, load average, etc.)

– Sergio Tulentsev
10 hours ago

1

While there is a good answer to this question below, I suspect you might be optimizing on something that turns out not to be important. And unless you anticipate having literally millions of messages, I wouldn't worry about disk space, especially because the timestamp is small compared to your other fields. If you add timestamps, your table will be about 5MB larger for each million messages. That's really nothing.

– Jerry
9 hours ago


php mysql performance


4 Answers

In MySQL InnoDB, SELECT COUNT(*) WHERE secondary_index = ? is an expensive operation, and when the user has a lot of messages this query might take a long time. Even when using an index, the engine still needs to count all matching records.

On the other hand, SELECT MAX(id) WHERE secondary_index = ? can deliver the highest id in that index very efficiently and runs in near-constant time by doing a so-called loose index scan.

If you want to understand why, consider looking up the "B+ tree" data structure, which InnoDB uses to organise its data.

I suggest you go with SELECT MAX(id) if the requirement is only to check whether there are new messages (and not how many of them there are).

Also, if you rely on the message count, you might open the door to race conditions. What if the user deletes a message and receives a new one between two polling intervals? The count would be unchanged, so the new message would go unnoticed.
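
The delete-then-receive case can be reproduced in a few lines. This is only a sketch of the logic, not a performance test: it uses Python's built-in sqlite3 module in place of MySQL/InnoDB, filters on a single `recipient` column for brevity, and the names `idx_recipient` and `snapshot` are made up for the example.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; only the query logic is shown.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " recipient TEXT NOT NULL,"
    " message TEXT NOT NULL)"
)
# Secondary index on the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_recipient ON messages (recipient)")
conn.executemany(
    "INSERT INTO messages (recipient, message) VALUES (?, ?)",
    [("bob", "one"), ("bob", "two")],
)


def snapshot(conn, user):
    """Return (COUNT(*), MAX(id)) for the user's messages."""
    return conn.execute(
        "SELECT COUNT(*), MAX(id) FROM messages WHERE recipient = ?", (user,)
    ).fetchone()


before = snapshot(conn, "bob")

# Between two polls: one message is deleted and a new one arrives.
conn.execute("DELETE FROM messages WHERE id = 1")
conn.execute(
    "INSERT INTO messages (recipient, message) VALUES (?, ?)", ("bob", "three")
)

after = snapshot(conn, "bob")

# The count is unchanged, but the highest id has moved on.
print(before, after)
```

A poller that compares counts sees no change here, while one that compares the highest id does.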

edited 10 hours ago

answered 10 hours ago

Kaii

15.7k22951

refer: dba.stackexchange.com/questions/130780/mysql-count-performance

– Kaii
10 hours ago

1

"SELECT MAX(id) will always use the primary index" - yeah, except for the cases when there's a where on an unindexed field.

– Sergio Tulentsev
10 hours ago

@SergioTulentsev I forgot to mention in my main post that sender and recipient are foreign keys to the user hash (ID), which is the primary key in the users table. So they will always be indexed.

– FeHora
10 hours ago

4

If there's an index on a, then SELECT MAX(id) FROM tbl WHERE a=constant uses a so-called loose index scan. Those are almost miraculously fast. SELECT COUNT(*) FROM tbl WHERE a=constant does a tight index scan, which is not as fast.

– O. Jones
10 hours ago

1

@FeHora I strongly suggest setting up some sort of test environment: a database with generated records for you to play with.

– Kaii
10 hours ago


To know whether someone has new messages, record exactly that. Update a field in the users table (I'm assuming that's the name) when a new message is recorded in the system. You have the recipient's ID, and that's all you need. You can create an AFTER INSERT trigger (assumption: there's a users2messages table) that updates the users table with a boolean flag indicating there's a message.

This approach is by far faster than counting index entries, whether the index is primary or secondary. When the user performs an action, you update the users table with has_messages = 0; when a new message arrives, you update it with has_messages = 1. It's simple, it works, it scales, and using triggers to maintain it makes it easy and seamless.
I'm sure there will be naysayers who don't like triggers; in that case you can do it manually at the point of associating a user with a new message.
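
A rough sketch of the flag approach, under some assumptions: it uses Python's built-in sqlite3 module instead of MySQL, puts the trigger directly on a `messages` table rather than on a users2messages table, and the names `messages_after_insert`, `has_new_messages` and `mark_read` are invented for the example. MySQL's trigger syntax differs slightly (it requires FOR EACH ROW), so treat the DDL as illustrative only.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the trigger DDL is SQLite syntax.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id TEXT PRIMARY KEY,
        has_messages INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        sender TEXT NOT NULL,
        recipient TEXT NOT NULL,
        message TEXT NOT NULL
    );
    -- Raise the recipient's flag whenever a message is stored.
    CREATE TRIGGER messages_after_insert AFTER INSERT ON messages
    BEGIN
        UPDATE users SET has_messages = 1 WHERE id = NEW.recipient;
    END;
    """
)
conn.executemany("INSERT INTO users (id) VALUES (?)", [("alice",), ("bob",)])


def has_new_messages(conn, user_id):
    """Single-row primary-key lookup of the flag."""
    row = conn.execute(
        "SELECT has_messages FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return bool(row[0])


def mark_read(conn, user_id):
    """Reset the flag once the user has fetched their messages."""
    conn.execute("UPDATE users SET has_messages = 0 WHERE id = ?", (user_id,))


conn.execute(
    "INSERT INTO messages (sender, recipient, message) VALUES (?, ?, ?)",
    ("alice", "bob", "hi"),
)
print(has_new_messages(conn, "bob"), has_new_messages(conn, "alice"))
```

The poll itself never touches the messages table; the cost moves to the insert path, where each new message adds one small update.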

answered 10 hours ago

Mjh

1,98911113

Triggers aside, looking up a row using the PK and also reading it to check the boolean is still more expensive than executing a single loose index scan. It gets worse when you also add a WHERE clause to check the boolean flag, because of the low cardinality, even if you index that field. Sorry to tell you that, but you have a misunderstanding there.

– Kaii
9 hours ago

@Mjh I know about that, but it's definitely more expensive than my suggested methods, because it involves (at least) 1x update + 1x select.

– FeHora
9 hours ago

1

@Kaii SELECT has_messages FROM users WHERE id = 1; is the fastest query there is. It's a const lookup on the primary key, which is far faster than counting a number of records in the table. The boolean field is not in the WHERE clause; the primary key is. Please, assume better next time. Regarding updating the table: the update is fast as well, since it handles a single row located using the primary key. If the field already contains the value that you're updating to, no actual disk I/O occurs and there's a minimal performance penalty, much less than counting the records. You can measure.

– Mjh
9 hours ago


If you need to know the number of new messages, then
Select count(*) from Messages where user_id in (sender, recipient) and id > last_seen_id would be your best option.

I'm a fan of using EXISTS where possible, so to determine IF there are new messages, my query would be Select exists(Select 1 from Messages where user_id in (sender, recipient) and id > last_seen_id). The benefit of EXISTS is that it returns true as soon as it finds one matching record.

Edit: To avoid any confusion when reading this answer, both of those queries would also include a check for other_user_id in (sender, recipient) in order to return only the messages between two specific users.
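
The two queries above could be wrapped roughly as follows. This is a sketch under assumptions: Python's built-in sqlite3 module stands in for MySQL, the helper names `count_new` and `has_new` are invented, and the extra other_user_id check mentioned in the edit is left out to keep the example short.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the queries are plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " sender TEXT NOT NULL,"
    " recipient TEXT NOT NULL,"
    " message TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO messages (sender, recipient, message) VALUES (?, ?, ?)",
    [("alice", "bob", "hi"), ("bob", "alice", "hello"), ("carol", "alice", "hey")],
)


def count_new(conn, user, last_seen_id):
    """Number of messages involving the user that are newer than last_seen_id."""
    row = conn.execute(
        "SELECT COUNT(*) FROM messages"
        " WHERE ? IN (sender, recipient) AND id > ?",
        (user, last_seen_id),
    ).fetchone()
    return row[0]


def has_new(conn, user, last_seen_id):
    """True if at least one message involving the user is newer than last_seen_id."""
    row = conn.execute(
        "SELECT EXISTS("
        " SELECT 1 FROM messages"
        " WHERE ? IN (sender, recipient) AND id > ?)",
        (user, last_seen_id),
    ).fetchone()
    return bool(row[0])


print(count_new(conn, "alice", 1), has_new(conn, "alice", 3))
```

Here the client stores the id of the last message it has seen instead of a running count.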

edited 3 hours ago

answered 3 hours ago

Aaron

417


@FeHora You talk about not using keys to save DB space. The table design itself wastes more DB space.

ID - Auto Increment, Primary Key (Bigint)

Is bigint really necessary? Let us assume that a message is sent every second. Then an int unsigned is enough for about 136 years. And if you really have that many messages, a key is mandatory.

Sender - Varchar (32) // Foreign Key to UserID hash from Users DB Table

Recipient - Varchar (32) // Foreign Key to UserID hash from Users DB Table

Why not use the UserID (usually an int unsigned)?

Then I would add a seen flag. By the way, you can add the NOT NULL attribute to all fields.

seen tinyint NOT NULL

Last but not least, I recommend the variant from @Mjh: define a flag has_messages, or new_messages, or both in the user record. Usually the user record is loaded anyway, so it is NOT an additional database query.
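
As a quick check of the id-range arithmetic, here is a small sketch. The insert rates are assumptions chosen for illustration (one message per second, and one million messages per hour), and the helper name `years_until_exhausted` is made up.

```python
# Range of a 32-bit unsigned integer id, as in MySQL's INT UNSIGNED.
INT_UNSIGNED_MAX = 2**32 - 1
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_until_exhausted(max_id, inserts_per_second):
    """Years until an auto-increment id reaches max_id at a steady insert rate."""
    return max_id / (inserts_per_second * SECONDS_PER_YEAR)


# Assumed rate: one message per second.
print(round(years_until_exhausted(INT_UNSIGNED_MAX, 1)))

# Assumed rate: one million messages per hour.
print(round(years_until_exhausted(INT_UNSIGNED_MAX, 1_000_000 / 3600), 2))
```

At one insert per second the 32-bit range lasts well over a century; at a sustained million inserts per hour it would last only about half a year, so the right column width depends on the real message rate.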

answered 1 hour ago

Wiimm

955516

This messaging system is for a government-ish organization; 90% of messages are sent to users from systems (e.g. "temperature in room is above 30C", etc.). It can generate millions of messages per hour, which is why I need to optimize every microsecond of server time. I cannot use the UserID key because of reverse engineering + GDPR (EU thing). Long story short: I need to have everything encrypted and fast. Every additional data field can cause a lot of extra unwanted database storage space.

– FeHora
41 mins ago

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55581114%2fcount-or-maxid-which-is-faster%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

On the other hand, SELECT MAX(id) WHERE secondary_index = ? can deliver the highest id in that index very efficiently and runs in constant speed by doing a so-called loose index scan.

If you want to understand why, consider looking up the "B-Tree+" data structure which InnoDB uses to organise its data.

I suggest you go with SELECT MAX(id), if the requirement is only to check if there are new messages (and not the count of them).

Also, if you rely on the message count you might open a gap for race conditions. What if the user deletes a message and receives a new one between two polling intervals?

edited 10 hours ago

answered 10 hours ago

Kaii

15.7k22951

refer: dba.stackexchange.com/questions/130780/mysql-count-performance

– Kaii
10 hours ago

1

"SELECT MAX(id) will always use the primary index" - yeah, except for the cases when there's a where on an unindexed field.

– Sergio Tulentsev
10 hours ago

@SergioTulentsev i forgot to mention in my main post, sender and recipient are foreign keys to user-hash (ID) - primary key in users table. So it will be indexed always.

– FeHora
10 hours ago

4

If there's an index on a, then SELECT MAX(id) FROM tbl WHERE a=constant uses a so-called loose index scan. Those are almost miraculously fast. SELECT COUNT(*) FROM tbl WHERE a=constant does a tight index scan, which is not as fast.

– O. Jones
10 hours ago

1

@FeHora i strongly suggest to setup some sort of test environment, a database with generated records for you to play with.

– Kaii
10 hours ago

|
show 4 more comments

On the other hand, SELECT MAX(id) WHERE secondary_index = ? can deliver the highest id in that index very efficiently and runs in constant speed by doing a so-called loose index scan.

If you want to understand why, consider looking up the "B-Tree+" data structure which InnoDB uses to organise its data.

I suggest you go with SELECT MAX(id), if the requirement is only to check if there are new messages (and not the count of them).

Also, if you rely on the message count you might open a gap for race conditions. What if the user deletes a message and receives a new one between two polling intervals?

edited 10 hours ago

answered 10 hours ago

Kaii

15.7k22951

refer: dba.stackexchange.com/questions/130780/mysql-count-performance

– Kaii
10 hours ago

1

"SELECT MAX(id) will always use the primary index" - yeah, except for the cases when there's a where on an unindexed field.

– Sergio Tulentsev
10 hours ago

@SergioTulentsev i forgot to mention in my main post, sender and recipient are foreign keys to user-hash (ID) - primary key in users table. So it will be indexed always.

– FeHora
10 hours ago

4

If there's an index on a, then SELECT MAX(id) FROM tbl WHERE a=constant uses a so-called loose index scan. Those are almost miraculously fast. SELECT COUNT(*) FROM tbl WHERE a=constant does a tight index scan, which is not as fast.

– O. Jones
10 hours ago

1

@FeHora i strongly suggest to setup some sort of test environment, a database with generated records for you to play with.

– Kaii
10 hours ago

|
show 4 more comments

On the other hand, SELECT MAX(id) WHERE secondary_index = ? can deliver the highest id in that index very efficiently and runs in constant speed by doing a so-called loose index scan.

If you want to understand why, consider looking up the "B-Tree+" data structure which InnoDB uses to organise its data.

I suggest you go with SELECT MAX(id), if the requirement is only to check if there are new messages (and not the count of them).

Also, if you rely on the message count you might open a gap for race conditions. What if the user deletes a message and receives a new one between two polling intervals?

edited 10 hours ago

answered 10 hours ago

Kaii

15.7k22951

On the other hand, SELECT MAX(id) WHERE secondary_index = ? can deliver the highest id in that index very efficiently and runs in constant speed by doing a so-called loose index scan.

If you want to understand why, consider looking up the "B-Tree+" data structure which InnoDB uses to organise its data.

I suggest you go with SELECT MAX(id), if the requirement is only to check if there are new messages (and not the count of them).

Also, if you rely on the message count you might open a gap for race conditions. What if the user deletes a message and receives a new one between two polling intervals?

edited 10 hours ago

answered 10 hours ago

Kaii

15.7k22951

edited 10 hours ago

answered 10 hours ago

Kaii

15.7k22951

answered 10 hours ago

Kaii

15.7k22951

answered 10 hours ago

Kaii

15.7k22951

refer: dba.stackexchange.com/questions/130780/mysql-count-performance

– Kaii
10 hours ago

1

"SELECT MAX(id) will always use the primary index" - yeah, except for the cases when there's a where on an unindexed field.

– Sergio Tulentsev
10 hours ago

@SergioTulentsev i forgot to mention in my main post, sender and recipient are foreign keys to user-hash (ID) - primary key in users table. So it will be indexed always.

– FeHora
10 hours ago

4

If there's an index on a, then SELECT MAX(id) FROM tbl WHERE a=constant uses a so-called loose index scan. Those are almost miraculously fast. SELECT COUNT(*) FROM tbl WHERE a=constant does a tight index scan, which is not as fast.

– O. Jones
10 hours ago

1

@FeHora i strongly suggest to setup some sort of test environment, a database with generated records for you to play with.

– Kaii
10 hours ago

|
show 4 more comments

refer: dba.stackexchange.com/questions/130780/mysql-count-performance

– Kaii
10 hours ago

1

"SELECT MAX(id) will always use the primary index" - yeah, except for the cases when there's a where on an unindexed field.

– Sergio Tulentsev
10 hours ago

@SergioTulentsev i forgot to mention in my main post, sender and recipient are foreign keys to user-hash (ID) - primary key in users table. So it will be indexed always.

– FeHora
10 hours ago

4

If there's an index on a, then SELECT MAX(id) FROM tbl WHERE a=constant uses a so-called loose index scan. Those are almost miraculously fast. SELECT COUNT(*) FROM tbl WHERE a=constant does a tight index scan, which is not as fast.

– O. Jones
10 hours ago

1

@FeHora i strongly suggest to setup some sort of test environment, a database with generated records for you to play with.

– Kaii
10 hours ago

refer: dba.stackexchange.com/questions/130780/mysql-count-performance

– Kaii
10 hours ago

"SELECT MAX(id) will always use the primary index" - yeah, except for the cases when there's a where on an unindexed field.

– Sergio Tulentsev
10 hours ago

@SergioTulentsev i forgot to mention in my main post, sender and recipient are foreign keys to user-hash (ID) - primary key in users table. So it will be indexed always.

– FeHora
10 hours ago

If there's an index on a, then SELECT MAX(id) FROM tbl WHERE a=constant uses a so-called loose index scan. Those are almost miraculously fast. SELECT COUNT(*) FROM tbl WHERE a=constant does a tight index scan, which is not as fast.

– O. Jones
10 hours ago

@FeHora i strongly suggest to setup some sort of test environment, a database with generated records for you to play with.

– Kaii
10 hours ago

|
show 4 more comments

answered 10 hours ago

Mjh

1,98911113

triggers aside, looking up a row using the PK and also reading it to check the boolean is still more expensive than executing a single loose index scan. It gets worse when you also add a WHERE clause to check the boolean flag because of the low cardinality even if you index that field. Sorry to tell you you that, but you have a misunderstanding there.

– Kaii
9 hours ago

@Mjh i know about that.. but it's definitely more expensive than my suggested methods, because it contains (at least) 1x update + 1x select

– FeHora
9 hours ago

1

@Kaii SELECT has_messages FROM users WHERE id = 1; is the fastest query there is. It's an eq_ref which is infinitely faster than counting a number of records in the table. The boolean field is not in the WHERE clause, the primary key is. Please, assume better next time. In regards to updating the table: the update is fast as well, it handles a single row located using the primary key. If the field is already containing the value that you're updating to, no actual disk I/O occurs and there's a minimal performance penalty. Much less than counting the records. You can measure.

– Mjh
9 hours ago

add a comment |

answered 10 hours ago

Mjh

1,98911113

triggers aside, looking up a row using the PK and also reading it to check the boolean is still more expensive than executing a single loose index scan. It gets worse when you also add a WHERE clause to check the boolean flag because of the low cardinality even if you index that field. Sorry to tell you you that, but you have a misunderstanding there.

– Kaii
9 hours ago

@Mjh i know about that.. but it's definitely more expensive than my suggested methods, because it contains (at least) 1x update + 1x select

– FeHora
9 hours ago

1

@Kaii SELECT has_messages FROM users WHERE id = 1; is the fastest query there is. It's an eq_ref which is infinitely faster than counting a number of records in the table. The boolean field is not in the WHERE clause, the primary key is. Please, assume better next time. In regards to updating the table: the update is fast as well, it handles a single row located using the primary key. If the field is already containing the value that you're updating to, no actual disk I/O occurs and there's a minimal performance penalty. Much less than counting the records. You can measure.

– Mjh
9 hours ago

add a comment |

answered 10 hours ago

Mjh

1,98911113

answered 10 hours ago

Mjh

1,98911113

answered 10 hours ago

Mjh

1,98911113

answered 10 hours ago

Mjh

1,98911113

answered 10 hours ago

Mjh

1,98911113

triggers aside, looking up a row using the PK and also reading it to check the boolean is still more expensive than executing a single loose index scan. It gets worse when you also add a WHERE clause to check the boolean flag because of the low cardinality even if you index that field. Sorry to tell you you that, but you have a misunderstanding there.

– Kaii
9 hours ago

@Mjh i know about that.. but it's definitely more expensive than my suggested methods, because it contains (at least) 1x update + 1x select

– FeHora
9 hours ago

1

@Kaii SELECT has_messages FROM users WHERE id = 1; is the fastest query there is. It's an eq_ref which is infinitely faster than counting a number of records in the table. The boolean field is not in the WHERE clause, the primary key is. Please, assume better next time. In regards to updating the table: the update is fast as well, it handles a single row located using the primary key. If the field is already containing the value that you're updating to, no actual disk I/O occurs and there's a minimal performance penalty. Much less than counting the records. You can measure.

– Mjh
9 hours ago

add a comment |

triggers aside, looking up a row using the PK and also reading it to check the boolean is still more expensive than executing a single loose index scan. It gets worse when you also add a WHERE clause to check the boolean flag because of the low cardinality even if you index that field. Sorry to tell you you that, but you have a misunderstanding there.

– Kaii
9 hours ago

@Mjh i know about that.. but it's definitely more expensive than my suggested methods, because it contains (at least) 1x update + 1x select

– FeHora
9 hours ago

1

@Kaii SELECT has_messages FROM users WHERE id = 1; is the fastest query there is. It's an eq_ref which is infinitely faster than counting a number of records in the table. The boolean field is not in the WHERE clause, the primary key is. Please, assume better next time. In regards to updating the table: the update is fast as well, it handles a single row located using the primary key. If the field is already containing the value that you're updating to, no actual disk I/O occurs and there's a minimal performance penalty. Much less than counting the records. You can measure.

– Mjh
9 hours ago

triggers aside, looking up a row using the PK and also reading it to check the boolean is still more expensive than executing a single loose index scan. It gets worse when you also add a WHERE clause to check the boolean flag because of the low cardinality even if you index that field. Sorry to tell you you that, but you have a misunderstanding there.

– Kaii
9 hours ago

@Mjh i know about that.. but it's definitely more expensive than my suggested methods, because it contains (at least) 1x update + 1x select

– FeHora
9 hours ago

@Kaii SELECT has_messages FROM users WHERE id = 1; is the fastest query there is. It's an eq_ref which is infinitely faster than counting a number of records in the table. The boolean field is not in the WHERE clause, the primary key is. Please, assume better next time. In regards to updating the table: the update is fast as well, it handles a single row located using the primary key. If the field is already containing the value that you're updating to, no actual disk I/O occurs and there's a minimal performance penalty. Much less than counting the records. You can measure.

– Mjh
9 hours ago

add a comment |

If you need to know the number of new messages then using
Select count(*) from Messages where user_id in (sender, recipient) and id > last_seen_id would be your best option.

edited 3 hours ago

answered 3 hours ago

Aaron

417

@FeHora You talk about not using keys to save DB space, but the table design itself wastes more DB space.

ID - Auto Increment, Primary Key (Bigint)

Is bigint really necessary? Let us assume that a message is sent every second. Then an int unsigned is enough for about 136 years. And if you really have that many messages, a key is mandatory.

Sender - Varchar (32) // Foreign Key to UserID hash from Users DB Table

Recipient - Varchar (32) // Foreign Key to UserID hash from Users DB Table

Why not use the UserID (usually an int unsigned)?

Then I would add a seen flag. Btw, you can add the NOT NULL attribute to all fields.

seen TINYINT NOT NULL
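
The lifetime estimate for an int unsigned key can be checked with a little arithmetic. This is a small sketch; the function name `years_until_exhausted` is hypothetical, and the message rate is a parameter so other loads can be tried.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 seconds

INT_UNSIGNED_MAX = 2**32 - 1  # largest value of a 4-byte unsigned integer


def years_until_exhausted(max_id, messages_per_second):
    """Years until an auto-increment key reaches max_id at a constant rate."""
    return max_id / messages_per_second / SECONDS_PER_YEAR


int_years = years_until_exhausted(INT_UNSIGNED_MAX, 1)
print(round(int_years, 1))
```

At one message per second this comes out at roughly 136 years. The key lasts proportionally less at higher rates, so the choice between int unsigned and bigint depends on the real message volume.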

answered 1 hour ago

Wiimm

955516

This messaging system is for a government-ish organization; 90% of messages are sent to users from systems (like "temperature in room is above 30C", etc.). It can generate millions of messages per hour, which is why I need to optimize every microsecond of server time. I cannot use a UserID key because of reverse engineering + GDPR (EU thing). Long story short: I need to have everything encrypted and fast. Every additional data field can cause a lot of extra unwanted database storage space.

– FeHora
41 mins ago
