Tuesday, January 15, 2013

Statement Cache vs Procedure Cache for ad hoc queries in Sybase ASE

Statement Cache
 
For ad hoc SQL queries, ASE wraps the SQL statement with a lightweight stored procedure (LWP) and changes any local variables into procedure parameters. ASE will assign an object ID to this LWP. The statement cache will save the SQL text and the corresponding LWP object ID.

For an incoming statement, ASE computes a hash value from the statement, and uses this hash value to search for a matching statement in the statement cache.
- If a match is found in the statement cache, then search for the object ID in the procedure cache.
- If a match is not found, ASE first wraps up the statement in LWP, assigns an object ID, and caches this statement in the statement cache. Then the LWP is transferred to procedure cache.

Procedure cache
The procedure cache is a MRU/LRU (most recently used/least recently used) chain of stored procedure query plans. The entries are identified by the object ID.

When there is no plan for the corresponding object ID in the procedure cache, ASE compiles the procedure and caches the plan. The plan is compiled using the assigned runtime values for the local variables. If the plan exists but is invalid, ASE will build a new LWP  using the text of the cached SQL statement.

Statement Cache Matching
ASE matches an ad hoc SQL statement to a cached statement by the SQL text and by login (particularly if both users have sa_role), user ID, database ID, and session state settings. So if two users submit the same query, as the bindings are different (for example, both may have table tb1, but obviously they're different objects, since they belong to different database), they're still treated as different queries.

It's also possible that two queries were identical except for one or more literal values. By default they're treated as different queries. By "enable literal autoparam", the 2nd query can reuse the plan generated for the 1st query. For example, the query "select * from tb1 where col =1" will be saved internally as "select * from tb1 where col = @@@v0_INT". In this way we can reduce the compilation time, the entries and storage in the statement and procedure cache.

Why wrap up statement in LWP?
Surely we can use a single statement cache that includes everything: statement text, plan, literal form, etc. In ASE, we decide that statement cache should only cache the text, and perform object binding. The procedure cache take care of all plan related info. By doing this functionality separation,  we have a clearer framework, and we can do memory allocation in a more consistent way.

When to enable literal parametrization?
For OLTP, queries are often simple and homogeneous. We may consider using this "enable literal autoparam" configuration and reuse previous plans, to save the compilation time and storage.
For Business Warehouse, queries are often more complex and skewed. The plans may be quite different for different parameters after optimization. So it is desired that queries be recompiled optimized for each set of different parameters.

Streamlined dynamic SQL:
The statement cache can also be used to store dynamic SQL statements (prepared statements) converted to LWPs. Because the statement cache is shared among all connections, dynamic SQL statements can be reused across connections
To enable using the statement cache to store dynamic SQL statements, set the "streamlined dynamic SQL" configuration options.

Saturday, January 12, 2013

Stored procedure processing in Sybase ASE

When a stored procedure is created:
- the ASCII text of the procedure is stored in the syscomments table
- normalized form of the procedure(query tree) is stored in the sysprocedures table.

Resolution phase: Query Tree generation
Parses the SQL in to query tree, and resolves all objects involved into their internal representations.
- table names are resolved into their object IDs
- column names are resolved into their column IDs.

Compilation phase: building query plan
A query plan is built on the first execution of a stored procedure.
- SQL Server reads the corresponding query tree from the sysprocedures table
- loads it into the procedure cache
- creates a query plan and places it in the procedure cache.

The query plan is the optimized data access path that SQL Server uses to execute the procedure, based on the following:
- The SQL stored in the query tree
- Statistics for each table and index referenced in the procedure
- The values of any parameters passed to the procedure on the first execution

Query plans are held only in procedure cache and not on disk, they must be rebuilt if the SQL Server was restarted.

Multi-copy Query Plan in cache

- Only one user at a time can execute a given copy of a procedure's query plan.
- If two or more users try to execute the same procedure at the same time, SQL Server creates an additional query plan based on the parameters used in the latter execution. When a user finishes using the procedure, the query plan is available in cache for reuse by anyone with execute permissions.

- The second query plan may not be the same as the first one
  > different set of parameters
  > an index is added to a referenced table
  > updated the statistics

Adding an index or updating statistics does not force recompilation

When there're mulitple copies of query plans, there's no control which execution plan will be chosen for a given execution. You may see unexpectedly different execution times for the same procedure given the same data and parameters. To remove this uncertainty,
- Dropping and re-creating the procedure will cause all existing plans to be flushed out of cache.
- To ensure that you always get your own plan, you can use exec with recompile or create with recompile . Creating a procedure with the with recompile option decreases performance because every execution causes a compilation.

Recompilation
Recompilation takes place whenever one of the following events occurs:
- The procedure is loaded from disk to the procedure cache.
- An index on any table referred to in the procedure is dropped.
- All copies of the execution plan in cache are currently in use,
  and a new user wants to execute the procedure.
- A procedure is executed using the "with recompile" option.
- A table is flagged with the sp_recompile stored procedure.
  This causes SQL Server to re-resolve and then recompile any procedures or triggers that access that table, at execution time.

Dropping an index or a table referenced by a query causes SQL Server to mark the affected procedure as needing re-resolution and recompilation at execution time.

Neither the update statistics nor the create index command causes an automatic recompilation of stored procedures.

Re-resolution
Re-resolution causes the generation of a new plan, updates the existing query tree in the sysprocedures table.

Re-resolution occurs when one of the tables changes in such a way that the query tree stored in the sysprocedures table may be invalid. The datatypes, column offsets, object IDs, or other parts of the table may have changed.

Deferred compilation
Previously all statements in a stored procedure are compiled before they were executed. This meant that the actual values of local variables or knowledge of temporary tables created within the stored procedure were not available during optimization, so the plan just use magical numbers.

When using deferred compilation, stored procedures that reference local variables or temporary tables are not compiled until they are ready to be executed. Then optimizer can select a better plan for executing the stored procedure for the given data set.

The same plan can be reused for subsequent executions of the stored procedure. So the plan is optimized specifically for the values and data set used in the first execution, it may not be a good plan for subsequent executions of the stored procedure with different values and data sets.

With the option of "with recompile", the stored procedure will be recompiled for each execution rather than using the plan from a previous execution.

Deferred name resolution
The deferred name resolution feature allows objects, except for user-defined datatype objects, to be resolved when the stored procedure is executed for the first time, instead of at creation time. Using deferred_name_resolution allows creating a procedure that references a table that does not yet exist.

Procedure cache
The procedure cache is a MRU/LRU (most recently used/least recently used) chain of stored procedure query plans. As users execute stored procedures, Adaptive Server looks in the procedure cache for a query plan to use. If a query plan is available, it is placed on the MRU end of the chain, and execution begins.

If no plan is in memory, or if all copies are in use, the query tree for the procedure is read from the sysprocedures table. It is then optimized, using the parameters provided to the procedure, and put on the MRU end of the chain, and execution begins. Plans at the LRU end of the page chain that are not in use are aged out of the cache.