As the hospital information system has gained more modules, and particularly with the adoption of electronic medical records over the last two years, large volumes of clinical diagnosis and management information are written to the database. The amount of data has grown dramatically, the business database has become very large, and transaction processing has slowed significantly.
Based on this problem, this article explores how to optimize the performance of a large database. It presents a number of strategies for optimizing database access, with particular attention to the effective analysis and design of SQL statements, in order to speed up execution, reduce network traffic, work more efficiently, and make full use of the system's capacity.
After years of informatization, the hospital has achieved remarkable results: the focus of its information systems has gradually shifted from the original billing and accounting functions toward clinical care, services, and patients.
With this growth, and in particular with the electronic medical records introduced over the last two years, the business database has become very large and processing speed has dropped significantly. In addition, because the business database is frequently subjected to large-volume data operations, queries and statistical reports often block business transactions or cause deadlocks, which seriously affects day-to-day work. Optimizing database performance by design, that is, increasing database throughput and reducing user wait time, is therefore of great importance.
Traditional database performance tuning considers the operating system, the client application software, the network, and other hardware. This approach only adjusts the environment surrounding the database; it brings temporary relief but does not fundamentally solve the problem.
In practice, the hospital information system (including its database system) has usually already been designed, and performance problems appear periodically as the data grows during operation. The optimization described in this article therefore builds on hardware upgrades, the physical design of the database, and improved relational normalization, and focuses on the effective analysis and design of SQL statements in order to speed up execution, reduce network traffic, work more efficiently, and make full use of the system's capacity.
1 Reasonable use of indexes
The most effective way to improve database query speed is to optimize indexes.
An index is a data structure built on a table that makes access to one or more of its rows more efficient. Its purpose is to avoid full table scans, reduce disk I/O, and speed up queries, so creating indexes on large tables matters a great deal. Not every table needs an index, however. An index usually improves the performance of SELECT, UPDATE, and DELETE statements (when they have to locate rows) but reduces the performance of INSERT statements, since both the table and the index must be written. Too many indexes also create maintenance overhead that lowers rather than raises overall performance, so indexes should be used in just the right measure. The principles for using indexes are as follows:
(1) Create indexes on columns that are frequently used in joins but are not declared as foreign keys; for columns that are rarely joined, leave index generation to the optimizer.
(2) Create indexes on columns that are frequently sorted or grouped (that is, used in GROUP BY or ORDER BY operations), but do not create too many indexes on tables that see frequent deletions and insertions.
(3) Create indexes on columns that are often used in conditional expressions and have many distinct values; do not index columns with few distinct values. For example, the "gender" column of an employees table has only two distinct values, "male" and "female", so there is no need to index it. Such an index would not improve query efficiency and would slow down updates.
(4) If several columns are to be sorted, create a composite (compound) index on them. Prefer narrow indexes, so that each data page holds more index rows and fewer I/O operations are needed.
(5) After a large volume of data in a table has been updated, drop and rebuild the indexes to restore query speed.
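As a rough illustration of principles (3) and (4), the sketch below builds a composite index and inspects the resulting query plans. It uses Python's built-in sqlite3 module as a stand-in because it runs anywhere; the employees table, its columns, and the index name idx_emp_dept_hired are hypothetical, and the plans chosen by SQL Server or another DBMS may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, dept TEXT, hired TEXT, gender TEXT)"
)
# Hypothetical sample data: 50 departments, two gender values.
rows = [
    (i, f"D{i % 50}", f"2004-01-{i % 28 + 1:02d}", "MF"[i % 2])
    for i in range(1, 1001)
]
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", rows)

# Composite index: filter on dept, then sort on hired within that dept.
conn.execute("CREATE INDEX idx_emp_dept_hired ON employees (dept, hired)")


def plan(sql):
    """Return the planner's description of how it would run the query."""
    return " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))


# The selective, indexed column is looked up through the index.
indexed_plan = plan("SELECT id FROM employees WHERE dept = 'D7' ORDER BY hired")
# The two-valued gender column is deliberately left unindexed, so it is scanned.
gender_plan = plan("SELECT id FROM employees WHERE gender = 'F'")

print(indexed_plan)
print(gender_plan)
```

The gender column is left without an index on purpose: with only two distinct values, an index would select about half the table anyway while still adding cost to every insert and update.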
In short, indexes must be created with care, and the necessity of each one should be analyzed before it is built. Too many indexes, too few indexes, or incorrect indexes do nothing to improve database performance.
2 SQL statement tuning
SQL is a very flexible language: the same functionality can often be expressed with several different statements, but their execution efficiency may differ greatly. In any database system, therefore, sensible optimization of SQL statements will greatly improve the performance of the whole system. Every SQL statement is executed in three phases: parsing (syntax analysis), execution, and fetching the data.
Figure 1 SQL statement execution process
The performance differences between SQL statements are especially evident in large or complex database environments, such as some of the major tables in the HIS. After a period of review, we found that poorly performing SQL statements stem mainly from inappropriate index design, insufficient join conditions, and WHERE clauses that cannot be optimized, along with other inappropriate statements and operations. Once these were properly optimized, execution speed improved dramatically. The following sections discuss these three aspects.
2.1 LIKE operator
The LIKE operator supports wildcard queries, and combinations of wildcards can express almost any search, but used carelessly it causes performance problems. For example, like 'a%' uses an index, whereas like '%a' does not. With like '%a%', the query time is proportional to the total length of the field values, so the column should be of type VARCHAR rather than CHAR.
2.2 Limit the rows returned
Use a WHERE clause in SELECT statements to limit the number of rows returned and avoid table scans. Returning unnecessary data wastes the server's I/O resources and increases network load, which degrades performance. If the table is large, it is locked for the duration of a table scan and other connections are prevented from accessing it, with serious consequences. You can use the TOP clause to limit the results returned. When returning multiple rows, avoid cursors wherever possible, because they consume a lot of resources; use the datastore instead.
2.3 UNION operator
When combining tables, UNION filters out duplicate records: the combined result set goes through an extra pass that deletes duplicate records before the result is returned.
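The difference between the two operators can be sketched with a current table and a history table that share no rows. The example uses Python's built-in sqlite3 module as a stand-in; the table names fee_current and fee_history and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fee_current (fee_id INTEGER, amount REAL);
CREATE TABLE fee_history (fee_id INTEGER, amount REAL);
INSERT INTO fee_current VALUES (3, 30.0), (4, 40.0);
INSERT INTO fee_history VALUES (1, 10.0), (2, 20.0);
""")

# UNION performs duplicate elimination even though no duplicates exist here.
union_rows = conn.execute(
    "SELECT fee_id, amount FROM fee_current "
    "UNION SELECT fee_id, amount FROM fee_history"
).fetchall()

# UNION ALL simply appends the second result to the first.
union_all_rows = conn.execute(
    "SELECT fee_id, amount FROM fee_current "
    "UNION ALL SELECT fee_id, amount FROM fee_history"
).fetchall()

# Same rows either way, so the deduplication pass bought nothing.
print(sorted(union_rows) == sorted(union_all_rows))
```

The two forms are interchangeable only when the inputs are known not to overlap; if the same row can appear in both tables and must be returned once, UNION is still required.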
In practice, most such queries produce no duplicate records; the most common case is a UNION of a current (in-process) table and a history table. It is therefore recommended to use UNION ALL in place of UNION, because UNION ALL simply merges the two results and returns them.
2.4 Between and IN
BETWEEN is sometimes faster than IN, because BETWEEN can locate the range more quickly through the index. For example:
select * from YF_KCMX where YPXH in (12,13)
select * from YF_KCMX where YPXH between 12 and 13
Rows can usually be eliminated in the WHERE clause before GROUP BY and HAVING are applied, so try not to use HAVING to do the work of removing rows.
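A minimal sketch of this point, again using Python's built-in sqlite3 module as a stand-in: the hypothetical fees table is filtered once in HAVING and once in WHERE. Both forms return the same groups, but the WHERE form discards the unwanted rows before any grouping work is done.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fees (dept TEXT, amount REAL);
INSERT INTO fees VALUES
  ('surgery', 100.0), ('surgery', 50.0),
  ('lab', 20.0), ('lab', 30.0),
  ('pharmacy', 500.0);
""")

# Row filter placed in HAVING: every row is grouped first, then groups are dropped.
via_having = conn.execute(
    "SELECT dept, SUM(amount) FROM fees "
    "GROUP BY dept HAVING dept IN ('lab', 'surgery') ORDER BY dept"
).fetchall()

# Row filter placed in WHERE: unwanted rows never reach the grouping step.
via_where = conn.execute(
    "SELECT dept, SUM(amount) FROM fees "
    "WHERE dept IN ('lab', 'surgery') GROUP BY dept ORDER BY dept"
).fetchall()

# HAVING remains the right place for conditions on aggregated values.
big_groups = conn.execute(
    "SELECT dept, SUM(amount) FROM fees "
    "WHERE dept IN ('lab', 'surgery') "
    "GROUP BY dept HAVING SUM(amount) > 100 ORDER BY dept"
).fetchall()

print(via_having, via_where, big_groups)
```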
The optimal order is as follows: the WHERE clause of the SELECT selects all the appropriate rows, GROUP BY groups them for the statistics, and the HAVING clause removes the unwanted groups. This way GROUP BY and HAVING incur little overhead and the query is fast, whereas grouping and applying HAVING to a large number of rows is very resource intensive. If the purpose of GROUP BY is only to group, not to compute aggregates, DISTINCT is faster.
2.5 Attention to detail
In general, avoid the following: "<>", "!=", "!>", "!<", "NOT", "NOT EXISTS", "NOT IN", "NOT LIKE", and "LIKE '%5'", because they do not use an index and result in a table scan.
NOT IN scans the table repeatedly; replace it with EXISTS, NOT EXISTS, IN, or LEFT OUTER JOIN, and LEFT OUTER JOIN in particular. EXISTS is faster than IN, and NOT operations are the slowest. If a column contains NULL values, its index does not take effect, and "<>", "!=", "!>", and similar operators still cannot be optimized and do not use indexes.
Do not apply functions such as SUBSTRING or CONVERT to column names in the WHERE clause. If a function is unavoidable, create a computed column and build the index on that instead. Alternatively, reword the condition, for example change
WHERE SUBSTRING(firstname,1,1) = 'm'
to
WHERE firstname like 'm%'
which permits an index scan. MIN() and MAX() can also make use of an appropriate index.
Consider the statement
select * from ZY_FYMX where FYDJ > 3000
If FYDJ is of type float, the optimizer rewrites the constant as Convert(float, 3000), because 3000 is an integer. We should write 3000.0 when programming rather than leave the conversion to the DBMS at run time. The same applies to conversions between character and integer data. The statement should read:
select * from ZY_FYMX where FYDJ > 3000.00
2.6 Avoid correlated subqueries
If a column appears both in the main query and in the WHERE clause of a subquery, then the subquery will most likely have to be re-executed every time the column value in the main query changes.
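One common way to avoid this re-execution is to compute the per-group value once in a derived table and join to it. The sketch below shows the two forms side by side using Python's built-in sqlite3 module as a stand-in; the rx_items table and its data are hypothetical, and whether the join form is actually faster depends on the DBMS and its optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rx_items (cfsb INTEGER, ypxh INTEGER, ypdj REAL);
INSERT INTO rx_items VALUES
  (1, 101, 10.0), (1, 102, 30.0),
  (2, 201, 5.0),  (2, 202, 5.0),
  (3, 301, 8.0),  (3, 302, 2.0), (3, 303, 2.0);
""")

# Correlated form: the inner query refers to a.cfsb from the outer query,
# so it is logically evaluated once per outer row.
correlated = conn.execute("""
    SELECT a.cfsb, a.ypxh FROM rx_items a
    WHERE a.ypdj > (SELECT AVG(b.ypdj) FROM rx_items b WHERE b.cfsb = a.cfsb)
    ORDER BY a.cfsb, a.ypxh
""").fetchall()

# Join form: the averages are computed once per prescription, then joined.
joined = conn.execute("""
    SELECT a.cfsb, a.ypxh
    FROM rx_items a
    JOIN (SELECT cfsb, AVG(ypdj) AS avg_dj FROM rx_items GROUP BY cfsb) g
      ON g.cfsb = a.cfsb
    WHERE a.ypdj > g.avg_dj
    ORDER BY a.cfsb, a.ypxh
""").fetchall()

print(correlated, joined)
```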
The more levels of nesting a query has, the lower its efficiency, so subqueries should be avoided as far as possible. If a subquery is unavoidable, filter out as many rows as possible inside it.
3 SQL case study
3.1 Case study 1
The hospital database is 28 GB in size. Dispensing statistics are computed from the tables MS_CF01 and MS_CF02, of which MS_CF02 holds 1000 million records. A dispensing statistics query covering one month had still not returned a result after 30 minutes, seriously disrupting normal business, and was aborted.
The SQL statement previously used for the statistics was as follows:
select sum(MS_CF02.YPSL*MS_CF02.YPDJ*MS_CF02.CFTS) as total
from MS_CF01,MS_CF02
where MS_CF01.CFSB=MS_CF02.CFSB and MS_CF01.CFLX=1
and (MS_CF01.FYBZ=1 or MS_CF01.FYBZ=3)
and MS_CF01.FYRQ>='2004.3.1 00:00:00'
and MS_CF01.FYRQ<='2004.3.30 00:00:00'
and MS_CF01.ZFPB=0
Based on an analysis of the system (applicable to MS SQL Server only), the following optimization significantly improves performance:
select top 1 CFSB from MS_CF01 where FYRQ>='2004.3.1 00:00:00'
order by CFSB -- get the smallest CFSB, for example 3198724
select top 1 CFSB from MS_CF01 where FYRQ<='2004.3.30 00:00:00'
order by CFSB desc -- get the largest CFSB, for example 4178763
select sum(MS_CF02.YPSL*MS_CF02.YPDJ*MS_CF02.CFTS) as total
from MS_CF01,MS_CF02
where MS_CF01.CFSB=MS_CF02.CFSB and MS_CF01.CFLX=1
and MS_CF02.CFSB>=3198724 and MS_CF02.CFSB<=4178763
and (MS_CF01.FYBZ=1 or MS_CF01.FYBZ=3)
and MS_CF01.ZFPB=0
All of the statements together complete and return the result in no more than 18 seconds.
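The idea behind the rewrite can be sketched on a toy data set: the date range is first translated into a CFSB key range, and the large detail table is then restricted by that key range instead of by date. The sketch uses Python's built-in sqlite3 module as a stand-in for MS SQL Server, so MIN/MAX replace TOP 1 and both bounds are fetched in one query. The sample rows, the ISO date format, and the index name idx_cf02_cfsb are hypothetical, and the approach assumes that CFSB values are assigned in increasing order of FYRQ; without that assumption the two queries need not agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MS_CF01 (CFSB INTEGER PRIMARY KEY, CFLX INTEGER, FYBZ INTEGER,
                      FYRQ TEXT, ZFPB INTEGER);
CREATE TABLE MS_CF02 (CFSB INTEGER, YPSL REAL, YPDJ REAL, CFTS INTEGER);
CREATE INDEX idx_cf02_cfsb ON MS_CF02 (CFSB);
INSERT INTO MS_CF01 VALUES
  (1, 1, 1, '2004-02-27 10:00:00', 0),
  (2, 1, 1, '2004-03-01 09:00:00', 0),
  (3, 1, 3, '2004-03-15 14:30:00', 0),
  (4, 2, 1, '2004-03-20 08:00:00', 0),
  (5, 1, 1, '2004-03-29 16:00:00', 1),
  (6, 1, 1, '2004-04-02 11:00:00', 0);
INSERT INTO MS_CF02 VALUES
  (1, 1, 5.0, 1), (2, 2, 10.0, 1), (2, 1, 3.0, 2), (3, 3, 2.0, 5),
  (4, 1, 100.0, 1), (5, 1, 50.0, 1), (6, 4, 1.0, 1);
""")

# Filters shared by the original and the rewritten statement.
FILTERS = """
  MS_CF01.CFSB = MS_CF02.CFSB AND MS_CF01.CFLX = 1
  AND (MS_CF01.FYBZ = 1 OR MS_CF01.FYBZ = 3) AND MS_CF01.ZFPB = 0
"""
TOTAL = ("SELECT SUM(MS_CF02.YPSL * MS_CF02.YPDJ * MS_CF02.CFTS) "
         "FROM MS_CF01, MS_CF02 WHERE" + FILTERS)
start, end = "2004-03-01 00:00:00", "2004-03-30 00:00:00"

# Original form: restrict by dispensing date on the header table.
original_total = conn.execute(
    TOTAL + " AND MS_CF01.FYRQ >= ? AND MS_CF01.FYRQ <= ?", (start, end)
).fetchone()[0]

# Step 1: translate the date range into a key range on the small header table.
low, high = conn.execute(
    "SELECT MIN(CFSB), MAX(CFSB) FROM MS_CF01 WHERE FYRQ >= ? AND FYRQ <= ?",
    (start, end),
).fetchone()

# Step 2: restrict the large detail table by its indexed key instead of by date.
optimized_total = conn.execute(
    TOTAL + " AND MS_CF02.CFSB >= ? AND MS_CF02.CFSB <= ?", (low, high)
).fetchone()[0]

print(low, high, original_total, optimized_total)
```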