AMFC: A Novel Archive Modeling Based on Data Cluster and Filtering

File archives now need to be managed appropriately so that records are easy to find and maintain. The archiving problem in question is how to support the process of finding data with a considerable number

This analysis required selecting and weighing several methods relevant to the topic. The first is the serial dilution method, which focuses on repetition in a completely randomized design. The second is the multiclass classification method, which focuses on observed data and on analyzing newly observed data (Lalis, 2016). The third is RBAC (Role Based Access Control), a method for managing authority (Lin, Li, & Ma, 2014). These methods rest on different bases and principles (Jayalakshmi & Pandian, 2014). Applying them in this analysis illustrates the efficiency and accuracy that can be achieved for the data (Borhanifar & Sadri, 2014).
Information management is used to provide information services (Kostoska, Chorbev, & Gusev, 2014). Records management requires collaboration and coordination (Poba-Nzaou, 2016). Archive filing needs defined requirements and conditions so that it can be well controlled (Klampfl, Granitzer, Jack, & Kern, 2014) and so that management becomes centralized (Gospodinov, Gospodinova, & Cheshmedijev, 2014). Several international journals serve as references on how effectively these methods can be used. The alphabetical and numerical methods are used to initialize data and create new codes (Nyers, Garbai, & Nyers, 2014). The K-means clustering method partitions a dataset into clusters (Cebeci & Yildiz, 2015). Under this system, each file is named and placed logically so that it can be stored easily. The method also controls how information is stored and retrieved.
The literature review was carried out to understand the topic under discussion. The journal article titled "Web-Based Document Filing Information System" describes records in a company's business processes, uses the PHP programming language, and produces archive reports for the Academic and Student Administration Bureau. It differs from the present research in its implementation: that work uses plain PHP, while this research uses the CodeIgniter framework. The journal article titled "Information Systems On Archives In State Gembong Kab. Multi User-Based Pati" discusses archiving that still relies on manual methods as the main archiving medium, and it produces an information system for archiving. The difference lies in how the archive itself is managed: that journal builds an archive information system that follows only the desired workflow, whereas this study applies the K-Means Clustering method so that the types of archives are easier to group.

THE PROBLEM
The problem addressed in this study is the absence of storage and management of data and archives at the ATR / BPN Sidoarjo office. Management was therefore needed, and the researchers used the alphabetical and numerical methods together with K-means clustering. The methods are applied in building an application whose results are expected to address the problems that exist at the ATR / BPN office.

The problem search used interviews as a sampling technique whose results are analyzed and tested. The interviews consisted of asking the external counselor directly about what would be tested and what was needed. After the problem search, the next step is data collection: determining how to obtain the data to be managed and requesting the raw data directly from the external counselor, so that it can later be used for data management. Once the raw data has been obtained, the next step is to process it using the alphabetical method.

3. Process Data Alphabetically and Numerically
a. Raw Data
Managing data requires data to manage. The raw data, which is the output of data collection, is the input for applying the methods.

b. Application Of Methods
Applying the methods is the step that follows raw data management, and it serves as the reference for the management to be carried out.
The methods are applied in accordance with the rules and agreements confirmed beforehand, using the alphabetical and numerical methods.

c. Alphabetical Method
The alphabetical method is used to obtain the alphabetical data that exists in the raw data. It builds on the previous process, namely data collection, so that the raw data can be managed with this method. The equation used in this method is equation 1.
Equation 1 is used to create a new code based on the village reference data. The raw data is processed with equation 1 and then validated: if the result is incorrect, the process is repeated; if it is correct, the alphabetical data has been obtained and the next process is carried out using the K-Means Clustering method.

d. Numerical Method
The numerical method is used to obtain the numerical data that exists in the raw data. It builds on the previous process, namely data collection, so that the raw data can be managed with this method. The equation used in this method is equation 2. Equation 2 is used to create a new code based on the village reference data. The raw data is processed with equation 2 and then validated: if the result is incorrect, the process is repeated; if it is correct, the numerical data has been obtained and the next process is carried out using the K-Means Clustering method.

e. Application of K-Means Clustering Method
Applying the K-Means Clustering method is the process of grouping data using the K-Means Clustering equations, which are described in more detail in the section K-Means Clustering Method.

K-Means Clustering Method
This method separates an aggregated data set into smaller groups. It works by selecting some of the data from the whole data set, and the groups are formed by minimizing the Euclidean distance between each data item and its center point.

a. Alphabetical and Numeric Data
The alphabetical and numerical data obtained from the alphabetical and numerical methods are processed with the K-Means Clustering method, following the stages of the K-means clustering rules. The first step is to initialize the data, as described in the data initialization process.

b. Initialization Data
Data initialization uses the Excel VLOOKUP formula, applied to the references previously created with the alphabetical and numerical methods.
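The initialization step can be illustrated with a minimal Python sketch of an exact-match lookup, which is roughly what VLOOKUP does in Excel. The reference tables, the place names, the codes, and the function name `initialize` are all hypothetical; they are not the codes used in the study.

```python
# Hypothetical reference tables: names and codes are illustrative only,
# not the codes used in the study.
VILLAGE_CODES = {"Village A": 1, "Village B": 2, "Village C": 3}
DISTRICT_CODES = {"District X": 1, "District Y": 2}


def initialize(records, village_codes, district_codes):
    """Replace (district, village) names with numeric codes by exact match."""
    coded = []
    for district, village in records:
        # A name missing from a reference table raises a clear error,
        # the analogue of VLOOKUP returning #N/A.
        if district not in district_codes or village not in village_codes:
            raise KeyError(f"no code for {district!r} / {village!r}")
        coded.append((district_codes[district], village_codes[village]))
    return coded


raw = [("District X", "Village B"), ("District Y", "Village C")]
coded = initialize(raw, VILLAGE_CODES, DISTRICT_CODES)
print(coded)
```

Each record becomes a pair of numbers, which is the form the clustering step needs as input.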

Numerical and Alphabetical Methods
These methods are used to present data efficiently in order to find a significant parameter (Haque, Amale, & Kamble, 2014). The alphabetical method follows the approach of a dictionary of terms and is used to initialize the data (Abbas & Yasin, 2016). The numerical method is used to initialize numbers according to the specified parameters. With these methods, initialization needs to establish an appropriate ordering of the data (Kiss, Genge, Haller, & Sebestyén, 2014). The K-means method separates an aggregated data set into smaller groups. It works by selecting some of the data from the whole data set (Wang et al., 2015), and the groups are formed by minimizing the Euclidean distance between each data item and its center point. The workflow is as follows:

Figure 1 data management process
The raw data is managed with the alphabetical and numerical methods, followed by K-means clustering, with the aim of eliminating redundancy and reducing a large amount of data (Harb, Makhoul, & Couturier, 2015). The data is then grouped into separate clusters (Puzyn, Mostrag-Szlichtyng, Gajewicz, Skrzyński, & Worth, 2014), with the cluster centers determined at random (Li, Song, Wei, Lu, & Zhu, 2015). The parameters entered into the K-means clustering algorithm are the village and the district, and the steps use the following formulas:
a. Formula 1 implements the numerical and alphabetical methods as an Excel formula that initializes each set of data based on its code. Formula 2 is then used to select k data items as centroids. After that, formula 3 calculates the distance from each data item to each cluster. In formula 3, d2 is the initialized sub-district code, i3 is the random cluster value on c1, e4 is the initialized village code, and j3 is the random cluster value on c1. The value for c2 is found with the same formula 3; only the values i3 and i4 are replaced by j3 and j4, which hold the random cluster on c2. The same applies to c3, using the cluster values on c3.
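The distance step can be sketched in Python, assuming that formula 3 is the usual two-dimensional Euclidean distance between a record's (sub-district code, village code) pair and a cluster center. The function name, the sample record, and the center values are hypothetical and are not taken from the study's data.

```python
import math


def distance_to_centers(record, centers):
    """Return the Euclidean distance from one coded record to every center."""
    district, village = record
    return [math.sqrt((district - cx) ** 2 + (village - cy) ** 2)
            for cx, cy in centers]


# Hypothetical values: one coded record and three randomly chosen centers.
record = (3, 5)
centers = [(0, 1), (3, 2), (6, 9)]
distances = distance_to_centers(record, centers)
nearest = distances.index(min(distances))  # index of the closest center
print(distances, nearest)
```

In the spreadsheet version, the same calculation is repeated once per cluster column; here the list comprehension covers all centers in one pass.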

Data Source
In this study, the data source is the PTSL project data at the ATR / BPN Sidoarjo office. Only part of the available data is shown, in Figure 2.

Data Processing
Before the data is grouped according to the criteria, the raw data is converted into numbers by initializing it with the alphabetical and numerical methods. From the raw data, only two attributes are taken as the basis for testing: the village and the district. Figure 1 shows the result of initialization with the alphabetical and numerical methods using formula 1, and table 1 lists the initialized data. Based on the ranges specified in the PTSL data, as in table 2, it can be concluded that the PTSL data was grouped to the assessment center by random selection rather than by a method.

Process data
Once the data has been initialized, the next step is to process it into groups. The converted data is processed with K-means clustering, following the steps in figure 3.

Figure 3 process data
At this stage, the raw data is initialized with the alphabetical and numerical methods in Excel using the VLOOKUP formula, as in formula 1, and the number of clusters is then determined, with the centers selected at random. Next, the distance from each object to each centroid is calculated using the formula in the method section, which produces the data in Table 3. Membership is then assigned: each data item is placed in the cluster for which its distance, and squared distance, is smallest. This uses formula 4, formula 5, and formula 6, and produces the data in Table 4. The new cluster centers are then computed using formula 7, formula 8, formula 9, formula 10, formula 11, and formula 12, which produces the data in Table 5. After that, the distances between the cluster centers are calculated: c1 to c2, c1 to c3, and c2 to c3. The model is the same, but here formula 15, formula 16, and formula 17 use the values of each cluster center. The resulting distances between the cluster centers are shown in Table 6.
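The membership and center-update steps of this stage can be sketched as one iteration of standard K-means. This assumes that formulas 4 to 6 select the minimum distance and that formulas 7 to 12 take the mean of each cluster's members, which is the textbook procedure and not confirmed by the formulas themselves. The points, starting centers, and function names are hypothetical.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Keep an empty cluster's center where it was.
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(axis) / len(members) for axis in zip(*members)))
    return new_centroids


# Hypothetical coded records (district code, village code) and start centers.
points = [(1, 1), (1, 2), (2, 4), (3, 4), (4, 6), (5, 6)]
centroids = [(1, 1), (2, 4), (4, 6)]
labels = assign(points, centroids)
new_centroids = update(points, labels, centroids)
print(labels, new_centroids)
```

Repeating `assign` and `update` until the labels stop changing gives the usual K-means loop; the study instead uses a ratio-based stopping check.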
$$\sqrt{(1 - 2)^2 + (1 - 4)^2} \tag{13}$$

Equation 13 gives the distance between the centers of cluster 1 and cluster 2: the square root of the sum of the squared differences between the values of cluster 1 and cluster 2, which yields the value in table 6.

$$\sqrt{(1 - 4)^2 + (1 - 6)^2} \tag{14}$$

Equation 14 gives the distance between the centers of cluster 1 and cluster 3 in the same way: the square root of the sum of the squared differences between the values of cluster 1 and cluster 3, which yields the value in table 6.

$$\sqrt{(2 - 4)^2 + (4 - 6)^2} \tag{15}$$

Equation 15 gives the distance between the centers of cluster 2 and cluster 3: the square root of the sum of the squared differences between the values of cluster 2 and cluster 3, which yields the value in table 6. The distances between the cluster centers are then summed. Here the ratio equals 0, so the iteration does not need to continue, because the grouping of the data has already been found.
The data in tables 1 to 7 above can then be entered into the RapidMiner application to be tested and to produce results.

Needs Analysis
The first phase of this analysis is the needs analysis, which identifies the key requirements of the process and supports the processing. It reviews whether the raw data can be compiled in accordance with the expected demand. This is done so that errors do not occur at later stages.

Data processing using the K-Means Clustering method
The input to the K-Means Clustering algorithm is the data that has been initialized with the numerical and alphabetical methods. The data is grouped by specifying the desired number of clusters. The distance from each data item to each randomly selected cluster center is then calculated using formula 3.
Next, the placement of each data item in a group is determined using a formula that finds the minimum distance and the squared distances. After that, the cluster centers are determined, and the distances between the cluster centers produce the BCV, which is used to find the value of the ratio. The value of this ratio determines whether the process is repeated or not. The results in the tables are verified using RapidMiner K-Means. With a ratio of 0, the RapidMiner results can be seen in table 8; the amount of data matches the amount of data obtained previously, as can be seen from Figure 6. This experiment gives results that answer each identified problem: file archive data at the ATR / BPN Sidoarjo office can be managed appropriately and in accordance with expectations. File management using the alphabetical, numerical, and K-means grouping methods can help control data, as the experiments show. The application supports the expected results by producing a clear grouping of data in accordance with the tests that have been carried out. After all repairs and management are carried out, the results can be used as a new reference for implementation in applications and can also be applied to case studies on file management.

SUGGESTION
Suggestions for further research and development include managing operational vehicles and grouping archives using Android, which would be easier and simpler to use and would not require a computer or laptop.