Automatic Data Transformation Using Large Language Model – An Experimental Study on Building Energy Data

—Existing approaches to automatic data transformation fall short of the requirements of many real-world scenarios, such as the building sector. First, there is no convenient interface through which domain experts can easily supply domain knowledge. Second, they incur significant training-data collection overheads. Third, their accuracy suffers on complicated schema changes. To address these shortcomings, we present a novel approach that leverages the unique capabilities of large language models (LLMs) in coding, complex reasoning, and zero-shot learning to generate SQL code that transforms source datasets into target datasets. We demonstrate the viability of this approach with an LLM-based framework, termed SQLMorpher, which comprises a prompt generator that integrates the initial prompt with optional domain knowledge and historical patterns from external databases, and an iterative prompt optimization mechanism that automatically improves the prompt based on flaw detection. The key contributions of this work include (1) pioneering an end-to-end LLM-based solution for data transformation, (2) developing a benchmark of 105 real-world building energy data transformation problems, and (3) conducting an extensive empirical evaluation in which our approach achieved 96% accuracy across the 105 problems. SQLMorpher demonstrates the effectiveness of LLMs on complex, domain-specific challenges and highlights their potential to drive sustainable solutions.


I. INTRODUCTION
A recent study [1] showed that in 2022, end-use energy consumption by the building sector accounted for 40% of total US energy consumption. This indicates that the energy management of buildings plays an important role in meeting the goals of energy sustainability [2]. Automatic building energy management, including design, certification, compliance, real-time control, operation, and policy-making, requires the integration of data from diverse sources in both the private and public sectors. Harmonizing these data, as illustrated in Fig. 1, remains a manual process. Extensive labor and expertise are thus required throughout the data lifecycle in the building sector. However, state-of-the-art data transformation tools, such as Auto-Transform [3], Auto-Pipeline [4], and Auto-Tables [5], are not effective due to the following gaps:
• These tools are not publicly available and are based on supervised learning approaches, requiring non-trivial data labeling and training overheads.
• The data transformation logic in the building sector involves multiple combinations of aggregation, attribute flattening, merging, pivoting, and renaming relationships between the source and the target. These are more complicated than existing data transformation benchmarks [4], [5]. In addition, the accuracy achieved by state-of-the-art tools on these simpler benchmarks is below 80% [4], [5], indicating that human effort is still required to fix a significant portion of the cases.
• Converting a building dataset to a target schema requires domain knowledge about both the source and target schemas, which is available in domain-specific knowledge bases as illustrated in Fig. 4 and Fig. 6. However, there is no easy way to directly supply such knowledge to existing data transformation tools.
To close these gaps, before this work, we considered fine-tuning a pre-trained transformer model such as BERT [6] to directly transform source data to target data [7]. However, we identified several shortcomings of this approach. First, it is hard to formulate one unified predictive problem that transforms data for all types of schema changes. Second, the transformation process is too slow to handle large-scale data. Third, preparing a fine-tuning dataset for each task is also challenging.
This work proposes a novel and better approach, termed SQLMorpher, which solves the problem in two steps. The first step is formulated as a Text2SQL problem [8]-[11]: we apply an LLM to generate Structured Query Language (SQL) code that converts the source dataset(s) into the target dataset. This step focuses on schema mapping, so we do not need to upload the entire source datasets. The second step applies the generated SQL code to efficiently transform the entire dataset in relational databases.
The idea is motivated by several key observations: (1) LLMs demonstrate superior performance in complex reasoning tasks. In the building sector, domain experts often document the semantics of the source and target tables in natural language. LLMs can better understand such descriptions and reason about the relationships between the source and target than smaller pre-trained models. (2) LLMs have demonstrated strong coding and code-explanation capability [12]. In addition, SQL's declarative nature makes it easier to map data transformation queries in natural language to SQL queries. (3) LLMs have outstanding capabilities in zero-shot and few-shot adaptation and generalization, so none or only a few training examples are needed. Existing Text2SQL works [8]-[11] focus on selection queries but cannot handle creation and modification queries. Furthermore, the utilization of LLMs for our target scenario is not only unique but also faces new challenges:
• Schema Change Challenge: Different from existing Text2SQL works, SQLMorpher needs to generate a query that maps data from the source schema to the target schema.
• Prompt Engineering Challenge: Designing a unified prompt that handles different types of schema changes and data transformation contexts is tedious and error-prone.
• Accuracy Challenge: Most importantly, the code generated by LLMs can be error-prone and even dangerous (e.g., leading to security concerns such as SQL injection attacks).
To address these challenges, the proposed system, as illustrated in Fig. 2, consists of the following unique components. First, a unique prompt generator provides a unified prompt template. It allows external tools to be easily plugged into the component, such as domain-specific databases, vector databases that index historically successful prompts, and existing schema change detection tools [13]-[15], to retrieve various optional information. The prompt generator compresses the prompt size by using a few sample tuples in place of the full source datasets, while the generated SQL code remains applicable to transforming the entire source datasets.
Second, an automatic and iterative prompt optimization component executes the SQL code extracted from the LLM response in a sandbox database that is separated from user data. It also automatically detects flaws in the last prompt and adds a request to fix them in the new prompt. Examples of such flaws include errors mentioned in the last LLM response, errors that occurred when executing the SQL query generated by the LLM, and insights extracted from these errors based on rules.
Our key contributions are summarized as follows:
• We are the first to apply LLMs to generate SQL code for data transformation. Our system, termed SQLMorpher, includes a prompt generator that can be easily integrated with domain-specific knowledge, high-level schema-change hints, and historical prompt knowledge. It also includes an iterative prompt optimization tool that identifies flaws in the prompt for enhancement. We implemented an evaluation framework based on SQLMorpher. (See details in Sec. III)
• We set up a benchmark that consists of 105 real-world data transformation cases in 15 groups in the smart building domain. We document each case using the source schema, source data examples, the target schema, available domain-specific knowledge, the schema hints, and a working transformation SQL query for users to validate the solutions. We made the benchmark publicly available to benefit multiple communities in smart building, Text2SQL, and automatic data transformation 1 2. (See details in Sec. IV-B)
• We have conducted a detailed empirical evaluation with ablation studies. SQLMorpher using ChatGPT-3.5-turbo-16K achieved up to 96% accuracy on 105 real-world cases in the smart building domain. We verified that our approach can generalize to scenarios beyond building energy data, such as COVID-19 data and existing data transformation benchmarks. We also managed to compare SQLMorpher to state-of-the-art data transformation tools such as Auto-Pipeline (though these tools are not publicly available) on their commercial benchmark. The results showed that SQLMorpher achieves 81% accuracy without using any domain knowledge and 94% accuracy using domain knowledge, both of which outperform Auto-Pipeline's accuracy on this benchmark. We also summarize a list of insights and observations that are helpful to the community. (See details in Sec. IV)

II. RELATED WORK
Existing Text2SQL works [8]-[11] automatically generate SQL code to answer text-based questions on relations. However, existing Text2SQL tools focus on generating selection queries. To our knowledge, no Text2SQL tools support the modification queries (e.g., insertions) required by data transformation. In addition, we surveyed multiple Text2SQL benchmarks, including Spider [16], SQUALL [17], Criteria2SQL [18], KaggleDBQA [19], and so on. However, we did not find any data transformation use cases in these benchmarks, which also indicates that data transformation problems are not the focus of today's Text2SQL research.
Existing automatic data transformation works [3]-[5], [20]-[28] fall into two categories: (1) Transform-by-Example (TBE) [20]-[27] infers transformation programs from user-provided input/output examples and has been incorporated into popular software such as Microsoft Excel, Power BI [29], and Trifacta [30]. However, these works require users to provide examples of the transformed tuples, which is challenging for complicated data transformations. (2) To address this issue, Transform-by-Target (TBT) [3]-[5] was recently proposed. Works in this category, such as Auto-Transform [3], Auto-Pipeline [4], and Auto-Tables [5], transform data based only on input/output data schemas and, optionally, output data patterns. As mentioned, they learn a pipeline of data transformation operators using deep learning. They cannot easily integrate domain-specific knowledge represented in natural language or other formats. Although those tools are not publicly available, we conducted a comparison by running our approach on their benchmark, as detailed in Sec. IV-F.

III. SQLMorpher SYSTEM DESIGN
As illustrated in Fig. 2, the SQLMorpher system consists of a prompt generator, a large language model (LLM), a SQL execution engine, and a component for iterative prompt optimization. In this section, we describe each component in detail. Although SQLMorpher is primarily engineered to evaluate the LLM in our target use scenarios, it is a first-of-a-kind design with research value in defining the workflows and interfaces between the LLM and external tooling for this unique data transformation problem.

A. Prompt Generation
We designed a prompt template as illustrated in Fig. 3. A user must provide only minimal information, such as the source and target table schemas and examples of the tuples in the source dataset. Although a source table could contain many tuples, SQLMorpher shows the LLM only a few examples, which are sufficient to generate code that correctly transforms the whole table. Although more sophisticated sampling techniques could be applied here, we chose to randomly sample five source tuples in the evaluation. If source tuples are not available, we ask the LLM to generate five source tuples.
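The sampling step above can be sketched as follows; `sample_source_tuples` is a hypothetical helper name, and the logger rows are made-up illustrative data:

```python
import random

def sample_source_tuples(rows, k=5, seed=None):
    """Pick at most k example tuples to include in the prompt.

    SQLMorpher only shows the LLM a few rows; the generated SQL is
    then applied to the full source table in the database.
    """
    rng = random.Random(seed)
    if len(rows) <= k:
        return list(rows)
    return rng.sample(rows, k)

# Illustrative hourly load readings from a source table.
rows = [("2023-08-01 00:00", 1.7), ("2023-08-01 01:00", 2.1),
        ("2023-08-01 02:00", 1.9), ("2023-08-01 03:00", 2.4),
        ("2023-08-01 04:00", 2.2), ("2023-08-01 05:00", 2.6)]
examples = sample_source_tuples(rows, k=5, seed=42)
```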
All other information is optional but helpful for complicated transformation cases. We designed the prompt generator to easily retrieve additional information from external databases. Such information includes: (1) Domain-specific information, which explains the semantics of each attribute in the source table and the target table. Given the diverse and vast training corpora of LLMs, such explanations are not required for many domains, so this information is marked as optional. However, we found that using domain knowledge to enhance the prompt can be critical for many smart building data transformation cases. This information can be retrieved from a domain-specific database, as illustrated in Fig. 4. In this example, the basic information plus the domain-specific information that explains only the target table is sufficient to address the first example, as shown in Fig. 1a.
In this example, the response to the basic prompt has only one error in the aggregation function, which should be SUM rather than MAX. We append the following domain-specific explanation to the basic prompt:

"Note that the second table records the total load in each hour of each day."

With this knowledge, the response correctly inserts the data from source1 into target1, populating the hourly columns with SUM.

(2) Schema change hints suggest how the source schema is mapped to the target schema. Given the strong semantic reasoning capability of LLMs, hints are also optional. We found that some high-level hints, such as "use aggregation", are sufficient for LLMs to generate a correct Group-By clause and aggregation functions in most scenarios. As illustrated in the brown box in Fig. 5, such information can be provided by (a) a rule engine that analyzes domain-specific databases as illustrated in Fig. 6, (b) a schema mapping tool such as Starmie [13], or (c) even an LLM itself (e.g., using a separate LLM prompt that asks the LLM to identify schema changes between the source and the target). Fig. 6 illustrates example information available in a domain-specific database for smart buildings that can be leveraged to generate schema change hints. In this experimental study, most schema change hints are derived from the domain-specific databases illustrated in Fig. 4 and Fig. 6.
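A minimal rule-engine sketch for hint generation; the function name and the two rules are illustrative assumptions (the actual rule engine draws on the domain-specific databases of Fig. 4 and Fig. 6):

```python
def derive_schema_change_hints(source_cols, target_cols):
    """Turn schema differences into high-level natural-language hints
    such as "use aggregation" (a toy sketch, not the paper's engine)."""
    hints = []
    src, tgt = set(source_cols), set(target_cols)
    # Rule 1: source columns absent from the target may be renamed or merged.
    if src - tgt:
        hints.append("some source attributes are renamed or merged")
    # Rule 2: a timestamp column fanning out into many hour-of-day columns
    # suggests pivoting plus per-hour aggregation.
    hour_cols = [c for c in tgt if c.endswith(":00")]
    if "datetime" in src and hour_cols:
        hints.append("use pivoting")
        hints.append("use aggregation (SUM per hour per day)")
    return hints

hints = derive_schema_change_hints(
    ["datetime", "cerc_logger_1"],
    ["CST"] + [f"{h}:00" for h in range(1, 25)])
```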
(3) Demonstrations add a few examples of historical prompt-response pairs to the prompt to enable few-shot learning. It is critical that the demonstrating prompts be similar to the current prompt. In the SQLMorpher design, we store the embedding vectors of historically successful prompts in a vector database, such as Faiss, for top-k nearest-neighbor search, as illustrated in Fig. 5. In this example, the prompt will fail unless it includes both the schema change hints (in the brown box) and the demonstration (in the purple box). We used the ChatGPT-3.5-turbo-16K model API in August 2023 to generate all examples in this section.
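The demonstration-retrieval step can be sketched as follows. A brute-force cosine-similarity search stands in for a vector database such as Faiss; `PromptStore` and the toy three-dimensional embeddings are illustrative assumptions:

```python
import math

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

class PromptStore:
    """Keeps embeddings of historically successful prompts and retrieves
    the top-k most similar ones to use as few-shot demonstrations."""
    def __init__(self):
        self._entries = []  # list of (embedding, prompt) pairs

    def add(self, embedding, prompt):
        self._entries.append((embedding, prompt))

    def top_k(self, embedding, k=1):
        # Rank stored prompts by cosine similarity to the query embedding.
        ranked = sorted(self._entries,
                        key=lambda e: _cosine(e[0], embedding),
                        reverse=True)
        return [prompt for _, prompt in ranked[:k]]

store = PromptStore()
store.add([1.0, 0.0, 0.0], "pivot-and-aggregate prompt")
store.add([0.0, 1.0, 0.0], "rename-only prompt")
demos = store.top_k([0.9, 0.1, 0.0], k=1)
```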
To retrieve the various types of information that augment the prompt, the SQLMorpher design includes a callback system. Each type of information corresponds to an event, and the user can register one or more callback functions with an event. Each callback function is expected to return a JSON object that specifies the retrieved information, as well as a status code and an error message describing connection or execution errors, if any. When generating a prompt, SQLMorpher goes through all types of information and invokes all callback functions associated with each information type.
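A minimal sketch of such a callback system; the class name, the `domain_knowledge` event, and the JSON field names are assumptions:

```python
import json

class CallbackRegistry:
    """Event-driven retrieval of optional prompt information.  Each
    callback returns a JSON string with the retrieved text, a status
    code, and an error message."""
    def __init__(self):
        self._callbacks = {}

    def register(self, event, fn):
        # Several callbacks may be registered with the same event.
        self._callbacks.setdefault(event, []).append(fn)

    def collect(self, event, context):
        # Invoke every callback for the event and parse its JSON result.
        return [json.loads(fn(context))
                for fn in self._callbacks.get(event, [])]

reg = CallbackRegistry()
reg.register("domain_knowledge", lambda ctx: json.dumps({
    "info": "the second table records the total load in each hour",
    "status": 0,
    "error": ""}))
infos = reg.collect("domain_knowledge", {"case": 1})
```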

B. SQL Execution
Compared to existing Text2SQL work, which focuses on read-only selection queries, leveraging an LLM to generate modification queries is more complicated, partially because running the generated query may raise security concerns. In the initial iteration for a given user request, the system automatically duplicates the source dataset in a separate PostgreSQL database that serves as a sandbox environment to isolate errors, if the duplicate does not already exist. This ensures that the generated code cannot corrupt the source dataset. Then, the script creates the target table. Finally, it runs the generated query to transform the entire source dataset into the target format and inserts all transformed tuples into the target table. If another iteration is needed, e.g., because the response cannot pass the validation tests, the target table is removed or archived before running the next iteration.
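The sandboxed execution flow can be sketched as follows, using an in-memory SQLite database as a stand-in for the separate PostgreSQL sandbox; the table names, schema, and generated SQL are illustrative:

```python
import sqlite3

def run_in_sandbox(source_rows, generated_sql):
    """Execute LLM-generated SQL against a copy of the source data in an
    isolated database, so errors cannot corrupt the user's dataset."""
    sandbox = sqlite3.connect(":memory:")  # isolated from user data
    sandbox.execute("CREATE TABLE source1 (datetime TEXT, load REAL)")
    sandbox.executemany("INSERT INTO source1 VALUES (?, ?)", source_rows)
    # The generated script creates and fills the target table.
    sandbox.executescript(generated_sql)
    return sandbox.execute("SELECT * FROM target1 ORDER BY day").fetchall()

# A hand-written stand-in for an LLM-generated transformation query.
sql = """
CREATE TABLE target1 (day TEXT, total_load REAL);
INSERT INTO target1
SELECT substr(datetime, 1, 10) AS day, SUM(load)
FROM source1 GROUP BY day;
"""
rows = [("2023-08-01 00:00", 1.5), ("2023-08-01 01:00", 2.5),
        ("2023-08-02 00:00", 3.0)]
out = run_in_sandbox(rows, sql)
```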

C. Validation
Validation in a production environment can be challenging due to the lack of ground truth. It needs an automatic quality measurement (e.g., unit test cases, self-consistency, or the accuracy of downstream tasks) for the transformed data, which we leave for future work.
In this work, we manually prepare the ground truth transformation query for each transformation case in the experimental environment. At the validation stage, the ground truth transformation query is executed against the source table, resulting in the ground truth target table. Meanwhile, by executing the transformation query contained in the LLM response, as described in Sec. III-B, we obtain the generated target table. We designed a validation script that compares the generated target table to the ground truth target table. The comparison first validates whether the two tables have the same number of attributes and tuples. Then, it performs attribute reordering and tuple sorting to ensure the two tables share the same column-wise and row-wise orderings. Furthermore, the script compares the similarity of the values for each attribute in the two tables. We use the ratio of the number of equivalent values (the difference should be less than e^-10) to the total number of values to measure the similarity of numerical attributes. We use the Jaccard similarity to measure the similarity of categorical and text attributes. We average the similarity over all attributes to derive an overall similarity score. If the similarity score is below 1, the validation fails.
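A simplified sketch of the validation logic; the numeric tolerance constant, the row-sorting shortcut, and the function name are illustrative (the real script also reorders attributes before comparing):

```python
def column_similarity(generated, truth, numeric_tol=1e-10):
    """Compare a generated target table to the ground truth: same shape,
    order-insensitive rows, per-attribute similarity averaged into one
    score (1.0 means the validation passes)."""
    if len(generated) != len(truth) or not truth:
        return 0.0
    # Sort rows so the comparison is insensitive to row ordering.
    gen, ref = sorted(generated), sorted(truth)
    n_cols = len(ref[0])
    scores = []
    for c in range(n_cols):
        g_col = [r[c] for r in gen]
        t_col = [r[c] for r in ref]
        if all(isinstance(v, (int, float)) for v in t_col):
            # Numeric attribute: fraction of (near-)equal values.
            eq = sum(abs(a - b) < numeric_tol for a, b in zip(g_col, t_col))
            scores.append(eq / len(t_col))
        else:
            # Categorical/text attribute: Jaccard similarity of value sets.
            inter = len(set(g_col) & set(t_col))
            union = len(set(g_col) | set(t_col))
            scores.append(inter / union if union else 1.0)
    return sum(scores) / n_cols

score = column_similarity(
    [("2023-08-02", 3.0), ("2023-08-01", 4.0)],
    [("2023-08-01", 4.0), ("2023-08-02", 3.0)])
```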

D. Iterative Prompt Optimization
This component is incorporated to evaluate the LLM's potential for self-optimization on the data transformation problem. If validation fails, the prompt is automatically augmented by identifying flaws: (1) errors mentioned in the LLM response or encountered when executing the generated transformation query; (2) errors detected in the transformed dataset, e.g., reporting the difference between the schema of the transformed dataset and the target schema; (3) inconsistencies between the schema change hint and the response query, e.g., reporting that the hint specifies aggregation but no Group-By or aggregation functions have been used. These errors are then appended to the prompt, the new prompt is sent back to the LLM, and the process repeats until the response passes validation, the maximum number of iterations is reached, or the new prompt is no different from the last prompt.
An example of a useful prompt flaw that we observed is "ERROR: INSERT has more expressions than target columns LINE 100: PCT HOURLY 2500". Before adding this error to the prompt, ChatGPT could not correctly handle an attribute, PCT HOURLY 2500, that exists in the source table but not in the target table. Adding the error to the prompt resolved the problem.
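The optimization loop can be sketched as follows; `fake_llm` and `fake_validate` are toy stand-ins that mimic the SUM-vs-MAX aggregation fix described in Sec. III-A:

```python
def iterative_prompt_optimization(initial_prompt, llm, execute_and_validate,
                                  max_iters=5):
    """Query the LLM, run and validate the generated SQL, and feed the
    detected flaws back into the next prompt until validation passes,
    the iteration budget is spent, or the prompt stops changing."""
    prompt = initial_prompt
    for _ in range(max_iters):
        sql = llm(prompt)
        ok, flaws = execute_and_validate(sql)
        if ok:
            return sql
        new_prompt = (prompt + "\nPlease fix the following errors:\n"
                      + "\n".join(flaws))
        if new_prompt == prompt:  # no new information to add
            break
        prompt = new_prompt
    return None

# Toy stand-ins: the "LLM" fixes its SQL once it sees the error feedback.
def fake_llm(prompt):
    return "INSERT ... SUM(...)" if "errors" in prompt else "INSERT ... MAX(...)"

def fake_validate(sql):
    ok = "SUM" in sql
    return ok, [] if ok else ["wrong aggregation function"]

result = iterative_prompt_optimization("basic prompt", fake_llm, fake_validate)
```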

IV. EXPERIMENTAL EVALUATION
In this section, we first describe the goal of the comparison study and all baselines used. Then, we present a benchmark, the first for smart building data standardization problems. We further describe the experimental setup and the evaluation metrics. Finally, we present and analyze the results and summarize key findings.

A. Comparison and Baselines
In this work, we mainly compare the effectiveness of six different types of initial prompt templates:
• Prompt-1: Basic prompt with a domain-specific description for the target schema.
• Prompt-2: Prompt-1 with a domain-specific description for the source schema.
• Prompt-3: Prompt-2 with schema change hints.
• Prompt-1+Demo, Prompt-2+Demo, and Prompt-3+Demo (also referred to as Prompt-4, Prompt-5, and Prompt-6): Prompt-1, Prompt-2, and Prompt-3, respectively, each with one demonstration.
The first three prompt templates are designed for zero-shot learning when there does not exist a database of abundant historical working prompts. The last three prompt templates are designed for one-shot learning.
We also compare our approach to Auto-Pipeline [4], a state-of-the-art automatic data transformation tool that requires only the schemas and tuple examples of the source and target tables and applies deep reinforcement learning to synthesize the transformation pipeline.

B. Benchmark Design 1) Building Energy Data Transformation:
We collected 105 data transformation examples in the smart building domain from 21 energy companies in the United States. These examples are divided into 15 groups so that each group has one target dataset and multiple source datasets of different types. Each source needs to be converted to the target format of its group. The groups are described in Tab. I. In Tab. II, we further show more statistics of the 105 test cases by group: (1) the number of distinct SQL keywords used in the ground truth query and (2) the length (i.e., the number of characters) of the ground truth query. For each group, we compute the average of the above metrics over all cases in the group.
We document the following information in the benchmark: (1) the target schema and domain-specific explanations of its attributes; (2) for each source dataset, its schema, domain-specific explanations of attributes, examples of instances, schema change hints for transforming the source table to the target format, and the ground truth query that transforms the source to the target. The benchmark dataset is open-sourced in a GitHub repository 1.
2) Other Benchmarks Used: We also used two other benchmarks beyond smart building data transformation for different purposes. One is a commercial benchmark consisting of 16 cases used by the Auto-Pipeline baseline. Since the Auto-Pipeline code is not publicly available, we apply our proposed approach (without and with domain-specific knowledge) to this benchmark and compare the results.
The other benchmark consists of four COVID-19 data transformation cases, which we used to further validate how well our methodology generalizes to other data transformation scenarios. It includes all four transformation cases observed in the GitHub commit history of a widely used real-world COVID-19 data repository maintained by Johns Hopkins University [31]. The attributes in the target data are (Province/State, Country/Region, Last Update, Confirmed, Deaths, Recovered), which represent state-level COVID-19 statistics. The source schemas of the first two cases involve county-level data with different numbers of columns, and the latter two cases involve state-level data with different column names and different numbers of columns.
Overall, we tested 125 cases across the three benchmarks, among which 27 cases involve attribute merging, 89 involve attribute name changes, 32 involve pivoting, 5 involve attribute flattening, 50 involve group-by and aggregation, and 8 involve joins.

C. Evaluation Metrics
We report the following metrics in the experimental study:
• Execution Accuracy: the ratio of the number of correctly transformed cases to the total number of transformation cases. For each case, if the LLM returns a correct transformation query that passes the experimental validation tests described in Sec. III-C within 5 iterations, the case is considered correctly transformed.
• Column Similarity: As detailed in Sec. III-C, we compute a similarity score for each column in the transformed dataset against its corresponding column in the ground truth target dataset. We further define the column similarity per case as the average of the similarity scores over all target attributes in the case, the column similarity per group as the average over all cases in the group, and the overall column similarity as the average over all cases in all groups. The similarity score is set to zero for cases that fail to generate output data for comparison.
• Number of Iterations to Success: For each case, we record the number of iterations needed to achieve a correct response. We report the average number of iterations to success over all successful cases (those achieving a column similarity score of 1.0 within 5 iterations) in each group and over all groups; the latter is termed the overall number of iterations to success.
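The three metrics above can be aggregated from per-case results as in this illustrative helper; the function name and the input format are assumptions:

```python
def evaluation_metrics(case_results, max_iters=5):
    """Aggregate per-case results into the three reported metrics.

    `case_results` maps a case id to (similarity, iterations_used); a
    case counts as correct if it reaches similarity 1.0 within the
    iteration budget.
    """
    successes = {c: it for c, (sim, it) in case_results.items()
                 if sim == 1.0 and it <= max_iters}
    # Execution accuracy: fraction of correctly transformed cases.
    accuracy = len(successes) / len(case_results)
    # Column similarity: average similarity score over all cases.
    similarity = (sum(sim for sim, _ in case_results.values())
                  / len(case_results))
    # Average number of iterations over the successful cases only.
    avg_iters = (sum(successes.values()) / len(successes)
                 if successes else None)
    return accuracy, similarity, avg_iters

acc, sim, iters = evaluation_metrics({
    "case-1": (1.0, 1), "case-2": (1.0, 3), "case-3": (0.8, 5)})
```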

D. Experimental Setups
We implemented the end-to-end workflow illustrated in Fig. 2 as a Python script that uses the ChatGPT-3.5-turbo-16K model. We did not present results on ChatGPT-4 because the corresponding OpenAI API had a limit of 4K bytes for the total prompt-response size at the time, which is insufficient for a significant portion of real-world cases. For example, tables in Group 10 to Group 15 have up to 152 attributes, leading to a large prompt size. We set the temperature to zero to avoid randomness for several reasons. First, a primary goal of this work is to evaluate the effectiveness of LLMs on data transformation tasks using different types of initial prompts, as well as the effectiveness of iterative prompt optimization; random responses would require additional methods (e.g., majority voting) for self-consistency, which would complicate the comparison. Second, setting the temperature to zero achieves better-quality results in most cases, according to a recent OpenAI article [32]. All SQL code is run on PostgreSQL version 15.0 for validation. All descriptions of source and target attributes are obtained from a domain-specific database 3.

E. Smart Building Data Transformation Results
1) Overall Results: The zero-shot learning results for the smart building data transformation benchmark are shown in Tab. II. Using Prompt-3, our proposed SQLMorpher methodology achieved an execution accuracy of 96%, significantly higher than Prompt-1 and Prompt-2, which achieved 28% and 36%, respectively. This demonstrates the importance of supplying domain-specific knowledge, particularly schema change hints, as part of the prompt to the LLM. The observation justifies integrating the LLM with a domain-specific knowledge base and schema mapping tools in data transformation pipelines.
2) Effectiveness of One-shot Learning: For the four cases that failed with Prompt-3, we applied Prompt-4, Prompt-5, and Prompt-6 to check whether providing one demonstration example involving a similar prompt and a correct response can improve the LLM response. The results, shown in Tab. III, indicate that prompts combining domain-specific knowledge and a demonstration are capable of solving all four complicated cases that failed with Prompt-3.
3) Effectiveness of the Iterative Optimization Process: Compared to Prompt-1 and Prompt-2, we found that Prompt-3 gains significantly more from iterative prompt optimization. When using Prompt-1, five cases in three groups (Group-1, Group-4, and Group-7) benefit from iterative prompt optimization, with the average number of iterations being 1.2, 1.3, and 2.0, respectively, as shown in Tab. II. The other groups either have all cases pass in one iteration or have all cases fail. When using Prompt-2, four cases in three groups (Group-2, Group-3, and Group-7) require more than one iteration to succeed, with average iteration counts of 1.4, 1.2, and 1.2, respectively. When using Prompt-3, 10 cases in six groups require more than one iteration to succeed. This means that 9.5% of the total cases benefit from iterative prompt optimization when using Prompt-3.

F. Results on Benchmarks Beyond Smart Building.
First, we tested our approach on the COVID-19 benchmark. The results, shown in Tab. IV, indicate that our proposed methodology resolves all four cases using only the basic prompt (Prompt-1).
Second, we compared our proposed approach with Auto-Pipeline using its commercial benchmark [4]. The results are shown in Tab. V. Our proposed methodology achieved perfect execution accuracy on all 16 transformation problems in their benchmark using only the basic prompt, without additional domain-specific knowledge, whereas the execution accuracy achieved by Auto-Pipeline on this benchmark is below 70% [4]. The comparison implies that our approach has great potential to outperform state-of-the-art automatic data transformation tools.

G. Summary of Key Findings
• Large language models are promising for automatically resolving complicated smart building data transformation cases when domain-specific knowledge is available and easily retrievable. We achieved 96% accuracy on our proposed benchmark of 105 real-world smart building cases.
• Our SQLMorpher methodology is promising in generalizing to other data transformation cases and outperforming state-of-the-art automatic data transformation tools that do not rely on LLMs. In particular, our methodology defines clean interfaces for integrating domain-specific knowledge into the data transformation process through prompt generation, a feature missing in state-of-the-art data transformation tools. The evaluation results on the commercial benchmark used by Auto-Pipeline showed that our approach, even without using any domain-specific knowledge, achieves significantly better execution accuracy than Auto-Pipeline (81% vs. 69%). One observation is that while the LLM generates SQL code, Auto-Pipeline attempts to learn a pipeline of data transformation operators; the latter has a more limited search space, which may affect execution accuracy.
• Compared to other domain-specific knowledge, a high-level schema change hint, such as column mapping relationships or instructions as simple as "use aggregation", is critical to the success of our proposed methodology.
• We further classified each of the 125 cases from all three benchmarks into one or more schema change types and counted the execution accuracy for each type, as shown in Fig. 7. We observe that while Prompt-3, with schema change hints, handles all schema change types well, Prompt-1 and Prompt-2, without schema change hints, achieved relatively better accuracy (40% to 100%) on attribute name change, attribute flattening, and join than on other types of changes, such as attribute merging, pivoting, and group-by/aggregation. This further verifies the importance of incorporating high-level schema change hints such as "use aggregation" and "use pivoting".
• Zero-shot learning is effective in resolving most data transformation problems investigated in this work.Few-shot learning can resolve the difficult cases that fail with zero-shot.
• The iterative optimization framework, which simply enhances the prompt with ChatGPT-reported errors or SQL execution errors in each iteration, benefits 9.5% of cases when using Prompt-3 and 5% of cases when using Prompt-1 and Prompt-2.
• The examples in our proposed building energy data transformation benchmark are significantly more complicated than those in existing benchmarks in terms of the number of distinct keywords and the length of the transformation query. They are used in the real world but are missing from existing data transformation benchmarks [4], [5].

V. CONCLUSION AND FUTURE WORK
In this work, we pioneered an experimental feasibility study of applying LLMs to data transformation problems. We proposed a novel approach, SQLMorpher, which uses an LLM to generate SQL modification queries for data transformation. SQLMorpher is designed to flexibly incorporate domain knowledge and iteratively optimize prompts. We provided a unique benchmark for building energy data transformation, including 105 real-world cases collected from 21 energy companies in the United States. The results are promising, achieving up to 96% accuracy on the benchmark. In addition, we found that our system generalizes to scenarios beyond building energy data. The commercial benchmark results demonstrate that our approach can significantly outperform existing automatic data transformation techniques. In summary, SQLMorpher is promising for enabling the automatic integration of diverse data sources for building energy management and may benefit other domains. In the future, we will design quality control for SQLMorpher to further reduce human validation involvement in production environments.

Fig. 5. A Working Prompt for a Real-World Case (Case 100 in Group 14 in Tab. I).

Fig. 7. The overall execution accuracy of cases in each schema change category. (We considered all 125 cases in three benchmarks; each case may involve multiple types of schema changes.)


Fig. 1. Private sectors use diverse formats to describe building load profiles. Each profile dataset must be converted into a unified target format for each different purpose. This figure provides several simplified examples, e.g., a transformation based on attribute merge and attribute name change.
Example basic prompt (excerpt): "You are a SQL developer. Please generate a Postgres SQL script to convert the first table to be consistent with the format of the second table. First, you must create the first table named source1 with the given attributes: (datetime, cerc_logger_1), and insert 10 rows into the first table: …" Example response (excerpt): "This script follows the structure of the previous scripts but adjusts the aggregation function to calculate the total load for each hour of the day. It sums up the cerc_logger_1 values within each hour and day to populate the corresponding columns in the target1 table. The COALESCE function is used to handle cases where there's no data for a particular hour."

Real-world Test Case (Case 100, Group 14): the schema change hints are obtained from a domain-specific database, and a successful prompt-response pair for the case is attached as a demonstration. Prompt template: "You are a SQL developer. Please generate a Postgres SQL script to convert the first table to be consistent with the format of the second table. First, you must create the first table named $SourceTable with the given attribute names: {$source_data_schema} and insert $k rows into the first table. {k rows of data to be inserted into the source table.} Second, you must create a second table named $TargetTable with only the given attributes: {$target_data_schema} Finally, insert all rows from the first table into the second table. {explanation for the source table schema} {explanation for the target table schema} {hints about schema changes from the source to target}"


TABLE V. Comparison to Auto-Pipeline on their commercial benchmark (#keywords avg.: 8, length avg.: 566).