1 Department of Carcinogenesis, The University of Texas M. D. Anderson Cancer Center, Science Park-Research Division, Smithville, Texas; 2 Ortho-Clinical Diagnostics, San Diego, California; 3 Human Genetics and Biostatistics, David Geffen School of Medicine, University of California, Los Angeles, California; and 4 Department of Pathology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas
Requests for reprints: C. Marcelo Aldaz, Department of Carcinogenesis, The University of Texas M. D. Anderson Cancer Center, Science Park-Research Division, P.O. Box 389, Smithville, TX 78957. Phone: 512-237-2403; Fax: 512-237-2475. E-mail: maaldaz{at}mdanderson.org
Abstract
2.7 million tags to perform unsupervised statistical analyses to obtain the molecular classification of breast-invasive ductal carcinomas in correlation with clinicopathologic features. Unsupervised statistical analysis by means of a random forest approach identified two main clusters of breast carcinomas, which differed in their lymph node status (P = 0.01); this suggested that lymph node status leads to globally distinct expression profiles. A total of 245 (55 up-modulated and 190 down-modulated) transcripts were differentially expressed between lymph node (+) and lymph node (–) primary breast tumors (fold change, 2; P < 0.05). Various lymph node (+) up-modulated transcripts were validated in independent sets of human breast tumors by means of real-time reverse transcription-PCR (RT-PCR). We validated significant overexpression of transcripts for HOXC10 (P = 0.001), TPD52L1 (P = 0.007), ZFP36L1 (P = 0.011), PLINP1 (P = 0.013), DCTN3 (P = 0.025), DEK (P = 0.031), and CSNK1D (P = 0.04) in lymph node (+) breast carcinomas. Moreover, the DCTN3 (P = 0.022) and RHBDD2 (P = 0.002) transcripts were confirmed by real-time RT-PCR to be overexpressed in tumors that recurred within 6 years of follow-up. In addition, meta-analysis was used to compare SAGE data associated with lymph node (+) status with publicly available breast cancer DNA microarray data sets. We have generated evidence indicating that the pattern of gene expression in primary breast cancers at the time of surgical removal could discriminate those tumors with lymph node metastatic involvement; SAGE also identified specific transcripts that behave as predictors of recurrence. (Mol Cancer Res 2007;5(9):881–90)
Introduction
By means of DNA microarray analyses, various laboratories have identified gene expression patterns that correlate with breast cancer patient prognosis (1-9). Despite this progress in molecular oncology, invasion into axillary lymph nodes and steroid hormone receptor status remain the most reliable prognostic factors for breast cancer patients (10).
Development of metastases (local and distant) requires that a cancer cell complete a series of steps involving complex interactions with the host microenvironment. This process involves the dysregulation of multiple genes and transcriptional programs. The primary goal of this study was to identify gene expression signatures of relevance for breast cancer subclassification and prognosis. We analyzed a high-resolution Serial Analysis of Gene Expression (SAGE) database obtained from 27 breast-invasive ductal carcinomas (IDCA). A random forest (RF) clustering approach was used for SAGE data analysis (11, 12). This unsupervised analysis of gene expression profiles grouped breast carcinomas predominantly according to their lymph node status. This suggests that lymph node status leads to globally distinct breast cancer gene expression profiles.
The identification of gene expression profiles, individual biomarkers, and biological pathways that contribute to the development of lymph node metastases will be of significant benefit to improve tumor classification and may, in the future, influence clinical decision making and the development of targeted therapies.
Results and Discussion
An unsupervised clustering method (RF clustering) allowed us to group the invasive breast carcinomas on the basis of their gene expression pattern. Two dominant clusters were identified (Fig. 1A). To further elucidate the reasons driving the separation of breast carcinomas into two major groups, we analyzed the identified clusters in the light of available histopathologic data (see Table 1). Interestingly, the variable that correlated with the RF clustering results was the lymph node status of tumors (P = 0.01). A total of 7 of 9 breast cancers (78%) in cluster A are lymph node (+), and 14 of 18 breast tumors (78%) in cluster B are lymph node (–) IDCA (Fig. 1A). No statistically significant differences were detected for ER status, histologic grade, or tumor size (P > 0.05). In previous gene expression studies, ER was the major discriminator between breast cancer groups. In our case, the lack of spontaneous association between clusters and ER status is likely because 75% of the SAGE libraries were derived from ER (+) stage I and II primary breast carcinomas.
We used the Expression Analysis Systematic Explorer software (EASE) to annotate the 245 deregulated genes according to the information provided by the GO Consortium (14, 15). We observed that 32% of the transcripts are involved in biological processes related to metabolism, 22% are related to cellular physiologic process, and 14% are related to cell communication. Approximately 25% of these dysregulated genes are related to molecular functions associated with nucleic acid/protein binding, 15% are related to hydrolase/transferase activity, and 4% are related to metal ion-binding functions.
Cross-Platform Gene Expression Profile Comparison
Comparing data sets generated on different gene expression platforms increases the confidence of specific gene expression classifier data sets (16). By performing a meta-analysis of publicly available breast cancer microarray studies, we provide a robust cross-platform validation of 55 up-regulated and 55 down-regulated (fold change, >3) lymph node (+)-associated transcripts (Fig. 1C and D). Meta-analysis showed that 42% of the transcripts identified by SAGE (46 out of 110) were confirmed as having statistically significant up- or down-modulation in relation to lymph node (+) status (9 genes), distal metastasis (26 genes), and relapse (29 genes; Table 2, Supplementary Data Files 2 and 3). The lack of 100% overlap between the findings of the various studies, including ours, is not surprising given that these studies were done with different technologies (cDNA or various oligonucleotide microarrays), different numbers of genes in the various fixed platforms, and different, heterogeneous patient populations (with regard to age, tumor staging, hormone receptor status, and treatment). Nevertheless, we show that a significant proportion of lymph node (+)-associated transcripts detected by our SAGE study behave as poor prognostic markers. More importantly, SAGE, an open gene expression platform, also identified novel sets of genes as highly expressed in lymph node (+) primary breast carcinomas not previously reported by others.
(CSNK1D; P = 0.04; Fig. 2A). A trend of borderline significance was detected for the rhomboid domain containing 2 (RHBDD2; P = 0.069; Fig. 2A). Hierarchical clustering analysis of the validated transcripts successfully classified tumors according to the patient's lymph node status (P < 0.05), distinguishing the lymph node (+) from the lymph node (–) breast carcinomas with an accuracy of 89.5% (2 out of 19 lymph node-positive samples misclassified; Fig. 3A). No statistically significant associations were detected between the expression of these transcripts and ER status (P > 0.05).
DEK was originally described as a proto-oncogene and has been implicated in multiple cellular processes, including transcriptional regulation and chromatin remodeling (20). Transcriptional up-regulation of wild-type DEK was discovered in various tumor types, including myeloid leukemia, brain tumors, and hepatocellular carcinoma (21, 22). In addition, DEK overexpression was associated with a number of clinical autoimmune conditions (23, 24). Recently, it has been suggested that DEK up-regulation may be a common event in human carcinogenesis and may reflect its senescence inhibitory function (25). Despite these associations with several human disorders, little is known about how DEK could functionally be involved in these diseases (24).
HOXC10 is one of the highly conserved HOXC family members of transcription factors that play an important role in morphogenesis, cell differentiation, and proliferation (26-28). The HOXC protein levels are controlled during cell differentiation and proliferation. Dysregulation of a variety of HOX genes has been implicated in several human cancers, including leukemias, colorectal, breast, and renal carcinomas, melanomas, and squamous cell carcinomas of the skin (26, 27). Recently, overexpression of the HOXC4, HOXC5, HOXC6, and HOXC8 genes was shown in malignant cell lines and prostate carcinomas with lymph node metastases (29). In agreement with these data, we validated the overexpression of the HOXC10 gene in primary lymph node (+) breast carcinomas by real-time RT-PCR (P = 0.001; Fig. 2A).
ZFP36L1 (also known as C3H type-like 1) is a member of the 12-O-tetradecanoylphorbol-13-acetate (TPA)–inducible sequence 11 (TIS11) family of early-response genes. The encoded protein contains a zinc finger domain with a repeating cys-his motif (30). TIS11 gene expression is induced rapidly and transiently in response to extracellular hormone and growth factor signals. The potential role of this gene in breast carcinogenesis remains unknown.
DCTN3 and RHBDD2 as Predictors of Recurrence
As mentioned, the quantitative RT-PCR analysis validated significant differences between lymph node (+) versus lymph node (–) primary breast carcinoma groups for DCTN3 (P = 0.025), and a trend was detected for RHBDD2 (P = 0.069; Fig. 2A). However, meta-analysis comparisons further confirmed our findings showing statistically significant overexpression of DCTN3 (P = 0.003) and RHBDD2 (P = 0.042) in lymph node (+) compared with lymph node (–) breast IDCA (Fig. 1C, Supplementary Data File 3). More importantly, the DCTN3 (P = 0.022) and RHBDD2 (P = 0.002) transcripts were also observed to be markedly up-modulated in tumors that recurred within 6 years of follow-up (Fig. 2B). Unsupervised hierarchical clustering analysis of these transcripts successfully classified tumors according to recurrence status (P < 0.05; Fig. 3B). These data suggest that overexpression of DCTN3 and RHBDD2 genes could play a role in breast cancer progression.
The DCTN3 gene (also known as DCTN22) encodes the smallest (p22/24) subunit of dynactin, a cytoplasmic motor protein complex involved in organelle trafficking, cytokinesis, spindle formation, chromosome movement, and nuclear positioning (31). Overexpression in mammalian cells of one dynactin subunit (dynamitin) disrupts the complex, resulting in the perturbation of mitosis (32). In addition, DCTN2 overexpression disrupts the dynein-dynactin motor, shifting cellular movement and mitosis with predisposition to mitotic block and polyploidy (33). DCTN3 localizes to the centrosomes during interphase and to kinetochores and spindle poles throughout mitosis. It was also proposed that the dynein-dynactin complex is involved in cytoplasmic/nuclear transport of p53 (34). The correct balance of dynactin subunits is important for adequate centrosome integrity before centrosome duplication, ultimately governing the G1-S transition.
The RHBDD2 gene (rhomboid domain containing 2) encodes a protein with seven transmembrane domains and is a member of the rhomboid veinlet-like family of genes. Several rhomboid protein members in Drosophila have been implicated in the processing of transforming growth factor-α (TGF-α)–like ligands, and consequent epidermal growth factor (EGF) receptor activation (35). Genetic and molecular studies have revealed that the production of an activated EGF ligand by the signal-sending cell is a key regulatory step in receptor activation (36). Thus, the RHBDD2 protein very likely functions in regulating the response to growth factors. However, the potential role of this protein in breast carcinogenesis remains to be elucidated.
Tissue Microarray Immunohistochemical Analysis of DCTN3 Protein Expression
Because DCTN3 was identified by real-time RT-PCR as distinctively overexpressed in lymph node (+) primary breast carcinomas and in IDCA that recurred within 6 years, we decided to investigate this gene further at the protein expression level using a breast cancer progression tissue microarray (Fig. 4).
Conclusions
Gene expression profiling will not necessarily replace classic approaches to predict the outcome; however, it will likely add substantial information that may help in better defining breast cancer outcome classes. The identification of individual proteins is also highly relevant, not only for their potential value as prognostic biomarkers but also because they may provide insight into mechanisms and pathways of relevance in breast cancer progression. Nevertheless, given the molecular heterogeneity of breast cancer, further global and individual gene expression studies are needed to reliably discriminate breast cancer subgroups of value for determining outcome. Results of this study will provide novel insights into the molecular biology underlying breast cancer lymph node metastasis and recurrence.
Materials and Methods
100,000 SAGE tags per library). Table 1 shows histopathologic characteristics of the specimens analyzed. For the generation of SAGE libraries, snap-frozen samples were obtained from the M.D. Anderson Breast Cancer Tumor Bank, and SAGE analysis was done as previously described (37, 38).
Data Processing and Statistical Analysis of SAGE Libraries
SAGE tag extraction from sequencing files was done using the SAGE2000 software version 4.0 (kindly provided by Dr. K. Kinzler, Johns Hopkins University). SAGE data management, tag to gene matching, as well as additional gene annotations and links to publicly available resources such as GO, UniGene, and RefSeq, were done using a suite of Web-based SAGE library tools developed by us.5
Our analysis of data involved the following steps: (a) use of unsupervised RF clustering to group the patients based on their SAGE expression profiles; (b) investigation of potential associations with multiple histopathologic variables; (c) identification of differentially expressed transcripts between clusters; (d) gene ontology analysis of the resulting transcripts.
We propose to use the RF clustering for SAGE data analysis because it has several relevant theoretical advantages. First, the RF dissimilarity approach handles mixed covariate types well, i.e., it can handle ordinal and continuous covariates in an unbiased way: the more related the covariate is to other covariates, the more it will affect the definition of the RF dissimilarity. Second, the clustering results do not change when one or more covariates are monotonically transformed because the dissimilarity only depends on the feature ranks. Third, the RF dissimilarity does not require the user to specify threshold values for dichotomizing tumor expressions. For the detailed description of RF clustering algorithm, consult Breiman (11) and Shi and Horvath (12). Briefly, the RF clustering procedure is carried out as follows. The RF dissimilarity is used to represent each patient as a point in a two-dimensional space with the aid of multidimensional scaling (39, 40). The distances between the points are used in partitioning around medoids clustering. The number of clusters is chosen by visually inspecting multidimensional scaling plots.
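The final partitioning step can be illustrated with a minimal Python sketch. It assumes a precomputed symmetric dissimilarity matrix (here a small made-up matrix, not RF dissimilarities from the SAGE libraries) and finds, by exhaustive search, the two medoids that minimize the partitioning-around-medoids cost. The function name `two_medoid_partition` and the toy values are illustrative assumptions, and the exhaustive search is a stand-in for the iterative build and swap procedure that PAM implementations normally use.

```python
from itertools import combinations


def two_medoid_partition(dissim):
    """Split items into two clusters by minimizing the k-medoids cost (k = 2).

    dissim is a symmetric list-of-lists dissimilarity matrix. Every pair of
    candidate medoids is tried, which is only practical for small sample
    sizes such as a few dozen libraries.
    """
    n = len(dissim)
    best = None
    for m1, m2 in combinations(range(n), 2):
        # Each item contributes its dissimilarity to the nearer medoid.
        cost = sum(min(dissim[i][m1], dissim[i][m2]) for i in range(n))
        if best is None or cost < best[0]:
            best = (cost, m1, m2)
    cost, m1, m2 = best
    labels = [0 if dissim[i][m1] <= dissim[i][m2] else 1 for i in range(n)]
    return (m1, m2), labels, cost


# Toy dissimilarities (hypothetical): items 0-2 are close to one another,
# items 3-4 are close to one another, and the two groups are far apart.
toy = [
    [0.0, 0.1, 0.2, 0.9, 0.8],
    [0.1, 0.0, 0.1, 0.8, 0.9],
    [0.2, 0.1, 0.0, 0.9, 0.9],
    [0.9, 0.8, 0.9, 0.0, 0.1],
    [0.8, 0.9, 0.9, 0.1, 0.0],
]
medoids, labels, cost = two_medoid_partition(toy)
print(medoids, labels)
```

Because the cost depends only on the dissimilarity matrix, the same routine applies to any dissimilarity, including one derived from RF proximities.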
We tested whether variables differed across groups using the Fisher's exact test. All P values were two sided, and P < 0.05 was considered significant. RF clustering and the analyses described above were carried out with the freely available software R (41).
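As an illustration of the association test, the following Python sketch computes a two-sided Fisher's exact P value for a 2 × 2 table using only the standard library. The counts are the cluster-by-lymph-node-status counts reported in the Results (7 of 9 tumors lymph node (+) in cluster A; 14 of 18 lymph node (–) in cluster B). The function name is hypothetical, and the two-sided rule used here (summing all tables no more probable than the observed one) is an assumption about how the test was configured.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / total

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


# Cluster A: 7 lymph node (+), 2 lymph node (-); cluster B: 4 (+), 14 (-).
p_value = fisher_exact_two_sided(7, 2, 4, 14)
print(round(p_value, 3))  # prints 0.011
```

Under these assumptions the result is consistent with the reported P = 0.01.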
To identify differentially expressed transcripts between clusters, we used a modified t test. This test is based on a beta binomial sampling model that takes into account both the intra-library and the inter-library variability, thus identifying common patterns of SAGE transcript tag changes systematically occurring across samples (13).6
For automated functional annotation and classification of genes of interest based on GO terms, we used the EASE Web-based software resource (14).7
Meta-analysis of Breast Cancer Microarray Data Sets
To identify and validate the most reliable set of genes able to discriminate primary breast carcinomas based on their lymph node status, we did a cross-platform comparison between the described SAGE data set and previously reported breast cancer studies based on DNA microarray methods (1-3, 5-8, 42-45). The Oncomine cancer microarray database was employed for data collection and to investigate histopathologic associations (46). The Oncomine database is an integrated bioinformatic resource providing data collection, processing, and storage of all publicly available cancer microarray studies. All data are log transformed, median centered per array, and SD normalized to one per array. The gene module application lists all differential expression analyses in which the target genes were included and allows the user to select studies of interest, providing comparative statistical analyses. Selected comparisons of interest for meta-analysis included lymph node (–) versus lymph node (+) status, non-metastasis versus metastasis (5 years of follow-up), and nondisease versus relapse (5 years of follow-up). The 55 up-modulated and 55 most down-regulated genes in lymph node (+) primary breast carcinomas were included for meta-analysis comparison. Data processing was carried out using Comprehensive Meta-Analysis software v2 (Biostat, 2006). Standardized mean difference measures as scale-free indices and fixed effects analyses were employed for statistical integration. To enable visualization of meta-analysis results, we used The Institute for Genomic Research MultiExperiment Viewer (MeV 3.0) software. This tool was employed for average clustering of the P values obtained from each gene analyzed. When statistically significant coincidence among studies (i.e., SAGE and microarray studies) was observed on the behavior of specific transcripts, this was represented by colored boxes (red or green).
Other progression parameters such as metastasis and disease-free survival (DFS) were also compared with the SAGE lymph node status findings. Statistically significant P values (P < 0.05) associated with gene overexpression in lymph node (+), metastasis, and relapse (DFS) are represented in red; statistically significant down-modulated expression is represented in green color.
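The statistical integration step can be sketched in Python as an inverse-variance fixed-effect pooling of standardized mean differences. This is a minimal sketch under stated assumptions, not the procedure of the commercial software: the large-sample variance formula for a standardized mean difference, the normal approximation for the P value, the function names, and the example study values are all illustrative and do not come from the data sets analyzed here.

```python
from math import erfc, sqrt


def smd_variance(d, n1, n2):
    """Approximate sampling variance of a standardized mean difference."""
    return (n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2))


def fixed_effect_pool(studies):
    """Inverse-variance fixed-effect pooling of (d, n1, n2) tuples.

    Returns the pooled effect, its standard error, and a two-sided
    P value from a normal approximation.
    """
    weights = [1.0 / smd_variance(d, n1, n2) for d, n1, n2 in studies]
    total_weight = sum(weights)
    pooled = sum(w * s[0] for w, s in zip(weights, studies)) / total_weight
    se = sqrt(1.0 / total_weight)
    z = pooled / se
    p_two_sided = erfc(abs(z) / sqrt(2))
    return pooled, se, p_two_sided


# Hypothetical per-study effects for one transcript: (d, n group 1, n group 2).
example_studies = [(0.5, 40, 60), (0.3, 50, 50), (0.6, 30, 45)]
pooled, se, p = fixed_effect_pool(example_studies)
print(round(pooled, 2), round(se, 2), p < 0.05)
```

A fixed-effect model weights larger, more precise studies more heavily and assumes a single common effect across studies; it does not model between-study heterogeneity.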
Real-time RT-PCR Analysis
Template cDNAs were synthesized from mRNAs isolated from snap-frozen samples from an independent set of 40 stage I to II human breast carcinomas [21 lymph node (–) and 19 lymph node (+) IDCA samples]. Primers and probes were obtained from TaqMan Assays-on-Demand Gene Expression Products (Applied Biosystems). All the PCR reactions were done using the TaqMan PCR Core Reagents kit and the ABI Prism 7700 Sequence Detection System (Applied Biosystems). Experiments were done in triplicate for each data point, and 18S rRNA was used as control. Results were expressed as mean ± 2 SE based on log2 transformation of normalized real-time RT-PCR values of the assayed genes. We used the t test to compare the gene expression levels of validated genes between lymph node (+) and lymph node (–) breast tumors (P < 0.05).
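The summary statistics described above can be sketched in Python as follows. The sketch assumes expression values already normalized to the 18S rRNA control, uses made-up numbers, and assumes a pooled-variance (Student) form of the t statistic, because the variant of the t test is not specified in the text. Both function names are hypothetical.

```python
from math import log2, sqrt
from statistics import mean, variance


def summarize_log2(values):
    """Return (mean, 2 * SE) of log2-transformed expression values."""
    logged = [log2(v) for v in values]
    se = sqrt(variance(logged) / len(logged))
    return mean(logged), 2 * se


def student_t(group_a, group_b):
    """Pooled-variance t statistic comparing two groups on the log2 scale."""
    a = [log2(v) for v in group_a]
    b = [log2(v) for v in group_b]
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical normalized expression values for one transcript.
node_positive = [4.0, 8.0, 16.0]
node_negative = [1.0, 2.0, 4.0]
pos_mean, pos_2se = summarize_log2(node_positive)
t_stat = student_t(node_positive, node_negative)
print(pos_mean, round(pos_2se, 3), round(t_stat, 3))
```

Converting the t statistic to a P value requires the t distribution with na + nb - 2 degrees of freedom, which is left to a statistics package.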
DCTN3 Antibody Production
Polyclonal antibody against DCTN3 (a kind gift of Dr. Kevin Pfister, Department of Cell Biology, University of Virginia, Charlottesville, VA) was generated according to standard procedures. Briefly, we obtained rabbit serum from animals previously immunized with DCTN3 peptides as antigen. After generation of GST-DCTN3 fusion protein, we did an antibody affinity purification of such serum. The antibodies obtained, which were known to work in Western blots, were optimized for immunohistochemical analysis on paraffin sections (47).
Tissue Microarray and Immunohistochemical Analyses
A breast cancer progression tissue microarray (TMA) was obtained from the M. D. Anderson Cancer Center (Houston, TX), and we were able to analyze a total of 87 cases representative of normal breast epithelium, ductal carcinoma in situ, invasive breast carcinoma, and metastatic tissues. Before immunostaining, endogenous peroxidase activity was blocked with 3% H2O2 in water for 10 min. Heat-induced epitope retrieval was done with 1.0 mmol/L EDTA buffer (pH 8.0) for 10 min in a microwave oven followed by a 20-min cool down. To block nonspecific antibody binding, the slides were incubated with 10% goat serum in PBS for 30 min. DCTN3 protein was detected using primary anti-DCTN3 polyclonal antibody (1:100 dilution) and horseradish peroxidase–conjugated anti-rabbit secondary antibody. Staining development was done with 3,3'-diaminobenzidine (DAB), and the slides were then counterstained with hematoxylin. DCTN3 protein expression was measured using a Chromavision Automated Cellular Imaging System (ACIS) by means of the generic DAB software application. The software determines brown intensity regardless of the area covered by the positive cells.
Notes
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Supplementary data for this article are available at Molecular Cancer Research Online (http://mcr.aacrjournals.org/).
5 http://spi.mdacc.tmc.edu/bitools/about/sage_lib_tool.html
6 All raw SAGE data reported as Supplementary Tables in this manuscript is publicly available at http://sciencepark.mdanderson.org/labs/ggeg/SAGE_Proj_11.htm.
7 Available at the Database for Annotation, Visualization and Integrated Discovery (DAVID) at http://david.niaid.nih.gov/david (15).
Received 1/31/07; revised 5/3/07; accepted 6/1/07.
References
homologue Spitz. Biochem J 2002;363:347–52.