Talend Hadoop Developer Resume
Chicago, IL
SUMMARY:
- Over 9 years of professional IT experience in project development, implementation, deployment and maintenance using Hadoop ecosystem technologies, with domain knowledge in the Finance, Banking, Communication, Insurance, Retail and Healthcare industries.
- 4+ years of hands-on experience in Hadoop ecosystem technologies such as HDFS, MapReduce, YARN, Spark, Hive, Pig, Oozie, Sqoop, Flume, ZooKeeper and HBase.
- Experience in Hadoop Big Data integration with Talend ETL, performing data extraction, loading and transformation.
- Extensively used ETL methodology for data migration, extraction, transformation and loading with Talend, and designed data conversions from a wide variety of source systems.
- Created sub-jobs in parallel to maximize performance and reduce overall job execution time using Talend's tParallelize component and multithreaded executions.
- Extensively created mappings in Talend using tMap, tJoin, tReplicate, tParallelize, tJavaFlex, tJavaRow, tDie, tAggregateRow, tWarn, tLogCatcher, tFilterRow, globalMap, etc.
- Experienced in scheduling Talend jobs using Talend Administration Console (TAC)
- Worked on Talend Data Mapper to read COBOL copybooks and generate Avro schemas.
- Created complex mappings in Talend … using tMap, tDie, tJoin, tReplicate, tFilterRow, tParallelize, tFixedFlowInput, tAggregateRow, tIterateToFlow etc.
- Practical understanding of the Data modeling (Dimensional & Relational) concepts like Star-Schema Modeling, Snowflake Schema Modeling, Fact and Dimension tables.
- Practical understanding of Ralph Kimball and Bill Inmon methodologies.
- Created, documented and maintained logical and physical database models with enterprise standards and maintained metadata definitions for enterprise datastores within a metadata repository.
- Converted JSON files to XML and CSV files in Talend.
- Designed and Implemented the ETL process using Talend Enterprise Big Data Edition to load the data from Source to Target Database.
- Worked on global context variables and context variables, and extensively used big data components in Talend to create jobs.
- Created Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
- Reverse engineered reports from old systems and identified required Data Elements in the source systems for Dimensions, Facts and Measures.
- Conducted design discussions and meetings to arrive at the appropriate Data Mart at the lowest level of grain for each of the Dimensions involved.
- Experience with data modeling tools like ER/Studio, Erwin and PowerDesigner.
- Designed a STAR schema for the detailed data marts and conformed dimensions.
- More than 4 years of work experience in ingestion, storage, querying, processing and analysis of Big Data, with hands-on experience in Hadoop ecosystem development including MapReduce, HDFS, Hive, Pig, Spark, Cloudera Navigator, Mahout, HBase, ZooKeeper, Sqoop, Flume, Oozie and AWS.
- Experienced in working with Data Warehousing concepts like OLAP, OLTP, Star Schema, Snowflake Schema, Logical Data Modeling, Physical Modeling and Dimensional Data Modeling; utilized tStatCatcher, tDie and tLogRow to create a generic job that stores processing stats.
- In depth understanding of Hadoop Architecture and its various components such as Job Tracker, Task Tracker, Name Node, Data Node, Resource Manager, and MapReduce concepts.
- Hands-on experience developing Talend DI jobs to transfer data from source views to Hadoop staging and target layers to support fraud-identification analysis on transactions.
- Proficient knowledge on Apache Spark and Apache Storm to process real time data.
- Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
- Good exposure to performance tuning of Hive queries, MapReduce jobs and Spark jobs.
- Worked with various file formats such as delimited text files, clickstream log files, Apache log files, Avro files, JSON files and XML files.
- Extensive experience working with Teradata, Oracle, Netezza, SQL Server and MySQL databases.
- Excellent understanding and knowledge of NOSQL databases like MongoDB, HBase, and Cassandra.
- Good working experience in Teradata, Ab Initio, Business Objects, Tableau, Crystal Reports, PL/SQL, SAS, MS Excel and MS Access.
- Proficient in Teradata SQL coding using the Teradata BTEQ utility; working knowledge of Teradata Parallel Transporter (TPT) and TPump.
- Strong experience working with different Hadoop distributions like Cloudera, Hortonworks, MapR and Apache distributions.
- Created, Maintained and Executed Manual Test Scripts in Quality Center.
- Extensively used the Oracle SQL Connector for Hadoop Distributed File System through all of its features, such as input formats, parallel query loads, security and partitioned tables in Hive.
- Experience in installation, configuring, supporting and managing Hadoop Clusters using Apache, Cloudera (CDH 5.X) distributions and on Amazon web services (AWS).
- Experience in Amazon AWS services such as EMR, EC2, S3, CloudFormation, RedShift which provides fast and efficient processing of Big Data.
- Experienced with Azure HDInsight, a fully managed cloud service for fast, cost-effective processing of massive amounts of data using open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm and R, across scenarios such as ETL, data warehousing, machine learning and IoT.
- Experienced in delivering projects with varying timelines using Agile (Scrum), Waterfall, RUP or Kanban methodologies, working with remote team members and driving projects to success.
- Experienced in managing change requests during the product/system development lifecycle (SDLC) and in creating Work Breakdown Structures (WBS); followed the project management guidelines specified by the Project Management Body of Knowledge (PMBOK).
- Extensive experience in analyzing data using Hive QL, Pig Latin and custom MapReduce programs in Java
- Experienced in Hadoop Big Data Integration with ETL on performing data extract, loading and transformation process for ERP data.
- Expertise in writing MapReduce programs in Java, PIG Latin, HQL, Perl scripting, PostgreSQL, VB scripting, Shell scripting, SQL, PL/SQL, Core Java.
- Strong experience in designing and developing Business Intelligence solutions in Data Warehousing using ETL Tools and excellent understanding and best practice of Data Warehousing Concepts, involved in Full Development life cycle of Data Warehousing.
- Developed UDF, UDAF, UDTF functions for Hive and Pig.
- Hands on experience to enable Kerberos authentication in ETL process
- Strong Data Warehousing ETL experience of using Informatica 9.x/8.x/7.x Power Center Client tools - Mapping Designer, Repository manager, Workflow Manager/Monitor and Server tools - Informatica Server, Repository Server manager.
- Good knowledge of partitioning and bucketing concepts; designed and managed partitions and created external tables in Hive to optimize performance (a minimal sketch appears at the end of this summary).
- Good experience in Avro files, RC files, Combiners, Counters for best practices and performance improvements.
- Good knowledge of joins, grouping and aggregation concepts; applied them to resolve performance issues in Hive and Pig scripts.
- Worked on various tools and IDEs like Eclipse, IBM Rational, Visio, the Apache Ant build tool, MS Office, PL/SQL Developer and SQL*Plus.
- Extensive knowledge in designing, developing and implementing Data Marts and data structures using stored procedures, functions, data warehouse tables, views, materialized views and indexes at the database level using PL/SQL and Oracle.
- Experience in using SQL Server tools like DTS, Import/Export Wizard, SQL Server Enterprise Manager, SQL Profiler and SQL Query Analyzer.
- Experience in different application servers like JBoss/Tomcat, WebLogic and IBM WebSphere.
- Experience in working with Onsite-Offshore model.
- Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest claim data and financial histories into HDFS for analysis, and used the Curator API on Elasticsearch for data backup and restore.
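The following is a minimal sketch of the Hive partitioning and bucketing pattern referenced above. It assumes a Hive-enabled SparkSession; the database, table, column and HDFS path names are hypothetical.

```python
# Minimal PySpark sketch: an external Hive table partitioned by load_date and
# bucketed by customer_id, followed by a query that benefits from partition pruning.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging.transactions (
        txn_id      BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
    LOCATION '/data/staging/transactions'
""")

# Register partition directories that already exist under the external location.
spark.sql("MSCK REPAIR TABLE staging.transactions")

# Partition pruning: only the load_date='2019-01-01' directory is scanned.
spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM staging.transactions
    WHERE load_date = '2019-01-01'
    GROUP BY customer_id
""").show()
```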
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark, Solr, Storm, Drill, Ambari, Mahout, MongoDB, Cassandra, Avro, Parquet and Snappy
Hadoop Distributions: Cloudera, MapR, Hortonworks, IBM BigInsights
Languages: Java, Scala, Python, JRuby, SQL, HTML, DHTML, JavaScript, XML and C/C++
NoSQL Databases: Cassandra, MongoDB and HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM) and JAXB
Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON
Development / Build Tools: Eclipse, Ant, Maven, Gradle, IntelliJ, JUnit and Log4j
Frameworks: Struts, Spring and Hibernate
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle
RDBMS: Teradata, Oracle, MS SQL Server, MySQL and DB2
Operating systems: UNIX, LINUX, Mac OS and Windows Variants
Data analytical tools: R, SAS and MATLAB
ETL Tools: Tableau, Talend, Informatica, Ab Initio and Hyperion
PROFESSIONAL EXPERIENCE:
Confidential, Chicago, IL
Talend Hadoop Developer
Responsibilities:
- Worked with business users for requirement gathering, understanding intent and defining scope; responsible for project status updates to business users.
- Performing analysis and providing summary for the business questions, initiating proactive investigations into data issues that impact reporting, business analysis or program execution.
- Researched and recommended a suitable technology stack for Hadoop migration, considering the current enterprise architecture.
- Created Talend Spark jobs that collect data from relational databases and load the data into HBase (see the HBase-load sketch after this list).
- Worked on extracting and enriching HBase data across multiple tables using joins in Spark.
- Worked on writing APIs to load the processed data to HBase tables.
- Replaced existing MapReduce programs with Spark applications written in Scala.
- Built on-premise data pipelines using Kafka and Spark Streaming, consuming the feed from an API streaming gateway REST service (see the streaming sketch after this list).
- Processed extremely large volumes of XML data in Hadoop through Oracle.
- Integrated and tested on the Oracle Big Data Appliance.
- Developed Hive UDFs to handle data quality and create filtered datasets for further processing.
- Utilized Agile methodologies tools such as Kanban and Scrum to track all project management processes
- Tracked product progress including bug reports using Jira and MS Project
- Experienced in writing Sqoop scripts to import data into Hive/HDFS from RDBMS.
- Good knowledge on Kafka streams API for data transformation.
- Experience in creating and updating Test Plans, Test Cases, and Design steps, as well as defect tracking, bug tracking using Test Management Tool HP QC/ALM, Bugzilla, JIRA
- Skilled in creating and updating the Requirements Traceability Matrix (RTM) to link test cases to requirements
- Experience with various technologies such as Oracle, SQL, PL/SQL, Oracle APEX, Oracle Forms and Oracle Reports, SQL*Loader, SQL*Plus, Dynamic SQL, TOAD, PL/SQL Developer, SQL Navigator
- Implemented a logging framework, the ELK stack (Elasticsearch, Logstash & Kibana), on AWS.
- Set up Spark on EMR to process huge volumes of data stored in Amazon S3.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Used Talend tool to create workflows for processing data from multiple source systems.
- Created sample flows in Talend and StreamSets with custom-coded jars and compared the performance of StreamSets and Kafka Streams.
- Hands-on experience writing queries, stored procedures, functions, PL/SQL packages and triggers in Oracle, along with reports and scripts
- Created sessions and batches to move data at specific intervals & on demand using Server Manager
- Experienced in Authenticated access with Kerberos on Oracle Big Data Appliance
- Worked on a mobile application and performed manual testing following the Agile (Scrum) SDLC methodology, which included 3-week sprints and daily stand-up meetings.
- Involved in analyzing requirement specifications and developed Test Plans, Test Scenarios and Test Cases to cover overall quality assurance testing.
- Developed Hive Queries to analyze the data in HDFS to identify issues and behavioral patterns.
- Created indexes for various statistical parameters on Elastic Search and generated visualization using Kibana
- Involved in writing optimized Pig Script along with developing and testing Pig Latin Scripts.
- Used Python Pandas and NumPy modules for data analysis, data scraping and parsing.
- Deployed applications using the Jenkins framework, integrating Git version control with it.
- Participated in production support on a regular basis to support the Analytics platform
- Used Rally for task/bug tracking.
- Used GIT for version control.
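Below is a hedged sketch of the relational-database-to-HBase load pattern described in the bullets above. It assumes the source JDBC driver is on the Spark classpath, an HBase Thrift server is reachable, and the happybase package is installed; the JDBC URL, credentials, table names, column family and host names are placeholders rather than actual project values.

```python
# Sketch: read source rows over JDBC with Spark, write them to HBase per partition.
import happybase
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-to-hbase-sketch").getOrCreate()

# Pull the source table over JDBC (Oracle shown only as an example driver/URL).
accounts = (spark.read.format("jdbc")
            .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
            .option("dbtable", "ACCOUNTS")
            .option("user", "etl_user")
            .option("password", "etl_password")
            .load())

def write_partition(rows):
    """Write one Spark partition to HBase through the Thrift gateway."""
    connection = happybase.Connection("hbase-thrift-host")
    table = connection.table("accounts")
    with table.batch(batch_size=1000) as batch:
        for row in rows:
            batch.put(str(row["ACCOUNT_ID"]).encode("utf-8"),
                      {b"cf:balance": str(row["BALANCE"]).encode("utf-8"),
                       b"cf:status": str(row["STATUS"]).encode("utf-8")})
    connection.close()

accounts.foreachPartition(write_partition)
```

Opening the HBase connection inside each partition keeps the client off the Spark driver and batches the puts for throughput.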
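The next sketch illustrates the Kafka plus Spark streaming pipeline mentioned above, using Structured Streaming. The broker list, topic name and HDFS paths are placeholders, and the spark-sql-kafka connector is assumed to be available on the classpath.

```python
# Sketch: consume a Kafka topic with Structured Streaming and land it on HDFS.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
          .option("subscribe", "gateway-events")
          .option("startingOffsets", "latest")
          .load())

# Kafka delivers key/value as binary; keep the payload as a string column.
payload = events.select(col("value").cast("string").alias("json_payload"))

query = (payload.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streams/gateway-events")
         .option("checkpointLocation", "hdfs:///checkpoints/gateway-events")
         .outputMode("append")
         .start())

query.awaitTermination()
```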
Environment: MapR, Hadoop, HBase, HDFS, AWS, Pig, Erwin, Hive, Drill, Spark SQL, MapReduce, Spark Streaming, Kafka, Flume, Sqoop, Oozie, Jupyter Notebook, Python 2.9, PL/SQL, Docker, Hyperion, Spark, Scala, HP ALM, Talend Big Data Studio 6.0, Shell Scripting, Java and Oracle Data Integrator 12c.
Confidential, Richmond, VA
Hadoop Developer
Responsibilities:
- Involved in loading data from UNIX file system to HDFS using command and scripts.
- Exposure in creating Hive tables using HiveQL, loading data and writing hive queries which will run internally in map reduce way.
- Loading data from different sources (databases & files) into Hive using the Talend tool.
- Data migration from relational databases (Oracle, Teradata) or external data to HDFS using Sqoop, Flume and Spark.
- Analyzed the data by performing Hive queries and running Pig scripts.
- Designed both Managed and External tables in Hive to optimize performance.
- Regular monitoring of Hadoop Cluster to ensure installed applications are free from errors and warnings.
- Exposure in optimizing Hive queries using Partitioning and Bucketing techniques
- Responsible for building scalable distributed data solutions using Hadoop.
- Worked with highly unstructured and semi structured data of 100TB+ in size
- Developed Pig and Hive scripts per end user / analyst / product manager requirements for ad-hoc analysis.
- Managed External tables in Hive for optimized performance using Sqoop jobs.
- Solved performance issues in Hive and Pig scripts with understanding of joins, group and aggregation and how it translates to MapReduce jobs.
- Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Worked in a Kerberos-secured Hadoop environment supported by the Cloudera team.
- Experienced in loading and transforming of large sets of structured, semi structured, and unstructured data.
- Experience with various technologies such as Oracle, SQL, PL/SQL, Oracle APEX, Oracle Forms and Oracle Reports, SQL*Loader, SQL*Plus, Dynamic SQL, TOAD, PL/SQL Developer, SQL Navigator.
- Developed functions, views and triggers for automation.
- Responsible for gathering data migration requirements.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Expertise in implementing Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
- Experienced in developing Spark scripts for data analysis in both Python and Scala.
- Built on-premise data pipelines using Kafka and Spark for real-time data analysis.
- Created reports in TABLEAU for visualization of the data sets created and tested native Drill, Impala and Spark connectors.
- Indexed documents using Elastic Search.
- Experienced in offline data preprocessing on Hadoop.
- Analyzed Test Strategy and Test Plan documents to generate logical Test Scenarios and Test Cases.
- Wrote test cases and executed them manually from HP ALM to test the application for its functionality, system integration, smoke, Regression, Stress testing.
- Involved in creating gap analysis document, clearly identifying the data, business process and workflows of the organization with respect to salesforce.com implementation.
- Analyzed the SQL scripts and designed the solution to implement them using Scala.
- Implemented complex Hive UDFs to execute business logic within Hive queries.
- Responsible for loading bulk amounts of data into HBase using MapReduce by directly creating HFiles and loading them.
- Evaluated the performance of Spark SQL vs. Impala vs. Drill on offline data as part of a POC.
- Worked on Solr configuration and customizations based on requirements.
- Handled importing data from different data sources into HDFS using Sqoop and performing transformations using Hive, Map Reduce and then loading data into HDFS.
- Involved in writing T-SQL programs to implement stored procedures and functions for different tasks.
- Responsible for creating databases, tables, indexes, unique/check constraints, views, stored procedures, triggers and rules.
- Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
- Exporting of result set from HIVE to MySQL using Sqoop export tool for further processing.
- Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
- Responsible for developing the data pipeline by implementing Kafka producers and consumers (see the producer sketch after this list).
- Exported the analyzed data to Impala to generate reports for the BI team.
- Managing and reviewing Hadoop Log files to resolve any configuration issues.
- Developed a program to extract the name entities from OCR files.
- Fixed defects as needed during the QA phase, support QA testing, troubleshoot defects and identify the source of defects.
- Used Mingle and later moved to JIRA for task/bug tracking.
- Used GIT for version control
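Below is a hedged sketch of a Kafka producer for the ingestion pipeline described above, using the kafka-python package; the broker addresses, topic name and record layout are illustrative placeholders.

```python
# Sketch: publish JSON-encoded records to a Kafka topic with kafka-python.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker1:9092", "broker2:9092"],
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
)

def publish_event(event):
    """Send one record to the ingestion topic, keyed by its id for partitioning."""
    producer.send("ingest-events",
                  key=str(event["event_id"]).encode("utf-8"),
                  value=event)

publish_event({"event_id": 1001, "source": "web", "status": "OPEN"})
producer.flush()
```

A matching consumer (a KafkaConsumer subscribed to the same topic with a shared group id) would read these records and hand them to the downstream HDFS writer.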
Environment: Hadoop 2.2, Informatica PowerCenter 9.x, Erwin, HDFS, HBase, Flume 1.4, Sqoop 1.4.3, Hive 0.13.1, Avro 1.7.4, Parquet 1.4, MapR, Cloudera, AWS, Pig, Impala, Drill, Spark SQL, Hyperion, OCR, ZooKeeper, PL/SQL, Cosmos DB, Tableau, HP ALM, Shell Scripting, Gerrit, Java, Redis, Elasticsearch and Oracle Data Integrator
Confidential, Atlanta, GA
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Handled importing of data from multiple data sources (Oracle, SQL Server) using Sqoop, and performed cleaning, transformations and joins using Pig.
- Pushed data as delimited files into HDFS using Talend Big Data Studio.
- Involved in writing Map Reduce program using Java.
- Loaded and transformed data into HDFS from large sets of structured data/Oracle/SQL Server using Talend Big Data Studio.
- Exported analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Experience in providing support to data analyst in running Hive queries.
- Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
- Creating Hive tables, partitions to store different Data formats.
- Involved in loading data from UNIX file system to HDFS.
- Experience in managing and reviewing Hadoop log files.
- Consolidate all defects, report it to PM/Leads for prompt fixes by development teams and drive it to closure.
- Supported existing BI solution, data marts and ETL processes.
- Migration of 100+ TBs of data from different databases (i.e. Oracle, SQL Server) to Hadoop.
- Worked with various file formats (Avro, Parquet, Text) and SerDes, using Snappy compression.
- Used Pig Custom Loaders to load different forms of data files such as XML, JSON and CSV.
- Designed dynamic partition mechanism for optimal query performance of system using HIVE to reduce report time generation under SLA requirements.
- Analyzed the requirements to set up a cluster.
- Worked on analyzing Hadoop cluster and different big data analytic tools including MapReduce, Hive and Spark.
- Involved in loading data from the Linux file system, servers and Java web services using Kafka producers and partitions.
- Implemented Kafka Custom encoders for custom input format to load data into Kafka Partitions.
- Implemented Storm topologies to pre-process data before moving into HDFS system.
- Implemented Kafka High level consumers to get data from Kafka partitions and move into HDFS.
- Implemented POC to migrate MapReduce programs into Spark transformations using Spark and Scala.
- Involved in creating Hive tables, loading with data and writing hive queries which runs internally in MapReduce way.
- Developed the MapReduce programs to parse the raw data and store the pre-Aggregated data in the partitioned tables.
- Loaded and transformed large sets of structured, semi-structured and unstructured data with MapReduce, Hive and Pig.
- Developed MapReduce programs in Java for parsing the raw data and populating staging Tables.
- Implemented Python scripts for writing MapReduce programs using Hadoop Streaming (see the mapper/reducer sketch after this list).
- Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
- Experience in implementing custom serializers, interceptors, sources and sinks as per requirements in Flume to ingest data from multiple sources.
- Experience in setting up fan-out workflows in Flume to design a V-shaped architecture that takes data from many sources and ingests it into a single sink.
- Developed Shell, Perl and Python scripts to automate and provide control flow to Pig scripts.
- Implemented monitoring on all NiFi flows to get notifications if no data flows through a flow for more than a specified time.
- Converted unstructured data to structured data by writing Spark code.
- Indexed documents using Apache Solr.
- Set up SolrCloud for distributed indexing and search.
- Worked on No-SQL databases like Cassandra, MongoDB for POC purpose in storing images and URIs.
- Integrating bulk data into Cassandra file system using MapReduce programs.
- Worked on MongoDB for distributed storage and processing.
- Designed and implemented Cassandra and associated RESTful web service.
- Implemented Row Level Updates and Real time analytics using CQL on Cassandra Data.
- Worked on analyzing and examining customer behavioral data using Cassandra.
- Created partitioned tables in Hive, mentored analyst and SQA team for writing Hive Queries.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Involved in cluster setup, monitoring, test benchmarks for results.
- Involved in build/deploy applications using Maven and integrated with CI/CD server Jenkins.
- Involved in agile methodologies, daily scrum meetings and sprint planning.
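Below is a minimal Python sketch of the Hadoop Streaming mapper/reducer pattern mentioned above; the column layout, file name and HDFS paths are illustrative.

```python
#!/usr/bin/env python
"""Hadoop Streaming sketch: counts records per event type in tab-delimited raw data.
The same file is used as mapper ("map" argument) and reducer ("reduce" argument)."""
import sys

def mapper():
    # Emit (event_type, 1); the third column is assumed to hold the event type.
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2:
            print("%s\t1" % fields[2])

def reducer():
    # Hadoop sorts mapper output by key, so equal keys arrive contiguously.
    current_key, current_count = None, 0
    for line in sys.stdin:
        key, count = line.rstrip("\n").split("\t")
        if key == current_key:
            current_count += int(count)
        else:
            if current_key is not None:
                print("%s\t%d" % (current_key, current_count))
            current_key, current_count = key, int(count)
    if current_key is not None:
        print("%s\t%d" % (current_key, current_count))

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()

# Typical launch (jar path and HDFS directories are illustrative):
# hadoop jar hadoop-streaming.jar -files streaming_count.py \
#   -mapper "python streaming_count.py map" -reducer "python streaming_count.py reduce" \
#   -input /data/raw/events -output /data/out/event_counts
```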
Environment: Hadoop 2.6 cluster, Informatica 9.x, HDFS, Flume 1.5, Sqoop 1.4.3, Erwin, Hive 1.0.1, Pig, NiFi, Spark 1.4, HBase, XML, JSON, Teradata, Oracle, MongoDB, AWS Redshift, Python, Spark, Scala, Cassandra, Snowflake, Solr, ZooKeeper, MySQL, Talend Big Data Studio 6.0/5.5, Shell Scripting, Linux Red Hat, Java, Oracle Hyperion 12c
Confidential
ETL Hadoop Developer/Administrator
Responsibilities:
- Responsible for loading the customer's data and event logs from Oracle database, Teradata into HDFS using Sqoop
- End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
- Developed the Pig UDF'S to pre-process the data for analysis.
- Loading data from LINUX file system to HDFS.
- Importing and exporting data into HDFS and Hive using Sqoop and Flume.
- Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations.
- Wrote MapReduce jobs to generate reports for the number of activities created on a particular day from data dumped from multiple sources; the output was written back to HDFS
- Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, and Sqoop.
- Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Prepared a TEZ build from the source code and ran the Hive query jobs using the TEZ execution engine rather than MR jobs for better performance
- Participated in the requirements gathering, design, development, testing and analysis phases of the project, documenting the business requirements by conducting workshops/meetings with various business users.
- Participated in client calls to gather and analyze the requirement.
- Importing and exporting data into HDFS from database and vice versa using Sqoop.
- Resource management of Hadoop Cluster including adding/removing cluster nodes for maintenance and capacity needs.
- Responsible for monitoring the Hadoop cluster using Zabbix/Nagios.
- Converting the existing relational database model to Hadoop ecosystem.
- Installed and configured Flume, Oozie on the Hadoop cluster.
- Managing, defining and scheduling Jobs on a Hadoop cluster.
- Generated datasets and loaded them into the Hadoop ecosystem.
- Worked with Linux systems and RDBMS database on a regular basis to ingest data using Sqoop.
- Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
- Involved in review of functional and non-functional requirements.
- Implemented frameworks using Java and Python to automate the ingestion flow.
- Responsible for managing data coming from different sources.
- Loaded the CDRs from relational DB using Sqoop and other sources to Hadoop cluster by using Flume.
- Experience in processing large volume of data and skills in parallel execution of process using Talend functionality.
- Involved in loading data from UNIX file system and FTP to HDFS.
- Designed and implemented HIVE queries and functions for evaluation, filtering, loading and storing of data.
- Creating Hive tables and working on them using HiveQL.
- Developed data pipeline using Kafka and Storm to store data into HDFS.
- Created reporting views in Impala using Sentry policy files.
- Developed Hive queries to analyze the output data.
- Provided cluster coordination services through ZooKeeper.
- Collected log data from web servers and stored it in HDFS using Flume.
- Used HIVE to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Implemented several Akka Actors which are responsible for loading of data into hive.
- Design and implement Spark jobs to support distributed data processing.
- Supported the existing MapReduce programs running on the cluster.
- Developed and implemented two service endpoints (end to end) in Java using the Play framework, Akka server and Hazelcast.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions (a Python rendering of this check appears after this list).
- Wrote Java code to format XML documents; upload them to Solr server for indexing.
- Involved in Hadoop cluster task like Adding and Removing Nodes without any effect to running jobs and data.
- Developed PowerCenter mappings to extract data from various databases and flat files and load it into the Data Mart using Informatica.
- Followed agile methodology for the entire project.
- Installed and configured Apache Hadoop, Hive and Pig environment.
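The daemon health check above was written as a shell script; the sketch below is a rough Python rendering of the same idea, assuming Python 3 and that jps is on the PATH. The daemon list and log path are illustrative.

```python
# Sketch: check that expected Hadoop daemons are running and log anything missing.
import logging
import subprocess

logging.basicConfig(filename="/var/log/hadoop-healthcheck.log",
                    level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

EXPECTED_DAEMONS = {"NameNode", "DataNode", "ResourceManager", "NodeManager"}

def running_daemons():
    """Return the set of Java process names reported by `jps`."""
    output = subprocess.check_output(["jps"], text=True)
    return {parts[1] for parts in (line.split() for line in output.splitlines())
            if len(parts) > 1}

def main():
    missing = EXPECTED_DAEMONS - running_daemons()
    if missing:
        logging.warning("Hadoop daemons not running: %s", ", ".join(sorted(missing)))
    else:
        logging.info("All expected Hadoop daemons are running.")

if __name__ == "__main__":
    main()
```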
Environment: Hadoop, Hortonworks, HDFS, Pig, Hive, Flume, Sqoop, Ambari, Ranger, Python, Akka, Play framework, Informatica, Elasticsearch, Linux (Ubuntu), Solr.
Confidential
Data Analyst
Responsibilities:
- Acted as a liaison between the IT developers and Business stake holders and was instrumental in resolving conflicts between the management and technical teams.
- Worked with business users for requirement gathering, understanding intent and defining scope and am responsible for project status updates to Business users.
- Performing analysis and providing summary for the business questions, initiating proactive investigations into data issues that impact reporting, business analysis or program execution.
- Created views for reporting purpose which involves complex SQL queries with sub-queries, inline views, multi table joins, with clause and outer joins as per the functional needs in the Business Requirements Document (BRD).
- Involved in performance tuning of slowly running SQL queries and created indexes, constraints and rules on database objects for optimization.
- Developed functions, views and triggers for automation.
- Assisted in mining data from the SQL database that was used in several significant presentations.
- Assisted in offering support to other personnel who were required to access and analyze the SQL database.
- Worked on Python modules and packages.
- Hands-on experience in Python scripting and in web development using Django.
- Used Python scripts to update content in the database and manipulate files (a sketch of this kind of script appears after this list).
- Analyzed various backup compression tools available and made the recommendations.
- Performed data analysis and data profiling using complex SQL on various sources systems including Oracle and Teradata.
- Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
- Extensive experience in Design and Implementation of PL/SQL Stored Procedures, Functions, Packages, Views, Cursors, Ref Cursors, Collections, Records, Object Types, Database Triggers, Exception Handling, Forms, Reports, Table Partitioning.
- Involved in writing T-SQL programs to implement stored procedures and functions for different tasks.
- Responsible for creating databases, tables, indexes, unique/check constraints, views, stored procedures, triggers and rules.
- Optimized the performance of queries by modifying the existing index system and rebuilding indexes.
- Populated data into Teradata tables by using Fast Load utility.
- Performed various data pull from Teradata One View Data warehouse using SQL Assistant and Bteq.
- Generated weekly, bi-weekly and monthly reports with the help of Oracle, Teradata, SQL, BTEQ, MS Access, MS Excel and SAS.
- Worked on loading data from several flat-file sources into Staging using Teradata MultiLoad (MLOAD), FastLoad (FLOAD) and TPump.
- Assisted IT developers and non-IT personnel (database analysts) with Teradata utilities.
- Coordinated project activities between clients, internal groups and information technology, including project portfolio management and project pipeline planning; worked in close collaboration with the Project Management Office and business users to gather, analyze and document the functional requirements for the project.
- Responsible for development of workflow analysis, requirement gathering, data governance, data management and data loading.
- Analyzed and documented data flow from source systems and managed the availability and quality of data.
- Performed root-cause analysis of data discrepancies between different business systems by examining business rules and the data model, and provided the analysis to the development/bug-fix team.
- Hands on experience writing Queries, Stored Procedures, Functions, PL/SQL Packages and Triggers in Oracle and reports and scripts
- Evaluated existing practices of storing and handling important financial data for compliance; ensured corporate compliance with all billing and credit standards, with direct responsibility for accounts receivable and supervision of accounts payable.
- Have setup data governance touch points with key teams to ensure data issues were addressed promptly.
- Responsible for facilitating UAT (User Acceptance Testing), PPV (Post Production Validation) and maintaining Metadata and Data dictionary.
- Responsible for source data cleansing, analysis and reporting using pivot tables, formulas (VLOOKUP and others), data validation, conditional formatting, and graph and chart manipulation in Excel.
- Actively involved in data modeling for the QRM Mortgage Application migration to Teradata and developed the dimensional model.
- Experience in developing SQL*Loader control programs and PL/SQL validation scripts for validating data to load data from staging tables to production tables.
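Below is a hedged sketch of the kind of Python maintenance script described above, assuming the cx_Oracle package; the connection string, table, columns and output path are placeholders.

```python
# Sketch: apply content updates to an Oracle table, then re-export rows to a CSV file.
import csv
import cx_Oracle

# (new_status, account_id) pairs to apply; values are illustrative.
UPDATES = [("ACTIVE", 1001), ("CLOSED", 1002)]

connection = cx_Oracle.connect("report_user", "report_password", "dbhost:1521/ORCL")
cursor = connection.cursor()
cursor.executemany("UPDATE accounts SET status = :1 WHERE account_id = :2", UPDATES)
connection.commit()

# Rewrite the flat file consumed by downstream reporting.
cursor.execute("SELECT account_id, status FROM accounts ORDER BY account_id")
with open("/tmp/accounts_status.csv", "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["account_id", "status"])
    writer.writerows(cursor)

cursor.close()
connection.close()
```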
Environment: Informatica PowerCenter 8.x (Repository Manager, Designer, Workflow Manager and Workflow Monitor), Agile, Teradata SQL Assistant, Oracle 12c, SQL, PL/SQL, Unix Shell Scripts, Python 2.7, MDX/DAX, SAS, PROC SQL, MS Office Tools, MS Project, Windows XP, MS Access, Pivot Tables