Consultant Data Engineer Resume
Chicago, IL
SUMMARY:
- More than 17 years of rich experience in Software Development, Maintenance and Support of various applications in the Insurance, Banking and Retail domains.
- 7 years of experience in Big Data with Amazon Web Services (AWS), Hadoop, Pig scripts, Spark, Scala, Hive, Impala, HDFS, Parquet, Oozie, Cloudera Manager, HBase, Sqoop, Azure, Cassandra, Ganglia, Kafka, Python and S3.
- Technical expertise in Agile and the SDLC, covering requirement analysis, project scoping, effort estimation, risk analysis, development and quality management, per the specified guidelines and norms.
- Ability to relate to people at any level of business and management across the globe, with significant experience working with customers, project managers and technical teams to execute large-scale projects.
- Develop, implement and provide all kinds of support for business application software for clients.
- Achieve customer satisfaction by ensuring service quality norms and building the brand image by exceeding customer expectations.
- Actively involved in the design phase for client requirements.
- Resolve support/operational issues in liaison with project managers and business groups.
- Handle the testing phase of the project and ensure all necessary data and metrics are generated and maintained.
- 15 years of US IT experience architecting, designing and leading end-to-end Data Warehousing/ETL/Integration/Mapping solutions for various clients, involving the complete SDLC.
- Used AWS, Hadoop Pig scripts, Hive, Impala, Parquet, Snappy compression, Cloudera Manager, HDFS, HBase, Sqoop, Spark, Scala and Python to create the data lake/store for the PDW, Impact and Pricing Hub projects.
- Worked extensively on the Spark Core and Spark SQL modules using Scala.
- Created RDDs, DataFrames and Datasets for the required input data and performed the data transformations and actions using Spark with Scala (a sketch follows this summary).
- Hierarchy, Sales, Inventory, Price and other data sources created with Pig scripts, Hive, HDFS, HBase, Cassandra, Impala, Scala and Python were loaded into MySQL daily.
- Converted DataStage jobs to Hadoop for the Impact project, which processes files with millions of records, to reduce run time.
- Experience in designing, reviewing, implementing and optimizing data transformation processes in Hadoop. Able to consolidate, validate and cleanse data from a vast range of sources, from applications and databases to files.
- Output created by the Pig scripts was compared with DataStage data for approval.
- Used DataStage 11.5, 8.5 & 8.1 to extract data from various sources to implement the Confidential data warehouse and the Dynamic Pricing project.
- Used IBM DataStage Designer and Information Analyzer to develop parallel jobs to extract, cleanse, transform, integrate and load data into the Data Warehouse.
- Developed jobs in DataStage 8.1 using different stages like Transformer, Aggregator, Lookup, Join, Merge, Remove Duplicate, Sort, Row Generator, Sequential File and Data Set.
- Used Director Client to validate, run and monitor the jobs run by the WebSphere DataStage server.
- Worked on various Talend components such as tMap, tFilterRow, tAggregateRow, tFileExist, tFileCopy, tFileList, tDie etc.
- 5+ years of design and development experience with Teradata SQL Assistant, BTEQ, FastLoad and FastExport ETL processes.
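The sketch below illustrates the Spark with Scala work referenced above (typed Datasets, transformations and actions); it is a minimal example only, and the paths, column names and the Sale case class are hypothetical placeholders rather than project artifacts.

```scala
// Minimal Spark Scala sketch of DataFrame/Dataset transformations and actions.
// Paths, column names and the case class are hypothetical placeholders.
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions._

object SalesTransformSketch {

  case class Sale(itemId: String, storeId: String, qty: Long, amount: Double)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sales-transform-sketch")
      .getOrCreate()
    import spark.implicits._

    // Read a raw sales feed (hypothetical Parquet path) into a DataFrame.
    val salesDf = spark.read.parquet("/data/raw/sales")

    // Convert to a typed Dataset for compile-time checked transformations.
    val sales: Dataset[Sale] = salesDf
      .select($"item_id".as("itemId"), $"store_id".as("storeId"), $"qty", $"amount")
      .as[Sale]

    // Transform: drop empty lines, then aggregate totals per item.
    val itemTotals = sales
      .filter(_.qty > 0)
      .groupBy($"itemId")
      .agg(sum($"amount").as("total_amount"), sum($"qty").as("total_qty"))

    // Action: write the curated output back to HDFS.
    itemTotals.write.mode("overwrite").parquet("/data/curated/sales_totals")

    spark.stop()
  }
}
```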
TECHNICAL SKILLS:
Databases: Hive, Impala, Parquet, HBase, Oracle Exadata, Oracle 11g, Teradata, Cassandra, IBM DB2, UDB, Netezza, Infobright, MySQL, Command Centre, MS Access, dBase IV
Special Software: Hadoop, Oozie, Hive, Pig scripts, HBase, AWS, S3, DataStage 8.5, 8.1 & 7.5, IBM Information Analyzer, Talend, UNICA, SAS Revenue Optimization 2.2, Cyber Fusion, STROBE, Insert Examiner, MQ Series, Hiperstation, Test Director, Rational ClearQuest, DocumentDirect, ViewDirect, Control M, Snowflake cloud data warehouse.
Configuration Tools: PVCS Version Manager v7.5, ChangeMan, PANVALET, Endevor
App. Development Tools: File Aid, ISPF, TSO, CICS, QMF, VSAM, Teradata SQL Assistant
Programming Languages: SAS, JAVA, COBOL, JCL, SQL, Shell Script, Perl, REXX, C, C++, VB, Pascal, Fortran, EZTRIEVE Plus
Others: Report Program Interface (RPI) on IBM, Application Program Interface (API) on IBM
Domain: Insurance, Banking and Retail
Environments: IBM 3090, ES9000, Windows NT/2K, AIX, UNIX on MC-68000, MS DOS, MVS
PROFESSIONAL EXPERIENCE DETAILS:
Confidential, Chicago, IL
Consultant Data Engineer
Responsibilities:
- Responsible for creating data models and ingesting data into the platform to support application development and data science teams.
- Building data pipelines to ingest data from RDS into the platform's raw data layer. Applying ETL processes using APIs and StreamSets data collectors to promote data from the raw layer to the business layer (an illustrative sketch follows this section).
- Understanding the platform's industrial data model to leverage platform capabilities for easy ingestion and retrieval of data to and from the platform.
Environment: StreamSets, SQL, Docker, Python, AWS S3, Shell scripts, Postgres
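The pipelines above were built with StreamSets data collectors, which are configured rather than coded; purely as an illustrative equivalent, the Spark sketch below shows the same RDS-to-raw-layer pattern via a JDBC read from Postgres and a write to S3. The endpoint, table, column and bucket names are hypothetical.

```scala
// Illustrative only: the project used StreamSets data collectors; this sketch shows the
// equivalent RDS-to-raw-layer ingestion pattern in Spark. Connection details, table names
// and bucket paths are hypothetical.
import org.apache.spark.sql.SparkSession

object RdsToRawLayerSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rds-to-raw-layer")
      .getOrCreate()

    // Read a source table from Postgres on RDS over JDBC
    // (requires the Postgres JDBC driver on the classpath).
    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://example-rds-host:5432/appdb") // hypothetical endpoint
      .option("dbtable", "public.orders")                             // hypothetical table
      .option("user", sys.env.getOrElse("DB_USER", "app_user"))
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Land the data unchanged in the raw layer on S3, partitioned by a load-date column.
    orders.write
      .mode("append")
      .partitionBy("order_date")                                      // hypothetical column
      .parquet("s3a://example-platform-bucket/raw/orders")

    spark.stop()
  }
}
```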
Confidential, Chicago, IL
Hadoop Big Data Architect/Lead Developer
Responsibilities:
- Worked on process improvement, Hadoop optimization and vehicle load projects.
- Created the Logistics Dashboard from Freight and other data sources; it is displayed in Qlik for the business to query and view various charts.
- Worked extensively on the Spark Core and Spark SQL modules using Scala.
- Created RDDs, DataFrames and Datasets for the required input data and performed the data transformations and actions using Spark with Scala.
- Designed and helped develop each module as a POC to prove the code works for the requirement, including POCs for small-file issues using HAR and other parameters.
- Worked on AWS S3 cloud storage for data from older partitions; this keeps new data on HDFS for faster access while using S3 for old data (sketched at the end of this section).
- Involved in design and architectural decisions for implementation at each layer: data lake, integration and semantic.
- Worked with multiple offshore teams and communicated status to the client.
Environment: AWS Hadoop, Cloudera, Cloudera Manager, HDFS, Spark, Scala, Sqoop, Hue, HBase, Hive, Solr, Impala, Autosys, Pig scripts, Oozie, Java, UNIX, Parquet, Snappy compression, Python, Shell Scripting, SQL, S3.
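The sketch below illustrates the HDFS-to-S3 partition archival pattern described above (recent data on HDFS, older partitions archived to S3). It is a minimal sketch under assumed conditions: the table name, partition column, retention window and bucket path are hypothetical.

```scala
// Hedged sketch of archiving older partitions to S3 while keeping recent data on HDFS.
// Table name, partition column (string), retention window and paths are hypothetical.
import java.time.LocalDate
import org.apache.spark.sql.SparkSession

object ArchiveOldPartitionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("archive-old-partitions")
      .enableHiveSupport()
      .getOrCreate()

    // Keep roughly the last 90 days on HDFS; archive older partitions to S3.
    val cutoff = LocalDate.now().minusDays(90).toString

    // Hypothetical Hive table stored on HDFS, partitioned by event_date.
    val oldData = spark.table("logistics.freight_events")
      .where(s"event_date < '$cutoff'")

    // Copy the older partitions to S3 as Parquet so they remain queryable via an external table.
    oldData.write
      .mode("append")
      .partitionBy("event_date")
      .parquet("s3a://example-archive-bucket/logistics/freight_events")

    // Drop each archived partition from the HDFS-backed table so the hot data set stays small.
    oldData.select("event_date").distinct().collect().foreach { row =>
      val d = row.get(0).toString
      spark.sql(
        s"ALTER TABLE logistics.freight_events DROP IF EXISTS PARTITION (event_date = '$d')")
    }

    spark.stop()
  }
}
```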
Confidential, Pittsburg, PA
Hadoop Big Data Consultant / Lead Developer
Responsibilities:
- Design, build and deploy data pipelines supporting data domains. Designed and developed the ABC (Audit, Balance and Control) process in Spark Scala.
- The ABC process measures the run time of each job and records the source and target record counts; based on those, controls can be set up. The control piece can be configured with thresholds so a decision on the next process can be made (see the sketch at the end of this section).
- Worked on the ingestion and transformation jobs to meet the business requirements and tested them. Integrated the ABC process into the ingestion and other jobs.
- Hands-on experience with Spark Scala programming and a good understanding of its in-memory processing capability.
- Created RDDs, DataFrames and Datasets for the required input data and performed the data transformations using Spark Scala.
Environment: Hortonworks, Spark, HDFS, Scala, Sqoop, Hue, Hive, Oozie, Java, UNIX, Python, Shell Scripting, SQL.
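A hedged sketch of the Audit/Balance/Control idea described above: time a job, record source and target counts, persist the audit record, and gate the next step on a threshold. The case class, the control table name and the threshold logic are assumptions for illustration, not the project's actual implementation.

```scala
// Sketch of an Audit/Balance/Control (ABC) wrapper in Spark Scala.
// The AbcRecord fields, control table and threshold rule are hypothetical.
import org.apache.spark.sql.{DataFrame, SparkSession}

object AbcProcessSketch {

  case class AbcRecord(jobName: String, startMs: Long, endMs: Long,
                       sourceCount: Long, targetCount: Long, passed: Boolean)

  /** Run a transformation, audit its counts and duration, and balance-check the result. */
  def runWithAbc(spark: SparkSession, jobName: String, source: DataFrame, maxDropPct: Double)
                (transform: DataFrame => DataFrame): (DataFrame, AbcRecord) = {
    val start = System.currentTimeMillis()
    val sourceCount = source.count()          // audit: source record count

    val target = transform(source)
    val targetCount = target.count()          // audit: target record count
    val end = System.currentTimeMillis()      // audit: job duration

    // Balance/control: allow at most maxDropPct of records to be lost by the transform.
    val dropPct = if (sourceCount == 0) 0.0
                  else (sourceCount - targetCount).toDouble / sourceCount * 100
    val record = AbcRecord(jobName, start, end, sourceCount, targetCount, dropPct <= maxDropPct)

    // Persist the audit record to a hypothetical control table for downstream decisions.
    import spark.implicits._
    Seq(record).toDF().write.mode("append").saveAsTable("abc.control_log")

    if (!record.passed)
      throw new IllegalStateException(s"ABC control failed for $jobName: dropped $dropPct%")

    (target, record)
  }
}
```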
Confidential
Hadoop Big Data Architect/Lead Developer
Responsibilities:
- Data Ingestion within Hadoop Environment.
- Develop and implement business reports and data extraction procedures from various sources.
- Created and automated an ingest mechanism using shell scripts for various data sources, with validation.
- Worked on POCs for the performance improvement initiative. Created Parquet tables compressed with Snappy and implemented this for the settlement feed, which saved space and improved query performance (sketched at the end of this section).
- Created RDDs, DataFrames and Datasets for the required input data and performed the data transformations using Spark Scala.
- Experience with Kafka producers, consumers, brokers, topics and partitions.
- Experienced in batch processing of data sources using Apache Spark with RDDs to create the ingestion framework.
- Developing and reviewing test scripts based on test cases; documenting and communicating results to stakeholders.
- Monitoring and scheduling the jobs using the Oozie scheduler. Analysing and fixing production issues by coordinating with the team.
- Worked on various Talend components such as tMap, tFilterRow, tAggregateRow, tFileExist, tFileCopy, tFileList, tDie etc.
- Designed Talend jobs using Sqoop, file list and Hive Hadoop ecosystem components.
- Used Big Data components such as tImpalaInput, tImpalaRow, tHDFSConnection, tHiveConnection, tHiveLoad and VerticaOutput to move data to and from the HDFS file system.
- Implemented data loads from the Talend server to the HDFS file system.
- POCs in Azure HDInsight using Apache Hadoop and Spark.
Environment: BDA 4.8, CDH 5.10.1, Cloudera Manager, Hadoop, Spark, HDFS, Hadoop Pig scripts, Scala, Sqoop, Hue, Hive, Impala, Oracle Exadata, Oozie, Java, Rally, DataStage 11.5, Talend, UNIX, Parquet, Snappy compression, Python, Shell Scripting, SQL.
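The sketch below shows the Parquet-plus-Snappy conversion described above, applied to a settlement feed. It is a minimal illustration: the landing path, delimiter, partition column and target table name are assumptions.

```scala
// Minimal sketch: convert a landed settlement feed to Snappy-compressed Parquet.
// Source path, layout, partition column and table name are hypothetical.
import org.apache.spark.sql.SparkSession

object SettlementParquetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("settlement-parquet-snappy")
      .enableHiveSupport()
      .getOrCreate()

    // Read the delimited settlement feed landed on HDFS.
    val settlements = spark.read
      .option("header", "true")
      .option("delimiter", "|")
      .csv("/data/landing/settlement_feed")

    // Write as Snappy-compressed Parquet; columnar storage plus compression
    // reduces space and speeds up Hive/Impala queries over the feed.
    settlements.write
      .mode("overwrite")
      .option("compression", "snappy")
      .partitionBy("settlement_date")          // hypothetical partition column
      .format("parquet")
      .saveAsTable("finance.settlements_parquet")

    spark.stop()
  }
}
```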
Confidential
Hadoop Lead Developer
Responsibilities:
- Data Ingestion within Hadoop Environment.
- Develop and implement data extraction procedures from Hadoop, Netezza and other systems.
- Create and automate an ingest mechanism for various data sources.
- Coordinated with the technical team on the installation of Hadoop and production deployment of software applications for maintenance.
- Develop and review test scripts based on test cases; document and communicate results to stakeholders.
- Used Hive SQL to analyze the data sources and apply business rules for reports (illustrated in the sketch below).
Environment: Hadoop, HDFS, Hive, Hadoop Pig Scripts, HBASE, Java, Rally, GIT, Teradata, UNIX, Cassandra, Ganglia, Shell Scripting, SQL.
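Purely as an illustration of the kind of Hive SQL business-rule report mentioned above, the sketch below runs a Hive-style query (wrapped in Spark with Hive support, to keep these sketches in one language). The database, table and column names are hypothetical.

```scala
// Illustration of a Hive SQL business-rule report, executed through Spark with Hive support.
// Database, table and column names are hypothetical.
import org.apache.spark.sql.SparkSession

object HiveReportSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-report-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Business rule: only settled, non-zero transactions count toward the daily report.
    val report = spark.sql(
      """
        |SELECT region,
        |       txn_date,
        |       COUNT(*)    AS txn_count,
        |       SUM(amount) AS total_amount
        |FROM   sales.transactions
        |WHERE  status = 'SETTLED'
        |  AND  amount > 0
        |GROUP  BY region, txn_date
        |""".stripMargin)

    report.write.mode("overwrite").saveAsTable("reports.daily_settled_sales")

    spark.stop()
  }
}
```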
Confidential, Chicago IL
Sr. Hadoop Developer, ETL Data Modeler
Responsibilities:
- Identified and verified source data for Pricing Hub as per business need.
- Created Sqoop jobs to extract data from DB2 tables. Designed and developed Hive and Pig jobs and unit tested the ETL components.
- HDFS data was taken for the Item Hierarchy, Sales, Price and Inventory feeds; this was done in Hive and Pig and loaded into staging MySQL tables before loading into the final tables (an illustrative load sketch follows this section).
- Designed and created MySQL load scripts used by the Pricing Hub UI.
- For the Impact project, analyzed the DataStage jobs and developed conversion jobs for the Datasets used in the Hadoop process.
- Validating the data using IBM Information Analyzer.
- Used SQL to analyze the Data sources and apply business rules for reports.
- Developed Pig scripts and Java UDFs for each DataStage job and tested them using Hadoop input files.
- Tested the Pig script output by comparing it with DataStage files, and documented any data issues.
- Unica scheduling and maintenance of campaign flowcharts.
- Developed and tested UNIX scripts for the prioritization process to save time.
- Developed UNIX scripts for FastLoad and FastExport jobs for various vendor formats.
Environment: Hadoop, HDFS, Hive, Hadoop Pig scripts, HBase, Sqoop, Oozie, Python, Zookeeper, MapReduce, Java, IBM DataStage 8.5, UNICA, DB2, Teradata, UNIX, MySQL, Shell Scripting, Control M, SQL.
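The staging loads above were implemented with Hive/Pig output and MySQL load scripts; as an illustrative equivalent only, the sketch below writes a curated HDFS feed into a MySQL staging table over JDBC from Spark. The connection details, paths and table names are hypothetical.

```scala
// Illustrative equivalent of the staging load described above (the project used Hive/Pig
// output plus MySQL load scripts). Connection details, paths and table names are hypothetical.
import java.util.Properties
import org.apache.spark.sql.SparkSession

object PricingHubStagingLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pricing-hub-staging-load")
      .getOrCreate()

    // Curated sales feed produced upstream on HDFS (hypothetical path).
    val salesFeed = spark.read.parquet("/data/curated/pricing_hub/sales")

    val props = new Properties()
    props.setProperty("user", sys.env.getOrElse("MYSQL_USER", "etl_user"))
    props.setProperty("password", sys.env.getOrElse("MYSQL_PASSWORD", ""))
    props.setProperty("driver", "com.mysql.cj.jdbc.Driver") // requires the MySQL connector on the classpath

    // Overwrite the staging table; a separate step would merge staging into the final tables.
    salesFeed.write
      .mode("overwrite")
      .jdbc("jdbc:mysql://example-mysql-host:3306/pricing_hub", "stg_sales", props)

    spark.stop()
  }
}
```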
Confidential, Chicago
Sr. DataStage Developer
Responsibilities:
- Designed, developed and unit tested the ETL components.
- Created and maintained technical ETL documentation
- Developed ETL process flow diagram including sourcing, extraction, transformation and loading into Enterprise Data Warehouse.
- Documented, tracked and communicated data issues to the business team.
- Led the effort and partnered with the business team to identify and document data quality issues.
- Created ETL code based on the business mapping and work with the team to ensure that data rules are being supported and properly maintained.
Environment: IBM DataStage 8.5, Oracle, UNIX, SQL, Shell Scripting, Rapid SQL, Control M and Database Development.
Confidential, IL
Systems Engineer
Responsibilities:
- Worked extensively on SQL and the Teradata utilities FastExport and FastLoad.
- DataStage ETL development for Dynamic Pricing; reports are created from this data warehouse for the business to analyse sales.
- Extract information from Teradata, DB2, MySQL, Netezza, files, etc.
- Maintenance of Inbound process which uses complex Teradata queries for ETL.
- Enhancement, Maintenance and on call support for Markdown Management System.
- Developing code and preparing test plans and test cases.
- Analysing data and systems using Teradata and SAS to create reports and respond to queries raised by the business.
Environment: DataStage 8.1, Teradata, Netezza, MySQL, UNIX, Perl, VS-COBOL II, JCL, SQL, TSO/ISPF, Endevor, ChangeMan, File-Aid, DB2, Control M, SAS Revenue Optimization 2.2, SAS 9.1, SAS Tables
Confidential, CA
Systems Engineer
Responsibilities:
- Analysis and coding for releases and maintenance work.
- Handling queries from various users and providing them with the needed data and access.
- Production Support for Confidential application.
Environment: ES-9000, Pentium, OS/390, Win NT, VS-COBOL II, JCL, TSO/ISPF, CICS, File-Aid, DB2, Endevor, VIEWDIRECT, DOCUMENTDIRECT, CONTROL M, INFOMAN, CLEAR QUEST, EZTRIEVE, ASSEMBLER.