
Data Engineer Resume


Minnetonka, MN

SUMMARY:

  • 10+ years of professional IT experience, including 3+ years in the Hadoop/Big Data ecosystem.
  • Hands-on experience installing and deploying Hadoop ecosystem components such as Hadoop MapReduce, YARN, HDFS, NoSQL stores (HBase), Oozie, Hive, Tableau, Sqoop, Pig, ZooKeeper, and Flume.
  • Good understanding of Hadoop architecture and hands-on experience with components such as JobTracker, TaskTracker, NameNode, and DataNode, as well as MapReduce concepts and the HDFS framework.
  • Hands-on experience developing Hadoop architectures on both Windows and Linux platforms.
  • Good technical skills in Oracle 11i, SQL Server, and ETL development using Informatica.
  • Good scripting skills in Pig and Hive.
  • Expert in importing and exporting data between Oracle/MySQL databases and HDFS using Sqoop and Flume.
  • Performed data analytics using Pig and Hive for data architects and data scientists within the team.
  • Experience with Amazon Web Services, the AWS command line interface, and AWS Data Pipeline.
  • Experience with NoSQL databases like HBase and Cassandra, as well as other ecosystem tools such as ZooKeeper, Oozie, Impala, Storm, and AWS Redshift.
  • Experience in job scheduling using Autosys.
  • Developed stored procedures and queries using PL/SQL.
  • Expertise in RDBMSs such as Oracle, MS SQL Server, Teradata, MySQL, and DB2.
  • Strong analytical skills with the ability to quickly understand clients' business needs; involved in meetings to gather information and requirements from clients, leading the team and coordinating onsite/offshore work.
  • Designed AMIs for EC2 instances using the AWS CLI and console (a boto3 equivalent of this step is sketched after this list).
  • Created and managed user accounts, log management, shared folders, reporting, Group Policy restrictions, etc.
  • Responsible for troubleshooting various network and system problems, including core dump analysis.
  • Developed infrastructure on AWS using services such as EC2, RDS, CloudFront, CloudWatch, VPC, EMR, and S3.
  • Worked with management frameworks and cloud administration tools.
  • Work experience as a member of an AWS build team.
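
The AMI step above was performed with the AWS CLI and console; the following is a hypothetical boto3 (Python SDK) equivalent, shown only as a sketch. The region, instance ID, and image name are placeholders, not values from the project.

    # Hypothetical boto3 sketch: bake an AMI from a configured EC2 instance.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

    # Create an AMI from the instance without rebooting it.
    response = ec2.create_image(
        InstanceId="i-0123456789abcdef0",   # placeholder instance ID
        Name="app-baseline-ami",            # placeholder image name
        Description="Baseline image captured from a configured instance",
        NoReboot=True,
    )
    print(response["ImageId"])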

SKILLS & ABILITIES:

Analytical Tools: SQL, Jupyter Notebook, Tableau, Zeppelin

NoSQL: Cassandra, HBase, MongoDB

Programming: Python, Scala, SQL; Python data manipulation libraries: NumPy, Pandas, Matplotlib, Plotly

Big Data: Hadoop, HDFS, MapReduce, Spark (Spark Core, Spark SQL, Spark Streaming, PySpark), Pig, Hive, Sqoop, HBase, Impala, Flume, Kafka, Shell scripting, Scala, AWS, Hue

Databases: Oracle 11g/10g, DB2 8.1, MS SQL Server, MySQL

Operating Systems: Unix/Linux, Windows 2000/NT/XP

Cloud Services: AWS and AWS managed services such as S3, EMR, Lambda, EC2, CloudFormation, and VPC

EXPERIENCE:

Confidential, Minnetonka, MN

Data Engineer

Responsibilities:

  • Design, plan, and develop programs that perform automated extract, transform, and load (ETL) operations between data sources when working with large data sets.
  • Process large structured and unstructured data sets and support the surrounding architecture and applications.
  • Review code and provide feedback on best practices, performance improvements, etc.
  • Designed encryption and decryption routines for accessing PII data.
  • Improved data processing and storage throughput by using the Hadoop framework for distributed computing across a cluster of up to twenty-five nodes.
  • Developed an ETL framework using Spark and Hive (including daily runs, error handling, and logging) to produce usable data.
  • Created ETL pipelines using Spark and Hive to ingest data from multiple sources.
  • Created a transformed-view data pipeline of MapReduce programs using chained mappers.
  • Responsible for tuning Hive and Spark jobs to improve performance.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Experienced in handling large datasets using partitioning, Spark's in-memory capabilities, broadcast variables, and effective, efficient joins and transformations during the ingestion process itself.
  • Worked on migrating legacy MapReduce programs into Spark transformations using Spark and Scala.
  • Used Spark APIs to perform the necessary transformations and actions on the fly for building a common learner data model that pulls data from multiple sources and persists it into HDFS.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Used the Spark API over Hadoop YARN to perform analytics on data in Hive (a minimal sketch of this pattern follows this list).
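
A minimal PySpark sketch of the pattern referenced in the last bullet: Spark SQL over Hive on YARN, with in-memory caching before transformation and a write back to HDFS. The table, column, and path names are hypothetical placeholders, not details from the project.

    # Minimal PySpark sketch of the Hive-on-YARN ETL pattern described above.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("learner-data-etl")
             .enableHiveSupport()          # read/write Hive tables through Spark SQL
             .getOrCreate())

    # Pull raw records from a Hive table and keep them in memory for repeated passes.
    raw = spark.table("staging.learner_events").cache()   # placeholder table

    # Example transformation: daily event counts per learner.
    daily = (raw
             .withColumn("event_date", F.to_date("event_ts"))
             .groupBy("learner_id", "event_date")
             .agg(F.count("*").alias("event_count")))

    # Persist the derived model back to HDFS as Parquet.
    daily.write.mode("overwrite").parquet("hdfs:///warehouse/learner_model/daily_counts")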

Confidential, Albany, NY

Big Data Developer

Responsibilities:

  • Involved in requirements gathering and business analysis, and translated business requirements into technical designs on Hadoop and Big Data.
  • Gathered requirements for data lakes/pipelines and implemented end-to-end data pipelines.
  • Involved in data modeling, capacity planning, and configuration of a Cassandra cluster on DataStax.
  • Experience using Sqoop to import data into Cassandra tables from different relational databases.
  • Imported and exported data between databases and HDFS using Sqoop.
  • Worked extensively on the core and Spark SQL modules of Spark using programming languages such as Python and Scala.
  • Utilized Spark Streaming to receive real-time data from Kafka and store the streamed data to HDFS, and to databases such as HBase, using PySpark and Scala (see the sketch after this list).
  • Created partitions and buckets based on state for further processing using bucket-based Hive joins.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Spark, Hive, and Pig.
  • Created a data pipeline of MapReduce programs using chained mappers.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries.
  • Responsible for exporting analyzed data to relational databases using Sqoop.
  • Implemented daily Oozie coordinator jobs that automate the parallel tasks of loading data into HDFS and pre-processing it with Pig.
  • Created tables in Hive and integrated data between Hive and Spark.
  • Responsible for tuning Hive and Pig scripts to improve performance.
  • Implemented unit tests with MRUnit and PigUnit.
  • Documented the technical details of Hadoop cluster management and the daily batch pipeline, which includes several Pig, Hive, Sqoop, Oozie, and other script-based jobs.
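
The sketch referenced above, written with PySpark Structured Streaming as an illustrative analogue of the Kafka-to-HDFS ingestion. Broker addresses, topic name, and paths are placeholders, and the job assumes the spark-sql-kafka connector package is available on the cluster.

    # Illustrative Structured Streaming sketch of the Kafka -> HDFS flow described above.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # placeholder
              .option("subscribe", "customer-events")                           # placeholder topic
              .load()
              .select(F.col("key").cast("string"),
                      F.col("value").cast("string"),
                      F.col("timestamp")))

    # Land the raw stream on HDFS as Parquet; the checkpoint directory lets the
    # query recover its position across restarts.
    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/raw/customer_events")
             .option("checkpointLocation", "hdfs:///checkpoints/customer_events")
             .start())

    query.awaitTermination()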

Environment: DataStax Cassandra, AWS, MapReduce, Hive, Pig, Oozie, and Sqoop

Confidential, Carmel, IN

Big Data Consultant

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different Big Data analytics tools, including Pig, Hive, HBase, and Sqoop.
  • Coordinated with business customers to gather business requirements and interacted with technical peers to derive technical requirements.
  • Extensively involved in the design phase and delivered design documents.
  • Involved in testing and coordinated user testing with the business.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Transformed data using Spark applications for analytics consumption.
  • Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries.
  • Experienced in defining job flows.
  • Used Hive to analyze the partitioned data and compute various metrics for reporting (a sketch of this pattern follows this list).
  • Experienced in managing and reviewing Hadoop log files.
  • Used Pig as an ETL tool for transformations, event joins, and some pre-aggregations.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Responsible for managing data coming from different sources.
  • Created the data model for Hive tables.
  • Involved in unit testing and delivered unit test plans and results documents.
  • Exported data from the HDFS environment into an RDBMS using Sqoop for report generation and visualization purposes.
  • Worked on the Oozie workflow engine for job scheduling.
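
A sketch of the partitioned-Hive pattern referenced above, issued through Spark SQL. Database, table, and column names (and the example date) are hypothetical, not taken from the engagement.

    # Illustrative sketch: partitioned Hive table for parsed logs, plus a reporting metric.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("log-metrics")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS logs")

    # Partitioning by date keeps per-day reporting queries cheap.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS logs.web_events (
            user_id STRING,
            url     STRING,
            status  INT
        )
        PARTITIONED BY (event_date STRING)
        STORED AS PARQUET
    """)

    # Example metric over a single partition for daily reporting.
    daily_errors = spark.sql("""
        SELECT event_date, COUNT(*) AS error_count
        FROM logs.web_events
        WHERE event_date = '2016-01-01' AND status >= 500
        GROUP BY event_date
    """)
    daily_errors.show()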

Environment: Hadoop, HDFS, Pig, Hive, Sqoop, HBase, Oozie, Flume.

Confidential, San Jose, CA

Data Engineer

Responsibilities:

  • Developed data pipelines using Flume, Sqoop, Pig, Java MapReduce, and Spark to ingest customer behavioral data and purchase histories into HDFS for analysis.
  • Exported the analyzed and processed data to an RDBMS using Sqoop for visualization and for generation of reports for the BI team.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Used Pig as an ETL tool for transformations, event joins, filters, and some pre-aggregations before storing the data in HDFS (a PySpark analogue is sketched after this list).
  • Optimized Pig scripts and performed user interface analysis, performance tuning, and analysis.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Loaded the aggregated data into DB2 for reporting on the dashboard.
  • Gathered functional and non-functional requirements.
  • Used the Oozie workflow engine to run multiple Hive and Pig jobs.
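
The PySpark analogue referenced above: an illustrative DataFrame version of the Pig-style transformations (event join, filter, pre-aggregation) before storing results on HDFS. Paths and column names are placeholders, not details from the project.

    # Illustrative PySpark analogue of the Pig ETL described above.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("behavior-purchase-etl").getOrCreate()

    events = spark.read.parquet("hdfs:///data/raw/clickstream")          # placeholder path
    purchases = spark.read.parquet("hdfs:///data/raw/purchase_history")  # placeholder path

    # Event join on customer id, filtering out bot traffic before aggregating.
    joined = (events.filter(~F.col("is_bot"))
              .join(purchases, on="customer_id", how="inner"))

    # Pre-aggregation: spend and visit counts per customer per day.
    summary = (joined.groupBy("customer_id", "visit_date")
               .agg(F.sum("order_amount").alias("total_spend"),
                    F.countDistinct("session_id").alias("visits")))

    summary.write.mode("overwrite").parquet("hdfs:///data/curated/customer_daily_summary")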

Environment: Big Data / Hadoop, Spark, HDFS, Hive, Pig, Sqoop, Flume, Impala, Oozie, Informatica, Java, and DB2.

Confidential, Charlotte, NC

Message Broker Developer

Responsibilities:

  • Participated in scope discussions and requirements meetings.
  • Documented requirements and sought approval from various stakeholders.
  • Designed message models and created message flows.
  • Worked extensively with SOAP nodes to call web services.
  • Reviewed designs for adherence to the organization's ESB goals and Integration Bus best practices.
  • Worked on logging and error-handling libraries.
  • Handled complete unit testing and integration testing of the services.
  • Designed and developed ESB interfaces (message flows) for the merchandising and supply chain domains.
  • Involved in writing web services for various transactions that exist as XML-based transactions.
  • Developed message flow interfaces using different WMB built-in nodes such as Compute, Database, and SOAP/HTTP nodes.
  • Worked with various users to create UAT test cases.
  • Provided production support and defect fixes.

Primary Technologies/Skills: IBM Integration Bus (IIB) 9.0.0.1, IBM MQ 7.5, DataPower, WMQFTE, SoapUI, XMLSpy, SSL, XML, XPath, XSLT

Confidential, Teaneck, NJ

Message Broker Developer/Admin

Responsibilities:

  • As the on-site lead for ESB production support, maintained the ESB infrastructure, which includes IIB (IBM Integration Bus), WMQ, WSRR, and WAS.
  • Handled infrastructure and interface-related issues (tickets) with adherence to SLAs.
  • Provided impact analysis and effort estimations for changes, both infrastructure and business.
  • Prepared execution plans and scripts for release management (rollouts).
  • Distributed workload among the onsite and offshore teams.
  • Tracked defects and followed up with vendors on product issues, such as PMRs with IBM.
  • Participated in design and requirements meetings to provide an ESB overview and understand new ESB components.
  • Fine-tuned IIB and WMQ to support peak volumes.
  • Prepared the MQ/Message Broker infrastructure design document with the details of the MQ/MB infrastructure setup for non-production (DEV, TEST) and production environments.
  • Developed UNIX shell scripts to automate MQ/MB installations and queue manager configurations.
  • Designed and implemented a security strategy for WebSphere MQ queues, channels, and other objects for developers and applications.
  • Held discussions with interfacing system leads such as DSC and Wide Orbit.
  • Configured the broker (IIB node) and the Unix environment so the message flows could interact with the databases.
  • Created IIB and WMQ objects using MQSC script commands.
  • Updated and developed message sets, message flows, and mediation flows, and deployed them to the broker runtime.
  • Developed message flows without a message set, using custom XSD/XSLT transformations to XML and fixed-length messages (TDS) using DFDL.

Environment: IIB (IBM Integration Bus), WebSphere MQ 7.5, WSRR 8, WAS, SOA, ESB

Confidential, Greensboro, NC

Message Broker Developer

Responsibilities:

  • Prepared the MQ/Message Broker infrastructure design document with the details of the MQ/MB infrastructure setup for non-production (DEV, TEST) and production environments.
  • Developed UNIX shell scripts to automate MQ/MB installations and queue manager configurations.
  • Installed MQ v7.0.1 server/client on Windows and Linux.
  • Designed and implemented a security strategy for WebSphere MQ queues, channels, and other objects for developers and applications.
  • Designed and implemented MQ clustering for corporate queue managers.
  • Developed shell scripts for administering MQ/MB components.
  • Implemented SAP IDoc and BAPI connectivity as well as other legacy systems integration.
  • Held discussions with interfacing system leads.
  • Configured the broker (IIB node) for the message flows to interact with the databases.
  • Set up execution groups (IIB servers) for message flow deployment.
  • Suggested and implemented MQ/MB best practices.
  • Prepared MQ installation, configuration, and production support documents with the infrastructure setup details of the dev, test, and prod environments.
  • Proposed the high-level design approach, implementation approach, and interface design.
  • Reviewed designs with the customer's IT architect.
  • Designed, documented, and implemented solutions.
  • Presented and reviewed the solution approach.
  • Created MQ objects such as local queues, remote queues, alias queues, and sender and receiver channels using MQSC script commands.
  • Designed and developed message sets, message flows, and mediation flows, and deployed them to the broker runtime.
  • Developed message flows without a message set, using custom XSD/XSLT transformations to XML and fixed-length messages (TDS) using DFDL.

Environment: IIB (IBM Integration Bus) / WebSphere Message Broker v8.0.0.2, WebSphere MQ 7.5, WSRR 8, SOA, ESB, XML, XSD, Canonical, Retail, Supply Chain.
