
Sr. Data Engineer Resume


SUMMARY:

  • Over 9 years of experience in development, design, integration and presentation with Java, along with extensive Big Data/Hadoop experience across the Hadoop ecosystem: Hive, Pig, Flume, Sqoop, Zookeeper, HBase, Spark, Kafka, Python and AWS.
  • Experience implementing big data projects on Cloudera 5.6/5.8/5.13, Hortonworks 2.7 and AWS EMR 5.6/5.20/5.29.
  • Installed, Configured and Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Hands-on experience in designing and implementing solutions using Apache Hadoop 2.4.0, HDFS 2.7, MapReduce2, HBase 1.1, Hive 1.2, Oozie 4.2.0, Tez 0.7.0, YARN 2.7.0, Sqoop 1.4.6 and MongoDB.
  • Setting up and integrating Hadoop ecosystem tools - HBase, Hive, Pig, Sqoop etc.
  • Expertise in Big Data architecture: Hadoop (Azure, Hortonworks, Cloudera) distributed systems, MongoDB, NoSQL.
  • Hands-on experience loading data into Spark RDDs and performing in-memory data computation (a brief sketch of this pattern follows this list).
  • Strong understanding of Data Modeling and experience with Data Cleansing, Data Profiling and Data analysis.
  • Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
  • Extensive experience in Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and Map Reduce concepts.
  • Expertise in Apache Spark Development (Spark SQL, Spark Streaming, MLlib, GraphX, Zeppelin, HDFS, YARN and NoSQL).
  • Experience in analyzing data using Hive, Pig Latin and custom MR programs in Java.
  • Hands on experience in writing Spark SQL scripting.
  • Sound knowledge in programming Spark using Scala.
  • Good understanding in processing of real-time data using Spark.
  • Experienced working with NoSQL databases - HBase, Cassandra and MongoDB - including database performance tuning and data modelling.
  • Strong experience in front-end technologies such as JSP, HTML5, jQuery, JavaScript and CSS3.
  • Experienced in improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Extensive knowledge in programming with Resilient Distributed Data sets (RDDs).
  • Configured Hadoop clusters in OpenStack and Amazon Web Services (AWS).
  • Experience in ETL (DataStage) analysis, design, development, testing and implementation of ETL processes, including performance tuning and query optimization of databases.
  • Experience in extracting source data from Sequential files, XML files, Excel files, transforming and loading it into the target data warehouse.
  • Strong experience with Java/J2EE technologies such as Core Java, JDBC, JSP, JSTL, HTML, JavaScript, JSON
  • Experience developing iterative algorithms using Spark Streaming in Scala and Python to build near real-time dashboards.
  • Experience in deploying and managing multi-node development and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, HCatalog, HBase, Zookeeper) using Hortonworks Ambari.
  • Gaining optimum performance with data compression, region splits and by manually managing compaction in HBase.
  • Upgraded from HDP 2.1 to HDP 2.2 and then to HDP 2.3.
  • Working experience in Map Reduce programming model and Hadoop Distributed File System.
  • Hands on experience on Unix/Linux environments, which included software installations/ upgrades, shell scripting for job automation and other maintenance activities.
  • Hands on experience in installing, configuring and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Storm, Spark, Kafka and Flume.
  • Thorough knowledge and experience in SQL and PL/SQL concepts.
  • Expertise in setting up standards and processes for Hadoop based application design and implementation.
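
A minimal sketch of the RDD load-and-cache pattern referenced above, in Scala with the core Spark API; the HDFS path, delimiter and field positions are hypothetical:

import org.apache.spark.{SparkConf, SparkContext}

object RddSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rdd-sketch"))

    // Load raw text from HDFS into an RDD and cache it, so both
    // actions below reuse the same in-memory partitions.
    val events = sc.textFile("hdfs:///data/events.csv").cache() // hypothetical path

    // Count records per event type as (type, 1) pairs reduced by key.
    val countsByType = events
      .map(_.split(","))
      .filter(_.length >= 2)
      .map(fields => (fields(1), 1L)) // assumes the event type is the second column
      .reduceByKey(_ + _)

    countsByType.collect().foreach(println)
    println(s"total records: ${events.count()}")

    sc.stop()
  }
}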

TECHNICAL SKILLS:

Operating System: Linux, UNIX, IOS, TinyOS, Sun Solaris, HP-UX, Windows 7/8, CentOS, Ubuntu.

Hadoop/Big Data: Apache Spark, HDFS, MapReduce, MRUnit, YARN, Hive, Pig, HBase, Impala, Zookeeper, Sqoop, Oozie, Apache Cassandra, Scala, Flume, Apache Ignite, Avro, AWS.

Languages: Scala, Java (JDK 1.4/1.5/1.6), C/C++, SQL, HQL, R, Python, XPath, Spark, PL/SQL, Pig Latin.

Data Warehousing & BI: Informatica PowerCenter 9.x/8.x/7.x, PowerExchange, IDQ

ETL Tools: IBM InfoSphere DataStage 11.5, MSBI (SSIS)

Database: Oracle 11g, AWS Redshift, AWS Athena, IBM Netezza, HBase, Apache Phoenix, SQL Server, MySQL, MongoDB, Cassandra.

Debugging tools: Microsoft SQL Server Management Studio 2008, Business Intelligence Development Studio 2008, RAD, Subversion, BMC Remedy

Version Control: TortoiseHg, Microsoft TFS, SVN, Git, CVS. Teradata Utilities: TPump, MLoad, FastExport.

GUI Editors: IntelliJ IDEA Community Edition, DataGrip, DbVisualizer, DBeaver

PROFESSIONAL EXPERIENCE:

Confidential, San Francisco, CA

Sr. Data Engineer

Responsibilities:

  • Developed ETL data pipelines using Spark, Spark Streaming and Scala.
  • Imported Avro files using Apache Kafka and did some analytics using Spark in Scala.
  • Extracted real-time data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, processing them and storing the results in Cassandra.
  • Experience in building Real-time Data Pipelines with Kafka Connect and Spark Streaming.
  • Configured, deployed and maintained multi-node Dev and Test Kafka Clusters.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it to Cassandra (a hedged sketch of this pipeline follows this list).
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including MapReduce, Hive and Spark.
  • Involved in loading data from the Linux file system, servers and Java web services using Kafka producers and partitions.
  • Implemented Kafka Custom encoders for custom input format to load data into Kafka Partitions.
  • Implemented Storm topologies to pre-process data before moving into HDFS system.
  • Implemented Kafka High level consumers to get data from Kafka partitions and move into HDFS.
  • Implemented POC to migrate MapReduce programs into Spark transformations using Spark and Scala.
  • Migrated complex MapReduce programs into Spark RDD transformations, actions.
  • Implemented SparkRDD transformations to map business analysis and apply actions on top of transformations.
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
  • Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
  • Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
  • Wrote HBase client programs in Java and web services.
  • Developed data ingestion jobs using Pig, Sqoop, Hive and Unix shell scripts in HDFS.
  • Designed the solution and developed the programs for data ingestion using Sqoop, MapReduce, shell scripts and Python.
  • Created Sqoop jobs, Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
  • Designed and developed Hadoop ETL solutions to move data to the data lake using big data tools such as Sqoop, Hive, Spark, HDFS and Talend.
  • Built pipelines to move hashed and un-hashed data from Azure Blob to Data lake.
  • Created pipelines to move data from on-premise servers to Azure Data Lake.
  • Worked collaboratively to manage build outs of large data clusters and real time streaming with Spark.
  • Responsible for loading data pipelines from web servers using Sqoop, Kafka and the Spark Streaming API.
  • Developed the Kafka producers, partitions in brokers and consumer groups.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL database for huge volume of data.
  • Developed the batch scripts to fetch the data from AWS S3 storage and do required transformations in Scala using Spark framework.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Data Processing: Processed data using MapReduce and YARN. Worked on Kafka as a proof of concept for log processing.
  • Tested Apache Tez, an extensible framework for building high performance batch and interactive data processing applications, on Pig and Hive jobs
  • Monitored the Hive metastore and the cluster nodes with the help of Hue.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Created AWS EC2 instances and used JIT servers.
  • Developed various UDFs in Map-Reduce and Python for Pig and Hive.
  • Handled data integrity checks using Hive queries, Hadoop and Spark.
  • Worked on performing transformations & actions on RDDs and Spark Streaming data with Scala.
  • Implemented the Machine learning algorithms using Spark with Python.
  • Defined job flows and developed simple to complex Map Reduce jobs as per the requirement.
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
  • Developed Pig UDFs for manipulating the data according to business requirements and worked on developing custom Pig loaders.
  • Responsible for handling streaming data from web server console logs.
  • Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
  • Developed PIG Latin scripts for the analysis of semi structured data.
  • Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Worked on developing ETL processes (DataStage Open Studio) to load data from multiple data sources to HDFS using Flume and Sqoop, and performed structural modifications using MapReduce and Hive.
  • Involved in NoSQL database design, integration and implementation
  • Loaded data into NoSQL database HBase.
  • Developed Kafka producers and consumers, HBase clients, Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
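
A hedged sketch of the Kafka-to-Cassandra streaming pipeline described above, using the spark-streaming-kafka-0-10 integration and the DataStax Spark-Cassandra connector; the broker address, topic, keyspace, table and record layout are all hypothetical:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.streaming._

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-cassandra")
      .set("spark.cassandra.connection.host", "cassandra-host") // hypothetical host
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker:9092", // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "learner-consumers",
      "auto.offset.reset"  -> "latest")

    // Direct stream from Kafka; each micro-batch arrives as an RDD of records.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("learner-events"), kafkaParams))

    // Parse CSV payloads and persist each batch straight into Cassandra.
    stream.map(_.value.split(","))
      .filter(_.length >= 3)
      .map(f => (f(0), f(1), f(2)))
      .saveToCassandra("learner_ks", "learner_events",
        SomeColumns("id", "event_type", "payload")) // hypothetical keyspace/table

    ssc.start()
    ssc.awaitTermination()
  }
}

The direct-stream approach ties Kafka offsets to each micro-batch, so every batch maps cleanly onto an RDD before being written out.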

Environment: Spark SQL, Spark Streaming, Apache Kafka, Hive, Tez, AWS, ETL, Pig, UNIX, Linux, Tableau, Teradata, Sqoop, HDFS, MapReduce, Flume, Informatica 9.1/8.1/7.1/6.1, Oracle 11g, Hadoop 2.x, NoSQL, flat files, Eclipse, Maven, Java, agile methodologies, Hue, Oozie, Scala, Python, Elasticsearch.

Confidential, Nashville, TN

Sr. Data Engineer

Responsibilities:

  • Migrated on-premise ETL pipelines running on IBM Netezza to AWS; developed and automated a process to migrate data to AWS S3, run ETL using Spark on EC2 and deliver data to S3, AWS Athena and AWS Redshift.
  • Involved in requirements gathering and building a data lake on top of HDFS; worked with the GoCD CI/CD tool to deploy the application and gained experience with a big data testing framework.
  • Used the Hortonworks distribution for the Hadoop ecosystem.
  • Created Sqoop jobs for importing data from relational database systems into HDFS and for exporting results back into the databases.
  • Used Pig extensively for data cleansing, via both standalone and embedded Pig scripts.
  • Scheduled multiple Hive and Pig jobs using the Oozie workflow engine.
  • Wrote Python scripts to analyze customer data.
  • Created partitioned tables in Hive, designed a data warehouse using Hive external tables and wrote Hive queries for analysis.
  • Captured the data logs from web server into HDFS using Flume for analysis.
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
  • Implemented Spark RDD transformations to Map business analysis and apply actions on top of transformations.
  • Involved in migrating MapReduce jobs into Spark jobs and used Spark SQL and the DataFrames API to load structured data into Spark clusters.
  • Used DataFrames and RDDs for data transformations.
  • Designed and Developed Spark workflows using Scala for data pull from cloud-based systems and applying transformations on it.
  • Used Spark Streaming to consume topics from the distributed messaging source (Event Hub) and periodically push batches of data to Spark for real-time processing.
  • Tuned Cassandra and MySQL for optimizing the data.
  • Implemented monitoring and established best practices around the usage of Elasticsearch.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Hands-on experience with Hortonworks tools such as Tez and Ambari.
  • Worked on Apache NiFi as an ETL tool for batch processing and real-time processing.
  • Fetched and generated monthly reports and visualized them using Tableau.
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
  • Extracted files from Cassandra through Sqoop and placed in HDFS for further processing.
  • Strong working experience with Cassandra, retrieving data from Cassandra clusters and running queries.
  • Experience in Data modelling using Cassandra.
  • Very good understanding of Cassandra cluster mechanisms, including replication strategies, snitches, gossip, consistent hashing and consistency levels.
  • Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting and grouping (a hedged sketch of the connector-based read path follows this list).
  • Worked with BI (Business Intelligence) teams in generating the reports and designing ETL workflows on Tableau. Deployed data from various sources into HDFS and building reports using Tableau.
  • Worked extensively on creating MapReduce jobs to power data for search and aggregation.
  • Managed Hadoop jobs as DAGs using the Oozie workflow scheduler.
  • Involved in developing code to write canonical model JSON records from numerous input sources to Kafka Queues.
  • Involved in loading data from Linux file systems, servers and Java web services using Kafka producers and consumers.
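
A hedged sketch of reading a Cassandra table into a Spark DataFrame through the DataStax connector and then grouping and sorting it with the DataFrames API; the keyspace, table and column names are hypothetical:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CassandraAnalysis {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cassandra-analysis")
      .config("spark.cassandra.connection.host", "cassandra-host") // hypothetical host
      .getOrCreate()

    // Load a Cassandra table as a DataFrame through the connector's data source.
    val orders = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "sales_ks", "table" -> "orders")) // hypothetical
      .load()

    // Search, sort and group with the DataFrame API instead of raw CQL.
    orders.filter(col("status") === "SHIPPED")
      .groupBy("region")
      .agg(count("*").as("shipped_orders"))
      .orderBy(desc("shipped_orders"))
      .show()

    spark.stop()
  }
}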

Environment: Cloudera Manager (CDH5), Hadoop, Hive, MapReduce, Sqoop, Spark, Eclipse, Maven, Java, agile methodologies, AWS, Tableau, Pig, Elasticsearch, Storm, Cassandra, Impala, Oozie, Python, shell scripting, Java Collections, MySQL, Apache Avro, Zookeeper, SVN, Jenkins, Windows AD, Windows KDC, Hortonworks distribution of Hadoop 2.3, YARN, Ambari

Confidential, Emeryville, CA

Hadoop (Big Data) Developer

Responsibilities:

  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Implemented a nine-node CDH3 Hadoop cluster on CentOS.
  • Implemented the Apache Crunch library on top of MapReduce and Spark for data aggregation.
  • Involved in loading data from LINUX file system to HDFS.
  • Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slots configuration.
  • Implemented a script to transmit information from Oracle to HBase using Sqoop.
  • Implemented business logic using Pig scripts and UDFs.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Applied design patterns and OO design concepts to improve the existing Java/J2EE-based code base.
  • Developed JAX-WS web services.
  • Handled Type 1 and Type 2 slowly changing dimensions.
  • Imported and exported data between HDFS and databases using Sqoop.
  • Wrote Hive jobs to parse logs and structure them in a tabular format to facilitate effective querying.
  • Involved in the design, implementation and maintenance of Data warehouses
  • Involved in creating Hive tables, loading with data and writing Hive queries.
  • Implemented custom interceptors for Flume to filter data as per requirements.
  • Used Hive and Pig to analyze data in HDFS to identify issues and behavioral patterns.
  • Created internal and external Hive tables and defined static and dynamic partitions for optimized performance (a hedged sketch of this DDL pattern, expressed through Spark's Hive support, follows this list).
  • Configured daily workflow for extraction, processing and analysis of data using Oozie Scheduler.
  • Proactively involved in ongoing maintenance, support and improvements in the Hadoop cluster.
  • Wrote Pig Latin scripts for running advanced analytics on the data collected.
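
A hedged sketch of the external/managed table and partitioning pattern above, written as HiveQL run through Spark's Hive support so the examples stay in one language; the paths, schemas and partition column are hypothetical:

import org.apache.spark.sql.SparkSession

object HivePartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partition-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table over raw logs already sitting in HDFS (hypothetical path/schema).
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS raw_logs (
        |  ts STRING, level STRING, msg STRING)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        |LOCATION 'hdfs:///data/raw_logs'""".stripMargin)

    // Managed table partitioned by day, so queries prune to the partitions they need.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS logs_by_day (
        |  ts STRING, level STRING, msg STRING)
        |PARTITIONED BY (dt STRING)""".stripMargin)

    // Dynamic-partition insert: Hive routes each row to its dt= partition.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE logs_by_day PARTITION (dt)
        |SELECT ts, level, msg, substr(ts, 1, 10) AS dt FROM raw_logs""".stripMargin)

    spark.stop()
  }
}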

Environment: Hadoop 2.6.0, HDFS, YARN, MapR, Pig, Sqoop, HBase, Spark, Hive 0.13, Oozie, Storm, MongoDB, CDH3, shell scripting, Jenkins, Windows AD, Windows KDC, Hortonworks distribution of Hadoop 2.3, Ambari, Red Hat Linux, CentOS, Java 1.6, MySQL, UNIX, T-SQL.

Confidential, Jacksonville, FL

Hadoop Developer

Responsibilities:

  • Provided suggestions on converting to Hadoop using MapReduce, Hive, Sqoop, Flume and Pig Latin.
  • Experience in writing Spark applications for data validation, cleansing, transformations and custom aggregations.
  • Imported data from different sources into Spark RDD for processing.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying (a hedged sketch of a typed custom aggregate follows this list).
  • Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning and slots configuration.
  • Responsible for managing data coming from different sources.
  • Imported and exported data into HDFS using Flume.
  • Experienced in analyzing data with Hive and Pig.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Setup and benchmarked Hadoop/HBase clusters for internal use.
  • Set up a Hadoop cluster on Amazon EC2 using Apache Whirr for a POC.
  • Worked on developing applications in Hadoop Big Data Technologies-Pig, Hive, Map-Reduce, Oozie, Flume, and Kafka.
  • Experienced in managing and reviewing Hadoop log files.
  • Helped with Big Data technologies for integration of Hive with HBase and Sqoop with HBase.
  • Analyzed data with Hive, Pig and Hadoop Streaming.
  • Involved in moving relational database and legacy data into HDFS and HBase tables using Sqoop, and vice versa.
  • Involved in cluster coordination services through Zookeeper and in adding new nodes to an existing cluster.
  • Moved the data from traditional databases like MySQL, MS SQL Server and Oracle into Hadoop.
  • Worked on Integrating Talend and SSIS with Hadoop and performed ETL operations.
  • Installed Hive, Pig, Flume, Sqoop and Oozie on the Hadoop cluster.
  • Used Flume to collect, aggregate and push log data from different log servers.
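
One hedged way to write a custom aggregate with Spark SQL's typed Aggregator API; the geometric-mean aggregate itself is a hypothetical stand-in for the project's actual functions:

import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// Typed custom aggregate: geometric mean of a Double column.
// The buffer holds (sum of logs, count); finish exponentiates the mean log.
object GeoMean extends Aggregator[Double, (Double, Long), Double] {
  def zero: (Double, Long) = (0.0, 0L)
  def reduce(b: (Double, Long), a: Double): (Double, Long) = (b._1 + math.log(a), b._2 + 1)
  def merge(b1: (Double, Long), b2: (Double, Long)): (Double, Long) =
    (b1._1 + b2._1, b1._2 + b2._2)
  def finish(r: (Double, Long)): Double = math.exp(r._1 / r._2)
  def bufferEncoder: Encoder[(Double, Long)] =
    Encoders.tuple(Encoders.scalaDouble, Encoders.scalaLong)
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

object CustomAggDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("custom-agg").getOrCreate()
    import spark.implicits._

    // Geometric mean of 2, 8 and 4 is 4.0.
    val values = Seq(2.0, 8.0, 4.0).toDS()
    values.select(GeoMean.toColumn.name("geo_mean")).show()

    spark.stop()
  }
}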

Environment: Hadoop, Pig, Hive, Oozie, Sqoop, Flume, HBase, Kafka, Java, Maven, Avro, Cloudera, Hortonworks, Linux, HDFS, MapReduce, Oracle, SQL Server, Eclipse, shell scripting.

Confidential, Dayton, OH

Data Engineer

Responsibilities:

  • Processed data into HDFS by developing solutions; analyzed the data using MapReduce, Pig and Hive; and produced summary results from Hadoop for downstream systems.
  • Used Kettle extensively to import data from various systems/sources, such as MySQL, into HDFS.
  • Applied various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins (an analogous broadcast-join sketch in Spark follows this list).
  • Involved in creating Hive tables, and then applied HiveQL on those tables for data validation.
  • Moved the data from Hive tables into MongoDB collections.
  • Used Zookeeper for various types of centralized configurations.
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts
  • Managed and reviewed Hadoop log files.
  • Tested raw data and executed performance scripts.
  • Shared responsibility for administration of Hadoop, Hive and Pig.
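
An analogous sketch in Spark of the map-side join optimization above: broadcast() ships the small table to every executor, playing the role Hive's distributed cache plays for map joins. Table contents and column names are hypothetical:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object MapSideJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("map-side-join").getOrCreate()
    import spark.implicits._

    val facts  = Seq((1, 250.0), (2, 75.5), (1, 12.0)).toDF("store_id", "amount")
    val stores = Seq((1, "Dayton"), (2, "Columbus")).toDF("store_id", "city")

    // Broadcasting the small dimension table lets the join happen map-side,
    // with no shuffle of the large fact table.
    facts.join(broadcast(stores), "store_id")
      .groupBy("city")
      .sum("amount")
      .show()

    spark.stop()
  }
}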

Environment: Hadoop, Spark SQL, Pig, Hive, Sqoop, Flume, MapReduce, HDFS, LINUX, Oozie, MongoDB

Confidential

Java/J2EE Consultant

Responsibilities:

  • Involved in the design of JSPs and Servlets for navigation among the modules.
  • Designed cascading style sheets and the XML part of the Order Entry and Product Search modules, and performed client-side validations with JavaScript.
  • Developed client customized interfaces for various clients using CSS and JavaScript.
  • Designed and implemented the User interface using HTML, CSS, JavaScript and SQL Server.
  • Developed Interfaces using JSP based on the Users, Roles, and Permissions. Screen options were displayed on User permissions. This was coded using Custom Tags in JSP using Tag Libraries.
  • Created web services using Advanced J2EE technologies to communicate with external systems.
  • Involved in the UI development, including layout and front-end coding per the requirements of the client by using JavaScript and Ext JS.
  • Used Hibernate along with Spring Framework to integrate with Oracle database.
  • Built complex SQL queries and ETL scripts for data extraction and analysis to define the application requirements.
  • Developed the UI using HTML, JavaScript and JSP, and developed business logic and interfacing components using Business Objects, XML and JDBC.
  • Designed the user interface and checked validations using JavaScript.
  • Managed connectivity using JDBC for querying/inserting and data management, including triggers and stored procedures.
  • Developed various EJBs for handling business logic and data manipulations from databases.
  • Performed code reviews for peers and maintained code repositories using Git.
  • Enhanced the mechanism of logging and tracing with Log4j.
  • Generated web service clients from WSDL files.
  • Involved in development of the presentation layer using Struts and custom tag libraries.
  • Performed integration testing, supported the project and tracked progress with the help of JIRA.
  • Acted as the first point of contact for the Business queries during development and testing phase.
  • Worked closely with clients and the QA team to resolve critical issues/bugs.

Environment: JSP, Servlets, Struts, Hibernate, HTML, CSS, JavaScript, JSON, REST, JUnit, XML, SASS, DOM, Web Logic (Oracle App server), Web Services, Eclipse, Agile.

Confidential

Software analyst

Responsibilities:

  • Responsible for coding User interfaces using Spring MVC.
  • Working knowledge of SQL, JDBC and relational database design.
  • Worked with the Linux operating system, Bash shell scripting and open-source technologies.
  • Responsible for coding POJO Classes.
  • Implemented Business logic.
  • Developed Client-side validations using Spring framework.
  • Functional Testing and Bug fixing.
  • Assisted in the development of software technical documentation.
  • Assisted in researching and assessing new technologies with international team members.
  • Participated in agile-driven teamwork.
  • Responsible for developing, coding, testing and debugging new, highly complex software solutions and enhancements to existing software in a maintenance capacity.
  • Resolved customer complaints with software and responded to suggestions for improvements and enhancements.

Environment: Agile, Core Java, Spring, Hibernate, REST, SOAP, Tomcat, JSON, AJAX, JUnit, Mockito, Ant.
