
Data Engineer Resume


New York, NY

SUMMARY

  • 7+ years of IT experience in the Health, Banking, Insurance, and E-commerce domains, spanning design, development, maintenance, and support of Big Data and Java/J2EE applications, including 6+ years of experience in analysis, design, development, and implementation as a Data Engineer.
  • Strong exposure to the Spark, Spark Streaming, and Spark MLlib frameworks and to developing production-ready Spark applications using both the Scala and Python programming interfaces.
  • Hands-on experience working with Spark to import data from different sources such as storage layers, Kafka, and databases, perform transformations, and save the results to different destinations (a representative PySpark sketch follows this summary).
  • Worked extensively on fine-tuning Spark applications to improve performance and on troubleshooting Spark application failures.
  • Experience working with Spark Streaming and Kafka to build reliable streaming pipelines, with the ability to troubleshoot and fine-tune streaming applications so they handle and recover from failures.
  • Extensively worked on Spark using Scala on clusters for analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle.
  • Good understanding of Hadoop architecture and the various components of the Big Data ecosystem.
  • Experienced working with Hadoop distributions both on-premises (CDH, HDP) and in the cloud (AWS).
  • Good experience working with data analytics and big data services in the AWS Cloud such as EMR, Redshift, S3, Athena, and Glue.
  • Used Hive extensively to perform data analytics required by business teams.
  • Solid experience working with data formats such as Parquet, ORC, Avro, and JSON.
  • Expertise in writing dynamic SQL, complex stored procedures, functions, and views.
  • Excellent understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
  • Experience in importing and exporting data using Sqoop between HDFS and relational database systems.
  • Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and Core Java design patterns.
  • Experience in managing Hadoop clusters using the Cloudera Manager tool.
  • Very good experience across the complete project life cycle: design, development, testing, and implementation of client-server and web applications.
  • Stored data in the Apache Ignite storage layer, with web services running independently and displaying output on AWS.
  • Good working knowledge of Snowflake and Teradata databases.
  • Hands-on experience with Sequence files, RC files, Avro, Parquet, and JSON, as well as Combiners, Counters, dynamic partitions, and bucketing for best practices and performance improvement.
  • Skilled in developing Java MapReduce programs using the Java API and in using Hive and Pig to perform data analysis, cleaning, and transformation.
  • Experience in analyzing SQL scripts and designing solutions to implement them using PySpark.
  • Worked with join patterns and implemented map-side and reduce-side joins using MapReduce.
  • Developed enterprise applications using Scala.
  • Developed multiple MapReduce jobs to perform data cleaning and preprocessing.
  • Designed Hive queries and Pig scripts for data analysis, data transfer, and table design to load data into the Hadoop environment.
  • Debugged and improved the performance of Hive SQL queries by adding partition columns.
  • Converted Hive SQL to Spark SQL as part of the migration of pipelines.
  • Expertise in writing Hive UDFs and generic UDFs to incorporate complex business logic into Hive queries.
  • Extensive experience importing and exporting data using stream processing platforms such as Flume and Kafka.
  • Expertise in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Exposure to build tools such as Maven and SBT.
  • Experience as a Java developer in client/server technologies using J2EE, Servlets, JSP, JDBC, and SQL.
  • Expertise in designing and developing enterprise applications for the J2EE platform using MVC, JSP, Servlets, JDBC, Web Services, and Hibernate, and in designing web applications using HTML5, CSS3, AngularJS, and Bootstrap.
  • Excellent interpersonal and communication skills; creative, research-minded, technically competent, and results-oriented, with problem-solving and leadership skills.
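
A minimal PySpark sketch of the batch pattern referenced in the summary above (reading raw data, enriching it, and writing partitioned output), assuming a cluster with a configured Hive metastore; the application name, S3 paths, and column names are hypothetical placeholders, not values from an actual project:

# Sketch only: paths and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("sample-enrichment-job")
    .enableHiveSupport()  # assumes a Hive metastore is available
    .getOrCreate()
)

# Read raw event data stored as Parquet (placeholder location).
raw = spark.read.parquet("s3://example-bucket/raw/events/")

# Basic cleansing and aggregation of the kind described in the summary.
enriched = (
    raw.filter(F.col("event_type").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
       .groupBy("event_date", "event_type")
       .agg(F.count("*").alias("event_count"))
)

# Partitioning the output keeps downstream Hive/Athena scans narrow.
(
    enriched.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-bucket/curated/event_counts/")
)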

TECHNICAL SKILLS

Big Data Ecosystem: Spark, Hive, MapReduce, YARN, HDFS, Impala, Sqoop, Kafka, and Oozie

Programming Languages: Java, Scala, and Python

Frameworks: Spring, Hibernate, JMS.

IDE: Eclipse, IntelliJ, PyCharm.

Databases: IBM DB2, Oracle, SQL Server, MySQL, HBase, Cassandra.

Tools: Tableau, Zoomdata, Talend.

Cloud Services: AWS S3, EMR, Athena, Redshift, Glue Metastore, Lambda functions, Azure Databricks.

Methodologies: Agile, Waterfall.

PROFESSIONAL EXPERIENCE

Data Engineer

Confidential, New York, NY

Responsibilities:

  • Responsible for ingesting large volumes of user behavioral data and customer profile data into the analytics data store.
  • Developed custom multi-threaded Java-based ingestion jobs as well as Sqoop jobs for ingesting data from FTP servers and data warehouses.
  • Used Python with PySpark to build data pipelines and wrote Python scripts to automate them.
  • Developed many Spark applications for performing data cleansing, event enrichment, data aggregation, de-normalization, and data preparation needed for machine learning exercises.
  • Developed various Spark applications using PySpark to perform enrichments of user behavioral data (clickstream data) merged with user profile data.
  • Created S3 buckets, managed their policies, and utilized S3 and Glacier for storage and backup on AWS.
  • Designed and implemented test environments on AWS.
  • Involved in designing and developing enhancements of CSG using AWS APIs.
  • Acted as the technical liaison between the customer and the team on all AWS technical aspects.
  • Created pipelines to move data from on-premises servers to Azure Data Lake.
  • Utilized Azure HDInsight to monitor and manage one of our Hadoop clusters.
  • Experience with Azure Databricks in processing raw data from source systems and writing to destination Delta Lake tables.
  • Worked on troubleshooting Spark applications to make them more error-tolerant.
  • Utilized the PySpark API to implement batch processing jobs.
  • Worked on fine-tuning Spark applications to improve the overall processing time of the pipelines.
  • Wrote Kafka producers to stream data from external REST APIs to Kafka topics (see the sketch after this role's environment list).
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase.
  • Experienced in handling large datasets using Spark in-memory capabilities, broadcast variables, effective and efficient joins, transformations, and other features.
  • Worked extensively with Sqoop for importing data from Oracle.
  • Experience working with EMR clusters in the AWS cloud and with S3.
  • Involved in creating Hive tables and loading and analyzing data using Hive scripts.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Good experience with continuous integration of applications using Jenkins.
  • Used reporting tools such as Tableau connected to Athena to generate daily data reports.
  • Collaborated with the infrastructure, network, database, application, and BA teams to ensure data quality and availability.

Environment: AWS Cloud, Spark, Spark Streaming, Spark SQL, Python, PySpark, Scala, Kafka, Hive, Sqoop, HBase, Azure HDInsight, Tableau, AWS Simple workflow, Oracle, Linux.
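
A hedged sketch of the REST-to-Kafka producer work mentioned above, using the kafka-python client; the endpoint URL, topic name, and broker address are hypothetical placeholders rather than actual project values:

# Sketch only: URL, topic, and broker below are hypothetical placeholders.
import json
import time

import requests
from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],  # placeholder broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    # Poll the (hypothetical) REST endpoint and forward each record to Kafka.
    response = requests.get("https://api.example.com/events", timeout=10)
    response.raise_for_status()
    for record in response.json():
        producer.send("user-events", value=record)  # placeholder topic
    producer.flush()
    time.sleep(30)  # simple polling interval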

Sr. Hadoop/Spark Developer

Confidential, Phoenix, AZ

Responsibilities:

  • Involved in requirement analysis, design, coding and implementation phases of the project.
  • Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
  • Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames, and Spark SQL APIs.
  • Wrote new Spark jobs in Scala to analyze customer and sales history data.
  • Used Kafka to get data from many streaming sources into HDFS (a sketch of this pattern follows this role's environment list).
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Created a data lake with Snowflake and built several data marts with presentable, modeled data.
  • Good experience in Hive partitioning, bucketing, and collections, and in performing different types of joins on Hive tables.
  • Created Hive external tables to perform ETL on data generated on a daily basis.
  • Wrote HBase bulk load jobs to load processed data into HBase tables by converting it to HFiles.
  • Performed validation on the data ingested to filter and cleanse the data in Hive.
  • Created Sqoop jobs to handle incremental loads from RDBMS into HDFS and applied Spark transformations.
  • Loaded data into Hive tables from Spark using the Parquet columnar format.
  • Developed Oozie workflows to automate and productionize the data pipelines.
  • Developed Sqoop import Scripts for importing reference data from Netezza.

Environment: Hadoop, HDFS, Hive, Sqoop, Kafka, Spark, Shell Scripting, Snowflake, HBase, Scala, Python, Kerberos, Maven, Ambari, Hortonworks, MySQL.
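
As referenced in the Kafka-to-HDFS bullet above, the following is a minimal Structured Streaming sketch of that pattern, assuming the spark-sql-kafka package is on the classpath; the broker, topic, and HDFS paths are hypothetical placeholders:

# Sketch only: broker, topic, and paths below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Read the stream from Kafka (requires the spark-sql-kafka package).
raw_stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
    .option("subscribe", "sales-events")                # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers the payload as bytes; cast it to string before landing it.
events = raw_stream.select(
    F.col("value").cast("string").alias("payload"),
    F.col("timestamp").alias("kafka_ts"),
)

# Land micro-batches in HDFS as Parquet, with checkpointing for recovery.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/raw/sales_events/")
    .option("checkpointLocation", "hdfs:///checkpoints/sales_events/")
    .trigger(processingTime="1 minute")
    .start()
)

query.awaitTermination()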

Hadoop Developer

Confidential, Sterling, VA

Responsibilities:

  • Developed custom input adaptors for ingesting clickstream data from external sources such as FTP servers into S3-backed data lakes on a daily basis.
  • Created various Spark applications using PySpark and Scala to perform a series of enrichments on this clickstream data combined with users' enterprise data.
  • Implemented batch processing jobs using the Spark Scala API.
  • Developed Sqoop scripts to import/export data from Teradata to HDFS and into Hive tables.
  • Optimized Hive tables using techniques such as partitioning and bucketing to provide better performance for HiveQL queries.
  • Worked with multiple file formats such as Avro, Parquet, and ORC.
  • Converted existing MapReduce programs to Spark applications for handling semi-structured data such as JSON files, Apache log files, and other custom log data.
  • Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase.
  • Experienced in handling large datasets using Spark in-memory capabilities, broadcast variables, effective and efficient joins, transformations, and other features.
  • Worked extensively with Sqoop for importing data from Teradata.
  • Implemented business logic in Hive and wrote UDFs to process data for analysis.
  • Utilized AWS services such as S3, EMR, Redshift, Athena, and the Glue Metastore for building and managing data pipelines within the cloud.
  • Automated EMR cluster creation and termination using the AWS Java SDK (a boto3 equivalent is sketched after this role's environment list).
  • Loaded the processed data into Redshift clusters using the Spark-Redshift integration.
  • Created views within Athena to allow the downstream reporting and data analysis teams to query and analyze the results.

Environment: Spark, Hive, HBase, Scala, Python, Shell Scripting, Amazon EMR, S3
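
The EMR automation above was done with the AWS Java SDK; below is a hedged boto3 (Python) equivalent of the same create-and-terminate flow, with placeholder cluster settings and the default EMR service roles assumed:

# Sketch only: cluster name, instance types, and region are placeholders;
# the original automation used the AWS Java SDK, not boto3.
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # placeholder region

# Create a transient cluster that shuts down once its steps finish.
response = emr.run_job_flow(
    Name="nightly-etl-cluster",                      # placeholder name
    ReleaseLabel="emr-6.9.0",                        # placeholder EMR release
    Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
    Instances={
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,
        "TerminationProtected": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",               # default EMR roles assumed
    ServiceRole="EMR_DefaultRole",
)
cluster_id = response["JobFlowId"]

# Termination can also be forced explicitly, e.g. from a scheduled cleanup job.
emr.terminate_job_flows(JobFlowIds=[cluster_id])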

Hadoop Developer

Confidential, Pittsburg, PA

Responsibilities:

  • Developed Spark applications using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Developed highly optimized Spark applications to perform data cleansing, validation, transformation, and summarization activities according to the requirements.
  • Built a data pipeline consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data.
  • Developed Spark jobs and Hive Jobs to summarize and transform data.
  • Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for huge volumes of data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala (see the sketch after this role's environment list).
  • Analyzed the SQL scripts and designed the solution to implement them using Scala.
  • Built real-time data pipelines by developing Kafka producers and Spark Streaming applications to consume the data.
  • Ingested syslog messages, parsed them, and streamed the data to Kafka.
  • Handled importing data from different sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the processed data into HDFS.
  • Exported the analyzed data to relational databases using Sqoop for further visualization and report generation by the BI team.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Analyzed the data by performing Hive queries (Hive QL) to study customer behavior.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed Hive scripts in Hive QL to de-normalize and aggregate the data.
  • Scheduled and executed workflows in Oozie to run various jobs.

Environment: Hadoop, HDFS, HBase, Spark, Scala, Hive, MapReduce, Sqoop, ETL, Java
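
As noted in the Hive-to-Spark conversion bullet above, a small PySpark illustration of rewriting a HiveQL aggregation with the DataFrame API follows; the database, table, and column names are hypothetical placeholders (the project itself used Scala):

# Sketch only: database, table, and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("hive-to-spark-rewrite")
    .enableHiveSupport()
    .getOrCreate()
)

# Original HiveQL-style aggregation, run as-is through Spark SQL.
via_sql = spark.sql("""
    SELECT customer_id,
           COUNT(*)          AS order_count,
           SUM(order_amount) AS total_spend
    FROM sales.orders
    WHERE order_date >= '2020-01-01'
    GROUP BY customer_id
""")

# Equivalent DataFrame-API version, which is easier to compose and unit test.
orders = spark.table("sales.orders")
via_dataframe = (
    orders.filter(F.col("order_date") >= "2020-01-01")
          .groupBy("customer_id")
          .agg(
              F.count("*").alias("order_count"),
              F.sum("order_amount").alias("total_spend"),
          )
)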

Java Developer

Confidential

Responsibilities:

  • Involved in client requirement gathering, analysis & application design.
  • Involved in implementing the design through the key phases of the software development life cycle (SDLC), including development, testing, implementation, and maintenance support, following the Waterfall methodology.
  • Developed the UI layer with JSP, HTML, CSS, AJAX, and JavaScript.
  • Used Asynchronous JavaScript and XML (AJAX) for a better and faster interactive front end.
  • Used JavaScript to perform client-side validations.
  • Involved in Database Connectivity through JDBC.
  • Used AJAX to make asynchronous calls to the server side and retrieve JSON or XML data.
  • Developed server-side presentation layer using Struts MVC Framework.
  • Developed Action classes, Action Forms and Struts Configuration file to handle required UI actions and JSPs for Views.
  • Developed batch jobs using EJB scheduling and leveraged container-managed transactions for highly transactional operations.
  • Used various Core Java concepts such as multi-threading, exception handling, the Collections API, and garbage collection for dynamic memory management to implement various features and enhancements.
  • Developed Hibernate entities, mappings, and customized criteria queries for interacting with the database.
  • Implemented and developed REST- and SOAP-based web services to provide JSON and XML data.
  • Involved in implementation of web services (top-down and bottom-up).
  • Used JPA and JDBC in the persistence layer to persist the data to the DB2 database.
  • Created and wrote SQL queries, tables, triggers, views, and PL/SQL procedures to persist and retrieve data from the database.
  • Developed a Web service to communicate wif the database using SOAP.
  • Performed performance tuning and optimization with a Java performance analysis tool.
  • Implemented JUnit test cases for Struts/Spring components and used JUnit for unit testing.
  • Used Eclipse as the IDE and worked on installing and configuring JBoss.
  • Used CVS for check-out and check-in operations.
  • Deployed the components to WebSphere Application Server.
  • Worked with the production support team in debugging and fixing various production issues.

Environment: Java, JSP, HTML, CSS, AJAX, JavaScript, JSON, XML, Struts, Struts MVC, JDBC, JPA, Web Services, SOAP, SQL, JBOSS, DB2, ANT, Eclipse IDE, WebSphere.
