Data Engineer Resume
New York, NY
SUMMARY
- 7+ years of IT experience across the Health, Banking, Insurance, and E-commerce domains in the design, development, maintenance, and support of Big Data and Java/J2EE applications, including 6+ years of analysis, design, development, and implementation as a Data Engineer.
- Strong exposure to the Spark, Spark Streaming, and Spark MLlib frameworks, developing production-ready Spark applications using both the Scala and Python APIs.
- Hands-on experience with Spark: importing data from sources such as storage layers, Kafka, and databases, performing transformations, and saving the results to different destinations (a minimal PySpark sketch appears at the end of this summary).
- Worked extensively on fine-tuning Spark applications to improve performance and on troubleshooting Spark application failures.
- Experience with Spark Streaming and Kafka for building reliable streaming pipelines, including troubleshooting and fine-tuning streaming applications to handle and recover from failures.
- Extensively worked on Spark with Scala on-cluster for analytics, installed on top of Hadoop, and built advanced analytical applications using Spark with Hive and SQL/Oracle.
- Good understanding of Hadoop architecture and the various components of the Big Data ecosystem.
- Experienced with Hadoop distributions both on-premises (CDH, HDP) and in the cloud (AWS).
- Good experience with data analytics and big data services in the AWS Cloud such as EMR, Redshift, S3, Athena, and Glue.
- Used Hive extensively to perform the data analytics required by business teams.
- Solid experience working with data formats such as Parquet, ORC, Avro, and JSON.
- Expertise in writing dynamic SQL, complex stored procedures, functions, and views.
- Excellent understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology, with good knowledge of J2EE and Core Java design patterns.
- Experience managing Hadoop clusters using the Cloudera Manager tool.
- Very good experience with the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
- Stored data in the Apache Ignite storage layer and ran standalone web services that exposed the output on AWS.
- Good working knowledge of Snowflake and Teradata databases.
- Hands-on experience with Sequence files, RC files, Avro, Parquet, and JSON, and with Combiners, Counters, dynamic partitions, and bucketing for best practices and performance improvement.
- Skilled in developing Java MapReduce programs using the Java API and in using Hive and Pig for data analysis, cleaning, and transformation.
- Experience analyzing SQL scripts and designing solutions to implement them using PySpark.
- Worked with join patterns and implemented map-side and reduce-side joins using MapReduce.
- Developed enterprise applications using Scala.
- Developed multiple MapReduce jobs to perform data cleaning and preprocessing.
- Designed Hive queries and Pig scripts to perform data analysis and data transfer, and designed tables to load data into the Hadoop environment.
- Debugged and improved the performance of Hive SQL queries by adding partition columns.
- Converted Hive SQL to Spark SQL as part of the migration of pipelines.
- Expertise in writing Hive UDFs and generic UDFs to incorporate complex business logic into Hive queries.
- Extensive experience importing and exporting data using stream processing platforms such as Flume and Kafka.
- Expertise with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Exposure to build tools such as Maven and SBT.
- Experience as a Java developer in client/server technologies using J2EE Servlets, JSP, JDBC, and SQL.
- Expertise in designing and developing enterprise applications for the J2EE platform using MVC, JSP, Servlets, JDBC, Web Services, and Hibernate, and in designing web applications using HTML5, CSS3, AngularJS, and Bootstrap.
- Excellent interpersonal and communication skills, creative, research-minded, technically competent, and result-oriented with problem solving and leadership skills.
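A minimal PySpark sketch of the kind of batch pipeline described in this summary: read raw data from a source, apply cleansing transformations, and write the results as partitioned Parquet. The paths, column names, and application name below are illustrative assumptions, not details from a specific project.

# Minimal PySpark batch pipeline sketch (paths and columns are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("example-batch-pipeline")
    .getOrCreate()
)

# Read raw JSON events from a storage layer (assumed S3 path).
raw = spark.read.json("s3a://example-bucket/raw/events/")

# Basic cleansing: drop malformed rows, de-duplicate, derive a date column.
cleaned = (
    raw.dropna(subset=["event_id", "event_ts"])
       .dropDuplicates(["event_id"])
       .withColumn("event_date", F.to_date(F.col("event_ts")))
)

# Save the results as date-partitioned Parquet for downstream querying.
(
    cleaned.write
           .mode("overwrite")
           .partitionBy("event_date")
           .parquet("s3a://example-bucket/curated/events/")
)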
TECHNICAL SKILLS
Big Data Ecosystem: Spark, Hive, MapReduce, YARN, HDFS, Impala, Sqoop, Kafka and Oozie
Programming Languages: Java, Scala, and Python
Frameworks: Spring, Hibernate, JMS.
IDE: Eclipse, IntelliJ, PyCharm.
Databases: IBM DB2, Oracle, SQL Server, MySQL, HBase, Cassandra.
Tools: Tableau, Zoomdata, Talend.
Cloud Services: AWS S3, EMR, Athena, Redshift, Glue Metastore, Lambda, Azure Databricks.
Methodologies: Agile, Waterfall.
PROFESSIONAL EXPERIENCE:
Data Engineer
Confidential, New York, NY
Responsibilities:
- Responsible for ingesting large volumes of user behavioral data and customer profile data to Analytics Data store.
- Developed custom multi-threaded Java based ingestion jobs as well as Sqoop jobs for ingesting from FTP servers and data warehouses.
- Used Python with PySpark to build data pipelines and wrote Python scripts to automate them.
- Developed many Spark applications for data cleansing, event enrichment, data aggregation, de-normalization, and data preparation needed for machine learning exercises.
- Developed Spark applications using PySpark to enrich user behavioral (clickstream) data merged with user profile data (see the illustrative sketch at the end of this role).
- Created S3 buckets, managed bucket policies, and used S3 and Glacier for storage and backup on AWS.
- Designed and implemented a test environment on AWS.
- Involved in designing and developing enhancements of CSG using AWS APIs.
- Acted as technical liaison between the customer and the team on all AWS technical aspects.
- Created pipelines to move data from on-premises servers to Azure Data Lake.
- Utilized Azure HDInsight to monitor and manage one of our Hadoop clusters.
- Experience with Azure Databricks in processing raw data from source systems and writing to destination delta lakes.
- Worked on troubleshooting Spark applications to make them more fault tolerant.
- Utilized the PySpark API to implement batch processing jobs.
- Worked on fine-tuning Spark applications to improve the overall processing time of the pipelines.
- Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase.
- Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, effective and efficient joins, transformations, and other features.
- Worked extensively with Sqoop for importing data from Oracle.
- Experience working with EMR clusters in the AWS cloud and with S3.
- Involved in creating Hive tables and in loading and analyzing data using Hive scripts.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Good experience with continuous integration of applications using Jenkins.
- Used reporting tools such as Tableau connected to Athena for generating daily data reports.
- Collaborated with the infrastructure, network, database, application, and BA teams to ensure data quality and availability.
Environment: AWS Cloud, Spark, Spark Streaming, Spark SQL, Python, PySpark, Scala, Kafka, Hive, Sqoop, HBase, Azure HDInsight, Tableau, AWS Simple workflow, Oracle, Linux.
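A rough illustration of the clickstream enrichment work in this role: join a large behavioral dataset against a smaller user-profile table with a broadcast join, then aggregate for analytics. Paths, column names, and the join key are hypothetical.

# Illustrative PySpark enrichment job: clickstream events joined to user profiles.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clickstream-enrichment").getOrCreate()

# Assumed inputs: large clickstream data and a comparatively small profile table.
clicks = spark.read.parquet("s3a://example-bucket/clickstream/")
profiles = spark.read.parquet("s3a://example-bucket/user_profiles/")

# Broadcast the smaller side to avoid shuffling the large dataset.
enriched = (
    clicks.join(F.broadcast(profiles), on="user_id", how="left")
          .withColumn("session_date", F.to_date(F.col("click_ts")))
)

# Simple daily aggregation for downstream analytics (illustrative metrics).
daily_activity = (
    enriched.groupBy("session_date", "customer_segment")
            .agg(F.countDistinct("user_id").alias("active_users"),
                 F.count("*").alias("events"))
)

daily_activity.write.mode("overwrite").parquet("s3a://example-bucket/analytics/daily_activity/")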
Sr. Hadoop/Spark Developer
Confidential, Phoenix, AZ
Responsibilities:
- Involved in requirement analysis, design, coding and implementation phases of the project.
- Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
- Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, Data frames and Spark SQL APIs.
- Wrote new Spark jobs in Scala to analyze customer data and sales history.
- Used Kafka to get data from many streaming sources into HDFS.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Created a data lake with Snowflake and built several data marts with presentable, modeled data.
- Good experience with Hive partitioning, bucketing, and collections, and with performing different types of joins on Hive tables.
- Created Hive external tables to perform ETL on data that is generated on a daily basis.
- Wrote HBase bulk-load jobs to load processed data into HBase tables by converting it to HFiles.
- Performed validation on the data ingested to filter and cleanse the data in Hive.
- Created Sqoop jobs to handle incremental loads from RDBMS into HDFS and applied Spark transformations.
- Loaded data into Hive tables from Spark using the Parquet columnar format (see the sketch at the end of this role).
- Developed Oozie workflows to automate and productionize the data pipelines.
- Developed Sqoop import Scripts for importing reference data from Netezza.
Environment: Hadoop, HDFS, Hive, Sqoop, Kafka, Spark, Shell Scripting, Snowflake, HBase, Scala, Python, Kerberos, Maven, Ambari, Hortonworks, MySQL.
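A minimal sketch of loading Spark output into a partitioned, Parquet-backed Hive table, in the spirit of the Hive and Parquet work above. The database, table, column names, and paths are assumptions, and a configured Hive metastore is assumed to be available.

# Sketch: write a Spark DataFrame into a partitioned Hive table stored as Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("hive-parquet-load")
    .enableHiveSupport()          # requires a reachable Hive metastore
    .getOrCreate()
)

# Allow Hive dynamic partitioning so partitions are derived from the data.
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

sales = spark.read.parquet("hdfs:///data/staging/sales/")   # hypothetical staging path

# Create the target table once if it does not exist (illustrative DDL).
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.sales_daily (
        order_id STRING,
        customer_id STRING,
        amount DOUBLE
    )
    PARTITIONED BY (sale_date STRING)
    STORED AS PARQUET
""")

# insertInto matches columns by position, so select them with the partition column last.
(
    sales.withColumn("sale_date", F.date_format(F.col("order_ts"), "yyyy-MM-dd"))
         .select("order_id", "customer_id", "amount", "sale_date")
         .write.mode("append")
         .insertInto("analytics.sales_daily")
)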
Hadoop Developer
Confidential, Sterling, VA
Responsibilities:
- Developed custom input adaptors for ingesting clickstream data from external sources such as FTP servers into S3-backed data lakes on a daily basis.
- Created various Spark applications using PySpark and Scala to perform a series of enrichments of this clickstream data combined with enterprise user data.
- Implemented batch processing of jobs using Spark Scala API.
- Developed Sqoop scripts to import/export data from Teradata to HDFS and into Hive tables.
- Optimized Hive tables using techniques such as partitioning and bucketing to provide better performance for HiveQL queries.
- Worked with multiple file formats such as Avro, Parquet, and ORC.
- Converted existing MapReduce programs to Spark Applications for handling semi structured data like JSON files, Apache Log files, and other custom log data.
- Wrote Kafka producers to stream data from external REST APIs to Kafka topics (see the producer sketch at the end of this role).
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase.
- Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, effective and efficient joins, transformations, and other features.
- Worked extensively with Sqoop for importing data from Teradata.
- Implemented business logic in Hive and wrote UDFs to process the data for analysis.
- Utilized AWS services such as S3, EMR, Redshift, Athena, and Glue Metastore for building and managing data pipelines in the cloud.
- Automated EMR Cluster creation and termination using AWS Java SDK.
- Loaded the processed data into Redshift clusters using the Spark-Redshift integration.
- Created views within Athena to allow the downstream reporting and data analysis teams to query and analyze the results.
Environment: Spark, Hive, HBase, Scala, Python, Shell Scripting, Amazon EMR, S3
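The producer pattern referenced in this role (streaming records from an external REST API into a Kafka topic) could look roughly like the sketch below, here using the kafka-python client. The endpoint URL, topic name, broker address, and polling interval are illustrative assumptions.

# Illustrative Kafka producer: poll a REST API and publish records to a topic.
import json
import time

import requests
from kafka import KafkaProducer   # kafka-python client (assumed dependency)

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],                     # placeholder brokers
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

API_URL = "https://api.example.com/events"   # hypothetical endpoint
TOPIC = "clickstream-events"                 # hypothetical topic

while True:
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()

    for record in response.json():   # assumes the API returns a JSON array
        producer.send(TOPIC, value=record)

    producer.flush()   # make sure the batch is delivered before sleeping
    time.sleep(60)     # poll once a minute (arbitrary choice)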
Hadoop Developer
Confidential, Pittsburg, PA
Responsibilities:
- Developed Spark applications using Scala utilizing Data frames and Spark SQL API for faster processing of data.
- Developed highly optimized Spark applications to perform data cleansing, validation, transformation, and summarization activities according to the requirements.
- Built a data pipeline consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge volumes of data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Built real-time data pipelines by developing Kafka producers and Spark Streaming applications to consume the data (see the consumer sketch at the end of this role).
- Ingested syslog messages, parsed them, and streamed the data to Kafka.
- Handled importing data from different sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the transformed data into HDFS.
- Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Analyzed the data by performing Hive queries (Hive QL) to study customer behavior.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed Hive scripts in Hive QL to de-normalize and aggregate the data.
- Scheduled and executed workflows in Oozie to run various jobs.
Environment: Hadoop, HDFS, HBase, Spark, Scala, Hive, MapReduce, Sqoop, ETL, Java
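A sketch of the streaming consumer side described in this role: read a Kafka topic with Spark Structured Streaming, parse the JSON payloads, and persist the parsed stream. The original pipeline wrote to HBase, which would additionally require an HBase connector (for example via foreachBatch), so this shows only the general shape; the topic, schema, broker address, and paths are assumptions, and the spark-sql-kafka package is assumed to be on the classpath.

# Sketch: consume a Kafka topic with Structured Streaming and persist parsed events.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-consumer").getOrCreate()

# Hypothetical schema for the JSON messages on the topic.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("event_ts", TimestampType()),
])

stream = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")   # placeholder brokers
         .option("subscribe", "clickstream-events")             # hypothetical topic
         .option("startingOffsets", "latest")
         .load()
)

# Kafka delivers the payload as bytes in the `value` column; parse it as JSON.
events = (
    stream.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
          .select("e.*")
)

# Persist the parsed stream (an HBase sink would go through a connector instead).
query = (
    events.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/streams/clickstream/")
          .option("checkpointLocation", "hdfs:///checkpoints/clickstream/")
          .outputMode("append")
          .start()
)

query.awaitTermination()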
Java Developer
Confidential
Responsibilities:
- Involved in client requirement gathering, analysis & application design.
- Involved in the implementation of the design through the vital phases of the Software Development Life Cycle (SDLC), including development, testing, implementation, and maintenance support, following the Waterfall methodology.
- Developed the UI layer with JSP, HTML, CSS, Ajax, and JavaScript.
- Used Asynchronous JavaScript and XML (AJAX) for better and faster interactive Front-End.
- Used JavaScript to perform client-side validations.
- Involved in Database Connectivity through JDBC.
- Ajax was used to make Asynchronous calls to server side and get JSON or XML data.
- Developed server-side presentation layer using Struts MVC Framework.
- Developed Action classes, Action Forms and Struts Configuration file to handle required UI actions and JSPs for Views.
- Developed batch jobs using EJB scheduling and leveraged container-managed transactions for highly transactional processing.
- Used various Core Java concepts such as Multi-Threading, Exception Handling, Collection APIs, Garbage collections for dynamic memory allocation to implement various features and enhancements.
- Developed Hibernate entities, mappings, and customized criterion queries for interacting with database.
- Implemented and developed REST and SOAP based web services to provide JSON and XML data.
- Involved in implementation of web services (top-down and bottom-up).
- Used JPA and JDBC in the persistence layer to persist the data to the DB2 database.
- Created and wrote SQL queries, tables, triggers, views, and PL/SQL procedures to persist and retrieve data from the database.
- Developed a Web service to communicate with the database using SOAP.
- Performed performance tuning and optimization with a Java performance analysis tool.
- Implemented JUnit test cases for Struts/Spring components and used JUnit for unit testing.
- Used Eclipse as IDE and worked on installing and configuring JBOSS.
- Used CVS for check-out and check-in operations.
- Deployed the components to WebSphere Application Server.
- Worked with production support team in debugging and fixing various production issues.
Environment: Java, JSP, HTML, CSS, AJAX, JavaScript, JSON, XML, Struts, Struts MVC, JDBC, JPA, Web Services, SOAP, SQL, JBOSS, DB2, ANT, Eclipse IDE, WebSphere.