
Sr Data Engineer Resume


New York City, NY

SUMMARY

  • Overall 8+ years of professional IT experience across multiple technologies, including the Hadoop Big Data ecosystem and Java/J2EE related technologies
  • 4+ years of experience in Hadoop ecosystem components like MapReduce, Spark, Hive, Spark Streaming, Spark SQL, Spark MLlib, Kafka, Pig, HBase, Cassandra, Zookeeper, Sqoop, Flume, Oozie
  • Good experience with various Hadoop distributions (Cloudera, Hortonworks)
  • Good understanding of Spark architecture and its components, with hands-on experience in Spark Streaming, Spark SQL and Spark Core
  • Performed various Spark RDD transformations and actions on large datasets.
  • Developed jobs in Spark SQL and Spark Streaming using DataFrames/Datasets and DStream RDDs
  • Experience in designing and developing Spark applications in Scala; also implemented Scala scripts and UDFs for data aggregation, queries, and writing data into HDFS through Sqoop
  • Configured Spark Streaming to receive real-time data from messaging platforms such as Apache Kafka and persist it to HDFS (a minimal sketch follows this summary)
  • Experience in configuring and monitoring Kafka clusters and connectors
  • Performed read and write operations on multi-gigabyte streaming datasets using Apache Kafka
  • Worked with the Kafka Producer and Consumer APIs to publish data to and consume data from Kafka topics
  • Integrated and leveraged newer data streaming tools such as Apache Flink
  • Implemented transformations on bounded and unbounded data streams using Flink's DataStream API
  • Strong experience writing MapReduce and Spark jobs in Scala, Java and Python using the Apache Hadoop, Spark and PySpark APIs for analyzing data
  • Knowledge of Flume for extracting clickstream data from web servers
  • Experience in building data ingestion pipelines using the Kafka, Flume and NiFi frameworks
  • Implemented NiFi dataflows and monitored and controlled streaming and batch processing in HDP 2.6.4
  • Worked on various NoSQL databases such as Cassandra, MongoDB, HBase
  • Designed data models in Cassandra and worked with CQL
  • Experience in creating keyspaces, tables and secondary indexes in Cassandra
  • Good knowledge of CQL; performed CRUD operations on Cassandra tables
  • Worked with Spark for parallel computation over Cassandra data using RDDs
  • Imported data from sources like AWS S3, local file system into Spark RDD
  • Designed and implemented test environments on AWS
  • Good experience in designing row keys and schemas for NoSQL databases like MongoDB
  • Managed the MongoDB lifecycle, including sizing, automation, monitoring and tuning
  • Designed and developed functionality to fetch JSON documents from MongoDB and export them to clients using a REST API
  • Experience with HBase for loading and retrieving data for real-time processing via its RESTful API
  • Created Hive tables per requirements and defined appropriate static and dynamic partitions for query efficiency
  • Very good experience writing HiveQL queries using partitioning, bucketing and windowing operations
  • Experience in exporting and importing data from Hive/HDFS to RDBMS using Sqoop
  • Worked on loading CSV/TXT/AVRO/PARQUET file formats for Hive Querying and Processing
  • Exposure to Talend for designing ETL jobs for data processing
  • Performed job scheduling and workflow management using Oozie
  • Developed Impala scripts for ad-hoc queries and created Tableau dashboards for the results
  • Used AWS cloud services, monitored them with CloudWatch and performed operations with Lambda
  • Developed modules in Applications using Java, J2EE, Spring, Hibernate frameworks and Web Services like REST, SOAP
  • Experience in using different SDLC facets like Agile, Waterfall models along with enterprise tools like JIRA, Confluence, Jenkins to develop projects
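
A minimal PySpark sketch of this kind of Kafka-to-HDFS streaming pipeline, for illustration only: the broker address, topic name and paths are placeholders, and it uses Structured Streaming, whereas the work summarized above was done in Scala with DStream-based Spark Streaming.

    from pyspark.sql import SparkSession

    # Requires the spark-sql-kafka connector package on the classpath.
    spark = (SparkSession.builder
             .appName("kafka-to-hdfs")
             .getOrCreate())

    # Subscribe to a Kafka topic of clickstream events (placeholder names).
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "clickstream")
              .option("startingOffsets", "latest")
              .load()
              .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)"))

    # Persist the raw stream to HDFS as Parquet, checkpointing offsets.
    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/clickstream/raw")
             .option("checkpointLocation", "hdfs:///checkpoints/clickstream")
             .outputMode("append")
             .start())

    query.awaitTermination()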

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Apache Kafka, Apache Flink, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and ORC

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks (HDP 2.6), MapR and DSE

Languages: Java, Python, Scala, SQL, HTML, DHTML, JavaScript, XML, C/C++, CQL, HQL, Pig Latin and PySpark

NoSQL Databases: Cassandra, MongoDB and HBase

Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts

XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB

Development Methodology: Agile, Waterfall

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery and CSS, AngularJS, ExtJS and JSON

Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUNIT and log4J

Frameworks: Struts, Spring and Hibernate

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

RDBMS: Oracle 9i, 10g, 11i, MS SQL Server, MySQL and DB2

Operating systems: UNIX, LINUX, Mac OS and Windows Variants

Data analytical tools: R and MATLAB

ETL Tools: Informatica, Talend

PROFESSIONAL EXPERIENCE

Confidential, New York City, NY

Sr Data Engineer

Responsibilities:

  • Our team was responsible for the build and support of vital technology solutions supporting scheduling, acquisition, processing, and distribution of Confidential content.
  • Responsible for detailed technical design, development and implementation of applications using Spark Core and Spark Streaming RDDs
  • Created RDDs in Spark and loaded data from the data warehouse into them
  • Configured Spark Streaming to receive real-time data from Kafka and persist the stream to HDFS using Scala
  • Initially used the Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting and grouping
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and RDDs
  • Created, altered and deleted Kafka topics as required, and tuned performance through partitioning and bucketing of Hive tables
  • Used a Kafka producer application to publish clickstream events into a Kafka topic and later explored the data with Spark SQL
  • Worked with the Producer API and created a custom partitioner to publish data to Kafka (see the producer sketch after this list)
  • Imported streaming logs through Kafka and aggregated the data into HDFS and relational databases such as MySQL and Oracle
  • Integrated and leveraged newer data streaming tools such as Apache Flink
  • Implemented transformations on bounded and unbounded data streams using Flink's DataStream API
  • Implemented NiFi dataflows and monitored and controlled streaming and batch processing in HDP 2.6.4
  • Worked entirely in Agile methodology and used the Rally scrum tool to track user stories and team performance
  • Converted SQL code to Spark code using Java, Spark SQL and Spark Streaming for faster testing and processing of data; imported and indexed data from HDFS for secure searching, reporting and analysis
  • Wrote Python scripts to call a Cassandra REST API, performed transformations and loaded the data into Spark
  • Performed CRUD operations on Cassandra using CQL; also created keyspaces, tables and secondary indexes (see the CQL sketch after this list)
  • Designed and implemented test environments on AWS
  • Used AWS cloud services, monitored them with CloudWatch and performed operations with Lambda
  • Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets; used big data tooling to load large volumes of source files from S3 into Redshift
  • Hands-on expertise running Spark and Spark SQL on Amazon Elastic MapReduce (EMR), with good knowledge of cloud integration with EMR
  • Monitored and troubleshot Hadoop jobs using the YARN Resource Manager, and EMR job logs using Genie and Kibana
  • Created indexes for various statistical parameters in Elasticsearch and generated visualizations using Kibana
  • Used Spark and Spark SQL with the Scala API to read Parquet data and create tables in Hive
  • Worked on PySpark SQL jobs that fetch the non-null records from two different tables and load the results
  • Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
  • Involved in configuring core-site.xml and mapred-site.xml for the multi-node cluster environment
  • Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest claim data and financial histories into HDFS for analysis; used the Curator API on Elasticsearch for data backup and restore
  • Managed workflow and scheduling of complex MapReduce jobs using Apache Oozie
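
A minimal sketch of publishing clickstream events with a custom partitioner, as referenced in the list above. It uses the kafka-python client for illustration (the original work used the Kafka Producer API directly); the broker, topic and field names are placeholders.

    import json
    from kafka import KafkaProducer

    # Illustrative partitioner: route events for the same user to the same
    # partition. hash() is fine for a sketch, not for stable production use.
    def by_user(key_bytes, all_partitions, available_partitions):
        return all_partitions[hash(key_bytes) % len(all_partitions)]

    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092"],                  # placeholder broker
        key_serializer=lambda k: k.encode("utf-8"),
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        partitioner=by_user,
    )

    event = {"user_id": "u123", "page": "/home", "ts": "2018-01-01T00:00:00Z"}
    producer.send("clickstream", key=event["user_id"], value=event)
    producer.flush()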
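
A minimal sketch of the Cassandra keyspace, table and secondary-index work mentioned above, using the DataStax Python driver; the contact point, keyspace, table and column names are placeholders.

    from datetime import datetime
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])          # placeholder contact point
    session = cluster.connect()

    # Keyspace with simple replication (illustrative settings).
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS clickstream
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
    """)
    session.set_keyspace("clickstream")

    # Table keyed for per-user lookups, plus a secondary index on page.
    session.execute("""
        CREATE TABLE IF NOT EXISTS events (
            user_id    text,
            event_time timestamp,
            page       text,
            PRIMARY KEY (user_id, event_time)
        )
    """)
    session.execute("CREATE INDEX IF NOT EXISTS ON events (page)")

    # Basic CRUD through CQL.
    session.execute(
        "INSERT INTO events (user_id, event_time, page) VALUES (%s, %s, %s)",
        ("u123", datetime(2018, 1, 1), "/home"),
    )
    rows = session.execute("SELECT * FROM events WHERE user_id = %s", ("u123",))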

Environment: Spark Streaming, Hive, Spark, Spark SQL, Impala, Kafka, Pig, PySpark, HBase, Cassandra, Zookeeper, Sqoop, Flume, Oozie, Splunk, Elasticsearch, Python, Java, NiFi, Agile, HDP 2.6.4

Confidential, Louisville, KY

Data Engineer

Responsibilities:

  • Responsible for taking customer behavioral data, store- level consumer information, digital analytics, and a variety of other disparate data sources and generating descriptive and predictive models
  • Responsible for migrating the existing RDBMS system to Hadoop
  • Responsible for leading the development of programs to clean and organize data sets, using specific tools together with more general data cleaning and wrangling knowledge
  • Implemented a Hadoop cluster on Cloudera and assisted with its performance tuning, monitoring and troubleshooting
  • Developed a data pipeline using Sqoop and Java MR to ingest customer behavioral data and financial histories into HDFS for analysis
  • Used Hive partitioning and bucketing for performance optimization of Hive tables, creating around 20,000 partitions; imported and exported data between HDFS and Hive using Sqoop (a PySpark/HiveQL sketch follows this list)
  • Used Hive to analyze the partitioned and bucketed data and computed various metrics for reporting
  • Good experience developing Hive DDL to create, alter and drop Hive tables
  • Developed Hive UDFs for required functionality not available out of the box in Apache Hive
  • Involved in performance tuning to optimize jobs in Hive, Pig and HBase
  • Expert in importing and exporting terabytes of data into HDFS and Hive using Sqoop from traditional relational database systems
  • Worked on a file-optimization framework to convert CSV, JSON, XML and Avro files on S3 into Parquet, created external partitioned Hive tables on top of the Parquet S3 files, and automated the process (see the conversion sketch after this list)
  • Experience with indexing, replication, aggregation and ad-hoc queries in MongoDB
  • Imported and exported data into and out of MongoDB
  • Experienced in managing MongoDB while balancing availability, performance and scalability trade-offs
  • Involved in collecting and aggregating large amounts of data into HDFS using Flume and defined channel selectors to multiplex data into different sinks
  • Wrote Sqoop scripts to export and import data into HDFS and Hive
  • Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS
  • Imported data from various sources such as HDFS and HBase into Spark RDDs
  • Load balancing of ETL processes, database performance tuning and capacity monitoring using Talend
  • Involved in pivoting HDFS data from rows to columns and columns to rows
  • Created Tableau reports on Hive data
  • Used Sqoop to import customer information data from SQL server database into HDFS for data processing
  • Loaded and transformed large sets of structured, semi structured data using Pig Scripts
  • Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive and MapReduce) and move the data files within and outside of HDFS
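
A minimal PySpark sketch of the partitioned Hive table pattern referenced in the list above; the database, table and column names are placeholders, and bucketing (done in Hive with CLUSTERED BY ... INTO n BUCKETS) is omitted for brevity.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioning")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS sales")

    # Partition the table by transaction date so date filters prune partitions.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales.transactions (
            customer_id STRING,
            amount      DOUBLE
        )
        PARTITIONED BY (txn_date STRING)
        STORED AS ORC
    """)

    # Dynamic-partition load from a hypothetical staging table; the partition
    # column must come last in the SELECT list.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE sales.transactions PARTITION (txn_date)
        SELECT customer_id, amount, txn_date
        FROM sales.staging_transactions
    """)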
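
A minimal PySpark sketch of the file-format conversion described above: read CSV from S3, rewrite it as partitioned Parquet, and expose it through an external Hive table. Bucket names, paths and the schema are placeholders, and the cluster is assumed to have S3 (s3a) connectivity configured.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("s3-to-parquet")
             .enableHiveSupport()
             .getOrCreate())

    # Read raw CSV from S3 (schema inferred here for brevity; assumes the
    # files carry claim_id, amount and load_date columns).
    raw = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("s3a://raw-bucket/claims/"))

    # Rewrite as Parquet, partitioned by load date.
    (raw.write
        .mode("overwrite")
        .partitionBy("load_date")
        .parquet("s3a://curated-bucket/claims_parquet/"))

    # External Hive table over the Parquet files; recover the partitions.
    spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS analytics.claims (
            claim_id STRING,
            amount   DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
        LOCATION 's3a://curated-bucket/claims_parquet/'
    """)
    spark.sql("MSCK REPAIR TABLE analytics.claims")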

Environment: Hive, SQL, Sqoop, Flume, Mongo DB, Talend, Tableau, Scala, Python, Java, HBase, Pig, Java MR

Confidential, Livermore, CA

Hadoop Developer

Responsibilities:

  • Was involved in the requirement analysis, design, coding and implementation of Hadoop Cluster
  • Was responsible for business logic using Java, JavaScript and JDBC for querying the database
  • Gathered business requirements from the business partners and subject matter experts
  • Responsible for designing and development of Hive Data Model
  • Imported Bulk Data into HBase using HiveQL and MapReduce programs.
  • Developed Hive and Impala scripts on Avro and Parquet file formats.
  • Deployed Hive and HBase integration to perform OLAP operations on HBase data.
  • Processed flat files using Pig, loaded them into Hive and further converted them into fixed-width files
  • Responsible to manage data coming from different sources
  • Implemented Apache Solr for fast retrieval
  • Wrote Hive UDFs in Java and imported data into Hive
  • Analyzed large datasets in Hive Data Model by using Hive queries and produced results
  • Designed and developed Talend jobs to extract data from Oracle into MongoDB
  • Extensively used SQL, PL/SQL, triggers and views in IBM DB2
  • Hands on experience in importing the data from RDBMS to HDFS
  • Created partitioning, bucketing and Map side joins for performance optimization
  • Created Pig scripts and compared their development effort with equivalent Java implementations
  • Was responsible for technical documentation of Hadoop Clusters and how to execute Hive queries
  • Also performed real-time analytics on HBase using the Java API and the REST API (see the REST sketch after this list)
  • Wrote a Java program to retrieve data from HDFS and expose it through REST services
  • Installed the Oozie workflow engine to run multiple Hive jobs
  • Used the Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive and Pig jobs that extract the data in a timely manner
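
A minimal sketch of a real-time read against the HBase REST gateway from Python, illustrating the REST-API access mentioned above; the host, table, row key and column are placeholders, and cell values come back base64-encoded.

    import base64
    import requests

    HBASE_REST = "http://hbase-rest-host:8080"     # placeholder REST gateway

    def get_cell(table, row_key, column):
        """Fetch the latest value of one cell for a row via the HBase REST API."""
        url = "{}/{}/{}/{}".format(HBASE_REST, table, row_key, column)
        resp = requests.get(url, headers={"Accept": "application/json"})
        resp.raise_for_status()
        cell = resp.json()["Row"][0]["Cell"][0]
        return base64.b64decode(cell["$"]).decode("utf-8")

    # Example: read cf:price for a product row (illustrative names).
    print(get_cell("products", "sku-12345", "cf:price"))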

Environment: Hive, Java, Pig, MapReduce, HBase, HDFS, Oozie, REST API, Oracle, MongoDB, SQL, PL/SQL, Apache Solr

Confidential, Columbia, MO

Hadoop Developer

Responsibilities:

  • Was responsible for building MapReduce programs in Java for data cleaning and preprocessing (an illustrative Python streaming-mapper sketch follows this list)
  • Was responsible for continuous monitoring and managing the Hadoop Cluster using Cloudera Manager
  • Responsible to manage data coming from different sources
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive
  • Involved in loading data from the Unix file system to HDFS
  • Written shell scripts to monitor the health checks of Hadoop Daemon Services and respond accordingly
  • Imported and exported data into HDFS and Hive using Sqoop
  • Created HBase tables to store variable data formats coming from different portfolios
  • Performed real-time analytics on HBase using the Java API and REST API
  • Was involved in the review of functional and non- functional requirements
  • Was involved in creating entity-relationship (ER) diagrams for the relational database
  • Monitoring the Hadoop clusters using Cloudera Manager
  • Managing and scheduling jobs on Hadoop Cluster
  • Extracted feeds from social media platforms such as Facebook and Twitter using Python scripts
  • Strong experience with J2SE, XML, web services, WSDL, SOAP and TCP/IP
  • Also have hands-on experience with JSP, Servlets, JDBC, Struts, Maven, JUnit and SQL
  • Worked with various database systems such as Oracle 8i, Oracle 9i and DB2
  • Have experience with WebLogic Application Server, WebSphere Application Server and J2EE application deployment technology
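
The data-cleaning MapReduce programs here were written in Java; as an illustrative Python stand-in, a Hadoop Streaming mapper that drops malformed records and normalizes fields could look like the sketch below (the record layout is a placeholder). It would be submitted with the hadoop-streaming jar via its -input, -output and -mapper options.

    #!/usr/bin/env python
    # Hadoop Streaming mapper: reads raw CSV lines on stdin and emits cleaned
    # tab-separated records on stdout. Malformed rows are skipped.
    import sys

    EXPECTED_FIELDS = 5          # placeholder record width

    for line in sys.stdin:
        fields = line.rstrip("\n").split(",")
        if len(fields) != EXPECTED_FIELDS:
            continue                          # drop malformed records
        cleaned = [f.strip().lower() for f in fields]
        if not cleaned[0]:                    # require a non-empty key field
            continue
        print("\t".join(cleaned))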

Environment: MapReduce, Java, Python, Hive, Pig, Apache Solr, XML, Web Services, SOAP, TCP/IP, Oracle 8i, HBase, Oracle 9i, WebLogic Application Server

Confidential

Java developer

Responsibilities:

  • Was responsible for the design and development of an MVC 2 (Model-View-Controller) architecture using the Front Controller design pattern
  • Used Core Java concepts to design the applications
  • Used JDBC to connect to databases such as Oracle and SQL Server 2005
  • Wrote servlets to generate dynamic HTML pages
  • Wrote SQL queries to retrieve and insert data into multiple database schemas
  • Developed XML schemas and web services for data maintenance and structures; wrote JUnit test cases for unit testing of classes
  • Used DOM and DOM Functions
  • Debugged the application using Firebug to traverse the documents
  • Provided technical support for production environments by resolving issues, analyzing defects, and providing and implementing solutions
  • Created database program in SQL server to manipulate data accumulated by internet transactions.
  • Involved in writing SQL Queries, Stored Procedures and used JDBC for database connectivity with MySQL Server

Environment: Java 7, Oracle, SQL, XML, JUnit, DOM, SQL Server 2005, Eclipse

Confidential

Java Developer

Responsibilities:

  • Used exception handling and multi-threading for optimum performance of the application
  • Responsible for developing and maintaining the necessary Java Components, Enterprise Java Beans, Servlets
  • Developed applications using Java object-oriented concepts such as inheritance, polymorphism, multi-threading and the Collections classes
  • Implemented JavaScript for client-side validations
  • Involved in the analysis, design, development and testing phases of the SDLC using an agile development methodology
  • Developed the application under the J2EE architecture and designed dynamic, browser-compatible user interfaces using JSP, Custom Tags, HTML, CSS and JavaScript
  • Was responsible for defects allocation and ensuring the defects are resolved
  • Used JDBC for database connectivity in web applications
  • Developed complex SQL queries, PL/ SQL stored procedures and functions
  • Used Eclipse IDE for the development and debugging

Environment: Java 7, JavaScript, HTML, CSS, Servlets, J2EE, SQL, Eclipse, Agile Methodologies
