Big Data Engineer Resume
White Plains, NY
SUMMARY:
- Data enthusiast with broad experience in IT and data-technology solutions and extensive knowledge of the SDLC and data modeling.
- Extensive experience with the big data ecosystem, including Hadoop, HDFS, YARN, MapReduce, Mesos, NiFi, StreamSets, Kudu, Spark, Hive, Impala, Pig, HBase, Sqoop, Flume, Kafka, Oozie and ZooKeeper.
- In-depth understanding of Hadoop and Spark architecture.
- Hands-on experience using Hive partitioning and bucketing and executing different types of joins on Hive tables (see the sketch after this summary).
- Hands-on experience in HiveQL with a good understanding of joins, grouping and aggregations, and query optimization.
- Worked with efficient storage formats such as Avro, Parquet and ORC integrated with the Hadoop ecosystem (Hive, Impala and Spark); also used Snappy and GZip compression.
- Experience importing and exporting structured and unstructured data to HDFS and Hive tables using Sqoop and Flume.
- Experience with column-oriented NoSQL databases such as HBase and Cassandra and their integration with Hadoop clusters.
- Strong understanding of Spark Core, Spark SQL, PySpark, Spark Streaming and machine learning (SVM, linear and logistic regression, KNN, decision trees, random forests, gradient boosting, naïve Bayes and cross-validation).
- Experience performing exploratory data analysis (EDA), dimensionality reduction (PCA), missing-value treatment and outlier treatment.
- Experience collecting, aggregating and moving large amounts of streaming data using Flume, Kafka and Spark Streaming.
- Good knowledge of AWS services such as EC2, S3, EMR, Redshift, DynamoDB, Aurora and Athena.
- Strong experience writing custom UDFs in Scala, Python and Java to extend Hive and Pig functionality.
- Strong database experience with SQL Server 2008 R2/2017, including T-SQL programming skills for creating stored procedures, functions, triggers and views.
- Experience in data visualization and reporting using Dask, Matplotlib, Seaborn and Tableau.
- Skilled at debugging application code and resolving production issues.
- Enthusiastic team player dedicated to streamlining processes and efficiently resolving project issues.
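The following is a minimal PySpark/HiveQL sketch of the Hive partitioning, bucketing and join pattern referenced above; the database, table and column names are hypothetical placeholders.

```python
# hive_partition_bucket.py - minimal sketch of Hive partitioning/bucketing and a join,
# driven from PySpark; database, table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-demo").enableHiveSupport().getOrCreate()

# Partition by load date and bucket by customer_id to help partition pruning and joins
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.transactions (
        txn_id BIGINT,
        customer_id BIGINT,
        amount DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# A join that benefits from partition pruning (load_date) and bucketing (customer_id)
daily_summary = spark.sql("""
    SELECT c.segment, SUM(t.amount) AS total_amount
    FROM sales.transactions t
    JOIN sales.customers c ON t.customer_id = c.customer_id
    WHERE t.load_date = '2019-06-01'
    GROUP BY c.segment
""")
daily_summary.show()
```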
TECHNICAL SKILLS:
Hadoop Ecosystem: Hadoop 2.1+, Spark 1.3+/2.1+, MapReduce, Pig 0.11+, Flume 1.3+, HBase 0.98+, Oozie, Sqoop 1.4+, HDFS, Kafka 0.8.1+, Zookeeper 3.4+, Airflow, Hive 0.10+/2.2+, Cloudera 4.X/5.X, Hortonworks
Web Technologies: Oracle WebLogic 11g/12c, OHS 11g, JSF 2.1, Flask 1.0+, HTML 5, CSS 3.3+, REST, JSON, XML, Tomcat 8.0+/9.0+, JBOSS 6.X, Splunk 6.5.X
Languages: Java 7/8, Scala 2.0+, Python 2.7+/3.3+, SQL, Pig Latin, Cypher, Julia, Shell Scripting, HQL, T-SQL, CQL
Cloud Technologies: AWS (EC2, S3, EMR, Redshift, DynamoDB, VPC, Aurora, Athena, SQS, SNS), Cloudcraft, Databricks Community Cloud
Machine Learning: Regressions, KNN, Random Forests, SVM, Decision Trees, Ensemble and Stacking methods, MLlib
Data Analysis and Visualization: Kibana 5.X, Tableau 10.2, Matplotlib, xlrd, Pandas, NumPy
Databases: MySQL 5.0+, MS SQL Server 2017/2008 R2, PostgreSQL 9.6, Cassandra 2.0+, Oracle 11g/12c, Neo4j 3.5.6, MongoDB 3.6, Elasticsearch 2.X
Others: Git, GitHub, GitLab, JIRA, Jenkins, Maven 3, Hibernate 2, SSIS 2008, Spring 3, MVC, Bonsai Express, Docker, Vagrant
PROFESSIONAL EXPERIENCE:
Confidential
Big Data Engineer
Responsibilities:
- Used NiFi to load flat files into Hive tables.
- Used YARN as the resource manager and HDFS as distributed storage in the cluster.
- Ran HiveQL scripts to derive insights.
- Checked the raw tables in the database for the correct attribute file.
- Developed Python scripts to check the attribute file for approved product codes and send email notifications for any discrepancies (see the sketch after this role).
- Loaded the files into HBase tables for downstream applications.
- Gained experience with various NoSQL databases and comprehensive knowledge of process improvement, normalization/denormalization, data extraction, data cleansing and data manipulation.
Environment: SQL Server 2017/2008 R2, Python 3.6, NiFi 1.9.0, Hadoop 2.7
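Below is a minimal sketch of the attribute-file check and email notification described in this role; the file paths, column name and SMTP host are hypothetical placeholders.

```python
# attribute_check.py - sketch of the product-code validation and email alert;
# paths, the "product_code" column and the SMTP host are placeholders.
import csv
import smtplib
from email.message import EmailMessage

APPROVED_CODES_FILE = "/data/reference/approved_product_codes.txt"  # placeholder path
ATTRIBUTE_FILE = "/data/incoming/attribute_file.csv"                # placeholder path

def load_approved_codes(path):
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def find_discrepancies(attribute_path, approved):
    bad_rows = []
    with open(attribute_path, newline="") as f:
        for row in csv.DictReader(f):
            if row.get("product_code") not in approved:  # assumed column name
                bad_rows.append(row)
    return bad_rows

def notify(discrepancies):
    msg = EmailMessage()
    msg["Subject"] = f"Attribute file check: {len(discrepancies)} unapproved product codes"
    msg["From"] = "etl-alerts@example.com"
    msg["To"] = "data-team@example.com"
    msg.set_content("\n".join(str(r) for r in discrepancies[:50]))
    with smtplib.SMTP("smtp.example.com") as server:     # placeholder SMTP host
        server.send_message(msg)

if __name__ == "__main__":
    approved = load_approved_codes(APPROVED_CODES_FILE)
    discrepancies = find_discrepancies(ATTRIBUTE_FILE, approved)
    if discrepancies:
        notify(discrepancies)
```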
Confidential, White Plains, NY
DataOps Engineer
Responsibilities:
- Optimized Spark applications to perform data cleansing and data validation.
- Built a data pipeline using Spark, Hive, COBOL copybooks and Sqoop to transform and analyze data.
- Created Sqoop scripts to import/export data from RDBMS to the S3 data store.
- Created Spark applications extensively using Spark DataFrames and the Spark SQL API.
- Collaborated with platform engineers to develop a Python-based Kafka producer API that captures live streaming data into various Kafka topics (see the producer sketch after this role).
- Developed a Spark Streaming application to consume data from Kafka topics and insert the processed streams into HBase.
- Applied broadcast variables in Spark and efficient joins in Hive for data processing.
- Used Spark SQL to perform enrichment and prepare different levels of behavioral summaries.
- Implemented partitioning and bucketing in Hive to enhance query efficiency and join performance.
- Experience in the Amazon cloud environment using EMR clusters, S3 and Redshift.
Environment: Spark Streaming/Scala 2.11.8, Spark 2.2, Hive 2.3.2, Kafka 2.0.0, Sqoop 1.4.X, Hortonworks Distribution, Hadoop 2.7, EMR, COBOL copybooks, Redshift
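A minimal sketch of a Python Kafka producer like the one described in this role, using the kafka-python client; the broker address, topic name and record layout are hypothetical placeholders.

```python
# kafka_producer.py - sketch of a Python Kafka producer (kafka-python);
# broker address, topic name and event fields are placeholders.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker1:9092",                        # placeholder broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",                                              # wait for full acknowledgment
)

def publish(event: dict, topic: str = "live-events") -> None:
    """Send one event to the given Kafka topic."""
    producer.send(topic, value=event)

if __name__ == "__main__":
    publish({"user_id": 42, "action": "click"})
    producer.flush()  # block until queued records are delivered
```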
Confidential
Big Data Engineering
Responsibilities:
- Responsible for exploratory data analysis (EDA) and dimensionality reduction (PCA).
- Performed variable identification, missing-value treatment, outlier treatment, variable transformation, and univariate and bivariate analysis.
- Loaded data in various formats (flat files, JSON, Avro, Parquet) into the Spark cluster (see the sketch after this role).
- Applied Spark transformations and actions using Scala.
- Cleaned data and stored it in Hive tables for analysis.
- Connected Hive tables to Tableau and performed data visualization for reporting.
- Plotted trend and pattern analyses and compared companies' market capitalization from historical data.
- Created a 14-node Spark cluster with 11 executors and 1 driver.
- Created UDFs and made the functions available to each executor.
- Used GitHub for version control, JIRA for issue tracking.
Environment: CDH 5.X, Hadoop 2.6.X, Python 3.6, Scala 2.11.8, Spark 2.1, Hive 2.2.0, Tableau 10.2
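A sketch of the load/clean/store flow described in this role, shown in PySpark for illustration (the role itself used Scala); the paths, column names and Hive table name are hypothetical placeholders.

```python
# spark_load_clean.py - illustrative PySpark version of the load/clean/store flow;
# paths, columns and the Hive table name are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("load-clean-store")
         .enableHiveSupport()          # allow writing to Hive tables
         .getOrCreate())

# Load data in different formats into DataFrames
prices = spark.read.parquet("/data/prices/")          # placeholder path
companies = spark.read.json("/data/companies.json")   # placeholder path

# Example UDF; registering it this way ships the function to every executor
normalize_ticker = F.udf(lambda s: s.strip().upper() if s else None, StringType())

cleaned = (prices
           .dropna(subset=["ticker", "close"])
           .withColumn("ticker", normalize_ticker("ticker"))
           .join(companies, "ticker", "left"))

# Store the cleaned data in a Hive table for downstream analysis (e.g. Tableau)
cleaned.write.mode("overwrite").saveAsTable("analytics.market_history")
```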
Confidential
Hadoop Developer
Responsibilities:
- Extensively involved in installation and configuration of the Cloudera Hadoop distribution: NameNode, Secondary NameNode, JobTracker, TaskTrackers and DataNodes.
- Developed MapReduce programs in Java and used Sqoop to import data from an Oracle database.
- Responsible for building scalable distributed data solutions using Hadoop; wrote various Hive and Pig scripts.
- Moved data from HDFS to HBase using MapReduce and a bulk output format class.
- Experienced with scripting languages such as Python and shell scripts.
- Developed various Python scripts for vulnerability detection and data validation (see the validation sketch after this role).
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Experienced in handling administration activities using Cloudera Manager.
- Expertise in partitioning and bucketing concepts in Hive.
- Analyzed weblog data using HiveQL and integrated Oozie with the rest of the Hadoop stack.
- Utilized cluster coordination services through ZooKeeper.
- Created scripts for data modeling and data import/export; extensive experience deploying, managing and developing on Cassandra clusters.
- Created partitioned Hive tables and worked on them using HiveQL.
- Developed Shell scripts to automate routine DBA tasks.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring, troubleshooting, managing and reviewing data backups and Hadoop log files.
Environment: Hadoop 2.1.0, Pig 0.9.0, Python 3.3.0, Hive 0.10, Oozie 3.3.1, Sqoop 1.4.3, HBase 2.2.0, Java 7, Avro, CDH 4.0, Zookeeper 3.4.5, Cassandra 2.0 and Shell Scripting
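An illustrative sketch of the kind of data-validation script mentioned in this role; the input path, required columns and checks are hypothetical placeholders.

```python
# validate_feed.py - sketch of a simple data-validation script;
# the input path, required columns and checks are placeholders.
import csv
import sys

REQUIRED_COLUMNS = {"id", "event_date", "amount"}  # assumed schema
INPUT_FILE = "/data/feeds/daily_extract.csv"       # placeholder path

def validate(path):
    errors = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            return [f"missing columns: {sorted(missing)}"]
        for i, row in enumerate(reader, start=2):
            if not row["id"]:
                errors.append(f"line {i}: empty id")
            try:
                float(row["amount"])
            except (ValueError, TypeError):
                errors.append(f"line {i}: non-numeric amount {row['amount']!r}")
    return errors

if __name__ == "__main__":
    problems = validate(INPUT_FILE)
    for p in problems:
        print(p)
    sys.exit(1 if problems else 0)
```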
Confidential
Associate Engineer
Responsibilities:
- Involved in system design based on the Spring, Struts and Hibernate frameworks.
- Implemented the business logic in standalone Java classes using core Java.
- Developed database (SQL Server) applications.
- Worked with Spring's HibernateTemplate to access the SQL Server database.
- Created views and functions and developed stored procedures to implement application functionality on the database side for performance improvement.
- Designed, implemented and tested new features using T-SQL programming.
- Optimized existing data aggregation and reporting for better performance.
- Performed varied analyses to support organization and client improvement.
Environment: SQL Server 2012/2008 R2, Spring 3.0, Maven 3.0, HTML, JavaScript 5.0, Hibernate 3.0, JSF 2.1
Confidential
Jr. Developer
Responsibilities:
- Analyzed user requirements and developed specifications for various database applications.
- Studied design documents and understood the business needs and requirements of the project; participated in discussions and peer-review sessions to arrive at an optimal design plan.
- Involved in project planning and scheduling for the database module with project managers.
- Enhanced performance using optimization techniques such as normalization, indexing and transaction isolation levels.
- Experience creating jobs, alerts, SQL Mail Agent notifications and schedules for SSIS packages in SQL Server Agent.
Environment: MS SQL Server 2008 R2, SSIS 2008, T-SQL, Software Development Life Cycle (SDLC), SQL Server Management Studio 2008, Windows Server 2008