Big Data Engineer Resume

White Plains, NY

SUMMARY:

  • Data enthusiast with broad experience in IT and data-technology solutions and extensive knowledge of the SDLC and data modelling.
  • Extensive experience with the Big Data ecosystem, including Hadoop, HDFS, YARN, MapReduce, Mesos, NiFi, StreamSets, Kudu, Spark, Hive, Impala, Pig, HBase, Sqoop, Flume, Kafka, Oozie and Zookeeper.
  • In-depth understanding of Hadoop and Spark architecture.
  • Hands-on experience using Hive partitioning and bucketing and executing different types of joins on Hive tables (see the sketch after this list).
  • Hands-on experience in HiveQL with a good understanding of joins, grouping and aggregations, and query optimization.
  • Worked with efficient storage formats such as Avro, Parquet and ORC integrated with the Hadoop ecosystem (Hive, Impala and Spark); also used Snappy and GZip compression.
  • Experience importing/exporting structured and unstructured data to HDFS and Hive tables using Sqoop and Flume.
  • Experience with NoSQL column-oriented databases such as HBase and Cassandra and their integration with Hadoop clusters.
  • Strong understanding of Spark Core, Spark SQL, PySpark, Spark Streaming and machine learning (SVM, linear and logistic regression, KNN, decision trees, random forests, gradient boosting, Naïve Bayes and cross-validation).
  • Experience in performing Exploratory Data Analysis (EDA), Dimensionality Reduction methods (PCA), missing value treatment and outlier treatment.
  • Experience in collecting, aggregating and moving large amounts of streaming data using Flume, Kafka, Spark Streaming.
  • Good knowledge of different AWS Services such as EC2, S3, EMR, RedShift, DynamoDB, Aurora, Athena.
  • Strong experience writing custom UDFs in Scala/Python/Java to extend Hive and Pig functionality.
  • Strong Database Experience on SQL Server 2008 R2/2017 with T-SQL programming skills in creating Stored Procedures, Functions, Triggers and Views.
  • Experience in data visualization and reporting using Dask, Matplotlib, Seaborn and Tableau.
  • Skilled in debugging application code and resolving various production issues.
  • Enthusiastic team player dedicated to streamlining processes and efficiently resolving project issues.
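
As an illustration of the Hive partitioning and bucketing mentioned above, here is a minimal PySpark sketch; the sales table, its columns and the bucket count are hypothetical examples rather than details from any engagement listed below.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partition-bucket-sketch")
             .enableHiveSupport()   # persist tables in the Hive metastore
             .getOrCreate())

    # Partition by a low-cardinality column and bucket by the join key so
    # queries can prune partitions and bucketed joins can reduce shuffling.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales (
            customer_id BIGINT,
            amount      DOUBLE,
            region      STRING
        )
        USING ORC
        PARTITIONED BY (region)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
    """)

    # Partition pruning: only the region='US' partition is scanned.
    spark.sql("""
        SELECT customer_id, SUM(amount) AS total
        FROM sales
        WHERE region = 'US'
        GROUP BY customer_id
    """).show()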

TECHNICAL SKILLS:

Hadoop Ecosystem: Hadoop 2.1+, Spark 1.3+/2.1+, MapReduce, Pig 0.11+, Flume 1.3+, HBase 0.98+, Oozie, Sqoop 1.4+, HDFS, Kafka 0.8.1+, Zookeeper 3.4+, Airflow, Hive 0.10+/2.2+, Cloudera 4.X/5.X, Hortonworks

Web Technologies: Oracle WebLogic 11g/12c, OHS 11g, JSF 2.1, Flask 1.0+, HTML 5, CSS 3.3+, REST, JSON, XML, Tomcat 8.0+/9.0+, JBoss 6.X, Splunk 6.X/6.5.X

Languages: Java 7/8, Scala 2.0+, Python 2.7+/3.3+, SQL, Pig Latin, Cypher, Julia, Shell Scripting, HQL, T-SQL, CQL

Cloud Technologies: AWS (EC2, S3, EMR, RedShift, DynamoDB, VPC, Aurora, Athena, SQS, SNS), Cloudcraft, Databricks Community Cloud

Machine Learning: Regression, KNN, SVM, Decision Trees, Random Forests, Ensembles and Stacking, MLlib

Data Analysis and Visualization: Kibana 5.X, Tableau 10.2, Matplotlib, xlrd, Pandas, NumPy

Databases: MySQL 5.0+, MS SQL Server 2017/2008 R2, PostgreSQL 9.6, Cassandra 2.0+, Oracle 11g/12c, Neo4j 3.5.6, MongoDB 3.6, Elasticsearch 2.X

Others: Git, GitHub, GitLab, JIRA, Jenkins, Maven 3, Hibernate 2, SSIS 2008, Spring 3, MVC, Bonsai Express, Docker, Vagrant

PROFESSIONAL EXPERIENCE:

Confidential

Big Data Engineer

Responsibilities:

  • Used NiFi to export flat files to Hive tables.
  • Used YARN as the resource manager and HDFS as distributed storage in the cluster.
  • Ran HiveQL scripts to extract valuable insights.
  • Checked the raw tables in the database for the correct attribute file.
  • Developed Python scripts to check the attribute file for approved product codes and to send email notifications on any discrepancy (see the sketch after this list).
  • Loaded the files into HBase tables for downstream applications.
  • Gained experience with various NoSQL databases along with comprehensive knowledge of process improvement, normalization/de-normalization, data extraction, data cleansing and data manipulation.
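
A minimal sketch of the attribute-file check referenced above; the CSV layout, the product_code column, the approved-code set and the SMTP host are hypothetical placeholders rather than project details.

    import csv
    import smtplib
    from email.message import EmailMessage

    APPROVED_CODES = {"A100", "B200", "C300"}   # hypothetical approved product codes
    SMTP_HOST = "smtp.example.com"              # placeholder mail relay

    def find_discrepancies(path):
        """Return rows whose product_code is not in the approved set."""
        with open(path, newline="") as fh:
            return [row for row in csv.DictReader(fh)
                    if row["product_code"] not in APPROVED_CODES]

    def notify(rows):
        """Email a short report when unapproved codes are found."""
        msg = EmailMessage()
        msg["Subject"] = "Attribute file check: %d unapproved codes" % len(rows)
        msg["From"] = "etl-checks@example.com"
        msg["To"] = "data-team@example.com"
        msg.set_content("\n".join(str(row) for row in rows))
        with smtplib.SMTP(SMTP_HOST) as smtp:
            smtp.send_message(msg)

    if __name__ == "__main__":
        bad_rows = find_discrepancies("attributes.csv")
        if bad_rows:
            notify(bad_rows)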

Environment: SQL Server 2017/2008 R2, Python 3.6, NiFi 1.9.0, Hadoop 2.7

Confidential, White Plains, NY

DataOps Engineer

Responsibilities:

  • Optimized Spark applications that perform data cleansing and data validation.
  • Built a data pipeline using Spark, Hive, COBOL copybooks and Sqoop to transform and analyze data.
  • Created Sqoop scripts to import/export data between RDBMS and the S3 data store.
  • Created Spark applications using Spark DataFrames and the Spark SQL API extensively.
  • Collaborated with platform engineers to develop a Python-based Kafka producer API that captures live stream data into various Kafka topics (see the sketch after this list).
  • Developed a Spark Streaming application to consume data from Kafka topics and insert the processed streams into HBase.
  • Applied broadcast variables in Spark and efficient joins in Hive for data processing.
  • Used Spark SQL to perform enrichment and to prepare different levels of behavioral summaries.
  • Implemented partitioning and bucketing in Hive to enhance query efficiency and join performance.
  • Worked in the Amazon cloud environment using EMR clusters, S3 and Redshift.
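
A minimal sketch of the kind of Python Kafka producer described above, using the kafka-python client; the broker addresses, topic name and event shape are hypothetical placeholders. On the consuming side, the Spark Streaming job subscribed to the same topics; the HBase write path depends on the specific connector in use, so it is not sketched here.

    import json
    import time

    from kafka import KafkaProducer   # pip install kafka-python

    # Hypothetical broker list; replace with the real cluster settings.
    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092", "broker2:9092"],
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def publish(event):
        """Send one event to the (hypothetical) 'clickstream' topic."""
        producer.send("clickstream", value=event)

    if __name__ == "__main__":
        publish({"user_id": 42, "action": "login", "ts": time.time()})
        producer.flush()   # block until buffered records are delivered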

Environment: Spark Streaming, Scala 2.11.8, Spark 2.2, Hive 2.3.2, Kafka 2.0.0, Sqoop 1.4.X, Hortonworks Distribution, Hadoop 2.7, EMR, COBOL copybooks, Redshift

Confidential

Big Data Engineering

Responsibilities:

  • Responsible for exploratory data analysis (EDA) and dimensionality reduction (PCA).
  • Performed variable identification, missing value treatment, outlier treatment, variable transformation, and univariate and bivariate analysis.
  • Loaded data in various formats (flat files, JSON, Avro, Parquet) into the Spark cluster (see the sketch after this list).
  • Applied Spark transformations and actions using Scala.
  • Cleaned data and stored it in Hive tables for analysis.
  • Connected Hive tables to Tableau and performed data visualization for reporting.
  • Plotted trend and pattern analyses and compared companies' market capitalization using historical data.
  • Created a 14-node Spark cluster with 11 executors and 1 driver.
  • Created UDFs and made the functions available to each executor.
  • Used GitHub for version control, JIRA for issue tracking.
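
A minimal PySpark sketch of the multi-format loading and UDF registration referenced above; the paths, the ticker column and the normalization logic are hypothetical, and reading Avro on this Spark version assumes the spark-avro package is on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("multi-format-load").getOrCreate()

    # Each reader returns a DataFrame; paths are placeholders.
    flat_df    = spark.read.option("header", "true").csv("/data/input.csv")
    json_df    = spark.read.json("/data/input.json")
    parquet_df = spark.read.parquet("/data/input.parquet")
    avro_df    = (spark.read.format("com.databricks.spark.avro")
                  .load("/data/input.avro"))   # needs the spark-avro package

    # A registered UDF is serialized and shipped to every executor at run time.
    normalize_ticker = udf(lambda s: s.strip().upper() if s else None, StringType())

    cleaned = flat_df.withColumn("ticker", normalize_ticker(flat_df["ticker"]))
    cleaned.show(5)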

Environment: CDH 5.X, Hadoop 2.6.X, Python 3.6, Scala 2.11.8, Spark 2.1, Hive 2.2.0, Tableau 10.2

Confidential

Hadoop Developer

Responsibilities:

  • Extensively involved in installation and configuration of the Cloudera Hadoop distribution: NameNode, Secondary NameNode, JobTracker, TaskTrackers and DataNodes.
  • Developed MapReduce programs in Java and used Sqoop to move data from an Oracle database.
  • Responsible for building scalable distributed data solutions using Hadoop; wrote various Hive and Pig scripts.
  • Moved data from HDFS to HBase using MapReduce and the bulk output format class.
  • Experienced with different scripting languages such as Python and shell scripting.
  • Developed various Python scripts for vulnerability checks and data validation (see the sketch after this list).
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Handled administration activities using Cloudera Manager.
  • Expertise in Hive partitioning and bucketing concepts.
  • Analyzed weblog data using HiveQL and integrated Oozie with the rest of the Hadoop stack.
  • Utilized cluster coordination services through Zookeeper.
  • Created scripts for data modeling and for data import and export; extensive experience deploying, managing and developing Cassandra clusters.
  • Created Partitioned Hive tables and worked on them using HiveQL.
  • Developed Shell scripts to automate routine DBA tasks.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring, troubleshooting, managing and reviewing data backups and Hadoop log files.
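
A minimal sketch of the style of Python data-validation script referenced in the list above; the CSV layout and the two sanity rules are hypothetical examples.

    import csv
    import sys

    def validate(path):
        """Yield (line_number, message) for rows violating simple sanity rules."""
        with open(path, newline="") as fh:
            for lineno, row in enumerate(csv.DictReader(fh), start=2):  # header is line 1
                if not row.get("id"):
                    yield lineno, "missing id"
                try:
                    float(row.get("amount", ""))
                except ValueError:
                    yield lineno, "non-numeric amount"

    if __name__ == "__main__":
        errors = list(validate(sys.argv[1]))
        for lineno, msg in errors:
            print("line %d: %s" % (lineno, msg))
        sys.exit(1 if errors else 0)   # nonzero exit lets a shell or Oozie step fail fast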

Environment: Hadoop 2.1.0, Pig 0.9.0, Python 3.3.0, Hive 0.10, Oozie 3.3.1, Sqoop 1.4.3, HBase 2.2.0, Java 7, Avro, CDH 4.0, Zookeeper 3.4.5, Cassandra 2.0 and Shell Scripting

Confidential

Associate Engineer

Responsibilities:

  • Involved in system design based on the Spring, Struts and Hibernate frameworks.
  • Implemented the business logic in standalone Java classes using core Java.
  • Developed database (SQL Server) applications.
  • Used Spring's HibernateTemplate to access the SQL Server database.
  • Created views and functions and developed stored procedures to implement application functionality on the database side for performance improvement.
  • Designed, implemented and tested new features using T-SQL programming.
  • Optimized existing data aggregation and reporting for better performance.
  • Performed varied analyses to support organization and client improvement.

Environment: SQL Server 2012/2008 R2, Spring 3.0, Maven 3.0, HTML, JavaScript 5.0, Hibernate 3.0, JSF 2.1

Confidential

Jr. Developer

Responsibilities:

  • Analyzed user requirements and produced specifications for the various database applications.
  • Studied design documents to understand the business needs and project requirements; participated in discussion and peer-review sessions to arrive at an optimal design plan.
  • Involved in project planning and scheduling for the database module with project managers.
  • Enhanced performance using optimization techniques: normalization, indexing and transaction isolation levels.
  • Created jobs, alerts, SQL Mail agent notifications and schedules for SSIS packages in SQL Server Agent.

Environment: MS SQL Server 2008 R2, SSIS 2008, T-SQL, Software Development Life Cycle (SDLC), SQL Server Management Studio 2008, Windows Server 2008
