
Big Data Engineer Resume


SUMMARY

  • 4+ years of diversified experience in the field of Information Technology, with an emphasis on the Big Data/Hadoop ecosystem, SQL/NoSQL databases, and Java/Python/Scala technologies and tools, using industry-accepted methodologies and procedures.
  • 3+ years of experience in big data development with the Hadoop ecosystem, including Hadoop MapReduce, HDFS, Spark, Hive, HBase, Cassandra, Sqoop, Flume, Kafka, ZooKeeper, and Oozie.
  • Experienced in installing, configuring, and using sandboxes such as Cloudera CDH and Hortonworks HDP.
  • Hands-on experience in data science, data mining, and business intelligence analytics.
  • Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Hands-on experience writing HiveQL queries and Pig Latin scripts for data cleaning and processing (a HiveQL sketch follows this list).
  • Experienced in processing data with Spark Core, Spark Streaming, and Spark SQL.
  • Experienced with RDBMSs including Oracle, MySQL, SQL Server, and PostgreSQL.
  • Worked with data visualization tools including Tableau, Adobe Analytics (Omniture), and matplotlib.
  • Worked in Agile/Scrum, Spiral, and Waterfall development environments using tools including Git/GitHub, SVN, JIRA, and Jenkins.
  • Solid fundamentals in Core Java, algorithm design, and object-oriented design (OOD), including the Collections Framework, exception handling, the I/O system, and multithreading.
  • Strong communicator, ambitious team player, and self-motivated learner with leadership experience and a passion for new technologies.
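
A minimal sketch of the HiveQL cleaning work referenced above, run through PySpark. The table and column names (raw_events, clean_events, user_id, event_ts, event_type) are hypothetical, and the query is an illustration rather than production code.

    # Sketch: run a HiveQL cleaning query from PySpark.
    # All table and column names here are hypothetical.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-cleaning-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Drop rows with missing keys and normalize timestamp and type columns.
    spark.sql("""
        INSERT OVERWRITE TABLE clean_events
        SELECT user_id,
               from_unixtime(CAST(event_ts AS BIGINT)) AS event_time,
               trim(lower(event_type))                 AS event_type
        FROM raw_events
        WHERE user_id IS NOT NULL
    """)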

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Spark, YARN, Kafka, Flume, Sqoop, Oozie, ZooKeeper, Ambari, Mahout.

Hadoop Distributions: Cloudera, Hortonworks, MapR, and Apache.

Languages: Java, Python, Scala, C/C++.

Cloud Technologies: EC2, S3, EMR, IAM, CloudWatch, Microsoft Azure.

NoSQL Databases: Cassandra, MongoDB, HBase, DynamoDB, Bigtable.

RDBMS: Teradata, Oracle, MS SQL Server, MySQL, DB2, PostgreSQL.

Machine Learning: Decision Tree, Naïve Bayes, K-means, Linear Regression, Logistic Regression.

Development / Build Tools: PyCharm, Eclipse, Jenkins, Git, Maven, PyUnit, JUnit, Tableau, Adobe Analytics.

PROFESSIONAL EXPERIENCE

Big Data Engineer

Confidential

Responsibilities:

  • Collaborated on insights with other Data Scientists, Business Analysts, and Partners.
  • Uploaded data to Hive on Hadoop and combined new tables with existing databases.
  • Developed Python code to provide data analysis and to generate data reports.
  • Implemented MapReduce programs to handle semi-structured and unstructured data imported into HDFS.
  • Converted SQL queries running on mainframes into Pig and Hive as part of a migration from mainframes to a Hadoop cluster.
  • Used Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as the storage mechanism, integrated via Elastic MapReduce (EMR).
  • Created Hive tables to store structured data in HDFS and processed it using HiveQL.
  • Developed Spark SQL code to extract and process data from various data sources to feed our machine-learning simulations for future predictions (a sketch follows this list). Creatively communicated and presented models to business customers and executives using a variety of formats and visualization methodologies.
  • Generated data cubes using Hive, Pig, and Java MapReduce on a provisioned Hadoop cluster in AWS.
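
A minimal sketch of the Spark SQL extract-and-predict flow described above, assuming a hypothetical Hive table customer_features; Spark MLlib's logistic regression stands in for the unspecified machine-learning models.

    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("sparksql-ml-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Extract features from a Hive table (name and columns are hypothetical).
    df = spark.sql("SELECT age, balance, churned FROM customer_features")

    # Assemble the numeric columns into a single feature vector.
    assembled = VectorAssembler(inputCols=["age", "balance"],
                                outputCol="features").transform(df)

    # Fit a simple classifier and score the same data as a demonstration.
    model = LogisticRegression(labelCol="churned").fit(assembled)
    model.transform(assembled).select("churned", "prediction").show(5)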

Overstock Manager

Confidential

Responsibilities:

  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Experienced in handling large datasets during the ingestion process itself using partitions, Spark's in-memory capabilities, broadcasts, and effective, efficient joins and transformations (a broadcast-join sketch follows this list).
  • Worked on migrating legacy MapReduce programs into Spark transformations using Spark and Scala.
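
The bullets above describe Scala work; to keep all the sketches in one language, the same broadcast-join-and-aggregate pattern is shown below in PySpark. The Parquet paths and column names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("broadcast-join-sketch").getOrCreate()

    # Hypothetical inputs: a large fact table and a small dimension table.
    orders = spark.read.parquet("hdfs:///data/orders")    # large
    regions = spark.read.parquet("hdfs:///data/regions")  # small

    # Broadcasting the small table avoids shuffling the large one.
    joined = orders.join(F.broadcast(regions), "region_id")

    # Aggregate per region; intermediate data stays in executor memory.
    (joined.groupBy("region_name")
           .agg(F.sum("amount").alias("total_amount"))
           .show())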

Data Engineer

Confidential

Responsibilities:

  • Installed and configured Apache Hadoop, Hive and Pig environment on the prototype server.
  • Configured SQL database to store Hive metadata.
  • Loaded unstructured data into Hadoop File System (HDFS).
  • Created ETL jobs to load JSON data and server data into MongoDB (sketched below) and moved data from MongoDB into the data warehouse.
  • Created reports and dashboards using structured and unstructured data.
  • Performed research on Novartis (pharmaceutical firm) data using Tableau for customer journey optimization, connecting disparate customer experiences into a single cohesive picture.
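
A minimal sketch of the JSON-to-MongoDB load step, assuming newline-delimited JSON and hypothetical connection, database, collection, and file names; the real ETL jobs would add validation and error handling.

    import json
    from pymongo import MongoClient

    # Hypothetical connection string, database, and collection.
    client = MongoClient("mongodb://localhost:27017")
    collection = client["staging"]["server_events"]

    # Load newline-delimited JSON records and insert them as one batch.
    with open("server_events.json") as f:
        records = [json.loads(line) for line in f if line.strip()]

    if records:
        collection.insert_many(records)
    print(f"Loaded {len(records)} documents")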

Data Engineer

Confidential

Responsibilities:

  • Involved in review of functional and non-functional requirements.
  • Configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a streaming sketch of the pattern follows this list).
  • Imported Semi structured and Structured data into HDFS and Hive using Sqoop.
  • Proficient working experience with the NoSQL database HBase.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries, which run internally as MapReduce jobs.
  • Created HBase tables, HBase sinks and loaded data into them to perform analytics.
  • Expertise in working with Python, RDBMS, and Linux shell scripting.
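
The MapReduce jobs above were written in Java; to keep the sketches in one language, the same map-and-reduce pattern is shown below as a Hadoop Streaming job in Python. The CSV layout and the cleaning rule are hypothetical.

    # mapper.py: read CSV rows from stdin, drop malformed ones,
    # and emit a normalized key with a count of 1.
    import sys

    for line in sys.stdin:
        fields = line.strip().split(",")
        if len(fields) >= 2 and fields[1]:  # skip malformed rows
            print(f"{fields[1].lower()}\t1")

    # reducer.py: sum counts per key; Hadoop Streaming delivers
    # mapper output sorted by key, so equal keys arrive grouped.
    import sys

    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key and current_key is not None:
            print(f"{current_key}\t{count}")
            count = 0
        current_key = key
        count += int(value)
    if current_key is not None:
        print(f"{current_key}\t{count}")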

Python Developer

Confidential

Responsibilities:

  • Worked on server-side applications using Python 2.7 programming.
  • Automated existing scripts for performance calculations, writing subqueries and stored procedures for insurance premium computations.
  • Expertise in software development using Python and its packages such as NumPy, SciPy, and Pandas.
  • Used the Pandas API to arrange data as a time series in tabular format, one row per timestamp, for data manipulation and retrieval (a sketch follows this list).
  • Experience with continuous integration and automation using Jenkins.
  • Worked with a MySQL database for storing and retrieving application data.
  • Strong data querying/SQL and UNIX shell scripting skills; excellent skills in manual and automated testing.
  • Designed and developed use-case diagrams, class diagrams, and object diagrams using UML in Rational Rose.
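
A minimal Pandas sketch of the time-series arrangement described above, using hypothetical premium readings; resample buckets the rows by day, and the DatetimeIndex supports label-based retrieval.

    import pandas as pd

    # Hypothetical premium readings keyed by timestamp.
    df = pd.DataFrame({
        "timestamp": pd.to_datetime(["2017-01-01 09:00", "2017-01-01 09:30",
                                     "2017-01-02 10:00"]),
        "premium": [120.0, 125.5, 119.0],
    }).set_index("timestamp")

    # One row per day: tabular form with a mean premium per time bucket.
    daily = df.resample("D").mean()

    # Label-based retrieval straight off the DatetimeIndex.
    print(daily.loc["2017-01-01"])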
