Big Data Engineer Resume
SUMMARY
- 4+ years of diversified experience in the field of Information Technology, with an emphasis on the Big Data/Hadoop ecosystem, SQL/NoSQL databases, and Java/Python/Scala technologies and tools, using industry-accepted methodologies and procedures.
- 3+ years of experience in big data development with the Hadoop ecosystem, including Hadoop MapReduce, HDFS, Spark, Hive, HBase, Cassandra, Sqoop, Flume, Kafka, Zookeeper, and Oozie.
- Experienced in installing, configuring, and using sandboxes such as Cloudera CDH and Hortonworks HDP.
- Hands-on experience in Data Science, Data Mining, and Business Intelligence Analytics.
- Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
- Hands-on experience writing HiveQL queries and Pig Latin scripts for data cleaning and processing.
- Experienced in processing data with Spark Core, Spark Streaming, and Spark SQL.
- Experienced with RDBMS including Oracle, MySQL, SQL Server, PostgreSQL.
- Worked with data visualization tools including Tableau, Adobe Analytics (Omniture), and matplotlib.
- Worked in Agile/Scrum, Spiral, and Waterfall development environments using tools including Git/GitHub, SVN, JIRA, and Jenkins.
- Solid fundamentals in Core Java, algorithm design, and Object-Oriented Design (OOD), including the Collections Framework, exception handling, the I/O system, and multithreading.
- Strong communicator, ambitious team player, and self-motivated learner with leadership skills and a passion for new technologies.
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Spark, YARN, Kafka, Flume, Sqoop, Oozie, Zookeeper, Ambari, Mahout.
Hadoop Distributions: Cloudera, Hortonworks, MapR and Apache
Languages: Java, Python, Scala, C/C++.
Cloud Technologies: EC2, S3, EMR, IAM, CloudWatch, Microsoft Azure.
NoSQL Databases: Cassandra, MongoDB, HBase, DynamoDB, Bigtable.
RDBMS: Teradata, Oracle, MS SQL Server, MySQL, DB2, PostgreSQL
Machine Learning: Decision Tree, Naïve Bayes, K-means, Linear Regression, Logistic Regression.
Development / Build Tools: PyCharm, Eclipse, Jenkins, Git, Maven, PyUnit, JUnit, Tableau, Adobe Analytics.
PROFESSIONAL EXPERIENCE
Big Data Engineer
Confidential
Responsibilities:
- Collaborated on insights with other Data Scientists, Business Analysts, and Partners.
- Uploaded data to Hadoop Hive and combined new tables with existing databases.
- Developed Python code to provide data analysis and to generate data reports.
- Implemented MapReduce programs to handle semi-structured and unstructured data imported into HDFS.
- Converted SQL queries running on mainframes into Pig and Hive as part of a migration from mainframes to the Hadoop cluster.
- Used Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as the storage mechanism, with cloud integration through Elastic MapReduce (EMR).
- Created Hive tables to store structured data in HDFS and processed it using HiveQL.
- Developed Spark SQL code to extract and process data from various data sources to feed into machine-learning simulations for future predictions (see the sketch after this list). Creatively communicated and presented models to business customers and executives using a variety of formats and visualization methodologies.
- Generated data cubes using Hive, Pig, and Java MapReduce on a provisioned Hadoop cluster in AWS.
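A minimal PySpark sketch of the Spark SQL extraction step described above; the table and column names (sales_events, customer_id, amount, customer_features) are hypothetical.

```python
from pyspark.sql import SparkSession

# Hive support lets Spark read tables registered in the Hive metastore.
spark = (SparkSession.builder
         .appName("feature-extraction")
         .enableHiveSupport()
         .getOrCreate())

# Read a Hive table (hypothetical name) and expose it to Spark SQL.
spark.table("sales_events").createOrReplaceTempView("events")

# Aggregate per-customer features to feed downstream ML simulations.
features = spark.sql("""
    SELECT customer_id,
           COUNT(*)    AS num_events,
           SUM(amount) AS total_spend,
           AVG(amount) AS avg_spend
    FROM events
    GROUP BY customer_id
""")

features.write.mode("overwrite").saveAsTable("customer_features")
```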
Overstock Manager
Confidential
Responsibilities:
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Performed advanced procedures such as text analytics and processing, using Spark's in-memory computing capabilities in Scala.
- Handled large datasets using partitioning, Spark's in-memory capabilities, broadcast variables, and effective, efficient joins and transformations during the ingestion process itself.
- Migrated legacy MapReduce programs to Spark transformations using Spark and Scala (a sketch of the aggregation and broadcast-join pattern follows this list).
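A PySpark sketch of the DataFrame aggregation and broadcast-join pattern described above (the original work used Scala); the paths and column names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("aggregation").getOrCreate()

orders = spark.read.parquet("/data/orders")   # large fact table (hypothetical path)
stores = spark.read.parquet("/data/stores")   # small dimension table

# Broadcasting the small table avoids shuffling the large side of the join.
joined = orders.join(F.broadcast(stores), "store_id")

# Aggregate in memory; the result could then be exported to OLTP via Sqoop.
daily = (joined
         .groupBy("store_id", "order_date")
         .agg(F.sum("amount").alias("total_amount"),
              F.count("*").alias("order_count")))

daily.write.mode("overwrite").parquet("/data/daily_totals")
```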
Data Engineer
Confidential
Responsibilities:
- Installed and configured Apache Hadoop, Hive and Pig environment on the prototype server.
- Configured SQL database to store Hive metadata.
- Loaded unstructured data into the Hadoop Distributed File System (HDFS).
- Created ETL jobs to load JSON data and server data into MongoDB, and transported data from MongoDB into the data warehouse (see the ETL sketch after this list).
- Created reports and dashboards using structured and unstructured data.
- Performed research on Novartis (pharmaceutical firm) data using Tableau for customer journey optimization, connecting disparate customer experiences into a single cohesive picture.
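A minimal sketch of the kind of ETL job described above, loading JSON records into MongoDB with pymongo; the connection URI, database, collection, and field names are hypothetical.

```python
import json
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client["analytics"]["customer_events"]

# Extract: read newline-delimited JSON exported from the application servers.
with open("server_events.json") as f:
    records = [json.loads(line) for line in f]

# Transform: keep only the fields the warehouse needs (hypothetical schema).
docs = [{"user_id": r["user_id"], "event": r["event"], "ts": r["timestamp"]}
        for r in records]

# Load: bulk-insert into MongoDB ahead of export to the data warehouse.
if docs:
    collection.insert_many(docs)
```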
Data Engineer
Confidential
Responsibilities:
- Involved in review of functional and non-functional requirements.
- Configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Imported Semi structured and Structured data into HDFS and Hive using Sqoop.
- Gained hands-on experience with the NoSQL database HBase.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Created HBase tables and sinks and loaded data into them to perform analytics (see the sketch after this list).
- Worked extensively with Python, RDBMS, and Linux shell scripting.
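A sketch of loading rows into an HBase table through the Thrift gateway using happybase; the host, table, row key, and column-family names are hypothetical.

```python
import happybase

connection = happybase.Connection("hbase-thrift-host")
table = connection.table("web_events")

# HBase stores raw bytes; 'cf' is a hypothetical column family.
table.put(b"user42|2017-01-01", {
    b"cf:page": b"/checkout",
    b"cf:duration_ms": b"350",
})

# Scan back a row-key range to feed downstream analytics.
for key, data in table.scan(row_prefix=b"user42"):
    print(key, data)

connection.close()
```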
Python Developer
Confidential
Responsibilities:
- Worked on server-side applications using Python 2.7.
- Automated existing scripts for performance calculations, writing SQL subqueries and stored procedures for insurance premium computations.
- Developed software in Python using packages such as NumPy, SciPy, and Pandas.
- Used the Pandas API to arrange the data as a time series in tabular format, indexed by timestamp, for data manipulation and retrieval (see the sketch after this list).
- Experience with continuous integration and automation using Jenkins.
- Worked with a MySQL database for storing and retrieving application data.
- Applied strong data querying/SQL and UNIX shell scripting skills, along with manual and automated testing.
- Designed and developed use-case diagrams, class diagrams, and object diagrams using UML in Rational Rose.
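A minimal Pandas sketch of arranging records as a timestamp-indexed time series for manipulation and retrieval; the column names are hypothetical.

```python
import pandas as pd

raw = pd.DataFrame({
    "timestamp": ["2016-03-01 09:00", "2016-03-01 10:00", "2016-03-02 09:00"],
    "premium":   [120.5, 98.0, 133.2],
})

# Parse timestamps and index the frame by them, one row per time stamp.
ts = (raw.assign(timestamp=pd.to_datetime(raw["timestamp"]))
         .set_index("timestamp")
         .sort_index())

# Retrieval by date slice, plus a daily aggregate for reporting.
print(ts.loc["2016-03-01"])
print(ts.resample("D")["premium"].sum())
```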