Big Data Engineer Resume
SUMMARY
- 4+ years of diversified experience in the field of Information Technology, with an emphasis on the Big Data/Hadoop ecosystem, SQL/NoSQL databases, and Java/Python/Scala technologies and tools, using industry-accepted methodologies and procedures.
- 3+ years of experience in big data development with the Hadoop ecosystem, including Hadoop MapReduce, HDFS, Spark, Hive, HBase, Cassandra, Sqoop, Flume, Kafka, Zookeeper, and Oozie.
- Experienced in installing, configuring, and using sandboxes such as Cloudera CDH and Hortonworks HDP.
- Hands-on experience in Data Science, Data Mining, and Business Intelligence Analytics.
- Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
- Hands-on experience writing HiveQL queries and Pig Latin scripts for data cleaning and processing.
- Experienced in processing data with Spark Core, Spark Streaming, and Spark SQL.
- Experienced with RDBMS including Oracle, MySQL, SQL Server, PostgreSQL.
- Worked with data visualization tools including Tableau, Adobe Analytics (Omniture), and matplotlib.
- Worked in Agile/Scrum, Spiral, and Waterfall development environments using tools including Git/GitHub, SVN, JIRA, and Jenkins.
- Solid fundamentals in Core Java, algorithm design, and Object-Oriented Design (OOD), including the Collections Framework, exception handling, the I/O system, and multithreading.
- Strong communicator, ambitious team player, and self-motivated learner with leadership skills and a passion for new technologies.
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Spark, YARN, Kafka, Flume, Sqoop, Oozie, Zookeeper, Ambari, Mahout.
Hadoop Distributions: Cloudera, Hortonworks, MapR and Apache
Languages: Java, Python, Scala, C/C++.
Cloud Technologies: EC2, S3, EMR, IAM, CloudWatch, Microsoft Azure.
NoSQL Databases: Cassandra, MongoDB, HBase, DynamoDB, Bigtable.
RDBMS: Teradata, Oracle, MS SQL Server, MySQL, DB2, PostgreSQL
Machine Learning: Decision Tree, Naïve Bayes, K-means, Linear Regression, Logistic Regression.
Development / Build Tools: PyCharm, Eclipse, Jenkins, Git, Maven, PyUnit, JUnit, Tableau, Adobe Analytics.
PROFESSIONAL EXPERIENCE
Big Data Engineer
Confidential
Responsibilities:
- Collaborated on insights with other Data Scientists, Business Analysts, and Partners.
- Uploaded data to Hadoop Hive and combined new tables with existing databases.
- Developed Python code to provide data analysis and to generate data reports.
- Implemented MapReduce programs to handle semi-structured and unstructured data imported into HDFS.
- Converted SQL queries running on mainframes into Pig and Hive as part of a migration from mainframes to the Hadoop cluster.
- Used Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as the storage mechanism, with cloud integration through Elastic MapReduce (EMR).
- Created Hive tables to store structured data in HDFS and processed it using HiveQL.
- Developed Spark SQL code to extract and process data from various data sources to feed into machine-learning simulations for future predictions (see the sketch after this list). Creatively communicated and presented models to business customers and executives using a variety of formats and visualization methodologies.
- Generated data cubes using Hive, Pig, and Java MapReduce on a provisioned Hadoop cluster in AWS.
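A minimal PySpark sketch of the Spark SQL extraction step described above; the table and column names (sales_events, customer_id, amount, customer_features) are hypothetical.

```python
from pyspark.sql import SparkSession

# Hive support lets Spark read tables registered in the Hive metastore.
spark = (SparkSession.builder
         .appName("feature-extraction")
         .enableHiveSupport()
         .getOrCreate())

# Read a Hive table (hypothetical name) and expose it to Spark SQL.
spark.table("sales_events").createOrReplaceTempView("events")

# Aggregate per-customer features to feed downstream ML simulations.
features = spark.sql("""
    SELECT customer_id,
           COUNT(*)    AS num_events,
           SUM(amount) AS total_spend,
           AVG(amount) AS avg_spend
    FROM events
    GROUP BY customer_id
""")

features.write.mode("overwrite").saveAsTable("customer_features")
```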
Overstock Manager
Confidential
Responsibilities:
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Performed advanced procedures such as text analytics and processing, using Spark's in-memory computing capabilities in Scala.
- Handled large datasets using partitioning, Spark's in-memory capabilities, broadcast variables, and effective, efficient joins and transformations during the ingestion process itself.
- Migrated legacy MapReduce programs to Spark transformations using Spark and Scala (a sketch of the aggregation and broadcast-join pattern follows this list).
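A PySpark sketch of the DataFrame aggregation and broadcast-join pattern described above (the original work used Scala); the paths and column names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("aggregation").getOrCreate()

orders = spark.read.parquet("/data/orders")   # large fact table (hypothetical path)
stores = spark.read.parquet("/data/stores")   # small dimension table

# Broadcasting the small table avoids shuffling the large side of the join.
joined = orders.join(F.broadcast(stores), "store_id")

# Aggregate in memory; the result could then be exported to OLTP via Sqoop.
daily = (joined
         .groupBy("store_id", "order_date")
         .agg(F.sum("amount").alias("total_amount"),
              F.count("*").alias("order_count")))

daily.write.mode("overwrite").parquet("/data/daily_totals")
```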
Data Engineer
Confidential
Responsibilities:
- Installed and configured Apache Hadoop, Hive and Pig environment on the prototype server.
- Configured SQL database to store Hive metadata.
- Loaded unstructured data into the Hadoop Distributed File System (HDFS).
- Created ETL jobs to load JSON data and server data into MongoDB, and transported data from MongoDB into the data warehouse (see the ETL sketch after this list).
- Created reports and dashboards using structured and unstructured data.
- Performed research on Novartis (pharmaceutical firm) data using Tableau for customer journey optimization, connecting disparate customer experiences into a single cohesive picture.
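A minimal sketch of the kind of ETL job described above, loading JSON records into MongoDB with pymongo; the connection URI, database, collection, and field names are hypothetical.

```python
import json
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client["analytics"]["customer_events"]

# Extract: read newline-delimited JSON exported from the application servers.
with open("server_events.json") as f:
    records = [json.loads(line) for line in f]

# Transform: keep only the fields the warehouse needs (hypothetical schema).
docs = [{"user_id": r["user_id"], "event": r["event"], "ts": r["timestamp"]}
        for r in records]

# Load: bulk-insert into MongoDB ahead of export to the data warehouse.
if docs:
    collection.insert_many(docs)
```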
Data Engineer
Confidential
Responsibilities:
- Involved in review of functional and non-functional requirements.
- Configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Imported Semi structured and Structured data into HDFS and Hive using Sqoop.
- Gained hands-on experience with the NoSQL database HBase.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Created HBase tables and sinks and loaded data into them to perform analytics (see the sketch after this list).
- Worked extensively with Python, RDBMS, and Linux shell scripting.
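A sketch of loading rows into an HBase table through the Thrift gateway using happybase; the host, table, row key, and column-family names are hypothetical.

```python
import happybase

connection = happybase.Connection("hbase-thrift-host")
table = connection.table("web_events")

# HBase stores raw bytes; 'cf' is a hypothetical column family.
table.put(b"user42|2017-01-01", {
    b"cf:page": b"/checkout",
    b"cf:duration_ms": b"350",
})

# Scan back a row-key range to feed downstream analytics.
for key, data in table.scan(row_prefix=b"user42"):
    print(key, data)

connection.close()
```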
Python Developer
Confidential
Responsibilities:
- Worked on server-side applications using Python 2.7.
- Automated existing scripts for performance calculations, writing SQL subqueries and stored procedures for insurance premium computations.
- Developed software in Python using packages such as NumPy, SciPy, and Pandas.
- Used the Pandas API to arrange the data as a time series in tabular format, indexed by timestamp, for data manipulation and retrieval (see the sketch after this list).
- Experience with continuous integration and automation using Jenkins.
- Worked with a MySQL database for storing and retrieving application data.
- Applied strong data querying/SQL and UNIX shell scripting skills, along with manual and automated testing.
- Designed and developed use-case diagrams, class diagrams, and object diagrams using UML in Rational Rose.
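A minimal Pandas sketch of arranging records as a timestamp-indexed time series for manipulation and retrieval; the column names are hypothetical.

```python
import pandas as pd

raw = pd.DataFrame({
    "timestamp": ["2016-03-01 09:00", "2016-03-01 10:00", "2016-03-02 09:00"],
    "premium":   [120.5, 98.0, 133.2],
})

# Parse timestamps and index the frame by them, one row per time stamp.
ts = (raw.assign(timestamp=pd.to_datetime(raw["timestamp"]))
         .set_index("timestamp")
         .sort_index())

# Retrieval by date slice, plus a daily aggregate for reporting.
print(ts.loc["2016-03-01"])
print(ts.resample("D")["premium"].sum())
```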