
Big Data Engineer Resume


PROFESSIONAL SUMMARY:

  • 5+ years of experience in the IT industry, including 4+ years of comprehensive experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Hive, Sqoop, Flume, Spark, Kafka and HBase) and 1+ year designing and developing web applications using Java, Servlets, HTML, CSS and JavaScript.
  • Solid understanding of the Hadoop Distributed File System.
  • Extensive experience with Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
  • Hands-on experience installing, configuring and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Spark and Scala.
  • Strong understanding of real-time streaming technologies such as Spark Streaming and Kafka.
  • Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications and HDFS.
  • Experience importing and exporting data between HDFS and relational database systems using Sqoop.
  • Provided cluster coordination services through Zookeeper.
  • Knowledge of job workflow management and coordination tools such as Oozie.
  • Experience in managing Hadoop clusters using the Cloudera Manager tool.
  • Worked on managing and automating infrastructure development and operations on Amazon Web Services (AWS).
  • Installed and configured Chef server and workstation, and bootstrapped AWS instances as Chef nodes via the CLI tools.
  • Created users and groups using IAM and assigned individual policies to each group.
  • Configured security groups for EC2 Windows and Linux instances.
  • Experience working with GitHub private repositories and Docker repositories.
  • Experience with Docker to create, manage, deploy and run containerized applications.
  • Sound knowledge of relational and NoSQL databases such as MySQL and Cassandra.
  • Experience working with build tools such as Maven.
  • Working experience with frameworks such as Spring and Hibernate.
  • Strong working experience using Agile methodologies including Scrum.
  • Working knowledge of UNIX/Linux commands.
  • Able to understand complex scenarios and business problems and transfer that knowledge to other team members in a comprehensive manner.
  • Strong communication and analytical skills; a good team player and quick learner; organized and self-motivated.

TECHNICAL SKILLS:

Big Data: Hadoop, HBase, Hive, Sqoop, Oozie, Spark, Kafka, Flume, MapReduce, Zookeeper

Operating Systems: UNIX, Mac, Linux, Windows 2000 / NT / XP / Vista.

Programming Languages: Java, Scala, Python, Go, C, VB, Objective C.

Databases: DB2, MySQL, Oracle 9i, MongoDB, Cassandra

IDE/Development Tools: Eclipse 5.x, JBuilder 7.x, RAD 7.0, NetBeans, IntelliJ

Web Technologies: J2EE, SOAP & REST Web Services, JSP, Servlets, EJB, JavaScript, Struts, Spring, WebWork, Direct Web Remoting, HTML, XML, JMS, JSF, Ajax.

Frameworks: Struts 2.0, Spring Framework and Hibernate

AWS Compute Services: EC2, ECS, Lambda

AWS Storage Services: S3

AWS Database: RDS

Version Control Tools: Git

Web/Application Servers: IBM WebSphere Application Server, JBoss, Apache Tomcat, Nginx

Automation/Build Tools: Docker, Vagrant, Maven

PROFESSIONAL EXPERIENCE:

Confidential

Big Data Engineer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop Ecosystem.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Participate in the design and development of software using agile development practices.
  • Developed Scala and SQL code to extract data from various databases, applied ideas from data science and advanced analytics practices, and presented models to business customers and executives using a variety of formats and visualization methodologies.
  • Implemented a POC using Apache Impala for data processing on top of Hive.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations.
  • Converted Go scripts into Spark jobs that take the necessary fields from Impala and populate them into HBase.
  • Used Spark Streaming to receive real-time data from Kafka and store the stream in HDFS and in databases such as HBase, using Scala (see the sketch after this list).
  • Developed and built Spark projects using SBT and Maven.
  • Applied knowledge of data storage and retrieval techniques, ETL and databases, including graph stores, relational databases, tuple stores, and NoSQL stores such as HBase, along with Hadoop and MySQL; used Spark MLlib to design recommendation engines and perform statistical analysis in Spark.
  • Estimated the hardware requirements for the NameNode and DataNodes and planned the cluster.
  • Used Sqoop to import and export data between Oracle and HDFS/Hive.
  • Developed Oozie workflows to manage and schedule jobs on the Hadoop cluster, generating reports on a nightly, weekly and monthly basis and running cleanup jobs.
  • Commissioned and decommissioned nodes and managed the cluster through performance tuning and enhancement.
  • Involved in the architecture and design of a distributed time-series database platform using NoSQL technologies such as Hadoop/HBase and Zookeeper.
  • Developed a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Worked in an Agile development environment in two-week sprint cycles, dividing and organizing tasks in Jira; participated in daily scrums and other design-related meetings.
  • Transformed data received from source systems using Python and created files to load into Hive.
  • Performed live tests and maintained expert knowledge in areas of specialization.
  • Created documentation in the Atlassian toolset, including Confluence and Stash.
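A minimal sketch of the Kafka-to-HDFS ingestion described above, assuming the Spark 2.x spark-streaming-kafka-0-10 integration; the broker address, topic, consumer group and output path are illustrative placeholders, not the production values:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs")
    val ssc = new StreamingContext(conf, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",          // placeholder broker list
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "weblog-ingest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("weblogs"), kafkaParams))

    // Keep only the message payload and append each micro-batch to HDFS.
    stream.map(_.value).saveAsTextFiles("hdfs:///data/raw/weblogs/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}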

Environment: Hadoop, Java, CDH 5.5, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Scala, Python, Go, Spark 1.6 & 2.x, Oozie, Linux, UNIX.

Confidential, Westborough, MA

Big Data Engineer

Responsibilities:

  • Scheduled and executed workflows in Oozie to run Hive and Spark jobs.
  • Parsed JSON files through Spark core to extract the schema of the production data using Spark SQL and Scala (a sketch follows this list).
  • Used the DataStax Spark-Cassandra connector to get data from Cassandra tables and process it with Apache Spark. Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and YARN.
  • Analyzed business requirements and cross-verified them against the functionality and features of NoSQL databases such as HBase and Cassandra to determine the optimal database.
  • Developed Spark scripts to import large files from Amazon S3 buckets.
  • Efficiently handled periodic exporting of SQL data into Elasticsearch.
  • Worked on GitHub and Jenkins continuous integration tool for deployment of project packages.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS using Sqoop and Flume.
  • Implemented various Hive optimization techniques such as dynamic partitions, buckets, map joins and parallel execution.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Managed and reviewed Hadoop log files.
  • Responsible for managing data coming from different sources.
  • Involved in loading data from UNIX file system to HDFS.
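As a rough illustration of the JSON-parsing step above, a sketch in which Spark SQL infers the schema from the JSON files and exposes the data to Hive-style queries; the paths and the event_date column are assumed names, not the actual production schema:

import org.apache.spark.sql.SparkSession

object ParseJson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("json-schema-extract").getOrCreate()

    // Spark SQL infers the schema directly from the JSON records.
    val events = spark.read.json("hdfs:///data/incoming/events/*.json")
    events.printSchema()

    // Expose the data to Hive-style SQL, as in the Hive/SQL-to-Spark conversions above.
    events.createOrReplaceTempView("events")
    val daily = spark.sql(
      "SELECT event_date, COUNT(*) AS cnt FROM events GROUP BY event_date")

    daily.write.mode("overwrite").parquet("hdfs:///data/curated/events_daily")
    spark.stop()
  }
}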

Environment: Hadoop, HDFS, MapReduce, Spark, Pig, Hive, Impala, Sqoop, Flume, Kafka, HBase, Oozie, SQL scripting, Cassandra, Linux shell scripting, Eclipse and Cloudera, AWS.

Confidential, Santa Barbara, CA

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop Ecosystem.
  • Developed Hive UDFs to perform data cleansing and transformation for ETL activities (a sketch of such a UDF follows this list).
  • Developed Hive UDFs and UDAFs for data analysis and Hive table loads.
  • Wrote Hive UDFs to get data from HBase and put data into HBase.
  • Developed data pipeline using Flume and Sqoop to ingest data into HDFS for analysis
  • Developed Sqoop scripts to import the data from SQL Server to HDFS
  • Worked extensively on tuning Hive Jobs
  • Loaded files into Hive and HDFS from Cassandra; processed the source data into structured data and stored it in the NoSQL database Cassandra. Created alter, insert and delete queries involving lists, sets and maps in DataStax Cassandra. Worked in a language-agnostic environment with exposure to multiple platforms such as AWS and databases such as Cassandra.
  • Wrote UDFs for Apache Drill.
  • Configured the Hive, HBase and DFS storage plugins to create views in Drill.
  • Implemented a POC on Talend Open Studio for Big Data.
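A minimal sketch of the kind of cleansing UDF described above, written against the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name, jar path and table/column names are illustrative, not the actual project artifacts:

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

class CleanseText extends UDF {
  // Hive resolves evaluate() by reflection; null-safe so dirty rows do not fail the load.
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toLowerCase.replaceAll("\\s+", " "))
  }
}

// Registered and used from HiveQL roughly as:
//   ADD JAR hdfs:///udfs/cleanse-udf.jar;
//   CREATE TEMPORARY FUNCTION cleanse AS 'CleanseText';
//   INSERT OVERWRITE TABLE customers_clean SELECT cleanse(name) FROM customers_raw;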

Environment: Hadoop Framework, MapReduce, Hive, Sqoop, HBase, Cassandra, Flume, Oozie, Java (JDK1.6), SQL Server, Talend for Big Data, Apache Drill.

Confidential

Jr. Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase database and Sqoop.
  • Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
  • Worked on installing the cluster, commissioning and decommissioning Data Nodes, Name Node recovery, capacity planning and slot configuration.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios (see the sketch after this list).
  • Implemented best income logic using Pig scripts and UDFs.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Responsible for managing data coming from different sources.
  • Involved in loading data from UNIX file system to HDFS.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Provided cluster coordination services through Zookeeper.
  • Managed and reviewed Hadoop log files.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
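A minimal sketch of creating such an HBase table from Scala; it is written against the later HBase 1.x client API rather than the CDH3-era HBaseAdmin calls, and the table and column family names are placeholders:

import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory

object CreatePiiTable {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    val admin = connection.getAdmin

    // One wide column family ("d") keeps the variable per-portfolio fields together.
    val descriptor = new HTableDescriptor(TableName.valueOf("pii_records"))
    descriptor.addFamily(new HColumnDescriptor("d"))

    if (!admin.tableExists(descriptor.getTableName)) admin.createTable(descriptor)
    connection.close()
  }
}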

Environment: Hadoop, HDFS, Pig, Hive, Sqoop, HBase, Shell Scripting, Ubuntu, Linux Red Hat.

Confidential

Java Developer

Responsibilities:

  • Performed analysis for the client requirements based on the developed detailed design documents.
  • Developed Use Cases, Class Diagrams, Sequence Diagrams and Data Models.
  • Developed Struts forms and actions for validating user request data and application functionality.
  • Developed JSPs with Struts custom tags and implemented JavaScript validation of data.
  • Developed programs for accessing the database using the JDBC thin driver to execute queries, prepared statements and stored procedures (a prepared-statement sketch follows this list).
  • Used JavaScript for web page validation and the Struts Validator for server-side validation.
  • Designed the database and coded SQL, PL/SQL, triggers and views using IBM DB2.
  • Developed Message Driven Beans for asynchronous processing of alerts.
  • Used ClearCase for source code control and JUnit for unit testing.
  • Involved in peer code reviews and performed integration testing of the modules. Followed coding and documentation standards.
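The original data-access code was Java; purely as an illustration, and kept in Scala for consistency with the earlier sketches, the JDBC prepared-statement pattern looked roughly like this (the connection URL, credentials, table and column names are placeholders):

import java.sql.DriverManager

object AlertDao {
  // Hypothetical lookup; the real DAO, schema and queries differ.
  def printAlerts(accountId: String): Unit = {
    val conn = DriverManager.getConnection(
      "jdbc:db2://dbhost:50000/ALERTDB", "appuser", "secret")
    try {
      val ps = conn.prepareStatement(
        "SELECT alert_id, message FROM alerts WHERE account_id = ?")
      ps.setString(1, accountId) // bind the parameter instead of concatenating SQL
      val rs = ps.executeQuery()
      while (rs.next()) {
        val id = rs.getInt("alert_id")
        val msg = rs.getString("message")
        println(s"$id: $msg")
      }
    } finally conn.close()
  }
}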

Environment: Java, Struts, JSP, JDBC, XML, Junit, Rational Rose, CVS, DB2, Windows.
