Big Data Engineer Resume

Richardson, TX

SUMMARY:

  • Over 6 years of experience as a Big Data Engineer/Hadoop and Java Developer, with skills in the analysis, design, development, testing and deployment of various software applications.
  • Design and development experience with Big Data, AWS, Apache Spark, Python, Cassandra NoSQL, Scala, Hadoop ecosystem components such as Pig, Hive, Sqoop and HDFS, shell scripting, and BI reporting.
  • Strong Experience working in Agile (Scrum) and Waterfall software development methodologies in the Software Development Life Cycle (SDLC).
  • Experience in analyzing data using Hive, Pig Latin and custom MapReduce programs in Java.
  • Hands-on experience in writing Spark SQL scripts and implementing Spark RDD transformations and actions using Python/Scala.
  • Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
  • Excellent Java development skills using J2EE frameworks like Spring and Hibernate, Web Services, RESTful Web Services and microservices.
  • Experience building platforms and deploying cloud-based tools and solutions with technologies like AWS EMR, RDS and Kinesis.
  • Experience in developing applications using enterprise J2EE technologies like Java Servlets and JSP.
  • Expertise in front-end web technologies such as HTML, JSP, CSS, JavaScript, jQuery, AJAX, XML, AngularJS and JSON.
  • Hands-on experience using Sqoop to import data into HDFS from RDBMS and vice versa.
  • Well versed in developing and implementing Spark programs using Python/Scala and Spark Streaming to work with Big Data.
  • Hands-on experience writing custom UDFs to extend Hive and Pig core functionality.
  • Hands-on experience extracting data from log files and copying it into HDFS using Flume.
  • Hands-on experience in provisioning and managing multi-tenant Cassandra clusters on public cloud environments: Amazon Web Services (AWS) EC2 and OpenStack.
  • Good experience in working with real-time streaming applications using tools like Spark Streaming, Storm and Kafka.
  • Experience working with cloud platforms, setting up environments and applications on AWS, automation of code and infrastructure (DevOps) using Chef, Jenkins and Deploy
  • Extensive experience working with Oracle, DB2, SQL Server and MySQL databases and with core Java concepts like OOP, multithreading, collections, design patterns, generics and I/O.
  • Good understanding of Hadoop architecture and its components such as HDFS, JobTracker and TaskTracker, ResourceManager and NodeManager, NameNode and DataNode, Secondary NameNode, and MapReduce programming.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Apache and Cloudera.
  • Knowledge of installing, configuring and using Hadoop components like MapReduce (MR1), YARN (MR2), HDFS, Hive, Pig, Flume and Sqoop.
  • Good knowledge of XML and related technologies like XSL and XSLT, and parsers like JAXP (SAX, DOM) and JAXB.
  • Interested in exploring new technologies and experimenting with them to improve existing infrastructure and applications.
  • Good understanding of Data Mining and Machine Learning techniques.

TECHNICAL SKILLS:

Frameworks: Spring, Hibernate, Struts.

Big Data Technologies: Hive, MapReduce, HDFS, Sqoop, R, Flume, Spark, Apache Kafka, HBase, Pig, Elasticsearch, AWS, Oozie, ZooKeeper, Apache Hue, Apache Tez, YARN, Talend, Storm, Impala, Tableau and QlikView.

Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans, MySQL Workbench, Tableau.

Web/Application servers: Apache Tomcat, WebLogic.

Web Technologies: HTML, CSS, JSP, Web Services, XML, JavaScript.

Scripting Languages: UNIX Shell scripting, SQL and PL/SQL, JavaScript.

Cluster Monitoring Tools: Apache Tomcat, WebLogic.

Methods: Worked in most of the phases of Agile and Waterfall methodologies.

Web Services: AWS.

Languages: SQL, T-SQL, C, C++, Java, J2EE, Pig Latin, HiveQL, Scala, Python.

IDEs: Eclipse, NetBeans, MS Office, Microsoft Visual Studio.

Web Technologies: JDK 1.4/1.5/1.6, HTML, XML, DHTML, MSXML, ASPX, Eclipse.

Operating Systems: Windows, various distributions of Linux/Unix (including Ubuntu).

Database Systems: SQL, MySQL, HBase, MongoDB, Cassandra.

PROFESSIONAL EXPERIENCE:

Confidential, Richardson, TX

Big Data Engineer

Responsibilities:

  • Extensively involved in the design phase and delivered design documents for the Hadoop ecosystem with HDFS, Hive, Pig, Sqoop and Spark with Scala.
  • Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Kafka.
  • Involved in the high-level design of the Hadoop architecture for the existing data structure and business process.
  • Took part in configuring and deploying the Hadoop cluster in the AWS cloud.
  • Worked with clients to better understand their reporting and dashboarding needs and presented solutions using a structured Agile project methodology.
  • Worked on analyzing the Hadoop cluster and different Big Data components including Pig, Hive, Spark, HBase, Kafka, Elasticsearch, databases and Sqoop.
  • Involved in loading disparate datasets into the Hadoop Data Lake, making them available to the data science team for predictive analysis.
  • Implemented partitioning, dynamic partitions and buckets in Hive to improve performance and organize data logically.
  • Installed Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Created tables in HBase to store variable data formats of PII data coming from different portfolios.
  • Worked on Sequence files, RC files, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement.
  • Pulled data from an Amazon S3 bucket into the Data Lake, built Hive tables on top of it, and created data frames in Spark to perform further analysis.
  • Used cloud computing on the multi-node cluster, deployed the Hadoop application on cloud S3, and used Elastic MapReduce (EMR) to run MapReduce jobs.
  • Explored MLlib algorithms in Spark to understand the Machine Learning functionality that could be used for the use case.
  • In the preprocessing phase of data extraction, used Spark to remove records with missing data and to transform the data to create new features.
  • Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in loading data from the UNIX file system to HDFS using Flume and the HDFS API.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS.
  • Developed RDDs/DataFrames in Spark using Scala and Python and applied several transformation logics to load data from the Hadoop Data Lake to Cassandra DB.
  • Exported the analyzed data to the NoSQL Database using HBase for visualization and to generate reports for the Business Intelligence team using SAS.
  • Used various HBase commands, generated different datasets as per requirements, and provided access to the data when required using grant and revoke.
  • Created Hive tables as per requirement as internal or external tables, intended for efficiency.
  • Developed MapReduce programs for the files generated by Hive query processing to generate key-value pairs and upload the data to the NoSQL database HBase.
  • Implemented installation and configuration of multi-node cluster on the cloud using Amazon Web Services (AWS) on EC2.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Worked on tuning Hive and Pig to improve performance and resolved performance issues in both kinds of scripts.
  • Worked with Elastic MapReduce (EMR) and set up environments on AWS EC2 instances.
  • Developed various data connections from data sources to SSIS and Tableau Server for report and dashboard development.
  • Involved in unit testing, interface testing, system testing and user acceptance testing of the workflow tool.
  • Used JIRA for bug tracking and Git for version control.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.

Environment: Apache Hadoop 3.0, AWS, MLlib, MySQL, Kafka, HDFS 1.2, Hive 2.3, Pig 0.17, MapReduce, Flume 1.8, Cloudera, Oozie, UNIX, Oracle 12c, Tableau 7, Git.

Confidential, Pleasanton, CA

Big Data Engineer

Responsibilities:

  • Implemented a POC comparing Spark with Hive on big data sets by performing aggregations and observing response times.
  • Worked with the Business Analyst, helped represent the business domain details, and prepared low-level and high-level documentation.
  • Created Hive tables and Sqoop jobs to import data from Oracle to HDFS, and scheduled them in Autosys by creating Oozie workflows.
  • Designed and developed applications that work on an AngularJS-based UI and RESTful APIs, Cassandra DB, AWS environment and security.
  • Imported data from different sources such as HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Developed a script that loads data into Spark RDDs and performs in-memory computation to generate the output response.
  • Involved in converting Hive queries into Spark transformations using Spark RDDs and Scala; gained good experience with Spark-Shell and Spark Streaming.
  • Developed a Spark Streaming script that consumes topics from the distributed messaging source Kafka and periodically pushes batches of data to Spark for real-time processing.
  • Developed Spark programs, scripts and UDFs using Spark SQL for aggregation operations as per requirements.
  • Used the Spark DataFrame API to perform analytics on Hive data and implemented checkpoints on RDDs to disk to handle job failures and aid debugging.
  • Involved in executing various Oozie workflows and automating parallel Hadoop MapReduce jobs.
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the HDFS and to run multiple Hive and Pig jobs.
  • Involved in converting Hive queries into Spark transformations using Spark SQL and Scala.
  • Automated workflows using shell scripts and Control-M jobs to pull data from various databases into Hadoop Data Lake.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Leveraged DevOps techniques and practices like Continuous Integration, Continuous Deployment, Test Automation, Build Automation and Test-Driven Development to enable the rapid delivery of end-user capabilities.

Environment: Map Reduce, HBase, HDFS, Hive, Pig, Java, Kafka, SQL, Hortonworks, Spark, Sqoop, Storm, Flume, AWS, Tableau, YARN, Oozie, Eclipse, Cloudera, Cassandra, Python, Scala, Shell Scripting, Hadoop, Oracle, UNIX, NoSQL.

Confidential, New York City, New York

Hadoop Developer

Responsibilities:

  • Involved in importing and exporting data between the Hadoop Data Lake and relational systems like Oracle, MySQL, DB2, Informix and Teradata using Sqoop.
  • Involved in ingesting data into Cassandra and consuming the ingested data from Cassandra into the Hadoop Data Lake.
  • Involved in the process of Cassandra data modelling and building efficient data structures.
  • Worked on creating Kafka topics and partitions and writing custom partitioner classes.
  • Imported Avro files using Apache Kafka and did some analytics using Spark in Scala.
  • Developed a script that loads data into Spark RDDs and performs in-memory computation to generate the output response.
  • Involved in migrating MapReduce jobs to RDDs (Resilient Distributed Datasets) and creating Spark jobs for better performance.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume.
  • Involved in executing various Oozie workflows and automating parallel Hadoop MapReduce jobs.
  • Enhanced Hive query performance using Tez for Customer Attribution datasets.
  • Involved in loading data from the UNIX file system to HDFS and also responsible for writing generic scripts in UNIX.
  • Transferred and loaded datasets from Hive tables to Greenplum using YAML.
  • Hands-on experience in creating analytic, attribute and calculation views in SAP HANA.
  • Involved in using a SolrCloud implementation to provide real-time search capabilities on the repository with terabytes of data.
  • Involved in bug fixing and 24/7 production support for running processes.
  • Worked with vendor and client support teams to help resolve critical production issues based on SLAs.
  • Involved in restarting failed Hadoop jobs in the production environment.
  • Created Talend mappings to populate the data into dimension and fact tables.
  • Developed jobs in Talend to move inbound files to the vendor server location on a monthly, weekly and daily frequency.
  • Involved in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
  • Generated various marketing reports using Tableau with Hadoop as a source for data.
  • Also took an active part as a Release Engineer in releasing the code to the CERT and production environments.

Environment: Pivotal, MapReduce, HDFS, Hive, Pig, Oozie, Cassandra, Sqoop, Apache Kafka, Storm, Impala, Linux, Talend, Tableau, Splunk, Solr, Jira, Confluence, GitHub, Bitbucket, Sourcetree, Jenkins.

Confidential

Build and Release Engineer

Responsibilities:

  • Release Engineer for a team that involved different development teams and multiple simultaneous software releases.
  • Designed and implemented fully automated server build management, monitoring and deployment using DevOps technologies like Chef.
  • Responsible for design and maintenance of the Subversion/Git and Stash repositories, views, and access control strategies.
  • Used Ant and Python scripts to automate the build and deployment process. Used Maven for a few modules.
  • Provided DevOps support for load-balanced and multi-regional server environments.
  • Monitored each service deployment and validated the services across all environments.
  • Deployed J2EE applications to application servers in an Agile continuous integration environment and automated the whole process. Built scripts using the Ant and Maven build tools in Jenkins and Sonar to promote builds from one environment to another.
  • Involved in building and deploying SCA modules in IBM WebSphere Process Server.
  • Worked on Java/J2EE deployments in WebSphere.
  • Prepared Migration logs for every release and maintained the data accuracy.
  • Maintained Defect Fix Deployments and documented the deployed files in the appropriate Environment Migration log.
  • Worked with change orders for the current release and implemented them in production.
  • Created branches and tags for each release and for particular environments.
  • Merged the branches after the Code Freeze.
  • Created the Deployment notes along with the Local SCM team and released the Deployment instructions to Application Support.

Environment: Java/J2EE, Eclipse, Chef, Ant, Maven, Jenkins, Git, Subversion, WebSphere Application Server (WAS), Apache, Perl, Bash, UNIX, Python.

Confidential

Build and Release Engineer

Responsibilities:

  • Set up continuous integration and formal builds using Bamboo with an Artifactory repository.
  • Involved in setting up JIRA as the defect tracking system and configured various workflows, customizations and plugins for the JIRA bug/issue tracker.
  • Integrated Maven with SVN to manage and deploy project-related tags.
  • Installed and administered the Artifactory repository to deploy the artifacts generated by Maven and to store the dependent JARs used during the build.
  • Mentored business areas on Subversion branching and merging strategies.
  • Resolved update, merge and password authentication issues in Bamboo and JIRA.
  • Partially involved in deploying WARs/EARs (backend) through the WebLogic Application Server console.
  • Performed setup of a clustered environment with WebLogic Application Server.
  • Wrote WLST scripts to deploy the WAR/EAR files to the target WebLogic Server.
  • Supported lead developers with Configuration Management issues.
  • Worked on creating the Software Configuration Management Plan.
  • Managed all bugs and changes into the production environment using the JIRA tracking tool.
  • Managed the entire release communication and release coordination during the project roll-out.
  • Involved in estimating the resources required for the project based on the requirements.

Environment: Java, Maven, Bamboo, Linux, WebLogic, Subversion, Shell scripting, WLST Scripting

Confidential

Linux Administrator

Responsibilities:

  • Primarily responsible for maintenance and support of Linux environments in DEV/QA, Staging and Production.
  • Eliminated shared administrative privileges among all members of IT and replaced them with a model using permissions delegated to additional administrative accounts.
  • Developed strategic plan to reduce material inventories and create a manageable IT infrastructure.
  • Monitored and tuned appropriate systems to ensure optimum levels of performance.
  • Implemented a Norton Ghost program that resulted in significant time savings for new system builds.
  • Initiated and created an automated self-healing data monitoring and retrieval system for archived data from multiple remote sites.
  • Worked on installation, configuration and maintenance of Red Hat and CentOS servers at multiple data centers.
  • Built hosts using Kickstart.
  • Scanned newly assigned LUNs on the servers, assigned them to volume groups, and extended the file systems using Red Hat volume manager.
  • Wrote Perl and shell scripts for monitoring and scheduling jobs.
  • Installed and managed VERITAS Volume Manager 3.5 (VxVM) on Solaris 9.
  • Understanding of SAN and NAS storage and load balancing.
  • Troubleshooting of Logical Volume Manager. Configuration of Red Hat Clustering.
  • Installation, configuration and administration of DNS, LDAP, NFS, NIS, NIS+ and Sendmail on Red Hat.
  • Administration and Configuration of Apache Web Server and SSL.
  • Created and maintained network users, user environment, directories, and security.

Environment: RHEL 5 & 6, Solaris 9, 10 & 11, IBM AIX 4 & 5, DNS, DHCP, LDAP, NFS, FTP, Git, Sendmail, JIRA, Puppet, Apache Tomcat, Shell, Veritas Volume Manager.
