
Sr. Big Data Engineer Resume

Irvine, CA


  • 5+ years of hands-on experience as a Software Developer in the IT industry.
  • 3+ years of experience working with distributed systems in the Big Data/Hadoop ecosystem (HDFS, MapReduce, Hive, Pig, Sqoop, HBase).
  • 2+ years of experience working with the Spark stack (Spark Streaming, Spark SQL).
  • 2+ years of experience with cloud platforms (AWS).
  • Experience creating statistical graphs and dashboards using Platfora/Tableau.
  • Experience automating data ingestion from disparate data sources using Apache NiFi.
  • Knowledge of machine learning algorithms.
  • Familiar with data architecture, including data ingestion, pipeline design, data modeling, data mining, and advanced data processing.
  • Experience in tuning and optimizing Hadoop/Spark and ETL workflows for high availability.
  • Hands-on experience with Amazon EC2, Amazon S3, EMR, Elastic Load Balancing, Kinesis, CloudWatch, and other services in the AWS family.
  • Selecting appropriate AWS services to design and deploy an application based on given requirements.
  • Implementing cost control strategies.
  • Set up and managed a CDN on Amazon CloudFront to improve site performance.
  • Expertise working with NoSQL databases such as MongoDB and Apache Cassandra.
  • Expertise in Java, J2EE, JavaScript, HTML, and JSP.
  • Solid programming knowledge of Scala and Python, including lambda functions.
  • Developed Oozie workflow schedulers.
  • Experience handling clusters in safe mode.
  • Good knowledge of High-Availability, Fault Tolerance, Scalability, Database Concepts, System and Software Architecture, Security and IT Infrastructure.
  • Recommended changes based on web data analysis that increased business from 35% to 80%.
  • Led onshore & offshore service delivery functions to ensure end-to-end ownership of incidents.
  • Ability to handle multiple competing priorities in an agile environment.
  • Experience working in a cross-functional Agile team.
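The Python lambda-function experience noted above can be illustrated with a minimal, self-contained sketch; the event sizes and the kilobyte conversion are made up for illustration:

```python
from functools import reduce

# Hypothetical raw event sizes (bytes) pulled from a log feed
event_sizes = [512, 0, 2048, -1, 1024]

# Filter out invalid sizes, convert to kilobytes, and total them,
# using only lambda functions as the transformation logic.
valid = list(filter(lambda s: s > 0, event_sizes))
in_kb = list(map(lambda s: s / 1024, valid))
total_kb = reduce(lambda a, b: a + b, in_kb)

print(total_kb)  # 3.5
```

The same map/filter/reduce shape carries over to Spark RDD transformations, where these lambdas would be passed to `rdd.filter`, `rdd.map`, and `rdd.reduce`.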


Big Data Ecosystems: MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Oozie, Storm, and Flume.

Spark Streaming Technologies: Spark, Kafka, Storm, Flume.

Scripting Languages: Python, Scala, Bash.

Programming Languages: Java.

Databases: Oracle, Cassandra, MongoDB (certified)

IDEs: NetBeans, Eclipse.

Methodologies: Agile, UML, Design Patterns.

Operating Systems: Unix/Linux

Machine Learning Skills (MLlib): Feature Extraction, Dimensionality Reduction, Model Evaluation, Clustering.


Confidential, Irvine, CA

Sr. Big Data Engineer

Environment: Analytics, AWS, Hive, Netezza, Informatica, AWS Redshift, AWS S3, Apache NiFi, Control-M (automation tool), Spark Streaming, Kafka.


  • Architecting and pipelining the data workflow from end to end.
  • Analyzing web data and giving recommendations to the VP for business improvement.
  • Ingesting raw data onto the HDFS cluster from different data sources.
  • Listening to events created on the web server using Kafka.
  • Handling DStreams (stateful transformations) in Spark Streaming.
  • Querying to get metrics from both structured and unstructured data.
  • Launching Spark jobs using spark-submit.
  • Cleansing and checking the quality of the data using Hive.
  • Automated the data flow using NiFi/Control-M.
  • Worked on Hive, Impala, Sqoop.
  • Retrieving data from the source DB Netezza.
  • Transferred data from AWS S3 using the Informatica tool.
  • Using AWS Redshift for storing the data on cloud.
  • Working on Hive UDFs.
  • Working on Spark SQL to check for pirated data.
  • Sole person responsible for Spark jobs in production support.
  • Responsible for streaming petabytes of data from different sources on an hourly basis.
  • Experienced in creating accumulators and broadcast variables in Spark.
  • Submitted Spark jobs that produce data metrics used for data quality checking.
  • Hands-on experience in visualizing the metrics data using Platfora/Tableau.
  • Creating graphs and dashboards from the cleansed metrics data using Tableau.
  • Designed and implemented test environment on AWS.
  • Responsible for business improvement and Cost management.
  • Designed AWS CloudFormation templates to create and ensure successful deployment of web applications and database templates.
  • Created S3 buckets and managed bucket policies; utilized S3 and Glacier for storage and backup on AWS.
  • Managed IAM users: created new users, granted limited access as needed, and assigned roles and policies to specific users.
  • Acted as technical liaison between the customer and the team on all AWS technical aspects.
  • Involved in ramping up the team by coaching other team members.
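The stateful DStream handling described above follows the shape of Spark Streaming's `updateStateByKey`: each micro-batch of Kafka events folds into a running per-key total. A plain-Python simulation of that contract (the event names and batch contents are invented; this is a sketch, not the production pipeline):

```python
from collections import Counter

def update_state(new_values, running_count):
    """Mimics Spark Streaming's updateStateByKey update function:
    fold counts from the current micro-batch into the running total."""
    return (running_count or 0) + sum(new_values)

# Simulated micro-batches of web-server events arriving from Kafka
batches = [
    ["click", "view", "click"],
    ["view", "click"],
]

state = {}
for batch in batches:
    counts = Counter(batch)          # per-batch aggregation
    for key, n in counts.items():
        state[key] = update_state([n], state.get(key))

print(state)  # {'click': 3, 'view': 2}
```

In the real job, `update_state` would be passed to `dstream.updateStateByKey(...)` with checkpointing enabled, and the driver loop above is what the micro-batch scheduler does for you.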

Confidential, Palo Alto, CA

Big Data Hadoop Engineer.

Environment: Hadoop, Hive, Apache NiFi, Pig, Sqoop, Oozie, MapReduce, Python.


  • Working with different datasets.
  • Moving raw data between different systems in an automated manner using Apache NiFi.
  • Tracking the data flow in real time using open source tools.
  • Importing the complete data from RDBMS to HDFS cluster using Sqoop.
  • Creating external tables and moving data into them from managed tables.
  • Performing complex queries in Hive.
  • Partitioning the data using HiveQL.
  • Moving the partitioned data into different tables as per business requirements.
  • Invoking external UDF/UDAF/UDTF Python scripts from Hive using the Hadoop Streaming approach, monitored via Ganglia.
  • Creating batch flow schedulers using Oozie.
  • Identifying errors in the logs and rescheduling/resuming jobs.
  • Handling data through HWI (Hive Web Interface) in the Cloudera Hadoop distribution UI.
  • Deployed the Big Data Hadoop application using Talend on cloud AWS.
  • Involved in Designing and Developing Enhancements product features.
  • Involved in Designing and Developing Enhancements of CSG using AWS APIS.
  • Enhanced the existing product with new features such as user roles (Lead, Admin, Developer), ELB, Auto Scaling, S3, CloudWatch, CloudTrail, and RDS scheduling.
  • Created monitors, alarms, and notifications for EC2 hosts using CloudWatch, CloudTrail, and SNS.
  • Employed Agile methodology for project management, including tracking project milestones, gathering project requirements and technical closures, planning and estimating project effort, creating important project-related design documents, and identifying technology-related risks and issues.
  • Mapping the input webserver data with Informatica 9.5.1 and Informatica 9.6.1 Big Data edition.
  • After the transformation is done, the transformed data is moved to the Spark cluster, where it goes live to the application via Spark Streaming and Kafka.
  • Created RDDs in Spark.
  • Extracted data from the data warehouse (Teradata) into Spark RDDs.
  • Experience on Spark with Scala/Python.
  • Working on Stateful Transformations in Spark Streaming.
  • Worked on batch processing and real-time data processing in Spark Streaming using Python lambda functions.
  • Worked on Spark SQL UDFs and Hive UDFs.
  • Worked with Spark accumulators and broadcast variables.
  • Using decision trees as models for both classification and regression.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Supported code/design analysis, strategy development and project planning.
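The external Python UDFs invoked from Hive via Hadoop Streaming (mentioned in the bullets above) follow a simple contract: read tab-separated rows on stdin, write tab-separated rows on stdout. A hedged sketch with made-up column names (`user_id`, a raw amount in cents normalized to dollars), demoed against an in-memory stream:

```python
import io

def transform(line):
    """Hypothetical row transform for a Hive TRANSFORM clause:
    input columns (user_id, raw_amount_cents), output (user_id, amount_usd)."""
    user_id, raw_amount = line.rstrip("\n").split("\t")
    return f"{user_id}\t{float(raw_amount) / 100:.2f}"

def main(stdin, stdout):
    # Hadoop Streaming contract: tab-separated rows in, rows out.
    for line in stdin:
        stdout.write(transform(line) + "\n")

# Demo with an in-memory stream instead of real stdin:
out = io.StringIO()
main(io.StringIO("u1\t1999\nu2\t250\n"), out)
print(out.getvalue(), end="")
```

In production this script would be wired to real stdin/stdout and invoked from Hive with something like `SELECT TRANSFORM(user_id, raw_amount_cents) USING 'python udf.py' AS (user_id, amount_usd) FROM sales;` (table and script names are hypothetical).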


Data Warehouse Consultant

Environment: Cassandra, HDFS, MongoDB, Zookeeper, Oozie, Pig, Informatica.


  • My responsibility in this project was to create an e-commerce application as per business requirements.
  • The application was deployed using JSON and AngularJS, with MongoDB as the data store.
  • Worked with data profiling on different datasets using Informatica.
  • Ensured the developed application did not violate business rules and regulations.
  • Made sure that the data was cleansed properly as per business requirements: loaded the raw numbers into a stat package, ran basic descriptive statistics, and reported the output in a summary file or a simple data visualization.
  • Working on Parquet data files.
  • Data is ingested into this application using Hadoop technologies such as Pig and Hive.
  • Feedback in the form of emails is retrieved using Sqoop.
  • Became a major contributor and potential committer of an important open source Apache project.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Troubleshot issues; managed and reviewed data backups and Hadoop log files.
  • Experience in handling large data using Teradata Aster.
  • Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Defined Oozie Job flows.
  • Loaded log data directly into HDFS using Flume.
  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
  • Followed standard Back up policies to make sure the high availability of cluster.
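The MapReduce work above (parsing raw data and storing refined records in partitioned EDW tables) can be sketched as a pure-Python mapper that routes parsed records into date partitions; the log format and the `dt=` partition scheme are assumptions for illustration, not the actual job:

```python
from collections import defaultdict

def parse_line(line):
    """Parse a hypothetical raw log line: '<ISO timestamp> <level> <message>'."""
    ts, level, msg = line.split(" ", 2)
    return ts[:10], level, msg  # the partition key is the date prefix

def partition(lines):
    """Group refined records into date partitions, the way a Hive
    table partitioned by dt would lay them out in the EDW."""
    parts = defaultdict(list)
    for line in lines:
        dt, level, msg = parse_line(line)
        parts[f"dt={dt}"].append((level, msg))
    return dict(parts)

logs = [
    "2016-03-01T10:00:00 INFO started",
    "2016-03-01T11:30:00 WARN slow query",
    "2016-03-02T09:15:00 INFO started",
]
print(partition(logs))
```

In the real pipeline the grouping step is the MapReduce shuffle: the mapper emits the partition key, and the framework delivers all records for one `dt=` value to the same reducer, which writes that partition's file.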


Java-J2EE Developer

Environment: Apache Web server, Java Beans(EJB), Java/J2ee, JSP, Putty.


  • Analyzed the requirements and designed class diagrams, sequence diagrams using UML and prepared high level technical documents.
  • Developed the business layer logic and implemented EJB session beans.
  • Implemented Java and J2EE design patterns such as Business Delegate, Data Transfer Object (DTO), Data Access Object, and Service Locator.
  • Used ANT automated build scripts to compile and package the application, and implemented Log4j for the project.
  • Involved in a project built on the Hibernate and Spring frameworks.
  • Involved in documentation, review, analysis and fixed post production issues.
  • Working knowledge of socket programming.
  • Maintained the Production and the Test systems.
  • Worked on bug fixing and enhancements on change requests.
  • Developed an interface using Spring Batch.
  • Extensively used Hibernate Query Language to retrieve data from the database and process it in the business methods.
  • Developed the application in the Eclipse IDE and deployed it on the WebSphere server.
  • Developed pages using JSP, JSTL, Spring tags, jQuery, and JavaScript; used jQuery to make AJAX calls.
  • Used the Jenkins continuous integration tool to do the deployments.
  • Performance tuning for Oracle RDBMS using Explain Plan and hints.
  • Developed several REST web services supporting both XML and JSON to perform task such as demand response management.
  • Used Servlet, Java and Spring for server side business logic.
