
Sr. Big Data Engineer (AWS) Resume

Irvine, CA


  • 8+ years of hands-on experience as a Software Developer in the IT industry.
  • 3+ years of development experience with Big Data Hadoop clusters (HDFS, MapReduce frameworks), Hive, Pig, Talend, and Apache NiFi.
  • 2+ years of experience with cloud platforms (AWS).
  • 2+ years of experience working with Spark.
  • Experience working with the Akka framework.
  • Experience creating data flow graphs using Apache NiFi, and hands-on experience creating and following security protocols using NiFi and Accumulo.
  • Expertise in Spark Streaming (Lambda Architecture), Spark SQL, and tuning and debugging Spark clusters (Mesos).
  • Expertise in machine learning with MLlib using Python.
  • Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modeling and data mining, machine learning and advanced data processing. Experience optimizing ETL workflows.
  • Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS and other services of the AWS family.
  • Selecting appropriate AWS services to design and deploy an application based on given requirements.
  • Implementing cost control strategies.
  • Setting up and managing a CDN on Amazon CloudFront to improve site performance.
  • Expertise in working with MongoDB and Apache Cassandra.
  • Expertise in Java, J2EE, JavaScript, HTML, and JSP.
  • Solid programming knowledge of Scala and Python.
  • Experience working with Teradata and batch processing its data using distributed computing.
  • Good working experience with Hadoop data warehousing tools such as Hive and Pig, and with extracting data from these tools onto the cluster using Sqoop.
  • Developed Oozie workflow schedulers to run multiple Hive and Pig jobs that trigger independently based on time and data availability.
  • Experience handling a cluster when it is in safe mode.
  • Good knowledge of High-Availability, Fault Tolerance, Scalability, Database Concepts, System and Software Architecture, Security and IT Infrastructure.
  • Lead onshore & offshore service delivery functions to ensure end-to-end ownership of incidents and service requests.
  • Mentoring junior developers and keeping them up to date with current cutting-edge technologies such as Hadoop, Spark, and Spark SQL.
  • All the projects I have worked on are open source projects and have been tracked using JIRA.
  • Experience with Agile methodologies (Scrum).


Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Storm, and Flume.

Spark Streaming Technologies: Spark, Kafka, Storm

Scripting Languages: Cassandra, Python, Scala, Ruby on Rails and Bash.

Programming Languages: Java, SQL, JavaScript, HTML5, CSS3

Databases: Data warehouse, RDBMS, NoSQL (Certified MongoDB), Oracle.

Java/J2EE Technologies: Servlets, JSP (EL, JSTL, Custom Tags), JSF, Apache Struts, JUnit, Hibernate 3.x, Log4J, Java Beans, EJB 2.0/3.0, JDBC, RMI, JMS, JNDI.

Tools: Eclipse, JDeveloper, MS Visual Studio, Microsoft Azure HDInsight, Microsoft Hadoop cluster, JIRA.

Testing Tools: NetBeans, Eclipse.

Methodologies: Agile, UML, Design Patterns.

Operating Systems: Unix/Linux

Machine Learning Skills (MLlib): Feature Extraction, Dimensionality Reduction, Model Evaluation, Clustering.


Confidential, Irvine, CA

Sr. Big Data Engineer (AWS)


  • As a Sr. Big Data Engineer at Confidential, I work on datasets that show complete metrics for any type of table in any format. I produce these datasets using spark-submit, where I submit the application to Spark to run the job.
  • I usually code the application in Scala using IntelliJ.
  • I then use SBT to build a JAR file, which is submitted to Spark to start the spark-submit job.
  • Using Talend to make the data available in the cloud for the offshore team. Using the Last Processed Date as a timestamp, I run the job daily.
  • This automated job runs entirely on a YARN cluster.
  • Automating the data flow using NiFi, Accumulo, and ControlM.
  • The process runs automatically every day.
  • I usually set up the jobs to run automatically using ControlM.
  • I also work with Hive, Impala, and Sqoop.
  • When data is not available on our HDFS cluster, I import it from Netezza onto our HDFS cluster using Sqoop.
  • Transferred data from AWS S3 to AWS Redshift using Informatica.
  • Worked on Hive UDFs; due to security privilege restrictions, I had to stop the task partway through.
  • Worked on Spark SQL, where the task was to fetch the non-null data from two different tables and load it into a lookup table. The daily data is loaded into the lookup table incrementally and checked for duplicates.
  • I am the only person in production support for Spark jobs.
  • Worked on Spark Streaming with Kafka to submit jobs and run them live.
  • Handled many tables and millions of rows daily.
  • Experience in creating accumulators and broadcast variables in Spark.
  • Hands-on experience in visualizing the metrics data using Platfora.
  • Good working experience submitting Spark jobs that report data metrics used for data quality checking.
  • Designed and implemented a test environment on AWS.
  • Responsible for designing and configuring network subnets, route tables, association of network ACLs to subnets, and OpenVPN.
  • Responsible for account management, IAM management, and cost management.
  • Designed AWS CloudFormation templates to create VPCs, subnets, and NAT to ensure successful deployment of web applications and database templates.
  • Created S3 buckets, managed their policies, and used S3 and Glacier for storage and backup on AWS.
  • Experience managing IAM users: creating new users, granting limited access as needed, and assigning roles and policies to specific users.
  • Act as technical liaison between customer and team on all AWS technical aspects.
  • Involved in ramping up the team by coaching team members.

Environments: AWS, Hive, Netezza, Informatica, Talend, AWS Redshift, AWS S3, Apache NiFi, Accumulo, ControlM.

Confidential, CA

Sr. Big Data/Spark Developer

  • Working with two different datasets, one using HiveQL and the other using Pig Latin.
  • Experience moving raw data between different systems using Apache NiFi.
  • Automating the data flow process using NiFi.
  • Hands-on experience tracking the data flow in real time using NiFi.
  • Writing MapReduce code in Python to remove certain security issues in the data.
  • Synchronizing both unstructured and structured data using Pig and Hive per the business prospectus.
  • Used Pig Latin on the client-side cluster and HiveQL on the server-side cluster.
  • Importing the complete data from the RDBMS to the HDFS cluster using Sqoop.
  • Creating external tables and moving data into them from managed tables.
  • Performing subqueries in Hive.
  • Partitioning and bucketing the imported data using HiveQL.
  • Partitioning dynamically using the dynamic-partition insert feature.
  • Moving this partitioned data into different tables as per business requirements.
  • Invoking external UDF/UDAF/UDTF Python scripts from Hive using the Hadoop Streaming approach, which is supported by Ganglia.
  • Setting up the work schedule using Oozie.
  • Identifying the errors in the logs and rescheduling/resuming the job.
  • Able to handle the whole dataset using HWI (Hive Web Interface) through the Cloudera Hadoop distribution UI.
  • Deployed the Big Data Hadoop application using Talend on AWS (Amazon Web Services) and also on Microsoft Azure.
  • Involved in designing and developing enhancements to product features.
  • Involved in designing and developing enhancements of CSG using AWS APIs.
  • Enhanced the existing product with new features such as user roles (Lead, Admin, Developer), ELB, Auto Scaling, S3, CloudWatch, CloudTrail, and RDS scheduling.
  • Created monitors, alarms and notifications for EC2 hosts using CloudWatch, CloudTrail and SNS.
  • Involved in designing the SRS with activity flow diagrams using UML.
  • Employed Agile methodology for project management, including: tracking project milestones; gathering project requirements and technical closures; planning and estimation of project effort; creating important project related design documents and identifying technology related risks and issues.
  • As per business requirements, we use Talend to integrate the data in the cloud and make it accessible to the offshore medical team.
  • Working with Informatica 9.5.1 and Informatica 9.6.1 Big Data edition. Scheduling the jobs.
  • After transformation, the data is moved to the Spark cluster, where it is set to go live in the application using Spark Streaming and Kafka.
  • Created RDDs in Spark.
  • Extracting data from the data warehouse (Teradata) into Spark RDDs.
  • Experience with Spark using Scala/Python.
  • Working on stateful transformations in Spark Streaming.
  • Worked on batch and real-time data processing with Spark Streaming using the Lambda architecture.
  • Good hands-on experience loading data into Hive from Spark RDDs.
  • Worked on Spark SQL UDFs and Hive UDFs.
  • Worked with Spark accumulators and broadcast variables.
  • Using decision trees for both classification and regression, including model evaluation.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Supported code/design analysis, strategy development and project planning.

Environments: HDFS cluster, Hive, Apache NiFi, Pig, Sqoop, Oozie, MapReduce, Talend, Python.

Confidential, Palo Alto, CA

Big Data Hadoop Developer

  • My responsibility in this project was to create an e-commerce application as per business requirements.
  • The application was deployed using JSON and AngularJS with MongoDB, a NoSQL database.
  • Performed transformations using Cassandra Query Language (CQL).
  • Data was ingested into the application using Hadoop technologies such as Pig and Hive.
  • Feedback data was retrieved using Sqoop.
  • Became a major contributor and potential committer of an important open source Apache project.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems. Managed and reviewed Hadoop log files.
  • Troubleshooting; managing and reviewing data backups and Hadoop log files.
  • Experience in handling large data using Teradata Aster.
  • Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Defined Oozie Job flows.
  • Loaded log data directly into HDFS using Flume.
  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
  • Followed standard backup policies to ensure high availability of the cluster.

Environments: Cassandra, HDFS, MongoDB, Zookeeper, Oozie, Pig.

Confidential, Louisville, KY

Senior Java-J2EE Developer

  • Analyzed the requirements and designed class diagrams, sequence diagrams using UML and prepared high level technical documents.
  • Designed and developed UI screens with XSLT and JSF (MVC) to provide interactive screens to display data.
  • Used parsers like SAX and DOM for parsing XML documents and used XML transformations using XSLT.
  • Developed the business layer logic and implemented EJB session beans.
  • Implemented Java and J2EE Design patterns like Business Delegate and Data Transfer Object (DTO), Data Access Object and Service Locator.
  • Used Apache POI-HSSF for generating reports in MS Excel and iText for generating PDF reports.
  • Used ANT automated build scripts to compile and package the application and implemented Log4j for the project.
  • Involved in a project built on Hibernate and the Spring Framework.
  • Involved in documentation, review, and analysis, and fixed post-production issues.
  • Working knowledge of socket programming.
  • Maintained the Production and the Test systems.
  • Worked on bug fixes and enhancements for change requests.
  • Developed an interface using Spring Batch.
  • Extensively used the Hibernate Query Language to retrieve data from the database and processed the data in the business methods.
  • Developed the application in the Eclipse IDE and deployed it on WebSphere on the server side.
  • Developed pages using JSP, JSTL, Spring tags, jQuery, and JavaScript, and used jQuery to make AJAX calls.
  • Used the Jenkins continuous integration tool for deployments.
  • Performance tuning for Oracle RDBMS using Explain Plan and hints.
  • Developed several REST web services supporting both XML and JSON to perform tasks such as demand response management.
  • Used Servlets, Java, and Spring for server-side business logic.

Environment: Windows XP, BEA WebLogic 9.1, Apache Web server, ArcGIS Server 9.3, ArcSDE 9.2, Java Web ADF for ArcGIS Server 9.3, Enterprise Java Beans (EJB), Java/J2EE, XSLT, JSF, JSP, POI-HSSF, iText, PuTTY.


Java-J2EE Developer

  • Developed a web application using the Struts and Spring frameworks.
  • Developed user interfaces using JSP, HTML and CSS.
  • Used Eclipse as the IDE to develop the application. Created web.xml, struts-config.xml, and validation.xml files to integrate all the components in the Struts framework.
  • Worked heavily with Struts tags and used Struts as the front controller of the web application.
  • Implemented the Struts framework according to the MVC design pattern.
  • Used the Struts framework to generate forms and actions for validating user request data.
  • Developed server-side validation checks using Struts validators and JavaScript validations.
  • Developed and implemented data validations with JSPs and Struts custom tags.
  • Developed applications which access the database with JDBC to execute queries, prepared statements, and procedures.
  • Developed programs to manipulate the data and perform CRUD operations on request to the database.
  • Worked on developing use cases, class diagrams, sequence diagrams, and data models.
  • Coded SQL, PL/SQL, and views using IBM DB2 as the database.
  • Worked on issues while converting Java to AJAX.
  • Supported development of the business tier using stateless session beans.
  • Used GWT to build screens and make remote procedure calls to middleware.
  • Used ClearCase for source code control and JUnit for unit testing.
  • Reviewed code and performed integrated module testing.

Environment: Windows XP, Java/J2EE, Struts, JUnit, Servlets, JavaScript, SQL, HTML, XML, Eclipse, Spring Framework.


Java Developer

  • Gathered specifications for the Library site from different departments and users of the services.
  • Assisted in proposing suitable UML class diagrams for the project.
  • Wrote SQL scripts to create and maintain the database, roles, users, tables, views, procedures and triggers in Oracle.
  • Designed and implemented the UI using HTML, JSP, JavaScript and Java.
  • Implemented multi-threading functionality using the Java Threading API.
  • Extensively worked on IBM WebSphere 6.0 while implementing the project.
  • Developed the UI screens using HTML5, DHTML, XML, JavaScript, Ajax, jQuery custom tags, JSTL DOM Layout and CSS3.
  • Building skills in the following technologies: WebLogic, Spring Batch, Spring, Java.
  • Used the XML SAX parser to parse an XML file containing simulated test data.
  • Designed and developed a REST-based service by constructing URIs, using JAX-RS annotations and the Jersey implementation.
  • Used JUnit and the EasyMock framework for unit testing of the application and implemented the Test Driven Development (TDD) methodology.
  • Developed integration techniques using JMS along with Mule ESB to integrate different applications.
  • Used Oracle as the backend database on Windows. Involved in development of stored procedures, functions and triggers.
  • Involved in creating single-page applications using AngularJS components and directives, and implemented custom directives as reusable components.

Environments: SQL, HTML, JSP, JavaScript, Java, IBM WebSphere 6.0, DHTML, XML, Ajax, jQuery custom tags, JSTL DOM Layout and CSS3.
