
Sr. Big Data Developer Resume

Wallingford, CT


  • Over 12 years of IT experience, including 6 years as a Big Data Engineer/Data Engineer and Data Analyst, designing, developing, and implementing data models for enterprise-level applications and systems.
  • Strong experience with Big Data and Hadoop technologies, with excellent knowledge of the Hadoop ecosystem: Pig, Hive, Spark, Sqoop, Impala, Kafka, Flume, Storm, ZooKeeper, Oozie, MapReduce (MR1), YARN (MR2), HDFS, HBase, Hue.
  • Experience in building highly reliable, scalable Big Data solutions on the Cloudera, Hortonworks, and AWS EMR Hadoop distributions.
  • Experience in transferring data from AWS S3 to AWS Redshift using Informatica.
  • Hands-on experience in Normalization (1NF, 2NF, 3NF and BCNF) and Denormalization techniques for effective and optimum performance in OLTP and OLAP environments.
  • Expertise in moving structured schema data between Pig and Hive using HCatalog.
  • Expert in creating and designing data ingest pipelines using technologies such as Apache Storm-Kafka.
  • Good Knowledge in creating processing data pipelines using Kafka and Spark Streaming.
  • Experienced in data ingestion using Sqoop, Storm, Kafka and Apache Flume.
  • Experienced working on AWS environment (cloud services using S3 storage, Datastax enterprise package with Spark and Cassandra NoSQL database).
  • Experience in Hadoop administration activities such as installation of Apache Hadoop and its ecosystem components (Pig, Hive, Sqoop, Flume, HBase, MongoDB, Cassandra, Apache Spark, Kafka), along with Hadoop cluster configuration per business requirements; analyzed long-running and failed jobs and made the necessary changes to resolve issues.
  • Involved in Cassandra database design, decisions for choosing right table partition, clustering keys for the efficient storage and better performance while reading/lookup to the table.
  • Good understanding of NoSQL Data bases and hands on work experience in writing applications on NoSQL databases like Cassandra and MongoDB.
  • Hands-on experience with Data Lake clusters managed by Hortonworks Ambari on AWS, using EC2 and S3.
  • Knowledge of Spark code and SparkSQL for testing and processing of data using PySpark.
  • Knowledge on cloud services Amazon web services (AWS).
  • Good at analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Expertise in writing Hive and Pig programs to validate and cleanse data in HDFS obtained from heterogeneous data sources, making it suitable for analysis.
  • Working knowledge of Amazon’s Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Experienced in implementing scheduler using Oozie, Airflow, Crontab and Shell scripts.
  • Hands-on expertise in row-key and schema design with NoSQL databases like MongoDB 3.0.1, HBase, Cassandra and DynamoDB (AWS).
  • Preparing JIL scripts for scheduling the workflows using Autosys and automated jobs with Oozie.
  • Exposure to Data Lake implementation using Apache Spark; developed data pipelines and applied business logic using Spark.
  • Extensively worked on Spark streaming and Apache Kafka to fetch live stream data.
  • Used Scala and Python to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Experience in developing data pipeline using Pig, Sqoop, and Flume to extract the data from weblogs and store in HDFS. Accomplished developing Pig Latin Scripts and using Hive Query Language for data analytics.
  • Developed customized UDFs and UDAFs in java to extend Pig and Hive core functionality.
  • Good experience with both Job Tracker (MapReduce 1) and YARN (MapReduce 2).
  • Experience in managing and reviewing Hadoop Log files generated through YARN.
  • Experience in using Apache Solr for search applications.
  • Hands-on with Agile (Scrum) and Waterfall models, along with automation and enterprise tools like Jenkins, Chef, JIRA, and Confluence, plus Git for version control. Excellent Java development skills using Java 6/7/8, J2EE, J2SE, Servlets, JUnit, JSP, JDBC.
  • Experienced in migrating data from different sources using the pub-sub model in Redis and Kafka producers/consumers, and preprocessing data using Storm and Spark.
  • Experienced in writing ad-hoc queries using Cloudera Impala, including Impala analytical functions. Good understanding of MPP databases such as HP Vertica.
  • Experience in validating and cleansing data using Pig Latin operations and UDFs. Hands-on experience in developing Pig macros.
  • Worked on the ELK stack (Elasticsearch, Logstash, Kibana) for log management.
  • Good experience in optimizing MapReduce algorithms using Mappers, Reducers, combiners and partitioners to deliver the best results for the large datasets.
  • Extracted data from various data source including OLE DB, Green plum, Excel, Flat files and XML.
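The mapper/combiner/partitioner/reducer flow mentioned above can be sketched in plain Python. This is an in-process illustration of Hadoop's MapReduce semantics, not actual Hadoop code, and the word-count data is hypothetical:

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Emit (word, 1) for every word in an input line.
    for word in line.lower().split():
        yield word, 1

def combiner(pairs):
    # Pre-aggregate map output locally to cut shuffle volume.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return counts.items()

def partitioner(word, num_reducers):
    # Route each key to a reducer, as Hadoop's HashPartitioner does.
    return hash(word) % num_reducers

def reducer(word, values):
    return word, sum(values)

def run_job(lines, num_reducers=2):
    # Map + combine per input split, shuffle by partition, then reduce.
    shuffled = defaultdict(list)
    for line in lines:
        for word, n in combiner(mapper(line)):
            shuffled[partitioner(word, num_reducers)].append((word, n))
    results = {}
    for part in shuffled.values():
        for word, group in groupby(sorted(part), key=itemgetter(0)):
            k, v = reducer(word, (n for _, n in group))
            results[k] = v
    return results

counts = run_job(["big data big hadoop", "hadoop big"])
# counts == {"big": 3, "data": 1, "hadoop": 2}
```

The combiner is the optimization the summary refers to: it collapses per-split duplicates before the shuffle, which is where most MapReduce tuning gains come from.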


Big Data Technology: HDFS, MapReduce, HBase, Pig, Hive, Solr, Sqoop, Flume, MongoDB, Cassandra, Puppet, Oozie, Zookeeper, Spark, Kafka, Talend

Hadoop Distribution: Cloudera, Hortonworks, IBM Big Insights

Cloud Computing Service: AWS (Amazon Web Services)

Languages: Python, Java (5/6/7/8), C/C++, Swing, SQL, PL/SQL, HTML, CSS, i18n, l10n, DHTML, XML, XSD, XHTML, XSL, XSLT, XPath, XQuery, UML, JavaScript, AJAX (DWR), jQuery, Dojo, ExtJS, Shell Scripts, Perl

Development Framework/IDE: RAD 8.x/7.x/6.0, IBM WebSphere Integration Developer 6.1, WSAD 5.x, Eclipse Galileo/Europa/3.x/2.x, MyEclipse 3.x/2.x, NetBeans 7.x/6.x, IntelliJ 7.x, Workshop 8.1/6.1, Adobe Photoshop, Adobe Dreamweaver, Adobe Flash, Ant, Maven, Rational Rose, RSA, MS Visio, OpenMake Meister

Programming Languages: C, C++, COBOL, Java, J2EE, VB.Net, C#, VB, PHP

Databases: Oracle 11g/10g/9i/8i, DB2 9.x/8.x, MS SQL Server 2008/2005/2000, MySQL

NoSQL: HBase, Cassandra, MongoDB

Operating Systems: Windows XP, 2K, MS-DOS, Linux (Red Hat), Unix (Solaris), HP UX, IBM AIX

Reporting Tools: Tableau, Datameer

Version Control: CVS, SourceSafe, ClearCase, Subversion

Monitoring Tools: Embarcadero J Optimizer 2009, TPTP, IBM Heap Analyzer, Wily Introscope, JMeter

Other: JBoss Drools 4.x, REST, IBM Lotus WCM, MS ISA, CA SiteMinder, BMC WAM Mingle


Confidential, Wallingford, CT

Sr. Big Data Developer


  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop, using Spark Context, Spark SQL, and DataFrames.
  • Used Sqoop to import data from Relational Databases like MySQL, Oracle.
  • Involved in importing structured and unstructured data into HDFS.
  • Responsible for fetching real time data using Kafka and processing using Spark and Scala.
  • Worked on Kafka to import real time weblogs and ingested the data to Spark Streaming.
  • Working with cloud services like Amazon Web Services (AWS) and involved in ETL, Data Integration and Migration.
  • Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
  • Used Python libraries such as pandas, scikit-learn, and NumPy.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Worked on Hive to implement Web Interfacing and stored the data in Hive tables.
  • Migrated Map Reduce programs into Spark transformations using Spark and Scala.
  • Experienced with Spark Context, Spark-SQL, Spark YARN.
  • Collect and aggregate large amounts of web log data from different sources such as webservers, mobile and network devices using Apache Kafka and stored the data into HDFS for analysis.
  • Developed multiple Kafka producers and consumers from scratch to implement the organization's requirements.
  • Responsible for creating, modifying topics (Kafka Queues) as and when required with varying configurations involving replication factors, partitions and TTL.
  • Involved in Data Querying and Summarization using Hive and Pig and created UDF’s, UDAF’s and UDTF’s.
  • Extensively involved in performance tuning of the HiveQL by performing bucketing on large Hive tables.
  • Worked on Cloudera distribution and deployed on AWS EC2 Instances.
  • Extensively used ZooKeeper as a backup server and job scheduler for Spark jobs.
  • Developed equivalent PySpark code for existing SAS code to extract summary insights from the Hive tables.
  • Expert in implementing advanced procedures like text analytics and processing using the in-memory computing capabilities like Apache Spark written in Scala.
  • Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Involved in analysis, design, testing phases and responsible for documenting technical specifications.
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
  • Configured workflows that involve Hadoop actions using Oozie.
  • Coordinated with SCRUM team in delivering agreed user stories on time for every sprint.
  • Deployed the project on Amazon EMR with S3 connectivity for setting a backup storage.
  • Developed data pipeline using Pig, Sqoop, and Flume to extract the data from weblogs and store in HDFS. Accomplished developing Pig Latin Scripts and using Hive Query Language for data analytics.
  • Wrote multiple MapReduce jobs using the Java API, Pig, and Hive for data extraction, transformation, and aggregation from multiple file formats, including Parquet, Avro, XML, JSON, CSV, and ORC, and compression codecs such as gzip, Snappy, and LZO.
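The Kafka topic partitioning and Hive bucketing described above both rest on the same idea: hashing a record key to a fixed bucket. A minimal pure-Python sketch (zlib.crc32 stands in for Kafka's murmur2 or Hive's hash; the event data is hypothetical):

```python
import zlib

def bucket_for(key, num_buckets):
    # Deterministic hash, so the same key always lands in the
    # same bucket/partition across runs and machines.
    return zlib.crc32(key.encode("utf-8")) % num_buckets

def bucketize(records, key_fn, num_buckets):
    # Distribute records across buckets by their key's hash.
    buckets = {i: [] for i in range(num_buckets)}
    for rec in records:
        buckets[bucket_for(key_fn(rec), num_buckets)].append(rec)
    return buckets

events = [{"user": u} for u in ["alice", "bob", "alice", "carol"]]
buckets = bucketize(events, lambda r: r["user"], num_buckets=4)
# All events for one user share a bucket, which is what enables
# per-key ordering in Kafka and map-side joins on bucketed Hive tables.
```

Changing `num_buckets` reshuffles every key, which is why Kafka partition counts and Hive bucket counts are chosen up front.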

Environment: Hadoop, Amazon Web Services (AWS), HDFS, MapReduce, Hive, Sqoop, Apache Kafka, Zookeeper, Spark, HBase, Python, Shell Scripting, Oozie, Maven, Hortonworks

Confidential, Portland, OR

Lead Hadoop Developer


  • Worked on HiveQL queries and Pig Latin scripts for the transformations.
  • Managing and reviewing Hadoop Log files and optimizing the code.
  • Wrote Spark API code for comparing both Cassandra and DB2 table data.
  • Responsible for Cluster maintenance, Cluster Monitoring, Troubleshooting, Manage & review Hadoop log files. Installation of various Hadoop Ecosystems.
  • Responsible for Installation and configuration of Hive, Pig, Oozie, HBase and sqoop on the Hadoop cluster.
  • Used Spark Streaming API with Kafka to build live dashboards; Worked on Transformations & actions in RDD and Spark Streaming, MapReduce, Pair RDD Operations, Partitioner, Check-pointing, and SBT.
  • Refactored existing Hive queries to Spark SQL.
  • Involved in Installing, Configuring Hadoop Eco System, and Cloudera Manager using CDH4 Distribution.
  • Designed and implemented scalable Cloud Data and Analytical architecture solutions for various public and private cloud.
  • Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
  • Involved in the end-to-end process of Hadoop jobs that used technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling of a few jobs).
  • Involved in importing and exporting data between RDBMS and HDFS using Sqoop.
  • Performed querying of both managed and external tables created by Hive using Impala.
  • Implemented the Big Data solution using Hadoop, and hive to pull/load the data into the HDFS system.
  • Used Talend Open Studio to perform ETL aggregations in Hadoop HIVE & PIG.
  • Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to load data files.
  • Used hive schema to create relations in pig using HCatalog.
  • Developed Java MapReduce and Pig cleansers for data cleansing.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using the Scala IDE for Eclipse.
  • Implemented Machine Learning Models like K-means clustering using PySpark.
  • Extracted and loaded data into Data Lake environment (MS Azure) by using Sqoop which was accessed by business users.
  • Installed, Configured and Maintained Hadoop clusters for application development and Hadoop tools like Hive, Pig, Hbase, Zookeeper and Sqoop.
  • Worked on creating Hive tables and written Hive queries for data analysis to meet business requirements and experienced in Sqoop to import and export the data from Oracle & MySQL.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Developed and executed hive queries for de-normalizing the data.
  • Created Hive tables on top of the loaded data and writing hive queries for ad-hoc analysis.
  • Imported and exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Used Maven extensively for building jar files of Map Reduce programs and deployed to cluster.
  • Managed Agile Software Practice using Rally by creating Product Backlog, Iterations and Sprints in collaboration with the Product Team.

Environment: Hadoop 3.0, MapReduce, Hive 2.3, Pig 0.17, HDFS, HBase 1.2, MongoDB, Agile, MySQL, Oozie, Sqoop 1.4

Confidential, Bentonville, AR

Lead Hadoop Developer


  • Developed multiple Spark batch jobs in Scala using Spark SQL, performed transformations using many APIs, and updated master data in the Cassandra database per the business requirements.
  • Wrote Spark Scala scripts, creating multiple UDFs, a Spark context, a Cassandra SQL context, and multiple APIs and methods supporting DataFrames, RDDs, DataFrame joins, and Cassandra table joins, finally writing/saving the DataFrames/RDDs to the Cassandra database.
  • Involved in processing a huge number of inbound feed files using Spark batch jobs and updating the master data in Cassandra tables.
  • Designed, developed, and tested MapReduce programs on Mobile Offers Redemptions and sent the results to downstream applications like HAVI; scheduled this MapReduce job through an Oozie workflow.
  • Worked on and designed a Big Data analytics platform for processing customer interface preferences and comments using Hadoop, Hive, Pig, and Cloudera.
  • Developed multiple POCs using Scala and deployed on the Yarn cluster, compared the performance of Spark, with Hive and SQL.
  • Built data platforms, pipelines, and storage systems using Apache Kafka, Apache Storm, and search technologies such as Elasticsearch.
  • Worked with batch processing of data sources using Apache Spark and Elasticsearch.
  • Worked in an AWS cloud environment, on S3 storage and EC2 instances.
  • Configured Spark streaming to receive real time data from the Kafka and store the stream data to HDFS.
  • Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
  • Created Data Pipeline using Processor Groups and multiple processors using Apache NiFi for Flat File, RDBMS as part of a POC using Amazon EC2.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Used the Oozie workflow engine to schedule multiple Hive and Pig jobs.
  • Implemented different machine learning techniques in Scala using Scala machine learning library.
  • Developed Spark applications using Scala for easy Hadoop transitions.
  • Created RDDs in Spark using Scala and Python.
  • Closely worked with Admin team to gather hardware for Data nodes, edge nodes, and Name nodes.
  • Successfully loaded files to Hive and HDFS from Oracle, Netezza, and SQL Server using Sqoop.
  • Used Talend Open Studio to load files into Hadoop HIVE tables and performed ETL aggregations in Hadoop HIVE.
  • Designed & Created ETL Jobs through Talend to load huge volumes of data into Cassandra.
  • Used Sqoop to import data from SQL server to Cassandra.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Loaded data into Hive tables and extensively used HiveQL queries to query data in the Hive tables.
  • Introduced Tableau Visualization to Hadoop to produce reports for Business and BI team.
  • Scheduled, monitored and debugged various MapReduce, Pig, Hive jobs using Oozie Workflow.
  • Design and deployment of Storm cluster integration with Kafka and HBase.
  • Implemented authentication and authorization service using Kerberos authentication protocol.
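The inbound-feed jobs above rely on Cassandra's upsert semantics: an INSERT with an existing primary key simply overwrites the non-key columns. A dict-based sketch of that master-data update pattern (the driver calls and the product data are hypothetical; a plain dict stands in for the Cassandra table):

```python
def upsert(table, rows, key_cols):
    # Cassandra-style write: rows with an existing primary key
    # overwrite the columns they carry (last write wins), while
    # columns absent from the new row keep their old values.
    for row in rows:
        pk = tuple(row[c] for c in key_cols)
        table[pk] = {**table.get(pk, {}), **row}
    return table

master = {}
feed = [
    {"sku": "A1", "price": 10, "qty": 5},
    {"sku": "A1", "price": 12},          # later feed updates price only
    {"sku": "B2", "price": 7, "qty": 1},
]
upsert(master, feed, key_cols=["sku"])
# master[("A1",)] == {"sku": "A1", "price": 12, "qty": 5}
```

This is why batch feed processing against Cassandra needs no separate "does the row exist" check: every write is an upsert keyed by the partition/clustering columns.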

Environment: Hadoop 1.2.1, MapReduce, Sqoop 1.4.4, Hive 0.10.0, Flume 1.4.0, Oozie 3.3.0, Pig 0.11.1, HBase 0.94.11, Scala, Python, Zookeeper 3.4.3, Talend Open Studio, Kafka, Storm, Oracle 11g/10g, Apache Cassandra, SQL Server 2008, MySQL 5.6.2, Java 7, SQL, PL/SQL, Toad 9.7, Eclipse Kepler IDE, Microsoft Office 2007, MS Outlook 2007, SharePoint Teamsite.


Technical Analyst


  • Responsible for leading a project team in delivering solutions to our customer in the retail sector.
  • Delivered new and complex high-quality solutions to clients in response to varying business requirements.
  • Responsible for managing the scope, planning, tracking, and change control aspects of the project.
  • Responsible for effective communication between the project team and the customer; provided day-to-day direction to the project team and regular project status to the customer.
  • Translated customer requirements into formal requirements and design documents, established specific solutions, and led the efforts, including programming and testing, that culminated in client acceptance of the results.
  • Utilized in-depth functional and technical experience in Mainframe and other leading-edge products and technology, in conjunction with industry and business skills, to deliver solutions to the customer.
  • Established quality procedures for the team and continuously monitored and audited to ensure the team met quality goals.
  • Developed and documented test plans for new and modified applications, covering unit testing and system testing.
  • Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
  • Developed SQL queries and Stored Procedures using PL/SQL to retrieve and insert into multiple database schemas.
  • Developed the XML schema and web services for data maintenance and structures; wrote test cases in JUnit for unit testing of classes.
  • Worked with the DOM and DOM functions using Firefox and the IE Developer Toolbar.
  • Debugged the application using Firebug to traverse the documents.
  • Involved in developing web pages using HTML and JSP.

Environment: CICS, COBOL, DB2, JCL, VSAM, CICS Web Services, Endevor, Datacom, File Master, SPUFI, ORACLE, PL/SQL


Technical Analyst


  • Worked with Business Analysts to translate business requirements into Functional Requirements Documents and Detailed Design Documents.
  • Led analysis sessions to gather requirements and wrote specification and functional design documents for enhancements and customization; analyzed product impact.
  • Coordinated the offshore team and updated the client on work status.
  • Responsible for interacting directly with the client manager and conducting same-time meetings with client-side representatives.
  • Ensured that development was performed per requirements.
  • Helped other team members with problems and performance improvements.
  • Communicated activities/progress to project managers, business development, business analysts, and clients.

Environment: CICS, COBOL, DB2, JCL, VSAM, CICS Web Services, Endevor, File Master, CA InterTest, XML, XSLT, WSDL, SOAP, DataPower

Confidential, Long Island, NY

Asst. Systems Engineer


  • Design, develop, test and implement, modify and maintain software systems for various business application projects using COBOL, Ideal, CICS, Datacom, and DB2 in various environments like IBM Mainframes.
  • Build applications using COBOL, Ideal, Easytrieve, Datacom, DB2, CICS, VSAM and JCL.
  • Modify existing software to adapt to new requirements.
  • Monitor critical jobs of application.
  • Perform regular administrative functions dealing with problem tracking/management and peer review processes.
  • Produce User documentation for new applications outlining functionality, general decision logic and data flows.
  • Provide technical and investigative support for functional users on interfaces, defects, change requests, debugging issues, and testing.
  • Conduct reviews of all deliverables and maintain records of reviews on the defect-tracking database.
  • Prepare High and low-level designs based on requirements.
  • Provide Production support and maintenance.
  • Provide 24X7 on call application support.
  • Create First Choice change tickets for code promotion, database table changes, etc.
  • Replicate the Test databases with Production databases.
  • Attend project meetings and quality review meetings.

Environment: Win CICS, COBOL, DB2, JCL, VSAM, ChangeMan, Infoman, File-AID, Xpediter, CA Scheduling
