
Big Data Developer Resume

Eden Prairie, Minnesota


  • Experienced Hadoop developer with a strong foundation in distributed storage such as HDFS and HBase in big data environments. Excellent understanding of the complexities of big data, with experience developing modules and code in MapReduce, Hive, Pig, Sqoop, Apache Flume and Apache Spark to address them.
  • 8 years of IT experience in Analysis, Design and Development, including Big Data work in Scala, Spark, Hadoop, Pig and HDFS environments, plus experience in Java and J2EE.
  • Experience using HCatalog with Hive, Pig and HBase. Experienced with NoSQL databases such as HBase and Cassandra.
  • Good experience installing, configuring and administering Hadoop clusters on the major Hadoop distributions, Hortonworks and Cloudera.
  • Strong experience with Kafka streaming to ingest data in real time or near real time.
  • Experience collecting, aggregating and moving data from various sources using Kafka.
  • Good experience developing solutions to analyze large datasets efficiently.
  • Experience setting up clusters on Amazon EC2 and S3, including automating cluster setup and expansion in the AWS cloud.
  • Familiar with relational databases such as MS SQL Server and Teradata.
  • Experienced with the Oozie workflow engine, running workflow jobs with actions that run Impala, Hadoop MapReduce and Pig jobs.
  • Hands-on experience importing and exporting data with the Hadoop data management tool Sqoop.
  • Experienced working with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service) and configuring EMR (Elastic MapReduce).
  • Comprehensive knowledge of debugging, optimizing and performance tuning DB2, Oracle and MySQL databases.


Languages & Hadoop Components: HDFS, Sqoop, Flume, Hive, Pig, MapReduce, YARN, Oozie, Kafka, Spark, Impala, Storm, Hue, Zookeeper, Java, SQL.

Big Data Platforms: Hortonworks, Cloudera, Amazon

Databases & NoSQL Databases: Oracle, MySQL, Microsoft SQL Server, HBase and Cassandra.

Operating Systems: Linux, UNIX, Windows.

Development Methodologies: Agile/Scrum, Waterfall.

IDEs & Tools: Eclipse, NetBeans, IntelliJ, GitHub, Jenkins, Maven, Ambari.

Programming Languages & Web Technologies: C, C++, Java SE, XML, JSP/Servlets, Spring, HTML, JavaScript, jQuery, Web services, Python, Scala, PL/SQL & Shell Scripting.


Confidential, Eden Prairie, Minnesota

Big Data Developer


  • Developed complete end-to-end big data processing in the Hadoop ecosystem.
  • Used the Spark 2.0.0 API on MapR to perform analytics on data in Impala 2.7.0.
  • Wrote extensive Impala 2.7.0 queries and created views for ad hoc and business processing.
  • Optimized Hive 2.0.0 scripts to use HDFS efficiently by using various compression mechanisms.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 2.0.0 for data aggregation and queries, and wrote data back into the RDBMS through Sqoop.
  • Created Hive schemas using performance techniques such as partitioning and bucketing.
  • Developed Oozie 3.1.0 workflow jobs to execute Hive 2.0.0, Sqoop 1.4.6 and MapReduce actions.
  • Used SFTP to send and receive files from various upstream and downstream systems.
  • Wrote extensive Hive queries to transform the data used by downstream models.
  • Exported data from Hive 2.0.0 tables into a Netezza 7.2.x database.
  • Involved in the complete end-to-end code deployment process in production.
  • Wrote Scala test cases for the daily jobs to meet the gate 08 code coverage requirement.
  • Masked sensitive columns for the offshore teams by encoding them with Base64.
  • Created Hive tables on top of Parquet files using the Spark Catalog in Spark 2.0.
  • Used GitHub to create code repositories from which Jenkins builds the pipelines and runs the jobs automatically.
  • Involved in Scrum calls, grooming and demo meetings.

Environment: Scala, Spark, Hive, Oozie, MapR, Git, Elasticsearch, Logstash, Kibana, Kerberos, ELK stack, Kubernetes.

Confidential, Philadelphia

Hadoop Developer


  • Applied transformations to data loaded into Spark DataFrames and performed in-memory computation to generate the output response.
  • Developed multiple POCs using Spark with Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.
  • Migrated MapReduce programs to Spark transformations using Scala, after an initial implementation in Python (PySpark).
  • Developed Scala scripts using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Used Hive to analyze partitioned data and compute various metrics for reporting.
  • Imported data from different sources, such as HDFS, into Spark DataFrames.
  • Scheduled and executed workflows in Oozie to run Hive and Pig jobs
  • Experienced with SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Reduced the latency of Spark jobs by tuning Spark configurations and applying other performance optimization techniques.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive and Pig.
  • Used Hive and Spark SQL connections to generate Tableau BI reports.
  • Created partitions and buckets based on state for further processing with bucket-based Hive joins.
  • Created Hive generic UDFs to process business logic that varies by policy.
  • Developed various data connections from data sources to SSIS and Tableau Server for report and dashboard development.
  • Developed solutions using the Hadoop ecosystem, such as Hadoop, Spark, Hive, HBase, Pig, Sqoop, Oozie, Ambari and ZooKeeper.
  • Experience writing MapReduce programs with the Java API to cleanse structured and unstructured data.
  • Experience with RDBMSs such as Oracle and Teradata.
  • Loaded data from MySQL and Teradata into HBase where necessary using Sqoop.

Environment: Scala, Spark, Kafka, Hive, Hortonworks, Oozie, Play framework, Akka, Git, Elasticsearch, Logstash, Kibana, Kerberos.


Hadoop Developer/Admin


  • Installed, configured, upgraded, and applied patches and bug fixes for Prod, Lab and Dev Servers.
  • Installed, configured and administered HDFS, Hive, Ranger, Pig, HBase, Oozie, Sqoop, Spark and YARN.
  • Upgraded Cloudera Manager from version 5.5 to 5.6.
  • Involved in capacity planning, load balancing and design of Hadoop clusters.
  • Set up alerts in Cloudera Manager to monitor the health and performance of Hadoop clusters.
  • Installed and configured Kerberos security authentication.
  • Created and dropped users, and granted and revoked permissions for users and policies as required, using Ranger.
  • Commissioned and decommissioned data nodes in the cluster.
  • Wrote and modified UNIX shell scripts to manage HDP environments.
  • Installed and configured Apache Flume, Hive, Sqoop and Oozie on the Hadoop cluster.
  • Created directories and set up appropriate permissions for different applications and users.
  • Backed up HBase tables to HDFS using the Export utility.
  • Created users and user groups, assigned user roles, and created home directories for users.
  • Installation, Configuration and administration of HDP on Red Hat Enterprise Linux 6.6
  • Used Sqoop to import data into HDFS from Oracle database.
  • Performed detailed analysis of system and application architecture components per functional requirements.
  • Reviewed and monitored system and instance resources (database storage, memory, CPU, network usage and I/O contention) to ensure continuous operations.
  • Provided 24x7 on-call support for production job failures and resolved issues in a timely manner.
  • Developed UNIX scripts for scheduling delta loads and master loads using the AutoSys scheduler.
  • Deep and thorough understanding of ETL tools and how they apply in a Big Data environment.
  • Troubleshot problems with databases, applications and development tools.

Environment: Hadoop, HDFS, Hive, Cloudera Manager, Sqoop, Flume, Oozie, CDH5, MongoDB, Cassandra, HBase, Hue, Kerberos and Unix/Linux.

Confidential, Dallas, TX

Java/Hadoop Developer


  • Imported all fact and dimension tables from SQL Server into Hadoop using Sqoop.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Extracted customers' big data from various sources into HDFS, including data from mainframes, databases and server logs.
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
  • Developed Tableau workbooks from multiple data sources using Data Blending.
  • Managed and reviewed Hadoop log files.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Created Hive tables as managed or external tables, per requirements, with appropriate static and dynamic partitions for efficiency.
  • Implemented Partitioning, Bucketing in Hive for better organization of the data.
  • Developed Python UDFs in Pig and Hive.
  • Used Apache Kafka to gather log data and feed it into HDFS.
  • Ingested data using Sqoop from various sources such as Informatica and Oracle.
  • Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop as well as system specific jobs.
  • Installed and configured various components of Hadoop ecosystem and maintained their integrity.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller.
  • Developed Java MapReduce programs to encapsulate transformations.
  • Participated in performance tuning at the database, transformation and job levels.

Environment: Hadoop, HDFS, MapReduce, Sqoop, Hive, Pig, Oozie, HBase, CDH4, Cloudera Manager, MySQL, Eclipse

Confidential, Roseville, CA

Hadoop Developer


  • Responsible for building data solutions in Hadoop using the Cascading framework.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Worked hands-on with ETL processes.
  • Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Imported data from different sources, such as HDFS and HBase, into Spark RDDs.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
  • Upgraded the Hadoop cluster from CDH3 to CDH4 and integrated Hive with existing applications.
  • Configured Ethernet bonding for all Nodes to double the network bandwidth.
  • Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Teradata into HDFS using Sqoop.
  • Used Python and shell scripts to automate the end-to-end ELT process.
  • Analyzed data with Hive queries and Pig scripts to understand user behavior.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed Hive queries to process the data and generate the data cubes for visualizing.
  • Performed data quality checks on data as per the business requirement.
  • Validated data in target tables against the source tables.
  • Achieved high throughput and low latency for ingestion jobs by leveraging Sqoop.
  • Transformed the raw data and loaded into stage and target tables.

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Teradata, Cloudera Manager, Pig, Sqoop, Oozie, Python.


Java/J2EE Developer


  • Involved in designing and developing modules at both Client and Server Side.
  • Developed the UI using JSP, JavaScript and HTML.
  • Responsible for validating the data at the client side using JavaScript.
  • Interacted with external services to get the user information using SOAP web service calls
  • Developed web components using JSP, Servlets and JDBC.
  • Technical analysis, design, development and documentation with a focus on implementation and agile development.
  • Developed a web-based reporting system with JSP, DAO and the Apache Struts Validator using the Struts framework.
  • Designed the controller using Servlets.
  • Accessed the backend Oracle database using JDBC.
  • Wrote UNIX shell scripts to automate various tasks.
  • Developed user and technical documentation.
  • Developed business objects, request handlers and JSPs for this project using Java Servlets and XML.
  • Developed core Spring components for some of the modules and integrated them with the existing Struts framework.
  • Actively participated in testing and designed user interface using HTML and JSPs.
  • Implemented the database connectivity to Oracle using JDBC, designed and created tables using SQL.
  • Implemented the server side processing using Java Servlets.
  • Installed and configured the Apache web server and deployed JSPs and Servlets on Tomcat Server.

Environment: Java, Servlets, JSP, JavaScript, JDBC, Unix Shell scripting, HTML, Eclipse, Oracle 8i, WebLogic.
