
Big Data Developer Resume

Schaumburg, IL


  • Around 9 years of experience in IT, including analysis, design, and development of Big Data solutions using Hadoop and Scala; design and development of web applications using Java and Spring Boot; and database and data warehousing development using MySQL and Oracle.
  • 4+ years of work experience in Big Data analytics, with hands-on experience installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, HBase, ZooKeeper, Hive, Sqoop, Pig, Flume, Cassandra, Kafka, Spark, and NiFi (ETL).
  • Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, and DataNode, as well as MapReduce concepts and the HDFS framework.
  • Implemented design patterns in Scala for the application and developed quality code adhering to Scala coding standards and best practices.
  • Strong experience in analyzing large data sets by writing PySpark scripts and Hive queries.
  • Implemented applications in Scala with the Akka and Play frameworks, and implemented RESTful services in Spring.
  • Experience running Apache Hadoop, CDH, and MapR distributions, and Elastic MapReduce (EMR) on EC2.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Experience in pulling data from Amazon S3 cloud to HDFS.
  • Worked extensively with AWS services such as EC2, S3, EMR, FSx, Lambda, CloudWatch, RDS, Auto Scaling, CloudFormation, SQS, ECS, EFS, DynamoDB, Route 53, and Glue.
  • Hands-on experience with VPN, PuTTY, WinSCP, and CI/CD (Jenkins).
  • Experience in data load management and in importing and exporting data using Sqoop and Flume.
  • Experience in analyzing data using Hive, Pig, and custom MapReduce programs in Java.
  • Experience in scheduling and monitoring jobs using Oozie and ZooKeeper.
  • Experienced in writing MapReduce programs and UDFs for both Pig and Hive in Java.
  • Experience in extracting data from log files and copying it into HDFS using Flume.
  • Experience in integrating Hive and HBase for effective operations.
  • Experience with Impala, Solr, MongoDB, HBase, Spark, and Kubernetes.
  • Hands-on knowledge of writing code in Scala, Core Java, and R.
  • Expertise in Waterfall and Agile (Scrum) methodologies.
  • Experienced with code versioning and dependency management systems such as Git, SVN, and Maven.
  • Experience in testing and documenting software for client applications.
  • Experience writing single-threaded, multi-threaded, and user-interface event-driven applications, both stand-alone and those that access servers or services.
  • Good experience in using data modelling techniques to find results based on SQL and PL/SQL queries.
  • Experience working with different databases, such as Oracle, SQL Server, and MySQL, and writing stored procedures, functions, joins, and triggers for different data models.
  • Great team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and business improvements.
  • Experienced in handling different file formats such as text files, Avro data files, Sequence files, XML, and JSON.


Big Data Technologies: HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Oozie, Hadoop distributions, HBase, Spark, Spark Streaming, YARN, ZooKeeper, Kafka, ETL (NiFi, Talend, etc.)

Programming languages: Core Java, Spring Boot, R, Scala, Terraform, Angular.

Databases: MySQL, MS-SQL Server 2012/16, Oracle 10g/11g/12c

Scripting/Web Languages: HTML5, CSS3, XML, SQL, Shell/Unix, Perl, Python.

NoSQL Databases: Cassandra, HBase, MongoDB.

Operating Systems: Linux, Windows XP/7/8/10, Mac.

Software Life Cycle: SDLC, Waterfall and Agile models.

Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, SVN, Log4j, SoapUI, ANT, Maven, Alteryx, Visio, Jenkins, Jira.

Data Visualization Tools: Tableau, SSRS, CloudHealth.

AWS Services: EC2, S3, EMR, RDS, Lambda, CloudWatch, FSx, Auto Scaling, CloudFormation, Glue, etc.


Confidential, Schaumburg, IL

Big Data Developer


  • Involved in requirement gathering and business analysis, and translated business requirements into technical designs in Hadoop and Big Data.
  • Involved in the Sqoop implementation, which helps in loading data from various RDBMS sources into Hadoop systems.
  • Developed Python scripts to extract the data from the web server output files to load into HDFS.
  • Involved in HBase setup and in storing data in HBase for further analysis.
  • Wrote a Python script that automates launching the EMR cluster and configuring the Hadoop applications using boto3.
  • Created various data pipelines using Spark, Scala, and Spark SQL for faster processing of data.
  • Wrote Spark SQL queries and embedded the SQL in Scala files to generate JAR files for submission to the Hadoop cluster.
  • Worked extensively with Avro and Parquet files and converted data between the formats. Parsed semi-structured JSON data and converted it to Parquet using DataFrames in PySpark.
  • Developed a Python script to load CSV files into S3 buckets; created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action; documented system processes and procedures for future reference.
  • Involved in configuring the Hadoop cluster and load balancing across the nodes.
  • Involved in Hadoop installation, commissioning, decommissioning, balancing, troubleshooting, monitoring, and debugging the configuration of multiple nodes using the Hortonworks platform.
  • Worked with Spark on top of YARN/MRv2 for interactive and batch analysis.
  • Worked with a team to migrate from a legacy on-premises environment to AWS.
  • Created Dockerized backend cloud applications with exposed application programming interfaces (APIs) and deployed them on Kubernetes.
  • Experienced in analyzing and optimizing RDDs by controlling partitions for the given data.
  • Experienced in writing real-time processing jobs using Spark Streaming with Kafka.
  • Used HiveQL to analyze partitioned and bucketed data and compute various metrics for reporting.
  • Queried data using Spark SQL on top of the Spark engine.
  • Involved in managing and monitoring the Hadoop cluster using Cloudera Manager.
  • Used Python and Shell scripting to build pipelines.
  • Developed a data pipeline using Sqoop, HQL, Spark, and Kafka to ingest enterprise message delivery data into HDFS.
  • Developed an Oozie workflow to automate the tasks of loading data into HDFS and pre-processing it with Pig and Hive.
  • Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Assisted in cluster maintenance, cluster monitoring, and adding and removing cluster nodes; installed and configured Hadoop, MapReduce, and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
  • Automated and monitored the complete AWS infrastructure with Terraform.
  • Created data partitions on large data sets in S3 and DDL on partitioned data.
  • Converted all Hadoop jobs to run in EMR by configuring the cluster according to the data size.

Environment: HDFS, Hive, Scala, Sqoop, Spark, YARN, Cloudera, SQL, Terraform, Splunk, RDBMS, Elasticsearch, Kerberos, Shell/Python/Perl Scripting, Jira, Confluence, ZooKeeper, AWS (EC2, S3, EMR, ECS, Glue, VPC, RDS, etc.), Ranger, Git, Kafka, CI/CD (Jenkins), Kubernetes, Talend.

Confidential, Dearborn, MI

Sr. Hadoop Developer


  • Imported data from sources such as HDFS and HBase into Spark RDDs.
  • Used the Spark Streaming and Spark SQL APIs to process the files.
  • Worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems/mainframes.
  • Stored data in AWS S3, used similarly to HDFS, and ran EMR jobs on the data stored in S3.
  • Worked on Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
  • Developed UDFs in Java for Hive and Pig and worked on reading multiple data formats on HDFS using Scala.
  • Developed an Oozie workflow to automate the tasks of loading data into HDFS and pre-processing it with Hive.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Involved in migrating the platform from Cloudera to EMR.
  • Developed analytical components using Scala, Spark, and Spark Streaming.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Extensively involved in developing RESTful APIs using the JSON library of the Play framework.
  • Developed Storm topologies to ingest data from various sources into the Hadoop data lake.
  • Developed a web application using the HBase and Hive APIs to compare schemas between HBase and Hive tables.
  • Played a vital role in Scala/Akka framework development for web-based applications.
  • Connected to AWS S3 using SSH and ran spark-submit jobs.
  • Developed a Python script to import data from SQL Server into HDFS and created Hive views on the data in HDFS using Spark.
  • Expert in troubleshooting MapReduce jobs.
  • Created scripts to append data from temporary HBase table to target HBase table in Spark.
  • Developed complex, multi-step data pipelines using Spark.
  • Worked on Big Data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods technologies.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Monitored YARN applications; troubleshot and resolved cluster-related system problems.
  • Upgraded the Hadoop cluster from CDH3 to CDH4, set up a high-availability cluster, and integrated Hive with existing applications.
  • Involved in creating ETL flows using Pig, loading data, and writing Pig Latin queries that run internally as MapReduce jobs.
  • Involved in writing Unix/Linux shell scripts for scheduling jobs and in writing Pig scripts and HiveQL.
  • Assisted in exporting data into Cassandra and writing column families to provide fast listing outputs.
  • Used ZooKeeper to provide coordination services to the cluster.
  • Worked with the Hue UI for job scheduling, file browsing, job browsing, and metastore management.
  • Designed and developed a system to collect data from multiple portals using Kafka and then process it using Spark.

Environment: Hadoop, HDFS, Hive, Core Java, Sqoop, Spark, Scala, Cloudera CDH4, Oracle, Elasticsearch, Kerberos, SFTP, Impala, Jira, Wiki, Alteryx, Teradata, Shell/Perl Scripting, Kafka, AWS (EC2, S3, EMR).

Confidential, Detroit, MI

Hadoop/Scala Developer


  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Developed multiple POCs using Scala and deployed them on the YARN cluster; compared the performance of Spark with Hive and SQL/Teradata.
  • Analyzed the SQL scripts and designed the solution for implementation in Scala.
  • Developed analytical components using Scala, Spark, and Spark Streaming.
  • Developed UDFs in Java for Hive and Pig and worked on reading multiple data formats on HDFS using Scala.
  • Used the Scala collections framework to store and process complex consumer information.
  • Used Scala functional programming concepts to develop business logic.
  • Designed and implemented Apache Spark applications (Cloudera).
  • Imported and exported data into HDFS using Sqoop, Flume, and Kafka.
  • Troubleshot and debugged Hadoop ecosystem run-time issues.
  • Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract
  • Worked with the Hue GUI for job scheduling, file browsing, job browsing, and metastore management.
  • Worked with BI team in the area of Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
  • Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, Flume, Oozie, ZooKeeper, and Sqoop.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Extensively involved in installation and configuration of the Cloudera distribution of Hadoop, including NameNode, Secondary NameNode, JobTracker, TaskTrackers, and DataNodes.
  • Created a POC to store server log data in MongoDB to identify system alert metrics.
  • Monitored Hadoop cluster job performance, performed capacity planning and managed nodes on Hadoop cluster.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
  • Performed analysis on the unused user navigation data by loading into HDFS and writing MapReduce jobs. The analysis provided inputs to the new APM front end developers and lucent team.
  • Wrote MapReduce jobs using Java API and Pig Latin.
  • Wrote Pig scripts to run ETL jobs on the data in HDFS and to support further testing.
  • Used Hive to do analysis on the data and identify different correlations.
  • Involved in HDFS maintenance and administering it through Hadoop-Java API.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Automated all jobs for pulling data from the FTP server and loading it into Hive tables using Oozie workflows.
  • Involved in creating Hive tables, working on them using HiveQL, and performing data analysis using Hive and Pig.
  • Used QlikView and D3 to visualize query results required by the BI team.
  • Defined UDFs using Pig and Hive to capture customer behavior.
  • Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig.
  • Created Hive external tables on the MapReduce output before partitioning and bucketing were applied.
  • Loaded the load-ready files from mainframes into Hadoop and converted the files to ASCII format.
  • Configured HiveServer2 (HS2) to enable analytical tools such as Tableau, QlikView, and SAS to interact with Hive tables.

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Pig, Sqoop, ZooKeeper, Teradata, PL/SQL, MySQL, HBase, ETL (Informatica/SSIS).


Java Developer


  • Excellent Java and J2EE application development skills with strong experience in object-oriented analysis; extensively involved throughout the Software Development Life Cycle (SDLC).
  • Implemented various J2EE standards and an MVC framework using Struts, JSP, AJAX, and servlets for UI design.
  • Used SOAP/REST for data exchange between the backend and the user interface.
  • Used Java and MySQL day to day to debug and fix issues with client processes.
  • Developed, tested, and implemented a financial-services application to bring multiple clients into a standard database format.
  • Assisted in designing, building, and maintaining a database to analyze the life cycle of checking and debit transactions.
  • Created web service components using SOAP, XML, and WSDL to receive XML messages and apply business logic.
  • Involved in configuring WebSphere variables, queues, data sources, and servers, and in deploying EAR files to servers.
  • Involved in developing the business logic using Plain Old Java Objects (POJOs) and Session EJBs.
  • Developed authentication through LDAP using JNDI.
  • Developed and debugged the application using Eclipse IDE.
  • Involved in Hibernate mappings, configuration properties setup, creating sessions and transactions, and second-level cache setup.
  • Involved in backing up the database, creating dump files, and creating DB schemas from dump files. Wrote and executed developer test cases. Prepared the corresponding scope and traceability matrix.
  • Used JUnit and JAD for debugging and for developing test cases for all modules.
  • Hands-on experience with Sun ONE Application Server, WebLogic Application Server, WebSphere Application Server, WebSphere Portal Server, and J2EE application deployment technology.

Environment: Java multithreading, JDBC, Hibernate, Struts, Collections, Maven, Subversion, JUnit, SQL, JSP, SOAP, Servlets, Spring, Oracle, XML, PuTTY, and Eclipse.
