Sr. Big Data/Scala Developer Resume



  • Big Data developer with 8+ years of professional IT experience, including 4 years in the field of Big Data.
  • Extensive experience working with various Hadoop distributions, including enterprise versions of Cloudera and Hortonworks, and good knowledge of the MapR distribution and Amazon EMR.
  • In-depth experience using Hadoop ecosystem tools such as HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Spark, Storm, Kafka, Oozie, Elasticsearch, HBase, and ZooKeeper.
  • Extensive knowledge of Hadoop architecture and its components.
  • Good knowledge of installing, configuring, monitoring, and troubleshooting Hadoop clusters and their ecosystem components.
  • Exposure to Data Lake Implementation using Apache Spark.
  • Developed data pipelines and applied business logic using Spark.
  • Well versed in Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
  • Extensively worked on Spark streaming and Apache Kafka to fetch live stream data.
  • Used Scala and Python to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Experience in integrating Hive queries into Spark environment using Spark SQL.
  • Expertise in performing real time analytics on big data using HBase and Cassandra.
  • Handled importing data from RDBMS into HDFS using Sqoop and vice-versa.
  • Extensive experience in importing and exporting streaming data into HDFS using stream processing platforms like Flume and Kafka.
  • Experience in developing data pipelines using Pig, Sqoop, and Flume to extract data from weblogs and store it in HDFS.
  • Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
  • Hands-on experience with tools like Oozie and Airflow to orchestrate jobs.
  • Proficient in NoSQL databases including HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters.
  • Expertise in Cluster management and configuring Cassandra Database.
  • Strong familiarity with creating Hive tables, writing Hive joins and HQL queries, and developing complex Hive UDFs.
  • Accomplished in developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Worked with different compression codecs (LZO, Snappy, Gzip) and file formats (ORC, Avro, TextFile, Parquet).
  • Experience in practical implementation of AWS technologies, including IAM, Elastic Compute Cloud (EC2), ElastiCache, Simple Storage Service (S3), CloudFormation, Virtual Private Cloud (VPC), Route 53, Lambda, and EBS.
  • Built AWS secured solutions by creating VPC with public and private subnets.
  • Worked on data warehousing and ETL tools like Informatica, Talend, and Pentaho.
  • Expertise working with Java, J2EE, JDBC, ODBC, JSP, Eclipse, JavaBeans, EJB, and Servlets.
  • Developed web page interfaces using JSP, Java Swing, and HTML.
  • Experience working with the Spring and Hibernate frameworks for Java.
  • Worked with various programming languages using IDEs like Eclipse, NetBeans, and IntelliJ.
  • Excelled in using version control tools like PVCS, SVN, VSS and GIT.
  • Developed web-based UIs using JavaScript, jQuery, jQuery UI, CSS, HTML, HTML5, and XHTML.
  • Development experience with DBMSs like Oracle, MS SQL Server, Teradata, and MySQL.
  • Developed stored procedures and queries using PL/SQL.
  • Experience with best practices of web services development and integration (both REST and SOAP).
  • Experienced in using build tools like Ant, Gradle, SBT, Maven to build and deploy applications into the server.
  • Knowledge of Unified Modeling Language (UML) and expertise in Object Oriented Analysis and Design (OOAD).
  • Experience in the complete Software Development Life Cycle (SDLC) in both Waterfall and Agile methodologies.
  • Knowledge of creating dashboards and data visualizations using Tableau to provide business insights.
  • Excellent communication, interpersonal, and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management, and customers.


Languages/Tools: Java, C, C++, C#, Scala, VB, XML, HTML/XHTML, HDML, DHTML.

Big Data: HDFS, MapReduce, HIVE, PIG, HBase, SQOOP, Oozie, Zookeeper, Spark, Mahout, Kafka, Storm, Cassandra, Solr, Impala, Greenplum, MongoDB

J2EE Standards: JDBC, JNDI, JMS, Java Mail & XML Deployment Descriptors.

Web/Distributed Technologies: J2EE, Servlets 2.1/2.2, JSP 2.0, Struts 1.1, Hibernate 3.0, JSF, JSTL 1.1, EJB 1.1/2.0, RMI, JNI, XML, JAXP, XSL, XSLT, UML, MVC, Spring 2.0, CORBA, Java Threads.

Operating System: Windows 95/98/NT/2000/XP, MS-DOS, UNIX, multiple flavors of Linux.

Databases / NoSQL: Oracle 10g, MS SQL Server 2000, DB2, MS Access, MySQL, Teradata, Cassandra, Greenplum, and MongoDB.

Browser Languages: HTML, XHTML, CSS, XML, XSL, XSD, XSLT.

Browser Scripting: JavaScript, HTML DOM, DHTML, AJAX.

App/Web Servers: IBM WebSphere 5.1.2/5.0/4.0/3.5, BEA WebLogic 5.1/7.0, JDeveloper, Apache Tomcat, JBoss.

GUI Environment: Swing, AWT, Applets.

Messaging & Web Services Technology: SOAP, WSDL, UDDI, XML, SOA, JAX-RPC, IBM WebSphere MQ v5.3, JMS.

Networking Protocols: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP, POP3.

Testing & CASE Tools: JUnit, Log4j, Rational ClearCase, CVS, Ant, Maven, JBuilder.

Version Control Systems: Git, SVN, CVS


Confidential - Ohio

Sr. Big Data/Scala Developer


  • Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.
  • Used Sqoop to import data from Relational Databases like MySQL, Oracle.
  • Involved in importing structured and unstructured data into HDFS.
  • Responsible for fetching real time data using Kafka and processing using Spark and Scala.
  • Worked on Kafka to import real time weblogs and ingested the data to Spark Streaming.
  • Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
  • Built and implemented a real-time streaming ETL pipeline using the Kafka Streams API.
  • Worked on Hive to implement Web Interfacing and stored the data in Hive tables.
  • Migrated Map Reduce programs into Spark transformations using Spark and Scala.
  • Experienced with SparkContext, Spark SQL, and Spark on YARN.
  • Used the Vertica columnar relational database management system for data warehousing and big data analytics.
  • Strengthened T-SQL coding skills.
  • Created schema, table, and T-SQL scripts to archive Vertica resources usage data for trend analysis.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables from Spark for faster data processing.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Implemented data quality checks using Spark Streaming and assigned passable and bad flags to the data.
  • Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
  • Involved in data querying and summarization using Hive and Pig, and created UDFs, UDAFs, and UDTFs.
  • Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.
  • Extensively used ZooKeeper as a backup server and job scheduler for Spark jobs.
  • Developed traits, case classes, and related constructs in Scala.
  • Developed Spark scripts using Scala shell commands as per the business requirement.
  • Worked on Cloudera distribution and deployed on AWS EC2 Instances.
  • Experienced in loading real-time data into NoSQL databases like Cassandra.
  • Well versed in data manipulation and compaction in Cassandra.
  • Experience in retrieving the data present in Cassandra cluster by running queries in CQL (Cassandra Query Language).
  • Worked on connecting Cassandra database to the Amazon EMR File System for storing the database in S3.
  • Implemented usage of Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Deployed the project on Amazon EMR with S3 connectivity for setting a backup storage.
  • Well versed in using Elastic Load Balancer with Auto Scaling for EC2 servers.
  • Configured workflows that involve Hadoop actions using Oozie.
  • Used Python for pattern matching in build logs to format warnings and errors.
  • Coordinated with SCRUM team in delivering agreed user stories on time for every sprint.

Environment: Hadoop YARN, Spark SQL, Spark Streaming, AWS S3, AWS EMR, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Cassandra, Cloudera, Oracle 10g, Linux.

Confidential - Coraopolis, PA

Hadoop/Big Data Analyst


  • Developed MapReduce programs to parse and filter the raw data and store the refined data in partitioned tables in HBase.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with HBase reference tables and historical metrics.
  • Responsible for creating Hive tables, loading the structured data produced by MapReduce jobs into the tables, and writing Hive queries to further analyze the logs and identify issues and behavioral patterns.
  • Involved in running MapReduce jobs for processing millions of records.
  • Built reusable Hive UDF libraries for business requirements which enabled users to use these UDF's in Hive Querying.
  • Developed a data pipeline using Scala to store data into HDFS.
  • Experienced in migrating HiveQL to Impala to minimize query response time.
  • Responsible for data modeling in HBase as per requirements.
  • Shared responsibility for administration of Hadoop, Hive and Pig.
  • Managed and scheduled jobs on a Hadoop cluster using NiFi.
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Created UDFs to calculate the pending payment for the given Residential or Small Business customer, and used in Pig and Hive Scripts.
  • Built and deployed the application using Maven.
  • Maintained Hadoop, the Hadoop ecosystem, third-party software, and databases with updates/upgrades, performance tuning, and monitoring using Ambari.
  • Gained solid experience with the NoSQL database HBase.
  • Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
  • Experience in managing and reviewing Hadoop log files.
  • Experienced in moving data from Hive tables into HBase for real-time analytics.
  • Handled importing of data from various data sources and performed transformations using Hive (external tables, partitioning).
  • Involved in NoSQL (DataStax Cassandra) database design, integration and implementation.
  • Implemented CRUD operations involving lists, sets and maps in DataStax Cassandra.
  • Responsible for data modeling in HBase in order to load both structured and unstructured data.
  • Processed unstructured files such as XML and JSON using a custom-built Java API and pushed them into MongoDB.
  • Participated in development/implementation of the Cloudera Hadoop environment.
  • Created tables, inserted data, and executed various Cassandra Query Language (CQL 3) commands on tables from Java code and the cqlsh command-line client.
  • Wrote test cases in MRUnit for unit testing of MapReduce programs.
  • Worked extensively on the user interface for a few modules using JSPs, JavaScript and Ajax.
  • Created business logic using Servlets and Session Beans and deployed them on WebLogic Server.
  • Involved in developing templates and screens in HTML and JavaScript.
  • Developed the XML Schema and web services for the data maintenance and structures.
  • Provided technical support for production environments by resolving issues, analyzing defects, and providing and implementing solutions.
  • Built and deployed Java applications into multiple Unix based environments and produced both unit and functional test results along with release notes.

Environment: HDFS, MapReduce, Hive, Pig, Cloudera, Impala, NiFi, HBase, Cassandra, Kafka, Storm, Maven, CloudManager, Nagios, Ambari, JDK, J2EE, Struts, JSP, Servlets, Elasticsearch, WebSphere, HTML, XML, JavaScript, MRUnit.

Confidential - Peoria, IL

Hadoop/Big Data Analyst


  • Exported data from HDFS to MySQL using Sqoop and an NFS mount approach.
  • Moved data from HDFS to Cassandra using Map Reduce and BulkOutputFormat class.
  • Developed Map Reduce programs for applying business rules on the data.
  • Developed and executed hive queries for denormalizing the data.
  • Worked with ETL workflows, analyzed big data, and loaded it into the Hadoop cluster.
  • Installed and configured Hadoop Cluster for development and testing environment.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Involved in creating shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala and MapReduce) and to move data into and out of HDFS.
  • Automated the workflow using shell scripts.
  • Tuned the performance of Hive queries written by other developers.
  • Mastered major Hadoop distros (HDP/CDH) and numerous open source projects.
  • Prototyped various applications that utilize modern Big Data tools.

Environment: Linux, Java, Map Reduce, HDFS, DB2, Cassandra, Hive, Pig, Sqoop, FTP.

Confidential - San Francisco, CA

Hadoop Developer


  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce.
  • Worked on debugging and performance tuning of Hive and Pig jobs.
  • Created HBase tables to store various data formats of PII data coming from different portfolios.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on tuning the performance of Pig queries.
  • Involved in loading data from the Linux file system to HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experience working on processing unstructured data using Pig and Hive.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Gained experience in managing and reviewing Hadoop log files.
  • Involved in scheduling workflows with the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
  • Extensively used Pig for data cleansing.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Implemented SQL, PL/SQL Stored Procedures.
  • Actively involved in code review and bug fixing for improving the performance.
  • Developed screens using JSP, DHTML, CSS, AJAX, JavaScript, Struts, Spring, Java and XML.

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, LINUX, Cloudera, Big Data, Java APIs, Java collection, SQL, AJAX.


Java Developer


  • Extensively involved in the design and development of JSP screens to suit specific modules.
  • Converted the application’s console printing of process information to proper logging technology using log4j.
  • Developed the business components (in core Java) used in the JSP screens.
  • Involved in the implementation of logical and physical database design by creating suitable tables, views and triggers.
  • Developed related procedures and functions used by JDBC calls in the above components.
  • Extensively involved in performance tuning of Oracle queries.
  • Created components to extract application messages stored in XML files.
  • Executed UNIX shell scripts for command-line administrative access to the Oracle database and for scheduling backup jobs.
  • Created WAR files and deployed them to the web server.
  • Performed source and version control using VSS.
  • Involved in maintenance support.

Environment: JDK, HTML, JavaScript, XML, JSP, Servlets, JDBC, Oracle 9i, Eclipse, Toad, UNIX Shell Scripting, MS Visual SourceSafe, Windows 2000.


Junior Java Developer


  • Involved in the analysis, design, implementation, and testing of the project.
  • Implemented the presentation layer with HTML, XHTML and JavaScript.
  • Developed web components using JSP, Servlets and JDBC.
  • Designed tables and indexes.
  • Extensively worked on JUnit for testing the application code of server-client data transferring.
  • Developed and enhanced products in design and in alignment with business objectives.
  • Used SVN as a repository for managing/deploying application code.
  • Participated in successful system integration and user acceptance tests.
  • Developed the front end using JSTL, JSP, HTML, and JavaScript.
  • Wrote complex SQL queries and stored procedures.
  • Involved in fixing bugs and unit testing with test cases using JUnit.
  • Actively involved in the system testing.
  • Involved in implementing service layer using Spring IOC module.
  • Prepared the installation guide, customer guide, and configuration document, which were delivered to the customer along with the product.

Environment: Java, JSP, JSTL, HTML, JavaScript, Servlets, JDBC, MySQL, JUnit, Eclipse IDE.
