
Big Data Engineer Resume

Virginia Beach


  • 3 years of experience in the software development industry, focused mainly on Big Data technologies and distributed computing.
  • Experience writing MapReduce programs with Apache Hadoop to process big data.
  • Hands-on experience creating databases, schemas, and tables in PostgreSQL.
  • Experience with SQL, PL/SQL and database concepts.
  • Responsible for growing new and existing accounts using knowledge of PaaS, IaaS, and SaaS.
  • Experience creating SQL objects such as tables, stored procedures, views, indexes, triggers, functions, user-defined data types, rules, and defaults.
  • Hands-on experience with the Hadoop ecosystem, using HDFS for data storage and MapReduce for distributed processing.
  • Good knowledge of Apache Spark 2.0 using Scala.
  • Knowledge of processing real-time data with Apache Spark.
  • Hands-on experience converting Hive/SQL queries into Spark transformations using Spark RDDs and Spark SQL.
  • Hands-on experience coding Apache Spark in Python for faster data processing.
  • Worked on various individual data mining projects.
  • Good understanding of concepts such as enterprise data warehousing, ETL, data modelling, and data mapping.
  • Experience designing and maintaining high-performing ELT/ETL processes.
  • Experience with Hive: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries to meet client requirements.
  • Hands-on experience creating external and internal (managed) tables in Hive.
  • Developed UNIX shell scripts to generate reports from Hive data.
  • Work experience with cloud infrastructure such as Google Cloud and Microsoft Azure.
  • Experience with NoSQL databases such as HBase and MongoDB.
  • Able to meet deadlines and handle multiple tasks; flexible with work schedules and processes; excellent communication skills.
  • Articulate in written and verbal communication, with strong interpersonal, analytical, and organizational skills.
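
Partitioning and bucketing in Hive decide where each row is physically stored. As a minimal sketch, assuming a hypothetical `sales` table partitioned by `sale_date` and bucketed on `id` into 4 buckets, the Python below mimics the resulting layout. It is not Hive code: `zlib.crc32` is a stand-in for Hive's own hash function, so real bucket numbers will differ, and the rows are made up.

```python
import zlib
from collections import defaultdict

# Hypothetical rows for a "sales" table; column names are illustrative only.
ROWS = [
    {"id": 101, "amount": 25.0, "sale_date": "2017-06-01"},
    {"id": 102, "amount": 40.0, "sale_date": "2017-06-01"},
    {"id": 103, "amount": 15.5, "sale_date": "2017-06-02"},
    {"id": 104, "amount": 60.0, "sale_date": "2017-06-02"},
]

# Roughly equivalent HiveQL DDL (assumed, for orientation only):
#   CREATE TABLE sales (id INT, amount DOUBLE)
#   PARTITIONED BY (sale_date STRING)
#   CLUSTERED BY (id) INTO 4 BUCKETS;
NUM_BUCKETS = 4


def bucket_for(key, num_buckets=NUM_BUCKETS):
    """Pick a bucket by hashing the clustering column.

    crc32 is a deterministic stand-in; Hive uses its own hash function,
    so real bucket numbers will differ from these.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def layout(rows):
    """Group row ids the way a partitioned, bucketed table stores them:
    one directory per partition value, one file per bucket inside it."""
    files = defaultdict(list)
    for row in rows:
        partition_dir = f"sale_date={row['sale_date']}"
        files[(partition_dir, bucket_for(row["id"]))].append(row["id"])
    return dict(files)


for (part, bucket), ids in sorted(layout(ROWS).items()):
    print(part, "bucket", bucket, ids)
```

Because each partition value becomes its own directory, a query that filters on `sale_date` can skip the other directories; bucketing on `id` fixes the number of files per partition, which Hive can use for sampling and bucketed joins.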


Programming: Python, R, Scala, Bash scripting & Linux Kernel Development.

Hadoop: Spark, MapReduce, Pig, Hive, Sqoop and HBase.

Methodologies: Agile and Waterfall.

SQL: PostgreSQL, MySQL Server.

NoSQL: Cassandra & MongoDB.

Data warehousing: Teradata and Hive.

Data mining: RapidMiner and R.

Cloud services: Google Cloud services & Microsoft Azure.

Apache Spark: Core, DataFrames, Spark SQL, Spark Streaming.

IDEs: Eclipse, IntelliJ, Eclipse Scala IDE and PyCharm (Python).


Confidential, Virginia Beach

Big Data Engineer

Environment: Hadoop HDFS, MapReduce, Apache Spark 2.1.1, Apache Spark Streaming, Apache Kafka 0.10.2, Apache Cassandra 3.10, Apache ZooKeeper 3.4.10, Microsoft Azure, Scala 2.11.


  • Worked on a data pipeline (data processing job) built with Spark Streaming and Kafka.
  • Read input data, transformed it, and wrote out the resulting output.
  • Created a program with multiple pipelines.
  • Worked with pipeline transforms, composite transforms, and root transforms.
  • Processed both streaming and batch data using Apache Spark and Spark Streaming.
  • Streamed data into the Kafka cluster and fed the collected data to Spark Core through the Spark Streaming API.
  • Divided the data into small chunks (micro-batches) for processing.
  • Used Spark to clean and process data, extract relevant fields, and perform aggregations.
  • Performed data transformations using Spark RDDs and DataFrames.
  • Used Kafka as the messaging system, gathering data from different sources, processing it, and pushing the results to Kafka.
  • Developed a Kafka consumer program in Scala.
  • Used Cassandra as the NoSQL database and gained solid working experience with NoSQL databases.
  • Used Hadoop HDFS to archive incoming data and to perform ETL.
  • Used Apache ZooKeeper to maintain configuration information and naming, and to provide distributed synchronization and group services.
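
The pipeline described above reads input, splits it into small chunks, cleans it, extracts fields, aggregates, and writes results. A minimal plain-Python sketch of that read, transform, write flow follows. It does not use the Spark or Kafka APIs; the record format (`user,event,amount`), the batch size, and all function names are hypothetical. Spark Streaming cuts micro-batches by time interval, whereas this sketch cuts them by record count to stay deterministic.

```python
from collections import Counter
from itertools import islice


def micro_batches(records, batch_size):
    """Split an incoming stream into small fixed-size chunks.

    Spark Streaming cuts batches by time interval; cutting by record
    count here is a simplifying assumption that keeps the sketch deterministic.
    """
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def clean_and_extract(line):
    """Parse a hypothetical 'user,event,amount' record.

    Returns (event, amount), or None for malformed records, which are dropped.
    """
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 3 or not parts[0] or not parts[1]:
        return None
    try:
        return parts[1], float(parts[2])
    except ValueError:
        return None


def process(records, batch_size=3):
    """Read -> transform -> write: total the amount per event in each batch."""
    results = []
    for batch in micro_batches(records, batch_size):
        totals = Counter()
        for parsed in map(clean_and_extract, batch):
            if parsed is not None:
                event, amount = parsed
                totals[event] += amount
        # Stand-in for writing the batch result to a sink such as Cassandra.
        results.append(dict(totals))
    return results


stream = ["u1,click,1.0", "u2,view,2.5", "bad line", "u3,click,4.0", "u4,view,x"]
print(process(stream))  # [{'click': 1.0, 'view': 2.5}, {'click': 4.0}]
```

In an actual Spark Streaming job, the per-batch cleaning and aggregation would be expressed as transformations on each micro-batch rather than as a Python loop.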


Hadoop Developer

Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Shell Scripting, Apache Crunch, Ubuntu, Red Hat Linux.


  • Analyzed the Hadoop cluster and various big data analytics tools, including Pig, HBase, and Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
  • Used the Apache Crunch library on top of MapReduce and Spark for data aggregation.
  • Involved in loading data from the Linux file system into HDFS.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Created HBase tables to store PII (personally identifiable information) in various data formats coming from different portfolios.
  • Implemented a script to transfer sysprin information from Oracle to HBase using Sqoop.
  • Implemented best income logic using Pig scripts and UDFs.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Tuned the performance of Pig queries.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Responsible for managing data coming from different sources.
  • Involved in loading data from the file system into HDFS.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Provided cluster coordination services through ZooKeeper.
  • Managed and reviewed Hadoop log files.
  • Managed jobs using the Fair Scheduler.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
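
Aggregation jobs like those above run on the cluster as map, shuffle/sort, and reduce phases. As an illustration only, the Python below simulates that flow in a single process for a word count, in the style of a Hadoop Streaming job. It is not the project's Pig, Crunch, or Sqoop code; the function names are hypothetical, and a real streaming job would run the mapper and reducer as separate scripts reading standard input.

```python
from itertools import groupby


def mapper(lines):
    """Map phase: emit (word, 1) for every word, as a Hadoop Streaming
    mapper would print tab-separated key/value lines."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(sorted_pairs):
    """Reduce phase: Hadoop sorts map output by key before reducing, so
    equal keys arrive next to each other and can be summed with groupby."""
    for word, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in one process."""
    return dict(reducer(sorted(mapper(lines))))


print(run_local(["the cat sat", "The dog", "cat"]))
# {'cat': 2, 'dog': 1, 'sat': 1, 'the': 2}
```

The sort step is what makes the reducer simple: it only ever compares a key with the previous one, which is why it scales to inputs far larger than memory on a real cluster.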


Java Developer

Environment: Java, Eclipse IDE, HTML, PL/SQL.


  • Designed and developed the web application user interface and implemented its related functionality in Java/J2EE for the product.
  • Designed and developed applications using JSP, Servlets, and HTML.
  • Used Hibernate ORM as the object-relational mapping tool for back-end operations.
  • Provided the Hibernate configuration and mapping files and was involved in integrating Struts with the Hibernate libraries.
  • Used Java multithreading extensively to download files from a URL.
  • Used Eclipse IDE extensively for developing, debugging, integrating, and deploying the application.
  • Developed a web service client interface for invoking methods using SOAP.
  • Created a navigation component that reads the next page details from an XML config file.
  • Developed applications with HTML, JSP, and tag libraries.
  • Developed the required stored procedures and database functions using PL/SQL.
  • Developed, tested, and debugged various components in WebLogic Application Server.
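
The navigation component described above looks up the next page in an XML configuration file. The original was built in Java; to keep all examples in one language, here is a minimal Python sketch of the same idea using the standard-library `xml.etree.ElementTree` module. The `<navigation>` / `<page id="..." next="...">` schema and the function names are assumptions, not the actual config format.

```python
import xml.etree.ElementTree as ET

# Hypothetical navigation config: element and attribute names are assumptions.
NAV_XML = """
<navigation>
  <page id="login" next="search"/>
  <page id="search" next="results"/>
  <page id="results" next="checkout"/>
  <page id="checkout"/>
</navigation>
"""


def load_navigation(xml_text):
    """Build a {page id: next page id} map from the XML config."""
    root = ET.fromstring(xml_text.strip())
    return {page.get("id"): page.get("next") for page in root.iter("page")}


def next_page(nav, current):
    """Return the next page id, or None for the last or an unknown page."""
    return nav.get(current)


nav = load_navigation(NAV_XML)
print(next_page(nav, "search"))  # results
```

Parsing the file once into a dictionary and reusing it avoids re-reading the XML on every page transition, and the page order can change without touching code.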

Software Assurance and Risk Mitigation

Confidential, Montgomery


  • iPad technology in hospitals: doctors' use of iPads to enter all required patient information.
  • Security challenges: CIA (confidentiality, integrity, availability).
  • Gained experience with, and a deeper understanding of, software-induced security risks and how to manage them.
  • Standards: FIPS 200, NIST SP 800-30 Rev. 1, NIST SP 800-53 Rev. 4.

Confidential, Montgomery, Alabama

Quantitative Risk Assessment and Management


  • Conducted a research survey as part of the requirements of our Cyber Systems and Information Security degree program.
  • The survey consists of questions about risk areas associated with the respondent's particular field of work.
  • The purpose of the survey is to draw on experienced company personnel to help identify vulnerabilities in the banking sector.
  • The survey helps extract the specific threats that make up each vulnerability and any countermeasures employed to minimize the risk or impact of exploiting it.
  • Once a sufficient number of surveys has been collected from each company, the results are entered into a security risk assessment program developed at the university.
  • The program analyzes the vulnerability, threat, and countermeasure survey results, calculates the cost to mitigate the identified vulnerabilities, and produces an overall residual risk percentage.
  • The program then optimizes the results down to a desired risk percentage.
  • The optimized results include the cost to optimize and recommendations on which areas to focus on to achieve the best return on investment.
  • The optimization results are made available to the company along with a detailed explanation of the findings.
  • The survey and optimization results may be used in articles published by the university, but the company's name is never identified.
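
The university's risk assessment program is not described in detail here, so the Python below is only a sketch under assumed definitions. Residual risk is taken as the sum, over every vulnerability and threat, of v * t * (1 - c), where v and t are survey probabilities and c is countermeasure effectiveness in [0, 1]. "Optimizing down to a desired risk percentage" is approximated by greedily buying the improvement that removes the most risk per unit of cost. The data layout, `unit_cost`, `step`, and the sample numbers are all hypothetical.

```python
def residual_risk(survey):
    """Residual risk = sum of v * t * (1 - c) over every (vulnerability, threat).

    `survey` maps a vulnerability name to (v, {threat name: (t, c)}), where v
    and t are probabilities and c is countermeasure effectiveness in [0, 1].
    """
    return sum(
        v * t * (1.0 - c)
        for v, threats in survey.values()
        for t, c in threats.values()
    )


def optimize(survey, unit_cost, target, step=0.01):
    """Greedily raise countermeasures until residual risk <= target.

    unit_cost[(vulnerability, threat)] is a hypothetical price for raising
    that countermeasure by `step`. Each round buys the improvement that
    removes the most risk per unit of cost. Returns (total_cost, new_survey).
    """
    # Copy the nested dicts so the caller's survey is left unchanged.
    current = {name: (v, dict(threats)) for name, (v, threats) in survey.items()}
    total_cost = 0.0
    while residual_risk(current) > target:
        best = None
        for name, (v, threats) in current.items():
            for threat, (t, c) in threats.items():
                if c >= 1.0:
                    continue
                gain = v * t * min(step, 1.0 - c)
                ratio = gain / unit_cost[(name, threat)]
                if best is None or ratio > best[0]:
                    best = (ratio, name, threat)
        if best is None:  # every countermeasure is already at 100%
            break
        _, name, threat = best
        t, c = current[name][1][threat]
        current[name][1][threat] = (t, min(1.0, c + step))
        total_cost += unit_cost[(name, threat)]
    return total_cost, current


# Made-up inputs for illustration only.
survey = {
    "online banking": (0.6, {"credential theft": (1.0, 0.5)}),
    "ATM network": (0.4, {"skimming": (0.5, 0.0), "malware": (0.5, 0.8)}),
}
costs = {
    ("online banking", "credential theft"): 1.0,
    ("ATM network", "skimming"): 1.0,
    ("ATM network", "malware"): 1.0,
}
print(round(residual_risk(survey), 4))  # 0.54
cost, improved = optimize(survey, costs, target=0.30)
print(cost, round(residual_risk(improved), 4))
```

A greedy rule is easy to explain to a client (each purchase is the best value at that moment), but it is a heuristic; an exact method such as linear programming could find a cheaper mix when costs or step sizes differ.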
