Big Data Engineer Resume

Lafayette, LA

SUMMARY:

  • Around 8 years of professional IT experience in all phases of the Software Development Life Cycle, including hands-on experience with Hadoop ecosystem and Java/J2EE technologies.
  • Over 4 years of hands-on experience with Hadoop ecosystem components such as HDFS, MapReduce, Hive, Impala, Sqoop, Pig, Flume, and Spark.
  • Over 3 years of experience in Java programming, with hands-on work in the Spring, Struts, and Hibernate frameworks.
  • Experienced with Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
  • Good knowledge of implementing Apache Spark applications with Scala.
  • Strong hands-on experience with PySpark, using Python scripting against the Spark libraries for data analysis.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the sketch after this list).
  • Experienced in collecting real-time streaming data and building pipelines for raw data from different sources using Kafka, storing the data into HDFS and NoSQL stores with Spark.
  • Experience in developing Pig Latin scripts and Hive Query Language (HiveQL) queries.
  • Working knowledge of Oozie, a workflow scheduler used to manage Pig, Hive, and Sqoop jobs.
  • Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
  • Experience programming SQL, PL/SQL stored procedures, and triggers in Oracle and SQL Server.
  • Well versed in core Java concepts such as collections, multithreading, serialization, and JavaBeans.
  • Experience implementing web services based on Service-Oriented Architecture (SOA) using SOAP and RESTful web services.
  • Hands-on experience with version control tools such as GitHub and SVN.
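
For illustration, a minimal PySpark sketch of the partitioned, bucketed external Hive table design described above; the database, table, column, and path names are hypothetical:

    from pyspark.sql import SparkSession

    # Hive support lets Spark create and query managed/external tables.
    spark = (SparkSession.builder
             .appName("hive-table-design")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS sales")

    # External table: the data lives at an HDFS path Hive does not own,
    # so dropping the table leaves the files in place. Partitioning by
    # date prunes directories at query time; bucketing by customer_id
    # speeds up joins and sampling on that key.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS sales.transactions (
            txn_id      STRING,
            customer_id STRING,
            amount      DOUBLE
        )
        PARTITIONED BY (txn_date STRING)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC
        LOCATION 'hdfs:///data/sales/transactions'
    """)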

TECHNICAL SKILLS:

  • Hadoop
  • Spark
  • Kafka
  • Python
  • Scala
  • Java
  • CA Workload Automation (ESP)
  • Unix Shell scripting
  • Apache Maven
  • SQL Server
  • Oracle
  • MySQL
  • GitHub
  • SVN
  • Jenkins
  • IBM UrbanCode Deploy (uDeploy)
  • HTML
  • CSS
  • JavaScript

EXPERIENCE:

Confidential, Lafayette, LA

Big Data Engineer

Responsibilities:

  • Developed a data pipeline using Kafka, HBase, Spark, and Hive to ingest, transform, and analyze data (a sketch of this pattern follows this list).
  • Transformed raw input data consumed from Kafka topics and published the results to new topics for further processing.
  • Developed Scala scripts using both DataFrames/SQL/Datasets and RDDs in Spark for data aggregation and queries, writing results back into the OLTP system through Sqoop.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs, in both Scala and Python.
  • Created Hive tables, loaded and analyzed data with Hive queries, and wrote complex HiveQL to transform the data.
  • Used Spark for interactive queries, streaming data processing, and integration with NoSQL databases for high data volumes.
  • Extensively worked on exporting data from the data lake to target RDBMSs.
  • Extracted data from SQL Server and Oracle into HDFS using Sqoop; created and ran Sqoop jobs with incremental loads to populate Hive external tables.
  • Solved performance issues in Hive scripts by understanding how joins, grouping, and aggregation translate into MapReduce jobs.
  • Extensively involved in query optimization in HiveQL.
  • Assimilated structured and unstructured data, using Hive and Impala to aggregate and transform the data required for reporting.
  • Wrote shell scripts for automation and monitoring.
  • Defined, monitored, and managed scheduled and event-based workloads through ESP jobs.
  • Worked with Jenkins for code builds and IBM UrbanCode Deploy for Hadoop code deployment.
  • Worked with GitHub.
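
A minimal sketch of the Kafka-to-HDFS pattern used in this pipeline, written with Spark Structured Streaming in PySpark; it assumes the spark-sql-kafka package is on the classpath, and the broker, topic, schema, and path names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import DoubleType, StringType, StructType

    spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

    # Assumed shape of the JSON events on the topic.
    schema = (StructType()
              .add("event_id", StringType())
              .add("amount", DoubleType()))

    # Consume raw events from a Kafka topic.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")
           .option("subscribe", "raw-events")
           .load())

    # Parse the JSON payload and flatten it into columns.
    parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), schema).alias("e"))
              .select("e.*"))

    # Land the transformed stream on HDFS as Parquet files.
    (parsed.writeStream
     .format("parquet")
     .option("path", "hdfs:///data/events")
     .option("checkpointLocation", "hdfs:///checkpoints/events")
     .start()
     .awaitTermination())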

Environment: Apache Hadoop, Apache Spark, Scala, Kafka, HBase, Hive, Pig, Oozie, Python, SQL, CA WA Workstation (ESP), Toad, GitHub.

Confidential, Charlotte, NC

Hadoop Developer

Responsibilities:

  • Worked on migrating MapReduce programs into Spark transformations using Spark and Python (PySpark).
  • Imported data from sources such as HDFS and HBase into Spark RDDs.
  • Implemented log aggregation and data transformation for analytics using Apache Kafka.
  • Developed Spark programs (Spark Streaming and Spark SQL) in Scala for in-memory data processing.
  • Wrote Scala code for all Spark use cases, performed data analytics on the Spark cluster, and implemented map-side joins on RDDs (see the sketch after this list).
  • Developed Python code to gather data from HBase and designed the solution for implementation in PySpark.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data.
  • Imported and exported data into HDFS and Hive using Sqoop and Flume.
  • Performed big data processing using Hadoop, MapReduce, Sqoop, Oozie, and Impala.
  • Developed Hive queries for the analysts.
  • Responsible for data ingestion using tools such as Flume.
  • Imported and exported data between HDFS and relational databases using Sqoop.
  • Applied various core Java concepts when writing MapReduce jobs.
  • Optimized MapReduce jobs using combiners and partitioners to deliver the best results.
  • Created Hive tables, loaded them with data, and wrote Hive queries that invoke MapReduce jobs in the backend.
  • Built alerting and monitoring scripts for applications and servers using Python and shell scripts.
  • Worked with GitHub.
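
As an illustration of the map-side join mentioned above, a minimal PySpark sketch that broadcasts a small lookup table so each record is joined inside the map task, avoiding a shuffle; the data and names are made up:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("map-side-join").getOrCreate()
    sc = spark.sparkContext

    # Small dimension table, shipped once to every executor.
    countries = sc.broadcast({"US": "United States", "DE": "Germany"})

    orders = sc.parallelize([("o1", "US", 25.0), ("o2", "DE", 40.0)])

    # The lookup happens inside the map task against the broadcast
    # dict, so no shuffle is needed, unlike a regular RDD join.
    joined = orders.map(
        lambda o: (o[0], countries.value.get(o[1], "unknown"), o[2]))
    print(joined.collect())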

Environment: Hadoop, Spark, Scala, Kafka, Hive, Pig, Sqoop, Flume, Oozie, Impala, MapReduce, Python, HBase, Cassandra, Shell scripting, SQL, Oracle 11g, Linux, GitHub.

Confidential, Detroit, MI

Hadoop Developer

Responsibilities:

  • Developed Pig UDFs to process the data for analysis.
  • Loaded data from the Linux file system into HDFS.
  • Ran ad-hoc queries using Pig Latin, Hive, or Java MapReduce.
  • Used Sqoop to import data from RDBMSs into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
  • Proficient with Cloudera Manager, an end-to-end tool for managing Hadoop services.
  • Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
  • Developed Hive queries for the analysts.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Defined job flows using Oozie.
  • Developed shell scripts to perform data profiling on the ingested data with the help of Hive bucketing (see the profiling sketch after this list).
  • Working knowledge of NoSQL databases such as HBase and Cassandra.
  • Dynamically generated property lists for every application using Python.
  • Managed batch jobs using UNIX shell and Perl scripts.
  • Used SVN and GitHub as version control tools.
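
A minimal sketch of the kind of per-column profiling described above, shown here in PySpark rather than shell; the table and schema are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, count, countDistinct, when

    spark = (SparkSession.builder
             .appName("data-profiling")
             .enableHiveSupport()
             .getOrCreate())

    df = spark.table("staging.customer_ingest")  # hypothetical table

    # Row count plus null and distinct counts per column give a quick
    # picture of the completeness and cardinality of the ingested data.
    total = df.count()
    for c in df.columns:
        stats = df.agg(
            count(when(col(c).isNull(), 1)).alias("nulls"),
            countDistinct(col(c)).alias("distinct"),
        ).first()
        print(f"{c}: rows={total} nulls={stats['nulls']} "
              f"distinct={stats['distinct']}")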

Environment: JDK 1.6, HDFS, MapReduce, Spark, YARN, Hive, Pig, Sqoop, Flume, Oozie, Impala, Cloudera, HBase, Cassandra, Oracle 11g, Python, Shell scripting, Perl, Linux, SVN, GitHub.

Confidential

Java/J2EE Developer

Responsibilities:

  • Involved in all project phases: requirements gathering, functional specification documentation, design, data modeling, and application development.
  • Developed J2EE front-end and back-end components supporting business logic, integration, and persistence.
  • Used JSP with the Spring Framework to develop user interfaces.
  • Developed the front-end user interface using J2EE, Servlets, JDBC, HTML, DHTML, CSS, XML, XSL, XSLT, and JavaScript per the use case specifications.
  • Integrated Security Web Services for authentication of users.
  • Used Hibernate Object/Relational mapping and persistence framework as well as a Data Access abstraction Layer.
  • The Data Access Object (DAO) framework is bundled as part of the Hibernate database layer.
  • Designed Data Mapping XML documents that are utilized by Hibernate, to call stored procedures.
  • Implemented web services to integrate different applications (internal and third-party components) using SOAP and RESTful services with Apache CXF.
  • Developed and published web-services using SOAP.
  • Developed efficient PL/SQL packages for data migration and involved in bulk loads, testing and reports generation.
  • Developed complex SQL queries and stored procedures to process and store the data.
  • Used CVS version control to maintain the Source Code.

Environment: Java, J2EE, JSP, Struts, EJB, Spring, RESTful, SOAP, Apache CXF, WebSphere, PL/SQL, Hibernate, HTML, XML, Oracle 9i, Swing, JavaScript, CVS.
