Big Data Engineer Resume
Lafayette, LA
SUMMARY:
- Around 8 years of professional IT experience across all phases of the Software Development Life Cycle, including hands-on experience with Hadoop ecosystem technologies and Java/J2EE technologies.
- Over 4 years of hands-on experience using Hadoop ecosystem components like HDFS, MapReduce, Hive, Impala, Sqoop, Pig, Flume, and Spark.
- Over 3 years of experience in Java programming with hands-on work in the Spring, Struts, and Hibernate frameworks.
- Experienced with Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
- Good knowledge of implementing Apache Spark applications with Scala.
- Strong hands-on experience with PySpark, using Spark libraries through Python scripting for data analysis.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the sketch after this summary).
- Experienced in collecting real-time streaming data and building pipelines for raw data from different sources using Kafka, then storing the data into HDFS and NoSQL stores using Spark.
- Experience in developing Pig Latin scripts and Hive Query Language (HQL) queries.
- Working knowledge of Oozie, a workflow scheduler system used to manage jobs that run on Pig, Hive, and Sqoop.
- Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of big data.
- Experience in programming SQL, PL/SQL stored procedures, and triggers in Oracle and SQL Server.
- Well versed in core Java concepts such as collections, multithreading, serialization, and JavaBeans.
- Experience in implementing web services based on Service-Oriented Architecture (SOA) using SOAP and RESTful web services.
- Hands-on experience with version control tools like GitHub and SVN.
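A minimal PySpark sketch of the Hive managed/external table and partitioning approach mentioned above; the table names, columns, and HDFS path are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

# Hive support lets Spark create and query Hive managed and external tables.
spark = (SparkSession.builder
         .appName("hive-table-sketch")
         .enableHiveSupport()
         .getOrCreate())

# External table: the data lives at an HDFS path we control, partitioned by
# load date so queries can prune partitions. Names and path are placeholders.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
        order_id BIGINT,
        customer_id BIGINT,
        amount DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    STORED AS ORC
    LOCATION 'hdfs:///data/warehouse/sales_ext'
""")

# Managed table: Hive owns the data and removes it when the table is dropped.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_managed (
        order_id BIGINT,
        customer_id BIGINT,
        amount DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    STORED AS ORC
""")

# Bucketing: when created directly in Hive, either table could additionally be
# declared CLUSTERED BY (customer_id) INTO 16 BUCKETS to speed up joins and
# sampling; some Spark versions cannot create bucketed Hive tables themselves.
```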
TECHNICAL SKILLS:
- Hadoop
- Spark
- Kafka
- Python
- Scala
- Java
- CA Workload Automation (CA WA)
- Unix Shell scripting
- Apache Maven
- SQL Server
- Oracle
- MySQL
- GitHub
- SVN
- Hadoop Developer
- Java Developer
- Jenkins
- uDeploy (IBM UrbanCode Deploy)
- HTML
- CSS
- JavaScript
EXPERIENCE:
Confidential, Lafayette, LA
Big Data Engineer
Responsibilities:
- Developed a data pipeline using Kafka, HBase, Spark, and Hive to ingest, transform, and analyse data (a streaming sketch follows this role's Environment line).
- Applied transformations to raw input data consumed from Kafka topics and published the transformed data to new topics for further processing.
- Developed Scala scripts using both DataFrames/Datasets/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Involved in creating Hive tables, loading and analysing data with Hive queries, and writing complex Hive queries to transform the data.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for large volumes of data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
- Extensively worked on data export from Data Lake to target RDBMS.
- Extracted data from SQL Server and Oracle into HDFS using Sqoop; created and ran Sqoop jobs with incremental loads to populate Hive external tables (a wrapper sketch also follows this role's Environment line).
- Solved performance issues in Hive scripts by understanding joins, grouping, and aggregation and how they translate into MapReduce jobs.
- Extensively involved in query optimization in Hive query language.
- Involved in assimilating different structured and unstructured data and using Hive and Impala to aggregate and transform data required for reporting.
- Worked on automation scripting and monitoring using shell scripts.
- Defined, monitored, and managed scheduled and event-based workloads through ESP jobs.
- Worked with Jenkins for code builds and IBM UrbanCode Deploy for Hadoop code deployment.
- Worked with GitHub.
Environment: Apache Hadoop, Apache Spark, Scala, Kafka, HBase, Hive, Pig, Oozie, Python, SQL, CA WA Workstation (ESP), Toad, GitHub.
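A minimal sketch of the kind of Kafka-to-HDFS pipeline described above, using PySpark Structured Streaming; the broker, topic, schema, and paths are hypothetical placeholders, and the job assumes the spark-sql-kafka package is available.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = (SparkSession.builder
         .appName("kafka-to-hdfs-sketch")
         .getOrCreate())

# Assumed JSON layout of the raw events; the real schema would differ.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
])

# Consume the raw topic; broker and topic names are placeholders.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "raw_events")
       .load())

# Parse the Kafka value bytes into columns and drop malformed records.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*")
          .filter(col("event_id").isNotNull()))

# Land the cleaned stream on HDFS as Parquet for downstream Hive/Spark jobs.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/landing/events")
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()
```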
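The Sqoop incremental loads were driven from the command line; the sketch below is a hypothetical Python wrapper that builds that kind of command, with the connection string, credentials, table, and paths as placeholders.

```python
import subprocess

def sqoop_incremental_import(last_value: str) -> None:
    """Append rows newer than last_value from the source database into the
    HDFS directory backing a Hive external table. All identifiers are placeholders."""
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:sqlserver://dbhost:1433;databaseName=sales",
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqoop_pwd",
        "--table", "ORDERS",
        "--incremental", "append",
        "--check-column", "ORDER_ID",
        "--last-value", last_value,
        "--target-dir", "/data/warehouse/orders_ext",
        "--num-mappers", "4",
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # In practice the last processed value would come from a state file or metastore.
    sqoop_incremental_import("1000000")
```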
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Worked on migrating MapReduce programs into Spark transformations using Spark and Python (PySpark).
- Imported data from different sources like HDFS/HBase into Spark RDDs.
- Implemented log-aggregation and transforming data for analytics using Apache Kafka.
- Developed Spark programs (Spark streaming and Spark SQL) in Scala for in-memory data processing.
- Used Scala to write the code for all Spark use cases, performed data analytics on the Spark cluster, and performed map-side joins on RDDs (a broadcast-join sketch follows this role's Environment line).
- Developed Python code to gather data from HBase and designed the solution for implementation using PySpark.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data.
- Imported and exported data into HDFS and Hive using Sqoop and Flume.
- Performed big data processing using Hadoop, MapReduce, Sqoop, Oozie, and Impala.
- Developed Hive queries for the analysts.
- Responsible for data ingestion using tools like Flume.
- Imported and exported data between HDFS and relational databases using Sqoop.
- Applied various core Java concepts when developing MapReduce jobs.
- Optimized MapReduce algorithms using combiners and partitioners to deliver the best results.
- Involved in creating Hive Tables, loading with data and writing Hive queries, which will invoke and run MapReduce jobs in the backend.
- Built alert and monitoring scripts for applications and servers using Python and shell scripts (a monitoring sketch also follows this role's Environment line).
- Worked with GitHub.
Environment: Hadoop, Spark, Scala, Kafka, Hive, Pig, Sqoop, Flume, Oozie, Impala, MapReduce, Python, HBase, Cassandra, Shell scripting, SQL, Oracle 11g, Linux, GitHub.
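A minimal sketch of the map-side join idea from this role, done here as a PySpark broadcast join; the dataset names and values are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("map-side-join-sketch").getOrCreate()
sc = spark.sparkContext

# Small dimension data broadcast to every executor (placeholder values).
country_lookup = sc.broadcast({"US": "United States", "IN": "India"})

# Large fact data as an RDD of (user_id, country_code) pairs.
events = sc.parallelize([("u1", "US"), ("u2", "IN"), ("u3", "FR")])

# Map-side join: each partition joins against the broadcast copy locally,
# so the large RDD never has to be shuffled.
joined = events.map(
    lambda kv: (kv[0], country_lookup.value.get(kv[1], "Unknown"))
)

print(joined.collect())  # [('u1', 'United States'), ('u2', 'India'), ('u3', 'Unknown')]
```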
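The alert and monitoring scripts were specific to the applications involved; this is a small hypothetical Python sketch of the pattern, with the threshold, sender, recipient, and mail relay as placeholders.

```python
import shutil
import smtplib
from email.message import EmailMessage

DISK_THRESHOLD = 0.90            # alert when the filesystem is more than 90% full
ALERT_TO = "oncall@example.com"  # placeholder recipient

def disk_usage_ratio(path: str = "/") -> float:
    """Return the used/total ratio for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

def send_alert(subject: str, body: str) -> None:
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "monitor@example.com"
    msg["To"] = ALERT_TO
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:  # assumes a local mail relay
        smtp.send_message(msg)

if __name__ == "__main__":
    ratio = disk_usage_ratio("/")
    if ratio > DISK_THRESHOLD:
        send_alert("Disk usage alert", f"Root filesystem at {ratio:.0%}")
```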
Confidential, Detroit, MI
Hadoop Developer
Responsibilities:
- Developed Pig UDFs to process the data for analysis (a UDF sketch follows this role's Environment line).
- Involved in loading data from the Linux file system to HDFS.
- Involved in running ad-hoc queries through Pig Latin, Hive, or Java MapReduce.
- Used Sqoop to import data from RDBMS into the Hadoop Distributed File System (HDFS) and later analysed the imported data using Hadoop components.
- Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop services.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Developed Hive queries for the analysts.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Experienced in defining job flows using Oozie.
- Developed shell scripts to perform data profiling on the ingested data with the help of Hive bucketing.
- Working knowledge of NoSQL databases like HBase and Cassandra.
- Generated property lists for every application dynamically using Python.
- Managed batch jobs using UNIX shell and Perl scripts.
- Used SVN and GitHub as version control tools.
Environment: JDK 1.6, HDFS, MapReduce, Spark, YARN, Hive, Pig, Sqoop, Flume, Oozie, Impala, Cloudera, NoSQL (HBase, Cassandra), Oracle 11g, Python, Shell scripting, Perl, Linux, SVN, GitHub.
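The Pig UDFs here were project-specific; below is a minimal hypothetical Python (Jython) UDF of the kind Pig can register, with the function and field names assumed for illustration.

```python
# udfs.py - a hypothetical Pig UDF written in Python (Jython).
# It would be registered from Pig Latin roughly like:
#   REGISTER 'udfs.py' USING jython AS myudfs;
#   cleaned = FOREACH raw GENERATE myudfs.normalize_state(state);

try:
    # Pig's Jython engine predefines outputSchema when it loads the script.
    outputSchema
except NameError:
    def outputSchema(schema):  # no-op fallback so the file also runs standalone
        def wrap(func):
            return func
        return wrap

@outputSchema("state:chararray")
def normalize_state(value):
    """Trim and upper-case a state code; return None for empty input."""
    if value is None:
        return None
    value = value.strip().upper()
    return value or None

if __name__ == "__main__":
    print(normalize_state("  la "))  # LA
```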
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in different phases to gather requirements, document the functional specifications, design, data modeling and development of the applications.
- Developed J2EE front-end and back-end components supporting business logic, integration, and persistence.
- Used JSP with Spring Framework for developing User Interfaces.
- Developed the front-end user interface using J2EE, Servlets, JDBC, HTML, DHTML, CSS, XML, XSL, XSLT and JavaScript as per Use Case Specification.
- Integrated Security Web Services for authentication of users.
- Used Hibernate Object/Relational mapping and persistence framework as well as a Data Access abstraction Layer.
- Data Access Objects (DAO) framework is bundled as part of the Hibernate Database Layer.
- Designed Data Mapping XML documents that are utilized by Hibernate, to call stored procedures.
- Implemented web services to integrate different applications (internal and third-party components) using SOAP and RESTful services with Apache CXF.
- Developed and published web-services using SOAP.
- Developed efficient PL/SQL packages for data migration and involved in bulk loads, testing and reports generation.
- Developed complex SQL queries and stored procedures to process and store the data.
- Used CVS version control to maintain the Source Code.
Environment: Java, J2EE, JSPs, Struts, EJB, Spring, RESTful, SOAP, Apache CXF, WebSphere, PL/SQL, Hibernate, HTML, XML, Oracle 9i, Swing, JavaScript, CVS.