Big Data Engineer Resume
Franklin Lakes, NJ
SUMMARY:
- 14+ years of professional ITES and IT experience, including 4+ years in the Hadoop/Big Data ecosystem.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode.
- Hands-on experience in installing and deploying Hadoop ecosystem components such as MapReduce, YARN, HDFS, NoSQL (HBase), Oozie, Hive, Tableau, Sqoop, ZooKeeper, and Flume.
- Working knowledge of the PySpark APIs.
- Strong experience with Hadoop distributions such as Cloudera and Hortonworks.
- Hands-on experience developing Hadoop architecture within projects on Windows and Linux platforms.
- Good technical skills in Oracle 11i, SQL Server, and ETL development using Informatica.
- Expert in importing and exporting data between Oracle/MySQL databases and HDFS using Sqoop and Flume.
- Experience in ingesting streaming data into Hadoop clusters using Flume and Kafka.
- Performed data analytics using Pig and Hive for data architects and data scientists on the team.
- Experience with NoSQL databases such as HBase and Cassandra, as well as other ecosystem tools such as ZooKeeper, Oozie, and Storm.
- Experience in job scheduling using automation tools such as Control-M, JAMS, and Autosys.
- Developed stored procedures and queries using PL/SQL.
- Expertise in RDBMSs such as Oracle, MS SQL Server, Teradata, MySQL, and DB2.
- Strong analytical skills with the ability to quickly understand a client's business needs; involved in meetings to gather information and requirements from clients; led teams and coordinated onsite and offshore work.
SKILLS & ABILITIES:
Analytical Tools: SQL, Jupyter Notebook, Tableau
NoSQL: Cassandra, HBase, MongoDB
Hadoop Distributions: Cloudera, Hortonworks
Workload Automation Tools: Control-M, JAMS, Autosys
Big Data: Spark, PySpark, Scala, Hive, Sqoop, HBase, Hadoop, HDFS, Flume, Shell scripting
Databases: Oracle 11g/10g, DB2 8.1, MS SQL Server, MySQL
Operating Systems: Unix / Linux, Windows 2000/NT/XP
PROFESSIONAL EXPERIENCE:
Confidential, Franklin Lakes, NJ
Big Data Engineer
Responsibilities:
- Involved in requirement gathering and business analysis, and translated business requirements into technical designs for Hadoop and Big Data.
- Created Hive partitioned and bucketed tables to improve performance.
- Involved in capacity planning and configuration of the Cassandra cluster on DataStax.
- Designed, developed, and implemented performant ETL pipelines using the Python API of Apache Spark (PySpark).
- Wrote reusable, testable, and efficient code; used Sqoop to import data into Cassandra tables from different relational databases.
- Imported and exported data between relational databases and HDFS using Sqoop.
- Worked extensively on the Spark Core and Spark SQL modules using Python and Scala.
- Worked closely with the Business Intelligence team on data analysis, reporting dashboards, and solution recommendations.
- Used Spark Streaming to receive real-time data from Kafka and store the streams in HDFS and databases such as HBase, using Python (PySpark) and Scala (see the sketch after this job entry).
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Performed performance tuning of PySpark scripts.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Responsible for exporting analyzed data to relational databases using Sqoop.
- Created tables in Hive and integrated data between Hive and Spark.
- Responsible for tuning Hive to improve performance.
- Documented the technical details of Hadoop cluster management and the daily batch pipeline, which includes several Hive, Sqoop, Oozie, and other scripted jobs.
Environment: Cassandra, HDFS, HBase, Spark, PySpark, Hortonworks, Hive, Oozie, YARN, and Sqoop
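Illustrative sketch of the Kafka-to-HDFS streaming path referenced above: a minimal PySpark Structured Streaming job that reads a Kafka topic and persists the records to HDFS. The broker, topic, and path names are hypothetical placeholders, not details from the actual project.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Requires the Spark-Kafka connector package (spark-sql-kafka) on the classpath.
spark = (SparkSession.builder
         .appName("kafka-to-hdfs-sketch")
         .getOrCreate())

# Subscribe to a Kafka topic as a streaming DataFrame (broker and topic are placeholders).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "claims-events")
          .load()
          .select(col("value").cast("string").alias("payload")))

# Persist the stream to HDFS as Parquet, with checkpointing for fault tolerance.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streaming/claims")
         .option("checkpointLocation", "hdfs:///checkpoints/claims")
         .start())

query.awaitTermination()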
Confidential, El Segundo, CA
Big Data Developer
Responsibilities:
- Defined the metrics for Big Data analytics proofs of concept
- Defined the requirements for data lakes and pipelines
- Created tables in Hive and integrated data between Hive and Spark
- Developed Python scripts to collect data from source systems and store it on HDFS to run analytics
- Created Hive partitioned and bucketed tables to improve performance (see the sketch after this job entry)
- Created Hive tables with user-defined functions
- Involved in code reviews and bug fixing to improve performance
- Worked extensively on the Spark Core and Spark SQL modules using Scala
- Performed extensive studies of different technologies and captured metrics by running different algorithms
- Converted SAS algorithms to other technologies
- Designed and implemented data ingestion techniques for real-time data coming from various source systems
- Defined data layouts and rules in consultation with ETL teams
- Worked in an Agile environment and participated in daily stand-ups/Scrum meetings
Environment: Python, HDFS, MapReduce, PL/SQL, Hive, Spark, Agile, Spark SQL, Scala, Sqoop, Oracle, Quality Center, Windows.
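Illustrative sketch of the partitioned and bucketed tables referenced above, using the PySpark DataFrame writer; the table name, columns, sample rows, and bucket count are hypothetical placeholders rather than details from the actual project.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Placeholder data; the real pipelines collected data from source systems onto HDFS.
sales = spark.createDataFrame(
    [(101, 250.00, "2017-03-01", "NJ"), (102, 80.50, "2017-03-02", "CA")],
    ["customer_id", "amount", "sale_date", "state"],
)

# Partitioning by state prunes scans; bucketing by customer_id supports
# bucket-based joins on that key.
(sales.write
      .partitionBy("state")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("sales_by_state"))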
Confidential
Big Data Consultant
Responsibilities:
- Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Hive, HBase, and Sqoop.
- Coordinated with business customers to gather business requirements and interacted with technical peers to derive technical requirements.
- Extensively involved in the design phase and delivered design documents.
- Involved in testing and coordinated user testing with the business.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Imported and exported data into HDFS and Hive using Sqoop.
- Transformed data using Spark applications for analytics consumption.
- Wrote Hive jobs to parse logs and structure them in a tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Experienced in defining job flows.
- Experienced in developing ETL data pipelines using PySpark (see the sketch after this job entry).
- Used Hive to analyze the partitioned data and compute various metrics for reporting.
- Experienced in managing and reviewing the Hadoop log files.
- Used an ETL tool to perform transformations, event joins, and some pre-aggregations.
- Loaded and transformed large sets of structured and semi-structured data.
- Responsible for managing data coming from different sources.
- Created the data model for Hive tables.
- Involved in unit testing and delivered unit test plans and results documents.
- Exported data from the HDFS environment into an RDBMS using Sqoop for report generation and visualization.
- Worked on Oozie workflow engine for job scheduling.
Environment: Hadoop, HDFS, Spark, Cloudera Manager, Hive, YARN, Sqoop, HBase, Oozie, Flume.
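Illustrative sketch of the log-parsing ETL referenced above: a minimal PySpark job that extracts structured columns from raw log lines and saves them as a Hive table for querying. The input path, log pattern, and table name are hypothetical placeholders, not details from the actual project.

from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_extract

spark = (SparkSession.builder
         .appName("log-etl-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Raw access-log lines landed on HDFS (path and pattern are placeholders).
raw = spark.read.text("hdfs:///landing/app_logs/*.log")

pattern = r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3})'

# Pull structured columns out of each line so the data can be queried effectively.
logs = raw.select(
    regexp_extract("value", pattern, 1).alias("host"),
    regexp_extract("value", pattern, 2).alias("timestamp"),
    regexp_extract("value", pattern, 3).alias("method"),
    regexp_extract("value", pattern, 4).alias("path"),
    regexp_extract("value", pattern, 5).cast("int").alias("status"),
)

# Persist as a Hive table for downstream reporting queries.
logs.write.mode("overwrite").saveAsTable("access_logs")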
Confidential
Data Processing Analyst
Responsibilities:
- Processed applications received from across India.
- Downloaded the scanned images from the data source.
- Created a database by sequencing the images in the order in which they were to be processed.
- Digitized the data by keying it into the SQL database.
- Converted the Excel data into tables to make it available to users.
- Arranged the tables according to user requirements using the Oracle database.
Environment: SQL database, PL/SQL, MS SQL.
Confidential, Harrisburg, PA
Process Associate
Responsibilities:
- Received claim forms as images in Lotus Notes.
- Performed batch processing by typing the numbers and information from the images into a template.
- Performed vertexing (verifying text) by reviewing the claim forms and keying the information into the database.
- Separated multi-page images into batches and then joined them.
- Processed medical claim forms.
- Reviewed database entries completed by other vertexers.
Environment: IBM Lotus Notes