Hadoop/Spark Developer Resume
Ohio
SUMMARY:
- Detail-oriented Big Data Engineer/Hadoop Developer with around 4 years of total IT experience.
- Excellent knowledge of Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Sqoop, Spark, Flume, HBase, Kafka, Oozie, ZooKeeper, and the YARN programming paradigm.
- Proficient in importing data from and exporting data to Relational Database Management Systems (RDBMS).
- Experience in Spark Streaming to ingest data from multiple data sources into HDFS.
- Experience using PL/SQL to write Stored Procedures, Functions and Triggers in Oracle.
- Experience with partitioned tables, UDFs, performance tuning, and compression-related properties in Hive.
- Hands-on expertise in designing and developing Spark applications in Scala and comparing Spark performance against Hive.
- Strong knowledge of optimizing existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs (a brief sketch follows this summary).
- Proficient in using Apache Kafka to track data ingestion into the Hadoop cluster and in implementing custom Kafka encoders to load data in custom input formats into Kafka partitions.
- Solid knowledge of data transformations using MapReduce and Hive across different file formats.
- Excellent knowledge of AWS S3, EC2, Redshift, and Lambda.
- Working knowledge of Java and J2EE.
- Knowledge of job workflow scheduling and monitoring tools such as the Oozie scheduler.
- Experience with scripting languages such as Linux/Unix shell scripting and Python.
- Experience using SequenceFile, Avro, and Parquet file formats and managing Hadoop log files.
- Experience configuring ZooKeeper to coordinate the servers in a cluster and maintain data consistency.
- Detailed understanding of the Software Development Life Cycle (SDLC) and strong knowledge of project implementation methodologies such as Agile and Waterfall.
- Excellent technical, communication, analytical, problem-solving, and troubleshooting capabilities.
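The Spark optimization work above can be illustrated with a minimal, hypothetical sketch in Scala: the same keyed aggregation expressed first with a pair RDD (reduceByKey) and then through the DataFrame/Spark SQL API. The HDFS path and column names are placeholders, not taken from any specific project.

```scala
import org.apache.spark.sql.SparkSession

object AggregationComparison {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PairRddVsDataFrame").getOrCreate()
    val sc = spark.sparkContext
    import spark.implicits._

    // Pair RDD aggregation: reduceByKey combines values map-side before the shuffle,
    // which is the usual optimization over groupByKey for large keyed datasets.
    val counts = sc.textFile("hdfs:///data/events/*.log")   // hypothetical input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1L))
      .reduceByKey(_ + _)
    counts.take(10).foreach(println)

    // The same aggregation expressed through the DataFrame/Spark SQL API,
    // where the Catalyst optimizer plans the shuffle and aggregation.
    val words = spark.read.textFile("hdfs:///data/events/*.log")
      .flatMap(_.split("\\s+"))
      .toDF("word")
    words.groupBy("word").count()
      .orderBy($"count".desc)
      .show(10)

    spark.stop()
  }
}
```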
TECHNICAL SKILLS:
Languages: Java, Scala, Python, R (RStudio), C/C++, SQL, PL/SQL, HiveQL, J2EE
Big Data Technologies: HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Kafka, YARN, ZooKeeper, Hue, Flume, CDH 5.14, Oozie
Databases: Oracle 11g/10g, MS Access, MySQL, SQL Server, IBM DB2, NoSQL (HBase, Cassandra)
Web Technologies/Tools: JavaScript, HTML5, CSS3, JSP, Servlets, JSON, XML, AWS S3, EC2
IDEs & Tools: Eclipse, IntelliJ IDEA, NetBeans, Jupyter Notebooks, SBT, Apache Tomcat, WebLogic
Methodologies: Agile, Waterfall
Version Control: Git, SVN
OS & Tools: Windows, UNIX, Linux, PuTTY, WinSCP, FileZilla, Power BI, Tableau, Maven
PROFESSIONAL EXPERIENCE:
Hadoop/Spark Developer
Confidential, Ohio
Responsibilities:
- Implemented various proofs of concept (POCs) to evaluate Big Data technologies.
- Assisted in setting up the environment for continuous deployments.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure the architecture would serve the right purpose.
- Loaded data using Sqoop from different RDBMS servers such as Oracle, MySQL, and mainframe sources into the Hadoop HDFS cluster.
- Applied a deep understanding of Java and performed complex data transformations in Spark using Scala.
- Built data quality services for actively controlling the data ingested into the platform.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data, using Avro and Parquet file formats for data serialization.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed real-time processing of Wi-Fi data logs using Spark.
- Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, and loaded data into the Parquet tables from Avro-based Hive tables.
- Developed data-unloading microservices for the semantic layer using the Spark DataFrame API in Scala.
- Created data provisioning services that allow clients to query large datasets.
- Developed a logging service using Kafka and Spark Streaming to capture data in real time (see the sketch after this section).
- Built data governance policies, processes, procedures, and controls for the data platform.
- Collaborated with other project teams and assisted them in using the data platform services.
Environment: Cloudera, Spark, Python, Hive, Java, Scala, SQL, Sqoop, Maven, Kafka, Spark Streaming, Flume, Hue, Impala, HBase.
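A hedged sketch of what the Kafka/Spark Streaming logging service described above could look like, using the spark-streaming-kafka-0-10 direct stream API. The broker list, consumer group, topic name (wifi-logs), and HDFS output path are illustrative assumptions, not details from the actual project.

```scala
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object WifiLogStreamer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WifiLogStreamer")
    val ssc = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

    // Kafka connection settings; broker list and group id are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "wifi-log-streamer",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct stream from the (hypothetical) wifi-logs topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("wifi-logs"), kafkaParams)
    )

    // Persist each micro-batch of raw log lines to HDFS for downstream Hive/Spark jobs
    stream.map((record: ConsumerRecord[String, String]) => record.value())
      .foreachRDD { (rdd, time) =>
        if (!rdd.isEmpty()) {
          rdd.saveAsTextFile(s"hdfs:///data/wifi_logs/batch_${time.milliseconds}")
        }
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```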
Big Data Developer
Confidential, Boston, MA
Responsibilities:
- Responsible for managing data coming from different sources; involved in HDFS maintenance and in loading structured and unstructured data.
- Involved in loading data from the Linux file system into the CDH Hadoop Distributed File System using Sqoop import with different append modes.
- Developed data pipelines using Flume and Sqoop to ingest student behavioral data, preparation time, and materials studied into HDFS for analysis.
- Applied expertise in newer technologies such as Apache Spark and Scala programming.
- Used Scala to perform data validation on the data ingested via Sqoop and Flume, and pushed the cleansed dataset into HBase (see the sketch after this section).
- Involved in HBase setup and in storing data in HBase for further analysis.
- Used Hive-HBase integration to analyze data ingested into HBase and computed various metrics for dashboard reporting.
- Developed job flows in Oozie to automate workflows for Pig and Hive jobs.
- Loaded the aggregated data from the Hadoop environment into DB2 using Sqoop for dashboard reporting.
- Facilitated knowledge transfer sessions.
- Worked in an Agile development environment with two-week sprint cycles, dividing and organizing tasks; participated in daily scrums and other design-related meetings.
Environment: Cloudera, Spark, Python, Hive, Java, Scala, SQL, Sqoop, J2EE, Impala, YARN, Lambda, Tableau, Eclipse, HBase, Agile, Waterfall, Git, SVN.
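A minimal sketch, in Scala with Spark and the standard HBase client API, of the validation-then-HBase load described in this section. The landing path, the student_activity table, the cf column family, and the column names are hypothetical placeholders.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession

object StudentActivityLoader {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("StudentActivityLoader").getOrCreate()

    // Hypothetical landing directory populated by the Sqoop/Flume ingestion
    val raw = spark.read
      .option("header", "true")
      .csv("hdfs:///data/landing/student_activity")

    // Basic validation: require a student id and a numeric preparation time
    val cleansed = raw
      .filter(raw("student_id").isNotNull)
      .filter(raw("prep_minutes").cast("int").isNotNull)

    // Write each partition to HBase with the standard client API
    // (assumes hbase-site.xml is on the executor classpath and the table exists)
    cleansed.rdd.foreachPartition { rows =>
      val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
      val table = conn.getTable(TableName.valueOf("student_activity"))
      rows.foreach { row =>
        val put = new Put(Bytes.toBytes(row.getAs[String]("student_id")))
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("prep_minutes"),
          Bytes.toBytes(row.getAs[String]("prep_minutes")))
        table.put(put)
      }
      table.close()
      conn.close()
    }

    spark.stop()
  }
}
```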
Hadoop Developer
Confidential
Responsibilities:
- Designed Sqoop jobs to import large sets of structured data from DB2 into HDFS and to export data for report analysis.
- Responsible for the documentation, design, development, and architecture of Hadoop applications.
- Worked on Spark SQL, created DataFrames by loading data from Hive tables, and stored the prepared data in AWS S3 (see the sketch after this section).
- Designed, built, installed, configured, and supported Hadoop.
- Developed simple to complex MapReduce jobs in Java for processing and validating data.
- Built UDFs (User Defined Functions) in Pig and Hive as needed and developed Pig scripts for processing data.
- Adopted the Oozie workflow engine to run multiple Hive and Sqoop jobs.
- Wrote multiple Hive queries for data analysis to meet the business requirements.
- Created Hive internal and external tables on top of HDFS.
- Developed a Flume ETL job with an HTTP source and an HDFS sink, and developed a Kafka consumer in Scala for consuming data from Kafka topics.
- Processed data using Spark SQL in-memory computation and persisted the results to Hive tables.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data, using Avro and Parquet file formats for data serialization.
- Created graphical reports in Tableau for data visualization.
- Scheduled and maintained several batch jobs to run automatically depending on business requirements.
Environment: Spark, Scala, Sqoop, Python, Bash scripting, AWS S3, Redshift, GitHub, Hive, MapReduce, DB2, EC2, shell scripting, Oozie, Flume, Java, J2EE, Impala, YARN, Lambda, Tableau, Cloudera, Eclipse, HBase.
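A short sketch of the Spark SQL preparation step referenced above: loading a Hive table into a DataFrame, filtering and selecting the reporting columns, and writing Parquet to S3 via the s3a connector. The database, table, column names, and bucket are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object HiveToS3Prep {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToS3Prep")
      .enableHiveSupport()   // lets Spark read tables registered in the Hive metastore
      .getOrCreate()

    // Load the source Hive table into a DataFrame (names are placeholders)
    val orders = spark.table("analytics.orders")

    // Prep step: keep only completed orders and the columns needed for reporting
    val prep = orders
      .filter("order_status = 'COMPLETE'")
      .select("order_id", "customer_id", "order_total", "order_date")

    // Persist the prepared data as Parquet in S3
    prep.write
      .mode("overwrite")
      .parquet("s3a://example-bucket/prep/orders/")

    spark.stop()
  }
}
```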