Hadoop/Spark Developer Resume
Ohio
SUMMARY:
- Detail-oriented Big Data Engineer/Hadoop Developer with around 4 years of total IT experience.
- Excellent knowledge of Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Sqoop, Spark, Flume, HBase, Kafka, Oozie, ZooKeeper, and the YARN programming paradigm.
- Proficient in importing data from and exporting data to Relational Database Management Systems (RDBMS).
- Experience in Spark Streaming to ingest data from multiple data sources into HDFS.
- Experience using PL/SQL to write Stored Procedures, Functions and Triggers in Oracle.
- Experience with partitioned tables, UDFs, performance tuning, and compression-related properties in Hive.
- Hands-on expertise in designing and developing Spark applications in Scala and comparing Spark performance against Hive.
- Strong knowledge of optimizing existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs (a brief sketch follows this summary).
- Proficient in using Apache Kafka to track data ingestion into the Hadoop cluster and in implementing custom Kafka encoders to load data in custom input formats into Kafka partitions.
- Solid knowledge of data transformations using MapReduce and Hive across different file formats.
- Excellent knowledge of AWS S3, EC2, Redshift, and Lambda.
- Working knowledge of Java and J2EE.
- Knowledge of job workflow scheduling and monitoring tools such as the Oozie scheduler.
- Experience with scripting languages such as Linux/Unix shell scripting and Python.
- Experience using SequenceFile, Avro, and Parquet file formats and managing Hadoop log files.
- Experience configuring ZooKeeper to coordinate the servers in a cluster and maintain data consistency.
- Detailed understanding of the Software Development Life Cycle (SDLC) and strong knowledge of project implementation methodologies such as Agile and Waterfall.
- Excellent technical, communication, analytical, problem-solving, and troubleshooting capabilities.
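The Spark optimization work above can be illustrated with a minimal, hypothetical sketch in Scala: the same keyed aggregation expressed first with a pair RDD (reduceByKey) and then through the DataFrame/Spark SQL API. The HDFS path and column names are placeholders, not taken from any specific project.

```scala
import org.apache.spark.sql.SparkSession

object AggregationComparison {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PairRddVsDataFrame").getOrCreate()
    val sc = spark.sparkContext
    import spark.implicits._

    // Pair RDD aggregation: reduceByKey combines values map-side before the shuffle,
    // which is the usual optimization over groupByKey for large keyed datasets.
    val counts = sc.textFile("hdfs:///data/events/*.log")   // hypothetical input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1L))
      .reduceByKey(_ + _)
    counts.take(10).foreach(println)

    // The same aggregation expressed through the DataFrame/Spark SQL API,
    // where the Catalyst optimizer plans the shuffle and aggregation.
    val words = spark.read.textFile("hdfs:///data/events/*.log")
      .flatMap(_.split("\\s+"))
      .toDF("word")
    words.groupBy("word").count()
      .orderBy($"count".desc)
      .show(10)

    spark.stop()
  }
}
```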
TECHNICAL SKILLS:
Languages: Java, Scala, Python, R (RStudio), C/C++, SQL, PL/SQL, HiveQL, J2EE
Big Data Technologies: HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Kafka, YARN, ZooKeeper, Hue, Flume, CDH 5.14, Oozie
Databases: Oracle 11g/10g, MS Access, MySQL, SQL Server, IBM DB2, NoSQL (HBase, Cassandra)
Web Technologies/Tools: JavaScript, HTML5, CSS3, JSP, Servlets, JSON, XML, AWS S3, EC2
IDEs & Tools: Eclipse, IntelliJ IDEA, NetBeans, Jupyter Notebooks, SBT, Apache Tomcat, WebLogic
Methodologies: Agile, Waterfall
Version Control: Git, SVN
OS & Tools: Windows, UNIX, Linux, PuTTY, WinSCP, FileZilla, Power BI, Tableau, Maven
PROFESSIONAL EXPERIENCE:
Hadoop/Spark Developer
Confidential, Ohio
Responsibilities:
- Implemented various proofs of concept (POCs) to evaluate Big Data technologies.
- Assisted in setting up the environment for continuous deployments.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure the architecture would serve the right purpose.
- Loaded data using Sqoop from different RDBMS servers such as Oracle, MySQL, and mainframe sources into the Hadoop HDFS cluster.
- Applied a deep understanding of Java and performed complex data transformations in Spark using Scala.
- Built data quality services for actively controlling the data ingested into the platform.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data, using Avro and Parquet file formats for data serialization.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed real-time processing of Wi-Fi data logs using Spark.
- Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, and loaded data into the Parquet tables from Avro-based Hive tables.
- Developed data-unloading microservices for the semantic layer using the Spark DataFrame API in Scala.
- Created data provisioning services that allow clients to query large datasets.
- Developed a logging service using Kafka and Spark Streaming to capture data in real time (see the sketch after this section).
- Built data governance policies, processes, procedures, and controls for the data platform.
- Collaborated with other project teams and assisted them in using the data platform services.
Environment: Cloudera, Spark, Python, Hive, Java, Scala, SQL, Sqoop, Maven, Kafka, Spark Streaming, Flume, Hue, Impala, HBase.
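A hedged sketch of what the Kafka/Spark Streaming logging service described above could look like, using the spark-streaming-kafka-0-10 direct stream API. The broker list, consumer group, topic name (wifi-logs), and HDFS output path are illustrative assumptions, not details from the actual project.

```scala
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object WifiLogStreamer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WifiLogStreamer")
    val ssc = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

    // Kafka connection settings; broker list and group id are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "wifi-log-streamer",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct stream from the (hypothetical) wifi-logs topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("wifi-logs"), kafkaParams)
    )

    // Persist each micro-batch of raw log lines to HDFS for downstream Hive/Spark jobs
    stream.map((record: ConsumerRecord[String, String]) => record.value())
      .foreachRDD { (rdd, time) =>
        if (!rdd.isEmpty()) {
          rdd.saveAsTextFile(s"hdfs:///data/wifi_logs/batch_${time.milliseconds}")
        }
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```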
Big Data Developer
Confidential, Boston, MA
Responsibilities:
- Responsible for managing data coming from different sources; involved in HDFS maintenance and in loading structured and unstructured data.
- Involved in loading data from the Linux file system into the CDH Hadoop Distributed File System using Sqoop import with different append modes.
- Developed data pipelines using Flume and Sqoop to ingest student behavioral data, preparation time, and materials studied into HDFS for analysis.
- Applied expertise in newer technologies such as Apache Spark and Scala programming.
- Used Scala to perform data validation on the data ingested via Sqoop and Flume, and pushed the cleansed dataset into HBase (see the sketch after this section).
- Involved in HBase setup and in storing data in HBase for further analysis.
- Used Hive-HBase integration to analyze data ingested into HBase and computed various metrics for dashboard reporting.
- Developed job flows in Oozie to automate workflows for Pig and Hive jobs.
- Loaded the aggregated data from the Hadoop environment into DB2 using Sqoop for dashboard reporting.
- Facilitated knowledge transfer sessions.
- Worked in an Agile development environment with two-week sprint cycles, dividing and organizing tasks; participated in daily scrums and other design-related meetings.
Environment: Cloudera, Spark, Python, Hive, Java, Scala, SQL, Sqoop, J2EE, Impala, YARN, Lambda, Tableau, Eclipse, HBase, Agile, Waterfall, Git, SVN.
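A minimal sketch, in Scala with Spark and the standard HBase client API, of the validation-then-HBase load described in this section. The landing path, the student_activity table, the cf column family, and the column names are hypothetical placeholders.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession

object StudentActivityLoader {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("StudentActivityLoader").getOrCreate()

    // Hypothetical landing directory populated by the Sqoop/Flume ingestion
    val raw = spark.read
      .option("header", "true")
      .csv("hdfs:///data/landing/student_activity")

    // Basic validation: require a student id and a numeric preparation time
    val cleansed = raw
      .filter(raw("student_id").isNotNull)
      .filter(raw("prep_minutes").cast("int").isNotNull)

    // Write each partition to HBase with the standard client API
    // (assumes hbase-site.xml is on the executor classpath and the table exists)
    cleansed.rdd.foreachPartition { rows =>
      val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
      val table = conn.getTable(TableName.valueOf("student_activity"))
      rows.foreach { row =>
        val put = new Put(Bytes.toBytes(row.getAs[String]("student_id")))
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("prep_minutes"),
          Bytes.toBytes(row.getAs[String]("prep_minutes")))
        table.put(put)
      }
      table.close()
      conn.close()
    }

    spark.stop()
  }
}
```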
Hadoop Developer
Confidential
Responsibilities:
- Designed Sqoop jobs to import large sets of structured data from DB2 into HDFS and to export data for report analysis.
- Responsible for the documentation, design, development, and architecture of Hadoop applications.
- Worked on Spark SQL, created DataFrames by loading data from Hive tables, and stored the prepared data in AWS S3 (see the sketch after this section).
- Designed, built, installed, configured, and supported Hadoop.
- Developed simple to complex MapReduce jobs in Java for processing and validating data.
- Built UDFs (User Defined Functions) in Pig and Hive as needed and developed Pig scripts for processing data.
- Adopted the Oozie workflow engine to run multiple Hive and Sqoop jobs.
- Wrote multiple Hive queries for data analysis to meet the business requirements.
- Created Hive internal and external tables on top of HDFS.
- Developed a Flume ETL job with an HTTP source and an HDFS sink, and developed a Kafka consumer in Scala for consuming data from Kafka topics.
- Processed data using Spark SQL in-memory computation and persisted the results to Hive tables.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data, using Avro and Parquet file formats for data serialization.
- Created graphical reports in Tableau for data visualization.
- Scheduled and maintained several batch jobs to run automatically depending on business requirements.
Environment: Spark, Scala, Sqoop, Python, Bash scripting, AWS S3, Redshift, GitHub, Hive, MapReduce, DB2, EC2, shell scripting, Oozie, Flume, Java, J2EE, Impala, YARN, Lambda, Tableau, Cloudera, Eclipse, HBase.
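A short sketch of the Spark SQL preparation step referenced above: loading a Hive table into a DataFrame, filtering and selecting the reporting columns, and writing Parquet to S3 via the s3a connector. The database, table, column names, and bucket are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object HiveToS3Prep {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToS3Prep")
      .enableHiveSupport()   // lets Spark read tables registered in the Hive metastore
      .getOrCreate()

    // Load the source Hive table into a DataFrame (names are placeholders)
    val orders = spark.table("analytics.orders")

    // Prep step: keep only completed orders and the columns needed for reporting
    val prep = orders
      .filter("order_status = 'COMPLETE'")
      .select("order_id", "customer_id", "order_total", "order_date")

    // Persist the prepared data as Parquet in S3
    prep.write
      .mode("overwrite")
      .parquet("s3a://example-bucket/prep/orders/")

    spark.stop()
  }
}
```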