Big Data Developer Resume

Dallas, TX

SUMMARY:

  • Extensive IT experience in Big Data technologies, Data Management/Analytics, and Data Visualization.
  • Worked in various domains including E-commerce, Automotive, and Manufacturing.
  • Technical experience using Hortonworks 2.6.5 and Cloudera 4 distributions and a Hadoop working environment including Hadoop 2.8.3, Hive 2.1.1, Sqoop 1.99.7, Flume 1.7.0, HBase 2.0.0, NiFi 2.x, Apache Spark 2.2.1, Scala 2.12.0, and Kafka 1.3.2.
  • Technically skilled at developing new applications on Hadoop according to business needs and at converting existing applications to the Hadoop environment.
  • Exposure to analyzing data using HiveQL, HBase 1.3.0, and MapReduce programs in Java.
  • Good understanding of workload management, schedulers, scalability and distributed platform architectures.
  • Experience in Spark 2.2.1 programming with Scala and Python for high-volume data processing.
  • Experience in collecting, processing and aggregating large amounts of streaming data using Kafka 1.3.2, Spark Streaming
  • In-Depth understanding of Spark Architecture including Spark Core, Spark SQL, Data Frames.
  • Experience in importing and exporting data using Sqoop 1.99.7 from HDFS to RDBMS and vice-versa
  • Experience in building ETL pipelines using NiFi 2.x.
  • Involved in creating Hive tables, partitioning, bucketing, loading data, and writing Hive queries.
  • Experience in working with RDBMS including Oracle and MySQL 5.x
  • Experience in developing scalable solutions using NoSQL databases including Cassandra 3.10, HBase 1.3.0
  • Experience working with AWS services such as EC2, Kinesis, and S3.
  • Familiar with software development tools like Git and JIRA.
  • Exposure to various software development methodologies like Agile and Waterfall.
  • A good team player who can work independently in a fast-paced, multitasking environment; a self-motivated learner.

TECHNICAL SKILLS:

Cloud Technologies: Snowflake, AWS

Real-Time Streaming: Apache Storm, Apache Kafka 1.3.2

Big Data Technologies: Spark 2.1.0, Hive 2.1.1, HDFS, MapReduce, HBase 1.3.0, NiFi 2.x, Sqoop 1.99.7, Flume 1.7.0, Oozie 5.x

Databases: Oracle 12c, SQL Server, MySQL, DB2

Hadoop Distributions: Cloudera 5.8.3, Hortonworks 2.5

Programming Languages: Scala 2.12.0, Python 3, Java 8, Shell scripting

Dashboard: Elastic Search, Kibana, Ambari

Operating Systems: Windows 10, CentOS 7.3, Mac OS 10.12.3

Data Warehousing: Teradata, Snowflake

IDEs: Eclipse 4.6, Visual Studio 2016, IntelliJ

PROFESSIONAL EXPERIENCE:

Confidential

Big Data Developer

Responsibilities:

  • Developed Scala scripts and UDFs using both DataFrames and RDDs in Spark 2.1.0 for data aggregation (a minimal sketch follows this list).
  • Used Spark SQL to create SchemaRDDs and load them into Hive tables.
  • Developed Spark 2.1.0 code using Scala and Spark SQL for faster data processing.
  • Improved organization of the data using techniques such as Hive partitioning and bucketing.
  • Extracted data from MySQL databases to HDFS using Apache NiFi 2.x.
  • Optimized Hive 2.0.x queries and joins to get better results for Hive ad-hoc queries.
  • Involved in creating Oozie 3.1.3 workflow and coordinator jobs that kick off Hive jobs on time as data becomes available.
  • Involved in deploying the applications in AWS.
  • Used Agile methodology for project management and Git for source code control.
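
The UDF and DataFrame-aggregation work above can be illustrated with a minimal Spark 2.1/Scala sketch. The table and column names (staging.orders, analytics.daily_sales, region, order_date, amount) are hypothetical stand-ins, not the actual project schema:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum, udf}

object DailyAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-aggregation")
      .enableHiveSupport() // lets DataFrames read and write Hive tables
      .getOrCreate()

    // Hypothetical UDF: normalize free-form region codes before grouping.
    val normalizeRegion = udf((r: String) =>
      if (r == null) "UNKNOWN" else r.trim.toUpperCase)

    // Aggregate a staging table into per-region daily totals.
    val daily = spark.table("staging.orders")
      .withColumn("region", normalizeRegion(col("region")))
      .groupBy("region", "order_date")
      .agg(sum("amount").as("total_amount"))

    // Partitioning the target Hive table lets queries prune by order_date.
    daily.write
      .partitionBy("order_date")
      .mode("overwrite")
      .saveAsTable("analytics.daily_sales")
  }
}
```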

Environment: Apache Spark 2.1.0, NiFi 2.x, HDFS 2.6.1, Hive 2.0.x, Hadoop distribution of Cloudera 5.9, Linux, Eclipse, MySQL 5.x

Confidential - Dallas, TX

Big Data Developer

Responsibilities:

  • Developed Spark 2.0 applications using RDDs, Data Frames to do data cleansing, data transformations, and data aggregations.
  • Extracted, transformed, and loaded (ETL) data from multiple federated data sources in Spark 2.0.
  • Experience with in-memory computations using Spark RDDs for faster responses.
  • Experience handling large datasets using data partitioning, shared variables in Spark 2.0, effective and efficient joins, and various data transformations.
  • Experience tuning Spark 2.0 applications: setting the right batch interval, the correct level of parallelism, and memory usage.
  • Implemented Apache NiFi 1.7.x flow topologies to perform cleansing operations before moving data into HDFS.
  • Developed Spark Streaming applications to perform the necessary operations in real time and persist the results into HBase (see the sketch after this list).
  • Utilized Spark SQL with the DataFrames API for efficient structured data processing.
  • Experience submitting Spark applications to a variety of cluster managers.
  • Well versed in configuring Kafka 2.1.0 topics and scheduling Oozie workflows.
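
A minimal sketch of the Kafka-to-HBase streaming path described above, using the Spark Streaming Kafka 0.10 direct-stream API. The broker address, topic name, and consumer group are hypothetical, and the per-partition HBase write is a placeholder since the actual row/column design isn't given here:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToHBase {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hbase")
    // The batch interval is one of the tuning knobs mentioned above.
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092", // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "events-consumer", // hypothetical group
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Seq("events"), kafkaParams)) // hypothetical topic

    stream.map(_.value).foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // In the real job this would open one HBase connection per partition
        // and write each record; a println stands in to keep the sketch short.
        records.foreach(println)
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```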

Environment: Hadoop, Spark 2.0, Scala, Kafka 2.1.0, Hive, CDH 4.7.1, HBase, NiFi 1.7.x, Oozie, Linux, ETL

Confidential - Dallas, TX

Hadoop Developer

Responsibilities:

  • Developed a data pipeline using Flume and Sqoop to extract data from weblogs and store it in HDFS.
  • Used Sqoop 1.4.6 for importing and exporting data into HDFS and Hive.
  • Involved in processing ingested raw data using MapReduce, Hive.
  • Experience moving processed data from Hadoop to relational databases or external file systems using Sqoop and the HDFS get/copyToLocal commands (see the FileSystem API sketch after this list).
  • Collected and aggregated large amounts of data using Apache Flume 1.6.0 and staged the data in HDFS for further analysis.
  • Used Hue for Hive queries and created day-based partitions in Hive to improve performance.
  • Developed, validated and maintained HiveQL queries
  • Implemented partitioning and bucketing concepts in Hive and designed both managed and external tables for optimized performance.
  • Involved in developing shell scripts to orchestrate execution of all other scripts (Hive and MapReduce) and to move data files within and outside of HDFS.
  • Involved in using HCatalog to access Hive table metadata from MapReduce code.
  • Supported MapReduce programs running on the cluster.
  • Wrote Hive queries for data analysis to meet the business requirements.
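
The "HDFS get/copyToLocal" step above is normally a shell command; the same move can be done programmatically through the Hadoop FileSystem API. A minimal Scala sketch with hypothetical paths:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsExport {
  def main(args: Array[String]): Unit = {
    // Picks up fs.defaultFS from the cluster's core-site.xml on the classpath.
    val fs = FileSystem.get(new Configuration())

    // Equivalent to `hdfs dfs -copyToLocal <src> <dst>` (paths are hypothetical).
    fs.copyToLocalFile(
      new Path("/warehouse/processed/weblogs/2016-01"),
      new Path("/data/export/weblogs/2016-01"))

    fs.close()
  }
}
```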

Environment: Hadoop (HDFS/MapReduce), Hive, Sqoop, Hue, SQL, Linux

Confidential

Hadoop Developer

Responsibilities:

  • Developed workflows in SSIS to automate the tasks of loading data into HDFS and processing it with Hive.
  • Moved relational database data into HDFS and Hive dynamic-partition tables using Sqoop and staging tables.
  • Stored data in Parquet file format in Hive.
  • Performed analytics and drew insights from the data using Hive.
  • Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
  • Implemented Sqoop scripts to load data into Hive.
  • Worked on data ingestion from Oracle to Hive and was involved in various data migration activities.
  • Involved in fixing various issues related to data quality, data availability and data stability.
  • Worked with the Hue interface for loading data into HDFS and querying it.

Environment: Hadoop, Sqoop, Hive, Oozie, SSIS, Linux

Confidential

Data Analyst

Responsibilities:

  • Queried data from the RDBMS into CSV files for each month and every service category.
  • Wrote SQL queries for data analysis and to filter out the required data for further processing.
  • Performed SQL queries to extract data from an Oracle SQL database.
  • Performed initial descriptive data analysis and generated statistical reports.
  • Developed regression algorithms to classify wire-down incidents as energized or non-energized and automated the detection procedure.
  • Established an executive dashboard to demonstrate project achievements and effectively communicated the results.
  • Generated weekly reports to discuss with the fault rectifying teams.

Environment: Tableau, MySQL, Excel

Confidential

SQL Developer

Responsibilities:

  • Managed connectivity using JDBC for querying and inserting data, and for data management including triggers and stored procedures.
  • Used JDBC for database connectivity.
  • Wrote SQL queries, stored procedures and database triggers on the database objects.
  • Analyzed the data and created dashboards using Tableau.
  • Used SQL queries and JDBC prepared statements to retrieve data from the MySQL database (a minimal sketch follows this list).
  • Actively participated and provided constructive feedback during daily stand-up meetings and weekly iteration review meetings.
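
The original work here was in Java 1.6; to stay consistent with the other sketches, the same JDBC prepared-statement pattern is shown in Scala, with a hypothetical connection string, table, and column names:

```scala
import java.sql.DriverManager

object CustomerLookup {
  def main(args: Array[String]): Unit = {
    // Load the MySQL driver explicitly (needed with older JDBC/driver versions).
    Class.forName("com.mysql.jdbc.Driver")

    // Hypothetical connection settings.
    val conn = DriverManager.getConnection(
      "jdbc:mysql://localhost:3306/sales", "app_user", "secret")
    try {
      // Prepared statements bind parameters safely instead of concatenating SQL.
      val ps = conn.prepareStatement(
        "SELECT id, name FROM customers WHERE region = ?")
      ps.setString(1, "TX")
      val rs = ps.executeQuery()
      while (rs.next()) {
        println(s"${rs.getInt("id")}  ${rs.getString("name")}")
      }
    } finally {
      conn.close()
    }
  }
}
```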

Environment: Java 1.6, J2EE, Tableau, Eclipse, MySQL
