Spark/Big Data Developer Resume

Rocky Hills, CT

PROFESSIONAL SUMMARY:

  • 11+ years of experience in the IT industry, including 4+ years as a Hadoop and Spark 2.1.0 developer working across the Hadoop and Spark ecosystems, and 2+ years with core Java technologies and SQL.
  • 4+ years of experience with Big Data ecosystems, covering data ingestion, storage, querying, processing, and analysis.
  • Hands-on experience with, and in-depth knowledge of, Apache Spark, its architecture, Spark SQL, and Spark Streaming.
  • Experience using accumulators, broadcast variables, and RDD caching in Spark Streaming.
  • Experience using CQL to export data from Cassandra databases.
  • Hands-on experience with core Java concepts: collections, multithreading, data structures, serialization, and deserialization.
  • Hands-on experience with Scala and its concepts, such as collections, pattern matching, case classes, and other functional programming concepts.
  • Hands-on experience with Kafka, writing Kafka producer and consumer code configured with bootstrap servers and ZooKeeper.
  • Apache Spark 2.1.0 certification from IBM.
  • Experience with Apache Hadoop components such as HDFS, MapReduce, HiveQL, HBase, Apache Pig, Sqoop, Oozie, Mahout, and Apache Flume.
  • Experience with NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and Secondary NameNode, as well as MapReduce concepts.
  • Worked with various data sources, such as flat files and RDBMSs (Teradata, SQL Server 2005, and Oracle). Extensive work on ETL processes covering data sourcing, transformation, mapping, and conversion.
  • Exceptional ability to quickly master new concepts; capable of working in groups as well as independently.
  • Excellent interpersonal skills and the ability to work as a part of a team.
  • Experience debugging and troubleshooting production systems, profiling, and identifying performance bottlenecks.
  • Excellent SQL programming and visual analytics skills (e.g., Tableau).
  • Hands-on experience using Apache Kafka and its components to build data pipelines.
  • Hands-on experience with BD PaaS.
  • Hands-on experience migrating data to the AWS cloud (EMR, data lake, Amazon DynamoDB) using Amazon services such as S3 and instances running on EC2.
  • Excellent working knowledge of statistical analysis tools such as SPSS and Microsoft Excel.
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, and Flume.
  • Good knowledge of Hadoop cluster architecture and cluster monitoring.
  • In-depth understanding of data structures and algorithms.
  • Experience using file formats such as JSON, Parquet, and Avro with Spark's read, write, and save functionality.
  • Experience using Sqoop to import data from relational database systems into HDFS and data lakes, and to export it back.
  • Good knowledge of Apache Solr; trained in Apache Solr.
  • Experience managing Hadoop clusters using Cloudera Manager, MapR, and Hortonworks.
  • Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
  • Hands-on experience with VPN, PuTTY, WinSCP, VNC Viewer, etc.
  • Hands-on experience in application development using Java, Scala, Python, RDBMS, and Linux shell scripting.
  • Ability to adapt to evolving technology; strong sense of responsibility and accomplishment.
  • Working knowledge of ETL tools such as Talend, Pentaho, and Informatica.
  • Good knowledge of Python scripting.
  • Experience using PySpark for ETL and general processing.

TECHNICAL SKILLS:

Programming Languages: Scala, Java, Python

HADOOP/BIG DATA: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Oozie, ZooKeeper, Apache Cassandra, MongoDB, DynamoDB, Spark, Spark Streaming, and Kafka

Scripting: Bash, JavaScript, ksh, Scala

OPERATING SYSTEMS: Windows, Linux, UNIX

Debugging: Eclipse, IntelliJ, Maven, and SBT

OO Modeling: UML

DATABASE: DB2, MySQL, PL/SQL, CouchDB, Cassandra, HBase, MongoDB

PROFESSIONAL EXPERIENCE:

Confidential, Rocky Hills, CT

Spark/Big Data Developer

Responsibilities:

  • Developed programs using the Python API to compare performance among Spark, Hive, and SQL.
  • Worked with millions of database records daily, identifying common errors and bad-data patterns and fixing them using Python.
  • Generated various reports using Python ReportLab and sent them to business users for further analysis.
  • Used PyUnit, the Python unit testing framework, to test the application's functionality.
  • Converted Hive data to RDDs and DataFrames using PySpark and Python.
  • Developed Python scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMSs through Sqoop.
  • Imported data from S3 and HDFS into RDDs and DataFrames.
  • Imported data from various sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
  • Integrated Apache Kafka with databases for data ingestion.
  • Wrote Java programs to implement business logic using Apache Storm.
  • Worked with the NoSQL databases HBase, DynamoDB, and Cassandra.
  • Used Apache Impetus Real Time Streaming Analytics.
  • Migrated data from RDBMS sources such as Oracle and MySQL to the AWS cloud (Elastic MapReduce, AWS data lake, DynamoDB) using AWS Database Migration Service and other techniques.
  • Used Spark for batch file processing, SQL queries, and Spark Streaming.
  • Evaluated business requirements and prepared detailed specifications, following project guidelines, for the programs to be developed.
  • Wrote Unix scripts to generate automated, customized Hive tables.
  • Designed and presented a plan for implementing Apache Storm.
  • Transferred data from Oracle, DB2, and MySQL into Hive.
  • Loaded data from the Unix file system into HDFS.
  • Handled importing data from Teradata into MapR-FS.
  • Used Oozie workflows to coordinate Pig and Hive jobs.
  • Worked on data validation.

Environment: MapR, MapR-FS Hadoop cluster, MapR data lake, Hive, Pig, Sqoop, Linux, Hadoop MapReduce, Oozie, HBase, Linux shell scripting, Java, Python, PyCharm, Scala, Spark, Kafka, Storm.

Confidential, NY

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using the Hadoop ecosystem.
  • Analyzed data using the Hadoop components Hive and Pig.
  • Worked hands-on with ETL processes using Apache Pig (Pig Latin).
  • Imported data from various sources, performed transformations using Hive and MapReduce, and loaded the data into Hive tables.
  • Tested Hadoop programs using JUnit and MRUnit.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Responded quickly to ad hoc internal and external client data requests and created ad hoc reports.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
  • Involved in loading data from the UNIX file system into HDFS.
  • Responsible for creating Hive tables, loading data, and writing Hive queries.
  • Imported data from various sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
  • Extracted data from Teradata into HDFS using Sqoop.
  • Exported the analyzed patterns back to Teradata using Sqoop.
  • Installed the Oozie workflow engine to run multiple MapReduce, Hive, and Pig jobs triggered independently by time and data availability.
  • Loaded and transferred data with the NoSQL database Apache HBase.
  • Analyzed data with Hive queries and Pig scripts to identify user behavior segments such as shopping enthusiasts, travelers, and music lovers.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Developed Java code to generate, compare, and merge Avro schema files.
  • Used JSON and XML formats for serialization and deserialization.
  • Developed Hive queries to process the data and generate data cubes for visualization.

Environment: Hadoop cluster, HDFS, Hive, Pig, Sqoop, Linux, Hadoop MapReduce, HBase, Linux shell scripting.

Confidential, Topeka, KS

Hadoop Developer

Responsibilities:

  • Applied a solid understanding of Hadoop HDFS, MapReduce, and other ecosystem projects.
  • Installed and configured the Hadoop cluster.
  • Imported data from various sources, performed transformations using Hive and MapReduce, and loaded the data into Cloudera.
  • Worked with the Cloudera support team to fine-tune the cluster.
  • Worked closely with the SA team to ensure all hardware and software were properly set up for optimal resource usage.
  • Worked with a plug-in that allows Hadoop MapReduce programs, HBase, Pig, and Hive to run unmodified and access files directly; the plug-in also provided data locality for Hadoop across host nodes and virtual machines.
  • Wrote data ingesters and MapReduce programs.
  • Developed MapReduce jobs to analyze data and provide heuristic reports.
  • Wrote data ingesters and complex MapReduce jobs in Java for data cleaning and preprocessing, and fine-tuned them for specific data sets.
  • Performed extensive data validation using Hive and wrote Hive UDFs.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Wrote extensive Python and shell scripts to provision and spin up virtualized Hadoop clusters.
  • Adding, decommissioning, and rebalancing nodes
  • Created a POC to store server log data in Cassandra to identify system alert metrics.
  • Rack-aware configuration
  • HDFS support and maintenance
  • Applying patches and performing version upgrades
  • Incident management, problem management, and change management
  • Performance management and reporting
  • Recovery from NameNode failures
  • Scheduling MapReduce jobs with the FIFO and Fair schedulers
  • Installing and configuring other open-source software such as Pig, Hive, HBase, Flume, and Sqoop
  • Integrating with RDBMSs using Sqoop and JDBC connectors
  • Working with the development team to tune jobs; knowledge of writing Hive jobs

Environment: Windows 2000/2003, UNIX, Linux, Java, Apache HDFS, MapReduce, Pig, Hive, HBase, Flume, Sqoop, Cassandra, NoSQL, Oracle, MySQL.

Confidential, Newark, NJ

Java Developer

Responsibilities:

  • Designed and developed a UI that presents engineers with a form to submit a solution to a particular problem.
  • Designed and developed a UI that lets end users query a problem; it opens a JDBC connection to the database and retrieves the details for the call number as well as the current status of the submitted problem.
  • Developed class and object diagrams to clearly depict the various classes, objects, and their functionality.
  • Designed and developed servlets that present end users with a form to submit problem details.
  • Developed servlets that store user information by opening a JDBC connection to the database and inserting the details.
  • Executed SQL statements for efficient retrieval and storage of data in the database.
  • Involved in unit testing of the application.
  • Involved in enterprise data warehousing (EDW): warehousing, data movement, and data transformation.

Environment: Java 6, HTML, JavaScript, JSP 2.2, Spring, AJAX, Hibernate 3, WebLogic Application Server 10g, XML, Eclipse 3.7, MS SQL Server 5.5, Maven 3.0, JUnit, ANT, Rational Clear Case, Log4J.

Confidential, Wayne, PA

Programmer Analyst

Responsibilities:

  • Worked for a telecommunications company as a Programmer Analyst.
  • Managed and analyzed various operations, including supply chain management and customer care reporting.
  • Managed and trained users of the company's software for telecommunication service activations, commissions, and customer care services.
  • Mined existing databases for special offers and promotions.
  • Trained users on proprietary utility software.
  • Collaborated with management to understand their requirements; exported Excel datasets to Access.
  • Created and developed Access databases and SQL queries.
  • Used Excel for reporting.
  • Delivered and implemented technology solutions.
  • Improved information accessibility by contributing input to documentation.
  • Excellent experience with MS Office Excel and other MS Office applications.
  • Increased the efficiency and effectiveness of collaboration with management.
  • Upgraded legacy software from time to time.
  • Configured client machines.
  • Configured monitoring and management tools.
  • Performed various kinds of analysis to support new product launches for the running business.
  • Managed research and fieldwork to open and expand new locations.

Environment: MS Office Excel, MS Access, Verizon EROES, and a proprietary point-of-sale application.